diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md new file mode 100644 index 00000000..6fe8203b --- /dev/null +++ b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md @@ -0,0 +1,399 @@ + +# Cluster Deployment + +This section describes how to manually deploy an instance that includes 3 ConfigNodes and 3 DataNodes, commonly known as a 3C3D cluster. + +
+ +
+ + +## Note + +1. Before installation, ensure that the system is complete by referring to [System Requirements](./Environment-Requirements.md) + +2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure the `cn_internal_address` and `dn_internal_address` of IoTDB using the host name. + + ``` shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings. + +4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. + +5. Please note that when installing and deploying IoTDB (including activating and using software), it is necessary to use the same user for operations. You can: + +- Using root user (recommended): Using root user can avoid issues such as permissions. +- Using a fixed non root user: + - Using the same user operation: Ensure that the same user is used for start, activation, stop, and other operations, and do not switch users. + - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. + +6. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of database operation status at any time. The monitoring panel can be obtained by contacting the business department,The steps for deploying a monitoring panel can refer to:[Monitoring Panel Deployment](./Monitoring-panel-deployment.md) + +## Preparation Steps + +1. Prepare the IoTDB database installation package: timechodb-{version}-bin.zip(The installation package can be obtained from:[IoTDB-Package](./IoTDB-Package_timecho.md)) +2. Configure the operating system environment according to environmental requirements(The system environment configuration can be found in:[Environment Requirement](./Environment-Requirements.md)) + +## Installation Steps + +Assuming there are three Linux servers now, the IP addresses and service roles are assigned as follows: + +| Node IP | Host Name | Service | +| ------------- | --------- | -------------------- | +| 11.101.17.224 | iotdb-1 | ConfigNode、DataNode | +| 11.101.17.225 | iotdb-2 | ConfigNode、DataNode | +| 11.101.17.226 | iotdb-3 | ConfigNode、DataNode | + +### Set Host Name + +On three machines, configure the host names separately. To set the host names, configure `/etc/hosts` on the target server. 
Use the following command: + +```Bash +echo "11.101.17.224 iotdb-1" >> /etc/hosts +echo "11.101.17.225 iotdb-2" >> /etc/hosts +echo "11.101.17.226 iotdb-3" >> /etc/hosts +``` + +### Configuration + +Unzip the installation package and enter the installation directory + +```Plain +unzip timechodb-{version}-bin.zip +cd timechodb-{version}-bin +``` + +#### Environment script configuration + +- `./conf/confignode-env.sh` configuration + + | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | + | :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | + | MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +- `./conf/datanode-env.sh` configuration + + | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | + | :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | + | MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +#### General Configuration(./conf/iotdb-system.properties) + +- Cluster Configuration + + | **Configuration** | **Description** | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | + | ------------------------- | ------------------------------------------------------------ | -------------- | -------------- | -------------- | + | cluster_name | Cluster Name | defaultCluster | defaultCluster | defaultCluster | + | schema_replication_factor | The number of metadata replicas, the number of DataNodes should not be less than this number | 3 | 3 | 3 | + | data_replication_factor | The number of data replicas should not be less than this number of DataNodes | 2 | 2 | 2 | + +#### ConfigNode Configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | Note | +| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | +| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | +| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | 10710 | 10710 | 10710 | Cannot be modified after initial startup | +| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | 10720 | 10720 | 10720 | Cannot be modified after initial startup | +| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, `cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's `cn_internal-address: cn_internal_port` | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be 
modified after initial startup |
+
+#### DataNode Configuration
+
+| **Configuration** | **Description** | **Default** | **Recommended value** | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | Note |
+| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- |
+| dn_rpc_address | The address of the client RPC service | 127.0.0.1 | Recommend using the **IPV4 address or hostname** of the server where it is located | iotdb-1 | iotdb-2 | iotdb-3 | Restarting the service takes effect |
+| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | 6667 | 6667 | 6667 | Restarting the service takes effect |
+| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup |
+| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | 10730 | 10730 | 10730 | Cannot be modified after initial startup |
+| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | 10740 | 10740 | 10740 | Cannot be modified after initial startup |
+| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | 10750 | 10750 | 10750 | Cannot be modified after initial startup |
+| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | 10760 | 10760 | 10760 | Cannot be modified after initial startup |
+| dn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, i.e. `cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | The first ConfigNode's `cn_internal_address:cn_internal_port` | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup |
+
+> ❗️Attention: Editors such as VSCode Remote do not automatically save configuration changes. Please ensure that the modified files are saved persistently; otherwise, the configuration items will not take effect.
+
+### Start ConfigNode
+
+Start the ConfigNode on iotdb-1 first, ensuring that the seed ConfigNode starts before the others, and then start the ConfigNodes on the second and third servers in sequence.
+
+```Bash
+cd sbin
+
+./start-confignode.sh -d    # The -d parameter starts the service in the background
+```
+
+If the startup fails, please refer to [Common Questions](#common-questions).
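+
+Before starting the DataNodes, you can optionally confirm that every ConfigNode process is alive. A minimal check, sketched with the same commands used later in the Common Questions section (`ConfigNode` is the name the JVM process reports to `jps`):
+
+```Bash
+# Run on each of the three servers after starting its ConfigNode
+jps | grep ConfigNode                  # the ConfigNode process should appear in the output
+ps -ef | grep iotdb | grep -v grep     # alternative check if jps is not available
+```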
+ +### Start DataNode + + Enter the `sbin` directory of iotdb and start three datanode nodes in sequence: + +```Go +cd sbin + +./start-datanode.sh -d #"- d" parameter will start in the background +``` + +### Activate Database + +#### Method 1: Activate file copy activation + +- After starting three Confignode Datanode nodes in sequence, copy the `activation` folder of each machine and the `system_info` file of each machine to the Timecho staff; + +- The staff will return the license files for each ConfigNode Datanode node, where 3 license files will be returned; + +- Put the three license files into the `activation` folder of the corresponding ConfigNode node; + +#### Method 2: Activate Script Activation + +- Retrieve the machine codes of 3 machines in sequence and enter IoTDB CLI + + - Table Model CLI Enter Command: + + ```SQL + # Linux or MACOS + ./start-cli.sh -sql_dialect table + + # windows + ./start-cli.bat -sql_dialect table + ``` + + - Enter the tree model CLI command: + + ```SQL + # Linux or MACOS + ./start-cli.sh + + # windows + ./start-cli.bat + ``` + + - Execute the following to obtain the machine code required for activation: + - Note: Currently, activation is only supported in tree models + + ```Bash + show system info + ``` + + - The following information is displayed, which shows the machine code of one machine: + + ```Bash + +--------------------------------------------------------------+ + | SystemInfo| + +--------------------------------------------------------------+ + |01-TE5NLES4-UDDWCMYE,01-GG5NLES4-XXDWCMYE,01-FF5NLES4-WWWWCMYE| + +--------------------------------------------------------------+ + Total line number = 1 + It costs 0.030s + ``` + +- The other two nodes enter the CLI of the IoTDB tree model in sequence, execute the statement, and copy the machine codes of the three machines obtained to the Timecho staff + +- The staff will return three activation codes, which normally correspond to the order of the three machine codes provided. 
Please paste each activation code into the CLI separately, as prompted below: + + - Note: The activation code needs to be marked with a `'`symbol before and after, as shown in + + ```Bash + IoTDB> activate '01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===,01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===,01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===' + ``` + +### Verify Activation + +When the status of the 'Result' field is displayed as' success', it indicates successful activation + +![](https://alioss.timecho.com/docs/img/%E9%9B%86%E7%BE%A4-%E9%AA%8C%E8%AF%81.png) + +## Node Maintenance Steps + +### ConfigNode Node Maintenance + +ConfigNode node maintenance is divided into two types of operations: adding and removing ConfigNodes, with two common use cases: + +- Cluster expansion: For example, when there is only one ConfigNode in the cluster, and you want to increase the high availability of ConfigNode nodes, you can add two ConfigNodes, making a total of three ConfigNodes in the cluster. + +- Cluster failure recovery: When the machine where a ConfigNode is located fails, making the ConfigNode unable to run normally, you can remove this ConfigNode and then add a new ConfigNode to the cluster. + +> ❗️Note, after completing ConfigNode node maintenance, you need to ensure that there are 1 or 3 ConfigNodes running normally in the cluster. Two ConfigNodes do not have high availability, and more than three ConfigNodes will lead to performance loss. + +#### Adding ConfigNode Nodes + +Script command: + +```shell +# Linux / MacOS +# First switch to the IoTDB root directory +sbin/start-confignode.sh + +# Windows +# First switch to the IoTDB root directory +sbin/start-confignode.bat +``` + +#### Removing ConfigNode Nodes + +First connect to the cluster through the CLI and confirm the internal address and port number of the ConfigNode you want to remove by using `show confignodes`: + +```Bash +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +Then use the script to remove the DataNode. 
Script command: + +```Bash +# Linux / MacOS +sbin/remove-confignode.sh [confignode_id] + +#Windows +sbin/remove-confignode.bat [confignode_id] + +``` + +### DataNode Node Maintenance + +There are two common scenarios for DataNode node maintenance: + +- Cluster expansion: For the purpose of expanding cluster capabilities, add new DataNodes to the cluster + +- Cluster failure recovery: When a machine where a DataNode is located fails, making the DataNode unable to run normally, you can remove this DataNode and add a new DataNode to the cluster + +> ❗️Note, in order for the cluster to work normally, during the process of DataNode node maintenance and after the maintenance is completed, the total number of DataNodes running normally should not be less than the number of data replicas (usually 2), nor less than the number of metadata replicas (usually 3). + +#### Adding DataNode Nodes + +Script command: + +```Bash +# Linux / MacOS +# First switch to the IoTDB root directory +sbin/start-datanode.sh + +# Windows +# First switch to the IoTDB root directory +sbin/start-datanode.bat +``` + +Note: After adding a DataNode, as new writes arrive (and old data expires, if TTL is set), the cluster load will gradually balance towards the new DataNode, eventually achieving a balance of storage and computation resources on all nodes. + +#### Removing DataNode Nodes + +First connect to the cluster through the CLI and confirm the RPC address and port number of the DataNode you want to remove with `show datanodes`: + +```Bash +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 0.0.0.0| 6667| 0| 0| +| 2|Running| 0.0.0.0| 6668| 1| 1| +| 3|Running| 0.0.0.0| 6669| 1| 0| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 3 +It costs 0.110s +``` + +Then use the script to remove the DataNode. Script command: + +```Bash +# Linux / MacOS +sbin/remove-datanode.sh [datanode_id] + +#Windows +sbin/remove-datanode.bat [datanode_id] +``` + +## Common Questions + +1. Multiple prompts indicating activation failure during deployment process + + - Use the `ls -al` command: Use the `ls -al` command to check if the owner information of the installation package root directory is the current user. + + - Check activation directory: Check all files in the `./activation` directory and whether the owner information is the current user. + +2. Confignode failed to start + + Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. + + Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. + + Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. + + Step 4: Clean up the environment: + + a. Terminate all ConfigNode Node and DataNode processes. + + ```Bash + # 1. Stop the ConfigNode and DataNode services + sbin/stop-standalone.sh + + # 2. Check for any remaining processes + jps + # Or + ps -ef|gerp iotdb + + # 3. 
If there are any remaining processes, manually kill the + kill -9 + # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + + b. Delete the data and logs directories. + + Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. + + ```Bash + cd /data/iotdb + rm -rf data logs + ``` + +## Appendix + +### Introduction to Configuration Node Parameters + +| Parameter | Description | Is it required | +| :-------- | :---------------------------------------------- | :------------- | +| -d | Start in daemon mode, running in the background | No | + +### Introduction to Datanode Node Parameters + +| Abbreviation | Description | Is it required | +| :----------- | :----------------------------------------------------------- | :------------- | +| -v | Show version information | No | +| -f | Run the script in the foreground, do not put it in the background | No | +| -d | Start in daemon mode, i.e. run in the background | No | +| -p | Specify a file to store the process ID for process management | No | +| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | +| -g | Print detailed garbage collection (GC) information | No | +| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | +| -E | Specify the path of the JVM error log file | No | +| -D | Define system properties, in the format key=value | No | +| -X | Pass -XX parameters directly to the JVM | No | +| -h | Help instruction | No | \ No newline at end of file diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Database-Resources.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Database-Resources.md new file mode 100644 index 00000000..59a380db --- /dev/null +++ b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Database-Resources.md @@ -0,0 +1,194 @@ + +# Database Resources +## CPU + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Number of timeseries (frequency<=1HZ)CPUNumber of nodes
standalone modeDouble activeDistributed
Within 1000002core-4core123
Within 3000004core-8core123
Within 5000008core-26core123
Within 100000016core-32core123
Within 200000032core-48core123
Within 1000000048core12Please contact Timecho Business for consultation
Over 10000000Please contact Timecho Business for consultation
+ +## Memory + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Number of timeseries (frequency<=1HZ)MemoryNumber of nodes
standalone modeDouble activeDistributed
Within 1000004G-8G123
Within 30000012G-32G123
Within 50000024G-48G123
Within 100000032G-96G123
Within 200000064G-128G123
Within 10000000128G12Please contact Timecho Business for consultation
Over 10000000Please contact Timecho Business for consultation
+ +## Storage (Disk) +### Storage space +Calculation formula: Number of measurement points * Sampling frequency (Hz) * Size of each data point (Byte, different data types may vary, see table below) * Storage time (seconds) * Number of copies (usually 1 copy for a single node and 2 copies for a cluster) ÷ Compression ratio (can be estimated at 5-10 times, but may be higher in actual situations) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Data point size calculation
data typeTimestamp (Bytes) Value (Bytes) Total size of data points (in bytes) +
Boolean819
INT32/FLOAT8412
INT64/DOUBLE8816
TEXT8The average is a8+a
+ +Example: 1000 devices, each with 100 measurement points, a total of 100000 sequences, INT32 type. Sampling frequency 1Hz (once per second), storage for 1 year, 3 copies. +- Complete calculation formula: 1000 devices * 100 measurement points * 12 bytes per data point * 86400 seconds per day * 365 days per year * 3 copies/10 compression ratio=11T +- Simplified calculation formula: 1000 * 100 * 12 * 86400 * 365 * 3/10=11T +### Storage Configuration +If the number of nodes is over 10000000 or the query load is high, it is recommended to configure SSD +## Network (Network card) +If the write throughput does not exceed 10 million points/second, configure 1Gbps network card. When the write throughput exceeds 10 million points per second, a 10Gbps network card needs to be configured. +| **Write throughput (data points per second)** | **NIC rate** | +| ------------------- | ------------- | +| <10 million | 1Gbps | +| >=10 million | 10Gbps | +## Other instructions +IoTDB has the ability to scale up clusters in seconds, and expanding node data does not require migration. Therefore, you do not need to worry about the limited cluster capacity estimated based on existing data. In the future, you can add new nodes to the cluster when you need to scale up. \ No newline at end of file diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md new file mode 100644 index 00000000..539d03b0 --- /dev/null +++ b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md @@ -0,0 +1,191 @@ + +# System Requirements + +## Disk Array + +### Configuration Suggestions + +IoTDB has no strict operation requirements on disk array configuration. It is recommended to use multiple disk arrays to store IoTDB data to achieve the goal of concurrent writing to multiple disk arrays. For configuration, refer to the following suggestions: + +1. Physical environment + System disk: You are advised to use two disks as Raid1, considering only the space occupied by the operating system itself, and do not reserve system disk space for the IoTDB + Data disk: + Raid is recommended to protect data on disks + It is recommended to provide multiple disks (1-6 disks) or disk groups for the IoTDB. (It is not recommended to create a disk array for all disks, as this will affect the maximum performance of the IoTDB.) +2. Virtual environment + You are advised to mount multiple hard disks (1-6 disks). + +### Configuration Example + +- Example 1: Four 3.5-inch hard disks + +Only a few hard disks are installed on the server. Configure Raid5 directly. +The recommended configurations are as follows: +| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | +| ----------- | -------- | -------- | --------- | -------- | +| system/data disk | RAID5 | 4 | 1 | 3 | is allowed to fail| + +- Example 2: Twelve 3.5-inch hard disks + +The server is configured with twelve 3.5-inch disks. +Two disks are recommended as Raid1 system disks. The two data disks can be divided into two Raid5 groups. Each group of five disks can be used as four disks. 
+The recommended configurations are as follows: +| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | +| -------- | -------- | -------- | --------- | -------- | +| system disk | RAID1 | 2 | 1 | 1 | +| data disk | RAID5 | 5 | 1 | 4 | +| data disk | RAID5 | 5 | 1 | 4 | +- Example 3:24 2.5-inch disks + +The server is configured with 24 2.5-inch disks. +Two disks are recommended as Raid1 system disks. The last two disks can be divided into three Raid5 groups. Each group of seven disks can be used as six disks. The remaining block can be idle or used to store pre-write logs. +The recommended configurations are as follows: +| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | +| -------- | -------- | -------- | --------- | -------- | +| system disk | RAID1 | 2 | 1 | 1 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | NoRaid | 1 | 0 | 1 | + +## Operating System + +### Version Requirements + +IoTDB supports operating systems such as Linux, Windows, and MacOS, while the enterprise version supports domestic CPUs such as Loongson, Phytium, and Kunpeng. It also supports domestic server operating systems such as Neokylin, KylinOS, UOS, and Linx. + +### Disk Partition + +- The default standard partition mode is recommended. LVM extension and hard disk encryption are not recommended. +- The system disk needs only the space used by the operating system, and does not need to reserve space for the IoTDB. +- Each disk group corresponds to only one partition. Data disks (with multiple disk groups, corresponding to raid) do not need additional partitions. All space is used by the IoTDB. +The following table lists the recommended disk partitioning methods. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Disk classificationDisk setDriveCapacityFile system type
System diskDisk group0/boot1GBAcquiesce
/Remaining space of the disk groupAcquiesce
Data diskDisk set1/data1Full space of disk group1Acquiesce
Disk set2/data2Full space of disk group2Acquiesce
......
+### Network Configuration + +1. Disable the firewall + +```Bash +# View firewall +systemctl status firewalld +# Disable firewall +systemctl stop firewalld +# Disable firewall permanently +systemctl disable firewalld +``` +2. Ensure that the required port is not occupied + +(1) Check the ports occupied by the cluster: In the default cluster configuration, ConfigNode occupies ports 10710 and 10720, and DataNode occupies ports 6667, 10730, 10740, 10750, 10760, 9090, 9190, and 3000. Ensure that these ports are not occupied. Check methods are as follows: + +```Bash +lsof -i:6667 or netstat -tunp | grep 6667 +lsof -i:10710 or netstat -tunp | grep 10710 +lsof -i:10720 or netstat -tunp | grep 10720 +# If the command outputs, the port is occupied. +``` + +(2) Checking the port occupied by the cluster deployment tool: When using the cluster management tool opskit to install and deploy the cluster, enable the SSH remote connection service configuration and open port 22. + +```Bash +yum install openssh-server # Install the ssh service +systemctl start sshd # Enable port 22 +``` + +3. Ensure that servers are connected to each other + +### Other Configuration + +1. Disable the system swap memory + +```Bash +echo "vm.swappiness = 0">> /etc/sysctl.conf +# The swapoff -a and swapon -a commands are executed together to dump the data in swap back to memory and to empty the data in swap. +# Do not omit the swappiness setting and just execute swapoff -a; Otherwise, swap automatically opens again after the restart, making the operation invalid. +swapoff -a && swapon -a +# Make the configuration take effect without restarting. +sysctl -p +# Check memory allocation, expecting swap to be 0 +free -m +``` +2. Set the maximum number of open files to 65535 to avoid the error of "too many open files". + +```Bash +# View current restrictions +ulimit -n +# Temporary changes +ulimit -n 65535 +# Permanent modification +echo "* soft nofile 65535" >> /etc/security/limits.conf +echo "* hard nofile 65535" >> /etc/security/limits.conf +# View after exiting the current terminal session, expect to display 65535 +ulimit -n +``` +## Software Dependence + +Install the Java runtime environment (Java version >= 1.8). Ensure that jdk environment variables are set. (It is recommended to deploy JDK17 for V1.3.2.2 or later. In some scenarios, the performance of JDK of earlier versions is compromised, and Datanodes cannot be stopped.) + +```Bash +# The following is an example of installing in centos7 using JDK-17: +tar -zxvf JDk-17_linux-x64_bin.tar # Decompress the JDK file +Vim ~/.bashrc # Configure the JDK environment +{ export JAVA_HOME=/usr/lib/jvm/jdk-17.0.9 + export PATH=$JAVA_HOME/bin:$PATH +} # Add JDK environment variables +source ~/.bashrc # The configuration takes effect +java -version # Check the JDK environment +``` \ No newline at end of file diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md new file mode 100644 index 00000000..57cad838 --- /dev/null +++ b/src/UserGuide/Master/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md @@ -0,0 +1,42 @@ + +# Obtain TimechoDB +## How to obtain TimechoDB +The enterprise version installation package can be obtained through product trial application or by directly contacting the business personnel who are in contact with you. 
+ +## Installation Package Structure +The directory structure after unpacking the installation package is as follows: +| **catalogue** | **Type** | **Explanation** | +| :--------------: | -------- | ------------------------------------------------------------ | +| activation | folder | The directory where the activation file is located, including the generated machine code and the enterprise version activation code obtained from the business side (this directory will only be generated after starting ConfigNode to obtain the activation code) | +| conf | folder | Configuration file directory, including configuration files such as ConfigNode, DataNode, JMX, and logback | +| data | folder | The default data file directory contains data files for ConfigNode and DataNode. (The directory will only be generated after starting the program) | +| lib | folder | IoTDB executable library file directory | +| licenses | folder | Open source community certificate file directory | +| logs | folder | The default log file directory, which includes log files for ConfigNode and DataNode (this directory will only be generated after starting the program) | +| sbin | folder | Main script directory, including start, stop, and other scripts | +| tools | folder | Directory of System Peripheral Tools | +| ext | folder | Related files for pipe, trigger, and UDF plugins (created by the user when needed) | +| LICENSE | file | certificate | +| NOTICE | file | Tip | +| README_ZH\.md | file | Explanation of the Chinese version in Markdown format | +| README\.md | file | Instructions for use | +| RELEASE_NOTES\.md | file | Version Description | diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md new file mode 100644 index 00000000..4e9a50a1 --- /dev/null +++ b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -0,0 +1,680 @@ + +# Monitoring Panel Deployment + +The IoTDB monitoring panel is one of the supporting tools for the IoTDB Enterprise Edition. It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring indicators, in order to help users monitor the health status of the cluster, and perform cluster optimization and operation. This article will take common 3C3D clusters (3 Confignodes and 3 Datanodes) as examples to introduce how to enable the system monitoring module in an IoTDB instance and use Prometheus+Grafana to visualize the system monitoring indicators. + +## Installation Preparation + +1. Installing IoTDB: You need to first install IoTDB V1.0 or above Enterprise Edition. You can contact business or technical support to obtain +2. Obtain the IoTDB monitoring panel installation package: Based on the enterprise version of IoTDB database monitoring panel, you can contact business or technical support to obtain + +## Installation Steps + +### Step 1: IoTDB enables monitoring indicator collection + +1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). 
+ +| **Configuration** | Located in the configuration file | **Description** | +| :--------------------------------- | :-------------------------------- | :----------------------------------------------------------- | +| cn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | +| cn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | +| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item to maintain the default setting of 9091. If other ports are set, they will not conflict with each other | +| dn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | +| dn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | +| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and set it to 9092 by default. If other ports are set, they will not conflict with each other | + +Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows: + +| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration | +| ----------- | --------- | ------------ | -------------------------------- | ------------------------------------------------------------ | +| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | + +2. Restart all nodes. After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: + +```Bash +./sbin/stop-standalone.sh #Stop confignode and datanode first +./sbin/start-confignode.sh -d #Start confignode +./sbin/start-datanode.sh -d #Start datanode +``` + +3. After restarting, confirm the running status of each node through the client. If the status is Running, it indicates successful configuration: + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8.PNG) + +### Step 2: Install and configure Prometheus + +> Taking Prometheus installed on server 192.168.1.3 as an example. + +1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it(https://prometheus.io/docs/introduction/first_steps/) +2. Unzip the installation package and enter the unzipped folder: + +```Shell +tar xvfz prometheus-*.tar.gz +cd prometheus-* +``` + +3. Modify the configuration. 
Modify the configuration file prometheus.yml as follows + 1. Add configNode task to collect monitoring data for ConfigNode + 2. Add a datanode task to collect monitoring data for DataNodes + +```YAML +global: + scrape_interval: 15s + evaluation_interval: 15s +scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] + honor_labels: true +``` + +4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: + +```Shell +./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d +``` + +5. Confirm successful startup. Enter in browser http://192.168.1.3:9090 Go to Prometheus and click on the Target interface under Status. When you see that all States are Up, it indicates successful configuration and connectivity. + +
+ + +
+ +6. Clicking on the left link in Targets will redirect you to web monitoring and view the monitoring information of the corresponding node: + +![](https://alioss.timecho.com/docs/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) + +### Step 3: Install Grafana and configure the data source + +> Taking Grafana installed on server 192.168.1.3 as an example. + +1. Download the Grafana installation package, which requires installing version 8.4.2 or higher. You can go to the Grafana official website to download it(https://grafana.com/grafana/download) +2. Unzip and enter the corresponding folder + +```Shell +tar -zxvf grafana-*.tar.gz +cd grafana-* +``` + +3. Start Grafana: + +```Shell +./bin/grafana-server web +``` + +4. Log in to Grafana. Enter in browser http://192.168.1.3:3000 (or the modified port), enter Grafana, and the default initial username and password are both admin. + +5. Configure data sources. Find Data sources in Connections, add a new data source, and configure the Data Source to Prometheus + +![](https://alioss.timecho.com/docs/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) + +When configuring the Data Source, pay attention to the URL where Prometheus is located. After configuring it, click on Save&Test and a Data Source is working prompt will appear, indicating successful configuration + +![](https://alioss.timecho.com/docs/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) + +### Step 4: Import IoTDB Grafana Dashboards + +1. Enter Grafana and select Dashboards: + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) + +2. Click the Import button on the right side + + ![](https://alioss.timecho.com/docs/img/Import%E6%8C%89%E9%92%AE.png) + +3. Import Dashboard using upload JSON file + + ![](https://alioss.timecho.com/docs/img/%E5%AF%BC%E5%85%A5Dashboard.png) + +4. Select the JSON file of one of the panels in the IoTDB monitoring panel, using the Apache IoTDB ConfigNode Dashboard as an example (refer to the installation preparation section in this article for the monitoring panel installation package): + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) + +5. Select Prometheus as the data source and click Import + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) + +6. Afterwards, you can see the imported Apache IoTDB ConfigNode Dashboard monitoring panel + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF.png) + +7. Similarly, we can import the Apache IoTDB DataNode Dashboard Apache Performance Overview Dashboard、Apache System Overview Dashboard, You can see the following monitoring panel: + +
+ + + +
+ +8. At this point, all IoTDB monitoring panels have been imported and monitoring information can now be viewed at any time. + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) + +## Appendix, Detailed Explanation of Monitoring Indicators + +### System Dashboard + +This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. + +#### CPU + +- CPU Core:CPU cores +- CPU Load: + - System CPU Load:The average CPU load and busyness of the entire system during the sampling time + - Process CPU Load:The proportion of CPU occupied by the IoTDB process during sampling time +- CPU Time Per Minute:The total CPU time of all processes in the system per minute + +#### Memory + +- System Memory:The current usage of system memory. + - Commited vm size: The size of virtual memory allocated by the operating system to running processes. + - Total physical memory:The total amount of available physical memory in the system. + - Used physical memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. +- System Swap Memory:Swap Space memory usage. +- Process Memory:The usage of memory by the IoTDB process. + - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) + - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. + - Used Memory:The total amount of memory currently used by the IoTDB process. + +#### Disk + +- Disk Space: + - Total disk space:The maximum disk space that IoTDB can use. + - Used disk space:The disk space already used by IoTDB. +- Log Number Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. +- File Count:Number of IoTDB related files + - all:All file quantities + - TsFile:Number of TsFiles + - seq:Number of sequential TsFiles + - unseq:Number of unsequence TsFiles + - wal:Number of WAL files + - cross-temp:Number of cross space merge temp files + - inner-seq-temp:Number of merged temp files in sequential space + - innser-unseq-temp:Number of merged temp files in unsequential space + - mods:Number of tombstone files +- Open File Count:Number of file handles opened by the system +- File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file. +- Disk I/O Busy Rate:Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk. +- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. +- Disk I/O Ops:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. +- Disk I/O Avg Time:Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. 
+- Disk I/O Avg Size:Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. +- Disk I/O Avg Queue Size:Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. +- I/O System Call Rate:The frequency of process calls to read and write system calls, similar to IOPS. +- I/O Throughput:The throughput of process I/O can be divided into two categories: actual-read/write and attemppt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. + +#### JVM + +- GC Time Percentage:The proportion of GC time spent by the node JVM in the past minute's time window +- GC Allocated/Promoted Size Detail: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications +- GC Data Size Detail:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value +- Heap Memory:JVM heap memory usage. + - Maximum heap memory:The maximum available heap memory size for the JVM. + - Committed heap memory:The size of heap memory that has been committed by the JVM. + - Used heap memory:The size of heap memory already used by the JVM. + - PS Eden Space:The size of the PS Young area. + - PS Old Space:The size of the PS Old area. + - PS Survivor Space:The size of the PS survivor area. + - ...(CMS/G1/ZGC, etc) +- Off Heap Memory:Out of heap memory usage. + - direct memory:Out of heap direct memory. + - mapped memory:Out of heap mapped memory. +- GC Number Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC +- GC Time Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC +- GC Number Per Minute Detail:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC +- GC Time Per Minute Detail:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC +- Time Consumed Of Compilation Per Minute:The total time JVM spends compiling per minute +- The Number of Class: + - loaded:The number of classes currently loaded by the JVM + - unloaded:The number of classes uninstalled by the JVM since system startup +- The Number of Java Thread:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. + +#### Network + +Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
+ +- Net Speed:The speed of network card sending and receiving data +- Receive/Transmit Data Size:The size of data packets sent or received by the network card, calculated from system restart +- Packet Speed:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets +- Connection Num:The current number of socket connections for the selected process (IoTDB only has TCP) + +### Performance Overview Dashboard + +#### Cluster Overview + +- Total CPU Core:Total CPU cores of cluster machines +- DataNode CPU Load:CPU usage of each DataNode node in the cluster +- Disk + - Total Disk Space: Total disk size of cluster machines + - DataNode Disk Usage: The disk usage rate of each DataNode in the cluster +- Total Timeseries: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas +- Cluster: Number of ConfigNode and DataNode nodes in the cluster +- Up Time: The duration of cluster startup until now +- Total Write Point Per Second: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas +- Memory + - Total System Memory: Total memory size of cluster machine system + - Total Swap Memory: Total size of cluster machine swap memory + - DataNode Process Memory Usage: Memory usage of each DataNode in the cluster +- Total File Number:Total number of cluster management files +- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage +- Total DataBase: The total number of databases managed by the cluster (including replicas) +- Total DataRegion: The total number of DataRegions managed by the cluster +- Total SchemaRegion: The total number of SchemeRegions managed by the cluster + +#### Node Overview + +- CPU Core: The number of CPU cores in the machine where the node is located +- Disk Space: The disk size of the machine where the node is located +- Timeseries: Number of time series managed by the machine where the node is located (including replicas) +- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio +- Write Point Per Second: The write speed per second of the machine where the node is located (including replicas) +- System Memory: The system memory size of the machine where the node is located +- Swap Memory:The swap memory size of the machine where the node is located +- File Number: Number of files managed by nodes + +#### Performance + +- Session Idle Time:The total idle time and total busy time of the session connection of the node +- Client Connection: The client connection status of the node, including the total number of connections and the number of active connections +- Time Consumed Of Operation: The time consumption of various types of node operations, including average and P99 +- Average Time Consumed Of Interface: The average time consumption of each thrust interface of a node +- P99 Time Consumed Of Interface: P99 time consumption of various thrust interfaces of nodes +- Task Number: The number of system tasks for each node +- Average Time Consumed of Task: The average time spent on various system tasks of a node +- P99 Time Consumed of Task: P99 time consumption for various system tasks of nodes +- Operation Per 
Second: The number of operations per second for a node +- Mainstream Process + - Operation Per Second Of Stage: The number of operations per second for each stage of the node's main process + - Average Time Consumed Of Stage: The average time consumption of each stage in the main process of a node + - P99 Time Consumed Of Stage: P99 time consumption for each stage of the node's main process +- Schedule Stage + - OPS Of Schedule: The number of operations per second in each sub stage of the node schedule stage + - Average Time Consumed Of Schedule Stage:The average time consumption of each sub stage in the node schedule stage + - P99 Time Consumed Of Schedule Stage: P99 time consumption for each sub stage of the schedule stage of the node +- Local Schedule Sub Stages + - OPS Of Local Schedule Stage: The number of operations per second in each sub stage of the local schedule node + - Average Time Consumed Of Local Schedule Stage: The average time consumption of each sub stage in the local schedule stage of the node + - P99 Time Consumed Of Local Schedule Stage: P99 time consumption for each sub stage of the local schedule stage of the node +- Storage Stage + - OPS Of Storage Stage: The number of operations per second in each sub stage of the node storage stage + - Average Time Consumed Of Storage Stage: Average time consumption of each sub stage in the node storage stage + - P99 Time Consumed Of Storage Stage: P99 time consumption for each sub stage of node storage stage +- Engine Stage + - OPS Of Engine Stage: The number of operations per second in each sub stage of the node engine stage + - Average Time Consumed Of Engine Stage: The average time consumption of each sub stage in the engine stage of a node + - P99 Time Consumed Of Engine Stage: P99 time consumption of each sub stage in the node engine stage + +#### System + +- CPU Load: CPU load of nodes +- CPU Time Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores +- GC Time Per Minute:The average GC time per minute for nodes, including YGC and FGC +- Heap Memory: Node's heap memory usage +- Off Heap Memory: Non heap memory usage of nodes +- The Number Of Java Thread: Number of Java threads on nodes +- File Count:Number of files managed by nodes +- File Size: Node management file size situation +- Log Number Per Minute: Different types of logs per minute for nodes + +### ConfigNode Dashboard + +This panel displays the performance of all management nodes in the cluster, including partitioning, node information, and client connection statistics. 
+ +#### Node Overview + +- Database Count: Number of databases for nodes +- Region + - DataRegion Count:Number of DataRegions for nodes + - DataRegion Current Status: The state of the DataRegion of the node + - SchemaRegion Count: Number of SchemeRegions for nodes + - SchemaRegion Current Status: The state of the SchemeRegion of the node +- System Memory: The system memory size of the node +- Swap Memory: Node's swap memory size +- ConfigNodes: The running status of the ConfigNode in the cluster where the node is located +- DataNodes:The DataNode situation of the cluster where the node is located +- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load + +#### NodeInfo + +- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode +- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located +- DataNode Status: The status of the DataNode node in the cluster where the node is located +- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located +- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located +- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located +- DataRegionGroup Leader Distribution:The distribution of leaders in the DataRegionGroup of the cluster where the node is located + +#### Protocol + +- Client Count + - Active Client Num: The number of active clients in each thread pool of a node + - Idle Client Num: The number of idle clients in each thread pool of a node + - Borrowed Client Count: Number of borrowed clients in each thread pool of the node + - Created Client Count: Number of created clients for each thread pool of the node + - Destroyed Client Count: The number of destroyed clients in each thread pool of the node +- Client time situation + - Client Mean Active Time: The average active time of clients in each thread pool of a node + - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node + - Client Mean Idle Time: The average idle time of clients in each thread pool of a node + +#### Partition Table + +- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located +- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located +- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located +- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located +- DataRegion Status: The DataRegion status of the cluster where the node is located +- SchemaRegion Status: The status of the SchemeRegion of the cluster where the node is located + +#### Consensus + +- Ratis Stage Time: The time consumption of each stage of the node's Ratis +- Write Log Entry: The time required to write a log for the Ratis of a node +- Remote / Local Write Time: The time consumption of remote and local writes for the Ratis of nodes +- Remote / Local Write QPS: Remote and local QPS written to node Ratis +- RatisConsensus Memory: Memory usage of Node Ratis consensus protocol + +### DataNode Dashboard + +This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc. 
+ +#### Node Overview + +- The Number Of Entity: Entity situation of node management +- Write Point Per Second: The write speed per second of the node +- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. + +#### Protocol + +- Node Operation Time Consumption + - The Time Consumed Of Operation (avg): The average time spent on various operations of a node + - The Time Consumed Of Operation (50%): The median time spent on various operations of a node + - The Time Consumed Of Operation (99%): P99 time consumption for various operations of nodes +- Thrift Statistics + - The QPS Of Interface: QPS of various Thrift interfaces of nodes + - The Avg Time Consumed Of Interface: The average time consumption of each Thrift interface of a node + - Thrift Connection: The number of Thrfit connections of each type of node + - Thrift Active Thread: The number of active Thrift connections for each type of node +- Client Statistics + - Active Client Num: The number of active clients in each thread pool of a node + - Idle Client Num: The number of idle clients in each thread pool of a node + - Borrowed Client Count:Number of borrowed clients for each thread pool of a node + - Created Client Count: Number of created clients for each thread pool of the node + - Destroyed Client Count: The number of destroyed clients in each thread pool of the node + - Client Mean Active Time: The average active time of clients in each thread pool of a node + - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node + - Client Mean Idle Time: The average idle time of clients in each thread pool of a node + +#### Storage Engine + +- File Count: Number of files of various types managed by nodes +- File Size: Node management of various types of file sizes +- TsFile + - TsFile Total Size In Each Level: The total size of TsFile files at each level of node management + - TsFile Count In Each Level: Number of TsFile files at each level of node management + - Avg TsFile Size In Each Level: The average size of TsFile files at each level of node management +- Task Number: Number of Tasks for Nodes +- The Time Consumed of Task: The time consumption of tasks for nodes +- Compaction + - Compaction Read And Write Per Second: The merge read and write speed of nodes per second + - Compaction Number Per Minute: The number of merged nodes per minute + - Compaction Process Chunk Status: The number of Chunks in different states merged by nodes + - Compacted Point Num Per Minute: The number of merged nodes per minute + +#### Write Performance + +- Write Cost(avg): Average node write time, including writing wal and memtable +- Write Cost(50%): Median node write time, including writing wal and memtable +- Write Cost(99%): P99 for node write time, including writing wal and memtable +- WAL + - WAL File Size: Total size of WAL files managed by nodes + - WAL File Num:Number of WAL files managed by nodes + - WAL Nodes Num: Number of WAL nodes managed by nodes + - Make Checkpoint Costs: The time required to create various types of CheckPoints for nodes + - WAL Serialize Total Cost: Total time spent on node WAL serialization + - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster + - Serialize One WAL Info Entry Cost: Node serialization 
time for a WAL Info Entry + - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot + - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush + - Effective Info Ratio Of WALNode: The effective information ratio of different WALNodes of nodes + - WAL Buffer + - WAL Buffer Cost: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options + - WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node + - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node +- Flush Statistics + - Flush MemTable Cost(avg): The total time spent on node Flush and the average time spent on each sub stage + - Flush MemTable Cost(50%): The total time spent on node Flush and the median time spent on each sub stage + - Flush MemTable Cost(99%): The total time spent on node Flush and the P99 time spent on each sub stage + - Flush Sub Task Cost(avg): The average time consumption of each node's Flush subtask, including sorting, encoding, and IO stages + - Flush Sub Task Cost(50%): The median time consumption of each subtask of the Flush node, including sorting, encoding, and IO stages + - Flush Sub Task Cost(99%): The average subtask time P99 for Flush of nodes, including sorting, encoding, and IO stages +- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node +- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes +- Tsfile Compression Ratio Of Flushing MemTable: The compression rate of TsFile corresponding to node flashing Memtable +- Flush TsFile Size Of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions +- Size Of Flushing MemTable: The size of the Memtable for node disk flushing +- Points Num Of Flushing MemTable: The number of points when flashing data in different DataRegions of a node +- Series Num Of Flushing MemTable: The number of time series when flashing Memtables in different DataRegions of a node +- Average Point Num Of Flushing MemChunk: The average number of disk flushing points for node MemChunk + +#### Schema Engine + +- Schema Engine Mode: The metadata engine pattern of nodes +- Schema Consensus Protocol: Node metadata consensus protocol +- Schema Region Number:Number of SchemeRegions managed by nodes +- Schema Region Memory Overview: The amount of memory in the SchemeRegion of a node +- Memory Usgae per SchemaRegion:The average memory usage size of node SchemaRegion +- Cache MNode per SchemaRegion: The number of cache nodes in each SchemeRegion of a node +- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemeRegion of the node (valid only for SimpleConsense) +- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemeRegion of a node +- Activated Template Count per SchemaRegion: The number of activated templates in each SchemeRegion of a node +- Time Series statistics + - Timeseries Count per SchemaRegion: The average number of time series for node SchemaRegion + - Series Type: Number of time series of different types of nodes + - Time Series Number: The total number of time series nodes + - Template Series Number: The total number of template time series for nodes + - Template Series Count per SchemaRegion: The number of sequences created through templates in each SchemeRegion of a node +- IMNode Statistics + - Pinned MNode per SchemaRegion: Number of IMNode nodes with Pinned nodes in 
each SchemeRegion + - Pinned Memory per SchemaRegion: The memory usage size of the IMNode node for Pinned nodes in each SchemeRegion of the node + - Unpinned MNode per SchemaRegion: The number of unpinned IMNode nodes in each SchemeRegion of a node + - Unpinned Memory per SchemaRegion: Memory usage size of unpinned IMNode nodes in each SchemeRegion of the node + - Schema File Memory MNode Number: Number of IMNode nodes with global pinned and unpinned nodes + - Release and Flush MNode Rate: The number of IMNodes that release and flush nodes per second +- Cache Hit Rate: Cache hit rate of nodes +- Release and Flush Thread Number: The current number of active Release and Flush threads on the node +- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing +- Time Consumed of Relead and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing + +#### Query Engine + +- Time Consumption In Each Stage + - The time consumed of query plan stages(avg): The average time spent on node queries at each stage + - The time consumed of query plan stages(50%): Median time spent on node queries at each stage + - The time consumed of query plan stages(99%): P99 time consumption for node query at each stage +- Execution Plan Distribution Time + - The time consumed of plan dispatch stages(avg): The average time spent on node query execution plan distribution + - The time consumed of plan dispatch stages(50%): Median time spent on node query execution plan distribution + - The time consumed of plan dispatch stages(99%): P99 of node query execution plan distribution time +- Execution Plan Execution Time + - The time consumed of query execution stages(avg): The average execution time of node query execution plan + - The time consumed of query execution stages(50%):Median execution time of node query execution plan + - The time consumed of query execution stages(99%): P99 of node query execution plan execution time +- Operator Execution Time + - The time consumed of operator execution stages(avg): The average execution time of node query operators + - The time consumed of operator execution(50%): Median execution time of node query operator + - The time consumed of operator execution(99%): P99 of node query operator execution time +- Aggregation Query Computation Time + - The time consumed of query aggregation(avg): The average computation time for node aggregation queries + - The time consumed of query aggregation(50%): Median computation time for node aggregation queries + - The time consumed of query aggregation(99%): P99 of node aggregation query computation time +- File/Memory Interface Time Consumption + - The time consumed of query scan(avg): The average time spent querying file/memory interfaces for nodes + - The time consumed of query scan(50%): Median time spent querying file/memory interfaces for nodes + - The time consumed of query scan(99%): P99 time consumption for node query file/memory interface +- Number Of Resource Visits + - The usage of query resource(avg): The average number of resource visits for node queries + - The usage of query resource(50%): Median number of resource visits for node queries + - The usage of query resource(99%): P99 for node query resource access quantity +- Data Transmission Time + - The time consumed of query data exchange(avg): The average time spent on node query data transmission + - The time consumed of query data exchange(50%): Median query data transmission time for nodes + - 
The time consumed of query data exchange(99%): P99 for node query data transmission time +- Number Of Data Transfers + - The count of Data Exchange(avg): The average number of data transfers queried by nodes + - The count of Data Exchange: The quantile of the number of data transfers queried by nodes, including the median and P99 +- Task Scheduling Quantity And Time Consumption + - The number of query queue: Node query task scheduling quantity + - The time consumed of query schedule time(avg): The average time spent on scheduling node query tasks + - The time consumed of query schedule time(50%): Median time spent on node query task scheduling + - The time consumed of query schedule time(99%): P99 of node query task scheduling time + +#### Query Interface + +- Load Time Series Metadata + - The time consumed of load timeseries metadata(avg): The average time taken for node queries to load time series metadata + - The time consumed of load timeseries metadata(50%): Median time spent on loading time series metadata for node queries + - The time consumed of load timeseries metadata(99%): P99 time consumption for node query loading time series metadata +- Read Time Series + - The time consumed of read timeseries metadata(avg): The average time taken for node queries to read time series + - The time consumed of read timeseries metadata(50%): The median time taken for node queries to read time series + - The time consumed of read timeseries metadata(99%): P99 time consumption for node query reading time series +- Modify Time Series Metadata + - The time consumed of timeseries metadata modification(avg):The average time taken for node queries to modify time series metadata + - The time consumed of timeseries metadata modification(50%): Median time spent on querying and modifying time series metadata for nodes + - The time consumed of timeseries metadata modification(99%): P99 time consumption for node query and modification of time series metadata +- Load Chunk Metadata List + - The time consumed of load chunk metadata list(avg): The average time it takes for node queries to load Chunk metadata lists + - The time consumed of load chunk metadata list(50%): Median time spent on node query loading Chunk metadata list + - The time consumed of load chunk metadata list(99%): P99 time consumption for node query loading Chunk metadata list +- Modify Chunk Metadata + - The time consumed of chunk metadata modification(avg): The average time it takes for node queries to modify Chunk metadata + - The time consumed of chunk metadata modification(50%): The total number of bits spent on modifying Chunk metadata for node queries + - The time consumed of chunk metadata modification(99%): P99 time consumption for node query and modification of Chunk metadata +- Filter According To Chunk Metadata + - The time consumed of chunk metadata filter(avg): The average time spent on node queries filtering by Chunk metadata + - The time consumed of chunk metadata filter(50%): Median filtering time for node queries based on Chunk metadata + - The time consumed of chunk metadata filter(99%): P99 time consumption for node query filtering based on Chunk metadata +- Constructing Chunk Reader + - The time consumed of construct chunk reader(avg): The average time spent on constructing Chunk Reader for node queries + - The time consumed of construct chunk reader(50%): Median time spent on constructing Chunk Reader for node queries + - The time consumed of construct chunk reader(99%): P99 time consumption for constructing Chunk Reader 
for node queries +- Read Chunk + - The time consumed of read chunk(avg): The average time taken for node queries to read Chunks + - The time consumed of read chunk(50%): Median time spent querying nodes to read Chunks + - The time consumed of read chunk(99%): P99 time spent on querying and reading Chunks for nodes +- Initialize Chunk Reader + - The time consumed of init chunk reader(avg): The average time spent initializing Chunk Reader for node queries + - The time consumed of init chunk reader(50%): Median time spent initializing Chunk Reader for node queries + - The time consumed of init chunk reader(99%):P99 time spent initializing Chunk Reader for node queries +- Constructing TsBlock Through Page Reader + - The time consumed of build tsblock from page reader(avg): The average time it takes for node queries to construct TsBlock through Page Reader + - The time consumed of build tsblock from page reader(50%): The median time spent on constructing TsBlock through Page Reader for node queries + - The time consumed of build tsblock from page reader(99%):Node query using Page Reader to construct TsBlock time-consuming P99 +- Query the construction of TsBlock through Merge Reader + - The time consumed of build tsblock from merge reader(avg): The average time taken for node queries to construct TsBlock through Merge Reader + - The time consumed of build tsblock from merge reader(50%): The median time spent on constructing TsBlock through Merge Reader for node queries + - The time consumed of build tsblock from merge reader(99%): Node query using Merge Reader to construct TsBlock time-consuming P99 + +#### Query Data Exchange + +The data exchange for the query is time-consuming. + +- Obtain TsBlock through source handle + - The time consumed of source handle get tsblock(avg): The average time taken for node queries to obtain TsBlock through source handle + - The time consumed of source handle get tsblock(50%):Node query obtains the median time spent on TsBlock through source handle + - The time consumed of source handle get tsblock(99%): Node query obtains TsBlock time P99 through source handle +- Deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(avg): The average time taken for node queries to deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(50%): The median time taken for node queries to deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(99%): P99 time spent on deserializing TsBlock through source handle for node query +- Send TsBlock through sink handle + - The time consumed of sink handle send tsblock(avg): The average time taken for node queries to send TsBlock through sink handle + - The time consumed of sink handle send tsblock(50%): Node query median time spent sending TsBlock through sink handle + - The time consumed of sink handle send tsblock(99%): Node query sends TsBlock through sink handle with a time consumption of P99 +- Callback data block event + - The time consumed of on acknowledge data block event task(avg): The average time taken for node query callback data block event + - The time consumed of on acknowledge data block event task(50%): Median time spent on node query callback data block event + - The time consumed of on acknowledge data block event task(99%): P99 time consumption for node query callback data block event +- Get Data Block Tasks + - The time consumed of get data block task(avg): The average time taken for 
node queries to obtain data block tasks + - The time consumed of get data block task(50%): The median time taken for node queries to obtain data block tasks + - The time consumed of get data block task(99%): P99 time consumption for node query to obtain data block task + +#### Query Related Resource + +- MppDataExchangeManager:The number of shuffle sink handles and source handles during node queries +- LocalExecutionPlanner: The remaining memory that nodes can allocate to query shards +- FragmentInstanceManager: The query sharding context information and the number of query shards that the node is running +- Coordinator: The number of queries recorded on the node +- MemoryPool Size: Node query related memory pool situation +- MemoryPool Capacity: The size of memory pools related to node queries, including maximum and remaining available values +- DriverScheduler: Number of queue tasks related to node queries + +#### Consensus - IoT Consensus + +- Memory Usage + - IoTConsensus Used Memory: The memory usage of IoT Consumes for nodes, including total memory usage, queue usage, and synchronization usage +- Synchronization Status Between Nodes + - IoTConsensus Sync Index: SyncIndex size for different DataRegions of IoT Consumption nodes + - IoTConsensus Overview:The total synchronization gap and cached request count of IoT consumption for nodes + - IoTConsensus Search Index Rate: The growth rate of writing SearchIndex for different DataRegions of IoT Consumer nodes + - IoTConsensus Safe Index Rate: The growth rate of synchronous SafeIndex for different DataRegions of IoT Consumer nodes + - IoTConsensus LogDispatcher Request Size: The request size for node IoT Consusus to synchronize different DataRegions to other nodes + - Sync Lag: The size of synchronization gap between different DataRegions in IoT Consumption node + - Min Peer Sync Lag: The minimum synchronization gap between different DataRegions and different replicas of node IoT Consumption + - Sync Speed Diff Of Peers: The maximum difference in synchronization from different DataRegions to different replicas for node IoT Consumption + - IoTConsensus LogEntriesFromWAL Rate: The rate at which nodes IoT Consumus obtain logs from WAL for different DataRegions + - IoTConsensus LogEntriesFromQueue Rate: The rate at which nodes IoT Consumes different DataRegions retrieve logs from the queue +- Different Execution Stages Take Time + - The Time Consumed Of Different Stages (avg): The average time spent on different execution stages of node IoT Consumus + - The Time Consumed Of Different Stages (50%): The median time spent on different execution stages of node IoT Consusus + - The Time Consumed Of Different Stages (99%):P99 of the time consumption for different execution stages of node IoT Consusus + +#### Consensus - DataRegion Ratis Consensus + +- Ratis Stage Time: The time consumption of different stages of node Ratis +- Write Log Entry: The time consumption of writing logs at different stages of node Ratis +- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely +- Remote / Local Write QPS: QPS written by node Ratis locally or remotely +- RatisConsensus Memory:Memory usage of node Ratis + +#### Consensus - SchemaRegion Ratis Consensus + +- Ratis Stage Time: The time consumption of different stages of node Ratis +- Write Log Entry: The time consumption for writing logs at each stage of node Ratis +- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotelyThe time it takes for 
node Ratis to write locally or remotely +- Remote / Local Write QPS: QPS written by node Ratis locally or remotely +- RatisConsensus Memory: Node Ratis Memory Usage \ No newline at end of file diff --git a/src/UserGuide/Master/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md new file mode 100644 index 00000000..97164426 --- /dev/null +++ b/src/UserGuide/Master/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md @@ -0,0 +1,266 @@ + +# Stand-Alone Deployment + +This chapter will introduce how to start an IoTDB standalone instance, which includes 1 ConfigNode and 1 DataNode (commonly known as 1C1D). + +## Note + +1. Before installation, ensure that the system is complete by referring to [System Requirements](./Environment-Requirements.md). + + 2. It is recommended to prioritize using 'hostname' for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure/etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure IoTDB's' cn_internal-address' using the host name dn_internal_address、dn_rpc_address。 + + ```shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + + 3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings. + + 4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. + + 5. Please note that when installing and deploying IoTDB (including activating and using software), it is necessary to use the same user for operations. You can: + + - Using root user (recommended): Using root user can avoid issues such as permissions. + - Using a fixed non root user: + - Using the same user operation: Ensure that the same user is used for start, activation, stop, and other operations, and do not switch users. + - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. + + 6. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of database operation status at any time. The monitoring panel can be obtained by contacting the business department, and the steps for deploying the monitoring panel can be referred to:[Monitoring Board Install and Deploy](./Monitoring-panel-deployment.md). 
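+
+To make note 2 above concrete: once the host name mapping is in place, the host name (rather than the raw IP) can be used for the address parameters. Below is a minimal sketch of the relevant entries in `conf/iotdb-system.properties`, assuming the host name `iotdb-1` from the example above; the keys already exist in the file and only their values need to be edited.
+
+```Properties
+# Sketch only — adjust to your own host name
+cn_internal_address=iotdb-1
+dn_internal_address=iotdb-1
+dn_rpc_address=iotdb-1
+```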
+ +## Installation Steps + +### 1、Unzip the installation package and enter the installation directory + +```Plain +unzip timechodb-{version}-bin.zip +cd timechodb-{version}-bin +``` + +### 2、Parameter Configuration + +#### Memory Configuration + +- conf/confignode-env.sh(or .bat) + + | **Configuration** | **Description** | **Default** | **Recommended value** | Note | + | :---------------: | :----------------------------------------------------------: | :---------: | :----------------------------------------------------------: | :---------------------------------: | + | MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +- conf/datanode-env.sh(or .bat) + + | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | + | :---------------: | :----------------------------------------------------------: | :---------: | :----------------------------------------------------------: | :---------------------------------: | + | MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +#### Function Configuration + +The parameters that actually take effect in the system are in the file conf/iotdb-system.exe. To start, the following parameters need to be set, which can be viewed in the conf/iotdb-system.exe file for all parameters + +Cluster function configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | Note | +| :-----------------------: | :----------------------------------------------------------: | :------------: | :----------------------------------------------------------: | :---------------------------------------------------: | +| cluster_name | Cluster Name | defaultCluster | The cluster name can be set as needed, and if there are no special needs, the default can be kept | Cannot be modified after initial startup | +| schema_replication_factor | Number of metadata replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | +| data_replication_factor | Number of data replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | + +ConfigNode Configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | Note | +| :-----------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------------------: | :--------------------------------------: | +| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | +| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | Cannot be modified after initial startup | +| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | Cannot be modified after initial startup | +| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 
cn_internal_address:cn_internal_port | Cannot be modified after initial startup | + +DataNode Configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | +| :------------------------------ | :----------------------------------------------------------- | :-------------- | :----------------------------------------------------------- | :--------------------------------------- | +| dn_rpc_address | The address of the client RPC service | 0.0.0.0 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Restarting the service takes effect | +| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | Restarting the service takes effect | +| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | +| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | Cannot be modified after initial startup | +| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | Cannot be modified after initial startup | +| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | Cannot be modified after initial startup | +| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | Cannot be modified after initial startup | +| dn_seed_config_node | The ConfigNode address that the node connects to when registering to join the cluster, i.e. cn_internal-address: cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | + +### 3、Start ConfigNode + +Enter the sbin directory of iotdb and start confignode + +```shell + +./start-confignode.sh -d #The "- d" parameter will start in the background + +``` + +If the startup fails, please refer to [Common Problem](#common-problem). 
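+
+Before continuing, it is worth confirming that the ConfigNode process actually came up. A quick check (a sketch using commands already shown in this guide; the startup log under `logs/` can also be inspected for exceptions):
+
+```Bash
+# A ConfigNode JVM process should be listed
+jps
+# or
+ps -ef | grep -i confignode | grep -v grep
+```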
+ +### 4、Start DataNode + + Enter the sbin directory of iotdb and start datanode: + +```shell + +cd sbin + +./start-datanode.sh -d # The "- d" parameter will start in the background + +``` + +### 5、Activate Database + +#### Method 1: Activate file copy activation + +- After starting the confignode datanode node, enter the activation folder and copy the systeminfo file to the Timecho staff + +- Received the license file returned by the staff + +- Place the license file in the activation folder of the corresponding node; + +#### Method 2: Activate Script Activation + +- Retrieve the machine codes of 3 machines in sequence and enter IoTDB CLI + + - Table Model CLI Enter Command: + + ```SQL + # Linux or MACOS + ./start-cli.sh -sql_dialect table + + # windows + ./start-cli.bat -sql_dialect table + ``` + + - Enter the tree model CLI command: + + ```SQL + # Linux or MACOS + ./start-cli.sh + + # windows + ./start-cli.bat + ``` + +- Execute the following to obtain the machine code required for activation: + - : Currently, activation is only supported in tree models + + ```Bash + + show system info + + ``` + +- The following information is displayed, which shows the machine code of one machine: + +```Bash ++--------------------------------------------------------------+ +| SystemInfo| ++--------------------------------------------------------------+ +| 01-TE5NLES4-UDDWCMYE| ++--------------------------------------------------------------+ +Total line number = 1 +It costs 0.030s +``` + +- Enter the activation code returned by the staff into the CLI and enter the following content + - Note: The activation code needs to be marked with a `'`symbol before and after, as shown in + +```Bash +IoTDB> activate '01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===' +``` + +### 6、Verify Activation + +When the "ClusterActivation Status" field is displayed as Activated, it indicates successful activation + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81.png) + +## Common Problem + +1. Multiple prompts indicating activation failure during deployment process + +​ - Use the `ls -al` command: Use the `ls -al` command to check if the owner information of the installation package root directory is the current user. + +​ - Check activation directory: Check all files in the `./activation` directory and whether the owner information is the current user. + +2. Confignode failed to start + + Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. + + Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. + + Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. + + Step 4: Clean up the environment: + + a. Terminate all ConfigNode Node and DataNode processes. + + ```Bash + # 1. Stop the ConfigNode and DataNode services + sbin/stop-standalone.sh + + # 2. Check for any remaining processes + jps + # Or + ps -ef|gerp iotdb + + # 3. 
If there are any remaining processes, manually kill the + kill -9 + # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + + b. Delete the data and logs directories. + + Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. + + ```Bash + cd /data/iotdb + rm -rf data logs + ``` + +## Appendix + +### Introduction to Configuration Node Parameters + +| Parameter | Description | Is it required | +| :-------- | :---------------------------------------------- | :----------------- | +| -d | Start in daemon mode, running in the background | No | + +### Introduction to Datanode Node Parameters + +| Abbreviation | Description | Is it required | +| :----------- | :----------------------------------------------------------- | :------------- | +| -v | Show version information | No | +| -f | Run the script in the foreground, do not put it in the background | No | +| -d | Start in daemon mode, i.e. run in the background | No | +| -p | Specify a file to store the process ID for process management | No | +| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | +| -g | Print detailed garbage collection (GC) information | No | +| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | +| -E | Specify the path of the JVM error log file | No | +| -D | Define system properties, in the format key=value | No | +| -X | Pass -XX parameters directly to the JVM | No | +| -h | Help instruction | No | + diff --git a/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md b/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md index 539d03b0..a1b54472 100644 --- a/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md +++ b/src/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md @@ -150,7 +150,7 @@ systemctl start sshd # Enable port 22 ### Other Configuration -1. Disable the system swap memory +1. Reduce the system swap priority to the lowest level ```Bash echo "vm.swappiness = 0">> /etc/sysctl.conf @@ -159,7 +159,7 @@ echo "vm.swappiness = 0">> /etc/sysctl.conf swapoff -a && swapon -a # Make the configuration take effect without restarting. sysctl -p -# Check memory allocation, expecting swap to be 0 +# Swap's used memory has become 0 free -m ``` 2. Set the maximum number of open files to 65535 to avoid the error of "too many open files". diff --git a/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md new file mode 100644 index 00000000..2b4e70e0 --- /dev/null +++ b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md @@ -0,0 +1,400 @@ + +# Cluster Deployment + +This section describes how to manually deploy an instance that includes 3 ConfigNodes and 3 DataNodes, commonly known as a 3C3D cluster. + +
+ +
+ + +## Note + +1. Before installation, ensure that the system is complete by referring to [System Requirements](./Environment-Requirements.md) + +2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure the `cn_internal_address` and `dn_internal_address` of IoTDB using the host name. + + ``` shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings. + +4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. + +5. Please note that when installing and deploying IoTDB (including activating and using software), it is necessary to use the same user for operations. You can: + +- Using root user (recommended): Using root user can avoid issues such as permissions. +- Using a fixed non root user: + - Using the same user operation: Ensure that the same user is used for start, activation, stop, and other operations, and do not switch users. + - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. + +6. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of database operation status at any time. The monitoring panel can be obtained by contacting the business department,The steps for deploying a monitoring panel can refer to:[Monitoring Panel Deployment](./Monitoring-panel-deployment.md) + +## Preparation Steps + +1. Prepare the IoTDB database installation package: timechodb-{version}-bin.zip(The installation package can be obtained from:[IoTDB-Package](./IoTDB-Package_timecho.md)) +2. Configure the operating system environment according to environmental requirements(The system environment configuration can be found in:[Environment Requirement](./Environment-Requirements.md)) + +## Installation Steps + +Assuming there are three Linux servers now, the IP addresses and service roles are assigned as follows: + +| Node IP | Host Name | Service | +| ------------- | --------- | -------------------- | +| 11.101.17.224 | iotdb-1 | ConfigNode、DataNode | +| 11.101.17.225 | iotdb-2 | ConfigNode、DataNode | +| 11.101.17.226 | iotdb-3 | ConfigNode、DataNode | + +### Set Host Name + +On three machines, configure the host names separately. To set the host names, configure `/etc/hosts` on the target server. 
Use the following command: + +```Bash +echo "11.101.17.224 iotdb-1" >> /etc/hosts +echo "11.101.17.225 iotdb-2" >> /etc/hosts +echo "11.101.17.226 iotdb-3" >> /etc/hosts +``` + +### Configuration + +Unzip the installation package and enter the installation directory + +```Plain +unzip timechodb-{version}-bin.zip +cd timechodb-{version}-bin +``` + +#### Environment script configuration + +- `./conf/confignode-env.sh` configuration + + | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | + | :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | + | MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +- `./conf/datanode-env.sh` configuration + + | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | + | :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | + | MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +#### General Configuration(./conf/iotdb-system.properties) + +- Cluster Configuration + + | **Configuration** | **Description** | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | + | ------------------------- | ------------------------------------------------------------ | -------------- | -------------- | -------------- | + | cluster_name | Cluster Name | defaultCluster | defaultCluster | defaultCluster | + | schema_replication_factor | The number of metadata replicas, the number of DataNodes should not be less than this number | 3 | 3 | 3 | + | data_replication_factor | The number of data replicas should not be less than this number of DataNodes | 2 | 2 | 2 | + +#### ConfigNode Configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | Note | +| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | +| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | +| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | 10710 | 10710 | 10710 | Cannot be modified after initial startup | +| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | 10720 | 10720 | 10720 | Cannot be modified after initial startup | +| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, `cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's `cn_internal-address: cn_internal_port` | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be 
modified after initial startup | + +#### Datanode Configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | Note | +| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | +| dn_rpc_address | The address of the client RPC service | 127.0.0.1 | Recommend using the **IPV4 address or hostname** of the server where it is located | iotdb-1 | iotdb-2 | iotdb-3 | Restarting the service takes effect | +| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | 6667 | 6667 | 6667 | Restarting the service takes effect | +| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | +| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | 10730 | 10730 | 10730 | Cannot be modified after initial startup | +| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | 10740 | 10740 | 10740 | Cannot be modified after initial startup | +| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | 10750 | 10750 | 10750 | Cannot be modified after initial startup | +| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | 10760 | 10760 | 10760 | Cannot be modified after initial startup | +| dn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, i.e. `cn_internal-address: cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's cn_internal-address: cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | + +> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. Please ensure that the modified files are saved persistently, otherwise the configuration items will not take effect + +### Start ConfigNode + +Start the first confignode of IoTDB-1 first, ensuring that the seed confignode node starts first, and then start the second and third confignode nodes in sequence + +```Bash +cd sbin + +./start-confignode.sh -d #"- d" parameter will start in the background +``` + +If the startup fails, please refer to [Common Questions](#common-questions). 
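+
+If the nodes are administered from a single workstation, the remaining ConfigNodes can also be started remotely. A sketch, assuming password-free SSH access and that IoTDB is installed under `/data/iotdb` on every node (adjust the path to your environment):
+
+```Bash
+for host in iotdb-2 iotdb-3; do
+  ssh "$host" "cd /data/iotdb/sbin && ./start-confignode.sh -d"
+done
+```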
+ +### Start DataNode + + Enter the `sbin` directory of iotdb and start three datanode nodes in sequence: + +```Go +cd sbin + +./start-datanode.sh -d #"- d" parameter will start in the background +``` + +### Activate Database + +#### Method 1: Activate file copy activation + +- After starting three Confignode Datanode nodes in sequence, copy the `activation` folder of each machine and the `system_info` file of each machine to the Timecho staff; + +- The staff will return the license files for each ConfigNode Datanode node, where 3 license files will be returned; + +- Put the three license files into the `activation` folder of the corresponding ConfigNode node; + +#### Method 2: Activate Script Activation + +- Retrieve the machine codes of 3 machines in sequence and enter IoTDB CLI + + - Table Model CLI Enter Command: + + ```SQL + # Linux or MACOS + ./start-cli.sh -sql_dialect table + + # windows + ./start-cli.bat -sql_dialect table + ``` + + - Enter the tree model CLI command: + + ```SQL + # Linux or MACOS + ./start-cli.sh + + # windows + ./start-cli.bat + ``` + + - Execute the following to obtain the machine code required for activation: + - Note: Currently, activation is only supported in tree models + + + ```Bash + show system info + ``` + + - The following information is displayed, which shows the machine code of one machine: + + ```Bash + +--------------------------------------------------------------+ + | SystemInfo| + +--------------------------------------------------------------+ + |01-TE5NLES4-UDDWCMYE,01-GG5NLES4-XXDWCMYE,01-FF5NLES4-WWWWCMYE| + +--------------------------------------------------------------+ + Total line number = 1 + It costs 0.030s + ``` + +- The other two nodes enter the CLI of the IoTDB tree model in sequence, execute the statement, and copy the machine codes of the three machines obtained to the Timecho staff + +- The staff will return three activation codes, which normally correspond to the order of the three machine codes provided. 
Please paste each activation code into the CLI separately, as prompted below: + + - Note: The activation code needs to be marked with a `'`symbol before and after, as shown in + + ```Bash + IoTDB> activate '01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===,01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===,01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===' + ``` + +### Verify Activation + +When the status of the 'Result' field is displayed as' success', it indicates successful activation + +![](https://alioss.timecho.com/docs/img/%E9%9B%86%E7%BE%A4-%E9%AA%8C%E8%AF%81.png) + +## Node Maintenance Steps + +### ConfigNode Node Maintenance + +ConfigNode node maintenance is divided into two types of operations: adding and removing ConfigNodes, with two common use cases: + +- Cluster expansion: For example, when there is only one ConfigNode in the cluster, and you want to increase the high availability of ConfigNode nodes, you can add two ConfigNodes, making a total of three ConfigNodes in the cluster. + +- Cluster failure recovery: When the machine where a ConfigNode is located fails, making the ConfigNode unable to run normally, you can remove this ConfigNode and then add a new ConfigNode to the cluster. + +> ❗️Note, after completing ConfigNode node maintenance, you need to ensure that there are 1 or 3 ConfigNodes running normally in the cluster. Two ConfigNodes do not have high availability, and more than three ConfigNodes will lead to performance loss. + +#### Adding ConfigNode Nodes + +Script command: + +```shell +# Linux / MacOS +# First switch to the IoTDB root directory +sbin/start-confignode.sh + +# Windows +# First switch to the IoTDB root directory +sbin/start-confignode.bat +``` + +#### Removing ConfigNode Nodes + +First connect to the cluster through the CLI and confirm the internal address and port number of the ConfigNode you want to remove by using `show confignodes`: + +```Bash +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +Then use the script to remove the DataNode. 
Script command: + +```Bash +# Linux / MacOS +sbin/remove-confignode.sh [confignode_id] + +#Windows +sbin/remove-confignode.bat [confignode_id] + +``` + +### DataNode Node Maintenance + +There are two common scenarios for DataNode node maintenance: + +- Cluster expansion: For the purpose of expanding cluster capabilities, add new DataNodes to the cluster + +- Cluster failure recovery: When a machine where a DataNode is located fails, making the DataNode unable to run normally, you can remove this DataNode and add a new DataNode to the cluster + +> ❗️Note, in order for the cluster to work normally, during the process of DataNode node maintenance and after the maintenance is completed, the total number of DataNodes running normally should not be less than the number of data replicas (usually 2), nor less than the number of metadata replicas (usually 3). + +#### Adding DataNode Nodes + +Script command: + +```Bash +# Linux / MacOS +# First switch to the IoTDB root directory +sbin/start-datanode.sh + +# Windows +# First switch to the IoTDB root directory +sbin/start-datanode.bat +``` + +Note: After adding a DataNode, as new writes arrive (and old data expires, if TTL is set), the cluster load will gradually balance towards the new DataNode, eventually achieving a balance of storage and computation resources on all nodes. + +#### Removing DataNode Nodes + +First connect to the cluster through the CLI and confirm the RPC address and port number of the DataNode you want to remove with `show datanodes`: + +```Bash +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 0.0.0.0| 6667| 0| 0| +| 2|Running| 0.0.0.0| 6668| 1| 1| +| 3|Running| 0.0.0.0| 6669| 1| 0| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 3 +It costs 0.110s +``` + +Then use the script to remove the DataNode. Script command: + +```Bash +# Linux / MacOS +sbin/remove-datanode.sh [datanode_id] + +#Windows +sbin/remove-datanode.bat [datanode_id] +``` + +## Common Questions + +1. Multiple prompts indicating activation failure during deployment process + + - Use the `ls -al` command: Use the `ls -al` command to check if the owner information of the installation package root directory is the current user. + + - Check activation directory: Check all files in the `./activation` directory and whether the owner information is the current user. + +2. Confignode failed to start + + Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. + + Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. + + Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. + + Step 4: Clean up the environment: + + a. Terminate all ConfigNode Node and DataNode processes. + + ```Bash + # 1. Stop the ConfigNode and DataNode services + sbin/stop-standalone.sh + + # 2. Check for any remaining processes + jps + # Or + ps -ef|gerp iotdb + + # 3. 
If there are any remaining processes, manually kill the + kill -9 + # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + + b. Delete the data and logs directories. + + Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. + + ```Bash + cd /data/iotdb + rm -rf data logs + ``` + +## Appendix + +### Introduction to Configuration Node Parameters + +| Parameter | Description | Is it required | +| :-------- | :---------------------------------------------- | :------------- | +| -d | Start in daemon mode, running in the background | No | + +### Introduction to Datanode Node Parameters + +| Abbreviation | Description | Is it required | +| :----------- | :----------------------------------------------------------- | :------------- | +| -v | Show version information | No | +| -f | Run the script in the foreground, do not put it in the background | No | +| -d | Start in daemon mode, i.e. run in the background | No | +| -p | Specify a file to store the process ID for process management | No | +| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | +| -g | Print detailed garbage collection (GC) information | No | +| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | +| -E | Specify the path of the JVM error log file | No | +| -D | Define system properties, in the format key=value | No | +| -X | Pass -XX parameters directly to the JVM | No | +| -h | Help instruction | No | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Database-Resources.md b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Database-Resources.md new file mode 100644 index 00000000..59a380db --- /dev/null +++ b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Database-Resources.md @@ -0,0 +1,194 @@ + +# Database Resources +## CPU + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Number of timeseries (frequency<=1Hz) | CPU | Number of nodes (standalone mode) | Number of nodes (double active) | Number of nodes (distributed) |
+| ------------------------------------- | ------------- | --- | --- | --- |
+| Within 100000 | 2core-4core | 1 | 2 | 3 |
+| Within 300000 | 4core-8core | 1 | 2 | 3 |
+| Within 500000 | 8core-26core | 1 | 2 | 3 |
+| Within 1000000 | 16core-32core | 1 | 2 | 3 |
+| Within 2000000 | 32core-48core | 1 | 2 | 3 |
+| Within 10000000 | 48core | 1 | 2 | Please contact Timecho Business for consultation |
+| Over 10000000 | Please contact Timecho Business for consultation | | | |
+ +## Memory + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Number of timeseries (frequency<=1Hz) | Memory | Number of nodes (standalone mode) | Number of nodes (double active) | Number of nodes (distributed) |
+| ------------------------------------- | --------- | --- | --- | --- |
+| Within 100000 | 4G-8G | 1 | 2 | 3 |
+| Within 300000 | 12G-32G | 1 | 2 | 3 |
+| Within 500000 | 24G-48G | 1 | 2 | 3 |
+| Within 1000000 | 32G-96G | 1 | 2 | 3 |
+| Within 2000000 | 64G-128G | 1 | 2 | 3 |
+| Within 10000000 | 128G | 1 | 2 | Please contact Timecho Business for consultation |
+| Over 10000000 | Please contact Timecho Business for consultation | | | |
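+
+The figure chosen from the table above is what typically goes into the `MEMORY_SIZE` setting of `conf/confignode-env.sh` / `conf/datanode-env.sh` described in the deployment guides. For illustration only (assuming roughly 300,000 series on a standalone node; the exact value format is an assumption):
+
+```Bash
+# conf/datanode-env.sh — pick a size consistent with the table above
+MEMORY_SIZE=16G
+```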
+ +## Storage (Disk) +### Storage space +Calculation formula: Number of measurement points * Sampling frequency (Hz) * Size of each data point (Byte, different data types may vary, see table below) * Storage time (seconds) * Number of copies (usually 1 copy for a single node and 2 copies for a cluster) ÷ Compression ratio (can be estimated at 5-10 times, but may be higher in actual situations) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Data point size calculation:

| Data type      | Timestamp (bytes) | Value (bytes) | Total size of a data point (bytes) |
| -------------- | ----------------- | ------------- | ---------------------------------- |
| BOOLEAN        | 8                 | 1             | 9                                  |
| INT32 / FLOAT  | 8                 | 4             | 12                                 |
| INT64 / DOUBLE | 8                 | 8             | 16                                 |
| TEXT           | 8                 | a on average  | 8 + a                              |
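Based on the point sizes above, the storage formula can be evaluated with simple shell arithmetic. The snippet below is only an illustrative sketch, using the same assumptions as the example that follows (100,000 INT32 series at 1 Hz, 1 year of retention, 3 replicas, and an estimated 10x compression ratio):

```Bash
# measurement points * bytes per point * seconds per year * replicas / compression ratio
echo $((100000 * 12 * 86400 * 365 * 3 / 10)) | numfmt --to=iec
# prints about 11T, matching the estimate in the example below
```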
+ +Example: 1000 devices, each with 100 measurement points, a total of 100000 sequences, INT32 type. Sampling frequency 1Hz (once per second), storage for 1 year, 3 copies. +- Complete calculation formula: 1000 devices * 100 measurement points * 12 bytes per data point * 86400 seconds per day * 365 days per year * 3 copies/10 compression ratio=11T +- Simplified calculation formula: 1000 * 100 * 12 * 86400 * 365 * 3/10=11T +### Storage Configuration +If the number of nodes is over 10000000 or the query load is high, it is recommended to configure SSD +## Network (Network card) +If the write throughput does not exceed 10 million points/second, configure 1Gbps network card. When the write throughput exceeds 10 million points per second, a 10Gbps network card needs to be configured. +| **Write throughput (data points per second)** | **NIC rate** | +| ------------------- | ------------- | +| <10 million | 1Gbps | +| >=10 million | 10Gbps | +## Other instructions +IoTDB has the ability to scale up clusters in seconds, and expanding node data does not require migration. Therefore, you do not need to worry about the limited cluster capacity estimated based on existing data. In the future, you can add new nodes to the cluster when you need to scale up. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Environment-Requirements.md b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Environment-Requirements.md new file mode 100644 index 00000000..539d03b0 --- /dev/null +++ b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Environment-Requirements.md @@ -0,0 +1,191 @@ + +# System Requirements + +## Disk Array + +### Configuration Suggestions + +IoTDB has no strict operation requirements on disk array configuration. It is recommended to use multiple disk arrays to store IoTDB data to achieve the goal of concurrent writing to multiple disk arrays. For configuration, refer to the following suggestions: + +1. Physical environment + System disk: You are advised to use two disks as Raid1, considering only the space occupied by the operating system itself, and do not reserve system disk space for the IoTDB + Data disk: + Raid is recommended to protect data on disks + It is recommended to provide multiple disks (1-6 disks) or disk groups for the IoTDB. (It is not recommended to create a disk array for all disks, as this will affect the maximum performance of the IoTDB.) +2. Virtual environment + You are advised to mount multiple hard disks (1-6 disks). + +### Configuration Example + +- Example 1: Four 3.5-inch hard disks + +Only a few hard disks are installed on the server. Configure Raid5 directly. +The recommended configurations are as follows: +| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | +| ----------- | -------- | -------- | --------- | -------- | +| system/data disk | RAID5 | 4 | 1 | 3 | is allowed to fail| + +- Example 2: Twelve 3.5-inch hard disks + +The server is configured with twelve 3.5-inch disks. +Two disks are recommended as Raid1 system disks. The two data disks can be divided into two Raid5 groups. Each group of five disks can be used as four disks. 
+The recommended configurations are as follows: +| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | +| -------- | -------- | -------- | --------- | -------- | +| system disk | RAID1 | 2 | 1 | 1 | +| data disk | RAID5 | 5 | 1 | 4 | +| data disk | RAID5 | 5 | 1 | 4 | +- Example 3:24 2.5-inch disks + +The server is configured with 24 2.5-inch disks. +Two disks are recommended as Raid1 system disks. The last two disks can be divided into three Raid5 groups. Each group of seven disks can be used as six disks. The remaining block can be idle or used to store pre-write logs. +The recommended configurations are as follows: +| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | +| -------- | -------- | -------- | --------- | -------- | +| system disk | RAID1 | 2 | 1 | 1 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | NoRaid | 1 | 0 | 1 | + +## Operating System + +### Version Requirements + +IoTDB supports operating systems such as Linux, Windows, and MacOS, while the enterprise version supports domestic CPUs such as Loongson, Phytium, and Kunpeng. It also supports domestic server operating systems such as Neokylin, KylinOS, UOS, and Linx. + +### Disk Partition + +- The default standard partition mode is recommended. LVM extension and hard disk encryption are not recommended. +- The system disk needs only the space used by the operating system, and does not need to reserve space for the IoTDB. +- Each disk group corresponds to only one partition. Data disks (with multiple disk groups, corresponding to raid) do not need additional partitions. All space is used by the IoTDB. +The following table lists the recommended disk partitioning methods. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Disk classification | Disk group   | Drive  | Capacity                           | File system type |
| ------------------- | ------------ | ------ | ---------------------------------- | ---------------- |
| System disk         | Disk group 0 | /boot  | 1GB                                | Default          |
| System disk         | Disk group 0 | /      | Remaining space of the disk group  | Default          |
| Data disk           | Disk group 1 | /data1 | Full space of disk group 1         | Default          |
| Data disk           | Disk group 2 | /data2 | Full space of disk group 2         | Default          |
| ...                 | ...          | ...    | ...                                | ...              |
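After partitioning, it is worth confirming that each disk group is mounted as expected. The commands below are a generic sketch; the /data1 and /data2 mount points are taken from the table above and should be adjusted to your actual layout:

```Bash
# List block devices with their sizes and mount points
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
# Check the capacity and usage of each data mount point
df -h /data1 /data2
```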
+### Network Configuration + +1. Disable the firewall + +```Bash +# View firewall +systemctl status firewalld +# Disable firewall +systemctl stop firewalld +# Disable firewall permanently +systemctl disable firewalld +``` +2. Ensure that the required port is not occupied + +(1) Check the ports occupied by the cluster: In the default cluster configuration, ConfigNode occupies ports 10710 and 10720, and DataNode occupies ports 6667, 10730, 10740, 10750, 10760, 9090, 9190, and 3000. Ensure that these ports are not occupied. Check methods are as follows: + +```Bash +lsof -i:6667 or netstat -tunp | grep 6667 +lsof -i:10710 or netstat -tunp | grep 10710 +lsof -i:10720 or netstat -tunp | grep 10720 +# If the command outputs, the port is occupied. +``` + +(2) Checking the port occupied by the cluster deployment tool: When using the cluster management tool opskit to install and deploy the cluster, enable the SSH remote connection service configuration and open port 22. + +```Bash +yum install openssh-server # Install the ssh service +systemctl start sshd # Enable port 22 +``` + +3. Ensure that servers are connected to each other + +### Other Configuration + +1. Disable the system swap memory + +```Bash +echo "vm.swappiness = 0">> /etc/sysctl.conf +# The swapoff -a and swapon -a commands are executed together to dump the data in swap back to memory and to empty the data in swap. +# Do not omit the swappiness setting and just execute swapoff -a; Otherwise, swap automatically opens again after the restart, making the operation invalid. +swapoff -a && swapon -a +# Make the configuration take effect without restarting. +sysctl -p +# Check memory allocation, expecting swap to be 0 +free -m +``` +2. Set the maximum number of open files to 65535 to avoid the error of "too many open files". + +```Bash +# View current restrictions +ulimit -n +# Temporary changes +ulimit -n 65535 +# Permanent modification +echo "* soft nofile 65535" >> /etc/security/limits.conf +echo "* hard nofile 65535" >> /etc/security/limits.conf +# View after exiting the current terminal session, expect to display 65535 +ulimit -n +``` +## Software Dependence + +Install the Java runtime environment (Java version >= 1.8). Ensure that jdk environment variables are set. (It is recommended to deploy JDK17 for V1.3.2.2 or later. In some scenarios, the performance of JDK of earlier versions is compromised, and Datanodes cannot be stopped.) + +```Bash +# The following is an example of installing in centos7 using JDK-17: +tar -zxvf JDk-17_linux-x64_bin.tar # Decompress the JDK file +Vim ~/.bashrc # Configure the JDK environment +{ export JAVA_HOME=/usr/lib/jvm/jdk-17.0.9 + export PATH=$JAVA_HOME/bin:$PATH +} # Add JDK environment variables +source ~/.bashrc # The configuration takes effect +java -version # Check the JDK environment +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md new file mode 100644 index 00000000..57cad838 --- /dev/null +++ b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md @@ -0,0 +1,42 @@ + +# Obtain TimechoDB +## How to obtain TimechoDB +The enterprise version installation package can be obtained through product trial application or by directly contacting the business personnel who are in contact with you. 
+ +## Installation Package Structure +The directory structure after unpacking the installation package is as follows: +| **catalogue** | **Type** | **Explanation** | +| :--------------: | -------- | ------------------------------------------------------------ | +| activation | folder | The directory where the activation file is located, including the generated machine code and the enterprise version activation code obtained from the business side (this directory will only be generated after starting ConfigNode to obtain the activation code) | +| conf | folder | Configuration file directory, including configuration files such as ConfigNode, DataNode, JMX, and logback | +| data | folder | The default data file directory contains data files for ConfigNode and DataNode. (The directory will only be generated after starting the program) | +| lib | folder | IoTDB executable library file directory | +| licenses | folder | Open source community certificate file directory | +| logs | folder | The default log file directory, which includes log files for ConfigNode and DataNode (this directory will only be generated after starting the program) | +| sbin | folder | Main script directory, including start, stop, and other scripts | +| tools | folder | Directory of System Peripheral Tools | +| ext | folder | Related files for pipe, trigger, and UDF plugins (created by the user when needed) | +| LICENSE | file | certificate | +| NOTICE | file | Tip | +| README_ZH\.md | file | Explanation of the Chinese version in Markdown format | +| README\.md | file | Instructions for use | +| RELEASE_NOTES\.md | file | Version Description | diff --git a/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md new file mode 100644 index 00000000..4e9a50a1 --- /dev/null +++ b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -0,0 +1,680 @@ + +# Monitoring Panel Deployment + +The IoTDB monitoring panel is one of the supporting tools for the IoTDB Enterprise Edition. It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring indicators, in order to help users monitor the health status of the cluster, and perform cluster optimization and operation. This article will take common 3C3D clusters (3 Confignodes and 3 Datanodes) as examples to introduce how to enable the system monitoring module in an IoTDB instance and use Prometheus+Grafana to visualize the system monitoring indicators. + +## Installation Preparation + +1. Installing IoTDB: You need to first install IoTDB V1.0 or above Enterprise Edition. You can contact business or technical support to obtain +2. Obtain the IoTDB monitoring panel installation package: Based on the enterprise version of IoTDB database monitoring panel, you can contact business or technical support to obtain + +## Installation Steps + +### Step 1: IoTDB enables monitoring indicator collection + +1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). 
+ +| **Configuration** | Located in the configuration file | **Description** | +| :--------------------------------- | :-------------------------------- | :----------------------------------------------------------- | +| cn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | +| cn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | +| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item to maintain the default setting of 9091. If other ports are set, they will not conflict with each other | +| dn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | +| dn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | +| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and set it to 9092 by default. If other ports are set, they will not conflict with each other | + +Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows: + +| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration | +| ----------- | --------- | ------------ | -------------------------------- | ------------------------------------------------------------ | +| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | + +2. Restart all nodes. After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: + +```Bash +./sbin/stop-standalone.sh #Stop confignode and datanode first +./sbin/start-confignode.sh -d #Start confignode +./sbin/start-datanode.sh -d #Start datanode +``` + +3. After restarting, confirm the running status of each node through the client. If the status is Running, it indicates successful configuration: + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8.PNG) + +### Step 2: Install and configure Prometheus + +> Taking Prometheus installed on server 192.168.1.3 as an example. + +1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it(https://prometheus.io/docs/introduction/first_steps/) +2. Unzip the installation package and enter the unzipped folder: + +```Shell +tar xvfz prometheus-*.tar.gz +cd prometheus-* +``` + +3. Modify the configuration. 
Modify the configuration file prometheus.yml as follows + 1. Add configNode task to collect monitoring data for ConfigNode + 2. Add a datanode task to collect monitoring data for DataNodes + +```YAML +global: + scrape_interval: 15s + evaluation_interval: 15s +scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] + honor_labels: true +``` + +4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: + +```Shell +./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d +``` + +5. Confirm successful startup. Enter in browser http://192.168.1.3:9090 Go to Prometheus and click on the Target interface under Status. When you see that all States are Up, it indicates successful configuration and connectivity. + +
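If a target stays in the Down state, it can also help to query the metrics endpoint of the affected node directly. The host names and ports below assume the configuration from Step 1; a healthy node returns plain-text Prometheus metrics:

```Shell
# ConfigNode metrics (port 9091) and DataNode metrics (port 9092) on iotdb-1
curl http://iotdb-1:9091/metrics | head
curl http://iotdb-1:9092/metrics | head
```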
+ + +
+ +6. Clicking on the left link in Targets will redirect you to web monitoring and view the monitoring information of the corresponding node: + +![](https://alioss.timecho.com/docs/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) + +### Step 3: Install Grafana and configure the data source + +> Taking Grafana installed on server 192.168.1.3 as an example. + +1. Download the Grafana installation package, which requires installing version 8.4.2 or higher. You can go to the Grafana official website to download it(https://grafana.com/grafana/download) +2. Unzip and enter the corresponding folder + +```Shell +tar -zxvf grafana-*.tar.gz +cd grafana-* +``` + +3. Start Grafana: + +```Shell +./bin/grafana-server web +``` + +4. Log in to Grafana. Enter in browser http://192.168.1.3:3000 (or the modified port), enter Grafana, and the default initial username and password are both admin. + +5. Configure data sources. Find Data sources in Connections, add a new data source, and configure the Data Source to Prometheus + +![](https://alioss.timecho.com/docs/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) + +When configuring the Data Source, pay attention to the URL where Prometheus is located. After configuring it, click on Save&Test and a Data Source is working prompt will appear, indicating successful configuration + +![](https://alioss.timecho.com/docs/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) + +### Step 4: Import IoTDB Grafana Dashboards + +1. Enter Grafana and select Dashboards: + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) + +2. Click the Import button on the right side + + ![](https://alioss.timecho.com/docs/img/Import%E6%8C%89%E9%92%AE.png) + +3. Import Dashboard using upload JSON file + + ![](https://alioss.timecho.com/docs/img/%E5%AF%BC%E5%85%A5Dashboard.png) + +4. Select the JSON file of one of the panels in the IoTDB monitoring panel, using the Apache IoTDB ConfigNode Dashboard as an example (refer to the installation preparation section in this article for the monitoring panel installation package): + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) + +5. Select Prometheus as the data source and click Import + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) + +6. Afterwards, you can see the imported Apache IoTDB ConfigNode Dashboard monitoring panel + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF.png) + +7. Similarly, we can import the Apache IoTDB DataNode Dashboard Apache Performance Overview Dashboard、Apache System Overview Dashboard, You can see the following monitoring panel: + +
+ + + +
+ +8. At this point, all IoTDB monitoring panels have been imported and monitoring information can now be viewed at any time. + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) + +## Appendix, Detailed Explanation of Monitoring Indicators + +### System Dashboard + +This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. + +#### CPU + +- CPU Core:CPU cores +- CPU Load: + - System CPU Load:The average CPU load and busyness of the entire system during the sampling time + - Process CPU Load:The proportion of CPU occupied by the IoTDB process during sampling time +- CPU Time Per Minute:The total CPU time of all processes in the system per minute + +#### Memory + +- System Memory:The current usage of system memory. + - Commited vm size: The size of virtual memory allocated by the operating system to running processes. + - Total physical memory:The total amount of available physical memory in the system. + - Used physical memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. +- System Swap Memory:Swap Space memory usage. +- Process Memory:The usage of memory by the IoTDB process. + - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) + - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. + - Used Memory:The total amount of memory currently used by the IoTDB process. + +#### Disk + +- Disk Space: + - Total disk space:The maximum disk space that IoTDB can use. + - Used disk space:The disk space already used by IoTDB. +- Log Number Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. +- File Count:Number of IoTDB related files + - all:All file quantities + - TsFile:Number of TsFiles + - seq:Number of sequential TsFiles + - unseq:Number of unsequence TsFiles + - wal:Number of WAL files + - cross-temp:Number of cross space merge temp files + - inner-seq-temp:Number of merged temp files in sequential space + - innser-unseq-temp:Number of merged temp files in unsequential space + - mods:Number of tombstone files +- Open File Count:Number of file handles opened by the system +- File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file. +- Disk I/O Busy Rate:Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk. +- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. +- Disk I/O Ops:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. +- Disk I/O Avg Time:Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. 
+- Disk I/O Avg Size:Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. +- Disk I/O Avg Queue Size:Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. +- I/O System Call Rate:The frequency of process calls to read and write system calls, similar to IOPS. +- I/O Throughput:The throughput of process I/O can be divided into two categories: actual-read/write and attemppt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. + +#### JVM + +- GC Time Percentage:The proportion of GC time spent by the node JVM in the past minute's time window +- GC Allocated/Promoted Size Detail: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications +- GC Data Size Detail:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value +- Heap Memory:JVM heap memory usage. + - Maximum heap memory:The maximum available heap memory size for the JVM. + - Committed heap memory:The size of heap memory that has been committed by the JVM. + - Used heap memory:The size of heap memory already used by the JVM. + - PS Eden Space:The size of the PS Young area. + - PS Old Space:The size of the PS Old area. + - PS Survivor Space:The size of the PS survivor area. + - ...(CMS/G1/ZGC, etc) +- Off Heap Memory:Out of heap memory usage. + - direct memory:Out of heap direct memory. + - mapped memory:Out of heap mapped memory. +- GC Number Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC +- GC Time Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC +- GC Number Per Minute Detail:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC +- GC Time Per Minute Detail:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC +- Time Consumed Of Compilation Per Minute:The total time JVM spends compiling per minute +- The Number of Class: + - loaded:The number of classes currently loaded by the JVM + - unloaded:The number of classes uninstalled by the JVM since system startup +- The Number of Java Thread:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. + +#### Network + +Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
+ +- Net Speed:The speed of network card sending and receiving data +- Receive/Transmit Data Size:The size of data packets sent or received by the network card, calculated from system restart +- Packet Speed:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets +- Connection Num:The current number of socket connections for the selected process (IoTDB only has TCP) + +### Performance Overview Dashboard + +#### Cluster Overview + +- Total CPU Core:Total CPU cores of cluster machines +- DataNode CPU Load:CPU usage of each DataNode node in the cluster +- Disk + - Total Disk Space: Total disk size of cluster machines + - DataNode Disk Usage: The disk usage rate of each DataNode in the cluster +- Total Timeseries: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas +- Cluster: Number of ConfigNode and DataNode nodes in the cluster +- Up Time: The duration of cluster startup until now +- Total Write Point Per Second: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas +- Memory + - Total System Memory: Total memory size of cluster machine system + - Total Swap Memory: Total size of cluster machine swap memory + - DataNode Process Memory Usage: Memory usage of each DataNode in the cluster +- Total File Number:Total number of cluster management files +- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage +- Total DataBase: The total number of databases managed by the cluster (including replicas) +- Total DataRegion: The total number of DataRegions managed by the cluster +- Total SchemaRegion: The total number of SchemeRegions managed by the cluster + +#### Node Overview + +- CPU Core: The number of CPU cores in the machine where the node is located +- Disk Space: The disk size of the machine where the node is located +- Timeseries: Number of time series managed by the machine where the node is located (including replicas) +- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio +- Write Point Per Second: The write speed per second of the machine where the node is located (including replicas) +- System Memory: The system memory size of the machine where the node is located +- Swap Memory:The swap memory size of the machine where the node is located +- File Number: Number of files managed by nodes + +#### Performance + +- Session Idle Time:The total idle time and total busy time of the session connection of the node +- Client Connection: The client connection status of the node, including the total number of connections and the number of active connections +- Time Consumed Of Operation: The time consumption of various types of node operations, including average and P99 +- Average Time Consumed Of Interface: The average time consumption of each thrust interface of a node +- P99 Time Consumed Of Interface: P99 time consumption of various thrust interfaces of nodes +- Task Number: The number of system tasks for each node +- Average Time Consumed of Task: The average time spent on various system tasks of a node +- P99 Time Consumed of Task: P99 time consumption for various system tasks of nodes +- Operation Per 
Second: The number of operations per second for a node +- Mainstream Process + - Operation Per Second Of Stage: The number of operations per second for each stage of the node's main process + - Average Time Consumed Of Stage: The average time consumption of each stage in the main process of a node + - P99 Time Consumed Of Stage: P99 time consumption for each stage of the node's main process +- Schedule Stage + - OPS Of Schedule: The number of operations per second in each sub stage of the node schedule stage + - Average Time Consumed Of Schedule Stage:The average time consumption of each sub stage in the node schedule stage + - P99 Time Consumed Of Schedule Stage: P99 time consumption for each sub stage of the schedule stage of the node +- Local Schedule Sub Stages + - OPS Of Local Schedule Stage: The number of operations per second in each sub stage of the local schedule node + - Average Time Consumed Of Local Schedule Stage: The average time consumption of each sub stage in the local schedule stage of the node + - P99 Time Consumed Of Local Schedule Stage: P99 time consumption for each sub stage of the local schedule stage of the node +- Storage Stage + - OPS Of Storage Stage: The number of operations per second in each sub stage of the node storage stage + - Average Time Consumed Of Storage Stage: Average time consumption of each sub stage in the node storage stage + - P99 Time Consumed Of Storage Stage: P99 time consumption for each sub stage of node storage stage +- Engine Stage + - OPS Of Engine Stage: The number of operations per second in each sub stage of the node engine stage + - Average Time Consumed Of Engine Stage: The average time consumption of each sub stage in the engine stage of a node + - P99 Time Consumed Of Engine Stage: P99 time consumption of each sub stage in the node engine stage + +#### System + +- CPU Load: CPU load of nodes +- CPU Time Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores +- GC Time Per Minute:The average GC time per minute for nodes, including YGC and FGC +- Heap Memory: Node's heap memory usage +- Off Heap Memory: Non heap memory usage of nodes +- The Number Of Java Thread: Number of Java threads on nodes +- File Count:Number of files managed by nodes +- File Size: Node management file size situation +- Log Number Per Minute: Different types of logs per minute for nodes + +### ConfigNode Dashboard + +This panel displays the performance of all management nodes in the cluster, including partitioning, node information, and client connection statistics. 
+ +#### Node Overview + +- Database Count: Number of databases for nodes +- Region + - DataRegion Count:Number of DataRegions for nodes + - DataRegion Current Status: The state of the DataRegion of the node + - SchemaRegion Count: Number of SchemeRegions for nodes + - SchemaRegion Current Status: The state of the SchemeRegion of the node +- System Memory: The system memory size of the node +- Swap Memory: Node's swap memory size +- ConfigNodes: The running status of the ConfigNode in the cluster where the node is located +- DataNodes:The DataNode situation of the cluster where the node is located +- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load + +#### NodeInfo + +- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode +- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located +- DataNode Status: The status of the DataNode node in the cluster where the node is located +- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located +- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located +- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located +- DataRegionGroup Leader Distribution:The distribution of leaders in the DataRegionGroup of the cluster where the node is located + +#### Protocol + +- Client Count + - Active Client Num: The number of active clients in each thread pool of a node + - Idle Client Num: The number of idle clients in each thread pool of a node + - Borrowed Client Count: Number of borrowed clients in each thread pool of the node + - Created Client Count: Number of created clients for each thread pool of the node + - Destroyed Client Count: The number of destroyed clients in each thread pool of the node +- Client time situation + - Client Mean Active Time: The average active time of clients in each thread pool of a node + - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node + - Client Mean Idle Time: The average idle time of clients in each thread pool of a node + +#### Partition Table + +- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located +- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located +- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located +- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located +- DataRegion Status: The DataRegion status of the cluster where the node is located +- SchemaRegion Status: The status of the SchemeRegion of the cluster where the node is located + +#### Consensus + +- Ratis Stage Time: The time consumption of each stage of the node's Ratis +- Write Log Entry: The time required to write a log for the Ratis of a node +- Remote / Local Write Time: The time consumption of remote and local writes for the Ratis of nodes +- Remote / Local Write QPS: Remote and local QPS written to node Ratis +- RatisConsensus Memory: Memory usage of Node Ratis consensus protocol + +### DataNode Dashboard + +This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc. 
+ +#### Node Overview + +- The Number Of Entity: Entity situation of node management +- Write Point Per Second: The write speed per second of the node +- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. + +#### Protocol + +- Node Operation Time Consumption + - The Time Consumed Of Operation (avg): The average time spent on various operations of a node + - The Time Consumed Of Operation (50%): The median time spent on various operations of a node + - The Time Consumed Of Operation (99%): P99 time consumption for various operations of nodes +- Thrift Statistics + - The QPS Of Interface: QPS of various Thrift interfaces of nodes + - The Avg Time Consumed Of Interface: The average time consumption of each Thrift interface of a node + - Thrift Connection: The number of Thrfit connections of each type of node + - Thrift Active Thread: The number of active Thrift connections for each type of node +- Client Statistics + - Active Client Num: The number of active clients in each thread pool of a node + - Idle Client Num: The number of idle clients in each thread pool of a node + - Borrowed Client Count:Number of borrowed clients for each thread pool of a node + - Created Client Count: Number of created clients for each thread pool of the node + - Destroyed Client Count: The number of destroyed clients in each thread pool of the node + - Client Mean Active Time: The average active time of clients in each thread pool of a node + - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node + - Client Mean Idle Time: The average idle time of clients in each thread pool of a node + +#### Storage Engine + +- File Count: Number of files of various types managed by nodes +- File Size: Node management of various types of file sizes +- TsFile + - TsFile Total Size In Each Level: The total size of TsFile files at each level of node management + - TsFile Count In Each Level: Number of TsFile files at each level of node management + - Avg TsFile Size In Each Level: The average size of TsFile files at each level of node management +- Task Number: Number of Tasks for Nodes +- The Time Consumed of Task: The time consumption of tasks for nodes +- Compaction + - Compaction Read And Write Per Second: The merge read and write speed of nodes per second + - Compaction Number Per Minute: The number of merged nodes per minute + - Compaction Process Chunk Status: The number of Chunks in different states merged by nodes + - Compacted Point Num Per Minute: The number of merged nodes per minute + +#### Write Performance + +- Write Cost(avg): Average node write time, including writing wal and memtable +- Write Cost(50%): Median node write time, including writing wal and memtable +- Write Cost(99%): P99 for node write time, including writing wal and memtable +- WAL + - WAL File Size: Total size of WAL files managed by nodes + - WAL File Num:Number of WAL files managed by nodes + - WAL Nodes Num: Number of WAL nodes managed by nodes + - Make Checkpoint Costs: The time required to create various types of CheckPoints for nodes + - WAL Serialize Total Cost: Total time spent on node WAL serialization + - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster + - Serialize One WAL Info Entry Cost: Node serialization 
time for a WAL Info Entry + - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot + - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush + - Effective Info Ratio Of WALNode: The effective information ratio of different WALNodes of nodes + - WAL Buffer + - WAL Buffer Cost: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options + - WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node + - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node +- Flush Statistics + - Flush MemTable Cost(avg): The total time spent on node Flush and the average time spent on each sub stage + - Flush MemTable Cost(50%): The total time spent on node Flush and the median time spent on each sub stage + - Flush MemTable Cost(99%): The total time spent on node Flush and the P99 time spent on each sub stage + - Flush Sub Task Cost(avg): The average time consumption of each node's Flush subtask, including sorting, encoding, and IO stages + - Flush Sub Task Cost(50%): The median time consumption of each subtask of the Flush node, including sorting, encoding, and IO stages + - Flush Sub Task Cost(99%): The average subtask time P99 for Flush of nodes, including sorting, encoding, and IO stages +- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node +- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes +- Tsfile Compression Ratio Of Flushing MemTable: The compression rate of TsFile corresponding to node flashing Memtable +- Flush TsFile Size Of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions +- Size Of Flushing MemTable: The size of the Memtable for node disk flushing +- Points Num Of Flushing MemTable: The number of points when flashing data in different DataRegions of a node +- Series Num Of Flushing MemTable: The number of time series when flashing Memtables in different DataRegions of a node +- Average Point Num Of Flushing MemChunk: The average number of disk flushing points for node MemChunk + +#### Schema Engine + +- Schema Engine Mode: The metadata engine pattern of nodes +- Schema Consensus Protocol: Node metadata consensus protocol +- Schema Region Number:Number of SchemeRegions managed by nodes +- Schema Region Memory Overview: The amount of memory in the SchemeRegion of a node +- Memory Usgae per SchemaRegion:The average memory usage size of node SchemaRegion +- Cache MNode per SchemaRegion: The number of cache nodes in each SchemeRegion of a node +- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemeRegion of the node (valid only for SimpleConsense) +- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemeRegion of a node +- Activated Template Count per SchemaRegion: The number of activated templates in each SchemeRegion of a node +- Time Series statistics + - Timeseries Count per SchemaRegion: The average number of time series for node SchemaRegion + - Series Type: Number of time series of different types of nodes + - Time Series Number: The total number of time series nodes + - Template Series Number: The total number of template time series for nodes + - Template Series Count per SchemaRegion: The number of sequences created through templates in each SchemeRegion of a node +- IMNode Statistics + - Pinned MNode per SchemaRegion: Number of IMNode nodes with Pinned nodes in 
each SchemeRegion + - Pinned Memory per SchemaRegion: The memory usage size of the IMNode node for Pinned nodes in each SchemeRegion of the node + - Unpinned MNode per SchemaRegion: The number of unpinned IMNode nodes in each SchemeRegion of a node + - Unpinned Memory per SchemaRegion: Memory usage size of unpinned IMNode nodes in each SchemeRegion of the node + - Schema File Memory MNode Number: Number of IMNode nodes with global pinned and unpinned nodes + - Release and Flush MNode Rate: The number of IMNodes that release and flush nodes per second +- Cache Hit Rate: Cache hit rate of nodes +- Release and Flush Thread Number: The current number of active Release and Flush threads on the node +- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing +- Time Consumed of Relead and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing + +#### Query Engine + +- Time Consumption In Each Stage + - The time consumed of query plan stages(avg): The average time spent on node queries at each stage + - The time consumed of query plan stages(50%): Median time spent on node queries at each stage + - The time consumed of query plan stages(99%): P99 time consumption for node query at each stage +- Execution Plan Distribution Time + - The time consumed of plan dispatch stages(avg): The average time spent on node query execution plan distribution + - The time consumed of plan dispatch stages(50%): Median time spent on node query execution plan distribution + - The time consumed of plan dispatch stages(99%): P99 of node query execution plan distribution time +- Execution Plan Execution Time + - The time consumed of query execution stages(avg): The average execution time of node query execution plan + - The time consumed of query execution stages(50%):Median execution time of node query execution plan + - The time consumed of query execution stages(99%): P99 of node query execution plan execution time +- Operator Execution Time + - The time consumed of operator execution stages(avg): The average execution time of node query operators + - The time consumed of operator execution(50%): Median execution time of node query operator + - The time consumed of operator execution(99%): P99 of node query operator execution time +- Aggregation Query Computation Time + - The time consumed of query aggregation(avg): The average computation time for node aggregation queries + - The time consumed of query aggregation(50%): Median computation time for node aggregation queries + - The time consumed of query aggregation(99%): P99 of node aggregation query computation time +- File/Memory Interface Time Consumption + - The time consumed of query scan(avg): The average time spent querying file/memory interfaces for nodes + - The time consumed of query scan(50%): Median time spent querying file/memory interfaces for nodes + - The time consumed of query scan(99%): P99 time consumption for node query file/memory interface +- Number Of Resource Visits + - The usage of query resource(avg): The average number of resource visits for node queries + - The usage of query resource(50%): Median number of resource visits for node queries + - The usage of query resource(99%): P99 for node query resource access quantity +- Data Transmission Time + - The time consumed of query data exchange(avg): The average time spent on node query data transmission + - The time consumed of query data exchange(50%): Median query data transmission time for nodes + - 
The time consumed of query data exchange(99%): P99 for node query data transmission time +- Number Of Data Transfers + - The count of Data Exchange(avg): The average number of data transfers queried by nodes + - The count of Data Exchange: The quantile of the number of data transfers queried by nodes, including the median and P99 +- Task Scheduling Quantity And Time Consumption + - The number of query queue: Node query task scheduling quantity + - The time consumed of query schedule time(avg): The average time spent on scheduling node query tasks + - The time consumed of query schedule time(50%): Median time spent on node query task scheduling + - The time consumed of query schedule time(99%): P99 of node query task scheduling time + +#### Query Interface + +- Load Time Series Metadata + - The time consumed of load timeseries metadata(avg): The average time taken for node queries to load time series metadata + - The time consumed of load timeseries metadata(50%): Median time spent on loading time series metadata for node queries + - The time consumed of load timeseries metadata(99%): P99 time consumption for node query loading time series metadata +- Read Time Series + - The time consumed of read timeseries metadata(avg): The average time taken for node queries to read time series + - The time consumed of read timeseries metadata(50%): The median time taken for node queries to read time series + - The time consumed of read timeseries metadata(99%): P99 time consumption for node query reading time series +- Modify Time Series Metadata + - The time consumed of timeseries metadata modification(avg):The average time taken for node queries to modify time series metadata + - The time consumed of timeseries metadata modification(50%): Median time spent on querying and modifying time series metadata for nodes + - The time consumed of timeseries metadata modification(99%): P99 time consumption for node query and modification of time series metadata +- Load Chunk Metadata List + - The time consumed of load chunk metadata list(avg): The average time it takes for node queries to load Chunk metadata lists + - The time consumed of load chunk metadata list(50%): Median time spent on node query loading Chunk metadata list + - The time consumed of load chunk metadata list(99%): P99 time consumption for node query loading Chunk metadata list +- Modify Chunk Metadata + - The time consumed of chunk metadata modification(avg): The average time it takes for node queries to modify Chunk metadata + - The time consumed of chunk metadata modification(50%): The total number of bits spent on modifying Chunk metadata for node queries + - The time consumed of chunk metadata modification(99%): P99 time consumption for node query and modification of Chunk metadata +- Filter According To Chunk Metadata + - The time consumed of chunk metadata filter(avg): The average time spent on node queries filtering by Chunk metadata + - The time consumed of chunk metadata filter(50%): Median filtering time for node queries based on Chunk metadata + - The time consumed of chunk metadata filter(99%): P99 time consumption for node query filtering based on Chunk metadata +- Constructing Chunk Reader + - The time consumed of construct chunk reader(avg): The average time spent on constructing Chunk Reader for node queries + - The time consumed of construct chunk reader(50%): Median time spent on constructing Chunk Reader for node queries + - The time consumed of construct chunk reader(99%): P99 time consumption for constructing Chunk Reader 
for node queries +- Read Chunk + - The time consumed of read chunk(avg): The average time taken for node queries to read Chunks + - The time consumed of read chunk(50%): Median time spent querying nodes to read Chunks + - The time consumed of read chunk(99%): P99 time spent on querying and reading Chunks for nodes +- Initialize Chunk Reader + - The time consumed of init chunk reader(avg): The average time spent initializing Chunk Reader for node queries + - The time consumed of init chunk reader(50%): Median time spent initializing Chunk Reader for node queries + - The time consumed of init chunk reader(99%):P99 time spent initializing Chunk Reader for node queries +- Constructing TsBlock Through Page Reader + - The time consumed of build tsblock from page reader(avg): The average time it takes for node queries to construct TsBlock through Page Reader + - The time consumed of build tsblock from page reader(50%): The median time spent on constructing TsBlock through Page Reader for node queries + - The time consumed of build tsblock from page reader(99%):Node query using Page Reader to construct TsBlock time-consuming P99 +- Query the construction of TsBlock through Merge Reader + - The time consumed of build tsblock from merge reader(avg): The average time taken for node queries to construct TsBlock through Merge Reader + - The time consumed of build tsblock from merge reader(50%): The median time spent on constructing TsBlock through Merge Reader for node queries + - The time consumed of build tsblock from merge reader(99%): Node query using Merge Reader to construct TsBlock time-consuming P99 + +#### Query Data Exchange + +The data exchange for the query is time-consuming. + +- Obtain TsBlock through source handle + - The time consumed of source handle get tsblock(avg): The average time taken for node queries to obtain TsBlock through source handle + - The time consumed of source handle get tsblock(50%):Node query obtains the median time spent on TsBlock through source handle + - The time consumed of source handle get tsblock(99%): Node query obtains TsBlock time P99 through source handle +- Deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(avg): The average time taken for node queries to deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(50%): The median time taken for node queries to deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(99%): P99 time spent on deserializing TsBlock through source handle for node query +- Send TsBlock through sink handle + - The time consumed of sink handle send tsblock(avg): The average time taken for node queries to send TsBlock through sink handle + - The time consumed of sink handle send tsblock(50%): Node query median time spent sending TsBlock through sink handle + - The time consumed of sink handle send tsblock(99%): Node query sends TsBlock through sink handle with a time consumption of P99 +- Callback data block event + - The time consumed of on acknowledge data block event task(avg): The average time taken for node query callback data block event + - The time consumed of on acknowledge data block event task(50%): Median time spent on node query callback data block event + - The time consumed of on acknowledge data block event task(99%): P99 time consumption for node query callback data block event +- Get Data Block Tasks + - The time consumed of get data block task(avg): The average time taken for 
node queries to obtain data block tasks
+  - The time consumed of get data block task(50%): The median time taken for node queries to obtain data block tasks
+  - The time consumed of get data block task(99%): P99 time taken for node queries to obtain data block tasks
+
+#### Query Related Resource
+
+- MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries
+- LocalExecutionPlanner: The remaining memory that the node can allocate to query shards
+- FragmentInstanceManager: The query shard context information and the number of query shards that the node is running
+- Coordinator: The number of queries recorded on the node
+- MemoryPool Size: The state of the query-related memory pools on the node
+- MemoryPool Capacity: The size of the query-related memory pools on the node, including the maximum and remaining available values
+- DriverScheduler: The number of queued query tasks on the node
+
+#### Consensus - IoT Consensus
+
+- Memory Usage
+  - IoTConsensus Used Memory: The IoTConsensus memory usage of the node, including total memory usage, queue usage, and synchronization usage
+- Synchronization Status Between Nodes
+  - IoTConsensus Sync Index: The SyncIndex size of the different DataRegions of the node's IoTConsensus
+  - IoTConsensus Overview: The total synchronization gap and cached request count of the node's IoTConsensus
+  - IoTConsensus Search Index Rate: The growth rate of the write SearchIndex of the different DataRegions of the node's IoTConsensus
+  - IoTConsensus Safe Index Rate: The growth rate of the synchronization SafeIndex of the different DataRegions of the node's IoTConsensus
+  - IoTConsensus LogDispatcher Request Size: The request size with which the node's IoTConsensus synchronizes the different DataRegions to other nodes
+  - Sync Lag: The synchronization gap of the different DataRegions of the node's IoTConsensus
+  - Min Peer Sync Lag: The minimum synchronization gap from the different DataRegions of the node's IoTConsensus to their different replicas
+  - Sync Speed Diff Of Peers: The maximum difference in synchronization from the different DataRegions of the node's IoTConsensus to their different replicas
+  - IoTConsensus LogEntriesFromWAL Rate: The rate at which the node's IoTConsensus obtains logs from the WAL for the different DataRegions
+  - IoTConsensus LogEntriesFromQueue Rate: The rate at which the node's IoTConsensus obtains logs from the queue for the different DataRegions
+- Different Execution Stages Take Time
+  - The Time Consumed Of Different Stages (avg): The average time spent in the different execution stages of the node's IoTConsensus
+  - The Time Consumed Of Different Stages (50%): The median time spent in the different execution stages of the node's IoTConsensus
+  - The Time Consumed Of Different Stages (99%): P99 of the time spent in the different execution stages of the node's IoTConsensus
+
+#### Consensus - DataRegion Ratis Consensus
+
+- Ratis Stage Time: The time consumed in the different stages of the node's Ratis
+- Write Log Entry: The time consumed in the different stages of writing a log entry in the node's Ratis
+- Remote / Local Write Time: The time it takes for the node's Ratis to write locally or remotely
+- Remote / Local Write QPS: The local and remote write QPS of the node's Ratis
+- RatisConsensus Memory: The memory usage of the node's Ratis
+
+#### Consensus - SchemaRegion Ratis Consensus
+
+- Ratis Stage Time: The time consumed in the different stages of the node's Ratis
+- Write Log Entry: The time consumed in each stage of writing a log entry in the node's Ratis
+- Remote / Local Write Time: The time it takes for the node's Ratis to write locally or remotely
+- Remote / Local Write QPS: The local and remote write QPS of the node's Ratis
+- RatisConsensus Memory: The memory usage of the node's Ratis
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md
new file mode 100644
index 00000000..76fd8eac
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md
@@ -0,0 +1,266 @@
+
+# Stand-Alone Deployment
+
+This chapter introduces how to start an IoTDB standalone instance, which includes 1 ConfigNode and 1 DataNode (commonly known as 1C1D).
+
+## Note
+
+1. Before installation, ensure that the system is complete by referring to [System Requirements](./Environment-Requirements.md).
+
+2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which can avoid the problem of the database failing to start because the host IP is modified later. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name, and then configure IoTDB's `cn_internal_address`, `dn_internal_address` and `dn_rpc_address` using the host name.
+
+   ```shell
+   echo "192.168.1.3 iotdb-1" >> /etc/hosts
+   ```
+
+3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings.
+
+4. Whether on Linux or Windows, ensure that the IoTDB installation path does not contain spaces or Chinese characters to avoid software exceptions.
+
+5. Please note that when installing and deploying IoTDB (including activating and using the software), the same user must be used for all operations. You can:
+
+   - Use the root user (recommended): using the root user avoids issues such as permissions.
+   - Use a fixed non-root user:
+     - Use the same user for all operations: ensure that the same user is used for start, activation, stop, and other operations, and do not switch users.
+     - Avoid using sudo: try to avoid sudo commands, as they execute commands with root privileges, which may cause confusion or security issues.
+
+6. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of the database operation status at any time. The monitoring panel can be obtained by contacting the business department, and the steps for deploying it can be found in [Monitoring Board Install and Deploy](./Monitoring-panel-deployment.md).
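+
+After editing /etc/hosts as described in note 2, it is worth confirming that the host name resolves to the intended address before it is written into the IoTDB configuration, for example:
+
+```shell
+# Both commands should report the IP added to /etc/hosts (192.168.1.3 in the example above)
+getent hosts iotdb-1
+ping -c 1 iotdb-1
+```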
+
+## Installation Steps
+
+### 1、Unzip the installation package and enter the installation directory
+
+```Plain
+unzip timechodb-{version}-bin.zip
+cd timechodb-{version}-bin
+```
+
+### 2、Parameter Configuration
+
+#### Memory Configuration
+
+- conf/confignode-env.sh (or .bat)
+
+  | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** |
+  | :---: | :---: | :---: | :---: | :---: |
+  | MEMORY_SIZE | The total amount of memory that the IoTDB ConfigNode can use | empty | Can be filled in as needed, and the system will allocate memory based on the value provided | Takes effect after restarting the service |
+
+- conf/datanode-env.sh (or .bat)
+
+  | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** |
+  | :---: | :---: | :---: | :---: | :---: |
+  | MEMORY_SIZE | The total amount of memory that the IoTDB DataNode can use | empty | Can be filled in as needed, and the system will allocate memory based on the value provided | Takes effect after restarting the service |
+
+#### Function Configuration
+
+The parameters that actually take effect in the system are in the file conf/iotdb-system.properties. The following parameters need to be set before the first startup; the full parameter list can be viewed in the conf/iotdb-system.properties file.
+
+Cluster function configuration
+
+| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** |
+| :---: | :---: | :---: | :---: | :---: |
+| cluster_name | Cluster Name | defaultCluster | The cluster name can be set as needed; if there are no special needs, keep the default | Cannot be modified after initial startup |
+| schema_replication_factor | Number of metadata replicas, set to 1 for the standalone version | 1 | 1 | Default 1, cannot be modified after the first startup |
+| data_replication_factor | Number of data replicas, set to 1 for the standalone version | 1 | 1 | Default 1, cannot be modified after the first startup |
+
+ConfigNode Configuration
+
+| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** |
+| :---: | :---: | :---: | :---: | :---: |
+| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPv4 address or host name of the server where it is located; the host name is recommended | Cannot be modified after initial startup |
+| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | Cannot be modified after initial startup |
+| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | Cannot be modified after initial startup |
+| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, `cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup |
+
+DataNode Configuration
+
+| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** |
+| :--- | :--- | :--- | :--- | :--- |
+| dn_rpc_address | The address of the client RPC service | 0.0.0.0 | The IPv4 address or host name of the server where it is located; the host name is recommended | Takes effect after restarting the service |
+| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | Takes effect after restarting the service |
+| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPv4 address or host name of the server where it is located; the host name is recommended | Cannot be modified after initial startup |
+| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | Cannot be modified after initial startup |
+| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | Cannot be modified after initial startup |
+| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | Cannot be modified after initial startup |
+| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | Cannot be modified after initial startup |
+| dn_seed_config_node | The ConfigNode address that the node connects to when registering to join the cluster, i.e. `cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup |
+
+### 3、Start ConfigNode
+
+Enter the sbin directory of IoTDB and start the ConfigNode:
+
+```shell
+./start-confignode.sh -d    # the "-d" option starts it in the background
+```
+
+If the startup fails, please refer to [Common Problem](#common-problem).
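+
+Before moving on, you can optionally confirm that the ConfigNode process is actually running. The commands below are only a sanity check; the log file name shown is the usual default and may differ between versions.
+
+```shell
+# Check that a ConfigNode process exists
+jps | grep ConfigNode
+
+# Optionally follow the ConfigNode log to watch the startup progress
+tail -f ../logs/log_confignode_all.log
+```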
+ +### 4、Start DataNode + + Enter the sbin directory of iotdb and start datanode: + +```shell + +cd sbin + +./start-datanode.sh -d # The "- d" parameter will start in the background + +``` + +### 5、Activate Database + +#### Method 1: Activate file copy activation + +- After starting the confignode datanode node, enter the activation folder and copy the systeminfo file to the Timecho staff + +- Received the license file returned by the staff + +- Place the license file in the activation folder of the corresponding node; + +#### Method 2: Activate Script Activation + +- Retrieve the machine codes of 3 machines in sequence and enter IoTDB CLI + + - Table Model CLI Enter Command: + + ```SQL + # Linux or MACOS + ./start-cli.sh -sql_dialect table + + # windows + ./start-cli.bat -sql_dialect table + ``` + + - Enter the tree model CLI command: + + ```SQL + # Linux or MACOS + ./start-cli.sh + + # windows + ./start-cli.bat + ``` + +- Execute the following to obtain the machine code required for activation: + - Note: Currently, activation is only supported in tree models + + ```Bash + + show system info + + ``` + +- The following information is displayed, which shows the machine code of one machine: + +```Bash ++--------------------------------------------------------------+ +| SystemInfo| ++--------------------------------------------------------------+ +| 01-TE5NLES4-UDDWCMYE| ++--------------------------------------------------------------+ +Total line number = 1 +It costs 0.030s +``` + +- Enter the activation code returned by the staff into the CLI and enter the following content + - Note: The activation code needs to be marked with a `'`symbol before and after, as shown in + +```Bash +IoTDB> activate '01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===' +``` + +### 6、Verify Activation + +When the "ClusterActivation Status" field is displayed as Activated, it indicates successful activation + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81.png) + +## Common Problem + +1. Multiple prompts indicating activation failure during deployment process + +​ - Use the `ls -al` command: Use the `ls -al` command to check if the owner information of the installation package root directory is the current user. + +​ - Check activation directory: Check all files in the `./activation` directory and whether the owner information is the current user. + +2. Confignode failed to start + + Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. + + Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. + + Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. + + Step 4: Clean up the environment: + + a. Terminate all ConfigNode Node and DataNode processes. + + ```Bash + # 1. Stop the ConfigNode and DataNode services + sbin/stop-standalone.sh + + # 2. Check for any remaining processes + jps + # Or + ps -ef|gerp iotdb + + # 3. 
If there are any remaining processes, manually kill the + kill -9 + # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + + b. Delete the data and logs directories. + + Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. + + ```Bash + cd /data/iotdb + rm -rf data logs + ``` + +## Appendix + +### Introduction to Configuration Node Parameters + +| Parameter | Description | Is it required | +| :-------- | :---------------------------------------------- | :----------------- | +| -d | Start in daemon mode, running in the background | No | + +### Introduction to Datanode Node Parameters + +| Abbreviation | Description | Is it required | +| :----------- | :----------------------------------------------------------- | :------------- | +| -v | Show version information | No | +| -f | Run the script in the foreground, do not put it in the background | No | +| -d | Start in daemon mode, i.e. run in the background | No | +| -p | Specify a file to store the process ID for process management | No | +| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | +| -g | Print detailed garbage collection (GC) information | No | +| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | +| -E | Specify the path of the JVM error log file | No | +| -D | Define system properties, in the format key=value | No | +| -X | Pass -XX parameters directly to the JVM | No | +| -h | Help instruction | No | + diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-CSharp-Native-API.md b/src/UserGuide/V2.0.1/Tree/API/Programming-CSharp-Native-API.md new file mode 100644 index 00000000..12d431a3 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-CSharp-Native-API.md @@ -0,0 +1,213 @@ + + +# C# Native API + +## Installation + +### Install from NuGet Package + +We have prepared Nuget Package for C# users. Users can directly install the client through .NET CLI. [The link of our NuGet Package is here](https://www.nuget.org/packages/Apache.IoTDB/). Run the following command in the command line to complete installation + +```sh +dotnet add package Apache.IoTDB +``` + +Note that the `Apache.IoTDB` package only supports versions greater than `.net framework 4.6.1`. + +## Prerequisites + + .NET SDK Version >= 5.0 + .NET Framework >= 4.6.1 + +## How to Use the Client (Quick Start) + +Users can quickly get started by referring to the use cases under the Apache-IoTDB-Client-CSharp-UserCase directory. These use cases serve as a useful resource for getting familiar with the client's functionality and capabilities. + +For those who wish to delve deeper into the client's usage and explore more advanced features, the samples directory contains additional code samples. 
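+
+For a first orientation, the sketch below shows a minimal program built only from calls described later on this page (SessionPool, CreateTimeSeries, InsertRecordAsync). It is a sketch rather than a verified sample: the namespaces and the location of the TSDataType / TSEncoding / Compressor enums are assumptions and may differ slightly between package versions.
+
+```csharp
+using System.Collections.Generic;
+using System.Threading.Tasks;
+using Apache.IoTDB;                 // SessionPool (namespace as published on NuGet)
+using Apache.IoTDB.DataStructure;   // RowRecord; namespace assumed, check the installed package
+
+class QuickStart
+{
+    static async Task Main()
+    {
+        // Open a small session pool against a local IoTDB instance
+        var sessionPool = new SessionPool("localhost", 6667, 2);
+        await sessionPool.Open(false);
+
+        // Create a timeseries and write one record to it
+        await sessionPool.CreateTimeSeries(
+            "root.demo.device1.ts1", TSDataType.INT32, TSEncoding.PLAIN, Compressor.UNCOMPRESSED);
+        await sessionPool.InsertRecordAsync(
+            "root.demo.device1",
+            new RowRecord(1, new List<object> { 123 }, new List<string> { "ts1" }));
+
+        await sessionPool.Close();
+    }
+}
+```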
+ +## Developer environment requirements for iotdb-client-csharp + +``` +.NET SDK Version >= 5.0 +.NET Framework >= 4.6.1 +ApacheThrift >= 0.14.1 +NLog >= 4.7.9 +``` + +### OS + +* Linux, Macos or other unix-like OS +* Windows+bash(WSL, cygwin, Git Bash) + +### Command Line Tools + +* dotnet CLI +* Thrift + +## Basic interface description + +The Session interface is semantically identical to other language clients + +```csharp +// Parameters +string host = "localhost"; +int port = 6667; +int pool_size = 2; + +// Init Session +var session_pool = new SessionPool(host, port, pool_size); + +// Open Session +await session_pool.Open(false); + +// Create TimeSeries +await session_pool.CreateTimeSeries("root.test_group.test_device.ts1", TSDataType.TEXT, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); +await session_pool.CreateTimeSeries("root.test_group.test_device.ts2", TSDataType.BOOLEAN, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); +await session_pool.CreateTimeSeries("root.test_group.test_device.ts3", TSDataType.INT32, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); + +// Insert Record +var measures = new List{"ts1", "ts2", "ts3"}; +var values = new List { "test_text", true, (int)123 }; +var timestamp = 1; +var rowRecord = new RowRecord(timestamp, values, measures); +await session_pool.InsertRecordAsync("root.test_group.test_device", rowRecord); + +// Insert Tablet +var timestamp_lst = new List{ timestamp + 1 }; +var value_lst = new List {"iotdb", true, (int) 12}; +var tablet = new Tablet("root.test_group.test_device", measures, value_lst, timestamp_ls); +await session_pool.InsertTabletAsync(tablet); + +// Close Session +await session_pool.Close(); +``` + +## **Row Record** + +- Encapsulate and abstract the `record` data in **IoTDB** +- e.g. + + | timestamp | status | temperature | + | --------- | ------ | ----------- | + | 1 | 0 | 20 | + +- Construction: + +```csharp +var rowRecord = + new RowRecord(long timestamps, List values, List measurements); +``` + +### **Tablet** + +- A data structure similar to a table, containing several non empty data blocks of a device's rows。 +- e.g. 
+ + | time | status | temperature | + | ---- | ------ | ----------- | + | 1 | 0 | 20 | + | 2 | 0 | 20 | + | 3 | 3 | 21 | + +- Construction: + +```csharp +var tablet = + Tablet(string deviceId, List measurements, List> values, List timestamps); +``` + + + +## **API** + +### **Basic API** + +| api name | parameters | notes | use example | +| -------------- | ------------------------- | ------------------------ | ----------------------------- | +| Open | bool | open session | session_pool.Open(false) | +| Close | null | close session | session_pool.Close() | +| IsOpen | null | check if session is open | session_pool.IsOpen() | +| OpenDebugMode | LoggingConfiguration=null | open debug mode | session_pool.OpenDebugMode() | +| CloseDebugMode | null | close debug mode | session_pool.CloseDebugMode() | +| SetTimeZone | string | set time zone | session_pool.GetTimeZone() | +| GetTimeZone | null | get time zone | session_pool.GetTimeZone() | + +### **Record API** + +| api name | parameters | notes | use example | +| ----------------------------------- | ----------------------------- | ----------------------------------- | ------------------------------------------------------------ | +| InsertRecordAsync | string, RowRecord | insert single record | session_pool.InsertRecordAsync("root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE", new RowRecord(1, values, measures)); | +| InsertRecordsAsync | List\, List\ | insert records | session_pool.InsertRecordsAsync(device_id, rowRecords) | +| InsertRecordsOfOneDeviceAsync | string, List\ | insert records of one device | session_pool.InsertRecordsOfOneDeviceAsync(device_id, rowRecords) | +| InsertRecordsOfOneDeviceSortedAsync | string, List\ | insert sorted records of one device | InsertRecordsOfOneDeviceSortedAsync(deviceId, sortedRowRecords); | +| TestInsertRecordAsync | string, RowRecord | test insert record | session_pool.TestInsertRecordAsync("root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE", rowRecord) | +| TestInsertRecordsAsync | List\, List\ | test insert record | session_pool.TestInsertRecordsAsync(device_id, rowRecords) | + +### **Tablet API** + +| api name | parameters | notes | use example | +| ---------------------- | ------------ | -------------------- | -------------------------------------------- | +| InsertTabletAsync | Tablet | insert single tablet | session_pool.InsertTabletAsync(tablet) | +| InsertTabletsAsync | List\ | insert tablets | session_pool.InsertTabletsAsync(tablets) | +| TestInsertTabletAsync | Tablet | test insert tablet | session_pool.TestInsertTabletAsync(tablet) | +| TestInsertTabletsAsync | List\ | test insert tablets | session_pool.TestInsertTabletsAsync(tablets) | + +### **SQL API** + +| api name | parameters | notes | use example | +| ----------------------------- | ---------- | ------------------------------ | ------------------------------------------------------------ | +| ExecuteQueryStatementAsync | string | execute sql query statement | session_pool.ExecuteQueryStatementAsync("select * from root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE where time<15"); | +| ExecuteNonQueryStatementAsync | string | execute sql nonquery statement | session_pool.ExecuteNonQueryStatementAsync( "create timeseries root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE.status with datatype=BOOLEAN,encoding=PLAIN") | + +### **Scheam API** + +| api name | parameters | notes | use example | +| -------------------------- | ------------------------------------------------------------ | 
--------------------------- | ------------------------------------------------------------ | +| SetStorageGroup | string | set storage group | session_pool.SetStorageGroup("root.97209_TEST_CSHARP_CLIENT_GROUP_01") | +| CreateTimeSeries | string, TSDataType, TSEncoding, Compressor | create time series | session_pool.InsertTabletsAsync(tablets) | +| DeleteStorageGroupAsync | string | delete single storage group | session_pool.DeleteStorageGroupAsync("root.97209_TEST_CSHARP_CLIENT_GROUP_01") | +| DeleteStorageGroupsAsync | List\ | delete storage group | session_pool.DeleteStorageGroupAsync("root.97209_TEST_CSHARP_CLIENT_GROUP") | +| CreateMultiTimeSeriesAsync | List\, List\ , List\ , List\ | create multi time series | session_pool.CreateMultiTimeSeriesAsync(ts_path_lst, data_type_lst, encoding_lst, compressor_lst); | +| DeleteTimeSeriesAsync | List\ | delete time series | | +| DeleteTimeSeriesAsync | string | delete time series | | +| DeleteDataAsync | List\, long, long | delete data | session_pool.DeleteDataAsync(ts_path_lst, 2, 3) | + +### **Other API** + +| api name | parameters | notes | use example | +| -------------------------- | ---------- | --------------------------- | ---------------------------------------------------- | +| CheckTimeSeriesExistsAsync | string | check if time series exists | session_pool.CheckTimeSeriesExistsAsync(time series) | + + + +[e.g.](https://github.com/apache/iotdb-client-csharp/tree/main/samples/Apache.IoTDB.Samples) + +## SessionPool + +To implement concurrent client requests, we provide a `SessionPool` for the native interface. Since `SessionPool` itself is a superset of `Session`, when `SessionPool` is a When the `pool_size` parameter is set to 1, it reverts to the original `Session` + +We use the `ConcurrentQueue` data structure to encapsulate a client queue to maintain multiple connections with the server. When the `Open()` interface is called, a specified number of clients are created in the queue, and synchronous access to the queue is achieved through the `System.Threading.Monitor` class. + +When a request occurs, it will try to find an idle client connection from the Connection pool. If there is no idle connection, the program will need to wait until there is an idle connection + +When a connection is used up, it will automatically return to the pool and wait for the next time it is used up + diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-Cpp-Native-API.md b/src/UserGuide/V2.0.1/Tree/API/Programming-Cpp-Native-API.md new file mode 100644 index 00000000..83f024d8 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-Cpp-Native-API.md @@ -0,0 +1,428 @@ + + +# C++ Native API + +## Dependencies + +- Java 8+ +- Flex +- Bison 2.7+ +- Boost 1.56+ +- OpenSSL 1.0+ +- GCC 5.5.0+ + +## Installation + +### Install Required Dependencies + +- **MAC** + 1. Install Bison: + + Use the following brew command to install the Bison version: + ```shell + brew install bison + ``` + + 2. Install Boost: Make sure to install the latest version of Boost. + + ```shell + brew install boost + ``` + + 3. Check OpenSSL: Make sure the OpenSSL library is installed. The default OpenSSL header file path is "/usr/local/opt/openssl/include". + + If you encounter errors related to OpenSSL not being found during compilation, try adding `-Dopenssl.include.dir=""`. 
+ +- **Ubuntu 16.04+ or Other Debian-based Systems** + + Use the following commands to install dependencies: + + ```shell + sudo apt-get update + sudo apt-get install gcc g++ bison flex libboost-all-dev libssl-dev + ``` + +- **CentOS 7.7+/Fedora/Rocky Linux or Other Red Hat-based Systems** + + Use the yum command to install dependencies: + + ```shell + sudo yum update + sudo yum install gcc gcc-c++ boost-devel bison flex openssl-devel + ``` + +- **Windows** + + 1. Set Up the Build Environment + - Install MS Visual Studio (version 2019+ recommended): Make sure to select Visual Studio C/C++ IDE and compiler (supporting CMake, Clang, MinGW) during installation. + - Download and install [CMake](https://cmake.org/download/). + + 2. Download and Install Flex, Bison + - Download [Win_Flex_Bison](https://sourceforge.net/projects/winflexbison/). + - After downloading, rename the executables to flex.exe and bison.exe to ensure they can be found during compilation, and add the directory of these executables to the PATH environment variable. + + 3. Install Boost Library + - Download [Boost](https://www.boost.org/users/download/). + - Compile Boost locally: Run `bootstrap.bat` and `b2.exe` in sequence. + - Add the Boost installation directory to the PATH environment variable, e.g., `C:\Program Files (x86)\boost_1_78_0`. + + 4. Install OpenSSL + - Download and install [OpenSSL](http://slproweb.com/products/Win32OpenSSL.html). + - Add the include directory under the installation directory to the PATH environment variable. + +### Compilation + +Clone the source code from git: +```shell +git clone https://github.com/apache/iotdb.git +``` + +The default main branch is the master branch. If you want to use a specific release version, switch to that branch (e.g., version 1.3.2): +```shell +git checkout rc/1.3.2 +``` + +Run Maven to compile in the IoTDB root directory: + +- Mac or Linux with glibc version >= 2.32 + ```shell + ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp + ``` + +- Linux with glibc version >= 2.31 + ```shell + ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Diotdb-tools-thrift.version=0.14.1.1-old-glibc-SNAPSHOT + ``` + +- Linux with glibc version >= 2.17 + ```shell + ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Diotdb-tools-thrift.version=0.14.1.1-glibc223-SNAPSHOT + ``` + +- Windows using Visual Studio 2022 + ```Batchfile + .\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp + ``` + +- Windows using Visual Studio 2019 + ```Batchfile + .\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Dcmake.generator="Visual Studio 16 2019" -Diotdb-tools-thrift.version=0.14.1.1-msvc142-SNAPSHOT + ``` + - If you haven't added the Boost library path to the PATH environment variable, you need to add the relevant parameters to the compile command, e.g., `-DboostIncludeDir="C:\Program Files (x86)\boost_1_78_0" -DboostLibraryDir="C:\Program Files (x86)\boost_1_78_0\stage\lib"`. + +After successful compilation, the packaged library files will be located in `iotdb-client/client-cpp/target`, and you can find the compiled example program under `example/client-cpp-example/target`. + +### Compilation Q&A + +Q: What are the requirements for the environment on Linux? + +A: +- The known minimum version requirement for glibc (x86_64 version) is 2.17, and the minimum version for GCC is 5.5. 
+- The known minimum version requirement for glibc (ARM version) is 2.31, and the minimum version for GCC is 10.2. +- If the above requirements are not met, you can try compiling Thrift locally: + - Download the code from https://github.com/apache/iotdb-bin-resources/tree/iotdb-tools-thrift-v0.14.1.0/iotdb-tools-thrift. + - Run `./mvnw clean install`. + - Go back to the IoTDB code directory and run `./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp`. + +Q: How to resolve the `undefined reference to '_libc_single_thread'` error during Linux compilation? + +A: +- This issue is caused by the precompiled Thrift dependencies requiring a higher version of glibc. +- You can try adding `-Diotdb-tools-thrift.version=0.14.1.1-glibc223-SNAPSHOT` or `-Diotdb-tools-thrift.version=0.14.1.1-old-glibc-SNAPSHOT` to the Maven compile command. + +Q: What if I need to compile using Visual Studio 2017 or earlier on Windows? + +A: +- You can try compiling Thrift locally before compiling the client: + - Download the code from https://github.com/apache/iotdb-bin-resources/tree/iotdb-tools-thrift-v0.14.1.0/iotdb-tools-thrift. + - Run `.\mvnw.cmd clean install`. + - Go back to the IoTDB code directory and run `.\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Dcmake.generator="Visual Studio 15 2017"`. + + +## Native APIs + +Here we show the commonly used interfaces and their parameters in the Native API: + +### Initialization + +- Open a Session +```cpp +void open(); +``` + +- Open a session, with a parameter to specify whether to enable RPC compression +```cpp +void open(bool enableRPCCompression); +``` +Notice: this RPC compression status of client must comply with that of IoTDB server + +- Close a Session +```cpp +void close(); +``` + +### Data Definition Interface (DDL) + +#### Database Management + +- CREATE DATABASE +```cpp +void setStorageGroup(const std::string &storageGroupId); +``` + +- Delete one or several databases +```cpp +void deleteStorageGroup(const std::string &storageGroup); +void deleteStorageGroups(const std::vector &storageGroups); +``` + +#### Timeseries Management + +- Create one or multiple timeseries +```cpp +void createTimeseries(const std::string &path, TSDataType::TSDataType dataType, TSEncoding::TSEncoding encoding, + CompressionType::CompressionType compressor); + +void createMultiTimeseries(const std::vector &paths, + const std::vector &dataTypes, + const std::vector &encodings, + const std::vector &compressors, + std::vector> *propsList, + std::vector> *tagsList, + std::vector> *attributesList, + std::vector *measurementAliasList); +``` + +- Create aligned timeseries +```cpp +void createAlignedTimeseries(const std::string &deviceId, + const std::vector &measurements, + const std::vector &dataTypes, + const std::vector &encodings, + const std::vector &compressors); +``` + +- Delete one or several timeseries +```cpp +void deleteTimeseries(const std::string &path); +void deleteTimeseries(const std::vector &paths); +``` + +- Check whether the specific timeseries exists. +```cpp +bool checkTimeseriesExists(const std::string &path); +``` + +#### Schema Template + +- Create a schema template +```cpp +void createSchemaTemplate(const Template &templ); +``` + +- Set the schema template named `templateName` at path `prefixPath`. 
+```cpp +void setSchemaTemplate(const std::string &template_name, const std::string &prefix_path); +``` + +- Unset the schema template +```cpp +void unsetSchemaTemplate(const std::string &prefix_path, const std::string &template_name); +``` + +- After measurement template created, you can edit the template with belowed APIs. +```cpp +// Add aligned measurements to a template +void addAlignedMeasurementsInTemplate(const std::string &template_name, + const std::vector &measurements, + const std::vector &dataTypes, + const std::vector &encodings, + const std::vector &compressors); + +// Add one aligned measurement to a template +void addAlignedMeasurementsInTemplate(const std::string &template_name, + const std::string &measurement, + TSDataType::TSDataType dataType, + TSEncoding::TSEncoding encoding, + CompressionType::CompressionType compressor); + +// Add unaligned measurements to a template +void addUnalignedMeasurementsInTemplate(const std::string &template_name, + const std::vector &measurements, + const std::vector &dataTypes, + const std::vector &encodings, + const std::vector &compressors); + +// Add one unaligned measurement to a template +void addUnalignedMeasurementsInTemplate(const std::string &template_name, + const std::string &measurement, + TSDataType::TSDataType dataType, + TSEncoding::TSEncoding encoding, + CompressionType::CompressionType compressor); + +// Delete a node in template and its children +void deleteNodeInTemplate(const std::string &template_name, const std::string &path); +``` + +- You can query measurement templates with these APIS: +```cpp +// Return the amount of measurements inside a template +int countMeasurementsInTemplate(const std::string &template_name); + +// Return true if path points to a measurement, otherwise returne false +bool isMeasurementInTemplate(const std::string &template_name, const std::string &path); + +// Return true if path exists in template, otherwise return false +bool isPathExistInTemplate(const std::string &template_name, const std::string &path); + +// Return all measurements paths inside template +std::vector showMeasurementsInTemplate(const std::string &template_name); + +// Return all measurements paths under the designated patter inside template +std::vector showMeasurementsInTemplate(const std::string &template_name, const std::string &pattern); +``` + + +### Data Manipulation Interface (DML) + +#### Insert + +> It is recommended to use insertTablet to help improve write efficiency. + +- Insert a Tablet,which is multiple rows of a device, each row has the same measurements + - Better Write Performance + - Support null values: fill the null value with any value, and then mark the null value via BitMap +```cpp +void insertTablet(Tablet &tablet); +``` + +- Insert multiple Tablets +```cpp +void insertTablets(std::unordered_map &tablets); +``` + +- Insert a Record, which contains multiple measurement value of a device at a timestamp +```cpp +void insertRecord(const std::string &deviceId, int64_t time, const std::vector &measurements, + const std::vector &types, const std::vector &values); +``` + +- Insert multiple Records +```cpp +void insertRecords(const std::vector &deviceIds, + const std::vector ×, + const std::vector> &measurementsList, + const std::vector> &typesList, + const std::vector> &valuesList); +``` + +- Insert multiple Records that belong to the same device. 
With type info the server has no need to do type inference, which leads a better performance +```cpp +void insertRecordsOfOneDevice(const std::string &deviceId, + std::vector ×, + std::vector> &measurementsList, + std::vector> &typesList, + std::vector> &valuesList); +``` + +#### Insert with type inference + +Without type information, server has to do type inference, which may cost some time. + +```cpp +void insertRecord(const std::string &deviceId, int64_t time, const std::vector &measurements, + const std::vector &values); + + +void insertRecords(const std::vector &deviceIds, + const std::vector ×, + const std::vector> &measurementsList, + const std::vector> &valuesList); + + +void insertRecordsOfOneDevice(const std::string &deviceId, + std::vector ×, + std::vector> &measurementsList, + const std::vector> &valuesList); +``` + +#### Insert data into Aligned Timeseries + +The Insert of aligned timeseries uses interfaces like `insertAlignedXXX`, and others are similar to the above interfaces: + +- insertAlignedRecord +- insertAlignedRecords +- insertAlignedRecordsOfOneDevice +- insertAlignedTablet +- insertAlignedTablets + +#### Delete + +- Delete data in a time range of one or several timeseries +```cpp +void deleteData(const std::string &path, int64_t endTime); +void deleteData(const std::vector &paths, int64_t endTime); +void deleteData(const std::vector &paths, int64_t startTime, int64_t endTime); +``` + +### IoTDB-SQL Interface + +- Execute query statement +```cpp +unique_ptr executeQueryStatement(const std::string &sql); +``` + +- Execute non query statement +```cpp +void executeNonQueryStatement(const std::string &sql); +``` + + +## Examples + +The sample code of using these interfaces is in: + +- `example/client-cpp-example/src/SessionExample.cpp` +- `example/client-cpp-example/src/AlignedTimeseriesSessionExample.cpp` (Aligned Timeseries) + +If the compilation finishes successfully, the example project will be placed under `example/client-cpp-example/target` + +## FAQ + +### on Mac + +If errors occur when compiling thrift source code, try to downgrade your xcode-commandline from 12 to 11.5 + +see https://stackoverflow.com/questions/63592445/ld-unsupported-tapi-file-type-tapi-tbd-in-yaml-file/65518087#65518087 + + +### on Windows + +When Building Thrift and downloading packages via "wget", a possible annoying issue may occur with +error message looks like: +```shell +Failed to delete cached file C:\Users\Administrator\.m2\repository\.cache\download-maven-plugin\index.ser +``` +Possible fixes: +- Try to delete the ".m2\repository\\.cache\" directory and try again. +- Add "\true\" configuration to the download-maven-plugin maven phase that complains this error. 
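+
+Putting the interfaces above together, the following condensed sketch shows a typical session. It is modeled on the bundled SessionExample.cpp rather than copied from it; in particular, the Session constructor arguments (host, port, user, password) and the result-set iteration style are assumptions that should be checked against that example.
+
+```cpp
+#include <iostream>
+#include <memory>
+#include <string>
+#include <vector>
+#include "Session.h"
+
+int main() {
+    // Connect to a local IoTDB instance (constructor arguments assumed: host, port, user, password)
+    auto session = std::make_shared<Session>("127.0.0.1", 6667, "root", "root");
+    session->open(false);
+
+    // Create a timeseries and write one record using the type-inference insertRecord overload
+    session->createTimeseries("root.demo.d1.s1", TSDataType::INT32,
+                              TSEncoding::RLE, CompressionType::SNAPPY);
+    std::vector<std::string> measurements{"s1"};
+    std::vector<std::string> values{"123"};
+    session->insertRecord("root.demo.d1", 1, measurements, values);
+
+    // Read the data back with a SQL query
+    auto dataSet = session->executeQueryStatement("select s1 from root.demo.d1");
+    while (dataSet->hasNext()) {
+        std::cout << dataSet->next()->toString() << std::endl;
+    }
+
+    session->close();
+    return 0;
+}
+```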
+ diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-Go-Native-API.md b/src/UserGuide/V2.0.1/Tree/API/Programming-Go-Native-API.md new file mode 100644 index 00000000..b227ed67 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-Go-Native-API.md @@ -0,0 +1,64 @@ + + +# Go Native API + +The Git repository for the Go Native API client is located [here](https://github.com/apache/iotdb-client-go/) + +## Dependencies + + * golang >= 1.13 + * make >= 3.0 + * curl >= 7.1.1 + * thrift 0.15.0 + * Linux、Macos or other unix-like systems + * Windows+bash (WSL、cygwin、Git Bash) + +## Installation + + * go mod + +```sh +export GO111MODULE=on +export GOPROXY=https://goproxy.io + +mkdir session_example && cd session_example + +curl -o session_example.go -L https://github.com/apache/iotdb-client-go/raw/main/example/session_example.go + +go mod init session_example +go run session_example.go +``` + +* GOPATH + +```sh +# get thrift 0.15.0 +go get github.com/apache/thrift +cd $GOPATH/src/github.com/apache/thrift +git checkout 0.15.0 + +mkdir -p $GOPATH/src/iotdb-client-go-example/session_example +cd $GOPATH/src/iotdb-client-go-example/session_example +curl -o session_example.go -L https://github.com/apache/iotdb-client-go/raw/main/example/session_example.go +go run session_example.go +``` + diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-JDBC.md b/src/UserGuide/V2.0.1/Tree/API/Programming-JDBC.md new file mode 100644 index 00000000..0251e469 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-JDBC.md @@ -0,0 +1,296 @@ + + +# JDBC (Not Recommend) + +*NOTICE: CURRENTLY, JDBC IS USED FOR CONNECTING SOME THIRD-PART TOOLS. +IT CAN NOT PROVIDE HIGH THROUGHPUT FOR WRITE OPERATIONS. +PLEASE USE [Java Native API](./Programming-Java-Native-API.md) INSTEAD* + +## Dependencies + +* JDK >= 1.8+ +* Maven >= 3.9+ + +## Installation + +In root directory: + +```shell +mvn clean install -pl iotdb-client/jdbc -am -DskipTests +``` + +## Use IoTDB JDBC with Maven + +```xml + + + org.apache.iotdb + iotdb-jdbc + 1.3.1 + + +``` + +## Coding Examples + +This chapter provides an example of how to open a database connection, execute an SQL query, and display the results. + +It requires including the packages containing the JDBC classes needed for database programming. + +**NOTE: For faster insertion, the insertTablet() in Session is recommended.** + +```java +import java.sql.*; +import org.apache.iotdb.jdbc.IoTDBSQLException; + +public class JDBCExample { + /** + * Before executing a SQL statement with a Statement object, you need to create a Statement object using the createStatement() method of the Connection object. + * After creating a Statement object, you can use its execute() method to execute a SQL statement + * Finally, remember to close the 'statement' and 'connection' objects by using their close() method + * For statements with query results, we can use the getResultSet() method of the Statement object to get the result set. 
+ */ + public static void main(String[] args) throws SQLException { + Connection connection = getConnection(); + if (connection == null) { + System.out.println("get connection defeat"); + return; + } + Statement statement = connection.createStatement(); + //Create database + try { + statement.execute("CREATE DATABASE root.demo"); + }catch (IoTDBSQLException e){ + System.out.println(e.getMessage()); + } + + + //SHOW DATABASES + statement.execute("SHOW DATABASES"); + outputResult(statement.getResultSet()); + + //Create time series + //Different data type has different encoding methods. Here use INT32 as an example + try { + statement.execute("CREATE TIMESERIES root.demo.s0 WITH DATATYPE=INT32,ENCODING=RLE;"); + }catch (IoTDBSQLException e){ + System.out.println(e.getMessage()); + } + //Show time series + statement.execute("SHOW TIMESERIES root.demo"); + outputResult(statement.getResultSet()); + //Show devices + statement.execute("SHOW DEVICES"); + outputResult(statement.getResultSet()); + //Count time series + statement.execute("COUNT TIMESERIES root"); + outputResult(statement.getResultSet()); + //Count nodes at the given level + statement.execute("COUNT NODES root LEVEL=3"); + outputResult(statement.getResultSet()); + //Count timeseries group by each node at the given level + statement.execute("COUNT TIMESERIES root GROUP BY LEVEL=3"); + outputResult(statement.getResultSet()); + + + //Execute insert statements in batch + statement.addBatch("INSERT INTO root.demo(timestamp,s0) VALUES(1,1);"); + statement.addBatch("INSERT INTO root.demo(timestamp,s0) VALUES(1,1);"); + statement.addBatch("INSERT INTO root.demo(timestamp,s0) VALUES(2,15);"); + statement.addBatch("INSERT INTO root.demo(timestamp,s0) VALUES(2,17);"); + statement.addBatch("INSERT INTO root.demo(timestamp,s0) values(4,12);"); + statement.executeBatch(); + statement.clearBatch(); + + //Full query statement + String sql = "SELECT * FROM root.demo"; + ResultSet resultSet = statement.executeQuery(sql); + System.out.println("sql: " + sql); + outputResult(resultSet); + + //Exact query statement + sql = "SELECT s0 FROM root.demo WHERE time = 4;"; + resultSet= statement.executeQuery(sql); + System.out.println("sql: " + sql); + outputResult(resultSet); + + //Time range query + sql = "SELECT s0 FROM root.demo WHERE time >= 2 AND time < 5;"; + resultSet = statement.executeQuery(sql); + System.out.println("sql: " + sql); + outputResult(resultSet); + + //Aggregate query + sql = "SELECT COUNT(s0) FROM root.demo;"; + resultSet = statement.executeQuery(sql); + System.out.println("sql: " + sql); + outputResult(resultSet); + + //Delete time series + statement.execute("DELETE timeseries root.demo.s0"); + + //close connection + statement.close(); + connection.close(); + } + + public static Connection getConnection() { + // JDBC driver name and database URL + String driver = "org.apache.iotdb.jdbc.IoTDBDriver"; + String url = "jdbc:iotdb://127.0.0.1:6667/"; + // set rpc compress mode + // String url = "jdbc:iotdb://127.0.0.1:6667?rpc_compress=true"; + + // Database credentials + String username = "root"; + String password = "root"; + + Connection connection = null; + try { + Class.forName(driver); + connection = DriverManager.getConnection(url, username, password); + } catch (ClassNotFoundException e) { + e.printStackTrace(); + } catch (SQLException e) { + e.printStackTrace(); + } + return connection; + } + + /** + * This is an example of outputting the results in the ResultSet + */ + private static void outputResult(ResultSet resultSet) throws 
SQLException { + if (resultSet != null) { + System.out.println("--------------------------"); + final ResultSetMetaData metaData = resultSet.getMetaData(); + final int columnCount = metaData.getColumnCount(); + for (int i = 0; i < columnCount; i++) { + System.out.print(metaData.getColumnLabel(i + 1) + " "); + } + System.out.println(); + while (resultSet.next()) { + for (int i = 1; ; i++) { + System.out.print(resultSet.getString(i)); + if (i < columnCount) { + System.out.print(", "); + } else { + System.out.println(); + break; + } + } + } + System.out.println("--------------------------\n"); + } + } +} +``` + +The parameter `version` can be used in the url: +````java +String url = "jdbc:iotdb://127.0.0.1:6667?version=V_1_0"; +```` +The parameter `version` represents the SQL semantic version used by the client, which is used in order to be compatible with the SQL semantics of `0.12` when upgrading to `0.13`. +The possible values are: `V_0_12`, `V_0_13`, `V_1_0`. + +In addition, IoTDB provides additional interfaces in JDBC for users to read and write the database using different character sets (e.g., GB18030) in the connection. +The default character set for IoTDB is UTF-8. When users want to use a character set other than UTF-8, they need to specify the charset property in the JDBC connection. For example: +1. Create a connection using the GB18030 charset: +```java +DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667?charset=GB18030", "root", "root"); +``` +2. When executing SQL with the `IoTDBStatement` interface, the SQL can be provided as a `byte[]` array, and it will be parsed into a string according to the specified charset. +```java +public boolean execute(byte[] sql) throws SQLException; +``` +3. When outputting query results, the `getBytes` method of `ResultSet` can be used to get `byte[]`, which will be encoded using the charset specified in the connection. 
+```java +System.out.print(resultSet.getString(i) + " (" + new String(resultSet.getBytes(i), charset) + ")"); +``` +Here is a complete example: +```java +public class JDBCCharsetExample { + + private static final Logger LOGGER = LoggerFactory.getLogger(JDBCCharsetExample.class); + + public static void main(String[] args) throws Exception { + Class.forName("org.apache.iotdb.jdbc.IoTDBDriver"); + + try (final Connection connection = + DriverManager.getConnection( + "jdbc:iotdb://127.0.0.1:6667?charset=GB18030", "root", "root"); + final IoTDBStatement statement = (IoTDBStatement) connection.createStatement()) { + + final String insertSQLWithGB18030 = + "insert into root.测试(timestamp, 维语, 彝语, 繁体, 蒙文, 简体, 标点符号, 藏语) values(1, 'ئۇيغۇر تىلى', 'ꆈꌠꉙ', \"繁體\", 'ᠮᠣᠩᠭᠣᠯ ᠬᠡᠯᠡ', '简体', '——?!', \"བོད་སྐད།\");"; + final byte[] insertSQLWithGB18030Bytes = insertSQLWithGB18030.getBytes("GB18030"); + statement.execute(insertSQLWithGB18030Bytes); + } catch (IoTDBSQLException e) { + LOGGER.error("IoTDB Jdbc example error", e); + } + + outputResult("GB18030"); + outputResult("UTF-8"); + outputResult("UTF-16"); + outputResult("GBK"); + outputResult("ISO-8859-1"); + } + + private static void outputResult(String charset) throws SQLException { + System.out.println("[Charset: " + charset + "]"); + try (final Connection connection = + DriverManager.getConnection( + "jdbc:iotdb://127.0.0.1:6667?charset=" + charset, "root", "root"); + final IoTDBStatement statement = (IoTDBStatement) connection.createStatement()) { + outputResult(statement.executeQuery("select ** from root"), Charset.forName(charset)); + } catch (IoTDBSQLException e) { + LOGGER.error("IoTDB Jdbc example error", e); + } + } + + private static void outputResult(ResultSet resultSet, Charset charset) throws SQLException { + if (resultSet != null) { + System.out.println("--------------------------"); + final ResultSetMetaData metaData = resultSet.getMetaData(); + final int columnCount = metaData.getColumnCount(); + for (int i = 0; i < columnCount; i++) { + System.out.print(metaData.getColumnLabel(i + 1) + " "); + } + System.out.println(); + + while (resultSet.next()) { + for (int i = 1; ; i++) { + System.out.print( + resultSet.getString(i) + " (" + new String(resultSet.getBytes(i), charset) + ")"); + if (i < columnCount) { + System.out.print(", "); + } else { + System.out.println(); + break; + } + } + } + System.out.println("--------------------------\n"); + } + } +} +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-Java-Native-API.md b/src/UserGuide/V2.0.1/Tree/API/Programming-Java-Native-API.md new file mode 100644 index 00000000..387a9e07 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-Java-Native-API.md @@ -0,0 +1,842 @@ + + +# Java Native API + +## Installation + +### Dependencies + +* JDK >= 1.8 +* Maven >= 3.6 + + +### Using IoTDB Java Native API with Maven + +```xml + + + org.apache.iotdb + iotdb-session + 1.0.0 + + +``` + +## Syntax Convention + +- **IoTDB-SQL interface:** The input SQL parameter needs to conform to the [syntax conventions](../User-Manual/Syntax-Rule.md#Literal-Values) and be escaped for JAVA strings. For example, you need to add a backslash before the double-quotes. (That is: after JAVA escaping, it is consistent with the SQL statement executed on the command line.) +- **Other interfaces:** + - The node names in path or path prefix as parameter: The node names which should be escaped by backticks (`) in the SQL statement, escaping is required here. 
+ - Identifiers (such as template names) as parameters: The identifiers which should be escaped by backticks (`) in the SQL statement, and escaping is not required here. +- **Code example for syntax convention could be found at:** `example/session/src/main/java/org/apache/iotdb/SyntaxConventionRelatedExample.java` + +## Native APIs + +Here we show the commonly used interfaces and their parameters in the Native API: + +### Session Management + +* Initialize a Session + +``` java +// use default configuration +session = new Session.Builder.build(); + +// initialize with a single node +session = + new Session.Builder() + .host(String host) + .port(int port) + .build(); + +// initialize with multiple nodes +session = + new Session.Builder() + .nodeUrls(List nodeUrls) + .build(); + +// other configurations +session = + new Session.Builder() + .fetchSize(int fetchSize) + .username(String username) + .password(String password) + .thriftDefaultBufferSize(int thriftDefaultBufferSize) + .thriftMaxFrameSize(int thriftMaxFrameSize) + .enableRedirection(boolean enableRedirection) + .version(Version version) + .build(); +``` + +Version represents the SQL semantic version used by the client, which is used to be compatible with the SQL semantics of 0.12 when upgrading 0.13. The possible values are: `V_0_12`, `V_0_13`, `V_1_0`, etc. + + +* Open a Session + +``` java +void open() +``` + +* Open a session, with a parameter to specify whether to enable RPC compression + +``` java +void open(boolean enableRPCCompression) +``` + +Notice: this RPC compression status of client must comply with that of IoTDB server + +* Close a Session + +``` java +void close() +``` + +* SessionPool + +We provide a connection pool (`SessionPool) for Native API. +Using the interface, you need to define the pool size. + +If you can not get a session connection in 60 seconds, there is a warning log but the program will hang. + +If a session has finished an operation, it will be put back to the pool automatically. +If a session connection is broken, the session will be removed automatically and the pool will try +to create a new session and redo the operation. +You can also specify an url list of multiple reachable nodes when creating a SessionPool, just as you would when creating a Session. To ensure high availability of clients in distributed cluster. + +For query operations: + +1. When using SessionPool to query data, the result set is `SessionDataSetWrapper`; +2. Given a `SessionDataSetWrapper`, if you have not scanned all the data in it and stop to use it, +you have to call `SessionPool.closeResultSet(wrapper)` manually; +3. When you call `hasNext()` and `next()` of a `SessionDataSetWrapper` and there is an exception, then +you have to call `SessionPool.closeResultSet(wrapper)` manually; +4. 
You can call `getColumnNames()` of `SessionDataSetWrapper` to get the column names of query result; + +Examples: ```session/src/test/java/org/apache/iotdb/session/pool/SessionPoolTest.java``` + +Or `example/session/src/main/java/org/apache/iotdb/SessionPoolExample.java` + + +### Database & Timeseries Management API + +#### Database Management + +* CREATE DATABASE + +``` java +void setStorageGroup(String storageGroupId) +``` + +* Delete one or several databases + +``` java +void deleteStorageGroup(String storageGroup) +void deleteStorageGroups(List storageGroups) +``` + +#### Timeseries Management + +* Create one or multiple timeseries + +``` java +void createTimeseries(String path, TSDataType dataType, + TSEncoding encoding, CompressionType compressor, Map props, + Map tags, Map attributes, String measurementAlias) + +void createMultiTimeseries(List paths, List dataTypes, + List encodings, List compressors, + List> propsList, List> tagsList, + List> attributesList, List measurementAliasList) +``` + +* Create aligned timeseries +``` +void createAlignedTimeseries(String prefixPath, List measurements, + List dataTypes, List encodings, + List compressors, List measurementAliasList); +``` + +Attention: Alias of measurements are **not supported** currently. + +* Delete one or several timeseries + +``` java +void deleteTimeseries(String path) +void deleteTimeseries(List paths) +``` + +* Check whether the specific timeseries exists. + +``` java +boolean checkTimeseriesExists(String path) +``` + +#### Schema Template + + +Create a schema template for massive identical devices will help to improve memory performance. You can use Template, InternalNode and MeasurementNode to depict the structure of the template, and use belowed interface to create it inside session. + +``` java +public void createSchemaTemplate(Template template); + +Class Template { + private String name; + private boolean directShareTime; + Map children; + public Template(String name, boolean isShareTime); + + public void addToTemplate(Node node); + public void deleteFromTemplate(String name); + public void setShareTime(boolean shareTime); +} + +Abstract Class Node { + private String name; + public void addChild(Node node); + public void deleteChild(Node node); +} + +Class MeasurementNode extends Node { + TSDataType dataType; + TSEncoding encoding; + CompressionType compressor; + public MeasurementNode(String name, + TSDataType dataType, + TSEncoding encoding, + CompressionType compressor); +} +``` + +We strongly suggest you implement templates only with flat-measurement (like object 'flatTemplate' in belowed snippet), since tree-structured template may not be a long-term supported feature in further version of IoTDB. 
+ +A snippet of using above Method and Class: + +``` java +MeasurementNode nodeX = new MeasurementNode("x", TSDataType.FLOAT, TSEncoding.RLE, CompressionType.SNAPPY); +MeasurementNode nodeY = new MeasurementNode("y", TSDataType.FLOAT, TSEncoding.RLE, CompressionType.SNAPPY); +MeasurementNode nodeSpeed = new MeasurementNode("speed", TSDataType.DOUBLE, TSEncoding.GORILLA, CompressionType.SNAPPY); + +// This is the template we suggest to implement +Template flatTemplate = new Template("flatTemplate"); +template.addToTemplate(nodeX); +template.addToTemplate(nodeY); +template.addToTemplate(nodeSpeed); + +createSchemaTemplate(flatTemplate); +``` + +You can query measurement inside templates with these APIS: + +```java +// Return the amount of measurements inside a template +public int countMeasurementsInTemplate(String templateName); + +// Return true if path points to a measurement, otherwise returne false +public boolean isMeasurementInTemplate(String templateName, String path); + +// Return true if path exists in template, otherwise return false +public boolean isPathExistInTemplate(String templateName, String path); + +// Return all measurements paths inside template +public List showMeasurementsInTemplate(String templateName); + +// Return all measurements paths under the designated patter inside template +public List showMeasurementsInTemplate(String templateName, String pattern); +``` + +To implement schema template, you can set the measurement template named 'templateName' at path 'prefixPath'. + +**Please notice that, we strongly recommend not setting templates on the nodes above the database to accommodate future updates and collaboration between modules.** + +``` java +void setSchemaTemplate(String templateName, String prefixPath) +``` + +Before setting template, you should firstly create the template using + +``` java +void createSchemaTemplate(Template template) +``` + +After setting template to a certain path, you can use the template to create timeseries on given device paths through the following interface, or you can write data directly to trigger timeseries auto creation using schema template under target devices. + +``` java +void createTimeseriesUsingSchemaTemplate(List devicePathList) +``` + +After setting template to a certain path, you can query for info about template using belowed interface in session: + +``` java +/** @return All template names. */ +public List showAllTemplates(); + +/** @return All paths have been set to designated template. */ +public List showPathsTemplateSetOn(String templateName); + +/** @return All paths are using designated template. */ +public List showPathsTemplateUsingOn(String templateName) +``` + +If you are ready to get rid of schema template, you can drop it with belowed interface. Make sure the template to drop has been unset from MTree. + +``` java +void unsetSchemaTemplate(String prefixPath, String templateName); +public void dropSchemaTemplate(String templateName); +``` + +Unset the measurement template named 'templateName' from path 'prefixPath'. When you issue this interface, you should assure that there is a template named 'templateName' set at the path 'prefixPath'. + +Attention: Unsetting the template named 'templateName' from node at path 'prefixPath' or descendant nodes which have already inserted records using template is **not supported**. + + +### Data Manipulation Interface (DML Interface) + +### Data Insert API + +It is recommended to use insertTablet to help improve write efficiency. 
+ +* Insert a Tablet,which is multiple rows of a device, each row has the same measurements + * **Better Write Performance** + * **Support batch write** + * **Support null values**: fill the null value with any value, and then mark the null value via BitMap + +``` java +void insertTablet(Tablet tablet) + +public class Tablet { + /** deviceId of this tablet */ + public String prefixPath; + /** the list of measurement schemas for creating the tablet */ + private List schemas; + /** timestamps in this tablet */ + public long[] timestamps; + /** each object is a primitive type array, which represents values of one measurement */ + public Object[] values; + /** each bitmap represents the existence of each value in the current column. */ + public BitMap[] bitMaps; + /** the number of rows to include in this tablet */ + public int rowSize; + /** the maximum number of rows for this tablet */ + private int maxRowNumber; + /** whether this tablet store data of aligned timeseries or not */ + private boolean isAligned; +} +``` + +* Insert multiple Tablets + +``` java +void insertTablets(Map tablet) +``` + +* Insert a Record, which contains multiple measurement value of a device at a timestamp. This method is equivalent to providing a common interface for multiple data types of values. Later, the value can be cast to the original type through TSDataType. + + The correspondence between the Object type and the TSDataType type is shown in the following table. + + | TSDataType | Object | + |------------|--------------| + | BOOLEAN | Boolean | + | INT32 | Integer | + | DATE | LocalDate | + | INT64 | Long | + | TIMESTAMP | Long | + | FLOAT | Float | + | DOUBLE | Double | + | TEXT | String, Binary | + | STRING | String, Binary | + | BLOB | Binary | +``` java +void insertRecord(String deviceId, long time, List measurements, + List types, List values) +``` + +* Insert multiple Records + +``` java +void insertRecords(List deviceIds, List times, + List> measurementsList, List> typesList, + List> valuesList) +``` +* Insert multiple Records that belong to the same device. + With type info the server has no need to do type inference, which leads a better performance + +``` java +void insertRecordsOfOneDevice(String deviceId, List times, + List> measurementsList, List> typesList, + List> valuesList) +``` + +#### Insert with type inference + +When the data is of String type, we can use the following interface to perform type inference based on the value of the value itself. For example, if value is "true" , it can be automatically inferred to be a boolean type. If value is "3.2" , it can be automatically inferred as a flout type. Without type information, server has to do type inference, which may cost some time. + +* Insert a Record, which contains multiple measurement value of a device at a timestamp + +``` java +void insertRecord(String prefixPath, long time, List measurements, List values) +``` + +* Insert multiple Records + +``` java +void insertRecords(List deviceIds, List times, + List> measurementsList, List> valuesList) +``` + +* Insert multiple Records that belong to the same device. 
+ +``` java +void insertStringRecordsOfOneDevice(String deviceId, List times, + List> measurementsList, List> valuesList) +``` + +#### Insert of Aligned Timeseries + +The Insert of aligned timeseries uses interfaces like insertAlignedXXX, and others are similar to the above interfaces: + +* insertAlignedRecord +* insertAlignedRecords +* insertAlignedRecordsOfOneDevice +* insertAlignedStringRecordsOfOneDevice +* insertAlignedTablet +* insertAlignedTablets + +### Data Delete API + +* Delete data before or equal to a timestamp of one or several timeseries + +``` java +void deleteData(String path, long time) +void deleteData(List paths, long time) +``` + +### Data Query API + +* Time-series raw data query with time range: + - The specified query time range is a left-closed right-open interval, including the start time but excluding the end time. + +``` java +SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime); +``` + +* Last query: + - Query the last data, whose timestamp is greater than or equal LastTime. + ``` java + SessionDataSet executeLastDataQuery(List paths, long LastTime); + ``` + - Query the latest point of the specified series of single device quickly, and support redirection; + If you are sure that the query path is valid, set 'isLegalPathNodes' to true to avoid performance penalties from path verification. + ``` java + SessionDataSet executeLastDataQueryForOneDevice( + String db, String device, List sensors, boolean isLegalPathNodes); + ``` + +* Aggregation query: + - Support specified query time range: The specified query time range is a left-closed right-open interval, including the start time but not the end time. + - Support GROUP BY TIME. + +``` java +SessionDataSet executeAggregationQuery(List paths, List aggregations); + +SessionDataSet executeAggregationQuery( + List paths, List aggregations, long startTime, long endTime); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval, + long slidingStep); +``` + +* Execute query statement + +``` java +SessionDataSet executeQueryStatement(String sql) +``` + +### Data Subscription + +#### 1 Topic Management + +The `SubscriptionSession` class in the IoTDB subscription client provides interfaces for topic management. The status changes of topics are illustrated in the diagram below: + +
+ +
+ +##### 1.1 Create Topic + +```Java + void createTopicIfNotExists(String topicName, Properties properties) throws Exception; +``` + +Example: + +```Java +try (final SubscriptionSession session = new SubscriptionSession(host, port)) { + session.open(); + final Properties config = new Properties(); + config.put(TopicConstant.PATH_KEY, "root.db.**"); + session.createTopic(topicName, config); +} +``` + +##### 1.2 Delete Topic + +```Java +void dropTopicIfExists(String topicName) throws Exception; +``` + +##### 1.3 View Topic + +```Java +// Get all topics +Set getTopics() throws Exception; + +// Get a specific topic +Optional getTopic(String topicName) throws Exception; +``` + +#### 2 Check Subscription Status +The `SubscriptionSession` class in the IoTDB subscription client provides interfaces to check the subscription status: + +```Java +Set getSubscriptions() throws Exception; +Set getSubscriptions(final String topicName) throws Exception; +``` + +#### 3 Create Consumer + +When creating a consumer using the JAVA native interface, you need to specify the parameters applied to the consumer. + +For both `SubscriptionPullConsumer` and `SubscriptionPushConsumer`, the following common configurations are available: + + +| key | **required or optional with default** | description | +| :---------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | +| host | optional: 127.0.0.1 | `String`: The RPC host of a certain DataNode in IoTDB | +| port | optional: 6667 | Integer: The RPC port of a certain DataNode in IoTDB | +| node-urls | optional: 127.0.0.1:6667 | `List`: The RPC addresses of all DataNodes in IoTDB, can be multiple; either host:port or node-urls can be filled in. If both host:port and node-urls are filled in, the union of host:port and node-urls will be used to form a new node-urls application | +| username | optional: root | `String`: The username of a DataNode in IoTDB | +| password | optional: root | `String`: The password of a DataNode in IoTDB | +| groupId | optional | `String`: consumer group id, if not specified, a new consumer group will be randomly assigned, ensuring that different consumer groups have different consumer group ids | +| consumerId | optional | `String`: consumer client id, if not specified, it will be randomly assigned, ensuring that each consumer client id in the same consumer group is unique | +| heartbeatIntervalMs | optional: 30000 (min: 1000) | `Long`: The interval at which the consumer sends heartbeat requests to the IoTDB DataNode | +| endpointsSyncIntervalMs | optional: 120000 (min: 5000) | `Long`: The interval at which the consumer detects the expansion and contraction of IoTDB cluster nodes and adjusts the subscription connection | +| fileSaveDir | optional: Paths.get(System.getProperty("user.dir"), "iotdb-subscription").toString() | `String`: The temporary directory path where the TsFile files subscribed by the consumer are stored | +| fileSaveFsync | optional: false | `Boolean`: Whether the consumer actively calls fsync during the subscription of TsFile | + + +##### 3.1 SubscriptionPushConsumer + +The following are special configurations for `SubscriptionPushConsumer`: + + +| key | **required or optional with default** | description | +| :----------------- | :------------------------------------ | :----------------------------------------------------------- | +| ackStrategy | optional: `ACKStrategy.AFTER_CONSUME` | Consumption progress confirmation mechanism includes 
the following options: `ACKStrategy.BEFORE_CONSUME` (submit consumption progress immediately when the consumer receives data, before `onReceive`); `ACKStrategy.AFTER_CONSUME` (submit consumption progress after the consumer has consumed the data, after `onReceive`) |
+| consumeListener    | optional                              | Consumption callback function; it needs to implement the `ConsumeListener` interface and defines how data in `SessionDataSetsHandler` or `TsFileHandler` form is consumed |
+| autoPollIntervalMs | optional: 5000 (min: 500)             | Long: The interval at which the consumer automatically pulls data, in ms |
+| autoPollTimeoutMs  | optional: 10000 (min: 1000)           | Long: The timeout for each data pull by the consumer, in ms |
+
+Among them, the `ConsumeListener` interface is defined as follows:
+
+
+```Java
+@FunctionalInterface
+interface ConsumeListener {
+  default ConsumeResult onReceive(Message message) {
+    return ConsumeResult.SUCCESS;
+  }
+}
+
+enum ConsumeResult {
+  SUCCESS,
+  FAILURE,
+}
+```
+
+##### 3.2 SubscriptionPullConsumer
+
+The following are special configurations for `SubscriptionPullConsumer`:
+
+| key                | **required or optional with default** | description                                                  |
+| :----------------- | :------------------------------------ | :----------------------------------------------------------- |
+| autoCommit         | optional: true                        | Boolean: Whether to automatically commit consumption progress. If this parameter is set to false, the `commit` methods must be called to commit consumption progress manually. |
+| autoCommitInterval | optional: 5000 (min: 500)             | Long: The interval at which consumption progress is automatically committed, in milliseconds. This only takes effect when the autoCommit parameter is true. |
+
+After creating a consumer, you need to manually call the consumer's open method:
+
+
+```Java
+void open() throws Exception;
+```
+
+At this point, the IoTDB subscription client will verify the correctness of the consumer's configuration. After successful verification, the consumer will join the corresponding consumer group. That is, only after opening the consumer can you use the returned consumer object to subscribe to topics, consume data, and perform other operations.
+
+#### 4 Subscribe to Topics
+
+Both `SubscriptionPushConsumer` and `SubscriptionPullConsumer` provide the following JAVA native interfaces for subscribing to topics:
+
+```Java
+// Subscribe to topics
+void subscribe(String topic) throws Exception;
+void subscribe(List topics) throws Exception;
+```
+
+- Before a consumer subscribes to a topic, the topic must already have been created; otherwise, the subscription will fail.
+
+- If a consumer subscribes to a topic that it has already subscribed to, no error will occur.
+
+- If there are other consumers in the same consumer group that have subscribed to the same topics, the consumer will reuse the corresponding consumption progress.
+
+
+#### 5 Consume Data
+
+For both push and pull mode consumers:
+
+
+- Only after explicitly subscribing to a topic will the consumer receive data for that topic.
+
+- If no topics are subscribed to after creation, the consumer will not be able to consume any data, even if other consumers in the same consumer group have subscribed to some topics.
+
+##### 5.1 SubscriptionPushConsumer
+
+After `SubscriptionPushConsumer` subscribes to topics, there is no need to manually pull data.
+
+The data consumption logic is defined in the `consumeListener` configuration specified when creating the `SubscriptionPushConsumer`.
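+
+The following is a minimal sketch of this pattern, based on the builder calls shown in the complete example of Section 7.2. The consumer ID, consumer group ID, and topic name are placeholder values, the connection parameters are left at their defaults, and the topic is assumed to use the default (data-set) payload format rather than TsFile format:
+
+```Java
+// A minimal push-consumer sketch: all consumption logic lives in consumeListener,
+// so no explicit poll loop is needed. Identifiers below are placeholders.
+final SubscriptionPushConsumer pushConsumer =
+    new SubscriptionPushConsumer.Builder()
+        .consumerId("push_consumer_demo")        // placeholder consumer id
+        .consumerGroupId("push_consumer_group")  // placeholder consumer group id
+        .ackStrategy(AckStrategy.AFTER_CONSUME)  // commit progress after onReceive returns
+        .consumeListener(
+            message -> {
+              // Print the column names of every data set carried by the message
+              for (final SubscriptionSessionDataSet dataSet : message.getSessionDataSetsHandler()) {
+                System.out.println(dataSet.getColumnNames());
+              }
+              return ConsumeResult.SUCCESS;
+            })
+        .buildPushConsumer();
+pushConsumer.open();
+pushConsumer.subscribe("topic_1"); // the topic must already exist
+// keep the consumer open for as long as data should be received
+```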
+ +##### 5.2 SubscriptionPullConsumer + +After SubscriptionPullConsumer subscribes to topics, it needs to actively call the poll method to pull data: + +```Java +List poll(final Duration timeout) throws Exception; +List poll(final long timeoutMs) throws Exception; +List poll(final Set topicNames, final Duration timeout) throws Exception; +List poll(final Set topicNames, final long timeoutMs) throws Exception; +``` + +In the poll method, you can specify the topic names to be pulled (if not specified, it defaults to pulling all topics that the consumer has subscribed to) and the timeout period. + + +When the SubscriptionPullConsumer is configured with the autoCommit parameter set to false, it is necessary to manually call the commitSync and commitAsync methods to synchronously or asynchronously commit the consumption progress of a batch of data: + + +```Java +void commitSync(final SubscriptionMessage message) throws Exception; +void commitSync(final Iterable messages) throws Exception; + +CompletableFuture commitAsync(final SubscriptionMessage message); +CompletableFuture commitAsync(final Iterable messages); +void commitAsync(final SubscriptionMessage message, final AsyncCommitCallback callback); +void commitAsync(final Iterable messages, final AsyncCommitCallback callback); +``` + +The AsyncCommitCallback class is defined as follows: + +```Java +public interface AsyncCommitCallback { + default void onComplete() { + // Do nothing + } + + default void onFailure(final Throwable e) { + // Do nothing + } +} +``` + +#### 6 Unsubscribe + +The `SubscriptionPushConsumer` and `SubscriptionPullConsumer` provide the following JAVA native interfaces for unsubscribing and closing the consumer: + +```Java +// Unsubscribe from topics +void unsubscribe(String topic) throws Exception; +void unsubscribe(List topics) throws Exception; + +// Close consumer +void close(); +``` + +- If a consumer unsubscribes from a topic that it has not subscribed to, no error will occur. +- When a consumer is closed, it will exit the corresponding consumer group and automatically unsubscribe from all topics it is currently subscribed to. +- Once a consumer is closed, its lifecycle ends, and it cannot be reopened to subscribe to and consume data again. 
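+As a complement to the interfaces above (the full examples in Section 7 rely on automatic commit), the following is a minimal sketch of the pull-consumer lifecycle described in Sections 4–6 with manual progress commit. It assumes the `autoCommit` key from the configuration table can be passed directly through the `Properties`-style constructor; the identifiers, topic name, and timeout are placeholders:
+
+```Java
+// Manual-commit sketch: autoCommit is disabled, so progress is only advanced
+// by an explicit commitSync/commitAsync call. Identifiers are placeholders.
+final Properties config = new Properties();
+config.put(ConsumerConstant.CONSUMER_ID_KEY, "manual_commit_consumer");
+config.put(ConsumerConstant.CONSUMER_GROUP_ID_KEY, "manual_commit_group");
+config.put("autoCommit", "false"); // assumption: the key name from the table above is accepted here
+
+final SubscriptionPullConsumer consumer = new SubscriptionPullConsumer(config);
+consumer.open();
+consumer.subscribe("topic_1");
+
+final List messages = consumer.poll(10000); // timeout in ms
+for (final SubscriptionMessage message : messages) {
+  // ... consume the message here ...
+}
+consumer.commitSync(messages); // acknowledge the whole batch after it has been processed
+
+consumer.unsubscribe("topic_1");
+consumer.close();
+```
+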
+ + +#### 7 Code Examples + +##### 7.1 Single Pull Consumer Consuming SessionDataSetsHandler Format Data + +```Java +// Create topics +try (final SubscriptionSession session = new SubscriptionSession(HOST, PORT)) { + session.open(); + final Properties config = new Properties(); + config.put(TopicConstant.PATH_KEY, "root.db.**"); + session.createTopic(TOPIC_1, config); +} + +// Subscription: property-style ctor +final Properties config = new Properties(); +config.put(ConsumerConstant.CONSUMER_ID_KEY, "c1"); +config.put(ConsumerConstant.CONSUMER_GROUP_ID_KEY, "cg1"); + +final SubscriptionPullConsumer consumer1 = new SubscriptionPullConsumer(config); +consumer1.open(); +consumer1.subscribe(TOPIC_1); +while (true) { + LockSupport.parkNanos(SLEEP_NS); // wait some time + final List messages = consumer1.poll(POLL_TIMEOUT_MS); + for (final SubscriptionMessage message : messages) { + for (final SubscriptionSessionDataSet dataSet : message.getSessionDataSetsHandler()) { + System.out.println(dataSet.getColumnNames()); + System.out.println(dataSet.getColumnTypes()); + while (dataSet.hasNext()) { + System.out.println(dataSet.next()); + } + } + } + // Auto commit +} + +// Show topics and subscriptions +try (final SubscriptionSession session = new SubscriptionSession(HOST, PORT)) { + session.open(); + session.getTopics().forEach((System.out::println)); + session.getSubscriptions().forEach((System.out::println)); +} + +consumer1.unsubscribe(TOPIC_1); +consumer1.close(); +``` + +##### 7.2 Multiple Push Consumers Consuming TsFileHandler Format Data + +```Java +// Create topics +try (final SubscriptionSession subscriptionSession = new SubscriptionSession(HOST, PORT)) { + subscriptionSession.open(); + final Properties config = new Properties(); + config.put(TopicConstant.FORMAT_KEY, TopicConstant.FORMAT_TS_FILE_HANDLER_VALUE); + subscriptionSession.createTopic(TOPIC_2, config); +} + +final List threads = new ArrayList<>(); +for (int i = 0; i < 8; ++i) { + final int idx = i; + final Thread thread = + new Thread( + () -> { + // Subscription: builder-style ctor + try (final SubscriptionPushConsumer consumer2 = + new SubscriptionPushConsumer.Builder() + .consumerId("c" + idx) + .consumerGroupId("cg2") + .fileSaveDir(System.getProperty("java.io.tmpdir")) + .ackStrategy(AckStrategy.AFTER_CONSUME) + .consumeListener( + message -> { + doSomething(message.getTsFileHandler()); + return ConsumeResult.SUCCESS; + }) + .buildPushConsumer()) { + consumer2.open(); + consumer2.subscribe(TOPIC_2); + // block the consumer main thread + Thread.sleep(Long.MAX_VALUE); + } catch (final IOException | InterruptedException e) { + throw new RuntimeException(e); + } + }); + thread.start(); + threads.add(thread); +} + +for (final Thread thread : threads) { + thread.join(); +} +``` + +### Other Modules (Execute SQL Directly) + +* Execute non query statement + +``` java +void executeNonQueryStatement(String sql) +``` + + +### Write Test Interface (to profile network cost) + +These methods **don't** insert data into database and server just return after accept the request. 
+ +* Test the network and client cost of insertRecord + +``` java +void testInsertRecord(String deviceId, long time, List measurements, List values) + +void testInsertRecord(String deviceId, long time, List measurements, + List types, List values) +``` + +* Test the network and client cost of insertRecords + +``` java +void testInsertRecords(List deviceIds, List times, + List> measurementsList, List> valuesList) + +void testInsertRecords(List deviceIds, List times, + List> measurementsList, List> typesList + List> valuesList) +``` + +* Test the network and client cost of insertTablet + +``` java +void testInsertTablet(Tablet tablet) +``` + +* Test the network and client cost of insertTablets + +``` java +void testInsertTablets(Map tablets) +``` + +### Coding Examples + +To get more information of the following interfaces, please view session/src/main/java/org/apache/iotdb/session/Session.java + +The sample code of using these interfaces is in example/session/src/main/java/org/apache/iotdb/SessionExample.java,which provides an example of how to open an IoTDB session, execute a batch insertion. + +For examples of aligned timeseries and measurement template, you can refer to `example/session/src/main/java/org/apache/iotdb/AlignedTimeseriesSessionExample.java` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-Kafka.md b/src/UserGuide/V2.0.1/Tree/API/Programming-Kafka.md new file mode 100644 index 00000000..0a041448 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-Kafka.md @@ -0,0 +1,118 @@ + + +# Kafka + +[Apache Kafka](https://kafka.apache.org/) is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. 
+ +## Coding Example + +### kafka Producer Producing Data Java Code Example + +```java + Properties props = new Properties(); + props.put("bootstrap.servers", "127.0.0.1:9092"); + props.put("key.serializer", StringSerializer.class); + props.put("value.serializer", StringSerializer.class); + KafkaProducer producer = new KafkaProducer<>(props); + producer.send( + new ProducerRecord<>( + "Kafka-Test", "key", "root.kafka," + System.currentTimeMillis() + ",value,INT32,100")); + producer.close(); +``` + +### kafka Consumer Receiving Data Java Code Example + +```java + Properties props = new Properties(); + props.put("bootstrap.servers", "127.0.0.1:9092"); + props.put("key.deserializer", StringDeserializer.class); + props.put("value.deserializer", StringDeserializer.class); + props.put("auto.offset.reset", "earliest"); + props.put("group.id", "Kafka-Test"); + KafkaConsumer kafkaConsumer = new KafkaConsumer<>(props); + kafkaConsumer.subscribe(Collections.singleton("Kafka-Test")); + ConsumerRecords records = kafkaConsumer.poll(Duration.ofSeconds(1)); + ``` + +### Example of Java Code Stored in IoTDB Server + +```java + SessionPool pool = + new SessionPool.Builder() + .host("127.0.0.1") + .port(6667) + .user("root") + .password("root") + .maxSize(3) + .build(); + List datas = new ArrayList<>(records.count()); + for (ConsumerRecord record : records) { + datas.add(record.value()); + } + int size = datas.size(); + List deviceIds = new ArrayList<>(size); + List times = new ArrayList<>(size); + List> measurementsList = new ArrayList<>(size); + List> typesList = new ArrayList<>(size); + List> valuesList = new ArrayList<>(size); + for (String data : datas) { + String[] dataArray = data.split(","); + String device = dataArray[0]; + long time = Long.parseLong(dataArray[1]); + List measurements = Arrays.asList(dataArray[2].split(":")); + List types = new ArrayList<>(); + for (String type : dataArray[3].split(":")) { + types.add(TSDataType.valueOf(type)); + } + List values = new ArrayList<>(); + String[] valuesStr = dataArray[4].split(":"); + for (int i = 0; i < valuesStr.length; i++) { + switch (types.get(i)) { + case INT64: + values.add(Long.parseLong(valuesStr[i])); + break; + case DOUBLE: + values.add(Double.parseDouble(valuesStr[i])); + break; + case INT32: + values.add(Integer.parseInt(valuesStr[i])); + break; + case TEXT: + values.add(valuesStr[i]); + break; + case FLOAT: + values.add(Float.parseFloat(valuesStr[i])); + break; + case BOOLEAN: + values.add(Boolean.parseBoolean(valuesStr[i])); + break; + } + } + deviceIds.add(device); + times.add(time); + measurementsList.add(measurements); + typesList.add(types); + valuesList.add(values); + } + pool.insertRecords(deviceIds, times, measurementsList, typesList, valuesList); + ``` + diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-MQTT.md b/src/UserGuide/V2.0.1/Tree/API/Programming-MQTT.md new file mode 100644 index 00000000..5bbb610c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-MQTT.md @@ -0,0 +1,183 @@ + +# MQTT Protocol + +[MQTT](http://mqtt.org/) is a machine-to-machine (M2M)/"Internet of Things" connectivity protocol. +It was designed as an extremely lightweight publish/subscribe messaging transport. +It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium. + +IoTDB supports the MQTT v3.1(an OASIS Standard) protocol. +IoTDB server includes a built-in MQTT service that allows remote devices send messages into IoTDB server directly. 
+ + + + +## Built-in MQTT Service +The Built-in MQTT Service provide the ability of direct connection to IoTDB through MQTT. It listen the publish messages from MQTT clients + and then write the data into storage immediately. +The MQTT topic corresponds to IoTDB timeseries. +The messages payload can be format to events by `PayloadFormatter` which loaded by java SPI, and the default implementation is `JSONPayloadFormatter`. +The default `json` formatter support two json format and its json array. The following is an MQTT message payload example: + +```json + { + "device":"root.sg.d1", + "timestamp":1586076045524, + "measurements":["s1","s2"], + "values":[0.530635,0.530635] + } +``` +or +```json + { + "device":"root.sg.d1", + "timestamps":[1586076045524,1586076065526], + "measurements":["s1","s2"], + "values":[[0.530635,0.530635], [0.530655,0.530695]] + } +``` +or json array of the above two. + + + +## MQTT Configurations +The IoTDB MQTT service load configurations from `${IOTDB_HOME}/${IOTDB_CONF}/iotdb-system.properties` by default. + +Configurations are as follows: + +| NAME | DESCRIPTION | DEFAULT | +| ------------- |:-------------:|:------:| +| enable_mqtt_service | whether to enable the mqtt service | false | +| mqtt_host | the mqtt service binding host | 127.0.0.1 | +| mqtt_port | the mqtt service binding port | 1883 | +| mqtt_handler_pool_size | the handler pool size for handing the mqtt messages | 1 | +| mqtt_payload_formatter | the mqtt message payload formatter | json | +| mqtt_max_message_size | the max mqtt message size in byte| 1048576 | + + +## Coding Examples +The following is an example which a mqtt client send messages to IoTDB server. + +```java +MQTT mqtt = new MQTT(); +mqtt.setHost("127.0.0.1", 1883); +mqtt.setUserName("root"); +mqtt.setPassword("root"); + +BlockingConnection connection = mqtt.blockingConnection(); +connection.connect(); + +Random random = new Random(); +for (int i = 0; i < 10; i++) { + String payload = String.format("{\n" + + "\"device\":\"root.sg.d1\",\n" + + "\"timestamp\":%d,\n" + + "\"measurements\":[\"s1\"],\n" + + "\"values\":[%f]\n" + + "}", System.currentTimeMillis(), random.nextDouble()); + + connection.publish("root.sg.d1.s1", payload.getBytes(), QoS.AT_LEAST_ONCE, false); +} + +connection.disconnect(); + +``` + +## Customize your MQTT Message Format + +If you do not like the above Json format, you can customize your MQTT Message format by just writing several lines +of codes. An example can be found in `example/mqtt-customize` project. + +Steps: +1. Create a java project, and add dependency: +```xml + + org.apache.iotdb + iotdb-server + 1.1.0-SNAPSHOT + +``` +2. 
Define your implementation which implements `org.apache.iotdb.db.protocol.mqtt.PayloadFormatter` +e.g., + +```java +package org.apache.iotdb.mqtt.server; + +import io.netty.buffer.ByteBuf; +import org.apache.iotdb.db.protocol.mqtt.Message; +import org.apache.iotdb.db.protocol.mqtt.PayloadFormatter; + +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; + +public class CustomizedJsonPayloadFormatter implements PayloadFormatter { + + @Override + public List format(ByteBuf payload) { + // Suppose the payload is a json format + if (payload == null) { + return null; + } + + String json = payload.toString(StandardCharsets.UTF_8); + // parse data from the json and generate Messages and put them into List ret + List ret = new ArrayList<>(); + // this is just an example, so we just generate some Messages directly + for (int i = 0; i < 2; i++) { + long ts = i; + Message message = new Message(); + message.setDevice("d" + i); + message.setTimestamp(ts); + message.setMeasurements(Arrays.asList("s1", "s2")); + message.setValues(Arrays.asList("4.0" + i, "5.0" + i)); + ret.add(message); + } + return ret; + } + + @Override + public String getName() { + // set the value of mqtt_payload_formatter in iotdb-system.properties as the following string: + return "CustomizedJson"; + } +} +``` +3. modify the file in `src/main/resources/META-INF/services/org.apache.iotdb.db.protocol.mqtt.PayloadFormatter`: + clean the file and put your implementation class name into the file. + In this example, the content is: `org.apache.iotdb.mqtt.server.CustomizedJsonPayloadFormatter` +4. compile your implementation as a jar file: `mvn package -DskipTests` + + +Then, in your server: +1. Create ${IOTDB_HOME}/ext/mqtt/ folder, and put the jar into this folder. +2. Update configuration to enable MQTT service. (`enable_mqtt_service=true` in `conf/iotdb-system.properties`) +3. Set the value of `mqtt_payload_formatter` in `conf/iotdb-system.properties` as the value of getName() in your implementation + , in this example, the value is `CustomizedJson` +4. Launch the IoTDB server. +5. Now IoTDB will use your implementation to parse the MQTT message. + +More: the message format can be anything you want. For example, if it is a binary format, +just use `payload.forEachByte()` or `payload.array` to get bytes content. + + + diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-NodeJS-Native-API.md b/src/UserGuide/V2.0.1/Tree/API/Programming-NodeJS-Native-API.md new file mode 100644 index 00000000..35c7964c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-NodeJS-Native-API.md @@ -0,0 +1,181 @@ + + +# Node.js Native API + +Apache IoTDB uses Thrift as a cross-language RPC-framework so access to IoTDB can be achieved through the interfaces provided by Thrift. +This document will introduce how to generate a native Node.js interface that can be used to access IoTDB. + +## Dependents + + * JDK >= 1.8 + * Node.js >= 16.0.0 + * Linux、Macos or like unix + * Windows+bash + +## Generate the Node.js native interface + +1. Find the `pom.xml` file in the root directory of the IoTDB source code folder. +2. Open the `pom.xml` file and find the following content: + ```xml + + generate-thrift-sources-python + generate-sources + + compile + + + py + ${project.build.directory}/generated-sources-python/ + + + ``` +3. 
Duplicate this block and change the `id`, `generator` and `outputDirectory` to this: + ```xml + + generate-thrift-sources-nodejs + generate-sources + + compile + + + js:node + ${project.build.directory}/generated-sources-nodejs/ + + + ``` +4. In the root directory of the IoTDB source code folder,run `mvn clean generate-sources`. + +This command will automatically delete the files in `iotdb/iotdb-protocol/thrift/target` and `iotdb/iotdb-protocol/thrift-commons/target`, and repopulate the folder with the newly generated files. +The newly generated JavaScript sources will be located in `iotdb/iotdb-protocol/thrift/target/generated-sources-nodejs` in the various modules of the `iotdb-protocol` module. + +## Using the Node.js native interface + +Simply copy the files in `iotdb/iotdb-protocol/thrift/target/generated-sources-nodejs/` and `iotdb/iotdb-protocol/thrift-commons/target/generated-sources-nodejs/` into your project. + +## rpc interface + +``` +// open a session +TSOpenSessionResp openSession(1:TSOpenSessionReq req); + +// close a session +TSStatus closeSession(1:TSCloseSessionReq req); + +// run an SQL statement in batch +TSExecuteStatementResp executeStatement(1:TSExecuteStatementReq req); + +// execute SQL statement in batch +TSStatus executeBatchStatement(1:TSExecuteBatchStatementReq req); + +// execute query SQL statement +TSExecuteStatementResp executeQueryStatement(1:TSExecuteStatementReq req); + +// execute insert, delete and update SQL statement +TSExecuteStatementResp executeUpdateStatement(1:TSExecuteStatementReq req); + +// fetch next query result +TSFetchResultsResp fetchResults(1:TSFetchResultsReq req) + +// fetch meta data +TSFetchMetadataResp fetchMetadata(1:TSFetchMetadataReq req) + +// cancel a query +TSStatus cancelOperation(1:TSCancelOperationReq req); + +// close a query dataset +TSStatus closeOperation(1:TSCloseOperationReq req); + +// get time zone +TSGetTimeZoneResp getTimeZone(1:i64 sessionId); + +// set time zone +TSStatus setTimeZone(1:TSSetTimeZoneReq req); + +// get server's properties +ServerProperties getProperties(); + +// CREATE DATABASE +TSStatus setStorageGroup(1:i64 sessionId, 2:string storageGroup); + +// create timeseries +TSStatus createTimeseries(1:TSCreateTimeseriesReq req); + +// create multi timeseries +TSStatus createMultiTimeseries(1:TSCreateMultiTimeseriesReq req); + +// delete timeseries +TSStatus deleteTimeseries(1:i64 sessionId, 2:list path) + +// delete sttorage groups +TSStatus deleteStorageGroups(1:i64 sessionId, 2:list storageGroup); + +// insert record +TSStatus insertRecord(1:TSInsertRecordReq req); + +// insert record in string format +TSStatus insertStringRecord(1:TSInsertStringRecordReq req); + +// insert tablet +TSStatus insertTablet(1:TSInsertTabletReq req); + +// insert tablets in batch +TSStatus insertTablets(1:TSInsertTabletsReq req); + +// insert records in batch +TSStatus insertRecords(1:TSInsertRecordsReq req); + +// insert records of one device +TSStatus insertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// insert records in batch as string format +TSStatus insertStringRecords(1:TSInsertStringRecordsReq req); + +// test the latency of innsert tablet,caution:no data will be inserted, only for test latency +TSStatus testInsertTablet(1:TSInsertTabletReq req); + +// test the latency of innsert tablets,caution:no data will be inserted, only for test latency +TSStatus testInsertTablets(1:TSInsertTabletsReq req); + +// test the latency of innsert record,caution:no data will be inserted, only for test latency 
+TSStatus testInsertRecord(1:TSInsertRecordReq req); + +// test the latency of innsert record in string format,caution:no data will be inserted, only for test latency +TSStatus testInsertStringRecord(1:TSInsertStringRecordReq req); + +// test the latency of innsert records,caution:no data will be inserted, only for test latency +TSStatus testInsertRecords(1:TSInsertRecordsReq req); + +// test the latency of innsert records of one device,caution:no data will be inserted, only for test latency +TSStatus testInsertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// test the latency of innsert records in string formate,caution:no data will be inserted, only for test latency +TSStatus testInsertStringRecords(1:TSInsertStringRecordsReq req); + +// delete data +TSStatus deleteData(1:TSDeleteDataReq req); + +// execute raw data query +TSExecuteStatementResp executeRawDataQuery(1:TSRawDataQueryReq req); + +// request a statement id from server +i64 requestStatementId(1:i64 sessionId); +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-ODBC.md b/src/UserGuide/V2.0.1/Tree/API/Programming-ODBC.md new file mode 100644 index 00000000..8e0d7485 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-ODBC.md @@ -0,0 +1,146 @@ + + +# ODBC +With IoTDB JDBC, IoTDB can be accessed using the ODBC-JDBC bridge. + +## Dependencies +* IoTDB-JDBC's jar-with-dependency package +* ODBC-JDBC bridge (e.g. ZappySys JDBC Bridge) + +## Deployment +### Preparing JDBC package +Download the source code of IoTDB, and execute the following command in root directory: +```shell +mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies +``` +Then, you can see the output `iotdb-jdbc-1.3.2-SNAPSHOT-jar-with-dependencies.jar` under `iotdb-client/jdbc/target` directory. + +### Preparing ODBC-JDBC Bridge +*Note: Here we only provide one kind of ODBC-JDBC bridge as the instance. Readers can use other ODBC-JDBC bridges to access IoTDB with the IOTDB-JDBC.* +1. **Download Zappy-Sys ODBC-JDBC Bridge**: + Enter the https://zappysys.com/products/odbc-powerpack/odbc-jdbc-bridge-driver/ website, and click "download". + + ![ZappySys_website.jpg](https://alioss.timecho.com/upload/ZappySys_website.jpg) + +2. **Prepare IoTDB**: Set up IoTDB cluster, and write a row of data arbitrarily. + ```sql + IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) + ``` + +3. **Deploy and Test the Bridge**: + 1. Open ODBC Data Sources(32/64 bit), depending on the bits of Windows. One possible position is `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Administrative Tools`. + + ![ODBC_ADD_EN.jpg](https://alioss.timecho.com/upload/ODBC_ADD_EN.jpg) + + 2. Click on "add" and select ZappySys JDBC Bridge. + + ![ODBC_CREATE_EN.jpg](https://alioss.timecho.com/upload/ODBC_CREATE_EN.jpg) + + 3. 
Fill in the following settings: + + | Property | Content | Example | + |---------------------|-----------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------| + | Connection String | jdbc:iotdb://\:\/ | jdbc:iotdb://127.0.0.1:6667/ | + | Driver Class | org.apache.iotdb.jdbc.IoTDBDriver | org.apache.iotdb.jdbc.IoTDBDriver | + | JDBC driver file(s) | The path of IoTDB JDBC jar-with-dependencies | C:\Users\13361\Documents\GitHub\iotdb\iotdb-client\jdbc\target\iotdb-jdbc-1.3.2-SNAPSHOT-jar-with-dependencies.jar | + | User name | IoTDB's user name | root | + | User password | IoTDB's password | root | + + ![ODBC_CONNECTION.png](https://alioss.timecho.com/upload/ODBC_CONNECTION.png) + + 4. Click on "Test Connection" button, and a "Test Connection: SUCCESSFUL" should appear. + + ![ODBC_CONFIG_EN.jpg](https://alioss.timecho.com/upload/ODBC_CONFIG_EN.jpg) + + 5. Click the "Preview" button above, and replace the original query text with `select * from root.**`, then click "Preview Data", and the query result should correctly. + + ![ODBC_TEST.jpg](https://alioss.timecho.com/upload/ODBC_TEST.jpg) + +4. **Operate IoTDB's data with ODBC**: After correct deployment, you can use Microsoft's ODBC library to operate IoTDB's data. Here's an example written in C#: + ```C# + using System.Data.Odbc; + + // Get a connection + var dbConnection = new OdbcConnection("DSN=ZappySys JDBC Bridge"); + dbConnection.Open(); + + // Execute the write commands to prepare data + var dbCommand = dbConnection.CreateCommand(); + dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s1) values(1715670861634, 1)"; + dbCommand.ExecuteNonQuery(); + dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s2) values(1715670861634, true)"; + dbCommand.ExecuteNonQuery(); + dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s3) values(1715670861634, 3.1)"; + dbCommand.ExecuteNonQuery(); + + // Execute the read command + dbCommand.CommandText = "SELECT * FROM root.Keller.Flur.Energieversorgung"; + var dbReader = dbCommand.ExecuteReader(); + + // Write the output header + var fCount = dbReader.FieldCount; + Console.Write(":"); + for(var i = 0; i < fCount; i++) + { + var fName = dbReader.GetName(i); + Console.Write(fName + ":"); + } + Console.WriteLine(); + + // Output the content + while (dbReader.Read()) + { + Console.Write(":"); + for(var i = 0; i < fCount; i++) + { + var fieldType = dbReader.GetFieldType(i); + switch (fieldType.Name) + { + case "DateTime": + var dateTime = dbReader.GetInt64(i); + Console.Write(dateTime + ":"); + break; + case "Double": + if (dbReader.IsDBNull(i)) + { + Console.Write("null:"); + } + else + { + var fValue = dbReader.GetDouble(i); + Console.Write(fValue + ":"); + } + break; + default: + Console.Write(fieldType.Name + ":"); + break; + } + } + Console.WriteLine(); + } + + // Shut down gracefully + dbReader.Close(); + dbCommand.Dispose(); + dbConnection.Close(); + ``` + This program can write data into IoTDB, and query the data we have just written. 
diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-OPC-UA_timecho.md b/src/UserGuide/V2.0.1/Tree/API/Programming-OPC-UA_timecho.md
new file mode 100644
index 00000000..703b47c6
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/API/Programming-OPC-UA_timecho.md
@@ -0,0 +1,262 @@
+
+
+# OPC UA Protocol
+
+## OPC UA
+
+OPC UA is a technical specification used in the automation field for communication between different devices and systems. It enables cross-platform, cross-language, and cross-network operation, providing a reliable and secure foundation for data exchange in the Industrial Internet of Things. IoTDB supports the OPC UA protocol, and the IoTDB OPC Server supports both Client/Server and Pub/Sub communication modes.
+
+### OPC UA Client/Server Mode
+
+- **Client/Server Mode**: In this mode, IoTDB's stream processing engine establishes a connection with the OPC UA Server via an OPC UA Sink. The OPC UA Server maintains data within its Address Space, from which IoTDB can request and retrieve data. Additionally, other OPC UA Clients can access the data on the server.
+
+ +
+
+
+
+- Features:
+
+  - OPC UA organizes the device information received from the Sink into folders under the Objects folder, following a tree model.
+
+  - Each measurement point is recorded as a variable node that holds the latest value currently stored in the database.
+
+### OPC UA Pub/Sub Mode
+
+- **Pub/Sub Mode**: In this mode, IoTDB's stream processing engine sends data change events to the OPC UA Server through an OPC UA Sink. These events are published to the server's message queue and managed through Event Nodes. Other OPC UA Clients can subscribe to these Event Nodes to receive notifications upon data changes.
+
+ +
+ +- Features: + + - Each measurement point is wrapped as an Event Node in OPC UA. + + + - The relevant fields and their meanings are as follows: + + | Field | Meaning | Type (Milo) | Example | + | :--------- | :--------------- | :------------ | :-------------------- | + | Time | Timestamp | DateTime | 1698907326198 | + | SourceName | Full path of the measurement point | String | root.test.opc.sensor0 | + | SourceNode | Data type of the measurement point | NodeId | Int32 | + | Message | Data | LocalizedText | 3.0 | + + - Events are only sent to clients that are already listening; if a client is not connected, the Event will be ignored. + + +## IoTDB OPC Server Startup method + +### Syntax + +The syntax for creating the Sink is as follows: + + +```SQL +create pipe p1 + with source (...) + with processor (...) + with sink ('sink' = 'opc-ua-sink', + 'sink.opcua.tcp.port' = '12686', + 'sink.opcua.https.port' = '8443', + 'sink.user' = 'root', + 'sink.password' = 'root', + 'sink.opcua.security.dir' = '...' + ) +``` + +### Parameters + +| key | value | value range | required or not | default value | +| :------------------------------ | :----------------------------------------------------------- | :------------------------------------- | :------- | :------------- | +| sink | OPC UA SINK | String: opc-ua-sink | Required | | +| sink.opcua.model | OPC UA model used | String: client-server / pub-sub | Optional | client-server | +| sink.opcua.tcp.port | OPC UA's TCP port | Integer: [0, 65536] | Optional | 12686 | +| sink.opcua.https.port | OPC UA's HTTPS port | Integer: [0, 65536] | Optional | 8443 | +| sink.opcua.security.dir | Directory for OPC UA's keys and certificates | String: Path, supports absolute and relative directories | Optional | Opc_security folder/in the conf directory of the DataNode related to iotdb
If there is no conf directory for iotdb (such as launching DataNode in IDEA), it will be the iotdb_opc_Security folder in the user's home directory |
+| sink.opcua.enable-anonymous-access | Whether OPC UA allows anonymous access | Boolean | Optional | true |
+| sink.user | User for OPC UA, specified in the configuration | String | Optional | root |
+| sink.password | Password for OPC UA, specified in the configuration | String | Optional | root |
+
+### Example
+
+```Bash
+create pipe p1 
+    with sink ('sink' = 'opc-ua-sink', 
+          'sink.user' = 'root', 
+          'sink.password' = 'root');
+start pipe p1;
+```
+
+### Usage Limitations
+
+1. **DataRegion Requirement**: The OPC UA server will only start if there is a DataRegion in IoTDB. For an empty IoTDB, a data entry is necessary for the OPC UA server to become effective.
+
+2. **Data Availability**: Clients subscribing to the server will not receive data written to IoTDB before their connection.
+
+3. **Multiple DataNodes may lead to scattered sending or conflicts**:
+
+   - For IoTDB clusters with multiple dataRegions scattered across different DataNode IPs, data is sent in a dispersed manner from the leaders of the dataRegions. The client needs to listen on the configured ports of each DataNode IP separately.
+
+   - It is recommended to use this OPC UA server with a 1C1D deployment.
+
+4. **Deleting data and modifying measurement point types are not supported**: In Client/Server mode, OPC UA cannot delete data or change data type settings. In Pub/Sub mode, if data is deleted, the information cannot be pushed to the client.
+
+## IoTDB OPC Server Example
+
+### Client / Server Mode
+
+#### Preparation Work
+
+1. Take the UAExpert client as an example; download the UAExpert client: https://www.unified-automation.com/downloads/opc-ua-clients.html
+
+2. Install UAExpert and fill in your own certificate information.
+
+#### Quick Start
+
+1. Use the following SQL to create and start the OPC UA Sink in client-server mode. For detailed syntax, please refer to: [IoTDB OPC Server Syntax](#syntax)
+
+```SQL
+create pipe p1 with sink ('sink'='opc-ua-sink');
+```
+
+2. Write some data.
+
+```SQL
+insert into root.test.db(time, s2) values(now(), 2)
+```
+
+   The metadata is created automatically here, relying on automatic schema creation being enabled.
+
+3. Configure the connection to IoTDB in UAExpert, where the password should be set to the one defined in the sink.password parameter (using the default password "root" as an example):
+ +
+ +
+ +
+ +4. After trusting the server's certificate, you can see the written data in the Objects folder on the left. + +
+ +
+ +
+ +
+ +5. You can drag the node on the left to the center and display the latest value of that node: + +
+ +
+ +### Pub / Sub Mode + +#### Preparation Work + +The code is located in the [opc-ua-sink 文件夹](https://github.com/apache/iotdb/tree/master/example/pipe-opc-ua-sink/src/main/java/org/apache/iotdb/opcua) under the iotdb-example package. + +The code includes: + +- The main class (ClientTest) +- Client certificate-related logic(IoTDBKeyStoreLoaderClient) +- Client configuration and startup logic(ClientExampleRunner) +- The parent class of ClientTest(ClientExample) + +### Quick Start + +The steps are as follows: + +1. Start IoTDB and write some data. + +```SQL +insert into root.a.b(time, c, d) values(now(), 1, 2); +``` + +​ The metadata is automatically created and enabled here. + +2. Use the following SQL to create and start the OPC UA Sink in Pub-Sub mode. For detailed syntax, please refer to: [IoTDB OPC Server Syntax](#syntax) + +```SQL +create pipe p1 with sink ('sink'='opc-ua-sink', + 'sink.opcua.model'='pub-sub'); +start pipe p1; +``` + +​ At this point, you can see that the opc certificate-related directory has been created under the server's conf directory. + +
+ +
+ +3. Run the Client connection directly; the Client's certificate will be rejected by the server. + +
+ +
+ +4. Go to the server's sink.opcua.security.dir directory, then to the pki's rejected directory, where the Client's certificate should have been generated. + +
+ +
+ +5. Move (not copy) the client's certificate into (not into a subdirectory of) the trusted directory's certs folder in the same directory. + +
+ +
+ +6. Open the Client connection again; the server's certificate should now be rejected by the Client. + +
+ +
+ +7. Go to the client's /client/security directory, then to the pki's rejected directory, and move the server's certificate into (not into a subdirectory of) the trusted directory. + +
+ +
+ +8. Open the Client, and now the two-way trust is successful, and the Client can connect to the server. + +9. Write data to the server, and the Client will print out the received data. + +
+ +
+ + +### Notes + +1. **stand alone and cluster:**It is recommended to use a 1C1D (one coordinator and one data node) single machine version. If there are multiple DataNodes in the cluster, data may be sent in a scattered manner across various DataNodes, and it may not be possible to listen to all the data. + +2. **No Need to Operate Root Directory Certificates:** During the certificate operation process, there is no need to operate the `iotdb-server.pfx` certificate under the IoTDB security root directory and the `example-client.pfx` directory under the client security directory. When the Client and Server connect bidirectionally, they will send the root directory certificate to each other. If it is the first time the other party sees this certificate, it will be placed in the reject dir. If the certificate is in the trusted/certs, then the other party can trust it. + +3. **It is Recommended to Use Java 17+:** +In JVM 8 versions, there may be a key length restriction, resulting in an "Illegal key size" error. For specific versions (such as jdk.1.8u151+), you can add `Security.`*`setProperty`*`("crypto.policy", "unlimited");`; in the create client of ClientExampleRunner to solve this, or you can download the unlimited package `local_policy.jar` and `US_export_policy ` to replace the packages in the `JDK/jre/lib/security `. Download link:https://www.oracle.com/java/technologies/javase-jce8-downloads.html。 diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-Python-Native-API.md b/src/UserGuide/V2.0.1/Tree/API/Programming-Python-Native-API.md new file mode 100644 index 00000000..b17d73ea --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/Programming-Python-Native-API.md @@ -0,0 +1,732 @@ + + +# Python Native API + +## Requirements + +You have to install thrift (>=0.13) before using the package. + + + +## How to use (Example) + +First, download the package: `pip3 install apache-iotdb` + +You can get an example of using the package to read and write data at here: [Example](https://github.com/apache/iotdb/blob/master/iotdb-client/client-py/SessionExample.py) + +An example of aligned timeseries: [Aligned Timeseries Session Example](https://github.com/apache/iotdb/blob/master/iotdb-client/client-py/SessionAlignedTimeseriesExample.py) + +(you need to add `import iotdb` in the head of the file) + +Or: + +```python +from iotdb.Session import Session + +ip = "127.0.0.1" +port_ = "6667" +username_ = "root" +password_ = "root" +session = Session(ip, port_, username_, password_) +session.open(False) +zone = session.get_time_zone() +session.close() +``` + +## Initialization + +* Initialize a Session + +```python +session = Session( + ip="127.0.0.1", + port="6667", + user="root", + password="root", + fetch_size=1024, + zone_id="UTC+8", + enable_redirection=True +) +``` + +* Initialize a Session to connect multiple nodes + +```python +session = Session.init_from_node_urls( + node_urls=["127.0.0.1:6667", "127.0.0.1:6668", "127.0.0.1:6669"], + user="root", + password="root", + fetch_size=1024, + zone_id="UTC+8", + enable_redirection=True +) +``` + +* Open a session, with a parameter to specify whether to enable RPC compression + +```python +session.open(enable_rpc_compression=False) +``` + +Notice: this RPC compression status of client must comply with that of IoTDB server + +* Close a Session + +```python +session.close() +``` +## Managing Session through SessionPool + +Utilizing SessionPool to manage sessions eliminates the need to worry about session reuse. 
When the number of session connections reaches the maximum capacity of the pool, requests for acquiring a session will be blocked, and you can set the blocking wait time through parameters. After using a session, it should be returned to the SessionPool using the `putBack` method for proper management. + +### Create SessionPool + +```python +pool_config = PoolConfig(host=ip,port=port, user_name=username, + password=password, fetch_size=1024, + time_zone="UTC+8", max_retry=3) +max_pool_size = 5 +wait_timeout_in_ms = 3000 + +# # Create the connection pool +session_pool = SessionPool(pool_config, max_pool_size, wait_timeout_in_ms) +``` +### Create a SessionPool using distributed nodes. +```python +pool_config = PoolConfig(node_urls=node_urls=["127.0.0.1:6667", "127.0.0.1:6668", "127.0.0.1:6669"], user_name=username, + password=password, fetch_size=1024, + time_zone="UTC+8", max_retry=3) +max_pool_size = 5 +wait_timeout_in_ms = 3000 +``` +### Acquiring a session through SessionPool and manually calling PutBack after use + +```python +session = session_pool.get_session() +session.set_storage_group(STORAGE_GROUP_NAME) +session.create_time_series( + TIMESERIES_PATH, TSDataType.BOOLEAN, TSEncoding.PLAIN, Compressor.SNAPPY +) +# After usage, return the session using putBack +session_pool.put_back(session) +# When closing the sessionPool, all managed sessions will be closed as well +session_pool.close() +``` + +## Data Definition Interface (DDL Interface) + +### Database Management + +* CREATE DATABASE + +```python +session.set_storage_group(group_name) +``` + +* Delete one or several databases + +```python +session.delete_storage_group(group_name) +session.delete_storage_groups(group_name_lst) +``` +### Timeseries Management + +* Create one or multiple timeseries + +```python +session.create_time_series(ts_path, data_type, encoding, compressor, + props=None, tags=None, attributes=None, alias=None) + +session.create_multi_time_series( + ts_path_lst, data_type_lst, encoding_lst, compressor_lst, + props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None +) +``` + +* Create aligned timeseries + +```python +session.create_aligned_time_series( + device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst +) +``` + +Attention: Alias of measurements are **not supported** currently. + +* Delete one or several timeseries + +```python +session.delete_time_series(paths_list) +``` + +* Check whether the specific timeseries exists + +```python +session.check_time_series_exists(path) +``` + +## Data Manipulation Interface (DML Interface) + +### Insert + +It is recommended to use insertTablet to help improve write efficiency. + +* Insert a Tablet,which is multiple rows of a device, each row has the same measurements + * **Better Write Performance** + * **Support null values**: fill the null value with any value, and then mark the null value via BitMap (from v0.13) + + +We have two implementations of Tablet in Python API. 
+ +* Normal Tablet + +```python +values_ = [ + [False, 10, 11, 1.1, 10011.1, "test01"], + [True, 100, 11111, 1.25, 101.0, "test02"], + [False, 100, 1, 188.1, 688.25, "test03"], + [True, 0, 0, 0, 6.25, "test04"], +] +timestamps_ = [1, 2, 3, 4] +tablet_ = Tablet( + device_id, measurements_, data_types_, values_, timestamps_ +) +session.insert_tablet(tablet_) + +values_ = [ + [None, 10, 11, 1.1, 10011.1, "test01"], + [True, None, 11111, 1.25, 101.0, "test02"], + [False, 100, None, 188.1, 688.25, "test03"], + [True, 0, 0, 0, None, None], +] +timestamps_ = [16, 17, 18, 19] +tablet_ = Tablet( + device_id, measurements_, data_types_, values_, timestamps_ +) +session.insert_tablet(tablet_) +``` +* Numpy Tablet + +Comparing with Tablet, Numpy Tablet is using [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) to record data. +With less memory footprint and time cost of serialization, the insert performance will be better. + +**Notice** +1. time and numerical value columns in Tablet is ndarray +2. recommended to use the specific dtypes to each ndarray, see the example below + (if not, the default dtypes are also ok). + +```python +import numpy as np +data_types_ = [ + TSDataType.BOOLEAN, + TSDataType.INT32, + TSDataType.INT64, + TSDataType.FLOAT, + TSDataType.DOUBLE, + TSDataType.TEXT, +] +np_values_ = [ + np.array([False, True, False, True], TSDataType.BOOLEAN.np_dtype()), + np.array([10, 100, 100, 0], TSDataType.INT32.np_dtype()), + np.array([11, 11111, 1, 0], TSDataType.INT64.np_dtype()), + np.array([1.1, 1.25, 188.1, 0], TSDataType.FLOAT.np_dtype()), + np.array([10011.1, 101.0, 688.25, 6.25], TSDataType.DOUBLE.np_dtype()), + np.array(["test01", "test02", "test03", "test04"], TSDataType.TEXT.np_dtype()), +] +np_timestamps_ = np.array([1, 2, 3, 4], TSDataType.INT64.np_dtype()) +np_tablet_ = NumpyTablet( + device_id, measurements_, data_types_, np_values_, np_timestamps_ +) +session.insert_tablet(np_tablet_) + +# insert one numpy tablet with None into the database. +np_values_ = [ + np.array([False, True, False, True], TSDataType.BOOLEAN.np_dtype()), + np.array([10, 100, 100, 0], TSDataType.INT32.np_dtype()), + np.array([11, 11111, 1, 0], TSDataType.INT64.np_dtype()), + np.array([1.1, 1.25, 188.1, 0], TSDataType.FLOAT.np_dtype()), + np.array([10011.1, 101.0, 688.25, 6.25], TSDataType.DOUBLE.np_dtype()), + np.array(["test01", "test02", "test03", "test04"], TSDataType.TEXT.np_dtype()), +] +np_timestamps_ = np.array([98, 99, 100, 101], TSDataType.INT64.np_dtype()) +np_bitmaps_ = [] +for i in range(len(measurements_)): + np_bitmaps_.append(BitMap(len(np_timestamps_))) +np_bitmaps_[0].mark(0) +np_bitmaps_[1].mark(1) +np_bitmaps_[2].mark(2) +np_bitmaps_[4].mark(3) +np_bitmaps_[5].mark(3) +np_tablet_with_none = NumpyTablet( + device_id, measurements_, data_types_, np_values_, np_timestamps_, np_bitmaps_ +) +session.insert_tablet(np_tablet_with_none) +``` + +* Insert multiple Tablets + +```python +session.insert_tablets(tablet_lst) +``` + +* Insert a Record + +```python +session.insert_record(device_id, timestamp, measurements_, data_types_, values_) +``` + +* Insert multiple Records + +```python +session.insert_records( + device_ids_, time_list_, measurements_list_, data_type_list_, values_list_ +) +``` + +* Insert multiple Records that belong to the same device. 
  With type info, the server does not need to do type inference, which leads to better performance.

```python
session.insert_records_of_one_device(device_id, time_list, measurements_list, data_types_list, values_list)
```

### Insert with type inference

When the data is of String type, we can use the following interface to perform type inference based on the value itself. For example, if the value is "true", it can be automatically inferred to be a BOOLEAN type; if the value is "3.2", it can be automatically inferred to be a FLOAT type. Without type information, the server has to do type inference, which may cost some time.

* Insert a Record, which contains multiple measurement values of a device at a timestamp

```python
session.insert_str_record(device_id, timestamp, measurements, string_values)
```

### Insert of Aligned Timeseries

Inserting into aligned timeseries uses the insert_aligned_XXX interfaces, which are otherwise similar to the interfaces above:

* insert_aligned_record
* insert_aligned_records
* insert_aligned_records_of_one_device
* insert_aligned_tablet
* insert_aligned_tablets

## IoTDB-SQL Interface

* Execute query statement

```python
session.execute_query_statement(sql)
```

* Execute non-query statement

```python
session.execute_non_query_statement(sql)
```

* Execute statement

```python
session.execute_statement(sql)
```

## Schema Template
### Create Schema Template
The steps for creating a metadata template are as follows:
1. Create the Template class
2. Add MeasurementNodes
3. Execute the create schema template function

```python
template = Template(name=template_name, share_time=True)

m_node_x = MeasurementNode("x", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY)
m_node_y = MeasurementNode("y", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY)
m_node_z = MeasurementNode("z", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY)

template.add_template(m_node_x)
template.add_template(m_node_y)
template.add_template(m_node_z)

session.create_schema_template(template)
```
### Modify Schema Template measurements
To modify measurements in a template, the template must already be created. The following functions add or delete measurement nodes.
+* add node in template +```python +session.add_measurements_in_template(template_name, measurements_path, data_types, encodings, compressors, is_aligned) +``` + +* delete node in template +```python +session.delete_node_in_template(template_name, path) +``` + +### Set Schema Template +```python +session.set_schema_template(template_name, prefix_path) +``` + +### Uset Schema Template +```python +session.unset_schema_template(template_name, prefix_path) +``` + +### Show Schema Template +* Show all schema templates +```python +session.show_all_templates() +``` +* Count all measurements in templates +```python +session.count_measurements_in_template(template_name) +``` + +* Judge whether the path is measurement or not in templates, This measurement must be in the template +```python +session.count_measurements_in_template(template_name, path) +``` + +* Judge whether the path is exist or not in templates, This path may not belong to the template +```python +session.is_path_exist_in_template(template_name, path) +``` + +* Show nodes under in schema template +```python +session.show_measurements_in_template(template_name) +``` + +* Show the path prefix where a schema template is set +```python +session.show_paths_template_set_on(template_name) +``` + +* Show the path prefix where a schema template is used (i.e. the time series has been created) +```python +session.show_paths_template_using_on(template_name) +``` + +### Drop Schema Template +Delete an existing metadata template,dropping an already set template is not supported +```python +session.drop_schema_template("template_python") +``` + + +## Pandas Support + +To easily transform a query result to a [Pandas Dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) +the SessionDataSet has a method `.todf()` which consumes the dataset and transforms it to a pandas dataframe. + +Example: + +```python +from iotdb.Session import Session + +ip = "127.0.0.1" +port_ = "6667" +username_ = "root" +password_ = "root" +session = Session(ip, port_, username_, password_) +session.open(False) +result = session.execute_query_statement("SELECT * FROM root.*") + +# Transform to Pandas Dataset +df = result.todf() + +session.close() + +# Now you can work with the dataframe +df = ... +``` + + +## IoTDB Testcontainer + +The Test Support is based on the lib `testcontainers` (https://testcontainers-python.readthedocs.io/en/latest/index.html) which you need to install in your project if you want to use the feature. + +To start (and stop) an IoTDB Database in a Docker container simply do: +```python +class MyTestCase(unittest.TestCase): + + def test_something(self): + with IoTDBContainer() as c: + session = Session("localhost", c.get_exposed_port(6667), "root", "root") + session.open(False) + result = session.execute_query_statement("SHOW TIMESERIES") + print(result) + session.close() +``` + +by default it will load the image `apache/iotdb:latest`, if you want a specific version just pass it like e.g. `IoTDBContainer("apache/iotdb:0.12.0")` to get version `0.12.0` running. + +## IoTDB DBAPI + +IoTDB DBAPI implements the Python DB API 2.0 specification (https://peps.python.org/pep-0249/), which defines a common +interface for accessing databases in Python. + +### Examples ++ Initialization + +The initialized parameters are consistent with the session part (except for the sqlalchemy_mode). 
+```python +from iotdb.dbapi import connect + +ip = "127.0.0.1" +port_ = "6667" +username_ = "root" +password_ = "root" +conn = connect(ip, port_, username_, password_,fetch_size=1024,zone_id="UTC+8",sqlalchemy_mode=False) +cursor = conn.cursor() +``` ++ simple SQL statement execution +```python +cursor.execute("SELECT ** FROM root") +for row in cursor.fetchall(): + print(row) +``` + ++ execute SQL with parameter + +IoTDB DBAPI supports pyformat style parameters +```python +cursor.execute("SELECT ** FROM root WHERE time < %(time)s",{"time":"2017-11-01T00:08:00.000"}) +for row in cursor.fetchall(): + print(row) +``` + ++ execute SQL with parameter sequences +```python +seq_of_parameters = [ + {"timestamp": 1, "temperature": 1}, + {"timestamp": 2, "temperature": 2}, + {"timestamp": 3, "temperature": 3}, + {"timestamp": 4, "temperature": 4}, + {"timestamp": 5, "temperature": 5}, +] +sql = "insert into root.cursor(timestamp,temperature) values(%(timestamp)s,%(temperature)s)" +cursor.executemany(sql,seq_of_parameters) +``` + ++ close the connection and cursor +```python +cursor.close() +conn.close() +``` + +## IoTDB SQLAlchemy Dialect (Experimental) +The SQLAlchemy dialect of IoTDB is written to adapt to Apache Superset. +This part is still being improved. +Please do not use it in the production environment! +### Mapping of the metadata +The data model used by SQLAlchemy is a relational data model, which describes the relationships between different entities through tables. +While the data model of IoTDB is a hierarchical data model, which organizes the data through a tree structure. +In order to adapt IoTDB to the dialect of SQLAlchemy, the original data model in IoTDB needs to be reorganized. +Converting the data model of IoTDB into the data model of SQLAlchemy. + +The metadata in the IoTDB are: + +1. Database +2. Path +3. Entity +4. Measurement + +The metadata in the SQLAlchemy are: +1. Schema +2. Table +3. 
Column + +The mapping relationship between them is: + +| The metadata in the SQLAlchemy | The metadata in the IoTDB | +| -------------------- | -------------------------------------------- | +| Schema | Database | +| Table | Path ( from database to entity ) + Entity | +| Column | Measurement | + +The following figure shows the relationship between the two more intuitively: + +![sqlalchemy-to-iotdb](https://alioss.timecho.com/docs/img/UserGuide/API/IoTDB-SQLAlchemy/sqlalchemy-to-iotdb.png?raw=true) + +### Data type mapping +| data type in IoTDB | data type in SQLAlchemy | +|--------------------|-------------------------| +| BOOLEAN | Boolean | +| INT32 | Integer | +| INT64 | BigInteger | +| FLOAT | Float | +| DOUBLE | Float | +| TEXT | Text | +| LONG | BigInteger | + +### Example + ++ execute statement + +```python +from sqlalchemy import create_engine + +engine = create_engine("iotdb://root:root@127.0.0.1:6667") +connect = engine.connect() +result = connect.execute("SELECT ** FROM root") +for row in result.fetchall(): + print(row) +``` + ++ ORM (now only simple queries are supported) + +```python +from sqlalchemy import create_engine, Column, Float, BigInteger, MetaData +from sqlalchemy.ext.declarative import declarative_base +from sqlalchemy.orm import sessionmaker + +metadata = MetaData( + schema='root.factory' +) +Base = declarative_base(metadata=metadata) + + +class Device(Base): + __tablename__ = "room2.device1" + Time = Column(BigInteger, primary_key=True) + temperature = Column(Float) + status = Column(Float) + + +engine = create_engine("iotdb://root:root@127.0.0.1:6667") + +DbSession = sessionmaker(bind=engine) +session = DbSession() + +res = session.query(Device.status).filter(Device.temperature > 1) + +for row in res: + print(row) +``` + + +## Developers + +### Introduction + +This is an example of how to connect to IoTDB with python, using the thrift rpc interfaces. Things are almost the same on Windows or Linux, but pay attention to the difference like path separator. + + + +### Prerequisites + +Python3.7 or later is preferred. + +You have to install Thrift (0.11.0 or later) to compile our thrift file into python code. Below is the official tutorial of installation, eventually, you should have a thrift executable. + +``` +http://thrift.apache.org/docs/install/ +``` + +Before starting you need to install `requirements_dev.txt` in your python environment, e.g. by calling +```shell +pip install -r requirements_dev.txt +``` + + + +### Compile the thrift library and Debug + +In the root of IoTDB's source code folder, run `mvn clean generate-sources -pl iotdb-client/client-py -am`. + +This will automatically delete and repopulate the folder `iotdb/thrift` with the generated thrift files. +This folder is ignored from git and should **never be pushed to git!** + +**Notice** Do not upload `iotdb/thrift` to the git repo. + + + + +### Session Client & Example + +We packed up the Thrift interface in `client-py/src/iotdb/Session.py` (similar with its Java counterpart), also provided an example file `client-py/src/SessionExample.py` of how to use the session module. please read it carefully. + + +Or, another simple example: + +```python +from iotdb.Session import Session + +ip = "127.0.0.1" +port_ = "6667" +username_ = "root" +password_ = "root" +session = Session(ip, port_, username_, password_) +session.open(False) +zone = session.get_time_zone() +session.close() +``` + + + +### Tests + +Please add your custom tests in `tests` folder. 
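A minimal custom test might look like the sketch below; it follows the IoTDBContainer pattern from the section above, and the exact import path of `IoTDBContainer` is an assumption here:

```python
# A minimal sketch of a custom test placed under the tests folder.
import unittest

from iotdb.IoTDBContainer import IoTDBContainer  # assumed module path
from iotdb.Session import Session


class MyCustomTest(unittest.TestCase):
    def test_show_timeseries(self):
        # Start a throwaway IoTDB instance in Docker, as in the Testcontainer example above
        with IoTDBContainer() as c:
            session = Session("localhost", c.get_exposed_port(6667), "root", "root")
            session.open(False)
            result = session.execute_query_statement("SHOW TIMESERIES")
            self.assertIsNotNone(result)
            session.close()
```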
To run all defined tests, just type `pytest .` in the root folder.

**Notice** Some tests need docker to be started on your system, as a test instance is started in a docker container using [testcontainers](https://testcontainers-python.readthedocs.io/en/latest/index.html).

### Further Tools

[black](https://pypi.org/project/black/) and [flake8](https://pypi.org/project/flake8/) are installed for autoformatting and linting.
Both can be run by `black .` or `flake8 .` respectively.

## Releasing

To do a release, first ensure that you have the right set of generated thrift files.
Then run linting and auto-formatting.
Then, ensure that all tests pass (via `pytest .`).
Then you are good to go to do a release!

### Preparing your environment

First, install all necessary dev dependencies via `pip install -r requirements_dev.txt`.

### Doing the Release

There is a convenient script `release.sh` that performs all steps for a release.
Namely, these are:

* Remove all transient directories from the last release (if any)
* (Re-)generate all generated sources via mvn
* Run linting (flake8)
* Run tests via pytest
* Build
* Release to pypi

diff --git a/src/UserGuide/V2.0.1/Tree/API/Programming-Rust-Native-API.md b/src/UserGuide/V2.0.1/Tree/API/Programming-Rust-Native-API.md
new file mode 100644
index 00000000..f58df68f
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/API/Programming-Rust-Native-API.md
@@ -0,0 +1,188 @@

# Rust Native API

IoTDB uses Thrift as a cross-language RPC framework, so access to IoTDB can be achieved through the interface provided by Thrift.
This document will introduce how to generate a native Rust interface that can access IoTDB.

## Dependencies

* JDK >= 1.8
* Rust >= 1.0.0
* Thrift >= 0.14.1
* Linux, macOS, or another Unix-like system
* Windows with bash

Thrift (0.14.1 or higher) must be installed to compile Thrift files into Rust code. The following is the official installation tutorial; in the end, you should have a Thrift executable file.

```
http://thrift.apache.org/docs/install/
```

## Compile the Thrift library and generate the Rust native interface

1. Find the `pom.xml` file in the root directory of the IoTDB source code folder.
2. Open the `pom.xml` file and find the following content:
   ```xml
   <execution>
       <id>generate-thrift-sources-python</id>
       <phase>generate-sources</phase>
       <goals>
           <goal>compile</goal>
       </goals>
       <configuration>
           <generator>py</generator>
           <outputDirectory>${project.build.directory}/generated-sources-python/</outputDirectory>
       </configuration>
   </execution>
   ```
3. Duplicate this block and change the `id`, `generator` and `outputDirectory` to this:
   ```xml
   <execution>
       <id>generate-thrift-sources-rust</id>
       <phase>generate-sources</phase>
       <goals>
           <goal>compile</goal>
       </goals>
       <configuration>
           <generator>rs</generator>
           <outputDirectory>${project.build.directory}/generated-sources-rust/</outputDirectory>
       </configuration>
   </execution>
   ```
4. In the root directory of the IoTDB source code folder, run `mvn clean generate-sources`.

This command will automatically delete the files in `iotdb/iotdb-protocol/thrift/target` and `iotdb/iotdb-protocol/thrift-commons/target`, and repopulate those folders with the newly generated files.
The newly generated Rust sources will be located in `iotdb/iotdb-protocol/thrift/target/generated-sources-rust` in the various modules of the `iotdb-protocol` module.
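As a quick sanity check, the generation step and its output can be inspected from the command line; this is only a sketch, and the paths simply repeat the directories mentioned above (relative to the IoTDB source root):

```shell
# Re-run the thrift code generation, including the newly added Rust execution
mvn clean generate-sources

# The generated Rust sources should now be present in these folders
ls iotdb-protocol/thrift/target/generated-sources-rust/
ls iotdb-protocol/thrift-commons/target/generated-sources-rust/
```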
+ +## Using the Rust native interface + +Copy `iotdb/iotdb-protocol/thrift/target/generated-sources-rust/` and `iotdb/iotdb-protocol/thrift-commons/target/generated-sources-rust/` into your project。 + +## RPC interface + +``` +// open a session +TSOpenSessionResp openSession(1:TSOpenSessionReq req); + +// close a session +TSStatus closeSession(1:TSCloseSessionReq req); + +// run an SQL statement in batch +TSExecuteStatementResp executeStatement(1:TSExecuteStatementReq req); + +// execute SQL statement in batch +TSStatus executeBatchStatement(1:TSExecuteBatchStatementReq req); + +// execute query SQL statement +TSExecuteStatementResp executeQueryStatement(1:TSExecuteStatementReq req); + +// execute insert, delete and update SQL statement +TSExecuteStatementResp executeUpdateStatement(1:TSExecuteStatementReq req); + +// fetch next query result +TSFetchResultsResp fetchResults(1:TSFetchResultsReq req) + +// fetch meta data +TSFetchMetadataResp fetchMetadata(1:TSFetchMetadataReq req) + +// cancel a query +TSStatus cancelOperation(1:TSCancelOperationReq req); + +// close a query dataset +TSStatus closeOperation(1:TSCloseOperationReq req); + +// get time zone +TSGetTimeZoneResp getTimeZone(1:i64 sessionId); + +// set time zone +TSStatus setTimeZone(1:TSSetTimeZoneReq req); + +// get server's properties +ServerProperties getProperties(); + +// CREATE DATABASE +TSStatus setStorageGroup(1:i64 sessionId, 2:string storageGroup); + +// create timeseries +TSStatus createTimeseries(1:TSCreateTimeseriesReq req); + +// create multi timeseries +TSStatus createMultiTimeseries(1:TSCreateMultiTimeseriesReq req); + +// delete timeseries +TSStatus deleteTimeseries(1:i64 sessionId, 2:list path) + +// delete sttorage groups +TSStatus deleteStorageGroups(1:i64 sessionId, 2:list storageGroup); + +// insert record +TSStatus insertRecord(1:TSInsertRecordReq req); + +// insert record in string format +TSStatus insertStringRecord(1:TSInsertStringRecordReq req); + +// insert tablet +TSStatus insertTablet(1:TSInsertTabletReq req); + +// insert tablets in batch +TSStatus insertTablets(1:TSInsertTabletsReq req); + +// insert records in batch +TSStatus insertRecords(1:TSInsertRecordsReq req); + +// insert records of one device +TSStatus insertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// insert records in batch as string format +TSStatus insertStringRecords(1:TSInsertStringRecordsReq req); + +// test the latency of innsert tablet,caution:no data will be inserted, only for test latency +TSStatus testInsertTablet(1:TSInsertTabletReq req); + +// test the latency of innsert tablets,caution:no data will be inserted, only for test latency +TSStatus testInsertTablets(1:TSInsertTabletsReq req); + +// test the latency of innsert record,caution:no data will be inserted, only for test latency +TSStatus testInsertRecord(1:TSInsertRecordReq req); + +// test the latency of innsert record in string format,caution:no data will be inserted, only for test latency +TSStatus testInsertStringRecord(1:TSInsertStringRecordReq req); + +// test the latency of innsert records,caution:no data will be inserted, only for test latency +TSStatus testInsertRecords(1:TSInsertRecordsReq req); + +// test the latency of innsert records of one device,caution:no data will be inserted, only for test latency +TSStatus testInsertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// test the latency of innsert records in string formate,caution:no data will be inserted, only for test latency +TSStatus 
testInsertStringRecords(1:TSInsertStringRecordsReq req); + +// delete data +TSStatus deleteData(1:TSDeleteDataReq req); + +// execute raw data query +TSExecuteStatementResp executeRawDataQuery(1:TSRawDataQueryReq req); + +// request a statement id from server +i64 requestStatementId(1:i64 sessionId); +``` diff --git a/src/UserGuide/V2.0.1/Tree/API/RestServiceV1.md b/src/UserGuide/V2.0.1/Tree/API/RestServiceV1.md new file mode 100644 index 00000000..738448e8 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/RestServiceV1.md @@ -0,0 +1,930 @@ + + +# RESTful API V1(Not Recommend) +IoTDB's RESTful services can be used for query, write, and management operations, using the OpenAPI standard to define interfaces and generate frameworks. + +## Enable RESTful Services + +RESTful services are disabled by default. + +* Developer + + Find the `IoTDBrestServiceConfig` class under `org.apache.iotdb.db.conf.rest` in the sever module, and modify `enableRestService=true`. + +* User + + Find the `conf/conf/iotdb-system.properties` file under the IoTDB installation directory and set `enable_rest_service` to `true` to enable the module. + + ```properties + enable_rest_service=true + ``` + +## Authentication +Except the liveness probe API `/ping`, RESTful services use the basic authentication. Each URL request needs to carry `'Authorization': 'Basic ' + base64.encode(username + ':' + password)`. + +The username used in the following examples is: `root`, and password is: `root`. + +And the authorization header is + +``` +Authorization: Basic cm9vdDpyb290 +``` + +- If a user authorized with incorrect username or password, the following error is returned: + + HTTP Status Code:`401` + + HTTP response body: + ```json + { + "code": 600, + "message": "WRONG_LOGIN_PASSWORD_ERROR" + } + ``` + +- If the `Authorization` header is missing,the following error is returned: + + HTTP Status Code:`401` + + HTTP response body: + ```json + { + "code": 603, + "message": "UNINITIALIZED_AUTH_ERROR" + } + ``` + +## Interface + +### ping + +The `/ping` API can be used for service liveness probing. + +Request method: `GET` + +Request path: `http://ip:port/ping` + +The user name used in the example is: root, password: root + +Example request: + +```shell +$ curl http://127.0.0.1:18080/ping +``` + +Response status codes: + +- `200`: The service is alive. +- `503`: The service cannot accept any requests now. + +Response parameters: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +|code | integer | status code | +| message | string | message | + +Sample response: + +- With HTTP status code `200`: + + ```json + { + "code": 200, + "message": "SUCCESS_STATUS" + } + ``` + +- With HTTP status code `503`: + + ```json + { + "code": 500, + "message": "thrift service is unavailable" + } + ``` + +> `/ping` can be accessed without authorization. + +### query + +The query interface can be used to handle data queries and metadata queries. + +Request method: `POST` + +Request header: `application/json` + +Request path: `http://ip:port/rest/v1/query` + +Parameter Description: + +| parameter name | parameter type | required | parameter description | +|----------------| -------------- | -------- | ------------------------------------------------------------ | +| sql | string | yes | | +| rowLimit | integer | no | The maximum number of rows in the result set that can be returned by a query.
If this parameter is not set, the `rest_query_default_row_size_limit` of the configuration file will be used as the default value.
When the number of rows in the returned result set exceeds the limit, the status code `411` will be returned. | + +Response parameters: + +| parameter name | parameter type | parameter description | +|----------------| -------------- | ------------------------------------------------------------ | +| expressions | array | Array of result set column names for data query, `null` for metadata query | +| columnNames | array | Array of column names for metadata query result set, `null` for data query | +| timestamps | array | Timestamp column, `null` for metadata query | +| values | array | A two-dimensional array, the first dimension has the same length as the result set column name array, and the second dimension array represents a column of the result set | + +**Examples:** + +Tip: Statements like `select * from root.xx.**` are not recommended because those statements may cause OOM. + +**Expression query** + + ```shell + curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4, s3 + 1 from root.sg27 limit 2"}' http://127.0.0.1:18080/rest/v1/query + ```` +Response instance + ```json + { + "expressions": [ + "root.sg27.s3", + "root.sg27.s4", + "root.sg27.s3 + 1" + ], + "columnNames": null, + "timestamps": [ + 1635232143960, + 1635232153960 + ], + "values": [ + [ + 11, + null + ], + [ + false, + true + ], + [ + 12.0, + null + ] + ] + } + ``` + +**Show child paths** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child paths root"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "child paths" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ] + ] +} +``` + +**Show child nodes** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child nodes root"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "child nodes" + ], + "timestamps": null, + "values": [ + [ + "sg27", + "sg28" + ] + ] +} +``` + +**Show all ttl** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show all ttl"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "database", + "ttl" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + null, + null + ] + ] +} +``` + +**Show ttl** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show ttl on root.sg27"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "database", + "ttl" + ], + "timestamps": null, + "values": [ + [ + "root.sg27" + ], + [ + null + ] + ] +} +``` + +**Show functions** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show functions"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "function name", + "function type", + "class name (UDF)" + ], + "timestamps": null, + "values": [ + [ + "ABS", + "ACOS", + "ASIN", + ... + ], + [ + "built-in UDTF", + "built-in UDTF", + "built-in UDTF", + ... + ], + [ + "org.apache.iotdb.db.query.udf.builtin.UDTFAbs", + "org.apache.iotdb.db.query.udf.builtin.UDTFAcos", + "org.apache.iotdb.db.query.udf.builtin.UDTFAsin", + ... 
+ ] + ] +} +``` + +**Show timeseries** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show timeseries"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "timeseries", + "alias", + "database", + "dataType", + "encoding", + "compression", + "tags", + "attributes" + ], + "timestamps": null, + "values": [ + [ + "root.sg27.s3", + "root.sg27.s4", + "root.sg28.s3", + "root.sg28.s4" + ], + [ + null, + null, + null, + null + ], + [ + "root.sg27", + "root.sg27", + "root.sg28", + "root.sg28" + ], + [ + "INT32", + "BOOLEAN", + "INT32", + "BOOLEAN" + ], + [ + "RLE", + "RLE", + "RLE", + "RLE" + ], + [ + "SNAPPY", + "SNAPPY", + "SNAPPY", + "SNAPPY" + ], + [ + null, + null, + null, + null + ], + [ + null, + null, + null, + null + ] + ] +} +``` + +**Show latest timeseries** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show latest timeseries"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "timeseries", + "alias", + "database", + "dataType", + "encoding", + "compression", + "tags", + "attributes" + ], + "timestamps": null, + "values": [ + [ + "root.sg28.s4", + "root.sg27.s4", + "root.sg28.s3", + "root.sg27.s3" + ], + [ + null, + null, + null, + null + ], + [ + "root.sg28", + "root.sg27", + "root.sg28", + "root.sg27" + ], + [ + "BOOLEAN", + "BOOLEAN", + "INT32", + "INT32" + ], + [ + "RLE", + "RLE", + "RLE", + "RLE" + ], + [ + "SNAPPY", + "SNAPPY", + "SNAPPY", + "SNAPPY" + ], + [ + null, + null, + null, + null + ], + [ + null, + null, + null, + null + ] + ] +} +``` + +**Count timeseries** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count timeseries root.**"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "count" + ], + "timestamps": null, + "values": [ + [ + 4 + ] + ] +} +``` + +**Count nodes** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count nodes root.** level=2"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "count" + ], + "timestamps": null, + "values": [ + [ + 4 + ] + ] +} +``` + +**Show devices** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "devices", + "isAligned" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + "false", + "false" + ] + ] +} +``` + +**Show devices with database** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices with database"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "devices", + "database", + "isAligned" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + "root.sg27", + "root.sg28" + ], + [ + "false", + "false" + ] + ] +} +``` + +**List user** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"list user"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "user" + ], + "timestamps": null, + "values": 
[ + [ + "root" + ] + ] +} +``` + +**Aggregation** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": [ + "count(root.sg27.s3)", + "count(root.sg27.s4)" + ], + "columnNames": null, + "timestamps": [ + 0 + ], + "values": [ + [ + 1 + ], + [ + 2 + ] + ] +} +``` + +**Group by level** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.** group by level = 1"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "count(root.sg27.*)", + "count(root.sg28.*)" + ], + "timestamps": null, + "values": [ + [ + 3 + ], + [ + 3 + ] + ] +} +``` + +**Group by** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27 group by([1635232143960,1635232153960),1s)"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": [ + "count(root.sg27.s3)", + "count(root.sg27.s4)" + ], + "columnNames": null, + "timestamps": [ + 1635232143960, + 1635232144960, + 1635232145960, + 1635232146960, + 1635232147960, + 1635232148960, + 1635232149960, + 1635232150960, + 1635232151960, + 1635232152960 + ], + "values": [ + [ + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ], + [ + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ] + ] +} +``` + +**Last** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select last s3 from root.sg27"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "expressions": null, + "columnNames": [ + "timeseries", + "value", + "dataType" + ], + "timestamps": [ + 1635232143960 + ], + "values": [ + [ + "root.sg27.s3" + ], + [ + "11" + ], + [ + "INT32" + ] + ] +} +``` + +**Disable align** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select * from root.sg27 disable align"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "code": 407, + "message": "disable align clauses are not supported." +} +``` + +**Align by device** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(s3) from root.sg27 align by device"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "code": 407, + "message": "align by device clauses are not supported." +} +``` + +**Select into** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4 into root.sg29.s1, root.sg29.s2 from root.sg27"}' http://127.0.0.1:18080/rest/v1/query +``` + +```json +{ + "code": 407, + "message": "select into clauses are not supported." 
+} +``` + +### nonQuery + +Request method: `POST` + +Request header: `application/json` + +Request path: `http://ip:port/rest/v1/nonQuery` + +Parameter Description: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +| sql | string | query content | + +Example request: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"CREATE DATABASE root.ln"}' http://127.0.0.1:18080/rest/v1/nonQuery +``` + +Response parameters: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +| code | integer | status code | +| message | string | message | + +Sample response: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + + + +### insertTablet + +Request method: `POST` + +Request header: `application/json` + +Request path: `http://ip:port/rest/v1/insertTablet` + +Parameter Description: + +| parameter name |parameter type |is required|parameter describe| +|:---------------| :--- | :---| :---| +| timestamps | array | yes | Time column | +| measurements | array | yes | The name of the measuring point | +| dataTypes | array | yes | The data type | +| values | array | yes | Value columns, the values in each column can be `null` | +| isAligned | boolean | yes | Whether to align the timeseries | +| deviceId | string | yes | Device name | + +Example request: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232143960,1635232153960],"measurements":["s3","s4"],"dataTypes":["INT32","BOOLEAN"],"values":[[11,null],[false,true]],"isAligned":false,"deviceId":"root.sg27"}' http://127.0.0.1:18080/rest/v1/insertTablet +``` + +Sample response: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +| code | integer | status code | +| message | string | message | + +Sample response: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + +## Configuration + +The configuration is located in 'iotdb-system.properties'. + +* Set 'enable_rest_service' to 'true' to enable the module, and 'false' to disable the module. By default, this value is' false '. + +```properties +enable_rest_service=true +``` + +* This parameter is valid only when 'enable_REST_service =true'. Set 'rest_service_port' to a number (1025 to 65535) to customize the REST service socket port. By default, the value is 18080. + +```properties +rest_service_port=18080 +``` + +* Set 'enable_swagger' to 'true' to display rest service interface information through swagger, and 'false' to do not display the rest service interface information through the swagger. By default, this value is' false '. + +```properties +enable_swagger=false +``` + +* The maximum number of rows in the result set that can be returned by a query. When the number of rows in the returned result set exceeds the limit, the status code `411` is returned. + +````properties +rest_query_default_row_size_limit=10000 +```` + +* Expiration time for caching customer login information (used to speed up user authentication, in seconds, 8 hours by default) + +```properties +cache_expire=28800 +``` + + +* Maximum number of users stored in the cache (default: 100) + +```properties +cache_max_num=100 +``` + +* Initial cache size (default: 10) + +```properties +cache_init_num=10 +``` + +* REST Service whether to enable SSL configuration, set 'enable_https' to' true 'to enable the module, and set' false 'to disable the module. By default, this value is' false '. 
+ +```properties +enable_https=false +``` + +* keyStore location path (optional) + +```properties +key_store_path= +``` + + +* keyStore password (optional) + +```properties +key_store_pwd= +``` + + +* trustStore location path (optional) + +```properties +trust_store_path= +``` + +* trustStore password (optional) + +```properties +trust_store_pwd= +``` + + +* SSL timeout period, in seconds + +```properties +idle_timeout=5000 +``` diff --git a/src/UserGuide/V2.0.1/Tree/API/RestServiceV2.md b/src/UserGuide/V2.0.1/Tree/API/RestServiceV2.md new file mode 100644 index 00000000..b4c733fb --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/API/RestServiceV2.md @@ -0,0 +1,970 @@ + + +# RESTful API V2 +IoTDB's RESTful services can be used for query, write, and management operations, using the OpenAPI standard to define interfaces and generate frameworks. + +## Enable RESTful Services + +RESTful services are disabled by default. + +* Developer + + Find the `IoTDBrestServiceConfig` class under `org.apache.iotdb.db.conf.rest` in the sever module, and modify `enableRestService=true`. + +* User + + Find the `conf/iotdb-system.properties` file under the IoTDB installation directory and set `enable_rest_service` to `true` to enable the module. + + ```properties + enable_rest_service=true + ``` + +## Authentication +Except the liveness probe API `/ping`, RESTful services use the basic authentication. Each URL request needs to carry `'Authorization': 'Basic ' + base64.encode(username + ':' + password)`. + +The username used in the following examples is: `root`, and password is: `root`. + +And the authorization header is + +``` +Authorization: Basic cm9vdDpyb290 +``` + +- If a user authorized with incorrect username or password, the following error is returned: + + HTTP Status Code:`401` + + HTTP response body: + ```json + { + "code": 600, + "message": "WRONG_LOGIN_PASSWORD_ERROR" + } + ``` + +- If the `Authorization` header is missing,the following error is returned: + + HTTP Status Code:`401` + + HTTP response body: + ```json + { + "code": 603, + "message": "UNINITIALIZED_AUTH_ERROR" + } + ``` + +## Interface + +### ping + +The `/ping` API can be used for service liveness probing. + +Request method: `GET` + +Request path: `http://ip:port/ping` + +The user name used in the example is: root, password: root + +Example request: + +```shell +$ curl http://127.0.0.1:18080/ping +``` + +Response status codes: + +- `200`: The service is alive. +- `503`: The service cannot accept any requests now. + +Response parameters: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +|code | integer | status code | +| message | string | message | + +Sample response: + +- With HTTP status code `200`: + + ```json + { + "code": 200, + "message": "SUCCESS_STATUS" + } + ``` + +- With HTTP status code `503`: + + ```json + { + "code": 500, + "message": "thrift service is unavailable" + } + ``` + +> `/ping` can be accessed without authorization. + +### query + +The query interface can be used to handle data queries and metadata queries. + +Request method: `POST` + +Request header: `application/json` + +Request path: `http://ip:port/rest/v2/query` + +Parameter Description: + +| parameter name | parameter type | required | parameter description | +|----------------| -------------- | -------- | ------------------------------------------------------------ | +| sql | string | yes | | +| row_limit | integer | no | The maximum number of rows in the result set that can be returned by a query.
If this parameter is not set, the `rest_query_default_row_size_limit` of the configuration file will be used as the default value.
When the number of rows in the returned result set exceeds the limit, the status code `411` will be returned. | + +Response parameters: + +| parameter name | parameter type | parameter description | +|----------------| -------------- | ------------------------------------------------------------ | +| expressions | array | Array of result set column names for data query, `null` for metadata query | +| column_names | array | Array of column names for metadata query result set, `null` for data query | +| timestamps | array | Timestamp column, `null` for metadata query | +| values | array | A two-dimensional array, the first dimension has the same length as the result set column name array, and the second dimension array represents a column of the result set | + +**Examples:** + +Tip: Statements like `select * from root.xx.**` are not recommended because those statements may cause OOM. + +**Expression query** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4, s3 + 1 from root.sg27 limit 2"}' http://127.0.0.1:18080/rest/v2/query +```` + +```json +{ + "expressions": [ + "root.sg27.s3", + "root.sg27.s4", + "root.sg27.s3 + 1" + ], + "column_names": null, + "timestamps": [ + 1635232143960, + 1635232153960 + ], + "values": [ + [ + 11, + null + ], + [ + false, + true + ], + [ + 12.0, + null + ] + ] +} +``` + +**Show child paths** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child paths root"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "child paths" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ] + ] +} +``` + +**Show child nodes** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child nodes root"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "child nodes" + ], + "timestamps": null, + "values": [ + [ + "sg27", + "sg28" + ] + ] +} +``` + +**Show all ttl** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show all ttl"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "database", + "ttl" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + null, + null + ] + ] +} +``` + +**Show ttl** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show ttl on root.sg27"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "database", + "ttl" + ], + "timestamps": null, + "values": [ + [ + "root.sg27" + ], + [ + null + ] + ] +} +``` + +**Show functions** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show functions"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "function name", + "function type", + "class name (UDF)" + ], + "timestamps": null, + "values": [ + [ + "ABS", + "ACOS", + "ASIN", + ... + ], + [ + "built-in UDTF", + "built-in UDTF", + "built-in UDTF", + ... + ], + [ + "org.apache.iotdb.db.query.udf.builtin.UDTFAbs", + "org.apache.iotdb.db.query.udf.builtin.UDTFAcos", + "org.apache.iotdb.db.query.udf.builtin.UDTFAsin", + ... 
+ ] + ] +} +``` + +**Show timeseries** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show timeseries"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "timeseries", + "alias", + "database", + "dataType", + "encoding", + "compression", + "tags", + "attributes" + ], + "timestamps": null, + "values": [ + [ + "root.sg27.s3", + "root.sg27.s4", + "root.sg28.s3", + "root.sg28.s4" + ], + [ + null, + null, + null, + null + ], + [ + "root.sg27", + "root.sg27", + "root.sg28", + "root.sg28" + ], + [ + "INT32", + "BOOLEAN", + "INT32", + "BOOLEAN" + ], + [ + "RLE", + "RLE", + "RLE", + "RLE" + ], + [ + "SNAPPY", + "SNAPPY", + "SNAPPY", + "SNAPPY" + ], + [ + null, + null, + null, + null + ], + [ + null, + null, + null, + null + ] + ] +} +``` + +**Show latest timeseries** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show latest timeseries"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "timeseries", + "alias", + "database", + "dataType", + "encoding", + "compression", + "tags", + "attributes" + ], + "timestamps": null, + "values": [ + [ + "root.sg28.s4", + "root.sg27.s4", + "root.sg28.s3", + "root.sg27.s3" + ], + [ + null, + null, + null, + null + ], + [ + "root.sg28", + "root.sg27", + "root.sg28", + "root.sg27" + ], + [ + "BOOLEAN", + "BOOLEAN", + "INT32", + "INT32" + ], + [ + "RLE", + "RLE", + "RLE", + "RLE" + ], + [ + "SNAPPY", + "SNAPPY", + "SNAPPY", + "SNAPPY" + ], + [ + null, + null, + null, + null + ], + [ + null, + null, + null, + null + ] + ] +} +``` + +**Count timeseries** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count timeseries root.**"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "count" + ], + "timestamps": null, + "values": [ + [ + 4 + ] + ] +} +``` + +**Count nodes** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count nodes root.** level=2"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "count" + ], + "timestamps": null, + "values": [ + [ + 4 + ] + ] +} +``` + +**Show devices** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "devices", + "isAligned" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + "false", + "false" + ] + ] +} +``` + +**Show devices with database** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices with database"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "devices", + "database", + "isAligned" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + "root.sg27", + "root.sg28" + ], + [ + "false", + "false" + ] + ] +} +``` + +**List user** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"list user"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "user" + ], + "timestamps": null, + 
"values": [ + [ + "root" + ] + ] +} +``` + +**Aggregation** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": [ + "count(root.sg27.s3)", + "count(root.sg27.s4)" + ], + "column_names": null, + "timestamps": [ + 0 + ], + "values": [ + [ + 1 + ], + [ + 2 + ] + ] +} +``` + +**Group by level** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.** group by level = 1"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "count(root.sg27.*)", + "count(root.sg28.*)" + ], + "timestamps": null, + "values": [ + [ + 3 + ], + [ + 3 + ] + ] +} +``` + +**Group by** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27 group by([1635232143960,1635232153960),1s)"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": [ + "count(root.sg27.s3)", + "count(root.sg27.s4)" + ], + "column_names": null, + "timestamps": [ + 1635232143960, + 1635232144960, + 1635232145960, + 1635232146960, + 1635232147960, + 1635232148960, + 1635232149960, + 1635232150960, + 1635232151960, + 1635232152960 + ], + "values": [ + [ + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ], + [ + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ] + ] +} +``` + +**Last** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select last s3 from root.sg27"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "expressions": null, + "column_names": [ + "timeseries", + "value", + "dataType" + ], + "timestamps": [ + 1635232143960 + ], + "values": [ + [ + "root.sg27.s3" + ], + [ + "11" + ], + [ + "INT32" + ] + ] +} +``` + +**Disable align** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select * from root.sg27 disable align"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "code": 407, + "message": "disable align clauses are not supported." +} +``` + +**Align by device** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(s3) from root.sg27 align by device"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "code": 407, + "message": "align by device clauses are not supported." +} +``` + +**Select into** + +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4 into root.sg29.s1, root.sg29.s2 from root.sg27"}' http://127.0.0.1:18080/rest/v2/query +``` + +```json +{ + "code": 407, + "message": "select into clauses are not supported." 
+} +``` + +### nonQuery + +Request method: `POST` + +Request header: `application/json` + +Request path: `http://ip:port/rest/v2/nonQuery` + +Parameter Description: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +| sql | string | query content | + +Example request: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"CREATE DATABASE root.ln"}' http://127.0.0.1:18080/rest/v2/nonQuery +``` + +Response parameters: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +| code | integer | status code | +| message | string | message | + +Sample response: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + + + +### insertTablet + +Request method: `POST` + +Request header: `application/json` + +Request path: `http://ip:port/rest/v2/insertTablet` + +Parameter Description: + +| parameter name |parameter type |is required|parameter describe| +|:---------------| :--- | :---| :---| +| timestamps | array | yes | Time column | +| measurements | array | yes | The name of the measuring point | +| data_types | array | yes | The data type | +| values | array | yes | Value columns, the values in each column can be `null` | +| is_aligned | boolean | yes | Whether to align the timeseries | +| device | string | yes | Device name | + +Example request: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232143960,1635232153960],"measurements":["s3","s4"],"data_types":["INT32","BOOLEAN"],"values":[[11,null],[false,true]],"is_aligned":false,"device":"root.sg27"}' http://127.0.0.1:18080/rest/v2/insertTablet +``` + +Sample response: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +| code | integer | status code | +| message | string | message | + +Sample response: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + +### insertRecords + +Request method: `POST` + +Request header: `application/json` + +Request path: `http://ip:port/rest/v2/insertRecords` + +Parameter Description: + +| parameter name |parameter type |is required|parameter describe| +|:------------------| :--- | :---| :---| +| timestamps | array | yes | Time column | +| measurements_list | array | yes | The name of the measuring point | +| data_types_list | array | yes | The data type | +| values_list | array | yes | Value columns, the values in each column can be `null` | +| devices | string | yes | Device name | +| is_aligned | boolean | yes | Whether to align the timeseries | + +Example request: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232113960,1635232151960,1635232143960,1635232143960],"measurements_list":[["s33","s44"],["s55","s66"],["s77","s88"],["s771","s881"]],"data_types_list":[["INT32","INT64"],["FLOAT","DOUBLE"],["FLOAT","DOUBLE"],["BOOLEAN","TEXT"]],"values_list":[[1,11],[2.1,2],[4,6],[false,"cccccc"]],"is_aligned":false,"devices":["root.s1","root.s1","root.s1","root.s3"]}' http://127.0.0.1:18080/rest/v2/insertRecords +``` + +Sample response: + +|parameter name |parameter type |parameter describe| +|:--- | :--- | :---| +| code | integer | status code | +| message | string | message | + +Sample response: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + + +## Configuration + +The configuration is located in 'iotdb-system.properties'. 
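Taken together, a minimal configuration that enables the REST service on its default port might look like the following sketch; each property is explained item by item below:

```properties
# Enable the RESTful service (disabled by default)
enable_rest_service=true

# Socket port of the REST service (1025 - 65535, default 18080)
rest_service_port=18080
```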
+ +* Set 'enable_rest_service' to 'true' to enable the module, and 'false' to disable the module. By default, this value is' false '. + +```properties +enable_rest_service=true +``` + +* This parameter is valid only when 'enable_REST_service =true'. Set 'rest_service_port' to a number (1025 to 65535) to customize the REST service socket port. By default, the value is 18080. + +```properties +rest_service_port=18080 +``` + +* Set 'enable_swagger' to 'true' to display rest service interface information through swagger, and 'false' to do not display the rest service interface information through the swagger. By default, this value is' false '. + +```properties +enable_swagger=false +``` + +* The maximum number of rows in the result set that can be returned by a query. When the number of rows in the returned result set exceeds the limit, the status code `411` is returned. + +````properties +rest_query_default_row_size_limit=10000 +```` + +* Expiration time for caching customer login information (used to speed up user authentication, in seconds, 8 hours by default) + +```properties +cache_expire=28800 +``` + + +* Maximum number of users stored in the cache (default: 100) + +```properties +cache_max_num=100 +``` + +* Initial cache size (default: 10) + +```properties +cache_init_num=10 +``` + +* REST Service whether to enable SSL configuration, set 'enable_https' to' true 'to enable the module, and set' false 'to disable the module. By default, this value is' false '. + +```properties +enable_https=false +``` + +* keyStore location path (optional) + +```properties +key_store_path= +``` + + +* keyStore password (optional) + +```properties +key_store_pwd= +``` + + +* trustStore location path (optional) + +```properties +trust_store_path= +``` + +* trustStore password (optional) + +```properties +trust_store_pwd= +``` + + +* SSL timeout period, in seconds + +```properties +idle_timeout=5000 +``` diff --git a/src/UserGuide/V2.0.1/Tree/Background-knowledge/Cluster-Concept.md b/src/UserGuide/V2.0.1/Tree/Background-knowledge/Cluster-Concept.md new file mode 100644 index 00000000..d6f57bf2 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Background-knowledge/Cluster-Concept.md @@ -0,0 +1,59 @@ + + +# Cluster-related Concepts +The figure below illustrates a typical IoTDB 3C3D1A cluster deployment mode, comprising 3 ConfigNodes, 3 DataNodes, and 1 AINode: + + +This deployment involves several key concepts that users commonly encounter when working with IoTDB clusters, including: +- **Nodes** (ConfigNode, DataNode, AINode); +- **Slots** (SchemaSlot, DataSlot); +- **Regions** (SchemaRegion, DataRegion); +- **Replica Groups**. + +The following sections will provide a detailed introduction to these concepts. + +## Nodes + +An IoTDB cluster consists of three types of nodes (processes): **ConfigNode** (the main node), **DataNode**, and **AINode**, as detailed below: +- **ConfigNode:** ConfigNodes store cluster configurations, database metadata, the routing information of time series' schema and data. They also monitor cluster nodes and conduct load balancing. All ConfigNodes maintain full mutual backups, as shown in the figure with ConfigNode-1, ConfigNode-2, and ConfigNode-3. ConfigNodes do not directly handle client read or write requests. Instead, they guide the distribution of time series' schema and data within the cluster using a series of [load balancing algorithms](../Technical-Insider/Cluster-data-partitioning.md). 
+- **DataNode:** DataNodes are responsible for reading and writing time series' schema and data. Each DataNode can accept client read and write requests and provide corresponding services, as illustrated with DataNode-1, DataNode-2, and DataNode-3 in the above figure. When a DataNode receives client requests, it can process them directly or forward them if it has the relevant routing information cached locally. Otherwise, it queries the ConfigNode for routing details and caches the information to improve the efficiency of subsequent requests. +- **AINode:** AINodes interact with ConfigNodes and DataNodes to extend IoTDB's capabilities for data intelligence analysis on time series data. They support registering pre-trained machine learning models from external sources and performing time series analysis tasks using simple SQL statements on specified data. This process integrates model creation, management, and inference within the database engine. Currently, the system provides built-in algorithms or self-training models for common time series analysis scenarios, such as forecasting and anomaly detection. + +## Slots + +IoTDB divides time series' schema and data into smaller, more manageable units called **slots**. Slots are logical entities, and in an IoTDB cluster, the **SchemaSlots** and **DataSlots** are defined as follows: +- **SchemaSlot:** A SchemaSlot represents a subset of the time series' schema collection. The total number of SchemaSlots is fixed, with a default value of 1000. IoTDB uses a hashing algorithm to evenly distribute all devices across these SchemaSlots. +- **DataSlot:** A DataSlot represents a subset of the time series' data collection. Based on the SchemaSlots, the data for corresponding devices is further divided into DataSlots by a fixed time interval. The default time interval for a DataSlot is 7 days. + +## Region + +In IoTDB, time series' schema and data are replicated across DataNodes to ensure high availability in the cluster. However, replicating data at the slot level can increase management complexity and reduce write throughput. To address this, IoTDB introduces the concept of **Region**, which groups SchemaSlots and DataSlots into **SchemaRegions** and **DataRegions** respectively. Replication is then performed at the Region level. The definitions of SchemaRegion and DataRegion are as follows: +- **SchemaRegion**: A SchemaRegion is the basic unit for storing and replicating time series' schema. All SchemaSlots in a database are evenly distributed across the database's SchemaRegions. SchemaRegions with the same RegionID are replicas of each other. For example, in the figure above, SchemaRegion-1 has three replicas located on DataNode-1, DataNode-2, and DataNode-3. +- **DataRegion**: A DataRegion is the basic unit for storing and replicating time series' data. All DataSlots in a database are evenly distributed across the database's DataRegions. DataRegions with the same RegionID are replicas of each other. For instance, in the figure above, DataRegion-2 has two replicas located on DataNode-1 and DataNode-2. + +## Replica Groups +Region replicas are critical for the fault tolerance of the cluster. Each Region's replicas are organized into **replica groups**, where the replicas are assigned roles as either **leader** or **follower**, working together to provide read and write services. 
Recommended replica group configurations under different architectures are as follows: + +| Category | Parameter | Single-node Recommended Configuration | Distributed Recommended Configuration | +|:------------:|:-----------------------:|:------------------------------------:|:-------------------------------------:| +| Schema | `schema_replication_factor` | 1 | 3 | +| Data | `data_replication_factor` | 1 | 2 | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Background-knowledge/Data-Type.md b/src/UserGuide/V2.0.1/Tree/Background-knowledge/Data-Type.md new file mode 100644 index 00000000..846e8067 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Background-knowledge/Data-Type.md @@ -0,0 +1,184 @@ + + +# Data Type + +## Basic Data Type + +IoTDB supports the following data types: + +* BOOLEAN (Boolean) +* INT32 (Integer) +* INT64 (Long Integer) +* FLOAT (Single Precision Floating Point) +* DOUBLE (Double Precision Floating Point) +* TEXT (Long String) +* STRING(String) +* BLOB(Large binary Object) +* TIMESTAMP(Timestamp) +* DATE(Date) + +The difference between STRING and TEXT types is that STRING type has more statistical information and can be used to optimize value filtering queries, while TEXT type is suitable for storing long strings. + +### Float Precision + +The time series of **FLOAT** and **DOUBLE** type can specify (MAX\_POINT\_NUMBER, see [this page](../SQL-Manual/SQL-Manual.md) for more information on how to specify), which is the number of digits after the decimal point of the floating point number, if the encoding method is [RLE](Encoding-and-Compression.md) or [TS\_2DIFF](Encoding-and-Compression.md). If MAX\_POINT\_NUMBER is not specified, the system will use [float\_precision](../Reference/DataNode-Config-Manual.md) in the configuration file `iotdb-system.properties`. + +```sql +CREATE TIMESERIES root.vehicle.d0.s0 WITH DATATYPE=FLOAT, ENCODING=RLE, 'MAX_POINT_NUMBER'='2'; +``` + +* For Float data value, The data range is (-Integer.MAX_VALUE, Integer.MAX_VALUE), rather than Float.MAX_VALUE, and the max_point_number is 19, caused by the limition of function Math.round(float) in Java. +* For Double data value, The data range is (-Long.MAX_VALUE, Long.MAX_VALUE), rather than Double.MAX_VALUE, and the max_point_number is 19, caused by the limition of function Math.round(double) in Java (Long.MAX_VALUE=9.22E18). + +### Data Type Compatibility + +When the written data type is inconsistent with the data type of time-series, +- If the data type of time-series is not compatible with the written data type, the system will give an error message. +- If the data type of time-series is compatible with the written data type, the system will automatically convert the data type. + +The compatibility of each data type is shown in the following table: + +| Series Data Type | Supported Written Data Types | +|------------------|------------------------------| +| BOOLEAN | BOOLEAN | +| INT32 | INT32 | +| INT64 | INT32 INT64 | +| FLOAT | INT32 FLOAT | +| DOUBLE | INT32 INT64 FLOAT DOUBLE | +| TEXT | TEXT | + +## Timestamp + +The timestamp is the time point at which data is produced. It includes absolute timestamps and relative timestamps + +### Absolute timestamp + +Absolute timestamps in IoTDB are divided into two types: LONG and DATETIME (including DATETIME-INPUT and DATETIME-DISPLAY). When a user inputs a timestamp, he can use a LONG type timestamp or a DATETIME-INPUT type timestamp, and the supported formats of the DATETIME-INPUT type timestamp are shown in the table below: + +
+ +**Supported formats of DATETIME-INPUT type timestamp** + + + +| Format | +| :--------------------------: | +| yyyy-MM-dd HH:mm:ss | +| yyyy/MM/dd HH:mm:ss | +| yyyy.MM.dd HH:mm:ss | +| yyyy-MM-dd HH:mm:ssZZ | +| yyyy/MM/dd HH:mm:ssZZ | +| yyyy.MM.dd HH:mm:ssZZ | +| yyyy/MM/dd HH:mm:ss.SSS | +| yyyy-MM-dd HH:mm:ss.SSS | +| yyyy.MM.dd HH:mm:ss.SSS | +| yyyy-MM-dd HH:mm:ss.SSSZZ | +| yyyy/MM/dd HH:mm:ss.SSSZZ | +| yyyy.MM.dd HH:mm:ss.SSSZZ | +| ISO8601 standard time format | + +
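For example, both of the following statements write a value at the same moment, once with a LONG timestamp and once with an ISO8601 timestamp (the series `root.ln.wf02.wt02.hardware` is assumed to already exist and is used here only for illustration):

```sql
-- LONG timestamp: milliseconds since the Unix epoch
INSERT INTO root.ln.wf02.wt02(timestamp, hardware) VALUES (1509465600000, 'v1')
-- ISO8601 timestamp: the same moment as above, written in DATETIME-INPUT form
INSERT INTO root.ln.wf02.wt02(timestamp, hardware) VALUES (2017-11-01T00:00:00+08:00, 'v1')
```
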
IoTDB supports both the LONG type and the DATETIME-DISPLAY type when displaying timestamps. The DATETIME-DISPLAY type supports user-defined time formats, and the syntax of the custom time format is shown in the table below:
+ +**The syntax of the custom time format** + + +| Symbol | Meaning | Presentation | Examples | +| :----: | :-------------------------: | :----------: | :--------------------------------: | +| G | era | era | era | +| C | century of era (>=0) | number | 20 | +| Y | year of era (>=0) | year | 1996 | +| | | | | +| x | weekyear | year | 1996 | +| w | week of weekyear | number | 27 | +| e | day of week | number | 2 | +| E | day of week | text | Tuesday; Tue | +| | | | | +| y | year | year | 1996 | +| D | day of year | number | 189 | +| M | month of year | month | July; Jul; 07 | +| d | day of month | number | 10 | +| | | | | +| a | halfday of day | text | PM | +| K | hour of halfday (0~11) | number | 0 | +| h | clockhour of halfday (1~12) | number | 12 | +| | | | | +| H | hour of day (0~23) | number | 0 | +| k | clockhour of day (1~24) | number | 24 | +| m | minute of hour | number | 30 | +| s | second of minute | number | 55 | +| S | fraction of second | millis | 978 | +| | | | | +| z | time zone | text | Pacific Standard Time; PST | +| Z | time zone offset/id | zone | -0800; -08:00; America/Los_Angeles | +| | | | | +| ' | escape for text | delimiter | | +| '' | single quote | literal | ' | + +
### Relative timestamp

Relative time refers to a time expressed relative to the server time ```now()``` or to a ```DATETIME``` value.

Syntax:

```
Duration = (Digit+ ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS'))+
RelativeTime = (now() | DATETIME) ((+|-) Duration)+
```
+ +**The syntax of the duration unit** + + +| Symbol | Meaning | Presentation | Examples | +| :----: | :---------: | :----------------------: | :------: | +| y | year | 1y=365 days | 1y | +| mo | month | 1mo=30 days | 1mo | +| w | week | 1w=7 days | 1w | +| d | day | 1d=1 day | 1d | +| | | | | +| h | hour | 1h=3600 seconds | 1h | +| m | minute | 1m=60 seconds | 1m | +| s | second | 1s=1 second | 1s | +| | | | | +| ms | millisecond | 1ms=1000_000 nanoseconds | 1ms | +| us | microsecond | 1us=1000 nanoseconds | 1us | +| ns | nanosecond | 1ns=1 nanosecond | 1ns | + +
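Relative timestamps are most commonly used in query filters. A minimal sketch (the path `root.ln.**` here is illustrative, not part of the syntax above):

```sql
-- select data written within the last day, relative to the current server time
select * from root.ln.** where time > now() - 1d
```
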
Examples:

```
now() - 1d2h // 1 day and 2 hours earlier than the current server time
now() - 1w // 1 week earlier than the current server time
```

> Note: There must be spaces on both sides of '+' and '-'.
diff --git a/src/UserGuide/V2.0.1/Tree/Basic-Concept/Data-Model-and-Terminology.md b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Data-Model-and-Terminology.md new file mode 100644 index 00000000..015a4035 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Data-Model-and-Terminology.md @@ -0,0 +1,150 @@ +

# Data Model

A wind power IoT scenario is taken as an example to illustrate how to create a correct data model in IoTDB.

According to the enterprise organization structure and the equipment hierarchy, the data is expressed as an attribute hierarchy, as shown below. The hierarchy, from top to bottom, is: power group layer - power plant layer - entity layer - measurement layer. ROOT is the root node, and each node of the measurement layer is a leaf node. When using IoTDB, the attributes on the path from the ROOT node to a leaf node are concatenated with ".", forming the name of a timeseries in IoTDB. For example, the left-most path in the figure can generate a timeseries named `root.ln.wf01.wt01.status`.
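As a concrete sketch of this naming scheme, that left-most branch could be registered with a statement like the one below (the data type and encoding shown are illustrative; timeseries creation is covered in detail in the metadata chapters):

```sql
-- power group ln -> power plant wf01 -> entity wt01 -> measurement status
create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN, encoding=PLAIN
```
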
+ +Here are the basic concepts of the model involved in IoTDB. + +## Measurement, Entity, Database, Path + +### Measurement (Also called field) + +It is information measured by detection equipment in an actual scene and can transform the sensed information into an electrical signal or other desired form of information output and send it to IoTDB. In IoTDB, all data and paths stored are organized in units of measurements. + +### Entity (Also called device) + +**An entity** is an equipped with measurements in real scenarios. In IoTDB, all measurements should have their corresponding entities. Entities do not need to be created manually, the default is the second last layer. + +### Database + +**A group of entities.** Users can create any prefix path as a database. Provided that there are four timeseries `root.ln.wf01.wt01.status`, `root.ln.wf01.wt01.temperature`, `root.ln.wf02.wt02.hardware`, `root.ln.wf02.wt02.status`, two devices `wf01`, `wf02` under the path `root.ln` may belong to the same owner or the same manufacturer, so d1 and d2 are closely related. At this point, the prefix path root.vehicle can be designated as a database, which will enable IoTDB to store all devices under it in the same folder. Newly added devices under `root.ln` will also belong to this database. + +In general, it is recommended to create 1 database. + +> Note1: A full path (`root.ln.wf01.wt01.status` as in the above example) is not allowed to be set as a database. +> +> Note2: The prefix of a timeseries must belong to a database. Before creating a timeseries, users must set which database the series belongs to. Only timeseries whose database is set can be persisted to disk. +> +> Note3: The number of character in the path as database, including `root.`, shall not exceed 64. + +Once a prefix path is set as a database, the database settings cannot be changed. + +After a database is set, the ancestral layers, children and descendant layers of the corresponding prefix path are not allowed to be set up again (for example, after `root.ln` is set as the database, the root layer and `root.ln.wf01` are not allowed to be created as database). + +The Layer Name of database can only consist of characters, numbers, and underscores, like `root.storagegroup_1`. + +### Path + +A `path` is an expression that conforms to the following constraints: + +```sql +path + : nodeName ('.' nodeName)* + ; + +nodeName + : wildcard? identifier wildcard? + | wildcard + ; + +wildcard + : '*' + | '**' + ; +``` + +We call the part of a path divided by `'.'` as a `node` or `nodeName`. For example: `root.a.b.c` is a path with 4 nodes. + +The following are the constraints on the `nodeName`: + +* `root` is a reserved character, and it is only allowed to appear at the beginning layer of the time series mentioned below. If `root` appears in other layers, it cannot be parsed and an error will be reported. +* Except for the beginning layer (`root`) of the time series, the characters supported in other layers are as follows: + + * [ 0-9 a-z A-Z _ ] (letters, numbers, underscore) + * ['\u2E80'..'\u9FFF'] (Chinese characters) +* In particular, if the system is deployed on a Windows machine, the database layer name will be case-insensitive. For example, creating both `root.ln` and `root.LN` at the same time is not allowed. + +### Special characters (Reverse quotation marks) + +If you need to use special characters in the path node name, you can use reverse quotation marks to reference the path node name. 
For specific usage, please refer to [Reverse Quotation Marks](../Reference/Syntax-Rule.md#reverse-quotation-marks). + +### Path Pattern + +In order to make it easier and faster to express multiple timeseries paths, IoTDB provides users with the path pattern. Users can construct a path pattern by using wildcard `*` and `**`. Wildcard can appear in any node of the path. + +`*` represents one node. For example, `root.vehicle.*.sensor1` represents a 4-node path which is prefixed with `root.vehicle` and suffixed with `sensor1`. + +`**` represents (`*`)+, which is one or more nodes of `*`. For example, `root.vehicle.device1.**` represents all paths prefixed by `root.vehicle.device1` with nodes num greater than or equal to 4, like `root.vehicle.device1.*`, `root.vehicle.device1.*.*`, `root.vehicle.device1.*.*.*`, etc; `root.vehicle.**.sensor1` represents a path which is prefixed with `root.vehicle` and suffixed with `sensor1` and has at least 4 nodes. + +> Note1: Wildcard `*` and `**` cannot be placed at the beginning of the path. + + +## Timeseries + +### Timestamp + +The timestamp is the time point at which data is produced. It includes absolute timestamps and relative timestamps. For detailed description, please go to [Data Type doc](./Data-Type.md). + +### Data point + +**A "time-value" pair**. + +### Timeseries + +**The record of a measurement of an entity on the time axis.** Timeseries is a series of data points. + +A measurement of an entity corresponds to a timeseries. + +Also called meter, timeline, and tag, parameter in real time database. + +The number of measurements managed by IoTDB can reach more than billions. + +For example, if entity wt01 in power plant wf01 of power group ln has a measurement named status, its timeseries can be expressed as: `root.ln.wf01.wt01.status`. + +### Aligned timeseries + +There is a situation that multiple measurements of an entity are sampled simultaneously in practical applications, forming multiple timeseries with the same time column. Such a group of timeseries can be modeled as aligned timeseries in Apache IoTDB. + +The timestamp columns of a group of aligned timeseries need to be stored only once in memory and disk when inserting data, instead of once per timeseries. + +It would be best if you created a group of aligned timeseries at the same time. + +You cannot create non-aligned timeseries under the entity to which the aligned timeseries belong, nor can you create aligned timeseries under the entity to which the non-aligned timeseries belong. + +When querying, you can query each timeseries separately. + +When inserting data, it is allowed to insert null value in the aligned timeseries. + + + +In the following chapters of data definition language, data operation language and Java Native Interface, various operations related to aligned timeseries will be introduced one by one. + +## Schema Template + +In the actual scenario, many entities collect the same measurements, that is, they have the same measurements name and type. A **schema template** can be declared to define the collectable measurements set. Schema template helps save memory by implementing schema sharing. For detailed description, please refer to [Schema Template doc](../User-Manual/Operate-Metadata_timecho.md#Device-Template). + +In the following chapters of, data definition language, data operation language and Java Native Interface, various operations related to schema template will be introduced one by one. 
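As a preview, a minimal sketch of declaring and applying a device template might look like the following (the template and path names are illustrative; the exact statements are covered in the metadata-operation documentation):

```sql
-- declare a template containing two measurements
create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN)
-- mount the template on a path so that devices under it share the same schema
set device template t1 to root.sg1
-- activate the template for a concrete device, creating its timeseries
create timeseries using device template on root.sg1.d1
```
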
diff --git a/src/UserGuide/V2.0.1/Tree/Basic-Concept/Navigating_Time_Series_Data.md b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Navigating_Time_Series_Data.md new file mode 100644 index 00000000..20aaef32 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Navigating_Time_Series_Data.md @@ -0,0 +1,64 @@ +

# Navigating Time Series Data

## What Is Time Series Data?

In today's era of the Internet of Things, IoT and industrial scenarios alike are undergoing digital transformation. People collect the various states of devices by installing sensors on them: a motor's voltage and current; a wind turbine's blade speed, angular velocity, and power generation; a vehicle's latitude, longitude, speed, and fuel consumption; a bridge's vibration frequency, deflection, displacement, and so on. Sensor-based data collection has penetrated virtually every industry.

![](https://alioss.timecho.com/docs/img/20240505154735.png)

Generally speaking, we refer to each collection point as a measurement point (also known as a physical quantity, time series, timeline, signal, indicator, measurement value, etc.). Each measurement point continuously collects new data over time, forming a time series. Represented as a table, each time series consists of two columns, time and value; plotted as a graph, each time series is a trend line over time, which can be vividly thought of as the device's electrocardiogram.

![](https://alioss.timecho.com/docs/img/20240505154843.png)

The massive time series data generated by sensors is the foundation of digital transformation in various industries, so time series data modeling mainly focuses on devices and sensors.

## Key Concepts of Time Series Data
The main concepts involved in time series data can be divided, from bottom to top, into data points, measurement points, and devices.

![](https://alioss.timecho.com/docs/img/20240505154513.png)

### Data Point

- Definition: Consists of a timestamp and a value, where the timestamp is of type long and the value can be of various types such as BOOLEAN, FLOAT, INT32, etc.
- Example: A row of a time series in table form, or a point of a time series in graph form (see the figures above), is a data point.

![](https://alioss.timecho.com/docs/img/20240505154432.png)

### Measurement Points

- Definition: A time series formed by multiple data points arranged in increasing order of timestamp. A measurement point usually corresponds to one collection point and periodically collects a physical quantity of the environment it is located in.
+- Also known as: physical quantity, time series, timeline, semaphore, indicator, measurement value, etc +- Example: + - Electricity scenario: current, voltage + - Energy scenario: wind speed, rotational speed + - Vehicle networking scenarios: fuel consumption, vehicle speed, longitude, dimensions + - Factory scenario: temperature, humidity + +### Device + +- Definition: Corresponding to a physical device in an actual scene, usually a collection of measurement points, identified by one to multiple labels +- Example: + - Vehicle networking scenario: Vehicles identified by vehicle identification code (VIN) + - Factory scenario: robotic arm, unique ID identification generated by IoT platform + - Energy scenario: Wind turbines, identified by region, station, line, model, instance, etc + - Monitoring scenario: CPU, identified by machine room, rack, Hostname, device type, etc \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_apache.md b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_apache.md new file mode 100644 index 00000000..58c01a88 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_apache.md @@ -0,0 +1,1253 @@ + + +# Timeseries Management + +## Database Management + +### Create Database + +According to the storage model we can set up the corresponding database. Two SQL statements are supported for creating databases, as follows: + +``` +IoTDB > create database root.ln +IoTDB > create database root.sgcc +``` + +We can thus create two databases using the above two SQL statements. + +It is worth noting that 1 database is recommended. + +When the path itself or the parent/child layer of the path is already created as database, the path is then not allowed to be created as database. For example, it is not feasible to create `root.ln.wf01` as database when two databases `root.ln` and `root.sgcc` exist. The system gives the corresponding error prompt as shown below: + +``` +IoTDB> CREATE DATABASE root.ln.wf01 +Msg: 300: root.ln has already been created as database. +IoTDB> create database root.ln.wf01 +Msg: 300: root.ln has already been created as database. +``` + +The LayerName of database can only be chinese or english characters, numbers, underscores, dots and backticks. If you want to set it to pure numbers or contain backticks or dots, you need to enclose the database name with backticks (` `` `). In ` `` `,2 backticks represents one, i.e. ` ```` ` represents `` ` ``. + +Besides, if deploy on Windows system, the LayerName is case-insensitive, which means it's not allowed to create databases `root.ln` and `root.LN` at the same time. + +### Show Databases + +After creating the database, we can use the [SHOW DATABASES](../SQL-Manual/SQL-Manual.md) statement and [SHOW DATABASES \](../SQL-Manual/SQL-Manual.md) to view the databases. 
The SQL statements are as follows: + +``` +IoTDB> SHOW DATABASES +IoTDB> SHOW DATABASES root.** +``` + +The result is as follows: + +``` ++-------------+----+-------------------------+-----------------------+-----------------------+ +|database| ttl|schema_replication_factor|data_replication_factor|time_partition_interval| ++-------------+----+-------------------------+-----------------------+-----------------------+ +| root.sgcc|null| 2| 2| 604800| +| root.ln|null| 2| 2| 604800| ++-------------+----+-------------------------+-----------------------+-----------------------+ +Total line number = 2 +It costs 0.060s +``` + +### Delete Database + +User can use the `DELETE DATABASE ` statement to delete all databases matching the pathPattern. Please note the data in the database will also be deleted. + +``` +IoTDB > DELETE DATABASE root.ln +IoTDB > DELETE DATABASE root.sgcc +// delete all data, all timeseries and all databases +IoTDB > DELETE DATABASE root.** +``` + +### Count Databases + +User can use the `COUNT DATABASE ` statement to count the number of databases. It is allowed to specify `PathPattern` to count the number of databases matching the `PathPattern`. + +SQL statement is as follows: + +``` +IoTDB> count databases +IoTDB> count databases root.* +IoTDB> count databases root.sgcc.* +IoTDB> count databases root.sgcc +``` + +The result is as follows: + +``` ++-------------+ +| database| ++-------------+ +| root.sgcc| +| root.turbine| +| root.ln| ++-------------+ +Total line number = 3 +It costs 0.003s + ++-------------+ +| database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.003s + ++-------------+ +| database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 0| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 1| ++-------------+ +Total line number = 1 +It costs 0.002s +``` + +### Setting up heterogeneous databases (Advanced operations) + +Under the premise of familiar with IoTDB metadata modeling, +users can set up heterogeneous databases in IoTDB to cope with different production needs. + +Currently, the following database heterogeneous parameters are supported: + +| Parameter | Type | Description | +| ------------------------- | ------- | --------------------------------------------- | +| TTL | Long | TTL of the Database | +| SCHEMA_REPLICATION_FACTOR | Integer | The schema replication number of the Database | +| DATA_REPLICATION_FACTOR | Integer | The data replication number of the Database | +| SCHEMA_REGION_GROUP_NUM | Integer | The SchemaRegionGroup number of the Database | +| DATA_REGION_GROUP_NUM | Integer | The DataRegionGroup number of the Database | + +Note the following when configuring heterogeneous parameters: + ++ TTL and TIME_PARTITION_INTERVAL must be positive integers. ++ SCHEMA_REPLICATION_FACTOR and DATA_REPLICATION_FACTOR must be smaller than or equal to the number of deployed DataNodes. ++ The function of SCHEMA_REGION_GROUP_NUM and DATA_REGION_GROUP_NUM are related to the parameter `schema_region_group_extension_policy` and `data_region_group_extension_policy` in iotdb-system.properties configuration file. Take DATA_REGION_GROUP_NUM as an example: + If `data_region_group_extension_policy=CUSTOM` is set, DATA_REGION_GROUP_NUM serves as the number of DataRegionGroups owned by the Database. 
+ If `data_region_group_extension_policy=AUTO`, DATA_REGION_GROUP_NUM is used as the lower bound of the DataRegionGroup quota owned by the Database. That is, when the Database starts writing data, it will have at least this number of DataRegionGroups. + +Users can set any heterogeneous parameters when creating a Database, or adjust some heterogeneous parameters during a stand-alone/distributed IoTDB run. + +#### Set heterogeneous parameters when creating a Database + +The user can set any of the above heterogeneous parameters when creating a Database. The SQL statement is as follows: + +``` +CREATE DATABASE prefixPath (WITH databaseAttributeClause (COMMA? databaseAttributeClause)*)? +``` + +For example: + +``` +CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +#### Adjust heterogeneous parameters at run time + +Users can adjust some heterogeneous parameters during the IoTDB runtime, as shown in the following SQL statement: + +``` +ALTER DATABASE prefixPath WITH databaseAttributeClause (COMMA? databaseAttributeClause)* +``` + +For example: + +``` +ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +Note that only the following heterogeneous parameters can be adjusted at runtime: + ++ SCHEMA_REGION_GROUP_NUM ++ DATA_REGION_GROUP_NUM + +#### Show heterogeneous databases + +The user can query the specific heterogeneous configuration of each Database, and the SQL statement is as follows: + +``` +SHOW DATABASES DETAILS prefixPath? +``` + +For example: + +``` +IoTDB> SHOW DATABASES DETAILS ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval|SchemaRegionGroupNum|MinSchemaRegionGroupNum|MaxSchemaRegionGroupNum|DataRegionGroupNum|MinDataRegionGroupNum|MaxDataRegionGroupNum| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|root.db1| null| 1| 3| 604800000| 0| 1| 1| 0| 2| 2| +|root.db2|86400000| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| +|root.db3| null| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +Total line number = 3 +It costs 0.058s +``` + +The query results in each column are as follows: + ++ The name of the Database ++ The TTL of the Database ++ The schema replication number of the Database ++ The data replication number of the Database ++ The time partition interval of the Database ++ The current SchemaRegionGroup number of the Database ++ The required minimum SchemaRegionGroup number of the Database ++ The permitted maximum SchemaRegionGroup number of the Database ++ The current DataRegionGroup number of the Database ++ The required minimum DataRegionGroup number of the Database ++ The permitted maximum DataRegionGroup number of the Database + +### TTL + +IoTDB supports device-level TTL settings, which means it is able to delete old data automatically and periodically. 
The benefit of using TTL is that hopefully you can control the total disk space usage and prevent the machine from running out of disks. Moreover, the query performance may downgrade as the total number of files goes up and the memory usage also increases as there are more files. Timely removing such files helps to keep at a high query performance level and reduce memory usage. + +The default unit of TTL is milliseconds. If the time precision in the configuration file changes to another, the TTL is still set to milliseconds. + +When setting TTL, the system will look for all devices included in the set path and set TTL for these devices. The system will delete expired data at the device granularity. +After the device data expires, it will not be queryable. The data in the disk file cannot be guaranteed to be deleted immediately, but it can be guaranteed to be deleted eventually. +However, due to operational costs, the expired data will not be physically deleted right after expiring. The physical deletion is delayed until compaction. +Therefore, before the data is physically deleted, if the TTL is reduced or lifted, it may cause data that was previously invisible due to TTL to reappear. +The system can only set up to 1000 TTL rules, and when this limit is reached, some TTL rules need to be deleted before new rules can be set. + +#### TTL Path Rule +The path can only be prefix paths (i.e., the path cannot contain \* , except \*\* in the last level). +This path will match devices and also allows users to specify paths without asterisks as specific databases or devices. +When the path does not contain asterisks, the system will check if it matches a database; if it matches a database, both the path and path.\*\* will be set at the same time. Note: Device TTL settings do not verify the existence of metadata, i.e., it is allowed to set TTL for a non-existent device. +``` +qualified paths: +root.** +root.db.** +root.db.group1.** +root.db +root.db.group1.d1 + +unqualified paths: +root.*.db +root.**.db.* +root.db.* +``` +#### TTL Applicable Rules +When a device is subject to multiple TTL rules, the more precise and longer rules are prioritized. For example, for the device "root.bj.hd.dist001.turbine001", the rule "root.bj.hd.dist001.turbine001" takes precedence over "root.bj.hd.dist001.\*\*", and the rule "root.bj.hd.dist001.\*\*" takes precedence over "root.bj.hd.**". +#### Set TTL +The set ttl operation can be understood as setting a TTL rule, for example, setting ttl to root.sg.group1.** is equivalent to mounting ttl for all devices that can match this path pattern. +The unset ttl operation indicates unmounting TTL for the corresponding path pattern; if there is no corresponding TTL, nothing will be done. +If you want to set TTL to be infinitely large, you can use the INF keyword. +The SQL Statement for setting TTL is as follow: +``` +set ttl to pathPattern 360000; +``` +Set the Time to Live (TTL) to a pathPattern of 360,000 milliseconds; the pathPattern should not contain a wildcard (\*) in the middle and must end with a double asterisk (\*\*). The pathPattern is used to match corresponding devices. +To maintain compatibility with older SQL syntax, if the user-provided pathPattern matches a database (db), the path pattern is automatically expanded to include all sub-paths denoted by path.\*\*. +For instance, writing "set ttl to root.sg 360000" will automatically be transformed into "set ttl to root.sg.\*\* 360000", which sets the TTL for all devices under root.sg. 
However, if the specified pathPattern does not match a database, the aforementioned logic will not apply. For example, writing "set ttl to root.sg.group 360000" will not be expanded to "root.sg.group.\*\*" since root.sg.group does not match a database. +It is also permissible to specify a particular device without a wildcard (*). +#### Unset TTL + +To unset TTL, we can use follwing SQL statement: + +``` +IoTDB> unset ttl from root.ln +``` + +After unset TTL, all data will be accepted in `root.ln`. +``` +IoTDB> unset ttl from root.sgcc.** +``` + +Unset the TTL in the `root.sgcc` path. + +New syntax +``` +IoTDB> unset ttl from root.** +``` + +Old syntax +``` +IoTDB> unset ttl to root.** +``` +There is no functional difference between the old and new syntax, and they are compatible with each other. +The new syntax is just more conventional in terms of wording. + +Unset the TTL setting for all path pattern. + +#### Show TTL + +To Show TTL, we can use following SQL statement: + +show all ttl + +``` +IoTDB> SHOW ALL TTL ++--------------+--------+ +| path| TTL| +| root.**|55555555| +| root.sg2.a.**|44440000| ++--------------+--------+ +``` + +show ttl on pathPattern +``` +IoTDB> SHOW TTL ON root.db.**; ++--------------+--------+ +| path| TTL| +| root.db.**|55555555| +| root.db.a.**|44440000| ++--------------+--------+ +``` + +The SHOW ALL TTL example gives the TTL for all path patterns. +The SHOW TTL ON pathPattern shows the TTL for the path pattern specified. + +Display devices' ttl +``` +IoTDB> show devices ++---------------+---------+---------+ +| Device|IsAligned| TTL| ++---------------+---------+---------+ +|root.sg.device1| false| 36000000| +|root.sg.device2| true| INF| ++---------------+---------+---------+ +``` +All devices will definitely have a TTL, meaning it cannot be null. INF represents infinity. + +## Device Template + +IoTDB supports the device template function, enabling different entities of the same type to share metadata, reduce the memory usage of metadata, and simplify the management of numerous entities and measurements. + +![img](https://alioss.timecho.com/docs/img/%E6%A8%A1%E6%9D%BF.png) + +![img](https://alioss.timecho.com/docs/img/templateEN.jpg) + +### Create Device Template + +The SQL syntax for creating a metadata template is as follows: + +```sql +CREATE DEVICE TEMPLATE ALIGNED? '(' [',' ]+ ')' +``` + +**Example 1:** Create a template containing two non-aligned timeseries + +```shell +IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +**Example 2:** Create a template containing a group of aligned timeseries + +```shell +IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) +``` + +The` lat` and `lon` measurements are aligned. + + +### Set Device Template + +After a device template is created, it should be set to specific path before creating related timeseries or insert data. + +**It should be ensured that the related database has been set before setting template.** + +**It is recommended to set device template to database path. It is not suggested to set device template to some path above database** + +**It is forbidden to create timeseries under a path setting s tedeviceplate. 
Device template shall not be set on a prefix path of an existing timeseries.** + +The SQL Statement for setting device template is as follow: + +```shell +IoTDB> set device template t1 to root.sg1.d1 +``` + +### Activate Device Template + +After setting the device template, with the system enabled to auto create schema, you can insert data into the timeseries. For example, suppose there's a database root.sg1 and t1 has been set to root.sg1.d1, then timeseries like root.sg1.d1.temperature and root.sg1.d1.status are available and data points can be inserted. + + +**Attention**: Before inserting data or the system not enabled to auto create schema, timeseries defined by the device template will not be created. You can use the following SQL statement to create the timeseries or activate the templdeviceate, act before inserting data: + +```shell +IoTDB> create timeseries using device template on root.sg1.d1 +``` + +**Example:** Execute the following statement + +```shell +IoTDB> set device template t1 to root.sg1.d1 +IoTDB> set device template t2 to root.sg1.d2 +IoTDB> create timeseries using device template on root.sg1.d1 +IoTDB> create timeseries using device template on root.sg1.d2 +``` + +Show the time series: + +```sql +show timeseries root.sg1.** +```` + +```shell ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +|root.sg1.d1.temperature| null| root.sg1| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.sg1.d1.status| null| root.sg1| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +| root.sg1.d2.lon| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| +| root.sg1.d2.lat| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +``` + +Show the devices: + +```sql +show devices root.sg1.** +```` + +```shell ++---------------+---------+ +| devices|isAligned| ++---------------+---------+ +| root.sg1.d1| false| +| root.sg1.d2| true| ++---------------+---------+ +```` + +### Show Device Template + +- Show all device templates + +The SQL statement looks like this: + +```shell +IoTDB> show device templates +``` + +The execution result is as follows: + +```shell ++-------------+ +|template name| ++-------------+ +| t2| +| t1| ++-------------+ +``` + +- Show nodes under in device template + +The SQL statement looks like this: + +```shell +IoTDB> show nodes in device template t1 +``` + +The execution result is as follows: + +```shell ++-----------+--------+--------+-----------+ +|child nodes|dataType|encoding|compression| ++-----------+--------+--------+-----------+ +|temperature| FLOAT| RLE| SNAPPY| +| status| BOOLEAN| PLAIN| SNAPPY| ++-----------+--------+--------+-----------+ +``` + +- Show the path prefix where a device template is set + +```shell +IoTDB> show paths set device template t1 +``` + +The execution result is as follows: + +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +- Show the path prefix where a device template is used (i.e. 
the time series has been created) + +```shell +IoTDB> show paths using device template t1 +``` + +The execution result is as follows: + +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +### Deactivate device Template + +To delete a group of timeseries represented by device template, namely deactivate the device template, use the following SQL statement: + +```shell +IoTDB> delete timeseries of device template t1 from root.sg1.d1 +``` + +or + +```shell +IoTDB> deactivate device template t1 from root.sg1.d1 +``` + +The deactivation supports batch process. + +```shell +IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.* +``` + +or + +```shell +IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.* +``` + +If the template name is not provided in sql, all template activation on paths matched by given path pattern will be removed. + +### Unset Device Template + +The SQL Statement for unsetting device template is as follow: + +```shell +IoTDB> unset device template t1 from root.sg1.d1 +``` + +**Attention**: It should be guaranteed that none of the timeseries represented by the target device template exists, before unset it. It can be achieved by deactivation operation. + +### Drop Device Template + +The SQL Statement for dropping device template is as follow: + +```shell +IoTDB> drop device template t1 +``` + +**Attention**: Dropping an already set template is not supported. + +### Alter Device Template + +In a scenario where measurements need to be added, you can modify the template to add measurements to all devicesdevice using the device template. + +The SQL Statement for altering device template is as follow: + +```shell +IoTDB> alter device template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY) +``` + +**When executing data insertion to devices with device template set on related prefix path and there are measurements not present in this device template, the measurements will be auto added to this device template.** + +## Timeseries Management + +### Create Timeseries + +According to the storage model selected before, we can create corresponding timeseries in the two databases respectively. 
The SQL statements for creating timeseries are as follows: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE +``` + +From v0.13, you can use a simplified version of the SQL statements to create timeseries: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature FLOAT encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE +``` + +Notice that when in the CREATE TIMESERIES statement the encoding method conflicts with the data type, the system gives the corresponding error prompt as shown below: + +``` +IoTDB > create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF +error: encoding TS_2DIFF does not support BOOLEAN +``` + +Please refer to [Encoding](../Basic-Concept/Encoding-and-Compression.md) for correspondence between data type and encoding. + +### Create Aligned Timeseries + +The SQL statement for creating a group of timeseries are as follows: + +``` +IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) +``` + +You can set different datatype, encoding, and compression for the timeseries in a group of aligned timeseries + +It is also supported to set an alias, tag, and attribute for aligned timeseries. + +### Delete Timeseries + +To delete the timeseries we created before, we are able to use `(DELETE | DROP) TimeSeries ` statement. + +The usage are as follows: + +``` +IoTDB> delete timeseries root.ln.wf01.wt01.status +IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware +IoTDB> delete timeseries root.ln.wf02.* +IoTDB> drop timeseries root.ln.wf02.* +``` + +### Show Timeseries + +* SHOW LATEST? TIMESERIES pathPattern? whereClause? limitClause? + + There are four optional clauses added in SHOW TIMESERIES, return information of time series + +Timeseries information includes: timeseries path, alias of measurement, database it belongs to, data type, encoding type, compression type, tags and attributes. + +Examples: + +* SHOW TIMESERIES + + presents all timeseries information in JSON form + +* SHOW TIMESERIES <`PathPattern`> + + returns all timeseries information matching the given <`PathPattern`>. 
SQL statements are as follows: + +``` +IoTDB> show timeseries root.** +IoTDB> show timeseries root.ln.** +``` + +The results are shown below respectively: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.016s + ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +Total line number = 4 +It costs 0.004s +``` + +* SHOW TIMESERIES LIMIT INT OFFSET INT + + returns all the timeseries information start from the offset and limit the number of series returned. For example, + +``` +show timeseries root.ln.** limit 10 offset 10 +``` + +* SHOW TIMESERIES WHERE TIMESERIES contains 'containStr' + + The query result set is filtered by string fuzzy matching based on the names of the timeseries. 
For example: + +``` +show timeseries root.ln.** where timeseries contains 'wf01.wt' +``` + +The result is shown below: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 2 +It costs 0.016s +``` + +* SHOW TIMESERIES WHERE DataType=type + + The query result set is filtered by data type. For example: + +``` +show timeseries root.ln.** where dataType=FLOAT +``` + +The result is shown below: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 3 +It costs 0.016s + +``` + + +* SHOW LATEST TIMESERIES + + all the returned timeseries information should be sorted in descending order of the last timestamp of timeseries + +It is worth noting that when the queried path does not exist, the system will return no timeseries. + + +### Count Timeseries + +IoTDB is able to use `COUNT TIMESERIES ` to count the number of timeseries matching the path. SQL statements are as follows: + +* `WHERE` condition could be used to fuzzy match a time series name with the following syntax: `COUNT TIMESERIES WHERE TIMESERIES contains 'containStr'`. +* `WHERE` condition could be used to filter result by data type with the syntax: `COUNT TIMESERIES WHERE DataType='`. +* `WHERE` condition could be used to filter result by tags with the syntax: `COUNT TIMESERIES WHERE TAGS(key)='value'` or `COUNT TIMESERIES WHERE TAGS(key) contains 'value'`. +* `LEVEL` could be defined to show count the number of timeseries of each node at the given level in current Metadata Tree. This could be used to query the number of sensors under each device. 
The grammar is: `COUNT TIMESERIES GROUP BY LEVEL=`. + + +``` +IoTDB > COUNT TIMESERIES root.** +IoTDB > COUNT TIMESERIES root.ln.** +IoTDB > COUNT TIMESERIES root.ln.*.*.status +IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' +IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 +``` + +For example, if there are several timeseries (use `show timeseries` to show all timeseries): + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| {"unit":"c"}| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| {"description":"test1"}| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.004s +``` + +Then the Metadata Tree will be as below: + +
+ +As can be seen, `root` is considered as `LEVEL=0`. So when you enter statements such as: + +``` +IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 +``` + +You will get following results: + +``` ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +| root.sgcc| 2| +|root.turbine| 1| +| root.ln| 4| ++------------+-----------------+ +Total line number = 3 +It costs 0.002s + ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf02| 2| +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 2 +It costs 0.002s + ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 1 +It costs 0.002s +``` + +> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. + +### Tag and Attribute Management + +We can also add an alias, extra tag and attribute information while creating one timeseries. + +The differences between tag and attribute are: + +* Tag could be used to query the path of timeseries, we will maintain an inverted index in memory on the tag: Tag -> Timeseries +* Attribute could only be queried by timeseries path : Timeseries -> Attribute + +The SQL statements for creating timeseries with extra tag and attribute information are extended as follows: + +``` +create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) +``` + +The `temprature` in the brackets is an alias for the sensor `s1`. So we can use `temprature` to replace `s1` anywhere. + +> IoTDB also supports using AS function to set alias. The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. + +> Notice that the size of the extra tag and attribute information shouldn't exceed the `tag_attribute_total_size`. + +We can update the tag information after creating it as following: + +* Rename the tag/attribute key + +``` +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` + +* Reset the tag/attribute value + +``` +ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` + +* Delete the existing tag/attribute + +``` +ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +``` + +* Add new tags + +``` +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` + +* Add new attributes + +``` +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` + +* Upsert alias, tags and attributes + +> add alias or a new key-value if the alias or key doesn't exist, otherwise, update the old one with new value. + +``` +ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag3=v3, tag4=v4) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +* Show timeseries using tags. Use TAGS(tagKey) to identify the tags used as filter key + +``` +SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +``` + +returns all the timeseries information that satisfy the where condition and match the pathPattern. 
SQL statements are as follows: + +``` +ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c +ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 +show timeseries root.ln.** where TAGS(unit)='c' +show timeseries root.ln.** where TAGS(description) contains 'test1' +``` + +The results are shown below respectly: + +``` ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|{"unit":"c"}| null| null| null| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.005s + ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|{"description":"test1"}| null| null| null| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.004s +``` + +- count timeseries using tags + +``` +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause GROUP BY LEVEL= +``` + +returns all the number of timeseries that satisfy the where condition and match the pathPattern. SQL statements are as follows: + +``` +count timeseries +count timeseries root.** where TAGS(unit)='c' +count timeseries root.** where TAGS(unit)='c' group by level = 2 +``` + +The results are shown below respectly : + +``` +IoTDB> count timeseries ++-----------------+ +|count(timeseries)| ++-----------------+ +| 6| ++-----------------+ +Total line number = 1 +It costs 0.019s +IoTDB> count timeseries root.** where TAGS(unit)='c' ++-----------------+ +|count(timeseries)| ++-----------------+ +| 2| ++-----------------+ +Total line number = 1 +It costs 0.020s +IoTDB> count timeseries root.** where TAGS(unit)='c' group by level = 2 ++--------------+-----------------+ +| column|count(timeseries)| ++--------------+-----------------+ +| root.ln.wf02| 2| +| root.ln.wf01| 0| +|root.sgcc.wf03| 0| ++--------------+-----------------+ +Total line number = 3 +It costs 0.011s +``` + +> Notice that, we only support one condition in the where clause. Either it's an equal filter or it is an `contains` filter. In both case, the property in the where condition must be a tag. 
+ +create aligned timeseries + +``` +create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) +``` + +The execution result is as follows: + +``` +IoTDB> show timeseries ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| +|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +Support query: + +``` +IoTDB> show timeseries where TAGS(tag1)='v1' ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +The above operations are supported for timeseries tag, attribute updates, etc. + +## Node Management + +### Show Child Paths + +``` +SHOW CHILD PATHS pathPattern +``` + +Return all child paths and their node types of all the paths matching pathPattern. + +node types: ROOT -> DB INTERNAL -> DATABASE -> INTERNAL -> DEVICE -> TIMESERIES + + +Example: + +* return the child paths of root.ln:show child paths root.ln + +``` ++------------+----------+ +| child paths|node types| ++------------+----------+ +|root.ln.wf01| INTERNAL| +|root.ln.wf02| INTERNAL| ++------------+----------+ +Total line number = 2 +It costs 0.002s +``` + +> get all paths in form of root.xx.xx.xx:show child paths root.xx.xx + +### Show Child Nodes + +``` +SHOW CHILD NODES pathPattern +``` + +Return all child nodes of the pathPattern. + +Example: + +* return the child nodes of root:show child nodes root + +``` ++------------+ +| child nodes| ++------------+ +| ln| ++------------+ +``` + +* return the child nodes of root.ln:show child nodes root.ln + +``` ++------------+ +| child nodes| ++------------+ +| wf01| +| wf02| ++------------+ +``` + +### Count Nodes + +IoTDB is able to use `COUNT NODES LEVEL=` to count the number of nodes at + the given level in current Metadata Tree considering a given pattern. IoTDB will find paths that + match the pattern and counts distinct nodes at the specified level among the matched paths. + This could be used to query the number of devices with specified measurements. 
The usage are as + follows: + +``` +IoTDB > COUNT NODES root.** LEVEL=2 +IoTDB > COUNT NODES root.ln.** LEVEL=2 +IoTDB > COUNT NODES root.ln.wf01.** LEVEL=3 +IoTDB > COUNT NODES root.**.temperature LEVEL=3 +``` + +As for the above mentioned example and Metadata tree, you can get following results: + +``` ++------------+ +|count(nodes)| ++------------+ +| 4| ++------------+ +Total line number = 1 +It costs 0.003s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 1| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s +``` + +> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. + +### Show Devices + +* SHOW DEVICES pathPattern? (WITH DATABASE)? devicesWhereClause? limitClause? + +Similar to `Show Timeseries`, IoTDB also supports two ways of viewing devices: + +* `SHOW DEVICES` statement presents all devices' information, which is equal to `SHOW DEVICES root.**`. +* `SHOW DEVICES ` statement specifies the `PathPattern` and returns the devices information matching the pathPattern and under the given level. +* `WHERE` condition supports `DEVICE contains 'xxx'` to do a fuzzy query based on the device name. + +SQL statement is as follows: + +``` +IoTDB> show devices +IoTDB> show devices root.ln.** +IoTDB> show devices root.ln.** where device contains 't' +``` + +You can get results below: + +``` ++-------------------+---------+ +| devices|isAligned| ++-------------------+---------+ +| root.ln.wf01.wt01| false| +| root.ln.wf02.wt02| false| +|root.sgcc.wf03.wt01| false| +| root.turbine.d1| false| ++-------------------+---------+ +Total line number = 4 +It costs 0.002s + ++-----------------+---------+ +| devices|isAligned| ++-----------------+---------+ +|root.ln.wf01.wt01| false| +|root.ln.wf02.wt02| false| ++-----------------+---------+ +Total line number = 2 +It costs 0.001s +``` + +`isAligned` indicates whether the timeseries under the device are aligned. + +To view devices' information with database, we can use `SHOW DEVICES WITH DATABASE` statement. + +* `SHOW DEVICES WITH DATABASE` statement presents all devices' information with their database. +* `SHOW DEVICES WITH DATABASE` statement specifies the `PathPattern` and returns the + devices' information under the given level with their database information. + +SQL statement is as follows: + +``` +IoTDB> show devices with database +IoTDB> show devices root.ln.** with database +``` + +You can get results below: + +``` ++-------------------+-------------+---------+ +| devices| database|isAligned| ++-------------------+-------------+---------+ +| root.ln.wf01.wt01| root.ln| false| +| root.ln.wf02.wt02| root.ln| false| +|root.sgcc.wf03.wt01| root.sgcc| false| +| root.turbine.d1| root.turbine| false| ++-------------------+-------------+---------+ +Total line number = 4 +It costs 0.003s + ++-----------------+-------------+---------+ +| devices| database|isAligned| ++-----------------+-------------+---------+ +|root.ln.wf01.wt01| root.ln| false| +|root.ln.wf02.wt02| root.ln| false| ++-----------------+-------------+---------+ +Total line number = 2 +It costs 0.001s +``` + +### Count Devices + +* COUNT DEVICES / + +The above statement is used to count the number of devices. 
At the same time, it is allowed to specify `PathPattern` to count the number of devices matching the `PathPattern`. + +SQL statement is as follows: + +``` +IoTDB> show devices +IoTDB> count devices +IoTDB> count devices root.ln.** +``` + +You can get results below: + +``` ++-------------------+---------+ +| devices|isAligned| ++-------------------+---------+ +|root.sgcc.wf03.wt03| false| +| root.turbine.d1| false| +| root.ln.wf02.wt02| false| +| root.ln.wf01.wt01| false| ++-------------------+---------+ +Total line number = 4 +It costs 0.024s + ++--------------+ +|count(devices)| ++--------------+ +| 4| ++--------------+ +Total line number = 1 +It costs 0.004s + ++--------------+ +|count(devices)| ++--------------+ +| 2| ++--------------+ +Total line number = 1 +It costs 0.004s +``` + diff --git a/src/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_timecho.md b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_timecho.md new file mode 100644 index 00000000..8d57facb --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_timecho.md @@ -0,0 +1,1324 @@ + + +# Timeseries Management + +## Database Management + +### Create Database + +According to the storage model we can set up the corresponding database. Two SQL statements are supported for creating databases, as follows: + +``` +IoTDB > create database root.ln +IoTDB > create database root.sgcc +``` + +We can thus create two databases using the above two SQL statements. + +It is worth noting that 1 database is recommended. + +When the path itself or the parent/child layer of the path is already created as database, the path is then not allowed to be created as database. For example, it is not feasible to create `root.ln.wf01` as database when two databases `root.ln` and `root.sgcc` exist. The system gives the corresponding error prompt as shown below: + +``` +IoTDB> CREATE DATABASE root.ln.wf01 +Msg: 300: root.ln has already been created as database. +IoTDB> create database root.ln.wf01 +Msg: 300: root.ln has already been created as database. +``` + +The LayerName of database can only be chinese or english characters, numbers, underscores, dots and backticks. If you want to set it to pure numbers or contain backticks or dots, you need to enclose the database name with backticks (` `` `). In ` `` `,2 backticks represents one, i.e. ` ```` ` represents `` ` ``. + +Besides, if deploy on Windows system, the LayerName is case-insensitive, which means it's not allowed to create databases `root.ln` and `root.LN` at the same time. + +### Show Databases + +After creating the database, we can use the [SHOW DATABASES](../SQL-Manual/SQL-Manual.md) statement and [SHOW DATABASES \](../SQL-Manual/SQL-Manual.md) to view the databases. The SQL statements are as follows: + +``` +IoTDB> SHOW DATABASES +IoTDB> SHOW DATABASES root.** +``` + +The result is as follows: + +``` ++-------------+----+-------------------------+-----------------------+-----------------------+ +|database| ttl|schema_replication_factor|data_replication_factor|time_partition_interval| ++-------------+----+-------------------------+-----------------------+-----------------------+ +| root.sgcc|null| 2| 2| 604800| +| root.ln|null| 2| 2| 604800| ++-------------+----+-------------------------+-----------------------+-----------------------+ +Total line number = 2 +It costs 0.060s +``` + +### Delete Database + +User can use the `DELETE DATABASE ` statement to delete all databases matching the pathPattern. Please note the data in the database will also be deleted. 
+ +``` +IoTDB > DELETE DATABASE root.ln +IoTDB > DELETE DATABASE root.sgcc +// delete all data, all timeseries and all databases +IoTDB > DELETE DATABASE root.** +``` + +### Count Databases + +User can use the `COUNT DATABASE ` statement to count the number of databases. It is allowed to specify `PathPattern` to count the number of databases matching the `PathPattern`. + +SQL statement is as follows: + +``` +IoTDB> count databases +IoTDB> count databases root.* +IoTDB> count databases root.sgcc.* +IoTDB> count databases root.sgcc +``` + +The result is as follows: + +``` ++-------------+ +| database| ++-------------+ +| root.sgcc| +| root.turbine| +| root.ln| ++-------------+ +Total line number = 3 +It costs 0.003s + ++-------------+ +| database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.003s + ++-------------+ +| database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 0| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 1| ++-------------+ +Total line number = 1 +It costs 0.002s +``` + +### Setting up heterogeneous databases (Advanced operations) + +Under the premise of familiar with IoTDB metadata modeling, +users can set up heterogeneous databases in IoTDB to cope with different production needs. + +Currently, the following database heterogeneous parameters are supported: + +| Parameter | Type | Description | +| ------------------------- | ------- | --------------------------------------------- | +| TTL | Long | TTL of the Database | +| SCHEMA_REPLICATION_FACTOR | Integer | The schema replication number of the Database | +| DATA_REPLICATION_FACTOR | Integer | The data replication number of the Database | +| SCHEMA_REGION_GROUP_NUM | Integer | The SchemaRegionGroup number of the Database | +| DATA_REGION_GROUP_NUM | Integer | The DataRegionGroup number of the Database | + +Note the following when configuring heterogeneous parameters: + ++ TTL and TIME_PARTITION_INTERVAL must be positive integers. ++ SCHEMA_REPLICATION_FACTOR and DATA_REPLICATION_FACTOR must be smaller than or equal to the number of deployed DataNodes. ++ The function of SCHEMA_REGION_GROUP_NUM and DATA_REGION_GROUP_NUM are related to the parameter `schema_region_group_extension_policy` and `data_region_group_extension_policy` in iotdb-common.properties configuration file. Take DATA_REGION_GROUP_NUM as an example: + If `data_region_group_extension_policy=CUSTOM` is set, DATA_REGION_GROUP_NUM serves as the number of DataRegionGroups owned by the Database. + If `data_region_group_extension_policy=AUTO`, DATA_REGION_GROUP_NUM is used as the lower bound of the DataRegionGroup quota owned by the Database. That is, when the Database starts writing data, it will have at least this number of DataRegionGroups. + +Users can set any heterogeneous parameters when creating a Database, or adjust some heterogeneous parameters during a stand-alone/distributed IoTDB run. + +#### Set heterogeneous parameters when creating a Database + +The user can set any of the above heterogeneous parameters when creating a Database. The SQL statement is as follows: + +``` +CREATE DATABASE prefixPath (WITH databaseAttributeClause (COMMA? databaseAttributeClause)*)? 
+``` + +For example: + +``` +CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +#### Adjust heterogeneous parameters at run time + +Users can adjust some heterogeneous parameters during the IoTDB runtime, as shown in the following SQL statement: + +``` +ALTER DATABASE prefixPath WITH databaseAttributeClause (COMMA? databaseAttributeClause)* +``` + +For example: + +``` +ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +Note that only the following heterogeneous parameters can be adjusted at runtime: + ++ SCHEMA_REGION_GROUP_NUM ++ DATA_REGION_GROUP_NUM + +#### Show heterogeneous databases + +The user can query the specific heterogeneous configuration of each Database, and the SQL statement is as follows: + +``` +SHOW DATABASES DETAILS prefixPath? +``` + +For example: + +``` +IoTDB> SHOW DATABASES DETAILS ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval|SchemaRegionGroupNum|MinSchemaRegionGroupNum|MaxSchemaRegionGroupNum|DataRegionGroupNum|MinDataRegionGroupNum|MaxDataRegionGroupNum| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|root.db1| null| 1| 3| 604800000| 0| 1| 1| 0| 2| 2| +|root.db2|86400000| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| +|root.db3| null| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +Total line number = 3 +It costs 0.058s +``` + +The query results in each column are as follows: + ++ The name of the Database ++ The TTL of the Database ++ The schema replication number of the Database ++ The data replication number of the Database ++ The time partition interval of the Database ++ The current SchemaRegionGroup number of the Database ++ The required minimum SchemaRegionGroup number of the Database ++ The permitted maximum SchemaRegionGroup number of the Database ++ The current DataRegionGroup number of the Database ++ The required minimum DataRegionGroup number of the Database ++ The permitted maximum DataRegionGroup number of the Database + +### TTL + +IoTDB supports device-level TTL settings, which means it is able to delete old data automatically and periodically. The benefit of using TTL is that hopefully you can control the total disk space usage and prevent the machine from running out of disks. Moreover, the query performance may downgrade as the total number of files goes up and the memory usage also increases as there are more files. Timely removing such files helps to keep at a high query performance level and reduce memory usage. + +The default unit of TTL is milliseconds. If the time precision in the configuration file changes to another, the TTL is still set to milliseconds. + +When setting TTL, the system will look for all devices included in the set path and set TTL for these devices. The system will delete expired data at the device granularity. +After the device data expires, it will not be queryable. 
The data in the disk file cannot be guaranteed to be deleted immediately, but it can be guaranteed to be deleted eventually. +However, due to operational costs, the expired data will not be physically deleted right after expiring. The physical deletion is delayed until compaction. +Therefore, before the data is physically deleted, if the TTL is reduced or lifted, it may cause data that was previously invisible due to TTL to reappear. +The system can only set up to 1000 TTL rules, and when this limit is reached, some TTL rules need to be deleted before new rules can be set. + +#### TTL Path Rule +The path can only be prefix paths (i.e., the path cannot contain \* , except \*\* in the last level). +This path will match devices and also allows users to specify paths without asterisks as specific databases or devices. +When the path does not contain asterisks, the system will check if it matches a database; if it matches a database, both the path and path.\*\* will be set at the same time. Note: Device TTL settings do not verify the existence of metadata, i.e., it is allowed to set TTL for a non-existent device. +``` +qualified paths: +root.** +root.db.** +root.db.group1.** +root.db +root.db.group1.d1 + +unqualified paths: +root.*.db +root.**.db.* +root.db.* +``` +#### TTL Applicable Rules +When a device is subject to multiple TTL rules, the more precise and longer rules are prioritized. For example, for the device "root.bj.hd.dist001.turbine001", the rule "root.bj.hd.dist001.turbine001" takes precedence over "root.bj.hd.dist001.\*\*", and the rule "root.bj.hd.dist001.\*\*" takes precedence over "root.bj.hd.**". +#### Set TTL +The set ttl operation can be understood as setting a TTL rule, for example, setting ttl to root.sg.group1.** is equivalent to mounting ttl for all devices that can match this path pattern. +The unset ttl operation indicates unmounting TTL for the corresponding path pattern; if there is no corresponding TTL, nothing will be done. +If you want to set TTL to be infinitely large, you can use the INF keyword. +The SQL Statement for setting TTL is as follow: +``` +set ttl to pathPattern 360000; +``` +Set the Time to Live (TTL) to a pathPattern of 360,000 milliseconds; the pathPattern should not contain a wildcard (\*) in the middle and must end with a double asterisk (\*\*). The pathPattern is used to match corresponding devices. +To maintain compatibility with older SQL syntax, if the user-provided pathPattern matches a database (db), the path pattern is automatically expanded to include all sub-paths denoted by path.\*\*. +For instance, writing "set ttl to root.sg 360000" will automatically be transformed into "set ttl to root.sg.\*\* 360000", which sets the TTL for all devices under root.sg. However, if the specified pathPattern does not match a database, the aforementioned logic will not apply. For example, writing "set ttl to root.sg.group 360000" will not be expanded to "root.sg.group.\*\*" since root.sg.group does not match a database. +It is also permissible to specify a particular device without a wildcard (*). +#### Unset TTL + +To unset TTL, we can use follwing SQL statement: + +``` +IoTDB> unset ttl from root.ln +``` + +After unset TTL, all data will be accepted in `root.ln`. +``` +IoTDB> unset ttl from root.sgcc.** +``` + +Unset the TTL in the `root.sgcc` path. 
+
+New syntax
+```
+IoTDB> unset ttl from root.**
+```
+
+Old syntax
+```
+IoTDB> unset ttl to root.**
+```
+There is no functional difference between the old and new syntax, and they are compatible with each other.
+The new syntax is just more conventional in terms of wording.
+
+Unset the TTL setting for all path patterns.
+
+#### Show TTL
+
+To show TTL, we can use the following SQL statements:
+
+show all ttl
+
+```
+IoTDB> SHOW ALL TTL
++--------------+--------+
+|          path|     TTL|
+|       root.**|55555555|
+| root.sg2.a.**|44440000|
++--------------+--------+
+```
+
+show ttl on pathPattern
+```
+IoTDB> SHOW TTL ON root.db.**;
++--------------+--------+
+|          path|     TTL|
+|    root.db.**|55555555|
+|  root.db.a.**|44440000|
++--------------+--------+
+```
+
+The SHOW ALL TTL example gives the TTL for all path patterns.
+The SHOW TTL ON pathPattern example shows the TTL only for the specified path pattern.
+
+Display devices' TTL:
+```
+IoTDB> show devices
++---------------+---------+---------+
+|         Device|IsAligned|      TTL|
++---------------+---------+---------+
+|root.sg.device1|    false| 36000000|
+|root.sg.device2|     true|      INF|
++---------------+---------+---------+
+```
+All devices will definitely have a TTL, meaning it cannot be null. INF represents infinity.
+
+
+## Device Template
+
+IoTDB supports the device template function, enabling different entities of the same type to share metadata, reduce the memory usage of metadata, and simplify the management of numerous entities and measurements.
+
+
+### Create Device Template
+
+The SQL syntax for creating a device template is as follows:
+
+```sql
+CREATE DEVICE TEMPLATE <templateName> ALIGNED? '(' <measurementId> <attributeClauses> [',' <measurementId> <attributeClauses>]+ ')'
+```
+
+**Example 1:** Create a template containing two non-aligned timeseries
+
+```shell
+IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY)
+```
+
+**Example 2:** Create a template containing a group of aligned timeseries
+
+```shell
+IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla)
+```
+
+The `lat` and `lon` measurements are aligned.
+
+![img](https://alioss.timecho.com/docs/img/%E6%A8%A1%E6%9D%BF.png)
+
+![img](https://alioss.timecho.com/docs/img/templateEN.jpg)
+
+### Set Device Template
+
+After a device template is created, it should be set to a specific path before creating related timeseries or inserting data.
+
+**It should be ensured that the related database has been set before setting the template.**
+
+**It is recommended to set the device template to a database path. It is not suggested to set a device template to a path above the database level.**
+
+**It is forbidden to create timeseries under a path that has a device template set. A device template shall not be set on a prefix path of an existing timeseries.**
+
+The SQL statement for setting a device template is as follows:
+
+```shell
+IoTDB> set device template t1 to root.sg1.d1
+```
+
+### Activate Device Template
+
+After setting the device template, with the system enabled to auto create schema, you can insert data into the timeseries. For example, suppose there's a database root.sg1 and t1 has been set to root.sg1.d1, then timeseries like root.sg1.d1.temperature and root.sg1.d1.status are available and data points can be inserted.
+
+
+**Attention**: Before data is inserted, or if the system is not enabled to auto create schema, timeseries defined by the device template will not be created. 
You can use the following SQL statement to create the timeseries (i.e., activate the device template) before inserting data:
+
+```shell
+IoTDB> create timeseries using device template on root.sg1.d1
+```
+
+**Example:** Execute the following statements
+
+```shell
+IoTDB> set device template t1 to root.sg1.d1
+IoTDB> set device template t2 to root.sg1.d2
+IoTDB> create timeseries using device template on root.sg1.d1
+IoTDB> create timeseries using device template on root.sg1.d2
+```
+
+Show the time series:
+
+```sql
+show timeseries root.sg1.**
+```
+
+```shell
++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+
+|             timeseries|alias|     database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters|
++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+
+|root.sg1.d1.temperature| null|     root.sg1|   FLOAT|     RLE|     SNAPPY|null|      null|    null|               null|
+|     root.sg1.d1.status| null|     root.sg1| BOOLEAN|   PLAIN|     SNAPPY|null|      null|    null|               null|
+|        root.sg1.d2.lon| null|     root.sg1|   FLOAT| GORILLA|     SNAPPY|null|      null|    null|               null|
+|        root.sg1.d2.lat| null|     root.sg1|   FLOAT| GORILLA|     SNAPPY|null|      null|    null|               null|
++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+
+```
+
+Show the devices:
+
+```sql
+show devices root.sg1.**
+```
+
+```shell
++---------------+---------+
+|        devices|isAligned|
++---------------+---------+
+|    root.sg1.d1|    false|
+|    root.sg1.d2|     true|
++---------------+---------+
+```
+
+### Show Device Template
+
+- Show all device templates
+
+The SQL statement looks like this:
+
+```shell
+IoTDB> show device templates
+```
+
+The execution result is as follows:
+
+```shell
++-------------+
+|template name|
++-------------+
+|           t2|
+|           t1|
++-------------+
+```
+
+- Show nodes in a device template
+
+The SQL statement looks like this:
+
+```shell
+IoTDB> show nodes in device template t1
+```
+
+The execution result is as follows:
+
+```shell
++-----------+--------+--------+-----------+
+|child nodes|dataType|encoding|compression|
++-----------+--------+--------+-----------+
+|temperature|   FLOAT|     RLE|     SNAPPY|
+|     status| BOOLEAN|   PLAIN|     SNAPPY|
++-----------+--------+--------+-----------+
+```
+
+- Show the path prefix where a device template is set
+
+```shell
+IoTDB> show paths set device template t1
+```
+
+The execution result is as follows:
+
+```shell
++-----------+
+|child paths|
++-----------+
+|root.sg1.d1|
++-----------+
+```
+
+- Show the path prefix where a device template is used (i.e. the timeseries has been created)
+
+```shell
+IoTDB> show paths using device template t1
+```
+
+The execution result is as follows:
+
+```shell
++-----------+
+|child paths|
++-----------+
+|root.sg1.d1|
++-----------+
+```
+
+### Deactivate Device Template
+
+To delete a group of timeseries represented by a device template, namely to deactivate the device template, use the following SQL statement:
+
+```shell
+IoTDB> delete timeseries of device template t1 from root.sg1.d1
+```
+
+or
+
+```shell
+IoTDB> deactivate device template t1 from root.sg1.d1
+```
+
+Deactivation also supports batch processing:
+
+```shell
+IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.*
+```
+
+or
+
+```shell
+IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.*
+```
+
+If the template name is not provided in the SQL statement, all template activations on the paths matched by the given path pattern will be removed.
+
+### Unset Device Template
+
+The SQL statement for unsetting a device template is as follows:
+
+```shell
+IoTDB> unset device template t1 from root.sg1.d1
+```
+
+**Attention**: Before unsetting a device template, it should be guaranteed that none of the timeseries represented by the target device template still exist. This can be achieved by the deactivation operation.
+
+### Drop Device Template
+
+The SQL statement for dropping a device template is as follows:
+
+```shell
+IoTDB> drop device template t1
+```
+
+**Attention**: Dropping a template that has already been set to some path is not supported.
+
+### Alter Device Template
+
+In a scenario where measurements need to be added, you can modify the template to add measurements to all devices using the device template.
+
+The SQL statement for altering a device template is as follows:
+
+```shell
+IoTDB> alter device template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY)
+```
+
+**When data is inserted into devices that have a device template set on a related prefix path and some measurements are not present in this device template, the missing measurements will be auto added to this device template.**
+
+## Timeseries Management
+
+### Create Timeseries
+
+According to the storage model selected before, we can create corresponding timeseries in the two databases respectively. The SQL statements for creating timeseries are as follows:
+
+```
+IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN
+IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE
+IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN
+IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN
+IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN
+IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE
+```
+
+From v0.13, you can use a simplified version of the SQL statements to create timeseries:
+
+```
+IoTDB > create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN
+IoTDB > create timeseries root.ln.wf01.wt01.temperature FLOAT encoding=RLE
+IoTDB > create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN
+IoTDB > create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN
+IoTDB > create timeseries root.sgcc.wf03.wt01.status BOOLEAN encoding=PLAIN
+IoTDB > create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE
+```
+
+Notice that when the encoding method in the CREATE TIMESERIES statement conflicts with the data type, the system gives the corresponding error prompt as shown below:
+
+```
+IoTDB > create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF
+error: encoding TS_2DIFF does not support BOOLEAN
+```
+
+Please refer to [Encoding](../Basic-Concept/Encoding-and-Compression.md) for the correspondence between data type and encoding. 
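+
+Besides the CLI, the same statements can be issued programmatically. The following is a minimal sketch only, assuming the `iotdb-session` Java client; the host, port, and credentials are placeholders to be adapted to your deployment.
+
+```java
+import org.apache.iotdb.rpc.IoTDBConnectionException;
+import org.apache.iotdb.rpc.StatementExecutionException;
+import org.apache.iotdb.session.Session;
+
+public class CreateTimeseriesExample {
+  public static void main(String[] args)
+      throws IoTDBConnectionException, StatementExecutionException {
+    // Placeholder connection settings; adjust to your deployment.
+    Session session = new Session("127.0.0.1", 6667, "root", "root");
+    session.open();
+
+    // CREATE TIMESERIES is a non-query statement, so it goes through executeNonQueryStatement.
+    session.executeNonQueryStatement(
+        "create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN");
+    session.executeNonQueryStatement(
+        "create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE");
+
+    session.close();
+  }
+}
+```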
+ +### Create Aligned Timeseries + +The SQL statement for creating a group of timeseries are as follows: + +``` +IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) +``` + +You can set different datatype, encoding, and compression for the timeseries in a group of aligned timeseries + +It is also supported to set an alias, tag, and attribute for aligned timeseries. + +### Delete Timeseries + +To delete the timeseries we created before, we are able to use `(DELETE | DROP) TimeSeries ` statement. + +The usage are as follows: + +``` +IoTDB> delete timeseries root.ln.wf01.wt01.status +IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware +IoTDB> delete timeseries root.ln.wf02.* +IoTDB> drop timeseries root.ln.wf02.* +``` + +### Show Timeseries + +* SHOW LATEST? TIMESERIES pathPattern? whereClause? limitClause? + + There are four optional clauses added in SHOW TIMESERIES, return information of time series + +Timeseries information includes: timeseries path, alias of measurement, database it belongs to, data type, encoding type, compression type, tags and attributes. + +Examples: + +* SHOW TIMESERIES + + presents all timeseries information in JSON form + +* SHOW TIMESERIES <`PathPattern`> + + returns all timeseries information matching the given <`PathPattern`>. SQL statements are as follows: + +``` +IoTDB> show timeseries root.** +IoTDB> show timeseries root.ln.** +``` + +The results are shown below respectively: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.016s + ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|null| null| null| 
null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +Total line number = 4 +It costs 0.004s +``` + +* SHOW TIMESERIES LIMIT INT OFFSET INT + + returns all the timeseries information start from the offset and limit the number of series returned. For example, + +``` +show timeseries root.ln.** limit 10 offset 10 +``` + +* SHOW TIMESERIES WHERE TIMESERIES contains 'containStr' + + The query result set is filtered by string fuzzy matching based on the names of the timeseries. For example: + +``` +show timeseries root.ln.** where timeseries contains 'wf01.wt' +``` + +The result is shown below: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 2 +It costs 0.016s +``` + +* SHOW TIMESERIES WHERE DataType=type + + The query result set is filtered by data type. 
For example: + +``` +show timeseries root.ln.** where dataType=FLOAT +``` + +The result is shown below: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 3 +It costs 0.016s + +``` + + +* SHOW LATEST TIMESERIES + + all the returned timeseries information should be sorted in descending order of the last timestamp of timeseries + +It is worth noting that when the queried path does not exist, the system will return no timeseries. + + +### Count Timeseries + +IoTDB is able to use `COUNT TIMESERIES ` to count the number of timeseries matching the path. SQL statements are as follows: + +* `WHERE` condition could be used to fuzzy match a time series name with the following syntax: `COUNT TIMESERIES WHERE TIMESERIES contains 'containStr'`. +* `WHERE` condition could be used to filter result by data type with the syntax: `COUNT TIMESERIES WHERE DataType='`. +* `WHERE` condition could be used to filter result by tags with the syntax: `COUNT TIMESERIES WHERE TAGS(key)='value'` or `COUNT TIMESERIES WHERE TAGS(key) contains 'value'`. +* `LEVEL` could be defined to show count the number of timeseries of each node at the given level in current Metadata Tree. This could be used to query the number of sensors under each device. The grammar is: `COUNT TIMESERIES GROUP BY LEVEL=`. 
+ + +``` +IoTDB > COUNT TIMESERIES root.** +IoTDB > COUNT TIMESERIES root.ln.** +IoTDB > COUNT TIMESERIES root.ln.*.*.status +IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' +IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 +``` + +For example, if there are several timeseries (use `show timeseries` to show all timeseries): + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| {"unit":"c"}| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| {"description":"test1"}| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.004s +``` + +Then the Metadata Tree will be as below: + +
+ +As can be seen, `root` is considered as `LEVEL=0`. So when you enter statements such as: + +``` +IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 +``` + +You will get following results: + +``` ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +| root.sgcc| 2| +|root.turbine| 1| +| root.ln| 4| ++------------+-----------------+ +Total line number = 3 +It costs 0.002s + ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf02| 2| +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 2 +It costs 0.002s + ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 1 +It costs 0.002s +``` + +> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. + +### Active Timeseries Query +By adding WHERE time filter conditions to the existing SHOW/COUNT TIMESERIES, we can obtain time series with data within the specified time range. + +It is important to note that in metadata queries with time filters, views are not considered; only the time series actually stored in the TsFile are taken into account. + +An example usage is as follows: +``` +IoTDB> insert into root.sg.data(timestamp, s1,s2) values(15000, 1, 2); +IoTDB> insert into root.sg.data2(timestamp, s1,s2) values(15002, 1, 2); +IoTDB> insert into root.sg.data3(timestamp, s1,s2) values(16000, 1, 2); +IoTDB> show timeseries; ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| root.sg.data.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +| root.sg.data.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data3.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data3.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data2.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data2.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ + +IoTDB> show timeseries where time >= 15000 and time < 16000; ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| root.sg.data.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +| root.sg.data.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data2.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data2.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ + +IoTDB> count 
timeseries where time >= 15000 and time < 16000; ++-----------------+ +|count(timeseries)| ++-----------------+ +| 4| ++-----------------+ +``` +Regarding the definition of active time series, data that can be queried normally is considered active, meaning time series that have been inserted but deleted are not included. +### Tag and Attribute Management + +We can also add an alias, extra tag and attribute information while creating one timeseries. + +The differences between tag and attribute are: + +* Tag could be used to query the path of timeseries, we will maintain an inverted index in memory on the tag: Tag -> Timeseries +* Attribute could only be queried by timeseries path : Timeseries -> Attribute + +The SQL statements for creating timeseries with extra tag and attribute information are extended as follows: + +``` +create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) +``` + +The `temprature` in the brackets is an alias for the sensor `s1`. So we can use `temprature` to replace `s1` anywhere. + +> IoTDB also supports using AS function to set alias. The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. + +> Notice that the size of the extra tag and attribute information shouldn't exceed the `tag_attribute_total_size`. + +We can update the tag information after creating it as following: + +* Rename the tag/attribute key + +``` +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` + +* Reset the tag/attribute value + +``` +ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` + +* Delete the existing tag/attribute + +``` +ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +``` + +* Add new tags + +``` +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` + +* Add new attributes + +``` +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` + +* Upsert alias, tags and attributes + +> add alias or a new key-value if the alias or key doesn't exist, otherwise, update the old one with new value. + +``` +ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag3=v3, tag4=v4) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +* Show timeseries using tags. Use TAGS(tagKey) to identify the tags used as filter key + +``` +SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +``` + +returns all the timeseries information that satisfy the where condition and match the pathPattern. 
SQL statements are as follows: + +``` +ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c +ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 +show timeseries root.ln.** where TAGS(unit)='c' +show timeseries root.ln.** where TAGS(description) contains 'test1' +``` + +The results are shown below respectly: + +``` ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|{"unit":"c"}| null| null| null| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.005s + ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|{"description":"test1"}| null| null| null| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.004s +``` + +- count timeseries using tags + +``` +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause GROUP BY LEVEL= +``` + +returns all the number of timeseries that satisfy the where condition and match the pathPattern. SQL statements are as follows: + +``` +count timeseries +count timeseries root.** where TAGS(unit)='c' +count timeseries root.** where TAGS(unit)='c' group by level = 2 +``` + +The results are shown below respectly : + +``` +IoTDB> count timeseries ++-----------------+ +|count(timeseries)| ++-----------------+ +| 6| ++-----------------+ +Total line number = 1 +It costs 0.019s +IoTDB> count timeseries root.** where TAGS(unit)='c' ++-----------------+ +|count(timeseries)| ++-----------------+ +| 2| ++-----------------+ +Total line number = 1 +It costs 0.020s +IoTDB> count timeseries root.** where TAGS(unit)='c' group by level = 2 ++--------------+-----------------+ +| column|count(timeseries)| ++--------------+-----------------+ +| root.ln.wf02| 2| +| root.ln.wf01| 0| +|root.sgcc.wf03| 0| ++--------------+-----------------+ +Total line number = 3 +It costs 0.011s +``` + +> Notice that, we only support one condition in the where clause. Either it's an equal filter or it is an `contains` filter. In both case, the property in the where condition must be a tag. 
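+
+Tags and attributes can also be attached when a timeseries is created programmatically. The sketch below is illustrative only: it assumes the typed `createTimeseries` overload of the `iotdb-session` Java client, the enum import paths follow the 1.x client packages and may differ in other releases, and the connection settings are placeholders.
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.iotdb.rpc.IoTDBConnectionException;
+import org.apache.iotdb.rpc.StatementExecutionException;
+import org.apache.iotdb.session.Session;
+import org.apache.iotdb.tsfile.file.metadata.enums.CompressionType;
+import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
+import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
+
+public class CreateTimeseriesWithTags {
+  public static void main(String[] args)
+      throws IoTDBConnectionException, StatementExecutionException {
+    Session session = new Session("127.0.0.1", 6667, "root", "root"); // placeholder connection
+    session.open();
+
+    // Tags are indexed (Tag -> Timeseries); attributes are only reachable from the path.
+    Map<String, String> tags = new HashMap<>();
+    tags.put("tag1", "v1");
+    Map<String, String> attributes = new HashMap<>();
+    attributes.put("attr1", "v1");
+
+    session.createTimeseries(
+        "root.turbine.d1.s1",
+        TSDataType.FLOAT,
+        TSEncoding.RLE,
+        CompressionType.SNAPPY,
+        null,           // no extra props
+        tags,
+        attributes,
+        "temprature");  // measurement alias, matching the SQL example above
+
+    session.close();
+  }
+}
+```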
+ +create aligned timeseries + +``` +create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) +``` + +The execution result is as follows: + +``` +IoTDB> show timeseries ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| +|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +Support query: + +``` +IoTDB> show timeseries where TAGS(tag1)='v1' ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +The above operations are supported for timeseries tag, attribute updates, etc. + +## Node Management + +### Show Child Paths + +``` +SHOW CHILD PATHS pathPattern +``` + +Return all child paths and their node types of all the paths matching pathPattern. + +node types: ROOT -> DB INTERNAL -> DATABASE -> INTERNAL -> DEVICE -> TIMESERIES + + +Example: + +* return the child paths of root.ln:show child paths root.ln + +``` ++------------+----------+ +| child paths|node types| ++------------+----------+ +|root.ln.wf01| INTERNAL| +|root.ln.wf02| INTERNAL| ++------------+----------+ +Total line number = 2 +It costs 0.002s +``` + +> get all paths in form of root.xx.xx.xx:show child paths root.xx.xx + +### Show Child Nodes + +``` +SHOW CHILD NODES pathPattern +``` + +Return all child nodes of the pathPattern. + +Example: + +* return the child nodes of root:show child nodes root + +``` ++------------+ +| child nodes| ++------------+ +| ln| ++------------+ +``` + +* return the child nodes of root.ln:show child nodes root.ln + +``` ++------------+ +| child nodes| ++------------+ +| wf01| +| wf02| ++------------+ +``` + +### Count Nodes + +IoTDB is able to use `COUNT NODES LEVEL=` to count the number of nodes at + the given level in current Metadata Tree considering a given pattern. IoTDB will find paths that + match the pattern and counts distinct nodes at the specified level among the matched paths. + This could be used to query the number of devices with specified measurements. 
The usage are as + follows: + +``` +IoTDB > COUNT NODES root.** LEVEL=2 +IoTDB > COUNT NODES root.ln.** LEVEL=2 +IoTDB > COUNT NODES root.ln.wf01.** LEVEL=3 +IoTDB > COUNT NODES root.**.temperature LEVEL=3 +``` + +As for the above mentioned example and Metadata tree, you can get following results: + +``` ++------------+ +|count(nodes)| ++------------+ +| 4| ++------------+ +Total line number = 1 +It costs 0.003s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 1| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s +``` + +> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. + +### Show Devices + +* SHOW DEVICES pathPattern? (WITH DATABASE)? devicesWhereClause? limitClause? + +Similar to `Show Timeseries`, IoTDB also supports two ways of viewing devices: + +* `SHOW DEVICES` statement presents all devices' information, which is equal to `SHOW DEVICES root.**`. +* `SHOW DEVICES ` statement specifies the `PathPattern` and returns the devices information matching the pathPattern and under the given level. +* `WHERE` condition supports `DEVICE contains 'xxx'` to do a fuzzy query based on the device name. + +SQL statement is as follows: + +``` +IoTDB> show devices +IoTDB> show devices root.ln.** +IoTDB> show devices root.ln.** where device contains 't' +``` + +You can get results below: + +``` ++-------------------+---------+ +| devices|isAligned| ++-------------------+---------+ +| root.ln.wf01.wt01| false| +| root.ln.wf02.wt02| false| +|root.sgcc.wf03.wt01| false| +| root.turbine.d1| false| ++-------------------+---------+ +Total line number = 4 +It costs 0.002s + ++-----------------+---------+ +| devices|isAligned| ++-----------------+---------+ +|root.ln.wf01.wt01| false| +|root.ln.wf02.wt02| false| ++-----------------+---------+ +Total line number = 2 +It costs 0.001s +``` + +`isAligned` indicates whether the timeseries under the device are aligned. + +To view devices' information with database, we can use `SHOW DEVICES WITH DATABASE` statement. + +* `SHOW DEVICES WITH DATABASE` statement presents all devices' information with their database. +* `SHOW DEVICES WITH DATABASE` statement specifies the `PathPattern` and returns the + devices' information under the given level with their database information. + +SQL statement is as follows: + +``` +IoTDB> show devices with database +IoTDB> show devices root.ln.** with database +``` + +You can get results below: + +``` ++-------------------+-------------+---------+ +| devices| database|isAligned| ++-------------------+-------------+---------+ +| root.ln.wf01.wt01| root.ln| false| +| root.ln.wf02.wt02| root.ln| false| +|root.sgcc.wf03.wt01| root.sgcc| false| +| root.turbine.d1| root.turbine| false| ++-------------------+-------------+---------+ +Total line number = 4 +It costs 0.003s + ++-----------------+-------------+---------+ +| devices| database|isAligned| ++-----------------+-------------+---------+ +|root.ln.wf01.wt01| root.ln| false| +|root.ln.wf02.wt02| root.ln| false| ++-----------------+-------------+---------+ +Total line number = 2 +It costs 0.001s +``` + +### Count Devices + +* COUNT DEVICES / + +The above statement is used to count the number of devices. 
At the same time, it is allowed to specify `PathPattern` to count the number of devices matching the `PathPattern`. + +SQL statement is as follows: + +``` +IoTDB> show devices +IoTDB> count devices +IoTDB> count devices root.ln.** +``` + +You can get results below: + +``` ++-------------------+---------+ +| devices|isAligned| ++-------------------+---------+ +|root.sgcc.wf03.wt03| false| +| root.turbine.d1| false| +| root.ln.wf02.wt02| false| +| root.ln.wf01.wt01| false| ++-------------------+---------+ +Total line number = 4 +It costs 0.024s + ++--------------+ +|count(devices)| ++--------------+ +| 4| ++--------------+ +Total line number = 1 +It costs 0.004s + ++--------------+ +|count(devices)| ++--------------+ +| 2| ++--------------+ +Total line number = 1 +It costs 0.004s +``` + +### Active Device Query +Similar to active timeseries query, we can add time filter conditions to device viewing and statistics to query active devices that have data within a certain time range. The definition of active here is the same as for active time series. An example usage is as follows: +``` +IoTDB> insert into root.sg.data(timestamp, s1,s2) values(15000, 1, 2); +IoTDB> insert into root.sg.data2(timestamp, s1,s2) values(15002, 1, 2); +IoTDB> insert into root.sg.data3(timestamp, s1,s2) values(16000, 1, 2); +IoTDB> show devices; ++-------------------+---------+ +| devices|isAligned| ++-------------------+---------+ +| root.sg.data| false| +| root.sg.data2| false| +| root.sg.data3| false| ++-------------------+---------+ + +IoTDB> show devices where time >= 15000 and time < 16000; ++-------------------+---------+ +| devices|isAligned| ++-------------------+---------+ +| root.sg.data| false| +| root.sg.data2| false| ++-------------------+---------+ + +IoTDB> count devices where time >= 15000 and time < 16000; ++--------------+ +|count(devices)| ++--------------+ +| 2| ++--------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Basic-Concept/Query-Data.md b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Query-Data.md new file mode 100644 index 00000000..62fc3c9f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Query-Data.md @@ -0,0 +1,3009 @@ + +# Query Data +## OVERVIEW + +### Syntax Definition + +In IoTDB, `SELECT` statement is used to retrieve data from one or more selected time series. Here is the syntax definition of `SELECT` statement: + +```sql +SELECT [LAST] selectExpr [, selectExpr] ... + [INTO intoItem [, intoItem] ...] + FROM prefixPath [, prefixPath] ... + [WHERE whereCondition] + [GROUP BY { + ([startTime, endTime), interval [, slidingStep]) | + LEVEL = levelNum [, levelNum] ... | + TAGS(tagKey [, tagKey] ... ) | + VARIATION(expression[,delta][,ignoreNull=true/false]) | + CONDITION(expression,[keep>/>=/=/ 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; +``` + +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". The SQL statement requires that the status and temperature sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected. 
+ +The execution result of this SQL statement is as follows: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 6 +It costs 0.018s +``` + +#### Select Multiple Columns of Data for the Same Device According to Multiple Time Intervals + +IoTDB supports specifying multiple time interval conditions in a query. Users can combine time interval conditions at will according to their needs. For example, the SQL statement is: + +```sql +select status,temperature from root.ln.wf01.wt01 where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` + +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature"; the statement specifies two different time intervals, namely "2017-11-01T00:05:00.000 to 2017-11-01T00:12:00.000" and "2017-11-01T16:35:00.000 to 2017-11-01T16:37:00.000". The SQL statement requires that the values of selected timeseries satisfying any time interval be selected. + +The execution result of this SQL statement is as follows: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| +|2017-11-01T16:35:00.000+08:00| true| 23.44| +|2017-11-01T16:36:00.000+08:00| false| 21.98| +|2017-11-01T16:37:00.000+08:00| false| 21.93| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 9 +It costs 0.018s +``` + + +#### Choose Multiple Columns of Data for Different Devices According to Multiple Time Intervals + +The system supports the selection of data in any column in a query, i.e., the selected columns can come from different devices. For example, the SQL statement is: + +```sql +select wf01.wt01.status,wf02.wt02.hardware from root.ln where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` + +which means: + +The selected timeseries are "the power supply status of ln group wf01 plant wt01 device" and "the hardware version of ln group wf02 plant wt02 device"; the statement specifies two different time intervals, namely "2017-11-01T00:05:00.000 to 2017-11-01T00:12:00.000" and "2017-11-01T16:35:00.000 to 2017-11-01T16:37:00.000". The SQL statement requires that the values of selected timeseries satisfying any time interval be selected. 
+ +The execution result of this SQL statement is as follows: + +``` ++-----------------------------+------------------------+--------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf02.wt02.hardware| ++-----------------------------+------------------------+--------------------------+ +|2017-11-01T00:06:00.000+08:00| false| v1| +|2017-11-01T00:07:00.000+08:00| false| v1| +|2017-11-01T00:08:00.000+08:00| false| v1| +|2017-11-01T00:09:00.000+08:00| false| v1| +|2017-11-01T00:10:00.000+08:00| true| v2| +|2017-11-01T00:11:00.000+08:00| false| v1| +|2017-11-01T16:35:00.000+08:00| true| v2| +|2017-11-01T16:36:00.000+08:00| false| v1| +|2017-11-01T16:37:00.000+08:00| false| v1| ++-----------------------------+------------------------+--------------------------+ +Total line number = 9 +It costs 0.014s +``` + +#### Order By Time Query + +IoTDB supports the 'order by time' statement since 0.11, it's used to display results in descending order by time. +For example, the SQL statement is: + +```sql +select * from root.ln.** where time > 1 order by time desc limit 10; +``` + +The execution result of this SQL statement is as follows: + +``` ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +|2017-11-07T23:59:00.000+08:00| v1| false| 21.07| false| +|2017-11-07T23:58:00.000+08:00| v1| false| 22.93| false| +|2017-11-07T23:57:00.000+08:00| v2| true| 24.39| true| +|2017-11-07T23:56:00.000+08:00| v2| true| 24.44| true| +|2017-11-07T23:55:00.000+08:00| v2| true| 25.9| true| +|2017-11-07T23:54:00.000+08:00| v1| false| 22.52| false| +|2017-11-07T23:53:00.000+08:00| v2| true| 24.58| true| +|2017-11-07T23:52:00.000+08:00| v1| false| 20.18| false| +|2017-11-07T23:51:00.000+08:00| v1| false| 22.24| false| +|2017-11-07T23:50:00.000+08:00| v2| true| 23.7| true| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +Total line number = 10 +It costs 0.016s +``` + +### Execution Interface + +In IoTDB, there are two ways to execute data query: + +- Execute queries using IoTDB-SQL. +- Efficient execution interfaces for common queries, including time-series raw data query, last query, and aggregation query. + +#### Execute queries using IoTDB-SQL + +Data query statements can be used in SQL command-line terminals, JDBC, JAVA / C++ / Python / Go and other native APIs, and RESTful APIs. + +- Execute the query statement in the SQL command line terminal: start the SQL command line terminal, and directly enter the query statement to execute, see [SQL command line terminal](../Tools-System/CLI.md). + +- Execute query statements in JDBC, see [JDBC](../API/Programming-JDBC.md) for details. + +- Execute query statements in native APIs such as JAVA / C++ / Python / Go. For details, please refer to the relevant documentation in the Application Programming Interface chapter. The interface prototype is as follows: + + ````java + SessionDataSet executeQueryStatement(String sql) + ```` + +- Used in RESTful API, see [HTTP API V1](../API/RestServiceV1.md) or [HTTP API V2](../API/RestServiceV2.md) for details. 
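+As an illustration of the native-API path listed above, the following is a minimal Java sketch that runs an IoTDB-SQL query through `executeQueryStatement` and prints the result set. The host, port, credentials, queried path and package names are illustrative assumptions and may vary between IoTDB versions; please refer to the Application Programming Interface chapter for the authoritative usage.
+
+```java
+import java.util.List;
+
+import org.apache.iotdb.isession.SessionDataSet;
+import org.apache.iotdb.rpc.IoTDBConnectionException;
+import org.apache.iotdb.rpc.StatementExecutionException;
+import org.apache.iotdb.session.Session;
+
+public class NativeApiQueryExample {
+
+  public static void main(String[] args)
+      throws IoTDBConnectionException, StatementExecutionException {
+    // Connection parameters are illustrative; adjust host/port/credentials as needed.
+    Session session =
+        new Session.Builder()
+            .host("127.0.0.1")
+            .port(6667)
+            .username("root")
+            .password("root")
+            .build();
+    session.open();
+
+    // Execute an IoTDB-SQL query; the returned SessionDataSet is iterated row by row.
+    SessionDataSet dataSet =
+        session.executeQueryStatement(
+            "select status, temperature from root.ln.wf01.wt01 limit 10");
+    List<String> columnNames = dataSet.getColumnNames();
+    System.out.println(columnNames);
+    while (dataSet.hasNext()) {
+      // Each next() call returns one row of the result set.
+      System.out.println(dataSet.next());
+    }
+
+    // Release the server-side query resources and close the session.
+    dataSet.closeOperationHandle();
+    session.close();
+  }
+}
+```
+
+The efficient execution interfaces introduced in the next section return the same `SessionDataSet` type, so the iteration pattern shown above applies to them as well.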
+ +#### Efficient execution interfaces + +The native APIs provide efficient execution interfaces for commonly used queries, which can save time-consuming operations such as SQL parsing. include: + +* Time-series raw data query with time range: + - The specified query time range is a left-closed right-open interval, including the start time but excluding the end time. + +```java +SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime); +``` + +* Last query: + - Query the last data, whose timestamp is greater than or equal LastTime. + +```java +SessionDataSet executeLastDataQuery(List paths, long LastTime); +``` + +* Aggregation query: + - Support specified query time range: The specified query time range is a left-closed right-open interval, including the start time but not the end time. + - Support GROUP BY TIME. + +```java +SessionDataSet executeAggregationQuery(List paths, List aggregations); + +SessionDataSet executeAggregationQuery( + List paths, List aggregations, long startTime, long endTime); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval, + long slidingStep); +``` + +## `SELECT` CLAUSE +The `SELECT` clause specifies the output of the query, consisting of several `selectExpr`. Each `selectExpr` defines one or more columns in the query result. For select expression details, see document [Operator-and-Expression](../SQL-Manual/Operator-and-Expression.md). + +- Example 1: + +```sql +select temperature from root.ln.wf01.wt01 +``` + +- Example 2: + +```sql +select status, temperature from root.ln.wf01.wt01 +``` + +### Last Query + +The last query is a special type of query in Apache IoTDB. It returns the data point with the largest timestamp of the specified time series. In other word, it returns the latest state of a time series. This feature is especially important in IoT data analysis scenarios. To meet the performance requirement of real-time device monitoring systems, Apache IoTDB caches the latest values of all time series to achieve microsecond read latency. + +The last query is to return the most recent data point of the given timeseries in a three column format. + +The SQL syntax is defined as: + +```sql +select last [COMMA ]* from < PrefixPath > [COMMA < PrefixPath >]* [ORDER BY TIMESERIES (DESC | ASC)?] +``` + +which means: Query and return the last data points of timeseries prefixPath.path. + +- Only time filter is supported in \. Any other filters given in the \ will give an exception. When the cached most recent data point does not satisfy the criterion specified by the filter, IoTDB will have to get the result from the external storage, which may cause a decrease in performance. + +- The result will be returned in a four column table format. + + ``` + | Time | timeseries | value | dataType | + ``` + + **Note:** The `value` colum will always return the value as `string` and thus also has `TSDataType.TEXT`. Therefore, the column `dataType` is returned also which contains the _real_ type how the value should be interpreted. + +- We can use `TIME/TIMESERIES/VALUE/DATATYPE (DESC | ASC)` to specify that the result set is sorted in descending/ascending order based on a particular column. When the value column contains multiple types of data, the sorting is based on the string representation of the values. 
+ +**Example 1:** get the last point of root.ln.wf01.wt01.status: + +``` +IoTDB> select last status from root.ln.wf01.wt01 ++-----------------------------+------------------------+-----+--------+ +| Time| timeseries|value|dataType| ++-----------------------------+------------------------+-----+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.status|false| BOOLEAN| ++-----------------------------+------------------------+-----+--------+ +Total line number = 1 +It costs 0.000s +``` + +**Example 2:** get the last status and temperature points of root.ln.wf01.wt01, whose timestamp larger or equal to 2017-11-07T23:50:00。 + +``` +IoTDB> select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00 ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 3:** get the last points of all sensor in root.ln.wf01.wt01, and order the result by the timeseries column in descending order + +``` +IoTDB> select last * from root.ln.wf01.wt01 order by timeseries desc; ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 4:** get the last points of all sensor in root.ln.wf01.wt01, and order the result by the dataType column in descending order + +``` +IoTDB> select last * from root.ln.wf01.wt01 order by dataType desc; ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` + +## `WHERE` CLAUSE + +In IoTDB query statements, two filter conditions, **time filter** and **value filter**, are supported. + +The supported operators are as follows: + +- Comparison operators: greater than (`>`), greater than or equal ( `>=`), equal ( `=` or `==`), not equal ( `!=` or `<>`), less than or equal ( `<=`), less than ( `<`). +- Logical operators: and ( `AND` or `&` or `&&`), or ( `OR` or `|` or `||`), not ( `NOT` or `!`). +- Range contains operator: contains ( `IN` ). +- String matches operator: `LIKE`, `REGEXP`. + +### Time Filter + +Use time filters to filter data for a specific time range. For supported formats of timestamps, please refer to [Timestamp](../Basic-Concept/Data-Type.md) . + +An example is as follows: + +1. Select data with timestamp greater than 2022-01-01T00:05:00.000: + + ```sql + select s1 from root.sg1.d1 where time > 2022-01-01T00:05:00.000; + ```` + +2. 
Select data with timestamp equal to 2022-01-01T00:05:00.000: + + ```sql + select s1 from root.sg1.d1 where time = 2022-01-01T00:05:00.000; + ```` + +3. Select the data in the time interval [2017-11-01T00:05:00.000, 2017-11-01T00:12:00.000): + + ```sql + select s1 from root.sg1.d1 where time >= 2022-01-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; + ```` + +Note: In the above example, `time` can also be written as `timestamp`. + +### Value Filter + +Use value filters to filter data whose data values meet certain criteria. **Allow** to use a time series not selected in the select clause as a value filter. + +An example is as follows: + +1. Select data with a value greater than 36.5: + + ```sql + select temperature from root.sg1.d1 where temperature > 36.5; + ```` + +2. Select data with value equal to true: + + ```sql + select status from root.sg1.d1 where status = true; + ```` + +3. Select data for the interval [36.5,40] or not: + + ```sql + select temperature from root.sg1.d1 where temperature between 36.5 and 40; + ```` + + ```sql + select temperature from root.sg1.d1 where temperature not between 36.5 and 40; + ```` + +4. Select data with values within a specific range: + + ```sql + select code from root.sg1.d1 where code in ('200', '300', '400', '500'); + ```` + +5. Select data with values outside a certain range: + + ```sql + select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); + ```` + +6. Select data with values is null: + + ```sql + select code from root.sg1.d1 where temperature is null; + ```` + +7. Select data with values is not null: + + ```sql + select code from root.sg1.d1 where temperature is not null; + ```` + +### Fuzzy Query + +Fuzzy query is divided into Like statement and Regexp statement, both of which can support fuzzy matching of TEXT type data. + +Like statement: + +#### Fuzzy matching using `Like` + +In the value filter condition, for TEXT type data, use `Like` and `Regexp` operators to perform fuzzy matching on data. + +**Matching rules:** + +- The percentage (`%`) wildcard matches any string of zero or more characters. +- The underscore (`_`) wildcard matches any single character. + +**Example 1:** Query data containing `'cc'` in `value` under `root.sg.d1`. + +``` +IoTDB> select * from root.sg.d1 where value like '%cc%' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 2:** Query data that consists of 3 characters and the second character is `'b'` in `value` under `root.sg.d1`. + +``` +IoTDB> select * from root.sg.device where value like '_b_' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:02.000+08:00| abc| ++-----------------------------+----------------+ +Total line number = 1 +It costs 0.002s +``` + +#### Fuzzy matching using `Regexp` + +The filter conditions that need to be passed in are regular expressions in the Java standard library style. 
+ +**Examples of common regular matching:** + +``` +All characters with a length of 3-20: ^.{3,20}$ +Uppercase english characters: ^[A-Z]+$ +Numbers and English characters: ^[A-Za-z0-9]+$ +Beginning with a: ^a.* +``` + +**Example 1:** Query a string composed of 26 English characters for the value under root.sg.d1 + +``` +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 2:** Query root.sg.d1 where the value value is a string composed of 26 lowercase English characters and the time is greater than 100 + +``` +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +## `GROUP BY` CLAUSE + +IoTDB supports using `GROUP BY` clause to aggregate the time series by segment and group. + +Segmented aggregation refers to segmenting data in the row direction according to the time dimension, aiming at the time relationship between different data points in the same time series, and obtaining an aggregated value for each segment. Currently only **group by time**、**group by variation**、**group by condition**、**group by session** and **group by count** is supported, and more segmentation methods will be supported in the future. + +Group aggregation refers to grouping the potential business attributes of time series for different time series. Each group contains several time series, and each group gets an aggregated value. Support **group by path level** and **group by tag** two grouping methods. + +### Aggregate By Segment + +#### Aggregate By Time + +Aggregate by time is a typical query method for time series data. Data is collected at high frequency and needs to be aggregated and calculated at certain time intervals. For example, to calculate the daily average temperature, the sequence of temperature needs to be segmented by day, and then calculated. average value. + +Aggregate by time refers to a query method that uses a lower frequency than the time frequency of data collection, and is a special case of segmented aggregation. For example, the frequency of data collection is one second. If you want to display the data in one minute, you need to use time aggregagtion. + +This section mainly introduces the related examples of time aggregation, using the `GROUP BY` clause. IoTDB supports partitioning result sets according to time interval and customized sliding step. And by default results are sorted by time in ascending order. + +The GROUP BY statement provides users with three types of specified parameters: + +* Parameter 1: The display window on the time axis +* Parameter 2: Time interval for dividing the time axis(should be positive) +* Parameter 3: Time sliding step (optional and defaults to equal the time interval if not set) + +The actual meanings of the three types of parameters are shown in Figure below. +Among them, the parameter 3 is optional. + +
+*(Figure: the display window, time interval and sliding step of a GROUP BY time query.)*
+ + +There are three typical examples of frequency reduction aggregation: + +##### Aggregate By Time without Specifying the Sliding Step Length + +The SQL statement is: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d); +``` + +which means: + +Since the sliding step length is not specified, the `GROUP BY` statement by default set the sliding step the same as the time interval which is `1d`. + +The fist parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2017-11-07T23:00:00). + +The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (1d) as time interval and startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [0,1d), [1d, 2d), [2d, 3d), etc. + +Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-11-01T00:00:00, 2017-11-07 T23:00:00]), and map these data to the previously segmented time axis (in this case there are mapped data in every 1-day period from 2017-11-01T00:00:00 to 2017-11-07T23:00:00:00). + +Since there is data for each time period in the result range to be displayed, the execution result of the SQL statement is shown below: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 1440| 26.0| +|2017-11-02T00:00:00.000+08:00| 1440| 26.0| +|2017-11-03T00:00:00.000+08:00| 1440| 25.99| +|2017-11-04T00:00:00.000+08:00| 1440| 26.0| +|2017-11-05T00:00:00.000+08:00| 1440| 26.0| +|2017-11-06T00:00:00.000+08:00| 1440| 25.99| +|2017-11-07T00:00:00.000+08:00| 1380| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 7 +It costs 0.024s +``` + +##### Aggregate By Time Specifying the Sliding Step Length + +The SQL statement is: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d); +``` + +which means: + +Since the user specifies the sliding step parameter as 1d, the `GROUP BY` statement will move the time interval `1 day` long instead of `3 hours` as default. + +That means we want to fetch all the data of 00:00:00 to 02:59:59 every day from 2017-11-01 to 2017-11-07. + +The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2017-11-07T23:00:00). + +The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (3h) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-11-01T00:00:00, 2017-11-01T03:00:00), [2017-11-02T00:00:00, 2017-11-02T03:00:00), [2017-11-03T00:00:00, 2017-11-03T03:00:00), etc. 
+ +The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. + +Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-11-01T00:00:00, 2017-11-07T23:00:00]), and map these data to the previously segmented time axis (in this case there are mapped data in every 3-hour period for each day from 2017-11-01T00:00:00 to 2017-11-07T23:00:00:00). + +Since there is data for each time period in the result range to be displayed, the execution result of the SQL statement is shown below: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| 25.98| +|2017-11-02T00:00:00.000+08:00| 180| 25.98| +|2017-11-03T00:00:00.000+08:00| 180| 25.96| +|2017-11-04T00:00:00.000+08:00| 180| 25.96| +|2017-11-05T00:00:00.000+08:00| 180| 26.0| +|2017-11-06T00:00:00.000+08:00| 180| 25.85| +|2017-11-07T00:00:00.000+08:00| 180| 25.99| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 7 +It costs 0.006s +``` + +The sliding step can be smaller than the interval, in which case there is overlapping time between the aggregation windows (similar to a sliding window). + +The SQL statement is: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-01 10:00:00), 4h, 2h); +``` + +The execution result of the SQL statement is shown below: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| 25.98| +|2017-11-01T02:00:00.000+08:00| 180| 25.98| +|2017-11-01T04:00:00.000+08:00| 180| 25.96| +|2017-11-01T06:00:00.000+08:00| 180| 25.96| +|2017-11-01T08:00:00.000+08:00| 180| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 5 +It costs 0.006s +``` + +##### Aggregate by Natural Month + +The SQL statement is: + +```sql +select count(status) from root.ln.wf01.wt01 group by([2017-11-01T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +which means: + +Since the user specifies the sliding step parameter as `2mo`, the `GROUP BY` statement will move the time interval `2 months` long instead of `1 month` as default. + +The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2019-11-07T23:00:00). + +The start time is 2017-11-01T00:00:00. The sliding step will increment monthly based on the start date, and the 1st day of the month will be used as the time interval's start time. + +The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. 
Taking this parameter (1mo) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-11-01T00:00:00, 2017-12-01T00:00:00), [2018-02-01T00:00:00, 2018-03-01T00:00:00), [2018-05-03T00:00:00, 2018-06-01T00:00:00)), etc. + +The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. + +Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of (2017-11-01T00:00:00, 2019-11-07T23:00:00], and map these data to the previously segmented time axis (in this case there are mapped data of the first month in every two month period from 2017-11-01T00:00:00 to 2019-11-07T23:00:00). + +The SQL execution result is: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-11-01T00:00:00.000+08:00| 259| +|2018-01-01T00:00:00.000+08:00| 250| +|2018-03-01T00:00:00.000+08:00| 259| +|2018-05-01T00:00:00.000+08:00| 251| +|2018-07-01T00:00:00.000+08:00| 242| +|2018-09-01T00:00:00.000+08:00| 225| +|2018-11-01T00:00:00.000+08:00| 216| +|2019-01-01T00:00:00.000+08:00| 207| +|2019-03-01T00:00:00.000+08:00| 216| +|2019-05-01T00:00:00.000+08:00| 207| +|2019-07-01T00:00:00.000+08:00| 199| +|2019-09-01T00:00:00.000+08:00| 181| +|2019-11-01T00:00:00.000+08:00| 60| ++-----------------------------+-------------------------------+ +``` + +The SQL statement is: + +```sql +select count(status) from root.ln.wf01.wt01 group by([2017-10-31T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +which means: + +Since the user specifies the sliding step parameter as `2mo`, the `GROUP BY` statement will move the time interval `2 months` long instead of `1 month` as default. + +The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-10-31T00:00:00, 2019-11-07T23:00:00). + +Different from the previous example, the start time is set to 2017-10-31T00:00:00. The sliding step will increment monthly based on the start date, and the 31st day of the month meaning the last day of the month will be used as the time interval's start time. If the start time is set to the 30th date, the sliding step will use the 30th or the last day of the month. + +The start time is 2017-10-31T00:00:00. The sliding step will increment monthly based on the start time, and the 1st day of the month will be used as the time interval's start time. + +The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (1mo) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-10-31T00:00:00, 2017-11-31T00:00:00), [2018-02-31T00:00:00, 2018-03-31T00:00:00), [2018-05-31T00:00:00, 2018-06-31T00:00:00), etc. + +The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. 
+ +Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-10-31T00:00:00, 2019-11-07T23:00:00) and map these data to the previously segmented time axis (in this case there are mapped data of the first month in every two month period from 2017-10-31T00:00:00 to 2019-11-07T23:00:00). + +The SQL execution result is: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-10-31T00:00:00.000+08:00| 251| +|2017-12-31T00:00:00.000+08:00| 250| +|2018-02-28T00:00:00.000+08:00| 259| +|2018-04-30T00:00:00.000+08:00| 250| +|2018-06-30T00:00:00.000+08:00| 242| +|2018-08-31T00:00:00.000+08:00| 225| +|2018-10-31T00:00:00.000+08:00| 216| +|2018-12-31T00:00:00.000+08:00| 208| +|2019-02-28T00:00:00.000+08:00| 216| +|2019-04-30T00:00:00.000+08:00| 208| +|2019-06-30T00:00:00.000+08:00| 199| +|2019-08-31T00:00:00.000+08:00| 181| +|2019-10-31T00:00:00.000+08:00| 69| ++-----------------------------+-------------------------------+ +``` + +##### Left Open And Right Close Range + +The SQL statement is: + +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d); +``` + +In this sql, the time interval is left open and right close, so we won't include the value of timestamp 2017-11-01T00:00:00 and instead we will include the value of timestamp 2017-11-07T23:00:00. + +We will get the result like following: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-11-02T00:00:00.000+08:00| 1440| +|2017-11-03T00:00:00.000+08:00| 1440| +|2017-11-04T00:00:00.000+08:00| 1440| +|2017-11-05T00:00:00.000+08:00| 1440| +|2017-11-06T00:00:00.000+08:00| 1440| +|2017-11-07T00:00:00.000+08:00| 1440| +|2017-11-07T23:00:00.000+08:00| 1380| ++-----------------------------+-------------------------------+ +Total line number = 7 +It costs 0.004s +``` + +#### Aggregation By Variation + +IoTDB supports grouping by continuous stable values through the `GROUP BY VARIATION` statement. + +Group-By-Variation wil set the first point in group as the base point, +then if the difference between the new data and base point is small than or equal to delta, +the data point will be grouped together and execute aggregation query (The calculation of difference and the meaning of delte are introduced below). The groups won't overlap and there is no fixed start time and end time. +The syntax of clause is as follows: + +```sql +group by variation(controlExpression[,delta][,ignoreNull=true/false]) +``` + +The different parameters mean: + +* controlExpression + +The value that is used to calculate difference. It can be any columns or the expression of them. + +* delta + +The threshold that is used when grouping. The difference of controlExpression between the first data point and new data point should less than or equal to delta. +When delta is zero, all the continuous data with equal expression value will be grouped into the same group. + +* ignoreNull + +Used to specify how to deal with the data when the value of controlExpression is null. 
When ignoreNull is false, null will be treated as a new value and when ignoreNull is true, the data point will be directly skipped. + +The supported return types of controlExpression and how to deal with null value when ignoreNull is false are shown in the following table: + +| delta | Return Type Supported By controlExpression | The Handling of null when ignoreNull is False | +| -------- | ------------------------------------------ | ------------------------------------------------------------ | +| delta!=0 | INT32、INT64、FLOAT、DOUBLE | If the processing group doesn't contains null, null value should be treated as infinity/infinitesimal and will end current group.
Continuous null values are treated as stable values and assigned to the same group. | +| delta=0 | TEXT、BINARY、INT32、INT64、FLOAT、DOUBLE | Null is treated as a new value in a new group and continuous nulls belong to the same group. | + +groupByVariation + +##### Precautions for Use + +1. The result of controlExpression should be a unique value. If multiple columns appear after using wildcard stitching, an error will be reported. +2. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. +3. Each device is grouped separately when used with `ALIGN BY DEVICE`. +4. Delta is zero and ignoreNull is true by default. +5. Currently `GROUP BY VARIATION` is not supported with `GROUP BY LEVEL`. + +Using the raw data below, several examples of `GROUP BY VARIAITON` queries will be given. + +``` ++-----------------------------+-------+-------+-------+--------+-------+-------+ +| Time| s1| s2| s3| s4| s5| s6| ++-----------------------------+-------+-------+-------+--------+-------+-------+ +|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| +|1970-01-01T08:00:00.010+08:00| null| 19.0| 10.0| 145.0| 19.0| 8.25| +|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| null| 245.0| 29.0| null| +|1970-01-01T08:00:00.030+08:00| 34.5| null| 30.0| 345.0| null| null| +|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| +|1970-01-01T08:00:00.050+08:00| null| 59.0| 50.0| 545.0| 59.0| 6.25| +|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| null| +|1970-01-01T08:00:00.070+08:00| 74.5| 79.0| null| null| 79.0| 3.25| +|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 3.25| +|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 3.25| +|1970-01-01T08:00:00.150+08:00| 66.5| 77.0| 90.0| 945.0| 99.0| 9.25| ++-----------------------------+-------+-------+-------+--------+-------+-------+ +``` + +##### delta = 0 + +The sql is shown below: + +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6) +``` + +Get the result below which ignores the row with null value in `s6`. + +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.040+08:00| 24.5| 3| 50.0| +|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` + +when ignoreNull is false, the row with null value in `s6` will be considered. + +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, ignoreNull=false) +``` + +Get the following result. 
+ +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| +|1970-01-01T08:00:00.020+08:00|1970-01-01T08:00:00.030+08:00| 29.5| 1| 30.0| +|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.040+08:00| 44.5| 1| 40.0| +|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| +|1970-01-01T08:00:00.060+08:00|1970-01-01T08:00:00.060+08:00| 64.5| 1| 60.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` + +##### delta !=0 + +The sql is shown below: + +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, 4) +``` + +Get the result below: + +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.050+08:00| 24.5| 4| 100.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` + +The sql is shown below: + +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6+s5, 10) +``` + +Get the result below: + +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| +|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.050+08:00| 44.5| 2| 90.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.080+08:00| 79.5| 2| 80.0| +|1970-01-01T08:00:00.090+08:00|1970-01-01T08:00:00.150+08:00| 80.5| 2| 180.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` + +#### Aggregation By Condition + +When you need to filter the data according to a specific condition and group the continuous ones for an aggregation query. +`GROUP BY CONDITION` is suitable for you.The rows which don't meet the given condition will be simply ignored because they don't belong to any group. +Its syntax is defined below: + +```sql +group by condition(predict,[keep>/>=/=/<=/<]threshold,[,ignoreNull=true/false]) +``` + +* predict + +Any legal expression return the type of boolean for filtering in grouping. + +* [keep>/>=/=/<=/<]threshold + +Keep expression is used to specify the number of continuous rows that meet the `predict` condition to form a group. Only the number of rows in group satisfy the keep condition, the result of group will be output. 
+Keep expression consists of a 'keep' string and a threshold of type `long` or a single 'long' type data. + +* ignoreNull=true/false + +Used to specify how to handle data rows that encounter null predict, skip the row when it's true and end current group when it's false. + +##### Precautions for Use + +1. keep condition is required in the query, but you can omit the 'keep' string and given a `long` number which defaults to 'keep=long number' condition. +2. IgnoreNull defaults to true. +3. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. +4. Each device is grouped separately when used with `ALIGN BY DEVICE`. +5. Currently `GROUP BY CONDITION` is not supported with `GROUP BY LEVEL`. + +For the following raw data, several query examples are given below: + +``` ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +| Time|root.sg.beijing.car01.soc|root.sg.beijing.car01.charging_status|root.sg.beijing.car01.vehicle_status| ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 14.0| 1| 1| +|1970-01-01T08:00:00.002+08:00| 16.0| 1| 1| +|1970-01-01T08:00:00.003+08:00| 16.0| 0| 1| +|1970-01-01T08:00:00.004+08:00| 16.0| 0| 1| +|1970-01-01T08:00:00.005+08:00| 18.0| 1| 1| +|1970-01-01T08:00:00.006+08:00| 24.0| 1| 1| +|1970-01-01T08:00:00.007+08:00| 36.0| 1| 1| +|1970-01-01T08:00:00.008+08:00| 36.0| null| 1| +|1970-01-01T08:00:00.009+08:00| 45.0| 1| 1| +|1970-01-01T08:00:00.010+08:00| 60.0| 1| 1| ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +``` + +The sql statement to query data with at least two continuous row shown below: + +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=true) +``` + +Get the result below: + +``` ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| +|1970-01-01T08:00:00.005+08:00| 10| 5| 60.0| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +``` + +When ignoreNull is false, the null value will be treated as a row that doesn't meet the condition. + +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=false) +``` + +Get the result below, the original group is split. 
+ +``` ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| +|1970-01-01T08:00:00.005+08:00| 7| 3| 36.0| +|1970-01-01T08:00:00.009+08:00| 10| 2| 60.0| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +``` + +#### Aggregation By Session + +`GROUP BY SESSION` can be used to group data according to the interval of the time. Data with a time interval less than or equal to the given threshold will be assigned to the same group. +For example, in industrial scenarios, devices don't always run continuously, `GROUP BY SESSION` will group the data generated by each access session of the device. +Its syntax is defined as follows: + +```sql +group by session(timeInterval) +``` + +* timeInterval + +A given interval threshold to create a new group of data when the difference between the time of data is greater than the threshold. + +The figure below is a grouping diagram under `GROUP BY SESSION`. + +groupBySession + +##### Precautions for Use + +1. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. +2. Each device is grouped separately when used with `ALIGN BY DEVICE`. +3. Currently `GROUP BY SESSION` is not supported with `GROUP BY LEVEL`. 
+ +For the raw data below, a few query examples are given: + +``` ++-----------------------------+-----------------+-----------+--------+------+ +| Time| Device|temperature|hardware|status| ++-----------------------------+-----------------+-----------+--------+------+ +|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01| 35.7| 11| false| +|1970-01-01T08:00:02.000+08:00|root.ln.wf02.wt01| 35.8| 22| true| +|1970-01-01T08:00:03.000+08:00|root.ln.wf02.wt01| 35.4| 33| false| +|1970-01-01T08:00:04.000+08:00|root.ln.wf02.wt01| 36.4| 44| false| +|1970-01-01T08:00:05.000+08:00|root.ln.wf02.wt01| 36.8| 55| false| +|1970-01-01T08:00:10.000+08:00|root.ln.wf02.wt01| 36.8| 110| false| +|1970-01-01T08:00:20.000+08:00|root.ln.wf02.wt01| 37.8| 220| true| +|1970-01-01T08:00:30.000+08:00|root.ln.wf02.wt01| 37.5| 330| false| +|1970-01-01T08:00:40.000+08:00|root.ln.wf02.wt01| 37.4| 440| false| +|1970-01-01T08:00:50.000+08:00|root.ln.wf02.wt01| 37.9| 550| false| +|1970-01-01T08:01:40.000+08:00|root.ln.wf02.wt01| 38.0| 110| false| +|1970-01-01T08:02:30.000+08:00|root.ln.wf02.wt01| 38.8| 220| true| +|1970-01-01T08:03:20.000+08:00|root.ln.wf02.wt01| 38.6| 330| false| +|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01| 38.4| 440| false| +|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01| 38.3| 550| false| +|1970-01-01T08:06:40.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-01T08:07:50.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-01T08:08:00.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01| 38.2| 110| false| +|1970-01-02T08:08:02.000+08:00|root.ln.wf02.wt01| 37.5| 220| true| +|1970-01-02T08:08:03.000+08:00|root.ln.wf02.wt01| 37.4| 330| false| +|1970-01-02T08:08:04.000+08:00|root.ln.wf02.wt01| 36.8| 440| false| +|1970-01-02T08:08:05.000+08:00|root.ln.wf02.wt01| 37.4| 550| false| ++-----------------------------+-----------------+-----------+--------+------+ +``` + +TimeInterval can be set by different time units, the sql is shown below: + +```sql +select __endTime,count(*) from root.** group by session(1d) +``` + +Get the result: + +``` ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +| Time| __endTime|count(root.ln.wf02.wt01.temperature)|count(root.ln.wf02.wt01.hardware)|count(root.ln.wf02.wt01.status)| ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +|1970-01-01T08:00:01.000+08:00|1970-01-01T08:08:00.000+08:00| 15| 18| 15| +|1970-01-02T08:08:01.000+08:00|1970-01-02T08:08:05.000+08:00| 5| 5| 5| ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +``` + +It can be also used with `HAVING` and `ALIGN BY DEVICE` clauses. 
+ +```sql +select __endTime,sum(hardware) from root.ln.wf02.wt01 group by session(50s) having sum(hardware)>0 align by device +``` + +Get the result below: + +``` ++-----------------------------+-----------------+-----------------------------+-------------+ +| Time| Device| __endTime|sum(hardware)| ++-----------------------------+-----------------+-----------------------------+-------------+ +|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01|1970-01-01T08:03:20.000+08:00| 2475.0| +|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:04:20.000+08:00| 440.0| +|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:05:20.000+08:00| 550.0| +|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01|1970-01-02T08:08:05.000+08:00| 1650.0| ++-----------------------------+-----------------+-----------------------------+-------------+ +``` + +#### Aggregation By Count + +`GROUP BY COUNT`can aggregate the data points according to the number of points. It can group fixed number of continuous data points together for aggregation query. +Its syntax is defined as follows: + +```sql +group by count(controlExpression, size[,ignoreNull=true/false]) +``` + +* controlExpression + +The object to count during processing, it can be any column or an expression of columns. + +* size + +The number of data points in a group, a number of `size` continuous points will be divided to the same group. + +* ignoreNull=true/false + +Whether to ignore the data points with null in `controlExpression`, when ignoreNull is true, data points with the `controlExpression` of null will be skipped during counting. + +##### Precautions for Use + +1. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. +2. Each device is grouped separately when used with `ALIGN BY DEVICE`. +3. Currently `GROUP BY SESSION` is not supported with `GROUP BY LEVEL`. +4. When the final number of data points in a group is less than `size`, the result of the group will not be output. + +For the data below, some examples will be given. + +``` ++-----------------------------+-----------+-----------------------+ +| Time|root.sg.soc|root.sg.charging_status| ++-----------------------------+-----------+-----------------------+ +|1970-01-01T08:00:00.001+08:00| 14.0| 1| +|1970-01-01T08:00:00.002+08:00| 16.0| 1| +|1970-01-01T08:00:00.003+08:00| 16.0| 0| +|1970-01-01T08:00:00.004+08:00| 16.0| 0| +|1970-01-01T08:00:00.005+08:00| 18.0| 1| +|1970-01-01T08:00:00.006+08:00| 24.0| 1| +|1970-01-01T08:00:00.007+08:00| 36.0| 1| +|1970-01-01T08:00:00.008+08:00| 36.0| null| +|1970-01-01T08:00:00.009+08:00| 45.0| 1| +|1970-01-01T08:00:00.010+08:00| 60.0| 1| ++-----------------------------+-----------+-----------------------+ +``` + +The sql is shown below + +```sql +select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5) +``` + +Get the result below, in the second group from 1970-01-01T08:00:00.006+08:00 to 1970-01-01T08:00:00.010+08:00. There are only four points included which is less than `size`. So it won't be output. 
+ +``` ++-----------------------------+-----------------------------+--------------------------------------+ +| Time| __endTime|first_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00| 14.0| ++-----------------------------+-----------------------------+--------------------------------------+ +``` + +When `ignoreNull=false` is used to take null value into account. There will be two groups with 5 points in the resultSet, which is shown as follows: + +```sql +select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5,ignoreNull=false) +``` + +Get the results: + +``` ++-----------------------------+-----------------------------+--------------------------------------+ +| Time| __endTime|first_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00| 14.0| +|1970-01-01T08:00:00.006+08:00|1970-01-01T08:00:00.010+08:00| 24.0| ++-----------------------------+-----------------------------+--------------------------------------+ +``` + +### Aggregate By Group + +#### Aggregation By Level + +Aggregation by level statement is used to group the query result whose name is the same at the given level. + +- Keyword `LEVEL` is used to specify the level that need to be grouped. By convention, `level=0` represents *root* level. +- All aggregation functions are supported. When using five aggregations: sum, avg, min_value, max_value and extreme, please make sure all the aggregated series have exactly the same data type. Otherwise, it will generate a syntax error. + +**Example 1:** there are multiple series named `status` under different databases, like "root.ln.wf01.wt01.status", "root.ln.wf02.wt02.status", and "root.sgcc.wf03.wt01.status". If you need to count the number of data points of the `status` sequence under different databases, use the following query: + +```sql +select count(status) from root.** group by level = 1 +``` + +Result: + +``` ++-------------------------+---------------------------+ +|count(root.ln.*.*.status)|count(root.sgcc.*.*.status)| ++-------------------------+---------------------------+ +| 20160| 10080| ++-------------------------+---------------------------+ +Total line number = 1 +It costs 0.003s +``` + +**Example 2:** If you need to count the number of data points under different devices, you can specify level = 3, + +```sql +select count(status) from root.** group by level = 3 +``` + +Result: + +``` ++---------------------------+---------------------------+ +|count(root.*.*.wt01.status)|count(root.*.*.wt02.status)| ++---------------------------+---------------------------+ +| 20160| 10080| ++---------------------------+---------------------------+ +Total line number = 1 +It costs 0.003s +``` + +**Example 3:** Attention,the devices named `wt01` under databases `ln` and `sgcc` are grouped together, since they are regarded as devices with the same name. 
If you need to further count the number of data points in different devices under different databases, you can use the following query: + +```sql +select count(status) from root.** group by level = 1, 3 +``` + +Result: + +``` ++----------------------------+----------------------------+------------------------------+ +|count(root.ln.*.wt01.status)|count(root.ln.*.wt02.status)|count(root.sgcc.*.wt01.status)| ++----------------------------+----------------------------+------------------------------+ +| 10080| 10080| 10080| ++----------------------------+----------------------------+------------------------------+ +Total line number = 1 +It costs 0.003s +``` + +**Example 4:** Assuming that you want to query the maximum value of temperature sensor under all time series, you can use the following query statement: + +```sql +select max_value(temperature) from root.** group by level = 0 +``` + +Result: + +``` ++---------------------------------+ +|max_value(root.*.*.*.temperature)| ++---------------------------------+ +| 26.0| ++---------------------------------+ +Total line number = 1 +It costs 0.013s +``` + +**Example 5:** The above queries are for a certain sensor. In particular, **if you want to query the total data points owned by all sensors at a certain level**, you need to explicitly specify `*` is selected. + +```sql +select count(*) from root.ln.** group by level = 2 +``` + +Result: + +``` ++----------------------+----------------------+ +|count(root.*.wf01.*.*)|count(root.*.wf02.*.*)| ++----------------------+----------------------+ +| 20160| 20160| ++----------------------+----------------------+ +Total line number = 1 +It costs 0.013s +``` + +##### Aggregate By Time with Level Clause + +Level could be defined to show count the number of points of each node at the given level in current Metadata Tree. + +This could be used to query the number of points under each device. + +The SQL statement is: + +Get time aggregation by level. + +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d), level=1; +``` + +Result: + +``` ++-----------------------------+-------------------------+ +| Time|COUNT(root.ln.*.*.status)| ++-----------------------------+-------------------------+ +|2017-11-02T00:00:00.000+08:00| 1440| +|2017-11-03T00:00:00.000+08:00| 1440| +|2017-11-04T00:00:00.000+08:00| 1440| +|2017-11-05T00:00:00.000+08:00| 1440| +|2017-11-06T00:00:00.000+08:00| 1440| +|2017-11-07T00:00:00.000+08:00| 1440| +|2017-11-07T23:00:00.000+08:00| 1380| ++-----------------------------+-------------------------+ +Total line number = 7 +It costs 0.006s +``` + +Time aggregation with sliding step and by level. + +```sql +select count(status) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d), level=1; +``` + +Result: + +``` ++-----------------------------+-------------------------+ +| Time|COUNT(root.ln.*.*.status)| ++-----------------------------+-------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| +|2017-11-02T00:00:00.000+08:00| 180| +|2017-11-03T00:00:00.000+08:00| 180| +|2017-11-04T00:00:00.000+08:00| 180| +|2017-11-05T00:00:00.000+08:00| 180| +|2017-11-06T00:00:00.000+08:00| 180| +|2017-11-07T00:00:00.000+08:00| 180| ++-----------------------------+-------------------------+ +Total line number = 7 +It costs 0.004s +``` + +#### Aggregation By Tags + +IotDB allows you to do aggregation query with the tags defined in timeseries through `GROUP BY TAGS` clause as well. 
+
+Firstly, we can put these example data into IoTDB; they will be used in the following feature introduction.
+
+These are the temperature data of the workshops, which belong to the factory `factory1` and are located in different cities. The time range is `[1000, 10000)`.
+
+The device node of the timeseries path is the ID of the device. The information of city and workshop is modelled in the tags `city` and `workshop`.
+The devices `d1` and `d2` belong to the workshop `w1` in `Beijing`.
+`d3` and `d4` belong to the workshop `w2` in `Beijing`.
+`d5` and `d6` belong to the workshop `w1` in `Shanghai`.
+`d7` belongs to the workshop `w2` in `Shanghai`.
+`d8` and `d9` are under maintenance, and don't belong to any workshops, so they have no tags.
+
+
+```SQL
+CREATE DATABASE root.factory1;
+create timeseries root.factory1.d1.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1);
+create timeseries root.factory1.d2.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1);
+create timeseries root.factory1.d3.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2);
+create timeseries root.factory1.d4.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2);
+create timeseries root.factory1.d5.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1);
+create timeseries root.factory1.d6.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1);
+create timeseries root.factory1.d7.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w2);
+create timeseries root.factory1.d8.temperature with datatype=FLOAT;
+create timeseries root.factory1.d9.temperature with datatype=FLOAT;
+
+insert into root.factory1.d1(time, temperature) values(1000, 104.0);
+insert into root.factory1.d1(time, temperature) values(3000, 104.2);
+insert into root.factory1.d1(time, temperature) values(5000, 103.3);
+insert into root.factory1.d1(time, temperature) values(7000, 104.1);
+
+insert into root.factory1.d2(time, temperature) values(1000, 104.4);
+insert into root.factory1.d2(time, temperature) values(3000, 103.7);
+insert into root.factory1.d2(time, temperature) values(5000, 103.3);
+insert into root.factory1.d2(time, temperature) values(7000, 102.9);
+
+insert into root.factory1.d3(time, temperature) values(1000, 103.9);
+insert into root.factory1.d3(time, temperature) values(3000, 103.8);
+insert into root.factory1.d3(time, temperature) values(5000, 102.7);
+insert into root.factory1.d3(time, temperature) values(7000, 106.9);
+
+insert into root.factory1.d4(time, temperature) values(1000, 103.9);
+insert into root.factory1.d4(time, temperature) values(5000, 102.7);
+insert into root.factory1.d4(time, temperature) values(7000, 106.9);
+
+insert into root.factory1.d5(time, temperature) values(1000, 112.9);
+insert into root.factory1.d5(time, temperature) values(7000, 113.0);
+
+insert into root.factory1.d6(time, temperature) values(1000, 113.9);
+insert into root.factory1.d6(time, temperature) values(3000, 113.3);
+insert into root.factory1.d6(time, temperature) values(5000, 112.7);
+insert into root.factory1.d6(time, temperature) values(7000, 112.3);
+
+insert into root.factory1.d7(time, temperature) values(1000, 101.2);
+insert into root.factory1.d7(time, temperature) values(3000, 99.3);
+insert into root.factory1.d7(time, temperature) values(5000, 100.1);
+insert into root.factory1.d7(time, temperature) values(7000, 99.8);
+
+insert into root.factory1.d8(time, temperature) values(1000, 50.0);
+insert into root.factory1.d8(time, temperature) values(3000, 52.1);
+insert
into root.factory1.d8(time, temperature) values(5000, 50.1); +insert into root.factory1.d8(time, temperature) values(7000, 50.5); + +insert into root.factory1.d9(time, temperature) values(1000, 50.3); +insert into root.factory1.d9(time, temperature) values(3000, 52.1); +``` + +##### Aggregation query by one single tag + +If the user wants to know the average temperature of each workshop, he can query like this + +```SQL +SELECT AVG(temperature) FROM root.factory1.** GROUP BY TAGS(city); +``` + +The query will calculate the average of the temperatures of those timeseries which have the same tag value of the key `city`. +The results are + +``` ++--------+------------------+ +| city| avg(temperature)| ++--------+------------------+ +| Beijing|104.04666697184244| +|Shanghai|107.85000076293946| +| NULL| 50.84999910990397| ++--------+------------------+ +Total line number = 3 +It costs 0.231s +``` + +From the results we can see that the differences between aggregation by tags query and aggregation by time or level query are: + +1. Aggregation query by tags will no longer remove wildcard to raw timeseries, but do the aggregation through the data of multiple timeseries, which have the same tag value. +2. Except for the aggregate result column, the result set contains the key-value column of the grouped tag. The column name is the tag key, and the values in the column are tag values which present in the searched timeseries. + If some searched timeseries doesn't have the grouped tag, a `NULL` value in the key-value column of the grouped tag will be presented, which means the aggregation of all the timeseries lacking the tagged key. + +##### Aggregation query by multiple tags + +Except for the aggregation query by one single tag, aggregation query by multiple tags in a particular order is allowed as well. + +For example, a user wants to know the average temperature of the devices in each workshop. +As the workshop names may be same in different city, it's not correct to aggregated by the tag `workshop` directly. +So the aggregation by the tag `city` should be done first, and then by the tag `workshop`. + +SQL + +```SQL +SELECT avg(temperature) FROM root.factory1.** GROUP BY TAGS(city, workshop); +``` + +The results + +``` ++--------+--------+------------------+ +| city|workshop| avg(temperature)| ++--------+--------+------------------+ +| NULL| NULL| 50.84999910990397| +|Shanghai| w1|113.01666768391927| +| Beijing| w2| 104.4000004359654| +|Shanghai| w2|100.10000038146973| +| Beijing| w1|103.73750019073486| ++--------+--------+------------------+ +Total line number = 5 +It costs 0.027s +``` + +We can see that in a multiple tags aggregation query, the result set will output the key-value columns of all the grouped tag keys, which have the same order with the one in `GROUP BY TAGS`. + +##### Downsampling Aggregation by tags based on Time Window + +Downsampling aggregation by time window is one of the most popular features in a time series database. IoTDB supports to do aggregation query by tags based on time window. + +For example, a user wants to know the average temperature of the devices in each workshop, in every 5 seconds, in the range of time `[1000, 10000)`. 
+ +SQL + +```SQL +SELECT avg(temperature) FROM root.factory1.** GROUP BY ([1000, 10000), 5s), TAGS(city, workshop); +``` + +The results + +``` ++-----------------------------+--------+--------+------------------+ +| Time| city|workshop| avg(temperature)| ++-----------------------------+--------+--------+------------------+ +|1970-01-01T08:00:01.000+08:00| NULL| NULL| 50.91999893188476| +|1970-01-01T08:00:01.000+08:00|Shanghai| w1|113.20000076293945| +|1970-01-01T08:00:01.000+08:00| Beijing| w2| 103.4| +|1970-01-01T08:00:01.000+08:00|Shanghai| w2| 100.1999994913737| +|1970-01-01T08:00:01.000+08:00| Beijing| w1|103.81666692097981| +|1970-01-01T08:00:06.000+08:00| NULL| NULL| 50.5| +|1970-01-01T08:00:06.000+08:00|Shanghai| w1| 112.6500015258789| +|1970-01-01T08:00:06.000+08:00| Beijing| w2| 106.9000015258789| +|1970-01-01T08:00:06.000+08:00|Shanghai| w2| 99.80000305175781| +|1970-01-01T08:00:06.000+08:00| Beijing| w1| 103.5| ++-----------------------------+--------+--------+------------------+ +``` + +Comparing to the pure tag aggregations, this kind of aggregation will divide the data according to the time window specification firstly, and do the aggregation query by the multiple tags in each time window secondly. +The result set will also contain a time column, which have the same meaning with the time column of the result in downsampling aggregation query by time window. + +##### Limitation of Aggregation by Tags + +As this feature is still under development, some queries have not been completed yet and will be supported in the future. + +> 1. Temporarily not support `HAVING` clause to filter the results. +> 2. Temporarily not support ordering by tag values. +> 3. Temporarily not support `LIMIT`,`OFFSET`,`SLIMIT`,`SOFFSET`. +> 4. Temporarily not support `ALIGN BY DEVICE`. +> 5. Temporarily not support expressions as aggregation function parameter,e.g. `count(s+1)`. +> 6. Not support the value filter, which stands the same with the `GROUP BY LEVEL` query. + +## `HAVING` CLAUSE + +If you want to filter the results of aggregate queries, +you can use the `HAVING` clause after the `GROUP BY` clause. + +> NOTE: +> +> 1.The expression in HAVING clause must consist of aggregate values; the original sequence cannot appear alone. +> The following usages are incorrect: +> +> ```sql +> select count(s1) from root.** group by ([1,3),1ms) having sum(s1) > s1 +> select count(s1) from root.** group by ([1,3),1ms) having s1 > 1 +> ``` +> +> 2.When filtering the `GROUP BY LEVEL` result, the PATH in `SELECT` and `HAVING` can only have one node. +> The following usages are incorrect: +> +> ```sql +> select count(s1) from root.** group by ([1,3),1ms), level=1 having sum(d1.s1) > 1 +> select count(d1.s1) from root.** group by ([1,3),1ms), level=1 having sum(s1) > 1 +> ``` + +Here are a few examples of using the 'HAVING' clause to filter aggregate results. 
+ +Aggregation result 1: + +``` ++-----------------------------+---------------------+---------------------+ +| Time|count(root.test.*.s1)|count(root.test.*.s2)| ++-----------------------------+---------------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 4| 4| +|1970-01-01T08:00:00.003+08:00| 1| 0| +|1970-01-01T08:00:00.005+08:00| 2| 4| +|1970-01-01T08:00:00.007+08:00| 3| 2| +|1970-01-01T08:00:00.009+08:00| 4| 4| ++-----------------------------+---------------------+---------------------+ +``` + +Aggregation result filtering query 1: + +```sql + select count(s1) from root.** group by ([1,11),2ms), level=1 having count(s2) > 1 +``` + +Filtering result 1: + +``` ++-----------------------------+---------------------+ +| Time|count(root.test.*.s1)| ++-----------------------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 4| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 4| ++-----------------------------+---------------------+ +``` + +Aggregation result 2: + +``` ++-----------------------------+-------------+---------+---------+ +| Time| Device|count(s1)|count(s2)| ++-----------------------------+-------------+---------+---------+ +|1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| +|1970-01-01T08:00:00.003+08:00|root.test.sg1| 1| 0| +|1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| +|1970-01-01T08:00:00.007+08:00|root.test.sg1| 2| 1| +|1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| +|1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| +|1970-01-01T08:00:00.003+08:00|root.test.sg2| 0| 0| +|1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| +|1970-01-01T08:00:00.007+08:00|root.test.sg2| 1| 1| +|1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| ++-----------------------------+-------------+---------+---------+ +``` + +Aggregation result filtering query 2: + +```sql + select count(s1), count(s2) from root.** group by ([1,11),2ms) having count(s2) > 1 align by device +``` + +Filtering result 2: + +``` ++-----------------------------+-------------+---------+---------+ +| Time| Device|count(s1)|count(s2)| ++-----------------------------+-------------+---------+---------+ +|1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| +|1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| +|1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| +|1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| +|1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| +|1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| ++-----------------------------+-------------+---------+---------+ +``` + +## `FILL` CLAUSE + +### Introduction + +When executing some queries, there may be no data for some columns in some rows, and data in these locations will be null, but this kind of null value is not conducive to data visualization and analysis, and the null value needs to be filled. + +In IoTDB, users can use the FILL clause to specify the fill mode when data is missing. Fill null value allows the user to fill any query result with null values according to a specific method, such as taking the previous value that is not null, or linear interpolation. The query result after filling the null value can better reflect the data distribution, which is beneficial for users to perform data analysis. + +### Syntax Definition + +**The following is the syntax definition of the `FILL` clause:** + +```sql +FILL '(' PREVIOUS | LINEAR | constant ')' +``` + +**Note:** + +- We can specify only one fill method in the `FILL` clause, and this method applies to all columns of the result set. 
+- Null value fill is not compatible with version 0.13 and previous syntax (`FILL(([(, , )?])+)`) is not supported anymore. + +### Fill Methods + +**IoTDB supports the following three fill methods:** + +- `PREVIOUS`: Fill with the previous non-null value of the column. +- `LINEAR`: Fill the column with a linear interpolation of the previous non-null value and the next non-null value of the column. +- Constant: Fill with the specified constant. + +**Following table lists the data types and supported fill methods.** + +| Data Type | Supported Fill Methods | +| :-------- | :---------------------- | +| boolean | previous, value | +| int32 | previous, linear, value | +| int64 | previous, linear, value | +| float | previous, linear, value | +| double | previous, linear, value | +| text | previous, value | + +**Note:** For columns whose data type does not support specifying the fill method, we neither fill it nor throw exception, just keep it as it is. + +**For examples:** + +If we don't use any fill methods: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000; +``` + +the original result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| null| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +#### `PREVIOUS` Fill + +**For null values in the query result set, fill with the previous non-null value of the column.** + +**Note:** If the first value of this column is null, we will keep first value as null and won't fill it until we meet first non-null value + +For example, with `PREVIOUS` fill, the SQL is as follows: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous); +``` + +result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 21.93| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| false| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +**While using `FILL(PREVIOUS)`, you can specify a time interval. 
If the interval between the timestamp of the current null value and the timestamp of the previous non-null value exceeds the specified time interval, no filling will be performed.** + +> 1. In the case of FILL(LINEAR) and FILL(CONSTANT), if the second parameter is specified, an exception will be thrown +> 2. The interval parameter only supports integers + For example, the raw data looks like this: + +```sql +select s1 from root.db.d1 +``` +``` ++-----------------------------+-------------+ +| Time|root.db.d1.s1| ++-----------------------------+-------------+ +|2023-11-08T16:41:50.008+08:00| 1.0| ++-----------------------------+-------------+ +|2023-11-08T16:46:50.011+08:00| 2.0| ++-----------------------------+-------------+ +|2023-11-08T16:48:50.011+08:00| 3.0| ++-----------------------------+-------------+ +``` + +We want to group the data by 1 min time interval: + +```sql +select avg(s1) + from root.db.d1 + group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| null| ++-----------------------------+------------------+ +``` + +After grouping, we want to fill the null value: + +```sql +select avg(s1) + from root.db.d1 + group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) + FILL(PREVIOUS); +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| 3.0| ++-----------------------------+------------------+ +``` + +we also don't want the null value to be filled if it keeps null for 2 min. 
+ +```sql +select avg(s1) +from root.db.d1 +group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) + FILL(PREVIOUS, 2m); +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| 3.0| ++-----------------------------+------------------+ +``` + +#### `LINEAR` Fill + +**For null values in the query result set, fill the column with a linear interpolation of the previous non-null value and the next non-null value of the column.** + +**Note:** + +- If all the values before current value are null or all the values after current value are null, we will keep current value as null and won't fill it. +- If the column's data type is boolean/text, we neither fill it nor throw exception, just keep it as it is. + +Here we give an example of filling null values using the linear method. The SQL statement is as follows: + +For example, with `LINEAR` fill, the SQL is as follows: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(linear); +``` + +result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 22.08| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +#### Constant Fill + +**For null values in the query result set, fill with the specified constant.** + +**Note:** + +- When using the ValueFill, IoTDB neither fill the query result if the data type is different from the input constant nor throw exception, just keep it as it is. 
+ + | Constant Value Data Type | Support Data Type | + | :----------------------- | :-------------------------------------- | + | `BOOLEAN` | `BOOLEAN` `TEXT` | + | `INT64` | `INT32` `INT64` `FLOAT` `DOUBLE` `TEXT` | + | `DOUBLE` | `FLOAT` `DOUBLE` `TEXT` | + | `TEXT` | `TEXT` | + +- If constant value is larger than Integer.MAX_VALUE, IoTDB neither fill the query result if the data type is int32 nor throw exception, just keep it as it is. + +For example, with `FLOAT` constant fill, the SQL is as follows: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(2.0); +``` + +result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 2.0| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +For example, with `BOOLEAN` constant fill, the SQL is as follows: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(true); +``` + +result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| null| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| true| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +## `LIMIT` and `SLIMIT` CLAUSES (PAGINATION) + +When the query result set has a large amount of data, it is not conducive to display on one page. You can use the `LIMIT/SLIMIT` clause and the `OFFSET/SOFFSET` clause to control paging. + +- The `LIMIT` and `SLIMIT` clauses are used to control the number of rows and columns of query results. +- The `OFFSET` and `SOFFSET` clauses are used to control the starting position of the result display. + +### Row Control over Query Results + +By using LIMIT and OFFSET clauses, users control the query results in a row-related manner. We demonstrate how to use LIMIT and OFFSET clauses through the following examples. + +* Example 1: basic LIMIT clause + +The SQL statement is: + +```sql +select status, temperature from root.ln.wf01.wt01 limit 10 +``` + +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". 
The SQL statement requires the first 10 rows of the query result. + +The result is shown below: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:00:00.000+08:00| true| 25.96| +|2017-11-01T00:01:00.000+08:00| true| 24.36| +|2017-11-01T00:02:00.000+08:00| false| 20.09| +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 10 +It costs 0.000s +``` + +* Example 2: LIMIT clause with OFFSET + +The SQL statement is: + +```sql +select status, temperature from root.ln.wf01.wt01 limit 5 offset 3 +``` + +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". The SQL statement requires rows 3 to 7 of the query result be returned (with the first row numbered as row 0). + +The result is shown below: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 5 +It costs 0.342s +``` + +* Example 3: LIMIT clause combined with WHERE clause + +The SQL statement is: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2024-07-07T00:05:00.000 and time< 2024-07-12T00:12:00.000 limit 5 offset 3 +``` + +which means: + +The selected equipment is the ln group wf01 factory wt01 equipment; The selected time series are "state" and "temperature". The SQL statement requires the return of the status and temperature sensor values between the time "2024-07-07T00:05:00.000" and "2024-07-12T00:12:00.0000" on lines 3 to 7 (the first line is numbered as line 0). 
+ +The result is shown below: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2024-07-09T17:32:11.943+08:00| true| 24.941973| +|2024-07-09T17:32:12.944+08:00| true| 20.05108| +|2024-07-09T17:32:13.945+08:00| true| 20.541632| +|2024-07-09T17:32:14.945+08:00| null| 23.09016| +|2024-07-09T17:32:14.946+08:00| true| null| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 5 +It costs 0.070s +``` + +* Example 4: LIMIT clause combined with GROUP BY clause + +The SQL statement is: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) limit 5 offset 3 +``` + +which means: + +The SQL statement clause requires rows 3 to 7 of the query result be returned (with the first row numbered as row 0). + +The result is shown below: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-04T00:00:00.000+08:00| 1440| 26.0| +|2017-11-05T00:00:00.000+08:00| 1440| 26.0| +|2017-11-06T00:00:00.000+08:00| 1440| 25.99| +|2017-11-07T00:00:00.000+08:00| 1380| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 4 +It costs 0.016s +``` + +### Column Control over Query Results + +By using SLIMIT and SOFFSET clauses, users can control the query results in a column-related manner. We will demonstrate how to use SLIMIT and SOFFSET clauses through the following examples. + +* Example 1: basic SLIMIT clause + +The SQL statement is: + +```sql +select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 +``` + +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is the first column under this device, i.e., the power supply status. The SQL statement requires the status sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected. + +The result is shown below: + +``` ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| 20.71| +|2017-11-01T00:07:00.000+08:00| 21.45| +|2017-11-01T00:08:00.000+08:00| 22.58| +|2017-11-01T00:09:00.000+08:00| 20.98| +|2017-11-01T00:10:00.000+08:00| 25.52| +|2017-11-01T00:11:00.000+08:00| 22.91| ++-----------------------------+-----------------------------+ +Total line number = 6 +It costs 0.000s +``` + +* Example 2: SLIMIT clause with SOFFSET + +The SQL statement is: + +```sql +select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 1 +``` + +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is the second column under this device, i.e., the temperature. The SQL statement requires the temperature sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected. 
+ +The result is shown below: + +``` ++-----------------------------+------------------------+ +| Time|root.ln.wf01.wt01.status| ++-----------------------------+------------------------+ +|2017-11-01T00:06:00.000+08:00| false| +|2017-11-01T00:07:00.000+08:00| false| +|2017-11-01T00:08:00.000+08:00| false| +|2017-11-01T00:09:00.000+08:00| false| +|2017-11-01T00:10:00.000+08:00| true| +|2017-11-01T00:11:00.000+08:00| false| ++-----------------------------+------------------------+ +Total line number = 6 +It costs 0.003s +``` + +* Example 3: SLIMIT clause combined with GROUP BY clause + +The SQL statement is: + +```sql +select max_value(*) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) slimit 1 soffset 1 +``` + +The result is shown below: + +``` ++-----------------------------+-----------------------------------+ +| Time|max_value(root.ln.wf01.wt01.status)| ++-----------------------------+-----------------------------------+ +|2017-11-01T00:00:00.000+08:00| true| +|2017-11-02T00:00:00.000+08:00| true| +|2017-11-03T00:00:00.000+08:00| true| +|2017-11-04T00:00:00.000+08:00| true| +|2017-11-05T00:00:00.000+08:00| true| +|2017-11-06T00:00:00.000+08:00| true| +|2017-11-07T00:00:00.000+08:00| true| ++-----------------------------+-----------------------------------+ +Total line number = 7 +It costs 0.000s +``` + +### Row and Column Control over Query Results + +In addition to row or column control over query results, IoTDB allows users to control both rows and columns of query results. Here is a complete example with both LIMIT clauses and SLIMIT clauses. + +The SQL statement is: + +```sql +select * from root.ln.wf01.wt01 limit 10 offset 100 slimit 2 soffset 0 +``` + +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is columns 0 to 1 under this device (with the first column numbered as column 0). The SQL statement clause requires rows 100 to 109 of the query result be returned (with the first row numbered as row 0). + +The result is shown below: + +``` ++-----------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+-----------------------------+------------------------+ +|2017-11-01T01:40:00.000+08:00| 21.19| false| +|2017-11-01T01:41:00.000+08:00| 22.79| false| +|2017-11-01T01:42:00.000+08:00| 22.98| false| +|2017-11-01T01:43:00.000+08:00| 21.52| false| +|2017-11-01T01:44:00.000+08:00| 23.45| true| +|2017-11-01T01:45:00.000+08:00| 24.06| true| +|2017-11-01T01:46:00.000+08:00| 22.6| false| +|2017-11-01T01:47:00.000+08:00| 23.78| true| +|2017-11-01T01:48:00.000+08:00| 24.72| true| +|2017-11-01T01:49:00.000+08:00| 24.68| true| ++-----------------------------+-----------------------------+------------------------+ +Total line number = 10 +It costs 0.009s +``` + +### Error Handling + +If the parameter N/SN of LIMIT/SLIMIT exceeds the size of the result set, IoTDB returns all the results as expected. 
For example, the query result of the original SQL statement consists of six rows, and we select the first 100 rows through the LIMIT clause: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 100 +``` + +The result is shown below: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 6 +It costs 0.005s +``` + +If the parameter N/SN of LIMIT/SLIMIT clause exceeds the allowable maximum value (N/SN is of type int64), the system prompts errors. For example, executing the following SQL statement: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 9223372036854775808 +``` + +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +Msg: 416: Out of range. LIMIT : N should be Int64. +``` + +If the parameter N/SN of LIMIT/SLIMIT clause is not a positive intege, the system prompts errors. For example, executing the following SQL statement: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 13.1 +``` + +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +Msg: 401: line 1:129 mismatched input '.' expecting {, ';'} +``` + +If the parameter OFFSET of LIMIT clause exceeds the size of the result set, IoTDB will return an empty result set. For example, executing the following SQL statement: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 2 offset 6 +``` + +The result is shown below: + +``` ++----+------------------------+-----------------------------+ +|Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++----+------------------------+-----------------------------+ ++----+------------------------+-----------------------------+ +Empty set. +It costs 0.005s +``` + +If the parameter SOFFSET of SLIMIT clause is not smaller than the number of available timeseries, the system prompts errors. For example, executing the following SQL statement: + +```sql +select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 2 +``` + +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +Msg: 411: Meet error in query process: The value of SOFFSET (2) is equal to or exceeds the number of sequences (2) that can actually be returned. +``` + +## `ORDER BY` CLAUSE + +### Order by in ALIGN BY TIME mode + +The result set of IoTDB is in ALIGN BY TIME mode by default and `ORDER BY TIME` clause can also be used to specify the ordering of timestamp. 
The SQL statement is: + +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time desc; +``` + +Results: + +``` ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +|2017-11-01T00:01:00.000+08:00| v2| true| 24.36| true| +|2017-11-01T00:00:00.000+08:00| v2| true| 25.96| true| +|1970-01-01T08:00:00.002+08:00| v2| false| null| null| +|1970-01-01T08:00:00.001+08:00| v1| true| null| null| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +``` + +### Order by in ALIGN BY DEVICE mode + +When querying in ALIGN BY DEVICE mode, `ORDER BY` clause can be used to specify the ordering of result set. + +ALIGN BY DEVICE mode supports four kinds of clauses with two sort keys which are `Device` and `Time`. + +1. ``ORDER BY DEVICE``: sort by the alphabetical order of the device name. The devices with the same column names will be clustered in a group view. + +2. ``ORDER BY TIME``: sort by the timestamp, the data points from different devices will be shuffled according to the timestamp. + +3. ``ORDER BY DEVICE,TIME``: sort by the alphabetical order of the device name. The data points with the same device name will be sorted by timestamp. + +4. ``ORDER BY TIME,DEVICE``: sort by timestamp. The data points with the same time will be sorted by the alphabetical order of the device name. + +> To make the result set more legible, when `ORDER BY` clause is not used, default settings will be provided. +> The default ordering clause is `ORDER BY DEVICE,TIME` and the default ordering is `ASC`. + +When `Device` is the main sort key, the result set is sorted by device name first, then by timestamp in the group with the same device name, the SQL statement is: + +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by device desc,time asc align by device; +``` + +The result shows below: + +``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| ++-----------------------------+-----------------+--------+------+-----------+ +``` + +When `Time` is the main sort key, the result set is sorted by timestamp first, then by device name in data points with the same timestamp. 
The SQL statement is: + +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time asc,device desc align by device; +``` + +The result shows below: + +``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| ++-----------------------------+-----------------+--------+------+-----------+ +``` + +When `ORDER BY` clause is not used, sort in default way, the SQL statement is: + +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` + +The result below indicates `ORDER BY DEVICE ASC,TIME ASC` is the clause in default situation. +`ASC` can be omitted because it's the default ordering. + +``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| ++-----------------------------+-----------------+--------+------+-----------+ +``` + +Besides,`ALIGN BY DEVICE` and `ORDER BY` clauses can be used with aggregate query,the SQL statement is: + +```sql +select count(*) from root.ln.** group by ((2017-11-01T00:00:00.000+08:00,2017-11-01T00:03:00.000+08:00],1m) order by device asc,time asc align by device +``` + +The result shows below: + +``` ++-----------------------------+-----------------+---------------+-------------+------------------+ +| Time| Device|count(hardware)|count(status)|count(temperature)| ++-----------------------------+-----------------+---------------+-------------+------------------+ +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| 1| 1| +|2017-11-01T00:02:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| +|2017-11-01T00:03:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| 1| 1| null| +|2017-11-01T00:02:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| +|2017-11-01T00:03:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| ++-----------------------------+-----------------+---------------+-------------+------------------+ +``` + +### Order by arbitrary expressions + +In addition to the predefined keywords "Time" and "Device" in IoTDB, `ORDER BY` can also be used to sort by any expressions. + +When sorting, `ASC` or `DESC` can be used to specify the sorting order, and `NULLS` syntax is supported to specify the priority of NULL values in the sorting. By default, `NULLS FIRST` places NULL values at the top of the result, and `NULLS LAST` ensures that NULL values appear at the end of the result. If not specified in the clause, the default order is ASC with NULLS LAST. 
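+
+For instance, the sort key can be an arithmetic expression combined with an explicit ordering and NULL priority. The statement below is only an illustrative sketch; it assumes the `base` and `bonus` series of the example data shown next:
+
+```Sql
+-- sort by the sum of two series in descending order, placing rows whose sort key is NULL first
+select base, bonus from root.** order by base + bonus desc NULLS FIRST align by device
+```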
+ +Here are several examples of queries for sorting arbitrary expressions using the following data: + +``` ++-----------------------------+-------------+-------+-------+--------+-------+ +| Time| Device| base| score| bonus| total| ++-----------------------------+-------------+-------+-------+--------+-------+ +|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0| 107.0| +|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0| 105.0| +|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0| 103.0| +|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| +|1970-01-01T08:00:00.010+08:00| root.three| 9| null| 24.0| 33.0| +|1970-01-01T08:00:00.020+08:00| root.three| 8| null| 22.5| 30.5| +|1970-01-01T08:00:00.030+08:00| root.three| 7| null| 23.5| 30.5| +|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| +|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| +|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0| 104.0| +|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0| 102.0| ++-----------------------------+-------------+-------+-------+--------+-------+ +``` + +When you need to sort the results based on the base score score, you can use the following SQL: + +```Sql +select score from root.** order by score desc align by device +``` + +This will give you the following results: + +``` ++-----------------------------+---------+-----+ +| Time| Device|score| ++-----------------------------+---------+-----+ +|1970-01-01T08:00:00.040+08:00|root.five| 54.0| +|1970-01-01T08:00:00.030+08:00|root.five| 53.0| +|1970-01-01T08:00:00.000+08:00| root.one| 50.0| +|1970-01-02T08:00:00.000+08:00| root.one| 50.0| +|1970-01-03T08:00:00.000+08:00| root.one| 50.0| +|1970-01-01T08:00:00.000+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00| root.two| 10.0| ++-----------------------------+---------+-----+ +``` + +If you want to sort the results based on the total score, you can use an expression in the `ORDER BY` clause to perform the calculation: + +```Sql +select score,total from root.one order by base+score+bonus desc +``` + +This SQL is equivalent to: + +```Sql +select score,total from root.one order by total desc +``` + +Here are the results: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.one.score|root.one.total| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.000+08:00| 50.0| 107.0| +|1970-01-02T08:00:00.000+08:00| 50.0| 105.0| +|1970-01-03T08:00:00.000+08:00| 50.0| 103.0| ++-----------------------------+--------------+--------------+ +``` + +If you want to sort the results based on the total score and, in case of tied scores, sort by score, base, bonus, and submission time in descending order, you can specify multiple layers of sorting using multiple expressions: + +```Sql +select base, score, bonus, total from root.** order by total desc NULLS Last, + score desc NULLS Last, + bonus desc NULLS Last, + time desc align by device +``` + +Here are the results: + +``` ++-----------------------------+----------+----+-----+-----+-----+ +| Time| Device|base|score|bonus|total| ++-----------------------------+----------+----+-----+-----+-----+ +|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0|107.0| +|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0|105.0| 
+|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0|104.0| +|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0|103.0| +|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0|102.0| +|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| +|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| +|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.000+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| +|1970-01-01T08:00:00.010+08:00|root.three| 9| null| 24.0| 33.0| +|1970-01-01T08:00:00.030+08:00|root.three| 7| null| 23.5| 30.5| +|1970-01-01T08:00:00.020+08:00|root.three| 8| null| 22.5| 30.5| ++-----------------------------+----------+----+-----+-----+-----+ +``` + +In the `ORDER BY` clause, you can also use aggregate query expressions. For example: + +```Sql +select min_value(total) from root.** order by min_value(total) asc align by device +``` + +This will give you the following results: + +``` ++----------+----------------+ +| Device|min_value(total)| ++----------+----------------+ +|root.three| 30.5| +| root.two| 33.0| +| root.four| 85.0| +| root.five| 102.0| +| root.one| 103.0| ++----------+----------------+ +``` + +When specifying multiple columns in the query, the unsorted columns will change order along with the rows and sorted columns. The order of rows when the sorting columns are the same may vary depending on the specific implementation (no fixed order). For example: + +```Sql +select min_value(total),max_value(base) from root.** order by max_value(total) desc align by device +``` + +This will give you the following results: +· + +``` ++----------+----------------+---------------+ +| Device|min_value(total)|max_value(base)| ++----------+----------------+---------------+ +| root.one| 103.0| 12| +| root.five| 102.0| 7| +| root.four| 85.0| 9| +| root.two| 33.0| 9| +|root.three| 30.5| 9| ++----------+----------------+---------------+ +``` + +You can use both `ORDER BY DEVICE,TIME` and `ORDER BY EXPRESSION` together. For example: + +```Sql +select score from root.** order by device asc, score desc, time asc align by device +``` + +This will give you the following results: + +``` ++-----------------------------+---------+-----+ +| Time| Device|score| ++-----------------------------+---------+-----+ +|1970-01-01T08:00:00.040+08:00|root.five| 54.0| +|1970-01-01T08:00:00.030+08:00|root.five| 53.0| +|1970-01-01T08:00:00.010+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00|root.four| 32.0| +|1970-01-01T08:00:00.000+08:00| root.one| 50.0| +|1970-01-02T08:00:00.000+08:00| root.one| 50.0| +|1970-01-03T08:00:00.000+08:00| root.one| 50.0| +|1970-01-01T08:00:00.000+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00| root.two| 50.0| +|1970-01-01T08:00:00.020+08:00| root.two| 10.0| ++-----------------------------+---------+-----+ +``` + +## `ALIGN BY` CLAUSE + +In addition, IoTDB supports another result set format: `ALIGN BY DEVICE`. + +### Align by Device + +The `ALIGN BY DEVICE` indicates that the deviceId is considered as a column. Therefore, there are totally limited columns in the dataset. + +> NOTE: +> +> 1.You can see the result of 'align by device' as one relational table, `Time + Device` is the primary key of this Table. +> +> 2.The result is order by `Device` firstly, and then by `Time` order. 
+ +The SQL statement is: + +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` + +The result shows below: + +``` ++-----------------------------+-----------------+-----------+------+--------+ +| Time| Device|temperature|status|hardware| ++-----------------------------+-----------------+-----------+------+--------+ +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| 25.96| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| 24.36| true| null| +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| null| true| v1| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| null| false| v2| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| null| true| v2| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| null| true| v2| ++-----------------------------+-----------------+-----------+------+--------+ +Total line number = 6 +It costs 0.012s +``` + +### Ordering in ALIGN BY DEVICE + +ALIGN BY DEVICE mode arranges according to the device first, and sort each device in ascending order according to the timestamp. The ordering and priority can be adjusted through `ORDER BY` clause. + +## `INTO` CLAUSE (QUERY WRITE-BACK) + +The `SELECT INTO` statement copies data from query result set into target time series. + +The application scenarios are as follows: + +- **Implement IoTDB internal ETL**: ETL the original data and write a new time series. +- **Query result storage**: Persistently store the query results, which acts like a materialized view. +- **Non-aligned time series to aligned time series**: Rewrite non-aligned time series into another aligned time series. + +### SQL Syntax + +#### Syntax Definition + +**The following is the syntax definition of the `select` statement:** + +```sql +selectIntoStatement +: SELECT + resultColumn [, resultColumn] ... + INTO intoItem [, intoItem] ... + FROM prefixPath [, prefixPath] ... + [WHERE whereCondition] + [GROUP BY groupByTimeClause, groupByLevelClause] + [FILL {PREVIOUS | LINEAR | constant}] + [LIMIT rowLimit OFFSET rowOffset] + [ALIGN BY DEVICE] +; + +intoItem +: [ALIGNED] intoDevicePath '(' intoMeasurementName [',' intoMeasurementName]* ')' + ; +``` + +#### `INTO` Clause + +The `INTO` clause consists of several `intoItem`. + +Each `intoItem` consists of a target device and a list of target measurements (similar to the `INTO` clause in an `INSERT` statement). + +Each target measurement and device form a target time series, and an `intoItem` contains a series of time series. For example: `root.sg_copy.d1(s1, s2)` specifies two target time series `root.sg_copy.d1.s1` and `root.sg_copy.d1.s2`. + +The target time series specified by the `INTO` clause must correspond one-to-one with the columns of the query result set. The specific rules are as follows: + +- **Align by time** (default): The number of target time series contained in all `intoItem` must be consistent with the number of columns in the query result set (except the time column) and correspond one-to-one in the order from left to right in the header. +- **Align by device** (using `ALIGN BY DEVICE`): the number of target devices specified in all `intoItem` is the same as the number of devices queried (i.e., the number of devices matched by the path pattern in the `FROM` clause), and One-to-one correspondence according to the output order of the result set device. +
The number of measurements specified for each target device should be consistent with the number of columns in the query result set (except for the time and device columns). It should be in one-to-one correspondence from left to right in the header. + +For examples: + +- **Example 1** (aligned by time) + +```shell +IoTDB> select s1, s2 into root.sg_copy.d1(t1), root.sg_copy.d2(t1, t2), root.sg_copy.d1(t2) from root.sg.d1, root.sg.d2; ++--------------+-------------------+--------+ +| source column| target timeseries| written| ++--------------+-------------------+--------+ +| root.sg.d1.s1| root.sg_copy.d1.t1| 8000| ++--------------+-------------------+--------+ +| root.sg.d2.s1| root.sg_copy.d2.t1| 10000| ++--------------+-------------------+--------+ +| root.sg.d1.s2| root.sg_copy.d2.t2| 12000| ++--------------+-------------------+--------+ +| root.sg.d2.s2| root.sg_copy.d1.t2| 10000| ++--------------+-------------------+--------+ +Total line number = 4 +It costs 0.725s +``` + +This statement writes the query results of the four time series under the `root.sg` database to the four specified time series under the `root.sg_copy` database. Note that `root.sg_copy.d2(t1, t2)` can also be written as `root.sg_copy.d2(t1), root.sg_copy.d2(t2)`. + +We can see that the writing of the `INTO` clause is very flexible as long as the combined target time series is not repeated and corresponds to the query result column one-to-one. + +> In the result set displayed by `CLI`, the meaning of each column is as follows: +> +> - The `source column` column represents the column name of the query result. +> - `target timeseries` represents the target time series for the corresponding column to write. +> - `written` indicates the amount of data expected to be written. + + +- **Example 2** (aligned by time) + +```shell +IoTDB> select count(s1 + s2), last_value(s2) into root.agg.count(s1_add_s2), root.agg.last_value(s2) from root.sg.d1 group by ([0, 100), 10ms); ++--------------------------------------+-------------------------+--------+ +| source column| target timeseries| written| ++--------------------------------------+-------------------------+--------+ +| count(root.sg.d1.s1 + root.sg.d1.s2)| root.agg.count.s1_add_s2| 10| ++--------------------------------------+-------------------------+--------+ +| last_value(root.sg.d1.s2)| root.agg.last_value.s2| 10| ++--------------------------------------+-------------------------+--------+ +Total line number = 2 +It costs 0.375s +``` + +This statement stores the results of an aggregated query into the specified time series. 
+ +- **Example 3** (aligned by device) + +```shell +IoTDB> select s1, s2 into root.sg_copy.d1(t1, t2), root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; ++--------------+--------------+-------------------+--------+ +| source device| source column| target timeseries| written| ++--------------+--------------+-------------------+--------+ +| root.sg.d1| s1| root.sg_copy.d1.t1| 8000| ++--------------+--------------+-------------------+--------+ +| root.sg.d1| s2| root.sg_copy.d1.t2| 11000| ++--------------+--------------+-------------------+--------+ +| root.sg.d2| s1| root.sg_copy.d2.t1| 12000| ++--------------+--------------+-------------------+--------+ +| root.sg.d2| s2| root.sg_copy.d2.t2| 9000| ++--------------+--------------+-------------------+--------+ +Total line number = 4 +It costs 0.625s +``` + +This statement also writes the query results of the four time series under the `root.sg` database to the four specified time series under the `root.sg_copy` database. However, in ALIGN BY DEVICE, the number of `intoItem` must be the same as the number of queried devices, and each queried device corresponds to one `intoItem`. + +> When aligning the query by device, the result set displayed by `CLI` has one more column, the `source device` column indicating the queried device. + +- **Example 4** (aligned by device) + +```shell +IoTDB> select s1 + s2 into root.expr.add(d1s1_d1s2), root.expr.add(d2s1_d2s2) from root.sg.d1, root.sg.d2 align by device; ++--------------+--------------+------------------------+--------+ +| source device| source column| target timeseries| written| ++--------------+--------------+------------------------+--------+ +| root.sg.d1| s1 + s2| root.expr.add.d1s1_d1s2| 10000| ++--------------+--------------+------------------------+--------+ +| root.sg.d2| s1 + s2| root.expr.add.d2s1_d2s2| 10000| ++--------------+--------------+------------------------+--------+ +Total line number = 2 +It costs 0.532s +``` + +This statement stores the result of evaluating an expression into the specified time series. + +#### Using variable placeholders + +In particular, We can use variable placeholders to describe the correspondence between the target and query time series, simplifying the statement. The following two variable placeholders are currently supported: + +- Suffix duplication character `::`: Copy the suffix (or measurement) of the query device, indicating that from this layer to the last layer (or measurement) of the device, the node name (or measurement) of the target device corresponds to the queried device The node name (or measurement) is the same. +- Single-level node matcher `${i}`: Indicates that the current level node name of the target sequence is the same as the i-th level node name of the query sequence. For example, for the path `root.sg1.d1.s1`, `${1}` means `sg1`, `${2}` means `d1`, and `${3}` means `s1`. + +When using variable placeholders, there must be no ambiguity in the correspondence between `intoItem` and the columns of the query result set. The specific cases are classified as follows: + +##### ALIGN BY TIME (default) + +> Note: The variable placeholder **can only describe the correspondence between time series**. If the query includes aggregation and expression calculation, the columns in the query result cannot correspond to a time series, so neither the target device nor the measurement can use variable placeholders. 
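+
+Before going through the individual cases, here is a quick end-to-end illustration (a hedged sketch; `root.backup` is a hypothetical target path) of the two placeholders under the default align-by-time mode:
+
+```sql
+-- `${2}` expands to the second-level node of each query series (here `d1`),
+-- and `::` in the measurement position copies the query measurement name,
+-- so root.sg.d1.s1 is written to root.backup.d1.s1, root.sg.d1.s2 to root.backup.d1.s2, and so on.
+select * into root.backup.${2}(::) from root.sg.d1;
+```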
+
+###### (1) The target device does not use variable placeholders & the target measurement list uses variable placeholders
+
+**Limitations:**
+
+1. In each `intoItem`, the target measurement list must have a length of 1.
(If the length can be greater than 1, e.g. `root.sg1.d1(::, s1)`, it is not possible to determine which columns match `::`) +2. The number of `intoItem` is 1, or the same as the number of columns in the query result set.
(When the length of each target measurement list is 1, if there is only one `intoItem`, it means that all the query sequences are written to the same device; if the number of `intoItem` is consistent with the query sequence, it is expressed as each query time series specifies a target device; if `intoItem` is greater than one and less than the number of query sequences, it cannot be a one-to-one correspondence with the query sequence) + +**Matching method:** Each query time series specifies the target device, and the target measurement is generated from the variable placeholder. + +**Example:** + +```sql +select s1, s2 +into root.sg_copy.d1(::), root.sg_copy.d2(s1), root.sg_copy.d1(${3}), root.sg_copy.d2(::) +from root.sg.d1, root.sg.d2; +```` + +This statement is equivalent to: + +```sql +select s1, s2 +into root.sg_copy.d1(s1), root.sg_copy.d2(s1), root.sg_copy.d1(s2), root.sg_copy.d2(s2) +from root.sg.d1, root.sg.d2; +```` + +As you can see, the statement is not very simplified in this case. + +###### (2) The target device uses variable placeholders & the target measurement list does not use variable placeholders + +**Limitations:** The number of target measurements in all `intoItem` is the same as the number of columns in the query result set. + +**Matching method:** The target measurement is specified for each query time series, and the target device is generated according to the target device placeholder of the `intoItem` where the corresponding target measurement is located. + +**Example:** + +```sql +select d1.s1, d1.s2, d2.s3, d3.s4 +into ::(s1_1, s2_2), root.sg.d2_2(s3_3), root.${2}_copy.::(s4) +from root.sg; +```` + +###### (3) The target device uses variable placeholders & the target measurement list uses variable placeholders + +**Limitations:** There is only one `intoItem`, and the length of the list of measurement list is 1. + +**Matching method:** Each query time series can get a target time series according to the variable placeholder. + +**Example:** + +```sql +select * into root.sg_bk.::(::) from root.sg.**; +```` + +Write the query results of all time series under `root.sg` to `root.sg_bk`, the device name suffix and measurement remain unchanged. + +##### ALIGN BY DEVICE + +> Note: The variable placeholder **can only describe the correspondence between time series**. If the query includes aggregation and expression calculation, the columns in the query result cannot correspond to a specific physical quantity, so the target measurement cannot use variable placeholders. + +###### (1) The target device does not use variable placeholders & the target measurement list uses variable placeholders + +**Limitations:** In each `intoItem`, if the list of measurement uses variable placeholders, the length of the list must be 1. + +**Matching method:** Each query time series specifies the target device, and the target measurement is generated from the variable placeholder. + +**Example:** + +```sql +select s1, s2, s3, s4 +into root.backup_sg.d1(s1, s2, s3, s4), root.backup_sg.d2(::), root.sg.d3(backup_${4}) +from root.sg.d1, root.sg.d2, root.sg.d3 +align by device; +```` + +###### (2) The target device uses variable placeholders & the target measurement list does not use variable placeholders + +**Limitations:** There is only one `intoItem`. 
(If there are multiple `intoItem` with placeholders, we will not know which source devices each `intoItem` needs to match) + +**Matching method:** Each query device obtains a target device according to the variable placeholder, and the target measurement written in each column of the result set under each device is specified by the target measurement list. + +**Example:** + +```sql +select avg(s1), sum(s2) + sum(s3), count(s4) +into root.agg_${2}.::(avg_s1, sum_s2_add_s3, count_s4) +from root.** +align by device; +```` + +###### (3) The target device uses variable placeholders & the target measurement list uses variable placeholders + +**Limitations:** There is only one `intoItem` and the length of the target measurement list is 1. + +**Matching method:** Each query time series can get a target time series according to the variable placeholder. + +**Example:** + +```sql +select * into ::(backup_${4}) from root.sg.** align by device; +```` + +Write the query result of each time series in `root.sg` to the same device, and add `backup_` before the measurement. + +#### Specify the target time series as the aligned time series + +We can use the `ALIGNED` keyword to specify the target device for writing to be aligned, and each `intoItem` can be set independently. + +**Example:** + +```sql +select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; +``` + +This statement specifies that `root.sg_copy.d1` is an unaligned device and `root.sg_copy.d2` is an aligned device. + +#### Unsupported query clauses + +- `SLIMIT`, `SOFFSET`: The query columns are uncertain, so they are not supported. +- `LAST`, `GROUP BY TAGS`, `DISABLE ALIGN`: The table structure is inconsistent with the writing structure, so it is not supported. + +#### Other points to note + +- For general aggregation queries, the timestamp is meaningless, and the convention is to use 0 to store. +- When the target time-series exists, the data type of the source column and the target time-series must be compatible. About data type compatibility, see the document [Data Type](../Basic-Concept/Data-Type.md#Data Type Compatibility). +- When the target time series does not exist, the system automatically creates it (including the database). +- When the queried time series does not exist, or the queried sequence does not have data, the target time series will not be created automatically. + +### Application examples + +#### Implement IoTDB internal ETL + +ETL the original data and write a new time series. 
+
+```shell
+IOTDB > SELECT preprocess_udf(s1, s2) INTO ::(preprocessed_s1, preprocessed_s2) FROM root.sg.* ALIGN BY DEVICE;
++--------------+-------------------+---------------------------+--------+
+| source device|      source column|          target timeseries| written|
++--------------+-------------------+---------------------------+--------+
+|    root.sg.d1| preprocess_udf(s1)| root.sg.d1.preprocessed_s1|    8000|
++--------------+-------------------+---------------------------+--------+
+|    root.sg.d1| preprocess_udf(s2)| root.sg.d1.preprocessed_s2|   10000|
++--------------+-------------------+---------------------------+--------+
+|    root.sg.d2| preprocess_udf(s1)| root.sg.d2.preprocessed_s1|   11000|
++--------------+-------------------+---------------------------+--------+
+|    root.sg.d2| preprocess_udf(s2)| root.sg.d2.preprocessed_s2|    9000|
++--------------+-------------------+---------------------------+--------+
+```
+
+#### Query result storage
+
+Persistently store the query results, which acts like a materialized view.
+
+```shell
+IOTDB > SELECT count(s1), last_value(s1) INTO root.sg.agg_${2}(count_s1, last_value_s1) FROM root.sg1.d1 GROUP BY ([0, 100), 10ms);
++--------------------------+----------------------------+--------+
+|             source column|           target timeseries| written|
++--------------------------+----------------------------+--------+
+|     count(root.sg1.d1.s1)|     root.sg.agg_d1.count_s1|    1000|
++--------------------------+----------------------------+--------+
+|last_value(root.sg1.d1.s1)|root.sg.agg_d1.last_value_s1|    1000|
++--------------------------+----------------------------+--------+
+Total line number = 2
+It costs 0.115s
+```
+
+#### Non-aligned time series to aligned time series
+
+Rewrite non-aligned time series into another aligned time series.
+
+**Note:** It is recommended to use the `LIMIT & OFFSET` clause or the `WHERE` clause (time filter) to write the data in batches, avoiding an excessive data volume in a single operation.
+
+```shell
+IOTDB > SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d WHERE time >= 0 and time < 10000;
++--------------------------+----------------------+--------+
+|             source column|     target timeseries| written|
++--------------------------+----------------------+--------+
+| root.sg1.non_aligned_d.s1| root.sg1.aligned_d.s1|   10000|
++--------------------------+----------------------+--------+
+| root.sg1.non_aligned_d.s2| root.sg1.aligned_d.s2|   10000|
++--------------------------+----------------------+--------+
+Total line number = 2
+It costs 0.375s
+```
+
+### User Permission Management
+
+The user must have the following permissions to execute a query write-back statement:
+
+* All `WRITE_SCHEMA` permissions for the source series in the `select` clause.
+* All `WRITE_DATA` permissions for the target series in the `into` clause.
+
+For more information about user permissions, please refer to [Account Management Statements](./Authority-Management.md).
+
+### Configurable Properties
+
+* `select_into_insert_tablet_plan_row_limit`: The maximum number of rows that can be processed in one insert-tablet plan when executing a SELECT INTO statement. The default is 10000.
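+
+For reference, a hedged sketch of how this limit could be raised in the DataNode configuration (the exact configuration file name may differ between IoTDB versions, e.g. `iotdb-system.properties` or `iotdb-datanode.properties`; the value below is purely illustrative):
+
+```Plain
+# allow larger tablets per insert plan during SELECT INTO execution
+select_into_insert_tablet_plan_row_limit=50000
+```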
diff --git a/src/UserGuide/V2.0.1/Tree/Basic-Concept/Write-Delete-Data.md b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Write-Delete-Data.md new file mode 100644 index 00000000..b5600b99 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Basic-Concept/Write-Delete-Data.md @@ -0,0 +1,278 @@ + + + +# Write & Delete Data +## CLI INSERT + +IoTDB provides users with a variety of ways to insert real-time data, such as directly inputting [INSERT SQL statement](../SQL-Manual/SQL-Manual.md#insert-data) in [Client/Shell tools](../Tools-System/CLI.md), or using [Java JDBC](../API/Programming-JDBC.md) to perform single or batch execution of [INSERT SQL statement](../SQL-Manual/SQL-Manual.md). + +NOTE: This section mainly introduces the use of [INSERT SQL statement](../SQL-Manual/SQL-Manual.md#insert-data) for real-time data import in the scenario. + +Writing a repeat timestamp covers the original timestamp data, which can be regarded as updated data. + +### Use of INSERT Statements + +The [INSERT SQL statement](../SQL-Manual/SQL-Manual.md#insert-data) statement is used to insert data into one or more specified timeseries created. For each point of data inserted, it consists of a [timestamp](../Basic-Concept/Data-Model-and-Terminology.md) and a sensor acquisition value (see [Data Type](../Basic-Concept/Data-Type.md)). + +In the scenario of this section, take two timeseries `root.ln.wf02.wt02.status` and `root.ln.wf02.wt02.hardware` as an example, and their data types are BOOLEAN and TEXT, respectively. + +The sample code for single column data insertion is as follows: + +``` +IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) +IoTDB > insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') +``` + +The above example code inserts the long integer timestamp and the value "true" into the timeseries `root.ln.wf02.wt02.status` and inserts the long integer timestamp and the value "v1" into the timeseries `root.ln.wf02.wt02.hardware`. When the execution is successful, cost time is shown to indicate that the data insertion has been completed. + +> Note: In IoTDB, TEXT type data can be represented by single and double quotation marks. The insertion statement above uses double quotation marks for TEXT type data. The following example will use single quotation marks for TEXT type data. + +The INSERT statement can also support the insertion of multi-column data at the same time point. The sample code of inserting the values of the two timeseries at the same time point '2' is as follows: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (2, false, 'v2') +``` + +In addition, The INSERT statement support insert multi-rows at once. The sample code of inserting two rows as follows: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (3, false, 'v3'),(4, true, 'v4') +``` + +After inserting the data, we can simply query the inserted data using the SELECT statement: + +```sql +IoTDB > select * from root.ln.wf02.wt02 where time < 5 +``` + +The result is shown below. The query result shows that the insertion statements of single column and multi column data are performed correctly. 
+ +``` ++-----------------------------+--------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status| ++-----------------------------+--------------------------+------------------------+ +|1970-01-01T08:00:00.001+08:00| v1| true| +|1970-01-01T08:00:00.002+08:00| v2| false| +|1970-01-01T08:00:00.003+08:00| v3| false| +|1970-01-01T08:00:00.004+08:00| v4| true| ++-----------------------------+--------------------------+------------------------+ +Total line number = 4 +It costs 0.004s +``` + +In addition, we can omit the timestamp column, and the system will use the current system timestamp as the timestamp of the data point. The sample code is as follows: + +```sql +IoTDB > insert into root.ln.wf02.wt02(status, hardware) values (false, 'v2') +``` + +**Note:** Timestamps must be specified when inserting multiple rows of data in a SQL. + +### Insert Data Into Aligned Timeseries + +To insert data into a group of aligned time series, we only need to add the `ALIGNED` keyword in SQL, and others are similar. + +The sample code is as follows: + +```sql +IoTDB > create aligned timeseries root.sg1.d1(s1 INT32, s2 DOUBLE) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(2, 2, 2), (3, 3, 3) +IoTDB > select * from root.sg1.d1 +``` + +The result is shown below. The query result shows that the insertion statements are performed correctly. + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1| 1.0| +|1970-01-01T08:00:00.002+08:00| 2| 2.0| +|1970-01-01T08:00:00.003+08:00| 3| 3.0| ++-----------------------------+--------------+--------------+ +Total line number = 3 +It costs 0.004s +``` + +## NATIVE API WRITE + +The Native API ( Session ) is the most widely used series of APIs of IoTDB, including multiple APIs, adapted to different data collection scenarios, with high performance and multi-language support. + +### Multi-language API write + +#### Java + +Before writing via the Java API, you need to establish a connection, refer to [Java Native API](../API/Programming-Java-Native-API.md). 
+then refer to [ JAVA Data Manipulation Interface (DML) ](../API/Programming-Java-Native-API.md#insert) + +#### Python + +Refer to [ Python Data Manipulation Interface (DML) ](../API/Programming-Python-Native-API.md#insert) + +#### C++ + +Refer to [ C++ Data Manipulation Interface (DML) ](../API/Programming-Cpp-Native-API.md#insert) + +#### Go + +Refer to [Go Native API](../API/Programming-Go-Native-API.md) + +## REST API WRITE + +Refer to [insertTablet (v1)](../API/RestServiceV1.md#inserttablet) or [insertTablet (v2)](../API/RestServiceV2.md#inserttablet) + +Example: + +```JSON +{ +      "timestamps": [ +            1, +            2, +            3 +      ], +      "measurements": [ +            "temperature", +            "status" +      ], +      "data_types": [ +            "FLOAT", +            "BOOLEAN" +      ], +      "values": [ +            [ +                  1.1, +                  2.2, +                  3.3 +            ], +            [ +                  false, +                  true, +                  true +            ] +      ], +      "is_aligned": false, +      "device": "root.ln.wf01.wt01" +} +``` + +## MQTT WRITE + +Refer to [Built-in MQTT Service](../API/Programming-MQTT.md#built-in-mqtt-service) + +## BATCH DATA LOAD + +In different scenarios, the IoTDB provides a variety of methods for importing data in batches. This section describes the two most common methods for importing data in CSV format and TsFile format. + +### TsFile Batch Load + +TsFile is the file format of time series used in IoTDB. You can directly import one or more TsFile files with time series into another running IoTDB instance through tools such as CLI. For details, see [Data Import](../Tools-System/Data-Import-Tool.md). + +### CSV Batch Load + +CSV stores table data in plain text. You can write multiple formatted data into a CSV file and import the data into the IoTDB in batches. Before importing data, you are advised to create the corresponding metadata in the IoTDB. Don't worry if you forget to create one, the IoTDB can automatically infer the data in the CSV to its corresponding data type, as long as you have a unique data type for each column. In addition to a single file, the tool supports importing multiple CSV files as folders and setting optimization parameters such as time precision. For details, see [Data Import](../Tools-System/Data-Import-Tool.md). + +## DELETE + +Users can delete data that meet the deletion condition in the specified timeseries by using the [DELETE statement](../SQL-Manual/SQL-Manual.md#delete-data). When deleting data, users can select one or more timeseries paths, prefix paths, or paths with star to delete data within a certain time interval. + +In a JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute single or batch UPDATE statements. + +### Delete Single Timeseries + +Taking ln Group as an example, there exists such a usage scenario: + +The wf02 plant's wt02 device has many segments of errors in its power supply status before 2017-11-01 16:26:00, and the data cannot be analyzed correctly. The erroneous data affected the correlation analysis with other devices. At this point, the data before this time point needs to be deleted. 
The SQL statement for this operation is + +```sql +delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00; +``` + +In case we hope to merely delete the data before 2017-11-01 16:26:00 in the year of 2017, The SQL statement is: + +```sql +delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +``` + +IoTDB supports to delete a range of timeseries points. Users can write SQL expressions as follows to specify the delete interval: + +```sql +delete from root.ln.wf02.wt02.status where time < 10 +delete from root.ln.wf02.wt02.status where time <= 10 +delete from root.ln.wf02.wt02.status where time < 20 and time > 10 +delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10 +delete from root.ln.wf02.wt02.status where time > 20 +delete from root.ln.wf02.wt02.status where time >= 20 +delete from root.ln.wf02.wt02.status where time = 20 +``` + +Please pay attention that multiple intervals connected by "OR" expression are not supported in delete statement: + +``` +delete from root.ln.wf02.wt02.status where time > 4 or time < 0 +Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic +expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND' +``` + +If no "where" clause specified in a delete statement, all the data in a timeseries will be deleted. + +```sql +delete from root.ln.wf02.wt02.status +``` + + +### Delete Multiple Timeseries + +If both the power supply status and hardware version of the ln group wf02 plant wt02 device before 2017-11-01 16:26:00 need to be deleted, [the prefix path with broader meaning or the path with star](../Basic-Concept/Data-Model-and-Terminology.md) can be used to delete the data. The SQL statement for this operation is: + +```sql +delete from root.ln.wf02.wt02 where time <= 2017-11-01T16:26:00; +``` + +or + +```sql +delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00; +``` + +It should be noted that when the deleted path does not exist, IoTDB will not prompt that the path does not exist, but that the execution is successful, because SQL is a declarative programming method. Unless it is a syntax error, insufficient permissions and so on, it is not considered an error, as shown below: + +``` +IoTDB> delete from root.ln.wf03.wt02.status where time < now() +Msg: The statement is executed successfully. +``` + +### Delete Time Partition (experimental) + +You may delete all data in a time partition of a database using the following grammar: + +```sql +DELETE PARTITION root.ln 0,1,2 +``` + +The `0,1,2` above is the id of the partition that is to be deleted, you can find it from the IoTDB +data folders or convert a timestamp manually to an id using `timestamp / partitionInterval +` (flooring), and the `partitionInterval` should be in your config (if time-partitioning is +supported in your version). + +Please notice that this function is experimental and mainly for development, please use it with care. + diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/AINode_Deployment_timecho.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/AINode_Deployment_timecho.md new file mode 100644 index 00000000..fe7c766f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/AINode_Deployment_timecho.md @@ -0,0 +1,556 @@ + +# AINode Deployment + +## AINode Introduction + +### Capability Introduction + + AINode is the third type of endogenous node provided by IoTDB after the Configurable Node and DataNode. 
This node extends IoTDB with the ability to perform machine learning analysis on time series by interacting with the DataNodes and ConfigNodes of the IoTDB cluster. Existing machine learning models can be imported from external sources and registered, and the registered models can then be used to complete time series analysis tasks on specified time series data through simple SQL statements. The creation, management, and inference of models are integrated into the database engine. Machine learning algorithms and self-developed models are currently available for common time series analysis scenarios such as prediction and anomaly detection.
+
+### Delivery Method
+ AINode is delivered as an additional package outside the IoTDB cluster, with independent installation and activation (if you need to try or use it, please contact the Timecho business team or technical support).
+
+### Deployment mode
+ + +
+ +## Installation preparation + +### Get installation package + + Users can download the software installation package for AINode, download and unzip it to complete the installation of AINode. + + Unzip and install the package + `(iotdb-enterprise-ainode-.zip)`, The directory structure after unpacking the installation package is as follows: +| **Catalogue** | **Type** | **Explain** | +| ------------ | -------- | ------------------------------------------------ | +| lib | folder | AINode compiled binary executable files and related code dependencies | +| sbin | folder | The running script of AINode can start, remove, and stop AINode | +| conf | folder | Contains configuration items for AINode, specifically including the following configuration items | +| LICENSE | file | Certificate | +| NOTICE | file | Tips | +| README_ZH.md | file | Explanation of the Chinese version of the markdown format | +| `README.md` | file | Instructions | + +### Environment preparation +- Suggested operating environment:Ubuntu, CentOS, MacOS + +- Runtime Environment + - Python>=3.8 and Python <= 3.14 is sufficient in a networked environment, and comes with pip and venv tools; Python 3.8 version is required for non networked environments, and download the zip package for the corresponding operating system from [here](https://cloud.tsinghua.edu.cn/d/4c1342f6c272439aa96c/?p=%2Flibs&mode=list) (Note that when downloading dependencies, you need to select the zip file in the libs folder, as shown in the following figure). Copy all files in the folder to the `lib` folder in the `iotdb-enterprise-ainode-` folder, and follow the steps below to start AINode. + + + + - There must be a Python interpreter in the environment variables that can be directly called through the `python` instruction. + - It is recommended to create a Python interpreter venv virtual environment in the `iotdb-enterprise-ainode-` folder. If installing version 3.8.0 virtual environment, the statement is as follows: + ```shell + # Install version 3.8.0 of Venv , Create a virtual environment with the folder name `venv`. + ../Python-3.8.0/python -m venv `venv` + ``` + +## Installation steps + +### Install AINode + +1. AINode activation + + Require IoTDB to be in normal operation and have AINode module authorization in the license (usually not in the license, please contact T Business or technical support personnel to obtain AINode module authorization). + + The authorization method for activating the AINode module is as follows: + - Method 1: Activate file copy activation + - After restarting the confignode node, enter the activation folder, copy the system_info file to the Timecho staff, and inform them to apply for independent authorization for AINode; + - Received the license file returned by the staff; + - Put the license file into the activation folder of the corresponding node; + +- Method 2: Activate Script Activation + - Obtain the required machine code for activation, enter the `sbin` directory of the installation directory, and execute the activation script: + ```shell + cd sbin + ./start-activate.sh + ``` + - The following information is displayed. Please copy the machine code (i.e. 
this string of characters) to the Timecho staff and inform them to apply for independent authorization of AINode: + ```shell + Please copy the system_info's content and send it to Timecho: + Y17hFA0xRCE1TmkVxILuCIEPc7uJcr5bzlXWiptw8uZTmTX5aThfypQdLUIhMljw075hNRSicyvyJR9JM7QaNm1gcFZPHVRWVXIiY5IlZkXdxCVc1erXMsbCqUYsR2R2Mw4PSpFJsUF5jHWSoFIIjQ2bmJFW5P52KCccFMVeHTc= + Please enter license: + ``` + - Enter the activation code returned by the staff into the `Please enter license:` command prompt in the previous step, as shown below: + ```shell + Please enter license: + Jw+MmF+AtexsfgNGOFgTm83BgXbq0zT1+fOfPvQsLlj6ZsooHFU6HycUSEGC78eT1g67KPvkcLCUIsz2QpbyVmPLr9x1+kVjBubZPYlVpsGYLqLFc8kgpb5vIrPLd3hGLbJ5Ks8fV1WOVrDDVQq89YF2atQa2EaB9EAeTWd0bRMZ+s9ffjc/1Zmh9NSP/T3VCfJcJQyi7YpXWy5nMtcW0gSV+S6fS5r7a96PjbtE0zXNjnEhqgRzdU+mfO8gVuUNaIy9l375cp1GLpeCh6m6pF+APW1CiXLTSijK9Qh3nsL5bAOXNeob5l+HO5fEMgzrW8OJPh26Vl6ljKUpCvpTiw== + License has been stored to sbin/../activation/license + Import completed. Please start cluster and excute 'show cluster' to verify activation status + ``` +- After updating the license, restart the DataNode node and enter the sbin directory of IoTDB to start the datanode: + ```shell + cd sbin + ./start-datanode.sh -d #The parameter'd 'will be started in the background + ``` + + 2. Check the kernel architecture of Linux + ```shell + uname -m + ``` + + 3. Import Python environment [Download](https://repo.anaconda.com/miniconda/) + + Recommend downloading the py311 version application and importing it into the iotdb dedicated folder in the user's root directory + + 4. Switch to the iotdb dedicated folder to install the Python environment + + Taking Miniconda 3-py311_24.5.0-0-Lux-x86_64 as an example: + + ```shell + bash ./Miniconda3-py311_24.5.0-0-Linux-x86_64.sh + ``` + > Type "Enter", "Long press space", "Enter", "Yes", "Yes" according to the prompt
+ > Close the current SSH window and reconnect + + 5. Create a dedicated environment + + ```shell + conda create -n ainode_py python=3.11.9 + ``` + + Type 'y' according to the prompt + + 6. Activate dedicated environment + + ```shell + conda activate ainode_py + ``` + + 7. Verify Python version + + ```shell + python --version + ``` + 8. Download and import AINode to a dedicated folder, switch to the dedicated folder and extract the installation package + + ```shell + unzip iotdb-enterprise-ainode-1.3.3.2.zip + ``` + + 9. Configuration item modification + + ```shell + vi iotdb-enterprise-ainode-1.3.3.2/conf/iotdb-ainode.properties + ``` + Configuration item modification:[detailed information](#configuration-item-modification) + + > ain_seed_config_node=iotdb-1:10710 (Cluster communication node IP: communication node port)
+ > ain_inference_rpc_address=iotdb-3 (IP address of the server running AINode) + + 10. Replace Python source + + ```shell + pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/ + ``` + + 11. Start the AINode node + + ```shell + nohup bash iotdb-enterprise-ainode-1.3.3.2/sbin/start-ainode.sh > myout.file 2>& 1 & + ``` + > Return to the default environment of the system: conda deactivate + + ### Configuration item modification + +AINode supports modifying some necessary parameters. You can find the following parameters in the `conf/iotdb-ainode.properties` file and make persistent modifications to them: +: + +| **Name** | **Describe** | **Type** | **Default value** | **Effective method after modification** | +| :----------------------------- | ------------------------------------------------------------ | ------- | ------------------ | ---------------------------- | +| cluster_name | The identifier for AINode to join the cluster | string | defaultCluster | Only allow modifications before the first service startup | +| ain_seed_config_node | The Configurable Node address registered during AINode startup | String | 127.0.0.1:10710 | Only allow modifications before the first service startup | +| ain_inference_rpc_address | AINode provides service and communication addresses , Internal Service Communication Interface | String | 127.0.0.1 | Only allow modifications before the first service startup | +| ain_inference_rpc_port | AINode provides ports for services and communication | String | 10810 | Only allow modifications before the first service startup | +| ain_system_dir | AINode metadata storage path, the starting directory of the relative path is related to the operating system, and it is recommended to use an absolute path | String | data/AINode/system | Only allow modifications before the first service startup | +| ain_models_dir | AINode stores the path of the model file, and the starting directory of the relative path is related to the operating system. It is recommended to use an absolute path | String | data/AINode/models | Only allow modifications before the first service startup | +| ain_logs_dir | The path where AINode stores logs, the starting directory of the relative path is related to the operating system, and it is recommended to use an absolute path | String | logs/AINode | Effective after restart | +| ain_thrift_compression_enabled | Does AINode enable Thrift's compression mechanism , 0-Do not start, 1-Start | Boolean | 0 | Effective after restart | + +### Start AINode + + After completing the deployment of Seed Config Node, the registration and inference functions of the model can be supported by adding AINode nodes. 
After specifying the information of the IoTDB cluster in the configuration file, the corresponding instruction can be executed to start AINode and join the IoTDB cluster。 + +#### Networking environment startup + +##### Start command + +```shell + # Start command + # Linux and MacOS systems + bash sbin/start-ainode.sh + + # Windows systems + sbin\start-ainode.bat + + # Backend startup command (recommended for long-term running) + # Linux and MacOS systems + nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & + + # Windows systems + nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & + ``` + +#### Detailed Syntax + +```shell + # Start command + # Linux and MacOS systems + bash sbin/start-ainode.sh -i -r -n + + # Windows systems + sbin\start-ainode.bat -i -r -n + ``` + +##### Parameter introduction: + +| **Name** | **Label** | **Describe** | **Is it mandatory** | **Type** | **Default value** | **Input method** | +| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | ---------------------- | +| ain_interpreter_dir | -i | The interpreter path of the virtual environment where AINode is installed requires the use of an absolute path. | no | String | Default reading of environment variables | Input or persist modifications during invocation | +| ain_force_reinstall | -r | Does this script check the version when checking the installation status of AINode. If it does, it will force the installation of the whl package in lib if the version is incorrect. | no | Bool | false | Input when calling | +| ain_no_dependencies | -n | Specify whether to install dependencies when installing AINode, and if so, only install the AINode main program without installing dependencies. | no | Bool | false | Input when calling | + + If you don't want to specify the corresponding parameters every time you start, you can also persistently modify the parameters in the `ainode-env.sh` and `ainode-env.bat` scripts in the `conf` folder (currently supporting persistent modification of the ain_interpreter-dir parameter). + + `ainode-env.sh` : + ```shell + # The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark + # ain_interpreter_dir= + ``` + `ainode-env.bat` : +```shell + @REM The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark + @REM set ain_interpreter_dir= + ``` + After writing the parameter value, uncomment the corresponding line and save it to take effect on the next script execution. + + +#### Example + +##### Directly start: + +```shell + # Start command + # Linux and MacOS systems + bash sbin/start-ainode.sh + # Windows systems + sbin\start-ainode.bat + + + # Backend startup command (recommended for long-term running) + # Linux and MacOS systems + nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & + # Windows systems + nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & + ``` + +##### Update Start: +If the version of AINode has been updated (such as updating the `lib` folder), this command can be used. Firstly, it is necessary to ensure that AINode has stopped running, and then restart it using the `-r` parameter, which will reinstall AINode based on the files under `lib`. 
+ + +```shell + # Update startup command + # Linux and MacOS systems + bash sbin/start-ainode.sh -r + # Windows systems + sbin\start-ainode.bat -r + + + # Backend startup command (recommended for long-term running) + # Linux and MacOS systems + nohup bash sbin/start-ainode.sh -r > myout.file 2>& 1 & + # Windows c + nohup bash sbin\start-ainode.bat -r > myout.file 2>& 1 & + ``` +#### Non networked environment startup + +##### Start command + +```shell + # Start command + # Linux and MacOS systems + bash sbin/start-ainode.sh + + # Windows systems + sbin\start-ainode.bat + + # Backend startup command (recommended for long-term running) + # Linux and MacOS systems + nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & + + # Windows systems + nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & + ``` + +#### Detailed Syntax + +```shell + # Start command + # Linux and MacOS systems + bash sbin/start-ainode.sh -i -r -n + + # Windows systems + sbin\start-ainode.bat -i -r -n + ``` + +##### Parameter introduction: + +| **Name** | **Label** | **Describe** | **Is it mandatory** | **Type** | **Default value** | **Input method** | +| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | ---------------------- | +| ain_interpreter_dir | -i | The interpreter path of the virtual environment where AINode is installed requires the use of an absolute path | no | String | Default reading of environment variables | Input or persist modifications during invocation | +| ain_force_reinstall | -r | Does this script check the version when checking the installation status of AINode. If it does, it will force the installation of the whl package in lib if the version is incorrect | no | Bool | false | Input when calling | + +> Attention: When installation fails in a non networked environment, first check if the installation package corresponding to the platform is selected, and then confirm that the Python version is 3.8 (due to the limitations of the downloaded installation package on Python versions, 3.7, 3.9, and others are not allowed) + +#### Example + +##### Directly start: + +```shell + # Start command + # Linux and MacOS systems + bash sbin/start-ainode.sh + # Windows systems + sbin\start-ainode.bat + + # Backend startup command (recommended for long-term running) + # Linux and MacOS systems + nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & + # Windows systems + nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & + ``` + +### Detecting the status of AINode nodes + +During the startup process of AINode, the new AINode will be automatically added to the IoTDB cluster. After starting AINode, you can enter SQL in the command line to query. If you see an AINode node in the cluster and its running status is Running (as shown below), it indicates successful joining. + + +```shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| +| 2| AINode|Running| 127.0.0.1| 10810|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` + +### Stop AINode + +If you need to stop a running AINode node, execute the corresponding shutdown script. 
+ +#### Stop command + +```shell + # Linux / MacOS + bash sbin/stop-ainode.sh + + #Windows + sbin\stop-ainode.bat + ``` + + +#### Detailed Syntax + +```shell + # Linux / MacOS + bash sbin/stop-ainode.sh -t + + #Windows + sbin\stop-ainode.bat -t + ``` + +##### Parameter introduction: + +| **Name** | **Label** | **Describe** | **Is it mandatory** | **Type** | **Default value** | **Input method** | +| ----------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ------ | ---------- | +| ain_remove_target | -t | When closing AINode, you can specify the Node ID, address, and port number of the target AINode to be removed, in the format of `` | no | String | nothing | Input when calling | + +#### Example + +```shell + # Linux / MacOS + bash sbin/stop-ainode.sh + + # Windows + sbin\stop-ainode.bat + ``` +After stopping AINode, you can still see AINode nodes in the cluster, whose running status is UNKNOWN (as shown below), and the AINode function cannot be used at this time. + + ```shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| +| 2| AINode|UNKNOWN| 127.0.0.1| 10790|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` +If you need to restart the node, you need to execute the startup script again. + +### Remove AINode + +When it is necessary to remove an AINode node from the cluster, a removal script can be executed. The difference between removing and stopping scripts is that stopping retains the AINode node in the cluster but stops the AINode service, while removing removes the AINode node from the cluster. + +#### Remove command + + +```shell + # Linux / MacOS + bash sbin/remove-ainode.sh + + # Windows + sbin\remove-ainode.bat + ``` + +#### Detailed Syntax + +```shell + # Linux / MacOS + bash sbin/remove-ainode.sh -i -t/: -r -n + + # Windows + sbin\remove-ainode.bat -i -t/: -r -n + ``` + +##### Parameter introduction: + + | **Name** | **Label** | **Describe** | **Is it mandatory** | **Type** | **Default value** | **Input method** | +| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | --------------------- | +| ain_interpreter_dir | -i | The interpreter path of the virtual environment where AINode is installed requires the use of an absolute path | no | String | Default reading of environment variables | Input+persistent modification during invocation | +| ain_remove_target | -t | When closing AINode, you can specify the Node ID, address, and port number of the target AINode to be removed, in the format of `` | no | String | nothing | Input when calling | +| ain_force_reinstall | -r | Does this script check the version when checking the installation status of AINode. 
If it does, it will force the installation of the whl package in lib if the version is incorrect | no | Bool | false | Input when calling | +| ain_no_dependencies | -n | Specify whether to install dependencies when installing AINode, and if so, only install the AINode main program without installing dependencies | no | Bool | false | Input when calling | + + If you don't want to specify the corresponding parameters every time you start, you can also persistently modify the parameters in the `ainode-env.sh` and `ainode-env.bat` scripts in the `conf` folder (currently supporting persistent modification of the ain_interpreter-dir parameter). + + `ainode-env.sh` : + ```shell + # The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark + # ain_interpreter_dir= + ``` + `ainode-env.bat` : +```shell + @REM The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark + @REM set ain_interpreter_dir= + ``` + After writing the parameter value, uncomment the corresponding line and save it to take effect on the next script execution. + +#### Example + +##### Directly remove: + + ```shell + # Linux / MacOS + bash sbin/remove-ainode.sh + + # Windows + sbin\remove-ainode.bat + ``` + After removing the node, relevant information about the node cannot be queried. + + ```shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` +##### Specify removal: + +If the user loses files in the data folder, AINode may not be able to actively remove them locally. The user needs to specify the node number, address, and port number for removal. In this case, we support users to input parameters according to the following methods for deletion. + + ```shell + # Linux / MacOS + bash sbin/remove-ainode.sh -t /: + + # Windows + sbin\remove-ainode.bat -t /: + ``` + +## common problem + +### An error occurs when starting AINode stating that the venv module cannot be found + + When starting AINode using the default method, a Python virtual environment will be created in the installation package directory and dependencies will be installed, so it is required to install the venv module. Generally speaking, Python 3.8 and above versions come with built-in VenV, but for some systems with built-in Python environments, this requirement may not be met. There are two solutions when this error occurs (choose one or the other): + + To install the Venv module locally, taking Ubuntu as an example, you can run the following command to install the built-in Venv module in Python. Or install a Python version with built-in Venv from the Python official website. + + ```shell +apt-get install python3.8-venv +``` +Install version 3.8.0 of venv into AINode in the AINode path. + + ```shell +../Python-3.8.0/python -m venv venv(Folder Name) +``` + When running the startup script, use ` -i ` to specify an existing Python interpreter path as the running environment for AINode, eliminating the need to create a new virtual environment. 
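+
+ For instance, assuming a usable interpreter already exists at `/data/iotdb/venv/bin/python` (an illustrative path), AINode can be started against it directly:
+
+ ```shell
+# a hedged sketch: reuse an existing Python interpreter instead of creating a new venv
+bash sbin/start-ainode.sh -i /data/iotdb/venv/bin/python
+```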
+
+### The SSL module in Python is not properly installed and configured to handle HTTPS resources
+
+The following warning appears: `WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.`
+You can install OpenSSL and then rebuild Python to solve this problem.
+> Currently Python versions 3.6 to 3.9 are compatible with OpenSSL 1.0.2, 1.1.0, and 1.1.1.
+
+ Python requires OpenSSL to be installed on the system; a detailed installation method can be found in this [link](https://stackoverflow.com/questions/56552390/how-to-fix-ssl-module-in-python-is-not-available-in-centos).
+
+ ```shell
+sudo apt-get install build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev uuid-dev lzma-dev liblzma-dev
+sudo -E ./configure --with-ssl
+make
+sudo make install
+```
+
+### Pip version is too low
+
+A compilation error similar to "error: Microsoft Visual C++ 14.0 or greater is required..." appears on Windows.
+
+This error usually occurs during installation and compilation because the C++ build tools or the setuptools version is too low. You can upgrade pip and setuptools with the following commands:
+
+ ```shell
+./python -m pip install --upgrade pip
+./python -m pip install --upgrade setuptools
+```
+
+### Install and compile Python
+
+Use the following instructions to download the installation package from the official website and extract it:
+ ```shell
+wget https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tar.xz
+tar Jxf Python-3.8.0.tar.xz
+```
+Compile and install the corresponding Python package:
+ ```shell
+cd Python-3.8.0
+./configure prefix=/usr/local/python3
+make
+sudo make install
+python3 --version
+```
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_apache.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_apache.md
new file mode 100644
index 00000000..8ca7fd0a
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_apache.md
@@ -0,0 +1,343 @@
+
+# Cluster Deployment
+
+This section will take the IoTDB classic cluster deployment architecture 3C3D (3 ConfigNodes and 3 DataNodes) as an example to introduce how to deploy a cluster, commonly known as the 3C3D cluster. The architecture diagram of the 3C3D cluster is as follows:
+
+ +
+ +## Note + +1. Before installation, ensure that the system is complete by referring to [System configuration](./Environment-Requirements.md) + +2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure the `cn_internal_address` and `dn_internal_address` of IoTDB using the host name. + + ``` shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings. + +4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. + +5. Please note that when installing and deploying IoTDB, it is necessary to use the same user for operations. You can: +- Using root user (recommended): Using root user can avoid issues such as permissions. +- Using a fixed non root user: + - Using the same user operation: Ensure that the same user is used for start, stop and other operations, and do not switch users. + - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. + +## Preparation Steps + +1. Prepare the IoTDB database installation package::apache-iotdb-{version}-all-bin.zip(Please refer to the installation package for details:[IoTDB-Package](../Deployment-and-Maintenance/IoTDB-Package_apache.md)) + +2. Configure the operating system environment according to environmental requirements (system environment configuration can be found in:[Environment Requirements](https://iotdb.apache.org/UserGuide/latest/Deployment-and-Maintenance/Environment-Requirements.html)) + +## Installation Steps + +Assuming there are three Linux servers now, the IP addresses and service roles are assigned as follows: + +| Node IP | Host Name | Service | +| ----------- | --------- | -------------------- | +| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | +| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | +| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | + +### Set Host Name + +On three machines, configure the host names separately. To set the host names, configure `/etc/hosts` on the target server. 
Use the following command: + +```Bash +echo "192.168.1.3 iotdb-1" >> /etc/hosts +echo "192.168.1.4 iotdb-2" >> /etc/hosts +echo "192.168.1.5 iotdb-3" >> /etc/hosts +``` + +### Configuration + +Unzip the installation package and enter the installation directory + +```Plain +unzip apache-iotdb-{version}-all-bin.zip +cd apache-iotdb-{version}-all-bin +``` + +#### Environment Script Configuration + +- `./conf/confignode-env.sh` configuration + +| **配置项** | **Description** | **Default** | **Recommended value** | **Note** | +| :---------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | +| MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +- `./conf/datanode-env.sh` configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | +| :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | +| MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +#### General Configuration + +Open the general configuration file `./conf/iotdb-system.properties`, The following parameters can be set according to the deployment method: + +| **Configuration** | **Description** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | +| ------------------------- | ------------------------------------------------------------ | -------------- | -------------- | -------------- | +| cluster_name | Cluster Name | defaultCluster | defaultCluster | defaultCluster | +| schema_replication_factor | The number of metadata replicas, the number of DataNodes should not be less than this number | 3 | 3 | 3 | +| data_replication_factor | The number of data replicas should not be less than this number of DataNodes | 2 | 2 | 2 | + +#### ConfigNode Configuration + +Open the ConfigNode configuration file `./conf/iotdb-system.properties`, Set the following parameters + +| **Configuration** | **Description** | **Default** | **Recommended value** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | Note | +| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | +| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | +| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | 10710 | 10710 | 10710 | Cannot be modified after initial startup | +| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | 10720 | 10720 | 10720 | Cannot be modified after initial startup | +| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, 
`cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's `cn_internal-address: cn_internal_port` | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | + +#### DataNode Configuration + +Open DataNode Configuration File `./conf/iotdb-system.properties`,Set the following parameters: + +| **Configuration** | **Description** | **Default** | **Recommended value** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | Note | +| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | +| dn_rpc_address | The address of the client RPC service | 127.0.0.1 | Recommend using the **IPV4 address or hostname** of the server where it is located | iotdb-1 |iotdb-2 | iotdb-3 | Restarting the service takes effect | +| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | 6667 | 6667 | 6667 | Restarting the service takes effect | +| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | +| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | 10730 | 10730 | 10730 | Cannot be modified after initial startup | +| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | 10740 | 10740 | 10740 | Cannot be modified after initial startup | +| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | 10750 | 10750 | 10750 | Cannot be modified after initial startup | +| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | 10760 | 10760 | 10760 | Cannot be modified after initial startup | +| dn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, i.e. `cn_internal-address: cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's cn_internal-address: cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | + +> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. Please ensure that the modified files are saved persistently, otherwise the configuration items will not take effect + +### Start ConfigNode + +Start the first confignode of IoTDB-1 first, ensuring that the seed confignode node starts first, and then start the second and third confignode nodes in sequence + +```Bash +cd sbin +./start-confignode.sh -d #"- d" parameter will start in the background +``` + +If the startup fails, please refer to [Common Questions](#common-questions). + + +### Start DataNode + + Enter the `sbin` directory of iotdb and start three datanode nodes in sequence: + +```Bash +cd sbin +./start-datanode.sh -d #"- d" parameter will start in the background +``` + +### Verify Deployment + +Can be executed directly Cli startup script in `./sbin` directory: + +```Plain +./start-cli.sh -h ip(local IP or domain name) -p port(6667) +``` + +After successful startup, the following interface will appear displaying successful installation of IOTDB. 
+ +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90%E6%88%90%E5%8A%9F.png) + +You can use the `show cluster` command to view cluster information: + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90%E7%89%88%20show%20cluter.png) + +> The appearance of `ACTIVATED (W)` indicates passive activation, which means that this Configurable Node does not have a license file (or has not issued the latest license file with a timestamp), and its activation depends on other Activated Configurable Nodes in the cluster. At this point, it is recommended to check if the license file has been placed in the license folder. If not, please place the license file. If a license file already exists, it may be due to inconsistency between the license file of this node and the information of other nodes. Please contact Timecho staff to reapply. + +## Node Maintenance Steps + +### ConfigNode Node Maintenance + +ConfigNode node maintenance is divided into two types of operations: adding and removing ConfigNodes, with two common use cases: +- Cluster expansion: For example, when there is only one ConfigNode in the cluster, and you want to increase the high availability of ConfigNode nodes, you can add two ConfigNodes, making a total of three ConfigNodes in the cluster. +- Cluster failure recovery: When the machine where a ConfigNode is located fails, making the ConfigNode unable to run normally, you can remove this ConfigNode and then add a new ConfigNode to the cluster. + +> ❗️Note, after completing ConfigNode node maintenance, you need to ensure that there are 1 or 3 ConfigNodes running normally in the cluster. Two ConfigNodes do not have high availability, and more than three ConfigNodes will lead to performance loss. + +#### Adding ConfigNode Nodes + +Script command: +```shell +# Linux / MacOS +# First switch to the IoTDB root directory +sbin/start-confignode.sh + +# Windows +# First switch to the IoTDB root directory +sbin/start-confignode.bat +``` + +Parameter introduction: + +| Parameter | Description | Is it required | +| :--- | :--------------------------------------------- | :----------- | +| -v | Show version information | No | +| -f | Run the script in the foreground, do not put it in the background | No | +| -d | Start in daemon mode, i.e. run in the background | No | +| -p | Specify a file to store the process ID for process management | No | +| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | +| -g | Print detailed garbage collection (GC) information | No | +| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | +| -E | Specify the path of the JVM error log file | No | +| -D | Define system properties, in the format key=value | No | +| -X | Pass -XX parameters directly to the JVM | No | +| -h | Help instruction | No | + +#### Removing ConfigNode Nodes + +First connect to the cluster through the CLI and confirm the internal address and port number of the ConfigNode you want to remove by using `show confignodes`: + +```Bash +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +Then use the script to remove the DataNode. 
Script command: + +```Bash +# Linux / MacOS +sbin/remove-confignode.sh [confignode_id] + +#Windows +sbin/remove-confignode.bat [confignode_id] + +``` + +### DataNode Node Maintenance + +There are two common scenarios for DataNode node maintenance: + +- Cluster expansion: For the purpose of expanding cluster capabilities, add new DataNodes to the cluster +- Cluster failure recovery: When a machine where a DataNode is located fails, making the DataNode unable to run normally, you can remove this DataNode and add a new DataNode to the cluster + +> ❗️Note, in order for the cluster to work normally, during the process of DataNode node maintenance and after the maintenance is completed, the total number of DataNodes running normally should not be less than the number of data replicas (usually 2), nor less than the number of metadata replicas (usually 3). + +#### Adding DataNode Nodes + +Script command: + +```Bash +# Linux / MacOS +# First switch to the IoTDB root directory +sbin/start-datanode.sh + +# Windows +# First switch to the IoTDB root directory +sbin/start-datanode.bat +``` + +Parameter introduction: + +| Abbreviation | Description | Is it required | +| :--- | :--------------------------------------------- | :----------- | +| -v | Show version information | No | +| -f | Run the script in the foreground, do not put it in the background | No | +| -d | Start in daemon mode, i.e. run in the background | No | +| -p | Specify a file to store the process ID for process management | No | +| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | +| -g | Print detailed garbage collection (GC) information | No | +| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | +| -E | Specify the path of the JVM error log file | No | +| -D | Define system properties, in the format key=value | No | +| -X | Pass -XX parameters directly to the JVM | No | +| -h | Help instruction | No | + +Note: After adding a DataNode, as new writes arrive (and old data expires, if TTL is set), the cluster load will gradually balance towards the new DataNode, eventually achieving a balance of storage and computation resources on all nodes. + +#### Removing DataNode Nodes + +First connect to the cluster through the CLI and confirm the RPC address and port number of the DataNode you want to remove with `show datanodes`: + +```Bash +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 0.0.0.0| 6667| 0| 0| +| 2|Running| 0.0.0.0| 6668| 1| 1| +| 3|Running| 0.0.0.0| 6669| 1| 0| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 3 +It costs 0.110s +``` + +Then use the script to remove the DataNode. Script command: + +```Bash +# Linux / MacOS +sbin/remove-datanode.sh [datanode_id] + +#Windows +sbin/remove-datanode.bat [datanode_id] +``` +## Common Questions + +1. Confignode failed to start + + Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. + + Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. 
+ + Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. + + Step 4: Clean up the environment: + + a. Terminate all ConfigNode Node and DataNode processes. + ```Bash + # 1. Stop the ConfigNode and DataNode services + sbin/stop-standalone.sh + + # 2. Check for any remaining processes + jps + # Or + ps -ef|gerp iotdb + + # 3. If there are any remaining processes, manually kill the + kill -9 + # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + b. Delete the data and logs directories. + + Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. + + ```Bash + cd /data/iotdb + rm -rf data logs + ``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_timecho.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_timecho.md new file mode 100644 index 00000000..08579e8a --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_timecho.md @@ -0,0 +1,384 @@ + +# Cluster Deployment + +This section describes how to manually deploy an instance that includes 3 ConfigNodes and 3 DataNodes, commonly known as a 3C3D cluster. + +
+ +
+ +## Note + +1. Before installation, ensure that the system is complete by referring to [System configuration](./Environment-Requirements.md) + +2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which can avoid the problem of modifying the host IP in the later stage and causing the database to fail to start. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name and configure the `cn_internal_address` and `dn_internal_address` of IoTDB using the host name. + ``` shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings. + +4. Whether in linux or windows, ensure that the IoTDB installation path does not contain Spaces and Chinese characters to avoid software exceptions. + +5. Please note that when installing and deploying IoTDB (including activating and using software), it is necessary to use the same user for operations. You can: +- Using root user (recommended): Using root user can avoid issues such as permissions. +- Using a fixed non root user: + - Using the same user operation: Ensure that the same user is used for start, activation, stop, and other operations, and do not switch users. + - Avoid using sudo: Try to avoid using sudo commands as they execute commands with root privileges, which may cause confusion or security issues. + +6. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of database operation status at any time. The monitoring panel can be obtained by contacting the business department,The steps for deploying a monitoring panel can refer to:[Monitoring Panel Deployment](./Monitoring-panel-deployment.md) + +## Preparation Steps + +1. Prepare the IoTDB database installation package: iotdb enterprise- {version}-bin.zip(The installation package can be obtained from:[IoTDB-Package](../Deployment-and-Maintenance/IoTDB-Package_timecho.md)) +2. Configure the operating system environment according to environmental requirements(The system environment configuration can be found in:[Environment Requirement](https://www.timecho.com/docs/UserGuide/latest/Deployment-and-Maintenance/Environment-Requirements.html)) + +## Installation Steps + +Assuming there are three Linux servers now, the IP addresses and service roles are assigned as follows: + +| Node IP | Host Name | Service | +| ----------- | --------- | -------------------- | +| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | +| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | +| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | + +### Set Host Name + +On three machines, configure the host names separately. To set the host names, configure `/etc/hosts` on the target server. 
Use the following command: + +```Bash +echo "192.168.1.3 iotdb-1" >> /etc/hosts +echo "192.168.1.4 iotdb-2" >> /etc/hosts +echo "192.168.1.5 iotdb-3" >> /etc/hosts +``` + +### Configuration + +Unzip the installation package and enter the installation directory + +```Plain +unzip iotdb-enterprise-{version}-bin.zip +cd iotdb-enterprise-{version}-bin +``` + +#### Environment script configuration + +- `./conf/confignode-env.sh` configuration + + | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | + | :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | + | MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +- `./conf/datanode-env.sh` configuration + + | **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | + | :---------------- | :----------------------------------------------------------- | :---------- | :----------------------------------------------------------- | :---------------------------------- | + | MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | - | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +#### General Configuration + +Open the general configuration file `./conf/iotdb-system.properties`,The following parameters can be set according to the deployment method: + +| **Configuration** | **Description** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | +| ------------------------- | ------------------------------------------------------------ | -------------- | -------------- | -------------- | +| cluster_name | Cluster Name | defaultCluster | defaultCluster | defaultCluster | +| schema_replication_factor | The number of metadata replicas, the number of DataNodes should not be less than this number | 3 | 3 | 3 | +| data_replication_factor | The number of data replicas should not be less than this number of DataNodes | 2 | 2 | 2 | + +#### ConfigNode Configuration + +Open the ConfigNode configuration file `./conf/iotdb-system.properties`,Set the following parameters + +| **Configuration** | **Description** | **Default** | **Recommended value** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | Note | +| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | +| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | +| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | 10710 | 10710 | 10710 | Cannot be modified after initial startup | +| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | 10720 | 10720 | 10720 | Cannot be modified after initial startup | +| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, 
`cn_internal_address:cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's `cn_internal-address: cn_internal_port` | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | + +#### DataNode Configuration + +Open DataNode Configuration File `./conf/iotdb-system.properties`,Set the following parameters: + +| **Configuration** | **Description** | **Default** | **Recommended value** | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | Note | +| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | ------------- | ------------- | ------------- | ---------------------------------------- | +| dn_rpc_address | The address of the client RPC service | 127.0.0.1 | Recommend using the **IPV4 address or hostname** of the server where it is located | iotdb-1 |iotdb-2 | iotdb-3 | Restarting the service takes effect | +| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | 6667 | 6667 | 6667 | Restarting the service takes effect | +| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | iotdb-1 | iotdb-2 | iotdb-3 | Cannot be modified after initial startup | +| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | 10730 | 10730 | 10730 | Cannot be modified after initial startup | +| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | 10740 | 10740 | 10740 | Cannot be modified after initial startup | +| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | 10750 | 10750 | 10750 | Cannot be modified after initial startup | +| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | 10760 | 10760 | 10760 | Cannot be modified after initial startup | +| dn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, i.e. `cn_internal-address: cn_internal_port` | 127.0.0.1:10710 | The first CongfigNode's cn_internal-address: cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | Cannot be modified after initial startup | + +> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. Please ensure that the modified files are saved persistently, otherwise the configuration items will not take effect + +### Start ConfigNode + +Start the first confignode of IoTDB-1 first, ensuring that the seed confignode node starts first, and then start the second and third confignode nodes in sequence + +```Bash +cd sbin +./start-confignode.sh -d #"- d" parameter will start in the background +``` + +If the startup fails, please refer to [Common Questions](#common-questions). 
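+
+Before moving on to activation, it can be useful to confirm that each ConfigNode process actually came up on its host. A minimal sketch of such a check, run from the installation directory (the log file name is an assumption and may differ between versions):
+
+```Bash
+# Confirm the ConfigNode JVM is running (requires a JDK that provides jps)
+jps | grep ConfigNode
+
+# Inspect the most recent ConfigNode log; the exact file name may vary by version
+tail -n 50 logs/log_confignode_all.log
+```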
+ + +### Activate Database + +#### Method 1: Activate file copy activation + +- After starting three confignode nodes in sequence, copy the `activation` folder of each machine and the `system_info` file of each machine to the Timecho staff; +- The staff will return the license files for each ConfigNode node, where 3 license files will be returned; +- Put the three license files into the `activation` folder of the corresponding ConfigNode node; + +#### Method 2: Activate Script Activation + +- Obtain the machine codes of three machines in sequence, enter the `sbin` directory of the installation directory, and execute the activation script `start activate.sh`: + + ```Bash + cd sbin + ./start-activate.sh + ``` + +- The following information is displayed, where the machine code of one machine is displayed: + + ```Bash + Please copy the system_info's content and send it to Timecho: + 01-KU5LDFFN-PNBEHDRH + Please enter license: + ``` + +- The other two nodes execute the activation script `start activate.sh` in sequence, and then copy the machine codes of the three machines obtained to the Timecho staff +- The staff will return 3 activation codes, which normally correspond to the order of the provided 3 machine codes. Please paste each activation code into the previous command line prompt `Please enter license:`, as shown below: + + ```Bash + Please enter license: + Jw+MmF+Atxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx5bAOXNeob5l+HO5fEMgzrW8OJPh26Vl6ljKUpCvpTiw== + License has been stored to sbin/../activation/license + Import completed. Please start cluster and excute 'show cluster' to verify activation status + ``` + +### Start DataNode + + Enter the `sbin` directory of iotdb and start three datanode nodes in sequence: + +```Bash +cd sbin +./start-datanode.sh -d #"- d" parameter will start in the background +``` + +### Verify Deployment + +Can be executed directly Cli startup script in `./sbin` directory: + +```Plain +./start-cli.sh -h ip(local IP or domain name) -p port(6667) +``` + + After successful startup, the following interface will appear displaying successful installation of IOTDB. + +![](https://alioss.timecho.com/docs/img/%E4%BC%81%E4%B8%9A%E7%89%88%E6%88%90%E5%8A%9F.png) + +After the installation success interface appears, continue to check if the activation is successful and use the `show cluster` command. + +When you see the display of `Activated` on the far right, it indicates successful activation. + +![](https://alioss.timecho.com/docs/img/%E4%BC%81%E4%B8%9A%E7%89%88%E6%BF%80%E6%B4%BB.png) + +> The appearance of `ACTIVATED (W)` indicates passive activation, which means that this Configurable Node does not have a license file (or has not issued the latest license file with a timestamp), and its activation depends on other Activated Configurable Nodes in the cluster. At this point, it is recommended to check if the license file has been placed in the license folder. If not, please place the license file. If a license file already exists, it may be due to inconsistency between the license file of this node and the information of other nodes. Please contact Timecho staff to reapply. 
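+
+Beyond `show cluster`, a quick end-to-end check is to write and read back a throwaway test point from the CLI. A minimal sketch, assuming default credentials and the hypothetical test path `root.test.d1.s1` (remove it afterwards):
+
+```Bash
+./start-cli.sh -h iotdb-1 -p 6667
+
+# Inside the CLI: write one point, read it back, then clean up the test path
+IoTDB> insert into root.test.d1(timestamp, s1) values(now(), 1)
+IoTDB> select s1 from root.test.d1
+IoTDB> delete timeseries root.test.d1.s1
+```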
+ +## Node Maintenance Steps + +### ConfigNode Node Maintenance + +ConfigNode node maintenance is divided into two types of operations: adding and removing ConfigNodes, with two common use cases: +- Cluster expansion: For example, when there is only one ConfigNode in the cluster, and you want to increase the high availability of ConfigNode nodes, you can add two ConfigNodes, making a total of three ConfigNodes in the cluster. +- Cluster failure recovery: When the machine where a ConfigNode is located fails, making the ConfigNode unable to run normally, you can remove this ConfigNode and then add a new ConfigNode to the cluster. + +> ❗️Note, after completing ConfigNode node maintenance, you need to ensure that there are 1 or 3 ConfigNodes running normally in the cluster. Two ConfigNodes do not have high availability, and more than three ConfigNodes will lead to performance loss. + +#### Adding ConfigNode Nodes + +Script command: +```shell +# Linux / MacOS +# First switch to the IoTDB root directory +sbin/start-confignode.sh + +# Windows +# First switch to the IoTDB root directory +sbin/start-confignode.bat +``` + +Parameter introduction: + +| Parameter | Description | Is it required | +| :--- | :--------------------------------------------- | :----------- | +| -v | Show version information | No | +| -f | Run the script in the foreground, do not put it in the background | No | +| -d | Start in daemon mode, i.e. run in the background | No | +| -p | Specify a file to store the process ID for process management | No | +| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | +| -g | Print detailed garbage collection (GC) information | No | +| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | +| -E | Specify the path of the JVM error log file | No | +| -D | Define system properties, in the format key=value | No | +| -X | Pass -XX parameters directly to the JVM | No | +| -h | Help instruction | No | + +#### Removing ConfigNode Nodes + +First connect to the cluster through the CLI and confirm the internal address and port number of the ConfigNode you want to remove by using `show confignodes`: + +```Bash +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +Then use the script to remove the DataNode. Script command: + +```Bash +# Linux / MacOS +sbin/remove-confignode.sh [confignode_id] + +#Windows +sbin/remove-confignode.bat [confignode_id] + +``` + +### DataNode Node Maintenance + +There are two common scenarios for DataNode node maintenance: + +- Cluster expansion: For the purpose of expanding cluster capabilities, add new DataNodes to the cluster +- Cluster failure recovery: When a machine where a DataNode is located fails, making the DataNode unable to run normally, you can remove this DataNode and add a new DataNode to the cluster + +> ❗️Note, in order for the cluster to work normally, during the process of DataNode node maintenance and after the maintenance is completed, the total number of DataNodes running normally should not be less than the number of data replicas (usually 2), nor less than the number of metadata replicas (usually 3). 
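+
+Before scaling DataNodes down, it can also help to confirm the replication factors the cluster is actually running with, since they determine how many DataNodes must stay online. A hedged sketch from the CLI (the exact columns of `show variables` may differ by version):
+
+```Bash
+# Check the effective replication factors before removing any DataNode
+IoTDB> show variables
+# With data_replication_factor = 2 and schema_replication_factor = 3, as configured in this guide,
+# at least 3 DataNodes must remain in Running state at all times.
+```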
+ +#### Adding DataNode Nodes + +Script command: + +```Bash +# Linux / MacOS +# First switch to the IoTDB root directory +sbin/start-datanode.sh + +# Windows +# First switch to the IoTDB root directory +sbin/start-datanode.bat +``` + +Parameter introduction: + +| Abbreviation | Description | Is it required | +| :--- | :--------------------------------------------- | :----------- | +| -v | Show version information | No | +| -f | Run the script in the foreground, do not put it in the background | No | +| -d | Start in daemon mode, i.e. run in the background | No | +| -p | Specify a file to store the process ID for process management | No | +| -c | Specify the path to the configuration file folder, the script will load the configuration file from here | No | +| -g | Print detailed garbage collection (GC) information | No | +| -H | Specify the path of the Java heap dump file, used when JVM memory overflows | No | +| -E | Specify the path of the JVM error log file | No | +| -D | Define system properties, in the format key=value | No | +| -X | Pass -XX parameters directly to the JVM | No | +| -h | Help instruction | No | + +Note: After adding a DataNode, as new writes arrive (and old data expires, if TTL is set), the cluster load will gradually balance towards the new DataNode, eventually achieving a balance of storage and computation resources on all nodes. + +#### Removing DataNode Nodes + +First connect to the cluster through the CLI and confirm the RPC address and port number of the DataNode you want to remove with `show datanodes`: + +```Bash +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 0.0.0.0| 6667| 0| 0| +| 2|Running| 0.0.0.0| 6668| 1| 1| +| 3|Running| 0.0.0.0| 6669| 1| 0| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 3 +It costs 0.110s +``` + +Then use the script to remove the DataNode. Script command: + +```Bash +# Linux / MacOS +sbin/remove-datanode.sh [datanode_id] + +#Windows +sbin/remove-datanode.bat [datanode_id] +``` + +## Common Questions +1. Multiple prompts indicating activation failure during deployment process + - Use the `ls -al` command: Use the `ls -al` command to check if the owner information of the installation package root directory is the current user. + - Check activation directory: Check all files in the `./activation` directory and whether the owner information is the current user. + +2. Confignode failed to start + + Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. + + Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho Technical Support personnel for consultation on solutions. + + Step 3: If it is the first deployment or data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart. + + Step 4: Clean up the environment: + + a. Terminate all ConfigNode Node and DataNode processes. + ```Bash + # 1. Stop the ConfigNode and DataNode services + sbin/stop-standalone.sh + + # 2. Check for any remaining processes + jps + # Or + ps -ef|gerp iotdb + + # 3. 
If there are any remaining processes, manually kill the + kill -9 + # If you are sure there is only one iotdb on the machine, you can use the following command to clean up residual processes + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + b. Delete the data and logs directories. + + Explanation: Deleting the data directory is necessary, deleting the logs directory is for clean logs and is not mandatory. + + ```Bash + cd /data/iotdb + rm -rf data logs + ``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Database-Resources.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Database-Resources.md new file mode 100644 index 00000000..59a380db --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Database-Resources.md @@ -0,0 +1,194 @@ + +# Database Resources +## CPU + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Number of timeseries (frequency<=1HZ)CPUNumber of nodes
standalone modeDouble activeDistributed
Within 1000002core-4core123
Within 3000004core-8core123
Within 5000008core-26core123
Within 100000016core-32core123
Within 200000032core-48core123
Within 1000000048core12Please contact Timecho Business for consultation
Over 10000000Please contact Timecho Business for consultation
+ +## Memory + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Number of timeseries (frequency<=1HZ)MemoryNumber of nodes
standalone modeDouble activeDistributed
Within 1000004G-8G123
Within 30000012G-32G123
Within 50000024G-48G123
Within 100000032G-96G123
Within 200000064G-128G123
Within 10000000128G12Please contact Timecho Business for consultation
Over 10000000Please contact Timecho Business for consultation
+ +## Storage (Disk) +### Storage space +Calculation formula: Number of measurement points * Sampling frequency (Hz) * Size of each data point (Byte, different data types may vary, see table below) * Storage time (seconds) * Number of copies (usually 1 copy for a single node and 2 copies for a cluster) ÷ Compression ratio (can be estimated at 5-10 times, but may be higher in actual situations) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Data point size calculation
data typeTimestamp (Bytes) Value (Bytes) Total size of data points (in bytes) +
Boolean819
INT32/FLOAT8412
INT64/DOUBLE8816
TEXT8The average is a8+a
+ +Example: 1000 devices, each with 100 measurement points, a total of 100000 sequences, INT32 type. Sampling frequency 1Hz (once per second), storage for 1 year, 3 copies. +- Complete calculation formula: 1000 devices * 100 measurement points * 12 bytes per data point * 86400 seconds per day * 365 days per year * 3 copies/10 compression ratio=11T +- Simplified calculation formula: 1000 * 100 * 12 * 86400 * 365 * 3/10=11T +### Storage Configuration +If the number of nodes is over 10000000 or the query load is high, it is recommended to configure SSD +## Network (Network card) +If the write throughput does not exceed 10 million points/second, configure 1Gbps network card. When the write throughput exceeds 10 million points per second, a 10Gbps network card needs to be configured. +| **Write throughput (data points per second)** | **NIC rate** | +| ------------------- | ------------- | +| <10 million | 1Gbps | +| >=10 million | 10Gbps | +## Other instructions +IoTDB has the ability to scale up clusters in seconds, and expanding node data does not require migration. Therefore, you do not need to worry about the limited cluster capacity estimated based on existing data. In the future, you can add new nodes to the cluster when you need to scale up. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md new file mode 100644 index 00000000..813bcfbb --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -0,0 +1,418 @@ + +# Docker Deployment + +## Environmental Preparation + +### Docker Installation + +```SQL +#Taking Ubuntu as an example, other operating systems can search for installation methods themselves +#step1: Install some necessary system tools +sudo apt-get update +sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common +#step2: Install GPG certificate +curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add - +#step3: Write software source information +sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" +#step4: Update and install Docker CE +sudo apt-get -y update +sudo apt-get -y install docker-ce +#step5: Set Docker to start automatically upon startup +sudo systemctl enable docker +#step6: Verify if Docker installation is successful +docker --version #Display version information, indicating successful installation +``` + +### Docker-compose Installation + +```SQL +#Installation command +curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose +chmod +x /usr/local/bin/docker-compose +ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose +#Verify if the installation was successful +docker-compose --version #Displaying version information indicates successful installation +``` + +## Stand-Alone Deployment + +This section demonstrates how to deploy a standalone Docker version of 1C1D. 
+ +### Pull Image File + +The Docker image of Apache IoTDB has been uploaded tohttps://hub.docker.com/r/apache/iotdb。 + +Taking obtaining version 1.3.2 as an example, pull the image command: + +```bash +docker pull apache/iotdb:1.3.2-standalone +``` + +View image: + +```bash +docker images +``` + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E6%8B%89%E5%8F%96%E9%95%9C%E5%83%8F.PNG) + +### Create Docker Bridge Network + +```Bash +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +``` + +### Write The Yml File For Docker-Compose + +Here we take the example of consolidating the IoTDB installation directory and yml files in the/docker iotdb folder: + +The file directory structure is:`/docker-iotdb/iotdb`, `/docker-iotdb/docker-compose-standalone.yml ` + +```bash +docker-iotdb: +├── iotdb #Iotdb installation directory +│── docker-compose-standalone.yml #YML file for standalone Docker Composer +``` + +The complete docker-compose-standalone.yml content is as follows: + +```bash +version: "3" +services: + iotdb-service: + image: apache/iotdb:1.3.2-standalone #The image used + hostname: iotdb + container_name: iotdb + restart: always + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb:10710 + - dn_rpc_address=iotdb + - dn_internal_address=iotdb + - dn_rpc_port=6667 + - dn_internal_port=10730 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb:10710 + privileged: true + volumes: + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro + networks: + iotdb: + ipv4_address: 172.18.0.6 +networks: + iotdb: + external: true +``` + +### Start IoTDB + +Use the following command to start: + +```bash +cd /docker-iotdb +docker-compose -f docker-compose-standalone.yml up -d #Background startup +``` + +### Validate Deployment + +- Viewing the log, the following words indicate successful startup + +```SQL +docker logs -f iotdb-datanode #View log command +2024-07-21 08:22:38,457 [main] INFO o.a.i.db.service.DataNode:227 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! +``` + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B2.png) + +- Enter the container to view the service running status and activation information + +View the launched container + +```SQL +docker ps +``` + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B22.png) + +Enter the container, log in to the database through CLI, and use the `show cluster` command to view the service status and activation status + +```SQL +docker exec -it iotdb /bin/bash #Entering the container +./start-cli.sh -h iotdb #Log in to the database +IoTDB> show cluster #View status +``` + +You can see that all services are running and the activation status shows as activated. 
+ +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B23.png) + +### Map/conf Directory (optional) + +If you want to directly modify the configuration file in the physical machine in the future, you can map the/conf folder in the container in three steps: + +Step 1: Copy the /conf directory from the container to `/docker-iotdb/iotdb/conf` + +```bash +docker cp iotdb:/iotdb/conf /docker-iotdb/iotdb/conf +``` + +Step 2: Add mappings in docker-compose-standalone.yml + +```bash + volumes: + - ./iotdb/conf:/iotdb/conf #Add mapping for this/conf folder + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro +``` + +Step 3: Restart IoTDB + +```bash +docker-compose -f docker-compose-standalone.yml up -d +``` + +## Cluster Deployment + +This section describes how to manually deploy an instance that includes 3 Config Nodes and 3 Data Nodes, commonly known as a 3C3D cluster. + +
+ +
+ +**Note: The cluster version currently only supports host and overlay networks, and does not support bridge networks.** + +Taking the host network as an example, we will demonstrate how to deploy a 3C3D cluster. + +### Set Host Name + +Assuming there are currently three Linux servers, the IP addresses and service role assignments are as follows: + +| Node IP | Host Name | Service | +| ----------- | --------- | -------------------- | +| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | +| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | +| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | + +Configure the host names on three machines separately. To set the host names, configure `/etc/hosts` on the target server using the following command: + +```Bash +echo "192.168.1.3 iotdb-1" >> /etc/hosts +echo "192.168.1.4 iotdb-2" >> /etc/hosts +echo "192.168.1.5 iotdb-3" >> /etc/hosts +``` + +### Pull Image File + +The Docker image of Apache IoTDB has been uploaded tohttps://hub.docker.com/r/apache/iotdb。 + +Pull IoTDB images from three servers separately, taking version 1.3.2 as an example. The pull image command is: + +```SQL +docker pull apache/iotdb:1.3.2-standalone +``` + +View image: + +```SQL +docker images +``` + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%881.png) + +### Write The Yml File For Docker Compose + +Here we take the example of consolidating the IoTDB installation directory and yml files in the `/docker-iotdb` folder: + +The file directory structure is :`/docker-iotdb/iotdb`, `/docker-iotdb/confignode.yml`,`/docker-iotdb/datanode.yml` + +```SQL +docker-iotdb: +├── confignode.yml #Yml file of confignode +├── datanode.yml #Yml file of datanode +└── iotdb #IoTDB installation directory +``` + +On each server, two yml files need to be written, namely confignnode. yml and datanode. yml. 
The example of yml is as follows: + +**confignode.yml:** + +```bash +#confignode.yml +version: "3" +services: + iotdb-confignode: + image: iotdb-enterprise:1.3.2.3-standalone #The image used + hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + container_name: iotdb-confignode + command: ["bash", "-c", "entrypoint.sh confignode"] + restart: always + environment: + - cn_internal_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-1:10710 #The default first node is the seed node + - schema_replication_factor=3 #Number of metadata copies + - data_replication_factor=2 #Number of data replicas + privileged: true + volumes: + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + network_mode: "host" #Using the host network +``` + +**datanode.yml:** + +```bash +#datanode.yml +version: "3" +services: + iotdb-datanode: + image: iotdb-enterprise:1.3.2.3-standalone #The image used + hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + container_name: iotdb-datanode + command: ["bash", "-c", "entrypoint.sh datanode"] + restart: always + ports: + - "6667:6667" + privileged: true + environment: + - dn_rpc_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + - dn_internal_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + - dn_seed_config_node=iotdb-1:10710 #The default first node is the seed node + - dn_rpc_port=6667 + - dn_internal_port=10730 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - schema_replication_factor=3 #Number of metadata copies + - data_replication_factor=2 #Number of data replicas + volumes: + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + network_mode: "host" #Using the host network +``` + +### Starting Confignode For The First Time + +First, start configNodes on each of the three servers to obtain the machine code. Pay attention to the startup order, start the first iotdb-1 first, then start iotdb-2 and iotdb-3. + +```bash +cd /docker-iotdb +docker-compose -f confignode.yml up -d #Background startup +``` + +### Start Datanode + +Start datanodes on 3 servers separately + +```SQL +cd /docker-iotdb +docker-compose -f datanode.yml up -d #Background startup +``` + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%882.png) + +### Validate Deployment + +- Viewing the logs, the following words indicate that the datanode has successfully started + + ```SQL + docker logs -f iotdb-datanode #View log command + 2024-07-21 09:40:58,120 [main] INFO o.a.i.db.service.DataNode:227 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! 
+ ``` + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%883.png) + +- Enter any container to view the service running status and activation information + + View the launched container + + ```SQL + docker ps + ``` + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%884.png) + + Enter the container, log in to the database through CLI, and use the `show cluster` command to view the service status and activation status + + ```SQL + docker exec -it iotdb-datanode /bin/bash #Entering the container + ./start-cli.sh -h iotdb-1 #Log in to the database + IoTDB> show cluster #View status + ``` + + You can see that all services are running and the activation status shows as activated. + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%885.png) + +### Map/conf Directory (optional) + +If you want to directly modify the configuration file in the physical machine in the future, you can map the/conf folder in the container in three steps: + +Step 1: Copy the `/conf` directory from the container to `/docker-iotdb/iotdb/conf` on each of the three servers + +```bash +docker cp iotdb-confignode:/iotdb/conf /docker-iotdb/iotdb/conf +or +docker cp iotdb-datanode:/iotdb/conf /docker-iotdb/iotdb/conf +``` + +Step 2: Add `/conf` directory mapping in `confignode.yml` and `datanode. yml` on 3 servers + +```bash +#confignode.yml + volumes: + - ./iotdb/conf:/iotdb/conf #Add mapping for this /conf folder + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro + +#datanode.yml + volumes: + - ./iotdb/conf:/iotdb/conf #Add mapping for this /conf folder + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro +``` + +Step 3: Restart IoTDB on 3 servers + +```bash +cd /docker-iotdb +docker-compose -f confignode.yml up -d +docker-compose -f datanode.yml up -d +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_timecho.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_timecho.md new file mode 100644 index 00000000..a9c8daac --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_timecho.md @@ -0,0 +1,475 @@ + +# Docker Deployment + +## Environmental Preparation + +### Docker Installation + +```Bash +#Taking Ubuntu as an example, other operating systems can search for installation methods themselves +#step1: Install some necessary system tools +sudo apt-get update +sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common +#step2: Install GPG certificate +curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add - +#step3: Write software source information +sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" +#step4: Update and install Docker CE +sudo apt-get -y update +sudo apt-get -y install docker-ce +#step5: Set Docker to start automatically upon startup +sudo systemctl enable docker +#step6: Verify if Docker installation is successful +docker --version #Display version information, indicating successful installation +``` + +### Docker-compose Installation + +```Bash +#Installation command +curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose +chmod +x /usr/local/bin/docker-compose +ln -s /usr/local/bin/docker-compose 
/usr/bin/docker-compose +#Verify if the installation was successful +docker-compose --version #Displaying version information indicates successful installation +``` + +### Install The Dmidecode Plugin + +By default, Linux servers should already be installed. If not, you can use the following command to install them. + +```Bash +sudo apt-get install dmidecode +``` + +After installing dmidecode, search for the installation path: `wherever dmidecode`. Assuming the result is `/usr/sbin/dmidecode`, remember this path as it will be used in the later docker compose yml file. + +### Get Container Image Of IoTDB + +You can contact business or technical support to obtain container images for IoTDB Enterprise Edition. + +## Stand-Alone Deployment + +This section demonstrates how to deploy a standalone Docker version of 1C1D. + +### Load Image File + +For example, the container image file name of IoTDB obtained here is: `iotdb-enterprise-1.3.2-3-standalone-docker.tar.gz` + +Load image: + +```Bash +docker load -i iotdb-enterprise-1.3.2.3-standalone-docker.tar.gz +``` + +View image: + +```Bash +docker images +``` + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E6%9F%A5%E7%9C%8B%E9%95%9C%E5%83%8F.PNG) + +### Create Docker Bridge Network + +```Bash +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +``` + +### Write The Yml File For docker-compose + +Here we take the example of consolidating the IoTDB installation directory and yml files in the/docker iotdb folder: + +The file directory structure is:`/docker-iotdb/iotdb`, `/docker-iotdb/docker-compose-standalone.yml ` + +```Bash +docker-iotdb: +├── iotdb #Iotdb installation directory +│── docker-compose-standalone.yml #YML file for standalone Docker Composer +``` + +The complete docker-compose-standalone.yml content is as follows: + +```Bash +version: "3" +services: + iotdb-service: + image: iotdb-enterprise:1.3.2.3-standalone #The image used + hostname: iotdb + container_name: iotdb + restart: always + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb:10710 + - dn_rpc_address=iotdb + - dn_internal_address=iotdb + - dn_rpc_port=6667 + - dn_internal_port=10730 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb:10710 + privileged: true + volumes: + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + networks: + iotdb: + ipv4_address: 172.18.0.6 +networks: + iotdb: + external: true +``` + +### First Launch + +Use the following command to start: + +```Bash +cd /docker-iotdb +docker-compose -f docker-compose-standalone.yml up +``` + +Due to lack of activation, it is normal to exit directly upon initial startup. The initial startup is to obtain the machine code file for the subsequent activation process. + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E6%BF%80%E6%B4%BB.png) + +### Apply For Activation + +- After the first startup, a system_info file will be generated in the physical machine directory `/docker-iotdb/iotdb/activation`, and this file will be copied to the Timecho staff. 
+ + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB1.png) + +- Received the license file returned by the staff, copy the license file to the `/docker iotdb/iotdb/activation` folder. + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB2.png) + +### Restart IoTDB + +```Bash +docker-compose -f docker-compose-standalone.yml up -d +``` + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8iotdb.png) + +### Validate Deployment + +- Viewing the log, the following words indicate successful startup + + ```Bash + docker logs -f iotdb-datanode #View log command + 2024-07-19 12:02:32,608 [main] INFO o.a.i.db.service.DataNode:231 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! + ``` + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B21.png) + +- Enter the container to view the service running status and activation information + + View the launched container + + ```Bash + docker ps + ``` + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B22.png) + + Enter the container, log in to the database through CLI, and use the `show cluster` command to view the service status and activation status + + ```Bash + docker exec -it iotdb /bin/bash #Entering the container + ./start-cli.sh -h iotdb #Log in to the database + IoTDB> show cluster #View status + ``` + + You can see that all services are running and the activation status shows as activated. + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B23.png) + +### Map/conf Directory (optional) + +If you want to directly modify the configuration file in the physical machine in the future, you can map the/conf folder in the container in three steps: + +Step 1: Copy the/conf directory from the container to/docker-iotdb/iotdb/conf + +```Bash +docker cp iotdb:/iotdb/conf /docker-iotdb/iotdb/conf +``` + +Step 2: Add mappings in docker-compose-standalone.yml + +```Bash + volumes: + - ./iotdb/conf:/iotdb/conf #Add mapping for this/conf folder + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro +``` + +Step 3: Restart IoTDB + +```Bash +docker-compose -f docker-compose-standalone.yml up -d +``` + +## Cluster Deployment + +This section describes how to manually deploy an instance that includes 3 Config Nodes and 3 Data Nodes, commonly known as a 3C3D cluster. + +
+ +
+ +**Note: The cluster version currently only supports host and overlay networks, and does not support bridge networks.** + +Taking the host network as an example, we will demonstrate how to deploy a 3C3D cluster. + +### Set Host Name + +Assuming there are currently three Linux servers, the IP addresses and service role assignments are as follows: + +| Node IP | Host Name | Service | +| ----------- | --------- | -------------------- | +| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | +| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | +| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | + +Configure the host names on three machines separately. To set the host names, configure `/etc/hosts` on the target server using the following command: + +```Bash +echo "192.168.1.3 iotdb-1" >> /etc/hosts +echo "192.168.1.4 iotdb-2" >> /etc/hosts +echo "192.168.1.5 iotdb-3" >> /etc/hosts +``` + +### Load Image File + +For example, the container image file name obtained for IoTDB is: `iotdb-enterprise-1.3.23-standalone-docker.tar.gz` + +Execute the load image command on three servers separately: + +```Bash +docker load -i iotdb-enterprise-1.3.2.3-standalone-docker.tar.gz +``` + +View image: + +```Bash +docker images +``` + +![](https://alioss.timecho.com/docs/img/%E9%95%9C%E5%83%8F%E5%8A%A0%E8%BD%BD.png) + +### Write The Yml File For Docker Compose + +Here we take the example of consolidating the IoTDB installation directory and yml files in the /docker-iotdb folder: + +The file directory structure is:/docker-iotdb/iotdb, /docker-iotdb/confignode.yml,/docker-iotdb/datanode.yml + +```Bash +docker-iotdb: +├── confignode.yml #Yml file of confignode +├── datanode.yml #Yml file of datanode +└── iotdb #IoTDB installation directory +``` + +On each server, two yml files need to be written, namely confignnode. yml and datanode. yml. 
The example of yml is as follows: + +**confignode.yml:** + +```Bash +#confignode.yml +version: "3" +services: + iotdb-confignode: + image: iotdb-enterprise:1.3.2.3-standalone #The image used + hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + container_name: iotdb-confignode + command: ["bash", "-c", "entrypoint.sh confignode"] + restart: always + environment: + - cn_internal_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-1:10710 #The default first node is the seed node + - schema_replication_factor=3 #Number of metadata copies + - data_replication_factor=2 #Number of data replicas + privileged: true + volumes: + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + network_mode: "host" #Using the host network +``` + +**datanode.yml:** + +```Bash +#datanode.yml +version: "3" +services: + iotdb-datanode: + image: iotdb-enterprise:1.3.2.3-standalone #The image used + hostname: iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + container_name: iotdb-datanode + command: ["bash", "-c", "entrypoint.sh datanode"] + restart: always + ports: + - "6667:6667" + privileged: true + environment: + - dn_rpc_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + - dn_internal_address=iotdb-1|iotdb-2|iotdb-3 #Choose from three options based on the actual situation + - dn_seed_config_node=iotdb-1:10710 #The default first node is the seed node + - dn_rpc_port=6667 + - dn_internal_port=10730 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - schema_replication_factor=3 #Number of metadata copies + - data_replication_factor=2 #Number of data replicas + volumes: + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + network_mode: "host" #Using the host network +``` + +### Starting Confignode For The First Time + +First, start configNodes on each of the three servers to obtain the machine code. Pay attention to the startup order, start the first iotdb-1 first, then start iotdb-2 and iotdb-3. 
+ +```Bash +cd /docker-iotdb +docker-compose -f confignode.yml up -d #Background startup +``` + +### Apply For Activation + +- After starting three confignodes for the first time, a system_info file will be generated in each physical machine directory `/docker-iotdb/iotdb/activation`, and the system_info files of the three servers will be copied to the Timecho staff; + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB1.png) + +- Put the three license files into the `/docker iotdb/iotdb/activation` folder of the corresponding Configurable Node node; + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB2.png) + +- After the license is placed in the corresponding activation folder, confignode will be automatically activated without restarting confignode + +### Start Datanode + +Start datanodes on 3 servers separately + +```Bash +cd /docker-iotdb +docker-compose -f datanode.yml up -d #Background startup +``` + +![](https://alioss.timecho.com/docs/img/%E9%9B%86%E7%BE%A4%E7%89%88-dn%E5%90%AF%E5%8A%A8.png) + +### Validate Deployment + +- Viewing the logs, the following words indicate that the datanode has successfully started + + ```Bash + docker logs -f iotdb-datanode #View log command + 2024-07-20 16:50:48,937 [main] INFO o.a.i.db.service.DataNode:231 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! + ``` + + ![](https://alioss.timecho.com/docs/img/dn%E5%90%AF%E5%8A%A8.png) + +- Enter any container to view the service running status and activation information + + View the launched container + + ```Bash + docker ps + ``` + + ![](https://alioss.timecho.com/docs/img/%E6%9F%A5%E7%9C%8B%E5%AE%B9%E5%99%A8.png) + + Enter the container, log in to the database through CLI, and use the `show cluster` command to view the service status and activation status + + ```Bash + docker exec -it iotdb-datanode /bin/bash #Entering the container + ./start-cli.sh -h iotdb-1 #Log in to the database + IoTDB> show cluster #View status + ``` + + You can see that all services are running and the activation status shows as activated. + + ![](https://alioss.timecho.com/docs/img/%E9%9B%86%E7%BE%A4-%E6%BF%80%E6%B4%BB.png) + +### Map/conf Directory (optional) + +If you want to directly modify the configuration file in the physical machine in the future, you can map the/conf folder in the container in three steps: + +Step 1: Copy the `/conf` directory from the container to `/docker-iotdb/iotdb/conf` on each of the three servers + +```Bash +docker cp iotdb-confignode:/iotdb/conf /docker-iotdb/iotdb/conf +or +docker cp iotdb-datanode:/iotdb/conf /docker-iotdb/iotdb/conf +``` + +Step 2: Add `/conf` directory mapping in `confignode.yml` and `datanode. 
+
+```Bash
+#confignode.yml
+    volumes:
+      - ./iotdb/conf:/iotdb/conf   #Add the mapping for the /conf folder
+      - ./iotdb/activation:/iotdb/activation
+      - ./iotdb/data:/iotdb/data
+      - ./iotdb/logs:/iotdb/logs
+      - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro
+      - /dev/mem:/dev/mem:ro
+
+#datanode.yml
+    volumes:
+      - ./iotdb/conf:/iotdb/conf   #Add the mapping for the /conf folder
+      - ./iotdb/activation:/iotdb/activation
+      - ./iotdb/data:/iotdb/data
+      - ./iotdb/logs:/iotdb/logs
+      - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro
+      - /dev/mem:/dev/mem:ro
+```
+
+Step 3: Restart IoTDB on the three servers
+
+```Bash
+cd /docker-iotdb
+docker-compose -f confignode.yml up -d
+docker-compose -f datanode.yml up -d
+```
+
diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md
new file mode 100644
index 00000000..2fa40344
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md
@@ -0,0 +1,164 @@
+
+# Dual Active Deployment
+
+## What is the dual active version?
+
+Dual active usually refers to two independent machines (or clusters) that perform real-time mirror synchronization. Their configurations are completely independent and both can receive external writes at the same time. Each machine (or cluster) synchronizes the data written to itself to the other one, and the data of the two machines (or clusters) reaches eventual consistency.
+
+- The two standalone machines (or clusters) form a high availability group: when one of them stops serving, the other one is not affected. When the machine (or cluster) that stopped serving is restarted, the other one synchronizes the newly written data to it. Business applications can bind reads and writes to either of the two machines (or clusters), thereby achieving high availability.
+- The dual active deployment scheme provides high availability with fewer than 3 physical nodes and therefore has an advantage in deployment cost. At the same time, physical isolation of the two machines (or clusters) can be achieved through dual power supplies and dual network links, ensuring operational stability.
+- At present, the dual active capability is a feature of the enterprise version.
+
+![](https://alioss.timecho.com/docs/img/20240731104336.png)
+
+## Note
+
+1. It is recommended to prioritize using `hostname` for IP configuration during deployment, which avoids the problem of the database failing to start after the host IP is changed later. To set the hostname, you need to configure `/etc/hosts` on the target server. For example, if the local IP is 192.168.1.3 and the hostname is iotdb-1, you can use the following command to set the server's hostname and configure IoTDB's `cn_internal_address` and `dn_internal_address` using the hostname.
+
+   ```Bash
+   echo "192.168.1.3 iotdb-1" >> /etc/hosts
+   ```
+
+2. Some parameters cannot be modified after the first startup. Please refer to the "Installation Steps" section below to set them.
+
+3. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of the database operation status at any time. The monitoring panel can be obtained by contacting the business department, and the steps for deploying it can be found in [Monitoring Panel Deployment](https://www.timecho.com/docs/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.html)
+
+## Installation Steps
+
+Taking the dual active version of IoTDB built from two standalone machines A and B as an example, where the IP addresses of A and B are 192.168.1.3 and 192.168.1.4 respectively, and using hostnames to represent the different hosts, the plan is as follows:
+
+| Machine | Machine IP  | Host Name |
+| ------- | ----------- | --------- |
+| A       | 192.168.1.3 | iotdb-1   |
+| B       | 192.168.1.4 | iotdb-2   |
+
+### Step 1: Install Two Independent IoTDBs Separately
+
+Install IoTDB on the two machines separately. For the standalone version, refer to [Stand-Alone Deployment](../Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md); for the cluster version, refer to [Cluster Deployment](../Deployment-and-Maintenance/Cluster-Deployment_timecho.md). **It is recommended that the configurations of A and B remain consistent to achieve the best dual active effect.**
+
+### Step 2: Create A Data Synchronization Task From Machine A To Machine B
+
+- Create a data synchronization task on machine A, so that the data on machine A is automatically synchronized to machine B. Use the CLI tool in the sbin directory to connect to the IoTDB database on machine A:
+
+  ```Bash
+  ./sbin/start-cli.sh -h iotdb-1
+  ```
+
+- Create and start the data synchronization task with the following SQL:
+
+  ```SQL
+  create pipe AB
+  with source (
+  'source.forwarding-pipe-requests' = 'false'
+  )
+  with sink (
+  'sink'='iotdb-thrift-sink',
+  'sink.ip'='iotdb-2',
+  'sink.port'='6667'
+  )
+  ```
+
+- Note: To avoid an infinite data loop, the parameter `source.forwarding-pipe-requests` must be set to `false` on both A and B, indicating that data transmitted from another pipe will not be forwarded.
+
+### Step 3: Create A Data Synchronization Task From Machine B To Machine A
+
+- Create a data synchronization task on machine B, so that the data on machine B is automatically synchronized to machine A. Use the CLI tool in the sbin directory to connect to the IoTDB database on machine B:
+
+  ```Bash
+  ./sbin/start-cli.sh -h iotdb-2
+  ```
+
+  Create and start the pipe with the following SQL:
+
+  ```SQL
+  create pipe BA
+  with source (
+  'source.forwarding-pipe-requests' = 'false'
+  )
+  with sink (
+  'sink'='iotdb-thrift-sink',
+  'sink.ip'='iotdb-1',
+  'sink.port'='6667'
+  )
+  ```
+
+- Note: To avoid an infinite data loop, the parameter `source.forwarding-pipe-requests` must be set to `false` on both A and B, indicating that data transmitted from another pipe will not be forwarded.
+
+### Step 4: Validate Deployment
+
+After the above data synchronization tasks are created, the dual active cluster can be started.
+
+#### Check the running status of the cluster
+
+```Bash
+#Execute the show cluster command on the two nodes respectively to check the status of the IoTDB service
+show cluster
+```
+
+**Machine A**:
+
+![](https://alioss.timecho.com/docs/img/%E5%8F%8C%E6%B4%BB-A.png)
+
+**Machine B**:
+
+![](https://alioss.timecho.com/docs/img/%E5%8F%8C%E6%B4%BB-B.png)
+
+Ensure that every ConfigNode and DataNode is in the Running state.
+
+#### Check synchronization status
+
+- Check the synchronization status on machine A
+
+```Bash
+show pipes
+```
+
+![](https://alioss.timecho.com/docs/img/show%20pipes-A.png)
+
+- Check the synchronization status on machine B
+
+```Bash
+show pipes
+```
+
+![](https://alioss.timecho.com/docs/img/show%20pipes-B.png)
+
+Ensure that every pipe is in the RUNNING state.
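+
+Optionally, you can also confirm that data actually flows in both directions. The sketch below writes a test point on machine A and reads it back on machine B; `root.test.d1` and `s1` are hypothetical names used only for this check (it assumes automatic schema creation is enabled, which is the default), and the same steps can be repeated in the opposite direction to validate pipe BA:
+
+```Bash
+./sbin/start-cli.sh -h iotdb-1                                   #Log in to CLI on machine A
+IoTDB> insert into root.test.d1(timestamp, s1) values(now(), 1)  #Write a test point on A
+
+./sbin/start-cli.sh -h iotdb-2                                   #Log in to CLI on machine B
+IoTDB> select s1 from root.test.d1                               #The point written on A should appear here shortly
+```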
+
+### Step 5: Stop The Dual Active Version IoTDB
+
+- Execute the following command on machine A:
+
+  ```SQL
+  ./sbin/start-cli.sh -h iotdb-1 #Log in to CLI
+  IoTDB> stop pipe AB #Stop the data synchronization process
+  ./sbin/stop-standalone.sh #Stop database service
+  ```
+
+- Execute the following command on machine B:
+
+  ```SQL
+  ./sbin/start-cli.sh -h iotdb-2 #Log in to CLI
+  IoTDB> stop pipe BA #Stop the data synchronization process
+  ./sbin/stop-standalone.sh #Stop database service
+  ```
+
diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Environment-Requirements.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Environment-Requirements.md
new file mode 100644
index 00000000..a1b54472
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Environment-Requirements.md
@@ -0,0 +1,191 @@
+
+# System Requirements
+
+## Disk Array
+
+### Configuration Suggestions
+
+IoTDB has no strict requirements on disk array configuration. It is recommended to use multiple disk arrays to store IoTDB data so that data can be written to multiple disk arrays concurrently. The following suggestions can be used for reference:
+
+1. Physical environment
+   - System disk: it is recommended to use two disks as RAID1, considering only the space occupied by the operating system itself; do not reserve system disk space for IoTDB.
+   - Data disk:
+     - RAID is recommended to protect the data on the disks.
+     - It is recommended to provide multiple disks (1-6) or disk groups for IoTDB. (It is not recommended to build a single disk array out of all disks, as this limits the maximum performance of IoTDB.)
+2. Virtual environment
+   - It is recommended to mount multiple hard disks (1-6).
+
+### Configuration Example
+
+- Example 1: Four 3.5-inch hard disks
+
+Only a few hard disks are installed on the server, so RAID5 can be configured directly.
+The recommended configuration is as follows:
+
+| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** |
+| ----------- | -------- | -------- | --------- | -------- |
+| system/data disk | RAID5 | 4 | 1 (one disk is allowed to fail) | 3 |
+
+- Example 2: Twelve 3.5-inch hard disks
+
+The server is configured with twelve 3.5-inch disks.
+Two disks are recommended as the RAID1 system disk. The remaining ten disks can be divided into two RAID5 groups; each group of five disks provides the capacity of four disks.
+The recommended configuration is as follows:
+
+| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** |
+| -------- | -------- | -------- | --------- | -------- |
+| system disk | RAID1 | 2 | 1 | 1 |
+| data disk | RAID5 | 5 | 1 | 4 |
+| data disk | RAID5 | 5 | 1 | 4 |
+
+- Example 3: Twenty-four 2.5-inch disks
+
+The server is configured with 24 2.5-inch disks.
+Two disks are recommended as the RAID1 system disk. The remaining 22 disks can be divided into three RAID5 groups of seven disks each, with each group providing the capacity of six disks. The one remaining disk can be left idle or used to store write-ahead logs.
+The recommended configurations are as follows: +| **Use classification** | **Raid type** | **Disk number** | **Redundancy** | **Available capacity** | +| -------- | -------- | -------- | --------- | -------- | +| system disk | RAID1 | 2 | 1 | 1 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | RAID5 | 7 | 1 | 6 | +| data disk | NoRaid | 1 | 0 | 1 | + +## Operating System + +### Version Requirements + +IoTDB supports operating systems such as Linux, Windows, and MacOS, while the enterprise version supports domestic CPUs such as Loongson, Phytium, and Kunpeng. It also supports domestic server operating systems such as Neokylin, KylinOS, UOS, and Linx. + +### Disk Partition + +- The default standard partition mode is recommended. LVM extension and hard disk encryption are not recommended. +- The system disk needs only the space used by the operating system, and does not need to reserve space for the IoTDB. +- Each disk group corresponds to only one partition. Data disks (with multiple disk groups, corresponding to raid) do not need additional partitions. All space is used by the IoTDB. +The following table lists the recommended disk partitioning methods. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Disk classificationDisk setDriveCapacityFile system type
System diskDisk group0/boot1GBAcquiesce
/Remaining space of the disk groupAcquiesce
Data diskDisk set1/data1Full space of disk group1Acquiesce
Disk set2/data2Full space of disk group2Acquiesce
......
+### Network Configuration + +1. Disable the firewall + +```Bash +# View firewall +systemctl status firewalld +# Disable firewall +systemctl stop firewalld +# Disable firewall permanently +systemctl disable firewalld +``` +2. Ensure that the required port is not occupied + +(1) Check the ports occupied by the cluster: In the default cluster configuration, ConfigNode occupies ports 10710 and 10720, and DataNode occupies ports 6667, 10730, 10740, 10750, 10760, 9090, 9190, and 3000. Ensure that these ports are not occupied. Check methods are as follows: + +```Bash +lsof -i:6667 or netstat -tunp | grep 6667 +lsof -i:10710 or netstat -tunp | grep 10710 +lsof -i:10720 or netstat -tunp | grep 10720 +# If the command outputs, the port is occupied. +``` + +(2) Checking the port occupied by the cluster deployment tool: When using the cluster management tool opskit to install and deploy the cluster, enable the SSH remote connection service configuration and open port 22. + +```Bash +yum install openssh-server # Install the ssh service +systemctl start sshd # Enable port 22 +``` + +3. Ensure that servers are connected to each other + +### Other Configuration + +1. Reduce the system swap priority to the lowest level + +```Bash +echo "vm.swappiness = 0">> /etc/sysctl.conf +# The swapoff -a and swapon -a commands are executed together to dump the data in swap back to memory and to empty the data in swap. +# Do not omit the swappiness setting and just execute swapoff -a; Otherwise, swap automatically opens again after the restart, making the operation invalid. +swapoff -a && swapon -a +# Make the configuration take effect without restarting. +sysctl -p +# Swap's used memory has become 0 +free -m +``` +2. Set the maximum number of open files to 65535 to avoid the error of "too many open files". + +```Bash +# View current restrictions +ulimit -n +# Temporary changes +ulimit -n 65535 +# Permanent modification +echo "* soft nofile 65535" >> /etc/security/limits.conf +echo "* hard nofile 65535" >> /etc/security/limits.conf +# View after exiting the current terminal session, expect to display 65535 +ulimit -n +``` +## Software Dependence + +Install the Java runtime environment (Java version >= 1.8). Ensure that jdk environment variables are set. (It is recommended to deploy JDK17 for V1.3.2.2 or later. In some scenarios, the performance of JDK of earlier versions is compromised, and Datanodes cannot be stopped.) 
+ +```Bash +# The following is an example of installing in centos7 using JDK-17: +tar -zxvf JDk-17_linux-x64_bin.tar # Decompress the JDK file +Vim ~/.bashrc # Configure the JDK environment +{ export JAVA_HOME=/usr/lib/jvm/jdk-17.0.9 + export PATH=$JAVA_HOME/bin:$PATH +} # Add JDK environment variables +source ~/.bashrc # The configuration takes effect +java -version # Check the JDK environment +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_apache.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_apache.md new file mode 100644 index 00000000..aab760b7 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_apache.md @@ -0,0 +1,42 @@ + +# Package Acquisition + +## How to obtain installation packages +The installation package can be directly obtained from the Apache IoTDB official website:https://iotdb.apache.org/Download/ + +## Installation Package Structure +Install the package after decompression(`apache-iotdb--all-bin.zip`),After decompressing the installation package, the directory structure is as follows: +| **catalogue** | **Type** | **Explanation** | +| :--------------: | :------: | :----------------------------------------------------------: | +| conf | folder | Configuration file directory, including configuration files such as ConfigNode, DataNode, JMX, and logback | +| data | folder | The default data file directory contains data files for ConfigNode and DataNode. (The directory will only be generated after starting the program) | +| lib | folder | IoTDB executable library file directory | +| licenses | folder | Open source community certificate file directory | +| logs | folder | The default log file directory, which includes log files for ConfigNode and DataNode (this directory will only be generated after starting the program) | +| sbin | folder | Main script directory, including start, stop, and other scripts | +| tools | folder | Directory of System Peripheral Tools | +| ext | folder | Related files for pipe, trigger, and UDF plugins (created by the user when needed) | +| LICENSE | file | certificate | +| NOTICE | file | Tip | +| README_ZH\.md | file | Explanation of the Chinese version in Markdown format | +| README\.md | file | Instructions for use | +| RELEASE_NOTES\.md | file | Version Description | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_timecho.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_timecho.md new file mode 100644 index 00000000..86e0af2a --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_timecho.md @@ -0,0 +1,42 @@ + +# Obtain TimechoDB +## How to obtain TimechoDB +The enterprise version installation package can be obtained through product trial application or by directly contacting the business personnel who are in contact with you. 
+ +## Installation Package Structure +Install the package after decompression(iotdb-enterprise-{version}-bin.zip),The directory structure after unpacking the installation package is as follows: +| **catalogue** | **Type** | **Explanation** | +| :--------------: | -------- | ------------------------------------------------------------ | +| activation | folder | The directory where the activation file is located, including the generated machine code and the enterprise version activation code obtained from the business side (this directory will only be generated after starting ConfigNode to obtain the activation code) | +| conf | folder | Configuration file directory, including configuration files such as ConfigNode, DataNode, JMX, and logback | +| data | folder | The default data file directory contains data files for ConfigNode and DataNode. (The directory will only be generated after starting the program) | +| lib | folder | IoTDB executable library file directory | +| licenses | folder | Open source community certificate file directory | +| logs | folder | The default log file directory, which includes log files for ConfigNode and DataNode (this directory will only be generated after starting the program) | +| sbin | folder | Main script directory, including start, stop, and other scripts | +| tools | folder | Directory of System Peripheral Tools | +| ext | folder | Related files for pipe, trigger, and UDF plugins (created by the user when needed) | +| LICENSE | file | certificate | +| NOTICE | file | Tip | +| README_ZH\.md | file | Explanation of the Chinese version in Markdown format | +| README\.md | file | Instructions for use | +| RELEASE_NOTES\.md | file | Version Description | diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md new file mode 100644 index 00000000..4e9a50a1 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -0,0 +1,680 @@ + +# Monitoring Panel Deployment + +The IoTDB monitoring panel is one of the supporting tools for the IoTDB Enterprise Edition. It aims to solve the monitoring problems of IoTDB and its operating system, mainly including operating system resource monitoring, IoTDB performance monitoring, and hundreds of kernel monitoring indicators, in order to help users monitor the health status of the cluster, and perform cluster optimization and operation. This article will take common 3C3D clusters (3 Confignodes and 3 Datanodes) as examples to introduce how to enable the system monitoring module in an IoTDB instance and use Prometheus+Grafana to visualize the system monitoring indicators. + +## Installation Preparation + +1. Installing IoTDB: You need to first install IoTDB V1.0 or above Enterprise Edition. You can contact business or technical support to obtain +2. Obtain the IoTDB monitoring panel installation package: Based on the enterprise version of IoTDB database monitoring panel, you can contact business or technical support to obtain + +## Installation Steps + +### Step 1: IoTDB enables monitoring indicator collection + +1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). 
+ +| **Configuration** | Located in the configuration file | **Description** | +| :--------------------------------- | :-------------------------------- | :----------------------------------------------------------- | +| cn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | +| cn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | +| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item to maintain the default setting of 9091. If other ports are set, they will not conflict with each other | +| dn_metric_reporter_list | conf/iotdb-system.properties | Uncomment the configuration item and set the value to PROMETHEUS | +| dn_metric_level | conf/iotdb-system.properties | Uncomment the configuration item and set the value to IMPORTANT | +| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Uncomment the configuration item and set it to 9092 by default. If other ports are set, they will not conflict with each other | + +Taking the 3C3D cluster as an example, the monitoring configuration that needs to be modified is as follows: + +| Node IP | Host Name | Cluster Role | Configuration File Path | Configuration | +| ----------- | --------- | ------------ | -------------------------------- | ------------------------------------------------------------ | +| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | + +2. Restart all nodes. After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: + +```Bash +./sbin/stop-standalone.sh #Stop confignode and datanode first +./sbin/start-confignode.sh -d #Start confignode +./sbin/start-datanode.sh -d #Start datanode +``` + +3. After restarting, confirm the running status of each node through the client. If the status is Running, it indicates successful configuration: + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8.PNG) + +### Step 2: Install and configure Prometheus + +> Taking Prometheus installed on server 192.168.1.3 as an example. + +1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it(https://prometheus.io/docs/introduction/first_steps/) +2. Unzip the installation package and enter the unzipped folder: + +```Shell +tar xvfz prometheus-*.tar.gz +cd prometheus-* +``` + +3. Modify the configuration. 
Modify the configuration file prometheus.yml as follows + 1. Add configNode task to collect monitoring data for ConfigNode + 2. Add a datanode task to collect monitoring data for DataNodes + +```YAML +global: + scrape_interval: 15s + evaluation_interval: 15s +scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] + honor_labels: true +``` + +4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: + +```Shell +./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d +``` + +5. Confirm successful startup. Enter in browser http://192.168.1.3:9090 Go to Prometheus and click on the Target interface under Status. When you see that all States are Up, it indicates successful configuration and connectivity. + +
+ + +
+ +6. Clicking on the left link in Targets will redirect you to web monitoring and view the monitoring information of the corresponding node: + +![](https://alioss.timecho.com/docs/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) + +### Step 3: Install Grafana and configure the data source + +> Taking Grafana installed on server 192.168.1.3 as an example. + +1. Download the Grafana installation package, which requires installing version 8.4.2 or higher. You can go to the Grafana official website to download it(https://grafana.com/grafana/download) +2. Unzip and enter the corresponding folder + +```Shell +tar -zxvf grafana-*.tar.gz +cd grafana-* +``` + +3. Start Grafana: + +```Shell +./bin/grafana-server web +``` + +4. Log in to Grafana. Enter in browser http://192.168.1.3:3000 (or the modified port), enter Grafana, and the default initial username and password are both admin. + +5. Configure data sources. Find Data sources in Connections, add a new data source, and configure the Data Source to Prometheus + +![](https://alioss.timecho.com/docs/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) + +When configuring the Data Source, pay attention to the URL where Prometheus is located. After configuring it, click on Save&Test and a Data Source is working prompt will appear, indicating successful configuration + +![](https://alioss.timecho.com/docs/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) + +### Step 4: Import IoTDB Grafana Dashboards + +1. Enter Grafana and select Dashboards: + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) + +2. Click the Import button on the right side + + ![](https://alioss.timecho.com/docs/img/Import%E6%8C%89%E9%92%AE.png) + +3. Import Dashboard using upload JSON file + + ![](https://alioss.timecho.com/docs/img/%E5%AF%BC%E5%85%A5Dashboard.png) + +4. Select the JSON file of one of the panels in the IoTDB monitoring panel, using the Apache IoTDB ConfigNode Dashboard as an example (refer to the installation preparation section in this article for the monitoring panel installation package): + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) + +5. Select Prometheus as the data source and click Import + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) + +6. Afterwards, you can see the imported Apache IoTDB ConfigNode Dashboard monitoring panel + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF.png) + +7. Similarly, we can import the Apache IoTDB DataNode Dashboard Apache Performance Overview Dashboard、Apache System Overview Dashboard, You can see the following monitoring panel: + +
+ + + +
+ +8. At this point, all IoTDB monitoring panels have been imported and monitoring information can now be viewed at any time. + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) + +## Appendix, Detailed Explanation of Monitoring Indicators + +### System Dashboard + +This panel displays the current usage of system CPU, memory, disk, and network resources, as well as partial status of the JVM. + +#### CPU + +- CPU Core:CPU cores +- CPU Load: + - System CPU Load:The average CPU load and busyness of the entire system during the sampling time + - Process CPU Load:The proportion of CPU occupied by the IoTDB process during sampling time +- CPU Time Per Minute:The total CPU time of all processes in the system per minute + +#### Memory + +- System Memory:The current usage of system memory. + - Commited vm size: The size of virtual memory allocated by the operating system to running processes. + - Total physical memory:The total amount of available physical memory in the system. + - Used physical memory:The total amount of memory already used by the system. Contains the actual amount of memory used by the process and the memory occupied by the operating system buffers/cache. +- System Swap Memory:Swap Space memory usage. +- Process Memory:The usage of memory by the IoTDB process. + - Max Memory:The maximum amount of memory that an IoTDB process can request from the operating system. (Configure the allocated memory size in the datanode env/configure env configuration file) + - Total Memory:The total amount of memory that the IoTDB process has currently requested from the operating system. + - Used Memory:The total amount of memory currently used by the IoTDB process. + +#### Disk + +- Disk Space: + - Total disk space:The maximum disk space that IoTDB can use. + - Used disk space:The disk space already used by IoTDB. +- Log Number Per Minute:The average number of logs at each level of IoTDB per minute during the sampling time. +- File Count:Number of IoTDB related files + - all:All file quantities + - TsFile:Number of TsFiles + - seq:Number of sequential TsFiles + - unseq:Number of unsequence TsFiles + - wal:Number of WAL files + - cross-temp:Number of cross space merge temp files + - inner-seq-temp:Number of merged temp files in sequential space + - innser-unseq-temp:Number of merged temp files in unsequential space + - mods:Number of tombstone files +- Open File Count:Number of file handles opened by the system +- File Size:The size of IoTDB related files. Each sub item corresponds to the size of the corresponding file. +- Disk I/O Busy Rate:Equivalent to the% util indicator in iostat, it to some extent reflects the level of disk busyness. Each sub item is an indicator corresponding to the disk. +- Disk I/O Throughput:The average I/O throughput of each disk in the system over a period of time. Each sub item is an indicator corresponding to the disk. +- Disk I/O Ops:Equivalent to the four indicators of r/s, w/s, rrqm/s, and wrqm/s in iostat, it refers to the number of times a disk performs I/O per second. Read and write refer to the number of times a disk performs a single I/O. Due to the corresponding scheduling algorithm of block devices, in some cases, multiple adjacent I/Os can be merged into one. Merge read and merge write refer to the number of times multiple I/Os are merged into one I/O. +- Disk I/O Avg Time:Equivalent to the await of iostat, which is the average latency of each I/O request. Separate recording of read and write requests. 
+- Disk I/O Avg Size:Equivalent to the avgrq sz of iostat, it reflects the size of each I/O request. Separate recording of read and write requests. +- Disk I/O Avg Queue Size:Equivalent to avgqu sz in iostat, which is the average length of the I/O request queue. +- I/O System Call Rate:The frequency of process calls to read and write system calls, similar to IOPS. +- I/O Throughput:The throughput of process I/O can be divided into two categories: actual-read/write and attemppt-read/write. Actual read and actual write refer to the number of bytes that a process actually causes block devices to perform I/O, excluding the parts processed by Page Cache. + +#### JVM + +- GC Time Percentage:The proportion of GC time spent by the node JVM in the past minute's time window +- GC Allocated/Promoted Size Detail: The average size of objects promoted to the old era per minute by the node JVM, as well as the size of objects newly applied for by the new generation/old era and non generational new applications +- GC Data Size Detail:The long-term surviving object size of the node JVM and the maximum intergenerational allowed value +- Heap Memory:JVM heap memory usage. + - Maximum heap memory:The maximum available heap memory size for the JVM. + - Committed heap memory:The size of heap memory that has been committed by the JVM. + - Used heap memory:The size of heap memory already used by the JVM. + - PS Eden Space:The size of the PS Young area. + - PS Old Space:The size of the PS Old area. + - PS Survivor Space:The size of the PS survivor area. + - ...(CMS/G1/ZGC, etc) +- Off Heap Memory:Out of heap memory usage. + - direct memory:Out of heap direct memory. + - mapped memory:Out of heap mapped memory. +- GC Number Per Minute:The average number of garbage collection attempts per minute by the node JVM, including YGC and FGC +- GC Time Per Minute:The average time it takes for node JVM to perform garbage collection per minute, including YGC and FGC +- GC Number Per Minute Detail:The average number of garbage collections per minute by node JVM due to different reasons, including YGC and FGC +- GC Time Per Minute Detail:The average time spent by node JVM on garbage collection per minute due to different reasons, including YGC and FGC +- Time Consumed Of Compilation Per Minute:The total time JVM spends compiling per minute +- The Number of Class: + - loaded:The number of classes currently loaded by the JVM + - unloaded:The number of classes uninstalled by the JVM since system startup +- The Number of Java Thread:The current number of surviving threads in IoTDB. Each sub item represents the number of threads in each state. + +#### Network + +Eno refers to the network card connected to the public network, while lo refers to the virtual network card. 
+ +- Net Speed:The speed of network card sending and receiving data +- Receive/Transmit Data Size:The size of data packets sent or received by the network card, calculated from system restart +- Packet Speed:The speed at which the network card sends and receives packets, and one RPC request can correspond to one or more packets +- Connection Num:The current number of socket connections for the selected process (IoTDB only has TCP) + +### Performance Overview Dashboard + +#### Cluster Overview + +- Total CPU Core:Total CPU cores of cluster machines +- DataNode CPU Load:CPU usage of each DataNode node in the cluster +- Disk + - Total Disk Space: Total disk size of cluster machines + - DataNode Disk Usage: The disk usage rate of each DataNode in the cluster +- Total Timeseries: The total number of time series managed by the cluster (including replicas), the actual number of time series needs to be calculated in conjunction with the number of metadata replicas +- Cluster: Number of ConfigNode and DataNode nodes in the cluster +- Up Time: The duration of cluster startup until now +- Total Write Point Per Second: The total number of writes per second in the cluster (including replicas), and the actual total number of writes needs to be analyzed in conjunction with the number of data replicas +- Memory + - Total System Memory: Total memory size of cluster machine system + - Total Swap Memory: Total size of cluster machine swap memory + - DataNode Process Memory Usage: Memory usage of each DataNode in the cluster +- Total File Number:Total number of cluster management files +- Cluster System Overview:Overview of cluster machines, including average DataNode node memory usage and average machine disk usage +- Total DataBase: The total number of databases managed by the cluster (including replicas) +- Total DataRegion: The total number of DataRegions managed by the cluster +- Total SchemaRegion: The total number of SchemeRegions managed by the cluster + +#### Node Overview + +- CPU Core: The number of CPU cores in the machine where the node is located +- Disk Space: The disk size of the machine where the node is located +- Timeseries: Number of time series managed by the machine where the node is located (including replicas) +- System Overview: System overview of the machine where the node is located, including CPU load, process memory usage ratio, and disk usage ratio +- Write Point Per Second: The write speed per second of the machine where the node is located (including replicas) +- System Memory: The system memory size of the machine where the node is located +- Swap Memory:The swap memory size of the machine where the node is located +- File Number: Number of files managed by nodes + +#### Performance + +- Session Idle Time:The total idle time and total busy time of the session connection of the node +- Client Connection: The client connection status of the node, including the total number of connections and the number of active connections +- Time Consumed Of Operation: The time consumption of various types of node operations, including average and P99 +- Average Time Consumed Of Interface: The average time consumption of each thrust interface of a node +- P99 Time Consumed Of Interface: P99 time consumption of various thrust interfaces of nodes +- Task Number: The number of system tasks for each node +- Average Time Consumed of Task: The average time spent on various system tasks of a node +- P99 Time Consumed of Task: P99 time consumption for various system tasks of nodes +- Operation Per 
Second: The number of operations per second for a node +- Mainstream Process + - Operation Per Second Of Stage: The number of operations per second for each stage of the node's main process + - Average Time Consumed Of Stage: The average time consumption of each stage in the main process of a node + - P99 Time Consumed Of Stage: P99 time consumption for each stage of the node's main process +- Schedule Stage + - OPS Of Schedule: The number of operations per second in each sub stage of the node schedule stage + - Average Time Consumed Of Schedule Stage:The average time consumption of each sub stage in the node schedule stage + - P99 Time Consumed Of Schedule Stage: P99 time consumption for each sub stage of the schedule stage of the node +- Local Schedule Sub Stages + - OPS Of Local Schedule Stage: The number of operations per second in each sub stage of the local schedule node + - Average Time Consumed Of Local Schedule Stage: The average time consumption of each sub stage in the local schedule stage of the node + - P99 Time Consumed Of Local Schedule Stage: P99 time consumption for each sub stage of the local schedule stage of the node +- Storage Stage + - OPS Of Storage Stage: The number of operations per second in each sub stage of the node storage stage + - Average Time Consumed Of Storage Stage: Average time consumption of each sub stage in the node storage stage + - P99 Time Consumed Of Storage Stage: P99 time consumption for each sub stage of node storage stage +- Engine Stage + - OPS Of Engine Stage: The number of operations per second in each sub stage of the node engine stage + - Average Time Consumed Of Engine Stage: The average time consumption of each sub stage in the engine stage of a node + - P99 Time Consumed Of Engine Stage: P99 time consumption of each sub stage in the node engine stage + +#### System + +- CPU Load: CPU load of nodes +- CPU Time Per Minute: The CPU time per minute of a node, with the maximum value related to the number of CPU cores +- GC Time Per Minute:The average GC time per minute for nodes, including YGC and FGC +- Heap Memory: Node's heap memory usage +- Off Heap Memory: Non heap memory usage of nodes +- The Number Of Java Thread: Number of Java threads on nodes +- File Count:Number of files managed by nodes +- File Size: Node management file size situation +- Log Number Per Minute: Different types of logs per minute for nodes + +### ConfigNode Dashboard + +This panel displays the performance of all management nodes in the cluster, including partitioning, node information, and client connection statistics. 
+ +#### Node Overview + +- Database Count: Number of databases for nodes +- Region + - DataRegion Count:Number of DataRegions for nodes + - DataRegion Current Status: The state of the DataRegion of the node + - SchemaRegion Count: Number of SchemeRegions for nodes + - SchemaRegion Current Status: The state of the SchemeRegion of the node +- System Memory: The system memory size of the node +- Swap Memory: Node's swap memory size +- ConfigNodes: The running status of the ConfigNode in the cluster where the node is located +- DataNodes:The DataNode situation of the cluster where the node is located +- System Overview: System overview of nodes, including system memory, disk usage, process memory, and CPU load + +#### NodeInfo + +- Node Count: The number of nodes in the cluster where the node is located, including ConfigNode and DataNode +- ConfigNode Status: The status of the ConfigNode node in the cluster where the node is located +- DataNode Status: The status of the DataNode node in the cluster where the node is located +- SchemaRegion Distribution: The distribution of SchemaRegions in the cluster where the node is located +- SchemaRegionGroup Leader Distribution: The distribution of leaders in the SchemaRegionGroup of the cluster where the node is located +- DataRegion Distribution: The distribution of DataRegions in the cluster where the node is located +- DataRegionGroup Leader Distribution:The distribution of leaders in the DataRegionGroup of the cluster where the node is located + +#### Protocol + +- Client Count + - Active Client Num: The number of active clients in each thread pool of a node + - Idle Client Num: The number of idle clients in each thread pool of a node + - Borrowed Client Count: Number of borrowed clients in each thread pool of the node + - Created Client Count: Number of created clients for each thread pool of the node + - Destroyed Client Count: The number of destroyed clients in each thread pool of the node +- Client time situation + - Client Mean Active Time: The average active time of clients in each thread pool of a node + - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node + - Client Mean Idle Time: The average idle time of clients in each thread pool of a node + +#### Partition Table + +- SchemaRegionGroup Count: The number of SchemaRegionGroups in the Database of the cluster where the node is located +- DataRegionGroup Count: The number of DataRegionGroups in the Database of the cluster where the node is located +- SeriesSlot Count: The number of SeriesSlots in the Database of the cluster where the node is located +- TimeSlot Count: The number of TimeSlots in the Database of the cluster where the node is located +- DataRegion Status: The DataRegion status of the cluster where the node is located +- SchemaRegion Status: The status of the SchemeRegion of the cluster where the node is located + +#### Consensus + +- Ratis Stage Time: The time consumption of each stage of the node's Ratis +- Write Log Entry: The time required to write a log for the Ratis of a node +- Remote / Local Write Time: The time consumption of remote and local writes for the Ratis of nodes +- Remote / Local Write QPS: Remote and local QPS written to node Ratis +- RatisConsensus Memory: Memory usage of Node Ratis consensus protocol + +### DataNode Dashboard + +This panel displays the monitoring status of all data nodes in the cluster, including write time, query time, number of stored files, etc. 
+ +#### Node Overview + +- The Number Of Entity: Entity situation of node management +- Write Point Per Second: The write speed per second of the node +- Memory Usage: The memory usage of the node, including the memory usage of various parts of IoT Consensus, the total memory usage of SchemaRegion, and the memory usage of various databases. + +#### Protocol + +- Node Operation Time Consumption + - The Time Consumed Of Operation (avg): The average time spent on various operations of a node + - The Time Consumed Of Operation (50%): The median time spent on various operations of a node + - The Time Consumed Of Operation (99%): P99 time consumption for various operations of nodes +- Thrift Statistics + - The QPS Of Interface: QPS of various Thrift interfaces of nodes + - The Avg Time Consumed Of Interface: The average time consumption of each Thrift interface of a node + - Thrift Connection: The number of Thrfit connections of each type of node + - Thrift Active Thread: The number of active Thrift connections for each type of node +- Client Statistics + - Active Client Num: The number of active clients in each thread pool of a node + - Idle Client Num: The number of idle clients in each thread pool of a node + - Borrowed Client Count:Number of borrowed clients for each thread pool of a node + - Created Client Count: Number of created clients for each thread pool of the node + - Destroyed Client Count: The number of destroyed clients in each thread pool of the node + - Client Mean Active Time: The average active time of clients in each thread pool of a node + - Client Mean Borrow Wait Time: The average borrowing waiting time of clients in each thread pool of a node + - Client Mean Idle Time: The average idle time of clients in each thread pool of a node + +#### Storage Engine + +- File Count: Number of files of various types managed by nodes +- File Size: Node management of various types of file sizes +- TsFile + - TsFile Total Size In Each Level: The total size of TsFile files at each level of node management + - TsFile Count In Each Level: Number of TsFile files at each level of node management + - Avg TsFile Size In Each Level: The average size of TsFile files at each level of node management +- Task Number: Number of Tasks for Nodes +- The Time Consumed of Task: The time consumption of tasks for nodes +- Compaction + - Compaction Read And Write Per Second: The merge read and write speed of nodes per second + - Compaction Number Per Minute: The number of merged nodes per minute + - Compaction Process Chunk Status: The number of Chunks in different states merged by nodes + - Compacted Point Num Per Minute: The number of merged nodes per minute + +#### Write Performance + +- Write Cost(avg): Average node write time, including writing wal and memtable +- Write Cost(50%): Median node write time, including writing wal and memtable +- Write Cost(99%): P99 for node write time, including writing wal and memtable +- WAL + - WAL File Size: Total size of WAL files managed by nodes + - WAL File Num:Number of WAL files managed by nodes + - WAL Nodes Num: Number of WAL nodes managed by nodes + - Make Checkpoint Costs: The time required to create various types of CheckPoints for nodes + - WAL Serialize Total Cost: Total time spent on node WAL serialization + - Data Region Mem Cost: Memory usage of different DataRegions of nodes, total memory usage of DataRegions of the current instance, and total memory usage of DataRegions of the current cluster + - Serialize One WAL Info Entry Cost: Node serialization 
time for a WAL Info Entry + - Oldest MemTable Ram Cost When Cause Snapshot: MemTable size when node WAL triggers oldest MemTable snapshot + - Oldest MemTable Ram Cost When Cause Flush: MemTable size when node WAL triggers oldest MemTable flush + - Effective Info Ratio Of WALNode: The effective information ratio of different WALNodes of nodes + - WAL Buffer + - WAL Buffer Cost: Node WAL flush SyncBuffer takes time, including both synchronous and asynchronous options + - WAL Buffer Used Ratio: The usage rate of the WAL Buffer of the node + - WAL Buffer Entries Count: The number of entries in the WAL Buffer of a node +- Flush Statistics + - Flush MemTable Cost(avg): The total time spent on node Flush and the average time spent on each sub stage + - Flush MemTable Cost(50%): The total time spent on node Flush and the median time spent on each sub stage + - Flush MemTable Cost(99%): The total time spent on node Flush and the P99 time spent on each sub stage + - Flush Sub Task Cost(avg): The average time consumption of each node's Flush subtask, including sorting, encoding, and IO stages + - Flush Sub Task Cost(50%): The median time consumption of each subtask of the Flush node, including sorting, encoding, and IO stages + - Flush Sub Task Cost(99%): The average subtask time P99 for Flush of nodes, including sorting, encoding, and IO stages +- Pending Flush Task Num: The number of Flush tasks in a blocked state for a node +- Pending Flush Sub Task Num: Number of Flush subtasks blocked by nodes +- Tsfile Compression Ratio Of Flushing MemTable: The compression rate of TsFile corresponding to node flashing Memtable +- Flush TsFile Size Of DataRegions: The corresponding TsFile size for each disk flush of nodes in different DataRegions +- Size Of Flushing MemTable: The size of the Memtable for node disk flushing +- Points Num Of Flushing MemTable: The number of points when flashing data in different DataRegions of a node +- Series Num Of Flushing MemTable: The number of time series when flashing Memtables in different DataRegions of a node +- Average Point Num Of Flushing MemChunk: The average number of disk flushing points for node MemChunk + +#### Schema Engine + +- Schema Engine Mode: The metadata engine pattern of nodes +- Schema Consensus Protocol: Node metadata consensus protocol +- Schema Region Number:Number of SchemeRegions managed by nodes +- Schema Region Memory Overview: The amount of memory in the SchemeRegion of a node +- Memory Usgae per SchemaRegion:The average memory usage size of node SchemaRegion +- Cache MNode per SchemaRegion: The number of cache nodes in each SchemeRegion of a node +- MLog Length and Checkpoint: The total length and checkpoint position of the current mlog for each SchemeRegion of the node (valid only for SimpleConsense) +- Buffer MNode per SchemaRegion: The number of buffer nodes in each SchemeRegion of a node +- Activated Template Count per SchemaRegion: The number of activated templates in each SchemeRegion of a node +- Time Series statistics + - Timeseries Count per SchemaRegion: The average number of time series for node SchemaRegion + - Series Type: Number of time series of different types of nodes + - Time Series Number: The total number of time series nodes + - Template Series Number: The total number of template time series for nodes + - Template Series Count per SchemaRegion: The number of sequences created through templates in each SchemeRegion of a node +- IMNode Statistics + - Pinned MNode per SchemaRegion: Number of IMNode nodes with Pinned nodes in 
each SchemeRegion + - Pinned Memory per SchemaRegion: The memory usage size of the IMNode node for Pinned nodes in each SchemeRegion of the node + - Unpinned MNode per SchemaRegion: The number of unpinned IMNode nodes in each SchemeRegion of a node + - Unpinned Memory per SchemaRegion: Memory usage size of unpinned IMNode nodes in each SchemeRegion of the node + - Schema File Memory MNode Number: Number of IMNode nodes with global pinned and unpinned nodes + - Release and Flush MNode Rate: The number of IMNodes that release and flush nodes per second +- Cache Hit Rate: Cache hit rate of nodes +- Release and Flush Thread Number: The current number of active Release and Flush threads on the node +- Time Consumed of Relead and Flush (avg): The average time taken for node triggered cache release and buffer flushing +- Time Consumed of Relead and Flush (99%): P99 time consumption for node triggered cache release and buffer flushing + +#### Query Engine + +- Time Consumption In Each Stage + - The time consumed of query plan stages(avg): The average time spent on node queries at each stage + - The time consumed of query plan stages(50%): Median time spent on node queries at each stage + - The time consumed of query plan stages(99%): P99 time consumption for node query at each stage +- Execution Plan Distribution Time + - The time consumed of plan dispatch stages(avg): The average time spent on node query execution plan distribution + - The time consumed of plan dispatch stages(50%): Median time spent on node query execution plan distribution + - The time consumed of plan dispatch stages(99%): P99 of node query execution plan distribution time +- Execution Plan Execution Time + - The time consumed of query execution stages(avg): The average execution time of node query execution plan + - The time consumed of query execution stages(50%):Median execution time of node query execution plan + - The time consumed of query execution stages(99%): P99 of node query execution plan execution time +- Operator Execution Time + - The time consumed of operator execution stages(avg): The average execution time of node query operators + - The time consumed of operator execution(50%): Median execution time of node query operator + - The time consumed of operator execution(99%): P99 of node query operator execution time +- Aggregation Query Computation Time + - The time consumed of query aggregation(avg): The average computation time for node aggregation queries + - The time consumed of query aggregation(50%): Median computation time for node aggregation queries + - The time consumed of query aggregation(99%): P99 of node aggregation query computation time +- File/Memory Interface Time Consumption + - The time consumed of query scan(avg): The average time spent querying file/memory interfaces for nodes + - The time consumed of query scan(50%): Median time spent querying file/memory interfaces for nodes + - The time consumed of query scan(99%): P99 time consumption for node query file/memory interface +- Number Of Resource Visits + - The usage of query resource(avg): The average number of resource visits for node queries + - The usage of query resource(50%): Median number of resource visits for node queries + - The usage of query resource(99%): P99 for node query resource access quantity +- Data Transmission Time + - The time consumed of query data exchange(avg): The average time spent on node query data transmission + - The time consumed of query data exchange(50%): Median query data transmission time for nodes + - 
The time consumed of query data exchange(99%): P99 for node query data transmission time +- Number Of Data Transfers + - The count of Data Exchange(avg): The average number of data transfers queried by nodes + - The count of Data Exchange: The quantile of the number of data transfers queried by nodes, including the median and P99 +- Task Scheduling Quantity And Time Consumption + - The number of query queue: Node query task scheduling quantity + - The time consumed of query schedule time(avg): The average time spent on scheduling node query tasks + - The time consumed of query schedule time(50%): Median time spent on node query task scheduling + - The time consumed of query schedule time(99%): P99 of node query task scheduling time + +#### Query Interface + +- Load Time Series Metadata + - The time consumed of load timeseries metadata(avg): The average time taken for node queries to load time series metadata + - The time consumed of load timeseries metadata(50%): Median time spent on loading time series metadata for node queries + - The time consumed of load timeseries metadata(99%): P99 time consumption for node query loading time series metadata +- Read Time Series + - The time consumed of read timeseries metadata(avg): The average time taken for node queries to read time series + - The time consumed of read timeseries metadata(50%): The median time taken for node queries to read time series + - The time consumed of read timeseries metadata(99%): P99 time consumption for node query reading time series +- Modify Time Series Metadata + - The time consumed of timeseries metadata modification(avg):The average time taken for node queries to modify time series metadata + - The time consumed of timeseries metadata modification(50%): Median time spent on querying and modifying time series metadata for nodes + - The time consumed of timeseries metadata modification(99%): P99 time consumption for node query and modification of time series metadata +- Load Chunk Metadata List + - The time consumed of load chunk metadata list(avg): The average time it takes for node queries to load Chunk metadata lists + - The time consumed of load chunk metadata list(50%): Median time spent on node query loading Chunk metadata list + - The time consumed of load chunk metadata list(99%): P99 time consumption for node query loading Chunk metadata list +- Modify Chunk Metadata + - The time consumed of chunk metadata modification(avg): The average time it takes for node queries to modify Chunk metadata + - The time consumed of chunk metadata modification(50%): The total number of bits spent on modifying Chunk metadata for node queries + - The time consumed of chunk metadata modification(99%): P99 time consumption for node query and modification of Chunk metadata +- Filter According To Chunk Metadata + - The time consumed of chunk metadata filter(avg): The average time spent on node queries filtering by Chunk metadata + - The time consumed of chunk metadata filter(50%): Median filtering time for node queries based on Chunk metadata + - The time consumed of chunk metadata filter(99%): P99 time consumption for node query filtering based on Chunk metadata +- Constructing Chunk Reader + - The time consumed of construct chunk reader(avg): The average time spent on constructing Chunk Reader for node queries + - The time consumed of construct chunk reader(50%): Median time spent on constructing Chunk Reader for node queries + - The time consumed of construct chunk reader(99%): P99 time consumption for constructing Chunk Reader 
for node queries +- Read Chunk + - The time consumed of read chunk(avg): The average time taken for node queries to read Chunks + - The time consumed of read chunk(50%): Median time spent querying nodes to read Chunks + - The time consumed of read chunk(99%): P99 time spent on querying and reading Chunks for nodes +- Initialize Chunk Reader + - The time consumed of init chunk reader(avg): The average time spent initializing Chunk Reader for node queries + - The time consumed of init chunk reader(50%): Median time spent initializing Chunk Reader for node queries + - The time consumed of init chunk reader(99%):P99 time spent initializing Chunk Reader for node queries +- Constructing TsBlock Through Page Reader + - The time consumed of build tsblock from page reader(avg): The average time it takes for node queries to construct TsBlock through Page Reader + - The time consumed of build tsblock from page reader(50%): The median time spent on constructing TsBlock through Page Reader for node queries + - The time consumed of build tsblock from page reader(99%):Node query using Page Reader to construct TsBlock time-consuming P99 +- Query the construction of TsBlock through Merge Reader + - The time consumed of build tsblock from merge reader(avg): The average time taken for node queries to construct TsBlock through Merge Reader + - The time consumed of build tsblock from merge reader(50%): The median time spent on constructing TsBlock through Merge Reader for node queries + - The time consumed of build tsblock from merge reader(99%): Node query using Merge Reader to construct TsBlock time-consuming P99 + +#### Query Data Exchange + +The data exchange for the query is time-consuming. + +- Obtain TsBlock through source handle + - The time consumed of source handle get tsblock(avg): The average time taken for node queries to obtain TsBlock through source handle + - The time consumed of source handle get tsblock(50%):Node query obtains the median time spent on TsBlock through source handle + - The time consumed of source handle get tsblock(99%): Node query obtains TsBlock time P99 through source handle +- Deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(avg): The average time taken for node queries to deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(50%): The median time taken for node queries to deserialize TsBlock through source handle + - The time consumed of source handle deserialize tsblock(99%): P99 time spent on deserializing TsBlock through source handle for node query +- Send TsBlock through sink handle + - The time consumed of sink handle send tsblock(avg): The average time taken for node queries to send TsBlock through sink handle + - The time consumed of sink handle send tsblock(50%): Node query median time spent sending TsBlock through sink handle + - The time consumed of sink handle send tsblock(99%): Node query sends TsBlock through sink handle with a time consumption of P99 +- Callback data block event + - The time consumed of on acknowledge data block event task(avg): The average time taken for node query callback data block event + - The time consumed of on acknowledge data block event task(50%): Median time spent on node query callback data block event + - The time consumed of on acknowledge data block event task(99%): P99 time consumption for node query callback data block event +- Get Data Block Tasks + - The time consumed of get data block task(avg): The average time taken for 
node queries to obtain data block tasks
+  - The time consumed of get data block task(50%): The median time taken for node queries to obtain data block tasks
+  - The time consumed of get data block task(99%): P99 time consumption for node queries to obtain data block tasks
+
+#### Query Related Resource
+
+- MppDataExchangeManager: The number of shuffle sink handles and source handles during node queries
+- LocalExecutionPlanner: The remaining memory that the node can allocate to query shards
+- FragmentInstanceManager: The query sharding context information and the number of query shards that the node is running
+- Coordinator: The number of queries recorded on the node
+- MemoryPool Size: Status of the memory pools related to node queries
+- MemoryPool Capacity: The size of the memory pools related to node queries, including maximum and remaining available values
+- DriverScheduler: Number of queued tasks related to node queries
+
+#### Consensus - IoT Consensus
+
+- Memory Usage
+  - IoTConsensus Used Memory: The memory usage of IoTConsensus on the node, including total memory usage, queue usage, and synchronization usage
+- Synchronization Status Between Nodes
+  - IoTConsensus Sync Index: SyncIndex size for different DataRegions of the node's IoTConsensus
+  - IoTConsensus Overview: The total synchronization gap and the number of cached requests of the node's IoTConsensus
+  - IoTConsensus Search Index Rate: The growth rate of the write SearchIndex for different DataRegions of the node's IoTConsensus
+  - IoTConsensus Safe Index Rate: The growth rate of the synchronization SafeIndex for different DataRegions of the node's IoTConsensus
+  - IoTConsensus LogDispatcher Request Size: The request size for the node's IoTConsensus to synchronize different DataRegions to other nodes
+  - Sync Lag: The synchronization gap of different DataRegions of the node's IoTConsensus
+  - Min Peer Sync Lag: The minimum synchronization gap from different DataRegions of the node's IoTConsensus to their different replicas
+  - Sync Speed Diff Of Peers: The maximum difference in synchronization from different DataRegions of the node's IoTConsensus to different replicas
+  - IoTConsensus LogEntriesFromWAL Rate: The rate at which the node's IoTConsensus obtains logs from the WAL for different DataRegions
+  - IoTConsensus LogEntriesFromQueue Rate: The rate at which the node's IoTConsensus obtains logs from the queue for different DataRegions
+- Different Execution Stages Take Time
+  - The Time Consumed Of Different Stages (avg): The average time spent on different execution stages of the node's IoTConsensus
+  - The Time Consumed Of Different Stages (50%): The median time spent on different execution stages of the node's IoTConsensus
+  - The Time Consumed Of Different Stages (99%): P99 of the time consumption of different execution stages of the node's IoTConsensus
+
+#### Consensus - DataRegion Ratis Consensus
+
+- Ratis Stage Time: The time consumption of different stages of node Ratis
+- Write Log Entry: The time consumption of writing logs at different stages of node Ratis
+- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely
+- Remote / Local Write QPS: QPS written by node Ratis locally or remotely
+- RatisConsensus Memory: Memory usage of node Ratis
+
+#### Consensus - SchemaRegion Ratis Consensus
+
+- Ratis Stage Time: The time consumption of different stages of node Ratis
+- Write Log Entry: The time consumption for writing logs at each stage of node Ratis
+- Remote / Local Write Time: The time it takes for node Ratis to write locally or remotely
+- Remote / Local Write QPS: QPS written by node Ratis locally or remotely
+- RatisConsensus Memory: Node Ratis memory usage
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md
new file mode 100644
index 00000000..d40c54c5
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md
@@ -0,0 +1,178 @@
+
+# Stand-Alone Deployment
+
+## Matters Needing Attention
+
+1. Before installation, ensure that the system is complete by referring to [System configuration](./Environment-Requirements.md).
+
+2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which avoids the problem of the database failing to start after the host IP is changed later. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name, and then configure IoTDB's `cn_internal_address`, `dn_internal_address`, and `dn_rpc_address` using the host name.
+
+   ``` Shell
+   echo "192.168.1.3 iotdb-1" >> /etc/hosts
+   ```
+
+3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings.
+
+4. Whether on Linux or Windows, ensure that the IoTDB installation path contains no spaces or Chinese characters to avoid software exceptions.
+
+5. Please note that when installing and deploying IoTDB, the same user must be used for all operations (a minimal example of preparing such a user is given after this list). You can:
+- Use the root user (recommended): using the root user avoids issues such as insufficient permissions.
+- Use a fixed non-root user:
+  - Use the same user for all operations: ensure that the same user is used for start, stop, and other operations, and do not switch users.
+  - Avoid using sudo: try to avoid sudo commands, as they execute commands with root privileges, which may cause confusion or security issues.
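+
+Since the same operating-system user must be used for every operation (item 5 above), a common approach is to prepare a dedicated account up front. The following is only a minimal sketch: the user name `iotdb` and the installation path `/data/iotdb` are example values, not names required by IoTDB.
+
+```Shell
+# Create a dedicated user for running IoTDB (user name and path are examples)
+useradd -m iotdb
+# Give that user ownership of the unpacked installation directory
+chown -R iotdb:iotdb /data/iotdb
+# Switch to the same user before every start, stop, or other operation
+su - iotdb
+```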
+ +## Installation Steps + +### 1、Unzip the installation package and enter the installation directory + +```Shell +unzip apache-iotdb-{version}-all-bin.zip +cd apache-iotdb-{version}-all-bin +``` + +### 2、Parameter Configuration + +#### Environment Script Configuration + +- ./conf/confignode-env.sh (./conf/confignode-env.bat) configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | Note | +| :---------------: | :----------------------------------------------------------: | :---------: | :----------------------------------------------------------: | :---------------------------------: | +| MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +- ./conf/datanode-env.sh (./conf/datanode-env.bat) configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | +| :---------: | :----------------------------------: | :--------: | :----------------------------------------------: | :----------: | +| MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +#### System General Configuration + +Open the general configuration file (./conf/iotdb-system. properties file) and set the following parameters: + +| **Configuration** | **Description** | **Default** | **Recommended value** | Note | +| :-----------------------: | :----------------------------------------------------------: | :------------: | :----------------------------------------------------------: | :---------------------------------------------------: | +| cluster_name | Cluster Name | defaultCluster | The cluster name can be set as needed, and if there are no special needs, the default can be kept | Cannot be modified after initial startup | +| schema_replication_factor | Number of metadata replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | +| data_replication_factor | Number of data replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | + +#### ConfigNode Configuration + +Open the ConfigNode configuration file (./conf/iotdb-system. 
properties file) and set the following parameters: + +| **Configuration** | **Description** | **Default** | **Recommended value** | Note | +| :-----------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------------------: | :--------------------------------------: | +| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | +| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | Cannot be modified after initial startup | +| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | Cannot be modified after initial startup | +| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | + +#### DataNode Configuration + +Open the DataNode configuration file (./conf/iotdb-system. properties file) and set the following parameters: + +| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | +| :-----------------------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------------------: | :--------------------------------------- | +| dn_rpc_address | The address of the client RPC service | 0.0.0.0 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Restarting the service takes effect | +| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | Restarting the service takes effect | +| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | +| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | Cannot be modified after initial startup | +| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | Cannot be modified after initial startup | +| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | Cannot be modified after initial startup | +| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | Cannot be modified after initial startup | +| dn_seed_config_node | The ConfigNode address that the node connects to when registering to join the cluster, i.e. cn_internal-address: cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | + +> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. 
Please ensure that the modified files are saved persistently, otherwise the configuration items will not take effect.
+
+### 3、Start ConfigNode
+
+Enter the sbin directory of iotdb and start confignode
+
+```Shell
+./start-confignode.sh -d  # The "-d" parameter starts the process in the background
+```
+If the startup fails, please refer to [Common Questions](#common-questions).
+
+### 4、Start DataNode
+
+Enter the sbin directory of iotdb and start datanode:
+
+```Shell
+cd sbin
+./start-datanode.sh -d  # The "-d" parameter starts the process in the background
+```
+
+### 5、Verify Deployment
+
+You can directly execute the Cli startup script in the sbin directory:
+
+```Shell
+./start-cli.sh -h ip(local IP or domain name) -p port(6667)
+```
+
+After successful startup, the following interface will appear, indicating that IoTDB was installed successfully.
+
+![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90%E7%89%88%E5%90%AF%E5%8A%A8%E6%88%90%E5%8A%9F.png)
+
+After the successful installation interface appears, use the `show cluster` command to check the service running status.
+
+When the status of every node is Running, it indicates that the service has started successfully.
+
+![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E5%8D%95%E6%9C%BAshow.jpeg)
+
+> The appearance of 'Activated (W)' indicates passive activation, meaning that this ConfigNode does not have a license file (or has not been issued the latest license file with a timestamp). At this point, it is recommended to check whether the license file has been placed in the license folder. If not, please place the license file there. If a license file already exists, it may be because the license file of this node is inconsistent with the information of other nodes. Please contact Timecho staff to reapply.
+
+## Common Questions
+
+1. ConfigNode failed to start
+
+   Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified.
+
+   Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho technical support for advice on solutions.
+
+   Step 3: If it is the first deployment, or if the data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart.
+
+   Step 4: Clean up the environment:
+
+   a. Terminate all ConfigNode and DataNode processes.
+   ```Bash
+   # 1. Stop the ConfigNode and DataNode services
+   sbin/stop-standalone.sh
+
+   # 2. Check for any remaining processes
+   jps
+   # Or
+   ps -ef|grep iotdb
+
+   # 3. If there are any remaining processes, manually kill them
+   kill -9 <pid>
+   # If you are sure there is only one iotdb instance on the machine, you can use the following command to clean up residual processes
+   ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9
+   ```
+   b. Delete the data and logs directories.
+
+   Explanation: Deleting the data directory is necessary; deleting the logs directory only cleans up logs and is not mandatory.
+
+   ```Bash
+   cd /data/iotdb
+   rm -rf data logs
+   ```
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md
new file mode 100644
index 00000000..a4e3e3c5
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md
@@ -0,0 +1,220 @@
+
+# Stand-Alone Deployment
+
+This chapter will introduce how to start an IoTDB standalone instance, which includes 1 ConfigNode and 1 DataNode (commonly known as 1C1D).
+
+## Matters Needing Attention
+
+1. Before installation, ensure that the system is complete by referring to [System configuration](./Environment-Requirements.md).
+
+2. It is recommended to prioritize using `hostname` for IP configuration during deployment, which avoids the problem of the database failing to start after the host IP is changed later. To set the host name, you need to configure /etc/hosts on the target server. For example, if the local IP is 192.168.1.3 and the host name is iotdb-1, you can use the following command to set the server's host name, and then configure IoTDB's `cn_internal_address`, `dn_internal_address`, and `dn_rpc_address` using the host name (a quick way to verify the host name is shown after this list).
+
+   ```shell
+   echo "192.168.1.3 iotdb-1" >> /etc/hosts
+   ```
+
+3. Some parameters cannot be modified after the first startup. Please refer to the "Parameter Configuration" section below for settings.
+
+4. Whether on Linux or Windows, ensure that the IoTDB installation path contains no spaces or Chinese characters to avoid software exceptions.
+
+5. Please note that when installing and deploying IoTDB (including activating and using the software), the same user must be used for all operations. You can:
+- Use the root user (recommended): using the root user avoids issues such as insufficient permissions.
+- Use a fixed non-root user:
+  - Use the same user for all operations: ensure that the same user is used for start, activation, stop, and other operations, and do not switch users.
+  - Avoid using sudo: try to avoid sudo commands, as they execute commands with root privileges, which may cause confusion or security issues.
+
+6. It is recommended to deploy a monitoring panel, which can monitor important operational indicators and keep track of the database operation status at any time. The monitoring panel can be obtained by contacting the business department, and the steps for deploying it can be found in [Monitoring Board Install and Deploy](./Monitoring-panel-deployment.md).
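+
+After editing `/etc/hosts` as described in item 2, you can optionally confirm that the host name resolves to the expected IP before configuring IoTDB. This is only a quick sanity check; `iotdb-1` is the example host name used above.
+
+```shell
+# Verify that the host name resolves via /etc/hosts
+getent hosts iotdb-1
+# Or simply try to reach it
+ping -c 1 iotdb-1
+```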
+ +## Installation Steps + +### 1、Unzip the installation package and enter the installation directory + +```shell +unzip iotdb-enterprise-{version}-bin.zip +cd iotdb-enterprise-{version}-bin +``` + +### 2、Parameter Configuration + +#### Environment Script Configuration + +- ./conf/confignode-env.sh (./conf/confignode-env.bat) configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | Note | +| :---------------: | :----------------------------------------------------------: | :---------: | :----------------------------------------------------------: | :---------------------------------: | +| MEMORY_SIZE | The total amount of memory that IoTDB ConfigNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +- ./conf/datanode-env.sh (./conf/datanode-env.bat) configuration + +| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | +| :---------: | :----------------------------------: | :--------: | :----------------------------------------------: | :----------: | +| MEMORY_SIZE | The total amount of memory that IoTDB DataNode nodes can use | empty | Can be filled in as needed, and the system will allocate memory based on the filled in values | Restarting the service takes effect | + +#### System General Configuration + +Open the general configuration file (./conf/iotdb-system. properties file) and set the following parameters: + +| **Configuration** | **Description** | **Default** | **Recommended value** | Note | +| :-----------------------: | :----------------------------------------------------------: | :------------: | :----------------------------------------------------------: | :---------------------------------------------------: | +| cluster_name | Cluster Name | defaultCluster | The cluster name can be set as needed, and if there are no special needs, the default can be kept | Cannot be modified after initial startup | +| schema_replication_factor | Number of metadata replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | +| data_replication_factor | Number of data replicas, set to 1 for the standalone version here | 1 | 1 | Default 1, cannot be modified after the first startup | + +#### ConfigNode Configuration + +Open the ConfigNode configuration file (./conf/iotdb-system. 
properties file) and set the following parameters: + +| **Configuration** | **Description** | **Default** | **Recommended value** | Note | +| :-----------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------------------: | :--------------------------------------: | +| cn_internal_address | The address used by ConfigNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | +| cn_internal_port | The port used by ConfigNode for communication within the cluster | 10710 | 10710 | Cannot be modified after initial startup | +| cn_consensus_port | The port used for ConfigNode replica group consensus protocol communication | 10720 | 10720 | Cannot be modified after initial startup | +| cn_seed_config_node | The address of the ConfigNode that the node connects to when registering to join the cluster, cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | + +#### DataNode Configuration + +Open the DataNode configuration file (./conf/iotdb-system. properties file) and set the following parameters: + +| **Configuration** | **Description** | **Default** | **Recommended value** | **Note** | +| :------------------------------ | :----------------------------------------------------------- | :-------------- | :----------------------------------------------------------- | :--------------------------------------- | +| dn_rpc_address | The address of the client RPC service | 0.0.0.0 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Restarting the service takes effect | +| dn_rpc_port | The port of the client RPC service | 6667 | 6667 | Restarting the service takes effect | +| dn_internal_address | The address used by DataNode for communication within the cluster | 127.0.0.1 | The IPV4 address or host name of the server where it is located, and it is recommended to use host name | Cannot be modified after initial startup | +| dn_internal_port | The port used by DataNode for communication within the cluster | 10730 | 10730 | Cannot be modified after initial startup | +| dn_mpp_data_exchange_port | The port used by DataNode to receive data streams | 10740 | 10740 | Cannot be modified after initial startup | +| dn_data_region_consensus_port | The port used by DataNode for data replica consensus protocol communication | 10750 | 10750 | Cannot be modified after initial startup | +| dn_schema_region_consensus_port | The port used by DataNode for metadata replica consensus protocol communication | 10760 | 10760 | Cannot be modified after initial startup | +| dn_seed_config_node | The ConfigNode address that the node connects to when registering to join the cluster, i.e. cn_internal-address: cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | Cannot be modified after initial startup | + +> ❗️Attention: Editors such as VSCode Remote do not have automatic configuration saving function. 
Please ensure that the modified files are saved persistently, otherwise the configuration items will not take effect + +### 3、Start ConfigNode + +Enter the sbin directory of iotdb and start confignode + +```shell +./start-confignode.sh -d #The "- d" parameter will start in the background +``` +If the startup fails, please refer to [Common Questions](#common-questions). + +### 4、Activate Database + +#### Method 1: Activate file copy activation + +- After starting the confignode node, enter the activation folder and copy the systeminfo file to the Timecho staff +- Received the license file returned by the staff +- Place the license file in the activation folder of the corresponding node; + +#### Method 2: Activate Script Activation + +- Obtain the required machine code for activation, enter the sbin directory of the installation directory, and execute the activation script: + +```shell + cd sbin +./start-activate.sh +``` + +- The following information is displayed. Please copy the machine code (i.e. the string of characters) to the Timecho staff: + +```shell +Please copy the system_info's content and send it to Timecho: +01-KU5LDFFN-PNBEHDRH +Please enter license: +``` + +- Enter the activation code returned by the staff into the previous command line prompt 'Please enter license:', as shown below: + +```shell +Please enter license: +JJw+MmF+AtexsfgNGOFgTm83Bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxm6pF+APW1CiXLTSijK9Qh3nsLgzrW8OJPh26Vl6ljKUpCvpTiw== +License has been stored to sbin/../activation/license +Import completed. Please start cluster and excute 'show cluster' to verify activation status +``` + +### 5、Start DataNode + +Enter the sbin directory of iotdb and start datanode: + +```shell +cd sbin +./start-datanode.sh -d # The "- d" parameter will start in the background +``` + +### 6、Verify Deployment + +Can be executed directly/ Cli startup script in sbin directory: + +```shell +./start-cli.sh -h ip(local IP or domain name) -p port(6667) +``` + +After successful startup, the following interface will appear displaying successful installation of IOTDB. + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8%E6%88%90%E5%8A%9F.png) + +After the installation success interface appears, continue to check if the activation is successful and use the `show cluster`command + +When you see the display "Activated" on the far right, it indicates successful activation + +![](https://alioss.timecho.com/docs/img/show%20cluster.png) + + +> The appearance of 'Activated (W)' indicates passive activation, indicating that this Config Node does not have a license file (or has not issued the latest license file with a timestamp). At this point, it is recommended to check if the license file has been placed in the license folder. If not, please place the license file. If a license file already exists, it may be due to inconsistency between the license file of this node and the information of other nodes. Please contact Timecho staff to reapply. + +## Common Problem +1. Multiple prompts indicating activation failure during deployment process + - Use the `ls -al` command: Use the `ls -al` command to check if the owner information of the installation package root directory is the current user. + - Check activation directory: Check all files in the `./activation` directory and whether the owner information is the current user. + +2. Confignode failed to start + + Step 1: Please check the startup log to see if any parameters that cannot be changed after the first startup have been modified. 
+
+   Step 2: Please check the startup log for any other abnormalities. If there are any abnormal phenomena in the log, please contact Timecho technical support for advice on solutions.
+
+   Step 3: If it is the first deployment, or if the data can be deleted, you can also clean up the environment according to the following steps, redeploy, and restart.
+
+   Step 4: Clean up the environment:
+
+   a. Terminate all ConfigNode and DataNode processes.
+   ```Bash
+   # 1. Stop the ConfigNode and DataNode services
+   sbin/stop-standalone.sh
+
+   # 2. Check for any remaining processes
+   jps
+   # Or
+   ps -ef|grep iotdb
+
+   # 3. If there are any remaining processes, manually kill them
+   kill -9 <pid>
+   # If you are sure there is only one iotdb instance on the machine, you can use the following command to clean up residual processes
+   ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9
+   ```
+   b. Delete the data and logs directories.
+
+   Explanation: Deleting the data directory is necessary; deleting the logs directory only cleans up logs and is not mandatory.
+
+   ```Bash
+   cd /data/iotdb
+   rm -rf data logs
+   ```
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/workbench-deployment_timecho.md b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/workbench-deployment_timecho.md
new file mode 100644
index 00000000..9f96df00
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/workbench-deployment_timecho.md
@@ -0,0 +1,223 @@
+
+# Workbench Deployment
+
+The visualization console (Workbench) is one of the supporting tools for IoTDB (similar to Navicat for MySQL). It is the official application tool for the database deployment, operation and maintenance, and application development stages, making the use, operation, and management of the database simpler and more efficient, and truly achieving low-cost management and operation. This document will assist you in installing Workbench.
+  +  +
+ + +## Installation Preparation + +| Preparation Content | Name | Version Requirements | Link | +| :----------------------: | :-------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| Operating System | Windows or Linux | - | - | +| Installation Environment | JDK | Need>=V1.8.0_162 (recommended to use 11 or 17, please choose ARM or x64 installation package according to machine configuration when downloading) | https://www.oracle.com/java/technologies/downloads/ | +| Related Software | Prometheus | Requires installation of V2.30.3 and above. | https://prometheus.io/download/ | +| Database | IoTDB | Requires V1.2.0 Enterprise Edition and above | You can contact business or technical support to obtain | +| Console | IoTDB-Workbench-`` | - | You can choose according to the appendix version comparison table and contact business or technical support to obtain it | + +## Installation Steps + +### Step 1: IoTDB enables monitoring indicator collection + +1. Open the monitoring configuration item. The configuration items related to monitoring in IoTDB are disabled by default. Before deploying the monitoring panel, you need to open the relevant configuration items (note that the service needs to be restarted after enabling monitoring configuration). + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+   | Configuration | Located in the configuration file | Description |
+   | :--- | :--- | :--- |
+   | cn_metric_reporter_list | conf/iotdb-system.properties | Please add this configuration item to the configuration file and set the value to PROMETHEUS |
+   | cn_metric_level | conf/iotdb-system.properties | Please add this configuration item to the configuration file and set the value to IMPORTANT |
+   | cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Please add this configuration item to the configuration file and keep the default value 9091; another port may be used as long as it does not conflict with other ports |
+   | dn_metric_reporter_list | conf/iotdb-system.properties | Please add this configuration item to the configuration file and set the value to PROMETHEUS |
+   | dn_metric_level | conf/iotdb-system.properties | Please add this configuration item to the configuration file and set the value to IMPORTANT |
+   | dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | Please add this configuration item to the configuration file and keep the default value 9092; another port may be used as long as it does not conflict with other ports |
+   | dn_metric_internal_reporter_type | conf/iotdb-system.properties | Please add this configuration item to the configuration file and set the value to IOTDB |
+   | enable_audit_log | conf/iotdb-system.properties | Please add this configuration item to the configuration file and set the value to true |
+   | audit_log_storage | conf/iotdb-system.properties | Please add this configuration item to the configuration file and set the values to IOTDB and LOGGER |
+   | audit_log_operation | conf/iotdb-system.properties | Please add this configuration item to the configuration file and set the values to DML,DDL,QUERY |
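+
+   For reference, the table above amounts to appending the following items to `conf/iotdb-system.properties`. This is only a sketch to be run from the IoTDB installation directory of each node; adjust the ports if they conflict with other services, and the comma-separated value format is assumed from the descriptions in the table.
+
+```shell
+# Append the monitoring and audit-log items to the IoTDB configuration
+# (run from the IoTDB installation directory; values taken from the table above)
+{
+  echo "cn_metric_reporter_list=PROMETHEUS"
+  echo "cn_metric_level=IMPORTANT"
+  echo "cn_metric_prometheus_reporter_port=9091"
+  echo "dn_metric_reporter_list=PROMETHEUS"
+  echo "dn_metric_level=IMPORTANT"
+  echo "dn_metric_prometheus_reporter_port=9092"
+  echo "dn_metric_internal_reporter_type=IOTDB"
+  echo "enable_audit_log=true"
+  echo "audit_log_storage=IOTDB,LOGGER"
+  echo "audit_log_operation=DML,DDL,QUERY"
+} >> conf/iotdb-system.properties
+```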
+ + +2. Restart all nodes. After modifying the monitoring indicator configuration of three nodes, the confignode and datanode services of all nodes can be restarted: + + ```shell + ./sbin/stop-standalone.sh #Stop confignode and datanode first + ./sbin/start-confignode.sh -d #Start confignode + ./sbin/start-datanode.sh -d #Start datanode + ``` + +3. After restarting, confirm the running status of each node through the client. If the status is Running, it indicates successful configuration: + + ![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8.PNG) + +### Step 2: Install and configure Prometheus + +1. Download the Prometheus installation package, which requires installation of V2.30.3 and above. You can go to the Prometheus official website to download it (https://prometheus.io/docs/introduction/first_steps/) +2. Unzip the installation package and enter the unzipped folder: + + ```Shell + tar xvfz prometheus-*.tar.gz + cd prometheus-* + ``` + +3. Modify the configuration. Modify the configuration file prometheus.yml as follows + 1. Add configNode task to collect monitoring data for ConfigNode + 2. Add a datanode task to collect monitoring data for DataNodes + + ```shell + global: + scrape_interval: 15s + evaluation_interval: 15s + scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] + honor_labels: true + ``` + +4. Start Prometheus. The default expiration time for Prometheus monitoring data is 15 days. In production environments, it is recommended to adjust it to 180 days or more to track historical monitoring data for a longer period of time. The startup command is as follows: + + ```Shell + ./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d + ``` + +5. Confirm successful startup. Enter in browser `http://IP:port` Go to Prometheus and click on the Target interface under Status. When you see that all States are Up, it indicates successful configuration and connectivity. + +
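+
+   If any target stays in the Down state, it can help to check from the command line whether each IoTDB node actually exposes metrics on the ports configured in Step 1. The host name and ports below follow the example configuration in this document, and `/metrics` is assumed to be the path served by the Prometheus reporter.
+
+   ```shell
+   # ConfigNode metrics endpoint (cn_metric_prometheus_reporter_port)
+   curl http://iotdb-1:9091/metrics
+   # DataNode metrics endpoint (dn_metric_prometheus_reporter_port)
+   curl http://iotdb-1:9092/metrics
+   ```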
+ + +
+ + +### Step 3: Install Workbench + +1. Enter the config directory of iotdb Workbench -`` + +2. Modify Workbench configuration file: Go to the `config` folder and modify the configuration file `application-prod.properties`. If you are installing it locally, there is no need to modify it. If you are deploying it on a server, you need to modify the IP address + > Workbench can be deployed on a local or cloud server as long as it can connect to IoTDB + + | Configuration | Before Modification | After modification | + | ---------------- | ----------------------------------- | ----------------------------------------------- | + | pipe.callbackUrl | pipe.callbackUrl=`http://127.0.0.1` | pipe.callbackUrl=`http://` | + + ![](https://alioss.timecho.com/docs/img/workbench-conf-1.png) + +3. Startup program: Please execute the startup command in the sbin folder of IoTDB Workbench -`` + Windows: + ```shell + # Start Workbench in the background + start.bat -d + ``` + Linux: + ```shell + # Start Workbench in the background + ./start.sh -d + ``` +4. You can use the `jps` command to check if the startup was successful, as shown in the figure: + + ![](https://alioss.timecho.com/docs/img/windows-jps.png) + +5. Verification successful: Open "`http://Server IP: Port in configuration file`" in the browser to access, for example:"`http://127.0.0.1:9190`" When the login interface appears, it is considered successful + + ![](https://alioss.timecho.com/docs/img/workbench-en.png) + + +### Step 4: Configure Instance Information + +1. Configure instance information: You only need to fill in the following information to connect to the instance + + ![](https://alioss.timecho.com/docs/img/workbench-en-1.jpeg) + + + | Field Name | Is It A Required Field | Field Meaning | Default Value | + | --------------- | ---------------------- | ------------------------------------------------------------ | ------ | + | Connection Type | Yes | The content filled in for different connection types varies, and supports selecting "single machine, cluster, dual active" | - | + | Instance Name | Yes | You can distinguish different instances based on their names, with a maximum input of 50 characters | - | + | Instance | Yes | Fill in the database address (`dn_rpc_address` field in the `iotdb/conf/iotdb-system.properties` file) and port number (`dn_rpc_port` field). Note: For clusters and dual active devices, clicking the "+" button supports entering multiple instance information | - | + | Prometheus | No | Fill in `http://:/app/v1/query` to view some monitoring information on the homepage. We recommend that you configure and use it | - | + | Username | Yes | Fill in the username for IoTDB, supporting input of 4 to 32 characters, including uppercase and lowercase letters, numbers, and special characters (! @ # $% ^&* () _+-=) | root | + | Enter Password | No | Fill in the password for IoTDB. To ensure the security of the database, we will not save the password. Please fill in the password yourself every time you connect to the instance or test | root | + +2. 
Test the accuracy of the information filled in: You can perform a connection test on the instance information by clicking the "Test" button + + ![](https://alioss.timecho.com/docs/img/workbench-en-2.png) + +## Appendix: IoTDB and Workbench Version Comparison Table + +| Workbench Version Number | Release Note | Supports IoTDB Versions | +| :------------------------: | :------------------------------------------------------------: | :-------------------------: | +| V1.4.0 | New tree model display and internationalization | V1.3.2 and above versions | +| V1.3.1 |New analysis methods have been added to the analysis function, and functions such as optimizing import templates have been optimized |V1.3.2 and above versions | +| V1.3.0 | Add database configuration function |V1.3.2 and above versions | +| V1.2.6 | Optimize the permission control function of each module | V1.3.1 and above versions | +| V1.2.5 | The visualization function has added the concept of "commonly used templates", and all interface optimization and page caching functions have been supplemented | V1.3.0 and above versions | +| V1.2.4 | The calculation function has added the "import and export" function, and the measurement point list has added the "time alignment" field | V1.2.2 and above versions | +| V1.2.3 | New "activation details" and analysis functions added to the homepage | V1.2.2 and above versions | +| V1.2.2 | Optimize the display content and other functions of "measurement point description" | V1.2.2 and above versions | +| V1.2.1 | New "Monitoring Panel" added to the data synchronization interface to optimize Prometheus prompt information | V1.2.2 and above versions | +| V1.2.0 | New Workbench version upgrade | V1.2.0 and above versions | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DBeaver.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DBeaver.md new file mode 100644 index 00000000..56b62d93 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DBeaver.md @@ -0,0 +1,92 @@ + + +# DBeaver + +DBeaver is a SQL client software application and a database administration tool. It can use the JDBC application programming interface (API) to interact with IoTDB via the JDBC driver. + +## DBeaver Installation + +* From DBeaver site: https://dbeaver.io/download/ + +## IoTDB Installation + +* Download binary version + * From IoTDB site: https://iotdb.apache.org/Download/ + * Version >= 0.13.0 +* Or compile from source code + * See https://github.com/apache/iotdb + +## Connect IoTDB and DBeaver + +1. Start IoTDB server + + ```shell + ./sbin/start-server.sh + ``` +2. Start DBeaver +3. Open Driver Manager + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/01.png) + +4. Create a new driver type for IoTDB + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) + +5. Download `iotdb-jdbc`, from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/),choose the corresponding jar file,download the suffix `jar-with-dependencies.jar` file. + ![](https://alioss.timecho.com/docs/img/20230920-192746.jpg) + +6. Add the downloaded jar file, then select `Find Class`. + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) + +7. 
Edit the driver Settings + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/05.png) + + ``` + Driver Name: IoTDB + Driver Type: Generic + URL Template: jdbc:iotdb://{host}:{port}/ + Default Port: 6667 + Default User: root + ``` + +8. Open New DataBase Connection and select iotdb + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/06.png) + +9. Edit JDBC Connection Settings + + ``` + JDBC URL: jdbc:iotdb://127.0.0.1:6667/ + Username: root + Password: root + ``` + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/07.png) + +10. Test Connection + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/08.png) + +11. Enjoy IoTDB with DBeaver + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/09.png) diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DataEase.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DataEase.md new file mode 100644 index 00000000..7e471aaf --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DataEase.md @@ -0,0 +1,228 @@ + +# DataEase + +## Product Overview + +1. Introduction to DataEase + + DataEase is an open-source data visualization and analysis tool that provides a drag-and-drop interface, allowing users to easily create charts and dashboards. It supports multiple data sources such as MySQL, SQL Server, Hive, ClickHouse, and DM, and can be integrated into other applications. This tool helps users quickly gain insights from their data and make informed decisions. For more detailed information, please refer to [DataEase official website](https://www.fit2cloud.com/dataease/index.html) + +
+ +
+ +2. Introduction to the DataEase-IoTDB Connector + + IoTDB can be efficiently integrated with DataEase through API data sources, and IoTDB data can be accessed through the Session interface using API data source plugins. This plugin supports customized data processing functions, providing users with greater flexibility and more diverse data operation options. +
+ +
+ +## Installation Requirements + +| **Preparation Content** | **Version Requirements** | +| :-------------------- | :----------------------------------------------------------- | +| IoTDB | Version not required, please refer to [Deployment Guidance](https://www.timecho-global.com/docs/UserGuide/latest/Deployment-and-Maintenance/IoTDB-Package_timecho.html) | +| JDK | Requires JDK 11 or higher (JDK 17 or above is recommended for optimal performance) | +| DataEase | Requires v1 series v1.18 version, please refer to the official [DataEase Installation Guide](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(V2.x is currently not supported. For integration with other versions, please contact Timecho) | +| DataEase-IoTDB Connector | Please contact Timecho for assistance | + +## Installation Steps + +Step 1: Please contact Timecho to obtain the file and unzip the installation package `iotdb-api-source-1.0.0.zip` + +Step 2: After extracting the files, modify the `application.properties` configuration file in the `config` folder + +- `server.port` can be modified as needed. +- `iotdb.nodeUrls` should be configured with the address and port of the IoTDB instance to be connected. +- `iotdb.user` should be set to the IoTDB username. +- `iotdb.password` should be set to the IoTDB password. + +```Properties +# Port on which the IoTDB API Source listens +server.port=8097 +# IoTDB instance addresses, multiple nodeUrls separated by ; +iotdb.nodeUrls=127.0.0.1:6667 +# IoTDB username +iotdb.user=root +# IoTDB password +iotdb.password=root +``` + +Step 3: Start up DataEase-IoTDB Connector + +- Foreground start + +```Shell +./sbin/start.sh +``` + +- Background start (add - d parameter) + +```Shell +./sbin/start.sh -d +``` + +Step 4: After startup, you can check whether the startup was successful through the log. + +```Shell + lsof -i:8097 // The port configured in the file where the IoTDB API Source listens +``` + +## Instructions + +### Sign in DataEase + +1. Sign in DataEase,access address: `http://[target server IP address]:80` +
+ +
+ +### Configure data source + +1. Navigate to "Data Source". +
+ +
+ +2. Click on the "+" on the top left corner, choose "API" at the bottom as data source. +
+ +
+ +3. Set the "Display Name" and add the API Data Source. +
+ +
+ +4. Set the name of the Dataset Table, select "Post" as the Request Type, fill in the address with `http://[IoTDB API Source]:[port]/getData>`. If operating on the local machine and using the default port, the address should be set to `http://127.0.0.1:8097/getData`. +
+ +
+ +5. In the "Request parameters"-"Request Body" configuration, set the format as "JSON". Please fill in the parameters according to the following example: + - timeseries:The full path of the series to be queried (currently only one series can be queried). + - limit:The number of entries to query (valid range is greater than 0 and less than 100,000). + + ```JSON + { + "timeseries": "root.ln.wf03.wt03.speed", + "limit": 1000 + } + ``` +
+ +
+ +6. In the "Request parameters"-"Request Body" configuration, set "Basic Auth" as the verification method, and enter the IoTDB username and password. +
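+
+   To check the interface outside DataEase, the same request can be sent from the command line. This sketch assumes the connector is running locally on the default port 8097 with the default IoTDB credentials (root/root) and reuses the series and limit from step 5:
+
+   ```Shell
+   # Query the DataEase-IoTDB Connector directly, using Basic Auth for the IoTDB user
+   curl -u root:root \
+     -H "Content-Type: application/json" \
+     -X POST http://127.0.0.1:8097/getData \
+     -d '{"timeseries": "root.ln.wf03.wt03.speed", "limit": 1000}'
+   ```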
+ +
+ +7. In the next step, results are returned in the "data" section. For example, it returns `time`, `rownumber` and `value` as shown in the interface below. The date type for each field also need to be specified. After completing the settings, click the "Save" button in the bottom. +
+ +
+ +8. Save the settings to complete creating new API data source. +
+ +
+ +9. You can now view and edit the data source and its detailed information under "API"-"Data Source". +
+ +
+ +### Configure the Dataset + +1. Create API dataset: Navigate to "Data Set",click on the "+" on the top left corner, select "API dataset" and choose the directory where this dataset is located to enter the New API Dataset interface. +
+ + +
+ +2. Select the newly created API data source and the corresponding dataset table, then define the DataSet Name. Save the settings to complete the creation of the dataset. +
+ + +
+ +3. Select the newly created dataset and navigate to "Field Manage", check the required fields (such as `rowNum`) and convert them to dimensions. +
+ +
+
+ 4. Configure update frequency: Click on "Add Task" under the "Update info" tab and set the following information:
+
+   - Task Name: Define the task name
+
+   - Update method: Select "Full update"
+
+   - Execution frequency: Set according to the actual situation. Considering the data retrieval speed of DataEase, it is recommended to set the update frequency to more than 5 seconds. For example, to set the update frequency to every 5 seconds, select "Expression setting" and configure the cron expression as `0/5 * * * * ? *`.
+   Click on "Confirm" to save the settings.
+ +
+ +5. The task is now successfully added. You can click "Execution record" to view the logs. +
+ +
+ +### Configure Dashboard + +1. Navigate to "Dashboard", click on "+" to create a directory, then click on "+" of the directory and select "Create Dashboard". +
+ +
+ +2. After setting up as needed, click on "Confirm". We will taking "Custom" setting as an example. +
+ +
+ +3. In the new dashboard interface, click on "Chart" to open a pop-up window for adding views. Select the previously created dataset and click on "Next". +
+ +
+ +4. Choose a chart type by need and define the chart title. We take "Base Line" as an example. Confirm to proceed. +
+ +
+ +5. In the chart configuration interface, drag and drop the `rowNum` field to the category axis (usually the X-axis) and the `value` field to the value axis (usually the Y-axis). +
+ +
+ +6. In the chart's category axis settings, set the sorting order to ascending, so that the data will be displayed in increasing order. Set the data refresh frequency to determine the frequency of chart updates. After completing these settings, you can further adjust other format and style options for the chart, such as color, size, etc., to meet display needs. Once adjustments are made, click the "Save" button to save the chart configuration. +>Since DataEase may cause the API data, originally returned in ascending order, to become disordered after automatically updating the dataset, it is necessary to manually specify the sorting order in the chart configuration. +
+ +
+ +7. After exiting the editing mode, you will be able to see the corresponding chart. +
+ +
\ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-IoTDB.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-IoTDB.md new file mode 100644 index 00000000..efb39723 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-IoTDB.md @@ -0,0 +1,215 @@ + + +# Apache Flink(IoTDB) + +IoTDB integration for [Apache Flink](https://flink.apache.org/). This module includes the IoTDB sink that allows a flink job to write events into timeseries, and the IoTDB source allowing reading data from IoTDB. + +## IoTDBSink + +To use the `IoTDBSink`, you need construct an instance of it by specifying `IoTDBSinkOptions` and `IoTSerializationSchema` instances. +The `IoTDBSink` send only one event after another by default, but you can change to batch by invoking `withBatchSize(int)`. + +### Example + +This example shows a case that sends data to a IoTDB server from a Flink job: + +- A simulated Source `SensorSource` generates data points per 1 second. +- Flink uses `IoTDBSink` to consume the generated data points and write the data into IoTDB. + +It is noteworthy that to use IoTDBSink, schema auto-creation in IoTDB should be enabled. + +```java +import org.apache.iotdb.flink.options.IoTDBSinkOptions; +import org.apache.iotdb.tsfile.file.metadata.enums.CompressionType; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; +import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding; + +import com.google.common.collect.Lists; +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; +import org.apache.flink.streaming.api.functions.source.SourceFunction; + +import java.security.SecureRandom; +import java.util.HashMap; +import java.util.Map; +import java.util.Random; + +public class FlinkIoTDBSink { + public static void main(String[] args) throws Exception { + // run the flink job on local mini cluster + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + + IoTDBSinkOptions options = new IoTDBSinkOptions(); + options.setHost("127.0.0.1"); + options.setPort(6667); + options.setUser("root"); + options.setPassword("root"); + + // If the server enables auto_create_schema, then we do not need to register all timeseries + // here. 
+ options.setTimeseriesOptionList( + Lists.newArrayList( + new IoTDBSinkOptions.TimeseriesOption( + "root.sg.d1.s1", TSDataType.DOUBLE, TSEncoding.GORILLA, CompressionType.SNAPPY))); + + IoTSerializationSchema serializationSchema = new DefaultIoTSerializationSchema(); + IoTDBSink ioTDBSink = + new IoTDBSink(options, serializationSchema) + // enable batching + .withBatchSize(10) + // how many connections to the server will be created for each parallelism + .withSessionPoolSize(3); + + env.addSource(new SensorSource()) + .name("sensor-source") + .setParallelism(1) + .addSink(ioTDBSink) + .name("iotdb-sink"); + + env.execute("iotdb-flink-example"); + } + + private static class SensorSource implements SourceFunction> { + boolean running = true; + Random random = new SecureRandom(); + + @Override + public void run(SourceContext context) throws Exception { + while (running) { + Map tuple = new HashMap(); + tuple.put("device", "root.sg.d1"); + tuple.put("timestamp", String.valueOf(System.currentTimeMillis())); + tuple.put("measurements", "s1"); + tuple.put("types", "DOUBLE"); + tuple.put("values", String.valueOf(random.nextDouble())); + + context.collect(tuple); + Thread.sleep(1000); + } + } + + @Override + public void cancel() { + running = false; + } + } +} + +``` + +### Usage + +* Launch the IoTDB server. +* Run `org.apache.iotdb.flink.FlinkIoTDBSink.java` to run the flink job on local mini cluster. + +## IoTDBSource +To use the `IoTDBSource`, you need to construct an instance of `IoTDBSource` by specifying `IoTDBSourceOptions` +and implementing the abstract method `convert()` in `IoTDBSource`. The `convert` methods defines how +you want the row data to be transformed. + +### Example +This example shows a case where data are read from IoTDB. +```java +import org.apache.iotdb.flink.options.IoTDBSourceOptions; +import org.apache.iotdb.rpc.IoTDBConnectionException; +import org.apache.iotdb.rpc.StatementExecutionException; +import org.apache.iotdb.rpc.TSStatusCode; +import org.apache.iotdb.session.Session; +import org.apache.iotdb.tsfile.file.metadata.enums.CompressionType; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; +import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding; +import org.apache.iotdb.tsfile.read.common.RowRecord; + +import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; + +import java.util.ArrayList; +import java.util.List; + +public class FlinkIoTDBSource { + + static final String LOCAL_HOST = "127.0.0.1"; + static final String ROOT_SG1_D1_S1 = "root.sg1.d1.s1"; + static final String ROOT_SG1_D1 = "root.sg1.d1"; + + public static void main(String[] args) throws Exception { + prepareData(); + + // run the flink job on local mini cluster + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + + IoTDBSourceOptions ioTDBSourceOptions = + new IoTDBSourceOptions("127.0.0.1", 6667, "root", "root", + "select s1 from " + ROOT_SG1_D1 + " align by device"); + + env.addSource( + new IoTDBSource(ioTDBSourceOptions) { + @Override + public RowRecord convert(RowRecord rowRecord) { + return rowRecord; + } + }) + .name("sensor-source") + .print() + .setParallelism(2); + env.execute(); + } + + /** + * Write some data to IoTDB + */ + private static void prepareData() throws IoTDBConnectionException, StatementExecutionException { + Session session = new Session(LOCAL_HOST, 6667, "root", "root"); + session.open(false); + try { + session.setStorageGroup("root.sg1"); + if (!session.checkTimeseriesExists(ROOT_SG1_D1_S1)) { 
+ session.createTimeseries( + ROOT_SG1_D1_S1, TSDataType.INT64, TSEncoding.RLE, CompressionType.SNAPPY); + List measurements = new ArrayList<>(); + List types = new ArrayList<>(); + measurements.add("s1"); + measurements.add("s2"); + measurements.add("s3"); + types.add(TSDataType.INT64); + types.add(TSDataType.INT64); + types.add(TSDataType.INT64); + + for (long time = 0; time < 100; time++) { + List values = new ArrayList<>(); + values.add(1L); + values.add(2L); + values.add(3L); + session.insertRecord(ROOT_SG1_D1, time, measurements, types, values); + } + } + } catch (StatementExecutionException e) { + if (e.getStatusCode() != TSStatusCode.PATH_ALREADY_EXIST_ERROR.getStatusCode()) { + throw e; + } + } + } +} +``` + +### Usage +Launch the IoTDB server. +Run org.apache.iotdb.flink.FlinkIoTDBSource.java to run the flink job on local mini cluster. + diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-TsFile.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-TsFile.md new file mode 100644 index 00000000..e1ea626d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-TsFile.md @@ -0,0 +1,180 @@ + + +# Apache Flink(TsFile) + +## About Flink-TsFile-Connector + +Flink-TsFile-Connector implements the support of Flink for external data sources of Tsfile type. +This enables users to read and write Tsfile by Flink via DataStream/DataSet API. + +With this connector, you can + +* load a single TsFile or multiple TsFiles(only for DataSet), from either the local file system or hdfs, into Flink +* load all files in a specific directory, from either the local file system or hdfs, into Flink + +## Quick Start + +### TsFileInputFormat Example + +1. create TsFileInputFormat with default RowRowRecordParser. + +```java +String[] filedNames = { + QueryConstant.RESERVED_TIME, + "device_1.sensor_1", + "device_1.sensor_2", + "device_1.sensor_3", + "device_2.sensor_1", + "device_2.sensor_2", + "device_2.sensor_3" +}; +TypeInformation[] typeInformations = new TypeInformation[] { + Types.LONG, + Types.FLOAT, + Types.INT, + Types.INT, + Types.FLOAT, + Types.INT, + Types.INT +}; +List paths = Arrays.stream(filedNames) + .filter(s -> !s.equals(QueryConstant.RESERVED_TIME)) + .map(Path::new) + .collect(Collectors.toList()); +RowTypeInfo rowTypeInfo = new RowTypeInfo(typeInformations, filedNames); +QueryExpression queryExpression = QueryExpression.create(paths, null); +RowRowRecordParser parser = RowRowRecordParser.create(rowTypeInfo, queryExpression.getSelectedSeries()); +TsFileInputFormat inputFormat = new TsFileInputFormat<>(queryExpression, parser); +``` + +2. Read data from the input format and print to stdout: + +DataStream: + +```java +StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment(); +inputFormat.setFilePath("source.tsfile"); +DataStream source = senv.createInput(inputFormat); +DataStream rowString = source.map(Row::toString); +Iterator result = DataStreamUtils.collect(rowString); +while (result.hasNext()) { + System.out.println(result.next()); +} +``` + +DataSet: + +```java +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); +inputFormat.setFilePath("source.tsfile"); +DataSet source = env.createInput(inputFormat); +List result = source.map(Row::toString).collect(); +for (String s : result) { + System.out.println(s); +} +``` + +### Example of TSRecordOutputFormat + +1. create TSRecordOutputFormat with default RowTSRecordConverter. 
+ +```java +String[] filedNames = { + QueryConstant.RESERVED_TIME, + "device_1.sensor_1", + "device_1.sensor_2", + "device_1.sensor_3", + "device_2.sensor_1", + "device_2.sensor_2", + "device_2.sensor_3" +}; +TypeInformation[] typeInformations = new TypeInformation[] { + Types.LONG, + Types.LONG, + Types.LONG, + Types.LONG, + Types.LONG, + Types.LONG, + Types.LONG +}; +RowTypeInfo rowTypeInfo = new RowTypeInfo(typeInformations, filedNames); +Schema schema = new Schema(); +schema.extendTemplate("template", new MeasurementSchema("sensor_1", TSDataType.INT64, TSEncoding.TS_2DIFF)); +schema.extendTemplate("template", new MeasurementSchema("sensor_2", TSDataType.INT64, TSEncoding.TS_2DIFF)); +schema.extendTemplate("template", new MeasurementSchema("sensor_3", TSDataType.INT64, TSEncoding.TS_2DIFF)); +RowTSRecordConverter converter = new RowTSRecordConverter(rowTypeInfo); +TSRecordOutputFormat outputFormat = new TSRecordOutputFormat<>(schema, converter); +``` + +2. write data via the output format: + +DataStream: + +```java +StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment(); +senv.setParallelism(1); +List data = new ArrayList<>(7); +data.add(new Tuple7(1L, 2L, 3L, 4L, 5L, 6L, 7L)); +data.add(new Tuple7(2L, 3L, 4L, 5L, 6L, 7L, 8L)); +data.add(new Tuple7(3L, 4L, 5L, 6L, 7L, 8L, 9L)); +data.add(new Tuple7(4L, 5L, 6L, 7L, 8L, 9L, 10L)); +data.add(new Tuple7(6L, 6L, 7L, 8L, 9L, 10L, 11L)); +data.add(new Tuple7(7L, 7L, 8L, 9L, 10L, 11L, 12L)); +data.add(new Tuple7(8L, 8L, 9L, 10L, 11L, 12L, 13L)); +outputFormat.setOutputFilePath(new org.apache.flink.core.fs.Path(path)); +DataStream source = senv.fromCollection( + data, Types.TUPLE(Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG)); +source.map(t -> { + Row row = new Row(7); + for (int i = 0; i < 7; i++) { + row.setField(i, t.getField(i)); + } + return row; +}).returns(rowTypeInfo).writeUsingOutputFormat(outputFormat); +senv.execute(); +``` + +DataSet: + +```java +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); +env.setParallelism(1); +List data = new ArrayList<>(7); +data.add(new Tuple7(1L, 2L, 3L, 4L, 5L, 6L, 7L)); +data.add(new Tuple7(2L, 3L, 4L, 5L, 6L, 7L, 8L)); +data.add(new Tuple7(3L, 4L, 5L, 6L, 7L, 8L, 9L)); +data.add(new Tuple7(4L, 5L, 6L, 7L, 8L, 9L, 10L)); +data.add(new Tuple7(6L, 6L, 7L, 8L, 9L, 10L, 11L)); +data.add(new Tuple7(7L, 7L, 8L, 9L, 10L, 11L, 12L)); +data.add(new Tuple7(8L, 8L, 9L, 10L, 11L, 12L, 13L)); +DataSet source = env.fromCollection( + data, Types.TUPLE(Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG)); +source.map(t -> { + Row row = new Row(7); + for (int i = 0; i < 7; i++) { + row.setField(i, t.getField(i)); + } + return row; +}).returns(rowTypeInfo).write(outputFormat, path); +env.execute(); +``` + diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Connector.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Connector.md new file mode 100644 index 00000000..ceb6ec9f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Connector.md @@ -0,0 +1,180 @@ + + +# Grafana(IoTDB) + +Grafana is an open source volume metrics monitoring and visualization tool, which can be used to display time series data and application runtime analysis. Grafana supports Graphite, InfluxDB and other major time series databases as data sources. 
IoTDB-Grafana-Connector is a connector which we developed to show time series data in IoTDB by reading data from IoTDB and sends to Grafana(https://grafana.com/). Before using this tool, make sure Grafana and IoTDB are correctly installed and started. + +## Installation and deployment + +### Install Grafana + +* Download url: https://grafana.com/grafana/download +* Version >= 4.4.1 + +### Install data source plugin + +* Plugin name: simple-json-datasource +* Download url: https://github.com/grafana/simple-json-datasource + +After downloading this plugin, use the grafana-cli tool to install SimpleJson from the commandline: + +``` +grafana-cli plugins install grafana-simple-json-datasource +``` + +Alternatively, manually download the .zip file and unpack it into grafana plugins directory. + +* `{grafana-install-directory}\data\plugins\` (Windows) +* `/var/lib/grafana/plugins` (Linux) +* `/usr/local/var/lib/grafana/plugins`(Mac) + +Then you need to restart grafana server, then you can use browser to visit grafana. + +If you see "SimpleJson" in "Type" of "Add data source" pages, then it is install successfully. + +Or, if you meet following errors: + +``` +Unsigned plugins were found during plugin initialization. Grafana Labs cannot guarantee the integrity of these plugins. We recommend only using signed plugins. +The following plugins are disabled and not shown in the list below: +``` + +Please try to find config file of grafana(eg. customer.ini in windows, and /etc/grafana/grafana.ini in linux), then add following configuration: + +``` +allow_loading_unsigned_plugins = "grafana-simple-json-datasource" +``` + +### Start Grafana +If Unix is used, Grafana will start automatically after installing, or you can run `sudo service grafana-server start` command. See more information [here](http://docs.grafana.org/installation/debian/). + +If Mac and `homebrew` are used to install Grafana, you can use `homebrew` to start Grafana. +First make sure homebrew/services is installed by running `brew tap homebrew/services`, then start Grafana using: `brew services start grafana`. +See more information [here](http://docs.grafana.org/installation/mac/). + +If Windows is used, start Grafana by executing grafana-server.exe, located in the bin directory, preferably from the command line. See more information [here](http://docs.grafana.org/installation/windows/). + +## IoTDB installation + +See https://github.com/apache/iotdb + +## IoTDB-Grafana-Connector installation + +```shell +git clone https://github.com/apache/iotdb.git +``` + +## Start IoTDB-Grafana-Connector + +* Option one + +Import the entire project, after the maven dependency is installed, directly run`iotdb/grafana-connector/rc/main/java/org/apache/iotdb/web/grafana`directory` TsfileWebDemoApplication.java`, this grafana connector is developed by springboot + +* Option two + +In `/grafana/target/`directory + +```shell +cd iotdb +mvn clean package -pl iotdb-connector/grafana-connector -am -Dmaven.test.skip=true +cd iotdb-connector/grafana-connector/target +java -jar iotdb-grafana-connector-{version}.war +``` + +If following output is displayed, then iotdb-grafana-connector connector is successfully activated. + +```shell +$ java -jar iotdb-grafana-connector-{version}.war + + . ____ _ __ _ _ + /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \ +( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \ + \\/ ___)| |_)| | | | | || (_| | ) ) ) ) + ' |____| .__|_| |_|_| |_\__, | / / / / + =========|_|==============|___/=/_/_/_/ + :: Spring Boot :: (v1.5.4.RELEASE) +... 
+``` + +To configure properties, move the `grafana-connector/src/main/resources/application.properties` to the same directory as the war package (`grafana/target`) + +## Explore in Grafana + +The default port of Grafana is 3000, see http://localhost:3000/ + +Username and password are both "admin" by default. + +### Add data source + +Select `Data Sources` and then `Add data source`, select `SimpleJson` in `Type` and `URL` is http://localhost:8888. +After that, make sure IoTDB has been started, click "Save & Test", and "Data Source is working" will be shown to indicate successful configuration. + + + + +### Design in dashboard + +Add diagrams in dashboard and customize your query. See http://docs.grafana.org/guides/getting_started/ + + + +## config grafana + +``` +# ip and port of IoTDB +spring.datasource.url=jdbc:iotdb://127.0.0.1:6667/ +spring.datasource.username=root +spring.datasource.password=root +spring.datasource.driver-class-name=org.apache.iotdb.jdbc.IoTDBDriver +server.port=8888 +# Use this value to set timestamp precision as "ms", "us" or "ns", which must to be same with the timestamp +# precision of Apache IoTDB engine. +timestamp_precision=ms + +# Use this value to set down sampling true/false +isDownSampling=true +# defaut sampling intervals +interval=1m +# aggregation function to use to downsampling the data (int, long, float, double) +# COUNT, FIRST_VALUE, LAST_VALUE, MAX_TIME, MAX_VALUE, AVG, MIN_TIME, MIN_VALUE, NOW, SUM +continuous_data_function=AVG +# aggregation function to use to downsampling the data (boolean, string) +# COUNT, FIRST_VALUE, LAST_VALUE, MAX_TIME, MIN_TIME, NOW +discrete_data_function=LAST_VALUE +``` + +The specific configuration information of interval is as follows + +<1h: no sampling + +1h~1d : intervals = 1m + +1d~30d:intervals = 1h + +\>30d:intervals = 1d + +After configuration, please re-run war package + +``` +java -jar iotdb-grafana-connector-{version}.war +``` + diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Plugin.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Plugin.md new file mode 100644 index 00000000..dc837046 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Plugin.md @@ -0,0 +1,298 @@ + + +# Grafana Plugin + + +Grafana is an open source volume metrics monitoring and visualization tool, which can be used to present time series data and analyze application runtime status. + +We developed the Grafana-Plugin for IoTDB, using the IoTDB REST service to present time series data and providing many visualization methods for time series data. +Compared with previous IoTDB-Grafana-Connector, current Grafana-Plugin performs more efficiently and supports more query types. So, **we recommend using Grafana-Plugin instead of IoTDB-Grafana-Connector**. + +## Installation and deployment + +### Install Grafana + +* Download url: https://grafana.com/grafana/download +* Version >= 9.3.0 + + +### Acquisition method of grafana plugin + +#### Download apache-iotdb-datasource from Grafana's official website + +Download url:https://grafana.com/api/plugins/apache-iotdb-datasource/versions/1.0.0/download + +### Install Grafana-Plugin + +### Method 1: Install using the grafana cli tool (recommended) + +* Use the grafana cli tool to install apache-iotdb-datasource from the command line. 
The command content is as follows: + +```shell +grafana-cli plugins install apache-iotdb-datasource +``` + +### Method 2: Install using the Grafana interface (recommended) + +* Click on Configuration ->Plugins ->Search IoTDB from local Grafana to install the plugin + +### Method 3: Manually install the grafana-plugin plugin (not recommended) + + +* Copy the front-end project target folder generated above to Grafana's plugin directory `${Grafana directory}\data\plugins\`。If there is no such directory, you can manually create it or start grafana and it will be created automatically. Of course, you can also modify the location of plugins. For details, please refer to the following instructions for modifying the location of Grafana's plugin directory. + +* Start Grafana (restart if the Grafana service is already started) + +For more details,please click [here](https://grafana.com/docs/grafana/latest/plugins/installation/) + +### Start Grafana + +Start Grafana with the following command in the Grafana directory: + +* Windows: + +```shell +bin\grafana-server.exe +``` +* Linux: + +```shell +sudo service grafana-server start +``` + +* MacOS: + +```shell +brew services start grafana +``` + +For more details,please click [here](https://grafana.com/docs/grafana/latest/installation/) + + + +### Configure IoTDB REST Service + +* Modify `{iotdb directory}/conf/iotdb-system.properties` as following: + +```properties +# Is the REST service enabled +enable_rest_service=true + +# the binding port of the REST service +rest_service_port=18080 +``` + +Start IoTDB (restart if the IoTDB service is already started) + + +## How to use Grafana-Plugin + +### Access Grafana dashboard + +Grafana displays data in a web page dashboard. Please open your browser and visit `http://:` when using it. + +* IP is the IP of the server where your Grafana is located, and Port is the running port of Grafana (default 3000). + +* The default login username and password are both `admin`. + + +### Add IoTDB as Data Source + +Click the `Settings` icon on the left, select the `Data Source` option, and then click `Add data source`. + + + + + +Select the `Apache IoTDB` data source. + +* Fill in `http://:` in the `URL` field + * ip is the host ip where your IoTDB server is located + * port is the running port of the REST service (default 18080). +* Enter the username and password of the IoTDB server + +Click `Save & Test`, and `Data source is working` will appear. + + + + +### Create a new Panel + +Click the `Dashboards` icon on the left, and select `Manage` option. + + + +Click the `New Dashboard` icon on the top right, and select `Add an empty panel` option. + + + +Grafana plugin supports SQL: Full Customized mode and SQL: Drop-down List mode, and the default mode is SQL: Full Customized mode. + + + +#### SQL: Full Customized input method + +Enter content in the SELECT, FROM , WHERE and CONTROL input box, where the WHERE and CONTROL input boxes are optional. + +If a query involves multiple expressions, we can click `+` on the right side of the SELECT input box to add expressions in the SELECT clause, or click `+` on the right side of the FROM input box to add a path prefix: + + + +SELECT input box: contents can be the time series suffix, function, udf, arithmetic expression, or nested expressions. You can also use the as clause to rename the result. 
+ +Here are some examples of valid SELECT content: + +* `s1` +* `top_k(s1, 'k'='1') as top` +* `sin(s1) + cos(s1 + s2)` +* `udf(s1) as "alias"` + +FROM input box: contents must be the prefix path of the time series, such as `root.sg.d`. + +WHERE input box: contents should be the filter condition of the query, such as `time > 0` or `s1 < 1024 and s2 > 1024`. + +CONTROL input box: contents should be a special clause that controls the query type and output format. +The GROUP BY input box supports the use of grafana's global variables to obtain the current time interval changes $__from (start time), $__to (end time) + +Here are some examples of valid CONTROL content: + +* `GROUP BY ([$__from, $__to), 1d)` +* `GROUP BY ([$__from, $__to),3h,1d)` +* `GROUP BY ([2017-11-01T00:00:00, 2017-11-07T23:00:00), 1d)` +* `GROUP BY ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d)` +* `GROUP BY ([$__from, $__to), 1m) FILL (PREVIOUSUNTILLAST)` +* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (PREVIOUSUNTILLAST)` +* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (PREVIOUS, 1m)` +* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (LINEAR, 5m, 5m)` +* `GROUP BY ((2017-11-01T00:00:00, 2017-11-07T23:00:00], 1d), LEVEL=1` +* `GROUP BY ([0, 20), 2ms, 3ms), LEVEL=1` + + +Tip: Statements like `select * from root.xx.**` are not recommended because those statements may cause OOM. + +#### SQL: Drop-down List + +Select a time series in the TIME-SERIES selection box, select a function in the FUNCTION option, and enter the contents in the SAMPLING INTERVAL、SLIDING STEP、LEVEL、FILL input boxes, where TIME-SERIES is a required item and the rest are non required items. + + + +### Support for variables and template functions + +Both SQL: Full Customized and SQL: Drop-down List input methods support the variable and template functions of grafana. In the following example, raw input method is used, and aggregation is similar. + +After creating a new Panel, click the Settings button in the upper right corner: + + + +Select `Variables`, click `Add variable`: + + + +Example 1:Enter `Name`, `Label`, and `Query`, and then click the `Update` button: + + + +Apply Variables, enter the variable in the `grafana panel` and click the `save` button: + + + +Example 2: Nested use of variables: + + + + + + + + +Example 3: using function variables + + + + + +The Name in the above figure is the variable name and the variable name we will use in the panel in the future. Label is the display name of the variable. If it is empty, the variable of Name will be displayed. Otherwise, the name of the Label will be displayed. +There are Query, Custom, Text box, Constant, DataSource, Interval, Ad hoc filters, etc. in the Type drop-down, all of which can be used in IoTDB's Grafana Plugin +For a more detailed introduction to usage, please check the official manual (https://grafana.com/docs/grafana/latest/variables/) + +In addition to the examples above, the following statements are supported: + +* `show databases` +* `show timeseries` +* `show child nodes` +* `show all ttl` +* `show latest timeseries` +* `show devices` +* `select xx from root.xxx limit xx 等sql 查询` + +Tip: If the query field contains Boolean data, the result value will be converted to 1 by true and 0 by false. + +### Grafana alert function + +This plugin supports Grafana alert function. + +1. In the Grafana panel, click the `alerting` button, as shown in the following figure: + + + +2. 
Click `Create alert rule from this panel`, as shown in the figure below: + + + +3. Set query and alarm conditions in step 1. Conditions represent query conditions, and multiple combined query conditions can be configured. As shown below: + + +The query condition in the figure: `min() OF A IS BELOW 0`, means that the condition will be triggered when the minimum value in the A tab is 0, click this function to change it to another function. + +Tip: Queries used in alert rules cannot contain any template variables. Currently we only support AND and OR operators between conditions, which are executed serially. +For example, we have 3 conditions in the following order: Condition: B (Evaluates to: TRUE) OR Condition: C (Evaluates to: FALSE) and Condition: D (Evaluates to: TRUE) So the result will evaluate to ((True or False ) and right) = right. + + +4. After selecting indicators and alarm rules, click the `Preview` button to preview the data as shown in the figure below: + + + +5. In step 2, specify the alert evaluation interval, and for `Evaluate every`, specify the evaluation frequency. Must be a multiple of 10 seconds. For example, 1m, 30s. + For `Evaluate for`, specify the duration before the alert fires. As shown below: + + + +6. In step 3, add the storage location, rule group, and other metadata associated with the rule. Where `Rule name` specifies the name of the rule. Rule names must be unique. + + + +7. In step 4, add a custom label. Add a custom label by selecting an existing key-value pair from the drop-down list, or add a new label by entering a new key or value. As shown below: + + + +8. Click `Save` to save the rule or click `Save and Exit` to save the rule and return to the alerts page. + +9. Commonly used alarm states include `Normal`, `Pending`, `Firing` and other states, as shown in the figure below: + + + + +10. We can also configure `Contact points` for alarms to receive alarm notifications. For more detailed operations, please refer to the official document (https://grafana.com/docs/grafana/latest/alerting/manage-notifications/create-contact-point/). + +## More Details about Grafana + +For more details about Grafana operation, please refer to the official Grafana documentation: http://docs.grafana.org/guides/getting_started/. diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Hive-TsFile.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Hive-TsFile.md new file mode 100644 index 00000000..e8b4dc30 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Hive-TsFile.md @@ -0,0 +1,170 @@ + +# Apache Hive(TsFile) + +## About Hive-TsFile-Connector + +Hive-TsFile-Connector implements the support of Hive for external data sources of Tsfile type. This enables users to operate TsFile by Hive. + +With this connector, you can + +* Load a single TsFile, from either the local file system or hdfs, into hive +* Load all files in a specific directory, from either the local file system or hdfs, into hive +* Query the tsfile through HQL. +* As of now, the write operation is not supported in hive-connector. So, insert operation in HQL is not allowed while operating tsfile through hive. + +## System Requirements + +|Hadoop Version |Hive Version | Java Version | TsFile | +|------------- |------------ | ------------ |------------ | +| `2.7.3` or `3.2.1` | `2.3.6` or `3.1.2` | `1.8` | `1.0.0`| + +> Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/iotdb/tree/master/tsfile. 
+ +## Data Type Correspondence + +| TsFile data type | Hive field type | +| ---------------- | --------------- | +| BOOLEAN | Boolean | +| INT32 | INT | +| INT64 | BIGINT | +| FLOAT | Float | +| DOUBLE | Double | +| TEXT | STRING | + + +## Add Dependency For Hive + +To use hive-connector in hive, we should add the hive-connector jar into hive. + +After downloading the code of iotdb from , you can use the command of `mvn clean package -pl iotdb-connector/hive-connector -am -Dmaven.test.skip=true -P get-jar-with-dependencies` to get a `hive-connector-X.X.X-jar-with-dependencies.jar`. + +Then in hive, use the command of `add jar XXX` to add the dependency. For example: + +``` +hive> add jar /Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar; + +Added [/Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar] to class path +Added resources: [/Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar] +``` + + +## Create Tsfile-backed Hive tables + +To create a Tsfile-backed table, specify the `serde` as `org.apache.iotdb.hive.TsFileSerDe`, +specify the `inputformat` as `org.apache.iotdb.hive.TSFHiveInputFormat`, +and the `outputformat` as `org.apache.iotdb.hive.TSFHiveOutputFormat`. + +Also provide a schema which only contains two fields: `time_stamp` and `sensor_id` for the table. +`time_stamp` is the time value of the time series +and `sensor_id` is the sensor name to extract from the tsfile to hive such as `sensor_1`. +The name of the table can be any valid table names in hive. + +Also a location provided for hive-connector to pull the most current data for the table. + +The location should be a specific directory on your local file system or HDFS to set up Hadoop. +If it is in your local file system, the location should look like `file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/` + +Last, set the `device_id` in `TBLPROPERTIES` to the device name you want to analyze. + +For example: + +``` +CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1( + time_stamp TIMESTAMP, + sensor_1 BIGINT) +ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe' +STORED AS + INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat' + OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat' +LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/' +TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1'); +``` +In this example, the data of `root.baic2.WWS.leftfrontdoor.plc1.sensor_1` is pulled from the directory of `/data/data/sequence/root.baic2.WWS.leftfrontdoor/`. +This table results in a description as below: + +``` +hive> describe only_sensor_1; +OK +time_stamp timestamp from deserializer +sensor_1 bigint from deserializer +Time taken: 0.053 seconds, Fetched: 2 row(s) +``` +At this point, the Tsfile-backed table can be worked with in Hive like any other table. + +## Query from TsFile-backed Hive tables + +Before we do any queries, we should set the `hive.input.format` in hive by executing the following command. + +``` +hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; +``` + +Now, we already have an external table named `only_sensor_1` in hive. +We can use any query operations through HQL to analyse it. 
+ +For example: + +### Select Clause Example + +``` +hive> select * from only_sensor_1 limit 10; +OK +1 1000000 +2 1000001 +3 1000002 +4 1000003 +5 1000004 +6 1000005 +7 1000006 +8 1000007 +9 1000008 +10 1000009 +Time taken: 1.464 seconds, Fetched: 10 row(s) +``` + +### Aggregate Clause Example + +``` +hive> select count(*) from only_sensor_1; +WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. +Query ID = jackietien_20191016202416_d1e3e233-d367-4453-b39a-2aac9327a3b6 +Total jobs = 1 +Launching Job 1 out of 1 +Number of reduce tasks determined at compile time: 1 +In order to change the average load for a reducer (in bytes): + set hive.exec.reducers.bytes.per.reducer= +In order to limit the maximum number of reducers: + set hive.exec.reducers.max= +In order to set a constant number of reducers: + set mapreduce.job.reduces= +Job running in-process (local Hadoop) +2019-10-16 20:24:18,305 Stage-1 map = 0%, reduce = 0% +2019-10-16 20:24:27,443 Stage-1 map = 100%, reduce = 100% +Ended Job = job_local867757288_0002 +MapReduce Jobs Launched: +Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS +Total MapReduce CPU Time Spent: 0 msec +OK +1000000 +Time taken: 11.334 seconds, Fetched: 1 row(s) +``` + diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md new file mode 100644 index 00000000..c2d3784d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md @@ -0,0 +1,273 @@ + + +# Ignition + +## Product Overview + +1. Introduction to Ignition + + Ignition is a web-based monitoring and data acquisition tool (SCADA) - an open and scalable universal platform. Ignition allows you to more easily control, track, display, and analyze all data of your enterprise, enhancing business capabilities. For more introduction details, please refer to [Ignition Official Website](https://docs.inductiveautomation.com/docs/8.1/getting-started/introducing-ignition) + +2. Introduction to the Ignition-IoTDB Connector + + The ignition-IoTDB Connector is divided into two modules: the ignition-IoTDB Connector,Ignition-IoTDB With JDBC。 Among them: + + - Ignition-IoTDB Connector: Provides the ability to store data collected by Ignition into IoTDB, and also supports data reading in Components. It injects script interfaces such as `system. iotdb. insert`and`system. iotdb. query`to facilitate programming in Ignition + - Ignition-IoTDB With JDBC: Ignition-IoTDB With JDBC can be used in the`Transaction Groups`module and is not applicable to the`Tag Historian`module. It can be used for custom writing and querying. + + The specific relationship and content between the two modules and ignition are shown in the following figure. + + ![](https://alioss.timecho.com/docs/img/20240703114443.png) + +## Installation Requirements + +| **Preparation Content** | Version Requirements | +| ------------------------------- | ------------------------------------------------------------ | +| IoTDB | Version 1.3.1 and above are required to be installed, please refer to IoTDB for installation [Deployment Guidance](../Deployment-and-Maintenance/IoTDB-Package_timecho.md) | +| Ignition | Requirement: 8.1 version (8.1.37 and above) of version 8.1 must be installed. 
Please refer to the Ignition official website for installation [Installation Guidance](https://docs.inductiveautomation.com/docs/8.1/getting-started/installing-and-upgrading)(Other versions are compatible, please contact the business department for more information) | +| Ignition-IoTDB Connector module | Please contact Business to obtain | +| Ignition-IoTDB With JDBC module | Download address:https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/ | + +## Instruction Manual For Ignition-IoTDB Connector + +### Introduce + +The Ignition-IoTDB Connector module can store data in a database connection associated with the historical database provider. The data is directly stored in a table in the SQL database based on its data type, as well as a millisecond timestamp. Store data only when making changes based on the value pattern and dead zone settings on each label, thus avoiding duplicate and unnecessary data storage. + +The Ignition-IoTDB Connector provides the ability to store the data collected by Ignition into IoTDB. + +### Installation Steps + +Step 1: Enter the `Configuration` - `System` - `Modules` module and click on the `Install or Upgrade a Module` button at the bottom + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-1.PNG) + +Step 2: Select the obtained `modl`, select the file and upload it, click `Install`, and trust the relevant certificate. + +![](https://alioss.timecho.com/docs/img/20240703-151030.png) + +Step 3: After installation is completed, you can see the following content + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-3.PNG) + +Step 4: Enter the `Configuration` - `Tags` - `History` module and click on `Create new Historical Tag Provider` below + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-4.png) + +Step 5: Select `IoTDB` and fill in the configuration information + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-5.PNG) + +The configuration content is as follows: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Name | Description | Default Value | Notes |
| ---- | ----------- | ------------- | ----- |
| **Main** |  |  |  |
| Provider Name | Provider Name | - |  |
| Enabled |  | true | The provider can only be used when it is true |
| Description | Description | - |  |
| **IoTDB Settings** |  |  |  |
| Host Name | The address of the target IoTDB instance | - |  |
| Port Number | The port of the target IoTDB instance | 6667 |  |
| Username | The username of the target IoTDB | - |  |
| Password | Password for target IoTDB | - |  |
| Database Name | The database name to be stored, starting with root, such as root.db | - |  |
| Pool Size | Size of SessionPool | 50 | Can be configured as needed |
| **Store and Forward Settings** | Just keep it as default |  |  |
+ + + +### Instructions + +#### Configure Historical Data Storage + +- After configuring the `Provider`, you can use the `IoTDB Tag Historian` in the `Designer`, just like using other `Providers`. Right click on the corresponding `Tag` and select `Edit Tag (s) `, then select the History category in the Tag Editor + + ![](https://alioss.timecho.com/docs/img/ignition-7.png) + +- Set `History Disabled` to `true`, select `Storage Provider` as the `Provider` created in the previous step, configure other parameters as needed, click `OK`, and then save the project. At this point, the data will be continuously stored in the 'IoTDB' instance according to the set content. + + ![](https://alioss.timecho.com/docs/img/ignition-8.png) + +#### Read Data + +- You can also directly select the tags stored in IoTDB under the Data tab of the Report + + ![](https://alioss.timecho.com/docs/img/ignition-9.png) + +- You can also directly browse relevant data in Components + + ![](https://alioss.timecho.com/docs/img/ignition-10.png) + +#### Script module: This function can interact with IoTDB + +1. system.iotdb.insert: + + +- Script Description: Write data to an IoTDB instance + +- Script Definition: + + `system.iotdb.insert(historian, deviceId, timestamps, measurementNames, measurementValues)` + +- Parameter: + + - `str historian`:The name of the corresponding IoTDB Tag Historian Provider + - `str deviceId`:The deviceId written, excluding the configured database, such as Sine + - `long[] timestamps`:List of timestamps for written data points + - `str[] measurementNames`:List of names for written physical quantities + - `str[][] measurementValues`:The written data point data corresponds to the timestamp list and physical quantity name list + +- Return Value: None + +- Available Range:Client, Designer, Gateway + +- Usage example: + + ```shell + system.iotdb.insert("IoTDB", "Sine", [system.date.now()],["measure1","measure2"],[["val1","val2"]]) + ``` + +2. system.iotdb.query: + + +- Script Description:Query the data written to the IoTDB instance + +- Script Definition: + + `system.iotdb.query(historian, sql)` + +- Parameter: + + - `str historian`:The name of the corresponding IoTDB Tag Historian Provider + - `str sql`:SQL statement to be queried + +- Return Value: + Query Results:`List>` + +- Available Range:Client, Designer, Gateway + +- Usage example: + + ```Python + system.iotdb.query("IoTDB", "select * from root.db.Sine where time > 1709563427247") + ``` + +## Ignition-IoTDB With JDBC + +### Introduce + + Ignition-IoTDB With JDBC provides a JDBC driver that allows users to connect and query the Ignition IoTDB database using standard JDBC APIs + +### Installation Steps + +Step 1: Enter the `Configuration` - `Databases` -`Drivers` module and create the `Translator` + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%20With%20JDBC-1.png) + +Step 2: Enter the `Configuration` - `Databases` - `Drivers` module, create a `JDBC Driver` , select the `Translator` configured in the previous step, and upload the downloaded `IoTDB JDBC`. Set the Classname to `org. apache. iotdb. 
jdbc.IoTDBDriver` + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%20With%20JDBC-2.png) + +Step 3: Enter the `Configuration` - `Databases` - `Connections` module, create a new `Connections` , select the`IoTDB Driver` created in the previous step for `JDBC Driver`, configure the relevant information, and save it to use + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%20With%20JDBC-3.png) + +### Instructions + +#### Data Writing + +Select the previously created `Connection` from the `Data Source` in the `Transaction Groups` + +- `Table name`needs to be set as the complete device path starting from root +- Uncheck `Automatically create table` +- `Store timestame to` configure as time + +Do not select other options, set the fields, and after `enabled` , the data will be installed and stored in the corresponding IoTDB + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E5%86%99%E5%85%A5-1.png) + +#### Query + +- Select `Data Source` in the `Database Query Browser` and select the previously created `Connection` to write an SQL statement to query the data in IoTDB + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E6%9F%A5%E8%AF%A2-ponz.png) + diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/NiFi-IoTDB.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/NiFi-IoTDB.md new file mode 100644 index 00000000..531c5119 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/NiFi-IoTDB.md @@ -0,0 +1,141 @@ + +# Apache NiFi + +## Apache NiFi Introduction + +Apache NiFi is an easy to use, powerful, and reliable system to process and distribute data. + +Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. + +Apache NiFi includes the following capabilities: + +* Browser-based user interface + * Seamless experience for design, control, feedback, and monitoring +* Data provenance tracking + * Complete lineage of information from beginning to end +* Extensive configuration + * Loss-tolerant and guaranteed delivery + * Low latency and high throughput + * Dynamic prioritization + * Runtime modification of flow configuration + * Back pressure control +* Extensible design + * Component architecture for custom Processors and Services + * Rapid development and iterative testing +* Secure communication + * HTTPS with configurable authentication strategies + * Multi-tenant authorization and policy management + * Standard protocols for encrypted communication including TLS and SSH + +## PutIoTDBRecord + +This is a processor that reads the content of the incoming FlowFile as individual records using the configured 'Record Reader' and writes them to Apache IoTDB using native interface. + +### Properties of PutIoTDBRecord + +| property | description | default value | necessary | +|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------- | --------- | +| Host | The host of IoTDB. | null | true | +| Port | The port of IoTDB. | 6667 | true | +| Username | Username to access the IoTDB. | null | true | +| Password | Password to access the IoTDB. | null | true | +| Prefix | The Prefix begin with root. that will be add to the tsName in data.
It can be updated by expression language. | null | true |
| Time | The name of the time field | null | true |
| Record Reader | Specifies the type of Record Reader controller service to use
for parsing the incoming data and determining the schema. | null | true |
| Schema | The schema that IoTDB needs is not well supported by NiFi's schema inference.
Therefore, you can define the schema here.
Besides, you can set encoding type and compression type by this method.
If you don't set this property, the inferred schema will be used.
It can be updated by expression language. | null | false | +| Aligned | Whether using aligned interface? It can be updated by expression language. | false | false | +| MaxRowNumber | Specifies the max row number of each tablet. It can be updated by expression language. | 1024 | false | + +### Inferred Schema of Flowfile + +There are a couple of rules about flowfile: + +1. The flowfile can be read by `Record Reader`. +2. The schema of flowfile must contain a time field with name set in Time property. +3. The data type of time must be `STRING` or `LONG`. +4. Fields excepted time must start with `root.`. +5. The supported data types are `INT`, `LONG`, `FLOAT`, `DOUBLE`, `BOOLEAN`, `TEXT`. + +### Convert Schema by property + +As mentioned above, converting schema by property which is more flexible and stronger than inferred schema. + +The structure of property `Schema`: + +```json +{ + "fields": [{ + "tsName": "s1", + "dataType": "INT32", + "encoding": "RLE", + "compressionType": "GZIP" + }, { + "tsName": "s2", + "dataType": "INT64", + "encoding": "RLE", + "compressionType": "GZIP" + }] +} +``` + +**Note** + +1. The first column must be `Time`. The rest must be arranged in the same order as in `field` of JSON. +1. The JSON of schema must contain `timeType` and `fields`. +2. There are only two options `LONG` and `STRING` for `timeType`. +3. The columns `tsName` and `dataType` must be set. +4. The property `Prefix` will be added to tsName as the field name when add data to IoTDB. +5. The supported `dataTypes` are `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOLEAN`, `TEXT`. +6. The supported `encoding` are `PLAIN`, `DICTIONARY`, `RLE`, `DIFF`, `TS_2DIFF`, `BITMAP`, `GORILLA_V1`, `REGULAR`, `GORILLA`, `CHIMP`, `SPRINTZ`, `RLBE`. +7. The supported `compressionType` are `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZO`, `SDT`, `PAA`, `PLA`, `LZ4`, `ZSTD`, `LZMA2`. + +## Relationships + +| relationship | description | +| ------------ | ---------------------------------------------------- | +| success | Data can be written correctly or flow file is empty. | +| failure | The shema or flow file is abnormal. | + + +## QueryIoTDBRecord + +This is a processor that reads the sql query from the incoming FlowFile and using it to query the result from IoTDB using native interface. Then it use the configured 'Record Writer' to generate the flowfile + +### Properties of QueryIoTDBRecord + +| property | description | default value | necessary | +|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------| --------- | +| Host | The host of IoTDB. | null | true | +| Port | The port of IoTDB. | 6667 | true | +| Username | Username to access the IoTDB. | null | true | +| Password | Password to access the IoTDB. | null | true | +| Record Writer | Specifies the Controller Service to use for writing results to a FlowFile. The Record Writer may use Inherit Schema to emulate the inferred schema behavior, i.e. An explicit schema need not be defined in the writer, and will be supplied by the same logic used to infer the schema from the column types. | null | true | +| iotdb-query | The IoTDB query to execute.
Note: If there are incoming connections, then the query is created from incoming FlowFile's content otherwise"it is created from this property. | null | false | +| iotdb-query-chunk-size | Chunking can be used to return results in a stream of smaller batches (each has a partial results up to a chunk size) rather than as a single response. Chunking queries can return an unlimited number of rows. Note: Chunking is enable when result chunk size is greater than 0 | 0 | false | + + +## Relationships + +| relationship | description | +| ------------ | ---------------------------------------------------- | +| success | Data can be written correctly or flow file is empty. | +| failure | The shema or flow file is abnormal. | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-IoTDB.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-IoTDB.md new file mode 100644 index 00000000..7e03da5c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-IoTDB.md @@ -0,0 +1,232 @@ + + +# Apache Spark(IoTDB) + +## Supported Versions + +Supported versions of Spark and Scala are as follows: + +| Spark Version | Scala Version | +|----------------|---------------| +| `2.4.0-latest` | `2.11, 2.12` | + +## Precautions + +1. The current version of `spark-iotdb-connector` supports Scala `2.11` and `2.12`, but not `2.13`. +2. `spark-iotdb-connector` supports usage in Spark for both Java, Scala, and PySpark. + +## Deployment + +`spark-iotdb-connector` has two use cases: IDE development and `spark-shell` debugging. + +### IDE Development + +For IDE development, simply add the following dependency to the `pom.xml` file: + +``` xml + + org.apache.iotdb + + spark-iotdb-connector_2.12.10 + ${iotdb.version} + +``` + +### `spark-shell` Debugging + +To use `spark-iotdb-connector` in `spark-shell`, you need to download the `with-dependencies` version of the jar package +from the official website. After that, copy the jar package to the `${SPARK_HOME}/jars` directory. +Simply execute the following command: + +```shell +cp spark-iotdb-connector_2.12.10-${iotdb.version}.jar $SPARK_HOME/jars/ +``` + +In addition, to ensure that spark can use JDBC and IoTDB connections, you need to do the following: + +Run the following command to compile the IoTDB JDBC connector: + +```shell +mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies +``` + +The compiled jar package is located in the following directory: + +```shell +$IoTDB_HOME/iotdb-client/jdbc/target/iotdb-jdbc-{version}-SNAPSHOT-jar-with-dependencies.jar +``` + +At last, copy the jar package to the ${SPARK_HOME}/jars directory. 
Simply execute the following command: + +```shell +cp iotdb-jdbc-{version}-SNAPSHOT-jar-with-dependencies.jar $SPARK_HOME/jars/ +``` + +## Usage + +### Parameters + +| Parameter | Description | Default Value | Scope | Can be Empty | +|--------------|--------------------------------------------------------------------------------------------------------------|---------------|-------------|--------------| +| url | Specifies the JDBC URL of IoTDB | null | read, write | false | +| user | The username of IoTDB | root | read, write | true | +| password | The password of IoTDB | root | read, write | true | +| sql | Specifies the SQL statement for querying | null | read | true | +| numPartition | Specifies the partition number of the DataFrame when in read, and the write concurrency number when in write | 1 | read, write | true | +| lowerBound | The start timestamp of the query (inclusive) | 0 | read | true | +| upperBound | The end timestamp of the query (inclusive) | 0 | read | true | + +### Reading Data from IoTDB + +Here is an example that demonstrates how to read data from IoTDB into a DataFrame: + +```scala +import org.apache.iotdb.spark.db._ + +val df = spark.read.format("org.apache.iotdb.spark.db") + .option("user", "root") + .option("password", "root") + .option("url", "jdbc:iotdb://127.0.0.1:6667/") + .option("sql", "select ** from root") // query SQL + .option("lowerBound", "0") // lower timestamp bound + .option("upperBound", "100000000") // upper timestamp bound + .option("numPartition", "5") // number of partitions + .load + +df.printSchema() + +df.show() +``` + +### Writing Data to IoTDB + +Here is an example that demonstrates how to write data to IoTDB: + +```scala +// Construct narrow table data +val df = spark.createDataFrame(List( + (1L, "root.test.d0", 1, 1L, 1.0F, 1.0D, true, "hello"), + (2L, "root.test.d0", 2, 2L, 2.0F, 2.0D, false, "world"))) + +val dfWithColumn = df.withColumnRenamed("_1", "Time") + .withColumnRenamed("_2", "Device") + .withColumnRenamed("_3", "s0") + .withColumnRenamed("_4", "s1") + .withColumnRenamed("_5", "s2") + .withColumnRenamed("_6", "s3") + .withColumnRenamed("_7", "s4") + .withColumnRenamed("_8", "s5") + +// Write narrow table data +dfWithColumn + .write + .format("org.apache.iotdb.spark.db") + .option("url", "jdbc:iotdb://127.0.0.1:6667/") + .save + +// Construct wide table data +val df = spark.createDataFrame(List( + (1L, 1, 1L, 1.0F, 1.0D, true, "hello"), + (2L, 2, 2L, 2.0F, 2.0D, false, "world"))) + +val dfWithColumn = df.withColumnRenamed("_1", "Time") + .withColumnRenamed("_2", "root.test.d0.s0") + .withColumnRenamed("_3", "root.test.d0.s1") + .withColumnRenamed("_4", "root.test.d0.s2") + .withColumnRenamed("_5", "root.test.d0.s3") + .withColumnRenamed("_6", "root.test.d0.s4") + .withColumnRenamed("_7", "root.test.d0.s5") + +// Write wide table data +dfWithColumn.write.format("org.apache.iotdb.spark.db") + .option("url", "jdbc:iotdb://127.0.0.1:6667/") + .option("numPartition", "10") + .save +``` + +### Wide and Narrow Table Conversion + +Here are examples of how to convert between wide and narrow tables: + +* From wide to narrow + +```scala +import org.apache.iotdb.spark.db._ + +val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root.** where time < 1100 and time > 1000").load +val narrow_df = Transformer.toNarrowForm(spark, wide_df) +``` + +* From narrow to wide + +```scala +import org.apache.iotdb.spark.db._ + +val wide_df = Transformer.toWideForm(spark, 
narrow_df) +``` + +## Wide and Narrow Tables + +Using the TsFile structure as an example: there are three measurements in the TsFile pattern, +namely `Status`, `Temperature`, and `Hardware`. The basic information for each of these three measurements is as +follows: + +| Name | Type | Encoding | +|-------------|---------|----------| +| Status | Boolean | PLAIN | +| Temperature | Float | RLE | +| Hardware | Text | PLAIN | + +The existing data in the TsFile is as follows: + +* `d1:root.ln.wf01.wt01` +* `d2:root.ln.wf02.wt02` + +| time | d1.status | time | d1.temperature | time | d2.hardware | time | d2.status | +|------|-----------|------|----------------|------|-------------|------|-----------| +| 1 | True | 1 | 2.2 | 2 | "aaa" | 1 | True | +| 3 | True | 2 | 2.2 | 4 | "bbb" | 2 | False | +| 5 | False | 3 | 2.1 | 6 | "ccc" | 4 | True | + +The wide (default) table form is as follows: + +| Time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware | +|------|-------------------------------|--------------------------|----------------------------|-------------------------------|--------------------------|----------------------------| +| 1 | null | true | null | 2.2 | true | null | +| 2 | null | false | aaa | 2.2 | null | null | +| 3 | null | null | null | 2.1 | true | null | +| 4 | null | true | bbb | null | null | null | +| 5 | null | null | null | null | false | null | +| 6 | null | null | ccc | null | null | null | + +You can also use the narrow table format as shown below: + +| Time | Device | status | hardware | temperature | +|------|-------------------|--------|----------|-------------| +| 1 | root.ln.wf02.wt01 | true | null | 2.2 | +| 1 | root.ln.wf02.wt02 | true | null | null | +| 2 | root.ln.wf02.wt01 | null | null | 2.2 | +| 2 | root.ln.wf02.wt02 | false | aaa | null | +| 3 | root.ln.wf02.wt01 | true | null | 2.1 | +| 4 | root.ln.wf02.wt02 | true | bbb | null | +| 5 | root.ln.wf02.wt01 | false | null | null | +| 6 | root.ln.wf02.wt02 | null | ccc | null | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-TsFile.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-TsFile.md new file mode 100644 index 00000000..ed00538c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-TsFile.md @@ -0,0 +1,315 @@ + + +# Apache Spark(TsFile) + +## About Spark-TsFile-Connector + +Spark-TsFile-Connector implements the support of Spark for external data sources of Tsfile type. This enables users to read, write and query Tsfile by Spark. + +With this connector, you can + +* load a single TsFile, from either the local file system or hdfs, into Spark +* load all files in a specific directory, from either the local file system or hdfs, into Spark +* write data from Spark into TsFile + +## System Requirements + +|Spark Version | Scala Version | Java Version | TsFile | +|:-------------: | :-------------: | :------------: |:------------: | +| `2.4.3` | `2.11.8` | `1.8` | `1.0.0`| + +> Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/iotdb/tree/master/tsfile. 
+> Currently we only support spark version 2.4.3 and there are some known issue on 2.4.7, do no use it + +## Quick Start +### Local Mode + +Start Spark with TsFile-Spark-Connector in local mode: + +``` +./ --jars tsfile-spark-connector.jar,tsfile-{version}-jar-with-dependencies.jar,hadoop-tsfile-{version}-jar-with-dependencies.jar +``` + +Note: + +* \ is the real path of your spark-shell. +* Multiple jar packages are separated by commas without any spaces. +* See https://github.com/apache/iotdb/tree/master/tsfile for how to get TsFile. + + +### Distributed Mode + +Start Spark with TsFile-Spark-Connector in distributed mode (That is, the spark cluster is connected by spark-shell): + +``` +. / --jars tsfile-spark-connector.jar,tsfile-{version}-jar-with-dependencies.jar,hadoop-tsfile-{version}-jar-with-dependencies.jar --master spark://ip:7077 +``` + +Note: + +* \ is the real path of your spark-shell. +* Multiple jar packages are separated by commas without any spaces. +* See https://github.com/apache/iotdb/tree/master/tsfile for how to get TsFile. + +## Data Type Correspondence + +| TsFile data type | SparkSQL data type| +| --------------| -------------- | +| BOOLEAN | BooleanType | +| INT32 | IntegerType | +| INT64 | LongType | +| FLOAT | FloatType | +| DOUBLE | DoubleType | +| TEXT | StringType | + +## Schema Inference + +The way to display TsFile is dependent on the schema. Take the following TsFile structure as an example: There are three measurements in the TsFile schema: status, temperature, and hardware. The basic information of these three measurements is listed: + + +|Name|Type|Encode| +|---|---|---| +|status|Boolean|PLAIN| +|temperature|Float|RLE| +|hardware|Text|PLAIN| + +The existing data in the TsFile are: + +ST 1 + +The corresponding SparkSQL table is: + +| time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware | +|------|-------------------------------|--------------------------|----------------------------|-------------------------------|--------------------------|----------------------------| +| 1 | null | true | null | 2.2 | true | null | +| 2 | null | false | aaa | 2.2 | null | null | +| 3 | null | null | null | 2.1 | true | null | +| 4 | null | true | bbb | null | null | null | +| 5 | null | null | null | null | false | null | +| 6 | null | null | ccc | null | null | null | + +You can also use narrow table form which as follows: (You can see part 6 about how to use narrow form) + +| time | device_name | status | hardware | temperature | +|------|-------------------------------|--------------------------|----------------------------|-------------------------------| +| 1 | root.ln.wf02.wt01 | true | null | 2.2 | +| 1 | root.ln.wf02.wt02 | true | null | null | +| 2 | root.ln.wf02.wt01 | null | null | 2.2 | +| 2 | root.ln.wf02.wt02 | false | aaa | null | +| 3 | root.ln.wf02.wt01 | true | null | 2.1 | +| 4 | root.ln.wf02.wt02 | true | bbb | null | +| 5 | root.ln.wf02.wt01 | false | null | null | +| 6 | root.ln.wf02.wt02 | null | ccc | null | + + + +## Scala API + +NOTE: Remember to assign necessary read and write permissions in advance. 
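+
+For example, if the TsFiles are kept on HDFS, the required read and write permissions could be granted with the HDFS CLI. This is only a sketch; the paths below are assumptions that mirror the example paths used later in this section:
+
+```shell
+# assumed paths matching the examples below; adjust them to your own layout
+hdfs dfs -chmod 644 /test.tsfile   # let the Spark job read the input TsFile
+hdfs dfs -mkdir -p /output
+hdfs dfs -chmod 775 /output        # let the Spark job write its results
+```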
+ +* Example 1: read from the local file system + +```scala +import org.apache.iotdb.spark.tsfile._ +val wide_df = spark.read.tsfile("test.tsfile") +wide_df.show + +val narrow_df = spark.read.tsfile("test.tsfile", true) +narrow_df.show +``` + +* Example 2: read from the hadoop file system + +```scala +import org.apache.iotdb.spark.tsfile._ +val wide_df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") +wide_df.show + +val narrow_df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) +narrow_df.show +``` + +* Example 3: read from a specific directory + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/usr/hadoop") +df.show +``` + +Note 1: Global time ordering of all TsFiles in a directory is not supported now. + +Note 2: Measurements of the same name should have the same schema. + +* Example 4: query in wide form + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") +df.createOrReplaceTempView("tsfile_table") +val newDf = spark.sql("select * from tsfile_table where `device_1.sensor_1`>0 and `device_1.sensor_2` < 22") +newDf.show +``` + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") +df.createOrReplaceTempView("tsfile_table") +val newDf = spark.sql("select count(*) from tsfile_table") +newDf.show +``` + +* Example 5: query in narrow form + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) +df.createOrReplaceTempView("tsfile_table") +val newDf = spark.sql("select * from tsfile_table where device_name = 'root.ln.wf02.wt02' and temperature > 5") +newDf.show +``` + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) +df.createOrReplaceTempView("tsfile_table") +val newDf = spark.sql("select count(*) from tsfile_table") +newDf.show +``` + +* Example 6: write in wide form + +```scala +// we only support wide_form table to write +import org.apache.iotdb.spark.tsfile._ + +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") +df.show +df.write.tsfile("hdfs://localhost:9000/output") + +val newDf = spark.read.tsfile("hdfs://localhost:9000/output") +newDf.show +``` + +* Example 7: write in narrow form + +```scala +// we only support wide_form table to write +import org.apache.iotdb.spark.tsfile._ + +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) +df.show +df.write.tsfile("hdfs://localhost:9000/output", true) + +val newDf = spark.read.tsfile("hdfs://localhost:9000/output", true) +newDf.show +``` + + +Appendix A: Old Design of Schema Inference + +The way to display TsFile is related to TsFile Schema. Take the following TsFile structure as an example: There are three measurements in the Schema of TsFile: status, temperature, and hardware. The basic info of these three Measurements is: + + +|Name|Type|Encode| +|---|---|---| +|status|Boolean|PLAIN| +|temperature|Float|RLE| +|hardware|Text|PLAIN| + + +The existing data in the file are: + +ST 2 + +A set of time-series data + +There are two ways to show a set of time-series data: + +* the default way + +Two columns are created to store the full path of the device: time(LongType) and delta_object(StringType). + +- `time` : Timestamp, LongType +- `delta_object` : Delta_object ID, StringType + +Next, a column is created for each Measurement to store the specific data. 
The SparkSQL table structure is: + +|time(LongType)|delta\_object(StringType)|status(BooleanType)|temperature(FloatType)|hardware(StringType)| +|---|---|---|---|---| +|1| root.ln.wf01.wt01 |True|2.2|null| +|1| root.ln.wf02.wt02 |True|null|null| +|2| root.ln.wf01.wt01 |null|2.2|null| +|2| root.ln.wf02.wt02 |False|null|"aaa"| +|2| root.sgcc.wf03.wt01 |True|null|null| +|3| root.ln.wf01.wt01 |True|2.1|null| +|3| root.sgcc.wf03.wt01 |True|3.3|null| +|4| root.ln.wf01.wt01 |null|2.0|null| +|4| root.ln.wf02.wt02 |True|null|"bbb"| +|4| root.sgcc.wf03.wt01 |True|null|null| +|5| root.ln.wf01.wt01 |False|null|null| +|5| root.ln.wf02.wt02 |False|null|null| +|5| root.sgcc.wf03.wt01 |True|null|null| +|6| root.ln.wf02.wt02 |null|null|"ccc"| +|6| root.sgcc.wf03.wt01 |null|6.6|null| +|7| root.ln.wf01.wt01 |True|null|null| +|8| root.ln.wf02.wt02 |null|null|"ddd"| +|8| root.sgcc.wf03.wt01 |null|8.8|null| +|9| root.sgcc.wf03.wt01 |null|9.9|null| + + + +* unfold delta_object column + +Expand the device column by "." into multiple columns, ignoring the root directory "root". Convenient for richer aggregation operations. To use this display way, the parameter "delta\_object\_name" is set in the table creation statement (refer to Example 5 in Section 5.1 of this manual), as in this example, parameter "delta\_object\_name" is set to "root.device.turbine". The number of path layers needs to be one-to-one. At this point, one column is created for each layer of the device path except the "root" layer. The column name is the name in the parameter and the value is the name of the corresponding layer of the device. Next, one column is created for each Measurement to store the specific data. + +Then SparkSQL Table Structure is as follows: + +|time(LongType)| group(StringType)| field(StringType)| device(StringType)|status(BooleanType)|temperature(FloatType)|hardware(StringType)| +|---|---|---|---|---|---|---| +|1| ln | wf01 | wt01 |True|2.2|null| +|1| ln | wf02 | wt02 |True|null|null| +|2| ln | wf01 | wt01 |null|2.2|null| +|2| ln | wf02 | wt02 |False|null|"aaa"| +|2| sgcc | wf03 | wt01 |True|null|null| +|3| ln | wf01 | wt01 |True|2.1|null| +|3| sgcc | wf03 | wt01 |True|3.3|null| +|4| ln | wf01 | wt01 |null|2.0|null| +|4| ln | wf02 | wt02 |True|null|"bbb"| +|4| sgcc | wf03 | wt01 |True|null|null| +|5| ln | wf01 | wt01 |False|null|null| +|5| ln | wf02 | wt02 |False|null|null| +|5| sgcc | wf03 | wt01 |True|null|null| +|6| ln | wf02 | wt02 |null|null|"ccc"| +|6| sgcc | wf03 | wt01 |null|6.6|null| +|7| ln | wf01 | wt01 |True|null|null| +|8| ln | wf02 | wt02 |null|null|"ddd"| +|8| sgcc | wf03 | wt01 |null|8.8|null| +|9| sgcc | wf03 | wt01 |null|9.9|null| + + +TsFile-Spark-Connector displays one or more TsFiles as a table in SparkSQL By SparkSQL. It also allows users to specify a single directory or use wildcards to match multiple directories. If there are multiple TsFiles, the union of the measurements in all TsFiles will be retained in the table, and the measurement with the same name have the same data type by default. Note that if a situation with the same name but different data types exists, TsFile-Spark-Connector does not guarantee the correctness of the results. + +The writing process is to write a DataFrame as one or more TsFiles. By default, two columns need to be included: time and delta_object. The rest of the columns are used as Measurement. If user wants to write the second table structure back to TsFile, user can set the "delta\_object\_name" parameter(refer to Section 5.1 of Section 5.1 of this manual). 
+ +Appendix B: Old Note +NOTE: Check the jar packages in the root directory of your Spark and replace libthrift-0.9.2.jar and libfb303-0.9.2.jar with libthrift-0.9.1.jar and libfb303-0.9.1.jar respectively. diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Telegraf-IoTDB.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Telegraf-IoTDB.md new file mode 100644 index 00000000..cdb7475a --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Telegraf-IoTDB.md @@ -0,0 +1,110 @@ + + +# Telegraf +Telegraf is an open-source agent that facilitates the collection, processing, and transmission of metric data. Developed by InfluxData. +Telegraf includes the following features: +* Plugin Architecture: Telegraf's strength lies in its extensive plugin ecosystem. It supports a wide range of input, output, and processor plugins, allowing seamless integration with various data sources and destinations. +* Data Collection: Telegraf excels in collecting metrics from diverse sources, such as system metrics, logs, databases, and more. Its versatility makes it suitable for monitoring applications, infrastructure, and IoT devices. +* Output Destinations: Once collected, data can be sent to various output destinations, including popular databases like InfluxDB. This flexibility makes Telegraf adaptable to different monitoring and analytics setups. +* Ease of Configuration: Telegraf's configuration is done using TOML files. This simplicity allows users to define inputs, outputs, and processors with ease, making customization straightforward. +* Community and Support: Being open-source, Telegraf benefits from an active community. Users can contribute plugins, report issues, and seek assistance through forums and documentation. + +# Telegraf IoTDB Output Plugin +This output plugin saves Telegraf metrics to an Apache IoTDB backend, supporting session connection and data insertion. + +## Precautions +1. Before using this plugin, please configure the IP address, port number, username, password and other information of the database server, as well as some data type conversion, time unit and other configurations. +2. The path should follow the rule in Chapter 'Syntax Rule' +3. See https://github.com/influxdata/telegraf/tree/master/plugins/outputs/iotdb for how to configure this plugin. + +## Example +Here is an example that demonstrates how to collect cpu data from Telegraf into IoTDB. +1. generate the configuration file by telegraf +``` +telegraf --sample-config --input-filter cpu --output-filter iotdb > cpu_iotdb.conf +``` +2. modify the default cpu inputs plugin configuration +``` +# Read metrics about cpu usage +[[inputs.cpu]] + ## Whether to report per-cpu stats or not + percpu = true + ## Whether to report total system cpu stats or not + totalcpu = true + ## If true, collect raw CPU time metrics + collect_cpu_time = false + ## If true, compute and report the sum of all non-idle CPU states + report_active = false + ## If true and the info is available then add core_id and physical_id tags + core_tags = false + name_override = "root.demo.telgraf.cpu" +``` +3. modify the IoTDB outputs plugin configuration +``` +# Save metrics to an IoTDB Database +[[outputs.iotdb]] + ## Configuration of IoTDB server connection + host = "127.0.0.1" + # port = "6667" + + ## Configuration of authentication + # user = "root" + # password = "root" + + ## Timeout to open a new session. + ## A value of zero means no timeout. 
+ # timeout = "5s" + + ## Configuration of type conversion for 64-bit unsigned int + ## IoTDB currently DOES NOT support unsigned integers (version 13.x). + ## 32-bit unsigned integers are safely converted into 64-bit signed integers by the plugin, + ## however, this is not true for 64-bit values in general as overflows may occur. + ## The following setting allows to specify the handling of 64-bit unsigned integers. + ## Available values are: + ## - "int64" -- convert to 64-bit signed integers and accept overflows + ## - "int64_clip" -- convert to 64-bit signed integers and clip the values on overflow to 9,223,372,036,854,775,807 + ## - "text" -- convert to the string representation of the value + # uint64_conversion = "int64_clip" + + ## Configuration of TimeStamp + ## TimeStamp is always saved in 64bits int. timestamp_precision specifies the unit of timestamp. + ## Available value: + ## "second", "millisecond", "microsecond", "nanosecond"(default) + timestamp_precision = "millisecond" + + ## Handling of tags + ## Tags are not fully supported by IoTDB. + ## A guide with suggestions on how to handle tags can be found here: + ## https://iotdb.apache.org/UserGuide/Master/API/InfluxDB-Protocol.html + ## + ## Available values are: + ## - "fields" -- convert tags to fields in the measurement + ## - "device_id" -- attach tags to the device ID + ## + ## For Example, a metric named "root.sg.device" with the tags `tag1: "private"` and `tag2: "working"` and + ## fields `s1: 100` and `s2: "hello"` will result in the following representations in IoTDB + ## - "fields" -- root.sg.device, s1=100, s2="hello", tag1="private", tag2="working" + ## - "device_id" -- root.sg.device.private.working, s1=100, s2="hello" + convert_tags_to = "fields" +``` +4. run telegraf with this configuration file, after some time, the data can be found in IoTDB + diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Thingsboard.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Thingsboard.md new file mode 100644 index 00000000..d962a58f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Thingsboard.md @@ -0,0 +1,99 @@ + +# ThingsBoard + +## Product Overview + +1. Introduction to ThingsBoard + + ThingsBoard is an open-source IoT platform that enables rapid development, management, and expansion of IoT projects. For more detailed information, please refer to [ThingsBoard Official Website](https://thingsboard.io/docs/getting-started-guides/what-is-thingsboard/). + + ![](https://alioss.timecho.com/docs/img/ThingsBoard-en1.png) + +1. Introduction to ThingsBoard-IoTDB + + ThingsBoard IoTDB provides the ability to store data from ThingsBoard to IoTDB, and also supports reading data information from the `root.thingsboard` database in ThingsBoard. The detailed architecture diagram is shown in yellow in the following figure. + +### Relationship Diagram + + ![](https://alioss.timecho.com/docs/img/Thingsboard-2.png) + +## Installation Requirements + +| **Preparation Content** | **Version Requirements** | +| :---------------------------------------- | :----------------------------------------------------------- | +| JDK | JDK17 or above. Please refer to the downloads on [Oracle Official Website](https://www.oracle.com/java/technologies/downloads/) | +| IoTDB |IoTDB v1.3.0 or above. Please refer to the [Deployment guidance](../Deployment-and-Maintenance/IoTDB-Package_timecho.md) | +| ThingsBoard
(IoTDB adapted version) | Please contact Timecho staff to obtain the installation package. Detailed installation steps are provided below. |
+
+## Installation Steps
+
+Please follow the installation steps on the [ThingsBoard Official Website](https://thingsboard.io/docs/user-guide/install/ubuntu/), with the following adjustments:
+
+- In "Step 2: ThingsBoard Service Installation" of the [official guide](https://thingsboard.io/docs/user-guide/install/ubuntu/), use the installation package provided by your Timecho contact to install the software. Please note that the official ThingsBoard installation package does not support IoTDB.
+- In "Step 3: Configure ThingsBoard Database - ThingsBoard Configuration" of the [official guide](https://thingsboard.io/docs/user-guide/install/ubuntu/), add environment variables according to the following content:
+
+```Shell
+# ThingsBoard original configuration
+export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/thingsboard
+export SPRING_DATASOURCE_USERNAME=postgres
+export SPRING_DATASOURCE_PASSWORD=PUT_YOUR_POSTGRESQL_PASSWORD_HERE ## Change to your PostgreSQL password
+
+# To use IoTDB, the following variable needs to be modified
+export DATABASE_TS_TYPE=iotdb ## Originally configured as sql, change the value to iotdb
+
+
+# To use IoTDB, the following variables need to be added
+export DATABASE_TS_LATEST_TYPE=iotdb
+export IoTDB_HOST=127.0.0.1 ## The IP address where IoTDB is located
+export IoTDB_PORT=6667 ## The port number of IoTDB, 6667 by default
+export IoTDB_USER=root ## The username for IoTDB, root by default
+export IoTDB_PASSWORD=root ## The password for IoTDB, root by default
+export IoTDB_CONNECTION_TIMEOUT=5000 ## IoTDB connection timeout setting
+export IoTDB_FETCH_SIZE=1024 ## The number of rows pulled in a single request; 1024 is recommended
+export IoTDB_MAX_SIZE=200 ## The maximum number of sessions in the session pool; recommended to be >= the number of concurrent requests
+export IoTDB_DATABASE=root.thingsboard ## The IoTDB database that ThingsBoard data is written to; can be customized
+```
+
+## Instructions
+
+1. Set up devices and connect datasource: Add a new device under "Entities" - "Devices" in ThingsBoard and send data to the specified devices through a gateway.
+
+   ![](https://alioss.timecho.com/docs/img/Thingsboard-en2.png)
+
+2. Set rule chain: Set alarm rules for "SD-032F pump" in the rule chain library and set the rule chain as the root chain.
+
+
+  +  +
+
+
+3. View alarm records: The generated alarm records can be found under "Devices" - "Alarms".
+
+   ![](https://alioss.timecho.com/docs/img/Thingsboard-en5.png)
+
+4. Data Visualization: Configure the datasource and parameters for data visualization.
+
+
+  +  +
\ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB.md b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB.md new file mode 100644 index 00000000..4e204e58 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB.md @@ -0,0 +1,185 @@ + + +# Apache Zeppelin + +## About Zeppelin + +Zeppelin is a web-based notebook that enables interactive data analytics. You can connect to data sources and perform interactive operations with SQL, Scala, etc. The operations can be saved as documents, just like Jupyter. Zeppelin has already supported many data sources, including Spark, ElasticSearch, Cassandra, and InfluxDB. Now, we have enabled Zeppelin to operate IoTDB via SQL. + +![iotdb-note-snapshot](https://alioss.timecho.com/docs/img/github/102752947-520a3e80-43a5-11eb-8fb1-8fac471c8c7e.png) + + + +## Zeppelin-IoTDB Interpreter + +### System Requirements + +| IoTDB Version | Java Version | Zeppelin Version | +| :-----------: | :-----------: | :--------------: | +| >=`0.12.0` | >=`1.8.0_271` | `>=0.9.0` | + +Install IoTDB: Reference to [IoTDB Quick Start](../QuickStart/QuickStart.html). Suppose IoTDB is placed at `$IoTDB_HOME`. + +Install Zeppelin: +> Method A. Download directly: You can download [Zeppelin](https://zeppelin.apache.org/download.html#) and unpack the binary package. [netinst](http://www.apache.org/dyn/closer.cgi/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-netinst.tgz) binary package is recommended since it's relatively small by excluding irrelevant interpreters. +> +> Method B. Compile from source code: Reference to [build Zeppelin from source](https://zeppelin.apache.org/docs/latest/setup/basics/how_to_build.html). The command is `mvn clean package -pl zeppelin-web,zeppelin-server -am -DskipTests`. + +Suppose Zeppelin is placed at `$Zeppelin_HOME`. + +### Build Interpreter + +``` + cd $IoTDB_HOME + mvn clean package -pl iotdb-connector/zeppelin-interpreter -am -DskipTests -P get-jar-with-dependencies +``` + +The interpreter will be in the folder: + +``` + $IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar +``` + + + +### Install Interpreter + +Once you have built your interpreter, create a new folder under the Zeppelin interpreter directory and put the built interpreter into it. + +``` + cd $IoTDB_HOME + mkdir -p $Zeppelin_HOME/interpreter/iotdb + cp $IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar $Zeppelin_HOME/interpreter/iotdb +``` + +### Modify Configuration + +Enter `$Zeppelin_HOME/conf` and use template to create Zeppelin configuration file: + +```shell +cp zeppelin-site.xml.template zeppelin-site.xml +``` + +Open the zeppelin-site.xml file and change the `zeppelin.server.addr` item to `0.0.0.0` + + +### Running Zeppelin and IoTDB + +Go to `$Zeppelin_HOME` and start Zeppelin by running: + +``` + ./bin/zeppelin-daemon.sh start +``` + +or in Windows: + +``` + .\bin\zeppelin.cmd +``` + +Go to `$IoTDB_HOME` and start IoTDB server: + +``` + # Unix/OS X + > nohup sbin/start-server.sh >/dev/null 2>&1 & + or + > nohup sbin/start-server.sh -c -rpc_port >/dev/null 2>&1 & + + # Windows + > sbin\start-server.bat -c -rpc_port +``` + + + +## Use Zeppelin-IoTDB + +Wait for Zeppelin server to start, then visit http://127.0.0.1:8080/ + +In the interpreter page: + +1. Click the `Create new node` button +2. Set the note name +3. Configure your interpreter + +Now you are ready to use your interpreter. 
+ +![iotdb-create-note](https://alioss.timecho.com/docs/img/github/102752945-5171a800-43a5-11eb-8614-53b3276a3ce2.png) + +We provide some simple SQL to show the use of Zeppelin-IoTDB interpreter: + +```sql + CREATE DATABASE root.ln.wf01.wt01; + CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN; + CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=PLAIN; + CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32, ENCODING=PLAIN; + + INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) + VALUES (1, 1.1, false, 11); + + INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) + VALUES (2, 2.2, true, 22); + + INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) + VALUES (3, 3.3, false, 33); + + INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) + VALUES (4, 4.4, false, 44); + + INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) + VALUES (5, 5.5, false, 55); + + + SELECT * + FROM root.ln.wf01.wt01 + WHERE time >= 1 + AND time <= 6; +``` + +The screenshot is as follows: + +![iotdb-note-snapshot2](https://alioss.timecho.com/docs/img/github/102752948-52a2d500-43a5-11eb-9156-0c55667eb4cd.png) + +You can also design more fantasy documents referring to [[1]](https://zeppelin.apache.org/docs/0.9.0/usage/display_system/basic.html) and others. + +The above demo notebook can be found at `$IoTDB_HOME/zeppelin-interpreter/Zeppelin-IoTDB-Demo.zpln`. + + + +## Configuration + +You can configure the connection parameters in http://127.0.0.1:8080/#/interpreter : + +![iotdb-configuration](https://alioss.timecho.com/docs/img/github/102752940-50407b00-43a5-11eb-94fb-3e3be222183c.png) + +The parameters you can configure are as follows: + +| Property | Default | Description | +| ---------------------------- | --------- | ------------------------------- | +| iotdb.host | 127.0.0.1 | IoTDB server host to connect to | +| iotdb.port | 6667 | IoTDB server port to connect to | +| iotdb.username | root | Username for authentication | +| iotdb.password | root | Password for authentication | +| iotdb.fetchSize | 10000 | Query fetch size | +| iotdb.zoneId | | Zone Id | +| iotdb.enable.rpc.compression | FALSE | Whether enable rpc compression | +| iotdb.time.display.type | default | The time format to display | + diff --git a/src/UserGuide/V2.0.1/Tree/FAQ/Frequently-asked-questions.md b/src/UserGuide/V2.0.1/Tree/FAQ/Frequently-asked-questions.md new file mode 100644 index 00000000..de789a04 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/FAQ/Frequently-asked-questions.md @@ -0,0 +1,263 @@ + + +# Frequently Asked Questions + +## General FAQ + +### How can I identify my version of IoTDB? + +There are several ways to identify the version of IoTDB that you are using: + +* Launch IoTDB's Command Line Interface: + +``` +> ./start-cli.sh -p 6667 -pw root -u root -h localhost + _____ _________ ______ ______ +|_ _| | _ _ ||_ _ `.|_ _ \ + | | .--.|_/ | | \_| | | `. \ | |_) | + | | / .'`\ \ | | | | | | | __'. + _| |_| \__. | _| |_ _| |_.' /_| |__) | +|_____|'.__.' 
|_____| |______.'|_______/ version x.x.x +``` + +* Check pom.xml file: + +``` +x.x.x +``` + +* Use JDBC API: + +``` +String iotdbVersion = tsfileDatabaseMetadata.getDatabaseProductVersion(); +``` + +* Use Command Line Interface: + +``` +IoTDB> show version +show version ++---------------+ +|version | ++---------------+ +|x.x.x | ++---------------+ +Total line number = 1 +It costs 0.241s +``` + +### Where can I find IoTDB logs? + +Suppose your root directory is: + +``` +$ pwd +/workspace/iotdb + +$ ls -l +server/ +cli/ +pom.xml +Readme.md +... +``` + +Let `$IOTDB_HOME = /workspace/iotdb/server/target/iotdb-server-{project.version}` + +Let `$IOTDB_CLI_HOME = /workspace/iotdb/cli/target/iotdb-cli-{project.version}` + +By default settings, the logs are stored under ```IOTDB_HOME/logs```. You can change log level and storage path by configuring ```logback.xml``` under ```IOTDB_HOME/conf```. + +### Where can I find IoTDB data files? + +By default settings, the data files (including tsfile, metadata, and WAL files) are stored under ```IOTDB_HOME/data/datanode```. + +### How do I know how many time series are stored in IoTDB? + +Use IoTDB's Command Line Interface: + +``` +IoTDB> show timeseries root +``` + +In the result, there is a statement shows `Total timeseries number`, this number is the timeseries number in IoTDB. + +In the current version, IoTDB supports querying the number of time series. Use IoTDB's Command Line Interface: + +``` +IoTDB> count timeseries root +``` + +If you are using Linux, you can use the following shell command: + +``` +> grep "0,root" $IOTDB_HOME/data/system/schema/mlog.txt | wc -l +> 6 +``` + +### Can I use Hadoop and Spark to read TsFile in IoTDB? + +Yes. IoTDB has intense integration with Open Source Ecosystem. IoTDB supports [Hadoop](https://github.com/apache/iotdb-extras/tree/master/connectors/hadoop), [Spark](https://github.com/apache/iotdb-extras/tree/master/connectors/spark-iotdb-connector) and [Grafana](https://github.com/apache/iotdb-extras/tree/master/connectors/grafana-connector) visualization tool. + +### How does IoTDB handle duplicate points? + +A data point is uniquely identified by a full time series path (e.g. ```root.vehicle.d0.s0```) and timestamp. If you submit a new point with the same path and timestamp as an existing point, IoTDB updates the value of this point instead of inserting a new point. + +### How can I tell what type of the specific timeseries? + +Use ```SHOW TIMESERIES ``` SQL in IoTDB's Command Line Interface: + +For example, if you want to know the type of all timeseries, the \ should be `root.**`. The statement will be: + +``` +IoTDB> show timeseries.** +``` + +If you want to query specific sensor, you can replace the \ with the sensor name. For example: + +``` +IoTDB> show timeseries root.fit.d1.s1 +``` + +Otherwise, you can also use wildcard in timeseries path: + +``` +IoTDB> show timeseries root.fit.d1.* +``` + +### How can I change IoTDB's Cli time display format? + +The default IoTDB's Cli time display format is readable (e.g. ```1970-01-01T08:00:00.001```), if you want to display time in timestamp type or other readable format, add parameter ```-disableISO8601``` in start command: + +``` +> $IOTDB_CLI_HOME/sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root -disableISO8601 +``` + +### How to handle error `IndexOutOfBoundsException` from `org.apache.ratis.grpc.server.GrpcLogAppender`? + +This is an internal error log from Ratis 2.4.1, our dependency, and no impact on data writes or reads is expected. 
+
+It has been reported to the Ratis community and will be fixed in future releases.
+
+### How to deal with estimated out-of-memory errors?
+
+An error message like the following is reported:
+```
+301: There is not enough memory to execute current fragment instance, current remaining free memory is 86762854, estimated memory usage for current fragment instance is 270139392
+```
+Error analysis:
+The `datanode_memory_proportion` parameter controls the memory allocated to queries, and the `chunk_timeseriesmeta_free_memory_proportion` parameter controls the memory available for query execution.
+By default the memory allocated to queries is 30% of the heap memory, and the memory available for query execution is 20% of the query memory.
+The error report shows that the current remaining memory available for query execution is 86762854 B = 82.74 MB, while the query is estimated to use 270139392 B = 257.6 MB of execution memory.
+
+Some possible improvements:
+
+- Without changing the default parameters, increase the heap memory of IoTDB to more than 4.2 GB (about 4300 MB): 4300 MB * 30% * 20% = 258 MB > 257.6 MB, which fulfills the requirement.
+- Change parameters such as datanode_memory_proportion so that the memory available for query execution is larger than 257.6 MB.
+- Reduce the number of exported time series.
+- Add a slimit clause to the query statement, which is another way to reduce the number of queried time series.
+- Add align by device, which exports data in device order and reduces the memory usage to the single-device level.
+
+## FAQ for Cluster Setup
+
+### Cluster StartUp and Stop
+
+#### Failed to start ConfigNode for the first time, how to find the reason?
+
+- Make sure that the data/confignode directory is cleared when starting ConfigNode for the first time.
+- Make sure that the ports used by ConfigNode are not occupied and do not conflict with those of other ConfigNodes.
+- Make sure that `cn_seed_config_node` is configured correctly and points to an alive ConfigNode. If the ConfigNode is started for the first time, make sure that `cn_seed_config_node` points to itself.
+- Make sure that the configuration (consensus protocol and replica number) of the started ConfigNode accords with that of the `cn_seed_config_node` ConfigNode.
+
+#### ConfigNode is started successfully, but why doesn't the node appear in the results of `show cluster`?
+
+- Examine whether `cn_seed_config_node` points to the correct address. If `cn_seed_config_node` points to itself, a new ConfigNode cluster is started.
+
+#### Failed to start DataNode for the first time, how to find the reason?
+
+- Make sure that the data/datanode directory is cleared when starting DataNode for the first time. If the start result is "Reject DataNode restart.", the data/datanode directory may not have been cleared.
+- Make sure that the ports used by DataNode are not occupied and do not conflict with those of other DataNodes.
+- Make sure that `dn_seed_config_node` points to an alive ConfigNode.
+
+#### Failed to remove DataNode, how to find the reason?
+
+- Examine whether the parameters of `remove-datanode.sh` are correct; only rpcIp:rpcPort and dataNodeId are valid parameters.
+- The removing operation can be executed only when the number of available DataNodes in the cluster is greater than max(schema_replication_factor, data_replication_factor).
+- Removing a DataNode will migrate the data from the removed DataNode to other alive DataNodes. Data migration is based on Regions; if some Regions fail to migrate, the removed DataNode will stay in the `Removing` status.
+- If a DataNode is in the `Removing` status, the Regions on it will also be in the `Removing` or `Unknown` status, both of which are unavailable. Besides, such a DataNode will not receive new write requests from clients.
+  Users can use the command `set system status to running` to change the status of the DataNode from Removing back to Running;
+  if users want to bring the Regions from Removing back to an available status, the command `migrate region from datanodeId1 to datanodeId2` can migrate them to other alive DataNodes.
+  Besides, IoTDB will publish a `remove-datanode.sh -f` command in the next version, which can remove DataNodes forcibly (the Regions that failed to migrate will be discarded).
+
+#### Can a DataNode that is down be removed?
+
+- A DataNode that is down can be removed only when the replication factor of schema and data is greater than 1.
+  Besides, IoTDB will publish the `remove-datanode.sh -f` function in the next version.
+
+#### What should be paid attention to when upgrading from 0.13 to 1.0?
+
+- The file structures of 0.13 and 1.0 are different, so the data directory of 0.13 cannot be copied to 1.0 and used directly.
+  If you want to load the data from 0.13 into 1.0, you can use the LOAD function.
+- The default RPC address of 0.13 is `0.0.0.0`, but the default RPC address of 1.0 is `127.0.0.1`.
+
+
+### Cluster Restart
+
+#### How to restart any ConfigNode in the cluster?
+
+- First step: stop the process with `stop-confignode.sh` or by killing the PID of the ConfigNode.
+- Second step: execute `start-confignode.sh` to restart the ConfigNode.
+
+#### How to restart any DataNode in the cluster?
+
+- First step: stop the process with `stop-datanode.sh` or by killing the PID of the DataNode.
+- Second step: execute `start-datanode.sh` to restart the DataNode.
+
+#### Is it possible to restart a removed ConfigNode using its old data directory?
+
+- No. The result will be "Reject ConfigNode restart. Because there are no corresponding ConfigNode(whose nodeId=xx) in the cluster".
+
+#### Is it possible to restart a removed DataNode using its old data directory?
+
+- No. The result will be "Reject DataNode restart. Because there are no corresponding DataNode(whose nodeId=xx) in the cluster. Possible solutions are as follows:...".
+
+#### Can we execute `start-confignode.sh`/`start-datanode.sh` successfully after deleting the data directory of a given ConfigNode/DataNode without killing its PID?
+
+- No. The result will be "The port is already occupied".
+
+### Cluster Maintenance
+
+#### How to find the reason when `show cluster` fails and error logs like "please check server status" are shown?
+
+- Make sure that more than half of the ConfigNodes are alive.
+- Make sure that the DataNode connected by the client is alive.
+
+#### How to fix a DataNode when its disk files are broken?
+
+- We can use `remove-datanode.sh` to fix it. Removing the DataNode will migrate its data to other alive DataNodes.
+- IoTDB will publish Node-Fix tools in the next version.
+
+#### How to decrease the memory usage of ConfigNode/DataNode?
+
+- Adjust the ON_HEAP_MEMORY and OFF_HEAP_MEMORY options in conf/confignode-env.sh and conf/datanode-env.sh, as sketched below.
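+
+  For reference, a minimal sketch of such an adjustment is shown below (the variable names are the ones mentioned above; the sizes are placeholders and should be tuned to your workload and available RAM):
+
+  ```
+  # conf/confignode-env.sh (ConfigNodes usually need less memory than DataNodes)
+  ON_HEAP_MEMORY="1G"
+  OFF_HEAP_MEMORY="512M"
+
+  # conf/datanode-env.sh (lower these values if the DataNode process uses too much memory)
+  ON_HEAP_MEMORY="4G"
+  OFF_HEAP_MEMORY="1G"
+  ```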
diff --git a/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_apache.md b/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_apache.md new file mode 100644 index 00000000..e783f74b --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_apache.md @@ -0,0 +1,77 @@ + + +# What is IoTDB + +Apache IoTDB is a low-cost, high-performance native temporal database for the Internet of Things. It can solve various problems encountered by enterprises when building IoT big data platforms to manage time-series data, such as complex application scenarios, large data volumes, high sampling frequencies, high amount of unaligned data, long data processing time, diverse analysis requirements, and high storage and operation costs. + +- Github repository link: https://github.com/apache/iotdb + +- Open source installation package download: https://iotdb.apache.org/zh/Download/ + +- Installation, deployment, and usage documentation: [QuickStart](../QuickStart/QuickStart_apache.md) + + +## Product Components + +IoTDB products consist of several components that help users efficiently manage and analyze the massive amount of time-series data generated by the IoT. + +
+ Introduction-en-timecho.png + +
+ +1. Time-series Database (Apache IoTDB): The core component for time-series data storage, it provides users with high-compression storage capabilities, rich time-series querying capabilities, real-time stream processing capabilities, and ensures high availability of data and high scalability of clusters. It also offers comprehensive security protection. Additionally, IoTDB provides users with a variety of application tools for easy configuration and management of the system; multi-language APIs and external system application integration capabilities, making it convenient for users to build business applications based on IoTDB. + +2. Time-series Data Standard File Format (Apache TsFile): This file format is specifically designed for time-series data and can efficiently store and query massive amounts of time-series data. Currently, the underlying storage files for modules such as IoTDB and AINode are supported by Apache TsFile. With TsFile, users can uniformly use the same file format for data management during the collection, management, application, and analysis phases, greatly simplifying the entire process from data collection to analysis, and improving the efficiency and convenience of time-series data management. + +3. Time-series Model Training and Inference Integrated Engine (IoTDB AINode): For intelligent analysis scenarios, IoTDB provides the AINode time-series model training and inference integrated engine, which offers a complete set of time-series data analysis tools. The underlying engine supports model training tasks and data management, including machine learning and deep learning. With these tools, users can conduct in-depth analysis of the data stored in IoTDB and extract its value. + + +## Product Features + +TimechoDB has the following advantages and characteristics: + +- Flexible deployment methods: Support for one-click cloud deployment, out-of-the-box use after unzipping at the terminal, and seamless connection between terminal and cloud (data cloud synchronization tool). + +- Low hardware cost storage solution: Supports high compression ratio disk storage, no need to distinguish between historical and real-time databases, unified data management. + +- Hierarchical sensor organization and management: Supports modeling in the system according to the actual hierarchical relationship of devices to achieve alignment with the industrial sensor management structure, and supports directory viewing, search, and other capabilities for hierarchical structures. + +- High throughput data reading and writing: supports access to millions of devices, high-speed data reading and writing, out of unaligned/multi frequency acquisition, and other complex industrial reading and writing scenarios. + +- Rich time series query semantics: Supports a native computation engine for time series data, supports timestamp alignment during queries, provides nearly a hundred built-in aggregation and time series calculation functions, and supports time series feature analysis and AI capabilities. 
+ +- Highly available distributed system: Supports HA distributed architecture, the system provides 7*24 hours uninterrupted real-time database services, the failure of a physical node or network fault will not affect the normal operation of the system; supports the addition, deletion, or overheating of physical nodes, the system will automatically perform load balancing of computing/storage resources; supports heterogeneous environments, servers of different types and different performance can form a cluster, and the system will automatically load balance according to the configuration of the physical machine. + +- Extremely low usage and operation threshold: supports SQL like language, provides multi language native secondary development interface, and has a complete tool system such as console. + +- Rich ecological environment docking: Supports docking with big data ecosystem components such as Hadoop, Spark, and supports equipment management and visualization tools such as Grafana, Thingsboard, DataEase. + +## Commercial version + +Timecho provides the original commercial product TimechoDB based on the open source version of Apache IoTDB, providing enterprise level products and services for enterprises and commercial customers. It can solve various problems encountered by enterprises when building IoT big data platforms to manage time-series data, such as complex application scenarios, large data volumes, high sampling frequencies, high amount of unaligned data, long data processing time, diverse analysis requirements, and high storage and operation costs. + +Timecho provides a more diverse range of product features, stronger performance and stability, and a richer set of utility tools based on TimechoDB. It also offers comprehensive enterprise services to users, thereby providing commercial customers with more powerful product capabilities and a higher quality of development, operations, and usage experience. + +- Timecho Official website:https://www.timecho.com/ + +- TimechoDB installation, deployment and usage documentation:[QuickStart](../QuickStart/QuickStart_timecho.md) \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_timecho.md b/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_timecho.md new file mode 100644 index 00000000..f4866314 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_timecho.md @@ -0,0 +1,266 @@ + + +# What is TimechoDB + +TimechoDB is a low-cost, high-performance native temporal database for the Internet of Things, provided by Timecho based on the Apache IoTDB community version as an original commercial product. It can solve various problems encountered by enterprises when building IoT big data platforms to manage time-series data, such as complex application scenarios, large data volumes, high sampling frequencies, high amount of unaligned data, long data processing time, diverse analysis requirements, and high storage and operation costs. + +Timecho provides a more diverse range of product features, stronger performance and stability, and a richer set of utility tools based on TimechoDB. It also offers comprehensive enterprise services to users, thereby providing commercial customers with more powerful product capabilities and a higher quality of development, operations, and usage experience. 
+
+- Download, Deployment, and Usage: [QuickStart](../QuickStart/QuickStart_timecho.md)
+
+
+## Product Components
+
+Timecho products are composed of several components, covering the entire time-series data lifecycle from data collection and data management to data analysis & application, helping users efficiently manage and analyze the massive amount of time-series data generated by the IoT.
+
+
+ Introduction-en-timecho-new.png + +
+ +1. **Time-series database (TimechoDB, a commercial product based on Apache IoTDB provided by the original team)**: The core component of time-series data storage, which can provide users with high-compression storage capabilities, rich time-series query capabilities, real-time stream processing capabilities, while also having high availability of data and high scalability of clusters, and providing security protection. At the same time, TimechoDB also provides users with a variety of application tools for easy management of the system; multi-language API and external system application integration capabilities, making it convenient for users to build applications based on TimechoDB. + +2. **Time-series data standard file format (Apache TsFile, led and contributed by core team members of Timecho)**: This file format is a storage format specifically designed for time-series data, which can efficiently store and query massive amounts of time-series data. Currently, the underlying storage files of Timecho's collection, storage, and intelligent analysis modules are all supported by Apache TsFile. TsFile can be efficiently loaded into TimechoDB and can also be migrated out. Through TsFile, users can use the same file format for data management in the stages of collection, management, application & analysis, greatly simplifying the entire process from data collection to analysis, and improving the efficiency and convenience of time-series data management. + +3. **Time-series model training and inference integrated engine (AINode)**: For intelligent analysis scenarios, TimechoDB provides the AINode time-series model training and inference integrated engine, which offers a complete set of time-series data analysis tools, with the underlying model training engine supporting training tasks and data management, including machine learning, deep learning, etc. With these tools, users can conduct in-depth analysis of the data stored in TimechoDB and mine its value. + +4. **Data collection**: To more conveniently dock with various industrial collection scenarios, Timecho provides data collection access services, supporting multiple protocols and formats, which can access data generated by various sensors and devices, while also supporting features such as breakpoint resumption and network barrier penetration. It is more adapted to the characteristics of difficult configuration, slow transmission, and weak network in the industrial field collection process, making the user's data collection simpler and more efficient. + +## Product Features + +TimechoDB has the following advantages and characteristics: + +- Flexible deployment methods: Support for one-click cloud deployment, out-of-the-box use after unzipping at the terminal, and seamless connection between terminal and cloud (data cloud synchronization tool). + +- Low hardware cost storage solution: Supports high compression ratio disk storage, no need to distinguish between historical and real-time databases, unified data management. + +- Hierarchical sensor organization and management: Supports modeling in the system according to the actual hierarchical relationship of devices to achieve alignment with the industrial sensor management structure, and supports directory viewing, search, and other capabilities for hierarchical structures. 
+ +- High throughput data reading and writing: supports access to millions of devices, high-speed data reading and writing, out of unaligned/multi frequency acquisition, and other complex industrial reading and writing scenarios. + +- Rich time series query semantics: Supports a native computation engine for time series data, supports timestamp alignment during queries, provides nearly a hundred built-in aggregation and time series calculation functions, and supports time series feature analysis and AI capabilities. + +- Highly available distributed system: Supports HA distributed architecture, the system provides 7*24 hours uninterrupted real-time database services, the failure of a physical node or network fault will not affect the normal operation of the system; supports the addition, deletion, or overheating of physical nodes, the system will automatically perform load balancing of computing/storage resources; supports heterogeneous environments, servers of different types and different performance can form a cluster, and the system will automatically load balance according to the configuration of the physical machine. + +- Extremely low usage and operation threshold: supports SQL like language, provides multi language native secondary development interface, and has a complete tool system such as console. + +- Rich ecological environment docking: Supports docking with big data ecosystem components such as Hadoop, Spark, and supports equipment management and visualization tools such as Grafana, Thingsboard, DataEase. + +## Enterprise characteristics + +### Higher level product features + +Building on the open-source version, TimechoDB offers a range of advanced product features, with native upgrades and optimizations at the kernel level for industrial production scenarios. These include multi-level storage, cloud-edge collaboration, visualization tools, and security enhancements, allowing users to focus more on business development without worrying too much about underlying logic. This simplifies and enhances industrial production, bringing more economic benefits to enterprises. For example: + + +- Dual Active Deployment:Dual active usually refers to two independent single machines (or clusters) that perform real-time mirror synchronization. Their configurations are completely independent and can simultaneously receive external writes. Each independent single machine (or clusters) can synchronize the data written to itself to another single machine (or clusters), and the data of the two single machines (or clusters) can achieve final consistency. + +- Data Synchronisation:Through the built-in synchronization module of the database, data can be aggregated from the station to the center, supporting various scenarios such as full aggregation, partial aggregation, and hierarchical aggregation. It can support both real-time data synchronization and batch data synchronization modes. Simultaneously providing multiple built-in plugins to support requirements such as gateway penetration, encrypted transmission, and compressed transmission in enterprise data synchronization applications. + +- Tiered Storage:Multi level storage: By upgrading the underlying storage capacity, data can be divided into different levels such as cold, warm, and hot based on factors such as access frequency and data importance, and stored in different media (such as SSD, mechanical hard drive, cloud storage, etc.). At the same time, the system also performs data scheduling during the query process. 
Thereby reducing customer data storage costs while ensuring data access speed. + +- Security Enhancements: Features like whitelists and audit logs strengthen internal management and reduce the risk of data breaches. + +The detailed functional comparison is as follows: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Function | | Apache IoTDB | TimechoDB |
+| --- | --- | --- | --- |
+| Deployment Mode | Stand-Alone Deployment | √ | √ |
+| | Distributed Deployment | √ | √ |
+| | Dual Active Deployment | × | √ |
+| | Container Deployment | Partial support | √ |
+| Database Functionality | Sensor Management | √ | √ |
+| | Write Data | √ | √ |
+| | Query Data | √ | √ |
+| | Continuous Query | √ | √ |
+| | Trigger | √ | √ |
+| | User Defined Function | √ | √ |
+| | Permission Management | √ | √ |
+| | Data Synchronisation | Only file synchronization, no built-in plugins | Real time synchronization + file synchronization, enriched with built-in plugins |
+| | Stream Processing | Only framework, no built-in plugins | Framework + rich built-in plugins |
+| | Tiered Storage | × | √ |
+| | View | × | √ |
+| | White List | × | √ |
+| | Audit Log | × | √ |
+| Supporting Tools | Workbench | × | √ |
+| | Cluster Management Tool | × | √ |
+| | System Monitor Tool | × | √ |
+| Localization | Localization Compatibility Certification | × | √ |
+| Technical Support | Best Practices | × | √ |
+| | Use Training | × | √ |
+ +### More efficient/stable product performance + +TimechoDB has optimized stability and performance on the basis of the open source version. With technical support from the enterprise version, it can achieve more than 10 times performance improvement and has the performance advantage of timely fault recovery. + +### More User-Friendly Tool System + +TimechoDB will provide users with a simpler and more user-friendly tool system. Through products such as the Cluster Monitoring Panel (Grafana), Database Console (Workbench), and Cluster Management Tool (Deploy Tool, abbreviated as IoTD), it will help users quickly deploy, manage, and monitor database clusters, reduce the work/learning costs of operation and maintenance personnel, simplify database operation and maintenance work, and make the operation and maintenance process more convenient and efficient. + +- Cluster Monitoring Panel: Designed to address the monitoring issues of TimechoDB and its operating system, including operating system resource monitoring, TimechoDB performance monitoring, and hundreds of kernel monitoring indicators, to help users monitor the health status of the cluster and perform cluster tuning and operation. + +
+
+  Panel views include: Overall Overview, Operating System Resource Monitoring, and TimechoDB Performance Monitoring.
+
+- Database Console: Designed to provide a low-threshold database interaction tool, it helps users perform metadata management, data addition, deletion, modification and query, permission management, system management, and other operations in a concise and clear manner through a graphical console, lowering the difficulty of using the database and improving efficiency.
+
+  Console views include: Home Page, Operate Metadata, and SQL Query.
+
+- Cluster management tool: Aimed at solving the operational difficulties of multi-node distributed systems, it mainly covers cluster deployment, cluster start/stop, elastic scaling, configuration updates, data export and other functions, achieving one-click command issuance for complex database clusters and greatly reducing management difficulty.
+
+ +### More professional enterprise technical services + +TimechoDB customers provide powerful original factory services, including but not limited to on-site installation and training, expert consultant consultation, on-site emergency assistance, software upgrades, online self-service, remote support, and guidance on using the latest development version. At the same time, in order to make TimechoDB more suitable for industrial production scenarios, we will recommend modeling solutions, optimize read-write performance, optimize compression ratios, recommend database configurations, and provide other technical support based on the actual data structure and read-write load of the enterprise. If encountering industrial customization scenarios that are not covered by some products, TimechoDB will provide customized development tools based on user characteristics. + +Compared to the open source version, TimechoDB provides a faster release frequency every 2-3 months. At the same time, it offers day level exclusive fixes for urgent customer issues to ensure stable production environments. + +### More compatible localization adaptation + +The TimechoDB code is self-developed and controllable, and is compatible with most mainstream information and creative products (CPU, operating system, etc.), and has completed compatibility certification with multiple manufacturers to ensure product compliance and security. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/Scenario.md b/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/Scenario.md new file mode 100644 index 00000000..60bbc096 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/IoTDB-Introduction/Scenario.md @@ -0,0 +1,94 @@ + + +# Scenario + +## Application 1: Internet of Vehicles + +### Background + +> - Challenge: a large number of vehicles and time series + +A car company has a huge business volume and needs to deal with a large number of vehicles and a large amount of data. It has hundreds of millions of data measurement points, over ten million new data points per second, millisecond-level collection frequency, posing high requirements on real-time writing, storage and processing of databases. + +In the original architecture, the HBase cluster was used as the storage database. The query delay was high, and the system maintenance was difficult and costly. The HBase cluster cannot meet the demand. On the contrary, IoTDB supports high-frequency data writing with millions of measurement points and millisecond-level query response speed. The efficient data processing capability allows users to obtain the required data quickly and accurately. Therefore, IoTDB is chosen as the data storage layer, which has a lightweight architecture, reduces operation and maintenance costs, and supports elastic expansion and contraction and high availability to ensure system stability and availability. + +### Architecture + +The data management architecture of the car company using IoTDB as the time-series data storage engine is shown in the figure below. + + +![img](https://alioss.timecho.com/docs/img/architecture1.png) + +The vehicle data is encoded based on TCP and industrial protocols and sent to the edge gateway, and the gateway sends the data to the message queue Kafka cluster, decoupling the two ends of production and consumption. Kafka sends data to Flink for real-time processing, and the processed data is written into IoTDB. 
Both historical data and latest data are queried in IoTDB, and finally the data flows into the visualization platform through API for application. + +## Application 2: Intelligent Operation and Maintenance + +### Background + +A steel factory aims to build a low-cost, large-scale access-capable remote intelligent operation and maintenance software and hardware platform, access hundreds of production lines, more than one million devices, and tens of millions of time series, to achieve remote coverage of intelligent operation and maintenance. + +There are many challenges in this process: + +> - Wide variety of devices, protocols, and data types +> - Time series data, especially high-frequency data, has a huge amount of data +> - The reading and writing speed of massive time series data cannot meet business needs +> - Existing time series data management components cannot meet various advanced application requirements + +After selecting IoTDB as the storage database of the intelligent operation and maintenance platform, it can stably write multi-frequency and high-frequency acquisition data, covering the entire steel process, and use a composite compression algorithm to reduce the data size by more than 10 times, saving costs. IoTDB also effectively supports downsampling query of historical data of more than 10 years, helping enterprises to mine data trends and assist enterprises in long-term strategic analysis. + +### Architecture + +The figure below shows the architecture design of the intelligent operation and maintenance platform of the steel plant. + +![img](https://alioss.timecho.com/docs/img/architecture2.jpg) + +## Application 3: Smart Factory + +### Background + +> - Challenge:Cloud-edge collaboration + +A cigarette factory hopes to upgrade from a "traditional factory" to a "high-end factory". It uses the Internet of Things and equipment monitoring technology to strengthen information management and services to realize the free flow of data within the enterprise and to help improve productivity and lower operating costs. + +### Architecture + +The figure below shows the factory's IoT system architecture. IoTDB runs through the three-level IoT platform of the company, factory, and workshop to realize unified joint debugging and joint control of equipment. The data at the workshop level is collected, processed and stored in real time through the IoTDB at the edge layer, and a series of analysis tasks are realized. The preprocessed data is sent to the IoTDB at the platform layer for data governance at the business level, such as device management, connection management, and service support. Eventually, the data will be integrated into the IoTDB at the group level for comprehensive analysis and decision-making across the organization. + +![img](https://alioss.timecho.com/docs/img/architecture3.jpg) + + +## Application 4: Condition monitoring + +### Background + +> - Challenge: Smart heating, cost reduction and efficiency increase + +A power plant needs to monitor tens of thousands of measuring points of main and auxiliary equipment such as fan boiler equipment, generators, and substation equipment. In the previous heating process, there was a lack of prediction of the heat supply in the next stage, resulting in ineffective heating, overheating, and insufficient heating. 
+ +After using IoTDB as the storage and analysis engine, combined with meteorological data, building control data, household control data, heat exchange station data, official website data, heat source side data, etc., all data are time-aligned in IoTDB to provide reliable data basis to realize smart heating. At the same time, it also solves the problem of monitoring the working conditions of various important components in the relevant heating process, such as on-demand billing and pipe network, heating station, etc., to reduce manpower input. + +### Architecture + +The figure below shows the data management architecture of the power plant in the heating scene. + +![img](https://alioss.timecho.com/docs/img/architecture4.jpg) + diff --git a/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart.md b/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart.md new file mode 100644 index 00000000..d94d5e0c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart.md @@ -0,0 +1,23 @@ +--- +redirectTo: QuickStart_apache.html +--- + diff --git a/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_apache.md b/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_apache.md new file mode 100644 index 00000000..a0778511 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_apache.md @@ -0,0 +1,91 @@ + +# Quick Start + +This document will help you understand how to quickly get started with IoTDB. + +## How to install and deploy? + +This document will help you quickly install and deploy IoTDB. You can quickly locate the content you need to view through the following document links: + +1. Prepare the necessary machine resources:The deployment and operation of IoTDB require consideration of multiple aspects of machine resource configuration. Specific resource allocation can be viewed [Database Resources](../Deployment-and-Maintenance/Database-Resources.md) + +2. Complete system configuration preparation:The system configuration of IoTDB involves multiple aspects, and the key system configuration introductions can be viewed [System Requirements](../Deployment-and-Maintenance/Environment-Requirements.md) + +3. Get installation package:You can visit [Apache IoTDB official website](https://iotdb.apache.org/zh/Download/ ) Get the IoTDB installation package.The specific installation package structure can be viewed: [Obtain IoTDB](../Deployment-and-Maintenance/IoTDB-Package_apache.md) + +4. Install database: You can choose the following tutorials for installation and deployment based on the actual deployment architecture: + + - Stand-Alone Deployment: [Stand-Alone Deployment](../Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md) + + - Cluster Deployment: [Cluster Deployment](../Deployment-and-Maintenance/Cluster-Deployment_apache.md) + +> ❗️Attention: Currently, we still recommend installing and deploying directly on physical/virtual machines. If Docker deployment is required, please refer to: [Docker Deployment](../Deployment-and-Maintenance/Docker-Deployment_apache.md) + +## How to use it? + +1. Database modeling design: Database modeling is an important step in creating a database system, which involves designing the structure and relationships of data to ensure that the organization of data meets the specific application requirements. 
The following document will help you quickly understand the modeling design of IoTDB: + + - Introduction to the concept of timeseries: [Navigating Time Series Data](../Basic-Concept/Navigating_Time_Series_Data.md) + + - Introduction to Modeling Design: [Data Model](../Basic-Concept/Data-Model-and-Terminology.md) + + - SQL syntax introduction: [Operate Metadata](../Basic-Concept/Operate-Metadata_apache.md) + +2. Write Data: In terms of data writing, IoTDB provides multiple ways to insert real-time data. For details on basic data writing operations, please refer to [Write Data](../Basic-Concept/Write-Delete-Data.md) + +3. Query Data: IoTDB provides rich data query functions. For a basic introduction to data query, please refer to [Query Data](../Basic-Concept/Query-Data.md) + +4. Other advanced features: In addition to common functions such as writing and querying, IoTDB also supports Data Synchronisation, Stream Framework, Database Administration, and other functions; specific usage methods can be found in the corresponding documents: + + - Data Synchronisation: [Data Synchronisation](../User-Manual/Data-Sync_apache.md) + + - Stream Framework: [Stream Framework](../User-Manual/Streaming_apache.md) + + - Database Administration: [Database Administration](../User-Manual/Authority-Management.md) + +5. API: IoTDB provides multiple application programming interfaces (APIs) for developers to interact with IoTDB in their applications, and currently supports [Java Native API](../API/Programming-Java-Native-API.md), [Python Native API](../API/Programming-Python-Native-API.md), and [C++ Native API](../API/Programming-Cpp-Native-API.md). For more APIs, please refer to the API chapters on the official website. + +## What other convenient peripheral tools are available? + +In addition to its rich features, IoTDB also has a comprehensive range of tools in its surrounding system. This document will help you quickly use the peripheral tool system: + + - Benchmark Tool: IoT benchmark is a time series database benchmarking tool based on Java and big data environments, developed and open sourced by the School of Software at Tsinghua University. It supports multiple writing and querying methods, can store test information and results for further query or analysis, and supports integration with Tableau to visualize test results. For specific usage instructions, please refer to: [Benchmark Tool](../Tools-System/Benchmark.md) + + - Data Import Script: For different scenarios, IoTDB provides users with multiple ways to batch import data. For specific usage instructions, please refer to: [Data Import](../Tools-System/Data-Import-Tool.md) + + - Data Export Script: For different scenarios, IoTDB provides users with multiple ways to batch export data. For specific usage instructions, please refer to: [Data Export](../Tools-System/Data-Export-Tool.md) + +## Want to Learn More About the Technical Details? + +If you are interested in delving deeper into the technical aspects of IoTDB, you can refer to the following documents: + + - Publication: IoTDB features columnar storage, data encoding, pre-calculation, and indexing technologies, along with a SQL-like interface and high-performance data processing capabilities. It also integrates seamlessly with Apache Hadoop, MapReduce, and Apache Spark. 

For related research papers, please refer to: [Publication](../Technical-Insider/Publication.md) + + - Encoding & Compression: IoTDB optimizes storage efficiency for different data types through a variety of encoding and compression techniques. To learn more, please refer to: [Encoding & Compression](../Technical-Insider/Encoding-and-Compression.md) + + - Data Partitioning and Load Balancing: IoTDB has meticulously designed data partitioning strategies and load balancing algorithms based on the characteristics of time series data, enhancing the availability and performance of the cluster. For more information, please refer to: [Data Partitioning and Load Balancing](../Technical-Insider/Cluster-data-partitioning.md) + + +## Encountering problems during use? + +If you encounter difficulties during installation or use, you can refer to [Frequently Asked Questions](../FAQ/Frequently-asked-questions.md) for answers + diff --git a/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_timecho.md b/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_timecho.md new file mode 100644 index 00000000..5b99e1a7 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_timecho.md @@ -0,0 +1,106 @@ + +# Quick Start + +This document will help you understand how to quickly get started with IoTDB. + +## How to install and deploy? + +This document will help you quickly install and deploy IoTDB. You can quickly locate the content you need to view through the following document links: + +1. Prepare the necessary machine resources: The deployment and operation of IoTDB require consideration of multiple aspects of machine resource configuration. Specific resource allocation can be viewed in [Database Resources](../Deployment-and-Maintenance/Database-Resources.md) + +2. Complete system configuration preparation: The system configuration of IoTDB involves multiple aspects, and the key system configuration introductions can be viewed in [System Requirements](../Deployment-and-Maintenance/Environment-Requirements.md) + +3. Get installation package: You can contact Timecho Business to obtain the IoTDB installation package to ensure that the downloaded version is the latest and stable. The specific installation package structure can be viewed in [Obtain TimechoDB](../Deployment-and-Maintenance/IoTDB-Package_timecho.md) + +4. Install database and activate: You can choose the following tutorials for installation and deployment based on the actual deployment architecture: + + - Stand-Alone Deployment: [Stand-Alone Deployment](../Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md) + + - Cluster Deployment: [Cluster Deployment](../Deployment-and-Maintenance/Cluster-Deployment_timecho.md) + + - Dual Active Deployment: [Dual Active Deployment](../Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md) + +> ❗️Attention: Currently, we still recommend installing and deploying directly on physical/virtual machines. If Docker deployment is required, please refer to: [Docker Deployment](../Deployment-and-Maintenance/Docker-Deployment_timecho.md) + +5. Install database supporting tools: The enterprise edition provides supporting tools such as a monitoring panel and Workbench. It is recommended to install them when deploying the enterprise edition, as they can help you use IoTDB more conveniently: + + - Monitoring panel: Provides over a hundred database monitoring metrics for detailed monitoring of IoTDB and its operating system, enabling system optimization, performance optimization, bottleneck discovery, and more. 

The installation steps can be viewed in [Monitoring panel](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) + + - Workbench: The visual interface of IoTDB. It supports interactive operations such as Operate Metadata, Query Data, and Data Visualization, helping users use the database easily and efficiently. The installation steps can be viewed in [Workbench Deployment](../Deployment-and-Maintenance/workbench-deployment_timecho.md) + +## How to use it? + +1. Database modeling design: Database modeling is an important step in creating a database system, which involves designing the structure and relationships of data to ensure that the organization of data meets the specific application requirements. The following document will help you quickly understand the modeling design of IoTDB: + + - Introduction to the concept of timeseries: [Navigating Time Series Data](../Basic-Concept/Navigating_Time_Series_Data.md) + + - Introduction to Modeling Design: [Data Model](../Basic-Concept/Data-Model-and-Terminology.md) + + - SQL syntax introduction: [Operate Metadata](../Basic-Concept/Operate-Metadata_timecho.md) + +2. Write Data: In terms of data writing, IoTDB provides multiple ways to insert real-time data. For details on basic data writing operations, please refer to [Write Data](../Basic-Concept/Write-Delete-Data.md) + +3. Query Data: IoTDB provides rich data query functions. For a basic introduction to data query, please refer to [Query Data](../Basic-Concept/Query-Data.md) + +4. Other advanced features: In addition to common functions such as writing and querying, IoTDB also supports Data Synchronisation, Stream Framework, Security Management, Database Administration, AI Capability, and other functions; specific usage methods can be found in the corresponding documents: + + - Data Synchronisation: [Data Synchronisation](../User-Manual/Data-Sync_timecho.md) + + - Stream Framework: [Stream Framework](../User-Manual/Streaming_timecho.md) + + - Security Management: [Security Management](../User-Manual/White-List_timecho.md) + + - Database Administration: [Database Administration](../User-Manual/Authority-Management.md) + + - AI Capability: [AI Capability](../User-Manual/AINode_timecho.md) + +5. API: IoTDB provides multiple application programming interfaces (APIs) for developers to interact with IoTDB in their applications, and currently supports [Java Native API](../API/Programming-Java-Native-API.md), [Python Native API](../API/Programming-Python-Native-API.md), [C++ Native API](../API/Programming-Cpp-Native-API.md), and [Go Native API](../API/Programming-Go-Native-API.md). For more APIs, please refer to the API chapters on the official website. + +## What other convenient peripheral tools are available? + +In addition to its rich features, IoTDB also has a comprehensive range of tools in its surrounding system. This document will help you quickly use the peripheral tool system: + + - Workbench: Workbench is a visual interface for IoTDB that supports interactive operations. It offers intuitive features for metadata management, data querying, and data visualization, enhancing the convenience and efficiency of user database operations. 

For detailed usage instructions, please refer to: [Workbench](../Deployment-and-Maintenance/workbench-deployment_timecho.md) + + - Monitor Tool: This is a tool for meticulous monitoring of IoTDB and its host operating system, covering hundreds of database monitoring metrics including database performance and system resources, which aids in system optimization and bottleneck identification. For detailed usage instructions, please refer to: [Monitor Tool](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) + + - Benchmark Tool: IoT benchmark is a time series database benchmarking tool based on Java and big data environments, developed and open sourced by the School of Software at Tsinghua University. It supports multiple writing and querying methods, can store test information and results for further query or analysis, and supports integration with Tableau to visualize test results. For specific usage instructions, please refer to: [Benchmark Tool](../Tools-System/Benchmark.md) + + - Data Import Script: For different scenarios, IoTDB provides users with multiple ways to batch import data. For specific usage instructions, please refer to: [Data Import](../Tools-System/Data-Import-Tool.md) + + - Data Export Script: For different scenarios, IoTDB provides users with multiple ways to batch export data. For specific usage instructions, please refer to: [Data Export](../Tools-System/Data-Export-Tool.md) + +## Want to Learn More About the Technical Details? + +If you are interested in delving deeper into the technical aspects of IoTDB, you can refer to the following documents: + + - Research Paper: IoTDB features columnar storage, data encoding, pre-calculation, and indexing technologies, along with a SQL-like interface and high-performance data processing capabilities. It also integrates seamlessly with Apache Hadoop, MapReduce, and Apache Spark. For related research papers, please refer to: [Research Paper](../Technical-Insider/Publication.md) + + - Compression & Encoding: IoTDB optimizes storage efficiency for different data types through a variety of encoding and compression techniques. To learn more, please refer to: [Compression & Encoding](../Technical-Insider/Encoding-and-Compression.md) + + - Data Partitioning and Load Balancing: IoTDB has meticulously designed data partitioning strategies and load balancing algorithms based on the characteristics of time series data, enhancing the availability and performance of the cluster. For more information, please refer to: [Data Partitioning & Load Balancing](../Technical-Insider/Cluster-data-partitioning.md) + + +## Encountering problems during use? + +If you encounter difficulties during installation or use, you can refer to [Frequently Asked Questions](../FAQ/Frequently-asked-questions.md) for answers \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Reference/Common-Config-Manual.md b/src/UserGuide/V2.0.1/Tree/Reference/Common-Config-Manual.md new file mode 100644 index 00000000..48eac21a --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Reference/Common-Config-Manual.md @@ -0,0 +1,2213 @@ + + +# Common Configuration + +IoTDB common configuration files for ConfigNode and DataNode are under `conf`. + +* `iotdb-system.properties`: IoTDB system configurations. + + +## Effective +Different configuration parameters take effect in the following three ways: + ++ **Only allowed to be modified in first start up:** Can't be modified after first start, otherwise the ConfigNode/DataNode cannot start. 

++ **After restarting system:** Can be modified after the ConfigNode/DataNode first start, but take effect after restart. ++ **hot-load:** Can be modified while the ConfigNode/DataNode is running, and trigger through sending the command(sql) `load configuration` or `set configuration` to the IoTDB server by client or session. + +## Configuration File + +### Replication Configuration + +* config\_node\_consensus\_protocol\_class + +| Name | config\_node\_consensus\_protocol\_class | +|:-----------:|:-----------------------------------------------------------------------| +| Description | Consensus protocol of ConfigNode replicas, only support RatisConsensus | +| Type | String | +| Default | org.apache.iotdb.consensus.ratis.RatisConsensus | +| Effective | Only allowed to be modified in first start up | + +* schema\_replication\_factor + +| Name | schema\_replication\_factor | +|:-----------:|:-----------------------------------------------------------------| +| Description | Schema replication num | +| Type | int32 | +| Default | 1 | +| Effective | Take effect on **new created Databases** after restarting system | + +* schema\_region\_consensus\_protocol\_class + +| Name | schema\_region\_consensus\_protocol\_class | +|:-----------:|:--------------------------------------------------------------------------------------------------------------------------------------------:| +| Description | Consensus protocol of schema replicas,larger than 1 replicas could only use RatisConsensus | +| Type | String | +| Default | org.apache.iotdb.consensus.ratis.RatisConsensus | +| Effective | Only allowed to be modified in first start up | + +* data\_replication\_factor + +| Name | data\_replication\_factor | +|:-----------:|:-----------------------------------------------------------------| +| Description | Data replication num | +| Type | int32 | +| Default | 1 | +| Effective | Take effect on **new created Databases** after restarting system | + +* data\_region\_consensus\_protocol\_class + +| Name | data\_region\_consensus\_protocol\_class | +|:-----------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | Consensus protocol of data replicas,larger than 1 replicas could use IoTConsensus or RatisConsensus | +| Type | String | +| Default | org.apache.iotdb.consensus.simple.SimpleConsensus | +| Effective | Only allowed to be modified in first start up | + +### Load balancing Configuration + +* series\_partition\_slot\_num + +| Name | series\_slot\_num | +|:-----------:|:----------------------------------------------| +| Description | Slot num of series partition | +| Type | int32 | +| Default | 10000 | +| Effective | Only allowed to be modified in first start up | + +* series\_partition\_executor\_class + +| Name | series\_partition\_executor\_class | +|:-----------:|:------------------------------------------------------------------| +| Description | Series partition hash function | +| Type | String | +| Default | org.apache.iotdb.commons.partition.executor.hash.BKDRHashExecutor | +| Effective | Only allowed to be modified in first start up | + +* schema\_region\_group\_extension\_policy + +| Name | schema\_region\_group\_extension\_policy | +|:-----------:|:------------------------------------------| +| Description | The extension policy of SchemaRegionGroup | +| Type | string | +| Default | AUTO | +| Effective | After restarting system | + +* 
default\_schema\_region\_group\_num\_per\_database + +| Name | default\_schema\_region\_group\_num\_per\_database | +|:-----------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | The number of SchemaRegionGroups that each Database has when using the CUSTOM-SchemaRegionGroup extension policy. The least number of SchemaRegionGroups that each Database has when using the AUTO-SchemaRegionGroup extension policy. | +| Type | int | +| Default | 1 | +| Effective | After restarting system | + +* schema\_region\_per\_data\_node + +| Name | schema\_region\_per\_data\_node | +|:-----------:|:---------------------------------------------------------------------------| +| Description | The maximum number of SchemaRegion expected to be managed by each DataNode | +| Type | double | +| Default | 1.0 | +| Effective | After restarting system | + +* data\_region\_group\_extension\_policy + +| Name | data\_region\_group\_extension\_policy | +|:-----------:|:----------------------------------------| +| Description | The extension policy of DataRegionGroup | +| Type | string | +| Default | AUTO | +| Effective | After restarting system | + +* default\_data\_region\_group\_num\_per\_database + +| Name | default\_data\_region\_group\_num\_per\_database | +|:-----------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | The number of DataRegionGroups that each Database has when using the CUSTOM-DataRegionGroup extension policy. The least number of DataRegionGroups that each Database has when using the AUTO-DataRegionGroup extension policy. 
| +| Type | int | +| Default | 1 | +| Effective | After restarting system | + +* data\_region\_per\_data\_node + +| Name | data\_region\_per\_data\_node | +|:-----------:|:--------------------------------------------------------------------------| +| Description | The maximum number of DataRegion expected to be managed by each DataNode | +| Type | double | +| Default | 1.0 | +| Effective | After restarting system | + +* enable\_data\_partition\_inherit\_policy + +| Name | enable\_data\_partition\_inherit\_policy | +|:-----------:|:---------------------------------------------------| +| Description | Whether to enable the DataPartition inherit policy | +| Type | Boolean | +| Default | false | +| Effective | After restarting system | + +* leader\_distribution\_policy + +| Name | leader\_distribution\_policy | +|:-----------:|:--------------------------------------------------------| +| Description | The policy of cluster RegionGroups' leader distribution | +| Type | String | +| Default | MIN_COST_FLOW | +| Effective | After restarting system | + +* enable\_auto\_leader\_balance\_for\_ratis + +| Name | enable\_auto\_leader\_balance\_for\_ratis\_consensus | +|:-----------:|:-------------------------------------------------------------------| +| Description | Whether to enable auto leader balance for Ratis consensus protocol | +| Type | Boolean | +| Default | false | +| Effective | After restarting system | + +* enable\_auto\_leader\_balance\_for\_iot\_consensus + +| Name | enable\_auto\_leader\_balance\_for\_iot\_consensus | +|:-----------:|:----------------------------------------------------------------| +| Description | Whether to enable auto leader balance for IoTConsensus protocol | +| Type | Boolean | +| Default | true | +| Effective | After restarting system | + +### Cluster Management + +* cluster\_name + +| Name | cluster\_name | +|:-----------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | The name of cluster | +| Type | String | +| Default | default_cluster | +| Effective | Execute SQL in CLI: ```set configuration "cluster_name"="xxx"``` (xxx is the new cluster name) | +| Attention | This change is distributed to each node through the network. In the event of network fluctuations or node downtime, it is not guaranteed that the modification will be successful on all nodes. Nodes that fail to modify will not be able to join the cluster upon restart. At this time, it is necessary to manually modify the cluster_name item in the configuration file of the node, and then restart. Under normal circumstances, it is not recommended to change the cluster name by manually modifying the configuration file, nor is it recommended to hot load through the load configuration method. 
| + +* time\_partition\_interval + +| Name | time\_partition\_interval | +|:-----------:|:--------------------------------------------------------------| +| Description | Time partition interval of data when ConfigNode allocate data | +| Type | Long | +| Unit | ms | +| Default | 604800000 | +| Effective | Only allowed to be modified in first start up | + +* heartbeat\_interval\_in\_ms + +| Name | heartbeat\_interval\_in\_ms | +|:-----------:|:----------------------------------------| +| Description | Heartbeat interval in the cluster nodes | +| Type | Long | +| Unit | ms | +| Default | 1000 | +| Effective | After restarting system | + +* disk\_space\_warning\_threshold + +| Name | disk\_space\_warning\_threshold | +|:-----------:|:--------------------------------| +| Description | Disk remaining threshold | +| Type | double(percentage) | +| Default | 0.05 | +| Effective | After restarting system | + +### Memory Control Configuration + +* datanode\_memory\_proportion + +|Name| datanode\_memory\_proportion | +|:---:|:-------------------------------------------------------------------------------------------------------------| +|Description| Memory Allocation Ratio: StorageEngine, QueryEngine, SchemaEngine, StreamingEngine, Consensus and Free Memory | +|Type| Ratio | +|Default| 3:3:1:1:1:1 | +|Effective| After restarting system | + +* schema\_memory\_allocate\_proportion + +|Name| schema\_memory\_allocate\_proportion | +|:---:|:----------------------------------------------------------------------------------------| +|Description| Schema Memory Allocation Ratio: SchemaRegion, SchemaCache, PartitionCache and LastCache | +|Type| Ratio | +|Default| 5:3:1:1 | +|Effective| After restarting system | + +* storage\_engine\_memory\_proportion + +|Name| storage\_engine\_memory\_proportion | +|:---:|:------------------------------------| +|Description| Memory allocation ratio in StorageEngine: Write, Compaction | +|Type| Ratio | +|Default| 8:2 | +|Effective| After restarting system | + +* write\_memory\_proportion + +|Name| write\_memory\_proportion | +|:---:|:----------------------------------------------------------------| +|Description| Memory allocation ratio in writing: Memtable, TimePartitionInfo | +|Type| Ratio | +|Default| 19:1 | +|Effective| After restarting system | + +* concurrent\_writing\_time\_partition + +|Name| concurrent\_writing\_time\_partition | +|:---:|:---| +|Description| This config decides how many time partitions in a database can be inserted concurrently
For example, your partitionInterval is 86400 and you want to insert data in 5 different days, | +|Type|int32| +|Default| 1 | +|Effective|After restarting system| + +* primitive\_array\_size + +| Name | primitive\_array\_size | +|:-----------:|:----------------------------------------------------------| +| Description | primitive array size (length of each array) in array pool | +| Type | Int32 | +| Default | 64 | +| Effective | After restart system | + +* chunk\_metadata\_size\_proportion + +|Name| chunk\_metadata\_size\_proportion | +|:---:|:------------------------------------| +|Description| size proportion for chunk metadata maintains in memory when writing tsfile | +|Type| Double | +|Default| 0.1 | +|Effective|After restart system| + +* flush\_proportion + +| Name | flush\_proportion | +|:-----------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | Ratio of write memory for invoking flush disk, 0.4 by default If you have extremely high write load (like batch=1000), it can be set lower than the default value like 0.2 | +| Type | Double | +| Default | 0.4 | +| Effective | After restart system | + +* buffered\_arrays\_memory\_proportion + +|Name| buffered\_arrays\_memory\_proportion | +|:---:|:---| +|Description| Ratio of write memory allocated for buffered arrays | +|Type| Double | +|Default| 0.6 | +|Effective|After restart system| + +* reject\_proportion + +|Name| reject\_proportion | +|:---:|:---| +|Description| Ratio of write memory for rejecting insertion | +|Type| Double | +|Default| 0.8 | +|Effective|After restart system| + +* write\_memory\_variation\_report\_proportion + +| Name | write\_memory\_variation\_report\_proportion | +| :---------: | :----------------------------------------------------------- | +| Description | if memory cost of data region increased more than proportion of allocated memory for write, report to system | +| Type | Double | +| Default | 0.001 | +| Effective | After restarting system | + +* check\_period\_when\_insert\_blocked + +|Name| check\_period\_when\_insert\_blocked | +|:---:|:----------------------------------------------------------------------------| +|Description| when an inserting is rejected, waiting period (in ms) to check system again | +|Type| Int32 | +|Default| 50 | +|Effective| After restart system | + +* io\_task\_queue\_size\_for\_flushing + +|Name| io\_task\_queue\_size\_for\_flushing | +|:---:|:----------------------------------------------| +|Description| size of ioTaskQueue. The default value is 10 | +|Type| Int32 | +|Default| 10 | +|Effective| After restart system | + +* enable\_query\_memory\_estimation + +|Name| enable\_query\_memory\_estimation | +|:---:|:----------------------------------| +|Description| If true, we will estimate each query's possible memory footprint before executing it and deny it if its estimated memory exceeds current free memory | +|Type| bool | +|Default| true | +|Effective|hot-load| + +* partition\_cache\_size + +|Name| partition\_cache\_size | +|:---:|:---| +|Description| The max num of partition info record cached on DataNode. 
| +|Type| Int32 | +|Default| 1000 | +|Effective|After restarting system| + +### Schema Engine Configuration + +* schema\_engine\_mode + +| Name | schema\_engine\_mode | +|:-----------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | Schema engine mode, supporting Memory and PBTree modes; PBTree mode support evict the timeseries schema temporarily not used in memory at runtime, and load it into memory from disk when needed. This parameter must be the same on all DataNodes in one cluster. | +| Type | string | +| Default | Memory | +| Effective | Only allowed to be modified in first start up | + +* mlog\_buffer\_size + +|Name| mlog\_buffer\_size | +|:---:|:---| +|Description| size of log buffer in each metadata operation plan(in byte) | +|Type|int32| +|Default| 1048576 | +|Effective|After restart system| + +* sync\_mlog\_period\_in\_ms + +| Name | sync\_mlog\_period\_in\_ms | +| :---------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Description | The cycle when metadata log is periodically forced to be written to disk(in milliseconds). If force_mlog_period_in_ms = 0 it means force metadata log to be written to disk after each refreshment | +| Type | Int64 | +| Default | 100 | +| Effective | After restarting system | + +* tag\_attribute\_flush\_interval + +|Name| tag\_attribute\_flush\_interval | +|:---:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +|Description| interval num for tag and attribute records when force flushing to disk. When a certain amount of tag and attribute records is reached, they will be force flushed to disk. It is possible to lose at most tag_attribute_flush_interval records | +|Type| int32 | +|Default| 1000 | +|Effective| Only allowed to be modified in first start up | + +* tag\_attribute\_total\_size + +|Name| tag\_attribute\_total\_size | +|:---:|:---| +|Description| The maximum persistence size of tags and attributes of each time series.| +|Type| int32 | +|Default| 700 | +|Effective|Only allowed to be modified in first start up| + +* schema\_region\_device\_node\_cache\_size + +|Name| schema\_region\_device\_node\_cache\_size | +|:---:|:--------------------------------| +|Description| The max num of device node, used for speeding up device query, cached in schemaRegion. 
| +|Type| Int32 | +|Default| 10000 | +|Effective|After restarting system| + +* max\_measurement\_num\_of\_internal\_request + +|Name| max\_measurement\_num\_of\_internal\_request | +|:---:|:--------------------------------| +|Description| When there's too many measurements in one create timeseries plan, the plan will be split to several sub plan, with measurement num no more than this param.| +|Type| Int32 | +|Default| 10000 | +|Effective|After restarting system| + +### Configurations for creating schema automatically + +* enable\_auto\_create\_schema + +| Name | enable\_auto\_create\_schema | +| :---------: | :---------------------------------------------------------------------------- | +| Description | whether auto create the time series when a non-existed time series data comes | +| Type | true or false | +| Default | true | +| Effective | After restarting system | + +* default\_storage\_group\_level + +| Name | default\_storage\_group\_level | +| :---------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Description | Database level when creating schema automatically is enabled. For example, if we receives a data point from root.sg0.d1.s2, we will set root.sg0 as the database if database level is 1. (root is level 0) | +| Type | integer | +| Default | 1 | +| Effective | After restarting system | + +* boolean\_string\_infer\_type + +| Name | boolean\_string\_infer\_type | +| :---------: | :------------------------------------------------------------ | +| Description | To which type the values "true" and "false" should be reslved | +| Type | BOOLEAN or TEXT | +| Default | BOOLEAN | +| Effective | After restarting system | + +* integer\_string\_infer\_type + +| Name | integer\_string\_infer\_type | +| :---------: | :---------------------------------------------------------------------- | +| Description | To which type an integer string like "67" in a query should be resolved | +| Type | INT32, INT64, DOUBLE, FLOAT or TEXT | +| Default | DOUBLE | +| Effective | After restarting system | + +* floating\_string\_infer\_type + +| Name | floating\_string\_infer\_type | +| :---------: |:--------------------------------------------------------------------------------| +| Description | To which type a floating number string like "6.7" in a query should be resolved | +| Type | DOUBLE, FLOAT or TEXT | +| Default | DOUBLE | +| Effective | After restarting system | + +* nan\_string\_infer\_type + +| Name | nan\_string\_infer\_type | +| :---------: | :-------------------------------------------------------- | +| Description | To which type the value NaN in a query should be resolved | +| Type | DOUBLE, FLOAT or TEXT | +| Default | FLOAT | +| Effective | After restarting system | + +### Query Configurations + +* read\_consistency\_level + +| Name | mpp\_data\_exchange\_core\_pool\_size | +|:-----------:|:---------------------------------------------| +| Description | The read consistency level,
1. strong(Default, read from the leader replica)
2. weak(Read from a random replica) | +| Type | string | +| Default | strong | +| Effective | After restarting system | + +* meta\_data\_cache\_enable + +|Name| meta\_data\_cache\_enable | +|:---:|:---| +|Description| Whether to cache meta data(BloomFilter, ChunkMetadata and TimeSeriesMetadata) or not.| +|Type|Boolean| +|Default| true | +|Effective| After restarting system| + +* chunk\_timeseriesmeta\_free\_memory\_proportion + +|Name| chunk\_timeseriesmeta\_free\_memory\_proportion | +|:---:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +|Description| Read memory Allocation Ratio: BloomFilterCache : ChunkCache : TimeSeriesMetadataCache : Coordinator : Operators : DataExchange : timeIndex in TsFileResourceList : others. | +|Default| 1 : 100 : 200 : 300 : 400 | +|Effective| After restarting system | + +* enable\_last\_cache + +|Name| enable\_last\_cache | +|:---:|:---| +|Description| Whether to enable LAST cache. | +|Type| Boolean | +|Default| true | +|Effective|After restarting system| + +* max\_deduplicated\_path\_num + +|Name| max\_deduplicated\_path\_num | +|:---:|:---| +|Description| allowed max numbers of deduplicated path in one query. | +|Type| Int32 | +|Default| 1000 | +|Effective|After restarting system| + +* mpp\_data\_exchange\_core\_pool\_size + +| Name | mpp\_data\_exchange\_core\_pool\_size | +|:-----------:|:---------------------------------------------| +| Description | Core size of ThreadPool of MPP data exchange | +| Type | int32 | +| Default | 10 | +| Effective | After restarting system | + +* mpp\_data\_exchange\_max\_pool\_size + +| Name | mpp\_data\_exchange\_max\_pool\_size | +| :---------: | :------------------------------------------ | +| Description | Max size of ThreadPool of MPP data exchange | +| Type | int32 | +| Default | 10 | +| Effective | After restarting system | + +* mpp\_data\_exchange\_keep\_alive\_time\_in\_ms + +|Name| mpp\_data\_exchange\_keep\_alive\_time\_in\_ms | +|:---:|:---| +|Description| Max waiting time for MPP data exchange | +|Type| long | +|Default| 1000 | +|Effective|After restarting system| + +* driver\_task\_execution\_time\_slice\_in\_ms + +| Name | driver\_task\_execution\_time\_slice\_in\_ms | +| :---------: | :------------------------------------------- | +| Description | Maximum execution time of a DriverTask | +| Type | int32 | +| Default | 100 | +| Effective | After restarting system | + +* max\_tsblock\_size\_in\_bytes + +| Name | max\_tsblock\_size\_in\_bytes | +| :---------: | :---------------------------- | +| Description | Maximum capacity of a TsBlock | +| Type | int32 | +| Default | 1024 * 1024 (1 MB) | +| Effective | After restarting system | + +* max\_tsblock\_line\_numbers + +| Name | max\_tsblock\_line\_numbers | +| :---------: | :------------------------------------------ | +| Description | Maximum number of lines in a single TsBlock | +| Type | int32 | +| Default | 1000 | +| Effective | After restarting system | + +* slow\_query\_threshold + +|Name| slow\_query\_threshold | +|:---:|:----------------------------------------| +|Description| Time cost(ms) threshold for slow query. | +|Type| Int32 | +|Default| 30000 | +|Effective| Trigger | + +* query\_timeout\_threshold + +|Name| query\_timeout\_threshold | +|:---:|:---| +|Description| The max executing time of query. 
Unit: ms | +|Type| Int32 | +|Default| 60000 | +|Effective| After restarting system| + +* max\_allowed\_concurrent\_queries + +|Name| max\_allowed\_concurrent\_queries | +|:---:|:---| +|Description| The maximum number of concurrently executing queries allowed. | +|Type| Int32 | +|Default| 1000 | +|Effective|After restarting system| + +* query\_thread\_count + +|Name| query\_thread\_count | +|:---:|:-----------------------------------------------------------------------------------------------| +|Description| How many threads can concurrently execute query statements. When <= 0, the CPU core number is used. | +|Type| Int32 | +|Default | CPU core number | +|Effective| After restarting system | + +* batch\_size + +|Name| batch\_size | +|:---:|:---| +|Description| The amount of data iterated over each time on the server (the number of data rows, that is, the number of different timestamps). | +|Type| Int32 | +|Default| 100000 | +|Effective|After restarting system| + +### TTL Configuration + +* ttl\_check\_interval + +| Name | ttl\_check\_interval | +|:-----------:|:-------------------------------------------------------------------------------| +| Description | The interval of the TTL check task in each database. Unit: ms. Default is 2 hours. | +| Type | int | +| Default | 7200000 | +| Effective | After restarting system | + +* max\_expired\_time + +| Name | max\_expired\_time | +| :----------: |:--------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | If a file contains a device that has expired for more than this duration, the file will be settled immediately. Unit: ms. Default is 1 month. | +| Type | int | +| Default | 2592000000 | +| Effective | After restarting system | + +* expired\_data\_ratio + +| Name | expired\_data\_ratio | +| :----------: |:--------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | The expired device ratio. If the ratio of expired devices in one file exceeds this value, then expired data of this file will be cleaned by compaction. 

| +| Type | float | +| Default | 0.3 | +| Effective | After restarting system | + + +### Storage Engine Configuration + +* timestamp\_precision + +| Name | timestamp\_precision | +| :----------: | :-------------------------- | +| Description | timestamp precision,support ms、us、ns | +| Type | String | +| Default | ms | +| Effective | Only allowed to be modified in first start up | + +* tier\_ttl\_in\_ms + +|Name| tier\_ttl\_in\_ms | +|:---:|:--------------| +|Description| Define the maximum age of data for which each tier is responsible | +|Type| long | +|Default| -1 | +|Effective| After restarting system | + +* max\_waiting\_time\_when\_insert\_blocked + +| Name | max\_waiting\_time\_when\_insert\_blocked | +| :---------: |:------------------------------------------------------------------------------| +| Description | When the waiting time(in ms) of an inserting exceeds this, throw an exception | +| Type | Int32 | +| Default | 10000 | +| Effective | After restarting system | + +* handle\_system\_error + +| Name | handle\_system\_error | +| :---------: |:-------------------------------------------------------| +| Description | What will the system do when unrecoverable error occurs| +| Type | String | +| Default | CHANGE\_TO\_READ\_ONLY | +| Effective | After restarting system | + +* write\_memory\_variation\_report\_proportion + +| Name | write\_memory\_variation\_report\_proportion | +| :---------: | :----------------------------------------------------------------------------------------------------------- | +| Description | if memory cost of data region increased more than proportion of allocated memory for write, report to system | +| Type | Double | +| Default | 0.001 | +| Effective | After restarting system | + +* enable\_timed\_flush\_seq\_memtable + +| Name | enable\_timed\_flush\_seq\_memtable | +|:-----------:|:------------------------------------------------| +| Description | whether to enable timed flush sequence memtable | +| Type | Boolean | +| Default | true | +| Effective | hot-load | + +* seq\_memtable\_flush\_interval\_in\_ms + +| Name | seq\_memtable\_flush\_interval\_in\_ms | +|:-----------:|:---------------------------------------------------------------------------------------------------------| +| Description | if a memTable's created time is older than current time minus this, the memtable will be flushed to disk | +| Type | int32 | +| Default | 10800000 | +| Effective | hot-load | + +* seq\_memtable\_flush\_check\_interval\_in\_ms + +|Name| seq\_memtable\_flush\_check\_interval\_in\_ms | +|:---:|:---| +|Description| the interval to check whether sequence memtables need flushing | +|Type|int32| +|Default| 600000 | +|Effective| hot-load | + +* enable\_timed\_flush\_unseq\_memtable + +|Name| enable\_timed\_flush\_unseq\_memtable | +|:---:|:---| +|Description| whether to enable timed flush unsequence memtable | +|Type|Boolean| +|Default| false | +|Effective| hot-load | + +* unseq\_memtable\_flush\_interval\_in\_ms + +| Name | unseq\_memtable\_flush\_interval\_in\_ms | +|:-----------:|:---------------------------------------------------------------------------------------------------------| +| Description | if a memTable's created time is older than current time minus this, the memtable will be flushed to disk | +| Type | int32 | +| Default | 600000 | +| Effective | hot-load | + +* unseq\_memtable\_flush\_check\_interval\_in\_ms + +|Name| unseq\_memtable\_flush\_check\_interval\_in\_ms | +|:---:|:---| +|Description| the interval to check whether unsequence memtables 
need flushing | +|Type|int32| +|Default| 30000 | +|Effective| hot-load | + +* tvlist\_sort\_algorithm + +|Name| tvlist\_sort\_algorithm | +|:---:|:--------------------------------------------------| +|Description| the sort algorithm used in the memtable's TVList | +|Type| String | +|Default| TIM | +|Effective| After restarting system | + +* avg\_series\_point\_number\_threshold + +|Name| avg\_series\_point\_number\_threshold | +|:---:|:-------------------------------------------------------| +|Description| max average number of point of each series in memtable | +|Type| int32 | +|Default| 100000 | +|Effective| After restarting system | + +* flush\_thread\_count + +|Name| flush\_thread\_count | +|:---:|:---| +|Description| The thread number used to perform the operation when IoTDB writes data in memory to disk. If the value is less than or equal to 0, then the number of CPU cores installed on the machine is used. The default is 0.| +|Type| int32 | +|Default| 0 | +|Effective|After restarting system| + +* enable\_partial\_insert + +|Name| enable\_partial\_insert | +|:---:|:---| +|Description| Whether continue to write other measurements if some measurements are failed in one insertion.| +|Type| Boolean | +|Default| true | +|Effective|After restarting system| + +* recovery\_log\_interval\_in\_ms + +|Name| recovery\_log\_interval\_in\_ms | +|:---:|:------------------------------------------------------------------------| +|Description| the interval to log recover progress of each region when starting iotdb | +|Type| Int32 | +|Default| 5000 | +|Effective| After restarting system | + +* 0.13\_data\_insert\_adapt + +|Name| 0.13\_data\_insert\_adapt | +|:---:|:----------------------------------------------------------------------| +|Description| if using v0.13 client to insert data, set this configuration to true. | +|Type| Boolean | +|Default| false | +|Effective| After restarting system | + + +* device\_path\_cache\_size + +| Name | device\_path\_cache\_size | +|:---------:|:--------------------------------------------------------------------------------------------------------------------------| +|Description| The max size of the device path cache. This cache is for avoiding initialize duplicated device id object in write process | +| Type | Int32 | +| Default | 500000 | +| Effective | After restarting system | + +* insert\_multi\_tablet\_enable\_multithreading\_column\_threshold + +| Name | insert\_multi\_tablet\_enable\_multithreading\_column\_threshold | +| :---------: | :--------------------------------------------------------------------------------------------- | +| Description | When the insert plan column count reaches the specified threshold, multi-threading is enabled. 
| +| Type | int32 | +| Default | 10 | +| Effective | After restarting system | + +### Compaction Configurations + +* enable\_seq\_space\_compaction + +| Name | enable\_seq\_space\_compaction | +| :---------: |:---------------------------------------------| +| Description | enable the compaction between sequence files | +| Type | Boolean | +| Default | true | +| Effective | hot-load | + +* enable\_unseq\_space\_compaction + +| Name | enable\_unseq\_space\_compaction | +| :---------: |:-----------------------------------------------| +| Description | enable the compaction between unsequence files | +| Type | Boolean | +| Default | true | +| Effective | hot-load | + +* enable\_cross\_space\_compaction + +| Name | enable\_cross\_space\_compaction | +| :---------: |:------------------------------------------------------------------| +| Description | enable the compaction between sequence files and unsequence files | +| Type | Boolean | +| Default | true | +| Effective | hot-load | + +* enable\_auto\_repair\_compaction + +| Name | enable\_auto\_repair\_compaction | +| :---------: |:------------------------------------------------------------------| +| Description | enable auto repair unsorted file by compaction | +| Type | Boolean | +| Default | true | +| Effective | hot-load | + +* cross\_selector + +|Name| cross\_selector | +|:---:|:-------------------------------------------------| +|Description| the task selector type of cross space compaction | +|Type| String | +|Default| rewrite | +|Effective| After restart system | + +* cross\_performer + +|Name| cross\_performer | +|:---:|:---------------------------------------------------| +|Description| the task performer type of cross space compaction. The options are read_point and fast, read_point is the default and fast is still under test | +|Type| String | +|Default| read\_point | +|Effective| After restart system | + +* inner\_seq\_selector + +|Name| inner\_seq\_selector | +|:---:|:-----------------------------------------------------------------------------------------------------------------------------| +|Description| the task selector type of inner sequence space compaction. Options: size\_tiered\_single_\target,size\_tiered\_multi\_target | +|Type| String | +|Default| hot-load | +|Effective| hot-load | + +* inner\_seq\_performer + +|Name| inner\_seq\_peformer | +|:---:|:--------------------------------------------------------------------------------------------------------------------------------------------------------| +|Description| the task performer type of inner sequence space compaction. The options are read_chunk and fast, read_chunk is the default and fast is still under test | +|Type| String | +|Default| read\_chunk | +|Effective| After restart system | + +* inner\_unseq\_selector + +|Name| inner\_unseq\_selector | +|:---:|:------------------------------------------------------------| +|Description| the task selector type of inner unsequence space compactionn. Options: size\_tiered\_single_\target,size\_tiered\_multi\_target | +|Type| String | +|Default| hot-load | +|Effective| hot-load | + +* inner\_unseq\_performer + +|Name| inner\_unseq\_peformer | +|:---:|:--------------------------------------------------------------| +|Description| the task performer type of inner unsequence space compaction. 
The options are read_point and fast, read_point is the default and fast is still under test | +|Type| String | +|Default| read\_point | +|Effective| After restart system | + +* compaction\_priority + +| Name | compaction\_priority | +| :---------: |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | Priority of compaction task. When it is BALANCE, system executes all types of compaction equally; when it is INNER\_CROSS, system takes precedence over executing inner space compaction task; when it is CROSS\_INNER, system takes precedence over executing cross space compaction task | +| Type | String | +| Default | INNER_CROSS | +| Effective | After restart system | + +* target\_compaction\_file\_size + +| Name | target\_compaction\_file\_size | +| :---------: |:-----------------------------------------------| +| Description | The target file size in compaction | +| Type | Int64 | +| Default | 2147483648 | +| Effective | After restart system | + +* target\_chunk\_size + +| Name | target\_chunk\_size | +| :---------: | :--------------------------------- | +| Description | The target size of compacted chunk | +| Type | Int64 | +| Default | 1048576 | +| Effective | After restart system | + +* target\_chunk\_point\_num + +|Name| target\_chunk\_point\_num | +|:---:|:---| +|Description| The target point number of compacted chunk | +|Type| int32 | +|Default| 100000 | +|Effective|After restart system| + +* chunk\_size\_lower\_bound\_in\_compaction + +| Name | chunk\_size\_lower\_bound\_in\_compaction | +| :---------: |:----------------------------------------------------------------------------------------| +| Description | A source chunk will be deserialized in compaction when its size is less than this value | +| Type | Int64 | +| Default | 10240 | +| Effective | After restart system | + +* chunk\_point\_num\_lower\_bound\_in\_compaction + +|Name| chunk\_point\_num\_lower\_bound\_in\_compaction | +|:---:|:---------------------------------------------------------------------------------------------| +|Description| A source chunk will be deserialized in compaction when its point num is less than this value | +|Type| int32 | +|Default| 1000 | +|Effective| After restart system | + +* inner\_compaction\_total\_file\_num\_threshold + +|Name| inner\_compaction\_total\_file\_num\_threshold | +|:---:|:---------------------------------------------------------| +|Description| The max num of files encounter in inner space compaction | +|Type| int32 | +|Default| 100 | +|Effective| hot-load | + +* inner\_compaction\_total\_file\_size\_threshold + +|Name| inner\_compaction\_total\_file\_size\_threshold | +|:---:|:----------------------------------------------------------------| +|Description| The total file size limit in inner space compaction. 
Unit: byte | +|Type| int64 | +|Default| 10737418240 | +|Effective| hot-load | + +* compaction\_max\_aligned\_series\_num\_in\_one\_batch + +|Name| compaction\_max\_aligned\_series\_num\_in\_one\_batch | +|:---:|:--------------------------------------------------------------------| +|Description| How many value chunk will be compacted in aligned series compaction | +|Type| int32 | +|Default| 10 | +|Effective| hot-load | + +* max\_level\_gap\_in\_inner\_compaction + +|Name| max\_level\_gap\_in\_inner\_compaction | +|:---:|:------------------------------------------------| +|Description| The max level gap in inner compaction selection | +|Type| int32 | +|Default| 2 | +|Effective| hot-load | + +* inner\_compaction\_candidate\_file\_num + +|Name| inner\_compaction\_candidate\_file\_num | +|:---:|:-------------------------------------------------------------------------------| +|Description| The file num requirement when selecting inner space compaction candidate files | +|Type| int32 | +|Default| 30 | +|Effective| hot-load | + +* max\_cross\_compaction\_file\_num + +|Name| max\_cross\_compaction\_candidate\_file\_num | +|:---:|:---------------------------------------------------------| +|Description| The max num of files encounter in cross space compaction | +|Type| int32 | +|Default| 500 | +|Effective| hot-load | + +* max\_cross\_compaction\_file\_size + +|Name| max\_cross\_compaction\_candidate\_file\_size | +|:---:|:----------------------------------------------------------| +|Description| The max size of files encounter in cross space compaction | +|Type| Int64 | +|Default| 5368709120 | +|Effective| hot-load | + +* compaction\_thread\_count + +|Name| compaction\_thread\_count | +|:---:|:---------------------------------| +|Description| thread num to execute compaction | +|Type| int32 | +|Default| 10 | +|Effective| hot-load | + +* compaction\_schedule\_interval\_in\_ms + +| Name | compaction\_schedule\_interval\_in\_ms | +| :---------: | :------------------------------------- | +| Description | interval of scheduling compaction | +| Type | Int64 | +| Default | 60000 | +| Effective | After restart system | + +* compaction\_submission\_interval\_in\_ms + +| Name | compaction\_submission\_interval\_in\_ms | +| :---------: | :--------------------------------------- | +| Description | interval of submitting compaction task | +| Type | Int64 | +| Default | 60000 | +| Effective | After restart system | + +* compaction\_write\_throughput\_mb\_per\_sec + +|Name| compaction\_write\_throughput\_mb\_per\_sec | +|:---:|:-------------------------------------------------| +|Description| The write rate of all compaction tasks in MB/s, values less than or equal to 0 means no limit | +|Type| int32 | +|Default| 16 | +|Effective| hot-load | + +* compaction\_read\_throughput\_mb\_per\_sec + +|Name| compaction\_read\_throughput\_mb\_per\_sec | +|:---:|:------------------------------------------------| +|Description| The read rate of all compaction tasks in MB/s, values less than or equal to 0 means no limit | +|Type| int32 | +|Default| 0 | +|Effective| hot-load | + +* compaction\_read\_operation\_per\_sec + +|Name| compaction\_read\_operation\_per\_sec | +|:---:|:---------------------------------------------------------------------------------------------------------------| +|Description| The read operation of all compaction tasks can reach per second, values less than or equal to 0 means no limit | +|Type| int32 | +|Default| 0 | +|Effective| hot-load | + +* sub\_compaction\_thread\_count + +|Name| 
sub\_compaction\_thread\_count | +|:---:|:--------------------------------------------------------------------------| +|Description| the number of sub-compaction threads to accelerate cross space compaction | +|Type| Int32 | +|Default| 4 | +|Effective| hot-load | + +* enable\_tsfile\_validation + +| Name | enable\_tsfile\_validation | +|:-----------:|:--------------------------------------------------------------------------| +| Description | Verify that TSfiles generated by Flush, Load, and Compaction are correct. | +| Type | boolean | +| Default | false | +| Effective | hot-load | + +* candidate\_compaction\_task\_queue\_size + +|Name| candidate\_compaction\_task\_queue\_size | +|:---:|:--------------------------------------------| +|Description| The size of candidate compaction task queue | +|Type| Int32 | +|Default| 50 | +|Effective| After restart system | + +* compaction\_schedule\_thread\_num + +|Name| compaction\_schedule\_thread\_num | +|:---:|:--------------------------------------------------------------------------| +|Description| The number of threads to be set up to select compaction task. | +|Type| Int32 | +|Default| 4 | +|Effective| hot-load | + +### Write Ahead Log Configuration + +* wal\_mode + +| Name | wal\_mode | +|:-----------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | The write mode of wal. For DISABLE mode, the system will disable wal. For SYNC mode, the system will submit wal synchronously, write request will not return until its wal is fsynced to the disk successfully. For ASYNC mode, the system will submit wal asynchronously, write request will return immediately no matter its wal is fsynced to the disk successfully. | +| Type | String | +| Default | ASYNC | +| Effective | After restart system | + +* max\_wal\_nodes\_num + +| Name | max\_wal\_nodes\_num | +|:-----------:|:---------------------------------------------------------------------------------------------------------------------------------------| +| Description | Max number of wal nodes, each node corresponds to one wal directory. The default value 0 means the number is determined by the system. 
| +| Type | int32 | +| Default | 0 | +| Effective | After restart system | + +* wal\_async\_mode\_fsync\_delay\_in\_ms + +| Name | wal\_async\_mode\_fsync\_delay\_in\_ms | +|:-----------:|:--------------------------------------------------------------------------------| +| Description | Duration a wal flush operation will wait before calling fsync in the async mode | +| Type | int32 | +| Default | 1000 | +| Effective | hot-load | + +* wal\_sync\_mode\_fsync\_delay\_in\_ms + +| Name | wal\_sync\_mode\_fsync\_delay\_in\_ms | +|:-----------:|:-------------------------------------------------------------------------------| +| Description | Duration a wal flush operation will wait before calling fsync in the sync mode | +| Type | int32 | +| Default | 3 | +| Effective | hot-load | + +* wal\_buffer\_size\_in\_byte + +| Name | wal\_buffer\_size\_in\_byte | +|:-----------:|:-----------------------------| +| Description | Buffer size of each wal node | +| Type | int32 | +| Default | 33554432 | +| Effective | After restart system | + +* wal\_buffer\_queue\_capacity + +| Name | wal\_buffer\_queue\_capacity | +|:-----------:|:-------------------------------------------| +| Description | Blocking queue capacity of each wal buffer | +| Type | int32 | +| Default | 500 | +| Effective | After restart system | + +* wal\_file\_size\_threshold\_in\_byte + +| Name | wal\_file\_size\_threshold\_in\_byte | +|:-----------:|:-------------------------------------| +| Description | Size threshold of each wal file | +| Type | int32 | +| Default | 31457280 | +| Effective | hot-load | + +* wal\_min\_effective\_info\_ratio + +| Name | wal\_min\_effective\_info\_ratio | +|:-----------:|:----------------------------------------------------| +| Description | Minimum ratio of effective information in wal files | +| Type | double | +| Default | 0.1 | +| Effective | hot-load | + +* wal\_memtable\_snapshot\_threshold\_in\_byte + +| Name | wal\_memtable\_snapshot\_threshold\_in\_byte | +|:-----------:|:----------------------------------------------------------------| +| Description | MemTable size threshold for triggering MemTable snapshot in wal | +| Type | int64 | +| Default | 8388608 | +| Effective | hot-load | + +* max\_wal\_memtable\_snapshot\_num + +| Name | max\_wal\_memtable\_snapshot\_num | +|:-----------:|:--------------------------------------| +| Description | MemTable's max snapshot number in wal | +| Type | int32 | +| Default | 1 | +| Effective | hot-load | + +* delete\_wal\_files\_period\_in\_ms + +| Name | delete\_wal\_files\_period\_in\_ms | +|:-----------:|:------------------------------------------------------------| +| Description | The period when outdated wal files are periodically deleted | +| Type | int64 | +| Default | 20000 | +| Effective | hot-load | + +### TsFile Configurations + +* group\_size\_in\_byte + +|Name|group\_size\_in\_byte| +|:---:|:---| +|Description|The data size written to the disk per time| +|Type|int32| +|Default| 134217728 | +|Effective|hot-load| + +* page\_size\_in\_byte + +|Name| page\_size\_in\_byte | +|:---:|:---| +|Description|The maximum size of a single page written in memory when each column in memory is written (in bytes)| +|Type|int32| +|Default| 65536 | +|Effective|hot-load| + +* max\_number\_of\_points\_in\_page + +|Name| max\_number\_of\_points\_in\_page | +|:---:|:-----------------------------------------------------------------------------------| +|Description| The maximum number of data points (timestamps - valued groups) contained in a page | +|Type| int32 | 
+|Default| 10000 | +|Effective| hot-load | + +* pattern\_matching\_threshold + +|Name| pattern\_matching\_threshold | +|:---:|:-----------------------------------| +|Description| Max matching time of regex pattern | +|Type| int32 | +|Default| 1000000 | +|Effective| hot-load | + +* max\_degree\_of\_index\_node + +|Name| max\_degree\_of\_index\_node | +|:---:|:---| +|Description|The maximum degree of the metadata index tree (that is, the max number of each node's children)| +|Type|int32| +|Default| 256 | +|Effective|Only allowed to be modified in first start up| + +* max\_string\_length + +|Name| max\_string\_length | +|:---:|:---| +|Description|The maximum length of a single string (number of character)| +|Type|int32| +|Default| 128 | +|Effective|hot-load| + +* value\_encoder + +| Name | value\_encoder | +| :---------: | :------------------------------------ | +| Description | Encoding type of value column | +| Type | Enum String: “TS_2DIFF”,“PLAIN”,“RLE” | +| Default | PLAIN | +| Effective | hot-load | + +* float\_precision + +|Name| float\_precision | +|:---:|:---| +|Description| The precision of the floating point number.(The number of digits after the decimal point) | +|Type|int32| +|Default| The default is 2 digits. Note: The 32-bit floating point number has a decimal precision of 7 bits, and the 64-bit floating point number has a decimal precision of 15 bits. If the setting is out of the range, it will have no practical significance. | +|Effective|hot-load| + +* compressor + +| Name | compressor | +|:-----------:|:-----------------------------------------------------------------------| +| Description | Data compression method; Time compression method in aligned timeseries | +| Type | Enum String : "UNCOMPRESSED", "SNAPPY", "LZ4", "ZSTD", "LZMA2" | +| Default | SNAPPY | +| Effective | hot-load | + +* bloomFilterErrorRate + +| Name | bloomFilterErrorRate | +| :---------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Description | The false positive rate of bloom filter in each TsFile. Bloom filter checks whether a given time series is in the tsfile before loading metadata. This can improve the performance of loading metadata and skip the tsfile that doesn't contain specified time series. If you want to learn more about its mechanism, you can refer to: [wiki page of bloom filter](https://en.wikipedia.org/wiki/Bloom_filter). 
|
+| Type | float, (0, 1) |
+| Default | 0.05 |
+| Effective | After restarting system |
+
+
+### Authorization Configuration
+
+* authorizer\_provider\_class
+
+| Name | authorizer\_provider\_class |
+| :--------------------: | :----------------------------------------------------------- |
+| Description | The class name of the authorization service |
+| Type | String |
+| Default | org.apache.iotdb.commons.auth.authorizer.LocalFileAuthorizer |
+| Effective | After restarting system |
+| Other available values | org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer |
+
+* openID\_url
+
+| Name | openID\_url |
+| :---------: | :------------------------------------------------------------ |
+| Description | The OpenID server URL, used when OpenIdAuthorizer is enabled |
+| Type | String (an HTTP URL) |
+| Default | None |
+| Effective | After restarting system |
+
+* iotdb\_server\_encrypt\_decrypt\_provider
+
+| Name | iotdb\_server\_encrypt\_decrypt\_provider |
+| :---------: | :------------------------------------------------------------- |
+| Description | The class used for user password encryption |
+| Type | String |
+| Default | org.apache.iotdb.commons.security.encrypt.MessageDigestEncrypt |
+| Effective | Only allowed to be modified in first start up |
+
+* iotdb\_server\_encrypt\_decrypt\_provider\_parameter
+
+| Name | iotdb\_server\_encrypt\_decrypt\_provider\_parameter |
+| :---------: | :--------------------------------------------------------------- |
+| Description | Parameters used to initialize the user password encryption class |
+| Type | String |
+| Default | Empty |
+| Effective | After restarting system |
+
+* author\_cache\_size
+
+| Name | author\_cache\_size |
+| :---------: | :--------------------------- |
+| Description | Cache size of users and roles |
+| Type | int32 |
+| Default | 1000 |
+| Effective | After restarting system |
+
+* author\_cache\_expire\_time
+
+| Name | author\_cache\_expire\_time |
+| :---------: | :------------------------------------------------------- |
+| Description | Cache expiration time of users and roles, unit: minutes |
+| Type | int32 |
+| Default | 30 |
+| Effective | After restarting system |
+
+### UDF Configuration
+
+* udf\_initial\_byte\_array\_length\_for\_memory\_control
+
+| Name | udf\_initial\_byte\_array\_length\_for\_memory\_control |
+| :---------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Description | Used to estimate the memory usage of text fields in a UDF query. It is recommended to set this value to be slightly larger than the average length of all texts. |
+| Type | int32 |
+| Default | 48 |
+| Effective | After restarting system |
+
+* udf\_memory\_budget\_in\_mb
+
+| Name | udf\_memory\_budget\_in\_mb |
+| :---------: | :--------------------------------------------------------------------------------------------------------- |
+| Description | How much memory may be used in ONE UDF query (in MB). The upper limit is 20% of allocated memory for read. |
+| Type | Float |
+| Default | 30.0 |
+| Effective | After restarting system |
+
+* udf\_reader\_transformer\_collector\_memory\_proportion
+
+| Name | udf\_reader\_transformer\_collector\_memory\_proportion |
+| :---------: | :---------------------------------------------------------------------------------------------------------------------------------- |
+| Description | UDF memory allocation ratio for reader, transformer and collector. 
The parameter form is a : b : c, where a, b, and c are integers. |
+| Type | String |
+| Default | 1:1:1 |
+| Effective | After restarting system |
+
+* udf\_root\_dir
+
+| Name | udf\_root\_dir |
+| :---------: | :------------------------ |
+| Description | Root directory of UDF |
+| Type | String |
+| Default | ext/udf(Windows:ext\\udf) |
+| Effective | After restarting system |
+
+* udf\_lib\_dir
+
+| Name | udf\_lib\_dir |
+| :---------: | :------------------------------------ |
+| Description | Directory of UDF logs and jar files |
+| Type | String |
+| Default | ext/udf(Windows:ext\\udf) |
+| Effective | After restarting system |
+
+### Trigger Configuration
+
+
+* trigger\_lib\_dir
+
+| Name | trigger\_lib\_dir |
+| :---------: |:------------------------|
+| Description | Trigger JAR file dir |
+| Type | String |
+| Default | ext/trigger |
+| Effective | After restarting system |
+
+* stateful\_trigger\_retry\_num\_when\_not\_found
+
+| Name | stateful\_trigger\_retry\_num\_when\_not\_found |
+| :---------: |:-----------------------------------------------------------------------------------|
+| Description | How many times to retry to find an instance of a stateful trigger on DataNodes |
+| Type | Int32 |
+| Default | 3 |
+| Effective | After restarting system |
+
+
+### SELECT-INTO
+
+* into\_operation\_buffer\_size\_in\_byte
+
+| Name | into\_operation\_buffer\_size\_in\_byte |
+| :---------: | :---------------------------------------------------------------------------------------------------------------------------------- |
+| Description | When the select-into statement is executed, the maximum memory occupied by the data to be written (unit: Byte) |
+| Type | int64 |
+| Default | 100MB |
+| Effective | hot-load |
+
+
+* select\_into\_insert\_tablet\_plan\_row\_limit
+
+| Name | select\_into\_insert\_tablet\_plan\_row\_limit |
+| :---------: | :---------------------------------------------------------------------------------------------------------------------------------- |
+| Description | The maximum number of rows that can be processed in insert-tablet-plan when executing select-into statements. When <= 0, use 10000. |
+| Type | int32 |
+| Default | 10000 |
+| Effective | hot-load |
+
+* into\_operation\_execution\_thread\_count
+
+| Name | into\_operation\_execution\_thread\_count |
+| :---------: | :------------------------------------------------------------ |
+| Description | The number of threads in the thread pool that execute insert-tablet tasks |
+| Type | int32 |
+| Default | 2 |
+| Effective | After restarting system |
+
+### Continuous Query
+
+* continuous\_query\_execution\_thread
+
+| Name | continuous\_query\_execution\_thread |
+| :---------: | :------------------------------------------------------------ |
+| Description | How many threads will be set up to perform continuous queries |
+| Type | int32 |
+| Default | max(1, CPU core count / 2) |
+| Effective | After restarting system |
+
+* continuous\_query\_min\_every\_interval
+
+| Name | continuous\_query\_min\_every\_interval |
+| :---------: | :-------------------------------------------------- |
+| Description | Minimum every interval to perform continuous query. 
| +| Type | duration | +| Default | 1s | +| Effective | After restarting system | + +### PIPE Configuration + +* pipe_lib_dir + +| **Name** | **pipe_lib_dir** | +| ------------ | -------------------------- | +| Description | Directory for storing custom Pipe plugins | +| Type | string | +| Default Value | ext/pipe | +| Effective | Not currently supported for modification | + +* pipe_subtask_executor_max_thread_num + +| **Name** | **pipe_subtask_executor_max_thread_num** | +| ------------ | ------------------------------------------------------------ | +| Description | The maximum number of threads that can be used for processors and sinks in Pipe subtasks. The actual value will be the minimum of pipe_subtask_executor_max_thread_num and the maximum of 1 and half of the CPU core count. | +| Type | int | +| Default Value | 5 | +| Effective | After restarting system | + +* pipe_sink_timeout_ms + +| **Name** | **pipe_sink_timeout_ms** | +| ------------ | --------------------------------------------- | +| Description | The connection timeout for Thrift clients in milliseconds. | +| Type | int | +| Default Value | 900000 | +| Effective | After restarting system | + +* pipe_sink_selector_number + +| **Name** | **pipe_sink_selector_number** | +| ------------ | ------------------------------------------------------------ | +| Description | The maximum number of threads for processing execution results in the iotdb-thrift-async-sink plugin. It is recommended to set this value to be less than or equal to pipe_sink_max_client_number. | +| Type | int | +| Default Value | 4 | +| Effective | After restarting system | + +* pipe_sink_max_client_number + +| **Name** | **pipe_sink_max_client_number** | +| ------------ | ----------------------------------------------------------- | +| Description | The maximum number of clients that can be used in the iotdb-thrift-async-sink plugin. | +| Type | int | +| Default Value | 16 | +| Effective | After restarting system | + +* pipe_air_gap_receiver_enabled + +| **Name** | **pipe_air_gap_receiver_enabled** | +| ------------ | ------------------------------------------------------------ | +| Description | Whether to enable receiving Pipe data through a gateway. The receiver can only return 0 or 1 in TCP mode to indicate whether the data was successfully received. | +| Type | Boolean | +| Default Value | false | +| Effective | After restarting system | + +* pipe_air_gap_receiver_port + +| **Name** | **pipe_air_gap_receiver_port** | +| ------------ | ------------------------------------ | +| Description | The port used by the server to receive Pipe data through a gateway. | +| Type | int | +| Default Value | 9780 | +| Effective | After restarting system | + +* pipe_all_sinks_rate_limit_bytes_per_second + +| **Name** | **pipe_all_sinks_rate_limit_bytes_per_second** | +| ------------ | ------------------------------------------------------------ | +| Description | The total number of bytes per second that all Pipe sinks can transmit. When the given value is less than or equal to 0, it indicates there is no limit. The default value is -1, which means there is no limit. 
| +| Type | double | +| Default Value | -1 | +| Effective | Can be hot-loaded | + +### IOTConsensus Configuration + +* data_region_iot_max_log_entries_num_per_batch + +| Name | data_region_iot_max_log_entries_num_per_batch | +| :---------: | :------------------------------------------------ | +| Description | The maximum log entries num in IoTConsensus Batch | +| Type | int32 | +| Default | 1024 | +| Effective | After restarting system | + +* data_region_iot_max_size_per_batch + +| Name | data_region_iot_max_size_per_batch | +| :---------: | :------------------------------------- | +| Description | The maximum size in IoTConsensus Batch | +| Type | int32 | +| Default | 16MB | +| Effective | After restarting system | + +* data_region_iot_max_pending_batches_num + +| Name | data_region_iot_max_pending_batches_num | +| :---------: | :---------------------------------------------- | +| Description | The maximum pending batches num in IoTConsensus | +| Type | int32 | +| Default | 12 | +| Effective | After restarting system | + +* data_region_iot_max_memory_ratio_for_queue + +| Name | data_region_iot_max_memory_ratio_for_queue | +| :---------: | :------------------------------------------------- | +| Description | The maximum memory ratio for queue in IoTConsensus | +| Type | double | +| Default | 0.6 | +| Effective | After restarting system | + +### RatisConsensus Configuration + +* config\_node\_ratis\_log\_appender\_buffer\_size\_max + +| Name | config\_node\_ratis\_log\_appender\_buffer\_size\_max | +|:------:|:-----------------------------------------------| +| Description | confignode max payload size for a single log-sync-RPC from leader to follower | +| Type | int32 | +| Default | 4MB | +| Effective | After restarting system | + + +* schema\_region\_ratis\_log\_appender\_buffer\_size\_max + +| Name | schema\_region\_ratis\_log\_appender\_buffer\_size\_max | +|:------:|:-------------------------------------------------| +| Description | schema region max payload size for a single log-sync-RPC from leader to follower | +| Type | int32 | +| Default | 4MB | +| Effective | After restarting system | + +* data\_region\_ratis\_log\_appender\_buffer\_size\_max + +| Name | data\_region\_ratis\_log\_appender\_buffer\_size\_max | +|:------:|:-----------------------------------------------| +| Description | data region max payload size for a single log-sync-RPC from leader to follower | +| Type | int32 | +| Default | 4MB | +| Effective | After restarting system | + +* config\_node\_ratis\_snapshot\_trigger\_threshold + +| Name | config\_node\_ratis\_snapshot\_trigger\_threshold | +|:------:|:---------------------------------------------| +| Description | confignode trigger a snapshot when snapshot_trigger_threshold logs are written | +| Type | int32 | +| Default | 400,000 | +| Effective | After restarting system | + +* schema\_region\_ratis\_snapshot\_trigger\_threshold + +| Name | schema\_region\_ratis\_snapshot\_trigger\_threshold | +|:------:|:-----------------------------------------------| +| Description | schema region trigger a snapshot when snapshot_trigger_threshold logs are written | +| Type | int32 | +| Default | 400,000 | +| Effective | After restarting system | + +* data\_region\_ratis\_snapshot\_trigger\_threshold + +| Name | data\_region\_ratis\_snapshot\_trigger\_threshold | +|:------:|:---------------------------------------------| +| Description | data region trigger a snapshot when snapshot_trigger_threshold logs are written | +| Type | int32 | +| Default | 400,000 | +| Effective | 
After restarting system | + +* config\_node\_ratis\_log\_unsafe\_flush\_enable + +| Name | config\_node\_ratis\_log\_unsafe\_flush\_enable | +|:------:|:---------------------------------------------------| +| Description | confignode allows flushing Raft Log asynchronously | +| Type | boolean | +| Default | false | +| Effective | After restarting system | + +* schema\_region\_ratis\_log\_unsafe\_flush\_enable + +| Name | schema\_region\_ratis\_log\_unsafe\_flush\_enable | +|:------:|:------------------------------------------------------| +| Description | schema region allows flushing Raft Log asynchronously | +| Type | boolean | +| Default | false | +| Effective | After restarting system | + +* data\_region\_ratis\_log\_unsafe\_flush\_enable + +| Name | data\_region\_ratis\_log\_unsafe\_flush\_enable | +|:------:|:----------------------------------------------------| +| Description | data region allows flushing Raft Log asynchronously | +| Type | boolean | +| Default | false | +| Effective | After restarting system | + +* config\_node\_ratis\_log\_segment\_size\_max\_in\_byte + +| Name | config\_node\_ratis\_log\_segment\_size\_max\_in\_byte | +|:------:|:-----------------------------------------------| +| Description | confignode max capacity of a single Log segment file | +| Type | int32 | +| Default | 24MB | +| Effective | After restarting system | + +* schema\_region\_ratis\_log\_segment\_size\_max\_in\_byte + +| Name | schema\_region\_ratis\_log\_segment\_size\_max\_in\_byte | +|:------:|:-------------------------------------------------| +| Description | schema region max capacity of a single Log segment file | +| Type | int32 | +| Default | 24MB | +| Effective | After restarting system | + +* data\_region\_ratis\_log\_segment\_size\_max\_in\_byte + +| Name | data\_region\_ratis\_log\_segment\_size\_max\_in\_byte | +|:------:|:-----------------------------------------------| +| Description | data region max capacity of a single Log segment file | +| Type | int32 | +| Default | 24MB | +| Effective | After restarting system | + +* config\_node\_ratis\_grpc\_flow\_control\_window + +| Name | config\_node\_ratis\_grpc\_flow\_control\_window | +|:------:|:-----------------------------------------------------------------------------| +| Description | confignode flow control window for ratis grpc log appender | +| Type | int32 | +| Default | 4MB | +| Effective | After restarting system | + +* schema\_region\_ratis\_grpc\_flow\_control\_window + +| Name | schema\_region\_ratis\_grpc\_flow\_control\_window | +|:------:|:---------------------------------------------| +| Description | schema region flow control window for ratis grpc log appender | +| Type | int32 | +| Default | 4MB | +| Effective | After restarting system | + +* data\_region\_ratis\_grpc\_flow\_control\_window + +| Name | data\_region\_ratis\_grpc\_flow\_control\_window | +|:------:|:-------------------------------------------| +| Description | data region flow control window for ratis grpc log appender | +| Type | int32 | +| Default | 4MB | +| Effective | After restarting system | + +* config\_node\_ratis\_grpc\_leader\_outstanding\_appends\_max + +| Name | config\_node\_ratis\_grpc\_leader\_outstanding\_appends\_max | +| :---------: | :----------------------------------------------------- | +| Description | config node grpc pipeline concurrency threshold | +| Type | int32 | +| Default | 128 | +| Effective | After restarting system | + +* schema\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max + +| Name | 
schema\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max | +| :---------: | :------------------------------------------------------ | +| Description | schema region grpc pipeline concurrency threshold | +| Type | int32 | +| Default | 128 | +| Effective | After restarting system | + +* data\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max + +| Name | data\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max | +| :---------: | :---------------------------------------------------- | +| Description | data region grpc pipeline concurrency threshold | +| Type | int32 | +| Default | 128 | +| Effective | After restarting system | + +* config\_node\_ratis\_log\_force\_sync\_num + +| Name | config\_node\_ratis\_log\_force\_sync\_num | +| :---------: | :------------------------------------ | +| Description | config node fsync threshold | +| Type | int32 | +| Default | 128 | +| Effective | After restarting system | + +* schema\_region\_ratis\_log\_force\_sync\_num + +| Name | schema\_region\_ratis\_log\_force\_sync\_num | +| :---------: | :-------------------------------------- | +| Description | schema region fsync threshold | +| Type | int32 | +| Default | 128 | +| Effective | After restarting system | + +* data\_region\_ratis\_log\_force\_sync\_num + +| Name | data\_region\_ratis\_log\_force\_sync\_num | +| :---------: | :----------------------------------- | +| Description | data region fsync threshold | +| Type | int32 | +| Default | 128 | +| Effective | After restarting system | + +* config\_node\_ratis\_rpc\_leader\_election\_timeout\_min\_ms + +| Name | config\_node\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | +|:------:|:-----------------------------------------------------| +| Description | confignode min election timeout for leader election | +| Type | int32 | +| Default | 2000ms | +| Effective | After restarting system | + +* schema\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms + +| Name | schema\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | +|:------:|:-------------------------------------------------------| +| Description | schema region min election timeout for leader election | +| Type | int32 | +| Default | 2000ms | +| Effective | After restarting system | + +* data\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms + +| Name | data\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | +|:------:|:-----------------------------------------------------| +| Description | data region min election timeout for leader election | +| Type | int32 | +| Default | 2000ms | +| Effective | After restarting system | + +* config\_node\_ratis\_rpc\_leader\_election\_timeout\_max\_ms + +| Name | config\_node\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | +|:------:|:-----------------------------------------------------| +| Description | confignode max election timeout for leader election | +| Type | int32 | +| Default | 2000ms | +| Effective | After restarting system | + +* schema\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms + +| Name | schema\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | +|:------:|:-------------------------------------------------------| +| Description | schema region max election timeout for leader election | +| Type | int32 | +| Default | 2000ms | +| Effective | After restarting system | + +* data\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms + +| Name | data\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | +|:------:|:-----------------------------------------------------| +| 
Description | data region max election timeout for leader election | +| Type | int32 | +| Default | 2000ms | +| Effective | After restarting system | + +* config\_node\_ratis\_request\_timeout\_ms + +| Name | config\_node\_ratis\_request\_timeout\_ms | +|:------:|:-------------------------------------| +| Description | confignode ratis client retry threshold | +| Type | int32 | +| Default | 10s | +| Effective | After restarting system | + +* schema\_region\_ratis\_request\_timeout\_ms + +| Name | schema\_region\_ratis\_request\_timeout\_ms | +|:------:|:---------------------------------------| +| Description | schema region ratis client retry threshold | +| Type | int32 | +| Default | 10s | +| Effective | After restarting system | + +* data\_region\_ratis\_request\_timeout\_ms + +| Name | data\_region\_ratis\_request\_timeout\_ms | +|:------:|:-------------------------------------| +| Description | data region ratis client retry threshold | +| Type | int32 | +| Default | 10s | +| Effective | After restarting system | + +* config\_node\_ratis\_max\_retry\_attempts + +| Name | config\_node\_ratis\_max\_retry\_attempts | +|:------:|:-------------------------------------------| +| Description | confignode ratis client max retry attempts | +| Type | int32 | +| Default | 10 | +| Effective | After restarting system | + +* config\_node\_ratis\_initial\_sleep\_time\_ms + +| Name | config\_node\_ratis\_initial\_sleep\_time\_ms | +|:------:|:-------------------------------------------------| +| Description | confignode ratis client retry initial sleep time | +| Type | int32 | +| Default | 100ms | +| Effective | After restarting system | + +* config\_node\_ratis\_max\_sleep\_time\_ms + +| Name | config\_node\_ratis\_max\_sleep\_time\_ms | +|:------:|:---------------------------------------------| +| Description | confignode ratis client retry max sleep time | +| Type | int32 | +| Default | 10s | +| Effective | After restarting system | + +* schema\_region\_ratis\_max\_retry\_attempts + +| Name | schema\_region\_ratis\_max\_retry\_attempts | +|:------:|:---------------------------------------| +| Description | schema region ratis client max retry attempts | +| Type | int32 | +| Default | 10 | +| Effective | After restarting system | + +* schema\_region\_ratis\_initial\_sleep\_time\_ms + +| Name | schema\_region\_ratis\_initial\_sleep\_time\_ms | +|:------:|:------------------------------------------| +| Description | schema region ratis client retry initial sleep time | +| Type | int32 | +| Default | 100ms | +| Effective | After restarting system | + +* schema\_region\_ratis\_max\_sleep\_time\_ms + +| Name | schema\_region\_ratis\_max\_sleep\_time\_ms | +|:------:|:--------------------------------------| +| Description | schema region ratis client retry max sleep time | +| Type | int32 | +| Default | 10s | +| Effective | After restarting system | + +* data\_region\_ratis\_max\_retry\_attempts + +| Name | data\_region\_ratis\_max\_retry\_attempts | +|:------:|:-------------------------------------| +| Description | data region ratis client max retry attempts | +| Type | int32 | +| Default | 10 | +| Effective | After restarting system | + +* data\_region\_ratis\_initial\_sleep\_time\_ms + +| Name | data\_region\_ratis\_initial\_sleep\_time\_ms | +|:------:|:----------------------------------------| +| Description | data region ratis client retry initial sleep time | +| Type | int32 | +| Default | 100ms | +| Effective | After restarting system | + +* data\_region\_ratis\_max\_sleep\_time\_ms + +| Name | 
data\_region\_ratis\_max\_sleep\_time\_ms |
+|:------:|:------------------------------------|
+| Description | data region ratis client retry max sleep time |
+| Type | int32 |
+| Default | 10s |
+| Effective | After restarting system |
+
+* ratis\_first\_election\_timeout\_min\_ms
+
+| Name | ratis\_first\_election\_timeout\_min\_ms |
+|:------:|:----------------------------------------------------------------|
+| Description | minimal first election timeout for RatisConsensus |
+| Type | int64 |
+| Default | 50 (ms) |
+| Effective | After restarting system |
+
+* ratis\_first\_election\_timeout\_max\_ms
+
+| Name | ratis\_first\_election\_timeout\_max\_ms |
+|:------:|:----------------------------------------------------------------|
+| Description | maximal first election timeout for RatisConsensus |
+| Type | int64 |
+| Default | 150 (ms) |
+| Effective | After restarting system |
+
+
+* config\_node\_ratis\_preserve\_logs\_num\_when\_purge
+
+| Name | config\_node\_ratis\_preserve\_logs\_num\_when\_purge |
+|:------:|:---------------------------------------------------------------|
+| Description | number of logs the confignode preserves when taking a snapshot and purging |
+| Type | int32 |
+| Default | 1000 |
+| Effective | After restarting system |
+
+* schema\_region\_ratis\_preserve\_logs\_num\_when\_purge
+
+| Name | schema\_region\_ratis\_preserve\_logs\_num\_when\_purge |
+|:------:|:------------------------------------------------------------------|
+| Description | number of logs the schema region preserves when taking a snapshot and purging |
+| Type | int32 |
+| Default | 1000 |
+| Effective | After restarting system |
+
+* data\_region\_ratis\_preserve\_logs\_num\_when\_purge
+
+| Name | data\_region\_ratis\_preserve\_logs\_num\_when\_purge |
+|:------:|:----------------------------------------------------------------|
+| Description | number of logs the data region preserves when taking a snapshot and purging |
+| Type | int32 |
+| Default | 1000 |
+| Effective | After restarting system |
+
+* config\_node\_ratis\_log\_max\_size
+
+| Name | config\_node\_ratis\_log\_max\_size |
+|:------:|:----------------------------------------------------------------|
+| Description | Max file size of in-disk Raft Log for config node |
+| Type | int64 |
+| Default | 2147483648 (2GB) |
+| Effective | After restarting system |
+
+* schema\_region\_ratis\_log\_max\_size
+
+| Name | schema\_region\_ratis\_log\_max\_size |
+|:------:|:----------------------------------------------------------------|
+| Description | Max file size of in-disk Raft Log for schema region |
+| Type | int64 |
+| Default | 2147483648 (2GB) |
+| Effective | After restarting system |
+
+* data\_region\_ratis\_log\_max\_size
+
+| Name | data\_region\_ratis\_log\_max\_size |
+|:------:|:----------------------------------------------------------------|
+| Description | Max file size of in-disk Raft Log for data region |
+| Type | int64 |
+| Default | 21474836480 (20GB) |
+| Effective | After restarting system |
+
+* config\_node\_ratis\_periodic\_snapshot\_interval
+
+| Name | config\_node\_ratis\_periodic\_snapshot\_interval |
+|:------:|:----------------------------------------------------------------|
+| Description | duration interval of config-node periodic snapshot |
+| Type | int64 |
+| Default | 86400 (seconds) |
+| Effective | After restarting system |
+
+* schema\_region\_ratis\_periodic\_snapshot\_interval
+
+| Name | schema\_region\_ratis\_periodic\_snapshot\_interval |
+|:------:|:----------------------------------------------------------------|
+| Description | duration interval of schema-region periodic snapshot |
+| Type | int64 |
+| Default | 86400 (seconds) |
+| Effective | After restarting system |
+
+* data\_region\_ratis\_periodic\_snapshot\_interval
+
+| Name | data\_region\_ratis\_periodic\_snapshot\_interval |
+|:------:|:----------------------------------------------------------------|
+| Description | duration interval of data-region periodic snapshot |
+| Type | int64 |
+| Default | 86400 (seconds) |
+| Effective | After restarting system |
+
+### Procedure Configuration
+
+* procedure\_core\_worker\_thread\_count
+
+| Name | procedure\_core\_worker\_thread\_count |
+| :---------: | :--------------------------------- |
+| Description | The number of procedure worker threads |
+| Type | int32 |
+| Default | 4 |
+| Effective | After restarting system |
+
+* procedure\_completed\_clean\_interval
+
+| Name | procedure\_completed\_clean\_interval |
+| :---------: | :--------------------------------------------------------- |
+| Description | Time interval at which completed procedures are cleaned up |
+| Type | int32 |
+| Unit | second |
+| Default | 30 |
+| Effective | After restarting system |
+
+* procedure\_completed\_evict\_ttl
+
+| Name | procedure\_completed\_evict\_ttl |
+| :---------: | :------------------------------ |
+| Description | The TTL of completed procedures |
+| Type | int32 |
+| Unit | second |
+| Default | 800 |
+| Effective | After restarting system |
+
+### MQTT Broker Configuration
+
+* enable\_mqtt\_service
+
+| Name | enable\_mqtt\_service |
+|:-----------:|:------------------------------------|
+| Description | Whether to enable the MQTT service |
+| Type | Boolean |
+| Default | False |
+| Effective | hot-load |
+
+* mqtt\_host
+
+| Name | mqtt\_host |
+|:-----------:|:---------------------------------------------|
+| Description | The host to which the MQTT service is bound |
+| Type | String |
+| Default | 0.0.0.0 |
+| Effective | hot-load |
+
+* mqtt\_port
+
+| Name | mqtt\_port |
+|:-----------:|:--------------------------------------------|
+| Description | The port to which the MQTT service is bound |
+| Type | int32 |
+| Default | 1883 |
+| Effective | hot-load |
+
+* mqtt\_handler\_pool\_size
+
+|Name| mqtt\_handler\_pool\_size |
+|:---:|:------------------------------------------------------------|
+|Description| The size of the handler pool used to process MQTT messages |
+|Type| int32 |
+|Default| 1 |
+|Effective| hot-load |
+
+* mqtt\_payload\_formatter
+
+| Name | mqtt\_payload\_formatter |
+|:-----------:|:-------------------------------|
+| Description | MQTT message payload formatter |
+| Type | String |
+| Default | JSON |
+| Effective | hot-load |
+
+* mqtt\_max\_message\_size
+
+| Name | mqtt\_max\_message\_size |
+|:------:|:---------------------------------------------|
+| Description | Maximum length of an MQTT message in bytes |
+| Type | int32 |
+| Default | 1048576 |
+| Effective | hot-load |
+
+
+
+
+#### TsFile Active Listening & Loading Function Configuration
+
+* load\_active\_listening\_enable
+
+|Name| load\_active\_listening\_enable |
+|:---:|:---|
+|Description| Whether to enable the DataNode's active listening and loading of tsfile functionality (default is enabled). 
| +|Type| Boolean | +|Default| true | +|Effective| hot-load | + +* load\_active\_listening\_dirs + +|Name| load\_active\_listening\_dirs | +|:---:|:---| +|Description| The directories to be listened to (automatically includes subdirectories of the directory), if there are multiple, separate with “,”. The default directory is ext/load/pending (supports hot loading). | +|Type| String | +|Default| ext/load/pending | +|Effective|hot-load| + +* load\_active\_listening\_fail\_dir + +|Name| load\_active\_listening\_fail\_dir | +|:---:|:---| +|Description| The directory to which files are transferred after the execution of loading tsfile files fails, only one directory can be configured. | +|Type| String | +|Default| ext/load/failed | +|Effective|hot-load| + +* load\_active\_listening\_max\_thread\_num + +|Name| load\_active\_listening\_max\_thread\_num | +|:---:|:---| +|Description| The maximum number of threads to perform loading tsfile tasks simultaneously. The default value when the parameter is commented out is max(1, CPU core count / 2). When the user sets a value not in the range [1, CPU core count / 2], it will be set to the default value (1, CPU core count / 2). | +|Type| Long | +|Default| max(1, CPU core count / 2) | +|Effective|Effective after restart| + + +* load\_active\_listening\_check\_interval\_seconds + +|Name| load\_active\_listening\_check\_interval\_seconds | +|:---:|:---| +|Description| Active listening polling interval in seconds. The function of actively listening to tsfile is achieved by polling the folder. This configuration specifies the time interval between two checks of load_active_listening_dirs, and the next check will be executed after load_active_listening_check_interval_seconds seconds of each check. When the user sets the polling interval to less than 1, it will be set to the default value of 5 seconds. | +|Type| Long | +|Default| 5| +|Effective|Effective after restart| \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Reference/ConfigNode-Config-Manual.md b/src/UserGuide/V2.0.1/Tree/Reference/ConfigNode-Config-Manual.md new file mode 100644 index 00000000..80c2cbaf --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Reference/ConfigNode-Config-Manual.md @@ -0,0 +1,223 @@ + + +# ConfigNode Configuration + +IoTDB ConfigNode files are under `conf`. + +* `confignode-env.sh/bat`:Environment configurations, in which we could set the memory allocation of ConfigNode. + +* `iotdb-system.properties`:IoTDB system configurations. + +## Environment Configuration File(confignode-env.sh/bat) + +The environment configuration file is mainly used to configure the Java environment related parameters when ConfigNode is running, such as JVM related configuration. This part of the configuration is passed to the JVM when the ConfigNode starts. 
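+
+For reference, these variables are plain shell assignments in `conf/confignode-env.sh`. A minimal sketch, assuming you want to cap the ConfigNode at 2 GB (illustrative value only; the variables are described in the tables below):
+
+```bash
+# conf/confignode-env.sh -- illustrative override; adjust to your hardware
+# ON_HEAP_MEMORY and OFF_HEAP_MEMORY are calculated from MEMORY_SIZE unless set explicitly
+MEMORY_SIZE="2G"
+```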
+ +The details of each parameter are as follows: + +* MEMORY\_SIZE + +|Name|MEMORY\_SIZE| +|:---:|:---| +|Description|The memory size that IoTDB ConfigNode will use when startup | +|Type|String| +|Default|The default is three-tenths of the memory, with a maximum of 16G.| +|Effective|After restarting system| + +* ON\_HEAP\_MEMORY + +|Name|ON\_HEAP\_MEMORY| +|:---:|:---| +|Description|The heap memory size that IoTDB ConfigNode can use, Former Name: MAX\_HEAP\_SIZE | +|Type|String| +|Default| Calculate based on MEMORY\_SIZE.| +|Effective|After restarting system| + +* OFF\_HEAP\_MEMORY + +|Name|OFF\_HEAP\_MEMORY| +|:---:|:---| +|Description|The direct memory that IoTDB ConfigNode can use, Former Name: MAX\_DIRECT\_MEMORY\_SIZE | +|Type|String| +|Default| Calculate based on MEMORY\_SIZE.| +|Effective|After restarting system| + + +## ConfigNode Configuration File (iotdb-system.properties) + +The global configuration of cluster is in ConfigNode. + +### Config Node RPC Configuration + +* cn\_internal\_address + +| Name | cn\_internal\_address | +|:-----------:|:------------------------------------| +| Description | ConfigNode internal service address | +| Type | String | +| Default | 127.0.0.1 | +| Effective | Only allowed to be modified in first start up | + +* cn\_internal\_port + +|Name| cn\_internal\_port | +|:---:|:---| +|Description| ConfigNode internal service port| +|Type| Short Int : [0,65535] | +|Default| 10710 | +|Effective|Only allowed to be modified in first start up| + +### Consensus + +* cn\_consensus\_port + +|Name| cn\_consensus\_port | +|:---:|:---| +|Description| ConfigNode data Consensus Port | +|Type| Short Int : [0,65535] | +|Default| 10720 | +|Effective|Only allowed to be modified in first start up| + +### SeedConfigNode + +* cn\_seed\_config\_node + +|Name| cn\_seed\_config\_node | +|:---:|:----------------------------------------------------------------------| +|Description| Seed ConfigNode's address for current ConfigNode to join the cluster. 
This parameter corresponds to cn\_target\_config\_node\_list before V1.2.2 |
+|Type| String |
+|Default| 127.0.0.1:10710 |
+|Effective| Only allowed to be modified in first start up |
+
+### Directory configuration
+
+* cn\_system\_dir
+
+|Name| cn\_system\_dir |
+|:---:|:---|
+|Description| ConfigNode system data dir |
+|Type| String |
+|Default| data/system(Windows:data\\system) |
+|Effective|After restarting system|
+
+* cn\_consensus\_dir
+
+|Name| cn\_consensus\_dir |
+|:---:|:---------------------------------------------------------------|
+|Description| ConfigNode Consensus protocol data dir |
+|Type| String |
+|Default| data/confignode/consensus(Windows:data\\confignode\\consensus) |
+|Effective| After restarting system |
+
+### Thrift RPC configuration
+
+* cn\_rpc\_thrift\_compression\_enable
+
+|Name| cn\_rpc\_thrift\_compression\_enable |
+|:---:|:---|
+|Description| Whether to enable thrift's compression (using GZIP).|
+|Type|Boolean|
+|Default| false |
+|Effective|After restarting system|
+
+* cn\_rpc\_advanced\_compression\_enable
+
+|Name| cn\_rpc\_advanced\_compression\_enable |
+|:---:|:---|
+|Description| Whether to enable thrift's advanced compression.|
+|Type|Boolean|
+|Default| false |
+|Effective|After restarting system|
+
+* cn\_rpc\_max\_concurrent\_client\_num
+
+|Name| cn\_rpc\_max\_concurrent\_client\_num |
+|:---:|:---|
+|Description| Max concurrent rpc connections|
+|Type| Short Int : [0,65535] |
+|Default| 65535 |
+|Effective|After restarting system|
+
+* cn\_thrift\_max\_frame\_size
+
+|Name| cn\_thrift\_max\_frame\_size |
+|:---:|:---|
+|Description| Max size of bytes of each thrift RPC request/response|
+|Type| Long |
+|Unit|Byte|
+|Default| 536870912 |
+|Effective|After restarting system|
+
+* cn\_thrift\_init\_buffer\_size
+
+|Name| cn\_thrift\_init\_buffer\_size |
+|:---:|:---|
+|Description| Initial buffer size (in bytes) used by thrift |
+|Type| long |
+|Default| 1024 |
+|Effective|After restarting system|
+
+* cn\_connection\_timeout\_ms
+
+| Name | cn\_connection\_timeout\_ms |
+|:-----------:|:-------------------------------------------------------|
+| Description | Thrift socket and connection timeout between nodes |
+| Type | int |
+| Default | 60000 |
+| Effective | After restarting system |
+
+* cn\_selector\_thread\_nums\_of\_client\_manager
+
+| Name | cn\_selector\_thread\_nums\_of\_client\_manager |
+|:-----------:|:---------------------------------------------------------------------------------------|
+| Description | Number of selector threads (TAsyncClientManager) for async clients in a ClientManager |
+| Type | int |
+| Default | 1 |
+| Effective | After restarting system |
+
+* cn\_core\_client\_count\_for\_each\_node\_in\_client\_manager
+
+| Name | cn\_core\_client\_count\_for\_each\_node\_in\_client\_manager |
+|:------------:|:---------------------------------------------------------------|
+| Description | Number of core clients routed to each node in a ClientManager |
+| Type | int |
+| Default | 200 |
+| Effective | After restarting system |
+
+* cn\_max\_client\_count\_for\_each\_node\_in\_client\_manager
+
+| Name | cn\_max\_client\_count\_for\_each\_node\_in\_client\_manager |
+|:--------------:|:-------------------------------------------------------------|
+| Description | Number of max 
clients routed to each node in a ClientManager | +| Type | int | +| Default | 300 | +| Effective | After restarting system | + +### Metric Configuration diff --git a/src/UserGuide/V2.0.1/Tree/Reference/DataNode-Config-Manual.md b/src/UserGuide/V2.0.1/Tree/Reference/DataNode-Config-Manual.md new file mode 100644 index 00000000..94ede501 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Reference/DataNode-Config-Manual.md @@ -0,0 +1,584 @@ + + +# DataNode Configuration Parameters + +We use the same configuration files for IoTDB DataNode and Standalone version, all under the `conf`. + +* `datanode-env.sh/bat`:Environment configurations, in which we could set the memory allocation of DataNode and Standalone. + +* `iotdb-system.properties`:IoTDB system configurations. + +## Hot Modification Configuration + +For the convenience of users, IoTDB provides users with hot modification function, that is, modifying some configuration parameters in `iotdb-system.properties` during the system operation and applying them to the system immediately. +In the parameters described below, these parameters whose way of `Effective` is `hot-load` support hot modification. + +Trigger way: The client sends the command(sql) `load configuration` or `set configuration` to the IoTDB server. + +## Environment Configuration File(datanode-env.sh/bat) + +The environment configuration file is mainly used to configure the Java environment related parameters when DataNode is running, such as JVM related configuration. This part of the configuration is passed to the JVM when the DataNode starts. + +The details of each parameter are as follows: + +* MEMORY\_SIZE + +|Name|MEMORY\_SIZE| +|:---:|:---| +|Description|The minimum heap memory size that IoTDB DataNode will use when startup | +|Type|String| +|Default| The default is a half of the memory.| +|Effective|After restarting system| + +* ON\_HEAP\_MEMORY + +|Name|ON\_HEAP\_MEMORY| +|:---:|:---| +|Description|The heap memory size that IoTDB DataNode can use, Former Name: MAX\_HEAP\_SIZE | +|Type|String| +|Default| Calculate based on MEMORY\_SIZE.| +|Effective|After restarting system| + +* OFF\_HEAP\_MEMORY + +|Name|OFF\_HEAP\_MEMORY| +|:---:|:---| +|Description|The direct memory that IoTDB DataNode can use, Former Name: MAX\_DIRECT\_MEMORY\_SIZE| +|Type|String| +|Default| Calculate based on MEMORY\_SIZE.| +|Effective|After restarting system| + +* JMX\_LOCAL + +|Name|JMX\_LOCAL| +|:---:|:---| +|Description|JMX monitoring mode, configured as yes to allow only local monitoring, no to allow remote monitoring| +|Type|Enum String: "true", "false"| +|Default|true| +|Effective|After restarting system| + +* JMX\_PORT + +|Name|JMX\_PORT| +|:---:|:---| +|Description|JMX listening port. Please confirm that the port is not a system reserved port and is not occupied| +|Type|Short Int: [0,65535]| +|Default|31999| +|Effective|After restarting system| + +* JMX\_IP + +|Name|JMX\_IP| +|:---:|:---| +|Description|JMX listening address. Only take effect if JMX\_LOCAL=false. 0.0.0.0 is never allowed| +|Type|String| +|Default|127.0.0.1| +|Effective|After restarting system| + +## JMX Authorization + +We **STRONGLY RECOMMENDED** you CHANGE the PASSWORD for the JMX remote connection. + +The user and passwords are in ${IOTDB\_CONF}/conf/jmx.password. + +The permission definitions are in ${IOTDB\_CONF}/conf/jmx.access. 
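+
+Both files typically follow the standard JDK remote-JMX file format: `jmx.password` holds one `user password` pair per line, and `jmx.access` grants each user `readonly` or `readwrite`. A minimal sketch with a hypothetical user (not a shipped default):
+
+```
+# conf/jmx.password -- "user password", one entry per line (hypothetical user)
+monitor  MyStr0ngPassw0rd
+
+# conf/jmx.access -- "user readonly|readwrite"
+monitor  readwrite
+```
+
+Remember to restrict the permissions of `jmx.password` (e.g. `chmod 600`), otherwise the JVM may refuse to start the JMX service.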
+ +## DataNode/Standalone Configuration File (iotdb-system.properties) + +### Data Node RPC Configuration + +* dn\_rpc\_address + +|Name| dn\_rpc\_address | +|:---:|:-----------------------------------------------| +|Description| The client rpc service listens on the address. | +|Type| String | +|Default| 0.0.0.0 | +|Effective| After restarting system | + +* dn\_rpc\_port + +|Name| dn\_rpc\_port | +|:---:|:---| +|Description| The client rpc service listens on the port.| +|Type|Short Int : [0,65535]| +|Default| 6667 | +|Effective|After restarting system| + +* dn\_internal\_address + +|Name| dn\_internal\_address | +|:---:|:---| +|Description| DataNode internal service host/IP | +|Type| string | +|Default| 127.0.0.1 | +|Effective|Only allowed to be modified in first start up| + +* dn\_internal\_port + +|Name| dn\_internal\_port | +|:---:|:-------------------------------| +|Description| DataNode internal service port | +|Type| int | +|Default| 10730 | +|Effective| Only allowed to be modified in first start up | + +* dn\_mpp\_data\_exchange\_port + +|Name| mpp\_data\_exchange\_port | +|:---:|:---| +|Description| MPP data exchange port | +|Type| int | +|Default| 10740 | +|Effective|Only allowed to be modified in first start up| + +* dn\_schema\_region\_consensus\_port + +|Name| dn\_schema\_region\_consensus\_port | +|:---:|:---| +|Description| DataNode Schema replica communication port for consensus | +|Type| int | +|Default| 10750 | +|Effective|Only allowed to be modified in first start up| + +* dn\_data\_region\_consensus\_port + +|Name| dn\_data\_region\_consensus\_port | +|:---:|:---| +|Description| DataNode Data replica communication port for consensus | +|Type| int | +|Default| 10760 | +|Effective|Only allowed to be modified in first start up| + +* dn\_join\_cluster\_retry\_interval\_ms + +|Name| dn\_join\_cluster\_retry\_interval\_ms | +|:---:|:--------------------------------------------------------------------------| +|Description| The time of data node waiting for the next retry to join into the cluster | +|Type| long | +|Default| 5000 | +|Effective| After restarting system | + +### SSL Configuration + +* enable\_thrift\_ssl + +|Name| enable\_thrift\_ssl | +|:---:|:---------------------------| +|Description|When enable\_thrift\_ssl is configured as true, SSL encryption will be used for communication through dn\_rpc\_port | +|Type| Boolean | +|Default| false | +|Effective| After restarting system | + +* enable\_https + +|Name| enable\_https | +|:---:|:-------------------------| +|Description| REST Service Specifies whether to enable SSL configuration | +|Type| Boolean | +|Default| false | +|Effective| After restarting system | + +* key\_store\_path + +|Name| key\_store\_path | +|:---:|:-----------------| +|Description| SSL certificate path | +|Type| String | +|Default| "" | +|Effective| After restarting system | + +* key\_store\_pwd + +|Name| key\_store\_pwd | +|:---:|:----------------| +|Description| SSL certificate password | +|Type| String | +|Default| "" | +|Effective| After restarting system | + +### SeedConfigNode + +* dn\_seed\_config\_node + +|Name| dn\_seed\_config\_node | +|:---:|:------------------------------------------------| +|Description| ConfigNode Address for DataNode to join cluster. 
This parameter corresponds to dn\_target\_config\_node\_list before V1.2.2 |
+|Type| String |
+|Default| 127.0.0.1:10710 |
+|Effective| Only allowed to be modified in first start up |
+
+### Connection Configuration
+
+* dn\_rpc\_thrift\_compression\_enable
+
+|Name| dn\_rpc\_thrift\_compression\_enable |
+|:---:|:---|
+|Description| Whether to enable thrift's compression (using GZIP).|
+|Type|Boolean|
+|Default| false |
+|Effective|After restarting system|
+
+* dn\_rpc\_advanced\_compression\_enable
+
+|Name| dn\_rpc\_advanced\_compression\_enable |
+|:---:|:---|
+|Description| Whether to enable thrift's advanced compression.|
+|Type|Boolean|
+|Default| false |
+|Effective|After restarting system|
+
+* dn\_rpc\_selector\_thread\_count
+
+|Name| dn\_rpc\_selector\_thread\_count |
+|:---:|:-----------------------------------|
+|Description| The number of rpc selector threads |
+|Type| int |
+|Default| 1 |
+|Effective| After restarting system |
+
+* dn\_rpc\_min\_concurrent\_client\_num
+
+|Name| dn\_rpc\_min\_concurrent\_client\_num |
+|:---:|:-----------------------------------|
+|Description| Minimum concurrent rpc connections |
+|Type| Short Int : [0,65535] |
+|Default| 1 |
+|Effective| After restarting system |
+
+* dn\_rpc\_max\_concurrent\_client\_num
+
+|Name| dn\_rpc\_max\_concurrent\_client\_num |
+|:---:|:---|
+|Description| Max concurrent rpc connections|
+|Type| Short Int : [0,65535] |
+|Default| 65535 |
+|Effective|After restarting system|
+
+* dn\_thrift\_max\_frame\_size
+
+|Name| dn\_thrift\_max\_frame\_size |
+|:---:|:---|
+|Description| Max size of bytes of each thrift RPC request/response|
+|Type| Long |
+|Unit|Byte|
+|Default| 536870912 |
+|Effective|After restarting system|
+
+* dn\_thrift\_init\_buffer\_size
+
+|Name| dn\_thrift\_init\_buffer\_size |
+|:---:|:---|
+|Description| Initial buffer size (in bytes) used by thrift |
+|Type| long |
+|Default| 1024 |
+|Effective|After restarting system|
+
+* dn\_connection\_timeout\_ms
+
+| Name | dn\_connection\_timeout\_ms |
+|:-----------:|:---------------------------------------------------|
+| Description | Thrift socket and connection timeout between nodes |
+| Type | int |
+| Default | 60000 |
+| Effective | After restarting system |
+
+* dn\_core\_client\_count\_for\_each\_node\_in\_client\_manager
+
+| Name | dn\_core\_client\_count\_for\_each\_node\_in\_client\_manager |
+|:------------:|:--------------------------------------------------------------|
+| Description | Number of core clients routed to each node in a ClientManager |
+| Type | int |
+| Default | 200 |
+| Effective | After restarting system |
+
+* dn\_max\_client\_count\_for\_each\_node\_in\_client\_manager
+
+| Name | dn\_max\_client\_count\_for\_each\_node\_in\_client\_manager |
+|:--------------:|:-------------------------------------------------------------|
+| Description | Number of max clients routed to each node in a ClientManager |
+| Type | int |
+| Default | 300 |
+| Effective | After restarting system |
+
+### Directory Configuration
+
+* dn\_system\_dir
+
+| Name | dn\_system\_dir |
+|:-----------:|:----------------------------------------------------------------------------|
+| Description | The directories of system files. It is recommended to use an absolute path. 
| +| Type | String | +| Default | data/datanode/system (Windows: data\\datanode\\system) | +| Effective | After restarting system | + +* dn\_data\_dirs + +| Name | dn\_data\_dirs | +|:-----------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | The directories of data files. Multiple directories are separated by comma. The starting directory of the relative path is related to the operating system. It is recommended to use an absolute path. If the path does not exist, the system will automatically create it. | +| Type | String[] | +| Default | data/datanode/data (Windows: data\\datanode\\data) | +| Effective | After restarting system | + +* dn\_multi\_dir\_strategy + +| Name | dn\_multi\_dir\_strategy | +|:-----------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Description | IoTDB's strategy for selecting directories for TsFile in tsfile_dir. You can use a simple class name or a full name of the class. The system provides the following three strategies:
1. SequenceStrategy: IoTDB selects the directory from tsfile\_dir in order, traverses all the directories in tsfile\_dir in turn, and keeps counting;
2. MaxDiskUsableSpaceFirstStrategy: IoTDB first selects the directory with the largest free disk space in tsfile\_dir;
You can implement a user-defined strategy as follows:
1. Inherit the org.apache.iotdb.db.storageengine.rescon.disk.strategy.DirectoryStrategy class and implement your own strategy method;
2. Set this configuration item to the full class name of the implemented class (package name plus class name, e.g. UserDefineStrategyPackage);
3. Add the jar file to the project. | +| Type | String | +| Default | SequenceStrategy | +| Effective | hot-load | + +* dn\_consensus\_dir + +| Name | dn\_consensus\_dir | +|:-----------:|:-------------------------------------------------------------------------------| +| Description | The directories of consensus files. It is recommended to use an absolute path. | +| Type | String | +| Default | data/datanode/consensus | +| Effective | After restarting system | + +* dn\_wal\_dirs + +| Name | dn\_wal\_dirs | +|:-----------:|:-------------------------------------------------------------------------| +| Description | Write Ahead Log storage path. It is recommended to use an absolute path. | +| Type | String | +| Default | data/datanode/wal | +| Effective | After restarting system | + +* dn\_tracing\_dir + +| Name | dn\_tracing\_dir | +|:-----------:|:----------------------------------------------------------------------------| +| Description | The tracing root directory path. It is recommended to use an absolute path. | +| Type | String | +| Default | datanode/tracing | +| Effective | After restarting system | + +* dn\_sync\_dir + +| Name | dn\_sync\_dir | +|:-----------:|:--------------------------------------------------------------------------| +| Description | The directories of sync files. It is recommended to use an absolute path. | +| Type | String | +| Default | data/datanode/sync | +| Effective | After restarting system | + +### Metric Configuration + +## Enable GC log + +GC log is off by default. +For performance tuning, you may want to collect the GC info. + +To enable GC log, just add a parameter "printgc" when you start the DataNode. + +```bash +nohup sbin/start-datanode.sh printgc >/dev/null 2>&1 & +``` +Or +```cmd +sbin\start-datanode.bat printgc +``` + +GC log is stored at `IOTDB_HOME/logs/gc.log`. +There will be at most 10 gc.log.* files and each one can reach to 10MB. 
+ +### REST Service Configuration + +* enable\_rest\_service + +|Name| enable\_rest\_service | +|:---:|:--------------------------------------| +|Description| Whether to enable the Rest service | +|Type| Boolean | +|Default| false | +|Effective| After restarting system | + +* rest\_service\_port + +|Name| rest\_service\_port | +|:---:|:------------------| +|Description| The Rest service listens to the port number | +|Type| int32 | +|Default| 18080 | +|Effective| After restarting system | + +* enable\_swagger + +|Name| enable\_swagger | +|:---:|:-----------------------| +|Description| Whether to enable swagger to display rest interface information | +|Type| Boolean | +|Default| false | +|Effective| After restarting system | + +* rest\_query\_default\_row\_size\_limit + +|Name| rest\_query\_default\_row\_size\_limit | +|:---:|:------------------------------------------------------------------------------------------| +|Description| The maximum number of rows in a result set that can be returned by a query | +|Type| int32 | +|Default| 10000 | +|Effective| After restarting system | + +* cache\_expire + +|Name| cache\_expire | +|:---:|:--------------------------------------------------------| +|Description| Expiration time for caching customer login information | +|Type| int32 | +|Default| 28800 | +|Effective| After restarting system | + +* cache\_max\_num + +|Name| cache\_max\_num | +|:---:|:--------------| +|Description| The maximum number of users stored in the cache | +|Type| int32 | +|Default| 100 | +|Effective| After restarting system | + +* cache\_init\_num + +|Name| cache\_init\_num | +|:---:|:---------------| +|Description| Initial cache capacity | +|Type| int32 | +|Default| 10 | +|Effective| After restarting system | + + +* trust\_store\_path + +|Name| trust\_store\_path | +|:---:|:---------------| +|Description| keyStore Password (optional) | +|Type| String | +|Default| "" | +|Effective| After restarting system | + +* trust\_store\_pwd + +|Name| trust\_store\_pwd | +|:---:|:---------------------------------| +|Description| trustStore Password (Optional) | +|Type| String | +|Default| "" | +|Effective| After restarting system | + +* idle\_timeout + +|Name| idle\_timeout | +|:---:|:--------------| +|Description| SSL timeout duration, expressed in seconds | +|Type| int32 | +|Default| 5000 | +|Effective| After restarting system | + + +#### Storage engine configuration + + +* dn\_default\_space\_usage\_thresholds + +|Name| dn\_default\_space\_usage\_thresholds | +|:---:|:--------------| +|Description| Define the minimum remaining space ratio for each tier data catalogue; when the remaining space is less than this ratio, the data will be automatically migrated to the next tier; when the remaining storage space of the last tier falls below this threshold, the system will be set to READ_ONLY | +|Type| double | +|Default| 0.85 | +|Effective| hot-load | + +* remote\_tsfile\_cache\_dirs + +|Name| remote\_tsfile\_cache\_dirs | +|:---:|:--------------| +|Description| Cache directory stored locally in the cloud | +|Type| string | +|Default| data/datanode/data/cache | +|Effective| After restarting system | + +* remote\_tsfile\_cache\_page\_size\_in\_kb + +|Name| remote\_tsfile\_cache\_page\_size\_in\_kb | +|:---:|:--------------| +|Description| Block size of locally cached files stored in the cloud | +|Type| int | +|Default| 20480 | +|Effective| After restarting system | + +* remote\_tsfile\_cache\_max\_disk\_usage\_in\_mb + +|Name| remote\_tsfile\_cache\_max\_disk\_usage\_in\_mb | 
+|:---:|:--------------| +|Description| Maximum Disk Occupancy Size for Cloud Storage Local Cache | +|Type| long | +|Default| 51200 | +|Effective| After restarting system | + +* object\_storage\_type + +|Name| object\_storage\_type | +|:---:|:--------------| +|Description| Cloud Storage Type | +|Type| string | +|Default| AWS_S3 | +|Effective| After restarting system | + +* object\_storage\_bucket + +|Name| object\_storage\_bucket | +|:---:|:--------------| +|Description| Name of cloud storage bucket | +|Type| string | +|Default| iotdb_data | +|Effective| After restarting system | + +* object\_storage\_endpoiont + +|Name| object\_storage\_endpoiont | +|:---:|:--------------| +|Description| endpoint of cloud storage | +|Type| string | +|Default| None | +|Effective| After restarting system | + +* object\_storage\_access\_key + +|Name| object\_storage\_access\_key | +|:---:|:--------------| +|Description| Authentication information stored in the cloud: key | +|Type| string | +|Default| None | +|Effective| After restarting system | + +* object\_storage\_access\_secret + +|Name| object\_storage\_access\_secret | +|:---:|:--------------| +|Description| Authentication information stored in the cloud: secret | +|Type| string | +|Default| None | +|Effective| After restarting system | diff --git a/src/UserGuide/V2.0.1/Tree/Reference/Keywords.md b/src/UserGuide/V2.0.1/Tree/Reference/Keywords.md new file mode 100644 index 00000000..c098b3e9 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Reference/Keywords.md @@ -0,0 +1,227 @@ + + +# Keywords + +Reserved words(Can not be used as identifier): + +- ROOT +- TIME +- TIMESTAMP + +Common Keywords: + +- ADD +- AFTER +- ALIAS +- ALIGN +- ALIGNED +- ALL +- ALTER +- ALTER_TIMESERIES +- ANY +- APPEND +- APPLY_TEMPLATE +- AS +- ASC +- ATTRIBUTES +- BEFORE +- BEGIN +- BLOCKED +- BOUNDARY +- BY +- CACHE +- CHILD +- CLEAR +- CLUSTER +- CONCAT +- CONFIGNODES +- CONFIGURATION +- CONTINUOUS +- COUNT +- CONTAIN +- CQ +- CQS +- CREATE +- CREATE_CONTINUOUS_QUERY +- CREATE_FUNCTION +- CREATE_ROLE +- CREATE_TIMESERIES +- CREATE_TRIGGER +- CREATE_USER +- DATA +- DATABASE +- DATABASES +- DATANODES +- DEACTIVATE +- DEBUG +- DELETE +- DELETE_ROLE +- DELETE_STORAGE_GROUP +- DELETE_TIMESERIES +- DELETE_USER +- DESC +- DESCRIBE +- DEVICE +- DEVICEID +- DEVICES +- DISABLE +- DISCARD +- DROP +- DROP_CONTINUOUS_QUERY +- DROP_FUNCTION +- DROP_TRIGGER +- END +- ENDTIME +- EVERY +- EXPLAIN +- FILL +- FILE +- FLUSH +- FOR +- FROM +- FULL +- FUNCTION +- FUNCTIONS +- GLOBAL +- GRANT +- GRANT_ROLE_PRIVILEGE +- GRANT_USER_PRIVILEGE +- GRANT_USER_ROLE +- GROUP +- HEAD +- HAVING +- INDEX +- INFO +- INSERT +- INSERT_TIMESERIES +- INTO +- KILL +- LABEL +- LAST +- LATEST +- LEVEL +- LIKE +- LIMIT +- LINEAR +- LINK +- LIST +- LIST_ROLE +- LIST_USER +- LOAD +- LOCAL +- LOCK +- MERGE +- METADATA +- MODIFY_PASSWORD +- NODES +- NONE +- NOW +- OF +- OFF +- OFFSET +- ON +- ORDER +- ONSUCCESS +- PARTITION +- PASSWORD +- PATHS +- PIPE +- PIPES +- PIPESINK +- PIPESINKS +- PIPESINKTYPE +- POLICY +- PREVIOUS +- PREVIOUSUNTILLAST +- PRIVILEGES +- PROCESSLIST +- PROPERTY +- PRUNE +- QUERIES +- QUERY +- RANGE +- READONLY +- READ_TEMPLATE +- READ_TEMPLATE_APPLICATION +- READ_TIMESERIES +- REGEXP +- REGIONID +- REGIONS +- REMOVE +- RENAME +- RESAMPLE +- RESOURCE +- REVOKE +- REVOKE_ROLE_PRIVILEGE +- REVOKE_USER_PRIVILEGE +- REVOKE_USER_ROLE +- ROLE +- RUNNING +- SCHEMA +- SELECT +- SERIESSLOTID +- SET +- SET_STORAGE_GROUP +- SETTLE +- SGLEVEL +- SHOW +- SLIMIT +- SOFFSET +- STORAGE +- START +- STARTTIME +- STATELESS 
+- STATEFUL +- STOP +- SYSTEM +- TAIL +- TAGS +- TASK +- TEMPLATE +- TIMEOUT +- TIMESERIES +- TIMESLOTID +- TO +- TOLERANCE +- TOP +- TRACING +- TRIGGER +- TRIGGERS +- TTL +- UNLINK +- UNLOAD +- UNSET +- UPDATE +- UPDATE_TEMPLATE +- UPSERT +- URI +- USER +- USING +- VALUES +- VERIFY +- VERSION +- VIEW +- WATERMARK_EMBEDDING +- WHERE +- WITH +- WITHOUT +- WRITABLE diff --git a/src/UserGuide/V2.0.1/Tree/Reference/Modify-Config-Manual.md b/src/UserGuide/V2.0.1/Tree/Reference/Modify-Config-Manual.md new file mode 100644 index 00000000..ad61e7eb --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Reference/Modify-Config-Manual.md @@ -0,0 +1,71 @@ + + +# Introduction to configuration item modification +## Method to modify +* Use sql statement to modify [recommended] +* Directly modify the configuration file [not recommended] +## Effective method +* Cannot be modified after the first startup. (first_start) +* Take effect after restart (restart) +* hot load (hot_reload) +# Modify configuration files directly +It can take effect by restarting or following the command +## Hot reload configuration command +Make changes to configuration items that support hot reloading take effect immediately. +For configuration items that have been modified in the configuration file, deleting or commenting them from the configuration file and then performing load configuration will restore the default values. +``` +load configuration +``` +# SetConfiguration statement +``` +set configuration "key1"="value1" "key2"="value2"... (on nodeId) +``` +### Example 1 +``` +set configuration "enable_cross_space_compaction"="false" +``` +To take effect permanently on all nodes in the cluster, set enable_cross_space_compaction to false and write it to iotdb-system.properties. +### Example 2 +``` +set configuration "enable_cross_space_compaction"="false" "enable_seq_space_compaction"="false" on 1 +``` +To take effect permanently on the node with nodeId 1, set enable_cross_space_compaction to false, set enable_seq_space_compaction to false, and write it to iotdb-system.properties. +### Example 3 +``` +set configuration "enable_cross_space_compaction"="false" "timestamp_precision"="ns" +``` +To take effect permanently on all nodes in the cluster, set enable_cross_space_compaction to false, timestamp_precision to ns, and write it to iotdb-system.properties. However, timestamp_precision is a configuration item that cannot be modified after the first startup, so the update of this configuration item will be ignored and the return is as follows. +``` +Msg: org.apache.iotdb.jdbc.IoTDBSQLException: 301: ignored config items: [timestamp_precision] +``` +Effective configuration item +Configuration items that support hot reloading and take effect immediately are marked with effectiveMode as hot_reload in the iotdb-system.properties.template file. + +Example +``` +# Used for indicate cluster name and distinguish different cluster. +# If you need to modify the cluster name, it's recommended to use 'set configuration "cluster_name=xxx"' sql. +# Manually modifying configuration file is not recommended, which may cause node restart fail. 
+# effectiveMode: hot_reload +# Datatype: string +cluster_name=defaultCluster +``` diff --git a/src/UserGuide/V2.0.1/Tree/Reference/Status-Codes.md b/src/UserGuide/V2.0.1/Tree/Reference/Status-Codes.md new file mode 100644 index 00000000..5dffc1ed --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Reference/Status-Codes.md @@ -0,0 +1,178 @@ + + +# Status Codes + +A sample solution as IoTDB requires registering the time series first before writing data is: + +``` +try { + writeData(); +} catch (SQLException e) { + // the most case is that the time series does not exist + if (e.getMessage().contains("exist")) { + //However, using the content of the error message is not so efficient + registerTimeSeries(); + //write data once again + writeData(); + } +} + +``` + +With Status Code, instead of writing codes like `if (e.getErrorMessage().contains("exist"))`, we can simply use `e.getErrorCode() == TSStatusCode.TIME_SERIES_NOT_EXIST_ERROR.getStatusCode()`. + +Here is a list of Status Code and related message: + +| Status Code | Status Type | Meanings | +|:------------|:---------------------------------------|:------------------------------------------------------------------------------------------| +| 200 | SUCCESS_STATUS | | +| 201 | INCOMPATIBLE_VERSION | Incompatible version | +| 202 | CONFIGURATION_ERROR | Configuration error | +| 203 | START_UP_ERROR | Meet error while starting | +| 204 | SHUT_DOWN_ERROR | Meet error while shutdown | +| 300 | UNSUPPORTED_OPERATION | Unsupported operation | +| 301 | EXECUTE_STATEMENT_ERROR | Execute statement error | +| 302 | MULTIPLE_ERROR | Meet error when executing multiple statements | +| 303 | ILLEGAL_PARAMETER | Parameter is illegal | +| 304 | OVERLAP_WITH_EXISTING_TASK | Current task has some conflict with existing tasks | +| 305 | INTERNAL_SERVER_ERROR | Internal server error | +| 306 | DISPATCH_ERROR | Meet error while dispatching | +| 400 | REDIRECTION_RECOMMEND | Recommend Client redirection | +| 500 | DATABASE_NOT_EXIST | Database does not exist | +| 501 | DATABASE_ALREADY_EXISTS | Database already exist | +| 502 | SERIES_OVERFLOW | Series number exceeds the threshold | +| 503 | TIMESERIES_ALREADY_EXIST | Timeseries already exists | +| 504 | TIMESERIES_IN_BLACK_LIST | Timeseries is being deleted | +| 505 | ALIAS_ALREADY_EXIST | Alias already exists | +| 506 | PATH_ALREADY_EXIST | Path already exists | +| 507 | METADATA_ERROR | Meet error when dealing with metadata | +| 508 | PATH_NOT_EXIST | Path does not exist | +| 509 | ILLEGAL_PATH | Illegal path | +| 510 | CREATE_TEMPLATE_ERROR | Create schema template error | +| 511 | DUPLICATED_TEMPLATE | Schema template is duplicated | +| 512 | UNDEFINED_TEMPLATE | Schema template is not defined | +| 513 | TEMPLATE_NOT_SET | Schema template is not set | +| 514 | DIFFERENT_TEMPLATE | Template is not consistent | +| 515 | TEMPLATE_IS_IN_USE | Template is in use | +| 516 | TEMPLATE_INCOMPATIBLE | Template is not compatible | +| 517 | SEGMENT_NOT_FOUND | Segment not found | +| 518 | PAGE_OUT_OF_SPACE | No enough space on schema page | +| 519 | RECORD_DUPLICATED | Record is duplicated | +| 520 | SEGMENT_OUT_OF_SPACE | No enough space on schema segment | +| 521 | PBTREE_FILE_NOT_EXISTS | PBTreeFile does not exist | +| 522 | OVERSIZE_RECORD | Size of record exceeds the threshold of page of PBTreeFile | +| 523 | PBTREE_FILE_REDO_LOG_BROKEN | PBTreeFile redo log has broken | +| 524 | TEMPLATE_NOT_ACTIVATED | Schema template is not activated | +| 526 | SCHEMA_QUOTA_EXCEEDED | Schema usage exceeds quota limit | +| 527 | 
MEASUREMENT_ALREADY_EXISTS_IN_TEMPLATE | Measurement already exists in schema template | +| 600 | SYSTEM_READ_ONLY | IoTDB system is read only | +| 601 | STORAGE_ENGINE_ERROR | Storage engine related error | +| 602 | STORAGE_ENGINE_NOT_READY | The storage engine is in recovery, not ready fore accepting read/write operation | +| 603 | DATAREGION_PROCESS_ERROR | DataRegion related error | +| 604 | TSFILE_PROCESSOR_ERROR | TsFile processor related error | +| 605 | WRITE_PROCESS_ERROR | Writing data related error | +| 606 | WRITE_PROCESS_REJECT | Writing data rejected error | +| 607 | OUT_OF_TTL | Insertion time is less than TTL time bound | +| 608 | COMPACTION_ERROR | Meet error while merging | +| 609 | ALIGNED_TIMESERIES_ERROR | Meet error in aligned timeseries | +| 610 | WAL_ERROR | WAL error | +| 611 | DISK_SPACE_INSUFFICIENT | Disk space is insufficient | +| 700 | SQL_PARSE_ERROR | Meet error while parsing SQL | +| 701 | SEMANTIC_ERROR | SQL semantic error | +| 702 | GENERATE_TIME_ZONE_ERROR | Meet error while generating time zone | +| 703 | SET_TIME_ZONE_ERROR | Meet error while setting time zone | +| 704 | QUERY_NOT_ALLOWED | Query statements are not allowed error | +| 705 | LOGICAL_OPERATOR_ERROR | Logical operator related error | +| 706 | LOGICAL_OPTIMIZE_ERROR | Logical optimize related error | +| 707 | UNSUPPORTED_FILL_TYPE | Unsupported fill type related error | +| 708 | QUERY_PROCESS_ERROR | Query process related error | +| 709 | MPP_MEMORY_NOT_ENOUGH | Not enough memory for task execution in MPP | +| 710 | CLOSE_OPERATION_ERROR | Meet error in close operation | +| 711 | TSBLOCK_SERIALIZE_ERROR | TsBlock serialization error | +| 712 | INTERNAL_REQUEST_TIME_OUT | MPP Operation timeout | +| 713 | INTERNAL_REQUEST_RETRY_ERROR | Internal operation retry failed | +| 714 | NO_SUCH_QUERY | Cannot find target query | +| 715 | QUERY_WAS_KILLED | Query was killed when execute | +| 800 | UNINITIALIZED_AUTH_ERROR | Failed to initialize auth module | +| 801 | WRONG_LOGIN_PASSWORD | Username or password is wrong | +| 802 | NOT_LOGIN | Not login | +| 803 | NO_PERMISSION | No permisstion to operate | +| 804 | USER_NOT_EXIST | User not exists | +| 805 | USER_ALREADY_EXIST | User already exists | +| 806 | USER_ALREADY_HAS_ROLE | User already has target role | +| 807 | USER_NOT_HAS_ROLE | User not has target role | +| 808 | ROLE_NOT_EXIST | Role not exists | +| 809 | ROLE_ALREADY_EXIST | Role already exists | +| 810 | ALREADY_HAS_PRIVILEGE | Already has privilege | +| 811 | NOT_HAS_PRIVILEGE | Not has privilege | +| 812 | CLEAR_PERMISSION_CACHE_ERROR | Failed to clear permission cache | +| 813 | UNKNOWN_AUTH_PRIVILEGE | Unknown auth privilege | +| 814 | UNSUPPORTED_AUTH_OPERATION | Unsupported auth operation | +| 815 | AUTH_IO_EXCEPTION | IO Exception in auth module | +| 900 | MIGRATE_REGION_ERROR | Error when migrate region | +| 901 | CREATE_REGION_ERROR | Create region error | +| 902 | DELETE_REGION_ERROR | Delete region error | +| 903 | PARTITION_CACHE_UPDATE_ERROR | Update partition cache failed | +| 904 | CONSENSUS_NOT_INITIALIZED | Consensus is not initialized and cannot provide service | +| 905 | REGION_LEADER_CHANGE_ERROR | Region leader migration failed | +| 906 | NO_AVAILABLE_REGION_GROUP | Cannot find an available region group | +| 907 | LACK_DATA_PARTITION_ALLOCATION | Lacked some data partition allocation result in the response | +| 1000 | DATANODE_ALREADY_REGISTERED | DataNode already registered in cluster | +| 1001 | NO_ENOUGH_DATANODE | The number of DataNode is not enough, cannot 
remove DataNode or create enough replication | +| 1002 | ADD_CONFIGNODE_ERROR | Add ConfigNode error | +| 1003 | REMOVE_CONFIGNODE_ERROR | Remove ConfigNode error | +| 1004 | DATANODE_NOT_EXIST | DataNode not exist error | +| 1005 | DATANODE_STOP_ERROR | DataNode stop error | +| 1006 | REMOVE_DATANODE_ERROR | Remove datanode failed | +| 1007 | REGISTER_DATANODE_WITH_WRONG_ID | The DataNode to be registered has incorrect register id | +| 1008 | CAN_NOT_CONNECT_DATANODE | Can not connect to DataNode | +| 1100 | LOAD_FILE_ERROR | Meet error while loading file | +| 1101 | LOAD_PIECE_OF_TSFILE_ERROR | Error when load a piece of TsFile when loading | +| 1102 | DESERIALIZE_PIECE_OF_TSFILE_ERROR | Error when deserialize a piece of TsFile | +| 1103 | SYNC_CONNECTION_ERROR | Sync connection error | +| 1104 | SYNC_FILE_REDIRECTION_ERROR | Sync TsFile redirection error | +| 1105 | SYNC_FILE_ERROR | Sync TsFile error | +| 1106 | CREATE_PIPE_SINK_ERROR | Failed to create a PIPE sink | +| 1107 | PIPE_ERROR | PIPE error | +| 1108 | PIPESERVER_ERROR | PIPE server error | +| 1109 | VERIFY_METADATA_ERROR | Meet error in validate timeseries schema | +| 1200 | UDF_LOAD_CLASS_ERROR | Error when loading UDF class | +| 1201 | UDF_DOWNLOAD_ERROR | DataNode cannot download UDF from ConfigNode | +| 1202 | CREATE_UDF_ON_DATANODE_ERROR | Error when create UDF on DataNode | +| 1203 | DROP_UDF_ON_DATANODE_ERROR | Error when drop a UDF on DataNode | +| 1300 | CREATE_TRIGGER_ERROR | ConfigNode create trigger error | +| 1301 | DROP_TRIGGER_ERROR | ConfigNode delete Trigger error | +| 1302 | TRIGGER_FIRE_ERROR | Error when firing trigger | +| 1303 | TRIGGER_LOAD_CLASS_ERROR | Error when load class of trigger | +| 1304 | TRIGGER_DOWNLOAD_ERROR | Error when download trigger from ConfigNode | +| 1305 | CREATE_TRIGGER_INSTANCE_ERROR | Error when create trigger instance | +| 1306 | ACTIVE_TRIGGER_INSTANCE_ERROR | Error when activate trigger instance | +| 1307 | DROP_TRIGGER_INSTANCE_ERROR | Error when drop trigger instance | +| 1308 | UPDATE_TRIGGER_LOCATION_ERROR | Error when move stateful trigger to new datanode | +| 1400 | NO_SUCH_CQ | CQ task does not exist | +| 1401 | CQ_ALREADY_ACTIVE | CQ is already active | +| 1402 | CQ_AlREADY_EXIST | CQ is already exist | +| 1403 | CQ_UPDATE_LAST_EXEC_TIME_ERROR | CQ update last execution time failed | + +> All exceptions are refactored in the latest version by extracting uniform message into exception classes. Different error codes are added to all exceptions. When an exception is caught and a higher-level exception is thrown, the error code will keep and pass so that users will know the detailed error reason. +A base exception class "ProcessException" is also added to be extended by all exceptions. + diff --git a/src/UserGuide/V2.0.1/Tree/Reference/Syntax-Rule.md b/src/UserGuide/V2.0.1/Tree/Reference/Syntax-Rule.md new file mode 100644 index 00000000..320fa146 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Reference/Syntax-Rule.md @@ -0,0 +1,281 @@ + + +# Identifiers + +## Literal Values + +This section describes how to write literal values in IoTDB. These include strings, numbers, timestamp values, boolean values, and NULL. 
+ +### String Literals + +in IoTDB, **A string is a sequence of bytes or characters, enclosed within either single quote (`'`) or double quote (`"`) characters.** Examples: + +```js +'a string' +"another string" +``` + +#### Usage Scenarios + +Usages of string literals: + +- Values of `TEXT` type data in `INSERT` or `SELECT` statements + + ```sql + # insert + insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') + insert into root.ln.wf02.wt02(timestamp,hardware) values(2, '\\') + + +-----------------------------+--------------------------+ + | Time|root.ln.wf02.wt02.hardware| + +-----------------------------+--------------------------+ + |1970-01-01T08:00:00.001+08:00| v1| + +-----------------------------+--------------------------+ + |1970-01-01T08:00:00.002+08:00| \\| + +-----------------------------+--------------------------+ + + # select + select code from root.sg1.d1 where code in ('string1', 'string2'); + ``` + +- Used in`LOAD` / `REMOVE` / `SETTLE` instructions to represent file path. + + ```sql + # load + LOAD 'examplePath' + + # remove + REMOVE 'examplePath' + + # SETTLE + SETTLE 'examplePath' + ``` + +- Password fields in user management statements + + ```sql + # write_pwd is the password + CREATE USER ln_write_user 'write_pwd' + ``` + +- Full Java class names in UDF and trigger management statements + + ```sql + # Trigger example. Full java class names after 'AS' should be string literals. + CREATE TRIGGER `alert-listener-sg1d1s1` + AFTER INSERT + ON root.sg1.d1.s1 + AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' + WITH ( + 'lo' = '0', + 'hi' = '100.0' + ) + + # UDF example. Full java class names after 'AS' should be string literals. + CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' + ``` + +- `AS` function provided by IoTDB can assign an alias to time series selected in query. Alias can be constant(including string) or identifier. + + ```sql + select s1 as 'temperature', s2 as 'speed' from root.ln.wf01.wt01; + + # Header of dataset + +-----------------------------+-----------|-----+ + | Time|temperature|speed| + +-----------------------------+-----------|-----+ + ``` + +- The key/value of an attribute can be String Literal and identifier, more details can be found at **key-value pair** part. + + +#### How to use quotation marks in String Literals + +There are several ways to include quote characters within a string: + + - `'` inside a string quoted with `"` needs no special treatment and need not be doubled or escaped. In the same way, `"` inside a string quoted with `'` needs no special treatment. + - A `'` inside a string quoted with `'` may be written as `''`. +- A `"` inside a string quoted with `"` may be written as `""`. + +The following examples demonstrate how quoting and escaping work: + +```js +'string' // string +'"string"' // "string" +'""string""' // ""string"" +'''string' // 'string + +"string" // string +"'string'" // 'string' +"''string''" // ''string'' +"""string" // "string +``` + +### Numeric Literals + +Number literals include integer (exact-value) literals and floating-point (approximate-value) literals. + +Integers are represented as a sequence of digits. Numbers may be preceded by `-` or `+` to indicate a negative or positive value, respectively. Examples: `1`, `-1`. + +Numbers with fractional part or represented in scientific notation with a mantissa and exponent are approximate-value numbers. Examples: `.1`, `3.14`, `-2.23`, `+1.70`, `1.2E3`, `1.2E-3`, `-1.2E3`, `-1.2E-3`. 
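+
+To make this concrete, here is a small illustrative sketch (the series `root.sg1.d1.s1` is assumed only for demonstration) showing an integer literal and floating-point literals in statements:
+
+```sql
+# integer timestamp literal and a floating-point value literal
+insert into root.sg1.d1(timestamp, s1) values(1, 3.14)
+
+# floating-point literals, including scientific notation, in a filter
+select s1 from root.sg1.d1 where s1 >= -2.23 and s1 < 1.2E3
+```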
+ +The `INT32` and `INT64` data types are integer types and calculations are exact. + +The `FLOAT` and `DOUBLE` data types are floating-point types and calculations are approximate. + +An integer may be used in floating-point context; it is interpreted as the equivalent floating-point number. + +### Timestamp Literals + +The timestamp is the time point at which data is produced. It includes absolute timestamps and relative timestamps in IoTDB. For information about timestamp support in IoTDB, see [Data Type Doc](../Basic-Concept/Data-Type.md). + +Specially, `NOW()` represents a constant timestamp that indicates the system time at which the statement began to execute. + +### Boolean Literals + +The constants `TRUE` and `FALSE` evaluate to 1 and 0, respectively. The constant names can be written in any lettercase. + +### NULL Values + +The `NULL` value means “no data.” `NULL` can be written in any lettercase. + +## Identifier + +### Usage scenarios + +Certain objects within IoTDB, including `TRIGGER`, `FUNCTION`(UDF), `CONTINUOUS QUERY`, `SCHEMA TEMPLATE`, `USER`, `ROLE`,`Pipe`,`PipeSink`,`alias` and other object names are known as identifiers. + +### Constraints + +Below are basic constraints of identifiers, specific identifiers may have other constraints, for example, `user` should consists of more than 4 characters. + +- Permitted characters in unquoted identifiers: + - [0-9 a-z A-Z _ ] (letters, digits and underscore) + - ['\u2E80'..'\u9FFF'] (UNICODE Chinese characters) + +### Reverse quotation marks + +**If the following situations occur, the identifier needs to be quoted using reverse quotes:** + +- The identifier contains special characters that are not allowed. +- The identifier is a real number. + +#### How to use quotation marks in identifiers caused by reverse quotation marks + +**Single and double quotes can be directly used in identifiers caused by reverse quotes.** + +**In identifiers referenced with quotation marks, quotation marks can be used by double writing them, meaning ` can be represented as``.** + +example: + +```SQL +# Create Template t1 ` t +create device template `t1``t` +(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) + +# Create Template t1't "t +create device template `t1't"t` +(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +#### Examples of Reverse Quotation Marks + +- When the trigger name encounters the above special situations, reverse quotation marks should be used to reference it: + + ```sql + # Create trigger alert.` listener-sg1d1s1 + CREATE TRIGGER `alert.``listener-sg1d1s1` + AFTER INSERT + ON root.sg1.d1.s1 + AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' + WITH ( + 'lo' = '0', + 'hi' = '100.0' + ) + ``` + +- When the UDF name encounters the above special situations, reverse quotation marks should be used for reference: + + ```sql + # Create a UDF named 111, which is a real number and needs to be quoted in reverse quotation marks. + CREATE FUNCTION `111` AS 'org.apache.iotdb.udf.UDTFExample' + ``` + +- When the metadata template name encounters the above special situations, reverse quotation marks should be used for reference: + + ```sql + # Create a metadata template named 111, where 111 is a real number and needs to be quoted in reverse quotation marks. 
+ create device template `111` + (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) + ``` + +- When the username and role name encounter the above special situations, reverse quotation marks should be used for reference. Regardless of whether reverse quotation marks are used or not, spaces are not allowed in the username and role name. Please refer to the instructions in the permission management section for details. + + ```sql + # Create user special ` user. + CREATE USER `special``user.` 'write_pwd' + + # Create Character 111 + CREATE ROLE `111` + ``` + +- When encountering the above special situations in continuous query identification, reverse quotation marks should be used to reference: + + ```sql + # Create continuous query test.cq + CREATE CONTINUOUS QUERY `test.cq` + BEGIN + SELECT max_value(temperature) + INTO temperature_max + FROM root.ln.*.* + GROUP BY time(10s) + END + ``` + +- When the names Pipe and PipeSink encounter the above special situations, reverse quotation marks should be used for reference: + + ```sql + # Create PipeSink test. * 1 + CREATE PIPESINK `test.*1` AS IoTDB ('ip' = '输入你的IP') + + # Create Pipe Test. * 2 + CREATE PIPE `test.*2` TO `test.*1` FROM + (select ** from root WHERE time>=yyyy-mm-dd HH:MM:SS) WITH 'SyncDelOp' = 'true' + ``` + +- In the Select clause, an alias can be specified for the value in the result set, which can be defined as a string or identifier. Examples are as follows: + + ```sql + select s1 as temperature, s2 as speed from root.ln.wf01.wt01; + # 表头如下所示 + +-----------------------------+-----------+-----+ + | Time|temperature|speed| + +-----------------------------+-----------+-----+ + ``` + +- Used to represent key value pairs, the keys and values of key value pairs can be defined as constants (including strings) or identifiers. Please refer to the Key Value Pair section for details. + +- Non database nodes in the path are allowed to contain the symbol `*`. When using it, the node needs to be enclosed in reverse quotes (as shown below), but this usage is only recommended when the path inevitably contains `*`. + + ```sql + `root.db.*` + ``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Reference/UDF-Libraries_apache.md b/src/UserGuide/V2.0.1/Tree/Reference/UDF-Libraries_apache.md new file mode 100644 index 00000000..f7b68d4f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Reference/UDF-Libraries_apache.md @@ -0,0 +1,5244 @@ + + +# UDF Libraries + +# UDF Libraries + +Based on the ability of user-defined functions, IoTDB provides a series of functions for temporal data processing, including data quality, data profiling, anomaly detection, frequency domain analysis, data matching, data repairing, sequence discovery, machine learning, etc., which can meet the needs of industrial fields for temporal data processing. + +> Note: The functions in the current UDF library only support millisecond level timestamp accuracy. + +## Installation steps + +1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version. + + | UDF libraries version | Supported IoTDB versions | Download link | + | --------------- | ----------------- | ------------------------------------------------------------ | + | UDF-1.3.3.zip | V1.3.3 and above | Please contact Timecho for assistance | + | UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact Timecho for assistance| + +2. 
Place the library-udf.jar file from the obtained compressed package into the `/ext/udf` directory of all nodes in the IoTDB cluster
+3. In the SQL command line terminal (CLI) of IoTDB or the SQL operation interface of the visualization console (Workbench), execute the corresponding function registration statements as shown below.
+4. Batch registration: there are two registration methods, the registration script or the full set of SQL statements
+- Register Script
+  - Copy the registration script (register-UDF.sh or register-UDF.bat) from the compressed package to the `tools` directory of IoTDB as needed, and modify the parameters in the script (default is host=127.0.0.1, rpcPort=6667, user=root, pass=root);
+  - Start the IoTDB service, then run the registration script to batch register the UDFs
+
+- All SQL statements
+  - Open the SQL file in the compressed package, copy all SQL statements, and execute them in the SQL command line terminal (CLI) of IoTDB or the SQL operation interface of the visualization console (Workbench) to batch register the UDFs
+
+## Data Quality
+
+### Completeness
+
+#### Registration statement
+
+```sql
+create function completeness as 'org.apache.iotdb.library.dquality.UDTFCompleteness'
+```
+
+#### Usage
+
+This function is used to calculate the completeness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the completeness of each window will be output.
+
+**Name:** COMPLETENESS
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window.
++ `downtime`: Whether the downtime exception is considered in the calculation of completeness. It is 'true' or 'false' (default). When considering the downtime exception, long-term missing data will be considered as downtime exception without any influence on completeness.
+
+**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1].
+
+**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output.
+
+#### Examples
+
+##### Default Parameters
+
+With default parameters, this function will regard all input data as the same window. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +### Consistency + +#### Registration statement + +```sql +create function consistency as 'org.apache.iotdb.library.dquality.UDTFConsistency' +``` + +#### Usage + +This function is used to calculate the consistency of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the consistency of each window will be output. + +**Name:** CONSISTENCY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +### Timeliness + +#### Registration statement + +```sql +create function timeliness as 'org.apache.iotdb.library.dquality.UDTFTimeliness' +``` + +#### Usage + +This function is used to calculate the timeliness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the timeliness of each window will be output. + +**Name:** TIMELINESS + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| ++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +### Validity + +#### Registration statement + +```sql +create function validity as 'org.apache.iotdb.library.dquality.UDTFValidity' +``` + +#### Usage + +This function is used to calculate the Validity of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the Validity of each window will be output. + +**Name:** VALIDITY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|validity(root.test.d1.s1, "window"="15")| ++-----------------------------+----------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + + + + + +## Data Profiling + +### ACF + +#### Registration statement + +```sql +create function acf as 'org.apache.iotdb.library.dprofile.UDTFACF' +``` + +#### Usage + +This function is used to calculate the auto-correlation factor of the input time series, +which equals to cross correlation between the same series. +For more information, please refer to [XCorr](./UDF-Libraries.md#xcorr) function. + +**Name:** ACF + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. +There are $2N-1$ data points in the series, and the values are interpreted in details in [XCorr](./UDF-Libraries.md#XCorr) function. + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
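+
+As a reading aid (this is an interpretation inferred from the example below, not an official definition), write the $N$ input values as $x_1,\dots,x_N$ with `null`/`NaN` taken as 0; the $k$-th output point ($k = 1,\dots,2N-1$) then equals
+
+$$\frac{1}{N}\sum_{i} x_i\, x_{i+k-N},$$
+
+where out-of-range terms are treated as 0, so the middle point ($k=N$) is simply $\frac{1}{N}\sum_{i=1}^{N} x_i^2$.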
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| null| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +### Distinct + +#### Registration statement + +```sql +create function distinct as 'org.apache.iotdb.library.dprofile.UDTFDistinct' +``` + +#### Usage + +This function returns all unique values in time series. + +**Name:** DISTINCT + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** + ++ The timestamp of the output series is meaningless. The output order is arbitrary. ++ Missing points and null points in the input series will be ignored, but `NaN` will not. ++ Case Sensitive. + + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select distinct(s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +### Histogram + +#### Registration statement + +```sql +create function histogram as 'org.apache.iotdb.library.dprofile.UDTFHistogram' +``` + +#### Usage + +This function is used to calculate the distribution histogram of a single column of numerical data. + +**Name:** HISTOGRAM + +**Input Series:** Only supports a single input sequence, the type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameters:** + ++ `min`: The lower limit of the requested data range, the default value is -Double.MAX_VALUE. ++ `max`: The upper limit of the requested data range, the default value is Double.MAX_VALUE, and the value of start must be less than or equal to end. ++ `count`: The number of buckets of the histogram, the default value is 1. It must be a positive integer. + +**Output Series:** The value of the bucket of the histogram, where the lower bound represented by the i-th bucket (index starts from 1) is $min+ (i-1)\cdot\frac{max-min}{count}$ and the upper bound is $min + i \cdot \frac{max-min}{count}$. 
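+
+For example, with `min`=1, `max`=20 and `count`=10 (the same parameters as in the example below), every bucket has width $\frac{20-1}{10}=1.9$, so the 3rd bucket runs from $1+2\times1.9=4.8$ to $1+3\times1.9=6.7$.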
+ +**Note:** + ++ If the value is lower than `min`, it will be put into the 1st bucket. If the value is larger than `max`, it will be put into the last bucket. ++ Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +### Integral + +#### Registration statement + +```sql +create function integral as 'org.apache.iotdb.library.dprofile.UDAFIntegral' +``` + +#### Usage + +This function is used to calculate the integration of time series, +which equals to the area under the curve with time as X-axis and values as Y-axis. + +**Name:** INTEGRAL + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `unit`: The unit of time used when computing the integral. + The value should be chosen from "1S", "1s", "1m", "1H", "1d"(case-sensitive), + and each represents taking one millisecond / second / minute / hour / day as 1.0 while calculating the area and integral. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the integration. + +**Note:** + ++ The integral value equals to the sum of the areas of right-angled trapezoids consisting of each two adjacent points and the time-axis. + Choosing different `unit` implies different scaling of time axis, thus making it apparent to convert the value among those results with constant coefficient. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + +#### Examples + +##### Default Parameters + +With default parameters, this function will take one second as 1.0. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + +##### Specific time unit + +With time unit specified as "1m", this function will take one minute as 1.0. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +### IntegralAvg + +#### Registration statement + +```sql +create function integralavg as 'org.apache.iotdb.library.dprofile.UDAFIntegralAvg' +``` + +#### Usage + +This function is used to calculate the function average of time series. +The output equals to the area divided by the time interval using the same time `unit`. +For more information of the area under the curve, please refer to `Integral` function. + +**Name:** INTEGRALAVG + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the time-weighted average. + +**Note:** + ++ The time-weighted value equals to the integral value with any `unit` divided by the time interval of input series. + The result is irrelevant to the time unit used in integral, and it's consistent with the timestamp precision of IoTDB by default. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + ++ If the input series is empty, the output value will be 0.0, but if there is only one data point, the value will equal to the input value. 
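+
+The trapezoidal rule shared by `Integral` and `IntegralAvg` can be sketched as follows. This is a minimal illustration assuming millisecond timestamps (not the library implementation); `NaN` points are skipped, so the surrounding valid points form a single trapezoid:
+
+```python
+import math
+
+def integral(times_ms, values, unit_ms=1000):
+    """Area under the curve; NaN points are skipped so the trapezoid
+    spans the previous and next valid points."""
+    pts = [(t, v) for t, v in zip(times_ms, values) if not math.isnan(v)]
+    area = 0.0
+    for (t0, v0), (t1, v1) in zip(pts, pts[1:]):
+        area += (v0 + v1) * (t1 - t0) / 2.0
+    return area / unit_ms  # scale the time axis by the chosen unit
+
+# Reproduces the Integral example above: 57.5 with "1s", ~0.958 with "1m"
+ts = [1000, 2000, 3000, 4000, 5000, 8000, 9000, 10000]
+vs = [1, 2, 5, 6, 7, 8, float("nan"), 10]
+print(integral(ts, vs))          # 57.5
+print(integral(ts, vs, 60000))   # 0.9583...
+```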
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +### Mad + +#### Registration statement + +```sql +create function mad as 'org.apache.iotdb.library.dprofile.UDAFMad' +``` + +#### Usage + +The function is used to compute the exact or approximate median absolute deviation (MAD) of a numeric time series. MAD is the median of the deviation of each element from the elements' median. + +Take a dataset $\{1,3,3,5,5,6,7,8,9\}$ as an instance. Its median is 5 and the deviation of each element from the median is $\{0,0,1,2,2,2,3,4,4\}$, whose median is 2. Therefore, the MAD of the original dataset is 2. + +**Name:** MAD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The relative error of the approximate MAD. It should be within [0,1) and the default value is 0. Taking `error`=0.01 as an instance, suppose the exact MAD is $a$ and the approximate MAD is $b$, we have $0.99a \le b \le 1.01a$. With `error`=0, the output is the exact MAD. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the MAD. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +##### Exact Query + +With the default `error`(`error`=0), the function queries the exact MAD. + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +............ 
+Total line number = 20 +``` + +SQL for query: + +```sql +select mad(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|median(root.test.s1, "error"="0")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +##### Approximate Query + +By setting `error` within (0,1), the function queries the approximate MAD. + +SQL for query: + +```sql +select mad(s1, "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s1, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9900000000000001| ++-----------------------------+---------------------------------+ +``` + +### Median + +#### Registration statement + +```sql +create function median as 'org.apache.iotdb.library.dprofile.UDAFMedian' +``` + +#### Usage + +The function is used to compute the exact or approximate median of a numeric time series. Median is the value separating the higher half from the lower half of a data sample. + +**Name:** MEDIAN + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The rank error of the approximate median. It should be within [0,1) and the default value is 0. For instance, a median with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact median. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the median. + +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +Total line number = 20 +``` + +SQL for query: + +```sql +select median(s1, "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|median(root.test.s1, "error"="0.01")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+------------------------------------+ +``` + +### MinMax + +#### Registration statement + +```sql +create function minmax as 'org.apache.iotdb.library.dprofile.UDTFMinMax' +``` + +#### Usage + +This function is used to standardize the input series with min-max. Minimum value is transformed to 0; maximum value is transformed to 1. 
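+
+Concretely, each point of an input series $x$ is transformed with the usual min-max formula, where the minimum and maximum are taken from the whole series in "batch" mode or supplied through the parameters below in "stream" mode:
+
+$$y_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$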
+
+**Name:** MINMAX
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `compute`: When set to "batch", the minimum and maximum are computed from all imported data points; when set to "stream", the minimum and maximum values must be provided. The default method is "batch".
++ `min`: The minimum value when `compute` is set to "stream".
++ `max`: The maximum value when `compute` is set to "stream".
+
+**Output Series:** Output a single series. The type is DOUBLE.
+
+#### Examples
+
+##### Batch computing
+
+Input series:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s1|
++-----------------------------+------------+
+|1970-01-01T08:00:00.100+08:00|         0.0|
+|1970-01-01T08:00:00.200+08:00|         0.0|
+|1970-01-01T08:00:00.300+08:00|         1.0|
+|1970-01-01T08:00:00.400+08:00|        -1.0|
+|1970-01-01T08:00:00.500+08:00|         0.0|
+|1970-01-01T08:00:00.600+08:00|         0.0|
+|1970-01-01T08:00:00.700+08:00|        -2.0|
+|1970-01-01T08:00:00.800+08:00|         2.0|
+|1970-01-01T08:00:00.900+08:00|         0.0|
+|1970-01-01T08:00:01.000+08:00|         0.0|
+|1970-01-01T08:00:01.100+08:00|         1.0|
+|1970-01-01T08:00:01.200+08:00|        -1.0|
+|1970-01-01T08:00:01.300+08:00|        -1.0|
+|1970-01-01T08:00:01.400+08:00|         1.0|
+|1970-01-01T08:00:01.500+08:00|         0.0|
+|1970-01-01T08:00:01.600+08:00|         0.0|
+|1970-01-01T08:00:01.700+08:00|        10.0|
+|1970-01-01T08:00:01.800+08:00|         2.0|
+|1970-01-01T08:00:01.900+08:00|        -2.0|
+|1970-01-01T08:00:02.000+08:00|         0.0|
++-----------------------------+------------+
+```
+
+SQL for query:
+
+```sql
+select minmax(s1) from root.test
+```
+
+Output series:
+
+```
++-----------------------------+--------------------+
+|                         Time|minmax(root.test.s1)|
++-----------------------------+--------------------+
+|1970-01-01T08:00:00.100+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.200+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.300+08:00|                0.25|
+|1970-01-01T08:00:00.400+08:00| 0.08333333333333333|
+|1970-01-01T08:00:00.500+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.600+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.700+08:00|                 0.0|
+|1970-01-01T08:00:00.800+08:00|  0.3333333333333333|
+|1970-01-01T08:00:00.900+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.000+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.100+08:00|                0.25|
+|1970-01-01T08:00:01.200+08:00| 0.08333333333333333|
+|1970-01-01T08:00:01.300+08:00| 0.08333333333333333|
+|1970-01-01T08:00:01.400+08:00|                0.25|
+|1970-01-01T08:00:01.500+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.600+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.700+08:00|                 1.0|
+|1970-01-01T08:00:01.800+08:00|  0.3333333333333333|
+|1970-01-01T08:00:01.900+08:00|                 0.0|
+|1970-01-01T08:00:02.000+08:00| 0.16666666666666666|
++-----------------------------+--------------------+
+```
+
+
+### MvAvg
+
+#### Registration statement
+
+```sql
+create function mvavg as 'org.apache.iotdb.library.dprofile.UDTFMvAvg'
+```
+
+#### Usage
+
+This function is used to calculate the moving average of the input series.
+
+**Name:** MVAVG
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `window`: Length of the moving window. Default value is 10.
+
+**Output Series:** Output a single series. The type is DOUBLE.
+ +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +### PACF + +#### Registration statement + +```sql +create function pacf as 'org.apache.iotdb.library.dprofile.UDTFPACF' +``` + +#### Usage + +This function is used to calculate partial autocorrelation of input series by solving Yule-Walker equation. For some cases, the equation may not be solved, and NaN will be output. + +**Name:** PACF + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lag`: Maximum lag of pacf to calculate. The default value is $\min(10\log_{10}n,n-1)$, where $n$ is the number of data points. + +**Output Series:** Output a single series. The type is DOUBLE. 
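+
+The Yule-Walker construction can be sketched as follows. This is a minimal NumPy illustration, not the library implementation; in particular, the library's treatment of `NaN` values and of unsolvable systems may differ:
+
+```python
+import numpy as np
+
+def pacf_yule_walker(x, nlags):
+    """Partial autocorrelation up to nlags via the Yule-Walker equations."""
+    x = np.asarray(x, dtype=float)
+    x = x - x.mean()
+    n = len(x)
+    # biased autocorrelation estimates r[0..nlags]
+    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(nlags + 1)]) / n
+    r = r / r[0]
+    pacf = [1.0]                        # lag 0 by convention
+    for k in range(1, nlags + 1):
+        toeplitz = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
+        phi = np.linalg.solve(toeplitz, r[1 : k + 1])
+        pacf.append(phi[-1])            # the last coefficient is the PACF at lag k
+    return pacf
+```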
+
+#### Examples
+
+##### Assigning maximum lag
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d1.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|              1|
+|2020-01-01T00:00:02.000+08:00|            NaN|
+|2020-01-01T00:00:03.000+08:00|              3|
+|2020-01-01T00:00:04.000+08:00|            NaN|
+|2020-01-01T00:00:05.000+08:00|              5|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select pacf(s1, "lag"="5") from root.test.d1
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------------+
+|                         Time|pacf(root.test.d1.s1, "lag"="5")|
++-----------------------------+--------------------------------+
+|2020-01-01T00:00:01.000+08:00|                             1.0|
+|2020-01-01T00:00:02.000+08:00|             -0.5744680851063829|
+|2020-01-01T00:00:03.000+08:00|              0.3172297297297296|
+|2020-01-01T00:00:04.000+08:00|             -0.2977686586304181|
+|2020-01-01T00:00:05.000+08:00|             -2.0609033521065867|
++-----------------------------+--------------------------------+
+```
+
+### Percentile
+
+#### Registration statement
+
+```sql
+create function percentile as 'org.apache.iotdb.library.dprofile.UDAFPercentile'
+```
+
+#### Usage
+
+The function is used to compute the exact or approximate percentile of a numeric time series. A percentile is the value of the element at a given rank in the sorted series.
+
+**Name:** PERCENTILE
+
+**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `rank`: The rank percentage of the percentile. It should be within (0,1] and the default value is 0.5. For instance, a percentile with `rank`=0.5 is the median.
++ `error`: The rank error of the approximate percentile. It should be within [0,1) and the default value is 0. For instance, a 0.5-percentile with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact percentile.
+
+**Output Series:** Output a single series. The type is the same as the input series. If `error`=0, there is only one data point in the series; its timestamp is the timestamp at which the percentile value first appears in the input series, and its value is the percentile. Otherwise, the timestamp of the only data point is 0.
+
+**Note:** Missing points, null points and `NaN` in the input series will be ignored.
+
+#### Examples
+
+Input series:
+
+```
++-----------------------------+-------------+
+|                         Time|root.test2.s1|
++-----------------------------+-------------+
+|1970-01-01T08:00:00.100+08:00|          0.0|
+|1970-01-01T08:00:00.200+08:00|          0.0|
+|1970-01-01T08:00:00.300+08:00|          1.0|
+|1970-01-01T08:00:00.400+08:00|         -1.0|
+|1970-01-01T08:00:00.500+08:00|          0.0|
+|1970-01-01T08:00:00.600+08:00|          0.0|
+|1970-01-01T08:00:00.700+08:00|         -2.0|
+|1970-01-01T08:00:00.800+08:00|          2.0|
+|1970-01-01T08:00:00.900+08:00|          0.0|
+|1970-01-01T08:00:01.000+08:00|          0.0|
+|1970-01-01T08:00:01.100+08:00|          1.0|
+|1970-01-01T08:00:01.200+08:00|         -1.0|
+|1970-01-01T08:00:01.300+08:00|         -1.0|
+|1970-01-01T08:00:01.400+08:00|          1.0|
+|1970-01-01T08:00:01.500+08:00|          0.0|
+|1970-01-01T08:00:01.600+08:00|          0.0|
+|1970-01-01T08:00:01.700+08:00|         10.0|
+|1970-01-01T08:00:01.800+08:00|          2.0|
+|1970-01-01T08:00:01.900+08:00|         -2.0|
+|1970-01-01T08:00:02.000+08:00|          0.0|
++-----------------------------+-------------+
+Total line number = 20
+```
+
+SQL for query:
+
+```sql
+select percentile(s1, "rank"="0.2", "error"="0.01") from root.test2
+```
+
+Output series:
+
+```
++-----------------------------+-------------------------------------------------------+
+|                         Time|percentile(root.test2.s1, "rank"="0.2", "error"="0.01")|
++-----------------------------+-------------------------------------------------------+
+|1970-01-01T08:00:00.000+08:00|                                                   -1.0|
++-----------------------------+-------------------------------------------------------+
+```
+
+### Quantile
+
+#### Registration statement
+
+```sql
+create function quantile as 'org.apache.iotdb.library.dprofile.UDAFQuantile'
+```
+
+#### Usage
+
+The function is used to compute the approximate quantile of a numeric time series. A quantile is the value of the element at a given rank in the sorted series.
+
+**Name:** QUANTILE
+
+**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `rank`: The rank of the quantile. It should be within (0,1] and the default value is 0.5. For instance, a quantile with `rank`=0.5 is the median.
++ `K`: The size of the KLL sketch maintained in the query. It should be within [100,+inf) and the default value is 800. For instance, the 0.5-quantile computed by a KLL sketch with K=800 items is a value with rank percentage 0.49~0.51 with a confidence of at least 99%. The result will be more accurate as K increases.
+
+**Output Series:** Output a single series. The type is the same as the input series. The timestamp of the only data point is 0.
+
+**Note:** Missing points, null points and `NaN` in the input series will be ignored.
+
+#### Examples
+
+Input series:
+
+```
++-----------------------------+-------------+
+|                         Time|root.test1.s1|
++-----------------------------+-------------+
+|2021-03-17T10:32:17.054+08:00|            7|
+|2021-03-17T10:32:18.054+08:00|           15|
+|2021-03-17T10:32:19.054+08:00|           36|
+|2021-03-17T10:32:20.054+08:00|           39|
+|2021-03-17T10:32:21.054+08:00|           40|
+|2021-03-17T10:32:22.054+08:00|           41|
+|2021-03-17T10:32:23.054+08:00|           20|
+|2021-03-17T10:32:24.054+08:00|           18|
++-----------------------------+-------------+
+............
+Total line number = 8 +``` + +SQL for query: + +```sql +select quantile(s1, "rank"="0.2", "K"="800") from root.test1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------+ +| Time|quantile(root.test1.s1, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.000000000000001| ++-----------------------------+------------------------------------------------+ +``` + +### Period + +#### Registration statement + +```sql +create function period as 'org.apache.iotdb.library.dprofile.UDAFPeriod' +``` + +#### Usage + +The function is used to compute the period of a numeric time series. + +**Name:** PERIOD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is INT32. There is only one data point in the series, whose timestamp is 0 and value is the period. + +#### Examples + +Input series: + + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select period(s1) from root.test.d3 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +### QLB + +#### Registration statement + +```sql +create function qlb as 'org.apache.iotdb.library.dprofile.UDTFQLB' +``` + +#### Usage + +This function is used to calculate Ljung-Box statistics $Q_{LB}$ for time series, and convert it to p value. + +**Name:** QLB + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters**: + +`lag`: max lag to calculate. Legal input shall be integer from 1 to n-2, where n is the sample number. Default value is n-2. + +**Output Series:** Output a single series. The type is DOUBLE. The output series is p value, and timestamp means lag. + +**Note:** If you want to calculate Ljung-Box statistics $Q_{LB}$ instead of p value, you may use ACF function. 
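+
+For reference, the statistic follows the standard Ljung-Box definition: with $\hat{\rho}_k$ the sample autocorrelation at lag $k$ and $n$ the sample size,
+
+$$Q_{LB}(h)=n(n+2)\sum_{k=1}^{h}\frac{\hat{\rho}_k^2}{n-k}$$
+
+and the value reported at lag $h$ is the upper-tail probability (p value) of $Q_{LB}(h)$ under a $\chi^2$ distribution with $h$ degrees of freedom.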
+ +#### Examples + +##### Using Default Parameter + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select QLB(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| +|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +### Resample + +#### Registration statement + +```sql +create function re_sample as 'org.apache.iotdb.library.dprofile.UDTFResample' +``` + +#### Usage + +This function is used to resample the input series according to a given frequency, +including up-sampling and down-sampling. +Currently, the supported up-sampling methods are +NaN (filling with `NaN`), +FFill (filling with previous value), +BFill (filling with next value) and +Linear (filling with linear interpolation). +Down-sampling relies on group aggregation, +which supports Max, Min, First, Last, Mean and Median. + +**Name:** RESAMPLE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + + ++ `every`: The frequency of resampling, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. ++ `interp`: The interpolation method of up-sampling, which is 'NaN', 'FFill', 'BFill' or 'Linear'. By default, NaN is used. ++ `aggr`: The aggregation method of down-sampling, which is 'Max', 'Min', 'First', 'Last', 'Mean' or 'Median'. By default, Mean is used. 
++ `start`: The start time (inclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the first valid data point. ++ `end`: The end time (exclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the last valid data point. + +**Output Series:** Output a single series. The type is DOUBLE. It is strictly equispaced with the frequency `every`. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + +##### Up-sampling + +When the frequency of resampling is higher than the original frequency, up-sampling starts. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +SQL for query: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +##### Down-sampling + +When the frequency of resampling is lower than the original frequency, down-sampling starts. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + + +##### Specify the time period + +The time period of resampling can be specified with `start` and `end`. +The period outside the actual time range will be interpolated. 
+ +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +### Sample + +#### Registration statement + +```sql +create function sample as 'org.apache.iotdb.library.dprofile.UDTFSample' +``` + +#### Usage + +This function is used to sample the input series, +that is, select a specified number of data points from the input series and output them. +Currently, three sampling methods are supported: +**Reservoir sampling** randomly selects data points. +All of the points have the same probability of being sampled. +**Isometric sampling** selects data points at equal index intervals. +**Triangle sampling** assigns data points to the buckets based on the number of sampling. +Then it calculates the area of the triangle based on these points inside the bucket and selects the point with the largest area of the triangle. +For more detail, please read [paper](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf) + +**Name:** SAMPLE + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Parameters:** + ++ `method`: The method of sampling, which is 'reservoir', 'isometric' or 'triangle'. By default, reservoir sampling is used. ++ `k`: The number of sampling, which is a positive integer. By default, it's 1. + +**Output Series:** Output a single series. The type is the same as the input. The length of the output series is `k`. Each data point in the output series comes from the input series. + +**Note:** If `k` is greater than the length of input series, all data points in the input series will be output. + +#### Examples + +##### Reservoir Sampling + +When `method` is 'reservoir' or the default, reservoir sampling is used. +Due to the randomness of this method, the output series shown below is only a possible result. 
+ + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + +##### Isometric Sampling + +When `method` is 'isometric', isometric sampling is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +### Segment + +#### Registration statement + +```sql +create function segment as 'org.apache.iotdb.library.dprofile.UDTFSegment' +``` + +#### Usage + +This function is used to segment a time series into subsequences according to linear trend, and returns linear fitted values of first values in each subsequence or every data point. + +**Name:** SEGMENT + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `output` :"all" to output all fitted points; "first" to output first fitted points in each subsequence. + ++ `error`: error allowed at linear regression. It is defined as mean absolute error of a subsequence. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note:** This function treat input series as equal-interval sampled. All data are loaded, so downsample input series first if there are too many data points. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select segment(s1, "error"="0.1") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +### Skew + +#### Registration statement + +```sql +create function skew as 'org.apache.iotdb.library.dprofile.UDAFSkew' +``` + +#### Usage + +This function is used to calculate the population skewness. + +**Name:** SKEW + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population skewness. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
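+
+For reference, the population skewness is the standardized third central moment: with $\mu$ and $\sigma$ the population mean and standard deviation of the $n$ input values,
+
+$$g_1=\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^3}{\sigma^{3}}$$
+
+which matches the value in the example below.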
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select skew(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +### Spline + +#### Registration statement + +```sql +create function spline as 'org.apache.iotdb.library.dprofile.UDTFSpline' +``` + +#### Usage + +This function is used to calculate cubic spline interpolation of input series. + +**Name:** SPLINE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `points`: Number of resampling points. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note**: Output series retains the first and last timestamps of input series. Interpolation points are selected at equal intervals. The function tries to calculate only when there are no less than 4 points in input series. 
+ +#### Examples + +##### Assigning number of interpolation points + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select spline(s1, "points"="151") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| +|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 
1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| +|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 
1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +### Spread + +#### Registration statement + +```sql +create function spread as 'org.apache.iotdb.library.dprofile.UDAFSpread' +``` + +#### Usage + +This function is used to calculate the spread of time series, that is, the maximum value minus the minimum value. + +**Name:** SPREAD + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is 0 and value is the spread. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + + + +### ZScore + +#### Registration statement + +```sql +create function zscore as 'org.apache.iotdb.library.dprofile.UDTFZScore' +``` + +#### Usage + +This function is used to standardize the input series with z-score. + +**Name:** ZSCORE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `compute`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide mean and standard deviation. The default method is "batch". ++ `avg`: Mean value when method is set to "stream". ++ `sd`: Standard deviation when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. 
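+
+In "batch" mode the mean and standard deviation are computed over the whole series (in "stream" mode they are supplied through `avg` and `sd`), and each point is transformed as:
+
+$$z_i=\frac{x_i-\bar{x}}{\sigma}$$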
+ +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select zscore(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` + + +## Anomaly Detection + +### IQR + +#### Registration statement + +```sql +create function iqr as 'org.apache.iotdb.library.anomaly.UDTFIQR' +``` + +#### Usage + +This function is used to detect anomalies based on IQR. Points distributing beyond 1.5 times IQR are selected. + +**Name:** IQR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `method`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide upper and lower quantiles. The default method is "batch". ++ `q1`: The lower quantile when method is set to "stream". ++ `q3`: The upper quantile when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. 
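+
+In other words, with $Q_1$ and $Q_3$ the lower and upper quartiles of the data, a point is reported as an anomaly when it falls outside the interval
+
+$$[\,Q_1-1.5\,\mathrm{IQR},\ Q_3+1.5\,\mathrm{IQR}\,]$$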
+
+**Note:** $IQR=Q_3-Q_1$
+
+#### Examples
+
+##### Batch computing
+
+Input series:
+
+```
++-----------------------------+------------+
+| Time|root.test.s1|
++-----------------------------+------------+
+|1970-01-01T08:00:00.100+08:00| 0.0|
+|1970-01-01T08:00:00.200+08:00| 0.0|
+|1970-01-01T08:00:00.300+08:00| 1.0|
+|1970-01-01T08:00:00.400+08:00| -1.0|
+|1970-01-01T08:00:00.500+08:00| 0.0|
+|1970-01-01T08:00:00.600+08:00| 0.0|
+|1970-01-01T08:00:00.700+08:00| -2.0|
+|1970-01-01T08:00:00.800+08:00| 2.0|
+|1970-01-01T08:00:00.900+08:00| 0.0|
+|1970-01-01T08:00:01.000+08:00| 0.0|
+|1970-01-01T08:00:01.100+08:00| 1.0|
+|1970-01-01T08:00:01.200+08:00| -1.0|
+|1970-01-01T08:00:01.300+08:00| -1.0|
+|1970-01-01T08:00:01.400+08:00| 1.0|
+|1970-01-01T08:00:01.500+08:00| 0.0|
+|1970-01-01T08:00:01.600+08:00| 0.0|
+|1970-01-01T08:00:01.700+08:00| 10.0|
+|1970-01-01T08:00:01.800+08:00| 2.0|
+|1970-01-01T08:00:01.900+08:00| -2.0|
+|1970-01-01T08:00:02.000+08:00| 0.0|
++-----------------------------+------------+
+```
+
+SQL for query:
+
+```sql
+select iqr(s1) from root.test
+```
+
+Output series:
+
+```
++-----------------------------+-----------------+
+| Time|iqr(root.test.s1)|
++-----------------------------+-----------------+
+|1970-01-01T08:00:01.700+08:00| 10.0|
++-----------------------------+-----------------+
+```
+
+### KSigma
+
+#### Registration statement
+
+```sql
+create function ksigma as 'org.apache.iotdb.library.anomaly.UDTFKSigma'
+```
+
+#### Usage
+
+This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm.
+Within a sliding window, an input value that deviates from the average by more than k times the standard deviation will be output as an anomaly.
+
+**Name:** KSIGMA
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
++ `k`: The multiple of the standard deviation used to define an anomaly; the default value is 3.
++ `window`: The window size of the Dynamic K-Sigma Algorithm; the default value is 10000.
+
+**Output Series:** Output a single series. The type is the same as the input series.
+
+**Note:** Only when `k` is larger than 0, the anomaly detection will be performed. Otherwise, nothing will be output.
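+
+In other words, with $\mu_w$ and $\sigma_w$ the mean and standard deviation of the current sliding window, a point $x$ is reported as an anomaly when
+
+$$ |x - \mu_w| > k \cdot \sigma_w $$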
+
+#### Examples
+
+##### Assigning k
+
+Input series:
+
+```
++-----------------------------+---------------+
+| Time|root.test.d1.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00| 0.0|
+|2020-01-01T00:00:03.000+08:00| 50.0|
+|2020-01-01T00:00:04.000+08:00| 100.0|
+|2020-01-01T00:00:06.000+08:00| 150.0|
+|2020-01-01T00:00:08.000+08:00| 200.0|
+|2020-01-01T00:00:10.000+08:00| 200.0|
+|2020-01-01T00:00:14.000+08:00| 200.0|
+|2020-01-01T00:00:15.000+08:00| 200.0|
+|2020-01-01T00:00:16.000+08:00| 200.0|
+|2020-01-01T00:00:18.000+08:00| 200.0|
+|2020-01-01T00:00:20.000+08:00| 150.0|
+|2020-01-01T00:00:22.000+08:00| 100.0|
+|2020-01-01T00:00:26.000+08:00| 50.0|
+|2020-01-01T00:00:28.000+08:00| 0.0|
+|2020-01-01T00:00:30.000+08:00| NaN|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30
+```
+
+Output series:
+
+```
++-----------------------------+---------------------------------+
+|Time |ksigma(root.test.d1.s1,"k"="1.0")|
++-----------------------------+---------------------------------+
+|2020-01-01T00:00:02.000+08:00| 0.0|
+|2020-01-01T00:00:03.000+08:00| 50.0|
+|2020-01-01T00:00:26.000+08:00| 50.0|
+|2020-01-01T00:00:28.000+08:00| 0.0|
++-----------------------------+---------------------------------+
+```
+
+### LOF
+
+#### Registration statement
+
+```sql
+create function LOF as 'org.apache.iotdb.library.anomaly.UDTFLOF'
+```
+
+#### Usage
+
+This function is used to detect density anomalies of time series. According to the k-th distance calculation parameter and the local outlier factor (lof) threshold, the function judges whether a set of input values is a density anomaly, and the local outlier factor of each data point will be output.
+
+**Name:** LOF
+
+**Input Series:** Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
++ `method`: The detection method. The default value is "default", which is used when the input data has multiple dimensions. The alternative is "series", in which a single input series is transformed into a high-dimensional one.
++ `k`: Use the k-th distance to calculate lof. The default value is 3.
++ `window`: The size of the window used to split the original data points. The default value is 10000.
++ `windowsize`: The dimension into which the series is transformed when `method` is "series". The default value is 5.
+
+**Output Series:** Output a single series. The type is DOUBLE.
+
+**Note:** Incomplete rows will be ignored. They are neither calculated nor marked as anomaly.
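+
+For reference, the local outlier factor mentioned above is usually defined as follows (Breunig et al., 2000): writing $N_k(p)$ for the $k$ nearest neighbors of a point $p$ and $\mathrm{reachdist}_k(p, o) = \max\{k\text{-}\mathrm{distance}(o),\, d(p, o)\}$, the local reachability density is
+
+$$ lrd_k(p) = \left( \frac{\sum_{o \in N_k(p)} \mathrm{reachdist}_k(p, o)}{|N_k(p)|} \right)^{-1} $$
+
+and the reported value is
+
+$$ LOF_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{lrd_k(o)}{lrd_k(p)}, $$
+
+so values well above 1 mark points lying in a sparser region than their neighbors, i.e. density anomalies.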
+ +#### Examples + +##### Using default parameters + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +##### Diagnosing 1d timeseries + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| +|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 
3.77777777777778| ++-----------------------------+--------------------+ +``` + +### MissDetect + +#### Registration statement + +```sql +create function missdetect as 'org.apache.iotdb.library.anomaly.UDTFMissDetect' +``` + +#### Usage + +This function is used to detect missing anomalies. +In some datasets, missing values are filled by linear interpolation. +Thus, there are several long perfect linear segments. +By discovering these perfect linear segments, +missing anomalies are detected. + +**Name:** MISSDETECT + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + +`error`: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10. + +**Output Series:** Output a single series. The type is BOOLEAN. Each data point which is miss anomaly will be labeled as true. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 0.0| +|2021-07-01T12:00:01.000+08:00| 1.0| +|2021-07-01T12:00:02.000+08:00| 0.0| +|2021-07-01T12:00:03.000+08:00| 1.0| +|2021-07-01T12:00:04.000+08:00| 0.0| +|2021-07-01T12:00:05.000+08:00| 0.0| +|2021-07-01T12:00:06.000+08:00| 0.0| +|2021-07-01T12:00:07.000+08:00| 0.0| +|2021-07-01T12:00:08.000+08:00| 0.0| +|2021-07-01T12:00:09.000+08:00| 0.0| +|2021-07-01T12:00:10.000+08:00| 0.0| +|2021-07-01T12:00:11.000+08:00| 0.0| +|2021-07-01T12:00:12.000+08:00| 0.0| +|2021-07-01T12:00:13.000+08:00| 0.0| +|2021-07-01T12:00:14.000+08:00| 0.0| +|2021-07-01T12:00:15.000+08:00| 0.0| +|2021-07-01T12:00:16.000+08:00| 1.0| +|2021-07-01T12:00:17.000+08:00| 0.0| +|2021-07-01T12:00:18.000+08:00| 1.0| +|2021-07-01T12:00:19.000+08:00| 0.0| +|2021-07-01T12:00:20.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select missdetect(s2,'minlen'='10') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|missdetect(root.test.d2.s2, "minlen"="10")| ++-----------------------------+------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| false| +|2021-07-01T12:00:01.000+08:00| false| +|2021-07-01T12:00:02.000+08:00| false| +|2021-07-01T12:00:03.000+08:00| false| +|2021-07-01T12:00:04.000+08:00| true| +|2021-07-01T12:00:05.000+08:00| true| +|2021-07-01T12:00:06.000+08:00| true| +|2021-07-01T12:00:07.000+08:00| true| +|2021-07-01T12:00:08.000+08:00| true| +|2021-07-01T12:00:09.000+08:00| true| +|2021-07-01T12:00:10.000+08:00| true| +|2021-07-01T12:00:11.000+08:00| true| +|2021-07-01T12:00:12.000+08:00| true| +|2021-07-01T12:00:13.000+08:00| true| +|2021-07-01T12:00:14.000+08:00| true| +|2021-07-01T12:00:15.000+08:00| true| +|2021-07-01T12:00:16.000+08:00| false| +|2021-07-01T12:00:17.000+08:00| false| +|2021-07-01T12:00:18.000+08:00| false| +|2021-07-01T12:00:19.000+08:00| false| +|2021-07-01T12:00:20.000+08:00| false| ++-----------------------------+------------------------------------------+ +``` + +### Range + +#### Registration statement + +```sql +create function range as 'org.apache.iotdb.library.anomaly.UDTFRange' +``` + +#### Usage + +This function is used to detect range anomaly of time series. According to upper bound and lower bound parameters, the function judges if a input value is beyond range, aka range anomaly, and a new time series of anomaly will be output. 
+ +**Name:** RANGE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lower_bound`:lower bound of range anomaly detection. ++ `upper_bound`:upper bound of range anomaly detection. + +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** Only when `upper_bound` is larger than `lower_bound`, the anomaly detection will be performed. Otherwise, nothing will be output. + + + +#### Examples + +##### Assigning Lower and Upper Bound + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:28.000+08:00| 126.0| ++-----------------------------+------------------------------------------------------------------+ +``` + +### TwoSidedFilter + +#### Registration statement + +```sql +create function twosidedfilter as 'org.apache.iotdb.library.anomaly.UDTFTwoSidedFilter' +``` + +#### Usage + +The function is used to filter anomalies of a numeric time series based on two-sided window detection. + +**Name:** TWOSIDEDFILTER + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE + +**Output Series:** Output a single series. The type is the same as the input. It is the input without anomalies. + +**Parameter:** + +- `len`: The size of the window, which is a positive integer. By default, it's 5. When `len`=3, the algorithm detects forward window and backward window with length 3 and calculates the outlierness of the current point. + +- `threshold`: The threshold of outlierness, which is a floating number in (0,1). By default, it's 0.3. The strict standard of detecting anomalies is in proportion to the threshold. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:00:37.000+08:00| 1484.0| +|1970-01-01T08:00:38.000+08:00| 1055.0| +|1970-01-01T08:00:39.000+08:00| 1050.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test +``` + +Output series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +### Outlier + +#### Registration statement + +```sql +create function outlier as 'org.apache.iotdb.library.anomaly.UDTFOutlier' +``` + +#### Usage + +This function is used to detect distance-based outliers. For each point in the current window, if the number of its neighbors within the distance of neighbor distance threshold is less than the neighbor count threshold, the point in detected as an outlier. + +**Name:** OUTLIER + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `r`:the neighbor distance threshold. ++ `k`:the neighbor count threshold. ++ `w`:the window size. ++ `s`:the slide size. + +**Output Series:** Output a single series. The type is the same as the input. 
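+
+As a minimal, self-contained sketch of the rule described above (an illustration only, not the library implementation; it assumes the distance is taken on the values and that each window is scanned independently), the detection can be written as:
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+public class DistanceOutlierSketch {
+
+    // Flags a point as an outlier when fewer than k other points of its window
+    // lie within distance r of it; the window of size w slides forward by s points.
+    public static List<Integer> detect(double[] values, double r, int k, int w, int s) {
+        List<Integer> outlierIndexes = new ArrayList<>();
+        for (int start = 0; start + w <= values.length; start += s) {
+            for (int i = start; i < start + w; i++) {
+                int neighbors = 0;
+                for (int j = start; j < start + w; j++) {
+                    if (j != i && Math.abs(values[i] - values[j]) <= r) {
+                        neighbors++;
+                    }
+                }
+                if (neighbors < k && !outlierIndexes.contains(i)) {
+                    outlierIndexes.add(i); // fewer than k neighbors within r -> outlier
+                }
+            }
+        }
+        return outlierIndexes;
+    }
+
+    public static void main(String[] args) {
+        // Same values as the example below; with r=5.0, k=4, w=10, s=5 the points
+        // 69.0 (index 8) and 52.0 (index 13) end up flagged.
+        double[] series = {56.0, 55.1, 54.2, 56.3, 59.0, 60.0, 60.5, 64.5, 69.0, 64.2,
+                           62.3, 58.0, 58.9, 52.0, 62.3, 61.0, 64.2, 61.8, 64.0, 63.0};
+        System.out.println(detect(series, 5.0, 4, 10, 5));
+    }
+}
+```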
+ +#### Examples + +##### Assigning Parameters of Queries + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2020-01-04T23:59:55.000+08:00| 56.0| +|2020-01-04T23:59:56.000+08:00| 55.1| +|2020-01-04T23:59:57.000+08:00| 54.2| +|2020-01-04T23:59:58.000+08:00| 56.3| +|2020-01-04T23:59:59.000+08:00| 59.0| +|2020-01-05T00:00:00.000+08:00| 60.0| +|2020-01-05T00:00:01.000+08:00| 60.5| +|2020-01-05T00:00:02.000+08:00| 64.5| +|2020-01-05T00:00:03.000+08:00| 69.0| +|2020-01-05T00:00:04.000+08:00| 64.2| +|2020-01-05T00:00:05.000+08:00| 62.3| +|2020-01-05T00:00:06.000+08:00| 58.0| +|2020-01-05T00:00:07.000+08:00| 58.9| +|2020-01-05T00:00:08.000+08:00| 52.0| +|2020-01-05T00:00:09.000+08:00| 62.3| +|2020-01-05T00:00:10.000+08:00| 61.0| +|2020-01-05T00:00:11.000+08:00| 64.2| +|2020-01-05T00:00:12.000+08:00| 61.8| +|2020-01-05T00:00:13.000+08:00| 64.0| +|2020-01-05T00:00:14.000+08:00| 63.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:03.000+08:00| 69.0| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:08.000+08:00| 52.0| ++-----------------------------+--------------------------------------------------------+ +``` + + +### MasterTrain + +#### Usage + +This function is used to train the VAR model based on master data. The model is trained on learning samples consisting of p+1 consecutive non-error points. + +**Name:** MasterTrain + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn spotless:apply`. +- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`. +- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'` in client. 
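+
+For reference, a vector autoregressive model of order $p$ (VAR($p$)) expresses each multivariate point as a linear combination of the $p$ preceding points plus noise:
+
+$$ \mathbf{x}_t = A_1 \mathbf{x}_{t-1} + A_2 \mathbf{x}_{t-2} + \cdots + A_p \mathbf{x}_{t-p} + \boldsymbol{\varepsilon}_t $$
+
+which is why each learning sample needs $p+1$ consecutive non-error points.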
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ +``` + +### MasterDetect + +#### Usage + +This function is used to detect 
time series and repair errors based on master data. The VAR model is trained by MasterTrain. + +**Name:** MasterDetect + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. ++ `eta`: The detection threshold. By default, it will be estimated based on the 3-sigma rule. ++ `output_type`: The type of output. 'repair' for repairing and 'anomaly' for anomaly detection. ++ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn spotless:apply`. +- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`. +- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'` in client. + +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| 
null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +##### Repairing + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +##### Anomaly Detection + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| true| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| 
+|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + + + +## Frequency Domain Analysis + +### Conv + +#### Registration statement + +```sql +create function conv as 'org.apache.iotdb.library.frequency.UDTFConv' +``` + +#### Usage + +This function is used to calculate the convolution, i.e. polynomial multiplication. + +**Name:** CONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output:** Output a single series. The type is DOUBLE. It is the result of convolution whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +### Deconv + +#### Registration statement + +```sql +create function deconv as 'org.apache.iotdb.library.frequency.UDTFDeconv' +``` + +#### Usage + +This function is used to calculate the deconvolution, i.e. polynomial division. + +**Name:** DECONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `result`: The result of deconvolution, which is 'quotient' or 'remainder'. By default, the quotient will be output. + +**Output:** Output a single series. The type is DOUBLE. It is the result of deconvolving the second series from the first series (dividing the first series by the second series) whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + + +##### Calculate the quotient + +When `result` is 'quotient' or the default, this function calculates the quotient of the deconvolution. 
+ +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s3|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| null| +|1970-01-01T08:00:00.003+08:00| 2.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select deconv(s3,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2)| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +##### Calculate the remainder + +When `result` is 'remainder', this function calculates the remainder of the deconvolution. + +Input series is the same as above, the SQL for query is shown below: + + +```sql +select deconv(s3,s2,'result'='remainder') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| ++-----------------------------+--------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 0.0| ++-----------------------------+--------------------------------------------------------------+ +``` + +### DWT + +#### Registration statement + +```sql +create function dwt as 'org.apache.iotdb.library.frequency.UDTFDWT' +``` + +#### Usage + +This function is used to calculate 1d discrete wavelet transform of a numerical series. + +**Name:** DWT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of wavelet. May select 'Haar', 'DB4', 'DB6', 'DB8', where DB means Daubechies. User may offer coefficients of wavelet transform and ignore this parameter. Case ignored. ++ `coef`: Coefficients of wavelet transform. When providing this parameter, use comma ',' to split them, and leave no spaces or other punctuations. ++ `layer`: Times to transform. The number of output vectors equals $layer+1$. Default is 1. + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. + +**Note:** The length of input series must be an integer number power of 2. 
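+
+As an example of a single transform layer, the Haar wavelet maps each pair of adjacent points to one approximation coefficient and one detail coefficient:
+
+$$ a_i = \frac{x_{2i} + x_{2i+1}}{\sqrt{2}}, \qquad d_i = \frac{x_{2i} - x_{2i+1}}{\sqrt{2}} $$
+
+With `layer`=1 the first half of the output holds the approximation coefficients and the second half holds the detail coefficients, as in the example below.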
+ +#### Examples + + +##### Haar wavelet transform + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| +|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| ++-----------------------------+-------------------------------------+ +``` + +### FFT + +#### Registration statement + +```sql +create function fft as 'org.apache.iotdb.library.frequency.UDTFFFT' +``` + +#### Usage + +This function is used to calculate the fast Fourier transform (FFT) of a numerical series. + +**Name:** FFT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of FFT, which is 'uniform' (by default) or 'nonuniform'. If the value is 'uniform', the timestamps will be ignored and all data points will be regarded as equidistant. Thus, the equidistant fast Fourier transform algorithm will be applied. If the value is 'nonuniform' (TODO), the non-equidistant fast Fourier transform algorithm will be applied based on timestamps. ++ `result`: The result of FFT, which is 'real', 'imag', 'abs' or 'angle', corresponding to the real part, imaginary part, magnitude and phase angle. By default, the magnitude will be output. ++ `compress`: The parameter of compression, which is within (0,1]. It is the reserved energy ratio of lossy compression. By default, there is no compression. + + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. The timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + + +##### Uniform FFT + +With the default `type`, uniform FFT is applied. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select fft(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 19.999999960195904| +|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| +|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| +|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| ++-----------------------------+----------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, there are peaks in $k=4$ and $k=5$ of the output. 
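+
+These peak heights follow directly from the discrete Fourier transform
+
+$$ X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}, $$
+
+where a sinusoid of amplitude $A$ whose frequency matches bin $k$ contributes a magnitude of about $NA/2$: with $N=20$, the $2\sin(2\pi t/5)$ component gives $\approx 20$ at $k=4$ and the $\sin(2\pi t/4)$ component gives $\approx 10$ at $k=5$, matching the output above.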
+
+##### Uniform FFT with Compression
+
+Input series is the same as above, the SQL for query is shown below:
+
+```sql
+select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1
+```
+
+Output series:
+
+```
++-----------------------------+----------------------+----------------------+
+| Time| fft(root.test.d1.s1,| fft(root.test.d1.s1,|
+| | "result"="real",| "result"="imag",|
+| | "compress"="0.99")| "compress"="0.99")|
++-----------------------------+----------------------+----------------------+
+|1970-01-01T08:00:00.000+08:00| 0.0| 0.0|
+|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8|
+|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7|
+|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8|
+|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807|
+|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16|
+|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8|
++-----------------------------+----------------------+----------------------+
+```
+
+Note: Based on the conjugation of the Fourier transform result, only the first half of the compression result is reserved.
+According to the given parameter, data points are reserved from low frequency to high frequency until the reserved energy ratio exceeds it.
+The last data point is reserved to indicate the length of the series.
+
+### HighPass
+
+#### Registration statement
+
+```sql
+create function highpass as 'org.apache.iotdb.library.frequency.UDTFHighPass'
+```
+
+#### Usage
+
+This function performs high-pass filtering on the input series and extracts components above the cutoff frequency.
+The timestamps of input will be ignored and all data points will be regarded as equidistant.
+
+**Name:** HIGHPASS
+
+**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `wpass`: The normalized cutoff frequency, which is within (0,1). This parameter is required.
+
+**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input.
+
+**Note:** `NaN` in the input series will be ignored.
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=sin(2\pi t/4)$ after high-pass filtering. + +### IFFT + +#### Registration statement + +```sql +create function ifft as 'org.apache.iotdb.library.frequency.UDTFIFFT' +``` + +#### Usage + +This function treats the two input series as the real and imaginary part of a complex series, performs an inverse fast Fourier transform (IFFT), and outputs the real part of the result. +For the input format, please refer to the output format of `FFT` function. +Moreover, the compressed output of `FFT` function is also supported. + +**Name:** IFFT + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `start`: The start time of the output series with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is '1970-01-01 08:00:00'. ++ `interval`: The interval of the output series, which is a positive number with an unit. 
The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it is 1s. + +**Output:** Output a single series. The type is DOUBLE. It is strictly equispaced. The values are the results of IFFT. + +**Note:** If a row contains null points or `NaN`, it will be ignored. + +#### Examples + + +Input series: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +SQL for query: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| ++-----------------------------+-------------------------------------------------------+ +``` + +### LowPass + +#### Registration statement + +```sql +create function lowpass as 'org.apache.iotdb.library.frequency.UDTFLowPass' +``` + +#### Usage + +This function performs low-pass filtering on the input series and extracts components below the cutoff frequency. +The timestamps of input will be ignored and all data points will be regarded as equidistant. + +**Name:** LOWPASS + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `wpass`: The normalized cutoff frequency which values (0,1). This parameter cannot be lacked. + +**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input. + +**Note:** `NaN` in the input series will be ignored. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=2sin(2\pi t/5)$ after low-pass filtering. + + + +## Data Matching + +### Cov + +#### Registration statement + +```sql +create function cov as 'org.apache.iotdb.library.dmatch.UDAFCov' +``` + +#### Usage + +This function is used to calculate the population covariance. + +**Name:** COV + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population covariance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
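+
+The population covariance over the $n$ rows that remain after filtering is
+
+$$ \mathrm{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$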
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +### DTW + +#### Registration statement + +```sql +create function dtw as 'org.apache.iotdb.library.dmatch.UDAFDtw' +``` + +#### Usage + +This function is used to calculate the DTW distance between two input series. + +**Name:** DTW + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the DTW distance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `0` will be output. 
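+
+The distance is computed with the standard dynamic-programming recurrence over a cost matrix $D$, where $d(x_i, y_j)$ is a pointwise distance between the two series:
+
+$$ D(i, j) = d(x_i, y_j) + \min\{\, D(i-1, j),\ D(i, j-1),\ D(i-1, j-1) \,\} $$
+
+In the example below, the two series are constant at 1.0 and 2.0 over 20 aligned points, each pair differs by 1, so the optimal (diagonal) alignment accumulates a total distance of 20.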
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +### Pearson + +#### Registration statement + +```sql +create function pearson as 'org.apache.iotdb.library.dmatch.UDAFPearson' +``` + +#### Usage + +This function is used to calculate the Pearson Correlation Coefficient. + +**Name:** PEARSON + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the Pearson Correlation Coefficient. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
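+
+The coefficient can be verified outside of IoTDB with a small standalone sketch like the one below (illustrative only); as in the notes above, rows containing null or `NaN` are dropped before the computation.
+
+```java
+public class PearsonSketch {
+    // Pearson correlation of two aligned series, skipping rows with null or NaN.
+    static double pearson(Double[] x, Double[] y) {
+        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
+        long n = 0;
+        for (int i = 0; i < Math.min(x.length, y.length); i++) {
+            Double a = x[i], b = y[i];
+            if (a == null || b == null || a.isNaN() || b.isNaN()) continue;
+            sx += a; sy += b; sxx += a * a; syy += b * b; sxy += a * b;
+            n++;
+        }
+        if (n == 0) return Double.NaN;
+        double cov = sxy - sx * sy / n;   // n * covariance
+        double varX = sxx - sx * sx / n;  // n * variance of x
+        double varY = syy - sy * sy / n;  // n * variance of y
+        return cov / Math.sqrt(varX * varY);
+    }
+
+    public static void main(String[] args) {
+        Double[] x = {1.0, 2.0, 3.0, null, 5.0};
+        Double[] y = {2.0, 4.1, 5.9, 7.0, 10.2};
+        System.out.println(pearson(x, y)); // close to 1.0 for nearly linear data
+    }
+}
+```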
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +### PtnSym + +#### Registration statement + +```sql +create function ptnsym as 'org.apache.iotdb.library.dmatch.UDTFPtnSym' +``` + +#### Usage + +This function is used to find all symmetric subseries in the input whose degree of symmetry is less than the threshold. +The degree of symmetry is calculated by DTW. +The smaller the degree, the more symmetrical the series is. + +**Name:** PATTERNSYMMETRIC + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameter:** + ++ `window`: The length of the symmetric subseries. It's a positive integer and the default value is 10. ++ `threshold`: The threshold of the degree of symmetry. It's non-negative. Only the subseries whose degree of symmetry is below it will be output. By default, all subseries will be output. + + +**Output Series:** Output a single series. The type is DOUBLE. Each data point in the output series corresponds to a symmetric subseries. The output timestamp is the starting timestamp of the subseries and the output value is the degree of symmetry. 
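+
+One simple way to realize symmetry measured by DTW (not necessarily the library's exact definition) is to slide a window over the series and compare each window with its own reverse; a perfectly symmetric window then has degree 0. The sketch below is standalone illustration code and may report different windows than the built-in function.
+
+```java
+import java.util.Arrays;
+
+public class PtnSymSketch {
+    // DTW distance with |a - b| as the point-wise cost (see the DTW section above).
+    static double dtw(double[] a, double[] b) {
+        int n = a.length, m = b.length;
+        double[][] d = new double[n + 1][m + 1];
+        for (double[] row : d) Arrays.fill(row, Double.POSITIVE_INFINITY);
+        d[0][0] = 0.0;
+        for (int i = 1; i <= n; i++)
+            for (int j = 1; j <= m; j++)
+                d[i][j] = Math.abs(a[i - 1] - b[j - 1])
+                        + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
+        return d[n][m];
+    }
+
+    public static void main(String[] args) {
+        double[] s = {1, 2, 3, 2, 1, 5, 6, 7};
+        int window = 5;
+        double threshold = 0.0;
+        for (int start = 0; start + window <= s.length; start++) {
+            double[] w = Arrays.copyOfRange(s, start, start + window);
+            double[] rev = new double[window];
+            for (int i = 0; i < window; i++) rev[i] = w[window - 1 - i];
+            double degree = dtw(w, rev); // 0 for a perfectly symmetric window
+            if (degree <= threshold) {
+                System.out.println("start index " + start + ", degree " + degree);
+            }
+        }
+        // prints only "start index 0, degree 0.0" for this toy series
+    }
+}
+```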
+ +#### Example + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| +|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +### XCorr + +#### Registration statement + +```sql +create function xcorr as 'org.apache.iotdb.library.dmatch.UDTFXCorr' +``` + +#### Usage + +This function is used to calculate the cross correlation function of given two time series. +For discrete time series, cross correlation is given by +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +which represent the similarities between two series with different index shifts. + +**Name:** XCORR + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series with DOUBLE as datatype. +There are $2N-1$ data points in the series, the center of which represents the cross correlation +calculated with pre-aligned series(that is $CR(0)$ in the formula above), +and the previous(or post) values represent those with shifting the latter series forward(or backward otherwise) +until the two series are no longer overlapped(not included). +In short, the values of output series are given by(index starts from 1) +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
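+
+The output layout described above can be reproduced with the following standalone sketch, which pads null and `NaN` values with 0 and evaluates $OS[i]$ for $i = 1, \dots, 2N-1$ directly from the two formulas.
+
+```java
+public class XCorrSketch {
+    // Cross correlation of two aligned series of length N; the result has 2N-1
+    // values and nulls/NaN are treated as 0, as stated in the note above.
+    static double[] xcorr(Double[] s1, Double[] s2) {
+        int n = s1.length;
+        double[] a = new double[n], b = new double[n];
+        for (int i = 0; i < n; i++) {
+            a[i] = (s1[i] == null || s1[i].isNaN()) ? 0.0 : s1[i];
+            b[i] = (s2[i] == null || s2[i].isNaN()) ? 0.0 : s2[i];
+        }
+        double[] out = new double[2 * n - 1];
+        for (int i = 1; i <= 2 * n - 1; i++) {
+            double sum = 0;
+            if (i <= n) {
+                for (int m = 1; m <= i; m++) sum += a[m - 1] * b[n - i + m - 1];
+            } else {
+                for (int m = 1; m <= 2 * n - i; m++) sum += a[i - n + m - 1] * b[m - 1];
+            }
+            out[i - 1] = sum / n; // always normalised by N
+        }
+        return out;
+    }
+
+    public static void main(String[] args) {
+        Double[] s1 = {null, 2.0, 3.0, 4.0, 5.0};
+        Double[] s2 = {6.0, 7.0, Double.NaN, 9.0, 10.0};
+        for (double v : xcorr(s1, s2)) System.out.print(v + " ");
+        // prints 0.0 4.0 9.6 13.4 20.0 15.6 9.2 11.8 6.0, matching the example below
+    }
+}
+```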
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` + + + +## Data Repairing + +### TimestampRepair + +#### Registration statement + +```sql +create function timestamprepair as 'org.apache.iotdb.library.drepair.UDTFTimestampRepair' +``` + +#### Usage + +This function is used for timestamp repair. +According to the given standard time interval, +the method of minimizing the repair cost is adopted. +By fine-tuning the timestamps, +the original data with unstable timestamp interval is repaired to strictly equispaced data. +If no standard time interval is given, +this function will use the **median**, **mode** or **cluster** of the time interval to estimate the standard time interval. + +**Name:** TIMESTAMPREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `interval`: The standard time interval whose unit is millisecond. It is a positive integer. By default, it will be estimated according to the given method. ++ `method`: The method to estimate the standard time interval, which is 'median', 'mode' or 'cluster'. This parameter is only valid when `interval` is not given. By default, median will be used. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +#### Examples + +##### Manually Specify the Standard Time Interval + +When `interval` is given, this function repairs according to the given standard time interval. 
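+
+Before walking through the data, the sketch below illustrates the effect in its simplest form: every timestamp is snapped to the nearest point of an equispaced grid anchored at the first timestamp. This is only an approximation for illustration; the built-in function chooses the repairs that minimise the total repair cost rather than rounding each point independently.
+
+```java
+public class TimestampRepairSketch {
+    // Snap each timestamp to the nearest multiple of `interval` after the first
+    // timestamp. Simplified: the real function minimises the total repair cost.
+    static long[] snapToGrid(long[] timestamps, long interval) {
+        long[] repaired = new long[timestamps.length];
+        long start = timestamps[0];
+        for (int i = 0; i < timestamps.length; i++) {
+            long k = Math.round((timestamps[i] - start) / (double) interval);
+            repaired[i] = start + k * interval;
+        }
+        return repaired;
+    }
+
+    public static void main(String[] args) {
+        // Offsets in ms from 12:00:00; 12:00:19 and 12:01:01 drift off the 10 s grid.
+        long[] ts = {0, 10_000, 19_000, 30_000, 40_000, 50_000, 61_000, 71_000, 81_000, 91_000};
+        for (long t : snapToGrid(ts, 10_000)) System.out.print(t + " ");
+        // prints 0 10000 20000 30000 40000 50000 60000 70000 80000 90000
+    }
+}
+```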
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:19.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:01.000+08:00| 7.0| +|2021-07-01T12:01:11.000+08:00| 8.0| +|2021-07-01T12:01:21.000+08:00| 9.0| +|2021-07-01T12:01:31.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timestamprepair(s1,'interval'='10000') from root.test.d2 +``` + +Output series: + + +``` ++-----------------------------+----------------------------------------------------+ +| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| ++-----------------------------+----------------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+----------------------------------------------------+ +``` + +##### Automatically Estimate the Standard Time Interval + +When `interval` is default, this function estimates the standard time interval. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select timestamprepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------+ +| Time|timestamprepair(root.test.d2.s1)| ++-----------------------------+--------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +### ValueFill + +#### Registration statement + +```sql +create function valuefill as 'org.apache.iotdb.library.drepair.UDTFValueFill' +``` + +#### Usage + +This function is used to impute time series. Several methods are supported. + +**Name**: ValueFill +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, default "linear". + Method to use for imputation in series. "mean": use global mean value to fill holes; "previous": propagate last valid observation forward to next valid. "linear": simplest interpolation method; "likelihood":Maximum likelihood estimation based on the normal distribution of speed; "AR": auto regression; "MA": moving average; "SCREEN": speed constraint. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** AR method use AR(1) model. Input value should be auto-correlated, or the function would output a single point (0, 0.0). + +#### Examples + +##### Fill with linear + +When `method` is "linear" or the default, Screen method is used to impute. 
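+
+For orientation, plain linear interpolation over the gaps looks like the standalone sketch below. It is illustration only: the built-in function may treat boundaries and time weighting differently, so its numbers in the example that follows differ slightly from this naive version.
+
+```java
+public class LinearFillSketch {
+    // Replace NaN values by interpolating linearly in time between the nearest
+    // valid neighbours; values without a neighbour on both sides are left as is.
+    static double[] linearFill(long[] times, double[] values) {
+        double[] out = values.clone();
+        for (int i = 0; i < out.length; i++) {
+            if (!Double.isNaN(out[i])) continue;
+            int prev = i - 1, next = i + 1;
+            while (prev >= 0 && Double.isNaN(values[prev])) prev--;
+            while (next < values.length && Double.isNaN(values[next])) next++;
+            if (prev >= 0 && next < values.length) {
+                double w = (times[i] - times[prev]) / (double) (times[next] - times[prev]);
+                out[i] = values[prev] + w * (values[next] - values[prev]);
+            }
+        }
+        return out;
+    }
+
+    public static void main(String[] args) {
+        long[] t = {14_000, 15_000, 16_000, 18_000, 20_000, 22_000, 26_000};
+        double[] v = {Double.NaN, 113, 114, 116, Double.NaN, Double.NaN, 124};
+        for (double x : linearFill(t, v)) System.out.print(x + " ");
+        // the leading NaN is kept; the gaps at 20 s and 22 s become 118.0 and 120.0
+    }
+}
+```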
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuefill(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +##### Previous Fill + +When `method` is "previous", previous method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +### ValueRepair + +#### Registration statement + +```sql +create function valuerepair as 'org.apache.iotdb.library.drepair.UDTFValueRepair' +``` + +#### Usage + +This function is used to repair the value of the time series. +Currently, two methods are supported: +**Screen** is a method based on speed threshold, which makes all speeds meet the threshold requirements under the premise of minimum changes; +**LsGreedy** is a method based on speed change likelihood, which models speed changes as Gaussian distribution, and uses a greedy algorithm to maximize the likelihood. + + +**Name:** VALUEREPAIR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. 
+ +**Parameters:** + ++ `method`: The method used to repair, which is 'Screen' or 'LsGreedy'. By default, Screen is used. ++ `minSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds below it will be regarded as outliers. By default, it is the median minus 3 times of median absolute deviation. ++ `maxSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds above it will be regarded as outliers. By default, it is the median plus 3 times of median absolute deviation. ++ `center`: This parameter is only valid with LsGreedy. It is the center of the Gaussian distribution of speed changes. By default, it is 0. ++ `sigma`: This parameter is only valid with LsGreedy. It is the standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +#### Examples + +##### Repair with Screen + +When `method` is 'Screen' or the default, Screen method is used. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +##### Repair with LsGreedy + +When `method` is 'LsGreedy', LsGreedy method is used. 
+ +Input series is the same as above, the SQL for query is shown below: + +```sql +select valuerepair(s1,'method'='LsGreedy') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")| ++-----------------------------+-------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------------+ +``` + +### MasterRepair + +#### Usage + +This function is used to clean time series with master data. + +**Name**: MasterRepair +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `omega`: The window size. It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences. ++ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows. ++ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple dis- tance of the k-th nearest neighbor in the master data. ++ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. 
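+
+A heavily simplified sketch of the idea is shown below; it ignores the `omega` window, uses only the single nearest master tuple instead of `k` neighbours, and all names are illustrative rather than taken from the library.
+
+```java
+public class MasterRepairSketch {
+    static double distance(double[] a, double[] b) {
+        double s = 0;
+        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
+        return Math.sqrt(s);
+    }
+
+    // For each observed tuple, find the closest master tuple; if it is farther
+    // away than eta, the tuple is treated as dirty and replaced by that master
+    // tuple. Only the first column is returned, mimicking output_column = 1.
+    static double[] repairFirstColumn(double[][] observed, double[][] master, double eta) {
+        double[] out = new double[observed.length];
+        for (int i = 0; i < observed.length; i++) {
+            double best = Double.POSITIVE_INFINITY;
+            double[] bestMaster = null;
+            for (double[] m : master) {
+                double d = distance(observed[i], m);
+                if (d < best) { best = d; bestMaster = m; }
+            }
+            out[i] = (best > eta) ? bestMaster[0] : observed[i][0];
+        }
+        return out;
+    }
+
+    public static void main(String[] args) {
+        double[][] observed = {{1704, 1154.55, 0.195}, {1694, 1151.55, 0.193}};
+        double[][] master   = {{1704, 1154.55, 0.195}, {1704, 1151.55, 0.193}};
+        for (double v : repairFirstColumn(observed, master, 5.0)) System.out.print(v + " ");
+        // the second tuple is far from every master tuple, so 1694 is repaired to 1704
+    }
+}
+```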
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +SQL for query: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +Output series: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +### SeasonalRepair + +#### Usage +This function is used to repair the value of the seasonal time series via decomposition. Currently, two methods are supported: **Classical** - detect irregular fluctuations through residual component decomposed by classical decomposition, and repair them through moving average; **Improved** - detect irregular fluctuations through residual component decomposed by improved decomposition, and repair them through moving median. + +**Name:** SEASONALREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The decomposition method used to repair, which is 'Classical' or 'Improved'. By default, classical decomposition is used. ++ `period`: It is the period of the time series. ++ `k`: It is the range threshold of residual term, which limits the degree to which the residual term is off-center. By default, it is 9. ++ `max_iter`: It is the maximum number of iterations for the algorithm. By default, it is 10. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +#### Examples + +##### Repair with Classical + +When `method` is 'Classical' or default value, classical decomposition method is used. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +##### Repair with Improved +When `method` is 'Improved', improved decomposition method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` + + + +## Series Discovery + +### ConsecutiveSequences + +#### Registration statement + +```sql +create function consecutivesequences as 'org.apache.iotdb.library.series.UDTFConsecutiveSequences' +``` + +#### Usage + +This function is used to find locally longest consecutive subsequences in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive subsequence is the subsequence that is strictly equispaced with the standard time interval without any missing data. If a consecutive subsequence is not a proper subsequence of any consecutive subsequence, it is locally longest. 
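+
+Conceptually, finding the locally longest consecutive subsequences only requires one scan over the rows, cutting a run whenever the time gap deviates from the standard interval or a row contains a null. The standalone sketch below (illustrative names, not library code) reproduces the counts of the example further down.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+public class ConsecutiveSequencesSketch {
+    // times[i] is the timestamp of row i; complete[i] is true when row i has no null.
+    // A run is cut when the gap differs from `gap` or a row is incomplete.
+    static List<long[]> locallyLongestRuns(long[] times, boolean[] complete, long gap) {
+        List<long[]> result = new ArrayList<>(); // each entry: {start timestamp, length}
+        int i = 0;
+        while (i < times.length) {
+            if (!complete[i]) { i++; continue; }
+            int j = i;
+            while (j + 1 < times.length && complete[j + 1] && times[j + 1] - times[j] == gap) j++;
+            result.add(new long[]{times[i], j - i + 1});
+            i = j + 1;
+        }
+        return result;
+    }
+
+    public static void main(String[] args) {
+        long fiveMinutes = 5 * 60 * 1000L;
+        long[] t = {0, 5, 10, 20, 25, 30, 35, 40, 45, 50};
+        for (int k = 0; k < t.length; k++) t[k] *= 60 * 1000L; // minutes to milliseconds
+        boolean[] complete = {true, true, true, true, true, true, true, false, true, true};
+        for (long[] run : locallyLongestRuns(t, complete, fiveMinutes)) {
+            System.out.println("start=" + run[0] + " length=" + run[1]);
+        }
+        // three runs of length 3, 4 and 2, as in the example below
+    }
+}
+```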
+ +**Name:** CONSECUTIVESEQUENCES + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a locally longest consecutive subsequence. The output timestamp is the starting timestamp of the subsequence and the output value is the number of data points in the subsequence. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +#### Examples + +##### Manually Specify the Standard Time Interval + +It's able to manually specify the standard time interval by the parameter `gap`. It's notable that false parameter leads to false output. + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + + +##### Automatically Estimate the Standard Time Interval + +When `gap` is default, this function estimates the standard time interval by the mode of time intervals and gets the same results. Therefore, this usage is more recommended. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +### ConsecutiveWindows + +#### Registration statement + +```sql +create function consecutivewindows as 'org.apache.iotdb.library.series.UDTFConsecutiveWindows' +``` + +#### Usage + +This function is used to find consecutive windows of specified length in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. 
Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive window is the subsequence that is strictly equispaced with the standard time interval without any missing data. + +**Name:** CONSECUTIVEWINDOWS + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. ++ `length`: The length of the window which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a consecutive window. The output timestamp is the starting timestamp of the window and the output value is the number of data points in the window. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +#### Examples + + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` + + + +## Machine Learning + +### AR + +#### Registration statement + +```sql +create function ar as 'org.apache.iotdb.library.dlearn.UDTFAR' +``` + +#### Usage + +This function is used to learn the coefficients of the autoregressive models for a time series. + +**Name:** AR + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `p`: The order of the autoregressive model. Its default value is 1. + +**Output Series:** Output a single series. The type is DOUBLE. The first line corresponds to the first order coefficient, and so on. + +**Note:** + +- Parameter `p` should be a positive integer. +- Most points in the series should be sampled at a constant time interval. +- Linear interpolation is applied for the missing points in the series. 
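+
+For the simplest case `p` = 1, the coefficient can be estimated directly from the lag-1 autocovariance, as in the small standalone sketch below. Higher orders require solving the Yule-Walker equations and are omitted; the estimator here is illustrative and need not coincide with the library's numbers.
+
+```java
+public class ArSketch {
+    // Estimate the AR(1) coefficient as lag-1 autocovariance divided by variance,
+    // i.e. the Yule-Walker estimate for the first-order case.
+    static double ar1(double[] x) {
+        double mean = 0;
+        for (double v : x) mean += v;
+        mean /= x.length;
+        double num = 0, den = 0;
+        for (int i = 0; i < x.length; i++) {
+            den += (x[i] - mean) * (x[i] - mean);
+            if (i > 0) num += (x[i] - mean) * (x[i - 1] - mean);
+        }
+        return num / den;
+    }
+
+    public static void main(String[] args) {
+        double[] ramp = {-4, -3, -2, -1, 0, 1, 2, 3, 4};
+        System.out.println(ar1(ramp)); // about 0.67: the ramp is positively autocorrelated
+    }
+}
+```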
+ +#### Examples + +##### Assigning Model Order + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +### Representation + +#### Usage + +This function is used to represent a time series. + +**Name:** Representation + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is INT32. The length is `tb*vb`. The timestamps starting from 0 only indicate the order. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. + +#### Examples + +##### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +### RM + +#### Usage + +This function is used to calculate the matching score of two time series according to the representation. + +**Name:** RM + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the matching score. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. 
+ +#### Examples + +##### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| ++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/UserGuide/V2.0.1/Tree/SQL-Manual/Function-and-Expression.md b/src/UserGuide/V2.0.1/Tree/SQL-Manual/Function-and-Expression.md new file mode 100644 index 00000000..8fba38ac --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/SQL-Manual/Function-and-Expression.md @@ -0,0 +1,3014 @@ +# Function and Expression + + + +## Arithmetic Operators and Functions + +### Arithmetic Operators + +#### Unary Arithmetic Operators + +Supported operators: `+`, `-` + +Supported input data types: `INT32`, `INT64` and `FLOAT` + +Output data type: consistent with the input data type + +#### Binary Arithmetic Operators + +Supported operators: `+`, `-`, `*`, `/`, `%` + +Supported input data types: `INT32`, `INT64`, `FLOAT` and `DOUBLE` + +Output data type: `DOUBLE` + +Note: Only when the left operand and the right operand under a certain timestamp are not `null`, the binary arithmetic operation will have an output value. 
+ +#### Example + +```sql +select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 +``` + +Result: + +``` ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +| Time|root.sg.d1.s1|-root.sg.d1.s1|root.sg.d1.s2|root.sg.d1.s2|root.sg.d1.s1 + root.sg.d1.s2|root.sg.d1.s1 - root.sg.d1.s2|root.sg.d1.s1 * root.sg.d1.s2|root.sg.d1.s1 / root.sg.d1.s2|root.sg.d1.s1 % root.sg.d1.s2| ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| -1.0| 1.0| 1.0| 2.0| 0.0| 1.0| 1.0| 0.0| +|1970-01-01T08:00:00.002+08:00| 2.0| -2.0| 2.0| 2.0| 4.0| 0.0| 4.0| 1.0| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.0| -3.0| 3.0| 3.0| 6.0| 0.0| 9.0| 1.0| 0.0| +|1970-01-01T08:00:00.004+08:00| 4.0| -4.0| 4.0| 4.0| 8.0| 0.0| 16.0| 1.0| 0.0| +|1970-01-01T08:00:00.005+08:00| 5.0| -5.0| 5.0| 5.0| 10.0| 0.0| 25.0| 1.0| 0.0| ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.014s +``` + +### Arithmetic Functions + +Currently, IoTDB supports the following mathematical functions. The behavior of these mathematical functions is consistent with the behavior of these functions in the Java Math standard library. + +| Function Name | Allowed Input Series Data Types | Output Series Data Type | Required Attributes | Corresponding Implementation in the Java Standard Library | +| ------------- | ------------------------------- | ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sin(double) | +| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cos(double) | +| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tan(double) | +| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#asin(double) | +| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#acos(double) | +| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#atan(double) | +| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sinh(double) | +| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cosh(double) | +| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tanh(double) | +| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toDegrees(double) | +| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toRadians(double) | +| ABS | INT32 / INT64 / FLOAT / DOUBLE | Same type as the input series | / | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | +| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#signum(double) | +| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#ceil(double) | +| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#floor(double) | +| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | 'places' : Round the significant number, positive number is the significant number after the decimal point, negative number is the significant number of whole 
number | Math#rint(Math#pow(10,places))/Math#pow(10,places) | +| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#exp(double) | +| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log(double) | +| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log10(double) | +| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sqrt(double) | + +Example: + +``` sql +select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; +``` + +Result: + +``` ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +| Time| root.sg1.d1.s1|sin(root.sg1.d1.s1)| cos(root.sg1.d1.s1)|tan(root.sg1.d1.s1)| ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +|2020-12-10T17:11:49.037+08:00|7360723084922759782| 0.8133527237573284| 0.5817708713544664| 1.3980636773094157| +|2020-12-10T17:11:49.038+08:00|4377791063319964531|-0.8938962705202537| 0.4482738644511651| -1.994085181866842| +|2020-12-10T17:11:49.039+08:00|7972485567734642915| 0.9627757585308978|-0.27030138509681073|-3.5618602479083545| +|2020-12-10T17:11:49.040+08:00|2508858212791964081|-0.6073417341629443| -0.7944406950452296| 0.7644897069734913| +|2020-12-10T17:11:49.041+08:00|2817297431185141819|-0.8419358900502509| -0.5395775727782725| 1.5603611649667768| ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +Total line number = 5 +It costs 0.008s +``` + +#### ROUND +Example: +```sql +select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1 +``` + +```sql ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +| Time|root.db.d1.s4|ROUND(root.db.d1.s4)|ROUND(root.db.d1.s4,2)|ROUND(root.db.d1.s4,-1)| ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +|1970-01-01T08:00:00.001+08:00| 101.14345| 101.0| 101.14| 100.0| +|1970-01-01T08:00:00.002+08:00| 20.144346| 20.0| 20.14| 20.0| +|1970-01-01T08:00:00.003+08:00| 20.614372| 21.0| 20.61| 20.0| +|1970-01-01T08:00:00.005+08:00| 20.814346| 21.0| 20.81| 20.0| +|1970-01-01T08:00:00.006+08:00| 60.71443| 61.0| 60.71| 60.0| +|2023-03-13T16:16:19.764+08:00| 10.143425| 10.0| 10.14| 10.0| ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +Total line number = 6 +It costs 0.059s +``` + + + +## Comparison Operators and Functions + +### Basic comparison operators + +Supported operators `>`, `>=`, `<`, `<=`, `==`, `!=` (or `<>` ) + +Supported input data types: `INT32`, `INT64`, `FLOAT` and `DOUBLE` + +Note: It will transform all type to `DOUBLE` then do computation. 
+ +Output data type: `BOOLEAN` + +**Example:** + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +``` +IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| +|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| +|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| +|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| +|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| +|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +``` + +### `BETWEEN ... AND ...` operator + +|operator |meaning| +|-----------------------------|-----------| +|`BETWEEN ... AND ...` |within the specified range| +|`NOT BETWEEN ... AND ...` |Not within the specified range| + +**Example:** Select data within or outside the interval [36.5,40]: + +```sql +select temperature from root.sg1.d1 where temperature between 36.5 and 40; +``` + +```sql +select temperature from root.sg1.d1 where temperature not between 36.5 and 40; +``` + +### Fuzzy matching operator + +For TEXT type data, support fuzzy matching of data using `Like` and `Regexp` operators. + +|operator |meaning| +|-----------------------------|-----------| +|`LIKE` | matches simple patterns| +|`NOT LIKE` |cannot match simple pattern| +|`REGEXP` | Match regular expression| +|`NOT REGEXP` |Cannot match regular expression| + +Input data type: `TEXT` + +Return type: `BOOLEAN` + +#### Use `Like` for fuzzy matching + +**Matching rules:** + +- `%` means any 0 or more characters. +- `_` means any single character. + +**Example 1:** Query the data under `root.sg.d1` that contains `'cc'` in `value`. + +```shell +IoTDB> select * from root.sg.d1 where value like '%cc%' ++--------------------------+----------------+ +| Time|root.sg.d1.value| ++--------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++--------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 2:** Query the data under `root.sg.d1` with `'b'` in the middle of `value` and any single character before and after. + +```shell +IoTDB> select * from root.sg.device where value like '_b_' ++--------------------------+----------------+ +| Time|root.sg.d1.value| ++--------------------------+----------------+ +|2017-11-01T00:00:02.000+08:00|abc| ++--------------------------+----------------+ +Total line number = 1 +It costs 0.002s +``` + +#### Use `Regexp` for fuzzy matching + +The filter condition that needs to be passed in is **Java standard library style regular expression**. 
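+
+Because the pattern syntax follows the Java standard library, a pattern can be tried out with `java.util.regex` before using it in a query, for example:
+
+```java
+import java.util.regex.Pattern;
+
+public class RegexpSketch {
+    public static void main(String[] args) {
+        // Anchored pattern: the whole string must consist of English letters only.
+        Pattern lettersOnly = Pattern.compile("^[A-Za-z]+$");
+        System.out.println(lettersOnly.matcher("aabbccdd").find()); // true
+        System.out.println(lettersOnly.matcher("123test").find());  // false
+    }
+}
+```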
+ +**Common regular matching examples:** + +``` +All characters with a length of 3-20: ^.{3,20}$ +Uppercase English characters: ^[A-Z]+$ +Numbers and English characters: ^[A-Za-z0-9]+$ +Starting with a: ^a.* +``` + +**Example 1:** Query the string of 26 English characters for value under root.sg.d1. + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' ++--------------------------+----------------+ +| Time|root.sg.d1.value| ++--------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++--------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 2:** Query root.sg.d1 where the value is a string consisting of 26 lowercase English characters and the time is greater than 100. + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 ++--------------------------+----------------+ +| Time|root.sg.d1.value| ++--------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++--------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 3:** + +```sql +select b, b like '1%', b regexp '[0-2]' from root.test; +``` + +operation result +``` ++-----------------------------+-----------+------- ------------------+--------------------------+ +| Time|root.test.b|root.test.b LIKE '^1.*?$'|root.test.b REGEXP '[0-2]'| ++-----------------------------+-----------+------- ------------------+--------------------------+ +|1970-01-01T08:00:00.001+08:00| 111test111| true| true| +|1970-01-01T08:00:00.003+08:00| 333test333| false| false| ++-----------------------------+-----------+------- ------------------+--------------------------+ +``` + +### `IS NULL` operator + +|operator |meaning| +|-----------------------------|-----------| +|`IS NULL` |is a null value| +|`IS NOT NULL` |is not a null value| + +**Example 1:** Select data with empty values: + +```sql +select code from root.sg1.d1 where temperature is null; +``` + +**Example 2:** Select data with non-null values: + +```sql +select code from root.sg1.d1 where temperature is not null; +``` + +### `IN` operator + +|operator |meaning| +|-----------------------------|-----------| +|`IN` / `CONTAINS` | are the values ​​in the specified list| +|`NOT IN` / `NOT CONTAINS` |not a value in the specified list| + +Input data type: `All Types` + +return type `BOOLEAN` + +**Note: Please ensure that the values ​​in the collection can be converted to the type of the input data. 
**
+>
+> For example:
+>
+> `s1 in (1, 2, 3, 'test')`, the data type of `s1` is `INT32`
+>
+> We will throw an exception because `'test'` cannot be converted to type `INT32`
+
+**Example 1:** Select data with values within a certain range:
+
+```sql
+select code from root.sg1.d1 where code in ('200', '300', '400', '500');
+```
+
+**Example 2:** Select data with values outside a certain range:
+
+```sql
+select code from root.sg1.d1 where code not in ('200', '300', '400', '500');
+```
+
+**Example 3:**
+
+```sql
+select a, a in (1, 2) from root.test;
+```
+
+Output:
+```
++-----------------------------+-----------+--------------------+
+| Time|root.test.a|root.test.a IN (1,2)|
++-----------------------------+-----------+--------------------+
+|1970-01-01T08:00:00.001+08:00|          1|                true|
+|1970-01-01T08:00:00.003+08:00|          3|               false|
++-----------------------------+-----------+--------------------+
+```
+
+### Condition Functions
+
+Condition functions are used to check whether timeseries data points satisfy some specific condition.
+
+They return BOOLEANs.
+
+Currently, IoTDB supports the following condition functions:
+
+| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description |
+| ------------- | ------------------------------- | --------------------------------------------- | ----------------------- | --------------------------------------------- |
+| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`: a double type variate | BOOLEAN | Return `ts_value >= threshold`. |
+| IN_RANGE | INT32 / INT64 / FLOAT / DOUBLE | `lower`: DOUBLE type
`upper`: DOUBLE type | BOOLEAN | Return `ts_value >= lower && value <= upper`. | + +Example Data: +``` +IoTDB> select ts from root.test; ++-----------------------------+------------+ +| Time|root.test.ts| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 3| +|1970-01-01T08:00:00.004+08:00| 4| ++-----------------------------+------------+ +``` + +#### Test 1 +SQL: +```sql +select ts, on_off(ts, 'threshold'='2') from root.test; +``` + +Output: +``` +IoTDB> select ts, on_off(ts, 'threshold'='2') from root.test; ++-----------------------------+------------+-------------------------------------+ +| Time|root.test.ts|on_off(root.test.ts, "threshold"="2")| ++-----------------------------+------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| false| +|1970-01-01T08:00:00.002+08:00| 2| true| +|1970-01-01T08:00:00.003+08:00| 3| true| +|1970-01-01T08:00:00.004+08:00| 4| true| ++-----------------------------+------------+-------------------------------------+ +``` + +#### Test 2 +Sql: +```sql +select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; +``` + +Output: +``` +IoTDB> select ts, in_range(ts,'lower'='2', 'upper'='3.1') from root.test; ++-----------------------------+------------+--------------------------------------------------+ +| Time|root.test.ts|in_range(root.test.ts, "lower"="2", "upper"="3.1")| ++-----------------------------+------------+--------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| false| +|1970-01-01T08:00:00.002+08:00| 2| true| +|1970-01-01T08:00:00.003+08:00| 3| true| +|1970-01-01T08:00:00.004+08:00| 4| false| ++-----------------------------+------------+--------------------------------------------------+ +``` + + + +## Logical Operators + +### Unary Logical Operators + +Supported operator `!` + +Supported input data types: `BOOLEAN` + +Output data type: `BOOLEAN` + +Hint: the priority of `!` is the same as `-`. Remember to use brackets to modify priority. + +### Binary Logical Operators + +Supported operators AND:`and`,`&`, `&&`; OR:`or`,`|`,`||` + +Supported input data types: `BOOLEAN` + +Output data type: `BOOLEAN` + +Note: Only when the left operand and the right operand under a certain timestamp are both `BOOLEAN` type, the binary logic operation will have an output value. 
+ +**Example:** + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +Output: +``` +IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| +|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| +|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| +|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| +|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| +|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +``` + + + +## Aggregate Functions + +Aggregate functions are many-to-one functions. They perform aggregate calculations on a set of values, resulting in a single aggregated result. + +All aggregate functions except `COUNT()`, `COUNT_IF()` ignore null values and return null when there are no input rows or all values are null. For example, `SUM()` returns null instead of zero, and `AVG()` does not include null values in the count. + +The aggregate functions supported by IoTDB are as follows: + +| Function Name | Description | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | +| ------------- | ------------------------------------------------------------ |-----------------------------------------------------| ------------------------------------------------------------ | ----------------------------------- | +| SUM | Summation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| COUNT | Counts the number of data points. | All data types | / | INT | +| AVG | Average. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| EXTREME | Finds the value with the largest absolute value. Returns a positive value if the maximum absolute value of positive and negative values is equal. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| MAX_VALUE | Find the maximum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| MIN_VALUE | Find the minimum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| FIRST_VALUE | Find the value with the smallest timestamp. | All data types | / | Consistent with input data type | +| LAST_VALUE | Find the value with the largest timestamp. | All data types | / | Consistent with input data type | +| MAX_TIME | Find the maximum timestamp. | All data Types | / | Timestamp | +| MIN_TIME | Find the minimum timestamp. | All data Types | / | Timestamp | +| COUNT_IF | Find the number of data points that continuously meet a given condition and the number of data points that meet the condition (represented by keep) meet the specified threshold. 
| BOOLEAN | `[keep >=/>/=/!=/<=/<]threshold`: the specified threshold or threshold condition; using `threshold` alone is equivalent to `keep >= threshold`, and the type of `threshold` is `INT64`. `ignoreNull`: optional, default value is `true`; if the value is `true`, null values are ignored, meaning that a null value in the middle is skipped without breaking the continuity; if the value is `false`, null values are not ignored, meaning that a null value in the middle breaks the continuity | INT64 |
+| TIME_DURATION | Find the difference between the timestamp of the largest non-null value and the timestamp of the smallest non-null value in a column | All data Types | / | INT64 |
+| MODE | Find the mode. Note: 1. Having too many different values in the input series risks a memory exception; 2. If all the elements have the same number of occurrences, i.e. there is no mode, return the value with the earliest time; 3. If there are multiple modes, return the one with the earliest time. | All data Types | / | Consistent with the input data type |
+| STDDEV | Calculate the overall standard deviation of the data. Note:
Missing points, null points and `NaN` in the input series will be ignored.| INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| COUNT_TIME | The number of timestamps in the query data set. When used with `align by device`, the result is the number of timestamps in the data set per device. | All data Types, the input parameter can only be `*` | / | INT64 | + + +### COUNT + +#### example + +```sql +select count(status) from root.ln.wf01.wt01; +``` +Result: + +``` ++-------------------------------+ +|count(root.ln.wf01.wt01.status)| ++-------------------------------+ +| 10080| ++-------------------------------+ +Total line number = 1 +It costs 0.016s +``` + +### COUNT_IF + +#### Grammar +```sql +count_if(predicate, [keep >=/>/=/!=/Note: count_if is not supported to use with SlidingWindow in group by time now + +#### example + +##### raw data + +``` ++-----------------------------+-------------+-------------+ +| Time|root.db.d1.s1|root.db.d1.s2| ++-----------------------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 0| 0| +|1970-01-01T08:00:00.002+08:00| null| 0| +|1970-01-01T08:00:00.003+08:00| 0| 0| +|1970-01-01T08:00:00.004+08:00| 0| 0| +|1970-01-01T08:00:00.005+08:00| 1| 0| +|1970-01-01T08:00:00.006+08:00| 1| 0| +|1970-01-01T08:00:00.007+08:00| 1| 0| +|1970-01-01T08:00:00.008+08:00| 0| 0| +|1970-01-01T08:00:00.009+08:00| 0| 0| +|1970-01-01T08:00:00.010+08:00| 0| 0| ++-----------------------------+-------------+-------------+ +``` + +##### Not use `ignoreNull` attribute (Ignore Null) + +SQL: +```sql +select count_if(s1=0 & s2=0, 3), count_if(s1=1 & s2=0, 3) from root.db.d1 +``` + +Result: +``` ++--------------------------------------------------+--------------------------------------------------+ +|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3)|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3)| ++--------------------------------------------------+--------------------------------------------------+ +| 2| 1| ++--------------------------------------------------+-------------------------------------------------- +``` + +##### Use `ignoreNull` attribute + +SQL: +```sql +select count_if(s1=0 & s2=0, 3, 'ignoreNull'='false'), count_if(s1=1 & s2=0, 3, 'ignoreNull'='false') from root.db.d1 +``` + +Result: +``` ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")| ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +| 1| 1| ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +``` + +### TIME_DURATION +#### Grammar +```sql + time_duration(Path) +``` +#### Example +##### raw data +```sql ++----------+-------------+ +| Time|root.db.d1.s1| ++----------+-------------+ +| 1| 70| +| 3| 10| +| 4| 303| +| 6| 110| +| 7| 302| +| 8| 110| +| 9| 60| +| 10| 70| +|1677570934| 30| ++----------+-------------+ +``` +##### Insert sql +```sql +"CREATE DATABASE root.db", +"CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN tags(city=Beijing)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1, 2, 10, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(2, null, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(3, 10, 0, null)", 
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(4, 303, 30, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(5, null, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(6, 110, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(7, 302, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(8, 110, null, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(9, 60, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(10,70, 20, null)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1677570934, 30, 0, true)", +``` + +SQL: +```sql +select time_duration(s1) from root.db.d1 +``` + +Result: +``` ++----------------------------+ +|time_duration(root.db.d1.s1)| ++----------------------------+ +| 1677570933| ++----------------------------+ +``` +> Note: Returns 0 if there is only one data point, or null if the data point is null. + +### COUNT_TIME +#### Grammar +```sql + count_time(*) +``` +#### Example +##### raw data +``` ++----------+-------------+-------------+-------------+-------------+ +| Time|root.db.d1.s1|root.db.d1.s2|root.db.d2.s1|root.db.d2.s2| ++----------+-------------+-------------+-------------+-------------+ +| 0| 0| null| null| 0| +| 1| null| 1| 1| null| +| 2| null| 2| 2| null| +| 4| 4| null| null| 4| +| 5| 5| 5| 5| 5| +| 7| null| 7| 7| null| +| 8| 8| 8| 8| 8| +| 9| null| 9| null| null| ++----------+-------------+-------------+-------------+-------------+ +``` +##### Insert sql +```sql +CREATE DATABASE root.db; +CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d1.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d2.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d2.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; +INSERT INTO root.db.d1(time, s1) VALUES(0, 0), (4,4), (5,5), (8,8); +INSERT INTO root.db.d1(time, s2) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8), (9,9); +INSERT INTO root.db.d2(time, s1) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8); +INSERT INTO root.db.d2(time, s2) VALUES(0, 0), (4,4), (5,5), (8,8); +``` + +Query-Example - 1: +```sql +select count_time(*) from root.db.** +``` + +Result +``` ++-------------+ +|count_time(*)| ++-------------+ +| 8| ++-------------+ +``` + +Query-Example - 2: +```sql +select count_time(*) from root.db.d1, root.db.d2 +``` + +Result +``` ++-------------+ +|count_time(*)| ++-------------+ +| 8| ++-------------+ +``` + +Query-Example - 3: +```sql +select count_time(*) from root.db.** group by([0, 10), 2ms) +``` + +Result +``` ++-----------------------------+-------------+ +| Time|count_time(*)| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 1| +|1970-01-01T08:00:00.008+08:00| 2| ++-----------------------------+-------------+ +``` + +Query-Example - 4: +```sql +select count_time(*) from root.db.** group by([0, 10), 2ms) align by device +``` + +Result +``` ++-----------------------------+----------+-------------+ +| Time| Device|count_time(*)| ++-----------------------------+----------+-------------+ +|1970-01-01T08:00:00.000+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.002+08:00|root.db.d1| 1| +|1970-01-01T08:00:00.004+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.006+08:00|root.db.d1| 1| +|1970-01-01T08:00:00.008+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.000+08:00|root.db.d2| 2| +|1970-01-01T08:00:00.002+08:00|root.db.d2| 1| +|1970-01-01T08:00:00.004+08:00|root.db.d2| 2| 
+|1970-01-01T08:00:00.006+08:00|root.db.d2| 1| +|1970-01-01T08:00:00.008+08:00|root.db.d2| 1| ++-----------------------------+----------+-------------+ +``` + +> Note: +> 1. The parameter in count_time can only be *. +> 2. Count_time aggregation cannot be used with other aggregation functions. +> 3. Count_time aggregation used with having statement is not supported, and count_time aggregation can not appear in the having statement. +> 4. Count_time does not support use with group by level, group by tag. + + + +## String Processing + +### STRING_CONTAINS + +#### Function introduction + +This function checks whether the substring `s` exists in the string + +**Function name:** STRING_CONTAINS + +**Input sequence:** Only a single input sequence is supported, the type is TEXT. + +**parameter:** ++ `s`: The string to search for. + +**Output Sequence:** Output a single sequence, the type is BOOLEAN. + +#### Usage example + +``` sql +select s1, string_contains(s1, 's'='warn') from root.sg1.d4; +``` + +``` ++-----------------------------+--------------+-------------------------------------------+ +| Time|root.sg1.d4.s1|string_contains(root.sg1.d4.s1, "s"="warn")| ++-----------------------------+--------------+-------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| warn:-8721| true| +|1970-01-01T08:00:00.002+08:00| error:-37229| false| +|1970-01-01T08:00:00.003+08:00| warn:1731| true| ++-----------------------------+--------------+-------------------------------------------+ +Total line number = 3 +It costs 0.007s +``` + +### STRING_MATCHES + +#### Function introduction + +This function judges whether a string can be matched by the regular expression `regex`. + +**Function name:** STRING_MATCHES + +**Input sequence:** Only a single input sequence is supported, the type is TEXT. + +**parameter:** ++ `regex`: Java standard library-style regular expressions. + +**Output Sequence:** Output a single sequence, the type is BOOLEAN. + +#### Usage example + +```sql +select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4; +``` + +``` ++-----------------------------+--------------+------------------------------------------------------+ +| Time|root.sg1.d4.s1|string_matches(root.sg1.d4.s1, "regex"="[^\\s]+37229")| ++-----------------------------+--------------+------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| warn:-8721| false| +|1970-01-01T08:00:00.002+08:00| error:-37229| true| +|1970-01-01T08:00:00.003+08:00| warn:1731| false| ++-----------------------------+--------------+------------------------------------------------------+ +Total line number = 3 +It costs 0.007s +``` + +### Length + +#### Usage + +The function is used to get the length of input series. + +**Name:** LENGTH + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Output Series:** Output a single series. The type is INT32. + +**Note:** Returns NULL if input is NULL. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, length(s1) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+----------------------+ +| Time|root.sg1.d1.s1|length(root.sg1.d1.s1)| ++-----------------------------+--------------+----------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 6| +|1970-01-01T08:00:00.002+08:00| 22test22| 8| ++-----------------------------+--------------+----------------------+ +``` + +### Locate + +#### Usage + +The function is used to get the position of the first occurrence of substring `target` in input series. Returns -1 if there are no `target` in input. + +**Name:** LOCATE + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `target`: The substring to be located. ++ `reverse`: Indicates whether reverse locate is required. The default value is `false`, means left-to-right locate. + +**Output Series:** Output a single series. The type is INT32. + +**Note:** The index begins from 0. + +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, locate(s1, "target"="1") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+------------------------------------+ +| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 0| +|1970-01-01T08:00:00.002+08:00| 22test22| -1| ++-----------------------------+--------------+------------------------------------+ +``` + +Another SQL for query: + +```sql +select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+------------------------------------------------------+ +| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1", "reverse"="true")| ++-----------------------------+--------------+------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 5| +|1970-01-01T08:00:00.002+08:00| 22test22| -1| ++-----------------------------+--------------+------------------------------------------------------+ +``` + +### StartsWith + +#### Usage + +The function is used to check whether input series starts with the specified prefix. + +**Name:** STARTSWITH + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** ++ `target`: The prefix to be checked. + +**Output Series:** Output a single series. The type is BOOLEAN. + +**Note:** Returns NULL if input is NULL. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, startswith(s1, "target"="1") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+----------------------------------------+ +| Time|root.sg1.d1.s1|startswith(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| true| +|1970-01-01T08:00:00.002+08:00| 22test22| false| ++-----------------------------+--------------+----------------------------------------+ +``` + +### EndsWith + +#### Usage + +The function is used to check whether input series ends with the specified suffix. + +**Name:** ENDSWITH + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** ++ `target`: The suffix to be checked. + +**Output Series:** Output a single series. The type is BOOLEAN. + +**Note:** Returns NULL if input is NULL. + +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, endswith(s1, "target"="1") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|endswith(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| true| +|1970-01-01T08:00:00.002+08:00| 22test22| false| ++-----------------------------+--------------+--------------------------------------+ +``` + +### Concat + +#### Usage + +The function is used to concat input series and target strings. + +**Name:** CONCAT + +**Input Series:** At least one input series. The data type is TEXT. + +**Parameter:** ++ `targets`: A series of K-V, key needs to start with `target` and be not duplicated, value is the string you want to concat. ++ `series_behind`: Indicates whether series behind targets. The default value is `false`. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** ++ If value of input series is NULL, it will be skipped. ++ We can only concat input series and `targets` separately. `concat(s1, "target1"="IoT", s2, "target2"="DB")` and + `concat(s1, s2, "target1"="IoT", "target2"="DB")` gives the same result. 
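+
+As a minimal sketch of the note above (using the same `root.sg1.d1.s1` and `s2` series as the examples below), both call forms can be selected side by side and are expected to yield identical columns:
+
+```sql
+select concat(s1, "target1"="IoT", s2, "target2"="DB"),
+       concat(s1, s2, "target1"="IoT", "target2"="DB")
+from root.sg1.d1
+```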
+ +#### Examples + +Input series: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| ++-----------------------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB")| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| 1test1IoTDB| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 22test222222testIoTDB| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +``` + +Another SQL for query: + +```sql +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB", "series_behind"="true")| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| IoTDB1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| IoTDB22test222222test| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +``` + +### substring + +#### Usage + +Extracts a substring of a string, starting with the first specified character and stopping after the specified number of characters.The index start at 1. The value range of from and for is an INT32. + +**Name:** SUBSTRING + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** ++ `from`: Indicates the start position of substring. ++ `for`: Indicates how many characters to stop after of substring. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, substring(s1 from 1 for 2) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|SUBSTRING(root.sg1.d1.s1 FROM 1 FOR 2)| ++-----------------------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1t| +|1970-01-01T08:00:00.002+08:00| 22test22| 22| ++-----------------------------+--------------+--------------------------------------+ +``` + +### replace + +#### Usage + +Replace a substring in the input sequence with the target substring. + +**Name:** REPLACE + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** ++ first parameter: The target substring to be replaced. ++ second parameter: The substring to replace with. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. + +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, replace(s1, 'es', 'tt') from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+-----------------------------------+ +| Time|root.sg1.d1.s1|REPLACE(root.sg1.d1.s1, 'es', 'tt')| ++-----------------------------+--------------+-----------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1tttt1| +|1970-01-01T08:00:00.002+08:00| 22test22| 22tttt22| ++-----------------------------+--------------+-----------------------------------+ +``` + +### Upper + +#### Usage + +The function is used to get the string of input series with all characters changed to uppercase. + +**Name:** UPPER + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. + +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, upper(s1) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+---------------------+ +| Time|root.sg1.d1.s1|upper(root.sg1.d1.s1)| ++-----------------------------+--------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1TEST1| +|1970-01-01T08:00:00.002+08:00| 22test22| 22TEST22| ++-----------------------------+--------------+---------------------+ +``` + +### Lower + +#### Usage + +The function is used to get the string of input series with all characters changed to lowercase. + +**Name:** LOWER + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1TEST1| +|1970-01-01T08:00:00.002+08:00| 22TEST22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, lower(s1) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+---------------------+ +| Time|root.sg1.d1.s1|lower(root.sg1.d1.s1)| ++-----------------------------+--------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 1TEST1| 1test1| +|1970-01-01T08:00:00.002+08:00| 22TEST22| 22test22| ++-----------------------------+--------------+---------------------+ +``` + +### Trim + +#### Usage + +The function is used to get the string whose value is same to input series, with all leading and trailing space removed. + +**Name:** TRIM + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. + +#### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s3| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.002+08:00| 3querytest3| +|1970-01-01T08:00:00.003+08:00| 3querytest3 | ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s3, trim(s3) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------------+ +| Time|root.sg1.d1.s3|trim(root.sg1.d1.s3)| ++-----------------------------+--------------+--------------------+ +|1970-01-01T08:00:00.002+08:00| 3querytest3| 3querytest3| +|1970-01-01T08:00:00.003+08:00| 3querytest3 | 3querytest3| ++-----------------------------+--------------+--------------------+ +``` + +### StrCmp + +#### Usage + +The function is used to get the compare result of two input series. Returns `0` if series value are the same, a `negative integer` if value of series1 is smaller than series2, +a `positive integer` if value of series1 is more than series2. + +**Name:** StrCmp + +**Input Series:** Support two input series. Data types are all the TEXT. + +**Output Series:** Output a single series. The type is INT32. + +**Note:** Returns NULL either series value is NULL. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| ++-----------------------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select s1, s2, strcmp(s1, s2) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|strcmp(root.sg1.d1.s1, root.sg1.d1.s2)| ++-----------------------------+--------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 66| ++-----------------------------+--------------+--------------+--------------------------------------+ +``` + + +### StrReplace + +#### Usage + +**This is not a built-in function and can only be used after registering the library-udf.** The function is used to replace the specific substring with given string. + +**Name:** STRREPLACE + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `target`: The target substring to be replaced. ++ `replace`: The string to be put on. ++ `limit`: The number of matches to be replaced which should be an integer no less than -1, + default to -1 which means all matches will be replaced. ++ `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0. ++ `reverse`: Whether to count all the matches reversely, default to 'false'. + +**Output Series:** Output a single series. The type is TEXT. 
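+
+Beyond the examples below, `offset` can also be used on its own: because `limit` defaults to -1, a sketch such as the following (against the same `root.test.d1.s1` series used below) would be expected to skip the first `,` in each value and replace every remaining one:
+
+```sql
+select strreplace(s1, "target"=",", "replace"="/", "offset"="1") from root.test.d1
+```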
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| +|2021-01-01T00:00:03.000+08:00| B+,B,B| +|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-,B,B| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|strreplace(root.test.d1.s1, "target"=",",| +| | "replace"="/", "limit"="2")| ++-----------------------------+-----------------------------------------+ +|2021-01-01T00:00:01.000+08:00| A/B/A+,B-| +|2021-01-01T00:00:02.000+08:00| A/A+/A,B+| +|2021-01-01T00:00:03.000+08:00| B+/B/B| +|2021-01-01T00:00:04.000+08:00| A+/A/A+,A| +|2021-01-01T00:00:05.000+08:00| A/B-/B,B| ++-----------------------------+-----------------------------------------+ +``` + +Another SQL for query: + +```sql +select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|strreplace(root.test.d1.s1, "target"=",", "replace"= | +| | "|", "limit"="1", "offset"="1", "reverse"="true")| ++-----------------------------+-----------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| A,B/A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+/A,B+| +|2021-01-01T00:00:03.000+08:00| B+/B,B| +|2021-01-01T00:00:04.000+08:00| A+,A/A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-/B,B| ++-----------------------------+-----------------------------------------------------+ +``` + +### RegexMatch + +#### Usage + +**This is not a built-in function and can only be used after registering the library-udf.** The function is used to fetch matched contents from text with given regular expression. + +**Name:** REGEXMATCH + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `regex`: The regular expression to match in the text. All grammars supported by Java are acceptable, + for example, `\d+\.\d+\.\d+\.\d+` is expected to match any IPv4 addresses. ++ `group`: The wanted group index in the matched result. + Reference to java.util.regex, group 0 is the whole pattern and + the next ones are numbered with the appearance order of left parentheses. + For example, the groups in `A(B(CD))` are: 0-`A(B(CD))`, 1-`B(CD)`, 2-`CD`. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Those points with null values or not matched with the given pattern will not return any results. 
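+
+As a sketch of the `group` attribute (reusing the log series `root.test.d1.s1` from the examples below), a pattern with one pair of parentheses and `"group"="1"` would be expected to return only the captured part, here the last octet of the address, while rows that do not match return nothing:
+
+```sql
+select regexmatch(s1, "regex"="192\.168\.0\.(\d+)", "group"="1") from root.test.d1
+```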
+ +#### Examples + +Input series: + +``` ++-----------------------------+-------------------------------+ +| Time| root.test.d1.s1| ++-----------------------------+-------------------------------+ +|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| ++-----------------------------+-------------------------------+ +``` + +SQL for query: + +```sql +select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------------------------------------+ +| Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")| ++-----------------------------+----------------------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| 192.168.0.1| +|2021-01-01T00:00:02.000+08:00| 192.168.0.24| +|2021-01-01T00:00:03.000+08:00| 192.168.0.2| +|2021-01-01T00:00:04.000+08:00| 192.168.0.5| +|2021-01-01T00:00:05.000+08:00| 192.168.0.124| ++-----------------------------+----------------------------------------------------------------------+ +``` + +### RegexReplace + +#### Usage + +**This is not a built-in function and can only be used after registering the library-udf.** The function is used to replace the specific regular expression matches with given string. + +**Name:** REGEXREPLACE + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `regex`: The target regular expression to be replaced. All grammars supported by Java are acceptable. ++ `replace`: The string to be put on and back reference notes in Java is also supported, + for example, '$1' refers to group 1 in the `regex` which will be filled with corresponding matched results. ++ `limit`: The number of matches to be replaced which should be an integer no less than -1, + default to -1 which means all matches will be replaced. ++ `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0. ++ `reverse`: Whether to count all the matches reversely, default to 'false'. + +**Output Series:** Output a single series. The type is TEXT. 
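+
+Because `limit` defaults to -1, omitting it replaces every match of the pattern in a value. A minimal sketch (same log series as in the examples below) that applies the back-reference rewrite to all matches rather than only the first one:
+
+```sql
+select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1") from root.test.d1
+```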
+ +#### Examples + +Input series: + +``` ++-----------------------------+-------------------------------+ +| Time| root.test.d1.s1| ++-----------------------------+-------------------------------+ +|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| ++-----------------------------+-------------------------------+ +``` + +SQL for query: + +```sql +select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------------+ +| Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",| +| | "replace"="cluster-$1", "limit"="1")| ++-----------------------------+-----------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]| ++-----------------------------+-----------------------------------------------------------+ +``` + +### RegexSplit + +#### Usage + +**This is not a built-in function and can only be used after registering the library-udf.** The function is used to split text with given regular expression and return specific element. + +**Name:** REGEXSPLIT + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `regex`: The regular expression used to split the text. + All grammars supported by Java are acceptable, for example, `['"]` is expected to match `'` and `"`. ++ `index`: The wanted index of elements in the split result. + It should be an integer no less than -1, default to -1 which means the length of the result array is returned + and any non-negative integer is used to fetch the text of the specific index starting from 0. + +**Output Series:** Output a single series. The type is INT32 when `index` is -1 and TEXT when it's an valid index. + +**Note:** When `index` is out of the range of the result array, for example `0,1,2` split with `,` and `index` is set to 3, +no result are returned for that record. 
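+
+To make the out-of-range case in the note concrete: the sample series in the examples below splits into at most four elements (indices 0 through 3), so a sketch that asks for `"index"="4"` would be expected to return no rows at all:
+
+```sql
+select regexsplit(s1, "regex"=",", "index"="4") from root.test.d1
+```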
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| +|2021-01-01T00:00:03.000+08:00| B+,B,B| +|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-,B,B| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| 4| +|2021-01-01T00:00:02.000+08:00| 4| +|2021-01-01T00:00:03.000+08:00| 3| +|2021-01-01T00:00:04.000+08:00| 4| +|2021-01-01T00:00:05.000+08:00| 4| ++-----------------------------+------------------------------------------------------+ +``` + +Another SQL for query: + +SQL for query: + +```sql +select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")| ++-----------------------------+-----------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| B-| +|2021-01-01T00:00:02.000+08:00| B+| +|2021-01-01T00:00:04.000+08:00| A| +|2021-01-01T00:00:05.000+08:00| B| ++-----------------------------+-----------------------------------------------------+ +``` + + + +## Data Type Conversion Function + +The IoTDB currently supports 6 data types, including INT32, INT64 ,FLOAT, DOUBLE, BOOLEAN, TEXT. When we query or evaluate data, we may need to convert data types, such as TEXT to INT32, or FLOAT to DOUBLE. IoTDB supports cast function to convert data types. + +Syntax example: + +```sql +SELECT cast(s1 as INT32) from root.sg +``` + +The syntax of the cast function is consistent with that of PostgreSQL. The data type specified after AS indicates the target type to be converted. Currently, all six data types supported by IoTDB can be used in the cast function. The conversion rules to be followed are shown in the following table. The row represents the original data type, and the column represents the target data type to be converted into: + +| | **INT32** | **INT64** | **FLOAT** | **DOUBLE** | **BOOLEAN** | **TEXT** | +| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | -------------------------------- | +| **INT32** | No need to cast | Cast directly | Cast directly | Cast directly | !=0 : true
==0: false | String.valueOf() | +| **INT64** | Out of the range of INT32: throw Exception
Otherwise: Cast directly | No need to cast | Cast directly | Cast directly | !=0L : true
==0: false | String.valueOf() | +| **FLOAT** | Out of the range of INT32: throw Exception
Otherwise: Math.round() | Out of the range of INT64: throw Exception
Otherwise: Math.round() | No need to cast | Cast directly | !=0.0f : true
==0: false | String.valueOf() | +| **DOUBLE** | Out of the range of INT32: throw Exception
Otherwise: Math.round() | Out of the range of INT64: throw Exception
Otherwise: Math.round() | Out of the range of FLOAT:throw Exception
Otherwise: Cast directly | No need to cast | !=0.0 : true
==0: false | String.valueOf() | +| **BOOLEAN** | true: 1
false: 0 | true: 1L
false: 0 | true: 1.0f
false: 0 | true: 1.0
false: 0 | No need to cast | true: "true"
false: "false" | +| **TEXT** | Integer.parseInt() | Long.parseLong() | Float.parseFloat() | Double.parseDouble() | text.toLowerCase =="true" : true
text.toLowerCase =="false" : false
Otherwise: throw Exception | No need to cast | + +### Examples + +``` +// timeseries +IoTDB> show timeseries root.sg.d1.** ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters| ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ +|root.sg.d1.s3| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s4| null| root.sg| DOUBLE| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s5| null| root.sg| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s6| null| root.sg| TEXT| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s1| null| root.sg| INT32| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s2| null| root.sg| INT64| PLAIN| SNAPPY|null| null| null| null| ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ + +// data of timeseries +IoTDB> select * from root.sg.d1; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d1.s3|root.sg.d1.s4|root.sg.d1.s5|root.sg.d1.s6|root.sg.d1.s1|root.sg.d1.s2| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| false| 10000| 0| 0| +|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| false| 3| 1| 1| +|1970-01-01T08:00:00.002+08:00| 2.7| 2.7| true| TRue| 2| 2| +|1970-01-01T08:00:00.003+08:00| 3.33| 3.33| true| faLse| 3| 3| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ + +// cast BOOLEAN to other types +IoTDB> select cast(s5 as INT32), cast(s5 as INT64),cast(s5 as FLOAT),cast(s5 as DOUBLE), cast(s5 as TEXT) from root.sg.d1 ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ +| Time|CAST(root.sg.d1.s5 AS INT32)|CAST(root.sg.d1.s5 AS INT64)|CAST(root.sg.d1.s5 AS FLOAT)|CAST(root.sg.d1.s5 AS DOUBLE)|CAST(root.sg.d1.s5 AS TEXT)| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ +|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.001+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.003+08:00| 1| 1| 1.0| 1.0| true| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ + +// cast TEXT to numeric types +IoTDB> select cast(s6 as INT32), cast(s6 as INT64), cast(s6 as FLOAT), cast(s6 as DOUBLE) from root.sg.d1 where time < 2 ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ +| Time|CAST(root.sg.d1.s6 AS INT32)|CAST(root.sg.d1.s6 AS INT64)|CAST(root.sg.d1.s6 AS FLOAT)|CAST(root.sg.d1.s6 AS DOUBLE)| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 10000| 10000| 10000.0| 10000.0| +|1970-01-01T08:00:00.001+08:00| 3| 3| 3.0| 3.0| 
++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ + +// cast TEXT to BOOLEAN +IoTDB> select cast(s6 as BOOLEAN) from root.sg.d1 where time >= 2 ++-----------------------------+------------------------------+ +| Time|CAST(root.sg.d1.s6 AS BOOLEAN)| ++-----------------------------+------------------------------+ +|1970-01-01T08:00:00.002+08:00| true| +|1970-01-01T08:00:00.003+08:00| false| ++-----------------------------+------------------------------+ +``` + + + + +## Constant Timeseries Generating Functions + +The constant timeseries generating function is used to generate a timeseries in which the values of all data points are the same. + +The constant timeseries generating function accepts one or more timeseries inputs, and the timestamp set of the output data points is the union of the timestamp sets of the input timeseries. + +Currently, IoTDB supports the following constant timeseries generating functions: + +| Function Name | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------------------------------ | -------------------------------------------- | ------------------------------------------------------------ | +| CONST | `value`: the value of the output data point
`type`: the type of the output data point, it can only be INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | Determined by the required attribute `type` | Output the user-specified constant timeseries according to the attributes `value` and `type`. | +| PI | None | DOUBLE | Data point value: a `double` value of `π`, the ratio of the circumference of a circle to its diameter, which is equals to `Math.PI` in the *Java Standard Library*. | +| E | None | DOUBLE | Data point value: a `double` value of `e`, the base of the natural logarithms, which is equals to `Math.E` in the *Java Standard Library*. | + +Example: + +``` sql +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; +``` + +Result: + +``` +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|const(root.sg1.d1.s1, "value"="1024", "type"="INT64")|pi(root.sg1.d1.s2)|e(root.sg1.d1.s1, root.sg1.d1.s2)| ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 1024| 3.141592653589793| 2.718281828459045| +|1970-01-01T08:00:00.001+08:00| 1.0| null| 1024| null| 2.718281828459045| +|1970-01-01T08:00:00.002+08:00| 2.0| null| 1024| null| 2.718281828459045| +|1970-01-01T08:00:00.003+08:00| null| 3.0| null| 3.141592653589793| 2.718281828459045| +|1970-01-01T08:00:00.004+08:00| null| 4.0| null| 3.141592653589793| 2.718281828459045| ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +Total line number = 5 +It costs 0.005s +``` + + + +## Selector Functions + +Currently, IoTDB supports the following selector functions: + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | +| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the largest values in a time series. | +| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the smallest values in a time series. 
| + +Example: + +``` sql +select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; +``` + +Result: + +``` ++-----------------------------+--------------------+------------------------------+---------------------------------+ +| Time| root.sg1.d2.s1|top_k(root.sg1.d2.s1, "k"="2")|bottom_k(root.sg1.d2.s1, "k"="2")| ++-----------------------------+--------------------+------------------------------+---------------------------------+ +|2020-12-10T20:36:15.531+08:00| 1531604122307244742| 1531604122307244742| null| +|2020-12-10T20:36:15.532+08:00|-7426070874923281101| null| null| +|2020-12-10T20:36:15.533+08:00|-7162825364312197604| -7162825364312197604| null| +|2020-12-10T20:36:15.534+08:00|-8581625725655917595| null| -8581625725655917595| +|2020-12-10T20:36:15.535+08:00|-7667364751255535391| null| -7667364751255535391| ++-----------------------------+--------------------+------------------------------+---------------------------------+ +Total line number = 5 +It costs 0.006s +``` + + + +## Continuous Interval Functions + +The continuous interval functions are used to query all continuous intervals that meet specified conditions. +They can be divided into two categories according to return value: +1. Returns the start timestamp and time span of the continuous interval that meets the conditions (a time span of 0 means that only the start time point meets the conditions) +2. Returns the start timestamp of the continuous interval that meets the condition and the number of points in the interval (a number of 1 means that only the start time point meets the conditions) + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ----------------- | ------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | +| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always 0(false), and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | +| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always not 0, and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | +| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always 0(false). Data points number `n` satisfy `n >= min && n <= max` | +| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always not 0(false). Data points number `n` satisfy `n >= min && n <= max` | + +### Demonstrate +Example data: +``` +IoTDB> select s1,s2,s3,s4,s5 from root.sg.d2; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d2.s1|root.sg.d2.s2|root.sg.d2.s3|root.sg.d2.s4|root.sg.d2.s5| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.001+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.003+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.004+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.005+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.006+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.007+08:00| 1| 1| 1.0| 1.0| true| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +``` + +Sql: +```sql +select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; +``` + +Result: +``` ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +| Time|root.sg.d2.s1|zero_count(root.sg.d2.s1)|non_zero_count(root.sg.d2.s2)|zero_duration(root.sg.d2.s3)|non_zero_duration(root.sg.d2.s4)| ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0| 1| null| 0| null| +|1970-01-01T08:00:00.001+08:00| 1| null| 2| null| 1| +|1970-01-01T08:00:00.002+08:00| 1| null| null| null| null| +|1970-01-01T08:00:00.003+08:00| 0| 1| null| 0| null| +|1970-01-01T08:00:00.004+08:00| 1| null| 1| null| 0| +|1970-01-01T08:00:00.005+08:00| 0| 2| null| 1| null| +|1970-01-01T08:00:00.006+08:00| 0| null| null| null| null| +|1970-01-01T08:00:00.007+08:00| 1| null| 1| null| 0| ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +``` + + + +## Variation Trend Calculation Functions + +Currently, IoTDB supports the following variation trend calculation functions: + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | +| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | / | INT64 | Calculates the difference between the time stamp of a data point and the time stamp of the previous data point. There is no corresponding output for the first data point. | +| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. 
| +| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the absolute value of the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. | +| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the rate of change of a data point compared to the previous data point, the result is equals to DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | +| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the absolute value of the rate of change of a data point compared to the previous data point, the result is equals to NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | +| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:optional,default is true. If is true, the previous data point is ignored when it is null and continues to find the first non-null value forwardly. If the value is false, previous data point is not ignored when it is null, the result is also null because null is used for subtraction | DOUBLE | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point, so output is null | + +Example: + +``` sql +select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; +``` + +Result: + +``` ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +| Time| root.sg1.d1.s1|time_difference(root.sg1.d1.s1)|difference(root.sg1.d1.s1)|non_negative_difference(root.sg1.d1.s1)|derivative(root.sg1.d1.s1)|non_negative_derivative(root.sg1.d1.s1)| ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +|2020-12-10T17:11:49.037+08:00|7360723084922759782| 1| -8431715764844238876| 8431715764844238876| -8.4317157648442388E18| 8.4317157648442388E18| +|2020-12-10T17:11:49.038+08:00|4377791063319964531| 1| -2982932021602795251| 2982932021602795251| -2.982932021602795E18| 2.982932021602795E18| +|2020-12-10T17:11:49.039+08:00|7972485567734642915| 1| 3594694504414678384| 3594694504414678384| 3.5946945044146785E18| 3.5946945044146785E18| +|2020-12-10T17:11:49.040+08:00|2508858212791964081| 1| -5463627354942678834| 5463627354942678834| -5.463627354942679E18| 5.463627354942679E18| +|2020-12-10T17:11:49.041+08:00|2817297431185141819| 1| 308439218393177738| 308439218393177738| 3.0843921839317773E17| 3.0843921839317773E17| ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +Total line number = 5 +It costs 0.014s +``` + +### Example + +#### RawData + +``` ++-----------------------------+------------+------------+ +| Time|root.test.s1|root.test.s2| ++-----------------------------+------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1| 1.0| +|1970-01-01T08:00:00.002+08:00| 2| null| +|1970-01-01T08:00:00.003+08:00| null| 3.0| +|1970-01-01T08:00:00.004+08:00| 
4| null| +|1970-01-01T08:00:00.005+08:00| 5| 5.0| +|1970-01-01T08:00:00.006+08:00| null| 6.0| ++-----------------------------+------------+------------+ +``` + +#### Not use `ignoreNull` attribute (Ignore Null) + +SQL: +```sql +SELECT DIFF(s1), DIFF(s2) from root.test; +``` + +Result: +``` ++-----------------------------+------------------+------------------+ +| Time|DIFF(root.test.s1)|DIFF(root.test.s2)| ++-----------------------------+------------------+------------------+ +|1970-01-01T08:00:00.001+08:00| null| null| +|1970-01-01T08:00:00.002+08:00| 1.0| null| +|1970-01-01T08:00:00.003+08:00| null| 2.0| +|1970-01-01T08:00:00.004+08:00| 2.0| null| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| null| 1.0| ++-----------------------------+------------------+------------------+ +``` + +#### Use `ignoreNull` attribute + +SQL: +```sql +SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root.test; +``` + +Result: +``` ++-----------------------------+----------------------------------------+----------------------------------------+ +| Time|DIFF(root.test.s1, "ignoreNull"="false")|DIFF(root.test.s2, "ignoreNull"="false")| ++-----------------------------+----------------------------------------+----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| null| null| +|1970-01-01T08:00:00.002+08:00| 1.0| null| +|1970-01-01T08:00:00.003+08:00| null| null| +|1970-01-01T08:00:00.004+08:00| null| null| +|1970-01-01T08:00:00.005+08:00| 1.0| null| +|1970-01-01T08:00:00.006+08:00| null| 1.0| ++-----------------------------+----------------------------------------+----------------------------------------+ +``` + + + +## Sample Functions + +### Equal Size Bucket Sample Function + +This function samples the input sequence in equal size buckets, that is, according to the downsampling ratio and downsampling method given by the user, the input sequence is equally divided into several buckets according to a fixed number of points. Sampling by the given sampling method within each bucket. +- `proportion`: sample ratio, the value range is `(0, 1]`. +#### Equal Size Bucket Random Sample +Random sampling is performed on the equally divided buckets. + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns a random sample of equal buckets that matches the sampling ratio | + +##### Example + +Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`. 
+ +```sql +IoTDB> select temperature from root.ln.wf01.wt01; ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| +|1970-01-01T08:00:00.005+08:00| 5.0| +|1970-01-01T08:00:00.006+08:00| 6.0| +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.009+08:00| 9.0| +|1970-01-01T08:00:00.010+08:00| 10.0| +|1970-01-01T08:00:00.011+08:00| 11.0| +|1970-01-01T08:00:00.012+08:00| 12.0| +|.............................|.............................| +|1970-01-01T08:00:00.089+08:00| 89.0| +|1970-01-01T08:00:00.090+08:00| 90.0| +|1970-01-01T08:00:00.091+08:00| 91.0| +|1970-01-01T08:00:00.092+08:00| 92.0| +|1970-01-01T08:00:00.093+08:00| 93.0| +|1970-01-01T08:00:00.094+08:00| 94.0| +|1970-01-01T08:00:00.095+08:00| 95.0| +|1970-01-01T08:00:00.096+08:00| 96.0| +|1970-01-01T08:00:00.097+08:00| 97.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+-----------------------------+ +``` +Sql: +```sql +select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; +``` +Result: +```sql ++-----------------------------+-------------+ +| Time|random_sample| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.014+08:00| 14.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.035+08:00| 35.0| +|1970-01-01T08:00:00.047+08:00| 47.0| +|1970-01-01T08:00:00.059+08:00| 59.0| +|1970-01-01T08:00:00.063+08:00| 63.0| +|1970-01-01T08:00:00.079+08:00| 79.0| +|1970-01-01T08:00:00.086+08:00| 86.0| +|1970-01-01T08:00:00.096+08:00| 96.0| ++-----------------------------+-------------+ +Total line number = 10 +It costs 0.024s +``` + +#### Equal Size Bucket Aggregation Sample + +The input sequence is sampled by the aggregation sampling method, and the user needs to provide an additional aggregation function parameter, namely +- `type`: Aggregate type, which can be `avg` or `max` or `min` or `sum` or `extreme` or `variance`. By default, `avg` is used. `extreme` represents the value with the largest absolute value in the equal bucket. `variance` represents the variance in the sampling equal buckets. + +The timestamp of the sampling output of each bucket is the timestamp of the first point of the bucket. + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1`
`type`: The value types are `avg`, `max`, `min`, `sum`, `extreme`, `variance`, the default is `avg` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket aggregation samples that match the sampling ratio | + +##### Example + +Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`, and the test data is randomly sampled in equal buckets. + +Sql: +```sql +select equal_size_bucket_agg_sample(temperature, 'type'='avg','proportion'='0.1') as agg_avg, equal_size_bucket_agg_sample(temperature, 'type'='max','proportion'='0.1') as agg_max, equal_size_bucket_agg_sample(temperature,'type'='min','proportion'='0.1') as agg_min, equal_size_bucket_agg_sample(temperature, 'type'='sum','proportion'='0.1') as agg_sum, equal_size_bucket_agg_sample(temperature, 'type'='extreme','proportion'='0.1') as agg_extreme, equal_size_bucket_agg_sample(temperature, 'type'='variance','proportion'='0.1') as agg_variance from root.ln.wf01.wt01; +``` +Result: +```sql ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +| Time| agg_avg|agg_max|agg_min|agg_sum|agg_extreme|agg_variance| ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| +|1970-01-01T08:00:00.010+08:00| 14.5| 19.0| 10.0| 145.0| 19.0| 8.25| +|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| 20.0| 245.0| 29.0| 8.25| +|1970-01-01T08:00:00.030+08:00| 34.5| 39.0| 30.0| 345.0| 39.0| 8.25| +|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| +|1970-01-01T08:00:00.050+08:00| 54.5| 59.0| 50.0| 545.0| 59.0| 8.25| +|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| 8.25| +|1970-01-01T08:00:00.070+08:00|74.50000000000001| 79.0| 70.0| 745.0| 79.0| 8.25| +|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 8.25| +|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 8.25| ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +Total line number = 10 +It costs 0.044s +``` + +#### Equal Size Bucket M4 Sample + +The input sequence is sampled using the M4 sampling method. That is to sample the head, tail, min and max values for each bucket. + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket M4 samples that match the sampling ratio | + +##### Example + +Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`, and the test data is randomly sampled in equal buckets. 
+ +Sql: +```sql +select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01; +``` +Result: +```sql ++-----------------------------+---------+ +| Time|M4_sample| ++-----------------------------+---------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.038+08:00| 38.0| +|1970-01-01T08:00:00.039+08:00| 39.0| +|1970-01-01T08:00:00.040+08:00| 40.0| +|1970-01-01T08:00:00.041+08:00| 41.0| +|1970-01-01T08:00:00.078+08:00| 78.0| +|1970-01-01T08:00:00.079+08:00| 79.0| +|1970-01-01T08:00:00.080+08:00| 80.0| +|1970-01-01T08:00:00.081+08:00| 81.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+---------+ +Total line number = 12 +It costs 0.065s +``` + +#### Equal Size Bucket Outlier Sample + +This function samples the input sequence with equal number of bucket outliers, that is, according to the downsampling ratio given by the user and the number of samples in the bucket, the input sequence is divided into several buckets according to a fixed number of points. Sampling by the given outlier sampling method within each bucket. + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | The value range of `proportion` is `(0, 1]`, the default is `0.1`
The value of `type` is `avg` or `stendis` or `cos` or `prenextdis`, the default is `avg`
The value of `number` should be greater than 0, the default is `3`| INT32 / INT64 / FLOAT / DOUBLE | Returns outlier samples in equal buckets that match the sampling ratio and the number of samples in the bucket | + +Parameter Description +- `proportion`: sampling ratio +- `number`: the number of samples in each bucket, default `3` +- `type`: outlier sampling method, the value is + - `avg`: Take the average of the data points in the bucket, and find the `top number` farthest from the average according to the sampling ratio + - `stendis`: Take the vertical distance between each data point in the bucket and the first and last data points of the bucket to form a straight line, and according to the sampling ratio, find the `top number` with the largest distance + - `cos`: Set a data point in the bucket as b, the data point on the left of b as a, and the data point on the right of b as c, then take the cosine value of the angle between the ab and bc vectors. The larger the angle, the more likely it is an outlier. Find the `top number` with the smallest cos value + - `prenextdis`: Let a data point in the bucket be b, the data point to the left of b is a, and the data point to the right of b is c, then take the sum of the lengths of ab and bc as the yardstick, the larger the sum, the more likely it is to be an outlier, and find the `top number` with the largest sum value + +##### Example + +Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`. Among them, in order to add outliers, we make the number modulo 5 equal to 0 increment by 100. + +```sql +IoTDB> select temperature from root.ln.wf01.wt01; ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| +|1970-01-01T08:00:00.005+08:00| 105.0| +|1970-01-01T08:00:00.006+08:00| 6.0| +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.009+08:00| 9.0| +|1970-01-01T08:00:00.010+08:00| 10.0| +|1970-01-01T08:00:00.011+08:00| 11.0| +|1970-01-01T08:00:00.012+08:00| 12.0| +|1970-01-01T08:00:00.013+08:00| 13.0| +|1970-01-01T08:00:00.014+08:00| 14.0| +|1970-01-01T08:00:00.015+08:00| 115.0| +|1970-01-01T08:00:00.016+08:00| 16.0| +|.............................|.............................| +|1970-01-01T08:00:00.092+08:00| 92.0| +|1970-01-01T08:00:00.093+08:00| 93.0| +|1970-01-01T08:00:00.094+08:00| 94.0| +|1970-01-01T08:00:00.095+08:00| 195.0| +|1970-01-01T08:00:00.096+08:00| 96.0| +|1970-01-01T08:00:00.097+08:00| 97.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+-----------------------------+ +``` +Sql: +```sql +select equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='avg', 'number'='2') as outlier_avg_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='stendis', 'number'='2') as outlier_stendis_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='cos', 'number'='2') as outlier_cos_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='prenextdis', 'number'='2') as outlier_prenextdis_sample from root.ln.wf01.wt01; +``` +Result: +```sql 
++-----------------------------+------------------+----------------------+------------------+-------------------------+ +| Time|outlier_avg_sample|outlier_stendis_sample|outlier_cos_sample|outlier_prenextdis_sample| ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +|1970-01-01T08:00:00.005+08:00| 105.0| 105.0| 105.0| 105.0| +|1970-01-01T08:00:00.015+08:00| 115.0| 115.0| 115.0| 115.0| +|1970-01-01T08:00:00.025+08:00| 125.0| 125.0| 125.0| 125.0| +|1970-01-01T08:00:00.035+08:00| 135.0| 135.0| 135.0| 135.0| +|1970-01-01T08:00:00.045+08:00| 145.0| 145.0| 145.0| 145.0| +|1970-01-01T08:00:00.055+08:00| 155.0| 155.0| 155.0| 155.0| +|1970-01-01T08:00:00.065+08:00| 165.0| 165.0| 165.0| 165.0| +|1970-01-01T08:00:00.075+08:00| 175.0| 175.0| 175.0| 175.0| +|1970-01-01T08:00:00.085+08:00| 185.0| 185.0| 185.0| 185.0| +|1970-01-01T08:00:00.095+08:00| 195.0| 195.0| 195.0| 195.0| ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +Total line number = 10 +It costs 0.041s +``` + +### M4 Function + +M4 is used to sample the `first, last, bottom, top` points for each sliding window: + +- the first point is the point with the **m**inimal time; +- the last point is the point with the **m**aximal time; +- the bottom point is the point with the **m**inimal value (if there are multiple such points, M4 returns one of them); +- the top point is the point with the **m**aximal value (if there are multiple such points, M4 returns one of them). + +image + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------- | ------------------------------------------------------------ | ------------------------------ | ------------------------------------------------------------ | +| M4 | INT32 / INT64 / FLOAT / DOUBLE | Different attributes used by the size window and the time window. The size window uses attributes `windowSize` and `slidingStep`. The time window uses attributes `timeInterval`, `slidingStep`, `displayWindowBegin`, and `displayWindowEnd`. More details see below. | INT32 / INT64 / FLOAT / DOUBLE | Returns the `first, last, bottom, top` points in each sliding window. M4 sorts and deduplicates the aggregated points within the window before outputting them. | + +#### Attributes + +**(1) Attributes for the size window:** + ++ `windowSize`: The number of points in a window. Int data type. **Required**. ++ `slidingStep`: Slide a window by the number of points. Int data type. Optional. If not set, default to the same as `windowSize`. + +image + +**(2) Attributes for the time window:** + ++ `timeInterval`: The time interval length of a window. Long data type. **Required**. ++ `slidingStep`: Slide a window by the time length. Long data type. Optional. If not set, default to the same as `timeInterval`. ++ `displayWindowBegin`: The starting position of the window (included). Long data type. Optional. If not set, default to Long.MIN_VALUE, meaning using the time of the first data point of the input time series as the starting position of the window. ++ `displayWindowEnd`: End time limit (excluded, essentially playing the same role as `WHERE time < displayWindowEnd`). Long data type. Optional. If not set, default to Long.MAX_VALUE, meaning there is no additional end time limit other than the end of the input time series itself. 
+ +groupBy window + +#### Examples + +Input series: + +```sql ++-----------------------------+------------------+ +| Time|root.vehicle.d1.s1| ++-----------------------------+------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.002+08:00| 15.0| +|1970-01-01T08:00:00.005+08:00| 10.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.010+08:00| 30.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.025+08:00| 8.0| +|1970-01-01T08:00:00.027+08:00| 20.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.033+08:00| 9.0| +|1970-01-01T08:00:00.035+08:00| 10.0| +|1970-01-01T08:00:00.040+08:00| 20.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+------------------+ +``` + +SQL for query1: + +```sql +select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1 +``` + +Output1: + +```sql ++-----------------------------+-----------------------------------------------------------------------------------------------+ +| Time|M4(root.vehicle.d1.s1, "timeInterval"="25", "displayWindowBegin"="0", "displayWindowEnd"="100")| ++-----------------------------+-----------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.010+08:00| 30.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.025+08:00| 8.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+-----------------------------------------------------------------------------------------------+ +Total line number = 8 +``` + +SQL for query2: + +```sql +select M4(s1,'windowSize'='10') from root.vehicle.d1 +``` + +Output2: + +```sql ++-----------------------------+-----------------------------------------+ +| Time|M4(root.vehicle.d1.s1, "windowSize"="10")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.033+08:00| 9.0| +|1970-01-01T08:00:00.035+08:00| 10.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+-----------------------------------------+ +Total line number = 7 +``` + +#### Suggested Use Cases + +**(1) Use Case: Extreme-point-preserving downsampling** + +As M4 aggregation selects the `first, last, bottom, top` points for each window, M4 usually preserves extreme points and thus patterns better than other downsampling methods such as Piecewise Aggregate Approximation (PAA). Therefore, if you want to downsample the time series while preserving extreme points, you may give M4 a try. + +**(2) Use case: Error-free two-color line chart visualization of large-scale time series through M4 downsampling** + +Referring to paper ["M4: A Visualization-Oriented Time Series Data Aggregation"](http://www.vldb.org/pvldb/vol7/p797-jugel.pdf), M4 is a downsampling method to facilitate large-scale time series visualization without deforming the shape in terms of a two-color line chart. 
+ +Given a chart of `w*h` pixels, suppose that the visualization time range of the time series is `[tqs,tqe)` and (tqe-tqs) is divisible by w, the points that fall within the `i`-th time span `Ii=[tqs+(tqe-tqs)/w*(i-1),tqs+(tqe-tqs)/w*i)` will be drawn on the `i`-th pixel column, i=1,2,...,w. Therefore, from a visualization-driven perspective, use the sql: `"select M4(s1,'timeInterval'='(tqe-tqs)/w','displayWindowBegin'='tqs','displayWindowEnd'='tqe') from root.vehicle.d1"` to sample the `first, last, bottom, top` points for each time span. The resulting downsampled time series has no more than `4*w` points, a big reduction compared to the original large-scale time series. Meanwhile, the two-color line chart drawn from the reduced data is identical that to that drawn from the original data (pixel-level consistency). + +To eliminate the hassle of hardcoding parameters, we recommend the following usage of Grafana's [template variable](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#global-variables) `$__interval_ms` when Grafana is used for visualization: + +``` +select M4(s1,'timeInterval'='$__interval_ms') from root.sg1.d1 +``` + +where `timeInterval` is set as `(tqe-tqs)/w` automatically. Note that the time precision here is assumed to be milliseconds. + +#### Comparison with Other Functions + +| SQL | Whether support M4 aggregation | Sliding window type | Example | Docs | +| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| 1. native built-in aggregate functions with Group By clause | No. Lack `BOTTOM_TIME` and `TOP_TIME`, which are respectively the time of the points that have the mininum and maximum value. | Time Window | `select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d)` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#built-in-aggregate-functions
https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#downsampling-aggregate-query | +| 2. EQUAL_SIZE_BUCKET_M4_SAMPLE (built-in UDF) | Yes* | Size Window. `windowSize = 4*(int)(1/proportion)` | `select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Select-Expression.html#time-series-generating-functions | +| **3. M4 (built-in UDF)** | Yes* | Size Window, Time Window | (1) Size Window: `select M4(s1,'windowSize'='10') from root.vehicle.d1`
(2) Time Window: `select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1` | refer to this doc | +| 4. extend native built-in aggregate functions with Group By clause to support M4 aggregation | not implemented | not implemented | not implemented | not implemented | + +Further compare `EQUAL_SIZE_BUCKET_M4_SAMPLE` and `M4`: + +**(1) Different M4 aggregation definition:** + +For each window, `EQUAL_SIZE_BUCKET_M4_SAMPLE` extracts the top and bottom points from points **EXCLUDING** the first and last points. + +In contrast, `M4` extracts the top and bottom points from points **INCLUDING** the first and last points, which is more consistent with the semantics of `max_value` and `min_value` stored in metadata. + +It is worth noting that both functions sort and deduplicate the aggregated points in a window before outputting them to the collectors. + +**(2) Different sliding windows:** + +`EQUAL_SIZE_BUCKET_M4_SAMPLE` uses SlidingSizeWindowAccessStrategy and **indirectly** controls sliding window size by sampling proportion. The conversion formula is `windowSize = 4*(int)(1/proportion)`. + +`M4` supports two types of sliding window: SlidingSizeWindowAccessStrategy and SlidingTimeWindowAccessStrategy. `M4` **directly** controls the window point size or time length using corresponding parameters. + + + + +## Time Series Processing + +### CHANGE_POINTS + +#### Usage + +This function is used to remove consecutive identical values from an input sequence. +For example, input:`1,1,2,2,3` output:`1,2,3`. + +**Name:** CHANGE_POINTS + +**Input Series:** Support only one input series. + +**Parameters:** No parameters. + +#### Example + +Raw data: + +``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|root.testChangePoints.d1.s1|root.testChangePoints.d1.s2|root.testChangePoints.d1.s3|root.testChangePoints.d1.s4|root.testChangePoints.d1.s5|root.testChangePoints.d1.s6| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.002+08:00| true| 2| 2| 2.0| 1.0| 2test2| +|1970-01-01T08:00:00.003+08:00| false| 1| 2| 1.0| 1.0| 2test2| +|1970-01-01T08:00:00.004+08:00| true| 1| 3| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.005+08:00| true| 1| 3| 1.0| 1.0| 1test1| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +``` + +SQL for query: + +```sql +select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +| 
Time|change_points(root.testChangePoints.d1.s1)|change_points(root.testChangePoints.d1.s2)|change_points(root.testChangePoints.d1.s3)|change_points(root.testChangePoints.d1.s4)|change_points(root.testChangePoints.d1.s5)|change_points(root.testChangePoints.d1.s6)| ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.002+08:00| null| 2| 2| 2.0| null| 2test2| +|1970-01-01T08:00:00.003+08:00| false| 1| null| 1.0| null| null| +|1970-01-01T08:00:00.004+08:00| true| null| 3| null| null| 1test1| ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +``` + + + +## Lambda Expression + +### JEXL Function + +Java Expression Language (JEXL) is an expression language engine. We use JEXL to extend UDFs, which are implemented on the command line with simple lambda expressions. See the link for [operators supported in jexl lambda expressions](https://commons.apache.org/proper/commons-jexl/apidocs/org/apache/commons/jexl3/package-summary.html#customization). + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Series Data Type Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr` is a lambda expression that supports standard one or multi arguments in the form `x -> {...}` or `(x, y, z) -> {...}`, e.g. ` x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | Returns the input time series transformed by a lambda expression | + +##### Demonstrate +Example data: `root.ln.wf01.wt01.temperature`, `root.ln.wf01.wt01.st`, `root.ln.wf01.wt01.str` a total of `11` data. 
+
+```
+IoTDB> select * from root.ln.wf01.wt01;
++-----------------------------+---------------------+--------------------+-----------------------------+
+| Time|root.ln.wf01.wt01.str|root.ln.wf01.wt01.st|root.ln.wf01.wt01.temperature|
++-----------------------------+---------------------+--------------------+-----------------------------+
+|1970-01-01T08:00:00.000+08:00| str| 10.0| 0.0|
+|1970-01-01T08:00:00.001+08:00| str| 20.0| 1.0|
+|1970-01-01T08:00:00.002+08:00| str| 30.0| 2.0|
+|1970-01-01T08:00:00.003+08:00| str| 40.0| 3.0|
+|1970-01-01T08:00:00.004+08:00| str| 50.0| 4.0|
+|1970-01-01T08:00:00.005+08:00| str| 60.0| 5.0|
+|1970-01-01T08:00:00.006+08:00| str| 70.0| 6.0|
+|1970-01-01T08:00:00.007+08:00| str| 80.0| 7.0|
+|1970-01-01T08:00:00.008+08:00| str| 90.0| 8.0|
+|1970-01-01T08:00:00.009+08:00| str| 100.0| 9.0|
+|1970-01-01T08:00:00.010+08:00| str| 110.0| 10.0|
++-----------------------------+---------------------+--------------------+-----------------------------+
+```
+Sql:
+```sql
+select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01;
+```
+
+Result:
+```
++-----------------------------+-----+-----+-----+------+-----+--------+
+| Time|jexl1|jexl2|jexl3| jexl4|jexl5| jexl6|
++-----------------------------+-----+-----+-----+------+-----+--------+
+|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 0.0| 0.0| 10.0| 10.0str|
+|1970-01-01T08:00:00.001+08:00| 2.0| 3.0| 1.0| 100.0| 21.0| 21.0str|
+|1970-01-01T08:00:00.002+08:00| 4.0| 6.0| 4.0| 200.0| 32.0| 32.0str|
+|1970-01-01T08:00:00.003+08:00| 6.0| 9.0| 9.0| 300.0| 43.0| 43.0str|
+|1970-01-01T08:00:00.004+08:00| 8.0| 12.0| 16.0| 400.0| 54.0| 54.0str|
+|1970-01-01T08:00:00.005+08:00| 10.0| 15.0| 25.0| 500.0| 65.0| 65.0str|
+|1970-01-01T08:00:00.006+08:00| 12.0| 18.0| 36.0| 600.0| 76.0| 76.0str|
+|1970-01-01T08:00:00.007+08:00| 14.0| 21.0| 49.0| 700.0| 87.0| 87.0str|
+|1970-01-01T08:00:00.008+08:00| 16.0| 24.0| 64.0| 800.0| 98.0| 98.0str|
+|1970-01-01T08:00:00.009+08:00| 18.0| 27.0| 81.0| 900.0|109.0|109.0str|
+|1970-01-01T08:00:00.010+08:00| 20.0| 30.0|100.0|1000.0|120.0|120.0str|
++-----------------------------+-----+-----+-----+------+-----+--------+
+Total line number = 11
+It costs 0.118s
+```
+
+
+
+## Conditional Expressions
+
+### CASE
+
+The CASE expression is a kind of conditional expression that can be used to return different values based on specific conditions, similar to the if-else statements in other languages.
+
+The CASE expression consists of the following parts:
+
+- CASE keyword: Indicates the start of the CASE expression.
+- WHEN-THEN clauses: There may be multiple clauses used to define conditions and give results. This clause is divided into two parts, WHEN and THEN. The WHEN part defines the condition, and the THEN part defines the result expression. If the WHEN condition is true, the corresponding THEN result is returned.
+- ELSE clause: If none of the WHEN conditions is true, the result in the ELSE clause will be returned. The ELSE clause can be omitted.
+- END keyword: Indicates the end of the CASE expression.
+
+The CASE expression is a scalar operation that can be used in combination with any other scalar operation or aggregate function.
+
+In the following text, all THEN parts and ELSE clauses will be collectively referred to as result clauses.
+
+#### Syntax
+
+The CASE expression supports two formats.
+
+- Format 1:
+  ```sql
+  CASE
+    WHEN condition1 THEN expression1
+    [WHEN condition2 THEN expression2] ...
+    [ELSE expression_end]
+  END
+  ```
+  The `condition`s will be evaluated one by one.
+
+  The first `condition` that is true will return the corresponding expression.
+
+- Format 2:
+  ```sql
+  CASE caseValue
+    WHEN whenValue1 THEN expression1
+    [WHEN whenValue2 THEN expression2] ...
+    [ELSE expression_end]
+  END
+  ```
+  The `caseValue` will be evaluated first, and then the `whenValue`s will be evaluated one by one. The first `whenValue` that is equal to the `caseValue` will return the corresponding `expression`.
+
+  Format 2 will be transformed into an equivalent Format 1 by IoTDB.
+
+  For example, the above SQL statement will be transformed into:
+
+  ```sql
+  CASE
+    WHEN caseValue=whenValue1 THEN expression1
+    [WHEN caseValue=whenValue2 THEN expression2] ...
+    [ELSE expression_end]
+  END
+  ```
+
+If none of the conditions are true, or if none of the `whenValue`s match the `caseValue`, the `expression_end` will be returned.
+
+If there is no ELSE clause, `null` will be returned.
+
+#### Notes
+
+- In Format 1, all WHEN clauses must return a BOOLEAN type.
+- In Format 2, all WHEN values must be comparable with the CASE value.
+- All result clauses in a CASE expression must satisfy certain conditions for their return value types:
+  - BOOLEAN types cannot coexist with other types and will cause an error if present.
+  - TEXT types cannot coexist with other types and will cause an error if present.
+  - The other four numeric types can coexist, and the final result will be of DOUBLE type, with possible precision loss during conversion.
+  - If necessary, you can use the CAST function to convert the result to a type that can coexist with others.
+- The CASE expression does not implement lazy evaluation, meaning that all clauses will be evaluated.
+- The CASE expression does not support mixing with UDFs.
+- Aggregate functions cannot be used within a CASE expression, but the result of a CASE expression can be used as input for an aggregate function.
+- When using the CLI, because the CASE expression string can be lengthy, it is recommended to provide an alias for the expression using AS.
+
+#### Using Examples
+
+##### Example 1
+
+The CASE expression can be used to analyze data in an intuitive way. For example:
+- The preparation of a certain chemical product requires that the temperature and pressure be within specific ranges.
+- During the preparation process, sensors will detect the temperature and pressure, forming two time series T (temperature) and P (pressure) in IoTDB.
+
+In this application scenario, the CASE expression can indicate at which times the parameters are appropriate, at which times they are not, and why they are not.
+
+data:
+```sql
+IoTDB> select * from root.test1
++-----------------------------+------------+------------+
+| Time|root.test1.P|root.test1.T|
++-----------------------------+------------+------------+
+|2023-03-29T11:25:54.724+08:00| 1000000.0| 1025.0|
+|2023-03-29T11:26:13.445+08:00| 1000094.0| 1040.0|
+|2023-03-29T11:27:36.988+08:00| 1000095.0| 1041.0|
+|2023-03-29T11:27:56.446+08:00| 1000095.0| 1059.0|
+|2023-03-29T11:28:20.838+08:00| 1200000.0| 1040.0|
++-----------------------------+------------+------------+
+```
+
+SQL statements:
+```sql
+select T, P, case
+when 1000<T and T<1050 and 1000000<P and P<1100000 then "good!"
+when T<=1000 or T>=1050 then "bad temperature"
+when P<=1000000 or P>=1100000 then "bad pressure"
+end as `result`
+from root.test1
+```
+
+output:
+```
++-----------------------------+------------+------------+---------------+
+| Time|root.test1.T|root.test1.P| result|
++-----------------------------+------------+------------+---------------+
+|2023-03-29T11:25:54.724+08:00| 1025.0| 1000000.0| bad pressure|
+|2023-03-29T11:26:13.445+08:00| 1040.0| 1000094.0| good!|
+|2023-03-29T11:27:36.988+08:00| 1041.0| 1000095.0| good!|
+|2023-03-29T11:27:56.446+08:00| 1059.0| 1000095.0|bad temperature|
+|2023-03-29T11:28:20.838+08:00| 1040.0| 1200000.0| bad pressure|
++-----------------------------+------------+------------+---------------+
+```
+
+##### Example 2
+
+The CASE expression can achieve flexible result transformation, such as converting strings with a certain pattern to other strings.
+
+data:
+```sql
+IoTDB> select * from root.test2
++-----------------------------+--------------+
+| Time|root.test2.str|
++-----------------------------+--------------+
+|2023-03-27T18:23:33.427+08:00| abccd|
+|2023-03-27T18:23:39.389+08:00| abcdd|
+|2023-03-27T18:23:43.463+08:00| abcdefg|
++-----------------------------+--------------+
+```
+
+SQL statements:
+```sql
+select str, case
+when str like "%cc%" then "has cc"
+when str like "%dd%" then "has dd"
+else "no cc and dd" end as `result`
+from root.test2
+```
+
+output:
+```
++-----------------------------+--------------+------------+
+| Time|root.test2.str| result|
++-----------------------------+--------------+------------+
+|2023-03-27T18:23:33.427+08:00| abccd| has cc|
+|2023-03-27T18:23:39.389+08:00| abcdd| has dd|
+|2023-03-27T18:23:43.463+08:00| abcdefg|no cc and dd|
++-----------------------------+--------------+------------+
+```
+
+##### Example 3: work with aggregation functions
+
+###### Valid: aggregation function ← CASE expression
+
+The CASE expression can be used as a parameter for aggregate functions. For example, used in conjunction with the COUNT function, it can implement statistics based on multiple conditions simultaneously.
+
+data:
+```sql
+IoTDB> select * from root.test3
++-----------------------------+------------+
+| Time|root.test3.x|
++-----------------------------+------------+
+|2023-03-27T18:11:11.300+08:00| 0.0|
+|2023-03-27T18:11:14.658+08:00| 1.0|
+|2023-03-27T18:11:15.981+08:00| 2.0|
+|2023-03-27T18:11:17.668+08:00| 3.0|
+|2023-03-27T18:11:19.112+08:00| 4.0|
+|2023-03-27T18:11:20.822+08:00| 5.0|
+|2023-03-27T18:11:22.462+08:00| 6.0|
+|2023-03-27T18:11:24.174+08:00| 7.0|
+|2023-03-27T18:11:25.858+08:00| 8.0|
+|2023-03-27T18:11:27.979+08:00| 9.0|
++-----------------------------+------------+
+```
+
+SQL statements:
+
+```sql
+select
+count(case when x<=1 then 1 end) as `(-∞,1]`,
+count(case when 1<x and x<=3 then 1 end) as `(1,3]`,
+count(case when 3<x and x<=7 then 1 end) as `(3,7]`,
+count(case when 7<x then 1 end) as `(7,∞)`
+from root.test3
+```
+
+output:
+```
++------+-----+-----+-----+
+|(-∞,1]|(1,3]|(3,7]|(7,∞)|
++------+-----+-----+-----+
+|     2|    2|    4|    2|
++------+-----+-----+-----+
+```
+
+##### Example 4: using format 2
+
+data:
+```sql
+IoTDB> select * from root.test4
++-----------------------------+------------+
+| Time|root.test4.x|
++-----------------------------+------------+
+|1970-01-01T08:00:00.001+08:00| 1.0|
+|1970-01-01T08:00:00.002+08:00| 2.0|
+|1970-01-01T08:00:00.003+08:00| 3.0|
+|1970-01-01T08:00:00.004+08:00| 4.0|
++-----------------------------+------------+
+```
+
+SQL statements:
+```sql
+select x, case x when 1 then "one" when 2 then "two" else "other" end from root.test4
+```
+
+output:
+```
++-----------------------------+------------+-----------------------------------------------------------------------------------+
+| Time|root.test4.x|CASE WHEN root.test4.x = 1 THEN "one" WHEN root.test4.x = 2 THEN "two" ELSE "other"|
++-----------------------------+------------+-----------------------------------------------------------------------------------+
+|1970-01-01T08:00:00.001+08:00| 1.0| one|
+|1970-01-01T08:00:00.002+08:00| 2.0| two|
+|1970-01-01T08:00:00.003+08:00| 3.0| other|
+|1970-01-01T08:00:00.004+08:00| 4.0| other|
++-----------------------------+------------+-----------------------------------------------------------------------------------+
+```
+
+##### Example 5: type of return clauses
+
+The result clause of a CASE expression needs to satisfy certain type restrictions.
+
+In this example, we continue to use the data from Example 4.
+
+###### Invalid: BOOLEAN cannot coexist with other types
+
+SQL statements:
+```sql
+select x, case x when 1 then true when 2 then 2 end from root.test4
+```
+
+output:
+```
+Msg: 701: CASE expression: BOOLEAN and other types cannot exist at same time
+```
+
+###### Valid: Only BOOLEAN type exists
+
+SQL statements:
+```sql
+select x, case x when 1 then true when 2 then false end as `result` from root.test4
+```
+
+output:
+```
++-----------------------------+------------+------+
+| Time|root.test4.x|result|
++-----------------------------+------------+------+
+|1970-01-01T08:00:00.001+08:00| 1.0| true|
+|1970-01-01T08:00:00.002+08:00| 2.0| false|
+|1970-01-01T08:00:00.003+08:00| 3.0| null|
+|1970-01-01T08:00:00.004+08:00| 4.0| null|
++-----------------------------+------------+------+
+```
+
+###### Invalid: TEXT cannot coexist with other types
+
+SQL statements:
+```sql
+select x, case x when 1 then 1 when 2 then "str" end from root.test4
+```
+
+output:
+```
+Msg: 701: CASE expression: TEXT and other types cannot exist at same time
+```
+
+###### Valid: Only TEXT type exists
+
+See Example 1.
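+
+###### Workaround: aligning result types with CAST
+
+As noted in the Notes above, when result clauses would otherwise mix incompatible types, the CAST function can be used to convert them to a common type first. The statement below is only an illustrative sketch, not one of the original examples; it assumes the attribute-style `cast(x, 'type'='TEXT')` syntax of the CAST function so that both result clauses produce TEXT and can therefore coexist:
+
+```sql
+-- sketch only: cast the numeric branch to TEXT so it can coexist with the TEXT branch
+select x, case x
+when 1 then cast(x, 'type'='TEXT')
+when 2 then "two"
+end as `result`
+from root.test4
+```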
+ +###### Valid: Numerical types coexist + +SQL statements: +```sql +select x, case x +when 1 then 1 +when 2 then 222222222222222 +when 3 then 3.3 +when 4 then 4.4444444444444 +end as `result` +from root.test4 +``` + +output: +``` ++-----------------------------+------------+-------------------+ +| Time|root.test4.x| result| ++-----------------------------+------------+-------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0|2.22222222222222E14| +|1970-01-01T08:00:00.003+08:00| 3.0| 3.299999952316284| +|1970-01-01T08:00:00.004+08:00| 4.0| 4.44444465637207| ++-----------------------------+------------+-------------------+ +``` + diff --git a/src/UserGuide/V2.0.1/Tree/SQL-Manual/Operator-and-Expression.md b/src/UserGuide/V2.0.1/Tree/SQL-Manual/Operator-and-Expression.md new file mode 100644 index 00000000..438cd243 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/SQL-Manual/Operator-and-Expression.md @@ -0,0 +1,573 @@ + + +# Operator and Expression + +This chapter describes the operators and functions supported by IoTDB. IoTDB provides a wealth of built-in operators and functions to meet your computing needs, and supports extensions through the [User-Defined Function](../Reference/UDF-Libraries.md). + +A list of all available functions, both built-in and custom, can be displayed with `SHOW FUNCTIONS` command. + +See the documentation [Select-Expression](../Reference/Function-and-Expression.md#selector-functions) for the behavior of operators and functions in SQL. + +## OPERATORS + +### Arithmetic Operators + +| Operator | Meaning | +| -------- | ------------------------- | +| `+` | positive (unary operator) | +| `-` | negative (unary operator) | +| `*` | multiplication | +| `/` | division | +| `%` | modulo | +| `+` | addition | +| `-` | subtraction | + +For details and examples, see the document [Arithmetic Operators and Functions](../Reference/Function-and-Expression.md#arithmetic-functions). + +### Comparison Operators + +| Operator | Meaning | +| ------------------------- | ------------------------------------ | +| `>` | greater than | +| `>=` | greater than or equal to | +| `<` | less than | +| `<=` | less than or equal to | +| `==` | equal to | +| `!=` / `<>` | not equal to | +| `BETWEEN ... AND ...` | within the specified range | +| `NOT BETWEEN ... AND ...` | not within the specified range | +| `LIKE` | match simple pattern | +| `NOT LIKE` | cannot match simple pattern | +| `REGEXP` | match regular expression | +| `NOT REGEXP` | cannot match regular expression | +| `IS NULL` | is null | +| `IS NOT NULL` | is not null | +| `IN` / `CONTAINS` | is a value in the specified list | +| `NOT IN` / `NOT CONTAINS` | is not a value in the specified list | + +For details and examples, see the document [Comparison Operators and Functions](../Reference/Function-and-Expression.md#comparison-operators-and-functions). + +### Logical Operators + +| Operator | Meaning | +| --------------------------- | --------------------------------- | +| `NOT` / `!` | logical negation (unary operator) | +| `AND` / `&` / `&&` | logical AND | +| `OR`/ | / || | logical OR | + +For details and examples, see the document [Logical Operators](../Reference/Function-and-Expression.md#logical-operators). + +### Operator Precedence + +The precedence of operators is arranged as shown below from high to low, and operators on the same row have the same precedence. 
+ +```sql +!, - (unary operator), + (unary operator) +*, /, DIV, %, MOD +-, + +=, ==, <=>, >=, >, <=, <, <>, != +LIKE, REGEXP, NOT LIKE, NOT REGEXP +BETWEEN ... AND ..., NOT BETWEEN ... AND ... +IS NULL, IS NOT NULL +IN, CONTAINS, NOT IN, NOT CONTAINS +AND, &, && +OR, |, || +``` + +## BUILT-IN FUNCTIONS + +The built-in functions can be used in IoTDB without registration, and the functions in the data quality function library need to be registered by referring to the registration steps in the next chapter before they can be used. + +### Aggregate Functions + +| Function Name | Description | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | +|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------| +| SUM | Summation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| COUNT | Counts the number of data points. | All types | / | INT | +| AVG | Average. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| STDDEV | Alias for STDDEV_SAMP. Return the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| STDDEV_POP | Return the population standard deviation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| STDDEV_SAMP | Return the sample standard deviation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| VARIANCE | Alias for VAR_SAMP. Return the sample variance. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| VAR_POP | Return the population variance. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| VAR_SAMP | Return the sample variance. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| EXTREME | Finds the value with the largest absolute value. Returns a positive value if the maximum absolute value of positive and negative values is equal. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| MAX_VALUE | Find the maximum value. | INT32 INT64 FLOAT DOUBLE STRING TIMESTAMP DATE | / | Consistent with the input data type | +| MIN_VALUE | Find the minimum value. | INT32 INT64 FLOAT DOUBLE STRING TIMESTAMP DATE | / | Consistent with the input data type | +| FIRST_VALUE | Find the value with the smallest timestamp. | All data types | / | Consistent with input data type | +| LAST_VALUE | Find the value with the largest timestamp. | All data types | / | Consistent with input data type | +| MAX_TIME | Find the maximum timestamp. | All data Types | / | Timestamp | +| MIN_TIME | Find the minimum timestamp. | All data Types | / | Timestamp | +| COUNT_IF | Find the number of data points that continuously meet a given condition and the number of data points that meet the condition (represented by keep) meet the specified threshold. 
| BOOLEAN | `[keep >=/>/=/!=/</<=]threshold`: the threshold or threshold condition; using `threshold` alone is equivalent to `keep >= threshold`, and the type of `threshold` is `INT64`. `ignoreNull`: Optional, default value is `true`; if `true`, null values are ignored, meaning that a null value in the middle is skipped without interrupting the continuity; if `false`, null values are not ignored, meaning that a null value in the middle breaks the continuity | INT64 |
+| TIME_DURATION | Find the difference between the timestamp of the largest non-null value and the timestamp of the smallest non-null value in a column | All data Types | / | INT64 |
+| MODE | Find the mode. Note: 1. Having too many different values in the input series risks a memory exception; 2. If all the elements have the same number of occurrences, i.e. there is no mode, return the value with the earliest time; 3. If there are multiple modes, return the mode with the earliest time. | All data Types | / | Consistent with the input data type |
+| MAX_BY | MAX_BY(x, y) returns the value of x corresponding to the maximum value of the input y. MAX_BY(time, x) returns the timestamp when x is at its maximum value. | The first input x can be of any type, while the second input y must be of type INT32, INT64, FLOAT, DOUBLE, STRING, TIMESTAMP or DATE. | / | Consistent with the data type of the first input x. |
+| MIN_BY | MIN_BY(x, y) returns the value of x corresponding to the minimum value of the input y. MIN_BY(time, x) returns the timestamp when x is at its minimum value. | The first input x can be of any type, while the second input y must be of type INT32, INT64, FLOAT, DOUBLE, STRING, TIMESTAMP or DATE. | / | Consistent with the data type of the first input x. |
+
+For details and examples, see the document [Aggregate Functions](../Reference/Function-and-Expression.md#aggregate-functions).
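+
+For instance, based on the MAX_BY description above, a query of the following shape returns the timestamp at which a series reaches its maximum together with the maximum value itself. The sketch assumes a numeric series `root.ln.wf01.wt01.temperature`, which is only a placeholder path:
+
+```sql
+select max_by(time, temperature), max_value(temperature) from root.ln.wf01.wt01;
+```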
+ +### Arithmetic Functions + +| Function Name | Allowed Input Series Data Types | Output Series Data Type | Required Attributes | Corresponding Implementation in the Java Standard Library | +| ------------- | ------------------------------- | ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sin(double) | +| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cos(double) | +| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tan(double) | +| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#asin(double) | +| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#acos(double) | +| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#atan(double) | +| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sinh(double) | +| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cosh(double) | +| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tanh(double) | +| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toDegrees(double) | +| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toRadians(double) | +| ABS | INT32 / INT64 / FLOAT / DOUBLE | Same type as the input series | / | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | +| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#signum(double) | +| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#ceil(double) | +| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#floor(double) | +| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | 'places' : Round the significant number, positive number is the significant number after the decimal point, negative number is the significant number of whole number | Math#rint(Math#pow(10,places))/Math#pow(10,places) | +| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#exp(double) | +| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log(double) | +| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log10(double) | +| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sqrt(double) | + +For details and examples, see the document [Arithmetic Operators and Functions](../Reference/Function-and-Expression.md#arithmetic-operators-and-functions). + +### Comparison Functions + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------- | ----------------------------------------- | ----------------------- | --------------------------------------------- | +| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`: a double type variate | BOOLEAN | Return `ts_value >= threshold`. | +| IN_RANGR | INT32 / INT64 / FLOAT / DOUBLE | `lower`: DOUBLE type `upper`: DOUBLE type | BOOLEAN | Return `ts_value >= lower && value <= upper`. | + +For details and examples, see the document [Comparison Operators and Functions](../Reference/Function-and-Expression.md#comparison-operators-and-functions). 
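+
+As a quick illustration of the table above, ON_OFF turns a numeric series into a BOOLEAN series by comparing each value with `threshold`. The sketch assumes a numeric series `root.test.d1.s1`, which is only a placeholder path:
+
+```sql
+select s1, on_off(s1, 'threshold'='2') from root.test.d1;
+```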
+ +### String Processing Functions + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| --------------- |---------------------------------| ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | +| STRING_CONTAINS | TEXT STRING | `s`: string to search for | BOOLEAN | Checks whether the substring `s` exists in the string. | +| STRING_MATCHES | TEXT STRING | `regex`: Java standard library-style regular expressions. | BOOLEAN | Judges whether a string can be matched by the regular expression `regex`. | +| LENGTH | TEXT STRING | / | INT32 | Get the length of input series. | +| LOCATE | TEXT STRING | `target`: The substring to be located.
`reverse`: Indicates whether reverse locate is required. The default value is `false`, means left-to-right locate. | INT32 | Get the position of the first occurrence of substring `target` in input series. Returns -1 if there are no `target` in input. | +| STARTSWITH | TEXT STRING | `target`: The prefix to be checked. | BOOLEAN | Check whether input series starts with the specified prefix `target`. | +| ENDSWITH | TEXT STRING | `target`: The suffix to be checked. | BOOLEAN | Check whether input series ends with the specified suffix `target`. | +| CONCAT | TEXT STRING | `targets`: a series of K-V, key needs to start with `target` and be not duplicated, value is the string you want to concat.
`series_behind`: Indicates whether series behind targets. The default value is `false`. | TEXT | Concatenate input string and `target` string. | +| SUBSTRING | TEXT STRING | `from`: Indicates the start position of substring.
`for`: Indicates how many characters to stop after of substring. | TEXT | Extracts a substring of a string, starting with the first specified character and stopping after the specified number of characters.The index start at 1. | +| REPLACE | TEXT STRING | first parameter: The target substring to be replaced.
second parameter: The substring to replace with. | TEXT | Replace a substring in the input sequence with the target substring. | +| UPPER | TEXT STRING | / | TEXT | Get the string of input series with all characters changed to uppercase. | +| LOWER | TEXT STRING | / | TEXT | Get the string of input series with all characters changed to lowercase. | +| TRIM | TEXT STRING | / | TEXT | Get the string whose value is same to input series, with all leading and trailing space removed. | +| STRCMP | TEXT STRING | / | TEXT | Get the compare result of two input series. Returns `0` if series value are the same, a `negative integer` if value of series1 is smaller than series2,
a `positive integer` if value of series1 is more than series2. | + +For details and examples, see the document [String Processing](../Reference/Function-and-Expression.md#string-processing). + +### Data Type Conversion Function + +| Function Name | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | +| CAST | `type`: Output data type, INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | determined by `type` | Convert the data to the type specified by the `type` parameter. | + +For details and examples, see the document [Data Type Conversion Function](../Reference/Function-and-Expression.md#data-type-conversion-function). + +### Constant Timeseries Generating Functions + +| Function Name | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------------------------------ | -------------------------------------------- | ------------------------------------------------------------ | +| CONST | `value`: the value of the output data point `type`: the type of the output data point, it can only be INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | Determined by the required attribute `type` | Output the user-specified constant timeseries according to the attributes `value` and `type`. | +| PI | None | DOUBLE | Data point value: a `double` value of `π`, the ratio of the circumference of a circle to its diameter, which is equals to `Math.PI` in the *Java Standard Library*. | +| E | None | DOUBLE | Data point value: a `double` value of `e`, the base of the natural logarithms, which is equals to `Math.E` in the *Java Standard Library*. | + +For details and examples, see the document [Constant Timeseries Generating Functions](../Reference/Function-and-Expression.md#constant-timeseries-generating-functions). + +### Selector Functions + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- |-------------------------------------------------------------------| ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | +| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT / STRING / DATE / TIEMSTAMP | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the largest values in a time series. | +| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT / STRING / DATE / TIEMSTAMP | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the smallest values in a time series. | + +For details and examples, see the document [Selector Functions](../Reference/Function-and-Expression.md#selector-functions). 
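+
+For example, based on the attributes listed above, TOP_K with `'k'='2'` keeps the two largest values of a series and BOTTOM_K the two smallest. The path below is only a placeholder:
+
+```sql
+select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2;
+```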
+
+### Continuous Interval Functions
+
+| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description |
+| ----------------- | ------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ |
+| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`: Optional with default value `0L`; `max`: Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and durations in which the value is always 0 (false), and the duration `t` satisfies `t >= min && t <= max`. The unit of `t` is ms. |
+| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`: Optional with default value `0L`; `max`: Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and durations in which the value is never 0, and the duration `t` satisfies `t >= min && t <= max`. The unit of `t` is ms. |
+| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`: Optional with default value `1L`; `max`: Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in each interval in which the value is always 0 (false). The number of data points `n` satisfies `n >= min && n <= max`. |
+| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`: Optional with default value `1L`; `max`: Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in each interval in which the value is never 0 (false). The number of data points `n` satisfies `n >= min && n <= max`. |
+
+For details and examples, see the document [Continuous Interval Functions](../Reference/Function-and-Expression.md#continuous-interval-functions).
+
+### Variation Trend Calculation Functions
+
+| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description |
+| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ |
+| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | / | INT64 | Calculates the difference between the timestamp of a data point and the timestamp of the previous data point. There is no corresponding output for the first data point. |
+| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. |
+| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the absolute value of the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. |
+| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the rate of change of a data point compared to the previous data point; the result is equal to DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. |
+| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the absolute value of the rate of change of a data point compared to the previous data point; the result is equal to NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. |
+| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`: optional, default is true. If true, a null previous data point is skipped and the search continues forward until the first non-null value is found. If false, a null previous data point is not skipped, and the result is null because null is used in the subtraction. | DOUBLE | Calculates the difference between the value of a data point and the value of the previous data point. The first data point has no previous value, so its output is null. |
+
+For details and examples, see the document [Variation Trend Calculation Functions](../Reference/Function-and-Expression.md#variation-trend-calculation-functions).
+
+### Sample Functions
+
+| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description |
+| -------------------------------- | ------------------------------- | ------------------------------------------------------------ | ------------------------------ | ------------------------------------------------------------ |
+| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`: The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns a random sample of equal buckets that matches the sampling ratio |
+| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`: The value range is `(0, 1]`, the default is `0.1`
`type`: The value types are `avg`, `max`, `min`, `sum`, `extreme`, `variance`, the default is `avg` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket aggregation samples that match the sampling ratio | +| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket M4 samples that match the sampling ratio | +| EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | The value range of `proportion` is `(0, 1]`, the default is `0.1`
The value of `type` is `avg` or `stendis` or `cos` or `prenextdis`, the default is `avg`
The value of `number` should be greater than 0, the default is `3` | INT32 / INT64 / FLOAT / DOUBLE | Returns outlier samples in equal buckets that match the sampling ratio and the number of samples in the bucket | +| M4 | INT32 / INT64 / FLOAT / DOUBLE | Different attributes used by the size window and the time window. The size window uses attributes `windowSize` and `slidingStep`. The time window uses attributes `timeInterval`, `slidingStep`, `displayWindowBegin`, and `displayWindowEnd`. More details see below. | INT32 / INT64 / FLOAT / DOUBLE | Returns the `first, last, bottom, top` points in each sliding window. M4 sorts and deduplicates the aggregated points within the window before outputting them. | + +For details and examples, see the document [Sample Functions](../Reference/Function-and-Expression.md#sample-functions). + +### Change Points Function + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------- | ------------------- | ----------------------------- | ----------------------------------------------------------- | +| CHANGE_POINTS | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Remove consecutive identical values from an input sequence. | + +For details and examples, see the document [Time-Series](../Reference/Function-and-Expression.md#time-series-processing). + + +## LAMBDA EXPRESSION + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Series Data Type Description | +| ------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------- | ------------------------------------------------------------ | +| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr` is a lambda expression that supports standard one or multi arguments in the form `x -> {...}` or `(x, y, z) -> {...}`, e.g. `x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | Returns the input time series transformed by a lambda expression | + +For details and examples, see the document [Lambda](../Reference/Function-and-Expression.md#lambda-expression). + +## CONDITIONAL EXPRESSION + +| Expression Name | Description | +| --------------- | -------------------- | +| `CASE` | similar to "if else" | + +For details and examples, see the document [Conditional Expressions](../Reference/Function-and-Expression.md#conditional-expressions). + +## SELECT EXPRESSION + +The `SELECT` clause specifies the output of the query, consisting of several `selectExpr`. Each `selectExpr` defines one or more columns in the query result. + +**`selectExpr` is an expression consisting of time series path suffixes, constants, functions, and operators. That is, `selectExpr` can contain: ** + +- Time series path suffix (wildcards are supported) +- operator + - Arithmetic operators + - comparison operators + - Logical Operators +- function + - aggregate functions + - Time series generation functions (including built-in functions and user-defined functions) +- constant + +### Use Alias + +Since the unique data model of IoTDB, lots of additional information like device will be carried before each sensor. Sometimes, we want to query just one specific device, then these prefix information show frequently will be redundant in this situation, influencing the analysis of result set. 
At this time, we can use `AS` function provided by IoTDB, assign an alias to time series selected in query. + +For example: + +```sql +select s1 as temperature, s2 as speed from root.ln.wf01.wt01; +``` + +The result set is: + +| Time | temperature | speed | +| ---- | ----------- | ----- | +| ... | ... | ... | + + +### Operator + +See this documentation for a list of operators supported in IoTDB. + +### Function + +#### Aggregate Functions + +Aggregate functions are many-to-one functions. They perform aggregate calculations on a set of values, resulting in a single aggregated result. + +**A query that contains an aggregate function is called an aggregate query**, otherwise, it is called a time series query. + +> Please note that mixed use of `Aggregate Query` and `Timeseries Query` is not allowed. Below are examples for queries that are not allowed. +> +> ``` +> select a, count(a) from root.sg +> select sin(a), count(a) from root.sg +> select a, count(a) from root.sg group by ([10,100),10ms) +> ``` + +For the aggregation functions supported by IoTDB, see the document [Aggregate Functions](../Reference/Function-and-Expression.md#aggregate-functions). + + +#### Time Series Generation Function + +A time series generation function takes several raw time series as input and produces a list of time series as output. Unlike aggregate functions, time series generators have a timestamp column in their result sets. + +All time series generation functions accept * as input, and all can be mixed with raw time series queries. + +##### Built-in Time Series Generation Functions + +See this documentation for a list of built-in functions supported in IoTDB. + +##### User-Defined Time Series Generation Functions + +IoTDB supports function extension through User Defined Function (click for [User-Defined Function](./Database-Programming.md#udtfuser-defined-timeseries-generating-function)) capability. + +### Nested Expressions + +IoTDB supports the calculation of arbitrary nested expressions. Since time series query and aggregation query can not be used in a query statement at the same time, we divide nested expressions into two types, which are nested expressions with time series query and nested expressions with aggregation query. + +The following is the syntax definition of the `select` clause: + +```sql +selectClause + : SELECT resultColumn (',' resultColumn)* + ; + +resultColumn + : expression (AS ID)? + ; + +expression + : '(' expression ')' + | '-' expression + | expression ('*' | '/' | '%') expression + | expression ('+' | '-') expression + | functionName '(' expression (',' expression)* functionAttribute* ')' + | timeSeriesSuffixPath + | number + ; +``` + +#### Nested Expressions with Time Series Query + +IoTDB supports the calculation of arbitrary nested expressions consisting of **numbers, time series, time series generating functions (including user-defined functions) and arithmetic expressions** in the `select` clause. 
+ +##### Example + +Input1: + +```sql +select a, + b, + ((a + 1) * 2 - 1) % 2 + 1.5, + sin(a + sin(a + sin(b))), + -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 +from root.sg1; +``` + +Result1: + +``` ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Time|root.sg1.a|root.sg1.b|((((root.sg1.a + 1) * 2) - 1) % 2) + 1.5|sin(root.sg1.a + sin(root.sg1.a + sin(root.sg1.b)))|(-root.sg1.a + root.sg1.b * ((sin(root.sg1.a + root.sg1.b) * sin(root.sg1.a + root.sg1.b)) + (cos(root.sg1.a + root.sg1.b) * cos(root.sg1.a + root.sg1.b)))) + 1| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 1| 1| 2.5| 0.9238430524420609| -1.0| +|1970-01-01T08:00:00.020+08:00| 2| 2| 2.5| 0.7903505371876317| -3.0| +|1970-01-01T08:00:00.030+08:00| 3| 3| 2.5| 0.14065207680386618| -5.0| +|1970-01-01T08:00:00.040+08:00| 4| null| 2.5| null| null| +|1970-01-01T08:00:00.050+08:00| null| 5| null| null| null| +|1970-01-01T08:00:00.060+08:00| 6| 6| 2.5| -0.7288037411970916| -11.0| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +Total line number = 6 +It costs 0.048s +``` + +Input2: + +```sql +select (a + b) * 2 + sin(a) from root.sg +``` + +Result2: + +``` ++-----------------------------+----------------------------------------------+ +| Time|((root.sg.a + root.sg.b) * 2) + sin(root.sg.a)| ++-----------------------------+----------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 59.45597888911063| +|1970-01-01T08:00:00.020+08:00| 100.91294525072763| +|1970-01-01T08:00:00.030+08:00| 139.01196837590714| +|1970-01-01T08:00:00.040+08:00| 180.74511316047935| +|1970-01-01T08:00:00.050+08:00| 219.73762514629607| +|1970-01-01T08:00:00.060+08:00| 259.6951893788978| +|1970-01-01T08:00:00.070+08:00| 300.7738906815579| +|1970-01-01T08:00:00.090+08:00| 39.45597888911063| +|1970-01-01T08:00:00.100+08:00| 39.45597888911063| ++-----------------------------+----------------------------------------------+ +Total line number = 9 +It costs 0.011s +``` + +Input3: + +```sql +select (a + *) / 2 from root.sg1 +``` + +Result3: + +``` ++-----------------------------+-----------------------------+-----------------------------+ +| Time|(root.sg1.a + root.sg1.a) / 2|(root.sg1.a + root.sg1.b) / 2| ++-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.010+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.020+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.030+08:00| 3.0| 3.0| +|1970-01-01T08:00:00.040+08:00| 4.0| null| +|1970-01-01T08:00:00.060+08:00| 6.0| 6.0| ++-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.011s +``` + +Input4: + +```sql +select (a + b) * 3 from root.sg, root.ln +``` + +Result4: + 
+``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|(root.sg.a + root.sg.b) * 3|(root.sg.a + root.ln.b) * 3|(root.ln.a + root.sg.b) * 3|(root.ln.a + root.ln.b) * 3| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.010+08:00| 90.0| 270.0| 360.0| 540.0| +|1970-01-01T08:00:00.020+08:00| 150.0| 330.0| 690.0| 870.0| +|1970-01-01T08:00:00.030+08:00| 210.0| 450.0| 570.0| 810.0| +|1970-01-01T08:00:00.040+08:00| 270.0| 240.0| 690.0| 660.0| +|1970-01-01T08:00:00.050+08:00| 330.0| null| null| null| +|1970-01-01T08:00:00.060+08:00| 390.0| null| null| null| +|1970-01-01T08:00:00.070+08:00| 450.0| null| null| null| +|1970-01-01T08:00:00.090+08:00| 60.0| null| null| null| +|1970-01-01T08:00:00.100+08:00| 60.0| null| null| null| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +Total line number = 9 +It costs 0.014s +``` + +##### Explanation + +- Only when the left operand and the right operand under a certain timestamp are not `null`, the nested expressions will have an output value. Otherwise this row will not be included in the result. + - In Result1 of the Example part, the value of time series `root.sg.a` at time 40 is 4, while the value of time series `root.sg.b` is `null`. So at time 40, the value of nested expressions `(a + b) * 2 + sin(a)` is `null`. So in Result2, this row is not included in the result. +- If one operand in the nested expressions can be translated into multiple time series (For example, `*`), the result of each time series will be included in the result (Cartesian product). Please refer to Input3, Input4 and corresponding Result3 and Result4 in Example. + +##### Note + +> Please note that Aligned Time Series has not been supported in Nested Expressions with Time Series Query yet. An error message is expected if you use it with Aligned Time Series selected in a query statement. + +#### Nested Expressions Query with Aggregations + +IoTDB supports the calculation of arbitrary nested expressions consisting of **numbers, aggregations and arithmetic expressions** in the `select` clause. + +##### Example + +Aggregation query without `GROUP BY`. 
+ +Input1: + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) +from root.ln.wf01.wt01; +``` + +Result1: + +``` ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|avg(root.ln.wf01.wt01.temperature) + sum(root.ln.wf01.wt01.hardware)| ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +| 15.927999999999999| -0.21826546964855045| 16.927999999999997| -7426.0| 7441.928| ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +Total line number = 1 +It costs 0.009s +``` + +Input2: + +```sql +select avg(*), + (avg(*) + 1) * 3 / 2 -1 +from root.sg1 +``` + +Result2: + +``` ++---------------+---------------+-------------------------------------+-------------------------------------+ +|avg(root.sg1.a)|avg(root.sg1.b)|(avg(root.sg1.a) + 1) * 3 / 2 - 1 |(avg(root.sg1.b) + 1) * 3 / 2 - 1 | ++---------------+---------------+-------------------------------------+-------------------------------------+ +| 3.2| 3.4| 5.300000000000001| 5.6000000000000005| ++---------------+---------------+-------------------------------------+-------------------------------------+ +Total line number = 1 +It costs 0.007s +``` + +Aggregation with `GROUP BY`. 
+ +Input3: + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) as custom_sum +from root.ln.wf01.wt01 +GROUP BY([10, 90), 10ms); +``` + +Result3: + +``` ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +| Time|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|custom_sum| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +|1970-01-01T08:00:00.010+08:00| 13.987499999999999| 0.9888207947857667| 14.987499999999999| -3211.0| 3224.9875| +|1970-01-01T08:00:00.020+08:00| 29.6| -0.9701057337071853| 30.6| -3720.0| 3749.6| +|1970-01-01T08:00:00.030+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.040+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.050+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.060+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.070+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.080+08:00| null| null| null| null| null| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +Total line number = 8 +It costs 0.012s +``` + +##### Explanation + +- Only when the left operand and the right operand under a certain timestamp are not `null`, the nested expressions will have an output value. Otherwise this row will not be included in the result. But for nested expressions with `GROUP BY` clause, it is better to show the result of all time intervals. Please refer to Input3 and corresponding Result3 in Example. +- If one operand in the nested expressions can be translated into multiple time series (For example, `*`), the result of each time series will be included in the result (Cartesian product). Please refer to Input2 and corresponding Result2 in Example. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/SQL-Manual/SQL-Manual.md b/src/UserGuide/V2.0.1/Tree/SQL-Manual/SQL-Manual.md new file mode 100644 index 00000000..9f096743 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/SQL-Manual/SQL-Manual.md @@ -0,0 +1,1759 @@ + + +# SQL Manual + +## DATABASE MANAGEMENT + +For more details, see document [Operate-Metadata](../User-Manual/Operate-Metadata_timecho.md). 
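+
+The subsections below list the statements one by one; as a minimal end-to-end sketch (the database name `root.ln` is only an example), a database is typically created, inspected, and eventually removed like this:
+
+```sql
+IoTDB > create database root.ln
+IoTDB > show databases
+IoTDB > count databases
+IoTDB > delete database root.ln
+```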
+ +### Create Database + +```sql +IoTDB > create database root.ln +IoTDB > create database root.sgcc +``` + +### Show Databases + +```sql +IoTDB> SHOW DATABASES +IoTDB> SHOW DATABASES root.** +``` + +### Delete Database + +```sql +IoTDB > DELETE DATABASE root.ln +IoTDB > DELETE DATABASE root.sgcc +// delete all data, all timeseries and all databases +IoTDB > DELETE DATABASE root.** +``` + +### Count Databases + +```sql +IoTDB> count databases +IoTDB> count databases root.* +IoTDB> count databases root.sgcc.* +IoTDB> count databases root.sgcc +``` + +### Setting up heterogeneous databases (Advanced operations) + +#### Set heterogeneous parameters when creating a Database + +```sql +CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +#### Adjust heterogeneous parameters at run time + +```sql +ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +#### Show heterogeneous databases + +```sql +SHOW DATABASES DETAILS +``` + +### TTL + +#### Set TTL + +```sql +IoTDB> set ttl to root.ln 3600000 +IoTDB> set ttl to root.sgcc.** 3600000 +IoTDB> set ttl to root.** 3600000 +``` + +#### Unset TTL + +```sql +IoTDB> unset ttl from root.ln +IoTDB> unset ttl from root.sgcc.** +IoTDB> unset ttl from root.** +``` + +#### Show TTL + +```sql +IoTDB> SHOW ALL TTL +IoTDB> SHOW TTL ON StorageGroupNames +IoTDB> SHOW DEVICES +``` + +## DEVICE TEMPLATE + +For more details, see document [Operate-Metadata](../User-Manual/Operate-Metadata_timecho.md). + +![img](https://alioss.timecho.com/docs/img/%E6%A8%A1%E6%9D%BF.png) + + + + + +![img](https://alioss.timecho.com/docs/img/templateEN.jpg) + +### Create Device Template + +**Example 1:** Create a template containing two non-aligned timeseires + +```shell +IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +**Example 2:** Create a template containing a group of aligned timeseires + +```shell +IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) +``` + +The` lat` and `lon` measurements are aligned. + +### Set Device Template + +```sql +IoTDB> set device template t1 to root.sg1.d1 +``` + +### Activate Device Template + +```sql +IoTDB> set device template t1 to root.sg1.d1 +IoTDB> set device template t2 to root.sg1.d2 +IoTDB> create timeseries using device template on root.sg1.d1 +IoTDB> create timeseries using device template on root.sg1.d2 +``` + +### Show Device Template + +```sql +IoTDB> show device templates +IoTDB> show nodes in device template t1 +IoTDB> show paths set device template t1 +IoTDB> show paths using device template t1 +``` + +### Deactivate Device Template + +```sql +IoTDB> delete timeseries of device template t1 from root.sg1.d1 +IoTDB> deactivate device template t1 from root.sg1.d1 +IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.* +IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.* +``` + +### Unset Device Template + +```sql +IoTDB> unset device template t1 from root.sg1.d1 +``` + +### Drop Device Template + +```sql +IoTDB> drop device template t1 +``` + +### Alter Device Template + +```sql +IoTDB> alter device template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY) +``` + +## TIMESERIES MANAGEMENT + +For more details, see document [Operate-Metadata](../User-Manual/Operate-Metadata_timecho.md). 
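+
+Timeseries can be created explicitly with the statements below, or implicitly by activating a device template on a device path (see the previous section). A minimal sketch of the explicit route, reusing the example path `root.ln.wf01.wt01.status` from this manual:
+
+```sql
+IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN
+IoTDB > show timeseries root.ln.**
+IoTDB > delete timeseries root.ln.wf01.wt01.status
+```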
+ +### Create Timeseries + +```sql +IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE +``` + +- From v0.13, you can use a simplified version of the SQL statements to create timeseries: + +```sql +IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE +``` + +- Notice that when in the CREATE TIMESERIES statement the encoding method conflicts with the data type, the system gives the corresponding error prompt as shown below: + +```sql +IoTDB > create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF +error: encoding TS_2DIFF does not support BOOLEAN +``` + +### Create Aligned Timeseries + +```sql +IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) +``` + +### Delete Timeseries + +```sql +IoTDB> delete timeseries root.ln.wf01.wt01.status +IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware +IoTDB> delete timeseries root.ln.wf02.* +IoTDB> drop timeseries root.ln.wf02.* +``` + +### Show Timeseries + +```sql +IoTDB> show timeseries root.** +IoTDB> show timeseries root.ln.** +IoTDB> show timeseries root.ln.** limit 10 offset 10 +IoTDB> show timeseries root.ln.** where timeseries contains 'wf01.wt' +IoTDB> show timeseries root.ln.** where dataType=FLOAT +``` + +### Count Timeseries + +```sql +IoTDB > COUNT TIMESERIES root.** +IoTDB > COUNT TIMESERIES root.ln.** +IoTDB > COUNT TIMESERIES root.ln.*.*.status +IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' +IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 +IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 +``` + +### Tag and Attribute Management + +```sql +create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) +``` + +* Rename the tag/attribute key + +```SQL +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` + +* Reset the tag/attribute value + +```SQL +ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` + +* Delete the existing tag/attribute + +```SQL +ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +``` + +* Add new 
tags + +```SQL +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` + +* Add new attributes + +```SQL +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` + +* Upsert alias, tags and attributes + +> add alias or a new key-value if the alias or key doesn't exist, otherwise, update the old one with new value. + +```SQL +ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag3=v3, tag4=v4) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +* Show timeseries using tags. Use TAGS(tagKey) to identify the tags used as filter key + +```SQL +SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +``` + +returns all the timeseries information that satisfy the where condition and match the pathPattern. SQL statements are as follows: + +```SQL +ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c +ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 +show timeseries root.ln.** where TAGS(unit)='c' +show timeseries root.ln.** where TAGS(description) contains 'test1' +``` + +- count timeseries using tags + +```SQL +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause GROUP BY LEVEL= +``` + +returns all the number of timeseries that satisfy the where condition and match the pathPattern. SQL statements are as follows: + +```SQL +count timeseries +count timeseries root.** where TAGS(unit)='c' +count timeseries root.** where TAGS(unit)='c' group by level = 2 +``` + +create aligned timeseries + +```SQL +create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) +``` + +The execution result is as follows: + +```SQL +IoTDB> show timeseries ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| +|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +Support query: + +```SQL +IoTDB> show timeseries where TAGS(tag1)='v1' ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +The above operations are supported for timeseries tag, attribute updates, etc. + +## NODE MANAGEMENT + +For more details, see document [Operate-Metadata](../User-Manual/Operate-Metadata_timecho.md). 
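+
+The node-level statements below all operate on a path pattern; a short sketch against the `root.ln` hierarchy used in the earlier examples:
+
+```SQL
+SHOW CHILD PATHS root.ln
+SHOW CHILD NODES root.ln.wf01
+COUNT NODES root.ln.** LEVEL=2
+SHOW DEVICES root.ln.** WITH DATABASE
+```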
+ +### Show Child Paths + +```SQL +SHOW CHILD PATHS pathPattern +``` + +### Show Child Nodes + +```SQL +SHOW CHILD NODES pathPattern +``` + +### Count Nodes + +```SQL +IoTDB > COUNT NODES root.** LEVEL=2 +IoTDB > COUNT NODES root.ln.** LEVEL=2 +IoTDB > COUNT NODES root.ln.wf01.** LEVEL=3 +IoTDB > COUNT NODES root.**.temperature LEVEL=3 +``` + +### Show Devices + +```SQL +IoTDB> show devices +IoTDB> show devices root.ln.** +IoTDB> show devices root.ln.** where device contains 't' +IoTDB> show devices with database +IoTDB> show devices root.ln.** with database +``` + +### Count Devices + +```SQL +IoTDB> show devices +IoTDB> count devices +IoTDB> count devices root.ln.** +``` + +## INSERT & LOAD DATA + +### Insert Data + +For more details, see document [Write-Delete-Data](../User-Manual/Write-Delete-Data.md). + +#### Use of INSERT Statements + +- Insert Single Timeseries + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) +IoTDB > insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') +``` + +- Insert Multiple Timeseries + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (2, false, 'v2') +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (3, false, 'v3'),(4, true, 'v4') +``` + +- Use the Current System Timestamp as the Timestamp of the Data Point + +```SQL +IoTDB > insert into root.ln.wf02.wt02(status, hardware) values (false, 'v2') +``` + +#### Insert Data Into Aligned Timeseries + +```SQL +IoTDB > create aligned timeseries root.sg1.d1(s1 INT32, s2 DOUBLE) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(2, 2, 2), (3, 3, 3) +IoTDB > select * from root.sg1.d1 +``` + +### Load External TsFile Tool + +For more details, see document [Data Import](../Tools-System/Data-Import-Tool.md). + +#### Load with SQL + +1. Load a single tsfile by specifying a file path (absolute path). + +- `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` +- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` +- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` +- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1 onSuccess=delete` + + +2. Load a batch of files by specifying a folder path (absolute path). + +- `load '/Users/Desktop/data'` +- `load '/Users/Desktop/data' sglevel=1` +- `load '/Users/Desktop/data' onSuccess=delete` +- `load '/Users/Desktop/data' sglevel=1 onSuccess=delete` + +#### Load with Script + +``` +./load-rewrite.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root +``` + +## DELETE DATA + +For more details, see document [Write-Delete-Data](../User-Manual/Write-Delete-Data.md). 
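+
+Deletions are always scoped by a path (pattern) and, optionally, a time filter. A quick way to see their effect (a sketch reusing the `root.ln.wf02.wt02` device from the insert examples above) is to count the remaining points before and after the deletion:
+
+```sql
+IoTDB > select count(status) from root.ln.wf02.wt02
+IoTDB > delete from root.ln.wf02.wt02.status where time <= 2017-11-01T16:26:00
+IoTDB > select count(status) from root.ln.wf02.wt02
+```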
+ +### Delete Single Timeseries + +```sql +IoTDB > delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00; +IoTDB > delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +IoTDB > delete from root.ln.wf02.wt02.status where time < 10 +IoTDB > delete from root.ln.wf02.wt02.status where time <= 10 +IoTDB > delete from root.ln.wf02.wt02.status where time < 20 and time > 10 +IoTDB > delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10 +IoTDB > delete from root.ln.wf02.wt02.status where time > 20 +IoTDB > delete from root.ln.wf02.wt02.status where time >= 20 +IoTDB > delete from root.ln.wf02.wt02.status where time = 20 +IoTDB > delete from root.ln.wf02.wt02.status where time > 4 or time < 0 +Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic +expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND' +IoTDB > delete from root.ln.wf02.wt02.status +``` + +### Delete Multiple Timeseries + +```sql +IoTDB > delete from root.ln.wf02.wt02 where time <= 2017-11-01T16:26:00; +IoTDB > delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00; +IoTDB> delete from root.ln.wf03.wt02.status where time < now() +Msg: The statement is executed successfully. +``` + +### Delete Time Partition (experimental) + +```sql +IoTDB > DELETE PARTITION root.ln 0,1,2 +``` + +## QUERY DATA + +For more details, see document [Query-Data](../User-Manual/Query-Data.md). + +```sql +SELECT [LAST] selectExpr [, selectExpr] ... + [INTO intoItem [, intoItem] ...] + FROM prefixPath [, prefixPath] ... + [WHERE whereCondition] + [GROUP BY { + ([startTime, endTime), interval [, slidingStep]) | + LEVEL = levelNum [, levelNum] ... | + TAGS(tagKey [, tagKey] ... 
) | + VARIATION(expression[,delta][,ignoreNull=true/false]) | + CONDITION(expression,[keep>/>=/=/ select temperature from root.ln.wf01.wt01 where time < 2017-11-01T00:08:00.000 +``` + +#### Select Multiple Columns of Data Based on a Time Interval + +```sql +IoTDB > select status, temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; +``` + +#### Select Multiple Columns of Data for the Same Device According to Multiple Time Intervals + +```sql +IoTDB > select status,temperature from root.ln.wf01.wt01 where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` + +#### Choose Multiple Columns of Data for Different Devices According to Multiple Time Intervals + +```sql +IoTDB > select wf01.wt01.status,wf02.wt02.hardware from root.ln where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` + +#### Order By Time Query + +```sql +IoTDB > select * from root.ln.** where time > 1 order by time desc limit 10; +``` + +### `SELECT` CLAUSE + +#### Use Alias + +```sql +IoTDB > select s1 as temperature, s2 as speed from root.ln.wf01.wt01; +``` + +#### Nested Expressions + +##### Nested Expressions with Time Series Query + +```sql +IoTDB > select a, + b, + ((a + 1) * 2 - 1) % 2 + 1.5, + sin(a + sin(a + sin(b))), + -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 +from root.sg1; + +IoTDB > select (a + b) * 2 + sin(a) from root.sg + +IoTDB > select (a + *) / 2 from root.sg1 + +IoTDB > select (a + b) * 3 from root.sg, root.ln +``` + +##### Nested Expressions query with aggregations + +```sql +IoTDB > select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) +from root.ln.wf01.wt01; + +IoTDB > select avg(*), + (avg(*) + 1) * 3 / 2 -1 +from root.sg1 + +IoTDB > select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) as custom_sum +from root.ln.wf01.wt01 +GROUP BY([10, 90), 10ms); +``` + +#### Last Query + +```sql +IoTDB > select last status from root.ln.wf01.wt01 +IoTDB > select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00 +IoTDB > select last * from root.ln.wf01.wt01 order by timeseries desc; +IoTDB > select last * from root.ln.wf01.wt01 order by dataType desc; +``` + +### `WHERE` CLAUSE + +#### Time Filter + +```sql +IoTDB > select s1 from root.sg1.d1 where time > 2022-01-01T00:05:00.000; +IoTDB > select s1 from root.sg1.d1 where time = 2022-01-01T00:05:00.000; +IoTDB > select s1 from root.sg1.d1 where time >= 2022-01-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; +``` + +#### Value Filter + +```sql +IoTDB > select temperature from root.sg1.d1 where temperature > 36.5; +IoTDB > select status from root.sg1.d1 where status = true; +IoTDB > select temperature from root.sg1.d1 where temperature between 36.5 and 40; +IoTDB > select temperature from root.sg1.d1 where temperature not between 36.5 and 40; +IoTDB > select code from root.sg1.d1 where code in ('200', '300', '400', '500'); +IoTDB > select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); +IoTDB > select code from root.sg1.d1 where temperature is null; +IoTDB > select code from root.sg1.d1 where temperature is not null; +``` + +#### Fuzzy Query + +- Fuzzy matching using `Like` + +```sql +IoTDB > select 
* from root.sg.d1 where value like '%cc%' +IoTDB > select * from root.sg.device where value like '_b_' +``` + +- Fuzzy matching using `Regexp` + +```sql +IoTDB > select * from root.sg.d1 where value regexp '^[A-Za-z]+$' +IoTDB > select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 +``` + +### `GROUP BY` CLAUSE + +- Aggregate By Time without Specifying the Sliding Step Length + +```sql +IoTDB > select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d); +``` + +- Aggregate By Time Specifying the Sliding Step Length + +```sql +IoTDB > select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d); +``` + +- Aggregate by Natural Month + +```sql +IoTDB > select count(status) from root.ln.wf01.wt01 group by([2017-11-01T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +IoTDB > select count(status) from root.ln.wf01.wt01 group by([2017-10-31T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +- Left Open And Right Close Range + +```sql +IoTDB > select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d); +``` + +- Aggregation By Variation + +```sql +IoTDB > select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6) +IoTDB > select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, ignoreNull=false) +IoTDB > select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, 4) +IoTDB > select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6+s5, 10) +``` + +- Aggregation By Condition + +```sql +IoTDB > select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=true) +IoTDB > select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=false) +``` + +- Aggregation By Session + +```sql +IoTDB > select __endTime,count(*) from root.** group by session(1d) +IoTDB > select __endTime,sum(hardware) from root.ln.wf02.wt01 group by session(50s) having sum(hardware)>0 align by device +``` + +- Aggregation By Count + +```sql +IoTDB > select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5) +IoTDB > select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5,ignoreNull=false) +``` + +- Aggregation By Level + +```sql +IoTDB > select count(status) from root.** group by level = 1 +IoTDB > select count(status) from root.** group by level = 3 +IoTDB > select count(status) from root.** group by level = 1, 3 +IoTDB > select max_value(temperature) from root.** group by level = 0 +IoTDB > select count(*) from root.ln.** group by level = 2 +``` + +- Aggregate By Time with Level Clause + +```sql +IoTDB > select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d), level=1; +IoTDB > select count(status) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d), level=1; +``` + +- Aggregation query by one single tag + +```sql +IoTDB > SELECT AVG(temperature) FROM root.factory1.** GROUP BY TAGS(city); +``` + +- Aggregation query by multiple tags + +```sql +IoTDB > SELECT avg(temperature) FROM root.factory1.** GROUP BY TAGS(city, workshop); +``` + +- Downsampling Aggregation by tags based on Time Window + +```sql +IoTDB > SELECT avg(temperature) FROM 
root.factory1.** GROUP BY ([1000, 10000), 5s), TAGS(city, workshop); +``` + +### `HAVING` CLAUSE + +Correct: + +```sql +IoTDB > select count(s1) from root.** group by ([1,11),2ms), level=1 having count(s2) > 1 +IoTDB > select count(s1), count(s2) from root.** group by ([1,11),2ms) having count(s2) > 1 align by device +``` + +Incorrect: + +```sql +IoTDB > select count(s1) from root.** group by ([1,3),1ms) having sum(s1) > s1 +IoTDB > select count(s1) from root.** group by ([1,3),1ms) having s1 > 1 +IoTDB > select count(s1) from root.** group by ([1,3),1ms), level=1 having sum(d1.s1) > 1 +IoTDB > select count(d1.s1) from root.** group by ([1,3),1ms), level=1 having sum(s1) > 1 +``` + +### `FILL` CLAUSE + +#### `PREVIOUS` Fill + +```sql +IoTDB > select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous); +``` + +#### `PREVIOUS` FILL and specify the fill timeout threshold +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous, 2m); +``` + +#### `LINEAR` Fill + +```sql +IoTDB > select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(linear); +``` + +#### Constant Fill + +```sql +IoTDB > select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(2.0); +IoTDB > select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(true); +``` + +### `LIMIT` and `SLIMIT` CLAUSES (PAGINATION) + +#### Row Control over Query Results + +```sql +IoTDB > select status, temperature from root.ln.wf01.wt01 limit 10 +IoTDB > select status, temperature from root.ln.wf01.wt01 limit 5 offset 3 +IoTDB > select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time< 2017-11-01T00:12:00.000 limit 2 offset 3 +IoTDB > select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) limit 5 offset 3 +``` + +#### Column Control over Query Results + +```sql +IoTDB > select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 +IoTDB > select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 1 +IoTDB > select max_value(*) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) slimit 1 soffset 1 +``` + +#### Row and Column Control over Query Results + +```sql +IoTDB > select * from root.ln.wf01.wt01 limit 10 offset 100 slimit 2 soffset 0 +``` + +### `ORDER BY` CLAUSE + +#### Order by in ALIGN BY TIME mode + +```sql +IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time desc; +``` + +#### Order by in ALIGN BY DEVICE mode + +```sql +IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 order by device desc,time asc align by device; +IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time asc,device desc align by device; +IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +IoTDB > select count(*) from root.ln.** group by ((2017-11-01T00:00:00.000+08:00,2017-11-01T00:03:00.000+08:00],1m) order by device asc,time asc align by device +``` + +#### Order by arbitrary expressions + +```sql +IoTDB > select score from root.** 
order by score desc align by device +IoTDB > select score,total from root.one order by base+score+bonus desc +IoTDB > select score,total from root.one order by total desc +IoTDB > select base, score, bonus, total from root.** order by total desc NULLS Last, + score desc NULLS Last, + bonus desc NULLS Last, + time desc align by device +IoTDB > select min_value(total) from root.** order by min_value(total) asc align by device +IoTDB > select min_value(total),max_value(base) from root.** order by max_value(total) desc align by device +IoTDB > select score from root.** order by device asc, score desc, time asc align by device +``` + +### `ALIGN BY` CLAUSE + +#### Align by Device + +```sql +IoTDB > select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` + +### `INTO` CLAUSE (QUERY WRITE-BACK) + +```sql +IoTDB > select s1, s2 into root.sg_copy.d1(t1), root.sg_copy.d2(t1, t2), root.sg_copy.d1(t2) from root.sg.d1, root.sg.d2; +IoTDB > select count(s1 + s2), last_value(s2) into root.agg.count(s1_add_s2), root.agg.last_value(s2) from root.sg.d1 group by ([0, 100), 10ms); +IoTDB > select s1, s2 into root.sg_copy.d1(t1, t2), root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; +IoTDB > select s1 + s2 into root.expr.add(d1s1_d1s2), root.expr.add(d2s1_d2s2) from root.sg.d1, root.sg.d2 align by device; +``` + +- Using variable placeholders: + +```sql +IoTDB > select s1, s2 +into root.sg_copy.d1(::), root.sg_copy.d2(s1), root.sg_copy.d1(${3}), root.sg_copy.d2(::) +from root.sg.d1, root.sg.d2; + +IoTDB > select d1.s1, d1.s2, d2.s3, d3.s4 +into ::(s1_1, s2_2), root.sg.d2_2(s3_3), root.${2}_copy.::(s4) +from root.sg; + +IoTDB > select * into root.sg_bk.::(::) from root.sg.**; + +IoTDB > select s1, s2, s3, s4 +into root.backup_sg.d1(s1, s2, s3, s4), root.backup_sg.d2(::), root.sg.d3(backup_${4}) +from root.sg.d1, root.sg.d2, root.sg.d3 +align by device; + +IoTDB > select avg(s1), sum(s2) + sum(s3), count(s4) +into root.agg_${2}.::(avg_s1, sum_s2_add_s3, count_s4) +from root.** +align by device; + +IoTDB > select * into ::(backup_${4}) from root.sg.** align by device; + +IoTDB > select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; +``` + +## Maintennance +Generate the corresponding query plan: +``` +explain select s1,s2 from root.sg.d1 +``` +Execute the corresponding SQL, analyze the execution and output: +``` +explain analyze select s1,s2 from root.sg.d1 order by s1 +``` +## OPERATOR + +For more details, see document [Operator-and-Expression](./Operator-and-Expression.md). + +### Arithmetic Operators + +For details and examples, see the document [Arithmetic Operators and Functions](./Operator-and-Expression.md#arithmetic-operators). + +```sql +select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 +``` + +### Comparison Operators + +For details and examples, see the document [Comparison Operators and Functions](./Operator-and-Expression.md#comparison-operators). + +```sql +# Basic comparison operators +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; + +# `BETWEEN ... 
AND ...` operator +select temperature from root.sg1.d1 where temperature between 36.5 and 40; +select temperature from root.sg1.d1 where temperature not between 36.5 and 40; + +# Fuzzy matching operator: Use `Like` for fuzzy matching +select * from root.sg.d1 where value like '%cc%' +select * from root.sg.device where value like '_b_' + +# Fuzzy matching operator: Use `Regexp` for fuzzy matching +select * from root.sg.d1 where value regexp '^[A-Za-z]+$' +select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 +select b, b like '1%', b regexp '[0-2]' from root.test; + +# `IS NULL` operator +select code from root.sg1.d1 where temperature is null; +select code from root.sg1.d1 where temperature is not null; + +# `IN` operator +select code from root.sg1.d1 where code in ('200', '300', '400', '500'); +select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); +select a, a in (1, 2) from root.test; +``` + +### Logical Operators + +For details and examples, see the document [Logical Operators](./Operator-and-Expression.md#logical-operators). + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +## BUILT-IN FUNCTIONS + +For more details, see document [Operator-and-Expression](./Operator-and-Expression.md#built-in-functions). + +### Aggregate Functions + +For details and examples, see the document [Aggregate Functions](./Operator-and-Expression.md#aggregate-functions). + +```sql +select count(status) from root.ln.wf01.wt01; + +select count_if(s1=0 & s2=0, 3), count_if(s1=1 & s2=0, 3) from root.db.d1; +select count_if(s1=0 & s2=0, 3, 'ignoreNull'='false'), count_if(s1=1 & s2=0, 3, 'ignoreNull'='false') from root.db.d1; + +select time_duration(s1) from root.db.d1; +``` + +### Arithmetic Functions + +For details and examples, see the document [Arithmetic Operators and Functions](./Operator-and-Expression.md#arithmetic-functions). + +```sql +select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; +select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1; +``` + +### Comparison Functions + +For details and examples, see the document [Comparison Operators and Functions](./Operator-and-Expression.md#comparison-functions). + +```sql +select ts, on_off(ts, 'threshold'='2') from root.test; +select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; +``` + +### String Processing Functions + +For details and examples, see the document [String Processing](./Operator-and-Expression.md#string-processing-functions). 
+ +```sql +select s1, string_contains(s1, 's'='warn') from root.sg1.d4; +select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4; +select s1, length(s1) from root.sg1.d1 +select s1, locate(s1, "target"="1") from root.sg1.d1 +select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1 +select s1, startswith(s1, "target"="1") from root.sg1.d1 +select s1, endswith(s1, "target"="1") from root.sg1.d1 +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1 +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1 +select s1, substring(s1 from 1 for 2) from root.sg1.d1 +select s1, replace(s1, 'es', 'tt') from root.sg1.d1 +select s1, upper(s1) from root.sg1.d1 +select s1, lower(s1) from root.sg1.d1 +select s3, trim(s3) from root.sg1.d1 +select s1, s2, strcmp(s1, s2) from root.sg1.d1 +select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 +select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 +select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 +select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1 +select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1 +select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 +``` + +### Data Type Conversion Function + +For details and examples, see the document [Data Type Conversion Function](./Operator-and-Expression.md#data-type-conversion-function). + +```sql +SELECT cast(s1 as INT32) from root.sg +``` + +### Constant Timeseries Generating Functions + +For details and examples, see the document [Constant Timeseries Generating Functions](./Operator-and-Expression.md#constant-timeseries-generating-functions). + +```sql +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; +``` + +### Selector Functions + +For details and examples, see the document [Selector Functions](./Operator-and-Expression.md#selector-functions). + +```sql +select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; +``` + +### Continuous Interval Functions + +For details and examples, see the document [Continuous Interval Functions](./Operator-and-Expression.md#continuous-interval-functions). + +```sql +select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; +``` + +### Variation Trend Calculation Functions + +For details and examples, see the document [Variation Trend Calculation Functions](./Operator-and-Expression.md#variation-trend-calculation-functions). + +```sql +select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; + +SELECT DIFF(s1), DIFF(s2) from root.test; +SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root.test; +``` + +### Sample Functions + +For details and examples, see the document [Sample Functions](./Operator-and-Expression.md#sample-functions). 
+ +```sql +select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; +select equal_size_bucket_agg_sample(temperature, 'type'='avg','proportion'='0.1') as agg_avg, equal_size_bucket_agg_sample(temperature, 'type'='max','proportion'='0.1') as agg_max, equal_size_bucket_agg_sample(temperature,'type'='min','proportion'='0.1') as agg_min, equal_size_bucket_agg_sample(temperature, 'type'='sum','proportion'='0.1') as agg_sum, equal_size_bucket_agg_sample(temperature, 'type'='extreme','proportion'='0.1') as agg_extreme, equal_size_bucket_agg_sample(temperature, 'type'='variance','proportion'='0.1') as agg_variance from root.ln.wf01.wt01; +select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01; +select equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='avg', 'number'='2') as outlier_avg_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='stendis', 'number'='2') as outlier_stendis_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='cos', 'number'='2') as outlier_cos_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='prenextdis', 'number'='2') as outlier_prenextdis_sample from root.ln.wf01.wt01; + +select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1 +select M4(s1,'windowSize'='10') from root.vehicle.d1 +``` + +### Change Points Function + +For details and examples, see the document [Time-Series](./Operator-and-Expression.md#change-points-function). + +```sql +select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 +``` + +## DATA QUALITY FUNCTION LIBRARY + +For more details, see document [Operator-and-Expression](./UDF-Libraries_timecho.md). + +### Data Quality + +For details and examples, see the document [Data-Quality](./UDF-Libraries_timecho.md#data-quality). + +```sql +# Completeness +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 + +# Consistency +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 + +# Timeliness +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 + +# Validity +select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 + +# Accuracy +select Accuracy(t1,t2,t3,m1,m2,m3) from root.test +``` + +### Data Profiling + +For details and examples, see the document [Data-Profiling](./UDF-Libraries_timecho.md#data-profiling). 
+ +```sql +# ACF +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 + +# Distinct +select distinct(s2) from root.test.d2 + +# Histogram +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 + +# Integral +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 + +# IntegralAvg +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 + +# Mad +select mad(s0) from root.test +select mad(s0, "error"="0.01") from root.test + +# Median +select median(s0, "error"="0.01") from root.test + +# MinMax +select minmax(s1) from root.test + +# Mode +select mode(s2) from root.test.d2 + +# MvAvg +select mvavg(s1, "window"="3") from root.test + +# PACF +select pacf(s1, "lag"="5") from root.test + +# Percentile +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test + +# Quantile +select quantile(s0, "rank"="0.2", "K"="800") from root.test + +# Period +select period(s1) from root.test.d3 + +# QLB +select QLB(s1) from root.test.d1 + +# Resample +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 + +# Sample +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +select sample(s1,'method'='isometric','k'='5') from root.test.d1 + +# Segment +select segment(s1, "error"="0.1") from root.test + +# Skew +select skew(s1) from root.test.d1 + +# Spline +select spline(s1, "points"="151") from root.test + +# Spread +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 + +# Stddev +select stddev(s1) from root.test.d1 + +# ZScore +select zscore(s1) from root.test +``` + +### Anomaly Detection + +For details and examples, see the document [Anomaly-Detection](./UDF-Libraries_timecho.md#anomaly-detection). + +```sql +# IQR +select iqr(s1) from root.test + +# KSigma +select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 + +# LOF +select lof(s1,s2) from root.test.d1 where time<1000 +select lof(s1, "method"="series") from root.test.d1 where time<1000 + +# MissDetect +select missdetect(s2,'minlen'='10') from root.test.d2 + +# Range +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 + +# TwoSidedFilter +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test + +# Outlier +select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test + +# MasterTrain +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test + +# MasterDetect +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +### Frequency Domain + +For details and examples, see the document [Frequency-Domain](./UDF-Libraries_timecho.md#frequency-domain-analysis). 
+ +```sql +# Conv +select conv(s1,s2) from root.test.d2 + +# Deconv +select deconv(s3,s2) from root.test.d2 +select deconv(s3,s2,'result'='remainder') from root.test.d2 + +# DWT +select dwt(s1,"method"="haar") from root.test.d1 + +# FFT +select fft(s1) from root.test.d1 +select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1 + +# HighPass +select highpass(s1,'wpass'='0.45') from root.test.d1 + +# IFFT +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 + +# LowPass +select lowpass(s1,'wpass'='0.45') from root.test.d1 + +# Envelope +select envelope(s1) from root.test.d1 +``` + +### Data Matching + +For details and examples, see the document [Data-Matching](./UDF-Libraries_timecho.md#data-matching). + +```sql +# Cov +select cov(s1,s2) from root.test.d2 + +# DTW +select dtw(s1,s2) from root.test.d2 + +# Pearson +select pearson(s1,s2) from root.test.d2 + +# PtnSym +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 + +# XCorr +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +### Data Repairing + +For details and examples, see the document [Data-Repairing](./UDF-Libraries_timecho.md#data-repairing). + +```sql +# TimestampRepair +select timestamprepair(s1,'interval'='10000') from root.test.d2 +select timestamprepair(s1) from root.test.d2 + +# ValueFill +select valuefill(s1) from root.test.d2 +select valuefill(s1,"method"="previous") from root.test.d2 + +# ValueRepair +select valuerepair(s1) from root.test.d2 +select valuerepair(s1,'method'='LsGreedy') from root.test.d2 + +# MasterRepair +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test + +# SeasonalRepair +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +### Series Discovery + +For details and examples, see the document [Series-Discovery](./UDF-Libraries_timecho.md#series-discovery). + +```sql +# ConsecutiveSequences +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +select consecutivesequences(s1,s2) from root.test.d1 + +# ConsecutiveWindows +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +### Machine Learning + +For details and examples, see the document [Machine-Learning](./UDF-Libraries_timecho.md#machine-learning). + +```sql +# AR +select ar(s0,"p"="2") from root.test.d0 + +# Representation +select representation(s0,"tb"="3","vb"="2") from root.test.d0 + +# RM +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +## LAMBDA EXPRESSION + +For details and examples, see the document [Lambda](./UDF-Libraries_timecho.md#lambda-expression). + +```sql +select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01;``` +``` + +## CONDITIONAL EXPRESSION + +For details and examples, see the document [Conditional Expressions](./UDF-Libraries_timecho.md#conditional-expressions). 
+
+```sql
+select T, P, case
+when 1000<T and T<1050 and 1000000<P and P<1100000 then "good!"
+when T<=1000 or T>=1050 then "bad temperature"
+when P<=1000000 or P>=1100000 then "bad pressure"
+end as `result`
+from root.test1
+
+select str, case
+when str like "%cc%" then "has cc"
+when str like "%dd%" then "has dd"
+else "no cc and dd" end as `result`
+from root.test2
+
+select
+count(case when x<=1 then 1 end) as `(-∞,1]`,
+count(case when 1<x and x<=3 then 1 end) as `(1,3]`,
+count(case when 3<x and x<=7 then 1 end) as `(3,7]`,
+count(case when 7<x then 1 end) as `(7,+∞)`
+from root.test3
+```
+
+## CONTINUOUS QUERY (CQ)
+
+For more details, see document [Operator-and-Expression](./Operator-and-Expression.md).
+
+```sql
+CREATE (CONTINUOUS QUERY | CQ) <cq_id>
+[RESAMPLE
+  [EVERY <every_interval>]
+  [BOUNDARY <execution_boundary_time>]
+  [RANGE <start_time_offset>[, end_time_offset]]
+]
+[TIMEOUT POLICY BLOCKED|DISCARD]
+BEGIN
+  SELECT CLAUSE
+  INTO CLAUSE
+  FROM CLAUSE
+  [WHERE CLAUSE]
+  [GROUP BY(<group_by_interval>[, <sliding_step>]) [, level = <level>]]
+  [HAVING CLAUSE]
+  [FILL ({PREVIOUS | LINEAR | constant} (, interval=DURATION_LITERAL)?)]
+  [LIMIT rowLimit OFFSET rowOffset]
+  [ALIGN BY DEVICE]
+END
+```
+
+### Configuring execution intervals
+
+```sql
+CREATE CONTINUOUS QUERY cq1
+RESAMPLE EVERY 20s
+BEGIN
+  SELECT max_value(temperature)
+  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
+  FROM root.ln.*.*
+  GROUP BY(10s)
+END
+```
+
+### Configuring time range for resampling
+
+```sql
+CREATE CONTINUOUS QUERY cq2
+RESAMPLE RANGE 40s
+BEGIN
+  SELECT max_value(temperature)
+  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
+  FROM root.ln.*.*
+  GROUP BY(10s)
+END
+```
+
+### Configuring execution intervals and CQ time ranges
+
+```sql
+CREATE CONTINUOUS QUERY cq3
+RESAMPLE EVERY 20s RANGE 40s
+BEGIN
+  SELECT max_value(temperature)
+  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
+  FROM root.ln.*.*
+  GROUP BY(10s)
+  FILL(100.0)
+END
+```
+
+### Configuring end_time_offset for CQ time range
+
+```sql
+CREATE CONTINUOUS QUERY cq4
+RESAMPLE EVERY 20s RANGE 40s, 20s
+BEGIN
+  SELECT max_value(temperature)
+  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
+  FROM root.ln.*.*
+  GROUP BY(10s)
+  FILL(100.0)
+END
+```
+
+### CQ without group by clause
+
+```sql
+CREATE CONTINUOUS QUERY cq5
+RESAMPLE EVERY 20s
+BEGIN
+  SELECT temperature + 1
+  INTO root.precalculated_sg.::(temperature)
+  FROM root.ln.*.*
+  align by device
+END
+```
+
+### CQ Management
+
+#### Listing continuous queries
+
+```sql
+SHOW (CONTINUOUS QUERIES | CQS)
+```
+
+#### Dropping continuous queries
+
+```sql
+DROP (CONTINUOUS QUERY | CQ) <cq_id>
+```
+
+#### Altering continuous queries
+
+CQs can't be altered once they're created. To change a CQ, you must `DROP` it and re-`CREATE` it with the updated settings.
+
+## USER-DEFINED FUNCTION (UDF)
+
+For more details, see document [UDF-Libraries](./UDF-Libraries_timecho.md).
+
+### UDF Registration
+
+```sql
+CREATE FUNCTION <UDF-NAME> AS <CLASS-NAME> (USING URI URI-STRING)?
+```
+
+### UDF Deregistration
+
+```sql
+DROP FUNCTION <UDF-NAME>
+```
+
+### UDF Queries
+
+```sql
+SELECT example(*) from root.sg.d1
+SELECT example(s1, *) from root.sg.d1
+SELECT example(*, *) from root.sg.d1
+
+SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1;
+SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1;
+
+SELECT s1, s2, example(s1, s2) FROM root.sg.d1;
+SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN;
+SELECT s1 * example(* / s1 + s2) FROM root.sg.d1;
+SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1;
+```
+
+### Show All Registered UDFs
+
+```sql
+SHOW FUNCTIONS
+```
+
+## ADMINISTRATION MANAGEMENT
+
+For more details, see document [Operator-and-Expression](./Operator-and-Expression.md).
+
+### SQL Statements
+
+- Create user (Requires MANAGE_USER permission)
+
+```SQL
+CREATE USER <userName> <password>
+eg: CREATE USER user1 'passwd'
+```
+
+- Delete user (Requires MANAGE_USER permission)
+
+```sql
+DROP USER <userName>
+eg: DROP USER user1
+```
+
+- Create role (Requires MANAGE_ROLE permission)
+
+```sql
+CREATE ROLE <roleName>
+eg: CREATE ROLE role1
+```
+
+- Delete role (Requires MANAGE_ROLE permission)
+
+```sql
+DROP ROLE <roleName>
+eg: DROP ROLE role1
+```
+
+- Grant role to user (Requires MANAGE_ROLE permission)
+
+```sql
+GRANT ROLE <roleName> TO <userName>
+eg: GRANT ROLE admin TO user1
+```
+
+- Revoke role from user (Requires MANAGE_ROLE permission)
+
+```sql
+REVOKE ROLE <roleName> FROM <userName>
+eg: REVOKE ROLE admin FROM user1
+```
+
+- List all users (Requires MANAGE_USER permission)
+
+```sql
+LIST USER
+```
+
+- List all roles (Requires MANAGE_ROLE permission)
+
+```sql
+LIST ROLE
+```
+
+- List all users granted a specific role (Requires MANAGE_USER permission)
+
+```sql
+LIST USER OF ROLE <roleName>
+eg: LIST USER OF ROLE roleuser
+```
+
+- List all roles granted to a specific user
+
+```sql
+LIST ROLE OF USER <userName>
+eg: LIST ROLE OF USER tempuser
+```
+
+- List all privileges of a user
+
+```sql
+LIST PRIVILEGES OF USER <userName>;
+eg: LIST PRIVILEGES OF USER tempuser;
+```
+
+- List all privileges of a role
+
+```sql
+LIST PRIVILEGES OF ROLE <roleName>;
+eg: LIST PRIVILEGES OF ROLE actor;
+```
+
+- Update password
+
+```sql
+ALTER USER <userName> SET PASSWORD <password>;
+eg: ALTER USER tempuser SET PASSWORD 'newpwd';
+```
+
+### Authorization and Deauthorization
+
+```sql
+GRANT <PRIVILEGES> ON <PATHS> TO ROLE/USER <NAME> [WITH GRANT OPTION];
+eg: GRANT READ ON root.** TO ROLE role1;
+eg: GRANT READ_DATA, WRITE_DATA ON root.t1.** TO USER user1;
+eg: GRANT READ_DATA, WRITE_DATA ON root.t1.**,root.t2.** TO USER user1;
+eg: GRANT MANAGE_ROLE ON root.** TO USER user1 WITH GRANT OPTION;
+eg: GRANT ALL ON root.** TO USER user1 WITH GRANT OPTION;
+```
+
+```sql
+REVOKE <PRIVILEGES> ON <PATHS> FROM ROLE/USER <NAME>;
+eg: REVOKE READ ON root.** FROM ROLE role1;
+eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.** FROM USER user1;
+eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.**, root.t2.** FROM USER user1;
+eg: REVOKE MANAGE_ROLE ON root.** FROM USER user1;
+eg: REVOKE ALL ON ROOT.** FROM USER user1;
+```
+
+#### Delete Time Partition (experimental)
+
+```
+Eg: IoTDB > DELETE PARTITION root.ln 0,1,2
+```
+
+#### Continuous Query, CQ
+
+```
+Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END
+```
+
+#### Maintenance Command
+
+- FLUSH
+
+```
+Eg: IoTDB > flush
+```
+
+- MERGE
+
+```
+Eg: IoTDB > MERGE
+Eg: IoTDB > FULL MERGE
+```
+
+- CLEAR CACHE
+
+```sql
+Eg: IoTDB > CLEAR CACHE
+```
+
+- START REPAIR DATA
+
+```sql
+Eg: IoTDB > START REPAIR DATA
+```
+
+- STOP REPAIR DATA
+
+```sql
+Eg: IoTDB > STOP REPAIR DATA
+```
+
+- SET SYSTEM TO READONLY / WRITABLE
+
+```
+Eg: IoTDB > SET SYSTEM TO READONLY / WRITABLE
+```
+
+- Query abort
+
+```
+Eg: IoTDB > KILL QUERY 1
+```
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_apache.md b/src/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_apache.md
new file mode 100644
index 00000000..f7b68d4f
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_apache.md
@@ -0,0 +1,5244 @@
+
+
+# UDF Libraries
+
+Based on the ability of user-defined functions, IoTDB provides a series of functions for temporal data processing, including data quality, data profiling, anomaly detection, frequency domain analysis, data matching, data repairing, sequence discovery, machine learning, etc., which can meet the needs of industrial fields for temporal data processing.
+
+> Note: The functions in the current UDF library only support millisecond level timestamp accuracy.
+
+## Installation steps
+
+1. Obtain the compressed package of the UDF library JAR that is compatible with your IoTDB version.
+
+   | UDF libraries version | Supported IoTDB versions | Download link |
+   | --------------- | ----------------- | ------------------------------------------------------------ |
+   | UDF-1.3.3.zip | V1.3.3 and above | Please contact Timecho for assistance |
+   | UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact Timecho for assistance |
+
+2. Place the library-udf.jar file from the compressed package into the `ext/udf` directory of all nodes in the IoTDB cluster.
+3. In the SQL command line terminal (CLI) of IoTDB or the SQL operation interface of the visualization console (Workbench), execute the corresponding function registration statements as follows.
+4. 
Batch registration: two registration methods are available, either running the registration script or executing the full set of SQL statements.
+- Register Script
+  - Copy the registration script (register-UDF.sh or register-UDF.bat) from the compressed package to the `tools` directory of IoTDB as needed, and modify the parameters in the script (defaults are host=127.0.0.1, rpcPort=6667, user=root, pass=root);
+  - Start the IoTDB service, then run the registration script to batch register the UDFs.
+
+- All SQL statements
+  - Open the SQL file in the compressed package, copy all SQL statements, and execute them in the SQL command line terminal (CLI) of IoTDB or the SQL operation interface of the visualization console (Workbench) to batch register the UDFs.
+
+## Data Quality
+
+### Completeness
+
+#### Registration statement
+
+```sql
+create function completeness as 'org.apache.iotdb.library.dquality.UDTFCompleteness'
+```
+
+#### Usage
+
+This function is used to calculate the completeness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the completeness of each window will be output.
+
+**Name:** COMPLETENESS
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window.
++ `downtime`: Whether the downtime exception is considered in the calculation of completeness. It is 'true' or 'false' (default). When considering the downtime exception, long-term missing data will be considered as downtime exception without any influence on completeness.
+
+**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1].
+
+**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output.
+
+#### Examples
+
+##### Default Parameters
+
+With default parameters, this function will regard all input data as the same window.
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +### Consistency + +#### Registration statement + +```sql +create function consistency as 'org.apache.iotdb.library.dquality.UDTFConsistency' +``` + +#### Usage + +This function is used to calculate the consistency of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the consistency of each window will be output. + +**Name:** CONSISTENCY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +### Timeliness + +#### Registration statement + +```sql +create function timeliness as 'org.apache.iotdb.library.dquality.UDTFTimeliness' +``` + +#### Usage + +This function is used to calculate the timeliness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the timeliness of each window will be output. + +**Name:** TIMELINESS + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| ++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +### Validity + +#### Registration statement + +```sql +create function validity as 'org.apache.iotdb.library.dquality.UDTFValidity' +``` + +#### Usage + +This function is used to calculate the Validity of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the Validity of each window will be output. + +**Name:** VALIDITY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|validity(root.test.d1.s1, "window"="15")| ++-----------------------------+----------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + + + + + +## Data Profiling + +### ACF + +#### Registration statement + +```sql +create function acf as 'org.apache.iotdb.library.dprofile.UDTFACF' +``` + +#### Usage + +This function is used to calculate the auto-correlation factor of the input time series, +which equals to cross correlation between the same series. +For more information, please refer to [XCorr](./UDF-Libraries.md#xcorr) function. + +**Name:** ACF + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. +There are $2N-1$ data points in the series, and the values are interpreted in details in [XCorr](./UDF-Libraries.md#XCorr) function. + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
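+
+Concretely (a sketch, assuming the same zero-padded, length-normalized formulation as XCorr), for an input of $N$ points $x_1,\dots,x_N$ (with `null`/`NaN` treated as 0) the output value at shift $\tau$ is
+
+$$acf(\tau) = \frac{1}{N}\sum_{i} x_i \cdot x_{i+\tau}, \qquad \tau = -(N-1),\dots,N-1,$$
+
+which for the 5-point example below yields $\frac{1^2+3^2+5^2}{5}=7.0$ at $\tau=0$ and $\frac{1\cdot 3+3\cdot 5}{5}=3.6$ at $\tau=\pm 2$.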
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| null| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +### Distinct + +#### Registration statement + +```sql +create function distinct as 'org.apache.iotdb.library.dprofile.UDTFDistinct' +``` + +#### Usage + +This function returns all unique values in time series. + +**Name:** DISTINCT + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** + ++ The timestamp of the output series is meaningless. The output order is arbitrary. ++ Missing points and null points in the input series will be ignored, but `NaN` will not. ++ Case Sensitive. + + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select distinct(s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +### Histogram + +#### Registration statement + +```sql +create function histogram as 'org.apache.iotdb.library.dprofile.UDTFHistogram' +``` + +#### Usage + +This function is used to calculate the distribution histogram of a single column of numerical data. + +**Name:** HISTOGRAM + +**Input Series:** Only supports a single input sequence, the type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameters:** + ++ `min`: The lower limit of the requested data range, the default value is -Double.MAX_VALUE. ++ `max`: The upper limit of the requested data range, the default value is Double.MAX_VALUE, and the value of start must be less than or equal to end. ++ `count`: The number of buckets of the histogram, the default value is 1. It must be a positive integer. + +**Output Series:** The value of the bucket of the histogram, where the lower bound represented by the i-th bucket (index starts from 1) is $min+ (i-1)\cdot\frac{max-min}{count}$ and the upper bound is $min + i \cdot \frac{max-min}{count}$. 
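+
+To make the bucket boundaries concrete, with the parameters used in the example below (`min`=1, `max`=20, `count`=10) the bucket width is $\frac{20-1}{10}=1.9$, so bucket $i$ spans
+
+$$1+(i-1)\times 1.9 \quad\text{to}\quad 1+i\times 1.9,$$
+
+i.e. the first bucket runs from 1 to 2.9 and the last from 18.1 to 20.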
+ +**Note:** + ++ If the value is lower than `min`, it will be put into the 1st bucket. If the value is larger than `max`, it will be put into the last bucket. ++ Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +### Integral + +#### Registration statement + +```sql +create function integral as 'org.apache.iotdb.library.dprofile.UDAFIntegral' +``` + +#### Usage + +This function is used to calculate the integration of time series, +which equals to the area under the curve with time as X-axis and values as Y-axis. + +**Name:** INTEGRAL + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `unit`: The unit of time used when computing the integral. + The value should be chosen from "1S", "1s", "1m", "1H", "1d"(case-sensitive), + and each represents taking one millisecond / second / minute / hour / day as 1.0 while calculating the area and integral. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the integration. + +**Note:** + ++ The integral value equals to the sum of the areas of right-angled trapezoids consisting of each two adjacent points and the time-axis. + Choosing different `unit` implies different scaling of time axis, thus making it apparent to convert the value among those results with constant coefficient. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + +#### Examples + +##### Default Parameters + +With default parameters, this function will take one second as 1.0. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + +##### Specific time unit + +With time unit specified as "1m", this function will take one minute as 1.0. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +### IntegralAvg + +#### Registration statement + +```sql +create function integralavg as 'org.apache.iotdb.library.dprofile.UDAFIntegralAvg' +``` + +#### Usage + +This function is used to calculate the function average of time series. +The output equals to the area divided by the time interval using the same time `unit`. +For more information of the area under the curve, please refer to `Integral` function. + +**Name:** INTEGRALAVG + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the time-weighted average. + +**Note:** + ++ The time-weighted value equals to the integral value with any `unit` divided by the time interval of input series. + The result is irrelevant to the time unit used in integral, and it's consistent with the timestamp precision of IoTDB by default. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + ++ If the input series is empty, the output value will be 0.0, but if there is only one data point, the value will equal to the input value. 
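+
+As a quick consistency check on the unit-independence noted above, take the numbers from the Integral example in the previous section: with seconds as the unit the area is $57.5$ over a 10-second interval, giving $57.5/10 = 5.75$; with minutes as the unit the area is $0.958$ over a $10/60$-minute interval, giving $0.958/(10/60) \approx 5.75$, the same time-weighted average either way.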
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +### Mad + +#### Registration statement + +```sql +create function mad as 'org.apache.iotdb.library.dprofile.UDAFMad' +``` + +#### Usage + +The function is used to compute the exact or approximate median absolute deviation (MAD) of a numeric time series. MAD is the median of the deviation of each element from the elements' median. + +Take a dataset $\{1,3,3,5,5,6,7,8,9\}$ as an instance. Its median is 5 and the deviation of each element from the median is $\{0,0,1,2,2,2,3,4,4\}$, whose median is 2. Therefore, the MAD of the original dataset is 2. + +**Name:** MAD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The relative error of the approximate MAD. It should be within [0,1) and the default value is 0. Taking `error`=0.01 as an instance, suppose the exact MAD is $a$ and the approximate MAD is $b$, we have $0.99a \le b \le 1.01a$. With `error`=0, the output is the exact MAD. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the MAD. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +##### Exact Query + +With the default `error`(`error`=0), the function queries the exact MAD. + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +............ 
+Total line number = 20 +``` + +SQL for query: + +```sql +select mad(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|median(root.test.s1, "error"="0")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +##### Approximate Query + +By setting `error` within (0,1), the function queries the approximate MAD. + +SQL for query: + +```sql +select mad(s1, "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s1, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9900000000000001| ++-----------------------------+---------------------------------+ +``` + +### Median + +#### Registration statement + +```sql +create function median as 'org.apache.iotdb.library.dprofile.UDAFMedian' +``` + +#### Usage + +The function is used to compute the exact or approximate median of a numeric time series. Median is the value separating the higher half from the lower half of a data sample. + +**Name:** MEDIAN + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The rank error of the approximate median. It should be within [0,1) and the default value is 0. For instance, a median with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact median. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the median. + +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +Total line number = 20 +``` + +SQL for query: + +```sql +select median(s1, "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|median(root.test.s1, "error"="0.01")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+------------------------------------+ +``` + +### MinMax + +#### Registration statement + +```sql +create function minmax as 'org.apache.iotdb.library.dprofile.UDTFMinMax' +``` + +#### Usage + +This function is used to standardize the input series with min-max. Minimum value is transformed to 0; maximum value is transformed to 1. 
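+
+In other words, each point $x$ is rescaled as
+
+$$x' = \frac{x - \min}{\max - \min},$$
+
+where $\min$ and $\max$ are either computed from the whole series (batch mode) or supplied by the user (stream mode); e.g. in the batch example below, the value $0$ with $\min=-2$ and $\max=10$ is mapped to $\frac{0-(-2)}{10-(-2)} \approx 0.167$.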
+
+**Name:** MINMAX
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `compute`: When set to "batch", the transformation is performed after all data points are imported; when set to "stream", it is required to provide the minimum and maximum values. The default method is "batch".
++ `min`: The minimum value when method is set to "stream".
++ `max`: The maximum value when method is set to "stream".
+
+**Output Series:** Output a single series. The type is DOUBLE.
+
+#### Examples
+
+##### Batch computing
+
+Input series:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s1|
++-----------------------------+------------+
+|1970-01-01T08:00:00.100+08:00|         0.0|
+|1970-01-01T08:00:00.200+08:00|         0.0|
+|1970-01-01T08:00:00.300+08:00|         1.0|
+|1970-01-01T08:00:00.400+08:00|        -1.0|
+|1970-01-01T08:00:00.500+08:00|         0.0|
+|1970-01-01T08:00:00.600+08:00|         0.0|
+|1970-01-01T08:00:00.700+08:00|        -2.0|
+|1970-01-01T08:00:00.800+08:00|         2.0|
+|1970-01-01T08:00:00.900+08:00|         0.0|
+|1970-01-01T08:00:01.000+08:00|         0.0|
+|1970-01-01T08:00:01.100+08:00|         1.0|
+|1970-01-01T08:00:01.200+08:00|        -1.0|
+|1970-01-01T08:00:01.300+08:00|        -1.0|
+|1970-01-01T08:00:01.400+08:00|         1.0|
+|1970-01-01T08:00:01.500+08:00|         0.0|
+|1970-01-01T08:00:01.600+08:00|         0.0|
+|1970-01-01T08:00:01.700+08:00|        10.0|
+|1970-01-01T08:00:01.800+08:00|         2.0|
+|1970-01-01T08:00:01.900+08:00|        -2.0|
+|1970-01-01T08:00:02.000+08:00|         0.0|
++-----------------------------+------------+
+```
+
+SQL for query:
+
+```sql
+select minmax(s1) from root.test
+```
+
+Output series:
+
+```
++-----------------------------+--------------------+
+|                         Time|minmax(root.test.s1)|
++-----------------------------+--------------------+
+|1970-01-01T08:00:00.100+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.200+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.300+08:00|                0.25|
+|1970-01-01T08:00:00.400+08:00| 0.08333333333333333|
+|1970-01-01T08:00:00.500+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.600+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.700+08:00|                 0.0|
+|1970-01-01T08:00:00.800+08:00|  0.3333333333333333|
+|1970-01-01T08:00:00.900+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.000+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.100+08:00|                0.25|
+|1970-01-01T08:00:01.200+08:00| 0.08333333333333333|
+|1970-01-01T08:00:01.300+08:00| 0.08333333333333333|
+|1970-01-01T08:00:01.400+08:00|                0.25|
+|1970-01-01T08:00:01.500+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.600+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.700+08:00|                 1.0|
+|1970-01-01T08:00:01.800+08:00|  0.3333333333333333|
+|1970-01-01T08:00:01.900+08:00|                 0.0|
+|1970-01-01T08:00:02.000+08:00| 0.16666666666666666|
++-----------------------------+--------------------+
+```
+
+### MvAvg
+
+#### Registration statement
+
+```sql
+create function mvavg as 'org.apache.iotdb.library.dprofile.UDTFMvAvg'
+```
+
+#### Usage
+
+This function is used to calculate the moving average of the input series.
+
+**Name:** MVAVG
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `window`: Length of the moving window. Default value is 10.
+
+**Output Series:** Output a single series. The type is DOUBLE.
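+
+Judging from the example below with `window`="3", each output point appears to be the plain arithmetic mean of the current point and the preceding `window`-1 points, e.g. the first output is $\frac{0.0+0.0+1.0}{3} \approx 0.333$ at time 300.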
+ +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +### PACF + +#### Registration statement + +```sql +create function pacf as 'org.apache.iotdb.library.dprofile.UDTFPACF' +``` + +#### Usage + +This function is used to calculate partial autocorrelation of input series by solving Yule-Walker equation. For some cases, the equation may not be solved, and NaN will be output. + +**Name:** PACF + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lag`: Maximum lag of pacf to calculate. The default value is $\min(10\log_{10}n,n-1)$, where $n$ is the number of data points. + +**Output Series:** Output a single series. The type is DOUBLE. 
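+
+For instance, with $n=100$ data points the default maximum lag is $\min(10\log_{10}100,\ 99)=20$.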
+
+#### Examples
+
+##### Assigning maximum lag
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d1.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00|              1|
+|2020-01-01T00:00:02.000+08:00|            NaN|
+|2020-01-01T00:00:03.000+08:00|              3|
+|2020-01-01T00:00:04.000+08:00|            NaN|
+|2020-01-01T00:00:05.000+08:00|              5|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select pacf(s1, "lag"="5") from root.test.d1
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------------+
+|                         Time|pacf(root.test.d1.s1, "lag"="5")|
++-----------------------------+--------------------------------+
+|2020-01-01T00:00:01.000+08:00|                             1.0|
+|2020-01-01T00:00:02.000+08:00|             -0.5744680851063829|
+|2020-01-01T00:00:03.000+08:00|              0.3172297297297296|
+|2020-01-01T00:00:04.000+08:00|             -0.2977686586304181|
+|2020-01-01T00:00:05.000+08:00|             -2.0609033521065867|
++-----------------------------+--------------------------------+
+```
+
+### Percentile
+
+#### Registration statement
+
+```sql
+create function percentile as 'org.apache.iotdb.library.dprofile.UDAFPercentile'
+```
+
+#### Usage
+
+The function is used to compute the exact or approximate percentile of a numeric time series. A percentile is the value of the element at a certain rank in the sorted series.
+
+**Name:** PERCENTILE
+
+**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameter:**
+
++ `rank`: The rank percentage of the percentile. It should be within (0,1] and the default value is 0.5. For instance, a percentile with `rank`=0.5 is the median.
++ `error`: The rank error of the approximate percentile. It should be within [0,1) and the default value is 0. For instance, a 0.5-percentile with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact percentile.
+
+**Output Series:** Output a single series. The type is the same as the input series. If `error`=0, there is only one data point in the series; its timestamp is that of the first data point whose value equals the percentile, and its value is the percentile. Otherwise, the timestamp of the only data point is 0.
+
+**Note:** Missing points, null points and `NaN` in the input series will be ignored.
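+
+For example, on a series of 20 values, `rank`=0.2 corresponds roughly to the 4th smallest value, which is why the query in the example below returns -1.0.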
+ +#### Examples + +Input series: + +``` ++-----------------------------+-------------+ +| Time|root.test2.s1| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+-------------+ +Total line number = 20 +``` + +SQL for query: + +```sql +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|percentile(root.test2.s1, "rank"="0.2", "error"="0.01")| ++-----------------------------+-------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| -1.0| ++-----------------------------+-------------------------------------------------------+ +``` + +### Quantile + +#### Registration statement + +```sql +create function quantile as 'org.apache.iotdb.library.dprofile.UDAFQuantile' +``` + +#### Usage + +The function is used to compute the approximate quantile of a numeric time series. A quantile is value of element in the certain rank of the sorted series. + +**Name:** QUANTILE + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `rank`: The rank of the quantile. It should be (0,1] and the default value is 0.5. For instance, a quantile with `rank`=0.5 is the median. ++ `K`: The size of KLL sketch maintained in the query. It should be within [100,+inf) and the default value is 800. For instance, the 0.5-quantile computed by a KLL sketch with K=800 items is a value with rank quantile 0.49~0.51 with a confidence of at least 99%. The result will be more accurate as K increases. + +**Output Series:** Output a single series. The type is the same as input series. The timestamp of the only data point is 0. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+-------------+ +| Time|root.test1.s1| ++-----------------------------+-------------+ +|2021-03-17T10:32:17.054+08:00| 7| +|2021-03-17T10:32:18.054+08:00| 15| +|2021-03-17T10:32:19.054+08:00| 36| +|2021-03-17T10:32:20.054+08:00| 39| +|2021-03-17T10:32:21.054+08:00| 40| +|2021-03-17T10:32:22.054+08:00| 41| +|2021-03-17T10:32:23.054+08:00| 20| +|2021-03-17T10:32:24.054+08:00| 18| ++-----------------------------+-------------+ +............ 
+Total line number = 8 +``` + +SQL for query: + +```sql +select quantile(s1, "rank"="0.2", "K"="800") from root.test1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------+ +| Time|quantile(root.test1.s1, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.000000000000001| ++-----------------------------+------------------------------------------------+ +``` + +### Period + +#### Registration statement + +```sql +create function period as 'org.apache.iotdb.library.dprofile.UDAFPeriod' +``` + +#### Usage + +The function is used to compute the period of a numeric time series. + +**Name:** PERIOD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is INT32. There is only one data point in the series, whose timestamp is 0 and value is the period. + +#### Examples + +Input series: + + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select period(s1) from root.test.d3 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +### QLB + +#### Registration statement + +```sql +create function qlb as 'org.apache.iotdb.library.dprofile.UDTFQLB' +``` + +#### Usage + +This function is used to calculate Ljung-Box statistics $Q_{LB}$ for time series, and convert it to p value. + +**Name:** QLB + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters**: + +`lag`: max lag to calculate. Legal input shall be integer from 1 to n-2, where n is the sample number. Default value is n-2. + +**Output Series:** Output a single series. The type is DOUBLE. The output series is p value, and timestamp means lag. + +**Note:** If you want to calculate Ljung-Box statistics $Q_{LB}$ instead of p value, you may use ACF function. 
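+
+For reference, the statistic at lag $k$ is the usual Ljung-Box form $Q_{LB}(k) = n(n+2)\sum_{i=1}^{k}\frac{\hat{\rho}_i^2}{n-i}$, where $\hat{\rho}_i$ is the sample autocorrelation at lag $i$ and $n$ is the sample size; the reported value is the corresponding p value under a $\chi^2_k$ distribution.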
+ +#### Examples + +##### Using Default Parameter + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select QLB(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| +|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +### Resample + +#### Registration statement + +```sql +create function re_sample as 'org.apache.iotdb.library.dprofile.UDTFResample' +``` + +#### Usage + +This function is used to resample the input series according to a given frequency, +including up-sampling and down-sampling. +Currently, the supported up-sampling methods are +NaN (filling with `NaN`), +FFill (filling with previous value), +BFill (filling with next value) and +Linear (filling with linear interpolation). +Down-sampling relies on group aggregation, +which supports Max, Min, First, Last, Mean and Median. + +**Name:** RESAMPLE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + + ++ `every`: The frequency of resampling, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. ++ `interp`: The interpolation method of up-sampling, which is 'NaN', 'FFill', 'BFill' or 'Linear'. By default, NaN is used. ++ `aggr`: The aggregation method of down-sampling, which is 'Max', 'Min', 'First', 'Last', 'Mean' or 'Median'. By default, Mean is used. 
++ `start`: The start time (inclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the first valid data point. ++ `end`: The end time (exclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the last valid data point. + +**Output Series:** Output a single series. The type is DOUBLE. It is strictly equispaced with the frequency `every`. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + +##### Up-sampling + +When the frequency of resampling is higher than the original frequency, up-sampling starts. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +SQL for query: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +##### Down-sampling + +When the frequency of resampling is lower than the original frequency, down-sampling starts. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + + +##### Specify the time period + +The time period of resampling can be specified with `start` and `end`. +The period outside the actual time range will be interpolated. 
+ +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +### Sample + +#### Registration statement + +```sql +create function sample as 'org.apache.iotdb.library.dprofile.UDTFSample' +``` + +#### Usage + +This function is used to sample the input series, +that is, select a specified number of data points from the input series and output them. +Currently, three sampling methods are supported: +**Reservoir sampling** randomly selects data points. +All of the points have the same probability of being sampled. +**Isometric sampling** selects data points at equal index intervals. +**Triangle sampling** assigns data points to the buckets based on the number of sampling. +Then it calculates the area of the triangle based on these points inside the bucket and selects the point with the largest area of the triangle. +For more detail, please read [paper](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf) + +**Name:** SAMPLE + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Parameters:** + ++ `method`: The method of sampling, which is 'reservoir', 'isometric' or 'triangle'. By default, reservoir sampling is used. ++ `k`: The number of sampling, which is a positive integer. By default, it's 1. + +**Output Series:** Output a single series. The type is the same as the input. The length of the output series is `k`. Each data point in the output series comes from the input series. + +**Note:** If `k` is greater than the length of input series, all data points in the input series will be output. + +#### Examples + +##### Reservoir Sampling + +When `method` is 'reservoir' or the default, reservoir sampling is used. +Due to the randomness of this method, the output series shown below is only a possible result. 
+ + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + +##### Isometric Sampling + +When `method` is 'isometric', isometric sampling is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +### Segment + +#### Registration statement + +```sql +create function segment as 'org.apache.iotdb.library.dprofile.UDTFSegment' +``` + +#### Usage + +This function is used to segment a time series into subsequences according to linear trend, and returns linear fitted values of first values in each subsequence or every data point. + +**Name:** SEGMENT + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `output` :"all" to output all fitted points; "first" to output first fitted points in each subsequence. + ++ `error`: error allowed at linear regression. It is defined as mean absolute error of a subsequence. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note:** This function treat input series as equal-interval sampled. All data are loaded, so downsample input series first if there are too many data points. 
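+
+To obtain a fitted value for every input point instead of only the first point of each segment, `output` can be set to "all". A hedged sketch of such a query (result not shown here):
+
+```sql
+select segment(s1, "output"="all", "error"="0.1") from root.test
+```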
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select segment(s1, "error"="0.1") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +### Skew + +#### Registration statement + +```sql +create function skew as 'org.apache.iotdb.library.dprofile.UDAFSkew' +``` + +#### Usage + +This function is used to calculate the population skewness. + +**Name:** SKEW + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population skewness. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
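+
+For reference, the population skewness of a sample $x_1,\dots,x_n$ with mean $\mu$ and population standard deviation $\sigma$ is commonly defined as $\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^3$.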
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select skew(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +### Spline + +#### Registration statement + +```sql +create function spline as 'org.apache.iotdb.library.dprofile.UDTFSpline' +``` + +#### Usage + +This function is used to calculate cubic spline interpolation of input series. + +**Name:** SPLINE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `points`: Number of resampling points. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note**: Output series retains the first and last timestamps of input series. Interpolation points are selected at equal intervals. The function tries to calculate only when there are no less than 4 points in input series. 
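+
+For instance, in the example below `points`=151 over a 1.5 second span yields one interpolated value every 10 ms (1500 ms / 150 intervals).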
+ +#### Examples + +##### Assigning number of interpolation points + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select spline(s1, "points"="151") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| +|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 
1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| +|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 
1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +### Spread + +#### Registration statement + +```sql +create function spread as 'org.apache.iotdb.library.dprofile.UDAFSpread' +``` + +#### Usage + +This function is used to calculate the spread of time series, that is, the maximum value minus the minimum value. + +**Name:** SPREAD + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is 0 and value is the spread. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
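+
+In the example below, the maximum is 126.0 and the minimum is 100.0, so the spread is 126.0 - 100.0 = 26.0.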
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + + + +### ZScore + +#### Registration statement + +```sql +create function zscore as 'org.apache.iotdb.library.dprofile.UDTFZScore' +``` + +#### Usage + +This function is used to standardize the input series with z-score. + +**Name:** ZSCORE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `compute`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide mean and standard deviation. The default method is "batch". ++ `avg`: Mean value when method is set to "stream". ++ `sd`: Standard deviation when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. 
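+
+Each point $x_i$ is transformed to $z_i = (x_i - \mu)/\sigma$; in batch mode $\mu$ and $\sigma$ are computed from the whole series, while in stream mode the user-provided `avg` and `sd` are used. A hedged sketch of a stream-mode query (the numeric values are illustrative only):
+
+```sql
+select zscore(s1, "compute"="stream", "avg"="0.5", "sd"="2.42") from root.test
+```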
+ +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select zscore(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` + + +## Anomaly Detection + +### IQR + +#### Registration statement + +```sql +create function iqr as 'org.apache.iotdb.library.anomaly.UDTFIQR' +``` + +#### Usage + +This function is used to detect anomalies based on IQR. Points distributing beyond 1.5 times IQR are selected. + +**Name:** IQR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `method`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide upper and lower quantiles. The default method is "batch". ++ `q1`: The lower quantile when method is set to "stream". ++ `q3`: The upper quantile when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. 
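+
+In other words, a point $x$ is reported as an anomaly when $x < Q_1 - 1.5\,IQR$ or $x > Q_3 + 1.5\,IQR$.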
+ +**Note:** $IQR=Q_3-Q_1$ + +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select iqr(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+-----------------+ +| Time|iqr(root.test.s1)| ++-----------------------------+-----------------+ +|1970-01-01T08:00:01.700+08:00| 10.0| ++-----------------------------+-----------------+ +``` + +### KSigma + +#### Registration statement + +```sql +create function ksigma as 'org.apache.iotdb.library.anomaly.UDTFKSigma' +``` + +#### Usage + +This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm. +Within a sliding window, the input value with a deviation of more than k times the standard deviation from the average will be output as anomaly. + +**Name:** KSIGMA + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `k`: How many times to multiply on standard deviation to define anomaly, the default value is 3. ++ `window`: The window size of Dynamic K-Sigma Algorithm, the default value is 10000. + +**Output Series:** Output a single series. The type is same as input series. + +**Note:** Only when is larger than 0, the anomaly detection will be performed. Otherwise, nothing will be output. 
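+
+That is, within the sliding window a point $x_i$ is output as an anomaly when $|x_i - \mu| > k\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of the values currently in the window.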
+ +#### Examples + +##### Assigning k + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:04.000+08:00| 100.0| +|2020-01-01T00:00:06.000+08:00| 150.0| +|2020-01-01T00:00:08.000+08:00| 200.0| +|2020-01-01T00:00:10.000+08:00| 200.0| +|2020-01-01T00:00:14.000+08:00| 200.0| +|2020-01-01T00:00:15.000+08:00| 200.0| +|2020-01-01T00:00:16.000+08:00| 200.0| +|2020-01-01T00:00:18.000+08:00| 200.0| +|2020-01-01T00:00:20.000+08:00| 150.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +|Time |ksigma(root.test.d1.s1,"k"="3.0")| ++-----------------------------+---------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +### LOF + +#### Registration statement + +```sql +create function LOF as 'org.apache.iotdb.library.anomaly.UDTFLOF' +``` + +#### Usage + +This function is used to detect density anomaly of time series. According to k-th distance calculation parameter and local outlier factor (lof) threshold, the function judges if a set of input values is an density anomaly, and a bool mark of anomaly values will be output. + +**Name:** LOF + +**Input Series:** Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `method`:assign a detection method. The default value is "default", when input data has multiple dimensions. The alternative is "series", when a input series will be transformed to high dimension. ++ `k`:use the k-th distance to calculate lof. Default value is 3. ++ `window`: size of window to split origin data points. Default value is 10000. ++ `windowsize`:dimension that will be transformed into when method is "series". The default value is 5. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note:** Incomplete rows will be ignored. They are neither calculated nor marked as anomaly. 
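+
+When `method` is "series", the dimension used for the transformation can be tuned with `windowsize`. A hedged sketch of such a query (assuming the single series from the second example below, result not shown):
+
+```sql
+select lof(s1, "method"="series", "windowsize"="3") from root.test.d1
+```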
+ +#### Examples + +##### Using default parameters + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +##### Diagnosing 1d timeseries + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| +|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 
3.77777777777778| ++-----------------------------+--------------------+ +``` + +### MissDetect + +#### Registration statement + +```sql +create function missdetect as 'org.apache.iotdb.library.anomaly.UDTFMissDetect' +``` + +#### Usage + +This function is used to detect missing anomalies. +In some datasets, missing values are filled by linear interpolation. +Thus, there are several long perfect linear segments. +By discovering these perfect linear segments, +missing anomalies are detected. + +**Name:** MISSDETECT + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + +`error`: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10. + +**Output Series:** Output a single series. The type is BOOLEAN. Each data point which is miss anomaly will be labeled as true. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 0.0| +|2021-07-01T12:00:01.000+08:00| 1.0| +|2021-07-01T12:00:02.000+08:00| 0.0| +|2021-07-01T12:00:03.000+08:00| 1.0| +|2021-07-01T12:00:04.000+08:00| 0.0| +|2021-07-01T12:00:05.000+08:00| 0.0| +|2021-07-01T12:00:06.000+08:00| 0.0| +|2021-07-01T12:00:07.000+08:00| 0.0| +|2021-07-01T12:00:08.000+08:00| 0.0| +|2021-07-01T12:00:09.000+08:00| 0.0| +|2021-07-01T12:00:10.000+08:00| 0.0| +|2021-07-01T12:00:11.000+08:00| 0.0| +|2021-07-01T12:00:12.000+08:00| 0.0| +|2021-07-01T12:00:13.000+08:00| 0.0| +|2021-07-01T12:00:14.000+08:00| 0.0| +|2021-07-01T12:00:15.000+08:00| 0.0| +|2021-07-01T12:00:16.000+08:00| 1.0| +|2021-07-01T12:00:17.000+08:00| 0.0| +|2021-07-01T12:00:18.000+08:00| 1.0| +|2021-07-01T12:00:19.000+08:00| 0.0| +|2021-07-01T12:00:20.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select missdetect(s2,'minlen'='10') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|missdetect(root.test.d2.s2, "minlen"="10")| ++-----------------------------+------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| false| +|2021-07-01T12:00:01.000+08:00| false| +|2021-07-01T12:00:02.000+08:00| false| +|2021-07-01T12:00:03.000+08:00| false| +|2021-07-01T12:00:04.000+08:00| true| +|2021-07-01T12:00:05.000+08:00| true| +|2021-07-01T12:00:06.000+08:00| true| +|2021-07-01T12:00:07.000+08:00| true| +|2021-07-01T12:00:08.000+08:00| true| +|2021-07-01T12:00:09.000+08:00| true| +|2021-07-01T12:00:10.000+08:00| true| +|2021-07-01T12:00:11.000+08:00| true| +|2021-07-01T12:00:12.000+08:00| true| +|2021-07-01T12:00:13.000+08:00| true| +|2021-07-01T12:00:14.000+08:00| true| +|2021-07-01T12:00:15.000+08:00| true| +|2021-07-01T12:00:16.000+08:00| false| +|2021-07-01T12:00:17.000+08:00| false| +|2021-07-01T12:00:18.000+08:00| false| +|2021-07-01T12:00:19.000+08:00| false| +|2021-07-01T12:00:20.000+08:00| false| ++-----------------------------+------------------------------------------+ +``` + +### Range + +#### Registration statement + +```sql +create function range as 'org.apache.iotdb.library.anomaly.UDTFRange' +``` + +#### Usage + +This function is used to detect range anomaly of time series. According to upper bound and lower bound parameters, the function judges if a input value is beyond range, aka range anomaly, and a new time series of anomaly will be output. 
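+As the example below suggests, values strictly below `lower_bound` or strictly above `upper_bound` are output, while values equal to either bound are treated as normal.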
+ +**Name:** RANGE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lower_bound`:lower bound of range anomaly detection. ++ `upper_bound`:upper bound of range anomaly detection. + +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** Only when `upper_bound` is larger than `lower_bound`, the anomaly detection will be performed. Otherwise, nothing will be output. + + + +#### Examples + +##### Assigning Lower and Upper Bound + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:28.000+08:00| 126.0| ++-----------------------------+------------------------------------------------------------------+ +``` + +### TwoSidedFilter + +#### Registration statement + +```sql +create function twosidedfilter as 'org.apache.iotdb.library.anomaly.UDTFTwoSidedFilter' +``` + +#### Usage + +The function is used to filter anomalies of a numeric time series based on two-sided window detection. + +**Name:** TWOSIDEDFILTER + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE + +**Output Series:** Output a single series. The type is the same as the input. It is the input without anomalies. + +**Parameter:** + +- `len`: The size of the window, which is a positive integer. By default, it's 5. When `len`=3, the algorithm detects forward window and backward window with length 3 and calculates the outlierness of the current point. + +- `threshold`: The threshold of outlierness, which is a floating number in (0,1). By default, it's 0.3. The strict standard of detecting anomalies is in proportion to the threshold. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:00:37.000+08:00| 1484.0| +|1970-01-01T08:00:38.000+08:00| 1055.0| +|1970-01-01T08:00:39.000+08:00| 1050.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test +``` + +Output series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +### Outlier + +#### Registration statement + +```sql +create function outlier as 'org.apache.iotdb.library.anomaly.UDTFOutlier' +``` + +#### Usage + +This function is used to detect distance-based outliers. For each point in the current window, if the number of its neighbors within the distance of neighbor distance threshold is less than the neighbor count threshold, the point in detected as an outlier. + +**Name:** OUTLIER + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `r`:the neighbor distance threshold. ++ `k`:the neighbor count threshold. ++ `w`:the window size. ++ `s`:the slide size. + +**Output Series:** Output a single series. The type is the same as the input. 
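+
+For intuition about the detection rule, here is a minimal, self-contained sketch of distance-based outlier detection over sliding windows. It only illustrates the rule described above under simple assumptions; it is not the UDF's actual implementation.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+public class DistanceOutlierSketch {
+    // Returns the values in `window` that have fewer than k neighbors
+    // within distance r (the point itself is not counted as a neighbor).
+    static List<Double> detect(double[] window, double r, int k) {
+        List<Double> outliers = new ArrayList<>();
+        for (int i = 0; i < window.length; i++) {
+            int neighbors = 0;
+            for (int j = 0; j < window.length; j++) {
+                if (i != j && Math.abs(window[i] - window[j]) <= r) {
+                    neighbors++;
+                }
+            }
+            if (neighbors < k) {
+                outliers.add(window[i]);
+            }
+        }
+        return outliers;
+    }
+
+    public static void main(String[] args) {
+        double[] series = {56.0, 55.1, 54.2, 56.3, 59.0, 60.0, 60.5, 64.5, 69.0, 64.2,
+                           62.3, 58.0, 58.9, 52.0, 62.3, 61.0, 64.2, 61.8, 64.0, 63.0};
+        int w = 10, s = 5;
+        // Slide a window of size w with step s, in the spirit of the `w` and `s` parameters.
+        for (int start = 0; start + w <= series.length; start += s) {
+            double[] window = new double[w];
+            System.arraycopy(series, start, window, 0, w);
+            System.out.println("window at " + start + " -> outliers " + detect(window, 5.0, 4));
+        }
+    }
+}
+```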
+ +#### Examples + +##### Assigning Parameters of Queries + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2020-01-04T23:59:55.000+08:00| 56.0| +|2020-01-04T23:59:56.000+08:00| 55.1| +|2020-01-04T23:59:57.000+08:00| 54.2| +|2020-01-04T23:59:58.000+08:00| 56.3| +|2020-01-04T23:59:59.000+08:00| 59.0| +|2020-01-05T00:00:00.000+08:00| 60.0| +|2020-01-05T00:00:01.000+08:00| 60.5| +|2020-01-05T00:00:02.000+08:00| 64.5| +|2020-01-05T00:00:03.000+08:00| 69.0| +|2020-01-05T00:00:04.000+08:00| 64.2| +|2020-01-05T00:00:05.000+08:00| 62.3| +|2020-01-05T00:00:06.000+08:00| 58.0| +|2020-01-05T00:00:07.000+08:00| 58.9| +|2020-01-05T00:00:08.000+08:00| 52.0| +|2020-01-05T00:00:09.000+08:00| 62.3| +|2020-01-05T00:00:10.000+08:00| 61.0| +|2020-01-05T00:00:11.000+08:00| 64.2| +|2020-01-05T00:00:12.000+08:00| 61.8| +|2020-01-05T00:00:13.000+08:00| 64.0| +|2020-01-05T00:00:14.000+08:00| 63.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:03.000+08:00| 69.0| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:08.000+08:00| 52.0| ++-----------------------------+--------------------------------------------------------+ +``` + + +### MasterTrain + +#### Usage + +This function is used to train the VAR model based on master data. The model is trained on learning samples consisting of p+1 consecutive non-error points. + +**Name:** MasterTrain + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn spotless:apply`. +- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`. +- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'` in client. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ +``` + +### MasterDetect + +#### Usage + +This function is used to detect 
time series and repair errors based on master data. The VAR model is trained by MasterTrain. + +**Name:** MasterDetect + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. ++ `eta`: The detection threshold. By default, it will be estimated based on the 3-sigma rule. ++ `output_type`: The type of output. 'repair' for repairing and 'anomaly' for anomaly detection. ++ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn spotless:apply`. +- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`. +- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'` in client. + +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| 
null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +##### Repairing + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +##### Anomaly Detection + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| true| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| 
+|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + + + +## Frequency Domain Analysis + +### Conv + +#### Registration statement + +```sql +create function conv as 'org.apache.iotdb.library.frequency.UDTFConv' +``` + +#### Usage + +This function is used to calculate the convolution, i.e. polynomial multiplication. + +**Name:** CONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output:** Output a single series. The type is DOUBLE. It is the result of convolution whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +### Deconv + +#### Registration statement + +```sql +create function deconv as 'org.apache.iotdb.library.frequency.UDTFDeconv' +``` + +#### Usage + +This function is used to calculate the deconvolution, i.e. polynomial division. + +**Name:** DECONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `result`: The result of deconvolution, which is 'quotient' or 'remainder'. By default, the quotient will be output. + +**Output:** Output a single series. The type is DOUBLE. It is the result of deconvolving the second series from the first series (dividing the first series by the second series) whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + + +##### Calculate the quotient + +When `result` is 'quotient' or the default, this function calculates the quotient of the deconvolution. 
+
+Input series:
+
+```
++-----------------------------+---------------+---------------+
+|                         Time|root.test.d2.s3|root.test.d2.s2|
++-----------------------------+---------------+---------------+
+|1970-01-01T08:00:00.000+08:00|            8.0|            7.0|
+|1970-01-01T08:00:00.001+08:00|            2.0|            2.0|
+|1970-01-01T08:00:00.002+08:00|            7.0|           null|
+|1970-01-01T08:00:00.003+08:00|            2.0|           null|
++-----------------------------+---------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select deconv(s3,s2) from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+----------------------------------------+
+|                         Time|deconv(root.test.d2.s3, root.test.d2.s2)|
++-----------------------------+----------------------------------------+
+|1970-01-01T08:00:00.000+08:00|                                     1.0|
+|1970-01-01T08:00:00.001+08:00|                                     0.0|
+|1970-01-01T08:00:00.002+08:00|                                     1.0|
++-----------------------------+----------------------------------------+
+```
+
+##### Calculate the remainder
+
+When `result` is 'remainder', this function calculates the remainder of the deconvolution.
+
+Input series is the same as above, the SQL for query is shown below:
+
+
+```sql
+select deconv(s3,s2,'result'='remainder') from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------------------------------------------+
+|                         Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")|
++-----------------------------+--------------------------------------------------------------+
+|1970-01-01T08:00:00.000+08:00|                                                           1.0|
+|1970-01-01T08:00:00.001+08:00|                                                           0.0|
+|1970-01-01T08:00:00.002+08:00|                                                           0.0|
+|1970-01-01T08:00:00.003+08:00|                                                           0.0|
++-----------------------------+--------------------------------------------------------------+
+```
+
+### DWT
+
+#### Registration statement
+
+```sql
+create function dwt as 'org.apache.iotdb.library.frequency.UDTFDWT'
+```
+
+#### Usage
+
+This function is used to calculate the one-dimensional discrete wavelet transform of a numerical series.
+
+**Name:** DWT
+
+**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `method`: The type of wavelet. The available values are 'Haar', 'DB4', 'DB6' and 'DB8', where DB stands for Daubechies; the value is case-insensitive. Users may instead provide the wavelet coefficients via `coef` and omit this parameter.
++ `coef`: Coefficients of the wavelet transform. When providing this parameter, separate them with commas ',' and leave no spaces or other punctuation.
++ `layer`: The number of transform layers. The number of output vectors equals $layer+1$. By default, it is 1.
+
+**Output:** Output a single series. The type is DOUBLE. The length is the same as the input.
+
+**Note:** The length of the input series must be an integer power of 2.
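+
+For intuition about the 'Haar' case used in the example below, here is a minimal sketch of a single Haar transform layer: adjacent pairs are mapped to scaled sums (approximation coefficients) followed by scaled differences (detail coefficients). It is only an illustration of the underlying arithmetic, not the library code.
+
+```java
+public class HaarDwtSketch {
+    // One Haar layer: the first half of the result holds the approximation
+    // coefficients (a+b)/sqrt(2), the second half the detail coefficients
+    // (a-b)/sqrt(2). The input length must be even.
+    static double[] haarLayer(double[] x) {
+        double[] out = new double[x.length];
+        double s = Math.sqrt(2.0);
+        for (int i = 0; i < x.length / 2; i++) {
+            out[i] = (x[2 * i] + x[2 * i + 1]) / s;
+            out[x.length / 2 + i] = (x[2 * i] - x[2 * i + 1]) / s;
+        }
+        return out;
+    }
+
+    public static void main(String[] args) {
+        // The same 16 values as the Haar example below.
+        double[] x = {0.0, 0.2, 1.5, 1.2, 0.6, 1.7, 0.8, 2.0,
+                      2.5, 2.1, 0.0, 2.0, 1.8, 1.2, 1.0, 1.6};
+        for (double v : haarLayer(x)) {
+            System.out.println(v);
+        }
+    }
+}
+```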
+ +#### Examples + + +##### Haar wavelet transform + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| +|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| ++-----------------------------+-------------------------------------+ +``` + +### FFT + +#### Registration statement + +```sql +create function fft as 'org.apache.iotdb.library.frequency.UDTFFFT' +``` + +#### Usage + +This function is used to calculate the fast Fourier transform (FFT) of a numerical series. + +**Name:** FFT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of FFT, which is 'uniform' (by default) or 'nonuniform'. If the value is 'uniform', the timestamps will be ignored and all data points will be regarded as equidistant. Thus, the equidistant fast Fourier transform algorithm will be applied. If the value is 'nonuniform' (TODO), the non-equidistant fast Fourier transform algorithm will be applied based on timestamps. ++ `result`: The result of FFT, which is 'real', 'imag', 'abs' or 'angle', corresponding to the real part, imaginary part, magnitude and phase angle. By default, the magnitude will be output. ++ `compress`: The parameter of compression, which is within (0,1]. It is the reserved energy ratio of lossy compression. By default, there is no compression. + + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. The timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + + +##### Uniform FFT + +With the default `type`, uniform FFT is applied. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select fft(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 19.999999960195904| +|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| +|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| +|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| ++-----------------------------+----------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, there are peaks in $k=4$ and $k=5$ of the output. 
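+
+To see why the peaks appear at $k=4$ and $k=5$, the following sketch evaluates the plain DFT magnitude of a length-20 series built from one component with 4 cycles per window and one with 5 cycles per window. It is a numerical illustration only, not the library's FFT implementation.
+
+```java
+public class DftMagnitudeSketch {
+    public static void main(String[] args) {
+        int n = 20;
+        double[] x = new double[n];
+        // One component with period 5 and amplitude 2 (4 cycles per window)
+        // plus one component with period 4 and amplitude 1 (5 cycles per window).
+        for (int t = 0; t < n; t++) {
+            x[t] = 2 * Math.sin(2 * Math.PI * t / 5.0) + Math.sin(2 * Math.PI * t / 4.0);
+        }
+        // Plain DFT magnitude; expect about 20 at k = 4 and about 10 at k = 5.
+        for (int k = 0; k < n; k++) {
+            double re = 0, im = 0;
+            for (int t = 0; t < n; t++) {
+                double angle = -2 * Math.PI * k * t / n;
+                re += x[t] * Math.cos(angle);
+                im += x[t] * Math.sin(angle);
+            }
+            System.out.printf("k=%2d |X_k|=%.6f%n", k, Math.hypot(re, im));
+        }
+    }
+}
+```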
+
+##### Uniform FFT with Compression
+
+Input series is the same as above, the SQL for query is shown below:
+
+```sql
+select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1
+```
+
+Output series:
+
+```
++-----------------------------+----------------------+----------------------+
+|                         Time|  fft(root.test.d1.s1,|  fft(root.test.d1.s1,|
+|                             |      "result"="real",|      "result"="imag",|
+|                             |    "compress"="0.99")|    "compress"="0.99")|
++-----------------------------+----------------------+----------------------+
+|1970-01-01T08:00:00.000+08:00|                   0.0|                   0.0|
+|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8|
+|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7|
+|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8|  5.127422242345858E-8|
+|1970-01-01T08:00:00.004+08:00|    19.021130288047125|    -6.180339875198807|
+|1970-01-01T08:00:00.005+08:00|     9.999999850988388| 3.501852745067114E-16|
+|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8|
++-----------------------------+----------------------+----------------------+
+```
+
+Note: Because the Fourier transform of a real series is conjugate-symmetric, only the first half of the compressed result is reserved.
+According to the given parameter, data points are reserved from low frequency to high frequency until the reserved energy ratio exceeds it.
+The last data point is reserved to indicate the length of the series.
+
+### HighPass
+
+#### Registration statement
+
+```sql
+create function highpass as 'org.apache.iotdb.library.frequency.UDTFHighPass'
+```
+
+#### Usage
+
+This function performs high-pass filtering on the input series and extracts components above the cutoff frequency.
+The timestamps of input will be ignored and all data points will be regarded as equidistant.
+
+**Name:** HIGHPASS
+
+**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `wpass`: The normalized cutoff frequency, which takes values in (0,1). This parameter is required.
+
+**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input.
+
+**Note:** `NaN` in the input series will be ignored.
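+
+As a rough mental model of the `wpass` parameter, the following sketch applies an idealized frequency-domain high-pass filter: it removes every DFT bin whose normalized frequency is below the cutoff and transforms back. This is a sketch under the assumption of an ideal filter; the library's actual filter design may differ.
+
+```java
+public class IdealHighPassSketch {
+    // Ideal high-pass: zero every DFT bin whose normalized frequency
+    // min(k, N - k) / (N / 2) is below wpass, then inverse-transform.
+    static double[] highPass(double[] x, double wpass) {
+        int n = x.length;
+        double[] re = new double[n], im = new double[n];
+        for (int k = 0; k < n; k++) {
+            for (int t = 0; t < n; t++) {
+                double a = -2 * Math.PI * k * t / n;
+                re[k] += x[t] * Math.cos(a);
+                im[k] += x[t] * Math.sin(a);
+            }
+        }
+        for (int k = 0; k < n; k++) {
+            if (Math.min(k, n - k) / (n / 2.0) < wpass) {
+                re[k] = 0;
+                im[k] = 0;
+            }
+        }
+        double[] y = new double[n];
+        for (int t = 0; t < n; t++) {
+            for (int k = 0; k < n; k++) {
+                double a = 2 * Math.PI * k * t / n;
+                y[t] += (re[k] * Math.cos(a) - im[k] * Math.sin(a)) / n; // real part of the inverse DFT
+            }
+        }
+        return y;
+    }
+
+    public static void main(String[] args) {
+        int n = 20;
+        double[] x = new double[n];
+        for (int t = 0; t < n; t++) {
+            // Slow component (period 5, normalized frequency 0.4) plus
+            // fast component (period 4, normalized frequency 0.5).
+            x[t] = 2 * Math.sin(2 * Math.PI * t / 5.0) + Math.sin(2 * Math.PI * t / 4.0);
+        }
+        // With wpass = 0.45, only the period-4 component survives.
+        for (double v : highPass(x, 0.45)) {
+            System.out.printf("%.6f%n", v);
+        }
+    }
+}
+```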
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=sin(2\pi t/4)$ after high-pass filtering. + +### IFFT + +#### Registration statement + +```sql +create function ifft as 'org.apache.iotdb.library.frequency.UDTFIFFT' +``` + +#### Usage + +This function treats the two input series as the real and imaginary part of a complex series, performs an inverse fast Fourier transform (IFFT), and outputs the real part of the result. +For the input format, please refer to the output format of `FFT` function. +Moreover, the compressed output of `FFT` function is also supported. + +**Name:** IFFT + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `start`: The start time of the output series with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is '1970-01-01 08:00:00'. ++ `interval`: The interval of the output series, which is a positive number with an unit. 
The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it is 1s. + +**Output:** Output a single series. The type is DOUBLE. It is strictly equispaced. The values are the results of IFFT. + +**Note:** If a row contains null points or `NaN`, it will be ignored. + +#### Examples + + +Input series: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +SQL for query: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| ++-----------------------------+-------------------------------------------------------+ +``` + +### LowPass + +#### Registration statement + +```sql +create function lowpass as 'org.apache.iotdb.library.frequency.UDTFLowPass' +``` + +#### Usage + +This function performs low-pass filtering on the input series and extracts components below the cutoff frequency. +The timestamps of input will be ignored and all data points will be regarded as equidistant. + +**Name:** LOWPASS + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `wpass`: The normalized cutoff frequency which values (0,1). This parameter cannot be lacked. + +**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input. + +**Note:** `NaN` in the input series will be ignored. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=2sin(2\pi t/5)$ after low-pass filtering. + + + +## Data Matching + +### Cov + +#### Registration statement + +```sql +create function cov as 'org.apache.iotdb.library.dmatch.UDAFCov' +``` + +#### Usage + +This function is used to calculate the population covariance. + +**Name:** COV + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population covariance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
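+
+A minimal sketch of the computation (population covariance over the rows where both values are present and finite), consistent with the notes above; it is an illustration rather than the UDF's source:
+
+```java
+public class PopulationCovSketch {
+    // Population covariance: E[XY] - E[X]E[Y] over valid rows.
+    static double cov(Double[] x, Double[] y) {
+        double sumX = 0, sumY = 0, sumXY = 0;
+        int n = 0;
+        for (int i = 0; i < x.length; i++) {
+            if (x[i] == null || y[i] == null || x[i].isNaN() || y[i].isNaN()) {
+                continue; // skip rows with missing, null or NaN values
+            }
+            sumX += x[i];
+            sumY += y[i];
+            sumXY += x[i] * y[i];
+            n++;
+        }
+        if (n == 0) {
+            return Double.NaN; // all rows ignored
+        }
+        return sumXY / n - (sumX / n) * (sumY / n);
+    }
+
+    public static void main(String[] args) {
+        Double[] s1 = {100.0, 101.0, 102.0, 104.0, 126.0};
+        Double[] s2 = {101.0, null, 101.0, 102.0, Double.NaN};
+        System.out.println(cov(s1, s2)); // rows 2 and 5 are skipped
+    }
+}
+```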
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +### DTW + +#### Registration statement + +```sql +create function dtw as 'org.apache.iotdb.library.dmatch.UDAFDtw' +``` + +#### Usage + +This function is used to calculate the DTW distance between two input series. + +**Name:** DTW + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the DTW distance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `0` will be output. 
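+
+For reference, here is a minimal dynamic-programming sketch of the DTW distance with the absolute difference as the point-wise cost, which is a common formulation of the measure; it is shown as an illustration, not as the UDF's exact code:
+
+```java
+public class DtwSketch {
+    // Classic DTW: d[i][j] = |a[i-1] - b[j-1]| + min(d[i-1][j-1], d[i-1][j], d[i][j-1]).
+    static double dtw(double[] a, double[] b) {
+        int n = a.length, m = b.length;
+        double[][] d = new double[n + 1][m + 1];
+        for (double[] row : d) {
+            java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
+        }
+        d[0][0] = 0;
+        for (int i = 1; i <= n; i++) {
+            for (int j = 1; j <= m; j++) {
+                double cost = Math.abs(a[i - 1] - b[j - 1]);
+                d[i][j] = cost + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
+            }
+        }
+        return d[n][m];
+    }
+
+    public static void main(String[] args) {
+        double[] a = new double[20];
+        double[] b = new double[20];
+        java.util.Arrays.fill(a, 1.0);
+        java.util.Arrays.fill(b, 2.0);
+        System.out.println(dtw(a, b)); // 20.0, matching the example below
+    }
+}
+```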
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +### Pearson + +#### Registration statement + +```sql +create function pearson as 'org.apache.iotdb.library.dmatch.UDAFPearson' +``` + +#### Usage + +This function is used to calculate the Pearson Correlation Coefficient. + +**Name:** PEARSON + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the Pearson Correlation Coefficient. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
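+
+A minimal sketch of the Pearson correlation coefficient over the valid rows (the covariance of the two series divided by the product of their standard deviations); it is an illustration only:
+
+```java
+public class PearsonSketch {
+    // Pearson correlation over rows where both values are present and finite.
+    static double pearson(Double[] x, Double[] y) {
+        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
+        int n = 0;
+        for (int i = 0; i < x.length; i++) {
+            if (x[i] == null || y[i] == null || x[i].isNaN() || y[i].isNaN()) {
+                continue; // skip rows with missing, null or NaN values
+            }
+            sx += x[i]; sy += y[i];
+            sxx += x[i] * x[i]; syy += y[i] * y[i];
+            sxy += x[i] * y[i];
+            n++;
+        }
+        double num = n * sxy - sx * sy;
+        double den = Math.sqrt(n * sxx - sx * sx) * Math.sqrt(n * syy - sy * sy);
+        return num / den; // NaN when every row is ignored
+    }
+
+    public static void main(String[] args) {
+        Double[] s1 = {100.0, 101.0, 102.0, 104.0, 126.0, 108.0};
+        Double[] s2 = {101.0, null, 101.0, 102.0, 102.0, 103.0};
+        System.out.println(pearson(s1, s2));
+    }
+}
+```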
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +### PtnSym + +#### Registration statement + +```sql +create function ptnsym as 'org.apache.iotdb.library.dmatch.UDTFPtnSym' +``` + +#### Usage + +This function is used to find all symmetric subseries in the input whose degree of symmetry is less than the threshold. +The degree of symmetry is calculated by DTW. +The smaller the degree, the more symmetrical the series is. + +**Name:** PATTERNSYMMETRIC + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameter:** + ++ `window`: The length of the symmetric subseries. It's a positive integer and the default value is 10. ++ `threshold`: The threshold of the degree of symmetry. It's non-negative. Only the subseries whose degree of symmetry is below it will be output. By default, all subseries will be output. + + +**Output Series:** Output a single series. The type is DOUBLE. Each data point in the output series corresponds to a symmetric subseries. The output timestamp is the starting timestamp of the subseries and the output value is the degree of symmetry. 
+ +#### Example + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| +|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +### XCorr + +#### Registration statement + +```sql +create function xcorr as 'org.apache.iotdb.library.dmatch.UDTFXCorr' +``` + +#### Usage + +This function is used to calculate the cross correlation function of given two time series. +For discrete time series, cross correlation is given by +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +which represent the similarities between two series with different index shifts. + +**Name:** XCORR + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series with DOUBLE as datatype. +There are $2N-1$ data points in the series, the center of which represents the cross correlation +calculated with pre-aligned series(that is $CR(0)$ in the formula above), +and the previous(or post) values represent those with shifting the latter series forward(or backward otherwise) +until the two series are no longer overlapped(not included). +In short, the values of output series are given by(index starts from 1) +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
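+
+The following sketch evaluates the output definition above directly, with `null` and `NaN` already replaced by 0; it is a plain illustration of the formula rather than the UDF implementation:
+
+```java
+public class XCorrSketch {
+    // Evaluates OS[i] for i = 1..2N-1 exactly as defined above (1-based indices in the formula).
+    static double[] xcorr(double[] s1, double[] s2) {
+        int n = s1.length;
+        double[] os = new double[2 * n - 1];
+        for (int i = 1; i <= 2 * n - 1; i++) {
+            double sum = 0;
+            if (i <= n) {
+                for (int m = 1; m <= i; m++) {
+                    sum += s1[m - 1] * s2[n - i + m - 1];
+                }
+            } else {
+                for (int m = 1; m <= 2 * n - i; m++) {
+                    sum += s1[i - n + m - 1] * s2[m - 1];
+                }
+            }
+            os[i - 1] = sum / n;
+        }
+        return os;
+    }
+
+    public static void main(String[] args) {
+        // The five points of the example below, with null and NaN treated as 0.
+        double[] s1 = {0, 2, 3, 4, 5};
+        double[] s2 = {6, 7, 0, 9, 10};
+        for (double v : xcorr(s1, s2)) {
+            System.out.println(v); // 0.0, 4.0, 9.6, 13.4, 20.0, 15.6, 9.2, 11.8, 6.0
+        }
+    }
+}
+```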
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` + + + +## Data Repairing + +### TimestampRepair + +#### Registration statement + +```sql +create function timestamprepair as 'org.apache.iotdb.library.drepair.UDTFTimestampRepair' +``` + +#### Usage + +This function is used for timestamp repair. +According to the given standard time interval, +the method of minimizing the repair cost is adopted. +By fine-tuning the timestamps, +the original data with unstable timestamp interval is repaired to strictly equispaced data. +If no standard time interval is given, +this function will use the **median**, **mode** or **cluster** of the time interval to estimate the standard time interval. + +**Name:** TIMESTAMPREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `interval`: The standard time interval whose unit is millisecond. It is a positive integer. By default, it will be estimated according to the given method. ++ `method`: The method to estimate the standard time interval, which is 'median', 'mode' or 'cluster'. This parameter is only valid when `interval` is not given. By default, median will be used. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +#### Examples + +##### Manually Specify the Standard Time Interval + +When `interval` is given, this function repairs according to the given standard time interval. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:19.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:01.000+08:00| 7.0| +|2021-07-01T12:01:11.000+08:00| 8.0| +|2021-07-01T12:01:21.000+08:00| 9.0| +|2021-07-01T12:01:31.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timestamprepair(s1,'interval'='10000') from root.test.d2 +``` + +Output series: + + +``` ++-----------------------------+----------------------------------------------------+ +| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| ++-----------------------------+----------------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+----------------------------------------------------+ +``` + +##### Automatically Estimate the Standard Time Interval + +When `interval` is default, this function estimates the standard time interval. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select timestamprepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------+ +| Time|timestamprepair(root.test.d2.s1)| ++-----------------------------+--------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +### ValueFill + +#### Registration statement + +```sql +create function valuefill as 'org.apache.iotdb.library.drepair.UDTFValueFill' +``` + +#### Usage + +This function is used to impute time series. Several methods are supported. + +**Name**: ValueFill +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, default "linear". + Method to use for imputation in series. "mean": use global mean value to fill holes; "previous": propagate last valid observation forward to next valid. "linear": simplest interpolation method; "likelihood":Maximum likelihood estimation based on the normal distribution of speed; "AR": auto regression; "MA": moving average; "SCREEN": speed constraint. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** AR method use AR(1) model. Input value should be auto-correlated, or the function would output a single point (0, 0.0). + +#### Examples + +##### Fill with linear + +When `method` is "linear" or the default, Screen method is used to impute. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuefill(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +##### Previous Fill + +When `method` is "previous", previous method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +### ValueRepair + +#### Registration statement + +```sql +create function valuerepair as 'org.apache.iotdb.library.drepair.UDTFValueRepair' +``` + +#### Usage + +This function is used to repair the value of the time series. +Currently, two methods are supported: +**Screen** is a method based on speed threshold, which makes all speeds meet the threshold requirements under the premise of minimum changes; +**LsGreedy** is a method based on speed change likelihood, which models speed changes as Gaussian distribution, and uses a greedy algorithm to maximize the likelihood. + + +**Name:** VALUEREPAIR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. 
+ +**Parameters:** + ++ `method`: The method used to repair, which is 'Screen' or 'LsGreedy'. By default, Screen is used. ++ `minSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds below it will be regarded as outliers. By default, it is the median minus 3 times of median absolute deviation. ++ `maxSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds above it will be regarded as outliers. By default, it is the median plus 3 times of median absolute deviation. ++ `center`: This parameter is only valid with LsGreedy. It is the center of the Gaussian distribution of speed changes. By default, it is 0. ++ `sigma`: This parameter is only valid with LsGreedy. It is the standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +#### Examples + +##### Repair with Screen + +When `method` is 'Screen' or the default, Screen method is used. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +##### Repair with LsGreedy + +When `method` is 'LsGreedy', LsGreedy method is used. 

Input series is the same as above, the SQL for query is shown below:

```sql
select valuerepair(s1,'method'='LsGreedy') from root.test.d2
```

Output series:

```
+-----------------------------+-------------------------------------------------+
|                         Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")|
+-----------------------------+-------------------------------------------------+
|2020-01-01T00:00:02.000+08:00|                                            100.0|
|2020-01-01T00:00:03.000+08:00|                                            101.0|
|2020-01-01T00:00:04.000+08:00|                                            102.0|
|2020-01-01T00:00:06.000+08:00|                                            104.0|
|2020-01-01T00:00:08.000+08:00|                                            106.0|
|2020-01-01T00:00:10.000+08:00|                                            108.0|
|2020-01-01T00:00:14.000+08:00|                                            112.0|
|2020-01-01T00:00:15.000+08:00|                                            113.0|
|2020-01-01T00:00:16.000+08:00|                                            114.0|
|2020-01-01T00:00:18.000+08:00|                                            116.0|
|2020-01-01T00:00:20.000+08:00|                                            118.0|
|2020-01-01T00:00:22.000+08:00|                                            120.0|
|2020-01-01T00:00:26.000+08:00|                                            124.0|
|2020-01-01T00:00:28.000+08:00|                                            126.0|
|2020-01-01T00:00:30.000+08:00|                                            128.0|
+-----------------------------+-------------------------------------------------+
```

### MasterRepair

#### Usage

This function is used to clean time series with master data.

**Name:** MasterRepair

**Input Series:** Support multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

**Parameters:**

+ `omega`: The window size. It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences.
+ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows.
+ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data.
+ `output_column`: The column to output after repairing. It defaults to 1, which means the repair result of the first column is output.

**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing.
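
All four parameters are optional and are normally left to their data-driven defaults. For illustration only, a call that fixes the window, the distance threshold and the neighbor count explicitly, and outputs the repair result of the second column, might look like the following sketch (the parameter values here are arbitrary placeholders, not recommendations):

```sql
select MasterRepair(t1,t2,t3,m1,m2,m3,'omega'='2000','eta'='3.0','k'='3','output_column'='2') from root.test
```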
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +SQL for query: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +Output series: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +### SeasonalRepair + +#### Usage +This function is used to repair the value of the seasonal time series via decomposition. Currently, two methods are supported: **Classical** - detect irregular fluctuations through residual component decomposed by classical decomposition, and repair them through moving average; **Improved** - detect irregular fluctuations through residual component decomposed by improved decomposition, and repair them through moving median. + +**Name:** SEASONALREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The decomposition method used to repair, which is 'Classical' or 'Improved'. By default, classical decomposition is used. ++ `period`: It is the period of the time series. ++ `k`: It is the range threshold of residual term, which limits the degree to which the residual term is off-center. By default, it is 9. ++ `max_iter`: It is the maximum number of iterations for the algorithm. By default, it is 10. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +#### Examples + +##### Repair with Classical + +When `method` is 'Classical' or default value, classical decomposition method is used. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +##### Repair with Improved +When `method` is 'Improved', improved decomposition method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` + + + +## Series Discovery + +### ConsecutiveSequences + +#### Registration statement + +```sql +create function consecutivesequences as 'org.apache.iotdb.library.series.UDTFConsecutiveSequences' +``` + +#### Usage + +This function is used to find locally longest consecutive subsequences in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive subsequence is the subsequence that is strictly equispaced with the standard time interval without any missing data. If a consecutive subsequence is not a proper subsequence of any consecutive subsequence, it is locally longest. 
+ +**Name:** CONSECUTIVESEQUENCES + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a locally longest consecutive subsequence. The output timestamp is the starting timestamp of the subsequence and the output value is the number of data points in the subsequence. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +#### Examples + +##### Manually Specify the Standard Time Interval + +It's able to manually specify the standard time interval by the parameter `gap`. It's notable that false parameter leads to false output. + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + + +##### Automatically Estimate the Standard Time Interval + +When `gap` is default, this function estimates the standard time interval by the mode of time intervals and gets the same results. Therefore, this usage is more recommended. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +### ConsecutiveWindows + +#### Registration statement + +```sql +create function consecutivewindows as 'org.apache.iotdb.library.series.UDTFConsecutiveWindows' +``` + +#### Usage + +This function is used to find consecutive windows of specified length in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. 
Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive window is the subsequence that is strictly equispaced with the standard time interval without any missing data. + +**Name:** CONSECUTIVEWINDOWS + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. ++ `length`: The length of the window which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a consecutive window. The output timestamp is the starting timestamp of the window and the output value is the number of data points in the window. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +#### Examples + + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` + + + +## Machine Learning + +### AR + +#### Registration statement + +```sql +create function ar as 'org.apache.iotdb.library.dlearn.UDTFAR' +``` + +#### Usage + +This function is used to learn the coefficients of the autoregressive models for a time series. + +**Name:** AR + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `p`: The order of the autoregressive model. Its default value is 1. + +**Output Series:** Output a single series. The type is DOUBLE. The first line corresponds to the first order coefficient, and so on. + +**Note:** + +- Parameter `p` should be a positive integer. +- Most points in the series should be sampled at a constant time interval. +- Linear interpolation is applied for the missing points in the series. 
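
As a reminder of what the output means, the autoregressive model of order $p$ (written here in its usual textbook form, not as a statement about the library's internal estimation details) expresses each value as a linear combination of the previous $p$ values plus a noise term:

$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + \cdots + \varphi_p x_{t-p} + \varepsilon_t$$

The $i$-th point of the output series is the estimate of the coefficient $\varphi_i$.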
+ +#### Examples + +##### Assigning Model Order + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +### Representation + +#### Usage + +This function is used to represent a time series. + +**Name:** Representation + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is INT32. The length is `tb*vb`. The timestamps starting from 0 only indicate the order. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. + +#### Examples + +##### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +### RM + +#### Usage + +This function is used to calculate the matching score of two time series according to the representation. + +**Name:** RM + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the matching score. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. 
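
Since both `tb` and `vb` default to 10, the minimal call only passes the two series to be compared, as in the sketch below (shown without output, which depends on the data):

```sql
select rm(s0, s1) from root.test.d0
```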
+ +#### Examples + +##### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| ++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_timecho.md b/src/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_timecho.md new file mode 100644 index 00000000..b03e86e5 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_timecho.md @@ -0,0 +1,5304 @@ + + +# UDF Libraries + +# UDF Libraries + +Based on the ability of user-defined functions, IoTDB provides a series of functions for temporal data processing, including data quality, data profiling, anomaly detection, frequency domain analysis, data matching, data repairing, sequence discovery, machine learning, etc., which can meet the needs of industrial fields for temporal data processing. + +> Note: The functions in the current UDF library only support millisecond level timestamp accuracy. + +## Installation steps + +1. Please obtain the compressed file of the UDF library JAR package that is compatible with the IoTDB version. + + | UDF libraries version | Supported IoTDB versions | Download link | + | --------------- | ----------------- | ------------------------------------------------------------ | + | UDF-1.3.3.zip | V1.3.3 and above | Please contact Timecho for assistance | + | UDF-1.3.2.zip | V1.0.0~V1.3.2 | Please contact Timecho for assistance | + +2. Place the `library-udf.jar` file in the compressed file obtained in the directory `/ext/udf ` of all nodes in the IoTDB cluster +3. In the SQL command line terminal (CLI) or visualization console (Workbench) SQL operation interface of IoTDB, execute the corresponding function registration statement as follows. +4. 
Batch registration: two methods are available, a registration script or the full set of SQL statements.
- Register Script
  - Copy the registration script (`register-UDF.sh` or `register-UDF.bat`) from the compressed package into the `tools` directory of IoTDB as needed, and modify the parameters in the script (the defaults are host=127.0.0.1, rpcPort=6667, user=root, pass=root);
  - Start the IoTDB service, then run the registration script to register the UDFs in batch.

- All SQL statements
  - Open the SQL file in the compressed package, copy all SQL statements, and execute them in the SQL command line terminal (CLI) of IoTDB or in the SQL operation interface of the visualization console (Workbench) to register the UDFs in batch.

## Data Quality

### Completeness

#### Registration statement

```sql
create function completeness as 'org.apache.iotdb.library.dquality.UDTFCompleteness'
```

#### Usage

This function is used to calculate the completeness of time series. The input series are divided into several continuous and non-overlapping windows. The timestamp of the first data point and the completeness of each window will be output.

**Name:** COMPLETENESS

**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

**Parameters:**

+ `window`: The size of each window. It is a positive integer or a positive number with a unit. The former is the number of data points in each window; the number of data points in the last window may be less than it. The latter is the time length of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window.
+ `downtime`: Whether the downtime exception is considered in the calculation of completeness. It is 'true' or 'false' (default). When the downtime exception is considered, long-term missing data will be treated as a downtime exception without any influence on completeness.

**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1].

**Note:** The calculation is performed only when the number of data points in the window exceeds 10. Otherwise, the window will be ignored and nothing will be output.

#### Examples

##### Default Parameters

With default parameters, this function will regard all input data as the same window.
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +### Consistency + +#### Registration statement + +```sql +create function consistency as 'org.apache.iotdb.library.dquality.UDTFConsistency' +``` + +#### Usage + +This function is used to calculate the consistency of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the consistency of each window will be output. + +**Name:** CONSISTENCY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +### Timeliness + +#### Registration statement + +```sql +create function timeliness as 'org.apache.iotdb.library.dquality.UDTFTimeliness' +``` + +#### Usage + +This function is used to calculate the timeliness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the timeliness of each window will be output. + +**Name:** TIMELINESS + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| ++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +### Validity + +#### Registration statement + +```sql +create function validity as 'org.apache.iotdb.library.dquality.UDTFValidity' +``` + +#### Usage + +This function is used to calculate the Validity of time series. 
The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the Validity of each window will be output. + +**Name:** VALIDITY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|validity(root.test.d1.s1, "window"="15")| ++-----------------------------+----------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + + + + + +## Data Profiling + +### ACF + +#### Registration statement + +```sql +create function acf as 'org.apache.iotdb.library.dprofile.UDTFACF' +``` + +#### Usage + +This function is used to calculate the auto-correlation factor of the input time series, +which equals to cross correlation between the same series. +For more information, please refer to [XCorr](./UDF-Libraries.md#xcorr) function. + +**Name:** ACF + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. +There are $2N-1$ data points in the series, and the values are interpreted in details in [XCorr](./UDF-Libraries.md#XCorr) function. + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
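
Concretely, if the input has $N$ points $x_1,\dots,x_N$ (with `null` and `NaN` replaced by 0), the output point for lag $k$ is the lagged inner product scaled by $1/N$, which is consistent with the example below:

$$acf(k)=\frac{1}{N}\sum_{t} x_t\, x_{t+k},\qquad k=-(N-1),\dots,N-1$$

For instance, at lag $\pm 2$ the example yields $(1\cdot 3+3\cdot 5)/5=3.6$, and at lag $0$ it yields $(1+9+25)/5=7.0$.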
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| null| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +### Distinct + +#### Registration statement + +```sql +create function distinct as 'org.apache.iotdb.library.dprofile.UDTFDistinct' +``` + +#### Usage + +This function returns all unique values in time series. + +**Name:** DISTINCT + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** + ++ The timestamp of the output series is meaningless. The output order is arbitrary. ++ Missing points and null points in the input series will be ignored, but `NaN` will not. ++ Case Sensitive. + + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select distinct(s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +### Histogram + +#### Registration statement + +```sql +create function histogram as 'org.apache.iotdb.library.dprofile.UDTFHistogram' +``` + +#### Usage + +This function is used to calculate the distribution histogram of a single column of numerical data. + +**Name:** HISTOGRAM + +**Input Series:** Only supports a single input sequence, the type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameters:** + ++ `min`: The lower limit of the requested data range, the default value is -Double.MAX_VALUE. ++ `max`: The upper limit of the requested data range, the default value is Double.MAX_VALUE, and the value of start must be less than or equal to end. ++ `count`: The number of buckets of the histogram, the default value is 1. It must be a positive integer. + +**Output Series:** The value of the bucket of the histogram, where the lower bound represented by the i-th bucket (index starts from 1) is $min+ (i-1)\cdot\frac{max-min}{count}$ and the upper bound is $min + i \cdot \frac{max-min}{count}$. 
+ +**Note:** + ++ If the value is lower than `min`, it will be put into the 1st bucket. If the value is larger than `max`, it will be put into the last bucket. ++ Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +### Integral + +#### Registration statement + +```sql +create function integral as 'org.apache.iotdb.library.dprofile.UDAFIntegral' +``` + +#### Usage + +This function is used to calculate the integration of time series, +which equals to the area under the curve with time as X-axis and values as Y-axis. + +**Name:** INTEGRAL + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `unit`: The unit of time used when computing the integral. + The value should be chosen from "1S", "1s", "1m", "1H", "1d"(case-sensitive), + and each represents taking one millisecond / second / minute / hour / day as 1.0 while calculating the area and integral. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the integration. + +**Note:** + ++ The integral value equals to the sum of the areas of right-angled trapezoids consisting of each two adjacent points and the time-axis. + Choosing different `unit` implies different scaling of time axis, thus making it apparent to convert the value among those results with constant coefficient. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + +#### Examples + +##### Default Parameters + +With default parameters, this function will take one second as 1.0. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + +##### Specific time unit + +With time unit specified as "1m", this function will take one minute as 1.0. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +### IntegralAvg + +#### Registration statement + +```sql +create function integralavg as 'org.apache.iotdb.library.dprofile.UDAFIntegralAvg' +``` + +#### Usage + +This function is used to calculate the function average of time series. +The output equals to the area divided by the time interval using the same time `unit`. +For more information of the area under the curve, please refer to `Integral` function. + +**Name:** INTEGRALAVG + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the time-weighted average. + +**Note:** + ++ The time-weighted value equals to the integral value with any `unit` divided by the time interval of input series. + The result is irrelevant to the time unit used in integral, and it's consistent with the timestamp precision of IoTDB by default. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + ++ If the input series is empty, the output value will be 0.0, but if there is only one data point, the value will equal to the input value. 
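
Restating the definition above as a formula: if $\mathrm{integral}_u(x)$ denotes the area computed by the `Integral` function with time unit $u$, and $T_u$ is the length of the covered time interval expressed in the same unit, then

$$\mathrm{integralavg}(x)=\frac{\mathrm{integral}_u(x)}{T_u},$$

so the choice of $u$ cancels out and the result does not depend on the time unit.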
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +### Mad + +#### Registration statement + +```sql +create function mad as 'org.apache.iotdb.library.dprofile.UDAFMad' +``` + +#### Usage + +The function is used to compute the exact or approximate median absolute deviation (MAD) of a numeric time series. MAD is the median of the deviation of each element from the elements' median. + +Take a dataset $\{1,3,3,5,5,6,7,8,9\}$ as an instance. Its median is 5 and the deviation of each element from the median is $\{0,0,1,2,2,2,3,4,4\}$, whose median is 2. Therefore, the MAD of the original dataset is 2. + +**Name:** MAD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The relative error of the approximate MAD. It should be within [0,1) and the default value is 0. Taking `error`=0.01 as an instance, suppose the exact MAD is $a$ and the approximate MAD is $b$, we have $0.99a \le b \le 1.01a$. With `error`=0, the output is the exact MAD. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the MAD. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +##### Exact Query + +With the default `error`(`error`=0), the function queries the exact MAD. + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +............ 
+Total line number = 20 +``` + +SQL for query: + +```sql +select mad(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|median(root.test.s1, "error"="0")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +##### Approximate Query + +By setting `error` within (0,1), the function queries the approximate MAD. + +SQL for query: + +```sql +select mad(s1, "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s1, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9900000000000001| ++-----------------------------+---------------------------------+ +``` + +### Median + +#### Registration statement + +```sql +create function median as 'org.apache.iotdb.library.dprofile.UDAFMedian' +``` + +#### Usage + +The function is used to compute the exact or approximate median of a numeric time series. Median is the value separating the higher half from the lower half of a data sample. + +**Name:** MEDIAN + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The rank error of the approximate median. It should be within [0,1) and the default value is 0. For instance, a median with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact median. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the median. + +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +Total line number = 20 +``` + +SQL for query: + +```sql +select median(s1, "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|median(root.test.s1, "error"="0.01")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+------------------------------------+ +``` + +### MinMax + +#### Registration statement + +```sql +create function minmax as 'org.apache.iotdb.library.dprofile.UDTFMinMax' +``` + +#### Usage + +This function is used to standardize the input series with min-max. Minimum value is transformed to 0; maximum value is transformed to 1. 
+
+**Name:** MINMAX
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `compute`: When set to "batch", the normalization is performed after all data points are imported; when set to "stream", the minimum and maximum values have to be provided. The default method is "batch".
++ `min`: The minimum value when `compute` is set to "stream".
++ `max`: The maximum value when `compute` is set to "stream".
+
+**Output Series:** Output a single series. The type is DOUBLE.
+
+#### Examples
+
+##### Batch computing
+
+Input series:
+
+```
++-----------------------------+------------+
+| Time|root.test.s1|
++-----------------------------+------------+
+|1970-01-01T08:00:00.100+08:00| 0.0|
+|1970-01-01T08:00:00.200+08:00| 0.0|
+|1970-01-01T08:00:00.300+08:00| 1.0|
+|1970-01-01T08:00:00.400+08:00| -1.0|
+|1970-01-01T08:00:00.500+08:00| 0.0|
+|1970-01-01T08:00:00.600+08:00| 0.0|
+|1970-01-01T08:00:00.700+08:00| -2.0|
+|1970-01-01T08:00:00.800+08:00| 2.0|
+|1970-01-01T08:00:00.900+08:00| 0.0|
+|1970-01-01T08:00:01.000+08:00| 0.0|
+|1970-01-01T08:00:01.100+08:00| 1.0|
+|1970-01-01T08:00:01.200+08:00| -1.0|
+|1970-01-01T08:00:01.300+08:00| -1.0|
+|1970-01-01T08:00:01.400+08:00| 1.0|
+|1970-01-01T08:00:01.500+08:00| 0.0|
+|1970-01-01T08:00:01.600+08:00| 0.0|
+|1970-01-01T08:00:01.700+08:00| 10.0|
+|1970-01-01T08:00:01.800+08:00| 2.0|
+|1970-01-01T08:00:01.900+08:00| -2.0|
+|1970-01-01T08:00:02.000+08:00| 0.0|
++-----------------------------+------------+
+```
+
+SQL for query:
+
+```sql
+select minmax(s1) from root.test
+```
+
+Output series:
+
+```
++-----------------------------+--------------------+
+| Time|minmax(root.test.s1)|
++-----------------------------+--------------------+
+|1970-01-01T08:00:00.100+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.200+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.300+08:00| 0.25|
+|1970-01-01T08:00:00.400+08:00| 0.08333333333333333|
+|1970-01-01T08:00:00.500+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.600+08:00| 0.16666666666666666|
+|1970-01-01T08:00:00.700+08:00| 0.0|
+|1970-01-01T08:00:00.800+08:00| 0.3333333333333333|
+|1970-01-01T08:00:00.900+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.000+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.100+08:00| 0.25|
+|1970-01-01T08:00:01.200+08:00| 0.08333333333333333|
+|1970-01-01T08:00:01.300+08:00| 0.08333333333333333|
+|1970-01-01T08:00:01.400+08:00| 0.25|
+|1970-01-01T08:00:01.500+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.600+08:00| 0.16666666666666666|
+|1970-01-01T08:00:01.700+08:00| 1.0|
+|1970-01-01T08:00:01.800+08:00| 0.3333333333333333|
+|1970-01-01T08:00:01.900+08:00| 0.0|
+|1970-01-01T08:00:02.000+08:00| 0.16666666666666666|
++-----------------------------+--------------------+
+```
+
+
+### MvAvg
+
+#### Registration statement
+
+```sql
+create function mvavg as 'org.apache.iotdb.library.dprofile.UDTFMvAvg'
+```
+
+#### Usage
+
+This function is used to calculate the moving average of the input series.
+
+**Name:** MVAVG
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `window`: Length of the moving window. Default value is 10.
+
+**Output Series:** Output a single series. The type is DOUBLE.
+ +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +### PACF + +#### Registration statement + +```sql +create function pacf as 'org.apache.iotdb.library.dprofile.UDTFPACF' +``` + +#### Usage + +This function is used to calculate partial autocorrelation of input series by solving Yule-Walker equation. For some cases, the equation may not be solved, and NaN will be output. + +**Name:** PACF + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lag`: Maximum lag of pacf to calculate. The default value is $\min(10\log_{10}n,n-1)$, where $n$ is the number of data points. + +**Output Series:** Output a single series. The type is DOUBLE. 
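+
+For intuition, partial autocorrelations can be computed from the sample autocorrelations with the Durbin-Levinson recursion, which is the standard way of solving the Yule-Walker equations. The sketch below is a plain Python illustration under that assumption; it is not the UDF code and does not attempt to reproduce the `NaN` handling of the example that follows.
+
+```python
+def pacf(x, nlags):
+    """Partial autocorrelation for lags 1..nlags via the Durbin-Levinson recursion."""
+    n = len(x)
+    mean = sum(x) / n
+    d = [v - mean for v in x]
+    # biased sample autocorrelations r[0..nlags], normalized by r[0]
+    r = [sum(d[i] * d[i + k] for i in range(n - k)) / n for k in range(nlags + 1)]
+    r = [v / r[0] for v in r]
+    result, phi = [], []
+    for k in range(1, nlags + 1):
+        if k == 1:
+            phi_kk = r[1]
+            phi = [phi_kk]
+        else:
+            num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
+            den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
+            phi_kk = num / den
+            phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
+        result.append(phi_kk)
+    return result
+
+# an AR(1)-like toy series: the partial autocorrelation is large at lag 1 and small afterwards
+series = [1.0, 0.9, 0.7, 0.8, 0.6, 0.5, 0.55, 0.4, 0.45, 0.3, 0.35, 0.2]
+print(pacf(series, 4))
+```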
+
+#### Examples
+
+##### Assigning maximum lag
+
+Input series:
+
+```
++-----------------------------+---------------+
+| Time|root.test.d1.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:01.000+08:00| 1|
+|2020-01-01T00:00:02.000+08:00| NaN|
+|2020-01-01T00:00:03.000+08:00| 3|
+|2020-01-01T00:00:04.000+08:00| NaN|
+|2020-01-01T00:00:05.000+08:00| 5|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select pacf(s1, "lag"="5") from root.test.d1
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------------+
+| Time|pacf(root.test.d1.s1, "lag"="5")|
++-----------------------------+--------------------------------+
+|2020-01-01T00:00:01.000+08:00| 1.0|
+|2020-01-01T00:00:02.000+08:00| -0.5744680851063829|
+|2020-01-01T00:00:03.000+08:00| 0.3172297297297296|
+|2020-01-01T00:00:04.000+08:00| -0.2977686586304181|
+|2020-01-01T00:00:05.000+08:00| -2.0609033521065867|
++-----------------------------+--------------------------------+
+```
+
+### Percentile
+
+#### Registration statement
+
+```sql
+create function percentile as 'org.apache.iotdb.library.dprofile.UDAFPercentile'
+```
+
+#### Usage
+
+The function is used to compute the exact or approximate percentile of a numeric time series. A percentile is the value of the element at a certain rank in the sorted series.
+
+**Name:** PERCENTILE
+
+**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameter:**
+
++ `rank`: The rank percentage of the percentile. It should be within (0,1] and the default value is 0.5. For instance, a percentile with `rank`=0.5 is the median.
++ `error`: The rank error of the approximate percentile. It should be within [0,1) and the default value is 0. For instance, a 0.5-percentile with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact percentile.
+
+**Output Series:** Output a single series. The type is the same as the input series. If `error`=0, there is only one data point in the series, whose timestamp is that of the first data point whose value equals the percentile, and whose value is the percentile; otherwise, the timestamp of the only data point is 0.
+
+**Note:** Missing points, null points and `NaN` in the input series will be ignored.
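+
+As a quick reference for the exact case (`error`=0), the following Python sketch (illustrative only; the UDF's exact rank convention and tie-breaking may differ) sorts the valid values and picks the element whose rank corresponds to `rank`. Applied to the 20-point series of the next example, it returns -1.0 for `rank`=0.2.
+
+```python
+import math
+
+def exact_percentile(values, rank=0.5):
+    """Exact percentile: the element at rank ceil(rank * n) of the sorted valid values."""
+    valid = sorted(v for v in values if v is not None and not math.isnan(v))
+    if not valid:
+        return None
+    index = max(math.ceil(rank * len(valid)), 1) - 1   # 1-based rank -> 0-based index
+    return valid[index]
+
+values = [0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -2.0, 2.0, 0.0, 0.0,
+          1.0, -1.0, -1.0, 1.0, 0.0, 0.0, 10.0, 2.0, -2.0, 0.0]
+print(exact_percentile(values, 0.2))  # -1.0
+print(exact_percentile(values, 0.5))  # 0.0, i.e. the median
+```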
+
+#### Examples
+
+Input series:
+
+```
++-----------------------------+-------------+
+| Time|root.test2.s1|
++-----------------------------+-------------+
+|1970-01-01T08:00:00.100+08:00| 0.0|
+|1970-01-01T08:00:00.200+08:00| 0.0|
+|1970-01-01T08:00:00.300+08:00| 1.0|
+|1970-01-01T08:00:00.400+08:00| -1.0|
+|1970-01-01T08:00:00.500+08:00| 0.0|
+|1970-01-01T08:00:00.600+08:00| 0.0|
+|1970-01-01T08:00:00.700+08:00| -2.0|
+|1970-01-01T08:00:00.800+08:00| 2.0|
+|1970-01-01T08:00:00.900+08:00| 0.0|
+|1970-01-01T08:00:01.000+08:00| 0.0|
+|1970-01-01T08:00:01.100+08:00| 1.0|
+|1970-01-01T08:00:01.200+08:00| -1.0|
+|1970-01-01T08:00:01.300+08:00| -1.0|
+|1970-01-01T08:00:01.400+08:00| 1.0|
+|1970-01-01T08:00:01.500+08:00| 0.0|
+|1970-01-01T08:00:01.600+08:00| 0.0|
+|1970-01-01T08:00:01.700+08:00| 10.0|
+|1970-01-01T08:00:01.800+08:00| 2.0|
+|1970-01-01T08:00:01.900+08:00| -2.0|
+|1970-01-01T08:00:02.000+08:00| 0.0|
++-----------------------------+-------------+
+Total line number = 20
+```
+
+SQL for query:
+
+```sql
+select percentile(s1, "rank"="0.2", "error"="0.01") from root.test2
+```
+
+Output series:
+
+```
++-----------------------------+-------------------------------------------------------+
+| Time|percentile(root.test2.s1, "rank"="0.2", "error"="0.01")|
++-----------------------------+-------------------------------------------------------+
+|1970-01-01T08:00:00.000+08:00| -1.0|
++-----------------------------+-------------------------------------------------------+
+```
+
+### Quantile
+
+#### Registration statement
+
+```sql
+create function quantile as 'org.apache.iotdb.library.dprofile.UDAFQuantile'
+```
+
+#### Usage
+
+The function is used to compute the approximate quantile of a numeric time series. A quantile is the value of the element at a certain rank in the sorted series.
+
+**Name:** QUANTILE
+
+**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameter:**
+
++ `rank`: The rank of the quantile. It should be within (0,1] and the default value is 0.5. For instance, a quantile with `rank`=0.5 is the median.
++ `K`: The size of the KLL sketch maintained in the query. It should be within [100,+inf) and the default value is 800. For instance, the 0.5-quantile computed by a KLL sketch with K=800 items is a value whose rank percentage is within 0.49~0.51 with a confidence of at least 99%. The result will be more accurate as K increases.
+
+**Output Series:** Output a single series. The type is the same as the input series. The timestamp of the only data point is 0.
+
+**Note:** Missing points, null points and `NaN` in the input series will be ignored.
+
+#### Examples
+
+Input series:
+
+```
++-----------------------------+-------------+
+| Time|root.test1.s1|
++-----------------------------+-------------+
+|2021-03-17T10:32:17.054+08:00| 7|
+|2021-03-17T10:32:18.054+08:00| 15|
+|2021-03-17T10:32:19.054+08:00| 36|
+|2021-03-17T10:32:20.054+08:00| 39|
+|2021-03-17T10:32:21.054+08:00| 40|
+|2021-03-17T10:32:22.054+08:00| 41|
+|2021-03-17T10:32:23.054+08:00| 20|
+|2021-03-17T10:32:24.054+08:00| 18|
++-----------------------------+-------------+
+Total line number = 8 +``` + +SQL for query: + +```sql +select quantile(s1, "rank"="0.2", "K"="800") from root.test1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------+ +| Time|quantile(root.test1.s1, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.000000000000001| ++-----------------------------+------------------------------------------------+ +``` + +### Period + +#### Registration statement + +```sql +create function period as 'org.apache.iotdb.library.dprofile.UDAFPeriod' +``` + +#### Usage + +The function is used to compute the period of a numeric time series. + +**Name:** PERIOD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is INT32. There is only one data point in the series, whose timestamp is 0 and value is the period. + +#### Examples + +Input series: + + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select period(s1) from root.test.d3 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +### QLB + +#### Registration statement + +```sql +create function qlb as 'org.apache.iotdb.library.dprofile.UDTFQLB' +``` + +#### Usage + +This function is used to calculate Ljung-Box statistics $Q_{LB}$ for time series, and convert it to p value. + +**Name:** QLB + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters**: + +`lag`: max lag to calculate. Legal input shall be integer from 1 to n-2, where n is the sample number. Default value is n-2. + +**Output Series:** Output a single series. The type is DOUBLE. The output series is p value, and timestamp means lag. + +**Note:** If you want to calculate Ljung-Box statistics $Q_{LB}$ instead of p value, you may use ACF function. 
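+
+As background, the Ljung-Box statistic for lag $h$ is $Q_{LB} = n(n+2)\sum_{k=1}^{h}\hat{\rho}_k^2/(n-k)$, and the p value is the upper-tail probability of a $\chi^2_h$ distribution at $Q_{LB}$. The sketch below is a generic Python illustration of that formula; it uses `scipy` only for the $\chi^2$ tail probability and is not the UDF itself.
+
+```python
+import numpy as np
+from scipy.stats import chi2
+
+def ljung_box(x, max_lag):
+    """Return (Q_LB, p value) for each lag 1..max_lag."""
+    x = np.asarray(x, dtype=float)
+    x = x - x.mean()
+    n = len(x)
+    denom = np.sum(x * x)
+    results = []
+    cumulative = 0.0
+    for k in range(1, max_lag + 1):
+        rho_k = np.sum(x[:-k] * x[k:]) / denom       # sample autocorrelation at lag k
+        cumulative += rho_k * rho_k / (n - k)
+        q_lb = n * (n + 2) * cumulative
+        results.append((q_lb, chi2.sf(q_lb, df=k)))  # upper-tail chi-square probability
+    return results
+
+# for white noise the p values should mostly stay large
+rng = np.random.default_rng(0)
+for lag, (q_lb, p) in enumerate(ljung_box(rng.normal(size=100), 5), start=1):
+    print(lag, round(q_lb, 3), round(p, 3))
+```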
+ +#### Examples + +##### Using Default Parameter + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select QLB(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| +|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +### Resample + +#### Registration statement + +```sql +create function re_sample as 'org.apache.iotdb.library.dprofile.UDTFResample' +``` + +#### Usage + +This function is used to resample the input series according to a given frequency, +including up-sampling and down-sampling. +Currently, the supported up-sampling methods are +NaN (filling with `NaN`), +FFill (filling with previous value), +BFill (filling with next value) and +Linear (filling with linear interpolation). +Down-sampling relies on group aggregation, +which supports Max, Min, First, Last, Mean and Median. + +**Name:** RESAMPLE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + + ++ `every`: The frequency of resampling, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. ++ `interp`: The interpolation method of up-sampling, which is 'NaN', 'FFill', 'BFill' or 'Linear'. By default, NaN is used. ++ `aggr`: The aggregation method of down-sampling, which is 'Max', 'Min', 'First', 'Last', 'Mean' or 'Median'. By default, Mean is used. 
++ `start`: The start time (inclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the first valid data point. ++ `end`: The end time (exclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the last valid data point. + +**Output Series:** Output a single series. The type is DOUBLE. It is strictly equispaced with the frequency `every`. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + +##### Up-sampling + +When the frequency of resampling is higher than the original frequency, up-sampling starts. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +SQL for query: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +##### Down-sampling + +When the frequency of resampling is lower than the original frequency, down-sampling starts. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + + +##### Specify the time period + +The time period of resampling can be specified with `start` and `end`. +The period outside the actual time range will be interpolated. 
+ +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +### Sample + +#### Registration statement + +```sql +create function sample as 'org.apache.iotdb.library.dprofile.UDTFSample' +``` + +#### Usage + +This function is used to sample the input series, +that is, select a specified number of data points from the input series and output them. +Currently, three sampling methods are supported: +**Reservoir sampling** randomly selects data points. +All of the points have the same probability of being sampled. +**Isometric sampling** selects data points at equal index intervals. +**Triangle sampling** assigns data points to the buckets based on the number of sampling. +Then it calculates the area of the triangle based on these points inside the bucket and selects the point with the largest area of the triangle. +For more detail, please read [paper](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf) + +**Name:** SAMPLE + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Parameters:** + ++ `method`: The method of sampling, which is 'reservoir', 'isometric' or 'triangle'. By default, reservoir sampling is used. ++ `k`: The number of sampling, which is a positive integer. By default, it's 1. + +**Output Series:** Output a single series. The type is the same as the input. The length of the output series is `k`. Each data point in the output series comes from the input series. + +**Note:** If `k` is greater than the length of input series, all data points in the input series will be output. + +#### Examples + +##### Reservoir Sampling + +When `method` is 'reservoir' or the default, reservoir sampling is used. +Due to the randomness of this method, the output series shown below is only a possible result. 
+ + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + +##### Isometric Sampling + +When `method` is 'isometric', isometric sampling is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +### Segment + +#### Registration statement + +```sql +create function segment as 'org.apache.iotdb.library.dprofile.UDTFSegment' +``` + +#### Usage + +This function is used to segment a time series into subsequences according to linear trend, and returns linear fitted values of first values in each subsequence or every data point. + +**Name:** SEGMENT + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `output` :"all" to output all fitted points; "first" to output first fitted points in each subsequence. + ++ `error`: error allowed at linear regression. It is defined as mean absolute error of a subsequence. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note:** This function treat input series as equal-interval sampled. All data are loaded, so downsample input series first if there are too many data points. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select segment(s1, "error"="0.1") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +### Skew + +#### Registration statement + +```sql +create function skew as 'org.apache.iotdb.library.dprofile.UDAFSkew' +``` + +#### Usage + +This function is used to calculate the population skewness. + +**Name:** SKEW + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population skewness. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
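+
+Population skewness here is the third central moment divided by the cube of the population standard deviation, $m_3 / m_2^{3/2}$. The Python sketch below (illustrative only, not the UDF implementation) applies that formula and reproduces the value of the example that follows, about -0.9998.
+
+```python
+def population_skewness(values):
+    """Third central moment over the population standard deviation cubed."""
+    n = len(values)
+    mean = sum(values) / n
+    m2 = sum((v - mean) ** 2 for v in values) / n
+    m3 = sum((v - mean) ** 3 for v in values) / n
+    return m3 / m2 ** 1.5
+
+values = [float(v) for v in range(1, 11)] + [10.0] * 10   # data of the example below
+print(population_skewness(values))   # about -0.99984
+```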
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select skew(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +### Spline + +#### Registration statement + +```sql +create function spline as 'org.apache.iotdb.library.dprofile.UDTFSpline' +``` + +#### Usage + +This function is used to calculate cubic spline interpolation of input series. + +**Name:** SPLINE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `points`: Number of resampling points. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note**: Output series retains the first and last timestamps of input series. Interpolation points are selected at equal intervals. The function tries to calculate only when there are no less than 4 points in input series. 
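+
+Conceptually, the function fits a cubic spline through the input points and evaluates it at `points` equally spaced timestamps between the first and the last input timestamp. The sketch below illustrates that idea with `scipy.interpolate.CubicSpline` (an assumption used only for illustration; the boundary conditions and rounding in the UDF may differ, so the values will not match the table below exactly).
+
+```python
+import numpy as np
+from scipy.interpolate import CubicSpline
+
+# timestamps in seconds and values from the example below
+t = np.array([0.0, 0.3, 0.5, 0.7, 0.9, 1.1, 1.2, 1.3, 1.4, 1.5])
+v = np.array([0.0, 1.2, 1.7, 2.0, 2.1, 2.0, 1.8, 1.2, 1.0, 1.6])
+
+spline = CubicSpline(t, v)                     # cubic spline through the input points
+resampled_t = np.linspace(t[0], t[-1], 151)    # 151 equally spaced points, as "points"="151"
+resampled_v = spline(resampled_t)
+
+for ts, val in list(zip(resampled_t, resampled_v))[:5]:
+    print(round(float(ts), 2), round(float(val), 4))
+```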
+ +#### Examples + +##### Assigning number of interpolation points + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select spline(s1, "points"="151") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| +|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 
1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| +|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 
1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +### Spread + +#### Registration statement + +```sql +create function spread as 'org.apache.iotdb.library.dprofile.UDAFSpread' +``` + +#### Usage + +This function is used to calculate the spread of time series, that is, the maximum value minus the minimum value. + +**Name:** SPREAD + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is 0 and value is the spread. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
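+
+The spread is simply the maximum minus the minimum of the valid values. A short Python check (illustrative only) against the example below gives 26.0:
+
+```python
+import math
+
+values = [100.0, 101.0, 102.0, 104.0, 126.0, 108.0, 112.0, 113.0,
+          114.0, 116.0, 118.0, 120.0, 124.0, 126.0, float("nan")]
+
+valid = [v for v in values if not math.isnan(v)]   # NaN points are ignored
+print(max(valid) - min(valid))                     # 26.0
+```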
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + + + +### ZScore + +#### Registration statement + +```sql +create function zscore as 'org.apache.iotdb.library.dprofile.UDTFZScore' +``` + +#### Usage + +This function is used to standardize the input series with z-score. + +**Name:** ZSCORE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `compute`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide mean and standard deviation. The default method is "batch". ++ `avg`: Mean value when method is set to "stream". ++ `sd`: Standard deviation when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. 
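+
+Z-score standardization subtracts the mean and divides by the standard deviation; in batch mode both statistics come from the series itself, while in stream mode they are supplied through `avg` and `sd`. The Python sketch below (illustrative only, using the population standard deviation) reproduces the first value of the example that follows, about -0.2067.
+
+```python
+import math
+
+def zscore(values, avg=None, sd=None):
+    """Batch mode when avg/sd are None; stream mode when both are given."""
+    if avg is None or sd is None:
+        avg = sum(values) / len(values)
+        sd = math.sqrt(sum((v - avg) ** 2 for v in values) / len(values))  # population std
+    return [(v - avg) / sd for v in values]
+
+values = [0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -2.0, 2.0, 0.0, 0.0,
+          1.0, -1.0, -1.0, 1.0, 0.0, 0.0, 10.0, 2.0, -2.0, 0.0]
+print(round(zscore(values)[0], 6))   # about -0.206725, matching the batch example below
+```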
+ +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select zscore(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` + + +## Anomaly Detection + +### IQR + +#### Registration statement + +```sql +create function iqr as 'org.apache.iotdb.library.anomaly.UDTFIQR' +``` + +#### Usage + +This function is used to detect anomalies based on IQR. Points distributing beyond 1.5 times IQR are selected. + +**Name:** IQR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `method`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide upper and lower quantiles. The default method is "batch". ++ `q1`: The lower quantile when method is set to "stream". ++ `q3`: The upper quantile when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. 
+
+**Note:** $IQR=Q_3-Q_1$
+
+#### Examples
+
+##### Batch computing
+
+Input series:
+
+```
++-----------------------------+------------+
+| Time|root.test.s1|
++-----------------------------+------------+
+|1970-01-01T08:00:00.100+08:00| 0.0|
+|1970-01-01T08:00:00.200+08:00| 0.0|
+|1970-01-01T08:00:00.300+08:00| 1.0|
+|1970-01-01T08:00:00.400+08:00| -1.0|
+|1970-01-01T08:00:00.500+08:00| 0.0|
+|1970-01-01T08:00:00.600+08:00| 0.0|
+|1970-01-01T08:00:00.700+08:00| -2.0|
+|1970-01-01T08:00:00.800+08:00| 2.0|
+|1970-01-01T08:00:00.900+08:00| 0.0|
+|1970-01-01T08:00:01.000+08:00| 0.0|
+|1970-01-01T08:00:01.100+08:00| 1.0|
+|1970-01-01T08:00:01.200+08:00| -1.0|
+|1970-01-01T08:00:01.300+08:00| -1.0|
+|1970-01-01T08:00:01.400+08:00| 1.0|
+|1970-01-01T08:00:01.500+08:00| 0.0|
+|1970-01-01T08:00:01.600+08:00| 0.0|
+|1970-01-01T08:00:01.700+08:00| 10.0|
+|1970-01-01T08:00:01.800+08:00| 2.0|
+|1970-01-01T08:00:01.900+08:00| -2.0|
+|1970-01-01T08:00:02.000+08:00| 0.0|
++-----------------------------+------------+
+```
+
+SQL for query:
+
+```sql
+select iqr(s1) from root.test
+```
+
+Output series:
+
+```
++-----------------------------+-----------------+
+| Time|iqr(root.test.s1)|
++-----------------------------+-----------------+
+|1970-01-01T08:00:01.700+08:00| 10.0|
++-----------------------------+-----------------+
+```
+
+### KSigma
+
+#### Registration statement
+
+```sql
+create function ksigma as 'org.apache.iotdb.library.anomaly.UDTFKSigma'
+```
+
+#### Usage
+
+This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm.
+Within a sliding window, an input value whose deviation from the average exceeds k times the standard deviation is output as an anomaly.
+
+**Name:** KSIGMA
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `k`: The number of standard deviations from the mean beyond which a value is regarded as an anomaly. The default value is 3.
++ `window`: The window size of the Dynamic K-Sigma Algorithm. The default value is 10000.
+
+**Output Series:** Output a single series. The type is the same as the input series.
+
+**Note:** Anomaly detection is performed only when `k` is larger than 0. Otherwise, nothing will be output.
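+
+A simplified, non-dynamic version of the idea is sketched below in Python: within each sliding window, points that deviate from the window mean by more than `k` standard deviations are reported. It is only an illustration of the principle; the UDF's dynamic algorithm updates the statistics incrementally and may flag a slightly different set of points.
+
+```python
+import math
+
+def k_sigma(times, values, k=3.0, window=10000):
+    """Flag (time, value) pairs deviating more than k standard deviations from the window mean."""
+    anomalies = []
+    for i in range(len(values)):
+        chunk = values[max(0, i - window + 1): i + 1]   # trailing window ending at i
+        mean = sum(chunk) / len(chunk)
+        std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / len(chunk))
+        if std > 0 and abs(values[i] - mean) > k * std:
+            anomalies.append((times[i], values[i]))
+    return anomalies
+
+times = list(range(1, 21))
+values = [0.0] * 19 + [100.0]            # a single obvious outlier at the end
+print(k_sigma(times, values, k=3.0, window=20))   # [(20, 100.0)]
+```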
+ +#### Examples + +##### Assigning k + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:04.000+08:00| 100.0| +|2020-01-01T00:00:06.000+08:00| 150.0| +|2020-01-01T00:00:08.000+08:00| 200.0| +|2020-01-01T00:00:10.000+08:00| 200.0| +|2020-01-01T00:00:14.000+08:00| 200.0| +|2020-01-01T00:00:15.000+08:00| 200.0| +|2020-01-01T00:00:16.000+08:00| 200.0| +|2020-01-01T00:00:18.000+08:00| 200.0| +|2020-01-01T00:00:20.000+08:00| 150.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +|Time |ksigma(root.test.d1.s1,"k"="3.0")| ++-----------------------------+---------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +### LOF + +#### Registration statement + +```sql +create function LOF as 'org.apache.iotdb.library.anomaly.UDTFLOF' +``` + +#### Usage + +This function is used to detect density anomaly of time series. According to k-th distance calculation parameter and local outlier factor (lof) threshold, the function judges if a set of input values is an density anomaly, and a bool mark of anomaly values will be output. + +**Name:** LOF + +**Input Series:** Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `method`:assign a detection method. The default value is "default", when input data has multiple dimensions. The alternative is "series", when a input series will be transformed to high dimension. ++ `k`:use the k-th distance to calculate lof. Default value is 3. ++ `window`: size of window to split origin data points. Default value is 10000. ++ `windowsize`:dimension that will be transformed into when method is "series". The default value is 5. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note:** Incomplete rows will be ignored. They are neither calculated nor marked as anomaly. 
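+
+For comparison outside the database, scikit-learn ships a Local Outlier Factor implementation. The sketch below is an illustration only; its neighbor counts and distance conventions differ from the UDF, so the scores will not match the example output exactly.
+
+```python
+import numpy as np
+from sklearn.neighbors import LocalOutlierFactor
+
+# two-dimensional points, similar in spirit to the (s1, s2) rows of the first example below
+X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0],
+              [0.0, -1.0], [-1.0, -1.0], [-1.0, 0.0], [2.0, 2.0]])
+
+lof = LocalOutlierFactor(n_neighbors=3)   # use the 3rd-neighbor distance
+lof.fit_predict(X)                        # -1 marks points considered outliers
+
+# scikit-learn stores the negated LOF score; negate it back for readability
+for point, score in zip(X, -lof.negative_outlier_factor_):
+    print(point, round(float(score), 3))
+```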
+ +#### Examples + +##### Using default parameters + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +##### Diagnosing 1d timeseries + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| +|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 
3.77777777777778|
++-----------------------------+--------------------+
+```
+
+### MissDetect
+
+#### Registration statement
+
+```sql
+create function missdetect as 'org.apache.iotdb.library.anomaly.UDTFMissDetect'
+```
+
+#### Usage
+
+This function is used to detect missing anomalies.
+In some datasets, missing values are filled by linear interpolation.
+Thus, there are several long perfect linear segments.
+By discovering these perfect linear segments,
+missing anomalies are detected.
+
+**Name:** MISSDETECT
+
+**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameter:**
+
+`minlen`: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10.
+
+**Output Series:** Output a single series. The type is BOOLEAN. Each data point which is part of a missing anomaly will be labeled as true.
+
+#### Examples
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d2.s2|
++-----------------------------+---------------+
+|2021-07-01T12:00:00.000+08:00|            0.0|
+|2021-07-01T12:00:01.000+08:00|            1.0|
+|2021-07-01T12:00:02.000+08:00|            0.0|
+|2021-07-01T12:00:03.000+08:00|            1.0|
+|2021-07-01T12:00:04.000+08:00|            0.0|
+|2021-07-01T12:00:05.000+08:00|            0.0|
+|2021-07-01T12:00:06.000+08:00|            0.0|
+|2021-07-01T12:00:07.000+08:00|            0.0|
+|2021-07-01T12:00:08.000+08:00|            0.0|
+|2021-07-01T12:00:09.000+08:00|            0.0|
+|2021-07-01T12:00:10.000+08:00|            0.0|
+|2021-07-01T12:00:11.000+08:00|            0.0|
+|2021-07-01T12:00:12.000+08:00|            0.0|
+|2021-07-01T12:00:13.000+08:00|            0.0|
+|2021-07-01T12:00:14.000+08:00|            0.0|
+|2021-07-01T12:00:15.000+08:00|            0.0|
+|2021-07-01T12:00:16.000+08:00|            1.0|
+|2021-07-01T12:00:17.000+08:00|            0.0|
+|2021-07-01T12:00:18.000+08:00|            1.0|
+|2021-07-01T12:00:19.000+08:00|            0.0|
+|2021-07-01T12:00:20.000+08:00|            1.0|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select missdetect(s2,'minlen'='10') from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+------------------------------------------+
+|                         Time|missdetect(root.test.d2.s2, "minlen"="10")|
++-----------------------------+------------------------------------------+
+|2021-07-01T12:00:00.000+08:00|                                     false|
+|2021-07-01T12:00:01.000+08:00|                                     false|
+|2021-07-01T12:00:02.000+08:00|                                     false|
+|2021-07-01T12:00:03.000+08:00|                                     false|
+|2021-07-01T12:00:04.000+08:00|                                      true|
+|2021-07-01T12:00:05.000+08:00|                                      true|
+|2021-07-01T12:00:06.000+08:00|                                      true|
+|2021-07-01T12:00:07.000+08:00|                                      true|
+|2021-07-01T12:00:08.000+08:00|                                      true|
+|2021-07-01T12:00:09.000+08:00|                                      true|
+|2021-07-01T12:00:10.000+08:00|                                      true|
+|2021-07-01T12:00:11.000+08:00|                                      true|
+|2021-07-01T12:00:12.000+08:00|                                      true|
+|2021-07-01T12:00:13.000+08:00|                                      true|
+|2021-07-01T12:00:14.000+08:00|                                      true|
+|2021-07-01T12:00:15.000+08:00|                                      true|
+|2021-07-01T12:00:16.000+08:00|                                     false|
+|2021-07-01T12:00:17.000+08:00|                                     false|
+|2021-07-01T12:00:18.000+08:00|                                     false|
+|2021-07-01T12:00:19.000+08:00|                                     false|
+|2021-07-01T12:00:20.000+08:00|                                     false|
++-----------------------------+------------------------------------------+
+```
+
+### Range
+
+#### Registration statement
+
+```sql
+create function range as 'org.apache.iotdb.library.anomaly.UDTFRange'
+```
+
+#### Usage
+
+This function is used to detect range anomalies of time series. According to the upper bound and lower bound parameters, the function judges whether an input value is out of range, i.e. a range anomaly, and a new time series consisting of the anomalies will be output.
+ +**Name:** RANGE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lower_bound`:lower bound of range anomaly detection. ++ `upper_bound`:upper bound of range anomaly detection. + +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** Only when `upper_bound` is larger than `lower_bound`, the anomaly detection will be performed. Otherwise, nothing will be output. + + + +#### Examples + +##### Assigning Lower and Upper Bound + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:28.000+08:00| 126.0| ++-----------------------------+------------------------------------------------------------------+ +``` + +### TwoSidedFilter + +#### Registration statement + +```sql +create function twosidedfilter as 'org.apache.iotdb.library.anomaly.UDTFTwoSidedFilter' +``` + +#### Usage + +The function is used to filter anomalies of a numeric time series based on two-sided window detection. + +**Name:** TWOSIDEDFILTER + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE + +**Output Series:** Output a single series. The type is the same as the input. It is the input without anomalies. + +**Parameter:** + +- `len`: The size of the window, which is a positive integer. By default, it's 5. When `len`=3, the algorithm detects forward window and backward window with length 3 and calculates the outlierness of the current point. + +- `threshold`: The threshold of outlierness, which is a floating number in (0,1). By default, it's 0.3. The strict standard of detecting anomalies is in proportion to the threshold. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:00:37.000+08:00| 1484.0| +|1970-01-01T08:00:38.000+08:00| 1055.0| +|1970-01-01T08:00:39.000+08:00| 1050.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test +``` + +Output series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +### Outlier + +#### Registration statement + +```sql +create function outlier as 'org.apache.iotdb.library.anomaly.UDTFOutlier' +``` + +#### Usage + +This function is used to detect distance-based outliers. For each point in the current window, if the number of its neighbors within the distance of neighbor distance threshold is less than the neighbor count threshold, the point in detected as an outlier. + +**Name:** OUTLIER + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `r`:the neighbor distance threshold. ++ `k`:the neighbor count threshold. ++ `w`:the window size. ++ `s`:the slide size. + +**Output Series:** Output a single series. The type is the same as the input. 
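+
+The rule above can be made concrete with a short sketch: within one window of `w` points, a point is flagged when fewer than `k` other points lie within distance `r` of it. The Java sketch below is a simplified, single-window illustration written for this guide (class and method names are ours), not the implementation of `UDTFOutlier`; sliding the window by `s` and the handling of timestamps are omitted.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+public class DistanceOutlierSketch {
+    /** Returns the indices (within the window) of points having fewer than k neighbors within distance r. */
+    static List<Integer> detect(double[] window, double r, int k) {
+        List<Integer> outliers = new ArrayList<>();
+        for (int i = 0; i < window.length; i++) {
+            int neighbors = 0;
+            for (int j = 0; j < window.length; j++) {
+                if (j != i && Math.abs(window[i] - window[j]) <= r) {
+                    neighbors++;
+                }
+            }
+            if (neighbors < k) {
+                outliers.add(i);
+            }
+        }
+        return outliers;
+    }
+
+    public static void main(String[] args) {
+        // First window (w = 10) of the example below; only 69.0 has fewer than 4 neighbors within r = 5.0.
+        double[] window = {56.0, 55.1, 54.2, 56.3, 59.0, 60.0, 60.5, 64.5, 69.0, 64.2};
+        System.out.println(detect(window, 5.0, 4)); // prints [8], i.e. the point 69.0
+        // The point 52.0 flagged in the example below is found in a later window after sliding by s = 5.
+    }
+}
+```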
+
+#### Examples
+
+##### Assigning Parameters of Queries
+
+Input series:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s1|
++-----------------------------+------------+
+|2020-01-04T23:59:55.000+08:00|        56.0|
+|2020-01-04T23:59:56.000+08:00|        55.1|
+|2020-01-04T23:59:57.000+08:00|        54.2|
+|2020-01-04T23:59:58.000+08:00|        56.3|
+|2020-01-04T23:59:59.000+08:00|        59.0|
+|2020-01-05T00:00:00.000+08:00|        60.0|
+|2020-01-05T00:00:01.000+08:00|        60.5|
+|2020-01-05T00:00:02.000+08:00|        64.5|
+|2020-01-05T00:00:03.000+08:00|        69.0|
+|2020-01-05T00:00:04.000+08:00|        64.2|
+|2020-01-05T00:00:05.000+08:00|        62.3|
+|2020-01-05T00:00:06.000+08:00|        58.0|
+|2020-01-05T00:00:07.000+08:00|        58.9|
+|2020-01-05T00:00:08.000+08:00|        52.0|
+|2020-01-05T00:00:09.000+08:00|        62.3|
+|2020-01-05T00:00:10.000+08:00|        61.0|
+|2020-01-05T00:00:11.000+08:00|        64.2|
+|2020-01-05T00:00:12.000+08:00|        61.8|
+|2020-01-05T00:00:13.000+08:00|        64.0|
+|2020-01-05T00:00:14.000+08:00|        63.0|
++-----------------------------+------------+
+```
+
+SQL for query:
+
+```sql
+select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test
+```
+
+Output series:
+
+```
++-----------------------------+--------------------------------------------------------+
+|                         Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")|
++-----------------------------+--------------------------------------------------------+
+|2020-01-05T00:00:03.000+08:00|                                                    69.0|
+|2020-01-05T00:00:08.000+08:00|                                                    52.0|
++-----------------------------+--------------------------------------------------------+
+```
+
+
+### MasterTrain
+
+#### Usage
+
+This function is used to train the VAR model based on master data. The model is trained on learning samples consisting of p+1 consecutive non-error points.
+
+**Name:** MasterTrain
+
+**Input Series:** Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `p`: The order of the model.
++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule.
+
+**Output Series:** Output a single series. The type is the same as the input.
+
+**Installation**
+- Install IoTDB from branch `research/master-detector`.
+- Run `mvn spotless:apply`.
+- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`.
+- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`.
+- Start IoTDB server and run `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'` in client.
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ +``` + +### MasterDetect + +#### Usage + +This function is used to detect 
time series and repair errors based on master data. The VAR model is trained by MasterTrain. + +**Name:** MasterDetect + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. ++ `eta`: The detection threshold. By default, it will be estimated based on the 3-sigma rule. ++ `output_type`: The type of output. 'repair' for repairing and 'anomaly' for anomaly detection. ++ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn spotless:apply`. +- Run `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies`. +- Copy `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'` in client. + +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| 
null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +##### Repairing + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +##### Anomaly Detection + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| true| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| 
+|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + + + +## Frequency Domain Analysis + +### Conv + +#### Registration statement + +```sql +create function conv as 'org.apache.iotdb.library.frequency.UDTFConv' +``` + +#### Usage + +This function is used to calculate the convolution, i.e. polynomial multiplication. + +**Name:** CONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output:** Output a single series. The type is DOUBLE. It is the result of convolution whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +### Deconv + +#### Registration statement + +```sql +create function deconv as 'org.apache.iotdb.library.frequency.UDTFDeconv' +``` + +#### Usage + +This function is used to calculate the deconvolution, i.e. polynomial division. + +**Name:** DECONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `result`: The result of deconvolution, which is 'quotient' or 'remainder'. By default, the quotient will be output. + +**Output:** Output a single series. The type is DOUBLE. It is the result of deconvolving the second series from the first series (dividing the first series by the second series) whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + + +##### Calculate the quotient + +When `result` is 'quotient' or the default, this function calculates the quotient of the deconvolution. 
+ +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s3|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| null| +|1970-01-01T08:00:00.003+08:00| 2.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select deconv(s3,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2)| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +##### Calculate the remainder + +When `result` is 'remainder', this function calculates the remainder of the deconvolution. + +Input series is the same as above, the SQL for query is shown below: + + +```sql +select deconv(s3,s2,'result'='remainder') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| ++-----------------------------+--------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 0.0| ++-----------------------------+--------------------------------------------------------------+ +``` + +### DWT + +#### Registration statement + +```sql +create function dwt as 'org.apache.iotdb.library.frequency.UDTFDWT' +``` + +#### Usage + +This function is used to calculate 1d discrete wavelet transform of a numerical series. + +**Name:** DWT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of wavelet. May select 'Haar', 'DB4', 'DB6', 'DB8', where DB means Daubechies. User may offer coefficients of wavelet transform and ignore this parameter. Case ignored. ++ `coef`: Coefficients of wavelet transform. When providing this parameter, use comma ',' to split them, and leave no spaces or other punctuations. ++ `layer`: Times to transform. The number of output vectors equals $layer+1$. Default is 1. + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. + +**Note:** The length of input series must be an integer number power of 2. 
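+
+For intuition, a single Haar level pairs adjacent points and outputs scaled sums (approximation coefficients) followed by scaled differences (detail coefficients). The Java sketch below is only a one-level Haar illustration written for this guide, not the implementation of `UDTFDWT`; the `coef` and `layer` options and other wavelets are not covered.
+
+```java
+public class HaarDwtSketch {
+    /** One level of the Haar DWT: approximation coefficients followed by detail coefficients. */
+    static double[] haarLevel(double[] x) {
+        int half = x.length / 2;               // the input length must be even (a power of 2 for a full DWT)
+        double[] out = new double[x.length];
+        double s = Math.sqrt(2.0);
+        for (int i = 0; i < half; i++) {
+            out[i]        = (x[2 * i] + x[2 * i + 1]) / s;   // approximation (smoothed trend)
+            out[half + i] = (x[2 * i] - x[2 * i + 1]) / s;   // detail (local differences)
+        }
+        return out;
+    }
+
+    public static void main(String[] args) {
+        // First 8 points of the example below; the first output value is (0.0 + 0.2) / sqrt(2) = 0.1414...
+        double[] x = {0.0, 0.2, 1.5, 1.2, 0.6, 1.7, 0.8, 2.0};
+        for (double v : haarLevel(x)) {
+            System.out.printf("%.4f%n", v);
+        }
+    }
+}
+```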
+ +#### Examples + + +##### Haar wavelet transform + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| +|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| ++-----------------------------+-------------------------------------+ +``` + +### FFT + +#### Registration statement + +```sql +create function fft as 'org.apache.iotdb.library.frequency.UDTFFFT' +``` + +#### Usage + +This function is used to calculate the fast Fourier transform (FFT) of a numerical series. + +**Name:** FFT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of FFT, which is 'uniform' (by default) or 'nonuniform'. If the value is 'uniform', the timestamps will be ignored and all data points will be regarded as equidistant. Thus, the equidistant fast Fourier transform algorithm will be applied. If the value is 'nonuniform' (TODO), the non-equidistant fast Fourier transform algorithm will be applied based on timestamps. ++ `result`: The result of FFT, which is 'real', 'imag', 'abs' or 'angle', corresponding to the real part, imaginary part, magnitude and phase angle. By default, the magnitude will be output. ++ `compress`: The parameter of compression, which is within (0,1]. It is the reserved energy ratio of lossy compression. By default, there is no compression. + + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. The timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + + +##### Uniform FFT + +With the default `type`, uniform FFT is applied. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select fft(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 19.999999960195904| +|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| +|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| +|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| ++-----------------------------+----------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, there are peaks in $k=4$ and $k=5$ of the output. 
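+
+The spectrum above can be reproduced, up to numerical error, directly from the definition of the discrete Fourier transform. The Java sketch below is an O(n^2) illustration of what the output magnitudes mean, written for this guide; it is not the library's FFT implementation.
+
+```java
+public class DftMagnitudeSketch {
+    /** Magnitude of each DFT bin, computed directly from the definition. */
+    static double[] dftMagnitude(double[] x) {
+        int n = x.length;
+        double[] mag = new double[n];
+        for (int k = 0; k < n; k++) {
+            double re = 0.0, im = 0.0;
+            for (int t = 0; t < n; t++) {
+                double angle = -2.0 * Math.PI * k * t / n;
+                re += x[t] * Math.cos(angle);
+                im += x[t] * Math.sin(angle);
+            }
+            mag[k] = Math.hypot(re, im);
+        }
+        return mag;
+    }
+
+    public static void main(String[] args) {
+        int n = 20;
+        double[] x = new double[n];
+        for (int t = 0; t < n; t++) {
+            // Same signal as the example: sin(2*pi*t/4) + 2*sin(2*pi*t/5)
+            x[t] = Math.sin(2 * Math.PI * t / 4.0) + 2 * Math.sin(2 * Math.PI * t / 5.0);
+        }
+        double[] mag = dftMagnitude(x);
+        for (int k = 0; k < n; k++) {
+            System.out.printf("k=%2d  %.6f%n", k, mag[k]);
+        }
+        // Peaks of about 20 at k=4 (the 2*sin(2*pi*t/5) term) and about 10 at k=5 (the sin(2*pi*t/4) term).
+    }
+}
+```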
+
+##### Uniform FFT with Compression
+
+The input series is the same as above; the SQL for the query is shown below:
+
+```sql
+select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1
+```
+
+Output series:
+
+```
++-----------------------------+----------------------+----------------------+
+|                         Time|  fft(root.test.d1.s1,|  fft(root.test.d1.s1,|
+|                             |      "result"="real",|      "result"="imag",|
+|                             |    "compress"="0.99")|    "compress"="0.99")|
++-----------------------------+----------------------+----------------------+
+|1970-01-01T08:00:00.000+08:00|                   0.0|                   0.0|
+|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8|
+|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7|
+|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8|  5.127422242345858E-8|
+|1970-01-01T08:00:00.004+08:00|    19.021130288047125|    -6.180339875198807|
+|1970-01-01T08:00:00.005+08:00|     9.999999850988388| 3.501852745067114E-16|
+|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8|
++-----------------------------+----------------------+----------------------+
+```
+
+Note: Based on the conjugation of the Fourier transform result, only the first half of the compression result is reserved.
+According to the given parameter, data points are reserved from low frequency to high frequency until the reserved energy ratio exceeds it.
+The last data point is reserved to indicate the length of the series.
+
+### HighPass
+
+#### Registration statement
+
+```sql
+create function highpass as 'org.apache.iotdb.library.frequency.UDTFHighPass'
+```
+
+#### Usage
+
+This function performs high-pass filtering on the input series and extracts components above the cutoff frequency.
+The timestamps of input will be ignored and all data points will be regarded as equidistant.
+
+**Name:** HIGHPASS
+
+**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `wpass`: The normalized cutoff frequency, which takes values in (0,1). This parameter is required.
+
+**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input.
+
+**Note:** `NaN` in the input series will be ignored.
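+
+One simple way to realize such a filter is to zero out the DFT bins whose normalized frequency is below `wpass` and transform back. The Java sketch below is an idealized frequency-domain illustration written for this guide; interpreting `wpass` as relative to the Nyquist frequency is our assumption (it is consistent with the example below), and the actual filter design of `UDTFHighPass` may differ.
+
+```java
+public class IdealHighPassSketch {
+    /**
+     * Idealized high-pass filter: remove every DFT bin whose normalized frequency
+     * (relative to the Nyquist frequency) is below wpass, then transform back.
+     */
+    static double[] highPass(double[] x, double wpass) {
+        int n = x.length;
+        double[] re = new double[n], im = new double[n];
+        for (int k = 0; k < n; k++) {                      // forward DFT (naive, O(n^2))
+            for (int t = 0; t < n; t++) {
+                double a = -2.0 * Math.PI * k * t / n;
+                re[k] += x[t] * Math.cos(a);
+                im[k] += x[t] * Math.sin(a);
+            }
+        }
+        for (int k = 0; k < n; k++) {                      // zero the low-frequency bins (including DC)
+            int kk = Math.min(k, n - k);
+            if (2.0 * kk / n < wpass) {
+                re[k] = 0.0;
+                im[k] = 0.0;
+            }
+        }
+        double[] y = new double[n];
+        for (int t = 0; t < n; t++) {                      // inverse DFT, real part
+            for (int k = 0; k < n; k++) {
+                double a = 2.0 * Math.PI * k * t / n;
+                y[t] += (re[k] * Math.cos(a) - im[k] * Math.sin(a)) / n;
+            }
+        }
+        return y;
+    }
+
+    public static void main(String[] args) {
+        int n = 20;
+        double[] x = new double[n];
+        for (int t = 0; t < n; t++) {
+            x[t] = Math.sin(2 * Math.PI * t / 4.0) + 2 * Math.sin(2 * Math.PI * t / 5.0);
+        }
+        // With wpass = 0.45, only the sin(2*pi*t/4) component survives, as in the example below.
+        for (double v : highPass(x, 0.45)) {
+            System.out.printf("%.6f%n", v);
+        }
+    }
+}
+```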
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=sin(2\pi t/4)$ after high-pass filtering. + +### IFFT + +#### Registration statement + +```sql +create function ifft as 'org.apache.iotdb.library.frequency.UDTFIFFT' +``` + +#### Usage + +This function treats the two input series as the real and imaginary part of a complex series, performs an inverse fast Fourier transform (IFFT), and outputs the real part of the result. +For the input format, please refer to the output format of `FFT` function. +Moreover, the compressed output of `FFT` function is also supported. + +**Name:** IFFT + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `start`: The start time of the output series with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is '1970-01-01 08:00:00'. ++ `interval`: The interval of the output series, which is a positive number with an unit. 
The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it is 1s. + +**Output:** Output a single series. The type is DOUBLE. It is strictly equispaced. The values are the results of IFFT. + +**Note:** If a row contains null points or `NaN`, it will be ignored. + +#### Examples + + +Input series: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +SQL for query: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| ++-----------------------------+-------------------------------------------------------+ +``` + +### LowPass + +#### Registration statement + +```sql +create function lowpass as 'org.apache.iotdb.library.frequency.UDTFLowPass' +``` + +#### Usage + +This function performs low-pass filtering on the input series and extracts components below the cutoff frequency. +The timestamps of input will be ignored and all data points will be regarded as equidistant. + +**Name:** LOWPASS + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `wpass`: The normalized cutoff frequency which values (0,1). This parameter cannot be lacked. + +**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input. + +**Note:** `NaN` in the input series will be ignored. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=2sin(2\pi t/5)$ after low-pass filtering. + + +### Envelope + +#### Registration statement + +```sql +create function envelope as 'org.apache.iotdb.library.frequency.UDFEnvelopeAnalysis' +``` + +#### Usage + +This function achieves signal demodulation and envelope extraction by inputting a one-dimensional floating-point array and a user specified modulation frequency. The goal of demodulation is to extract the parts of interest from complex signals, making them easier to understand. For example, demodulation can be used to find the envelope of the signal, that is, the trend of amplitude changes. + +**Name:** Envelope + +**Input:** Only supports a single input sequence, with types INT32/INT64/FLOAT/DOUBLE + + +**Parameters:** + ++ `frequency`: Frequency (optional, positive number. 
If this parameter is not filled in, the system will infer the frequency based on the time interval corresponding to the sequence). ++ `amplification`: Amplification factor (optional, positive integer. The output of the Time column is a set of positive integers and does not output decimals. When the frequency is less than 1, this parameter can be used to amplify the frequency to display normal results). + +**Output:** ++ `Time`: The meaning of the value returned by this column is frequency rather than time. If the output format is time format (e.g. 1970-01-01T08:00: 19.000+08:00), please convert it to a timestamp value. + + ++ `Envelope(Path, 'frequency'='{frequency}')`:Output a single sequence of type DOUBLE, which is the result of envelope analysis. + +**Note:** When the values of the demodulated original sequence are discontinuous, this function will treat it as continuous processing. It is recommended that the analyzed time series be a complete time series of values. It is also recommended to specify a start time and an end time. + +#### Examples + +Input series: + + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:01.000+08:00| 1.0 | +|1970-01-01T08:00:02.000+08:00| 2.0 | +|1970-01-01T08:00:03.000+08:00| 3.0 | +|1970-01-01T08:00:04.000+08:00| 4.0 | +|1970-01-01T08:00:05.000+08:00| 5.0 | +|1970-01-01T08:00:06.000+08:00| 6.0 | +|1970-01-01T08:00:07.000+08:00| 7.0 | +|1970-01-01T08:00:08.000+08:00| 8.0 | +|1970-01-01T08:00:09.000+08:00| 9.0 | +|1970-01-01T08:00:10.000+08:00| 10.0 | ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +set time_display_type=long; +select envelope(s1),envelope(s1,'frequency'='1000'),envelope(s1,'amplification'='10') from root.test.d1; +``` + +Output series: + + +``` ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +|Time|envelope(root.test.d1.s1)|envelope(root.test.d1.s1, "frequency"="1000")|envelope(root.test.d1.s1, "amplification"="10")| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +| 0| 6.284350808484124| 6.284350808484124| 6.284350808484124| +| 100| 1.5581923657404393| 1.5581923657404393| null| +| 200| 0.8503211038340728| 0.8503211038340728| null| +| 300| 0.512808785945551| 0.512808785945551| null| +| 400| 0.26361156774506744| 0.26361156774506744| null| +|1000| null| null| 1.5581923657404393| +|2000| null| null| 0.8503211038340728| +|3000| null| null| 0.512808785945551| +|4000| null| null| 0.26361156774506744| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ + +``` + + +## Data Matching + +### Cov + +#### Registration statement + +```sql +create function cov as 'org.apache.iotdb.library.dmatch.UDAFCov' +``` + +#### Usage + +This function is used to calculate the population covariance. + +**Name:** COV + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population covariance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +### DTW + +#### Registration statement + +```sql +create function dtw as 'org.apache.iotdb.library.dmatch.UDAFDtw' +``` + +#### Usage + +This function is used to calculate the DTW distance between two input series. + +**Name:** DTW + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the DTW distance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `0` will be output. 
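+
+The distance computed here follows the classic dynamic-programming formulation of DTW. The Java sketch below uses the absolute difference as the point-wise cost, which reproduces the example below; it is an illustration written for this guide, not the implementation of `UDAFDtw`.
+
+```java
+import java.util.Arrays;
+
+public class DtwSketch {
+    /** Classic DTW distance with |a - b| as the local cost. */
+    static double dtw(double[] a, double[] b) {
+        int n = a.length, m = b.length;
+        double[][] d = new double[n + 1][m + 1];
+        for (double[] row : d) {
+            Arrays.fill(row, Double.POSITIVE_INFINITY);
+        }
+        d[0][0] = 0.0;
+        for (int i = 1; i <= n; i++) {
+            for (int j = 1; j <= m; j++) {
+                double cost = Math.abs(a[i - 1] - b[j - 1]);
+                d[i][j] = cost + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
+            }
+        }
+        return d[n][m];
+    }
+
+    public static void main(String[] args) {
+        double[] a = new double[20], b = new double[20];
+        Arrays.fill(a, 1.0);
+        Arrays.fill(b, 2.0);
+        System.out.println(dtw(a, b)); // 20.0, matching the example below
+    }
+}
+```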
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +### Pearson + +#### Registration statement + +```sql +create function pearson as 'org.apache.iotdb.library.dmatch.UDAFPearson' +``` + +#### Usage + +This function is used to calculate the Pearson Correlation Coefficient. + +**Name:** PEARSON + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the Pearson Correlation Coefficient. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
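+
+For reference, a minimal Java sketch of the Pearson correlation over aligned rows, with incomplete rows skipped (class name ours, not the library code):
+
+```java
+public class PearsonSketch {
+    /** Pearson correlation coefficient over rows where both values are present and not NaN. */
+    static double pearson(Double[] x, Double[] y) {
+        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
+        int n = 0;
+        for (int i = 0; i < x.length; i++) {
+            if (x[i] == null || y[i] == null || x[i].isNaN() || y[i].isNaN()) {
+                continue;                                  // skip rows with missing points, nulls or NaN
+            }
+            sx += x[i];
+            sy += y[i];
+            sxx += x[i] * x[i];
+            syy += y[i] * y[i];
+            sxy += x[i] * y[i];
+            n++;
+        }
+        if (n == 0) {
+            return Double.NaN;                             // all rows ignored -> NaN, as noted above
+        }
+        double cov = sxy - sx * sy / n;
+        double denom = Math.sqrt(sxx - sx * sx / n) * Math.sqrt(syy - sy * sy / n);
+        return cov / denom;
+    }
+
+    public static void main(String[] args) {
+        Double[] x = {1.0, 2.0, 3.0};
+        Double[] y = {2.0, 4.0, 6.0};
+        System.out.println(pearson(x, y)); // 1.0 for a perfectly linear relation
+    }
+}
+```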
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +### PtnSym + +#### Registration statement + +```sql +create function ptnsym as 'org.apache.iotdb.library.dmatch.UDTFPtnSym' +``` + +#### Usage + +This function is used to find all symmetric subseries in the input whose degree of symmetry is less than the threshold. +The degree of symmetry is calculated by DTW. +The smaller the degree, the more symmetrical the series is. + +**Name:** PATTERNSYMMETRIC + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameter:** + ++ `window`: The length of the symmetric subseries. It's a positive integer and the default value is 10. ++ `threshold`: The threshold of the degree of symmetry. It's non-negative. Only the subseries whose degree of symmetry is below it will be output. By default, all subseries will be output. + + +**Output Series:** Output a single series. The type is DOUBLE. Each data point in the output series corresponds to a symmetric subseries. The output timestamp is the starting timestamp of the subseries and the output value is the degree of symmetry. 
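+
+The exact definition of the degree of symmetry is not spelled out above. One natural reading, which reproduces the example below, is the DTW distance between each length-`window` subsequence and its own reversal (0 means perfectly symmetric). The Java sketch below follows that assumed reading and was written for this guide; it is not the implementation of `UDTFPtnSym`.
+
+```java
+import java.util.Arrays;
+
+public class PtnSymSketch {
+    /** Classic DTW distance with |a - b| as the local cost. */
+    static double dtw(double[] a, double[] b) {
+        int n = a.length, m = b.length;
+        double[][] d = new double[n + 1][m + 1];
+        for (double[] row : d) {
+            Arrays.fill(row, Double.POSITIVE_INFINITY);
+        }
+        d[0][0] = 0.0;
+        for (int i = 1; i <= n; i++) {
+            for (int j = 1; j <= m; j++) {
+                d[i][j] = Math.abs(a[i - 1] - b[j - 1])
+                        + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
+            }
+        }
+        return d[n][m];
+    }
+
+    /** Assumed degree of symmetry: DTW distance between each window and its reversal. */
+    static double[] symmetryDegrees(double[] x, int window) {
+        double[] deg = new double[x.length - window + 1];
+        for (int s = 0; s + window <= x.length; s++) {
+            double[] w = new double[window], r = new double[window];
+            for (int i = 0; i < window; i++) {
+                w[i] = x[s + i];
+                r[i] = x[s + window - 1 - i];
+            }
+            deg[s] = dtw(w, r);
+        }
+        return deg;
+    }
+
+    public static void main(String[] args) {
+        // Data of the example below; with window = 5 and threshold = 0, the windows starting at indices 0 and 7 qualify.
+        double[] x = {1, 2, 3, 2, 1, 1, 1, 1, 2, 3, 2, 1};
+        double[] deg = symmetryDegrees(x, 5);
+        for (int s = 0; s < deg.length; s++) {
+            if (deg[s] <= 0.0) {
+                System.out.println("symmetric window starting at index " + s);
+            }
+        }
+    }
+}
+```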
+ +#### Example + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| +|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +### XCorr + +#### Registration statement + +```sql +create function xcorr as 'org.apache.iotdb.library.dmatch.UDTFXCorr' +``` + +#### Usage + +This function is used to calculate the cross correlation function of given two time series. +For discrete time series, cross correlation is given by +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +which represent the similarities between two series with different index shifts. + +**Name:** XCORR + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series with DOUBLE as datatype. +There are $2N-1$ data points in the series, the center of which represents the cross correlation +calculated with pre-aligned series(that is $CR(0)$ in the formula above), +and the previous(or post) values represent those with shifting the latter series forward(or backward otherwise) +until the two series are no longer overlapped(not included). +In short, the values of output series are given by(index starts from 1) +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
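The two-branch formula above collapses into a single loop over all index shifts. The Python sketch below is a standalone illustration (not the UDF source); `null` and `NaN` are assumed to have already been replaced by 0, as the note says, and it reproduces the nine output values of the example that follows:

```python
def xcorr(s1, s2):
    # s1, s2: aligned numeric lists of equal length N, nulls/NaN already replaced by 0
    n = len(s1)
    out = []
    for shift in range(-(n - 1), n):      # output position i = shift + N (1-based)
        total = 0.0
        for j in range(n):
            k = j - shift                 # pair S1[j] with S2[j - shift]
            if 0 <= k < n:
                total += s1[j] * s2[k]
        out.append(total / n)
    return out

# Example below: s1 = [0, 2, 3, 4, 5], s2 = [6, 7, 0, 9, 10]
# -> [0.0, 4.0, 9.6, 13.4, 20.0, 15.6, 9.2, 11.8, 6.0]
print(xcorr([0, 2, 3, 4, 5], [6, 7, 0, 9, 10]))
```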
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` + + + +## Data Repairing + +### TimestampRepair + +#### Registration statement + +```sql +create function timestamprepair as 'org.apache.iotdb.library.drepair.UDTFTimestampRepair' +``` + +#### Usage + +This function is used for timestamp repair. +According to the given standard time interval, +the method of minimizing the repair cost is adopted. +By fine-tuning the timestamps, +the original data with unstable timestamp interval is repaired to strictly equispaced data. +If no standard time interval is given, +this function will use the **median**, **mode** or **cluster** of the time interval to estimate the standard time interval. + +**Name:** TIMESTAMPREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `interval`: The standard time interval whose unit is millisecond. It is a positive integer. By default, it will be estimated according to the given method. ++ `method`: The method to estimate the standard time interval, which is 'median', 'mode' or 'cluster'. This parameter is only valid when `interval` is not given. By default, median will be used. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +#### Examples + +##### Manually Specify the Standard Time Interval + +When `interval` is given, this function repairs according to the given standard time interval. 
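As a rough intuition only: the UDF chooses the repair that minimizes the total repair cost, but on mildly jittered data such as the example below the result coincides with simply snapping each timestamp to the nearest multiple of the interval, measured from the first timestamp. A minimal Python sketch of that naive snapping, assuming millisecond timestamps (the real algorithm may act differently on harder inputs):

```python
def snap_to_grid(timestamps, interval):
    # naive illustration: align every timestamp to the closest multiple of `interval`
    # measured from the first timestamp; values keep their original order
    start = timestamps[0]
    return [start + round((t - start) / interval) * interval for t in timestamps]

# e.g. snap_to_grid([0, 10000, 19000, 30000], 10000) -> [0, 10000, 20000, 30000]
```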
+ 

Input series:

```
+-----------------------------+---------------+
| Time|root.test.d2.s1|
+-----------------------------+---------------+
|2021-07-01T12:00:00.000+08:00| 1.0|
|2021-07-01T12:00:10.000+08:00| 2.0|
|2021-07-01T12:00:19.000+08:00| 3.0|
|2021-07-01T12:00:30.000+08:00| 4.0|
|2021-07-01T12:00:40.000+08:00| 5.0|
|2021-07-01T12:00:50.000+08:00| 6.0|
|2021-07-01T12:01:01.000+08:00| 7.0|
|2021-07-01T12:01:11.000+08:00| 8.0|
|2021-07-01T12:01:21.000+08:00| 9.0|
|2021-07-01T12:01:31.000+08:00| 10.0|
+-----------------------------+---------------+
```

SQL for query:

```sql
select timestamprepair(s1,'interval'='10000') from root.test.d2
```

Output series:


```
+-----------------------------+----------------------------------------------------+
| Time|timestamprepair(root.test.d2.s1, "interval"="10000")|
+-----------------------------+----------------------------------------------------+
|2021-07-01T12:00:00.000+08:00| 1.0|
|2021-07-01T12:00:10.000+08:00| 2.0|
|2021-07-01T12:00:20.000+08:00| 3.0|
|2021-07-01T12:00:30.000+08:00| 4.0|
|2021-07-01T12:00:40.000+08:00| 5.0|
|2021-07-01T12:00:50.000+08:00| 6.0|
|2021-07-01T12:01:00.000+08:00| 7.0|
|2021-07-01T12:01:10.000+08:00| 8.0|
|2021-07-01T12:01:20.000+08:00| 9.0|
|2021-07-01T12:01:30.000+08:00| 10.0|
+-----------------------------+----------------------------------------------------+
```

##### Automatically Estimate the Standard Time Interval

When `interval` is default, this function estimates the standard time interval.

Input series is the same as above, the SQL for query is shown below:

```sql
select timestamprepair(s1) from root.test.d2
```

Output series:

```
+-----------------------------+--------------------------------+
| Time|timestamprepair(root.test.d2.s1)|
+-----------------------------+--------------------------------+
|2021-07-01T12:00:00.000+08:00| 1.0|
|2021-07-01T12:00:10.000+08:00| 2.0|
|2021-07-01T12:00:20.000+08:00| 3.0|
|2021-07-01T12:00:30.000+08:00| 4.0|
|2021-07-01T12:00:40.000+08:00| 5.0|
|2021-07-01T12:00:50.000+08:00| 6.0|
|2021-07-01T12:01:00.000+08:00| 7.0|
|2021-07-01T12:01:10.000+08:00| 8.0|
|2021-07-01T12:01:20.000+08:00| 9.0|
|2021-07-01T12:01:30.000+08:00| 10.0|
+-----------------------------+--------------------------------+
```

### ValueFill

#### Registration statement

```sql
create function valuefill as 'org.apache.iotdb.library.drepair.UDTFValueFill'
```

#### Usage

This function is used to impute time series. Several methods are supported.

**Name**: ValueFill

**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

**Parameters:**

+ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, default "linear".
  Method to use for imputation in series. "mean": use the global mean value to fill holes; "previous": propagate the last valid observation forward to the next valid one; "linear": simple linear interpolation; "likelihood": maximum likelihood estimation based on the normal distribution of speed; "AR": auto regression; "MA": moving average; "SCREEN": speed constraint.

**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing.

**Note:** The AR method uses an AR(1) model. The input values should be auto-correlated; otherwise, the function outputs a single point (0, 0.0).

#### Examples

##### Fill with linear

When `method` is "linear" or the default, linear interpolation is used to impute.
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuefill(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +##### Previous Fill + +When `method` is "previous", previous method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +### ValueRepair + +#### Registration statement + +```sql +create function valuerepair as 'org.apache.iotdb.library.drepair.UDTFValueRepair' +``` + +#### Usage + +This function is used to repair the value of the time series. +Currently, two methods are supported: +**Screen** is a method based on speed threshold, which makes all speeds meet the threshold requirements under the premise of minimum changes; +**LsGreedy** is a method based on speed change likelihood, which models speed changes as Gaussian distribution, and uses a greedy algorithm to maximize the likelihood. + + +**Name:** VALUEREPAIR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. 
+ +**Parameters:** + ++ `method`: The method used to repair, which is 'Screen' or 'LsGreedy'. By default, Screen is used. ++ `minSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds below it will be regarded as outliers. By default, it is the median minus 3 times of median absolute deviation. ++ `maxSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds above it will be regarded as outliers. By default, it is the median plus 3 times of median absolute deviation. ++ `center`: This parameter is only valid with LsGreedy. It is the center of the Gaussian distribution of speed changes. By default, it is 0. ++ `sigma`: This parameter is only valid with LsGreedy. It is the standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +#### Examples + +##### Repair with Screen + +When `method` is 'Screen' or the default, Screen method is used. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +##### Repair with LsGreedy + +When `method` is 'LsGreedy', LsGreedy method is used. 
+ 

Input series is the same as above, the SQL for query is shown below:

```sql
select valuerepair(s1,'method'='LsGreedy') from root.test.d2
```

Output series:

```
+-----------------------------+-------------------------------------------------+
| Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")|
+-----------------------------+-------------------------------------------------+
|2020-01-01T00:00:02.000+08:00| 100.0|
|2020-01-01T00:00:03.000+08:00| 101.0|
|2020-01-01T00:00:04.000+08:00| 102.0|
|2020-01-01T00:00:06.000+08:00| 104.0|
|2020-01-01T00:00:08.000+08:00| 106.0|
|2020-01-01T00:00:10.000+08:00| 108.0|
|2020-01-01T00:00:14.000+08:00| 112.0|
|2020-01-01T00:00:15.000+08:00| 113.0|
|2020-01-01T00:00:16.000+08:00| 114.0|
|2020-01-01T00:00:18.000+08:00| 116.0|
|2020-01-01T00:00:20.000+08:00| 118.0|
|2020-01-01T00:00:22.000+08:00| 120.0|
|2020-01-01T00:00:26.000+08:00| 124.0|
|2020-01-01T00:00:28.000+08:00| 126.0|
|2020-01-01T00:00:30.000+08:00| 128.0|
+-----------------------------+-------------------------------------------------+
```

### MasterRepair

#### Usage

This function is used to clean time series with master data.

**Name**: MasterRepair

**Input Series:** Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.

**Parameters:**

+ `omega`: The window size. It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences.
+ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows.
+ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data.
+ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column.

**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing.
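The repair procedure is not detailed above, so the following is only a rough mental model: a tuple that lies too far from the master data is replaced by its nearest master tuple. The Python sketch below illustrates this simplified idea with a plain Euclidean distance and a fixed `eta`; the actual UDF, which also uses the window `omega`, the neighbor count `k` and estimated defaults, is more elaborate:

```python
import math

def master_repair(tuples, master, eta, output_column=1):
    # tuples, master: lists of equal-length numeric tuples (time column omitted)
    # simplified nearest-neighbour cleaning, for illustration only
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    repaired = []
    for t in tuples:
        nearest = min(master, key=lambda m: dist(t, m))
        repaired.append(t if dist(t, nearest) <= eta else nearest)
    return [row[output_column - 1] for row in repaired]
```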
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +SQL for query: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +Output series: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +### SeasonalRepair + +#### Usage +This function is used to repair the value of the seasonal time series via decomposition. Currently, two methods are supported: **Classical** - detect irregular fluctuations through residual component decomposed by classical decomposition, and repair them through moving average; **Improved** - detect irregular fluctuations through residual component decomposed by improved decomposition, and repair them through moving median. + +**Name:** SEASONALREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The decomposition method used to repair, which is 'Classical' or 'Improved'. By default, classical decomposition is used. ++ `period`: It is the period of the time series. ++ `k`: It is the range threshold of residual term, which limits the degree to which the residual term is off-center. By default, it is 9. ++ `max_iter`: It is the maximum number of iterations for the algorithm. By default, it is 10. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +#### Examples + +##### Repair with Classical + +When `method` is 'Classical' or default value, classical decomposition method is used. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +##### Repair with Improved +When `method` is 'Improved', improved decomposition method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` + + + +## Series Discovery + +### ConsecutiveSequences + +#### Registration statement + +```sql +create function consecutivesequences as 'org.apache.iotdb.library.series.UDTFConsecutiveSequences' +``` + +#### Usage + +This function is used to find locally longest consecutive subsequences in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive subsequence is the subsequence that is strictly equispaced with the standard time interval without any missing data. If a consecutive subsequence is not a proper subsequence of any consecutive subsequence, it is locally longest. 
+ +**Name:** CONSECUTIVESEQUENCES + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a locally longest consecutive subsequence. The output timestamp is the starting timestamp of the subsequence and the output value is the number of data points in the subsequence. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +#### Examples + +##### Manually Specify the Standard Time Interval + +It's able to manually specify the standard time interval by the parameter `gap`. It's notable that false parameter leads to false output. + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + + +##### Automatically Estimate the Standard Time Interval + +When `gap` is default, this function estimates the standard time interval by the mode of time intervals and gets the same results. Therefore, this usage is more recommended. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +### ConsecutiveWindows + +#### Registration statement + +```sql +create function consecutivewindows as 'org.apache.iotdb.library.series.UDTFConsecutiveWindows' +``` + +#### Usage + +This function is used to find consecutive windows of specified length in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. 
Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive window is the subsequence that is strictly equispaced with the standard time interval without any missing data. + +**Name:** CONSECUTIVEWINDOWS + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. ++ `length`: The length of the window which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a consecutive window. The output timestamp is the starting timestamp of the window and the output value is the number of data points in the window. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +#### Examples + + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` + + + +## Machine Learning + +### AR + +#### Registration statement + +```sql +create function ar as 'org.apache.iotdb.library.dlearn.UDTFAR' +``` + +#### Usage + +This function is used to learn the coefficients of the autoregressive models for a time series. + +**Name:** AR + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `p`: The order of the autoregressive model. Its default value is 1. + +**Output Series:** Output a single series. The type is DOUBLE. The first line corresponds to the first order coefficient, and so on. + +**Note:** + +- Parameter `p` should be a positive integer. +- Most points in the series should be sampled at a constant time interval. +- Linear interpolation is applied for the missing points in the series. 
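The description above does not state which estimator the UDF uses, so the returned coefficients may not match textbook formulas exactly. One common choice is to solve the Yule-Walker equations for the order-`p` coefficients; the Python sketch below is purely illustrative and may differ numerically from the UDF output:

```python
def ar_coefficients(x, p=1):
    # Yule-Walker estimate of AR(p) coefficients (illustrative; the UDF's estimator may differ)
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    # biased autocovariances r[0], ..., r[p]
    r = [sum(xc[t] * xc[t + k] for t in range(n - k)) / n for k in range(p + 1)]
    # augmented p x (p+1) Toeplitz system  R * a = [r[1], ..., r[p]]
    A = [[r[abs(i - j)] for j in range(p)] + [r[i + 1]] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, p):
            factor = A[row][col] / A[col][col]
            for j in range(col, p + 1):
                A[row][j] -= factor * A[col][j]
    coeffs = [0.0] * p
    for i in range(p - 1, -1, -1):
        coeffs[i] = (A[i][p] - sum(A[i][j] * coeffs[j] for j in range(i + 1, p))) / A[i][i]
    return coeffs
```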
+ +#### Examples + +##### Assigning Model Order + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +### Representation + +#### Usage + +This function is used to represent a time series. + +**Name:** Representation + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is INT32. The length is `tb*vb`. The timestamps starting from 0 only indicate the order. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. + +#### Examples + +##### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +### RM + +#### Usage + +This function is used to calculate the matching score of two time series according to the representation. + +**Name:** RM + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the matching score. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. 
+ +#### Examples + +##### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| ++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/UserGuide/V2.0.1/Tree/Technical-Insider/Cluster-data-partitioning.md b/src/UserGuide/V2.0.1/Tree/Technical-Insider/Cluster-data-partitioning.md new file mode 100644 index 00000000..c10a6e6a --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Technical-Insider/Cluster-data-partitioning.md @@ -0,0 +1,110 @@ + + +# Load Balance +This document introduces the partitioning strategies and load balance strategies in IoTDB. According to the characteristics of time series data, IoTDB partitions them by series and time dimensions. Combining a series partition with a time partition creates a partition, the unit of division. To enhance throughput and reduce management costs, these partitions are evenly allocated to RegionGroups, which serve as the unit of replication. The RegionGroup's Regions then determine the storage location, with the leader Region managing the primary load. During this process, the Region placement strategy determines which nodes will host the replicas, while the leader selection strategy designates which Region will act as the leader. + +## Partitioning Strategy & Partition Allocation +IoTDB implements tailored partitioning algorithms for time series data. Building on this foundation, the partition information cached on both ConfigNodes and DataNodes is not only manageable in size but also clearly differentiated between hot and cold. Subsequently, balanced partitions are evenly allocated across the cluster's RegionGroups to achieve storage balance. + +### Partitioning Strategy +IoTDB maps each sensor in the production environment to a time series. The time series are then partitioned using the series partitioning algorithm to manage their schema, and combined with the time partitioning algorithm to manage their data. The following figure illustrates how IoTDB partitions time series data. + + + +#### Partitioning Algorithm +Because numerous devices and sensors are commonly deployed in production environments, IoTDB employs a series partitioning algorithm to ensure the size of partition information is manageable. Since the generated time series associated with timestamps, IoTDB uses a time partioning algorithm to clearly distinguish between hot and cold partitions. 
+ 

##### Series Partitioning Algorithm
By default, IoTDB limits the number of series partitions to 1000 and configures the series partitioning algorithm to use a hash partitioning algorithm. This leads to the following outcomes:
+ Since the number of series partitions is a fixed constant, the mapping between series and series partitions remains stable. As a result, IoTDB does not require frequent data migrations.
+ The load across series partitions is relatively balanced because the number of series partitions is much smaller than the number of sensors deployed in the production environment.

Furthermore, if a more accurate estimate of the actual load in the production environment is available, the series partitioning algorithm can be configured to use a customized hash partitioning or a list partitioning to achieve a more uniform load distribution across all series partitions.

##### Time Partitioning Algorithm
The time partitioning algorithm converts a given timestamp to the corresponding time partition by

$$\left\lfloor\frac{\text{Timestamp}-\text{StartTimestamp}}{\text{TimePartitionInterval}}\right\rfloor.$$

In this equation, both $\text{StartTimestamp}$ and $\text{TimePartitionInterval}$ are configurable parameters to accommodate various production environments. The $\text{StartTimestamp}$ represents the starting time of the first time partition, while the $\text{TimePartitionInterval}$ defines the duration of each time partition. By default, the $\text{TimePartitionInterval}$ is set to seven days.

#### Schema Partitioning
Since the series partitioning algorithm evenly partitions the time series, each series partition corresponds to a schema partition. These schema partitions are then evenly allocated across the SchemaRegionGroups to achieve a balanced schema distribution.

#### Data Partitioning
Combining a series partition with a time partition creates a data partition. Since the series partitioning algorithm evenly partitions the time series, the load of data partitions within a specified time partition remains balanced. These data partitions are then evenly allocated across the DataRegionGroups to achieve balanced data distribution.

### Partition Allocation
IoTDB uses RegionGroups to enable elastic storage of time series, with the number of RegionGroups in the cluster determined by the total resources available across all DataNodes. Since the number of RegionGroups is dynamic, IoTDB can easily scale out. Both the SchemaRegionGroup and DataRegionGroup follow the same partition allocation algorithm, which evenly splits all series partitions. The following figure demonstrates the partition allocation process, where the dynamic RegionGroups match the ever-expanding time series and cluster.


#### RegionGroup Expansion
The number of RegionGroups is given by

$$\text{RegionGroupNumber}=\left\lfloor\frac{\sum_{i=1}^{DataNodeNumber}\text{RegionNumber}_i}{\text{ReplicationFactor}}\right\rfloor.$$

In this equation, $\text{RegionNumber}_i$ represents the number of Regions expected to be hosted on the $i$-th DataNode, while $\text{ReplicationFactor}$ denotes the number of Regions within each RegionGroup. Both $\text{RegionNumber}_i$ and $\text{ReplicationFactor}$ are configurable parameters. The $\text{RegionNumber}_i$ can be determined by the available hardware resources---such as CPU cores, memory sizes, etc.---on the $i$-th DataNode to accommodate different physical servers.
The $\text{ReplicationFactor}$ can be adjusted to ensure diverse levels of fault tolerance. + +#### Allocation Algorithm +Both the SchemaRegionGroup and the DataRegionGroup follow the same allocation algorithm--splitting all series partitions evenly. As a result, each SchemaRegionGroup holds the same number of schema partitions, ensuring balanced schema storage. Similarly, for each time partition, each DataRegionGroup acquires the data partitions corresponding to the series partitions it holds. Consequently, the data partitions within a time partition are evenly distributed across all DataRegionGroups, ensuring balanced data storage in each time partition. + +Notably, IoTDB effectively leverages the characteristics of time series data. When the TTL (Time to Live) is configured, IoTDB enables migration-free elastic storage for time series data. This feature facilitates cluster expansion while minimizing the impact on online operations. The figures above illustrate an instance of this feature: newborn data partitions are evenly allocated to each DataRegion, and expired data are automatically archived. As a result, the cluster's storage will eventually remain balanced. + +## Balance Strategy +To enhance the cluster's availability and performance, IoTDB employs sophisticated storage load and computing load balance algorithms. + +### Storage Load Balance +The number of Regions held by a DataNode reflects its storage load. If the difference in the number of Regions across DataNodes is relatively large, the DataNode with more Regions is likely to become a storage bottleneck. Although a straightforward Round Robin placement algorithm can achieve storage balance by ensuring that each DataNode hosts an equal number of Regions, it compromises the cluster's fault tolerance, as illustrated below: + + + ++ Assume the cluster has 4 DataNodes, 4 RegionGroups and a replication factor of 2. ++ Place RegionGroup $r_1$'s 2 Regions on DataNodes $n_1$ and $n_2$. ++ Place RegionGroup $r_2$'s 2 Regions on DataNodes $n_3$ and $n_4$. ++ Place RegionGroup $r_3$'s 2 Regions on DataNodes $n_1$ and $n_3$. ++ Place RegionGroup $r_4$'s 2 Regions on DataNodes $n_2$ and $n_4$. + +In this scenario, if DataNode $n_2$ fails, the load previously handled by DataNode $n_2$ would be transferred solely to DataNode $n_1$, potentially overloading it. + +To address this issue, IoTDB employs a Region placement algorithm that not only evenly distributes Regions across all DataNodes but also ensures that each DataNode can offload its storage to sufficient other DataNodes in the event of a failure. As a result, the cluster achieves balanced storage distribution and a high level of fault tolerance, ensuring its availability. + +### Computing Load Balance +The number of leader Regions held by a DataNode reflects its Computing load. If the difference in the number of leaders across DataNodes is relatively large, the DataNode with more leaders is likely to become a Computing bottleneck. If the leader selection process is conducted using a transparent Greedy algorithm, the result may be an unbalanced leader distribution when the Regions are fault-tolerantly placed, as demonstrated below: + + + ++ Assume the cluster has 4 DataNodes, 4 RegionGroups and a replication factor of 2. ++ Select RegionGroup $r_5$'s Region on DataNode $n_5$ as the leader. ++ Select RegionGroup $r_6$'s Region on DataNode $n_7$ as the leader. ++ Select RegionGroup $r_7$'s Region on DataNode $n_7$ as the leader. 
++ Select RegionGroup $r_8$'s Region on DataNode $n_8$ as the leader. + +Please note that all the above steps strictly follow the Greedy algorithm. However, by Step 3, selecting the leader of RegionGroup $r_7$ on either DataNode $n_5$ or $n_7$ results in an unbalanced leader distribution. The rationale is that each greedy step lacks a global perspective, leading to a locally optimal solution. + +To address this issue, IoTDB employs a leader selection algorithm that can consistently balance the cluster's leader distribution. Consequently, the cluster achieves balanced Computing load distribution, ensuring its performance. + +## Source Code ++ [Data Partitioning](https://github.com/apache/iotdb/tree/master/iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/partition) ++ [Partition Allocation](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/partition) ++ [Region Placement](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/region) ++ [Leader Selection](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/router/leader) \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Technical-Insider/Encoding-and-Compression.md b/src/UserGuide/V2.0.1/Tree/Technical-Insider/Encoding-and-Compression.md new file mode 100644 index 00000000..5a6639bf --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Technical-Insider/Encoding-and-Compression.md @@ -0,0 +1,131 @@ + + +# Encoding and Compression + + +## Encoding Methods + +To improve the efficiency of data storage, it is necessary to encode data during data writing, thereby reducing the amount of disk space used. In the process of writing and reading data, the amount of data involved in the I/O operations can be reduced to improve performance. IoTDB supports the following encoding methods for different data types: + +1. PLAIN + + PLAIN encoding, the default encoding mode, i.e, no encoding, supports multiple data types. It has high compression and decompression efficiency while suffering from low space storage efficiency. + +2. TS_2DIFF + + Second-order differential encoding is more suitable for encoding monotonically increasing or decreasing sequence data, and is not recommended for sequence data with large fluctuations. + +3. RLE + + Run-length encoding is suitable for storing sequence with continuous values, and is not recommended for sequence data with most of the time different values. + + Run-length encoding can also be used to encode floating-point numbers, while it is necessary to specify reserved decimal digits (MAX\_POINT\_NUMBER) when creating time series. It is more suitable to store sequence data where floating-point values appear continuously, monotonously increasing or decreasing, and it is not suitable for storing sequence data with high precision requirements after the decimal point or with large fluctuations. + + > TS_2DIFF and RLE have precision limit for data type of float and double. By default, two decimal places are reserved. GORILLA is recommended. + +4. GORILLA + + GORILLA encoding is lossless. It is more suitable for numerical sequence with similar values and is not recommended for sequence data with large fluctuations. + + Currently, there are two versions of GORILLA encoding implementation, it is recommended to use `GORILLA` instead of `GORILLA_V1` (deprecated). 
+ + Usage restrictions: When using GORILLA to encode INT32 data, you need to ensure that there is no data point with the value `Integer.MIN_VALUE` in the sequence. When using GORILLA to encode INT64 data, you need to ensure that there is no data point with the value `Long.MIN_VALUE` in the sequence. + +5. DICTIONARY + + DICTIONARY encoding is lossless. It is suitable for TEXT data with low cardinality (i.e. low number of distinct values). It is not recommended to use it for high-cardinality data. + +6. ZIGZAG + + ZIGZAG encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small variant encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers. + +7. CHIMP + + CHIMP encoding is lossless. It is the state-of-the-art compression algorithm for streaming floating point data, providing impressive savings compared to earlier approaches. It is suitable for any numerical sequence with similar values and works best for sequence data without large fluctuations and/or random noise. + + Usage restrictions: When using CHIMP to encode INT32 data, you need to ensure that there is no data point with the value `Integer.MIN_VALUE` in the sequence. When using CHIMP to encode INT64 data, you need to ensure that there is no data point with the value `Long.MIN_VALUE` in the sequence. + +8. SPRINTZ + + SPRINTZ coding is a type of lossless data compression technique that involves predicting the original time series data, applying Zigzag encoding, bit-packing encoding, and run-length encoding. SPRINTZ encoding is effective for time series data with small absolute differences between values. However, it may not be as effective for time series data with large differences between values, indicating large fluctuation. +9. RLBE + + RLBE is a lossless encoding that combines the ideas of differential encoding, bit-packing encoding, run-length encoding, Fibonacci encoding and concatenation. RLBE encoding is suitable for time series data with increasing and small increment value, and is not suitable for time series data with large fluctuation. + + +### Correspondence between data type and encoding + +The five encodings described in the previous sections are applicable to different data types. If the correspondence is wrong, the time series cannot be created correctly. + +The correspondence between the data type and its supported encodings is summarized in the Table below. + +| **Data Type** | **Best Encoding (default)** | **Supported Encoding** | +| ------------- | --------------------------- | ----------------------------------------------------------- | +| BOOLEAN | RLE | PLAIN, RLE | +| INT32 | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | +| DATE | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | +| INT64 | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | +| TIMESTAMP | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | +| FLOAT | GORILLA | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE | +| DOUBLE | GORILLA | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE | +| TEXT | PLAIN | PLAIN, DICTIONARY | +| STRING | PLAIN | PLAIN, DICTIONARY | +| BLOB | PLAIN | PLAIN | + +When the data type specified by the user does not correspond to the encoding method, the system will prompt an error. 
+ 

As shown below, the second-order difference encoding does not support the Boolean type:

```
IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF
Msg: 507: encoding TS_2DIFF does not support BOOLEAN
```
## Compression

When the time series is written and encoded as binary data according to the specified type, IoTDB compresses the data using compression technology to further improve space storage efficiency. Although both encoding and compression are designed to improve storage efficiency, encoding techniques are usually available only for specific data types (e.g., second-order differential encoding is only suitable for the INT32 or INT64 data type, and storing floating-point numbers requires multiplying them by 10^m to convert them to integers), after which the data is converted to a binary stream. The compression method (SNAPPY) compresses the binary stream, so the use of the compression method is no longer limited by the data type.

### Basic Compression Methods

IoTDB allows you to specify the compression method of the column when creating a time series, and supports the following compression methods:

* UNCOMPRESSED

* SNAPPY

* LZ4 (Best compression method)

* GZIP

* ZSTD

* LZMA2

The specified syntax for compression is detailed in [Create Timeseries Statement](../SQL-Manual/SQL-Manual.md).

### Compression Ratio Statistics

Compression ratio statistics file: data/datanode/system/compression_ratio

* ratio_sum: sum of memtable compression ratios
* memtable_flush_time: memtable flush times

The average compression ratio can be calculated by `ratio_sum / memtable_flush_time` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Technical-Insider/Publication.md b/src/UserGuide/V2.0.1/Tree/Technical-Insider/Publication.md new file mode 100644 index 00000000..1f1832ef --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Technical-Insider/Publication.md @@ -0,0 +1,42 @@ + 

# Academic Achievement

Apache IoTDB started at the School of Software, Tsinghua University. IoTDB is a database for managing large amounts of time series data with columnar storage, data encoding, pre-computation, and index techniques. It has a SQL-like interface to write millions of data points per second per node and is optimized to get query results in a few seconds over trillions of data points. It can also be easily integrated with Apache Hadoop MapReduce and Apache Spark for analytics.

The related research papers are as follows:
* [Apache IoTDB: A Time Series Database for IoT Applications](https://sxsong.github.io/doc/23sigmod-iotdb.pdf), Chen Wang, Jialin Qiao, Xiangdong Huang, Shaoxu Song, Haonan Hou, Tian Jiang, Lei Rui, Jianmin Wang, Jiaguang Sun. SIGMOD 2023.
* [Grouping Time Series for Efficient Columnar Storage](https://sxsong.github.io/doc/23sigmod-group.pdf), Chenguang Fang, Shaoxu Song, Haoquan Guan, Xiangdong Huang, Chen Wang, Jianmin Wang. SIGMOD 2023.
* [Learning Autoregressive Model in LSM-Tree based Store](https://sxsong.github.io/doc/23kdd.pdf), Yunxiang Su, Wenxuan Ma, Shaoxu Song. SIGMOD 2023.
* [TsQuality: Measuring Time Series Data Quality in Apache IoTDB](https://sxsong.github.io/doc/23vldb-qaulity.pdf), Yuanhui Qiu, Chenguang Fang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang. VLDB 2023.
* [Frequency Domain Data Encoding in Apache IoTDB](https://sxsong.github.io/doc/22vldb-frequency.pdf), Haoyu Wang, Shaoxu Song. VLDB 2023.
+* [Non-Blocking Raft for High Throughput IoT Data](https://sxsong.github.io/doc/23icde-raft.pdf), Tian Jiang, Xiangdong Huang, Shaoxu Song, Chen Wang, Jianmin Wang, Ruibo Li, Jincheng Sun. ICDE 2023. +* [Backward-Sort for Time Series in Apache IoTDB](https://sxsong.github.io/doc/23icde-sort.pdf), Xiaojian Zhang, Hongyin Zhang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang. ICDE 2023. +* [Time Series Data Encoding for Efficient Storage: A Comparative Analysis in Apache IoTDB](https://sxsong.github.io/doc/22vldb-encoding.pdf), Jinzhao Xiao, Yuxiang Huang, Changyu Hu, Shaoxu Song, Xiangdong Huang, Jianmin Wang. VLDB 2022. +* [Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree](https://sxsong.github.io/doc/22icde-separation.pdf), Yuyuan Kang, Xiangdong Huang, Shaoxu Song, Lingzhe Zhang, Jialin Qiao, Chen Wang, Jianmin Wang, Julian Feinauer. ICDE 2022. +* [Dual-PISA: An index for aggregation operations on time series data](https://www.sciencedirect.com/science/article/pii/S0306437918305489), Jialin Qiao, Xiangdong Huang, Jianmin Wang, Raymond K Wong. IS 2020. +* [Apache IoTDB: time-series database for internet of things](http://www.vldb.org/pvldb/vol13/p2901-wang.pdf), Chen Wang, Xiangdong Huang, Jialin Qiao, Tian Jiang, Lei Rui, Jinrui Zhang, Rong Kang, Julian Feinauer, Kevin A. McGrail, Peng Wang, Jun Yuan, Jianmin Wang, Jiaguang Sun. VLDB 2020. +* [KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping](https://www.semanticscholar.org/paper/KV-match%3A-A-Subsequence-Matching-Approach-and-Time-Wu-Wang/9ed84cb15b7e5052028fc5b4d667248713ac8592), Jiaye Wu and Peng Wang and Chen Wang and Wei Wang and Jianmin Wang. ICDE 2019. +* [The Design of Apache IoTDB distributed framework](http://ndbc2019.sdu.edu.cn/info/1002/1044.htm), Tianan Li, Jianmin Wang, Xiangdong Huang, Yi Xu, Dongfang Mao, Jun Yuan. NDBC 2019. +* [Matching Consecutive Subpatterns over Streaming Time Series](https://link.springer.com/chapter/10.1007/978-3-319-96893-3_8), Rong Kang and Chen Wang and Peng Wang and Yuting Ding and Jianmin Wang. APWeb/WAIM 2018. +* [PISA: An Index for Aggregating Big Time Series Data](https://dl.acm.org/citation.cfm?id=2983775&dl=ACM&coll=DL), Xiangdong Huang and Jianmin Wang and Raymond K. Wong and Jinrui Zhang and Chen Wang. CIKM 2016. + diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/Benchmark.md b/src/UserGuide/V2.0.1/Tree/Tools-System/Benchmark.md new file mode 100644 index 00000000..86ff88bc --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Tools-System/Benchmark.md @@ -0,0 +1,344 @@ + + +# Benchmark Tool + +IoT-benchmark is a time-series database benchmarking tool based on Java and big data environment, developed and open sourced by School of Software Tsinghua University. It is easy to use, supports multiple writing and query methods, supports storing test information and results for further query or analysis, and supports integration with Tableau to visualize test results. + +Figure 1-1 below includes the test benchmark process and other extended functions. These processes can be unified by IoT-benchmark. IoT Benchmark supports a variety of workloads, including **pure write, pure query, write query mixed**, etc., supports **software and hardware system monitoring, test metric measurement** and other monitoring functions, and also realizes **initializing the database automatically, test data analysis and system parameter optimization** functions. 
+ +![img](https://alioss.timecho.com/docs/img/benchmark-English1.png) + + +Figure 1-1 + +Referring to the YCSB test tool's design idea of separating the three components of workload generation, performance metric measurement and database interface, the modular design of IoT-benchmark is shown in Figure 1-2. Different from the YCSB-based test tool system, IoT-benchmark adds a system monitoring module to support the persistence of test data and system monitoring data. In addition, some special load testing functions especially designed for time series data scenarios have been added, such as supporting batch writing and multiple out-of-sequence data writing modes for IoT scenarios. + +![img](https://alioss.timecho.com/docs/img/benchmark-%20English2.png) + + +Figure 1-2 + +Currently IoT-benchmark supports the following time series databases, versions and connection methods: + +| Database | Version | Connection mmethod | +| --------------- | ------- | -------------------------------------------------------- | +| InfluxDB | v1.x
v2.0 | SDK | | +| TimescaleDB | -- | jdbc | +| OpenTSDB | -- | Http Request | +| QuestDB | v6.0.7 | jdbc | +| TDengine | v2.2.0.2 | jdbc | +| VictoriaMetrics | v1.64.0 | Http Request | +| KairosDB | -- | Http Request | +| IoTDB | v1.x
v0.13 | jdbc、sessionByTablet、sessionByRecord、sessionByRecords | + +Table 1-1 Comparison of big data test benchmarks + +## Software Installation and Environment Setup + +### Prerequisites + +1. Java 8 +2. Maven 3.6+ +3. The corresponding appropriate version of the database, such as Apache IoTDB 1.0 + +### How to Get IoT Benchmark + +- **Get the binary package**: Enter https://github.com/thulab/iot-benchmark/releases to download the required installation package. Download it as a compressed file, select a folder to decompress and use it. +- Compiled from source (can be tested with Apache IoTDB 1.0): + - The first step (compile the latest IoTDB Session package): Enter the official website https://github.com/apache/iotdb/tree/rel/1.0 to download the IoTDB source code, and run the command `mvn clean package install -pl session -am -DskipTests` in the root directory to compiles the latest package for IoTDB Session. + - The second step (compile the IoTDB Benchmark test package): Enter the official website https://github.com/thulab/iot-benchmark to download the source code, run `mvn clean package install -pl iotdb-1.0 -am -DskipTests` in the root directory to compile Apache IoTDB version 1.0 test package. The relative path between the test package and the root directory is `./iotdb-1.0/target/iotdb-1.0-0.0.1/iotdb-1.0-0.0.1`. + +### IoT Benchmark's Test Package Structure + +The directory structure of the test package is shown in Figure 1-3 below. The test configuration file is conf/config.properties, and the test startup scripts are benchmark\.sh (Linux & MacOS) and benchmark.bat (Windows). The detailed usage of the files is shown in Table 1-2. + +![img](https://alioss.timecho.com/docs/img/bm3.png) + + +Figure 1-3 List of files and folders + +| Name | File | Usage | +| ---------------- | ----------------- | -------------------------------- | +| benchmark.bat | - | Startup script on Windows | +| benchmark\.sh | - | Startup script on Linux/Mac | +| conf | config.properties | Test scenario configuration file | +| logback.xml | - | Log output configuration file | +| lib | - | Dependency library | +| LICENSE | - | License file | +| bin | startup\.sh | Init script folder | +| ser-benchmark\.sh | - | Monitor mode startup script | + +Table 1-2 Usage list of files and folders + +### IoT Benchmark Execution Test + +1. Modify the configuration file according to the test requirements. For the main parameters, see next chapter. The corresponding configuration file is conf/config.properties. For example, to test Apache IoTDB 1.0, you need to modify DB_SWITCH=IoTDB-100-SESSION_BY_TABLET. +2. Start the time series database under test. +3. Running. +4. Start IoT-benchmark to execute the test. Observe the status of the time series database and IoT-benchmark under test during execution, and view the results and analyze the test process after execution. + +### IoT Benchmark Results Interpretation + +All the log files of the test are stored in the logs folder, and the test results are stored in the data/csvOutput folder after the test is completed. For example, after the test, we get the following result matrix: + + +![img](https://alioss.timecho.com/docs/img/bm4.png) + +- Result Matrix + - OkOperation: successful operations + - OkPoint: For write operations, it is the number of points successfully written; for query operations, it is the number of points successfully queried. 
+ - FailOperation: failed operations + - FailPoint: For write operations, it is the number of write failure points +- Latency(mx) Matrix + - AVG: average operation time + - MIN: minimum operation time + - Pn: the quantile value of the overall distribution of operations, for example, P25 is the lower quartile. + +## Main Parameters + +This chapter mainly explains the purpose and configuration method of the main parameters. + +### Working Mode and Operation Proportion + +- The working mode parameter "BENCHMARK_WORK_MODE" can be selected as "default mode" and "server monitoring"; the "server monitoring" mode can be started directly by executing the ser-benchmark\.sh script, and the script will automatically modify this parameter. "Default mode" is a commonly used test mode, combined with the configuration of the OPERATION_PROPORTION parameter to achieve the definition of test operation proportions of "pure write", "pure query" and "read-write mix". + +- When running ServerMode to monitor the operating environment of the time series database under test, IoT-benchmark relies on sysstat software related commands; if MySQL or IoTDB is selected for persistent test process data, this type of database needs to be installed; the recording mode of ServerMode and CSV can only be used in the Linux system to record relevant system information during the test. Therefore, we recommend using MacOs or Linux system. This article uses Linux (Centos7) system as an example. If you use Windows system, you can use the benchmark.bat script in the conf folder to start IoT-benchmark. + +Table 1-3 Test mode + +| Mode Name | BENCHMARK_WORK_MODE | Description | +| ------------ | ------------------- | ------------------------------------------------------------ | +| default mode | testWithDefaultPath | Supports mixed workloads with multiple read and write operations | +| server mode | serverMODE | Server resource usage monitoring mode (running in this mode is started by the ser-benchmark\.sh script, no need to manually configure this parameter) | + +### Server Connection Information + +After the working mode is specified, how to inform IoT-benchmark of the information of the time series database under test? Currently, the type of the time-series database under test is informed through "DB_SWITCH"; the network address of the time-series database under test is informed through "HOST"; the network port of the time-series database under test is informed through "PORT"; the login user name of the time-series database under test is informed through "USERNAME"; "PASSWORD" informs the password of the login user of the time series database under test; informs the name of the time series database under test through "DB_NAME"; informs the connection authentication token of the time series database under test through "TOKEN" (used by InfluxDB 2.0). + +### Write Scene Setup Parameters + +Table 1-4 Write scene setup parameters + +| Parameter Name | Type | Example | Description | +| -------------------------- | --------- | ------------------------- | ------------------------------------------------------------ | +| CLIENT_NUMBER | Integer | 100 | Total number of clients | +| GROUP_NUMBER | Integer | 20 | Number of storage groups; only for IoTDB. 
| +| DEVICE_NUMBER | Integer | 100 | Total number of devices | +| SENSOR_NUMBER | Integer | 300 | Total number of sensors per device | +| INSERT_DATATYPE_PROPORTION | String | 1:1:1:1:1:1 | the data type proportion of the device, BOOLEAN:INT32:INT64:FLOAT:DOUBLE:TEXT | +| POINT_STEP | Integer | 1000 | Timestamp interval, that is, the fixed length between two timestamps of generated data. | +| OP_MIN_INTERVAL | Integer | 0 | Minimum operation execution interval: if the operation time is greater than this value, execute the next one immediately, otherwise wait (OP_MIN_INTERVAL-actual execution time) ms; if it is 0, the parameter will not take effect; if it is -1, its value is consistent with POINT_STEP. | +| IS_OUT_OF_ORDER | Boolean | false | Whether to write out of order | +| OUT_OF_ORDER_RATIO | Float | 0.3 | Ratio of data written out of order | +| BATCH_SIZE_PER_WRITE | Integer | 1 | Number of data rows written in batches (how many rows of data are written at a time) | +| START_TIME | Timestamp | 2022-10-30T00:00:00+08:00 | The start timestamp of writing data; use this timestamp as the starting point to start the simulation to create the data timestamp. | +| LOOP | Integer | 86400 | Total number of operations: Each type of operation will be divided according to the ratio defined by OPERATION_PROPORTION | +| OPERATION_PROPORTION | String | 1:0:0:0:0:0:0:0:0:0:0 | The ratio of each operation. Write:Q1:Q2:Q3:Q4:Q5:Q6:Q7:Q8:Q9:Q10, please note the use of English colons. Each term in the scale is an integer. | + +According to the configuration parameters in Table 1-4, the test scenario can be described as follows: write 30,000 (100 devices, 300 sensors for each device) time series sequential data for a day on October 30, 2022 to the time series database under test, in total 2.592 billion data points. The 300 sensor data types of each device are 50 Booleans, 50 integers, 50 long integers, 50 floats, 50 doubles, and 50 characters. If we change the value of IS_OUT_OF_ORDER in the table to true, then the scenario is: write 30,000 time series data on October 30, 2022 to the measured time series database, and there are 30% out of order data ( arrives in the time series database later than other data points whose generation time is later than itself). + +### Query Scene Setup Parameters + +Table 1-5 Query scene setup parameters + +| Parameter Name | Type | Example | Description | +| -------------------- | ------- | --------------------- | ------------------------------------------------------------ | +| QUERY_DEVICE_NUM | Integer | 2 | The number of devices involved in the query in each query statement. | +| QUERY_SENSOR_NUM | Integer | 2 | The number of sensors involved in the query in each query statement. | +| QUERY_AGGREGATE_FUN | String | count | Aggregate functions used in aggregate queries, such as count, avg, sum, max_time, etc. | +| STEP_SIZE | Integer | 1 | The change step of the starting time point of the time filter condition, if set to 0, the time filter condition of each query is the same, unit: POINT_STEP. | +| QUERY_INTERVAL | Integer | 250000 | The time interval between the start time and the end time in the start and end time query, and the time interval in Group By. | +| QUERY_LOWER_VALUE | Integer | -5 | Parameters for conditional query clauses, where xxx > QUERY_LOWER_VALUE. | +| GROUP_BY_TIME_UNIT | Integer | 20000 | The size of the group in the Group By statement. | +| LOOP | Integer | 10 | Total number of operations. 
Each type of operation will be divided according to the ratio defined by OPERATION_PROPORTION. | +| OPERATION_PROPORTION | String | 0:0:0:0:0:0:0:0:0:0:1 | Write:Q1:Q2:Q3:Q4:Q5:Q6:Q7:Q8:Q9:Q10 | + +Table 1-6 Query types and example SQL + +| Id | Query Type | IoTDB Example SQL | +| ---- | ---------------------------------------------------- | ------------------------------------------------------------ | +| Q1 | exact point query | select v1 from root.db.d1 where time = ? | +| Q2 | time range query | select v1 from root.db.d1 where time > ? and time < ? | +| Q3 | time range query with value filtering | select v1 from root.db.d1 where time > ? and time < ? and v1 > ? | +| Q4 | time range aggregation query | select count(v1) from root.db.d1 where and time > ? and time < ? | +| Q5 | full time range aggregate query with value filtering | select count(v1) from root.db.d1 where v1 > ? | +| Q6 | time range aggregation query with value filtering | select count(v1) from root.db.d1 where v1 > ? and time > ? and time < ? | +| Q7 | time grouping aggregation query | select count(v1) from root.db.d1 group by ([?, ?), ?, ?) | +| Q8 | latest point query | select last v1 from root.db.d1 | +| Q9 | reverse order time range query | select v1 from root.sg.d1 where time > ? and time < ? order by time desc | +| Q10 | reverse order time range query with value filtering | select v1 from root.sg.d1 where time > ? and time < ? and v1 > ? order by time desc | + +According to the configuration parameters in Table 1-5, the test scenario can be described as follows: Execute 10 reverse order time range queries with value filtering for 2 devices and 2 sensors from the time series database under test. The SQL statement is: `select s_0,s_31from data where time >2022-10-30T00:00:00+08:00 and time < 2022-10-30T00:04:10+08:00 and s_0 > -5 and device in d_21,d_46 order by time desc`. + +### Persistence of Test Process and Test Results + +IoT-benchmark currently supports persisting the test process and test results to IoTDB, MySQL, and CSV through the configuration parameter "TEST_DATA_PERSISTENCE"; writing to MySQL and CSV can define the upper limit of the number of rows in the sub-database and sub-table, such as "RECORD_SPLIT=true, RECORD_SPLIT_MAX_LINE=10000000" means that each database table or CSV file is divided and stored according to the total number of 10 million rows; if the records are recorded to MySQL or IoTDB, database link information needs to be provided, including "TEST_DATA_STORE_IP" the IP address of the database, "TEST_DATA_STORE_PORT" the port number of the database, "TEST_DATA_STORE_DB" the name of the database, "TEST_DATA_STORE_USER" the database user name, and "TEST_DATA_STORE_PW" the database user password. + +If we set "TEST_DATA_PERSISTENCE=CSV", we can see the newly generated data folder under the IoT-benchmark root directory during and after the test execution, which contains the csv folder to record the test process; the csvOutput folder to record the test results . If we set "TEST_DATA_PERSISTENCE=MySQL", it will create a data table named "testWithDefaultPath_tested database name_remarks_test start time" in the specified MySQL database before the test starts to record the test process; it will record the test process in the "CONFIG" data table (create the table if it does not exist), write the configuration information of this test; when the test is completed, the result of this test will be written in the data table named "FINAL_RESULT" (create the table if it does not exist). 
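+
+For reference, a minimal sketch of the persistence-related entries in `conf/config.properties` might look like the following; the IP address, database name, and credentials below are illustrative placeholders and must be replaced with those of your own MySQL instance:
+
+```
+TEST_DATA_PERSISTENCE=MySQL
+TEST_DATA_STORE_IP=192.168.0.100
+TEST_DATA_STORE_PORT=3306
+TEST_DATA_STORE_DB=benchmark_result
+TEST_DATA_STORE_USER=root
+TEST_DATA_STORE_PW=root
+RECORD_SPLIT=true
+RECORD_SPLIT_MAX_LINE=10000000
+```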
+ +## Use Case + +We take the application of CRRC Qingdao Sifang Vehicle Research Institute Co., Ltd. as an example, and refer to the scene described in "Apache IoTDB in Intelligent Operation and Maintenance Platform Storage" for practical operation instructions. + +Test objective: Simulate the actual needs of switching time series databases in the scene of CRRC Qingdao Sifang Institute, and compare the performance of the expected IoTDB and KairosDB used by the original system. + +Test environment: In order to ensure that the impact of other irrelevant services and processes on database performance and the mutual influence between different databases are eliminated during the experiment, the local databases in this experiment are deployed and run on multiple independent virtual servers with the same resource configuration. Therefore, this experiment set up 4 Linux (CentOS7 /x86) virtual machines, and deployed IoT-benchmark, IoTDB database, KairosDB database, and MySQL database on them respectively. The specific resource configuration of each virtual machine is shown in Table 2-1. The specific usage of each virtual machine is shown in Table 2-2. + +Table 2-1 Virtual machine configuration information + +| Hardware Configuration Information | Value | +| ---------------------------------- | ------- | +| OS system | CentOS7 | +| number of CPU cores | 16 | +| memory | 32G | +| hard disk | 200G | +| network | Gigabit | + +Table 2-2 Virtual machine usage + +| IP | Usage | +| ---------- | ------------- | +| 172.21.4.2 | IoT-benchmark | +| 172.21.4.3 | Apache-iotdb | +| 172.21.4.4 | KaiosDB | +| 172.21.4.5 | MySQL | + +### Write Test + +Scenario description: Create 100 clients to simulate 100 trains, each train has 3000 sensors, the data type is DOUBLE, the data time interval is 500ms (2Hz), and they are sent sequentially. Referring to the above requirements, we need to modify the IoT-benchmark configuration parameters as listed in Table 2-3. + +Table 2-3 Configuration parameter information + +| Parameter Name | IoTDB Value | KairosDB Value | +| -------------------------- | --------------------------- | -------------- | +| DB_SWITCH | IoTDB-013-SESSION_BY_TABLET | KairosDB | +| HOST | 172.21.4.3 | 172.21.4.4 | +| PORT | 6667 | 8080 | +| BENCHMARK_WORK_MODE | testWithDefaultPath | | +| OPERATION_PROPORTION | 1:0:0:0:0:0:0:0:0:0:0 | | +| CLIENT_NUMBER | 100 | | +| GROUP_NUMBER | 10 | | +| DEVICE_NUMBER | 100 | | +| SENSOR_NUMBER | 3000 | | +| INSERT_DATATYPE_PROPORTION | 0:0:0:0:1:0 | | +| POINT_STEP | 500 | | +| OP_MIN_INTERVAL | 0 | | +| IS_OUT_OF_ORDER | false | | +| BATCH_SIZE_PER_WRITE | 1 | | +| LOOP | 10000 | | +| TEST_DATA_PERSISTENCE | MySQL | | +| TEST_DATA_STORE_IP | 172.21.4.5 | | +| TEST_DATA_STORE_PORT | 3306 | | +| TEST_DATA_STORE_DB | demo | | +| TEST_DATA_STORE_USER | root | | +| TEST_DATA_STORE_PW | admin | | +| REMARK | demo | | + +First, start the tested time series databases Apache-IoTDB and KairosDB on 172.21.4.3 and 172.21.4.4 respectively, and then start server resource monitoring through the ser-benchamrk\.sh script on 172.21.4.2, 172.21.4.3 and 172.21.4.4 (Figure 2-1). Then modify the conf/config.properties files in the iotdb-0.13-0.0.1 and kairosdb-0.0.1 folders in 172.21.4.2 according to Table 2-3 to meet the test requirements. Use benchmark\.sh to start the writing test of Apache-IoTDB and KairosDB successively. 
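+
+As a concrete reference, the IoTDB-side `conf/config.properties` entries corresponding to Table 2-3 might look like the sketch below (only the parameters listed in the table are shown, all other parameters keep their defaults, and the KairosDB run differs only in DB_SWITCH, HOST, and PORT):
+
+```
+DB_SWITCH=IoTDB-013-SESSION_BY_TABLET
+HOST=172.21.4.3
+PORT=6667
+BENCHMARK_WORK_MODE=testWithDefaultPath
+OPERATION_PROPORTION=1:0:0:0:0:0:0:0:0:0:0
+CLIENT_NUMBER=100
+GROUP_NUMBER=10
+DEVICE_NUMBER=100
+SENSOR_NUMBER=3000
+INSERT_DATATYPE_PROPORTION=0:0:0:0:1:0
+POINT_STEP=500
+OP_MIN_INTERVAL=0
+IS_OUT_OF_ORDER=false
+BATCH_SIZE_PER_WRITE=1
+LOOP=10000
+TEST_DATA_PERSISTENCE=MySQL
+TEST_DATA_STORE_IP=172.21.4.5
+TEST_DATA_STORE_PORT=3306
+TEST_DATA_STORE_DB=demo
+TEST_DATA_STORE_USER=root
+TEST_DATA_STORE_PW=admin
+REMARK=demo
+```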
+ + +![img](https://alioss.timecho.com/docs/img/bm5.png) + +Figure 2-1 Server monitoring tasks + +For example, if we first start the test on KairosDB, IoT-benchmark will create a CONFIG data table in the MySQL database to store the configuration information of this test (Figure 2-2), and there will be a log output of the current test progress during the test execution (Figure 2-3) . When the test is completed, the test result will be output (Figure 2-3), and the result will be written into the FINAL_RESULT data table (Figure 2-4). + +![img](https://alioss.timecho.com/docs/img/bm6.png) + +Figure 2-2 Test configuration information table + +![img](https://alioss.timecho.com/docs/img/bm7.png) +![img](https://alioss.timecho.com/docs/img/bm8.png) +![img](https://alioss.timecho.com/docs/img/bm9.png) +![img](https://alioss.timecho.com/docs/img/bm10.png) + +Figure 2-3 Test progress and results + +![img](https://alioss.timecho.com/docs/img/bm11.png) + + + +Figure 2-4 Test result table + +Afterwards, we will start the test on Apache-IoTDB. The same IoT-benchmark will write the test configuration information in the MySQL database CONFIG data table. During the test execution, there will be a log to output the current test progress. When the test is completed, the test result will be output, and the result will be written into the FINAL_RESULT data table. + +According to the test result information, we know that under the same configuration the write delay times of Apache-IoTDB and KairosDB are 55.98ms and 1324.45ms respectively; the write throughputs are 5,125,600.86 points/second and 224,819.01 points/second respectively; the tests were executed respectively 585.30 seconds and 11777.99 seconds. And KairosDB has a write failure. After investigation, it is found that the data disk usage has reached 100%, and there is no disk space to continue receiving data. However, Apache-IoTDB has no write failure, and the disk space occupied after all data is written is only 4.7G (as shown in Figure 2-5); Apache-IoTDB is better than KairosDB in terms of write throughput and disk occupation. Of course, there will be other tests in the follow-up to observe and compare from various aspects, such as query performance, file compression ratio, data security, etc. + +![img](https://alioss.timecho.com/docs/img/bm12.png) + + +Figure 2-5 Disk usage + +So what is the resource usage of each server during the test? What is the specific performance of each write operation? At this time, we can visualize the data in the server monitoring table and test process recording table by installing and using Tableau. The use of Tableau will not be introduced in this article. After connecting to the data table for test data persistence, the specific results are as follows (taking Apache-IoTDB as an example): + + +![img](https://alioss.timecho.com/docs/img/bm13.png) +![img](https://alioss.timecho.com/docs/img/bm14.png) + + +Figure 2-6 Visualization of testing process in Tableau + +### Query Test + +Scenario description: In the writing test scenario, 10 clients are simulated to perform all types of query tasks on the data stored in the time series database Apache-IoTDB. The configuration is as follows. 
+ +Table 2-4 Configuration parameter information + +| Parameter Name | Example | +| -------------------- | --------------------- | +| CLIENT_NUMBER | 10 | +| QUERY_DEVICE_NUM | 2 | +| QUERY_SENSOR_NUM | 2 | +| QUERY_AGGREGATE_FUN | count | +| STEP_SIZE | 1 | +| QUERY_INTERVAL | 250000 | +| QUERY_LOWER_VALUE | -5 | +| GROUP_BY_TIME_UNIT | 20000 | +| LOOP | 30 | +| OPERATION_PROPORTION | 0:1:1:1:1:1:1:1:1:1:1 | + +Results: + +![img](https://alioss.timecho.com/docs/img/bm15.png) + +Figure 2-7 Query test results + +### Description of Other Parameters + +In the previous chapters, the write performance comparison between Apache-IoTDB and KairosDB was performed, but if the user wants to perform a simulated real write rate test, how to configure it? How to control if the test time is too long? Are there any regularities in the generated simulated data? If the IoT-Benchmark server configuration is low, can multiple machines be used to simulate pressure output? + +Table 2-5 Configuration parameter information + +| Scenario | Parameter | Value | Notes | +| ------------------------------------------------------------ | -------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| Simulate real write rate | OP_INTERVAL | -1 | You can also enter an integer to control the operation interval. | +| Specify test duration (1 hour) | TEST_MAX_TIME | 3600000 | The unit is ms; the LOOP execution time needs to be greater than this value. | +| Define the law of simulated data: support all data types, and the number is evenly classified; support five data distributions, and the number is evenly distributed; the length of the string is 10; the number of decimal places is 2. | INSERT_DATATYPE_PROPORTION | 1:1:1:1:1:1 | Data type distribution proportion | +| LINE_RATIO | 1 | linear | | +| SIN_RATIO | 1 | Fourier function | | +| SQUARE_RATIO | 1 | Square wave | | +| RANDOM_RATIO | 1 | Random number | | +| CONSTANT_RATIO | 1 | Constant | | +| STRING_LENGTH | 10 | String length | | +| DOUBLE_LENGTH | 2 | Decimal places | | +| Three machines simulate data writing of 300 devices | BENCHMARK_CLUSTER | true | Enable multi-benchmark mode | +| BENCHMARK_INDEX | 0, 1, 3 | Take the writing parameters in the [write test](./Benchmark.md#write-test) as an example: No. 0 is responsible for writing data of device numbers 0-99; No. 1 is responsible for writing data of device numbers 100-199; No. 2 is responsible for writing data of device numbers 200-299. | | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/CLI.md b/src/UserGuide/V2.0.1/Tree/Tools-System/CLI.md new file mode 100644 index 00000000..34803c3e --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Tools-System/CLI.md @@ -0,0 +1,295 @@ + + +# Command Line Interface (CLI) + + +IoTDB provides Cli/shell tools for users to interact with IoTDB server in command lines. This document shows how Cli/shell tool works and the meaning of its parameters. + +> Note: In this document, \$IOTDB\_HOME represents the path of the IoTDB installation directory. + +## Installation + +If you use the source code version of IoTDB, then under the root path of IoTDB, execute: + +```shell +> mvn clean package -pl iotdb-client/cli -am -DskipTests -P get-jar-with-dependencies +``` + +After build, the IoTDB Cli will be in the folder "cli/target/iotdb-cli-{project.version}". + +If you download the binary version, then the Cli can be used directly in sbin folder. 
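+
+As a quick sanity check before moving on, you can confirm that the CLI startup scripts are present in the sbin folder (a sketch; the name of the unpacked folder depends on the distribution and version you downloaded):
+
+```shell
+Shell > cd apache-iotdb-{version}-all-bin
+Shell > ls sbin | grep start-cli
+start-cli.bat
+start-cli.sh
+```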
+ +## Running + +### Running Cli + +After installation, there is a default user in IoTDB: `root`, and the +default password is `root`. Users can use this username to try IoTDB Cli/Shell tool. The cli startup script is the `start-cli` file under the \$IOTDB\_HOME/bin folder. When starting the script, you need to specify the IP and PORT. (Make sure the IoTDB cluster is running properly when you use Cli/Shell tool to connect to it.) + +Here is an example where the cluster is started locally and the user has not changed the running port. The default rpc port is +6667
+If you need to connect to a remote DataNode, or the RPC port of the running DataNode has been changed, set the specific IP and RPC port with -h and -p.
+You also can set your own environment variable at the front of the start script ("/sbin/start-cli.sh" for linux and "/sbin/start-cli.bat" for windows) + +The Linux and MacOS system startup commands are as follows: + +```shell +Shell > bash sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root +``` + +The Windows system startup commands are as follows: + +```shell +Shell > sbin\start-cli.bat -h 127.0.0.1 -p 6667 -u root -pw root +``` + +After operating these commands, the cli can be started successfully. The successful status will be as follows: + +``` + _____ _________ ______ ______ +|_ _| | _ _ ||_ _ `.|_ _ \ + | | .--.|_/ | | \_| | | `. \ | |_) | + | | / .'`\ \ | | | | | | | __'. + _| |_| \__. | _| |_ _| |_.' /_| |__) | +|_____|'.__.' |_____| |______.'|_______/ version + + +Successfully login at 127.0.0.1:6667 +IoTDB> +``` + +Enter ```quit``` or `exit` can exit Cli. + +### Cli Parameters + +| Parameter name | Parameter type | Required | Description | Example | +| :--------------------------- | :------------------------- | :------- | :----------------------------------------------------------- | :------------------ | +| -disableISO8601 | No parameters | No | If this parameter is set, IoTDB will print the timestamp in digital form | -disableISO8601 | +| -h <`host`> | string, no quotation marks | Yes | The IP address of the IoTDB server | -h 10.129.187.21 | +| -help | No parameters | No | Print help information for IoTDB | -help | +| -p <`rpcPort`> | int | Yes | The rpc port number of the IoTDB server. IoTDB runs on rpc port 6667 by default | -p 6667 | +| -pw <`password`> | string, no quotation marks | No | The password used for IoTDB to connect to the server. If no password is entered, IoTDB will ask for password in Cli command | -pw root | +| -u <`username`> | string, no quotation marks | Yes | User name used for IoTDB to connect the server | -u root | +| -maxPRC <`maxPrintRowCount`> | int | No | Set the maximum number of rows that IoTDB returns | -maxPRC 10 | +| -e <`execute`> | string | No | manipulate IoTDB in batches without entering cli input mode | -e "show databases" | +| -c | empty | No | If the server enables `rpc_thrift_compression_enable=true`, then cli must use `-c` | -c | + +Following is a cli command which connects the host with IP +10.129.187.21, rpc port 6667, username "root", password "root", and prints the timestamp in digital form. The maximum number of lines displayed on the IoTDB command line is 10. + +The Linux and MacOS system startup commands are as follows: + +```shell +Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 +``` + +The Windows system startup commands are as follows: + +```shell +Shell > sbin\start-cli.bat -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 +``` + +### CLI Special Command + +Special commands of Cli are below. + +| Command | Description / Example | +| :-------------------------- | :------------------------------------------------------ | +| `set time_display_type=xxx` | eg. long, default, ISO8601, yyyy-MM-dd HH:mm:ss | +| `show time_display_type` | show time display type | +| `set time_zone=xxx` | eg. 
+08:00, Asia/Shanghai | +| `show time_zone` | show cli time zone | +| `set fetch_size=xxx` | set fetch size when querying data from server | +| `show fetch_size` | show fetch size | +| `set max_display_num=xxx` | set max lines for cli to output, -1 equals to unlimited | +| `help` | Get hints for CLI special commands | +| `exit/quit` | Exit CLI | + +### Note on using the CLI with OpenID Connect Auth enabled on Server side + +Openid connect (oidc) uses keycloack as the authority authentication service of oidc service + + +#### configuration + +The configuration is located in iotdb-system.properties , set the author_provider_class is org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer Openid service is enabled, and the default value is org.apache.iotdb.db.auth.authorizer.LocalFileAuthorizer Indicates that the openid service is not enabled. + +``` +authorizer_provider_class=org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer +``` + +If the openid service is turned on, openid_URL is required,openID_url value is http://ip:port/realms/{realmsName} + +``` +openID_url=http://127.0.0.1:8080/realms/iotdb/ +``` + +#### keycloack configuration + +1、Download the keycloack file (This tutorial is version 21.1.0) and start keycloack in keycloack/bin + +```shell +Shell >cd bin +Shell >./kc.sh start-dev +``` + +2、use url(https://ip:port) login keycloack, the first login needs to create a user +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/login_keycloak.png?raw=true) + +3、Click administration console +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/AdministrationConsole.png?raw=true) + +4、In the master menu on the left, click Create realm and enter Realm name to create a new realm +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_Realm_1.jpg?raw=true) + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_Realm_2.jpg?raw=true) + + +5、Click the menu clients on the left to create clients + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/client.jpg?raw=true) + +6、Click user on the left menu to create user + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/user.jpg?raw=true) + +7、Click the newly created user ID, click the credentials navigation, enter the password and close the temporary option. The configuration of keycloud is completed + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/pwd.jpg?raw=true) + +8、To create a role, click Roles on the left menu and then click the Create Role button to add a role + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role1.jpg?raw=true) + +9、 Enter `iotdb_admin` in the Role Name and click the save button. 
Tip: `iotdb_admin` here cannot be any other name, otherwise even after successful login, you will not have permission to use iotdb's query, insert, create database, add users, roles and other functions + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role2.jpg?raw=true) + +10、Click on the User menu on the left and then click on the user in the user list to add the `iotdb_admin` role we just created for that user + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role3.jpg?raw=true) + +11、 Select Role Mappings, select the `iotdb_admin` role in Assign Role + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role4.jpg?raw=true) + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role5.jpg?raw=true) + + +Tip: If the user role is adjusted, you need to regenerate the token and log in to iotdb again to take effect + +The above steps provide a way for keycloak to log into iotdb. For more ways, please refer to keycloak configuration + +If OIDC is enabled on server side then no username / passwort is needed but a valid Access Token from the OIDC Provider. +So as username you use the token and the password has to be empty, e.g. + +```shell +Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u {my-access-token} -pw "" +``` + +Among them, you need to replace {my access token} (note, including {}) with your token, that is, the value corresponding to access_token. The password is empty and needs to be confirmed again. + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/iotdbpw.jpeg?raw=true) + + +How to get the token is dependent on your OpenID Connect setup and not covered here. +In the simplest case you can get this via the command line with the `passwort-grant`. +For example, if you use keycloack as OIDC and you have a realm with a client `iotdb` defined as public you could use +the following `curl` command to fetch a token (replace all `{}` with appropriate values). 
+ +```shell +curl -X POST "https://{your-keycloack-server}/realms/{your-realm}/protocol/openid-connect/token" \ + -H "Content-Type: application/x-www-form-urlencoded" \ + -d "username={username}" \ + -d "password={password}" \ + -d 'grant_type=password' \ + -d "client_id=iotdb-client" +``` + +The response looks something like + +```json +{"access_token":"eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJxMS1XbTBvelE1TzBtUUg4LVNKYXAyWmNONE1tdWNXd25RV0tZeFpKNG93In0.eyJleHAiOjE1OTAzOTgwNzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNjA0ZmYxMDctN2NiNy00NTRmLWIwYmQtY2M2ZDQwMjFiNGU4IiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiYWNjb3VudCIsInN1YiI6ImJhMzJlNDcxLWM3NzItNGIzMy04ZGE2LTZmZThhY2RhMDA3MyIsInR5cCI6IkJlYXJlciIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsImFjciI6IjEiLCJhbGxvd2VkLW9yaWdpbnMiOlsibG9jYWxob3N0OjgwODAiXSwicmVhbG1fYWNjZXNzIjp7InJvbGVzIjpbIm9mZmxpbmVfYWNjZXNzIiwidW1hX2F1dGhvcml6YXRpb24iLCJpb3RkYl9hZG1pbiJdfSwicmVzb3VyY2VfYWNjZXNzIjp7ImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoiZW1haWwgcHJvZmlsZSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJ1c2VyIn0.nwbrJkWdCNjzFrTDwKNuV5h9dDMg5ytRKGOXmFIajpfsbOutJytjWTCB2WpA8E1YI3KM6gU6Jx7cd7u0oPo5syHhfCz119n_wBiDnyTZkFOAPsx0M2z20kvBLN9k36_VfuCMFUeddJjO31MeLTmxB0UKg2VkxdczmzMH3pnalhxqpnWWk3GnrRrhAf2sZog0foH4Ae3Ks0lYtYzaWK_Yo7E4Px42-gJpohy3JevOC44aJ4auzJR1RBj9LUbgcRinkBy0JLi6XXiYznSC2V485CSBHW3sseXn7pSXQADhnmGQrLfFGO5ZljmPO18eFJaimdjvgSChsrlSEmTDDsoo5Q","expires_in":300,"refresh_expires_in":1800,"refresh_token":"eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJhMzZlMGU0NC02MWNmLTQ5NmMtOGRlZi03NTkwNjQ5MzQzMjEifQ.eyJleHAiOjE1OTAzOTk1NzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNmMxNTBiY2EtYmE5NC00NTgxLWEwODEtYjI2YzhhMmI5YmZmIiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwic3ViIjoiYmEzMmU0NzEtYzc3Mi00YjMzLThkYTYtNmZlOGFjZGEwMDczIiwidHlwIjoiUmVmcmVzaCIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsInNjb3BlIjoiZW1haWwgcHJvZmlsZSJ9.ayNpXdNX28qahodX1zowrMGiUCw2AodlHBQFqr8Ui7c","token_type":"bearer","not-before-policy":0,"session_state":"060d2862-14ed-42fe-baf7-8d1f784657f1","scope":"email profile"} +``` + +The interesting part here is the access token with the key `access_token`. +This has to be passed as username (with parameter `-u`) and empty password to the CLI. + +### Batch Operation of Cli + +-e parameter is designed for the Cli/shell tool in the situation where you would like to manipulate IoTDB in batches through scripts. By using the -e parameter, you can operate IoTDB without entering the cli's input mode. + +In order to avoid confusion between statements and other parameters, the current version only supports the -e parameter as the last parameter. 
+ +The usage of -e parameter for Cli/shell is as follows: + +The Linux and MacOS system commands: + +```shell +Shell > bash sbin/start-cli.sh -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} +``` + +The Windows system commands: + +```shell +Shell > sbin\start-cli.bat -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} +``` + +In the Windows environment, the SQL statement of the -e parameter needs to use ` `` ` to replace `" "` + +In order to better explain the use of -e parameter, take following as an example(On linux system). + +Suppose you want to create a database root.demo to a newly launched IoTDB, create a timeseries root.demo.s1 and insert three data points into it. With -e parameter, you could write a shell like this: + +```shell +# !/bin/bash + +host=127.0.0.1 +rpcPort=6667 +user=root +pass=root + +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "create database root.demo" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "create timeseries root.demo.s1 WITH DATATYPE=INT32, ENCODING=RLE" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(1,10)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(2,11)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(3,12)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "select s1 from root.demo" +``` + +The results are shown in the figure, which are consistent with the Cli and jdbc operations. + +```shell + Shell > bash ./shell.sh ++-----------------------------+------------+ +| Time|root.demo.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 10| +|1970-01-01T08:00:00.002+08:00| 11| +|1970-01-01T08:00:00.003+08:00| 12| ++-----------------------------+------------+ +Total line number = 3 +It costs 0.267s +``` + +It should be noted that the use of the -e parameter in shell scripts requires attention to the escaping of special characters. diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/Data-Export-Tool.md b/src/UserGuide/V2.0.1/Tree/Tools-System/Data-Export-Tool.md new file mode 100644 index 00000000..3c4a60b0 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Tools-System/Data-Export-Tool.md @@ -0,0 +1,213 @@ +# Data Export + +## 1. Introduction to Export Tools + +Export tools can export data queried from SQL into specified formats, including the `export-tsfile.sh/bat` script for exporting TsFile files and the `export-data.sh/bat` script that supports exporting in CSV and SQL formats. + +## 2. Supported Data Types + +- CSV: A plain text format for storing formatted data, which needs to be constructed according to the specified CSV format mentioned below. + +- SQL: A file containing custom SQL statements. + +- TsFile: The file format for time series used in IoTDB. + +## 3. export-tsfile Script + +Supports TsFile: The file format for time series used in IoTDB. 
+ + +#### 3.1 Command + +```Bash +# Unix/OS X +tools/export-tsfile.sh -h -p -u -pw -td [-f -q -s ] + +# Windows +tools\export-tsfile.bat -h -p -u -pw -td [-f -q -s ] +``` + +#### 3.2 Parameter Introduction + +| **Parameter** | **Definition** | **Required** | **Default** | +| -------- | ------------------------------------------------------------ | ------------ | --------- | +| -h | Hostname | No | 127.0.0.1 | +| -p | Port | No | 6667 | +| -u | Username | No | root | +| -pw | Password | No | root | +| -t | Target file directory, used to specify the directory where the output file should be saved | Yes | - | +| -tfn | Name of the export file | No | - | +| -q | Number of query commands to be executed, possibly used for batch execution of queries | No | - | +| -s | SQL file path, used to specify the location of the file containing the SQL statements to be executed | No | - | +| -timeout | Session query timeout, used to specify the maximum allowed time before the query operation is automatically terminated | No | - | + +In addition, if the `-s` and `-q` parameters are not used, after the export script is started, you need to enter the query statement according to the program prompt, and different query results will be saved to different TsFile files. + + + +#### 3.3 Running Examples + +```Bash +# Unix/OS X +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 + +# Windows +tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 +``` + +## 4. export-data Script + +Supports CSV: A plain text format for storing formatted data, which needs to be constructed according to the specified CSV format below. + +Supports SQL: A file containing custom SQL statements. 
+
+#### 4.1 Command
+
+```Bash
+# Unix/OS X
+>tools/export-data.sh -h <host> -p <rpc_port> -u <username> -pw <password> -t <target_directory> [-tf <time_format> -datatype <true/false> -q <query_command> -s <sql_file> -tfn <target_file_name> -lpf <lines_per_file> -type <csv/sql> -aligned <true/false>]
+
+# Windows
+>tools\export-data.bat -h <host> -p <rpc_port> -u <username> -pw <password> -t <target_directory> [-tf <time_format> -datatype <true/false> -q <query_command> -s <sql_file> -tfn <target_file_name> -lpf <lines_per_file> -type <csv/sql> -aligned <true/false>]
+```
+
+#### 4.2 Parameter Introduction
+
+| **Parameter** | **Definition** | **Required** | **Default** |
+| --------- | ------------------------------------------------------------ | ------------ | ------------------------ |
+| -h | Hostname | No | 127.0.0.1 |
+| -p | Port | No | 6667 |
+| -u | Username | No | root |
+| -pw | Password | No | root |
+| -t | Output path of the exported CSV or SQL files (in V1.3.2 the parameter was `-td`) | Yes | |
+| -datatype | Whether to print the corresponding data type after each time series in the CSV file header, options are true or false | No | true |
+| -q | Directly specify the query statement to be executed on the command line (currently only some statements are supported, see the list below). Note: exactly one of -q and -s must be provided; if both are given, -q takes effect. For the supported SQL statements, please refer to "SQL Statement Support Rules" below | No | |
+| -s | Specify the SQL file, which may contain one or more SQL statements. If it contains multiple SQL statements, they should be separated by newlines. Each SQL statement corresponds to one or more output CSV or SQL files. Note: exactly one of -q and -s must be provided; if both are given, -q takes effect. For the supported SQL statements, please refer to "SQL Statement Support Rules" below | No | |
+| -type | Specify the type of the exported file, options are csv or sql | No | csv |
+| -tf | Specify the time format. The time format must comply with the [ISO 8601](https://calendars.wikia.org/wiki/ISO_8601) standard, or be `timestamp`. Note: only effective when -type is csv | No | yyyy-MM-dd HH:mm:ss.SSSz |
+| -lpf | Specify the maximum number of lines per exported dump file (in V1.3.2 the parameter was `-linesPerFile`) | No | 10000 |
+| -timeout | Specify the timeout period for session queries, in ms | No | -1 |
+
+#### 4.3 SQL Statement Support Rules
+
+1. Only query statements are supported; non-query statements (such as metadata management, system management, etc.) are not supported. Unsupported SQL is automatically skipped and an error message is printed.
+
+2. In query statements, the current version only supports exporting raw data. Queries containing group by, aggregate functions, UDFs, or arithmetic operators cannot be exported as SQL. When exporting raw data from multiple devices, please use an align by device clause. Detailed examples are as follows:
+
+| Query Type | Export Supported | Example |
+| ------------------------------------------------------ | ---------------- | --------------------------------------------- |
+| Raw data single-device query | Supported | select * from root.s_0.d_0 |
+| Raw data multi-device query (align by device) | Supported | select * from root.** align by device |
+| Raw data multi-device query (without align by device) | Not supported | select * from root.** / select * from root.s_0.* |
+
+#### 4.4 Running Examples
+
+- Export all data within the scope of a SQL execution to a CSV file.
+ +```Bash +# Unix/OS X +>tools/export-data.sh -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] + +# Windows +>tools\export-data.bat -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] +``` + +- Export results + +```Bash +Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice +2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 +2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 +``` + +- Export all data within the scope of all SQL executions in a SQL file to a CSV file. + + +```Bash +# Unix/OS X +>tools/export-data.sh -t ./data/ -s export.sql +# Windows +>tools/export-data.bat -t ./data/ -s export.sql +``` + +- Content of export.sql file (file pointed to by -s parameter) + + +```Bash +select * from root.stock.** limit 100 +select * from root.db.** limit 100 +``` + +- Export result file 1 + + +```Bash +Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice +2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 +2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 +``` + +- Export result file 2 + + +```Bash +Time,root.db.Random.RandomBoolean +2024-07-22T17:16:05.820+08:00,true +2024-07-22T17:16:02.597+08:00,false +``` + +- Export data defined in the SQL file within the IoTDB database in an aligned format as SQL statements. + + +```Bash +# Unix/OS X +>tools/export-data.sh -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true +# Windows +>tools/export-data.bat -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true +``` + +- Export results + + +```Bash +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249629831,0.62308747,2.0,0.012206747854849653,-6.0,false,0.14164352); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249630834,0.7520042,3.0,0.22760657101910464,-5.0,true,0.089064896); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249631835,0.3981064,3.0,0.6254559288663467,-6.0,false,0.9767922); +``` + +- Export all data within the scope of a SQL execution to a CSV file, specifying the export time format as `yyyy-MM-dd HH:mm:ss`, and print the corresponding data type behind the table header time series. 
+
+```Bash
+# Unix/OS X
+>tools/export-data.sh -h 127.0.0.1 -p 6667 -u root -pw root -t ./data/ -s export.sql -type csv -tf "yyyy-MM-dd HH:mm:ss" -datatype true
+# Windows
+>tools/export-data.bat -h 127.0.0.1 -p 6667 -u root -pw root -t ./data/ -s export.sql -type csv -tf "yyyy-MM-dd HH:mm:ss" -datatype true
+```
+
+- Export results
+
+```Bash
+Time,root.stock.Legacy.0700HK.L1_BidPrice(DOUBLE),root.stock.Legacy.0700HK.Type(DOUBLE),root.stock.Legacy.0700HK.L1_BidSize(DOUBLE),root.stock.Legacy.0700HK.Domain(DOUBLE),root.stock.Legacy.0700HK.L1_BuyNo(BOOLEAN),root.stock.Legacy.0700HK.L1_AskPrice(DOUBLE)
+2024-07-30 10:33:55,0.44574088,3.0,0.21476832811611501,-4.0,true,0.5951748
+2024-07-30 10:33:56,0.6880933,3.0,0.6289119476165305,-5.0,false,0.114634395
+```
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/Data-Import-Tool.md b/src/UserGuide/V2.0.1/Tree/Tools-System/Data-Import-Tool.md
new file mode 100644
index 00000000..f03c3a5b
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Tools-System/Data-Import-Tool.md
@@ -0,0 +1,217 @@
+# Data Import
+
+## 1. IoTDB Data Import
+
+IoTDB currently supports importing data in CSV, SQL, and TsFile (IoTDB's underlying open time series file format) into the database. The specific functionalities are as follows:
+
+| File Format | IoTDB Tool | Description |
+| ----------- | ------------------ | ----------- |
+| CSV | import-data.sh/bat | Can be used for single or batch import of CSV files into IoTDB |
+| SQL | import-data.sh/bat | Can be used for single or batch import of SQL files into IoTDB |
+| TsFile | load-tsfile.sh/bat | Can be used for single or batch import of TsFile files into IoTDB |
+| TsFile | TsFile Active Listening & Loading Feature | According to user configuration, it listens for changes in TsFile files in the specified path and loads newly added TsFile files into IoTDB |
+ +## 2. import-data Scripts + +- Supported formats: CSV、SQL + +### 2.1 Command + +```Bash +# Unix/OS X +>tools/import-data.sh -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] + +# Windows +>tools\import-data.bat -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] +``` + +### 2.2 Parameter Introduction + + +| **Parameter** | **Definition** | **Required** | **Default** | +| --------- | ------------------------------------------------------------ | ------------ | ------------------------ | +| -h | Hostname | No | 127.0.0.1 | +| -p | Port | No | 6667 | +| -u | Username | No | root | +| -pw | Password | No | root | +| -s | Specify the data to be imported, here you can specify files or folders. If a folder is specified, all files with suffixes of csv or sql in the folder will be batch imported (In V1.3.2, the parameter is `-f`) | Yes | | +| -fd | Specify the directory for storing failed SQL files. If this parameter is not specified, failed files will be saved in the source data directory. Note: For unsupported SQL, illegal SQL, and failed SQL, they will be put into the failed directory under the failed file (default is the file name with `.failed` suffix) | No |The source filename with `.failed` suffix | +| -aligned | Specify whether to use the `aligned` interface, options are true or false. Note: This parameter is only effective when importing csv files. | No | false | +| -batch | Used to specify the number of data points per batch (minimum value is 1, maximum value is Integer.*MAX_VALUE*). If the program reports the error `org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`, you can appropriately reduce this parameter. | No | 100000 | +| -tp | Specify the time precision, options include `ms` (milliseconds), `ns` (nanoseconds), `us` (microseconds) | No | ms | +| -lpf | Specify the number of data lines written per failed file (In V1.3.2, the parameter is `-linesPerFailedFile`) | No | 10000 | +| -typeInfer | Used to specify type inference rules, such as . Note: Used to specify type inference rules. `srcTsDataType` includes `boolean`, `int`, `long`, `float`, `double`, `NaN`. `dstTsDataType` includes `boolean`, `int`, `long`, `float`, `double`, `text`. When `srcTsDataType` is `boolean`, `dstTsDataType` can only be `boolean` or `text`. When `srcTsDataType` is `NaN`, `dstTsDataType` can only be `float`, `double`, or `text`. When `srcTsDataType` is a numerical type, the precision of `dstTsDataType` needs to be higher than `srcTsDataType`. For example: `-typeInfer boolean=text,float=double` | No | | + + +### 2.3 Running Example + + +- Import the `dump0_0.sql` data in the current `data` directory to the local IoTDB database. + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/dump0_0.sql +# Windows +>tools/import-data.bat -s ./data/dump0_0.sql +``` + +- Import all data in the current `data` directory in an aligned manner to the local IoTDB database. + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/ -fd ./failed/ -aligned true +# Windows +>tools/import-data.bat -s ./data/ -fd ./failed/ -aligned true +``` + +- Import the `dump0_0.csv` data in the current `data` directory to the local IoTDB database. 
+ +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/dump0_0.csv -fd ./failed/ +# Windows +>tools/import-data.bat -s ./data/dump0_0.csv -fd ./failed/ +``` + +- Import the `dump0_0.csv` data in the current `data` directory in an aligned manner, batch import 100000 lines to the IoTDB database on the host with IP `192.168.100.1`, record failures in the current `failed` directory, with a maximum of 1000 lines per file. + + +```Bash +# Unix/OS X +>tools/import-data.sh -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 +# Windows +>tools/import-data.bat -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 +``` + + +## 3. load-tsfile Script + +- Supported formats: TsFile + +### 3.1 Command + +```Bash +# Unix/OS X +>tools/load-tsfile.sh -h -p -u -pw -s -os [-sd ] -of [-fd ] [-tn ] + +# Windows +>tools\load-tsfile.bat -h -p -u -pw -s -os [-sd ] -of [-fd ] [-tn ] +``` + +### 3.2 Parameter Introduction + + +| **Parameter** | **Description** | **Required** | **Default** | +| -------- | ------------------------------------------------------------ | ----------------------------------- | ------------------- | +| -h | Hostname | No | root | +| -p | Port | No | root | +| -u | Username | No | 127.0.0.1 | +| -pw | Password | No | 6667 | +| -s | The local directory path of the script file (folder) to be loaded | Yes | | +| -os | none: Do not delete
mv: Move successful files to the target folder
cp: Hard link (copy) successful files to the target folder
delete: Delete | Yes | |
+| -sd | When --on_success is mv or cp, the target folder for the moved or copied files. The saved file name is the flattened source folder path concatenated with the original file name. | Required when --on_success is mv or cp | ${EXEC_DIR}/success |
+| -of | none: Skip
mv: Move failed files to the target folder
cp: Hard link (copy) failed files to the target folder
delete: Delete | Yes | | +| -fd | When --on_fail is specified as mv or cp, the target folder for mv or cp. The file name of the file becomes the folder flattened and then concatenated with the original file name. | When --on_fail is specified as mv or cp, it is required to fill in | ${EXEC_DIR}/fail | +| -tn | Maximum number of parallel threads | Yes | 8 | + + + +### 3.3 Running Examples + + +```Bash +# Unix/OS X +> tools/load-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os delete -of delete -tn 8 +> tools/load-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os mv -of cp -sd /path/success/dir -fd /path/failure/dir -tn 8 + +# Windows +> tools/load_data.bat -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os mv -of cp -sd /path/success/dir -fd /path/failure/dir -tn 8 +> tools/load_data.bat -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os delete -of delete -tn 8 +``` + +## 4. TsFile Active Listening & Loading Feature + +The TsFile Active Listening & Loading Feature can actively monitor TsFile file changes in the specified target path (configured by the user) and automatically synchronize TsFile files from the target path to the specified reception path (configured by the user). Through this feature, IoTDB can automatically detect and load these files without the need for any additional manual loading operations. This automated process not only simplifies the user's operational steps but also reduces potential errors that may occur during the operation, effectively reducing the complexity for users during the usage process. + +![](https://alioss.timecho.com/docs/img/Data-import2.png) + + +### 4.1 Configuration Parameters + +You can enable the TsFile Active Listening & Loading Feature by finding the following parameters in the configuration file template `iotdb-system.properties.template` and adding them to the IoTDB configuration file `iotdb-system.properties`. The complete configuration is as follows: + + +| **Configuration Parameter** | **Description** | **Value Range** | **Required** | **Default Value** | **Loading Method** | +| -------------------------------------------- | ------------------------------------------------------------ | -------------------------- | ------------ | ---------------------- | ---------------- | +| load_active_listening_enable | Whether to enable the DataNode's active listening and loading of tsfile functionality (default is enabled). | Boolean: true,false | Optional | true | Hot Loading | +| load_active_listening_dirs | The directories to be listened to (automatically includes subdirectories of the directory), if there are multiple, separate with “,”. The default directory is ext/load/pending (supports hot loading). | String: one or more file directories | Optional | ext/load/pending | Hot Loading | +| load_active_listening_fail_dir | The directory to which files are transferred after the execution of loading tsfile files fails, only one directory can be configured. | String: one file directory | Optional | ext/load/failed | Hot Loading | +| load_active_listening_max_thread_num | The maximum number of threads to perform loading tsfile tasks simultaneously. The default value when the parameter is commented out is max(1, CPU core count / 2). When the user sets a value not in the range [1, CPU core count / 2], it will be set to the default value (1, CPU core count / 2). 
| Long: [1, Long.MAX_VALUE] | Optional | max(1, CPU core count / 2) | Effective after restart |
+| load_active_listening_check_interval_seconds | Active listening polling interval in seconds. Active listening of tsfile files is implemented by polling the folders: this configuration specifies the time interval between two checks of load_active_listening_dirs, and the next check starts load_active_listening_check_interval_seconds seconds after each check. When the user sets the polling interval to less than 1, it is set back to the default value of 5 seconds. | Long: [1, Long.MAX_VALUE] | Optional | 5 | Effective after restart |
+
+### 4.2 Precautions
+
+1. If there are mods files among the files to be loaded, move the mods files to the listening directory first and then move the tsfile files, keeping each mods file in the same directory as its corresponding tsfile file. This prevents a tsfile from being loaded before its corresponding mods file arrives. The script below is a minimal shell sketch of this move order (it assumes the usual `.mods` and `.tsfile` file suffixes):
+
+```Bash
+# Move mods files into the listening directory first, then the tsfile files,
+# so that no tsfile is ever picked up before its mods file.
+move_files_to_listening_directory() {
+    local source_dir="$1"
+    local listening_dir="$2"
+
+    # 1. Move mods files first
+    find "$source_dir" -type f -name "*.mods" -exec mv {} "$listening_dir" \;
+
+    # 2. Then move tsfile files
+    find "$source_dir" -type f -name "*.tsfile" -exec mv {} "$listening_dir" \;
+}
+
+# Example (paths are placeholders):
+# move_files_to_listening_directory /path/to/staging ext/load/pending
+```
+
+2. Do not set the Pipe receiver directory, the data directory used for storing data, or similar system directories as the listening directory.
+
+3. `load_active_listening_fail_dir` must not be the same directory as any of `load_active_listening_dirs`, and neither may be nested inside the other.
+
+4. Ensure that the `load_active_listening_dirs` directories have sufficient permissions. Files are deleted after a successful load; without delete permission, the same files will be loaded repeatedly.
+
diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_apache.md b/src/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_apache.md
new file mode 100644
index 00000000..98b69f17
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_apache.md
@@ -0,0 +1,228 @@
+
+# Cluster management tool
+
+## IoTDB Data Directory Overview Tool
+
+IoTDB data directory overview tool is used to print an overview of the IoTDB data directory structure. The location is tools/tsfile/print-iotdb-data-dir.
+
+### Usage
+
+- For Windows:
+
+```bash
+.\print-iotdb-data-dir.bat ()
+```
+
+- For Linux or MacOs:
+
+```shell
+./print-iotdb-data-dir.sh ()
+```
+
+Note: if the storage path of the output overview file is not set, the default relative path "IoTDB_data_dir_overview.txt" will be used.
+ +### Example + +Use Windows in this example: + +`````````````````````````bash +.\print-iotdb-data-dir.bat D:\github\master\iotdb\data\datanode\data +```````````````````````` +Starting Printing the IoTDB Data Directory Overview +```````````````````````` +output save path:IoTDB_data_dir_overview.txt +data dir num:1 +143 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +|============================================================== +|D:\github\master\iotdb\data\datanode\data +|--sequence +| |--root.redirect0 +| | |--1 +| | | |--0 +| |--root.redirect1 +| | |--2 +| | | |--0 +| |--root.redirect2 +| | |--3 +| | | |--0 +| |--root.redirect3 +| | |--4 +| | | |--0 +| |--root.redirect4 +| | |--5 +| | | |--0 +| |--root.redirect5 +| | |--6 +| | | |--0 +| |--root.sg1 +| | |--0 +| | | |--0 +| | | |--2760 +|--unsequence +|============================================================== +````````````````````````` + +## TsFile Sketch Tool + +TsFile sketch tool is used to print the content of a TsFile in sketch mode. The location is tools/tsfile/print-tsfile. + +### Usage + +- For Windows: + +``` +.\print-tsfile-sketch.bat () +``` + +- For Linux or MacOs: + +``` +./print-tsfile-sketch.sh () +``` + +Note: if the storage path of the output sketch file is not set, the default relative path "TsFile_sketch_view.txt" will be used. + +### Example + +Use Windows in this example: + +`````````````````````````bash +.\print-tsfile.bat D:\github\master\1669359533965-1-0-0.tsfile D:\github\master\sketch.txt +```````````````````````` +Starting Printing the TsFile Sketch +```````````````````````` +TsFile path:D:\github\master\1669359533965-1-0-0.tsfile +Sketch save path:D:\github\master\sketch.txt +148 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. 
+-------------------------------- TsFile Sketch -------------------------------- +file path: D:\github\master\1669359533965-1-0-0.tsfile +file length: 2974 + + POSITION| CONTENT + -------- ------- + 0| [magic head] TsFile + 6| [version number] 3 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1, num of Chunks:3 + 7| [Chunk Group Header] + | [marker] 0 + | [deviceID] root.sg1.d1 + 20| [Chunk] of root.sg1.d1.s1, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [chunk header] marker=5, measurementID=s1, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 893| [Chunk] of root.sg1.d1.s2, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [chunk header] marker=5, measurementID=s2, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 1766| [Chunk] of root.sg1.d1.s3, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [chunk header] marker=5, measurementID=s3, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1 ends + 2656| [marker] 2 + 2657| [TimeseriesIndex] of root.sg1.d1.s1, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [ChunkIndex] offset=20 + 2728| [TimeseriesIndex] of root.sg1.d1.s2, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [ChunkIndex] offset=893 + 2799| [TimeseriesIndex] of root.sg1.d1.s3, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [ChunkIndex] offset=1766 + 2870| [IndexOfTimerseriesIndex Node] type=LEAF_MEASUREMENT + | + | +||||||||||||||||||||| [TsFileMetadata] begins + 2891| [IndexOfTimerseriesIndex Node] type=LEAF_DEVICE + | + | + | [meta offset] 2656 + | [bloom filter] bit vector byte array length=31, filterSize=256, hashFunctionSize=5 +||||||||||||||||||||| [TsFileMetadata] ends + 2964| [TsFileMetadataSize] 73 + 2968| [magic tail] TsFile + 2974| END of TsFile +---------------------------- IndexOfTimerseriesIndex Tree ----------------------------- + [MetadataIndex:LEAF_DEVICE] + └──────[root.sg1.d1,2870] + [MetadataIndex:LEAF_MEASUREMENT] + └──────[s1,2657] +---------------------------------- TsFile Sketch End ---------------------------------- +````````````````````````` + +Explanations: + +- Separated by "|", the left is the actual position in the TsFile, and the right is the summary content. 
+- "||||||||||||||||||||" is the guide information added to enhance readability, not the actual data stored in TsFile. +- The last printed "IndexOfTimerseriesIndex Tree" is a reorganization of the metadata index tree at the end of the TsFile, which is convenient for intuitive understanding, and again not the actual data stored in TsFile. + +## TsFile Resource Sketch Tool + +TsFile resource sketch tool is used to print the content of a TsFile resource file. The location is tools/tsfile/print-tsfile-resource-files. + +### Usage + +- For Windows: + +```bash +.\print-tsfile-resource-files.bat +``` + +- For Linux or MacOs: + +``` +./print-tsfile-resource-files.sh +``` + +### Example + +Use Windows in this example: + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +147 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +230 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +231 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +233 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +237 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file folder D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 finished. +````````````````````````` + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +178 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +186 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +187 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +188 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +192 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... 
+
+Resource plan index range [9223372036854775807, -9223372036854775808]
+device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00])
+
+Analyzing the resource file D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource finished.
+`````````````````````````
diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_timecho.md b/src/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_timecho.md
new file mode 100644
index 00000000..d7778793
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_timecho.md
@@ -0,0 +1,960 @@
+
+# Cluster management tool
+
+## IoTDB-OpsKit
+
+The IoTDB OpsKit is an easy-to-use operation and maintenance tool (an enterprise edition tool).
+It is designed to solve the operation and maintenance problems of running multiple nodes in the IoTDB distributed system.
+It covers cluster deployment, cluster start and stop, elastic scaling, configuration update, data export and other functions, enabling one-click command issuance to a complex database cluster and greatly reducing management difficulty.
+This document explains how to remotely deploy, configure, start and stop IoTDB cluster instances with the cluster management tool.
+
+### Environment dependencies
+
+This tool is a supporting tool for TimechoDB (Enterprise Edition based on IoTDB). You can contact your sales representative to obtain the tool download method.
+
+The machines where IoTDB is to be deployed require JDK 8 or above, lsof, netstat, and unzip. If any of these is missing, please install it yourself; the required installation commands can be found in the last section of this document.
+
+Tip: The IoTDB cluster management tool requires an account with root privileges.
+
+### Deployment method
+
+#### Download and install
+
+This tool is a supporting tool for TimechoDB (Enterprise Edition based on IoTDB). You can contact your salesperson to obtain the tool download method.
+
+Note: Since the binary package only supports GLIBC 2.17 and above, the minimum supported OS version is CentOS 7.
+
+* Enter the following command in the iotdb-opskit directory:
+
+```bash
+bash install-iotdbctl.sh
+```
+
+The `iotdbctl` keyword is then available in subsequent shells. For example, to check the environment required before deployment:
+
+```bash
+iotdbctl cluster check example
+```
+
+* You can also run iotdbctl directly via its absolute path, without activating the `iotdbctl` keyword, to execute commands such as checking the environment required before deployment:
+
+```bash
+<iotdbctl absolute path>/sbin/iotdbctl cluster check example
+```
+
+### Introduction to cluster configuration files
+
+* There is a cluster configuration yaml file in the `iotdbctl/config` directory. The yaml file name is the cluster name, and there can be multiple yaml files. To make configuration easier, a `default_cluster.yaml` example is provided under the `iotdbctl/config` directory.
+* The yaml file configuration consists of five major parts: `global`, `confignode_servers`, `datanode_servers`, `grafana_server`, and `prometheus_server`.
+* `global` is a general configuration that mainly configures the machine username and password, IoTDB local installation files, JDK configuration, etc.
A `default_cluster.yaml` sample is provided in the `iotdbctl/config` directory.
+  Users can copy it, rename it to their own cluster name, and follow the instructions inside to configure the IoTDB cluster. In the `default_cluster.yaml` sample, all uncommented items are required, and the commented ones are optional.
+
+For example, to run the check command against `default_cluster.yaml`, simply execute `iotdbctl cluster check default_cluster`.
+For more detailed commands, refer to the command list below.
+
+
+| parameter name | parameter description | required |
+|-------------------------|-------------------------------------------------------------------------------------------------|----------|
+| iotdb\_zip\_dir | IoTDB deployment distribution directory; if the value is empty, the distribution will be downloaded from the address specified by `iotdb_download_url` | NO |
+| iotdb\_download\_url | IoTDB download address; if `iotdb_zip_dir` has no value, the distribution is downloaded from this address | NO |
+| jdk\_tar\_dir | jdk local directory; this jdk package is uploaded and deployed to the target node | NO |
+| jdk\_deploy\_dir | jdk remote machine deployment directory; the jdk will be deployed to this directory, and together with the following `jdk_dir_name` parameter it forms the complete jdk deployment directory, that is, `<jdk_deploy_dir>/<jdk_dir_name>` | NO |
+| jdk\_dir\_name | The directory name after jdk decompression, jdk_iotdb by default | NO |
+| iotdb\_lib\_dir | The IoTDB lib directory or the IoTDB lib compressed package (only the .zip format is supported), used only for IoTDB upgrade. It is commented out by default; if you need to upgrade, uncomment it and modify the path. If you use a zip file, compress the iotdb/lib directory with the zip command, for example zip -r lib.zip apache-iotdb-1.2.0/lib/* | NO |
+| user | Username for ssh login to the deployment machine | YES |
+| password | The password for ssh login. If no password is specified, pkey is used to log in; in that case, please ensure that key-based ssh login between the nodes has already been configured. | NO |
+| pkey | Key for ssh login. If password has a value, password is used first; otherwise pkey is used to log in. | NO |
+| ssh\_port | ssh port | YES |
+| deploy\_dir | IoTDB deployment directory; IoTDB will be deployed to this directory, and together with the following `iotdb_dir_name` parameter it forms the complete IoTDB deployment directory, that is, `<deploy_dir>/<iotdb_dir_name>` | YES |
+| iotdb\_dir\_name | The directory name after decompression of IoTDB, iotdb by default | NO |
+| datanode-env.sh | Corresponds to `iotdb/config/datanode-env.sh`; when `global` and `datanode_servers` are configured at the same time, the value in `datanode_servers` is used first | NO |
+| confignode-env.sh | Corresponds to `iotdb/config/confignode-env.sh`; when `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first | NO |
+| iotdb-system.properties | Corresponds to `iotdb/config/iotdb-system.properties` | NO |
+| cn\_internal\_address | The cluster configuration address points to the surviving ConfigNode, and it points to confignode_x by default.
When `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first, corresponding to `cn_internal_address` in `iotdb/config/iotdb-system.properties` | YES |
+| dn\_internal\_address | The cluster configuration address points to the surviving ConfigNode, and points to confignode_x by default. When `global` and `datanode_servers` are configured at the same time, the value in `datanode_servers` is used first, corresponding to `dn_internal_address` in `iotdb/config/iotdb-system.properties` | YES |
+
+Among them, datanode-env.sh and confignode-env.sh can be configured with the extra parameter extra_opts. When this parameter is configured, the corresponding values are appended to datanode-env.sh and confignode-env.sh. Refer to default_cluster.yaml for a configuration example, such as:
+
+    datanode-env.sh:
+      extra_opts: |
+        IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:+UseG1GC"
+        IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:MaxGCPauseMillis=200"
+
+* `confignode_servers` is the configuration for deploying IoTDB ConfigNodes, in which multiple ConfigNodes can be configured.
+  By default, the first started ConfigNode, node1, is regarded as the Seed-ConfigNode.
+
+| parameter name | parameter description | required |
+|-----------------------------|-------------------------------------------------------------------------------------------------|----------|
+| name | ConfigNode name | YES |
+| deploy\_dir | IoTDB ConfigNode deployment directory | YES |
+| cn\_internal\_address | ConfigNode internal communication address, corresponding to `cn_internal_address` in `iotdb/config/iotdb-system.properties` | YES |
+| cn\_seed\_config\_node | The cluster configuration address points to the surviving ConfigNode, and it points to confignode_x by default. When `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first, corresponding to `cn_seed_config_node` in `iotdb/config/iotdb-system.properties` | YES |
+| cn\_internal\_port | Internal communication port, corresponding to `cn_internal_port` in `iotdb/config/iotdb-system.properties` | YES |
+| cn\_consensus\_port | Corresponds to `cn_consensus_port` in `iotdb/config/iotdb-system.properties` | NO |
+| cn\_data\_dir | Corresponds to `cn_data_dir` in `iotdb/config/iotdb-system.properties` | YES |
+| iotdb-system.properties | Corresponds to `iotdb/config/iotdb-system.properties`; when values are configured in both `global` and `confignode_servers`, the value in `confignode_servers` will be used first.
| NO |
+
+* `datanode_servers` is the configuration for deploying IoTDB DataNodes, in which multiple DataNodes can be configured.
+
+| parameter name | parameter description | required |
+|-------------------------|-------------------------------------------------------------------------------------------------|----------|
+| name | DataNode name | YES |
+| deploy\_dir | IoTDB DataNode deployment directory | YES |
+| dn\_rpc\_address | The DataNode rpc address, corresponding to `dn_rpc_address` in `iotdb/config/iotdb-system.properties` | YES |
+| dn\_internal\_address | Internal communication address, corresponding to `dn_internal_address` in `iotdb/config/iotdb-system.properties` | YES |
+| dn\_seed\_config\_node | The cluster configuration address points to the surviving ConfigNode, and points to confignode_x by default. When `global` and `datanode_servers` are configured at the same time, the value in `datanode_servers` is used first, corresponding to `dn_seed_config_node` in `iotdb/config/iotdb-system.properties`. | YES |
+| dn\_rpc\_port | DataNode rpc port, corresponding to `dn_rpc_port` in `iotdb/config/iotdb-system.properties` | YES |
+| dn\_internal\_port | Internal communication port, corresponding to `dn_internal_port` in `iotdb/config/iotdb-system.properties` | YES |
+| iotdb-system.properties | Corresponds to `iotdb/config/iotdb-system.properties`; when values are configured in both `global` and `datanode_servers`, the value in `datanode_servers` will be used first. | NO |
+
+* `grafana_server` is the configuration related to deploying Grafana.
+
+| parameter name | parameter description | required |
+|--------------------|--------------------------------------------------------------|-----------|
+| grafana\_dir\_name | Grafana decompression directory name (default grafana_iotdb) | NO |
+| host | IP of the server where Grafana is deployed | YES |
+| grafana\_port | The port of the Grafana deployment machine, default 3000 | NO |
+| deploy\_dir | Grafana deployment directory on the server | YES |
+| grafana\_tar\_dir | Grafana compressed package location | YES |
+| dashboards | dashboards directory | NO |
+
+* `prometheus_server` is the configuration related to deploying Prometheus.
+
+| parameter name | parameter description | required |
+|--------------------------------|----------------------------------------------------|----------|
+| prometheus\_dir\_name | Prometheus decompression directory name, default prometheus_iotdb | NO |
+| host | IP of the server where Prometheus is deployed | YES |
+| prometheus\_port | The port of the Prometheus deployment machine, default 9090 | NO |
+| deploy\_dir | Prometheus deployment directory on the server | YES |
+| prometheus\_tar\_dir | Prometheus compressed package path | YES |
+| storage\_tsdb\_retention\_time | The number of days to retain data, 15 days by default | NO |
+| storage\_tsdb\_retention\_size | The maximum data size to retain, 512M by default. Note that the supported units are KB, MB, GB, TB, PB, and EB. | NO |
+
+If metrics are configured in the `iotdb-system.properties` sections of config/xxx.yaml, the configuration will be put into Prometheus automatically without manual modification.
+
+Note: take care when the value corresponding to a yaml key contains special characters such as ':'.
It is recommended to use double quotes for the entire value in that case, and do not use paths containing spaces as file path values, to prevent recognition problems.
+
+### Usage scenarios
+
+#### Clean data
+
+* Cleaning cluster data deletes the data directory in the IoTDB cluster as well as the `cn_system_dir`, `cn_consensus_dir`,
+  `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs` and `ext` directories configured in the yaml file.
+* First execute the stop cluster command, and then execute the cluster cleanup command.
+
+```bash
+iotdbctl cluster stop default_cluster
+iotdbctl cluster clean default_cluster
+```
+
+#### Cluster destruction
+
+* Cluster destruction deletes `data`, the `cn_system_dir`, `cn_consensus_dir`,
+  `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs` and `ext` directories configured in the yaml file, the IoTDB deployment directory,
+  the grafana deployment directory and the prometheus deployment directory.
+* First execute the stop cluster command, and then execute the cluster destruction command.
+
+```bash
+iotdbctl cluster stop default_cluster
+iotdbctl cluster destroy default_cluster
+```
+
+#### Cluster upgrade
+
+* To upgrade the cluster, first configure `iotdb_lib_dir` in config/xxx.yaml as the directory path where the jars to be uploaded to the server are located (for example, iotdb/lib).
+* If you upload a zip file instead, compress the iotdb/lib directory with the zip command, such as zip -r lib.zip apache-iotdb-1.2.0/lib/*
+* Execute the upload command and then the restart IoTDB cluster command to complete the cluster upgrade.
+
+```bash
+iotdbctl cluster dist-lib default_cluster
+iotdbctl cluster restart default_cluster
+```
+
+#### Hot deployment
+
+* First modify the configuration in config/xxx.yaml.
+* Execute the distribution command, and then the hot deployment command, to complete the hot deployment of the cluster configuration.
+
+```bash
+iotdbctl cluster dist-conf default_cluster
+iotdbctl cluster reload default_cluster
+```
+
+#### Cluster expansion
+
+* First add a datanode or confignode node in config/xxx.yaml.
+* Execute the cluster expansion command.
+
+```bash
+iotdbctl cluster scaleout default_cluster
+```
+
+#### Cluster scaling
+
+* First find the node name or ip+port of the node to remove in config/xxx.yaml (for a confignode the port is cn_internal_port, for a datanode the port is rpc_port).
+* Execute the cluster shrink command.
+
+```bash
+iotdbctl cluster scalein default_cluster
+```
+
+#### Using the cluster management tool to manage an existing IoTDB cluster
+
+* Configure the server's `user`, `password` or `pkey`, and `ssh_port`.
+* Modify the IoTDB deployment path in config/xxx.yaml: `deploy_dir` (IoTDB deployment directory) and `iotdb_dir_name` (IoTDB decompression directory name, iotdb by default).
+  For example, if the full path of the IoTDB deployment is `/home/data/apache-iotdb-1.1.1`, you need to set `deploy_dir:/home/data/` and `iotdb_dir_name:apache-iotdb-1.1.1` in the yaml file.
+* If the server is not using java_home, modify `jdk_deploy_dir` (jdk deployment directory) and `jdk_dir_name` (the directory name after jdk decompression, jdk_iotdb by default). If java_home is used, there is no need to modify the configuration.
+ For example, the full path of jdk deployment is `/home/data/jdk_1.8.2`, you need to modify the yaml files `jdk_deploy_dir:/home/data/`, `jdk_dir_name:jdk_1.8.2` +* Configure `cn_internal_address`, `dn_internal_address` +* Configure `cn_internal_address`, `cn_internal_port`, `cn_consensus_port`, `cn_system_dir`, in `iotdb-system.properties` in `confignode_servers` + If the values in `cn_consensus_dir` and `iotdb-system.properties` are not the default for IoTDB, they need to be configured, otherwise there is no need to configure them. +* Configure `dn_rpc_address`, `dn_internal_address`, `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir` in `iotdb-system.properties` +* Execute initialization command + +```bash +iotdbctl cluster init default_cluster +``` + +#### Deploy IoTDB, Grafana and Prometheus + +* Configure `iotdb-system.properties` to open the metrics interface +* Configure the Grafana configuration. If there are multiple `dashboards`, separate them with commas. The names cannot be repeated or they will be overwritten. +* Configure the Prometheus configuration. If the IoTDB cluster is configured with metrics, there is no need to manually modify the Prometheus configuration. The Prometheus configuration will be automatically modified according to which node is configured with metrics. +* Start the cluster + +```bash +iotdbctl cluster start default_cluster +``` + +For more detailed parameters, please refer to the cluster configuration file introduction above + +### Command + +The basic usage of this tool is: +```bash +iotdbctl cluster [params (Optional)] +``` +* key indicates a specific command. + +* cluster name indicates the cluster name (that is, the name of the yaml file in the `iotdbctl/config` file). + +* params indicates the required parameters of the command (optional). + +* For example, the command format to deploy the default_cluster cluster is: + +```bash +iotdbctl cluster deploy default_cluster +``` + +* The functions and parameters of the cluster are listed as follows: + +| command | description | parameter | +|-----------------|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| check | check whether the cluster can be deployed | Cluster name list | +| clean | cleanup-cluster | cluster-name | +| deploy/dist-all | deploy cluster | Cluster name, -N, module name (optional for iotdb, grafana, prometheus), -op force (optional) | +| list | cluster status list | None | +| start | start cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional) | +| stop | stop cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional), -op force (nodename, grafana, prometheus optional) | +| restart | restart cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional), -op force (nodename, grafana, prometheus optional) | +| show | view cluster information. The details field indicates the details of the cluster information. 
| Cluster name, details (optional) |
+| destroy | destroy cluster | Cluster name, -N, module name (iotdb, grafana, prometheus optional) |
+| scaleout | cluster expansion | Cluster name |
+| scalein | cluster shrink | Cluster name, -N, cluster node name or cluster node ip+port |
+| reload | hot loading of cluster configuration files | Cluster name |
+| dist-conf | cluster configuration file distribution | Cluster name |
+| dumplog | Back up specified cluster logs | Cluster name, -N, cluster node name -h back up to target machine ip -pw back up to target machine password -p back up to target machine port -path backup directory -startdate start time -enddate end time -loglevel log type -l transfer speed |
+| dumpdata | Back up cluster data | Cluster name, -h back up to target machine ip -pw back up to target machine password -p back up to target machine port -path backup directory -startdate start time -enddate end time -l transfer speed |
+| dist-lib | lib package upgrade | Cluster name |
+| init | Initialize the cluster configuration when an existing cluster starts using the cluster deployment tool | Cluster name |
+| status | View process status | Cluster name |
+| activate | Activate cluster | Cluster name |
+| health_check | Health check | Cluster name, -N, nodename (optional) |
+| backup | Back up cluster | Cluster name, -N nodename (optional) |
+| importschema | Import cluster metadata | Cluster name, -N nodename -param parameters |
+| exportschema | Export cluster metadata | Cluster name, -N nodename -param parameters |
+
+### Detailed command execution process
+
+The following commands use default_cluster.yaml as an example; replace it with your own cluster file name as needed.
+
+#### Check cluster deployment environment commands
+
+```bash
+iotdbctl cluster check default_cluster
+```
+
+* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers`
+
+* Verify that the target node is able to log in via SSH
+
+* Verify whether the JDK version on the corresponding node meets the IoTDB requirement of JDK 1.8 or above, and whether unzip, lsof, and netstat are installed on the server
+
+* If the prompt `Info:example check successfully!` is printed, the server meets the installation requirements.
+  If `Error:example check fail!` is printed, some conditions are not met; check the Error output above (for example: `Error:Server (ip:172.20.31.76) iotdb port(10713) is listening`) and fix them.
+  If the JDK check fails, you can configure a JDK 1.8 or above package in the yaml file for deployment without affecting subsequent use (see the yaml sketch after this list).
+  If the lsof, netstat or unzip check fails, you need to install it on the server yourself.
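+
+For reference, the following is a minimal sketch of the JDK-related entries that could go in the `global` section of the cluster yaml when the server-side JDK does not meet the requirement. The exact layout should follow `default_cluster.yaml`; the paths shown here are placeholders:
+
+```yaml
+global:
+  # Local JDK package that will be uploaded to the target nodes
+  jdk_tar_dir: /opt/packages/jdk-8u301-linux-x64.tar.gz
+  # Remote directory the JDK will be deployed to
+  jdk_deploy_dir: /opt/iotdb_jdk
+  # Directory name after decompression (jdk_iotdb by default)
+  jdk_dir_name: jdk_iotdb
+```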
+ +#### Deploy cluster command + +```bash +iotdbctl cluster deploy default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Upload IoTDB compressed package and jdk compressed package according to the node information in `confignode_servers` and `datanode_servers` (if `jdk_tar_dir` and `jdk_deploy_dir` values ​​are configured in yaml) + +* Generate and upload `iotdb-system.properties` according to the yaml file node configuration information + +```bash +iotdbctl cluster deploy default_cluster -op force +``` + +Note: This command will force the deployment, and the specific process will delete the existing deployment directory and redeploy + +*deploy a single module* +```bash +# Deploy grafana module +iotdbctl cluster deploy default_cluster -N grafana +# Deploy the prometheus module +iotdbctl cluster deploy default_cluster -N prometheus +# Deploy the iotdb module +iotdbctl cluster deploy default_cluster -N iotdb +``` + +#### Start cluster command + +```bash +iotdbctl cluster start default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Start confignode, start sequentially according to the order in `confignode_servers` in the yaml configuration file and check whether the confignode is normal according to the process id, the first confignode is seek config + +* Start the datanode in sequence according to the order in `datanode_servers` in the yaml configuration file and check whether the datanode is normal according to the process id. + +* After checking the existence of the process according to the process id, check whether each service in the cluster list is normal through the cli. If the cli link fails, retry every 10s until it succeeds and retry up to 5 times + + +* +Start a single node command* +```bash +#Start according to the IoTDB node name +iotdbctl cluster start default_cluster -N datanode_1 +#Start according to IoTDB cluster ip+port, where port corresponds to cn_internal_port of confignode and rpc_port of datanode. +iotdbctl cluster start default_cluster -N 192.168.1.5:6667 +#Start grafana +iotdbctl cluster start default_cluster -N grafana +#Start prometheus +iotdbctl cluster start default_cluster -N prometheus +``` + +* Find the yaml file in the default location based on cluster-name + +* Find the node location information based on the provided node name or ip:port. If the started node is `data_node`, the ip uses `dn_rpc_address` in the yaml file, and the port uses `dn_rpc_port` in datanode_servers in the yaml file. + If the started node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` + +* start the node + +Note: Since the cluster deployment tool only calls the start-confignode.sh and start-datanode.sh scripts in the IoTDB cluster, +When the actual output result fails, it may be that the cluster has not started normally. 
It is recommended to use the status command to check the current cluster status (iotdbctl cluster status xxx) + + +#### View IoTDB cluster status command + +```bash +iotdbctl cluster show default_cluster +#View IoTDB cluster details +iotdbctl cluster show default_cluster details +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Execute `show cluster details` through cli on datanode in turn. If one node is executed successfully, it will not continue to execute cli on subsequent nodes and return the result directly. + +#### Stop cluster command + + +```bash +iotdbctl cluster stop default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* According to the datanode node information in `datanode_servers`, stop the datanode nodes in order according to the configuration. + +* Based on the confignode node information in `confignode_servers`, stop the confignode nodes in sequence according to the configuration + +*force stop cluster command* + +```bash +iotdbctl cluster stop default_cluster -op force +``` +Will directly execute the kill -9 pid command to forcibly stop the cluster + +*Stop single node command* + +```bash +#Stop by IoTDB node name +iotdbctl cluster stop default_cluster -N datanode_1 +#Stop according to IoTDB cluster ip+port (ip+port is to get the only node according to ip+dn_rpc_port in datanode or ip+cn_internal_port in confignode to get the only node) +iotdbctl cluster stop default_cluster -N 192.168.1.5:6667 +#Stop grafana +iotdbctl cluster stop default_cluster -N grafana +#Stop prometheus +iotdbctl cluster stop default_cluster -N prometheus +``` + +* Find the yaml file in the default location based on cluster-name + +* Find the corresponding node location information based on the provided node name or ip:port. If the stopped node is `data_node`, the ip uses `dn_rpc_address` in the yaml file, and the port uses `dn_rpc_port` in datanode_servers in the yaml file. + If the stopped node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` + +* stop the node + +Note: Since the cluster deployment tool only calls the stop-confignode.sh and stop-datanode.sh scripts in the IoTDB cluster, in some cases the iotdb cluster may not be stopped. + + +#### Clean cluster data command + +```bash +iotdbctl cluster clean default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Based on the information in `confignode_servers` and `datanode_servers`, check whether there are still services running, + If any service is running, the cleanup command will not be executed. + +* Delete the data directory in the IoTDB cluster and the `cn_system_dir`, `cn_consensus_dir`, configured in the yaml file + `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs` and `ext` directories. + + + +#### Restart cluster command + +```bash +iotdbctl cluster restart default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` + +* Execute the above stop cluster command (stop), and then execute the start cluster command (start). 
For details, refer to the above start and stop commands. + +*Force restart cluster command* + +```bash +iotdbctl cluster restart default_cluster -op force +``` +Will directly execute the kill -9 pid command to force stop the cluster, and then start the cluster + + +*Restart a single node command* + +```bash +#Restart datanode_1 according to the IoTDB node name +iotdbctl cluster restart default_cluster -N datanode_1 +#Restart confignode_1 according to the IoTDB node name +iotdbctl cluster restart default_cluster -N confignode_1 +#Restart grafana +iotdbctl cluster restart default_cluster -N grafana +#Restart prometheus +iotdbctl cluster restart default_cluster -N prometheus +``` + +#### Cluster shrink command + +```bash +#Scale down by node name +iotdbctl cluster scalein default_cluster -N nodename +#Scale down according to ip+port (ip+port obtains the only node according to ip+dn_rpc_port in datanode, and obtains the only node according to ip+cn_internal_port in confignode) +iotdbctl cluster scalein default_cluster -N ip:port +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Determine whether there is only one confignode node and datanode to be reduced. If there is only one left, the reduction cannot be performed. + +* Then get the node information to shrink according to ip:port or nodename, execute the shrink command, and then destroy the node directory. If the shrink node is `data_node`, use `dn_rpc_address` in the yaml file for ip, and use `dn_rpc_address` in the port. `dn_rpc_port` in datanode_servers in yaml file. + If the shrinking node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` + + +Tip: Currently, only one node scaling is supported at a time + +#### Cluster expansion command + +```bash +iotdbctl cluster scaleout default_cluster +``` +* Modify the config/xxx.yaml file to add a datanode node or confignode node + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Find the node to be expanded, upload the IoTDB compressed package and jdb package (if the `jdk_tar_dir` and `jdk_deploy_dir` values ​​are configured in yaml) and decompress it + +* Generate and upload `iotdb-system.properties` according to the yaml file node configuration information + +* Execute the command to start the node and verify whether the node is started successfully + +Tip: Currently, only one node expansion is supported at a time + +#### destroy cluster command +```bash +iotdbctl cluster destroy default_cluster +``` + +* cluster-name finds the yaml file in the default location + +* Check whether the node is still running based on the node node information in `confignode_servers`, `datanode_servers`, `grafana`, and `prometheus`. 
+ Stop the destroy command if any node is running + +* Delete `data` in the IoTDB cluster and `cn_system_dir`, `cn_consensus_dir` configured in the yaml file + `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs`, `ext`, `IoTDB` deployment directory, + grafana deployment directory and prometheus deployment directory + +*Destroy a single module* + +```bash +# Destroy grafana module +iotdbctl cluster destroy default_cluster -N grafana +# Destroy prometheus module +iotdbctl cluster destroy default_cluster -N prometheus +# Destroy iotdb module +iotdbctl cluster destroy default_cluster -N iotdb +``` + +#### Distribute cluster configuration commands + +```bash +iotdbctl cluster dist-conf default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` + +* Generate and upload `iotdb-system.properties` to the specified node according to the node configuration information of the yaml file + +#### Hot load cluster configuration command + +```bash +iotdbctl cluster reload default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Execute `load configuration` in the cli according to the node configuration information of the yaml file. + +#### Cluster node log backup +```bash +iotdbctl cluster dumplog default_cluster -N datanode_1,confignode_1 -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/logs' -logs '/root/data/db/iotdb/logs' +``` + +* Find the yaml file in the default location based on cluster-name + +* This command will verify the existence of datanode_1 and confignode_1 according to the yaml file, and then back up the log data of the specified node datanode_1 and confignode_1 to the specified service `192.168.9.48` port 36000 according to the configured start and end dates (startdate<=logtime<=enddate) The data backup path is `/iotdb/logs`, and the IoTDB log storage path is `/root/data/db/iotdb/logs` (not required, if you do not fill in -logs xxx, the default is to backup logs from the IoTDB installation path /logs ) + +| command | description | required | +|------------|-------------------------------------------------------------------------|----------| +| -h | backup data server ip | NO | +| -u | backup data server username | NO | +| -pw | backup data machine password | NO | +| -p | backup data machine port(default 22) | NO | +| -path | path to backup data (default current path) | NO | +| -loglevel | Log levels include all, info, error, warn (default is all) | NO | +| -l | speed limit (default 1024 speed limit range 0 to 104857601 unit Kbit/s) | NO | +| -N | multiple configuration file cluster names are separated by commas. 
| YES | +| -startdate | start time (including default 1970-01-01) | NO | +| -enddate | end time (included) | NO | +| -logs | IoTDB log storage path, the default is ({iotdb}/logs)) | NO | + +#### Cluster data backup +```bash +iotdbctl cluster dumpdata default_cluster -granularity partition -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/datas' +``` +* This command will obtain the leader node based on the yaml file, and then back up the data to the /iotdb/datas directory on the 192.168.9.48 service based on the start and end dates (startdate<=logtime<=enddate) + +| command | description | required | +|--------------|-------------------------------------------------------------------------|----------| +| -h | backup data server ip | NO | +| -u | backup data server username | NO | +| -pw | backup data machine password | NO | +| -p | backup data machine port(default 22) | NO | +| -path | path to backup data (default current path) | NO | +| -granularity | partition | YES | +| -l | speed limit (default 1024 speed limit range 0 to 104857601 unit Kbit/s) | NO | +| -startdate | start time (including default 1970-01-01) | YES | +| -enddate | end time (included) | YES | + +#### Cluster upgrade +```bash +iotdbctl cluster dist-lib default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Upload lib package + +Note that after performing the upgrade, please restart IoTDB for it to take effect. + +#### Cluster initialization +```bash +iotdbctl cluster init default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` +* Initialize cluster configuration + +#### View cluster process status +```bash +iotdbctl cluster status default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` +* Display the survival status of each node in the cluster + +#### Cluster authorization activation + +Cluster activation is activated by entering the activation code by default, or by using the - op license_path activated through license path + +* Default activation method +```bash +iotdbctl cluster activate default_cluster +``` +* Find the yaml file in the default location based on `cluster-name` and obtain the `confignode_servers` configuration information +* Obtain the machine code inside +* Waiting for activation code input + +```bash +Machine code: +Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== +Please enter the activation code: +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= +Activation successful +``` +* Activate a node + +```bash +iotdbctl cluster activate default_cluster -N confignode1 +``` + +* Activate through license path + +```bash +iotdbctl cluster activate default_cluster -op license_path +``` +* Find the yaml file in the default location based on `cluster-name` and obtain the `confignode_servers` configuration information +* Obtain the machine code inside +* Waiting for activation code input + +```bash +Machine code: +Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== 
+JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== +Please enter the activation code: +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= +Activation successful +``` +* Activate a node + +```bash +iotdbctl cluster activate default_cluster -N confignode1 -op license_path +``` + +#### Cluster Health Check +```bash +iotdbctl cluster health_check default_cluster +``` +* Locate the yaml file in the default location based on the cluster-name to retrieve confignode_servers and datanode_servers configuration information. +* Execute health_check.sh on each node. +* Single Node Health Check +```bash +iotdbctl cluster health_check default_cluster -N datanode_1 +``` +* Locate the yaml file in the default location based on the cluster-name to retrieve datanode_servers configuration information. +* Execute health_check.sh on datanode1. + +#### Cluster Shutdown Backup + +```bash +iotdbctl cluster backup default_cluster +``` +* Locate the yaml file in the default location based on the cluster-name to retrieve confignode_servers and datanode_servers configuration information. +* Execute backup.sh on each node + +* Single Node Backup + +```bash +iotdbctl cluster backup default_cluster -N datanode_1 +``` + +* Locate the yaml file in the default location based on the cluster-name to retrieve datanode_servers configuration information. +* Execute backup.sh on datanode1. +Note: Multi-node deployment on a single machine only supports quick mode. + +#### Cluster Metadata Import +```bash +iotdbctl cluster importschema default_cluster -N datanode1 -param "-s ./dump0.csv -fd ./failed/ -lpf 10000" +``` +* Locate the yaml file in the default location based on the cluster-name to retrieve datanode_servers configuration information. +* Execute metadata import with import-schema.sh on datanode1. +* Parameters for -param are as follows: + +| command | description | required | +|------------|-------------------------------------------------------------------------|----------| +| -s | Specify the data file to be imported. You can specify a file or a directory. If a directory is specified, all files with a .csv extension in the directory will be imported in bulk. | YES | +| -fd | Specify a directory to store failed import files. If this parameter is not specified, failed files will be saved in the source data directory with the extension .failed added to the original filename. | No | +| -lpf | Specify the number of lines written to each failed import file. The default is 10000.| NO | + +#### Cluster Metadata Export + +```bash +iotdbctl cluster exportschema default_cluster -N datanode1 -param "-t ./ -pf ./pattern.txt -lpf 10 -t 10000" +``` + +* Locate the yaml file in the default location based on the cluster-name to retrieve datanode_servers configuration information. +* Execute metadata export with export-schema.sh on datanode1. +* Parameters for -param are as follows: + +| command | description | required | +|-------------|-------------------------------------------------------------------------|----------| +| -t | Specify the output path for the exported CSV file. | YES | +| -path | Specify the path pattern for exporting metadata. If this parameter is specified, the -s parameter will be ignored. Example: root.stock.** | NO | +| -pf | If -path is not specified, this parameter must be specified. It designates the file path containing the metadata paths to be exported, supporting txt file format. 
Each path to be exported is on a new line.| NO | +| -lpf | Specify the maximum number of lines for the exported dump file. The default is 10000.| NO | +| -timeout | Specify the timeout for session queries in milliseconds.| NO | + + + +### Introduction to Cluster Deployment Tool Samples + +In the cluster deployment tool installation directory config/example, there are three yaml examples. If necessary, you can copy them to config and modify them. + +| name | description | +|-----------------------------|------------------------------------------------| +| default\_1c1d.yaml | 1 confignode and 1 datanode configuration example | +| default\_3c3d.yaml | 3 confignode and 3 datanode configuration samples | +| default\_3c3d\_grafa\_prome | 3 confignode and 3 datanode, Grafana, Prometheus configuration examples | + + +## IoTDB Data Directory Overview Tool + +IoTDB data directory overview tool is used to print an overview of the IoTDB data directory structure. The location is tools/tsfile/print-iotdb-data-dir. + +### Usage + +- For Windows: + +```bash +.\print-iotdb-data-dir.bat () +``` + +- For Linux or MacOs: + +```shell +./print-iotdb-data-dir.sh () +``` + +Note: if the storage path of the output overview file is not set, the default relative path "IoTDB_data_dir_overview.txt" will be used. + +### Example + +Use Windows in this example: + +`````````````````````````bash +.\print-iotdb-data-dir.bat D:\github\master\iotdb\data\datanode\data +```````````````````````` +Starting Printing the IoTDB Data Directory Overview +```````````````````````` +output save path:IoTDB_data_dir_overview.txt +data dir num:1 +143 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +|============================================================== +|D:\github\master\iotdb\data\datanode\data +|--sequence +| |--root.redirect0 +| | |--1 +| | | |--0 +| |--root.redirect1 +| | |--2 +| | | |--0 +| |--root.redirect2 +| | |--3 +| | | |--0 +| |--root.redirect3 +| | |--4 +| | | |--0 +| |--root.redirect4 +| | |--5 +| | | |--0 +| |--root.redirect5 +| | |--6 +| | | |--0 +| |--root.sg1 +| | |--0 +| | | |--0 +| | | |--2760 +|--unsequence +|============================================================== +````````````````````````` + +## TsFile Sketch Tool + +TsFile sketch tool is used to print the content of a TsFile in sketch mode. The location is tools/tsfile/print-tsfile. + +### Usage + +- For Windows: + +``` +.\print-tsfile-sketch.bat () +``` + +- For Linux or MacOs: + +``` +./print-tsfile-sketch.sh () +``` + +Note: if the storage path of the output sketch file is not set, the default relative path "TsFile_sketch_view.txt" will be used. + +### Example + +Use Windows in this example: + +`````````````````````````bash +.\print-tsfile.bat D:\github\master\1669359533965-1-0-0.tsfile D:\github\master\sketch.txt +```````````````````````` +Starting Printing the TsFile Sketch +```````````````````````` +TsFile path:D:\github\master\1669359533965-1-0-0.tsfile +Sketch save path:D:\github\master\sketch.txt +148 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. 
+-------------------------------- TsFile Sketch -------------------------------- +file path: D:\github\master\1669359533965-1-0-0.tsfile +file length: 2974 + + POSITION| CONTENT + -------- ------- + 0| [magic head] TsFile + 6| [version number] 3 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1, num of Chunks:3 + 7| [Chunk Group Header] + | [marker] 0 + | [deviceID] root.sg1.d1 + 20| [Chunk] of root.sg1.d1.s1, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [chunk header] marker=5, measurementID=s1, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 893| [Chunk] of root.sg1.d1.s2, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [chunk header] marker=5, measurementID=s2, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 1766| [Chunk] of root.sg1.d1.s3, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [chunk header] marker=5, measurementID=s3, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1 ends + 2656| [marker] 2 + 2657| [TimeseriesIndex] of root.sg1.d1.s1, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [ChunkIndex] offset=20 + 2728| [TimeseriesIndex] of root.sg1.d1.s2, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [ChunkIndex] offset=893 + 2799| [TimeseriesIndex] of root.sg1.d1.s3, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [ChunkIndex] offset=1766 + 2870| [IndexOfTimerseriesIndex Node] type=LEAF_MEASUREMENT + | + | +||||||||||||||||||||| [TsFileMetadata] begins + 2891| [IndexOfTimerseriesIndex Node] type=LEAF_DEVICE + | + | + | [meta offset] 2656 + | [bloom filter] bit vector byte array length=31, filterSize=256, hashFunctionSize=5 +||||||||||||||||||||| [TsFileMetadata] ends + 2964| [TsFileMetadataSize] 73 + 2968| [magic tail] TsFile + 2974| END of TsFile +---------------------------- IndexOfTimerseriesIndex Tree ----------------------------- + [MetadataIndex:LEAF_DEVICE] + └──────[root.sg1.d1,2870] + [MetadataIndex:LEAF_MEASUREMENT] + └──────[s1,2657] +---------------------------------- TsFile Sketch End ---------------------------------- +````````````````````````` + +Explanations: + +- Separated by "|", the left is the actual position in the TsFile, and the right is the summary content. 
+- "||||||||||||||||||||" is the guide information added to enhance readability, not the actual data stored in TsFile. +- The last printed "IndexOfTimerseriesIndex Tree" is a reorganization of the metadata index tree at the end of the TsFile, which is convenient for intuitive understanding, and again not the actual data stored in TsFile. + +## TsFile Resource Sketch Tool + +TsFile resource sketch tool is used to print the content of a TsFile resource file. The location is tools/tsfile/print-tsfile-resource-files. + +### Usage + +- For Windows: + +```bash +.\print-tsfile-resource-files.bat +``` + +- For Linux or MacOs: + +``` +./print-tsfile-resource-files.sh +``` + +### Example + +Use Windows in this example: + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +147 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +230 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +231 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +233 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +237 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file folder D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 finished. +````````````````````````` + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +178 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +186 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +187 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +188 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +192 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... 
+ +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource finished. +````````````````````````` diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_apache.md b/src/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_apache.md new file mode 100644 index 00000000..b2371d04 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_apache.md @@ -0,0 +1,180 @@ + + +# Monitor Tool + +## Prometheus + +### The mapping from metric type to prometheus format + +> For metrics whose Metric Name is name and Tags are K1=V1, ..., Kn=Vn, the mapping is as follows, where value is a +> specific value + +| Metric Type | Mapping | +| ---------------- | ------------------------------------------------------------ | +| Counter | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | +| AutoGauge、Gauge | name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | +| Histogram | name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | +| Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="mean"} value | +| Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | + +### Config File + +1) Taking DataNode as an example, modify the iotdb-system.properties configuration file as follows: + +```properties +dn_metric_reporter_list=PROMETHEUS +dn_metric_level=CORE +dn_metric_prometheus_reporter_port=9091 +``` + +Then you can get metrics data as follows + +2) Start IoTDB DataNodes +3) Open a browser or use ```curl``` to visit ```http://servier_ip:9091/metrics```, you can get the following metric + data: + +``` +... +# HELP file_count +# TYPE file_count gauge +file_count{name="wal",} 0.0 +file_count{name="unseq",} 0.0 +file_count{name="seq",} 2.0 +... +``` + +### Prometheus + Grafana + +As shown above, IoTDB exposes monitoring metrics data in the standard Prometheus format to the outside world. Prometheus +can be used to collect and store monitoring indicators, and Grafana can be used to visualize monitoring indicators. + +The following picture describes the relationships among IoTDB, Prometheus and Grafana + +![iotdb_prometheus_grafana](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Metrics/iotdb_prometheus_grafana.png) + +1. Along with running, IoTDB will collect its metrics continuously. +2. Prometheus scrapes metrics from IoTDB at a constant interval (can be configured). +3. Prometheus saves these metrics to its inner TSDB. +4. Grafana queries metrics from Prometheus at a constant interval (can be configured) and then presents them on the + graph. + +So, we need to do some additional works to configure and deploy Prometheus and Grafana. + +For instance, you can config your Prometheus as follows to get metrics data from IoTDB: + +```yaml +job_name: pull-metrics +honor_labels: true +honor_timestamps: true +scrape_interval: 15s +scrape_timeout: 10s +metrics_path: /metrics +scheme: http +follow_redirects: true +static_configs: + - targets: + - localhost:9091 +``` + +The following documents may help you have a good journey with Prometheus and Grafana. + +[Prometheus getting_started](https://prometheus.io/docs/prometheus/latest/getting_started/) + +[Prometheus scrape metrics](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) + +[Grafana getting_started](https://grafana.com/docs/grafana/latest/getting-started/getting-started/) + +[Grafana query metrics from Prometheus](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) + +## Apache IoTDB Dashboard + +`Apache IoTDB Dashboard` is available as a supplement to IoTDB Enterprise Edition, designed for unified centralized operations and management. With it, multiple clusters can be monitored through a single panel. You can access the Dashboard's Json file by contacting Commerce. + + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20default%20cluster.png) + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20cluster2.png) + + + +### Cluster Overview + +Including but not limited to: + +- Total cluster CPU cores, memory space, and hard disk space. +- Number of ConfigNodes and DataNodes in the cluster. +- Cluster uptime duration. +- Cluster write speed. +- Current CPU, memory, and disk usage across all nodes in the cluster. +- Information on individual nodes. 
+ +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%A6%82%E8%A7%88.png) + + +### Data Writing + +Including but not limited to: + +- Average write latency, median latency, and the 99% percentile latency. +- Number and size of WAL files. +- Node WAL flush SyncBuffer latency. + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%86%99%E5%85%A5.png) + +### Data Querying + +Including but not limited to: + +- Node query load times for time series metadata. +- Node read duration for time series. +- Node edit duration for time series metadata. +- Node query load time for Chunk metadata list. +- Node edit duration for Chunk metadata. +- Node filtering duration based on Chunk metadata. +- Average time to construct a Chunk Reader. + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%9F%A5%E8%AF%A2.png) + +### Storage Engine + +Including but not limited to: + +- File count and sizes by type. +- The count and size of TsFiles at various stages. +- Number and duration of various tasks. + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E.png) + +### System Monitoring + +Including but not limited to: + +- System memory, swap memory, and process memory. +- Disk space, file count, and file sizes. +- JVM GC time percentage, GC occurrences by type, GC volume, and heap memory usage across generations. +- Network transmission rate, packet sending rate + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E5%86%85%E5%AD%98%E4%B8%8E%E7%A1%AC%E7%9B%98.png) + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9Fjvm.png) + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E7%BD%91%E7%BB%9C.png) diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_timecho.md b/src/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_timecho.md new file mode 100644 index 00000000..cfe27ad1 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_timecho.md @@ -0,0 +1,180 @@ + + +# Monitor Tool + +## Prometheus + +### The mapping from metric type to prometheus format + +> For metrics whose Metric Name is name and Tags are K1=V1, ..., Kn=Vn, the mapping is as follows, where value is a +> specific value + +| Metric Type | Mapping | +| ---------------- | ------------------------------------------------------------ | +| Counter | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | +| AutoGauge、Gauge | name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | +| Histogram | name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | +| Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="mean"} value | +| Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | + +### Config File + +1) Taking DataNode as an example, modify the iotdb-system.properties configuration file as follows: + +```properties +dn_metric_reporter_list=PROMETHEUS +dn_metric_level=CORE +dn_metric_prometheus_reporter_port=9091 +``` + +Then you can get metrics data as follows + +2) Start IoTDB DataNodes +3) Open a browser or use ```curl``` to visit ```http://servier_ip:9091/metrics```, you can get the following metric + data: + +``` +... +# HELP file_count +# TYPE file_count gauge +file_count{name="wal",} 0.0 +file_count{name="unseq",} 0.0 +file_count{name="seq",} 2.0 +... +``` + +### Prometheus + Grafana + +As shown above, IoTDB exposes monitoring metrics data in the standard Prometheus format to the outside world. Prometheus +can be used to collect and store monitoring indicators, and Grafana can be used to visualize monitoring indicators. + +The following picture describes the relationships among IoTDB, Prometheus and Grafana + +![iotdb_prometheus_grafana](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Metrics/iotdb_prometheus_grafana.png) + +1. Along with running, IoTDB will collect its metrics continuously. +2. Prometheus scrapes metrics from IoTDB at a constant interval (can be configured). +3. Prometheus saves these metrics to its inner TSDB. +4. Grafana queries metrics from Prometheus at a constant interval (can be configured) and then presents them on the + graph. + +So, we need to do some additional works to configure and deploy Prometheus and Grafana. + +For instance, you can config your Prometheus as follows to get metrics data from IoTDB: + +```yaml +job_name: pull-metrics +honor_labels: true +honor_timestamps: true +scrape_interval: 15s +scrape_timeout: 10s +metrics_path: /metrics +scheme: http +follow_redirects: true +static_configs: + - targets: + - localhost:9091 +``` + +The following documents may help you have a good journey with Prometheus and Grafana. + +[Prometheus getting_started](https://prometheus.io/docs/prometheus/latest/getting_started/) + +[Prometheus scrape metrics](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) + +[Grafana getting_started](https://grafana.com/docs/grafana/latest/getting-started/getting-started/) + +[Grafana query metrics from Prometheus](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) + +## Apache IoTDB Dashboard + +We introduce the Apache IoTDB Dashboard, designed for unified centralized operations and management. With it, multiple clusters can be monitored through a single panel. + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20default%20cluster.png) + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20cluster2.png) + + +You can access the Dashboard's Json file in the enterprise edition. + +### Cluster Overview + +Including but not limited to: + +- Total cluster CPU cores, memory space, and hard disk space. +- Number of ConfigNodes and DataNodes in the cluster. +- Cluster uptime duration. +- Cluster write speed. +- Current CPU, memory, and disk usage across all nodes in the cluster. +- Information on individual nodes. + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%A6%82%E8%A7%88.png) + + +### Data Writing + +Including but not limited to: + +- Average write latency, median latency, and the 99% percentile latency. 
+- Number and size of WAL files. +- Node WAL flush SyncBuffer latency. + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%86%99%E5%85%A5.png) + +### Data Querying + +Including but not limited to: + +- Node query load times for time series metadata. +- Node read duration for time series. +- Node edit duration for time series metadata. +- Node query load time for Chunk metadata list. +- Node edit duration for Chunk metadata. +- Node filtering duration based on Chunk metadata. +- Average time to construct a Chunk Reader. + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%9F%A5%E8%AF%A2.png) + +### Storage Engine + +Including but not limited to: + +- File count and sizes by type. +- The count and size of TsFiles at various stages. +- Number and duration of various tasks. + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E.png) + +### System Monitoring + +Including but not limited to: + +- System memory, swap memory, and process memory. +- Disk space, file count, and file sizes. +- JVM GC time percentage, GC occurrences by type, GC volume, and heap memory usage across generations. +- Network transmission rate, packet sending rate + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E5%86%85%E5%AD%98%E4%B8%8E%E7%A1%AC%E7%9B%98.png) + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9Fjvm.png) + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E7%BD%91%E7%BB%9C.png) diff --git a/src/UserGuide/V2.0.1/Tree/Tools-System/Workbench_timecho.md b/src/UserGuide/V2.0.1/Tree/Tools-System/Workbench_timecho.md new file mode 100644 index 00000000..94ad3674 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/Tools-System/Workbench_timecho.md @@ -0,0 +1,30 @@ +# WorkBench +## Product Introduction +IoTDB Visualization Console is an extension component developed for industrial scenarios based on the IoTDB Enterprise Edition time series database. It integrates real-time data collection, storage, and analysis, aiming to provide users with efficient and reliable real-time data storage and query solutions. It features lightweight, high performance, and ease of use, seamlessly integrating with the Hadoop and Spark ecosystems. It is suitable for high-speed writing and complex analytical queries of massive time series data in industrial IoT applications. + +## Instructions for Use +| **Functional Module** | **Functional Description** | +| ---------------------- | ------------------------------------------------------------ | +| Instance Management | Support unified management of connected instances, support creation, editing, and deletion, while visualizing the relationships between multiple instances, helping customers manage multiple database instances more clearly | +| Home | Support viewing the service running status of each node in the database instance (such as activation status, running status, IP information, etc.), support viewing the running monitoring status of clusters, ConfigNodes, and DataNodes, monitor the operational health of the database, and determine if there are any potential operational issues with the instance. 
| +| Measurement Point List | Support directly viewing the measurement point information in the instance, including database information (such as database name, data retention time, number of devices, etc.), and measurement point information (measurement point name, data type, compression encoding, etc.), while also supporting the creation, export, and deletion of measurement points either individually or in batches. | +| Data Model | Support viewing hierarchical relationships and visually displaying the hierarchical model. | +| Data Query | Support interface-based query interactions for common data query scenarios, and enable batch import and export of queried data. | +| Statistical Query | Support interface-based query interactions for common statistical data scenarios, such as outputting results for maximum, minimum, average, and sum values. | +| SQL Operations | Support interactive SQL operations on the database through a graphical user interface, allowing for the execution of single or multiple statements, and displaying and exporting the results. | +| Trend | Support one-click visualization to view the overall trend of data, draw real-time and historical data for selected measurement points, and observe the real-time and historical operational status of the measurement points. | +| Analysis | Support visualizing data through different analysis methods (such as FFT) for visualization. | +| View | Support viewing information such as view name, view description, result measuring points, and expressions through the interface. Additionally, enable users to quickly create, edit, and delete views through interactive interfaces. | +| Data synchronization | Support the intuitive creation, viewing, and management of data synchronization tasks between databases. Enable direct viewing of task running status, synchronized data, and target addresses. Users can also monitor changes in synchronization status in real-time through the interface. | +| Permission management | Support interface-based control of permissions for managing and controlling database user access and operations. | +| Audit logs | Support detailed logging of user operations on the database, including Data Definition Language (DDL), Data Manipulation Language (DML), and query operations. Assist users in tracking and identifying potential security threats, database errors, and misuse behavior. 
|
+
+Main feature showcase
+* Home
+![Home.png](https://alioss.timecho.com/docs/img/%E9%A6%96%E9%A1%B5.png)
+* Measurement Point List
+![Measurement Point List.png](https://alioss.timecho.com/docs/img/workbench-en-bxzk.png)
+* Data Query
+![Data Query.png](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E6%9F%A5%E8%AF%A2.png)
+* Trend
+![Trend.png](https://alioss.timecho.com/docs/img/%E5%8E%86%E5%8F%B2%E8%B6%8B%E5%8A%BF.png)
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/AINode_timecho.md b/src/UserGuide/V2.0.1/Tree/User-Manual/AINode_timecho.md
new file mode 100644
index 00000000..215e370e
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/User-Manual/AINode_timecho.md
@@ -0,0 +1,654 @@
+
+# AI Capability (AINode)
+
+AINode is the third type of internal node in Apache IoTDB, alongside ConfigNode and DataNode. By interacting with the DataNodes and ConfigNodes of an IoTDB cluster, it extends the database with machine learning analysis of time series: existing machine learning models can be registered from outside the database and then invoked through simple SQL statements to run time series analysis tasks on specified data, integrating model creation, management, and inference into the database engine. At present, machine learning algorithms and self-developed models are provided for common time series analysis scenarios (e.g. forecasting and anomaly detection).
+
+The system architecture is shown below:
+::: center
+
+:::
+The responsibilities of the three node types are as follows:
+
+- **ConfigNode**: responsible for storing and managing the meta-information of models, and for distributed node management.
+- **DataNode**: responsible for receiving and parsing SQL requests from users, for storing time-series data, and for preprocessing computation of data.
+- **AINode**: responsible for model file import and creation, and for model inference.
+
+## Advantageous features
+
+Compared with building a standalone machine learning service, AINode has the following advantages:
+
+- **Simple and easy to use**: no Python or Java programming is required; the complete process of machine learning model management and inference can be completed with SQL statements. A model is created with the CREATE MODEL statement and used for inference with the CALL INFERENCE(...) statement, making the workflow simpler and more convenient.
+
+- **Avoid Data Migration**: with IoTDB-native machine learning, data stored in IoTDB can be fed directly into model inference without moving it to a separate machine learning service platform, which accelerates data processing, improves security, and reduces costs.
+
+![](https://alioss.timecho.com/upload/AInode1.png)
+
+- **Built-in Advanced Algorithms**: supports industry-leading machine learning analytics algorithms covering typical time series analysis tasks, giving the time series database native data analysis capabilities. For example:
+  - **Time Series Forecasting**: learns patterns of change from past time series and outputs the most likely prediction of the future series based on observations at a given past time.
+  - **Anomaly Detection for Time Series**: detects and identifies outliers in a given time series, helping to discover anomalous behaviour in the series.
+  - **Annotation for Time Series (Time Series Annotation)**: adds additional information or markers, such as event occurrences, outliers, or trend changes, to each data point or specific time period, to better understand and analyse the data.
+
+
+## Basic Concepts
+
+- **Model**: a machine learning model that takes time-series data as input and outputs the results or decisions of an analysis task. Models are the basic management unit of AINode, which supports adding (registering), deleting, viewing, and using (inference with) models.
+- **Create**: load externally designed or trained model files or algorithms into AINode for unified management and use by IoTDB.
+- **Inference**: the process of using a created model to complete, on the specified time series data, the time series analysis task that the model is suited for.
+- **Built-in capabilities**: AINode comes with machine learning algorithms or self-developed models for common time series analysis scenarios (e.g., forecasting and anomaly detection).
+
+::: center
+
+:::
+
+## Installation and Deployment
+
+The deployment of AINode can be found in the document [Deployment Guidelines](../Deployment-and-Maintenance/AINode_Deployment_timecho.md#AINode-部署).
+
+
+## Usage Guidelines
+
+AINode provides a creation and deletion workflow for deep learning models on time series data. Built-in models do not need to be created or deleted and can be used directly; built-in model instances created for an inference call are destroyed automatically once the inference completes.
+
+### Registering Models
+
+A trained deep learning model can be registered by specifying the vector dimensions of its inputs and outputs, after which it can be used for model inference.
+
+Models that meet the following criteria can be registered in AINode:
+1. Models trained with PyTorch 2.1.0 or 2.2.0 (the versions supported by AINode); features introduced in versions newer than 2.2.0 should be avoided.
+2. AINode supports models stored using PyTorch JIT; the model file needs to include both the parameters and the structure of the model.
+3. The input sequence of the model can contain one or more columns; if there are multiple columns, they need to correspond to the model capability and the model configuration file.
+4. The input and output dimensions of the model must be clearly defined in the `config.yaml` configuration file. When using the model, the input and output dimensions defined in `config.yaml` must be strictly followed; if the number of input or output columns does not match the configuration file, an error will occur.
+
+The following is the SQL syntax definition for model registration.
+
+```SQL
+create model <model_name> using uri <uri>
+```
+
+The specific meanings of the parameters in the SQL are as follows:
+
+- model_name: a globally unique identifier for the model, which cannot be repeated. The model name has the following constraints:
+
+  - Identifiers [ 0-9 a-z A-Z _ ] (letters, numbers, underscores) are allowed.
+  - Length is limited to 2-64 characters.
+  - Case sensitive.
+
+- uri: resource path of the model registration files, which should contain the **model weight file model.pt and the model's metadata description file config.yaml**.
+
+  - Model weight file: the weight file obtained after training of the deep learning model has completed; currently the .pt files produced by PyTorch training are supported.
+
+  - yaml metadata description file: parameters related to the model structure that need to be provided when the model is registered. It must contain the input and output dimensions of the model, which are used for model inference:
+
+    - | **Parameter name** | **Parameter description** | **Example** |
+      | ------------ | ---------------------------- | -------- |
+      | input_shape | rows and columns of the model input, used for model inference | [96,2] |
+      | output_shape | rows and columns of the model output, used for model inference | [48,2] |
+
+    - In addition to the parameters used for model inference, the data types of the model input and output can be specified:
+
+    - | **Parameter name** | **Parameter description** | **Example** |
+      | ----------- | ------------------ | --------------------- |
+      | input_type | data type of the model input | ['float32','float32'] |
+      | output_type | data type of the model output | ['float32','float32'] |
+
+    - Besides this, additional notes can be specified for display during model management:
+
+    - | **Parameter name** | **Parameter description** | **Examples** |
+      | ---------- | ---------------------------------------------- | ------------------------------------------- |
+      | attributes | optional, user-defined model notes for model display | 'model_type': 'dlinear','kernel_size': '25' |
+
+
+In addition to registering local model files, registration can also be done by specifying a remote resource path via a URI, for example from an open source model repository (e.g. HuggingFace).
+
+#### Example
+
+The current example folder contains the files model.pt and config.yaml; model.pt is obtained from training, and the content of config.yaml is as follows:
+
+```YAML
+configs:
+  # Required options
+  input_shape: [96, 2]   # The model receives data of 96 rows x 2 columns.
+  output_shape: [48, 2]  # The model outputs data of 48 rows x 2 columns.
+
+  # Optional. Defaults to float32 for all columns; the number of entries must match the number of columns in the shape.
+  input_type: ["int64", "int64"]   # Input data types, need to match the number of columns.
+  output_type: ["text", "int64"]   # Output data types, need to match the number of columns.
+
+attributes: # Optional, user-defined notes for display.
+  'model_type': 'dlinear'
+  'kernel_size': '25'
+```
+
+Specify this folder as the load path to register the model.
+
+```SQL
+IoTDB> create model dlinear_example using uri "file://./example"
+```
+
+Alternatively, you can download the corresponding model files from HuggingFace and register them.
+
+```SQL
+IoTDB> create model dlinear_example using uri "https://huggingface.com/IoTDBML/dlinear/"
+```
+
+After the SQL is executed, registration proceeds asynchronously. You can check the registration status of the model by listing the models (see the Viewing Models section); the time needed for a successful registration mainly depends on the size of the model file.
+
+Once model registration is complete, you can call the corresponding functions and perform model inference through normal queries.
+
+### Viewing Models
+
+Successfully registered models can be queried for model-specific information through the show models command. The SQL definition is as follows:
+
+```SQL
+show models
+
+show models <model_id>
+```
+
+In addition to displaying information about all models directly, you can specify a model id to view the information of a specific model.
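+For instance, the model registered above could be inspected individually with a statement like the following (an illustrative sketch using the dlinear_example model from the previous section):
+
+```SQL
+IoTDB> show models dlinear_example
+```
+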
The results of the model show contain the following information: + +| **ModelId** | **State** | **Configs** | **Attributes** | +| ------------ | ------------------------------------- | ---------------------------------------------- | -------------- | +| Model Unique Identifier | Model Registration Status (LOADING, ACTIVE, DROPPING) | InputShape, outputShapeInputTypes, outputTypes | Model Notes | + +State is used to show the current state of model registration, which consists of the following three stages + +- **LOADING**: The corresponding model meta information has been added to the configNode, and the model file is being transferred to the AINode node. +- **ACTIVE**: The model has been set up and the model is in the available state +- **DROPPING**: Model deletion is in progress, model related information is being deleted from configNode and AINode. +- **UNAVAILABLE**: Model creation failed, you can delete the failed model_name by drop model. + +#### Example + +```SQL +IoTDB> show models + + ++---------------------+--------------------------+-----------+----------------------------+-----------------------+ +| ModelId| ModelType| State| Configs| Notes| ++---------------------+--------------------------+-----------+----------------------------+-----------------------+ +| dlinear_example| USER_DEFINED| ACTIVE| inputShape:[96,2]| | +| | | | outputShape:[48,2]| | +| | | | inputDataType:[float,float]| | +| | | |outputDataType:[float,float]| | +| _STLForecaster| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| +| _NaiveForecaster| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| +| _ARIMA| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| +|_ExponentialSmoothing| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| +| _GaussianHMM|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| +| _GMMHMM|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| +| _Stray|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| ++---------------------+--------------------------+-----------+------------------------------------------------------------+-----------------------+ +``` + +We have registered the corresponding model earlier, you can view the model status through the corresponding designation, active indicates that the model is successfully registered and can be used for inference. + +### Delete Model + +For a successfully registered model, the user can delete it via SQL. In addition to deleting the meta information on the configNode, this operation also deletes all the related model files under the AINode. The SQL is as follows: + +```SQL +drop model +``` + +You need to specify the model model_name that has been successfully registered to delete the corresponding model. Since model deletion involves the deletion of data on multiple nodes, the operation will not be completed immediately, and the state of the model at this time is DROPPING, and the model in this state cannot be used for model inference. 
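+For example, the model registered earlier in this guide could be removed with the following statement (an illustrative sketch based on the drop model syntax above):
+
+```SQL
+IoTDB> drop model dlinear_example
+```
+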
+
+### Using Built-in Model Reasoning
+
+The SQL syntax is as follows:
+
+
+```SQL
+call inference(<built_in_model_name>,sql[,<parameterName>=<parameterValue>])
+```
+
+Built-in model inference does not require a registration process; it can be invoked directly through the call keyword. The corresponding parameters are described as follows:
+
+- **built_in_model_name**: built-in model name
+- **parameterName**: parameter name
+- **parameterValue**: parameter value
+
+#### Built-in Models and Parameter Descriptions
+
+The following machine learning models are currently built-in; please refer to the following links for detailed parameter descriptions.
+
+| Model | built_in_model_name | Task type | Parameter description |
+| -------------------- | --------------------- | -------- | ------------------------------------------------------------ |
+| Arima | _Arima | Forecast | [Arima Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.arima.ARIMA.html?highlight=Arima) |
+| STLForecaster | _STLForecaster | Forecast | [STLForecaster Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.trend.STLForecaster.html#sktime.forecasting.trend.STLForecaster) |
+| NaiveForecaster | _NaiveForecaster | Forecast | [NaiveForecaster Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.naive.NaiveForecaster.html#naiveforecaster) |
+| ExponentialSmoothing | _ExponentialSmoothing | Forecast | [ExponentialSmoothing Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.exp_smoothing.ExponentialSmoothing.html) |
+| GaussianHMM | _GaussianHMM | Annotation | [GaussianHMM Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.hmm_learn.gaussian.GaussianHMM.html) |
+| GMMHMM | _GMMHMM | Annotation | [GMMHMM Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.hmm_learn.gmm.GMMHMM.html) |
+| Stray | _Stray | Anomaly detection | [Stray Parameter description](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.stray.STRAY.html) |
+
+
+#### Example
+
+The following is an example of using built-in model inference. It uses the built-in Stray model, an anomaly detection algorithm, whose input is `[144,1]` and output is `[144,1]`; we invoke it through SQL.
+
+```SQL
+IoTDB> select * from root.eg.airline
++-----------------------------+------------------+
+|                         Time|root.eg.airline.s0|
++-----------------------------+------------------+
+|1949-01-31T00:00:00.000+08:00|             224.0|
+|1949-02-28T00:00:00.000+08:00|             118.0|
+|1949-03-31T00:00:00.000+08:00|             132.0|
+|1949-04-30T00:00:00.000+08:00|             129.0|
+......
+|1960-09-30T00:00:00.000+08:00|             508.0|
+|1960-10-31T00:00:00.000+08:00|             461.0|
+|1960-11-30T00:00:00.000+08:00|             390.0|
+|1960-12-31T00:00:00.000+08:00|             432.0|
++-----------------------------+------------------+
+Total line number = 144
+
+IoTDB> call inference(_Stray, "select s0 from root.eg.airline", k=2)
++-------+
+|output0|
++-------+
+|      0|
+|      0|
+|      0|
+|      0|
+......
+|      1|
+|      1|
+|      0|
+|      0|
+|      0|
+|      0|
++-------+
+Total line number = 144
+```
+
+### Reasoning with Deep Learning Models
+
+The SQL syntax is as follows:
+
+```SQL
+call inference(<model_name>,sql[,window=<window_function>])
+
+
+window_function:
+  head(window_size)
+  tail(window_size)
+  count(window_size,sliding_step)
+```
+
+After a model has been registered, the inference function can be invoked through the call keyword; its parameters are described as follows:
+
+- **model_name**: corresponds to a registered model.
+- **sql**: the SQL query statement whose result is used as input to the model for inference. The row and column dimensions of the query result need to match the size specified in the model's config. (It is not recommended to use the `SELECT *` clause here, because in IoTDB `*` does not sort the columns, so the column order is undefined; use e.g. `SELECT s0,s1` to ensure that the column order matches what the model input expects.)
+- **window_function**: window functions that can be used during inference. Three types of window function are currently provided to assist model inference:
+  - **head(window_size)**: gets the first window_size points of the data for model inference; this window can be used for data cropping.
+  ![](https://alioss.timecho.com/docs/img/AINode-call1.png)
+
+  - **tail(window_size)**: gets the last window_size points of the data for model inference; this window can be used for data cropping.
+  ![](https://alioss.timecho.com/docs/img/AINode-call2.png)
+
+  - **count(window_size, sliding_step)**: a sliding window based on the number of points; the data in each window is run through the model separately. As shown in the example below, a window function with window_size 2 divides the input dataset into three windows, and each window performs its own inference operation and generates its own results. This window can be used for continuous inference.
+  ![](https://alioss.timecho.com/docs/img/AINode-call3.png)
+
+**Explanation 1**: the window can be used to solve the problem of cropping rows when the result of the SQL query does not match the number of input rows the model requires. Note that when the number of columns does not match, or the number of rows is smaller than the model requires, the inference cannot proceed and an error message will be returned.
+
+**Explanation 2**: in deep learning applications, timestamp-derived features (the time columns in the data) are often used as covariates in generative tasks and are fed into the model together with the data to enhance it, but the time columns are generally not included in the model's output. To ensure the generality of the implementation, the model inference results correspond only to the real output of the model: if the model does not output a time column, it will not be included in the results.
+
+
+#### Example
+
+The following is a hands-on example of inference with a deep learning model, using the `dlinear` prediction model with input `[96,2]` and output `[48,2]` registered above; we invoke it via SQL.
+ +```Shell +IoTDB> select s1,s2 from root.** ++-----------------------------+-------------------+-------------------+ +| Time| root.eg.etth.s0| root.eg.etth.s1| ++-----------------------------+-------------------+-------------------+ +|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| +|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| +|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| +|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| +|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| +|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| +|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| +...... +|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| +|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| +|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| +|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| +|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| +|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| +|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| ++-----------------------------+-------------------+-------------------+ +Total line number = 96 + +IoTDB> call inference(dlinear_example,"select s0,s1 from root.**") ++--------------------------------------------+-----------------------------+ +| _result_0| _result_1| ++--------------------------------------------+-----------------------------+ +| 0.726302981376648| 1.6549958229064941| +| 0.7354921698570251| 1.6482787370681763| +| 0.7238251566886902| 1.6278168201446533| +...... +| 0.7692174911499023| 1.654654049873352| +| 0.7685555815696716| 1.6625318765640259| +| 0.7856493592262268| 1.6508299350738525| ++--------------------------------------------+-----------------------------+ +Total line number = 48 +``` + +#### Example of using the tail/head window function + +When the amount of data is variable and you want to take the latest 96 rows of data for inference, you can use the corresponding window function tail. head function is used in a similar way, except that it takes the earliest 96 points. + +```Shell +IoTDB> select s1,s2 from root.** ++-----------------------------+-------------------+-------------------+ +| Time| root.eg.etth.s0| root.eg.etth.s1| ++-----------------------------+-------------------+-------------------+ +|1988-01-01T00:00:00.000+08:00| 0.7355| 1.211| +...... +|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| +|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| +|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| +|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| +|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| +|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| +|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| +...... +|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| +|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| +|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| +|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| +|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| +|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| +|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| ++-----------------------------+-------------------+-------------------+ +Total line number = 996 + +IoTDB> call inference(dlinear_example,"select s0,s1 from root.**",window=tail(96)) ++--------------------------------------------+-----------------------------+ +| _result_0| _result_1| ++--------------------------------------------+-----------------------------+ +| 0.726302981376648| 1.6549958229064941| +| 0.7354921698570251| 1.6482787370681763| +| 0.7238251566886902| 1.6278168201446533| +...... 
+| 0.7692174911499023| 1.654654049873352| +| 0.7685555815696716| 1.6625318765640259| +| 0.7856493592262268| 1.6508299350738525| ++--------------------------------------------+-----------------------------+ +Total line number = 48 +``` + +#### Example of using the count window function + +This window is mainly used for computational tasks. When the task's corresponding model can only handle a fixed number of rows of data at a time, but the final desired outcome is multiple sets of prediction results, this window function can be used to perform continuous inference using a sliding window of points. Suppose we now have an anomaly detection model `anomaly_example(input: [24,2], output[1,1])`, which generates a 0/1 label for every 24 rows of data. An example of its use is as follows: + +```Shell +IoTDB> select s1,s2 from root.** ++-----------------------------+-------------------+-------------------+ +| Time| root.eg.etth.s0| root.eg.etth.s1| ++-----------------------------+-------------------+-------------------+ +|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| +|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| +|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| +|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| +|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| +|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| +|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| +...... +|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| +|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| +|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| +|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| +|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| +|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| +|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| ++-----------------------------+-------------------+-------------------+ +Total line number = 96 + +IoTDB> call inference(anomaly_example,"select s0,s1 from root.**",window=count(24,24)) ++-------------------------+ +| _result_0| ++-------------------------+ +| 0| +| 1| +| 1| +| 0| ++-------------------------+ +Total line number = 4 +``` + +In the result set, each row's label corresponds to the output of the anomaly detection model after inputting each group of 24 rows of data. + +## Privilege Management + +When using AINode related functions, the authentication of IoTDB itself can be used to do a permission management, users can only use the model management related functions when they have the USE_MODEL permission. When using the inference function, the user needs to have the permission to access the source sequence corresponding to the SQL of the input model. + +| Privilege Name | Privilege Scope | Administrator User (default ROOT) | Normal User | Path Related | +| --------- | --------------------------------- | ---------------------- | -------- | -------- | +| USE_MODEL | create model/show models/drop model | √ | √ | x | +| READ_DATA| call inference | √ | √|√ | + +## Practical Examples + +### Power Load Prediction + +In some industrial scenarios, there is a need to predict power loads, which can be used to optimise power supply, conserve energy and resources, support planning and expansion, and enhance power system reliability. + +The data for the test set of ETTh1 that we use is [ETTh1](https://alioss.timecho.com/docs/img/ETTh1.csv). + + +It contains power data collected at 1h intervals, and each data consists of load and oil temperature as High UseFul Load, High UseLess Load, Middle UseLess Load, Low UseFul Load, Low UseLess Load, Oil Temperature. 
+ +On this dataset, the model inference function of IoTDB-ML can predict the oil temperature in the future period of time through the relationship between the past values of high, middle and low use loads and the corresponding time stamp oil temperature, which empowers the automatic regulation and monitoring of grid transformers. + +#### Step 1: Data Import + +Users can import the ETT dataset into IoTDB using `import-csv.sh` in the tools folder + +``Bash +bash . /import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ... /... /ETTh1.csv +`` + +#### Step 2: Model Import + +We can enter the following SQL in iotdb-cli to pull a trained model from huggingface for registration for subsequent inference. + +```SQL +create model dlinear using uri 'https://huggingface.co/hvlgo/dlinear/tree/main' +``` + +This model is trained on the lighter weight deep model DLinear, which is able to capture as many trends within a sequence and relationships between variables as possible with relatively fast inference, making it more suitable for fast real-time prediction than other deeper models. + +#### Step 3: Model inference + +```Shell +IoTDB> select s0,s1,s2,s3,s4,s5,s6 from root.eg.etth LIMIT 96 ++-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ +| Time|root.eg.etth.s0|root.eg.etth.s1|root.eg.etth.s2|root.eg.etth.s3|root.eg.etth.s4|root.eg.etth.s5|root.eg.etth.s6| ++-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ +|2017-10-20T00:00:00.000+08:00| 10.449| 3.885| 8.706| 2.025| 2.041| 0.944| 8.864| +|2017-10-20T01:00:00.000+08:00| 11.119| 3.952| 8.813| 2.31| 2.071| 1.005| 8.442| +|2017-10-20T02:00:00.000+08:00| 9.511| 2.88| 7.533| 1.564| 1.949| 0.883| 8.16| +|2017-10-20T03:00:00.000+08:00| 9.645| 2.21| 7.249| 1.066| 1.828| 0.914| 7.949| +...... +|2017-10-23T20:00:00.000+08:00| 8.105| 0.938| 4.371| -0.569| 3.533| 1.279| 9.708| +|2017-10-23T21:00:00.000+08:00| 7.167| 1.206| 4.087| -0.462| 3.107| 1.432| 8.723| +|2017-10-23T22:00:00.000+08:00| 7.1| 1.34| 4.015| -0.32| 2.772| 1.31| 8.864| +|2017-10-23T23:00:00.000+08:00| 9.176| 2.746| 7.107| 1.635| 2.65| 1.097| 9.004| ++-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ +Total line number = 96 + +IoTDB> call inference(dlinear_example, "select s0,s1,s2,s3,s4,s5,s6 from root.eg.etth", window=head(96)) ++-----------+----------+----------+------------+---------+----------+----------+ +| output0| output1| output2| output3| output4| output5| output6| ++-----------+----------+----------+------------+---------+----------+----------+ +| 10.319546| 3.1450553| 7.877341| 1.5723765|2.7303758| 1.1362307| 8.867775| +| 10.443649| 3.3286757| 7.8593454| 1.7675098| 2.560634| 1.1177158| 8.920919| +| 10.883752| 3.2341104| 8.47036| 1.6116762|2.4874182| 1.1760603| 8.798939| +...... +| 8.0115595| 1.2995274| 6.9900327|-0.098746896| 3.04923| 1.176214| 9.548782| +| 8.612427| 2.5036244| 5.6790237| 0.66474205|2.8870275| 1.2051733| 9.330128| +| 10.096699| 3.399722| 6.9909| 1.7478468|2.7642853| 1.1119363| 9.541455| ++-----------+----------+----------+------------+---------+----------+----------+ +Total line number = 48 +``` + +We compare the results of the prediction of the oil temperature with the real results, and we can get the following image. 
+ +The data before 10/24 00:00 represents the past data input to the model, the blue line after 10/24 00:00 is the oil temperature forecast result given by the model, and the red line is the actual oil temperature data from the dataset (used for comparison). + +![](https://alioss.timecho.com/docs/img/AINode-analysis1.png) + +As can be seen, we have used the relationship between the six load information and the corresponding time oil temperatures for the past 96 hours (4 days) to model the possible changes in this data for the oil temperature for the next 48 hours (2 days) based on the inter-relationships between the sequences learned previously, and it can be seen that the predicted curves maintain a high degree of consistency in trend with the actual results after visualisation. + +### Power Prediction + +Power monitoring of current, voltage and power data is required in substations for detecting potential grid problems, identifying faults in the power system, effectively managing grid loads and analysing power system performance and trends. + +We have used the current, voltage and power data in a substation to form a dataset in a real scenario. The dataset consists of data such as A-phase voltage, B-phase voltage, and C-phase voltage collected every 5 - 6s for a time span of nearly four months in the substation. + +The test set data content is [data](https://alioss.timecho.com/docs/img/data.csv). + +On this dataset, the model inference function of IoTDB-ML can predict the C-phase voltage in the future period through the previous values and corresponding timestamps of A-phase voltage, B-phase voltage and C-phase voltage, empowering the monitoring management of the substation. + +#### Step 1: Data Import + +Users can import the dataset using `import-csv.sh` in the tools folder + +```Bash +bash ./import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ... /... /data.csv +``` + +#### Step 2: Model Import + +We can select built-in models or registered models in IoTDB CLI for subsequent inference. + +We use the built-in model STLForecaster for prediction. STLForecaster is a time series forecasting method based on the STL implementation in the statsmodels library. + +#### Step 3: Model Inference + +```Shell +IoTDB> select * from root.eg.voltage limit 96 ++-----------------------------+------------------+------------------+------------------+ +| Time|root.eg.voltage.s0|root.eg.voltage.s1|root.eg.voltage.s2| ++-----------------------------+------------------+------------------+------------------+ +|2023-02-14T20:38:32.000+08:00| 2038.0| 2028.0| 2041.0| +|2023-02-14T20:38:38.000+08:00| 2014.0| 2005.0| 2018.0| +|2023-02-14T20:38:44.000+08:00| 2014.0| 2005.0| 2018.0| +...... +|2023-02-14T20:47:52.000+08:00| 2024.0| 2016.0| 2027.0| +|2023-02-14T20:47:57.000+08:00| 2024.0| 2016.0| 2027.0| +|2023-02-14T20:48:03.000+08:00| 2024.0| 2016.0| 2027.0| ++-----------------------------+------------------+------------------+------------------+ +Total line number = 96 + +IoTDB> call inference(_STLForecaster, "select s0,s1,s2 from root.eg.voltage", window=head(96),predict_length=48) ++---------+---------+---------+ +| output0| output1| output2| ++---------+---------+---------+ +|2026.3601|2018.2953|2029.4257| +|2019.1538|2011.4361|2022.0888| +|2025.5074|2017.4522|2028.5199| +...... 
+ +|2022.2336|2015.0290|2025.1023| +|2015.7241|2008.8975|2018.5085| +|2022.0777|2014.9136|2024.9396| +|2015.5682|2008.7821|2018.3458| ++---------+---------+---------+ +Total line number = 48 +``` + +Comparing the predicted results of the C-phase voltage with the real results, we can get the following image. + +The data before 02/14 20:48 represents the past data input to the model, the blue line after 02/14 20:48 is the predicted result of phase C voltage given by the model, while the red line is the actual phase C voltage data from the dataset (used for comparison). + +![](https://alioss.timecho.com/docs/img/AINode-analysis2.png) + +It can be seen that we used the voltage data from the past 10 minutes and, based on the previously learned inter-sequence relationships, modeled the possible changes in the phase C voltage data for the next 5 minutes. The visualized forecast curve shows a certain degree of synchronicity with the actual results in terms of trend. + +### Anomaly Detection + +In the civil aviation and transport industry, there exists a need for anomaly detection of the number of passengers travelling on an aircraft. The results of anomaly detection can be used to guide the adjustment of flight scheduling to make the organisation more efficient. + +Airline Passengers is a time-series dataset that records the number of international air passengers between 1949 and 1960, sampled at one-month intervals. The dataset contains a total of one time series. The dataset is [airline](https://alioss.timecho.com/docs/img/airline.csv). +On this dataset, the model inference function of IoTDB-ML can empower the transport industry by capturing the changing patterns of the sequence in order to detect anomalies at the sequence time points. + +#### Step 1: Data Import + +Users can import the dataset using `import-csv.sh` in the tools folder + +``Bash +bash . /import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ... /... /data.csv +`` + +#### Step 2: Model Inference + +IoTDB has some built-in machine learning algorithms that can be used directly, a sample prediction using one of the anomaly detection algorithms is shown below: + +```Shell +IoTDB> select * from root.eg.airline ++-----------------------------+------------------+ +| Time|root.eg.airline.s0| ++-----------------------------+------------------+ +|1949-01-31T00:00:00.000+08:00| 224.0| +|1949-02-28T00:00:00.000+08:00| 118.0| +|1949-03-31T00:00:00.000+08:00| 132.0| +|1949-04-30T00:00:00.000+08:00| 129.0| +...... +|1960-09-30T00:00:00.000+08:00| 508.0| +|1960-10-31T00:00:00.000+08:00| 461.0| +|1960-11-30T00:00:00.000+08:00| 390.0| +|1960-12-31T00:00:00.000+08:00| 432.0| ++-----------------------------+------------------+ +Total line number = 144 + +IoTDB> call inference(_Stray, "select s0 from root.eg.airline", k=2) ++-------+ +|output0| ++-------+ +| 0| +| 0| +| 0| +| 0| +...... +| 1| +| 1| +| 0| +| 0| +| 0| +| 0| ++-------+ +Total line number = 144 +``` + +We plot the results detected as anomalies to get the following image. Where the blue curve is the original time series and the time points specially marked with red dots are the time points that the algorithm detects as anomalies. + +![](https://alioss.timecho.com/docs/img/s6.png) + +It can be seen that the Stray model has modelled the input sequence changes and successfully detected the time points where anomalies occur. 
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Audit-Log_timecho.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Audit-Log_timecho.md
new file mode 100644
index 00000000..741135b0
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Audit-Log_timecho.md
@@ -0,0 +1,93 @@

# Audit log

## Background of the function

The audit log is the record of operations performed in a database. By querying the audit log, the various operations carried out in the database (such as adding, deleting, modifying, and querying data and users) can be traced afterwards, which helps to ensure information security. With the audit log function of IoTDB, the following scenarios can be covered:

- Decide whether to record audit logs according to the source of the connection (human operation or not). For example, data written by a hardware collector is a non-human operation and does not need to be audited, while data operations performed by ordinary users through tools such as the CLI or Workbench are human operations and do need to be audited.
- Filter out system-level write operations, such as those recorded by the IoTDB monitoring system itself.

### Scene Description

#### Logging all operations (add, delete, change, check) of all users

The audit log function traces all user operations in the database. The recorded information should include data operations (add, delete, query), metadata operations (add, modify, delete, query), and client login information (user name, IP address).

Client Sources:
- CLI, Workbench, Zeppelin, Grafana, and requests sent via protocols such as Session/JDBC/MQTT

![](https://alioss.timecho.com/docs/img/%E5%AE%A1%E8%AE%A1%E6%97%A5%E5%BF%97.PNG)

#### Audit logging can be turned off for some user connections

No audit logs are required for data written by a hardware collector via Session/JDBC/MQTT, since this is a non-human action.

## Function Definition

The following aspects can be controlled through configuration:

- Decide whether to enable the audit function
- Decide where to output the audit logs; output to one or more of the following is supported:
  1. log file
  2. IoTDB storage
- Decide whether to block native interface writes, to prevent recording too many audit logs from affecting performance
- Decide the content categories of the audit log; one or more of the following can be recorded:
  1. data addition and deletion operations
  2. data and metadata query operations
  3. metadata adding, modifying, and deleting operations

### Configuration Items

In iotdb-system.properties, change the following configurations:

```YAML
####################
### Audit log Configuration
####################

# whether to enable the audit log.
+# Datatype: Boolean +# enable_audit_log=false + +# Output location of audit logs +# Datatype: String +# IOTDB: the stored time series is: root.__system.audit._{user} +# LOGGER: log_audit.log in the log directory +# audit_log_storage=IOTDB,LOGGER + +# whether enable audit log for DML operation of data +# whether enable audit log for DDL operation of schema +# whether enable audit log for QUERY operation of data and schema +# Datatype: String +# audit_log_operation=DML,DDL,QUERY + +# whether the local write api records audit logs +# Datatype: Boolean +# This contains Session insert api: insertRecord(s), insertTablet(s),insertRecordsOfOneDevice +# MQTT insert api +# RestAPI insert api +# This parameter will cover the DML in audit_log_operation +# enable_audit_log_for_native_insert_api=true +``` + diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Authority-Management.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Authority-Management.md new file mode 100644 index 00000000..751caf7b --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Authority-Management.md @@ -0,0 +1,519 @@ + + +# Database Administration + +IoTDB provides permission management operations, offering users the ability to manage permissions for data and cluster systems, ensuring data and system security. + +This article introduces the basic concepts of the permission module in IoTDB, including user definition, permission management, authentication logic, and use cases. In the JAVA programming environment, you can use the [JDBC API](https://chat.openai.com/API/Programming-JDBC.md) to execute permission management statements individually or in batches. + +## Basic Concepts + +### User + +A user is a legitimate user of the database. Each user corresponds to a unique username and has a password as a means of authentication. Before using the database, a person must provide a valid (i.e., stored in the database) username and password for a successful login. + +### Permission + +The database provides various operations, but not all users can perform all operations. If a user can perform a certain operation, they are said to have permission to execute that operation. Permissions are typically limited in scope by a path, and [path patterns](https://chat.openai.com/Basic-Concept/Data-Model-and-Terminology.md) can be used to manage permissions flexibly. + +### Role + +A role is a collection of multiple permissions and has a unique role name as an identifier. Roles often correspond to real-world identities (e.g., a traffic dispatcher), and a real-world identity may correspond to multiple users. Users with the same real-world identity often have the same permissions, and roles are abstractions for unified management of such permissions. + +### Default Users and Roles + +After installation and initialization, IoTDB includes a default user: root, with the default password root. This user is an administrator with fixed permissions, which cannot be granted or revoked and cannot be deleted. There is only one administrator user in the database. + +A newly created user or role does not have any permissions initially. + +## User Definition + +Users with MANAGE_USER and MANAGE_ROLE permissions or administrators can create users or roles. Creating a user must meet the following constraints. + +### Username Constraints + +4 to 32 characters, supports the use of uppercase and lowercase English letters, numbers, and special characters (`!@#$%^&*()_+-=`). + +Users cannot create users with the same name as the administrator. 
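For example, the following statement creates a user whose name satisfies these constraints; this is only an illustrative sketch, and the user name and password shown here are made up:

```SQL
-- The user name is 14 characters long, uses only letters, digits and '_',
-- and differs from the administrator's name (the names are illustrative)
CREATE USER `ln_operator_01` 'op_passwd_01'
```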
+ +### Password Constraints + +4 to 32 characters, can use uppercase and lowercase letters, numbers, and special characters (`!@#$%^&*()_+-=`). Passwords are encrypted by default using MD5. + +### Role Name Constraints + +4 to 32 characters, supports the use of uppercase and lowercase English letters, numbers, and special characters (`!@#$%^&*()_+-=`). + +Users cannot create roles with the same name as the administrator. + + + +## Permission Management + +IoTDB primarily has two types of permissions: series permissions and global permissions. + +### Series Permissions + +Series permissions constrain the scope and manner in which users access data. IOTDB support authorization for both absolute paths and prefix-matching paths, and can be effective at the timeseries granularity. + +The table below describes the types and scope of these permissions: + + + +| Permission Name | Description | +|-----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| READ_DATA | Allows reading time series data under the authorized path. | +| WRITE_DATA | Allows reading time series data under the authorized path.
Allows inserting and deleting time series data under the authorized path.
Allows importing and loading data under the authorized path. When importing data, you need the WRITE_DATA permission for the corresponding path. When automatically creating databases or time series, you need MANAGE_DATABASE and WRITE_SCHEMA permissions. | +| READ_SCHEMA | Allows obtaining detailed information about the metadata tree under the authorized path,
including databases, child paths, child nodes, devices, time series, templates, views, etc. | +| WRITE_SCHEMA | Allows obtaining detailed information about the metadata tree under the authorized path.
Allows creating, deleting, and modifying time series, templates, views, etc. under the authorized path. When creating or modifying views, it checks the WRITE_SCHEMA permission for the view path and READ_SCHEMA permission for the data source. When querying and inserting data into views, it checks the READ_DATA and WRITE_DATA permissions for the view path.
Allows setting, unsetting, and viewing TTL under the authorized path.
Allows attaching or detaching templates under the authorized path. | + + +### Global Permissions + +Global permissions constrain the database functions that users can use and restrict commands that change the system and task state. Once a user obtains global authorization, they can manage the database. +The table below describes the types of system permissions: + + +| Permission Name | Description | +|:---------------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| MANAGE_DATABASE | Allow users to create and delete databases. | +| MANAGE_USER | Allow users to create, delete, modify, and view users. | +| MANAGE_ROLE | Allow users to create, delete, modify, and view roles.
Allow users to grant/revoke roles to/from other users. | +| USE_TRIGGER | Allow users to create, delete, and view triggers.
Independent of data source permission checks for triggers. | +| USE_UDF | Allow users to create, delete, and view user-defined functions.
Independent of data source permission checks for user-defined functions. | +| USE_CQ | Allow users to create, delete, and view continuous queries.
Independent of data source permission checks for continuous queries. | +| USE_PIPE | Allow users to create, start, stop, delete, and view pipelines.
Allow users to create, delete, and view pipeline plugins.
Independent of data source permission checks for pipelines. | +| EXTEND_TEMPLATE | Permission to automatically create templates. | +| MAINTAIN | Allow users to query and cancel queries.
Allow users to view variables.
Allow users to view cluster status. | +| USE_MODEL | Allow users to create, delete and view deep learning model. | +Regarding template permissions: + +1. Only administrators are allowed to create, delete, modify, query, mount, and unmount templates. +2. To activate a template, you need to have WRITE_SCHEMA permission for the activation path. +3. If automatic creation is enabled, writing to a non-existent path that has a template mounted will automatically extend the template and insert data. Therefore, one needs EXTEND_TEMPLATE permission and WRITE_DATA permission for writing to the sequence. +4. To deactivate a template, WRITE_SCHEMA permission for the mounted template path is required. +5. To query paths that use a specific metadata template, you needs READ_SCHEMA permission for the paths; otherwise, it will return empty results. + + + +### Granting and Revoking Permissions + +In IoTDB, users can obtain permissions through three methods: + +1. Granted by administrator, who has control over the permissions of other users. +2. Granted by a user allowed to authorize permissions, and this user was assigned the grant option keyword when obtaining the permission. +3. Granted a certain role by administrator or a user with MANAGE_ROLE, thereby obtaining permissions. + +Revoking a user's permissions can be done through the following methods: + +1. Revoked by administrator. +2. Revoked by a user allowed to authorize permissions, and this user was assigned the grant option keyword when obtaining the permission. +3. Revoked from a user's role by administrator or a user with MANAGE_ROLE, thereby revoking the permissions. + +- When granting permissions, a path must be specified. Global permissions need to be specified as root.**, while series-specific permissions must be absolute paths or prefix paths ending with a double wildcard. +- When granting user/role permissions, you can specify the "with grant option" keyword for that permission, which means that the user can grant permissions on their authorized paths and can also revoke permissions on other users' authorized paths. For example, if User A is granted read permission for `group1.company1.**` with the grant option keyword, then A can grant read permissions to others on any node or series below `group1.company1`, and can also revoke read permissions on any node below `group1.company1` for other users. +- When revoking permissions, the revocation statement will match against all of the user's permission paths and clear the matched permission paths. For example, if User A has read permission for `group1.company1.factory1`, when revoking read permission for `group1.company1.**`, it will remove A's read permission for `group1.company1.factory1`. + + + +## Authentication + +User permissions mainly consist of three parts: permission scope (path), permission type, and the "with grant option" flag: + +``` +userTest1: + root.t1.** - read_schema, read_data - with grant option + root.** - write_schema, write_data - with grant option +``` + +Each user has such a permission access list, identifying all the permissions they have acquired. You can view their permissions by using the command `LIST PRIVILEGES OF USER `. + +When authorizing a path, the database will match the path with the permissions. For example, when checking the read_schema permission for `root.t1.t2`, it will first match with the permission access list `root.t1.**`. If it matches successfully, it will then check if that path contains the permission to be authorized. 
If not, it continues to the next path-permission match until a match is found or all matches are exhausted. + +When performing authorization for multiple paths, such as executing a multi-path query task, the database will only present data for which the user has permissions. Data for which the user does not have permissions will not be included in the results, and information about these paths without permissions will be output to the alert messages. + +Please note that the following operations require checking multiple permissions: + +1. Enabling the automatic sequence creation feature requires not only write permission for the corresponding sequence when a user inserts data into a non-existent sequence but also metadata modification permission for the sequence. + +2. When executing the "select into" statement, it is necessary to check the read permission for the source sequence and the write permission for the target sequence. It should be noted that the source sequence data may only be partially accessible due to insufficient permissions, and if the target sequence has insufficient write permissions, an error will occur, terminating the task. + +3. View permissions and data source permissions are independent. Performing read and write operations on a view will only check the permissions of the view itself and will not perform permission validation on the source path. + + +## Function Syntax and Examples + +IoTDB provides composite permissions for user authorization: + +| Permission Name | Permission Scope | +|-----------------|--------------------------| +| ALL | All permissions | +| READ | READ_SCHEMA, READ_DATA | +| WRITE | WRITE_SCHEMA, WRITE_DATA | + +Composite permissions are not specific permissions themselves but a shorthand way to denote a combination of permissions, with no difference from directly specifying the corresponding permission names. + +The following series of specific use cases will demonstrate the usage of permission statements. Non-administrator users executing the following statements require obtaining the necessary permissions, which are indicated after the operation description. + +### User and Role Related + +- Create user (Requires MANAGE_USER permission) + +```SQL +CREATE USER +eg: CREATE USER user1 'passwd' +``` + +- Delete user (Requires MANAGE_USER permission) + +```sql +DROP USER +eg: DROP USER user1 +``` + +- Create role (Requires MANAGE_ROLE permission) + +```sql +CREATE ROLE +eg: CREATE ROLE role1 +``` + +- Delete role (Requires MANAGE_ROLE permission) + +```sql +DROP ROLE +eg: DROP ROLE role1 +``` + +- Grant role to user (Requires MANAGE_ROLE permission) + +```sql +GRANT ROLE TO +eg: GRANT ROLE admin TO user1 +``` + +- Revoke role from user(Requires MANAGE_ROLE permission) + +```sql +REVOKE ROLE FROM +eg: REVOKE ROLE admin FROM user1 +``` + +- List all user (Requires MANAGE_USER permission) + +```sql +LIST USER +``` + +- List all role (Requires MANAGE_ROLE permission) + +```sql +LIST ROLE +``` + +- List all users granted specific role.(Requires MANAGE_USER permission) + +```sql +LIST USER OF ROLE +eg: LIST USER OF ROLE roleuser +``` + +- List all role granted to specific user. + + Users can list their own roles, but listing roles of other users requires the MANAGE_ROLE permission. + +```sql +LIST ROLE OF USER +eg: LIST ROLE OF USER tempuser +``` + +- List all privileges of user + +Users can list their own privileges, but listing privileges of other users requires the MANAGE_USER permission. 
+ +```sql +LIST PRIVILEGES OF USER ; +eg: LIST PRIVILEGES OF USER tempuser; +``` + +- List all privileges of role + +Users can list the permission information of roles they have, but listing permissions of other roles requires the MANAGE_ROLE permission. + +```sql +LIST PRIVILEGES OF ROLE ; +eg: LIST PRIVILEGES OF ROLE actor; +``` + +- Update password + +Users can update their own password, but updating passwords of other users requires the MANAGE_USER permission. + +```sql +ALTER USER SET PASSWORD ; +eg: ALTER USER tempuser SET PASSWORD 'newpwd'; +``` + +### Authorization and Deauthorization + +Users can use authorization statements to grant permissions to other users. The syntax is as follows: + +```sql +GRANT ON TO ROLE/USER [WITH GRANT OPTION]; +eg: GRANT READ ON root.** TO ROLE role1; +eg: GRANT READ_DATA, WRITE_DATA ON root.t1.** TO USER user1; +eg: GRANT READ_DATA, WRITE_DATA ON root.t1.**,root.t2.** TO USER user1; +eg: GRANT MANAGE_ROLE ON root.** TO USER user1 WITH GRANT OPTION; +eg: GRANT ALL ON root.** TO USER user1 WITH GRANT OPTION; +``` + +Users can use deauthorization statements to revoke permissions from others. The syntax is as follows: + +```sql +REVOKE ON FROM ROLE/USER ; +eg: REVOKE READ ON root.** FROM ROLE role1; +eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.** FROM USER user1; +eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.**, root.t2.** FROM USER user1; +eg: REVOKE MANAGE_ROLE ON root.** FROM USER user1; +eg: REVOKE ALL ON ROOT.** FROM USER user1; +``` + +- **When non-administrator users execute authorization/deauthorization statements, they need to have \ permissions on \, and these permissions must be marked with WITH GRANT OPTION.** + +- When granting or revoking global permissions or when the statement contains global permissions (expanding ALL includes global permissions), you must specify the path as root**. For example, the following authorization/deauthorization statements are valid: + + ```sql + GRANT MANAGE_USER ON root.** TO USER user1; + GRANT MANAGE_ROLE ON root.** TO ROLE role1 WITH GRANT OPTION; + GRANT ALL ON root.** TO role role1 WITH GRANT OPTION; + REVOKE MANAGE_USER ON root.** FROM USER user1; + REVOKE MANAGE_ROLE ON root.** FROM ROLE role1; + REVOKE ALL ON root.** FROM ROLE role1; + ``` + + The following statements are invalid: + + ```sql + GRANT READ, MANAGE_ROLE ON root.t1.** TO USER user1; + GRANT ALL ON root.t1.t2 TO USER user1 WITH GRANT OPTION; + REVOKE ALL ON root.t1.t2 FROM USER user1; + REVOKE READ, MANAGE_ROLE ON root.t1.t2 FROM ROLE ROLE1; + ``` + +- \ must be a full path or a matching path ending with a double wildcard. The following paths are valid: + + ```sql + root.** + root.t1.t2.** + root.t1.t2.t3 + ``` + + The following paths are invalid: + + ```sql + root.t1.* + root.t1.**.t2 + root.t1*.t2.t3 + ``` + + + +## Examples + + Based on the described [sample data](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt), IoTDB's sample data may belong to different power generation groups such as ln, sgcc, and so on. Different power generation groups do not want other groups to access their database data, so we need to implement data isolation at the group level. + +#### Create Users +Use `CREATE USER ` to create users. For example, we can create two users for the ln and sgcc groups with the root user, who has all permissions, and name them ln_write_user and sgcc_write_user. It is recommended to enclose the username in backticks. 
The SQL statements are as follows: +```SQL +CREATE USER `ln_write_user` 'write_pwd' +CREATE USER `sgcc_write_user` 'write_pwd' +``` + +Now, using the SQL statement to display users: + +```sql +LIST USER +``` + +We can see that these two users have been created, and the result is as follows: + +```sql +IoTDB> CREATE USER `ln_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> LIST USER; ++---------------+ +| user| ++---------------+ +| ln_write_user| +| root| +|sgcc_write_user| ++---------------+ +Total line number = 3 +It costs 0.012s +``` + +#### Granting Permissions to Users + +At this point, although two users have been created, they do not have any permissions, so they cannot operate on the database. For example, if we use the ln_write_user to write data to the database, the SQL statement is as follows: + +```sql +INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +``` + +At this point, the system does not allow this operation, and an error is displayed: + +```sql +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +Msg: 803: No permissions for this operation, please add privilege WRITE_DATA on [root.ln.wf01.wt01.status] +``` + +Now, we will grant each user write permissions to the corresponding paths using the root user. + +We use the `GRANT ON TO USER ` statement to grant permissions to users, for example: + +```sql +GRANT WRITE_DATA ON root.ln.** TO USER `ln_write_user` +GRANT WRITE_DATA ON root.sgcc1.**, root.sgcc2.** TO USER `sgcc_write_user` +``` + +The execution status is as follows: + +```sql +IoTDB> GRANT WRITE_DATA ON root.ln.** TO USER `ln_write_user` +Msg: The statement is executed successfully. +IoTDB> GRANT WRITE_DATA ON root.sgcc1.**, root.sgcc2.** TO USER `sgcc_write_user` +Msg: The statement is executed successfully. +``` + +Then, using ln_write_user, try to write data again: + +```sql +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: The statement is executed successfully. +``` + +#### Revoking User Permissions + +After granting user permissions, we can use the `REVOKE ON FROM USER ` to revoke the permissions granted to users. For example, using the root user to revoke the permissions of ln_write_user and sgcc_write_user: + +```sql +REVOKE WRITE_DATA ON root.ln.** FROM USER `ln_write_user` +REVOKE WRITE_DATA ON root.sgcc1.**, root.sgcc2.** FROM USER `sgcc_write_user` +``` + + +The execution status is as follows: + +```sql +IoTDB> REVOKE WRITE_DATA ON root.ln.** FROM USER `ln_write_user` +Msg: The statement is executed successfully. +IoTDB> REVOKE WRITE_DATA ON root.sgcc1.**, root.sgcc2.** FROM USER `sgcc_write_user` +Msg: The statement is executed successfully. +``` + +After revoking the permissions, ln_write_user no longer has the permission to write data to root.ln.**: + +```sql +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: 803: No permissions for this operation, please add privilege WRITE_DATA on [root.ln.wf01.wt01.status] +``` + +## Other Explanations + +Roles are collections of permissions, and both permissions and roles are attributes of users. In other words, a role can have multiple permissions, and a user can have multiple roles and permissions (referred to as the user's self-permissions). + +Currently, in IoTDB, there are no conflicting permissions. 
Therefore, the actual permissions a user has are the union of their self-permissions and the permissions of all their roles. In other words, to determine if a user can perform a certain operation, it's necessary to check whether their self-permissions or the permissions of all their roles allow that operation. Self-permissions, role permissions, and the permissions of multiple roles a user has may contain the same permission, but this does not have any impact. + +It's important to note that if a user has a certain permission (corresponding to operation A) on their own, and one of their roles has the same permission, revoking the permission from the user alone will not prevent the user from performing operation A. To prevent the user from performing operation A, you need to revoke the permission from both the user and the role, or remove the user from the role that has the permission. Similarly, if you only revoke the permission from the role, it won't prevent the user from performing operation A if they have the same permission on their own. + +At the same time, changes to roles will be immediately reflected in all users who have that role. For example, adding a certain permission to a role will immediately grant that permission to all users who have that role, and removing a certain permission will cause those users to lose that permission (unless the user has it on their own). + + + +## Upgrading from a previous version + +Before version 1.3, there were many different permission types. In 1.3 version's implementation, we have streamlined the permission types. + +The permission paths in version 1.3 of the database must be either full paths or matching paths ending with a double wildcard. During system upgrades, any invalid permission paths and permission types will be automatically converted. The first invalid node on the path will be replaced with "**", and any unsupported permission types will be mapped to the permissions supported by the current system. 
+ +| Permission | Path | Mapped-Permission | Mapped-path | +|-------------------|-----------------|-------------------|---------------| +| CREATE_DATBASE | root.db.t1.* | MANAGE_DATABASE | root.** | +| INSERT_TIMESERIES | root.db.t2.*.t3 | WRITE_DATA | root.db.t2.** | +| CREATE_TIMESERIES | root.db.t2*c.t3 | WRITE_SCHEMA | root.db.** | +| LIST_ROLE | root.** | (ignore) | | + + + +You can refer to the table below for a comparison of permission types between the old and new versions (where "--IGNORE" indicates that the new version ignores that permission): + +| Permission Name | Path-Related | New Permission Name | Path-Related | +|---------------------------|--------------|---------------------|--------------| +| CREATE_DATABASE | YES | MANAGE_DATABASE | NO | +| INSERT_TIMESERIES | YES | WRITE_DATA | YES | +| UPDATE_TIMESERIES | YES | WRITE_DATA | YES | +| READ_TIMESERIES | YES | READ_DATA | YES | +| CREATE_TIMESERIES | YES | WRITE_SCHEMA | YES | +| DELETE_TIMESERIES | YES | WRITE_SCHEMA | YES | +| CREATE_USER | NO | MANAGE_USER | NO | +| DELETE_USER | NO | MANAGE_USER | NO | +| MODIFY_PASSWORD | NO | -- IGNORE | | +| LIST_USER | NO | -- IGNORE | | +| GRANT_USER_PRIVILEGE | NO | -- IGNORE | | +| REVOKE_USER_PRIVILEGE | NO | -- IGNORE | | +| GRANT_USER_ROLE | NO | MANAGE_ROLE | NO | +| REVOKE_USER_ROLE | NO | MANAGE_ROLE | NO | +| CREATE_ROLE | NO | MANAGE_ROLE | NO | +| DELETE_ROLE | NO | MANAGE_ROLE | NO | +| LIST_ROLE | NO | -- IGNORE | | +| GRANT_ROLE_PRIVILEGE | NO | -- IGNORE | | +| REVOKE_ROLE_PRIVILEGE | NO | -- IGNORE | | +| CREATE_FUNCTION | NO | USE_UDF | NO | +| DROP_FUNCTION | NO | USE_UDF | NO | +| CREATE_TRIGGER | YES | USE_TRIGGER | NO | +| DROP_TRIGGER | YES | USE_TRIGGER | NO | +| START_TRIGGER | YES | USE_TRIGGER | NO | +| STOP_TRIGGER | YES | USE_TRIGGER | NO | +| CREATE_CONTINUOUS_QUERY | NO | USE_CQ | NO | +| DROP_CONTINUOUS_QUERY | NO | USE_CQ | NO | +| ALL | NO | All privilegs | | +| DELETE_DATABASE | YES | MANAGE_DATABASE | NO | +| ALTER_TIMESERIES | YES | WRITE_SCHEMA | YES | +| UPDATE_TEMPLATE | NO | -- IGNORE | | +| READ_TEMPLATE | NO | -- IGNORE | | +| APPLY_TEMPLATE | YES | WRITE_SCHEMA | YES | +| READ_TEMPLATE_APPLICATION | NO | -- IGNORE | | +| SHOW_CONTINUOUS_QUERIES | NO | -- IGNORE | | +| CREATE_PIPEPLUGIN | NO | USE_PIPE | NO | +| DROP_PIPEPLUGINS | NO | USE_PIPE | NO | +| SHOW_PIPEPLUGINS | NO | -- IGNORE | | +| CREATE_PIPE | NO | USE_PIPE | NO | +| START_PIPE | NO | USE_PIPE | NO | +| STOP_PIPE | NO | USE_PIPE | NO | +| DROP_PIPE | NO | USE_PIPE | NO | +| SHOW_PIPES | NO | -- IGNORE | | +| CREATE_VIEW | YES | WRITE_SCHEMA | YES | +| ALTER_VIEW | YES | WRITE_SCHEMA | YES | +| RENAME_VIEW | YES | WRITE_SCHEMA | YES | +| DELETE_VIEW | YES | WRITE_SCHEMA | YES | diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_apache.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_apache.md new file mode 100644 index 00000000..16e58170 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_apache.md @@ -0,0 +1,530 @@ + + +# Data Synchronisation + +Data synchronization is a typical requirement in industrial Internet of Things (IoT). Through data synchronization mechanisms, it is possible to achieve data sharing between IoTDB, and to establish a complete data link to meet the needs for internal and external network data interconnectivity, edge-cloud synchronization, data migration, and data backup. 
+ +## Function Overview + +### Data Synchronization + +A data synchronization task consists of three stages: + +![](https://alioss.timecho.com/docs/img/sync_en_01.png) + +- Source Stage:This part is used to extract data from the source IoTDB, defined in the source section of the SQL statement. +- Process Stage:This part is used to process the data extracted from the source IoTDB, defined in the processor section of the SQL statement. +- Sink Stage:This part is used to send data to the target IoTDB, defined in the sink section of the SQL statement. + +By declaratively configuring the specific content of the three parts through SQL statements, flexible data synchronization capabilities can be achieved. Currently, data synchronization supports the synchronization of the following information, and you can select the synchronization scope when creating a synchronization task (the default is data.insert, which means synchronizing newly written data): + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Synchronization ScopeSynchronization Content Description
allAll scopes
data(Data)insertSynchronize newly written data
deleteSynchronize deleted data
schemadatabaseSynchronize database creation, modification or deletion operations
timeseriesSynchronize the definition and attributes of time series
TTLSynchronize the data retention time
auth-Synchronize user permissions and access control
+ +### Functional limitations and instructions + +The schema and auth synchronization functions have the following limitations: + +- When using schema synchronization, it is required that the consensus protocol for `Schema region` and `ConfigNode` must be the default ratis protocol. This means that the `iotdb-system.properties` configuration file should contain the settings `config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus` and `schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus`. If these are not specified, the default ratis protocol is used. + +- To prevent potential conflicts, please disable the automatic creation of schema on the receiving end when enabling schema synchronization. This can be done by setting the `enable_auto_create_schema` configuration in the `iotdb-system.properties` file to false. + +- When schema synchronization is enabled, the use of custom plugins is not supported. + +- During data synchronization tasks, please avoid performing any deletion operations to prevent inconsistent states between the two ends. + +## Usage Instructions + +Data synchronization tasks have three states: RUNNING, STOPPED, and DROPPED. The task state transitions are shown in the following diagram: + +![](https://alioss.timecho.com/docs/img/Data-Sync02.png) + +After creation, the task will start directly, and when the task stops abnormally, the system will automatically attempt to restart the task. + +Provide the following SQL statements for state management of synchronization tasks. + +### Create Task + +Use the `CREATE PIPE` statement to create a data synchronization task. The `PipeId` and `sink` attributes are required, while `source` and `processor` are optional. When entering the SQL, note that the order of the `SOURCE` and `SINK` plugins cannot be swapped. + +The SQL example is as follows: + +```SQL +CREATE PIPE [IF NOT EXISTS] -- PipeId is the name that uniquely identifies the task. +-- Data extraction plugin, optional plugin +WITH SOURCE ( + [ = ,], +) +-- Data processing plugin, optional plugin +WITH PROCESSOR ( + [ = ,], +) +-- Data connection plugin, required plugin +WITH SINK ( + [ = ,], +) +``` + +**IF NOT EXISTS semantics**: Used in creation operations to ensure that the create command is executed when the specified Pipe does not exist, preventing errors caused by attempting to create an existing Pipe. + +### Start Task + +Start processing data: + +```SQL +START PIPE +``` + +### Stop Task + +Stop processing data: + +```SQL +STOP PIPE +``` + +### Delete Task + +Deletes the specified task: + +```SQL +DROP PIPE [IF EXISTS] +``` +**IF EXISTS semantics**: Used in deletion operations to ensure that when a specified Pipe exists, the delete command is executed to prevent errors caused by attempting to delete non-existent Pipes. + +Deleting a task does not require stopping the synchronization task first. 
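Putting the statements above together, a typical task lifecycle might look as follows. This is a minimal sketch; the pipe name `demo_pipe` and the target address are illustrative:

```SQL
-- Create a task that forwards newly written data to a target IoTDB DataNode
CREATE PIPE IF NOT EXISTS demo_pipe
WITH SINK (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668'
)

-- The task starts running once created; it can be paused and resumed at any time
STOP PIPE demo_pipe
START PIPE demo_pipe

-- Remove the task; it does not need to be stopped first
DROP PIPE IF EXISTS demo_pipe
```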
+ +### View Task + +View all tasks: + +```SQL +SHOW PIPES +``` + +To view a specified task: + +```SQL +SHOW PIPE +``` + +Example of the show pipes result for a pipe: + +```SQL ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +| ID| CreationTime| State|PipeSource|PipeProcessor| PipeSink|ExceptionMessage|RemainingEventCount|EstimatedRemainingSeconds| ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +|59abf95db892428b9d01c5fa318014ea|2024-06-17T14:03:44.189|RUNNING| {}| {}|{sink=iotdb-thrift-sink, sink.ip=127.0.0.1, sink.port=6668}| | 128| 1.03| ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +``` + +The meanings of each column are as follows: + +- **ID**:The unique identifier for the synchronization task +- **CreationTime**:The time when the synchronization task was created +- **State**:The state of the synchronization task +- **PipeSource**:The source of the synchronized data stream +- **PipeProcessor**:The processing logic of the synchronized data stream during transmission +- **PipeSink**:The destination of the synchronized data stream +- **ExceptionMessage**:Displays the exception information of the synchronization task +- **RemainingEventCount (Statistics with Delay)**: The number of remaining events, which is the total count of all events in the current data synchronization task, including data and schema synchronization events, as well as system and user-defined events. +- **EstimatedRemainingSeconds (Statistics with Delay)**: The estimated remaining time, based on the current number of events and the rate at the pipe, to complete the transfer. + +### Synchronization Plugins + +To make the overall architecture more flexible to match different synchronization scenario requirements, we support plugin assembly within the synchronization task framework. The system comes with some pre-installed common plugins that you can use directly. At the same time, you can also customize processor plugins and Sink plugins, and load them into the IoTDB system for use. 
You can view the plugins in the system (including custom and built-in plugins) with the following statement: + +```SQL +SHOW PIPEPLUGINS +``` + +The return result is as follows (version 1.3.2): + +```SQL +IoTDB> SHOW PIPEPLUGINS ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ +| PluginName|PluginType| ClassName| PluginJar| ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ +| DO-NOTHING-PROCESSOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.processor.donothing.DoNothingProcessor| | +| DO-NOTHING-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.donothing.DoNothingConnector| | +| IOTDB-SOURCE| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.extractor.iotdb.IoTDBExtractor| | +| IOTDB-THRIFT-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftConnector| | +| IOTDB-THRIFT-SSL-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftSslConnector| | ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ + +``` + +Detailed introduction of pre-installed plugins is as follows (for detailed parameters of each plugin, please refer to the [Parameter Description](#reference-parameter-description) section): + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
TypeCustom PluginPlugin NameDescriptionApplicable Version
source pluginNot Supportediotdb-sourceThe default extractor plugin, used to extract historical or real-time data from IoTDB1.2.x
processor pluginSupporteddo-nothing-processorThe default processor plugin, which does not process the incoming data1.2.x
sink pluginSupporteddo-nothing-sinkDoes not process the data that is sent out1.2.x
iotdb-thrift-sinkThe default sink plugin ( V1.3.1+ ), used for data transfer between IoTDB ( V1.2.0+ ) and IoTDB( V1.2.0+ ) . It uses the Thrift RPC framework to transfer data, with a multi-threaded async non-blocking IO model, high transfer performance, especially suitable for scenarios where the target end is distributed1.2.x
iotdb-thrift-ssl-sinkUsed for data transfer between IoTDB ( V1.3.1+ ) and IoTDB ( V1.2.0+ ). It uses the Thrift RPC framework to transfer data, with a single-threaded sync blocking IO model, suitable for scenarios with higher security requirements1.3.1+
+ +For importing custom plugins, please refer to the [Stream Processing](./Streaming_timecho.md#custom-stream-processing-plugin-management) section. + +## Use examples + +### Full data synchronisation + +This example is used to demonstrate the synchronisation of all data from one IoTDB to another IoTDB with the data link as shown below: + +![](https://alioss.timecho.com/upload/pipe1.jpg) + +In this example, we can create a synchronization task named A2B to synchronize the full data from A IoTDB to B IoTDB. The iotdb-thrift-sink plugin (built-in plugin) for the sink is required. The URL of the data service port of the DataNode node on the target IoTDB needs to be configured through node-urls, as shown in the following example statement: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +``` + +### Partial data synchronization + +This example is used to demonstrate the synchronisation of data from a certain historical time range (8:00pm 23 August 2023 to 8:00pm 23 October 2023) to another IoTDB, the data link is shown below: + +![](https://alioss.timecho.com/upload/pipe2.jpg) + +In this example, we can create a synchronization task named A2B. First, we need to define the range of data to be transferred in the source. Since the data being transferred is historical data (historical data refers to data that existed before the creation of the synchronization task), we need to configure the start-time and end-time of the data and the transfer mode mode. The URL of the data service port of the DataNode node on the target IoTDB needs to be configured through node-urls. + +The detailed statements are as follows: + +```SQL +create pipe A2B +WITH SOURCE ( + 'source'= 'iotdb-source', + 'realtime.mode' = 'stream' -- The extraction mode for newly inserted data (after pipe creation) + 'start-time' = '2023.08.23T08:00:00+00:00', -- The start event time for synchronizing all data, including start-time + 'end-time' = '2023.10.23T08:00:00+00:00' -- The end event time for synchronizing all data, including end-time +) +with SINK ( + 'sink'='iotdb-thrift-async-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +) +``` + +### Edge-cloud data transfer + +This example is used to demonstrate the scenario where data from multiple IoTDB is transferred to the cloud, with data from clusters B, C, and D all synchronized to cluster A, as shown in the figure below: + +![](https://alioss.timecho.com/docs/img/sync_en_03.png) + +In this example, to synchronize the data from clusters B, C, and D to A, the pipe between BA, CA, and DA needs to configure the `path` to limit the range, and to keep the edge and cloud data consistent, the pipe needs to be configured with `inclusion=all` to synchronize full data and metadata. 
The detailed statement is as follows:

On B IoTDB, execute the following statement to synchronize data from B to A:

```SQL
create pipe BA
with source (
  'inclusion'='all',   -- Indicates synchronization of full data, schema, and auth
  'path'='root.db.**'  -- Limit the range
)
with sink (
  'sink'='iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668' -- The URL of the data service port of the DataNode node on the target IoTDB
)
```

On C IoTDB, execute the following statement to synchronize data from C to A:

```SQL
create pipe CA
with source (
  'inclusion'='all',   -- Indicates synchronization of full data, schema, and auth
  'path'='root.db.**'  -- Limit the range
)
with sink (
  'sink'='iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668' -- The URL of the data service port of the DataNode node on the target IoTDB
)
```

On D IoTDB, execute the following statement to synchronize data from D to A:

```SQL
create pipe DA
with source (
  'inclusion'='all',   -- Indicates synchronization of full data, schema, and auth
  'path'='root.db.**'  -- Limit the range
)
with sink (
  'sink'='iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668' -- The URL of the data service port of the DataNode node on the target IoTDB
)
```

### Cascading data transfer

This example is used to demonstrate the scenario where data is transferred in a cascading manner between multiple IoTDB instances, with data from cluster A synchronized to cluster B, and then to cluster C, as shown in the figure below:

![](https://alioss.timecho.com/docs/img/sync_en_04.png)

In this example, to synchronize the data from cluster A to C, the `forwarding-pipe-requests` option of the pipe between B and C needs to be set to `true`. The detailed statement is as follows:

On A IoTDB, execute the following statement to synchronize data from A to B:

```SQL
create pipe AB
with sink (
  'sink'='iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668' -- The URL of the data service port of the DataNode node on the target IoTDB
)
```

On B IoTDB, execute the following statement to synchronize data from B to C:

```SQL
create pipe BC
with source (
  'forwarding-pipe-requests' = 'true' -- Whether to forward data written by other Pipes
)
with sink (
  'sink'='iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6669' -- The URL of the data service port of the DataNode node on the target IoTDB
)
```


### Compression Synchronization (V1.3.3+)

IoTDB supports specifying the data compression method during synchronization. Real-time compression and transmission of data can be achieved by configuring the `compressor` parameter. `compressor` currently supports 5 optional algorithms: snappy/gzip/lz4/zstd/lzma2, and multiple compression algorithms can be combined; they are applied in the configured order. `rate-limit-bytes-per-second` (supported in V1.3.3 and later versions) is the maximum number of bytes allowed to be transmitted per second, calculated as compressed bytes. If it is less than 0, there is no limit.

For example, to create a synchronization task named A2B:

```SQL
create pipe A2B
with sink (
  'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB
  'compressor' = 'snappy,lz4' -- Compression algorithms
)
```

### Encrypted Synchronization (V1.3.1+)

IoTDB supports the use of SSL encryption during the synchronization process, ensuring the secure transfer of data between different IoTDB instances.
By configuring SSL-related parameters, such as the certificate address and password (`ssl.trust-store-path`)、(`ssl.trust-store-pwd`), data can be protected by SSL encryption during the synchronization process. + +For example, to create a synchronization task named A2B: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-thrift-ssl-sink', + 'node-urls'='127.0.0.1:6667', -- The URL of the data service port of the DataNode node on the target IoTDB + 'ssl.trust-store-path'='pki/trusted', -- The trust store certificate path required to connect to the target DataNode + 'ssl.trust-store-pwd'='root' -- The trust store certificate password required to connect to the target DataNode +) +``` + +## Reference: Notes + +You can adjust the parameters for data synchronization by modifying the IoTDB configuration file (`iotdb-system.properties`), such as the directory for storing synchronized data. The complete configuration is as follows: + +V1.3.3+: + +```Properties +# pipe_receiver_file_dir +# If this property is unset, system will save the data in the default relative path directory under the IoTDB folder(i.e., %IOTDB_HOME%/${cn_system_dir}/pipe/receiver). +# If it is absolute, system will save the data in the exact location it points to. +# If it is relative, system will save the data in the relative path directory it indicates under the IoTDB folder. +# Note: If pipe_receiver_file_dir is assigned an empty string(i.e.,zero-size), it will be handled as a relative path. +# effectiveMode: restart +# For windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is absolute. Otherwise, it is relative. +# pipe_receiver_file_dir=data\\confignode\\system\\pipe\\receiver +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_receiver_file_dir=data/confignode/system/pipe/receiver + +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# effectiveMode: first_start +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# effectiveMode: restart +# Datatype: int +pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# effectiveMode: restart +# Datatype: int +pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# effectiveMode: restart +# Datatype: int +pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. +# effectiveMode: restart +# Datatype: int +pipe_sink_max_client_number=16 + +# The total bytes that all pipe sinks can transfer per second. +# When given a value less than or equal to 0, it means no limit. +# default value is -1, which means no limit. 
+# effectiveMode: hot_reload +# Datatype: double +pipe_all_sinks_rate_limit_bytes_per_second=-1 +``` + +## Reference: parameter description + +### source parameter(V1.3.3) + +| key | value | value range | required or not | default value | +| :------------------------------ | :----------------------------------------------------------- | :------------------------------------- | :------- | :------------- | +| source | iotdb-source | String: iotdb-source | Required | - | +| inclusion | Used to specify the range of data to be synchronized in the data synchronization task, including data, schema, and auth | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | Optional | data.insert | +| inclusion.exclusion | Used to exclude specific operations from the range specified by inclusion, reducing the amount of data synchronized | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | Optional | - | +| path | Used to filter the path pattern schema of time series and data to be synchronized / schema synchronization can only use pathpath is exact matching, parameters must be prefix paths or complete paths, i.e., cannot contain `"*"`, at most one `"**"` at the end of the path parameter | String:IoTDB pattern | Optional | root.** | +| pattern | Used to filter the path prefix of time series | String: Optional | Optional | root | +| start-time | The start event time for synchronizing all data, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MIN_VALUE | +| end-time | The end event time for synchronizing all data, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MAX_VALUE | +| realtime.mode | The extraction mode for newly inserted data (after pipe creation) | String: batch | Optional | batch | +| forwarding-pipe-requests | Whether to forward data written by other Pipes (usually data synchronization) | Boolean: true | Optional | true | +| history.loose-range | When transferring TsFile, whether to relax the range of historical data (before the creation of the pipe). "": Do not relax the range, select data strictly according to the set conditions. "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency. "path": Relax the path range to avoid splitting TsFile, which can improve synchronization efficiency. "time, path", "path, time", "all": Relax all ranges to avoid splitting TsFile, which can improve synchronization efficiency. | String: "" 、 "time" 、 "path" 、 "time, path" 、 "path, time" 、 "all" | Optional |""| +| realtime.loose-range | When transferring TsFile, whether to relax the range of real-time data (before the creation of the pipe). "": Do not relax the range, select data strictly according to the set conditions. "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency. "path": Relax the path range to avoid splitting TsFile, which can improve synchronization efficiency. "time, path", "path, time", "all": Relax all ranges to avoid splitting TsFile, which can improve synchronization efficiency. | String: "" 、 "time" 、 "path" 、 "time, path" 、 "path, time" 、 "all" | Optional |""| +| mods.enable | Whether to send the mods file of tsfile | Boolean: true / false | Optional | false | + +> 💎 **Explanation**:To maintain compatibility with lower versions, history.enable, history.start-time, history.end-time, realtime.enable can still be used, but they are not recommended in the new version. 
+> +> 💎 **Explanation: Differences between Stream and Batch Data Extraction Modes** +> - **stream (recommended)**: In this mode, tasks process and send data in real-time. It is characterized by high timeliness and low throughput. +> - **batch**: In this mode, tasks process and send data in batches (according to the underlying data files). It is characterized by low timeliness and high throughput. + + +## sink parameter + +> In versions 1.3.3 and above, when only the sink is included, the additional "with sink" prefix is no longer required. + +#### iotdb-thrift-sink + + +| key | value | value Range | required or not | Default Value | +| :--------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | +| sink | iotdb-thrift-sink or iotdb-thrift-async-sink | String: iotdb-thrift-sink or iotdb-thrift-async-sink | Required | | +| node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB (please note that synchronization tasks do not support forwarding to its own service) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | +| batch.enable | Whether to enable batched log transmission mode to improve transmission throughput and reduce IOPS | Boolean: true, false | Optional | true | +| batch.max-delay-seconds | Effective when batched log transmission mode is enabled, it represents the maximum waiting time for a batch of data before sending (unit: s) | Integer | Optional | 1 | +| batch.size-bytes | Effective when batched log transmission mode is enabled, it represents the maximum batch size for a batch of data (unit: byte) | Long | Optional | 16*1024*1024 | + +#### iotdb-thrift-ssl-sink + +| key | value | value Range | required or not | Default Value | +| :---------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | +| sink | iotdb-thrift-ssl-sink | String: iotdb-thrift-ssl-sink | Required | - | +| node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB (please note that synchronization tasks do not support forwarding to its own service) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | +| batch.enable | Whether to enable batched log transmission mode to improve transmission throughput and reduce IOPS | Boolean: true, false | Optional | true | +| batch.max-delay-seconds | Effective when batched log transmission mode is enabled, it represents the maximum waiting time for a batch of data before sending (unit: s) | Integer | Optional | 1 | +| batch.size-bytes | Effective when batched log transmission mode is enabled, it represents the maximum batch size for a batch of data (unit: byte) | Long | Optional | 16*1024*1024 | +| ssl.trust-store-path | The trust store certificate path required to connect to the target DataNode | String: certificate directory name, when configured as a relative directory, it is relative to the IoTDB root directory. 
Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667'| Required | - | +| ssl.trust-store-pwd | The trust store certificate password required to connect to the target DataNode | Integer | Required | - | diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_timecho.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_timecho.md new file mode 100644 index 00000000..d7084b4f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_timecho.md @@ -0,0 +1,610 @@ + + +# Data Sync + +Data synchronization is a typical requirement in industrial Internet of Things (IoT). Through data synchronization mechanisms, it is possible to achieve data sharing between IoTDB, and to establish a complete data link to meet the needs for internal and external network data interconnectivity, edge-cloud synchronization, data migration, and data backup. + +## Function Overview + +### Data Synchronization + +A data synchronization task consists of three stages: + +![](https://alioss.timecho.com/docs/img/sync_en_01.png) + +- Source Stage:This part is used to extract data from the source IoTDB, defined in the source section of the SQL statement. +- Process Stage:This part is used to process the data extracted from the source IoTDB, defined in the processor section of the SQL statement. +- Sink Stage:This part is used to send data to the target IoTDB, defined in the sink section of the SQL statement. + +By declaratively configuring the specific content of the three parts through SQL statements, flexible data synchronization capabilities can be achieved. Currently, data synchronization supports the synchronization of the following information, and you can select the synchronization scope when creating a synchronization task (the default is data.insert, which means synchronizing newly written data): + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Synchronization Scope | Synchronization Content | Description |
+| --------------------- | ----------------------- | ------------------------------------------------------------------- |
+| all | - | All scopes |
+| data (Data) | insert | Synchronize newly written data |
+| data (Data) | delete | Synchronize deleted data |
+| schema | database | Synchronize database creation, modification or deletion operations |
+| schema | timeseries | Synchronize the definition and attributes of time series |
+| schema | TTL | Synchronize the data retention time |
+| auth | - | Synchronize user permissions and access control |
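+
+For example, a task that synchronizes everything except deletion operations can combine the `inclusion` and `inclusion.exclusion` keys (see the source parameter table in the Reference section for the accepted values). The sketch below is illustrative only: the pipe name and target address are placeholders, and the value syntax should be checked against the parameter table.
+
+```SQL
+create pipe scopeExample
+with source (
+  'inclusion' = 'all',                    -- synchronize data, schema and auth
+  'inclusion.exclusion' = 'data.delete'   -- but do not synchronize deletions
+)
+with sink (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668'          -- placeholder target DataNode address
+)
+```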
+ +### Functional limitations and instructions + +The schema and auth synchronization functions have the following limitations: + +- When using schema synchronization, it is required that the consensus protocol for `Schema region` and `ConfigNode` must be the default ratis protocol. This means that the `iotdb-system.properties` configuration file should contain the settings `config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus` and `schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus`. If these are not specified, the default ratis protocol is used. + +- To prevent potential conflicts, please disable the automatic creation of schema on the receiving end when enabling schema synchronization. This can be done by setting the `enable_auto_create_schema` configuration in the `iotdb-system.properties` file to false. + +- When schema synchronization is enabled, the use of custom plugins is not supported. + +- In a dual-active cluster, schema synchronization should avoid simultaneous operations on both ends. + +- During data synchronization tasks, please avoid performing any deletion operations to prevent inconsistent states between the two ends. + +## Usage Instructions + +Data synchronization tasks have three states: RUNNING, STOPPED, and DROPPED. The task state transitions are shown in the following diagram: + +![](https://alioss.timecho.com/docs/img/Data-Sync02.png) + +After creation, the task will start directly, and when the task stops abnormally, the system will automatically attempt to restart the task. + +Provide the following SQL statements for state management of synchronization tasks. + +### Create Task + +Use the `CREATE PIPE` statement to create a data synchronization task. The `PipeId` and `sink` attributes are required, while `source` and `processor` are optional. When entering the SQL, note that the order of the `SOURCE` and `SINK` plugins cannot be swapped. + +The SQL example is as follows: + +```SQL +CREATE PIPE [IF NOT EXISTS] -- PipeId is the name that uniquely identifies the task. +-- Data extraction plugin, optional plugin +WITH SOURCE ( + [ = ,], +) +-- Data processing plugin, optional plugin +WITH PROCESSOR ( + [ = ,], +) +-- Data connection plugin, required plugin +WITH SINK ( + [ = ,], +) +``` + +**IF NOT EXISTS semantics**: Used in creation operations to ensure that the create command is executed when the specified Pipe does not exist, preventing errors caused by attempting to create an existing Pipe. + +### Start Task + +Start processing data: + +```SQL +START PIPE +``` + +### Stop Task + +Stop processing data: + +```SQL +STOP PIPE +``` + +### Delete Task + +Deletes the specified task: + +```SQL +DROP PIPE [IF EXISTS] +``` +**IF EXISTS semantics**: Used in deletion operations to ensure that when a specified Pipe exists, the delete command is executed to prevent errors caused by attempting to delete non-existent Pipes. + +Deleting a task does not require stopping the synchronization task first. 
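+
+As a quick end-to-end reference, the statements above can be combined as follows. This is only a sketch: the pipe name `demoPipe` and the target address are illustrative, and `SHOW PIPE` is described in the next section.
+
+```SQL
+-- Create the task; it starts running automatically after creation
+CREATE PIPE demoPipe
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'node-urls' = '127.0.0.1:6668'   -- illustrative target DataNode address
+);
+
+-- Check its status
+SHOW PIPE demoPipe;
+
+-- Pause and resume processing
+STOP PIPE demoPipe;
+START PIPE demoPipe;
+
+-- Remove the task; stopping it first is not required
+DROP PIPE IF EXISTS demoPipe;
+```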
+ +### View Task + +View all tasks: + +```SQL +SHOW PIPES +``` + +To view a specified task: + +```SQL +SHOW PIPE +``` + +Example of the show pipes result for a pipe: + +```SQL ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +| ID| CreationTime| State|PipeSource|PipeProcessor| PipeSink|ExceptionMessage|RemainingEventCount|EstimatedRemainingSeconds| ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +|59abf95db892428b9d01c5fa318014ea|2024-06-17T14:03:44.189|RUNNING| {}| {}|{sink=iotdb-thrift-sink, sink.ip=127.0.0.1, sink.port=6668}| | 128| 1.03| ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +``` + +The meanings of each column are as follows: + +- **ID**:The unique identifier for the synchronization task +- **CreationTime**:The time when the synchronization task was created +- **State**:The state of the synchronization task +- **PipeSource**:The source of the synchronized data stream +- **PipeProcessor**:The processing logic of the synchronized data stream during transmission +- **PipeSink**:The destination of the synchronized data stream +- **ExceptionMessage**:Displays the exception information of the synchronization task +- **RemainingEventCount (Statistics with Delay)**: The number of remaining events, which is the total count of all events in the current data synchronization task, including data and schema synchronization events, as well as system and user-defined events. +- **EstimatedRemainingSeconds (Statistics with Delay)**: The estimated remaining time, based on the current number of events and the rate at the pipe, to complete the transfer. + +### Synchronization Plugins + +To make the overall architecture more flexible to match different synchronization scenario requirements, we support plugin assembly within the synchronization task framework. The system comes with some pre-installed common plugins that you can use directly. At the same time, you can also customize processor plugins and Sink plugins, and load them into the IoTDB system for use. 
You can view the plugins in the system (including custom and built-in plugins) with the following statement: + +```SQL +SHOW PIPEPLUGINS +``` + +The return result is as follows (version 1.3.2): + +```SQL +IoTDB> SHOW PIPEPLUGINS ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ +| PluginName|PluginType| ClassName| PluginJar| ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ +| DO-NOTHING-PROCESSOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.processor.donothing.DoNothingProcessor| | +| DO-NOTHING-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.donothing.DoNothingConnector| | +| IOTDB-AIR-GAP-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.airgap.IoTDBAirGapConnector| | +| IOTDB-SOURCE| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.extractor.iotdb.IoTDBExtractor| | +| IOTDB-THRIFT-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftConnector| | +| IOTDB-THRIFT-SSL-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftSslConnector| | ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ + +``` + +Detailed introduction of pre-installed plugins is as follows (for detailed parameters of each plugin, please refer to the [Parameter Description](#reference-parameter-description) section): + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Type | Custom Plugin | Plugin Name | Description | Applicable Version |
+| ---- | ------------- | ----------- | ----------- | ------------------ |
+| source plugin | Not Supported | iotdb-source | The default extractor plugin, used to extract historical or real-time data from IoTDB | 1.2.x |
+| processor plugin | Supported | do-nothing-processor | The default processor plugin, which does not process the incoming data | 1.2.x |
+| sink plugin | Supported | do-nothing-sink | Does not process the data that is sent out | 1.2.x |
+| sink plugin | Supported | iotdb-thrift-sink | The default sink plugin (V1.3.1+), used for data transfer between IoTDB (V1.2.0+) and IoTDB (V1.2.0+). It uses the Thrift RPC framework to transfer data, with a multi-threaded async non-blocking IO model and high transfer performance, especially suitable for scenarios where the target end is distributed | 1.2.x |
+| sink plugin | Supported | iotdb-air-gap-sink | Used for data synchronization across unidirectional data diodes from IoTDB (V1.2.0+) to IoTDB (V1.2.0+). Supported diode models include Nanrui Syskeeper 2000, etc. | 1.2.x |
+| sink plugin | Supported | iotdb-thrift-ssl-sink | Used for data transfer between IoTDB (V1.3.1+) and IoTDB (V1.2.0+). It uses the Thrift RPC framework to transfer data, with a single-threaded sync blocking IO model, suitable for scenarios with higher security requirements | 1.3.1+ |
+ +For importing custom plugins, please refer to the [Stream Processing](./Streaming_timecho.md#custom-stream-processing-plugin-management) section. + +## Use examples + +### Full data synchronisation + +This example is used to demonstrate the synchronisation of all data from one IoTDB to another IoTDB with the data link as shown below: + +![](https://alioss.timecho.com/upload/pipe1.jpg) + +In this example, we can create a synchronization task named A2B to synchronize the full data from A IoTDB to B IoTDB. The iotdb-thrift-sink plugin (built-in plugin) for the sink is required. The URL of the data service port of the DataNode node on the target IoTDB needs to be configured through node-urls, as shown in the following example statement: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +``` + +### Partial data synchronization + +This example is used to demonstrate the synchronisation of data from a certain historical time range (8:00pm 23 August 2023 to 8:00pm 23 October 2023) to another IoTDB, the data link is shown below: + +![](https://alioss.timecho.com/upload/pipe2.jpg) + +In this example, we can create a synchronization task named A2B. First, we need to define the range of data to be transferred in the source. Since the data being transferred is historical data (historical data refers to data that existed before the creation of the synchronization task), we need to configure the start-time and end-time of the data and the transfer mode mode. The URL of the data service port of the DataNode node on the target IoTDB needs to be configured through node-urls. + +The detailed statements are as follows: + +```SQL +create pipe A2B +WITH SOURCE ( + 'source'= 'iotdb-source', + 'realtime.mode' = 'stream' -- The extraction mode for newly inserted data (after pipe creation) + 'start-time' = '2023.08.23T08:00:00+00:00', -- The start event time for synchronizing all data, including start-time + 'end-time' = '2023.10.23T08:00:00+00:00' -- The end event time for synchronizing all data, including end-time +) +with SINK ( + 'sink'='iotdb-thrift-async-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +) +``` + +### Bidirectional data transfer + +This example is used to demonstrate the scenario where two IoTDB act as active-active pairs, with the data link shown in the figure below: + +![](https://alioss.timecho.com/upload/pipe3.jpg) + +In this example, to avoid infinite data loops, the `forwarding-pipe-requests` parameter on A and B needs to be set to `false`, indicating that data transmitted from another pipe is not forwarded, and to keep the data consistent on both sides, the pipe needs to be configured with `inclusion=all` to synchronize full data and metadata. 
+ +The detailed statement is as follows: + +On A IoTDB, execute the following statement: + +```SQL +create pipe AB +with source ( + 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth + 'forwarding-pipe-requests' = 'false' -- Do not forward data written by other Pipes +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +) +``` + +On B IoTDB, execute the following statement: + +```SQL +create pipe BA +with source ( + 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth + 'forwarding-pipe-requests' = 'false' -- Do not forward data written by other Pipes +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6667', -- The URL of the data service port of the DataNode node on the target IoTDB +) +``` + +### Edge-cloud data transfer + +This example is used to demonstrate the scenario where data from multiple IoTDB is transferred to the cloud, with data from clusters B, C, and D all synchronized to cluster A, as shown in the figure below: + +![](https://alioss.timecho.com/docs/img/sync_en_03.png) + +In this example, to synchronize the data from clusters B, C, and D to A, the pipe between BA, CA, and DA needs to configure the `path` to limit the range, and to keep the edge and cloud data consistent, the pipe needs to be configured with `inclusion=all` to synchronize full data and metadata. The detailed statement is as follows: + +On B IoTDB, execute the following statement to synchronize data from B to A: + +```SQL +create pipe BA +with source ( + 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth + 'path'='root.db.**', -- Limit the range +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +) +) +``` + +On C IoTDB, execute the following statement to synchronize data from C to A: + +```SQL +create pipe CA +with source ( + 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth + 'path'='root.db.**', -- Limit the range +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +) +) +``` + +On D IoTDB, execute the following statement to synchronize data from D to A: + +```SQL +create pipe DA +with source ( + 'inclusion'='all', -- Indicates synchronization of full data, schema , and auth + 'path'='root.db.**', -- Limit the range +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +) +) +``` + +### Cascading data transfer + +This example is used to demonstrate the scenario where data is transferred in a cascading manner between multiple IoTDB, with data from cluster A synchronized to cluster B, and then to cluster C, as shown in the figure below: + +![](https://alioss.timecho.com/docs/img/sync_en_04.png) + +In this example, to synchronize the data from cluster A to C, the `forwarding-pipe-requests` needs to be set to `true` between BC. 
The detailed statement is as follows: + +On A IoTDB, execute the following statement to synchronize data from A to B: + +```SQL +create pipe AB +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB +) +) +``` + +On B IoTDB, execute the following statement to synchronize data from B to C: + +```SQL +create pipe BC +with source ( + 'forwarding-pipe-requests' = 'true' -- Whether to forward data written by other Pipes +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6669', -- The URL of the data service port of the DataNode node on the target IoTDB +) +) +``` + +### Cross-gate data transfer + +This example is used to demonstrate the scenario where data from one IoTDB is synchronized to another IoTDB through a unidirectional gateway, as shown in the figure below: + +![](https://alioss.timecho.com/upload/pipe5.jpg) + + +In this example, the iotdb-air-gap-sink plugin in the sink task needs to be used (currently supports some gateway models, for specific models, please contact Timecho staff for confirmation). After configuring the gateway, execute the following statement on A IoTDB. Fill in the node-urls with the URL of the data service port of the DataNode node on the target IoTDB configured by the gateway, as detailed below: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-air-gap-sink', + 'node-urls' = '10.53.53.53:9780', -- The URL of the data service port of the DataNode node on the target IoTDB +``` + +### Compression Synchronization (V1.3.3+) + +IoTDB supports specifying data compression methods during synchronization. Real time compression and transmission of data can be achieved by configuring the `compressor` parameter. `Compressor` currently supports 5 optional algorithms: snappy/gzip/lz4/zstd/lzma2, and can choose multiple compression algorithm combinations to compress in the order of configuration `rate-limit-bytes-per-second`(supported in V1.3.3 and later versions) is the maximum number of bytes allowed to be transmitted per second, calculated as compressed bytes. If it is less than 0, there is no limit. + +For example, to create a synchronization task named A2B: + +```SQL +create pipe A2B +with sink ( + 'node-urls' = '127.0.0.1:6668', -- The URL of the data service port of the DataNode node on the target IoTDB + 'compressor' = 'snappy,lz4' -- Compression algorithms +) +``` + +### Encrypted Synchronization (V1.3.1+) + +IoTDB supports the use of SSL encryption during the synchronization process, ensuring the secure transfer of data between different IoTDB instances. By configuring SSL-related parameters, such as the certificate address and password (`ssl.trust-store-path`)、(`ssl.trust-store-pwd`), data can be protected by SSL encryption during the synchronization process. + +For example, to create a synchronization task named A2B: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-thrift-ssl-sink', + 'node-urls'='127.0.0.1:6667', -- The URL of the data service port of the DataNode node on the target IoTDB + 'ssl.trust-store-path'='pki/trusted', -- The trust store certificate path required to connect to the target DataNode + 'ssl.trust-store-pwd'='root' -- The trust store certificate password required to connect to the target DataNode +) +``` + +## Reference: Notes + +You can adjust the parameters for data synchronization by modifying the IoTDB configuration file (`iotdb-system.properties`), such as the directory for storing synchronized data. 
The complete configuration is as follows: + +V1.3.3+: + +```Properties +# pipe_receiver_file_dir +# If this property is unset, system will save the data in the default relative path directory under the IoTDB folder(i.e., %IOTDB_HOME%/${cn_system_dir}/pipe/receiver). +# If it is absolute, system will save the data in the exact location it points to. +# If it is relative, system will save the data in the relative path directory it indicates under the IoTDB folder. +# Note: If pipe_receiver_file_dir is assigned an empty string(i.e.,zero-size), it will be handled as a relative path. +# effectiveMode: restart +# For windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is absolute. Otherwise, it is relative. +# pipe_receiver_file_dir=data\\confignode\\system\\pipe\\receiver +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_receiver_file_dir=data/confignode/system/pipe/receiver + +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# effectiveMode: first_start +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# effectiveMode: restart +# Datatype: int +pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# effectiveMode: restart +# Datatype: int +pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# effectiveMode: restart +# Datatype: int +pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. +# effectiveMode: restart +# Datatype: int +pipe_sink_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# effectiveMode: restart +# Datatype: Boolean +pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. +# Datatype: int +# effectiveMode: restart +pipe_air_gap_receiver_port=9780 + +# The total bytes that all pipe sinks can transfer per second. +# When given a value less than or equal to 0, it means no limit. +# default value is -1, which means no limit. 
+# effectiveMode: hot_reload +# Datatype: double +pipe_all_sinks_rate_limit_bytes_per_second=-1 +``` + +## Reference: parameter description + +### source parameter(V1.3.3) + +| key | value | value range | required or not | default value | +| :------------------------------ | :----------------------------------------------------------- | :------------------------------------- | :------- | :------------- | +| source | iotdb-source | String: iotdb-source | Required | - | +| inclusion | Used to specify the range of data to be synchronized in the data synchronization task, including data, schema, and auth | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | Optional | data.insert | +| inclusion.exclusion | Used to exclude specific operations from the range specified by inclusion, reducing the amount of data synchronized | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | Optional | - | +| path | Used to filter the path pattern schema of time series and data to be synchronized / schema synchronization can only use pathpath is exact matching, parameters must be prefix paths or complete paths, i.e., cannot contain `"*"`, at most one `"**"` at the end of the path parameter | String:IoTDB pattern | Optional | root.** | +| pattern | Used to filter the path prefix of time series | String: Optional | Optional | root | +| start-time | The start event time for synchronizing all data, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MIN_VALUE | +| end-time | The end event time for synchronizing all data, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | Optional | Long.MAX_VALUE | +| realtime.mode | The extraction mode for newly inserted data (after pipe creation) | String: stream, batch | Optional | stream | +| forwarding-pipe-requests | Whether to forward data written by other Pipes (usually data synchronization) | Boolean: true, false | Optional | true | +| history.loose-range | When transferring TsFile, whether to relax the range of historical data (before the creation of the pipe). "": Do not relax the range, select data strictly according to the set conditions. "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency. "path": Relax the path range to avoid splitting TsFile, which can improve synchronization efficiency. "time, path", "path, time", "all": Relax all ranges to avoid splitting TsFile, which can improve synchronization efficiency. | String: "" 、 "time" 、 "path" 、 "time, path" 、 "path, time" 、 "all" | Optional |""| +| realtime.loose-range | When transferring TsFile, whether to relax the range of real-time data (before the creation of the pipe). "": Do not relax the range, select data strictly according to the set conditions. "time": Relax the time range to avoid splitting TsFile, which can improve synchronization efficiency. "path": Relax the path range to avoid splitting TsFile, which can improve synchronization efficiency. "time, path", "path, time", "all": Relax all ranges to avoid splitting TsFile, which can improve synchronization efficiency. | String: "" 、 "time" 、 "path" 、 "time, path" 、 "path, time" 、 "all" | Optional |""| +| mods.enable | Whether to send the mods file of tsfile | Boolean: true / false | Optional | false | + +> 💎 **Explanation**:To maintain compatibility with lower versions, history.enable, history.start-time, history.end-time, realtime.enable can still be used, but they are not recommended in the new version. 
+> +> 💎 **Explanation: Differences between Stream and Batch Data Extraction Modes** +> - **stream (recommended)**: In this mode, tasks process and send data in real-time. It is characterized by high timeliness and low throughput. +> - **batch**: In this mode, tasks process and send data in batches (according to the underlying data files). It is characterized by low timeliness and high throughput. + + +## sink parameter + +> In versions 1.3.3 and above, when only the sink is included, the additional "with sink" prefix is no longer required. + +#### iotdb-thrift-sink + + +| key | value | value Range | required or not | Default Value | +| :--------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | +| sink | iotdb-thrift-sink or iotdb-thrift-async-sink | String: iotdb-thrift-sink or iotdb-thrift-async-sink | Required | | +| node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB (please note that synchronization tasks do not support forwarding to its own service) | String. Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | +| batch.enable | Whether to enable batched log transmission mode to improve transmission throughput and reduce IOPS | Boolean: true, false | Optional | true | +| batch.max-delay-seconds | Effective when batched log transmission mode is enabled, it represents the maximum waiting time for a batch of data before sending (unit: s) | Integer | Optional | 1 | +| batch.size-bytes | Effective when batched log transmission mode is enabled, it represents the maximum batch size for a batch of data (unit: byte) | Long | Optional | 16*1024*1024 | + +#### iotdb-air-gap-sink + +| key | value | value Range | required or not | Default Value | +| :--------------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | +| sink | iotdb-air-gap-sink | String: iotdb-air-gap-sink | Required | - | +| node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB | String. Example: :'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - | +| air-gap.handshake-timeout-ms | The timeout duration of the handshake request when the sender and receiver first attempt to establish a connection, unit: ms | Integer | Optional | 5000 | + +#### iotdb-thrift-ssl-sink + +| key | value | value Range | required or not | Default Value | +| :---------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | :------- | :----------- | +| sink | iotdb-thrift-ssl-sink | String: iotdb-thrift-ssl-sink | Required | - | +| node-urls | The URL of the data service port of any DataNode nodes on the target IoTDB (please note that synchronization tasks do not support forwarding to its own service) | String. 
Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | Required | - |
+| batch.enable | Whether to enable batched log transmission mode to improve transmission throughput and reduce IOPS | Boolean: true, false | Optional | true |
+| batch.max-delay-seconds | Effective when batched log transmission mode is enabled, it represents the maximum waiting time for a batch of data before sending (unit: s) | Integer | Optional | 1 |
+| batch.size-bytes | Effective when batched log transmission mode is enabled, it represents the maximum batch size for a batch of data (unit: byte) | Long | Optional | 16*1024*1024 |
+| ssl.trust-store-path | The trust store certificate path required to connect to the target DataNode | String: certificate directory name; when configured as a relative directory, it is relative to the IoTDB root directory. Example: 'pki/trusted' | Required | - |
+| ssl.trust-store-pwd | The trust store certificate password required to connect to the target DataNode | String | Required | - | diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Data-subscription.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Data-subscription.md new file mode 100644 index 00000000..9a5522ac --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Data-subscription.md @@ -0,0 +1,150 @@ +# Data Subscription
+
+## 1. Feature Introduction
+
+The IoTDB data subscription module (also known as the IoTDB subscription client) is a feature supported in IoTDB V1.3.3 and later. It provides users with a streaming data consumption method that differs from data queries, drawing on the basic concepts and logic of message queue products such as Kafka and **providing data subscription and consumption interfaces**. It is not intended to completely replace such message queue products; rather, it offers a more convenient data subscription service for scenarios where simple streaming data acquisition is needed.
+
+Using the IoTDB Subscription Client to consume data has significant advantages in the following application scenarios:
+
+1. **Continuously obtaining the latest data**: subscribing is more real-time than scheduled queries, simpler to program against, and places a lower burden on the system;
+
+2. **Simplified data push to third-party systems**: there is no need to develop separate push components inside IoTDB for each downstream system; data can be streamed into third-party systems, making it easier to send data to systems such as Flink, Kafka, DataX, Camel, MySQL, PG, etc.
+
+## 2. Key Concepts
+
+The IoTDB Subscription Client encompasses three core concepts: Topic, Consumer, and Consumer Group. The specific relationships are illustrated in the diagram below:
+
+ +
+ +1. **Topic**: Topic is the data space of IoTDB, represented by paths and time ranges (such as the full time range of root. * *). Consumers can subscribe to data on these topics (currently existing and future written). Unlike Kafka, IoTDB can create topics after data is stored, and the output format can be either Message or TsFile. + +2. **Consumer**: Consumer is an IoTDB subscription client is located, responsible for receiving and processing data published to specific topics. Consumers retrieve data from the queue and process it accordingly. There are two types of Consumers available in the IoTDB subscription client: + - `SubscriptionPullConsumer`, which corresponds to the pull consumption model in message queues, where user code needs to actively invoke data retrieval logic. + - `SubscriptionPushConsumer`, which corresponds to the push consumption model in message queues, where user code is triggered by newly arriving data events. + + +3. **Consumer Group**: A Consumer Group is a collection of Consumers who share the same Consumer Group ID. The Consumer Group has the following characteristics: + - Consumer Group and Consumer are in a one to many relationship. That is, there can be any number of consumers in a consumer group, but a consumer is not allowed to join multiple consumer groups simultaneously. + - A Consumer Group can have different types of Consumers (`SubscriptionPullConsumer` and `SubscriptionPushConsumer`). + - It is not necessary for all consumers in a Consumer Group to subscribe to the same topic. + - When different Consumers in the same Consumer Group subscribe to the same Topic, each piece of data under that Topic will only be processed by one Consumer within the group, ensuring that data is not processed repeatedly. + +## 3. SQL Statements + +### 3.1 Topic Management + +IoTDB supports the creation, deletion, and viewing of Topics through SQL statements. The status changes of Topics are illustrated in the diagram below: + +
+ +
+ +#### 3.1.1 Create Topic + +The SQL statement is as follows: + +```SQL + CREATE TOPIC [IF NOT EXISTS] + WITH ( + [ = ,], + ); +``` + +**IF NOT EXISTS semantics**: Used in creation operations to ensure that the create command is executed when the specified topic does not exist, preventing errors caused by attempting to create an existing topic. + +Detailed explanation of each parameter is as follows: + +| Key | Required or Optional with Default | Description | +| :-------------------------------------------- | :--------------------------------- | :----------------------------------------------------------- | +| **path** | optional: `root.**` | The path of the time series data corresponding to the topic, representing a set of time series to be subscribed. | +| **start-time** | optional: `MIN_VALUE` | The start time (event time) of the time series data corresponding to the topic. Can be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00, or a long value representing a raw timestamp consistent with the database's timestamp precision. Supports the special value `now`, which means the creation time of the topic. When start-time is `now` and end-time is MAX_VALUE, it indicates that only real-time data is subscribed. | +| **end-time** | optional: `MAX_VALUE` | The end time (event time) of the time series data corresponding to the topic. Can be in ISO format, such as 2011-12-03T10:15:30 or 2011-12:03T10:15:30+01:00, or a long value representing a raw timestamp consistent with the database's timestamp precision. Supports the special value `now`, which means the creation time of the topic. When end-time is `now` and start-time is MIN_VALUE, it indicates that only historical data is subscribed. | +| **processor** | optional: `do-nothing-processor` | The name and parameter configuration of the processor plugin, representing the custom processing logic applied to the original subscribed data, which can be specified in a similar way to pipe processor plugins. + | +| **format** | optional: `SessionDataSetsHandler` | Represents the form in which data is subscribed from the topic. Currently supports the following two forms of data: `SessionDataSetsHandler`: Data subscribed from the topic is obtained using `SubscriptionSessionDataSetsHandler`, and consumers can consume each piece of data row by row. `TsFileHandler`: Data subscribed from the topic is obtained using `SubscriptionTsFileHandler`, and consumers can directly subscribe to the TsFile storing the corresponding data. | +| **mode** **(supported in versions 1.3.3.2 and later)** | option: `live` | The subscription mode corresponding to the topic, with two options: `live`: When subscribing to this topic, the subscribed dataset mode is a dynamic dataset, which means that you can continuously consume the latest data. `snapshot`: When the consumer subscribes to this topic, the subscribed dataset mode is a static dataset, which means the snapshot of the data at the moment the consumer group subscribes to the topic (not the moment the topic is created); the formed static dataset after subscription does not support TTL.| +| **loose-range** **(supported in versions 1.3.3.2 and later)** | option: `""` | String: Whether to strictly filter the data corresponding to this topic according to the path and time range, for example: "": Strictly filter the data corresponding to this topic according to the path and time range. 
`"time"`: Do not strictly filter the data corresponding to this topic according to the time range (rough filter); strictly filter the data corresponding to this topic according to the path. `"path"`: Do not strictly filter the data corresponding to this topic according to the path (rough filter); strictly filter the data corresponding to this topic according to the time range. `"time, path"` / `"path, time"` / `"all"`: Do not strictly filter the data corresponding to this topic according to the path and time range (rough filter).| + +Examples are as follows: + + + +```SQL +-- Full subscription +CREATE TOPIC root_all; + +-- Custom subscription +CREATE TOPIC IF NOT EXISTS db_timerange +WITH ( + 'path' = 'root.db.**', + 'start-time' = '2023-01-01', + 'end-time' = '2023-12-31', +); +``` + +#### 3.1.2 Delete Topic + +A Topic can only be deleted if it is not subscribed to. When a Topic is deleted, its related consumption progress will be cleared. + +```SQL +DROP TOPIC [IF EXISTS] ; +``` +**IF EXISTS semantics**: Used in deletion operations to ensure that the delete command is executed when a specified topic exists, preventing errors caused by attempting to delete non-existent topics. + +#### 3.1.3 View Topic + +```SQL +SHOW TOPICS; +SHOW TOPIC ; +``` + +Result set: + +```SQL +[TopicName|TopicConfigs] +``` + +- TopicName: Topic ID +- TopicConfigs: Topic configurations + +### 3.2 Check Subscription Status + +View all subscription relationships: + +```SQL +-- Query the subscription relationships between all topics and consumer groups +SHOW SUBSCRIPTIONS +-- Query all subscriptions under a specific topic +SHOW SUBSCRIPTIONS ON +``` + +Result set: + +```SQL +[TopicName|ConsumerGroupName|SubscribedConsumers] +``` + +- TopicName: The ID of the topic. +- ConsumerGroupName: The ID of the consumer group specified in the user's code. +- SubscribedConsumers: All client IDs in the consumer group that have subscribed to the topic. + +## 4. API interface + +In addition to SQL statements, IoTDB also supports using data subscription features through Java native interfaces, more details see([link](../API/Programming-Java-Native-API.md)). + + +## 5. Frequently Asked Questions + +### 5.1 What is the difference between IoTDB data subscription and Kafka? + +1. Consumption Orderliness + +- **Kafka guarantees that messages within a single partition are ordered**,when a topic corresponds to only one partition and only one consumer subscribes to this topic, the order in which the consumer (single-threaded) consumes the topic data is the same as the order in which the data is written. +- The IoTDB subscription client **does not guarantee** that the order in which the consumer consumes the data is the same as the order in which the data is written, but it will try to reflect the order of data writing. + +2. Message Delivery Semantics + +- Kafka can achieve Exactly once semantics for both Producers and Consumers through configuration. +- The IoTDB subscription client currently cannot provide Exactly once semantics for Consumers. 
\ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Database-Programming.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Database-Programming.md new file mode 100644 index 00000000..e5dfd494 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Database-Programming.md @@ -0,0 +1,592 @@ + + +# CONTINUOUS QUERY(CQ) + + +## Introduction + +Continuous queries(CQ) are queries that run automatically and periodically on realtime data and store query results in other specified time series. + +Users can implement sliding window streaming computing through continuous query, such as calculating the hourly average temperature of a sequence and writing it into a new sequence. Users can customize the `RESAMPLE` clause to create different sliding windows, which can achieve a certain degree of tolerance for out-of-order data. + +## Syntax + +```sql +CREATE (CONTINUOUS QUERY | CQ) +[RESAMPLE + [EVERY ] + [BOUNDARY ] + [RANGE [, end_time_offset]] +] +[TIMEOUT POLICY BLOCKED|DISCARD] +BEGIN + SELECT CLAUSE + INTO CLAUSE + FROM CLAUSE + [WHERE CLAUSE] + [GROUP BY([, ]) [, level = ]] + [HAVING CLAUSE] + [FILL {PREVIOUS | LINEAR | constant}] + [LIMIT rowLimit OFFSET rowOffset] + [ALIGN BY DEVICE] +END +``` + +> Note: +> +> 1. If there exists any time filters in WHERE CLAUSE, IoTDB will throw an error, because IoTDB will automatically generate a time range for the query each time it's executed. +> 2. GROUP BY TIME CLAUSE is different, it doesn't contain its original first display window parameter which is [start_time, end_time). It's still because IoTDB will automatically generate a time range for the query each time it's executed. +> 3. If there is no group by time clause in query, EVERY clause is required, otherwise IoTDB will throw an error. + +### Descriptions of parameters in CQ syntax + +- `` specifies the globally unique id of CQ. +- `` specifies the query execution time interval. We currently support the units of ns, us, ms, s, m, h, d, w, and its value should not be lower than the minimum threshold configured by the user, which is `continuous_query_min_every_interval`. It's an optional parameter, default value is set to `group_by_interval` in group by clause. +- `` specifies the start time of each query execution as `now()-`. We currently support the units of ns, us, ms, s, m, h, d, w.It's an optional parameter, default value is set to `every_interval` in resample clause. +- `` specifies the end time of each query execution as `now()-`. We currently support the units of ns, us, ms, s, m, h, d, w.It's an optional parameter, default value is set to `0`. +- `` is a date that represents the execution time of a certain cq task. + - `` can be earlier than, equals to, later than **current time**. + - This parameter is optional. If not specified, it is equal to `BOUNDARY 0`。 + - **The start time of the first time window** is ` - `. + - **The end time of the first time window** is ` - `. + - The **time range** of the `i (1 <= i)th` window is `[ - + (i - 1) * , - + (i - 1) * )`. + - If the **current time** is earlier than or equal to `execution_boundary_time`, then the first execution moment of the continuous query is `execution_boundary_time`. + - If the **current time** is later than `execution_boundary_time`, then the first execution moment of the continuous query is the first `execution_boundary_time + i * ` that is later than or equal to the current time . + +> - ``,`` and `` should all be greater than `0`. 
+> - The value of `` should be less than or equal to the value of ``, otherwise the system will throw an error. +> - Users should specify the appropriate `` and `` according to actual needs. +> - If `` is greater than ``, there will be partial data overlap in each query window. +> - If `` is less than ``, there may be uncovered data between each query window. +> - `start_time_offset` should be larger than `end_time_offset`, otherwise the system will throw an error. + +#### `` == `` + +![1](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic1.png?raw=true) + +#### `` > `` + +![2](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic2.png?raw=true) + +#### `` < `` + +![3](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic3.png?raw=true) + +#### `` is not zero + +![](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic4.png?raw=true) + + +- `TIMEOUT POLICY` specify how we deal with the cq task whose previous time interval execution is not finished while the next execution time has reached. The default value is `BLOCKED`. + - `BLOCKED` means that we will block and wait to do the current cq execution task until the previous time interval cq task finishes. If using `BLOCKED` policy, all the time intervals will be executed, but it may be behind the latest time interval. + - `DISCARD` means that we just discard the current cq execution task and wait for the next execution time and do the next time interval cq task. If using `DISCARD` policy, some time intervals won't be executed when the execution time of one cq task is longer than the ``. However, once a cq task is executed, it will use the latest time interval, so it can catch up at the sacrifice of some time intervals being discarded. + + +## Examples of CQ + +The examples below use the following sample data. It's a real time data stream and we can assume that the data arrives on time. + +```` ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +| Time|root.ln.wf02.wt02.temperature|root.ln.wf02.wt01.temperature|root.ln.wf01.wt02.temperature|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +|2021-05-11T22:18:14.598+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:19.941+08:00| 0.0| 68.0| 68.0| 103.0| +|2021-05-11T22:18:24.949+08:00| 122.0| 45.0| 11.0| 14.0| +|2021-05-11T22:18:29.967+08:00| 47.0| 14.0| 59.0| 181.0| +|2021-05-11T22:18:34.979+08:00| 182.0| 113.0| 29.0| 180.0| +|2021-05-11T22:18:39.990+08:00| 42.0| 11.0| 52.0| 19.0| +|2021-05-11T22:18:44.995+08:00| 78.0| 38.0| 123.0| 52.0| +|2021-05-11T22:18:49.999+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:55.003+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +```` + +### Configuring execution intervals + +Use an `EVERY` interval in the `RESAMPLE` clause to specify the CQ’s execution interval, if not specific, default value is equal to `group_by_interval`. 
+ +```sql +CREATE CONTINUOUS QUERY cq1 +RESAMPLE EVERY 20s +BEGIN +SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) +END +``` + +`cq1` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. + +`cq1` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq1` runs a single query that covers the time range for the current time bucket, that is, the 20-second time bucket that intersects with `now()`. + +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq1` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. +`cq1` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. 
+`cq1` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +`cq1` won't deal with data that is before the current time window which is `2021-05-11T22:18:20.000+08:00`, so here are the results: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### Configuring time range for resampling + +Use `start_time_offset` in the `RANGE` clause to specify the start time of the CQ’s time range, if not specific, default value is equal to `EVERY` interval. + +```sql +CREATE CONTINUOUS QUERY cq2 +RESAMPLE RANGE 40s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) +END +``` + +`cq2` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. + +`cq2` executes at 10-second intervals, the same interval as the `group_by_interval`. Every 10 seconds, `cq2` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()` , that is, the time range between 40 seconds prior to `now()` and `now()`. + +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq2` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. 
+`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| NULL| NULL| NULL| NULL| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:18:50.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:10, 2021-05-11T22:18:50)`. +`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. +`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +`cq2` won't write lines that are all null. Notice `cq2` will also calculate the results for some time interval many times. 
Here are the results: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### Configuring execution intervals and CQ time ranges + +Use an `EVERY` interval and `RANGE` interval in the `RESAMPLE` clause to specify the CQ’s execution interval and the length of the CQ’s time range. And use `fill()` to change the value reported for time intervals with no data. + +```sql +CREATE CONTINUOUS QUERY cq3 +RESAMPLE EVERY 20s RANGE 40s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) + FILL(100.0) +END +``` + +`cq3` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. Where possible, it writes the value `100.0` for time intervals with no results. + +`cq3` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq3` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()`, that is, the time range between 40 seconds prior to `now()` and `now()`. + +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq3` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. 
+`cq3` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. +`cq3` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +Notice that `cq3` will calculate the results for some time interval many times, so here are the results: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### Configuring end_time_offset for CQ time range + +Use an `EVERY` interval and `RANGE` interval in the RESAMPLE clause to specify the CQ’s execution interval and the length of the CQ’s time range. And use `fill()` to change the value reported for time intervals with no data. 
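+
+In this variant the `RANGE` interval carries two values, a `start_time_offset` followed by an `end_time_offset`. As a small worked sketch of the window arithmetic only (the clause shown in the comments is the one used by `cq4`, which appears in full right below):
+
+```sql
+-- RESAMPLE EVERY 20s RANGE 40s, 20s
+-- At an execution time of 2021-05-11T22:18:40, the queried window is
+-- [now() - 40s, now() - 20s) = [2021-05-11T22:18:00, 2021-05-11T22:18:20)
+```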
+
+```sql
+CREATE CONTINUOUS QUERY cq4
+RESAMPLE EVERY 20s RANGE 40s, 20s
+BEGIN
+  SELECT max_value(temperature)
+  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
+  FROM root.ln.*.*
+  GROUP BY(10s)
+  FILL(100.0)
+END
+```
+
+`cq4` calculates the 10-second maximum value of the `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. Where possible, it writes the value `100.0` for time intervals with no results.
+
+`cq4` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq4` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()` minus the `end_time_offset`, that is, the time range between 40 seconds prior to `now()` and 20 seconds prior to `now()`.
+
+Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq4` running at DataNode if you set log level to DEBUG:
+
+````
+At **2021-05-11T22:18:40.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:20)`.
+`cq4` generate 2 lines:
+>
++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
+|                         Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max|
++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
+|2021-05-11T22:18:00.000+08:00|                            100.0|                            100.0|                            100.0|                            100.0|
+|2021-05-11T22:18:10.000+08:00|                            121.0|                             72.0|                            183.0|                            115.0|
++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+
+>
+At **2021-05-11T22:19:00.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. 
+`cq4` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +Notice that `cq4` will calculate the results for all time intervals only once after a delay of 20 seconds, so here are the results: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### CQ without group by clause + +Use an `EVERY` interval in the `RESAMPLE` clause to specify the CQ’s execution interval and the length of the CQ’s time range. + +```sql +CREATE CONTINUOUS QUERY cq5 +RESAMPLE EVERY 20s +BEGIN + SELECT temperature + 1 + INTO root.precalculated_sg.::(temperature) + FROM root.ln.*.* + align by device +END +``` + +`cq5` calculates the `temperature + 1` under the `root.ln` prefix path and stores the results in the `root.precalculated_sg` database. Sensors use the same prefix path as the corresponding sensor. + +`cq5` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq5` runs a single query that covers the time range for the current time bucket, that is, the 20-second time bucket that intersects with `now()`. + +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq5` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. 
+`cq5` generate 16 lines: +> ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| ++-----------------------------+-------------------------------+-----------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. +`cq5` generate 12 lines: +> ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| ++-----------------------------+-------------------------------+-----------+ +> +```` + +`cq5` won't deal with data that is before the current time window which is `2021-05-11T22:18:20.000+08:00`, so here are the results: + +```` +> SELECT temperature from root.precalculated_sg.*.* align by device; ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| 
+|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| ++-----------------------------+-------------------------------+-----------+ +```` + +## CQ Management + +### Listing continuous queries + +List every CQ on the IoTDB Cluster with: + +```sql +SHOW (CONTINUOUS QUERIES | CQS) +``` + +`SHOW (CONTINUOUS QUERIES | CQS)` order results by `cq_id`. + +#### Examples + +```sql +SHOW CONTINUOUS QUERIES; +``` + +we will get: + +| cq_id | query | state | +| :---------- | ------------------------------------------------------------ | ------ | +| s1_count_cq | CREATE CQ s1_count_cq
BEGIN
SELECT count(s1)
INTO root.sg_count.d.count_s1
FROM root.sg.d
GROUP BY(30m)
END | active | + + +### Dropping continuous queries + +Drop a CQ with a specific `cq_id`: + +```sql +DROP (CONTINUOUS QUERY | CQ) +``` + +DROP CQ returns an empty result. + +#### Examples + +Drop the CQ named `s1_count_cq`: + +```sql +DROP CONTINUOUS QUERY s1_count_cq; +``` + +### Altering continuous queries + +CQs can't be altered once they're created. To change a CQ, you must `DROP` and re`CREATE` it with the updated settings. + + +## CQ Use Cases + +### Downsampling and Data Retention + +Use CQs with `TTL` set on database in IoTDB to mitigate storage concerns. Combine CQs and `TTL` to automatically downsample high precision data to a lower precision and remove the dispensable, high precision data from the database. + +### Recalculating expensive queries + +Shorten query runtimes by pre-calculating expensive queries with CQs. Use a CQ to automatically downsample commonly-queried, high precision data to a lower precision. Queries on lower precision data require fewer resources and return faster. + +> Pre-calculate queries for your preferred graphing tool to accelerate the population of graphs and dashboards. + +### Substituting for sub-query + +IoTDB does not support sub queries. We can get the same functionality by creating a CQ as a sub query and store its result into other time series and then querying from those time series again will be like doing nested sub query. + +#### Example + +IoTDB does not accept the following query with a nested sub query. The query calculates the average number of non-null values of `s1` at 30 minute intervals: + +```sql +SELECT avg(count_s1) from (select count(s1) as count_s1 from root.sg.d group by([0, now()), 30m)); +``` + +To get the same results: + +**Create a CQ** + +This step performs the nested sub query in from clause of the query above. The following CQ automatically calculates the number of non-null values of `s1` at 30 minute intervals and writes those counts into the new `root.sg_count.d.count_s1` time series. + +```sql +CREATE CQ s1_count_cq +BEGIN + SELECT count(s1) + INTO root.sg_count.d(count_s1) + FROM root.sg.d + GROUP BY(30m) +END +``` + +**Query the CQ results** + +Next step performs the avg([...]) part of the outer query above. + +Query the data in the time series `root.sg_count.d.count_s1` to calculate the average of it: + +```sql +SELECT avg(count_s1) from root.sg_count.d; +``` + + +## System Parameter Configuration + +| Name | Description | Data Type | Default Value | +| :------------------------------------------ | ------------------------------------------------------------ | --------- | ------------- | +| `continuous_query_submit_thread` | The number of threads in the scheduled thread pool that submit continuous query tasks periodically | int32 | 2 | +| `continuous_query_min_every_interval_in_ms` | The minimum value of the continuous query execution time interval | duration | 1000 | + diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/IoTDB-View_timecho.md b/src/UserGuide/V2.0.1/Tree/User-Manual/IoTDB-View_timecho.md new file mode 100644 index 00000000..cceada40 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/IoTDB-View_timecho.md @@ -0,0 +1,549 @@ + + +# View + +## Sequence View Application Background + +## Application Scenario 1 Time Series Renaming (PI Asset Management) + +In practice, the equipment collecting data may be named with identification numbers that are difficult to be understood by human beings, which brings difficulties in querying to the business layer. 
+ +The Sequence View, on the other hand, is able to re-organise the management of these sequences and access them using a new model structure without changing the original sequence content and without the need to create new or copy sequences. + +**For example**: a cloud device uses its own NIC MAC address to form entity numbers and stores data by writing the following time sequence:`root.db.0800200A8C6D.xvjeifg`. + +It is difficult for the user to understand. However, at this point, the user is able to rename it using the sequence view feature, map it to a sequence view, and use `root.view.device001.temperature` to access the captured data. + +### Application Scenario 2 Simplifying business layer query logic + +Sometimes users have a large number of devices that manage a large number of time series. When conducting a certain business, the user wants to deal with only some of these sequences. At this time, the focus of attention can be picked out by the sequence view function, which is convenient for repeated querying and writing. + +**For example**: Users manage a product assembly line with a large number of time series for each segment of the equipment. The temperature inspector only needs to focus on the temperature of the equipment, so he can extract the temperature-related sequences and compose the sequence view. + +### Application Scenario 3 Auxiliary Rights Management + +In the production process, different operations are generally responsible for different scopes. For security reasons, it is often necessary to restrict the access scope of the operations staff through permission management. + +**For example**: The safety management department now only needs to monitor the temperature of each device in a production line, but these data are stored in the same database with other confidential data. At this point, it is possible to create a number of new views that contain only temperature-related time series on the production line, and then to give the security officer access to only these sequence views, thus achieving the purpose of permission restriction. + +### Motivation for designing sequence view functionality + +Combining the above two types of usage scenarios, the motivations for designing sequence view functionality, are: + +1. time series renaming. +2. to simplify the query logic at the business level. +3. Auxiliary rights management, open data to specific users through the view. + +## Sequence View Concepts + +### Terminology Concepts + +Concept: If not specified, the views specified in this document are **Sequence Views**, and new features such as device views may be introduced in the future. + +### Sequence view + +A sequence view is a way of organising the management of time series. + +In traditional relational databases, data must all be stored in a table, whereas in time series databases such as IoTDB, it is the sequence that is the storage unit. Therefore, the concept of sequence views in IoTDB is also built on sequences. + +A sequence view is a virtual time series, and each virtual time series is like a soft link or shortcut that maps to a sequence or some kind of computational logic external to a certain view. In other words, a virtual sequence either maps to some defined external sequence or is computed from multiple external sequences. + +Users can create views using complex SQL queries, where the sequence view acts as a stored query statement, and when data is read from the view, the stored query statement is used as the source of the data in the FROM clause. 
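+
+For instance, the renaming case from Application Scenario 1 can be expressed as exactly this kind of mapping. The following is only an illustrative sketch that uses the creation syntax introduced in the "Creating a view" section below, with both paths taken from that scenario:
+
+```SQL
+-- Map the hard-to-read raw series to a readable view path (an alias-style view).
+CREATE VIEW root.view.device001.temperature
+AS
+    root.db.0800200A8C6D.xvjeifg
+```
+
+Reading `root.view.device001.temperature` then returns the data collected in `root.db.0800200A8C6D.xvjeifg`, without copying or moving it.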
+
+### Alias Sequences
+
+There is a special class of sequence views that satisfy all of the following conditions:
+
+1. the data source is a single time series
+2. there is no computational logic
+3. there are no filtering conditions (e.g., no WHERE clause restrictions).
+
+Such a sequence view is called an **alias sequence**, or alias sequence view. A sequence view that does not fully satisfy all of the above conditions is called a non-alias sequence view. The difference between them is that only alias sequences support write functionality.
+
+**All sequence views, including alias sequences, do not currently support the Trigger functionality.**
+
+### Nested Views
+
+A user may want to select a number of sequences from an existing sequence view to form a new sequence view, called a nested view.
+
+**The current version does not support the nested view feature**.
+
+### Some constraints on sequence views in IoTDB
+
+#### Constraint 1 A sequence view must depend on one or several time series
+
+A sequence view has two possible forms of existence:
+
+1. it maps to a time series
+2. it is computed from one or more time series.
+
+The former form has been exemplified in the previous section and is easy to understand; the latter form exists because a sequence view may contain computational logic.
+
+For example, the user has installed two thermometers in the same boiler and now needs to calculate the average of the two temperature values as a measurement. The user has captured the following two sequences: `root.db.d01.temperature01`, `root.db.d01.temperature02`.
+
+At this point, the user can use the average of the two sequences as one sequence in the view: `root.db.d01.avg_temperature`.
+
+This example is expanded in detail in section 3.1.2.
+
+#### Restriction 2 Non-alias sequence views are read-only
+
+Writing to non-alias sequence views is not allowed.
+
+Only alias sequence views support writing.
+
+#### Restriction 3 Nested views are not allowed
+
+It is not possible to select certain columns in an existing sequence view to create a sequence view, either directly or indirectly.
+
+An example of this restriction is given in 3.1.3.
+
+#### Restriction 4 Sequence views and time series cannot share the same name
+
+Both sequence views and time series are located under the same tree, so their names cannot collide.
+
+The name (path) of any sequence must be uniquely determined.
+
+#### Restriction 5 Sequence views share time series data with time series, but metadata such as tags is not shared
+
+Sequence views are mappings pointing to time series, so they fully share the time series data, with the time series being responsible for persistent storage.
+
+However, metadata such as tags and attributes is not shared.
+
+This is because view-oriented business queries care about the structure of the current view: when a query groups by tag, for example, the user expects the grouping to follow the tags attached to the view, not the tags of the underlying time series (of which the user may not even be aware).
+
+## Sequence view functionality
+
+### Creating a view
+
+Creating a sequence view is similar to creating a time series; the difference is that you need to specify the data source, i.e., the original sequence, through the AS keyword. 
+ +#### SQL for creating a view + +User can select some sequences to create a view: + +```SQL +CREATE VIEW root.view.device.status +AS + SELECT s01 + FROM root.db.device +``` + +It indicates that the user has selected the sequence `s01` from the existing device `root.db.device`, creating the sequence view `root.view.device.status`. + +The sequence view can exist under the same entity as the time series, for example: + +```SQL +CREATE VIEW root.db.device.status +AS + SELECT s01 + FROM root.db.device +``` + +Thus, there is a virtual copy of `s01` under `root.db.device`, but with a different name `status`. + +It can be noticed that the sequence views in both of the above examples are aliased sequences, and we are giving the user a more convenient way of creating a sequence for that sequence: + +```SQL +CREATE VIEW root.view.device.status +AS + root.db.device.s01 +``` + +#### Creating views with computational logic + +Following the example in section 2.2 Limitations 1: + +> A user has installed two thermometers in the same boiler and now needs to calculate the average of the two temperature values as a measurement. The user has captured the following two sequences: `root.db.d01.temperature01`, `root.db.d01.temperature02`. +> +> At this point, the user can use the two sequences averaged as one sequence in the view: `root.view.device01.avg_temperature`. + +If the view is not used, the user can query the average of the two temperatures like this: + +```SQL +SELECT (temperature01 + temperature02) / 2 +FROM root.db.d01 +``` + +And if using a sequence view, the user can create a view this way to simplify future queries: + +```SQL +CREATE VIEW root.db.d01.avg_temperature +AS + SELECT (temperature01 + temperature02) / 2 + FROM root.db.d01 +``` + +The user can then query it like this: + +```SQL +SELECT avg_temperature FROM root.db.d01 +``` + +#### Nested sequence views not supported + +Continuing with the example from 3.1.2, the user now wants to create a new view using the sequence view `root.db.d01.avg_temperature`, which is not allowed. We currently do not support nested views, whether it is an aliased sequence or not. + +For example, the following SQL statement will report an error: + +```SQL +CREATE VIEW root.view.device.avg_temp_copy +AS + root.db.d01.avg_temperature -- Not supported. 
Nested views are not allowed
+```
+
+#### Creating multiple sequence views at once
+
+Specifying only one sequence view at a time would be inconvenient, so multiple sequence views can be created in a single statement, for example:
+
+```SQL
+CREATE VIEW root.db.device.status, root.db.device.sub.hardware
+AS
+    SELECT s01, s02
+    FROM root.db.device
+```
+
+In addition, the above statement can be simplified:
+
+```SQL
+CREATE VIEW root.db.device(status, sub.hardware)
+AS
+    SELECT s01, s02
+    FROM root.db.device
+```
+
+Both statements above are equivalent to the following:
+
+```SQL
+CREATE VIEW root.db.device.status
+AS
+    SELECT s01
+    FROM root.db.device;
+
+CREATE VIEW root.db.device.sub.hardware
+AS
+    SELECT s02
+    FROM root.db.device
+```
+
+They are also equivalent to the following:
+
+```SQL
+CREATE VIEW root.db.device.status, root.db.device.sub.hardware
+AS
+    root.db.device.s01, root.db.device.s02
+
+-- or
+
+CREATE VIEW root.db.device(status, sub.hardware)
+AS
+    root.db.device(s01, s02)
+```
+
+##### The mapping relationships between all sequences are statically stored
+
+Sometimes, the SELECT clause may contain statements whose matches can only be determined at runtime, such as below:
+
+```SQL
+SELECT s01, s02
+FROM root.db.d01, root.db.d02
+```
+
+The number of sequences that can be matched by the above statement is uncertain and depends on the state of the system. Even so, the user can use it to create views.
+
+However, it is important to note that the mapping relationship between all sequences is stored statically (fixed at creation)! Consider the following example:
+
+The current database contains only three sequences `root.db.d01.s01`, `root.db.d02.s01`, `root.db.d02.s02`, and then the view is created:
+
+```SQL
+CREATE VIEW root.view.d(alpha, beta, gamma)
+AS
+    SELECT s01, s02
+    FROM root.db.d01, root.db.d02
+```
+
+The mapping relationship between time series is as follows:
+
+| sequence number | time series | sequence view |
+| ---- | ----------------- | ----------------- |
+| 1 | `root.db.d01.s01` | root.view.d.alpha |
+| 2 | `root.db.d02.s01` | root.view.d.beta |
+| 3 | `root.db.d02.s02` | root.view.d.gamma |
+
+After that, if the user adds the sequence `root.db.d01.s02`, it does not correspond to any view; then, if the user deletes `root.db.d01.s01`, the query for `root.view.d.alpha` will report an error directly, and it will not correspond to `root.db.d01.s02` either.
+
+Please always note that the mapping relationships between sequences are stored statically and are fixed at creation time.
+
+#### Batch Creation of Sequence Views
+
+There are several existing devices, each with a temperature value, for example:
+
+1. root.db.d1.temperature
+2. root.db.d2.temperature
+3. ...
+
+There may be many other sequences stored under these devices (e.g. `root.db.d1.speed`), but for now it is possible to create a view that contains only the temperature values for these devices, without involving the other sequences:
+
+```SQL
+CREATE VIEW root.db.view(${2}_temperature)
+AS
+    SELECT temperature FROM root.db.*
+```
+
+This follows the naming convention of query write-back (`SELECT INTO`), which uses variable placeholders to specify naming rules. See also: [QUERY WRITEBACK (SELECT INTO)](../User-Manual/Query-Data.md#into-clause-query-write-back)
+
+Here `root.db.*.temperature` specifies what time series will be included in the view; and `${2}` specifies from which node in the time series the name is extracted to name the sequence view. 
+ +Here, `${2}` refers to level 2 (starting at 0) of `root.db.*.temperature`, which is the result of the `*` match; and `${2}_temperature` is the result of the match and `temperature` spliced together with underscores to make up the node names of the sequences under the view. + +The above statement for creating a view is equivalent to the following writeup: + +```SQL +CREATE VIEW root.db.view(${2}_${3}) +AS + SELECT temperature from root.db.* +``` + +The final view contains these sequences: + +1. root.db.view.d1_temperature +2. root.db.view.d2_temperature +3. ... + +Created using wildcards, only static mapping relationships at the moment of creation will be stored. + +#### SELECT clauses are somewhat limited when creating views + +The SELECT clause used when creating a serial view is subject to certain restrictions. The main restrictions are as follows: + +1. the `WHERE` clause cannot be used. +2. `GROUP BY` clause cannot be used. +3. `MAX_VALUE` and other aggregation functions cannot be used. + +Simply put, after `AS` you can only use `SELECT ... FROM ... ` and the results of this query must form a time series. + +### View Data Queries + +For the data query functions that can be supported, the sequence view and time series can be used indiscriminately with identical behaviour when performing time series data queries. + +**The types of queries that are not currently supported by the sequence view are as follows:** + +1. **align by device query +2. **group by tags query + +Users can also mix time series and sequence view queries in the same SELECT statement, for example: + +```SQL +SELECT temperature01, temperature02, avg_temperature +FROM root.db.d01 +WHERE temperature01 < temperature02 +``` + +However, if the user wants to query the metadata of the sequence, such as tag, attributes, etc., the query is the result of the sequence view, not the result of the time series referenced by the sequence view. + +In addition, for aliased sequences, if the user wants to get information about the time series such as tags, attributes, etc., the user needs to query the mapping of the view columns to find the corresponding time series, and then query the time series for the tags, attributes, etc. The method of querying the mapping of the view columns will be explained in section 3.5. + +### Modify Views + +Modifying a view, such as changing its name, modifying its calculation logic, deleting it, etc., is similar to creating a new view, in that you need to re-specify all the column descriptions for the entire view. 
+
+#### Modify view data source
+
+```SQL
+ALTER VIEW root.view.device.status
+AS
+    SELECT s01
+    FROM root.ln.wf.d01
+```
+
+#### Modify the view's calculation logic
+
+```SQL
+ALTER VIEW root.db.d01.avg_temperature
+AS
+    SELECT (temperature01 + temperature02 + temperature03) / 3
+    FROM root.db.d01
+```
+
+#### Tag point management
+
+- Add a new tag
+
+```SQL
+ALTER view root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4
+```
+
+- Add a new attribute
+
+```SQL
+ALTER view root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4
+```
+
+- Rename a tag or attribute
+
+```SQL
+ALTER view root.turbine.d1.s1 RENAME tag1 TO newTag1
+```
+
+- Reset the value of a tag or attribute
+
+```SQL
+ALTER view root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1
+```
+
+- Delete an existing tag or attribute
+
+```SQL
+ALTER view root.turbine.d1.s1 DROP tag1, tag2
+```
+
+- Update or insert aliases, tags and attributes
+
+> If the alias, tag or attribute did not exist before, insert it; otherwise, update the old value with the new one.
+
+```SQL
+ALTER view root.turbine.d1.s1 UPSERT TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4)
+```
+
+#### Deleting Views
+
+Since a view is a sequence, a view can be deleted as if it were a time series.
+
+```SQL
+DELETE VIEW root.view.device.avg_temperature
+```
+
+### View Synchronisation
+
+Sequence view data is always obtained via real-time queries, so data synchronisation is naturally supported.
+
+#### If the dependent original sequence is deleted
+
+When the sequence view is queried (when the sequence is parsed), **an empty result set** is returned if the dependent time series does not exist.
+
+This is similar to the feedback for querying a non-existent sequence, but with a difference: if the dependent time series cannot be parsed, the empty result set still contains the table header, as a reminder to the user that the view is problematic.
+
+Additionally, when the dependent time series is deleted, no attempt is made to find out whether any view depends on that column, and the user receives no warning.
+
+#### Data Writes to Non-Aliased Sequences Not Supported
+
+Writes to non-alias sequences are not supported.
+
+Please refer to Restriction 2 in the previous chapter for more details.
+
+#### Metadata for sequences is not shared
+
+Please refer to Restriction 5 in the previous chapter for details.
+
+### View Metadata Queries
+
+View metadata query specifically refers to querying the metadata of the view itself (e.g., how many columns the view has), as well as information about the views in the database (e.g., what views are available).
+
+#### Viewing Current View Columns
+
+The user has two ways of querying:
+
+1. a query using `SHOW TIMESERIES`, whose results contain both time series and sequence views, but display only some of the attributes of a view.
+2. a query using `SHOW VIEW`, whose results contain only sequence views and display their complete properties. 
+ +Example: + +```Shell +IoTDB> show timeseries; ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.device.s01 | null| root.db| INT32| RLE| SNAPPY|null| null| null| null| BASE| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.view.status | null| root.db| INT32| RLE| SNAPPY|null| null| null| null| VIEW| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.d01.temp01 | null| root.db| FLOAT| RLE| SNAPPY|null| null| null| null| BASE| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.d01.temp02 | null| root.db| FLOAT| RLE| SNAPPY|null| null| null| null| BASE| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.d01.avg_temp| null| root.db| FLOAT| null| null|null| null| null| null| VIEW| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +Total line number = 5 +It costs 0.789s +IoTDB> +``` + +The last column `ViewType` shows the type of the sequence, the time series is BASE and the sequence view is VIEW. + +In addition, some of the sequence view properties will be missing, for example `root.db.d01.avg_temp` is calculated from temperature averages, so the `Encoding` and `Compression` properties are null values. + +In addition, the query results of the `SHOW TIMESERIES` statement are divided into two main parts. + +1. information about the timing data, such as data type, compression, encoding, etc. +2. other metadata information, such as tag, attribute, database, etc. + +For the sequence view, the temporal data information presented is the same as the original sequence or null (e.g., the calculated average temperature has a data type but no compression method); the metadata information presented is the content of the view. + +To learn more about the view, use `SHOW ``VIEW`. The `SHOW ``VIEW` shows the source of the view's data, etc. + +```Shell +IoTDB> show VIEW root.**; ++--------------------+--------+--------+----+----------+--------+-----------------------------------------+ +| Timeseries|Database|DataType|Tags|Attributes|ViewType| SOURCE| ++--------------------+--------+--------+----+----------+--------+-----------------------------------------+ +|root.db.view.status | root.db| INT32|null| null| VIEW| root.db.device.s01| ++--------------------+--------+--------+----+----------+--------+-----------------------------------------+ +|root.db.d01.avg_temp| root.db| FLOAT|null| null| VIEW|(root.db.d01.temp01+root.db.d01.temp02)/2| ++--------------------+--------+--------+----+----------+--------+-----------------------------------------+ +Total line number = 2 +It costs 0.789s +IoTDB> +``` + +The last column, `SOURCE`, shows the data source for the sequence view, listing the SQL statement that created the sequence. + +##### About Data Types + +Both of the above queries involve the data type of the view. 
The data type of a view is inferred from the original time series type of the query statement or alias sequence that defines the view. This data type is computed in real time based on the current state of the system, so the data type queried at different moments may be changing. + +## FAQ + +#### Q1: I want the view to implement the function of type conversion. For example, a time series of type int32 was originally placed in the same view as other series of type int64. I now want all the data queried through the view to be automatically converted to int64 type. + +> Ans: This is not the function of the sequence view. But the conversion can be done using `CAST`, for example: + +```SQL +CREATE VIEW root.db.device.int64_status +AS + SELECT CAST(s1, 'type'='INT64') from root.db.device +``` + +> This way, a query for `root.view.status` will yield a result of type int64. +> +> Please note in particular that in the above example, the data for the sequence view is obtained by `CAST` conversion, so `root.db.device.int64_status` is not an aliased sequence, and thus **not supported for writing**. + +#### Q2: Is default naming supported? Select a number of time series and create a view; but I don't specify the name of each series, it is named automatically by the database? + +> Ans: Not supported. Users must specify the naming explicitly. + +#### Q3: In the original system, create time series `root.db.device.s01`, you can find that database `root.db` is automatically created and device `root.db.device` is automatically created. Next, deleting the time series `root.db.device.s01` reveals that `root.db.device` was automatically deleted, while `root.db` remained. Will this mechanism be followed for creating views? What are the considerations? + +> Ans: Keep the original behaviour unchanged, the introduction of view functionality will not change these original logics. + +#### Q4: Does it support sequence view renaming? + +> A: Renaming is not supported in the current version, you can create your own view with new name to put it into use. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Maintennance.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Maintennance.md new file mode 100644 index 00000000..4f2e88b1 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Maintennance.md @@ -0,0 +1,372 @@ + + + +# Maintennance + +## Explain/Explain Analyze Statements + +The purpose of query analysis is to assist users in understanding the execution mechanism and performance bottlenecks of queries, thereby facilitating query optimization and performance enhancement. This is crucial not only for the efficiency of query execution but also for the user experience of applications and the efficient utilization of resources. For effective query analysis, IoTDB versions V1.3.2 and above offer the query analysis statements: Explain and Explain Analyze. + +- Explain Statement: The Explain statement allows users to preview the execution plan of a query SQL, including how IoTDB organizes data retrieval and processing. + +- Explain Analyze Statement: The Explain Analyze statement builds upon the Explain statement by incorporating performance analysis, fully executing the SQL, and displaying the time and resource consumption during the query execution process. This provides IoTDB users with detailed information to deeply understand the details of the query and to perform query optimization. 
Compared to other common IoTDB troubleshooting methods, Explain Analyze imposes no deployment burden and can analyze a single SQL statement, which can better pinpoint issues. + +The comparison of various methods is as follows: + +| Method | Installation Difficulty | Business Impact | Functional Scope | +| :------------------ | :----------------------------------------------------------- | :------------------------------------------------ | :----------------------------------------------------------- | +| Explain Analyze Statement | Low. No additional components are needed; it's a built-in SQL statement of IoTDB. | Low. It only affects the single query being analyzed, with no impact on other online loads. | Supports distributed systems, and can track a single SQL statement. | +| Monitoring Panel | Medium. Requires the installation of the IoTDB monitoring panel tool (an enterprise version tool) and the activation of the IoTDB monitoring service. | Medium. The IoTDB monitoring service's recording of metrics will introduce additional latency. | Supports distributed systems, but only analyzes the overall query load and time consumption of the database. | +| Arthas Sampling | Medium. Requires the installation of the Java Arthas tool (Arthas cannot be directly installed in some intranets, and sometimes a restart of the application is needed after installation). | High. CPU sampling may affect the response speed of online business. | Does not support distributed systems and only analyzes the overall query load and time consumption of the database. | + +### Explain Statement + +#### Syntax + +The Explain command enables users to view the execution plan of a SQL query. The execution plan is presented in the form of operators, describing how IoTDB will execute the query. The syntax is as follows, where SELECT_STATEMENT is the SQL statement related to the query: + +```SQL +EXPLAIN +``` + +The results returned by Explain include information such as data access strategies, whether filter conditions are pushed down, and how the query plan is distributed across different nodes, providing users with a means to visualize the internal execution logic of the query. + +#### Example + +```SQL + +# Insert data + +insert into root.explain.data(timestamp, column1, column2) values(1710494762, "hello", "explain") + +# Execute explain statement + +explain select * from root.explain.data +``` + +Executing the above SQL will yield the following results. It is evident that IoTDB uses two SeriesScan nodes to retrieve the data for column1 and column2, and finally connects them through a fullOuterTimeJoin. + +```Plain ++-----------------------------------------------------------------------+ +| distribution plan| ++-----------------------------------------------------------------------+ +| ┌───────────────────┐ | +| │FullOuterTimeJoin-3│ | +| │Order: ASC │ | +| └───────────────────┘ | +| ┌─────────────────┴─────────────────┐ | +| │ │ | +|┌─────────────────────────────────┐ ┌─────────────────────────────────┐| +|│SeriesScan-4 │ │SeriesScan-5 │| +|│Series: root.explain.data.column1│ │Series: root.explain.data.column2│| +|│Partition: 3 │ │Partition: 3 │| +|└─────────────────────────────────┘ └─────────────────────────────────┘| ++-----------------------------------------------------------------------+ +``` + +### Explain Analyze Statement + +#### Syntax + +Explain Analyze is a performance analysis SQL that comes with the IoTDB query engine. 
Unlike Explain, it executes the corresponding query plan and collects execution information, which can be used to track the specific performance distribution of a query, to observe resources, to tune performance, and to analyze anomalies. The syntax is as follows:
+
+```SQL
+EXPLAIN ANALYZE [VERBOSE] <SELECT_STATEMENT>
+```
+
+Where SELECT_STATEMENT corresponds to the query statement that needs to be analyzed; VERBOSE prints detailed analysis results, and when VERBOSE is omitted, EXPLAIN ANALYZE leaves out some information.
+
+In the EXPLAIN ANALYZE result set, the following information is included:
+
+![explain-analyze-1.png](https://alioss.timecho.com/upload/explain-analyze-1.png)
+
+- QueryStatistics contains statistics at the query level, mainly including the time spent in the planning and parsing phases, Fragment metadata, and other information.
+- FragmentInstance is IoTDB's encapsulation of the query plan on a single node. Each node outputs one Fragment entry in the result set, mainly consisting of FragmentStatistics and operator information. FragmentStatistics contains statistics for the Fragment, including the total actual time (wall time), the TsFiles involved, scheduling information, and so on. The statistics of the plan nodes under the Fragment are displayed hierarchically as a node tree, mainly including: CPU running time, number of output data rows, number of calls to the specified interface, memory occupied, and node-specific custom information.
+
+#### Special Instructions
+
+1. Simplification of Explain Analyze Statement Results
+
+Since a Fragment outputs the information of every plan node executed on the current node, a query that involves too many series would make the result set returned by Explain Analyze too large. Therefore, when more than 10 nodes of the same type appear, the system automatically merges all nodes of the same type under the current Fragment, and the merged statistics are accumulated. Custom information that cannot be merged is discarded (as shown in the figure below).
+
+![explain-analyze-2.png](https://alioss.timecho.com/upload/explain-analyze-2.png)
+
+Users can also modify the configuration item `merge_threshold_of_explain_analyze` in `iotdb-system.properties` to set the threshold that triggers node merging. This parameter supports hot loading (a usage sketch follows these special instructions).
+
+2. Use of Explain Analyze Statement in Query Timeout Scenarios
+
+Explain Analyze is itself a special query, so when its execution times out it cannot return analysis results directly. To allow the cause of a timeout to be investigated from the analysis results anyway, Explain Analyze also provides a timed logging mechanism (no user configuration is required), which periodically writes the current Explain Analyze results as text to a dedicated log. When a query times out, users can check `logs/log_explain_analyze.log` for the corresponding records.
+
+The time interval of the log is calculated from the query timeout so that at least two result records are saved before the timeout. 
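+
+As a sketch of how such a hot-loadable property can be applied without a restart (it is assumed here that `merge_threshold_of_explain_analyze` is picked up by the generic `load configuration` statement, and the value `30` is purely illustrative):
+
+```SQL
+-- 1. Edit iotdb-system.properties on the DataNode, for example:
+--      merge_threshold_of_explain_analyze=30
+-- 2. Reload the hot-modifiable configuration items without restarting the node:
+load configuration
+```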
+ +#### Example + +Here is an example of Explain Analyze: + +```SQL + +# Insert data + +insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494762, "hello", "explain", "analyze") +insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494862, "hello2", "explain2", "analyze2") +insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494962, "hello3", "explain3", "analyze3") + +# Execute explain analyze statement + +explain analyze select column2 from root.explain.analyze.data order by column1 +``` + +The output is as follows: + + +```Plain ++-------------------------------------------------------------------------------------------------+ +| Explain Analyze| ++-------------------------------------------------------------------------------------------------+ +|Analyze Cost: 1.739 ms | +|Fetch Partition Cost: 0.940 ms | +|Fetch Schema Cost: 0.066 ms | +|Logical Plan Cost: 0.000 ms | +|Logical Optimization Cost: 0.000 ms | +|Distribution Plan Cost: 0.000 ms | +|Fragment Instances Count: 1 | +| | +|FRAGMENT-INSTANCE[Id: 20240315_115800_00030_1.2.0][IP: 127.0.0.1][DataRegion: 4][State: FINISHED]| +| Total Wall Time: 25 ms | +| Cost of initDataQuerySource: 0.175 ms | +| Seq File(unclosed): 0, Seq File(closed): 1 | +| UnSeq File(unclosed): 0, UnSeq File(closed): 0 | +| ready queued time: 0.280 ms, blocked queued time: 2.456 ms | +| [PlanNodeId 10]: IdentitySinkNode(IdentitySinkOperator) | +| CPU Time: 0.780 ms | +| output: 1 rows | +| HasNext() Called Count: 3 | +| Next() Called Count: 2 | +| Estimated Memory Size: : 1245184 | +| [PlanNodeId 5]: TransformNode(TransformOperator) | +| CPU Time: 0.764 ms | +| output: 1 rows | +| HasNext() Called Count: 3 | +| Next() Called Count: 2 | +| Estimated Memory Size: : 1245184 | +| [PlanNodeId 4]: SortNode(SortOperator) | +| CPU Time: 0.721 ms | +| output: 1 rows | +| HasNext() Called Count: 3 | +| Next() Called Count: 2 | +| sortCost/ns: 1125 | +| sortedDataSize: 272 | +| prepareCost/ns: 610834 | +| [PlanNodeId 3]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) | +| CPU Time: 0.706 ms | +| output: 1 rows | +| HasNext() Called Count: 5 | +| Next() Called Count: 1 | +| [PlanNodeId 7]: SeriesScanNode(SeriesScanOperator) | +| CPU Time: 1.085 ms | +| output: 1 rows | +| HasNext() Called Count: 2 | +| Next() Called Count: 1 | +| SeriesPath: root.explain.analyze.data.column2 | +| [PlanNodeId 8]: SeriesScanNode(SeriesScanOperator) | +| CPU Time: 1.091 ms | +| output: 1 rows | +| HasNext() Called Count: 2 | +| Next() Called Count: 1 | +| SeriesPath: root.explain.analyze.data.column1 | ++-------------------------------------------------------------------------------------------------+ +``` + +Example of Partial Results After Triggering Merge: + +```Plain +Analyze Cost: 143.679 ms +Fetch Partition Cost: 22.023 ms +Fetch Schema Cost: 63.086 ms +Logical Plan Cost: 0.000 ms +Logical Optimization Cost: 0.000 ms +Distribution Plan Cost: 0.000 ms +Fragment Instances Count: 2 + +FRAGMENT-INSTANCE[Id: 20240311_041502_00001_1.2.0][IP: 192.168.130.9][DataRegion: 14] + Total Wall Time: 39964 ms + Cost of initDataQuerySource: 1.834 ms + Seq File(unclosed): 0, Seq File(closed): 3 + UnSeq File(unclosed): 0, UnSeq File(closed): 0 + ready queued time: 504.334 ms, blocked queued time: 25356.419 ms + [PlanNodeId 20793]: IdentitySinkNode(IdentitySinkOperator) Count: * 1 + CPU Time: 24440.724 ms + input: 71216 rows + HasNext() Called Count: 35963 + Next() Called Count: 35962 + Estimated 
Memory Size: : 33882112 + [PlanNodeId 10385]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) Count: * 8 + CPU Time: 41437.708 ms + input: 243011 rows + HasNext() Called Count: 41965 + Next() Called Count: 41958 + Estimated Memory Size: : 33882112 + [PlanNodeId 11569]: SeriesScanNode(SeriesScanOperator) Count: * 1340 + CPU Time: 1397.822 ms + input: 134000 rows + HasNext() Called Count: 2353 + Next() Called Count: 1340 + Estimated Memory Size: : 32833536 + [PlanNodeId 20778]: ExchangeNode(ExchangeOperator) Count: * 7 + CPU Time: 109.245 ms + input: 71891 rows + HasNext() Called Count: 1431 + Next() Called Count: 1431 + +FRAGMENT-INSTANCE[Id: 20240311_041502_00001_1.3.0][IP: 192.168.130.9][DataRegion: 11] + Total Wall Time: 39912 ms + Cost of initDataQuerySource: 15.439 ms + Seq File(unclosed): 0, Seq File(closed): 2 + UnSeq File(unclosed): 0, UnSeq File(closed): 0 + ready queued time: 152.988 ms, blocked queued time: 37775.356 ms + [PlanNodeId 20786]: IdentitySinkNode(IdentitySinkOperator) Count: * 1 + CPU Time: 2020.258 ms + input: 48800 rows + HasNext() Called Count: 978 + Next() Called Count: 978 + Estimated Memory Size: : 42336256 + [PlanNodeId 20771]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) Count: * 8 + CPU Time: 5255.307 ms + input: 195800 rows + HasNext() Called Count: 2455 + Next() Called Count: 2448 + Estimated Memory Size: : 42336256 + [PlanNodeId 11867]: SeriesScanNode(SeriesScanOperator) Count: * 1680 + CPU Time: 1248.080 ms + input: 168000 rows + HasNext() Called Count: 3198 + Next() Called Count: 1680 + Estimated Memory Size: : 41287680 + +...... +``` + + +### Common Issues + +#### What is the difference between WALL TIME and CPU TIME? + +CPU time, also known as processor time or CPU usage time, refers to the actual time the CPU is occupied with computation during the execution of a program, indicating the actual consumption of processor resources by the program. + +Wall time, also known as real time or physical time, refers to the total time from the start to the end of a program's execution, including all waiting times. + +1. Scenarios where WALL TIME < CPU TIME: For example, a query slice is finally executed in parallel by the scheduler using two threads. In the real physical world, 10 seconds have passed, but the two threads may have occupied two CPU cores and run for 10 seconds each, so the CPU time would be 20 seconds, while the wall time would be 10 seconds. + +2. Scenarios where WALL TIME > CPU TIME: Since there may be multiple queries running in parallel within the system, but the number of execution threads and memory is fixed, + 1. So when a query slice is blocked by some resources (such as not having enough memory for data transfer or waiting for upstream data), it will be put into the Blocked Queue. At this time, the query slice will not occupy CPU time, but the WALL TIME (real physical time) is still advancing. + 2. Or when the query thread resources are insufficient, for example, there are currently 16 query threads in total, but there are 20 concurrent query slices within the system. Even if all queries are not blocked, only 16 query slices can run in parallel at the same time, and the other four will be put into the READY QUEUE, waiting to be scheduled for execution. At this time, the query slice will not occupy CPU time, but the WALL TIME (real physical time) is still advancing. + +#### Is there any additional overhead with Explain Analyze, and is the measured time different from when the query is actually executed? 
+
+Almost none. The statistics collected by explain analyze are produced by the original query anyway (they are simply never fetched when explain analyze is not used), and the collection itself is done by a separate thread. In addition, explain analyze only performs a plain next() traversal of the result set without printing it, so the measured time does not differ significantly from the actual execution time of the original query.
+
+#### What are the main indicators to focus on for IO time consumption?
+
+The main indicators related to IO time consumption are loadTimeSeriesMetadataDiskSeqTime, loadTimeSeriesMetadataDiskUnSeqTime, and construct[NonAligned/Aligned]ChunkReadersDiskTime.
+
+The loading time of TimeSeriesMetadata is reported separately for sequential (seq) and out-of-order (unseq) files, while chunk reading is not yet split this way; the seq/unseq proportion of chunk reads can be estimated from the corresponding TimeSeriesMetadata proportions.
+
+#### Can the impact of out-of-order data on query performance be demonstrated with some indicators?
+
+Out-of-order data affects queries in two main ways:
+
+1. An additional merge sort has to be performed in memory (this cost is usually small, since it is a pure in-memory CPU operation)
+
+2. Out-of-order data produces overlapping time ranges between data blocks, which makes their statistics unusable
+   1. Chunks that do not satisfy a value filter can no longer be skipped directly based on their statistics
+      1. In practice, most user queries only contain time filters, so this usually has no impact
+   2. Aggregate values can no longer be computed directly from the statistics without reading the data
+
+At present there is no effective way to observe the performance impact of out-of-order data in isolation; the only option is to execute a query while out-of-order data exists, execute it again after the out-of-order data has been compacted, and compare the two.
+
+Note that even after out-of-order data has been merged into the sequential space, it still has to be read, decompressed, and decoded; that part of the cost does not disappear simply because the data has been compacted.
+
+#### Why is there no output in the log_explain_analyze.log when the query times out during the execution of explain analyze?
+
+This happens when, during an upgrade, only the lib package was replaced but conf/logback-datanode.xml was not. Replace this file as well; there is no need to restart, because its content can be hot loaded. After waiting for about 1 minute, re-execute explain analyze verbose.
+
+
+### Practical Case Studies
+
+#### Case Study 1: The query involves too many files, and disk IO becomes a bottleneck, causing the query speed to slow down.
+
+![explain-analyze-3.png](https://alioss.timecho.com/upload/explain-analyze-3.png)
+
+The total query time is 938 ms, of which reading the index area and data area from the files accounts for 918 ms, involving a total of 289 files. Assuming the query involves N TsFiles, the theoretical cost of the first query (before any cache is hit) is cost = N * (t_seek + t_index + t_seek + t_chunk). From experience, a single seek on an HDD takes about 5-10 ms, so the more files a query involves, the higher the query latency.
+
+The final optimization plan is:
+
+1. Adjust the merge parameters to reduce the number of files
+
+2. 
Replace HDD with SSD to reduce the latency of a single disk IO
+
+
+#### Case Study 2: The execution of the like predicate is slow, causing the query to time out
+
+When executing the following SQL, the query times out (the default timeout is 60 seconds):
+
+```SQL
+select count(s1) as total from root.db.d1 where s1 like '%XXXXXXXX%'
+```
+
+When executing explain analyze verbose, the intermediate collected results are written to log_explain_analyze.log every 15 seconds even if the query times out. The last two outputs obtained from log_explain_analyze.log are as follows:
+
+![explain-analyze-4.png](https://alioss.timecho.com/upload/explain-analyze-4.png)
+
+![explain-analyze-5.png](https://alioss.timecho.com/upload/explain-analyze-5.png)
+
+
+From the results we can see that the query has no time filter and therefore touches too much data: constructAlignedChunkReadersDiskTime and pageReadersDecodeAlignedDiskTime keep increasing, which means new chunks are being read all the time. However, the output of AlignedSeriesScanNode stays at 0, because the operator only yields its time slice and updates its statistics after at least one row satisfying the condition has been produced. Looking at the total read time (loadTimeSeriesMetadataAlignedDiskSeqTime + loadTimeSeriesMetadataAlignedDiskUnSeqTime + constructAlignedChunkReadersDiskTime + pageReadersDecodeAlignedDiskTime = about 13.4 seconds), the remaining time (60 s - 13.4 s = 46.6 s) is spent evaluating the filter condition (the execution of the like predicate is very time-consuming).
+
+The final optimization plan is: add a time filtering condition to avoid a full scan.
+
+
+## Start/Stop Repair Data Statements
+Used to repair unsorted data generated by system bugs.
+### START REPAIR DATA
+
+Start a repair task that scans all files created before the current time.
+The repair task scans all TsFiles and repairs the bad ones.
+
+```sql
+IoTDB> START REPAIR DATA
+IoTDB> START REPAIR DATA ON LOCAL
+IoTDB> START REPAIR DATA ON CLUSTER
+```
+
+### STOP REPAIR DATA
+
+Stop the running repair task. A stopped repair task can later be restarted, resuming its previous progress, by executing the SQL statement `START REPAIR DATA`.
+
+```sql
+IoTDB> STOP REPAIR DATA
+IoTDB> STOP REPAIR DATA ON LOCAL
+IoTDB> STOP REPAIR DATA ON CLUSTER
+```
diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Streaming_apache.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Streaming_apache.md
new file mode 100644
index 00000000..bd1711a4
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Streaming_apache.md
@@ -0,0 +1,804 @@
+
+
+# Stream Processing
+
+The IoTDB stream processing framework allows users to implement customized stream processing logic, which can monitor and capture storage engine changes, transform changed data, and push transformed data outward.
+
+We call a data flow processing task a Pipe. A stream processing task (Pipe) contains three subtasks:
+
+- Source task
+- Processor task
+- Sink task
+
+The stream processing framework allows users to customize the processing logic of these three subtasks in Java and process data in a UDF-like manner.
+In a Pipe, the three subtasks mentioned above are executed and implemented by three types of plugins. 
Data flows through these three plugins sequentially for processing:
+Pipe Source is used to extract data, Pipe Processor is used to process data, Pipe Sink is used to send data, and the final data will be sent to an external system.
+
+**The model for a Pipe task is as follows:**
+
+![pipe.png](https://alioss.timecho.com/docs/img/1706778988482.jpg)
+
+A data stream processing task essentially describes the attributes of the Pipe Source, Pipe Processor, and Pipe Sink plugins.
+
+Users can configure the specific attributes of these three subtasks declaratively using SQL statements. By combining different attributes, flexible data ETL (Extract, Transform, Load) capabilities can be achieved.
+
+Using the stream processing framework, it is possible to build a complete data pipeline to fulfill various requirements such as *edge-to-cloud synchronization, remote disaster recovery, and read/write load balancing across multiple databases*.
+
+## Custom Stream Processing Plugin Development
+
+### Programming development dependencies
+
+It is recommended to use Maven to build the project. Add the following dependencies in the `pom.xml` file. Please make sure to choose dependencies with the same version as the IoTDB server version.
+
+```xml
+<dependency>
+    <groupId>org.apache.iotdb</groupId>
+    <artifactId>pipe-api</artifactId>
+    <version>1.3.1</version>
+    <scope>provided</scope>
+</dependency>
+```
+
+### Event-Driven Programming Model
+
+The design of user programming interfaces for stream processing plugins follows the principles of the event-driven programming model. In this model, events serve as the abstraction of data in the user programming interface. The programming interface is decoupled from the specific execution method, allowing the focus to be on describing how the system expects events (data) to be processed upon arrival.
+
+In the user programming interface of stream processing plugins, events abstract the write operations of database data. Events are captured by the local stream processing engine and passed sequentially through the three stages of stream processing, namely Pipe Source, Pipe Processor, and Pipe Sink plugins. User logic is triggered and executed within these three plugins.
+
+To accommodate both low-latency stream processing in low-load scenarios and high-throughput stream processing in high-load scenarios at the edge, the stream processing engine dynamically chooses the processing objects from operation logs and data files. Therefore, the user programming interface for stream processing requires the user to provide the handling logic for two types of events: TabletInsertionEvent for operation log write events and TsFileInsertionEvent for data file write events.
+
+#### **TabletInsertionEvent**
+
+The TabletInsertionEvent is a high-level data abstraction for user write requests, which provides the ability to manipulate the underlying data of the write request by providing a unified operation interface.
+
+For different database deployments, the underlying storage structure corresponding to the operation log write event is different. For stand-alone deployment scenarios, the operation log write event is an encapsulation of write-ahead log (WAL) entries; for distributed deployment scenarios, the operation log write event is an encapsulation of individual node consensus protocol operation log entries. 
+ +For write operations generated by different write request interfaces of the database, the data structure of the request structure corresponding to the operation log write event is also different.IoTDB provides many write interfaces such as InsertRecord, InsertRecords, InsertTablet, InsertTablets, and so on, and each kind of write request uses a completely different serialisation method to generate a write request. completely different serialisation methods and generate different binary entries. + +The existence of operation log write events provides users with a unified view of data operations, which shields the implementation differences of the underlying data structures, greatly reduces the programming threshold for users, and improves the ease of use of the functionality. + +```java +/** TabletInsertionEvent is used to define the event of data insertion. */ +public interface TabletInsertionEvent extends Event { + + /** + * The consumer processes the data row by row and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processRowByRow(BiConsumer consumer); + + /** + * The consumer processes the Tablet directly and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processTablet(BiConsumer consumer); +} +``` + +#### **TsFileInsertionEvent** + +The TsFileInsertionEvent represents a high-level abstraction of the database's disk flush operation and is a collection of multiple TabletInsertionEvents. + +IoTDB's storage engine is based on the LSM (Log-Structured Merge) structure. When data is written, the write operations are first flushed to log-structured files, while the written data is also stored in memory. When the memory reaches its capacity limit, a flush operation is triggered, converting the data in memory into a database file while deleting the previously written log entries. During the conversion from memory data to database file data, two compression processes, encoding compression and universal compression, are applied. As a result, the data in the database file occupies less space compared to the original data in memory. + +In extreme network conditions, directly transferring data files is more cost-effective than transmitting individual write operations. It consumes lower network bandwidth and achieves faster transmission speed. However, there is no such thing as a free lunch. Performing calculations on data in the disk file incurs additional costs for file I/O compared to performing calculations directly on data in memory. Nevertheless, the coexistence of disk data files and memory write operations permits dynamic trade-offs and adjustments. It is based on this observation that the data file write event is introduced into the event model of the plugin. + +In summary, the data file write event appears in the event stream of stream processing plugins in the following two scenarios: + +1. Historical data extraction: Before a stream processing task starts, all persisted write data exists in the form of TsFiles. When collecting historical data at the beginning of a stream processing task, the historical data is abstracted as TsFileInsertionEvent. + +2. 
Real-time data extraction: During the execution of a stream processing task, if the speed of processing the log entries representing real-time operations is slower than the rate of write requests, the unprocessed log entries will be persisted to disk in the form of TsFiles. When these data are extracted by the stream processing engine, they are abstracted as TsFileInsertionEvent. + +```java +/** + * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks, + * which is compressed and encoded, and requires IO cost for computational processing. + */ +public interface TsFileInsertionEvent extends Event { + + /** + * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents. + * + * @return {@code Iterable} the list of TabletInsertionEvent + */ + Iterable toTabletInsertionEvents(); +} +``` + +### Custom Stream Processing Plugin Programming Interface Definition + +Based on the custom stream processing plugin programming interface, users can easily write data extraction plugins, data processing plugins, and data sending plugins, allowing the stream processing functionality to adapt flexibly to various industrial scenarios. +#### Data Extraction Plugin Interface + +Data extraction is the first stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data extraction plugin (PipeSource) serves as a bridge between the stream processing engine and the storage engine. It captures various data write events by listening to the behavior of the storage engine. +```java +/** + * PipeSource + * + *

PipeSource is responsible for capturing events from sources. + * + *

Various data sources can be supported by implementing different PipeSource classes. + * + *

The lifecycle of a PipeSource is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH SOURCE` clause in SQL are + * parsed and the validation method {@link PipeSource#validate(PipeParameterValidator)} will + * be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} will be called to + * config the runtime behavior of the PipeSource. + *
  • Then the method {@link PipeSource#start()} will be called to start the PipeSource. + *
  • While the collaboration task is in progress, the method {@link PipeSource#supply()} will be + * called to capture events from sources and then the events will be passed to the + * PipeProcessor. + *
  • The method {@link PipeSource#close()} will be called when the collaboration task is + * cancelled (the `DROP PIPE` command is executed). + *
+ */ +public interface PipeSource extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeSource. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeSourceRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link PipeSource#validate(PipeParameterValidator)} + * is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeSource + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeSourceRuntimeConfiguration configuration) + throws Exception; + + /** + * Start the source. After this method is called, events should be ready to be supplied by + * {@link PipeSource#supply()}. This method is called after {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. + * + * @throws Exception the user can throw errors if necessary + */ + void start() throws Exception; + + /** + * Supply single event from the source and the caller will send the event to the processor. + * This method is called after {@link PipeSource#start()} is called. + * + * @return the event to be supplied. the event may be null if the source has no more events at + * the moment, but the source is still running for more events. + * @throws Exception the user can throw errors if necessary + */ + Event supply() throws Exception; +} +``` + +#### Data Processing Plugin Interface + +Data processing is the second stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data processing plugin (PipeProcessor) is primarily used for filtering and transforming the various events captured by the data extraction plugin (PipeSource). + +```java +/** + * PipeProcessor + * + *

PipeProcessor is used to filter and transform the Event formed by the PipeSource. + * + *

The lifecycle of a PipeProcessor is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are + * parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called + * to config the runtime behavior of the PipeProcessor. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeSource captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeSink. The + * following 3 methods will be called: {@link + * PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link + * PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link + * PipeProcessor#process(Event, EventCollector)}. + *
    • PipeSink serializes the events into binaries and send them to sinks. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeProcessor#close() } method will be called. + *
+ */ +public interface PipeProcessor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeProcessor. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeProcessorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the + * events processing. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeProcessor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is called to process the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) + throws Exception; + + /** + * This method is called to process the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + default void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) + throws Exception { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + process(tabletInsertionEvent, eventCollector); + } + } + + /** + * This method is called to process the Event. + * + * @param event Event to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(Event event, EventCollector eventCollector) throws Exception; +} +``` + +#### Data Sending Plugin Interface + +Data sending is the third stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data sending plugin (PipeSink) is responsible for sending the various events processed by the data processing plugin (PipeProcessor). It serves as the network implementation layer of the stream processing framework and should support multiple real-time communication protocols and connectors in its interface. + +```java +/** + * PipeSink + * + *

PipeSink is responsible for sending events to sinks. + * + *

Various network protocols can be supported by implementing different PipeSink classes. + * + *

The lifecycle of a PipeSink is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH SINK` clause in SQL are + * parsed and the validation method {@link PipeSink#validate(PipeParameterValidator)} will be + * called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link PipeSink#customize(PipeParameters, + * PipeSinkRuntimeConfiguration)} will be called to configure the runtime behavior of the + * PipeSink and the method {@link PipeSink#handshake()} will be called to create a connection + * with sink. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeSource captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeSink. + *
    • PipeSink serializes the events into binaries and send them to sinks. The following 3 + * methods will be called: {@link PipeSink#transfer(TabletInsertionEvent)}, {@link + * PipeSink#transfer(TsFileInsertionEvent)} and {@link PipeSink#transfer(Event)}. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeSink#close() } method will be called. + *
+ * + *

In addition, the method {@link PipeSink#heartbeat()} will be called periodically to check + * whether the connection with sink is still alive. The method {@link PipeSink#handshake()} will be + * called to create a new connection with the sink when the method {@link PipeSink#heartbeat()} + * throws exceptions. + */ +public interface PipeSink extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeSink. In this method, the user can do the following + * things: + * + *

    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeSinkRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link PipeSink#validate(PipeParameterValidator)} is + * called and before the method {@link PipeSink#handshake()} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeSink + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeSinkRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is used to create a connection with sink. This method will be called after the + * method {@link PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called or + * will be called when the method {@link PipeSink#heartbeat()} throws exceptions. + * + * @throws Exception if the connection is failed to be created + */ + void handshake() throws Exception; + + /** + * This method will be called periodically to check whether the connection with sink is still + * alive. + * + * @throws Exception if the connection dies + */ + void heartbeat() throws Exception; + + /** + * This method is used to transfer the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; + + /** + * This method is used to transfer the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + default void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception { + try { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + transfer(tabletInsertionEvent); + } + } finally { + tsFileInsertionEvent.close(); + } + } + + /** + * This method is used to transfer the generic events, including HeartbeatEvent. + * + * @param event Event to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(Event event) throws Exception; +} +``` + +## Custom Stream Processing Plugin Management + +To ensure the flexibility and usability of user-defined plugins in production environments, the system needs to provide the capability to dynamically manage plugins. This section introduces the management statements for stream processing plugins, which enable the dynamic and unified management of plugins. + +### Load Plugin Statement + +In IoTDB, to dynamically load a user-defined plugin into the system, you first need to implement a specific plugin class based on PipeSource, PipeProcessor, or PipeSink. Then, you need to compile and package the plugin class into an executable jar file. Finally, you can use the loading plugin management statement to load the plugin into IoTDB. + +The syntax of the loading plugin management statement is as follows: + +```sql +CREATE PIPEPLUGIN [IF NOT EXISTS] +AS +USING +``` + +**IF NOT EXISTS semantics**: Used in creation operations to ensure that the create command is executed when the specified Pipe Plugin does not exist, preventing errors caused by attempting to create an existing Pipe Plugin. 
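+
+Before the loading example below, here is a minimal, hypothetical sketch of what such a PipeProcessor implementation might look like. The class name is illustrative, the imports (omitted here) come from the `pipe-api` dependency introduced earlier, and the sketch simply forwards every event downstream unchanged via the EventCollector, mirroring the behavior described for the built-in do-nothing processor:
+
+```java
+// Illustrative sketch only; the exact package paths of the pipe-api classes should be
+// checked against the pipe-api jar that matches your IoTDB version.
+public class ExampleProcessor implements PipeProcessor {
+
+  @Override
+  public void validate(PipeParameterValidator validator) throws Exception {
+    // This sketch declares no custom parameters, so nothing needs validating.
+  }
+
+  @Override
+  public void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration)
+      throws Exception {
+    // User-supplied key-value attributes from the WITH PROCESSOR clause can be read here.
+  }
+
+  @Override
+  public void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector)
+      throws Exception {
+    // Forward the insertion event downstream without modification.
+    eventCollector.collect(tabletInsertionEvent);
+  }
+
+  @Override
+  public void process(Event event, EventCollector eventCollector) throws Exception {
+    // Generic events (e.g. heartbeats) are also forwarded as-is.
+    eventCollector.collect(event);
+  }
+
+  @Override
+  public void close() throws Exception {
+    // Release any resources acquired in customize(); nothing to do in this sketch.
+  }
+}
+```
+
+Such a class is compiled and packaged into a jar before being registered with the statement above.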
+ +For example, if a user implements a data processing plugin with the fully qualified class name "edu.tsinghua.iotdb.pipe.ExampleProcessor" and packages it into a jar file, which is stored at "https://example.com:8080/iotdb/pipe-plugin.jar", and the user wants to use this plugin in the stream processing engine, marking the plugin as "example". The creation statement for this data processing plugin is as follows: + +```sql +CREATE PIPEPLUGIN IF NOT EXISTS example +AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' +USING URI '' +``` + +### Delete Plugin Statement + +When user no longer wants to use a plugin and needs to uninstall the plugin from the system, you can use the Remove plugin statement as shown below. +```sql +DROP PIPEPLUGIN [IF EXISTS] +``` + +**IF EXISTS semantics**: Used in deletion operations to ensure that when a specified Pipe Plugin exists, the delete command is executed to prevent errors caused by attempting to delete a non-existent Pipe Plugin. + +### Show Plugin Statement + +User can also view the plugin in the system on need. The statement to view plugin is as follows. +```sql +SHOW PIPEPLUGINS +``` + +## System Pre-installed Stream Processing Plugin + +### Pre-built Source Plugin + +#### iotdb-source + +Function: Extract historical or realtime data inside IoTDB into pipe. + + +| key | value | value range | required or optional with default | +|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|-----------------------------------| +| source | iotdb-source | String: iotdb-source | required | +| source.pattern | path prefix for filtering time series | String: any time series prefix | optional: root | +| source.history.start-time | start of synchronizing historical data event time,including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| source.history.end-time | end of synchronizing historical data event time,including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| start-time(V1.3.1+) | start of synchronizing all data event time,including start-time. Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| end-time(V1.3.1+) | end of synchronizing all data event time,including end-time. Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | + +> 🚫 **source.pattern Parameter Description** +> +> * Pattern should use backquotes to modify illegal characters or illegal path nodes, for example, if you want to filter root.\`a@b\` or root.\`123\`, you should set the pattern to root.\`a@b\` or root.\`123\`(Refer specifically to [Timing of single and double quotes and backquotes](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * In the underlying implementation, when pattern is detected as root (default value) or a database name, synchronization efficiency is higher, and any other format will reduce performance. +> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'source.pattern'='root.aligned.1': + > + > * root.aligned.1TS +> * root.aligned.1TS.\`1\` +> * root.aligned.100TS + > + > the data will be synchronized; + > + > * root.aligned.\`123\` + > + > the data will not be synchronized. 
+ +> ❗️**start-time, end-time parameter description of source** +> +> * start-time, end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00. However, version 1.3.1+ supports timeStamp format like 1706704494000. + +> ✅ **A piece of data from production to IoTDB contains two key concepts of time** +> +> * **event time:** the time when the data is actually produced (or the generation time assigned to the data by the data production system, which is a time item in the data point), also called the event time. +> * **arrival time:** the time the data arrived in the IoTDB system. +> +> The out-of-order data we often refer to refers to data whose **event time** is far behind the current system time (or the maximum **event time** that has been dropped) when the data arrives. On the other hand, whether it is out-of-order data or sequential data, as long as they arrive newly in the system, their **arrival time** will increase with the order in which the data arrives at IoTDB. + +> 💎 **the work of iotdb-source can be split into two stages** +> +> 1. Historical data extraction: All data with **arrival time** < **current system time** when creating the pipe is called historical data +> 2. Realtime data extraction: All data with **arrival time** >= **current system time** when the pipe is created is called realtime data +> +> The historical data transmission phase and the realtime data transmission phase are executed serially. Only when the historical data transmission phase is completed, the realtime data transmission phase is executed.** + +### Pre-built Processor Plugin + +#### do-nothing-processor + +Function: Do not do anything with the events passed in by the source. + + +| key | value | value range | required or optional with default | +|-----------|----------------------|------------------------------|-----------------------------------| +| processor | do-nothing-processor | String: do-nothing-processor | required | +### Pre-built Sink Plugin + +#### do-nothing-sink + +Function: Does not do anything with the events passed in by the processor. 
+ + +| key | value | value range | required or optional with default | +|------|-----------------|-------------------------|-----------------------------------| +| sink | do-nothing-sink | String: do-nothing-sink | required | + +## Stream Processing Task Management + +### Create Stream Processing Task + +A stream processing task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: + +```sql +CREATE PIPE -- PipeId is the name that uniquely identifies the sync task +WITH SOURCE ( + -- Default IoTDB Data Extraction Plugin + 'source' = 'iotdb-source', + -- Path prefix, only data that can match the path prefix will be extracted for subsequent processing and delivery + 'source.pattern' = 'root.timecho', + -- Whether to extract historical data + 'source.history.enable' = 'true', + -- Describes the time range of the historical data being extracted, indicating the earliest possible time + 'source.history.start-time' = '2011.12.03T10:15:30+01:00', + -- Describes the time range of the extracted historical data, indicating the latest time + 'source.history.end-time' = '2022.12.03T10:15:30+01:00', + -- Whether to extract realtime data + 'source.realtime.enable' = 'true', +) +WITH PROCESSOR ( + -- Default data processing plugin, means no processing + 'processor' = 'do-nothing-processor', +) +WITH SINK ( + -- IoTDB data sending plugin with target IoTDB + 'sink' = 'iotdb-thrift-sink', + -- Data service for one of the DataNode nodes on the target IoTDB ip + 'sink.ip' = '127.0.0.1', + -- Data service port of one of the DataNode nodes of the target IoTDB + 'sink.port' = '6667', +) +``` + +**To create a stream processing task it is necessary to configure the PipeId and the parameters of the three plugin sections:** + + +| configuration item | description | Required or not | default implementation | Default implementation description | Whether to allow custom implementations | +|--------------------|-------------------------------------------------------------------------------------|---------------------------------|------------------------|-----------------------------------------------------------------------------------------------|-----------------------------------------| +| pipeId | Globally uniquely identifies the name of a sync task | required | - | - | - | +| source | pipe Source plugin, for extracting synchronized data at the bottom of the database | Optional | iotdb-source | Integrate all historical data of the database and subsequent realtime data into the sync task | no | +| processor | Pipe Processor plugin, for processing data | Optional | do-nothing-processor | no processing of incoming data | yes | +| sink | Pipe Sink plugin,for sending data | required | - | - | yes | + +In the example, the iotdb-source, do-nothing-processor, and iotdb-thrift-sink plugins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plugins, **see the section "System pre-built data synchronisation plugins" **. See the "System Pre-installed Stream Processing Plugin" section**. + +**An example of a minimalist CREATE PIPE statement is as follows:** + +```sql +CREATE PIPE -- PipeId is a name that uniquely identifies the task. 
+WITH SINK ( + -- IoTDB data sending plugin with target IoTDB + 'sink' = 'iotdb-thrift-sink', + -- Data service for one of the DataNode nodes on the target IoTDB ip + 'sink.ip' = '127.0.0.1', + -- Data service port of one of the DataNode nodes of the target IoTDB + 'sink.port' = '6667', +) +``` + +The expressed semantics are: synchronise the full amount of historical data and subsequent arrivals of realtime data from this database instance to the IoTDB instance with target 127.0.0.1:6667. + +**Note:** + +- SOURCE and PROCESSOR are optional, if no configuration parameters are filled in, the system will use the corresponding default implementation. +- The SINK is a mandatory configuration that needs to be declared in the CREATE PIPE statement for configuring purposes. +- The SINK exhibits self-reusability. For different tasks, if their SINK possesses identical KV properties (where the value corresponds to every key), **the system will ultimately create only one instance of the SINK** to achieve resource reuse for connections. + + - For example, there are the following pipe1, pipe2 task declarations: + + ```sql + CREATE PIPE pipe1 + WITH SINK ( + 'sink' = 'iotdb-thrift-sink', + 'sink.thrift.host' = 'localhost', + 'sink.thrift.port' = '9999', + ) + + CREATE PIPE pipe2 + WITH SINK ( + 'sink' = 'iotdb-thrift-sink', + 'sink.thrift.port' = '9999', + 'sink.thrift.host' = 'localhost', + ) + ``` + +- Since they have identical SINK declarations (**even if the order of some properties is different**), the framework will automatically reuse the SINK declared by them. Hence, the SINK instances for pipe1 and pipe2 will be the same. +- Please note that we should avoid constructing application scenarios that involve data cycle sync (as it can result in an infinite loop): + +- IoTDB A -> IoTDB B -> IoTDB A +- IoTDB A -> IoTDB A + +### Start Stream Processing Task + +After the successful execution of the CREATE PIPE statement, task-related instances will be created. However, the overall task's running status will be set to STOPPED(V1.3.0), meaning the task will not immediately process data. In version 1.3.1 and later, the status of the task will be set to RUNNING after CREATE. + +You can use the START PIPE statement to make the stream processing task start processing data: +```sql +START PIPE +``` + +### Stop Stream Processing Task + +Use the STOP PIPE statement to stop the stream processing task from processing data: + +```sql +STOP PIPE +``` + +### Delete Stream Processing Task + +If a stream processing task is in the RUNNING state, you can use the DROP PIPE statement to stop it and delete the entire task: + +```sql +DROP PIPE +``` + +Before deleting a stream processing task, there is no need to execute the STOP operation. 
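+
+Putting the statements above together, a typical lifecycle looks as follows. The pipe name `my_pipe` and the target address are purely illustrative; each management statement takes the PipeId of the task it operates on:
+
+```sql
+-- create the task (source and processor fall back to their defaults)
+CREATE PIPE my_pipe
+WITH SINK (
+  'sink' = 'iotdb-thrift-sink',
+  'sink.ip' = '127.0.0.1',
+  'sink.port' = '6667',
+)
+
+-- start processing data (required on V1.3.0, where the initial state is STOPPED)
+START PIPE my_pipe
+
+-- pause processing without losing the task definition
+STOP PIPE my_pipe
+
+-- remove the task entirely (a prior STOP is not required)
+DROP PIPE my_pipe
+```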
+ +### Show Stream Processing Task + +Use the SHOW PIPES statement to view all stream processing tasks: +```sql +SHOW PIPES +``` + +The query results are as follows: + +```sql ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +| ID| CreationTime | State|PipeSource|PipeProcessor|PipeSink|ExceptionMessage| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| {}| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +``` + +You can use `` to specify the status of a stream processing task you want to see: +```sql +SHOW PIPE +``` + +Additionally, the WHERE clause can be used to determine if the Pipe Sink used by a specific \ is being reused. + +```sql +SHOW PIPES +WHERE SINK USED BY +``` + +### Stream Processing Task Running Status Migration + +A stream processing task status can transition through several states during the lifecycle of a data synchronization pipe: + +- **RUNNING:** The pipe is actively processing data + - After the successful creation of a pipe, its initial state is set to RUNNING (V1.3.1+) +- **STOPPED:** The pipe is in a stopped state. It can have the following possibilities: + - After the successful creation of a pipe, its initial state is set to RUNNING (V1.3.0) + - The user manually pauses a pipe that is in normal running state, transitioning its status from RUNNING to STOPPED + - If a pipe encounters an unrecoverable error during execution, its status automatically changes from RUNNING to STOPPED. +- **DROPPED:** The pipe is permanently deleted + +The following diagram illustrates the different states and their transitions: + +![state migration diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +## Authority Management + +### Stream Processing Task + +| Authority Name | Description | +|----------------|---------------------------------| +| USE_PIPE | Register task,path-independent | +| USE_PIPE | Start task,path-independent | +| USE_PIPE | Stop task,path-independent | +| USE_PIPE | Uninstall task,path-independent | +| USE_PIPE | Query task,path-independent | +### Stream Processing Task Plugin + + +| Authority Name | Description | +|----------------|---------------------------------------------------------| +| USE_PIPE | Register stream processing task plugin,path-independent | +| USE_PIPE | Delete stream processing task plugin,path-independent | +| USE_PIPE | Query stream processing task plugin,path-independent | + +## Configure Parameters + +In iotdb-system.properties : + +V1.3.0: +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). 
+# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 + +# The maximum number of selectors that can be used in the async connector. +# pipe_async_connector_selector_number=1 + +# The core number of clients that can be used in the async connector. +# pipe_async_connector_core_client_number=8 + +# The maximum number of clients that can be used in the async connector. +# pipe_async_connector_max_client_number=16 +``` + +V1.3.1+: +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. +# pipe_sink_max_client_number=16 +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Streaming_timecho.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Streaming_timecho.md new file mode 100644 index 00000000..e4c460ca --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Streaming_timecho.md @@ -0,0 +1,857 @@ + + +# Stream Processing + +The IoTDB stream processing framework allows users to implement customized stream processing logic, which can monitor and capture storage engine changes, transform changed data, and push transformed data outward. + +We call a data flow processing task a Pipe. A stream processing task (Pipe) contains three subtasks: + +- Source task +- Processor task +- Sink task + +The stream processing framework allows users to customize the processing logic of three subtasks using Java language and process data in a UDF-like manner. +In a Pipe, the above three subtasks are executed by three plugins respectively, and the data will be processed by these three plugins in turn: +Pipe Source is used to extract data, Pipe Processor is used to process data, Pipe Sink is used to send data, and the final data will be sent to an external system. + +**The model of the Pipe task is as follows:** + +![pipe.png](https://alioss.timecho.com/docs/img/1706778988482.jpg) + +Describing a data flow processing task essentially describes the properties of Pipe Source, Pipe Processor and Pipe Sink plugins. +Users can declaratively configure the specific attributes of the three subtasks through SQL statements, and achieve flexible data ETL capabilities by combining different attributes. + +Using the stream processing framework, a complete data link can be built to meet the needs of end-side-cloud synchronization, off-site disaster recovery, and read-write load sub-library*. 
+ +## Custom stream processing plugin development + +### Programming development dependencies + +It is recommended to use maven to build the project and add the following dependencies in `pom.xml`. Please be careful to select the same dependency version as the IoTDB server version. + +```xml + + org.apache.iotdb + pipe-api + 1.3.1 + provided + +``` + +### Event-driven programming model + +The user programming interface design of the stream processing plugin refers to the general design concept of the event-driven programming model. Events are data abstractions in the user programming interface, and the programming interface is decoupled from the specific execution method. It only needs to focus on describing the processing method expected by the system after the event (data) reaches the system. + +In the user programming interface of the stream processing plugin, events are an abstraction of database data writing operations. The event is captured by the stand-alone stream processing engine, and is passed to the PipeSource plugin, PipeProcessor plugin, and PipeSink plugin in sequence according to the three-stage stream processing process, and triggers the execution of user logic in the three plugins in turn. + +In order to take into account the low latency of stream processing in low load scenarios on the end side and the high throughput of stream processing in high load scenarios on the end side, the stream processing engine will dynamically select processing objects in the operation logs and data files. Therefore, user programming of stream processing The interface requires users to provide processing logic for the following two types of events: operation log writing event TabletInsertionEvent and data file writing event TsFileInsertionEvent. + +#### **Operation log writing event (TabletInsertionEvent)** + +The operation log write event (TabletInsertionEvent) is a high-level data abstraction for user write requests. It provides users with the ability to manipulate the underlying data of write requests by providing a unified operation interface. + +For different database deployment methods, the underlying storage structures corresponding to operation log writing events are different. For stand-alone deployment scenarios, the operation log writing event is an encapsulation of write-ahead log (WAL) entries; for a distributed deployment scenario, the operation log writing event is an encapsulation of a single node consensus protocol operation log entry. + +For write operations generated by different write request interfaces in the database, the data structure of the request structure corresponding to the operation log write event is also different. IoTDB provides numerous writing interfaces such as InsertRecord, InsertRecords, InsertTablet, InsertTablets, etc. Each writing request uses a completely different serialization method, and the generated binary entries are also different. + +The existence of operation log writing events provides users with a unified view of data operations, which shields the implementation differences of the underlying data structure, greatly reduces the user's programming threshold, and improves the ease of use of the function. + +```java +/** TabletInsertionEvent is used to define the event of data insertion. */ +public interface TabletInsertionEvent extends Event { + + /** + * The consumer processes the data row by row and collects the results by RowCollector. 
+ * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processRowByRow(BiConsumer consumer); + + /** + * The consumer processes the Tablet directly and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processTablet(BiConsumer consumer); +} +``` + +#### **Data file writing event (TsFileInsertionEvent)** + +The data file writing event (TsFileInsertionEvent) is a high-level abstraction of the database file writing operation. It is a data collection of several operation log writing events (TabletInsertionEvent). + +The storage engine of IoTDB is LSM structured. When data is written, the writing operation will first be placed into a log-structured file, and the written data will be stored in the memory at the same time. When the memory reaches the control upper limit, the disk flushing behavior will be triggered, that is, the data in the memory will be converted into a database file, and the previously prewritten operation log will be deleted. When the data in the memory is converted into the data in the database file, it will undergo two compression processes: encoding compression and general compression. Therefore, the data in the database file takes up less space than the original data in the memory. + +In extreme network conditions, directly transmitting data files is more economical than transmitting data writing operations. It will occupy lower network bandwidth and achieve faster transmission speeds. Of course, there is no free lunch. Computing and processing data in files requires additional file I/O costs compared to directly computing and processing data in memory. However, it is precisely the existence of two structures, disk data files and memory write operations, with their own advantages and disadvantages, that gives the system the opportunity to make dynamic trade-offs and adjustments. It is based on this observation that data files are introduced into the plugin's event model. Write event. + +To sum up, the data file writing event appears in the event stream of the stream processing plugin, and there are two situations: + +(1) Historical data extraction: Before a stream processing task starts, all written data that has been placed on the disk will exist in the form of TsFile. After a stream processing task starts, when collecting historical data, the historical data will be abstracted using TsFileInsertionEvent; + +(2) Real-time data extraction: When a stream processing task is in progress, when the real-time processing speed of operation log write events in the data stream is slower than the write request speed, after a certain progress, the operation log write events that cannot be processed in the future will be persisted. to disk and exists in the form of TsFile. After this data is extracted by the stream processing engine, TsFileInsertionEvent will be used as an abstraction. + +```java +/** + * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks, + * which is compressed and encoded, and requires IO cost for computational processing. + */ +public interface TsFileInsertionEvent extends Event { + + /** + * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents. 
+ * + * @return {@code Iterable} the list of TabletInsertionEvent + */ + Iterable toTabletInsertionEvents(); +} +``` + +### Custom stream processing plugin programming interface definition + +Based on the custom stream processing plugin programming interface, users can easily write data extraction plugins, data processing plugins and data sending plugins, so that the stream processing function can be flexibly adapted to various industrial scenarios. + +#### Data extraction plugin interface + +Data extraction is the first stage of the three stages of stream processing data from data extraction to data sending. The data extraction plugin (PipeSource) is the bridge between the stream processing engine and the storage engine. It monitors the behavior of the storage engine, +Capture various data write events. + +```java +/** + * PipeSource + * + *

PipeSource is responsible for capturing events from sources. + * + *

Various data sources can be supported by implementing different PipeSource classes. + * + *

The lifecycle of a PipeSource is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH Source` clause in SQL are + * parsed and the validation method {@link PipeSource#validate(PipeParameterValidator)} will + * be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} will be called to + * configure the runtime behavior of the PipeSource. + *
  • Then the method {@link PipeSource#start()} will be called to start the PipeSource. + *
  • While the collaboration task is in progress, the method {@link PipeSource#supply()} will be + * called to capture events from sources and then the events will be passed to the + * PipeProcessor. + *
  • The method {@link PipeSource#close()} will be called when the collaboration task is + * cancelled (the `DROP PIPE` command is executed). + *
+ */ +public interface PipeSource extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeSource. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeSourceRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link PipeSource#validate(PipeParameterValidator)} + * is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeSource + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeSourceRuntimeConfiguration configuration) + throws Exception; + + /** + * Start the Source. After this method is called, events should be ready to be supplied by + * {@link PipeSource#supply()}. This method is called after {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. + * + * @throws Exception the user can throw errors if necessary + */ + void start() throws Exception; + + /** + * Supply single event from the Source and the caller will send the event to the processor. + * This method is called after {@link PipeSource#start()} is called. + * + * @return the event to be supplied. the event may be null if the Source has no more events at + * the moment, but the Source is still running for more events. + * @throws Exception the user can throw errors if necessary + */ + Event supply() throws Exception; +} +``` + +#### Data processing plugin interface + +Data processing is the second stage of the three stages of stream processing data from data extraction to data sending. The data processing plugin (PipeProcessor) is mainly used to filter and transform the data captured by the data extraction plugin (PipeSource). +various events. + +```java +/** + * PipeProcessor + * + *

PipeProcessor is used to filter and transform the Event formed by the PipeSource. + * + *

The lifecycle of a PipeProcessor is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are + * parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called + * to configure the runtime behavior of the PipeProcessor. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeSource captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeSource. The + * following 3 methods will be called: {@link + * PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link + * PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link + * PipeProcessor#process(Event, EventCollector)}. + *
    • PipeSink serializes the events into binaries and send them to sinks. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeProcessor#close() } method will be called. + *
+ */ +public interface PipeProcessor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeProcessor. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeProcessorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the + * events processing. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeProcessor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is called to process the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) + throws Exception; + + /** + * This method is called to process the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + default void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) + throws Exception { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + process(tabletInsertionEvent, eventCollector); + } + } + + /** + * This method is called to process the Event. + * + * @param event Event to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(Event event, EventCollector eventCollector) throws Exception; +} +``` + +#### Data sending plugin interface + +Data sending is the third stage of the three stages of stream processing data from data extraction to data sending. The data sending plugin (PipeSink) is mainly used to send data processed by the data processing plugin (PipeProcessor). +Various events, it serves as the network implementation layer of the stream processing framework, and the interface should allow access to multiple real-time communication protocols and multiple sinks. + +```java +/** + * PipeSink + * + *

PipeSink is responsible for sending events to sinks. + * + *

Various network protocols can be supported by implementing different PipeSink classes. + * + *

The lifecycle of a PipeSink is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH SINK` clause in SQL are + * parsed and the validation method {@link PipeSink#validate(PipeParameterValidator)} will be + * called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link PipeSink#customize(PipeParameters, + * PipeSinkRuntimeConfiguration)} will be called to configure the runtime behavior of the + * PipeSink and the method {@link PipeSink#handshake()} will be called to create a connection + * with sink. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeSource captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeSink. + *
    • PipeSink serializes the events into binaries and send them to sinks. The following 3 + * methods will be called: {@link PipeSink#transfer(TabletInsertionEvent)}, {@link + * PipeSink#transfer(TsFileInsertionEvent)} and {@link PipeSink#transfer(Event)}. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeSink#close() } method will be called. + *
+ * + *

In addition, the method {@link PipeSink#heartbeat()} will be called periodically to check + * whether the connection with sink is still alive. The method {@link PipeSink#handshake()} will be + * called to create a new connection with the sink when the method {@link PipeSink#heartbeat()} + * throws exceptions. + */ +public interface PipeSink extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeSink. In this method, the user can do the following + * things: + * + *

    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeSinkRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link PipeSink#validate(PipeParameterValidator)} is + * called and before the method {@link PipeSink#handshake()} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeSink + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeSinkRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is used to create a connection with sink. This method will be called after the + * method {@link PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called or + * will be called when the method {@link PipeSink#heartbeat()} throws exceptions. + * + * @throws Exception if the connection is failed to be created + */ + void handshake() throws Exception; + + /** + * This method will be called periodically to check whether the connection with sink is still + * alive. + * + * @throws Exception if the connection dies + */ + void heartbeat() throws Exception; + + /** + * This method is used to transfer the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; + + /** + * This method is used to transfer the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + default void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception { + try { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + transfer(tabletInsertionEvent); + } + } finally { + tsFileInsertionEvent.close(); + } + } + + /** + * This method is used to transfer the generic events, including HeartbeatEvent. + * + * @param event Event to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(Event event) throws Exception; +} +``` + +## Custom stream processing plugin management + +In order to ensure the flexibility and ease of use of user-defined plugins in actual production, the system also needs to provide the ability to dynamically and uniformly manage plugins. +The stream processing plugin management statements introduced in this chapter provide an entry point for dynamic unified management of plugins. + +### Load plugin statement + +In IoTDB, if you want to dynamically load a user-defined plugin in the system, you first need to implement a specific plugin class based on PipeSource, PipeProcessor or PipeSink. +Then the plugin class needs to be compiled and packaged into a jar executable file, and finally the plugin is loaded into IoTDB using the management statement for loading the plugin. + +The syntax of the management statement for loading the plugin is shown in the figure. + +```sql +CREATE PIPEPLUGIN [IF NOT EXISTS] +AS +USING +``` +**IF NOT EXISTS semantics**: Used in creation operations to ensure that the create command is executed when the specified Pipe Plugin does not exist, preventing errors caused by attempting to create an existing Pipe Plugin. 
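+
+For reference, the following is a minimal, illustrative sketch of what such a plugin class might look like: a pass-through `PipeProcessor` matching the `edu.tsinghua.iotdb.pipe.ExampleProcessor` class used in the example below. The import paths assume the plugin API classes live under the `org.apache.iotdb.pipe.api` package hierarchy of the pipe API module; adjust them to the version you actually compile against.
+
+```java
+package edu.tsinghua.iotdb.pipe;
+
+// Assumed package locations of the plugin API classes shown in the interface definitions above.
+import org.apache.iotdb.pipe.api.PipeProcessor;
+import org.apache.iotdb.pipe.api.collector.EventCollector;
+import org.apache.iotdb.pipe.api.customizer.configuration.PipeProcessorRuntimeConfiguration;
+import org.apache.iotdb.pipe.api.customizer.parameter.PipeParameterValidator;
+import org.apache.iotdb.pipe.api.customizer.parameter.PipeParameters;
+import org.apache.iotdb.pipe.api.event.Event;
+import org.apache.iotdb.pipe.api.event.dml.insertion.TabletInsertionEvent;
+
+/** A pass-through processor: every event is forwarded downstream unchanged. */
+public class ExampleProcessor implements PipeProcessor {
+
+  @Override
+  public void validate(PipeParameterValidator validator) throws Exception {
+    // This example takes no custom attributes, so there is nothing to validate.
+  }
+
+  @Override
+  public void customize(
+      PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) throws Exception {
+    // Runtime behavior could be configured here based on the user-supplied parameters.
+  }
+
+  @Override
+  public void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector)
+      throws Exception {
+    // Forward the operation log write event to the next stage without modification.
+    eventCollector.collect(tabletInsertionEvent);
+  }
+
+  @Override
+  public void process(Event event, EventCollector eventCollector) throws Exception {
+    // Forward generic events (e.g. heartbeat events) as-is.
+    eventCollector.collect(event);
+  }
+}
+```
+
+Compile such a class against the plugin API, package it (together with any third-party dependencies) into a JAR, and register it with the `CREATE PIPEPLUGIN` statement as shown in the following example.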
+
+Example: Suppose you implement a data processing plugin named edu.tsinghua.iotdb.pipe.ExampleProcessor, package it as pipe-plugin.jar, and want to use it in the stream processing engine under the alias example. There are two ways to make the plugin package available: upload it to a URI server, or place it in a local directory of the cluster.
+
+Method 1: Upload to the URI server
+
+Preparation: To register in this way, you need to upload the JAR package to the URI server in advance and ensure that the IoTDB instance that executes the registration statement can access the URI server, for example https://example.com:8080/iotdb/pipe-plugin.jar.
+
+SQL:
+
+```sql
+CREATE PIPEPLUGIN IF NOT EXISTS example
+AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor'
+USING URI 'https://example.com:8080/iotdb/pipe-plugin.jar'
+```
+
+Method 2: Upload to the local directory of the cluster
+
+Preparation: To register in this way, you need to place the JAR package in a path on the machine where the DataNode is located. We recommend placing it in the /ext/pipe directory of the IoTDB installation path (this directory already exists in the installation package, so you do not need to create it), for example: iotdb-1.x.x-bin/ext/pipe/pipe-plugin.jar. **(Note: If you are using a cluster, the JAR package must be placed under the same path on the machine of every DataNode.)**
+
+SQL:
+
+```sql
+CREATE PIPEPLUGIN IF NOT EXISTS example
+AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor'
+USING URI '<path-to-pipe-plugin.jar>'
+```
+
+### Delete plugin statement
+
+When a plugin is no longer needed and should be uninstalled from the system, use the delete plugin statement shown below.
+
+```sql
+DROP PIPEPLUGIN [IF EXISTS] <alias>
+```
+
+**IF EXISTS semantics**: Used in deletion operations to ensure that the delete command is executed only when the specified Pipe Plugin exists, preventing errors caused by attempting to delete a non-existent Pipe Plugin.
+
+### View plugin statements
+
+Users can also view the plugins in the system on demand, using the statement shown below.
+
+```sql
+SHOW PIPEPLUGINS
+```
+
+## System preset stream processing plugin
+
+### Pre-built Source Plugin
+
+#### iotdb-source
+
+Function: Extract historical or realtime data inside IoTDB into pipe.
+
+| key | value | value range | required or optional with default |
+|---------------------------------|--------------------------------------------------------------------------|----------------------------------------|-----------------------------------|
+| source | iotdb-source | String: iotdb-source | required |
+| source.pattern | path prefix for filtering time series | String: any time series prefix | optional: root |
+| source.history.start-time | start of synchronizing historical data event time, including start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE |
+| source.history.end-time | end of synchronizing historical data event time, including end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE |
+| source.forwarding-pipe-requests | Whether to forward data written by another Pipe (usually Data Sync) | Boolean: true, false | optional: true |
+| start-time(V1.3.1+) | start of synchronizing all data event time, including start-time.
Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| end-time(V1.3.1+) | end of synchronizing all data event time,including end-time. Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| source.realtime.mode | Extraction mode for real-time data | String: hybrid, stream, batch | optional:hybrid | +| source.forwarding-pipe-requests | Whether to forward data written by another Pipe (usually Data Sync) | Boolean: true, false | optional:true | + +> 🚫 **source.pattern Parameter Description** +> +> * Pattern should use backquotes to modify illegal characters or illegal path nodes, for example, if you want to filter root.\`a@b\` or root.\`123\`, you should set the pattern to root.\`a@b\` or root.\`123\`(Refer specifically to [Timing of single and double quotes and backquotes](https://iotdb.apache.org/Download/)) +> * In the underlying implementation, when pattern is detected as root (default value) or a database name, synchronization efficiency is higher, and any other format will reduce performance. +> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'source.pattern'='root.aligned.1': + > + > * root.aligned.1TS + > * root.aligned.1TS.\`1\` + > * root.aligned.100TS + > + > the data will be synchronized; + > + > * root.aligned.\`1\` +> * root.aligned.\`123\` + > + > the data will not be synchronized. + +> ❗️**start-time, end-time parameter description of source** +> +> * start-time, end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00. However, version 1.3.1+ supports timeStamp format like 1706704494000. + +> ✅ **A piece of data from production to IoTDB contains two key concepts of time** +> +> * **event time:** The time when the data is actually produced (or the generation time assigned to the data by the data production system, which is the time item in the data point), also called event time. +> * **arrival time:** The time when data arrives in the IoTDB system. +> +> The out-of-order data we often refer to refers to data whose **event time** is far behind the current system time (or the maximum **event time** that has been dropped) when the data arrives. On the other hand, whether it is out-of-order data or sequential data, as long as they arrive newly in the system, their **arrival time** will increase with the order in which the data arrives at IoTDB. + +> 💎 **The work of iotdb-source can be split into two stages** +> +> 1. Historical data extraction: All data with **arrival time** < **current system time** when creating the pipe is called historical data +> 2. Realtime data extraction: All data with **arrival time** >= **current system time** when the pipe is created is called realtime data +> +> The historical data transmission phase and the realtime data transmission phase are executed serially. Only when the historical data transmission phase is completed, the realtime data transmission phase is executed.** + +> 📌 **source.realtime.mode: Data extraction mode** +> +> * log: In this mode, the task only uses the operation log for data processing and sending +> * file: In this mode, the task only uses data files for data processing and sending. 
+> * hybrid: This mode combines the characteristics of the two methods above: sending data entry by entry from the operation log gives low latency but low throughput, while sending data files in batches gives high throughput but high latency. It automatically switches to the appropriate extraction method under different write loads: it first adopts operation-log-based extraction to ensure low sending latency; when a data backlog builds up, it automatically switches to data-file-based extraction to ensure high sending throughput; and when the backlog is eliminated, it automatically switches back to operation-log-based extraction. This avoids the difficulty of balancing sending latency and throughput with a single extraction algorithm.
+
+> 🍕 **source.forwarding-pipe-requests: Whether to allow forwarding data transmitted from another pipe**
+>
+> * If you want to use pipes to build A -> B -> C data synchronization, the B -> C pipe needs to set this parameter to true, so that the data written from A to B by the A -> B pipe can be correctly forwarded to C.
+> * If you want to use pipes to build two-way (dual-active) A \<-> B data synchronization, the A -> B and B -> A pipes both need to set this parameter to false; otherwise the data will be forwarded endlessly between the two clusters in a round-robin manner.
+
+### Preset processor plugin
+
+#### do-nothing-processor
+
+Function: No processing is done on the events passed in by the source.
+
+| key | value | value range | required or optional with default |
+|-----------|----------------------|------------------------------|-----------------------------------|
+| processor | do-nothing-processor | String: do-nothing-processor | required |
+
+### Preset sink plugin
+
+#### do-nothing-sink
+
+Function: No processing is done on the events passed in by the processor.
+
+| key | value | value range | required or optional with default |
+|------|-----------------|-------------------------|-----------------------------------|
+| sink | do-nothing-sink | String: do-nothing-sink | required |
+
+## Stream processing task management
+
+### Create a stream processing task
+
+Use the `CREATE PIPE` statement to create a stream processing task.
Taking the creation of a data synchronization stream processing task as an example, the sample SQL statement is as follows:
+
+```sql
+CREATE PIPE <PipeId> -- PipeId is the name that uniquely identifies the sync task
+WITH SOURCE (
+  -- Default IoTDB data extraction plugin
+  'source' = 'iotdb-source',
+  -- Path prefix: only data matching the path prefix will be extracted for subsequent processing and delivery
+  'source.pattern' = 'root.timecho',
+  -- Whether to extract historical data
+  'source.history.enable' = 'true',
+  -- Describes the time range of the extracted historical data, indicating the earliest time
+  'source.history.start-time' = '2011-12-03T10:15:30+01:00',
+  -- Describes the time range of the extracted historical data, indicating the latest time
+  'source.history.end-time' = '2022-12-03T10:15:30+01:00',
+  -- Whether to extract realtime data
+  'source.realtime.enable' = 'true',
+)
+WITH PROCESSOR (
+  -- Default data processing plugin, meaning no processing is done
+  'processor' = 'do-nothing-processor',
+)
+WITH SINK (
+  -- IoTDB data sending plugin, the target is IoTDB
+  'sink' = 'iotdb-thrift-sink',
+  -- Data service IP of one of the DataNode nodes on the target IoTDB
+  'sink.ip' = '127.0.0.1',
+  -- Data service port of one of the DataNode nodes of the target IoTDB
+  'sink.port' = '6667',
+)
+```
+
+**When creating a stream processing task, you need to configure the PipeId and the parameters of the three plugin parts:**
+
+| Configuration | Description | Required or not | Default implementation | Default implementation description | Whether custom implementation is allowed |
+|---------------|-------------|-----------------|------------------------|------------------------------------|------------------------------------------|
+| PipeId | A globally unique name that identifies a stream processing task | Required | - | - | - |
+| source | Pipe Source plugin, responsible for extracting stream processing data at the bottom of the database | Optional | iotdb-source | Integrates the full historical data of the database and subsequent real-time data into the stream processing task | No |
+| processor | Pipe Processor plugin, responsible for processing data | Optional | do-nothing-processor | Does not do any processing on the incoming data | Yes |
+| sink | Pipe Sink plugin, responsible for sending data | Required | - | - | Yes |
+
+In the example, the iotdb-source, do-nothing-processor and iotdb-thrift-sink plugins are used to build the data flow processing task. IoTDB also has other built-in stream processing plugins, **please check the "System preset stream processing plugin" section**.
+
+**A simplest example of the CREATE PIPE statement is as follows:**
+
+```sql
+CREATE PIPE <PipeId> -- PipeId is a name that uniquely identifies the stream processing task
+WITH SINK (
+  -- IoTDB data sending plugin, the target is IoTDB
+  'sink' = 'iotdb-thrift-sink',
+  -- The data service IP of one of the DataNode nodes in the target IoTDB
+  'sink.ip' = '127.0.0.1',
+  -- The data service port of one of the DataNode nodes in the target IoTDB
+  'sink.port' = '6667',
+)
+```
+
+The semantics expressed are: synchronize all historical data in this database instance, as well as subsequent real-time data, to the IoTDB instance at 127.0.0.1:6667.
+
+**Notice:**
+
+- SOURCE and PROCESSOR are optional configurations.
If you do not fill in the configuration parameters, the system will use the corresponding default implementation.
+- SINK is a required configuration and needs to be declared in the CREATE PIPE statement.
+- SINK has a self-reuse capability. For different stream processing tasks, if their SINKs have exactly the same KV attributes (the values corresponding to the keys of all attributes are the same), the system will eventually create only one SINK instance, so that connection resources are reused.
+
+  - For example, there are the following declarations of two stream processing tasks, pipe1 and pipe2:
+
+  ```sql
+  CREATE PIPE pipe1
+  WITH SINK (
+    'sink' = 'iotdb-thrift-sink',
+    'sink.ip' = 'localhost',
+    'sink.port' = '9999',
+  )
+
+  CREATE PIPE pipe2
+  WITH SINK (
+    'sink' = 'iotdb-thrift-sink',
+    'sink.port' = '9999',
+    'sink.ip' = 'localhost',
+  )
+  ```
+
+- Because their SINK declarations are exactly the same (**even though the order in which some attributes are declared differs**), the framework will automatically reuse the SINK they declare, and ultimately the SINKs of pipe1 and pipe2 will be the same instance.
+- When the source is the default iotdb-source, and source.forwarding-pipe-requests has the default value true, please do not build an application scenario that includes cyclic data synchronization (it will cause an infinite loop):
+
+  - IoTDB A -> IoTDB B -> IoTDB A
+  - IoTDB A -> IoTDB A
+
+### Start the stream processing task
+
+After the CREATE PIPE statement is successfully executed, the stream processing task-related instances are created, but the running status of the whole task is set to STOPPED (V1.3.0), that is, the task does not process data immediately. In version 1.3.1 and later, the status of the task is set to RUNNING after CREATE.
+
+You can use the START PIPE statement to make a stream processing task start processing data:
+
+```sql
+START PIPE <PipeId>
+```
+
+### Stop the stream processing task
+
+Use the STOP PIPE statement to stop the stream processing task from processing data:
+
+```sql
+STOP PIPE <PipeId>
+```
+
+### Delete stream processing tasks
+
+Use the DROP PIPE statement to stop the stream processing task from processing data (if it is in the RUNNING state) and then delete the entire task:
+
+```sql
+DROP PIPE <PipeId>
+```
+
+Users do not need to perform a STOP operation before deleting a stream processing task.
+
+### Display stream processing tasks
+
+Use the SHOW PIPES statement to view all stream processing tasks:
+
+```sql
+SHOW PIPES
+```
+
+The query results are as follows:
+
+```sql
++-----------+-----------------------+-------+----------+-------------+--------+----------------+
+|         ID|           CreationTime|  State|PipeSource|PipeProcessor|PipeSink|ExceptionMessage|
++-----------+-----------------------+-------+----------+-------------+--------+----------------+
+|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING|       ...|          ...|     ...|              {}|
++-----------+-----------------------+-------+----------+-------------+--------+----------------+
+|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED|       ...|          ...|     ...| TException: ...|
++-----------+-----------------------+-------+----------+-------------+--------+----------------+
+```
+
+You can use `<PipeId>` to view the status of a specific stream processing task:
+
+```sql
+SHOW PIPE <PipeId>
+```
+
+You can also use the WHERE clause to determine whether the Pipe Sink used by a certain `<PipeId>` is reused.
+ +```sql +SHOW PIPES +WHERE SINK USED BY +``` + +### Stream processing task running status migration + +A stream processing pipe will pass through various states during its managed life cycle: + +- **RUNNING:** pipe is working properly + - When a pipe is successfully created, its initial state is RUNNING.(V1.3.1+) +- **STOPPED:** The pipe is stopped. When the pipeline is in this state, there are several possibilities: + - When a pipe is successfully created, its initial state is STOPPED.(V1.3.0) + - The user manually pauses a pipe that is in normal running status, and its status will passively change from RUNNING to STOPPED. + - When an unrecoverable error occurs during the running of a pipe, its status will automatically change from RUNNING to STOPPED +- **DROPPED:** The pipe task was permanently deleted + +The following diagram shows all states and state transitions: + +![State migration diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +## authority management + +### Stream processing tasks + + +| Permission name | Description | +|-----------------|------------------------------------------------------------| +| USE_PIPE | Register a stream processing task. The path is irrelevant. | +| USE_PIPE | Start the stream processing task. The path is irrelevant. | +| USE_PIPE | Stop the stream processing task. The path is irrelevant. | +| USE_PIPE | Offload stream processing tasks. The path is irrelevant. | +| USE_PIPE | Query stream processing tasks. The path is irrelevant. | + +### Stream processing task plugin + + +| Permission name | Description | +|-----------------|----------------------------------------------------------------------| +| USE_PIPE | Register stream processing task plugin. The path is irrelevant. | +| USE_PIPE | Uninstall the stream processing task plugin. The path is irrelevant. | +| USE_PIPE | Query stream processing task plugin. The path is irrelevant. | + +## Configuration parameters + +In iotdb-system.properties: + +V1.3.0+: +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 + +# The maximum number of selectors that can be used in the async connector. +# pipe_async_connector_selector_number=1 + +# The core number of clients that can be used in the async connector. +# pipe_async_connector_core_client_number=8 + +# The maximum number of clients that can be used in the async connector. +# pipe_async_connector_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. 
+# pipe_air_gap_receiver_port=9780 +``` + +V1.3.1+: +```Properties +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. +# pipe_sink_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. +# pipe_air_gap_receiver_port=9780 +``` diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Tiered-Storage_timecho.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Tiered-Storage_timecho.md new file mode 100644 index 00000000..1cb50b1e --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Tiered-Storage_timecho.md @@ -0,0 +1,96 @@ + + +# Tiered Storage +## Overview + +The Tiered storage functionality allows users to define multiple layers of storage, spanning across multiple types of storage media (Memory mapped directory, SSD, rotational hard discs or cloud storage). While memory and cloud storage is usually singular, the local file system storages can consist of multiple directories joined together into one tier. Meanwhile, users can classify data based on its hot or cold nature and store data of different categories in specified "tier". Currently, IoTDB supports the classification of hot and cold data through TTL (Time to live / age) of data. When the data in one tier does not meet the TTL rules defined in the current tier, the data will be automatically migrated to the next tier. + +## Parameter Definition + +To enable tiered storage in IoTDB, you need to configure the following aspects: + +1. configure the data catalogue and divide the data catalogue into different tiers +2. configure the TTL of the data managed in each tier to distinguish between hot and cold data categories managed in different tiers. +3. configure the minimum remaining storage space ratio for each tier so that when the storage space of the tier triggers the threshold, the data of the tier will be automatically migrated to the next tier (optional). + +The specific parameter definitions and their descriptions are as follows. 
+ +| Configuration | Default | Description | Constraint | +| ---------------------------------------- | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | +| dn_data_dirs | data/datanode/data | specify different storage directories and divide the storage directories into tiers | Each level of storage uses a semicolon to separate, and commas to separate within a single level; cloud (OBJECT_STORAGE) configuration can only be used as the last level of storage and the first level can't be used as cloud storage; a cloud object at most; the remote storage directory is denoted by OBJECT_STORAGE | +| tier_ttl_in_ms | -1 | Define the maximum age of data for which each tier is responsible | Each level of storage is separated by a semicolon; the number of levels should match the number of levels defined by dn_data_dirs;"-1" means "unlimited". | +| dn_default_space_usage_thresholds | 0.85 | Define the minimum remaining space ratio for each tier data catalogue; when the remaining space is less than this ratio, the data will be automatically migrated to the next tier; when the remaining storage space of the last tier falls below this threshold, the system will be set to READ_ONLY | Each level of storage is separated by a semicolon; the number of levels should match the number of levels defined by dn_data_dirs | +| object_storage_type | AWS_S3 | Cloud Storage Type | IoTDB currently only supports AWS S3 as a remote storage type, and this parameter can't be modified | +| object_storage_bucket | iotdb_data | Name of cloud storage bucket | Bucket definition in AWS S3; no need to configure if remote storage is not used | +| object_storage_endpoiont | | endpoint of cloud storage | endpoint of AWS S3;If remote storage is not used, no configuration required | +| object_storage_access_key | | Authentication information stored in the cloud: key | AWS S3 credential key;If remote storage is not used, no configuration required | +| object_storage_access_secret | | Authentication information stored in the cloud: secret | AWS S3 credential secret;If remote storage is not used, no configuration required | +| remote_tsfile_cache_dirs | data/datanode/data/cache | Cache directory stored locally in the cloud | If remote storage is not used, no configuration required | +| remote_tsfile_cache_page_size_in_kb | 20480 |Block size of locally cached files stored in the cloud | If remote storage is not used, no configuration required | +| remote_tsfile_cache_max_disk_usage_in_mb | 51200 | Maximum Disk Occupancy Size for Cloud Storage Local Cache | If remote storage is not used, no configuration required | + +## local tiered storag configuration example + +The following is an example of a local two-level storage configuration. 
+ +```JavaScript +//Required configuration items +dn_data_dirs=/data1/data;/data2/data,/data3/data; +tier_ttl_in_ms=86400000;-1 +dn_default_space_usage_thresholds=0.2;0.1 +``` + +In this example, two levels of storage are configured, specifically: + +| **tier** | **data path** | **data range** | **threshold for minimum remaining disk space** | +| -------- | -------------------------------------- | --------------- | ------------------------ | +| tier 1 | path 1:/data1/data | data for last 1 day | 20% | +| tier 2 | path 2:/data2/data path 2:/data3/data | data from 1 day ago | 10% | + +## remote tiered storag configuration example + +The following takes three-level storage as an example: + +```JavaScript +//Required configuration items +dn_data_dirs=/data1/data;/data2/data,/data3/data;OBJECT_STORAGE +tier_ttl_in_ms=86400000;864000000;-1 +dn_default_space_usage_thresholds=0.2;0.15;0.1 +object_storage_name=AWS_S3 +object_storage_bucket=iotdb +object_storage_endpoiont= +object_storage_access_key= +object_storage_access_secret= + +// Optional configuration items +remote_tsfile_cache_dirs=data/datanode/data/cache +remote_tsfile_cache_page_size_in_kb=20971520 +remote_tsfile_cache_max_disk_usage_in_mb=53687091200 +``` + +In this example, a total of three levels of storage are configured, specifically: + +| **tier** | **data path** | **data range** | **threshold for minimum remaining disk space** | +| -------- | -------------------------------------- | ---------------------------- | ------------------------ | +| tier1 | path 1:/data1/data | data for last 1 day | 20% | +| tier2 | path 1:/data2/data path 2:/data3/data | data from past 1 day to past 10 days | 15% | +| tier3 | Remote AWS S3 Storage | data from 10 days ago | 10% | diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/Trigger.md b/src/UserGuide/V2.0.1/Tree/User-Manual/Trigger.md new file mode 100644 index 00000000..7c4e163f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/Trigger.md @@ -0,0 +1,466 @@ + + +# TRIGGER + +## Instructions + +The trigger provides a mechanism for listening to changes in time series data. With user-defined logic, tasks such as alerting and data forwarding can be conducted. + +The trigger is implemented based on the reflection mechanism. Users can monitor data changes by implementing the Java interfaces. IoTDB allows users to dynamically register and drop triggers without restarting the server. + +The document will help you learn to define and manage triggers. + +### Pattern for Listening + +A single trigger can be used to listen for data changes in a time series that match a specific pattern. For example, a trigger can listen for the data changes of time series `root.sg.a`, or time series that match the pattern `root.sg.*`. When you register a trigger, you can specify the path pattern that the trigger listens on through an SQL statement. + +### Trigger Type + +There are currently two types of triggers, and you can specify the type through an SQL statement when registering a trigger: + +- Stateful triggers: The execution logic of this type of trigger may depend on data from multiple insertion statement . The framework will aggregate the data written by different nodes into the same trigger instance for calculation to retain context information. This type of trigger is usually used for sampling or statistical data aggregation for a period of time. information. Only one node in the cluster holds an instance of a stateful trigger. 
+- Stateless triggers: The execution logic of the trigger is only related to the current input data. The framework does not need to aggregate the data of different nodes into the same trigger instance. This type of trigger is usually used for calculation of single row data and abnormal detection. Each node in the cluster holds an instance of a stateless trigger. + +### Trigger Event + +There are currently two trigger events for the trigger, and other trigger events will be expanded in the future. When you register a trigger, you can specify the trigger event through an SQL statement: + +- BEFORE INSERT: Fires before the data is persisted. **Please note that currently the trigger does not support data cleaning and will not change the data to be persisted itself.** +- AFTER INSERT: Fires after the data is persisted. + +## How to Implement a Trigger + +You need to implement the trigger by writing a Java class, where the dependency shown below is required. If you use [Maven](http://search.maven.org/), you can search for them directly from the [Maven repository](http://search.maven.org/). + +### Dependency + +```xml + + org.apache.iotdb + iotdb-server + 1.0.0 + provided + +``` + +Note that the dependency version should be correspondent to the target server version. + +### Interface Description + +To implement a trigger, you need to implement the `org.apache.iotdb.trigger.api.Trigger` class. + +```java +import org.apache.iotdb.trigger.api.enums.FailureStrategy; +import org.apache.iotdb.tsfile.write.record.Tablet; + +public interface Trigger { + + /** + * This method is mainly used to validate {@link TriggerAttributes} before calling {@link + * Trigger#onCreate(TriggerAttributes)}. + * + * @param attributes TriggerAttributes + * @throws Exception e + */ + default void validate(TriggerAttributes attributes) throws Exception {} + + /** + * This method will be called when creating a trigger after validation. + * + * @param attributes TriggerAttributes + * @throws Exception e + */ + default void onCreate(TriggerAttributes attributes) throws Exception {} + + /** + * This method will be called when dropping a trigger. + * + * @throws Exception e + */ + default void onDrop() throws Exception {} + + /** + * When restarting a DataNode, Triggers that have been registered will be restored and this method + * will be called during the process of restoring. + * + * @throws Exception e + */ + default void restore() throws Exception {} + + /** + * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} + * is the default strategy. + * + * @return {@link FailureStrategy} + */ + default FailureStrategy getFailureStrategy() { + return FailureStrategy.OPTIMISTIC; + } + + /** + * @param tablet see {@link Tablet} for detailed information of data structure. Data that is + * inserted will be constructed as a Tablet and you can define process logic with {@link + * Tablet}. + * @return true if successfully fired + * @throws Exception e + */ + default boolean fire(Tablet tablet) throws Exception { + return true; + } +} +``` + +This class provides two types of programming interfaces: **Lifecycle related interfaces** and **data change listening related interfaces**. All the interfaces in this class are not required to be implemented. When the interfaces are not implemented, the trigger will not respond to the data changes. You can implement only some of these interfaces according to your needs. + +Descriptions of the interfaces are as followed. 
+ +#### Lifecycle Related Interfaces + +| Interface | Description | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| *default void validate(TriggerAttributes attributes) throws Exception {}* | When you creates a trigger using the `CREATE TRIGGER` statement, you can specify the parameters that the trigger needs to use, and this interface will be used to verify the correctness of the parameters。 | +| *default void onCreate(TriggerAttributes attributes) throws Exception {}* | This interface is called once when you create a trigger using the `CREATE TRIGGER` statement. During the lifetime of each trigger instance, this interface will be called only once. This interface is mainly used for the following functions: helping users to parse custom attributes in SQL statements (using `TriggerAttributes`). You can create or apply for resources, such as establishing external links, opening files, etc. | +| *default void onDrop() throws Exception {}* | This interface is called when you drop a trigger using the `DROP TRIGGER` statement. During the lifetime of each trigger instance, this interface will be called only once. This interface mainly has the following functions: it can perform the operation of resource release and can be used to persist the results of trigger calculations. | +| *default void restore() throws Exception {}* | When the DataNode is restarted, the cluster will restore the trigger instance registered on the DataNode, and this interface will be called once for stateful trigger during the process. After the DataNode where the stateful trigger instance is located goes down, the cluster will restore the trigger instance on another available DataNode, calling this interface once in the process. This interface can be used to customize recovery logic. | + +#### Data Change Listening Related Interfaces + +##### Listening Interface + +```java +/** + * @param tablet see {@link Tablet} for detailed information of data structure. Data that is + * inserted will be constructed as a Tablet and you can define process logic with {@link + * Tablet}. + * @return true if successfully fired + * @throws Exception e + */ + default boolean fire(Tablet tablet) throws Exception { + return true; + } +``` + +When the data changes, the trigger uses the Tablet as the unit of firing operation. You can obtain the metadata and data of the corresponding sequence through Tablet, and then perform the corresponding trigger operation. If the fire process is successful, the return value should be true. If the interface returns false or throws an exception, we consider the trigger fire process as failed. When the trigger fire process fails, we will perform corresponding operations according to the listening strategy interface. + +When performing an INSERT operation, for each time series in it, we will detect whether there is a trigger that listens to the path pattern, and then assemble the time series data that matches the path pattern listened by the same trigger into a new Tablet for trigger fire interface. 
Can be understood as: + +```java +Map> pathToTriggerListMap => Map +``` + +**Note that currently we do not make any guarantees about the order in which triggers fire.** + +Here is an example: + +Suppose there are three triggers, and the trigger event of the triggers are all BEFORE INSERT: + +- Trigger1 listens on `root.sg.*` +- Trigger2 listens on `root.sg.a` +- Trigger3 listens on `root.sg.b` + +Insertion statement: + +```sql +insert into root.sg(time, a, b) values (1, 1, 1); +``` + +The time series `root.sg.a` matches Trigger1 and Trigger2, and the sequence `root.sg.b` matches Trigger1 and Trigger3, then: + +- The data of `root.sg.a` and `root.sg.b` will be assembled into a new tablet1, and Trigger1.fire(tablet1) will be executed at the corresponding Trigger Event. +- The data of `root.sg.a` will be assembled into a new tablet2, and Trigger2.fire(tablet2) will be executed at the corresponding Trigger Event. +- The data of `root.sg.b` will be assembled into a new tablet3, and Trigger3.fire(tablet3) will be executed at the corresponding Trigger Event. + +##### Listening Strategy Interface + +When the trigger fails to fire, we will take corresponding actions according to the strategy set by the listening strategy interface. You can set `org.apache.iotdb.trigger.api.enums.FailureStrategy`. There are currently two strategies, optimistic and pessimistic: + +- Optimistic strategy: The trigger that fails to fire does not affect the firing of subsequent triggers, nor does it affect the writing process, that is, we do not perform additional processing on the sequence involved in the trigger failure, only log the failure to record the failure, and finally inform user that data insertion is successful, but the trigger fire part failed. +- Pessimistic strategy: The failure trigger affects the processing of all subsequent Pipelines, that is, we believe that the firing failure of the trigger will cause all subsequent triggering processes to no longer be carried out. If the trigger event of the trigger is BEFORE INSERT, then the insertion will no longer be performed, and the insertion failure will be returned directly. + +```java + /** + * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} + * is the default strategy. + * + * @return {@link FailureStrategy} + */ + default FailureStrategy getFailureStrategy() { + return FailureStrategy.OPTIMISTIC; + } +``` + +### Example + +If you use [Maven](http://search.maven.org/), you can refer to our sample project **trigger-example**. + +You can find it [here](https://github.com/apache/iotdb/tree/master/example/trigger). + +Here is the code from one of the sample projects: + +```java +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iotdb.trigger; + +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerConfiguration; +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerEvent; +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerHandler; +import org.apache.iotdb.trigger.api.Trigger; +import org.apache.iotdb.trigger.api.TriggerAttributes; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; +import org.apache.iotdb.tsfile.write.record.Tablet; +import org.apache.iotdb.tsfile.write.schema.MeasurementSchema; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; + +public class ClusterAlertingExample implements Trigger { + private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class); + + private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler(); + + private final AlertManagerConfiguration alertManagerConfiguration = + new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts"); + + private String alertname; + + private final HashMap labels = new HashMap<>(); + + private final HashMap annotations = new HashMap<>(); + + @Override + public void onCreate(TriggerAttributes attributes) throws Exception { + alertname = "alert_test"; + + labels.put("series", "root.ln.wf01.wt01.temperature"); + labels.put("value", ""); + labels.put("severity", ""); + + annotations.put("summary", "high temperature"); + annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}"); + + alertManagerHandler.open(alertManagerConfiguration); + } + + @Override + public void onDrop() throws IOException { + alertManagerHandler.close(); + } + + @Override + public boolean fire(Tablet tablet) throws Exception { + List measurementSchemaList = tablet.getSchemas(); + for (int i = 0, n = measurementSchemaList.size(); i < n; i++) { + if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) { + // for example, we only deal with the columns of Double type + double[] values = (double[]) tablet.values[i]; + for (double value : values) { + if (value > 100.0) { + LOGGER.info("trigger value > 100"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "critical"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } else if (value > 50.0) { + LOGGER.info("trigger value > 50"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "warning"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } + } + } + } + return true; + } +} +``` + +## Trigger Management + +You can create and drop a trigger through an SQL statement, and you can also query all registered triggers through an SQL statement. + +**We recommend that you stop insertion while creating triggers.** + +### Create Trigger + +Triggers can be registered on arbitrary path patterns. The time series registered with the trigger will be listened to by the trigger. When there is data change on the series, the corresponding fire method in the trigger will be called. + +Registering a trigger can be done as follows: + +1. Implement a Trigger class as described in the How to implement a Trigger chapter, assuming the class's full class name is `org.apache.iotdb.trigger.ClusterAlertingExample` +2. 
Package the project into a JAR package. +3. Register the trigger with an SQL statement. During the creation process, the `validate` and `onCreate` interfaces of the trigger will only be called once. For details, please refer to the chapter of How to implement a Trigger. + +The complete SQL syntax is as follows: + +```sql +// Create Trigger +createTrigger + : CREATE triggerType TRIGGER triggerName=identifier triggerEventClause ON pathPattern AS className=STRING_LITERAL uriClause? triggerAttributeClause? + ; + +triggerType + : STATELESS | STATEFUL + ; + +triggerEventClause + : (BEFORE | AFTER) INSERT + ; + +uriClause + : USING URI uri + ; + +uri + : STRING_LITERAL + ; + +triggerAttributeClause + : WITH LR_BRACKET triggerAttribute (COMMA triggerAttribute)* RR_BRACKET + ; + +triggerAttribute + : key=attributeKey operator_eq value=attributeValue + ; +``` + +Below is the explanation for the SQL syntax: + +- triggerName: The trigger ID, which is globally unique and used to distinguish different triggers, is case-sensitive. +- triggerType: Trigger types are divided into two categories, STATELESS and STATEFUL. +- triggerEventClause: when the trigger fires, BEFORE INSERT and AFTER INSERT are supported now. +- pathPattern:The path pattern the trigger listens on, can contain wildcards * and **. +- className:The class name of the Trigger class. +- jarLocation: Optional. When this option is not specified, by default, we consider that the DBA has placed the JAR package required to create the trigger in the trigger_root_dir directory (configuration item, default is IOTDB_HOME/ext/trigger) of each DataNode node. When this option is specified, we will download and distribute the file resource corresponding to the URI to the trigger_root_dir/install directory of each DataNode. +- triggerAttributeClause: It is used to specify the parameters that need to be set when the trigger instance is created. This part is optional in the SQL syntax. + +Here is an example SQL statement to help you understand: + +```sql +CREATE STATELESS TRIGGER triggerTest +BEFORE INSERT +ON root.sg.** +AS 'org.apache.iotdb.trigger.ClusterAlertingExample' +USING URI '/jar/ClusterAlertingExample.jar' +WITH ( + "name" = "trigger", + "limit" = "100" +) +``` + +The above SQL statement creates a trigger named triggerTest: + +- The trigger is stateless. +- Fires before insertion. +- Listens on path pattern root.sg.** +- The implemented trigger class is named `org.apache.iotdb.trigger.ClusterAlertingExample` +- The JAR package URI is http://jar/ClusterAlertingExample.jar +- When creating the trigger instance, two parameters, name and limit, are passed in. + +### Drop Trigger + +The trigger can be dropped by specifying the trigger ID. During the process of dropping the trigger, the `onDrop` interface of the trigger will be called only once. + +The SQL syntax is: + +```sql +// Drop Trigger +dropTrigger + : DROP TRIGGER triggerName=identifier +; +``` + +Here is an example statement: + +```sql +DROP TRIGGER triggerTest1 +``` + +The above statement will drop the trigger with ID triggerTest1. + +### Show Trigger + +You can query information about triggers that exist in the cluster through an SQL statement. 
+
+The SQL syntax is as follows:
+
+```sql
+SHOW TRIGGERS
+```
+
+The result set format of this statement is as follows:
+
+| TriggerName  | Event                        | Type                 | State                                       | PathPattern | ClassName                                | NodeId                                   |
+| ------------ | ---------------------------- | -------------------- | ------------------------------------------- | ----------- | ---------------------------------------- | ---------------------------------------- |
+| triggerTest1 | BEFORE_INSERT / AFTER_INSERT | STATELESS / STATEFUL | INACTIVE / ACTIVE / DROPPING / TRANSFERRING | root.**     | org.apache.iotdb.trigger.TriggerExample  | ALL(STATELESS) / DATA_NODE_ID(STATEFUL)  |
+
+### Trigger State
+
+During the process of creating and dropping triggers in the cluster, we maintain the states of the triggers. The following is a description of these states:
+
+| State        | Description                                                  | Is it recommended to insert data? |
+| ------------ | ------------------------------------------------------------ | --------------------------------- |
+| INACTIVE     | The intermediate state of executing `CREATE TRIGGER`: the cluster has just recorded the trigger information on the ConfigNode, and the trigger has not been activated on any DataNode. | NO |
+| ACTIVE       | The state after successful execution of `CREATE TRIGGER`: the trigger is available on all DataNodes in the cluster. | YES |
+| DROPPING     | The intermediate state of executing `DROP TRIGGER`: the cluster is in the process of dropping the trigger. | NO |
+| TRANSFERRING | The cluster is migrating the location of this trigger instance. | NO |
+
+## Notes
+
+- The trigger takes effect from the time of registration and does not process existing historical data. **That is, only insertion requests that occur after the trigger is successfully registered will be listened to by the trigger.**
+- The fire process of a trigger is currently synchronous, so you need to ensure the efficiency of the trigger; otherwise the writing performance may be greatly affected. **You need to guarantee the concurrency safety of triggers yourself.**
+- Please do not register too many triggers in the cluster, because the trigger information is fully stored in the ConfigNode and a copy of the information is kept on every DataNode.
+- **It is recommended to stop writing when registering triggers.** Registering a trigger is not an atomic operation. When registering a trigger, there will be an intermediate state in which some nodes in the cluster have registered the trigger and some nodes have not yet registered it successfully. To avoid write requests being listened to by triggers on some nodes but not on others, we recommend not performing writes while registering triggers.
+- When the node holding a stateful trigger instance goes down, we will try to restore the corresponding instance on another node. During the recovery process, the restore interface of the trigger class will be called once.
+- The trigger JAR package has a size limit: it must be less than min(`config_node_ratis_log_appender_buffer_size_max`, 2G), where `config_node_ratis_log_appender_buffer_size_max` is a configuration item. For its specific meaning, please refer to the IoTDB configuration item description.
+- **It is better not to have classes with the same full class name but different function implementations in different JAR packages.** For example, trigger1 and trigger2 correspond to resources trigger1.jar and trigger2.jar respectively. If both JAR packages contain an `org.apache.iotdb.trigger.example.AlertListener` class, when `CREATE TRIGGER` uses this class, the system will randomly load the class from one of the JAR packages, which will eventually lead to inconsistent trigger behavior and other issues.
+
+## Configuration Parameters
+
+| Parameter                                          | Meaning                                                      |
+| -------------------------------------------------- | ------------------------------------------------------------ |
+| *trigger_lib_dir*                                  | Directory in which the trigger JAR packages are saved        |
+| *stateful\_trigger\_retry\_num\_when\_not\_found*  | How many times we will retry to find an instance of a stateful trigger on DataNodes if it is not found |
diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/UDF-development.md b/src/UserGuide/V2.0.1/Tree/User-Manual/UDF-development.md
new file mode 100644
index 00000000..057aabbe
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/User-Manual/UDF-development.md
@@ -0,0 +1,743 @@
+# UDF development
+
+## 1. UDF development
+
+### 1.1 UDF Development Dependencies
+
+If you use [Maven](http://search.maven.org/), you can search for the development dependencies listed below from the [Maven repository](http://search.maven.org/). Please note that you must select the same dependency version as the target IoTDB server version for development.
+
+``` xml
+<dependency>
+  <groupId>org.apache.iotdb</groupId>
+  <artifactId>udf-api</artifactId>
+  <version>1.0.0</version>
+  <scope>provided</scope>
+</dependency>
+```
+
+### 1.2 UDTF (User Defined Timeseries Generating Function)
+
+To write a UDTF, you need to inherit the `org.apache.iotdb.udf.api.UDTF` class, and at least implement the `beforeStart` method and a `transform` method.
+
+#### Interface Description:
+
+| Interface definition | Description | Required to Implement |
+| :------------------- | :---------- | --------------------- |
+| void validate(UDFParameterValidator validator) throws Exception | This method is mainly used to validate `UDFParameters` and it is executed before `beforeStart(UDFParameters, UDTFConfigurations)` is called. | Optional |
+| void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception | The initialization method to call the user-defined initialization behavior before a UDTF processes the input data. Every time a user executes a UDTF query, the framework will construct a new UDF instance, and `beforeStart` will be called. | Required |
+| Object transform(Row row) throws Exception | This method is called by the framework. This data processing method will be called when you choose to use the `MappableRowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Row`, and the transformation result should be returned. | Required to implement at least one `transform` method |
+| void transform(Column[] columns, ColumnBuilder builder) throws Exception | This method is called by the framework. This data processing method will be called when you choose to use the `MappableRowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Column[]`, and the transformation result should be output by `ColumnBuilder`. You need to call the data collection method provided by `builder` to determine the output data. | Required to implement at least one `transform` method |
+| void transform(Row row, PointCollector collector) throws Exception | This method is called by the framework.
This data processing method will be called when you choose to use the `RowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Row`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method | +| void transform(RowWindow rowWindow, PointCollector collector) throws Exception | This method is called by the framework. This data processing method will be called when you choose to use the `SlidingSizeWindowAccessStrategy` or `SlidingTimeWindowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `RowWindow`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method | +| void terminate(PointCollector collector) throws Exception | This method is called by the framework. This method will be called once after all `transform` calls have been executed. In a single UDF query, this method will and will only be called once. You need to call the data collection method provided by `collector` to determine the output data. | Optional | +| void beforeDestroy() | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | + +In the life cycle of a UDTF instance, the calling sequence of each method is as follows: + +1. void validate(UDFParameterValidator validator) throws Exception +2. void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception +3. `Object transform(Row row) throws Exception` or `void transform(Column[] columns, ColumnBuilder builder) throws Exception` or `void transform(Row row, PointCollector collector) throws Exception` or `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` +4. void terminate(PointCollector collector) throws Exception +5. void beforeDestroy() + +> Note that every time the framework executes a UDTF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDTF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDTF without considering the influence of concurrency and other factors. + +#### Detailed interface introduction: + +1. **void validate(UDFParameterValidator validator) throws Exception** + +The `validate` method is used to validate the parameters entered by the user. + +In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification. + +Please refer to the Javadoc for the usage of `UDFParameterValidator`. + + +2. **void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception** + +This method is mainly used to customize UDTF. In this method, the user can do the following things: + +1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. +2. Set the strategy to access the raw data and set the output data type in UDTFConfigurations. +3. Create resources, such as establishing external connections, opening files, etc. 
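+
+To tie these three steps together before looking at the helper classes in detail, the following is a minimal, hedged sketch of a `validate`/`beforeStart` pair. The `threshold` attribute name and the chosen data types are illustrative assumptions, not part of a fixed API; `UDFParameters` and `UDTFConfigurations` are described in the next two subsections.
+
+```java
+@Override
+public void validate(UDFParameterValidator validator) throws Exception {
+  // restrict the UDF to exactly one INT64 input series
+  validator
+      .validateInputSeriesNumber(1)
+      .validateInputSeriesDataType(0, Type.INT64);
+}
+
+@Override
+public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
+  // 1. parse user attributes ("threshold" is a hypothetical attribute name;
+  //    a real UDF would usually keep the value in a field for use in transform)
+  int threshold = parameters.getIntOrDefault("threshold", 100);
+  // 2. choose the raw data access strategy and the output data type
+  configurations
+      .setAccessStrategy(new RowByRowAccessStrategy())
+      .setOutputDataType(Type.INT64);
+  // 3. create resources here if needed, e.g. open an external connection or a file
+}
+```
+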
+ + +2.1 **UDFParameters** + +`UDFParameters` is used to parse UDF parameters in SQL statements (the part in parentheses after the UDF function name in SQL). The input parameters have two parts. The first part is data types of the time series that the UDF needs to process, and the second part is the key-value pair attributes for customization. Only the second part can be empty. + + +Example: + +``` sql +SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; +``` + +Usage: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + String stringValue = parameters.getString("key1"); // iotdb + Float floatValue = parameters.getFloat("key2"); // 123.45 + Double doubleValue = parameters.getDouble("key3"); // null + int intValue = parameters.getIntOrDefault("key4", 678); // 678 + // do something + + // configurations + // ... +} +``` + + +2.2 **UDTFConfigurations** + +You must use `UDTFConfigurations` to specify the strategy used by UDF to access raw data and the type of output sequence. + +Usage: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(Type.INT32); +} +``` + +The `setAccessStrategy` method is used to set the UDF's strategy for accessing the raw data, and the `setOutputDataType` method is used to set the data type of the output sequence. + + 2.2.1 **setAccessStrategy** + + +Note that the raw data access strategy you set here determines which `transform` method the framework will call. Please implement the `transform` method corresponding to the raw data access strategy. Of course, you can also dynamically decide which strategy to set based on the attribute parameters parsed by `UDFParameters`. Therefore, two `transform` methods are also allowed to be implemented in one UDF. + +The following are the strategies you can set: + +| Interface definition | Description | The `transform` Method to Call | +| :-------------------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ | +| MappableRowByRowStrategy | Custom scalar function
The framework will call the `transform` method once for each row of raw data input, with k columns of time series and 1 row of data as input, and 1 column of time series and 1 row of data as output. It can be used in any clause and expression where scalar functions appear, such as select clauses, where clauses, etc. | void transform(Column[] columns, ColumnBuilder builder) throws Exception<br>Object transform(Row row) throws Exception |
+ | RowByRowAccessStrategy | Customize time series generation function to process raw data line by line.
The framework will call the `transform` method once for each row of raw data input, inputting k columns of time series and 1 row of data, and outputting 1 column of time series and n rows of data.
When a sequence is input, the row serves as a data point for the input sequence.
When multiple sequences are input, after aligning the input sequences in time, each row serves as a data point for the input sequence.
(In a row of data, there may be a column with a `null` value, but not all columns are `null`) | void transform(Row row, PointCollector collector) throws Exception | +| SlidingTimeWindowAccessStrategy | Customize time series generation functions to process raw data in a sliding time window manner.
The framework will call the `transform` method once for each raw data input window, input k columns of time series m rows of data, and output 1 column of time series n rows of data.
A window may contain multiple rows of data, and after aligning the input sequence in time, each window serves as a data point for the input sequence.
(Each window may have i rows, and each row of data may have a column with a `null` value, but not all of them are `null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | +| SlidingSizeWindowAccessStrategy | Customize the time series generation function to process raw data in a fixed number of rows, meaning that each data processing window will contain a fixed number of rows of data (except for the last window).
The framework will call the `transform` method once for each raw data input window, input k columns of time series m rows of data, and output 1 column of time series n rows of data.
A window may contain multiple rows of data, and after aligning the input sequence in time, each window serves as a data point for the input sequence.
(Each window may have i rows, and each row of data may have a column with a `null` value, but not all of them are `null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | +| SessionTimeWindowAccessStrategy | Customize time series generation functions to process raw data in a session window format.
The framework will call the `transform` method once for each raw data input window, input k columns of time series m rows of data, and output 1 column of time series n rows of data.
A window may contain multiple rows of data, and after aligning the input sequence in time, each window serves as a data point for the input sequence.
(Each window may have i rows, and each row of data may have a column with a `null` value, but not all of them are `null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | +| StateWindowAccessStrategy | Customize time series generation functions to process raw data in a state window format.
The framework will call the `transform` method once for each raw data input window, inputting 1 column of time series with m rows of data and outputting 1 column of time series with n rows of data.
A window may contain multiple rows of data, and currently only supports opening windows for one physical quantity, which is one column of data. | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | + + +#### Interface Description: + +- `MappableRowByRowStrategy` and `RowByRowAccessStrategy`: The construction of `RowByRowAccessStrategy` does not require any parameters. + +- `SlidingTimeWindowAccessStrategy` + +Window opening diagram: + + + +`SlidingTimeWindowAccessStrategy`: `SlidingTimeWindowAccessStrategy` has many constructors, you can pass 3 types of parameters to them: + +- Parameter 1: The display window on the time axis + +The first type of parameters are optional. If the parameters are not provided, the beginning time of the display window will be set to the same as the minimum timestamp of the query result set, and the ending time of the display window will be set to the same as the maximum timestamp of the query result set. + +- Parameter 2: Time interval for dividing the time axis (should be positive) +- Parameter 3: Time sliding step (not required to be greater than or equal to the time interval, but must be a positive number) + +The sliding step parameter is also optional. If the parameter is not provided, the sliding step will be set to the same as the time interval for dividing the time axis. + +The relationship between the three types of parameters can be seen in the figure below. Please see the Javadoc for more details. + +
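+
+As a concrete illustration, the sketch below constructs the strategy with all three kinds of parameters, using the four-argument constructor that also appears in the `Counter` example later in this document; the numeric values are purely illustrative.
+
+```java
+// display window [0, 100000), time interval 10000 ms, sliding step 5000 ms (illustrative values)
+configurations.setAccessStrategy(
+    new SlidingTimeWindowAccessStrategy(
+        10000L,    // time interval for dividing the time axis
+        5000L,     // sliding step
+        0L,        // display window begin
+        100000L)); // display window end
+```
+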

+ +> Note that the actual time interval of some of the last time windows may be less than the specified time interval parameter. In addition, there may be cases where the number of data rows in some time windows is 0. In these cases, the framework will also call the `transform` method for the empty windows. + +- `SlidingSizeWindowAccessStrategy` + +Window opening diagram: + + + +`SlidingSizeWindowAccessStrategy`: `SlidingSizeWindowAccessStrategy` has many constructors, you can pass 2 types of parameters to them: + +* Parameter 1: Window size. This parameter specifies the number of data rows contained in a data processing window. Note that the number of data rows in some of the last time windows may be less than the specified number of data rows. +* Parameter 2: Sliding step. This parameter means the number of rows between the first point of the next window and the first point of the current window. (This parameter is not required to be greater than or equal to the window size, but must be a positive number) + +The sliding step parameter is optional. If the parameter is not provided, the sliding step will be set to the same as the window size. + +- `SessionTimeWindowAccessStrategy` + +Window opening diagram: **Time intervals less than or equal to the given minimum time interval `sessionGap` are assigned in one group.** + + + +`SessionTimeWindowAccessStrategy`: `SessionTimeWindowAccessStrategy` has many constructors, you can pass 2 types of parameters to them: + +- Parameter 1: The display window on the time axis. +- Parameter 2: The minimum time interval `sessionGap` of two adjacent windows. + +- `StateWindowAccessStrategy` + +Window opening diagram: **For numerical data, if the state difference is less than or equal to the given threshold `delta`, it will be assigned in one group.** + + + +`StateWindowAccessStrategy` has four constructors. + +- Constructor 1: For numerical data, there are 3 parameters: the time axis can display the start and end time of the time window and the threshold `delta` for the allowable change within a single window. +- Constructor 2: For text data and boolean data, there are 3 parameters: the time axis can be provided to display the start and end time of the time window. For both data types, the data within a single window is same, and there is no need to provide an allowable change threshold. +- Constructor 3: For numerical data, there are 1 parameters: you can only provide the threshold delta that is allowed to change within a single window. The start time of the time axis display time window will be defined as the smallest timestamp in the entire query result set, and the time axis display time window end time will be defined as The largest timestamp in the entire query result set. +- Constructor 4: For text data and boolean data, you can provide no parameter. The start and end timestamps are explained in Constructor 3. + +StateWindowAccessStrategy can only take one column as input for now. + +Please see the Javadoc for more details. + + 2.2.2 **setOutputDataType** + +Note that the type of output sequence you set here determines the type of data that the `PointCollector` can actually receive in the `transform` method. 
The relationship between the output data type set in `setOutputDataType` and the actual data output type that `PointCollector` can receive is as follows: + +| Output Data Type Set in `setOutputDataType` | Data Type that `PointCollector` Can Receive | +| :------------------------------------------ | :----------------------------------------------------------- | +| INT32 | int | +| INT64 | long | +| FLOAT | float | +| DOUBLE | double | +| BOOLEAN | boolean | +| TEXT | java.lang.String and org.apache.iotdb.udf.api.type.Binar` | + +The type of output time series of a UDTF is determined at runtime, which means that a UDTF can dynamically determine the type of output time series according to the type of input time series. +Here is a simple example: + +```java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(parameters.getDataType(0)); +} +``` + +3. **Object transform(Row row) throws Exception** + +You need to implement this method or `transform(Column[] columns, ColumnBuilder builder) throws Exception` when you specify the strategy of UDF to read the original data as `MappableRowByRowAccessStrategy`. + +This method processes the raw data one row at a time. The raw data is input from `Row` and output by its return object. You must return only one object based on each input data point in a single `transform` method call, i.e., input and output are one-to-one. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. + +The following is a complete UDF example that implements the `Object transform(Row row) throws Exception` method. It is an adder that receives two columns of time series as input. + +```java +import org.apache.iotdb.udf.api.UDTF; +import org.apache.iotdb.udf.api.access.Row; +import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameterValidator; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.udf.api.customizer.strategy.MappableRowByRowAccessStrategy; +import org.apache.iotdb.udf.api.type.Type; + +public class Adder implements UDTF { + private Type dataType; + + @Override + public void validate(UDFParameterValidator validator) throws Exception { + validator + .validateInputSeriesNumber(2) + .validateInputSeriesDataType(0, Type.INT64) + .validateInputSeriesDataType(1, Type.INT64); + } + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + dataType = parameters.getDataType(0); + configurations + .setAccessStrategy(new MappableRowByRowAccessStrategy()) + .setOutputDataType(dataType); + } + + @Override + public Object transform(Row row) throws Exception { + return row.getLong(0) + row.getLong(1); + } +} +``` + + + +4. **void transform(Column[] columns, ColumnBuilder builder) throws Exception** + +You need to implement this method or `Object transform(Row row) throws Exception` when you specify the strategy of UDF to read the original data as `MappableRowByRowAccessStrategy`. + +This method processes the raw data multiple rows at a time. After performance tests, we found that UDTF that process multiple rows at once perform better than those UDTF that process one data point at a time. 
The raw data is input from `Column[]` and output by `ColumnBuilder`. You must output a corresponding data point based on each input data point in a single `transform` method call, i.e., input and output are still one-to-one. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. + +The following is a complete UDF example that implements the `void transform(Column[] columns, ColumnBuilder builder) throws Exception` method. It is an adder that receives two columns of time series as input. + +```java +import org.apache.iotdb.tsfile.read.common.block.column.Column; +import org.apache.iotdb.tsfile.read.common.block.column.ColumnBuilder; +import org.apache.iotdb.udf.api.UDTF; +import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameterValidator; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.udf.api.customizer.strategy.MappableRowByRowAccessStrategy; +import org.apache.iotdb.udf.api.type.Type; + +public class Adder implements UDTF { + private Type type; + + @Override + public void validate(UDFParameterValidator validator) throws Exception { + validator + .validateInputSeriesNumber(2) + .validateInputSeriesDataType(0, Type.INT64) + .validateInputSeriesDataType(1, Type.INT64); + } + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + type = parameters.getDataType(0); + configurations.setAccessStrategy(new MappableRowByRowAccessStrategy()).setOutputDataType(type); + } + + @Override + public void transform(Column[] columns, ColumnBuilder builder) throws Exception { + long[] inputs1 = columns[0].getLongs(); + long[] inputs2 = columns[1].getLongs(); + + int count = columns[0].getPositionCount(); + for (int i = 0; i < count; i++) { + builder.writeLong(inputs1[i] + inputs2[i]); + } + } +} +``` + +5. **void transform(Row row, PointCollector collector) throws Exception** + +You need to implement this method when you specify the strategy of UDF to read the original data as `RowByRowAccessStrategy`. + +This method processes the raw data one row at a time. The raw data is input from `Row` and output by `PointCollector`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing. + +The following is a complete UDF example that implements the `void transform(Row row, PointCollector collector) throws Exception` method. It is an adder that receives two columns of time series as input. When two data points in a row are not `null`, this UDF will output the algebraic sum of these two data points. 
+
+``` java
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Adder implements UDTF {
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT64)
+        .setAccessStrategy(new RowByRowAccessStrategy());
+  }
+
+  @Override
+  public void transform(Row row, PointCollector collector) throws Exception {
+    if (row.isNull(0) || row.isNull(1)) {
+      return;
+    }
+    collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1));
+  }
+}
+```
+
+6. **void transform(RowWindow rowWindow, PointCollector collector) throws Exception**
+
+You need to implement this method when you specify the strategy of UDF to read the original data as `SlidingTimeWindowAccessStrategy` or `SlidingSizeWindowAccessStrategy`.
+
+This method processes a batch of data in a fixed number of rows or a fixed time interval each time, and we call the container containing this batch of data a window. The raw data is input from `RowWindow` and output by `PointCollector`. `RowWindow` can help you access a batch of `Row`; it provides a set of interfaces for random access and iterative access to this batch of `Row`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing.
+
+Below is a complete UDF example that implements the `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` method. It is a counter that receives any number of time series as input, and its function is to count and output the number of data rows in each time window within a specified time range.
+
+```java
+import java.io.IOException;
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.access.RowWindow;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Counter implements UDTF {
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT32)
+        .setAccessStrategy(new SlidingTimeWindowAccessStrategy(
+            parameters.getLong("time_interval"),
+            parameters.getLong("sliding_step"),
+            parameters.getLong("display_window_begin"),
+            parameters.getLong("display_window_end")));
+  }
+
+  @Override
+  public void transform(RowWindow rowWindow, PointCollector collector) {
+    if (rowWindow.windowSize() != 0) {
+      collector.putInt(rowWindow.windowStartTime(), rowWindow.windowSize());
+    }
+  }
+}
+```
+
+7. **void terminate(PointCollector collector) throws Exception**
+
+In some scenarios, a UDF needs to traverse all the original data to calculate the final output data points. The `terminate` interface provides support for those scenarios.
+
+This method is called after all `transform` calls are executed and before the `beforeDestroy` method is executed. You can implement the `transform` method to perform pure data processing (without outputting any data points), and implement the `terminate` method to output the processing results.
+
+The processing results need to be output by the `PointCollector`. You can output any number of data points in one `terminate` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing.
+
+Below is a complete UDF example that implements the `void terminate(PointCollector collector) throws Exception` method. It takes one time series whose data type is `INT32` as input, and outputs the maximum value point of the series.
+
+```java
+import java.io.IOException;
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Max implements UDTF {
+
+  private Long time;
+  private int value;
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT32)
+        .setAccessStrategy(new RowByRowAccessStrategy());
+  }
+
+  @Override
+  public void transform(Row row, PointCollector collector) {
+    if (row.isNull(0)) {
+      return;
+    }
+    int candidateValue = row.getInt(0);
+    if (time == null || value < candidateValue) {
+      time = row.getTime();
+      value = candidateValue;
+    }
+  }
+
+  @Override
+  public void terminate(PointCollector collector) throws IOException {
+    if (time != null) {
+      collector.putInt(time, value);
+    }
+  }
+}
+```
+
+8. **void beforeDestroy()**
+
+The method for terminating a UDF.
+
+This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once.
+
+### 1.3 UDAF (User Defined Aggregation Function)
+
+A complete definition of UDAF involves two classes, `State` and `UDAF`.
+
+#### State Class
+
+To write your own `State`, you need to implement the `org.apache.iotdb.udf.api.State` interface.
+
+#### Interface Description:
+
+| Interface Definition | Description | Required to Implement |
+| -------------------- | ----------- | --------------------- |
+| void reset() | To reset the `State` object to its initial state, you need to fill in the initial values of the fields in the `State` class within this method, as if you were writing a constructor. | Required |
+| byte[] serialize() | Serializes `State` to binary data. This method is used for IoTDB internal `State` passing. Note that the order of serialization must be consistent with the following deserialization method. | Required |
+| void deserialize(byte[] bytes) | Deserializes binary data to `State`. This method is used for IoTDB internal `State` passing. Note that the order of deserialization must be consistent with the serialization method above. | Required |
+
+#### Detailed interface introduction:
+
+1. 
**void reset()** + +This method resets the `State` to its initial state, you need to fill in the initial values of the fields in the `State` object in this method. For optimization reasons, IoTDB reuses `State` as much as possible internally, rather than creating a new `State` for each group, which would introduce unnecessary overhead. When `State` has finished updating the data in a group, this method is called to reset to the initial state as a way to process the next group. + +In the case of `State` for averaging (aka `avg`), for example, you would need the sum of the data, `sum`, and the number of entries in the data, `count`, and initialize both to 0 in the `reset()` method. + +```java +class AvgState implements State { + double sum; + + long count; + + @Override + public void reset() { + sum = 0; + count = 0; + } + + // other methods +} +``` + +2. **byte[] serialize()/void deserialize(byte[] bytes)** + +These methods serialize the `State` into binary data, and deserialize the `State` from the binary data. IoTDB, as a distributed database, involves passing data among different nodes, so you need to write these two methods to enable the passing of the State among different nodes. Note that the order of serialization and deserialization must be the consistent. + +In the case of `State` for averaging (aka `avg`), for example, you can convert the content of State to `byte[]` array and read out the content of State from `byte[]` array in any way you want, the following shows the code for serialization/deserialization using `ByteBuffer` introduced by Java8: + +```java +@Override +public byte[] serialize() { + ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES); + buffer.putDouble(sum); + buffer.putLong(count); + + return buffer.array(); +} + +@Override +public void deserialize(byte[] bytes) { + ByteBuffer buffer = ByteBuffer.wrap(bytes); + sum = buffer.getDouble(); + count = buffer.getLong(); +} +``` + + + +#### UDAF Classes + +To write a UDAF, you need to implement the `org.apache.iotdb.udf.api.UDAF` interface. + +#### Interface Description: + +| Interface definition | Description | Required to Implement | +| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------- | +| void validate(UDFParameterValidator validator) throws Exception | This method is mainly used to validate `UDFParameters` and it is executed before `beforeStart(UDFParameters, UDTFConfigurations)` is called. | Optional | +| void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception | Initialization method that invokes user-defined initialization behavior before UDAF processes the input data. Unlike UDTF, configuration is of type `UDAFConfiguration`. | Required | +| State createState() | To create a `State` object, usually just call the default constructor and modify the default initial value as needed. | Required | +| void addInput(State state, Column[] columns, BitMap bitMap) | Update `State` object according to the incoming data `Column[]` in batch, note that last column `columns[columns.length - 1]` always represents the time column. In addition, `BitMap` represents the data that has been filtered out before, you need to manually determine whether the corresponding data has been filtered out when writing this method. | Required | +| void combineState(State state, State rhs) | Merge `rhs` state into `state` state. 
In a distributed scenario, the same set of data may be distributed on different nodes, IoTDB generates a `State` object for the partial data on each node, and then calls this method to merge it into the complete `State`. | Required | +| void outputFinal(State state, ResultValue resultValue) | Computes the final aggregated result based on the data in `State`. Note that according to the semantics of the aggregation, only one value can be output per group. | Required | +| void beforeDestroy() | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | + +In the life cycle of a UDAF instance, the calling sequence of each method is as follows: + +1. State createState() +2. void validate(UDFParameterValidator validator) throws Exception +3. void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception +4. void addInput(State state, Column[] columns, BitMap bitMap) +5. void combineState(State state, State rhs) +6. void outputFinal(State state, ResultValue resultValue) +7. void beforeDestroy() + +Similar to UDTF, every time the framework executes a UDAF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDAF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDAF without considering the influence of concurrency and other factors. + +#### Detailed interface introduction: + + +1. **void validate(UDFParameterValidator validator) throws Exception** + +Same as UDTF, the `validate` method is used to validate the parameters entered by the user. + +In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification. + +2. **void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception** + + The `beforeStart` method does the same thing as the UDAF: + +1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. +2. Set the strategy to access the raw data and set the output data type in UDAFConfigurations. +3. Create resources, such as establishing external connections, opening files, etc. + +The role of the `UDFParameters` type can be seen above. + +2.2 **UDTFConfigurations** + +The difference from UDTF is that UDAF uses `UDAFConfigurations` as the type of `configuration` object. + +Currently, this class only supports setting the type of output data. + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setOutputDataType(Type.INT32); } +} +``` + +The relationship between the output type set in `setOutputDataType` and the type of data output that `ResultValue` can actually receive is as follows: + +| The output type set in `setOutputDataType` | The output type that `ResultValue` can actually receive | +| ------------------------------------------ | ------------------------------------------------------- | +| INT32 | int | +| INT64 | long | +| FLOAT | float | +| DOUBLE | double | +| BOOLEAN | boolean | +| TEXT | org.apache.iotdb.udf.api.type.Binary | + +The output type of the UDAF is determined at runtime. You can dynamically determine the output sequence type based on the input type. 
+ +Here is a simple example: + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setOutputDataType(parameters.getDataType(0)); +} +``` + +3. **State createState()** + + +This method creates and initializes a `State` object for UDAF. Due to the limitations of the Java language, you can only call the default constructor for the `State` class. The default constructor assigns a default initial value to all the fields in the class, and if that initial value does not meet your requirements, you need to initialize them manually within this method. + +The following is an example that includes manual initialization. Suppose you want to implement an aggregate function that multiply all numbers in the group, then your initial `State` value should be set to 1, but the default constructor initializes it to 0, so you need to initialize `State` manually after calling the default constructor: + +```java +public State createState() { + MultiplyState state = new MultiplyState(); + state.result = 1; + return state; +} +``` + +4. **void addInput(State state, Column[] columns, BitMap bitMap)** + +This method updates the `State` object with the raw input data. For performance reasons, also to align with the IoTDB vectorized query engine, the raw input data is no longer a data point, but an array of columns ``Column[]``. Note that the last column (i.e. `columns[columns.length - 1]`) is always the time column, so you can also do different operations in UDAF depending on the time. + +Since the input parameter is not of a single data point type, but of multiple columns, you need to manually filter some of the data in the columns, which is why the third parameter, `BitMap`, exists. It identifies which of these columns have been filtered out, so you don't have to think about the filtered data in any case. + +Here's an example of `addInput()` that counts the number of items (aka count). It shows how you can use `BitMap` to ignore data that has been filtered out. Note that due to the limitations of the Java language, you need to do the explicit cast the `State` object from type defined in the interface to a custom `State` type at the beginning of the method, otherwise you won't be able to use the `State` object. + +```java +public void addInput(State state, Column[] columns, BitMap bitMap) { + CountState countState = (CountState) state; + + int count = columns[0].getPositionCount(); + for (int i = 0; i < count; i++) { + if (bitMap != null && !bitMap.isMarked(i)) { + continue; + } + if (!columns[0].isNull(i)) { + countState.count++; + } + } +} +``` + +5. **void combineState(State state, State rhs)** + + +This method combines two `State`s, or more precisely, updates the first `State` object with the second `State` object. IoTDB is a distributed database, and the data of the same group may be distributed on different nodes. For performance reasons, IoTDB will first aggregate some of the data on each node into `State`, and then merge the `State`s on different nodes that belong to the same group, which is what `combineState` does. + +Here's an example of `combineState()` for averaging (aka avg). Similar to `addInput`, you need to do an explicit type conversion for the two `State`s at the beginning. Also note that you are updating the value of the first `State` with the contents of the second `State`. 
+ +```java +public void combineState(State state, State rhs) { + AvgState avgState = (AvgState) state; + AvgState avgRhs = (AvgState) rhs; + + avgState.count += avgRhs.count; + avgState.sum += avgRhs.sum; +} +``` + +6. **void outputFinal(State state, ResultValue resultValue)** + +This method works by calculating the final result from `State`. You need to access the various fields in `State`, derive the final result, and set the final result into the `ResultValue` object.IoTDB internally calls this method once at the end for each group. Note that according to the semantics of aggregation, the final result can only be one value. + +Here is another `outputFinal` example for averaging (aka avg). In addition to the forced type conversion at the beginning, you will also see a specific use of the `ResultValue` object, where the final result is set by `setXXX` (where `XXX` is the type name). + +```java +public void outputFinal(State state, ResultValue resultValue) { + AvgState avgState = (AvgState) state; + + if (avgState.count != 0) { + resultValue.setDouble(avgState.sum / avgState.count); + } else { + resultValue.setNull(); + } +} +``` + +7. **void beforeDestroy()** + + +The method for terminating a UDF. + +This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once. + + +### 1.4 Maven Project Example + +If you use Maven, you can build your own UDF project referring to our **udf-example** module. You can find the project [here](https://github.com/apache/iotdb/tree/master/example/udf). + + +## 2. Contribute universal built-in UDF functions to iotdb + +This part mainly introduces how external users can contribute their own UDFs to the IoTDB community. + +#### 2.1 Prerequisites + +1. UDFs must be universal. + + The "universal" mentioned here refers to: UDFs can be widely used in some scenarios. In other words, the UDF function must have reuse value and may be directly used by other users in the community. + + If you are not sure whether the UDF you want to contribute is universal, you can send an email to `dev@iotdb.apache.org` or create an issue to initiate a discussion. + +2. The UDF you are going to contribute has been well tested and can run normally in the production environment. + + +#### 2.2 What you need to prepare + +1. UDF source code +2. Test cases +3. Instructions + +### 2.3 Contribution Content + +#### 2.3.1 UDF Source Code + +1. Create the UDF main class and related classes in `iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin` or in its subfolders. +2. Register your UDF in `iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin/BuiltinTimeSeriesGeneratingFunction.java`. + +#### 2.3.2 Test Cases + +At a minimum, you need to write integration tests for the UDF. + +You can add a test class in `integration-test/src/test/java/org/apache/iotdb/db/it/udf`. + + +#### 2.3.3 Instructions + +The instructions need to include: the name and the function of the UDF, the attribute parameters that must be provided when the UDF is executed, the applicable scenarios, and the usage examples, etc. + +The instructions for use should include both Chinese and English versions. Instructions for use should be added separately in `docs/zh/UserGuide/Operation Manual/DML Data Manipulation Language.md` and `docs/UserGuide/Operation Manual/DML Data Manipulation Language.md`. 
+ +#### 2.3.4 Submit a PR + +When you have prepared the UDF source code, test cases, and instructions, you are ready to submit a Pull Request (PR) on [Github](https://github.com/apache/iotdb). You can refer to our code contribution guide to submit a PR: [Development Guide](https://iotdb.apache.org/Community/Development-Guide.html). + + +After the PR review is approved and merged, your UDF has already contributed to the IoTDB community! diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_apache.md b/src/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_apache.md new file mode 100644 index 00000000..2bf1553c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_apache.md @@ -0,0 +1,213 @@ +# USER-DEFINED FUNCTION (UDF) + +## 1. UDF Introduction + +UDF (User Defined Function) refers to user-defined functions. IoTDB provides a variety of built-in time series processing functions and also supports extending custom functions to meet more computing needs. + +In IoTDB, you can expand two types of UDF: + + + + + + + + + + + + + + + + + + + + + +
| UDF Class | AccessStrategy | Description |
| --------- | -------------- | ----------- |
| UDTF | MAPPABLE_ROW_BY_ROW | Custom scalar function: input k columns of time series and 1 row of data, output 1 column of time series and 1 row of data. Can be used in any clause and expression where scalar functions appear, such as the select clause, where clause, etc. |
| UDTF | ROW_BY_ROW<br>SLIDING_TIME_WINDOW<br>SLIDING_SIZE_WINDOW<br>SESSION_TIME_WINDOW<br>STATE_WINDOW | Custom time series generating function: input k columns of time series and m rows of data, output 1 column of time series and n rows of data. The number of input rows m can be different from the number of output rows n. Can only be used in SELECT clauses. |
| UDAF | - | Custom aggregation function: input k columns of time series and m rows of data, output 1 column of time series and 1 row of data. Can be used in any clause and expression where aggregation functions appear, such as the select clause, having clause, etc. |
+ +### 1.1 UDF usage + +The usage of UDF is similar to that of regular built-in functions, and can be directly used in SELECT statements like calling regular functions. + +#### 1.Basic SQL syntax support + +* Support `SLIMIT` / `SOFFSET` +* Support `LIMIT` / `OFFSET` +* Support queries with value filters +* Support queries with time filters + + +#### 2. Queries with * in SELECT Clauses + +Assume that there are 2 time series (`root.sg.d1.s1` and `root.sg.d1.s2`) in the system. + +* **`SELECT example(*) from root.sg.d1`** + +Then the result set will include the results of `example (root.sg.d1.s1)` and `example (root.sg.d1.s2)`. + +* **`SELECT example(s1, *) from root.sg.d1`** + +Then the result set will include the results of `example(root.sg.d1.s1, root.sg.d1.s1)` and `example(root.sg.d1.s1, root.sg.d1.s2)`. + +* **`SELECT example(*, *) from root.sg.d1`** + +Then the result set will include the results of `example(root.sg.d1.s1, root.sg.d1.s1)`, `example(root.sg.d1.s2, root.sg.d1.s1)`, `example(root.sg.d1.s1, root.sg.d1.s2)` and `example(root.sg.d1.s2, root.sg.d1.s2)`. + +#### 3. Queries with Key-value Attributes in UDF Parameters + +You can pass any number of key-value pair parameters to the UDF when constructing a UDF query. The key and value in the key-value pair need to be enclosed in single or double quotes. Note that key-value pair parameters can only be passed in after all time series have been passed in. Here is a set of examples: + + Example: +``` sql +SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; +SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; +``` + +#### 4. Nested Queries + + Example: +``` sql +SELECT s1, s2, example(s1, s2) FROM root.sg.d1; +SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; +SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; +SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; +``` + +## 2. UDF Development + +You can refer to UDF development:[Development Guide](./UDF-development.md) + +## 3. UDF management + +### 3.1 UDF Registration + +The process of registering a UDF in IoTDB is as follows: + +1. Implement a complete UDF class, assuming the full class name of this class is `org.apache.iotdb.udf.ExampleUDTF`. +2. Convert the project into a JAR package. If using Maven to manage the project, you can refer to the [Maven project example](https://github.com/apache/iotdb/tree/master/example/udf) above. +3. Make preparations for registration according to the registration mode. For details, see the following example. +4. You can use following SQL to register UDF. + +```sql +CREATE FUNCTION AS (USING URI URI-STRING) +``` + +#### Example: register UDF named `example`, you can choose either of the following two registration methods + +#### Method 1: Manually place the jar package + +Prepare: +When registering using this method, it is necessary to place the JAR package in advance in the `ext/udf` directory of all nodes in the cluster (which can be configured). + +Registration statement: + +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' +``` + +#### Method 2: Cluster automatically installs jar packages through URI + +Prepare: +When registering using this method, it is necessary to upload the JAR package to the URI server in advance and ensure that the IoTDB instance executing the registration statement can access the URI server. 
+ +Registration statement: + +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar' +``` + +IoTDB will download JAR packages and synchronize them to the entire cluster. + +#### Note + +1. Since UDF instances are dynamically loaded through reflection technology, you do not need to restart the server during the UDF registration process. + +2. UDF function names are not case-sensitive. + +3. Please ensure that the function name given to the UDF is different from all built-in function names. A UDF with the same name as a built-in function cannot be registered. + +4. We recommend that you do not use classes that have the same class name but different function logic in different JAR packages. For example, in `UDF(UDAF/UDTF): udf1, udf2`, the JAR package of udf1 is `udf1.jar` and the JAR package of udf2 is `udf2.jar`. Assume that both JAR packages contain the `org.apache.iotdb.udf.ExampleUDTF` class. If you use two UDFs in the same SQL statement at the same time, the system will randomly load either of them and may cause inconsistency in UDF execution behavior. + +### 3.2 UDF Deregistration + +The SQL syntax is as follows: + +```sql +DROP FUNCTION +``` + +Example: Uninstall the UDF from the above example: + +```sql +DROP FUNCTION example +``` + + + +### 3.3 Show All Registered UDFs + +``` sql +SHOW FUNCTIONS +``` + +### 3.4 UDF configuration + +- UDF configuration allows configuring the storage directory of UDF in `iotdb-system.properties` + ``` Properties +# UDF lib dir + +udf_lib_dir=ext/udf +``` + +- -When using custom functions, there is a message indicating insufficient memory. Change the following configuration parameters in `iotdb-system.properties` and restart the service. + + ``` Properties + +# Used to estimate the memory usage of text fields in a UDF query. +# It is recommended to set this value to be slightly larger than the average length of all text +# effectiveMode: restart +# Datatype: int +udf_initial_byte_array_length_for_memory_control=48 + +# How much memory may be used in ONE UDF query (in MB). +# The upper limit is 20% of allocated memory for read. +# effectiveMode: restart +# Datatype: float +udf_memory_budget_in_mb=30.0 + +# UDF memory allocation ratio. +# The parameter form is a:b:c, where a, b, and c are integers. +# effectiveMode: restart +udf_reader_transformer_collector_memory_proportion=1:1:1 +``` + +### 3.5 UDF User Permissions + + +When users use UDF, they will be involved in the `USE_UDF` permission, and only users with this permission are allowed to perform UDF registration, uninstallation, and query operations. + +For more user permissions related content, please refer to [Account Management Statements](./Authority-Management.md). + + +## 4. UDF Libraries + +Based on the ability of user-defined functions, IoTDB provides a series of functions for temporal data processing, including data quality, data profiling, anomaly detection, frequency domain analysis, data matching, data repairing, sequence discovery, machine learning, etc., which can meet the needs of industrial fields for temporal data processing. + +You can refer to the [UDF Libraries](../SQL-Manual/UDF-Libraries_apache.md)document to find the installation steps and registration statements for each function, to ensure that all required functions are registered correctly. + + +## 5. Common problem: + +Q1: How to modify the registered UDF? 
+ +A1: Assume that the name of the UDF is `example` and the full class name is `org.apache.iotdb.udf.ExampleUDTF`, which is introduced by `example.jar`. + +1. Unload the registered function by executing `DROP FUNCTION example`. +2. Delete `example.jar` under `iotdb-server-1.0.0-all-bin/ext/udf`. +3. Modify the logic in `org.apache.iotdb.udf.ExampleUDTF` and repackage it. The name of the JAR package can still be `example.jar`. +4. Upload the new JAR package to `iotdb-server-1.0.0-all-bin/ext/udf`. +5. Load the new UDF by executing `CREATE FUNCTION example AS "org.apache.iotdb.udf.ExampleUDTF"`. + diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_timecho.md b/src/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_timecho.md new file mode 100644 index 00000000..fcbae4cc --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_timecho.md @@ -0,0 +1,213 @@ +# USER-DEFINED FUNCTION (UDF) + +## 1. UDF Introduction + +UDF (User Defined Function) refers to user-defined functions. IoTDB provides a variety of built-in time series processing functions and also supports extending custom functions to meet more computing needs. + +In IoTDB, you can expand two types of UDF: + + + + + + + + + + + + + + + + + + + + + +
| UDF Class | AccessStrategy | Description |
| --------- | -------------- | ----------- |
| UDTF | MAPPABLE_ROW_BY_ROW | Custom scalar function: takes k columns of time series and 1 row of data as input and outputs 1 column of time series and 1 row of data. It can be used in any clause or expression where a scalar function may appear, such as the select clause, the where clause, etc. |
| UDTF | ROW_BY_ROW<br>SLIDING_TIME_WINDOW<br>SLIDING_SIZE_WINDOW<br>SESSION_TIME_WINDOW<br>STATE_WINDOW | Custom time series generation function: takes k columns of time series and m rows of data as input and outputs 1 column of time series and n rows of data; the number of input rows m may differ from the number of output rows n. It can only be used in SELECT clauses. |
| UDAF | - | Custom aggregation function: takes k columns of time series and m rows of data as input and outputs 1 column of time series and 1 row of data. It can be used in any clause or expression where an aggregation function may appear, such as the select clause, the having clause, etc. |
+ +### 1.1 UDF usage + +The usage of UDF is similar to that of regular built-in functions, and can be directly used in SELECT statements like calling regular functions. + +#### 1.Basic SQL syntax support + +* Support `SLIMIT` / `SOFFSET` +* Support `LIMIT` / `OFFSET` +* Support queries with value filters +* Support queries with time filters + + +#### 2. Queries with * in SELECT Clauses + +Assume that there are 2 time series (`root.sg.d1.s1` and `root.sg.d1.s2`) in the system. + +* **`SELECT example(*) from root.sg.d1`** + +Then the result set will include the results of `example (root.sg.d1.s1)` and `example (root.sg.d1.s2)`. + +* **`SELECT example(s1, *) from root.sg.d1`** + +Then the result set will include the results of `example(root.sg.d1.s1, root.sg.d1.s1)` and `example(root.sg.d1.s1, root.sg.d1.s2)`. + +* **`SELECT example(*, *) from root.sg.d1`** + +Then the result set will include the results of `example(root.sg.d1.s1, root.sg.d1.s1)`, `example(root.sg.d1.s2, root.sg.d1.s1)`, `example(root.sg.d1.s1, root.sg.d1.s2)` and `example(root.sg.d1.s2, root.sg.d1.s2)`. + +#### 3. Queries with Key-value Attributes in UDF Parameters + +You can pass any number of key-value pair parameters to the UDF when constructing a UDF query. The key and value in the key-value pair need to be enclosed in single or double quotes. Note that key-value pair parameters can only be passed in after all time series have been passed in. Here is a set of examples: + + Example: +``` sql +SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; +SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; +``` + +#### 4. Nested Queries + + Example: +``` sql +SELECT s1, s2, example(s1, s2) FROM root.sg.d1; +SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; +SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; +SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; +``` + +## 2. UDF Development + +You can refer to UDF development:[Development Guide](./UDF-development.md) + +## 3. UDF management + +### 3.1 UDF Registration + +The process of registering a UDF in IoTDB is as follows: + +1. Implement a complete UDF class, assuming the full class name of this class is `org.apache.iotdb.udf.ExampleUDTF`. +2. Convert the project into a JAR package. If using Maven to manage the project, you can refer to the [Maven project example](https://github.com/apache/iotdb/tree/master/example/udf) above. +3. Make preparations for registration according to the registration mode. For details, see the following example. +4. You can use following SQL to register UDF. + +```sql +CREATE FUNCTION AS (USING URI URI-STRING) +``` + +#### Example: register UDF named `example`, you can choose either of the following two registration methods + +#### Method 1: Manually place the jar package + +Prepare: +When registering using this method, it is necessary to place the JAR package in advance in the `ext/udf` directory of all nodes in the cluster (which can be configured). + +Registration statement: + +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' +``` + +#### Method 2: Cluster automatically installs jar packages through URI + +Prepare: +When registering using this method, it is necessary to upload the JAR package to the URI server in advance and ensure that the IoTDB instance executing the registration statement can access the URI server. 
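Before executing the registration statement below, it can be helpful to confirm from an IoTDB node that the uploaded JAR is actually downloadable. The following check is illustrative only; the host name, port, and file name are assumptions for this example, not values required by IoTDB.

```Shell
# Assumption: example.jar was uploaded to an HTTP server reachable at http://jar-host:8000/example.jar.
# Run this on a machine hosting an IoTDB node to confirm the URI resolves before registering the UDF.
curl -I http://jar-host:8000/example.jar
```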
+ +Registration statement: + +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar' +``` + +IoTDB will download JAR packages and synchronize them to the entire cluster. + +#### Note + +1. Since UDF instances are dynamically loaded through reflection technology, you do not need to restart the server during the UDF registration process. + +2. UDF function names are not case-sensitive. + +3. Please ensure that the function name given to the UDF is different from all built-in function names. A UDF with the same name as a built-in function cannot be registered. + +4. We recommend that you do not use classes that have the same class name but different function logic in different JAR packages. For example, in `UDF(UDAF/UDTF): udf1, udf2`, the JAR package of udf1 is `udf1.jar` and the JAR package of udf2 is `udf2.jar`. Assume that both JAR packages contain the `org.apache.iotdb.udf.ExampleUDTF` class. If you use two UDFs in the same SQL statement at the same time, the system will randomly load either of them and may cause inconsistency in UDF execution behavior. + +### 3.2 UDF Deregistration + +The SQL syntax is as follows: + +```sql +DROP FUNCTION +``` + +Example: Uninstall the UDF from the above example: + +```sql +DROP FUNCTION example +``` + + + +### 3.3 Show All Registered UDFs + +``` sql +SHOW FUNCTIONS +``` + +### 3.4 UDF configuration + +- UDF configuration allows configuring the storage directory of UDF in `iotdb-system.properties` + ``` Properties +# UDF lib dir + +udf_lib_dir=ext/udf +``` + +- -When using custom functions, there is a message indicating insufficient memory. Change the following configuration parameters in `iotdb-system.properties` and restart the service. + + ``` Properties + +# Used to estimate the memory usage of text fields in a UDF query. +# It is recommended to set this value to be slightly larger than the average length of all text +# effectiveMode: restart +# Datatype: int +udf_initial_byte_array_length_for_memory_control=48 + +# How much memory may be used in ONE UDF query (in MB). +# The upper limit is 20% of allocated memory for read. +# effectiveMode: restart +# Datatype: float +udf_memory_budget_in_mb=30.0 + +# UDF memory allocation ratio. +# The parameter form is a:b:c, where a, b, and c are integers. +# effectiveMode: restart +udf_reader_transformer_collector_memory_proportion=1:1:1 +``` + +### 3.5 UDF User Permissions + + +When users use UDF, they will be involved in the `USE_UDF` permission, and only users with this permission are allowed to perform UDF registration, uninstallation, and query operations. + +For more user permissions related content, please refer to [Account Management Statements](./Authority-Management.md). + + +## 4. UDF Libraries + +Based on the ability of user-defined functions, IoTDB provides a series of functions for temporal data processing, including data quality, data profiling, anomaly detection, frequency domain analysis, data matching, data repairing, sequence discovery, machine learning, etc., which can meet the needs of industrial fields for temporal data processing. + +You can refer to the [UDF Libraries](../SQL-Manual/UDF-Libraries_timecho.md)document to find the installation steps and registration statements for each function, to ensure that all required functions are registered correctly. + + +## 5. Common problem: + +Q1: How to modify the registered UDF? 
+ +A1: Assume that the name of the UDF is `example` and the full class name is `org.apache.iotdb.udf.ExampleUDTF`, which is introduced by `example.jar`. + +1. Unload the registered function by executing `DROP FUNCTION example`. +2. Delete `example.jar` under `iotdb-server-1.0.0-all-bin/ext/udf`. +3. Modify the logic in `org.apache.iotdb.udf.ExampleUDTF` and repackage it. The name of the JAR package can still be `example.jar`. +4. Upload the new JAR package to `iotdb-server-1.0.0-all-bin/ext/udf`. +5. Load the new UDF by executing `CREATE FUNCTION example AS "org.apache.iotdb.udf.ExampleUDTF"`. + diff --git a/src/UserGuide/V2.0.1/Tree/User-Manual/White-List_timecho.md b/src/UserGuide/V2.0.1/Tree/User-Manual/White-List_timecho.md new file mode 100644 index 00000000..75fe1186 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/User-Manual/White-List_timecho.md @@ -0,0 +1,70 @@ + + +# White List + +**function description** + +Allow which client addresses can connect to IoTDB + +**configuration file** + +conf/iotdb-system.properties + +conf/white.list + +**configuration item** + +iotdb-system.properties: + +Decide whether to enable white list + +```YAML + +# Whether to enable white list +enable_white_list=true +``` + +white.list: + +Decide which IP addresses can connect to IoTDB + +```YAML +# Support for annotation +# Supports precise matching, one IP per line +10.2.3.4 + +# Support for * wildcards, one ip per line +10.*.1.3 +10.100.0.* +``` + +**note** + +1. If the white list itself is cancelled via the session client, the current connection is not immediately disconnected. It is rejected the next time the connection is created. +2. If white.list is modified directly, it takes effect within one minute. If modified via the session client, it takes effect immediately, updating the values in memory and the white.list disk file. +3. Enable the whitelist function, there is no white.list file, start the DB service successfully, however, all connections are rejected. +4. while DB service is running, the white.list file is deleted, and all connections are denied after up to one minute. +5. whether to enable the configuration of the white list function, can be hot loaded. +6. Use the Java native interface to modify the whitelist, must be the root user to modify, reject non-root user to modify; modify the content must be legal, otherwise it will throw a StatementExecutionException. + +![](https://alioss.timecho.com/docs/img/%E7%99%BD%E5%90%8D%E5%8D%95.PNG) + diff --git a/src/UserGuide/V2.0.1/Tree/UserGuideReadme.md b/src/UserGuide/V2.0.1/Tree/UserGuideReadme.md new file mode 100644 index 00000000..a661bfd5 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/UserGuideReadme.md @@ -0,0 +1,31 @@ + +# IoTDB User Guide Toc + +We keep introducing more features into IoTDB. Therefore, different released versions have their user guide documents respectively. + +The "In Progress Version" is for matching the master branch of IOTDB's source code Repository. +Other documents are for IoTDB previous released versions. 
+ +- [In progress version](https://iotdb.apache.org/UserGuide/Master/QuickStart/QuickStart_apache.html) +- [Version 1.0.x](https://iotdb.apache.org/UserGuide/V1.0.x/QuickStart/QuickStart.html) +- [Version 0.13.x](https://iotdb.apache.org/UserGuide/V0.13.x/QuickStart/QuickStart.html) + diff --git a/src/UserGuide/V2.0.1/Tree/stage/AINode_Deployment.md b/src/UserGuide/V2.0.1/Tree/stage/AINode_Deployment.md new file mode 100644 index 00000000..2bf8b04b --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/AINode_Deployment.md @@ -0,0 +1,329 @@ + +# AINode Deployment + +## Installation environment + +### Recommended Operating System + +Ubuntu, CentOS, MacOS + +### Runtime Environment + +AINode currently requires Python 3.8 or higher with pip and venv tools. + +For networked environments, AINode creates a virtual environment and downloads runtime dependencies automatically, no additional configuration is needed. + +In case of a non-networked environment, you can download it from https://cloud.tsinghua.edu.cn/d/4c1342f6c272439aa96c/to get the required dependencies and install them offline. + +## Installation steps + +Users can download the AINode software installation package, download and unzip it to complete the installation of AINode. You can also download the source code from the code repository and compile it to get the installation package. + +## Software directory structure + +After downloading and extracting the software package, you can get the following directory structure + +```Shell +|-- apache-iotdb-AINode-bin + |-- lib # package binary executable with environment dependencies + |-- conf # store configuration files + - iotdb-AINode.properties + |-- sbin # AINode related startup scripts + - start-AINode.sh + - start-AINode.bat + - stop-AINode.sh + - stop-AINode.bat + - remove-AINode.sh + - remove-AINode.bat + |-- licenses + - LICENSE + - NOTICE + - README.md + - README_ZH.md + - RELEASE_NOTES.md +``` + +- **lib:** AINode's compiled binary executable and related code dependencies. +- **conf:** contains AINode's configuration items, specifically the following configuration items +- **sbin:** AINode's runtime script, which can start, remove and stop AINode. + +## Start AINode + +After completing the deployment of Seed-ConfigNode, you can add an AINode node to support the model registration and inference functions. After specifying the information of IoTDB cluster in the configuration item, you can execute the corresponding commands to start AINode and join the IoTDB cluster. + +Note: Starting AINode requires that the system environment contains a Python interpreter of 3.8 or above as the default interpreter, so users should check whether the Python interpreter exists in the environment variables and can be directly invoked through the `python` command before using it. + +### Direct Start + +After obtaining the installation package files, you can directly start AINode for the first time. + +The startup commands on Linux and MacOS are as follows: + +```Shell +> bash sbin/start-AINode.sh +``` + +The startup command on windows is as follows: + +```Shell +> sbin\start-AINode.bat +``` + +If start AINode for the first time and do not specify the path to the interpreter, the script will create a new venv virtual environment in the root directory of the program using the system Python interpreter, and install the third-party dependencies of AINode and the main program of AINode in this environment automatically and successively. 
**This process will generate a virtual environment of about 1GB in size, so please reserve space for installation**. On subsequent startups, if the path to the interpreter is not specified, the script will automatically look for the newly created venv environment above and start AINode without having to install the program and dependencies repeatedly. + +Note that it is possible to activate reinstall with -r if you wish to force a reinstall of AINode proper on a certain startup, this parameter will reinstall AINode based on the files under lib. + +Linux和MacOS: + +```Shell +> bash sbin/start-AINode.sh -r +``` + +Windows: + +```Shell +> sbin\start-AINode.bat -r +``` + +For example, a user replaces a newer version of the AINode installer in the lib, but the installer is not installed in the user's usual environment. In this case, you need to add the -r option at startup to instruct the script to force a reinstallation of the main AINode program in the virtual environment to update the version. + +### Specify a customized virtual environment + +When starting AINode, you can specify a virtual environment interpreter path to install the AINode main program and its dependencies to a specific location. Specifically, you need to specify the value of the parameter ain_interpreter_dir. + +Linux and MacOS: + +```Shell +> bash sbin/start-AINode.sh -i xxx/bin/python +``` + +Windows: + +```Shell +> sbin\start-AINode.bat -i xxx\Scripts\python.exe +``` + +When specifying the Python interpreter please enter the address of the **executable file** of the Python interpreter in the virtual environment. Currently AINode **supports virtual environments such as venv, ****conda****, etc.** **Inputting the system Python interpreter as the installation location** is not supported. In order to ensure that scripts are recognized properly, please **use absolute paths whenever possible**! + +### Join the cluster + +The AINode startup process automatically adds the new AINode to the IoTDB cluster. After starting the AINode you can verify that the node was joined successfully by entering the SQL for the cluster query in IoTDB's cli command line. 
+```Shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| +| 2| AINode|Running| 127.0.0.1| 10810|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ + +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | |UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730| | 0.0.0.0| 6667| 10740| 10750| 10760|UNKNOWN|190e303-dev| +| 2| AINode|Running| 127.0.0.1| 10810| | 0.0.0.0| 10810| | | |UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+-------+-----------+ + +IoTDB> show AINodes ++------+-------+----------+-------+ +|NodeID| Status|RpcAddress|RpcPort| ++------+-------+----------+-------+ +| 2|Running| 127.0.0.1| 10810| ++------+-------+----------+-------+ +``` + +## Remove AINode + +When it is necessary to move an already connected AINode out of the cluster, the corresponding removal script can be executed. + +The commands on Linux and MacOS are as follows: + +```Shell +> bash sbin/remove-AINode.sh +``` + +The startup command on windows is as follows: + +```Shell +> sbin/remove-AINode.bat +``` + +After removing the node, information about the node will not be available. + +```Shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` + +In addition, if the location of the AINode installation was previously customized, then the remove script should be called with the corresponding path as an argument: + +Linux and MacOS: + +```Shell +> bash sbin/remove-AINode.sh -i xxx/bin/python +``` + +Windows: + +```Shell +> sbin\remove-AINode.bat -i 1 xxx\Scripts\python.exe +``` + +Similarly, script parameters that are persistently modified in the env script will also take effect when the removal is performed. + +If a user loses a file in the data folder, AINode may not be able to remove itself locally, and requires the user to specify the node number, address and port number for removal, in which case we support the user to enter parameters for removal as follows + +Linux and MacOS: + +```Shell +> bash sbin/remove-AINode.sh -t /: +``` + +Windows: + +```Shell +> sbin\remove-AINode.bat -t /: +``` + +## Stop AINode + +If you need to stop a running AINode node, execute the appropriate shutdown script. 
+ +The commands on Linux and MacOS are as follows: + +``` Shell. +> bash sbin/stop-AINode.sh +``` + +The startup command on windows is as follows: + +```Shell +> sbin/stop-AINode.bat +``` + +At this point the exact state of the node is not available and the corresponding management and reasoning functions cannot be used. If you need to restart the node, just execute the startup script again. + +```Shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| +| 2| AINode|UNKNOWN| 127.0.0.1| 10790|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` + +## Script parameter details + +Two parameters are supported during AINode startup, and their specific roles are shown below: + +| **Name** | **Action Script** | Tag | **Description** | **Type** | **Default Value** | Input Method | +| ------------------- | ---------------- | ---- | ------------------------------------------------------------ | -------- | ---------------- | --------------------- | +| ain_interpreter_dir | start remove env | -i | The path to the interpreter of the virtual environment in which AINode is installed; absolute paths are required. | String | Read environment variables by default | Input on call + persistent modifications | +| ain_remove_target | remove stop | -t | AINode shutdown can specify the Node ID, address, and port number of the target AINode to be removed, in the format of `/:` | String | Null | Input on call | +| ain_force_reinstall | start remove env | -r | This script checks the version of the AINode installation, and if it does, it forces the installation of the whl package in lib if the version is not correct. | Bool | false | Input on call | +| ain_no_dependencies | start remove env | -n | Specifies whether to install dependencies when installing AINode, if so only the main AINode program will be installed without dependencies. | Bool | false | Input on call | + +Besides passing in the above parameters when executing the script as described above, it is also possible to modify some of the parameters persistently in the `AINode-env.sh` and `AINode-env.bat` scripts in the `conf` folder. + +`AINode-env.sh`: + +```Bash +# The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark +# ain_interpreter_dir= +``` + +`AINode-env.bat`: + +```Plain +@REM The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark +@REM set ain_interpreter_dir= +``` + +Uncomment the corresponding line after writing the parameter value and save it to take effect the next time you execute the script. + +## AINode configuration items + +AINode supports modifying some necessary parameters. 
The following parameters can be found in the `conf/iotdb-AINode.properties` file and modified for persistence: + +| **Name** | **Description** | **Type** | **Default Value** | **Modified Mode of Effect** | +| --------------------------- | ------------------------------------------------------------ | -------- | ------------------ | ---------------------------- | +| ain_seed_config_node | ConfigNode address registered at AINode startup | String | 10710 | Only allow to modify before the first startup | +| ain_inference_rpc_address | Addresses where AINode provides services and communications | String | 127.0.0.1 | Effective after reboot | +| ain_inference_rpc_port | AINode provides services and communication ports | String | 10810 | Effective after reboot | +| ain_system_dir | AINode metadata storage path, the starting directory of the relative path is related to the operating system, it is recommended to use the absolute path. | String | data/AINode/system | Effective after reboot | +| ain_models_dir | AINode stores the path to the model file. The starting directory of the relative path is related to the operating system, and an absolute path is recommended. | String | data/AINode/models | Effective after reboot | +| ain_logs_dir | The path where AINode stores the logs. The starting directory of the relative path is related to the operating system, and it is recommended to use the absolute path. | String | logs/AINode | Effective after reboot | + +## Frequently Asked Questions + +1. **Not found venv module error when starting AINode** + +When starting AINode using the default method, a python virtual environment is created in the installation package directory and dependencies are installed, thus requiring the installation of the venv module. Generally speaking, python 3.8 and above will come with venv, but for some systems that come with python environment may not fulfill this requirement. There are two solutions when this error occurs (either one or the other): + +- Install venv module locally, take ubuntu as an example, you can run the following command to install the venv module that comes with python. Or install a version of python that comes with venv from the python website. + +```SQL +apt-get install python3.8-venv +``` + +- Specify the path to an existing python interpreter as the AINode runtime environment via -i when running the startup script, so that you no longer need to create a new virtual environment. + +2. **Compiling the python environment in CentOS7** + +The new environment in centos7 (comes with python3.6) does not meet the requirements to start mlnode, you need to compile python3.8+ by yourself (python is not provided as a binary package in centos7) + +- Install OpenSSL + +> Currently Python versions 3.6 to 3.9 are compatible with OpenSSL 1.0.2, 1.1.0, and 1.1.1. + +Python requires that we have OpenSSL installed on our system, which can be found at https://stackoverflow.com/questions/56552390/how-to-fix-ssl-module-in-python-is-not-available-in-centos + +- Installation and compilation of python + +Download the installation package from the official website and extract it using the following specifications + +```SQL +wget https://www.python.org/ftp/python/3.8.1/Python-3.8.1.tgz +tar -zxvf Python-3.8.1.tgz +``` + +Compile and install the corresponding python packages. + +```SQL +./configure prefix=/usr/local/python3 -with-openssl=/usr/local/openssl +make && make install +``` + +1. 
**Windows compilation problem like "error: Microsoft Visual** **C++** **14.0 or greater is required..." compilation problem** on windows. + +The corresponding error is usually caused by an insufficient version of c++ or setuptools, you can find the appropriate solution at https://stackoverflow.com/questions/44951456/pip-error-microsoft-visual-c-14-0-is-required +you can find a suitable solution there. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Administration-Management/Administration.md b/src/UserGuide/V2.0.1/Tree/stage/Administration-Management/Administration.md new file mode 100644 index 00000000..44219a5e --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Administration-Management/Administration.md @@ -0,0 +1,541 @@ + + +# Administration Management + +IoTDB provides users with account privilege management operations, so as to ensure data security. + +We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../Reference/SQL-Reference.md). +At the same time, in the JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute privilege management statements in a single or batch mode. + +## Basic Concepts + +### User + +The user is the legal user of the database. A user corresponds to a unique username and has a password as a means of authentication. Before using a database, a person must first provide a legitimate username and password to make himself/herself a user. + +### Privilege + +The database provides a variety of operations, and not all users can perform all operations. If a user can perform an operation, the user is said to have the privilege to perform the operation. privileges are divided into data management privilege (such as adding, deleting and modifying data) and authority management privilege (such as creation and deletion of users and roles, granting and revoking of privileges, etc.). Data management privilege often needs a path to limit its effective range. It is flexible that using [path pattern](../Basic-Concept/Data-Model-and-Terminology.md) to manage privileges. + +### Role + +A role is a set of privileges and has a unique role name as an identifier. A user usually corresponds to a real identity (such as a traffic dispatcher), while a real identity may correspond to multiple users. These users with the same real identity tend to have the same privileges. Roles are abstractions that can unify the management of such privileges. + +### Default User + +There is a default user in IoTDB after the initial installation: root, and the default password is root. This user is an administrator user, who cannot be deleted and has all the privileges. Neither can new privileges be granted to the root user nor can privileges owned by the root user be deleted. + +## Privilege Management Operation Examples + +According to the [sample data](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt), the sample data of IoTDB might belong to different power generation groups such as ln, sgcc, etc. Different power generation groups do not want others to obtain their own database data, so we need to have data privilege isolated at the group layer. + +### Create User + +We use `CREATE USER ` to create users. For example, we can use root user who has all privileges to create two users for ln and sgcc groups, named ln\_write\_user and sgcc\_write\_user, with both passwords being write\_pwd. 
It is recommended to wrap the username in backtick(`). The SQL statement is: + +``` +CREATE USER `ln_write_user` 'write_pwd' +CREATE USER `sgcc_write_user` 'write_pwd' +``` +Then use the following SQL statement to show the user: + +``` +LIST USER +``` +As can be seen from the result shown below, the two users have been created: + +``` +IoTDB> CREATE USER `ln_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> LIST USER ++---------------+ +| user| ++---------------+ +| ln_write_user| +| root| +|sgcc_write_user| ++---------------+ +Total line number = 3 +It costs 0.157s +``` + +### Grant User Privilege + +At this point, although two users have been created, they do not have any privileges, so they can not operate on the database. For example, we use ln_write_user to write data in the database, the SQL statement is: + +``` +INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +``` +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. +``` + +Now, we use root user to grant the two users write privileges to the corresponding databases. + +We use `GRANT USER PRIVILEGES ON ` to grant user privileges(ps: grant create user does not need path). For example: + +``` +GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +GRANT USER `ln_write_user` PRIVILEGES CREATE_USER +``` +The execution result is as follows: + +``` +IoTDB> GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +Msg: The statement is executed successfully. +IoTDB> GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +Msg: The statement is executed successfully. +IoTDB> GRANT USER `ln_write_user` PRIVILEGES CREATE_USER +Msg: The statement is executed successfully. +``` + +Next, use ln_write_user to try to write data again. +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: The statement is executed successfully. +``` + +### Revoker User Privilege + +After granting user privileges, we could use `REVOKE USER PRIVILEGES ON ` to revoke the granted user privileges(ps: revoke create user does not need path). For example, use root user to revoke the privilege of ln_write_user and sgcc_write_user: + +``` +REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER +``` + +The execution result is as follows: + +``` +REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +Msg: The statement is executed successfully. +REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +Msg: The statement is executed successfully. +REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER +Msg: The statement is executed successfully. +``` + +After revoking, ln_write_user has no permission to writing data to root.ln.** +``` +INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. 
+``` + +### SQL Statements + +Here are all related SQL statements: + +* Create User + +``` +CREATE USER ; +Eg: IoTDB > CREATE USER `thulab` 'pwd'; +``` + +* Delete User + +``` +DROP USER ; +Eg: IoTDB > DROP USER `xiaoming`; +``` + +* Create Role + +``` +CREATE ROLE ; +Eg: IoTDB > CREATE ROLE `admin`; +``` + +* Delete Role + +``` +DROP ROLE ; +Eg: IoTDB > DROP ROLE `admin`; +``` + +* Grant User Privileges + +``` +GRANT USER PRIVILEGES ON ; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES on root.ln.**, root.sgcc.**; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES CREATE_ROLE; +``` + +- Grant User All Privileges + +``` +GRANT USER PRIVILEGES ALL; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES ALL; +``` + +* Grant Role Privileges + +``` +GRANT ROLE PRIVILEGES ON ; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES ON root.sgcc.**, root.ln.**; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES CREATE_ROLE; +``` + +- Grant Role All Privileges + +``` +GRANT ROLE PRIVILEGES ALL ON ; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES ALL; +``` + +* Grant User Role + +``` +GRANT TO ; +Eg: IoTDB > GRANT `temprole` TO tempuser; +``` + +* Revoke User Privileges + +``` +REVOKE USER PRIVILEGES ON ; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES DELETE_TIMESERIES on root.ln.**; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES CREATE_ROLE; +``` + +* Revoke User All Privileges + +``` +REVOKE USER PRIVILEGES ALL; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES ALL; +``` + +* Revoke Role Privileges + +``` +REVOKE ROLE PRIVILEGES ON ; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES DELETE_TIMESERIES ON root.ln.**; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES CREATE_ROLE; +``` + +* Revoke All Role Privileges + +``` +REVOKE ROLE PRIVILEGES ALL; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES ALL; +``` + +* Revoke Role From User + +``` +REVOKE FROM ; +Eg: IoTDB > REVOKE `temprole` FROM tempuser; +``` + +* List Users + +``` +LIST USER +Eg: IoTDB > LIST USER +``` + +* List User of Specific Role + +``` +LIST USER OF ROLE ; +Eg: IoTDB > LIST USER OF ROLE `roleuser`; +``` + +* List Roles + +``` +LIST ROLE +Eg: IoTDB > LIST ROLE +``` + +* List Roles of Specific User + +``` +LIST ROLE OF USER ; +Eg: IoTDB > LIST ROLE OF USER `tempuser`; +``` + +* List All Privileges of Users + +``` +LIST PRIVILEGES USER ; +Eg: IoTDB > LIST PRIVILEGES USER `tempuser`; +``` + +* List Related Privileges of Users(On Specific Paths) + +``` +LIST PRIVILEGES USER ON ; +Eg: IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.**, root.ln.wf01.**; ++--------+-----------------------------------+ +| role| privilege| ++--------+-----------------------------------+ +| | root.ln.** : ALTER_TIMESERIES| +|temprole|root.ln.wf01.** : CREATE_TIMESERIES| ++--------+-----------------------------------+ +Total line number = 2 +It costs 0.005s +IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.wf01.wt01.**; ++--------+-----------------------------------+ +| role| privilege| ++--------+-----------------------------------+ +| | root.ln.** : ALTER_TIMESERIES| +|temprole|root.ln.wf01.** : CREATE_TIMESERIES| ++--------+-----------------------------------+ +Total line number = 2 +It costs 0.005s +``` + +* List All Privileges of Roles + +``` +LIST PRIVILEGES ROLE +Eg: IoTDB > LIST PRIVILEGES ROLE `actor`; +``` + +* List Related Privileges of Roles(On Specific Paths) + +``` +LIST PRIVILEGES ROLE ON ; +Eg: IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.**, root.ln.wf01.wt01.**; ++-----------------------------------+ 
+| privilege| ++-----------------------------------+ +|root.ln.wf01.** : CREATE_TIMESERIES| ++-----------------------------------+ +Total line number = 1 +It costs 0.005s +IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.wf01.wt01.**; ++-----------------------------------+ +| privilege| ++-----------------------------------+ +|root.ln.wf01.** : CREATE_TIMESERIES| ++-----------------------------------+ +Total line number = 1 +It costs 0.005s +``` + +* Alter Password + +``` +ALTER USER SET PASSWORD ; +Eg: IoTDB > ALTER USER `tempuser` SET PASSWORD 'newpwd'; +``` + + +## Other Instructions + +### The Relationship among Users, Privileges and Roles + +A Role is a set of privileges, and privileges and roles are both attributes of users. That is, a role can have several privileges and a user can have several roles and privileges (called the user's own privileges). + +At present, there is no conflicting privilege in IoTDB, so the real privileges of a user is the union of the user's own privileges and the privileges of the user's roles. That is to say, to determine whether a user can perform an operation, it depends on whether one of the user's own privileges or the privileges of the user's roles permits the operation. The user's own privileges and privileges of the user's roles may overlap, but it does not matter. + +It should be noted that if users have a privilege (corresponding to operation A) themselves and their roles contain the same privilege, then revoking the privilege from the users themselves alone can not prohibit the users from performing operation A, since it is necessary to revoke the privilege from the role, or revoke the role from the user. Similarly, revoking the privilege from the users's roles alone can not prohibit the users from performing operation A. + +At the same time, changes to roles are immediately reflected on all users who own the roles. For example, adding certain privileges to roles will immediately give all users who own the roles corresponding privileges, and deleting certain privileges will also deprive the corresponding users of the privileges (unless the users themselves have the privileges). + +### List of Privileges Included in the System + +**List of privileges Included in the System** + +| privilege Name | Interpretation | Example | +|:--------------------------|:-------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| CREATE\_DATABASE | create database; set/unset database ttl; path dependent | Eg1: `CREATE DATABASE root.ln;`
Eg2:`set ttl to root.ln 3600000;`
Eg3:`unset ttl to root.ln;` | +| DELETE\_DATABASE | delete databases; path dependent | Eg: `delete database root.ln;` | +| CREATE\_TIMESERIES | create timeseries; path dependent | Eg1: create timeseries
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: create aligned timeseries
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | +| INSERT\_TIMESERIES | insert data; path dependent | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | +| ALTER\_TIMESERIES | alter timeseries; path dependent | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | +| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](../Query-Data/Overview.md)(The query statements under this section all use this permission)
Eg8: CSV format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | +| DELETE\_TIMESERIES | delete data or timeseries; path dependent | Eg1: delete timeseries
`delete timeseries root.ln.wf01.wt01.status`
Eg2: delete data
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: use drop semantic
`drop timeseries root.ln.wf01.wt01.status` | +| CREATE\_USER | create users; path independent | Eg: `create user thulab 'passwd';` | +| DELETE\_USER | delete users; path independent | Eg: `drop user xiaoming;` | +| MODIFY\_PASSWORD | modify passwords for all users; path independent; (Those who do not have this privilege can still change their own passwords.) | Eg: `alter user tempuser SET PASSWORD 'newpwd';` | +| LIST\_USER | list all users; list all users of a specific role; list a user's related privileges on specific paths; path independent | Eg1: `list user;`
Eg2: `list user of role 'write_role';`
Eg3: `list privileges user admin;`
Eg4: `list privileges user 'admin' on root.sgcc.**;` | +| GRANT\_USER\_PRIVILEGE | grant user privileges; path independent | Eg: `grant user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | +| REVOKE\_USER\_PRIVILEGE | revoke user privileges; path independent | Eg: `revoke user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | +| GRANT\_USER\_ROLE | grant user roles; path independent | Eg: `grant temprole to tempuser;` | +| REVOKE\_USER\_ROLE | revoke user roles; path independent | Eg: `revoke temprole from tempuser;` | +| CREATE\_ROLE | create roles; path independent | Eg: `create role admin;` | +| DELETE\_ROLE | delete roles; path independent | Eg: `drop role admin;` | +| LIST\_ROLE | list all roles; list all roles of a specific user; list a role's related privileges on specific paths; path independent | Eg1: `list role`
Eg2: `list role of user 'actor';`
Eg3: `list privileges role write_role;`
Eg4: `list privileges role write_role ON root.sgcc;` | +| GRANT\_ROLE\_PRIVILEGE | grant role privileges; path independent | Eg: `grant role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | +| REVOKE\_ROLE\_PRIVILEGE | revoke role privileges; path independent | Eg: `revoke role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | +| CREATE_FUNCTION | register UDFs; path independent | Eg: `create function example AS 'org.apache.iotdb.udf.UDTFExample';` | +| DROP_FUNCTION | deregister UDFs; path independent | Eg: `drop function example` | +| CREATE_TRIGGER | create triggers; path dependent | Eg1: `CREATE TRIGGER <trigger-name> BEFORE INSERT ON <full-path> AS <class-name>`
Eg2: `CREATE TRIGGER <trigger-name> AFTER INSERT ON <full-path> AS <class-name>` | +| DROP_TRIGGER | drop triggers; path dependent | Eg: `drop trigger 'alert-listener-sg1d1s1'` | +| CREATE_CONTINUOUS_QUERY | create continuous queries; path independent | Eg: `CREATE CONTINUOUS QUERY cq1 RESAMPLE RANGE 40s BEGIN <select-into-statement> END` | +| DROP_CONTINUOUS_QUERY | drop continuous queries; path independent | Eg1: `DROP CONTINUOUS QUERY cq3`
Eg2: `DROP CQ cq3` | +| SHOW_CONTINUOUS_QUERIES | show continuous queries; path independent | Eg1: `SHOW CONTINUOUS QUERIES`
Eg2: `SHOW cqs` | +| UPDATE_TEMPLATE | create and drop schema template; path independent | Eg1: `create schema template t1(s1 int32)`
Eg2: `drop schema template t1` | +| READ_TEMPLATE | show schema templates and show nodes in schema template; path independent | Eg1: `show schema templates`
Eg2: `show nodes in template t1` | +| APPLY_TEMPLATE | set, unset and activate schema template; path dependent | Eg1: `set schema template t1 to root.sg.d`
Eg2: `unset schema template t1 from root.sg.d`
Eg3: `create timeseries of schema template on root.sg.d`
Eg4: `delete timeseries of schema template on root.sg.d` | +| READ_TEMPLATE_APPLICATION | show paths set and using schema template; path independent | Eg1: `show paths set schema template t1`
Eg2: `show paths using schema template t1` | + +Note that path dependent privileges can only be granted or revoked on root.**; + +Note that the following SQL statements need to be granted multiple permissions before they can be used: + +- Import data: Need to assign `READ_TIMESERIES`,`INSERT_TIMESERIES` two permissions.。 + +``` +Eg: IoTDB > ./import-csv.bat -h 127.0.0.1 -p 6667 -u renyuhua -pw root -f dump0.csv +``` + +- Query Write-back (SELECT INTO) +- - `READ_TIMESERIES` permission of source sequence in all `select` clauses is required + - `INSERT_TIMESERIES` permission of target sequence in all `into` clauses is required + +``` +Eg: IoTDB > select s1, s1 into t1, t2 from root.sg.d1 limit 5 offset 1000 +``` + +### Username Restrictions + +IoTDB specifies that the character length of a username should not be less than 4, and the username cannot contain spaces. + +### Password Restrictions + +IoTDB specifies that the character length of a password should have no less than 4 character length, and no spaces. The password is encrypted with MD5. + +### Role Name Restrictions + +IoTDB specifies that the character length of a role name should have no less than 4 character length, and no spaces. + +### Path pattern in Administration Management + +A path pattern's result set contains all the elements of its sub pattern's +result set. For example, `root.sg.d.*` is a sub pattern of +`root.sg.*.*`, while `root.sg.**` is not a sub pattern of +`root.sg.*.*`. When a user is granted privilege on a pattern, the pattern used in his DDL or DML must be a sub pattern of the privilege pattern, which guarantees that the user won't access the timeseries exceed his privilege scope. + +### Permission cache + +In distributed related permission operations, when changing permissions other than creating users and roles, all the cache information of `dataNode` related to the user (role) will be cleared first. If any `dataNode` cache information is clear and fails, the permission change task will fail. + +### Operations restricted by non root users + +At present, the following SQL statements supported by iotdb can only be operated by the `root` user, and no corresponding permission can be given to the new user. 
+ +#### TsFile Management + +- Load TsFiles + +``` +Eg: IoTDB > load '/Users/Desktop/data/1575028885956-101-0.tsfile' +``` + +- remove a tsfile + +``` +Eg: IoTDB > remove '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' +``` + +- unload a tsfile and move it to a target directory + +``` +Eg: IoTDB > unload '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' '/data/data/tmp' +``` + +#### Delete Time Partition (experimental) + +- Delete Time Partition (experimental) + +``` +Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 +``` + +#### Continuous Query,CQ + +- Continuous Query,CQ + +``` +Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END +``` + +#### Maintenance Command + +- FLUSH + +``` +Eg: IoTDB > flush +``` + +- MERGE + +``` +Eg: IoTDB > MERGE +Eg: IoTDB > FULL MERGE +``` + +- CLEAR CACHE + +```sql +Eg: IoTDB > CLEAR CACHE +``` + +- START REPAIR DATA + +```sql +Eg: IoTDB > START REPAIR DATA +``` + +- STOP REPAIR DATA + +```sql +Eg: IoTDB > STOP REPAIR DATA +``` + +- SET SYSTEM TO READONLY / WRITABLE + +``` +Eg: IoTDB > SET SYSTEM TO READONLY / WRITABLE +``` + +- Query abort + +``` +Eg: IoTDB > KILL QUERY 1 +``` + +#### Watermark Tool + +- Watermark new users + +``` +Eg: IoTDB > grant watermark_embedding to Alice +``` + +- Watermark Detection + +``` +Eg: IoTDB > revoke watermark_embedding from Alice +``` + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Architecture.md b/src/UserGuide/V2.0.1/Tree/stage/Architecture.md new file mode 100644 index 00000000..e135da54 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Architecture.md @@ -0,0 +1,44 @@ + + +# System Architecture + +Besides IoTDB engine, we also developed several components to provide better IoT service. All components are referred to below as the IoTDB suite, and IoTDB refers specifically to the IoTDB engine. + +IoTDB suite can provide a series of functions in the real situation such as data collection, data writing, data storage, data query, data visualization and data analysis. Figure 1.1 shows the overall application architecture brought by all the components of the IoTDB suite. + + + +As shown in Figure 1.1, users can use JDBC to import timeseries data collected by sensor on the device to local/remote IoTDB. These timeseries data may be system state data (such as server load and CPU memory, etc.), message queue data, timeseries data from applications, or other timeseries data in the database. Users can also write the data directly to the TsFile (local or on HDFS). + +TsFile could be written to the HDFS, thereby implementing data processing tasks such as abnormality detection and machine learning on the Hadoop or Spark data processing platform. + +For the data written to HDFS or local TsFile, users can use TsFile-Hadoop-Connector or TsFile-Spark-Connector to allow Hadoop or Spark to process data. + +The results of the analysis can be write back to TsFile in the same way. + +Also, IoTDB and TsFile provide client tools to meet the various needs of users in writing and viewing data in SQL form, script form and graphical form. + +IoTDB offers two deployment modes: standalone and cluster. In cluster deployment mode, IoTDB supports automatic failover, ensuring that the system can quickly switch to standby nodes in the event of a node failure. The switch time can be achieved in seconds, thereby minimizing system downtime and ensuring no data loss after the switch. 
When the faulty node returns to normal, the system will automatically reintegrate it into the cluster, ensuring the cluster's high availability and scalability. + +IoTDB also supports a read-write separation deployment mode, which can allocate read and write operations to different nodes, achieving load balancing and enhancing the system's concurrent processing capability. + +Through these features, IoTDB can avoid single-point performance bottlenecks and single-point failures (SPOF), offering a high-availability and reliable data storage and management solution. diff --git a/src/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment.md b/src/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment.md new file mode 100644 index 00000000..3a183d45 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment.md @@ -0,0 +1,615 @@ + + +# Cluster Deployment +## Cluster Deployment +This article uses a local environment as an example to +illustrate how to start, expand, and shrink an IoTDB Cluster. + +**Notice: This document is a tutorial for deploying in a pseudo-cluster environment using different local ports, and is for exercise only. In real deployment scenarios, you only need to configure the IPV4 address or domain name of the server, and do not need to change the Node ports.** + +### 1. Prepare the Start Environment + +Unzip the apache-iotdb-1.0.0-all-bin.zip file to cluster0 folder. + +### 2. Start a Minimum Cluster + +Start the Cluster version with one ConfigNode and one DataNode(1C1D), and +the default number of replicas is one. + +``` +./cluster0/sbin/start-confignode.sh +./cluster0/sbin/start-datanode.sh +``` + +### 3. Verify the Minimum Cluster + ++ If everything goes well, the minimum cluster will start successfully. Then, we can start the Cli for verification. + +``` +./cluster0/sbin/start-cli.sh +``` + ++ Execute the `show cluster details` + command on the Cli. The result is shown below: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+--------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort |SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+--------+-------------------+-----------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | | +| 1| DataNode|Running| 127.0.0.1| 10730| | 127.0.0.1| 6667| 10740| 10750| 10760| ++------+----------+-------+---------------+------------+-------------------+----------+-------+--------+-------------------+-----------------+ +Total line number = 2 +It costs 0.242s +``` + +### 4. Prepare the Expanding Environment + +Unzip the apache-iotdb-1.0.0-all-bin.zip file to cluster1 and cluster2 folder. + +### 5. 
Modify the Node Configuration file + +For folder cluster1: + ++ Modify ConfigNode configurations: + +| **configuration item** | **value** | +| ------------------------------ | --------------- | +| cn\_internal\_address | 127.0.0.1 | +| cn\_internal\_port | 10711 | +| cn\_consensus\_port | 10721 | +| cn\_target\_config\_node\_list | 127.0.0.1:10710 | + ++ Modify DataNode configurations: + +| **configuration item** | **value** | +| ----------------------------------- | --------------- | +| dn\_rpc\_address | 127.0.0.1 | +| dn\_rpc\_port | 6668 | +| dn\_internal\_address | 127.0.0.1 | +| dn\_internal\_port | 10731 | +| dn\_mpp\_data\_exchange\_port | 10741 | +| dn\_schema\_region\_consensus\_port | 10751 | +| dn\_data\_region\_consensus\_port | 10761 | +| dn\_target\_config\_node\_list | 127.0.0.1:10710 | + +For folder cluster2: + ++ Modify ConfigNode configurations: + +| **configuration item** | **value** | +| ------------------------------ | --------------- | +| cn\_internal\_address | 127.0.0.1 | +| cn\_internal\_port | 10712 | +| cn\_consensus\_port | 10722 | +| cn\_target\_config\_node\_list | 127.0.0.1:10710 | + ++ Modify DataNode configurations: + +| **configuration item** | **value** | +| ----------------------------------- | --------------- | +| dn\_rpc\_address | 127.0.0.1 | +| dn\_rpc\_port | 6669 | +| dn\_internal\_address | 127.0.0.1 | +| dn\_internal\_port | 10732 | +| dn\_mpp\_data\_exchange\_port | 10742 | +| dn\_schema\_region\_consensus\_port | 10752 | +| dn\_data\_region\_consensus\_port | 10762 | +| dn\_target\_config\_node\_list | 127.0.0.1:10710 | + +### 6. Expanding the Cluster + +Expanding the Cluster to three ConfigNode and three DataNode(3C3D). +The following commands can be executed in arbitrary order. + +``` +./cluster1/sbin/start-confignode.sh +./cluster1/sbin/start-datanode.sh +./cluster2/sbin/start-confignode.sh +./cluster2/sbin/start-datanode.sh +``` + +### 7. Verify Cluster expansion + +Execute the `show cluster details` command, then the result is shown below: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | | +| 2|ConfigNode|Running| 127.0.0.1| 10711| 10721| | | | | | +| 3|ConfigNode|Running| 127.0.0.1| 10712| 10722| | | | | | +| 1| DataNode|Running| 127.0.0.1| 10730| | 127.0.0.1| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| 127.0.0.1| 10731| | 127.0.0.1| 6668| 10741| 10751| 10761| +| 5| DataNode|Running| 127.0.0.1| 10732| | 127.0.0.1| 6669| 10742| 10752| 10762| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.012s +``` + +### 8. Shrinking the Cluster + ++ Remove a ConfigNode: + +``` +# Removing by ip:port +./cluster0/sbin/remove-confignode.sh 127.0.0.1:10711 + +# Removing by Node index +./cluster0/sbin/remove-confignode.sh 2 +``` + ++ Remove a DataNode: + +``` +# Removing by ip:port +./cluster0/sbin/remove-datanode.sh 127.0.0.1:6668 + +# Removing by Node index +./cluster0/sbin/remove-confignode.sh 4 +``` + +### 9. 
Verify Cluster shrinkage + +Execute the `show cluster details` command, then the result is shown below: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | | +| 3|ConfigNode|Running| 127.0.0.1| 10712| 10722| | | | | | +| 1| DataNode|Running| 127.0.0.1| 10730| | 127.0.0.1| 6667| 10740| 10750| 10760| +| 5| DataNode|Running| 127.0.0.1| 10732| | 127.0.0.1| 6669| 10742| 10752| 10762| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +Total line number = 4 +It costs 0.005s +``` + +## Manual Deployment +### Prerequisites + +1. JDK>=1.8. +2. Max open file 65535. +3. Disable the swap memory. +4. Ensure that data/confignode directory has been cleared when starting ConfigNode for the first time, + and data/datanode directory has been cleared when starting DataNode for the first time +5. Turn off the firewall of the server if the entire cluster is in a trusted environment. +6. By default, IoTDB Cluster will use ports 10710, 10720 for the ConfigNode and + 6667, 10730, 10740, 10750 and 10760 for the DataNode. + Please make sure those ports are not occupied, or you will modify the ports in configuration files. + +### Get the Installation Package + +You can either download the binary release files (see Chap 3.1) or compile with source code (see Chap 3.2). + +#### Download the binary distribution + +1. Open our website [Download Page](https://iotdb.apache.org/Download/). +2. Download the binary distribution. +3. Decompress to get the apache-iotdb-1.3.x-all-bin directory. + +#### Compile with source code + +##### Download the source code + +**Git** + +``` +git clone https://github.com/apache/iotdb.git +git checkout v1.3.x +``` + +**Website** + +1. Open our website [Download Page](https://iotdb.apache.org/Download/). +2. Download the source code. +3. Decompress to get the apache-iotdb-1.3.x directory. + +##### Compile source code + +Under the source root folder: + +``` +mvn clean package -pl distribution -am -DskipTests +``` + +Then you will get the binary distribution under +**distribution/target/apache-iotdb-1.3.x-SNAPSHOT-all-bin/apache-iotdb-1.3.x-SNAPSHOT-all-bin**. + +### Binary Distribution Content + +| **Folder** | **Description** | +| ---------- | ------------------------------------------------------------ | +| conf | Configuration files folder, contains configuration files of ConfigNode, DataNode, JMX and logback | +| data | Data files folder, contains data files of ConfigNode and DataNode | +| lib | Jar files folder | +| licenses | Licenses files folder | +| logs | Logs files folder, contains logs files of ConfigNode and DataNode | +| sbin | Shell files folder, contains start/stop/remove shell of ConfigNode and DataNode, cli shell | +| tools | System tools | + +### Cluster Installation and Configuration + +#### Cluster Installation + +`apache-iotdb-1.0.0-SNAPSHOT-all-bin` contains both the ConfigNode and the DataNode. +Please deploy the files to all servers of your target cluster. 
+A best practice is deploying the files into the same directory in all servers. + +If you want to try the cluster mode on one server, please read +[Cluster Quick Start](../QuickStart/ClusterQuickStart.md). + +#### Cluster Configuration + +We need to modify the configurations on each server. +Therefore, login each server and switch the working directory to `apache-iotdb-1.0.0-SNAPSHOT-all-bin`. +The configuration files are stored in the `./conf` directory. + +For all ConfigNode servers, we need to modify the common configuration (see Chap 5.2.1) +and ConfigNode configuration (see Chap 5.2.2). + +For all DataNode servers, we need to modify the common configuration (see Chap 5.2.1) +and DataNode configuration (see Chap 5.2.3). + +##### Common configuration + +Open the common configuration file ./conf/iotdb-system.properties, +and set the following parameters base on the +[Deployment Recommendation](./Deployment-Recommendation.md): + +| **Configuration** | **Description** | **Default** | +| ------------------------------------------ | ------------------------------------------------------------ | ----------------------------------------------- | +| cluster\_name | Cluster name for which the Node to join in | defaultCluster | +| config\_node\_consensus\_protocol\_class | Consensus protocol of ConfigNode | org.apache.iotdb.consensus.ratis.RatisConsensus | +| schema\_replication\_factor | Schema replication factor, no more than DataNode number | 1 | +| schema\_region\_consensus\_protocol\_class | Consensus protocol of schema replicas | org.apache.iotdb.consensus.ratis.RatisConsensus | +| data\_replication\_factor | Data replication factor, no more than DataNode number | 1 | +| data\_region\_consensus\_protocol\_class | Consensus protocol of data replicas. Note that RatisConsensus currently does not support multiple data directories | org.apache.iotdb.consensus.iot.IoTConsensus | + +**Notice: The preceding configuration parameters cannot be changed after the cluster is started. Ensure that the common configurations of all Nodes are the same. Otherwise, the Nodes cannot be started.** + +##### ConfigNode configuration + +Open the ConfigNode configuration file ./conf/iotdb-system.properties, +and set the following parameters based on the IP address and available port of the server or VM: + +| **Configuration** | **Description** | **Default** | **Usage** | +| ------------------------------ | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | +| cn\_internal\_address | Internal rpc service address of ConfigNode | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| cn\_internal\_port | Internal rpc service port of ConfigNode | 10710 | Set to any unoccupied port | +| cn\_consensus\_port | ConfigNode replication consensus protocol communication port | 10720 | Set to any unoccupied port | +| cn\_target\_config\_node\_list | ConfigNode address to which the node is connected when it is registered to the cluster. Note that Only one ConfigNode can be configured. | 127.0.0.1:10710 | For Seed-ConfigNode, set to its own cn\_internal\_address:cn\_internal\_port; For other ConfigNodes, set to other one running ConfigNode's cn\_internal\_address:cn\_internal\_port | + +**Notice: The preceding configuration parameters cannot be changed after the node is started. Ensure that all ports are not occupied. 
Otherwise, the Node cannot be started.** + +##### DataNode configuration + +Open the DataNode configuration file ./conf/iotdb-system.properties, +and set the following parameters based on the IP address and available port of the server or VM: + +| **Configuration** | **Description** | **Default** | **Usage** | +| ----------------------------------- | ------------------------------------------------ | --------------- | ------------------------------------------------------------ | +| dn\_rpc\_address | Client RPC Service address | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| dn\_rpc\_port | Client RPC Service port | 6667 | Set to any unoccupied port | +| dn\_internal\_address | Control flow address of DataNode inside cluster | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| dn\_internal\_port | Control flow port of DataNode inside cluster | 10730 | Set to any unoccupied port | +| dn\_mpp\_data\_exchange\_port | Data flow port of DataNode inside cluster | 10740 | Set to any unoccupied port | +| dn\_data\_region\_consensus\_port | Data replicas communication port for consensus | 10750 | Set to any unoccupied port | +| dn\_schema\_region\_consensus\_port | Schema replicas communication port for consensus | 10760 | Set to any unoccupied port | +| dn\_target\_config\_node\_list | Running ConfigNode of the Cluster | 127.0.0.1:10710 | Set to any running ConfigNode's cn\_internal\_address:cn\_internal\_port. You can set multiple values, separate them with commas(",") | + +**Notice: The preceding configuration parameters cannot be changed after the node is started. Ensure that all ports are not occupied. Otherwise, the Node cannot be started.** + +### Cluster Operation + +#### Starting the cluster + +This section describes how to start a cluster that includes several ConfigNodes and DataNodes. +The cluster can provide services only by starting at least one ConfigNode +and no less than the number of data/schema_replication_factor DataNodes. + +The total process are three steps: + +* Start the Seed-ConfigNode +* Add ConfigNode (Optional) +* Add DataNode + +##### Start the Seed-ConfigNode + +**The first Node started in the cluster must be ConfigNode. The first started ConfigNode must follow the tutorial in this section.** + +The first ConfigNode to start is the Seed-ConfigNode, which marks the creation of the new cluster. +Before start the Seed-ConfigNode, please open the common configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +| ------------------------------------------ | ----------------------------------------------- | +| cluster\_name | Is set to the expected name | +| config\_node\_consensus\_protocol\_class | Is set to the expected consensus protocol | +| schema\_replication\_factor | Is set to the expected schema replication count | +| schema\_region\_consensus\_protocol\_class | Is set to the expected consensus protocol | +| data\_replication\_factor | Is set to the expected data replication count | +| data\_region\_consensus\_protocol\_class | Is set to the expected consensus protocol | + +**Notice:** Please set these parameters carefully based on the [Deployment Recommendation](./Deployment-Recommendation.md). +These parameters are not modifiable after the Node first startup. 
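For reference, the common section of ./conf/iotdb-system.properties on the Seed-ConfigNode of a 3C3D cluster might look like the following sketch; the parameter names are the ones listed above, while the replication factors shown are illustrative assumptions that should be chosen according to the Deployment Recommendation:

```
# Illustrative common section for a 3C3D cluster (example values, not mandatory)
cluster_name=defaultCluster
config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
schema_replication_factor=3
schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus
data_replication_factor=2
data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus
```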
+ +Then open its configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +| ------------------------------ | ------------------------------------------------------------ | +| cn\_internal\_address | Is set to the IPV4 address or domain name of the server | +| cn\_internal\_port | The port isn't occupied | +| cn\_consensus\_port | The port isn't occupied | +| cn\_target\_config\_node\_list | Is set to its own internal communication address, which is cn\_internal\_address:cn\_internal\_port | + +After checking, you can run the startup script on the server: + +``` +# Linux foreground +bash ./sbin/start-confignode.sh + +# Linux background +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +For more details about other configuration parameters of ConfigNode, see the +[ConfigNode Configurations](../Reference/ConfigNode-Config-Manual.md). + +##### Add more ConfigNodes (Optional) + +**The ConfigNode who isn't the first one started must follow the tutorial in this section.** + +You can add more ConfigNodes to the cluster to ensure high availability of ConfigNodes. +A common configuration is to add extra two ConfigNodes to make the cluster has three ConfigNodes. + +Ensure that all configuration parameters in the ./conf/iotdb-common.properites are the same as those in the Seed-ConfigNode; +otherwise, it may fail to start or generate runtime errors. +Therefore, please check the following parameters in common configuration file: + +| **Configuration** | **Check** | +| ------------------------------------------ | -------------------------------------- | +| cluster\_name | Is consistent with the Seed-ConfigNode | +| config\_node\_consensus\_protocol\_class | Is consistent with the Seed-ConfigNode | +| schema\_replication\_factor | Is consistent with the Seed-ConfigNode | +| schema\_region\_consensus\_protocol\_class | Is consistent with the Seed-ConfigNode | +| data\_replication\_factor | Is consistent with the Seed-ConfigNode | +| data\_region\_consensus\_protocol\_class | Is consistent with the Seed-ConfigNode | + +Then, please open its configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +| ------------------------------ | ------------------------------------------------------------ | +| cn\_internal\_address | Is set to the IPV4 address or domain name of the server | +| cn\_internal\_port | The port isn't occupied | +| cn\_consensus\_port | The port isn't occupied | +| cn\_target\_config\_node\_list | Is set to the internal communication address of an other running ConfigNode. The internal communication address of the seed ConfigNode is recommended. | + +After checking, you can run the startup script on the server: + +``` +# Linux foreground +bash ./sbin/start-confignode.sh + +# Linux background +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +For more details about other configuration parameters of ConfigNode, see the +[ConfigNode Configurations](../Reference/ConfigNode-Config-Manual.md). + +##### Start DataNode + +**Before adding DataNodes, ensure that there exists at least one ConfigNode is running in the cluster.** + +You can add any number of DataNodes to the cluster. 
+Before adding a new DataNode, + +please open its common configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +| ----------------- | -------------------------------------- | +| cluster\_name | Is consistent with the Seed-ConfigNode | + +Then open its configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +| ----------------------------------- | ------------------------------------------------------------ | +| dn\_rpc\_address | Is set to the IPV4 address or domain name of the server | +| dn\_rpc\_port | The port isn't occupied | +| dn\_internal\_address | Is set to the IPV4 address or domain name of the server | +| dn\_internal\_port | The port isn't occupied | +| dn\_mpp\_data\_exchange\_port | The port isn't occupied | +| dn\_data\_region\_consensus\_port | The port isn't occupied | +| dn\_schema\_region\_consensus\_port | The port isn't occupied | +| dn\_target\_config\_node\_list | Is set to the internal communication address of other running ConfigNodes. The internal communication address of the seed ConfigNode is recommended. | + +After checking, you can run the startup script on the server: + +``` +# Linux foreground +bash ./sbin/start-datanode.sh + +# Linux background +nohup bash ./sbin/start-datanode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-datanode.bat +``` + +For more details about other configuration parameters of DataNode, see the +[DataNode Configurations](../Reference/DataNode-Config-Manual.md). + +**Notice: The cluster can provide services only if the number of its DataNodes is no less than the number of replicas(max{schema\_replication\_factor, data\_replication\_factor}).** + +#### Start Cli + +If the cluster is in local environment, you can directly run the Cli startup script in the ./sbin directory: + +``` +# Linux +./sbin/start-cli.sh + +# Windows +.\sbin\start-cli.bat +``` + +If you want to use the Cli to connect to a cluster in the production environment, +Please read the [Cli manual](../Tools-System/CLI.md). + +#### Verify Cluster + +Use a 3C3D(3 ConfigNodes and 3 DataNodes) as an example. +Assumed that the IP addresses of the 3 ConfigNodes are 192.168.1.10, 192.168.1.11 and 192.168.1.12, and the default ports 10710 and 10720 are used. +Assumed that the IP addresses of the 3 DataNodes are 192.168.1.20, 192.168.1.21 and 192.168.1.22, and the default ports 6667, 10730, 10740, 10750 and 10760 are used. 
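Under these assumptions, the address-related entries of ./conf/iotdb-system.properties on the first ConfigNode and the first DataNode would look roughly like the sketch below (hypothetical values derived from the example addresses above; the other nodes differ only in their own addresses):

```
# ConfigNode on 192.168.1.10 (Seed-ConfigNode), illustrative only
cn_internal_address=192.168.1.10
cn_internal_port=10710
cn_consensus_port=10720
cn_target_config_node_list=192.168.1.10:10710

# DataNode on 192.168.1.20, illustrative only
dn_rpc_address=192.168.1.20
dn_rpc_port=6667
dn_internal_address=192.168.1.20
dn_internal_port=10730
dn_target_config_node_list=192.168.1.10:10710
```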
+ +After starting the cluster successfully according to chapter 6.1, you can run the `show cluster details` command on the Cli, and you will see the following results: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort| RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 192.168.1.10| 10710| 10720| | | | | | +| 2|ConfigNode|Running| 192.168.1.11| 10710| 10720| | | | | | +| 3|ConfigNode|Running| 192.168.1.12| 10710| 10720| | | | | | +| 1| DataNode|Running| 192.168.1.20| 10730| |192.168.1.20| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| 192.168.1.21| 10730| |192.168.1.21| 6667| 10740| 10750| 10760| +| 5| DataNode|Running| 192.168.1.22| 10730| |192.168.1.22| 6667| 10740| 10750| 10760| ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.012s +``` + +If the status of all Nodes is **Running**, the cluster deployment is successful. +Otherwise, read the run logs of the Node that fails to start and +check the corresponding configuration parameters. + +#### Stop IoTDB + +This section describes how to manually shut down the ConfigNode or DataNode process of the IoTDB. + +##### Stop ConfigNode by script + +Run the stop ConfigNode script: + +``` +# Linux +./sbin/stop-confignode.sh + +# Windows +.\sbin\stop-confignode.bat +``` + +##### Stop DataNode by script + +Run the stop DataNode script: + +``` +# Linux +./sbin/stop-datanode.sh + +# Windows +.\sbin\stop-datanode.bat +``` + +##### Kill Node process + +Get the process number of the Node: + +``` +jps + +# or + +ps aux | grep iotdb +``` + +Kill the process: + +``` +kill -9 +``` + +**Notice Some ports require root access, in which case use sudo** + +#### Shrink the Cluster + +This section describes how to remove ConfigNode or DataNode from the cluster. + +##### Remove ConfigNode + +Before removing a ConfigNode, ensure that there is at least one active ConfigNode in the cluster after the removal. +Run the remove-confignode script on an active ConfigNode: + +``` +# Linux +# Remove the ConfigNode with confignode_id +./sbin/remove-confignode.sh + +# Remove the ConfigNode with address:port +./sbin/remove-confignode.sh : + + +# Windows +# Remove the ConfigNode with confignode_id +.\sbin\remove-confignode.bat + +# Remove the ConfigNode with address:port +.\sbin\remove-confignode.bat : +``` + +##### Remove DataNode + +Before removing a DataNode, ensure that the cluster has at least the number of data/schema replicas DataNodes. +Run the remove-datanode script on an active DataNode: + +``` +# Linux +# Remove the DataNode with datanode_id +./sbin/remove-datanode.sh + +# Remove the DataNode with rpc address:port +./sbin/remove-datanode.sh : + + +# Windows +# Remove the DataNode with datanode_id +.\sbin\remove-datanode.bat + +# Remove the DataNode with rpc address:port +.\sbin\remove-datanode.bat : +``` + +### FAQ + +See [FAQ](../FAQ/Frequently-asked-questions.md). 
\ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment_timecho.md b/src/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment_timecho.md new file mode 100644 index 00000000..596c30d8 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment_timecho.md @@ -0,0 +1,1109 @@ + + +# Cluster Deployment + +## Cluster deployment(Cluster management tool) + +The IoTDB cluster management tool is an easy-to-use operation and maintenance tool (enterprise version tool). +It is designed to solve the operation and maintenance problems of multiple nodes in the IoTDB distributed system. +It mainly includes cluster deployment, cluster start and stop, elastic expansion, configuration update, data export and other functions, thereby realizing one-click command issuance for complex database clusters, which greatly Reduce management difficulty. +This document will explain how to remotely deploy, configure, start and stop IoTDB cluster instances with cluster management tools. + +### Environment dependence + +This tool is a supporting tool for TimechoDB(Enterprise Edition based on IoTDB). You can contact your sales representative to obtain the tool download method. + +The machine where IoTDB is to be deployed needs to rely on jdk 8 and above, lsof, netstat, and unzip functions. If not, please install them yourself. You can refer to the installation commands required for the environment in the last section of the document. + +Tip: The IoTDB cluster management tool requires an account with root privileges + +### Deployment method + +#### Download and install + +This tool is a supporting tool for TimechoDB(Enterprise Edition based on IoTDB). You can contact your salesperson to obtain the tool download method. + +Note: Since the binary package only supports GLIBC2.17 and above, the minimum version is Centos7. + +* After entering the following commands in the iotdb-opskit directory: + +```bash +bash install-iotdbctl.sh +``` + +The iotdbctl keyword can be activated in the subsequent shell, such as checking the environment instructions required before deployment as follows: + +```bash +iotdbctl cluster check example +``` + +* You can also directly use <iotdbctl absolute path>/sbin/iotdbctl without activating iotdbctl to execute commands, such as checking the environment required before deployment: + +```bash +/sbin/iotdbctl cluster check example +``` + +### Introduction to cluster configuration files + +* There is a cluster configuration yaml file in the `iotdbctl/config` directory. The yaml file name is the cluster name. There can be multiple yaml files. In order to facilitate users to configure yaml files, a `default_cluster.yaml` example is provided under the iotdbctl/config directory. +* The yaml file configuration consists of five major parts: `global`, `confignode_servers`, `datanode_servers`, `grafana_server`, and `prometheus_server` +* `global` is a general configuration that mainly configures machine username and password, IoTDB local installation files, Jdk configuration, etc. A `default_cluster.yaml` sample data is provided in the `iotdbctl/config` directory, + Users can copy and modify it to their own cluster name and refer to the instructions inside to configure the IoTDB cluster. In the `default_cluster.yaml` sample, all uncommented items are required, and those that have been commented are non-required. + +For example, to execute the `default_cluster.yaml` check command you need to execute the command `iotdbctl cluster check default_cluster`. 
+See further details in the following list of commands. + + +| parameter name | parameter describe | required | +|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------| +| iotdb\_zip\_dir | IoTDB deployment distribution directory, if the value is empty, it will be downloaded from the address specified by `iotdb_download_url` | NO | +| iotdb\_download\_url | IoTDB download address, if `iotdb_zip_dir` has no value, download from the specified address | NO | +| jdk\_tar\_dir | jdk local directory, you can use this jdk path to upload and deploy to the target node. | NO | +| jdk\_deploy\_dir | jdk remote machine deployment directory, jdk will be deployed to this directory, and the following `jdk_dir_name` parameter forms a complete jdk deployment directory, that is, `/` | NO | +| jdk\_dir\_name | The directory name after jdk decompression defaults to jdk_iotdb | NO | +| iotdb\_lib\_dir | The IoTDB lib directory or the IoTDB lib compressed package only supports .zip format and is only used for IoTDB upgrade. It is in the comment state by default. If you need to upgrade, please open the comment and modify the path. If you use a zip file, please use the zip command to compress the iotdb/lib directory, such as zip -r lib.zip apache-iotdb-1.2.0/lib/* d | NO | +| user | User name for ssh login deployment machine | YES | +| password | The password for ssh login. If the password does not specify the use of pkey to log in, please ensure that the ssh login between nodes has been configured without a key. | NO | +| pkey | Key login: If password has a value, password is used first, otherwise pkey is used to log in. | NO | +| ssh\_port | ssh port | YES | +| deploy\_dir | IoTDB deployment directory, IoTDB will be deployed to this directory and the following `iotdb_dir_name` parameter will form a complete IoTDB deployment directory, that is, `/` | YES | +| iotdb\_dir\_name | The directory name after decompression of IoTDB is iotdb by default. | NO | +| datanode-env.sh | Corresponding to `iotdb/config/datanode-env.sh`, when `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first | NO | +| confignode-env.sh | Corresponding to `iotdb/config/confignode-env.sh`, the value in `datanode_servers` is used first when `global` and `datanode_servers` are configured at the same time | NO | +| iotdb-system.properties | Corresponds to `/config/iotdb-system.properties` | NO | +| cn\_internal\_address | The cluster configuration address points to the surviving ConfigNode, and it points to confignode_x by default. When `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first, corresponding to `cn_internal_address` in `iotdb/config/iotdb-system.properties` | YES | +| dn\_internal\_address | The cluster configuration address points to the surviving ConfigNode, and points to confignode_x by default. 
When configuring values for `global` and `datanode_servers` at the same time, the value in `datanode_servers` is used first, corresponding to `dn_internal_address` in `iotdb/config/iotdb-system.properties` | YES |

Among them, `datanode-env.sh` and `confignode-env.sh` can be configured with an extra parameter `extra_opts`. When this parameter is configured, the corresponding values will be appended to `datanode-env.sh` and `confignode-env.sh`. Refer to `default_cluster.yaml` for a configuration example, as follows:

    datanode-env.sh:
      extra_opts: |
        IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:+UseG1GC"
        IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:MaxGCPauseMillis=200"

* `confignode_servers` is the configuration for deploying IoTDB ConfigNodes, in which multiple ConfigNodes can be configured.
  By default, the first started ConfigNode (node1) is regarded as the Seed-ConfigNode.

| parameter name | parameter describe | required |
|----------------|--------------------|----------|
| name | ConfigNode name | YES |
| deploy\_dir | IoTDB ConfigNode deployment directory | YES |
| cn\_internal\_address | The cluster configuration address points to the surviving ConfigNode, and it points to confignode_x by default. When `global` and `confignode_servers` are configured at the same time, the value in `confignode_servers` is used first, corresponding to `cn_internal_address` in `iotdb/config/iotdb-system.properties` | YES |
| cn\_internal\_port | Internal communication port, corresponding to `cn_internal_port` in `iotdb/config/iotdb-system.properties` | YES |
| cn\_consensus\_port | Corresponds to `cn_consensus_port` in `iotdb/config/iotdb-system.properties` | NO |
| cn\_data\_dir | Corresponds to `cn_data_dir` in `iotdb/config/iotdb-system.properties` | YES |
| iotdb-system.properties | Corresponds to `iotdb/config/iotdb-system.properties`; when values are configured in both `global` and `confignode_servers`, the value in `confignode_servers` is used first. | NO |

* `datanode_servers` is the configuration for deploying IoTDB DataNodes, in which multiple DataNodes can be configured.

| parameter name | parameter describe | required |
|----------------|--------------------|----------|
| name | DataNode name | YES |
| deploy\_dir | IoTDB DataNode deployment directory | YES |
| dn\_rpc\_address | The DataNode rpc address, corresponding to `dn_rpc_address` in `iotdb/config/iotdb-system.properties` | YES |
| dn\_internal\_address | Internal communication address, corresponding to `dn_internal_address` in `iotdb/config/iotdb-system.properties` | YES |
| dn\_seed\_config\_node | The cluster configuration address points to the surviving ConfigNode, and points to confignode_x by default. When configuring values for `global` and `datanode_servers` at the same time, the value in `datanode_servers` is used first, corresponding to `dn_seed_config_node` in `iotdb/config/iotdb-system.properties`. 
| YES |
| dn\_rpc\_port | DataNode rpc port, corresponding to `dn_rpc_port` in `iotdb/config/iotdb-system.properties` | YES |
| dn\_internal\_port | Internal communication port, corresponding to `dn_internal_port` in `iotdb/config/iotdb-system.properties` | YES |
| iotdb-system.properties | Corresponds to `iotdb/config/iotdb-system.properties`; when values are configured in both `global` and `datanode_servers`, the value in `datanode_servers` is used first. | NO |

* `grafana_server` is the configuration related to deploying Grafana

| parameter name | parameter describe | required |
|----------------|--------------------|----------|
| grafana\_dir\_name | Grafana decompression directory name (default grafana_iotdb) | NO |
| host | Server ip deployed by grafana | YES |
| grafana\_port | The port of the grafana deployment machine, default 3000 | NO |
| deploy\_dir | grafana deployment server directory | YES |
| grafana\_tar\_dir | Grafana compressed package location | YES |
| dashboards | dashboards directory | NO |

* `prometheus_server` is the configuration related to deploying Prometheus

| parameter name | parameter describe | required |
|----------------|--------------------|----------|
| prometheus\_dir\_name | prometheus decompression directory name, default prometheus_iotdb | NO |
| host | Server IP deployed by prometheus | YES |
| prometheus\_port | The port of the prometheus deployment machine, default 9090 | NO |
| deploy\_dir | prometheus deployment server directory | YES |
| prometheus\_tar\_dir | prometheus compressed package path | YES |
| storage\_tsdb\_retention\_time | The number of days to keep data, 15 days by default | NO |
| storage\_tsdb\_retention\_size | The maximum data size that a block can keep, 512M by default. Please note the units are KB, MB, GB, TB, PB, and EB. | NO |

If metrics are configured in `iotdb-system.properties` of config/xxx.yaml, the configuration will be automatically put into prometheus without manual modification.

Note: If the value of a yaml key contains special characters, it is recommended to wrap the entire value in double quotes, and do not use file paths containing spaces, to avoid parsing problems.

### Usage Scenarios

#### Clean data

* The clean-data scenario deletes the `data` directory in the IoTDB cluster as well as the `cn_system_dir`, `cn_consensus_dir`,
  `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs` and `ext` directories configured in the yaml file.
* First execute the stop cluster command, and then execute the cluster cleanup command.

```bash
iotdbctl cluster stop default_cluster
iotdbctl cluster clean default_cluster
```

#### Cluster destruction

* The cluster destruction scenario deletes the `data` directory in the IoTDB cluster, the `cn_system_dir`, `cn_consensus_dir`,
  `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs` and `ext` directories configured in the yaml file, the IoTDB deployment directory,
  the grafana deployment directory and the prometheus deployment directory.
* First execute the stop cluster command, and then execute the cluster destruction command. 

```bash
iotdbctl cluster stop default_cluster
iotdbctl cluster destroy default_cluster
```

#### Cluster upgrade

* To upgrade the cluster, first configure `iotdb_lib_dir` in config/xxx.yaml as the directory path where the jar files to be uploaded to the server are located (for example, iotdb/lib).
* If you upload a zip file, please use the zip command to compress the iotdb/lib directory, such as zip -r lib.zip apache-iotdb-1.2.0/lib/*
* Execute the upload command and then execute the restart IoTDB cluster command to complete the cluster upgrade.

```bash
iotdbctl cluster upgrade default_cluster
iotdbctl cluster restart default_cluster
```

#### Hot deployment

* First modify the configuration in config/xxx.yaml.
* Execute the distribution command, and then execute the hot deployment command to complete the hot deployment of the cluster configuration.

```bash
iotdbctl cluster distribute default_cluster
iotdbctl cluster reload default_cluster
```

#### Cluster expansion

* First add a datanode or confignode node in config/xxx.yaml.
* Execute the cluster expansion command.

```bash
iotdbctl cluster scaleout default_cluster
```

#### Cluster shrink

* First find the node name or ip+port to shrink in config/xxx.yaml (the confignode port is cn_internal_port, the datanode port is rpc_port).
* Execute the cluster shrink command.

```bash
iotdbctl cluster scalein default_cluster
```

#### Using cluster management tools to manage an existing IoTDB cluster

* Configure the server's `user`, `password` or `pkey`, and `ssh_port`.
* Modify the IoTDB deployment path in config/xxx.yaml: `deploy_dir` (IoTDB deployment directory) and `iotdb_dir_name` (IoTDB decompression directory name, the default is iotdb).
  For example, if the full path of the IoTDB deployment is `/home/data/apache-iotdb-1.1.1`, you need to set `deploy_dir:/home/data/` and `iotdb_dir_name:apache-iotdb-1.1.1` in the yaml file.
* If the server does not use java_home, modify `jdk_deploy_dir` (jdk deployment directory) and `jdk_dir_name` (the directory name after jdk decompression, the default is jdk_iotdb). If java_home is used, there is no need to modify the configuration.
  For example, if the full path of the jdk deployment is `/home/data/jdk_1.8.2`, you need to set `jdk_deploy_dir:/home/data/` and `jdk_dir_name:jdk_1.8.2` in the yaml file.
* Configure `cn_internal_address` and `dn_internal_address`.
* Configure `cn_internal_address`, `cn_internal_port`, `cn_consensus_port`, `cn_system_dir`, `cn_consensus_dir` and `iotdb-system.properties` in `confignode_servers`.
  If their values are not the IoTDB defaults, they need to be configured; otherwise, they can be omitted.
* Configure `dn_rpc_address`, `dn_internal_address`, `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir` and `iotdb-system.properties` in `datanode_servers`.
* Execute the initialization command.

```bash
iotdbctl cluster init default_cluster
```

#### Deploy IoTDB, Grafana and Prometheus

* Configure `iotdb-system.properties` to enable the metrics interface.
* Configure the Grafana configuration. If there are multiple `dashboards`, separate them with commas; the names cannot be repeated or they will be overwritten.
* Configure the Prometheus configuration. If the IoTDB cluster is configured with metrics, there is no need to manually modify the Prometheus configuration; it will be modified automatically according to which node has metrics configured. 
+* Start the cluster + +```bash +iotdbctl cluster start default_cluster +``` + +For more detailed parameters, please refer to the cluster configuration file introduction above + +### Command + +The basic usage of this tool is: +```bash +iotdbctl cluster [params (Optional)] +``` +* key indicates a specific command. + +* cluster name indicates the cluster name (that is, the name of the yaml file in the `iotdbctl/config` file). + +* params indicates the required parameters of the command (optional). + +* For example, the command format to deploy the default_cluster cluster is: + +```bash +iotdbctl cluster deploy default_cluster +``` + +* The functions and parameters of the cluster are listed as follows: + +| command | description | parameter | +|------------|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| check | check whether the cluster can be deployed | Cluster name list | +| clean | cleanup-cluster | cluster-name | +| deploy | deploy cluster | Cluster name, -N, module name (optional for iotdb, grafana, prometheus), -op force (optional) | +| list | cluster status list | None | +| start | start cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional) | +| stop | stop cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional), -op force (nodename, grafana, prometheus optional) | +| restart | restart cluster | Cluster name, -N, node name (nodename, grafana, prometheus optional), -op force (nodename, grafana, prometheus optional) | +| show | view cluster information. The details field indicates the details of the cluster information. 
| Cluster name, details (optional) | +| destroy | destroy cluster | Cluster name, -N, module name (iotdb, grafana, prometheus optional) | +| scaleout | cluster expansion | Cluster name | +| scalein | cluster shrink | Cluster name, -N, cluster node name or cluster node ip+port | +| reload | hot loading of cluster configuration files | Cluster name | +| distribute | cluster configuration file distribution | Cluster name | +| dumplog | Back up specified cluster logs | Cluster name, -N, cluster node name -h Back up to target machine ip -pw Back up to target machine password -p Back up to target machine port -path Backup directory -startdate Start time -enddate End time -loglevel Log type -l transfer speed | +| dumpdata | Backup cluster data | Cluster name, -h backup to target machine ip -pw backup to target machine password -p backup to target machine port -path backup directory -startdate start time -enddate end time -l transmission speed | +| upgrade | lib package upgrade | Cluster name | +| init | When an existing cluster uses the cluster deployment tool, initialize the cluster configuration | Cluster name | +| status | View process status | Cluster name | +| activate | Activate cluster | Cluster name | +### Detailed command execution process + +The following commands are executed using default_cluster.yaml as an example, and users can modify them to their own cluster files to execute + +#### Check cluster deployment environment commands + +```bash +iotdbctl cluster check default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Verify that the target node is able to log in via SSH + +* Verify whether the JDK version on the corresponding node meets IoTDB jdk1.8 and above, and whether the server is installed with unzip, lsof, and netstat. + +* If you see the following prompt `Info:example check successfully!`, it proves that the server has already met the installation requirements. + If `Error:example check fail!` is output, it proves that some conditions do not meet the requirements. You can check the Error log output above (for example: `Error:Server (ip:172.20.31.76) iotdb port(10713) is listening`) to make repairs. , + If the jdk check does not meet the requirements, we can configure a jdk1.8 or above version in the yaml file ourselves for deployment without affecting subsequent use. + If checking lsof, netstat or unzip does not meet the requirements, you need to install it on the server yourself. 
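If any of these tools are missing, they can usually be installed with the system package manager; the following is a sketch for CentOS/RHEL-style systems (package names are assumptions and may differ on other distributions):

```bash
# Install the dependencies checked by iotdbctl (CentOS/RHEL example; adapt for your distribution)
yum install -y lsof net-tools unzip      # net-tools provides netstat
yum install -y java-1.8.0-openjdk        # only if a suitable JDK is not already available
```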
+ +#### Deploy cluster command + +```bash +iotdbctl cluster deploy default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Upload IoTDB compressed package and jdk compressed package according to the node information in `confignode_servers` and `datanode_servers` (if `jdk_tar_dir` and `jdk_deploy_dir` values ​​are configured in yaml) + +* Generate and upload `iotdb-system.properties` according to the yaml file node configuration information + +```bash +iotdbctl cluster deploy default_cluster -op force +``` + +Note: This command will force the deployment, and the specific process will delete the existing deployment directory and redeploy + +*deploy a single module* +```bash +# Deploy grafana module +iotdbctl cluster deploy default_cluster -N grafana +# Deploy the prometheus module +iotdbctl cluster deploy default_cluster -N prometheus +# Deploy the iotdb module +iotdbctl cluster deploy default_cluster -N iotdb +``` + +#### Start cluster command + +```bash +iotdbctl cluster start default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Start confignode, start sequentially according to the order in `confignode_servers` in the yaml configuration file and check whether the confignode is normal according to the process id, the first confignode is seek config + +* Start the datanode in sequence according to the order in `datanode_servers` in the yaml configuration file and check whether the datanode is normal according to the process id. + +* After checking the existence of the process according to the process id, check whether each service in the cluster list is normal through the cli. If the cli link fails, retry every 10s until it succeeds and retry up to 5 times + + +* +Start a single node command* +```bash +#Start according to the IoTDB node name +iotdbctl cluster start default_cluster -N datanode_1 +#Start according to IoTDB cluster ip+port, where port corresponds to cn_internal_port of confignode and rpc_port of datanode. +iotdbctl cluster start default_cluster -N 192.168.1.5:6667 +#Start grafana +iotdbctl cluster start default_cluster -N grafana +#Start prometheus +iotdbctl cluster start default_cluster -N prometheus +``` + +* Find the yaml file in the default location based on cluster-name + +* Find the node location information based on the provided node name or ip:port. If the started node is `data_node`, the ip uses `dn_rpc_address` in the yaml file, and the port uses `dn_rpc_port` in datanode_servers in the yaml file. + If the started node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` + +* start the node + +Note: Since the cluster deployment tool only calls the start-confignode.sh and start-datanode.sh scripts in the IoTDB cluster, +When the actual output result fails, it may be that the cluster has not started normally. 
It is recommended to use the status command to check the current cluster status (iotdbctl cluster status xxx) + + +#### View IoTDB cluster status command + +```bash +iotdbctl cluster show default_cluster +#View IoTDB cluster details +iotdbctl cluster show default_cluster details +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Execute `show cluster details` through cli on datanode in turn. If one node is executed successfully, it will not continue to execute cli on subsequent nodes and return the result directly. + +#### Stop cluster command + + +```bash +iotdbctl cluster stop default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* According to the datanode node information in `datanode_servers`, stop the datanode nodes in order according to the configuration. + +* Based on the confignode node information in `confignode_servers`, stop the confignode nodes in sequence according to the configuration + +*force stop cluster command* + +```bash +iotdbctl cluster stop default_cluster -op force +``` +Will directly execute the kill -9 pid command to forcibly stop the cluster + +*Stop single node command* + +```bash +#Stop by IoTDB node name +iotdbctl cluster stop default_cluster -N datanode_1 +#Stop according to IoTDB cluster ip+port (ip+port is to get the only node according to ip+dn_rpc_port in datanode or ip+cn_internal_port in confignode to get the only node) +iotdbctl cluster stop default_cluster -N 192.168.1.5:6667 +#Stop grafana +iotdbctl cluster stop default_cluster -N grafana +#Stop prometheus +iotdbctl cluster stop default_cluster -N prometheus +``` + +* Find the yaml file in the default location based on cluster-name + +* Find the corresponding node location information based on the provided node name or ip:port. If the stopped node is `data_node`, the ip uses `dn_rpc_address` in the yaml file, and the port uses `dn_rpc_port` in datanode_servers in the yaml file. + If the stopped node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` + +* stop the node + +Note: Since the cluster deployment tool only calls the stop-confignode.sh and stop-datanode.sh scripts in the IoTDB cluster, in some cases the iotdb cluster may not be stopped. + + +#### Clean cluster data command + +```bash +iotdbctl cluster clean default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Based on the information in `confignode_servers` and `datanode_servers`, check whether there are still services running, + If any service is running, the cleanup command will not be executed. + +* Delete the data directory in the IoTDB cluster and the `cn_system_dir`, `cn_consensus_dir`, configured in the yaml file + `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs` and `ext` directories. + + + +#### Restart cluster command + +```bash +iotdbctl cluster restart default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` + +* Execute the above stop cluster command (stop), and then execute the start cluster command (start). 
For details, refer to the above start and stop commands. + +*Force restart cluster command* + +```bash +iotdbctl cluster restart default_cluster -op force +``` +Will directly execute the kill -9 pid command to force stop the cluster, and then start the cluster + + +*Restart a single node command* + +```bash +#Restart datanode_1 according to the IoTDB node name +iotdbctl cluster restart default_cluster -N datanode_1 +#Restart confignode_1 according to the IoTDB node name +iotdbctl cluster restart default_cluster -N confignode_1 +#Restart grafana +iotdbctl cluster restart default_cluster -N grafana +#Restart prometheus +iotdbctl cluster restart default_cluster -N prometheus +``` + +#### Cluster shrink command + +```bash +#Scale down by node name +iotdbctl cluster scalein default_cluster -N nodename +#Scale down according to ip+port (ip+port obtains the only node according to ip+dn_rpc_port in datanode, and obtains the only node according to ip+cn_internal_port in confignode) +iotdbctl cluster scalein default_cluster -N ip:port +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Determine whether there is only one confignode node and datanode to be reduced. If there is only one left, the reduction cannot be performed. + +* Then get the node information to shrink according to ip:port or nodename, execute the shrink command, and then destroy the node directory. If the shrink node is `data_node`, use `dn_rpc_address` in the yaml file for ip, and use `dn_rpc_address` in the port. `dn_rpc_port` in datanode_servers in yaml file. + If the shrinking node is `config_node`, the ip uses `cn_internal_address` in confignode_servers in the yaml file, and the port uses `cn_internal_port` + + +Tip: Currently, only one node scaling is supported at a time + +#### Cluster expansion command + +```bash +iotdbctl cluster scaleout default_cluster +``` +* Modify the config/xxx.yaml file to add a datanode node or confignode node + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Find the node to be expanded, upload the IoTDB compressed package and jdb package (if the `jdk_tar_dir` and `jdk_deploy_dir` values ​​are configured in yaml) and decompress it + +* Generate and upload `iotdb-system.properties` according to the yaml file node configuration information + +* Execute the command to start the node and verify whether the node is started successfully + +Tip: Currently, only one node expansion is supported at a time + +#### destroy cluster command +```bash +iotdbctl cluster destroy default_cluster +``` + +* cluster-name finds the yaml file in the default location + +* Check whether the node is still running based on the node node information in `confignode_servers`, `datanode_servers`, `grafana`, and `prometheus`. 
+ Stop the destroy command if any node is running + +* Delete `data` in the IoTDB cluster and `cn_system_dir`, `cn_consensus_dir` configured in the yaml file + `dn_data_dirs`, `dn_consensus_dir`, `dn_system_dir`, `logs`, `ext`, `IoTDB` deployment directory, + grafana deployment directory and prometheus deployment directory + +*Destroy a single module* + +```bash +# Destroy grafana module +iotdbctl cluster destroy default_cluster -N grafana +# Destroy prometheus module +iotdbctl cluster destroy default_cluster -N prometheus +# Destroy iotdb module +iotdbctl cluster destroy default_cluster -N iotdb +``` + +#### Distribute cluster configuration commands + +```bash +iotdbctl cluster distribute default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` + +* Generate and upload `iotdb-system.properties` to the specified node according to the node configuration information of the yaml file + +#### Hot load cluster configuration command + +```bash +iotdbctl cluster reload default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Execute `load configuration` in the cli according to the node configuration information of the yaml file. + +#### Cluster node log backup +```bash +iotdbctl cluster dumplog default_cluster -N datanode_1,confignode_1 -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/logs' -logs '/root/data/db/iotdb/logs' +``` + +* Find the yaml file in the default location based on cluster-name + +* This command will verify the existence of datanode_1 and confignode_1 according to the yaml file, and then back up the log data of the specified node datanode_1 and confignode_1 to the specified service `192.168.9.48` port 36000 according to the configured start and end dates (startdate<=logtime<=enddate) The data backup path is `/iotdb/logs`, and the IoTDB log storage path is `/root/data/db/iotdb/logs` (not required, if you do not fill in -logs xxx, the default is to backup logs from the IoTDB installation path /logs ) + +| command | description | required | +|------------|-------------------------------------------------------------------------|----------| +| -h | backup data server ip | NO | +| -u | backup data server username | NO | +| -pw | backup data machine password | NO | +| -p | backup data machine port(default 22) | NO | +| -path | path to backup data (default current path) | NO | +| -loglevel | Log levels include all, info, error, warn (default is all) | NO | +| -l | speed limit (default 1024 speed limit range 0 to 104857601 unit Kbit/s) | NO | +| -N | multiple configuration file cluster names are separated by commas. 
| YES | +| -startdate | start time (including default 1970-01-01) | NO | +| -enddate | end time (included) | NO | +| -logs | IoTDB log storage path, the default is ({iotdb}/logs)) | NO | + +#### Cluster data backup +```bash +iotdbctl cluster dumpdata default_cluster -granularity partition -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/datas' +``` +* This command will obtain the leader node based on the yaml file, and then back up the data to the /iotdb/datas directory on the 192.168.9.48 service based on the start and end dates (startdate<=logtime<=enddate) + +| command | description | required | +|--------------|-------------------------------------------------------------------------|----------| +| -h | backup data server ip | NO | +| -u | backup data server username | NO | +| -pw | backup data machine password | NO | +| -p | backup data machine port(default 22) | NO | +| -path | path to backup data (default current path) | NO | +| -granularity | partition | YES | +| -l | speed limit (default 1024 speed limit range 0 to 104857601 unit Kbit/s) | NO | +| -startdate | start time (including default 1970-01-01) | YES | +| -enddate | end time (included) | YES | + +#### Cluster upgrade +```bash +iotdbctl cluster upgrade default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers` and `datanode_servers` + +* Upload lib package + +Note that after performing the upgrade, please restart IoTDB for it to take effect. + +#### Cluster initialization +```bash +iotdbctl cluster init default_cluster +``` +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` +* Initialize cluster configuration + +#### View cluster process status +```bash +iotdbctl cluster status default_cluster +``` + +* Find the yaml file in the default location according to cluster-name and obtain the configuration information of `confignode_servers`, `datanode_servers`, `grafana` and `prometheus` +* Display the survival status of each node in the cluster + +#### Cluster authorization activation + +Cluster activation is activated by entering the activation code by default, or by using the - op license_path activated through license path + +* Default activation method +```bash +iotdbctl cluster activate default_cluster +``` +* Find the yaml file in the default location based on `cluster-name` and obtain the `confignode_servers` configuration information +* Obtain the machine code inside +* Waiting for activation code input + +```bash +Machine code: +Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== +Please enter the activation code: +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= +Activation successful +``` +* Activate a node + +```bash +iotdbctl cluster activate default_cluster -N confignode1 +``` + +* Activate through license path + +```bash +iotdbctl cluster activate default_cluster -op license_path +``` +* Find the yaml file in the default location based on `cluster-name` and obtain the `confignode_servers` configuration information +* Obtain the machine code inside +* Waiting for activation code input + +```bash +Machine code: +Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== 
+JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== +Please enter the activation code: +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= +Activation successful +``` +* Activate a node + +```bash +iotdbctl cluster activate default_cluster -N confignode1 -op license_path +``` + + +### Introduction to Cluster Deployment Tool Samples + +In the cluster deployment tool installation directory config/example, there are three yaml examples. If necessary, you can copy them to config and modify them. + +| name | description | +|-----------------------------|------------------------------------------------| +| default\_1c1d.yaml | 1 confignode and 1 datanode configuration example | +| default\_3c3d.yaml | 3 confignode and 3 datanode configuration samples | +| default\_3c3d\_grafa\_prome | 3 confignode and 3 datanode, Grafana, Prometheus configuration examples | + + +## Manual Deployment + +### Prerequisites + +1. JDK>=1.8. +2. Max open file 65535. +3. Disable the swap memory. +4. Ensure that data/confignode directory has been cleared when starting ConfigNode for the first time, + and data/datanode directory has been cleared when starting DataNode for the first time +5. Turn off the firewall of the server if the entire cluster is in a trusted environment. +6. By default, IoTDB Cluster will use ports 10710, 10720 for the ConfigNode and + 6667, 10730, 10740, 10750 and 10760 for the DataNode. + Please make sure those ports are not occupied, or you will modify the ports in configuration files. + +### Get the Installation Package + +You can either download the binary release files or compile with source code. + +#### Download the binary distribution + +1. Open our website [Download Page](https://iotdb.apache.org/Download/). +2. Download the binary distribution. +3. Decompress to get the apache-iotdb-1.3.x-all-bin directory. + +#### Compile with source code + +##### Download the source code + +**Git** + +``` +git clone https://github.com/apache/iotdb.git +git checkout v1.3.x +``` + +**Website** + +1. Open our website [Download Page](https://iotdb.apache.org/Download/). +2. Download the source code. +3. Decompress to get the apache-iotdb-1.3.x directory. + +##### Compile source code + +Under the source root folder: + +``` +mvn clean package -pl distribution -am -DskipTests +``` + +Then you will get the binary distribution under +**distribution/target/apache-iotdb-1.3.x-SNAPSHOT-all-bin/apache-iotdb-1.3.x-SNAPSHOT-all-bin**. + +### Binary Distribution Content + +| **Folder** | **Description** | +| ---------- | ------------------------------------------------------------ | +| conf | Configuration files folder, contains configuration files of ConfigNode, DataNode, JMX and logback | +| data | Data files folder, contains data files of ConfigNode and DataNode | +| lib | Jar files folder | +| licenses | Licenses files folder | +| logs | Logs files folder, contains logs files of ConfigNode and DataNode | +| sbin | Shell files folder, contains start/stop/remove shell of ConfigNode and DataNode, cli shell | +| tools | System tools | + +### Cluster Installation and Configuration + +#### Cluster Installation + +`apache-iotdb-1.0.0-SNAPSHOT-all-bin` contains both the ConfigNode and the DataNode. +Please deploy the files to all servers of your target cluster. +A best practice is deploying the files into the same directory in all servers. + +If you want to try the cluster mode on one server, please read +[Cluster Quick Start](../QuickStart/ClusterQuickStart.md). 
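+
+For example, the unpacked distribution can be pushed to the same directory on every server with standard `scp`/`ssh`. This is only a sketch: the host names (iotdb-1, iotdb-2, iotdb-3), the target directory and the distribution name are assumptions, so adjust them to your environment and make sure you have SSH access to all machines.
+
+```bash
+# Sketch: copy the unpacked distribution to the same path on all servers.
+DIST=apache-iotdb-1.3.x-all-bin     # unpacked distribution directory (adjust version)
+TARGET=/opt/iotdb                   # assumed installation path, adjust as needed
+for host in iotdb-1 iotdb-2 iotdb-3; do
+  ssh "$host" "mkdir -p $TARGET"
+  scp -r "$DIST" "$host:$TARGET/"
+done
+```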
+ +#### Cluster Configuration + +We need to modify the configurations on each server. +Therefore, login each server and switch the working directory to `apache-iotdb-1.0.0-SNAPSHOT-all-bin`. +The configuration files are stored in the `./conf` directory. + +For all ConfigNode servers, we need to modify the common configuration +and ConfigNode configuration. + +For all DataNode servers, we need to modify the common configuration +and DataNode configuration. + +##### Common configuration + +Open the common configuration file ./conf/iotdb-system.properties, +and set the following parameters base on the +[Deployment Recommendation](./Deployment-Recommendation.md): + +| **Configuration** | **Description** | **Default** | +| ------------------------------------------ | ------------------------------------------------------------ | ----------------------------------------------- | +| cluster\_name | Cluster name for which the Node to join in | defaultCluster | +| config\_node\_consensus\_protocol\_class | Consensus protocol of ConfigNode | org.apache.iotdb.consensus.ratis.RatisConsensus | +| schema\_replication\_factor | Schema replication factor, no more than DataNode number | 1 | +| schema\_region\_consensus\_protocol\_class | Consensus protocol of schema replicas | org.apache.iotdb.consensus.ratis.RatisConsensus | +| data\_replication\_factor | Data replication factor, no more than DataNode number | 1 | +| data\_region\_consensus\_protocol\_class | Consensus protocol of data replicas. Note that RatisConsensus currently does not support multiple data directories | org.apache.iotdb.consensus.iot.IoTConsensus | + +**Notice: The preceding configuration parameters cannot be changed after the cluster is started. Ensure that the common configurations of all Nodes are the same. Otherwise, the Nodes cannot be started.** + +##### ConfigNode configuration + +Open the ConfigNode configuration file ./conf/iotdb-system.properties, +and set the following parameters based on the IP address and available port of the server or VM: + +| **Configuration** | **Description** | **Default** | **Usage** | +|------------------------| ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | +| cn\_internal\_address | Internal rpc service address of ConfigNode | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| cn\_internal\_port | Internal rpc service port of ConfigNode | 10710 | Set to any unoccupied port | +| cn\_consensus\_port | ConfigNode replication consensus protocol communication port | 10720 | Set to any unoccupied port | +| cn\_seed\_config\_node | ConfigNode address to which the node is connected when it is registered to the cluster. Note that Only one ConfigNode can be configured. | 127.0.0.1:10710 | For Seed-ConfigNode, set to its own cn\_internal\_address:cn\_internal\_port; For other ConfigNodes, set to other one running ConfigNode's cn\_internal\_address:cn\_internal\_port | + +**Notice: The preceding configuration parameters cannot be changed after the node is started. Ensure that all ports are not occupied. 
Otherwise, the Node cannot be started.** + +##### DataNode configuration + +Open the DataNode configuration file ./conf/iotdb-system.properties, +and set the following parameters based on the IP address and available port of the server or VM: + +| **Configuration** | **Description** | **Default** | **Usage** | +|-------------------------------------| ------------------------------------------------ | --------------- | ------------------------------------------------------------ | +| dn\_rpc\_address | Client RPC Service address | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| dn\_rpc\_port | Client RPC Service port | 6667 | Set to any unoccupied port | +| dn\_internal\_address | Control flow address of DataNode inside cluster | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| dn\_internal\_port | Control flow port of DataNode inside cluster | 10730 | Set to any unoccupied port | +| dn\_mpp\_data\_exchange\_port | Data flow port of DataNode inside cluster | 10740 | Set to any unoccupied port | +| dn\_data\_region\_consensus\_port | Data replicas communication port for consensus | 10750 | Set to any unoccupied port | +| dn\_schema\_region\_consensus\_port | Schema replicas communication port for consensus | 10760 | Set to any unoccupied port | +| dn\_seed\_config\_node | Running ConfigNode of the Cluster | 127.0.0.1:10710 | Set to any running ConfigNode's cn\_internal\_address:cn\_internal\_port. You can set multiple values, separate them with commas(",") | + +**Notice: The preceding configuration parameters cannot be changed after the node is started. Ensure that all ports are not occupied. Otherwise, the Node cannot be started.** + +### Cluster Operation + +#### Starting the cluster + +This section describes how to start a cluster that includes several ConfigNodes and DataNodes. +The cluster can provide services only by starting at least one ConfigNode +and no less than the number of data/schema_replication_factor DataNodes. + +The total process are three steps: + +* Start the Seed-ConfigNode +* Add ConfigNode (Optional) +* Add DataNode + +##### Start the Seed-ConfigNode + +**The first Node started in the cluster must be ConfigNode. The first started ConfigNode must follow the tutorial in this section.** + +The first ConfigNode to start is the Seed-ConfigNode, which marks the creation of the new cluster. +Before start the Seed-ConfigNode, please open the common configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +| ------------------------------------------ | ----------------------------------------------- | +| cluster\_name | Is set to the expected name | +| config\_node\_consensus\_protocol\_class | Is set to the expected consensus protocol | +| schema\_replication\_factor | Is set to the expected schema replication count | +| schema\_region\_consensus\_protocol\_class | Is set to the expected consensus protocol | +| data\_replication\_factor | Is set to the expected data replication count | +| data\_region\_consensus\_protocol\_class | Is set to the expected consensus protocol | + +**Notice:** Please set these parameters carefully based on the [Deployment Recommendation](./Deployment-Recommendation.md). +These parameters are not modifiable after the Node first startup. 
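+
+A quick way to review these values on each node before the first start is to grep them out of the common configuration file. This is just a convenience sketch, assuming the keys are present and uncommented in `./conf/iotdb-system.properties`; keys that are still commented out will simply not be printed.
+
+```bash
+# Print the parameters that become immutable after the first startup,
+# so their values can be compared across all nodes before starting anything.
+grep -E '^(cluster_name|config_node_consensus_protocol_class|schema_replication_factor|schema_region_consensus_protocol_class|data_replication_factor|data_region_consensus_protocol_class)=' \
+  ./conf/iotdb-system.properties
+```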
+
+Then open its configuration file ./conf/iotdb-system.properties and check the following parameters:
+
+| **Configuration**      | **Check**                                                     |
+|------------------------| ------------------------------------------------------------ |
+| cn\_internal\_address  | Is set to the IPV4 address or domain name of the server       |
+| cn\_internal\_port     | The port isn't occupied                                       |
+| cn\_consensus\_port    | The port isn't occupied                                       |
+| cn\_seed\_config\_node | Is set to its own internal communication address, which is cn\_internal\_address:cn\_internal\_port |
+
+After checking, you can run the startup script on the server:
+
+```
+# Linux foreground
+bash ./sbin/start-confignode.sh
+
+# Linux background
+nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 &
+
+# Windows
+.\sbin\start-confignode.bat
+```
+
+For more details about other configuration parameters of ConfigNode, see the
+[ConfigNode Configurations](../Reference/ConfigNode-Config-Manual.md).
+
+##### Add more ConfigNodes (Optional)
+
+**Any ConfigNode that is not the first one started must follow the tutorial in this section.**
+
+You can add more ConfigNodes to the cluster to ensure high availability of ConfigNodes.
+A common configuration is to add two extra ConfigNodes so that the cluster has three ConfigNodes in total.
+
+Ensure that all configuration parameters in ./conf/iotdb-system.properties are the same as those on the Seed-ConfigNode;
+otherwise, the node may fail to start or generate runtime errors.
+Therefore, please check the following parameters in the common configuration file:
+
+| **Configuration**                           | **Check**                              |
+| ------------------------------------------- | -------------------------------------- |
+| cluster\_name                               | Is consistent with the Seed-ConfigNode |
+| config\_node\_consensus\_protocol\_class    | Is consistent with the Seed-ConfigNode |
+| schema\_replication\_factor                 | Is consistent with the Seed-ConfigNode |
+| schema\_region\_consensus\_protocol\_class  | Is consistent with the Seed-ConfigNode |
+| data\_replication\_factor                   | Is consistent with the Seed-ConfigNode |
+| data\_region\_consensus\_protocol\_class    | Is consistent with the Seed-ConfigNode |
+
+Then, please open its configuration file ./conf/iotdb-system.properties and check the following parameters:
+
+| **Configuration**      | **Check**                                                     |
+|------------------------| ------------------------------------------------------------ |
+| cn\_internal\_address  | Is set to the IPV4 address or domain name of the server       |
+| cn\_internal\_port     | The port isn't occupied                                       |
+| cn\_consensus\_port    | The port isn't occupied                                       |
+| cn\_seed\_config\_node | Is set to the internal communication address of another running ConfigNode. The internal communication address of the Seed-ConfigNode is recommended. |
+
+After checking, you can run the startup script on the server:
+
+```
+# Linux foreground
+bash ./sbin/start-confignode.sh
+
+# Linux background
+nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 &
+
+# Windows
+.\sbin\start-confignode.bat
+```
+
+For more details about other configuration parameters of ConfigNode, see the
+[ConfigNode Configurations](../Reference/ConfigNode-Config-Manual.md).
+
+##### Start DataNode
+
+**Before adding DataNodes, ensure that at least one ConfigNode is running in the cluster.**
+
+You can add any number of DataNodes to the cluster.
+Before adding a new DataNode, + +please open its common configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +| ----------------- | -------------------------------------- | +| cluster\_name | Is consistent with the Seed-ConfigNode | + +Then open its configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +|-------------------------------------| ------------------------------------------------------------ | +| dn\_rpc\_address | Is set to the IPV4 address or domain name of the server | +| dn\_rpc\_port | The port isn't occupied | +| dn\_internal\_address | Is set to the IPV4 address or domain name of the server | +| dn\_internal\_port | The port isn't occupied | +| dn\_mpp\_data\_exchange\_port | The port isn't occupied | +| dn\_data\_region\_consensus\_port | The port isn't occupied | +| dn\_schema\_region\_consensus\_port | The port isn't occupied | +| dn\_seed\_config\_node | Is set to the internal communication address of other running ConfigNodes. The internal communication address of the seed ConfigNode is recommended. | + +After checking, you can run the startup script on the server: + +``` +# Linux foreground +bash ./sbin/start-datanode.sh + +# Linux background +nohup bash ./sbin/start-datanode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-datanode.bat +``` + +For more details about other configuration parameters of DataNode, see the +[DataNode Configurations](../Reference/DataNode-Config-Manual.md). + +**Notice: The cluster can provide services only if the number of its DataNodes is no less than the number of replicas(max{schema\_replication\_factor, data\_replication\_factor}).** + +#### Start Cli + +If the cluster is in local environment, you can directly run the Cli startup script in the ./sbin directory: + +``` +# Linux +./sbin/start-cli.sh + +# Windows +.\sbin\start-cli.bat +``` + +If you want to use the Cli to connect to a cluster in the production environment, +Please read the [Cli manual](../Tools-System/CLI.md). + +#### Verify Cluster + +Use a 3C3D(3 ConfigNodes and 3 DataNodes) as an example. +Assumed that the IP addresses of the 3 ConfigNodes are 192.168.1.10, 192.168.1.11 and 192.168.1.12, and the default ports 10710 and 10720 are used. +Assumed that the IP addresses of the 3 DataNodes are 192.168.1.20, 192.168.1.21 and 192.168.1.22, and the default ports 6667, 10730, 10740, 10750 and 10760 are used. 
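+
+To run the verification query below, first connect the Cli to any DataNode of this example topology. The host, port and credentials shown here follow the example addresses above and the default root/root account; they are placeholders, and the full list of connection options is described in the Cli manual.
+
+```bash
+# Sketch: connect the Cli to one DataNode of the example cluster.
+./sbin/start-cli.sh -h 192.168.1.20 -p 6667 -u root -pw root
+```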
+ +After starting the cluster successfully according to chapter 6.1, you can run the `show cluster details` command on the Cli, and you will see the following results: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort| RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 192.168.1.10| 10710| 10720| | | | | | +| 2|ConfigNode|Running| 192.168.1.11| 10710| 10720| | | | | | +| 3|ConfigNode|Running| 192.168.1.12| 10710| 10720| | | | | | +| 1| DataNode|Running| 192.168.1.20| 10730| |192.168.1.20| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| 192.168.1.21| 10730| |192.168.1.21| 6667| 10740| 10750| 10760| +| 5| DataNode|Running| 192.168.1.22| 10730| |192.168.1.22| 6667| 10740| 10750| 10760| ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.012s +``` + +If the status of all Nodes is **Running**, the cluster deployment is successful. +Otherwise, read the run logs of the Node that fails to start and +check the corresponding configuration parameters. + +#### Stop IoTDB + +This section describes how to manually shut down the ConfigNode or DataNode process of the IoTDB. + +##### Stop ConfigNode by script + +Run the stop ConfigNode script: + +``` +# Linux +./sbin/stop-confignode.sh + +# Windows +.\sbin\stop-confignode.bat +``` + +##### Stop DataNode by script + +Run the stop DataNode script: + +``` +# Linux +./sbin/stop-datanode.sh + +# Windows +.\sbin\stop-datanode.bat +``` + +##### Kill Node process + +Get the process number of the Node: + +``` +jps + +# or + +ps aux | grep iotdb +``` + +Kill the process: + +``` +kill -9 +``` + +**Notice Some ports require root access, in which case use sudo** + +#### Shrink the Cluster + +This section describes how to remove ConfigNode or DataNode from the cluster. + +##### Remove ConfigNode + +Before removing a ConfigNode, ensure that there is at least one active ConfigNode in the cluster after the removal. +Run the remove-confignode script on an active ConfigNode: + +``` +# Linux +# Remove the ConfigNode with confignode_id +./sbin/remove-confignode.sh + +# Remove the ConfigNode with address:port +./sbin/remove-confignode.sh : + + +# Windows +# Remove the ConfigNode with confignode_id +.\sbin\remove-confignode.bat + +# Remove the ConfigNode with address:port +.\sbin\remove-confignode.bat : +``` + +##### Remove DataNode + +Before removing a DataNode, ensure that the cluster has at least the number of data/schema replicas DataNodes. +Run the remove-datanode script on an active DataNode: + +``` +# Linux +# Remove the DataNode with datanode_id +./sbin/remove-datanode.sh + +# Remove the DataNode with rpc address:port +./sbin/remove-datanode.sh : + + +# Windows +# Remove the DataNode with datanode_id +.\sbin\remove-datanode.bat + +# Remove the DataNode with rpc address:port +.\sbin\remove-datanode.bat : +``` + +### FAQ + +See [FAQ](../FAQ/Frequently-asked-questions.md). 
+
diff --git a/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Concept.md b/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Concept.md
new file mode 100644
index 00000000..d2db5497
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Concept.md
@@ -0,0 +1,117 @@
+
+
+# Cluster Concept
+
+## Basic Concepts of IoTDB Cluster
+
+Apache IoTDB Cluster contains two types of nodes: ConfigNode and DataNode, each of which is a process that can be deployed independently.
+
+An illustration of the cluster architecture:
+
+
+ConfigNode is the control node of the cluster. It manages the cluster's node status, partition information, etc. All ConfigNodes in the cluster form a highly available group, which is fully replicated.
+
+Notice: the replication factor of ConfigNode equals the number of ConfigNodes that have joined the cluster, and the cluster can work only if more than half of the ConfigNodes are Running.
+
+DataNode stores the data and schema of the cluster, and manages multiple data regions and schema regions. Data is a time-value pair, and schema is the path and data type of each time series.
+
+Clients can only connect to DataNodes for operations.
+
+### Concepts
+
+| Concept | Type | Description |
+|:------------------|:---------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------|
+| ConfigNode | node role | Configuration node, which manages cluster node information and partition information, monitors cluster status and controls load balancing |
+| DataNode | node role | Data node, which manages data and meta data |
+| Database | meta data | Database; data in different databases are physically isolated from each other |
+| DeviceId | device id | The full path from root to the penultimate level in the metadata tree represents a device id |
+| SeriesSlot | schema partition | Each database contains many SeriesSlots, the partition key being DeviceId |
+| SchemaRegion | schema region | A collection of multiple SeriesSlots |
+| SchemaRegionGroup | logical concept | The number of SchemaRegions contained in the group is the number of schema replicas; they manage the same schema data and back each other up |
+| SeriesTimeSlot | data partition | The data of a time interval of a SeriesSlot; a SeriesSlot contains multiple SeriesTimeSlots, the partition key being timestamp |
+| DataRegion | data region | A collection of multiple SeriesTimeSlots |
+| DataRegionGroup | logical concept | The number of DataRegions contained in the group is the number of data replicas; they manage the same data and back each other up |
+
+## Characteristics of Cluster
+
+* Native Cluster Architecture
+  * All modules are designed for cluster.
+  * Standalone is a special form of Cluster.
+* High Scalability
+  * Supports adding nodes in seconds, without data migration.
+* Massive Parallel Processing Architecture
+  * Adopts the MPP architecture and the volcano model for data processing, which are highly extensible.
+* Configurable Consensus Protocol
+  * Different consensus protocols can be adopted for data replicas and schema replicas.
+* Extensible Partition Strategy
+  * The cluster uses a lookup table for data and schema partitions, which is flexible to extend.
+* Built-in Metric Framework
+  * Monitors the status of each node in the cluster.
+
+## Partitioning Strategy
+
+The partitioning strategy partitions data and schema into different Regions, and allocates Regions to different DataNodes.
+
+It is recommended to set up 1 database; the cluster will dynamically allocate resources according to the number of nodes and cores.
+
+The database contains multiple SchemaRegions and DataRegions, which are managed by DataNodes.
+
+* Schema partition strategy
+  * For a time series schema, the ConfigNode maps the device ID (the full path from root to the penultimate tier node) to a SeriesSlot and allocates this SeriesSlot to a SchemaRegionGroup.
+* Data partition strategy
+  * For a time series data point, the ConfigNode maps it to a SeriesSlot according to the DeviceId, then maps it to a SeriesTimeSlot according to the timestamp, and allocates this SeriesTimeSlot to a DataRegionGroup.
+
+IoTDB uses a slot-based partitioning strategy, so the size of the partition information is controllable and does not grow infinitely with the number of time series or devices.
+
+Regions are allocated to different DataNodes to avoid single points of failure, and the load of the DataNodes is balanced when Regions are allocated.
+
+## Replication Strategy
+
+The replication strategy replicates data in multiple replicas, which are copies of each other. Multiple copies can together provide high-availability services and tolerate the failure of some copies.
+
+A Region is the basic unit of replication. Multiple replicas of a Region construct a high-availability RegionGroup.
+
+* Replication and consensus
+  * ConfigNode Group: Consists of all ConfigNodes.
+  * SchemaRegionGroup: The cluster has multiple SchemaRegionGroups, and each SchemaRegionGroup has multiple SchemaRegions with the same id.
+  * DataRegionGroup: The cluster has multiple DataRegionGroups, and each DataRegionGroup has multiple DataRegions with the same id.
+
+An illustration of the partition allocation in cluster:
+
+
+The figure contains 1 SchemaRegionGroup, and the schema_replication_factor is 3, so the 3 white SchemaRegion-0s form a replication group.
+
+The figure contains 3 DataRegionGroups, and the data_replication_factor is 3, so there are 9 DataRegions in total.
+
+## Consensus Protocol (Consistency Protocol)
+
+Among the multiple Regions of each RegionGroup, consistency is guaranteed through a consensus protocol, which routes read and write requests to the replicas.
+
+* Currently supported consensus protocols
+  * SimpleConsensus: Provides strong consistency. It can only be used when the replication factor is 1 and is the empty implementation of the consensus protocol.
+  * IoTConsensus: Provides eventual consistency. It can be used with any number of replicas (2 replicas are enough to avoid a single point of failure) and only for DataRegions; writes are applied on each replica and replicated asynchronously to the other replicas.
+  * RatisConsensus: Provides strong consistency using the Raft consensus protocol. It can be used with any number of replicas and for any type of RegionGroup.
+    Currently, DataRegions using RatisConsensus do not support multiple data directories; this feature is planned for future releases.
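+
+In a running cluster, the protocol and replication settings that are actually in effect can be checked with the `SHOW VARIABLES` statement (see the cluster maintenance commands). A minimal sketch, assuming the CLI is run from the distribution directory against a local DataNode and that your CLI version supports the `-e` option for one-off statements:
+
+```bash
+# Print the cluster variables, including the consensus protocol classes and
+# the schema/data replication factors (adjust host, port and credentials).
+./sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root -e "show variables"
+```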
diff --git a/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Maintenance.md b/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Maintenance.md new file mode 100644 index 00000000..5250bd8f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Maintenance.md @@ -0,0 +1,718 @@ + + +# Cluster Information Query Command + +## Show Variables + +Currently, IoTDB supports showing key parameters of the cluster: +``` +SHOW VARIABLES +``` + +Eg: +``` +IoTDB> show variables ++----------------------------------+-----------------------------------------------------------------+ +| Variables| Value| ++----------------------------------+-----------------------------------------------------------------+ +| ClusterName| defaultCluster| +| DataReplicationFactor| 1| +| SchemaReplicationFactor| 1| +| DataRegionConsensusProtocolClass| org.apache.iotdb.consensus.iot.IoTConsensus| +|SchemaRegionConsensusProtocolClass| org.apache.iotdb.consensus.ratis.RatisConsensus| +| ConfigNodeConsensusProtocolClass| org.apache.iotdb.consensus.ratis.RatisConsensus| +| TimePartitionInterval| 604800000| +| DefaultTTL(ms)| 9223372036854775807| +| ReadConsistencyLevel| strong| +| SchemaRegionPerDataNode| 1.0| +| DataRegionPerDataNode| 5.0| +| LeastDataRegionGroupNum| 5| +| SeriesSlotNum| 10000| +| SeriesSlotExecutorClass|org.apache.iotdb.commons.partition.executor.hash.BKDRHashExecutor| +| DiskSpaceWarningThreshold| 0.05| ++----------------------------------+-----------------------------------------------------------------+ +Total line number = 15 +It costs 0.225s +``` + +**Notice:** Ensure that all key parameters displayed in this SQL are consist on each node in the same cluster + +## Show ConfigNode information + +Currently, IoTDB supports showing ConfigNode information by the following SQL: +``` +SHOW CONFIGNODES +``` + +Eg: +``` +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +### ConfigNode status definition +The ConfigNode statuses are defined as follows: + +- **Running**: The ConfigNode is running properly. +- **Unknown**: The ConfigNode doesn't report heartbeat properly. + - Can't receive data synchronized from other ConfigNodes + - Won't be selected as the cluster ConfigNode-leader + +## Show DataNode information + +Currently, IoTDB supports showing DataNode information by the following SQL: +``` +SHOW DATANODES +``` + +Eg: +``` +IoTDB> create timeseries root.sg.d1.s1 with datatype=BOOLEAN,encoding=PLAIN +Msg: The statement is executed successfully. +IoTDB> create timeseries root.sg.d2.s1 with datatype=BOOLEAN,encoding=PLAIN +Msg: The statement is executed successfully. +IoTDB> create timeseries root.ln.d1.s1 with datatype=BOOLEAN,encoding=PLAIN +Msg: The statement is executed successfully. 
+IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 127.0.0.1| 6667| 0| 1| +| 2|Running| 127.0.0.1| 6668| 0| 1| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 2 +It costs 0.007s + +IoTDB> insert into root.ln.d1(timestamp,s1) values(1,true) +Msg: The statement is executed successfully. +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 127.0.0.1| 6667| 1| 1| +| 2|Running| 127.0.0.1| 6668| 0| 1| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 2 +It costs 0.006s +``` + +### DataNode status definition +The state machine of DataNode is shown in the figure below: + + +The DataNode statuses are defined as follows: + +- **Running**: The DataNode is running properly and is readable and writable. +- **Unknown**: The DataNode doesn't report heartbeat properly, the ConfigNode considers the DataNode as unreadable and un-writable. + - The cluster is still readable and writable if some DataNodes are Unknown +- **Removing**: The DataNode is being removed from the cluster and is unreadable and un-writable. + - The cluster is still readable and writable if some DataNodes are Removing +- **ReadOnly**: The remaining disk space of DataNode is lower than disk_warning_threshold(default is 5%), the DataNode is readable but un-writable and cannot synchronize data. + - The cluster is still readable and writable if some DataNodes are ReadOnly + - The schema and data in a ReadOnly DataNode is readable + - The schema and data in a ReadOnly DataNode is deletable + - A ReadOnly DataNode is creatable for schema, but un-writable for data + - Data cannot be written to the cluster when all DataNodes are ReadOnly, but new Databases and schema is still creatable + +**For a DataNode**, the following table describes the impact of schema read, write, and deletion in different status: + +| DataNode status | readable | creatable | deletable | +|-----------------|----------|-----------|-----------| +| Running | yes | yes | yes | +| Unknown | no | no | no | +| Removing | no | no | no | +| ReadOnly | yes | yes | yes | + +**For a DataNode**, the following table describes the impact of data read, write, and deletion in different status: + +| DataNode status | readable | writable | deletable | +|-----------------|----------|----------|-----------| +| Running | yes | yes | yes | +| Unknown | no | no | no | +| Removing | no | no | no | +| ReadOnly | yes | no | yes | + +## Show all Node information + +Currently, IoTDB supports show the information of all Nodes by the following SQL: +``` +SHOW CLUSTER +``` + +Eg: +``` +IoTDB> show cluster ++------+----------+-------+---------------+------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort| ++------+----------+-------+---------------+------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| +| 1|ConfigNode|Running| 127.0.0.1| 10711| +| 2|ConfigNode|Running| 127.0.0.1| 10712| +| 3| DataNode|Running| 127.0.0.1| 10730| +| 4| DataNode|Running| 127.0.0.1| 10731| +| 5| DataNode|Running| 127.0.0.1| 10732| ++------+----------+-------+---------------+------------+ +Total line number = 6 +It costs 0.011s +``` + +After a node is stopped, its 
status will change, as shown below: +``` +IoTDB> show cluster ++------+----------+-------+---------------+------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort| ++------+----------+-------+---------------+------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| +| 1|ConfigNode|Unknown| 127.0.0.1| 10711| +| 2|ConfigNode|Running| 127.0.0.1| 10712| +| 3| DataNode|Running| 127.0.0.1| 10730| +| 4| DataNode|Running| 127.0.0.1| 10731| +| 5| DataNode|Running| 127.0.0.1| 10732| ++------+----------+-------+---------------+------------+ +Total line number = 6 +It costs 0.012s +``` + +Show the details of all nodes: +``` +SHOW CLUSTER DETAILS +``` + +Eg: +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | | +| 1|ConfigNode|Running| 127.0.0.1| 10711| 10721| | | | | | +| 2|ConfigNode|Running| 127.0.0.1| 10712| 10722| | | | | | +| 3| DataNode|Running| 127.0.0.1| 10730| | 127.0.0.1| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| 127.0.0.1| 10731| | 127.0.0.1| 6668| 10741| 10751| 10761| +| 5| DataNode|Running| 127.0.0.1| 10732| | 127.0.0.1| 6669| 10742| 10752| 10762| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.340s +``` + +## Show Region information + +The cluster uses a SchemaRegion/DataRegion as a unit for schema/data replication and data management. +The Region status and distribution is helpful for system operation and maintenance testing, as shown in the following scenarios: + +- Check which DataNodes are allocated to each Region in the cluster and whether they are balanced. +- Check the partitions allocated to each Region in the cluster and whether they are balanced. +- Check which DataNodes are allocated by the leaders of each RegionGroup in the cluster and whether they are balanced. + +Currently, IoTDB supports show Region information by the following SQL: + +- `SHOW REGIONS`: Show distribution of all Regions. +- `SHOW SCHEMA REGIONS`: Show distribution of all SchemaRegions. +- `SHOW DATA REGIONS`: Show distribution of all DataRegions. +- `SHOW (DATA|SCHEMA)? REGIONS OF DATABASE `: Show Region distribution of specified StorageGroups. +- `SHOW (DATA|SCHEMA)? REGIONS ON NODEID `: Show Region distribution on specified Nodes. +- `SHOW (DATA|SCHEMA)? REGIONS (OF DATABASE )? (ON NODEID )?`: Show Region distribution of specified StorageGroups on specified Nodes. 
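+
+Note that the database and node filters accept comma-separated lists, e.g. `show regions of database root.sg1, root.sg2` or `show data regions on nodeid 1, 2`, as the examples below also illustrate.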
+ +Show distribution of all Regions: +``` +IoTDB> show regions ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0| DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:20.011| +| 2| DataRegion|Running|root.sg2| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:20.395| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.637| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 12 +It costs 0.165s +``` + +The SeriesSlotNum refers to the number of the seriesSlots in the region. In the same light, the TimeSlotNum means the number of the timeSlots in the region. 
+ +Show the distribution of SchemaRegions or DataRegions: +``` +IoTDB> show data regions ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0| DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:20.011| +| 2| DataRegion|Running|root.sg2| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:20.395| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.011s + +IoTDB> show schema regions ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.637| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.012s +``` + +Show Region distribution of specified DataBases: +``` +IoTDB> show regions of database root.sg1 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+-- -----+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0| DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.007s + +IoTDB> 
show regions of database root.sg1, root.sg2 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0| DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:20.011| +| 2| DataRegion|Running|root.sg2| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:20.395| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.637| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 12 +It costs 0.009s + +IoTDB> show data regions of database root.sg1, root.sg2 ++--------+----------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+----------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0|DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0|DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0|DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 2|DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2|DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:20.011| +| 2|DataRegion|Running|root.sg2| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:20.395| ++--------+----------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.007s + +IoTDB> show schema regions of database root.sg1, root.sg2 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| 
Leader|2023-03-07T17:32:18.398| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.637| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.009s +``` + +Show Region distribution on specified Nodes: +``` +IoTDB> show regions on nodeid 1 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 4 +It costs 0.165s + +IoTDB> show regions on nodeid 1, 2 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:19.011| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 8 +It costs 0.165s +``` + +Show Region distribution of specified StorageGroups on specified Nodes: +``` +IoTDB> show regions of database root.sg1 on nodeid 1 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| 
++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 2 +It costs 0.165s + +IoTDB> show data regions of database root.sg1, root.sg2 on nodeid 1, 2 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:19.011| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 4 +It costs 0.165s +``` + + +### Region status definition +Region inherits the status of the DataNode where the Region resides. And Region states are defined as follows: + +- **Running**: The DataNode where the Region resides is running properly, the Region is readable and writable. +- **Unknown**: The DataNode where the Region resides doesn't report heartbeat properly, the ConfigNode considers the Region is unreadable and un-writable. +- **Removing**: The DataNode where the Region resides is being removed from the cluster, the Region is unreadable and un-writable. +- **ReadOnly**: The available disk space of the DataNode where the Region resides is lower than the disk_warning_threshold(5% by default). The Region is readable but un-writable and cannot synchronize data. + +**The status switchover of a Region doesn't affect the belonged RegionGroup**, +when setting up a multi-replica cluster(i.e. the number of schema replica and data replica is greater than 1), +other Running Regions of the same RegionGroup ensure the high availability of RegionGroup. + +**For a RegionGroup:** +- It's readable, writable and deletable if and only if more than half of its Regions are Running +- It's unreadable, un-writable and un-deletable when the number of its Running Regions is less than half + +## Show cluster slots information + +The cluster uses partitions for schema and data arrangement, the partition defined as follows: + +- `SchemaPartition`: SeriesSlot +- `DataPartition`: + +More details can be found in the [Cluster-Concept](./Cluster-Concept.md) document. + +The cluster slots information can be shown by the following SQLs: + +### Show the DataRegion where a DataPartition resides in + +Show the DataRegion where a DataPartition of a certain database or device resides in: + +- `SHOW DATA REGIONID WHERE (DATABASE=root.xxx |DEVICE=root.xxx.xxx) (AND TIME=xxxxx)?` + +Specifications: + +1. "DEVICE" corresponds to a unique SeriesSlot for the device path, while "TIME" corresponds to a unique SeriesTimeSlot for either a timestamp or a universal time. + +2. "DATABASE" and "DEVICE" must begin with "root". If the path does not exist, it will return empty instead of reporting an error, as will be seen below. + +3. Currently, "DATABASE" and "DEVICE" do not support wildcard matching or multiple queries. 
If it contains a wildcard character(such as * or **) or multiple DATABASE and DEVICE, an error will be reported, as will be seen below. + +4. "TIME" supports both timestamps and universal dates. For timestamp, it must be greater than or equal to 0. For universal time, it need to be no earlier than 1970-01-01 00:00:00. + +Eg: +``` +IoTDB> show data regionid where device=root.sg.m1.d1 ++--------+ +|RegionId| ++--------+ +| 1| +| 2| ++--------+ +Total line number = 2 +It costs 0.006s + +IoTDB> show data regionid where device=root.sg.m1.d1 and time=604800000 ++--------+ +|RegionId| ++--------+ +| 1| ++--------+ +Total line number = 1 +It costs 0.006s + +IoTDB> show data regionid where device=root.sg.m1.d1 and time=1970-01-08T00:00:00.000 ++--------+ +|RegionId| ++--------+ +| 1| ++--------+ +Total line number = 1 +It costs 0.006s + +IoTDB> show data regionid where database=root.sg ++--------+ +|RegionId| ++--------+ +| 1| +| 2| ++--------+ +Total line number = 2 +It costs 0.006s + +IoTDB> show data regionid where database=root.sg and time=604800000 ++--------+ +|RegionId| ++--------+ +| 1| ++--------+ +Total line number = 1 +It costs 0.006s + +IoTDB> show data regionid where database=root.sg and time=1970-01-08T00:00:00.000 ++--------+ +|RegionId| ++--------+ +| 1| ++--------+ +Total line number = 1 +It costs 0.006s +``` + +### Show the SchemaRegion where a SchemaPartition resides in + +Show the SchemaRegion where a DataPartition of a certain database or device resides in: + +- `SHOW SCHEMA REGIONID WHERE (DATABASE=root.xxx | DEVICE=root.xxx.xxx)` + +Eg: +``` +IoTDB> show schema regionid where device=root.sg.m1.d2 ++--------+ +|RegionId| ++--------+ +| 0| ++--------+ +Total line number = 1 +It costs 0.007s + +IoTDB> show schema regionid where database=root.sg ++--------+ +|RegionId| ++--------+ +| 0| ++--------+ +Total line number = 1 +It costs 0.007s +``` + +### Show Database's series slots + +Show the data/schema series slots related to a database: +- `SHOW (DATA|SCHEMA) SERIESSLOTID WHERE DATABASE=root.xxx` + +Eg: +``` +IoTDB> show data seriesslotid where database = root.sg ++------------+ +|SeriesSlotId| ++------------+ +| 5286| ++------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> show schema seriesslotid where database = root.sg ++------------+ +|SeriesSlotId| ++------------+ +| 5286| ++------------+ +Total line number = 1 +It costs 0.006s +``` + +### Show the time partition under filtering conditions. + +Show the TimePartition of a certain device, database, or DataRegion. + +- `SHOW TIMEPARTITION WHERE (DEVICE=root.a.b |REGIONID = r0 | DATABASE=root.xxx) (AND STARTTIME=t1)?(AND ENDTIME=t2)?` + +Specifications: + +1. TimePartition is short for SeriesTimeSlotId. +2. If REGIONID is the Id of schemaRegion, return empty instead of reporting an error. +3. REGIONID do not support multiple queries. If it contains multiple REGIONID, an error will be reported, as will be seen below. +4. "STARTTIME" and "ENDTIME" support both timestamps and universal dates. For timestamp, it must be greater than or equal to 0. For universal time, it need to be no earlier than 1970-01-01 00:00:00. +5. The StartTime in the returned result is the starting time of the TimePartition's corresponding time interval. 
+ +Eg: +``` +IoTDB> show timePartition where device=root.sg.m1.d1 ++-------------------------------------+ +|TimePartition| StartTime| ++-------------------------------------+ +| 0|1970-01-01T00:00:00.000| ++-------------------------------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> show timePartition where regionId = 1 ++-------------------------------------+ +|TimePartition| StartTime| ++-------------------------------------+ +| 0|1970-01-01T00:00:00.000| ++-------------------------------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> show timePartition where database = root.sg ++-------------------------------------+ +|TimePartition| StartTime| ++-------------------------------------+ +| 0|1970-01-01T00:00:00.000| ++-------------------------------------+ +| 1|1970-01-08T00:00:00.000| ++-------------------------------------+ +Total line number = 2 +It costs 0.007s +``` +#### Count the time partition under filtering conditions. + +Count the TimePartition of a certain device, database, or DataRegion. + +- `COUNT TIMEPARTITION WHERE (DEVICE=root.a.b |REGIONID = r0 | DATABASE=root.xxx) (AND STARTTIME=t1)?(AND ENDTIME=t2)?` + +``` +IoTDB> count timePartition where device=root.sg.m1.d1 ++--------------------+ +|count(timePartition)| ++--------------------+ +| 1| ++--------------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> count timePartition where regionId = 1 ++--------------------+ +|count(timePartition)| ++--------------------+ +| 1| ++--------------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> count timePartition where database = root.sg ++--------------------+ +|count(timePartition)| ++--------------------+ +| 2| ++--------------------+ +Total line number = 1 +It costs 0.007s +``` + + +## Migrate Region +The following sql can be applied to manually migrate a region, for load balancing or other purposes. 
+``` +MIGRATE REGION FROM TO +``` +Eg: +``` +IoTDB> SHOW REGIONS ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +|RegionId| Type| Status| Database|SeriesSlotId|TimeSlotId|DataNodeId|RpcAddress|RpcPort| Role| ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 3| 127.0.0.1| 6670| Leader| +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 4| 127.0.0.1| 6681|Follower| +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 5| 127.0.0.1| 6668|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 1| 127.0.0.1| 6667|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 3| 127.0.0.1| 6670|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 7| 127.0.0.1| 6669| Leader| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 3| 127.0.0.1| 6670| Leader| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 4| 127.0.0.1| 6681|Follower| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 5| 127.0.0.1| 6668|Follower| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 1| 127.0.0.1| 6667|Follower| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 5| 127.0.0.1| 6668| Leader| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 7| 127.0.0.1| 6669|Follower| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 3| 127.0.0.1| 6670|Follower| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 4| 127.0.0.1| 6681| Leader| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 7| 127.0.0.1| 6669|Follower| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 1| 127.0.0.1| 6667| Leader| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 4| 127.0.0.1| 6681|Follower| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 5| 127.0.0.1| 6668|Follower| ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +Total line number = 18 +It costs 0.161s + +IoTDB> MIGRATE REGION 1 FROM 3 TO 4 +Msg: The statement is executed successfully. 
+ +IoTDB> SHOW REGIONS ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +|RegionId| Type| Status| Database|SeriesSlotId|TimeSlotId|DataNodeId|RpcAddress|RpcPort| Role| ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 3| 127.0.0.1| 6670| Leader| +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 4| 127.0.0.1| 6681|Follower| +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 5| 127.0.0.1| 6668|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 1| 127.0.0.1| 6667|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 4| 127.0.0.1| 6681|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 7| 127.0.0.1| 6669| Leader| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 3| 127.0.0.1| 6670| Leader| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 4| 127.0.0.1| 6681|Follower| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 5| 127.0.0.1| 6668|Follower| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 1| 127.0.0.1| 6667|Follower| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 5| 127.0.0.1| 6668| Leader| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 7| 127.0.0.1| 6669|Follower| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 3| 127.0.0.1| 6670|Follower| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 4| 127.0.0.1| 6681| Leader| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 7| 127.0.0.1| 6669|Follower| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 1| 127.0.0.1| 6667| Leader| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 4| 127.0.0.1| 6681|Follower| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 5| 127.0.0.1| 6668|Follower| ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +Total line number = 18 +It costs 0.165s +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Setup.md b/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Setup.md new file mode 100644 index 00000000..58fd391b --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Setup.md @@ -0,0 +1,447 @@ + + +# Cluster Setup + +## 1. Purpose + +This document describes how to install and start IoTDB Cluster (1.0.0). + +## 2. Prerequisites + +1. JDK>=1.8. +2. Max open file 65535. +3. Disable the swap memory. +4. Ensure that data/confignode directory has been cleared when starting ConfigNode for the first time, +and data/datanode directory has been cleared when starting DataNode for the first time +5. Turn off the firewall of the server if the entire cluster is in a trusted environment. +6. By default, IoTDB Cluster will use ports 10710, 10720 for the ConfigNode and +6667, 10730, 10740, 10750 and 10760 for the DataNode. +Please make sure those ports are not occupied, or you will modify the ports in configuration files. + +## 3. Get the Installation Package + +You can either download the binary release files (see Chap 3.1) or compile with source code (see Chap 3.2). + +### 3.1 Download the binary distribution + +1. Open our website [Download Page](https://iotdb.apache.org/Download/). +2. Download the binary distribution. +3. Decompress to get the apache-iotdb-1.0.0-all-bin directory. + +### 3.2 Compile with source code + +#### 3.2.1 Download the source code + +**Git** +``` +git clone https://github.com/apache/iotdb.git +git checkout v1.0.0 +``` + +**Website** +1. 
Open our website [Download Page](https://iotdb.apache.org/Download/). +2. Download the source code. +3. Decompress to get the apache-iotdb-1.0.0 directory. + +#### 3.2.2 Compile source code + +Under the source root folder: +``` +mvn clean package -pl distribution -am -DskipTests +``` + +Then you will get the binary distribution under +**distribution/target/apache-iotdb-1.0.0-SNAPSHOT-all-bin/apache-iotdb-1.0.0-SNAPSHOT-all-bin**. + +## 4. Binary Distribution Content + +| **Folder** | **Description** | +|-------------------------|---------------------------------------------------------------------------------------------------| +| conf | Configuration files folder, contains configuration files of ConfigNode, DataNode, JMX and logback | +| data | Data files folder, contains data files of ConfigNode and DataNode | +| lib | Jar files folder | +| licenses | Licenses files folder | +| logs | Logs files folder, contains logs files of ConfigNode and DataNode | +| sbin | Shell files folder, contains start/stop/remove shell of ConfigNode and DataNode, cli shell | +| tools | System tools | + +## 5. Cluster Installation and Configuration + +### 5.1 Cluster Installation + +`apache-iotdb-1.0.0-SNAPSHOT-all-bin` contains both the ConfigNode and the DataNode. +Please deploy the files to all servers of your target cluster. +A best practice is deploying the files into the same directory in all servers. + +If you want to try the cluster mode on one server, please read +[Cluster Quick Start](https://iotdb.apache.org/UserGuide/Master/QuickStart/ClusterQuickStart.html). + +### 5.2 Cluster Configuration + +We need to modify the configurations on each server. +Therefore, login each server and switch the working directory to `apache-iotdb-1.0.0-SNAPSHOT-all-bin`. +The configuration files are stored in the `./conf` directory. + +For all ConfigNode servers, we need to modify the common configuration (see Chap 5.2.1) +and ConfigNode configuration (see Chap 5.2.2). + +For all DataNode servers, we need to modify the common configuration (see Chap 5.2.1) +and DataNode configuration (see Chap 5.2.3). + +#### 5.2.1 Common configuration + +Open the common configuration file ./conf/iotdb-system.properties, +and set the following parameters base on the +[Deployment Recommendation](https://iotdb.apache.org/UserGuide/Master/Cluster/Deployment-Recommendation.html): + +| **Configuration** | **Description** | **Default** | +|--------------------------------------------|--------------------------------------------------------------------------------------------------------------------|-------------------------------------------------| +| cluster\_name | Cluster name for which the Node to join in | defaultCluster | +| config\_node\_consensus\_protocol\_class | Consensus protocol of ConfigNode | org.apache.iotdb.consensus.ratis.RatisConsensus | +| schema\_replication\_factor | Schema replication factor, no more than DataNode number | 1 | +| schema\_region\_consensus\_protocol\_class | Consensus protocol of schema replicas | org.apache.iotdb.consensus.ratis.RatisConsensus | +| data\_replication\_factor | Data replication factor, no more than DataNode number | 1 | +| data\_region\_consensus\_protocol\_class | Consensus protocol of data replicas. Note that RatisConsensus currently does not support multiple data directories | org.apache.iotdb.consensus.iot.IoTConsensus | + +**Notice: The preceding configuration parameters cannot be changed after the cluster is started. Ensure that the common configurations of all Nodes are the same. 
Otherwise, the Nodes cannot be started.** + +#### 5.2.2 ConfigNode configuration + +Open the ConfigNode configuration file ./conf/iotdb-system.properties, +and set the following parameters based on the IP address and available port of the server or VM: + +| **Configuration** | **Description** | **Default** | **Usage** | +|--------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|-----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| cn\_internal\_address | Internal rpc service address of ConfigNode | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| cn\_internal\_port | Internal rpc service port of ConfigNode | 10710 | Set to any unoccupied port | +| cn\_consensus\_port | ConfigNode replication consensus protocol communication port | 10720 | Set to any unoccupied port | +| cn\_target\_config\_node\_list | ConfigNode address to which the node is connected when it is registered to the cluster. Note that Only one ConfigNode can be configured. | 127.0.0.1:10710 | For Seed-ConfigNode, set to its own cn\_internal\_address:cn\_internal\_port; For other ConfigNodes, set to other one running ConfigNode's cn\_internal\_address:cn\_internal\_port | + +**Notice: The preceding configuration parameters cannot be changed after the node is started. Ensure that all ports are not occupied. Otherwise, the Node cannot be started.** + +#### 5.2.3 DataNode configuration + +Open the DataNode configuration file ./conf/iotdb-system.properties, +and set the following parameters based on the IP address and available port of the server or VM: + +| **Configuration** | **Description** | **Default** | **Usage** | +|-------------------------------------|--------------------------------------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------| +| dn\_rpc\_address | Client RPC Service address | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| dn\_rpc\_port | Client RPC Service port | 6667 | Set to any unoccupied port | +| dn\_internal\_address | Control flow address of DataNode inside cluster | 127.0.0.1 | Set to the IPV4 address or domain name of the server | +| dn\_internal\_port | Control flow port of DataNode inside cluster | 10730 | Set to any unoccupied port | +| dn\_mpp\_data\_exchange\_port | Data flow port of DataNode inside cluster | 10740 | Set to any unoccupied port | +| dn\_data\_region\_consensus\_port | Data replicas communication port for consensus | 10750 | Set to any unoccupied port | +| dn\_schema\_region\_consensus\_port | Schema replicas communication port for consensus | 10760 | Set to any unoccupied port | +| dn\_target\_config\_node\_list | Running ConfigNode of the Cluster | 127.0.0.1:10710 | Set to any running ConfigNode's cn\_internal\_address:cn\_internal\_port. You can set multiple values, separate them with commas(",") | + +**Notice: The preceding configuration parameters cannot be changed after the node is started. Ensure that all ports are not occupied. Otherwise, the Node cannot be started.** + +## 6. Cluster Operation + +### 6.1 Starting the cluster + +This section describes how to start a cluster that includes several ConfigNodes and DataNodes. 
+The cluster can provide services only by starting at least one ConfigNode +and no less than the number of data/schema_replication_factor DataNodes. + +The total process are three steps: + +* Start the Seed-ConfigNode +* Add ConfigNode (Optional) +* Add DataNode + +#### 6.1.1 Start the Seed-ConfigNode + +**The first Node started in the cluster must be ConfigNode. The first started ConfigNode must follow the tutorial in this section.** + +The first ConfigNode to start is the Seed-ConfigNode, which marks the creation of the new cluster. +Before start the Seed-ConfigNode, please open the common configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +|--------------------------------------------|-------------------------------------------------| +| cluster\_name | Is set to the expected name | +| config\_node\_consensus\_protocol\_class | Is set to the expected consensus protocol | +| schema\_replication\_factor | Is set to the expected schema replication count | +| schema\_region\_consensus\_protocol\_class | Is set to the expected consensus protocol | +| data\_replication\_factor | Is set to the expected data replication count | +| data\_region\_consensus\_protocol\_class | Is set to the expected consensus protocol | + +**Notice:** Please set these parameters carefully based on the [Deployment Recommendation](https://iotdb.apache.org/UserGuide/Master/Cluster/Deployment-Recommendation.html). +These parameters are not modifiable after the Node first startup. + +Then open its configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +|--------------------------------|-----------------------------------------------------------------------------------------------------| +| cn\_internal\_address | Is set to the IPV4 address or domain name of the server | +| cn\_internal\_port | The port isn't occupied | +| cn\_consensus\_port | The port isn't occupied | +| cn\_target\_config\_node\_list | Is set to its own internal communication address, which is cn\_internal\_address:cn\_internal\_port | + +After checking, you can run the startup script on the server: + +``` +# Linux foreground +bash ./sbin/start-confignode.sh + +# Linux background +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +For more details about other configuration parameters of ConfigNode, see the +[ConfigNode Configurations](https://iotdb.apache.org/UserGuide/Master/Reference/ConfigNode-Config-Manual.html). + +#### 6.1.2 Add more ConfigNodes (Optional) + +**The ConfigNode who isn't the first one started must follow the tutorial in this section.** + +You can add more ConfigNodes to the cluster to ensure high availability of ConfigNodes. +A common configuration is to add extra two ConfigNodes to make the cluster has three ConfigNodes. + +Ensure that all configuration parameters in the ./conf/iotdb-common.properites are the same as those in the Seed-ConfigNode; +otherwise, it may fail to start or generate runtime errors. 
+Therefore, please check the following parameters in common configuration file: + +| **Configuration** | **Check** | +|--------------------------------------------|----------------------------------------| +| cluster\_name | Is consistent with the Seed-ConfigNode | +| config\_node\_consensus\_protocol\_class | Is consistent with the Seed-ConfigNode | +| schema\_replication\_factor | Is consistent with the Seed-ConfigNode | +| schema\_region\_consensus\_protocol\_class | Is consistent with the Seed-ConfigNode | +| data\_replication\_factor | Is consistent with the Seed-ConfigNode | +| data\_region\_consensus\_protocol\_class | Is consistent with the Seed-ConfigNode | + +Then, please open its configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------| +| cn\_internal\_address | Is set to the IPV4 address or domain name of the server | +| cn\_internal\_port | The port isn't occupied | +| cn\_consensus\_port | The port isn't occupied | +| cn\_target\_config\_node\_list | Is set to the internal communication address of an other running ConfigNode. The internal communication address of the seed ConfigNode is recommended. | + +After checking, you can run the startup script on the server: + +``` +# Linux foreground +bash ./sbin/start-confignode.sh + +# Linux background +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +For more details about other configuration parameters of ConfigNode, see the +[ConfigNode Configurations](https://iotdb.apache.org/UserGuide/Master/Reference/ConfigNode-Config-Manual.html). + +#### 6.1.3 Start DataNode + +**Before adding DataNodes, ensure that there exists at least one ConfigNode is running in the cluster.** + +You can add any number of DataNodes to the cluster. +Before adding a new DataNode, + +please open its common configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +|--------------------------------------------|----------------------------------------| +| cluster\_name | Is consistent with the Seed-ConfigNode | + +Then open its configuration file ./conf/iotdb-system.properties and check the following parameters: + +| **Configuration** | **Check** | +|-------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------| +| dn\_rpc\_address | Is set to the IPV4 address or domain name of the server | +| dn\_rpc\_port | The port isn't occupied | +| dn\_internal\_address | Is set to the IPV4 address or domain name of the server | +| dn\_internal\_port | The port isn't occupied | +| dn\_mpp\_data\_exchange\_port | The port isn't occupied | +| dn\_data\_region\_consensus\_port | The port isn't occupied | +| dn\_schema\_region\_consensus\_port | The port isn't occupied | +| dn\_target\_config\_node\_list | Is set to the internal communication address of other running ConfigNodes. The internal communication address of the seed ConfigNode is recommended. 
| + +After checking, you can run the startup script on the server: + +``` +# Linux foreground +bash ./sbin/start-datanode.sh + +# Linux background +nohup bash ./sbin/start-datanode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-datanode.bat +``` + +For more details about other configuration parameters of DataNode, see the +[DataNode Configurations](https://iotdb.apache.org/UserGuide/Master/Reference/DataNode-Config-Manual.html). + +**Notice: The cluster can provide services only if the number of its DataNodes is no less than the number of replicas(max{schema\_replication\_factor, data\_replication\_factor}).** + +### 6.2 Start Cli + +If the cluster is in local environment, you can directly run the Cli startup script in the ./sbin directory: + +``` +# Linux +./sbin/start-cli.sh + +# Windows +.\sbin\start-cli.bat +``` + +If you want to use the Cli to connect to a cluster in the production environment, +Please read the [Cli manual](https://iotdb.apache.org/UserGuide/Master/QuickStart/Command-Line-Interface.html). + +### 6.3 Verify Cluster + +Use a 3C3D(3 ConfigNodes and 3 DataNodes) as an example. +Assumed that the IP addresses of the 3 ConfigNodes are 192.168.1.10, 192.168.1.11 and 192.168.1.12, and the default ports 10710 and 10720 are used. +Assumed that the IP addresses of the 3 DataNodes are 192.168.1.20, 192.168.1.21 and 192.168.1.22, and the default ports 6667, 10730, 10740, 10750 and 10760 are used. + +After starting the cluster successfully according to chapter 6.1, you can run the `show cluster details` command on the Cli, and you will see the following results: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort| RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 192.168.1.10| 10710| 10720| | | | | | +| 2|ConfigNode|Running| 192.168.1.11| 10710| 10720| | | | | | +| 3|ConfigNode|Running| 192.168.1.12| 10710| 10720| | | | | | +| 1| DataNode|Running| 192.168.1.20| 10730| |192.168.1.20| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| 192.168.1.21| 10730| |192.168.1.21| 6667| 10740| 10750| 10760| +| 5| DataNode|Running| 192.168.1.22| 10730| |192.168.1.22| 6667| 10740| 10750| 10760| ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.012s +``` + +If the status of all Nodes is **Running**, the cluster deployment is successful. +Otherwise, read the run logs of the Node that fails to start and +check the corresponding configuration parameters. + +### 6.4 Stop IoTDB + +This section describes how to manually shut down the ConfigNode or DataNode process of the IoTDB. 
+ +#### 6.4.1 Stop ConfigNode by script + +Run the stop ConfigNode script: + +``` +# Linux +./sbin/stop-confignode.sh + +# Windows +.\sbin\stop-confignode.bat +``` + +#### 6.4.2 Stop DataNode by script + +Run the stop DataNode script: + +``` +# Linux +./sbin/stop-datanode.sh + +# Windows +.\sbin\stop-datanode.bat +``` + +#### 6.4.3 Kill Node process + +Get the process number of the Node: + +``` +jps + +# or + +ps aux | grep iotdb +``` + +Kill the process: + +``` +kill -9 +``` + +**Notice Some ports require root access, in which case use sudo** + +### 6.5 Shrink the Cluster + +This section describes how to remove ConfigNode or DataNode from the cluster. + +#### 6.5.1 Remove ConfigNode + +Before removing a ConfigNode, ensure that there is at least one active ConfigNode in the cluster after the removal. +Run the remove-confignode script on an active ConfigNode: + +``` +# Linux +# Remove the ConfigNode with confignode_id +./sbin/remove-confignode.sh + +# Remove the ConfigNode with address:port +./sbin/remove-confignode.sh : + + +# Windows +# Remove the ConfigNode with confignode_id +.\sbin\remove-confignode.bat + +# Remove the ConfigNode with address:port +.\sbin\remove-confignode.bat : +``` + +#### 6.5.2 Remove DataNode + +Before removing a DataNode, ensure that the cluster has at least the number of data/schema replicas DataNodes. +Run the remove-datanode script on an active DataNode: + +``` +# Linux +# Remove the DataNode with datanode_id +./sbin/remove-datanode.sh + +# Remove the DataNode with rpc address:port +./sbin/remove-datanode.sh : + + +# Windows +# Remove the DataNode with datanode_id +.\sbin\remove-datanode.bat + +# Remove the DataNode with rpc address:port +.\sbin\remove-datanode.bat : +``` + +## 7. FAQ + +See [FAQ](https://iotdb.apache.org/UserGuide/Master/FAQ/FAQ-for-cluster-setup.html) diff --git a/src/UserGuide/V2.0.1/Tree/stage/Cluster/Get-Installation-Package.md b/src/UserGuide/V2.0.1/Tree/stage/Cluster/Get-Installation-Package.md new file mode 100644 index 00000000..9301efe3 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Cluster/Get-Installation-Package.md @@ -0,0 +1,223 @@ + + +# Get Installation Package + +IoTDB provides you three installation methods, you can refer to the following suggestions, choose one of them: + +* Installation from source code. If you need to modify the code yourself, you can use this method. +* Installation from binary files. Download the binary files from the official website. This is the recommended method, in which you will get a binary released package which is out-of-the-box. +* Using Docker. The path to the dockerfile is https://github.com/apache/iotdb/blob/master/docker + +## Prerequisites + +To use IoTDB, you need to have: + +1. Java >= 1.8 (Please make sure the environment path has been set) +2. Maven >= 3.6 (Optional) +3. Set the max open files num as 65535 to avoid "too many open files" problem. + +>Note: If you don't have maven installed, you should replace 'mvn' in the following commands with 'mvnw' or 'mvnw.cmd'. +> +>### Installation from binary files + +You can download the binary file from: +[Download page](https://iotdb.apache.org/Download/) + +## Installation from source code + +You can get the released source code from https://iotdb.apache.org/Download/, or from the git repository https://github.com/apache/iotdb/tree/master +You can download the source code from: + +``` +git clone https://github.com/apache/iotdb.git +``` + +After that, go to the root path of IoTDB. 
If you want to build the version that we have released, you need to create and check out a new branch by command `git checkout -b my_{project.version} v{project.version}`. E.g., you want to build the version `0.12.4`, you can execute this command to make it: + +```shell +> git checkout -b my_0.12.4 v0.12.4 +``` + +Then you can execute this command to build the version that you want: + +``` +> mvn clean package -DskipTests +``` + +Then the binary version (including both server and client) can be found at **distribution/target/apache-iotdb-{project.version}-bin.zip** + +> NOTE: Directories "thrift/target/generated-sources/thrift" and "antlr/target/generated-sources/antlr4" need to be added to sources roots to avoid compilation errors in IDE. + +If you would like to build the IoTDB server, you can run the following command under the root path of iotdb: + +``` +> mvn clean package -pl iotdb-core/datanode -am -DskipTests +``` + +After build, the IoTDB server will be at the folder "server/target/iotdb-server-{project.version}". + +If you would like to build a module, you can execute command `mvn clean package -pl {module.name} -am -DskipTests` under the root path of IoTDB. +If you need the jar with dependencies, you can add parameter `-P get-jar-with-dependencies` after the command. E.g., If you need the jar of jdbc with dependencies, you can execute this command: + +```shell +> mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies +``` + +Then you can find it under the path `{module.name}/target`. + +## Installation by Docker + +Apache IoTDB' Docker image is released on [https://hub.docker.com/r/apache/iotdb](https://hub.docker.com/r/apache/iotdb) +Add environments of docker to update the configurations of Apache IoTDB. + +### Have a try + +```shell +# get IoTDB official image +docker pull apache/iotdb:1.1.0-standalone +# create docker bridge network +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +# create docker container +docker run -d --name iotdb-service \ + --hostname iotdb-service \ + --network iotdb \ + --ip 172.18.0.6 \ + -p 6667:6667 \ + -e cn_internal_address=iotdb-service \ + -e cn_seed_config_node=iotdb-service:10710 \ + -e cn_internal_port=10710 \ + -e cn_consensus_port=10720 \ + -e dn_rpc_address=iotdb-service \ + -e dn_internal_address=iotdb-service \ + -e dn_seed_config_node=iotdb-service:10710 \ + -e dn_mpp_data_exchange_port=10740 \ + -e dn_schema_region_consensus_port=10750 \ + -e dn_data_region_consensus_port=10760 \ + -e dn_rpc_port=6667 \ + apache/iotdb:1.1.0-standalone +# execute SQL +docker exec -ti iotdb-service /iotdb/sbin/start-cli.sh -h iotdb-service +``` + +External access: + +```shell +# is the real IP or domain address rather than the one in docker network, could be 127.0.0.1 within the computer. +$IOTDB_HOME/sbin/start-cli.sh -h -p 6667 +``` + +Notice:The confignode service would fail when restarting this container if the IP Adress of the container has been changed. 
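The same standalone service can also be brought up with Docker Compose. As a minimal sketch (assuming the compose file below has been saved as `docker-compose-standalone.yml`, matching the comment inside it, and that the external `iotdb` network created above already exists):

```shell
# start the standalone ConfigNode + DataNode container in the background
docker-compose -f docker-compose-standalone.yml up -d

# stop and remove the container; the bind-mounted ./data and ./logs directories stay on the host
docker-compose -f docker-compose-standalone.yml down
```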
+ +```yaml +# docker-compose-standalone.yml +version: "3" +services: + iotdb-service: + image: apache/iotdb:1.1.0-standalone + hostname: iotdb-service + container_name: iotdb-service + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb-service + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-service:10710 + - dn_rpc_address=iotdb-service + - dn_internal_address=iotdb-service + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb-service:10710 + volumes: + - ./data:/iotdb/data + - ./logs:/iotdb/logs + networks: + iotdb: + ipv4_address: 172.18.0.6 + +networks: + iotdb: + external: true +``` + +### deploy cluster + +Until now, we support host and overlay networks but haven't supported bridge networks on multiple computers. +Overlay networks see [1C2D](https://github.com/apache/iotdb/tree/master/docker/src/main/DockerCompose/docker-compose-cluster-1c2d.yml) and here are the configurations and operation steps to start an IoTDB cluster with docker using host networks。 + +Suppose that there are three computers of iotdb-1, iotdb-2 and iotdb-3. We called them nodes. +Here is the docker-compose file of iotdb-2, as the sample: + +```yaml +version: "3" +services: + iotdb-confignode: + image: apache/iotdb:1.1.0-confignode + container_name: iotdb-confignode + environment: + - cn_internal_address=iotdb-2 + - cn_seed_config_node=iotdb-1:10710 + - cn_internal_port=10710 + - cn_consensus_port=10720 + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - data_replication_factor=3 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/confignode:/iotdb/data + - ./logs/confignode:/iotdb/logs + network_mode: "host" + + iotdb-datanode: + image: apache/iotdb:1.1.0-datanode + container_name: iotdb-datanode + environment: + - dn_rpc_address=iotdb-2 + - dn_internal_address=iotdb-2 + - dn_seed_config_node=iotdb-1:10710 + - data_replication_factor=3 + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/datanode:/iotdb/data/ + - ./logs/datanode:/iotdb/logs/ + network_mode: "host" +``` + +Notice: + +1. The `dn_seed_config_node` of three nodes must the same and it is the first starting node of `iotdb-1` with the cn_internal_port of 10710。 +2. In this docker-compose file,`iotdb-2` should be replace with the real IP or hostname of each node to generate docker compose files in the other nodes. +3. The services would talk with each other, so they need map the /etc/hosts file or add the `extra_hosts` to the docker compose file. +4. We must start the IoTDB services of `iotdb-1` first at the first time of starting. +5. Stop and remove all the IoTDB services and clean up the `data` and `logs` directories of the 3 nodes,then start the cluster again. 
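As a small illustration of the start order described in the notices above (the compose file name `docker-compose-host.yml` is only an example; use whatever name you saved the per-node file under):

```shell
# 1. On iotdb-1 first (the node referenced by cn_seed_config_node / dn_seed_config_node)
docker-compose -f docker-compose-host.yml up -d

# 2. Then on iotdb-2 and iotdb-3
docker-compose -f docker-compose-host.yml up -d

# 3. On each node, check that the ConfigNode and DataNode containers are running
docker ps --format "table {{.Names}}\t{{.Status}}"
```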
diff --git a/src/UserGuide/V2.0.1/Tree/stage/ClusterQuickStart.md b/src/UserGuide/V2.0.1/Tree/stage/ClusterQuickStart.md new file mode 100644 index 00000000..f12792a3 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/ClusterQuickStart.md @@ -0,0 +1,260 @@ + + +# Cluster Quick Start + +The purpose of this article is to show how to start, expand, and shrink an IoTDB cluster in an easy way. + +See also: +[FAQ](../FAQ/Frequently-asked-questions.md) + + +## Installation and deployment + +As an example, we'd like to start an IoTDB cluster with 3 ConfigNodes and 3 DataNodes(3C3D) with minimum modifications. Thus, +- the cluster name is defaultCluster +- data/schema replica is 1 +- the max heap size of ConfigNode take the 1/4 of the computer +- the max heap size of DataNode take the 1/4 of the computer + +Suppose there are 3 computers(3 nodes we called here) with Linux OS and JDK installed(detail see [Prerequisites](../QuickStart/QuickStart.md)) and IoTDB working directory is `/data/iotdb`. +IP address and configurations is like below: + +| Node IP | 192.168.132.10 | 192.168.132.11 | 192.168.132.12 | +|--------|:---------------|:---------------|:---------------| +| service | ConfigNode | ConfigNode | ConfigNode | +| service | DataNode | DataNode | DataNode | + +Port: + +| Service | ConfigNode | DataNode | +|---|---|---| +|port| 10710, 10720 | 6667, 10730, 10740, 10750, 10760 | + +**illustration:** +- We could use IP address or hostname/domain to set up an IoTDB cluster, then we would take IP address. If using hostname/domain, `/etc/hosts` must be set well. +- JVM memory configuration: `ON_HEAP_MEMORY` in `confignode-env.sh` and `datanode-env.sh`, equal to or greater than 1G is recommended. It's enough for ConfigNode taking 1~2G. The memory taking of DataNode is decided by the inputing and querying data. + +### download +In every computer, [Download](https://iotdb.apache.org/Download/) the IoTDB install package and extract it to working directory of `/data/iotdb`. +Then get the directory tree: +```shell +/data/iotdb/ +├── conf # configuration files +├── lib # jar library +├── sbin # start/stop shell etc. +└── tools # other tools +``` + +### configuration + +Configuration files are in `/data/iotdb/conf`. +Modify the specified configuration file according to the table below: + +| Configuration| Configuration Option | IP:192.168.132.10 | IP:192.168.132.11 | IP:192.168.132.12 | +|------------|:-------------------------------|----------------------|----------------------|:---------------------| +| iotdb-system.properties | cn\_internal\_address | 192.168.132.10 | 192.168.132.11 | 192.168.132.12 | +| iotdb-system.properties | cn_seed_config_node | 192.168.132.10:10710 | 192.168.132.10:10710 | 192.168.132.10:10710 | +| iotdb-system.properties | dn\_rpc\_address | 192.168.132.10 | 192.168.132.11 | 192.168.132.12 | +| iotdb-system.properties | dn\_internal\_address | 192.168.132.10 | 192.168.132.11 | 192.168.132.12 | +| iotdb-system.properties | dn_seed_config_node | 192.168.132.10:10710 | 192.168.132.10:10710 | 192.168.132.10:10710 | + +**Notice:** +It's recommended that the configurations of iotdb-system.properties and the heap size of JVM in all nodes are the same. + +### start IoTDB cluster +Before starting the IoTDB cluster, make sure the configurations are correct and there is no any data in the working directory. + +#### start the first node +That is `cn_seed_config_node` in above configuration table. +Execute these commands below in node of `192.168.132.10`. 
+```shell +cd /data/iotdb +# start ConfigNode and DataNode services +sbin/start-standalone.sh + +# check DataNode logs to see whether starting successfully or not +tail -f logs/log_datanode_all.log +# expecting statements like below +# 2023-07-21 20:26:01,881 [main] INFO o.a.i.db.service.DataNode:192 - Congratulation, IoTDB DataNode is set up successfully. Now, enjoy yourself! +``` +If there is no such logs mentioned abolve or there are some `Exception`s in log files, it's failed. Then please check `log_confignode_all.log` and `log_datanode_all.log` in directory of `/data/iotdb/logs`. + +**Notice**: +- Make sure the first node, especially the first ConfigNode that `cn_seed_config_node` specified, starting successfully, and then start the other services. +- If starting failed,it's necessary to do [cleanup](#【reference】cleanup) before starting again. +- How to start service ConfigNode or DataNode alone: +```shell +# start ConfigNode alone in daemon +sbin/start-confignode.sh -d +# start DataNode alone in daemon +sbin/start-datanode.sh -d +``` + +#### start service ConfigNode and DataNode in other nodes +Execute commands below in both 192.168.132.11 and 192.168.132.12: +```shell +cd /data/iotdb +# start service ConfigNode and DataNode +sbin/start-standalone.sh +``` +If starting failed, it's necessary to do [cleanup](#【reference】cleanup) in all nodes, and then doging all again from starting the first node. + +#### check the cluster status +If everything goes well, the cluster will start successfully. Then, we can start the Cli for verification. +```shell +/data/iotdb/sbin/start-cli.sh -h 192.168.132.10 +IoTDB>show cluster; +# example result: ++------+----------+-------+---------------+------------+-------+---------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version|BuildInfo| ++------+----------+-------+---------------+------------+-------+---------+ +| 0|ConfigNode|Running| 192.168.132.10| 10710|1.x.x | xxxxxxx| +| 1| DataNode|Running| 192.168.132.10| 10730|1.x.x | xxxxxxx| +| 2|ConfigNode|Running| 192.168.132.11| 10710|1.x.x | xxxxxxx| +| 3| DataNode|Running| 192.168.132.11| 10730|1.x.x | xxxxxxx| +| 4|ConfigNode|Running| 192.168.132.12| 10710|1.x.x | xxxxxxx| +| 5| DataNode|Running| 192.168.132.12| 10730|1.x.x | xxxxxxx| ++------+----------+-------+---------------+------------+-------+---------+ +``` +**illustration:** +The IP address of `start-cli.sh -h` could be any IP address of DataNode service. + + +### 【reference】Cleanup +Execute commands in every node: +1. End processes of ConfigNode and DataNode +```shell +# 1. Stop services ConfigNode and DataNode +sbin/stop-standalone.sh + +# 2. Check whether there are IoTDB processes left or not +jps +# 或者 +ps -ef|gerp iotdb + +# 3. If there is any IoTDB process left, kill it +kill -9 +# If there is only 1 IoTDB instance, execue command below to remove all IoTDB process +ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 +``` + +2. Remove directories of data and logs +```shell +cd /data/iotdb +rm -rf data logs +``` +**illustration:** +It's necessary to remove directory of `data` but it's not necessary to remove directory of `logs`, only for convenience. + + +## Expand +`Expand` means add services of ConfigNode or DataNode into an existing IoTDB cluster. + +It's the same as starting the other nodes mentioned above. That is downloading IoTDB install package, extracting, configurating and then starting. The new node here is `192.168.132.13`. 
+**Notice** +- It's must be cleaned up, in other words doing [cleanup](#cleanup) in it. +- `cluster_name` of `iotdb-system.properties` must be the same to the cluster. +- `cn_seed_config_node` and `dn_seed_config_node` must be the same to the cluster. +- The old data wouldn't be moved to the new node but the new data would be. + +### configuration +Modify the specified configuration file according to the table below: + +| Configuration | Configuration Option| IP:192.168.132.13 | +|------------|:-------------------------------|:---------------------| +| iotdb-system.properties | cn\_internal\_address | 192.168.132.13 | +| iotdb-system.properties | cn\_target\_config\_node\_list | 192.168.132.10:10710 | +| iotdb-system.properties | dn\_rpc\_address | 192.168.132.13 | +| iotdb-system.properties | dn\_internal\_address | 192.168.132.13 | +| iotdb-system.properties | dn\_target\_config\_node\_list | 192.168.132.10:10710 | + +### expand +Execute commands below in new node of `192.168.132.13`: +```shell +cd /data/iotdb +# start service ConfigNode and DataNode +sbin/start-standalone.sh +``` + +### check the result +Execute `show cluster` through Cli and the result like below: +```shell +/data/iotdb/sbin/start-cli.sh -h 192.168.132.10 +IoTDB>show cluster; +# example result: ++------+----------+-------+---------------+------------+-------+---------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version|BuildInfo| ++------+----------+-------+---------------+------------+-------+---------+ +| 0|ConfigNode|Running| 192.168.132.10| 10710|1.x.x | xxxxxxx| +| 1| DataNode|Running| 192.168.132.10| 10730|1.x.x | xxxxxxx| +| 2|ConfigNode|Running| 192.168.132.11| 10710|1.x.x | xxxxxxx| +| 3| DataNode|Running| 192.168.132.11| 10730|1.x.x | xxxxxxx| +| 4|ConfigNode|Running| 192.168.132.12| 10710|1.x.x | xxxxxxx| +| 5| DataNode|Running| 192.168.132.12| 10730|1.x.x | xxxxxxx| +| 6|ConfigNode|Running| 192.168.132.13| 10710|1.x.x | xxxxxxx| +| 7| DataNode|Running| 192.168.132.13| 10730|1.x.x | xxxxxxx| ++------+----------+-------+---------------+------------+-------+---------+ +``` + + +## Remove service +`Shrink` means removing a service from the IoTDB cluster. +**Notice:** +- `Shrink` could be done in any node within the cluster +- Any service could be shrinked within cluster. But the DataNode service of the cluster must greater than the data replica of iotdb-system.properties. +- Be patient to wait for the end of shrinking, and then read the guide in logs carefully. 
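Before removing anything, it can help to confirm the replication factors and look up the NodeID of the target node from the CLI. A quick sketch (assuming the `show variables` statement of IoTDB 1.x; any running DataNode address works for `-h`):

```shell
/data/iotdb/sbin/start-cli.sh -h 192.168.132.10
IoTDB> show variables;   # DataReplicationFactor / SchemaReplicationFactor must remain satisfiable after removal
IoTDB> show cluster;     # find the NodeID or address:port of the node to remove
```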
+ +### shrink service ConfigNode +```shell +cd /data/iotdb +# way 1: shrink with ip:port +sbin/remove-confignode.sh 192.168.132.13:10710 + +# way 2: shrink with NodeID of `show cluster` +sbin/remove-confignode.sh 6 +``` + +### shrink service DataNode +```shell +cd /data/iotdb +# way 1: shrink with ip:port +sbin/remove-datanode.sh 192.168.132.13:6667 + +# way 2: shrink with NodeID of `show cluster` +sbin/remove-datanode.sh 7 +``` + +### check the result + +Execute `show cluster` through Cli, the result is like below: +```shell ++------+----------+-------+---------------+------------+-------+---------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version|BuildInfo| ++------+----------+-------+---------------+------------+-------+---------+ +| 0|ConfigNode|Running| 192.168.132.10| 10710|1.x.x | xxxxxxx| +| 1| DataNode|Running| 192.168.132.10| 10730|1.x.x | xxxxxxx| +| 2|ConfigNode|Running| 192.168.132.11| 10710|1.x.x | xxxxxxx| +| 3| DataNode|Running| 192.168.132.11| 10730|1.x.x | xxxxxxx| +| 4|ConfigNode|Running| 192.168.132.12| 10710|1.x.x | xxxxxxx| +| 5| DataNode|Running| 192.168.132.12| 10730|1.x.x | xxxxxxx| ++------+----------+-------+---------------+------------+-------+---------+ +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Command-Line-Interface.md b/src/UserGuide/V2.0.1/Tree/stage/Command-Line-Interface.md new file mode 100644 index 00000000..4d9f6f58 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Command-Line-Interface.md @@ -0,0 +1,285 @@ + + +# Command Line Interface(CLI) + + +IoTDB provides Cli/shell tools for users to interact with IoTDB server in command lines. This document shows how Cli/shell tool works and the meaning of its parameters. + +> Note: In this document, \$IOTDB\_HOME represents the path of the IoTDB installation directory. + +## Installation + +If you use the source code version of IoTDB, then under the root path of IoTDB, execute: + +```shell +> mvn clean package -pl iotdb-client/cli -am -DskipTests -P get-jar-with-dependencies +``` + +After build, the IoTDB Cli will be in the folder "cli/target/iotdb-cli-{project.version}". + +If you download the binary version, then the Cli can be used directly in sbin folder. + +## Running + +### Running Cli + +After installation, there is a default user in IoTDB: `root`, and the +default password is `root`. Users can use this username to try IoTDB Cli/Shell tool. The cli startup script is the `start-cli` file under the \$IOTDB\_HOME/bin folder. When starting the script, you need to specify the IP and PORT. (Make sure the IoTDB cluster is running properly when you use Cli/Shell tool to connect to it.) + +Here is an example where the cluster is started locally and the user has not changed the running port. The default rpc port is +6667
+
If you need to connect to a remote DataNode, or if the RPC port of the running DataNode has been changed, set the corresponding IP and RPC port with -h and -p.
+You also can set your own environment variable at the front of the start script ("/sbin/start-cli.sh" for linux and "/sbin/start-cli.bat" for windows) + +The Linux and MacOS system startup commands are as follows: + +```shell +Shell > bash sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root +``` +The Windows system startup commands are as follows: + +```shell +Shell > sbin\start-cli.bat -h 127.0.0.1 -p 6667 -u root -pw root +``` +After operating these commands, the cli can be started successfully. The successful status will be as follows: + +``` + _____ _________ ______ ______ +|_ _| | _ _ ||_ _ `.|_ _ \ + | | .--.|_/ | | \_| | | `. \ | |_) | + | | / .'`\ \ | | | | | | | __'. + _| |_| \__. | _| |_ _| |_.' /_| |__) | +|_____|'.__.' |_____| |______.'|_______/ version + + +Successfully login at 127.0.0.1:6667 +IoTDB> +``` +Enter ```quit``` or `exit` can exit Cli. + +### Cli Parameters + +|Parameter name|Parameter type|Required| Description| Example | +|:---|:---|:---|:---|:---| +|-disableISO8601 |No parameters | No |If this parameter is set, IoTDB will print the timestamp in digital form|-disableISO8601| +|-h <`host`> |string, no quotation marks|Yes|The IP address of the IoTDB server|-h 10.129.187.21| +|-help|No parameters|No|Print help information for IoTDB|-help| +|-p <`rpcPort`>|int|Yes|The rpc port number of the IoTDB server. IoTDB runs on rpc port 6667 by default|-p 6667| +|-pw <`password`>|string, no quotation marks|No|The password used for IoTDB to connect to the server. If no password is entered, IoTDB will ask for password in Cli command|-pw root| +|-u <`username`>|string, no quotation marks|Yes|User name used for IoTDB to connect the server|-u root| +|-maxPRC <`maxPrintRowCount`>|int|No|Set the maximum number of rows that IoTDB returns|-maxPRC 10| +|-e <`execute`> |string|No|manipulate IoTDB in batches without entering cli input mode|-e "show databases"| +|-c | empty | No | If the server enables `rpc_thrift_compression_enable=true`, then cli must use `-c` | -c | + +Following is a cli command which connects the host with IP +10.129.187.21, rpc port 6667, username "root", password "root", and prints the timestamp in digital form. The maximum number of lines displayed on the IoTDB command line is 10. + +The Linux and MacOS system startup commands are as follows: + +```shell +Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 +``` +The Windows system startup commands are as follows: + +```shell +Shell > sbin\start-cli.bat -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 +``` + +### CLI Special Command +Special commands of Cli are below. + +| Command | Description / Example | +|:---|:---| +| `set time_display_type=xxx` | eg. long, default, ISO8601, yyyy-MM-dd HH:mm:ss | +| `show time_display_type` | show time display type | +| `set time_zone=xxx` | eg. 
+08:00, Asia/Shanghai | +| `show time_zone` | show cli time zone | +| `set fetch_size=xxx` | set fetch size when querying data from server | +| `show fetch_size` | show fetch size | +| `set max_display_num=xxx` | set max lines for cli to output, -1 equals to unlimited | +| `help` | Get hints for CLI special commands | +| `exit/quit` | Exit CLI | + +### Note on using the CLI with OpenID Connect Auth enabled on Server side + +Openid connect (oidc) uses keycloack as the authority authentication service of oidc service + + +#### configuration +The configuration is located in iotdb-system.properties , set the author_provider_class is org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer Openid service is enabled, and the default value is org.apache.iotdb.db.auth.authorizer.LocalFileAuthorizer Indicates that the openid service is not enabled. + +``` +authorizer_provider_class=org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer +``` +If the openid service is turned on, openid_URL is required,openID_url value is http://ip:port/realms/{realmsName} + +``` +openID_url=http://127.0.0.1:8080/realms/iotdb/ +``` +#### keycloack configuration + +1、Download the keycloack file (This tutorial is version 21.1.0) and start keycloack in keycloack/bin + +```shell +Shell >cd bin +Shell >./kc.sh start-dev +``` +2、use url(https://ip:port) login keycloack, the first login needs to create a user +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/login_keycloak.png?raw=true) + +3、Click administration console +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/AdministrationConsole.png?raw=true) + +4、In the master menu on the left, click Create realm and enter Realm name to create a new realm +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_Realm_1.jpg?raw=true) + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_Realm_2.jpg?raw=true) + + +5、Click the menu clients on the left to create clients + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/client.jpg?raw=true) + +6、Click user on the left menu to create user + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/user.jpg?raw=true) + +7、Click the newly created user ID, click the credentials navigation, enter the password and close the temporary option. The configuration of keycloud is completed + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/pwd.jpg?raw=true) + +8、To create a role, click Roles on the left menu and then click the Create Role button to add a role + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role1.jpg?raw=true) + +9、 Enter `iotdb_admin` in the Role Name and click the save button. 
Tip: `iotdb_admin` here cannot be any other name, otherwise even after successful login, you will not have permission to use iotdb's query, insert, create database, add users, roles and other functions + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role2.jpg?raw=true) + +10、Click on the User menu on the left and then click on the user in the user list to add the `iotdb_admin` role we just created for that user + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role3.jpg?raw=true) + +11、 Select Role Mappings, select the `iotdb_admin` role in Assign Role + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role4.jpg?raw=true) + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role5.jpg?raw=true) + + +Tip: If the user role is adjusted, you need to regenerate the token and log in to iotdb again to take effect + +The above steps provide a way for keycloak to log into iotdb. For more ways, please refer to keycloak configuration + +If OIDC is enabled on server side then no username / passwort is needed but a valid Access Token from the OIDC Provider. +So as username you use the token and the password has to be empty, e.g. + +```shell +Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u {my-access-token} -pw "" +``` + +Among them, you need to replace {my access token} (note, including {}) with your token, that is, the value corresponding to access_token. The password is empty and needs to be confirmed again. + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/iotdbpw.jpeg?raw=true) + + +How to get the token is dependent on your OpenID Connect setup and not covered here. +In the simplest case you can get this via the command line with the `passwort-grant`. +For example, if you use keycloack as OIDC and you have a realm with a client `iotdb` defined as public you could use +the following `curl` command to fetch a token (replace all `{}` with appropriate values). 
+ +```shell +curl -X POST "https://{your-keycloack-server}/realms/{your-realm}/protocol/openid-connect/token" \ + -H "Content-Type: application/x-www-form-urlencoded" \ + -d "username={username}" \ + -d "password={password}" \ + -d 'grant_type=password' \ + -d "client_id=iotdb-client" +``` +The response looks something like + +```json +{"access_token":"eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJxMS1XbTBvelE1TzBtUUg4LVNKYXAyWmNONE1tdWNXd25RV0tZeFpKNG93In0.eyJleHAiOjE1OTAzOTgwNzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNjA0ZmYxMDctN2NiNy00NTRmLWIwYmQtY2M2ZDQwMjFiNGU4IiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiYWNjb3VudCIsInN1YiI6ImJhMzJlNDcxLWM3NzItNGIzMy04ZGE2LTZmZThhY2RhMDA3MyIsInR5cCI6IkJlYXJlciIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsImFjciI6IjEiLCJhbGxvd2VkLW9yaWdpbnMiOlsibG9jYWxob3N0OjgwODAiXSwicmVhbG1fYWNjZXNzIjp7InJvbGVzIjpbIm9mZmxpbmVfYWNjZXNzIiwidW1hX2F1dGhvcml6YXRpb24iLCJpb3RkYl9hZG1pbiJdfSwicmVzb3VyY2VfYWNjZXNzIjp7ImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoiZW1haWwgcHJvZmlsZSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJ1c2VyIn0.nwbrJkWdCNjzFrTDwKNuV5h9dDMg5ytRKGOXmFIajpfsbOutJytjWTCB2WpA8E1YI3KM6gU6Jx7cd7u0oPo5syHhfCz119n_wBiDnyTZkFOAPsx0M2z20kvBLN9k36_VfuCMFUeddJjO31MeLTmxB0UKg2VkxdczmzMH3pnalhxqpnWWk3GnrRrhAf2sZog0foH4Ae3Ks0lYtYzaWK_Yo7E4Px42-gJpohy3JevOC44aJ4auzJR1RBj9LUbgcRinkBy0JLi6XXiYznSC2V485CSBHW3sseXn7pSXQADhnmGQrLfFGO5ZljmPO18eFJaimdjvgSChsrlSEmTDDsoo5Q","expires_in":300,"refresh_expires_in":1800,"refresh_token":"eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJhMzZlMGU0NC02MWNmLTQ5NmMtOGRlZi03NTkwNjQ5MzQzMjEifQ.eyJleHAiOjE1OTAzOTk1NzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNmMxNTBiY2EtYmE5NC00NTgxLWEwODEtYjI2YzhhMmI5YmZmIiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwic3ViIjoiYmEzMmU0NzEtYzc3Mi00YjMzLThkYTYtNmZlOGFjZGEwMDczIiwidHlwIjoiUmVmcmVzaCIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsInNjb3BlIjoiZW1haWwgcHJvZmlsZSJ9.ayNpXdNX28qahodX1zowrMGiUCw2AodlHBQFqr8Ui7c","token_type":"bearer","not-before-policy":0,"session_state":"060d2862-14ed-42fe-baf7-8d1f784657f1","scope":"email profile"} +``` + +The interesting part here is the access token with the key `access_token`. +This has to be passed as username (with parameter `-u`) and empty password to the CLI. + +### Batch Operation of Cli + +-e parameter is designed for the Cli/shell tool in the situation where you would like to manipulate IoTDB in batches through scripts. By using the -e parameter, you can operate IoTDB without entering the cli's input mode. + +In order to avoid confusion between statements and other parameters, the current version only supports the -e parameter as the last parameter. + +The usage of -e parameter for Cli/shell is as follows: + +The Linux and MacOS system commands: + +```shell +Shell > bash sbin/start-cli.sh -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} +``` + +The Windows system commands: + +```shell +Shell > sbin\start-cli.bat -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} +``` + +In the Windows environment, the SQL statement of the -e parameter needs to use ` `` ` to replace `" "` + +In order to better explain the use of -e parameter, take following as an example(On linux system). 
+ +Suppose you want to create a database root.demo to a newly launched IoTDB, create a timeseries root.demo.s1 and insert three data points into it. With -e parameter, you could write a shell like this: + +```shell +# !/bin/bash + +host=127.0.0.1 +rpcPort=6667 +user=root +pass=root + +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "create database root.demo" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "create timeseries root.demo.s1 WITH DATATYPE=INT32, ENCODING=RLE" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(1,10)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(2,11)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(3,12)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "select s1 from root.demo" +``` + +The results are shown in the figure, which are consistent with the Cli and jdbc operations. + +```shell + Shell > bash ./shell.sh ++-----------------------------+------------+ +| Time|root.demo.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 10| +|1970-01-01T08:00:00.002+08:00| 11| +|1970-01-01T08:00:00.003+08:00| 12| ++-----------------------------+------------+ +Total line number = 3 +It costs 0.267s +``` + +It should be noted that the use of the -e parameter in shell scripts requires attention to the escaping of special characters. diff --git a/src/UserGuide/V2.0.1/Tree/stage/Data-Import-Export-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Data-Import-Export-Tool.md new file mode 100644 index 00000000..84ee41b6 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Data-Import-Export-Tool.md @@ -0,0 +1,278 @@ + + +# Data Import Export Script + +IoTDB provides data import and export scripts (tools/export-data, tools/import-data, supported in versions 1.3.2 and above; for historical versions, tools/export-csv, tools/import-csv scripts can be used, see the reference link for usage [Document](./TsFile-Import-Export-Tool.md) ), which are used to facilitate the interaction between IoTDB internal data and external files, suitable for batch operations of single files or directories. + + +## Supported Data Formats + +- **CSV** : Plain text format for storing formatted data, which must be constructed according to the specified CSV format below. +- **SQL** : Files containing custom SQL statements. 
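For orientation, the two formats look roughly like the snippets below (series paths, timestamps and values are purely illustrative; the exact layouts are shown in the export examples later in this document):

```Bash
# CSV: the first column is Time, the remaining columns are full time series paths
Time,root.demo.d1.s1,root.demo.d1.s2
2024-07-29T18:37:18.700+08:00,1.0,true

# SQL: one INSERT statement per line
INSERT INTO root.demo.d1(TIMESTAMP,s1,s2) VALUES(1722249629831,1.0,true);
```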
+ +## export-data Script (Data Export) + +### Command + +```Bash +# Unix/OS X +>tools/export-data.sh -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] + +# Windows +>tools\export-data.bat -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] +``` + +Parameter Introduction: + +| Parameter | Definition | Required | Default | +|:-------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------|:-------------------------| +| -h | Database IP address | No | 127.0.0.1 | +| -p | Database port | No | 6667 | +| -u | Database connection username | No | root | +| -pw | Database connection password | No | root | +| -t | Output path for the exported CSV or SQL file(The parameter for V1.3.2 is `-td`) | Yes | | +| -datatype | Whether to print the corresponding data type behind the time series in the CSV file header, options are true or false | No | true | +| -q | Directly specify the query statement to be executed in the command (currently only supports some statements, see the table below for details).
Note: Exactly one of -q and -s must be specified; -q takes effect if both are specified. For detailed examples of supported SQL statements, please refer to the "SQL Statement Support Details" below. | No | |
| -s | Specify an SQL file, which may contain one or more SQL statements. If there are multiple SQL statements, they should be separated by newlines. Each SQL statement corresponds to one or more output CSV or SQL files.
Note: Exactly one of -q and -s must be specified; -q takes effect if both are specified. For detailed examples of supported SQL statements, please refer to the "SQL Statement Support Details" below. | No | |
| -type | Specify the type of the exported file, options are csv or sql | No | csv |
| -tf | Specify the time format. The time format must comply with the [ISO 8601](https://calendars.wikia.org/wiki/ISO_8601) standard or be a timestamp.
Note: Only effective when -type is csv | No | yyyy-MM-dd HH:mm:ss.SSSz |
| -lpf | Specify the maximum number of lines per exported dump file (the parameter for V1.3.2 is `-linesPerFile`) | No | 10000 |
| -timeout | Specify the timeout for session queries in milliseconds | No | -1 |

SQL Statement Support Rules:

1. Only query statements are supported; non-query statements (such as metadata management, system management, etc.) are not supported. For unsupported SQL, the program automatically skips it and outputs an error message.
2. In the current version, only raw-data queries can be exported. Queries containing group by clauses, aggregate functions, UDFs, or operator expressions cannot be exported as SQL. When exporting raw data from multiple devices, use the align by device clause. Detailed examples are as follows:

| Query | Supported for Export | Example |
|-------|----------------------|---------|
| Raw data single device query | Supported | select * from root.s_0.d_0 |
| Raw data multi-device query (align by device) | Supported | select * from root.** align by device |
| Raw data multi-device query (without align by device) | Unsupported | select * from root.**
select * from root.s_0.* | + +### Running Examples + +- Export all data within a certain SQL execution range to a CSV file. +```Bash + # Unix/OS X + >tools/export-data.sh -t ./data/ -q 'select * from root.stock.**' + # Windows + >tools/export-data.bat -t ./data/ -q 'select * from root.stock.**' + ``` + +- Export Results + ```Bash + Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice + 2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 + 2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 + ``` +- All data within the scope of all SQL statements in the SQL file is exported to CSV files. + ```Bash + # Unix/OS X + >tools/export-data.sh -t ./data/ -s export.sql + # Windows + >tools/export-data.bat -t ./data/ -s export.sql + ``` + +- Contents of export.sql File (Pointed to by -s Parameter) + ```SQL + select * from root.stock.** limit 100 + select * from root.db.** limit 100 + ``` + +- Export Result File 1 + ```Bash + Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice + 2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 + 2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 + ``` + +- Export Result File 2 + ```Bash + Time,root.db.Random.RandomBoolean + 2024-07-22T17:16:05.820+08:00,true + 2024-07-22T17:16:02.597+08:00,false + ``` +- Export Data in SQL File to SQL Statements with Aligned Format + ```Bash + # Unix/OS X + >tools/export-data.sh -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true + # Windows + >tools/export-data.bat -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true + ``` + +- Export Results + ```Bash + INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249629831,0.62308747,2.0,0.012206747854849653,-6.0,false,0.14164352); + INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249630834,0.7520042,3.0,0.22760657101910464,-5.0,true,0.089064896); + INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249631835,0.3981064,3.0,0.6254559288663467,-6.0,false,0.9767922); + ``` +- Export Data in a Certain SQL Execution Range to a CSV File with Specified Time Format and Data Types + ```Bash + # Unix/OS X + >tools/export-data.sh -t ./data/ -tf 'yyyy-MM-dd HH:mm:ss' -datatype true -q "select * from root.stock.**" -type csv + # Windows + >tools/export-data.bat -t ./data/ -tf 'yyyy-MM-dd HH:mm:ss' -datatype true -q "select * from root.stock.**" -type csv + ``` + +- Export Results + ```Bash + Time,root.stock.Legacy.0700HK.L1_BidPrice(DOUBLE),root.stock.Legacy.0700HK.Type(DOUBLE),root.stock.Legacy.0700HK.L1_BidSize(DOUBLE),root.stock.Legacy.0700HK.Domain(DOUBLE),root.stock.Legacy.0700HK.L1_BuyNo(BOOLEAN),root.stock.Legacy.0700HK.L1_AskPrice(DOUBLE) + 2024-07-30 10:33:55,0.44574088,3.0,0.21476832811611501,-4.0,true,0.5951748 + 2024-07-30 10:33:56,0.6880933,3.0,0.6289119476165305,-5.0,false,0.114634395 + ``` + +## import-data Script (Data Import) + +### Import File Examples + 
+#### CSV File Example + +Note that before importing CSV data, special characters need to be handled as follows: + +1. If the text type field contains special characters such as `,`, it should be escaped with `\`. +2. You can import times in formats like `yyyy-MM-dd'T'HH:mm:ss`, `yyyy-MM-dd HH:mm:ss`, or `yyyy-MM-dd'T'HH:mm:ss.SSSZ`. +3. The time column `Time` should always be in the first column. + +Example 1: Time Aligned, No Data Types in Header + +```SQL +Time,root.test.t1.str,root.test.t2.str,root.test.t2.var +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,"123",, +``` + +Example 2: Time Aligned, Data Types in Header(Text type data supports double quotation marks and non double quotation marks) + +```SQL +Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.var(INT32) +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,123,hello world,123 +1970-01-01T08:00:00.003+08:00,"123",, +1970-01-01T08:00:00.004+08:00,123,,12 +``` +Example 3: Device Aligned, No Data Types in Header + +```SQL +Time,Device,str,var +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +``` + +Example 4: Device Aligned, Data Types in Header (Text type data supports double quotation marks and non double quotation marks) + +```SQL +Time,Device,str(TEXT),var(INT32) +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123 +``` + +#### SQL File Example + +> For unsupported SQL, illegal SQL, or failed SQL executions, they will be placed in the failed directory under the failed file (default to filename.failed). + +```SQL +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728578812,0.21911979,4.0,0.7129878488375604,-5.0,false,0.65362453); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728579812,0.35814416,3.0,0.04674720094979623,-5.0,false,0.9365247); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728580813,0.20012152,3.0,0.9910098187911393,-4.0,true,0.70040536); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728581814,0.034122765,4.0,0.9313345284181858,-4.0,true,0.9945297); +``` + +### Command + +```Bash +# Unix/OS X +>tools/import-data.sh -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] + +# Windows +>tools\import-data.bat -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] +``` + +> Although IoTDB has the ability to infer types, it is still recommended to create metadata before importing data to avoid unnecessary type conversion errors. 
For example: + +```SQL +CREATE DATABASE root.fit.d1; +CREATE DATABASE root.fit.d2; +CREATE DATABASE root.fit.p; +CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; +CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; +``` + +Parameter Introduction: + +| Parameter | Definition | Required | Default | +|:----------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------|:-------------------------------------------------| +| -h | Database IP address | No | 127.0.0.1 | +| -p | Database port | No | 6667 | +| -u | Database connection username | No | root | +| -pw | Database connection password | No | root | +| -s | Specify the data you want to import, here you can specify a file or folder. If a folder is specified, all files with the suffix CSV or SQL in the folder will be imported in bulk.(The parameter for V1.3.2 is `-f`) | Yes | | +| -fd | Specify the directory to store the failed SQL files. If this parameter is not specified, the failed files will be saved to the directory of the source data.
Note: Unsupported SQL, illegal SQL, and SQL that fails to execute will be written to the failed file in the failed directory (by default the source file name with the suffix `.failed`). | No | Add the suffix `.failed` to the source file name |
| -aligned | Specify whether to use the aligned interface, with options of true or false. This parameter only takes effect when the imported file is a CSV file | No | false |
| -batch | Specify the number of points inserted per batch (minimum value is 1, maximum value is Integer.MAX_VALUE). If the program reports `org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`, you can adjust this parameter appropriately. | No | `100000` |
| -tp | Specify the time precision; optional values are 'ms' (milliseconds), 'ns' (nanoseconds) and 'us' (microseconds) | No | `ms` |
| -lpf | Specify the maximum number of lines written to each failed-import file (the parameter for V1.3.2 is `-linesPerFailedFile`) | No | 10000 |
| -typeInfer | Used to specify type inference rules.
Note: Used to specify type inference rules.`srcTsDataType` include `boolean`,`int`,`long`,`float`,`double`,`NaN`.`dstTsDataType` include `boolean`,`int`,`long`,`float`,`double`,`text`.when`srcTsDataType`is`boolean`, `dstTsDataType`can only be`boolean`or`text`.when`srcTsDataType`is`NaN`, `dstTsDataType`can only be`float`, `double`or`text`.when`srcTsDataType`is numeric, the precision of `dstTsDataType`needs to be higher than that of `srcTsDataType`.For example:`-typeInfer boolean=text,float=double` | No | | + +### Running Examples + +- Import the `dump0_0.sql` data in the current data directory into the local IoTDB database. + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/dump0_0.sql +# Windows +>tools/import-data.bat -s ./data/dump0_0.sql +``` + +- Import all data in the current data directory in an aligned manner into the local IoTDB database. + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/ -fd ./failed/ -aligned true +# Windows +>tools/import-data.bat -s ./data/ -fd ./failed/ -aligned true +``` + +- Import the `dump0_0.csv` data in the current data directory into the local IoTDB database. + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/dump0_0.csv -fd ./failed/ +# Windows +>tools/import-data.bat -s ./data/dump0_0.csv -fd ./failed/ +``` + +- Import the `dump0_0.csv` data in the current data directory in an aligned manner, batch import 100,000 records into the IoTDB database on the host with IP `192.168.100.1`, record failures in the current `failed` directory, and limit each file to 1,000 records. + +```Bash +# Unix/OS X +>tools/import-data.sh -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 +# Windows +>tools/import-data.bat -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Data-Modeling/DataRegion.md b/src/UserGuide/V2.0.1/Tree/stage/Data-Modeling/DataRegion.md new file mode 100644 index 00000000..99f6357d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Data-Modeling/DataRegion.md @@ -0,0 +1,57 @@ + + +# Data Region + +## Background + +The database is specified by the user display. +Use the statement "CREATE DATABASE" to create the database. +Each database has a corresponding StorageGroupProcessor. + +To ensure eventually consistency, a insert lock (exclusive lock) is used to synchronize each insert request in each database. +So the server side parallelism of data ingestion is equal to the number of database. + +## Problem + +From background, we can infer that the parallelism of data ingestion of IoTDB is max(num of client, server side parallelism), which equals to max(num of client, num of database) + +The concept of database usually is related to real world entity such as factory, location, country and so on. +The number of databases may be small which makes the parallelism of data ingestion of IoTDB insufficient. We can't jump out of this dilemma even we start hundreds of client for ingestion. + +## Solution + +Our idea is to group devices into buckets and change the granularity of synchronization from database level to device buckets level. + +In detail, we use hash to group different devices into buckets called data region. 
+For example, one device called "root.sg.d"(assume it's database is "root.sg") is belonged to data region "root.sg.[hash("root.sg.d") mod num_of_data_region]" + +## Usage + +To use data region, you can set this config below: + +``` +data_region_num +``` + +Recommended value is [data region number] = [CPU core number] / [user-defined database number] + +For more information, you can refer to [this page](../Reference/DataNode-Config-Manual.md). \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Data-Modeling/SchemaRegion-rocksdb.md b/src/UserGuide/V2.0.1/Tree/stage/Data-Modeling/SchemaRegion-rocksdb.md new file mode 100644 index 00000000..2f767793 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Data-Modeling/SchemaRegion-rocksdb.md @@ -0,0 +1,110 @@ + + +# Schema Region + +## Background + +When IoTDB service is started, metadata information is organized by loading log file `mlog.bin` and the results are held +in memory for a long time. As metadata continues to grow, memory continues to grow. In order to support the controllable +fluctuation in the massive metadata scenario, we provide a metadata storage type based on rocksDB. + +## Usage + +Firstly, you should package **schema-engine-rocksdb** by the following command: + +```shell +mvn clean package -pl schema-engine-rocksdb -am -DskipTests +``` + +After that, you can get a **conf** directory and a **lib** directory in +schema-engine-rocksdb/target/schema-engine-rocksdb. Copy the file in the conf directory to the conf directory of server, +and copy the files in the lib directory to the lib directory of server. + +Then, open the **iotdb-system.properties** in the conf directory of server, and set the `schema_engine_mode` to +Rocksdb_based. Restart the IoTDB, the system will use `RSchemaRegion` to manage the metadata. + +``` +#################### +### Schema Engine Configuration +#################### +# Choose the mode of schema engine. The value could be Memory,PBTree and Rocksdb_based. If the provided value doesn't match any pre-defined value, Memory mode will be used as default. +# Datatype: string +schema_engine_mode=Rocksdb_based + +``` + +When rocksdb is specified as the metadata storage type, configuration parameters of rocksDB are open to the public as file. You can modify the configuration file `schema-rocksdb.properties` to adjust parameters according to your own requirements, such as block cache. If there is no special requirement, use the default value. + +## Function Support + +The module is still being improved, and some functions are not supported at the moment. 
The function modules are supported as follows: + +| function | support | +| :-----| ----: | +| timeseries addition and deletion | yes | +| query the wildcard path(* and **) | yes | +| tag addition and deletion | yes | +| aligned timeseries | yes | +| wildcard node name(*) | no | +| meta template | no | +| tag index | no | +| continuous query | no | + + +## Appendix: Interface support + +The external interface, that is, the client can sense, related SQL is not supported; + +The internal interface, that is, the invocation logic of other modules within the service, has no direct dependence on the external SQL; + +| interface | type | support | comment | +| :-----| ----: | :----: | :----: | +| createTimeseries | external | yes | | +| createAlignedTimeSeries | external | yes | | +| showTimeseries | external | part of the support | not support LATEST | +| changeAlias | external | yes | | +| upsertTagsAndAttributes | external | yes | | +| addAttributes | external | yes | | +| addTags | external | yes | | +| dropTagsOrAttributes | external | yes | | +| setTagsOrAttributesValue | external | yes | | +| renameTagOrAttributeKey | external | yes | | +| *template | external | no | | +| *trigger | external | no | | +| deleteSchemaRegion | internal | yes | | +| autoCreateDeviceMNode | internal | no | | +| isPathExist | internal | yes | | +| getAllTimeseriesCount | internal | yes | | +| getDevicesNum | internal | yes | | +| getNodesCountInGivenLevel | internal | conditional support | path does not support wildcard | +| getMeasurementCountGroupByLevel | internal | yes | | +| getNodesListInGivenLevel | internal | conditional support | path does not support wildcard | +| getChildNodePathInNextLevel | internal | conditional support | path does not support wildcard | +| getChildNodeNameInNextLevel | internal | conditional support | path does not support wildcard | +| getBelongedDevices | internal | yes | | +| getMatchedDevices | internal | yes | | +| getMeasurementPaths | internal | yes | | +| getMeasurementPathsWithAlias | internal | yes | | +| getAllMeasurementByDevicePath | internal | yes | | +| getDeviceNode | internal | yes | | +| getMeasurementMNodes | internal | yes | | +| getSeriesSchemasAndReadLockDevice | internal | yes | | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Deadband-Process.md b/src/UserGuide/V2.0.1/Tree/stage/Deadband-Process.md new file mode 100644 index 00000000..115afbe2 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Deadband-Process.md @@ -0,0 +1,113 @@ + + +# Deadband Process + +## SDT + +The Swinging Door Trending (SDT) algorithm is a deadband process algorithm. +SDT has low computational complexity and uses a linear trend to represent a quantity of data. + +In IoTDB SDT compresses and discards data when flushing into the disk. + +IoTDB allows you to specify the properties of SDT when creating a time series, and supports three properties: + +* CompDev (Compression Deviation) + +CompDev is the most important parameter in SDT that represents the maximum difference between the +current sample and the current linear trend. CompDev needs to be greater than 0 to perform compression. + +* CompMinTime (Compression Minimum Time Interval) + +CompMinTime is a parameter measures the time distance between two stored data points, which is used for noisy reduction. +If the time interval between the current point and the last stored point is less than or equal to its value, +current point will NOT be stored regardless of compression deviation. 
+The default value is 0 with time unit ms. + +* CompMaxTime (Compression Maximum Time Interval) + +CompMaxTime is a parameter measure the time distance between two stored data points. +If the time interval between the current point and the last stored point is greater than or equal to its value, +current point will be stored regardless of compression deviation. +The default value is 9,223,372,036,854,775,807 with time unit ms. + +The specified syntax for SDT is detailed in [Create Timeseries Statement](../Reference/SQL-Reference.md). + +Supported datatypes: + +* INT32 (Integer) +* INT64 (Long Integer) +* FLOAT (Single Precision Floating Point) +* DOUBLE (Double Precision Floating Point) + +The following is an example of using SDT compression. + +``` +IoTDB> CREATE TIMESERIES root.sg1.d0.s0 WITH DATATYPE=INT32,ENCODING=PLAIN,DEADBAND=SDT,COMPDEV=2 +``` + +Prior to flushing and SDT compression, the results are shown below: + +``` +IoTDB> SELECT s0 FROM root.sg1.d0 ++-----------------------------+--------------+ +| Time|root.sg1.d0.s0| ++-----------------------------+--------------+ +|2017-11-01T00:06:00.001+08:00| 1| +|2017-11-01T00:06:00.002+08:00| 1| +|2017-11-01T00:06:00.003+08:00| 1| +|2017-11-01T00:06:00.004+08:00| 1| +|2017-11-01T00:06:00.005+08:00| 1| +|2017-11-01T00:06:00.006+08:00| 1| +|2017-11-01T00:06:00.007+08:00| 1| +|2017-11-01T00:06:00.015+08:00| 10| +|2017-11-01T00:06:00.016+08:00| 20| +|2017-11-01T00:06:00.017+08:00| 1| +|2017-11-01T00:06:00.018+08:00| 30| ++-----------------------------+--------------+ +Total line number = 11 +It costs 0.008s +``` + +After flushing and SDT compression, the results are shown below: + +``` +IoTDB> FLUSH +IoTDB> SELECT s0 FROM root.sg1.d0 ++-----------------------------+--------------+ +| Time|root.sg1.d0.s0| ++-----------------------------+--------------+ +|2017-11-01T00:06:00.001+08:00| 1| +|2017-11-01T00:06:00.007+08:00| 1| +|2017-11-01T00:06:00.015+08:00| 10| +|2017-11-01T00:06:00.016+08:00| 20| +|2017-11-01T00:06:00.017+08:00| 1| ++-----------------------------+--------------+ +Total line number = 5 +It costs 0.044s +``` + +SDT takes effect when flushing to the disk. The SDT algorithm always stores the first point and does not store the last point. + +The data in [2017-11-01T00:06:00.001, 2017-11-01T00:06:00.007] is within the compression deviation thus discarded. +The data point at time 2017-11-01T00:06:00.007 is stored because the next data point at time 2017-11-01T00:06:00.015 +exceeds compression deviation. When a data point exceeds the compression deviation, SDT stores the last read +point and updates the upper and lower boundaries. The last point at time 2017-11-01T00:06:00.018 is not stored. diff --git a/src/UserGuide/V2.0.1/Tree/stage/Delete-Data/Delete-Data.md b/src/UserGuide/V2.0.1/Tree/stage/Delete-Data/Delete-Data.md new file mode 100644 index 00000000..1e4a367d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Delete-Data/Delete-Data.md @@ -0,0 +1,98 @@ + + +# DELETE + +Users can delete data that meet the deletion condition in the specified timeseries by using the [DELETE statement](../Reference/SQL-Reference.md). When deleting data, users can select one or more timeseries paths, prefix paths, or paths with star to delete data within a certain time interval. + +In a JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute single or batch UPDATE statements. 
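For reference, a minimal JDBC sketch that issues one of the delete statements shown in the next section might look like the following; the driver class name and connection URL follow the JDBC documentation linked above, while the host, credentials, and path are illustrative placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DeleteDataExample {
    public static void main(String[] args) throws Exception {
        // IoTDB JDBC driver class, as described in the JDBC documentation referenced above
        Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
        try (Connection connection =
                     DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667/", "root", "root");
             Statement statement = connection.createStatement()) {
            // A single DELETE statement; the path and time bound are illustrative
            statement.execute(
                    "delete from root.ln.wf02.wt02.status where time <= 2017-11-01T16:26:00");
        }
    }
}
```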
+ +## Delete Single Timeseries +Taking ln Group as an example, there exists such a usage scenario: + +The wf02 plant's wt02 device has many segments of errors in its power supply status before 2017-11-01 16:26:00, and the data cannot be analyzed correctly. The erroneous data affected the correlation analysis with other devices. At this point, the data before this time point needs to be deleted. The SQL statement for this operation is + +```sql +delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00; +``` + +In case we hope to merely delete the data before 2017-11-01 16:26:00 in the year of 2017, The SQL statement is: +```sql +delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +``` + +IoTDB supports to delete a range of timeseries points. Users can write SQL expressions as follows to specify the delete interval: + +```sql +delete from root.ln.wf02.wt02.status where time < 10 +delete from root.ln.wf02.wt02.status where time <= 10 +delete from root.ln.wf02.wt02.status where time < 20 and time > 10 +delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10 +delete from root.ln.wf02.wt02.status where time > 20 +delete from root.ln.wf02.wt02.status where time >= 20 +delete from root.ln.wf02.wt02.status where time = 20 +``` + +Please pay attention that multiple intervals connected by "OR" expression are not supported in delete statement: + +``` +delete from root.ln.wf02.wt02.status where time > 4 or time < 0 +Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic +expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND' +``` + +If no "where" clause specified in a delete statement, all the data in a timeseries will be deleted. + +```sql +delete from root.ln.wf02.wt02.status +``` + + +## Delete Multiple Timeseries +If both the power supply status and hardware version of the ln group wf02 plant wt02 device before 2017-11-01 16:26:00 need to be deleted, [the prefix path with broader meaning or the path with star](../Basic-Concept/Data-Model-and-Terminology.md) can be used to delete the data. The SQL statement for this operation is: + +```sql +delete from root.ln.wf02.wt02 where time <= 2017-11-01T16:26:00; +``` +or + +```sql +delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00; +``` +It should be noted that when the deleted path does not exist, IoTDB will not prompt that the path does not exist, but that the execution is successful, because SQL is a declarative programming method. Unless it is a syntax error, insufficient permissions and so on, it is not considered an error, as shown below: +``` +IoTDB> delete from root.ln.wf03.wt02.status where time < now() +Msg: The statement is executed successfully. +``` + +## Delete Time Partition (experimental) +You may delete all data in a time partition of a database using the following grammar: + +```sql +DELETE PARTITION root.ln 0,1,2 +``` + +The `0,1,2` above is the id of the partition that is to be deleted, you can find it from the IoTDB +data folders or convert a timestamp manually to an id using `timestamp / partitionInterval +` (flooring), and the `partitionInterval` should be in your config (if time-partitioning is +supported in your version). + +Please notice that this function is experimental and mainly for development, please use it with care. 
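As a quick illustration of the flooring conversion described above, the partition id for a given timestamp can be computed as follows; both the timestamp and the partition interval below are assumed example values, and the interval must match the `partitionInterval` configured for your deployment:

```java
public class PartitionIdExample {
    public static void main(String[] args) {
        // Assumed example values: a millisecond timestamp and a one-week partition interval in ms
        long timestamp = 1704067200000L;        // 2024-01-01T00:00:00Z
        long partitionInterval = 604800000L;    // 7 days; must match your configuration

        // Flooring integer division yields the id accepted by DELETE PARTITION
        long partitionId = timestamp / partitionInterval;
        System.out.println("partition id = " + partitionId); // prints: partition id = 2817
    }
}
```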
diff --git a/src/UserGuide/V2.0.1/Tree/stage/Delete-Data/TTL.md b/src/UserGuide/V2.0.1/Tree/stage/Delete-Data/TTL.md new file mode 100644 index 00000000..d539a87c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Delete-Data/TTL.md @@ -0,0 +1,132 @@ + + +# TTL + +IoTDB supports device-level TTL settings, which means it is able to delete old data automatically and periodically. The benefit of using TTL is that hopefully you can control the total disk space usage and prevent the machine from running out of disks. Moreover, the query performance may downgrade as the total number of files goes up and the memory usage also increases as there are more files. Timely removing such files helps to keep at a high query performance level and reduce memory usage. + +The default unit of TTL is milliseconds. If the time precision in the configuration file changes to another, the TTL is still set to milliseconds. + +When setting TTL, the system will look for all devices included in the set path and set TTL for these devices. The system will delete expired data at the device granularity. +After the device data expires, it will not be queryable. The data in the disk file cannot be guaranteed to be deleted immediately, but it can be guaranteed to be deleted eventually. +However, due to operational costs, the expired data will not be physically deleted right after expiring. The physical deletion is delayed until compaction. +Therefore, before the data is physically deleted, if the TTL is reduced or lifted, it may cause data that was previously invisible due to TTL to reappear. +The system can only set up to 1000 TTL rules, and when this limit is reached, some TTL rules need to be deleted before new rules can be set. + +## TTL Path Rule +The path can only be prefix paths (i.e., the path cannot contain \* , except \*\* in the last level). +This path will match devices and also allows users to specify paths without asterisks as specific databases or devices. +When the path does not contain asterisks, the system will check if it matches a database; if it matches a database, both the path and path.\*\* will be set at the same time. Note: Device TTL settings do not verify the existence of metadata, i.e., it is allowed to set TTL for a non-existent device. +``` +qualified paths: +root.** +root.db.** +root.db.group1.** +root.db +root.db.group1.d1 + +unqualified paths: +root.*.db +root.**.db.* +root.db.* +``` +## TTL Applicable Rules +When a device is subject to multiple TTL rules, the more precise and longer rules are prioritized. For example, for the device "root.bj.hd.dist001.turbine001", the rule "root.bj.hd.dist001.turbine001" takes precedence over "root.bj.hd.dist001.\*\*", and the rule "root.bj.hd.dist001.\*\*" takes precedence over "root.bj.hd.**". +## Set TTL +The set ttl operation can be understood as setting a TTL rule, for example, setting ttl to root.sg.group1.** is equivalent to mounting ttl for all devices that can match this path pattern. +The unset ttl operation indicates unmounting TTL for the corresponding path pattern; if there is no corresponding TTL, nothing will be done. +If you want to set TTL to be infinitely large, you can use the INF keyword. +The SQL Statement for setting TTL is as follow: +``` +set ttl to pathPattern 360000; +``` +Set the Time to Live (TTL) to a pathPattern of 360,000 milliseconds; the pathPattern should not contain a wildcard (\*) in the middle and must end with a double asterisk (\*\*). The pathPattern is used to match corresponding devices. 
+To maintain compatibility with older SQL syntax, if the user-provided pathPattern matches a database (db), the path pattern is automatically expanded to include all sub-paths denoted by path.\*\*. +For instance, writing "set ttl to root.sg 360000" will automatically be transformed into "set ttl to root.sg.\*\* 360000", which sets the TTL for all devices under root.sg. However, if the specified pathPattern does not match a database, the aforementioned logic will not apply. For example, writing "set ttl to root.sg.group 360000" will not be expanded to "root.sg.group.\*\*" since root.sg.group does not match a database. +It is also permissible to specify a particular device without a wildcard (*). +## Unset TTL + +To unset TTL, we can use follwing SQL statement: + +``` +IoTDB> unset ttl from root.ln +``` + +After unset TTL, all data will be accepted in `root.ln`. +``` +IoTDB> unset ttl from root.sgcc.** +``` + +Unset the TTL in the `root.sgcc` path. + +New syntax +``` +IoTDB> unset ttl from root.** +``` + +Old syntax +``` +IoTDB> unset ttl to root.** +``` +There is no functional difference between the old and new syntax, and they are compatible with each other. +The new syntax is just more conventional in terms of wording. + +Unset the TTL setting for all path pattern. + +## Show TTL + +To Show TTL, we can use following SQL statement: + +show all ttl + +``` +IoTDB> SHOW ALL TTL ++--------------+--------+ +| path| TTL| +| root.**|55555555| +| root.sg2.a.**|44440000| ++--------------+--------+ +``` + +show ttl on pathPattern +``` +IoTDB> SHOW TTL ON root.db.**; ++--------------+--------+ +| path| TTL| +| root.db.**|55555555| +| root.db.a.**|44440000| ++--------------+--------+ +``` + +The SHOW ALL TTL example gives the TTL for all path patterns. +The SHOW TTL ON pathPattern shows the TTL for the path pattern specified. + +Display devices' ttl +``` +IoTDB> show devices ++---------------+---------+---------+ +| Device|IsAligned| TTL| ++---------------+---------+---------+ +|root.sg.device1| false| 36000000| +|root.sg.device2| true| INF| ++---------------+---------+---------+ +``` +All devices will definitely have a TTL, meaning it cannot be null. INF represents infinity. 
\ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Deployment-Recommendation.md b/src/UserGuide/V2.0.1/Tree/stage/Deployment-Recommendation.md new file mode 100644 index 00000000..5fd59f93 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Deployment-Recommendation.md @@ -0,0 +1,182 @@ + + +# IoTDB Deployment Recommendation +## Backgrounds + +System Abilities +- Performance: writing and reading performance, compression ratio +- Extensibility: system has the ability to manage data with multiple nodes, and is essentially that data can be managed by partitions +- High availability(HA): system has the ability to tolerate the nodes disconnected, and is essentially that the data has replicas +- Consistency:when data is with multiple copies, whether the replicas are consistent, and is essentially that the system treats the whole database as a single node + +Abbreviations +- C: ConfigNode +- D: DataNode +- nCmD:cluster with n ConfigNodes and m DataNodes + +## Deployment mode + +| mode | Performance | Extensibility | HA | Consistency | +|:---------------------------------------:|:---------------|:--------------|:-------|:------------| +| Lightweight standalone mode | Extremely High | None | None | High | +| Scalable standalone mode (default) | High | High | Medium | High | +| High performance cluster mode | High | High | High | Medium | +| Strong consistency cluster mode | Medium | High | High | High | + + +| Config | Lightweight standalone mode | Scalable single node mode | High performance mode | strong consistency cluster mode | +|:--------------------------------------:|:----------------------------|:--------------------------|:----------------------|:--------------------------------| +| ConfigNode number | 1 | ≥1 (odd number) | ≥1 (odd number) | ≥1 (odd number) | +| DataNode number | 1 | ≥1 | ≥3 | ≥3 | +| schema_replication_factor | 1 | 1 | 3 | 3 | +| data_replication_factor | 1 | 1 | 2 | 3 | +| config_node_consensus_protocol_class | Simple | Ratis | Ratis | Ratis | +| schema_region_consensus_protocol_class | Simple | Ratis | Ratis | Ratis | +| data_region_consensus_protocol_class | Simple | IoT | IoT | Ratis | + + +## Deployment Recommendation + +### Upgrade from v0.13 to v1.0 + +Scenario: +Already has some data under v0.13, hope to upgrade to v1.0. + +Options: +1. Upgrade to 1C1D standalone mode, allocate 2GB memory to ConfigNode, allocate same memory size with v0.13 to DataNode. +2. Upgrade to 3C3D cluster mode, allocate 2GB memory to ConfigNode, allocate same memory size with v0.13 to DataNode. + +Configuration modification: + +- Do not point v1.0 data directory to v0.13 data directory +- region_group_extension_strategy=COSTOM +- data_region_group_per_database + - for 3C3D cluster mode: Cluster CPU total core num / data_replication_factor + - for 1C1D standalone mode: use virtual_storage_group_num in v0.13 + +Data migration: +After modifying the configuration, use load-tsfile tool to load the TsFiles of v0.13 to v1.0. 
+ +### Use v1.0 directly + +**Recommend to use 1 Database only** + +#### Memory estimation + +##### Use active series number to estimate memory size + +Cluster DataNode total heap size(GB) = active series number / 100000 * data_replication_factor + +Heap size of each DataNode (GB) = Cluster DataNode total heap size / DataNode number + +> Example: use 3C3D to manage 1 million timeseries, use 3 data replicas +> - Cluster DataNode total heap size: 1,000,000 / 100,000 * 3 = 30G +> - 每Heap size of each DataNode: 30 / 3 = 10G + +##### Use total series number to estimate memory size + +Cluster DataNode total heap size(B) = 20 * (180 + 2 * average character num of the series full path) * total series number * schema_replication_factor + +Heap size of each DataNode = Cluster DataNode total heap size / DataNode number + +> Example: use 3C3D to manage 1 million timeseries, use 3 schema replicas, series name such as root.sg_1.d_10.s_100(20 chars) +> - Cluster DataNode total heap size: 20 * (180 + 2 * 20) * 1,000,000 * 3 = 13.2 GB +> - Heap size of each DataNode: 13.2 GB / 3 = 4.4 GB + +#### Disk estimation + +IoTDB storage size = data storage size + schema storage size + temp storage size + +##### Data storage size + +Series number * Sampling frequency * Data point size * Storage duration * data_replication_factor / 10 (compression ratio) + +| Data Type \ Data point size | Timestamp (Byte) | Value (Byte) | Total (Byte) | +|:---------------------------:|:-----------------|:-------------|:-------------| +| Boolean | 8 | 1 | 9 | +| INT32 / FLOAT | 8 | 4 | 12 | +| INT64)/ DOUBLE | 8 | 8 | 16 | +| TEXT | 8 | Assuming a | 8+a | + + +> Example: 1000 devices, 100 sensors for one device, 100,000 series total, INT32 data type, 1Hz sampling frequency, 1 year storage duration, 3 replicas, compression ratio is 10 +> Data storage size = 1000 * 100 * 12 * 86400 * 365 * 3 / 10 = 11T + +##### Schema storage size + +One series uses the path character byte size + 20 bytes. +If the series has tag, add the tag character byte size. + +##### Temp storage size + +Temp storage size = WAL storage size + Consensus storage size + Compaction temp storage size + +1. WAL + +max wal storage size = memtable memory size ÷ wal_min_effective_info_ratio +- memtable memory size is decided by datanode_memory_proportion, storage_engine_memory_proportion and write_memory_proportion +- wal_min_effective_info_ratio is decided by wal_min_effective_info_ratio configuration + +> Example: allocate 16G memory for DataNode, config is as below: +> datanode_memory_proportion=3:3:1:1:1:1 +> storage_engine_memory_proportion=8:2 +> write_memory_proportion=19:1 +> wal_min_effective_info_ratio=0.1 +> max wal storage size = 16 * (3 / 10) * (8 / 10) * (19 / 20) ÷ 0.1 = 36.48G + +2. Consensus + +Ratis consensus + +When using ratis consensus protocol, we need extra storage for Raft Log, which will be deleted after the state machine takes snapshot. +We can adjust `trigger_snapshot_threshold` to control the maximum Raft Log disk usage. + + +Raft Log disk size in each Region = average * trigger_snapshot_threshold + +The total Raft Log storage space is proportional to the data replica number + +> Example: DataRegion, 20kB data for one request, data_region_trigger_snapshot_threshold = 400,000, then max Raft Log disk size = 20K * 400,000 = 8G. +Raft Log increases from 0 to 8GB, and then turns to 0 after snapshot. Average size will be 4GB. +When replica number is 3, max Raft log size will be 3 * 8G = 24G. 
+ +What's more, we can configure data_region_ratis_log_max_size to limit max log size of a single DataRegion. +By default, data_region_ratis_log_max_size=20G, which guarantees that Raft Log size would not exceed 20G. + +3. Compaction + +- Inner space compaction + Disk space for temporary files = Total Disk space of origin files + + > Example: 10 origin files, 100MB for each file + > Disk space for temporary files = 10 * 100 = 1000M + + +- Outer space compaction + The overlap of out-of-order data = overlapped data amount / total out-of-order data amount + + Disk space for temporary file = Total ordered Disk space of origin files + Total out-of-order disk space of origin files *(1 - overlap) + > Example: 10 ordered files, 10 out-of-order files, 100M for each ordered file, 50M for each out-of-order file, half of data is overlapped with sequence file + > The overlap of out-of-order data = 25M/50M * 100% = 50% + > Disk space for temporary files = 10 * 100 + 10 * 50 * 50% = 1250M + + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Docker-Install.md b/src/UserGuide/V2.0.1/Tree/stage/Docker-Install.md new file mode 100644 index 00000000..421d8497 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Docker-Install.md @@ -0,0 +1,187 @@ + + +# Installation by Docker + +Apache IoTDB' Docker image is released on [https://hub.docker.com/r/apache/iotdb](https://hub.docker.com/r/apache/iotdb) +Add environments of docker to update the configurations of Apache IoTDB. + +## Have a try + +```shell +# get IoTDB official image +docker pull apache/iotdb:1.3.0-standalone +# create docker bridge network +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +# create docker container +docker run -d --name iotdb-service \ + --hostname iotdb-service \ + --network iotdb \ + --ip 172.18.0.6 \ + -p 6667:6667 \ + -e cn_internal_address=iotdb-service \ + -e cn_seed_config_node=iotdb-service:10710 \ + -e cn_internal_port=10710 \ + -e cn_consensus_port=10720 \ + -e dn_rpc_address=iotdb-service \ + -e dn_internal_address=iotdb-service \ + -e dn_seed_config_node=iotdb-service:10710 \ + -e dn_mpp_data_exchange_port=10740 \ + -e dn_schema_region_consensus_port=10750 \ + -e dn_data_region_consensus_port=10760 \ + -e dn_rpc_port=6667 \ + apache/iotdb:1.3.0-standalone +# execute SQL +docker exec -ti iotdb-service /iotdb/sbin/start-cli.sh -h iotdb-service +``` + +External access: + +```shell +# is the real IP or domain address rather than the one in docker network, could be 127.0.0.1 within the computer. +$IOTDB_HOME/sbin/start-cli.sh -h -p 6667 +``` + +Notice:The confignode service would fail when restarting this container if the IP Adress of the container has been changed. + +```yaml +# docker-compose-standalone.yml +version: "3" +services: + iotdb-service: + image: apache/iotdb:1.3.0-standalone + hostname: iotdb-service + container_name: iotdb-service + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb-service + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-service:10710 + - dn_rpc_address=iotdb-service + - dn_internal_address=iotdb-service + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb-service:10710 + volumes: + - ./data:/iotdb/data + - ./logs:/iotdb/logs + networks: + iotdb: + ipv4_address: 172.18.0.6 + +networks: + iotdb: + external: true +``` + +If you'd like to limit the memory of IoTDB, follow these steps: +1. 
Add another configuration of volumes: `./iotdb-conf:/iotdb/conf` and then start the IoTDB docker container. Thus, there are some configuration files in directory of iotdb-conf. +2. Change the memory configurations in confignode-env.sh and datanode-env.sh of iotdb-conf, and then restart the IoTDB docker container again. + +## deploy cluster + +Until now, we support host and overlay networks but haven't supported bridge networks on multiple computers. +Overlay networks see [1C2D](https://github.com/apache/iotdb/tree/master/docker/src/main/DockerCompose/docker-compose-cluster-1c2d.yml) and here are the configurations and operation steps to start an IoTDB cluster with docker using host networks。 + +Suppose that there are three computers of iotdb-1, iotdb-2 and iotdb-3. We called them nodes. +Here is the docker-compose file of iotdb-2, as the sample: + +```yaml +version: "3" +services: + iotdb-confignode: + image: apache/iotdb:1.3.0-confignode + container_name: iotdb-confignode + environment: + - cn_internal_address=iotdb-2 + - cn_seed_config_node=iotdb-1:10710 + - cn_internal_port=10710 + - cn_consensus_port=10720 + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - data_replication_factor=3 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/confignode:/iotdb/data + - ./logs/confignode:/iotdb/logs + network_mode: "host" + + iotdb-datanode: + image: apache/iotdb:1.3.0-datanode + container_name: iotdb-datanode + environment: + - dn_rpc_address=iotdb-2 + - dn_internal_address=iotdb-2 + - dn_seed_config_node=iotdb-1:10710 + - data_replication_factor=3 + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/datanode:/iotdb/data/ + - ./logs/datanode:/iotdb/logs/ + network_mode: "host" +``` + +Notice: + +1. The `cn_seed_config_node` and `dn_seed_config_node` of three nodes must the same and they are the first starting node of `iotdb-1` with the `cn_internal_port` of 10710。 +2. In this docker-compose file,`iotdb-2` should be replace with the real IP or hostname of each node to generate docker compose files in the other nodes. +3. The services would talk with each other, so they need map the /etc/hosts file or add the `extra_hosts` to the docker compose file. +4. We must start the IoTDB services of `iotdb-1` first at the first time of starting. +5. Stop and remove all the IoTDB services and clean up the `data` and `logs` directories of the 3 nodes,then start the cluster again. + + +## Configuration +All configuration files are in the directory of `conf`. +The elements of environment in docker-compose file is the configurations of IoTDB. +If you'd changed the configurations files in conf, please map the directory of `conf` in docker-compose file. + + +### log level +The conf directory contains log configuration files, namely logback-confignode.xml and logback-datanode.xml. 
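Changing verbosity means editing these logback files. The fragment below is only a rough sketch of a logback configuration; the appender name `FILE` and the exact structure of the shipped `logback-datanode.xml` are assumptions rather than its actual contents:

```xml
<!-- Illustrative logback fragment; adapt it to the appenders already defined in the shipped file -->
<configuration>
  <!-- Reduce overall DataNode log output to WARN -->
  <root level="WARN">
    <appender-ref ref="FILE"/>
  </root>
  <!-- Or raise verbosity for a single package only -->
  <logger name="org.apache.iotdb" level="DEBUG"/>
</configuration>
```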
+ + +### memory set +The conf directory contains memory configuration files, namely confignode-env.sh and datanode-env.sh. JVM heap size uses ON_HEAP_MEMORY and JVM direct memroy uses OFF_HEAP_MEMORY. e.g. `ON_HEAP_MEMORY=8G, OFF_HEAP_MEMORY=2G` + +## upgrade IoTDB +1. Downloads the newer IoTDB docker image from docker hub +2. Update the image of docker-compose file +3. Stop the IoTDB docker containers with the commands of docker stop and docker rm. +4. Start IoTDB with `docker-compose -f docker-compose-standalone.yml up -d` + +## boot automatically +1. Add `restart: always` to every service of IoTDB in docker-compose file +2. Set docker service to boot automatically +e.g. in CentOS: `systemctl enable docker` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Edge-Cloud-Collaboration/Sync-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Edge-Cloud-Collaboration/Sync-Tool.md new file mode 100644 index 00000000..e6e59a88 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Edge-Cloud-Collaboration/Sync-Tool.md @@ -0,0 +1,374 @@ + + +# Sync Tool (Edge-Cloud Collaboration) + +## 1.Introduction + +The Sync Tool is an IoTDB suite tool that continuously uploads the timeseries data from the edge (sender) to the cloud(receiver). + +On the sender side of the sync-tool, the sync module is embedded in the IoTDB engine. The receiver side of the sync-tool supports IoTDB (standalone/cluster). + +You can use SQL commands to start or close a synchronization task at the sender, and you can check the status of the synchronization task at any time. At the receiving end, you can set the IP white list to specify the access IP address range of sender. + +## 2.Model definition + +![pipe2.png](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Sync-Tool/pipe2.png?raw=true) + +Two machines A and B, which are installed with iotdb, we want to continuously synchronize the data from A to B. To better describe this process, we introduce the following concepts. + +- Pipe + - It refers to a synchronization task. In the above case, we can see that there is a data flow pipeline connecting A and B. + - A pipe has three states, RUNNING, STOP and DROP, which respectively indicate running, pause and permanent cancellation. +- PipeSink + - It refers to the receiving end. In the above case, pipesink is machine B. At present, the pipesink type only supports IoTDB, that is, the receiver is the IoTDB instance installed on B. + - Pipeserver: when the type of pipesink is IoTDB, you need to open the pipeserver service of IoTDB to process the pipe data. + +## 3.Precautions for Use + +- The sender side of the sync-tool currently supports IoTDB version 1.0 **only if data_replication_factor is set to 1**. The receiver side supports any IoTDB version 1.0 configuration +- A normal Pipe has two states: RUNNING indicates that it is synchronizing data to the receiver, and STOP indicates that synchronization to the receiver is suspended. +- When one or more senders send data to a receiver, there should be no intersection between the respective device path sets of these senders and receivers, otherwise unexpected errors may occur. + - e.g. When sender A includes path `root.sg.d.s`, sender B also includes the path `root.sg.d.s`, sender A deletes database `root.sg` will also delete all data of B stored in the path `root.sg.d.s` at receiver. +- The two "ends" do not support synchronization with each other. +- The Sync Tool only synchronizes insertions. 
If no database is created on the receiver, a database of the same level as the sender will be automatically created. Currently, deletion operation is not guaranteed to be synchronized and do not support TTL settings, trigger and other operations. + - If TTL is set on the sender side, all unexpired data in the IoTDB and all future data writes and deletions will be synchronized to the receiver side when Pipe is started. +- When operating a synchronization task, ensure that all DataNode nodes in `SHOW DATANODES` that are in the Running state are connected, otherwise the execution will fail. + +## 4.Quick Start + +Execute the following SQL statements at the sender and receiver to quickly start a data synchronization task between two IoTDB. For complete SQL statements and configuration matters, please see the `parameter configuration`and `SQL` sections. For more usage examples, please refer to the `usage examples` section. + +### 4.1 Receiver + +- Start sender IoTDB and receiver IoTDB. + +- Create a PipeSink with IoTDB type. + +``` +IoTDB> CREATE PIPESINK central_iotdb AS IoTDB (ip='There is your goal IP', port='There is your goal port') +``` + +- Establish a Pipe (before creation, ensure that receiver IoTDB has been started). + +``` +IoTDB> CREATE PIPE my_pipe TO central_iotDB +``` + +- Start this Pipe. + +``` +IoTDB> START PIPE my_pipe +``` + +- Show Pipe's status. + +``` +IoTDB> SHOW PIPES +``` + +- Stop this Pipe. + +``` +IoTDB> STOP PIPE my_pipe +``` + +- Continue this Pipe. + +``` +IoTDB> START PIPE my_pipe +``` + +- Drop this Pipe (delete all information about this pipe). + +``` +IoTDB> DROP PIPE my_pipe +``` + +## 5.Parameter Configuration + +All parameters are in `$IOTDB_ HOME$/conf/iotdb-system.properties`, after all modifications are completed, execute `load configuration` and it will take effect immediately. + +### 5.1 Sender + +| **Parameter Name** | **max_number_of_sync_file_retry** | +| ------------------ | ------------------------------------------------------------ | +| Description | The maximum number of retries when the sender fails to synchronize files to the receiver. | +| Data type | Int : [0,2147483647] | +| Default value | 5 | + + + +### 5.2 Receiver + +| **Parameter Name** | **ip_white_list** | +| ------------------ | ------------------------------------------------------------ | +| Description | Set the white list of IP addresses of the sender of the synchronization, which is expressed in the form of network segments, and multiple network segments are separated by commas. When the sender synchronizes data to the receiver, the receiver allows synchronization only when the IP address of the sender is within the network segment set in the white list. If the whitelist is empty, the receiver does not allow any sender to synchronize data. By default, the receiver rejects the synchronization request of all IP addresses except 127.0.0.1. When configuring this parameter, please ensure that all DataNode addresses on the sender are set. | +| Data type | String | +| Default value | 127.0.0.1/32 | + +## 6.SQL + +### SHOW PIPESINKTYPE + +- Show all PipeSink types supported by IoTDB. + +``` +IoTDB> SHOW PIPESINKTYPE +IoTDB> ++-----+ +| type| ++-----+ +|IoTDB| ++-----+ +``` + +### CREATE PIPESINK + +* Create a PipeSink with IoTDB type, where IP and port are optional parameters. + +``` +IoTDB> CREATE PIPESINK AS IoTDB [(ip='127.0.0.1',port=6667);] +``` + +### DROP PIPESINK + +- Drop the pipesink with PipeSinkName parameter. 
+ +``` +IoTDB> DROP PIPESINK +``` + +### SHOW PIPESINK + +- Show all PipeSinks' definition, the results set has three columns, name, PipeSink’s type and PipeSink‘s attributes. + +``` +IoTDB> SHOW PIPESINKS +IoTDB> SHOW PIPESINK [PipeSinkName] +IoTDB> ++-----------+-----+------------------------+ +| name| type| attributes| ++-----------+-----+------------------------+ +|my_pipesink|IoTDB|ip='127.0.0.1',port=6667| ++-----------+-----+------------------------+ +``` + +### CREATE PIPE + +- Create a pipe. + + - At present, the SELECT statement only supports `**` (i.e. data in all timeseries), the FROM statement only supports `root`, and the WHERE statement only supports the start time of the specified time. The start time can be specified in the form of yyyy-mm-dd HH:MM:SS or a timestamp. + +``` +IoTDB> CREATE PIPE my_pipe TO my_iotdb [FROM (select ** from root WHERE time>='yyyy-mm-dd HH:MM:SS' )] +``` + +### STOP PIPE + +- Stop the Pipe with PipeName. + +``` +IoTDB> STOP PIPE +``` + +### START PIPE + +- Continue the Pipe with PipeName. + +``` +IoTDB> START PIPE +``` + +### DROP PIPE + +- Drop the pipe with PipeName(delete all information about this pipe). + +``` +IoTDB> DROP PIPE +``` + +### SHOW PIPE + +> This statement can be executed on both senders and receivers. + +- Show all Pipe's status. + + - `create time`: the creation time of this pipe. + + - `name`: the name of this pipe. + + - `role`: the current role of this IoTDB in pipe, there are two possible roles. + - Sender, the current IoTDB is the synchronous sender + - Receiver, the current IoTDB is the synchronous receiver + + - `remote`: information about the opposite end of the Pipe. + - When role is sender, the value of this field is the PipeSink name. + - When role is receiver, the value of this field is the sender's IP. + + +- `status`: the Pipe's status. +- `attributes`: the attributes of Pipe + - When role is sender, the value of this field is the synchronization start time of the Pipe and whether to synchronize the delete operation. + - When role is receiver, the value of this field is the name of the database corresponding to the synchronization connection created on this DataNode. + +- `message`: the status message of this pipe. When pipe runs normally, this column is NORMAL. When an exception occurs, messages may appear in following two states. + - WARN, this indicates that a data loss or other error has occurred, but the pipe will remain running. + - ERROR, This indicates a problem where the network connection works but the data cannot be transferred, for example, the IP of the sender is not in the whitelist of the receiver or the version of the sender is not compatible with that of the receiver. + - When the ERROR status appears, it is recommended to check the DataNode logs after STOP PIPE, check the receiver configuration or network conditions, and then START PIPE again. 
+ + +``` +IoTDB> SHOW PIPES +IoTDB> ++-----------------------+--------+--------+-------------+---------+------------------------------------+-------+ +| create time| name | role| remote| status| attributes|message| ++-----------------------+--------+--------+-------------+---------+------------------------------------+-------+ +|2022-03-30T20:58:30.689|my_pipe1| sender| my_pipesink| STOP|SyncDelOp=false,DataStartTimestamp=0| NORMAL| ++-----------------------+--------+--------+-------------+---------+------------------------------------+-------+ +|2022-03-31T12:55:28.129|my_pipe2|receiver|192.168.11.11| RUNNING| Database='root.vehicle'| NORMAL| ++-----------------------+--------+--------+-------------+---------+------------------------------------+-------+ +``` + +- Show the pipe status with PipeName. When the PipeName is empty,it is the same with `Show PIPES`. + +``` +IoTDB> SHOW PIPE [PipeName] +``` + +## 7. Usage Examples + +### Goal + +- Create a synchronize task from sender IoTDB to receiver IoTDB. +- Sender wants to synchronize the data after 2022-3-30 00:00:00. +- Sender does not want to synchronize the deletions. +- Receiver only wants to receive data from this sender(sender ip 192.168.0.1). + +### Receiver + +- `vi conf/iotdb-system.properties` to config the parameters,set the IP white list to 192.168.0.1/32 to receive and only receive data from sender. + +``` +#################### +### PIPE Server Configuration +#################### +# White IP list of Sync client. +# Please use the form of IPv4 network segment to present the range of IP, for example: 192.168.0.0/16 +# If there are more than one IP segment, please separate them by commas +# The default is to reject all IP to sync except 0.0.0.0 +# Datatype: String +ip_white_list=192.168.0.1/32 +``` + +### Sender + +- Create PipeSink with IoTDB type, input ip address 192.168.0.1, port 6667. + +``` +IoTDB> CREATE PIPESINK my_iotdb AS IoTDB (IP='192.168.0.2',PORT=6667) +``` + +- Create Pipe connect to my_iotdb, input the start time 2022-03-30 00:00:00 in WHERE statments. The following two SQL statements are equivalent + +``` +IoTDB> CREATE PIPE p TO my_iotdb FROM (select ** from root where time>='2022-03-30 00:00:00') +IoTDB> CREATE PIPE p TO my_iotdb FROM (select ** from root where time>= 1648569600000) +``` + +- Start the Pipe p + +``` +IoTDB> START PIPE p +``` + +- Show the status of pipe p. + +``` +IoTDB> SHOW PIPE p +``` + +### Result Verification + +Execute SQL on sender. + +``` +CREATE DATABASE root.vehicle; +CREATE TIMESERIES root.vehicle.d0.s0 WITH DATATYPE=INT32, ENCODING=RLE; +CREATE TIMESERIES root.vehicle.d0.s1 WITH DATATYPE=TEXT, ENCODING=PLAIN; +CREATE TIMESERIES root.vehicle.d1.s2 WITH DATATYPE=FLOAT, ENCODING=RLE; +CREATE TIMESERIES root.vehicle.d1.s3 WITH DATATYPE=BOOLEAN, ENCODING=PLAIN; +insert into root.vehicle.d0(timestamp,s0) values(now(),10); +insert into root.vehicle.d0(timestamp,s0,s1) values(now(),12,'12'); +insert into root.vehicle.d0(timestamp,s1) values(now(),'14'); +insert into root.vehicle.d1(timestamp,s2) values(now(),16.0); +insert into root.vehicle.d1(timestamp,s2,s3) values(now(),18.0,true); +insert into root.vehicle.d1(timestamp,s3) values(now(),false); +flush; +``` + +Execute SELECT statements, the same results can be found on sender and receiver. 
+ +``` +IoTDB> select ** from root.vehicle ++-----------------------------+------------------+------------------+------------------+------------------+ +| Time|root.vehicle.d0.s0|root.vehicle.d0.s1|root.vehicle.d1.s3|root.vehicle.d1.s2| ++-----------------------------+------------------+------------------+------------------+------------------+ +|2022-04-03T20:08:17.127+08:00| 10| null| null| null| +|2022-04-03T20:08:17.358+08:00| 12| 12| null| null| +|2022-04-03T20:08:17.393+08:00| null| 14| null| null| +|2022-04-03T20:08:17.538+08:00| null| null| null| 16.0| +|2022-04-03T20:08:17.753+08:00| null| null| true| 18.0| +|2022-04-03T20:08:18.263+08:00| null| null| false| null| ++-----------------------------+------------------+------------------+------------------+------------------+ +Total line number = 6 +It costs 0.134s +``` + +## 8.Q&A + +- Execute `CREATE PIPESINK demo as IoTDB` get message `PIPESINK [demo] already exists in IoTDB.` + + - Cause by: Current PipeSink already exists + - Solution: Execute `DROP PIPESINK demo` to drop PipeSink and recreate it. +- Execute `DROP PIPESINK pipesinkName` get message `Can not drop PIPESINK [demo], because PIPE [mypipe] is using it.` + + - Cause by: It is not allowed to delete PipeSink that is used by a running PIPE. + - Solution: Execute `SHOW PIPE` on the sender side to stop using the PipeSink's PIPE. + +- Execute `CREATE PIPE p to demo` get message `PIPE [p] is STOP, please retry after drop it.` + - Cause by: Current Pipe already exists + - Solution: Execute `DROP PIPE p` to drop Pipe and recreate it. +- Execute `CREATE PIPE p to demo` get message `Fail to create PIPE [p] because Connection refused on DataNode: {id=2, internalEndPoint=TEndPoint(ip:127.0.0.1, port:10732)}.` + - Cause by: There are some DataNodes with the status Running cannot be connected. + - Solution: Execute `SHOW DATANODES`, and check for unreachable DataNode networks, or wait for their status to change to Unknown and re-execute the statement. +- Execute `START PIPE p` get message `Fail to start PIPE [p] because Connection refused on DataNode: {id=2, internalEndPoint=TEndPoint(ip:127.0.0.1, port:10732)}.` + - Cause by: There are some DataNodes with the status Running cannot be connected. + - Solution: Execute `SHOW DATANODES`, and check for unreachable DataNode networks, or wait for their status to change to Unknown and re-execute the statement. +- Execute `STOP PIPE p` get message `Fail to stop PIPE [p] because Connection refused on DataNode: {id=2, internalEndPoint=TEndPoint(ip:127.0.0.1, port:10732)}.` + - Cause by: There are some DataNodes with the status Running cannot be connected. + - Solution: Execute `SHOW DATANODES`, and check for unreachable DataNode networks, or wait for their status to change to Unknown and re-execute the statement. +- Execute `DROP PIPE p` get message `Fail to DROP_PIPE because Fail to drop PIPE [p] because Connection refused on DataNode: {id=2, internalEndPoint=TEndPoint(ip:127.0.0.1, port:10732)}. Please execute [DROP PIPE p] later to retry.` + - Cause by: There are some DataNodes with the status Running cannot be connected. Pipe has been deleted on some nodes and the status has been set to ***DROP***. + - Solution: Execute `SHOW DATANODES`, and check for unreachable DataNode networks, or wait for their status to change to Unknown and re-execute the statement. 
+- Sync.log prompts `org.apache.iotdb.commons.exception.IoTDBException: root.** already been created as database` + - Caused by: The synchronization tool attempts to automatically create the sender's database on the receiver; this is normal behavior. + - Solution: No intervention is required. diff --git a/src/UserGuide/V2.0.1/Tree/stage/Environmental-Requirement.md b/src/UserGuide/V2.0.1/Tree/stage/Environmental-Requirement.md new file mode 100644 index 00000000..1798c9b6 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Environmental-Requirement.md @@ -0,0 +1,33 @@ + + +# Environmental Requirement + +To use IoTDB, you need to have: + +* Java >= 1.8 (1.8 and 11 to 17 have been verified; please make sure the environment path has been set.) +* Maven >= 3.6 (if you want to install IoTDB by compiling the source code) +* Set the maximum number of open files to 65535 (e.g., `ulimit -n 65535`) to avoid the "too many open files" problem. +* (Optional) Set somaxconn to 65535 to avoid "connection reset" errors when the system is under high load. + + +> **# Linux**
`sudo sysctl -w net.core.somaxconn=65535`
**# FreeBSD or Darwin**
`sudo sysctl -w kern.ipc.somaxconn=65535` + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Features.md b/src/UserGuide/V2.0.1/Tree/stage/Features.md new file mode 100644 index 00000000..f7af7131 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Features.md @@ -0,0 +1,58 @@ + + +# Features + + +* Flexible deployment. + +IoTDB provides users one-click installation tool on the cloud, once-decompressed-used terminal tool and the bridging tool between cloud platforms and terminal tools (Data Synchronization Tool). + +* Low storage cost. + +IoTDB can reach a high compression ratio of disk storage, which means IoTDB can store the same amount of data with less hardware disk cost. + +* Efficient directory structure. + +IoTDB supports efficient oganization for complex timeseries data structure from intelligent networking devices, oganization for timeseries data from devices of the same type, fuzzy searching strategy for massive and complex directory of timeseries data. +* High-throughput read and write. + +IoTDB supports millions of low-power devices' strong connection data access, high-speed data read and write for intelligent networking devices and mixed devices mentioned above. + +* Rich query semantics. + +IoTDB supports time alignment for timeseries data accross devices and sensors, computation in timeseries field (frequency domain transformation) and rich aggregation function support in time dimension. + +* Easy to get started. + +IoTDB supports SQL-Like language, JDBC standard API and import/export tools which are easy to use. + +* Intense integration with Open Source Ecosystem. + +IoTDB supports Hadoop, Spark, etc. analysis ecosystems and Grafana visualization tool. + +* Unified data access mode + +IoTDB eliminates the need for database partitioning or sharding and makes no distinction between historical and real-time databases. + +* High availability support + +IoTDB supports a HA distributed architecture, ensuring 7x24 uninterrupted real-time database services. Users can connect to any node within the cluster for system access. The system remains operational and unaffected even during physical node outages or network failures. As physical nodes are added, removed, or face performance issues, IoTDB automatically manages load balancing for both computational and storage resources. Furthermore, it's compatible with heterogeneous environments, allowing servers of varying types and capabilities to form a cluster, with load balancing optimized based on the specific configurations of each server. diff --git a/src/UserGuide/V2.0.1/Tree/stage/Files.md b/src/UserGuide/V2.0.1/Tree/stage/Files.md new file mode 100644 index 00000000..4e79555b --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Files.md @@ -0,0 +1,128 @@ + + +# Files + +In IoTDB, there are many kinds of data needed to be stored. This section introduces IoTDB's data storage strategy to provide you an explicit understanding of IoTDB's data management. + +The data in IoTDB is divided into three categories, namely data files, system files, and pre-write log files. + +## Data Files +> under directory basedir/data/ + +Data files store all the data that the user wrote to IoTDB, which contains TsFile and other files. TsFile storage directory can be configured with the `data_dirs` configuration item (see [file layer](../Reference/DataNode-Config-Manual.md) for details). 
Other files can be configured through [data_dirs](../Reference/DataNode-Config-Manual.md) configuration item (see [Engine Layer](../Reference/DataNode-Config-Manual.md) for details). + +In order to support users' storage requirements such as disk space expansion better, IoTDB supports multiple file directories storage methods for TsFile storage configuration. Users can set multiple storage paths as data storage locations( see [data_dirs](../Reference/DataNode-Config-Manual.md) configuration item), and you can specify or customize the directory selection strategy (see [multi_dir_strategy](../Reference/DataNode-Config-Manual.md) configuration item for details). + +### TsFile +> under directory data/sequence or unsequence/{DatabaseName}/{DataRegionId}/{TimePartitionId}/ + +1. {time}-{version}-{inner_compaction_count}-{cross_compaction_count}.tsfile + + normal data file +2. {TsFileName}.tsfile.mod + + modification file + + record delete operation + +### TsFileResource +1. {TsFileName}.tsfile.resource + + descriptor and statistic file of a TsFile + +### Compaction Related Files +> under directory basedir/data/sequence or unsequence/{DatabaseName}/ + +1. file suffixe with `.cross ` or `.inner` + + temporary files of metadata generated in a compaction task +2. file suffixe with `.inner-compaction.log` or `.cross-compaction.log` + + record the progress of a compaction task +3. file suffixe with `.compaction.mods` + + modification file generated during a compaction task +4. file suffixe with `.meta` + + temporary files of metadata generated during a merge + +## System files + +System files include schema files, which store metadata information of data in IoTDB. It can be configured through the `base_dir` configuration item (see [System Layer](../Reference/DataNode-Config-Manual.md) for details). + +### MetaData Related Files +> under directory basedir/system/schema + +#### Meta +1. mlog.bin + + record the meta operation +2. mtree-1.snapshot + + snapshot of metadata +3. mtree-1.snapshot.tmp + + temp file, to avoid damaging the snapshot when updating it + +#### Tags&Attributes +1. tlog.txt + + store tags and attributes of each TimeSeries + + about 700 bytes for each TimeSeries + +### Other System Files +#### Version +> under directory basedir/system/database/{DatabaseName}/{TimePartitionId} or upgrade + +1. Version-{version} + + version file, record the max version in fileName of a storage group + +#### Upgrade +> under directory basedir/system/upgrade + +1. upgrade.txt + + record which files have been upgraded + +#### Authority +> under directory basedir/system/users/ +> under directory basedir/system/roles/ + +#### CompressRatio +> under directory basedir/system/compression_ration +1. Ration-{compressionRatioSum}-{calTimes} + + record compression ratio of each tsfile +## Pre-write Log Files + +Pre-write log files store WAL files. It can be configured through the `wal_dir` configuration item (see [System Layer](../Reference/DataNode-Config-Manual.md) for details). + +> under directory basedir/wal + +1. {DatabaseName}-{TsFileName}/wal1 + + every storage group has several wal files, and every memtable has one associated wal file before it is flushed into a TsFile +## Example of Setting Data storage Directory + +For a clearer understanding of configuring the data storage directory, we will give an example in this section. 
+ +The data directory paths involved in the storage directory settings are base_dir, data_dirs, multi_dir_strategy, and wal_dir, which refer to the system files, data folders, directory selection strategy, and pre-write log files respectively. + +An example of the configuration items is as follows: + +``` +dn_system_dir = $IOTDB_HOME/data/datanode/system +dn_data_dirs = /data1/datanode/data, /data2/datanode/data, /data3/datanode/data +dn_multi_dir_strategy=MaxDiskUsableSpaceFirstStrategy +dn_wal_dirs= $IOTDB_HOME/data/datanode/wal +``` +After setting the configuration, the system will: + +* Save all system files in $IOTDB_HOME/data/datanode/system +* Save TsFiles in /data1/datanode/data, /data2/datanode/data, and /data3/datanode/data, selected with the `MaxDiskUsableSpaceFirstStrategy`: when data is written to disk, the system automatically chooses the directory with the largest remaining disk space. +* Save WAL data in $IOTDB_HOME/data/datanode/wal diff --git a/src/UserGuide/V2.0.1/Tree/stage/Flink-SQL-IoTDB.md b/src/UserGuide/V2.0.1/Tree/stage/Flink-SQL-IoTDB.md new file mode 100644 index 00000000..e964807e --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Flink-SQL-IoTDB.md @@ -0,0 +1,527 @@ +# Apache Flink (SQL) + +The flink-sql-iotdb-connector seamlessly connects Flink SQL or the Flink Table API with IoTDB, enabling real-time read and write operations on IoTDB within Flink tasks. It can be applied to the following scenarios: + +1. Real-time data synchronization: synchronizing data from one database to another in real time. +2. Real-time data pipelines: building real-time pipelines that process and analyze data in databases. +3. Real-time data analysis: analyzing data in databases in real time, providing real-time business insights. +4. Real-time applications: using database data in real-time applications such as real-time reporting and recommendations. +5. Real-time monitoring: monitoring database data in real time and detecting anomalies and errors. + +## Read and Write Modes + +| Read Modes (Source) | Write Modes (Sink) | +| ------------------------- | -------------------------- | +| Bounded Scan, Lookup, CDC | Streaming Sink, Batch Sink | + +### Read Modes (Source) + +* **Bounded Scan:** A bounded scan queries data by specifying the `time series` and, optionally, `upper and lower bounds of the query conditions`; the result usually consists of multiple rows. This type of query cannot retrieve data that is updated after the query. + +* **Lookup:** The lookup query mode differs from the scan query mode: while a bounded scan queries data within a time range, a `lookup` query only retrieves data at a precise time point, resulting in a single row. In addition, only the right table of a `lookup join` can use the lookup query mode. + +* **CDC:** CDC is mainly used in Flink's ETL tasks. When data in IoTDB changes, Flink can detect it through the provided CDC connector, and the detected change data can be forwarded to other external data sources for ETL purposes. + +### Write Modes (Sink) + +* **Streaming Sink:** Used in Flink's streaming mode; it synchronizes the insert, update, and delete records of the Flink Dynamic Table to IoTDB in real time. + +* **Batch Sink:** Used in Flink's batch mode; it writes the batch computation results from Flink to IoTDB in a single operation. + +## Usage + +We provide two ways to use the flink-sql-iotdb-connector.
One is to reference it through Maven during project development, and the other is to use it in Flink's sql-client. We will introduce these two usage methods separately. + +> 📌 Note: flink version requires 1.17.0 and above. + +### Maven + +Simply add the following dependency to your project's pom file: + +```xml + + org.apache.iotdb + flink-sql-iotdb-connector + ${iotdb.version} + +``` + +### sql-client + +If you want to use the flink-sql-iotdb-connector in the sql-client, follow these steps to configure the environment: + +1. Download the flink-sql-iotdb-connector jar file with dependencies from the [official website](https://iotdb.apache.org/Download/). + +2. Copy the jar file to the `$FLINK_HOME/lib` directory. + +3. Start the Flink cluster. + +4. Start the sql-client. + +You can now use the flink-sql-iotdb-connector in the sql-client. + +## Table Structure Specification + +Regardless of the type of connector used, the following table structure specifications must be met: + +- For all tables using the `IoTDB connector`, the first column must be named `Time_` and have a data type of `BIGINT`. +- All column names, except for the `Time_` column, must start with `root.`. Additionally, any node in the column name cannot be purely numeric. If there are purely numeric or other illegal characters in the column name, they must be enclosed in backticks. For example, the path `root.sg.d0.123` is an illegal path, but `root.sg.d0.`123`` is a valid path. +- When querying data from IoTDB using either `pattern` or `sql`, the time series names in the query result must include all column names in Flink, except for `Time_`. If there is no corresponding column name in the query result, that column will be filled with null. +- The supported data types in flink-sql-iotdb-connector are: `INT`, `BIGINT`, `FLOAT`, `DOUBLE`, `BOOLEAN`, `STRING`. The data type of each column in Flink Table must match the corresponding time series type in IoTDB, otherwise an error will occur and the Flink task will exit. + +The following examples illustrate the mapping between time series in IoTDB and columns in Flink Table. + +## Read Mode (Source) + +### Scan Table (Bounded) + +#### Parameters + +| Parameter | Required | Default | Type | Description | +| ------------------------- | -------- | --------------- | ------ | ------------------------------------------------------------ | +| nodeUrls | No | 127.0.0.1:6667 | String | Specifies the datanode addresses of IoTDB. If IoTDB is deployed in cluster mode, multiple addresses can be specified, separated by commas. | +| user | No | root | String | IoTDB username | +| password | No | root | String | IoTDB password | +| scan.bounded.lower-bound | No | -1L | Long | Lower bound (inclusive) of the timestamp for bounded scan queries. Valid when the parameter is greater than `0`. | +| scan.bounded.upper-bound | No | -1L | Long | Upper bound (inclusive) of the timestamp for bounded scan queries. Valid when the parameter is greater than `0`. | +| sql | Yes | None | String | Query to be executed in IoTDB. 
| + +#### Example + +This example demonstrates how to read data from IoTDB using the `scan table` method in a Flink Table Job: + +Assume the data in IoTDB is as follows: +```text +IoTDB> select ** from root; ++-----------------------------+-------------+-------------+-------------+ +| Time|root.sg.d0.s0|root.sg.d1.s0|root.sg.d1.s1| ++-----------------------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 1.0833644| 2.34874| 1.2414109| +|1970-01-01T08:00:00.002+08:00| 4.929185| 3.1885583| 4.6980085| +|1970-01-01T08:00:00.003+08:00| 3.5206156| 3.5600138| 4.8080945| +|1970-01-01T08:00:00.004+08:00| 1.3449302| 2.8781595| 3.3195343| +|1970-01-01T08:00:00.005+08:00| 3.3079383| 3.3840187| 3.7278645| ++-----------------------------+-------------+-------------+-------------+ +Total line number = 5 +It costs 0.028s +``` + +```java +import org.apache.flink.table.api.*; + +public class BoundedScanTest { + public static void main(String[] args) throws Exception { + // setup table environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inStreamingMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + // setup schema + Schema iotdbTableSchema = + Schema.newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + // register table + TableDescriptor iotdbDescriptor = + TableDescriptor.forConnector("IoTDB") + .schema(iotdbTableSchema) + .option("nodeUrls", "127.0.0.1:6667") + .option("sql", "select ** from root") + .build(); + tableEnv.createTemporaryTable("iotdbTable", iotdbDescriptor); + + // output table + tableEnv.from("iotdbTable").execute().print(); + } +} +``` +After executing the above job, the output table in the Flink console is as follows: +```text ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +| op | Time_ | root.sg.d0.s0 | root.sg.d1.s0 | root.sg.d1.s1 | ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +| +I | 1 | 1.0833644 | 2.34874 | 1.2414109 | +| +I | 2 | 4.929185 | 3.1885583 | 4.6980085 | +| +I | 3 | 3.5206156 | 3.5600138 | 4.8080945 | +| +I | 4 | 1.3449302 | 2.8781595 | 3.3195343 | +| +I | 5 | 3.3079383 | 3.3840187 | 3.7278645 | ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +``` + +### Lookup Point + +#### Parameters + +| Parameter | Required | Default | Type | Description | +| ------------------------ | -------- | --------------- | ------- | --------------------------------------------------------------------------- | +| nodeUrls | No | 127.0.0.1:6667 | String | Specifies the addresses of the IoTDB datanode. If IoTDB is deployed in cluster mode, multiple addresses can be specified, separated by commas. | +| user | No | root | String | IoTDB username | +| password | No | root | String | IoTDB password | +| lookup.cache.max-rows | No | -1 | Integer | Maximum number of rows to cache for lookup queries. Effective when the parameter is greater than `0`. | +| lookup.cache.ttl-sec | No | -1 | Integer | Time-to-live for cached data in lookup queries, in seconds. | +| sql | Yes | None | String | SQL query to execute in IoTDB. 
| + +#### Example + +This example demonstrates how to perform a `lookup` query using the `device` table in IoTDB as a dimension table: + +* Use the `datagen connector` to generate two fields as the left table for `Lookup Join`. The first field is an incrementing field representing the timestamp. The second field is a random field representing a measurement time series. +* Register a table using the `IoTDB connector` as the right table for `Lookup Join`. +* Join the two tables together. + +The current data in IoTDB is as follows: + +```text +IoTDB> select ** from root; ++-----------------------------+-------------+-------------+-------------+ +| Time|root.sg.d0.s0|root.sg.d1.s0|root.sg.d1.s1| ++-----------------------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 1.0833644| 2.34874| 1.2414109| +|1970-01-01T08:00:00.002+08:00| 4.929185| 3.1885583| 4.6980085| +|1970-01-01T08:00:00.003+08:00| 3.5206156| 3.5600138| 4.8080945| +|1970-01-01T08:00:00.004+08:00| 1.3449302| 2.8781595| 3.3195343| +|1970-01-01T08:00:00.005+08:00| 3.3079383| 3.3840187| 3.7278645| ++-----------------------------+-------------+-------------+-------------+ +Total line number = 5 +It costs 0.028s +``` +```java +import org.apache.flink.table.api.DataTypes; +import org.apache.flink.table.api.EnvironmentSettings; +import org.apache.flink.table.api.Schema; +import org.apache.flink.table.api.TableDescriptor; +import org.apache.flink.table.api.TableEnvironment; + +public class LookupTest { + public static void main(String[] args) { + // Setup environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inStreamingMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + + // Register left table + Schema dataGenTableSchema = + Schema.newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("s0", DataTypes.INT()) + .build(); + + TableDescriptor datagenDescriptor = + TableDescriptor.forConnector("datagen") + .schema(dataGenTableSchema) + .option("fields.Time_.kind", "sequence") + .option("fields.Time_.start", "1") + .option("fields.Time_.end", "5") + .option("fields.s0.min", "1") + .option("fields.s0.max", "1") + .build(); + tableEnv.createTemporaryTable("leftTable", datagenDescriptor); + + // Register right table + Schema iotdbTableSchema = + Schema.newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + + TableDescriptor iotdbDescriptor = + TableDescriptor.forConnector("IoTDB") + .schema(iotdbTableSchema) + .option("sql", "select ** from root") + .build(); + tableEnv.createTemporaryTable("rightTable", iotdbDescriptor); + + // Join + String sql = + "SELECT l.Time_, l.s0, r.`root.sg.d0.s0`, r.`root.sg.d1.s0`, r.`root.sg.d1.s1` " + + "FROM (SELECT *, PROCTIME() AS proc_time FROM leftTable) AS l " + + "JOIN rightTable FOR SYSTEM_TIME AS OF l.proc_time AS r " + + "ON l.Time_ = r.Time_"; + + // Output table + tableEnv.sqlQuery(sql).execute().print(); + } +} +``` + +After executing the above task, the output table in Flink's console is as follows: +```text ++----+----------------------+-------------+---------------+----------------------+--------------------------------+ +| op | Time_ | s0 | root.sg.d0.s0 | root.sg.d1.s0 | root.sg.d1.s1 | ++----+----------------------+-------------+---------------+----------------------+--------------------------------+ +| +I | 5 | 1 | 3.3079383 | 3.3840187 | 
3.7278645 | +| +I | 2 | 1 | 4.929185 | 3.1885583 | 4.6980085 | +| +I | 1 | 1 | 1.0833644 | 2.34874 | 1.2414109 | +| +I | 4 | 1 | 1.3449302 | 2.8781595 | 3.3195343 | +| +I | 3 | 1 | 3.5206156 | 3.5600138 | 4.8080945 | ++----+----------------------+-------------+---------------+----------------------+--------------------------------+ +``` +### CDC + +#### Parameters + +| Parameter | Required | Default | Type | Description | +| --------------- | -------- | --------------- | ------- | --------------------------------------------------------------------------- | +| nodeUrls | No | 127.0.0.1:6667 | String | Specifies the datanode address of IoTDB. If IoTDB is deployed in cluster mode, multiple addresses can be specified, separated by commas. | +| user | No | root | String | IoTDB username | +| password | No | root | String | IoTDB password | +| mode | Yes | BOUNDED | ENUM | **This parameter must be set to `CDC` in order to start** | +| sql | Yes | None | String | SQL query to be executed in IoTDB | +| cdc.port | No | 8080 | Integer | Port number for the CDC service in IoTDB | +| cdc.task.name | Yes | None | String | Required when the mode parameter is set to CDC. Used to create a Pipe task in IoTDB. | +| cdc.pattern | Yes | None | String | Required when the mode parameter is set to CDC. Used as a filtering condition for sending data in IoTDB. | + +#### Example + +This example demonstrates how to retrieve the changing data from a specific path in IoTDB using the `CDC Connector`: + +* Create a `CDC` table using the `CDC Connector`. +* Print the `CDC` table. + +```java +import org.apache.flink.table.api.*; + +public class CDCTest { + public static void main(String[] args) { + // setup environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inStreamingMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + // setup schema + Schema iotdbTableSchema = Schema + .newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + + // register table + TableDescriptor iotdbDescriptor = TableDescriptor + .forConnector("IoTDB") + .schema(iotdbTableSchema) + .option("mode", "CDC") + .option("cdc.task.name", "test") + .option("cdc.pattern", "root.sg") + .build(); + tableEnv.createTemporaryTable("iotdbTable", iotdbDescriptor); + + // output table + tableEnv.from("iotdbTable").execute().print(); + } +} +``` +Run the above Flink CDC task and execute the following SQL in IoTDB-cli: +```sql +insert into root.sg.d1(timestamp,s0,s1) values(6,1.0,1.0); +insert into root.sg.d1(timestamp,s0,s1) values(7,1.0,1.0); +insert into root.sg.d1(timestamp,s0,s1) values(6,2.0,1.0); +insert into root.sg.d0(timestamp,s0) values(7,2.0); +``` +The console of Flink will print the following data: +```text ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +| op | Time_ | root.sg.d0.s0 | root.sg.d1.s0 | root.sg.d1.s1 | ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +| +I | 7 | | 1.0 | 1.0 | +| +I | 6 | | 1.0 | 1.0 | +| +I | 6 | | 2.0 | 1.0 | +| +I | 7 | 2.0 | | | +``` +## Write Mode (Sink) + +### Streaming Sink + +#### Parameters + +| Parameter | Required | Default | Type | Description | +| ----------| -------- | --------------- | ------- | 
--------------------------------------------------------------------------- | +| nodeUrls | No | 127.0.0.1:6667 | String | Specifies the datanode address of IoTDB. If IoTDB is deployed in cluster mode, multiple addresses can be specified, separated by commas. | +| user | No | root | String | IoTDB username | +| password | No | root | String | IoTDB password | +| aligned | No | false | Boolean | Whether to call the `aligned` interface when writing data to IoTDB. | + +#### Example + +This example demonstrates how to write data to IoTDB in a Flink Table Streaming Job: + +* Generate a source data table using the `datagen connector`. +* Register an output table using the `IoTDB connector`. +* Insert data from the source table into the output table. + +```java +import org.apache.flink.table.api.DataTypes; +import org.apache.flink.table.api.EnvironmentSettings; +import org.apache.flink.table.api.Schema; +import org.apache.flink.table.api.Table; +import org.apache.flink.table.api.TableDescriptor; +import org.apache.flink.table.api.TableEnvironment; + +public class StreamingSinkTest { + public static void main(String[] args) { + // setup environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inStreamingMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + + // create data source table + Schema dataGenTableSchema = Schema + .newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + TableDescriptor descriptor = TableDescriptor + .forConnector("datagen") + .schema(dataGenTableSchema) + .option("rows-per-second", "1") + .option("fields.Time_.kind", "sequence") + .option("fields.Time_.start", "1") + .option("fields.Time_.end", "5") + .option("fields.root.sg.d0.s0.min", "1") + .option("fields.root.sg.d0.s0.max", "5") + .option("fields.root.sg.d1.s0.min", "1") + .option("fields.root.sg.d1.s0.max", "5") + .option("fields.root.sg.d1.s1.min", "1") + .option("fields.root.sg.d1.s1.max", "5") + .build(); + // register source table + tableEnv.createTemporaryTable("dataGenTable", descriptor); + Table dataGenTable = tableEnv.from("dataGenTable"); + + // create iotdb sink table + TableDescriptor iotdbDescriptor = TableDescriptor + .forConnector("IoTDB") + .schema(dataGenTableSchema) + .build(); + tableEnv.createTemporaryTable("iotdbSinkTable", iotdbDescriptor); + + // insert data + dataGenTable.executeInsert("iotdbSinkTable").print(); + } +} +``` + +After the above job is executed, the query result in the IoTDB CLI is as follows: + +```text +IoTDB> select ** from root; ++-----------------------------+-------------+-------------+-------------+ +| Time|root.sg.d0.s0|root.sg.d1.s0|root.sg.d1.s1| ++-----------------------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 1.0833644| 2.34874| 1.2414109| +|1970-01-01T08:00:00.002+08:00| 4.929185| 3.1885583| 4.6980085| +|1970-01-01T08:00:00.003+08:00| 3.5206156| 3.5600138| 4.8080945| +|1970-01-01T08:00:00.004+08:00| 1.3449302| 2.8781595| 3.3195343| +|1970-01-01T08:00:00.005+08:00| 3.3079383| 3.3840187| 3.7278645| ++-----------------------------+-------------+-------------+-------------+ +Total line number = 5 +It costs 0.054s +``` +### Batch Sink + +#### Parameters + +| Parameter | Required | Default | Type | Description | +| --------- | -------- | --------------- | ------- | ------------------------------------------------------------ | 
+| nodeUrls | No | 127.0.0.1:6667 | String | Specifies the addresses of datanodes in IoTDB. If IoTDB is deployed in cluster mode, multiple addresses can be specified, separated by commas. | +| user | No | root | String | IoTDB username | +| password | No | root | String | IoTDB password | +| aligned | No | false | Boolean | Whether to call the `aligned` interface when writing data to IoTDB. | + +#### Example + +This example demonstrates how to write data to IoTDB in a Batch Job of a Flink Table: + +* Generate a source table using the `IoTDB connector`. +* Register an output table using the `IoTDB connector`. +* Write the renamed columns from the source table back to IoTDB. + +```java +import org.apache.flink.table.api.DataTypes; +import org.apache.flink.table.api.EnvironmentSettings; +import org.apache.flink.table.api.Schema; +import org.apache.flink.table.api.Table; +import org.apache.flink.table.api.TableDescriptor; +import org.apache.flink.table.api.TableEnvironment; + +import static org.apache.flink.table.api.Expressions.$; + +public class BatchSinkTest { + public static void main(String[] args) { + // setup environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inBatchMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + + // create source table + Schema sourceTableSchema = Schema + .newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + TableDescriptor sourceTableDescriptor = TableDescriptor + .forConnector("IoTDB") + .schema(sourceTableSchema) + .option("sql", "select ** from root.sg.d0,root.sg.d1") + .build(); + + tableEnv.createTemporaryTable("sourceTable", sourceTableDescriptor); + Table sourceTable = tableEnv.from("sourceTable"); + // register sink table + Schema sinkTableSchema = Schema + .newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d2.s0", DataTypes.FLOAT()) + .column("root.sg.d3.s0", DataTypes.FLOAT()) + .column("root.sg.d3.s1", DataTypes.FLOAT()) + .build(); + TableDescriptor sinkTableDescriptor = TableDescriptor + .forConnector("IoTDB") + .schema(sinkTableSchema) + .build(); + tableEnv.createTemporaryTable("sinkTable", sinkTableDescriptor); + + // insert data + sourceTable.renameColumns( + $("root.sg.d0.s0").as("root.sg.d2.s0"), + $("root.sg.d1.s0").as("root.sg.d3.s0"), + $("root.sg.d1.s1").as("root.sg.d3.s1") + ).insertInto("sinkTable").execute().print(); + } +} +``` + +After the above task is executed, the query result in the IoTDB cli is as follows: + +```text +IoTDB> select ** from root; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d0.s0|root.sg.d1.s0|root.sg.d1.s1|root.sg.d2.s0|root.sg.d3.s0|root.sg.d3.s1| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 1.0833644| 2.34874| 1.2414109| 1.0833644| 2.34874| 1.2414109| +|1970-01-01T08:00:00.002+08:00| 4.929185| 3.1885583| 4.6980085| 4.929185| 3.1885583| 4.6980085| +|1970-01-01T08:00:00.003+08:00| 3.5206156| 3.5600138| 4.8080945| 3.5206156| 3.5600138| 4.8080945| +|1970-01-01T08:00:00.004+08:00| 1.3449302| 2.8781595| 3.3195343| 1.3449302| 2.8781595| 3.3195343| +|1970-01-01T08:00:00.005+08:00| 3.3079383| 3.3840187| 3.7278645| 3.3079383| 3.3840187| 3.7278645| 
++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +Total line number = 5 +It costs 0.015s +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/General-SQL-Statements.md b/src/UserGuide/V2.0.1/Tree/stage/General-SQL-Statements.md new file mode 100644 index 00000000..2e35f600 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/General-SQL-Statements.md @@ -0,0 +1,160 @@ + + +# General SQL Statements + +## Database Management + +Database is similar to the database in the relational database, which is a collection of structured time series data. + +### create database + +Create a database named `root.ln` with the following syntax: +```sql +CREATE DATABASE root.ln +``` +### show databases + +View all databases: + +```sql +SHOW DATABASES +``` +### delete database + +Drop the database named `root.ln`: +```sql +DELETE DATABASE root.ln +``` +### count databases + +```sql +COUNT DATABASES +``` +## Time Series Management + +Time series is a collection of data points indexed by time. In IoTDB, time series refers to a complete sequence of measurement points. This section mainly introduces the management of time series. + +### create timeseries + +The encoding method and data type need to be specified. For example, create a time series named `root.ln.wf01.wt01.temperature`: +```sql +CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH datatype=FLOAT,ENCODING=RLE +``` + +### show timeseries + +View all time series: +```sql +SHOW TIMESERIES +``` + +Use wildcards to match time series under database `root.ln`: + +```sql +SHOW TIMESERIES root.ln.** +``` +### delete timeseries + +Delete a time series named `root.ln.wf01.wt01.temperature`: +```sql +DELETE TIMESERIES root.ln.wf01.wt01.temperature +``` +### count timeseries + +Count the total number of time series: +```sql +COUNT TIMESERIES root.** +``` +Count the number of time series under a wildcard path: +```sql +COUNT TIMESERIES root.ln.** +``` +## Time Series Path Management + +In addition to the concept of time series, IoTDB also has the concepts of subpaths and devices. + +**Subpath**: It is a part of the path in a complete time series name. For example, if the time series name is `root.ln.wf01.wt01.temperature`, then `root.ln`, `root.ln.wf01`, and `root.ln.wf01.wt01` are all its subpaths. + +**Device**: It is a combination of a group of time series. In IoTDB, the device is a subpath from the root to the penultimate node. If the time series name is `root.ln.wf01.wt01.temperature`, then `root.ln.wf01.wt01` is its device. + +### show devices + +```sql +SHOW DEVICES +``` + +### show child paths + +Check out the next level of `root.ln`: +```sql +SHOW CHILD PATHS root.ln +``` +### show child nodes + +```sql +SHOW CHILD NODES root.ln +``` +### count devices + +Count the number of devices: +```sql +COUNT DEVICES +``` +### count nodes + +Count the number of nodes at the specified level in the path: +```sql +COUNT NODES root.ln.** LEVEL=2 +``` +## Query Data + +The following are commonly used query statements in IoTDB. 
+ +### Query the data of the specified time series + +Query all time series data under the device `root.ln.wf01.wt01`: + +```sql +SELECT * FROM root.ln.wf01.wt01 +``` + +### Query time series data within a certain time range + +Query the data in the time series `root.ln.wf01.wt01.temperature` whose timestamp is greater than 2022-01-01T00:05:00.000: + +```sql +SELECT temperature FROM root.ln.wf01.wt01 WHERE time > 2022-01-01T00:05:00.000 +``` + +### Query time series data whose values are within the specified range + +Query the data whose value is greater than 36.5 in the time series `root.ln.wf01.wt01.temperature`: + +```sql +SELECT temperature FROM root.ln.wf01.wt01 WHERE temperature > 36.5 +``` + +### Use last to query the latest point data + +```sql +SELECT last * FROM root.ln.wf01.wt01 +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Integration-Test/Integration-Test-refactoring-tutorial.md b/src/UserGuide/V2.0.1/Tree/stage/Integration-Test/Integration-Test-refactoring-tutorial.md new file mode 100644 index 00000000..3522a327 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Integration-Test/Integration-Test-refactoring-tutorial.md @@ -0,0 +1,240 @@ + + +# Developer Document for Integration Test + +**Integration test** is one of the phases in software testing, when different software modules are put together and tested as a whole. Integration tests are for evaluating whether a system or component meets the target functional requirements. + + +## Apache IoTDB Integration Test Criteria + +### The Environment of the integration test of Apache IoTDB + + +There are three kinds of environments for Apache IoTDB integration test, correspondingly **local standalone, Cluster, and remote.** The integration test should be conducted on at least one of them. Details of the three kinds are as follows. + +1. Local standalone. It is set up for integration testing of IoTDB, the standalone version. Any change of the configurations of IoTDB would require updating the configuration files before starting the database. +2. Cluster. It is set up for integration testing of IoTDB, the distribution version (pseudo-distribution). Any change of the configurations of IoTDB would require updating the configuration files before starting the database. +3. Remote. It is set up for the integration testing of a remote IoTDB instance, which could be either a standalone instance or a node in a remote cluster. Any change of the configuration is restricted and is not allowed currently. + +Integration test developers need to specify at least one of the environments when writing the tests. Please check the details below. + + +### Black-Box Testing + +Black-box testing is a software testing method that evaluates the functionality of a program without regard to its internal structure or how it works. Developers do not need to understand the internal logic of the application for testing. **Apache IoTDB integration tests are conducted as black-box tests. Any test interacting with the system through JDBC or Session API is considered a black-box test case.** Moreover, the validation of the output should also be implemented through the JDBC or Session API. + + +### Steps of an integration test + +Generally, there are three steps to finish the integration test, (1) constructing the test class and annotating the environment, (2) housekeeping to prepare for the test and clean up the environment after the test, and (3) implementing the logic of the integration test. 
To test IoTDB not under the default configuration, the configuration should be changed before the test, which will be introduced in section 4. + + + +#### 1. Integration Test Class (IT Class) and Annotations + +When writing new IT classes, the developers are encouraged to create the new ones in the [integration-test](https://github.com/apache/iotdb/tree/master/integration-test) module. Except for the classes serving the other test cases, the classes containing integration tests to evaluate the functionality of IoTDB should be named "function"+"IT". For example, the test for auto-registration metadata in IoTDB is named “IoTDBAutoCreateSchemaIT”. + +- Category`` Annotation. **When creating new IT classes, the ```@Category``` should be introduced explicitly**, and the test environment should be specified by ```LocalStandaloneIT.class```, ```ClusterIT.class```, and ```RemoteIT.class```, which corresponds to the Local Standalone, Cluster and Remote environment respectively. **In general, ```LocalStandaloneIT.class``` and ```ClusterIT.class``` should both be included**. Only in the case when some functionalities are only supported in the standalone version can we include ```LocalStandaloneIT.class``` solely. +- RunWith Annotation. The ```@RunWith(IoTDBTestRunner.class)``` annotation should be included in every IT class. + + +```java +// Introduce annotations to IoTDBAliasIT.class. The environments include local standalone, cluster and remote. +@RunWith(IoTDBTestRunner.class) +@Category({LocalStandaloneIT.class, ClusterIT.class, RemoteIT.class}) +public class IoTDBAliasIT { + ... +} + +// Introduce annotations to IoTDBAlignByDeviceIT.class. The environments include local standalone and cluster. +@RunWith(IoTDBTestRunner.class) +@Category({LocalStandaloneIT.class, ClusterIT.class}) +public class IoTDBAlignByDeviceIT { + ... +} +``` + +#### 2. Housekeeping to Prepare for the Test and Clean up the Environment after the Test + +Preparations before the test include starting an IoTDB (single or cluster) instance and preparing data for the test. The logic should be implemented in the ```setUp()``` method, and the method should follow the annotation ```@BeforeClass``` or ```@Before```. +The former means that this method is the first method executed for the IT class and is executed only once. The latter indicates that ```setUp()``` will be executed before each test method in the IT class. + +- Please start IoTDB instance through the factor class, i.e., ```EnvFactory.getEnv().initBeforeClass()```. +- Data preparation for the test includes registering databases, registering time series, and writing time series data as required by the test. It is recommended to implement a separate method within the IT class to prepare the data, such as ```insertData()```. +Please try to take advantage of the ```executeBatch()``` in JDBC or ```insertRecords()``` and ```insertTablets()``` in Session API if multiple statements or operations are to be executed. + +```java +@BeforeClass +public static void setUp() throws Exception { + // start an IoTDB instance + EnvFactory.getEnv().initBeforeClass(); + ... // data preparation +} +``` + +After the test, please clean up the environment by shut down the connections that have not been closed. This logic should be implemented in the ```tearDown()``` method. The ```tearDown()``` method follows the annotation ```@AfterClass``` or ```@After```. The former means that this method is the last method to execute for the IT class and is executed only once. 
The latter indicates that ```tearDown()``` will be executed after each test method in the IT class. + +- If the IoTDB connection is declared as an instance variable and is not closed after the test, please explicitly close it in the ```tearDown()``` method. +- The cleaning up should be implemented through the factory class, i.e., ```EnvFactory.getEnv().cleanAfterClass()```. + + +```java +@AfterClass +public static void tearDown() throws Exception { + ... // close the connection + // clean up the environment + EnvFactory.getEnv().cleanAfterClass(); +} +``` + +#### 3. Implementing the logic of IT + +IT of Apache IoTDB should be implemented as black-box testing. Please name the method as "functionality"+"Test", e.g., "selectWithAliasTest". The interaction should be implemented through JDBC or Session API. + +1 With JDBC + +When using the JDBC interface, it is recommended that the connection be established in a try statement. Connections established in this way do not need to be closed in the tearDown method explicitly. Connections need to be established through the factory class, i.e., ```EnvFactory.getEnv().getConnection()```. It is not necessary to specify the IP address or port number. The sample code is shown below. + +```java +@Test +public void someFunctionTest(){ + try (Connection connection = EnvFactory.getEnv().getConnection(); + Statement statement = connection.createStatement()) { + ... // execute the statements and test the correctness + } catch (Exception e) { + e.printStackTrace(); + Assert.fail(); + } +} +``` + +Please note that, +- **It is required to use ```executeQuery()``` to query the data from the database and get the ResultSet.** +- **For updating the database without any return value, it is required to use ```execute()``` method to interact with the database.** +The sample code is as follows. + +```java +@Test +public void exampleTest() throws Exception { + try (Connection connection = EnvFactory.getEnv().getConnection(); + Statement statement = connection.createStatement()) { + // use execute() to set the databases + statement.execute("CREATE DATABASE root.sg"); + // use executeQuery() query the databases + try (ResultSet resultSet = statement.executeQuery("SHOW DATABASES")) { + if (resultSet.next()) { + String storageGroupPath = resultSet.getString("database"); + Assert.assertEquals("root.sg", storageGroupPath); + } else { + Assert.fail("This ResultSet is empty."); + } + } + } +} +``` + +2 With Session API + +Currently, it is not recommended to implement IT with Session API. + +3 Annotations of Environment for the Test Methods + +For test methods, developers can also specify a test environment with the annotation before the method. It is important to note that a case with additional test environment annotations will be tested not only in the specified environment, but also in the environment of the IT class to which the use case belongs. The sample code is as follows. + + +```java +@RunWith(IoTDBTestRunner.class) +@Category({LocalStandaloneIT.class}) +public class IoTDBExampleIT { + + // This case will only be tested in a local stand-alone test environment + @Test + public void theStandaloneCaseTest() { + ... + } + + // The use case will be tested in the local standalone environment, the cluster environment, and the remote test environment. + @Test + @Category({ClusterIT.class, RemoteIT.class}) + public void theAllEnvCaseTest() { + ... + } +} +``` + +#### 4. 
Change the configurations of IoTDB when testing + +Sometimes, the configurations of IoTDB need to be changed in order to test the functionalities under certain conditions. Because changing the configurations on a remote machine is troublesome, configuration modification is not allowed in the remote environment. However, it is allowed in the local standalone and cluster environment. Changes of the configuration files should be implemented in the ```setUp()``` method, before ```EnvFactory.getEnv().initBeforeClass()```, and should be implemented through ConfigFactory. In ```tearDown()``` , please undo all changes of the configurations and revert to its original default settings by ConfigFactory after the environment cleanup (```EnvFactory.getEnv().cleanAfterTest()```). The example code is as follows. + + +```java +@RunWith(IoTDBTestRunner.class) +@Category({LocalStandaloneIT.class, ClusterIT.class}) +public class IoTDBAlignedSeriesQueryIT { + + protected static boolean enableSeqSpaceCompaction; + protected static boolean enableUnseqSpaceCompaction; + protected static boolean enableCrossSpaceCompaction; + + @BeforeClass + public static void setUp() throws Exception { + // get the default configurations + enableSeqSpaceCompaction = ConfigFactory.getConfig().isEnableSeqSpaceCompaction(); + enableUnseqSpaceCompaction = ConfigFactory.getConfig().isEnableUnseqSpaceCompaction(); + enableCrossSpaceCompaction = ConfigFactory.getConfig().isEnableCrossSpaceCompaction(); + // update configurations + ConfigFactory.getConfig().setEnableSeqSpaceCompaction(false); + ConfigFactory.getConfig().setEnableUnseqSpaceCompaction(false); + ConfigFactory.getConfig().setEnableCrossSpaceCompaction(false); + EnvFactory.getEnv().initBeforeClass(); + AlignedWriteUtil.insertData(); + } + + @AfterClass + public static void tearDown() throws Exception { + EnvFactory.getEnv().cleanAfterClass(); + // revert to the default configurations + ConfigFactory.getConfig().setEnableSeqSpaceCompaction(enableSeqSpaceCompaction); + ConfigFactory.getConfig().setEnableUnseqSpaceCompaction(enableUnseqSpaceCompaction); + ConfigFactory.getConfig().setEnableCrossSpaceCompaction(enableCrossSpaceCompaction); + } +} +``` + + +## Q&A +### Ways to check the log after the CI failure +1 click *Details* of the corresponding test + + + +2 check and download the error log + + + +You can also click the *summary* at the upper left and then check and download the error log. + + + +### Commands for running IT + +Please check [Integration Test For the MPP Architecture](https://github.com/apache/iotdb/blob/master/integration-test/README.md) for details. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Interface-Comparison.md b/src/UserGuide/V2.0.1/Tree/stage/Interface-Comparison.md new file mode 100644 index 00000000..4907b825 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Interface-Comparison.md @@ -0,0 +1,50 @@ + + +# Native API Comparison + +This chapter mainly compares the differences between Java Native API and python native API, mainly for the convenience of distinguishing the differences between Java Native API and python native API. + + + +| Order | API name and function | Java API | Python API |
API Comparison
| +| ----- |---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| :----------------------------------------------------------- | ------------------------------------------------------------ | +| 1 | Initialize session | `Session.Builder.build(); Session.Builder().host(String host).port(int port).build(); Session.Builder().nodeUrls(List nodeUrls).build(); Session.Builder().fetchSize(int fetchSize).username(String username).password(String password).thriftDefaultBufferSize(int thriftDefaultBufferSize).thriftMaxFrameSize(int thriftMaxFrameSize).enableRedirection(boolean enableCacheLeader).version(Version version).build();` | `Session(ip, port_, username_, password_,fetch_size=1024, zone_id="UTC+8")` | 1. The python native API lacks the default configuration to initialize the session
2. The python native API is missing the initialization session of specifying multiple connectable nodes
3. The python native API is missing. Use other configuration items to initialize the session | +| 2 | Open session | `void open() void open(boolean enableRPCCompression)` | `session.open(enable_rpc_compression=False)` | | +| 3 | Close session | `void close()` | `session.close()` | | +| 4 | Create Database | `void setStorageGroup(String storageGroupId)` | `session.set_storage_group(group_name)` | | +| 5 | Delete database | `void deleteStorageGroup(String storageGroup) void deleteStorageGroups(List storageGroups)` | `session.delete_storage_group(group_name) session.delete_storage_groups(group_name_lst)` | | +| 6 | Create timeseries | `void createTimeseries(String path, TSDataType dataType,TSEncoding encoding, CompressionType compressor, Map props,Map tags, Map attributes, String measurementAlias) void createMultiTimeseries(List paths, List dataTypes,List encodings, List compressors,List> propsList, List> tagsList,List> attributesList, List measurementAliasList)` | `session.create_time_series(ts_path, data_type, encoding, compressor,props=None, tags=None, attributes=None, alias=None) session.create_multi_time_series(ts_path_lst, data_type_lst, encoding_lst, compressor_lst,props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None)` | | +| 7 | Create aligned timeseries | `void createAlignedTimeseries(String prefixPath, List measurements,List dataTypes, List encodings,CompressionType compressor, List measurementAliasList);` | `session.create_aligned_time_series(device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst)` | | +| 8 | Delete timeseries | `void deleteTimeseries(String path) void deleteTimeseries(List paths)` | `session.delete_time_series(paths_list)` | Python native API is missing an API to delete a time series | +| 9 | Detect whether the timeseries exists | `boolean checkTimeseriesExists(String path)` | `session.check_time_series_exists(path)` | | +| 10 | Schema template | `public void createSchemaTemplate(Template template);` | | | +| 11 | Insert tablet | `void insertTablet(Tablet tablet) void insertTablets(Map tablets)` | `session.insert_tablet(tablet_) session.insert_tablets(tablet_lst)` | | +| 12 | Insert record | `void insertRecord(String prefixPath, long time, List measurements,List types, List values) void insertRecords(List deviceIds,List times,List> measurementsList,List> typesList,List> valuesList) void insertRecordsOfOneDevice(String deviceId, List times,List> valuesList)` | `session.insert_record(device_id, timestamp, measurements_, data_types_, values_) session.insert_records(device_ids_, time_list_, measurements_list_, data_type_list_, values_list_) session.insert_records_of_one_device(device_id, time_list, measurements_list, data_types_list, values_list)` | | +| 13 | Write with type inference | `void insertRecord(String prefixPath, long time, List measurements, List values) void insertRecords(List deviceIds, List times,List> measurementsList, List> valuesList) void insertStringRecordsOfOneDevice(String deviceId, List times,List> measurementsList, List> valuesList)` | `session.insert_str_record(device_id, timestamp, measurements, string_values)` | 1. The python native API lacks an API for inserting multiple records
2. The python native API lacks the ability to insert multiple records belonging to the same device | +| 14 | Write of aligned time series | `insertAlignedRecord insertAlignedRecords insertAlignedRecordsOfOneDevice insertAlignedStringRecordsOfOneDevice insertAlignedTablet insertAlignedTablets` | `insert_aligned_record insert_aligned_records insert_aligned_records_of_one_device insert_aligned_tablet insert_aligned_tablets` | Python native API is missing the writing of aligned time series with judgment type | +| 15 | Data deletion | `void deleteData(String path, long endTime) void deleteData(List paths, long endTime)` | | 1. The python native API lacks an API to delete a piece of data
2. The python native API lacks an API to delete multiple pieces of data | +| 16 | Data query | `SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime) SessionDataSet executeLastDataQuery(List paths, long LastTime)` | | 1. The python native API lacks an API for querying the original data
2. The python native API lacks an API to query the data whose last timestamp is greater than or equal to a certain time point | +| 17 | Iotdb SQL API - query statement | `SessionDataSet executeQueryStatement(String sql)` | `session.execute_query_statement(sql)` | | +| 18 | Iotdb SQL API - non query statement | `void executeNonQueryStatement(String sql)` | `session.execute_non_query_statement(sql)` | | +| 19 | Test API | `void testInsertRecord(String deviceId, long time, List measurements, List values) void testInsertRecord(String deviceId, long time, List measurements,List types, List values) void testInsertRecords(List deviceIds, List times,List> measurementsList, List> valuesList) void testInsertRecords(List deviceIds, List times,List> measurementsList, List> typesList,List> valuesList) void testInsertTablet(Tablet tablet) void testInsertTablets(Map tablets)` | Python client support for testing is based on the testcontainers library | Python API has no native test API | +| 20 | Connection pool for native interfaces | `SessionPool` | | Python API has no connection pool for native API | +| 21 | API related to cluster information | `iotdb-thrift-cluster` | | Python API does not support interfaces related to cluster information | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/IoTDB-Data-Pipe_timecho.md b/src/UserGuide/V2.0.1/Tree/stage/IoTDB-Data-Pipe_timecho.md new file mode 100644 index 00000000..f4f038a8 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/IoTDB-Data-Pipe_timecho.md @@ -0,0 +1,24 @@ + + +# IoTDB Data Pipe + +TODO \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/CSV-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/CSV-Tool.md new file mode 100644 index 00000000..2acb5ff9 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/CSV-Tool.md @@ -0,0 +1,263 @@ + + +# CSV Tool + +The CSV tool can help you import data in CSV format to IoTDB or export data from IoTDB to a CSV file. + +## Usage of export-csv.sh + +### Syntax + +```shell +# Unix/OS X +> tools/export-csv.sh -h -p -u -pw -td [-tf -datatype -q -s -linesPerFile ] + +# Windows +> tools\export-csv.bat -h -p -u -pw -td [-tf -datatype -q -s -linesPerFile ] +``` + +Description: + +* `-datatype`: + - true (by default): print the data type of timesries in the head line of CSV file. i.e., `Time, root.sg1.d1.s1(INT32), root.sg1.d1.s2(INT64)`. + - false: only print the timeseries name in the head line of the CSV file. i.e., `Time, root.sg1.d1.s1 , root.sg1.d1.s2` +* `-q `: + - specifying a query command that you want to execute + - example: `select * from root.** limit 100`, or `select * from root.** limit 100 align by device` +* `-s `: + - specifying a SQL file which can consist of more than one sql. If there are multiple SQLs in one SQL file, the SQLs should be separated by line breaks. And, for each SQL, a output CSV file will be generated. +* `-td `: + - specifying the directory that the data will be exported +* `-tf `: + - specifying a time format that you want. The time format have to obey [ISO 8601](https://calendars.wikia.org/wiki/ISO_8601) standard. If you want to save the time as the timestamp, then setting `-tf timestamp` + - example: `-tf yyyy-MM-dd\ HH:mm:ss` or `-tf timestamp` +* `-linesPerFile `: + - Specifying lines of each dump file, `10000` is default. 
+ - example: `-linesPerFile 1` +* `-t `: + - Specifies the timeout period for session queries, in milliseconds + + +More, if you don't use one of `-s` and `-q`, you need to enter some queries after running the export script. The results of the different query will be saved to different CSV files. + +### example + +```shell +# Unix/OS X +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss +# or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 + +# Windows +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss +# or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 +``` + +### Sample SQL file + +```sql +select * from root.**; +select * from root.** align by device; +``` + +The result of `select * from root.**` + +```sql +Time,root.ln.wf04.wt04.status(BOOLEAN),root.ln.wf03.wt03.hardware(TEXT),root.ln.wf02.wt02.status(BOOLEAN),root.ln.wf02.wt02.hardware(TEXT),root.ln.wf01.wt01.hardware(TEXT),root.ln.wf01.wt01.status(BOOLEAN) +1970-01-01T08:00:00.001+08:00,true,"v1",true,"v1",v1,true +1970-01-01T08:00:00.002+08:00,true,"v1",,,,true +``` + +The result of `select * from root.** align by device` + +```sql +Time,Device,hardware(TEXT),status(BOOLEAN) +1970-01-01T08:00:00.001+08:00,root.ln.wf01.wt01,"v1",true +1970-01-01T08:00:00.002+08:00,root.ln.wf01.wt01,,true +1970-01-01T08:00:00.001+08:00,root.ln.wf02.wt02,"v1",true +1970-01-01T08:00:00.001+08:00,root.ln.wf03.wt03,"v1", +1970-01-01T08:00:00.002+08:00,root.ln.wf03.wt03,"v1", +1970-01-01T08:00:00.001+08:00,root.ln.wf04.wt04,,true +1970-01-01T08:00:00.002+08:00,root.ln.wf04.wt04,,true +``` + +The data of boolean type signed by `true` and `false` without double quotes. And the text data will be enclosed in double quotes. + +### Note + +Note that if fields exported by the export tool have the following special characters: + +1. `,`: the field will be escaped by `\`. 
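For instance, a TEXT value that itself contains a comma is written with the comma escaped. The timeseries name and value below are made up purely for illustration:

```
Time,root.sg1.d1.remark(TEXT)
1970-01-01T08:00:00.001+08:00,"temperature\,humidity"
```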
+ +## Usage of import-csv.sh + +### Create metadata (optional) + +```sql +CREATE DATABASE root.fit.d1; +CREATE DATABASE root.fit.d2; +CREATE DATABASE root.fit.p; +CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; +CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; +``` + +IoTDB has the ability of type inference, so it is not necessary to create metadata before data import. However, we still recommend creating metadata before importing data using the CSV import tool, as this can avoid unnecessary type conversion errors. + +### Sample CSV file to be imported + +The data aligned by time, and headers without data type. + +```sql +Time,root.test.t1.str,root.test.t2.str,root.test.t2.int +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,"123",, +``` + +The data aligned by time, and headers with data type.(Text type data supports double quotation marks and no double quotation marks) + +```sql +Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.int(INT32) +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,123,hello world,123 +1970-01-01T08:00:00.003+08:00,"123",, +1970-01-01T08:00:00.004+08:00,123,,12 +``` + +The data aligned by device, and headers without data type. + +```sql +Time,Device,str,int +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +``` + +The data aligned by device, and headers with data type.(Text type data supports double quotation marks and no double quotation marks) + +```sql +Time,Device,str(TEXT),int(INT32) +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123 +1970-01-01T08:00:00.003+08:00,root.test.t1,,123 +``` + +### Syntax + +```shell +# Unix/OS X +> tools/import-csv.sh -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] +# Windows +> tools\import-csv.bat -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] +``` + +Description: + +* `-f`: + - the CSV file that you want to import, and it could be a file or a folder. If a folder is specified, all TXT and CSV files in the folder will be imported in batches. + - example: `-f filename.csv` + +* `-fd`: + - specifying a directory to save files which save failed lines. If you don't use this parameter, the failed file will be saved at original directory, and the filename will be the source filename with suffix `.failed`. + - example: `-fd ./failed/` + +* `-aligned`: + - whether to use the aligned interface? The option `false` is default. + - example: `-aligned true` + +* `-batch`: + - specifying the point's number of a batch. If the program throw the exception `org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`, you can lower this parameter as appropriate. + - example: `-batch 100000`, `100000` is the default value. + +* `-tp `: + - specifying a time precision. Options includes `ms`(millisecond), `ns`(nanosecond), and `us`(microsecond), `ms` is default. + +* `-typeInfer `: + - specifying rules of type inference. + - Option `srcTsDataType` includes `boolean`,`int`,`long`,`float`,`double`,`NaN`. 
+ - Option `dstTsDataType` includes `boolean`,`int`,`long`,`float`,`double`,`text`. + - When `srcTsDataType` is `boolean`, `dstTsDataType` should be between `boolean` and `text`. + - When `srcTsDataType` is `NaN`, `dstTsDataType` should be among `float`, `double` and `text`. + - When `srcTsDataType` is Numeric type, `dstTsDataType` precision should be greater than `srcTsDataType`. + - example: `-typeInfer boolean=text,float=double` + +* `-linesPerFailedFile `: + - Specifying lines of each failed file, `10000` is default. + - example: `-linesPerFailedFile 1` + +### Example + +```sh +# Unix/OS X +> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed +# or +> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 + +# Windows +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 + +``` + +### Note + +Note that the following special characters in fields need to be checked before importing: + +1. `,` : fields containing `,` should be escaped by `\`. +2. you can input time format like `yyyy-MM-dd'T'HH:mm:ss`, `yyy-MM-dd HH:mm:ss`, or `yyyy-MM-dd'T'HH:mm:ss.SSSZ`. +3. the `Time` column must be the first one. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/IoTDB-Data-Dir-Overview-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/IoTDB-Data-Dir-Overview-Tool.md new file mode 100644 index 00000000..dcfe657c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/IoTDB-Data-Dir-Overview-Tool.md @@ -0,0 +1,82 @@ + + +# IoTDB Data Directory Overview Tool + +IoTDB data directory overview tool is used to print an overview of the IoTDB data directory structure. The location is tools/tsfile/print-iotdb-data-dir. + +## Usage + +- For Windows: + +```bash +.\print-iotdb-data-dir.bat () +``` + +- For Linux or MacOs: + +```shell +./print-iotdb-data-dir.sh () +``` + +Note: if the storage path of the output overview file is not set, the default relative path "IoTDB_data_dir_overview.txt" will be used. + +## Example + +Use Windows in this example: + +`````````````````````````bash +.\print-iotdb-data-dir.bat D:\github\master\iotdb\data\datanode\data +```````````````````````` +Starting Printing the IoTDB Data Directory Overview +```````````````````````` +output save path:IoTDB_data_dir_overview.txt +data dir num:1 +143 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. 
+|============================================================== +|D:\github\master\iotdb\data\datanode\data +|--sequence +| |--root.redirect0 +| | |--1 +| | | |--0 +| |--root.redirect1 +| | |--2 +| | | |--0 +| |--root.redirect2 +| | |--3 +| | | |--0 +| |--root.redirect3 +| | |--4 +| | | |--0 +| |--root.redirect4 +| | |--5 +| | | |--0 +| |--root.redirect5 +| | |--6 +| | | |--0 +| |--root.sg1 +| | |--0 +| | | |--0 +| | | |--2760 +|--unsequence +|============================================================== +````````````````````````` + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/JMX-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/JMX-Tool.md new file mode 100644 index 00000000..a0bd8c08 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/JMX-Tool.md @@ -0,0 +1,59 @@ + + +# JMX Tool + +Java VisualVM is a tool that provides a visual interface for viewing detailed information about Java applications while they are running on a Java Virtual Machine (JVM), and for troubleshooting and profiling these applications. + +## Usage + +Step1: Fetch IoTDB-sever. + +Step2: Edit configuration. + +* IoTDB is LOCAL +View `$IOTDB_HOME/conf/jmx.password`, and use default user or add new users here. +If new users are added, remember to edit `$IOTDB_HOME/conf/jmx.access` and add new users' access + +* IoTDB is not LOCAL +Edit `$IOTDB_HOME/conf/datanode-env.sh`, and modify config below: +``` +JMX_LOCAL="false" +JMX_IP="the_real_iotdb_server_ip" # Write the actual IoTDB IP address +``` +View `$IOTDB_HOME/conf/jmx.password`, and use default user or add new users here. +If new users are added, remember to edit `$IOTDB_HOME/conf/jmx.access` and add new users' access + +Step 3: Start IoTDB-server. + +Step 4: Use jvisualvm +1. Make sure jdk 8 is installed. For versions later than jdk 8, you need to [download visualvm](https://visualvm.github.io/download.html) +2. Open jvisualvm +3. Right-click at the left navigation area -> Add JMX connection + + +4. Fill in information and log in as below. Remember to check "Do not require SSL connection". +An example is: +Connection:192.168.130.15:31999 +Username:iotdb +Password:passw!d + + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Load-Tsfile.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Load-Tsfile.md new file mode 100644 index 00000000..a372680f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Load-Tsfile.md @@ -0,0 +1,111 @@ + + +# Load External TsFile Tool + +## Introduction + +The load external tsfile tool allows users to load tsfiles, delete a tsfile, or move a tsfile to target directory from the running Apache IoTDB instance. Alternatively, you can use scripts to load tsfiles into IoTDB, for more information. + +## Load with SQL + +The user sends specified commands to the Apache IoTDB system through the Cli tool or JDBC to use the tool. + +### load tsfiles + +The command to load tsfiles is `load [sglevel=int][verify=true/false][onSuccess=delete/none]`. + +This command has two usages: + +1. Load a single tsfile by specifying a file path (absolute path). + +The first parameter indicates the path of the tsfile to be loaded. This command has three options: sglevel, verify, onSuccess. + +SGLEVEL option. If the database correspond to the tsfile does not exist, the user can set the level of database through the fourth parameter. By default, it uses the database level which is set in `iotdb-system.properties`. + +VERIFY option. 
If this parameter is true, All timeseries in this loading tsfile will be compared with the timeseries in IoTDB. If existing a measurement which has different datatype with the measurement in IoTDB, the loading process will be stopped and exit. If consistence can be promised, setting false for this parameter will be a better choice. + +ONSUCCESS option. The default value is DELETE, which means the processing method of successfully loaded tsfiles, and DELETE means after the tsfile is successfully loaded, it will be deleted. NONE means after the tsfile is successfully loaded, it will be remained in the origin dir. + +If the `.resource` file corresponding to the file exists, it will be loaded into the data directory and engine of the Apache IoTDB. Otherwise, the corresponding `.resource` file will be regenerated from the tsfile file. + +Examples: + +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true onSuccess=none` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1 onSuccess=delete` + +2. Load a batch of files by specifying a folder path (absolute path). + +The first parameter indicates the path of the tsfile to be loaded. The options above also works for this command. + +Examples: + +* `load '/Users/Desktop/data'` +* `load '/Users/Desktop/data' verify=false` +* `load '/Users/Desktop/data' verify=true` +* `load '/Users/Desktop/data' verify=true sglevel=1` +* `load '/Users/Desktop/data' verify=false sglevel=1 onSuccess=delete` + +**NOTICE**: When `$IOTDB_HOME$/conf/iotdb-system.properties` has `enable_auto_create_schema=true`, it will automatically create metadata in TSFILE, otherwise it will not be created automatically. + +## Load with Script + +Run rewrite-tsfile.bat if you are in a Windows environment, or rewrite-tsfile.sh if you are on Linux or Unix. + +```bash +./load-tsfile.bat -f filePath [-h host] [-p port] [-u username] [-pw password] [--sgLevel int] [--verify true/false] [--onSuccess none/delete] +-f File/Directory to be load, required +-h IoTDB Host address, optional field, 127.0.0.1 by default +-p IoTDB port, optional field, 6667 by default +-u IoTDB user name, optional field, root by default +-pw IoTDB password, optional field, root by default +--sgLevel Sg level of loading Tsfile, optional field, default_storage_group_level in iotdb-system.properties by default +--verify Verify schema or not, optional field, True by default +--onSuccess Delete or remain origin TsFile after loading, optional field, none by default +``` + +### Example + +Assuming that an IoTDB instance is running on server 192.168.0.101:6667, you want to load all TsFile files from the locally saved TsFile backup folder D:\IoTDB\data into this IoTDB instance. + +First move to the folder `$IOTDB_HOME/tools/`, open the command line, and execute + +```bash +./load-rewrite.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root +``` + +After waiting for the script execution to complete, you can check that the data in the IoTDB instance has been loaded correctly. 
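For example, assuming the loaded TsFiles contain data under the `root.ln` prefix (replace it with your own path), a quick spot check from the CLI could be:

```sql
IoTDB> count timeseries root.ln.**
IoTDB> select * from root.ln.** limit 10
```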
+ +### Q&A + +- Cannot find or load the main class + - It may be because the environment variable $IOTDB_HOME is not set, please set the environment variable and try again +- -f option must be set! + - The input command is missing the -f field (file or folder path to be loaded) or the -u field (user name), please add it and re-execute +- What if the execution crashes in the middle and you want to reload? + - You re-execute the command just now, reloading the data will not affect the correctness after loading \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Log-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Log-Tool.md new file mode 100644 index 00000000..fb0ae438 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Log-Tool.md @@ -0,0 +1,68 @@ + + +# System log + +IoTDB allows users to configure IoTDB system logs (such as log output level) by modifying the log configuration file. The default location of the system log configuration file is in \$IOTDB_HOME/conf folder. + +The default log configuration file is named logback.xml. The user can modify the configuration of the system running log by adding or changing the xml tree node parameters. It should be noted that the configuration of the system log using the log configuration file does not take effect immediately after the modification, instead, it will take effect after restarting the system. The usage of logback.xml is just as usual. + +At the same time, in order to facilitate the debugging of the system by the developers and DBAs, we provide several JMX interfaces to dynamically modify the log configuration, and configure the Log module of the system in real time without restarting the system. + +## Dynamic System Log Configuration + +### Connect JMX + +Here we use JConsole to connect with JMX. + +Start the JConsole, establish a new JMX connection with the IoTDB Server (you can select the local process or input the IP and PORT for remote connection, the default operation port of the IoTDB JMX service is 31999). Fig 4.1 shows the connection GUI of JConsole. + + + +After connected, click `MBean` and find `ch.qos.logback.classic.default.ch.qos.logback.classic.jmx.JMXConfigurator`(As shown in fig 4.2). + + +In the JMXConfigurator Window, there are 6 operations provided, as shown in fig 4.3. You can use these interfaces to perform operation. + + + +### Interface Instruction + +* reloadDefaultConfiguration + +This method is to reload the default logback configuration file. The user can modify the default configuration file first, and then call this method to reload the modified configuration file into the system to take effect. + +* reloadByFileName + +This method loads a logback configuration file with the specified path and name, and then makes it take effect. This method accepts a parameter of type String named p1, which is the path to the configuration file that needs to be specified for loading. + +* getLoggerEffectiveLevel + +This method is to obtain the current log level of the specified Logger. This method accepts a String type parameter named p1, which is the name of the specified Logger. This method returns the log level currently in effect for the specified Logger. + +* getLoggerLevel + +This method is to obtain the log level of the specified Logger. This method accepts a String type parameter named p1, which is the name of the specified Logger. This method returns the log level of the specified Logger. 
+It should be noted that the difference between this method and the `getLoggerEffectiveLevel` method is that the method returns the log level that the specified Logger is set in the configuration file. If the user does not set the log level for the Logger, then return empty. According to Logger's log-level inheritance mechanism, a Logger's level is not explicitly set, it will inherit the log level settings from its nearest ancestor. At this point, calling the `getLoggerEffectiveLevel` method will return the log level in which the Logger is in effect; calling `getLoggerLevel` will return null. + +* setLoggerLevel + +This method sets the log level of the specified Logger. The method accepts a parameter of type String named p1 and a parameter of type String named p2, specifying the name of the logger and the log level of the target, respectively. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/MLogParser-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/MLogParser-Tool.md new file mode 100644 index 00000000..cb146ec8 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/MLogParser-Tool.md @@ -0,0 +1,40 @@ + + +# MlogParser Tool + +After version 0.12.x, IoTDB encodes metadata files into binary format. + +If you want to parse metadata into a human-readable way, you can use this tool to parse the specified metadata file. + +Currently, the tool can only parse mlog.bin file. + +If the consensus protocol used in cluster for SchemaRegion is RatisConsensus, IoTDB won't use mlog.bin file to store metadata and won't generate mlog.bin file. + +## How to use + +Linux/MacOS +> ./print-schema-log.sh -f /your path/mlog.bin -o /your path/mlog.txt + +Windows + +> .\print-schema-log.bat -f \your path\mlog.bin -o \your path\mlog.txt + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Maintenance-Command.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Maintenance-Command.md new file mode 100644 index 00000000..9d9ced83 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Maintenance-Command.md @@ -0,0 +1,227 @@ + + +# Maintenance Command +## FLUSH + +Persist all the data points in the memory table of the database to the disk, and seal the data file. In cluster mode, we provide commands to persist the specified database cache of local node and persist the specified database cache of the cluster. + +Note: This command does not need to be invoked manually by the client. IoTDB has WAL to ensure data security +and IoTDB will flush when appropriate. +Frequently call flush can result in small data files that degrade query performance. + +```sql +IoTDB> FLUSH +IoTDB> FLUSH ON LOCAL +IoTDB> FLUSH ON CLUSTER +IoTDB> FLUSH root.ln +IoTDB> FLUSH root.sg1,root.sg2 ON LOCAL +IoTDB> FLUSH root.sg1,root.sg2 ON CLUSTER +``` + +## CLEAR CACHE + +Clear the cache of chunk, chunk metadata and timeseries metadata to release the memory footprint. In cluster mode, we provide commands to clear local node cache and clear the cluster cache. + +```sql +IoTDB> CLEAR CACHE +IoTDB> CLEAR CACHE ON LOCAL +IoTDB> CLEAR CACHE ON CLUSTER +``` + +## START REPAIR DATA + +Start a repair task to scan all files created before current time. +The repair task will scan all tsfiles and repair some bad files. + +```sql +IoTDB> START REPAIR DATA +IoTDB> START REPAIR DATA ON LOCAL +IoTDB> START REPAIR DATA ON CLUSTER +``` + +## STOP REPAIR DATA + +Stop the running repair task. To restart the stopped task. 
+If there is a stopped repair task, it can be restart and recover the repair progress by executing SQL `START REPAIR DATA`. + +```sql +IoTDB> STOP REPAIR DATA +IoTDB> STOP REPAIR DATA ON LOCAL +IoTDB> STOP REPAIR DATA ON CLUSTER +``` + +## SET SYSTEM TO READONLY / RUNNING + +Manually set IoTDB system to running, read-only mode. In cluster mode, we provide commands to set the local node status and set the cluster status, valid for the entire cluster by default. + +```sql +IoTDB> SET SYSTEM TO RUNNING +IoTDB> SET SYSTEM TO READONLY ON LOCAL +IoTDB> SET SYSTEM TO READONLY ON CLUSTER +``` + + +## Kill Query + +IoTDB supports setting session connection timeouts and query timeouts, and also allows to stop the executing query manually. + +### Session timeout + +Session timeout controls when idle sessions are closed. An idle session is one that had not initiated any query or non-query operations for a period of time. + +Session timeout is disabled by default and can be set using the `dn_session_timeout_threshold` parameter in IoTDB configuration file. + +### Query timeout + +For queries that take too long to execute, IoTDB will forcibly interrupt the query and throw a timeout exception, as shown in the figure: + +```sql +IoTDB> select * from root; +Msg: 701 Current query is time out, please check your statement or modify timeout parameter. +``` + +The default timeout of a query is 60000 ms,which can be customized in the configuration file through the `query_timeout_threshold` parameter. + +If you use JDBC or Session, we also support setting a timeout for a single query(Unit: ms): + +```java +((IoTDBStatement) statement).executeQuery(String sql, long timeoutInMS) +session.executeQueryStatement(String sql, long timeout) +``` + + +> If the timeout parameter is not configured or with a negative number, the default timeout time will be used. +> If value 0 is used, timeout function will be disabled. + +### Query abort + +In addition to waiting for the query to time out passively, IoTDB also supports stopping the query actively: + +#### Kill specific query + +```sql +KILL QUERY +``` + +You can kill the specified query by specifying `queryId`. `queryId` is a string, so you need to put quotes around it. + +To get the executing `queryId`,you can use the [show queries](#show-queries) command, which will show the list of all executing queries. + +##### Example +```sql +kill query '20221205_114444_00003_5' +``` + +#### Kill all queries + +```sql +KILL ALL QUERIES +``` + +Kill all queries on all DataNodes. 
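As a minimal sketch of the per-query timeout described above (the connection parameters and the query are placeholders, and the exact import packages may differ between IoTDB versions), a session-based query limited to 30 seconds could look like this:

```java
import org.apache.iotdb.isession.SessionDataSet;
import org.apache.iotdb.session.Session;

public class QueryTimeoutExample {
  public static void main(String[] args) throws Exception {
    // placeholder connection parameters
    Session session = new Session("127.0.0.1", 6667, "root", "root");
    session.open();
    try {
      // the second argument is the timeout of this single query, in milliseconds
      SessionDataSet dataSet =
          session.executeQueryStatement("select * from root.test1", 30000);
      while (dataSet.hasNext()) {
        System.out.println(dataSet.next());
      }
      dataSet.closeOperationHandle();
    } finally {
      session.close();
    }
  }
}
```

If the query runs longer than the given limit, the server interrupts it and the call fails with the timeout error shown above.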
+ +## SHOW QUERIES + +This command is used to display all ongoing queries, here are usage scenarios: +- When you want to kill a query, you need to get the queryId of it +- Verify that a query has been killed after killing + +### Grammar + +```sql +SHOW QUERIES | (QUERY PROCESSLIST) +[WHERE whereCondition] +[ORDER BY sortKey {ASC | DESC}] +[LIMIT rowLimit] [OFFSET rowOffset] +``` +Note: +- Compatibility with old syntax `show query processlist` +- When using WHERE clause, ensure that target columns of filter are existed in the result set +- When using ORDER BY clause, ensure that sortKeys are existed in the result set + +### ResultSet +Time:Start time of query,DataType is `INT64` +QueryId:Cluster - level unique query identifier,DataType is `TEXT`, format is `yyyyMMdd_HHmmss_index_dataNodeId` +DataNodeId:DataNode which do execution of query,DataType is `INT32` +ElapsedTime:Execution time of query (Imperfectly accurate),`second` for unit,DataType is `FLOAT` +Statement:Origin string of query,DataType is `TEXT` + +``` ++-----------------------------+-----------------------+----------+-----------+------------+ +| Time| QueryId|DataNodeId|ElapsedTime| Statement| ++-----------------------------+-----------------------+----------+-----------+------------+ +|2022-12-30T13:26:47.260+08:00|20221230_052647_00005_1| 1| 0.019|show queries| ++-----------------------------+-----------------------+----------+-----------+------------+ +``` +Note: +- Result set is arranged in Time ASC as default, use ORDER BY clause if you want to sort it by other keys. + +### SQL Example +#### Example1:Obtain all current queries whose execution time is longer than 30 seconds + +SQL string: +```sql +SHOW QUERIES WHERE ElapsedTime > 30 +``` + +SQL result: +``` ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +| Time| QueryId|DataNodeId|ElapsedTime| Statement| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:44.515+08:00|20221205_114444_00002_2| 2| 31.111| select * from root.test1| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:45.515+08:00|20221205_114445_00003_2| 2| 30.111| select * from root.test2| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:43.515+08:00|20221205_114443_00001_3| 3| 32.111| select * from root.**| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +``` + +#### Example2:Obtain the Top5 queries in the current execution time + +SQL string: +```sql +SHOW QUERIES limit 5 +``` + +Equivalent to +```sql +SHOW QUERIES ORDER BY ElapsedTime DESC limit 5 +``` + +SQL result: +``` ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +| Time| QueryId|DataNodeId|ElapsedTime| Statement| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:44.515+08:00|20221205_114444_00003_5| 5| 31.111| select * from root.test1| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:45.515+08:00|20221205_114445_00003_2| 2| 30.111| select * from root.test2| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ 
+|2022-12-05T11:44:46.515+08:00|20221205_114446_00003_3| 3| 29.111| select * from root.test3| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:47.515+08:00|20221205_114447_00003_2| 2| 28.111| select * from root.test4| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:48.515+08:00|20221205_114448_00003_4| 4| 27.111| select * from root.test5| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +``` + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Overlap-Validation-And-Repair-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Overlap-Validation-And-Repair-Tool.md new file mode 100644 index 00000000..5305fc46 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Overlap-Validation-And-Repair-Tool.md @@ -0,0 +1,42 @@ + + +# Overlap validation and repair tool + +The Overlap Validation And Repair tool is used to validate the resource files in sequence space, and repair overlaps. + +The validation function can be run in any scenario. Confirmation is required after overlapping files are found. Typing 'y' will perform the repair. + +**The repair function must be run when corresponding DataNode is stopped and there are no unfinished compaction task in all data dirs.** +To make sure there are no unfinished compaction tasks, you can modify the config files to set enable compaction items to false, and restart DataNode waiting compaction recover task to finish. +Then stop the DataNode and run this tool. +## Usage +```shell +#MacOs or Linux +./check-overlap-sequence-files-and-repair.sh [sequence_data_dir1] [sequence_data_dir2]... +# Windows +.\check-overlap-sequence-files-and-repair.bat [sequence_data_dir1] [sequence_data_dir2]... +``` +## Example +```shell +./check-overlap-sequence-files-and-repair.sh /data1/sequence/ /data2/sequence +``` +This example validate two data dirs: /data1/sequence/, /data2/sequence. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/SchemaFileSketch-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/SchemaFileSketch-Tool.md new file mode 100644 index 00000000..8fe0a511 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/SchemaFileSketch-Tool.md @@ -0,0 +1,38 @@ + + +# PBTreeFileSketch Tool + +Since version 1.1, IoTDB could store schema into a persistent slotted file. + +If you want to parse PBTree file into a human-readable way, you can use this tool to parse the specified PBTree file. + +The tool can sketch .pst file. + +## How to use + +Linux/MacOS +> ./print-pbtree-file.sh -f your/path/to/pbtree.pst -o /your/path/to/sketch.txt + +Windows + +> ./print-pbtree-file.bat -f your/path/to/pbtree.pst -o /your/path/to/sketch.txt + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Load-Export-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Load-Export-Tool.md new file mode 100644 index 00000000..af05df19 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Load-Export-Tool.md @@ -0,0 +1,179 @@ + +# TsFile Load And Export Tool + +## TsFile Load Tool + +### Introduction + +The load external tsfile tool allows users to load tsfiles, delete a tsfile, or move a tsfile to target directory from the running Apache IoTDB instance. 
Alternatively, you can use scripts to load tsfiles into IoTDB, for more information. + +### Load with SQL + +The user sends specified commands to the Apache IoTDB system through the Cli tool or JDBC to use the tool. + +#### load tsfiles + +The command to load tsfiles is `load [sglevel=int][verify=true/false][onSuccess=delete/none]`. + +This command has two usages: + +1. Load a single tsfile by specifying a file path (absolute path). + +The first parameter indicates the path of the tsfile to be loaded. This command has three options: sglevel, verify, onSuccess. + +SGLEVEL option. If the database correspond to the tsfile does not exist, the user can set the level of database through the fourth parameter. By default, it uses the database level which is set in `iotdb-system.properties`. + +VERIFY option. If this parameter is true, All timeseries in this loading tsfile will be compared with the timeseries in IoTDB. If existing a measurement which has different datatype with the measurement in IoTDB, the loading process will be stopped and exit. If consistence can be promised, setting false for this parameter will be a better choice. + +ONSUCCESS option. The default value is DELETE, which means the processing method of successfully loaded tsfiles, and DELETE means after the tsfile is successfully loaded, it will be deleted. NONE means after the tsfile is successfully loaded, it will be remained in the origin dir. + +If the `.resource` file corresponding to the file exists, it will be loaded into the data directory and engine of the Apache IoTDB. Otherwise, the corresponding `.resource` file will be regenerated from the tsfile file. + +Examples: + +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true onSuccess=none` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1 onSuccess=delete` + +2. Load a batch of files by specifying a folder path (absolute path). + +The first parameter indicates the path of the tsfile to be loaded. The options above also works for this command. + +Examples: + +* `load '/Users/Desktop/data'` +* `load '/Users/Desktop/data' verify=false` +* `load '/Users/Desktop/data' verify=true` +* `load '/Users/Desktop/data' verify=true sglevel=1` +* `load '/Users/Desktop/data' verify=false sglevel=1 onSuccess=delete` + +**NOTICE**: When `$IOTDB_HOME$/conf/iotdb-system.properties` has `enable_auto_create_schema=true`, it will automatically create metadata in TSFILE, otherwise it will not be created automatically. + +### Load with Script + +Run rewrite-tsfile.bat if you are in a Windows environment, or rewrite-tsfile.sh if you are on Linux or Unix. 
+ +```bash +./load-tsfile.bat -f filePath [-h host] [-p port] [-u username] [-pw password] [--sgLevel int] [--verify true/false] [--onSuccess none/delete] +-f File/Directory to be load, required +-h IoTDB Host address, optional field, 127.0.0.1 by default +-p IoTDB port, optional field, 6667 by default +-u IoTDB user name, optional field, root by default +-pw IoTDB password, optional field, root by default +--sgLevel Sg level of loading Tsfile, optional field, default_storage_group_level in iotdb-system.properties by default +--verify Verify schema or not, optional field, True by default +--onSuccess Delete or remain origin TsFile after loading, optional field, none by default +``` + +#### Example + +Assuming that an IoTDB instance is running on server 192.168.0.101:6667, you want to load all TsFile files from the locally saved TsFile backup folder D:\IoTDB\data into this IoTDB instance. + +First move to the folder `$IOTDB_HOME/tools/`, open the command line, and execute + +```bash +./load-rewrite.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root +``` + +After waiting for the script execution to complete, you can check that the data in the IoTDB instance has been loaded correctly. + +#### Q&A + +- Cannot find or load the main class + - It may be because the environment variable $IOTDB_HOME is not set, please set the environment variable and try again +- -f option must be set! + - The input command is missing the -f field (file or folder path to be loaded) or the -u field (user name), please add it and re-execute +- What if the execution crashes in the middle and you want to reload? + - You re-execute the command just now, reloading the data will not affect the correctness after loading + +TsFile can help you export the result set in the format of TsFile file to the specified path by executing the sql, command line sql, and sql file. + +## TsFile Export Tool + +### Syntax + +```shell +# Unix/OS X +> tools/export-tsfile.sh -h -p -u -pw -td [-f -q -s ] + +# Windows +> tools\export-tsfile.bat -h -p -u -pw -td [-f -q -s ] +``` + +* `-h `: + - The host address of the IoTDB service. +* `-p `: + - The port number of the IoTDB service. +* `-u `: + - The username of the IoTDB service. +* `-pw `: + - Password for IoTDB service. +* `-td `: + - Specify the output path for the exported TsFile file. +* `-f `: + - For the file name of the exported TsFile file, just write the file name, and cannot include the file path and suffix. If the sql file or console input contains multiple sqls, multiple files will be generated in the order of sql. + - Example: There are three SQLs in the file or command line, and -f param is "dump", then three TsFile files: dump0.tsfile、dump1.tsfile、dump2.tsfile will be generated in the target path. +* `-q `: + - Directly specify the query statement you want to execute in the command. + - Example: `select * from root.** limit 100` +* `-s `: + - Specify a SQL file that contains one or more SQL statements. If an SQL file contains multiple SQL statements, the SQL statements should be separated by newlines. Each SQL statement corresponds to an output TsFile file. +* `-t `: + - Specifies the timeout period for session queries, in milliseconds + + +In addition, if you do not use the `-s` and `-q` parameters, after the export script is started, you need to enter the query statement as prompted by the program, and different query results will be saved to different TsFile files. 
+ +### example + +```shell +# Unix/OS X +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 + +# Windows +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 +``` + +### example +- It is recommended not to execute the write data command at the same time when loading data, which may lead to insufficient memory in the JVM. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Resource-Sketch-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Resource-Sketch-Tool.md new file mode 100644 index 00000000..c77dbe9d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Resource-Sketch-Tool.md @@ -0,0 +1,79 @@ + + +# TsFile Resource Sketch Tool + +TsFile resource sketch tool is used to print the content of a TsFile resource file. The location is tools/tsfile/print-tsfile-resource-files. + +## Usage + +- For Windows: + +```bash +.\print-tsfile-resource-files.bat +``` + +- For Linux or MacOs: + +``` +./print-tsfile-resource-files.sh +``` + +## Example + +Use Windows in this example: + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +147 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +230 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +231 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +233 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +237 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file folder D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 finished. 
+````````````````````````` + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +178 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +186 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +187 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +188 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +192 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource finished. +````````````````````````` + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Settle-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Settle-Tool.md new file mode 100644 index 00000000..c8646bd4 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Settle-Tool.md @@ -0,0 +1,42 @@ + + +# TsFile Settle tool + +The TsFile Settle tool is used to rewrite one or more TsFiles that have modified record files, and submit the TsFile compaction task by sending an RPC to the DataNode to rewrite the TsFile. +## Usage +```shell +#MacOs or Linux +./settle-tsfile.sh -h [host] -p [port] -f [filePaths] +# Windows +.\settle-tsfile.bat -h [host] -p [port] -f [filePaths] +``` +The host and port parameters are the host and port of the DataNodeInternalRPCService. If not specified, the default values are 127.0.0.1 and 10730 respectively. The filePaths parameter specifies the absolute paths of all TsFiles to be submitted as a compaction task on this DataNode, separated by spaces. Pass in at least one path. 
+## Example +```shell +./settle-tsfile.sh -h 127.0.0.1 -p 10730 -f /data/sequence/root.sg/0/0/1672133354759-2-0-0.tsfile /data/sequence/root.sg/0/0/1672306417865-3-0-0.tsfile /data/sequence/root.sg/0/0/1672306417865-3-0-0.tsfile +``` +## Requirement +* Specify at least one TsFile +* All specified TsFiles are in the same space and are continuous, and cross-space compaction is not supported +* The specified file path is the absolute path of the TsFile of the node where the specified DataNode is located +* The specified DataNode is configured to allow the space where the input TsFile is located to perform the compaction +* At least one of the specified TsFiles has a corresponding .mods file \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Sketch-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Sketch-Tool.md new file mode 100644 index 00000000..9429fb43 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Sketch-Tool.md @@ -0,0 +1,108 @@ + + +# TsFile Sketch Tool + +TsFile sketch tool is used to print the content of a TsFile in sketch mode. The location is tools/tsfile/print-tsfile. + +## Usage + +- For Windows: + +``` +.\print-tsfile-sketch.bat () +``` + +- For Linux or MacOs: + +``` +./print-tsfile-sketch.sh () +``` + +Note: if the storage path of the output sketch file is not set, the default relative path "TsFile_sketch_view.txt" will be used. + +## Example + +Use Windows in this example: + +`````````````````````````bash +.\print-tsfile.bat D:\github\master\1669359533965-1-0-0.tsfile D:\github\master\sketch.txt +```````````````````````` +Starting Printing the TsFile Sketch +```````````````````````` +TsFile path:D:\github\master\1669359533965-1-0-0.tsfile +Sketch save path:D:\github\master\sketch.txt +148 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. 
+-------------------------------- TsFile Sketch -------------------------------- +file path: D:\github\master\1669359533965-1-0-0.tsfile +file length: 2974 + + POSITION| CONTENT + -------- ------- + 0| [magic head] TsFile + 6| [version number] 3 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1, num of Chunks:3 + 7| [Chunk Group Header] + | [marker] 0 + | [deviceID] root.sg1.d1 + 20| [Chunk] of root.sg1.d1.s1, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [chunk header] marker=5, measurementID=s1, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 893| [Chunk] of root.sg1.d1.s2, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [chunk header] marker=5, measurementID=s2, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 1766| [Chunk] of root.sg1.d1.s3, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [chunk header] marker=5, measurementID=s3, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1 ends + 2656| [marker] 2 + 2657| [TimeseriesIndex] of root.sg1.d1.s1, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [ChunkIndex] offset=20 + 2728| [TimeseriesIndex] of root.sg1.d1.s2, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [ChunkIndex] offset=893 + 2799| [TimeseriesIndex] of root.sg1.d1.s3, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [ChunkIndex] offset=1766 + 2870| [IndexOfTimerseriesIndex Node] type=LEAF_MEASUREMENT + | + | +||||||||||||||||||||| [TsFileMetadata] begins + 2891| [IndexOfTimerseriesIndex Node] type=LEAF_DEVICE + | + | + | [meta offset] 2656 + | [bloom filter] bit vector byte array length=31, filterSize=256, hashFunctionSize=5 +||||||||||||||||||||| [TsFileMetadata] ends + 2964| [TsFileMetadataSize] 73 + 2968| [magic tail] TsFile + 2974| END of TsFile +---------------------------- IndexOfTimerseriesIndex Tree ----------------------------- + [MetadataIndex:LEAF_DEVICE] + └──────[root.sg1.d1,2870] + [MetadataIndex:LEAF_MEASUREMENT] + └──────[s1,2657] +---------------------------------- TsFile Sketch End ---------------------------------- +````````````````````````` + +Explanations: + +- Separated by "|", the left is the actual position in the TsFile, and the right is the summary content. 
+- "||||||||||||||||||||" is the guide information added to enhance readability, not the actual data stored in TsFile. +- The last printed "IndexOfTimerseriesIndex Tree" is a reorganization of the metadata index tree at the end of the TsFile, which is convenient for intuitive understanding, and again not the actual data stored in TsFile. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Split-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Split-Tool.md new file mode 100644 index 00000000..e0fae81b --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Split-Tool.md @@ -0,0 +1,46 @@ + + +# TsFile Split Tool + +TsFile split tool is used to split a TsFile into multiple TsFiles. The location is tools/tsfile/split-tsfile-tool + +How to use: + +For Windows: + +``` +.\split-tsfile-tool.bat (-level ) (-size ) +``` + +For Linux or MacOs: + +``` +./split-tsfile-tool.sh (-level ) (-size ) +``` + +> For example, if the new files size is 100MB, and the compaction num is 6, the command is `./split-tsfile-tool.sh test.tsfile -level 6 -size 1048576000` (Linux or MacOs) + +Here are some more tips: +1. TsFile split tool is for one closed TsFile, need to ensure this TsFile is closed. If the TsFile is in IoTDB, a `.resource` file represent it is closed. +2. When doing split, make sure the TsFile is not in a running IoTDB. +3. Currently, we do not resolve the corresponding mods file, if you wish to put the new files into the IoTDB data dir and be loaded by restarting, you need to copy the related mods file(if exist) and rename them, make sure each new file has one mods. +4. This tools do not support aligned timeseries currently. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFileSelfCheck-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFileSelfCheck-Tool.md new file mode 100644 index 00000000..ef0c8eff --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFileSelfCheck-Tool.md @@ -0,0 +1,42 @@ + + +# TsFileSelfCheck Tool +IoTDB Server provides the TsFile self check tool. At present, the tool can check the basic format of the TsFile file, the correctness of TimeseriesMetadata, and the correctness and consistency of the Statistics stored in each part of the TsFile. + +## Use +Step 1:Create an object instance of TsFileSelfCheckTool class. + +``` java +TsFileSelfCheckTool tool = new TsFileSelfCheckTool(); +``` + +Step 2:Call the check method of the self check tool. The first parameter path is the path of the TsFile to be checked. The second parameter is whether to check only the Magic String and Version Number at the beginning and end of TsFile. + +``` java +tool.check(path, false); +``` + +* There are four return values of the check method. +* The return value is 0, which means that the TsFile self check is error-free. +* The return value is -1, which means that TsFile has inconsistencies in Statistics. There will be two specific exceptions, one is that the Statistics of TimeSeriesMetadata is inconsistent with the Statistics of the aggregated statistics of ChunkMetadata. The other is that the Statistics of ChunkMetadata is inconsistent with the Statistics of Page aggregation statistics in the Chunk indexed by it. +* The return value is -2, which means that the TsFile version is not compatible. +* The return value is -3, which means that the TsFile file does not exist in the given path. 
\ No newline at end of file
diff --git a/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Watermark-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Watermark-Tool.md
new file mode 100644
index 00000000..dd571252
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Watermark-Tool.md
@@ -0,0 +1,196 @@

# Watermark Tool

This tool has two functions: 1) watermark embedding of the IoTDB query result and 2) watermark detection of the suspected data.

## Watermark Embedding

### Configuration

Watermark is disabled by default in IoTDB. To enable watermark embedding, first modify the following fields in the configuration file `iotdb-system.properties`:

| Name | Example | Explanation |
| ----------------------- | ------------------------------------------------------ | ------------------------------------------------------------ |
| watermark_module_opened | false | `true` to enable watermark embedding of the IoTDB server; `false` to disable |
| watermark_secret_key | IoTDB*2019@Beijing | self-defined secret key |
| watermark_bit_string | 100101110100 | 0-1 bit string to be embedded |
| watermark_method | GroupBasedLSBMethod(embed_row_cycle=2,embed_lsb_num=5) | specifies the watermark algorithm and its parameters |

Notes:

- `watermark_module_opened`: Set it to true if you want to enable watermark embedding
- `watermark_secret_key`: Character '&' is not allowed. There is no constraint on the length of the secret key. Generally, the longer the key is, the higher the bar to intruders.
- `watermark_bit_string`: There is no constraint on the length of the bit string (except that it should not be empty). But note that it is difficult to reach the required significance level at the watermark detection phase if the bit string is way too short.
- `watermark_method`: Now only GroupBasedLSBMethod is supported, so actually you can only tune the two parameters of this method, which are `embed_row_cycle` and `embed_lsb_num`.
  - Both of them should be positive integers.
  - `embed_row_cycle` controls the ratio of rows watermarked. The smaller the `embed_row_cycle`, the larger the ratio of rows watermarked. When `embed_row_cycle` equals 1, every row is watermarked.
  - GroupBasedLSBMethod uses LSB embedding. `embed_lsb_num` controls the number of least significant bits available for watermark embedding. The bigger the `embed_lsb_num`, the larger the varying range of a data point.
- `watermark_secret_key`, `watermark_bit_string` and `watermark_method` should be kept secret from possible attackers. That is, it is your responsibility to take care of `iotdb-system.properties`.

### Usage Example

* step 1. Create a new user Alice, grant read privilege and query

A newly created user doesn't use watermark by default. So the query result is the original data.
+ +``` +.\start-cli.bat -u root -pw root +create user Alice 1234 +grant user Alice privileges READ_TIMESERIES on root.vehicle +exit + +.\start-cli.bat -u Alice -pw 1234 +select * from root ++-----------------------------------+------------------+ +| Time|root.vehicle.d0.s0| ++-----------------------------------+------------------+ +| 1970-01-01T08:00:00.001+08:00| 21.5| +| 1970-01-01T08:00:00.002+08:00| 22.5| +| 1970-01-01T08:00:00.003+08:00| 23.5| +| 1970-01-01T08:00:00.004+08:00| 24.5| +| 1970-01-01T08:00:00.005+08:00| 25.5| +| 1970-01-01T08:00:00.006+08:00| 26.5| +| 1970-01-01T08:00:00.007+08:00| 27.5| +| 1970-01-01T08:00:00.008+08:00| 28.5| +| 1970-01-01T08:00:00.009+08:00| 29.5| +| 1970-01-01T08:00:00.010+08:00| 30.5| +| 1970-01-01T08:00:00.011+08:00| 31.5| +| 1970-01-01T08:00:00.012+08:00| 32.5| +| 1970-01-01T08:00:00.013+08:00| 33.5| +| 1970-01-01T08:00:00.014+08:00| 34.5| +| 1970-01-01T08:00:00.015+08:00| 35.5| +| 1970-01-01T08:00:00.016+08:00| 36.5| +| 1970-01-01T08:00:00.017+08:00| 37.5| +| 1970-01-01T08:00:00.018+08:00| 38.5| +| 1970-01-01T08:00:00.019+08:00| 39.5| +| 1970-01-01T08:00:00.020+08:00| 40.5| +| 1970-01-01T08:00:00.021+08:00| 41.5| +| 1970-01-01T08:00:00.022+08:00| 42.5| +| 1970-01-01T08:00:00.023+08:00| 43.5| +| 1970-01-01T08:00:00.024+08:00| 44.5| +| 1970-01-01T08:00:00.025+08:00| 45.5| +| 1970-01-01T08:00:00.026+08:00| 46.5| +| 1970-01-01T08:00:00.027+08:00| 47.5| +| 1970-01-01T08:00:00.028+08:00| 48.5| +| 1970-01-01T08:00:00.029+08:00| 49.5| +| 1970-01-01T08:00:00.030+08:00| 50.5| +| 1970-01-01T08:00:00.031+08:00| 51.5| +| 1970-01-01T08:00:00.032+08:00| 52.5| +| 1970-01-01T08:00:00.033+08:00| 53.5| ++-----------------------------------+------------------+ +``` + +* step 2. grant watermark_embedding to Alice + +Usage: `grant watermark_embedding to Alice` + +Note that you can use `grant watermark_embedding to user1,user2,...` to grant watermark_embedding to multiple users. + +Only root can run this command. After root grants watermark_embedding to Alice, all query results of Alice are watermarked. 
+ +``` +.\start-cli.bat -u root -pw root +grant watermark_embedding to Alice +exit + +.\start-cli.bat -u Alice -pw '1234' +select * from root ++-----------------------------------+------------------+ +| Time|root.vehicle.d0.s0| ++-----------------------------------+------------------+ +| 1970-01-01T08:00:00.001+08:00| 21.5| +| 1970-01-01T08:00:00.002+08:00| 22.5| +| 1970-01-01T08:00:00.003+08:00| 23.500008| +| 1970-01-01T08:00:00.004+08:00| 24.500015| +| 1970-01-01T08:00:00.005+08:00| 25.5| +| 1970-01-01T08:00:00.006+08:00| 26.500015| +| 1970-01-01T08:00:00.007+08:00| 27.5| +| 1970-01-01T08:00:00.008+08:00| 28.500004| +| 1970-01-01T08:00:00.009+08:00| 29.5| +| 1970-01-01T08:00:00.010+08:00| 30.5| +| 1970-01-01T08:00:00.011+08:00| 31.5| +| 1970-01-01T08:00:00.012+08:00| 32.5| +| 1970-01-01T08:00:00.013+08:00| 33.5| +| 1970-01-01T08:00:00.014+08:00| 34.5| +| 1970-01-01T08:00:00.015+08:00| 35.500004| +| 1970-01-01T08:00:00.016+08:00| 36.5| +| 1970-01-01T08:00:00.017+08:00| 37.5| +| 1970-01-01T08:00:00.018+08:00| 38.5| +| 1970-01-01T08:00:00.019+08:00| 39.5| +| 1970-01-01T08:00:00.020+08:00| 40.5| +| 1970-01-01T08:00:00.021+08:00| 41.5| +| 1970-01-01T08:00:00.022+08:00| 42.500015| +| 1970-01-01T08:00:00.023+08:00| 43.5| +| 1970-01-01T08:00:00.024+08:00| 44.500008| +| 1970-01-01T08:00:00.025+08:00| 45.50003| +| 1970-01-01T08:00:00.026+08:00| 46.500008| +| 1970-01-01T08:00:00.027+08:00| 47.500008| +| 1970-01-01T08:00:00.028+08:00| 48.5| +| 1970-01-01T08:00:00.029+08:00| 49.5| +| 1970-01-01T08:00:00.030+08:00| 50.5| +| 1970-01-01T08:00:00.031+08:00| 51.500008| +| 1970-01-01T08:00:00.032+08:00| 52.5| +| 1970-01-01T08:00:00.033+08:00| 53.5| ++-----------------------------------+------------------+ +``` + +* step 3. revoke watermark_embedding from Alice + +Usage: `revoke watermark_embedding from Alice` + +Note that you can use `revoke watermark_embedding from user1,user2,...` to revoke watermark_embedding from multiple users. + +Only root can run this command. After root revokes watermark_embedding from Alice, all query results of Alice are original again. + +## Watermark Detection + +`detect-watermark.sh` and `detect-watermark.bat` are provided for different platforms. + +Usage: ./detect-watermark.sh [filePath] [secretKey] [watermarkBitString] [embed_row_cycle] [embed_lsb_num] [alpha] [columnIndex] [dataType: int/float/double] + +Example: ./detect-watermark.sh /home/data/dump1.csv IoTDB*2019@Beijing 100101110100 2 5 0.05 1 float + +| Args | Example | Explanation | +| ------------------ | -------------------- | ------------------------------------------------------------ | +| filePath | /home/data/dump1.csv | suspected data file path | +| secretKey | IoTDB*2019@Beijing | see watermark embedding section | +| watermarkBitString | 100101110100 | see watermark embedding section | +| embed_row_cycle | 2 | see watermark embedding section | +| embed_lsb_num | 5 | see watermark embedding section | +| alpha | 0.05 | significance level | +| columnIndex | 1 | specifies one column of the data to detect | +| dataType | float | specifies the data type of the detected column; int/float/double | + +Notes: + +- `filePath`: You can use export-csv tool to generate such data file. The first row is header and the first column is time. Data in the file looks like this: + + | Time | root.vehicle.d0.s1 | root.vehicle.d0.s1 | + | ----------------------------- | ------------------ | ------------------ | + | 1970-01-01T08:00:00.001+08:00 | 100 | null | + | ... | ... | ... 
| + +- `watermark_secret_key`, `watermark_bit_string`, `embed_row_cycle` and `embed_lsb_num` should be consistent with those used in the embedding phase. + +- `alpha`: It should be in the range of [0,1]. The watermark detection is based on the significance test. The smaller the `alpha` is, the lower the probability that the data without the watermark is detected to be watermark embedded, and thus the higher the credibility of the result of detecting the existence of the watermark in data. + +- `columnIndex`: It should be a positive integer. + diff --git a/src/UserGuide/V2.0.1/Tree/stage/MapReduce-TsFile.md b/src/UserGuide/V2.0.1/Tree/stage/MapReduce-TsFile.md new file mode 100644 index 00000000..b77f4165 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/MapReduce-TsFile.md @@ -0,0 +1,199 @@ + +# Hadoop-TsFile + + +## About Hadoop-TsFile-Connector + +TsFile-Hadoop-Connector implements the support of Hadoop for external data sources of Tsfile type. This enables users to read, write and query Tsfile by Hadoop. + +With this connector, you can +* load a single TsFile, from either the local file system or hdfs, into Hadoop +* load all files in a specific directory, from either the local file system or hdfs, into hadoop +* write data from Hadoop into TsFile + +## System Requirements + +|Hadoop Version | Java Version | TsFile Version| +|:---:|:---:|:---:| +| `2.7.3` | `1.8` | `1.0.0`| + +> Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/iotdb/tree/master/tsfile. + +## Data Type Correspondence + +| TsFile data type | Hadoop writable | +| ---------------- | --------------- | +| BOOLEAN | BooleanWritable | +| INT32 | IntWritable | +| INT64 | LongWritable | +| FLOAT | FloatWritable | +| DOUBLE | DoubleWritable | +| TEXT | Text | + +## TSFInputFormat Explanation + +TSFInputFormat extract data from tsfile and format them into records of `MapWritable`. + +Suppose that we want to extract data of the device named `d1` which has three sensors named `s1`, `s2`, `s3`. + +`s1`'s type is `BOOLEAN`, `s2`'s type is `DOUBLE`, `s3`'s type is `TEXT`. + +The `MapWritable` struct will be like: +``` +{ + "time_stamp": 10000000, + "device_id": d1, + "s1": true, + "s2": 3.14, + "s3": "middle" +} +``` + +In the Map job of Hadoop, you can get any value you want by key as following: + +`mapwritable.get(new Text("s1"))` +> Note: All keys in `MapWritable` are `Text` type. + +## Examples + +### Read Example: calculate the sum + +First of all, we should tell InputFormat what kind of data we want from tsfile. 
+ +``` + // configure reading time enable + TSFInputFormat.setReadTime(job, true); + // configure reading deviceId enable + TSFInputFormat.setReadDeviceId(job, true); + // configure reading which deltaObjectIds + String[] deviceIds = {"device_1"}; + TSFInputFormat.setReadDeviceIds(job, deltaObjectIds); + // configure reading which measurementIds + String[] measurementIds = {"sensor_1", "sensor_2", "sensor_3"}; + TSFInputFormat.setReadMeasurementIds(job, measurementIds); +``` + +And then,the output key and value of mapper and reducer should be specified + +``` + // set inputformat and outputformat + job.setInputFormatClass(TSFInputFormat.class); + // set mapper output key and value + job.setMapOutputKeyClass(Text.class); + job.setMapOutputValueClass(DoubleWritable.class); + // set reducer output key and value + job.setOutputKeyClass(Text.class); + job.setOutputValueClass(DoubleWritable.class); +``` + +Then, the `mapper` and `reducer` class is how you deal with the `MapWritable` produced by `TSFInputFormat` class. + +``` + public static class TSMapper extends Mapper { + + @Override + protected void map(NullWritable key, MapWritable value, + Mapper.Context context) + throws IOException, InterruptedException { + + Text deltaObjectId = (Text) value.get(new Text("device_id")); + context.write(deltaObjectId, (DoubleWritable) value.get(new Text("sensor_3"))); + } + } + + public static class TSReducer extends Reducer { + + @Override + protected void reduce(Text key, Iterable values, + Reducer.Context context) + throws IOException, InterruptedException { + + double sum = 0; + for (DoubleWritable value : values) { + sum = sum + value.get(); + } + context.write(key, new DoubleWritable(sum)); + } + } +``` + +> Note: For the complete code, please see the following link: https://github.com/apache/iotdb/blob/master/example/hadoop/src/main/java/org/apache/iotdb//hadoop/tsfile/TSFMRReadExample.java + + +### Write Example: write the average into Tsfile + +Except for the `OutputFormatClass`, the rest of configuration code for hadoop map-reduce job is almost same as above. + +``` + job.setOutputFormatClass(TSFOutputFormat.class); + // set reducer output key and value + job.setOutputKeyClass(NullWritable.class); + job.setOutputValueClass(HDFSTSRecord.class); +``` + +Then, the `mapper` and `reducer` class is how you deal with the `MapWritable` produced by `TSFInputFormat` class. + +``` + public static class TSMapper extends Mapper { + @Override + protected void map(NullWritable key, MapWritable value, + Mapper.Context context) + throws IOException, InterruptedException { + + Text deltaObjectId = (Text) value.get(new Text("device_id")); + long timestamp = ((LongWritable)value.get(new Text("timestamp"))).get(); + if (timestamp % 100000 == 0) { + context.write(deltaObjectId, new MapWritable(value)); + } + } + } + + /** + * This reducer calculate the average value. 
+ */ + public static class TSReducer extends Reducer { + + @Override + protected void reduce(Text key, Iterable values, + Reducer.Context context) throws IOException, InterruptedException { + long sensor1_value_sum = 0; + long sensor2_value_sum = 0; + double sensor3_value_sum = 0; + long num = 0; + for (MapWritable value : values) { + num++; + sensor1_value_sum += ((LongWritable)value.get(new Text("sensor_1"))).get(); + sensor2_value_sum += ((LongWritable)value.get(new Text("sensor_2"))).get(); + sensor3_value_sum += ((DoubleWritable)value.get(new Text("sensor_3"))).get(); + } + HDFSTSRecord tsRecord = new HDFSTSRecord(1L, key.toString()); + DataPoint dPoint1 = new LongDataPoint("sensor_1", sensor1_value_sum / num); + DataPoint dPoint2 = new LongDataPoint("sensor_2", sensor2_value_sum / num); + DataPoint dPoint3 = new DoubleDataPoint("sensor_3", sensor3_value_sum / num); + tsRecord.addTuple(dPoint1); + tsRecord.addTuple(dPoint2); + tsRecord.addTuple(dPoint3); + context.write(NullWritable.get(), tsRecord); + } + } +``` +> Note: For the complete code, please see the following link: https://github.com/apache/iotdb/blob/master/example/hadoop/src/main/java/org/apache/iotdb//hadoop/tsfile/TSMRWriteExample.java diff --git a/src/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Alerting.md b/src/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Alerting.md new file mode 100644 index 00000000..99ecc2a0 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Alerting.md @@ -0,0 +1,402 @@ + + +# Alerting + +## Overview +The alerting of IoTDB is expected to support two modes: + +* Writing triggered: the user writes data to the original time series, and every time a piece of data is inserted, the judgment logic of `trigger` will be triggered. +If the alerting requirements are met, an alert is sent to the data sink, +The data sink then forwards the alert to the external terminal. + * This mode is suitable for scenarios that need to monitor every piece of data in real time. + * Since the operation in the trigger will affect the data writing performance, it is suitable for scenarios that are not sensitive to the original data writing performance. + +* Continuous query: the user writes data to the original time series, +`ContinousQuery` periodically queries the original time series, and writes the query results into the new time series, +Each write triggers the judgment logic of `trigger`, +If the alerting requirements are met, an alert is sent to the data sink, +The data sink then forwards the alert to the external terminal. + * This mode is suitable for scenarios where data needs to be regularly queried within a certain period of time. + * It is Suitable for scenarios where the original data needs to be down-sampled and persisted. + * Since the timing query hardly affects the writing of the original time series, it is suitable for scenarios that are sensitive to the performance of the original data writing performance. + +With the introduction of the [Trigger](../Trigger/Instructions.md) into IoTDB, +at present, users can use these two modules with `AlertManager` to realize the writing triggered alerting mode. + + + +## Deploying AlertManager + +### Installation +#### Precompiled binaries +The pre-compiled binary file can be downloaded [here](https://prometheus.io/download/). + +Running command: +````shell +./alertmanager --config.file= +```` + +#### Docker image +Available at [Quay.io](https://hub.docker.com/r/prom/alertmanager/) +or [Docker Hub](https://quay.io/repository/prometheus/alertmanager). 
+ +Running command: +````shell +docker run --name alertmanager -d -p 127.0.0.1:9093:9093 quay.io/prometheus/alertmanager +```` + +### Configuration + +The following is an example, which can cover most of the configuration rules. For detailed configuration rules, see +[here](https://prometheus.io/docs/alerting/latest/configuration/). + +Example: +``` yaml +# alertmanager.yml + +global: + # The smarthost and SMTP sender used for mail notifications. + smtp_smarthost: 'localhost:25' + smtp_from: 'alertmanager@example.org' + +# The root route on which each incoming alert enters. +route: + # The root route must not have any matchers as it is the entry point for + # all alerts. It needs to have a receiver configured so alerts that do not + # match any of the sub-routes are sent to someone. + receiver: 'team-X-mails' + + # The labels by which incoming alerts are grouped together. For example, + # multiple alerts coming in for cluster=A and alertname=LatencyHigh would + # be batched into a single group. + # + # To aggregate by all possible labels use '...' as the sole label name. + # This effectively disables aggregation entirely, passing through all + # alerts as-is. This is unlikely to be what you want, unless you have + # a very low alert volume or your upstream notification system performs + # its own grouping. Example: group_by: [...] + group_by: ['alertname', 'cluster'] + + # When a new group of alerts is created by an incoming alert, wait at + # least 'group_wait' to send the initial notification. + # This way ensures that you get multiple alerts for the same group that start + # firing shortly after another are batched together on the first + # notification. + group_wait: 30s + + # When the first notification was sent, wait 'group_interval' to send a batch + # of new alerts that started firing for that group. + group_interval: 5m + + # If an alert has successfully been sent, wait 'repeat_interval' to + # resend them. + repeat_interval: 3h + + # All the above attributes are inherited by all child routes and can + # overwritten on each. + + # The child route trees. + routes: + # This routes performs a regular expression match on alert labels to + # catch alerts that are related to a list of services. + - match_re: + service: ^(foo1|foo2|baz)$ + receiver: team-X-mails + + # The service has a sub-route for critical alerts, any alerts + # that do not match, i.e. severity != critical, fall-back to the + # parent node and are sent to 'team-X-mails' + routes: + - match: + severity: critical + receiver: team-X-pager + + - match: + service: files + receiver: team-Y-mails + + routes: + - match: + severity: critical + receiver: team-Y-pager + + # This route handles all alerts coming from a database service. If there's + # no team to handle it, it defaults to the DB team. + - match: + service: database + + receiver: team-DB-pager + # Also group alerts by affected database. + group_by: [alertname, cluster, database] + + routes: + - match: + owner: team-X + receiver: team-X-pager + + - match: + owner: team-Y + receiver: team-Y-pager + + +# Inhibition rules allow to mute a set of alerts given that another alert is +# firing. +# We use this to mute any warning-level notifications if the same alert is +# already critical. +inhibit_rules: +- source_match: + severity: 'critical' + target_match: + severity: 'warning' + # Apply inhibition if the alertname is the same. + # CAUTION: + # If all label names listed in `equal` are missing + # from both the source and target alerts, + # the inhibition rule will apply! 
+ equal: ['alertname'] + + +receivers: +- name: 'team-X-mails' + email_configs: + - to: 'team-X+alerts@example.org, team-Y+alerts@example.org' + +- name: 'team-X-pager' + email_configs: + - to: 'team-X+alerts-critical@example.org' + pagerduty_configs: + - routing_key: + +- name: 'team-Y-mails' + email_configs: + - to: 'team-Y+alerts@example.org' + +- name: 'team-Y-pager' + pagerduty_configs: + - routing_key: + +- name: 'team-DB-pager' + pagerduty_configs: + - routing_key: +``` + +In the following example, we used the following configuration: +````yaml +# alertmanager.yml + +global: + smtp_smarthost: '' + smtp_from: '' + smtp_auth_username: '' + smtp_auth_password: '' + smtp_require_tls: false + +route: + group_by: ['alertname'] + group_wait: 1m + group_interval: 10m + repeat_interval: 10h + receiver: 'email' + +receivers: + - name: 'email' + email_configs: + - to: '' + +inhibit_rules: + - source_match: + severity: 'critical' + target_match: + severity: 'warning' + equal: ['alertname'] +```` + + +### API +The `AlertManager` API is divided into two versions, `v1` and `v2`. The current `AlertManager` API version is `v2` +(For configuration see +[api/v2/openapi.yaml](https://github.com/prometheus/alertmanager/blob/master/api/v2/openapi.yaml)). + +By default, the prefix is `/api/v1` or `/api/v2` and the endpoint for sending alerts is `/api/v1/alerts` or `/api/v2/alerts`. +If the user specifies `--web.route-prefix`, +for example `--web.route-prefix=/alertmanager/`, +then the prefix will become `/alertmanager/api/v1` or `/alertmanager/api/v2`, +and the endpoint that sends the alert becomes `/alertmanager/api/v1/alerts` +or `/alertmanager/api/v2/alerts`. + +## Creating trigger + +### Writing the trigger class + +The user defines a trigger by creating a Java class and writing the logic in the hook. +Please refer to [Trigger](../Trigger/Implement-Trigger.md) for the specific configuration process. + +The following example creates the `org.apache.iotdb.trigger.ClusterAlertingExample` class, +Its alertManagerHandler member variables can send alerts to the AlertManager instance +at the address of `http://127.0.0.1:9093/`. 

When `value > 100.0`, send an alert of `critical` severity;
when `50.0 < value <= 100.0`, send an alert of `warning` severity.

``` java
package org.apache.iotdb.trigger;

/* The import statements of the example are omitted here. */

public class ClusterAlertingExample implements Trigger {

  private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class);

  private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler();

  private final AlertManagerConfiguration alertManagerConfiguration =
      new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts");

  private String alertname;

  private final HashMap<String, String> labels = new HashMap<>();

  private final HashMap<String, String> annotations = new HashMap<>();

  @Override
  public void onCreate(TriggerAttributes attributes) throws Exception {
    alertname = "alert_test";

    labels.put("series", "root.ln.wf01.wt01.temperature");
    labels.put("value", "");
    labels.put("severity", "");

    annotations.put("summary", "high temperature");
    annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}");

    alertManagerHandler.open(alertManagerConfiguration);
  }

  @Override
  public void onDrop() throws IOException {
    alertManagerHandler.close();
  }

  @Override
  public boolean fire(Tablet tablet) throws Exception {
    List<MeasurementSchema> measurementSchemaList = tablet.getSchemas();
    for (int i = 0, n = measurementSchemaList.size(); i < n; i++) {
      if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) {
        // for example, we only deal with the columns of Double type
        double[] values = (double[]) tablet.values[i];
        for (double value : values) {
          if (value > 100.0) {
            LOGGER.info("trigger value > 100");
            labels.put("value", String.valueOf(value));
            labels.put("severity", "critical");
            AlertManagerEvent alertManagerEvent =
                new AlertManagerEvent(alertname, labels, annotations);
            alertManagerHandler.onEvent(alertManagerEvent);
          } else if (value > 50.0) {
            LOGGER.info("trigger value > 50");
            labels.put("value", String.valueOf(value));
            labels.put("severity", "warning");
            AlertManagerEvent alertManagerEvent =
                new AlertManagerEvent(alertname, labels, annotations);
            alertManagerHandler.onEvent(alertManagerEvent);
          }
        }
      }
    }
    return true;
  }
}
```

### Creating trigger

The following SQL statement registers the trigger named `root-ln-wf01-wt01-alert` on the `root.ln.wf01.wt01.temperature` time series, whose operation logic is defined by the `org.apache.iotdb.trigger.ClusterAlertingExample` java class.

``` sql
  CREATE STATELESS TRIGGER `root-ln-wf01-wt01-alert`
  AFTER INSERT
  ON root.ln.wf01.wt01.temperature
  AS "org.apache.iotdb.trigger.ClusterAlertingExample"
  USING URI 'http://jar/ClusterAlertingExample.jar'
```

## Writing data

When we finish the deployment and startup of AlertManager as well as the creation of the Trigger, we can test the alerting by writing data to the time series.

``` sql
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (1, 0);
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (2, 30);
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (3, 60);
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (4, 90);
INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (5, 120);
```

After executing the above writing statements, we can receive an alerting email. Because our `AlertManager` configuration above makes alerts of `critical` severity inhibit those of `warning` severity, the alerting email we receive only contains the alert triggered by the writing of `(5, 120)`.

alerting

diff --git a/src/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Metric-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Metric-Tool.md
new file mode 100644
index 00000000..d918b3df
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Metric-Tool.md
@@ -0,0 +1,674 @@

# Metric Tool

Along with IoTDB running, we hope to observe the status of IoTDB, so as to troubleshoot system problems or discover potential system risks in time.
A series of metrics that can **reflect the operating status of the system** are system monitoring metrics.

## 1. When to use metric framework?

Below are some typical application scenarios:

1. System is running slowly

   When the system is running slowly, we always hope to have information about the system's running status in as much detail as possible, such as:

   - JVM: Is there FGC? How long does it cost? How much did the memory usage decrease after GC? Are there lots of threads?
   - System: Is the CPU usage too high? Are there many disk IOs?
   - Connections: How many connections are there at the moment?
   - Interface: What is the TPS and latency of every interface?
   - Thread Pool: Are there many pending tasks?
   - Cache Hit Ratio

2. No space left on device

   When we meet a "no space left on device" error, we really want to know which kind of data file had a rapid rise in the past hours.

3. Is the system running in an abnormal status?

   We could use the count of error logs, the alive status of nodes in the cluster, etc., to determine whether the system is running abnormally.

## 2. Who will use metric framework?

Anyone who cares about the system's status, including but not limited to RD, QA, SRE, and DBA, can use the metrics to work more efficiently.

## 3. What are metrics?

### 3.1. Key Concept

In IoTDB's metric module, each metric is uniquely identified by `Metric Name` and `Tags`.

- `Metric Name`: Metric type name, such as `logback_events`, which means log events.
- `Tags`: Indicator classification, in the form of Key-Value pairs; each indicator can have 0 or more categories. Common Key-Value pairs:
  - `name = xxx`: The name of the monitored object, which is a description of **business logic**. For example, for a monitoring item of type `Metric Name = entry_seconds_count`, the meaning of name refers to the monitored business interface.
  - `type = xxx`: Monitoring indicator type subdivision, which is a description of the **monitoring indicator** itself. For example, for monitoring items of type `Metric Name = point`, the meaning of type refers to the specific type of monitoring points.
  - `status = xxx`: The status of the monitored object, which is a description of **business logic**. For example, for monitoring items of type `Metric Name = Task`, this parameter can be used to distinguish the status of the monitored object.
  - `user = xxx`: The relevant user of the monitored object, which is a description of **business logic**. For example, count the total points written by the `root` user.
  - Customize according to the specific situation: For example, there is a level classification under logback_events_total, which is used to indicate the number of logs under a specific level.
- `Metric Level`: The managing level of the metric. The default startup level is `Core`, the recommended startup level is `Important`, and the audit strictness is `Core > Important > Normal > All`.
  - `Core`: Core metrics of the system, used by **operation and maintenance personnel**, and related to the **performance, stability, and security** of the system, such as the status of the instance, the load of the system, etc.
  - `Important`: Important metrics of the module, used by **operation and maintenance personnel and testers**, and directly related to **the running status of each module**, such as the number of merged files, execution status, etc.
+ - `Normal`: Normal metrics of the module, used by **developers** to facilitate **locating the module** when problems + occur, such as specific key operation situations in the merger. + - `All`: All metrics of the module, used by **module developers**, often used when the problem is reproduced, so as + to solve the problem quickly. + +### 3.2. External data format for metrics + +- IoTDB provides metrics in JMX, Prometheus and IoTDB formats: + - For JMX, metrics can be obtained through ```org.apache.iotdb.metrics```. + - For Prometheus, the value of the metrics can be obtained through the externally exposed port + - External exposure in IoTDB mode: metrics can be obtained by executing IoTDB queries + +## 4. The detail of metrics + +Currently, IoTDB provides metrics for some main modules externally, and with the development of new functions and system +optimization or refactoring, metrics will be added and updated synchronously. + +If you want to add your own metrics data in IoTDB, please see +the [IoTDB Metric Framework] (https://github.com/apache/iotdb/tree/master/metrics) document. + +### 4.1. Core level metrics + +Core-level metrics are enabled by default during system operation. The addition of each Core-level metrics needs to be +carefully evaluated. The current Core-level metrics are as follows: + +#### 4.1.1. Cluster + +| Metric | Tags | Type | Description | +|---------------------------|-------------------------------------------------| --------- |-----------------------------------------------------| +| up_time | - | AutoGauge | The time IoTDB has been running | +| config_node | name="total",status="Registered/Online/Unknown" | AutoGauge | The number of registered/online/unknown confignodes | +| data_node | name="total",status="Registered/Online/Unknown" | AutoGauge | The number of registered/online/unknown datanodes | +| cluster_node_leader_count | name="{ip}:{port}" | Gauge | The count of consensus group leader on each node | +| cluster_node_status | name="{ip}:{port}",type="ConfigNode/DataNode" | Gauge | The current node status, 0=Unkonwn 1=online | +| entry | name="{interface}" | Timer | The time consumed of thrift operations | +| mem | name="IoTConsensus" | AutoGauge | The memory usage of IoTConsensus, Unit: byte | + +#### 4.1.2. 
Interface + +| Metric | Tags | Type | Description | +| --------------------- | ---------------------------------- | --------- | -------------------------------------------------------------- | +| thrift_connections | name="ConfigNodeRPC" | AutoGauge | The number of thrift internal connections in ConfigNode | +| thrift_connections | name="InternalRPC" | AutoGauge | The number of thrift internal connections in DataNode | +| thrift_connections | name="MPPDataExchangeRPC" | AutoGauge | The number of thrift internal connections in MPP | +| thrift_connections | name="ClientRPC" | AutoGauge | The number of thrift connections of Client | +| thrift_active_threads | name="ConfigNodeRPC-Service" | AutoGauge | The number of thrift active internal connections in ConfigNode | +| thrift_active_threads | name="DataNodeInternalRPC-Service" | AutoGauge | The number of thrift active internal connections in DataNode | +| thrift_active_threads | name="MPPDataExchangeRPC-Service" | AutoGauge | The number of thrift active internal connections in MPP | +| thrift_active_threads | name="ClientRPC-Service" | AutoGauge | The number of thrift active connections of client | +| session_idle_time | name = "sessionId" | Histogram | The distribution of idle time of different sessions | + +#### 4.1.2. Node Statistics +| Metric | Tags | Type | Description | +| -------- | ----------------------------------- | --------- | ----------------------------------------- | +| quantity | name="database" | AutoGauge | The number of database | +| quantity | name="timeSeries" | AutoGauge | The number of timeseries | +| quantity | name="pointsIn" | Counter | The number of write points | +| points | database="{database}", type="flush" | Gauge | The point number of last flushed memtable | + +#### 4.1.3. 
Cluster Tracing +| Metric | Tags | Type | Description | +| ------------------------------------ | ------------------------------------------------ | ----- | ------------------------------------------------ | +| performance_overview | interface="{interface}", type="{statement_type}" | Timer | The time consumed of operations in client | +| performance_overview_detail | stage="authority" | Timer | The time consumed on authority authentication | +| performance_overview_detail | stage="parser" | Timer | The time consumed on parsing statement | +| performance_overview_detail | stage="analyzer" | Timer | The time consumed on analyzing statement | +| performance_overview_detail | stage="planner" | Timer | The time consumed on planning | +| performance_overview_detail | stage="scheduler" | Timer | The time consumed on scheduling | +| performance_overview_schedule_detail | stage="local_scheduler" | Timer | The time consumed on local scheduler | +| performance_overview_schedule_detail | stage="remote_scheduler" | Timer | The time consumed on remote scheduler | +| performance_overview_local_detail | stage="schema_validate" | Timer | The time consumed on schema validation | +| performance_overview_local_detail | stage="trigger" | Timer | The time consumed on trigger | +| performance_overview_local_detail | stage="storage" | Timer | The time consumed on consensus | +| performance_overview_storage_detail | stage="engine" | Timer | The time consumed on write stateMachine | +| performance_overview_engine_detail | stage="lock" | Timer | The time consumed on grabbing lock in DataRegion | +| performance_overview_engine_detail | stage="create_memtable_block" | Timer | The time consumed on creating new memtable | +| performance_overview_engine_detail | stage="memory_block" | Timer | The time consumed on insert memory control | +| performance_overview_engine_detail | stage="wal" | Timer | The time consumed on writing wal | +| performance_overview_engine_detail | stage="memtable" | Timer | The time consumed on writing memtable | +| performance_overview_engine_detail | stage="last_cache" | Timer | The time consumed on updating last cache | + +#### 4.1.5. Task Statistics + +| Metric | Tags | Type | Description | +| --------- | ------------------------------------------------- | --------- | ------------------------------------- | +| queue | name="compaction_inner", status="running/waiting" | Gauge | The number of inner compaction tasks | +| queue | name="compaction_cross", status="running/waiting" | Gauge | The number of cross compatcion tasks | +| queue | name="flush",status="running/waiting" | AutoGauge | The number of flush tasks | +| cost_task | name="inner_compaction/cross_compaction/flush" | Gauge | The time consumed of compaction tasks | + +#### 4.1.6. IoTDB process + +| Metric | Tags | Type | Description | +| ----------------- | -------------- | --------- | ------------------------------------------------------ | +| process_cpu_load | name="process" | AutoGauge | The current CPU usage of IoTDB process, Unit: % | +| process_cpu_time | name="process" | AutoGauge | The total CPU time occupied of IoTDB process, Unit: ns | +| process_max_mem | name="memory" | AutoGauge | The maximum available memory of IoTDB process | +| process_total_mem | name="memory" | AutoGauge | The current requested memory for IoTDB process | +| process_free_mem | name="memory" | AutoGauge | The free available memory of IoTDB process | + +#### 4.1.7. 
System + +| Metric | Tags | Type | Description | +| ------------------------------ | ------------- | --------- | ---------------------------------------------------------- | +| sys_cpu_load | name="system" | AutoGauge | The current CPU usage of system, Unit: % | +| sys_cpu_cores | name="system" | Gauge | The available number of CPU cores | +| sys_total_physical_memory_size | name="memory" | Gauge | The maximum physical memory of system | +| sys_free_physical_memory_size | name="memory" | AutoGauge | The current available memory of system | +| sys_total_swap_space_size | name="memory" | AutoGauge | The maximum swap space of system | +| sys_free_swap_space_size | name="memory" | AutoGauge | The available swap space of system | +| sys_committed_vm_size | name="memory" | AutoGauge | The space of virtual memory available to running processes | +| sys_disk_total_space | name="disk" | AutoGauge | The total disk space | +| sys_disk_free_space | name="disk" | AutoGauge | The available disk space | + +#### 4.1.8. Log + +| Metric | Tags | Type | Description | +| -------------- | ----------------------------------- | ------- | ------------------------ | +| logback_events | level="trace/debug/info/warn/error" | Counter | The number of log events | + +#### 4.1.9. File + +| Metric | Tags | Type | Description | +| ---------- | ------------------------- | --------- | --------------------------------------------------------------------------- | +| file_size | name="wal" | AutoGauge | The size of WAL file, Unit: byte | +| file_size | name="seq" | AutoGauge | The size of sequence TsFile, Unit: byte | +| file_size | name="unseq" | AutoGauge | The size of unsequence TsFile, Unit: byte | +| file_size | name="inner-seq-temp" | AutoGauge | The size of inner sequence space compaction temporal file | +| file_size | name="inner-unseq-temp" | AutoGauge | The size of inner unsequence space compaction temporal file | +| file_size | name="cross-temp" | AutoGauge | The size of cross space compaction temoporal file | +| file_size | name="mods | AutoGauge | The size of modification files | +| file_count | name="wal" | AutoGauge | The count of WAL file | +| file_count | name="seq" | AutoGauge | The count of sequence TsFile | +| file_count | name="unseq" | AutoGauge | The count of unsequence TsFile | +| file_count | name="inner-seq-temp" | AutoGauge | The count of inner sequence space compaction temporal file | +| file_count | name="inner-unseq-temp" | AutoGauge | The count of inner unsequence space compaction temporal file | +| file_count | name="cross-temp" | AutoGauge | The count of cross space compaction temporal file | +| file_count | name="open_file_handlers" | AutoGauge | The count of open files of the IoTDB process, only supports Linux and MacOS | +| file_count | name="mods | AutoGauge | The count of modification file | + +#### 4.1.10. 
JVM Memory + +| Metric | Tags | Type | Description | +| ------------------------------- | ------------------------------- | --------- | --------------------------- | +| jvm_buffer_memory_used_bytes | id="direct/mapped" | AutoGauge | The used size of buffer | +| jvm_buffer_total_capacity_bytes | id="direct/mapped" | AutoGauge | The max size of buffer | +| jvm_buffer_count_buffers | id="direct/mapped" | AutoGauge | The number of buffer | +| jvm_memory_committed_bytes | {area="heap/nonheap",id="xxx",} | AutoGauge | The committed memory of JVM | +| jvm_memory_max_bytes | {area="heap/nonheap",id="xxx",} | AutoGauge | The max memory of JVM | +| jvm_memory_used_bytes | {area="heap/nonheap",id="xxx",} | AutoGauge | The used memory of JVM | + +#### 4.1.11. JVM Thread + +| Metric | Tags | Type | Description | +| -------------------------- | ------------------------------------------------------------- | --------- | ---------------------------------------- | +| jvm_threads_live_threads | | AutoGauge | The number of live thread | +| jvm_threads_daemon_threads | | AutoGauge | The number of daemon thread | +| jvm_threads_peak_threads | | AutoGauge | The number of peak thread | +| jvm_threads_states_threads | state="runnable/blocked/waiting/timed-waiting/new/terminated" | AutoGauge | The number of thread in different states | + +#### 4.1.12. JVM GC + +| Metric | Tags | Type | Description | +| ----------------------------- | ----------------------------------------------------- | --------- | --------------------------------------------------------------------------- | +| jvm_gc_pause | action="end of major GC/end of minor GC",cause="xxxx" | Timer | The number and time consumed of Young GC/Full Gc caused by different reason | +| | +| jvm_gc_concurrent_phase_time | action="{action}",cause="{cause}" | Timer | The number and time consumed of Young GC/Full Gc caused by different | +| | +| jvm_gc_max_data_size_bytes | | AutoGauge | The historical maximum value of old memory | +| jvm_gc_live_data_size_bytes | | AutoGauge | The usage of old memory | +| jvm_gc_memory_promoted_bytes | | Counter | The accumulative value of positive memory growth of old memory | +| jvm_gc_memory_allocated_bytes | | Counter | The accumulative value of positive memory growth of allocated memory | + + +### 4.2. Important level metrics + +#### 4.2.1. Node + +| Metric | Tags | Type | Description | +| ------ | -------------------------------------- | --------- | ------------------------------------------------------------- | +| region | name="total",type="SchemaRegion" | AutoGauge | The total number of SchemaRegion in PartitionTable | +| region | name="total",type="DataRegion" | AutoGauge | The total number of DataRegion in PartitionTable | +| region | name="{ip}:{port}",type="SchemaRegion" | Gauge | The number of SchemaRegion in PartitionTable of specific node | +| region | name="{ip}:{port}",type="DataRegion" | Gauge | The number of DataRegion in PartitionTable of specific node | + +#### 4.2.2. 
RatisConsensus + +| Metric | Tags | Type | Description | +| --------------------- | -------------------------- | ----- | ------------------------------------------------------------ | +| ratis_consensus_write | stage="writeLocally" | Timer | The time cost of writing locally stage | +| ratis_consensus_write | stage="writeRemotely" | Timer | The time cost of writing remotely stage | +| ratis_consensus_write | stage="writeStateMachine" | Timer | The time cost of writing state machine stage | +| ratis_server | clientWriteRequest | Timer | Time taken to process write requests from client | +| ratis_server | followerAppendEntryLatency | Timer | Time taken for followers to append log entries | +| ratis_log_worker | appendEntryLatency | Timer | Total time taken to append a raft log entry | +| ratis_log_worker | queueingDelay | Timer | Time taken for a Raft log operation to get into the queue after being requested, waiting queue to be non-full | +| ratis_log_worker | enqueuedTime | Timer | Time spent by a Raft log operation in the queue | +| ratis_log_worker | writelogExecutionTime | Timer | Time taken for a Raft log write operation to complete execution | +| ratis_log_worker | flushTime | Timer | Time taken to flush log | +| ratis_log_worker | closedSegmentsSizeInBytes | Gauge | Size of closed raft log segments in bytes | +| ratis_log_worker | openSegmentSizeInBytes | Gauge | Size of open raft log segment in bytes | + +#### 4.2.3. IoTConsensus + +| Metric | Tags | Type | Description | +| ------------ | ------------------------------------------------------------ | --------- | ------------------------------------------------------------ | +| mutli_leader | name="logDispatcher-{IP}:{Port}", region="{region}", type="currentSyncIndex" | AutoGauge | The sync index of synchronization thread in replica group | +| mutli_leader | name="logDispatcher-{IP}:{Port}", region="{region}", type="cachedRequestInMemoryQueue" | AutoGauge | The size of cache requests of synchronization thread in replica group | +| mutli_leader | name="IoTConsensusServerImpl", region="{region}", type="searchIndex" | AutoGauge | The write process of main process in replica group | +| mutli_leader | name="IoTConsensusServerImpl", region="{region}", type="safeIndex" | AutoGauge | The sync index of replica group | +| mutli_leader | name="IoTConsensusServerImpl", region="{region}", type="syncLag" | AutoGauge | The sync lag of replica group | +| mutli_leader | name="IoTConsensusServerImpl", region="{region}", type="LogEntriesFromWAL" | AutoGauge | The number of logEntries from wal in Batch | +| mutli_leader | name="IoTConsensusServerImpl", region="{region}", type="LogEntriesFromQueue" | AutoGauge | The number of logEntries from queue in Batch | +| stage | name="iot_consensus", region="{region}", type="getStateMachineLock" | Histogram | The time consumed to get statemachine lock in main process | +| stage | name="iot_consensus", region="{region}", type="checkingBeforeWrite" | Histogram | The time consumed to precheck before write in main process | +| stage | name="iot_consensus", region="{region}", type="writeStateMachine" | Histogram | The time consumed to write statemachine in main process | +| stage | name="iot_consensus", region="{region}", type="offerRequestToQueue" | Histogram | The time consumed to try to offer request to queue in main process | +| stage | name="iot_consensus", region="{region}", type="consensusWrite" | Histogram | The time consumed to the whole write in main process | +| stage | name="iot_consensus", 
region="{region}", type="constructBatch" | Histogram | The time consumed to construct batch in synchronization thread | +| stage | name="iot_consensus", region="{region}", type="syncLogTimePerRequest" | Histogram | The time consumed to sync log in asynchronous callback process | + +#### 4.2.4. Cache + +| Metric | Tags | Type | Description | +| --------- |------------------------------------| --------- |--------------------------------------------------------------------------| +| cache_hit | name="chunk" | AutoGauge | The cache hit ratio of ChunkCache, Unit: % | +| cache_hit | name="schema" | AutoGauge | The cache hit ratio of SchemaCache, Unit: % | +| cache_hit | name="timeSeriesMeta" | AutoGauge | The cache hit ratio of TimeseriesMetadataCache, Unit: % | +| cache_hit | name="bloomFilter" | AutoGauge | The interception rate of bloomFilter in TimeseriesMetadataCache, Unit: % | +| cache | name="Database", type="hit" | Counter | The hit number of Database Cache | +| cache | name="Database", type="all" | Counter | The access number of Database Cache | +| cache | name="SchemaPartition", type="hit" | Counter | The hit number of SchemaPartition Cache | +| cache | name="SchemaPartition", type="all" | Counter | The access number of SchemaPartition Cache | +| cache | name="DataPartition", type="hit" | Counter | The hit number of DataPartition Cache | +| cache | name="DataPartition", type="all" | Counter | The access number of DataPartition Cache | +| cache | name="SchemaCache", type="hit" | Counter | The hit number of Schema Cache | +| cache | name="SchemaCache", type="all" | Counter | The access number of Schema Cache | + +#### 4.2.5. Memory + +| Metric | Tags | Type | Description | +| ------ | -------------------------------- | --------- | ------------------------------------------------------------------ | +| mem | name="database_{name}" | AutoGauge | The memory usage of DataRegion in DataNode, Unit: byte | +| mem | name="chunkMetaData_{name}" | AutoGauge | The memory usage of chunkMetaData when writting TsFile, Unit: byte | +| mem | name="IoTConsensus" | AutoGauge | The memory usage of IoTConsensus, Unit: byte | +| mem | name="IoTConsensusQueue" | AutoGauge | The memory usage of IoTConsensus Queue, Unit: byte | +| mem | name="IoTConsensusSync" | AutoGauge | The memory usage of IoTConsensus SyncStatus, Unit: byte | +| mem | name="schema_region_total_usage" | AutoGauge | The memory usage of all SchemaRegion, Unit: byte | + +#### 4.2.6. Compaction + +| Metric | Tags | Type | Description | +| --------------------- | --------------------------------------------------- | ------- | -------------------------------------- | +| data_written | name="compaction", type="aligned/not-aligned/total" | Counter | The written size of compaction | +| data_read | name="compaction" | Counter | The read size of compaction | +| compaction_task_count | name = "inner_compaction", type="sequence" | Counter | The number of inner sequence compction | +| compaction_task_count | name = "inner_compaction", type="unsequence" | Counter | The number of inner sequence compction | +| compaction_task_count | name = "cross_compaction", type="cross" | Counter | The number of corss compction | + +#### 4.2.7. 
IoTDB Process + +| Metric | Tags | Type | Description | +| --------------------- | -------------- | --------- | ------------------------------------------- | +| process_used_mem | name="memory" | AutoGauge | The used memory of IoTDB process | +| process_mem_ratio | name="memory" | AutoGauge | The used memory ratio of IoTDB process | +| process_threads_count | name="process" | AutoGauge | The number of thread of IoTDB process | +| process_status | name="process" | AutoGauge | The status of IoTDB process, 1=live, 0=dead | + +#### 4.2.8. JVM Class + +| Metric | Tags | Type | Description | +| ---------------------------- | ---- | --------- | ---------------------------- | +| jvm_classes_unloaded_classes | | AutoGauge | The number of unloaded class | +| jvm_classes_loaded_classes | | AutoGauge | The number of loaded class | + +#### 4.2.9. JVM Compilation + +| Metric | Tags | Type | Description | +| ----------------------- | --------------------------------------------- | --------- | -------------------------------- | +| jvm_compilation_time_ms | {compiler="HotSpot 64-Bit Tiered Compilers",} | AutoGauge | The time consumed in compilation | + +#### 4.2.10. Query Planning + +| Metric | Tags | Type | Description | +| --------------- | ---------------------------- | ----- | --------------------------------------------------- | +| query_plan_cost | stage="analyzer" | Timer | The query statement analysis time-consuming | +| query_plan_cost | stage="logical_planner" | Timer | The query logical plan planning time-consuming | +| query_plan_cost | stage="distribution_planner" | Timer | The query distribution plan planning time-consuming | +| query_plan_cost | stage="partition_fetcher" | Timer | The partition information fetching time-consuming | +| query_plan_cost | stage="schema_fetcher" | Timer | The schema information fetching time-consuming | + +#### 4.2.11. Plan Dispatcher + +| Metric | Tags | Type | Description | +| ---------- | ------------------------- | ----- | ------------------------------------------------------------ | +| dispatcher | stage="wait_for_dispatch" | Timer | The distribution plan dispatcher time-consuming | +| dispatcher | stage="dispatch_read" | Timer | The distribution plan dispatcher time-consuming (only query) | + +#### 4.2.12. Query Resource + +| Metric | Tags | Type | Description | +| -------------- | ------------------------ | ---- | ------------------------------------------ | +| query_resource | type="sequence_tsfile" | Rate | The access frequency of sequence tsfiles | +| query_resource | type="unsequence_tsfile" | Rate | The access frequency of unsequence tsfiles | +| query_resource | type="flushing_memtable" | Rate | The access frequency of flushing memtables | +| query_resource | type="working_memtable" | Rate | The access frequency of working memtables | + +#### 4.2.13. 
Data Exchange + +| Metric | Tags | Type | Description | +|---------------------|------------------------------------------------------------------------|-----------|-----------------------------------------------------------------| +| data_exchange_cost | operation="source_handle_get_tsblock", type="local/remote" | Timer | The time-consuming that source handles receive TsBlock | +| data_exchange_cost | operation="source_handle_deserialize_tsblock", type="local/remote" | Timer | The time-consuming that source handles deserialize TsBlock | +| data_exchange_cost | operation="sink_handle_send_tsblock", type="local/remote" | Timer | The time-consuming that sink handles send TsBlock | +| data_exchange_cost | operation="send_new_data_block_event_task", type="server/caller" | Timer | The RPC time-consuming that sink handles send TsBlock | +| data_exchange_cost | operation="get_data_block_task", type="server/caller" | Timer | The RPC time-consuming that source handles receive TsBlock | +| data_exchange_cost | operation="on_acknowledge_data_block_event_task", type="server/caller" | Timer | The RPC time-consuming that source handles ack received TsBlock | +| data_exchange_count | name="send_new_data_block_num", type="server/caller" | Histogram | The number of sent TsBlocks by sink handles | +| data_exchange_count | name="get_data_block_num", type="server/caller" | Histogram | The number of received TsBlocks by source handles | +| data_exchange_count | name="on_acknowledge_data_block_num", type="server/caller" | Histogram | The number of acknowledged TsBlocks by source handles | +| data_exchange_count | name="shuffle_sink_handle_size" | AutoGauge | The number of shuffle sink handle | +| data_exchange_count | name="source_handle_size" | AutoGauge | The number of source handle | +#### 4.2.14. Query Task Schedule + +| Metric | Tags | Type | Description | +|------------------|----------------------------------|-----------|---------------------------------------------------| +| driver_scheduler | name="ready_queued_time" | Timer | The queuing time of ready queue | +| driver_scheduler | name="block_queued_time" | Timer | The queuing time of blocking queue | +| driver_scheduler | name="ready_queue_task_count" | AutoGauge | The number of tasks queued in the ready queue | +| driver_scheduler | name="block_queued_task_count" | AutoGauge | The number of tasks queued in the blocking queue | +| driver_scheduler | name="timeout_queued_task_count" | AutoGauge | The number of tasks queued in the timeout queue | +| driver_scheduler | name="query_map_size" | AutoGauge | The number of queries recorded in DriverScheduler | + +#### 4.2.15. 
Query Execution + +| Metric | Tags | Type | Description | +| ------------------------ | ----------------------------------------------------------------------------------- | ------- | --------------------------------------------------------------------------------------- | +| query_execution | stage="local_execution_planner" | Timer | The time-consuming of operator tree construction | +| query_execution | stage="query_resource_init" | Timer | The time-consuming of query resource initialization | +| query_execution | stage="get_query_resource_from_mem" | Timer | The time-consuming of query resource memory query and construction | +| query_execution | stage="driver_internal_process" | Timer | The time-consuming of driver execution | +| query_execution | stage="wait_for_result" | Timer | The time-consuming of getting query result from result handle | +| operator_execution_cost | name="{operator_name}" | Timer | The operator execution time | +| operator_execution_count | name="{operator_name}" | Counter | The number of operator calls (counted by the number of next method calls) | +| aggregation | from="raw_data" | Timer | The time-consuming of performing an aggregation calculation from a batch of raw data | +| aggregation | from="statistics" | Timer | The time-consuming of updating an aggregated value with statistics | +| series_scan_cost | stage="load_timeseries_metadata", type="aligned/non_aligned", from="mem/disk" | Timer | The time-consuming of loading TimeseriesMetadata | +| series_scan_cost | stage="read_timeseries_metadata", type="", from="cache/file" | Timer | The time-consuming of reading TimeseriesMetadata of a tsfile | +| series_scan_cost | stage="timeseries_metadata_modification", type="aligned/non_aligned", from="null" | Timer | The time-consuming of filtering TimeseriesMetadata by mods | +| series_scan_cost | stage="load_chunk_metadata_list", type="aligned/non_aligned", from="mem/disk" | Timer | The time-consuming of loading ChunkMetadata list | +| series_scan_cost | stage="chunk_metadata_modification", type="aligned/non_aligned", from="mem/disk" | Timer | The time-consuming of filtering ChunkMetadata by mods | +| series_scan_cost | stage="chunk_metadata_filter", type="aligned/non_aligned", from="mem/disk" | Timer | The time-consuming of filtering ChunkMetadata by query filter | +| series_scan_cost | stage="construct_chunk_reader", type="aligned/non_aligned", from="mem/disk" | Timer | The time-consuming of constructing ChunkReader | +| series_scan_cost | stage="read_chunk", type="", from="cache/file" | Timer | The time-consuming of reading Chunk | +| series_scan_cost | stage="init_chunk_reader", type="aligned/non_aligned", from="mem/disk" | Timer | The time-consuming of initializing ChunkReader (constructing PageReader) | +| series_scan_cost | stage="build_tsblock_from_page_reader", type="aligned/non_aligned", from="mem/disk" | Timer | The time-consuming of constructing Tsblock from PageReader | +| series_scan_cost | stage="build_tsblock_from_merge_reader", type="aligned/non_aligned", from="null" | Timer | The time-consuming of constructing Tsblock from MergeReader (handling overlapping data) | + +#### 4.2.16. Coordinator + +| Metric | Tags | Type | Description | +|-------------|---------------------------------|-----------|----------------------------------------------------| +| coordinator | name="query_execution_map_size" | AutoGauge | The number of queries recorded on current DataNode | + +#### 4.2.17. 
FragmentInstanceManager

| Metric                    | Tags                           | Type      | Description                                               |
|---------------------------|--------------------------------|-----------|-----------------------------------------------------------|
| fragment_instance_manager | name="instance_context_size"   | AutoGauge | The number of query fragment contexts on current DataNode |
| fragment_instance_manager | name="instance_execution_size" | AutoGauge | The number of query fragments on current DataNode         |

#### 4.2.18. MemoryPool

| Metric      | Tags                                 | Type      | Description                                                        |
|-------------|--------------------------------------|-----------|--------------------------------------------------------------------|
| memory_pool | name="max_bytes"                     | Gauge     | Maximum memory for data exchange                                   |
| memory_pool | name="remaining_bytes"               | AutoGauge | Remaining memory for data exchange                                 |
| memory_pool | name="query_memory_reservation_size" | AutoGauge | Size of query reserved memory                                      |
| memory_pool | name="memory_reservation_size"       | AutoGauge | Size of memory that sink handles and source handles try to reserve |

#### 4.2.19. LocalExecutionPlanner

| Metric                  | Tags                             | Type      | Description                                                                        |
|-------------------------|----------------------------------|-----------|------------------------------------------------------------------------------------|
| local_execution_planner | name="free_memory_for_operators" | AutoGauge | The remaining memory that can be allocated to query fragments on current DataNode |

#### 4.2.20. Schema Engine

| Metric        | Tags                                                                | Type      | Description                                          |
| ------------- | ------------------------------------------------------------------- | --------- | ---------------------------------------------------- |
| schema_engine | name="schema_region_total_mem_usage"                                | AutoGauge | Memory usage of all SchemaRegions                    |
| schema_engine | name="schema_region_mem_capacity"                                   | AutoGauge | Memory capacity of all SchemaRegions                 |
| schema_engine | name="schema_engine_mode"                                           | Gauge     | Mode of SchemaEngine                                  |
| schema_engine | name="schema_region_consensus"                                      | Gauge     | Consensus protocol of SchemaRegion                    |
| schema_engine | name="schema_region_number"                                         | AutoGauge | Number of SchemaRegions                               |
| quantity      | name="template_series_cnt"                                          | AutoGauge | Number of template series                             |
| schema_region | name="schema_region_mem_usage", region="SchemaRegion[{regionId}]"   | AutoGauge | Memory usage of each SchemaRegion                     |
| schema_region | name="schema_region_series_cnt", region="SchemaRegion[{regionId}]"  | AutoGauge | Number of total timeseries for each SchemaRegion      |
| schema_region | name="activated_template_cnt", region="SchemaRegion[{regionId}]"    | AutoGauge | Number of activated templates for each SchemaRegion   |
| schema_region | name="template_series_cnt", region="SchemaRegion[{regionId}]"       | AutoGauge | Number of template series for each SchemaRegion       |

#### 4.2.21. 
Write Performance + +| Metric | Tags | Type | Description | +| ------------------------- | :-------------------------------------------------------------------- | --------- | ------------------------------------------------------ | +| wal_node_num | name="wal_nodes_num" | AutoGauge | Num of WALNode | +| wal_cost | stage="make_checkpoint" type="" | Timer | Time cost of make checkpoints for all checkpoint type | +| wal_cost | type="serialize_one_wal_info_entry" | Timer | Time cost of serialize one WALInfoEntry | +| wal_cost | stage="sync_wal_buffer" type="" | Timer | Time cost of sync WALBuffer | +| wal_buffer | name="used_ratio" | Histogram | Used ratio of WALBuffer | +| wal_buffer | name="entries_count" | Histogram | Entries Count of WALBuffer | +| wal_cost | stage="serialize_wal_entry" type="serialize_wal_entry_total" | Timer | Time cost of WALBuffer serialize task | +| wal_node_info | name="effective_info_ratio" type="" | Histogram | Effective info ratio of WALNode | +| wal_node_info | name="oldest_mem_table_ram_when_cause_snapshot" type="" | Histogram | Ram of oldest memTable when cause snapshot | +| wal_node_info | name="oldest_mem_table_ram_when_cause_flush" type="" | Histogram | Ram of oldest memTable when cause flush | +| flush_sub_task_cost | type="sort_task" | Timer | Time cost of sort series in flush sort stage | +| flush_sub_task_cost | type="encoding_task" | Timer | Time cost of sub encoding task in flush encoding stage | +| flush_sub_task_cost | type="io_task" | Timer | Time cost of sub io task in flush io stage | +| flush_cost | stage="write_plan_indices" | Timer | Time cost of write plan indices | +| flush_cost | stage="sort" | Timer | Time cost of flush sort stage | +| flush_cost | stage="encoding" | Timer | Time cost of flush encoding stage | +| flush_cost | stage="io" | Timer | Time cost of flush io stage | +| pending_flush_task | type="pending_task_num" | AutoGauge | Num of pending flush task num | +| pending_flush_task | type="pending_sub_task_num" | AutoGauge | Num of pending flush sub task num | +| flushing_mem_table_status | name="mem_table_size" region="DataRegion[]" | Histogram | Size of flushing memTable | +| flushing_mem_table_status | name="total_point_num" region="DataRegion[]" | Histogram | Point num of flushing memTable | +| flushing_mem_table_status | name="series_num" region="DataRegion[]" | Histogram | Series num of flushing memTable | +| flushing_mem_table_status | name="avg_series_points_num" region="DataRegion[]" | Histogram | Point num of flushing memChunk | +| flushing_mem_table_status | name="tsfile_compression_ratio" region="DataRegion[]" | Histogram | TsFile Compression ratio of flushing memTable | +| flushing_mem_table_status | name="flush_tsfile_size" region="DataRegion[]" | Histogram | TsFile size of flushing memTable | +| data_region_mem_cost | name="data_region_mem_cost" | AutoGauge | Mem cost of data regions | + +### 4.3. Normal level Metrics + +#### 4.3.1. Cluster + +| Metric | Tags | Type | Description | +| ------ | ------------------------------------------------------------ | --------- | ------------------------------------------------------------------ | +| region | name="{DatabaseName}",type="SchemaRegion/DataRegion" | AutoGauge | The number of DataRegion/SchemaRegion of database in specific node | +| slot | name="{DatabaseName}",type="schemaSlotNumber/dataSlotNumber" | AutoGauge | The number of DataSlot/SchemaSlot of database in specific node | + +### 4.4. 
All Metric + +Currently there is no All level metrics, and it will continue to be added in the future. + +## 5. How to get these metrics? + +The relevant configuration of the metric module is in `conf/iotdb-{datanode/confignode}.properties`, and all +configuration items support hot loading through the `load configuration` command. + +### 5.1. JMX + +For metrics exposed externally using JMX, you can view them through Jconsole. After entering the Jconsole monitoring +page, you will first see an overview of various running conditions of IoTDB. Here you can see heap memory information, +thread information, class information, and the server's CPU usage. + +#### 5.1.1. Obtain metric data + +After connecting to JMX, you can find the "MBean" named "org.apache.iotdb.metrics" through the "MBeans" tab, and you can +view the specific values of all monitoring metrics in the sidebar. + +metric-jmx + +#### 5.1.2. Get other relevant data + +After connecting to JMX, you can find the "MBean" named "org.apache.iotdb.service" through the "MBeans" tab, as shown in +the image below, to understand the basic status of the service + +
+ +In order to improve query performance, IOTDB caches ChunkMetaData and TsFileMetaData. Users can use MXBean and expand +the sidebar `org.apache.iotdb.db.service` to view the cache hit ratio: + + + +### 5.2. Prometheus + +#### 5.2.1. The mapping from metric type to prometheus format + +> For metrics whose Metric Name is name and Tags are K1=V1, ..., Kn=Vn, the mapping is as follows, where value is a +> specific value + +| Metric Type | Mapping | +| ---------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Counter | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | +| AutoGauge、Gauge | name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value | +| Histogram | name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | +| Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", rate="mean"} value | +| Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.5"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId", k1="V1", ..., Kn="Vn", quantile="0.99"} value | + +#### 5.2.2. Config File + +1) Taking DataNode as an example, modify the iotdb-system.properties configuration file as follows: + +```properties +dn_metric_reporter_list=PROMETHEUS +dn_metric_level=CORE +dn_metric_prometheus_reporter_port=9091 +``` + +Then you can get metrics data as follows + +2) Start IoTDB DataNodes +3) Open a browser or use ```curl``` to visit ```http://servier_ip:9091/metrics```, you can get the following metric + data: + +``` +... +# HELP file_count +# TYPE file_count gauge +file_count{name="wal",} 0.0 +file_count{name="unseq",} 0.0 +file_count{name="seq",} 2.0 +... +``` + +#### 5.2.3. Prometheus + Grafana + +As shown above, IoTDB exposes monitoring metrics data in the standard Prometheus format to the outside world. Prometheus +can be used to collect and store monitoring indicators, and Grafana can be used to visualize monitoring indicators. + +The following picture describes the relationships among IoTDB, Prometheus and Grafana + +![iotdb_prometheus_grafana](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Metrics/iotdb_prometheus_grafana.png) + +1. Along with running, IoTDB will collect its metrics continuously. +2. Prometheus scrapes metrics from IoTDB at a constant interval (can be configured). +3. Prometheus saves these metrics to its inner TSDB. +4. Grafana queries metrics from Prometheus at a constant interval (can be configured) and then presents them on the + graph. + +So, we need to do some additional works to configure and deploy Prometheus and Grafana. + +For instance, you can config your Prometheus as follows to get metrics data from IoTDB: + +```yaml +job_name: pull-metrics +honor_labels: true +honor_timestamps: true +scrape_interval: 15s +scrape_timeout: 10s +metrics_path: /metrics +scheme: http +follow_redirects: true +static_configs: + - targets: + - localhost:9091 +``` + +The following documents may help you have a good journey with Prometheus and Grafana. + +[Prometheus getting_started](https://prometheus.io/docs/prometheus/latest/getting_started/) + +[Prometheus scrape metrics](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) + +[Grafana getting_started](https://grafana.com/docs/grafana/latest/getting-started/getting-started/) + +[Grafana query metrics from Prometheus](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) + +#### 5.2.4. Apache IoTDB Dashboard + +We provide the Apache IoTDB Dashboard, and the rendering shown in Grafana is as follows: + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Metrics/dashboard.png) + +You can obtain the json files of Dashboards in enterprise version. + +### 5.3. IoTDB + +#### 5.3.1. 
IoTDB mapping relationship of metrics

> For metrics whose Metric Name is name and Tags are K1=V1, ..., Kn=Vn, the mapping is as follows, taking
> root.__system.metric.`clusterName`.`nodeType`.`nodeId` as an example by default

| Metric Type      | Mapping |
| ---------------- | ------- |
| Counter          | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.value |
| AutoGauge、Gauge  | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.value |
| Histogram | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.count<br>
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.max
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.sum
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p0
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p50
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p75
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p99
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p999 | +| Rate | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.count
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.mean
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m1
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m5
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m15 | +| Timer | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.count
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.max
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.mean
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.sum
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p0
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p50
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p75
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p99
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p999
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m1
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m5
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m15 | + +#### 5.3.2. Obtain metrics + +According to the above mapping relationship, related IoTDB query statements can be formed to obtain metrics \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Monitoring-Board-Install-and-Deploy.md b/src/UserGuide/V2.0.1/Tree/stage/Monitoring-Board-Install-and-Deploy.md new file mode 100644 index 00000000..be807c19 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Monitoring-Board-Install-and-Deploy.md @@ -0,0 +1,207 @@ + + +# 1 Monitoring Board Install and Deploy +From the IoTDB 1.0 version, we introduced the system monitoring module, you can complete the IoTDB important operational indicators for monitoring, this article describes how to open the system monitoring module in the IoTDB distribution, and the use of Prometheus + Grafana way to complete the visualisation of the system monitoring indicators. + +## 1.1 pre-preparation + +### 1.1.1 software requirement + +1. IoTDB: version 1.0 and above, you may contact your sales for the relevant installer +2. Prometheus: version 2.30.3 and above, download from the official website: https://prometheus.io/download/ +3. Grafana: version 8.4.2 and above, download from the official website: https://grafana.com/grafana/download +4. IoTDB Dashboards: IoTDB Dashboard is a tool for Enterprise IoTDB, and you may contact your sales for the relevant installer. + +### 1.1.2 Start ConfigNode +1. Enter the `iotdb-enterprise-1.x.x.x-bin` package +2. Modify the configuration file `conf/iotdb-system.properties` and modify the following configuration. Other configurations remain unchanged: + +```properties +cn_metric_reporter_list=PROMETHEUS +cn_metric_level=IMPORTANT +cn_metric_prometheus_reporter_port=9091 +``` + +3. Run the script to start ConfigNode: `./sbin/start-confignode.sh`. If the following prompt appears, the startup is successful: + +![](https://spricoder.oss-cn-shanghai.aliyuncs.com/Apache%20IoTDB/metric/cluster-introduce/1.png) + +4. Enter the http://localhost:9091/metrics URL in the browser, and you can view the following monitoring item information: + +![](https://spricoder.oss-cn-shanghai.aliyuncs.com/Apache%20IoTDB/metric/cluster-introduce/2.png) + +5. Similarly, the other two ConfigNode nodes can be configured to ports 9092 and 9093 respectively. + +### 1.1.3 Start DataNode +1. Enter the `iotdb-enterprise-1.x.x.x-bin` package +2. Modify the configuration file `conf/iotdb-system.properties` and modify the following configuration. Other configurations remain unchanged: + +```properties +dn_metric_reporter_list=PROMETHEUS +dn_metric_level=IMPORTANT +dn_metric_prometheus_reporter_port=9094 +``` + +3. Run the script to start DataNode: `./sbin/start-datanode.sh`. If the following prompt appears, the startup is successful: + +![](https://spricoder.oss-cn-shanghai.aliyuncs.com/Apache%20IoTDB/metric/cluster-introduce/3.png) + +4. Enter the `http://localhost:9094/metrics` URL in the browser, and you can view the following monitoring item information: + +![](https://spricoder.oss-cn-shanghai.aliyuncs.com/Apache%20IoTDB/metric/cluster-introduce/4.png) + +5. Similarly, the other two DataNodes can be configured to ports 9095 and 9096. + +### 1.1.4 clarification + +Please confirm that the IoTDB cluster has been started before performing the following operations. 
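If you would like a quick sanity check at this point, the reporter endpoints enabled in the previous steps can be probed from the command line. The sketch below assumes the example layout used in this document (all nodes on localhost, metric ports 9091-9096); adjust the host and ports to match your own configuration.

```Shell
# Probe every Prometheus reporter endpoint configured above; each one should
# answer with HTTP 200 and return metrics in plain-text Prometheus format.
for port in 9091 9092 9093 9094 9095 9096; do
  echo -n "localhost:${port}/metrics -> "
  curl -s -o /dev/null -w "%{http_code}\n" "http://localhost:${port}/metrics"
done
```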
+ +This doc will build the monitoring dashboard on one machine (1 ConfigNode and 1 DataNode) environment, other cluster configurations are similar, users can adjust the configuration according to their own cluster situation (the number of ConfigNode and DataNode). The basic configuration information of the cluster built in this paper is shown in the table below. + +| NODETYPE | NODEIP | Monitor Pusher | Monitor Level | Monitor Port | +| ---------- | --------- | -------------- | ------------ | --------- | +| ConfigNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9091 | +| ConfigNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9092 | +| ConfigNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9093 | +| DataNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9094 | +| DataNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9095 | +| DataNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9096 | + +## 1.2 configure Prometheus capture monitoring metrics + +1. Download the installation package. Download the Prometheus binary package locally, unzip it and go to the corresponding folder: + +```Shell +tar xvfz prometheus-*.tar.gz +cd prometheus-* +``` + +2. Modify the configuration. Modify the Prometheus configuration file prometheus.yml as follows: + a. Added confignode task to collect monitoring data from ConfigNode + b. Add datanode task to collect monitoring data from DataNode + +```YAML +global: + scrape_interval: 15s + +scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["localhost:9091", "localhost:9092", "localhost:9093"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["localhost:9094", "localhost:9095", "localhost:9096"] + honor_labels: true +``` + +3. Start Promethues. the default expiration time for Prometheus monitoring data is 15d. in production environments, it is recommended to adjust the expiration time to 180d or more in order to track historical monitoring data for a longer period of time, as shown in the following startup command: + +```Shell +./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d +``` + +4. Confirm the startup is successful. Enter http://localhost:9090 in the browser to enter Prometheus, click to enter the Target interface under Status (Figure 1 below), when you see State are Up, it means the configuration is successful and connected (Figure 2 below), click the link on the left side to jump to the webpage monitoring. + +![](https://alioss.timecho.com/docs/img/1a.PNG) +![](https://alioss.timecho.com/docs/img/2a.PNG) + + + +## 1.3 Using Grafana to View Monitoring Data + +### 1.3.1 Step1:Grafana Installation, Configuration and Startup + +1. Download the binary package of Grafana locally, unzip it and go to the corresponding folder: + +```Shell +tar -zxvf grafana-*.tar.gz +cd grafana-* +``` + +2. Start Grafana and enter: + +```Shell +./bin/grafana-server web +``` + +3. Enter http://localhost:3000 in your browser to access Grafana, the default initial username and password are both admin. +4. First we configure the Data Source in Configuration to be Prometheus. + +![](https://alioss.timecho.com/upload/3aea.jpg) + +5. When configuring the Data Source, pay attention to the URL where Prometheus is located, and click Save & Test after configuration, the Data source is working prompt appears, then the configuration is successful. + +![](https://alioss.timecho.com/upload/4aen.jpg) + +### 1.3.2 Step2:Import the IoTDB Dashboards + +1. 
Enter Grafana,click Browse of Dashboards + +![](https://alioss.timecho.com/docs/img/5a.png) + +2. Click the Import button on the right + +![](https://alioss.timecho.com/docs/img/6a.png) + +3. Select a way to import Dashboard + a. Upload the Json file of the downloaded Dashboard locally + b. Paste the contents of the Dashboard's Json file + +![](https://alioss.timecho.com/docs/img/7a.png) + +1. Select Prometheus in the Dashboard as the Data Source you just configured and click Import + +![](https://alioss.timecho.com/docs/img/8a.png) + +5. Then enter the Apache ConfigNode Dashboard and see the following monitoring panel + +![](https://alioss.timecho.com/docs/img/confignode.png) + +6. Similarly, we can import Apache DataNode Dashboard and see the following monitoring panel: + +![](https://alioss.timecho.com/docs/img/datanode.png) + +7. Similarly, we can import the Apache Performance Overview Dashboard and see the following monitoring panel: + +![](https://alioss.timecho.com/docs/img/performance.png) + +8. Similarly, we can import the Apache System Overview Dashboard and see the following monitoring panel: + +![](https://alioss.timecho.com/docs/img/system.png) + +### 1.3.3 Step3:Creating a new Dashboard for data visualisation + +1. First create the Dashboard, then create the Panel. + +![](https://alioss.timecho.com/docs/img/11a.png) + +2. After that, you can visualize the monitoring-related data in the panel according to your needs (all relevant monitoring metrics can be filtered by selecting confignode/datanode in the job first). + +![](https://alioss.timecho.com/upload/12aen.jpg) + +3. Once the visualisation of the monitoring metrics selected for attention is complete, we get a panel like this: + +![](https://alioss.timecho.com/docs/img/13a.png) \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Auto-Create-MetaData.md b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Auto-Create-MetaData.md new file mode 100644 index 00000000..74955293 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Auto-Create-MetaData.md @@ -0,0 +1,112 @@ + + +# Auto create metadata + +Automatically creating schema means creating time series based on the characteristics of written data in case time series haven't defined by users themselves. +This function can not only solve the problem that entities and measurements are difficult to predict and model in advance under massive time series scenarios, +but also provide users with an out-of-the-box writing experience. + +## Auto create database metadata + +* enable\_auto\_create\_schema + +| Name | enable\_auto\_create\_schema | +|:---:|:---| +| Description | whether creating schema automatically is enabled | +| Type | boolean | +| Default | true | +| Effective | After restarting system | + +* default\_storage\_group\_level + +| Name | default\_storage\_group\_level | +|:---:|:---| +| Description | Specify which level database is in the time series, the default level is 1 (root is on level 0) | +| Type | int | +| Default | 1 | +| Effective | Only allowed to be modified in first start up | + +Illustrated as the following figure: + +* When default_storage_group_level=1, root.turbine1 and root.turbine2 will be created as database. + +* When default_storage_group_level=2, root.turbine1.d1, root.turbine1.d2, root.turbine2.d1 and root.turbine2.d2 will be created as database. 
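For instance, to obtain the level-1 layout from the first bullet above, the two parameters can be set together in `conf/iotdb-system.properties` (a minimal sketch using the default values; as noted in the tables above, `default_storage_group_level` may only be changed before the first start up):

```properties
# create schema automatically when unknown series are written
enable_auto_create_schema=true
# root is level 0, so level 1 turns root.turbine1 and root.turbine2 into databases
default_storage_group_level=1
```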
+ +auto create database example + +## Auto create time series metadata(specify data type in the frontend) + +* Users should specify data type when writing: + + * insertTablet method in Session module. + * insert method using TSDataType in Session module. + ``` + public void insertRecord(String deviceId, long time, List measurements, List types, Object... values); + public void insertRecords(List deviceIds, List times, List> measurementsList, List> typesList, List> valuesList); + ``` + * ...... + +* Efficient, time series are auto created when inserting data. + +## Auto create time series metadata(infer data type in the backend) + +* Just pass string, and the database will infer the data type: + + * insert command in CLI module. + * insert method without using TSDataType in Session module. + ``` + public void insertRecord(String deviceId, long time, List measurements, List types, List values); + public void insertRecords(List deviceIds, List times, List> measurementsList, List> valuesList); + ``` + * ...... + +* Since type inference will increase the writing time, the efficiency of auto creating time series metadata through type inference is lower than that of auto creating time series metadata through specifying data type. We recommend users choose specifying data type in the frontend when possible. + +### Type inference + +| Data(String Format) | Format Type | iotdb-system.properties | Default | +|:---:|:---|:------------------------------|:---| +| true | boolean | boolean\_string\_infer\_type | BOOLEAN | +| 1 | integer | integer\_string\_infer\_type | FLOAT | +| 17000000(integer > 2^24) | integer | long\_string\_infer\_type | DOUBLE | +| 1.2 | floating | floating\_string\_infer\_type | FLOAT | +| NaN | nan | nan\_string\_infer\_type | DOUBLE | +| 'I am text' | text | x | x | + +* Data types can be configured as BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT. + +* long_string_infer_type is used to avoid precision loss caused by using integer_string_infer_type=FLOAT to infer num > 2^24. + +### Encoding Type + +| Data Type | iotdb-system.properties | Default | +|:---|:---------------------------|:---| +| BOOLEAN | default\_boolean\_encoding | RLE | +| INT32 | default\_int32\_encoding | RLE | +| INT64 | default\_int64\_encoding | RLE | +| FLOAT | default\_float\_encoding | GORILLA | +| DOUBLE | default\_double\_encoding | GORILLA | +| TEXT | default\_text\_encoding | PLAIN | + +* Encoding types can be configured as PLAIN, RLE, TS_2DIFF, GORILLA, DICTIONARY. + +* The corresponding relationship between data types and encoding types is detailed in [Encoding](../Basic-Concept/Encoding-and-Compression.md). \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Database.md b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Database.md new file mode 100644 index 00000000..42aaf388 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Database.md @@ -0,0 +1,227 @@ + + +# Database Management + +## Create Database + +According to the storage model we can set up the corresponding database. Two SQL statements are supported for creating databases, as follows: + +``` +IoTDB > create database root.ln +IoTDB > create database root.sgcc +``` + +We can thus create two databases using the above two SQL statements. + +It is worth noting that when the path itself or the parent/child layer of the path is already created as database, the path is then not allowed to be created as database. 
For example, it is not feasible to create `root.ln.wf01` as database when two databases `root.ln` and `root.sgcc` exist. The system gives the corresponding error prompt as shown below: + +``` +IoTDB> CREATE DATABASE root.ln.wf01 +Msg: 300: root.ln has already been created as database. +IoTDB> create database root.ln.wf01 +Msg: 300: root.ln has already been created as database. +``` +The LayerName of database can only be characters, numbers, underscores. If you want to set it to pure numbers or contain other characters, you need to enclose the database name with backticks (``). + +Besides, if deploy on Windows system, the LayerName is case-insensitive, which means it's not allowed to create databases `root.ln` and `root.LN` at the same time. + +## Show Databases + +After creating the database, we can use the [SHOW DATABASES](../Reference/SQL-Reference.md) statement and [SHOW DATABASES \](../Reference/SQL-Reference.md) to view the databases. The SQL statements are as follows: + +``` +IoTDB> SHOW DATABASES +IoTDB> SHOW DATABASES root.** +``` + +The result is as follows: + +``` ++-------------+----+-------------------------+-----------------------+-----------------------+ +|database| ttl|schema_replication_factor|data_replication_factor|time_partition_interval| ++-------------+----+-------------------------+-----------------------+-----------------------+ +| root.sgcc|null| 2| 2| 604800| +| root.ln|null| 2| 2| 604800| ++-------------+----+-------------------------+-----------------------+-----------------------+ +Total line number = 2 +It costs 0.060s +``` + +## Delete Database + +User can use the `DELETE DATABASE ` statement to delete all databases matching the pathPattern. Please note the data in the database will also be deleted. + +``` +IoTDB > DELETE DATABASE root.ln +IoTDB > DELETE DATABASE root.sgcc +// delete all data, all timeseries and all databases +IoTDB > DELETE DATABASE root.** +``` + +## Count Databases + +User can use the `COUNT DATABASE ` statement to count the number of databases. It is allowed to specify `PathPattern` to count the number of databases matching the `PathPattern`. + +SQL statement is as follows: + +``` +IoTDB> count databases +IoTDB> count databases root.* +IoTDB> count databases root.sgcc.* +IoTDB> count databases root.sgcc +``` + +The result is as follows: + +``` ++-------------+ +| database| ++-------------+ +| root.sgcc| +| root.turbine| +| root.ln| ++-------------+ +Total line number = 3 +It costs 0.003s + ++-------------+ +| database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.003s + ++-------------+ +| database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 0| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 1| ++-------------+ +Total line number = 1 +It costs 0.002s +``` + +### Setting up heterogeneous databases (Advanced operations) + +Under the premise of familiar with IoTDB metadata modeling, +users can set up heterogeneous databases in IoTDB to cope with different production needs. 
+ +Currently, the following database heterogeneous parameters are supported: + +| Parameter | Type | Description | +|---------------------------|---------|-----------------------------------------------| +| TTL | Long | TTL of the Database | +| SCHEMA_REPLICATION_FACTOR | Integer | The schema replication number of the Database | +| DATA_REPLICATION_FACTOR | Integer | The data replication number of the Database | +| SCHEMA_REGION_GROUP_NUM | Integer | The SchemaRegionGroup number of the Database | +| DATA_REGION_GROUP_NUM | Integer | The DataRegionGroup number of the Database | + +Note the following when configuring heterogeneous parameters: ++ TTL and TIME_PARTITION_INTERVAL must be positive integers. ++ SCHEMA_REPLICATION_FACTOR and DATA_REPLICATION_FACTOR must be smaller than or equal to the number of deployed DataNodes. ++ The function of SCHEMA_REGION_GROUP_NUM and DATA_REGION_GROUP_NUM are related to the parameter `schema_region_group_extension_policy` and `data_region_group_extension_policy` in iotdb-system.properties configuration file. Take DATA_REGION_GROUP_NUM as an example: +If `data_region_group_extension_policy=CUSTOM` is set, DATA_REGION_GROUP_NUM serves as the number of DataRegionGroups owned by the Database. +If `data_region_group_extension_policy=AUTO`, DATA_REGION_GROUP_NUM is used as the lower bound of the DataRegionGroup quota owned by the Database. That is, when the Database starts writing data, it will have at least this number of DataRegionGroups. + +Users can set any heterogeneous parameters when creating a Database, or adjust some heterogeneous parameters during a stand-alone/distributed IoTDB run. + +#### Set heterogeneous parameters when creating a Database + +The user can set any of the above heterogeneous parameters when creating a Database. The SQL statement is as follows: + +``` +CREATE DATABASE prefixPath (WITH databaseAttributeClause (COMMA? databaseAttributeClause)*)? +``` + +For example: +``` +CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +#### Adjust heterogeneous parameters at run time + +Users can adjust some heterogeneous parameters during the IoTDB runtime, as shown in the following SQL statement: + +``` +ALTER DATABASE prefixPath WITH databaseAttributeClause (COMMA? databaseAttributeClause)* +``` + +For example: +``` +ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +Note that only the following heterogeneous parameters can be adjusted at runtime: ++ SCHEMA_REGION_GROUP_NUM ++ DATA_REGION_GROUP_NUM + +#### Show heterogeneous databases + +The user can query the specific heterogeneous configuration of each Database, and the SQL statement is as follows: + +``` +SHOW DATABASES DETAILS prefixPath? 
+``` + +For example: + +``` +IoTDB> SHOW DATABASES DETAILS ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval|SchemaRegionGroupNum|MinSchemaRegionGroupNum|MaxSchemaRegionGroupNum|DataRegionGroupNum|MinDataRegionGroupNum|MaxDataRegionGroupNum| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|root.db1| null| 1| 3| 604800000| 0| 1| 1| 0| 2| 2| +|root.db2|86400000| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| +|root.db3| null| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +Total line number = 3 +It costs 0.058s +``` + +The query results in each column are as follows: ++ The name of the Database ++ The TTL of the Database ++ The schema replication number of the Database ++ The data replication number of the Database ++ The time partition interval of the Database ++ The current SchemaRegionGroup number of the Database ++ The required minimum SchemaRegionGroup number of the Database ++ The permitted maximum SchemaRegionGroup number of the Database ++ The current DataRegionGroup number of the Database ++ The required minimum DataRegionGroup number of the Database ++ The permitted maximum DataRegionGroup number of the Database \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Node.md b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Node.md new file mode 100644 index 00000000..b8860965 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Node.md @@ -0,0 +1,288 @@ + + +# Node Management +## Show Child Paths + +``` +SHOW CHILD PATHS pathPattern +``` + +Return all child paths and their node types of all the paths matching pathPattern. + +node types: ROOT -> DB INTERNAL -> DATABASE -> INTERNAL -> DEVICE -> TIMESERIES + + +Example: + +* return the child paths of root.ln:show child paths root.ln + +``` ++------------+----------+ +| child paths|node types| ++------------+----------+ +|root.ln.wf01| INTERNAL| +|root.ln.wf02| INTERNAL| ++------------+----------+ +Total line number = 2 +It costs 0.002s +``` + +> get all paths in form of root.xx.xx.xx:show child paths root.xx.xx + +## Show Child Nodes + +``` +SHOW CHILD NODES pathPattern +``` + +Return all child nodes of the pathPattern. + +Example: + +* return the child nodes of root:show child nodes root + +``` ++------------+ +| child nodes| ++------------+ +| ln| ++------------+ +``` + +* return the child nodes of root.ln:show child nodes root.ln + +``` ++------------+ +| child nodes| ++------------+ +| wf01| +| wf02| ++------------+ +``` + +## Count Nodes + +IoTDB is able to use `COUNT NODES LEVEL=` to count the number of nodes at + the given level in current Metadata Tree considering a given pattern. IoTDB will find paths that + match the pattern and counts distinct nodes at the specified level among the matched paths. + This could be used to query the number of devices with specified measurements. 
The usage are as + follows: + +``` +IoTDB > COUNT NODES root.** LEVEL=2 +IoTDB > COUNT NODES root.ln.** LEVEL=2 +IoTDB > COUNT NODES root.ln.wf01.** LEVEL=3 +IoTDB > COUNT NODES root.**.temperature LEVEL=3 +``` + +As for the above mentioned example and Metadata tree, you can get following results: + +``` ++------------+ +|count(nodes)| ++------------+ +| 4| ++------------+ +Total line number = 1 +It costs 0.003s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 1| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s +``` + +> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. + +## Show Devices + +* SHOW DEVICES pathPattern? (WITH DATABASE)? devicesWhereClause? limitClause? + +Similar to `Show Timeseries`, IoTDB also supports two ways of viewing devices: + +* `SHOW DEVICES` statement presents all devices' information, which is equal to `SHOW DEVICES root.**`. +* `SHOW DEVICES ` statement specifies the `PathPattern` and returns the devices information matching the pathPattern and under the given level. +* `WHERE` condition supports `DEVICE contains 'xxx'` to do a fuzzy query based on the device name. +* `WHERE` condition supports `TEMPLATE = 'xxx'`,`TEMPLATE != 'xxx'` to do a filter query based on the template name. +* `WHERE` condition supports `TEMPLATE is null`,`TEMPLATE is not null` to do a filter query based on whether the template is null (indicating it's inactive) or not null (indicating activation). + +SQL statement is as follows: + +``` +IoTDB> show devices +IoTDB> show devices root.ln.** +IoTDB> show devices root.ln.** where device contains 't' +IoTDB> show devices root.ln.** where template = 't1' +IoTDB> show devices root.ln.** where template is null +``` + +You can get results below: + +``` ++-------------------+---------+---------+ +| devices|isAligned| Template| ++-------------------+---------+---------+ +| root.ln.wf01.wt01| false| t1| +| root.ln.wf02.wt02| false| null| +|root.sgcc.wf03.wt01| false| null| +| root.turbine.d1| false| null| ++-------------------+---------+---------+ +Total line number = 4 +It costs 0.002s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 2 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 2 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| ++-----------------+---------+---------+ +Total line number = 1 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 1 +It costs 0.001s +``` + +`isAligned` indicates whether the timeseries under the device are aligned, `Template` displays the name of the template activated on the device, with "null" indicating that no template has been 
activated. + +To view devices' information with database, we can use `SHOW DEVICES WITH DATABASE` statement. + +* `SHOW DEVICES WITH DATABASE` statement presents all devices' information with their database. +* `SHOW DEVICES WITH DATABASE` statement specifies the `PathPattern` and returns the +devices' information under the given level with their database information. + +SQL statement is as follows: + +``` +IoTDB> show devices with database +IoTDB> show devices root.ln.** with database +``` + +You can get results below: + +``` ++-------------------+-------------+---------+---------+ +| devices| database|isAligned| Template| ++-------------------+-------------+---------+---------+ +| root.ln.wf01.wt01| root.ln| false| t1| +| root.ln.wf02.wt02| root.ln| false| null| +|root.sgcc.wf03.wt01| root.sgcc| false| null| +| root.turbine.d1| root.turbine| false| null| ++-------------------+-------------+---------+---------+ +Total line number = 4 +It costs 0.003s + ++-----------------+-------------+---------+---------+ +| devices| database|isAligned| Template| ++-----------------+-------------+---------+---------+ +|root.ln.wf01.wt01| root.ln| false| t1| +|root.ln.wf02.wt02| root.ln| false| null| ++-----------------+-------------+---------+---------+ +Total line number = 2 +It costs 0.001s +``` + +## Count Devices + +* COUNT DEVICES / + +The above statement is used to count the number of devices. At the same time, it is allowed to specify `PathPattern` to count the number of devices matching the `PathPattern`. + +SQL statement is as follows: + +``` +IoTDB> show devices +IoTDB> count devices +IoTDB> count devices root.ln.** +``` + +You can get results below: + +``` ++-------------------+---------+---------+ +| devices|isAligned| Template| ++-------------------+---------+---------+ +|root.sgcc.wf03.wt03| false| null| +| root.turbine.d1| false| null| +| root.ln.wf02.wt02| false| null| +| root.ln.wf01.wt01| false| t1| ++-------------------+---------+---------+ +Total line number = 4 +It costs 0.024s + ++--------------+ +|count(devices)| ++--------------+ +| 4| ++--------------+ +Total line number = 1 +It costs 0.004s + ++--------------+ +|count(devices)| ++--------------+ +| 2| ++--------------+ +Total line number = 1 +It costs 0.004s +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Template.md b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Template.md new file mode 100644 index 00000000..c9e5780d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Template.md @@ -0,0 +1,241 @@ + + +# Schema Template + +IoTDB supports the schema template function, enabling different entities of the same type to share metadata, reduce the memory usage of metadata, and simplify the management of numerous entities and measurements. + +Note: The `schema` keyword in the following statements can be omitted. + +## Create Schema Template + +The SQL syntax for creating a schema template is as follows: + +```sql +CREATE SCHEMA TEMPLATE ALIGNED? '(' [',' ]+ ')' +``` + +**Example 1:** Create a template containing two non-aligned timeseires + +```shell +IoTDB> create schema template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +**Example 2:** Create a template containing a group of aligned timeseires + +```shell +IoTDB> create schema template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) +``` + +The` lat` and `lon` measurements are aligned. 
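The attribute clauses can also be combined in a single statement; for example, a sketch of an aligned template that additionally pins the compression of each measurement (the template name `t3` and its measurements are purely illustrative):

```shell
IoTDB> create schema template t3 aligned (speed FLOAT encoding=GORILLA compression=SNAPPY, status BOOLEAN encoding=PLAIN compression=SNAPPY)
```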
+ +## Set Schema Template + +After a schema template is created, it should be set to specific path before creating related timeseries or insert data. + +**It should be ensured that the related database has been set before setting template.** + +**It is recommended to set schema template to database path. It is not suggested to set schema template to some path above database** + +**It is forbidden to create timeseries under a path setting schema template. Schema template shall not be set on a prefix path of an existing timeseries.** + +The SQL Statement for setting schema template is as follow: + +```shell +IoTDB> set schema template t1 to root.sg1.d1 +``` + +## Activate Schema Template + +After setting the schema template, with the system enabled to auto create schema, you can insert data into the timeseries. For example, suppose there's a database root.sg1 and t1 has been set to root.sg1.d1, then timeseries like root.sg1.d1.temperature and root.sg1.d1.status are available and data points can be inserted. + + +**Attention**: Before inserting data or the system not enabled to auto create schema, timeseries defined by the schema template will not be created. You can use the following SQL statement to create the timeseries or activate the schema template, act before inserting data: + +```shell +IoTDB> create timeseries using schema template on root.sg1.d1 +``` + +**Example:** Execute the following statement +```shell +IoTDB> set schema template t1 to root.sg1.d1 +IoTDB> set schema template t2 to root.sg1.d2 +IoTDB> create timeseries using schema template on root.sg1.d1 +IoTDB> create timeseries using schema template on root.sg1.d2 +``` + +Show the time series: +```sql +show timeseries root.sg1.** +```` + +```shell ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +|root.sg1.d1.temperature| null| root.sg1| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.sg1.d1.status| null| root.sg1| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +| root.sg1.d2.lon| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| +| root.sg1.d2.lat| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +``` + +Show the devices: +```sql +show devices root.sg1.** +```` + +```shell ++---------------+---------+---------+ +| devices|isAligned| Template| ++---------------+---------+---------+ +| root.sg1.d1| false| null| +| root.sg1.d2| true| null| ++---------------+---------+---------+ +```` + +## Show Schema Template + +- Show all schema templates + +The SQL statement looks like this: + +```shell +IoTDB> show schema templates +``` + +The execution result is as follows: +```shell ++-------------+ +|template name| ++-------------+ +| t2| +| t1| ++-------------+ +``` + +- Show nodes under in schema template + +The SQL statement looks like this: + +```shell +IoTDB> show nodes in schema template t1 +``` + +The execution result is as follows: +```shell ++-----------+--------+--------+-----------+ +|child nodes|dataType|encoding|compression| ++-----------+--------+--------+-----------+ +|temperature| FLOAT| RLE| SNAPPY| +| status| BOOLEAN| PLAIN| SNAPPY| 
++-----------+--------+--------+-----------+ +``` + +- Show the path prefix where a schema template is set + +```shell +IoTDB> show paths set schema template t1 +``` + +The execution result is as follows: +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +- Show the path prefix where a schema template is used (i.e. the time series has been created) + +```shell +IoTDB> show paths using schema template t1 +``` + +The execution result is as follows: +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +## Deactivate Schema Template + +To delete a group of timeseries represented by schema template, namely deactivate the schema template, use the following SQL statement: + +```shell +IoTDB> delete timeseries of schema template t1 from root.sg1.d1 +``` + +or + +```shell +IoTDB> deactivate schema template t1 from root.sg1.d1 +``` + +The deactivation supports batch process. + +```shell +IoTDB> delete timeseries of schema template t1 from root.sg1.*, root.sg2.* +``` + +or + +```shell +IoTDB> deactivate schema template t1 from root.sg1.*, root.sg2.* +``` + +If the template name is not provided in sql, all template activation on paths matched by given path pattern will be removed. + +## Unset Schema Template + +The SQL Statement for unsetting schema template is as follow: + +```shell +IoTDB> unset schema template t1 from root.sg1.d1 +``` + +**Attention**: It should be guaranteed that none of the timeseries represented by the target schema template exists, before unset it. It can be achieved by deactivation operation. + +## Drop Schema Template + +The SQL Statement for dropping schema template is as follow: + +```shell +IoTDB> drop schema template t1 +``` + +**Attention**: Dropping an already set template is not supported. + +## Alter Schema Template + +In a scenario where measurements need to be added, you can modify the schema template to add measurements to all devices using the schema template. + +The SQL Statement for altering schema template is as follow: + +```shell +IoTDB> alter schema template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY) +``` + +**When executing data insertion to devices with schema template set on related prefix path and there are measurements not present in this schema template, the measurements will be auto added to this schema template.** \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Timeseries.md b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Timeseries.md new file mode 100644 index 00000000..12c0b03d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Timeseries.md @@ -0,0 +1,438 @@ + + +# Timeseries Management + +## Create Timeseries + +According to the storage model selected before, we can create corresponding timeseries in the two databases respectively. 
The SQL statements for creating timeseries are as follows: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE +``` + +From v0.13, you can use a simplified version of the SQL statements to create timeseries: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature FLOAT encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE +``` + +Notice that when in the CREATE TIMESERIES statement the encoding method conflicts with the data type, the system gives the corresponding error prompt as shown below: + +``` +IoTDB > create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF +error: encoding TS_2DIFF does not support BOOLEAN +``` + +Please refer to [Encoding](../Basic-Concept/Encoding-and-Compression.md) for correspondence between data type and encoding. + +## Create Aligned Timeseries + +The SQL statement for creating a group of timeseries are as follows: + +``` +IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) +``` + +You can set different datatype, encoding, and compression for the timeseries in a group of aligned timeseries + +It is also supported to set an alias, tag, and attribute for aligned timeseries. + +## Delete Timeseries + +To delete the timeseries we created before, we are able to use `(DELETE | DROP) TimeSeries ` statement. + +The usage are as follows: + +``` +IoTDB> delete timeseries root.ln.wf01.wt01.status +IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware +IoTDB> delete timeseries root.ln.wf02.* +IoTDB> drop timeseries root.ln.wf02.* +``` + +## Show Timeseries + +* SHOW LATEST? TIMESERIES pathPattern? whereClause? limitClause? + + There are four optional clauses added in SHOW TIMESERIES, return information of time series + + +Timeseries information includes: timeseries path, alias of measurement, database it belongs to, data type, encoding type, compression type, tags and attributes. + +Examples: + +* SHOW TIMESERIES + + presents all timeseries information in JSON form + +* SHOW TIMESERIES <`PathPattern`> + + returns all timeseries information matching the given <`PathPattern`>. 
SQL statements are as follows: + +``` +IoTDB> show timeseries root.** +IoTDB> show timeseries root.ln.** +``` + +The results are shown below respectively: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.016s + ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +Total line number = 4 +It costs 0.004s +``` + +* SHOW TIMESERIES LIMIT INT OFFSET INT + + returns all the timeseries information start from the offset and limit the number of series returned. For example, + +``` +show timeseries root.ln.** limit 10 offset 10 +``` + +* SHOW TIMESERIES WHERE TIMESERIES contains 'containStr' + + The query result set is filtered by string fuzzy matching based on the names of the timeseries. 
For example: + +``` +show timeseries root.ln.** where timeseries contains 'wf01.wt' +``` + +The result is shown below: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 2 +It costs 0.016s +``` + +* SHOW TIMESERIES WHERE DataType=type + + The query result set is filtered by data type. For example: + +``` +show timeseries root.ln.** where dataType=FLOAT +``` + +The result is shown below: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 3 +It costs 0.016s + +``` + + +* SHOW LATEST TIMESERIES + + all the returned timeseries information should be sorted in descending order of the last timestamp of timeseries + + +It is worth noting that when the queried path does not exist, the system will return no timeseries. + + +## Count Timeseries + +IoTDB is able to use `COUNT TIMESERIES ` to count the number of timeseries matching the path. SQL statements are as follows: +* `WHERE` condition could be used to fuzzy match a time series name with the following syntax: `COUNT TIMESERIES WHERE TIMESERIES contains 'containStr'`. +* `WHERE` condition could be used to filter result by data type with the syntax: `COUNT TIMESERIES WHERE DataType='`. +* `WHERE` condition could be used to filter result by tags with the syntax: `COUNT TIMESERIES WHERE TAGS(key)='value'` or `COUNT TIMESERIES WHERE TAGS(key) contains 'value'`. +* `LEVEL` could be defined to show count the number of timeseries of each node at the given level in current Metadata Tree. This could be used to query the number of sensors under each device. 
The grammar is: `COUNT TIMESERIES GROUP BY LEVEL=`. + + +``` +IoTDB > COUNT TIMESERIES root.** +IoTDB > COUNT TIMESERIES root.ln.** +IoTDB > COUNT TIMESERIES root.ln.*.*.status +IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' +IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 +``` + +For example, if there are several timeseries (use `show timeseries` to show all timeseries): + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| {"unit":"c"}| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| {"description":"test1"}| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.004s +``` + +Then the Metadata Tree will be as below: + +
+As can be seen, `root` is considered as `LEVEL=0`. So when you enter statements such as: + +``` +IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 +``` + +You will get following results: + +``` ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +| root.sgcc| 2| +|root.turbine| 1| +| root.ln| 4| ++------------+-----------------+ +Total line number = 3 +It costs 0.002s + ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf02| 2| +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 2 +It costs 0.002s + ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 1 +It costs 0.002s +``` + +> Note: The path of timeseries is just a filter condition, which has no relationship with the definition of level. + +## Tag and Attribute Management + +We can also add an alias, extra tag and attribute information while creating one timeseries. + +The differences between tag and attribute are: + +* Tag could be used to query the path of timeseries, we will maintain an inverted index in memory on the tag: Tag -> Timeseries +* Attribute could only be queried by timeseries path : Timeseries -> Attribute + +The SQL statements for creating timeseries with extra tag and attribute information are extended as follows: + +``` +create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) +``` + +The `temprature` in the brackets is an alias for the sensor `s1`. So we can use `temprature` to replace `s1` anywhere. + +> IoTDB also supports [using AS function](../Reference/SQL-Reference.md#data-management-statement) to set alias. The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. + +> Notice that the size of the extra tag and attribute information shouldn't exceed the `tag_attribute_total_size`. + +We can update the tag information after creating it as following: + +* Rename the tag/attribute key +``` +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` +* Reset the tag/attribute value +``` +ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` +* Delete the existing tag/attribute +``` +ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +``` +* Add new tags +``` +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` +* Add new attributes +``` +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` +* Upsert alias, tags and attributes +> add alias or a new key-value if the alias or key doesn't exist, otherwise, update the old one with new value. +``` +ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag3=v3, tag4=v4) ATTRIBUTES(attr3=v3, attr4=v4) +``` +* Show timeseries using tags. Use TAGS(tagKey) to identify the tags used as filter key +``` +SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +``` + +returns all the timeseries information that satisfy the where condition and match the pathPattern. 
SQL statements are as follows: + +``` +ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c +ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 +show timeseries root.ln.** where TAGS(unit)='c' +show timeseries root.ln.** where TAGS(description) contains 'test1' +``` + +The results are shown below respectly: + +``` ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|{"unit":"c"}| null| null| null| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.005s + ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|{"description":"test1"}| null| null| null| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.004s +``` + +- count timeseries using tags + +``` +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause GROUP BY LEVEL= +``` + +returns all the number of timeseries that satisfy the where condition and match the pathPattern. SQL statements are as follows: + +``` +count timeseries +count timeseries root.** where TAGS(unit)='c' +count timeseries root.** where TAGS(unit)='c' group by level = 2 +``` + +The results are shown below respectly : + +``` +IoTDB> count timeseries ++-----------------+ +|count(timeseries)| ++-----------------+ +| 6| ++-----------------+ +Total line number = 1 +It costs 0.019s +IoTDB> count timeseries root.** where TAGS(unit)='c' ++-----------------+ +|count(timeseries)| ++-----------------+ +| 2| ++-----------------+ +Total line number = 1 +It costs 0.020s +IoTDB> count timeseries root.** where TAGS(unit)='c' group by level = 2 ++--------------+-----------------+ +| column|count(timeseries)| ++--------------+-----------------+ +| root.ln.wf02| 2| +| root.ln.wf01| 0| +|root.sgcc.wf03| 0| ++--------------+-----------------+ +Total line number = 3 +It costs 0.011s +``` + +> Notice that, we only support one condition in the where clause. Either it's an equal filter or it is an `contains` filter. In both case, the property in the where condition must be a tag. 
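+
+Since the `SHOW TIMESERIES` grammar shown earlier allows an optional limit clause after the where clause, the tag filter can be combined with paging when a tag matches many series. The following sketch is illustrative only; the limit and offset values are arbitrary:
+
+```
+show timeseries root.ln.** where TAGS(unit)='c' limit 10 offset 0
+```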
+ +create aligned timeseries + +``` +create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) +``` + +The execution result is as follows: + +``` +IoTDB> show timeseries ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| +|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +Support query: + +``` +IoTDB> show timeseries where TAGS(tag1)='v1' ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +The above operations are supported for timeseries tag, attribute updates, etc. diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Aggregation.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Aggregation.md new file mode 100644 index 00000000..bb7c5a23 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Aggregation.md @@ -0,0 +1,488 @@ + + +# Aggregate Functions + +Aggregate functions are many-to-one functions. They perform aggregate calculations on a set of values, resulting in a single aggregated result. + +All aggregate functions except `COUNT()`, `COUNT_IF()` ignore null values and return null when there are no input rows or all values are null. For example, `SUM()` returns null instead of zero, and `AVG()` does not include null values in the count. + +The aggregate functions supported by IoTDB are as follows: + +| Function Name | Description | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | +| ------------- | ------------------------------------------------------------ |-----------------------------------------------------| ------------------------------------------------------------ | ----------------------------------- | +| SUM | Summation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| COUNT | Counts the number of data points. | All data types | / | INT | +| AVG | Average. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| EXTREME | Finds the value with the largest absolute value. Returns a positive value if the maximum absolute value of positive and negative values is equal. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| MAX_VALUE | Find the maximum value. 
| INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| MIN_VALUE | Find the minimum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| FIRST_VALUE | Find the value with the smallest timestamp. | All data types | / | Consistent with input data type | +| LAST_VALUE | Find the value with the largest timestamp. | All data types | / | Consistent with input data type | +| MAX_TIME | Find the maximum timestamp. | All data Types | / | Timestamp | +| MIN_TIME | Find the minimum timestamp. | All data Types | / | Timestamp | +| COUNT_IF | Find the number of data points that continuously meet a given condition and the number of data points that meet the condition (represented by keep) meet the specified threshold. | BOOLEAN | `[keep >=/>/=/!=/= threshold` if `threshold` is used alone, type of `threshold` is `INT64` `ignoreNull`:Optional, default value is `true`;If the value is `true`, null values are ignored, it means that if there is a null value in the middle, the value is ignored without interrupting the continuity. If the value is `true`, null values are not ignored, it means that if there are null values in the middle, continuity will be broken | INT64 | +| TIME_DURATION | Find the difference between the timestamp of the largest non-null value and the timestamp of the smallest non-null value in a column | All data Types | / | INT64 | +| MODE | Find the mode. Note: 1.Having too many different values in the input series risks a memory exception; 2.If all the elements have the same number of occurrences, that is no Mode, return the value with earliest time; 3.If there are many Modes, return the Mode with earliest time. | All data Types | / | Consistent with the input data type | +| COUNT_TIME | The number of timestamps in the query data set. When used with `align by device`, the result is the number of timestamps in the data set per device. | All data Types, the input parameter can only be `*` | / | INT64 | +| MAX_BY | MAX_BY(x, y) returns the value of x corresponding to the maximum value of the input y. MAX_BY(time, x) returns the timestamp when x is at its maximum value. | The first input x can be of any type, while the second input y must be of type INT32, INT64, FLOAT, or DOUBLE. | / | Consistent with the data type of the first input x. | +| MIN_BY | MIN_BY(x, y) returns the value of x corresponding to the minimum value of the input y. MIN_BY(time, x) returns the timestamp when x is at its minimum value. | The first input x can be of any type, while the second input y must be of type INT32, INT64, FLOAT, or DOUBLE. | / | Consistent with the data type of the first input x. 
| + + +## COUNT + +### example + +```sql +select count(status) from root.ln.wf01.wt01; +``` +Result: + +``` ++-------------------------------+ +|count(root.ln.wf01.wt01.status)| ++-------------------------------+ +| 10080| ++-------------------------------+ +Total line number = 1 +It costs 0.016s +``` + +## COUNT_IF + +### Grammar +```sql +count_if(predicate, [keep >=/>/=/!=/Note: count_if is not supported to use with SlidingWindow in group by time now + +### example + +#### raw data + +``` ++-----------------------------+-------------+-------------+ +| Time|root.db.d1.s1|root.db.d1.s2| ++-----------------------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 0| 0| +|1970-01-01T08:00:00.002+08:00| null| 0| +|1970-01-01T08:00:00.003+08:00| 0| 0| +|1970-01-01T08:00:00.004+08:00| 0| 0| +|1970-01-01T08:00:00.005+08:00| 1| 0| +|1970-01-01T08:00:00.006+08:00| 1| 0| +|1970-01-01T08:00:00.007+08:00| 1| 0| +|1970-01-01T08:00:00.008+08:00| 0| 0| +|1970-01-01T08:00:00.009+08:00| 0| 0| +|1970-01-01T08:00:00.010+08:00| 0| 0| ++-----------------------------+-------------+-------------+ +``` + +#### Not use `ignoreNull` attribute (Ignore Null) + +SQL: +```sql +select count_if(s1=0 & s2=0, 3), count_if(s1=1 & s2=0, 3) from root.db.d1 +``` + +Result: +``` ++--------------------------------------------------+--------------------------------------------------+ +|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3)|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3)| ++--------------------------------------------------+--------------------------------------------------+ +| 2| 1| ++--------------------------------------------------+-------------------------------------------------- +``` + +#### Use `ignoreNull` attribute + +SQL: +```sql +select count_if(s1=0 & s2=0, 3, 'ignoreNull'='false'), count_if(s1=1 & s2=0, 3, 'ignoreNull'='false') from root.db.d1 +``` + +Result: +``` ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")| ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +| 1| 1| ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +``` + +## TIME_DURATION +### Grammar +```sql + time_duration(Path) +``` +### Example +#### raw data +```sql ++----------+-------------+ +| Time|root.db.d1.s1| ++----------+-------------+ +| 1| 70| +| 3| 10| +| 4| 303| +| 6| 110| +| 7| 302| +| 8| 110| +| 9| 60| +| 10| 70| +|1677570934| 30| ++----------+-------------+ +``` +#### Insert sql +```sql +"CREATE DATABASE root.db", +"CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN tags(city=Beijing)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1, 2, 10, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(2, null, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(3, 10, 0, null)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(4, 303, 30, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(5, null, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(6, 110, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(7, 302, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(8, 110, null, true)", 
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(9, 60, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(10,70, 20, null)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1677570934, 30, 0, true)", +``` + +SQL: +```sql +select time_duration(s1) from root.db.d1 +``` + +Result: +``` ++----------------------------+ +|time_duration(root.db.d1.s1)| ++----------------------------+ +| 1677570933| ++----------------------------+ +``` +> Note: Returns 0 if there is only one data point, or null if the data point is null. + +## COUNT_TIME +### Grammar +```sql + count_time(*) +``` +### Example +#### raw data +``` ++----------+-------------+-------------+-------------+-------------+ +| Time|root.db.d1.s1|root.db.d1.s2|root.db.d2.s1|root.db.d2.s2| ++----------+-------------+-------------+-------------+-------------+ +| 0| 0| null| null| 0| +| 1| null| 1| 1| null| +| 2| null| 2| 2| null| +| 4| 4| null| null| 4| +| 5| 5| 5| 5| 5| +| 7| null| 7| 7| null| +| 8| 8| 8| 8| 8| +| 9| null| 9| null| null| ++----------+-------------+-------------+-------------+-------------+ +``` +#### Insert sql +```sql +CREATE DATABASE root.db; +CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d1.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d2.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d2.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; +INSERT INTO root.db.d1(time, s1) VALUES(0, 0), (4,4), (5,5), (8,8); +INSERT INTO root.db.d1(time, s2) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8), (9,9); +INSERT INTO root.db.d2(time, s1) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8); +INSERT INTO root.db.d2(time, s2) VALUES(0, 0), (4,4), (5,5), (8,8); +``` + +Query-Example - 1: +```sql +select count_time(*) from root.db.** +``` + +Result +``` ++-------------+ +|count_time(*)| ++-------------+ +| 8| ++-------------+ +``` + +Query-Example - 2: +```sql +select count_time(*) from root.db.d1, root.db.d2 +``` + +Result +``` ++-------------+ +|count_time(*)| ++-------------+ +| 8| ++-------------+ +``` + +Query-Example - 3: +```sql +select count_time(*) from root.db.** group by([0, 10), 2ms) +``` + +Result +``` ++-----------------------------+-------------+ +| Time|count_time(*)| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 1| +|1970-01-01T08:00:00.008+08:00| 2| ++-----------------------------+-------------+ +``` + +Query-Example - 4: +```sql +select count_time(*) from root.db.** group by([0, 10), 2ms) align by device +``` + +Result +``` ++-----------------------------+----------+-------------+ +| Time| Device|count_time(*)| ++-----------------------------+----------+-------------+ +|1970-01-01T08:00:00.000+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.002+08:00|root.db.d1| 1| +|1970-01-01T08:00:00.004+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.006+08:00|root.db.d1| 1| +|1970-01-01T08:00:00.008+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.000+08:00|root.db.d2| 2| +|1970-01-01T08:00:00.002+08:00|root.db.d2| 1| +|1970-01-01T08:00:00.004+08:00|root.db.d2| 2| +|1970-01-01T08:00:00.006+08:00|root.db.d2| 1| +|1970-01-01T08:00:00.008+08:00|root.db.d2| 1| ++-----------------------------+----------+-------------+ +``` + +> Note: +> 1. The parameter in count_time can only be *. +> 2. Count_time aggregation cannot be used with other aggregation functions. +> 3. 
Count_time aggregation used with having statement is not supported, and count_time aggregation can not appear in the having statement. +> 4. Count_time does not support use with group by level, group by tag. + + +### MAX_BY +#### Function Definition +max_by(x, y): Returns the value of x at the timestamp when y is at its maximum. +- max_by must have two input parameters x and y. +- The first input x can be the keyword time, with max_by(time, x) returning the timestamp when x is at its maximum value. +- If x is null at the timestamp corresponding to the maximum value of y, null is returned. +- If y reaches its maximum value at multiple timestamps, the x value corresponding to the smallest timestamp among those maximum values is returned. +- Consistent with IoTDB max_value, only INT32, INT64, FLOAT, DOUBLE are supported as inputs for y, while all six types are supported as inputs for x. +- Direct numerical values are not allowed as inputs for either x or y. + +#### Grammar +```sql +select max_by(x, y) from root.sg +select max_by(time, x) from root.sg +``` + +#### Examples + +##### Input Data +```sql +IoTDB> select * from root.test ++-----------------------------+-----------+-----------+ +| Time|root.test.a|root.test.b| ++-----------------------------+-----------+-----------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 10.0| +|1970-01-01T08:00:00.002+08:00| 2.0| 10.0| +|1970-01-01T08:00:00.003+08:00| 3.0| 3.0| +|1970-01-01T08:00:00.004+08:00| 10.0| 10.0| +|1970-01-01T08:00:00.005+08:00| 10.0| 12.0| +|1970-01-01T08:00:00.006+08:00| 6.0| 6.0| ++-----------------------------+-----------+-----------+ +``` +##### Query Example +Querying the timestamp corresponding to the maximum value: +```sql +IoTDB> select max_by(time, a), max_value(a) from root.test ++-------------------------+------------------------+ +|max_by(Time, root.test.a)| max_value(root.test.a)| ++-------------------------+------------------------+ +| 4| 10.0| ++-------------------------+------------------------+ +``` + +Finding the value of b when a is at its maximum: +```sql +IoTDB> select max_by(b, a) from root.test ++--------------------------------+ +|max_by(root.test.b, root.test.a)| ++--------------------------------+ +| 10.0| ++--------------------------------+ +``` + +Using with expressions: +```sql +IoTDB> select max_by(b + 1, a * 2) from root.test ++----------------------------------------+ +|max_by(root.test.b + 1, root.test.a * 2)| ++----------------------------------------+ +| 11.0| ++----------------------------------------+ +``` + +Using with group by clause: +```sql +IoTDB> select max_by(b, a) from root.test group by ([0,7),4ms) ++-----------------------------+--------------------------------+ +| Time|max_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 3.0| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +Using with having clause: +```sql +IoTDB> select max_by(b, a) from root.test group by ([0,7),4ms) having max_by(b, a) > 4.0 ++-----------------------------+--------------------------------+ +| Time|max_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` +Using with order by clause: +```sql +IoTDB> select max_by(b, a) from root.test group by ([0,7),4ms) order by time 
desc ++-----------------------------+--------------------------------+ +| Time|max_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 3.0| ++-----------------------------+--------------------------------+ +``` + +### MIN_BY +#### Function Definition +min_by(x, y): Returns the value of x at the timestamp when y is at its minimum. +- min_by must have two input parameters x and y. +- The first input x can be the keyword time, with min_by(time, x) returning the timestamp when x is at its minimum value. +- If x is null at the timestamp corresponding to the minimum value of y, null is returned. +- If y reaches its minimum value at multiple timestamps, the x value corresponding to the smallest timestamp among those minimum values is returned. +- Consistent with IoTDB min_value, only INT32, INT64, FLOAT, DOUBLE are supported as inputs for y, while all six types are supported as inputs for x. +- Direct numerical values are not allowed as inputs for either x or y. + +#### Grammar +```sql +select min_by(x, y) from root.sg +select min_by(time, x) from root.sg +``` + +#### Examples + +##### Input Data +```sql +IoTDB> select * from root.test ++-----------------------------+-----------+-----------+ +| Time|root.test.a|root.test.b| ++-----------------------------+-----------+-----------+ +|1970-01-01T08:00:00.001+08:00| 4.0| 10.0| +|1970-01-01T08:00:00.002+08:00| 3.0| 10.0| +|1970-01-01T08:00:00.003+08:00| 2.0| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 10.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 12.0| +|1970-01-01T08:00:00.006+08:00| 6.0| 6.0| ++-----------------------------+-----------+-----------+ +``` +##### Query Example +Querying the timestamp corresponding to the minimum value: +```sql +IoTDB> select min_by(time, a), min_value(a) from root.test ++-------------------------+------------------------+ +|min_by(Time, root.test.a)| min_value(root.test.a)| ++-------------------------+------------------------+ +| 4| 1.0| ++-------------------------+------------------------+ +``` + +Finding the value of b when a is at its minimum: +```sql +IoTDB> select min_by(b, a) from root.test ++--------------------------------+ +|min_by(root.test.b, root.test.a)| ++--------------------------------+ +| 10.0| ++--------------------------------+ +``` + +Using with expressions: +```sql +IoTDB> select min_by(b + 1, a * 2) from root.test ++----------------------------------------+ +|min_by(root.test.b + 1, root.test.a * 2)| ++----------------------------------------+ +| 11.0| ++----------------------------------------+ +``` + +Using with group by clause: +```sql +IoTDB> select min_by(b, a) from root.test group by ([0,7),4ms) ++-----------------------------+--------------------------------+ +| Time|min_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 3.0| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +Using with having clause: +```sql +IoTDB> select min_by(b, a) from root.test group by ([0,7),4ms) having max_by(b, a) > 4.0 ++-----------------------------+--------------------------------+ +| Time|min_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| 
++-----------------------------+--------------------------------+ +``` +Using with order by clause: +```sql +IoTDB> select min_by(b, a) from root.test group by ([0,7),4ms) order by time desc ++-----------------------------+--------------------------------+ +| Time|min_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 3.0| ++-----------------------------+--------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Anomaly-Detection.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Anomaly-Detection.md new file mode 100644 index 00000000..c8490f02 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Anomaly-Detection.md @@ -0,0 +1,824 @@ + + +# Anomaly Detection + +## IQR + +### Usage + +This function is used to detect anomalies based on IQR. Points distributing beyond 1.5 times IQR are selected. + +**Name:** IQR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `method`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide upper and lower quantiles. The default method is "batch". ++ `q1`: The lower quantile when method is set to "stream". ++ `q3`: The upper quantile when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note:** $IQR=Q_3-Q_1$ + +### Examples + +#### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select iqr(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+-----------------+ +| Time|iqr(root.test.s1)| ++-----------------------------+-----------------+ +|1970-01-01T08:00:01.700+08:00| 10.0| ++-----------------------------+-----------------+ +``` + +## KSigma + +### Usage + +This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm. +Within a sliding window, the input value with a deviation of more than k times the standard deviation from the average will be output as anomaly. + +**Name:** KSIGMA + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `k`: How many times to multiply on standard deviation to define anomaly, the default value is 3. ++ `window`: The window size of Dynamic K-Sigma Algorithm, the default value is 10000. + +**Output Series:** Output a single series. 
The type is the same as the input series.
+
+**Note:** Only when `k` is larger than 0, the anomaly detection will be performed. Otherwise, nothing will be output.
+
+### Examples
+
+#### Assigning k
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d1.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00|            0.0|
+|2020-01-01T00:00:03.000+08:00|           50.0|
+|2020-01-01T00:00:04.000+08:00|          100.0|
+|2020-01-01T00:00:06.000+08:00|          150.0|
+|2020-01-01T00:00:08.000+08:00|          200.0|
+|2020-01-01T00:00:10.000+08:00|          200.0|
+|2020-01-01T00:00:14.000+08:00|          200.0|
+|2020-01-01T00:00:15.000+08:00|          200.0|
+|2020-01-01T00:00:16.000+08:00|          200.0|
+|2020-01-01T00:00:18.000+08:00|          200.0|
+|2020-01-01T00:00:20.000+08:00|          150.0|
+|2020-01-01T00:00:22.000+08:00|          100.0|
+|2020-01-01T00:00:26.000+08:00|           50.0|
+|2020-01-01T00:00:28.000+08:00|            0.0|
+|2020-01-01T00:00:30.000+08:00|            NaN|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30
+```
+
+Output series:
+
+```
++-----------------------------+---------------------------------+
+|Time                         |ksigma(root.test.d1.s1,"k"="1.0")|
++-----------------------------+---------------------------------+
+|2020-01-01T00:00:02.000+08:00|                              0.0|
+|2020-01-01T00:00:03.000+08:00|                             50.0|
+|2020-01-01T00:00:26.000+08:00|                             50.0|
+|2020-01-01T00:00:28.000+08:00|                              0.0|
++-----------------------------+---------------------------------+
+```
+
+## LOF
+
+### Usage
+
+This function is used to detect density anomalies in a time series. According to the k-th distance calculation parameter and the local outlier factor (lof) threshold, the function judges whether a set of input values is a density anomaly, and a bool mark of anomaly values will be output.
+
+**Name:** LOF
+
+**Input Series:** Multiple input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
++ `method`: the detection method to use. The default value is "default", which is used when the input data has multiple dimensions. The alternative is "series", in which a single input series is transformed into a high-dimensional one.
++ `k`: use the k-th distance to calculate lof. Default value is 3.
++ `window`: size of the window used to split the original data points. Default value is 10000.
++ `windowsize`: the dimension that the series is transformed into when `method` is "series". The default value is 5.
+
+**Output Series:** Output a single series. The type is DOUBLE.
+
+**Note:** Incomplete rows will be ignored. They are neither calculated nor marked as anomalies.
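+
+For reference, a sketch of overriding the `k` and `window` parameters described above. The attribute syntax follows the other functions in this chapter, and the values used here are purely illustrative:
+
+```sql
+-- illustrative values: use the 5th nearest distance and split the data into windows of 2000 points
+select lof(s1, s2, "k"="5", "window"="2000") from root.test.d1
+```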
+ +### Examples + +#### Using default parameters + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +#### Diagnosing 1d timeseries + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| +|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 
3.77777777777778| ++-----------------------------+--------------------+ +``` + +## MissDetect + +### Usage + +This function is used to detect missing anomalies. +In some datasets, missing values are filled by linear interpolation. +Thus, there are several long perfect linear segments. +By discovering these perfect linear segments, +missing anomalies are detected. + +**Name:** MISSDETECT + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + +`error`: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10. + +**Output Series:** Output a single series. The type is BOOLEAN. Each data point which is miss anomaly will be labeled as true. + +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 0.0| +|2021-07-01T12:00:01.000+08:00| 1.0| +|2021-07-01T12:00:02.000+08:00| 0.0| +|2021-07-01T12:00:03.000+08:00| 1.0| +|2021-07-01T12:00:04.000+08:00| 0.0| +|2021-07-01T12:00:05.000+08:00| 0.0| +|2021-07-01T12:00:06.000+08:00| 0.0| +|2021-07-01T12:00:07.000+08:00| 0.0| +|2021-07-01T12:00:08.000+08:00| 0.0| +|2021-07-01T12:00:09.000+08:00| 0.0| +|2021-07-01T12:00:10.000+08:00| 0.0| +|2021-07-01T12:00:11.000+08:00| 0.0| +|2021-07-01T12:00:12.000+08:00| 0.0| +|2021-07-01T12:00:13.000+08:00| 0.0| +|2021-07-01T12:00:14.000+08:00| 0.0| +|2021-07-01T12:00:15.000+08:00| 0.0| +|2021-07-01T12:00:16.000+08:00| 1.0| +|2021-07-01T12:00:17.000+08:00| 0.0| +|2021-07-01T12:00:18.000+08:00| 1.0| +|2021-07-01T12:00:19.000+08:00| 0.0| +|2021-07-01T12:00:20.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select missdetect(s2,'minlen'='10') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|missdetect(root.test.d2.s2, "minlen"="10")| ++-----------------------------+------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| false| +|2021-07-01T12:00:01.000+08:00| false| +|2021-07-01T12:00:02.000+08:00| false| +|2021-07-01T12:00:03.000+08:00| false| +|2021-07-01T12:00:04.000+08:00| true| +|2021-07-01T12:00:05.000+08:00| true| +|2021-07-01T12:00:06.000+08:00| true| +|2021-07-01T12:00:07.000+08:00| true| +|2021-07-01T12:00:08.000+08:00| true| +|2021-07-01T12:00:09.000+08:00| true| +|2021-07-01T12:00:10.000+08:00| true| +|2021-07-01T12:00:11.000+08:00| true| +|2021-07-01T12:00:12.000+08:00| true| +|2021-07-01T12:00:13.000+08:00| true| +|2021-07-01T12:00:14.000+08:00| true| +|2021-07-01T12:00:15.000+08:00| true| +|2021-07-01T12:00:16.000+08:00| false| +|2021-07-01T12:00:17.000+08:00| false| +|2021-07-01T12:00:18.000+08:00| false| +|2021-07-01T12:00:19.000+08:00| false| +|2021-07-01T12:00:20.000+08:00| false| ++-----------------------------+------------------------------------------+ +``` + +## Range + +### Usage + +This function is used to detect range anomaly of time series. According to upper bound and lower bound parameters, the function judges if a input value is beyond range, aka range anomaly, and a new time series of anomaly will be output. + +**Name:** RANGE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lower_bound`:lower bound of range anomaly detection. ++ `upper_bound`:upper bound of range anomaly detection. 
+ +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** Only when `upper_bound` is larger than `lower_bound`, the anomaly detection will be performed. Otherwise, nothing will be output. + + + +### Examples + +#### Assigning Lower and Upper Bound + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:28.000+08:00| 126.0| ++-----------------------------+------------------------------------------------------------------+ +``` + +## TwoSidedFilter + +### Usage + +The function is used to filter anomalies of a numeric time series based on two-sided window detection. + +**Name:** TWOSIDEDFILTER + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE + +**Output Series:** Output a single series. The type is the same as the input. It is the input without anomalies. + +**Parameter:** + +- `len`: The size of the window, which is a positive integer. By default, it's 5. When `len`=3, the algorithm detects forward window and backward window with length 3 and calculates the outlierness of the current point. + +- `threshold`: The threshold of outlierness, which is a floating number in (0,1). By default, it's 0.3. The strict standard of detecting anomalies is in proportion to the threshold. 
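+
+Both parameters have documented defaults (`len`=5, `threshold`=0.3), so the function can presumably be invoked without any attributes; the query in the example below spells them out explicitly. A minimal sketch, assuming the defaults are applied when the attributes are omitted:
+
+```sql
+-- expected to behave like 'len'='5', 'threshold'='0.3' if the documented defaults apply
+select TwoSidedFilter(s0) from root.test
+```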
+ +### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:00:37.000+08:00| 1484.0| +|1970-01-01T08:00:38.000+08:00| 1055.0| +|1970-01-01T08:00:39.000+08:00| 1050.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test +``` + +Output series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +## Outlier + +### Usage + +This function is used to detect distance-based outliers. For each point in the current window, if the number of its neighbors within the distance of neighbor distance threshold is less than the neighbor count threshold, the point in detected as an outlier. + +**Name:** OUTLIER + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `r`:the neighbor distance threshold. ++ `k`:the neighbor count threshold. ++ `w`:the window size. ++ `s`:the slide size. + +**Output Series:** Output a single series. The type is the same as the input. 
+ +### Examples + +#### Assigning Parameters of Queries + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2020-01-04T23:59:55.000+08:00| 56.0| +|2020-01-04T23:59:56.000+08:00| 55.1| +|2020-01-04T23:59:57.000+08:00| 54.2| +|2020-01-04T23:59:58.000+08:00| 56.3| +|2020-01-04T23:59:59.000+08:00| 59.0| +|2020-01-05T00:00:00.000+08:00| 60.0| +|2020-01-05T00:00:01.000+08:00| 60.5| +|2020-01-05T00:00:02.000+08:00| 64.5| +|2020-01-05T00:00:03.000+08:00| 69.0| +|2020-01-05T00:00:04.000+08:00| 64.2| +|2020-01-05T00:00:05.000+08:00| 62.3| +|2020-01-05T00:00:06.000+08:00| 58.0| +|2020-01-05T00:00:07.000+08:00| 58.9| +|2020-01-05T00:00:08.000+08:00| 52.0| +|2020-01-05T00:00:09.000+08:00| 62.3| +|2020-01-05T00:00:10.000+08:00| 61.0| +|2020-01-05T00:00:11.000+08:00| 64.2| +|2020-01-05T00:00:12.000+08:00| 61.8| +|2020-01-05T00:00:13.000+08:00| 64.0| +|2020-01-05T00:00:14.000+08:00| 63.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:03.000+08:00| 69.0| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:08.000+08:00| 52.0| ++-----------------------------+--------------------------------------------------------+ +``` + + +## MasterTrain + +### Usage + +This function is used to train the VAR model based on master data. The model is trained on learning samples consisting of p+1 consecutive non-error points. + +**Name:** MasterTrain + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn clean package -am -Dmaven.test.skip=true`. +- Copy `./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'` in client. 
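+
+After registration, the UDF should appear in the function list. A quick check (the exact output columns depend on the IoTDB version) is:
+
+```sql
+-- list registered functions and confirm MASTERTRAIN appears among the UDTFs
+show functions
+```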
+ +### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ +``` + +## MasterDetect + +### Usage + +This function is used to detect 
time series and repair errors based on master data. The VAR model is trained by MasterTrain. + +**Name:** MasterDetect + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. ++ `eta`: The detection threshold. By default, it will be estimated based on the 3-sigma rule. ++ `output_type`: The type of output. 'repair' for repairing and 'anomaly' for anomaly detection. ++ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn clean package -am -Dmaven.test.skip=true`. +- Copy `./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'` in client. + +### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| 
null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +#### Repairing + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +#### Anomaly Detection + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| true| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| 
+|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Comparison.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Comparison.md new file mode 100644 index 00000000..f4c87b5f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Comparison.md @@ -0,0 +1,305 @@ + + +# Comparison Operators and Functions + +## Basic comparison operators + +Supported operators `>`, `>=`, `<`, `<=`, `==`, `!=` (or `<>` ) + +Supported input data types: `INT32`, `INT64`, `FLOAT` and `DOUBLE` + +Note: It will transform all type to `DOUBLE` then do computation. + +Output data type: `BOOLEAN` + +**Example:** + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +``` +IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| +|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| +|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| +|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| +|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| +|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +``` + +## `BETWEEN ... AND ...` operator + +|operator |meaning| +|-----------------------------|-----------| +|`BETWEEN ... AND ...` |within the specified range| +|`NOT BETWEEN ... AND ...` |Not within the specified range| + +**Example:** Select data within or outside the interval [36.5,40]: + +```sql +select temperature from root.sg1.d1 where temperature between 36.5 and 40; +``` + +```sql +select temperature from root.sg1.d1 where temperature not between 36.5 and 40; +``` + +## Fuzzy matching operator + +For TEXT type data, support fuzzy matching of data using `Like` and `Regexp` operators. + +|operator |meaning| +|-----------------------------|-----------| +|`LIKE` | matches simple patterns| +|`NOT LIKE` |cannot match simple pattern| +|`REGEXP` | Match regular expression| +|`NOT REGEXP` |Cannot match regular expression| + +Input data type: `TEXT` + +Return type: `BOOLEAN` + +### Use `Like` for fuzzy matching + +**Matching rules:** + +- `%` means any 0 or more characters. +- `_` means any single character. + +**Example 1:** Query the data under `root.sg.d1` that contains `'cc'` in `value`. 
+ +```shell +IoTDB> select * from root.sg.d1 where value like '%cc%' ++--------------------------+----------------+ +| Time|root.sg.d1.value| ++--------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++--------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 2:** Query the data under `root.sg.d1` with `'b'` in the middle of `value` and any single character before and after. + +```shell +IoTDB> select * from root.sg.device where value like '_b_' ++--------------------------+----------------+ +| Time|root.sg.d1.value| ++--------------------------+----------------+ +|2017-11-01T00:00:02.000+08:00|abc| ++--------------------------+----------------+ +Total line number = 1 +It costs 0.002s +``` + +### Use `Regexp` for fuzzy matching + +The filter condition that needs to be passed in is **Java standard library style regular expression**. + +**Common regular matching examples:** + +``` +All characters with a length of 3-20: ^.{3,20}$ +Uppercase English characters: ^[A-Z]+$ +Numbers and English characters: ^[A-Za-z0-9]+$ +Starting with a: ^a.* +``` + +**Example 1:** Query the string of 26 English characters for value under root.sg.d1. + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' ++--------------------------+----------------+ +| Time|root.sg.d1.value| ++--------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++--------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 2:** Query root.sg.d1 where the value is a string consisting of 26 lowercase English characters and the time is greater than 100. + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 ++--------------------------+----------------+ +| Time|root.sg.d1.value| ++--------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++--------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 3:** + +```sql +select b, b like '1%', b regexp '[0-2]' from root.test; +``` + +operation result +``` ++-----------------------------+-----------+------- ------------------+--------------------------+ +| Time|root.test.b|root.test.b LIKE '^1.*?$'|root.test.b REGEXP '[0-2]'| ++-----------------------------+-----------+------- ------------------+--------------------------+ +|1970-01-01T08:00:00.001+08:00| 111test111| true| true| +|1970-01-01T08:00:00.003+08:00| 333test333| false| false| ++-----------------------------+-----------+------- ------------------+--------------------------+ +``` + +## `IS NULL` operator + +|operator |meaning| +|-----------------------------|-----------| +|`IS NULL` |is a null value| +|`IS NOT NULL` |is not a null value| + +**Example 1:** Select data with empty values: + +```sql +select code from root.sg1.d1 where temperature is null; +``` + +**Example 2:** Select data with non-null values: + +```sql +select code from root.sg1.d1 where temperature is not null; +``` + +## `IN` operator + +|operator |meaning| +|-----------------------------|-----------| +|`IN` / `CONTAINS` | are the values ​​in the specified list| +|`NOT IN` / `NOT CONTAINS` |not a value in the specified list| + +Input data type: `All Types` + +return type `BOOLEAN` + +**Note: Please ensure that the values ​​in the collection can be converted to the type of the input data. 
** +> +> For example: +> +> `s1 in (1, 2, 3, 'test')`, the data type of `s1` is `INT32` +> +> We will throw an exception because `'test'` cannot be converted to type `INT32` + +**Example 1:** Select data with values ​​within a certain range: + +```sql +select code from root.sg1.d1 where code in ('200', '300', '400', '500'); +``` + +**Example 2:** Select data with values ​​outside a certain range: + +```sql +select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); +``` + +**Example 3:** + +```sql +select a, a in (1, 2) from root.test; +``` + +Output 2: +``` ++-----------------------------+-----------+------- -------------+ +| Time|root.test.a|root.test.a IN (1,2)| ++-----------------------------+-----------+------- -------------+ +|1970-01-01T08:00:00.001+08:00| 1| true| +|1970-01-01T08:00:00.003+08:00| 3| false| ++-----------------------------+-----------+------- -------------+ +``` + +## Condition Functions + +Condition functions are used to check whether timeseries data points satisfy some specific condition. + +They return BOOLEANs. + +Currently, IoTDB supports the following condition functions: + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------- | --------------------------------------------- | ----------------------- | --------------------------------------------- | +| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`: a double type variate | BOOLEAN | Return `ts_value >= threshold`. | +| IN_RANGR | INT32 / INT64 / FLOAT / DOUBLE | `lower`: DOUBLE type
`upper`: DOUBLE type | BOOLEAN | Return `ts_value >= lower && value <= upper`. | + +Example Data: +``` +IoTDB> select ts from root.test; ++-----------------------------+------------+ +| Time|root.test.ts| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 3| +|1970-01-01T08:00:00.004+08:00| 4| ++-----------------------------+------------+ +``` + +### Test 1 +SQL: +```sql +select ts, on_off(ts, 'threshold'='2') from root.test; +``` + +Output: +``` +IoTDB> select ts, on_off(ts, 'threshold'='2') from root.test; ++-----------------------------+------------+-------------------------------------+ +| Time|root.test.ts|on_off(root.test.ts, "threshold"="2")| ++-----------------------------+------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| false| +|1970-01-01T08:00:00.002+08:00| 2| true| +|1970-01-01T08:00:00.003+08:00| 3| true| +|1970-01-01T08:00:00.004+08:00| 4| true| ++-----------------------------+------------+-------------------------------------+ +``` + +### Test 2 +Sql: +```sql +select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; +``` + +Output: +``` +IoTDB> select ts, in_range(ts,'lower'='2', 'upper'='3.1') from root.test; ++-----------------------------+------------+--------------------------------------------------+ +| Time|root.test.ts|in_range(root.test.ts, "lower"="2", "upper"="3.1")| ++-----------------------------+------------+--------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| false| +|1970-01-01T08:00:00.002+08:00| 2| true| +|1970-01-01T08:00:00.003+08:00| 3| true| +|1970-01-01T08:00:00.004+08:00| 4| false| ++-----------------------------+------------+--------------------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conditional.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conditional.md new file mode 100644 index 00000000..daa16ea2 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conditional.md @@ -0,0 +1,349 @@ + + +# Conditional Expressions + +## CASE + +The CASE expression is a kind of conditional expression that can be used to return different values based on specific conditions, similar to the if-else statements in other languages. + +The CASE expression consists of the following parts: + +- CASE keyword: Indicates the start of the CASE expression. +- WHEN-THEN clauses: There may be multiple clauses used to define conditions and give results. This clause is divided into two parts, WHEN and THEN. The WHEN part defines the condition, and the THEN part defines the result expression. If the WHEN condition is true, the corresponding THEN result is returned. +- ELSE clause: If none of the WHEN conditions is true, the result in the ELSE clause will be returned. The ELSE clause can be omitted. +- END keyword: Indicates the end of the CASE expression. + +The CASE expression is a scalar operation that can be used in combination with any other scalar operation or aggregate function. + +In the following text, all THEN parts and ELSE clauses will be collectively referred to as result clauses. + +### Syntax + +The CASE expression supports two formats. + +- Format 1: + ```sql + CASE + WHEN condition1 THEN expression1 + [WHEN condition2 THEN expression2] ... + [ELSE expression_end] + END + ``` + The `condition`s will be evaluated one by one. 
+ + The first `condition` that is true will return the corresponding expression. + +- Format 2: + ```sql + CASE caseValue + WHEN whenValue1 THEN expression1 + [WHEN whenValue2 THEN expression2] ... + [ELSE expression_end] + END + ``` + The `caseValue` will be evaluated first, and then the `whenValue`s will be evaluated one by one. The first `whenValue` that is equal to the `caseValue` will return the corresponding `expression`. + + Format 2 will be transformed into an equivalent Format 1 by iotdb. + + For example, the above SQL statement will be transformed into: + + ```sql + CASE + WHEN caseValue=whenValue1 THEN expression1 + [WHEN caseValue=whenValue1 THEN expression1] ... + [ELSE expression_end] + END + ``` + +If none of the conditions are true, or if none of the `whenValue`s match the `caseValue`, the `expression_end` will be returned. + +If there is no ELSE clause, `null` will be returned. + +### Notes + +- In format 1, all WHEN clauses must return a BOOLEAN type. +- In format 2, all WHEN clauses must be able to be compared to the CASE clause. +- All result clauses in a CASE expression must satisfy certain conditions for their return value types: + - BOOLEAN types cannot coexist with other types and will cause an error if present. + - TEXT types cannot coexist with other types and will cause an error if present. + - The other four numeric types can coexist, and the final result will be of DOUBLE type, with possible precision loss during conversion. + - If necessary, you can use the CAST function to convert the result to a type that can coexist with others. +- The CASE expression does not implement lazy evaluation, meaning that all clauses will be evaluated. +- The CASE expression does not support mixing with UDFs. +- Aggregate functions cannot be used within a CASE expression, but the result of a CASE expression can be used as input for an aggregate function. +- When using the CLI, because the CASE expression string can be lengthy, it is recommended to provide an alias for the expression using AS. + +### Using Examples + +#### Example 1 + +The CASE expression can be used to analyze data in a visual way. For example: +- The preparation of a certain chemical product requires that the temperature and pressure be within specific ranges. +- During the preparation process, sensors will detect the temperature and pressure, forming two time-series T (temperature) and P (pressure) in IoTDB. +In this application scenario, the CASE expression can indicate which time parameters are appropriate, which are not, and why they are not. 
+
+data:
+```sql
+IoTDB> select * from root.test1
++-----------------------------+------------+------------+
+| Time|root.test1.P|root.test1.T|
++-----------------------------+------------+------------+
+|2023-03-29T11:25:54.724+08:00| 1000000.0| 1025.0|
+|2023-03-29T11:26:13.445+08:00| 1000094.0| 1040.0|
+|2023-03-29T11:27:36.988+08:00| 1000095.0| 1041.0|
+|2023-03-29T11:27:56.446+08:00| 1000095.0| 1059.0|
+|2023-03-29T11:28:20.838+08:00| 1200000.0| 1040.0|
++-----------------------------+------------+------------+
+```
+
+SQL statements:
+```sql
+select T, P, case
+when 1000<T and T<1050 and 1000000<P and P<1100000 then "good!"
+when T<=1000 or T>=1050 then "bad temperature"
+when P<=1000000 or P>=1100000 then "bad pressure"
+end as `result`
+from root.test1
+```
+
+output:
+```
++-----------------------------+------------+------------+---------------+
+| Time|root.test1.T|root.test1.P| result|
++-----------------------------+------------+------------+---------------+
+|2023-03-29T11:25:54.724+08:00| 1025.0| 1000000.0| bad pressure|
+|2023-03-29T11:26:13.445+08:00| 1040.0| 1000094.0| good!|
+|2023-03-29T11:27:36.988+08:00| 1041.0| 1000095.0| good!|
+|2023-03-29T11:27:56.446+08:00| 1059.0| 1000095.0|bad temperature|
+|2023-03-29T11:28:20.838+08:00| 1040.0| 1200000.0| bad pressure|
++-----------------------------+------------+------------+---------------+
+```
+
+#### Example 2
+
+The CASE expression can achieve flexible result transformation, such as converting strings with a certain pattern to other strings.
+
+data:
+```sql
+IoTDB> select * from root.test2
++-----------------------------+--------------+
+| Time|root.test2.str|
++-----------------------------+--------------+
+|2023-03-27T18:23:33.427+08:00| abccd|
+|2023-03-27T18:23:39.389+08:00| abcdd|
+|2023-03-27T18:23:43.463+08:00| abcdefg|
++-----------------------------+--------------+
+```
+
+SQL statements:
+```sql
+select str, case
+when str like "%cc%" then "has cc"
+when str like "%dd%" then "has dd"
+else "no cc and dd" end as `result`
+from root.test2
+```
+
+output:
+```
++-----------------------------+--------------+------------+
+| Time|root.test2.str| result|
++-----------------------------+--------------+------------+
+|2023-03-27T18:23:33.427+08:00| abccd| has cc|
+|2023-03-27T18:23:39.389+08:00| abcdd| has dd|
+|2023-03-27T18:23:43.463+08:00| abcdefg|no cc and dd|
++-----------------------------+--------------+------------+
+```
+
+#### Example 3: work with aggregation functions
+
+##### Valid: aggregation function ← CASE expression
+
+The CASE expression can be used as a parameter for aggregate functions. For example, used in conjunction with the COUNT function, it can implement statistics based on multiple conditions simultaneously.
+ +data: +```sql +IoTDB> select * from root.test3 ++-----------------------------+------------+ +| Time|root.test3.x| ++-----------------------------+------------+ +|2023-03-27T18:11:11.300+08:00| 0.0| +|2023-03-27T18:11:14.658+08:00| 1.0| +|2023-03-27T18:11:15.981+08:00| 2.0| +|2023-03-27T18:11:17.668+08:00| 3.0| +|2023-03-27T18:11:19.112+08:00| 4.0| +|2023-03-27T18:11:20.822+08:00| 5.0| +|2023-03-27T18:11:22.462+08:00| 6.0| +|2023-03-27T18:11:24.174+08:00| 7.0| +|2023-03-27T18:11:25.858+08:00| 8.0| +|2023-03-27T18:11:27.979+08:00| 9.0| ++-----------------------------+------------+ +``` + +SQL statements: + +```sql +select +count(case when x<=1 then 1 end) as `(-∞,1]`, +count(case when 1 select * from root.test4 ++-----------------------------+------------+ +| Time|root.test4.x| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| ++-----------------------------+------------+ +``` + +SQL statements: +```sql +select x, case x when 1 then "one" when 2 then "two" else "other" end from root.test4 +``` + +output: +``` ++-----------------------------+------------+-----------------------------------------------------------------------------------+ +| Time|root.test4.x|CASE WHEN root.test4.x = 1 THEN "one" WHEN root.test4.x = 2 THEN "two" ELSE "other"| ++-----------------------------+------------+-----------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| one| +|1970-01-01T08:00:00.002+08:00| 2.0| two| +|1970-01-01T08:00:00.003+08:00| 3.0| other| +|1970-01-01T08:00:00.004+08:00| 4.0| other| ++-----------------------------+------------+-----------------------------------------------------------------------------------+ +``` + +#### Example 5: type of return clauses + +The result clause of a CASE expression needs to satisfy certain type restrictions. + +In this example, we continue to use the data from Example 4. + +##### Invalid: BOOLEAN cannot coexist with other types + +SQL statements: +```sql +select x, case x when 1 then true when 2 then 2 end from root.test4 +``` + +output: +``` +Msg: 701: CASE expression: BOOLEAN and other types cannot exist at same time +``` + +##### Valid: Only BOOLEAN type exists + +SQL statements: +```sql +select x, case x when 1 then true when 2 then false end as `result` from root.test4 +``` + +output: +``` ++-----------------------------+------------+------+ +| Time|root.test4.x|result| ++-----------------------------+------------+------+ +|1970-01-01T08:00:00.001+08:00| 1.0| true| +|1970-01-01T08:00:00.002+08:00| 2.0| false| +|1970-01-01T08:00:00.003+08:00| 3.0| null| +|1970-01-01T08:00:00.004+08:00| 4.0| null| ++-----------------------------+------------+------+ +``` + +##### Invalid:TEXT cannot coexist with other types + +SQL statements: +```sql +select x, case x when 1 then 1 when 2 then "str" end from root.test4 +``` + +output: +``` +Msg: 701: CASE expression: TEXT and other types cannot exist at same time +``` + +##### Valid: Only TEXT type exists + +See in Example 1. 
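+
+A minimal TEXT-only sketch over the same `root.test4` data is also accepted (illustrative query, not part of the original examples):
+
+```sql
+select x, case when x <= 2 then "small" else "large" end as `result` from root.test4
+```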
+ +##### Valid: Numerical types coexist + +SQL statements: +```sql +select x, case x +when 1 then 1 +when 2 then 222222222222222 +when 3 then 3.3 +when 4 then 4.4444444444444 +end as `result` +from root.test4 +``` + +output: +``` ++-----------------------------+------------+-------------------+ +| Time|root.test4.x| result| ++-----------------------------+------------+-------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0|2.22222222222222E14| +|1970-01-01T08:00:00.003+08:00| 3.0| 3.299999952316284| +|1970-01-01T08:00:00.004+08:00| 4.0| 4.44444465637207| ++-----------------------------+------------+-------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Constant.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Constant.md new file mode 100644 index 00000000..0cec3713 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Constant.md @@ -0,0 +1,57 @@ + + +# Constant Timeseries Generating Functions + +The constant timeseries generating function is used to generate a timeseries in which the values of all data points are the same. + +The constant timeseries generating function accepts one or more timeseries inputs, and the timestamp set of the output data points is the union of the timestamp sets of the input timeseries. + +Currently, IoTDB supports the following constant timeseries generating functions: + +| Function Name | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------------------------------ | -------------------------------------------- | ------------------------------------------------------------ | +| CONST | `value`: the value of the output data point
`type`: the type of the output data point, it can only be INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | Determined by the required attribute `type` | Output the user-specified constant timeseries according to the attributes `value` and `type`. | +| PI | None | DOUBLE | Data point value: a `double` value of `π`, the ratio of the circumference of a circle to its diameter, which is equals to `Math.PI` in the *Java Standard Library*. | +| E | None | DOUBLE | Data point value: a `double` value of `e`, the base of the natural logarithms, which is equals to `Math.E` in the *Java Standard Library*. | + +Example: + +``` sql +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; +``` + +Result: + +``` +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|const(root.sg1.d1.s1, "value"="1024", "type"="INT64")|pi(root.sg1.d1.s2)|e(root.sg1.d1.s1, root.sg1.d1.s2)| ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 1024| 3.141592653589793| 2.718281828459045| +|1970-01-01T08:00:00.001+08:00| 1.0| null| 1024| null| 2.718281828459045| +|1970-01-01T08:00:00.002+08:00| 2.0| null| 1024| null| 2.718281828459045| +|1970-01-01T08:00:00.003+08:00| null| 3.0| null| 3.141592653589793| 2.718281828459045| +|1970-01-01T08:00:00.004+08:00| null| 4.0| null| 3.141592653589793| 2.718281828459045| ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +Total line number = 5 +It costs 0.005s +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Continuous-Interval.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Continuous-Interval.md new file mode 100644 index 00000000..beec3df1 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Continuous-Interval.md @@ -0,0 +1,73 @@ + + +# Continuous Interval Functions + +The continuous interval functions are used to query all continuous intervals that meet specified conditions. +They can be divided into two categories according to return value: +1. Returns the start timestamp and time span of the continuous interval that meets the conditions (a time span of 0 means that only the start time point meets the conditions) +2. Returns the start timestamp of the continuous interval that meets the condition and the number of points in the interval (a number of 1 means that only the start time point meets the conditions) + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ----------------- | ------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | +| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always 0(false), and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | +| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always not 0, and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | +| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always 0(false). Data points number `n` satisfy `n >= min && n <= max` | +| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L`
`max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always not 0(false). Data points number `n` satisfy `n >= min && n <= max` | + +## Demonstrate +Example data: +``` +IoTDB> select s1,s2,s3,s4,s5 from root.sg.d2; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d2.s1|root.sg.d2.s2|root.sg.d2.s3|root.sg.d2.s4|root.sg.d2.s5| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.001+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.003+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.004+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.005+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.006+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.007+08:00| 1| 1| 1.0| 1.0| true| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +``` + +Sql: +```sql +select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; +``` + +Result: +``` ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +| Time|root.sg.d2.s1|zero_count(root.sg.d2.s1)|non_zero_count(root.sg.d2.s2)|zero_duration(root.sg.d2.s3)|non_zero_duration(root.sg.d2.s4)| ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0| 1| null| 0| null| +|1970-01-01T08:00:00.001+08:00| 1| null| 2| null| 1| +|1970-01-01T08:00:00.002+08:00| 1| null| null| null| null| +|1970-01-01T08:00:00.003+08:00| 0| 1| null| 0| null| +|1970-01-01T08:00:00.004+08:00| 1| null| 1| null| 0| +|1970-01-01T08:00:00.005+08:00| 0| 2| null| 1| null| +|1970-01-01T08:00:00.006+08:00| 0| null| null| null| null| +|1970-01-01T08:00:00.007+08:00| 1| null| 1| null| 0| ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conversion.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conversion.md new file mode 100644 index 00000000..303eeed2 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conversion.md @@ -0,0 +1,101 @@ + + +# Data Type Conversion Function + +The IoTDB currently supports 6 data types, including INT32, INT64 ,FLOAT, DOUBLE, BOOLEAN, TEXT. When we query or evaluate data, we may need to convert data types, such as TEXT to INT32, or FLOAT to DOUBLE. IoTDB supports cast function to convert data types. + +Syntax example: + +```sql +SELECT cast(s1 as INT32) from root.sg +``` + +The syntax of the cast function is consistent with that of PostgreSQL. The data type specified after AS indicates the target type to be converted. Currently, all six data types supported by IoTDB can be used in the cast function. The conversion rules to be followed are shown in the following table. 
The row represents the original data type, and the column represents the target data type to be converted into: + +| | **INT32** | **INT64** | **FLOAT** | **DOUBLE** | **BOOLEAN** | **TEXT** | +| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | -------------------------------- | +| **INT32** | No need to cast | Cast directly | Cast directly | Cast directly | !=0 : true
==0: false | String.valueOf() | +| **INT64** | Out of the range of INT32: throw Exception
Otherwise: Cast directly | No need to cast | Cast directly | Cast directly | !=0L : true
==0: false | String.valueOf() | +| **FLOAT** | Out of the range of INT32: throw Exception
Otherwise: Math.round() | Out of the range of INT64: throw Exception
Otherwise: Math.round() | No need to cast | Cast directly | !=0.0f : true
==0: false | String.valueOf() | +| **DOUBLE** | Out of the range of INT32: throw Exception
Otherwise: Math.round() | Out of the range of INT64: throw Exception
Otherwise: Math.round() | Out of the range of FLOAT:throw Exception
Otherwise: Cast directly | No need to cast | !=0.0 : true
==0: false | String.valueOf() | +| **BOOLEAN** | true: 1
false: 0 | true: 1L
false: 0 | true: 1.0f
false: 0 | true: 1.0
false: 0 | No need to cast | true: "true"
false: "false" | +| **TEXT** | Integer.parseInt() | Long.parseLong() | Float.parseFloat() | Double.parseDouble() | text.toLowerCase =="true" : true
text.toLowerCase =="false" : false
Otherwise: throw Exception | No need to cast | + +## Examples + +``` +// timeseries +IoTDB> show timeseries root.sg.d1.** ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters| ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ +|root.sg.d1.s3| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s4| null| root.sg| DOUBLE| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s5| null| root.sg| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s6| null| root.sg| TEXT| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s1| null| root.sg| INT32| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s2| null| root.sg| INT64| PLAIN| SNAPPY|null| null| null| null| ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ + +// data of timeseries +IoTDB> select * from root.sg.d1; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d1.s3|root.sg.d1.s4|root.sg.d1.s5|root.sg.d1.s6|root.sg.d1.s1|root.sg.d1.s2| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| false| 10000| 0| 0| +|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| false| 3| 1| 1| +|1970-01-01T08:00:00.002+08:00| 2.7| 2.7| true| TRue| 2| 2| +|1970-01-01T08:00:00.003+08:00| 3.33| 3.33| true| faLse| 3| 3| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ + +// cast BOOLEAN to other types +IoTDB> select cast(s5 as INT32), cast(s5 as INT64),cast(s5 as FLOAT),cast(s5 as DOUBLE), cast(s5 as TEXT) from root.sg.d1 ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ +| Time|CAST(root.sg.d1.s5 AS INT32)|CAST(root.sg.d1.s5 AS INT64)|CAST(root.sg.d1.s5 AS FLOAT)|CAST(root.sg.d1.s5 AS DOUBLE)|CAST(root.sg.d1.s5 AS TEXT)| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ +|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.001+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.003+08:00| 1| 1| 1.0| 1.0| true| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ + +// cast TEXT to numeric types +IoTDB> select cast(s6 as INT32), cast(s6 as INT64), cast(s6 as FLOAT), cast(s6 as DOUBLE) from root.sg.d1 where time < 2 ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ +| Time|CAST(root.sg.d1.s6 AS INT32)|CAST(root.sg.d1.s6 AS INT64)|CAST(root.sg.d1.s6 AS FLOAT)|CAST(root.sg.d1.s6 AS DOUBLE)| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 10000| 10000| 10000.0| 10000.0| +|1970-01-01T08:00:00.001+08:00| 3| 3| 3.0| 3.0| 
++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ + +// cast TEXT to BOOLEAN +IoTDB> select cast(s6 as BOOLEAN) from root.sg.d1 where time >= 2 ++-----------------------------+------------------------------+ +| Time|CAST(root.sg.d1.s6 AS BOOLEAN)| ++-----------------------------+------------------------------+ +|1970-01-01T08:00:00.002+08:00| true| +|1970-01-01T08:00:00.003+08:00| false| ++-----------------------------+------------------------------+ +``` + + + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Matching.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Matching.md new file mode 100644 index 00000000..2d7d1228 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Matching.md @@ -0,0 +1,335 @@ + + +# Data Matching + +## Cov + +### Usage + +This function is used to calculate the population covariance. + +**Name:** COV + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population covariance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. + + +### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +## DTW + +### Usage + +This function is used to calculate the DTW distance between two input series. + +**Name:** DTW + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the DTW distance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `0` will be output. 
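+
+The distance reported is the classical dynamic-programming alignment cost. The recurrence below is a sketch that assumes absolute difference as the point-wise cost (the exact cost function is not documented here); it is consistent with the example that follows, where two constant series differing by 1 at 20 aligned points give a distance of 20.0:
+
+$$D(0,0)=0,\quad D(i,0)=D(0,j)=\infty,\quad D(i,j)=|s_1[i]-s_2[j]|+\min\big(D(i-1,j),\,D(i,j-1),\,D(i-1,j-1)\big),\quad DTW(s_1,s_2)=D(N,M)$$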
+ + +### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +## Pearson + +### Usage + +This function is used to calculate the Pearson Correlation Coefficient. + +**Name:** PEARSON + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the Pearson Correlation Coefficient. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
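+
+For reference, after rows containing missing points, null points or `NaN` are dropped, the value is the standard sample Pearson correlation coefficient over the remaining pairs $(x_i, y_i)$:
+
+$$r=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$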
+ + +### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +## PtnSym + +### Usage + +This function is used to find all symmetric subseries in the input whose degree of symmetry is less than the threshold. +The degree of symmetry is calculated by DTW. +The smaller the degree, the more symmetrical the series is. + +**Name:** PATTERNSYMMETRIC + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameter:** + ++ `window`: The length of the symmetric subseries. It's a positive integer and the default value is 10. ++ `threshold`: The threshold of the degree of symmetry. It's non-negative. Only the subseries whose degree of symmetry is below it will be output. By default, all subseries will be output. + + +**Output Series:** Output a single series. The type is DOUBLE. Each data point in the output series corresponds to a symmetric subseries. The output timestamp is the starting timestamp of the subseries and the output value is the degree of symmetry. 
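+
+A natural reading of "the degree of symmetry is calculated by DTW" (an interpretation, but one that matches the example below, where the palindromic window 1, 2, 3, 2, 1 scores exactly 0) is that each window is compared with its own reversal:
+
+$$sym(i)=DTW\big(S[i..i+window-1],\ \mathrm{reverse}(S[i..i+window-1])\big)$$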
+ +### Example + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| +|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +## XCorr + +### Usage + +This function is used to calculate the cross correlation function of given two time series. +For discrete time series, cross correlation is given by +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +which represent the similarities between two series with different index shifts. + +**Name:** XCORR + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series with DOUBLE as datatype. +There are $2N-1$ data points in the series, the center of which represents the cross correlation +calculated with pre-aligned series(that is $CR(0)$ in the formula above), +and the previous(or post) values represent those with shifting the latter series forward(or backward otherwise) +until the two series are no longer overlapped(not included). +In short, the values of output series are given by(index starts from 1) +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
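+
+As a worked check against the example below: after `null` and `NaN` are replaced by 0, $S_1=[0,2,3,4,5]$ and $S_2=[6,7,0,9,10]$ with $N=5$, so the centre value of the output is
+
+$$OS[5]=CR(0)=\frac{1}{5}\,(0\cdot 6+2\cdot 7+3\cdot 0+4\cdot 9+5\cdot 10)=\frac{100}{5}=20.0$$
+
+which is the value at the middle timestamp of the output series.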
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Profiling.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Profiling.md new file mode 100644 index 00000000..08f83798 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Profiling.md @@ -0,0 +1,1887 @@ + + + +# Data Profiling + +## ACF + +### Usage + +This function is used to calculate the auto-correlation factor of the input time series, +which equals to cross correlation between the same series. +For more information, please refer to [XCorr](./Data-Matching.md#XCorr) function. + +**Name:** ACF + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. +There are $2N-1$ data points in the series, and the values are interpreted in details in [XCorr](./Data-Matching.md#XCorr) function. + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. + +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| null| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +## Distinct + +### Usage + +This function returns all unique values in time series. + +**Name:** DISTINCT + +**Input Series:** Only support a single input series. The type is arbitrary. 
+ +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** + ++ The timestamp of the output series is meaningless. The output order is arbitrary. ++ Missing points and null points in the input series will be ignored, but `NaN` will not. ++ Case Sensitive. + + +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select distinct(s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +## Histogram + +### Usage + +This function is used to calculate the distribution histogram of a single column of numerical data. + +**Name:** HISTOGRAM + +**Input Series:** Only supports a single input sequence, the type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameters:** + ++ `min`: The lower limit of the requested data range, the default value is -Double.MAX_VALUE. ++ `max`: The upper limit of the requested data range, the default value is Double.MAX_VALUE, and the value of start must be less than or equal to end. ++ `count`: The number of buckets of the histogram, the default value is 1. It must be a positive integer. + +**Output Series:** The value of the bucket of the histogram, where the lower bound represented by the i-th bucket (index starts from 1) is $min+ (i-1)\cdot\frac{max-min}{count}$ and the upper bound is $min + i \cdot \frac{max-min}{count}$. + +**Note:** + ++ If the value is lower than `min`, it will be put into the 1st bucket. If the value is larger than `max`, it will be put into the last bucket. ++ Missing points, null points and `NaN` in the input series will be ignored. 
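+
+The bucket assignment itself can be sketched in a few lines of Python (not IoTDB code; equal-width, half-open buckets with out-of-range values clamped into the first and last buckets are assumptions based on the notes above):
+
+```python
+import math
+
+def histogram(values, min_v, max_v, count):
+    width = (max_v - min_v) / count
+    buckets = [0] * count
+    for v in values:
+        if v is None or math.isnan(v):             # null and NaN are ignored
+            continue
+        i = int((v - min_v) // width)              # 0-based bucket index
+        buckets[max(0, min(count - 1, i))] += 1    # clamp values below min / above max
+    return buckets
+
+# Reproduces the example below: ten buckets containing two values each
+print(histogram(list(range(1, 21)), min_v=1, max_v=20, count=10))
+```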
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +## Integral + +### Usage + +This function is used to calculate the integration of time series, +which equals to the area under the curve with time as X-axis and values as Y-axis. + +**Name:** INTEGRAL + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `unit`: The unit of time used when computing the integral. + The value should be chosen from "1S", "1s", "1m", "1H", "1d"(case-sensitive), + and each represents taking one millisecond / second / minute / hour / day as 1.0 while calculating the area and integral. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the integration. + +**Note:** + ++ The integral value equals to the sum of the areas of right-angled trapezoids consisting of each two adjacent points and the time-axis. + Choosing different `unit` implies different scaling of time axis, thus making it apparent to convert the value among those results with constant coefficient. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + +### Examples + +#### Default Parameters + +With default parameters, this function will take one second as 1.0. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + +#### Specific time unit + +With time unit specified as "1m", this function will take one minute as 1.0. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +## IntegralAvg + +### Usage + +This function is used to calculate the function average of time series. +The output equals to the area divided by the time interval using the same time `unit`. +For more information of the area under the curve, please refer to `Integral` function. + +**Name:** INTEGRALAVG + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the time-weighted average. + +**Note:** + ++ The time-weighted value equals to the integral value with any `unit` divided by the time interval of input series. + The result is irrelevant to the time unit used in integral, and it's consistent with the timestamp precision of IoTDB by default. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + ++ If the input series is empty, the output value will be 0.0, but if there is only one data point, the value will equal to the input value. 
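+
+Both functions reduce to the same trapezoidal area. A minimal Python sketch of that computation (not IoTDB code), assuming millisecond timestamps and skipping `NaN` points as described above:
+
+```python
+import math
+
+def integral(times_ms, values, unit_ms=1000):
+    # trapezoidal area over the valid points only; NaN points are skipped entirely
+    pts = [(t, v) for t, v in zip(times_ms, values) if not math.isnan(v)]
+    return sum((v0 + v1) / 2 * (t1 - t0) / unit_ms
+               for (t0, v0), (t1, v1) in zip(pts, pts[1:]))
+
+# Reproduces the Integral example above: 57.5 with "unit"="1s", about 0.958 with "unit"="1m"
+t = [1000, 2000, 3000, 4000, 5000, 8000, 9000, 10000]
+v = [1, 2, 5, 6, 7, 8, float("nan"), 10]
+print(integral(t, v), integral(t, v, unit_ms=60000))
+```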
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +## Mad + +### Usage + +The function is used to compute the exact or approximate median absolute deviation (MAD) of a numeric time series. MAD is the median of the deviation of each element from the elements' median. + +Take a dataset $\{1,3,3,5,5,6,7,8,9\}$ as an instance. Its median is 5 and the deviation of each element from the median is $\{0,0,1,2,2,2,3,4,4\}$, whose median is 2. Therefore, the MAD of the original dataset is 2. + +**Name:** MAD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The relative error of the approximate MAD. It should be within [0,1) and the default value is 0. Taking `error`=0.01 as an instance, suppose the exact MAD is $a$ and the approximate MAD is $b$, we have $0.99a \le b \le 1.01a$. With `error`=0, the output is the exact MAD. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the MAD. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. + +### Examples + +#### Exact Query + +With the default `error`(`error`=0), the function queries the exact MAD. 
+ +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +SQL for query: + +```sql +select mad(s0) from root.test +``` + +Output series: + +``` ++-----------------------------+------------------+ +| Time| mad(root.test.s0)| ++-----------------------------+------------------+ +|1970-01-01T08:00:00.000+08:00|0.6806197166442871| ++-----------------------------+------------------+ +``` + +#### Approximate Query + +By setting `error` within (0,1), the function queries the approximate MAD. + +SQL for query: + +```sql +select mad(s0, "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s0, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.6806616245859518| ++-----------------------------+---------------------------------+ +``` + +## Median + +### Usage + +The function is used to compute the exact or approximate median of a numeric time series. Median is the value separating the higher half from the lower half of a data sample. + +**Name:** MEDIAN + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The rank error of the approximate median. It should be within [0,1) and the default value is 0. For instance, a median with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact median. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the median. 
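+
+With `error`=0 both MAD and MEDIAN load the whole series, and their exact results can be reproduced with a short Python sketch (not IoTDB code; averaging the two middle elements for an even number of points is an assumption here, the UDF may pick a single element instead):
+
+```python
+import math
+
+def exact_median(values):
+    xs = sorted(v for v in values if v is not None and not math.isnan(v))
+    n = len(xs)
+    return xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2
+
+def exact_mad(values):
+    med = exact_median(values)
+    return exact_median([abs(v - med) for v in values
+                         if v is not None and not math.isnan(v)])
+
+# The worked dataset from the MAD description above: median 5, MAD 2
+data = [1, 3, 3, 5, 5, 6, 7, 8, 9]
+print(exact_median(data), exact_mad(data))
+```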
+
+### Examples
+
+Input series:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s0|
++-----------------------------+------------+
+|2021-03-17T10:32:17.054+08:00|   0.5319929|
+|2021-03-17T10:32:18.054+08:00|   0.9304316|
+|2021-03-17T10:32:19.054+08:00|  -1.4800133|
+|2021-03-17T10:32:20.054+08:00|   0.6114087|
+|2021-03-17T10:32:21.054+08:00|   2.5163336|
+|2021-03-17T10:32:22.054+08:00|  -1.0845392|
+|2021-03-17T10:32:23.054+08:00|   1.0562582|
+|2021-03-17T10:32:24.054+08:00|   1.3867859|
+|2021-03-17T10:32:25.054+08:00| -0.45429882|
+|2021-03-17T10:32:26.054+08:00|   1.0353678|
+|2021-03-17T10:32:27.054+08:00|   0.7307929|
+|2021-03-17T10:32:28.054+08:00|   2.3167255|
+|2021-03-17T10:32:29.054+08:00|    2.342443|
+|2021-03-17T10:32:30.054+08:00|   1.5809103|
+|2021-03-17T10:32:31.054+08:00|   1.4829416|
+|2021-03-17T10:32:32.054+08:00|   1.5800357|
+|2021-03-17T10:32:33.054+08:00|   0.7124368|
+|2021-03-17T10:32:34.054+08:00| -0.78597564|
+|2021-03-17T10:32:35.054+08:00|   1.2058644|
+|2021-03-17T10:32:36.054+08:00|   1.4215064|
+|2021-03-17T10:32:37.054+08:00|   1.2808295|
+|2021-03-17T10:32:38.054+08:00|  -0.6173715|
+|2021-03-17T10:32:39.054+08:00|  0.06644377|
+|2021-03-17T10:32:40.054+08:00|    2.349338|
+|2021-03-17T10:32:41.054+08:00|   1.7335888|
+|2021-03-17T10:32:42.054+08:00|   1.5872132|
+............
+Total line number = 10000
+```
+
+SQL for query:
+
+```sql
+select median(s0, "error"="0.01") from root.test
+```
+
+Output series:
+
+```
++-----------------------------+------------------------------------+
+|                         Time|median(root.test.s0, "error"="0.01")|
++-----------------------------+------------------------------------+
+|1970-01-01T08:00:00.000+08:00|                   1.021884560585022|
++-----------------------------+------------------------------------+
+```
+
+## MinMax
+
+### Usage
+
+This function is used to standardize the input series with min-max normalization: the minimum value is transformed to 0 and the maximum value is transformed to 1.
+
+**Name:** MINMAX
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `compute`: When set to "batch", the minimum and maximum are computed from the whole series after all data points are imported; when set to "stream", the minimum and maximum values must be provided by the user. The default value is "batch".
++ `min`: The minimum value when `compute` is set to "stream".
++ `max`: The maximum value when `compute` is set to "stream".
+
+**Output Series:** Output a single series. The type is DOUBLE.
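+
+In "batch" mode the transformation is the usual rescaling by the observed extremes; a minimal Python sketch (not IoTDB code):
+
+```python
+def minmax(values):
+    lo, hi = min(values), max(values)
+    return [(v - lo) / (hi - lo) for v in values]
+
+# Reproduces the example below (min = -2, max = 10): 0.0 -> 0.1666..., 1.0 -> 0.25, 10.0 -> 1.0
+print(minmax([0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -2.0, 2.0, 0.0, 0.0,
+              1.0, -1.0, -1.0, 1.0, 0.0, 0.0, 10.0, 2.0, -2.0, 0.0]))
+```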
+ +### Examples + +#### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select minmax(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|minmax(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.200+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.300+08:00| 0.25| +|1970-01-01T08:00:00.400+08:00| 0.08333333333333333| +|1970-01-01T08:00:00.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.700+08:00| 0.0| +|1970-01-01T08:00:00.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.900+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.100+08:00| 0.25| +|1970-01-01T08:00:01.200+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.300+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.400+08:00| 0.25| +|1970-01-01T08:00:01.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.700+08:00| 1.0| +|1970-01-01T08:00:01.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| 0.16666666666666666| ++-----------------------------+--------------------+ +``` + +## Mode + +### Usage + +This function is used to calculate the mode of time series, that is, the value that occurs most frequently. + +**Name:** MODE + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is the same as which the first mode value has and value is the mode. + +**Note:** + ++ If there are multiple values with the most occurrences, the arbitrary one will be output. ++ Missing points and null points in the input series will be ignored, but `NaN` will not. 
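+
+A Python sketch of the computation (not IoTDB code), keeping the timestamp of the first occurrence of the winning value:
+
+```python
+from collections import Counter
+
+def mode(timestamps, values):
+    winner, _ = Counter(values).most_common(1)[0]      # ties are broken arbitrarily
+    first_ts = next(t for t, v in zip(timestamps, values) if v == winner)
+    return first_ts, winner
+
+# Reproduces the example below: "World", first seen at the 4th timestamp
+print(mode(list(range(1, 12)),
+           ["Hello", "hello", "Hello", "World", "World", "World",
+            "Hello", "hello", "Hello", "World", "World"]))
+```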
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| Hello| +|1970-01-01T08:00:00.004+08:00| World| +|1970-01-01T08:00:00.005+08:00| World| +|1970-01-01T08:00:01.600+08:00| World| +|1970-01-15T09:37:34.451+08:00| Hello| +|1970-01-15T09:37:34.452+08:00| hello| +|1970-01-15T09:37:34.453+08:00| Hello| +|1970-01-15T09:37:34.454+08:00| World| +|1970-01-15T09:37:34.455+08:00| World| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select mode(s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+---------------------+ +| Time|mode(root.test.d2.s2)| ++-----------------------------+---------------------+ +|1970-01-01T08:00:00.004+08:00| World| ++-----------------------------+---------------------+ +``` + +## MvAvg + +### Usage + +This function is used to calculate moving average of input series. + +**Name:** MVAVG + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `window`: Length of the moving window. Default value is 10. + +**Output Series:** Output a single series. The type is DOUBLE. + +### Examples + +#### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +## PACF + +### Usage + +This function is 
used to calculate partial autocorrelation of input series by solving Yule-Walker equation. For some cases, the equation may not be solved, and NaN will be output. + +**Name:** PACF + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lag`: Maximum lag of pacf to calculate. The default value is $\min(10\log_{10}n,n-1)$, where $n$ is the number of data points. + +**Output Series:** Output a single series. The type is DOUBLE. + +### Examples + +#### Assigning maximum lag + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2019-12-27T00:00:00.000+08:00| 5.0| +|2019-12-27T00:05:00.000+08:00| 5.0| +|2019-12-27T00:10:00.000+08:00| 5.0| +|2019-12-27T00:15:00.000+08:00| 5.0| +|2019-12-27T00:20:00.000+08:00| 6.0| +|2019-12-27T00:25:00.000+08:00| 5.0| +|2019-12-27T00:30:00.000+08:00| 6.0| +|2019-12-27T00:35:00.000+08:00| 6.0| +|2019-12-27T00:40:00.000+08:00| 6.0| +|2019-12-27T00:45:00.000+08:00| 6.0| +|2019-12-27T00:50:00.000+08:00| 6.0| +|2019-12-27T00:55:00.000+08:00| 5.982609| +|2019-12-27T01:00:00.000+08:00| 5.9652176| +|2019-12-27T01:05:00.000+08:00| 5.947826| +|2019-12-27T01:10:00.000+08:00| 5.9304347| +|2019-12-27T01:15:00.000+08:00| 5.9130435| +|2019-12-27T01:20:00.000+08:00| 5.8956523| +|2019-12-27T01:25:00.000+08:00| 5.878261| +|2019-12-27T01:30:00.000+08:00| 5.8608694| +|2019-12-27T01:35:00.000+08:00| 5.843478| +............ +Total line number = 18066 +``` + +SQL for query: + +```sql +select pacf(s1, "lag"="5") from root.test +``` + +Output series: + +``` ++-----------------------------+-----------------------------+ +| Time|pacf(root.test.s1, "lag"="5")| ++-----------------------------+-----------------------------+ +|2019-12-27T00:00:00.000+08:00| 1.0| +|2019-12-27T00:05:00.000+08:00| 0.3528915091942786| +|2019-12-27T00:10:00.000+08:00| 0.1761346122516304| +|2019-12-27T00:15:00.000+08:00| 0.1492391973294682| +|2019-12-27T00:20:00.000+08:00| 0.03560059645868398| +|2019-12-27T00:25:00.000+08:00| 0.0366222998995286| ++-----------------------------+-----------------------------+ +``` + +## Percentile + +### Usage + +The function is used to compute the exact or approximate percentile of a numeric time series. A percentile is value of element in the certain rank of the sorted series. + +**Name:** PERCENTILE + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `rank`: The rank percentage of the percentile. It should be (0,1] and the default value is 0.5. For instance, a percentile with `rank`=0.5 is the median. ++ `error`: The rank error of the approximate percentile. It should be within [0,1) and the default value is 0. For instance, a 0.5-percentile with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact percentile. + +**Output Series:** Output a single series. The type is the same as input series. If `error`=0, there is only one data point in the series, whose timestamp is the same has which the first percentile value has, and value is the percentile, otherwise the timestamp of the only data point is 0. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
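+
+For the exact case (`error`=0) the computation is a rank lookup in the sorted data. A Python sketch (not IoTDB code; the nearest-rank convention used here is an assumption, so boundary ranks may differ slightly from the UDF):
+
+```python
+import math
+
+def exact_percentile(values, rank=0.5):
+    xs = sorted(v for v in values if v is not None and not math.isnan(v))
+    # nearest-rank convention (assumption): the ceil(rank * n)-th smallest element
+    return xs[max(0, math.ceil(rank * len(xs)) - 1)]
+
+# 0.5-percentile (median) of a small sample
+print(exact_percentile([5, 1, 3, 2, 4], rank=0.5))   # 3
+```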
+ +### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +SQL for query: + +```sql +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|percentile(root.test.s0, "rank"="0.2", "error"="0.01")| ++-----------------------------+------------------------------------------------------+ +|2021-03-17T10:35:02.054+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +``` + +## Quantile + +### Usage + +The function is used to compute the approximate quantile of a numeric time series. A quantile is value of element in the certain rank of the sorted series. + +**Name:** QUANTILE + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `rank`: The rank of the quantile. It should be (0,1] and the default value is 0.5. For instance, a quantile with `rank`=0.5 is the median. ++ `K`: The size of KLL sketch maintained in the query. It should be within [100,+inf) and the default value is 800. For instance, the 0.5-quantile computed by a KLL sketch with K=800 items is a value with rank quantile 0.49~0.51 with a confidence of at least 99%. The result will be more accurate as K increases. + +**Output Series:** Output a single series. The type is the same as input series. The timestamp of the only data point is 0. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
+ +### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +SQL for query: + +```sql +select quantile(s0, "rank"="0.2", "K"="800") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|quantile(root.test.s0, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +``` + +## Period + +### Usage + +The function is used to compute the period of a numeric time series. + +**Name:** PERIOD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is INT32. There is only one data point in the series, whose timestamp is 0 and value is the period. + +### Examples + +Input series: + + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select period(s1) from root.test.d3 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +## QLB + +### Usage + +This function is used to calculate Ljung-Box statistics $Q_{LB}$ for time series, and convert it to p value. + +**Name:** QLB + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters**: + +`lag`: max lag to calculate. Legal input shall be integer from 1 to n-2, where n is the sample number. Default value is n-2. 
+ +**Output Series:** Output a single series. The type is DOUBLE. The output series is p value, and timestamp means lag. + +**Note:** If you want to calculate Ljung-Box statistics $Q_{LB}$ instead of p value, you may use ACF function. + +### Examples + +#### Using Default Parameter + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select QLB(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| +|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +## Resample + +### Usage + +This function is used to resample the input series according to a given frequency, +including up-sampling and down-sampling. +Currently, the supported up-sampling methods are +NaN (filling with `NaN`), +FFill (filling with previous value), +BFill (filling with next value) and +Linear (filling with linear interpolation). +Down-sampling relies on group aggregation, +which supports Max, Min, First, Last, Mean and Median. + +**Name:** RESAMPLE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + + ++ `every`: The frequency of resampling, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. ++ `interp`: The interpolation method of up-sampling, which is 'NaN', 'FFill', 'BFill' or 'Linear'. By default, NaN is used. 
++ `aggr`: The aggregation method of down-sampling, which is 'Max', 'Min', 'First', 'Last', 'Mean' or 'Median'. By default, Mean is used. ++ `start`: The start time (inclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the first valid data point. ++ `end`: The end time (exclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the last valid data point. + +**Output Series:** Output a single series. The type is DOUBLE. It is strictly equispaced with the frequency `every`. + +**Note:** `NaN` in the input series will be ignored. + +### Examples + +#### Up-sampling + +When the frequency of resampling is higher than the original frequency, up-sampling starts. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +SQL for query: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +#### Down-sampling + +When the frequency of resampling is lower than the original frequency, down-sampling starts. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + + +#### Specify the time period + +The time period of resampling can be specified with `start` and `end`. +The period outside the actual time range will be interpolated. 
+ +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +## Sample + +### Usage + +This function is used to sample the input series, +that is, select a specified number of data points from the input series and output them. +Currently, three sampling methods are supported: +**Reservoir sampling** randomly selects data points. +All of the points have the same probability of being sampled. +**Isometric sampling** selects data points at equal index intervals. +**Triangle sampling** assigns data points to the buckets based on the number of sampling. +Then it calculates the area of the triangle based on these points inside the bucket and selects the point with the largest area of the triangle. +For more detail, please read [paper](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf) + +**Name:** SAMPLE + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Parameters:** + ++ `method`: The method of sampling, which is 'reservoir', 'isometric' or 'triangle'. By default, reservoir sampling is used. ++ `k`: The number of sampling, which is a positive integer. By default, it's 1. + +**Output Series:** Output a single series. The type is the same as the input. The length of the output series is `k`. Each data point in the output series comes from the input series. + +**Note:** If `k` is greater than the length of input series, all data points in the input series will be output. + +### Examples + +#### Reservoir Sampling + +When `method` is 'reservoir' or the default, reservoir sampling is used. +Due to the randomness of this method, the output series shown below is only a possible result. 
+ + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + +#### Isometric Sampling + +When `method` is 'isometric', isometric sampling is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +## Segment + +### Usage + +This function is used to segment a time series into subsequences according to linear trend, and returns linear fitted values of first values in each subsequence or every data point. + +**Name:** SEGMENT + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `output` :"all" to output all fitted points; "first" to output first fitted points in each subsequence. + ++ `error`: error allowed at linear regression. It is defined as mean absolute error of a subsequence. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note:** This function treat input series as equal-interval sampled. All data are loaded, so downsample input series first if there are too many data points. 
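+
+One way to realize such a segmentation is a greedy left-to-right scan that extends the current segment while a least-squares line still fits within `error`. The sketch below (not IoTDB code, and not necessarily the strategy used by the UDF) illustrates the `output`="first" behaviour:
+
+```python
+import numpy as np
+
+def segment_first(values, error=0.1):
+    out = []                                   # (start index, fitted value of first point)
+    start, n = 0, len(values)
+    while start < n:
+        end, first_fit = start + 1, float(values[start])
+        while end < n:
+            xs = np.arange(start, end + 1)
+            ys = np.asarray(values[start:end + 1], dtype=float)
+            k, b = np.polyfit(xs, ys, 1)       # least-squares line over the candidate segment
+            if np.mean(np.abs(k * xs + b - ys)) > error:
+                break                          # adding this point would violate `error`
+            first_fit = k * start + b
+            end += 1
+        out.append((start, first_fit))
+        start = end
+    return out
+```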
+ +### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select segment(s1, "error"="0.1") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +## Skew + +### Usage + +This function is used to calculate the population skewness. + +**Name:** SKEW + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population skewness. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
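+
+For reference, the population skewness computed here is the standardized third central moment of the valid points $x_1,\dots,x_n$ with mean $\bar{x}$:
+$$skew = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{3/2}}$$
+This expression reproduces the value of about $-0.9998$ in the example below.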
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select skew(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +## Spline + +### Usage + +This function is used to calculate cubic spline interpolation of input series. + +**Name:** SPLINE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `points`: Number of resampling points. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note**: Output series retains the first and last timestamps of input series. Interpolation points are selected at equal intervals. The function tries to calculate only when there are no less than 4 points in input series. 
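+
+A comparable result can be sketched with SciPy (not IoTDB code; SciPy's default "not-a-knot" boundary condition is an assumption here, so values near the ends may differ slightly from the UDF's output):
+
+```python
+import numpy as np
+from scipy.interpolate import CubicSpline
+
+def spline_resample(times_ms, values, points):
+    cs = CubicSpline(times_ms, values)                       # cubic spline through the input points
+    new_t = np.linspace(times_ms[0], times_ms[-1], points)   # equally spaced, endpoints preserved
+    return new_t, cs(new_t)
+
+# Resample the 10-point example below onto 151 equally spaced timestamps
+t = [0, 300, 500, 700, 900, 1100, 1200, 1300, 1400, 1500]
+v = [0.0, 1.2, 1.7, 2.0, 2.1, 2.0, 1.8, 1.2, 1.0, 1.6]
+print(spline_resample(t, v, 151)[1][:5])
+```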
+ +### Examples + +#### Assigning number of interpolation points + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select spline(s1, "points"="151") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| +|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 
1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| +|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 
1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +## Spread + +### Usage + +This function is used to calculate the spread of time series, that is, the maximum value minus the minimum value. + +**Name:** SPREAD + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is 0 and value is the spread. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + +## Stddev + +### Usage + +This function is used to calculate the population standard deviation. + +**Name:** STDDEV + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population standard deviation. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. + +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select stddev(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|stddev(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 5.7662812973353965| ++-----------------------------+-----------------------+ +``` + +## ZScore + +### Usage + +This function is used to standardize the input series with z-score. + +**Name:** ZSCORE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `compute`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide mean and standard deviation. The default method is "batch". ++ `avg`: Mean value when method is set to "stream". 
++ `sd`: Standard deviation when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. + +### Examples + +#### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select zscore(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Quality.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Quality.md new file mode 100644 index 00000000..ffc8ee6c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Quality.md @@ -0,0 +1,574 @@ + + +# Data Quality + +## Completeness + +### Usage + +This function is used to calculate the completeness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the completeness of each window will be output. + +**Name:** COMPLETENESS + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. 
The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. ++ `downtime`: Whether the downtime exception is considered in the calculation of completeness. It is 'true' or 'false' (default). When considering the downtime exception, long-term missing data will be considered as downtime exception without any influence on completeness. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +### Examples + +#### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +#### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
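+
+The window may be specified either as a point count, as in the example below, or as a duration with a time unit. A hypothetical duration-based query (not part of the original example) might look like:
+
+```sql
+select completeness(s1, "window"="30s") from root.test.d1
+```
+
+The example below uses a point count instead.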
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +## Consistency + +### Usage + +This function is used to calculate the consistency of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the consistency of each window will be output. + +**Name:** CONSISTENCY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +### Examples + +#### Default Parameters + +With default parameters, this function will regard all input data as the same window. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +#### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +## Timeliness + +### Usage + +This function is used to calculate the timeliness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the timeliness of each window will be output. 
+ +**Name:** TIMELINESS + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +### Examples + +#### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +#### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| ++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +## Validity + +### Usage + +This function is used to calculate the Validity of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the Validity of each window will be output. + +**Name:** VALIDITY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +### Examples + +#### Default Parameters + +With default parameters, this function will regard all input data as the same window. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +#### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|validity(root.test.d1.s1, "window"="15")| ++-----------------------------+----------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +## Accuracy + +### Usage + +This function is used to calculate the Accuracy of time series based on master data. + +**Name**: Accuracy + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `omega`: The window size. 
It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences. ++ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows. ++ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple dis- tance of the k-th nearest neighbor in the master data. + +**Output Series**: Output a single value. The type is DOUBLE. The range is [0,1]. + +### Examples + +Input series: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +SQL for query: + +```sql +select Accuracy(t1,t2,t3,m1,m2,m3) from root.test +``` + +Output series: + + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|Accuracy(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+---------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 0.875| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Repairing.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Repairing.md new file mode 100644 index 00000000..67e08cf8 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Repairing.md @@ -0,0 +1,520 @@ + + +# Data Repairing + +## TimestampRepair + +This function is used for timestamp repair. +According to the given standard time interval, +the method of minimizing the repair cost is adopted. +By fine-tuning the timestamps, +the original data with unstable timestamp interval is repaired to strictly equispaced data. +If no standard time interval is given, +this function will use the **median**, **mode** or **cluster** of the time interval to estimate the standard time interval. + +**Name:** TIMESTAMPREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `interval`: The standard time interval whose unit is millisecond. It is a positive integer. By default, it will be estimated according to the given method. ++ `method`: The method to estimate the standard time interval, which is 'median', 'mode' or 'cluster'. 
This parameter is only valid when `interval` is not given. By default, median will be used. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +### Examples + +#### Manually Specify the Standard Time Interval + +When `interval` is given, this function repairs according to the given standard time interval. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:19.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:01.000+08:00| 7.0| +|2021-07-01T12:01:11.000+08:00| 8.0| +|2021-07-01T12:01:21.000+08:00| 9.0| +|2021-07-01T12:01:31.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timestamprepair(s1,'interval'='10000') from root.test.d2 +``` + +Output series: + + +``` ++-----------------------------+----------------------------------------------------+ +| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| ++-----------------------------+----------------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+----------------------------------------------------+ +``` + +#### Automatically Estimate the Standard Time Interval + +When `interval` is default, this function estimates the standard time interval. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select timestamprepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------+ +| Time|timestamprepair(root.test.d2.s1)| ++-----------------------------+--------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +## ValueFill + +### Usage + +This function is used to impute time series. Several methods are supported. + +**Name**: ValueFill +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, default "linear". + Method to use for imputation in series. "mean": use global mean value to fill holes; "previous": propagate last valid observation forward to next valid. "linear": simplest interpolation method; "likelihood":Maximum likelihood estimation based on the normal distribution of speed; "AR": auto regression; "MA": moving average; "SCREEN": speed constraint. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. 
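+
+Besides the linear and previous methods shown in the examples below, the remaining methods are selected in the same way through the `method` parameter. For instance, a hypothetical mean-fill query could be written as:
+
+```sql
+select valuefill(s1, "method"="mean") from root.test.d2
+```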
+ +**Note:** AR method use AR(1) model. Input value should be auto-correlated, or the function would output a single point (0, 0.0). + +### Examples + +#### Fill with linear + +When `method` is "linear" or the default, Screen method is used to impute. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuefill(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +#### Previous Fill + +When `method` is "previous", previous method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +## ValueRepair + +### Usage + +This function is used to repair the value of the time series. +Currently, two methods are supported: +**Screen** is a method based on speed threshold, which makes all speeds meet the threshold requirements under the premise of minimum changes; +**LsGreedy** is a method based on speed change likelihood, which models speed changes as Gaussian distribution, and uses a greedy algorithm to maximize the likelihood. 
+ + +**Name:** VALUEREPAIR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The method used to repair, which is 'Screen' or 'LsGreedy'. By default, Screen is used. ++ `minSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds below it will be regarded as outliers. By default, it is the median minus 3 times of median absolute deviation. ++ `maxSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds above it will be regarded as outliers. By default, it is the median plus 3 times of median absolute deviation. ++ `center`: This parameter is only valid with LsGreedy. It is the center of the Gaussian distribution of speed changes. By default, it is 0. ++ `sigma`: This parameter is only valid with LsGreedy. It is the standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +### Examples + +#### Repair with Screen + +When `method` is 'Screen' or the default, Screen method is used. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +#### Repair with LsGreedy + +When `method` is 'LsGreedy', LsGreedy method is used. 
+
+Input series is the same as above, the SQL for query is shown below:
+
+```sql
+select valuerepair(s1,'method'='LsGreedy') from root.test.d2
+```
+
+Output series:
+
+```
++-----------------------------+-------------------------------------------------+
+| Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")|
++-----------------------------+-------------------------------------------------+
+|2020-01-01T00:00:02.000+08:00| 100.0|
+|2020-01-01T00:00:03.000+08:00| 101.0|
+|2020-01-01T00:00:04.000+08:00| 102.0|
+|2020-01-01T00:00:06.000+08:00| 104.0|
+|2020-01-01T00:00:08.000+08:00| 106.0|
+|2020-01-01T00:00:10.000+08:00| 108.0|
+|2020-01-01T00:00:14.000+08:00| 112.0|
+|2020-01-01T00:00:15.000+08:00| 113.0|
+|2020-01-01T00:00:16.000+08:00| 114.0|
+|2020-01-01T00:00:18.000+08:00| 116.0|
+|2020-01-01T00:00:20.000+08:00| 118.0|
+|2020-01-01T00:00:22.000+08:00| 120.0|
+|2020-01-01T00:00:26.000+08:00| 124.0|
+|2020-01-01T00:00:28.000+08:00| 126.0|
+|2020-01-01T00:00:30.000+08:00| 128.0|
++-----------------------------+-------------------------------------------------+
+```
+
+## MasterRepair
+
+### Usage
+
+This function is used to clean time series with master data.
+
+**Name:** MasterRepair
+
+**Input Series:** Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `omega`: The window size. It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences.
++ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows.
++ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data.
++ `output_column`: The repaired column to output. It defaults to 1, which means the repair result of the first column is output.
+
+**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing.
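+
+For instance, to obtain the repaired values of the second input column instead of the first, the `output_column` parameter can be set accordingly. A hypothetical query, using the same series as the example below, would be:
+
+```sql
+select MasterRepair(t1,t2,t3,m1,m2,m3,'output_column'='2') from root.test
+```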
+ +### Examples + +Input series: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +SQL for query: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +Output series: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +## SeasonalRepair + +### Usage +This function is used to repair the value of the seasonal time series via decomposition. Currently, two methods are supported: **Classical** - detect irregular fluctuations through residual component decomposed by classical decomposition, and repair them through moving average; **Improved** - detect irregular fluctuations through residual component decomposed by improved decomposition, and repair them through moving median. + +**Name:** SEASONALREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The decomposition method used to repair, which is 'Classical' or 'Improved'. By default, classical decomposition is used. ++ `period`: It is the period of the time series. ++ `k`: It is the range threshold of residual term, which limits the degree to which the residual term is off-center. By default, it is 9. ++ `max_iter`: It is the maximum number of iterations for the algorithm. By default, it is 10. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +### Examples + +#### Repair with Classical + +When `method` is 'Classical' or default value, classical decomposition method is used. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +#### Repair with Improved +When `method` is 'Improved', improved decomposition method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Frequency-Domain.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Frequency-Domain.md new file mode 100644 index 00000000..e198c856 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Frequency-Domain.md @@ -0,0 +1,672 @@ + + +# Frequency Domain Analysis + +## Conv + +### Usage + +This function is used to calculate the convolution, i.e. polynomial multiplication. + +**Name:** CONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output:** Output a single series. The type is DOUBLE. It is the result of convolution whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. 
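+
+In other words, if the two inputs are read as polynomial coefficient sequences $a_0, a_1, \dots$ and $b_0, b_1, \dots$ in timestamp order, the output value at position $k$ is $c_k = \sum_{i} a_i b_{k-i}$, i.e. the coefficient of $x^k$ in the product polynomial, as the example below illustrates.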
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +## Deconv + +### Usage + +This function is used to calculate the deconvolution, i.e. polynomial division. + +**Name:** DECONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `result`: The result of deconvolution, which is 'quotient' or 'remainder'. By default, the quotient will be output. + +**Output:** Output a single series. The type is DOUBLE. It is the result of deconvolving the second series from the first series (dividing the first series by the second series) whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +### Examples + + +#### Calculate the quotient + +When `result` is 'quotient' or the default, this function calculates the quotient of the deconvolution. + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s3|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| null| +|1970-01-01T08:00:00.003+08:00| 2.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select deconv(s3,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2)| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +#### Calculate the remainder + +When `result` is 'remainder', this function calculates the remainder of the deconvolution. 
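+
+Together with the quotient computed above, the remainder satisfies the usual polynomial identity dividend = divisor × quotient + remainder. For these example series, $(7+2x)(1+x^2)+1 = 8+2x+7x^2+2x^3$, which is exactly `root.test.d2.s3`, as the remainder output below confirms.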
+ +Input series is the same as above, the SQL for query is shown below: + + +```sql +select deconv(s3,s2,'result'='remainder') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| ++-----------------------------+--------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 0.0| ++-----------------------------+--------------------------------------------------------------+ +``` + +## DWT + +### Usage + +This function is used to calculate 1d discrete wavelet transform of a numerical series. + +**Name:** DWT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of wavelet. May select 'Haar', 'DB4', 'DB6', 'DB8', where DB means Daubechies. User may offer coefficients of wavelet transform and ignore this parameter. Case ignored. ++ `coef`: Coefficients of wavelet transform. When providing this parameter, use comma ',' to split them, and leave no spaces or other punctuations. ++ `layer`: Times to transform. The number of output vectors equals $layer+1$. Default is 1. + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. + +**Note:** The length of input series must be an integer number power of 2. + +### Examples + + +#### Haar wavelet transform + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| +|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| 
++-----------------------------+-------------------------------------+ +``` + +## FFT + +### Usage + +This function is used to calculate the fast Fourier transform (FFT) of a numerical series. + +**Name:** FFT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of FFT, which is 'uniform' (by default) or 'nonuniform'. If the value is 'uniform', the timestamps will be ignored and all data points will be regarded as equidistant. Thus, the equidistant fast Fourier transform algorithm will be applied. If the value is 'nonuniform' (TODO), the non-equidistant fast Fourier transform algorithm will be applied based on timestamps. ++ `result`: The result of FFT, which is 'real', 'imag', 'abs' or 'angle', corresponding to the real part, imaginary part, magnitude and phase angle. By default, the magnitude will be output. ++ `compress`: The parameter of compression, which is within (0,1]. It is the reserved energy ratio of lossy compression. By default, there is no compression. + + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. The timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +### Examples + + +#### Uniform FFT + +With the default `type`, uniform FFT is applied. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select fft(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 
19.999999960195904|
+|1970-01-01T08:00:00.017+08:00|  8.723291747109068E-8|
+|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7|
+|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8|
++-----------------------------+----------------------+
+```
+
+Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, there are peaks in $k=4$ and $k=5$ of the output.
+
+#### Uniform FFT with Compression
+
+The input series is the same as above, and the SQL for the query is shown below:
+
+```sql
+select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1
+```
+
+Output series:
+
+```
++-----------------------------+----------------------+----------------------+
+|                         Time|  fft(root.test.d1.s1,|  fft(root.test.d1.s1,|
+|                             |      "result"="real",|      "result"="imag",|
+|                             |    "compress"="0.99")|    "compress"="0.99")|
++-----------------------------+----------------------+----------------------+
+|1970-01-01T08:00:00.000+08:00|                   0.0|                   0.0|
+|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8|
+|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7|
+|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8|  5.127422242345858E-8|
+|1970-01-01T08:00:00.004+08:00|    19.021130288047125|    -6.180339875198807|
+|1970-01-01T08:00:00.005+08:00|     9.999999850988388| 3.501852745067114E-16|
+|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8|
++-----------------------------+----------------------+----------------------+
+```
+
+Note: Based on the conjugation of the Fourier transform result, only the first half of the compression result is reserved.
+Data points are reserved from low frequency to high frequency until the reserved energy ratio exceeds the given `compress` value.
+The last data point is reserved to indicate the length of the series.
+
+## HighPass
+
+### Usage
+
+This function performs high-pass filtering on the input series and extracts components above the cutoff frequency.
+The timestamps of input will be ignored and all data points will be regarded as equidistant.
+
+**Name:** HIGHPASS
+
+**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `wpass`: The normalized cutoff frequency, whose value is within (0,1). This parameter is required.
+
+**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input.
+
+**Note:** `NaN` in the input series will be ignored.
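+
+For intuition, with equidistant samples a high-pass filter keeps only the spectral content above the normalized cutoff. The Python/numpy sketch below illustrates this with an ideal frequency-domain filter; it reproduces the behavior of the example that follows, but it is only an illustration and not necessarily how the UDF is implemented internally.
+
+```python
+import numpy as np
+
+def ideal_highpass(values, wpass):
+    # wpass is normalized so that 1.0 corresponds to the Nyquist frequency,
+    # matching the (0,1) range documented for the 'wpass' parameter.
+    x = np.asarray(values, dtype=float)
+    spectrum = np.fft.rfft(x)
+    freqs = np.fft.rfftfreq(len(x), d=1.0)   # 0 .. 0.5 cycles per sample
+    spectrum[freqs < wpass * 0.5] = 0.0      # drop components below the cutoff
+    return np.fft.irfft(spectrum, n=len(x))
+
+# Same signal as the example below: y = sin(2*pi*t/4) + 2*sin(2*pi*t/5), 20 points.
+t = np.arange(20)
+y = np.sin(2 * np.pi * t / 4) + 2 * np.sin(2 * np.pi * t / 5)
+print(np.round(ideal_highpass(y, 0.45), 7))  # ~ sin(2*pi*t/4)
+```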
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=sin(2\pi t/4)$ after high-pass filtering. + +## IFFT + +### Usage + +This function treats the two input series as the real and imaginary part of a complex series, performs an inverse fast Fourier transform (IFFT), and outputs the real part of the result. +For the input format, please refer to the output format of `FFT` function. +Moreover, the compressed output of `FFT` function is also supported. + +**Name:** IFFT + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `start`: The start time of the output series with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is '1970-01-01 08:00:00'. ++ `interval`: The interval of the output series, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it is 1s. 
+ +**Output:** Output a single series. The type is DOUBLE. It is strictly equispaced. The values are the results of IFFT. + +**Note:** If a row contains null points or `NaN`, it will be ignored. + +### Examples + + +Input series: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +SQL for query: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| ++-----------------------------+-------------------------------------------------------+ +``` + +## LowPass + +### Usage + +This function performs low-pass filtering on the input series and extracts components below the cutoff frequency. +The timestamps of input will be ignored and all data points will be regarded as equidistant. + +**Name:** LOWPASS + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `wpass`: The normalized cutoff frequency which values (0,1). This parameter cannot be lacked. + +**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input. + +**Note:** `NaN` in the input series will be ignored. 
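+
+Analogously to the high-pass sketch above, an ideal low-pass filter can be illustrated in Python/numpy by keeping only the bins below the cutoff (again an illustration only, not necessarily the internal implementation):
+
+```python
+import numpy as np
+
+def ideal_lowpass(values, wpass):
+    # Keep only the components below the normalized cutoff (1.0 = Nyquist frequency).
+    x = np.asarray(values, dtype=float)
+    spectrum = np.fft.rfft(x)
+    freqs = np.fft.rfftfreq(len(x), d=1.0)
+    spectrum[freqs > wpass * 0.5] = 0.0      # drop components above the cutoff
+    return np.fft.irfft(spectrum, n=len(x))
+
+t = np.arange(20)
+y = np.sin(2 * np.pi * t / 4) + 2 * np.sin(2 * np.pi * t / 5)
+print(np.round(ideal_lowpass(y, 0.45), 7))   # ~ 2*sin(2*pi*t/5)
+```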
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=2sin(2\pi t/5)$ after low-pass filtering. + + +## Envelope + +### Usage + +This function can demodulate the signal and extract the envelope by inputting the one-dimensional floating-point number set and the modulation frequency specified by the user. The goal of demodulation is to extract parts of interest from complex signals and make them easier to understand. For example, demodulation can find the envelope of the signal, that is, the trend of amplitude change. + +**Name:** Envelope + +**Input:** Only a single input sequence is supported. The type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameters:** + ++ `frequency`:Modulation frequency (optional, positive). Without this parameter, the system will infer the frequency based on the time interval of the corresponding time of the sequence. ++ `amplification`: Amplification multiple (optional, a positive integer. 
The result of the output Time column is a collection of positive integers, with no decimals. When the frequency is less than 1, the frequency can be amplified by this parameter to show normal results). + +**Output:** ++ `Time`: The value returned in this column means frequency, not time. If the output format is time (for example, 1970-01-01T08:00:19.000+08:00), convert it to a timestamp value. + ++ `Envelope(Path, 'frequency'='{frequency}')`:Output a single sequence of type DOUBLE, which is the result of envelope analysis. + +**Note:** When the values of the demodulated original sequence are not continuous, this function is treated as continuous, and it is recommended that the analyzed time series be a complete time series. Specify the start time and end time. + +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:01.000+08:00| 1.0 | +|1970-01-01T08:00:02.000+08:00| 2.0 | +|1970-01-01T08:00:03.000+08:00| 3.0 | +|1970-01-01T08:00:04.000+08:00| 4.0 | +|1970-01-01T08:00:05.000+08:00| 5.0 | +|1970-01-01T08:00:06.000+08:00| 6.0 | +|1970-01-01T08:00:07.000+08:00| 7.0 | +|1970-01-01T08:00:08.000+08:00| 8.0 | +|1970-01-01T08:00:09.000+08:00| 9.0 | +|1970-01-01T08:00:10.000+08:00| 10.0 | ++-----------------------------+---------------+ +``` + +SQL for query: +```sql +set time_display_type=long; +select envelope(s1),envelope(s1,'frequency'='1000'),envelope(s1,'amplification'='10') from root.test.d1; +``` +Output series: + +``` ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +|Time|envelope(root.test.d1.s1)|envelope(root.test.d1.s1, "frequency"="1000")|envelope(root.test.d1.s1, "amplification"="10")| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +| 0| 6.284350808484124| 6.284350808484124| 6.284350808484124| +| 100| 1.5581923657404393| 1.5581923657404393| null| +| 200| 0.8503211038340728| 0.8503211038340728| null| +| 300| 0.512808785945551| 0.512808785945551| null| +| 400| 0.26361156774506744| 0.26361156774506744| null| +|1000| null| null| 1.5581923657404393| +|2000| null| null| 0.8503211038340728| +|3000| null| null| 0.512808785945551| +|4000| null| null| 0.26361156774506744| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Lambda.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Lambda.md new file mode 100644 index 00000000..4c7dfd85 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Lambda.md @@ -0,0 +1,77 @@ + + +# Lambda Expression + +## JEXL Function + +Java Expression Language (JEXL) is an expression language engine. We use JEXL to extend UDFs, which are implemented on the command line with simple lambda expressions. See the link for [operators supported in jexl lambda expressions](https://commons.apache.org/proper/commons-jexl/apidocs/org/apache/commons/jexl3/package-summary.html#customization). 
+ +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Series Data Type Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr` is a lambda expression that supports standard one or multi arguments in the form `x -> {...}` or `(x, y, z) -> {...}`, e.g. ` x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | Returns the input time series transformed by a lambda expression | + +#### Demonstrate +Example data: `root.ln.wf01.wt01.temperature`, `root.ln.wf01.wt01.st`, `root.ln.wf01.wt01.str` a total of `11` data. + +``` +IoTDB> select * from root.ln.wf01.wt01; ++-----------------------------+---------------------+--------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.str|root.ln.wf01.wt01.st|root.ln.wf01.wt01.temperature| ++-----------------------------+---------------------+--------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| str| 10.0| 0.0| +|1970-01-01T08:00:00.001+08:00| str| 20.0| 1.0| +|1970-01-01T08:00:00.002+08:00| str| 30.0| 2.0| +|1970-01-01T08:00:00.003+08:00| str| 40.0| 3.0| +|1970-01-01T08:00:00.004+08:00| str| 50.0| 4.0| +|1970-01-01T08:00:00.005+08:00| str| 60.0| 5.0| +|1970-01-01T08:00:00.006+08:00| str| 70.0| 6.0| +|1970-01-01T08:00:00.007+08:00| str| 80.0| 7.0| +|1970-01-01T08:00:00.008+08:00| str| 90.0| 8.0| +|1970-01-01T08:00:00.009+08:00| str| 100.0| 9.0| +|1970-01-01T08:00:00.010+08:00| str| 110.0| 10.0| ++-----------------------------+---------------------+--------------------+-----------------------------+ +``` +Sql: +```sql +select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01;``` +``` + +Result: +``` ++-----------------------------+-----+-----+-----+------+-----+--------+ +| Time|jexl1|jexl2|jexl3| jexl4|jexl5| jexl6| ++-----------------------------+-----+-----+-----+------+-----+--------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 0.0| 0.0| 10.0| 10.0str| +|1970-01-01T08:00:00.001+08:00| 2.0| 3.0| 1.0| 100.0| 21.0| 21.0str| +|1970-01-01T08:00:00.002+08:00| 4.0| 6.0| 4.0| 200.0| 32.0| 32.0str| +|1970-01-01T08:00:00.003+08:00| 6.0| 9.0| 9.0| 300.0| 43.0| 43.0str| +|1970-01-01T08:00:00.004+08:00| 8.0| 12.0| 16.0| 400.0| 54.0| 54.0str| +|1970-01-01T08:00:00.005+08:00| 10.0| 15.0| 25.0| 500.0| 65.0| 65.0str| +|1970-01-01T08:00:00.006+08:00| 12.0| 18.0| 36.0| 600.0| 76.0| 76.0str| +|1970-01-01T08:00:00.007+08:00| 14.0| 21.0| 49.0| 700.0| 87.0| 87.0str| +|1970-01-01T08:00:00.008+08:00| 16.0| 24.0| 64.0| 800.0| 98.0| 98.0str| +|1970-01-01T08:00:00.009+08:00| 18.0| 27.0| 81.0| 900.0|109.0|109.0str| +|1970-01-01T08:00:00.010+08:00| 20.0| 30.0|100.0|1000.0|120.0|120.0str| ++-----------------------------+-----+-----+-----+------+-----+--------+ +Total line number = 11 +It costs 0.118s +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Logical.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Logical.md new file mode 100644 index 00000000..1524870a --- /dev/null +++ 
b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Logical.md @@ -0,0 +1,63 @@ + + +# Logical Operators + +## Unary Logical Operators + +Supported operator `!` + +Supported input data types: `BOOLEAN` + +Output data type: `BOOLEAN` + +Hint: the priority of `!` is the same as `-`. Remember to use brackets to modify priority. + +## Binary Logical Operators + +Supported operators AND:`and`,`&`, `&&`; OR:`or`,`|`,`||` + +Supported input data types: `BOOLEAN` + +Output data type: `BOOLEAN` + +Note: Only when the left operand and the right operand under a certain timestamp are both `BOOLEAN` type, the binary logic operation will have an output value. + +**Example:** + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +运行结果 +``` +IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| +|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| +|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| +|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| +|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| +|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Machine-Learning.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Machine-Learning.md new file mode 100644 index 00000000..b71604b4 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Machine-Learning.md @@ -0,0 +1,207 @@ + + +# Machine Learning + +## AR + +### Usage + +This function is used to learn the coefficients of the autoregressive models for a time series. + +**Name:** AR + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `p`: The order of the autoregressive model. Its default value is 1. + +**Output Series:** Output a single series. The type is DOUBLE. The first line corresponds to the first order coefficient, and so on. + +**Note:** + +- Parameter `p` should be a positive integer. +- Most points in the series should be sampled at a constant time interval. +- Linear interpolation is applied for the missing points in the series. 
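+
+For intuition, coefficients of this kind can be obtained by solving the Yule-Walker equations built from sample autocorrelations. The Python/numpy sketch below reproduces the coefficients of the example that follows; it is an illustration only and may differ from the exact estimation procedure used by the UDF.
+
+```python
+import numpy as np
+
+def ar_coefficients(values, p=1):
+    # Solve the Yule-Walker equations, using autocovariances normalized by (n - lag).
+    # No mean removal here; the example series already has zero mean.
+    x = np.asarray(values, dtype=float)
+    n = len(x)
+    r = np.array([np.dot(x[: n - k], x[k:]) / (n - k) for k in range(p + 1)])
+    toeplitz = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
+    return np.linalg.solve(toeplitz, r[1 : p + 1])
+
+series = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
+print(np.round(ar_coefficients(series, p=2), 4))  # [ 0.9429 -0.2571]
+```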
+ +### Examples + +#### Assigning Model Order + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +## Representation + +### Usage + +This function is used to represent a time series. + +**Name:** Representation + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is INT32. The length is `tb*vb`. The timestamps starting from 0 only indicate the order. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. + +### Examples + +#### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +## RM + +### Usage + +This function is used to calculate the matching score of two time series according to the representation. + +**Name:** RM + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the matching score. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. 
+ +### Examples + +#### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| ++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Mathematical.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Mathematical.md new file mode 100644 index 00000000..6273fb07 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Mathematical.md @@ -0,0 +1,134 @@ + + +# Arithmetic Operators and Functions + +## Arithmetic Operators + +### Unary Arithmetic Operators + +Supported operators: `+`, `-` + +Supported input data types: `INT32`, `INT64` and `FLOAT` + +Output data type: consistent with the input data type + +### Binary Arithmetic Operators + +Supported operators: `+`, `-`, `*`, `/`, `%` + +Supported input data types: `INT32`, `INT64`, `FLOAT` and `DOUBLE` + +Output data type: `DOUBLE` + +Note: Only when the left operand and the right operand under a certain timestamp are not `null`, the binary arithmetic operation will have an output value. 
+ +### Example + +```sql +select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 +``` + +Result: + +``` ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +| Time|root.sg.d1.s1|-root.sg.d1.s1|root.sg.d1.s2|root.sg.d1.s2|root.sg.d1.s1 + root.sg.d1.s2|root.sg.d1.s1 - root.sg.d1.s2|root.sg.d1.s1 * root.sg.d1.s2|root.sg.d1.s1 / root.sg.d1.s2|root.sg.d1.s1 % root.sg.d1.s2| ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| -1.0| 1.0| 1.0| 2.0| 0.0| 1.0| 1.0| 0.0| +|1970-01-01T08:00:00.002+08:00| 2.0| -2.0| 2.0| 2.0| 4.0| 0.0| 4.0| 1.0| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.0| -3.0| 3.0| 3.0| 6.0| 0.0| 9.0| 1.0| 0.0| +|1970-01-01T08:00:00.004+08:00| 4.0| -4.0| 4.0| 4.0| 8.0| 0.0| 16.0| 1.0| 0.0| +|1970-01-01T08:00:00.005+08:00| 5.0| -5.0| 5.0| 5.0| 10.0| 0.0| 25.0| 1.0| 0.0| ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.014s +``` + +## Arithmetic Functions + +Currently, IoTDB supports the following mathematical functions. The behavior of these mathematical functions is consistent with the behavior of these functions in the Java Math standard library. + +| Function Name | Allowed Input Series Data Types | Output Series Data Type | Required Attributes | Corresponding Implementation in the Java Standard Library | +| ------------- | ------------------------------- | ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sin(double) | +| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cos(double) | +| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tan(double) | +| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#asin(double) | +| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#acos(double) | +| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#atan(double) | +| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sinh(double) | +| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cosh(double) | +| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tanh(double) | +| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toDegrees(double) | +| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toRadians(double) | +| ABS | INT32 / INT64 / FLOAT / DOUBLE | Same type as the input series | / | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | +| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#signum(double) | +| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#ceil(double) | +| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#floor(double) | +| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | 'places' : Round the significant number, positive number is the significant number after the decimal point, negative number is the significant number of whole 
number | Math#rint(Math#pow(10,places))/Math#pow(10,places) | +| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#exp(double) | +| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log(double) | +| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log10(double) | +| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sqrt(double) | + +Example: + +``` sql +select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; +``` + +Result: + +``` ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +| Time| root.sg1.d1.s1|sin(root.sg1.d1.s1)| cos(root.sg1.d1.s1)|tan(root.sg1.d1.s1)| ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +|2020-12-10T17:11:49.037+08:00|7360723084922759782| 0.8133527237573284| 0.5817708713544664| 1.3980636773094157| +|2020-12-10T17:11:49.038+08:00|4377791063319964531|-0.8938962705202537| 0.4482738644511651| -1.994085181866842| +|2020-12-10T17:11:49.039+08:00|7972485567734642915| 0.9627757585308978|-0.27030138509681073|-3.5618602479083545| +|2020-12-10T17:11:49.040+08:00|2508858212791964081|-0.6073417341629443| -0.7944406950452296| 0.7644897069734913| +|2020-12-10T17:11:49.041+08:00|2817297431185141819|-0.8419358900502509| -0.5395775727782725| 1.5603611649667768| ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +Total line number = 5 +It costs 0.008s +``` + +### ROUND +Example: +```sql +select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1 +``` + +```sql ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +| Time|root.db.d1.s4|ROUND(root.db.d1.s4)|ROUND(root.db.d1.s4,2)|ROUND(root.db.d1.s4,-1)| ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +|1970-01-01T08:00:00.001+08:00| 101.14345| 101.0| 101.14| 100.0| +|1970-01-01T08:00:00.002+08:00| 20.144346| 20.0| 20.14| 20.0| +|1970-01-01T08:00:00.003+08:00| 20.614372| 21.0| 20.61| 20.0| +|1970-01-01T08:00:00.005+08:00| 20.814346| 21.0| 20.81| 20.0| +|1970-01-01T08:00:00.006+08:00| 60.71443| 61.0| 60.71| 60.0| +|2023-03-13T16:16:19.764+08:00| 10.143425| 10.0| 10.14| 10.0| ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +Total line number = 6 +It costs 0.059s +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Overview.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Overview.md new file mode 100644 index 00000000..0dc7e074 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Overview.md @@ -0,0 +1,287 @@ + + +# Overview + +This chapter describes the operators and functions supported by IoTDB. IoTDB provides a wealth of built-in operators and functions to meet your computing needs, and supports extensions through the [User-Defined Function](./User-Defined-Function.md). + +A list of all available functions, both built-in and custom, can be displayed with `SHOW FUNCTIONS` command. + +See the documentation [Select-Expression](../Query-Data/Select-Expression.md) for the behavior of operators and functions in SQL. 
+ +## Operators + +### Arithmetic Operators + +| Operator | Meaning | +| -------- | ------------------------- | +| `+` | positive (unary operator) | +| `-` | negative (unary operator) | +| `*` | multiplication | +| `/` | division | +| `%` | modulo | +| `+` | addition | +| `-` | subtraction | + +For details and examples, see the document [Arithmetic Operators and Functions](./Mathematical.md). + +### Comparison Operators + +| Operator | Meaning | +| ------------------------- | ------------------------------------ | +| `>` | greater than | +| `>=` | greater than or equal to | +| `<` | less than | +| `<=` | less than or equal to | +| `==` | equal to | +| `!=` / `<>` | not equal to | +| `BETWEEN ... AND ...` | within the specified range | +| `NOT BETWEEN ... AND ...` | not within the specified range | +| `LIKE` | match simple pattern | +| `NOT LIKE` | cannot match simple pattern | +| `REGEXP` | match regular expression | +| `NOT REGEXP` | cannot match regular expression | +| `IS NULL` | is null | +| `IS NOT NULL` | is not null | +| `IN` / `CONTAINS` | is a value in the specified list | +| `NOT IN` / `NOT CONTAINS` | is not a value in the specified list | + +For details and examples, see the document [Comparison Operators and Functions](./Comparison.md). + +### Logical Operators + +| Operator | Meaning | +| --------------------------- | --------------------------------- | +| `NOT` / `!` | logical negation (unary operator) | +| `AND` / `&` / `&&` | logical AND | +| `OR`/ | / || | logical OR | + +For details and examples, see the document [Logical Operators](./Logical.md). + +### Operator Precedence + +The precedence of operators is arranged as shown below from high to low, and operators on the same row have the same precedence. + +```sql +!, - (unary operator), + (unary operator) +*, /, DIV, %, MOD +-, + +=, ==, <=>, >=, >, <=, <, <>, != +LIKE, REGEXP, NOT LIKE, NOT REGEXP +BETWEEN ... AND ..., NOT BETWEEN ... AND ... +IS NULL, IS NOT NULL +IN, CONTAINS, NOT IN, NOT CONTAINS +AND, &, && +OR, |, || +``` + +## Built-in Functions + +The built-in functions can be used in IoTDB without registration, and the functions in the data quality function library need to be registered by referring to the registration steps in the next chapter before they can be used. + +### Aggregate Functions + +| Function Name | Description | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | +| ------------- | ------------------------------------------------------------ | ------------------------------- | ------------------------------------------------------------ | ----------------------------------- | +| SUM | Summation. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| COUNT | Counts the number of data points. | All types | / | INT | +| AVG | Average. | INT32 INT64 FLOAT DOUBLE | / | DOUBLE | +| EXTREME | Finds the value with the largest absolute value. Returns a positive value if the maximum absolute value of positive and negative values is equal. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| MAX_VALUE | Find the maximum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| MIN_VALUE | Find the minimum value. | INT32 INT64 FLOAT DOUBLE | / | Consistent with the input data type | +| FIRST_VALUE | Find the value with the smallest timestamp. | All data types | / | Consistent with input data type | +| LAST_VALUE | Find the value with the largest timestamp. 
| All data types | / | Consistent with input data type | +| MAX_TIME | Find the maximum timestamp. | All data Types | / | Timestamp | +| MIN_TIME | Find the minimum timestamp. | All data Types | / | Timestamp | +| COUNT_IF | Find the number of data points that continuously meet a given condition and the number of data points that meet the condition (represented by keep) meet the specified threshold. | BOOLEAN | `[keep >=/>/=/!=/= threshold` if `threshold` is used alone, type of `threshold` is `INT64` `ignoreNull`:Optional, default value is `true`;If the value is `true`, null values are ignored, it means that if there is a null value in the middle, the value is ignored without interrupting the continuity. If the value is `true`, null values are not ignored, it means that if there are null values in the middle, continuity will be broken | INT64 | +| TIME_DURATION | Find the difference between the timestamp of the largest non-null value and the timestamp of the smallest non-null value in a column | All data Types | / | INT64 | +| MODE | Find the mode. Note: 1.Having too many different values in the input series risks a memory exception; 2.If all the elements have the same number of occurrences, that is no Mode, return the value with earliest time; 3.If there are many Modes, return the Mode with earliest time. | All data Types | / | Consistent with the input data type | + +For details and examples, see the document [Aggregate Functions](./Aggregation.md). + +### Arithmetic Functions + +| Function Name | Allowed Input Series Data Types | Output Series Data Type | Required Attributes | Corresponding Implementation in the Java Standard Library | +| ------------- | ------------------------------- | ----------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sin(double) | +| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cos(double) | +| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tan(double) | +| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#asin(double) | +| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#acos(double) | +| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#atan(double) | +| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sinh(double) | +| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#cosh(double) | +| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#tanh(double) | +| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toDegrees(double) | +| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#toRadians(double) | +| ABS | INT32 / INT64 / FLOAT / DOUBLE | Same type as the input series | / | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | +| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#signum(double) | +| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#ceil(double) | +| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#floor(double) | +| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | 'places' : Round the significant number, positive number is the significant number after the decimal point, negative number is the significant number of whole number | Math#rint(Math#pow(10,places))/Math#pow(10,places) | +| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#exp(double) | +| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log(double) | +| LOG10 | INT32 / INT64 / 
FLOAT / DOUBLE | DOUBLE | / | Math#log10(double) |
+| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sqrt(double) |
+
+For details and examples, see the document [Arithmetic Operators and Functions](./Mathematical.md).
+
+### Comparison Functions
+
+| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description |
+| ------------- | ------------------------------- | ----------------------------------------- | ----------------------- | --------------------------------------------- |
+| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`: a double type variate | BOOLEAN | Return `ts_value >= threshold`. |
+| IN_RANGE | INT32 / INT64 / FLOAT / DOUBLE | `lower`: DOUBLE type `upper`: DOUBLE type | BOOLEAN | Return `ts_value >= lower && ts_value <= upper`. |
+
+For details and examples, see the document [Comparison Operators and Functions](./Comparison.md).
+
+### String Processing Functions
+
+| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description |
+| --------------- | ------------------------------- | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ |
+| STRING_CONTAINS | TEXT | `s`: string to search for | BOOLEAN | Checks whether the substring `s` exists in the string. |
+| STRING_MATCHES | TEXT | `regex`: Java standard library-style regular expressions. | BOOLEAN | Judges whether a string can be matched by the regular expression `regex`. |
+| LENGTH | TEXT | / | INT32 | Get the length of input series. |
+| LOCATE | TEXT | `target`: The substring to be located.
`reverse`: Indicates whether reverse locate is required. The default value is `false`, which means left-to-right locating. | INT32 | Get the position of the first occurrence of substring `target` in the input series. Returns -1 if `target` does not occur in the input. |
+| STARTSWITH | TEXT | `target`: The prefix to be checked. | BOOLEAN | Check whether the input series starts with the specified prefix `target`. |
+| ENDSWITH | TEXT | `target`: The suffix to be checked. | BOOLEAN | Check whether the input series ends with the specified suffix `target`. |
+| CONCAT | TEXT | `targets`: a series of key-value pairs; each key must start with `target` and be unique, and the value is the string you want to concatenate.
`series_behind`: Indicates whether the series value is placed after the `target` strings. The default value is `false`. | TEXT | Concatenate the input string and the `target` strings. |
+| SUBSTRING | TEXT | `from`: Indicates the start position of the substring.
`for`: Indicates the number of characters to extract for the substring. | TEXT | Extracts a substring of a string, starting with the first specified character and stopping after the specified number of characters. The index starts at 1. |
+| REPLACE | TEXT | first parameter: The target substring to be replaced.
second parameter: The substring to replace with. | TEXT | Replace a substring in the input sequence with the target substring. |
+| UPPER | TEXT | / | TEXT | Get the string of input series with all characters changed to uppercase. |
+| LOWER | TEXT | / | TEXT | Get the string of input series with all characters changed to lowercase. |
+| TRIM | TEXT | / | TEXT | Get the string whose value is the same as the input series, with all leading and trailing spaces removed. |
+| STRCMP | TEXT | / | TEXT | Get the compare result of two input series. Returns `0` if the series values are the same, a `negative integer` if value of series1 is smaller than series2,
a `positive integer` if value of series1 is more than series2. | + +For details and examples, see the document [String Processing](./String.md). + +### Data Type Conversion Function + +| Function Name | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | +| CAST | `type`: Output data type, INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | determined by `type` | Convert the data to the type specified by the `type` parameter. | + +For details and examples, see the document [Data Type Conversion Function](./Conversion.md). + +### Constant Timeseries Generating Functions + +| Function Name | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------------------------------ | -------------------------------------------- | ------------------------------------------------------------ | +| CONST | `value`: the value of the output data point `type`: the type of the output data point, it can only be INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | Determined by the required attribute `type` | Output the user-specified constant timeseries according to the attributes `value` and `type`. | +| PI | None | DOUBLE | Data point value: a `double` value of `π`, the ratio of the circumference of a circle to its diameter, which is equals to `Math.PI` in the *Java Standard Library*. | +| E | None | DOUBLE | Data point value: a `double` value of `e`, the base of the natural logarithms, which is equals to `Math.E` in the *Java Standard Library*. | + +For details and examples, see the document [Constant Timeseries Generating Functions](./Constant.md). + +### Selector Functions + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | +| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the largest values in a time series. | +| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the smallest values in a time series. | + +For details and examples, see the document [Selector Functions](./Selection.md). + +### Continuous Interval Functions + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ----------------- | ------------------------------------ | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | +| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always 0(false), and the duration time `t` satisfy `t >= min && t <= max`. 
The unit of `t` is ms | +| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always not 0, and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | +| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always 0(false). Data points number `n` satisfy `n >= min && n <= max` | +| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always not 0(false). Data points number `n` satisfy `n >= min && n <= max` | + +For details and examples, see the document [Continuous Interval Functions](./Continuous-Interval.md). + +### Variation Trend Calculation Functions + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | +| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | / | INT64 | Calculates the difference between the time stamp of a data point and the time stamp of the previous data point. There is no corresponding output for the first data point. | +| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. | +| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the absolute value of the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. | +| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the rate of change of a data point compared to the previous data point, the result is equals to DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | +| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the absolute value of the rate of change of a data point compared to the previous data point, the result is equals to NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | +| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:optional,default is true. If is true, the previous data point is ignored when it is null and continues to find the first non-null value forwardly. If the value is false, previous data point is not ignored when it is null, the result is also null because null is used for subtraction | DOUBLE | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point, so output is null | + +For details and examples, see the document [Variation Trend Calculation Functions](./Variation-Trend.md). 
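+
+To make the `ignoreNull` behavior of `DIFF` described above concrete, the following Python sketch mimics that semantics (an illustration only; the `diff` helper here is hypothetical and not part of IoTDB):
+
+```python
+from typing import List, Optional
+
+def diff(values: List[Optional[float]], ignore_null: bool = True) -> List[Optional[float]]:
+    # Difference to the previous value; the first point has no previous value.
+    result: List[Optional[float]] = []
+    prev: Optional[float] = None
+    for v in values:
+        if v is None:
+            result.append(None)
+            if not ignore_null:
+                prev = None            # a null breaks the chain when ignoreNull is false
+            continue
+        result.append(None if prev is None else v - prev)
+        prev = v
+    return result
+
+print(diff([1.0, None, 3.0]))                      # [None, None, 2.0]
+print(diff([1.0, None, 3.0], ignore_null=False))   # [None, None, None]
+```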
+ +### Sample Functions + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| -------------------------------- | ------------------------------- | ------------------------------------------------------------ | ------------------------------ | ------------------------------------------------------------ | +| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns a random sample of equal buckets that matches the sampling ratio | +| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1`
`type`: The value types are `avg`, `max`, `min`, `sum`, `extreme`, `variance`, the default is `avg` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket aggregation samples that match the sampling ratio | +| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket M4 samples that match the sampling ratio | +| EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | The value range of `proportion` is `(0, 1]`, the default is `0.1`
The value of `type` is `avg` or `stendis` or `cos` or `prenextdis`, the default is `avg`
The value of `number` should be greater than 0, the default is `3` | INT32 / INT64 / FLOAT / DOUBLE | Returns outlier samples in equal buckets that match the sampling ratio and the number of samples in the bucket | +| M4 | INT32 / INT64 / FLOAT / DOUBLE | Different attributes used by the size window and the time window. The size window uses attributes `windowSize` and `slidingStep`. The time window uses attributes `timeInterval`, `slidingStep`, `displayWindowBegin`, and `displayWindowEnd`. More details see below. | INT32 / INT64 / FLOAT / DOUBLE | Returns the `first, last, bottom, top` points in each sliding window. M4 sorts and deduplicates the aggregated points within the window before outputting them. | + +For details and examples, see the document [Sample Functions](./Sample.md). + +### Change Points Function + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------- | ------------------- | ----------------------------- | ----------------------------------------------------------- | +| CHANGE_POINTS | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Remove consecutive identical values from an input sequence. | + +For details and examples, see the document [Time-Series](./Time-Series.md). + +## Lambda Expression + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Series Data Type Description | +| ------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------- | ------------------------------------------------------------ | +| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr` is a lambda expression that supports standard one or multi arguments in the form `x -> {...}` or `(x, y, z) -> {...}`, e.g. `x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | Returns the input time series transformed by a lambda expression | + +For details and examples, see the document [Lambda](./Lambda.md). + +## Conditional Expressions + +| Expression Name | Description | +| --------------- | -------------------- | +| `CASE` | similar to "if else" | + +For details and examples, see the document [Conditional Expressions](./Conditional.md). + +## Data Quality Function Library + +### About + +For applications based on time series data, data quality is vital. **UDF Library** is IoTDB User Defined Functions (UDF) about data quality, including data profiling, data quality evalution and data repairing. It effectively meets the demand for data quality in the industrial field. + +### Quick Start + +The functions in this function library are not built-in functions, and must be loaded into the system before use. + +1. [Download](https://archive.apache.org/dist/iotdb/1.0.1/apache-iotdb-1.0.1-library-udf-bin.zip) the JAR with all dependencies and the script of registering UDF. +2. Copy the JAR package to `ext\udf` under the directory of IoTDB system (Please put JAR to this directory of all DataNodes if you use Cluster). +3. Run `sbin\start-server.bat` (for Windows) or `sbin\start-server.sh` (for Linux or MacOS) to start IoTDB server. +4. Copy the script to the directory of IoTDB system (under the root directory, at the same level as `sbin`), modify the parameters in the script if needed and run it to register UDF. + +### Implemented Functions + +1. 
Data Quality related functions, such as `Completeness`. For details and examples, see the document [Data-Quality](../Operators-Functions/Data-Quality.md). +2. Data Profiling related functions, such as `ACF`. For details and examples, see the document [Data-Profiling](../Operators-Functions/Data-Profiling.md). +3. Anomaly Detection related functions, such as `IQR`. For details and examples, see the document [Anomaly-Detection](../Operators-Functions/Anomaly-Detection.md). +4. Frequency Domain Analysis related functions, such as `Conv`. For details and examples, see the document [Frequency-Domain](../Operators-Functions/Frequency-Domain.md). +5. Data Matching related functions, such as `DTW`. For details and examples, see the document [Data-Matching](../Operators-Functions/Data-Matching.md). +6. Data Repairing related functions, such as `TimestampRepair`. For details and examples, see the document [Data-Repairing](../Operators-Functions/Data-Repairing.md). +7. Series Discovery related functions, such as `ConsecutiveSequences`. For details and examples, see the document [Series-Discovery](../Operators-Functions/Series-Discovery.md). +8. Machine Learning related functions, such as `AR`. For details and examples, see the document [Machine-Learning](../Operators-Functions/Machine-Learning.md). diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Sample.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Sample.md new file mode 100644 index 00000000..507cc279 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Sample.md @@ -0,0 +1,399 @@ + + +# Sample Functions + +## Equal Size Bucket Sample Function + +This function samples the input sequence in equal size buckets, that is, according to the downsampling ratio and downsampling method given by the user, the input sequence is equally divided into several buckets according to a fixed number of points. Sampling by the given sampling method within each bucket. +- `proportion`: sample ratio, the value range is `(0, 1]`. +### Equal Size Bucket Random Sample +Random sampling is performed on the equally divided buckets. + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns a random sample of equal buckets that matches the sampling ratio | + +#### Example + +Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`. 
+ +```sql +IoTDB> select temperature from root.ln.wf01.wt01; ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| +|1970-01-01T08:00:00.005+08:00| 5.0| +|1970-01-01T08:00:00.006+08:00| 6.0| +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.009+08:00| 9.0| +|1970-01-01T08:00:00.010+08:00| 10.0| +|1970-01-01T08:00:00.011+08:00| 11.0| +|1970-01-01T08:00:00.012+08:00| 12.0| +|.............................|.............................| +|1970-01-01T08:00:00.089+08:00| 89.0| +|1970-01-01T08:00:00.090+08:00| 90.0| +|1970-01-01T08:00:00.091+08:00| 91.0| +|1970-01-01T08:00:00.092+08:00| 92.0| +|1970-01-01T08:00:00.093+08:00| 93.0| +|1970-01-01T08:00:00.094+08:00| 94.0| +|1970-01-01T08:00:00.095+08:00| 95.0| +|1970-01-01T08:00:00.096+08:00| 96.0| +|1970-01-01T08:00:00.097+08:00| 97.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+-----------------------------+ +``` +Sql: +```sql +select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; +``` +Result: +```sql ++-----------------------------+-------------+ +| Time|random_sample| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.014+08:00| 14.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.035+08:00| 35.0| +|1970-01-01T08:00:00.047+08:00| 47.0| +|1970-01-01T08:00:00.059+08:00| 59.0| +|1970-01-01T08:00:00.063+08:00| 63.0| +|1970-01-01T08:00:00.079+08:00| 79.0| +|1970-01-01T08:00:00.086+08:00| 86.0| +|1970-01-01T08:00:00.096+08:00| 96.0| ++-----------------------------+-------------+ +Total line number = 10 +It costs 0.024s +``` + +### Equal Size Bucket Aggregation Sample + +The input sequence is sampled by the aggregation sampling method, and the user needs to provide an additional aggregation function parameter, namely +- `type`: Aggregate type, which can be `avg` or `max` or `min` or `sum` or `extreme` or `variance`. By default, `avg` is used. `extreme` represents the value with the largest absolute value in the equal bucket. `variance` represents the variance in the sampling equal buckets. + +The timestamp of the sampling output of each bucket is the timestamp of the first point of the bucket. + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1`
`type`: The value types are `avg`, `max`, `min`, `sum`, `extreme`, `variance`, the default is `avg` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket aggregation samples that match the sampling ratio | + +#### Example + +Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`, and the test data is randomly sampled in equal buckets. + +Sql: +```sql +select equal_size_bucket_agg_sample(temperature, 'type'='avg','proportion'='0.1') as agg_avg, equal_size_bucket_agg_sample(temperature, 'type'='max','proportion'='0.1') as agg_max, equal_size_bucket_agg_sample(temperature,'type'='min','proportion'='0.1') as agg_min, equal_size_bucket_agg_sample(temperature, 'type'='sum','proportion'='0.1') as agg_sum, equal_size_bucket_agg_sample(temperature, 'type'='extreme','proportion'='0.1') as agg_extreme, equal_size_bucket_agg_sample(temperature, 'type'='variance','proportion'='0.1') as agg_variance from root.ln.wf01.wt01; +``` +Result: +```sql ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +| Time| agg_avg|agg_max|agg_min|agg_sum|agg_extreme|agg_variance| ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| +|1970-01-01T08:00:00.010+08:00| 14.5| 19.0| 10.0| 145.0| 19.0| 8.25| +|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| 20.0| 245.0| 29.0| 8.25| +|1970-01-01T08:00:00.030+08:00| 34.5| 39.0| 30.0| 345.0| 39.0| 8.25| +|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| +|1970-01-01T08:00:00.050+08:00| 54.5| 59.0| 50.0| 545.0| 59.0| 8.25| +|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| 8.25| +|1970-01-01T08:00:00.070+08:00|74.50000000000001| 79.0| 70.0| 745.0| 79.0| 8.25| +|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 8.25| +|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 8.25| ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +Total line number = 10 +It costs 0.044s +``` + +### Equal Size Bucket M4 Sample + +The input sequence is sampled using the M4 sampling method. That is to sample the head, tail, min and max values for each bucket. + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket M4 samples that match the sampling ratio | + +#### Example + +Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`, and the test data is randomly sampled in equal buckets. 
+ +Sql: +```sql +select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01; +``` +Result: +```sql ++-----------------------------+---------+ +| Time|M4_sample| ++-----------------------------+---------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.038+08:00| 38.0| +|1970-01-01T08:00:00.039+08:00| 39.0| +|1970-01-01T08:00:00.040+08:00| 40.0| +|1970-01-01T08:00:00.041+08:00| 41.0| +|1970-01-01T08:00:00.078+08:00| 78.0| +|1970-01-01T08:00:00.079+08:00| 79.0| +|1970-01-01T08:00:00.080+08:00| 80.0| +|1970-01-01T08:00:00.081+08:00| 81.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+---------+ +Total line number = 12 +It costs 0.065s +``` + +### Equal Size Bucket Outlier Sample + +This function samples the input sequence with equal number of bucket outliers, that is, according to the downsampling ratio given by the user and the number of samples in the bucket, the input sequence is divided into several buckets according to a fixed number of points. Sampling by the given outlier sampling method within each bucket. + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | The value range of `proportion` is `(0, 1]`, the default is `0.1`
The value of `type` is `avg` or `stendis` or `cos` or `prenextdis`, the default is `avg`
The value of `number` should be greater than 0, the default is `3`| INT32 / INT64 / FLOAT / DOUBLE | Returns outlier samples in equal buckets that match the sampling ratio and the number of samples in the bucket | + +Parameter Description +- `proportion`: sampling ratio +- `number`: the number of samples in each bucket, default `3` +- `type`: outlier sampling method, the value is + - `avg`: Take the average of the data points in the bucket, and find the `top number` farthest from the average according to the sampling ratio + - `stendis`: Take the vertical distance between each data point in the bucket and the first and last data points of the bucket to form a straight line, and according to the sampling ratio, find the `top number` with the largest distance + - `cos`: Set a data point in the bucket as b, the data point on the left of b as a, and the data point on the right of b as c, then take the cosine value of the angle between the ab and bc vectors. The larger the angle, the more likely it is an outlier. Find the `top number` with the smallest cos value + - `prenextdis`: Let a data point in the bucket be b, the data point to the left of b is a, and the data point to the right of b is c, then take the sum of the lengths of ab and bc as the yardstick, the larger the sum, the more likely it is to be an outlier, and find the `top number` with the largest sum value + +#### Example + +Example data: `root.ln.wf01.wt01.temperature` has a total of `100` ordered data from `0.0-99.0`. Among them, in order to add outliers, we make the number modulo 5 equal to 0 increment by 100. + +```sql +IoTDB> select temperature from root.ln.wf01.wt01; ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| +|1970-01-01T08:00:00.005+08:00| 105.0| +|1970-01-01T08:00:00.006+08:00| 6.0| +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.009+08:00| 9.0| +|1970-01-01T08:00:00.010+08:00| 10.0| +|1970-01-01T08:00:00.011+08:00| 11.0| +|1970-01-01T08:00:00.012+08:00| 12.0| +|1970-01-01T08:00:00.013+08:00| 13.0| +|1970-01-01T08:00:00.014+08:00| 14.0| +|1970-01-01T08:00:00.015+08:00| 115.0| +|1970-01-01T08:00:00.016+08:00| 16.0| +|.............................|.............................| +|1970-01-01T08:00:00.092+08:00| 92.0| +|1970-01-01T08:00:00.093+08:00| 93.0| +|1970-01-01T08:00:00.094+08:00| 94.0| +|1970-01-01T08:00:00.095+08:00| 195.0| +|1970-01-01T08:00:00.096+08:00| 96.0| +|1970-01-01T08:00:00.097+08:00| 97.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+-----------------------------+ +``` +Sql: +```sql +select equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='avg', 'number'='2') as outlier_avg_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='stendis', 'number'='2') as outlier_stendis_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='cos', 'number'='2') as outlier_cos_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='prenextdis', 'number'='2') as outlier_prenextdis_sample from root.ln.wf01.wt01; +``` +Result: +```sql 
++-----------------------------+------------------+----------------------+------------------+-------------------------+ +| Time|outlier_avg_sample|outlier_stendis_sample|outlier_cos_sample|outlier_prenextdis_sample| ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +|1970-01-01T08:00:00.005+08:00| 105.0| 105.0| 105.0| 105.0| +|1970-01-01T08:00:00.015+08:00| 115.0| 115.0| 115.0| 115.0| +|1970-01-01T08:00:00.025+08:00| 125.0| 125.0| 125.0| 125.0| +|1970-01-01T08:00:00.035+08:00| 135.0| 135.0| 135.0| 135.0| +|1970-01-01T08:00:00.045+08:00| 145.0| 145.0| 145.0| 145.0| +|1970-01-01T08:00:00.055+08:00| 155.0| 155.0| 155.0| 155.0| +|1970-01-01T08:00:00.065+08:00| 165.0| 165.0| 165.0| 165.0| +|1970-01-01T08:00:00.075+08:00| 175.0| 175.0| 175.0| 175.0| +|1970-01-01T08:00:00.085+08:00| 185.0| 185.0| 185.0| 185.0| +|1970-01-01T08:00:00.095+08:00| 195.0| 195.0| 195.0| 195.0| ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +Total line number = 10 +It costs 0.041s +``` + +## M4 Function + +M4 is used to sample the `first, last, bottom, top` points for each sliding window: + +- the first point is the point with the **m**inimal time; +- the last point is the point with the **m**aximal time; +- the bottom point is the point with the **m**inimal value (if there are multiple such points, M4 returns one of them); +- the top point is the point with the **m**aximal value (if there are multiple such points, M4 returns one of them). + +image + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------- | ------------------------------------------------------------ | ------------------------------ | ------------------------------------------------------------ | +| M4 | INT32 / INT64 / FLOAT / DOUBLE | Different attributes used by the size window and the time window. The size window uses attributes `windowSize` and `slidingStep`. The time window uses attributes `timeInterval`, `slidingStep`, `displayWindowBegin`, and `displayWindowEnd`. More details see below. | INT32 / INT64 / FLOAT / DOUBLE | Returns the `first, last, bottom, top` points in each sliding window. M4 sorts and deduplicates the aggregated points within the window before outputting them. | + +### Attributes + +**(1) Attributes for the size window:** + ++ `windowSize`: The number of points in a window. Int data type. **Required**. ++ `slidingStep`: Slide a window by the number of points. Int data type. Optional. If not set, default to the same as `windowSize`. + +image + +**(2) Attributes for the time window:** + ++ `timeInterval`: The time interval length of a window. Long data type. **Required**. ++ `slidingStep`: Slide a window by the time length. Long data type. Optional. If not set, default to the same as `timeInterval`. ++ `displayWindowBegin`: The starting position of the window (included). Long data type. Optional. If not set, default to Long.MIN_VALUE, meaning using the time of the first data point of the input time series as the starting position of the window. ++ `displayWindowEnd`: End time limit (excluded, essentially playing the same role as `WHERE time < displayWindowEnd`). Long data type. Optional. If not set, default to Long.MAX_VALUE, meaning there is no additional end time limit other than the end of the input time series itself. 
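+
+As an illustrative sketch only (not one of the documented examples; it reuses the `root.vehicle.d1.s1` series queried below and assumes millisecond timestamps), `slidingStep` can be set smaller than `timeInterval` so that consecutive time windows overlap:
+
+```sql
+-- Hypothetical query: 25 ms windows that slide forward by 10 ms, so neighboring windows overlap.
+-- Each window still reports its first, last, bottom and top points.
+select M4(s1,'timeInterval'='25','slidingStep'='10','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1
+```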
+ +groupBy window + +### Examples + +Input series: + +```sql ++-----------------------------+------------------+ +| Time|root.vehicle.d1.s1| ++-----------------------------+------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.002+08:00| 15.0| +|1970-01-01T08:00:00.005+08:00| 10.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.010+08:00| 30.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.025+08:00| 8.0| +|1970-01-01T08:00:00.027+08:00| 20.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.033+08:00| 9.0| +|1970-01-01T08:00:00.035+08:00| 10.0| +|1970-01-01T08:00:00.040+08:00| 20.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+------------------+ +``` + +SQL for query1: + +```sql +select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1 +``` + +Output1: + +```sql ++-----------------------------+-----------------------------------------------------------------------------------------------+ +| Time|M4(root.vehicle.d1.s1, "timeInterval"="25", "displayWindowBegin"="0", "displayWindowEnd"="100")| ++-----------------------------+-----------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.010+08:00| 30.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.025+08:00| 8.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+-----------------------------------------------------------------------------------------------+ +Total line number = 8 +``` + +SQL for query2: + +```sql +select M4(s1,'windowSize'='10') from root.vehicle.d1 +``` + +Output2: + +```sql ++-----------------------------+-----------------------------------------+ +| Time|M4(root.vehicle.d1.s1, "windowSize"="10")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.033+08:00| 9.0| +|1970-01-01T08:00:00.035+08:00| 10.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+-----------------------------------------+ +Total line number = 7 +``` + +### Suggested Use Cases + +**(1) Use Case: Extreme-point-preserving downsampling** + +As M4 aggregation selects the `first, last, bottom, top` points for each window, M4 usually preserves extreme points and thus patterns better than other downsampling methods such as Piecewise Aggregate Approximation (PAA). Therefore, if you want to downsample the time series while preserving extreme points, you may give M4 a try. + +**(2) Use case: Error-free two-color line chart visualization of large-scale time series through M4 downsampling** + +Referring to paper ["M4: A Visualization-Oriented Time Series Data Aggregation"](http://www.vldb.org/pvldb/vol7/p797-jugel.pdf), M4 is a downsampling method to facilitate large-scale time series visualization without deforming the shape in terms of a two-color line chart. 
+
+Given a chart of `w*h` pixels, suppose that the visualization time range of the time series is `[tqs,tqe)` and (tqe-tqs) is divisible by w, the points that fall within the `i`-th time span `Ii=[tqs+(tqe-tqs)/w*(i-1),tqs+(tqe-tqs)/w*i)` will be drawn on the `i`-th pixel column, i=1,2,...,w. Therefore, from a visualization-driven perspective, use the SQL `"select M4(s1,'timeInterval'='(tqe-tqs)/w','displayWindowBegin'='tqs','displayWindowEnd'='tqe') from root.vehicle.d1"` to sample the `first, last, bottom, top` points for each time span. The resulting downsampled time series has no more than `4*w` points, a big reduction compared to the original large-scale time series. Meanwhile, the two-color line chart drawn from the reduced data is identical to that drawn from the original data (pixel-level consistency).
+
+To eliminate the hassle of hardcoding parameters, we recommend the following usage of Grafana's [template variable](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#global-variables) `$__interval_ms` when Grafana is used for visualization:
+
+```
+select M4(s1,'timeInterval'='$__interval_ms') from root.sg1.d1
+```
+
+where `timeInterval` is set to `(tqe-tqs)/w` automatically. Note that the time precision here is assumed to be milliseconds.
+
+### Comparison with Other Functions
+
+| SQL | Whether M4 aggregation is supported | Sliding window type | Example | Docs |
+| ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| 1. native built-in aggregate functions with Group By clause | No. Lacks `BOTTOM_TIME` and `TOP_TIME`, which are the timestamps of the points with the minimum and maximum values, respectively. | Time Window | `select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d)` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#built-in-aggregate-functions
https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#downsampling-aggregate-query | +| 2. EQUAL_SIZE_BUCKET_M4_SAMPLE (built-in UDF) | Yes* | Size Window. `windowSize = 4*(int)(1/proportion)` | `select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Select-Expression.html#time-series-generating-functions | +| **3. M4 (built-in UDF)** | Yes* | Size Window, Time Window | (1) Size Window: `select M4(s1,'windowSize'='10') from root.vehicle.d1`
(2) Time Window: `select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1` | refer to this doc | +| 4. extend native built-in aggregate functions with Group By clause to support M4 aggregation | not implemented | not implemented | not implemented | not implemented | + +Further compare `EQUAL_SIZE_BUCKET_M4_SAMPLE` and `M4`: + +**(1) Different M4 aggregation definition:** + +For each window, `EQUAL_SIZE_BUCKET_M4_SAMPLE` extracts the top and bottom points from points **EXCLUDING** the first and last points. + +In contrast, `M4` extracts the top and bottom points from points **INCLUDING** the first and last points, which is more consistent with the semantics of `max_value` and `min_value` stored in metadata. + +It is worth noting that both functions sort and deduplicate the aggregated points in a window before outputting them to the collectors. + +**(2) Different sliding windows:** + +`EQUAL_SIZE_BUCKET_M4_SAMPLE` uses SlidingSizeWindowAccessStrategy and **indirectly** controls sliding window size by sampling proportion. The conversion formula is `windowSize = 4*(int)(1/proportion)`. + +`M4` supports two types of sliding window: SlidingSizeWindowAccessStrategy and SlidingTimeWindowAccessStrategy. `M4` **directly** controls the window point size or time length using corresponding parameters. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Selection.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Selection.md new file mode 100644 index 00000000..f5e07ba2 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Selection.md @@ -0,0 +1,51 @@ + + +# Selector Functions + +Currently, IoTDB supports the following selector functions: + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ------------- | ------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | +| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the largest values in a time series. | +| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the smallest values in a time series. 
| + +Example: + +``` sql +select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; +``` + +Result: + +``` ++-----------------------------+--------------------+------------------------------+---------------------------------+ +| Time| root.sg1.d2.s1|top_k(root.sg1.d2.s1, "k"="2")|bottom_k(root.sg1.d2.s1, "k"="2")| ++-----------------------------+--------------------+------------------------------+---------------------------------+ +|2020-12-10T20:36:15.531+08:00| 1531604122307244742| 1531604122307244742| null| +|2020-12-10T20:36:15.532+08:00|-7426070874923281101| null| null| +|2020-12-10T20:36:15.533+08:00|-7162825364312197604| -7162825364312197604| null| +|2020-12-10T20:36:15.534+08:00|-8581625725655917595| null| -8581625725655917595| +|2020-12-10T20:36:15.535+08:00|-7667364751255535391| null| -7667364751255535391| ++-----------------------------+--------------------+------------------------------+---------------------------------+ +Total line number = 5 +It costs 0.006s +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Series-Discovery.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Series-Discovery.md new file mode 100644 index 00000000..8cc69b5d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Series-Discovery.md @@ -0,0 +1,173 @@ + + +# Series Discovery + +## ConsecutiveSequences + +### Usage + +This function is used to find locally longest consecutive subsequences in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive subsequence is the subsequence that is strictly equispaced with the standard time interval without any missing data. If a consecutive subsequence is not a proper subsequence of any consecutive subsequence, it is locally longest. + +**Name:** CONSECUTIVESEQUENCES + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a locally longest consecutive subsequence. The output timestamp is the starting timestamp of the subsequence and the output value is the number of data points in the subsequence. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +### Examples + +#### Manually Specify the Standard Time Interval + +It's able to manually specify the standard time interval by the parameter `gap`. It's notable that false parameter leads to false output. 
+ +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + + +#### Automatically Estimate the Standard Time Interval + +When `gap` is default, this function estimates the standard time interval by the mode of time intervals and gets the same results. Therefore, this usage is more recommended. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +## ConsecutiveWindows + +### Usage + +This function is used to find consecutive windows of specified length in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive window is the subsequence that is strictly equispaced with the standard time interval without any missing data. + +**Name:** CONSECUTIVEWINDOWS + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. ++ `length`: The length of the window which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a consecutive window. The output timestamp is the starting timestamp of the window and the output value is the number of data points in the window. 
+ +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +### Examples + + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/String.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/String.md new file mode 100644 index 00000000..86f40bbc --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/String.md @@ -0,0 +1,911 @@ + + +# String Processing + +## STRING_CONTAINS + +### Function introduction + +This function checks whether the substring `s` exists in the string + +**Function name:** STRING_CONTAINS + +**Input sequence:** Only a single input sequence is supported, the type is TEXT. + +**parameter:** ++ `s`: The string to search for. + +**Output Sequence:** Output a single sequence, the type is BOOLEAN. + +### Usage example + +``` sql +select s1, string_contains(s1, 's'='warn') from root.sg1.d4; +``` + +``` ++-----------------------------+--------------+-------------------------------------------+ +| Time|root.sg1.d4.s1|string_contains(root.sg1.d4.s1, "s"="warn")| ++-----------------------------+--------------+-------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| warn:-8721| true| +|1970-01-01T08:00:00.002+08:00| error:-37229| false| +|1970-01-01T08:00:00.003+08:00| warn:1731| true| ++-----------------------------+--------------+-------------------------------------------+ +Total line number = 3 +It costs 0.007s +``` + +## STRING_MATCHES + +### Function introduction + +This function judges whether a string can be matched by the regular expression `regex`. + +**Function name:** STRING_MATCHES + +**Input sequence:** Only a single input sequence is supported, the type is TEXT. + +**parameter:** ++ `regex`: Java standard library-style regular expressions. + +**Output Sequence:** Output a single sequence, the type is BOOLEAN. 
+ +### Usage example + +```sql +select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4; +``` + +``` ++-----------------------------+--------------+------------------------------------------------------+ +| Time|root.sg1.d4.s1|string_matches(root.sg1.d4.s1, "regex"="[^\\s]+37229")| ++-----------------------------+--------------+------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| warn:-8721| false| +|1970-01-01T08:00:00.002+08:00| error:-37229| true| +|1970-01-01T08:00:00.003+08:00| warn:1731| false| ++-----------------------------+--------------+------------------------------------------------------+ +Total line number = 3 +It costs 0.007s +``` + +## Length + +### Usage + +The function is used to get the length of input series. + +**Name:** LENGTH + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Output Series:** Output a single series. The type is INT32. + +**Note:** Returns NULL if input is NULL. + +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, length(s1) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+----------------------+ +| Time|root.sg1.d1.s1|length(root.sg1.d1.s1)| ++-----------------------------+--------------+----------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 6| +|1970-01-01T08:00:00.002+08:00| 22test22| 8| ++-----------------------------+--------------+----------------------+ +``` + +## Locate + +### Usage + +The function is used to get the position of the first occurrence of substring `target` in input series. Returns -1 if there are no `target` in input. + +**Name:** LOCATE + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `target`: The substring to be located. ++ `reverse`: Indicates whether reverse locate is required. The default value is `false`, means left-to-right locate. + +**Output Series:** Output a single series. The type is INT32. + +**Note:** The index begins from 0. 
+ +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, locate(s1, "target"="1") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+------------------------------------+ +| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 0| +|1970-01-01T08:00:00.002+08:00| 22test22| -1| ++-----------------------------+--------------+------------------------------------+ +``` + +Another SQL for query: + +```sql +select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+------------------------------------------------------+ +| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1", "reverse"="true")| ++-----------------------------+--------------+------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 5| +|1970-01-01T08:00:00.002+08:00| 22test22| -1| ++-----------------------------+--------------+------------------------------------------------------+ +``` + +## StartsWith + +### Usage + +The function is used to check whether input series starts with the specified prefix. + +**Name:** STARTSWITH + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** ++ `target`: The prefix to be checked. + +**Output Series:** Output a single series. The type is BOOLEAN. + +**Note:** Returns NULL if input is NULL. + +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, startswith(s1, "target"="1") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+----------------------------------------+ +| Time|root.sg1.d1.s1|startswith(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| true| +|1970-01-01T08:00:00.002+08:00| 22test22| false| ++-----------------------------+--------------+----------------------------------------+ +``` + +## EndsWith + +### Usage + +The function is used to check whether input series ends with the specified suffix. + +**Name:** ENDSWITH + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** ++ `target`: The suffix to be checked. + +**Output Series:** Output a single series. The type is BOOLEAN. + +**Note:** Returns NULL if input is NULL. 
+ +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, endswith(s1, "target"="1") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|endswith(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| true| +|1970-01-01T08:00:00.002+08:00| 22test22| false| ++-----------------------------+--------------+--------------------------------------+ +``` + +## Concat + +### Usage + +The function is used to concat input series and target strings. + +**Name:** CONCAT + +**Input Series:** At least one input series. The data type is TEXT. + +**Parameter:** ++ `targets`: A series of K-V, key needs to start with `target` and be not duplicated, value is the string you want to concat. ++ `series_behind`: Indicates whether series behind targets. The default value is `false`. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** ++ If value of input series is NULL, it will be skipped. ++ We can only concat input series and `targets` separately. `concat(s1, "target1"="IoT", s2, "target2"="DB")` and + `concat(s1, s2, "target1"="IoT", "target2"="DB")` gives the same result. + +### Examples + +Input series: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| ++-----------------------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB")| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| 1test1IoTDB| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 22test222222testIoTDB| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +``` + +Another SQL for query: + +```sql +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB", "series_behind"="true")| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| IoTDB1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| IoTDB22test222222test| 
++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +``` + +## substring + +### Usage + +Extracts a substring of a string, starting with the first specified character and stopping after the specified number of characters.The index start at 1. The value range of from and for is an INT32. + +**Name:** SUBSTRING + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** ++ `from`: Indicates the start position of substring. ++ `for`: Indicates how many characters to stop after of substring. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. + +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, substring(s1 from 1 for 2) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|SUBSTRING(root.sg1.d1.s1 FROM 1 FOR 2)| ++-----------------------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1t| +|1970-01-01T08:00:00.002+08:00| 22test22| 22| ++-----------------------------+--------------+--------------------------------------+ +``` + +## replace + +### Usage + +Replace a substring in the input sequence with the target substring. + +**Name:** REPLACE + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** ++ first parameter: The target substring to be replaced. ++ second parameter: The substring to replace with. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. + +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, replace(s1, 'es', 'tt') from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+-----------------------------------+ +| Time|root.sg1.d1.s1|REPLACE(root.sg1.d1.s1, 'es', 'tt')| ++-----------------------------+--------------+-----------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1tttt1| +|1970-01-01T08:00:00.002+08:00| 22test22| 22tttt22| ++-----------------------------+--------------+-----------------------------------+ +``` + +## Upper + +### Usage + +The function is used to get the string of input series with all characters changed to uppercase. + +**Name:** UPPER + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. 
+ +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, upper(s1) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+---------------------+ +| Time|root.sg1.d1.s1|upper(root.sg1.d1.s1)| ++-----------------------------+--------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1TEST1| +|1970-01-01T08:00:00.002+08:00| 22test22| 22TEST22| ++-----------------------------+--------------+---------------------+ +``` + +## Lower + +### Usage + +The function is used to get the string of input series with all characters changed to lowercase. + +**Name:** LOWER + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. + +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1TEST1| +|1970-01-01T08:00:00.002+08:00| 22TEST22| ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s1, lower(s1) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+---------------------+ +| Time|root.sg1.d1.s1|lower(root.sg1.d1.s1)| ++-----------------------------+--------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 1TEST1| 1test1| +|1970-01-01T08:00:00.002+08:00| 22TEST22| 22test22| ++-----------------------------+--------------+---------------------+ +``` + +## Trim + +### Usage + +The function is used to get the string whose value is same to input series, with all leading and trailing space removed. + +**Name:** TRIM + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Returns NULL if input is NULL. + +### Examples + +Input series: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s3| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.002+08:00| 3querytest3| +|1970-01-01T08:00:00.003+08:00| 3querytest3 | ++-----------------------------+--------------+ +``` + +SQL for query: + +```sql +select s3, trim(s3) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------------+ +| Time|root.sg1.d1.s3|trim(root.sg1.d1.s3)| ++-----------------------------+--------------+--------------------+ +|1970-01-01T08:00:00.002+08:00| 3querytest3| 3querytest3| +|1970-01-01T08:00:00.003+08:00| 3querytest3 | 3querytest3| ++-----------------------------+--------------+--------------------+ +``` + +## StrCmp + +### Usage + +The function is used to get the compare result of two input series. Returns `0` if series value are the same, a `negative integer` if value of series1 is smaller than series2, +a `positive integer` if value of series1 is more than series2. + +**Name:** StrCmp + +**Input Series:** Support two input series. Data types are all the TEXT. + +**Output Series:** Output a single series. The type is INT32. + +**Note:** Returns NULL either series value is NULL. 
+ +### Examples + +Input series: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| ++-----------------------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select s1, s2, strcmp(s1, s2) from root.sg1.d1 +``` + +Output series: + +``` ++-----------------------------+--------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|strcmp(root.sg1.d1.s1, root.sg1.d1.s2)| ++-----------------------------+--------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 66| ++-----------------------------+--------------+--------------+--------------------------------------+ +``` + + +## StrReplace + +### Usage + +**This is not a built-in function and can only be used after registering the library-udf.** The function is used to replace the specific substring with given string. + +**Name:** STRREPLACE + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `target`: The target substring to be replaced. ++ `replace`: The string to be put on. ++ `limit`: The number of matches to be replaced which should be an integer no less than -1, + default to -1 which means all matches will be replaced. ++ `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0. ++ `reverse`: Whether to count all the matches reversely, default to 'false'. + +**Output Series:** Output a single series. The type is TEXT. 
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| +|2021-01-01T00:00:03.000+08:00| B+,B,B| +|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-,B,B| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|strreplace(root.test.d1.s1, "target"=",",| +| | "replace"="/", "limit"="2")| ++-----------------------------+-----------------------------------------+ +|2021-01-01T00:00:01.000+08:00| A/B/A+,B-| +|2021-01-01T00:00:02.000+08:00| A/A+/A,B+| +|2021-01-01T00:00:03.000+08:00| B+/B/B| +|2021-01-01T00:00:04.000+08:00| A+/A/A+,A| +|2021-01-01T00:00:05.000+08:00| A/B-/B,B| ++-----------------------------+-----------------------------------------+ +``` + +Another SQL for query: + +```sql +select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|strreplace(root.test.d1.s1, "target"=",", "replace"= | +| | "|", "limit"="1", "offset"="1", "reverse"="true")| ++-----------------------------+-----------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| A,B/A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+/A,B+| +|2021-01-01T00:00:03.000+08:00| B+/B,B| +|2021-01-01T00:00:04.000+08:00| A+,A/A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-/B,B| ++-----------------------------+-----------------------------------------------------+ +``` + +## RegexMatch + +### Usage + +**This is not a built-in function and can only be used after registering the library-udf.** The function is used to fetch matched contents from text with given regular expression. + +**Name:** REGEXMATCH + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `regex`: The regular expression to match in the text. All grammars supported by Java are acceptable, + for example, `\d+\.\d+\.\d+\.\d+` is expected to match any IPv4 addresses. ++ `group`: The wanted group index in the matched result. + Reference to java.util.regex, group 0 is the whole pattern and + the next ones are numbered with the appearance order of left parentheses. + For example, the groups in `A(B(CD))` are: 0-`A(B(CD))`, 1-`B(CD)`, 2-`CD`. + +**Output Series:** Output a single series. The type is TEXT. + +**Note:** Those points with null values or not matched with the given pattern will not return any results. 
+ +### Examples + +Input series: + +``` ++-----------------------------+-------------------------------+ +| Time| root.test.d1.s1| ++-----------------------------+-------------------------------+ +|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| ++-----------------------------+-------------------------------+ +``` + +SQL for query: + +```sql +select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------------------------------------+ +| Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")| ++-----------------------------+----------------------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| 192.168.0.1| +|2021-01-01T00:00:02.000+08:00| 192.168.0.24| +|2021-01-01T00:00:03.000+08:00| 192.168.0.2| +|2021-01-01T00:00:04.000+08:00| 192.168.0.5| +|2021-01-01T00:00:05.000+08:00| 192.168.0.124| ++-----------------------------+----------------------------------------------------------------------+ +``` + +## RegexReplace + +### Usage + +**This is not a built-in function and can only be used after registering the library-udf.** The function is used to replace the specific regular expression matches with given string. + +**Name:** REGEXREPLACE + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `regex`: The target regular expression to be replaced. All grammars supported by Java are acceptable. ++ `replace`: The string to be put on and back reference notes in Java is also supported, + for example, '$1' refers to group 1 in the `regex` which will be filled with corresponding matched results. ++ `limit`: The number of matches to be replaced which should be an integer no less than -1, + default to -1 which means all matches will be replaced. ++ `offset`: The number of matches to be skipped, which means the first `offset` matches will not be replaced, default to 0. ++ `reverse`: Whether to count all the matches reversely, default to 'false'. + +**Output Series:** Output a single series. The type is TEXT. 
+ +### Examples + +Input series: + +``` ++-----------------------------+-------------------------------+ +| Time| root.test.d1.s1| ++-----------------------------+-------------------------------+ +|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| ++-----------------------------+-------------------------------+ +``` + +SQL for query: + +```sql +select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------------+ +| Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",| +| | "replace"="cluster-$1", "limit"="1")| ++-----------------------------+-----------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]| ++-----------------------------+-----------------------------------------------------------+ +``` + +## RegexSplit + +### Usage + +**This is not a built-in function and can only be used after registering the library-udf.** The function is used to split text with given regular expression and return specific element. + +**Name:** REGEXSPLIT + +**Input Series:** Only support a single input series. The data type is TEXT. + +**Parameter:** + ++ `regex`: The regular expression used to split the text. + All grammars supported by Java are acceptable, for example, `['"]` is expected to match `'` and `"`. ++ `index`: The wanted index of elements in the split result. + It should be an integer no less than -1, default to -1 which means the length of the result array is returned + and any non-negative integer is used to fetch the text of the specific index starting from 0. + +**Output Series:** Output a single series. The type is INT32 when `index` is -1 and TEXT when it's an valid index. + +**Note:** When `index` is out of the range of the result array, for example `0,1,2` split with `,` and `index` is set to 3, +no result are returned for that record. 
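+
+For reference, the `index` semantics can be pictured with plain Java string splitting; a minimal sketch (the class name is for illustration only):
+
+```java
+public class SplitIndexDemo {
+  public static void main(String[] args) {
+    String[] parts = "A,B,A+,B-".split(",");
+    System.out.println(parts.length); // 4, which is what "index"="-1" reports
+    System.out.println(parts[3]);     // B-, which is what "index"="3" returns
+    // "0,1,2".split(",") has only 3 elements, so "index"="3" produces no output for that row.
+  }
+}
+```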
+ +### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| +|2021-01-01T00:00:03.000+08:00| B+,B,B| +|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-,B,B| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| 4| +|2021-01-01T00:00:02.000+08:00| 4| +|2021-01-01T00:00:03.000+08:00| 3| +|2021-01-01T00:00:04.000+08:00| 4| +|2021-01-01T00:00:05.000+08:00| 4| ++-----------------------------+------------------------------------------------------+ +``` + +Another SQL for query: + +SQL for query: + +```sql +select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")| ++-----------------------------+-----------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| B-| +|2021-01-01T00:00:02.000+08:00| B+| +|2021-01-01T00:00:04.000+08:00| A| +|2021-01-01T00:00:05.000+08:00| B| ++-----------------------------+-----------------------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Time-Series.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Time-Series.md new file mode 100644 index 00000000..531474bf --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Time-Series.md @@ -0,0 +1,70 @@ + + +# Time Series Processing + +## CHANGE_POINTS + +### Usage + +This function is used to remove consecutive identical values from an input sequence. +For example, input:`1,1,2,2,3` output:`1,2,3`. + +**Name:** CHANGE_POINTS + +**Input Series:** Support only one input series. + +**Parameters:** No parameters. 
+ +### Example + +Raw data: + +``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|root.testChangePoints.d1.s1|root.testChangePoints.d1.s2|root.testChangePoints.d1.s3|root.testChangePoints.d1.s4|root.testChangePoints.d1.s5|root.testChangePoints.d1.s6| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.002+08:00| true| 2| 2| 2.0| 1.0| 2test2| +|1970-01-01T08:00:00.003+08:00| false| 1| 2| 1.0| 1.0| 2test2| +|1970-01-01T08:00:00.004+08:00| true| 1| 3| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.005+08:00| true| 1| 3| 1.0| 1.0| 1test1| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +``` + +SQL for query: + +```sql +select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +| Time|change_points(root.testChangePoints.d1.s1)|change_points(root.testChangePoints.d1.s2)|change_points(root.testChangePoints.d1.s3)|change_points(root.testChangePoints.d1.s4)|change_points(root.testChangePoints.d1.s5)|change_points(root.testChangePoints.d1.s6)| ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.002+08:00| null| 2| 2| 2.0| null| 2test2| +|1970-01-01T08:00:00.003+08:00| false| 1| null| 1.0| null| null| +|1970-01-01T08:00:00.004+08:00| true| null| 3| null| null| 1test1| ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/User-Defined-Function.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/User-Defined-Function.md new file mode 100644 index 00000000..94de878a --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/User-Defined-Function.md @@ -0,0 +1,658 @@ + + + +# User Defined Function (UDF) + +IoTDB provides a variety of built-in functions to meet your computing needs, and you can also create user defined functions to meet more computing needs. + +This document describes how to write, register and use a UDF. 
+
+
+## UDF Types
+
+In IoTDB, you can develop two types of UDFs:
+
+| UDF Class                                          | Description                                                  |
+| -------------------------------------------------- | ------------------------------------------------------------ |
+| UDTF(User Defined Timeseries Generating Function)  | This type of function can take **multiple** time series as input, and output **one** time series, which can have any number of data points. |
+| UDAF(User Defined Aggregation Function)            | Under development, please stay tuned.                        |
+
+
+
+## UDF Development Dependencies
+
+If you use [Maven](http://search.maven.org/), you can search for the development dependencies listed below from the [Maven repository](http://search.maven.org/). Please note that you must select the same dependency version as the target IoTDB server version for development.
+
+``` xml
+<dependency>
+  <groupId>org.apache.iotdb</groupId>
+  <artifactId>udf-api</artifactId>
+  <version>1.0.0</version>
+  <scope>provided</scope>
+</dependency>
+```
+
+
+
+## UDTF(User Defined Timeseries Generating Function)
+
+To write a UDTF, you need to inherit the `org.apache.iotdb.udf.api.UDTF` class, and at least implement the `beforeStart` method and a `transform` method.
+
+The following table shows all the interfaces available for user implementation.
+
+| Interface definition | Description | Required to Implement |
+| :------------------- | :---------- | --------------------- |
+| `void validate(UDFParameterValidator validator) throws Exception` | This method is mainly used to validate `UDFParameters` and it is executed before `beforeStart(UDFParameters, UDTFConfigurations)` is called. | Optional |
+| `void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception` | The initialization method to call the user-defined initialization behavior before a UDTF processes the input data. Every time a user executes a UDTF query, the framework will construct a new UDF instance, and `beforeStart` will be called. | Required |
+| `void transform(Row row, PointCollector collector) throws Exception` | This method is called by the framework. This data processing method will be called when you choose to use the `RowByRowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `Row`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method |
+| `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | This method is called by the framework. This data processing method will be called when you choose to use the `SlidingSizeWindowAccessStrategy` or `SlidingTimeWindowAccessStrategy` strategy (set in `beforeStart`) to consume raw data. Input data is passed in by `RowWindow`, and the transformation result should be output by `PointCollector`. You need to call the data collection method provided by `collector` to determine the output data. | Required to implement at least one `transform` method |
+| `void terminate(PointCollector collector) throws Exception` | This method is called by the framework. This method will be called once after all `transform` calls have been executed. In a single UDF query, this method will be called once and only once. You need to call the data collection method provided by `collector` to determine the output data.
| Optional | +| `void beforeDestroy() ` | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | + +In the life cycle of a UDTF instance, the calling sequence of each method is as follows: + +1. `void validate(UDFParameterValidator validator) throws Exception` +2. `void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception` +3. `void transform(Row row, PointCollector collector) throws Exception` or `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` +4. `void terminate(PointCollector collector) throws Exception` +5. `void beforeDestroy() ` + +Note that every time the framework executes a UDTF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDTF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDTF without considering the influence of concurrency and other factors. + +The usage of each interface will be described in detail below. + + + +### void validate(UDFParameterValidator validator) throws Exception + +The `validate` method is used to validate the parameters entered by the user. + +In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification. + +Please refer to the Javadoc for the usage of `UDFParameterValidator`. + + + +### void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception + +This method is mainly used to customize UDTF. In this method, the user can do the following things: + +1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. +2. Set the strategy to access the raw data and set the output data type in UDTFConfigurations. +3. Create resources, such as establishing external connections, opening files, etc. + + + + +#### UDFParameters + +`UDFParameters` is used to parse UDF parameters in SQL statements (the part in parentheses after the UDF function name in SQL). The input parameters have two parts. The first part is data types of the time series that the UDF needs to process, and the second part is the key-value pair attributes for customization. Only the second part can be empty. + + +Example: + +``` sql +SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; +``` + +Usage: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + String stringValue = parameters.getString("key1"); // iotdb + Float floatValue = parameters.getFloat("key2"); // 123.45 + Double doubleValue = parameters.getDouble("key3"); // null + int intValue = parameters.getIntOrDefault("key4", 678); // 678 + // do something + + // configurations + // ... +} +``` + + + +#### UDTFConfigurations + +You must use `UDTFConfigurations` to specify the strategy used by UDF to access raw data and the type of output sequence. + +Usage: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + // ... 
+ + // configurations + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(Type.INT32); +} +``` + +The `setAccessStrategy` method is used to set the UDF's strategy for accessing the raw data, and the `setOutputDataType` method is used to set the data type of the output sequence. + + + +##### setAccessStrategy + +Note that the raw data access strategy you set here determines which `transform` method the framework will call. Please implement the `transform` method corresponding to the raw data access strategy. Of course, you can also dynamically decide which strategy to set based on the attribute parameters parsed by `UDFParameters`. Therefore, two `transform` methods are also allowed to be implemented in one UDF. + +The following are the strategies you can set: + +| Interface definition | Description | The `transform` Method to Call | +| :-------------------------------- |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------------------ | +| `RowByRowAccessStrategy` | Process raw data row by row. The framework calls the `transform` method once for each row of raw data input. When UDF has only one input sequence, a row of input is one data point in the input sequence. When UDF has multiple input sequences, one row of input is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(Row row, PointCollector collector) throws Exception` | +| `SlidingTimeWindowAccessStrategy` | Process a batch of data in a fixed time interval each time. We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | +| `SlidingSizeWindowAccessStrategy` | The raw data is processed batch by batch, and each batch contains a fixed number of raw data rows (except the last batch). We call the container of a data batch a window. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | +| `SessionTimeWindowAccessStrategy` | The raw data is processed batch by batch. We call the container of a data batch a window. The time interval between each two windows is greater than or equal to the `sessionGap` given by the user. 
The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window, and each row is a result record of the raw query (aligned by time) on these input sequences. (In a row, there may be a column with a value of `null`, but not all of them are `null`) | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | +| `StateWindowAccessStrategy` | The raw data is processed batch by batch. We call the container of a data batch a window. In the state window, for text type or boolean type data, each value of the point in window is equal to the value of the first point in the window, and for numerical data, the distance between each value of the point in window and the value of the first point in the window is less than the threshold `delta` given by the user. The framework calls the `transform` method once for each raw data input window. There may be multiple rows of data in a window. Currently, we only support state window for one measurement, that is, a column of data. | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | + + +`RowByRowAccessStrategy`: The construction of `RowByRowAccessStrategy` does not require any parameters. + +The `SlidingTimeWindowAccessStrategy` is shown schematically below. + + +`SlidingTimeWindowAccessStrategy`: `SlidingTimeWindowAccessStrategy` has many constructors, you can pass 3 types of parameters to them: + +- Parameter 1: The display window on the time axis +- Parameter 2: Time interval for dividing the time axis (should be positive) +- Parameter 3: Time sliding step (not required to be greater than or equal to the time interval, but must be a positive number) + +The first type of parameters are optional. If the parameters are not provided, the beginning time of the display window will be set to the same as the minimum timestamp of the query result set, and the ending time of the display window will be set to the same as the maximum timestamp of the query result set. + +The sliding step parameter is also optional. If the parameter is not provided, the sliding step will be set to the same as the time interval for dividing the time axis. + +The relationship between the three types of parameters can be seen in the figure below. Please see the Javadoc for more details. + +
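+
+To make the three kinds of parameters concrete, here is a minimal sketch of how the strategy might be constructed inside `beforeStart`. The single- and two-argument constructors are assumed from the optional parameters described above; the four-argument form is the one used by the `Counter` example later in this document.
+
+```java
+void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
+  // Sketch only: overloads other than the four-argument constructor are assumed
+  // from the optional parameters described in the text above.
+
+  // Time interval only: the display window defaults to the time range of the query
+  // result set, and the sliding step defaults to the time interval.
+  configurations.setAccessStrategy(new SlidingTimeWindowAccessStrategy(10000));
+
+  // Time interval + sliding step: 10 s windows whose start times are 5 s apart (overlapping).
+  // configurations.setAccessStrategy(new SlidingTimeWindowAccessStrategy(10000, 5000));
+
+  // Time interval + sliding step + display window begin/end: fully specified windows.
+  // configurations.setAccessStrategy(
+  //     new SlidingTimeWindowAccessStrategy(10000, 5000, 0, 100000));
+}
+```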
+ +Note that the actual time interval of some of the last time windows may be less than the specified time interval parameter. In addition, there may be cases where the number of data rows in some time windows is 0. In these cases, the framework will also call the `transform` method for the empty windows. + +The `SlidingSizeWindowAccessStrategy` is shown schematically below. + + +`SlidingSizeWindowAccessStrategy`: `SlidingSizeWindowAccessStrategy` has many constructors, you can pass 2 types of parameters to them: + +* Parameter 1: Window size. This parameter specifies the number of data rows contained in a data processing window. Note that the number of data rows in some of the last time windows may be less than the specified number of data rows. +* Parameter 2: Sliding step. This parameter means the number of rows between the first point of the next window and the first point of the current window. (This parameter is not required to be greater than or equal to the window size, but must be a positive number) + +The sliding step parameter is optional. If the parameter is not provided, the sliding step will be set to the same as the window size. + +The `SessionTimeWindowAccessStrategy` is shown schematically below. **Time intervals less than or equal to the given minimum time interval `sessionGap` are assigned in one group** + + +`SessionTimeWindowAccessStrategy`: `SessionTimeWindowAccessStrategy` has many constructors, you can pass 2 types of parameters to them: +- Parameter 1: The display window on the time axis. +- Parameter 2: The minimum time interval `sessionGap` of two adjacent windows. + + +The `StateWindowAccessStrategy` is shown schematically below. **For numerical data, if the state difference is less than or equal to the given threshold `delta`, it will be assigned in one group. ** + + +`StateWindowAccessStrategy` has four constructors. +- Constructor 1: For numerical data, there are 3 parameters: the time axis can display the start and end time of the time window and the threshold `delta` for the allowable change within a single window. +- Constructor 2: For text data and boolean data, there are 3 parameters: the time axis can be provided to display the start and end time of the time window. For both data types, the data within a single window is same, and there is no need to provide an allowable change threshold. +- Constructor 3: For numerical data, there are 1 parameters: you can only provide the threshold delta that is allowed to change within a single window. The start time of the time axis display time window will be defined as the smallest timestamp in the entire query result set, and the time axis display time window end time will be defined as The largest timestamp in the entire query result set. +- Constructor 4: For text data and boolean data, you can provide no parameter. The start and end timestamps are explained in Constructor 3. + +StateWindowAccessStrategy can only take one column as input for now. + +Please see the Javadoc for more details. + + + +##### setOutputDataType + +Note that the type of output sequence you set here determines the type of data that the `PointCollector` can actually receive in the `transform` method. 
The relationship between the output data type set in `setOutputDataType` and the actual data output type that `PointCollector` can receive is as follows:
+
+| Output Data Type Set in `setOutputDataType` | Data Type that `PointCollector` Can Receive |
+| :------------------------------------------ | :------------------------------------------------------------ |
+| `INT32`                                     | `int`                                                          |
+| `INT64`                                     | `long`                                                         |
+| `FLOAT`                                     | `float`                                                        |
+| `DOUBLE`                                    | `double`                                                       |
+| `BOOLEAN`                                   | `boolean`                                                      |
+| `TEXT`                                      | `java.lang.String` and `org.apache.iotdb.udf.api.type.Binary`  |
+
+The type of output time series of a UDTF is determined at runtime, which means that a UDTF can dynamically determine the type of the output time series according to the type of the input time series.
+Here is a simple example:
+
+```java
+void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception {
+  // do something
+  // ...
+
+  configurations
+    .setAccessStrategy(new RowByRowAccessStrategy())
+    .setOutputDataType(parameters.getDataType(0));
+}
+```
+
+
+
+### void transform(Row row, PointCollector collector) throws Exception
+
+You need to implement this method when you set the UDF's raw data access strategy to `RowByRowAccessStrategy`.
+
+This method processes the raw data one row at a time. The raw data is input from `Row` and output by `PointCollector`. You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing.
+
+The following is a complete UDF example that implements the `void transform(Row row, PointCollector collector) throws Exception` method. It is an adder that receives two columns of time series as input. When the two data points in a row are both non-`null`, this UDF outputs their algebraic sum.
+
+``` java
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Adder implements UDTF {
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT64)
+        .setAccessStrategy(new RowByRowAccessStrategy());
+  }
+
+  @Override
+  public void transform(Row row, PointCollector collector) throws Exception {
+    if (row.isNull(0) || row.isNull(1)) {
+      return;
+    }
+    collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1));
+  }
+}
+```
+
+
+
+### void transform(RowWindow rowWindow, PointCollector collector) throws Exception
+
+You need to implement this method when you set the UDF's raw data access strategy to `SlidingTimeWindowAccessStrategy` or `SlidingSizeWindowAccessStrategy`.
+
+This method processes a batch of data in a fixed number of rows or a fixed time interval each time, and we call the container containing this batch of data a window. The raw data is input from `RowWindow` and output by `PointCollector`. `RowWindow` can help you access a batch of `Row`; it provides a set of interfaces for random access and iterative access to this batch of `Row`.
You can output any number of data points in one `transform` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing.
+
+Below is a complete UDF example that implements the `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` method. It is a counter that receives any number of time series as input, and its function is to count and output the number of data rows in each time window within a specified time range.
+
+```java
+import java.io.IOException;
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.access.RowWindow;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Counter implements UDTF {
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT32)
+        .setAccessStrategy(new SlidingTimeWindowAccessStrategy(
+            parameters.getLong("time_interval"),
+            parameters.getLong("sliding_step"),
+            parameters.getLong("display_window_begin"),
+            parameters.getLong("display_window_end")));
+  }
+
+  @Override
+  public void transform(RowWindow rowWindow, PointCollector collector) {
+    if (rowWindow.windowSize() != 0) {
+      collector.putInt(rowWindow.windowStartTime(), rowWindow.windowSize());
+    }
+  }
+}
+```
+
+
+
+### void terminate(PointCollector collector) throws Exception
+
+In some scenarios, a UDF needs to traverse all the original data to calculate the final output data points. The `terminate` interface provides support for those scenarios.
+
+This method is called after all `transform` calls are executed and before the `beforeDestroy` method is executed. You can implement the `transform` method to perform pure data processing (without outputting any data points), and implement the `terminate` method to output the processing results.
+
+The processing results need to be output by the `PointCollector`. You can output any number of data points in one `terminate` method call. It should be noted that the type of output data points must be the same as you set in the `beforeStart` method, and the timestamps of output data points must be strictly monotonically increasing.
+
+Below is a complete UDF example that implements the `void terminate(PointCollector collector) throws Exception` method. It takes one time series whose data type is `INT32` as input, and outputs the maximum value point of the series.
+
+```java
+import java.io.IOException;
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Max implements UDTF {
+
+  private Long time;
+  private int value;
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT32)
+        .setAccessStrategy(new RowByRowAccessStrategy());
+  }
+
+  @Override
+  public void transform(Row row, PointCollector collector) {
+    if (row.isNull(0)) {
+      return;
+    }
+    int candidateValue = row.getInt(0);
+    if (time == null || value < candidateValue) {
+      time = row.getTime();
+      value = candidateValue;
+    }
+  }
+
+  @Override
+  public void terminate(PointCollector collector) throws IOException {
+    if (time != null) {
+      collector.putInt(time, value);
+    }
+  }
+}
+```
+
+
+
+### void beforeDestroy()
+
+The method for terminating a UDF.
+
+This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once.
+
+
+
+## Maven Project Example
+
+If you use Maven, you can build your own UDF project referring to our **udf-example** module. You can find the project [here](https://github.com/apache/iotdb/tree/master/example/udf).
+
+
+
+## UDF Registration
+
+The process of registering a UDF in IoTDB is as follows:
+
+1. Implement a complete UDF class, assuming the full class name of this class is `org.apache.iotdb.udf.ExampleUDTF`.
+2. Package your project into a JAR. If you use Maven to manage your project, you can refer to the Maven project example above.
+3. Make preparations for registration according to the registration mode. For details, see the following example.
+4. You can use the following SQL to register the UDF.
+```sql
+CREATE FUNCTION <UDF-NAME> AS <CLASS-NAME> (USING URI <URI-STRING>)?
+```
+
+### Example: register a UDF named `example`; you can choose either of the following two registration methods
+
+#### No URI
+
+Prepare:
+When using this method to register, put the JAR into the directory `iotdb-server-1.0.0-all-bin/ext/udf` (the directory is configurable).
+**Note: if you use a cluster, the JAR must be placed in this directory on all DataNodes.**
+
+SQL:
+```sql
+CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample'
+```
+
+#### Using URI
+
+Prepare:
+When using this method to register, upload the JAR to a URI server and ensure that the IoTDB instance executing this registration statement has access to the URI server.
+**Note: you do not need to place the JAR manually; IoTDB will download the JAR and sync it.**
+
+SQL:
+```sql
+CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar'
+```
+
+### Note
+Since UDF instances are dynamically loaded through reflection technology, you do not need to restart the server during the UDF registration process.
+
+UDF function names are not case-sensitive.
+
+Please ensure that the function name given to the UDF is different from all built-in function names. A UDF with the same name as a built-in function cannot be registered.
+
+We recommend that you do not use classes that have the same class name but different function logic in different JAR packages.
For example, in `UDF(UDAF/UDTF): udf1, udf2`, the JAR package of udf1 is `udf1.jar` and the JAR package of udf2 is `udf2.jar`. Assume that both JAR packages contain the `org.apache.iotdb.udf.ExampleUDTF` class. If you use two UDFs in the same SQL statement at the same time, the system will randomly load either of them and may cause inconsistency in UDF execution behavior. + +## UDF Deregistration + +The following shows the SQL syntax of how to deregister a UDF. + +```sql +DROP FUNCTION +``` + +Here is an example: + +```sql +DROP FUNCTION example +``` + + + +## UDF Queries + +The usage of UDF is similar to that of built-in aggregation functions. + + + +### Basic SQL syntax support + +* Support `SLIMIT` / `SOFFSET` +* Support `LIMIT` / `OFFSET` +* Support queries with time filters +* Support queries with value filters + + +### Queries with * in SELECT Clauses + +Assume that there are 2 time series (`root.sg.d1.s1` and `root.sg.d1.s2`) in the system. + +* **`SELECT example(*) from root.sg.d1`** + +Then the result set will include the results of `example (root.sg.d1.s1)` and `example (root.sg.d1.s2)`. + +* **`SELECT example(s1, *) from root.sg.d1`** + +Then the result set will include the results of `example(root.sg.d1.s1, root.sg.d1.s1)` and `example(root.sg.d1.s1, root.sg.d1.s2)`. + +* **`SELECT example(*, *) from root.sg.d1`** + +Then the result set will include the results of `example(root.sg.d1.s1, root.sg.d1.s1)`, `example(root.sg.d1.s2, root.sg.d1.s1)`, `example(root.sg.d1.s1, root.sg.d1.s2)` and `example(root.sg.d1.s2, root.sg.d1.s2)`. + + + +### Queries with Key-value Attributes in UDF Parameters + +You can pass any number of key-value pair parameters to the UDF when constructing a UDF query. The key and value in the key-value pair need to be enclosed in single or double quotes. Note that key-value pair parameters can only be passed in after all time series have been passed in. Here is a set of examples: + +``` sql +SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; +SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; +``` + + + +### Nested Queries + +``` sql +SELECT s1, s2, example(s1, s2) FROM root.sg.d1; +SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; +SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; +SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; +``` + + + +## Show All Registered UDFs + +``` sql +SHOW FUNCTIONS +``` + + + +## User Permission Management + +There are 3 types of user permissions related to UDF: + +* `CREATE_FUNCTION`: Only users with this permission are allowed to register UDFs +* `DROP_FUNCTION`: Only users with this permission are allowed to deregister UDFs +* `READ_TIMESERIES`: Only users with this permission are allowed to use UDFs for queries + +For more user permissions related content, please refer to [Account Management Statements](../Administration-Management/Administration.md). + + + +## Configurable Properties + +You can use `udf_lib_dir` to config udf lib directory. +When querying by a UDF, IoTDB may prompt that there is insufficient memory. You can resolve the issue by configuring `udf_initial_byte_array_length_for_memory_control`, `udf_memory_budget_in_mb` and `udf_reader_transformer_collector_memory_proportion` in `iotdb-system.properties` and restarting the server. + + + +## Contribute UDF + + + +This part mainly introduces how external users can contribute their own UDFs to the IoTDB community. + + + +### Prerequisites + +1. 
UDFs must be universal. + + The "universal" mentioned here refers to: UDFs can be widely used in some scenarios. In other words, the UDF function must have reuse value and may be directly used by other users in the community. + + If you are not sure whether the UDF you want to contribute is universal, you can send an email to `dev@iotdb.apache.org` or create an issue to initiate a discussion. + +2. The UDF you are going to contribute has been well tested and can run normally in the production environment. + + + +### What you need to prepare + +1. UDF source code +2. Test cases +3. Instructions + + + +#### UDF Source Code + +1. Create the UDF main class and related classes in `iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin` or in its subfolders. +2. Register your UDF in `iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin/BuiltinTimeSeriesGeneratingFunction.java`. + + + +#### Test Cases + +At a minimum, you need to write integration tests for the UDF. + +You can add a test class in `integration-test/src/test/java/org/apache/iotdb/db/it/udf`. + + + +#### Instructions + +The instructions need to include: the name and the function of the UDF, the attribute parameters that must be provided when the UDF is executed, the applicable scenarios, and the usage examples, etc. + +The instructions should be added in `docs/UserGuide/Operation Manual/DML Data Manipulation Language.md`. + + + +### Submit a PR + +When you have prepared the UDF source code, test cases, and instructions, you are ready to submit a Pull Request (PR) on [Github](https://github.com/apache/iotdb). You can refer to our code contribution guide to submit a PR: [Pull Request Guide](https://iotdb.apache.org/Development/HowToCommit.html). + + + +## Known Implementations + +### Built-in UDF + +1. Aggregate Functions, such as `SUM`. For details and examples, see the document [Aggregate Functions](../Operators-Functions/Aggregation.md). +2. Arithmetic Functions, such as `SIN`. For details and examples, see the document [Arithmetic Operators and Functions](../Operators-Functions/Mathematical.md). +3. Comparison Functions, such as `ON_OFF`. For details and examples, see the document [Comparison Operators and Functions](../Operators-Functions/Comparison.md). +4. String Processing Functions, such as `STRING_CONTAINS`. For details and examples, see the document [String Processing](../Operators-Functions/String.md). +5. Data Type Conversion Function, such as `CAST`. For details and examples, see the document [Data Type Conversion Function](../Operators-Functions/Conversion.md). +6. Constant Timeseries Generating Functions, such as `CONST`. For details and examples, see the document [Constant Timeseries Generating Functions](../Operators-Functions/Constant.md). +7. Selector Functions, such as `TOP_K`. For details and examples, see the document [Selector Functions](../Operators-Functions/Selection.md). +8. Continuous Interval Functions, such as `ZERO_DURATION`. For details and examples, see the document [Continuous Interval Functions](../Operators-Functions/Continuous-Interval.md). +9. Variation Trend Calculation Functions, such as `TIME_DIFFERENCE`. For details and examples, see the document [Variation Trend Calculation Functions](../Operators-Functions/Variation-Trend.md). +10. Sample Functions, such as `M4`. For details and examples, see the document [Sample Functions](../Operators-Functions/Sample.md). +11. Change Points Function, such as `CHANGE_POINTS`. 
For details and examples, see the document [Time-Series](../Operators-Functions/Time-Series.md). + +### Data Quality Function Library + +#### About + +For applications based on time series data, data quality is vital. **UDF Library** is IoTDB User Defined Functions (UDF) about data quality, including data profiling, data quality evalution and data repairing. It effectively meets the demand for data quality in the industrial field. + +#### Quick Start + +The functions in this function library are not built-in functions, and must be loaded into the system before use. + +1. [Download](https://archive.apache.org/dist/iotdb/1.0.1/apache-iotdb-1.0.1-library-udf-bin.zip) the JAR with all dependencies and the script of registering UDF. +2. Copy the JAR package to `ext\udf` under the directory of IoTDB system (Please put JAR to this directory of all DataNodes if you use Cluster). +3. Run `sbin\start-server.bat` (for Windows) or `sbin\start-server.sh` (for Linux or MacOS) to start IoTDB server. +4. Copy the script to the directory of IoTDB system (under the root directory, at the same level as `sbin`), modify the parameters in the script if needed and run it to register UDF. + +#### Functions + +1. Data Quality related functions, such as `Completeness`. For details and examples, see the document [Data-Quality](../Operators-Functions/Data-Quality.md). +2. Data Profiling related functions, such as `ACF`. For details and examples, see the document [Data-Profiling](../Operators-Functions/Data-Profiling.md). +3. Anomaly Detection related functions, such as `IQR`. For details and examples, see the document [Anomaly-Detection](../Operators-Functions/Anomaly-Detection.md). +4. Frequency Domain Analysis related functions, such as `Conv`. For details and examples, see the document [Frequency-Domain](../Operators-Functions/Frequency-Domain.md). +5. Data Matching related functions, such as `DTW`. For details and examples, see the document [Data-Matching](../Operators-Functions/Data-Matching.md). +6. Data Repairing related functions, such as `TimestampRepair`. For details and examples, see the document [Data-Repairing](../Operators-Functions/Data-Repairing.md). +7. Series Discovery related functions, such as `ConsecutiveSequences`. For details and examples, see the document [Series-Discovery](../Operators-Functions/Series-Discovery.md). +8. Machine Learning related functions, such as `AR`. For details and examples, see the document [Machine-Learning](../Operators-Functions/Machine-Learning.md). + + + + +## Q&A + +Q1: How to modify the registered UDF? + +A1: Assume that the name of the UDF is `example` and the full class name is `org.apache.iotdb.udf.ExampleUDTF`, which is introduced by `example.jar`. + +1. Unload the registered function by executing `DROP FUNCTION example`. +2. Delete `example.jar` under `iotdb-server-1.0.0-all-bin/ext/udf`. +3. Modify the logic in `org.apache.iotdb.udf.ExampleUDTF` and repackage it. The name of the JAR package can still be `example.jar`. +4. Upload the new JAR package to `iotdb-server-1.0.0-all-bin/ext/udf`. +5. Load the new UDF by executing `CREATE FUNCTION example AS "org.apache.iotdb.udf.ExampleUDTF"`. 
diff --git a/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Variation-Trend.md b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Variation-Trend.md new file mode 100644 index 00000000..5f5300f8 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Variation-Trend.md @@ -0,0 +1,114 @@ + + +# Variation Trend Calculation Functions + +Currently, IoTDB supports the following variation trend calculation functions: + +| Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | +| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | +| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | / | INT64 | Calculates the difference between the time stamp of a data point and the time stamp of the previous data point. There is no corresponding output for the first data point. | +| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. | +| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Calculates the absolute value of the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point. | +| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the rate of change of a data point compared to the previous data point, the result is equals to DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | +| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the absolute value of the rate of change of a data point compared to the previous data point, the result is equals to NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | +| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:optional,default is true. If is true, the previous data point is ignored when it is null and continues to find the first non-null value forwardly. If the value is false, previous data point is not ignored when it is null, the result is also null because null is used for subtraction | DOUBLE | Calculates the difference between the value of a data point and the value of the previous data point. 
There is no corresponding output for the first data point, so output is null | + +Example: + +``` sql +select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; +``` + +Result: + +``` ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +| Time| root.sg1.d1.s1|time_difference(root.sg1.d1.s1)|difference(root.sg1.d1.s1)|non_negative_difference(root.sg1.d1.s1)|derivative(root.sg1.d1.s1)|non_negative_derivative(root.sg1.d1.s1)| ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +|2020-12-10T17:11:49.037+08:00|7360723084922759782| 1| -8431715764844238876| 8431715764844238876| -8.4317157648442388E18| 8.4317157648442388E18| +|2020-12-10T17:11:49.038+08:00|4377791063319964531| 1| -2982932021602795251| 2982932021602795251| -2.982932021602795E18| 2.982932021602795E18| +|2020-12-10T17:11:49.039+08:00|7972485567734642915| 1| 3594694504414678384| 3594694504414678384| 3.5946945044146785E18| 3.5946945044146785E18| +|2020-12-10T17:11:49.040+08:00|2508858212791964081| 1| -5463627354942678834| 5463627354942678834| -5.463627354942679E18| 5.463627354942679E18| +|2020-12-10T17:11:49.041+08:00|2817297431185141819| 1| 308439218393177738| 308439218393177738| 3.0843921839317773E17| 3.0843921839317773E17| ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +Total line number = 5 +It costs 0.014s +``` + +## Example + +### RawData + +``` ++-----------------------------+------------+------------+ +| Time|root.test.s1|root.test.s2| ++-----------------------------+------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1| 1.0| +|1970-01-01T08:00:00.002+08:00| 2| null| +|1970-01-01T08:00:00.003+08:00| null| 3.0| +|1970-01-01T08:00:00.004+08:00| 4| null| +|1970-01-01T08:00:00.005+08:00| 5| 5.0| +|1970-01-01T08:00:00.006+08:00| null| 6.0| ++-----------------------------+------------+------------+ +``` + +### Not use `ignoreNull` attribute (Ignore Null) + +SQL: +```sql +SELECT DIFF(s1), DIFF(s2) from root.test; +``` + +Result: +``` ++-----------------------------+------------------+------------------+ +| Time|DIFF(root.test.s1)|DIFF(root.test.s2)| ++-----------------------------+------------------+------------------+ +|1970-01-01T08:00:00.001+08:00| null| null| +|1970-01-01T08:00:00.002+08:00| 1.0| null| +|1970-01-01T08:00:00.003+08:00| null| 2.0| +|1970-01-01T08:00:00.004+08:00| 2.0| null| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| null| 1.0| ++-----------------------------+------------------+------------------+ +``` + +### Use `ignoreNull` attribute + +SQL: +```sql +SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root.test; +``` + +Result: +``` ++-----------------------------+----------------------------------------+----------------------------------------+ +| Time|DIFF(root.test.s1, "ignoreNull"="false")|DIFF(root.test.s2, "ignoreNull"="false")| ++-----------------------------+----------------------------------------+----------------------------------------+ 
+|1970-01-01T08:00:00.001+08:00| null| null| +|1970-01-01T08:00:00.002+08:00| 1.0| null| +|1970-01-01T08:00:00.003+08:00| null| null| +|1970-01-01T08:00:00.004+08:00| null| null| +|1970-01-01T08:00:00.005+08:00| 1.0| null| +|1970-01-01T08:00:00.006+08:00| null| 1.0| ++-----------------------------+----------------------------------------+----------------------------------------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Performance.md b/src/UserGuide/V2.0.1/Tree/stage/Performance.md new file mode 100644 index 00000000..a428d141 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Performance.md @@ -0,0 +1,38 @@ + + +# Performance + +This chapter introduces the performance characteristics of IoTDB from the perspectives of database connection, database read and write performance, and storage performance. +The test tool uses IoTDBBenchmark, an open source time series database benchmark tool. + +## Database connection + +- Support high concurrent connections, a single server can support tens of thousands of concurrent connections per second. + + +## Read and write performance + +- It has the characteristics of high write throughput, a single core can handle more than tens of thousands of write requests per second, and the write performance of a single server can reach tens of millions of points per second; the cluster can be linearly scaled, and the write performance of the cluster can reach hundreds of millions points/second. +- It has the characteristics of high query throughput and low query latency, a single server supports tens of millions of points/second query throughput, and can aggregate tens of billions of data points in milliseconds. +- +## Storage performance + +- Supports the storage of massive data, with the storage and processing capabilities of PB-level data. +- Support high compression ratio, lossless compression can reach 20 times compression ratio, lossy compression can reach 100 times compression ratio. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Programming-Thrift.md b/src/UserGuide/V2.0.1/Tree/stage/Programming-Thrift.md new file mode 100644 index 00000000..0b200cb2 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Programming-Thrift.md @@ -0,0 +1,157 @@ + + +# Communication Service protocol + +## Thrift rpc interface + +### introduction + +Thrift is a remote procedure call software framework for the development of extensible and cross-language services. +It combines a powerful software stack and code generation engine, +In order to build seamlessly integrated and efficient services among programming languages ​​such as C++, Java, Go, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml. + +IoTDB server and client use thrift for communication. In actual use, it is recommended to use the native client package provided by IoTDB: +Session or Session Pool. 
If you have special needs, you can also program directly against the RPC interface + +The default IoTDB server uses port 6667 as the RPC communication port, you can modify the configuration item +``` +rpc_port=6667 +``` +to change the default thrift port + + +### rpc interface + +``` +// open a session +TSOpenSessionResp openSession(1:TSOpenSessionReq req); + +// close a session +TSStatus closeSession(1:TSCloseSessionReq req); + +// run an SQL statement in batch +TSExecuteStatementResp executeStatement(1:TSExecuteStatementReq req); + +// execute SQL statement in batch +TSStatus executeBatchStatement(1:TSExecuteBatchStatementReq req); + +// execute query SQL statement +TSExecuteStatementResp executeQueryStatement(1:TSExecuteStatementReq req); + +// execute insert, delete and update SQL statement +TSExecuteStatementResp executeUpdateStatement(1:TSExecuteStatementReq req); + +// fetch next query result +TSFetchResultsResp fetchResults(1:TSFetchResultsReq req) + +// fetch meta data +TSFetchMetadataResp fetchMetadata(1:TSFetchMetadataReq req) + +// cancel a query +TSStatus cancelOperation(1:TSCancelOperationReq req); + +// close a query dataset +TSStatus closeOperation(1:TSCloseOperationReq req); + +// get time zone +TSGetTimeZoneResp getTimeZone(1:i64 sessionId); + +// set time zone +TSStatus setTimeZone(1:TSSetTimeZoneReq req); + +// get server's properties +ServerProperties getProperties(); + +// CREATE DATABASE +TSStatus setStorageGroup(1:i64 sessionId, 2:string storageGroup); + +// create timeseries +TSStatus createTimeseries(1:TSCreateTimeseriesReq req); + +// create multi timeseries +TSStatus createMultiTimeseries(1:TSCreateMultiTimeseriesReq req); + +// delete timeseries +TSStatus deleteTimeseries(1:i64 sessionId, 2:list path) + +// delete sttorage groups +TSStatus deleteStorageGroups(1:i64 sessionId, 2:list storageGroup); + +// insert record +TSStatus insertRecord(1:TSInsertRecordReq req); + +// insert record in string format +TSStatus insertStringRecord(1:TSInsertStringRecordReq req); + +// insert tablet +TSStatus insertTablet(1:TSInsertTabletReq req); + +// insert tablets in batch +TSStatus insertTablets(1:TSInsertTabletsReq req); + +// insert records in batch +TSStatus insertRecords(1:TSInsertRecordsReq req); + +// insert records of one device +TSStatus insertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// insert records in batch as string format +TSStatus insertStringRecords(1:TSInsertStringRecordsReq req); + +// test the latency of innsert tablet,caution:no data will be inserted, only for test latency +TSStatus testInsertTablet(1:TSInsertTabletReq req); + +// test the latency of innsert tablets,caution:no data will be inserted, only for test latency +TSStatus testInsertTablets(1:TSInsertTabletsReq req); + +// test the latency of innsert record,caution:no data will be inserted, only for test latency +TSStatus testInsertRecord(1:TSInsertRecordReq req); + +// test the latency of innsert record in string format,caution:no data will be inserted, only for test latency +TSStatus testInsertStringRecord(1:TSInsertStringRecordReq req); + +// test the latency of innsert records,caution:no data will be inserted, only for test latency +TSStatus testInsertRecords(1:TSInsertRecordsReq req); + +// test the latency of innsert records of one device,caution:no data will be inserted, only for test latency +TSStatus testInsertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// test the latency of innsert records in string formate,caution:no data will be inserted, only for 
test latency
+TSStatus testInsertStringRecords(1:TSInsertStringRecordsReq req);
+
+// delete data
+TSStatus deleteData(1:TSDeleteDataReq req);
+
+// execute raw data query
+TSExecuteStatementResp executeRawDataQuery(1:TSRawDataQueryReq req);
+
+// request a statement id from server
+i64 requestStatementId(1:i64 sessionId);
+```
+
+### IDL file path
+The IDL file path is "thrift/src/main/thrift/rpc.thrift", which contains the interface and struct definitions.
+
+### target file path
+Thrift compiles the IDL file during Maven compilation and generates the corresponding .class files.
+The target file path is "thrift/target/classes/org/apache/iotdb/service/rpc/thrift"
+
diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Align-By.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Align-By.md
new file mode 100644
index 00000000..ae117d20
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Align-By.md
@@ -0,0 +1,62 @@
+
+
+# Query Alignment Mode
+
+In addition, IoTDB supports another result set format: `ALIGN BY DEVICE`.
+
+## align by device
+
+`ALIGN BY DEVICE` indicates that the device ID is treated as a column, so the result set contains a fixed, limited number of columns.
+
+> NOTE:
+>
+> 1. You can think of the result of 'align by device' as one relational table whose primary key is `Time + Device`.
+>
+> 2. The result is ordered by `Device` first, and then by `Time`.
+
+The SQL statement is:
+
+```sql
+select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device;
+```
+
+The result is shown below:
+
+```
++-----------------------------+-----------------+-----------+------+--------+
+|                         Time|           Device|temperature|status|hardware|
++-----------------------------+-----------------+-----------+------+--------+
+|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01|      25.96|  true|    null|
+|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01|      24.36|  true|    null|
+|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02|       null|  true|      v1|
+|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02|       null| false|      v2|
+|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02|       null|  true|      v2|
+|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02|       null|  true|      v2|
++-----------------------------+-----------------+-----------+------+--------+
+Total line number = 6
+It costs 0.012s
+```
+
+## Ordering in ALIGN BY DEVICE
+
+ALIGN BY DEVICE mode orders the result by device first, and sorts the rows of each device by timestamp in ascending order. The ordering and priority can be adjusted through the `ORDER BY` clause.
+
+For details and examples, see the document [Order By](./Order-By.md).
diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Continuous-Query.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Continuous-Query.md
new file mode 100644
index 00000000..a31216c2
--- /dev/null
+++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Continuous-Query.md
@@ -0,0 +1,581 @@
+
+
+# Continuous Query(CQ)
+
+## Introduction
+Continuous queries (CQ) are queries that run automatically and periodically on real-time data and store the query results in other specified time series.
+
+## Syntax
+
+```sql
+CREATE (CONTINUOUS QUERY | CQ) <cq_id>
+[RESAMPLE
+  [EVERY <every_interval>]
+  [BOUNDARY <execution_boundary_time>]
+  [RANGE <start_time_offset>[, end_time_offset]]
+]
+[TIMEOUT POLICY BLOCKED|DISCARD]
+BEGIN
+  SELECT CLAUSE
+  INTO CLAUSE
+  FROM CLAUSE
+  [WHERE CLAUSE]
+  [GROUP BY(<group_by_interval>[, <sliding_step>]) [, level = <level>]]
+  [HAVING CLAUSE]
+  [FILL {PREVIOUS | LINEAR | constant}]
+  [LIMIT rowLimit OFFSET rowOffset]
+  [ALIGN BY DEVICE]
+END
+```
+> Note:
+> 1.
If there exists any time filter in the WHERE clause, IoTDB will throw an error, because IoTDB automatically generates a time range for the query each time it is executed.
> 2. The GROUP BY TIME clause is different: it does not contain its usual first parameter, the display window `[start_time, end_time)`. Again, this is because IoTDB automatically generates a time range for the query each time it is executed.
> 3. If there is no GROUP BY TIME clause in the query, the EVERY clause is required, otherwise IoTDB will throw an error.

### Descriptions of parameters in CQ syntax

- `<cq_id>` specifies the globally unique id of the CQ.
- `<every_interval>` specifies the query execution time interval. We currently support the units ns, us, ms, s, m, h, d, w, and its value should not be lower than the minimum threshold configured by the user, `continuous_query_min_every_interval`. It is an optional parameter; the default value is set to the `group_by_interval` in the GROUP BY clause.
- `<start_time_offset>` specifies the start time of each query execution as `now()-<start_time_offset>`. We currently support the units ns, us, ms, s, m, h, d, w. It is an optional parameter; the default value is set to `every_interval` in the RESAMPLE clause.
- `<end_time_offset>` specifies the end time of each query execution as `now()-<end_time_offset>`. We currently support the units ns, us, ms, s, m, h, d, w. It is an optional parameter; the default value is set to `0`.
- `<execution_boundary_time>` is a date that represents the execution time of a certain CQ task.
  - `<execution_boundary_time>` can be earlier than, equal to, or later than the **current time**.
  - This parameter is optional. If not specified, it is equal to `BOUNDARY 0`.
  - **The start time of the first time window** is `<execution_boundary_time> - <start_time_offset>`.
  - **The end time of the first time window** is `<execution_boundary_time> - <end_time_offset>`.
  - The **time range** of the `i (1 <= i)`-th window is `[<execution_boundary_time> - <start_time_offset> + (i - 1) * <every_interval>, <execution_boundary_time> - <end_time_offset> + (i - 1) * <every_interval>)`.
  - If the **current time** is earlier than or equal to `<execution_boundary_time>`, then the first execution moment of the continuous query is `<execution_boundary_time>`.
  - If the **current time** is later than `<execution_boundary_time>`, then the first execution moment of the continuous query is the first `<execution_boundary_time> + i * <every_interval>` that is later than or equal to the current time.

> - `<every_interval>`, `<start_time_offset>` and `<end_time_offset>` should all be greater than `0`.
> - The value of `<end_time_offset>` should be less than or equal to the value of `<start_time_offset>`, otherwise the system will throw an error.
> - Users should specify the appropriate `<start_time_offset>` and `<every_interval>` according to actual needs.
> - If `<start_time_offset>` is greater than `<every_interval>`, there will be partial data overlap between query windows.
> - If `<start_time_offset>` is less than `<every_interval>`, there may be uncovered data between query windows.
> - `<start_time_offset>` should be larger than `<end_time_offset>`, otherwise the system will throw an error.

#### `<start_time_offset>` == `<every_interval>`

![1](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic1.png?raw=true)

#### `<start_time_offset>` > `<every_interval>`

![2](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic2.png?raw=true)

#### `<start_time_offset>` < `<every_interval>`

![3](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic3.png?raw=true)

#### `<end_time_offset>` is not zero

![4](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic4.png?raw=true)

- `TIMEOUT POLICY` specifies how to handle a CQ task whose previous execution has not finished when the next execution time arrives. The default value is `BLOCKED`.
  - `BLOCKED` means that we block and wait to run the current CQ execution task until the previous time interval CQ task finishes. 
If using `BLOCKED` policy, all the time intervals will be executed, but it may be behind the latest time interval. + - `DISCARD` means that we just discard the current cq execution task and wait for the next execution time and do the next time interval cq task. If using `DISCARD` policy, some time intervals won't be executed when the execution time of one cq task is longer than the ``. However, once a cq task is executed, it will use the latest time interval, so it can catch up at the sacrifice of some time intervals being discarded. + + +## Examples of CQ + +The examples below use the following sample data. It's a real time data stream and we can assume that the data arrives on time. +```` ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +| Time|root.ln.wf02.wt02.temperature|root.ln.wf02.wt01.temperature|root.ln.wf01.wt02.temperature|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +|2021-05-11T22:18:14.598+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:19.941+08:00| 0.0| 68.0| 68.0| 103.0| +|2021-05-11T22:18:24.949+08:00| 122.0| 45.0| 11.0| 14.0| +|2021-05-11T22:18:29.967+08:00| 47.0| 14.0| 59.0| 181.0| +|2021-05-11T22:18:34.979+08:00| 182.0| 113.0| 29.0| 180.0| +|2021-05-11T22:18:39.990+08:00| 42.0| 11.0| 52.0| 19.0| +|2021-05-11T22:18:44.995+08:00| 78.0| 38.0| 123.0| 52.0| +|2021-05-11T22:18:49.999+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:55.003+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +```` + +### Configuring execution intervals + +Use an `EVERY` interval in the `RESAMPLE` clause to specify the CQ’s execution interval, if not specific, default value is equal to `group_by_interval`. + +```sql +CREATE CONTINUOUS QUERY cq1 +RESAMPLE EVERY 20s +BEGIN +SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) +END +``` + +`cq1` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. + +`cq1` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq1` runs a single query that covers the time range for the current time bucket, that is, the 20-second time bucket that intersects with `now()`. + +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq1` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. 
+`cq1` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. +`cq1` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +`cq1` won't deal with data that is before the current time window which is `2021-05-11T22:18:20.000+08:00`, so here are the results: +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### Configuring time range for resampling + +Use `start_time_offset` in the `RANGE` clause to specify the start time of the CQ’s time range, if not specific, default value is equal to `EVERY` interval. + +```sql +CREATE CONTINUOUS QUERY cq2 +RESAMPLE RANGE 40s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) +END +``` + +`cq2` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. 
+ +`cq2` executes at 10-second intervals, the same interval as the `group_by_interval`. Every 10 seconds, `cq2` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()` , that is, the time range between 40 seconds prior to `now()` and `now()`. + +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq2` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. +`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| NULL| NULL| NULL| NULL| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:18:50.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:10, 2021-05-11T22:18:50)`. +`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. 
+`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +`cq2` won't write lines that are all null. Notice `cq2` will also calculate the results for some time interval many times. Here are the results: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### Configuring execution intervals and CQ time ranges + +Use an `EVERY` interval and `RANGE` interval in the `RESAMPLE` clause to specify the CQ’s execution interval and the length of the CQ’s time range. And use `fill()` to change the value reported for time intervals with no data. + +```sql +CREATE CONTINUOUS QUERY cq3 +RESAMPLE EVERY 20s RANGE 40s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) + FILL(100.0) +END +``` + +`cq3` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. Where possible, it writes the value `100.0` for time intervals with no results. + +`cq3` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq3` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()`, that is, the time range between 40 seconds prior to `now()` and `now()`. 
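Conceptually, every execution of `cq3` just runs its inner query with an automatically generated time range. As a rough, hand-written sketch (for illustration only; the CQ issues this internally and you do not need to run it), the execution happening at `2021-05-11T22:18:40` behaves like the following one-off statement with the `[now() - 40s, now())` window written out explicitly:

```sql
-- Illustrative equivalent of a single cq3 execution,
-- assuming the generated window is [2021-05-11T22:18:00, 2021-05-11T22:18:40)
SELECT max_value(temperature)
  INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max)
  FROM root.ln.*.*
  GROUP BY ([2021-05-11T22:18:00, 2021-05-11T22:18:40), 10s)
  FILL(100.0)
```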
+ +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq3` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. +`cq3` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. +`cq3` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +Notice that `cq3` will calculate the results for some time interval many times, so here are the results: +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### Configuring end_time_offset for CQ time range + +Use an `EVERY` interval and `RANGE` 
interval in the RESAMPLE clause to specify the CQ’s execution interval and the length of the CQ’s time range. And use `fill()` to change the value reported for time intervals with no data. + +```sql +CREATE CONTINUOUS QUERY cq4 +RESAMPLE EVERY 20s RANGE 40s, 20s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) + FILL(100.0) +END +``` + +`cq4` calculates the 10-second average of `temperature` sensor under the `root.ln` prefix path and stores the results in the `temperature_max` sensor using the same prefix path as the corresponding sensor. Where possible, it writes the value `100.0` for time intervals with no results. + +`cq4` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq4` runs a single query that covers the time range between `now()` minus the `start_time_offset` and `now()` minus the `end_time_offset`, that is, the time range between 40 seconds prior to `now()` and 20 seconds prior to `now()`. + +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq4` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:20)`. +`cq4` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. 
+`cq4` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +Notice that `cq4` will calculate the results for all time intervals only once after a delay of 20 seconds, so here are the results: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### CQ without group by clause + +Use an `EVERY` interval in the `RESAMPLE` clause to specify the CQ’s execution interval and the length of the CQ’s time range. + +```sql +CREATE CONTINUOUS QUERY cq5 +RESAMPLE EVERY 20s +BEGIN + SELECT temperature + 1 + INTO root.precalculated_sg.::(temperature) + FROM root.ln.*.* + align by device +END +``` + +`cq5` calculates the `temperature + 1` under the `root.ln` prefix path and stores the results in the `root.precalculated_sg` database. Sensors use the same prefix path as the corresponding sensor. + +`cq5` executes at 20-second intervals, the same interval as the `EVERY` interval. Every 20 seconds, `cq5` runs a single query that covers the time range for the current time bucket, that is, the 20-second time bucket that intersects with `now()`. + +Supposing that the current time is `2021-05-11T22:18:40.000+08:00`, we can see annotated log output about `cq5` running at DataNode if you set log level to DEBUG: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. 
+`cq5` generate 16 lines: +> ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| ++-----------------------------+-------------------------------+-----------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. +`cq5` generate 12 lines: +> ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| ++-----------------------------+-------------------------------+-----------+ +> +```` + +`cq5` won't deal with data that is before the current time window which is `2021-05-11T22:18:20.000+08:00`, so here are the results: + +```` +> SELECT temperature from root.precalculated_sg.*.* align by device; ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| 
+|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| ++-----------------------------+-------------------------------+-----------+ +```` + +## CQ Management + +### Listing continuous queries + +List every CQ on the IoTDB Cluster with: + +```sql +SHOW (CONTINUOUS QUERIES | CQS) +``` + +`SHOW (CONTINUOUS QUERIES | CQS)` order results by `cq_id`. + +#### Examples + +```sql +SHOW CONTINUOUS QUERIES; +``` + +we will get: + +| cq_id | query | state | +|:-------------|---------------------------------------------------------------------------------------------------------------------------------------|-------| +| s1_count_cq | CREATE CQ s1_count_cq
BEGIN
SELECT count(s1)
INTO root.sg_count.d.count_s1
FROM root.sg.d
GROUP BY(30m)
END | active | + + +### Dropping continuous queries + +Drop a CQ with a specific `cq_id`: + +```sql +DROP (CONTINUOUS QUERY | CQ) +``` + +DROP CQ returns an empty result. + +#### Examples + +Drop the CQ named `s1_count_cq`: + +```sql +DROP CONTINUOUS QUERY s1_count_cq; +``` + +### Altering continuous queries + +CQs can't be altered once they're created. To change a CQ, you must `DROP` and re`CREATE` it with the updated settings. + + +## CQ Use Cases + +### Downsampling and Data Retention + +Use CQs with `TTL` set on database in IoTDB to mitigate storage concerns. Combine CQs and `TTL` to automatically downsample high precision data to a lower precision and remove the dispensable, high precision data from the database. + +### Recalculating expensive queries + +Shorten query runtimes by pre-calculating expensive queries with CQs. Use a CQ to automatically downsample commonly-queried, high precision data to a lower precision. Queries on lower precision data require fewer resources and return faster. + +> Pre-calculate queries for your preferred graphing tool to accelerate the population of graphs and dashboards. + +### Substituting for sub-query + +IoTDB does not support sub queries. We can get the same functionality by creating a CQ as a sub query and store its result into other time series and then querying from those time series again will be like doing nested sub query. + +#### Example + +IoTDB does not accept the following query with a nested sub query. The query calculates the average number of non-null values of `s1` at 30 minute intervals: + +```sql +SELECT avg(count_s1) from (select count(s1) as count_s1 from root.sg.d group by([0, now()), 30m)); +``` + +To get the same results: + +**1. Create a CQ** + +This step performs the nested sub query in from clause of the query above. The following CQ automatically calculates the number of non-null values of `s1` at 30 minute intervals and writes those counts into the new `root.sg_count.d.count_s1` time series. + +```sql +CREATE CQ s1_count_cq +BEGIN + SELECT count(s1) + INTO root.sg_count.d(count_s1) + FROM root.sg.d + GROUP BY(30m) +END +``` + +**2. Query the CQ results** + +Next step performs the avg([...]) part of the outer query above. + +Query the data in the time series `root.sg_count.d.count_s1` to calculate the average of it: + +```sql +SELECT avg(count_s1) from root.sg_count.d; +``` + + +## System Parameter Configuration +| Name | Description | Data Type | Default Value | +| :---------------------------------- |-------- |-----------|---------------| +| `continuous_query_submit_thread` | The number of threads in the scheduled thread pool that submit continuous query tasks periodically | int32 | 2 | +| `continuous_query_min_every_interval_in_ms` | The minimum value of the continuous query execution time interval | duration | 1000 | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Fill.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Fill.md new file mode 100644 index 00000000..73c23bcc --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Fill.md @@ -0,0 +1,333 @@ + + +# Fill Null Value + +## Introduction + +When executing some queries, there may be no data for some columns in some rows, and data in these locations will be null, but this kind of null value is not conducive to data visualization and analysis, and the null value needs to be filled. + +In IoTDB, users can use the FILL clause to specify the fill mode when data is missing. 
Fill null value allows the user to fill any query result with null values according to a specific method, such as taking the previous value that is not null, or linear interpolation. The query result after filling the null value can better reflect the data distribution, which is beneficial for users to perform data analysis. + +## Syntax Definition + +**The following is the syntax definition of the `FILL` clause:** + +```sql +FILL '(' PREVIOUS | LINEAR | constant ')' +``` + +**Note:** +- We can specify only one fill method in the `FILL` clause, and this method applies to all columns of the result set. +- Null value fill is not compatible with version 0.13 and previous syntax (`FILL(([(, , )?])+)`) is not supported anymore. + +## Fill Methods + +**IoTDB supports the following three fill methods:** + +- `PREVIOUS`: Fill with the previous non-null value of the column. +- `LINEAR`: Fill the column with a linear interpolation of the previous non-null value and the next non-null value of the column. +- Constant: Fill with the specified constant. + +**Following table lists the data types and supported fill methods.** + +| Data Type | Supported Fill Methods | +| :-------- |:------------------------| +| boolean | previous, value | +| int32 | previous, linear, value | +| int64 | previous, linear, value | +| float | previous, linear, value | +| double | previous, linear, value | +| text | previous, value | + +**Note:** For columns whose data type does not support specifying the fill method, we neither fill it nor throw exception, just keep it as it is. + +**For examples:** + +If we don't use any fill methods: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000; +``` + +the original result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| null| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +### `PREVIOUS` Fill + +**For null values in the query result set, fill with the previous non-null value of the column.** + +**Note:** If the first value of this column is null, we will keep first value as null and won't fill it until we meet first non-null value + +For example, with `PREVIOUS` fill, the SQL is as follows: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous); +``` + +result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ 
+|2017-11-01T16:38:00.000+08:00| 21.93| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| false| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +**While using `FILL(PREVIOUS)`, you can specify a time interval. If the interval between the timestamp of the current null value and the timestamp of the previous non-null value exceeds the specified time interval, no filling will be performed.** + +> 1. In the case of FILL(LINEAR) and FILL(CONSTANT), if the second parameter is specified, an exception will be thrown +> 2. The interval parameter only supports integers + +For example, the raw data looks like this: + +```sql +select s1 from root.db.d1 +``` +``` ++-----------------------------+-------------+ +| Time|root.db.d1.s1| ++-----------------------------+-------------+ +|2023-11-08T16:41:50.008+08:00| 1.0| ++-----------------------------+-------------+ +|2023-11-08T16:46:50.011+08:00| 2.0| ++-----------------------------+-------------+ +|2023-11-08T16:48:50.011+08:00| 3.0| ++-----------------------------+-------------+ +``` + +We want to group the data by 1 min time interval: + +```sql +select avg(s1) + from root.db.d1 + group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| null| ++-----------------------------+------------------+ +``` + +After grouping, we want to fill the null value: + +```sql +select avg(s1) + from root.db.d1 + group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) + FILL(PREVIOUS); +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ 
+|2023-11-08T16:47:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| 3.0| ++-----------------------------+------------------+ +``` + +we also don't want the null value to be filled if it keeps null for 2 min. + +```sql +select avg(s1) +from root.db.d1 +group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) + FILL(PREVIOUS, 2m); +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| 3.0| ++-----------------------------+------------------+ +``` + + +### `LINEAR` Fill + +**For null values in the query result set, fill the column with a linear interpolation of the previous non-null value and the next non-null value of the column.** + +**Note:** +- If all the values before current value are null or all the values after current value are null, we will keep current value as null and won't fill it. +- If the column's data type is boolean/text, we neither fill it nor throw exception, just keep it as it is. + +Here we give an example of filling null values using the linear method. The SQL statement is as follows: + +For example, with `LINEAR` fill, the SQL is as follows: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(linear); +``` + +result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 22.08| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +### Constant Fill + +**For null values in the query result set, fill with the specified constant.** + +**Note:** +- When using the ValueFill, IoTDB neither fill the query result if the data type is different from the input constant nor throw exception, just keep it as it is. 
+ + | Constant Value Data Type | Support Data Type | + |:-------------------------|:----------------------------------| + | `BOOLEAN` | `BOOLEAN` `TEXT` | + | `INT64` | `INT32` `INT64` `FLOAT` `DOUBLE` `TEXT` | + | `DOUBLE` | `FLOAT` `DOUBLE` `TEXT` | + | `TEXT` | `TEXT` | +- If constant value is larger than Integer.MAX_VALUE, IoTDB neither fill the query result if the data type is int32 nor throw exception, just keep it as it is. + +For example, with `FLOAT` constant fill, the SQL is as follows: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(2.0); +``` + +result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 2.0| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +For example, with `BOOLEAN` constant fill, the SQL is as follows: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(true); +``` + +result will be like: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| null| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| true| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Group-By.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Group-By.md new file mode 100644 index 00000000..bf4fac27 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Group-By.md @@ -0,0 +1,930 @@ + + +# Group By Aggregate + +IoTDB supports using `GROUP BY` clause to aggregate the time series by segment and group. + +Segmented aggregation refers to segmenting data in the row direction according to the time dimension, aiming at the time relationship between different data points in the same time series, and obtaining an aggregated value for each segment. Currently only **group by time**、**group by variation**、**group by condition**、**group by session** and **group by count** is supported, and more segmentation methods will be supported in the future. + +Group aggregation refers to grouping the potential business attributes of time series for different time series. 
Each group contains several time series, and each group produces one aggregated value. Two grouping methods are supported: **group by path level** and **group by tag**.

## Aggregate By Segment
### Aggregate By Time

Aggregate by time is a typical query method for time series data. Data is collected at high frequency and needs to be aggregated over certain time intervals. For example, to calculate the daily average temperature, the temperature series needs to be segmented by day, and then the average value is calculated for each segment.

Aggregate by time refers to a query method that uses a lower frequency than the frequency of data collection, and is a special case of segmented aggregation. For example, if data is collected once per second but you want to display it at one-minute granularity, you need to use aggregation by time.

This section mainly introduces examples of aggregation by time, using the `GROUP BY` clause. IoTDB supports partitioning result sets according to a time interval and a customized sliding step. By default, results are sorted by time in ascending order.

The GROUP BY statement provides users with three types of parameters:

* Parameter 1: The display window on the time axis
* Parameter 2: The time interval for dividing the time axis (should be positive)
* Parameter 3: The time sliding step (optional; defaults to the time interval if not set)

The actual meanings of the three types of parameters are shown in the figure below; a minimal query that uses all three parameters is sketched right after it. Parameter 3 is optional.
*(Figure: illustration of the display window, the time interval and the sliding step on the time axis.)*
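For example, a query of the following shape puts all three parameters together on the sample series used later in this section: a two-day display window, a one-day time interval, and a 12-hour sliding step. The exact values are only illustrative; detailed, worked examples follow below.

```sql
-- Parameter 1: display window [2017-11-01T00:00:00, 2017-11-03T00:00:00)
-- Parameter 2: time interval for dividing the time axis, 1d
-- Parameter 3: sliding step, 12h (optional)
select avg(temperature)
from root.ln.wf01.wt01
group by ([2017-11-01T00:00:00, 2017-11-03T00:00:00), 1d, 12h);
```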
+ +There are three typical examples of frequency reduction aggregation: + +#### Aggregate By Time without Specifying the Sliding Step Length + +The SQL statement is: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d); +``` +which means: + +Since the sliding step length is not specified, the `GROUP BY` statement by default set the sliding step the same as the time interval which is `1d`. + +The fist parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2017-11-07T23:00:00). + +The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (1d) as time interval and startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [0,1d), [1d, 2d), [2d, 3d), etc. + +Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-11-01T00:00:00, 2017-11-07 T23:00:00]), and map these data to the previously segmented time axis (in this case there are mapped data in every 1-day period from 2017-11-01T00:00:00 to 2017-11-07T23:00:00:00). + +Since there is data for each time period in the result range to be displayed, the execution result of the SQL statement is shown below: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 1440| 26.0| +|2017-11-02T00:00:00.000+08:00| 1440| 26.0| +|2017-11-03T00:00:00.000+08:00| 1440| 25.99| +|2017-11-04T00:00:00.000+08:00| 1440| 26.0| +|2017-11-05T00:00:00.000+08:00| 1440| 26.0| +|2017-11-06T00:00:00.000+08:00| 1440| 25.99| +|2017-11-07T00:00:00.000+08:00| 1380| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 7 +It costs 0.024s +``` + +#### Aggregate By Time Specifying the Sliding Step Length + +The SQL statement is: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d); +``` + +which means: + +Since the user specifies the sliding step parameter as 1d, the `GROUP BY` statement will move the time interval `1 day` long instead of `3 hours` as default. + +That means we want to fetch all the data of 00:00:00 to 02:59:59 every day from 2017-11-01 to 2017-11-07. + +The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2017-11-07T23:00:00). + +The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (3h) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-11-01T00:00:00, 2017-11-01T03:00:00), [2017-11-02T00:00:00, 2017-11-02T03:00:00), [2017-11-03T00:00:00, 2017-11-03T03:00:00), etc. 
+ +The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. + +Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-11-01T00:00:00, 2017-11-07T23:00:00]), and map these data to the previously segmented time axis (in this case there are mapped data in every 3-hour period for each day from 2017-11-01T00:00:00 to 2017-11-07T23:00:00:00). + +Since there is data for each time period in the result range to be displayed, the execution result of the SQL statement is shown below: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| 25.98| +|2017-11-02T00:00:00.000+08:00| 180| 25.98| +|2017-11-03T00:00:00.000+08:00| 180| 25.96| +|2017-11-04T00:00:00.000+08:00| 180| 25.96| +|2017-11-05T00:00:00.000+08:00| 180| 26.0| +|2017-11-06T00:00:00.000+08:00| 180| 25.85| +|2017-11-07T00:00:00.000+08:00| 180| 25.99| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 7 +It costs 0.006s +``` + +The sliding step can be smaller than the interval, in which case there is overlapping time between the aggregation windows (similar to a sliding window). + +The SQL statement is: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-01 10:00:00), 4h, 2h); +``` + +The execution result of the SQL statement is shown below: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| 25.98| +|2017-11-01T02:00:00.000+08:00| 180| 25.98| +|2017-11-01T04:00:00.000+08:00| 180| 25.96| +|2017-11-01T06:00:00.000+08:00| 180| 25.96| +|2017-11-01T08:00:00.000+08:00| 180| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 5 +It costs 0.006s +``` + +#### Aggregate by Natural Month + +The SQL statement is: + +```sql +select count(status) from root.ln.wf01.wt01 group by([2017-11-01T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +which means: + +Since the user specifies the sliding step parameter as `2mo`, the `GROUP BY` statement will move the time interval `2 months` long instead of `1 month` as default. + +The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-11-01T00:00:00, 2019-11-07T23:00:00). + +The start time is 2017-11-01T00:00:00. The sliding step will increment monthly based on the start date, and the 1st day of the month will be used as the time interval's start time. + +The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. 
Taking this parameter (1mo) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-11-01T00:00:00, 2017-12-01T00:00:00), [2018-02-01T00:00:00, 2018-03-01T00:00:00), [2018-05-03T00:00:00, 2018-06-01T00:00:00)), etc. + +The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. + +Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of (2017-11-01T00:00:00, 2019-11-07T23:00:00], and map these data to the previously segmented time axis (in this case there are mapped data of the first month in every two month period from 2017-11-01T00:00:00 to 2019-11-07T23:00:00). + +The SQL execution result is: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-11-01T00:00:00.000+08:00| 259| +|2018-01-01T00:00:00.000+08:00| 250| +|2018-03-01T00:00:00.000+08:00| 259| +|2018-05-01T00:00:00.000+08:00| 251| +|2018-07-01T00:00:00.000+08:00| 242| +|2018-09-01T00:00:00.000+08:00| 225| +|2018-11-01T00:00:00.000+08:00| 216| +|2019-01-01T00:00:00.000+08:00| 207| +|2019-03-01T00:00:00.000+08:00| 216| +|2019-05-01T00:00:00.000+08:00| 207| +|2019-07-01T00:00:00.000+08:00| 199| +|2019-09-01T00:00:00.000+08:00| 181| +|2019-11-01T00:00:00.000+08:00| 60| ++-----------------------------+-------------------------------+ +``` + +The SQL statement is: + +```sql +select count(status) from root.ln.wf01.wt01 group by([2017-10-31T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +which means: + +Since the user specifies the sliding step parameter as `2mo`, the `GROUP BY` statement will move the time interval `2 months` long instead of `1 month` as default. + +The first parameter of the `GROUP BY` statement above is the display window parameter, which determines the final display range is [2017-10-31T00:00:00, 2019-11-07T23:00:00). + +Different from the previous example, the start time is set to 2017-10-31T00:00:00. The sliding step will increment monthly based on the start date, and the 31st day of the month meaning the last day of the month will be used as the time interval's start time. If the start time is set to the 30th date, the sliding step will use the 30th or the last day of the month. + +The start time is 2017-10-31T00:00:00. The sliding step will increment monthly based on the start time, and the 1st day of the month will be used as the time interval's start time. + +The second parameter of the `GROUP BY` statement above is the time interval for dividing the time axis. Taking this parameter (1mo) as time interval and the startTime of the display window as the dividing origin, the time axis is divided into several continuous intervals, which are [2017-10-31T00:00:00, 2017-11-31T00:00:00), [2018-02-31T00:00:00, 2018-03-31T00:00:00), [2018-05-31T00:00:00, 2018-06-31T00:00:00), etc. + +The third parameter of the `GROUP BY` statement above is the sliding step for each time interval moving. 
+ +Then the system will use the time and value filtering condition in the `WHERE` clause and the first parameter of the `GROUP BY` statement as the data filtering condition to obtain the data satisfying the filtering condition (which in this case is the data in the range of [2017-10-31T00:00:00, 2019-11-07T23:00:00) and map these data to the previously segmented time axis (in this case there are mapped data of the first month in every two month period from 2017-10-31T00:00:00 to 2019-11-07T23:00:00). + +The SQL execution result is: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-10-31T00:00:00.000+08:00| 251| +|2017-12-31T00:00:00.000+08:00| 250| +|2018-02-28T00:00:00.000+08:00| 259| +|2018-04-30T00:00:00.000+08:00| 250| +|2018-06-30T00:00:00.000+08:00| 242| +|2018-08-31T00:00:00.000+08:00| 225| +|2018-10-31T00:00:00.000+08:00| 216| +|2018-12-31T00:00:00.000+08:00| 208| +|2019-02-28T00:00:00.000+08:00| 216| +|2019-04-30T00:00:00.000+08:00| 208| +|2019-06-30T00:00:00.000+08:00| 199| +|2019-08-31T00:00:00.000+08:00| 181| +|2019-10-31T00:00:00.000+08:00| 69| ++-----------------------------+-------------------------------+ +``` + +#### Left Open And Right Close Range + +The SQL statement is: + +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d); +``` + +In this sql, the time interval is left open and right close, so we won't include the value of timestamp 2017-11-01T00:00:00 and instead we will include the value of timestamp 2017-11-07T23:00:00. + +We will get the result like following: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-11-02T00:00:00.000+08:00| 1440| +|2017-11-03T00:00:00.000+08:00| 1440| +|2017-11-04T00:00:00.000+08:00| 1440| +|2017-11-05T00:00:00.000+08:00| 1440| +|2017-11-06T00:00:00.000+08:00| 1440| +|2017-11-07T00:00:00.000+08:00| 1440| +|2017-11-07T23:00:00.000+08:00| 1380| ++-----------------------------+-------------------------------+ +Total line number = 7 +It costs 0.004s +``` +### Aggregation By Variation +IoTDB supports grouping by continuous stable values through the `GROUP BY VARIATION` statement. + +Group-By-Variation wil set the first point in group as the base point, +then if the difference between the new data and base point is small than or equal to delta, +the data point will be grouped together and execute aggregation query (The calculation of difference and the meaning of delte are introduced below). The groups won't overlap and there is no fixed start time and end time. +The syntax of clause is as follows: +```sql +group by variation(controlExpression[,delta][,ignoreNull=true/false]) +``` +The different parameters mean: +* controlExpression + +The value that is used to calculate difference. It can be any columns or the expression of them. +* delta + +The threshold that is used when grouping. The difference of controlExpression between the first data point and new data point should less than or equal to delta. +When delta is zero, all the continuous data with equal expression value will be grouped into the same group. +* ignoreNull + +Used to specify how to deal with the data when the value of controlExpression is null. 
When ignoreNull is false, null will be treated as a new value; when ignoreNull is true, the data point will be skipped directly.
+
+The supported return types of controlExpression and how null values are handled when ignoreNull is false are shown in the following table:
+
+| delta    | Return Type Supported By controlExpression | The Handling of null when ignoreNull is False                                                                                                                                                                               |
+|----------|--------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| delta!=0 | INT32、INT64、FLOAT、DOUBLE                  | If the processing group doesn't contain null, the null value will be treated as infinity/infinitesimal and will end the current group.
Continuous null values are treated as stable values and assigned to the same group. | +| delta=0 | TEXT、BINARY、INT32、INT64、FLOAT、DOUBLE | Null is treated as a new value in a new group and continuous nulls belong to the same group. | + +groupByVariation + +#### Precautions for Use +1. The result of controlExpression should be a unique value. If multiple columns appear after using wildcard stitching, an error will be reported. +2. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. +3. Each device is grouped separately when used with `ALIGN BY DEVICE`. +4. Delta is zero and ignoreNull is true by default. +5. Currently `GROUP BY VARIATION` is not supported with `GROUP BY LEVEL`. + +Using the raw data below, several examples of `GROUP BY VARIAITON` queries will be given. +``` ++-----------------------------+-------+-------+-------+--------+-------+-------+ +| Time| s1| s2| s3| s4| s5| s6| ++-----------------------------+-------+-------+-------+--------+-------+-------+ +|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| +|1970-01-01T08:00:00.010+08:00| null| 19.0| 10.0| 145.0| 19.0| 8.25| +|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| null| 245.0| 29.0| null| +|1970-01-01T08:00:00.030+08:00| 34.5| null| 30.0| 345.0| null| null| +|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| +|1970-01-01T08:00:00.050+08:00| null| 59.0| 50.0| 545.0| 59.0| 6.25| +|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| null| +|1970-01-01T08:00:00.070+08:00| 74.5| 79.0| null| null| 79.0| 3.25| +|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 3.25| +|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 3.25| +|1970-01-01T08:00:00.150+08:00| 66.5| 77.0| 90.0| 945.0| 99.0| 9.25| ++-----------------------------+-------+-------+-------+--------+-------+-------+ +``` +#### delta = 0 +The sql is shown below: +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6) +``` +Get the result below which ignores the row with null value in `s6`. +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.040+08:00| 24.5| 3| 50.0| +|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +when ignoreNull is false, the row with null value in `s6` will be considered. +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, ignoreNull=false) +``` +Get the following result. 
+``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| +|1970-01-01T08:00:00.020+08:00|1970-01-01T08:00:00.030+08:00| 29.5| 1| 30.0| +|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.040+08:00| 44.5| 1| 40.0| +|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| +|1970-01-01T08:00:00.060+08:00|1970-01-01T08:00:00.060+08:00| 64.5| 1| 60.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +#### delta !=0 + +The sql is shown below: +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, 4) +``` +Get the result below: +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.050+08:00| 24.5| 4| 100.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +The sql is shown below: + +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6+s5, 10) +``` +Get the result below: +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| +|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.050+08:00| 44.5| 2| 90.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.080+08:00| 79.5| 2| 80.0| +|1970-01-01T08:00:00.090+08:00|1970-01-01T08:00:00.150+08:00| 80.5| 2| 180.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` + +### Aggregation By Condition +When you need to filter the data according to a specific condition and group the continuous ones for an aggregation query. +`GROUP BY CONDITION` is suitable for you.The rows which don't meet the given condition will be simply ignored because they don't belong to any group. +Its syntax is defined below: +```sql +group by condition(predict,[keep>/>=/=/<=/<]threshold,[,ignoreNull=true/false]) +``` +* predict + +Any legal expression return the type of boolean for filtering in grouping. +* [keep>/>=/=/<=/<]threshold + +Keep expression is used to specify the number of continuous rows that meet the `predict` condition to form a group. Only the number of rows in group satisfy the keep condition, the result of group will be output. 
+Keep expression consists of a 'keep' string and a threshold of type `long` or a single 'long' type data. +* ignoreNull=true/false + +Used to specify how to handle data rows that encounter null predict, skip the row when it's true and end current group when it's false. + +#### Precautions for Use +1. keep condition is required in the query, but you can omit the 'keep' string and given a `long` number which defaults to 'keep=long number' condition. +2. IgnoreNull defaults to true. +3. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. +4. Each device is grouped separately when used with `ALIGN BY DEVICE`. +5. Currently `GROUP BY CONDITION` is not supported with `GROUP BY LEVEL`. + +For the following raw data, several query examples are given below: +``` ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +| Time|root.sg.beijing.car01.soc|root.sg.beijing.car01.charging_status|root.sg.beijing.car01.vehicle_status| ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 14.0| 1| 1| +|1970-01-01T08:00:00.002+08:00| 16.0| 1| 1| +|1970-01-01T08:00:00.003+08:00| 16.0| 0| 1| +|1970-01-01T08:00:00.004+08:00| 16.0| 0| 1| +|1970-01-01T08:00:00.005+08:00| 18.0| 1| 1| +|1970-01-01T08:00:00.006+08:00| 24.0| 1| 1| +|1970-01-01T08:00:00.007+08:00| 36.0| 1| 1| +|1970-01-01T08:00:00.008+08:00| 36.0| null| 1| +|1970-01-01T08:00:00.009+08:00| 45.0| 1| 1| +|1970-01-01T08:00:00.010+08:00| 60.0| 1| 1| ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +``` +The sql statement to query data with at least two continuous row shown below: +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=true) +``` +Get the result below: +``` ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| +|1970-01-01T08:00:00.005+08:00| 10| 5| 60.0| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +``` +When ignoreNull is false, the null value will be treated as a row that doesn't meet the condition. +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoringNull=false) +``` +Get the result below, the original group is split. 
+``` ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| +|1970-01-01T08:00:00.005+08:00| 7| 3| 36.0| +|1970-01-01T08:00:00.009+08:00| 10| 2| 60.0| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +``` + +### Aggregation By Session +`GROUP BY SESSION` can be used to group data according to the interval of the time. Data with a time interval less than or equal to the given threshold will be assigned to the same group. +For example, in industrial scenarios, devices don't always run continuously, `GROUP BY SESSION` will group the data generated by each access session of the device. +Its syntax is defined as follows: +```sql +group by session(timeInterval) +``` +* timeInterval + +A given interval threshold to create a new group of data when the difference between the time of data is greater than the threshold. + +The figure below is a grouping diagram under `GROUP BY SESSION`. + +groupBySession + +#### Precautions for Use +1. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet. +2. Each device is grouped separately when used with `ALIGN BY DEVICE`. +3. Currently `GROUP BY SESSION` is not supported with `GROUP BY LEVEL`. 
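+
+As a quick illustration of the clause's shape before the full examples below, the following sketch groups rows whose gap to the previous row is at most 10 minutes into one session each (the path `root.sg.d1` and the 10-minute threshold are illustrative and not part of the dataset used below):
+
+```sql
+-- a new group starts whenever two adjacent rows are more than 10 minutes apart
+select __endTime, count(temperature) from root.sg.d1 group by session(10m)
+```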
+ +For the raw data below, a few query examples are given: +``` ++-----------------------------+-----------------+-----------+--------+------+ +| Time| Device|temperature|hardware|status| ++-----------------------------+-----------------+-----------+--------+------+ +|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01| 35.7| 11| false| +|1970-01-01T08:00:02.000+08:00|root.ln.wf02.wt01| 35.8| 22| true| +|1970-01-01T08:00:03.000+08:00|root.ln.wf02.wt01| 35.4| 33| false| +|1970-01-01T08:00:04.000+08:00|root.ln.wf02.wt01| 36.4| 44| false| +|1970-01-01T08:00:05.000+08:00|root.ln.wf02.wt01| 36.8| 55| false| +|1970-01-01T08:00:10.000+08:00|root.ln.wf02.wt01| 36.8| 110| false| +|1970-01-01T08:00:20.000+08:00|root.ln.wf02.wt01| 37.8| 220| true| +|1970-01-01T08:00:30.000+08:00|root.ln.wf02.wt01| 37.5| 330| false| +|1970-01-01T08:00:40.000+08:00|root.ln.wf02.wt01| 37.4| 440| false| +|1970-01-01T08:00:50.000+08:00|root.ln.wf02.wt01| 37.9| 550| false| +|1970-01-01T08:01:40.000+08:00|root.ln.wf02.wt01| 38.0| 110| false| +|1970-01-01T08:02:30.000+08:00|root.ln.wf02.wt01| 38.8| 220| true| +|1970-01-01T08:03:20.000+08:00|root.ln.wf02.wt01| 38.6| 330| false| +|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01| 38.4| 440| false| +|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01| 38.3| 550| false| +|1970-01-01T08:06:40.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-01T08:07:50.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-01T08:08:00.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01| 38.2| 110| false| +|1970-01-02T08:08:02.000+08:00|root.ln.wf02.wt01| 37.5| 220| true| +|1970-01-02T08:08:03.000+08:00|root.ln.wf02.wt01| 37.4| 330| false| +|1970-01-02T08:08:04.000+08:00|root.ln.wf02.wt01| 36.8| 440| false| +|1970-01-02T08:08:05.000+08:00|root.ln.wf02.wt01| 37.4| 550| false| ++-----------------------------+-----------------+-----------+--------+------+ +``` +TimeInterval can be set by different time units, the sql is shown below: +```sql +select __endTime,count(*) from root.** group by session(1d) +``` +Get the result: +``` ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +| Time| __endTime|count(root.ln.wf02.wt01.temperature)|count(root.ln.wf02.wt01.hardware)|count(root.ln.wf02.wt01.status)| ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +|1970-01-01T08:00:01.000+08:00|1970-01-01T08:08:00.000+08:00| 15| 18| 15| +|1970-01-02T08:08:01.000+08:00|1970-01-02T08:08:05.000+08:00| 5| 5| 5| ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +``` +It can be also used with `HAVING` and `ALIGN BY DEVICE` clauses. 
+
+```sql
+select __endTime,sum(hardware) from root.ln.wf02.wt01 group by session(50s) having sum(hardware)>0 align by device
+```
+Get the result below:
+```
++-----------------------------+-----------------+-----------------------------+-------------+
+|                         Time|           Device|                    __endTime|sum(hardware)|
++-----------------------------+-----------------+-----------------------------+-------------+
+|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01|1970-01-01T08:03:20.000+08:00|       2475.0|
+|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:04:20.000+08:00|        440.0|
+|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:05:20.000+08:00|        550.0|
+|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01|1970-01-02T08:08:05.000+08:00|       1650.0|
++-----------------------------+-----------------+-----------------------------+-------------+
+```
+### Aggregation By Count
+`GROUP BY COUNT` can aggregate the data points according to the number of points: a fixed number of continuous data points are grouped together for an aggregation query.
+Its syntax is defined as follows:
+```sql
+group by count(controlExpression, size[,ignoreNull=true/false])
+```
+
+* controlExpression
+
+The object to count during processing; it can be any column or an expression of columns.
+
+* size
+
+The number of data points in a group; every `size` continuous points are divided into the same group.
+
+* ignoreNull=true/false
+
+Whether to ignore the data points whose `controlExpression` is null; when ignoreNull is true, data points with a null `controlExpression` will be skipped during counting.
+
+#### Precautions for Use
+1. For a group in resultSet, the time column output the start time of the group by default. __endTime can be used in select clause to output the endTime of groups in resultSet.
+2. Each device is grouped separately when used with `ALIGN BY DEVICE`.
+3. Currently `GROUP BY COUNT` is not supported with `GROUP BY LEVEL`.
+4. When the final number of data points in a group is less than `size`, the result of the group will not be output.
+
+For the data below, some examples will be given.
+```
++-----------------------------+-----------+-----------------------+
+|                         Time|root.sg.soc|root.sg.charging_status|
++-----------------------------+-----------+-----------------------+
+|1970-01-01T08:00:00.001+08:00|       14.0|                      1|
+|1970-01-01T08:00:00.002+08:00|       16.0|                      1|
+|1970-01-01T08:00:00.003+08:00|       16.0|                      0|
+|1970-01-01T08:00:00.004+08:00|       16.0|                      0|
+|1970-01-01T08:00:00.005+08:00|       18.0|                      1|
+|1970-01-01T08:00:00.006+08:00|       24.0|                      1|
+|1970-01-01T08:00:00.007+08:00|       36.0|                      1|
+|1970-01-01T08:00:00.008+08:00|       36.0|                   null|
+|1970-01-01T08:00:00.009+08:00|       45.0|                      1|
+|1970-01-01T08:00:00.010+08:00|       60.0|                      1|
++-----------------------------+-----------+-----------------------+
+```
+The sql is shown below:
+```sql
+select count(charging_status), first_value(soc) from root.sg group by count(charging_status,5)
+```
+Get the result below. In the second group, from 1970-01-01T08:00:00.006+08:00 to 1970-01-01T08:00:00.010+08:00, only four points are counted (the null at 1970-01-01T08:00:00.008+08:00 is skipped), which is less than `size`, so it won't be output.
+
+```
++-----------------------------+-----------------------------+--------------------------------------+
+|                         Time|                    __endTime|first_value(root.sg.beijing.car01.soc)|
++-----------------------------+-----------------------------+--------------------------------------+
+|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00|                                  14.0|
++-----------------------------+-----------------------------+--------------------------------------+
+```
+When `ignoreNull=false` is used to take null values into account, there will be two groups with 5 points each in the resultSet, as shown below:
+```sql
+select count(charging_status), first_value(soc) from root.sg group by count(charging_status,5,ignoreNull=false)
+```
+Get the results:
+```
++-----------------------------+-----------------------------+--------------------------------------+
+|                         Time|                    __endTime|first_value(root.sg.beijing.car01.soc)|
++-----------------------------+-----------------------------+--------------------------------------+
+|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00|                                  14.0|
+|1970-01-01T08:00:00.006+08:00|1970-01-01T08:00:00.010+08:00|                                  24.0|
++-----------------------------+-----------------------------+--------------------------------------+
+```
+## Aggregate By Group
+
+### Aggregation By Level
+
+The aggregation by level statement is used to group query results whose names are the same at the given level.
+
+- Keyword `LEVEL` is used to specify the level that needs to be grouped. By convention, `level=0` represents *root* level.
+- All aggregation functions are supported. When using five aggregations: sum, avg, min_value, max_value and extreme, please make sure all the aggregated series have exactly the same data type. Otherwise, it will generate a syntax error.
+
+**Example 1:** There are multiple series named `status` under different databases, like "root.ln.wf01.wt01.status", "root.ln.wf02.wt02.status", and "root.sgcc.wf03.wt01.status". If you need to count the number of data points of the `status` sequence under different databases, use the following query:
+
+```sql
+select count(status) from root.** group by level = 1
+```
+
+Result:
+
+```
++-------------------------+---------------------------+
+|count(root.ln.*.*.status)|count(root.sgcc.*.*.status)|
++-------------------------+---------------------------+
+|                    20160|                      10080|
++-------------------------+---------------------------+
+Total line number = 1
+It costs 0.003s
+```
+
+**Example 2:** If you need to count the number of data points under different devices, you can specify level = 3:
+
+```sql
+select count(status) from root.** group by level = 3
+```
+
+Result:
+
+```
++---------------------------+---------------------------+
+|count(root.*.*.wt01.status)|count(root.*.*.wt02.status)|
++---------------------------+---------------------------+
+|                      20160|                      10080|
++---------------------------+---------------------------+
+Total line number = 1
+It costs 0.003s
+```
+
+**Example 3:** Attention: the devices named `wt01` under databases `ln` and `sgcc` are grouped together, since they are regarded as devices with the same name.
If you need to further count the number of data points in different devices under different databases, you can use the following query: + +```sql +select count(status) from root.** group by level = 1, 3 +``` + +Result: + +``` ++----------------------------+----------------------------+------------------------------+ +|count(root.ln.*.wt01.status)|count(root.ln.*.wt02.status)|count(root.sgcc.*.wt01.status)| ++----------------------------+----------------------------+------------------------------+ +| 10080| 10080| 10080| ++----------------------------+----------------------------+------------------------------+ +Total line number = 1 +It costs 0.003s +``` + +**Example 4:** Assuming that you want to query the maximum value of temperature sensor under all time series, you can use the following query statement: + +```sql +select max_value(temperature) from root.** group by level = 0 +``` + +Result: + +``` ++---------------------------------+ +|max_value(root.*.*.*.temperature)| ++---------------------------------+ +| 26.0| ++---------------------------------+ +Total line number = 1 +It costs 0.013s +``` + +**Example 5:** The above queries are for a certain sensor. In particular, **if you want to query the total data points owned by all sensors at a certain level**, you need to explicitly specify `*` is selected. + +```sql +select count(*) from root.ln.** group by level = 2 +``` + +Result: + +``` ++----------------------+----------------------+ +|count(root.*.wf01.*.*)|count(root.*.wf02.*.*)| ++----------------------+----------------------+ +| 20160| 20160| ++----------------------+----------------------+ +Total line number = 1 +It costs 0.013s +``` +#### Aggregate By Time with Level Clause + +Level could be defined to show count the number of points of each node at the given level in current Metadata Tree. + +This could be used to query the number of points under each device. + +The SQL statement is: + +Get time aggregation by level. + +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d), level=1; +``` +Result: + +``` ++-----------------------------+-------------------------+ +| Time|COUNT(root.ln.*.*.status)| ++-----------------------------+-------------------------+ +|2017-11-02T00:00:00.000+08:00| 1440| +|2017-11-03T00:00:00.000+08:00| 1440| +|2017-11-04T00:00:00.000+08:00| 1440| +|2017-11-05T00:00:00.000+08:00| 1440| +|2017-11-06T00:00:00.000+08:00| 1440| +|2017-11-07T00:00:00.000+08:00| 1440| +|2017-11-07T23:00:00.000+08:00| 1380| ++-----------------------------+-------------------------+ +Total line number = 7 +It costs 0.006s +``` + +Time aggregation with sliding step and by level. + +```sql +select count(status) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d), level=1; +``` + +Result: + +``` ++-----------------------------+-------------------------+ +| Time|COUNT(root.ln.*.*.status)| ++-----------------------------+-------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| +|2017-11-02T00:00:00.000+08:00| 180| +|2017-11-03T00:00:00.000+08:00| 180| +|2017-11-04T00:00:00.000+08:00| 180| +|2017-11-05T00:00:00.000+08:00| 180| +|2017-11-06T00:00:00.000+08:00| 180| +|2017-11-07T00:00:00.000+08:00| 180| ++-----------------------------+-------------------------+ +Total line number = 7 +It costs 0.004s +``` + +### Aggregation By Tags + +IotDB allows you to do aggregation query with the tags defined in timeseries through `GROUP BY TAGS` clause as well. 
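+
+As a preview of the clause's general shape before the worked examples (which first create a small sample dataset), one or more tag keys are listed after `GROUP BY TAGS`, and one result row is produced for each distinct combination of tag values:
+
+```sql
+-- aggregate over all matching series that share the same values for the listed tag keys
+select avg(temperature) from root.factory1.** group by tags(city, workshop)
+```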
+
+Firstly, we can put these example data into IoTDB, which will be used in the following feature introduction.
+
+These are the temperature data of the workshops, which belong to the factory `factory1` and are located in different cities. The time range is `[1000, 10000)`.
+
+The device node of the timeseries path is the ID of the device. The information of city and workshop is modelled in the tags `city` and `workshop`.
+The devices `d1` and `d2` belong to the workshop `w1` in `Beijing`.
+`d3` and `d4` belong to the workshop `w2` in `Beijing`.
+`d5` and `d6` belong to the workshop `w1` in `Shanghai`.
+`d7` belongs to the workshop `w2` in `Shanghai`.
+`d8` and `d9` are under maintenance, and don't belong to any workshops, so they have no tags.
+
+
+```SQL
+CREATE DATABASE root.factory1;
+create timeseries root.factory1.d1.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1);
+create timeseries root.factory1.d2.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1);
+create timeseries root.factory1.d3.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2);
+create timeseries root.factory1.d4.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2);
+create timeseries root.factory1.d5.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1);
+create timeseries root.factory1.d6.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1);
+create timeseries root.factory1.d7.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w2);
+create timeseries root.factory1.d8.temperature with datatype=FLOAT;
+create timeseries root.factory1.d9.temperature with datatype=FLOAT;
+
+insert into root.factory1.d1(time, temperature) values(1000, 104.0);
+insert into root.factory1.d1(time, temperature) values(3000, 104.2);
+insert into root.factory1.d1(time, temperature) values(5000, 103.3);
+insert into root.factory1.d1(time, temperature) values(7000, 104.1);
+
+insert into root.factory1.d2(time, temperature) values(1000, 104.4);
+insert into root.factory1.d2(time, temperature) values(3000, 103.7);
+insert into root.factory1.d2(time, temperature) values(5000, 103.3);
+insert into root.factory1.d2(time, temperature) values(7000, 102.9);
+
+insert into root.factory1.d3(time, temperature) values(1000, 103.9);
+insert into root.factory1.d3(time, temperature) values(3000, 103.8);
+insert into root.factory1.d3(time, temperature) values(5000, 102.7);
+insert into root.factory1.d3(time, temperature) values(7000, 106.9);
+
+insert into root.factory1.d4(time, temperature) values(1000, 103.9);
+insert into root.factory1.d4(time, temperature) values(5000, 102.7);
+insert into root.factory1.d4(time, temperature) values(7000, 106.9);
+
+insert into root.factory1.d5(time, temperature) values(1000, 112.9);
+insert into root.factory1.d5(time, temperature) values(7000, 113.0);
+
+insert into root.factory1.d6(time, temperature) values(1000, 113.9);
+insert into root.factory1.d6(time, temperature) values(3000, 113.3);
+insert into root.factory1.d6(time, temperature) values(5000, 112.7);
+insert into root.factory1.d6(time, temperature) values(7000, 112.3);
+
+insert into root.factory1.d7(time, temperature) values(1000, 101.2);
+insert into root.factory1.d7(time, temperature) values(3000, 99.3);
+insert into root.factory1.d7(time, temperature) values(5000, 100.1);
+insert into root.factory1.d7(time, temperature) values(7000, 99.8);
+
+insert into root.factory1.d8(time, temperature) values(1000, 50.0);
+insert into root.factory1.d8(time, temperature) values(3000, 52.1);
+insert 
into root.factory1.d8(time, temperature) values(5000, 50.1); +insert into root.factory1.d8(time, temperature) values(7000, 50.5); + +insert into root.factory1.d9(time, temperature) values(1000, 50.3); +insert into root.factory1.d9(time, temperature) values(3000, 52.1); +``` + +#### Aggregation query by one single tag + +If the user wants to know the average temperature of each workshop, he can query like this + +```SQL +SELECT AVG(temperature) FROM root.factory1.** GROUP BY TAGS(city); +``` + +The query will calculate the average of the temperatures of those timeseries which have the same tag value of the key `city`. +The results are + +``` ++--------+------------------+ +| city| avg(temperature)| ++--------+------------------+ +| Beijing|104.04666697184244| +|Shanghai|107.85000076293946| +| NULL| 50.84999910990397| ++--------+------------------+ +Total line number = 3 +It costs 0.231s +``` + +From the results we can see that the differences between aggregation by tags query and aggregation by time or level query are: +1. Aggregation query by tags will no longer remove wildcard to raw timeseries, but do the aggregation through the data of multiple timeseries, which have the same tag value. +2. Except for the aggregate result column, the result set contains the key-value column of the grouped tag. The column name is the tag key, and the values in the column are tag values which present in the searched timeseries. +If some searched timeseries doesn't have the grouped tag, a `NULL` value in the key-value column of the grouped tag will be presented, which means the aggregation of all the timeseries lacking the tagged key. + +#### Aggregation query by multiple tags + +Except for the aggregation query by one single tag, aggregation query by multiple tags in a particular order is allowed as well. + +For example, a user wants to know the average temperature of the devices in each workshop. +As the workshop names may be same in different city, it's not correct to aggregated by the tag `workshop` directly. +So the aggregation by the tag `city` should be done first, and then by the tag `workshop`. + +SQL + +```SQL +SELECT avg(temperature) FROM root.factory1.** GROUP BY TAGS(city, workshop); +``` + +The results + +``` ++--------+--------+------------------+ +| city|workshop| avg(temperature)| ++--------+--------+------------------+ +| NULL| NULL| 50.84999910990397| +|Shanghai| w1|113.01666768391927| +| Beijing| w2| 104.4000004359654| +|Shanghai| w2|100.10000038146973| +| Beijing| w1|103.73750019073486| ++--------+--------+------------------+ +Total line number = 5 +It costs 0.027s +``` + +We can see that in a multiple tags aggregation query, the result set will output the key-value columns of all the grouped tag keys, which have the same order with the one in `GROUP BY TAGS`. + +#### Downsampling Aggregation by tags based on Time Window + +Downsampling aggregation by time window is one of the most popular features in a time series database. IoTDB supports to do aggregation query by tags based on time window. + +For example, a user wants to know the average temperature of the devices in each workshop, in every 5 seconds, in the range of time `[1000, 10000)`. 
+ +SQL + +```SQL +SELECT avg(temperature) FROM root.factory1.** GROUP BY ([1000, 10000), 5s), TAGS(city, workshop); +``` + +The results + +``` ++-----------------------------+--------+--------+------------------+ +| Time| city|workshop| avg(temperature)| ++-----------------------------+--------+--------+------------------+ +|1970-01-01T08:00:01.000+08:00| NULL| NULL| 50.91999893188476| +|1970-01-01T08:00:01.000+08:00|Shanghai| w1|113.20000076293945| +|1970-01-01T08:00:01.000+08:00| Beijing| w2| 103.4| +|1970-01-01T08:00:01.000+08:00|Shanghai| w2| 100.1999994913737| +|1970-01-01T08:00:01.000+08:00| Beijing| w1|103.81666692097981| +|1970-01-01T08:00:06.000+08:00| NULL| NULL| 50.5| +|1970-01-01T08:00:06.000+08:00|Shanghai| w1| 112.6500015258789| +|1970-01-01T08:00:06.000+08:00| Beijing| w2| 106.9000015258789| +|1970-01-01T08:00:06.000+08:00|Shanghai| w2| 99.80000305175781| +|1970-01-01T08:00:06.000+08:00| Beijing| w1| 103.5| ++-----------------------------+--------+--------+------------------+ +``` + +Comparing to the pure tag aggregations, this kind of aggregation will divide the data according to the time window specification firstly, and do the aggregation query by the multiple tags in each time window secondly. +The result set will also contain a time column, which have the same meaning with the time column of the result in downsampling aggregation query by time window. + +#### Limitation of Aggregation by Tags + +As this feature is still under development, some queries have not been completed yet and will be supported in the future. + +> 1. Temporarily not support `HAVING` clause to filter the results. +> 2. Temporarily not support ordering by tag values. +> 3. Temporarily not support `LIMIT`,`OFFSET`,`SLIMIT`,`SOFFSET`. +> 4. Temporarily not support `ALIGN BY DEVICE`. +> 5. Temporarily not support expressions as aggregation function parameter,e.g. `count(s+1)`. +> 6. Not support the value filter, which stands the same with the `GROUP BY LEVEL` query. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Having-Condition.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Having-Condition.md new file mode 100644 index 00000000..830898a4 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Having-Condition.md @@ -0,0 +1,115 @@ + + + +# Aggregate Result Filtering + +If you want to filter the results of aggregate queries, +you can use the `HAVING` clause after the `GROUP BY` clause. + +> NOTE: +> +> 1.The expression in HAVING clause must consist of aggregate values; the original sequence cannot appear alone. +> The following usages are incorrect: +> ```sql +> select count(s1) from root.** group by ([1,3),1ms) having sum(s1) > s1 +> select count(s1) from root.** group by ([1,3),1ms) having s1 > 1 +> ``` +> 2.When filtering the `GROUP BY LEVEL` result, the PATH in `SELECT` and `HAVING` can only have one node. +> The following usages are incorrect: +> ```sql +> select count(s1) from root.** group by ([1,3),1ms), level=1 having sum(d1.s1) > 1 +> select count(d1.s1) from root.** group by ([1,3),1ms), level=1 having sum(s1) > 1 +> ``` + +Here are a few examples of using the 'HAVING' clause to filter aggregate results. 
+ +Aggregation result 1: + +``` ++-----------------------------+---------------------+---------------------+ +| Time|count(root.test.*.s1)|count(root.test.*.s2)| ++-----------------------------+---------------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 4| 4| +|1970-01-01T08:00:00.003+08:00| 1| 0| +|1970-01-01T08:00:00.005+08:00| 2| 4| +|1970-01-01T08:00:00.007+08:00| 3| 2| +|1970-01-01T08:00:00.009+08:00| 4| 4| ++-----------------------------+---------------------+---------------------+ +``` + +Aggregation result filtering query 1: + +```sql + select count(s1) from root.** group by ([1,11),2ms), level=1 having count(s2) > 1 +``` + +Filtering result 1: + +``` ++-----------------------------+---------------------+ +| Time|count(root.test.*.s1)| ++-----------------------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 4| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 4| ++-----------------------------+---------------------+ +``` + +Aggregation result 2: + +``` ++-----------------------------+-------------+---------+---------+ +| Time| Device|count(s1)|count(s2)| ++-----------------------------+-------------+---------+---------+ +|1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| +|1970-01-01T08:00:00.003+08:00|root.test.sg1| 1| 0| +|1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| +|1970-01-01T08:00:00.007+08:00|root.test.sg1| 2| 1| +|1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| +|1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| +|1970-01-01T08:00:00.003+08:00|root.test.sg2| 0| 0| +|1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| +|1970-01-01T08:00:00.007+08:00|root.test.sg2| 1| 1| +|1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| ++-----------------------------+-------------+---------+---------+ +``` + +Aggregation result filtering query 2: + +```sql + select count(s1), count(s2) from root.** group by ([1,11),2ms) having count(s2) > 1 align by device +``` + +Filtering result 2: + +``` ++-----------------------------+-------------+---------+---------+ +| Time| Device|count(s1)|count(s2)| ++-----------------------------+-------------+---------+---------+ +|1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| +|1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| +|1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| +|1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| +|1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| +|1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| ++-----------------------------+-------------+---------+---------+ +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Last-Query.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Last-Query.md new file mode 100644 index 00000000..ea963b20 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Last-Query.md @@ -0,0 +1,101 @@ + + +# Last Query + +The last query is a special type of query in Apache IoTDB. It returns the data point with the largest timestamp of the specified time series. In other word, it returns the latest state of a time series. This feature is especially important in IoT data analysis scenarios. To meet the performance requirement of real-time device monitoring systems, Apache IoTDB caches the latest values of all time series to achieve microsecond read latency. + +The last query is to return the most recent data point of the given timeseries in a three column format. 
+
+The SQL syntax is defined as:
+
+```sql
+select last <Path> [COMMA <Path>]* from < PrefixPath > [COMMA < PrefixPath >]* <whereClause> [ORDER BY TIMESERIES (DESC | ASC)?]
+```
+
+which means: Query and return the last data points of timeseries prefixPath.path.
+
+- Only time filter is supported in `whereClause`. Any other filters given in the `whereClause` will give an exception. When the cached most recent data point does not satisfy the criterion specified by the filter, IoTDB will have to get the result from the external storage, which may cause a decrease in performance.
+
+- The result will be returned in a four column table format.
+
+    ```
+    | Time | timeseries | value | dataType |
+    ```
+
+    **Note:** The `value` column will always return the value as `string` and thus also has `TSDataType.TEXT`. Therefore, the column `dataType` is returned also, which contains the _real_ type indicating how the value should be interpreted.
+
+- We can use `TIME/TIMESERIES/VALUE/DATATYPE (DESC | ASC)` to specify that the result set is sorted in descending/ascending order based on a particular column. When the value column contains multiple types of data, the sorting is based on the string representation of the values.
+
+**Example 1:** get the last point of root.ln.wf01.wt01.status:
+
+```
+IoTDB> select last status from root.ln.wf01.wt01
++-----------------------------+------------------------+-----+--------+
+|                         Time|              timeseries|value|dataType|
++-----------------------------+------------------------+-----+--------+
+|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.status|false| BOOLEAN|
++-----------------------------+------------------------+-----+--------+
+Total line number = 1
+It costs 0.000s
+```
+
+**Example 2:** get the last status and temperature points of root.ln.wf01.wt01, whose timestamp is larger than or equal to 2017-11-07T23:50:00.
+
+```
+IoTDB> select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00
++-----------------------------+-----------------------------+---------+--------+
+|                         Time|                   timeseries|    value|dataType|
++-----------------------------+-----------------------------+---------+--------+
+|2017-11-07T23:59:00.000+08:00|     root.ln.wf01.wt01.status|    false| BOOLEAN|
+|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368|  DOUBLE|
++-----------------------------+-----------------------------+---------+--------+
+Total line number = 2
+It costs 0.002s
+```
+
+**Example 3:** get the last points of all sensors in root.ln.wf01.wt01, and order the result by the timeseries column in descending order
+
+```
+IoTDB> select last * from root.ln.wf01.wt01 order by timeseries desc;
++-----------------------------+-----------------------------+---------+--------+
+|                         Time|                   timeseries|    value|dataType|
++-----------------------------+-----------------------------+---------+--------+
+|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368|  DOUBLE|
+|2017-11-07T23:59:00.000+08:00|     root.ln.wf01.wt01.status|    false| BOOLEAN|
++-----------------------------+-----------------------------+---------+--------+
+Total line number = 2
+It costs 0.002s
+```
+
+**Example 4:** get the last points of all sensors in root.ln.wf01.wt01, and order the result by the dataType column in descending order
+
+```
+IoTDB> select last * from root.ln.wf01.wt01 order by dataType desc;
++-----------------------------+-----------------------------+---------+--------+
+|                         Time|                   timeseries|    value|dataType|
++-----------------------------+-----------------------------+---------+--------+
+|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Order-By.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Order-By.md new file mode 100644 index 00000000..8f5ccbb2 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Order-By.md @@ -0,0 +1,276 @@ + + +# Order By + +## Order by in ALIGN BY TIME mode + +The result set of IoTDB is in ALIGN BY TIME mode by default and `ORDER BY TIME` clause can also be used to specify the ordering of timestamp. The SQL statement is: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time desc; +``` +Results: + +``` ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +|2017-11-01T00:01:00.000+08:00| v2| true| 24.36| true| +|2017-11-01T00:00:00.000+08:00| v2| true| 25.96| true| +|1970-01-01T08:00:00.002+08:00| v2| false| null| null| +|1970-01-01T08:00:00.001+08:00| v1| true| null| null| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +``` + +## Order by in ALIGN BY DEVICE mode +When querying in ALIGN BY DEVICE mode, `ORDER BY` clause can be used to specify the ordering of result set. + +ALIGN BY DEVICE mode supports four kinds of clauses with two sort keys which are `Device` and `Time`. + +1. ``ORDER BY DEVICE``: sort by the alphabetical order of the device name. The devices with the same column names will be clustered in a group view. + +2. ``ORDER BY TIME``: sort by the timestamp, the data points from different devices will be shuffled according to the timestamp. + +3. ``ORDER BY DEVICE,TIME``: sort by the alphabetical order of the device name. The data points with the same device name will be sorted by timestamp. + +4. ``ORDER BY TIME,DEVICE``: sort by timestamp. The data points with the same time will be sorted by the alphabetical order of the device name. + +> To make the result set more legible, when `ORDER BY` clause is not used, default settings will be provided. +> The default ordering clause is `ORDER BY DEVICE,TIME` and the default ordering is `ASC`. 
+ +When `Device` is the main sort key, the result set is sorted by device name first, then by timestamp in the group with the same device name, the SQL statement is: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by device desc,time asc align by device; +``` +The result shows below: + +``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| ++-----------------------------+-----------------+--------+------+-----------+ +``` +When `Time` is the main sort key, the result set is sorted by timestamp first, then by device name in data points with the same timestamp. The SQL statement is: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time asc,device desc align by device; +``` +The result shows below: +``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| ++-----------------------------+-----------------+--------+------+-----------+ +``` +When `ORDER BY` clause is not used, sort in default way, the SQL statement is: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` +The result below indicates `ORDER BY DEVICE ASC,TIME ASC` is the clause in default situation. +`ASC` can be omitted because it's the default ordering. 
+``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| ++-----------------------------+-----------------+--------+------+-----------+ +``` +Besides,`ALIGN BY DEVICE` and `ORDER BY` clauses can be used with aggregate query,the SQL statement is: +```sql +select count(*) from root.ln.** group by ((2017-11-01T00:00:00.000+08:00,2017-11-01T00:03:00.000+08:00],1m) order by device asc,time asc align by device +``` +The result shows below: +``` ++-----------------------------+-----------------+---------------+-------------+------------------+ +| Time| Device|count(hardware)|count(status)|count(temperature)| ++-----------------------------+-----------------+---------------+-------------+------------------+ +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| 1| 1| +|2017-11-01T00:02:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| +|2017-11-01T00:03:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| 1| 1| null| +|2017-11-01T00:02:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| +|2017-11-01T00:03:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| ++-----------------------------+-----------------+---------------+-------------+------------------+ +``` +## Order by arbitrary expressions + +In addition to the predefined keywords "Time" and "Device" in IoTDB, `ORDER BY` can also be used to sort by any expressions. + +When sorting, `ASC` or `DESC` can be used to specify the sorting order, and `NULLS` syntax is supported to specify the priority of NULL values in the sorting. By default, `NULLS FIRST` places NULL values at the top of the result, and `NULLS LAST` ensures that NULL values appear at the end of the result. If not specified in the clause, the default order is ASC with NULLS LAST. 
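+
+The examples below only use `NULLS LAST`; as a minimal sketch of the other form (reusing the `score` series from the sample data that follows), null values can be pulled to the front explicitly:
+
+```sql
+-- rows whose score is null are returned before all non-null rows
+select score from root.** order by score desc nulls first align by device
+```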
+ +Here are several examples of queries for sorting arbitrary expressions using the following data: +``` ++-----------------------------+-------------+-------+-------+--------+-------+ +| Time| Device| base| score| bonus| total| ++-----------------------------+-------------+-------+-------+--------+-------+ +|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0| 107.0| +|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0| 105.0| +|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0| 103.0| +|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| +|1970-01-01T08:00:00.010+08:00| root.three| 9| null| 24.0| 33.0| +|1970-01-01T08:00:00.020+08:00| root.three| 8| null| 22.5| 30.5| +|1970-01-01T08:00:00.030+08:00| root.three| 7| null| 23.5| 30.5| +|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| +|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| +|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0| 104.0| +|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0| 102.0| ++-----------------------------+-------------+-------+-------+--------+-------+ +``` +When you need to sort the results based on the base score score, you can use the following SQL: +```Sql +select score from root.** order by score desc align by device +``` +This will give you the following results: + +``` ++-----------------------------+---------+-----+ +| Time| Device|score| ++-----------------------------+---------+-----+ +|1970-01-01T08:00:00.040+08:00|root.five| 54.0| +|1970-01-01T08:00:00.030+08:00|root.five| 53.0| +|1970-01-01T08:00:00.000+08:00| root.one| 50.0| +|1970-01-02T08:00:00.000+08:00| root.one| 50.0| +|1970-01-03T08:00:00.000+08:00| root.one| 50.0| +|1970-01-01T08:00:00.000+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00| root.two| 10.0| ++-----------------------------+---------+-----+ +``` +If you want to sort the results based on the total score, you can use an expression in the `ORDER BY` clause to perform the calculation: +```Sql +select score,total from root.one order by base+score+bonus desc +``` +This SQL is equivalent to: +```Sql +select score,total from root.one order by total desc +``` +Here are the results: +``` ++-----------------------------+--------------+--------------+ +| Time|root.one.score|root.one.total| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.000+08:00| 50.0| 107.0| +|1970-01-02T08:00:00.000+08:00| 50.0| 105.0| +|1970-01-03T08:00:00.000+08:00| 50.0| 103.0| ++-----------------------------+--------------+--------------+ +``` +If you want to sort the results based on the total score and, in case of tied scores, sort by score, base, bonus, and submission time in descending order, you can specify multiple layers of sorting using multiple expressions: + +```Sql +select base, score, bonus, total from root.** order by total desc NULLS Last, + score desc NULLS Last, + bonus desc NULLS Last, + time desc align by device +``` +Here are the results: +``` ++-----------------------------+----------+----+-----+-----+-----+ +| Time| Device|base|score|bonus|total| ++-----------------------------+----------+----+-----+-----+-----+ +|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0|107.0| +|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0|105.0| +|1970-01-01T08:00:00.030+08:00| root.five| 7| 
53.0| 44.0|104.0| +|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0|103.0| +|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0|102.0| +|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| +|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| +|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.000+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| +|1970-01-01T08:00:00.010+08:00|root.three| 9| null| 24.0| 33.0| +|1970-01-01T08:00:00.030+08:00|root.three| 7| null| 23.5| 30.5| +|1970-01-01T08:00:00.020+08:00|root.three| 8| null| 22.5| 30.5| ++-----------------------------+----------+----+-----+-----+-----+ +``` +In the `ORDER BY` clause, you can also use aggregate query expressions. For example: +```Sql +select min_value(total) from root.** order by min_value(total) asc align by device +``` +This will give you the following results: +``` ++----------+----------------+ +| Device|min_value(total)| ++----------+----------------+ +|root.three| 30.5| +| root.two| 33.0| +| root.four| 85.0| +| root.five| 102.0| +| root.one| 103.0| ++----------+----------------+ +``` +When specifying multiple columns in the query, the unsorted columns will change order along with the rows and sorted columns. The order of rows when the sorting columns are the same may vary depending on the specific implementation (no fixed order). For example: +```Sql +select min_value(total),max_value(base) from root.** order by max_value(total) desc align by device +``` +This will give you the following results: +· +``` ++----------+----------------+---------------+ +| Device|min_value(total)|max_value(base)| ++----------+----------------+---------------+ +| root.one| 103.0| 12| +| root.five| 102.0| 7| +| root.four| 85.0| 9| +| root.two| 33.0| 9| +|root.three| 30.5| 9| ++----------+----------------+---------------+ +``` + +You can use both `ORDER BY DEVICE,TIME` and `ORDER BY EXPRESSION` together. For example: +```Sql +select score from root.** order by device asc, score desc, time asc align by device +``` +This will give you the following results: +``` ++-----------------------------+---------+-----+ +| Time| Device|score| ++-----------------------------+---------+-----+ +|1970-01-01T08:00:00.040+08:00|root.five| 54.0| +|1970-01-01T08:00:00.030+08:00|root.five| 53.0| +|1970-01-01T08:00:00.010+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00|root.four| 32.0| +|1970-01-01T08:00:00.000+08:00| root.one| 50.0| +|1970-01-02T08:00:00.000+08:00| root.one| 50.0| +|1970-01-03T08:00:00.000+08:00| root.one| 50.0| +|1970-01-01T08:00:00.000+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00| root.two| 50.0| +|1970-01-01T08:00:00.020+08:00| root.two| 10.0| ++-----------------------------+---------+-----+ +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Overview.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Overview.md new file mode 100644 index 00000000..d2357e28 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Overview.md @@ -0,0 +1,334 @@ + + +# Overview + +## Syntax Definition + +In IoTDB, `SELECT` statement is used to retrieve data from one or more selected time series. Here is the syntax definition of `SELECT` statement: + +```sql +SELECT [LAST] selectExpr [, selectExpr] ... + [INTO intoItem [, intoItem] ...] + FROM prefixPath [, prefixPath] ... + [WHERE whereCondition] + [GROUP BY { + ([startTime, endTime), interval [, slidingStep]) | + LEVEL = levelNum [, levelNum] ... 
| + TAGS(tagKey [, tagKey] ... ) | + VARIATION(expression[,delta][,ignoreNull=true/false]) | + CONDITION(expression,[keep>/>=/=/ 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; +``` +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". The SQL statement requires that the status and temperature sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected. + +The execution result of this SQL statement is as follows: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 6 +It costs 0.018s +``` + +### Select Multiple Columns of Data for the Same Device According to Multiple Time Intervals + +IoTDB supports specifying multiple time interval conditions in a query. Users can combine time interval conditions at will according to their needs. For example, the SQL statement is: + +```sql +select status,temperature from root.ln.wf01.wt01 where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature"; the statement specifies two different time intervals, namely "2017-11-01T00:05:00.000 to 2017-11-01T00:12:00.000" and "2017-11-01T16:35:00.000 to 2017-11-01T16:37:00.000". The SQL statement requires that the values of selected timeseries satisfying any time interval be selected. + +The execution result of this SQL statement is as follows: +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| +|2017-11-01T16:35:00.000+08:00| true| 23.44| +|2017-11-01T16:36:00.000+08:00| false| 21.98| +|2017-11-01T16:37:00.000+08:00| false| 21.93| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 9 +It costs 0.018s +``` + + +### Choose Multiple Columns of Data for Different Devices According to Multiple Time Intervals + +The system supports the selection of data in any column in a query, i.e., the selected columns can come from different devices. 
For example, the SQL statement is: + +```sql +select wf01.wt01.status,wf02.wt02.hardware from root.ln where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` +which means: + +The selected timeseries are "the power supply status of ln group wf01 plant wt01 device" and "the hardware version of ln group wf02 plant wt02 device"; the statement specifies two different time intervals, namely "2017-11-01T00:05:00.000 to 2017-11-01T00:12:00.000" and "2017-11-01T16:35:00.000 to 2017-11-01T16:37:00.000". The SQL statement requires that the values of selected timeseries satisfying any time interval be selected. + +The execution result of this SQL statement is as follows: + +``` ++-----------------------------+------------------------+--------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf02.wt02.hardware| ++-----------------------------+------------------------+--------------------------+ +|2017-11-01T00:06:00.000+08:00| false| v1| +|2017-11-01T00:07:00.000+08:00| false| v1| +|2017-11-01T00:08:00.000+08:00| false| v1| +|2017-11-01T00:09:00.000+08:00| false| v1| +|2017-11-01T00:10:00.000+08:00| true| v2| +|2017-11-01T00:11:00.000+08:00| false| v1| +|2017-11-01T16:35:00.000+08:00| true| v2| +|2017-11-01T16:36:00.000+08:00| false| v1| +|2017-11-01T16:37:00.000+08:00| false| v1| ++-----------------------------+------------------------+--------------------------+ +Total line number = 9 +It costs 0.014s +``` + +### Order By Time Query +IoTDB supports the 'order by time' statement since 0.11, it's used to display results in descending order by time. +For example, the SQL statement is: + +```sql +select * from root.ln.** where time > 1 order by time desc limit 10; +``` +The execution result of this SQL statement is as follows: + +``` ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +|2017-11-07T23:59:00.000+08:00| v1| false| 21.07| false| +|2017-11-07T23:58:00.000+08:00| v1| false| 22.93| false| +|2017-11-07T23:57:00.000+08:00| v2| true| 24.39| true| +|2017-11-07T23:56:00.000+08:00| v2| true| 24.44| true| +|2017-11-07T23:55:00.000+08:00| v2| true| 25.9| true| +|2017-11-07T23:54:00.000+08:00| v1| false| 22.52| false| +|2017-11-07T23:53:00.000+08:00| v2| true| 24.58| true| +|2017-11-07T23:52:00.000+08:00| v1| false| 20.18| false| +|2017-11-07T23:51:00.000+08:00| v1| false| 22.24| false| +|2017-11-07T23:50:00.000+08:00| v2| true| 23.7| true| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +Total line number = 10 +It costs 0.016s +``` + +## Execution Interface + +In IoTDB, there are two ways to execute data query: +- Execute queries using IoTDB-SQL. +- Efficient execution interfaces for common queries, including time-series raw data query, last query, and aggregation query. + +### Execute queries using IoTDB-SQL + +Data query statements can be used in SQL command-line terminals, JDBC, JAVA / C++ / Python / Go and other native APIs, and RESTful APIs. 
+ +- Execute the query statement in the SQL command line terminal: start the SQL command line terminal, and directly enter the query statement to execute, see [SQL command line terminal](../QuickStart/Command-Line-Interface.md). + +- Execute query statements in JDBC, see [JDBC](../API/Programming-JDBC.md) for details. + +- Execute query statements in native APIs such as JAVA / C++ / Python / Go. For details, please refer to the relevant documentation in the Application Programming Interface chapter. The interface prototype is as follows: + + ````java + SessionDataSet executeQueryStatement(String sql) + ```` + +- Used in RESTful API, see [HTTP API V1](../API/RestServiceV1.md) or [HTTP API V2](../API/RestServiceV2.md) for details. + +### Efficient execution interfaces + +The native APIs provide efficient execution interfaces for commonly used queries, which can save time-consuming operations such as SQL parsing. include: + +* Time-series raw data query with time range: + - The specified query time range is a left-closed right-open interval, including the start time but excluding the end time. + +```java +SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime); +``` + +* Last query: + - Query the last data, whose timestamp is greater than or equal LastTime. + +```java +SessionDataSet executeLastDataQuery(List paths, long LastTime); +``` + +* Aggregation query: + - Support specified query time range: The specified query time range is a left-closed right-open interval, including the start time but not the end time. + - Support GROUP BY TIME. + +```java +SessionDataSet executeAggregationQuery(List paths, List aggregations); + +SessionDataSet executeAggregationQuery( + List paths, List aggregations, long startTime, long endTime); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval, + long slidingStep); +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Pagination.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Pagination.md new file mode 100644 index 00000000..ae265fb3 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Pagination.md @@ -0,0 +1,341 @@ + + +# Pagination + +When the query result set has a large amount of data, it is not conducive to display on one page. You can use the `LIMIT/SLIMIT` clause and the `OFFSET/SOFFSET` clause to control paging. + +- The `LIMIT` and `SLIMIT` clauses are used to control the number of rows and columns of query results. +- The `OFFSET` and `SOFFSET` clauses are used to control the starting position of the result display. + +## Row Control over Query Results + +By using LIMIT and OFFSET clauses, users control the query results in a row-related manner. We demonstrate how to use LIMIT and OFFSET clauses through the following examples. + +* Example 1: basic LIMIT clause + +The SQL statement is: + +```sql +select status, temperature from root.ln.wf01.wt01 limit 10 +``` +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". The SQL statement requires the first 10 rows of the query result. 
+ +The result is shown below: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:00:00.000+08:00| true| 25.96| +|2017-11-01T00:01:00.000+08:00| true| 24.36| +|2017-11-01T00:02:00.000+08:00| false| 20.09| +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 10 +It costs 0.000s +``` + +* Example 2: LIMIT clause with OFFSET + +The SQL statement is: + +```sql +select status, temperature from root.ln.wf01.wt01 limit 5 offset 3 +``` +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". The SQL statement requires rows 3 to 7 of the query result be returned (with the first row numbered as row 0). + +The result is shown below: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 5 +It costs 0.342s +``` + +* Example 3: LIMIT clause combined with WHERE clause + +The SQL statement is: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time< 2017-11-01T00:12:00.000 limit 2 offset 3 +``` +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is "status" and "temperature". The SQL statement requires rows 3 to 4 of the status and temperature sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" (with the first row numbered as row 0). + +The result is shown below: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 5 +It costs 0.000s +``` + +* Example 4: LIMIT clause combined with GROUP BY clause + +The SQL statement is: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) limit 5 offset 3 +``` +which means: + +The SQL statement clause requires rows 3 to 7 of the query result be returned (with the first row numbered as row 0). 
+
+The result is shown below:
+
+```
++-----------------------------+-------------------------------+----------------------------------------+
+| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)|
++-----------------------------+-------------------------------+----------------------------------------+
+|2017-11-04T00:00:00.000+08:00| 1440| 26.0|
+|2017-11-05T00:00:00.000+08:00| 1440| 26.0|
+|2017-11-06T00:00:00.000+08:00| 1440| 25.99|
+|2017-11-07T00:00:00.000+08:00| 1380| 26.0|
++-----------------------------+-------------------------------+----------------------------------------+
+Total line number = 4
+It costs 0.016s
+```
+
+## Column Control over Query Results
+
+By using SLIMIT and SOFFSET clauses, users can control the query results in a column-related manner. We will demonstrate how to use SLIMIT and SOFFSET clauses through the following examples.
+
+* Example 1: basic SLIMIT clause
+
+The SQL statement is:
+
+```sql
+select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1
+```
+which means:
+
+The selected device is ln group wf01 plant wt01 device; the selected timeseries is the first column under this device, i.e., the temperature. The SQL statement requires the temperature sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected.
+
+The result is shown below:
+
+```
++-----------------------------+-----------------------------+
+| Time|root.ln.wf01.wt01.temperature|
++-----------------------------+-----------------------------+
+|2017-11-01T00:06:00.000+08:00| 20.71|
+|2017-11-01T00:07:00.000+08:00| 21.45|
+|2017-11-01T00:08:00.000+08:00| 22.58|
+|2017-11-01T00:09:00.000+08:00| 20.98|
+|2017-11-01T00:10:00.000+08:00| 25.52|
+|2017-11-01T00:11:00.000+08:00| 22.91|
++-----------------------------+-----------------------------+
+Total line number = 6
+It costs 0.000s
+```
+
+* Example 2: SLIMIT clause with SOFFSET
+
+The SQL statement is:
+
+```sql
+select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 1
+```
+which means:
+
+The selected device is ln group wf01 plant wt01 device; the selected timeseries is the second column under this device, i.e., the power supply status. The SQL statement requires the status sensor values between the time point of "2017-11-01T00:05:00.000" and "2017-11-01T00:12:00.000" be selected.
+ +The result is shown below: + +``` ++-----------------------------+------------------------+ +| Time|root.ln.wf01.wt01.status| ++-----------------------------+------------------------+ +|2017-11-01T00:06:00.000+08:00| false| +|2017-11-01T00:07:00.000+08:00| false| +|2017-11-01T00:08:00.000+08:00| false| +|2017-11-01T00:09:00.000+08:00| false| +|2017-11-01T00:10:00.000+08:00| true| +|2017-11-01T00:11:00.000+08:00| false| ++-----------------------------+------------------------+ +Total line number = 6 +It costs 0.003s +``` + +* Example 3: SLIMIT clause combined with GROUP BY clause + +The SQL statement is: + +```sql +select max_value(*) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) slimit 1 soffset 1 +``` + +The result is shown below: + +``` ++-----------------------------+-----------------------------------+ +| Time|max_value(root.ln.wf01.wt01.status)| ++-----------------------------+-----------------------------------+ +|2017-11-01T00:00:00.000+08:00| true| +|2017-11-02T00:00:00.000+08:00| true| +|2017-11-03T00:00:00.000+08:00| true| +|2017-11-04T00:00:00.000+08:00| true| +|2017-11-05T00:00:00.000+08:00| true| +|2017-11-06T00:00:00.000+08:00| true| +|2017-11-07T00:00:00.000+08:00| true| ++-----------------------------+-----------------------------------+ +Total line number = 7 +It costs 0.000s +``` + +## Row and Column Control over Query Results + +In addition to row or column control over query results, IoTDB allows users to control both rows and columns of query results. Here is a complete example with both LIMIT clauses and SLIMIT clauses. + +The SQL statement is: + +```sql +select * from root.ln.wf01.wt01 limit 10 offset 100 slimit 2 soffset 0 +``` +which means: + +The selected device is ln group wf01 plant wt01 device; the selected timeseries is columns 0 to 1 under this device (with the first column numbered as column 0). The SQL statement clause requires rows 100 to 109 of the query result be returned (with the first row numbered as row 0). + +The result is shown below: + +``` ++-----------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+-----------------------------+------------------------+ +|2017-11-01T01:40:00.000+08:00| 21.19| false| +|2017-11-01T01:41:00.000+08:00| 22.79| false| +|2017-11-01T01:42:00.000+08:00| 22.98| false| +|2017-11-01T01:43:00.000+08:00| 21.52| false| +|2017-11-01T01:44:00.000+08:00| 23.45| true| +|2017-11-01T01:45:00.000+08:00| 24.06| true| +|2017-11-01T01:46:00.000+08:00| 22.6| false| +|2017-11-01T01:47:00.000+08:00| 23.78| true| +|2017-11-01T01:48:00.000+08:00| 24.72| true| +|2017-11-01T01:49:00.000+08:00| 24.68| true| ++-----------------------------+-----------------------------+------------------------+ +Total line number = 10 +It costs 0.009s +``` + +## Error Handling + +If the parameter N/SN of LIMIT/SLIMIT exceeds the size of the result set, IoTDB returns all the results as expected. 
For example, the query result of the original SQL statement consists of six rows, and we select the first 100 rows through the LIMIT clause: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 100 +``` + +The result is shown below: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 6 +It costs 0.005s +``` + +If the parameter N/SN of LIMIT/SLIMIT clause exceeds the allowable maximum value (N/SN is of type int64), the system prompts errors. For example, executing the following SQL statement: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 9223372036854775808 +``` + +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +Msg: 416: Out of range. LIMIT : N should be Int64. +``` + +If the parameter N/SN of LIMIT/SLIMIT clause is not a positive intege, the system prompts errors. For example, executing the following SQL statement: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 13.1 +``` + +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +Msg: 401: line 1:129 mismatched input '.' expecting {, ';'} +``` + +If the parameter OFFSET of LIMIT clause exceeds the size of the result set, IoTDB will return an empty result set. For example, executing the following SQL statement: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 limit 2 offset 6 +``` + +The result is shown below: + +``` ++----+------------------------+-----------------------------+ +|Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++----+------------------------+-----------------------------+ ++----+------------------------+-----------------------------+ +Empty set. +It costs 0.005s +``` + +If the parameter SOFFSET of SLIMIT clause is not smaller than the number of available timeseries, the system prompts errors. For example, executing the following SQL statement: + +```sql +select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 2 +``` + +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +Msg: 411: Meet error in query process: The value of SOFFSET (2) is equal to or exceeds the number of sequences (2) that can actually be returned. 
+``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Expression.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Expression.md new file mode 100644 index 00000000..44a14023 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Expression.md @@ -0,0 +1,324 @@ + + +# Select Expression + +The `SELECT` clause specifies the output of the query, consisting of several `selectExpr`. Each `selectExpr` defines one or more columns in the query result. + +**`selectExpr` is an expression consisting of time series path suffixes, constants, functions, and operators. That is, `selectExpr` can contain: ** +- Time series path suffix (wildcards are supported) +- operator + - Arithmetic operators + - comparison operators + - Logical Operators +- function + - aggregate functions + - Time series generation functions (including built-in functions and user-defined functions) +- constant + +## Use Alias + +Since the unique data model of IoTDB, lots of additional information like device will be carried before each sensor. Sometimes, we want to query just one specific device, then these prefix information show frequently will be redundant in this situation, influencing the analysis of result set. At this time, we can use `AS` function provided by IoTDB, assign an alias to time series selected in query. + +For example: + +```sql +select s1 as temperature, s2 as speed from root.ln.wf01.wt01; +``` + +The result set is: + +| Time | temperature | speed | +| ---- | ----------- | ----- | +| ... | ... | ... | + + +## Operator + +See the documentation [Operators and Functions](../Operators-Functions/Overview.md) for a list of operators supported in IoTDB. + +## Function + +### aggregate functions + +Aggregate functions are many-to-one functions. They perform aggregate calculations on a set of values, resulting in a single aggregated result. + +**A query that contains an aggregate function is called an aggregate query**, otherwise, it is called a time series query. + +> Please note that mixed use of `Aggregate Query` and `Timeseries Query` is not allowed. Below are examples for queries that are not allowed. +> +> ``` +> select a, count(a) from root.sg +> select sin(a), count(a) from root.sg +> select a, count(a) from root.sg group by ([10,100),10ms) +> ``` + +For the aggregation functions supported by IoTDB, see the document [Aggregation Functions](../Operators-Functions/Aggregation.md). + +### Time series generation function + +A time series generation function takes several raw time series as input and produces a list of time series as output. Unlike aggregate functions, time series generators have a timestamp column in their result sets. + +All time series generation functions accept * as input, and all can be mixed with raw time series queries. + +#### Built-in time series generation functions + +See the documentation [Operators and Functions](../Operators-Functions/Overview.md) for a list of built-in functions supported in IoTDB. + +#### User-Defined time series generation function + +IoTDB supports function extension through User Defined Function (click for [User-Defined Function](../Operators-Functions/User-Defined-Function.md)) capability. + +## Nested Expressions + +IoTDB supports the calculation of arbitrary nested expressions. Since time series query and aggregation query can not be used in a query statement at the same time, we divide nested expressions into two types, which are nested expressions with time series query and nested expressions with aggregation query. 
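+
+As a quick preview, here is a hedged sketch of the two types; the series `root.sg1.a` and `root.sg1.b` are the same illustrative series used in the detailed examples below. The first statement is a nested expression with time series query, the second a nested expression with aggregation query:
+
+```sql
+select sin(a) + b * 2 from root.sg1;
+select avg(a) + 1, sin(sum(b)) from root.sg1;
+```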
+ +The following is the syntax definition of the `select` clause: + +```sql +selectClause + : SELECT resultColumn (',' resultColumn)* + ; + +resultColumn + : expression (AS ID)? + ; + +expression + : '(' expression ')' + | '-' expression + | expression ('*' | '/' | '%') expression + | expression ('+' | '-') expression + | functionName '(' expression (',' expression)* functionAttribute* ')' + | timeSeriesSuffixPath + | number + ; +``` + +### Nested Expressions with Time Series Query + +IoTDB supports the calculation of arbitrary nested expressions consisting of **numbers, time series, time series generating functions (including user-defined functions) and arithmetic expressions** in the `select` clause. + +##### Example + +Input1: + +```sql +select a, + b, + ((a + 1) * 2 - 1) % 2 + 1.5, + sin(a + sin(a + sin(b))), + -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 +from root.sg1; +``` + +Result1: + +``` ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Time|root.sg1.a|root.sg1.b|((((root.sg1.a + 1) * 2) - 1) % 2) + 1.5|sin(root.sg1.a + sin(root.sg1.a + sin(root.sg1.b)))|(-root.sg1.a + root.sg1.b * ((sin(root.sg1.a + root.sg1.b) * sin(root.sg1.a + root.sg1.b)) + (cos(root.sg1.a + root.sg1.b) * cos(root.sg1.a + root.sg1.b)))) + 1| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 1| 1| 2.5| 0.9238430524420609| -1.0| +|1970-01-01T08:00:00.020+08:00| 2| 2| 2.5| 0.7903505371876317| -3.0| +|1970-01-01T08:00:00.030+08:00| 3| 3| 2.5| 0.14065207680386618| -5.0| +|1970-01-01T08:00:00.040+08:00| 4| null| 2.5| null| null| +|1970-01-01T08:00:00.050+08:00| null| 5| null| null| null| +|1970-01-01T08:00:00.060+08:00| 6| 6| 2.5| -0.7288037411970916| -11.0| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +Total line number = 6 +It costs 0.048s +``` + +Input2: + +```sql +select (a + b) * 2 + sin(a) from root.sg +``` + +Result2: + +``` ++-----------------------------+----------------------------------------------+ +| Time|((root.sg.a + root.sg.b) * 2) + sin(root.sg.a)| ++-----------------------------+----------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 59.45597888911063| +|1970-01-01T08:00:00.020+08:00| 100.91294525072763| +|1970-01-01T08:00:00.030+08:00| 139.01196837590714| +|1970-01-01T08:00:00.040+08:00| 180.74511316047935| +|1970-01-01T08:00:00.050+08:00| 219.73762514629607| +|1970-01-01T08:00:00.060+08:00| 259.6951893788978| +|1970-01-01T08:00:00.070+08:00| 300.7738906815579| +|1970-01-01T08:00:00.090+08:00| 39.45597888911063| +|1970-01-01T08:00:00.100+08:00| 39.45597888911063| ++-----------------------------+----------------------------------------------+ +Total line number = 9 +It costs 0.011s +``` + +Input3: + +```sql +select (a + *) / 2 from root.sg1 +``` + 
+Result3: + +``` ++-----------------------------+-----------------------------+-----------------------------+ +| Time|(root.sg1.a + root.sg1.a) / 2|(root.sg1.a + root.sg1.b) / 2| ++-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.010+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.020+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.030+08:00| 3.0| 3.0| +|1970-01-01T08:00:00.040+08:00| 4.0| null| +|1970-01-01T08:00:00.060+08:00| 6.0| 6.0| ++-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.011s +``` + +Input4: + +```sql +select (a + b) * 3 from root.sg, root.ln +``` + +Result4: + +``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|(root.sg.a + root.sg.b) * 3|(root.sg.a + root.ln.b) * 3|(root.ln.a + root.sg.b) * 3|(root.ln.a + root.ln.b) * 3| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.010+08:00| 90.0| 270.0| 360.0| 540.0| +|1970-01-01T08:00:00.020+08:00| 150.0| 330.0| 690.0| 870.0| +|1970-01-01T08:00:00.030+08:00| 210.0| 450.0| 570.0| 810.0| +|1970-01-01T08:00:00.040+08:00| 270.0| 240.0| 690.0| 660.0| +|1970-01-01T08:00:00.050+08:00| 330.0| null| null| null| +|1970-01-01T08:00:00.060+08:00| 390.0| null| null| null| +|1970-01-01T08:00:00.070+08:00| 450.0| null| null| null| +|1970-01-01T08:00:00.090+08:00| 60.0| null| null| null| +|1970-01-01T08:00:00.100+08:00| 60.0| null| null| null| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +Total line number = 9 +It costs 0.014s +``` + +##### Explanation + +- Only when the left operand and the right operand under a certain timestamp are not `null`, the nested expressions will have an output value. Otherwise this row will not be included in the result. + - In Result1 of the Example part, the value of time series `root.sg.a` at time 40 is 4, while the value of time series `root.sg.b` is `null`. So at time 40, the value of nested expressions `(a + b) * 2 + sin(a)` is `null`. So in Result2, this row is not included in the result. +- If one operand in the nested expressions can be translated into multiple time series (For example, `*`), the result of each time series will be included in the result (Cartesian product). Please refer to Input3, Input4 and corresponding Result3 and Result4 in Example. + +##### Note + +> Please note that Aligned Time Series has not been supported in Nested Expressions with Time Series Query yet. An error message is expected if you use it with Aligned Time Series selected in a query statement. + +### Nested Expressions query with aggregations + +IoTDB supports the calculation of arbitrary nested expressions consisting of **numbers, aggregations and arithmetic expressions** in the `select` clause. + +##### Example + +Aggregation query without `GROUP BY`. 
+ +Input1: + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) +from root.ln.wf01.wt01; +``` + +Result1: + +``` ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|avg(root.ln.wf01.wt01.temperature) + sum(root.ln.wf01.wt01.hardware)| ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +| 15.927999999999999| -0.21826546964855045| 16.927999999999997| -7426.0| 7441.928| ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +Total line number = 1 +It costs 0.009s +``` + +Input2: + +```sql +select avg(*), + (avg(*) + 1) * 3 / 2 -1 +from root.sg1 +``` + +Result2: + +``` ++---------------+---------------+-------------------------------------+-------------------------------------+ +|avg(root.sg1.a)|avg(root.sg1.b)|(avg(root.sg1.a) + 1) * 3 / 2 - 1 |(avg(root.sg1.b) + 1) * 3 / 2 - 1 | ++---------------+---------------+-------------------------------------+-------------------------------------+ +| 3.2| 3.4| 5.300000000000001| 5.6000000000000005| ++---------------+---------------+-------------------------------------+-------------------------------------+ +Total line number = 1 +It costs 0.007s +``` + +Aggregation with `GROUP BY`. 
+ +Input3: + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) as custom_sum +from root.ln.wf01.wt01 +GROUP BY([10, 90), 10ms); +``` + +Result3: + +``` ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +| Time|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|custom_sum| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +|1970-01-01T08:00:00.010+08:00| 13.987499999999999| 0.9888207947857667| 14.987499999999999| -3211.0| 3224.9875| +|1970-01-01T08:00:00.020+08:00| 29.6| -0.9701057337071853| 30.6| -3720.0| 3749.6| +|1970-01-01T08:00:00.030+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.040+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.050+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.060+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.070+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.080+08:00| null| null| null| null| null| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +Total line number = 8 +It costs 0.012s +``` + +##### Explanation + +- Only when the left operand and the right operand under a certain timestamp are not `null`, the nested expressions will have an output value. Otherwise this row will not be included in the result. But for nested expressions with `GROUP BY` clause, it is better to show the result of all time intervals. Please refer to Input3 and corresponding Result3 in Example. +- If one operand in the nested expressions can be translated into multiple time series (For example, `*`), the result of each time series will be included in the result (Cartesian product). Please refer to Input2 and corresponding Result2 in Example. + + + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Into.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Into.md new file mode 100644 index 00000000..b39c3c14 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Into.md @@ -0,0 +1,340 @@ + + +# Query Write-back (SELECT INTO) + +The `SELECT INTO` statement copies data from query result set into target time series. + +The application scenarios are as follows: +- **Implement IoTDB internal ETL**: ETL the original data and write a new time series. +- **Query result storage**: Persistently store the query results, which acts like a materialized view. +- **Non-aligned time series to aligned time series**: Rewrite non-aligned time series into another aligned time series. + +## SQL Syntax + +### Syntax Definition + +**The following is the syntax definition of the `select` statement:** + +```sql +selectIntoStatement +: SELECT + resultColumn [, resultColumn] ... + INTO intoItem [, intoItem] ... + FROM prefixPath [, prefixPath] ... 
+ [WHERE whereCondition] + [GROUP BY groupByTimeClause, groupByLevelClause] + [FILL {PREVIOUS | LINEAR | constant}] + [LIMIT rowLimit OFFSET rowOffset] + [ALIGN BY DEVICE] +; + +intoItem +: [ALIGNED] intoDevicePath '(' intoMeasurementName [',' intoMeasurementName]* ')' + ; +``` + +### `INTO` Clause + +The `INTO` clause consists of several `intoItem`. + +Each `intoItem` consists of a target device and a list of target measurements (similar to the `INTO` clause in an `INSERT` statement). + +Each target measurement and device form a target time series, and an `intoItem` contains a series of time series. For example: `root.sg_copy.d1(s1, s2)` specifies two target time series `root.sg_copy.d1.s1` and `root.sg_copy.d1.s2`. + +The target time series specified by the `INTO` clause must correspond one-to-one with the columns of the query result set. The specific rules are as follows: + +- **Align by time** (default): The number of target time series contained in all `intoItem` must be consistent with the number of columns in the query result set (except the time column) and correspond one-to-one in the order from left to right in the header. +- **Align by device** (using `ALIGN BY DEVICE`): the number of target devices specified in all `intoItem` is the same as the number of devices queried (i.e., the number of devices matched by the path pattern in the `FROM` clause), and One-to-one correspondence according to the output order of the result set device. +
The number of measurements specified for each target device should be consistent with the number of columns in the query result set (except for the time and device columns). It should be in one-to-one correspondence from left to right in the header. + +For examples: + +- **Example 1** (aligned by time) +```shell +IoTDB> select s1, s2 into root.sg_copy.d1(t1), root.sg_copy.d2(t1, t2), root.sg_copy.d1(t2) from root.sg.d1, root.sg.d2; ++--------------+-------------------+--------+ +| source column| target timeseries| written| ++--------------+-------------------+--------+ +| root.sg.d1.s1| root.sg_copy.d1.t1| 8000| ++--------------+-------------------+--------+ +| root.sg.d2.s1| root.sg_copy.d2.t1| 10000| ++--------------+-------------------+--------+ +| root.sg.d1.s2| root.sg_copy.d2.t2| 12000| ++--------------+-------------------+--------+ +| root.sg.d2.s2| root.sg_copy.d1.t2| 10000| ++--------------+-------------------+--------+ +Total line number = 4 +It costs 0.725s +``` +This statement writes the query results of the four time series under the `root.sg` database to the four specified time series under the `root.sg_copy` database. Note that `root.sg_copy.d2(t1, t2)` can also be written as `root.sg_copy.d2(t1), root.sg_copy.d2(t2)`. + +We can see that the writing of the `INTO` clause is very flexible as long as the combined target time series is not repeated and corresponds to the query result column one-to-one. + +> In the result set displayed by `CLI`, the meaning of each column is as follows: +> - The `source column` column represents the column name of the query result. +> - `target timeseries` represents the target time series for the corresponding column to write. +> - `written` indicates the amount of data expected to be written. + + +- **Example 2** (aligned by time) +```shell +IoTDB> select count(s1 + s2), last_value(s2) into root.agg.count(s1_add_s2), root.agg.last_value(s2) from root.sg.d1 group by ([0, 100), 10ms); ++--------------------------------------+-------------------------+--------+ +| source column| target timeseries| written| ++--------------------------------------+-------------------------+--------+ +| count(root.sg.d1.s1 + root.sg.d1.s2)| root.agg.count.s1_add_s2| 10| ++--------------------------------------+-------------------------+--------+ +| last_value(root.sg.d1.s2)| root.agg.last_value.s2| 10| ++--------------------------------------+-------------------------+--------+ +Total line number = 2 +It costs 0.375s +``` + +This statement stores the results of an aggregated query into the specified time series. 
+
+- **Example 3** (aligned by device)
+```shell
+IoTDB> select s1, s2 into root.sg_copy.d1(t1, t2), root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device;
++--------------+--------------+-------------------+--------+
+| source device| source column| target timeseries| written|
++--------------+--------------+-------------------+--------+
+| root.sg.d1| s1| root.sg_copy.d1.t1| 8000|
++--------------+--------------+-------------------+--------+
+| root.sg.d1| s2| root.sg_copy.d1.t2| 11000|
++--------------+--------------+-------------------+--------+
+| root.sg.d2| s1| root.sg_copy.d2.t1| 12000|
++--------------+--------------+-------------------+--------+
+| root.sg.d2| s2| root.sg_copy.d2.t2| 9000|
++--------------+--------------+-------------------+--------+
+Total line number = 4
+It costs 0.625s
+```
+This statement also writes the query results of the four time series under the `root.sg` database to the four specified time series under the `root.sg_copy` database. However, in ALIGN BY DEVICE, the number of `intoItem` must be the same as the number of queried devices, and each queried device corresponds to exactly one `intoItem`.
+
+> When aligning the query by device, the result set displayed by `CLI` has one more column: the `source device` column indicates the queried device.
+
+- **Example 4** (aligned by device)
+```shell
+IoTDB> select s1 + s2 into root.expr.add(d1s1_d1s2), root.expr.add(d2s1_d2s2) from root.sg.d1, root.sg.d2 align by device;
++--------------+--------------+------------------------+--------+
+| source device| source column| target timeseries| written|
++--------------+--------------+------------------------+--------+
+| root.sg.d1| s1 + s2| root.expr.add.d1s1_d1s2| 10000|
++--------------+--------------+------------------------+--------+
+| root.sg.d2| s1 + s2| root.expr.add.d2s1_d2s2| 10000|
++--------------+--------------+------------------------+--------+
+Total line number = 2
+It costs 0.532s
+```
+This statement stores the result of evaluating an expression into the specified time series.
+
+### Using variable placeholders
+
+In particular, we can use variable placeholders to describe the correspondence between the target and the queried time series and thus simplify the statement. The following two variable placeholders are currently supported:
+
+- Suffix duplication character `::`: copies the suffix (or measurement) of the queried device, meaning that from this level down to the last level (or the measurement), the node names (or measurement) of the target device are the same as those of the queried device.
+- Single-level node matcher `${i}`: indicates that the current level node name of the target series is the same as the i-th level node name of the queried series. For example, for the path `root.sg1.d1.s1`, `${1}` means `sg1`, `${2}` means `d1`, and `${3}` means `s1`.
+
+When using variable placeholders, there must be no ambiguity in the correspondence between `intoItem` and the columns of the query result set. The specific cases are classified as follows:
+
+#### ALIGN BY TIME (default)
+
+> Note: The variable placeholder **can only describe the correspondence between time series**. If the query includes aggregation and expression calculation, the columns in the query result cannot correspond to a time series, so neither the target device nor the measurement can use variable placeholders.
+ +##### (1) The target device does not use variable placeholders & the target measurement list uses variable placeholders + +**Limitations:** +1. In each `intoItem`, the length of the list of physical quantities must be 1.
(If the length can be greater than 1, e.g. `root.sg1.d1(::, s1)`, it is not possible to determine which columns match `::`) +2. The number of `intoItem` is 1, or the same as the number of columns in the query result set.
(When the length of each target measurement list is 1, if there is only one `intoItem`, it means that all the query sequences are written to the same device; if the number of `intoItem` is consistent with the query sequence, it is expressed as each query time series specifies a target device; if `intoItem` is greater than one and less than the number of query sequences, it cannot be a one-to-one correspondence with the query sequence) + +**Matching method:** Each query time series specifies the target device, and the target measurement is generated from the variable placeholder. + +**Example:** + +```sql +select s1, s2 +into root.sg_copy.d1(::), root.sg_copy.d2(s1), root.sg_copy.d1(${3}), root.sg_copy.d2(::) +from root.sg.d1, root.sg.d2; +```` +This statement is equivalent to: +```sql +select s1, s2 +into root.sg_copy.d1(s1), root.sg_copy.d2(s1), root.sg_copy.d1(s2), root.sg_copy.d2(s2) +from root.sg.d1, root.sg.d2; +```` +As you can see, the statement is not very simplified in this case. + +##### (2) The target device uses variable placeholders & the target measurement list does not use variable placeholders + +**Limitations:** The number of target measurements in all `intoItem` is the same as the number of columns in the query result set. + +**Matching method:** The target measurement is specified for each query time series, and the target device is generated according to the target device placeholder of the `intoItem` where the corresponding target measurement is located. + +**Example:** +```sql +select d1.s1, d1.s2, d2.s3, d3.s4 +into ::(s1_1, s2_2), root.sg.d2_2(s3_3), root.${2}_copy.::(s4) +from root.sg; +```` +##### (3) The target device uses variable placeholders & the target measurement list uses variable placeholders + +**Limitations:** There is only one `intoItem`, and the length of the list of measurement list is 1. + +**Matching method:** Each query time series can get a target time series according to the variable placeholder. + +**Example:** +```sql +select * into root.sg_bk.::(::) from root.sg.**; +```` +Write the query results of all time series under `root.sg` to `root.sg_bk`, the device name suffix and measurement remain unchanged. + +#### ALIGN BY DEVICE + +> Note: The variable placeholder **can only describe the correspondence between time series**. If the query includes aggregation and expression calculation, the columns in the query result cannot correspond to a specific physical quantity, so the target measurement cannot use variable placeholders. + +##### (1) The target device does not use variable placeholders & the target measurement list uses variable placeholders + +**Limitations:** In each `intoItem`, if the list of measurement uses variable placeholders, the length of the list must be 1. + +**Matching method:** Each query time series specifies the target device, and the target measurement is generated from the variable placeholder. + +**Example:** +```sql +select s1, s2, s3, s4 +into root.backup_sg.d1(s1, s2, s3, s4), root.backup_sg.d2(::), root.sg.d3(backup_${4}) +from root.sg.d1, root.sg.d2, root.sg.d3 +align by device; +```` + +##### (2) The target device uses variable placeholders & the target measurement list does not use variable placeholders + +**Limitations:** There is only one `intoItem`. 
(If there are multiple `intoItem` with placeholders, we will not know which source devices each `intoItem` needs to match) + +**Matching method:** Each query device obtains a target device according to the variable placeholder, and the target measurement written in each column of the result set under each device is specified by the target measurement list. + +**Example:** +```sql +select avg(s1), sum(s2) + sum(s3), count(s4) +into root.agg_${2}.::(avg_s1, sum_s2_add_s3, count_s4) +from root.** +align by device; +```` + +##### (3) The target device uses variable placeholders & the target measurement list uses variable placeholders + +**Limitations:** There is only one `intoItem` and the length of the target measurement list is 1. + +**Matching method:** Each query time series can get a target time series according to the variable placeholder. + +**Example:** +```sql +select * into ::(backup_${4}) from root.sg.** align by device; +```` +Write the query result of each time series in `root.sg` to the same device, and add `backup_` before the measurement. + +### Specify the target time series as the aligned time series + +We can use the `ALIGNED` keyword to specify the target device for writing to be aligned, and each `intoItem` can be set independently. + +**Example:** +```sql +select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; +``` +This statement specifies that `root.sg_copy.d1` is an unaligned device and `root.sg_copy.d2` is an aligned device. + +### Unsupported query clauses + +- `SLIMIT`, `SOFFSET`: The query columns are uncertain, so they are not supported. +- `LAST`, `GROUP BY TAGS`, `DISABLE ALIGN`: The table structure is inconsistent with the writing structure, so it is not supported. + +### Other points to note + +- For general aggregation queries, the timestamp is meaningless, and the convention is to use 0 to store. +- When the target time-series exists, the data type of the source column and the target time-series must be compatible. About data type compatibility, see the document [Data Type](../Basic-Concept/Data-Type.md#Data Type Compatibility). +- When the target time series does not exist, the system automatically creates it (including the database). +- When the queried time series does not exist, or the queried sequence does not have data, the target time series will not be created automatically. + +## Application examples + +### Implement IoTDB internal ETL +ETL the original data and write a new time series. 
+```shell
+IOTDB > SELECT preprocess_udf(s1, s2) INTO ::(preprocessed_s1, preprocessed_s2) FROM root.sg.* ALIGN BY DEVICE;
++--------------+-------------------+---------------------------+--------+
+| source device| source column| target timeseries| written|
++--------------+-------------------+---------------------------+--------+
+| root.sg.d1| preprocess_udf(s1)| root.sg.d1.preprocessed_s1| 8000|
++--------------+-------------------+---------------------------+--------+
+| root.sg.d1| preprocess_udf(s2)| root.sg.d1.preprocessed_s2| 10000|
++--------------+-------------------+---------------------------+--------+
+| root.sg.d2| preprocess_udf(s1)| root.sg.d2.preprocessed_s1| 11000|
++--------------+-------------------+---------------------------+--------+
+| root.sg.d2| preprocess_udf(s2)| root.sg.d2.preprocessed_s2| 9000|
++--------------+-------------------+---------------------------+--------+
+```
+
+### Query result storage
+Persistently store the query results, which acts like a materialized view.
+```shell
+IOTDB > SELECT count(s1), last_value(s1) INTO root.sg.agg_${2}(count_s1, last_value_s1) FROM root.sg.d1 GROUP BY ([0, 10000), 10ms);
++--------------------------+-----------------------------+--------+
+| source column| target timeseries| written|
++--------------------------+-----------------------------+--------+
+| count(root.sg.d1.s1)| root.sg.agg_d1.count_s1| 1000|
++--------------------------+-----------------------------+--------+
+| last_value(root.sg.d1.s1)| root.sg.agg_d1.last_value_s1| 1000|
++--------------------------+-----------------------------+--------+
+Total line number = 2
+It costs 0.115s
+```
+
+### Non-aligned time series to aligned time series
+Rewrite non-aligned time series into another aligned time series.
+
+**Note:** It is recommended to use the `LIMIT & OFFSET` clause or the `WHERE` clause (time filter) to batch data to prevent excessive data volume in a single operation.
+
+```shell
+IOTDB > SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d WHERE time >= 0 and time < 10000;
++--------------------------+----------------------+--------+
+| source column| target timeseries| written|
++--------------------------+----------------------+--------+
+| root.sg1.non_aligned_d.s1| root.sg1.aligned_d.s1| 10000|
++--------------------------+----------------------+--------+
+| root.sg1.non_aligned_d.s2| root.sg1.aligned_d.s2| 10000|
++--------------------------+----------------------+--------+
+Total line number = 2
+It costs 0.375s
+```
+
+## User Permission Management
+
+The user must have the following permissions to execute a query write-back statement:
+
+* All `READ_TIMESERIES` permissions for the source series in the `select` clause.
+* All `INSERT_TIMESERIES` permissions for the target series in the `into` clause.
+
+For more user permissions related content, please refer to [Account Management Statements](../Administration-Management/Administration.md).
+
+## Configurable Properties
+
+* `select_into_insert_tablet_plan_row_limit`: The maximum number of rows that can be processed in one insert-tablet-plan when executing select-into statements. 10000 by default.
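+
+As a closing illustration of the batching note above, a large copy can be split into several time-bounded statements so that no single statement processes an excessive amount of data. The sketch below is hypothetical: it reuses the paths from the earlier non-aligned example, and the window boundaries are arbitrary:
+
+```sql
+SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d WHERE time >= 0 and time < 10000;
+SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d WHERE time >= 10000 and time < 20000;
+```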
+ diff --git a/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Where-Condition.md b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Where-Condition.md new file mode 100644 index 00000000..8f9fcc61 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Query-Data/Where-Condition.md @@ -0,0 +1,191 @@ + + +# Query Filter + +In IoTDB query statements, two filter conditions, **time filter** and **value filter**, are supported. + +The supported operators are as follows: + +- Comparison operators: greater than (`>`), greater than or equal ( `>=`), equal ( `=` or `==`), not equal ( `!=` or `<>`), less than or equal ( `<=`), less than ( `<`). +- Logical operators: and ( `AND` or `&` or `&&`), or ( `OR` or `|` or `||`), not ( `NOT` or `!`). +- Range contains operator: contains ( `IN` ). +- String matches operator: `LIKE`, `REGEXP`. + +## Time Filter + +Use time filters to filter data for a specific time range. For supported formats of timestamps, please refer to [Timestamp](../Basic-Concept/Data-Type.md) . + +An example is as follows: + +1. Select data with timestamp greater than 2022-01-01T00:05:00.000: + + ```sql + select s1 from root.sg1.d1 where time > 2022-01-01T00:05:00.000; + ```` + +2. Select data with timestamp equal to 2022-01-01T00:05:00.000: + + ```sql + select s1 from root.sg1.d1 where time = 2022-01-01T00:05:00.000; + ```` + +3. Select the data in the time interval [2017-11-01T00:05:00.000, 2017-11-01T00:12:00.000): + + ```sql + select s1 from root.sg1.d1 where time >= 2022-01-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; + ```` + +Note: In the above example, `time` can also be written as `timestamp`. + +## Value Filter + +Use value filters to filter data whose data values meet certain criteria. **Allow** to use a time series not selected in the select clause as a value filter. + +An example is as follows: + +1. Select data with a value greater than 36.5: + + ```sql + select temperature from root.sg1.d1 where temperature > 36.5; + ```` + +2. Select data with value equal to true: + + ```sql + select status from root.sg1.d1 where status = true; + ```` + +3. Select data for the interval [36.5,40] or not: + + ```sql + select temperature from root.sg1.d1 where temperature between 36.5 and 40; + ```` + ```sql + select temperature from root.sg1.d1 where temperature not between 36.5 and 40; + ```` + +4. Select data with values within a specific range: + + ```sql + select code from root.sg1.d1 where code in ('200', '300', '400', '500'); + ```` + +5. Select data with values outside a certain range: + + ```sql + select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); + ```` + +6. Select data with values is null: + + ```sql + select code from root.sg1.d1 where temperature is null; + ```` + +7. Select data with values is not null: + + ```sql + select code from root.sg1.d1 where temperature is not null; + ```` + +## Fuzzy Query + +Fuzzy query is divided into Like statement and Regexp statement, both of which can support fuzzy matching of TEXT type data. + +Like statement: + +### Fuzzy matching using `Like` + +In the value filter condition, for TEXT type data, use `Like` and `Regexp` operators to perform fuzzy matching on data. + +**Matching rules:** + +- The percentage (`%`) wildcard matches any string of zero or more characters. +- The underscore (`_`) wildcard matches any single character. + +**Example 1:** Query data containing `'cc'` in `value` under `root.sg.d1`. 
+ +``` +IoTDB> select * from root.sg.d1 where value like '%cc%' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 2:** Query data that consists of 3 characters and the second character is `'b'` in `value` under `root.sg.d1`. + +``` +IoTDB> select * from root.sg.device where value like '_b_' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:02.000+08:00| abc| ++-----------------------------+----------------+ +Total line number = 1 +It costs 0.002s +``` + +### Fuzzy matching using `Regexp` + +The filter conditions that need to be passed in are regular expressions in the Java standard library style. + +**Examples of common regular matching:** + +``` +All characters with a length of 3-20: ^.{3,20}$ +Uppercase english characters: ^[A-Z]+$ +Numbers and English characters: ^[A-Za-z0-9]+$ +Beginning with a: ^a.* +``` + +**Example 1:** Query a string composed of 26 English characters for the value under root.sg.d1 + +``` +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**Example 2:** Query root.sg.d1 where the value value is a string composed of 26 lowercase English characters and the time is greater than 100 + +``` +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/QuickStart.md b/src/UserGuide/V2.0.1/Tree/stage/QuickStart.md new file mode 100644 index 00000000..586b34e4 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/QuickStart.md @@ -0,0 +1,232 @@ + + +# Quick Start + +This short guide will walk you through the basic process of using IoTDB. For a more-complete guide, please visit our website's [User Guide](../IoTDB-Introduction/What-is-IoTDB.md). + +## Prerequisites + +To use IoTDB, you need to have: + +1. Java >= 1.8 (Please make sure the environment path has been set) +2. Set the max open files num as 65535 to avoid "too many open files" problem. + +## Installation + +IoTDB provides you three installation methods, you can refer to the following suggestions, choose one of them: + +* Installation from source code. If you need to modify the code yourself, you can use this method. +* Installation from binary files. Download the binary files from the official website. This is the recommended method, in which you will get a binary released package which is out-of-the-box. 
+* Using Docker:The path to the dockerfile is [github](https://github.com/apache/iotdb/blob/master/docker/src/main) + + +## Download + +You can download the binary file from: +[Download Page](https://iotdb.apache.org/Download/) + +## Start + +You can go through the following step to test the installation, if there is no error after execution, the installation is completed. + +### Start IoTDB +IoTDB is a database based on distributed system. To launch IoTDB, you can first start standalone mode (i.e. 1 ConfigNode and 1 DataNode) to check. + +Users can start IoTDB standalone mode by the start-standalone script under the sbin folder. + +``` +# Unix/OS X +> bash sbin/start-standalone.sh +``` +``` +# Windows +> sbin\start-standalone.bat +``` + +Note: Currently, To run standalone mode, you need to ensure that all addresses are set to 127.0.0.1, If you need to access the IoTDB from a machine different from the one where the IoTDB is located, please change the configuration item `dn_rpc_address` to the IP of the machine where the IoTDB lives. And replication factors set to 1, which is by now the default setting. +Besides, it's recommended to use IoTConsensus in this mode, since it brings additional efficiency. +### Use Cli + +IoTDB offers different ways to interact with server, here we introduce basic steps of using Cli tool to insert and query data. + +After installing IoTDB, there is a default user 'root', its default password is also 'root'. Users can use this +default user to login Cli to use IoTDB. The startup script of Cli is the start-cli script in the folder sbin. When executing the script, user should assign +IP, PORT, USER_NAME and PASSWORD. The default parameters are "-h 127.0.0.1 -p 6667 -u root -pw -root". + +Here is the command for starting the Cli: + +``` +# Unix/OS X +> bash sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root + +# Windows +> sbin\start-cli.bat -h 127.0.0.1 -p 6667 -u root -pw root +``` + +The command line client is interactive so if everything is ready you should see the welcome logo and statements: + +``` + _____ _________ ______ ______ +|_ _| | _ _ ||_ _ `.|_ _ \ + | | .--.|_/ | | \_| | | `. \ | |_) | + | | / .'`\ \ | | | | | | | __'. + _| |_| \__. | _| |_ _| |_.' /_| |__) | +|_____|'.__.' |_____| |______.'|_______/ version x.x.x + + +Successfully login at 127.0.0.1:6667 +IoTDB> +``` + +### Basic commands for IoTDB + +Now, let us introduce the way of creating timeseries, inserting data and querying data. + +The data in IoTDB is organized as timeseries, in each timeseries there are some data-time pairs, and every timeseries is owned by a database. Before defining a timeseries, we should define a database using create DATABASE, and here is an example: + +``` +IoTDB> create database root.ln +``` + +We can also use SHOW DATABASES to check created databases: + +``` +IoTDB> SHOW DATABASES ++---------------+----+-----------------------+---------------------+---------------------+ +| Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval| ++---------------+----+-----------------------+---------------------+---------------------+ +| root.ln|null| 1| 1| 604800000| ++---------------+----+-----------------------+---------------------+---------------------+ +Database number = 1 +``` + +After the database is set, we can use CREATE TIMESERIES to create new timeseries. When we create a timeseries, we should define its data type and the encoding scheme. 
We create two timeseries as follows: + +``` +IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN +IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE +``` + +To query the specific timeseries, use SHOW TIMESERIES \. \ represents the path of the timeseries. Its default value is null, which means querying all the timeseries in the system(the same as using "SHOW TIMESERIES root"). Here are the examples: + +1. Query all timeseries in the system: + +``` +IoTDB> SHOW TIMESERIES ++-------------------------------+---------------+--------+--------+ +| Timeseries| Database|DataType|Encoding| ++-------------------------------+---------------+--------+--------+ +| root.ln.wf01.wt01.status| root.ln| BOOLEAN| PLAIN| +| root.ln.wf01.wt01.temperature| root.ln| FLOAT| RLE| ++-------------------------------+---------------+--------+--------+ +Total timeseries number = 2 +``` + +2. Query a specific timeseries(root.ln.wf01.wt01.status): + +``` +IoTDB> SHOW TIMESERIES root.ln.wf01.wt01.status ++------------------------------+--------------+--------+--------+ +| Timeseries| Database|DataType|Encoding| ++------------------------------+--------------+--------+--------+ +| root.ln.wf01.wt01.status| root.ln| BOOLEAN| PLAIN| ++------------------------------+--------------+--------+--------+ +Total timeseries number = 1 +``` + +Insert timeseries data is the basic operation of IoTDB, you can use ‘INSERT’ command to finish this. Before insert you should assign the timestamp and the suffix path name: + +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(100,true); +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) values(200,false,20.71) +``` + +The data we’ve just inserted displays like this: + +``` +IoTDB> SELECT status FROM root.ln.wf01.wt01 ++-----------------------+------------------------+ +| Time|root.ln.wf01.wt01.status| ++-----------------------+------------------------+ +|1970-01-01T08:00:00.100| true| +|1970-01-01T08:00:00.200| false| ++-----------------------+------------------------+ +Total line number = 2 +``` + +We can also query several timeseries data at once like this: + +``` +IoTDB> SELECT * FROM root.ln.wf01.wt01 ++-----------------------+--------------------------+-----------------------------+ +| Time| root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------+--------------------------+-----------------------------+ +|1970-01-01T08:00:00.100| true| null| +|1970-01-01T08:00:00.200| false| 20.71| ++-----------------------+--------------------------+-----------------------------+ +Total line number = 2 +``` + +The commands to exit the Cli are: + +``` +IoTDB> quit +or +IoTDB> exit +``` + +For more on what commands are supported by IoTDB SQL, see [SQL Manuel](../SQL-Manual/SQL-Manual.md). + +### Stop IoTDB + +The server can be stopped with ctrl-C or the following script: + +``` +# Unix/OS X +> bash sbin/stop-standalone.sh + +# Windows +> sbin\stop-standalone.bat +``` +Note: In Linux, please add the "sudo" as far as possible, or else the stopping process may fail. +More explanations are in Cluster/Cluster-setup.md. + +### Administration management + +There is a default user in IoTDB after the initial installation: root, and the default password is root. This user is an administrator user, who cannot be deleted and has all the privileges. Neither can new privileges be granted to the root user nor can privileges owned by the root user be deleted. 
+ +You can alter the password of root using the following command: +``` +ALTER USER SET PASSWORD ; +Example: IoTDB > ALTER USER root SET PASSWORD 'newpwd'; +``` + +More about administration management:[Administration Management](../User-Manual/Authority-Management.md) + + +## Basic configuration + +The configuration files is in the `conf` folder, includes: + +* environment configuration (`datanode-env.bat`, `datanode-env.sh`,`confignode-env.bat`,`confignode-env.sh`), +* system configuration (`iotdb-system.properties`,`iotdb-cluster.properties`) +* log configuration (`logback.xml`). diff --git a/src/UserGuide/V2.0.1/Tree/stage/SQL-Reference.md b/src/UserGuide/V2.0.1/Tree/stage/SQL-Reference.md new file mode 100644 index 00000000..aa5f7d83 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/SQL-Reference.md @@ -0,0 +1,1326 @@ + + +# SQL Reference + +In this part, we will introduce you IoTDB's Query Language. IoTDB offers you a SQL-like query language for interacting with IoTDB, the query language can be devided into 4 major parts: + +* Schema Statement: statements about schema management are all listed in this section. +* Data Management Statement: statements about data management (such as: data insertion, data query, etc.) are all listed in this section. +* Database Management Statement: statements about database management and authentication are all listed in this section. +* Functions: functions that IoTDB offers are all listed in this section. + +All of these statements are write in IoTDB's own syntax, for details about the syntax composition, please check the `Reference` section. + +## Show Version + +```sql +show version +``` + +``` ++---------------+ +| version| ++---------------+ +|1.0.0| ++---------------+ +Total line number = 1 +It costs 0.417s +``` + +## Schema Statement + +* Create Database + +``` SQL +CREATE DATABASE +Eg: IoTDB > CREATE DATABASE root.ln.wf01.wt01 +Note: FullPath can not include wildcard `*` or `**` +``` + +* Delete database + +``` +DELETE DATABASE [COMMA ]* +Eg: IoTDB > DELETE DATABASE root.ln.wf01.wt01 +Eg: IoTDB > DELETE DATABASE root.ln.wf01.wt01, root.ln.wf01.wt02 +Eg: IoTDB > DELETE DATABASE root.ln.wf01.* +Eg: IoTDB > DELETE DATABASE root.** +``` + +* Create Timeseries Statement +``` +CREATE TIMESERIES WITH +alias + : LR_BRACKET ID RR_BRACKET + ; +attributeClauses + : DATATYPE OPERATOR_EQ + COMMA ENCODING OPERATOR_EQ + (COMMA (COMPRESSOR | COMPRESSION) OPERATOR_EQ )? 
+ (COMMA property)* + tagClause + attributeClause + ; +attributeClause + : ATTRIBUTES LR_BRACKET propertyClause (COMMA propertyClause)* RR_BRACKET + ; +tagClause + : TAGS LR_BRACKET propertyClause (COMMA propertyClause)* RR_BRACKET + ; +propertyClause + : name=ID OPERATOR_EQ propertyValue + ; +DataTypeValue: BOOLEAN | DOUBLE | FLOAT | INT32 | INT64 | TEXT +EncodingValue: GORILLA | PLAIN | RLE | TS_2DIFF | REGULAR +CompressorValue: UNCOMPRESSED | SNAPPY +AttributesType: SDT | COMPDEV | COMPMINTIME | COMPMAXTIME +PropertyValue: ID | constant +Eg: CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, COMPRESSOR=SNAPPY, MAX_POINT_NUMBER=3 +Eg: CREATE TIMESERIES root.turbine.d0.s0(temperature) WITH DATATYPE=FLOAT, ENCODING=RLE, COMPRESSOR=SNAPPY tags(unit=f, description='turbine this is a test1') attributes(H_Alarm=100, M_Alarm=50) +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, DEADBAND=SDT, COMPDEV=0.01 +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, DEADBAND=SDT, COMPDEV=0.01, COMPMINTIME=3 +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, DEADBAND=SDT, COMPDEV=0.01, COMPMINTIME=2, COMPMAXTIME=15 +Note: Datatype and encoding type must be corresponding. Please check Chapter 3 Encoding Section for details. +Note: When propertyValue is SDT, it is required to set compression deviation COMPDEV, which is the maximum absolute difference between values. +Note: For SDT, values withtin COMPDEV will be discarded. +Note: For SDT, it is optional to set compression minimum COMPMINTIME, which is the minimum time difference between stored values for purpose of noise reduction. +Note: For SDT, it is optional to set compression maximum COMPMAXTIME, which is the maximum time difference between stored values regardless of COMPDEV. +``` + +* Create Timeseries Statement (Simplified version, from v0.13) +``` +CREATE TIMESERIES +SimplifiedAttributeClauses + : WITH? (DATATYPE OPERATOR_EQ)? + ENCODING OPERATOR_EQ + ((COMPRESSOR | COMPRESSION) OPERATOR_EQ )? 
+ (COMMA property)* + tagClause + attributeClause + ; +Eg: CREATE TIMESERIES root.ln.wf01.wt01.status BOOLEAN ENCODING=PLAIN +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE COMPRESSOR=SNAPPY MAX_POINT_NUMBER=3 +Eg: CREATE TIMESERIES root.turbine.d0.s0(temperature) FLOAT ENCODING=RLE COMPRESSOR=SNAPPY tags(unit=f, description='turbine this is a test1') attributes(H_Alarm=100, M_Alarm=50) +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE DEADBAND=SDT COMPDEV=0.01 +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE DEADBAND=SDT COMPDEV=0.01 COMPMINTIME=3 +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE DEADBAND=SDT COMPDEV=0.01 COMPMINTIME=2 COMPMAXTIME=15 +``` + +* Create Aligned Timeseries Statement +``` +CREATE ALIGNED TIMESERIES alignedMeasurements +alignedMeasurements + : LR_BRACKET nodeNameWithoutWildcard attributeClauses + (COMMA nodeNameWithoutWildcard attributeClauses)+ RR_BRACKET + ; +Eg: CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(lat FLOAT ENCODING=GORILLA, lon FLOAT ENCODING=GORILLA COMPRESSOR=SNAPPY) +Note: It is not supported to set different compression for a group of aligned timeseries. +Note: It is not currently supported to set an alias, tag, and attribute for aligned timeseries. +``` + +* Create Schema Template Statement +``` +CREATE SCHEMA TEMPLATE LR_BRACKET (COMMA plateMeasurementClause>)* RR_BRACKET +templateMeasurementClause + : suffixPath attributeClauses #nonAlignedTemplateMeasurement + | suffixPath LR_BRACKET nodeNameWithoutWildcard attributeClauses + (COMMA nodeNameWithoutWildcard attributeClauses)+ RR_BRACKET #alignedTemplateMeasurement + ; +Eg: CREATE SCHEMA TEMPLATE temp1( + s1 INT32 encoding=Gorilla, compression=SNAPPY, + vector1( + s1 INT32 encoding=Gorilla, + s2 FLOAT encoding=RLE, compression=SNAPPY) + ) +``` + +* Set Schema Template Statement +``` +SET SCHEMA TEMPLATE TO +Eg: SET SCHEMA TEMPLATE temp1 TO root.beijing +``` + +* Create Timeseries Of Schema Template Statement +``` +CREATE TIMESERIES OF SCHEMA TEMPLATE ON +Eg: CREATE TIMESERIES OF SCHEMA TEMPLATE ON root.beijing +``` + +* Unset Schema Template Statement +``` +UNSET SCHEMA TEMPLATE FROM +Eg: UNSET SCHEMA TEMPLATE temp1 FROM root.beijing +``` + +* Delete Timeseries Statement + +``` +(DELETE | DROP) TIMESERIES [COMMA ]* +Eg: IoTDB > DELETE TIMESERIES root.ln.wf01.wt01.status +Eg: IoTDB > DELETE TIMESERIES root.ln.wf01.wt01.status, root.ln.wf01.wt01.temperature +Eg: IoTDB > DELETE TIMESERIES root.ln.wf01.wt01.* +Eg: IoTDB > DROP TIMESERIES root.ln.wf01.wt01.* +``` + +* Alter Timeseries Statement +``` +ALTER TIMESERIES fullPath alterClause +alterClause + : RENAME beforeName=ID TO currentName=ID + | SET property (COMMA property)* + | DROP ID (COMMA ID)* + | ADD TAGS property (COMMA property)* + | ADD ATTRIBUTES property (COMMA property)* + | UPSERT tagClause attributeClause + ; +attributeClause + : (ATTRIBUTES LR_BRACKET property (COMMA property)* RR_BRACKET)? + ; +tagClause + : (TAGS LR_BRACKET property (COMMA property)* RR_BRACKET)? 
+ ; +Eg: ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +Eg: ALTER timeseries root.turbine.d1.s1 SET tag1=newV1, attr1=newV1 +Eg: ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +Eg: ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +Eg: ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +EG: ALTER timeseries root.turbine.d1.s1 UPSERT TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +* Show All Timeseries Statement + +``` +SHOW TIMESERIES +Eg: IoTDB > SHOW TIMESERIES +Note: This statement can only be used in IoTDB Client. If you need to show all timeseries in JDBC, please use `DataBaseMetadata` interface. +``` + +* Show Specific Timeseries Statement + +``` +SHOW TIMESERIES +Eg: IoTDB > SHOW TIMESERIES root.** +Eg: IoTDB > SHOW TIMESERIES root.ln.** +Eg: IoTDB > SHOW TIMESERIES root.ln.*.*.status +Eg: IoTDB > SHOW TIMESERIES root.ln.wf01.wt01.status +Note: The path can be timeseries path or path pattern. +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Show Specific Timeseries Statement with where clause + +``` +SHOW TIMESERIES pathPattern? showWhereClause? +showWhereClause + : WHERE (property | containsExpression) + ; +containsExpression + : name=ID OPERATOR_CONTAINS value=propertyValue + ; + +Eg: show timeseries root.ln.** where unit='c' +Eg: show timeseries root.ln.** where description contains 'test1' +``` + +* Show Specific Timeseries Statement with where clause start from offset and limit the total number of result + +``` +SHOW TIMESERIES pathPattern? showWhereClause? limitClause? + +showWhereClause + : WHERE (property | containsExpression) + ; +containsExpression + : name=ID OPERATOR_CONTAINS value=propertyValue + ; +limitClause + : LIMIT INT offsetClause? + | offsetClause? LIMIT INT + ; + +Eg: show timeseries root.ln.** where unit='c' +Eg: show timeseries root.ln.** where description contains 'test1' +Eg: show timeseries root.ln.** where unit='c' limit 10 offset 10 +``` + +* Show Databases Statement + +``` +SHOW DATABASES +Eg: IoTDB > SHOW DATABASES +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Show Specific database Statement + +``` +SHOW DATABASES +Eg: IoTDB > SHOW DATABASES root.* +Eg: IoTDB > SHOW DATABASES root.ln +Note: The path can be full path or path pattern. +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Show Merge Status Statement + +``` +SHOW MERGE INFO +Eg: IoTDB > SHOW MERGE INFO +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Count Timeseries Statement + +``` +COUNT TIMESERIES +Eg: IoTDB > COUNT TIMESERIES root.** +Eg: IoTDB > COUNT TIMESERIES root.ln.** +Eg: IoTDB > COUNT TIMESERIES root.ln.*.*.status +Eg: IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +Note: The path can be timeseries path or path pattern. +Note: This statement can be used in IoTDB Client and JDBC. +``` + +``` +COUNT TIMESERIES GROUP BY LEVEL= +Eg: IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +Eg: IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +Eg: IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=3 +Note: The path can be timeseries path or path pattern. +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Count Nodes Statement + +``` +COUNT NODES LEVEL= +Eg: IoTDB > COUNT NODES root.** LEVEL=2 +Eg: IoTDB > COUNT NODES root.ln.** LEVEL=2 +Eg: IoTDB > COUNT NODES root.ln.* LEVEL=3 +Eg: IoTDB > COUNT NODES root.ln.wf01 LEVEL=3 +Note: The path can be full path or path pattern. 
+Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Show All Devices Statement + +``` +SHOW DEVICES (WITH DATABASE)? limitClause? +Eg: IoTDB > SHOW DEVICES +Eg: IoTDB > SHOW DEVICES WITH DATABASE +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Show Specific Devices Statement + +``` +SHOW DEVICES (WITH DATABASE)? limitClause? +Eg: IoTDB > SHOW DEVICES root.** +Eg: IoTDB > SHOW DEVICES root.ln.** +Eg: IoTDB > SHOW DEVICES root.*.wf01 +Eg: IoTDB > SHOW DEVICES root.ln WITH DATABASE +Eg: IoTDB > SHOW DEVICES root.*.wf01 WITH DATABASE +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Show Child Paths of Root Statement +``` +SHOW CHILD PATHS +Eg: IoTDB > SHOW CHILD PATHS +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* Show Child Paths Statement +``` +SHOW CHILD PATHS +Eg: IoTDB > SHOW CHILD PATHS root +Eg: IoTDB > SHOW CHILD PATHS root.ln +Eg: IoTDB > SHOW CHILD PATHS root.*.wf01 +Eg: IoTDB > SHOW CHILD PATHS root.ln.wf* +Note: This statement can be used in IoTDB Client and JDBC. +``` + +## Data Management Statement + +* Insert Record Statement + +``` +INSERT INTO LPAREN TIMESTAMP COMMA [COMMA ]* RPAREN VALUES LPAREN , [COMMA ]* RPAREN +Sensor : Identifier +Eg: IoTDB > INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +Eg: IoTDB > INSERT INTO root.ln.wf01.wt01(timestamp,status) VALUES(NOW(), false) +Eg: IoTDB > INSERT INTO root.ln.wf01.wt01(timestamp,temperature) VALUES(2017-11-01T00:17:00.000+08:00,24.22028) +Eg: IoTDB > INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) VALUES (1509466680000,false,20.060787) +Eg: IoTDB > INSERT INTO root.sg.d1(timestamp,(s1,s2),(s3,s4)) VALUES (1509466680000,(1.0,2),(NULL,4)) +Note: the statement needs to satisfy this constraint: + = +Note: The order of Sensor and PointValue need one-to-one correspondence +``` + +* Delete Record Statement + +``` +DELETE FROM [COMMA ]* [WHERE ]? +WhereClause : [(AND) ]* +Condition : [(AND) ]* +TimeExpr : TIME PrecedenceEqualOperator ( | ) +Eg: DELETE FROM root.ln.wf01.wt01.temperature WHERE time > 2016-01-05T00:15:00+08:00 and time < 2017-11-1T00:05:00+08:00 +Eg: DELETE FROM root.ln.wf01.wt01.status, root.ln.wf01.wt01.temperature WHERE time < NOW() +Eg: DELETE FROM root.ln.wf01.wt01.* WHERE time >= 1509466140000 +``` + +* Select Record Statement + +``` +SELECT FROM [WHERE ]? +SelectClause : (COMMA )* +SelectPath : LPAREN RPAREN | +FUNCTION : ‘COUNT’ , ‘MIN_TIME’, ‘MAX_TIME’, ‘MIN_VALUE’, ‘MAX_VALUE’ +FromClause : (COMMA )? +WhereClause : [(AND | OR) ]* +Condition : [(AND | OR) ]* +Expression : [NOT | !]? | [NOT | !]? +TimeExpr : TIME PrecedenceEqualOperator ( | ) +RelativeTimeDurationUnit = Integer ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS') +RelativeTime : (now() | ) [(+|-) RelativeTimeDurationUnit]+ +SensorExpr : ( | ) PrecedenceEqualOperator +Eg: IoTDB > SELECT status, temperature FROM root.ln.wf01.wt01 WHERE temperature < 24 and time > 2017-11-1 0:13:00 +Eg. IoTDB > SELECT ** FROM root +Eg. IoTDB > SELECT * FROM root.** +Eg. IoTDB > SELECT * FROM root.** where time > now() - 5m +Eg. IoTDB > SELECT * FROM root.ln.*.wf* +Eg. IoTDB > SELECT COUNT(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 25 +Eg. IoTDB > SELECT MIN_TIME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 25 +Eg. IoTDB > SELECT MAX_TIME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature > 24 +Eg. 
IoTDB > SELECT MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature > 23 +Eg. IoTDB > SELECT MAX_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 25 +Eg. IoTDB > SELECT COUNT(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 25 GROUP BY LEVEL=1 +Note: the statement needs to satisfy this constraint: (SelectClause) + (FromClause) = +Note: If the (WhereClause) is started with and not with ROOT, the statement needs to satisfy this constraint: (FromClause) + (SensorExpr) = +Note: In Version 0.7.0, if includes `OR`, time filter can not be used. +Note: There must be a space on both sides of the plus and minus operator appearing in the time expression +``` + +* Group By Statement + +``` +SELECT FROM WHERE GROUP BY +SelectClause : [COMMA < Function >]* +Function : LPAREN RPAREN +FromClause : +WhereClause : [(AND | OR) ]* +Condition : [(AND | OR) ]* +Expression : [NOT | !]? | [NOT | !]? +TimeExpr : TIME PrecedenceEqualOperator ( | ) +RelativeTimeDurationUnit = Integer ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS') +RelativeTime : (now() | ) [(+|-) RelativeTimeDurationUnit]+ +SensorExpr : ( | ) PrecedenceEqualOperator +GroupByTimeClause : LPAREN COMMA (COMMA )? RPAREN +TimeInterval: LSBRACKET COMMA RRBRACKET | LRBRACKET COMMA RSBRACKET +TimeUnit : Integer +DurationUnit : "ms" | "s" | "m" | "h" | "d" | "w" | "mo" +Eg: SELECT COUNT(status), COUNT(temperature) FROM root.ln.wf01.wt01 where temperature < 24 GROUP BY([1509465720000, 1509466380000), 5m) +Eg: SELECT COUNT(status), COUNT(temperature) FROM root.ln.wf01.wt01 where temperature < 24 GROUP BY((1509465720000, 1509466380000], 5m) +Eg. SELECT COUNT (status), MAX_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE time < 1509466500000 GROUP BY([1509465720000, 1509466380000), 5m, 10m) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ([1509466140000, 1509466380000), 3m, 5ms) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ((1509466140000, 1509466380000], 3m, 5ms) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ((1509466140000, 1509466380000], 1mo) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ((1509466140000, 1509466380000], 1mo, 1mo) +Eg. 
SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ((1509466140000, 1509466380000], 1mo, 2mo) +Note: the statement needs to satisfy this constraint: (SelectClause) + (FromClause) = +Note: If the (WhereClause) is started with and not with ROOT, the statement needs to satisfy this constraint: (FromClause) + (SensorExpr) = +Note: (TimeInterval) needs to be greater than 0 +Note: First (TimeInterval) in needs to be smaller than second (TimeInterval) +Note: needs to be greater than 0 +Note: Third if set shouldn't be smaller than second +Note: If the second is "mo", the third need to be in month +Note: If the third is "mo", the second can be in any unit +``` + +* Fill Statement + +``` +SELECT FROM WHERE FILL +SelectClause : [COMMA ]* +FromClause : < PrefixPath > [COMMA < PrefixPath >]* +WhereClause : +WhereExpression : TIME EQUAL +FillClause : LPAREN [COMMA ]* RPAREN +TypeClause : | | | | | +Int32Clause: INT32 LBRACKET ( | ) RBRACKET +Int64Clause: INT64 LBRACKET ( | ) RBRACKET +FloatClause: FLOAT LBRACKET ( | ) RBRACKET +DoubleClause: DOUBLE LBRACKET ( | ) RBRACKET +BoolClause: BOOLEAN LBRACKET ( | ) RBRACKET +TextClause: TEXT LBRACKET ( | ) RBRACKET +PreviousClause : PREVIOUS [COMMA ]? +LinearClause : LINEAR [COMMA COMMA ]? +ValidPreviousTime, ValidBehindTime: +TimeUnit : Integer +DurationUnit : "ms" | "s" | "m" | "h" | "d" | "w" +Eg: SELECT temperature FROM root.ln.wf01.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL(float[previous, 1m]) +Eg: SELECT temperature,status FROM root.ln.wf01.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL (float[linear, 1m, 1m], boolean[previous, 1m]) +Eg: SELECT temperature,status,hardware FROM root.ln.wf01.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL (float[linear, 1m, 1m], boolean[previous, 1m], text[previous]) +Eg: SELECT temperature,status,hardware FROM root.ln.wf01.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL (float[linear], boolean[previous, 1m], text[previous]) +Note: the statement needs to satisfy this constraint: (FromClause) + (SelectClause) = +Note: Integer in needs to be greater than 0 +``` + +* Group By Fill Statement + +``` +SELECT FROM WHERE GROUP BY (FILL )? +GroupByClause : LPAREN COMMA RPAREN +GROUPBYFillClause : LPAREN RPAREN +TypeClause : | | | | | | +AllClause: ALL LBRACKET ( | ) RBRACKET +Int32Clause: INT32 LBRACKET ( | ) RBRACKET +Int64Clause: INT64 LBRACKET ( | ) RBRACKET +FloatClause: FLOAT LBRACKET ( | ) RBRACKET +DoubleClause: DOUBLE LBRACKET ( | ) RBRACKET +BoolClause: BOOLEAN LBRACKET ( | ) RBRACKET +TextClause: TEXT LBRACKET ( | ) RBRACKET +PreviousClause : PREVIOUS +PreviousUntilLastClause : PREVIOUSUNTILLAST +Eg: SELECT last_value(temperature) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (float[PREVIOUS]) +Eg: SELECT last_value(temperature) FROM root.ln.wf01.wt01 GROUP BY((15, 100], 5m) FILL (float[PREVIOUS]) +Eg: SELECT last_value(power) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (int32[PREVIOUSUNTILLAST]) +Eg: SELECT last_value(power) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (int32[PREVIOUSUNTILLAST, 5m]) +Eg: SELECT last_value(temperature), last_value(power) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (ALL[PREVIOUS]) +Eg: SELECT last_value(temperature), last_value(power) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (ALL[PREVIOUS, 5m]) +Note: In group by fill, sliding step is not supported in group by clause +Note: Now, only last_value aggregation function is supported in group by fill. 
+Note: Linear fill is not supported in group by fill. +``` + +* Order by time Statement + +``` +SELECT FROM WHERE GROUP BY (FILL )? orderByTimeClause? +orderByTimeClause: order by time (asc | desc)? + +Eg: SELECT last_value(temperature) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (float[PREVIOUS]) order by time desc +Eg: SELECT * from root.** order by time desc +Eg: SELECT * from root.** order by time desc align by device +Eg: SELECT * from root.** order by time desc disable align +Eg: SELECT last * from root.** order by time desc +``` + +* Limit Statement + +``` +SELECT FROM [WHERE ] [] [] +SelectClause : [ | Function]+ +Function : LPAREN RPAREN +FromClause : +WhereClause : [(AND | OR) ]* +Condition : [(AND | OR) ]* +Expression: [NOT|!]? | [NOT|!]? +TimeExpr : TIME PrecedenceEqualOperator ( | ) +RelativeTimeDurationUnit = Integer ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS') +RelativeTime : (now() | ) [(+|-) RelativeTimeDurationUnit]+ +SensorExpr : (|) PrecedenceEqualOperator +LIMITClause : LIMIT [OFFSETClause]? +N : Integer +OFFSETClause : OFFSET +OFFSETValue : Integer +SLIMITClause : SLIMIT [SOFFSETClause]? +SN : Integer +SOFFSETClause : SOFFSET +SOFFSETValue : Integer +Eg: IoTDB > SELECT status, temperature FROM root.ln.wf01.wt01 WHERE temperature < 24 and time > 2017-11-1 0:13:00 LIMIT 3 OFFSET 2 +Eg. IoTDB > SELECT COUNT (status), MAX_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE time < 1509466500000 GROUP BY([1509465720000, 1509466380000], 5m) LIMIT 3 +Note: N, OFFSETValue, SN and SOFFSETValue must be greater than 0. +Note: The order of and does not affect the grammatical correctness. +Note: can not use but not . +``` + +* Align By Device Statement + +``` +AlignbyDeviceClause : ALIGN BY DEVICE + +Rules: +1. Both uppercase and lowercase are ok. +Correct example: select * from root.sg1.* align by device +Correct example: select * from root.sg1.* ALIGN BY DEVICE + +2. AlignbyDeviceClause can only be used at the end of a query statement. +Correct example: select * from root.sg1.* where time > 10 align by device +Wrong example: select * from root.sg1.* align by device where time > 10 + +3. The paths of the SELECT clause can only be single level. In other words, the paths of the SELECT clause can only be measurements or STAR, without DOT. +Correct example: select s0,s1 from root.sg1.* align by device +Correct example: select s0,s1 from root.sg1.d0, root.sg1.d1 align by device +Correct example: select * from root.sg1.* align by device +Correct example: select * from root.** align by device +Correct example: select s0,s1,* from root.*.* align by device +Wrong example: select d0.s1, d0.s2, d1.s0 from root.sg1 align by device +Wrong example: select *.s0, *.s1 from root.* align by device +Wrong example: select *.*.* from root align by device + +4. The data types of the same measurement column should be the same across devices. +Note that when it comes to aggregated paths, the data type of the measurement column will reflect +the aggregation function rather than the original timeseries. + +Correct example: select s0 from root.sg1.d0,root.sg1.d1 align by device +root.sg1.d0.s0 and root.sg1.d1.s0 are both INT32. + +Correct example: select count(s0) from root.sg1.d0,root.sg1.d1 align by device +count(root.sg1.d0.s0) and count(root.sg1.d1.s0) are both INT64. + +Wrong example: select s0 from root.sg1.d0, root.sg2.d3 align by device +root.sg1.d0.s0 is INT32 while root.sg2.d3.s0 is FLOAT. + +5. 
The display principle of the result table is that all the columns (no matther whther a column has has existing data) will be shown, with nonexistent cells being null. Besides, the select clause support const column (e.g., 'a', '123' etc..). +For example, "select s0,s1,s2,'abc',s1,s2 from root.sg.d0, root.sg.d1, root.sg.d2 align by device". Suppose that the actual existing timeseries are as follows: +- root.sg.d0.s0 +- root.sg.d0.s1 +- root.sg.d1.s0 + +Then you could expect a table like: + +| Time | Device | s0 | s1 | s2 | 'abc' | s1 | s2 | +| --- | --- | ---| ---| null | 'abc' | ---| null | +| 1 |root.sg.d0| 20 | 2.5| null | 'abc' | 2.5| null | +| 2 |root.sg.d0| 23 | 3.1| null | 'abc' | 3.1| null | +| ... | ... | ...| ...| null | 'abc' | ...| null | +| 1 |root.sg.d1| 12 |null| null | 'abc' |null| null | +| 2 |root.sg.d1| 19 |null| null | 'abc' |null| null | +| ... | ... | ...| ...| null | 'abc' | ...| null | + +Note that the cells of measurement 's0' and device 'root.sg.d1' are all null. + +6. The duplicated devices in the prefix paths are neglected. +For example, "select s0,s1 from root.sg.d0,root.sg.d0,root.sg.d1 align by device" is equal to "select s0,s1 from root.sg.d0,root.sg.d1 align by device". +For example. "select s0,s1 from root.sg.*,root.sg.d0 align by device" is equal to "select s0,s1 from root.sg.* align by device". + +7. The duplicated measurements in the suffix paths are not neglected. +For example, "select s0,s0,s1 from root.sg.* align by device" is not equal to "select s0,s1 from root.sg.* align by device". + +8. Both time predicates and value predicates are allowed in Where Clause. The paths of the value predicates can be the leaf node or full path started with ROOT. And wildcard is not allowed here. For example: +- select * from root.sg.* where time = 1 align by device +- select * from root.sg.* where s0 < 100 align by device +- select * from root.sg.* where time < 20 AND s0 > 50 align by device +- select * from root.sg.d0 where root.sg.d0.s0 = 15 align by device + +9. More correct examples: + - select * from root.vehicle.* align by device + - select s0,s0,s1 from root.vehicle.* align by device + - select s0,s1 from root.vehicle.* limit 10 offset 1 align by device + - select * from root.vehicle.* slimit 10 soffset 2 align by device + - select * from root.vehicle.* where time > 10 align by device + - select * from root.vehicle.* where time < 10 AND s0 > 25 align by device + - select * from root.vehicle.* where root.vehicle.d0.s0>0 align by device + - select count(*) from root.vehicle align by device + - select sum(*) from root.vehicle.* GROUP BY (20ms,0,[2,50]) align by device + - select * from root.vehicle.* where time = 3 Fill(int32[previous, 5ms]) align by device +``` +* Disable Align Statement + +``` +Disable Align Clause: DISABLE ALIGN + +Rules: +1. Both uppercase and lowercase are ok. +Correct example: select * from root.sg1.* disable align +Correct example: select * from root.sg1.* DISABLE ALIGN + +2. Disable Align Clause can only be used at the end of a query statement. +Correct example: select * from root.sg1.* where time > 10 disable align +Wrong example: select * from root.sg1.* disable align where time > 10 + +3. Disable Align Clause cannot be used with Aggregation, Fill Statements, Group By or Group By Device Statements, but can with Limit Statements. 
+Correct example: select * from root.sg1.* limit 3 offset 2 disable align +Correct example: select * from root.sg1.* slimit 3 soffset 2 disable align +Wrong example: select count(s0),count(s1) from root.sg1.d1 disable align +Wrong example: select * from root.vehicle.* where root.vehicle.d0.s0>0 disable align +Wrong example: select * from root.vehicle.* align by device disable align + +4. The display principle of the result table is that only when the column (or row) has existing data will the column (or row) be shown, with nonexistent cells being empty. + +You could expect a table like: +| Time | root.sg.d0.s1 | Time | root.sg.d0.s2 | Time | root.sg.d1.s1 | +| --- | --- | --- | --- | --- | --- | +| 1 | 100 | 20 | 300 | 400 | 600 | +| 2 | 300 | 40 | 800 | 700 | 900 | +| 4 | 500 | | | 800 | 1000 | +| | | | | 900 | 8000 | + +5. More correct examples: + - select * from root.vehicle.* disable align + - select s0,s0,s1 from root.vehicle.* disable align + - select s0,s1 from root.vehicle.* limit 10 offset 1 disable align + - select * from root.vehicle.* slimit 10 soffset 2 disable align + - select * from root.vehicle.* where time > 10 disable align + +``` + +* Select Last Record Statement + +The LAST function returns the last time-value pair of the given timeseries. Currently filters are not supported in LAST queries. + +``` +SELECT LAST FROM +Select Clause : [COMMA ]* +FromClause : < PrefixPath > [COMMA < PrefixPath >]* +WhereClause : [(AND | OR) ]* +TimeExpr : TIME PrecedenceEqualOperator ( | ) + +Eg. SELECT LAST s1 FROM root.sg.d1 +Eg. SELECT LAST s1, s2 FROM root.sg.d1 +Eg. SELECT LAST s1 FROM root.sg.d1, root.sg.d2 +Eg. SELECT LAST s1 FROM root.sg.d1 where time > 100 +Eg. SELECT LAST s1, s2 FROM root.sg.d1 where time >= 500 + +Rules: +1. the statement needs to satisfy this constraint: + = + +2. SELECT LAST only supports time filter that contains '>' or '>=' currently. + +3. The result set of last query will always be displayed in a fixed three column table format. +For example, "select last s1, s2 from root.sg.d1, root.sg.d2", the query result would be: + +| Time | Path | Value | dataType | +| --- | ------------- |------ | -------- | +| 5 | root.sg.d1.s1 | 100 | INT32 | +| 2 | root.sg.d1.s2 | 400 | INT32 | +| 4 | root.sg.d2.s1 | 250 | INT32 | +| 9 | root.sg.d2.s2 | 600 | INT32 | + +4. It is not supported to use "diable align" in LAST query. + +``` + +* As Statement + +As statement assigns an alias to time seires queried in SELECT statement + +``` +You can use as statement in all queries, but some rules are restricted about wildcard. + +1. Raw data query +select s1 as speed, s2 as temperature from root.sg.d1 + +The result set will be like: +| Time | speed | temperature | +| ... | ... | .... | + +2. Aggregation query +select count(s1) as s1_num, max_value(s2) as s2_max from root.sg.d1 + +3. Down-frequence query +select count(s1) as s1_num from root.sg.d1 group by ([100,500), 80ms) + +4. Align by device query +select s1 as speed, s2 as temperature from root.sg.d1 align by device + +select count(s1) as s1_num, count(s2), count(s3) as s3_num from root.sg.d2 align by device + +5. Last Record query +select last s1 as speed, s2 from root.sg.d1 + +Rules: +1. In addition to Align by device query,each AS statement has to corresponding to one time series exactly. + +E.g. select s1 as temperature from root.sg.* + +At this time if `root.sg.*` includes more than one device,then an exception will be thrown。 + +2. 
In align by device query,the prefix path that each AS statement corresponding to can includes multiple device, but the suffix path can only be single sensor. + +E.g. select s1 as temperature from root.sg.* + +In this situation, it will be show correctly even if multiple devices are selected. + +E.g. select * as temperature from root.sg.d1 + +In this situation, it will throws an exception if * corresponds to multiple sensors. + +``` + +* Regexp Statement + +Regexp Statement only supports regular expressions with Java standard library style on timeseries which is TEXT data type +``` +SELECT FROM WHERE +Select Clause : [COMMA ]* +FromClause : < PrefixPath > [COMMA < PrefixPath >]* +WhereClause : andExpression (OPERATOR_OR andExpression)* +andExpression : predicate (OPERATOR_AND predicate)* +predicate : (suffixPath | fullPath) REGEXP regularExpression +regularExpression: Java standard regularexpression, like '^[a-z][0-9]$', [details](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) + +Eg. select s1 from root.sg.d1 where s1 regexp '^[0-9]*$' +Eg. select s1, s2 FROM root.sg.d1 where s1 regexp '^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$' and s2 regexp '^\d{15}|\d{18}$' +Eg. select * from root.sg.d1 where s1 regexp '^[a-zA-Z]\w{5,17}$' +Eg. select * from root.sg.d1 where s1 regexp '^\d{4}-\d{1,2}-\d{1,2}' and time > 100 +``` + +* Like Statement + +The usage of LIKE Statement similar with mysql, but only support timeseries which is TEXT data type +``` +SELECT FROM WHERE +Select Clause : [COMMA ]* +FromClause : < PrefixPath > [COMMA < PrefixPath >]* +WhereClause : andExpression (OPERATOR_OR andExpression)* +andExpression : predicate (OPERATOR_AND predicate)* +predicate : (suffixPath | fullPath) LIKE likeExpression +likeExpression : string that may contains "%" or "_", while "%value" means a string that ends with the value, "value%" means a string starts with the value, "%value%" means string that contains values, and "_" represents any character. + +Eg. select s1 from root.sg.d1 where s1 like 'abc' +Eg. select s1, s2 from root.sg.d1 where s1 like 'a%bc' +Eg. select * from root.sg.d1 where s1 like 'abc_' +Eg. 
select * from root.sg.d1 where s1 like 'abc\%' and time > 100 +In this situation, '\%' means '%' will be escaped +The result set will be like: +| Time | Path | Value | +| --- | ------------ | ----- | +| 200 | root.sg.d1.s1| abc% | +``` + +## Database Management Statement + +* Create User + +``` +CREATE USER ; +userName:=identifier +password:=string +Eg: IoTDB > CREATE USER thulab 'pwd'; +``` + +* Delete User + +``` +DROP USER ; +userName:=identifier +Eg: IoTDB > DROP USER xiaoming; +``` + +* Create Role + +``` +CREATE ROLE ; +roleName:=identifie +Eg: IoTDB > CREATE ROLE admin; +``` + +* Delete Role + +``` +DROP ROLE ; +roleName:=identifier +Eg: IoTDB > DROP ROLE admin; +``` + +* Grant User Privileges + +``` +GRANT USER PRIVILEGES ON ; +userName:=identifier +nodeName:=identifier (DOT identifier)* +privileges:= string (COMMA string)* +Eg: IoTDB > GRANT USER tempuser PRIVILEGES DELETE_TIMESERIES on root.ln; +``` + +* Grant Role Privileges + +``` +GRANT ROLE PRIVILEGES ON ; +privileges:= string (COMMA string)* +roleName:=identifier +nodeName:=identifier (DOT identifier)* +Eg: IoTDB > GRANT ROLE temprole PRIVILEGES DELETE_TIMESERIES ON root.ln; +``` + +* Grant User Role + +``` +GRANT TO ; +roleName:=identifier +userName:=identifier +Eg: IoTDB > GRANT temprole TO tempuser; +``` + +* Revoke User Privileges + +``` +REVOKE USER PRIVILEGES ON ; +privileges:= string (COMMA string)* +userName:=identifier +nodeName:=identifier (DOT identifier)* +Eg: IoTDB > REVOKE USER tempuser PRIVILEGES DELETE_TIMESERIES on root.ln; +``` + +* Revoke Role Privileges + +``` +REVOKE ROLE PRIVILEGES ON ; +privileges:= string (COMMA string)* +roleName:= identifier +nodeName:=identifier (DOT identifier)* +Eg: IoTDB > REVOKE ROLE temprole PRIVILEGES DELETE_TIMESERIES ON root.ln; +``` + +* Revoke Role From User + +``` +REVOKE FROM ; +roleName:=identifier +userName:=identifier +Eg: IoTDB > REVOKE temprole FROM tempuser; +``` + +* List Users + +``` +LIST USER +Eg: IoTDB > LIST USER +``` + +* List Roles + +``` +LIST ROLE +Eg: IoTDB > LIST ROLE +``` + +* List Privileges + +``` +LIST PRIVILEGES USER ON ; +username:=identifier +path=‘root’ (DOT identifier)* +Eg: IoTDB > LIST PRIVILEGES USER sgcc_wirte_user ON root.sgcc; +``` + +* List Privileges of Roles + +``` +LIST ROLE PRIVILEGES +roleName:=identifier +Eg: IoTDB > LIST ROLE PRIVILEGES actor; +``` + +* List Privileges of Roles(On Specific Path) + +``` +LIST PRIVILEGES ROLE ON ; +roleName:=identifier +path=‘root’ (DOT identifier)* +Eg: IoTDB > LIST PRIVILEGES ROLE wirte_role ON root.sgcc; +``` + +* List Privileges of Users + +``` +LIST USER PRIVILEGES ; +username:=identifier +Eg: IoTDB > LIST USER PRIVILEGES tempuser; +``` + +* List Roles of Users + +``` +LIST ALL ROLE OF USER ; +username:=identifier +Eg: IoTDB > LIST ALL ROLE OF USER tempuser; +``` + +* List Users of Role + +``` +LIST ALL USER OF ROLE ; +roleName:=identifier +Eg: IoTDB > LIST ALL USER OF ROLE roleuser; +``` + +* Alter Password + +``` +ALTER USER SET PASSWORD ; +roleName:=identifier +password:=identifier +Eg: IoTDB > ALTER USER tempuser SET PASSWORD 'newpwd'; +``` + +## Functions + +* COUNT + +The COUNT function returns the value number of timeseries(one or more) non-null values selected by the SELECT statement. The result is a signed 64-bit integer. If there are no matching rows, COUNT () returns 0. + +``` +SELECT COUNT(Path) (COMMA COUNT(Path))* FROM [WHERE ]? +Eg. 
SELECT COUNT(status), COUNT(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* FIRST_VALUE(Rename from `FIRST` at `V0.10.0`) + +The FIRST_VALUE function returns the first point value of the choosen timeseries(one or more). + +``` +SELECT FIRST_VALUE (Path) (COMMA FIRST_VALUE (Path))* FROM [WHERE ]? +Eg. SELECT FIRST_VALUE (status), FIRST_VALUE (temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* LAST_VALUE + +The LAST_VALUE function returns the last point value of the choosen timeseries(one or more). + +``` +SELECT LAST_VALUE (Path) (COMMA LAST_VALUE (Path))* FROM [WHERE ]? +Eg. SELECT LAST_VALUE (status), LAST_VALUE (temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* MAX_TIME + +The MAX_TIME function returns the maximum timestamp of the choosen timeseries(one or more). The result is a signed 64-bit integer, greater than 0. + +``` +SELECT MAX_TIME (Path) (COMMA MAX_TIME (Path))* FROM [WHERE ]? +Eg. SELECT MAX_TIME(status), MAX_TIME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* MAX_VALUE + +The MAX_VALUE function returns the maximum value(lexicographically ordered) of the choosen timeseries (one or more). + +``` +SELECT MAX_VALUE (Path) (COMMA MAX_VALUE (Path))* FROM [WHERE ]? +Eg. SELECT MAX_VALUE(status), MAX_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* EXTREME + +The EXTREME function returns the extreme value(lexicographically ordered) of the choosen timeseries (one or more). +extreme value: The value that has the maximum absolute value. +If the maximum absolute value of a positive value and a negative value is equal, return the positive value. +``` +SELECT EXTREME (Path) (COMMA EXT (Path))* FROM [WHERE ]? +Eg. SELECT EXTREME(status), EXTREME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* AVG(Rename from `MEAN` at `V0.9.0`) + +The AVG function returns the arithmetic mean value of the choosen timeseries over a specified period of time. The timeseries must be int32, int64, float, double type, and the other types are not to be calculated. The result is a double type number. + +``` +SELECT AVG (Path) (COMMA AVG (Path))* FROM [WHERE ]? +Eg. SELECT AVG (temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* MIN_TIME + +The MIN_TIME function returns the minimum timestamp of the choosen timeseries(one or more). The result is a signed 64-bit integer, greater than 0. + +``` +SELECT MIN_TIME (Path) (COMMA MIN_TIME (Path))*FROM [WHERE ]? +Eg. SELECT MIN_TIME(status), MIN_TIME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* MIN_VALUE + +The MIN_VALUE function returns the minimum value(lexicographically ordered) of the choosen timeseries (one or more). + +``` +SELECT MIN_VALUE (Path) (COMMA MIN_VALUE (Path))* FROM [WHERE ]? +Eg. 
SELECT MIN_VALUE(status),MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* NOW + +The NOW function returns the current timestamp. This function can be used in the data operation statement to represent time. The result is a signed 64-bit integer, greater than 0. + +``` +NOW() +Eg. INSERT INTO root.ln.wf01.wt01(timestamp,status) VALUES(NOW(), false) +Eg. DELETE FROM root.ln.wf01.wt01.status, root.ln.wf01.wt01.temperature WHERE time < NOW() +Eg. SELECT * FROM root.** WHERE time < NOW() +Eg. SELECT COUNT(temperature) FROM root.ln.wf01.wt01 WHERE time < NOW() +``` +* SUM + +The SUM function returns the sum of the choosen timeseries (one or more) over a specified period of time. The timeseries must be int32, int64, float, double type, and the other types are not to be calculated. The result is a double type number. + +``` +SELECT SUM(Path) (COMMA SUM(Path))* FROM [WHERE ]? +Eg. SELECT SUM(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +## TTL + +IoTDB supports device-level TTL settings, which means it is able to delete old data automatically and periodically. The benefit of using TTL is that hopefully you can control the total disk space usage and prevent the machine from running out of disks. Moreover, the query performance may downgrade as the total number of files goes up and the memory usage also increases as there are more files. Timely removing such files helps to keep at a high query performance level and reduce memory usage. + +The default unit of TTL is milliseconds. If the time precision in the configuration file changes to another, the TTL is still set to milliseconds. + +When setting TTL, the system will look for all devices included in the set path and set TTL for these devices. The system will delete expired data at the device granularity. +After the device data expires, it will not be queryable. The data in the disk file cannot be guaranteed to be deleted immediately, but it can be guaranteed to be deleted eventually. +However, due to operational costs, the expired data will not be physically deleted right after expiring. The physical deletion is delayed until compaction. +Therefore, before the data is physically deleted, if the TTL is reduced or lifted, it may cause data that was previously invisible due to TTL to reappear. +The system can only set up to 1000 TTL rules, and when this limit is reached, some TTL rules need to be deleted before new rules can be set. + +### TTL Path Rule +The path can only be prefix paths (i.e., the path cannot contain \* , except \*\* in the last level). +This path will match devices and also allows users to specify paths without asterisks as specific databases or devices. +When the path does not contain asterisks, the system will check if it matches a database; if it matches a database, both the path and path.\*\* will be set at the same time. Note: Device TTL settings do not verify the existence of metadata, i.e., it is allowed to set TTL for a non-existent device. +``` +qualified paths: +root.** +root.db.** +root.db.group1.** +root.db +root.db.group1.d1 + +unqualified paths: +root.*.db +root.**.db.* +root.db.* +``` +### TTL Applicable Rules +When a device is subject to multiple TTL rules, the more precise and longer rules are prioritized. 
For example, for the device "root.bj.hd.dist001.turbine001", the rule "root.bj.hd.dist001.turbine001" takes precedence over "root.bj.hd.dist001.\*\*", and the rule "root.bj.hd.dist001.\*\*" takes precedence over "root.bj.hd.**". +### Set TTL +The set ttl operation can be understood as setting a TTL rule, for example, setting ttl to root.sg.group1.** is equivalent to mounting ttl for all devices that can match this path pattern. +The unset ttl operation indicates unmounting TTL for the corresponding path pattern; if there is no corresponding TTL, nothing will be done. +If you want to set TTL to be infinitely large, you can use the INF keyword. +The SQL Statement for setting TTL is as follow: +``` +set ttl to pathPattern 360000; +``` +Set the Time to Live (TTL) to a pathPattern of 360,000 milliseconds; the pathPattern should not contain a wildcard (\*) in the middle and must end with a double asterisk (\*\*). The pathPattern is used to match corresponding devices. +To maintain compatibility with older SQL syntax, if the user-provided pathPattern matches a database (db), the path pattern is automatically expanded to include all sub-paths denoted by path.\*\*. +For instance, writing "set ttl to root.sg 360000" will automatically be transformed into "set ttl to root.sg.\*\* 360000", which sets the TTL for all devices under root.sg. However, if the specified pathPattern does not match a database, the aforementioned logic will not apply. For example, writing "set ttl to root.sg.group 360000" will not be expanded to "root.sg.group.\*\*" since root.sg.group does not match a database. +It is also permissible to specify a particular device without a wildcard (*). +### Unset TTL + +To unset TTL, we can use follwing SQL statement: + +``` +IoTDB> unset ttl from root.ln +``` + +After unset TTL, all data will be accepted in `root.ln`. +``` +IoTDB> unset ttl from root.sgcc.** +``` + +Unset the TTL in the `root.sgcc` path. + +New syntax +``` +IoTDB> unset ttl from root.** +``` + +Old syntax +``` +IoTDB> unset ttl to root.** +``` +There is no functional difference between the old and new syntax, and they are compatible with each other. +The new syntax is just more conventional in terms of wording. + +Unset the TTL setting for all path pattern. + +### Show TTL + +To Show TTL, we can use following SQL statement: + +show all ttl + +``` +IoTDB> SHOW ALL TTL ++--------------+--------+ +| path| TTL| +| root.**|55555555| +| root.sg2.a.**|44440000| ++--------------+--------+ +``` + +show ttl on pathPattern +``` +IoTDB> SHOW TTL ON root.db.**; ++--------------+--------+ +| path| TTL| +| root.db.**|55555555| +| root.db.a.**|44440000| ++--------------+--------+ +``` + +The SHOW ALL TTL example gives the TTL for all path patterns. +The SHOW TTL ON pathPattern shows the TTL for the path pattern specified. + +Display devices' ttl +``` +IoTDB> show devices ++---------------+---------+---------+ +| Device|IsAligned| TTL| ++---------------+---------+---------+ +|root.sg.device1| false| 36000000| +|root.sg.device2| true| INF| ++---------------+---------+---------+ +``` +All devices will definitely have a TTL, meaning it cannot be null. INF represents infinity. + +* Delete Partition (experimental) +``` +DELETE PARTITION StorageGroupName INT(COMMA INT)* +Eg DELETE PARTITION root.sg1 0,1,2 +This example will delete the first 3 time partitions of database root.sg1. +``` +The partitionId can be found in data folders or converted using `timestamp / partitionInterval`. 
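+
+A quick worked example of that conversion (assuming a one-week partition interval of 604800000 ms, which matches the `TimePartitionInterval` shown in the Quick Start's `SHOW DATABASES` output, and reusing a timestamp from the delete examples above):
+
+```
+partitionId = 1509466140000 / 604800000 = 2495 (integer division)
+Eg DELETE PARTITION root.sg1 2495
+```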
+ +## Kill query + +- Show the list of queries in progress + +``` +SHOW QUERY PROCESSLIST +``` + +- Kill query + +``` +KILL QUERY INT? +E.g. KILL QUERY +E.g. KILL QUERY 2 +``` + +## SET SYSTEM TO READONLY / WRITABLE + +Set IoTDB system to read-only or writable mode. + +``` +IoTDB> SET SYSTEM TO READONLY +IoTDB> SET SYSTEM TO WRITABLE +``` + +## Identifiers + +``` +QUOTE := '\''; +DOT := '.'; +COLON : ':' ; +COMMA := ',' ; +SEMICOLON := ';' ; +LPAREN := '(' ; +RPAREN := ')' ; +LBRACKET := '['; +RBRACKET := ']'; +EQUAL := '=' | '=='; +NOTEQUAL := '<>' | '!='; +LESSTHANOREQUALTO := '<='; +LESSTHAN := '<'; +GREATERTHANOREQUALTO := '>='; +GREATERTHAN := '>'; +DIVIDE := '/'; +PLUS := '+'; +MINUS := '-'; +STAR := '*'; +Letter := 'a'..'z' | 'A'..'Z'; +HexDigit := 'a'..'f' | 'A'..'F'; +Digit := '0'..'9'; +Boolean := TRUE | FALSE | 0 | 1 (case insensitive) + +``` + +``` +StringLiteral := ( '\'' ( ~('\'') )* '\''; +eg. 'abc' +``` + +``` +Integer := ('-' | '+')? Digit+; +eg. 123 +eg. -222 +``` + +``` +Float := ('-' | '+')? Digit+ DOT Digit+ (('e' | 'E') ('-' | '+')? Digit+)?; +eg. 3.1415 +eg. 1.2E10 +eg. -1.33 +``` + +``` +Identifier := (Letter | '_') (Letter | Digit | '_' | MINUS)*; +eg. a123 +eg. _abc123 + +``` + +## Literals + + +``` +PointValue : Integer | Float | StringLiteral | Boolean +``` +``` +TimeValue : Integer | DateTime | ISO8601 | NOW() +Note: Integer means timestamp type. + +DateTime : +eg. 2016-11-16T16:22:33+08:00 +eg. 2016-11-16 16:22:33+08:00 +eg. 2016-11-16T16:22:33.000+08:00 +eg. 2016-11-16 16:22:33.000+08:00 +Note: DateTime Type can support several types, see Chapter 3 Datetime section for details. +``` +``` +PrecedenceEqualOperator : EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO | GREATERTHAN +``` +``` +Timeseries : ROOT [DOT \]* DOT \ +LayerName : Identifier +SensorName : Identifier +eg. root.ln.wf01.wt01.status +eg. root.sgcc.wf03.wt01.temperature +Note: Timeseries must be start with `root`(case insensitive) and end with sensor name. +``` + +``` +PrefixPath : ROOT (DOT \)* +LayerName : Identifier | STAR +eg. root.sgcc +eg. root.* +``` +``` +Path: (ROOT | ) (DOT )* +LayerName: Identifier | STAR +eg. root.ln.wf01.wt01.status +eg. root.*.wf01.wt01.status +eg. root.ln.wf01.wt01.* +eg. *.wt01.* +eg. * +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Schema-Template.md b/src/UserGuide/V2.0.1/Tree/stage/Schema-Template.md new file mode 100644 index 00000000..a5fe6c47 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Schema-Template.md @@ -0,0 +1,67 @@ + + +# Schema Template + +## Problem scenario + +When faced with a large number of entities of the same type and the measurements of these entities are the same, registering time series for each measurent will result in the following problems. On the one hand, the metadata of time series will occupy a lot of memory resources; on the other hand, the maintenance of a large number of time series will be very complex. + +In order to enable different entities of the same type to share metadata, reduce the memory usage of metadata, and simplify the management of numerous entities and measurements, IoTDB introduces the schema template function. + +The following picture illustrates the data model of petrol vehicle scenario. The velocity, fuel amount, acceleration, and angular velocity of each petrol vehicle spread over cities will be collected. Obviously, the measurements of single petrol vehicle are the same as those of another. 
+ +example without template + +## Concept + +Supported from v0.13 + +In the actual scenario, many entities collect the same measurements, that is, they have the same measurements name and type. A schema template can be declared to define the collectable measurements set. Schema template is hung on any node of the tree data pattern, which means that all entities under the node have the same measurements set. + +Currently you can only set one schema template on a specific path. If there's one schema template on one node, it will be forbidden to set any schema template on the ancestors or descendants of this node. An entity will use it's own schema template or ancestor's schema template. + +**Please notice that, we strongly recommend not setting templates on the nodes above the database to accommodate future updates and collaboration between modules.** + +In the following chapters of data definition language, data operation language and Java Native Interface, various operations related to schema template will be introduced one by one. + +After applying schema template, the following picture illustrates the new data model of petrol vehicle scenario. All petrol vehicles share the schemas defined in template. There are no redundancy storage of measurement schemas. + +example with template + +### Lifetime of Schema Template + +The term about lifetime of schema template may help you utilize it in a better way. Within this section, there are 6 key words specifying certain phase of schema template, namely CREATE, SET, ACTIVATE, DEACTIVATE, UNSET, and DROP. The figure below shows the process and related SQL examples for all these phases. When a user issues a statement mentioned above, there will be a check accordingly. The statement will be executed successfully if the check passed, refused otherwise. + +1. To CREATE a template, ensure that the template has a distinct name from all existed ones; +2. To SET a template on one node, ensure that all ancestors and descendants of the node has not been set any template yet; +3. To ACTIVATE a template on one node, ensure that the node or one of its ancestor had been set the template and no measurement child of the node entitled identical name as those inside the template; +4. To DEACTIVATE a template from one node, ensure that the node had been ACTIVATED before and note that timeseries instantiated from the template as well as its data points will be removed; +5. To UNSET a template on one node, ensure that the node had been SET the template previously and none of its descendants are being ACTIVATED of the template; +6. To DROP a template, ensure that the template is not SET to any nodes on the MTree now. + +It should be complemented that the distinction between SET and ACTIVATE is meant to serve an ubiquitous scenario where massive nodes with a common ancestor may need to apply the template. Under this circumstance, it is more feasible to SET the template on the common ancestor rather than all those descendant. For those who needs to apply the template, ACTIVATE is a more appropriate arrangement. + +example with template + +## Usage + +Java Native API, C++ Native API, and IoTDB-SQL have supported Schema Template usage. 
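As a rough end-to-end illustration of the lifecycle above, the sketch below walks a hypothetical template `t1` through all six phases; the template name and paths are invented, and the statements follow the forms shown in the data definition language chapters.

```
# CREATE: declare the shared measurement set once
create schema template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN)

# SET: hang the template on a common ancestor
set schema template t1 to root.sg1

# ACTIVATE: instantiate the measurements for one device under that ancestor
create timeseries of schema template on root.sg1.d1

# DEACTIVATE: remove the instantiated timeseries (and their data points) for that device
delete timeseries of schema template on root.sg1.d1

# UNSET: detach the template once no descendant still has it activated
unset schema template t1 from root.sg1

# DROP: delete the template definition once it is set on no node
drop schema template t1
```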
\ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Security-Management_apache.md b/src/UserGuide/V2.0.1/Tree/stage/Security-Management_apache.md new file mode 100644 index 00000000..e8e4e897 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Security-Management_apache.md @@ -0,0 +1,536 @@ + + +# Security Management + +## Administration Management + +IoTDB provides users with account privilege management operations, so as to ensure data security. + +We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../SQL-Manual/SQL-Manual.md). +At the same time, in the JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute privilege management statements in a single or batch mode. + +### Basic Concepts + +#### User + +The user is the legal user of the database. A user corresponds to a unique username and has a password as a means of authentication. Before using a database, a person must first provide a legitimate username and password to make himself/herself a user. + +#### Privilege + +The database provides a variety of operations, and not all users can perform all operations. If a user can perform an operation, the user is said to have the privilege to perform the operation. privileges are divided into data management privilege (such as adding, deleting and modifying data) and authority management privilege (such as creation and deletion of users and roles, granting and revoking of privileges, etc.). Data management privilege often needs a path to limit its effective range. It is flexible that using [path pattern](../Basic-Concept/Data-Model-and-Terminology.md) to manage privileges. + +#### Role + +A role is a set of privileges and has a unique role name as an identifier. A user usually corresponds to a real identity (such as a traffic dispatcher), while a real identity may correspond to multiple users. These users with the same real identity tend to have the same privileges. Roles are abstractions that can unify the management of such privileges. + +#### Default User + +There is a default user in IoTDB after the initial installation: root, and the default password is root. This user is an administrator user, who cannot be deleted and has all the privileges. Neither can new privileges be granted to the root user nor can privileges owned by the root user be deleted. + +### Privilege Management Operation Examples + +According to the [sample data](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt), the sample data of IoTDB might belong to different power generation groups such as ln, sgcc, etc. Different power generation groups do not want others to obtain their own database data, so we need to have data privilege isolated at the group layer. + +#### Create User + +We use `CREATE USER ` to create users. For example, we can use root user who has all privileges to create two users for ln and sgcc groups, named ln\_write\_user and sgcc\_write\_user, with both passwords being write\_pwd. It is recommended to wrap the username in backtick(`). 
The SQL statement is: + +``` +CREATE USER `ln_write_user` 'write_pwd' +CREATE USER `sgcc_write_user` 'write_pwd' +``` +Then use the following SQL statement to show the user: + +``` +LIST USER +``` +As can be seen from the result shown below, the two users have been created: + +``` +IoTDB> CREATE USER `ln_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> LIST USER ++---------------+ +| user| ++---------------+ +| ln_write_user| +| root| +|sgcc_write_user| ++---------------+ +Total line number = 3 +It costs 0.157s +``` + +#### Grant User Privilege + +At this point, although two users have been created, they do not have any privileges, so they can not operate on the database. For example, we use ln_write_user to write data in the database, the SQL statement is: + +``` +INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +``` +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. +``` + +Now, we use root user to grant the two users write privileges to the corresponding databases. + +We use `GRANT USER PRIVILEGES ON ` to grant user privileges(ps: grant create user does not need path). For example: + +``` +GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +GRANT USER `ln_write_user` PRIVILEGES CREATE_USER +``` +The execution result is as follows: + +``` +IoTDB> GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +Msg: The statement is executed successfully. +IoTDB> GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +Msg: The statement is executed successfully. +IoTDB> GRANT USER `ln_write_user` PRIVILEGES CREATE_USER +Msg: The statement is executed successfully. +``` + +Next, use ln_write_user to try to write data again. +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: The statement is executed successfully. +``` + +#### Revoker User Privilege + +After granting user privileges, we could use `REVOKE USER PRIVILEGES ON ` to revoke the granted user privileges(ps: revoke create user does not need path). For example, use root user to revoke the privilege of ln_write_user and sgcc_write_user: + +``` +REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER +``` + +The execution result is as follows: + +``` +REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +Msg: The statement is executed successfully. +REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +Msg: The statement is executed successfully. +REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER +Msg: The statement is executed successfully. +``` + +After revoking, ln_write_user has no permission to writing data to root.ln.** +``` +INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. 
+``` + +#### SQL Statements + +Here are all related SQL statements: + +* Create User + +``` +CREATE USER ; +Eg: IoTDB > CREATE USER `thulab` 'pwd'; +``` + +* Delete User + +``` +DROP USER ; +Eg: IoTDB > DROP USER `xiaoming`; +``` + +* Create Role + +``` +CREATE ROLE ; +Eg: IoTDB > CREATE ROLE `admin`; +``` + +* Delete Role + +``` +DROP ROLE ; +Eg: IoTDB > DROP ROLE `admin`; +``` + +* Grant User Privileges + +``` +GRANT USER PRIVILEGES ON ; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES on root.ln.**, root.sgcc.**; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES CREATE_ROLE; +``` + +- Grant User All Privileges + +``` +GRANT USER PRIVILEGES ALL; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES ALL; +``` + +* Grant Role Privileges + +``` +GRANT ROLE PRIVILEGES ON ; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES ON root.sgcc.**, root.ln.**; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES CREATE_ROLE; +``` + +- Grant Role All Privileges + +``` +GRANT ROLE PRIVILEGES ALL ON ; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES ALL; +``` + +* Grant User Role + +``` +GRANT TO ; +Eg: IoTDB > GRANT `temprole` TO tempuser; +``` + +* Revoke User Privileges + +``` +REVOKE USER PRIVILEGES ON ; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES DELETE_TIMESERIES on root.ln.**; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES CREATE_ROLE; +``` + +* Revoke User All Privileges + +``` +REVOKE USER PRIVILEGES ALL; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES ALL; +``` + +* Revoke Role Privileges + +``` +REVOKE ROLE PRIVILEGES ON ; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES DELETE_TIMESERIES ON root.ln.**; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES CREATE_ROLE; +``` + +* Revoke All Role Privileges + +``` +REVOKE ROLE PRIVILEGES ALL; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES ALL; +``` + +* Revoke Role From User + +``` +REVOKE FROM ; +Eg: IoTDB > REVOKE `temprole` FROM tempuser; +``` + +* List Users + +``` +LIST USER +Eg: IoTDB > LIST USER +``` + +* List User of Specific Role + +``` +LIST USER OF ROLE ; +Eg: IoTDB > LIST USER OF ROLE `roleuser`; +``` + +* List Roles + +``` +LIST ROLE +Eg: IoTDB > LIST ROLE +``` + +* List Roles of Specific User + +``` +LIST ROLE OF USER ; +Eg: IoTDB > LIST ROLE OF USER `tempuser`; +``` + +* List All Privileges of Users + +``` +LIST PRIVILEGES USER ; +Eg: IoTDB > LIST PRIVILEGES USER `tempuser`; +``` + +* List Related Privileges of Users(On Specific Paths) + +``` +LIST PRIVILEGES USER ON ; +Eg: IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.**, root.ln.wf01.**; ++--------+-----------------------------------+ +| role| privilege| ++--------+-----------------------------------+ +| | root.ln.** : ALTER_TIMESERIES| +|temprole|root.ln.wf01.** : CREATE_TIMESERIES| ++--------+-----------------------------------+ +Total line number = 2 +It costs 0.005s +IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.wf01.wt01.**; ++--------+-----------------------------------+ +| role| privilege| ++--------+-----------------------------------+ +| | root.ln.** : ALTER_TIMESERIES| +|temprole|root.ln.wf01.** : CREATE_TIMESERIES| ++--------+-----------------------------------+ +Total line number = 2 +It costs 0.005s +``` + +* List All Privileges of Roles + +``` +LIST PRIVILEGES ROLE +Eg: IoTDB > LIST PRIVILEGES ROLE `actor`; +``` + +* List Related Privileges of Roles(On Specific Paths) + +``` +LIST PRIVILEGES ROLE ON ; +Eg: IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.**, root.ln.wf01.wt01.**; ++-----------------------------------+ 
+| privilege| ++-----------------------------------+ +|root.ln.wf01.** : CREATE_TIMESERIES| ++-----------------------------------+ +Total line number = 1 +It costs 0.005s +IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.wf01.wt01.**; ++-----------------------------------+ +| privilege| ++-----------------------------------+ +|root.ln.wf01.** : CREATE_TIMESERIES| ++-----------------------------------+ +Total line number = 1 +It costs 0.005s +``` + +* Alter Password + +``` +ALTER USER SET PASSWORD ; +Eg: IoTDB > ALTER USER `tempuser` SET PASSWORD 'newpwd'; +``` + + +### Other Instructions + +#### The Relationship among Users, Privileges and Roles + +A Role is a set of privileges, and privileges and roles are both attributes of users. That is, a role can have several privileges and a user can have several roles and privileges (called the user's own privileges). + +At present, there is no conflicting privilege in IoTDB, so the real privileges of a user is the union of the user's own privileges and the privileges of the user's roles. That is to say, to determine whether a user can perform an operation, it depends on whether one of the user's own privileges or the privileges of the user's roles permits the operation. The user's own privileges and privileges of the user's roles may overlap, but it does not matter. + +It should be noted that if users have a privilege (corresponding to operation A) themselves and their roles contain the same privilege, then revoking the privilege from the users themselves alone can not prohibit the users from performing operation A, since it is necessary to revoke the privilege from the role, or revoke the role from the user. Similarly, revoking the privilege from the users's roles alone can not prohibit the users from performing operation A. + +At the same time, changes to roles are immediately reflected on all users who own the roles. For example, adding certain privileges to roles will immediately give all users who own the roles corresponding privileges, and deleting certain privileges will also deprive the corresponding users of the privileges (unless the users themselves have the privileges). + +#### List of Privileges Included in the System + +| privilege Name | Interpretation | Example | +|:--------------------------|:-------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| CREATE\_DATABASE | create database; set/unset database ttl; path dependent | Eg1: `CREATE DATABASE root.ln;`
Eg2: `set ttl to root.ln 3600000;`<br>
Eg3:`unset ttl to root.ln;` | +| DELETE\_DATABASE | delete databases; path dependent | Eg: `delete database root.ln;` | +| CREATE\_TIMESERIES | create timeseries; path dependent | Eg1: create timeseries
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: create aligned timeseries
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | +| INSERT\_TIMESERIES | insert data; path dependent | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | +| ALTER\_TIMESERIES | alter timeseries; path dependent | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | +| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](../User-Manual/Query-Data.md)(The query statements under this section all use this permission)
Eg8: CSV format data export<br>
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | +| DELETE\_TIMESERIES | delete data or timeseries; path dependent | Eg1: delete timeseries
`delete timeseries root.ln.wf01.wt01.status`
Eg2: delete data
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: use drop semantic
`drop timeseries root.ln.wf01.wt01.status` |
+| CREATE\_USER | create users; path independent | Eg: `create user thulab 'passwd';` |
+| DELETE\_USER | delete users; path independent | Eg: `drop user xiaoming;` |
+| MODIFY\_PASSWORD | modify passwords for all users; path independent; (Those who do not have this privilege can still change their own passwords.) | Eg: `alter user tempuser SET PASSWORD 'newpwd';` |
+| LIST\_USER | list all users; list all users of a specific role; list a user's related privileges on specific paths; path independent | Eg1: `list user;`<br>
Eg2: `list user of role 'write_role';`<br>
Eg3: `list privileges user admin;`
Eg4: `list privileges user 'admin' on root.sgcc.**;` |
+| GRANT\_USER\_PRIVILEGE | grant user privileges; path independent | Eg: `grant user tempuser privileges DELETE_TIMESERIES on root.ln.**;` |
+| REVOKE\_USER\_PRIVILEGE | revoke user privileges; path independent | Eg: `revoke user tempuser privileges DELETE_TIMESERIES on root.ln.**;` |
+| GRANT\_USER\_ROLE | grant user roles; path independent | Eg: `grant temprole to tempuser;` |
+| REVOKE\_USER\_ROLE | revoke user roles; path independent | Eg: `revoke temprole from tempuser;` |
+| CREATE\_ROLE | create roles; path independent | Eg: `create role admin;` |
+| DELETE\_ROLE | delete roles; path independent | Eg: `drop role admin;` |
+| LIST\_ROLE | list all roles; list all roles of a specific user; list a role's related privileges on specific paths; path independent | Eg1: `list role`<br>
Eg2: `list role of user 'actor';`
Eg3: `list privileges role write_role;`<br>
Eg4: `list privileges role wirte_role ON root.sgcc;` | +| GRANT\_ROLE\_PRIVILEGE | grant role privileges; path independent | Eg: `grant role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | +| REVOKE\_ROLE\_PRIVILEGE | revoke role privileges; path independent | Eg: `revoke role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | +| CREATE_FUNCTION | register UDFs; path independent | Eg: `create function example AS 'org.apache.iotdb.udf.UDTFExample';` | +| DROP_FUNCTION | deregister UDFs; path independent | Eg: `drop function example` | +| CREATE_TRIGGER | create triggers; path dependent | Eg1: `CREATE TRIGGER BEFORE INSERT ON AS `
Eg2: `CREATE TRIGGER AFTER INSERT ON AS ` | +| DROP_TRIGGER | drop triggers; path dependent | Eg: `drop trigger 'alert-listener-sg1d1s1'` | +| CREATE_CONTINUOUS_QUERY | create continuous queries; path independent | Eg: `CREATE CONTINUOUS QUERY cq1 RESAMPLE RANGE 40s BEGIN END` | +| DROP_CONTINUOUS_QUERY | drop continuous queries; path independent | Eg1: `DROP CONTINUOUS QUERY cq3`
Eg2: `DROP CQ cq3` | +| SHOW_CONTINUOUS_QUERIES | show continuous queries; path independent | Eg1: `SHOW CONTINUOUS QUERIES`
Eg2: `SHOW cqs` | +| UPDATE_TEMPLATE | create and drop schema template; path independent | Eg1: `create schema template t1(s1 int32)`
Eg2: `drop schema template t1` | +| READ_TEMPLATE | show schema templates and show nodes in schema template; path independent | Eg1: `show schema templates`
Eg2: `show nodes in template t1` | +| APPLY_TEMPLATE | set, unset and activate schema template; path dependent | Eg1: `set schema template t1 to root.sg.d`
Eg2: `unset schema template t1 from root.sg.d`
Eg3: `create timeseries of schema template on root.sg.d`
Eg4: `delete timeseries of schema template on root.sg.d` | +| READ_TEMPLATE_APPLICATION | show paths set and using schema template; path independent | Eg1: `show paths set schema template t1`
Eg2: `show paths using schema template t1` | + +Note that path dependent privileges can only be granted or revoked on root.**; + +Note that the following SQL statements need to be granted multiple permissions before they can be used: + +- Import data: Need to assign `READ_TIMESERIES`,`INSERT_TIMESERIES` two permissions.。 + +``` +Eg: IoTDB > ./import-csv.bat -h 127.0.0.1 -p 6667 -u renyuhua -pw root -f dump0.csv +``` + +- Query Write-back (SELECT INTO) +- - `READ_TIMESERIES` permission of source sequence in all `select` clauses is required +- `INSERT_TIMESERIES` permission of target sequence in all `into` clauses is required + +``` +Eg: IoTDB > select s1, s1 into t1, t2 from root.sg.d1 limit 5 offset 1000 +``` + +#### Username Restrictions + +IoTDB specifies that the character length of a username should not be less than 4, and the username cannot contain spaces. + +#### Password Restrictions + +IoTDB specifies that the character length of a password should have no less than 4 character length, and no spaces. The password is encrypted with MD5. + +#### Role Name Restrictions + +IoTDB specifies that the character length of a role name should have no less than 4 character length, and no spaces. + +#### Path pattern in Administration Management + +A path pattern's result set contains all the elements of its sub pattern's +result set. For example, `root.sg.d.*` is a sub pattern of +`root.sg.*.*`, while `root.sg.**` is not a sub pattern of +`root.sg.*.*`. When a user is granted privilege on a pattern, the pattern used in his DDL or DML must be a sub pattern of the privilege pattern, which guarantees that the user won't access the timeseries exceed his privilege scope. + +#### Permission cache + +In distributed related permission operations, when changing permissions other than creating users and roles, all the cache information of `dataNode` related to the user (role) will be cleared first. If any `dataNode` cache information is clear and fails, the permission change task will fail. + +#### Operations restricted by non root users + +At present, the following SQL statements supported by iotdb can only be operated by the `root` user, and no corresponding permission can be given to the new user. 
+ +##### TsFile Management + +- Load TsFiles + +``` +Eg: IoTDB > load '/Users/Desktop/data/1575028885956-101-0.tsfile' +``` + +- remove a tsfile + +``` +Eg: IoTDB > remove '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' +``` + +- unload a tsfile and move it to a target directory + +``` +Eg: IoTDB > unload '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' '/data/data/tmp' +``` + +##### Delete Time Partition (experimental) + +``` +Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 +``` + +##### Continuous Query,CQ + +``` +Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END +``` + +##### Maintenance Command + +- FLUSH + +``` +Eg: IoTDB > flush +``` + +- MERGE + +``` +Eg: IoTDB > MERGE +Eg: IoTDB > FULL MERGE +``` + +- CLEAR CACHE + +```sql +Eg: IoTDB > CLEAR CACHE +``` + +- START REPAIR DATA + +```sql +Eg: IoTDB > START REPAIR DATA +``` + +- STOP REPAIR DATA + +```sql +Eg: IoTDB > STOP REPAIR DATA +``` + +- SET SYSTEM TO READONLY / WRITABLE + +``` +Eg: IoTDB > SET SYSTEM TO READONLY / WRITABLE +``` + +- Query abort + +``` +Eg: IoTDB > KILL QUERY 1 +``` + +##### Watermark Tool + +- Watermark new users + +``` +Eg: IoTDB > grant watermark_embedding to Alice +``` + +- Watermark Detection + +``` +Eg: IoTDB > revoke watermark_embedding from Alice +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Security-Management_timecho.md b/src/UserGuide/V2.0.1/Tree/stage/Security-Management_timecho.md new file mode 100644 index 00000000..94b755e9 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Security-Management_timecho.md @@ -0,0 +1,544 @@ + + +# Security Management + +## White List + +TODO + +## Audit Log + +TODO + +## Administration Management + +IoTDB provides users with account privilege management operations, so as to ensure data security. + +We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../SQL-Manual/SQL-Manual.md). +At the same time, in the JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute privilege management statements in a single or batch mode. + +### Basic Concepts + +#### User + +The user is the legal user of the database. A user corresponds to a unique username and has a password as a means of authentication. Before using a database, a person must first provide a legitimate username and password to make himself/herself a user. + +#### Privilege + +The database provides a variety of operations, and not all users can perform all operations. If a user can perform an operation, the user is said to have the privilege to perform the operation. privileges are divided into data management privilege (such as adding, deleting and modifying data) and authority management privilege (such as creation and deletion of users and roles, granting and revoking of privileges, etc.). Data management privilege often needs a path to limit its effective range. It is flexible that using [path pattern](../Basic-Concept/Data-Model-and-Terminology.md) to manage privileges. + +#### Role + +A role is a set of privileges and has a unique role name as an identifier. A user usually corresponds to a real identity (such as a traffic dispatcher), while a real identity may correspond to multiple users. These users with the same real identity tend to have the same privileges. 
Roles are abstractions that can unify the management of such privileges. + +#### Default User + +There is a default user in IoTDB after the initial installation: root, and the default password is root. This user is an administrator user, who cannot be deleted and has all the privileges. Neither can new privileges be granted to the root user nor can privileges owned by the root user be deleted. + +### Privilege Management Operation Examples + +According to the [sample data](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt), the sample data of IoTDB might belong to different power generation groups such as ln, sgcc, etc. Different power generation groups do not want others to obtain their own database data, so we need to have data privilege isolated at the group layer. + +#### Create User + +We use `CREATE USER ` to create users. For example, we can use root user who has all privileges to create two users for ln and sgcc groups, named ln\_write\_user and sgcc\_write\_user, with both passwords being write\_pwd. It is recommended to wrap the username in backtick(`). The SQL statement is: + +``` +CREATE USER `ln_write_user` 'write_pwd' +CREATE USER `sgcc_write_user` 'write_pwd' +``` +Then use the following SQL statement to show the user: + +``` +LIST USER +``` +As can be seen from the result shown below, the two users have been created: + +``` +IoTDB> CREATE USER `ln_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> LIST USER ++---------------+ +| user| ++---------------+ +| ln_write_user| +| root| +|sgcc_write_user| ++---------------+ +Total line number = 3 +It costs 0.157s +``` + +#### Grant User Privilege + +At this point, although two users have been created, they do not have any privileges, so they can not operate on the database. For example, we use ln_write_user to write data in the database, the SQL statement is: + +``` +INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +``` +The SQL statement will not be executed and the corresponding error prompt is given as follows: + +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. +``` + +Now, we use root user to grant the two users write privileges to the corresponding databases. + +We use `GRANT USER PRIVILEGES ON ` to grant user privileges(ps: grant create user does not need path). For example: + +``` +GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +GRANT USER `ln_write_user` PRIVILEGES CREATE_USER +``` +The execution result is as follows: + +``` +IoTDB> GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +Msg: The statement is executed successfully. +IoTDB> GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +Msg: The statement is executed successfully. +IoTDB> GRANT USER `ln_write_user` PRIVILEGES CREATE_USER +Msg: The statement is executed successfully. +``` + +Next, use ln_write_user to try to write data again. +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: The statement is executed successfully. 
+``` + +#### Revoker User Privilege + +After granting user privileges, we could use `REVOKE USER PRIVILEGES ON ` to revoke the granted user privileges(ps: revoke create user does not need path). For example, use root user to revoke the privilege of ln_write_user and sgcc_write_user: + +``` +REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER +``` + +The execution result is as follows: + +``` +REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +Msg: The statement is executed successfully. +REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +Msg: The statement is executed successfully. +REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER +Msg: The statement is executed successfully. +``` + +After revoking, ln_write_user has no permission to writing data to root.ln.** +``` +INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. +``` + +#### SQL Statements + +Here are all related SQL statements: + +* Create User + +``` +CREATE USER ; +Eg: IoTDB > CREATE USER `thulab` 'pwd'; +``` + +* Delete User + +``` +DROP USER ; +Eg: IoTDB > DROP USER `xiaoming`; +``` + +* Create Role + +``` +CREATE ROLE ; +Eg: IoTDB > CREATE ROLE `admin`; +``` + +* Delete Role + +``` +DROP ROLE ; +Eg: IoTDB > DROP ROLE `admin`; +``` + +* Grant User Privileges + +``` +GRANT USER PRIVILEGES ON ; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES on root.ln.**, root.sgcc.**; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES CREATE_ROLE; +``` + +- Grant User All Privileges + +``` +GRANT USER PRIVILEGES ALL; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES ALL; +``` + +* Grant Role Privileges + +``` +GRANT ROLE PRIVILEGES ON ; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES ON root.sgcc.**, root.ln.**; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES CREATE_ROLE; +``` + +- Grant Role All Privileges + +``` +GRANT ROLE PRIVILEGES ALL ON ; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES ALL; +``` + +* Grant User Role + +``` +GRANT TO ; +Eg: IoTDB > GRANT `temprole` TO tempuser; +``` + +* Revoke User Privileges + +``` +REVOKE USER PRIVILEGES ON ; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES DELETE_TIMESERIES on root.ln.**; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES CREATE_ROLE; +``` + +* Revoke User All Privileges + +``` +REVOKE USER PRIVILEGES ALL; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES ALL; +``` + +* Revoke Role Privileges + +``` +REVOKE ROLE PRIVILEGES ON ; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES DELETE_TIMESERIES ON root.ln.**; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES CREATE_ROLE; +``` + +* Revoke All Role Privileges + +``` +REVOKE ROLE PRIVILEGES ALL; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES ALL; +``` + +* Revoke Role From User + +``` +REVOKE FROM ; +Eg: IoTDB > REVOKE `temprole` FROM tempuser; +``` + +* List Users + +``` +LIST USER +Eg: IoTDB > LIST USER +``` + +* List User of Specific Role + +``` +LIST USER OF ROLE ; +Eg: IoTDB > LIST USER OF ROLE `roleuser`; +``` + +* List Roles + +``` +LIST ROLE +Eg: IoTDB > LIST ROLE +``` + +* List Roles of Specific User + +``` +LIST ROLE OF USER ; +Eg: IoTDB > LIST ROLE OF USER `tempuser`; +``` + +* List All Privileges of Users + +``` +LIST PRIVILEGES USER ; +Eg: IoTDB > LIST 
PRIVILEGES USER `tempuser`; +``` + +* List Related Privileges of Users(On Specific Paths) + +``` +LIST PRIVILEGES USER ON ; +Eg: IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.**, root.ln.wf01.**; ++--------+-----------------------------------+ +| role| privilege| ++--------+-----------------------------------+ +| | root.ln.** : ALTER_TIMESERIES| +|temprole|root.ln.wf01.** : CREATE_TIMESERIES| ++--------+-----------------------------------+ +Total line number = 2 +It costs 0.005s +IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.wf01.wt01.**; ++--------+-----------------------------------+ +| role| privilege| ++--------+-----------------------------------+ +| | root.ln.** : ALTER_TIMESERIES| +|temprole|root.ln.wf01.** : CREATE_TIMESERIES| ++--------+-----------------------------------+ +Total line number = 2 +It costs 0.005s +``` + +* List All Privileges of Roles + +``` +LIST PRIVILEGES ROLE +Eg: IoTDB > LIST PRIVILEGES ROLE `actor`; +``` + +* List Related Privileges of Roles(On Specific Paths) + +``` +LIST PRIVILEGES ROLE ON ; +Eg: IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.**, root.ln.wf01.wt01.**; ++-----------------------------------+ +| privilege| ++-----------------------------------+ +|root.ln.wf01.** : CREATE_TIMESERIES| ++-----------------------------------+ +Total line number = 1 +It costs 0.005s +IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.wf01.wt01.**; ++-----------------------------------+ +| privilege| ++-----------------------------------+ +|root.ln.wf01.** : CREATE_TIMESERIES| ++-----------------------------------+ +Total line number = 1 +It costs 0.005s +``` + +* Alter Password + +``` +ALTER USER SET PASSWORD ; +Eg: IoTDB > ALTER USER `tempuser` SET PASSWORD 'newpwd'; +``` + + +### Other Instructions + +#### The Relationship among Users, Privileges and Roles + +A Role is a set of privileges, and privileges and roles are both attributes of users. That is, a role can have several privileges and a user can have several roles and privileges (called the user's own privileges). + +At present, there is no conflicting privilege in IoTDB, so the real privileges of a user is the union of the user's own privileges and the privileges of the user's roles. That is to say, to determine whether a user can perform an operation, it depends on whether one of the user's own privileges or the privileges of the user's roles permits the operation. The user's own privileges and privileges of the user's roles may overlap, but it does not matter. + +It should be noted that if users have a privilege (corresponding to operation A) themselves and their roles contain the same privilege, then revoking the privilege from the users themselves alone can not prohibit the users from performing operation A, since it is necessary to revoke the privilege from the role, or revoke the role from the user. Similarly, revoking the privilege from the users's roles alone can not prohibit the users from performing operation A. + +At the same time, changes to roles are immediately reflected on all users who own the roles. For example, adding certain privileges to roles will immediately give all users who own the roles corresponding privileges, and deleting certain privileges will also deprive the corresponding users of the privileges (unless the users themselves have the privileges). 
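To make the union rule concrete, the hypothetical sequence below (the user, role, and path names are invented) shows why revoking only a user's own copy of a privilege is not enough while a role still carries it.

```
CREATE USER `demo_user` 'demo_pwd'
CREATE ROLE `demo_role`
GRANT USER `demo_user` PRIVILEGES INSERT_TIMESERIES ON root.demo.**
GRANT ROLE `demo_role` PRIVILEGES INSERT_TIMESERIES ON root.demo.**
GRANT `demo_role` TO demo_user

# demo_user now holds INSERT_TIMESERIES on root.demo.** both directly and via demo_role
REVOKE USER `demo_user` PRIVILEGES INSERT_TIMESERIES ON root.demo.**
# Inserts under root.demo.** still succeed, because the role keeps the privilege

REVOKE ROLE `demo_role` PRIVILEGES INSERT_TIMESERIES ON root.demo.**
# (or: REVOKE `demo_role` FROM demo_user); only now is the operation rejected for demo_user
```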
+ +#### List of Privileges Included in the System + +| privilege Name | Interpretation | Example | +|:--------------------------|:-------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| CREATE\_DATABASE | create database; set/unset database ttl; path dependent | Eg1: `CREATE DATABASE root.ln;`
Eg2: `set ttl to root.ln 3600000;`<br>
Eg3:`unset ttl to root.ln;` | +| DELETE\_DATABASE | delete databases; path dependent | Eg: `delete database root.ln;` | +| CREATE\_TIMESERIES | create timeseries; path dependent | Eg1: create timeseries
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: create aligned timeseries
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | +| INSERT\_TIMESERIES | insert data; path dependent | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | +| ALTER\_TIMESERIES | alter timeseries; path dependent | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | +| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](../User-Manual/Query-Data.md)(The query statements under this section all use this permission)
Eg8: CSV format data export<br>
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | +| DELETE\_TIMESERIES | delete data or timeseries; path dependent | Eg1: delete timeseries
`delete timeseries root.ln.wf01.wt01.status`
Eg2: delete data
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: use drop semantic
`drop timeseries root.ln.wf01.wt01.status` |
+| CREATE\_USER | create users; path independent | Eg: `create user thulab 'passwd';` |
+| DELETE\_USER | delete users; path independent | Eg: `drop user xiaoming;` |
+| MODIFY\_PASSWORD | modify passwords for all users; path independent; (Those who do not have this privilege can still change their own passwords.) | Eg: `alter user tempuser SET PASSWORD 'newpwd';` |
+| LIST\_USER | list all users; list all users of a specific role; list a user's related privileges on specific paths; path independent | Eg1: `list user;`<br>
Eg2: `list user of role 'write_role';`<br>
Eg3: `list privileges user admin;`
Eg4: `list privileges user 'admin' on root.sgcc.**;` |
+| GRANT\_USER\_PRIVILEGE | grant user privileges; path independent | Eg: `grant user tempuser privileges DELETE_TIMESERIES on root.ln.**;` |
+| REVOKE\_USER\_PRIVILEGE | revoke user privileges; path independent | Eg: `revoke user tempuser privileges DELETE_TIMESERIES on root.ln.**;` |
+| GRANT\_USER\_ROLE | grant user roles; path independent | Eg: `grant temprole to tempuser;` |
+| REVOKE\_USER\_ROLE | revoke user roles; path independent | Eg: `revoke temprole from tempuser;` |
+| CREATE\_ROLE | create roles; path independent | Eg: `create role admin;` |
+| DELETE\_ROLE | delete roles; path independent | Eg: `drop role admin;` |
+| LIST\_ROLE | list all roles; list all roles of a specific user; list a role's related privileges on specific paths; path independent | Eg1: `list role`<br>
Eg2: `list role of user 'actor';`
Eg3: `list privileges role write_role;`<br>
Eg4: `list privileges role wirte_role ON root.sgcc;` | +| GRANT\_ROLE\_PRIVILEGE | grant role privileges; path independent | Eg: `grant role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | +| REVOKE\_ROLE\_PRIVILEGE | revoke role privileges; path independent | Eg: `revoke role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | +| CREATE_FUNCTION | register UDFs; path independent | Eg: `create function example AS 'org.apache.iotdb.udf.UDTFExample';` | +| DROP_FUNCTION | deregister UDFs; path independent | Eg: `drop function example` | +| CREATE_TRIGGER | create triggers; path dependent | Eg1: `CREATE TRIGGER BEFORE INSERT ON AS `
Eg2: `CREATE TRIGGER AFTER INSERT ON AS ` | +| DROP_TRIGGER | drop triggers; path dependent | Eg: `drop trigger 'alert-listener-sg1d1s1'` | +| CREATE_CONTINUOUS_QUERY | create continuous queries; path independent | Eg: `CREATE CONTINUOUS QUERY cq1 RESAMPLE RANGE 40s BEGIN END` | +| DROP_CONTINUOUS_QUERY | drop continuous queries; path independent | Eg1: `DROP CONTINUOUS QUERY cq3`
Eg2: `DROP CQ cq3` | +| SHOW_CONTINUOUS_QUERIES | show continuous queries; path independent | Eg1: `SHOW CONTINUOUS QUERIES`
Eg2: `SHOW cqs` | +| UPDATE_TEMPLATE | create and drop schema template; path independent | Eg1: `create schema template t1(s1 int32)`
Eg2: `drop schema template t1` | +| READ_TEMPLATE | show schema templates and show nodes in schema template; path independent | Eg1: `show schema templates`
Eg2: `show nodes in template t1` | +| APPLY_TEMPLATE | set, unset and activate schema template; path dependent | Eg1: `set schema template t1 to root.sg.d`
Eg2: `unset schema template t1 from root.sg.d`
Eg3: `create timeseries of schema template on root.sg.d`
Eg4: `delete timeseries of schema template on root.sg.d` | +| READ_TEMPLATE_APPLICATION | show paths set and using schema template; path independent | Eg1: `show paths set schema template t1`
Eg2: `show paths using schema template t1` | + +Note that path dependent privileges can only be granted or revoked on root.**; + +Note that the following SQL statements need to be granted multiple permissions before they can be used: + +- Import data: Need to assign `READ_TIMESERIES`,`INSERT_TIMESERIES` two permissions.。 + +``` +Eg: IoTDB > ./import-csv.bat -h 127.0.0.1 -p 6667 -u renyuhua -pw root -f dump0.csv +``` + +- Query Write-back (SELECT INTO) +- - `READ_TIMESERIES` permission of source sequence in all `select` clauses is required +- `INSERT_TIMESERIES` permission of target sequence in all `into` clauses is required + +``` +Eg: IoTDB > select s1, s1 into t1, t2 from root.sg.d1 limit 5 offset 1000 +``` + +#### Username Restrictions + +IoTDB specifies that the character length of a username should not be less than 4, and the username cannot contain spaces. + +#### Password Restrictions + +IoTDB specifies that the character length of a password should have no less than 4 character length, and no spaces. The password is encrypted with MD5. + +#### Role Name Restrictions + +IoTDB specifies that the character length of a role name should have no less than 4 character length, and no spaces. + +#### Path pattern in Administration Management + +A path pattern's result set contains all the elements of its sub pattern's +result set. For example, `root.sg.d.*` is a sub pattern of +`root.sg.*.*`, while `root.sg.**` is not a sub pattern of +`root.sg.*.*`. When a user is granted privilege on a pattern, the pattern used in his DDL or DML must be a sub pattern of the privilege pattern, which guarantees that the user won't access the timeseries exceed his privilege scope. + +#### Permission cache + +In distributed related permission operations, when changing permissions other than creating users and roles, all the cache information of `dataNode` related to the user (role) will be cleared first. If any `dataNode` cache information is clear and fails, the permission change task will fail. + +#### Operations restricted by non root users + +At present, the following SQL statements supported by iotdb can only be operated by the `root` user, and no corresponding permission can be given to the new user. 
+ +##### TsFile Management + +- Load TsFiles + +``` +Eg: IoTDB > load '/Users/Desktop/data/1575028885956-101-0.tsfile' +``` + +- remove a tsfile + +``` +Eg: IoTDB > remove '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' +``` + +- unload a tsfile and move it to a target directory + +``` +Eg: IoTDB > unload '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' '/data/data/tmp' +``` + +##### Delete Time Partition (experimental) + +``` +Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 +``` + +##### Continuous Query,CQ + +``` +Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END +``` + +##### Maintenance Command + +- FLUSH + +``` +Eg: IoTDB > flush +``` + +- MERGE + +``` +Eg: IoTDB > MERGE +Eg: IoTDB > FULL MERGE +``` + +- CLEAR CACHE + +```sql +Eg: IoTDB > CLEAR CACHE +``` + +- START REPAIR DATA + +```sql +Eg: IoTDB > START REPAIR DATA +``` + +- STOP REPAIR DATA + +```sql +Eg: IoTDB > STOP REPAIR DATA +``` + +- SET SYSTEM TO READONLY / WRITABLE + +``` +Eg: IoTDB > SET SYSTEM TO READONLY / WRITABLE +``` + +- Query abort + +``` +Eg: IoTDB > KILL QUERY 1 +``` + +##### Watermark Tool + +- Watermark new users + +``` +Eg: IoTDB > grant watermark_embedding to Alice +``` + +- Watermark Detection + +``` +Eg: IoTDB > revoke watermark_embedding from Alice +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/ServerFileList.md b/src/UserGuide/V2.0.1/Tree/stage/ServerFileList.md new file mode 100644 index 00000000..d5ce0b03 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/ServerFileList.md @@ -0,0 +1,117 @@ + + +> Here are all files generated or used by IoTDB +> +> Continuously Updating... + +# Stand-alone + +## Configuration Files +> under conf directory +1. iotdb-system.properties +2. logback.xml +3. datanode-env.sh +4. jmx.access +5. jmx.password +6. iotdb-sync-client.properties + + only sync tool use it + +> under directory basedir/system/schema +1. system.properties + + record all immutable properties, will be checked when starting IoTDB to avoid system errors + +## State Related Files + +### MetaData Related Files +> under directory basedir/system/schema + +#### Meta +1. mlog.bin + + record the meta operation +2. mtree-1.snapshot + + snapshot of metadata +3. mtree-1.snapshot.tmp + + temp file, to avoid damaging the snapshot when updating it + +#### Tags&Attributes +1. tlog.txt + + store tags and attributes of each TimeSeries + + about 700 bytes for each TimeSeries + +### Data Related Files +> under directory basedir/data/ + +#### WAL +> under directory basedir/wal + +1. {StorageGroupName}-{TsFileName}/wal1 + + every database has several wal files, and every memtable has one associated wal file before it is flushed into a TsFile + +#### TsFile +> under directory data/sequence or unsequence/{DatabaseName}/{DataRegionId}/{TimePartitionId}/ + +1. {time}-{version}-{mergeCnt}.tsfile + + normal data file +2. {TsFileName}.tsfile.mod + + modification file + + record delete operation + +#### TsFileResource +1. {TsFileName}.tsfile.resource + + descriptor and statistic file of a TsFile +2. {TsFileName}.tsfile.resource.temp + + temp file + + avoid damaging the tsfile.resource when updating it +3. {TsFileName}.tsfile.resource.closing + + close flag file, to mark a tsfile closing so during restarts we can continue to close it or reopen it + +#### Version +> under directory basedir/system/databases/{DatabaseName}/{DataRegionId}/{TimePartitionId} or upgrade + +1. 
Version-{version} + + version file, record the max version in fileName of a database + +#### Upgrade +> under directory basedir/system/upgrade + +1. upgrade.txt + + record which files have been upgraded + +#### Merge +> under directory basedir/system/databases/{StorageGroup}/ + +1. merge.mods + + modification file generated during a merge +2. merge.log + + record the progress of a merge +3. tsfile.merge + + temporary merge result file, an involved sequence tsfile may have one during a merge + +#### Authority +> under directory basedir/system/users/ +> under directory basedir/system/roles/ + +#### CompressRatio +> under directory basedir/system/compression_ration +1. Ration-{compressionRatioSum}-{calTimes} + + record compression ratio of each tsfile + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Detailed-Grammar.md b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Detailed-Grammar.md new file mode 100644 index 00000000..1876e57c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Detailed-Grammar.md @@ -0,0 +1,28 @@ + + +# Detailed Definitions of Lexical and Grammar + +Please read the lexical and grammar description files in our code repository: + +Lexical file: `antlr/src/main/antlr4/org/apache/iotdb/db/qp/sql/IoTDBSqlLexer.g4` + +Grammer file: `antlr/src/main/antlr4/org/apache/iotdb/db/qp/sql/IoTDBSqlParser.g4` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Identifier.md b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Identifier.md new file mode 100644 index 00000000..f362dcaa --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Identifier.md @@ -0,0 +1,141 @@ + + +# Identifier + +## Usage scenarios + +Certain objects within IoTDB, including `TRIGGER`, `FUNCTION`(UDF), `CONTINUOUS QUERY`, `SCHEMA TEMPLATE`, `USER`, `ROLE`,`Pipe`,`PipeSink`,`alias` and other object names are known as identifiers. + +## Constraints + +Below are basic constraints of identifiers, specific identifiers may have other constraints, for example, `user` should consists of more than 4 characters. + +- Permitted characters in unquoted identifiers: + - [0-9 a-z A-Z _ ] (letters, digits and underscore) + - ['\u2E80'..'\u9FFF'] (UNICODE Chinese characters) +- Identifiers may begin with a digit, unquoted identifiers can not be a real number. +- Identifiers are case sensitive. +- Key words can be used as an identifier. + +**You need to quote the identifier with back quote(`) in the following cases:** + +- Identifier contains special characters. +- Identifier that is a real number. + +## How to use quotations marks in quoted identifiers + +`'` and `"` can be used directly in quoted identifiers. + +` may be written as `` in quoted identifiers. 
See the example below: + +```sql +# create template t1't"t +create schema template `t1't"t` +(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) + +# create template t1`t +create schema template `t1``t` +(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +## Examples + +Examples of case in which quoted identifier is used : + +- Trigger name should be quoted in cases described above : + + ```sql + # create trigger named alert.`listener-sg1d1s1 + CREATE TRIGGER `alert.``listener-sg1d1s1` + AFTER INSERT + ON root.sg1.d1.s1 + AS 'org.apache.iotdb.db.storageengine.trigger.example.AlertListener' + WITH ( + 'lo' = '0', + 'hi' = '100.0' + ) + ``` + +- UDF name should be quoted in cases described above : + + ```sql + # create a funciton named 111, 111 is a real number. + CREATE FUNCTION `111` AS 'org.apache.iotdb.udf.UDTFExample' + ``` + +- Template name should be quoted in cases described above : + + ```sql + # create a template named 111, 111 is a real number. + create schema template `111` + (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) + ``` + +- User and Role name should be quoted in cases described above, blank space is not allow in User and Role name whether quoted or not : + + ```sql + # create user special`user. + CREATE USER `special``user.` 'write_pwd' + + # create role 111 + CREATE ROLE `111` + ``` + +- Continuous query name should be quoted in cases described above : + + ```sql + # create continuous query test.cq + CREATE CONTINUOUS QUERY `test.cq` + BEGIN + SELECT max_value(temperature) + INTO temperature_max + FROM root.ln.*.* + GROUP BY time(10s) + END + ``` + +- Pipe、PipeSink should be quoted in cases described above : + + ```sql + # create PipeSink test.*1 + CREATE PIPESINK `test.*1` AS IoTDB ('ip' = '输入你的IP') + + # create Pipe test.*2 + CREATE PIPE `test.*2` TO `test.*1` FROM + (select ** from root WHERE time>=yyyy-mm-dd HH:MM:SS) WITH 'SyncDelOp' = 'true' + ``` + +- `AS` function provided by IoTDB can assign an alias to time series selected in query. Alias can be constant(including string) or identifier. + + ```sql + select s1 as temperature, s2 as speed from root.ln.wf01.wt01; + + # Header of result dataset + +-----------------------------+-----------|-----+ + | Time|temperature|speed| + +-----------------------------+-----------|-----+ + ``` + +- The key/value of an attribute can be String Literal and identifier, more details can be found at **key-value pair** part. + + + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/KeyValue-Pair.md b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/KeyValue-Pair.md new file mode 100644 index 00000000..3041b1b8 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/KeyValue-Pair.md @@ -0,0 +1,119 @@ + + +# Key-Value Pair + +**The key/value of an attribute can be constant(including string) and identifier.** + +Below are usage scenarios of key-value pair: + +- Attributes fields of trigger. 
See the attributes after the `WITH` clause in the example below: + +```sql +# represent the key-value pairs as strings +CREATE TRIGGER `alert-listener-sg1d1s1` +AFTER INSERT +ON root.sg1.d1.s1 +AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' +WITH ( + 'lo' = '0', + 'hi' = '100.0' +) + +# represent the key-value pairs as identifiers and constants +CREATE TRIGGER `alert-listener-sg1d1s1` +AFTER INSERT +ON root.sg1.d1.s1 +AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' +WITH ( + lo = 0, + hi = 100.0 +) +``` + +- Key-value pairs to represent tags/attributes in a timeseries: + +```sql +# create timeseries using string as key/value +CREATE timeseries root.turbine.d1.s1(temperature) +WITH datatype = FLOAT, encoding = RLE, compression = SNAPPY, 'max_point_number' = '5' +TAGS('tag1' = 'v1', 'tag2'= 'v2') ATTRIBUTES('attr1' = 'v1', 'attr2' = 'v2') + +# create timeseries using constant as key/value +CREATE timeseries root.turbine.d1.s1(temperature) +WITH datatype = FLOAT, encoding = RLE, compression = SNAPPY, max_point_number = 5 +TAGS(tag1 = v1, tag2 = v2) ATTRIBUTES(attr1 = v1, attr2 = v2) +``` + +```sql +# alter tags and attributes of timeseries +ALTER timeseries root.turbine.d1.s1 SET 'newTag1' = 'newV1', 'attr1' = 'newV1' + +ALTER timeseries root.turbine.d1.s1 SET newTag1 = newV1, attr1 = newV1 +``` + +```sql +# rename tag +ALTER timeseries root.turbine.d1.s1 RENAME 'tag1' TO 'newTag1' + +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` + +```sql +# upsert alias, tags, attributes +ALTER timeseries root.turbine.d1.s1 UPSERT +ALIAS='newAlias' TAGS('tag2' = 'newV2', 'tag3' = 'v3') ATTRIBUTES('attr3' ='v3', 'attr4'='v4') + +ALTER timeseries root.turbine.d1.s1 UPSERT +ALIAS = newAlias TAGS(tag2 = newV2, tag3 = v3) ATTRIBUTES(attr3 = v3, attr4 = v4) +``` + +```sql +# add new tags +ALTER timeseries root.turbine.d1.s1 ADD TAGS 'tag3' = 'v3', 'tag4' = 'v4' + +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3 = v3, tag4 = v4 +``` + +```sql +# add new attributes +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES 'attr3' = 'v3', 'attr4' = 'v4' + +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3 = v3, attr4 = v4 +``` + +```sql +# query for timeseries +SHOW timeseries root.ln.** WHERE 'unit' = 'c' + +SHOW timeseries root.ln.** WHERE unit = c +``` + +- Attribute fields of Pipe and PipeSink. + +```sql +# PipeSink example +CREATE PIPESINK my_iotdb AS IoTDB ('ip' = 'your IP address') + +# Pipe example +CREATE PIPE my_pipe TO my_iotdb FROM +(select ** from root WHERE time>=yyyy-mm-dd HH:MM:SS) WITH 'SyncDelOp' = 'true' +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Keywords-And-Reserved-Words.md b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Keywords-And-Reserved-Words.md new file mode 100644 index 00000000..80b011d5 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Keywords-And-Reserved-Words.md @@ -0,0 +1,26 @@ + + +# Keywords and Reserved Words + +Keywords are words that have significance in SQL. Keywords can be used as identifiers. Certain keywords, such as TIME/TIMESTAMP and ROOT, are reserved and cannot be used as identifiers. + +[Keywords](../Reference/Keywords.md) shows the keywords in IoTDB. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Literal-Values.md b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Literal-Values.md new file mode 100644 index 00000000..96e674bd --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Literal-Values.md @@ -0,0 +1,157 @@ + + +# Literal Values + +This section describes how to write literal values in IoTDB.
These include strings, numbers, timestamp values, boolean values, and NULL. + +## String Literals + +in IoTDB, **A string is a sequence of bytes or characters, enclosed within either single quote (`'`) or double quote (`"`) characters.** Examples: + +```js +'a string' +"another string" +``` + +### Usage Scenarios + +Usages of string literals: + +- Values of `TEXT` type data in `INSERT` or `SELECT` statements + + ```sql + # insert + insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') + insert into root.ln.wf02.wt02(timestamp,hardware) values(2, '\\') + + +-----------------------------+--------------------------+ + | Time|root.ln.wf02.wt02.hardware| + +-----------------------------+--------------------------+ + |1970-01-01T08:00:00.001+08:00| v1| + +-----------------------------+--------------------------+ + |1970-01-01T08:00:00.002+08:00| \\| + +-----------------------------+--------------------------+ + + # select + select code from root.sg1.d1 where code in ('string1', 'string2'); + ``` + +- Used in`LOAD` / `REMOVE` / `SETTLE` instructions to represent file path. + + ```sql + # load + LOAD 'examplePath' + + # remove + REMOVE 'examplePath' + + # SETTLE + SETTLE 'examplePath' + ``` + +- Password fields in user management statements + + ```sql + # write_pwd is the password + CREATE USER ln_write_user 'write_pwd' + ``` + +- Full Java class names in UDF and trigger management statements + + ```sql + # Trigger example. Full java class names after 'AS' should be string literals. + CREATE TRIGGER `alert-listener-sg1d1s1` + AFTER INSERT + ON root.sg1.d1.s1 + AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' + WITH ( + 'lo' = '0', + 'hi' = '100.0' + ) + + # UDF example. Full java class names after 'AS' should be string literals. + CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' + ``` + +- `AS` function provided by IoTDB can assign an alias to time series selected in query. Alias can be constant(including string) or identifier. + + ```sql + select s1 as 'temperature', s2 as 'speed' from root.ln.wf01.wt01; + + # Header of dataset + +-----------------------------+-----------|-----+ + | Time|temperature|speed| + +-----------------------------+-----------|-----+ + ``` + +- The key/value of an attribute can be String Literal and identifier, more details can be found at **key-value pair** part. + + +### How to use quotation marks in String Literals + +There are several ways to include quote characters within a string: + + - `'` inside a string quoted with `"` needs no special treatment and need not be doubled or escaped. In the same way, `"` inside a string quoted with `'` needs no special treatment. + - A `'` inside a string quoted with `'` may be written as `''`. +- A `"` inside a string quoted with `"` may be written as `""`. + +The following examples demonstrate how quoting and escaping work: + +```js +'string' // string +'"string"' // "string" +'""string""' // ""string"" +'''string' // 'string + +"string" // string +"'string'" // 'string' +"''string''" // ''string'' +"""string" // "string +``` + +## Numeric Literals + +Number literals include integer (exact-value) literals and floating-point (approximate-value) literals. + +Integers are represented as a sequence of digits. Numbers may be preceded by `-` or `+` to indicate a negative or positive value, respectively. Examples: `1`, `-1`. + +Numbers with fractional part or represented in scientific notation with a mantissa and exponent are approximate-value numbers. 
Examples: `.1`, `3.14`, `-2.23`, `+1.70`, `1.2E3`, `1.2E-3`, `-1.2E3`, `-1.2E-3`. + +The `INT32` and `INT64` data types are integer types and calculations are exact. + +The `FLOAT` and `DOUBLE` data types are floating-point types and calculations are approximate. + +An integer may be used in floating-point context; it is interpreted as the equivalent floating-point number. + +## Timestamp Literals + +The timestamp is the time point at which data is produced. It includes absolute timestamps and relative timestamps in IoTDB. For information about timestamp support in IoTDB, see [Data Type Doc](../Basic-Concept/Data-Type.md). + +Specially, `NOW()` represents a constant timestamp that indicates the system time at which the statement began to execute. + +## Boolean Literals + +The constants `TRUE` and `FALSE` evaluate to 1 and 0, respectively. The constant names can be written in any lettercase. + +## NULL Values + +The `NULL` value means “no data.” `NULL` can be written in any lettercase. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/NodeName-In-Path.md b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/NodeName-In-Path.md new file mode 100644 index 00000000..3c72d738 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/NodeName-In-Path.md @@ -0,0 +1,119 @@ + + +# Node Name in Path + +Node name is a special identifier, it can also be wildcard `*` and `**`. When creating timeseries, node name can not be wildcard. In query statment, you can use wildcard to match one or more nodes of path. + +## Wildcard + +`*` represents one node. For example, `root.vehicle.*.sensor1` represents a 4-node path which is prefixed with `root.vehicle` and suffixed with `sensor1`. + +`**` represents (`*`)+, which is one or more nodes of `*`. For example, `root.vehicle.device1.**` represents all paths prefixed by `root.vehicle.device1` with nodes num greater than or equal to 4, like `root.vehicle.device1.*`, `root.vehicle.device1.*.*`, `root.vehicle.device1.*.*.*`, etc; `root.vehicle.**.sensor1` represents a path which is prefixed with `root.vehicle` and suffixed with `sensor1` and has at least 4 nodes. + +As `*` can also be used in expressions of select clause to represent multiplication, below are examples to help you better understand the usage of `* `: + +```sql +# create timeseries root.sg.`a*b` +create timeseries root.sg.`a*b` with datatype=FLOAT,encoding=PLAIN; + +# As described in Identifier part, a*b should be quoted. +# "create timeseries root.sg.a*b with datatype=FLOAT,encoding=PLAIN" is wrong. + +# create timeseries root.sg.a +create timeseries root.sg.a with datatype=FLOAT,encoding=PLAIN; + +# create timeseries root.sg.b +create timeseries root.sg.b with datatype=FLOAT,encoding=PLAIN; + +# query data of root.sg.`a*b` +select `a*b` from root.sg +# Header of result dataset +|Time|root.sg.a*b| + +# multiplication of root.sg.a and root.sg.b +select a*b from root.sg +# Header of result dataset +|Time|root.sg.a * root.sg.b| +``` + +## Identifier + +When node name is not wildcard, it is a identifier, which means the constraints on it is the same as described in Identifier part. + +- Create timeseries statement: + +```sql +# Node name contains special characters like ` and .,all nodes of this timeseries are: ["root","sg","www.`baidu.com"] +create timeseries root.sg.`www.``baidu.com`.a with datatype=FLOAT,encoding=PLAIN; + +# Node name is a real number. 
+create timeseries root.sg.`111`.a with datatype=FLOAT,encoding=PLAIN; +``` + +After executing the above statements, execute "show timeseries"; below is the result: + +```sql ++---------------------------+-----+-------------+--------+--------+-----------+----+----------+ +| timeseries|alias|database|dataType|encoding|compression|tags|attributes| ++---------------------------+-----+-------------+--------+--------+-----------+----+----------+ +| root.sg.`111`.a| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| +|root.sg.`www.``baidu.com`.a| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| ++---------------------------+-----+-------------+--------+--------+-----------+----+----------+ +``` + +- Insert statement: + +```sql +# Node name contains special characters like . and ` +insert into root.sg.`www.``baidu.com`(timestamp, a) values(1, 2); + +# Node name is a real number. +insert into root.sg(timestamp, `111`) values (1, 2); +``` + +- Query statement: + +```sql +# Node name contains special characters like . and ` +select a from root.sg.`www.``baidu.com`; + +# Node name is a real number. +select `111` from root.sg +``` + +Results: + +```sql +# select a from root.sg.`www.``baidu.com` ++-----------------------------+---------------------------+ +| Time|root.sg.`www.``baidu.com`.a| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 2.0| ++-----------------------------+---------------------------+ + +# select `111` from root.sg ++-----------------------------+-----------+ +| Time|root.sg.111| ++-----------------------------+-----------+ +|1970-01-01T08:00:00.001+08:00| 2.0| ++-----------------------------+-----------+ +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Session-And-TsFile-API.md b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Session-And-TsFile-API.md new file mode 100644 index 00000000..e70b6b7c --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Session-And-TsFile-API.md @@ -0,0 +1,119 @@ + + +# Session And TsFile API + +When using the Session and TsFile APIs, if the method you call requires parameters such as measurement, device, database or path in the form of a String, **please ensure that the parameters passed in as strings are written the same way as when using an SQL statement**. Here are some examples to help you understand. A code example can be found at: `example/session/src/main/java/org/apache/iotdb/SyntaxConventionRelatedExample.java` + +1.
Take creating a time series with createTimeseries as an example: + +```java +public void createTimeseries( + String path, + TSDataType dataType, + TSEncoding encoding, + CompressionType compressor) + throws IoTDBConnectionException, StatementExecutionException; +``` + +If you wish to create the time series root.sg.a, root.sg.\`a.\`\`"b\`, root.sg.\`111\`, the SQL statement you use should look like this: + +```sql +create timeseries root.sg.a with datatype=FLOAT,encoding=PLAIN,compressor=SNAPPY; + +# node names contain special characters, each node in the time series is ["root","sg","a.`\"b"] +create timeseries root.sg.`a.``"b` with datatype=FLOAT,encoding=PLAIN,compressor=SNAPPY; + +# node names are pure numbers +create timeseries root.sg.`111` with datatype=FLOAT,encoding=PLAIN,compressor=SNAPPY; +``` + +When you call the createTimeseries method, you should assign the path string as follows to ensure that the content of the path string is the same as when using SQL: + +```java +// timeseries root.sg.a +String path = "root.sg.a"; + +// timeseries root.sg.`a.``"b` +String path = "root.sg.`a.``\"b`"; + +// timeseries root.sg.`111` +String path = "root.sg.`111`"; +``` + +2. Take inserting data with insertRecord as an example: + +```java +public void insertRecord( + String deviceId, + long time, + List<String> measurements, + List<TSDataType> types, + Object... values) + throws IoTDBConnectionException, StatementExecutionException; +``` + +If you want to insert data into the time series root.sg.a, root.sg.\`a.\`\`"b\`, root.sg.\`111\`, the SQL statement you use should be as follows: + +```sql +insert into root.sg(timestamp, a, `a.``"b`, `111`) values (1, 2, 2, 2); +``` + +When you call the insertRecord method, you should assign deviceId and measurements as follows: + +```java +// deviceId is root.sg +String deviceId = "root.sg"; + +// measurements +String[] measurements = new String[]{"a", "`a.``\"b`", "`111`"}; +List<String> measurementList = Arrays.asList(measurements); +``` + +3. Take executeRawDataQuery as an example: + +```java +public SessionDataSet executeRawDataQuery( + List<String> paths, + long startTime, + long endTime) + throws StatementExecutionException, IoTDBConnectionException; +``` + +If you wish to query the data of the time series root.sg.a, root.sg.\`a.\`\`"b\`, root.sg.\`111\`, the SQL statement you use should be as follows: + +```sql +select a from root.sg + +# node name contains special characters +select `a.``"b` from root.sg; + +# node names are pure numbers +select `111` from root.sg +``` + +When you call the executeRawDataQuery method, you should assign paths as follows: + +```java +// paths +String[] paths = new String[]{"root.sg.a", "root.sg.`a.``\"b`", "root.sg.`111`"}; +List<String> pathList = Arrays.asList(paths); +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/TSDB-Comparison.md b/src/UserGuide/V2.0.1/Tree/stage/TSDB-Comparison.md new file mode 100644 index 00000000..8b33561d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/TSDB-Comparison.md @@ -0,0 +1,386 @@ + + +# TSDB Comparison + +## Overview + +![TSDB Comparison](https://alioss.timecho.com/docs/img/github/119833923-182ffc00-bf32-11eb-8b3f-9f95d3729ad2.png) + + + +*The table format is inspired by [Andriy Zabavskyy: How to Select Time Series DB](https://towardsdatascience.com/how-to-select-time-series-db-123b0eb4ab82).* + + + +## Known Time Series Databases + +As time series data becomes more and more important, +several open-source time series databases have been introduced to the world.
However, few of them are developed for IoT or IIoT (Industrial IoT) scenarios in particular. + + +3 kinds of TSDBs are compared here. + +* InfluxDB - Native time series database + + InfluxDB is one of the most popular TSDBs. + + Interface: InfluxQL and HTTP API + +* OpenTSDB and KairosDB - Time series databases based on NoSQL + + These two DBs are similar: the first is based on HBase and the second on Cassandra. + Both of them provide a RESTful-style API. + + Interface: RESTful API + +* TimescaleDB - Time series database based on a relational database + + Interface: SQL + +Prometheus and Druid are also famous for time series data management. +However, Prometheus focuses on data collection, data visualization and alerting. +Druid focuses on data analysis with OLAP workloads. We omit them here. + + +## Comparison +The above time series databases are compared from two aspects: the feature comparison and the performance +comparison. + + +### Feature Comparison + +I list the basic feature comparison of these databases. + +Legend: +- `++`: supported strongly +- `+`: supported +- `+-`: supported, but not very well +- `-`: not supported +- `?`: unknown + +#### Basic Features + +| TSDB | IoTDB | InfluxDB | OpenTSDB | KairosDB | TimescaleDB | +| ----------------------------- | :---------------------: | :--------: | :--------: | :--------: | :---------: | +| *OpenSource* | **+** | + | + | **+** | + | +| *SQL\-like* | + | + | - | - | **++** | +| *Schema* | tree\-based, tag\-based | tag\-based | tag\-based | tag\-based | relational | +| *Writing out\-of\-order data* | + | + | + | + | + | +| *Schema\-less* | + | + | + | + | + | +| *Batch insertion* | + | + | + | + | + | +| *Time range filter* | + | + | + | + | + | +| *Order by time* | **++** | + | - | - | + | +| *Value filter* | + | + | - | - | + | +| *Downsampling* | **++** | + | + | + | + | +| *Fill* | **++** | + | + | - | + | +| *LIMIT* | + | + | + | + | + | +| *SLIMIT* | + | + | - | - | ? | +| *Latest value* | ++ | + | + | - | + | + +**Details** + +* OpenSource: + + * IoTDB uses Apache License 2.0. + * InfluxDB uses the MIT license. However, **the cluster version is not open sourced**. + * OpenTSDB uses LGPL2.1, which **is not compatible with Apache License**. + * KairosDB uses Apache License 2.0. + * TimescaleDB uses the Timescale License, which is not free for enterprise use. + +* SQL-like: + + * IoTDB and InfluxDB support an SQL-like language. + * OpenTSDB and KairosDB only support a REST API, while IoTDB also supports a REST API. + * TimescaleDB uses the same SQL as PostgreSQL. + +* Schema: + + * IoTDB: IoTDB proposes a [Tree based schema](http://iotdb.apache.org/UserGuide/Master/Data-Concept/Data-Model-and-Terminology.html). + It is quite different from other TSDBs. However, this kind of schema has the following advantages: + + * In many industrial scenarios, the management of devices is hierarchical, rather than flat. + That is why we think a tree-based schema is better than a tag-value-based schema. + + * In many real-world applications, tag names are constant. For example, a wind turbine manufacturer + always identifies its wind turbines by the country they are located in, the farm they belong to, and their ID in the farm. + So, a 4-depth tree ("root.the-country-name.the-farm-name.the-id") is fine. + You do not need to repeatedly tell IoTDB that the 2nd level of the tree is the country name, + the 3rd level is the farm name, etc. + + * A path-based time series ID definition also supports flexible queries, like "root.\*.a.b.\*", where \* is a wildcard character.
+ + * InfluxDB, KairosDB, OpenTSDB are tag-value based, which is more popular currently. + + * TimescaleDB uses relational table. + +* Order by time: + + Order by time seems quite trivial for time series database. But... if we consider another feature, called align by time, + something becomes interesting. And, that is why we mark OpenTSDB and KairosDB unsupported. + + Actually, in each time series, all these TSDBs support order data by timestamps. + + However, OpenTSDB and KairosDB do not support order data from different timeseries in the time order. + + Ok, consider a new case: I have two time series, one is for the wind speed in wind farm1, + another is for the generated energy of wind turbine1 in farm1. If we want to analyze the relation between the + wind speed and the generated energy, we have to know the values of both at the same time. + That is to say, we have to align the two time series in the time dimension. + + So, the result should be: + + | timestamp | wind speed | generated energy | + |-----------|-------------|------------------| + | 1 | 5.0 | 13.1 | + | 2 | 6.0 | 13.3 | + | 3 | null | 13.1 | + + or, + + | timestamp | series name | value | + |-----------|-------------------|------------| + | 1 | wind speed | 5.0 | + | 1 | generated energy | 13.1 | + | 2 | wind speed | 6.0 | + | 2 | generated energy | 13.3 | + | 3 | generated energy | 13.1 | + + Though the second table format does not align data by the time dimension, it is easy to be implemented in the client-side, + by just scanning data row by row. + + IoTDB supports the first table format (called align by time), InfluxDB supports the second table format. + +* Downsampling: + + Downsampling is for changing the granularity of timeseries, e.g., from 10Hz to 1Hz, or 1 point per day. + + Different from other systems, IoTDB downsamples data in real time, while others serialized downsampled data on disk. + That is to say, + + * IoTDB supports **adhoc** downsampling data in **arbitrary time**. + e.g., a SQL returns 1 point per 5 minutes and start with 2020-04-27 08:00:00 while another SQL returns 1 point per 5 minutes + 10 seconds and start with 2020-04-27 08:00:01. + (InfluxDB also supports adhoc downsampling but the performance is ..... hm) + + * There is no disk loss for IoTDB. + +* Fill: + + Sometimes we thought the data is collected in some fixed frequency, e.g., 1Hz (1 point per second). + But usually, we may lost some data points, because the network is unstable, the machine is busy, or the machine is down for several minutes. + + In this case, filling these holes is important. Data scientists can avoid many so called dirty work, e.g., data clean. + + InfluxDB and OpenTSDB only support using fill in a group by statement, while IoTDB supports to fill data when just given a particular timestamp. + Besides, IoTDB supports several strategies for filling data. + +* Slimit: + + Slimit means return limited number of measurements (or, fields in InfluxDB). + For example, a wind turbine may have 1000 measurements (speed, voltage, etc..), using slimit and soffset can just return a part of them. + +* Latest value: + + As one of the most basic timeseries based applications is monitoring the latest data. + Therefore, a query to return the latest value of a time series is very important. + IoTDB and OpenTSDB support that with a special SQL or API, + while InfluxDB supports that using an aggregation function. + (the reason why IoTDB provides a special SQL is IoTDB optimizes the query expressly.) 
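  For reference, a minimal sketch of such a latest-value query in IoTDB (assuming a time series `root.ln.wf01.wt01.temperature` already exists) could look like this:

  ```sql
  # return the latest timestamp-value pair of the temperature series
  select last temperature from root.ln.wf01.wt01
  ```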
+ + + +**Conclusion**: + +Well, if we compare the basic features, we can find that OpenTSDB and KairosDB somehow lack some important query features. +TimescaleDB can not be freely used in business. +IoTDB and InfluxDB can meet most requirements of time series data management, while they have some difference. + + +#### Advanced Features + +I listed some interesting features that these systems may differ. + +| TSDB | IoTDB | InfluxDB | OpenTSDB | KairosDB | TimescaleDB | +| ---------------------------- | :----: | :------: | :------: | :------: |:-----------:| +| *Align by time* | **++** | + | - | - | + | +| *Compression* | **++** | +- | +- | +- | +- | +| *MQTT support* | **++** | + | - | - | +- | +| *Run on Edge-side Device* | **++** | + | - | +- | + | +| *Multi\-instance Sync* | **++** | - | - | - | - | +| *JDBC Driver* | **+** | - | - | - | ++ | +| *Standard SQL* | + | - | - | - | **++** | +| *Spark integration* | **++** | - | - | - | - | +| *Hive integration* | **++** | - | - | - | - | +| *Writing data to NFS (HDFS)* | **++** | - | + | - | - | +| *Flink integration* | **++** | - | - | - | - | + + +* Align by time: have been introduced. Let's skip it.. + +* Compression: + * IoTDB supports many encoding and compression for time series, like RLE, 2DIFF, Gorilla, etc.. and Snappy compression. + In IoTDB, you can choose which encoding method you want, according to the data distribution. For more info, see [here](http://iotdb.apache.org/UserGuide/Master/Data-Concept/Encoding.html). + * InfluxDB also supports encoding and compression, but you can not define which encoding method you want. + It just depends on the data type. For more info, see [here](https://docs.influxdata.com/influxdb/v1.7/concepts/storage_engine/). + * OpenTSDB and KairosDB use HBase and Cassandra in backend, and have no special encoding for time series. + +* MQTT protocol support: + + MQTT protocol is an international standard and widely known in industrial users. Only IoTDB and InfluxDB support user using MQTT client to write data. + +* Running on Edge-side Device: + + Nowdays, edge computing is more and more popular, which means the edge device has more powerful computational resources. + Deploying a TSDB on the edge side is useful for managing data on the edge side and serve for edge computing. + As OpenTSDB and KairosDB rely another DB, the architecture is heavy. Especially, it is hard to run Hadoop on the edge side. + +* Multi-instance Sync: + + Ok, now we have many TSDB instances on the edge-side. Then, how to upload their data to the data center, to form a ... data lake (or ocean, river,..., whatever). + One solution is to read data from these instances and write the data point by point to the data center instance. + IoTDB provides another choice, which is just uploading the data file into the data center incrementally, then the data center can support service on the data. + +* JDBC driver: + + Now only IoTDB supports a JDBC driver (though not all interfaces are implemented), and makes it possible to integrate many other JDBC driver based softwares. + + +* Spark and Hive integration: + + It is very important that letting big data analysis software to access the data in database for more complex data analysis. + IoTDB supports Hive-connector and Spark connector for better integration. + +* Writing data to NFS (HDFS): + Sharing nothing architecture is good, but sometimes you have to add new servers even your CPU and memory is idle but the disk is full... 
+ Besides, if we can save the data file directly to HDFS, it will be more easy to use Spark and other softwares to analyze data, without ETL. + + * IoTDB supports writing data locally or on HDFS directly. IoTDB also allows user to extend to store data on other NFS. + * InfluxDB, KairosDB have to write data locally. + * OpenTSDB has to write data on HDFS. + +**Conclusion**: + + We can find that IoTDB has many powerful features that other TSDBs do not support. + +### Performance Comparison + +Ok... If you say, "well, I just want the basic features. IoTDB has little difference from others.". +It is somehow right. But, if you consider the performance, you may change your mind. + +#### quick review + +| TSDB | IoTDB | InfluxDB | KairosDB | TimescaleDB | +| -------------------- | :---: | :------: | :------: | :---------: | +| *Scalable Writes* | ++ | + | + | + | +| *Raw Data Query* | ++ | + | + | + | +| *Aggregation Query* | ++ | + | + | + | +| *Downsampling Query* | ++ | + | +- | +- | +| *Latest Query* | ++ | + | +- | + | + +* Write: + +We test the performance of writing from two aspects: *batch size* and *client num*. The number of database is 10. There are 1000 devices and each device has 100 measurements(i.e.,, 100K time series total). + +* Read: + +10 clients read data concurrently. The number of database is 10. There are 10 devices and each device has 10 measurements (i.e.,, 100 time series total). +The data type is *double*, encoding type is *GORILLA* + +* Compression: + +We test and compare file sizes of TsFile(the file format of IoTDB) and some others famous dataset formats, which are Parquet, ORC and Csv, after the same datasets are written. + +The IoTDB version is v0.11.1. + +**Write performance**: + +* batch size: + +10 clients write data concurrently. +IoTDB uses batch insertion API and the batch size is distributed from 0 to 6000 (write N data points per write API call). + +The write throughput (points/second) is: + +![Batch Size with Write Throughput (points/second)](https://alioss.timecho.com/docs/img/github/106251391-df1b9f80-624f-11eb-9f1f-66823839acba.png) +
Figure 1. Batch Size with Write Throughput (points/second) IoTDB v0.11.1
+ +* client num: + +The client num is distributed from 1 to 50. +IoTDB uses batch insertion API and the batch size is 100 (write 100 data points per write API call). + +The write throughput (points/second) is: + +![Client Num with Write Throughput (points/second) (ms)](https://alioss.timecho.com/docs/img/github/106251411-e5aa1700-624f-11eb-8ca8-00c0627b1e96.png) +
Figure 3. Client Num with Write Throughput (points/second) IoTDB v0.11.1
+ +**Query performance** + +![Raw data query 1 col](https://alioss.timecho.com/docs/img/github/106251377-daef8200-624f-11eb-9678-b1d5440be2de.png) +
Figure 4. Raw data query 1 col time cost (ms) IoTDB v0.11.1
+ +![Aggregation query](https://alioss.timecho.com/docs/img/github/106251336-cf03c000-624f-11eb-8395-de5e349f47b5.png) +
Figure 5. Aggregation query time cost (ms) IoTDB v0.11.1
+ +![Downsampling query](https://alioss.timecho.com/docs/img/github/106251353-d32fdd80-624f-11eb-80c1-fdb4197939fe.png) +
Figure 6. Downsampling query time cost (ms) IoTDB v0.11.1
+ +![Latest query](https://alioss.timecho.com/docs/img/github/106251369-d7f49180-624f-11eb-9d19-fc7341582b90.png) +
Figure 7. Latest query time cost (ms) IoTDB v0.11.1
+ +![Data compression](https://alioss.timecho.com/docs/img/github/118790229-23e34900-b8c8-11eb-87da-ac01dd117f28.png) +
Figure 8. Data compression IoTDB v0.11.1
+ +We can see that IoTDB outperforms the others. + +#### More details + +We provide a benchmarking tool called IoTDB-benchmark (https://github.com/thulab/iotdb-benchmark; you may have to use the dev branch to compile it). +It supports IoTDB, InfluxDB, KairosDB, TimescaleDB and OpenTSDB. We have an [article](https://arxiv.org/abs/1901.08304) comparing these systems using the benchmark tool. +When we published the article, IoTDB had just entered the Apache Incubator, so we removed IoTDB's performance numbers from that article; some of those results are presented here. + + +- For InfluxDB, we set the cache-max-memory-size and the max-series-per-database as unlimited (otherwise it would time out quickly). + +- For KairosDB, we set Cassandra's read_repair_chance to 0.1 (however, it has no effect because we only have one node). + +- For TimescaleDB, we used the PGTune tool to optimize PostgreSQL. + +All TSDBs run on a server with an Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz (8 cores, 16 threads), 32GB memory, a 256GB SSD and a 10TB HDD. +The OS is Ubuntu 16.04.7 LTS, 64-bit. + +All clients run on a server with an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (6 cores, 12 threads), 16GB memory and a 256GB SSD. +The OS is Ubuntu 16.04.7 LTS, 64-bit. + +## Conclusion + +From all the above experiments, we can see that IoTDB outperforms the others by a large margin. +IoTDB has the lowest write latency. The larger the batch size, the higher the write throughput of IoTDB. This indicates that IoTDB is most suitable for batch data writing scenarios. +In high-concurrency scenarios, IoTDB can also maintain a steady growth in throughput. (12 million points per second may have reached the limit of the gigabit network card.) +In raw data queries, as the query scope increases, the advantages of IoTDB begin to manifest. Because the granularity of data blocks is larger and the advantages of columnar storage are used, column-based compression and columnar iterators both accelerate the query. +In aggregation queries, we use the statistics of the file layer and cache the statistics. Therefore, multiple queries only need to perform in-memory calculations (they do not need to traverse the original data points or access the disk), so the aggregation performance advantage is obvious. +The downsampling query scenario is more interesting: as the time partition becomes larger and larger, the query performance of IoTDB increases gradually. It roughly rises twice, which corresponds to the pre-calculated statistics at 2 granularities (3 hours and 4.5 days); queries over ranges of 1 day and 1 week are therefore accelerated respectively. The other databases only rise once, indicating that they only keep statistics at one granularity. + +If you are considering a TSDB for your IIoT application, Apache IoTDB, a new time series database, is your best choice. + +We will update this page once we release a new version and finish the experiments. +We also welcome more contributors to correct this article, contribute to IoTDB and reproduce the experiments. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Time-Partition.md b/src/UserGuide/V2.0.1/Tree/stage/Time-Partition.md new file mode 100644 index 00000000..736c1e70 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Time-Partition.md @@ -0,0 +1,53 @@ + + +# Time partition + +## Features + +Time partitioning divides data according to time; a time partition is used to save all data within a certain time range. The time partition number is represented by a natural number.
Partition number 0 starts at January 1, 1970 (timestamp 0), and the number increases by one every partition_interval milliseconds. The time partition number is calculated as timestamp / partition_interval. The main configuration items are as follows: + +* time\_partition\_interval + +|Name| time\_partition\_interval | +|:---:|:---| +|Description| The time range used for partitioning a database; time series data will be divided into groups by this time range | +|Type| Int64 | +|Default| 604800000 | +|Effective| Only allowed to be modified at the first startup | + +## Configuration example + +If time partitioning is enabled and partition_interval is set to 86400000 (one day), the data distribution is shown in the following figure: + +time partition example + +* Insert one data point with timestamp 0: 0 / 86400000 = 0, so this data point will be stored in a TsFile under folder 0 + +* Insert one data point with timestamp 1609459200010: 1609459200010 / 86400000 = 18628, so this data point will be stored in a TsFile under folder 18628 + +## Suggestions + +When enabling time partitioning, it is better to enable timed memtable flush as well; the configuration parameters are detailed in [Config manual for timed flush](../Reference/DataNode-Config-Manual.md). + +* enable_timed_flush_unseq_memtable: Whether to enable timed flush for unsequence memtables; enabled by default. + +* enable_timed_flush_seq_memtable: Whether to enable timed flush for sequence memtables; disabled by default. It should be enabled when time partitioning is enabled, so that memtables of inactive time partitions can be flushed regularly. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Time-zone.md b/src/UserGuide/V2.0.1/Tree/stage/Time-zone.md new file mode 100644 index 00000000..7d084042 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Time-zone.md @@ -0,0 +1,90 @@ + + +# Time zone + +When a client connects to the IoTDB server, it can specify the time zone to be used for this connection. If not specified, the client's time zone is used by default. + +The time zone can be set in both JDBC and session native interface connections. The usage is as follows: + +```java +JDBC: (IoTDBConnection) connection.setTimeZone("+08:00"); + +Session: session.setTimeZone("+08:00"); +``` + +In the CLI command line tool, you can manually set the time zone with the following command: + +```sql +SET time_zone=+08:00 +``` + +The way to view the time zone used by the current connection is as follows: + +```java +JDBC: (IoTDBConnection) connection.getTimeZone(); + +Session: session.getTimeZone(); +``` + +In CLI: + +```sql +SHOW time_zone +``` + +## Time zone usage scenarios + +The IoTDB server only stores and processes timestamps, and the time zone is only used to interact with clients. The specific scenarios are as follows: + +1. Convert the time format string sent from the client to the corresponding timestamp. + + For example, execute `insert into root.sg.d1(timestamp, s1) values(2021-07-01T08:00:00.000, 3.14)` + + Then `2021-07-01T08:00:00.000` will be converted to the corresponding timestamp value according to the time zone of the client. If the client is in GMT+08:00, the result will be `1625097600000`, which is equal to the timestamp value of `2021-07-01T00:00:00.000` in GMT+00:00. + + > Note: At the same moment, the dates in different time zones may differ, but the timestamps are the same. + + + +2. Convert the timestamp in the result returned to the client into a time format string.
+ + Take the above situation as an example: execute `select * from root.sg.d1`, and the server will return the time-value pair `(1625097600000, 3.14)`. If the CLI tool is used, then `1625097600000` will be converted into a time format string according to the time zone, as shown in the figure below: + + ``` + +-----------------------------+-------------+ + | Time|root.sg.d1.s1| + +-----------------------------+-------------+ + |2021-07-01T08:00:00.000+08:00| 3.14| + +-----------------------------+-------------+ + ``` + + If the query is executed on a client in GMT+00:00, the result will be as follows: + + ``` + +-----------------------------+-------------+ + | Time|root.sg.d1.s1| + +-----------------------------+-------------+ + |2021-07-01T00:00:00.000+00:00| 3.14| + +-----------------------------+-------------+ + ``` + + Note that the timestamps returned are the same, but the dates shown in different time zones are different. diff --git a/src/UserGuide/V2.0.1/Tree/stage/Trigger/Configuration-Parameters.md b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Configuration-Parameters.md new file mode 100644 index 00000000..9d49e0f6 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Configuration-Parameters.md @@ -0,0 +1,29 @@ + + + + +# Configuration Parameters + +| Parameter | Meaning | +| ------------------------------------------------- | ------------------------------------------------------------ | +| *trigger_lib_dir* | Directory that saves the trigger JAR packages | +| *stateful\_trigger\_retry\_num\_when\_not\_found* | How many times to retry to find an instance of a stateful trigger on DataNodes if it is not found | \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Trigger/Implement-Trigger.md b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Implement-Trigger.md new file mode 100644 index 00000000..38de35b0 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Implement-Trigger.md @@ -0,0 +1,294 @@ + + + + +# How to implement a trigger + +You need to implement the trigger by writing a Java class, where the dependency shown below is required. If you use [Maven](http://search.maven.org/), you can search for it directly from the [Maven repository](http://search.maven.org/). + +## Dependency + +```xml +<dependency> + <groupId>org.apache.iotdb</groupId> + <artifactId>iotdb-server</artifactId> + <version>1.0.0</version> + <scope>provided</scope> +</dependency> +``` + +Note that the dependency version should correspond to the target server version. + +## Interface Description + +To implement a trigger, you need to implement the `org.apache.iotdb.trigger.api.Trigger` interface. + +```java +import org.apache.iotdb.trigger.api.enums.FailureStrategy; +import org.apache.iotdb.tsfile.write.record.Tablet; + +public interface Trigger { + + /** + * This method is mainly used to validate {@link TriggerAttributes} before calling {@link + * Trigger#onCreate(TriggerAttributes)}. + * + * @param attributes TriggerAttributes + * @throws Exception e + */ + default void validate(TriggerAttributes attributes) throws Exception {} + + /** + * This method will be called when creating a trigger after validation. + * + * @param attributes TriggerAttributes + * @throws Exception e + */ + default void onCreate(TriggerAttributes attributes) throws Exception {} + + /** + * This method will be called when dropping a trigger. + * + * @throws Exception e + */ + default void onDrop() throws Exception {} + + /** + * When restarting a DataNode, Triggers that have been registered will be restored and this method + * will be called during the process of restoring.
+ * + * @throws Exception e + */ + default void restore() throws Exception {} + + /** + * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} + * is the default strategy. + * + * @return {@link FailureStrategy} + */ + default FailureStrategy getFailureStrategy() { + return FailureStrategy.OPTIMISTIC; + } + + /** + * @param tablet see {@link Tablet} for detailed information of data structure. Data that is + * inserted will be constructed as a Tablet and you can define process logic with {@link + * Tablet}. + * @return true if successfully fired + * @throws Exception e + */ + default boolean fire(Tablet tablet) throws Exception { + return true; + } +} +``` + +This class provides two types of programming interfaces: **Lifecycle related interfaces** and **data change listening related interfaces**. All the interfaces in this class are not required to be implemented. When the interfaces are not implemented, the trigger will not respond to the data changes. You can implement only some of these interfaces according to your needs. + +Descriptions of the interfaces are as followed. + +### Lifecycle related interfaces + +| Interface | Description | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| *default void validate(TriggerAttributes attributes) throws Exception {}* | When you creates a trigger using the `CREATE TRIGGER` statement, you can specify the parameters that the trigger needs to use, and this interface will be used to verify the correctness of the parameters。 | +| *default void onCreate(TriggerAttributes attributes) throws Exception {}* | This interface is called once when you create a trigger using the `CREATE TRIGGER` statement. During the lifetime of each trigger instance, this interface will be called only once. This interface is mainly used for the following functions: helping users to parse custom attributes in SQL statements (using `TriggerAttributes`). You can create or apply for resources, such as establishing external links, opening files, etc. | +| *default void onDrop() throws Exception {}* | This interface is called when you drop a trigger using the `DROP TRIGGER` statement. During the lifetime of each trigger instance, this interface will be called only once. This interface mainly has the following functions: it can perform the operation of resource release and can be used to persist the results of trigger calculations. | +| *default void restore() throws Exception {}* | When the DataNode is restarted, the cluster will restore the trigger instance registered on the DataNode, and this interface will be called once for stateful trigger during the process. After the DataNode where the stateful trigger instance is located goes down, the cluster will restore the trigger instance on another available DataNode, calling this interface once in the process. This interface can be used to customize recovery logic. | + +### Data change listening related interfaces + +#### Listening interface + +```java +/** + * @param tablet see {@link Tablet} for detailed information of data structure. Data that is + * inserted will be constructed as a Tablet and you can define process logic with {@link + * Tablet}. + * @return true if successfully fired + * @throws Exception e + */ + default boolean fire(Tablet tablet) throws Exception { + return true; + } +``` + +When the data changes, the trigger uses the Tablet as the unit of firing operation. 
You can obtain the metadata and data of the corresponding sequences through the Tablet, and then perform the corresponding trigger operation. If the fire process is successful, the return value should be true. If the interface returns false or throws an exception, we consider the trigger fire process to have failed. When the trigger fire process fails, we will perform corresponding operations according to the listening strategy interface. + +When performing an INSERT operation, for each time series in it, we will detect whether there is a trigger that listens to the path pattern, and then assemble the time series data that matches the path pattern listened to by the same trigger into a new Tablet for the trigger fire interface. This can be understood as: + +```java +Map<PartialPath, List<Trigger>> pathToTriggerListMap => Map<Trigger, Tablet> +``` + +**Note that currently we do not make any guarantees about the order in which triggers fire.** + +Here is an example: + +Suppose there are three triggers, and the trigger event of all of them is BEFORE INSERT: + +- Trigger1 listens on `root.sg.*` +- Trigger2 listens on `root.sg.a` +- Trigger3 listens on `root.sg.b` + +Insertion statement: + +```sql +insert into root.sg(time, a, b) values (1, 1, 1); +``` + +The time series `root.sg.a` matches Trigger1 and Trigger2, and the series `root.sg.b` matches Trigger1 and Trigger3, then: + +- The data of `root.sg.a` and `root.sg.b` will be assembled into a new tablet1, and Trigger1.fire(tablet1) will be executed at the corresponding Trigger Event. +- The data of `root.sg.a` will be assembled into a new tablet2, and Trigger2.fire(tablet2) will be executed at the corresponding Trigger Event. +- The data of `root.sg.b` will be assembled into a new tablet3, and Trigger3.fire(tablet3) will be executed at the corresponding Trigger Event. + +#### Listening strategy interface + +When the trigger fails to fire, we will take corresponding actions according to the strategy set by the listening strategy interface. You can set `org.apache.iotdb.trigger.api.enums.FailureStrategy`. There are currently two strategies, optimistic and pessimistic: + +- Optimistic strategy: A trigger that fails to fire does not affect the firing of subsequent triggers, nor does it affect the writing process; that is, we do not perform additional processing on the sequences involved in the trigger failure, we only log the failure, and finally inform the user that the data insertion was successful but the trigger fire part failed. +- Pessimistic strategy: A failed trigger affects the processing of all subsequent pipelines; that is, we assume that the firing failure of the trigger causes all subsequent triggering processes to no longer be carried out. If the trigger event of the trigger is BEFORE INSERT, then the insertion will no longer be performed, and an insertion failure will be returned directly. + +```java + /** + * Override this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} + * is the default strategy. + * + * @return {@link FailureStrategy} + */ + default FailureStrategy getFailureStrategy() { + return FailureStrategy.OPTIMISTIC; + } +``` + +## Example + +If you use [Maven](http://search.maven.org/), you can refer to our sample project **trigger-example**. + +You can find it [here](https://github.com/apache/iotdb/tree/master/example/trigger). + +Here is the code from one of the sample projects: + +```java +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements.
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.trigger; + +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerConfiguration; +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerEvent; +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerHandler; +import org.apache.iotdb.trigger.api.Trigger; +import org.apache.iotdb.trigger.api.TriggerAttributes; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; +import org.apache.iotdb.tsfile.write.record.Tablet; +import org.apache.iotdb.tsfile.write.schema.MeasurementSchema; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; + +public class ClusterAlertingExample implements Trigger { + private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class); + + private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler(); + + private final AlertManagerConfiguration alertManagerConfiguration = + new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts"); + + private String alertname; + + private final HashMap labels = new HashMap<>(); + + private final HashMap annotations = new HashMap<>(); + + @Override + public void onCreate(TriggerAttributes attributes) throws Exception { + alertname = "alert_test"; + + labels.put("series", "root.ln.wf01.wt01.temperature"); + labels.put("value", ""); + labels.put("severity", ""); + + annotations.put("summary", "high temperature"); + annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}"); + + alertManagerHandler.open(alertManagerConfiguration); + } + + @Override + public void onDrop() throws IOException { + alertManagerHandler.close(); + } + + @Override + public boolean fire(Tablet tablet) throws Exception { + List measurementSchemaList = tablet.getSchemas(); + for (int i = 0, n = measurementSchemaList.size(); i < n; i++) { + if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) { + // for example, we only deal with the columns of Double type + double[] values = (double[]) tablet.values[i]; + for (double value : values) { + if (value > 100.0) { + LOGGER.info("trigger value > 100"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "critical"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } else if (value > 50.0) { + LOGGER.info("trigger value > 50"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "warning"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } + } + } + } + return true; + } +} +``` + diff --git 
a/src/UserGuide/V2.0.1/Tree/stage/Trigger/Instructions.md b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Instructions.md new file mode 100644 index 00000000..3e4aed35 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Instructions.md @@ -0,0 +1,51 @@ + + + + +# Instructions + +The trigger provides a mechanism for listening to changes in time series data. With user-defined logic, tasks such as alerting and data forwarding can be conducted. + +The trigger is implemented based on the reflection mechanism. Users can monitor data changes by implementing the Java interfaces. IoTDB allows users to dynamically register and drop triggers without restarting the server. + +The document will help you learn to define and manage triggers. + +## Pattern for listening + +A single trigger can be used to listen for data changes in a time series that match a specific pattern. For example, a trigger can listen for the data changes of time series `root.sg.a`, or time series that match the pattern `root.sg.*`. When you register a trigger, you can specify the path pattern that the trigger listens on through an SQL statement. + +## Trigger Type + +There are currently two types of triggers, and you can specify the type through an SQL statement when registering a trigger: + +- Stateful triggers: The execution logic of this type of trigger may depend on data from multiple insertion statement . The framework will aggregate the data written by different nodes into the same trigger instance for calculation to retain context information. This type of trigger is usually used for sampling or statistical data aggregation for a period of time. information. Only one node in the cluster holds an instance of a stateful trigger. +- Stateless triggers: The execution logic of the trigger is only related to the current input data. The framework does not need to aggregate the data of different nodes into the same trigger instance. This type of trigger is usually used for calculation of single row data and abnormal detection. Each node in the cluster holds an instance of a stateless trigger. + +## Trigger Event + +There are currently two trigger events for the trigger, and other trigger events will be expanded in the future. When you register a trigger, you can specify the trigger event through an SQL statement: + +- BEFORE INSERT: Fires before the data is persisted. **Please note that currently the trigger does not support data cleaning and will not change the data to be persisted itself.** +- AFTER INSERT: Fires after the data is persisted. + + + diff --git a/src/UserGuide/V2.0.1/Tree/stage/Trigger/Notes.md b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Notes.md new file mode 100644 index 00000000..77f282a0 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Notes.md @@ -0,0 +1,30 @@ + + +# Notes + +- The trigger takes effect from the time of registration, and does not process the existing historical data. **That is, only insertion requests that occur after the trigger is successfully registered will be listened to by the trigger. ** +- The fire process of trigger is synchronous currently, so you need to ensure the efficiency of the trigger, otherwise the writing performance may be greatly affected. **You need to guarantee concurrency safety of triggers yourself**. +- Please do no register too many triggers in the cluster. Because the trigger information is fully stored in the ConfigNode, and there is a copy of the information in all DataNodes +- **It is recommended to stop writing when registering triggers**. 
Registering a trigger is not an atomic operation. When registering a trigger, there will be an intermediate state in which some nodes in the cluster have registered the trigger, and some nodes have not yet registered successfully. To avoid write requests on some nodes being listened to by triggers and not being listened to on some nodes, we recommend not to perform writes when registering triggers. +- When the node holding the stateful trigger instance goes down, we will try to restore the corresponding instance on another node. During the recovery process, we will call the restore interface of the trigger class once. +- The trigger JAR package has a size limit, which must be less than min(`config_node_ratis_log_appender_buffer_size_max`, 2G), where `config_node_ratis_log_appender_buffer_size_max` is a configuration item. For the specific meaning, please refer to the IOTDB configuration item description. +- **It is better not to have classes with the same full class name but different function implementations in different JAR packages.** For example, trigger1 and trigger2 correspond to resources trigger1.jar and trigger2.jar respectively. If two JAR packages contain a `org.apache.iotdb.trigger.example.AlertListener` class, when `CREATE TRIGGER` uses this class, the system will randomly load the class in one of the JAR packages, which will eventually leads the inconsistent behavior of trigger and other issues. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Trigger/Trigger-Management.md b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Trigger-Management.md new file mode 100644 index 00000000..0c555a47 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Trigger/Trigger-Management.md @@ -0,0 +1,152 @@ + + + + +# Triggers Management + +You can create and drop a trigger through an SQL statement, and you can also query all registered triggers through an SQL statement. + +**We recommend that you stop insertion while creating triggers.** + +## Create Trigger + +Triggers can be registered on arbitrary path patterns. The time series registered with the trigger will be listened to by the trigger. When there is data change on the series, the corresponding fire method in the trigger will be called. + +Registering a trigger can be done as follows: + +1. Implement a Trigger class as described in the How to implement a Trigger chapter, assuming the class's full class name is `org.apache.iotdb.trigger.ClusterAlertingExample` +2. Package the project into a JAR package. +3. Register the trigger with an SQL statement. During the creation process, the `validate` and `onCreate` interfaces of the trigger will only be called once. For details, please refer to the chapter of How to implement a Trigger. + +The complete SQL syntax is as follows: + +```sql +// Create Trigger +createTrigger + : CREATE triggerType TRIGGER triggerName=identifier triggerEventClause ON pathPattern AS className=STRING_LITERAL uriClause? triggerAttributeClause? + ; + +triggerType + : STATELESS | STATEFUL + ; + +triggerEventClause + : (BEFORE | AFTER) INSERT + ; + +uriClause + : USING URI uri + ; + +uri + : STRING_LITERAL + ; + +triggerAttributeClause + : WITH LR_BRACKET triggerAttribute (COMMA triggerAttribute)* RR_BRACKET + ; + +triggerAttribute + : key=attributeKey operator_eq value=attributeValue + ; +``` + +Below is the explanation for the SQL syntax: + +- triggerName: The trigger ID, which is globally unique and used to distinguish different triggers, is case-sensitive. 
+- triggerType: Trigger types are divided into two categories, STATELESS and STATEFUL. +- triggerEventClause: when the trigger fires, BEFORE INSERT and AFTER INSERT are supported now. +- pathPattern:The path pattern the trigger listens on, can contain wildcards * and **. +- className:The class name of the Trigger class. +- jarLocation: Optional. When this option is not specified, by default, we consider that the DBA has placed the JAR package required to create the trigger in the trigger_root_dir directory (configuration item, default is IOTDB_HOME/ext/trigger) of each DataNode node. When this option is specified, we will download and distribute the file resource corresponding to the URI to the trigger_root_dir/install directory of each DataNode. +- triggerAttributeClause: It is used to specify the parameters that need to be set when the trigger instance is created. This part is optional in the SQL syntax. + +Here is an example SQL statement to help you understand: + +```sql +CREATE STATELESS TRIGGER triggerTest +BEFORE INSERT +ON root.sg.** +AS 'org.apache.iotdb.trigger.ClusterAlertingExample' +USING URI '/jar/ClusterAlertingExample.jar' +WITH ( + "name" = "trigger", + "limit" = "100" +) +``` + +The above SQL statement creates a trigger named triggerTest: + +- The trigger is stateless. +- Fires before insertion. +- Listens on path pattern root.sg.** +- The implemented trigger class is named `org.apache.iotdb.trigger.ClusterAlertingExample` +- The JAR package URI is http://jar/ClusterAlertingExample.jar +- When creating the trigger instance, two parameters, name and limit, are passed in. + +## Drop Trigger + +The trigger can be dropped by specifying the trigger ID. During the process of dropping the trigger, the `onDrop` interface of the trigger will be called only once. + +The SQL syntax is: + +```sql +// Drop Trigger +dropTrigger + : DROP TRIGGER triggerName=identifier +; +``` + +Here is an example statement: + +```sql +DROP TRIGGER triggerTest1 +``` + +The above statement will drop the trigger with ID triggerTest1. + +## Show Trigger + +You can query information about triggers that exist in the cluster through an SQL statement. + +The SQL syntax is as follows: + +```sql +SHOW TRIGGERS +``` + +The result set format of this statement is as follows: + +| TriggerName | Event | Type | State | PathPattern | ClassName | NodeId | +| ------------ | ---------------------------- | -------------------- | ------------------------------------------- | ----------- | --------------------------------------- | --------------------------------------- | +| triggerTest1 | BEFORE_INSERT / AFTER_INSERT | STATELESS / STATEFUL | INACTIVE / ACTIVE / DROPPING / TRANSFFERING | root.** | org.apache.iotdb.trigger.TriggerExample | ALL(STATELESS) / DATA_NODE_ID(STATEFUL) | + +## Trigger State + +During the process of creating and dropping triggers in the cluster, we maintain the states of the triggers. The following is a description of these states: + +| State | Description | Is it recommended to insert data? | +| ------------ | ------------------------------------------------------------ | --------------------------------- | +| INACTIVE | The intermediate state of executing `CREATE TRIGGER`, the cluster has just recorded the trigger information on the ConfigNode, and the trigger has not been activated on any DataNode. | NO | +| ACTIVE | Status after successful execution of `CREATE TRIGGE`, the trigger is available on all DataNodes in the cluster. 
| YES | +| DROPPING | Intermediate state of executing `DROP TRIGGER`, the cluster is in the process of dropping the trigger. | NO | +| TRANSFERRING | The cluster is migrating the location of this trigger instance. | NO | diff --git a/src/UserGuide/V2.0.1/Tree/stage/TsFile-Import-Export-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/TsFile-Import-Export-Tool.md new file mode 100644 index 00000000..952a1dbf --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/TsFile-Import-Export-Tool.md @@ -0,0 +1,428 @@ + + +# TsFile Import Export Script + +For different scenarios, IoTDB provides users with a variety of operation methods for batch importing data. This chapter introduces the two most commonly used methods for importing in the form of CSV text and importing in the form of TsFile files. + +## TsFile Load And Export Script + +### TsFile Load Tool + +#### Introduction + +The load external tsfile tool allows users to load tsfiles, delete a tsfile, or move a tsfile to target directory from the running Apache IoTDB instance. Alternatively, you can use scripts to load tsfiles into IoTDB, for more information. + +#### Load with SQL + +The user sends specified commands to the Apache IoTDB system through the Cli tool or JDBC to use the tool. + +##### Load Tsfiles + +The command to load tsfiles is `load [sglevel=int][verify=true/false][onSuccess=delete/none]`. + +This command has two usages: + +1. Load a single tsfile by specifying a file path (absolute path). + +The first parameter indicates the path of the tsfile to be loaded. This command has three options: sglevel, verify, onSuccess. + +SGLEVEL option. If the database correspond to the tsfile does not exist, the user can set the level of database through the fourth parameter. By default, it uses the database level which is set in `iotdb-system.properties`. + +VERIFY option. If this parameter is true, All timeseries in this loading tsfile will be compared with the timeseries in IoTDB. If existing a measurement which has different datatype with the measurement in IoTDB, the loading process will be stopped and exit. If consistence can be promised, setting false for this parameter will be a better choice. + +ONSUCCESS option. The default value is DELETE, which means the processing method of successfully loaded tsfiles, and DELETE means after the tsfile is successfully loaded, it will be deleted. NONE means after the tsfile is successfully loaded, it will be remained in the origin dir. + +If the `.resource` file corresponding to the file exists, it will be loaded into the data directory and engine of the Apache IoTDB. Otherwise, the corresponding `.resource` file will be regenerated from the tsfile file. + +Examples: + +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true onSuccess=none` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1 onSuccess=delete` + +2. Load a batch of files by specifying a folder path (absolute path). + +The first parameter indicates the path of the tsfile to be loaded. 
The options above also works for this command. + +Examples: + +* `load '/Users/Desktop/data'` +* `load '/Users/Desktop/data' verify=false` +* `load '/Users/Desktop/data' verify=true` +* `load '/Users/Desktop/data' verify=true sglevel=1` +* `load '/Users/Desktop/data' verify=false sglevel=1 onSuccess=delete` + +**NOTICE**: When `$IOTDB_HOME$/conf/iotdb-system.properties` has `enable_auto_create_schema=true`, it will automatically create metadata in TSFILE, otherwise it will not be created automatically. + +#### Load with Script + +Run rewrite-tsfile.bat if you are in a Windows environment, or rewrite-tsfile.sh if you are on Linux or Unix. + +```bash +./load-tsfile.bat -f filePath [-h host] [-p port] [-u username] [-pw password] [--sgLevel int] [--verify true/false] [--onSuccess none/delete] +-f File/Directory to be load, required +-h IoTDB Host address, optional field, 127.0.0.1 by default +-p IoTDB port, optional field, 6667 by default +-u IoTDB user name, optional field, root by default +-pw IoTDB password, optional field, root by default +--sgLevel Sg level of loading Tsfile, optional field, default_storage_group_level in iotdb-system.properties by default +--verify Verify schema or not, optional field, True by default +--onSuccess Delete or remain origin TsFile after loading, optional field, none by default +``` + +##### Example + +Assuming that an IoTDB instance is running on server 192.168.0.101:6667, you want to load all TsFile files from the locally saved TsFile backup folder D:\IoTDB\data into this IoTDB instance. + +First move to the folder `$IOTDB_HOME/tools/`, open the command line, and execute + +```bash +./load-rewrite.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root +``` + +After waiting for the script execution to complete, you can check that the data in the IoTDB instance has been loaded correctly. + +##### Q&A + +- Cannot find or load the main class + - It may be because the environment variable $IOTDB_HOME is not set, please set the environment variable and try again +- -f option must be set! + - The input command is missing the -f field (file or folder path to be loaded) or the -u field (user name), please add it and re-execute +- What if the execution crashes in the middle and you want to reload? + - You re-execute the command just now, reloading the data will not affect the correctness after loading + +TsFile can help you export the result set in the format of TsFile file to the specified path by executing the sql, command line sql, and sql file. + +### TsFile Export Tool + +#### Syntax + +```shell +# Unix/OS X +> tools/export-tsfile.sh -h -p -u -pw -td [-f -q -s ] + +# Windows +> tools\export-tsfile.bat -h -p -u -pw -td [-f -q -s ] +``` + +* `-h `: + - The host address of the IoTDB service. +* `-p `: + - The port number of the IoTDB service. +* `-u `: + - The username of the IoTDB service. +* `-pw `: + - Password for IoTDB service. +* `-td `: + - Specify the output path for the exported TsFile file. +* `-f `: + - For the file name of the exported TsFile file, just write the file name, and cannot include the file path and suffix. If the sql file or console input contains multiple sqls, multiple files will be generated in the order of sql. + - Example: There are three SQLs in the file or command line, and -f param is "dump", then three TsFile files: dump0.tsfile、dump1.tsfile、dump2.tsfile will be generated in the target path. +* `-q `: + - Directly specify the query statement you want to execute in the command. 
+ - Example: `select * from root.** limit 100` +* `-s `: + - Specify a SQL file that contains one or more SQL statements. If an SQL file contains multiple SQL statements, the SQL statements should be separated by newlines. Each SQL statement corresponds to an output TsFile file. +* `-t `: + - Specifies the timeout period for session queries, in milliseconds + + +In addition, if you do not use the `-s` and `-q` parameters, after the export script is started, you need to enter the query statement as prompted by the program, and different query results will be saved to different TsFile files. + +#### Example + +```shell +# Unix/OS X +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 + +# Windows +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 +``` + +#### Q&A + +- It is recommended not to execute the write data command at the same time when loading data, which may lead to insufficient memory in the JVM. + +## CSV Tool + +The CSV tool can help you import data in CSV format to IoTDB or export data from IoTDB to a CSV file. + +### Usage of export-csv.sh + +#### Syntax + +```shell +# Unix/OS X +> tools/export-csv.sh -h -p -u -pw -td [-tf -datatype -q -s -linesPerFile ] + +# Windows +> tools\export-csv.bat -h -p -u -pw -td [-tf -datatype -q -s -linesPerFile ] +``` + +Description: + +* `-datatype`: + - true (by default): print the data type of timesries in the head line of CSV file. i.e., `Time, root.sg1.d1.s1(INT32), root.sg1.d1.s2(INT64)`. + - false: only print the timeseries name in the head line of the CSV file. i.e., `Time, root.sg1.d1.s1 , root.sg1.d1.s2` +* `-q `: + - specifying a query command that you want to execute + - example: `select * from root.** limit 100`, or `select * from root.** limit 100 align by device` +* `-s `: + - specifying a SQL file which can consist of more than one sql. If there are multiple SQLs in one SQL file, the SQLs should be separated by line breaks. And, for each SQL, a output CSV file will be generated. +* `-td `: + - specifying the directory that the data will be exported +* `-tf `: + - specifying a time format that you want. The time format have to obey [ISO 8601](https://calendars.wikia.org/wiki/ISO_8601) standard. If you want to save the time as the timestamp, then setting `-tf timestamp` + - example: `-tf yyyy-MM-dd\ HH:mm:ss` or `-tf timestamp` +* `-linesPerFile `: + - Specifying lines of each dump file, `10000` is default. + - example: `-linesPerFile 1` +* `-t `: + - Specifies the timeout period for session queries, in milliseconds + + +More, if you don't use one of `-s` and `-q`, you need to enter some queries after running the export script. 
The results of the different query will be saved to different CSV files. + +#### Example + +```shell +# Unix/OS X +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss +# or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 + +# Windows +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss +# or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 +``` + +#### Sample SQL file + +```sql +select * from root.**; +select * from root.** align by device; +``` + +The result of `select * from root.**` + +```sql +Time,root.ln.wf04.wt04.status(BOOLEAN),root.ln.wf03.wt03.hardware(TEXT),root.ln.wf02.wt02.status(BOOLEAN),root.ln.wf02.wt02.hardware(TEXT),root.ln.wf01.wt01.hardware(TEXT),root.ln.wf01.wt01.status(BOOLEAN) +1970-01-01T08:00:00.001+08:00,true,"v1",true,"v1",v1,true +1970-01-01T08:00:00.002+08:00,true,"v1",,,,true +``` + +The result of `select * from root.** align by device` + +```sql +Time,Device,hardware(TEXT),status(BOOLEAN) +1970-01-01T08:00:00.001+08:00,root.ln.wf01.wt01,"v1",true +1970-01-01T08:00:00.002+08:00,root.ln.wf01.wt01,,true +1970-01-01T08:00:00.001+08:00,root.ln.wf02.wt02,"v1",true +1970-01-01T08:00:00.001+08:00,root.ln.wf03.wt03,"v1", +1970-01-01T08:00:00.002+08:00,root.ln.wf03.wt03,"v1", +1970-01-01T08:00:00.001+08:00,root.ln.wf04.wt04,,true +1970-01-01T08:00:00.002+08:00,root.ln.wf04.wt04,,true +``` + +The data of boolean type signed by `true` and `false` without double quotes. And the text data will be enclosed in double quotes. + +#### Note + +Note that if fields exported by the export tool have the following special characters: + +1. `,`: the field will be escaped by `\`. + +### Usage of import-csv.sh + +#### Create Metadata (optional) + +```sql +CREATE DATABASE root.fit.d1; +CREATE DATABASE root.fit.d2; +CREATE DATABASE root.fit.p; +CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; +CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; +``` + +IoTDB has the ability of type inference, so it is not necessary to create metadata before data import. 
However, we still recommend creating metadata before importing data using the CSV import tool, as this can avoid unnecessary type conversion errors. + +#### Sample CSV File to Be Imported + +The data aligned by time, and headers without data type. + +```sql +Time,root.test.t1.str,root.test.t2.str,root.test.t2.int +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,"123",, +``` + +The data aligned by time, and headers with data type.(Text type data supports double quotation marks and no double quotation marks) + +```sql +Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.int(INT32) +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,123,hello world,123 +1970-01-01T08:00:00.003+08:00,"123",, +1970-01-01T08:00:00.004+08:00,123,,12 +``` + +The data aligned by device, and headers without data type. + +```sql +Time,Device,str,int +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +``` + +The data aligned by device, and headers with data type.(Text type data supports double quotation marks and no double quotation marks) + +```sql +Time,Device,str(TEXT),int(INT32) +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123 +1970-01-01T08:00:00.003+08:00,root.test.t1,,123 +``` + +#### Syntax + +```shell +# Unix/OS X +> tools/import-csv.sh -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] +# Windows +> tools\import-csv.bat -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] +``` + +Description: + +* `-f`: + - the CSV file that you want to import, and it could be a file or a folder. If a folder is specified, all TXT and CSV files in the folder will be imported in batches. + - example: `-f filename.csv` + +* `-fd`: + - specifying a directory to save files which save failed lines. If you don't use this parameter, the failed file will be saved at original directory, and the filename will be the source filename with suffix `.failed`. + - example: `-fd ./failed/` + +* `-aligned`: + - whether to use the aligned interface? The option `false` is default. + - example: `-aligned true` + +* `-batch`: + - specifying the point's number of a batch. If the program throw the exception `org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`, you can lower this parameter as appropriate. + - example: `-batch 100000`, `100000` is the default value. + +* `-tp `: + - specifying a time precision. Options includes `ms`(millisecond), `ns`(nanosecond), and `us`(microsecond), `ms` is default. + +* `-typeInfer `: + - specifying rules of type inference. + - Option `srcTsDataType` includes `boolean`,`int`,`long`,`float`,`double`,`NaN`. + - Option `dstTsDataType` includes `boolean`,`int`,`long`,`float`,`double`,`text`. + - When `srcTsDataType` is `boolean`, `dstTsDataType` should be between `boolean` and `text`. + - When `srcTsDataType` is `NaN`, `dstTsDataType` should be among `float`, `double` and `text`. + - When `srcTsDataType` is Numeric type, `dstTsDataType` precision should be greater than `srcTsDataType`. + - example: `-typeInfer boolean=text,float=double` + +* `-linesPerFailedFile `: + - Specifying lines of each failed file, `10000` is default. 
+ - example: `-linesPerFailedFile 1` + +#### Example + +```sh +# Unix/OS X +> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed +# or +> tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 + +# Windows +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 + +``` + +#### Note + +Note that the following special characters in fields need to be checked before importing: + +1. `,` : fields containing `,` should be escaped by `\`. +2. you can input time format like `yyyy-MM-dd'T'HH:mm:ss`, `yyy-MM-dd HH:mm:ss`, or `yyyy-MM-dd'T'HH:mm:ss.SSSZ`. +3. the `Time` column must be the first one. \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/WayToGetIoTDB.md b/src/UserGuide/V2.0.1/Tree/stage/WayToGetIoTDB.md new file mode 100644 index 00000000..9d7503bd --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/WayToGetIoTDB.md @@ -0,0 +1,211 @@ + + +# Way to get IoTDB binary files + +IoTDB provides you three installation methods, you can refer to the following suggestions, choose one of them: + +* Installation from source code. If you need to modify the code yourself, you can use this method. +* Installation from binary files. Download the binary files from the official website. This is the recommended method, in which you will get a binary released package which is out-of-the-box. +* Using Docker:The path to the dockerfile is https://github.com/apache/iotdb/blob/master/docker + +## Prerequisites + +To use IoTDB, you need to have: + +1. Java >= 1.8 (Please make sure the environment path has been set) +2. Maven >= 3.6 (Optional) +3. Set the max open files num as 65535 to avoid "too many open files" problem. + +>Note: If you don't have maven installed, you should replace 'mvn' in the following commands with 'mvnw' or 'mvnw.cmd'. +> +>### Installation from binary files + +You can download the binary file from: +[Download page](https://iotdb.apache.org/Download/) + +## Installation from source code + +You can get the released source code from https://iotdb.apache.org/Download/, or from the git repository https://github.com/apache/iotdb/tree/master +You can download the source code from: + +``` +git clone https://github.com/apache/iotdb.git +``` + +After that, go to the root path of IoTDB. If you want to build the version that we have released, you need to create and check out a new branch by command `git checkout -b my_{project.version} v{project.version}`. 
E.g., you want to build the version `0.12.4`, you can execute this command to make it: + +```shell +> git checkout -b my_0.12.4 v0.12.4 +``` + +Then you can execute this command to build the version that you want: + +``` +> mvn clean package -DskipTests +``` + +Then the binary version (including both server and client) can be found at **distribution/target/apache-iotdb-{project.version}-bin.zip** + +> NOTE: Directories "thrift/target/generated-sources/thrift" and "antlr/target/generated-sources/antlr4" need to be added to sources roots to avoid compilation errors in IDE. + +If you would like to build the IoTDB server, you can run the following command under the root path of iotdb: + +``` +> mvn clean package -pl iotdb-core/datanode -am -DskipTests +``` + +After build, the IoTDB server will be at the folder "server/target/iotdb-server-{project.version}". + +If you would like to build a module, you can execute command `mvn clean package -pl {module.name} -am -DskipTests` under the root path of IoTDB. +If you need the jar with dependencies, you can add parameter `-P get-jar-with-dependencies` after the command. E.g., If you need the jar of jdbc with dependencies, you can execute this command: + +```shell +> mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies +``` + +Then you can find it under the path `{module.name}/target`. + +## Installation by Docker +Apache IoTDB' Docker image is released on [https://hub.docker.com/r/apache/iotdb](https://hub.docker.com/r/apache/iotdb) +Add environments of docker to update the configurations of Apache IoTDB. +### Have a try +```shell +# get IoTDB official image +docker pull apache/iotdb:1.1.0-standalone +# create docker bridge network +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +# create docker container +docker run -d --name iotdb-service \ + --hostname iotdb-service \ + --network iotdb \ + --ip 172.18.0.6 \ + -p 6667:6667 \ + -e cn_internal_address=iotdb-service \ + -e cn_seed_config_node=iotdb-service:10710 \ + -e cn_internal_port=10710 \ + -e cn_consensus_port=10720 \ + -e dn_rpc_address=iotdb-service \ + -e dn_internal_address=iotdb-service \ + -e dn_seed_config_node=iotdb-service:10710 \ + -e dn_mpp_data_exchange_port=10740 \ + -e dn_schema_region_consensus_port=10750 \ + -e dn_data_region_consensus_port=10760 \ + -e dn_rpc_port=6667 \ + apache/iotdb:1.1.0-standalone +# execute SQL +docker exec -ti iotdb-service /iotdb/sbin/start-cli.sh -h iotdb-service +``` +External access: +```shell +# is the real IP or domain address rather than the one in docker network, could be 127.0.0.1 within the computer. +$IOTDB_HOME/sbin/start-cli.sh -h -p 6667 +``` +Notice:The confignode service would fail when restarting this container if the IP Adress of the container has been changed. 
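+
+If you prefer to manage the standalone container with Docker Compose instead of `docker run`, the `docker-compose-standalone.yml` shown below can be used. The commands here are only a minimal bring-up sketch under two assumptions: the file is saved in the current directory, and the external bridge network `iotdb` created earlier already exists (the compose file declares it as `external: true`).
+
+```shell
+# create the host directories mounted by the compose file below
+mkdir -p data logs
+# start the standalone IoTDB service in the background
+docker-compose -f docker-compose-standalone.yml up -d
+# check that the iotdb-service container is running
+docker-compose -f docker-compose-standalone.yml ps
+# open the CLI inside the container once the service is ready
+docker exec -ti iotdb-service /iotdb/sbin/start-cli.sh -h iotdb-service
+```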
+```yaml +# docker-compose-standalone.yml +version: "3" +services: + iotdb-service: + image: apache/iotdb:1.1.0-standalone + hostname: iotdb-service + container_name: iotdb-service + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb-service + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-service:10710 + - dn_rpc_address=iotdb-service + - dn_internal_address=iotdb-service + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb-service:10710 + volumes: + - ./data:/iotdb/data + - ./logs:/iotdb/logs + networks: + iotdb: + ipv4_address: 172.18.0.6 + +networks: + iotdb: + external: true +``` +### deploy cluster +Until now, we support host and overlay networks but haven't supported bridge networks on multiple computers. +Overlay networks see [1C2D](https://github.com/apache/iotdb/tree/master/docker/src/main/DockerCompose/docker-compose-cluster-1c2d.yml) and here are the configurations and operation steps to start an IoTDB cluster with docker using host networks。 + +Suppose that there are three computers of iotdb-1, iotdb-2 and iotdb-3. We called them nodes. +Here is the docker-compose file of iotdb-2, as the sample: +```yaml +version: "3" +services: + iotdb-confignode: + image: apache/iotdb:1.1.0-confignode + container_name: iotdb-confignode + environment: + - cn_internal_address=iotdb-2 + - cn_seed_config_node=iotdb-1:10710 + - cn_internal_port=10710 + - cn_consensus_port=10720 + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - data_replication_factor=3 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/confignode:/iotdb/data + - ./logs/confignode:/iotdb/logs + network_mode: "host" + + iotdb-datanode: + image: apache/iotdb:1.1.0-datanode + container_name: iotdb-datanode + environment: + - dn_rpc_address=iotdb-2 + - dn_internal_address=iotdb-2 + - dn_seed_config_node=iotdb-1:10710 + - data_replication_factor=3 + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/datanode:/iotdb/data/ + - ./logs/datanode:/iotdb/logs/ + network_mode: "host" +``` +Notice: +1. The `dn_seed_config_node` of three nodes must the same and it is the first starting node of `iotdb-1` with the cn_internal_port of 10710。 +2. In this docker-compose file,`iotdb-2` should be replace with the real IP or hostname of each node to generate docker compose files in the other nodes. +3. The services would talk with each other, so they need map the /etc/hosts file or add the `extra_hosts` to the docker compose file. +4. We must start the IoTDB services of `iotdb-1` first at the first time of starting. +5. Stop and remove all the IoTDB services and clean up the `data` and `logs` directories of the 3 nodes,then start the cluster again. 
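+
+To make the startup order described in the notes above concrete, a rough bring-up sequence is sketched below. It assumes each node already has its own adjusted compose file saved locally as `docker-compose-host.yml` (a hypothetical file name), that the `/etc/hosts` entries for iotdb-1, iotdb-2 and iotdb-3 are in place, and that each command is run on the corresponding host:
+
+```shell
+# 1. On iotdb-1 first, because it hosts the seed ConfigNode that
+#    cn_seed_config_node / dn_seed_config_node point to (iotdb-1:10710)
+docker-compose -f docker-compose-host.yml up -d
+
+# 2. Then on iotdb-2 and iotdb-3, once the services on iotdb-1 are up
+docker-compose -f docker-compose-host.yml up -d
+
+# 3. From any node, open the CLI against the local DataNode and run
+#    `show cluster` to confirm that all ConfigNodes and DataNodes are Running
+docker exec -ti iotdb-datanode /iotdb/sbin/start-cli.sh -h iotdb-2   # replace iotdb-2 with the local hostname
+```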
diff --git a/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Batch-Load-Tool.md b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Batch-Load-Tool.md new file mode 100644 index 00000000..912c2bb0 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Batch-Load-Tool.md @@ -0,0 +1,32 @@ + + +# Batch Data Load + +In different scenarios, the IoTDB provides a variety of methods for importing data in batches. This section describes the two most common methods for importing data in CSV format and TsFile format. + +## TsFile Batch Load + +TsFile is the file format of time series used in IoTDB. You can directly import one or more TsFile files with time series into another running IoTDB instance through tools such as CLI. For details, see [TsFile Load Tool](../Maintenance-Tools/Load-Tsfile.md) [TsFile Export Tools](../Maintenance-Tools/TsFile-Load-Export-Tool.md). + +## CSV Batch Load + +CSV stores table data in plain text. You can write multiple formatted data into a CSV file and import the data into the IoTDB in batches. Before importing data, you are advised to create the corresponding metadata in the IoTDB. Don't worry if you forget to create one, the IoTDB can automatically infer the data in the CSV to its corresponding data type, as long as you have a unique data type for each column. In addition to a single file, the tool supports importing multiple CSV files as folders and setting optimization parameters such as time precision. For details, see [CSV Load Export Tools](../Maintenance-Tools/CSV-Tool.md). diff --git a/src/UserGuide/V2.0.1/Tree/stage/Write-Data/MQTT.md b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/MQTT.md new file mode 100644 index 00000000..492e2e19 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/MQTT.md @@ -0,0 +1,24 @@ + + +# MQTT Write + +Refer to [Built-in MQTT Service](../API/Programming-MQTT.md#built-in-mqtt-service) \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Write-Data/REST-API.md b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/REST-API.md new file mode 100644 index 00000000..603621aa --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/REST-API.md @@ -0,0 +1,58 @@ + + +# REST API Write + +Refer to [insertTablet (v1)](../API/RestServiceV1.md#inserttablet) or [insertTablet (v2)](../API/RestServiceV2.md#inserttablet) + +Example: + +```JSON +{ +      "timestamps": [ +            1, +            2, +            3 +      ], +      "measurements": [ +            "temperature", +            "status" +      ], +      "data_types": [ +            "FLOAT", +            "BOOLEAN" +      ], +      "values": [ +            [ +                  1.1, +                  2.2, +                  3.3 +            ], +            [ +                  false, +                  true, +                  true +            ] +      ], +      "is_aligned": false, +      "device": "root.ln.wf01.wt01" +} +``` \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Session.md b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Session.md new file mode 100644 index 00000000..39445f60 --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Session.md @@ -0,0 +1,38 @@ + + +# Native API Write + +The Native API ( Session ) is the most widely used series of APIs of IoTDB, including multiple APIs, adapted to different data collection scenarios, with high performance and multi-language support. 
+ +## Multi-language API write +* ### Java + Before writing via the Java API, you need to establish a connection, refer to [Java Native API](../API/Programming-Java-Native-API.md). + then refer to [ JAVA Data Manipulation Interface (DML) ](../API/Programming-Java-Native-API.md#insert) + +* ### Python + Refer to [ Python Data Manipulation Interface (DML) ](../API/Programming-Python-Native-API.md#insert) + +* ### C++ + Refer to [ C++ Data Manipulation Interface (DML) ](../API/Programming-Cpp-Native-API.md#insert) + +* ### Go + Refer to [Go Native API](../API/Programming-Go-Native-API.md) \ No newline at end of file diff --git a/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Write-Data.md b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Write-Data.md new file mode 100644 index 00000000..a8eb2f2d --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Write-Data/Write-Data.md @@ -0,0 +1,110 @@ + + +# INSERT + +IoTDB provides users with a variety of ways to insert real-time data, such as directly inputting [INSERT SQL statement](../Reference/SQL-Reference.md) in [Client/Shell tools](../QuickStart/Command-Line-Interface.md), or using [Java JDBC](../API/Programming-JDBC.md) to perform single or batch execution of [INSERT SQL statement](../Reference/SQL-Reference.md). + +NOTE: This section mainly introduces the use of [INSERT SQL statement](../Reference/SQL-Reference.md) for real-time data import in the scenario. + +Writing a repeat timestamp covers the original timestamp data, which can be regarded as updated data. + +## Use of INSERT Statements + +The [INSERT SQL statement](../Reference/SQL-Reference.md) statement is used to insert data into one or more specified timeseries created. For each point of data inserted, it consists of a [timestamp](../Basic-Concept/Data-Model-and-Terminology.md) and a sensor acquisition value (see [Data Type](../Basic-Concept/Data-Type.md)). + +In the scenario of this section, take two timeseries `root.ln.wf02.wt02.status` and `root.ln.wf02.wt02.hardware` as an example, and their data types are BOOLEAN and TEXT, respectively. + +The sample code for single column data insertion is as follows: +``` +IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) +IoTDB > insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') +``` + +The above example code inserts the long integer timestamp and the value "true" into the timeseries `root.ln.wf02.wt02.status` and inserts the long integer timestamp and the value "v1" into the timeseries `root.ln.wf02.wt02.hardware`. When the execution is successful, cost time is shown to indicate that the data insertion has been completed. + +> Note: In IoTDB, TEXT type data can be represented by single and double quotation marks. The insertion statement above uses double quotation marks for TEXT type data. The following example will use single quotation marks for TEXT type data. + +The INSERT statement can also support the insertion of multi-column data at the same time point. The sample code of inserting the values of the two timeseries at the same time point '2' is as follows: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (2, false, 'v2') +``` + +In addition, The INSERT statement support insert multi-rows at once. 
The sample code of inserting two rows as follows: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (3, false, 'v3'),(4, true, 'v4') +``` + +After inserting the data, we can simply query the inserted data using the SELECT statement: + +```sql +IoTDB > select * from root.ln.wf02.wt02 where time < 5 +``` + +The result is shown below. The query result shows that the insertion statements of single column and multi column data are performed correctly. + +``` ++-----------------------------+--------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status| ++-----------------------------+--------------------------+------------------------+ +|1970-01-01T08:00:00.001+08:00| v1| true| +|1970-01-01T08:00:00.002+08:00| v2| false| +|1970-01-01T08:00:00.003+08:00| v3| false| +|1970-01-01T08:00:00.004+08:00| v4| true| ++-----------------------------+--------------------------+------------------------+ +Total line number = 4 +It costs 0.004s +``` + +In addition, we can omit the timestamp column, and the system will use the current system timestamp as the timestamp of the data point. The sample code is as follows: +```sql +IoTDB > insert into root.ln.wf02.wt02(status, hardware) values (false, 'v2') +``` +**Note:** Timestamps must be specified when inserting multiple rows of data in a SQL. + +## Insert Data Into Aligned Timeseries + +To insert data into a group of aligned time series, we only need to add the `ALIGNED` keyword in SQL, and others are similar. + +The sample code is as follows: + +```sql +IoTDB > create aligned timeseries root.sg1.d1(s1 INT32, s2 DOUBLE) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(2, 2, 2), (3, 3, 3) +IoTDB > select * from root.sg1.d1 +``` + +The result is shown below. The query result shows that the insertion statements are performed correctly. + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1| 1.0| +|1970-01-01T08:00:00.002+08:00| 2| 2.0| +|1970-01-01T08:00:00.003+08:00| 3| 3.0| ++-----------------------------+--------------+--------------+ +Total line number = 3 +It costs 0.004s +``` diff --git a/src/UserGuide/V2.0.1/Tree/stage/Writing-Data-on-HDFS.md b/src/UserGuide/V2.0.1/Tree/stage/Writing-Data-on-HDFS.md new file mode 100644 index 00000000..96609b8f --- /dev/null +++ b/src/UserGuide/V2.0.1/Tree/stage/Writing-Data-on-HDFS.md @@ -0,0 +1,171 @@ + + +# Integration with HDFS + +## Shared Storage Architecture + +Currently, TsFiles(including both TsFile and related data files) are supported to be stored in local file system and hadoop distributed file system(HDFS). It is very easy to config the storage file system of TSFile. + +## System architecture + +When you config to store TSFile on HDFS, your data files will be in distributed storage. 
The system architecture is as below: + + + +## Config and usage + +To store TSFile and related data files in HDFS, here are the steps: + +First, download the source release from website or git clone the repository + +Build server and Hadoop module by: `mvn clean package -pl iotdb-core/datanode,iotdb-connector/hadoop -am -Dmaven.test.skip=true -P get-jar-with-dependencies` + +Then, copy the target jar of Hadoop module `hadoop-tsfile-X.X.X-jar-with-dependencies.jar` into server target lib folder `.../server/target/iotdb-server-X.X.X/lib`. + +Edit user config in `iotdb-system.properties`. Related configurations are: + +* tsfile\_storage\_fs + +|Name| tsfile\_storage\_fs | +|:---:|:---| +|Description| The storage file system of Tsfile and related data files. Currently LOCAL file system and HDFS are supported.| +|Type| String | +|Default|LOCAL | +|Effective|Only allowed to be modified in first start up| + +* core\_site\_path + +|Name| core\_site\_path | +|:---:|:---| +|Description| Absolute file path of core-site.xml if Tsfile and related data files are stored in HDFS.| +|Type| String | +|Default|/etc/hadoop/conf/core-site.xml | +|Effective|After restart system| + +* hdfs\_site\_path + +|Name| hdfs\_site\_path | +|:---:|:---| +|Description| Absolute file path of hdfs-site.xml if Tsfile and related data files are stored in HDFS.| +|Type| String | +|Default|/etc/hadoop/conf/hdfs-site.xml | +|Effective|After restart system| + +* hdfs\_ip + +|Name| hdfs\_ip | +|:---:|:---| +|Description| IP of HDFS if Tsfile and related data files are stored in HDFS. **If there are more than one hdfs\_ip in configuration, Hadoop HA is used.**| +|Type| String | +|Default|localhost | +|Effective|After restart system| + +* hdfs\_port + +|Name| hdfs\_port | +|:---:|:---| +|Description| Port of HDFS if Tsfile and related data files are stored in HDFS| +|Type| String | +|Default|9000 | +|Effective|After restart system| + +* dfs\_nameservices + +|Name| hdfs\_nameservices | +|:---:|:---| +|Description| Nameservices of HDFS HA if using Hadoop HA| +|Type| String | +|Default|hdfsnamespace | +|Effective|After restart system| + +* dfs\_ha\_namenodes + +|Name| hdfs\_ha\_namenodes | +|:---:|:---| +|Description| Namenodes under DFS nameservices of HDFS HA if using Hadoop HA| +|Type| String | +|Default|nn1,nn2 | +|Effective|After restart system| + +* dfs\_ha\_automatic\_failover\_enabled + +|Name| dfs\_ha\_automatic\_failover\_enabled | +|:---:|:---| +|Description| Whether using automatic failover if using Hadoop HA| +|Type| Boolean | +|Default|true | +|Effective|After restart system| + +* dfs\_client\_failover\_proxy\_provider + +|Name| dfs\_client\_failover\_proxy\_provider | +|:---:|:---| +|Description| Proxy provider if using Hadoop HA and enabling automatic failover| +|Type| String | +|Default|org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider | +|Effective|After restart system| + +* hdfs\_use\_kerberos + +|Name| hdfs\_use\_kerberos | +|:---:|:---| +|Description| Whether use kerberos to authenticate hdfs| +|Type| String | +|Default|false | +|Effective|After restart system| + +* kerberos\_keytab\_file_path + +|Name| kerberos\_keytab\_file_path | +|:---:|:---| +|Description| Full path of kerberos keytab file| +|Type| String | +|Default|/path | +|Effective|After restart system| + +* kerberos\_principal + +|Name| kerberos\_principal | +|:---:|:---| +|Description| Kerberos pricipal| +|Type| String | +|Default|your principal | +|Effective|After restart system| + +Start server, and Tsfile will be stored on 
HDFS. + +To reset storage file system to local, just edit configuration `tsfile_storage_fs` to `LOCAL`. In this situation, if data files are already on HDFS, you should either download them to local and move them to your config data file folder (`../server/target/iotdb-server-X.X.X/data/data` by default), or restart your process and import data to IoTDB. + +## Frequent questions + +1. What Hadoop version does it support? + +A: Both Hadoop 2.x and Hadoop 3.x can be supported. + +2. When starting the server or trying to create timeseries, I encounter the error below: +``` +ERROR org.apache.iotdb.tsfile.fileSystem.fsFactory.HDFSFactory:62 - Failed to get Hadoop file system. Please check your dependency of Hadoop module. +``` + +A: It indicates that you forget to put Hadoop module dependency in IoTDB server. You can solve it by: +* Build Hadoop module: `mvn clean package -pl iotdb-connector/hadoop -am -Dmaven.test.skip=true -P get-jar-with-dependencies` +* Copy the target jar of Hadoop module `hadoop-tsfile-X.X.X-jar-with-dependencies.jar` into server target lib folder `.../server/target/iotdb-server-X.X.X/lib`. diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md new file mode 100644 index 00000000..82386a2d --- /dev/null +++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md @@ -0,0 +1,386 @@ + +# 集群版安装部署 + +本小节描述如何手动部署包括3个ConfigNode和3个DataNode的实例,即通常所说的3C3D集群。 + +
+ +
+ +## 注意事项 + +1. 安装前请确认系统已参照[系统配置](../Deployment-and-Maintenance/Environment-Requirements.md)准备完成。 + +2. 推荐使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在服务器上配`/etc/hosts`,如本机ip是11.101.17.224,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的`cn_internal_address`、`dn_internal_address`。 + + ```shell + echo "11.101.17.224 iotdb-1" >> /etc/hosts + ``` + +3. 有些参数首次启动后不能修改,请参考下方的[参数配置](#参数配置)章节来进行设置。 + +4. 无论是在linux还是windows中,请确保IoTDB的安装路径中不含空格和中文,避免软件运行异常。 + +5. 请注意,安装部署(包括激活和使用软件)IoTDB时,您可以: + +- 使用 root 用户(推荐):可以避免权限等问题。 + +- 使用固定的非 root 用户: + + - 使用同一用户操作:确保在启动、激活、停止等操作均保持使用同一用户,不要切换用户。 + + - 避免使用 sudo:使用 sudo 命令会以 root 用户权限执行命令,可能会引起权限混淆或安全问题。 + +6. 推荐部署监控面板,可以对重要运行指标进行监控,随时掌握数据库运行状态,监控面板可以联系商务获取,部署监控面板步骤可以参考:[监控面板部署](./Monitoring-panel-deployment.md) + +## 准备步骤 + +1. 准备IoTDB数据库安装包 :timechodb-{version}-bin.zip(安装包获取见:[链接](./IoTDB-Package_timecho.md)) +2. 按环境要求配置好操作系统环境(系统环境配置见:[链接](./Environment-Requirements.md)) + +## 安装步骤 + +假设现在有3台linux服务器,IP地址和服务角色分配如下: + +| 节点ip | 主机名 | 服务 | +| ------------- | ------- | -------------------- | +| 11.101.17.224 | iotdb-1 | ConfigNode、DataNode | +| 11.101.17.225 | iotdb-2 | ConfigNode、DataNode | +| 11.101.17.226 | iotdb-3 | ConfigNode、DataNode | + +### 设置主机名 + +在3台机器上分别配置主机名,设置主机名需要在目标服务器上配置/etc/hosts,使用如下命令: + +```shell +echo "11.101.17.224 iotdb-1" >> /etc/hosts +echo "11.101.17.225 iotdb-2" >> /etc/hosts +echo "11.101.17.226 iotdb-3" >> /etc/hosts +``` + +### 参数配置 + +解压安装包并进入安装目录 + +```shell +unzip timechodb-{version}-bin.zip +cd timechodb-{version}-bin +``` + +#### 环境脚本配置 + +- ./conf/confignode-env.sh配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------- | :------------------------------------- | :--------- | :----------------------------------------------- | :----------- | +| MEMORY_SIZE | IoTDB ConfigNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +- ./conf/datanode-env.sh配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------- | :----------------------------------- | :--------- | :----------------------------------------------- | :----------- | +| MEMORY_SIZE | IoTDB DataNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +#### 通用配置(./conf/iotdb-system.properties) + +- 集群配置 + +| 配置项 | 说明 | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | +| ------------------------- | ---------------------------------------- | -------------- | -------------- | -------------- | +| cluster_name | 集群名称 | defaultCluster | defaultCluster | defaultCluster | +| schema_replication_factor | 元数据副本数,DataNode数量不应少于此数目 | 3 | 3 | 3 | +| data_replication_factor | 数据副本数,DataNode数量不应少于此数目 | 2 | 2 | 2 | + +#### ConfigNode 配置 + +| 配置项 | 说明 | 默认 | 推荐值 | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | 备注 | +| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------- | ------------- | ------------- | ------------- | ------------------ | +| cn_internal_address | ConfigNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | iotdb-1 | iotdb-2 | iotdb-3 | 首次启动后不能修改 | +| cn_internal_port | ConfigNode在集群内部通讯使用的端口 | 10710 | 10710 | 10710 | 10710 | 10710 | 首次启动后不能修改 | +| cn_consensus_port | ConfigNode副本组共识协议通信使用的端口 | 10720 | 10720 | 10720 | 10720 | 10720 | 首次启动后不能修改 | +| cn_seed_config_node | 节点注册加入集群时连接的ConfigNode 的地址,cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 第一个CongfigNode的cn_internal_address:cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | 首次启动后不能修改 | + +#### DataNode 配置 + +| 配置项 | 说明 | 
默认 | 推荐值 | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | 备注 | +| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------- | ------------- | ------------- | ------------- | ------------------ | +| dn_rpc_address | 客户端 RPC 服务的地址 | 0.0.0.0 | 0.0.0.0 | 0.0.0.0 | 0.0.0.0 | 0.0.0.0 | 重启服务生效 | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 6667 | 6667 | 6667 | 6667 | 重启服务生效 | +| dn_internal_address | DataNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | iotdb-1 | iotdb-2 | iotdb-3 | 首次启动后不能修改 | +| dn_internal_port | DataNode在集群内部通信使用的端口 | 10730 | 10730 | 10730 | 10730 | 10730 | 首次启动后不能修改 | +| dn_mpp_data_exchange_port | DataNode用于接收数据流使用的端口 | 10740 | 10740 | 10740 | 10740 | 10740 | 首次启动后不能修改 | +| dn_data_region_consensus_port | DataNode用于数据副本共识协议通信使用的端口 | 10750 | 10750 | 10750 | 10750 | 10750 | 首次启动后不能修改 | +| dn_schema_region_consensus_port | DataNode用于元数据副本共识协议通信使用的端口 | 10760 | 10760 | 10760 | 10760 | 10760 | 首次启动后不能修改 | +| dn_seed_config_node | 节点注册加入集群时连接的ConfigNode地址,即cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 第一个CongfigNode的cn_internal_address:cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | 首次启动后不能修改 | + +> ❗️注意:VSCode Remote等编辑器无自动保存配置功能,请确保修改的文件被持久化保存,否则配置项无法生效 + +### 启动ConfigNode节点 + +先启动第一个iotdb-1的confignode, 保证种子confignode节点先启动,然后依次启动第2和第3个confignode节点 + +```shell +cd sbin +./start-confignode.sh -d #“-d”参数将在后台进行启动 +``` + +如果启动失败,请参考下[常见问题](#常见问题) + +### 启动DataNode 节点 + + 分别进入iotdb的sbin目录下,依次启动3个datanode节点: + +```shell +cd sbin +./start-datanode.sh -d #-d参数将在后台进行启动 +``` + +### 激活数据库 + +#### 方式一:激活文件拷贝激活 + +- 依次启动3个Confignode、Datanode节点后,每台机器各自的activation文件夹, 分别拷贝每台机器的system_info文件给天谋工作人员; +- 工作人员将返回每个ConfigNode、Datanode节点的license文件,这里会返回3个license文件; +- 将3个license文件分别放入对应的ConfigNode节点的activation文件夹下; + +#### 方式二:激活脚本激活 +- 依次获取3台机器的机器码,进入 IoTDB CLI + + - 表模型 CLI 进入命令: + + ```SQL + # Linux或MACOS系统 + ./start-cli.sh -sql_dialect table + + # windows系统 + ./start-cli.bat -sql_dialect table + ``` + + - 树模型 CLI 进入命令: + + ```SQL + # Linux或MACOS系统 + ./start-cli.sh + + # windows系统 + ./start-cli.bat + ``` + + - 执行以下内容获取激活所需机器码: + - 注:当前仅支持在树模型中进行激活 + + ```Bash + show system info + ``` + + - 显示如下信息,这里显示的是1台机器的机器码 : + + ```Bash + +--------------------------------------------------------------+ + | SystemInfo| + +--------------------------------------------------------------+ + |01-TE5NLES4-UDDWCMYE,01-GG5NLES4-XXDWCMYE,01-FF5NLES4-WWWWCMYE| + +--------------------------------------------------------------+ + Total line number = 1 + It costs 0.030s + ``` + +- 其他2个节点依次进入到IoTDB树模型的CLI中,执行语句后将获取的3台机器的机器码都复制给天谋工作人员 + +- 工作人员会返回3段激活码,正常是与提供的3个机器码的顺序对应的,请分别将各自的激活码粘贴到CLI中,如下提示: + + - 注:激活码前后需要用`'`符号进行标注,如所示 + + ```Bash + IoTDB> activate '01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===,01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===,01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===' + ``` + +### 验证激活 
+ +当看到“Result”字段状态显示为success表示激活成功 + +![](https://alioss.timecho.com/docs/img/%E9%9B%86%E7%BE%A4-%E9%AA%8C%E8%AF%81.png) + +## 节点维护步骤 + +### ConfigNode节点维护 + +ConfigNode节点维护分为ConfigNode添加和移除两种操作,有两个常见使用场景: + +- 集群扩展:如集群中只有1个ConfigNode时,希望增加ConfigNode以提升ConfigNode节点高可用性,则可以添加2个ConfigNode,使得集群中有3个ConfigNode。 +- 集群故障恢复:1个ConfigNode所在机器发生故障,使得该ConfigNode无法正常运行,此时可以移除该ConfigNode,然后添加一个新的ConfigNode进入集群。 + +> ❗️注意,在完成ConfigNode节点维护后,需要保证集群中有1或者3个正常运行的ConfigNode。2个ConfigNode不具备高可用性,超过3个ConfigNode会导致性能损失。 + +#### 添加ConfigNode节点 + +脚本命令: + +```shell +# Linux / MacOS +# 首先切换到IoTDB根目录 +sbin/start-confignode.sh + +# Windows +# 首先切换到IoTDB根目录 +sbin/start-confignode.bat +``` + +#### 移除ConfigNode节点 + +首先通过CLI连接集群,通过`show confignodes`确认想要移除ConfigNode的内部地址与端口号: + +```shell +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +然后使用脚本将DataNode移除。脚本命令: + +```Bash +# Linux / MacOS +sbin/remove-confignode.sh [confignode_id] +或 +./sbin/remove-confignode.sh [cn_internal_address:cn_internal_port] + +#Windows +sbin/remove-confignode.bat [confignode_id] +或 +./sbin/remove-confignode.bat [cn_internal_address:cn_internal_port] +``` + +### DataNode节点维护 + +DataNode节点维护有两个常见场景: + +- 集群扩容:出于集群能力扩容等目的,添加新的DataNode进入集群 +- 集群故障恢复:一个DataNode所在机器出现故障,使得该DataNode无法正常运行,此时可以移除该DataNode,并添加新的DataNode进入集群 + +> ❗️注意,为了使集群能正常工作,在DataNode节点维护过程中以及维护完成后,正常运行的DataNode总数不得少于数据副本数(通常为2),也不得少于元数据副本数(通常为3)。 + +#### 添加DataNode节点 + +脚本命令: + +```Bash +# Linux / MacOS +# 首先切换到IoTDB根目录 +sbin/start-datanode.sh + +#Windows +# 首先切换到IoTDB根目录 +sbin/start-datanode.bat +``` + +说明:在添加DataNode后,随着新的写入到来(以及旧数据过期,如果设置了TTL),集群负载会逐渐向新的DataNode均衡,最终在所有节点上达到存算资源的均衡。 + +#### 移除DataNode节点 + +首先通过CLI连接集群,通过`show datanodes`确认想要移除的DataNode的RPC地址与端口号: + +```Bash +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 0.0.0.0| 6667| 0| 0| +| 2|Running| 0.0.0.0| 6668| 1| 1| +| 3|Running| 0.0.0.0| 6669| 1| 0| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 3 +It costs 0.110s +``` + +然后使用脚本将DataNode移除。脚本命令: + +```Bash +# Linux / MacOS +sbin/remove-datanode.sh [dn_rpc_address:dn_rpc_port] + +#Windows +sbin/remove-datanode.bat [dn_rpc_address:dn_rpc_port] +``` + +## 常见问题 + +1. 部署过程中多次提示激活失败 + - 使用 `ls -al` 命令:使用 `ls -al` 命令检查安装包根目录的所有者信息是否为当前用户。 + - 检查激活目录:检查 `./activation` 目录下的所有文件,所有者信息是否为当前用户。 +2. Confignode节点启动失败 + - 步骤 1: 请查看启动日志,检查是否修改了某些首次启动后不可改的参数。 + - 步骤 2: 请查看启动日志,检查是否出现其他异常。日志中若存在异常现象,请联系天谋技术支持人员咨询解决方案。 + - 步骤 3: 如果是首次部署或者数据可删除,也可按下述步骤清理环境,重新部署后,再次启动。 + - 清理环境: + + 1. 结束所有 ConfigNode 和 DataNode 进程。 + ```Bash + # 1. 停止 ConfigNode 和 DataNode 服务 + sbin/stop-standalone.sh + + # 2. 检查是否还有进程残留 + jps + # 或者 + ps -ef|gerp iotdb + + # 3. 如果有进程残留,则手动kill + kill -9 + # 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + + 2. 
删除 data 和 logs 目录。 + - 说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + ```shell + cd /data/iotdb rm -rf data logs + ``` +## 附录 + +### Confignode节点参数介绍 + +| 参数 | 描述 | 是否为必填项 | +| :--- | :------------------------------- | :----------- | +| -d | 以守护进程模式启动,即在后台运行 | 否 | + +### Datanode节点参数介绍 + +| 缩写 | 描述 | 是否为必填项 | +| :--- | :--------------------------------------------- | :----------- | +| -v | 显示版本信息 | 否 | +| -f | 在前台运行脚本,不将其放到后台 | 否 | +| -d | 以守护进程模式启动,即在后台运行 | 否 | +| -p | 指定一个文件来存放进程ID,用于进程管理 | 否 | +| -c | 指定配置文件夹的路径,脚本会从这里加载配置文件 | 否 | +| -g | 打印垃圾回收(GC)的详细信息 | 否 | +| -H | 指定Java堆转储文件的路径,当JVM内存溢出时使用 | 否 | +| -E | 指定JVM错误日志文件的路径 | 否 | +| -D | 定义系统属性,格式为 key=value | 否 | +| -X | 直接传递 -XX 参数给 JVM | 否 | +| -h | 帮助指令 | 否 | + diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Database-Resources.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Database-Resources.md new file mode 100644 index 00000000..17e09aa0 --- /dev/null +++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Database-Resources.md @@ -0,0 +1,193 @@ + +# 资源规划 +## CPU + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序列数(采集频率<=1HZ)CPU节点数
单机双活分布式
10W以内2核-4核123
30W以内4核-8核123
50W以内8核-16核123
100W以内16核-32核123
200w以内32核-48核123
1000w以内48核12请联系天谋商务咨询
1000w以上请联系天谋商务咨询
+ +## 内存 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序列数(采集频率<=1HZ)内存节点数
单机双活分布式
10W以内4G-8G123
30W以内12G-32G123
50W以内24G-48G123
100W以内32G-96G123
200w以内64G-128G123
1000w以内128G12请联系天谋商务咨询
1000w以上请联系天谋商务咨询
+ +## 存储(磁盘) +### 存储空间 +计算公式:测点数量 * 采样频率(Hz)* 每个数据点大小(Byte,不同数据类型不一样,见下表) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
数据点大小计算表
数据类型 时间戳(字节)值(字节)数据点总大小(字节)
开关量(Boolean)819
整型(INT32)/ 单精度浮点数(FLOAT)8412
长整型(INT64)/ 双精度浮点数(DOUBLE)8816
字符串(TEXT)8平均为a8+a
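+
+如需快速复核估算结果,可用 shell 直接代入上述公式计算。下面的片段以下文示例中的参数为输入(其中压缩比 10 为经验假设值,实际压缩比随数据特征变化):
+
+```Bash
+# 设备数 * 每设备测点数 * 采样频率(Hz) * 单点大小(字节) * 一年秒数 * 副本数 / 压缩比,并换算为 TB
+echo "1000 * 100 * 1 * 12 * 86400 * 365 * 3 / 10 / 10^12" | bc -l
+# 输出约 11.35,即约 11TB,与下文示例结果一致
+```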
+ +示例:1000设备,每个设备100 测点,共 100000 序列,INT32 类型。采样频率1Hz(每秒一次),存储1年,3副本。 +- 完整计算公式:1000设备 * 100测点 * 12字节每数据点 * 86400秒每天 * 365天每年 * 3副本/10压缩比=11T +- 简版计算公式:1000 * 100 * 12 * 86400 * 365 * 3 / 10 = 11T +### 存储配置 +1000w 点位以上或查询负载较大,推荐配置 SSD。 +## 网络(网卡) +在写入吞吐不超过1000万点/秒时,需配置千兆网卡;当写入吞吐超过 1000万点/秒时,需配置万兆网卡。 +| **写入吞吐(数据点/秒)** | **网卡速率** | +| ------------------- | ------------- | +| <1000万 | 1Gbps(千兆) | +| >=1000万 | 10Gbps(万兆) | +## 其他说明 +IoTDB 具有集群秒级扩容能力,扩容节点数据可不迁移,因此您无需担心按现有数据情况估算的集群能力有限,未来您可在需要扩容时为集群加入新的节点。 \ No newline at end of file diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md new file mode 100644 index 00000000..99c5b14c --- /dev/null +++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Environment-Requirements.md @@ -0,0 +1,205 @@ + +# 系统配置 + +## 磁盘阵列 + +### 配置建议 + +IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵列存储IoTDB的数据,以达到多个磁盘阵列并发写入的目标,配置可参考以下建议: + +1. 物理环境 + 系统盘:建议使用2块磁盘做Raid1,仅考虑操作系统自身所占空间即可,可以不为IoTDB预留系统盘空间 + 数据盘 + 建议做Raid,在磁盘维度进行数据保护 + 建议为IoTDB提供多块磁盘(1-6块左右)或磁盘组(不建议将所有磁盘做成一个磁盘阵列,会影响 IoTDB的性能上限) +2. 虚拟环境 + 建议挂载多块硬盘(1-6块左右) + +### 配置示例 + +- 示例1,4块3.5英寸硬盘 + +因服务器安装的硬盘较少,直接做Raid5即可,无需其他配置。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| ----------- | -------- | -------- | --------- | -------- | +| 系统/数据盘 | RAID5 | 4 | 允许坏1块 | 3 | + +- 示例2,12块3.5英寸硬盘 + +服务器配置12块3.5英寸盘。 + +前2块盘推荐Raid1作系统盘,2组数据盘可分为2组Raid5,每组5块盘实际可用4块。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| -------- | -------- | -------- | --------- | -------- | +| 系统盘 | RAID1 | 2 | 允许坏1块 | 1 | +| 数据盘 | RAID5 | 5 | 允许坏1块 | 4 | +| 数据盘 | RAID5 | 5 | 允许坏1块 | 4 | + +- 示例3,24块2.5英寸盘 + +服务器配置24块2.5英寸盘。 + +前2块盘推荐Raid1作系统盘,后面可分为3组Raid5,每组7块盘实际可用6块。剩余一块可闲置或存储写前日志使用。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| -------- | -------- | -------- | --------- | -------- | +| 系统盘 | RAID1 | 2 | 允许坏1块 | 1 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | NoRaid | 1 | 损坏丢失 | 1 | + +## 操作系统 + +### 版本要求 + +IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 + +### 硬盘分区 + +- 建议使用默认的标准分区方式,不推荐LVM扩展和硬盘加密。 +- 系统盘只需满足操作系统的使用空间即可,不需要为IoTDB预留空间。 +- 每个硬盘组只对应一个分区即可,数据盘(里面有多个磁盘组,对应raid)不用再额外分区,所有空间给IoTDB使用。 + +建议的磁盘分区方式如下表所示。 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
硬盘分类磁盘组对应盘符大小文件系统类型
系统盘磁盘组0/boot1GB默认
/磁盘组剩余全部空间默认
数据盘磁盘组1/data1磁盘组1全部空间默认
磁盘组2/data2磁盘组2全部空间默认
......
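+
+以数据盘为例,下面给出一个将磁盘组格式化并挂载为 /data1 的参考操作(其中 /dev/sdb 为假设的盘符,文件系统以 ext4 为例,请按实际环境与上表规划调整):
+
+```Bash
+# 格式化磁盘组对应的块设备(假设为 /dev/sdb)
+mkfs.ext4 /dev/sdb
+# 创建挂载点并挂载
+mkdir -p /data1
+mount /dev/sdb /data1
+# 写入 /etc/fstab,保证重启后自动挂载
+echo "/dev/sdb /data1 ext4 defaults 0 0" >> /etc/fstab
+```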
+ +### 网络配置 + +1. 关闭防火墙 + +```Bash +# 查看防火墙 +systemctl status firewalld +# 关闭防火墙 +systemctl stop firewalld +# 永久关闭防火墙 +systemctl disable firewalld +``` + +2. 保证所需端口不被占用 + +(1)集群占用端口的检查:在集群默认配置中,ConfigNode 会占用端口 10710 和 10720,DataNode 会占用端口 6667、10730、10740、10750 、10760、9090、9190、3000请确保这些端口未被占用。检查方式如下: + +```Bash +lsof -i:6667 或 netstat -tunp | grep 6667 +lsof -i:10710 或 netstat -tunp | grep 10710 +lsof -i:10720 或 netstat -tunp | grep 10720 +#如果命令有输出,则表示该端口已被占用。 +``` + +(2)集群部署工具占用端口的检查:使用集群管理工具opskit安装部署集群时,需打开SSH远程连接服务配置,并开放22号端口。 + +```Bash +yum install openssh-server #安装ssh服务 +systemctl start sshd #启用22号端口 +``` + +3. 保证服务器之间的网络相互连通 + +### 其他配置 + +1. 关闭系统 swap 内存 + +```Bash +echo "vm.swappiness = 0">> /etc/sysctl.conf +# 一起执行 swapoff -a 和 swapon -a 命令是为了将 swap 里的数据转储回内存,并清空 swap 里的数据。 +# 不可省略 swappiness 设置而只执行 swapoff -a;否则,重启后 swap 会再次自动打开,使得操作失效。 +swapoff -a && swapon -a +# 在不重启的情况下使配置生效。 +sysctl -p +# 检查内存分配,预期 swap 为 0 +free -m +``` + +2. 设置系统最大打开文件数为 65535,以避免出现 "太多的打开文件 "的错误。 + +```Bash +#查看当前限制 +ulimit -n +# 临时修改 +ulimit -n 65535 +# 永久修改 +echo "* soft nofile 65535" >> /etc/security/limits.conf +echo "* hard nofile 65535" >> /etc/security/limits.conf +#退出当前终端会话后查看,预期显示65535 +ulimit -n +``` + +## 软件依赖 + +安装 Java 运行环境 ,Java 版本 >= 1.8,请确保已设置 jdk 环境变量。(V1.3.2.2 及之上版本推荐直接部署JDK17,老版本JDK部分场景下性能有问题,且datanode会出现stop不掉的问题) + +```Bash + #下面以在centos7,使用JDK-17安装为例: + tar -zxvf jdk-17_linux-x64_bin.tar #解压JDK文件 + Vim ~/.bashrc #配置JDK环境 + { export JAVA_HOME=/usr/lib/jvm/jdk-17.0.9 + export PATH=$JAVA_HOME/bin:$PATH + } #添加JDK环境变量 + source ~/.bashrc #配置环境生效 + java -version #检查JDK环境 +``` \ No newline at end of file diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md new file mode 100644 index 00000000..6c66c7fb --- /dev/null +++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md @@ -0,0 +1,45 @@ + +# 安装包获取 +## 获取方式 + +企业版安装包可通过产品试用申请,或直接联系与您对接的工作人员获取。 + +## 安装包结构 + +安装包解压后目录结构如下: + +| **目录** | **类型** | **说明** | +| :--------------- | :------- | :----------------------------------------------------------- | +| activation | 文件夹 | 激活文件所在目录,包括生成的机器码以及从天谋工作人员获取的企业版激活码(启动ConfigNode后才会生成该目录,即可获取激活码) | +| conf | 文件夹 | 配置文件目录,包含 ConfigNode、DataNode、JMX 和 logback 等配置文件 | +| data | 文件夹 | 默认的数据文件目录,包含 ConfigNode 和 DataNode 的数据文件。(启动程序后才会生成该目录) | +| lib | 文件夹 | 库文件目录 | +| licenses | 文件夹 | 开源协议证书文件目录 | +| logs | 文件夹 | 默认的日志文件目录,包含 ConfigNode 和 DataNode 的日志文件(启动程序后才会生成该目录) | +| sbin | 文件夹 | 主要脚本目录,包含数据库启、停等脚本 | +| tools | 文件夹 | 工具目录 | +| ext | 文件夹 | pipe,trigger,udf插件的相关文件 | +| LICENSE | 文件 | 开源许可证文件 | +| NOTICE | 文件 | 开源声明文件 | +| README_ZH.md | 文件 | 使用说明(中文版) | +| README.md | 文件 | 使用说明(英文版) | +| RELEASE_NOTES.md | 文件 | 版本说明 | diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md new file mode 100644 index 00000000..c7fba837 --- /dev/null +++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -0,0 +1,682 @@ + +# 监控面板部署 + +IoTDB配套监控面板是IoTDB企业版配套工具之一。它旨在解决IoTDB及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。本文将以常见的3C3D集群(3个Confignode和3个Datanode)为例,为您介绍如何在IoTDB的实例中开启系统监控模块,并且使用Prometheus + Grafana的方式完成对系统监控指标的可视化。 + +## 安装准备 + +1. 安装 IoTDB:需先安装IoTDB V1.0 版本及以上企业版,您可联系商务或技术支持获取 +2. 
获取 IoTDB 监控面板安装包:基于企业版 IoTDB 的数据库监控面板,您可联系商务或技术支持获取 + +## 安装步骤 + +### 步骤一:IoTDB开启监控指标采集 + +1. 打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。 + +| 配置项 | 所在配置文件 | 配置说明 | +| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- | +| cn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | +| cn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | +| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可 | +| dn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | +| dn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | +| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可默认设置为9092,如设置其他端口,不与其他端口冲突即可 | + +以3C3D集群为例,需要修改的监控配置如下: + +| 节点ip | 主机名 | 集群角色 | 配置文件路径 | 配置项 | +| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ | +| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | + +2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务: + +```shell +./sbin/stop-standalone.sh #先停止confignode和datanode +./sbin/start-confignode.sh -d #启动confignode +./sbin/start-datanode.sh -d #启动datanode +``` + +3. 重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功: + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8.PNG) + +### 步骤二:安装、配置Prometheus + +> 此处以prometheus安装在服务器192.168.1.3为例。 + +1. 下载 Prometheus 安装包,要求安装 V2.30.3 版本及以上,可前往 Prometheus 官网下载(https://prometheus.io/docs/introduction/first_steps/) +2. 解压安装包,进入解压后的文件夹: + +```Shell +tar xvfz prometheus-*.tar.gz +cd prometheus-* +``` + +3. 修改配置。修改配置文件prometheus.yml如下 + 1. 新增confignode任务收集ConfigNode的监控数据 + 2. 新增datanode任务收集DataNode的监控数据 + +```shell +global: + scrape_interval: 15s + evaluation_interval: 15s +scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] + honor_labels: true +``` + +4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: + +```Shell +./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d +``` + +5. 确认启动成功。在浏览器中输入 http://192.168.1.3:9090,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。 + +
+ + +
+ + + +6. 点击Targets中左侧链接可以跳转到网页监控,查看相应节点的监控信息: + +![](https://alioss.timecho.com/docs/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) + +### 步骤三:安装grafana并配置数据源 + +> 此处以Grafana安装在服务器192.168.1.3为例。 + +1. 下载 Grafana 安装包,要求安装 V8.4.2 版本及以上,可以前往Grafana官网下载(https://grafana.com/grafana/download) +2. 解压并进入对应文件夹 + +```Shell +tar -zxvf grafana-*.tar.gz +cd grafana-* +``` + +3. 启动Grafana: + +```Shell +./bin/grafana-server web +``` + +4. 登录Grafana。在浏览器中输入 http://192.168.1.3:3000(或修改后的端口),进入Grafana,默认初始用户名和密码均为 admin。 + +5. 配置数据源。在Connections中找到Data sources,新增一个data source并配置Data Source为Prometheus + +![](https://alioss.timecho.com/docs/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) + +在配置Data Source时注意Prometheus所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 + +![](https://alioss.timecho.com/docs/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) + +### 步骤四:导入IoTDB Grafana看板 + +1. 进入Grafana,选择Dashboards: + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) + +2. 点击右侧 Import 按钮 + + ![](https://alioss.timecho.com/docs/img/Import%E6%8C%89%E9%92%AE.png) + +3. 使用upload json file的方式导入Dashboard + + ![](https://alioss.timecho.com/docs/img/%E5%AF%BC%E5%85%A5Dashboard.png) + +4. 选择IoTDB监控面板中其中一个面板的json文件,这里以选择 Apache IoTDB ConfigNode Dashboard为例(监控面板安装包获取参见本文【安装准备】): + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) + +5. 选择数据源为Prometheus,然后点击Import + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) + +6. 之后就可以看到导入的Apache IoTDB ConfigNode Dashboard监控面板 + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF.png) + +7. 同样地,我们可以导入Apache IoTDB DataNode Dashboard、Apache Performance Overview Dashboard、Apache System Overview Dashboard,可看到如下的监控面板: + +
+ + + +
+ +8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) + +## 附录、监控指标详解 + +### 系统面板(System Dashboard) + +该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 + +#### CPU + +- CPU Core:CPU 核数 +- CPU Load: + - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度 + - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例 +- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 + +#### Memory + +- System Memory:当前系统内存的使用情况。 + - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 + - Total physical memory:系统可用物理内存的总量。 + - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 +- System Swap Memory:交换空间(Swap Space)内存用量。 +- Process Memory:IoTDB 进程使用内存的情况。 + - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) + - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 + - Used Memory:IoTDB 进程当前已经使用的内存总量。 + +#### Disk + +- Disk Space: + - Total disk space:IoTDB 可使用的最大磁盘空间。 + - Used disk space:IoTDB 已经使用的磁盘空间。 +- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 +- File Count:IoTDB 相关文件数量 + - all:所有文件数量 + - TsFile:TsFile 数量 + - seq:顺序 TsFile 数量 + - unseq:乱序 TsFile 数量 + - wal:WAL 文件数量 + - cross-temp:跨空间合并 temp 文件数量 + - inner-seq-temp:顺序空间内合并 temp 文件数量 + - innser-unseq-temp:乱序空间内合并 temp 文件数量 + - mods:墓碑文件数量 +- Open File Count:系统打开的文件句柄数量 +- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 +- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 +- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 +- Disk I/O Ops:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 +- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 +- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 +- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 +- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 +- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 + +#### JVM + +- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 +- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 +- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 +- Heap Memory:JVM 堆内存使用情况。 + - Maximum heap memory:JVM 最大可用的堆内存大小。 + - Committed heap memory:JVM 已提交的堆内存大小。 + - Used heap memory:JVM 已经使用的堆内存大小。 + - PS Eden Space:PS Young 区的大小。 + - PS Old Space:PS Old 区的大小。 + - PS Survivor Space:PS Survivor 区的大小。 + - ...(CMS/G1/ZGC 等) +- Off Heap Memory:堆外内存用量。 + - direct memory:堆外直接内存。 + - mapped memory:堆外映射内存。 +- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC +- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC +- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC +- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC +- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 +- The Number of Class: + - loaded:JVM 目前已经加载的类的数量 + - unloaded:系统启动至今 JVM 卸载的类的数量 +- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 + +#### Network + +eno 指的是到公网的网卡,lo 是虚拟网卡。 + +- Net Speed:网卡发送和接收数据的速度 +- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 +- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 +- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) + +### 整体性能面板(Performance Overview Dashboard) + +#### Cluster Overview + +- Total CPU Core: 集群机器 CPU 总核数 +- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 +- 磁盘 + - Total Disk Space: 集群机器磁盘总大小 + - DataNode Disk Usage: 集群各 
DataNode 的磁盘使用率 +- Total Timeseries: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 +- Cluster: 集群 ConfigNode 和 DataNode 节点数量 +- Up Time: 集群启动至今的时长 +- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 +- 内存 + - Total System Memory: 集群机器系统内存总大小 + - Total Swap Memory: 集群机器交换内存总大小 + - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 +- Total File Number: 集群管理文件总数量 +- Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 +- Total DataBase: 集群管理的 Database 总数(含副本) +- Total DataRegion: 集群管理的 DataRegion 总数 +- Total SchemaRegion: 集群管理的 SchemaRegion 总数 + +#### Node Overview + +- CPU Core: 节点所在机器的 CPU 核数 +- Disk Space: 节点所在机器的磁盘大小 +- Timeseries: 节点所在机器管理的时间序列数量(含副本) +- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 +- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) +- System Memory: 节点所在机器的系统内存大小 +- Swap Memory: 节点所在机器的交换内存大小 +- File Number: 节点管理的文件数 + +#### Performance + +- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 +- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 +- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 +- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 +- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 +- Task Number: 节点的各项系统任务数量 +- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 +- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 +- Operation Per Second: 节点的每秒操作数 +- 主流程 + - Operation Per Second Of Stage: 节点主流程各阶段的每秒操作数 + - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 + - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 +- Schedule 阶段 + - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 + - Average Time Consumed Of Schedule Stage: 节点 schedule 阶段各子阶段平均耗时 + - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 +- Local Schedule 各子阶段 + - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 + - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 + - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 +- Storage 阶段 + - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 + - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 + - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 +- Engine 阶段 + - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 + - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 + - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 + +#### System + +- CPU Load: 节点的 CPU 负载 +- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 +- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC +- Heap Memory: 节点的堆内存使用情况 +- Off Heap Memory: 节点的非堆内存使用情况 +- The Number Of Java Thread: 节点的 Java 线程数量情况 +- File Count: 节点管理的文件数量情况 +- File Size: 节点管理文件大小情况 +- Log Number Per Minute: 节点的每分钟不同类型日志情况 + +### ConfigNode 面板(ConfigNode Dashboard) + +该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 + +#### Node Overview + +- Database Count: 节点的数据库数量 +- Region + - DataRegion Count: 节点的 DataRegion 数量 + - DataRegion Current Status: 节点的 DataRegion 的状态 + - SchemaRegion Count: 节点的 SchemaRegion 数量 + - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 +- System Memory: 节点的系统内存大小 +- Swap Memory: 节点的交换区内存大小 +- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 +- DataNodes: 节点所在集群的 DataNode 情况 +- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 + +#### NodeInfo + +- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode +- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 +- DataNode Status: 节点所在集群的 DataNode 节点的状态 +- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 +- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 +- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 +- DataRegionGroup Leader Distribution: 
节点所在集群的 DataRegionGroup 的 Leader 分布情况 + +#### Protocol + +- 客户端数量统计 + - Active Client Num: 节点各线程池的活跃客户端数量 + - Idle Client Num: 节点各线程池的空闲客户端数量 + - Borrowed Client Count: 节点各线程池的借用客户端数量 + - Created Client Count: 节点各线程池的创建客户端数量 + - Destroyed Client Count: 节点各线程池的销毁客户端数量 +- 客户端时间情况 + - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 + - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 + - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + +#### Partition Table + +- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 +- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 +- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 +- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 +- DataRegion Status: 节点所在集群的 DataRegion 状态 +- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 + +#### Consensus + +- Ratis Stage Time: 节点的 Ratis 各阶段耗时 +- Write Log Entry: 节点的 Ratis 写 Log 的耗时 +- Remote / Local Write Time: 节点的 Ratis 的远程写入和本地写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS +- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 + +### DataNode 面板(DataNode Dashboard) + +该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 + +#### Node Overview + +- The Number Of Entity: 节点管理的实体情况 +- Write Point Per Second: 节点的每秒写入速度 +- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 + +#### Protocol + +- 节点操作耗时 + - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 + - The Time Consumed Of Operation (50%): 节点的各项操作耗时的中位数 + - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 +- Thrift统计 + - The QPS Of Interface: 节点各个 Thrift 接口的 QPS + - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 + - Thrift Connection: 节点的各类型的 Thrfit 连接数量 + - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 +- 客户端统计 + - Active Client Num: 节点各线程池的活跃客户端数量 + - Idle Client Num: 节点各线程池的空闲客户端数量 + - Borrowed Client Count: 节点的各线程池借用客户端数量 + - Created Client Count: 节点各线程池的创建客户端数量 + - Destroyed Client Count: 节点各线程池的销毁客户端数量 + - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 + - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 + - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + +#### Storage Engine + +- File Count: 节点管理的各类型文件数量 +- File Size: 节点管理的各类型文件大小 +- TsFile + - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 + - TsFile Count In Each Level: 节点管理的各级别 TsFile 文件数量 + - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 +- Task Number: 节点的 Task 数量 +- The Time Consumed of Task: 节点的 Task 的耗时 +- Compaction + - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 + - Compaction Number Per Minute: 节点的每分钟合并数量 + - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 + - Compacted Point Num Per Minute: 节点每分钟合并的点数 + +#### Write Performance + +- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable +- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable +- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable +- WAL + - WAL File Size: 节点管理的 WAL 文件总大小 + - WAL File Num: 节点管理的 WAL 文件数量 + - WAL Nodes Num: 节点管理的 WAL Node 数量 + - Make Checkpoint Costs: 节点创建各类型的 CheckPoint 的耗时 + - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 + - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 + - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 + - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 + - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 + - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 + - WAL Buffer + - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 + - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 + - WAL Buffer Entries Count: 节点的 WAL Buffer 
的条目数量 +- Flush统计 + - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 + - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 + - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 + - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 + - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 + - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 +- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 +- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 +- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 +- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 +- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 +- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 +- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 +- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 + +#### Schema Engine + +- Schema Engine Mode: 节点的元数据引擎模式 +- Schema Consensus Protocol: 节点的元数据共识协议 +- Schema Region Number: 节点管理的 SchemaRegion 数量 +- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 +- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 +- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 +- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) +- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 buffer node 个数 +- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 +- 时间序列统计 + - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 + - Series Type: 节点不同类型的时间序列数量 + - Time Series Number: 节点的时间序列总数 + - Template Series Number: 节点的模板时间序列总数 + - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 +- IMNode统计 + - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 + - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 + - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 + - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 + - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 + - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 +- Cache Hit Rate: 节点的缓存命中率 +- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 +- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 +- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 + +#### Query Engine + +- 各阶段耗时 + - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 + - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 + - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 +- 执行计划分发耗时 + - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 + - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 + - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 +- 执行计划执行耗时 + - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 + - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 + - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 +- 算子执行耗时 + - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 + - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 + - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 +- 聚合查询计算耗时 + - The time consumed of query aggregation(avg): 节点聚合查询计算耗时的平均值 + - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 + - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 +- 文件/内存接口耗时 + - The time consumed of query scan(avg): 
节点查询文件/内存接口耗时的平均值 + - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 + - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 +- 资源访问数量 + - The usage of query resource(avg): 节点查询资源访问数量的平均值 + - The usage of query resource(50%): 节点查询资源访问数量的中位数 + - The usage of query resource(99%): 节点查询资源访问数量的P99 +- 数据传输耗时 + - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 + - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 + - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 +- 数据传输数量 + - The count of Data Exchange(avg): 节点查询的数据传输数量的平均值 + - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 +- 任务调度数量与耗时 + - The number of query queue: 节点查询任务调度数量 + - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 + - The time consumed of query schedule time(50%): 节点查询任务调度耗时的中位数 + - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 + +#### Query Interface + +- 加载时间序列元数据 + - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 + - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 + - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 +- 读取时间序列 + - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 + - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 + - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 +- 修改时间序列元数据 + - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 + - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 + - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 +- 加载Chunk元数据列表 + - The time consumed of load chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 + - The time consumed of load chunk metadata list(50%): 节点查询加载Chunk元数据列表耗时的中位数 + - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 +- 修改Chunk元数据 + - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 + - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 + - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 +- 按照Chunk元数据过滤 + - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 + - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 + - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 +- 构造Chunk Reader + - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 + - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 + - The time consumed of construct chunk reader(99%): 节点查询构造Chunk Reader耗时的P99 +- 读取Chunk + - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 + - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 + - The time consumed of read chunk(99%): 节点查询读取Chunk耗时的P99 +- 初始化Chunk Reader + - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 + - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 + - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 +- 通过 Page Reader 构造 TsBlock + - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 + - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 + - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 +- 查询通过 Merge Reader 构造 TsBlock + - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 + - 
The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 + - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 + +#### Query Data Exchange + +查询的数据交换耗时。 + +- 通过 source handle 获取 TsBlock + - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 + - The time consumed of source handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 + - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 +- 通过 source handle 反序列化 TsBlock + - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 + - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 + - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 +- 通过 sink handle 发送 TsBlock + - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 + - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 + - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 耗时的P99 +- 回调 data block event + - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 + - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 + - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 +- 获取 data block task + - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 + - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 + - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 + +#### Query Related Resource + +- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 +- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 +- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 +- Coordinator: 节点上记录的查询数量 +- MemoryPool Size: 节点查询相关的内存池情况 +- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 +- DriverScheduler: 节点查询相关的队列任务数量 + +#### Consensus - IoT Consensus + +- 内存使用 + - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 +- 节点间同步情况 + - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 + - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 + - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 + - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 + - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 + - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 + - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 + - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 + - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 + - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 +- 不同执行阶段耗时 + - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 + - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 + - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 + +#### Consensus - DataRegion Ratis Consensus + +- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 +- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 +- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS +- 
RatisConsensus Memory: 节点 Ratis 的内存使用情况 + +#### Consensus - SchemaRegion Ratis Consensus + +- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 +- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 +- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS +- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file diff --git a/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md new file mode 100644 index 00000000..5344fd9e --- /dev/null +++ b/src/zh/UserGuide/Master/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md @@ -0,0 +1,234 @@ + +# 单机版安装部署 + +本章将介绍如何启动IoTDB单机实例,IoTDB单机实例包括 1 个ConfigNode 和1个DataNode(即通常所说的1C1D)。 + +## 注意事项 + +1. 安装前请确认系统已参照[系统配置](../Deployment-and-Maintenance/Environment-Requirements.md)准备完成。 +2. 推荐使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在服务器上配置`/etc/hosts`,如本机ip是192.168.1.3,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的 `cn_internal_address`、`dn_internal_address`。 + + ```shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. 部分参数首次启动后不能修改,请参考下方的[参数配置](#2参数配置)章节进行设置。 +4. 无论是在linux还是windows中,请确保IoTDB的安装路径中不含空格和中文,避免软件运行异常。 +5. 请注意,安装部署(包括激活和使用软件)IoTDB时,您可以: + - 使用 root 用户(推荐):可以避免权限等问题。 + - 使用固定的非 root 用户: + - 使用同一用户操作:确保在启动、激活、停止等操作均保持使用同一用户,不要切换用户。 + - 避免使用 sudo:使用 sudo 命令会以 root 用户权限执行命令,可能会引起权限混淆或安全问题。 +6. 推荐部署监控面板,可以对重要运行指标进行监控,随时掌握数据库运行状态,监控面板可以联系工作人员获取,部署监控面板步骤可以参考:[监控面板部署](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) + +## 安装步骤 + +### 1、解压安装包并进入安装目录 + +```Plain +unzip timechodb-{version}-bin.zip +cd timechodb-{version}-bin +``` + +### 2、参数配置 + +#### 内存配置 + +- conf/confignode-env.sh(或 .bat) + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------- | :------------------------------------- | :--------- | :----------------------------------------------- | :----------- | +| MEMORY_SIZE | IoTDB ConfigNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +- conf/datanode-env.sh(或 .bat) + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------- | :----------------------------------- | :--------- | :----------------------------------------------- | :----------- | +| MEMORY_SIZE | IoTDB DataNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +#### 功能配置 + +系统实际生效的参数在文件 conf/iotdb-system.properties 中,启动需设置以下参数,可以从 conf/iotdb-system.properties.template 文件中查看全部参数 + +集群级功能配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :------------------------ | :------------------------------- | :------------- | :----------------------------------------------- | :------------------------ | +| cluster_name | 集群名称 | defaultCluster | 可根据需要设置集群名称,如无特殊需要保持默认即可 | 首次启动后不可修改 | +| schema_replication_factor | 元数据副本数,单机版此处设置为 1 | 1 | 1 | 默认1,首次启动后不可修改 | +| data_replication_factor | 数据副本数,单机版此处设置为 1 | 1 | 1 | 默认1,首次启动后不可修改 | + +ConfigNode 配置 + +| **配置项** | **说明** | **默认** | 推荐值 | **备注** | +| :------------------ | :----------------------------------------------------------- | :-------------- | :----------------------------------------------- | :----------------- | +| cn_internal_address | ConfigNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 首次启动后不能修改 | +| cn_internal_port | ConfigNode在集群内部通讯使用的端口 | 10710 | 10710 | 首次启动后不能修改 | +| cn_consensus_port | ConfigNode副本组共识协议通信使用的端口 | 10720 | 10720 | 首次启动后不能修改 | +| cn_seed_config_node | 节点注册加入集群时连接的ConfigNode 的地址,cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 
cn_internal_address:cn_internal_port | 首次启动后不能修改 | + +DataNode 配置 + +| **配置项** | **说明** | **默认** | 推荐值 | **备注** | +| :------------------------------ | :----------------------------------------------------------- | :-------------- | :----------------------------------------------- | :----------------- | +| dn_rpc_address | 客户端 RPC 服务的地址 | 0.0.0.0 | 0.0.0.0 | 重启服务生效 | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 6667 | 重启服务生效 | +| dn_internal_address | DataNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 首次启动后不能修改 | +| dn_internal_port | DataNode在集群内部通信使用的端口 | 10730 | 10730 | 首次启动后不能修改 | +| dn_mpp_data_exchange_port | DataNode用于接收数据流使用的端口 | 10740 | 10740 | 首次启动后不能修改 | +| dn_data_region_consensus_port | DataNode用于数据副本共识协议通信使用的端口 | 10750 | 10750 | 首次启动后不能修改 | +| dn_schema_region_consensus_port | DataNode用于元数据副本共识协议通信使用的端口 | 10760 | 10760 | 首次启动后不能修改 | +| dn_seed_config_node | 节点注册加入集群时连接的ConfigNode地址,即cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | 首次启动后不能修改 | + +### 3、启动 ConfigNode 节点 + +进入iotdb的sbin目录下,启动confignode + +```shell +./sbin/start-confignode.sh -d #“-d”参数将在后台进行启动 +``` + +如果启动失败,请参考下方[常见问题](#常见问题)。 + +### 4、启动 DataNode 节点 + + 进入iotdb的sbin目录下,启动datanode: + +```shell +./sbin/start-datanode.sh -d #“-d”参数将在后台进行启动 +``` + +### 5、激活数据库 + +#### 方式一:文件激活 + +- 启动Confignode、Datanode节点后,进入activation文件夹, 将 system_info文件复制给天谋工作人员 +- 收到工作人员返回的 license文件 +- 将license文件放入对应节点的activation文件夹下; + +#### 方式二:命令激活 +- 进入 IoTDB CLI + - 表模型 CLI 进入命令: + ```SQL + # Linux或MACOS系统 + ./start-cli.sh -sql_dialect table + + # windows系统 + ./start-cli.bat -sql_dialect table + ``` + + - 树模型 CLI 进入命令: + ```SQL + # Linux或MACOS系统 + ./start-cli.sh + + # windows系统 + ./start-cli.bat + ``` +- 执行以下内容获取激活所需机器码: + - 注:当前仅支持在树模型中进行激活 + +```Bash +show system info +``` + +- 将返回机器码(即绿色字符串)复制给天谋工作人员: + +```Bash ++--------------------------------------------------------------+ +| SystemInfo| ++--------------------------------------------------------------+ +| 01-TE5NLES4-UDDWCMYE| ++--------------------------------------------------------------+ +Total line number = 1 +It costs 0.030s +``` + +- 将工作人员返回的激活码输入到CLI中,输入以下内容 + - 注:激活码前后需要用`'`符号进行标注,如所示 + +```Bash +IoTDB> activate '01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===' +``` + +### 6、验证激活 + +当看到“ClusterActivationStatus”字段状态显示为ACTIVATED表示激活成功 + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81.png) + +## 常见问题 + +1. 部署过程中多次提示激活失败 + - 使用 `ls -al` 命令:使用 `ls -al` 命令检查安装包根目录的所有者信息是否为当前用户。 + - 检查激活目录:检查 `./activation` 目录下的所有文件,所有者信息是否为当前用户。 +2. Confignode节点启动失败 + - 步骤 1: 请查看启动日志,检查是否修改了某些首次启动后不可改的参数。 + - 步骤 2: 请查看启动日志,检查是否出现其他异常。日志中若存在异常现象,请联系天谋技术支持人员咨询解决方案。 + - 步骤 3: 如果是首次部署或者数据可删除,也可按下述步骤清理环境,重新部署后,再次启动。 + - 清理环境: + 1. 结束所有 ConfigNode 和 DataNode 进程。 + ```Bash + # 1. 停止 ConfigNode 和 DataNode 服务 + sbin/stop-standalone.sh + + # 2. 检查是否还有进程残留 + jps + # 或者 + ps -ef|gerp iotdb + + # 3. 如果有进程残留,则手动kill + kill -9 + # 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + + 2. 
删除 data 和 logs 目录。 + - 说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + ```shell + cd /data/iotdb rm -rf data logs + ``` + +## 附录 + +### Confignode节点参数介绍 + +| 参数 | 描述 | 是否为必填项 | +| :--- | :------------------------------- | :----------- | +| -d | 以守护进程模式启动,即在后台运行 | 否 | + +### Datanode节点参数介绍 + +| 缩写 | 描述 | 是否为必填项 | +| :--- | :--------------------------------------------- | :----------- | +| -v | 显示版本信息 | 否 | +| -f | 在前台运行脚本,不将其放到后台 | 否 | +| -d | 以守护进程模式启动,即在后台运行 | 否 | +| -p | 指定一个文件来存放进程ID,用于进程管理 | 否 | +| -c | 指定配置文件夹的路径,脚本会从这里加载配置文件 | 否 | +| -g | 打印垃圾回收(GC)的详细信息 | 否 | +| -H | 指定Java堆转储文件的路径,当JVM内存溢出时使用 | 否 | +| -E | 指定JVM错误日志文件的路径 | 否 | +| -D | 定义系统属性,格式为 key=value | 否 | +| -X | 直接传递 -XX 参数给 JVM | 否 | +| -h | 帮助指令 | 否 | + diff --git a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md index 99c5b14c..75be11d6 100644 --- a/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md +++ b/src/zh/UserGuide/Master/Tree/Deployment-and-Maintenance/Environment-Requirements.md @@ -162,7 +162,7 @@ systemctl start sshd #启用22号端口 ### 其他配置 -1. 关闭系统 swap 内存 +1. 将系统 swap 优先级降至最低 ```Bash echo "vm.swappiness = 0">> /etc/sysctl.conf @@ -171,7 +171,7 @@ echo "vm.swappiness = 0">> /etc/sysctl.conf swapoff -a && swapon -a # 在不重启的情况下使配置生效。 sysctl -p -# 检查内存分配,预期 swap 为 0 +# swap的已使用内存变为0 free -m ``` diff --git a/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md new file mode 100644 index 00000000..82386a2d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Cluster-Deployment_timecho.md @@ -0,0 +1,386 @@ + +# 集群版安装部署 + +本小节描述如何手动部署包括3个ConfigNode和3个DataNode的实例,即通常所说的3C3D集群。 + +
+ +
+ +## 注意事项 + +1. 安装前请确认系统已参照[系统配置](../Deployment-and-Maintenance/Environment-Requirements.md)准备完成。 + +2. 推荐使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在服务器上配`/etc/hosts`,如本机ip是11.101.17.224,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的`cn_internal_address`、`dn_internal_address`。 + + ```shell + echo "11.101.17.224 iotdb-1" >> /etc/hosts + ``` + +3. 有些参数首次启动后不能修改,请参考下方的[参数配置](#参数配置)章节来进行设置。 + +4. 无论是在linux还是windows中,请确保IoTDB的安装路径中不含空格和中文,避免软件运行异常。 + +5. 请注意,安装部署(包括激活和使用软件)IoTDB时,您可以: + +- 使用 root 用户(推荐):可以避免权限等问题。 + +- 使用固定的非 root 用户: + + - 使用同一用户操作:确保在启动、激活、停止等操作均保持使用同一用户,不要切换用户。 + + - 避免使用 sudo:使用 sudo 命令会以 root 用户权限执行命令,可能会引起权限混淆或安全问题。 + +6. 推荐部署监控面板,可以对重要运行指标进行监控,随时掌握数据库运行状态,监控面板可以联系商务获取,部署监控面板步骤可以参考:[监控面板部署](./Monitoring-panel-deployment.md) + +## 准备步骤 + +1. 准备IoTDB数据库安装包 :timechodb-{version}-bin.zip(安装包获取见:[链接](./IoTDB-Package_timecho.md)) +2. 按环境要求配置好操作系统环境(系统环境配置见:[链接](./Environment-Requirements.md)) + +## 安装步骤 + +假设现在有3台linux服务器,IP地址和服务角色分配如下: + +| 节点ip | 主机名 | 服务 | +| ------------- | ------- | -------------------- | +| 11.101.17.224 | iotdb-1 | ConfigNode、DataNode | +| 11.101.17.225 | iotdb-2 | ConfigNode、DataNode | +| 11.101.17.226 | iotdb-3 | ConfigNode、DataNode | + +### 设置主机名 + +在3台机器上分别配置主机名,设置主机名需要在目标服务器上配置/etc/hosts,使用如下命令: + +```shell +echo "11.101.17.224 iotdb-1" >> /etc/hosts +echo "11.101.17.225 iotdb-2" >> /etc/hosts +echo "11.101.17.226 iotdb-3" >> /etc/hosts +``` + +### 参数配置 + +解压安装包并进入安装目录 + +```shell +unzip timechodb-{version}-bin.zip +cd timechodb-{version}-bin +``` + +#### 环境脚本配置 + +- ./conf/confignode-env.sh配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------- | :------------------------------------- | :--------- | :----------------------------------------------- | :----------- | +| MEMORY_SIZE | IoTDB ConfigNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +- ./conf/datanode-env.sh配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------- | :----------------------------------- | :--------- | :----------------------------------------------- | :----------- | +| MEMORY_SIZE | IoTDB DataNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +#### 通用配置(./conf/iotdb-system.properties) + +- 集群配置 + +| 配置项 | 说明 | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | +| ------------------------- | ---------------------------------------- | -------------- | -------------- | -------------- | +| cluster_name | 集群名称 | defaultCluster | defaultCluster | defaultCluster | +| schema_replication_factor | 元数据副本数,DataNode数量不应少于此数目 | 3 | 3 | 3 | +| data_replication_factor | 数据副本数,DataNode数量不应少于此数目 | 2 | 2 | 2 | + +#### ConfigNode 配置 + +| 配置项 | 说明 | 默认 | 推荐值 | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | 备注 | +| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------- | ------------- | ------------- | ------------- | ------------------ | +| cn_internal_address | ConfigNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | iotdb-1 | iotdb-2 | iotdb-3 | 首次启动后不能修改 | +| cn_internal_port | ConfigNode在集群内部通讯使用的端口 | 10710 | 10710 | 10710 | 10710 | 10710 | 首次启动后不能修改 | +| cn_consensus_port | ConfigNode副本组共识协议通信使用的端口 | 10720 | 10720 | 10720 | 10720 | 10720 | 首次启动后不能修改 | +| cn_seed_config_node | 节点注册加入集群时连接的ConfigNode 的地址,cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 第一个CongfigNode的cn_internal_address:cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | 首次启动后不能修改 | + +#### DataNode 配置 + +| 配置项 | 说明 | 
默认 | 推荐值 | 11.101.17.224 | 11.101.17.225 | 11.101.17.226 | 备注 | +| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------- | ------------- | ------------- | ------------- | ------------------ | +| dn_rpc_address | 客户端 RPC 服务的地址 | 0.0.0.0 | 0.0.0.0 | 0.0.0.0 | 0.0.0.0 | 0.0.0.0 | 重启服务生效 | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 6667 | 6667 | 6667 | 6667 | 重启服务生效 | +| dn_internal_address | DataNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | iotdb-1 | iotdb-2 | iotdb-3 | 首次启动后不能修改 | +| dn_internal_port | DataNode在集群内部通信使用的端口 | 10730 | 10730 | 10730 | 10730 | 10730 | 首次启动后不能修改 | +| dn_mpp_data_exchange_port | DataNode用于接收数据流使用的端口 | 10740 | 10740 | 10740 | 10740 | 10740 | 首次启动后不能修改 | +| dn_data_region_consensus_port | DataNode用于数据副本共识协议通信使用的端口 | 10750 | 10750 | 10750 | 10750 | 10750 | 首次启动后不能修改 | +| dn_schema_region_consensus_port | DataNode用于元数据副本共识协议通信使用的端口 | 10760 | 10760 | 10760 | 10760 | 10760 | 首次启动后不能修改 | +| dn_seed_config_node | 节点注册加入集群时连接的ConfigNode地址,即cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 第一个CongfigNode的cn_internal_address:cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | 首次启动后不能修改 | + +> ❗️注意:VSCode Remote等编辑器无自动保存配置功能,请确保修改的文件被持久化保存,否则配置项无法生效 + +### 启动ConfigNode节点 + +先启动第一个iotdb-1的confignode, 保证种子confignode节点先启动,然后依次启动第2和第3个confignode节点 + +```shell +cd sbin +./start-confignode.sh -d #“-d”参数将在后台进行启动 +``` + +如果启动失败,请参考下[常见问题](#常见问题) + +### 启动DataNode 节点 + + 分别进入iotdb的sbin目录下,依次启动3个datanode节点: + +```shell +cd sbin +./start-datanode.sh -d #-d参数将在后台进行启动 +``` + +### 激活数据库 + +#### 方式一:激活文件拷贝激活 + +- 依次启动3个Confignode、Datanode节点后,每台机器各自的activation文件夹, 分别拷贝每台机器的system_info文件给天谋工作人员; +- 工作人员将返回每个ConfigNode、Datanode节点的license文件,这里会返回3个license文件; +- 将3个license文件分别放入对应的ConfigNode节点的activation文件夹下; + +#### 方式二:激活脚本激活 +- 依次获取3台机器的机器码,进入 IoTDB CLI + + - 表模型 CLI 进入命令: + + ```SQL + # Linux或MACOS系统 + ./start-cli.sh -sql_dialect table + + # windows系统 + ./start-cli.bat -sql_dialect table + ``` + + - 树模型 CLI 进入命令: + + ```SQL + # Linux或MACOS系统 + ./start-cli.sh + + # windows系统 + ./start-cli.bat + ``` + + - 执行以下内容获取激活所需机器码: + - 注:当前仅支持在树模型中进行激活 + + ```Bash + show system info + ``` + + - 显示如下信息,这里显示的是1台机器的机器码 : + + ```Bash + +--------------------------------------------------------------+ + | SystemInfo| + +--------------------------------------------------------------+ + |01-TE5NLES4-UDDWCMYE,01-GG5NLES4-XXDWCMYE,01-FF5NLES4-WWWWCMYE| + +--------------------------------------------------------------+ + Total line number = 1 + It costs 0.030s + ``` + +- 其他2个节点依次进入到IoTDB树模型的CLI中,执行语句后将获取的3台机器的机器码都复制给天谋工作人员 + +- 工作人员会返回3段激活码,正常是与提供的3个机器码的顺序对应的,请分别将各自的激活码粘贴到CLI中,如下提示: + + - 注:激活码前后需要用`'`符号进行标注,如所示 + + ```Bash + IoTDB> activate '01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===,01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===,01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===' + ``` + +### 验证激活 
+ +当看到“Result”字段状态显示为success表示激活成功 + +![](https://alioss.timecho.com/docs/img/%E9%9B%86%E7%BE%A4-%E9%AA%8C%E8%AF%81.png) + +## 节点维护步骤 + +### ConfigNode节点维护 + +ConfigNode节点维护分为ConfigNode添加和移除两种操作,有两个常见使用场景: + +- 集群扩展:如集群中只有1个ConfigNode时,希望增加ConfigNode以提升ConfigNode节点高可用性,则可以添加2个ConfigNode,使得集群中有3个ConfigNode。 +- 集群故障恢复:1个ConfigNode所在机器发生故障,使得该ConfigNode无法正常运行,此时可以移除该ConfigNode,然后添加一个新的ConfigNode进入集群。 + +> ❗️注意,在完成ConfigNode节点维护后,需要保证集群中有1或者3个正常运行的ConfigNode。2个ConfigNode不具备高可用性,超过3个ConfigNode会导致性能损失。 + +#### 添加ConfigNode节点 + +脚本命令: + +```shell +# Linux / MacOS +# 首先切换到IoTDB根目录 +sbin/start-confignode.sh + +# Windows +# 首先切换到IoTDB根目录 +sbin/start-confignode.bat +``` + +#### 移除ConfigNode节点 + +首先通过CLI连接集群,通过`show confignodes`确认想要移除ConfigNode的内部地址与端口号: + +```shell +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +然后使用脚本将DataNode移除。脚本命令: + +```Bash +# Linux / MacOS +sbin/remove-confignode.sh [confignode_id] +或 +./sbin/remove-confignode.sh [cn_internal_address:cn_internal_port] + +#Windows +sbin/remove-confignode.bat [confignode_id] +或 +./sbin/remove-confignode.bat [cn_internal_address:cn_internal_port] +``` + +### DataNode节点维护 + +DataNode节点维护有两个常见场景: + +- 集群扩容:出于集群能力扩容等目的,添加新的DataNode进入集群 +- 集群故障恢复:一个DataNode所在机器出现故障,使得该DataNode无法正常运行,此时可以移除该DataNode,并添加新的DataNode进入集群 + +> ❗️注意,为了使集群能正常工作,在DataNode节点维护过程中以及维护完成后,正常运行的DataNode总数不得少于数据副本数(通常为2),也不得少于元数据副本数(通常为3)。 + +#### 添加DataNode节点 + +脚本命令: + +```Bash +# Linux / MacOS +# 首先切换到IoTDB根目录 +sbin/start-datanode.sh + +#Windows +# 首先切换到IoTDB根目录 +sbin/start-datanode.bat +``` + +说明:在添加DataNode后,随着新的写入到来(以及旧数据过期,如果设置了TTL),集群负载会逐渐向新的DataNode均衡,最终在所有节点上达到存算资源的均衡。 + +#### 移除DataNode节点 + +首先通过CLI连接集群,通过`show datanodes`确认想要移除的DataNode的RPC地址与端口号: + +```Bash +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 0.0.0.0| 6667| 0| 0| +| 2|Running| 0.0.0.0| 6668| 1| 1| +| 3|Running| 0.0.0.0| 6669| 1| 0| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 3 +It costs 0.110s +``` + +然后使用脚本将DataNode移除。脚本命令: + +```Bash +# Linux / MacOS +sbin/remove-datanode.sh [dn_rpc_address:dn_rpc_port] + +#Windows +sbin/remove-datanode.bat [dn_rpc_address:dn_rpc_port] +``` + +## 常见问题 + +1. 部署过程中多次提示激活失败 + - 使用 `ls -al` 命令:使用 `ls -al` 命令检查安装包根目录的所有者信息是否为当前用户。 + - 检查激活目录:检查 `./activation` 目录下的所有文件,所有者信息是否为当前用户。 +2. Confignode节点启动失败 + - 步骤 1: 请查看启动日志,检查是否修改了某些首次启动后不可改的参数。 + - 步骤 2: 请查看启动日志,检查是否出现其他异常。日志中若存在异常现象,请联系天谋技术支持人员咨询解决方案。 + - 步骤 3: 如果是首次部署或者数据可删除,也可按下述步骤清理环境,重新部署后,再次启动。 + - 清理环境: + + 1. 结束所有 ConfigNode 和 DataNode 进程。 + ```Bash + # 1. 停止 ConfigNode 和 DataNode 服务 + sbin/stop-standalone.sh + + # 2. 检查是否还有进程残留 + jps + # 或者 + ps -ef|gerp iotdb + + # 3. 如果有进程残留,则手动kill + kill -9 + # 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + + 2. 
删除 data 和 logs 目录。 + - 说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + ```shell + cd /data/iotdb rm -rf data logs + ``` +## 附录 + +### Confignode节点参数介绍 + +| 参数 | 描述 | 是否为必填项 | +| :--- | :------------------------------- | :----------- | +| -d | 以守护进程模式启动,即在后台运行 | 否 | + +### Datanode节点参数介绍 + +| 缩写 | 描述 | 是否为必填项 | +| :--- | :--------------------------------------------- | :----------- | +| -v | 显示版本信息 | 否 | +| -f | 在前台运行脚本,不将其放到后台 | 否 | +| -d | 以守护进程模式启动,即在后台运行 | 否 | +| -p | 指定一个文件来存放进程ID,用于进程管理 | 否 | +| -c | 指定配置文件夹的路径,脚本会从这里加载配置文件 | 否 | +| -g | 打印垃圾回收(GC)的详细信息 | 否 | +| -H | 指定Java堆转储文件的路径,当JVM内存溢出时使用 | 否 | +| -E | 指定JVM错误日志文件的路径 | 否 | +| -D | 定义系统属性,格式为 key=value | 否 | +| -X | 直接传递 -XX 参数给 JVM | 否 | +| -h | 帮助指令 | 否 | + diff --git a/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Database-Resources.md b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Database-Resources.md new file mode 100644 index 00000000..17e09aa0 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Database-Resources.md @@ -0,0 +1,193 @@ + +# 资源规划 +## CPU + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序列数(采集频率<=1HZ)CPU节点数
单机双活分布式
10W以内2核-4核123
30W以内4核-8核123
50W以内8核-16核123
100W以内16核-32核123
200w以内32核-48核123
1000w以内48核12请联系天谋商务咨询
1000w以上请联系天谋商务咨询
+ +## 内存 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序列数(采集频率<=1HZ)内存节点数
单机双活分布式
10W以内4G-8G123
30W以内12G-32G123
50W以内24G-48G123
100W以内32G-96G123
200w以内64G-128G123
1000w以内128G12请联系天谋商务咨询
1000w以上请联系天谋商务咨询
+ +## 存储(磁盘) +### 存储空间 +计算公式:测点数量 * 采样频率(Hz)* 每个数据点大小(Byte,不同数据类型不一样,见下表) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
数据点大小计算表
数据类型 时间戳(字节)值(字节)数据点总大小(字节)
开关量(Boolean)819
整型(INT32)/ 单精度浮点数(FLOAT)8412
长整型(INT64)/ 双精度浮点数(DOUBLE)8816
字符串(TEXT)8平均为a8+a
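+
+如需快速复核估算结果,可用 shell 直接代入上述公式计算。下面的片段以下文示例中的参数为输入(其中压缩比 10 为经验假设值,实际压缩比随数据特征变化):
+
+```Bash
+# 设备数 * 每设备测点数 * 采样频率(Hz) * 单点大小(字节) * 一年秒数 * 副本数 / 压缩比,并换算为 TB
+echo "1000 * 100 * 1 * 12 * 86400 * 365 * 3 / 10 / 10^12" | bc -l
+# 输出约 11.35,即约 11TB,与下文示例结果一致
+```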
+ +示例:1000设备,每个设备100 测点,共 100000 序列,INT32 类型。采样频率1Hz(每秒一次),存储1年,3副本。 +- 完整计算公式:1000设备 * 100测点 * 12字节每数据点 * 86400秒每天 * 365天每年 * 3副本/10压缩比=11T +- 简版计算公式:1000 * 100 * 12 * 86400 * 365 * 3 / 10 = 11T +### 存储配置 +1000w 点位以上或查询负载较大,推荐配置 SSD。 +## 网络(网卡) +在写入吞吐不超过1000万点/秒时,需配置千兆网卡;当写入吞吐超过 1000万点/秒时,需配置万兆网卡。 +| **写入吞吐(数据点/秒)** | **网卡速率** | +| ------------------- | ------------- | +| <1000万 | 1Gbps(千兆) | +| >=1000万 | 10Gbps(万兆) | +## 其他说明 +IoTDB 具有集群秒级扩容能力,扩容节点数据可不迁移,因此您无需担心按现有数据情况估算的集群能力有限,未来您可在需要扩容时为集群加入新的节点。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Environment-Requirements.md new file mode 100644 index 00000000..99c5b14c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Environment-Requirements.md @@ -0,0 +1,205 @@ + +# 系统配置 + +## 磁盘阵列 + +### 配置建议 + +IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵列存储IoTDB的数据,以达到多个磁盘阵列并发写入的目标,配置可参考以下建议: + +1. 物理环境 + 系统盘:建议使用2块磁盘做Raid1,仅考虑操作系统自身所占空间即可,可以不为IoTDB预留系统盘空间 + 数据盘 + 建议做Raid,在磁盘维度进行数据保护 + 建议为IoTDB提供多块磁盘(1-6块左右)或磁盘组(不建议将所有磁盘做成一个磁盘阵列,会影响 IoTDB的性能上限) +2. 虚拟环境 + 建议挂载多块硬盘(1-6块左右) + +### 配置示例 + +- 示例1,4块3.5英寸硬盘 + +因服务器安装的硬盘较少,直接做Raid5即可,无需其他配置。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| ----------- | -------- | -------- | --------- | -------- | +| 系统/数据盘 | RAID5 | 4 | 允许坏1块 | 3 | + +- 示例2,12块3.5英寸硬盘 + +服务器配置12块3.5英寸盘。 + +前2块盘推荐Raid1作系统盘,2组数据盘可分为2组Raid5,每组5块盘实际可用4块。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| -------- | -------- | -------- | --------- | -------- | +| 系统盘 | RAID1 | 2 | 允许坏1块 | 1 | +| 数据盘 | RAID5 | 5 | 允许坏1块 | 4 | +| 数据盘 | RAID5 | 5 | 允许坏1块 | 4 | + +- 示例3,24块2.5英寸盘 + +服务器配置24块2.5英寸盘。 + +前2块盘推荐Raid1作系统盘,后面可分为3组Raid5,每组7块盘实际可用6块。剩余一块可闲置或存储写前日志使用。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| -------- | -------- | -------- | --------- | -------- | +| 系统盘 | RAID1 | 2 | 允许坏1块 | 1 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | NoRaid | 1 | 损坏丢失 | 1 | + +## 操作系统 + +### 版本要求 + +IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 + +### 硬盘分区 + +- 建议使用默认的标准分区方式,不推荐LVM扩展和硬盘加密。 +- 系统盘只需满足操作系统的使用空间即可,不需要为IoTDB预留空间。 +- 每个硬盘组只对应一个分区即可,数据盘(里面有多个磁盘组,对应raid)不用再额外分区,所有空间给IoTDB使用。 + +建议的磁盘分区方式如下表所示。 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
硬盘分类磁盘组对应盘符大小文件系统类型
系统盘磁盘组0/boot1GB默认
/磁盘组剩余全部空间默认
数据盘磁盘组1/data1磁盘组1全部空间默认
磁盘组2/data2磁盘组2全部空间默认
......
+ +### 网络配置 + +1. 关闭防火墙 + +```Bash +# 查看防火墙 +systemctl status firewalld +# 关闭防火墙 +systemctl stop firewalld +# 永久关闭防火墙 +systemctl disable firewalld +``` + +2. 保证所需端口不被占用 + +(1)集群占用端口的检查:在集群默认配置中,ConfigNode 会占用端口 10710 和 10720,DataNode 会占用端口 6667、10730、10740、10750 、10760、9090、9190、3000请确保这些端口未被占用。检查方式如下: + +```Bash +lsof -i:6667 或 netstat -tunp | grep 6667 +lsof -i:10710 或 netstat -tunp | grep 10710 +lsof -i:10720 或 netstat -tunp | grep 10720 +#如果命令有输出,则表示该端口已被占用。 +``` + +(2)集群部署工具占用端口的检查:使用集群管理工具opskit安装部署集群时,需打开SSH远程连接服务配置,并开放22号端口。 + +```Bash +yum install openssh-server #安装ssh服务 +systemctl start sshd #启用22号端口 +``` + +3. 保证服务器之间的网络相互连通 + +### 其他配置 + +1. 关闭系统 swap 内存 + +```Bash +echo "vm.swappiness = 0">> /etc/sysctl.conf +# 一起执行 swapoff -a 和 swapon -a 命令是为了将 swap 里的数据转储回内存,并清空 swap 里的数据。 +# 不可省略 swappiness 设置而只执行 swapoff -a;否则,重启后 swap 会再次自动打开,使得操作失效。 +swapoff -a && swapon -a +# 在不重启的情况下使配置生效。 +sysctl -p +# 检查内存分配,预期 swap 为 0 +free -m +``` + +2. 设置系统最大打开文件数为 65535,以避免出现 "太多的打开文件 "的错误。 + +```Bash +#查看当前限制 +ulimit -n +# 临时修改 +ulimit -n 65535 +# 永久修改 +echo "* soft nofile 65535" >> /etc/security/limits.conf +echo "* hard nofile 65535" >> /etc/security/limits.conf +#退出当前终端会话后查看,预期显示65535 +ulimit -n +``` + +## 软件依赖 + +安装 Java 运行环境 ,Java 版本 >= 1.8,请确保已设置 jdk 环境变量。(V1.3.2.2 及之上版本推荐直接部署JDK17,老版本JDK部分场景下性能有问题,且datanode会出现stop不掉的问题) + +```Bash + #下面以在centos7,使用JDK-17安装为例: + tar -zxvf jdk-17_linux-x64_bin.tar #解压JDK文件 + Vim ~/.bashrc #配置JDK环境 + { export JAVA_HOME=/usr/lib/jvm/jdk-17.0.9 + export PATH=$JAVA_HOME/bin:$PATH + } #添加JDK环境变量 + source ~/.bashrc #配置环境生效 + java -version #检查JDK环境 +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md new file mode 100644 index 00000000..6c66c7fb --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/IoTDB-Package_timecho.md @@ -0,0 +1,45 @@ + +# 安装包获取 +## 获取方式 + +企业版安装包可通过产品试用申请,或直接联系与您对接的工作人员获取。 + +## 安装包结构 + +安装包解压后目录结构如下: + +| **目录** | **类型** | **说明** | +| :--------------- | :------- | :----------------------------------------------------------- | +| activation | 文件夹 | 激活文件所在目录,包括生成的机器码以及从天谋工作人员获取的企业版激活码(启动ConfigNode后才会生成该目录,即可获取激活码) | +| conf | 文件夹 | 配置文件目录,包含 ConfigNode、DataNode、JMX 和 logback 等配置文件 | +| data | 文件夹 | 默认的数据文件目录,包含 ConfigNode 和 DataNode 的数据文件。(启动程序后才会生成该目录) | +| lib | 文件夹 | 库文件目录 | +| licenses | 文件夹 | 开源协议证书文件目录 | +| logs | 文件夹 | 默认的日志文件目录,包含 ConfigNode 和 DataNode 的日志文件(启动程序后才会生成该目录) | +| sbin | 文件夹 | 主要脚本目录,包含数据库启、停等脚本 | +| tools | 文件夹 | 工具目录 | +| ext | 文件夹 | pipe,trigger,udf插件的相关文件 | +| LICENSE | 文件 | 开源许可证文件 | +| NOTICE | 文件 | 开源声明文件 | +| README_ZH.md | 文件 | 使用说明(中文版) | +| README.md | 文件 | 使用说明(英文版) | +| RELEASE_NOTES.md | 文件 | 版本说明 | diff --git a/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md new file mode 100644 index 00000000..c7fba837 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -0,0 +1,682 @@ + +# 监控面板部署 + +IoTDB配套监控面板是IoTDB企业版配套工具之一。它旨在解决IoTDB及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。本文将以常见的3C3D集群(3个Confignode和3个Datanode)为例,为您介绍如何在IoTDB的实例中开启系统监控模块,并且使用Prometheus + Grafana的方式完成对系统监控指标的可视化。 + +## 安装准备 + +1. 安装 IoTDB:需先安装IoTDB V1.0 版本及以上企业版,您可联系商务或技术支持获取 +2. 
获取 IoTDB 监控面板安装包:基于企业版 IoTDB 的数据库监控面板,您可联系商务或技术支持获取 + +## 安装步骤 + +### 步骤一:IoTDB开启监控指标采集 + +1. 打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。 + +| 配置项 | 所在配置文件 | 配置说明 | +| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- | +| cn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | +| cn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | +| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可 | +| dn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | +| dn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | +| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可默认设置为9092,如设置其他端口,不与其他端口冲突即可 | + +以3C3D集群为例,需要修改的监控配置如下: + +| 节点ip | 主机名 | 集群角色 | 配置文件路径 | 配置项 | +| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ | +| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | + +2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务: + +```shell +./sbin/stop-standalone.sh #先停止confignode和datanode +./sbin/start-confignode.sh -d #启动confignode +./sbin/start-datanode.sh -d #启动datanode +``` + +3. 重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功: + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8.PNG) + +### 步骤二:安装、配置Prometheus + +> 此处以prometheus安装在服务器192.168.1.3为例。 + +1. 下载 Prometheus 安装包,要求安装 V2.30.3 版本及以上,可前往 Prometheus 官网下载(https://prometheus.io/docs/introduction/first_steps/) +2. 解压安装包,进入解压后的文件夹: + +```Shell +tar xvfz prometheus-*.tar.gz +cd prometheus-* +``` + +3. 修改配置。修改配置文件prometheus.yml如下 + 1. 新增confignode任务收集ConfigNode的监控数据 + 2. 新增datanode任务收集DataNode的监控数据 + +```shell +global: + scrape_interval: 15s + evaluation_interval: 15s +scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] + honor_labels: true +``` + +4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: + +```Shell +./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d +``` + +5. 确认启动成功。在浏览器中输入 http://192.168.1.3:9090,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。 + +
+ + +
+ + + +6. 点击Targets中左侧链接可以跳转到网页监控,查看相应节点的监控信息: + +![](https://alioss.timecho.com/docs/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) + +### 步骤三:安装grafana并配置数据源 + +> 此处以Grafana安装在服务器192.168.1.3为例。 + +1. 下载 Grafana 安装包,要求安装 V8.4.2 版本及以上,可以前往Grafana官网下载(https://grafana.com/grafana/download) +2. 解压并进入对应文件夹 + +```Shell +tar -zxvf grafana-*.tar.gz +cd grafana-* +``` + +3. 启动Grafana: + +```Shell +./bin/grafana-server web +``` + +4. 登录Grafana。在浏览器中输入 http://192.168.1.3:3000(或修改后的端口),进入Grafana,默认初始用户名和密码均为 admin。 + +5. 配置数据源。在Connections中找到Data sources,新增一个data source并配置Data Source为Prometheus + +![](https://alioss.timecho.com/docs/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) + +在配置Data Source时注意Prometheus所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 + +![](https://alioss.timecho.com/docs/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) + +### 步骤四:导入IoTDB Grafana看板 + +1. 进入Grafana,选择Dashboards: + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) + +2. 点击右侧 Import 按钮 + + ![](https://alioss.timecho.com/docs/img/Import%E6%8C%89%E9%92%AE.png) + +3. 使用upload json file的方式导入Dashboard + + ![](https://alioss.timecho.com/docs/img/%E5%AF%BC%E5%85%A5Dashboard.png) + +4. 选择IoTDB监控面板中其中一个面板的json文件,这里以选择 Apache IoTDB ConfigNode Dashboard为例(监控面板安装包获取参见本文【安装准备】): + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) + +5. 选择数据源为Prometheus,然后点击Import + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) + +6. 之后就可以看到导入的Apache IoTDB ConfigNode Dashboard监控面板 + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF.png) + +7. 同样地,我们可以导入Apache IoTDB DataNode Dashboard、Apache Performance Overview Dashboard、Apache System Overview Dashboard,可看到如下的监控面板: + +
+ + + +
+ +8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) + +## 附录、监控指标详解 + +### 系统面板(System Dashboard) + +该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 + +#### CPU + +- CPU Core:CPU 核数 +- CPU Load: + - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度 + - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例 +- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 + +#### Memory + +- System Memory:当前系统内存的使用情况。 + - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 + - Total physical memory:系统可用物理内存的总量。 + - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 +- System Swap Memory:交换空间(Swap Space)内存用量。 +- Process Memory:IoTDB 进程使用内存的情况。 + - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) + - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 + - Used Memory:IoTDB 进程当前已经使用的内存总量。 + +#### Disk + +- Disk Space: + - Total disk space:IoTDB 可使用的最大磁盘空间。 + - Used disk space:IoTDB 已经使用的磁盘空间。 +- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 +- File Count:IoTDB 相关文件数量 + - all:所有文件数量 + - TsFile:TsFile 数量 + - seq:顺序 TsFile 数量 + - unseq:乱序 TsFile 数量 + - wal:WAL 文件数量 + - cross-temp:跨空间合并 temp 文件数量 + - inner-seq-temp:顺序空间内合并 temp 文件数量 + - innser-unseq-temp:乱序空间内合并 temp 文件数量 + - mods:墓碑文件数量 +- Open File Count:系统打开的文件句柄数量 +- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 +- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 +- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 +- Disk I/O Ops:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 +- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 +- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 +- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 +- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 +- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 + +#### JVM + +- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 +- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 +- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 +- Heap Memory:JVM 堆内存使用情况。 + - Maximum heap memory:JVM 最大可用的堆内存大小。 + - Committed heap memory:JVM 已提交的堆内存大小。 + - Used heap memory:JVM 已经使用的堆内存大小。 + - PS Eden Space:PS Young 区的大小。 + - PS Old Space:PS Old 区的大小。 + - PS Survivor Space:PS Survivor 区的大小。 + - ...(CMS/G1/ZGC 等) +- Off Heap Memory:堆外内存用量。 + - direct memory:堆外直接内存。 + - mapped memory:堆外映射内存。 +- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC +- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC +- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC +- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC +- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 +- The Number of Class: + - loaded:JVM 目前已经加载的类的数量 + - unloaded:系统启动至今 JVM 卸载的类的数量 +- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 + +#### Network + +eno 指的是到公网的网卡,lo 是虚拟网卡。 + +- Net Speed:网卡发送和接收数据的速度 +- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 +- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 +- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) + +### 整体性能面板(Performance Overview Dashboard) + +#### Cluster Overview + +- Total CPU Core: 集群机器 CPU 总核数 +- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 +- 磁盘 + - Total Disk Space: 集群机器磁盘总大小 + - DataNode Disk Usage: 集群各 
DataNode 的磁盘使用率 +- Total Timeseries: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 +- Cluster: 集群 ConfigNode 和 DataNode 节点数量 +- Up Time: 集群启动至今的时长 +- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 +- 内存 + - Total System Memory: 集群机器系统内存总大小 + - Total Swap Memory: 集群机器交换内存总大小 + - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 +- Total File Number: 集群管理文件总数量 +- Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 +- Total DataBase: 集群管理的 Database 总数(含副本) +- Total DataRegion: 集群管理的 DataRegion 总数 +- Total SchemaRegion: 集群管理的 SchemaRegion 总数 + +#### Node Overview + +- CPU Core: 节点所在机器的 CPU 核数 +- Disk Space: 节点所在机器的磁盘大小 +- Timeseries: 节点所在机器管理的时间序列数量(含副本) +- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 +- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) +- System Memory: 节点所在机器的系统内存大小 +- Swap Memory: 节点所在机器的交换内存大小 +- File Number: 节点管理的文件数 + +#### Performance + +- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 +- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 +- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 +- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 +- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 +- Task Number: 节点的各项系统任务数量 +- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 +- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 +- Operation Per Second: 节点的每秒操作数 +- 主流程 + - Operation Per Second Of Stage: 节点主流程各阶段的每秒操作数 + - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 + - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 +- Schedule 阶段 + - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 + - Average Time Consumed Of Schedule Stage: 节点 schedule 阶段各子阶段平均耗时 + - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 +- Local Schedule 各子阶段 + - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 + - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 + - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 +- Storage 阶段 + - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 + - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 + - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 +- Engine 阶段 + - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 + - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 + - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 + +#### System + +- CPU Load: 节点的 CPU 负载 +- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 +- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC +- Heap Memory: 节点的堆内存使用情况 +- Off Heap Memory: 节点的非堆内存使用情况 +- The Number Of Java Thread: 节点的 Java 线程数量情况 +- File Count: 节点管理的文件数量情况 +- File Size: 节点管理文件大小情况 +- Log Number Per Minute: 节点的每分钟不同类型日志情况 + +### ConfigNode 面板(ConfigNode Dashboard) + +该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 + +#### Node Overview + +- Database Count: 节点的数据库数量 +- Region + - DataRegion Count: 节点的 DataRegion 数量 + - DataRegion Current Status: 节点的 DataRegion 的状态 + - SchemaRegion Count: 节点的 SchemaRegion 数量 + - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 +- System Memory: 节点的系统内存大小 +- Swap Memory: 节点的交换区内存大小 +- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 +- DataNodes: 节点所在集群的 DataNode 情况 +- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 + +#### NodeInfo + +- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode +- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 +- DataNode Status: 节点所在集群的 DataNode 节点的状态 +- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 +- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 +- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 +- DataRegionGroup Leader Distribution: 
节点所在集群的 DataRegionGroup 的 Leader 分布情况 + +#### Protocol + +- 客户端数量统计 + - Active Client Num: 节点各线程池的活跃客户端数量 + - Idle Client Num: 节点各线程池的空闲客户端数量 + - Borrowed Client Count: 节点各线程池的借用客户端数量 + - Created Client Count: 节点各线程池的创建客户端数量 + - Destroyed Client Count: 节点各线程池的销毁客户端数量 +- 客户端时间情况 + - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 + - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 + - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + +#### Partition Table + +- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 +- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 +- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 +- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 +- DataRegion Status: 节点所在集群的 DataRegion 状态 +- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 + +#### Consensus + +- Ratis Stage Time: 节点的 Ratis 各阶段耗时 +- Write Log Entry: 节点的 Ratis 写 Log 的耗时 +- Remote / Local Write Time: 节点的 Ratis 的远程写入和本地写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS +- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 + +### DataNode 面板(DataNode Dashboard) + +该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 + +#### Node Overview + +- The Number Of Entity: 节点管理的实体情况 +- Write Point Per Second: 节点的每秒写入速度 +- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 + +#### Protocol + +- 节点操作耗时 + - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 + - The Time Consumed Of Operation (50%): 节点的各项操作耗时的中位数 + - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 +- Thrift统计 + - The QPS Of Interface: 节点各个 Thrift 接口的 QPS + - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 + - Thrift Connection: 节点的各类型的 Thrfit 连接数量 + - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 +- 客户端统计 + - Active Client Num: 节点各线程池的活跃客户端数量 + - Idle Client Num: 节点各线程池的空闲客户端数量 + - Borrowed Client Count: 节点的各线程池借用客户端数量 + - Created Client Count: 节点各线程池的创建客户端数量 + - Destroyed Client Count: 节点各线程池的销毁客户端数量 + - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 + - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 + - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + +#### Storage Engine + +- File Count: 节点管理的各类型文件数量 +- File Size: 节点管理的各类型文件大小 +- TsFile + - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 + - TsFile Count In Each Level: 节点管理的各级别 TsFile 文件数量 + - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 +- Task Number: 节点的 Task 数量 +- The Time Consumed of Task: 节点的 Task 的耗时 +- Compaction + - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 + - Compaction Number Per Minute: 节点的每分钟合并数量 + - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 + - Compacted Point Num Per Minute: 节点每分钟合并的点数 + +#### Write Performance + +- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable +- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable +- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable +- WAL + - WAL File Size: 节点管理的 WAL 文件总大小 + - WAL File Num: 节点管理的 WAL 文件数量 + - WAL Nodes Num: 节点管理的 WAL Node 数量 + - Make Checkpoint Costs: 节点创建各类型的 CheckPoint 的耗时 + - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 + - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 + - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 + - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 + - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 + - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 + - WAL Buffer + - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 + - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 + - WAL Buffer Entries Count: 节点的 WAL Buffer 
的条目数量 +- Flush统计 + - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 + - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 + - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 + - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 + - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 + - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 +- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 +- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 +- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 +- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 +- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 +- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 +- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 +- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 + +#### Schema Engine + +- Schema Engine Mode: 节点的元数据引擎模式 +- Schema Consensus Protocol: 节点的元数据共识协议 +- Schema Region Number: 节点管理的 SchemaRegion 数量 +- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 +- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 +- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 +- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) +- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 buffer node 个数 +- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 +- 时间序列统计 + - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 + - Series Type: 节点不同类型的时间序列数量 + - Time Series Number: 节点的时间序列总数 + - Template Series Number: 节点的模板时间序列总数 + - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 +- IMNode统计 + - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 + - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 + - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 + - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 + - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 + - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 +- Cache Hit Rate: 节点的缓存命中率 +- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 +- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 +- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 + +#### Query Engine + +- 各阶段耗时 + - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 + - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 + - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 +- 执行计划分发耗时 + - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 + - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 + - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 +- 执行计划执行耗时 + - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 + - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 + - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 +- 算子执行耗时 + - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 + - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 + - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 +- 聚合查询计算耗时 + - The time consumed of query aggregation(avg): 节点聚合查询计算耗时的平均值 + - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 + - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 +- 文件/内存接口耗时 + - The time consumed of query scan(avg): 
节点查询文件/内存接口耗时的平均值 + - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 + - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 +- 资源访问数量 + - The usage of query resource(avg): 节点查询资源访问数量的平均值 + - The usage of query resource(50%): 节点查询资源访问数量的中位数 + - The usage of query resource(99%): 节点查询资源访问数量的P99 +- 数据传输耗时 + - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 + - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 + - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 +- 数据传输数量 + - The count of Data Exchange(avg): 节点查询的数据传输数量的平均值 + - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 +- 任务调度数量与耗时 + - The number of query queue: 节点查询任务调度数量 + - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 + - The time consumed of query schedule time(50%): 节点查询任务调度耗时的中位数 + - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 + +#### Query Interface + +- 加载时间序列元数据 + - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 + - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 + - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 +- 读取时间序列 + - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 + - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 + - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 +- 修改时间序列元数据 + - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 + - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 + - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 +- 加载Chunk元数据列表 + - The time consumed of load chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 + - The time consumed of load chunk metadata list(50%): 节点查询加载Chunk元数据列表耗时的中位数 + - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 +- 修改Chunk元数据 + - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 + - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 + - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 +- 按照Chunk元数据过滤 + - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 + - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 + - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 +- 构造Chunk Reader + - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 + - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 + - The time consumed of construct chunk reader(99%): 节点查询构造Chunk Reader耗时的P99 +- 读取Chunk + - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 + - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 + - The time consumed of read chunk(99%): 节点查询读取Chunk耗时的P99 +- 初始化Chunk Reader + - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 + - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 + - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 +- 通过 Page Reader 构造 TsBlock + - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 + - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 + - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 +- 查询通过 Merge Reader 构造 TsBlock + - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 + - 
The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 + - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 + +#### Query Data Exchange + +查询的数据交换耗时。 + +- 通过 source handle 获取 TsBlock + - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 + - The time consumed of source handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 + - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 +- 通过 source handle 反序列化 TsBlock + - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 + - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 + - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 +- 通过 sink handle 发送 TsBlock + - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 + - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 + - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 耗时的P99 +- 回调 data block event + - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 + - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 + - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 +- 获取 data block task + - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 + - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 + - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 + +#### Query Related Resource + +- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 +- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 +- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 +- Coordinator: 节点上记录的查询数量 +- MemoryPool Size: 节点查询相关的内存池情况 +- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 +- DriverScheduler: 节点查询相关的队列任务数量 + +#### Consensus - IoT Consensus + +- 内存使用 + - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 +- 节点间同步情况 + - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 + - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 + - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 + - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 + - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 + - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 + - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 + - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 + - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 + - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 +- 不同执行阶段耗时 + - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 + - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 + - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 + +#### Consensus - DataRegion Ratis Consensus + +- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 +- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 +- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS +- 
RatisConsensus Memory: 节点 Ratis 的内存使用情况 + +#### Consensus - SchemaRegion Ratis Consensus + +- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 +- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 +- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS +- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md new file mode 100644 index 00000000..3e6fc34e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Table/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md @@ -0,0 +1,234 @@ + +# 单机版安装部署 + +本章将介绍如何启动IoTDB单机实例,IoTDB单机实例包括 1 个ConfigNode 和1个DataNode(即通常所说的1C1D)。 + +## 注意事项 + +1. 安装前请确认系统已参照[系统配置](../Deployment-and-Maintenance/Environment-Requirements.md)准备完成。 +2. 推荐使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在服务器上配置`/etc/hosts`,如本机ip是192.168.1.3,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的 `cn_internal_address`、`dn_internal_address`。 + + ```shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. 部分参数首次启动后不能修改,请参考下方的[参数配置](#2参数配置)章节进行设置。 +4. 无论是在linux还是windows中,请确保IoTDB的安装路径中不含空格和中文,避免软件运行异常。 +5. 请注意,安装部署(包括激活和使用软件)IoTDB时,您可以: + - 使用 root 用户(推荐):可以避免权限等问题。 + - 使用固定的非 root 用户: + - 使用同一用户操作:确保在启动、激活、停止等操作均保持使用同一用户,不要切换用户。 + - 避免使用 sudo:使用 sudo 命令会以 root 用户权限执行命令,可能会引起权限混淆或安全问题。 +6. 推荐部署监控面板,可以对重要运行指标进行监控,随时掌握数据库运行状态,监控面板可以联系工作人员获取,部署监控面板步骤可以参考:[监控面板部署](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) + +## 安装步骤 + +### 1、解压安装包并进入安装目录 + +```Plain +unzip timechodb-{version}-bin.zip +cd timechodb-{version}-bin +``` + +### 2、参数配置 + +#### 内存配置 + +- conf/confignode-env.sh(或 .bat) + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------- | :------------------------------------- | :--------- | :----------------------------------------------- | :----------- | +| MEMORY_SIZE | IoTDB ConfigNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +- conf/datanode-env.sh(或 .bat) + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------- | :----------------------------------- | :--------- | :----------------------------------------------- | :----------- | +| MEMORY_SIZE | IoTDB DataNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +#### 功能配置 + +系统实际生效的参数在文件 conf/iotdb-system.properties 中,启动需设置以下参数,可以从 conf/iotdb-system.properties.template 文件中查看全部参数 + +集群级功能配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :------------------------ | :------------------------------- | :------------- | :----------------------------------------------- | :------------------------ | +| cluster_name | 集群名称 | defaultCluster | 可根据需要设置集群名称,如无特殊需要保持默认即可 | 首次启动后不可修改 | +| schema_replication_factor | 元数据副本数,单机版此处设置为 1 | 1 | 1 | 默认1,首次启动后不可修改 | +| data_replication_factor | 数据副本数,单机版此处设置为 1 | 1 | 1 | 默认1,首次启动后不可修改 | + +ConfigNode 配置 + +| **配置项** | **说明** | **默认** | 推荐值 | **备注** | +| :------------------ | :----------------------------------------------------------- | :-------------- | :----------------------------------------------- | :----------------- | +| cn_internal_address | ConfigNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 首次启动后不能修改 | +| cn_internal_port | ConfigNode在集群内部通讯使用的端口 | 10710 | 10710 | 首次启动后不能修改 | +| cn_consensus_port | ConfigNode副本组共识协议通信使用的端口 | 10720 | 10720 | 首次启动后不能修改 | +| cn_seed_config_node | 节点注册加入集群时连接的ConfigNode 的地址,cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 
cn_internal_address:cn_internal_port | 首次启动后不能修改 | + +DataNode 配置 + +| **配置项** | **说明** | **默认** | 推荐值 | **备注** | +| :------------------------------ | :----------------------------------------------------------- | :-------------- | :----------------------------------------------- | :----------------- | +| dn_rpc_address | 客户端 RPC 服务的地址 | 0.0.0.0 | 0.0.0.0 | 重启服务生效 | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 6667 | 重启服务生效 | +| dn_internal_address | DataNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 首次启动后不能修改 | +| dn_internal_port | DataNode在集群内部通信使用的端口 | 10730 | 10730 | 首次启动后不能修改 | +| dn_mpp_data_exchange_port | DataNode用于接收数据流使用的端口 | 10740 | 10740 | 首次启动后不能修改 | +| dn_data_region_consensus_port | DataNode用于数据副本共识协议通信使用的端口 | 10750 | 10750 | 首次启动后不能修改 | +| dn_schema_region_consensus_port | DataNode用于元数据副本共识协议通信使用的端口 | 10760 | 10760 | 首次启动后不能修改 | +| dn_seed_config_node | 节点注册加入集群时连接的ConfigNode地址,即cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | 首次启动后不能修改 | + +### 3、启动 ConfigNode 节点 + +进入iotdb的sbin目录下,启动confignode + +```shell +./sbin/start-confignode.sh -d #“-d”参数将在后台进行启动 +``` + +如果启动失败,请参考下方[常见问题](#常见问题)。 + +### 4、启动 DataNode 节点 + + 进入iotdb的sbin目录下,启动datanode: + +```shell +./sbin/start-datanode.sh -d #“-d”参数将在后台进行启动 +``` + +### 5、激活数据库 + +#### 方式一:文件激活 + +- 启动Confignode、Datanode节点后,进入activation文件夹, 将 system_info文件复制给天谋工作人员 +- 收到工作人员返回的 license文件 +- 将license文件放入对应节点的activation文件夹下; + +#### 方式二:命令激活 +- 进入 IoTDB CLI + - 表模型 CLI 进入命令: + ```SQL + # Linux或MACOS系统 + ./start-cli.sh -sql_dialect table + + # windows系统 + ./start-cli.bat -sql_dialect table + ``` + + - 树模型 CLI 进入命令: + ```SQL + # Linux或MACOS系统 + ./start-cli.sh + + # windows系统 + ./start-cli.bat + ``` +- 执行以下内容获取激活所需机器码: + - 注:当前仅支持在树模型中进行激活 + +```Bash +show system info +``` + +- 将返回机器码(即绿色字符串)复制给天谋工作人员: + +```Bash ++--------------------------------------------------------------+ +| SystemInfo| ++--------------------------------------------------------------+ +| 01-TE5NLES4-UDDWCMYE| ++--------------------------------------------------------------+ +Total line number = 1 +It costs 0.030s +``` + +- 将工作人员返回的激活码输入到CLI中,输入以下内容 + - 注:激活码前后需要用`'`符号进行标注,如所示 + +```Bash +IoTDB> activate '01-D4EYQGPZ-EAUJJODW-NUKRDR6F-TUQS3B75-EDZFLK3A-6BOKJFFZ-ALDHOMN7-NB2E4BHI-7ZKGFVK6-GCIFXA4T-UG3XJTTD-SHJV6F2P-Q27B4OMJ-R47ZDIM3-UUASUXG2-OQXGVZCO-MMYKICZU-TWFQYYAO-ZOAGOKJA-NYHQTA5U-EWAR4EP5-MRC6R2CI-PKUTKRCT-7UDGRH3F-7BYV4P5D-6KKIA===' +``` + +### 6、验证激活 + +当看到“ClusterActivationStatus”字段状态显示为ACTIVATED表示激活成功 + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81.png) + +## 常见问题 + +1. 部署过程中多次提示激活失败 + - 使用 `ls -al` 命令:使用 `ls -al` 命令检查安装包根目录的所有者信息是否为当前用户。 + - 检查激活目录:检查 `./activation` 目录下的所有文件,所有者信息是否为当前用户。 +2. Confignode节点启动失败 + - 步骤 1: 请查看启动日志,检查是否修改了某些首次启动后不可改的参数。 + - 步骤 2: 请查看启动日志,检查是否出现其他异常。日志中若存在异常现象,请联系天谋技术支持人员咨询解决方案。 + - 步骤 3: 如果是首次部署或者数据可删除,也可按下述步骤清理环境,重新部署后,再次启动。 + - 清理环境: + 1. 结束所有 ConfigNode 和 DataNode 进程。 + ```Bash + # 1. 停止 ConfigNode 和 DataNode 服务 + sbin/stop-standalone.sh + + # 2. 检查是否还有进程残留 + jps + # 或者 + ps -ef|gerp iotdb + + # 3. 如果有进程残留,则手动kill + kill -9 + # 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + + 2. 
删除 data 和 logs 目录。 + - 说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + ```shell + cd /data/iotdb rm -rf data logs + ``` + +## 附录 + +### Confignode节点参数介绍 + +| 参数 | 描述 | 是否为必填项 | +| :--- | :------------------------------- | :----------- | +| -d | 以守护进程模式启动,即在后台运行 | 否 | + +### Datanode节点参数介绍 + +| 缩写 | 描述 | 是否为必填项 | +| :--- | :--------------------------------------------- | :----------- | +| -v | 显示版本信息 | 否 | +| -f | 在前台运行脚本,不将其放到后台 | 否 | +| -d | 以守护进程模式启动,即在后台运行 | 否 | +| -p | 指定一个文件来存放进程ID,用于进程管理 | 否 | +| -c | 指定配置文件夹的路径,脚本会从这里加载配置文件 | 否 | +| -g | 打印垃圾回收(GC)的详细信息 | 否 | +| -H | 指定Java堆转储文件的路径,当JVM内存溢出时使用 | 否 | +| -E | 指定JVM错误日志文件的路径 | 否 | +| -D | 定义系统属性,格式为 key=value | 否 | +| -X | 直接传递 -XX 参数给 JVM | 否 | +| -h | 帮助指令 | 否 | + diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-CSharp-Native-API.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-CSharp-Native-API.md new file mode 100644 index 00000000..addbce6c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-CSharp-Native-API.md @@ -0,0 +1,274 @@ + + +# C# 原生接口 + +## 依赖 + +- .NET SDK >= 5.0 或 .NET Framework 4.x +- Thrift >= 0.14.1 +- NLog >= 4.7.9 + +## 安装 + +您可以使用 NuGet Package Manager, .NET CLI等工具来安装,以 .NET CLI为例 + +如果您使用的是\.NET 5.0 或者更高版本的SDK,输入如下命令即可安装最新的NuGet包 + +``` +dotnet add package Apache.IoTDB +``` + +为了适配 .NET Framework 4.x,我们单独构建了一个NuGet包,如果您使用的是\.NET Framework 4.x,输入如下命令即可安装最新的包 + +```bash +dotnet add package Apache.IoTDB.framework +``` + +如果您想安装更早版本的客户端,只需要指定版本即可 + +```bash +# 安装0.12.1.2版本的客户端 +dotnet add package Apache.IoTDB --version 0.12.1.2 +``` + +## 基本接口说明 + +Session接口在语义上和其他语言客户端相同 + +```csharp +// 参数定义 +string host = "localhost"; +int port = 6667; +int pool_size = 2; + +// 初始化session +var session_pool = new SessionPool(host, port, pool_size); + +// 开启session +await session_pool.Open(false); + +// 创建时间序列 +await session_pool.CreateTimeSeries("root.test_group.test_device.ts1", TSDataType.TEXT, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); +await session_pool.CreateTimeSeries("root.test_group.test_device.ts2", TSDataType.BOOLEAN, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); +await session_pool.CreateTimeSeries("root.test_group.test_device.ts3", TSDataType.INT32, TSEncoding.PLAIN, Compressor.UNCOMPRESSED); + +// 插入record +var measures = new List{"ts1", "ts2", "ts3"}; +var values = new List { "test_text", true, (int)123 }; +var timestamp = 1; +var rowRecord = new RowRecord(timestamp, values, measures); +await session_pool.InsertRecordAsync("root.test_group.test_device", rowRecord); + +// 插入Tablet +var timestamp_lst = new List{ timestamp + 1 }; +var value_lst = new List {"iotdb", true, (int) 12}; +var tablet = new Tablet("root.test_group.test_device", measures, value_lst, timestamp_ls); +await session_pool.InsertTabletAsync(tablet); + +// 关闭Session +await session_pool.Close(); +``` + +## **Row Record** + +- 对**IoTDB**中的`record`数据进行封装和抽象。 +- 示例: + + | timestamp | status | temperature | + | --------- | ------ | ----------- | + | 1 | 0 | 20 | + +- 构造方法: + +```csharp +var rowRecord = + new RowRecord(long timestamps, List values, List measurements); +``` + +### **Tablet** + +- 一种类似于表格的数据结构,包含一个设备的若干行非空数据块。 +- 示例: + + | time | status | temperature | + | ---- | ------ | ----------- | + | 1 | 0 | 20 | + | 2 | 0 | 20 | + | 3 | 3 | 21 | + +- 构造方法: + +```csharp +var tablet = + Tablet(string deviceId, List measurements, List> values, List timestamps); +``` + + + +## **API** + +### **基础接口** + +| api name | parameters | notes | use example | +| -------------- | ------------------------- | ------------------------ | 
----------------------------- | +| Open | bool | open session | session_pool.Open(false) | +| Close | null | close session | session_pool.Close() | +| IsOpen | null | check if session is open | session_pool.IsOpen() | +| OpenDebugMode | LoggingConfiguration=null | open debug mode | session_pool.OpenDebugMode() | +| CloseDebugMode | null | close debug mode | session_pool.CloseDebugMode() | +| SetTimeZone | string | set time zone | session_pool.GetTimeZone() | +| GetTimeZone | null | get time zone | session_pool.GetTimeZone() | + +### **Record相关接口** + +| api name | parameters | notes | use example | +| ----------------------------------- | ----------------------------- | ----------------------------------- | ------------------------------------------------------------ | +| InsertRecordAsync | string, RowRecord | insert single record | session_pool.InsertRecordAsync("root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE", new RowRecord(1, values, measures)); | +| InsertRecordsAsync | List\, List\ | insert records | session_pool.InsertRecordsAsync(device_id, rowRecords) | +| InsertRecordsOfOneDeviceAsync | string, List\ | insert records of one device | session_pool.InsertRecordsOfOneDeviceAsync(device_id, rowRecords) | +| InsertRecordsOfOneDeviceSortedAsync | string, List\ | insert sorted records of one device | InsertRecordsOfOneDeviceSortedAsync(deviceId, sortedRowRecords); | +| TestInsertRecordAsync | string, RowRecord | test insert record | session_pool.TestInsertRecordAsync("root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE", rowRecord) | +| TestInsertRecordsAsync | List\, List\ | test insert record | session_pool.TestInsertRecordsAsync(device_id, rowRecords) | + +### **Tablet相关接口** + +| api name | parameters | notes | use example | +| ---------------------- | ------------ | -------------------- | -------------------------------------------- | +| InsertTabletAsync | Tablet | insert single tablet | session_pool.InsertTabletAsync(tablet) | +| InsertTabletsAsync | List\ | insert tablets | session_pool.InsertTabletsAsync(tablets) | +| TestInsertTabletAsync | Tablet | test insert tablet | session_pool.TestInsertTabletAsync(tablet) | +| TestInsertTabletsAsync | List\ | test insert tablets | session_pool.TestInsertTabletsAsync(tablets) | + +### **SQL语句接口** + +| api name | parameters | notes | use example | +| ----------------------------- | ---------- | ------------------------------ | ------------------------------------------------------------ | +| ExecuteQueryStatementAsync | string | execute sql query statement | session_pool.ExecuteQueryStatementAsync("select * from root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE where time<15"); | +| ExecuteNonQueryStatementAsync | string | execute sql nonquery statement | session_pool.ExecuteNonQueryStatementAsync( "create timeseries root.97209_TEST_CSHARP_CLIENT_GROUP.TEST_CSHARP_CLIENT_DEVICE.status with datatype=BOOLEAN,encoding=PLAIN") | + +### 数据表接口 + +| api name | parameters | notes | use example | +| -------------------------- | ------------------------------------------------------------ | --------------------------- | ------------------------------------------------------------ | +| SetStorageGroup | string | set storage group | session_pool.SetStorageGroup("root.97209_TEST_CSHARP_CLIENT_GROUP_01") | +| CreateTimeSeries | string, TSDataType, TSEncoding, Compressor | create time series | session_pool.InsertTabletsAsync(tablets) | +| DeleteStorageGroupAsync | string | delete single storage group | 
session_pool.DeleteStorageGroupAsync("root.97209_TEST_CSHARP_CLIENT_GROUP_01") | +| DeleteStorageGroupsAsync | List\ | delete storage group | session_pool.DeleteStorageGroupAsync("root.97209_TEST_CSHARP_CLIENT_GROUP") | +| CreateMultiTimeSeriesAsync | List\, List\ , List\ , List\ | create multi time series | session_pool.CreateMultiTimeSeriesAsync(ts_path_lst, data_type_lst, encoding_lst, compressor_lst); | +| DeleteTimeSeriesAsync | List\ | delete time series | | +| DeleteTimeSeriesAsync | string | delete time series | | +| DeleteDataAsync | List\, long, long | delete data | session_pool.DeleteDataAsync(ts_path_lst, 2, 3) | + +### **辅助接口** + +| api name | parameters | notes | use example | +| -------------------------- | ---------- | --------------------------- | ---------------------------------------------------- | +| CheckTimeSeriesExistsAsync | string | check if time series exists | session_pool.CheckTimeSeriesExistsAsync(time series) | + + + +用法可以参考[用户示例](https://github.com/apache/iotdb-client-csharp/tree/main/samples/Apache.IoTDB.Samples) + +## 连接池 + +为了实现并发客户端请求,我们提供了针对原生接口的连接池(`SessionPool`),由于`SessionPool`本身为`Session`的超集,当`SessionPool`的`pool_size`参数设置为1时,退化为原来的`Session` + +我们使用`ConcurrentQueue`数据结构封装了一个客户端队列,以维护与服务端的多个连接,当调用`Open()`接口时,会在该队列中创建指定个数的客户端,同时通过`System.Threading.Monitor`类实现对队列的同步访问。 + +当请求发生时,会尝试从连接池中寻找一个空闲的客户端连接,如果没有空闲连接,那么程序将需要等待直到有空闲连接 + +当一个连接被用完后,他会自动返回池中等待下次被使用 + +## ByteBuffer + +在传入RPC接口参数时,需要对Record和Tablet两种数据结构进行序列化,我们主要通过封装的ByteBuffer类实现 + +在封装字节序列的基础上,我们进行了内存预申请与内存倍增的优化,减少了序列化过程中内存的申请和释放,在一个拥有20000行的Tablet上进行序列化测试时,速度比起原生的数组动态增长具有**35倍的性能加速** + +### 实现介绍 +在进行`RowRecords`以及`Tablet`的插入时,我们需要对多行RowRecord和Tablet进行序列化以进行发送。客户端中的序列化实现主要依赖于ByteBuffer完成。接下来我们介绍ByteBuffer的实现细节。本文包含如下几点内容: + - 序列化的协议 + - C#与Java的大小端的差异 + - ByteBuffer内存倍增算法 + +### 序列化协议 +客户端向IoTDB服务器发送的序列化数据总体应该包含两个信息。 + - 数据类型 + - 数据本身 + +其中对于`字符串`的序列化时,我们需要再加入字符串的长度信息。即一个字符串的序列化完整结果为: + + [类型][长度][数据内容] +接下来我们分别介绍`RowRecord`、`Tablet`的序列化方式 + +#### RowRecord +我们对RowRecord进行序列化时,`伪代码`如下: +```csharp +public byte[] value_to_bytes(List data_types, List values){ + ByteBuffer buffer = new ByteBuffer(values.Count); + for(int i = 0;i < data_types.Count(); i++){ + buffer.add_type((data_types[i]); + buffer.add_val(values[i]); + } +} +``` + +对于其序列化的结果格式如下: + + [数据类型1][数据1][数据类型2][数据2]...[数据类型N][数据N] + 其中数据类型为自定义的`Enum`变量,分别如下: +```csharp +public enum TSDataType{BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT, NONE}; +``` + +#### Tablet序列化 +使用`Tabelt`进行数据插入时有如下限制: + + 限制:Tablet中数据不能有空值 +由于向 `IoTDB`服务器发送`Tablet`数据插入请求时会携带`行数`, `列数`, `列数据类型`,所以`Tabelt`序列化时我们不需要加入数据类型信息。`Tablet`是`按照列进行序列化`,这是因为后端可以通过行数得知出当前列的元素个数,同时根据列类型来对数据进行解析。 + +### CSharp与Java序列化数据时的大小端差异 +由于Java序列化默认大端协议,而CSharp序列化默认得到小端序列。所以我们在CSharp中序列化数据之后,需要对数据进行反转这样后端才可以正常解析。同时当我们从后端获取到序列化的结果时(如`SessionDataset`),我们也需要对获得的数据进行反转以解析内容。这其中特例便是字符串的序列化,CSharp中对字符串的序列化结果为大端序,所以序列化字符串或者接收到字符串序列化结果时,不需要反转序列结果。 + +### ByteBuffer内存倍增法 +拥有数万行的Tablet的序列化结果可能有上百兆,为了能够高效的实现大`Tablet`的序列化,我们对ByteBuffer使用`内存倍增法`的策略来减少序列化过程中对于内存的申请和释放。即当当前的buffer的长度不足以放下序列化结果时,我们将当前buffer的内存`至少`扩增2倍。这极大的减少了内存的申请释放次数,加速了大Tablet的序列化速度。 +```csharp +private void extend_buffer(int space_need){ + if(write_pos + space_need >= total_length){ + total_length = max(space_need, total_length); + byte[] new_buffer = new byte[total_length * 2]; + buffer.CopyTo(new_buffer, 0); + buffer = new_buffer; + total_length = 2 * total_length; + } +} +``` +同时在序列化`Tablet`时,我们首先根据Tablet的`行数`,`列数`以及每一列的数据类型估计当前`Tablet`序列化结果所需要的内存大小,并在初始化时进行内存的申请。这进一步的减少了内存的申请释放频率。 + 
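+
+作为补充,下面给出一个按上述思路估算初始缓冲区大小的简化示意(并非客户端源码,其中的枚举与方法名仅为说明用途,实际实现请以 Apache.IoTDB 客户端代码为准):
+
+```csharp
+using System.Collections.Generic;
+
+// 示意用的数据类型枚举,字段与上文介绍的 TSDataType 保持一致
+public enum TSDataType { BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT, NONE }
+
+public static class TabletBufferEstimator
+{
+    // 根据行数与各列数据类型估算 Tablet 序列化结果所需的初始缓冲区字节数
+    public static int Estimate(int rowNumber, List<TSDataType> columnTypes)
+    {
+        var size = 8 * rowNumber; // 每行一个 8 字节(long)时间戳
+        foreach (var type in columnTypes)
+        {
+            switch (type)
+            {
+                case TSDataType.BOOLEAN:
+                    size += 1 * rowNumber;
+                    break;
+                case TSDataType.INT32:
+                case TSDataType.FLOAT:
+                    size += 4 * rowNumber;
+                    break;
+                case TSDataType.INT64:
+                case TSDataType.DOUBLE:
+                    size += 8 * rowNumber;
+                    break;
+                default:
+                    size += 32 * rowNumber; // TEXT 等变长类型无法精确预估,按经验值预留
+                    break;
+            }
+        }
+        return size;
+    }
+}
+```
+
+例如,对一个 20000 行、包含两列 INT32 与一列 DOUBLE 的 Tablet,估算结果约为 20000 × (8 + 4 + 4 + 8) ≈ 480 KB,序列化前可据此一次性申请缓冲区,从而减少扩容次数。
+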
+通过上述的策略,我们在一个有`20000`行的Tablet上进行测试时,序列化速度相比Naive数组长度动态生长实现算法具有约35倍的性能加速。 + +## 异常重连 + +当服务端发生异常或者宕机重启时,客户端中原来通过`Open()`产生的的session会失效,抛出`TException`异常 + +为了避免这一情况的发生,我们对大部分的接口进行了增强,一旦出现连接问题,就会尝试重新调用`Open()`接口并创建新的Session,并尝试重新发送对应的请求 + diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Cpp-Native-API.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Cpp-Native-API.md new file mode 100644 index 00000000..77636edf --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Cpp-Native-API.md @@ -0,0 +1,431 @@ + + +# C++ 原生接口 + +## 依赖 + +- Java 8+ +- Flex +- Bison 2.7+ +- Boost 1.56+ +- OpenSSL 1.0+ +- GCC 5.5.0+ + + +## 安装 + +### 安装相关依赖 + +- **MAC** + 1. 安装 Bison : + + 使用下面 brew 命令安装 bison 版本: + ```shell + brew install bison + ``` + + 2. 安装 Boost :确保安装最新的 Boost 版本。 + + ```shell + brew install boost + ``` + + 3. 检查 OpenSSL :确保 openssl 库已安装,默认的 openssl 头文件路径为"/usr/local/opt/openssl/include" + + 如果在编译过程中出现找不到 openssl 的错误,尝试添加`-Dopenssl.include.dir=""` + + +- **Ubuntu 16.04+ 或其他 Debian 系列** + + 使用以下命令安装所赖: + + ```shell + sudo apt-get update + sudo apt-get install gcc g++ bison flex libboost-all-dev libssl-dev + ``` + + +- **CentOS 7.7+/Fedora/Rocky Linux 或其他 Red-hat 系列** + + 使用 yum 命令安装依赖: + + ```shell + sudo yum update + sudo yum install gcc gcc-c++ boost-devel bison flex openssl-devel + ``` + + +- **Windows** + +1. 构建编译环境 + - 安装 MS Visual Studio(推荐安装 2019+ 版本):安装时需要勾选 Visual Studio C/C++ IDE and compiler(supporting CMake, Clang, MinGW) + - 下载安装 [CMake](https://cmake.org/download/) 。 + +2. 下载安装 Flex、Bison + - 下载 [Win_Flex_Bison](https://sourceforge.net/projects/winflexbison/) + - 下载后需要将可执行文件重命名为 flex.exe 和 bison.exe 以保证编译时能够被找到,添加可执行文件的目录到 PATH 环境变量中 + +3. 安装 Boost 库 + - 下载 [Boost](https://www.boost.org/users/download/) + - 本地编译 Boost :依次执行 bootstrap.bat 和 b2.exe + - 添加 Boost 安装目录到 PATH 环境变量中,例如 `C:\Program Files (x86)\boost_1_78_0` + +4. 安装 OpenSSL + - 下载安装 [OpenSSL](http://slproweb.com/products/Win32OpenSSL.html) + - 添加 OpenSSL 下的 include 目录到 PATH 环境变量中 + + +### 执行编译 + +从 git 克隆源代码: +```shell +git clone https://github.com/apache/iotdb.git +``` + +默认的主分支是 master 分支,如果你想使用某个发布版本,请切换分支 (如 1.3.2 版本): +```shell +git checkout rc/1.3.2 +``` + +在 IoTDB 根目录下执行 maven 编译: + +- Mac 或 glibc 版本 >= 2.32 的 Linux + ```shell + ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp + ``` + +- glibc 版本 >= 2.31 的 Linux + ```shell + ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Diotdb-tools-thrift.version=0.14.1.1-old-glibc-SNAPSHOT + ``` + +- glibc 版本 >= 2.17 的 Linux + ```shell + ./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Diotdb-tools-thrift.version=0.14.1.1-glibc223-SNAPSHOT + ``` + +- 使用 Visual Studio 2022 的 Windows + ```Batchfile + .\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp + ``` + +- 使用 Visual Studio 2019 的 Windows + ```Batchfile + .\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Dcmake.generator="Visual Studio 16 2019" -Diotdb-tools-thrift.version=0.14.1.1-msvc142-SNAPSHOT + ``` + - 如果没有将 Boost 库地址加入 PATH 环境变量,在编译命令中还需添加相关参数,例如:`-DboostIncludeDir="C:\Program Files (x86)\boost_1_78_0" -DboostLibraryDir="C:\Program Files (x86)\boost_1_78_0\stage\lib"` + +编译成功后,打包好的库文件位于 `iotdb-client/client-cpp/target` 中,同时可以在 `example/client-cpp-example/target` 下找到编译好的示例程序。 + +### 编译 Q&A + +Q:Linux 上的环境有哪些要求呢? 
+ +A: +- 已知依赖的 glibc (x86_64 版本) 最低版本要求为 2.17,GCC 最低版本为 5.5 +- 已知依赖的 glibc (ARM 版本) 最低版本要求为 2.31,GCC 最低版本为 10.2 +- 如果不满足上面的要求,可以尝试自己本地编译 Thrift + - 下载 https://github.com/apache/iotdb-bin-resources/tree/iotdb-tools-thrift-v0.14.1.0/iotdb-tools-thrift 这里的代码 + - 执行 `./mvnw clean install` + - 回到 iotdb 代码目录执行 `./mvnw clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp` + + +Q:Linux 编译报错`undefined reference to '_libc_sinle_thread'`如何处理? + +A: +- 该问题是用于默认的预编译 Thrift 依赖了过高的 glibc 版本导致的 +- 可以尝试在编译的 maven 命令中增加 `-Diotdb-tools-thrift.version=0.14.1.1-glibc223-SNAPSHOT` 或者 `-Diotdb-tools-thrift.version=0.14.1.1-old-glibc-SNAPSHOT` + +Q:如果在 Windows 上需要使用 Visual Studio 2017 或更早版本进行编译,要怎么做? + +A: +- 可以尝试自己本地编译 Thrift 后再进行客户端的编译 + - 下载 https://github.com/apache/iotdb-bin-resources/tree/iotdb-tools-thrift-v0.14.1.0/iotdb-tools-thrift 这里的代码 + - 执行 `.\mvnw.cmd clean install` + - 回到 iotdb 代码目录执行 `.\mvnw.cmd clean package -pl example/client-cpp-example -am -DskipTests -P with-cpp -Dcmake.generator="Visual Studio 15 2017"` + +## 基本接口说明 + +下面将给出 Session 接口的简要介绍和原型定义: + +### 初始化 + +- 开启 Session +```cpp +void open(); +``` + +- 开启 Session,并决定是否开启 RPC 压缩 +```cpp +void open(bool enableRPCCompression); +``` + 注意: 客户端的 RPC 压缩开启状态需和服务端一致。 + +- 关闭 Session +```cpp +void close(); +``` + +### 数据定义接口(DDL) + +#### Database 管理 + +- 设置 database +```cpp +void setStorageGroup(const std::string &storageGroupId); +``` + +- 删除单个或多个 database +```cpp +void deleteStorageGroup(const std::string &storageGroup); +void deleteStorageGroups(const std::vector &storageGroups); +``` + +#### 时间序列管理 + +- 创建单个或多个非对齐时间序列 +```cpp +void createTimeseries(const std::string &path, TSDataType::TSDataType dataType, TSEncoding::TSEncoding encoding, + CompressionType::CompressionType compressor); + +void createMultiTimeseries(const std::vector &paths, + const std::vector &dataTypes, + const std::vector &encodings, + const std::vector &compressors, + std::vector> *propsList, + std::vector> *tagsList, + std::vector> *attributesList, + std::vector *measurementAliasList); +``` + +- 创建对齐时间序列 +```cpp +void createAlignedTimeseries(const std::string &deviceId, + const std::vector &measurements, + const std::vector &dataTypes, + const std::vector &encodings, + const std::vector &compressors); +``` + +- 删除一个或多个时间序列 +```cpp +void deleteTimeseries(const std::string &path); +void deleteTimeseries(const std::vector &paths); +``` + +- 检查时间序列是否存在 +```cpp +bool checkTimeseriesExists(const std::string &path); +``` + +#### 元数据模版 + +- 创建元数据模板 +```cpp +void createSchemaTemplate(const Template &templ); +``` + +- 挂载元数据模板 +```cpp +void setSchemaTemplate(const std::string &template_name, const std::string &prefix_path); +``` +请注意,如果一个子树中有多个孩子节点需要使用模板,可以在其共同父母节点上使用 setSchemaTemplate 。而只有在已有数据点插入模板对应的物理量时,模板才会被设置为激活状态,进而被 show timeseries 等查询检测到。 + +- 卸载元数据模板 +```cpp +void unsetSchemaTemplate(const std::string &prefix_path, const std::string &template_name); +``` +注意:目前不支持从曾经在`prefixPath`路径及其后代节点使用模板插入数据后(即使数据已被删除)卸载模板。 + +- 在创建概念元数据模板以后,还可以通过以下接口增加或删除模板内的物理量。请注意,已经挂载的模板不能删除内部的物理量。 +```cpp +// 为指定模板新增一组对齐的物理量,若其父节点在模板中已经存在,且不要求对齐,则报错 +void addAlignedMeasurementsInTemplate(const std::string &template_name, + const std::vector &measurements, + const std::vector &dataTypes, + const std::vector &encodings, + const std::vector &compressors); + +// 为指定模板新增一个对齐物理量, 若其父节点在模板中已经存在,且不要求对齐,则报错 +void addAlignedMeasurementsInTemplate(const std::string &template_name, + const std::string &measurement, + TSDataType::TSDataType dataType, + TSEncoding::TSEncoding encoding, + 
CompressionType::CompressionType compressor); + +// 为指定模板新增一个不对齐物理量, 若其父节在模板中已经存在,且要求对齐,则报错 +void addUnalignedMeasurementsInTemplate(const std::string &template_name, + const std::vector &measurements, + const std::vector &dataTypes, + const std::vector &encodings, + const std::vector &compressors); + +// 为指定模板新增一组不对齐的物理量, 若其父节在模板中已经存在,且要求对齐,则报错 +void addUnalignedMeasurementsInTemplate(const std::string &template_name, + const std::string &measurement, + TSDataType::TSDataType dataType, + TSEncoding::TSEncoding encoding, + CompressionType::CompressionType compressor); + +// 从指定模板中删除一个节点及其子树 +void deleteNodeInTemplate(const std::string &template_name, const std::string &path); +``` + +- 对于已经创建的元数据模板,还可以通过以下接口查询模板信息: +```cpp +// 查询返回目前模板中所有物理量的数量 +int countMeasurementsInTemplate(const std::string &template_name); + +// 检查模板内指定路径是否为物理量 +bool isMeasurementInTemplate(const std::string &template_name, const std::string &path); + +// 检查在指定模板内是否存在某路径 +bool isPathExistInTemplate(const std::string &template_name, const std::string &path); + +// 返回指定模板内所有物理量的路径 +std::vector showMeasurementsInTemplate(const std::string &template_name); + +// 返回指定模板内某前缀路径下的所有物理量的路径 +std::vector showMeasurementsInTemplate(const std::string &template_name, const std::string &pattern); +``` + + +### 数据操作接口(DML) + +#### 数据写入 + +> 推荐使用 insertTablet 帮助提高写入效率。 + +- 插入一个 Tablet,Tablet 是一个设备若干行数据块,每一行的列都相同。 + - 写入效率高。 + - 支持写入空值:空值处可以填入任意值,然后通过 BitMap 标记空值。 +```cpp +void insertTablet(Tablet &tablet); +``` + +- 插入多个 Tablet +```cpp +void insertTablets(std::unordered_map &tablets); +``` + +- 插入一个 Record,一个 Record 是一个设备一个时间戳下多个测点的数据 +```cpp +void insertRecord(const std::string &deviceId, int64_t time, const std::vector &measurements, + const std::vector &types, const std::vector &values); +``` + +- 插入多个 Record +```cpp +void insertRecords(const std::vector &deviceIds, + const std::vector ×, + const std::vector> &measurementsList, + const std::vector> &typesList, + const std::vector> &valuesList); +``` + +- 插入同属于一个 device 的多个 Record +```cpp +void insertRecordsOfOneDevice(const std::string &deviceId, + std::vector ×, + std::vector> &measurementsList, + std::vector> &typesList, + std::vector> &valuesList); +``` + +#### 带有类型推断的写入 + +服务器需要做类型推断,可能会有额外耗时,速度较无需类型推断的写入慢。 + +```cpp +void insertRecord(const std::string &deviceId, int64_t time, const std::vector &measurements, + const std::vector &values); + + +void insertRecords(const std::vector &deviceIds, + const std::vector ×, + const std::vector> &measurementsList, + const std::vector> &valuesList); + + +void insertRecordsOfOneDevice(const std::string &deviceId, + std::vector ×, + std::vector> &measurementsList, + const std::vector> &valuesList); +``` + +#### 对齐时间序列写入 + +对齐时间序列的写入使用 insertAlignedXXX 接口,其余与上述接口类似: + +- insertAlignedRecord +- insertAlignedRecords +- insertAlignedRecordsOfOneDevice +- insertAlignedTablet +- insertAlignedTablets + +#### 数据删除 + +- 删除一个或多个时间序列在某个时间范围的数据 +```cpp +void deleteData(const std::string &path, int64_t endTime); +void deleteData(const std::vector &paths, int64_t endTime); +void deleteData(const std::vector &paths, int64_t startTime, int64_t endTime); +``` + +### IoTDB-SQL 接口 + +- 执行查询语句 +```cpp +unique_ptr executeQueryStatement(const std::string &sql); +``` + +- 执行非查询语句 +```cpp +void executeNonQueryStatement(const std::string &sql); +``` + + +## 示例代码 + +示例工程源代码: + +- `example/client-cpp-example/src/SessionExample.cpp` +- `example/client-cpp-example/src/AlignedTimeseriesSessionExample.cpp` (使用对齐时间序列) + +编译成功后,示例代码工程位于 `example/client-cpp-example/target` 
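+
+在查阅完整示例之前,下面先给出一个串联上述接口的最小使用示意,便于快速了解整体调用流程(并非上述示例工程的源码;其中 Session 的构造函数与结果集的遍历方式是基于常见用法的假设,具体请以实际头文件和 `SessionExample.cpp` 为准):
+
+```cpp
+#include <iostream>
+#include <memory>
+#include <string>
+#include <vector>
+
+#include "Session.h"  // IoTDB C++ 客户端头文件,以实际工程中的路径为准
+
+int main() {
+    // 地址、端口、用户名、密码请按实际环境修改
+    Session session("127.0.0.1", 6667, "root", "root");
+    session.open(false);  // 是否开启 RPC 压缩需与服务端保持一致
+
+    // 创建 database 与时间序列,并以类型推断方式写入一个 Record
+    session.setStorageGroup("root.demo");
+    session.createTimeseries("root.demo.d1.s1", TSDataType::INT32,
+                             TSEncoding::RLE, CompressionType::SNAPPY);
+    std::vector<std::string> measurements{"s1"};
+    std::vector<std::string> values{"42"};
+    session.insertRecord("root.demo.d1", 1, measurements, values);
+
+    // 执行查询并逐行打印结果
+    std::unique_ptr<SessionDataSet> dataSet =
+        session.executeQueryStatement("select s1 from root.demo.d1");
+    while (dataSet->hasNext()) {
+        std::cout << dataSet->next()->toString() << std::endl;
+    }
+
+    session.close();
+    return 0;
+}
+```
+
+更完整的写入(insertTablet、对齐时间序列等)与删除用法,请参考上述示例工程源代码。
+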
+ +## FAQ + +### Thrift 编译相关问题 + +1. MAC:本地 Maven 编译 Thrift 时如出现以下链接的问题,可以尝试将 xcode-commandline 版本从 12 降低到 11.5 + https://stackoverflow.com/questions/63592445/ld-unsupported-tapi-file-type-tapi-tbd-in-yaml-file/65518087#65518087 + + +2. Windows:Maven 编译 Thrift 时需要使用 wget 下载远端文件,可能出现以下报错: + ``` + Failed to delete cached file C:\Users\Administrator\.m2\repository\.cache\download-maven-plugin\index.ser + ``` + + 解决方法: + - 尝试删除 ".m2\repository\\.cache\" 目录并重试。 + - 在添加 pom 文件对应的 download-maven-plugin 中添加 "\true\" diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Go-Native-API.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Go-Native-API.md new file mode 100644 index 00000000..303e791e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Go-Native-API.md @@ -0,0 +1,84 @@ + + +# Go 原生接口 + +## 依赖 + + * golang >= 1.13 + * make >= 3.0 + * curl >= 7.1.1 + * thrift 0.15.0 + * Linux、Macos 或其他类 unix 系统 + * Windows+bash (下载 IoTDB Go client 需要 git ,通过 WSL、cygwin、Git Bash 任意一种方式均可) + +## 安装方法 + + * 通过 go mod + +```sh +# 切换到 GOPATH 的 HOME 路径,启用 Go Modules 功能 +export GO111MODULE=on + +# 配置 GOPROXY 环境变量 +export GOPROXY=https://goproxy.io + +# 创建命名的文件夹或目录,并切换当前目录 +mkdir session_example && cd session_example + +# 保存文件,自动跳转到新的地址 +curl -o session_example.go -L https://github.com/apache/iotdb-client-go/raw/main/example/session_example.go + +# 初始化 go module 环境 +go mod init session_example + +# 下载依赖包 +go mod tidy + +# 编译并运行程序 +go run session_example.go +``` + +* 通过 GOPATH + +```sh +# get thrift 0.13.0 +go get github.com/apache/thrift@0.13.0 + +# 递归创建目录 +mkdir -p $GOPATH/src/iotdb-client-go-example/session_example + +# 切换到当前目录 +cd $GOPATH/src/iotdb-client-go-example/session_example + +# 保存文件,自动跳转到新的地址 +curl -o session_example.go -L https://github.com/apache/iotdb-client-go/raw/main/example/session_example.go + +# 初始化 go module 环境 +go mod init + +# 下载依赖包 +go mod tidy + +# 编译并运行程序 +go run session_example.go +``` +**注意:GO原生客户端Session不是线程安全的,强烈不建议在多线程场景下应用。如有多线程应用场景,请使用Session Pool.** diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-JDBC.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-JDBC.md new file mode 100644 index 00000000..fc726d6c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-JDBC.md @@ -0,0 +1,291 @@ + + +# JDBC(不推荐) + +*注意: 目前的JDBC实现仅是为与第三方工具连接使用的。使用JDBC(执行插入语句时)无法提供高性能写入。 +对于Java应用,我们推荐使用[Java 原生接口](./Programming-Java-Native-API.md)* + +## 依赖 + +* JDK >= 1.8 +* Maven >= 3.6 + +## 安装方法 + +在根目录下执行下面的命令: +```shell +mvn clean install -pl iotdb-client/jdbc -am -DskipTests +``` + +### 在 MAVEN 中使用 IoTDB JDBC + +```xml + + + org.apache.iotdb + iotdb-jdbc + 1.3.1 + + +``` + +### 示例代码 + +本章提供了如何建立数据库连接、执行 SQL 和显示查询结果的示例。 + +要求您已经在工程中包含了数据库编程所需引入的包和 JDBC class. + +**注意:为了更快地插入,建议使用 executeBatch()** + +```java +import java.sql.*; +import org.apache.iotdb.jdbc.IoTDBSQLException; + +public class JDBCExample { + /** + * Before executing a SQL statement with a Statement object, you need to create a Statement object using the createStatement() method of the Connection object. + * After creating a Statement object, you can use its execute() method to execute a SQL statement + * Finally, remember to close the 'statement' and 'connection' objects by using their close() method + * For statements with query results, we can use the getResultSet() method of the Statement object to get the result set. 
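+   * In this example, query results are printed by the outputResult() helper method defined below.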
+ */ + public static void main(String[] args) throws SQLException { + Connection connection = getConnection(); + if (connection == null) { + System.out.println("get connection defeat"); + return; + } + Statement statement = connection.createStatement(); + //Create database + try { + statement.execute("CREATE DATABASE root.demo"); + }catch (IoTDBSQLException e){ + System.out.println(e.getMessage()); + } + + //SHOW DATABASES + statement.execute("SHOW DATABASES"); + outputResult(statement.getResultSet()); + + //Create time series + //Different data type has different encoding methods. Here use INT32 as an example + try { + statement.execute("CREATE TIMESERIES root.demo.s0 WITH DATATYPE=INT32,ENCODING=RLE;"); + }catch (IoTDBSQLException e){ + System.out.println(e.getMessage()); + } + //Show time series + statement.execute("SHOW TIMESERIES root.demo"); + outputResult(statement.getResultSet()); + //Show devices + statement.execute("SHOW DEVICES"); + outputResult(statement.getResultSet()); + //Count time series + statement.execute("COUNT TIMESERIES root"); + outputResult(statement.getResultSet()); + //Count nodes at the given level + statement.execute("COUNT NODES root LEVEL=3"); + outputResult(statement.getResultSet()); + //Count timeseries group by each node at the given level + statement.execute("COUNT TIMESERIES root GROUP BY LEVEL=3"); + outputResult(statement.getResultSet()); + + + //Execute insert statements in batch + statement.addBatch("insert into root.demo(timestamp,s0) values(1,1);"); + statement.addBatch("insert into root.demo(timestamp,s0) values(2,15);"); + statement.addBatch("insert into root.demo(timestamp,s0) values(2,17);"); + statement.addBatch("insert into root.demo(timestamp,s0) values(4,12);"); + statement.executeBatch(); + statement.clearBatch(); + + //Full query statement + String sql = "select * from root.demo"; + ResultSet resultSet = statement.executeQuery(sql); + System.out.println("sql: " + sql); + outputResult(resultSet); + + //Exact query statement + sql = "select s0 from root.demo where time = 4;"; + resultSet= statement.executeQuery(sql); + System.out.println("sql: " + sql); + outputResult(resultSet); + + //Time range query + sql = "select s0 from root.demo where time >= 2 and time < 5;"; + resultSet = statement.executeQuery(sql); + System.out.println("sql: " + sql); + outputResult(resultSet); + + //Aggregate query + sql = "select count(s0) from root.demo;"; + resultSet = statement.executeQuery(sql); + System.out.println("sql: " + sql); + outputResult(resultSet); + + //Delete time series + statement.execute("delete timeseries root.demo.s0"); + + //close connection + statement.close(); + connection.close(); + } + + public static Connection getConnection() { + // JDBC driver name and database URL + String driver = "org.apache.iotdb.jdbc.IoTDBDriver"; + String url = "jdbc:iotdb://127.0.0.1:6667/"; + // set rpc compress mode + // String url = "jdbc:iotdb://127.0.0.1:6667?rpc_compress=true"; + + // Database credentials + String username = "root"; + String password = "root"; + + Connection connection = null; + try { + Class.forName(driver); + connection = DriverManager.getConnection(url, username, password); + } catch (ClassNotFoundException e) { + e.printStackTrace(); + } catch (SQLException e) { + e.printStackTrace(); + } + return connection; + } + + /** + * This is an example of outputting the results in the ResultSet + */ + private static void outputResult(ResultSet resultSet) throws SQLException { + if (resultSet != null) { + 
System.out.println("--------------------------"); + final ResultSetMetaData metaData = resultSet.getMetaData(); + final int columnCount = metaData.getColumnCount(); + for (int i = 0; i < columnCount; i++) { + System.out.print(metaData.getColumnLabel(i + 1) + " "); + } + System.out.println(); + while (resultSet.next()) { + for (int i = 1; ; i++) { + System.out.print(resultSet.getString(i)); + if (i < columnCount) { + System.out.print(", "); + } else { + System.out.println(); + break; + } + } + } + System.out.println("--------------------------\n"); + } + } +} +``` + +可以在 url 中指定 version 参数: +```java +String url = "jdbc:iotdb://127.0.0.1:6667?version=V_1_0"; +``` +version 表示客户端使用的 SQL 语义版本,用于升级 0.13 时兼容 0.12 的 SQL 语义,可能取值有:`V_0_12`、`V_0_13`、`V_1_0`。 + +此外,IoTDB 在 JDBC 中提供了额外的接口,供用户在连接中使用不同的字符集(例如 GB18030)读写数据库。 +IoTDB 默认的字符集为 UTF-8。当用户期望使用 UTF-8 外的字符集时,需要在 JDBC 的连接中,指定 charset 属性。例如: +1. 使用 GB18030 的 charset 创建连接: +```java +DriverManager.getConnection("jdbc:iotdb://127.0.0.1:6667?charset=GB18030", "root", "root") +``` +2. 调用如下 `IoTDBStatement` 接口执行 SQL 时,可以接受 `byte[]` 编码的 SQL,该 SQL 将按照被指定的 charset 解析成字符串。 +```java +public boolean execute(byte[] sql) throws SQLException; +``` +3. 查询结果输出时,可使用 `ResultSet` 的 `getBytes` 方法得到的 `byte[]`,`byte[]` 的编码使用连接指定的 charset 进行。 +```java +System.out.print(resultSet.getString(i) + " (" + new String(resultSet.getBytes(i), charset) + ")"); +``` +以下是完整示例: +```java +public class JDBCCharsetExample { + + private static final Logger LOGGER = LoggerFactory.getLogger(JDBCCharsetExample.class); + + public static void main(String[] args) throws Exception { + Class.forName("org.apache.iotdb.jdbc.IoTDBDriver"); + + try (final Connection connection = + DriverManager.getConnection( + "jdbc:iotdb://127.0.0.1:6667?charset=GB18030", "root", "root"); + final IoTDBStatement statement = (IoTDBStatement) connection.createStatement()) { + + final String insertSQLWithGB18030 = + "insert into root.测试(timestamp, 维语, 彝语, 繁体, 蒙文, 简体, 标点符号, 藏语) values(1, 'ئۇيغۇر تىلى', 'ꆈꌠꉙ', \"繁體\", 'ᠮᠣᠩᠭᠣᠯ ᠬᠡᠯᠡ', '简体', '——?!', \"བོད་སྐད།\");"; + final byte[] insertSQLWithGB18030Bytes = insertSQLWithGB18030.getBytes("GB18030"); + statement.execute(insertSQLWithGB18030Bytes); + } catch (IoTDBSQLException e) { + LOGGER.error("IoTDB Jdbc example error", e); + } + + outputResult("GB18030"); + outputResult("UTF-8"); + outputResult("UTF-16"); + outputResult("GBK"); + outputResult("ISO-8859-1"); + } + + private static void outputResult(String charset) throws SQLException { + System.out.println("[Charset: " + charset + "]"); + try (final Connection connection = + DriverManager.getConnection( + "jdbc:iotdb://127.0.0.1:6667?charset=" + charset, "root", "root"); + final IoTDBStatement statement = (IoTDBStatement) connection.createStatement()) { + outputResult(statement.executeQuery("select ** from root"), Charset.forName(charset)); + } catch (IoTDBSQLException e) { + LOGGER.error("IoTDB Jdbc example error", e); + } + } + + private static void outputResult(ResultSet resultSet, Charset charset) throws SQLException { + if (resultSet != null) { + System.out.println("--------------------------"); + final ResultSetMetaData metaData = resultSet.getMetaData(); + final int columnCount = metaData.getColumnCount(); + for (int i = 0; i < columnCount; i++) { + System.out.print(metaData.getColumnLabel(i + 1) + " "); + } + System.out.println(); + + while (resultSet.next()) { + for (int i = 1; ; i++) { + System.out.print( + resultSet.getString(i) + " (" + new String(resultSet.getBytes(i), charset) + ")"); + if (i < columnCount) { 
+ System.out.print(", "); + } else { + System.out.println(); + break; + } + } + } + System.out.println("--------------------------\n"); + } + } +} +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Java-Native-API.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Java-Native-API.md new file mode 100644 index 00000000..7c952a91 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Java-Native-API.md @@ -0,0 +1,793 @@ + + +# Java 原生接口 + +## 安装 + +### 依赖 + +* JDK >= 1.8 +* Maven >= 3.6 + + +### 在 MAVEN 中使用原生接口 + +```xml + + + org.apache.iotdb + iotdb-session + ${project.version} + + +``` + +## 语法说明 + + - 对于 IoTDB-SQL 接口:传入的 SQL 参数需要符合 [语法规范](../User-Manual/Syntax-Rule.md#字面值常量) ,并且针对 JAVA 字符串进行反转义,如双引号前需要加反斜杠。(即:经 JAVA 转义之后与命令行执行的 SQL 语句一致。) + - 对于其他接口: + - 经参数传入的路径或路径前缀中的节点: 在 SQL 语句中需要使用反引号(`)进行转义的,此处均需要进行转义。 + - 经参数传入的标识符(如模板名):在 SQL 语句中需要使用反引号(`)进行转义的,均可以不用进行转义。 + - 语法说明相关代码示例可以参考:`example/session/src/main/java/org/apache/iotdb/SyntaxConventionRelatedExample.java` + +## 基本接口说明 + +下面将给出 Session 对应的接口的简要介绍和对应参数: + +### Session管理 + +* 初始化 Session + +``` java +// 全部使用默认配置 +session = new Session.Builder.build(); + +// 指定一个可连接节点 +session = + new Session.Builder() + .host(String host) + .port(int port) + .build(); + +// 指定多个可连接节点 +session = + new Session.Builder() + .nodeUrls(List nodeUrls) + .build(); + +// 其他配置项 +session = + new Session.Builder() + .fetchSize(int fetchSize) + .username(String username) + .password(String password) + .thriftDefaultBufferSize(int thriftDefaultBufferSize) + .thriftMaxFrameSize(int thriftMaxFrameSize) + .enableRedirection(boolean enableRedirection) + .version(Version version) + .build(); +``` + +其中,version 表示客户端使用的 SQL 语义版本,用于升级 0.13 时兼容 0.12 的 SQL 语义,可能取值有:`V_0_12`、`V_0_13`、`V_1_0`等。 + + +* 开启 Session + +``` java +void open() +``` + +* 开启 Session,并决定是否开启 RPC 压缩 + +``` java +void open(boolean enableRPCCompression) +``` + +注意: 客户端的 RPC 压缩开启状态需和服务端一致 + +* 关闭 Session + +``` java +void close() +``` + +* SessionPool + +我们提供了一个针对原生接口的连接池 (`SessionPool`),使用该接口时,你只需要指定连接池的大小,就可以在使用时从池中获取连接。 +如果超过 60s 都没得到一个连接的话,那么会打印一条警告日志,但是程序仍将继续等待。 + +当一个连接被用完后,他会自动返回池中等待下次被使用; +当一个连接损坏后,他会从池中被删除,并重建一个连接重新执行用户的操作; +你还可以像创建 Session 那样在创建 SessionPool 时指定多个可连接节点的 url,以保证分布式集群中客户端的高可用性。 + +对于查询操作: + +1. 使用 SessionPool 进行查询时,得到的结果集是`SessionDataSet`的封装类`SessionDataSetWrapper`; +2. 若对于一个查询的结果集,用户并没有遍历完且不再想继续遍历时,需要手动调用释放连接的操作`closeResultSet`; +3. 若对一个查询的结果集遍历时出现异常,也需要手动调用释放连接的操作`closeResultSet`. +4. 
可以调用 `SessionDataSetWrapper` 的 `getColumnNames()` 方法得到结果集列名 + +使用示例可以参见 `session/src/test/java/org/apache/iotdb/session/pool/SessionPoolTest.java` + +或 `example/session/src/main/java/org/apache/iotdb/SessionPoolExample.java` + + +### 测点管理接口 + +#### Database 管理 + +* 设置 database + +``` java +void setStorageGroup(String storageGroupId) +``` + +* 删除单个或多个 database + +``` java +void deleteStorageGroup(String storageGroup) +void deleteStorageGroups(List storageGroups) +``` +#### 时间序列管理 + +* 创建单个或多个时间序列 + +``` java +void createTimeseries(String path, TSDataType dataType, + TSEncoding encoding, CompressionType compressor, Map props, + Map tags, Map attributes, String measurementAlias) + +void createMultiTimeseries(List paths, List dataTypes, + List encodings, List compressors, + List> propsList, List> tagsList, + List> attributesList, List measurementAliasList) +``` + +* 创建对齐时间序列 + +``` +void createAlignedTimeseries(String prefixPath, List measurements, + List dataTypes, List encodings, + List compressors, List measurementAliasList); +``` + +注意:目前**暂不支持**使用传感器别名。 + +* 删除一个或多个时间序列 + +``` java +void deleteTimeseries(String path) +void deleteTimeseries(List paths) +``` + +* 检测时间序列是否存在 + +``` java +boolean checkTimeseriesExists(String path) +``` + +#### 元数据模版 + +* 创建元数据模板,可以通过先后创建 Template、MeasurementNode 的对象,描述模板内物理量结构与类型、编码方式、压缩方式等信息,并通过以下接口创建模板 + +``` java +public void createSchemaTemplate(Template template); + +Class Template { + private String name; + private boolean directShareTime; + Map children; + public Template(String name, boolean isShareTime); + + public void addToTemplate(Node node); + public void deleteFromTemplate(String name); + public void setShareTime(boolean shareTime); +} + +Abstract Class Node { + private String name; + public void addChild(Node node); + public void deleteChild(Node node); +} + +Class MeasurementNode extends Node { + TSDataType dataType; + TSEncoding encoding; + CompressionType compressor; + public MeasurementNode(String name, + TSDataType dataType, + TSEncoding encoding, + CompressionType compressor); +} +``` + +通过上述类的实例描述模板时,Template 内应当仅能包含单层的 MeasurementNode,具体可以参见如下示例: + +``` java +MeasurementNode nodeX = new MeasurementNode("x", TSDataType.FLOAT, TSEncoding.RLE, CompressionType.SNAPPY); +MeasurementNode nodeY = new MeasurementNode("y", TSDataType.FLOAT, TSEncoding.RLE, CompressionType.SNAPPY); +MeasurementNode nodeSpeed = new MeasurementNode("speed", TSDataType.DOUBLE, TSEncoding.GORILLA, CompressionType.SNAPPY); + +// This is the template we suggest to implement +Template flatTemplate = new Template("flatTemplate"); +template.addToTemplate(nodeX); +template.addToTemplate(nodeY); +template.addToTemplate(nodeSpeed); + +createSchemaTemplate(flatTemplate); +``` + +* 完成模板挂载操作后,可以通过如下的接口在给定的设备上使用模板注册序列,或者也可以直接向相应的设备写入数据以自动使用模板注册序列。 + +``` java +void createTimeseriesUsingSchemaTemplate(List devicePathList) +``` + +* 将名为'templateName'的元数据模板挂载到'prefixPath'路径下,在执行这一步之前,你需要创建名为'templateName'的元数据模板 +* **请注意,我们强烈建议您将模板设置在 database 或 database 下层的节点中,以更好地适配未来版本更新及各模块的协作** + +``` java +void setSchemaTemplate(String templateName, String prefixPath) +``` + +- 将模板挂载到 MTree 上之后,你可以随时查询所有模板的名称、某模板被设置到 MTree 的所有路径、所有正在使用某模板的所有路径,即如下接口: + +``` java +/** @return All template names. */ +public List showAllTemplates(); + +/** @return All paths have been set to designated template. */ +public List showPathsTemplateSetOn(String templateName); + +/** @return All paths are using designated template. 
*/ +public List showPathsTemplateUsingOn(String templateName) +``` + +- 如果你需要删除某一个模板,请确保在进行删除之前,MTree 上已经没有节点被挂载了模板,对于已经被挂载模板的节点,可以用如下接口卸载模板; + + +``` java +void unsetSchemaTemplate(String prefixPath, String templateName); +public void dropSchemaTemplate(String templateName); +``` + +* 请注意,如果一个子树中有多个孩子节点需要使用模板,可以在其共同父母节点上使用 setSchemaTemplate 。而只有在已有数据点插入模板对应的物理量时,模板才会被设置为激活状态,进而被 show timeseries 等查询检测到。 +* 卸载'prefixPath'路径下的名为'templateName'的元数据模板。你需要保证给定的路径'prefixPath'下需要有名为'templateName'的元数据模板。 + +注意:目前不支持从曾经在'prefixPath'路径及其后代节点使用模板插入数据后(即使数据已被删除)卸载模板。 + + +### 数据写入接口 + +推荐使用 insertTablet 帮助提高写入效率 + +* 插入一个 Tablet,Tablet 是一个设备若干行数据块,每一行的列都相同 + * **写入效率高** + * **支持批量写入** + * **支持写入空值**:空值处可以填入任意值,然后通过 BitMap 标记空值 + +``` java +void insertTablet(Tablet tablet) + +public class Tablet { + /** deviceId of this tablet */ + public String prefixPath; + /** the list of measurement schemas for creating the tablet */ + private List schemas; + /** timestamps in this tablet */ + public long[] timestamps; + /** each object is a primitive type array, which represents values of one measurement */ + public Object[] values; + /** each bitmap represents the existence of each value in the current column. */ + public BitMap[] bitMaps; + /** the number of rows to include in this tablet */ + public int rowSize; + /** the maximum number of rows for this tablet */ + private int maxRowNumber; + /** whether this tablet store data of aligned timeseries or not */ + private boolean isAligned; +} +``` + +* 插入多个 Tablet + +``` java +void insertTablets(Map tablets) +``` + +* 插入一个 Record,一个 Record 是一个设备一个时间戳下多个测点的数据。这里的 value 是 Object 类型,相当于提供了一个公用接口,后面可以通过 TSDataType 将 value 强转为原类型 + + 其中,Object 类型与 TSDataType 类型的对应关系如下表所示: + + | TSDataType | Object | + |------------|--------------| + | BOOLEAN | Boolean | + | INT32 | Integer | + | DATE | LocalDate | + | INT64 | Long | + | TIMESTAMP | Long | + | FLOAT | Float | + | DOUBLE | Double | + | TEXT | String, Binary | + | STRING | String, Binary | + | BLOB | Binary | + +``` java +void insertRecord(String prefixPath, long time, List measurements, + List types, List values) +``` + +* 插入多个 Record + +``` java +void insertRecords(List deviceIds, + List times, + List> measurementsList, + List> typesList, + List> valuesList) +``` + +* 插入同属于一个 device 的多个 Record + +``` java +void insertRecordsOfOneDevice(String deviceId, List times, + List> measurementsList, List> typesList, + List> valuesList) +``` + +#### 带有类型推断的写入 + +当数据均是 String 类型时,我们可以使用如下接口,根据 value 的值进行类型推断。例如:value 为 "true" ,就可以自动推断为布尔类型。value 为 "3.2" ,就可以自动推断为数值类型。服务器需要做类型推断,可能会有额外耗时,速度较无需类型推断的写入慢 + +* 插入一个 Record,一个 Record 是一个设备一个时间戳下多个测点的数据 + +``` java +void insertRecord(String prefixPath, long time, List measurements, List values) +``` + +* 插入多个 Record + +``` java +void insertRecords(List deviceIds, List times, + List> measurementsList, List> valuesList) +``` + +* 插入同属于一个 device 的多个 Record + +``` java +void insertStringRecordsOfOneDevice(String deviceId, List times, + List> measurementsList, List> valuesList) +``` + +#### 对齐时间序列的写入 + +对齐时间序列的写入使用 insertAlignedXXX 接口,其余与上述接口类似: + +* insertAlignedRecord +* insertAlignedRecords +* insertAlignedRecordsOfOneDevice +* insertAlignedStringRecordsOfOneDevice +* insertAlignedTablet +* insertAlignedTablets + +### 数据删除接口 + +* 删除一个或多个时间序列在某个时间点前或这个时间点的数据 + +``` java +void deleteData(String path, long endTime) +void deleteData(List paths, long endTime) +``` + +### 数据查询接口 + +* 时间序列原始数据范围查询: + - 指定的查询时间范围为左闭右开区间,包含开始时间但不包含结束时间。 + +``` java +SessionDataSet executeRawDataQuery(List paths, long 
startTime, long endTime); +``` + +* 最新点查询: + - 查询最后一条时间戳大于等于某个时间点的数据。 + ``` java + SessionDataSet executeLastDataQuery(List paths, long lastTime); + ``` + - 快速查询单设备下指定序列最新点,支持重定向;如果您确认使用的查询路径是合法的,可将`isLegalPathNodes`置为true以避免路径校验带来的性能损失。 + ``` java + SessionDataSet executeLastDataQueryForOneDevice( + String db, String device, List sensors, boolean isLegalPathNodes); + ``` + +* 聚合查询: + - 支持指定查询时间范围。指定的查询时间范围为左闭右开区间,包含开始时间但不包含结束时间。 + - 支持按照时间区间分段查询。 + +``` java +SessionDataSet executeAggregationQuery(List paths, List aggregations); + +SessionDataSet executeAggregationQuery( + List paths, List aggregations, long startTime, long endTime); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval, + long slidingStep); +``` + +* 直接执行查询语句 + +``` java +SessionDataSet executeQueryStatement(String sql) +``` + +### 数据订阅 + +#### 1 Topic 管理 + +IoTDB 订阅客户端中的 `SubscriptionSession` 类提供了 Topic 管理的相关接口。Topic状态变化如下图所示: + +
+ +
+ +##### 1.1 创建 Topic + +```Java + void createTopicIfNotExists(String topicName, Properties properties) throws Exception; +``` + +示例: + +```Java +try (final SubscriptionSession session = new SubscriptionSession(host, port)) { + session.open(); + final Properties config = new Properties(); + config.put(TopicConstant.PATH_KEY, "root.db.**"); + session.createTopic(topicName, config); +} +``` + +##### 1.2 删除 Topic + +```Java +void dropTopicIfExists(String topicName) throws Exception; +``` + +##### 1.3 查看 Topic + +```Java +// 获取所有 topics +Set getTopics() throws Exception; + +// 获取单个 topic +Optional getTopic(String topicName) throws Exception; +``` + +#### 2 查看订阅状态 + +IoTDB 订阅客户端中的 `SubscriptionSession` 类提供了获取订阅状态的相关接口: + +```Java +Set getSubscriptions() throws Exception; +Set getSubscriptions(final String topicName) throws Exception; +``` + +#### 3 创建 Consumer + +在使用 JAVA 原生接口创建 consumer 时,需要指定 consumer 所应用的参数。 + +对于 `SubscriptionPullConsumer` 和 `SubscriptionPushConsumer` 而言,有以下公共配置: + +| 参数 | 是否必填(默认值) | 参数含义 | +| :-------------------------------------------- | :--------------------------------- | :----------------------------------------------------------- | +| host | optional: 127.0.0.1 | `String`: IoTDB 中某 DataNode 的 RPC host | +| port | optional: 6667 | `Integer`: IoTDB 中某 DataNode 的 RPC port | +| node-urls | optional: 127.0.0.1:6667 | `List`: IoTDB 中所有 DataNode 的 RPC 地址,可以是多个;host:port 和 node-urls 选填一个即可。当 host:port 和 node-urls 都填写了,则取 host:port 和 node-urls 的**并集**构成新的 node-urls 应用 | +| username | optional: root | `String`: IoTDB 中 DataNode 的用户名 | +| password | optional: root | `String`: IoTDB 中 DataNode 的密码 | +| groupId | optional | `String`: consumer group id,若未指定则随机分配(新的 consumer group),保证不同的 consumer group 对应的 consumer group id 均不相同 | +| consumerId | optional | `String`: consumer client id,若未指定则随机分配,保证同一个 consumer group 中每一个 consumer client id 均不相同 | +| heartbeatIntervalMs | optional: 30000 (min: 1000) | `Long`: consumer 向 IoTDB DataNode 定期发送心跳请求的间隔 | +| endpointsSyncIntervalMs | optional: 120000 (min: 5000) | `Long`: consumer 探测 IoTDB 集群节点扩缩容情况调整订阅连接的间隔 | +| fileSaveDir | optional: Paths.get(System.getProperty("user.dir"), "iotdb-subscription").toString() | `String`: consumer 订阅出的 TsFile 文件临时存放的目录路径 | +| fileSaveFsync | optional: false | `Boolean`: consumer 订阅 TsFile 的过程中是否主动调用 fsync | + + +##### 3.1 SubscriptionPushConsumer + +以下为 `SubscriptionPushConsumer` 中的特殊配置: +| 参数 | 是否必填(默认值) | 参数含义 | +| :-------------------------------------------- | :--------------------------------- | :----------------------------------------------------------- | +| ackStrategy | optional: `ACKStrategy.AFTER_CONSUME` | 消费进度的确认机制包含以下选项:`ACKStrategy.BEFORE_CONSUME`(当 consumer 收到数据时立刻提交消费进度,`onReceive` 前)`ACKStrategy.AFTER_CONSUME`(当 consumer 消费完数据再去提交消费进度,`onReceive` 后) | +| consumeListener | optional | 消费数据的回调函数,需实现 `ConsumeListener` 接口,定义消费 `SessionDataSetsHandler` 和 `TsFileHandler` 形式数据的处理逻辑 | +| autoPollIntervalMs | optional: 5000 (min: 500) | Long: consumer 自动拉取数据的时间间隔,单位为**毫秒** | +| autoPollTimeoutMs | optional: 10000 (min: 1000) | Long: consumer 每次拉取数据的超时时间,单位为**毫秒** | + +其中,`ConsumerListener` 接口定义如下: + +```Java +@FunctionInterface +interface ConsumeListener { + default ConsumeResult onReceive(Message message) { + return ConsumeResult.SUCCESS; + } +} + +enum ConsumeResult { + SUCCESS, + FAILURE, +} +``` + +##### 3.2 SubscriptionPullConsumer + +以下为 `SubscriptionPullConsumer` 中的特殊配置: + +| 参数 | 是否必填(默认值) | 参数含义 | +| :-------------------------------------------- | :--------------------------------- | 
:----------------------------------------------------------- | +| autoCommit | optional: true | Boolean: 是否自动提交消费进度如果此参数设置为 false,则需要调用 `commit` 方法来手动提交消费进度 | +| autoCommitInterval | optional: 5000 (min: 500) | Long: 自动提交消费进度的时间间隔,单位为**毫秒**仅当 autoCommit 参数为 true 的时候才会生效 | + +在创建 consumer 后,需要手动调用 consumer 的 open 方法: + +```Java +void open() throws Exception; +``` + +此时,IoTDB 订阅客户端才会校验 consumer 的配置正确性,在校验成功后 consumer 就会加入对应的 consumer group。也就是说,在打开 consumer 后,才可以使用返回的 consumer 对象进行订阅 Topic,消费数据等操作。 + +#### 4 订阅 Topic + +`SubscriptionPushConsumer` 和 `SubscriptionPullConsumer` 提供了下述 JAVA 原生接口用于订阅 Topics: + +```Java +// 订阅 topics +void subscribe(String topic) throws Exception; +void subscribe(List topics) throws Exception; +``` + +- 在 consumer 订阅 topic 前,topic 必须已经被创建,否则订阅会失败 +- 一个 consumer 在已经订阅了某个 topic 的情况下再次订阅这个 topic,不会报错 +- 如果该 consumer 所在的 consumer group 中已经有 consumers 订阅了相同的 topics,那么该 consumer 将会复用对应的消费进度 + +#### 5 消费数据 + +无论是 push 模式还是 pull 模式的 consumer: + +- 只有显式订阅了某个 topic,才会收到对应 topic 的数据 +- 若在创建后没有订阅任何 topics,此时该 consumer 无法消费到任何数据,即使该 consumer 所在的 consumer group 中其它的 consumers 订阅了一些 topics + +##### 5.1 SubscriptionPushConsumer + +SubscriptionPushConsumer 在订阅 topics 后,无需手动拉取数据,其消费数据的逻辑在创建 SubscriptionPushConsumer 指定的 `consumeListener` 配置中。 + +##### 5.2 SubscriptionPullConsumer + +SubscriptionPullConsumer 在订阅 topics 后,需要主动调用 `poll` 方法拉取数据: + +```Java +List poll(final Duration timeout) throws Exception; +List poll(final long timeoutMs) throws Exception; +List poll(final Set topicNames, final Duration timeout) throws Exception; +List poll(final Set topicNames, final long timeoutMs) throws Exception; +``` + +在 poll 方法中可以指定需要拉取的 topic 名称(如果不指定则默认拉取该 consumer 已订阅的所有 topics)和超时时间。 + +当 SubscriptionPullConsumer 配置 autoCommit 参数为 false 时,需要手动调用 commitSync 和 commitAsync 方法同步或异步提交某批数据的消费进度: + +```Java +void commitSync(final SubscriptionMessage message) throws Exception; +void commitSync(final Iterable messages) throws Exception; + +CompletableFuture commitAsync(final SubscriptionMessage message); +CompletableFuture commitAsync(final Iterable messages); +void commitAsync(final SubscriptionMessage message, final AsyncCommitCallback callback); +void commitAsync(final Iterable messages, final AsyncCommitCallback callback); +``` + +AsyncCommitCallback 类定义如下: + +```Java +public interface AsyncCommitCallback { + default void onComplete() { + // Do nothing + } + + default void onFailure(final Throwable e) { + // Do nothing + } +} +``` + +#### 6 取消订阅 + +`SubscriptionPushConsumer` 和 `SubscriptionPullConsumer` 提供了下述 JAVA 原生接口用于取消订阅并关闭 consumer: + +```Java +// 取消订阅 topics +void unsubscribe(String topic) throws Exception; +void unsubscribe(List topics) throws Exception; + +// 关闭 consumer +void close(); +``` + +- 在 topic 存在的情况下,如果一个 consumer 在没有订阅了某个 topic 的情况下取消订阅某个 topic,不会报错 +- consumer close 时会退出对应的 consumer group,同时自动 unsubscribe 该 consumer 现存订阅的所有 topics +- consumer 在 close 后生命周期即结束,无法再重新 open 订阅并消费数据 + +#### 7 代码示例 + +##### 7.1 单 Pull Consumer 消费 SessionDataSetsHandler 形式的数据 + +```Java +// Create topics +try (final SubscriptionSession session = new SubscriptionSession(HOST, PORT)) { + session.open(); + final Properties config = new Properties(); + config.put(TopicConstant.PATH_KEY, "root.db.**"); + session.createTopic(TOPIC_1, config); +} + +// Subscription: property-style ctor +final Properties config = new Properties(); +config.put(ConsumerConstant.CONSUMER_ID_KEY, "c1"); +config.put(ConsumerConstant.CONSUMER_GROUP_ID_KEY, "cg1"); + +final SubscriptionPullConsumer consumer1 = new 
SubscriptionPullConsumer(config); +consumer1.open(); +consumer1.subscribe(TOPIC_1); +while (true) { + LockSupport.parkNanos(SLEEP_NS); // wait some time + final List messages = consumer1.poll(POLL_TIMEOUT_MS); + for (final SubscriptionMessage message : messages) { + for (final SubscriptionSessionDataSet dataSet : message.getSessionDataSetsHandler()) { + System.out.println(dataSet.getColumnNames()); + System.out.println(dataSet.getColumnTypes()); + while (dataSet.hasNext()) { + System.out.println(dataSet.next()); + } + } + } + // Auto commit +} + +// Show topics and subscriptions +try (final SubscriptionSession session = new SubscriptionSession(HOST, PORT)) { + session.open(); + session.getTopics().forEach((System.out::println)); + session.getSubscriptions().forEach((System.out::println)); +} + +consumer1.unsubscribe(TOPIC_1); +consumer1.close(); +``` + +##### 7.2 多 Push Consumer 消费 TsFileHandler 形式的数据 + +```Java +// Create topics +try (final SubscriptionSession subscriptionSession = new SubscriptionSession(HOST, PORT)) { + subscriptionSession.open(); + final Properties config = new Properties(); + config.put(TopicConstant.FORMAT_KEY, TopicConstant.FORMAT_TS_FILE_HANDLER_VALUE); + subscriptionSession.createTopic(TOPIC_2, config); +} + +final List threads = new ArrayList<>(); +for (int i = 0; i < 8; ++i) { + final int idx = i; + final Thread thread = + new Thread( + () -> { + // Subscription: builder-style ctor + try (final SubscriptionPushConsumer consumer2 = + new SubscriptionPushConsumer.Builder() + .consumerId("c" + idx) + .consumerGroupId("cg2") + .fileSaveDir(System.getProperty("java.io.tmpdir")) + .ackStrategy(AckStrategy.AFTER_CONSUME) + .consumeListener( + message -> { + doSomething(message.getTsFileHandler()); + return ConsumeResult.SUCCESS; + }) + .buildPushConsumer()) { + consumer2.open(); + consumer2.subscribe(TOPIC_2); + // block the consumer main thread + Thread.sleep(Long.MAX_VALUE); + } catch (final IOException | InterruptedException e) { + throw new RuntimeException(e); + } + }); + thread.start(); + threads.add(thread); +} + +for (final Thread thread : threads) { + thread.join(); +} +``` + + +### 其他功能(直接执行SQL语句) + +``` java +void executeNonQueryStatement(String sql) +``` + +### 写入测试接口 (用于分析网络带宽) + +不实际写入数据,只将数据传输到 server 即返回 + +* 测试 insertRecord + +``` java +void testInsertRecord(String deviceId, long time, List measurements, List values) + +void testInsertRecord(String deviceId, long time, List measurements, + List types, List values) +``` + +* 测试 testInsertRecords + +``` java +void testInsertRecords(List deviceIds, List times, + List> measurementsList, List> valuesList) + +void testInsertRecords(List deviceIds, List times, + List> measurementsList, List> typesList, + List> valuesList) +``` + +* 测试 insertTablet + +``` java +void testInsertTablet(Tablet tablet) +``` + +* 测试 insertTablets + +``` java +void testInsertTablets(Map tablets) +``` + +### 示例代码 + +浏览上述接口的详细信息,请参阅代码 ```session/src/main/java/org/apache/iotdb/session/Session.java``` + +使用上述接口的示例代码在 ```example/session/src/main/java/org/apache/iotdb/SessionExample.java``` + +使用对齐时间序列和元数据模板的示例可以参见 `example/session/src/main/java/org/apache/iotdb/AlignedTimeseriesSessionExample.java` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Kafka.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Kafka.md new file mode 100644 index 00000000..61bfaab0 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Kafka.md @@ -0,0 +1,118 @@ + + +# Kafka + +[Apache Kafka](https://kafka.apache.org/) 
是一个开源的分布式事件流平台,被数千家公司用于高性能数据管道、流分析、数据集成和关键任务应用。 + +## 示例代码 + +### kafka 生产者生产数据 Java 代码示例 + +```java + Properties props = new Properties(); + props.put("bootstrap.servers", "127.0.0.1:9092"); + props.put("key.serializer", StringSerializer.class); + props.put("value.serializer", StringSerializer.class); + KafkaProducer producer = new KafkaProducer<>(props); + producer.send( + new ProducerRecord<>( + "Kafka-Test", "key", "root.kafka," + System.currentTimeMillis() + ",value,INT32,100")); + producer.close(); +``` + +### kafka 消费者接收数据 Java 代码示例 + +```java + Properties props = new Properties(); + props.put("bootstrap.servers", "127.0.0.1:9092"); + props.put("key.deserializer", StringDeserializer.class); + props.put("value.deserializer", StringDeserializer.class); + props.put("auto.offset.reset", "earliest"); + props.put("group.id", "Kafka-Test"); + KafkaConsumer kafkaConsumer = new KafkaConsumer<>(props); + kafkaConsumer.subscribe(Collections.singleton("Kafka-Test")); + ConsumerRecords records = kafkaConsumer.poll(Duration.ofSeconds(1)); + ``` + +### 存入 IoTDB 服务器的 Java 代码示例 + +```java + SessionPool pool = + new SessionPool.Builder() + .host("127.0.0.1") + .port(6667) + .user("root") + .password("root") + .maxSize(3) + .build(); + List datas = new ArrayList<>(records.count()); + for (ConsumerRecord record : records) { + datas.add(record.value()); + } + int size = datas.size(); + List deviceIds = new ArrayList<>(size); + List times = new ArrayList<>(size); + List> measurementsList = new ArrayList<>(size); + List> typesList = new ArrayList<>(size); + List> valuesList = new ArrayList<>(size); + for (String data : datas) { + String[] dataArray = data.split(","); + String device = dataArray[0]; + long time = Long.parseLong(dataArray[1]); + List measurements = Arrays.asList(dataArray[2].split(":")); + List types = new ArrayList<>(); + for (String type : dataArray[3].split(":")) { + types.add(TSDataType.valueOf(type)); + } + List values = new ArrayList<>(); + String[] valuesStr = dataArray[4].split(":"); + for (int i = 0; i < valuesStr.length; i++) { + switch (types.get(i)) { + case INT64: + values.add(Long.parseLong(valuesStr[i])); + break; + case DOUBLE: + values.add(Double.parseDouble(valuesStr[i])); + break; + case INT32: + values.add(Integer.parseInt(valuesStr[i])); + break; + case TEXT: + values.add(valuesStr[i]); + break; + case FLOAT: + values.add(Float.parseFloat(valuesStr[i])); + break; + case BOOLEAN: + values.add(Boolean.parseBoolean(valuesStr[i])); + break; + } + } + deviceIds.add(device); + times.add(time); + measurementsList.add(measurements); + typesList.add(types); + valuesList.add(values); + } + pool.insertRecords(deviceIds, times, measurementsList, typesList, valuesList); + ``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-MQTT.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-MQTT.md new file mode 100644 index 00000000..bb8dd1a0 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-MQTT.md @@ -0,0 +1,179 @@ + + +# MQTT 协议 + +[MQTT](http://mqtt.org/) 是机器对机器(M2M)/“物联网”连接协议。 + +它被设计为一种非常轻量级的发布/订阅消息传递。 + +对于与需要较小代码占用和/或网络带宽非常宝贵的远程位置的连接很有用。 + +IoTDB 支持 MQTT v3.1(OASIS 标准)协议。 +IoTDB 服务器包括内置的 MQTT 服务,该服务允许远程设备将消息直接发送到 IoTDB 服务器。 + + + +## 内置 MQTT 服务 +内置的 MQTT 服务提供了通过 MQTT 直接连接到 IoTDB 的能力。 它侦听来自 MQTT 客户端的发布消息,然后立即将数据写入存储。 +MQTT 主题与 IoTDB 时间序列相对应。 +消息有效载荷可以由 Java SPI 加载的`PayloadFormatter`格式化为事件,默认实现为`JSONPayloadFormatter` + 默认的`json`格式化程序支持两种 json 格式以及由他们组成的json数组,以下是 MQTT 消息有效负载示例: + +```json + { + "device":"root.sg.d1", + "timestamp":1586076045524, + 
"measurements":["s1","s2"], + "values":[0.530635,0.530635] + } +``` +或者 +```json + { + "device":"root.sg.d1", + "timestamps":[1586076045524,1586076065526], + "measurements":["s1","s2"], + "values":[[0.530635,0.530635], [0.530655,0.530695]] + } +``` +或者以上两者的JSON数组形式。 + + + +## MQTT 配置 +默认情况下,IoTDB MQTT 服务从`${IOTDB_HOME}/${IOTDB_CONF}/iotdb-system.properties`加载配置。 + +配置如下: + +| 名称 | 描述 | 默认 | +| ------------- |:-------------:|:------:| +| enable_mqtt_service | 是否启用 mqtt 服务 | false | +| mqtt_host | mqtt 服务绑定主机 | 127.0.0.1 | +| mqtt_port | mqtt 服务绑定端口 | 1883 | +| mqtt_handler_pool_size | 处理 mqtt 消息的处理程序池大小 | 1 | +| mqtt_payload_formatter | mqtt 消息有效负载格式化程序 | json | +| mqtt_max_message_size | mqtt 消息最大长度(字节)| 1048576 | + +## 示例代码 +以下是 mqtt 客户端将消息发送到 IoTDB 服务器的示例。 + + ```java +MQTT mqtt = new MQTT(); +mqtt.setHost("127.0.0.1", 1883); +mqtt.setUserName("root"); +mqtt.setPassword("root"); + +BlockingConnection connection = mqtt.blockingConnection(); +connection.connect(); + +Random random = new Random(); +for (int i = 0; i < 10; i++) { + String payload = String.format("{\n" + + "\"device\":\"root.sg.d1\",\n" + + "\"timestamp\":%d,\n" + + "\"measurements\":[\"s1\"],\n" + + "\"values\":[%f]\n" + + "}", System.currentTimeMillis(), random.nextDouble()); + + connection.publish("root.sg.d1.s1", payload.getBytes(), QoS.AT_LEAST_ONCE, false); +} + +connection.disconnect(); + ``` + + +## 自定义 MQTT 消息格式 + +事实上可以通过简单编程来实现 MQTT 消息的格式自定义。 +可以在源码的 `example/mqtt-customize` 项目中找到一个简单示例。 + +步骤: +1. 创建一个 Java 项目,增加如下依赖 +```xml + + org.apache.iotdb + iotdb-server + 1.3.0-SNAPSHOT + +``` +2. 创建一个实现类,实现接口 `org.apache.iotdb.db.mqtt.protocol.PayloadFormatter` + +```java +package org.apache.iotdb.mqtt.server; + +import io.netty.buffer.ByteBuf; +import org.apache.iotdb.db.protocol.mqtt.Message; +import org.apache.iotdb.db.protocol.mqtt.PayloadFormatter; + +import java.nio.charset.StandardCharsets; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; + +public class CustomizedJsonPayloadFormatter implements PayloadFormatter { + + @Override + public List format(ByteBuf payload) { + // Suppose the payload is a json format + if (payload == null) { + return null; + } + + String json = payload.toString(StandardCharsets.UTF_8); + // parse data from the json and generate Messages and put them into List ret + List ret = new ArrayList<>(); + // this is just an example, so we just generate some Messages directly + for (int i = 0; i < 2; i++) { + long ts = i; + Message message = new Message(); + message.setDevice("d" + i); + message.setTimestamp(ts); + message.setMeasurements(Arrays.asList("s1", "s2")); + message.setValues(Arrays.asList("4.0" + i, "5.0" + i)); + ret.add(message); + } + return ret; + } + + @Override + public String getName() { + // set the value of mqtt_payload_formatter in iotdb-system.properties as the following string: + return "CustomizedJson"; + } +} +``` +3. 修改项目中的 `src/main/resources/META-INF/services/org.apache.iotdb.db.protocol.mqtt.PayloadFormatter` 文件: + 将示例中的文件内容清除,并将刚才的实现类的全名(包名.类名)写入文件中。注意,这个文件中只有一行。 + 在本例中,文件内容为: `org.apache.iotdb.mqtt.server.CustomizedJsonPayloadFormatter` +4. 编译项目生成一个 jar 包: `mvn package -DskipTests` + + +在 IoTDB 服务端: +1. 创建 ${IOTDB_HOME}/ext/mqtt/ 文件夹, 将刚才的 jar 包放入此文件夹。 +2. 打开 MQTT 服务参数. (`enable_mqtt_service=true` in `conf/iotdb-system.properties`) +3. 用刚才的实现类中的 getName() 方法的返回值 设置为 `conf/iotdb-system.properties` 中 `mqtt_payload_formatter` 的值, + , 在本例中,为 `CustomizedJson` +4. 启动 IoTDB +5. 
搞定 + +More: MQTT 协议的消息不限于 json,你还可以用任意二进制。通过如下函数获得: +`payload.forEachByte()` or `payload.array`。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-NodeJS-Native-API.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-NodeJS-Native-API.md new file mode 100644 index 00000000..3bd4e132 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-NodeJS-Native-API.md @@ -0,0 +1,201 @@ + + + +# Node.js 原生接口 + +IoTDB 使用 Thrift 作为跨语言的 RPC 框架,因此可以通过 Thrift 提供的接口来实现对 IoTDB 的访问。本文档将介绍如何生成可访问 IoTDB 的原生 Node.js 接口。 + + +## 依赖 + + * JDK >= 1.8 + * Node.js >= 16.0.0 + * thrift 0.14.1 + * Linux、Macos 或其他类 unix 系统 + * Windows+bash (下载 IoTDB Go client 需要 git ,通过 WSL、cygwin、Git Bash 任意一种方式均可) + +必须安装 thrift(0.14.1 或更高版本)才能将 thrift 文件编译为 Node.js 代码。下面是官方的安装教程,最终,您应该得到一个 thrift 可执行文件。 + +``` +http://thrift.apache.org/docs/install/ +``` + + +## 编译 thrift 库,生成 Node.js 原生接口 + +1. 在 IoTDB 源代码文件夹的根目录中找到 pom.xml 文件。 +2. 打开 pom.xml 文件,找到以下内容: + +```xml + + generate-thrift-sources-java + generate-sources + + compile + + + java + ${thrift.exec.absolute.path} + ${basedir}/src/main/thrift + + +``` +3. 参考该设置,在 pom.xml 文件中添加以下内容,用来生成 Node.js 的原生接口: + +```xml + + generate-thrift-sources-nodejs + generate-sources + + compile + + + js:node + ${thrift.exec.absolute.path} + ${basedir}/src/main/thrift + **/common.thrift,**/client.thrift + ${project.build.directory}/generated-sources-nodejs + + +``` + +4. 在 IoTDB 源代码文件夹的根目录下,运行`mvn clean generate-sources`, + +这个指令将自动删除`iotdb/iotdb-protocol/thrift/target` 和 `iotdb/iotdb-protocol/thrift-commons/target`中的文件,并使用新生成的 thrift 文件重新填充该文件夹。 + +这个文件夹在 git 中会被忽略,并且**永远不应该被推到 git 中!** + +**注意**不要将`iotdb/iotdb-protocol/thrift/target` 和 `iotdb/iotdb-protocol/thrift-commons/target`上传到 git 仓库中 ! + +## 使用 Node.js 原生接口 + +将 `iotdb/iotdb-protocol/thrift/target/generated-sources-nodejs/` 和 `iotdb/iotdb-protocol/thrift-commons/target/generated-sources-nodejs/` 中的文件复制到您的项目中,即可使用。 + + +## 支持的 rpc 接口 + +``` +// 打开一个 session +TSOpenSessionResp openSession(1:TSOpenSessionReq req); + +// 关闭一个 session +TSStatus closeSession(1:TSCloseSessionReq req); + +// 执行一条 SQL 语句 +TSExecuteStatementResp executeStatement(1:TSExecuteStatementReq req); + +// 批量执行 SQL 语句 +TSStatus executeBatchStatement(1:TSExecuteBatchStatementReq req); + +// 执行查询 SQL 语句 +TSExecuteStatementResp executeQueryStatement(1:TSExecuteStatementReq req); + +// 执行插入、删除 SQL 语句 +TSExecuteStatementResp executeUpdateStatement(1:TSExecuteStatementReq req); + +// 向服务器取下一批查询结果 +TSFetchResultsResp fetchResults(1:TSFetchResultsReq req) + +// 获取元数据 +TSFetchMetadataResp fetchMetadata(1:TSFetchMetadataReq req) + +// 取消某次查询操作 +TSStatus cancelOperation(1:TSCancelOperationReq req); + +// 关闭查询操作数据集,释放资源 +TSStatus closeOperation(1:TSCloseOperationReq req); + +// 获取时区信息 +TSGetTimeZoneResp getTimeZone(1:i64 sessionId); + +// 设置时区 +TSStatus setTimeZone(1:TSSetTimeZoneReq req); + +// 获取服务端配置 +ServerProperties getProperties(); + +// 设置 database +TSStatus setStorageGroup(1:i64 sessionId, 2:string storageGroup); + +// 创建时间序列 +TSStatus createTimeseries(1:TSCreateTimeseriesReq req); + +// 创建多条时间序列 +TSStatus createMultiTimeseries(1:TSCreateMultiTimeseriesReq req); + +// 删除时间序列 +TSStatus deleteTimeseries(1:i64 sessionId, 2:list path) + +// 删除 database +TSStatus deleteStorageGroups(1:i64 sessionId, 2:list storageGroup); + +// 按行插入数据 +TSStatus insertRecord(1:TSInsertRecordReq req); + +// 按 String 格式插入一条数据 +TSStatus insertStringRecord(1:TSInsertStringRecordReq req); + +// 按列插入数据 +TSStatus insertTablet(1:TSInsertTabletReq req); + +// 按列批量插入数据 +TSStatus 
insertTablets(1:TSInsertTabletsReq req); + +// 按行批量插入数据 +TSStatus insertRecords(1:TSInsertRecordsReq req); + +// 按行批量插入同属于某个设备的数据 +TSStatus insertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// 按 String 格式批量按行插入数据 +TSStatus insertStringRecords(1:TSInsertStringRecordsReq req); + +// 测试按列插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertTablet(1:TSInsertTabletReq req); + +// 测试批量按列插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertTablets(1:TSInsertTabletsReq req); + +// 测试按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecord(1:TSInsertRecordReq req); + +// 测试按 String 格式按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertStringRecord(1:TSInsertStringRecordReq req); + +// 测试按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecords(1:TSInsertRecordsReq req); + +// 测试按行批量插入同属于某个设备的数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// 测试按 String 格式批量按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertStringRecords(1:TSInsertStringRecordsReq req); + +// 删除数据 +TSStatus deleteData(1:TSDeleteDataReq req); + +// 执行原始数据查询 +TSExecuteStatementResp executeRawDataQuery(1:TSRawDataQueryReq req); + +// 向服务器申请一个查询语句 ID +i64 requestStatementId(1:i64 sessionId); +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-ODBC.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-ODBC.md new file mode 100644 index 00000000..df78dd58 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-ODBC.md @@ -0,0 +1,146 @@ + + +# ODBC +在 JDBC 插件的基础上,IoTDB 可以通过 ODBC-JDBC 桥来支持通过 ODBC 对数据库的操作。 + +## 依赖 +* 带依赖打包的 IoTDB JDBC 插件包 +* ODBC-JDBC 桥(如 Zappy-Sys) + +## 部署方法 +### 准备 JDBC 插件包 +下载 IoTDB 源码,在根目录下执行下面的命令: +```shell +mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies +``` +之后,就可以在`iotdb-client/jdbc/target`目录下看到`iotdb-jdbc-1.3.2-SNAPSHOT-jar-with-dependencies.jar`文件。 + +### 准备 ODBC-JDBC 桥 +*注意: 这里给出的仅仅是一种 ODBC-JDBC 桥,仅作示例。读者可以自行寻找其他的 ODBC-JDBC 桥来对接 IoTDB 的 JDBC 插件。* +1. **下载 Zappy-Sys ODBC-JDBC 桥插件**: + 进入 https://zappysys.com/products/odbc-powerpack/odbc-jdbc-bridge-driver/ 网站,点击下载按钮并直接安装。 + + ![ZappySys_website.jpg](https://alioss.timecho.com/upload/ZappySys_website.jpg) + +2. **准备 IoTDB**:打开 IoTDB 集群,并任意写入一条数据。 + ```sql + IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) + ``` + +3. **部署及调试插件**: + 1. 打开 ODBC 数据源 32/64 位,取决于 Windows 的位数,一个示例的位置是 `C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Administrative Tools`。 + + ![ODBC_ADD_CN.jpg](https://alioss.timecho.com/upload/ODBC_ADD_CN.jpg) + + 2. 点击添加,选择 ZappySys JDBC Bridge。 + + ![ODBC_CREATE_CN.jpg](https://alioss.timecho.com/upload/ODBC_CREATE_CN.jpg) + + 3. 填写如下配置: + + | 配置项 | 填写内容 | 示例 | + |---------------------|-----------------------------------------------|--------------------------------------------------------------------------------------------------------------------| + | Connection String | jdbc:iotdb://:/ | jdbc:iotdb://127.0.0.1:6667/ | + | Driver Class | org.apache.iotdb.jdbc.IoTDBDriver | org.apache.iotdb.jdbc.IoTDBDriver | + | JDBC driver file(s) | IoTDB JDBC jar-with-dependencies 插件路径 | C:\Users\13361\Documents\GitHub\iotdb\iotdb-client\jdbc\target\iotdb-jdbc-1.3.2-SNAPSHOT-jar-with-dependencies.jar | + | User name | IoTDB 的用户名 | root | + | User password | IoTDB 的密码 | root | + + ![ODBC_CONNECTION.png](https://alioss.timecho.com/upload/ODBC_CONNECTION.png) + + 4. 点击 Test Connection 按钮,应该显示连接成功。 + + ![ODBC_CONFIG_CN.jpg](https://alioss.timecho.com/upload/ODBC_CONFIG_CN.jpg) + + 5. 
点击上方的 Preview, 将查询文本换为 `select * from root.**`,点击 Preview Data,应该正确显示查询结果。 + + ![ODBC_TEST.jpg](https://alioss.timecho.com/upload/ODBC_TEST.jpg) + +4. **使用 ODBC 操作数据**:正确部署后,就可以使用 Windows 的 ODBC 库,对 IoTDB 的数据进行操作。 这里给出 C# 语言的代码示例: + ```C# + using System.Data.Odbc; + + // Get a connection + var dbConnection = new OdbcConnection("DSN=ZappySys JDBC Bridge"); + dbConnection.Open(); + + // Execute the write commands to prepare data + var dbCommand = dbConnection.CreateCommand(); + dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s1) values(1715670861634, 1)"; + dbCommand.ExecuteNonQuery(); + dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s2) values(1715670861634, true)"; + dbCommand.ExecuteNonQuery(); + dbCommand.CommandText = "insert into root.Keller.Flur.Energieversorgung(time, s3) values(1715670861634, 3.1)"; + dbCommand.ExecuteNonQuery(); + + // Execute the read command + dbCommand.CommandText = "SELECT * FROM root.Keller.Flur.Energieversorgung"; + var dbReader = dbCommand.ExecuteReader(); + + // Write the output header + var fCount = dbReader.FieldCount; + Console.Write(":"); + for(var i = 0; i < fCount; i++) + { + var fName = dbReader.GetName(i); + Console.Write(fName + ":"); + } + Console.WriteLine(); + + // Output the content + while (dbReader.Read()) + { + Console.Write(":"); + for(var i = 0; i < fCount; i++) + { + var fieldType = dbReader.GetFieldType(i); + switch (fieldType.Name) + { + case "DateTime": + var dateTime = dbReader.GetInt64(i); + Console.Write(dateTime + ":"); + break; + case "Double": + if (dbReader.IsDBNull(i)) + { + Console.Write("null:"); + } + else + { + var fValue = dbReader.GetDouble(i); + Console.Write(fValue + ":"); + } + break; + default: + Console.Write(fieldType.Name + ":"); + break; + } + } + Console.WriteLine(); + } + + // Shut down gracefully + dbReader.Close(); + dbCommand.Dispose(); + dbConnection.Close(); + ``` + 运行该程序可以向 IoTDB 内写入数据,并且查询并打印写入的数据。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-OPC-UA_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-OPC-UA_timecho.md new file mode 100644 index 00000000..ea23ccfb --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-OPC-UA_timecho.md @@ -0,0 +1,256 @@ + + +# OPC UA 协议 + +## OPC UA + +OPC UA 是一种在自动化领域用于不同设备和系统之间进行通信的技术规范,用于实现跨平台、跨语言和跨网络的操作,为工业物联网提供一个可靠和安全的数据交换基础。IoTDB 中支持 OPC UA协议, IoTDB OPC Server 支持 Client/Server 和 Pub/Sub 两种通信模式。 + +### OPC UA Client/Server 模式 + +- **Client/Server 模式**:在这种模式下,IoTDB 的流处理引擎通过 OPC UA Sink 与 OPC UA 服务器(Server)建立连接。OPC UA 服务器在其地址空间(Address Space) 中维护数据,IoTDB可以请求并获取这些数据。同时,其他OPC UA客户端(Client)也能访问服务器上的数据。 + +
+ +
+ + +- 特性: + + - OPC UA 将从 Sink 收到的设备信息,按照树形模型整理到 Objects folder 下的文件夹中。 + - 每个测点都被记录为一个变量节点,并记录当前数据库中的最新值。 + +### OPC UA Pub/Sub 模式 + +- **Pub/Sub 模式**:在这种模式下,IoTDB的流处理引擎通过 OPC UA Sink 向OPC UA 服务器(Server)发送数据变更事件。这些事件被发布到服务器的消息队列中,并通过事件节点 (Event Node) 进行管理。其他OPC UA客户端(Client)可以订阅这些事件节点,以便在数据变更时接收通知。 + +
+ +
+ +- 特性: + + - 每个测点会被 OPC UA 包装成一个事件节点(EventNode)。 + + - 相关字段及其对应含义如下: + + | 字段 | 含义 | 类型(Milo) | 示例 | + | :--------- | :--------------- | :------------ | :-------------------- | + | Time | 时间戳 | DateTime | 1698907326198 | + | SourceName | 测点对应完整路径 | String | root.test.opc.sensor0 | + | SourceNode | 测点数据类型 | NodeId | Int32 | + | Message | 数据 | LocalizedText | 3.0 | + + - Event 仅会发送给所有已经监听的客户端,客户端未连接则会忽略该 Event。 + +## IoTDB OPC Server 启动方式 + +### 语法 + +创建该 Sink 的语法如下: + +```SQL +create pipe p1 + with source (...) + with processor (...) + with sink ('sink' = 'opc-ua-sink', + 'sink.opcua.tcp.port' = '12686', + 'sink.opcua.https.port' = '8443', + 'sink.user' = 'root', + 'sink.password' = 'root', + 'sink.opcua.security.dir' = '...' + ) +``` + +### 参数 + +| **参数** | **描述** | **取值范围** | **是否必填** | **默认值** | +| ---------------------------------- | ------------------------------ | -------------------------------- | ------------ | ------------------------------------------------------------ | +| sink | OPC UA SINK | String: opc-ua-sink | 必填 | | +| sink.opcua.model | OPC UA 使用的模式 | String: client-server / pub-sub | 选填 | client-server | +| sink.opcua.tcp.port | OPC UA 的 TCP 端口 | Integer: [0, 65536] | 选填 | 12686 | +| sink.opcua.https.port | OPC UA 的 HTTPS 端口 | Integer: [0, 65536] | 选填 | 8443 | +| sink.opcua.security.dir | OPC UA 的密钥及证书目录 | String: Path,支持绝对及相对目录 | 选填 | iotdb 相关 DataNode 的 conf 目录下的 opc_security 文件夹 /
如无 iotdb 的 conf 目录(例如 IDEA 中启动 DataNode),则为用户主目录下的 iotdb_opc_security 文件夹 / | +| sink.opcua.enable-anonymous-access | OPC UA 是否允许匿名访问 | Boolean | 选填 | true | +| sink.user | 用户,这里指 OPC UA 的允许用户 | String | 选填 | root | +| sink.password | 密码,这里指 OPC UA 的允许密码 | String | 选填 | root | + +### 示例 + +```Bash +create pipe p1 + with sink ('sink' = 'opc-ua-sink', + 'sink.user' = 'root', + 'sink.password' = 'root'); +start pipe p1; +``` + +### 使用限制 + +1. **必须存在 DataRegion**:在 IoTDB 有 dataRegion 时,OPC UA 的服务器才会启动。因此,对于一个空的 IoTDB,需要写入一条数据,OPC UA 的服务器才有效。 +2. **需连接才有数据**:每一个订阅该服务器的客户端,不会收到 OPC Server 在连接之前写入IoTDB的数据。 + +3. **多 DataNode 会有分散发送 / 冲突问题**: + + - 对于有多个 dataRegion,且分散在不同 DataNode ip上的 IoTDB 集群,数据会在 dataRegion 的 leader 上分散发送。客户端需要对 DataNode ip 的配置端口分别监听。 + + - 建议在 1C1D 下使用该 OPC UA 服务器。 + +4. **不支持删除数据和修改测点类型:**在Client Server模式下,OPC UA无法删除数据或者改变数据类型的设置。而在Pub Sub模式下,如果数据被删除了,信息是无法推送给客户端的。 + +## IoTDB OPC Server 示例 + +### Client / Server 模式 + +#### 准备工作 + +1. 此处以UAExpert客户端为例,下载 UAExpert 客户端:https://www.unified-automation.com/downloads/opc-ua-clients.html + +2. 安装 UAExpert,填写自身的证书等信息。 + +#### 快速开始 + +1. 使用如下 sql,创建并启动 client-server 模式的 OPC UA Sink。详细语法参见上文:[IoTDB OPC Server语法](#语法) + +```SQL +create pipe p1 with sink ('sink'='opc-ua-sink'); +``` + +2. 写入部分数据。 + +```SQL +insert into root.test.db(time, s2) values(now(), 2) +``` + +​ 此处自动创建元数据开启。 + +3. 在 UAExpert 中配置 iotdb 的连接,其中 password 填写为上述参数配置中 sink.password 中设定的密码(此处以默认密码root为例): + +
+ +
+ +
+ +
+ +4. 信任服务器的证书后,在左侧 Objects folder 即可看到写入的数据。 + +
+ +
+ +
+ +
+ +5. 可以将左侧节点拖动到中间,并展示该节点的最新值: + +
+ +
+ +### Pub / Sub 模式 + +#### 准备工作 + +该代码位于 iotdb-example 包下的 [opc-ua-sink 文件夹](https://github.com/apache/iotdb/tree/master/example/pipe-opc-ua-sink/src/main/java/org/apache/iotdb/opcua)中 + +代码中包含: + +- 主类(ClientTest) +- Client 证书相关的逻辑(IoTDBKeyStoreLoaderClient) +- Client 的配置及启动逻辑(ClientExampleRunner) +- ClientTest 的父类(ClientExample) + +### 快速开始 + +使用步骤为: + +1. 打开 IoTDB 并写入部分数据。 + +```SQL +insert into root.a.b(time, c, d) values(now(), 1, 2); +``` + +​ 此处自动创建元数据开启。 + +2. 使用如下 sql,创建并启动 Pub-Sub 模式的 OPC UA Sink。详细语法参见上文:[IoTDB OPC Server语法](#语法) + +```SQL +create pipe p1 with sink ('sink'='opc-ua-sink', + 'sink.opcua.model'='pub-sub'); +start pipe p1; +``` + +​ 此时能看到服务器的 conf 目录下创建了 opc 证书相关的目录。 + +
+ +
+ +3. 直接运行 Client 连接,此时 Client 证书被服务器拒收。 + +
+ +
+ +4. 进入服务器的 sink.opcua.security.dir 目录下,进入 pki 的 rejected 目录,此时 Client 的证书应该已经在该目录下生成。 + +
+ +
+ +5. 将客户端的证书移入(不是复制) 同目录下 trusted 目录的 certs 文件夹中。 + +
+ +
+ +6. 再次打开 Client 连接,此时服务器的证书应该被 Client 拒收。 + +
+ +
+ +7. 进入客户端的 /client/security 目录下,进入 pki 的 rejected 目录,将服务器的证书移入(不是复制)trusted 目录。 + +
+ +
+ +8. 打开 Client,此时建立双向信任成功, Client 能够连接到服务器。 + +9. 向服务器中写入数据,此时 Client 中能够打印出收到的数据。 + +
+ +
+ + +### 注意事项 + +1. **单机与集群**:建议使用1C1D单机版,如果集群中有多个 DataNode,可能数据会分散发送在各个 DataNode 上,无法收听到全量数据。 + +2. **无需操作根目录下证书**:在证书操作过程中,无需操作 IoTDB security 根目录下的 `iotdb-server.pfx` 证书和 client security 目录下的 `example-client.pfx` 目录。Client 和 Server 双向连接时,会将根目录下的证书发给对方,对方如果第一次看见此证书,就会放入 reject dir,如果该证书在 trusted/certs 里面,则能够信任对方。 + +3. **建议使用** **Java 17+**:在 JVM 8 的版本中,可能会存在密钥长度限制,报 Illegal key size 错误。对于特定版本(如 jdk.1.8u151+),可以在 ClientExampleRunner 的 create client 里加入 `Security.`*`setProperty`*`("crypto.policy", "unlimited");` 解决,也可以下载无限制的包 `local_policy.jar` 与 `US_export_policy `解决替换 `JDK/jre/lib/security `目录下的包解决,下载网址:https://www.oracle.com/java/technologies/javase-jce8-downloads.html。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Python-Native-API.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Python-Native-API.md new file mode 100644 index 00000000..c9f61838 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Python-Native-API.md @@ -0,0 +1,717 @@ + + +# Python 原生接口 + +## 依赖 + +在使用 Python 原生接口包前,您需要安装 thrift (>=0.13) 依赖。 + +## 如何使用 (示例) + +首先下载包:`pip3 install apache-iotdb` + +您可以从这里得到一个使用该包进行数据读写的例子:[Session Example](https://github.com/apache/iotdb/blob/master/iotdb-client/client-py/SessionExample.py) + +关于对齐时间序列读写的例子:[Aligned Timeseries Session Example](https://github.com/apache/iotdb/blob/master/iotdb-client/client-py/SessionAlignedTimeseriesExample.py) + +(您需要在文件的头部添加`import iotdb`) + +或者: + +```python +from iotdb.Session import Session + +ip = "127.0.0.1" +port_ = "6667" +username_ = "root" +password_ = "root" +session = Session(ip, port_, username_, password_) +session.open(False) +zone = session.get_time_zone() +session.close() +``` +## 基本接口说明 + +下面将给出 Session 对应的接口的简要介绍和对应参数: + +### 初始化 + +* 初始化 Session + +```python +session = Session( + ip="127.0.0.1", + port="6667", + user="root", + password="root", + fetch_size=1024, + zone_id="UTC+8", + enable_redirection=True +) +``` + +* 初始化可连接多节点的 Session + +```python +session = Session.init_from_node_urls( + node_urls=["127.0.0.1:6667", "127.0.0.1:6668", "127.0.0.1:6669"], + user="root", + password="root", + fetch_size=1024, + zone_id="UTC+8", + enable_redirection=True +) +``` + +* 开启 Session,并决定是否开启 RPC 压缩 + +```python +session.open(enable_rpc_compression=False) +``` + +注意: 客户端的 RPC 压缩开启状态需和服务端一致 + +* 关闭 Session + +```python +session.close() +``` +### 通过SessionPool管理session连接 + +利用SessionPool管理session,不需要再考虑如何重用session。当session连接到达pool的最大值时,获取session的请求会被阻塞,可以通过参数设置阻塞等待时间。每次session使用完需要使用putBack方法将session归还到SessionPool中管理。 + +#### 创建SessionPool + +```python +pool_config = PoolConfig(host=ip,port=port, user_name=username, + password=password, fetch_size=1024, + time_zone="UTC+8", max_retry=3) +max_pool_size = 5 +wait_timeout_in_ms = 3000 + +# 通过配置参数创建连接池 +session_pool = SessionPool(pool_config, max_pool_size, wait_timeout_in_ms) +``` +#### 通过分布式节点创建SessionPool +```python +pool_config = PoolConfig(node_urls=node_urls=["127.0.0.1:6667", "127.0.0.1:6668", "127.0.0.1:6669"], user_name=username, + password=password, fetch_size=1024, + time_zone="UTC+8", max_retry=3) +max_pool_size = 5 +wait_timeout_in_ms = 3000 +``` + +#### 通过SessionPool获取session,使用完手动调用PutBack + +```python +session = session_pool.get_session() +session.set_storage_group(STORAGE_GROUP_NAME) +session.create_time_series( + TIMESERIES_PATH, TSDataType.BOOLEAN, TSEncoding.PLAIN, Compressor.SNAPPY +) +# 使用完调用putBack归还 +session_pool.put_back(session) +# 关闭sessionPool时同时关闭管理的session +session_pool.close() +``` + +## 数据定义接口 DDL + +### Database 管理 + +* 设置 database + 
+```python +session.set_storage_group(group_name) +``` + +* 删除单个或多个 database + +```python +session.delete_storage_group(group_name) +session.delete_storage_groups(group_name_lst) +``` +### 时间序列管理 + +* 创建单个或多个时间序列 + +```python +session.create_time_series(ts_path, data_type, encoding, compressor, + props=None, tags=None, attributes=None, alias=None) + +session.create_multi_time_series( + ts_path_lst, data_type_lst, encoding_lst, compressor_lst, + props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None +) +``` + +* 创建对齐时间序列 + +```python +session.create_aligned_time_series( + device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst +) +``` + +注意:目前**暂不支持**使用传感器别名。 + +* 删除一个或多个时间序列 + +```python +session.delete_time_series(paths_list) +``` + +* 检测时间序列是否存在 + +```python +session.check_time_series_exists(path) +``` + +## 数据操作接口 DML + +### 数据写入 + +推荐使用 insert_tablet 帮助提高写入效率 + +* 插入一个 Tablet,Tablet 是一个设备若干行数据块,每一行的列都相同 + * **写入效率高** + * **支持写入空值** (0.13 版本起) + +Python API 里目前有两种 Tablet 实现 + +* 普通 Tablet + +```python +values_ = [ + [False, 10, 11, 1.1, 10011.1, "test01"], + [True, 100, 11111, 1.25, 101.0, "test02"], + [False, 100, 1, 188.1, 688.25, "test03"], + [True, 0, 0, 0, 6.25, "test04"], +] +timestamps_ = [1, 2, 3, 4] +tablet_ = Tablet( + device_id, measurements_, data_types_, values_, timestamps_ +) +session.insert_tablet(tablet_) + +values_ = [ + [None, 10, 11, 1.1, 10011.1, "test01"], + [True, None, 11111, 1.25, 101.0, "test02"], + [False, 100, None, 188.1, 688.25, "test03"], + [True, 0, 0, 0, None, None], +] +timestamps_ = [16, 17, 18, 19] +tablet_ = Tablet( + device_id, measurements_, data_types_, values_, timestamps_ +) +session.insert_tablet(tablet_) +``` +* Numpy Tablet + +相较于普通 Tablet,Numpy Tablet 使用 [numpy.ndarray](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) 来记录数值型数据。 +内存占用和序列化耗时会降低很多,写入效率也会有很大提升。 + +**注意** +1. Tablet 中的每一列时间戳和值记录为一个 ndarray +2. Numpy Tablet 只支持大端类型数据,ndarray 构建时如果不指定数据类型会使用小端,因此推荐在构建 ndarray 时指定下面例子中类型使用大端。如果不指定,IoTDB Python客户端也会进行大小端转换,不影响使用正确性。 + +```python +import numpy as np +data_types_ = [ + TSDataType.BOOLEAN, + TSDataType.INT32, + TSDataType.INT64, + TSDataType.FLOAT, + TSDataType.DOUBLE, + TSDataType.TEXT, +] +np_values_ = [ + np.array([False, True, False, True], TSDataType.BOOLEAN.np_dtype()), + np.array([10, 100, 100, 0], TSDataType.INT32.np_dtype()), + np.array([11, 11111, 1, 0], TSDataType.INT64.np_dtype()), + np.array([1.1, 1.25, 188.1, 0], TSDataType.FLOAT.np_dtype()), + np.array([10011.1, 101.0, 688.25, 6.25], TSDataType.DOUBLE.np_dtype()), + np.array(["test01", "test02", "test03", "test04"], TSDataType.TEXT.np_dtype()), +] +np_timestamps_ = np.array([1, 2, 3, 4], TSDataType.INT64.np_dtype()) +np_tablet_ = NumpyTablet( + device_id, measurements_, data_types_, np_values_, np_timestamps_ +) +session.insert_tablet(np_tablet_) + +# insert one numpy tablet with None into the database. 
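+# 通过 BitMap 标记空值:np_bitmaps_[列下标].mark(行下标) 表示该列该行的值为空,
+# 被标记位置在 np_values_ 中填入的值只作占位,写入时按 null 处理(列顺序与 measurements_ 一致)。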
+np_values_ = [ + np.array([False, True, False, True], TSDataType.BOOLEAN.np_dtype()), + np.array([10, 100, 100, 0], TSDataType.INT32.np_dtype()), + np.array([11, 11111, 1, 0], TSDataType.INT64.np_dtype()), + np.array([1.1, 1.25, 188.1, 0], TSDataType.FLOAT.np_dtype()), + np.array([10011.1, 101.0, 688.25, 6.25], TSDataType.DOUBLE.np_dtype()), + np.array(["test01", "test02", "test03", "test04"], TSDataType.TEXT.np_dtype()), +] +np_timestamps_ = np.array([98, 99, 100, 101], TSDataType.INT64.np_dtype()) +np_bitmaps_ = [] +for i in range(len(measurements_)): + np_bitmaps_.append(BitMap(len(np_timestamps_))) +np_bitmaps_[0].mark(0) +np_bitmaps_[1].mark(1) +np_bitmaps_[2].mark(2) +np_bitmaps_[4].mark(3) +np_bitmaps_[5].mark(3) +np_tablet_with_none = NumpyTablet( + device_id, measurements_, data_types_, np_values_, np_timestamps_, np_bitmaps_ +) +session.insert_tablet(np_tablet_with_none) +``` + +* 插入多个 Tablet + +```python +session.insert_tablets(tablet_lst) +``` + +* 插入一个 Record,一个 Record 是一个设备一个时间戳下多个测点的数据。 + +```python +session.insert_record(device_id, timestamp, measurements_, data_types_, values_) +``` + +* 插入多个 Record + +```python +session.insert_records( + device_ids_, time_list_, measurements_list_, data_type_list_, values_list_ + ) +``` + +* 插入同属于一个 device 的多个 Record + +```python +session.insert_records_of_one_device(device_id, time_list, measurements_list, data_types_list, values_list) +``` + +### 带有类型推断的写入 + +当数据均是 String 类型时,我们可以使用如下接口,根据 value 的值进行类型推断。例如:value 为 "true" ,就可以自动推断为布尔类型。value 为 "3.2" ,就可以自动推断为数值类型。服务器需要做类型推断,可能会有额外耗时,速度较无需类型推断的写入慢 + +```python +session.insert_str_record(device_id, timestamp, measurements, string_values) +``` + +### 对齐时间序列的写入 + +对齐时间序列的写入使用 insert_aligned_xxx 接口,其余与上述接口类似: + +* insert_aligned_record +* insert_aligned_records +* insert_aligned_records_of_one_device +* insert_aligned_tablet +* insert_aligned_tablets + + +## IoTDB-SQL 接口 + +* 执行查询语句 + +```python +session.execute_query_statement(sql) +``` + +* 执行非查询语句 + +```python +session.execute_non_query_statement(sql) +``` + +* 执行语句 + +```python +session.execute_statement(sql) +``` + + +## 元数据模版接口 +### 构建元数据模版 +1. 首先构建 Template 类 +2. 添加子节点 MeasurementNode +3. 
调用创建元数据模版接口 + +```python +template = Template(name=template_name, share_time=True) + +m_node_x = MeasurementNode("x", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY) +m_node_y = MeasurementNode("y", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY) +m_node_z = MeasurementNode("z", TSDataType.FLOAT, TSEncoding.RLE, Compressor.SNAPPY) + +template.add_template(m_node_x) +template.add_template(m_node_y) +template.add_template(m_node_z) + +session.create_schema_template(template) +``` +### 修改模版节点信息 +修改模版节点,其中修改的模版必须已经被创建。以下函数能够在已经存在的模版中增加或者删除物理量 +* 在模版中增加实体 +```python +session.add_measurements_in_template(template_name, measurements_path, data_types, encodings, compressors, is_aligned) +``` + +* 在模版中删除物理量 +```python +session.delete_node_in_template(template_name, path) +``` + +### 挂载元数据模板 +```python +session.set_schema_template(template_name, prefix_path) +``` + +### 卸载元数据模版 +```python +session.unset_schema_template(template_name, prefix_path) +``` + +### 查看元数据模版 +* 查看所有的元数据模版 +```python +session.show_all_templates() +``` +* 查看元数据模版中的物理量个数 +```python +session.count_measurements_in_template(template_name) +``` + +* 判断某个节点是否为物理量,该节点必须已经在元数据模版中 +```python +session.count_measurements_in_template(template_name, path) +``` + +* 判断某个路径是否在元数据模版中,这个路径有可能不在元数据模版中 +```python +session.is_path_exist_in_template(template_name, path) +``` + +* 查看某个元数据模板下的物理量 +```python +session.show_measurements_in_template(template_name) +``` + +* 查看挂载了某个元数据模板的路径前缀 +```python +session.show_paths_template_set_on(template_name) +``` + +* 查看使用了某个元数据模板(即序列已创建)的路径前缀 +```python +session.show_paths_template_using_on(template_name) +``` + +### 删除元数据模版 +删除已经存在的元数据模版,不支持删除已经挂载的模版 +```python +session.drop_schema_template("template_python") +``` + + +## 对 Pandas 的支持 + +我们支持将查询结果轻松地转换为 [Pandas Dataframe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)。 + +SessionDataSet 有一个方法`.todf()`,它的作用是消费 SessionDataSet 中的数据,并将数据转换为 pandas dataframe。 + +例子: + +```python +from iotdb.Session import Session + +ip = "127.0.0.1" +port_ = "6667" +username_ = "root" +password_ = "root" +session = Session(ip, port_, username_, password_) +session.open(False) +result = session.execute_query_statement("SELECT ** FROM root") + +# Transform to Pandas Dataset +df = result.todf() + +session.close() + +# Now you can work with the dataframe +df = ... 
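+# 注意:todf() 会消费 SessionDataSet 中的数据,转换完成后请直接基于 df 做后续处理,
+# 例如 df.head()、df.describe() 等常规 pandas 操作(此处仅作示意)。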
+``` + +## IoTDB Testcontainer + +Python 客户端对测试的支持是基于`testcontainers`库 (https://testcontainers-python.readthedocs.io/en/latest/index.html) 的,如果您想使用该特性,就需要将其安装到您的项目中。 + +要在 Docker 容器中启动(和停止)一个 IoTDB 数据库,只需这样做: + +```python +class MyTestCase(unittest.TestCase): + + def test_something(self): + with IoTDBContainer() as c: + session = Session("localhost", c.get_exposed_port(6667), "root", "root") + session.open(False) + result = session.execute_query_statement("SHOW TIMESERIES") + print(result) + session.close() +``` + +默认情况下,它会拉取最新的 IoTDB 镜像 `apache/iotdb:latest`进行测试,如果您想指定待测 IoTDB 的版本,您只需要将版本信息像这样声明:`IoTDBContainer("apache/iotdb:0.12.0")`,此时,您就会得到一个`0.12.0`版本的 IoTDB 实例。 + +## IoTDB DBAPI + +IoTDB DBAPI 遵循 Python DB API 2.0 规范 (https://peps.python.org/pep-0249/),实现了通过Python语言访问数据库的通用接口。 + +### 例子 ++ 初始化 + +初始化的参数与Session部分保持一致(sqlalchemy_mode参数除外,该参数仅在SQLAlchemy方言中使用) +```python +from iotdb.dbapi import connect + +ip = "127.0.0.1" +port_ = "6667" +username_ = "root" +password_ = "root" +conn = connect(ip, port_, username_, password_,fetch_size=1024,zone_id="UTC+8",sqlalchemy_mode=False) +cursor = conn.cursor() +``` ++ 执行简单的SQL语句 +```python +cursor.execute("SELECT ** FROM root") +for row in cursor.fetchall(): + print(row) +``` + ++ 执行带有参数的SQL语句 + +IoTDB DBAPI 支持pyformat风格的参数 +```python +cursor.execute("SELECT ** FROM root WHERE time < %(time)s",{"time":"2017-11-01T00:08:00.000"}) +for row in cursor.fetchall(): + print(row) +``` + ++ 批量执行带有参数的SQL语句 +```python +seq_of_parameters = [ + {"timestamp": 1, "temperature": 1}, + {"timestamp": 2, "temperature": 2}, + {"timestamp": 3, "temperature": 3}, + {"timestamp": 4, "temperature": 4}, + {"timestamp": 5, "temperature": 5}, +] +sql = "insert into root.cursor(timestamp,temperature) values(%(timestamp)s,%(temperature)s)" +cursor.executemany(sql,seq_of_parameters) +``` + ++ 关闭连接 +```python +cursor.close() +conn.close() +``` + +## IoTDB SQLAlchemy Dialect(实验性) +IoTDB的SQLAlchemy方言主要是为了适配Apache superset而编写的,该部分仍在完善中,请勿在生产环境中使用! +### 元数据模型映射 +SQLAlchemy 所使用的数据模型为关系数据模型,这种数据模型通过表格来描述不同实体之间的关系。 +而 IoTDB 的数据模型为层次数据模型,通过树状结构来对数据进行组织。 +为了使 IoTDB 能够适配 SQLAlchemy 的方言,需要对 IoTDB 中原有的数据模型进行重新组织, +把 IoTDB 的数据模型转换成 SQLAlchemy 的数据模型。 + +IoTDB 中的元数据有: + +1. Database:数据库 +2. Path:存储路径 +3. Entity:实体 +4. Measurement:物理量 + +SQLAlchemy 中的元数据有: +1. Schema:数据模式 +2. Table:数据表 +3. 
Column:数据列 + +它们之间的映射关系为: + +| SQLAlchemy中的元数据 | IoTDB中对应的元数据 | +| -------------------- | ---------------------------------------------- | +| Schema | Database | +| Table | Path ( from database to entity ) + Entity | +| Column | Measurement | + +下图更加清晰的展示了二者的映射关系: + +![sqlalchemy-to-iotdb](https://alioss.timecho.com/docs/img/UserGuide/API/IoTDB-SQLAlchemy/sqlalchemy-to-iotdb.png?raw=true) + +### 数据类型映射 +| IoTDB 中的数据类型 | SQLAlchemy 中的数据类型 | +|--------------|-------------------| +| BOOLEAN | Boolean | +| INT32 | Integer | +| INT64 | BigInteger | +| FLOAT | Float | +| DOUBLE | Float | +| TEXT | Text | +| LONG | BigInteger | +### Example + ++ 执行语句 + +```python +from sqlalchemy import create_engine + +engine = create_engine("iotdb://root:root@127.0.0.1:6667") +connect = engine.connect() +result = connect.execute("SELECT ** FROM root") +for row in result.fetchall(): + print(row) +``` + ++ ORM (目前只支持简单的查询) + +```python +from sqlalchemy import create_engine, Column, Float, BigInteger, MetaData +from sqlalchemy.ext.declarative import declarative_base +from sqlalchemy.orm import sessionmaker + +metadata = MetaData( + schema='root.factory' +) +Base = declarative_base(metadata=metadata) + + +class Device(Base): + __tablename__ = "room2.device1" + Time = Column(BigInteger, primary_key=True) + temperature = Column(Float) + status = Column(Float) + + +engine = create_engine("iotdb://root:root@127.0.0.1:6667") + +DbSession = sessionmaker(bind=engine) +session = DbSession() + +res = session.query(Device.status).filter(Device.temperature > 1) + +for row in res: + print(row) +``` + +## 给开发人员 + +### 介绍 + +这是一个使用 thrift rpc 接口连接到 IoTDB 的示例。在 Windows 和 Linux 上操作几乎是一样的,但要注意路径分隔符等不同之处。 + +### 依赖 + +首选 Python3.7 或更高版本。 + +必须安装 thrift(0.11.0 或更高版本)才能将 thrift 文件编译为 Python 代码。下面是官方的安装教程,最终,您应该得到一个 thrift 可执行文件。 + +``` +http://thrift.apache.org/docs/install/ +``` + +在开始之前,您还需要在 Python 环境中安装`requirements_dev.txt`中的其他依赖: +```shell +pip install -r requirements_dev.txt +``` + +### 编译 thrift 库并调试 + +在 IoTDB 源代码文件夹的根目录下,运行`mvn clean generate-sources -pl iotdb-client/client-py -am`, + +这个指令将自动删除`iotdb/thrift`中的文件,并使用新生成的 thrift 文件重新填充该文件夹。 + +这个文件夹在 git 中会被忽略,并且**永远不应该被推到 git 中!** + +**注意**不要将`iotdb/thrift`上传到 git 仓库中 ! + +### Session 客户端 & 使用示例 + +我们将 thrift 接口打包到`client-py/src/iotdb/session.py `中(与 Java 版本类似),还提供了一个示例文件`client-py/src/SessionExample.py`来说明如何使用 Session 模块。请仔细阅读。 + +另一个简单的例子: + +```python +from iotdb.Session import Session + +ip = "127.0.0.1" +port_ = "6667" +username_ = "root" +password_ = "root" +session = Session(ip, port_, username_, password_) +session.open(False) +zone = session.get_time_zone() +session.close() +``` + +### 测试 + +请在`tests`文件夹中添加自定义测试。 + +要运行所有的测试,只需在根目录中运行`pytest . `即可。 + +**注意**一些测试需要在您的系统上使用 docker,因为测试的 IoTDB 实例是使用 [testcontainers](https://testcontainers-python.readthedocs.io/en/latest/index.html) 在 docker 容器中启动的。 + +### 其他工具 + +[black](https://pypi.org/project/black/) 和 [flake8](https://pypi.org/project/flake8/) 分别用于自动格式化和 linting。 +它们可以通过 `black .` 或 `flake8 .` 分别运行。 + +## 发版 + +要进行发版, + +只需确保您生成了正确的 thrift 代码, + +运行了 linting 并进行了自动格式化, + +然后,确保所有测试都正常通过(通过`pytest . 
`), + +最后,您就可以将包发布到 pypi 了。 + +### 准备您的环境 + +首先,通过`pip install -r requirements_dev.txt`安装所有必要的开发依赖。 + +### 发版 + +有一个脚本`release.sh`可以用来执行发版的所有步骤。 + +这些步骤包括: + +* 删除所有临时目录(如果存在) + +* (重新)通过 mvn 生成所有必须的源代码 + +* 运行 linting (flke8) + +* 通过 pytest 运行测试 + +* Build + +* 发布到 pypi diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Rust-Native-API.md b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Rust-Native-API.md new file mode 100644 index 00000000..d7571050 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/Programming-Rust-Native-API.md @@ -0,0 +1,200 @@ + + + +# Rust 原生接口 + +IoTDB 使用 Thrift 作为跨语言的 RPC 框架,因此可以通过 Thrift 提供的接口来实现对 IoTDB 的访问。本文档将介绍如何生成可访问 IoTDB 的原生 Rust 接口。 + + +## 依赖 + + * JDK >= 1.8 + * Rust >= 1.0.0 + * thrift 0.14.1 + * Linux、Macos 或其他类 unix 系统 + * Windows+bash (下载 IoTDB Go client 需要 git ,通过 WSL、cygwin、Git Bash 任意一种方式均可) + +必须安装 thrift(0.14.1 或更高版本)才能将 thrift 文件编译为 Rust 代码。下面是官方的安装教程,最终,您应该得到一个 thrift 可执行文件。 + +``` +http://thrift.apache.org/docs/install/ +``` + + +## 编译 thrift 库,生成 Rust 原生接口 + +1. 在 IoTDB 源代码文件夹的根目录中找到 pom.xml 文件。 +2. 打开 pom.xml 文件,找到以下内容: + +```xml + + generate-thrift-sources-java + generate-sources + + compile + + + java + ${thrift.exec.absolute.path} + ${basedir}/src/main/thrift + + +``` +3. 参考该设置,在 pom.xml 文件中添加以下内容,用来生成 Rust 的原生接口: + +```xml + + generate-thrift-sources-rust + generate-sources + + compile + + + rs + ${thrift.exec.absolute.path} + ${basedir}/src/main/thrift + **/common.thrift,**/client.thrift + ${project.build.directory}/generated-sources-rust + + +``` + +4. 在 IoTDB 源代码文件夹的根目录下,运行`mvn clean generate-sources`, + +这个指令将自动删除`iotdb/iotdb-protocol/thrift/target` 和 `iotdb/iotdb-protocol/thrift-commons/target`中的文件,并使用新生成的 thrift 文件重新填充该文件夹。 + +这个文件夹在 git 中会被忽略,并且**永远不应该被推到 git 中!** + +**注意**不要将`iotdb/iotdb-protocol/thrift/target` 和 `iotdb/iotdb-protocol/thrift-commons/target`上传到 git 仓库中 ! 
+ +## 使用 Rust 原生接口 + +将 `iotdb/iotdb-protocol/thrift/target/generated-sources-rust/` 和 `iotdb/iotdb-protocol/thrift-commons/target/generated-sources-rust/` 中的文件复制到您的项目中,即可使用。 + +## 支持的 rpc 接口 + +``` +// 打开一个 session +TSOpenSessionResp openSession(1:TSOpenSessionReq req); + +// 关闭一个 session +TSStatus closeSession(1:TSCloseSessionReq req); + +// 执行一条 SQL 语句 +TSExecuteStatementResp executeStatement(1:TSExecuteStatementReq req); + +// 批量执行 SQL 语句 +TSStatus executeBatchStatement(1:TSExecuteBatchStatementReq req); + +// 执行查询 SQL 语句 +TSExecuteStatementResp executeQueryStatement(1:TSExecuteStatementReq req); + +// 执行插入、删除 SQL 语句 +TSExecuteStatementResp executeUpdateStatement(1:TSExecuteStatementReq req); + +// 向服务器取下一批查询结果 +TSFetchResultsResp fetchResults(1:TSFetchResultsReq req) + +// 获取元数据 +TSFetchMetadataResp fetchMetadata(1:TSFetchMetadataReq req) + +// 取消某次查询操作 +TSStatus cancelOperation(1:TSCancelOperationReq req); + +// 关闭查询操作数据集,释放资源 +TSStatus closeOperation(1:TSCloseOperationReq req); + +// 获取时区信息 +TSGetTimeZoneResp getTimeZone(1:i64 sessionId); + +// 设置时区 +TSStatus setTimeZone(1:TSSetTimeZoneReq req); + +// 获取服务端配置 +ServerProperties getProperties(); + +// 设置 database +TSStatus setStorageGroup(1:i64 sessionId, 2:string storageGroup); + +// 创建时间序列 +TSStatus createTimeseries(1:TSCreateTimeseriesReq req); + +// 创建多条时间序列 +TSStatus createMultiTimeseries(1:TSCreateMultiTimeseriesReq req); + +// 删除时间序列 +TSStatus deleteTimeseries(1:i64 sessionId, 2:list path) + +// 删除 database +TSStatus deleteStorageGroups(1:i64 sessionId, 2:list storageGroup); + +// 按行插入数据 +TSStatus insertRecord(1:TSInsertRecordReq req); + +// 按 String 格式插入一条数据 +TSStatus insertStringRecord(1:TSInsertStringRecordReq req); + +// 按列插入数据 +TSStatus insertTablet(1:TSInsertTabletReq req); + +// 按列批量插入数据 +TSStatus insertTablets(1:TSInsertTabletsReq req); + +// 按行批量插入数据 +TSStatus insertRecords(1:TSInsertRecordsReq req); + +// 按行批量插入同属于某个设备的数据 +TSStatus insertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// 按 String 格式批量按行插入数据 +TSStatus insertStringRecords(1:TSInsertStringRecordsReq req); + +// 测试按列插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertTablet(1:TSInsertTabletReq req); + +// 测试批量按列插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertTablets(1:TSInsertTabletsReq req); + +// 测试按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecord(1:TSInsertRecordReq req); + +// 测试按 String 格式按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertStringRecord(1:TSInsertStringRecordReq req); + +// 测试按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecords(1:TSInsertRecordsReq req); + +// 测试按行批量插入同属于某个设备的数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// 测试按 String 格式批量按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertStringRecords(1:TSInsertStringRecordsReq req); + +// 删除数据 +TSStatus deleteData(1:TSDeleteDataReq req); + +// 执行原始数据查询 +TSExecuteStatementResp executeRawDataQuery(1:TSRawDataQueryReq req); + +// 向服务器申请一个查询语句 ID +i64 requestStatementId(1:i64 sessionId); +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/RestServiceV1.md b/src/zh/UserGuide/V2.0.1/Tree/API/RestServiceV1.md new file mode 100644 index 00000000..c3917007 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/RestServiceV1.md @@ -0,0 +1,946 @@ + + +# RESTful API V1(不推荐) +IoTDB 的 RESTful 服务可用于查询、写入和管理操作,它使用 OpenAPI 标准来定义接口并生成框架。 + +## 开启RESTful 服务 +RESTful 服务默认情况是关闭的 + * 开发者 + + 找到sever模块中`org.apache.iotdb.db.conf.rest` 下面的`IoTDBRestServiceConfig`类,修改`enableRestService=true`即可。 + + * 使用者 + + 
找到IoTDB安装目录下面的`conf/iotdb-system.properties`文件,将 `enable_rest_service` 设置为 `true` 以启用该模块。 + + ```properties + enable_rest_service=true + ``` + +## 鉴权 +除了检活接口 `/ping`,RESTful 服务使用了基础(basic)鉴权,每次 URL 请求都需要在 header 中携带 `'Authorization': 'Basic ' + base64.encode(username + ':' + password)`。 + +示例中使用的用户名为:`root`,密码为:`root`,对应的 Basic 鉴权 Header 格式为 + +``` +Authorization: Basic cm9vdDpyb290 +``` + +- 若用户名密码认证失败,则返回如下信息: + + HTTP 状态码:`401` + + 返回结构体如下 + ```json + { + "code": 600, + "message": "WRONG_LOGIN_PASSWORD_ERROR" + } + ``` + +- 若未设置 `Authorization`,则返回如下信息: + + HTTP 状态码:`401` + + 返回结构体如下 + ```json + { + "code": 603, + "message": "UNINITIALIZED_AUTH_ERROR" + } + ``` + +## 接口 + +### ping + +ping 接口可以用于线上服务检活。 + +请求方式:`GET` + +请求路径:`http://ip:port/ping +` +请求示例: + +```shell +$ curl http://127.0.0.1:18080/ping +``` + +返回的 HTTP 状态码: + +- `200`:当前服务工作正常,可以接收外部请求。 +- `503`:当前服务出现异常,不能接收外部请求。 + +响应参数: + +|参数名称 |参数类型 |参数描述| +| ------------ | ------------ | ------------| +| code | integer | 状态码 | +| message | string | 信息提示 | + +响应示例: + +- HTTP 状态码为 `200` 时: + + ```json + { + "code": 200, + "message": "SUCCESS_STATUS" + } + ``` + +- HTTP 状态码为 `503` 时: + + ```json + { + "code": 500, + "message": "thrift service is unavailable" + } + ``` + +> `/ping` 接口访问不需要鉴权。 + +### query + +query 接口可以用于处理数据查询和元数据查询。 + +请求方式:`POST` + +请求头:`application/json` + +请求路径:`http://ip:port/rest/v1/query` + +参数说明: + +| 参数名称 |参数类型 |是否必填|参数描述| +|-----------| ------------ | ------------ |------------ | +| sql | string | 是 | | +| rowLimit | integer | 否 | 一次查询能返回的结果集的最大行数。
如果不设置该参数,将使用配置文件的 `rest_query_default_row_size_limit` 作为默认值。
当返回结果集的行数超出限制时,将返回状态码 `411`。 | + +响应参数: + +| 参数名称 |参数类型 |参数描述| +|--------------| ------------ | ------------| +| expressions | array | 用于数据查询时结果集列名的数组,用于元数据查询时为`null`| +| columnNames | array | 用于元数据查询结果集列名数组,用于数据查询时为`null` | +| timestamps | array | 时间戳列,用于元数据查询时为`null` | +| values |array|二维数组,第一维与结果集列名数组的长度相同,第二维数组代表结果集的一列| + +请求示例如下所示: + +提示:为了避免OOM问题,不推荐使用select * from root.xx.** 这种查找方式。 + +1. 请求示例 表达式查询: + ```shell + curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4, s3 + 1 from root.sg27 limit 2"}' http://127.0.0.1:18080/rest/v1/query +``` + + - 响应示例: + +```json +{ + "expressions": [ + "root.sg27.s3", + "root.sg27.s4", + "root.sg27.s3 + 1" + ], + "columnNames": null, + "timestamps": [ + 1635232143960, + 1635232153960 + ], + "values": [ + [ + 11, + null + ], + [ + false, + true + ], + [ + 12.0, + null + ] + ] +} +``` + +2. 请求示例 show child paths: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child paths root"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "child paths" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ] + ] +} +``` + +3. 请求示例 show child nodes: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child nodes root"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "child nodes" + ], + "timestamps": null, + "values": [ + [ + "sg27", + "sg28" + ] + ] +} +``` + +4. 请求示例 show all ttl: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show all ttl"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "database", + "ttl" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + null, + null + ] + ] +} +``` + +5. 请求示例 show ttl: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show ttl on root.sg27"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "database", + "ttl" + ], + "timestamps": null, + "values": [ + [ + "root.sg27" + ], + [ + null + ] + ] +} +``` + +6. 请求示例 show functions: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show functions"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "function name", + "function type", + "class name (UDF)" + ], + "timestamps": null, + "values": [ + [ + "ABS", + "ACOS", + "ASIN", + ... + ], + [ + "built-in UDTF", + "built-in UDTF", + "built-in UDTF", + ... + ], + [ + "org.apache.iotdb.db.query.udf.builtin.UDTFAbs", + "org.apache.iotdb.db.query.udf.builtin.UDTFAcos", + "org.apache.iotdb.db.query.udf.builtin.UDTFAsin", + ... + ] + ] +} +``` + +7. 
请求示例 show timeseries: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show timeseries"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "timeseries", + "alias", + "database", + "dataType", + "encoding", + "compression", + "tags", + "attributes" + ], + "timestamps": null, + "values": [ + [ + "root.sg27.s3", + "root.sg27.s4", + "root.sg28.s3", + "root.sg28.s4" + ], + [ + null, + null, + null, + null + ], + [ + "root.sg27", + "root.sg27", + "root.sg28", + "root.sg28" + ], + [ + "INT32", + "BOOLEAN", + "INT32", + "BOOLEAN" + ], + [ + "RLE", + "RLE", + "RLE", + "RLE" + ], + [ + "SNAPPY", + "SNAPPY", + "SNAPPY", + "SNAPPY" + ], + [ + null, + null, + null, + null + ], + [ + null, + null, + null, + null + ] + ] +} +``` + +8. 请求示例 show latest timeseries: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show latest timeseries"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "timeseries", + "alias", + "database", + "dataType", + "encoding", + "compression", + "tags", + "attributes" + ], + "timestamps": null, + "values": [ + [ + "root.sg28.s4", + "root.sg27.s4", + "root.sg28.s3", + "root.sg27.s3" + ], + [ + null, + null, + null, + null + ], + [ + "root.sg28", + "root.sg27", + "root.sg28", + "root.sg27" + ], + [ + "BOOLEAN", + "BOOLEAN", + "INT32", + "INT32" + ], + [ + "RLE", + "RLE", + "RLE", + "RLE" + ], + [ + "SNAPPY", + "SNAPPY", + "SNAPPY", + "SNAPPY" + ], + [ + null, + null, + null, + null + ], + [ + null, + null, + null, + null + ] + ] +} +``` + +9. 请求示例 count timeseries: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count timeseries root.**"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "count" + ], + "timestamps": null, + "values": [ + [ + 4 + ] + ] +} +``` + +10. 请求示例 count nodes: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count nodes root.** level=2"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "count" + ], + "timestamps": null, + "values": [ + [ + 4 + ] + ] +} +``` + +11. 请求示例 show devices: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "devices", + "isAligned" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + "false", + "false" + ] + ] +} +``` + +12. 请求示例 show devices with database: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices with database"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "devices", + "database", + "isAligned" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + "root.sg27", + "root.sg28" + ], + [ + "false", + "false" + ] + ] +} +``` + +13. 
请求示例 list user: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"list user"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "user" + ], + "timestamps": null, + "values": [ + [ + "root" + ] + ] +} +``` + +14. 请求示例 原始聚合查询: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": [ + "count(root.sg27.s3)", + "count(root.sg27.s4)" + ], + "columnNames": null, + "timestamps": [ + 0 + ], + "values": [ + [ + 1 + ], + [ + 2 + ] + ] +} +``` + +15. 请求示例 group by level: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.** group by level = 1"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "count(root.sg27.*)", + "count(root.sg28.*)" + ], + "timestamps": null, + "values": [ + [ + 3 + ], + [ + 3 + ] + ] +} +``` + +16. 请求示例 group by: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27 group by([1635232143960,1635232153960),1s)"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": [ + "count(root.sg27.s3)", + "count(root.sg27.s4)" + ], + "columnNames": null, + "timestamps": [ + 1635232143960, + 1635232144960, + 1635232145960, + 1635232146960, + 1635232147960, + 1635232148960, + 1635232149960, + 1635232150960, + 1635232151960, + 1635232152960 + ], + "values": [ + [ + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ], + [ + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ] + ] +} +``` + +17. 请求示例 last: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select last s3 from root.sg27"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "columnNames": [ + "timeseries", + "value", + "dataType" + ], + "timestamps": [ + 1635232143960 + ], + "values": [ + [ + "root.sg27.s3" + ], + [ + "11" + ], + [ + "INT32" + ] + ] +} +``` + +18. 请求示例 disable align: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select * from root.sg27 disable align"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "code": 407, + "message": "disable align clauses are not supported." +} +``` + +19. 请求示例 align by device: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(s3) from root.sg27 align by device"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "code": 407, + "message": "align by device clauses are not supported." +} +``` + +20. 请求示例 select into: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4 into root.sg29.s1, root.sg29.s2 from root.sg27"}' http://127.0.0.1:18080/rest/v1/query +``` + +- 响应示例: + +```json +{ + "code": 407, + "message": "select into clauses are not supported." 
+} +``` + +### nonQuery + +请求方式:`POST` + +请求头:`application/json` + +请求路径:`http://ip:port/rest/v1/nonQuery` + +参数说明: + +|参数名称 |参数类型 |是否必填|参数描述| +| ------------ | ------------ | ------------ |------------ | +| sql | string | 是 | | + +请求示例: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"CREATE DATABASE root.ln"}' http://127.0.0.1:18080/rest/v1/nonQuery +``` + +响应参数: + +|参数名称 |参数类型 |参数描述| +| ------------ | ------------ | ------------| +| code | integer | 状态码 | +| message | string | 信息提示 | + +响应示例: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + + + +### insertTablet + +请求方式:`POST` + +请求头:`application/json` + +请求路径:`http://ip:port/rest/v1/insertTablet` + +参数说明: + +| 参数名称 |参数类型 |是否必填|参数描述| +|--------------| ------------ | ------------ |------------ | +| timestamps | array | 是 | 时间列 | +| measurements | array | 是 | 测点名称 | +| dataTypes | array | 是 | 数据类型 | +| values | array | 是 | 值列,每一列中的值可以为 `null` | +| isAligned | boolean | 是 | 是否是对齐时间序列 | +| deviceId | string | 是 | 设备名称 | + +请求示例: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232143960,1635232153960],"measurements":["s3","s4"],"dataTypes":["INT32","BOOLEAN"],"values":[[11,null],[false,true]],"isAligned":false,"deviceId":"root.sg27"}' http://127.0.0.1:18080/rest/v1/insertTablet +``` + +响应参数: + +|参数名称 |参数类型 |参数描述| +| ------------ | ------------ | ------------| +| code | integer | 状态码 | +| message | string | 信息提示 | + +响应示例: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + + +## 配置 + +配置位于 `iotdb-system.properties` 中。 + + + +* 将 `enable_rest_service` 设置为 `true` 以启用该模块,而将 `false` 设置为禁用该模块。默认情况下,该值为 `false`。 + +```properties +enable_rest_service=true +``` + +* 仅在 `enable_rest_service=true` 时生效。将 `rest_service_port `设置为数字(1025~65535),以自定义REST服务套接字端口。默认情况下,值为 `18080`。 + +```properties +rest_service_port=18080 +``` + +* 将 'enable_swagger' 设置 'true' 启用swagger来展示rest接口信息, 而设置为 'false' 关闭该功能. 
默认情况下,该值为 `false`。 + +```properties +enable_swagger=false +``` + +* 一次查询能返回的结果集最大行数。当返回结果集的行数超出参数限制时,您只会得到在行数范围内的结果集,且将得到状态码`411`。 + +```properties +rest_query_default_row_size_limit=10000 +``` + +* 缓存客户登录信息的过期时间(用于加速用户鉴权的速度,单位为秒,默认是8个小时) + +```properties +cache_expire=28800 +``` + +* 缓存中存储的最大用户数量(默认是100) + +```properties +cache_max_num=100 +``` + +* 缓存初始容量(默认是10) + +```properties +cache_init_num=10 +``` + +* REST Service 是否开启 SSL 配置,将 `enable_https` 设置为 `true` 以启用该模块,而将 `false` 设置为禁用该模块。默认情况下,该值为 `false`。 + +```properties +enable_https=false +``` + +* keyStore 所在路径(非必填) + +```properties +key_store_path= +``` + + +* keyStore 密码(非必填) + +```properties +key_store_pwd= +``` + + +* trustStore 所在路径(非必填) + +```properties +trust_store_path= +``` + +* trustStore 密码(非必填) + +```properties +trust_store_pwd= +``` + + +* SSL 超时时间,单位为秒 + +```properties +idle_timeout=5000 +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/API/RestServiceV2.md b/src/zh/UserGuide/V2.0.1/Tree/API/RestServiceV2.md new file mode 100644 index 00000000..62d37c51 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/API/RestServiceV2.md @@ -0,0 +1,985 @@ + + +# RESTful API V2 +IoTDB 的 RESTful 服务可用于查询、写入和管理操作,它使用 OpenAPI 标准来定义接口并生成框架。 + +## 开启RESTful 服务 +RESTful 服务默认情况是关闭的 + * 开发者 + + 找到sever模块中`org.apache.iotdb.db.conf.rest` 下面的`IoTDBRestServiceConfig`类,修改`enableRestService=true`即可。 + + * 使用者 + + 找到IoTDB安装目录下面的`conf/iotdb-system.properties`文件,将 `enable_rest_service` 设置为 `true` 以启用该模块。 + + ```properties + enable_rest_service=true + ``` + +## 鉴权 +除了检活接口 `/ping`,RESTful 服务使用了基础(basic)鉴权,每次 URL 请求都需要在 header 中携带 `'Authorization': 'Basic ' + base64.encode(username + ':' + password)`。 + +示例中使用的用户名为:`root`,密码为:`root`,对应的 Basic 鉴权 Header 格式为 + +``` +Authorization: Basic cm9vdDpyb290 +``` + +- 若用户名密码认证失败,则返回如下信息: + + HTTP 状态码:`401` + + 返回结构体如下 + ```json + { + "code": 600, + "message": "WRONG_LOGIN_PASSWORD_ERROR" + } + ``` + +- 若未设置 `Authorization`,则返回如下信息: + + HTTP 状态码:`401` + + 返回结构体如下 + ```json + { + "code": 603, + "message": "UNINITIALIZED_AUTH_ERROR" + } + ``` + +## 接口 + +### ping + +ping 接口可以用于线上服务检活。 + +请求方式:`GET` + +请求路径:http://ip:port/ping + +请求示例: + +```shell +$ curl http://127.0.0.1:18080/ping +``` + +返回的 HTTP 状态码: + +- `200`:当前服务工作正常,可以接收外部请求。 +- `503`:当前服务出现异常,不能接收外部请求。 + +响应参数: + +|参数名称 |参数类型 |参数描述| +| ------------ | ------------ | ------------| +| code | integer | 状态码 | +| message | string | 信息提示 | + +响应示例: + +- HTTP 状态码为 `200` 时: + + ```json + { + "code": 200, + "message": "SUCCESS_STATUS" + } + ``` + +- HTTP 状态码为 `503` 时: + + ```json + { + "code": 500, + "message": "thrift service is unavailable" + } + ``` + +> `/ping` 接口访问不需要鉴权。 + +### query + +query 接口可以用于处理数据查询和元数据查询。 + +请求方式:`POST` + +请求头:`application/json` + +请求路径: `http://ip:port/rest/v2/query` + +参数说明: + +| 参数名称 |参数类型 |是否必填|参数描述| +|-----------| ------------ | ------------ |------------ | +| sql | string | 是 | | +| row_limit | integer | 否 | 一次查询能返回的结果集的最大行数。
如果不设置该参数,将使用配置文件的 `rest_query_default_row_size_limit` 作为默认值。
当返回结果集的行数超出限制时,将返回状态码 `411`。 | + +响应参数: + +| 参数名称 |参数类型 |参数描述| +|--------------| ------------ | ------------| +| expressions | array | 用于数据查询时结果集列名的数组,用于元数据查询时为`null`| +| column_names | array | 用于元数据查询结果集列名数组,用于数据查询时为`null` | +| timestamps | array | 时间戳列,用于元数据查询时为`null` | +| values |array|二维数组,第一维与结果集列名数组的长度相同,第二维数组代表结果集的一列| + +请求示例如下所示: + +提示:为了避免OOM问题,不推荐使用select * from root.xx.** 这种查找方式。 + +1. 请求示例 表达式查询: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4, s3 + 1 from root.sg27 limit 2"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": [ + "root.sg27.s3", + "root.sg27.s4", + "root.sg27.s3 + 1" + ], + "column_names": null, + "timestamps": [ + 1635232143960, + 1635232153960 + ], + "values": [ + [ + 11, + null + ], + [ + false, + true + ], + [ + 12.0, + null + ] + ] +} +``` + +2.请求示例 show child paths: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child paths root"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "child paths" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ] + ] +} +``` + +3. 请求示例 show child nodes: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show child nodes root"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "child nodes" + ], + "timestamps": null, + "values": [ + [ + "sg27", + "sg28" + ] + ] +} +``` + +4. 请求示例 show all ttl: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show all ttl"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "database", + "ttl" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + null, + null + ] + ] +} +``` + +5. 请求示例 show ttl: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show ttl on root.sg27"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "database", + "ttl" + ], + "timestamps": null, + "values": [ + [ + "root.sg27" + ], + [ + null + ] + ] +} +``` + +6. 请求示例 show functions: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show functions"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "function name", + "function type", + "class name (UDF)" + ], + "timestamps": null, + "values": [ + [ + "ABS", + "ACOS", + "ASIN", + ... + ], + [ + "built-in UDTF", + "built-in UDTF", + "built-in UDTF", + ... + ], + [ + "org.apache.iotdb.db.query.udf.builtin.UDTFAbs", + "org.apache.iotdb.db.query.udf.builtin.UDTFAcos", + "org.apache.iotdb.db.query.udf.builtin.UDTFAsin", + ... + ] + ] +} +``` + +7. 
请求示例 show timeseries: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show timeseries"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "timeseries", + "alias", + "database", + "dataType", + "encoding", + "compression", + "tags", + "attributes" + ], + "timestamps": null, + "values": [ + [ + "root.sg27.s3", + "root.sg27.s4", + "root.sg28.s3", + "root.sg28.s4" + ], + [ + null, + null, + null, + null + ], + [ + "root.sg27", + "root.sg27", + "root.sg28", + "root.sg28" + ], + [ + "INT32", + "BOOLEAN", + "INT32", + "BOOLEAN" + ], + [ + "RLE", + "RLE", + "RLE", + "RLE" + ], + [ + "SNAPPY", + "SNAPPY", + "SNAPPY", + "SNAPPY" + ], + [ + null, + null, + null, + null + ], + [ + null, + null, + null, + null + ] + ] +} +``` + +8. 请求示例 show latest timeseries: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show latest timeseries"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "timeseries", + "alias", + "database", + "dataType", + "encoding", + "compression", + "tags", + "attributes" + ], + "timestamps": null, + "values": [ + [ + "root.sg28.s4", + "root.sg27.s4", + "root.sg28.s3", + "root.sg27.s3" + ], + [ + null, + null, + null, + null + ], + [ + "root.sg28", + "root.sg27", + "root.sg28", + "root.sg27" + ], + [ + "BOOLEAN", + "BOOLEAN", + "INT32", + "INT32" + ], + [ + "RLE", + "RLE", + "RLE", + "RLE" + ], + [ + "SNAPPY", + "SNAPPY", + "SNAPPY", + "SNAPPY" + ], + [ + null, + null, + null, + null + ], + [ + null, + null, + null, + null + ] + ] +} +``` + +9. 请求示例 count timeseries: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count timeseries root.**"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "count" + ], + "timestamps": null, + "values": [ + [ + 4 + ] + ] +} +``` + +10. 请求示例 count nodes: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"count nodes root.** level=2"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "count" + ], + "timestamps": null, + "values": [ + [ + 4 + ] + ] +} +``` + +11. 请求示例 show devices: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "devices", + "isAligned" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + "false", + "false" + ] + ] +} +``` + +12. 请求示例 show devices with database: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"show devices with database"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "devices", + "database", + "isAligned" + ], + "timestamps": null, + "values": [ + [ + "root.sg27", + "root.sg28" + ], + [ + "root.sg27", + "root.sg28" + ], + [ + "false", + "false" + ] + ] +} +``` + +13. 
请求示例 list user: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"list user"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "user" + ], + "timestamps": null, + "values": [ + [ + "root" + ] + ] +} +``` + +14. 请求示例 原始聚合查询: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": [ + "count(root.sg27.s3)", + "count(root.sg27.s4)" + ], + "column_names": null, + "timestamps": [ + 0 + ], + "values": [ + [ + 1 + ], + [ + 2 + ] + ] +} +``` + +15. 请求示例 group by level: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.** group by level = 1"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "count(root.sg27.*)", + "count(root.sg28.*)" + ], + "timestamps": null, + "values": [ + [ + 3 + ], + [ + 3 + ] + ] +} +``` + +16. 请求示例 group by: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(*) from root.sg27 group by([1635232143960,1635232153960),1s)"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": [ + "count(root.sg27.s3)", + "count(root.sg27.s4)" + ], + "column_names": null, + "timestamps": [ + 1635232143960, + 1635232144960, + 1635232145960, + 1635232146960, + 1635232147960, + 1635232148960, + 1635232149960, + 1635232150960, + 1635232151960, + 1635232152960 + ], + "values": [ + [ + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ], + [ + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ] + ] +} +``` + +17. 请求示例 last: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select last s3 from root.sg27"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "expressions": null, + "column_names": [ + "timeseries", + "value", + "dataType" + ], + "timestamps": [ + 1635232143960 + ], + "values": [ + [ + "root.sg27.s3" + ], + [ + "11" + ], + [ + "INT32" + ] + ] +} +``` + +18. 请求示例 disable align: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select * from root.sg27 disable align"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "code": 407, + "message": "disable align clauses are not supported." +} +``` + +19. 请求示例 align by device: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select count(s3) from root.sg27 align by device"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "code": 407, + "message": "align by device clauses are not supported." +} +``` + +20. 请求示例 select into: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"select s3, s4 into root.sg29.s1, root.sg29.s2 from root.sg27"}' http://127.0.0.1:18080/rest/v2/query +``` + +- 响应示例: + +```json +{ + "code": 407, + "message": "select into clauses are not supported." 
+} +``` + +### nonQuery + +请求方式:`POST` + +请求头:`application/json` + +请求路径:`http://ip:port/rest/v2/nonQuery` + +参数说明: + +|参数名称 |参数类型 |是否必填|参数描述| +| ------------ | ------------ | ------------ |------------ | +| sql | string | 是 | | + +请求示例: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"sql":"CREATE DATABASE root.ln"}' http://127.0.0.1:18080/rest/v2/nonQuery +``` + +响应参数: + +|参数名称 |参数类型 |参数描述| +| ------------ | ------------ | ------------| +| code | integer | 状态码 | +| message | string | 信息提示 | + +响应示例: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + + + +### insertTablet + +请求方式:`POST` + +请求头:`application/json` + +请求路径:`http://ip:port/rest/v2/insertTablet` + +参数说明: + +| 参数名称 |参数类型 |是否必填|参数描述| +|--------------| ------------ | ------------ |------------ | +| timestamps | array | 是 | 时间列 | +| measurements | array | 是 | 测点名称 | +| data_types | array | 是 | 数据类型 | +| values | array | 是 | 值列,每一列中的值可以为 `null` | +| is_aligned | boolean | 是 | 是否是对齐时间序列 | +| device | string | 是 | 设备名称 | + +请求示例: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232143960,1635232153960],"measurements":["s3","s4"],"data_types":["INT32","BOOLEAN"],"values":[[11,null],[false,true]],"is_aligned":false,"device":"root.sg27"}' http://127.0.0.1:18080/rest/v2/insertTablet +``` + +响应参数: + +|参数名称 |参数类型 |参数描述| +| ------------ | ------------ | ------------| +| code | integer | 状态码 | +| message | string | 信息提示 | + +响应示例: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + +### insertRecords + +请求方式:`POST` + +请求头:`application/json` + +请求路径:`http://ip:port/rest/v2/insertRecords` + +参数说明: + +| 参数名称 |参数类型 |是否必填|参数描述| +|-------------------| ------------ | ------------ |------------ | +| timestamps | array | 是 | 时间列 | +| measurements_list | array | 是 | 测点名称 | +| data_types_list | array | 是 | 数据类型 | +| values_list | array | 是 | 值列,每一列中的值可以为 `null` | +| devices | string | 是 | 设备名称 | +| is_aligned | string | 是 | 是否是对齐时间序列 | + +请求示例: +```shell +curl -H "Content-Type:application/json" -H "Authorization:Basic cm9vdDpyb290" -X POST --data '{"timestamps":[1635232113960,1635232151960,1635232143960,1635232143960],"measurements_list":[["s33","s44"],["s55","s66"],["s77","s88"],["s771","s881"]],"data_types_list":[["INT32","INT64"],["FLOAT","DOUBLE"],["FLOAT","DOUBLE"],["BOOLEAN","TEXT"]],"values_list":[[1,11],[2.1,2],[4,6],[false,"cccccc"]],"is_aligned":false,"devices":["root.s1","root.s1","root.s1","root.s3"]}' http://127.0.0.1:18080/rest/v2/insertRecords +``` + +响应参数: + +|参数名称 |参数类型 |参数描述| +| ------------ | ------------ | ------------| +| code | integer | 状态码 | +| message | string | 信息提示 | + +响应示例: +```json +{ + "code": 200, + "message": "SUCCESS_STATUS" +} +``` + + +## 配置 + +配置位于 `iotdb-system.properties` 中。 + + + +* 将 `enable_rest_service` 设置为 `true` 以启用该模块,而将 `false` 设置为禁用该模块。默认情况下,该值为 `false`。 + +```properties +enable_rest_service=true +``` + +* 仅在 `enable_rest_service=true` 时生效。将 `rest_service_port `设置为数字(1025~65535),以自定义REST服务套接字端口。默认情况下,值为 `18080`。 + +```properties +rest_service_port=18080 +``` + +* 将 'enable_swagger' 设置 'true' 启用swagger来展示rest接口信息, 而设置为 'false' 关闭该功能. 
默认情况下,该值为 `false`。 + +```properties +enable_swagger=false +``` + +* 一次查询能返回的结果集最大行数。当返回结果集的行数超出参数限制时,您只会得到在行数范围内的结果集,且将得到状态码`411`。 + +```properties +rest_query_default_row_size_limit=10000 +``` + +* 缓存客户登录信息的过期时间(用于加速用户鉴权的速度,单位为秒,默认是8个小时) + +```properties +cache_expire=28800 +``` + +* 缓存中存储的最大用户数量(默认是100) + +```properties +cache_max_num=100 +``` + +* 缓存初始容量(默认是10) + +```properties +cache_init_num=10 +``` + +* REST Service 是否开启 SSL 配置,将 `enable_https` 设置为 `true` 以启用该模块,而将 `false` 设置为禁用该模块。默认情况下,该值为 `false`。 + +```properties +enable_https=false +``` + +* keyStore 所在路径(非必填) + +```properties +key_store_path= +``` + + +* keyStore 密码(非必填) + +```properties +key_store_pwd= +``` + + +* trustStore 所在路径(非必填) + +```properties +trust_store_path= +``` + +* trustStore 密码(非必填) + +```properties +trust_store_pwd= +``` + + +* SSL 超时时间,单位为秒 + +```properties +idle_timeout=5000 +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Background-knowledge/Cluster-Concept.md b/src/zh/UserGuide/V2.0.1/Tree/Background-knowledge/Cluster-Concept.md new file mode 100644 index 00000000..ebd6a800 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Background-knowledge/Cluster-Concept.md @@ -0,0 +1,55 @@ + + +# 集群相关概念 +下图展示了一个常见的 IoTDB 3C3D1A(3 个 ConfigNode、3 个 DataNode 和 1 个 AINode)的集群部署模式: + + +其中包括了 IoTDB 集群使用中用户常接触到的几个概念,包括: +- **节点**(ConfigNode、DataNode、AINode); +- **槽**(SchemaSlot、DataSlot); +- **Region**(SchemaRegion、DataRegion); +- ***副本组***。 + +下文将重点对以上概念进行介绍。 + +## 节点 +IoTDB 集群包括三种节点(进程),**ConfigNode**(管理节点),**DataNode**(数据节点)和 **AINode**(分析节点),如下所示: +- **ConfigNode**:存储集群的配置信息、数据库的元数据、时间序列元数据和数据的路由信息,监控集群节点并实施负载均衡,所有 ConfigNode 之间互为全量备份,如上图中的 ConfigNode-1,ConfigNode-2 和 ConfigNode-3 所示。ConfigNode 不直接接收客户端读写请求,它会通过一系列[负载均衡算法](../Technical-Insider/Cluster-data-partitioning.md)对集群中元数据和数据的分布提供指导。 +- **DataNode**:负责时间序列元数据和数据的读写,每个 DataNode 都能接收客户端读写请求并提供相应服务,如上图中的 DataNode-1,DataNode-2 和 DataNode-3 所示。接收客户端读写请求时,若 DataNode 缓存有对应的路由信息,它能直接在本地执行或是转发这些请求;否则它会向 ConfigNode 询问并缓存路由信息,以加速后续请求的服务效率。 +- **AINode**:负责与 ConfigNode 和 DataNode 交互来扩展 IoTDB 集群对时间序列进行智能分析的能力,支持从外部引入已有机器学习模型进行注册,并使用注册的模型在指定时序数据上通过简单 SQL 语句完成时序分析任务的过程,将模型的创建、管理及推理融合在数据库引擎中。目前已提供常见时序分析场景(例如预测与异常检测)的机器学习算法或自研模型。 + +## 槽 +IoTDB 内部将元数据和数据划分成多个更小的、更易于管理的单元,每个单元称为一个**槽**。槽是一个逻辑概念,在 IoTDB 集群中,**元数据槽**和**数据槽**定义如下: +- **元数据槽**(SchemaSlot):一部分元数据集合,元数据槽总数固定,默认数量为 1000,IoTDB 使用哈希算法将所有设备均匀地分配到这些元数据槽中。 +- **数据槽**(DataSlot):一部分数据集合,在元数据槽的基础上,将对应设备的数据按时间范围划分为数据槽,默认的时间范围为 7 天。 + +## Region +在 IoTDB 中,元数据和数据被复制到各个 DataNode 以获得集群高可用性。然而以槽为粒度进行复制会增加集群管理成本、降低写入吞吐。因此 IoTDB 引入 **Region** 这一概念,将元数据槽和数据槽分别分配给 SchemaRegion 和 DataRegion 后,以 Region 为单位进行复制。**SchemRegion** 和 **DataRegion** 的详细定义如下: +- **SchemaRegion**:元数据存储和复制的基本单元,集群每个数据库的所有元数据槽会被均匀分配给该数据库的所有 SchemaRegion。拥有相同 RegionID 的 SchemaRegion 互为副本,如上图中 SchemaRegion-1 拥有三个副本,分别放置于 DataNode-1,DataNode-2 和 DataNode-3。 +- **DataRegion**:数据存储和复制的基本单元,集群每个数据库的所有数据槽会被均匀分配给该数据库的所有 DataRegion。拥有相同 RegionID 的 DataRegion 互为副本,如上图中 DataRegion-2 拥有两个副本,分别放置于 DataNode-1 和 DataNode-2。 + +## 副本组 +Region 的副本对集群的容灾能力至关重要。对于每个 Region 的所有副本,它们的角色分为 **leader** 和 **follower**,共同提供读写服务。不同架构下的副本组配置推荐如下: +| 类别 | 配置项 | 单机推荐配置 | 分布式推荐配置 | +| :-: | :-: | :-: | :-: | +| 元数据 | schema_replication_factor | 1 | 3 | +| 数据 | data_replication_factor | 1 | 2 | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Background-knowledge/Data-Type.md b/src/zh/UserGuide/V2.0.1/Tree/Background-knowledge/Data-Type.md new file mode 100644 index 00000000..3584aabb --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Background-knowledge/Data-Type.md @@ -0,0 
+1,184 @@
+
+
+# 数据类型
+
+## 基本数据类型
+
+IoTDB 支持以下十种数据类型:
+
+* BOOLEAN(布尔值)
+* INT32(整型)
+* INT64(长整型)
+* FLOAT(单精度浮点数)
+* DOUBLE(双精度浮点数)
+* TEXT(长字符串)
+* STRING(字符串)
+* BLOB(大二进制对象)
+* TIMESTAMP(时间戳)
+* DATE(日期)
+
+其中,STRING 和 TEXT 类型的区别在于,STRING 类型具有更多的统计信息,能够用于优化值过滤查询。TEXT 类型适合用于存储长字符串。
+
+### 浮点数精度配置
+
+对于 **FLOAT** 与 **DOUBLE** 类型的序列,如果编码方式采用 `RLE` 或 `TS_2DIFF`,可以在创建序列时通过 `MAX_POINT_NUMBER` 属性指定浮点数的小数点后位数。
+
+例如,
+```sql
+CREATE TIMESERIES root.vehicle.d0.s0 WITH DATATYPE=FLOAT, ENCODING=RLE, 'MAX_POINT_NUMBER'='2';
+```
+
+若不指定,系统会按照配置文件 `iotdb-system.properties` 中的 [float_precision](../Reference/Common-Config-Manual.md) 项配置(默认为 2 位)。
+
+### 数据类型兼容性
+
+当写入数据的类型与序列注册的数据类型不一致时,
+- 如果序列数据类型不兼容写入数据类型,系统会给出错误提示。
+- 如果序列数据类型兼容写入数据类型,系统会进行数据类型的自动转换,将写入的数据类型更正为注册序列的类型。
+
+各数据类型的兼容情况如下表所示:
+
+| 序列数据类型 | 支持的写入数据类型 |
+|--------------|--------------------------|
+| BOOLEAN | BOOLEAN |
+| INT32 | INT32 |
+| INT64 | INT32 INT64 |
+| FLOAT | INT32 FLOAT |
+| DOUBLE | INT32 INT64 FLOAT DOUBLE |
+| TEXT | TEXT |
+
+## 时间戳类型
+
+时间戳是一个数据到来的时间点,其中包括绝对时间戳和相对时间戳。
+
+### 绝对时间戳
+
+IoTDB 中绝对时间戳分为两种,一种为 LONG 类型,一种为 DATETIME 类型(包含 DATETIME-INPUT, DATETIME-DISPLAY 两个小类)。
+
+用户在输入时间戳时,可以使用 LONG 类型的时间戳或 DATETIME-INPUT 类型的时间戳,其中 DATETIME-INPUT 类型的时间戳支持格式如表所示:
+
+
+ +**DATETIME-INPUT 类型支持格式** + + +| format | +| :--------------------------- | +| yyyy-MM-dd HH:mm:ss | +| yyyy/MM/dd HH:mm:ss | +| yyyy.MM.dd HH:mm:ss | +| yyyy-MM-dd HH:mm:ssZZ | +| yyyy/MM/dd HH:mm:ssZZ | +| yyyy.MM.dd HH:mm:ssZZ | +| yyyy/MM/dd HH:mm:ss.SSS | +| yyyy-MM-dd HH:mm:ss.SSS | +| yyyy.MM.dd HH:mm:ss.SSS | +| yyyy-MM-dd HH:mm:ss.SSSZZ | +| yyyy/MM/dd HH:mm:ss.SSSZZ | +| yyyy.MM.dd HH:mm:ss.SSSZZ | +| ISO8601 standard time format | + + +
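 + +例如,下面给出一个使用 Python 原生接口执行 SQL 写入的示意片段(假设环境中已存在时间序列 `root.ln.wf02.wt02.status`,连接参数与序列路径仅为举例),其中时间戳直接采用上表中的 ISO8601(DATETIME-INPUT)格式,无需预先转换为 LONG 类型的毫秒时间戳: + +```python +from iotdb.Session import Session + +# 连接参数与序列路径均为示例(假设),请按实际环境调整 +session = Session("127.0.0.1", "6667", "root", "root") +session.open(False) + +# 时间戳直接写为 DATETIME-INPUT(ISO8601)格式 +session.execute_non_query_statement( + "INSERT INTO root.ln.wf02.wt02(timestamp, status) VALUES (2017-11-01T00:08:00.000+08:00, true)" +) +session.close() +``` 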
+ + +IoTDB 在显示时间戳时可以支持 LONG 类型以及 DATETIME-DISPLAY 类型,其中 DATETIME-DISPLAY 类型可以支持用户自定义时间格式。自定义时间格式的语法如表所示: + +
+ +**DATETIME-DISPLAY 自定义时间格式的语法** + + +| Symbol | Meaning | Presentation | Examples | +| :----: | :-------------------------: | :----------: | :--------------------------------: | +| G | era | era | era | +| C | century of era (>=0) | number | 20 | +| Y | year of era (>=0) | year | 1996 | +| | | | | +| x | weekyear | year | 1996 | +| w | week of weekyear | number | 27 | +| e | day of week | number | 2 | +| E | day of week | text | Tuesday; Tue | +| | | | | +| y | year | year | 1996 | +| D | day of year | number | 189 | +| M | month of year | month | July; Jul; 07 | +| d | day of month | number | 10 | +| | | | | +| a | halfday of day | text | PM | +| K | hour of halfday (0~11) | number | 0 | +| h | clockhour of halfday (1~12) | number | 12 | +| | | | | +| H | hour of day (0~23) | number | 0 | +| k | clockhour of day (1~24) | number | 24 | +| m | minute of hour | number | 30 | +| s | second of minute | number | 55 | +| S | fraction of second | millis | 978 | +| | | | | +| z | time zone | text | Pacific Standard Time; PST | +| Z | time zone offset/id | zone | -0800; -08:00; America/Los_Angeles | +| | | | | +| ' | escape for text | delimiter | | +| '' | single quote | literal | ' | + +
+ +### 相对时间戳 + + 相对时间是指与服务器时间```now()```和```DATETIME```类型时间相差一定时间间隔的时间。 + 形式化定义为: + + ``` + Duration = (Digit+ ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS'))+ + RelativeTime = (now() | DATETIME) ((+|-) Duration)+ + ``` + +
+ + **The syntax of the duration unit** + + + | Symbol | Meaning | Presentation | Examples | + | :----: | :---------: | :----------------------: | :------: | + | y | year | 1y=365 days | 1y | + | mo | month | 1mo=30 days | 1mo | + | w | week | 1w=7 days | 1w | + | d | day | 1d=1 day | 1d | + | | | | | + | h | hour | 1h=3600 seconds | 1h | + | m | minute | 1m=60 seconds | 1m | + | s | second | 1s=1 second | 1s | + | | | | | + | ms | millisecond | 1ms=1000_000 nanoseconds | 1ms | + | us | microsecond | 1us=1000 nanoseconds | 1us | + | ns | nanosecond | 1ns=1 nanosecond | 1ns | + +
+ + 例子: + + ``` + now() - 1d2h //比服务器时间早 1 天 2 小时的时间 + now() - 1w //比服务器时间早 1 周的时间 + ``` + + > 注意:'+'和'-'的左右两边必须有空格 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Data-Model-and-Terminology.md b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Data-Model-and-Terminology.md new file mode 100644 index 00000000..1d6fdf0f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Data-Model-and-Terminology.md @@ -0,0 +1,141 @@ + + +# 数据模型 + +我们以风电场物联网场景为例,说明如何在 IoTDB 中创建一个正确的数据模型。 + +根据企业组织结构和设备实体层次结构,我们将其物联网数据模型表示为如下图所示的属性层级组织结构,即电力集团层-风电场层-实体层-物理量层。其中 ROOT 为根节点,物理量层的每一个节点为叶子节点。IoTDB 采用树形结构定义数据模式,以从 ROOT 节点到叶子节点的路径来命名一个时间序列,层次间以“.”连接。例如,下图最左侧路径对应的时间序列名称为`ROOT.ln.wf01.wt01.status`。 + + + +在上图所描述的实际场景中,有许多实体所采集的物理量相同,即具有相同的工况名称和类型,因此,可以声明一个**元数据模板**来定义可采集的物理量集合。在实践中,元数据模板的使用可帮助减少元数据的资源占用,详细内容参见 [元数据模板](../User-Manual/Operate-Metadata_timecho.md#元数据模板管理)。 + +IoTDB 模型结构涉及的基本概念在下文将做详细叙述。 + +## 数据库(Database) + +用户可以将任意前缀路径设置成数据库。如有 4 条时间序列`root.ln.wf01.wt01.status`, `root.ln.wf01.wt01.temperature`, `root.ln.wf02.wt02.hardware`, `root.ln.wf02.wt02.status`,路径`root.ln`下的两个实体 `wf01`, `wf02`可能属于同一个业主,或者同一个制造商,这时候就可以将前缀路径`root.ln`指定为一个数据库。未来`root.ln`下增加了新的实体,也将属于该数据库。 + +一个 database 中的所有数据会存储在同一批文件夹下,不同 database 的数据会存储在磁盘的不同文件夹下,从而实现物理隔离。一般情况下建议设置 1 个 database。 + +> 注意 1:不允许将一个完整路径(如上例的`root.ln.wf01.wt01.status`) 设置成 database。 +> +> 注意 2:一个时间序列其前缀必须属于某个 database。在创建时间序列之前,用户必须设定该序列属于哪个database。只有设置了 database 的时间序列才可以被持久化在磁盘上。 +> +> 注意 3:被设置为数据库的路径总字符数不能超过64,包括路径开头的`root.`这5个字符。 + +一个前缀路径一旦被设定成 database 后就不可以再更改这个 database 的设定。 + +一个 database 设定后,其对应的前缀路径的祖先层级与孩子及后裔层级也不允许再设置 database(如,`root.ln`设置 database 后,root 层级与`root.ln.wf01`不允许被设置为 database)。 + +Database 节点名只支持中英文字符、数字和下划线的组合。例如`root.数据库_1` 。 + +## 设备(Device) + +**一个物理设备**,也称实体(Entity),是在实际场景中拥有物理量的设备或装置。在 IoTDB 当中,所有的物理量都有其对应的归属实体。实体无需手动创建,默认为倒数第二层。实体是管理的一组时间序列的组合,可以是一个物理设备、测量装置、传感器集合等。 + + +## 物理量(Measurement) + +**物理量**,也称工况或字段(field),是在实际场景中检测装置所记录的测量信息,且可以按一定规律变换成为电信号或其他所需形式的信息输出并发送给 IoTDB。在 IoTDB 当中,存储的所有数据及路径,都是以物理量为单位进行组织。 + +## 时间序列 + +### 时间戳 (Timestamp) + +时间戳是一个数据到来的时间点,其中包括绝对时间戳和相对时间戳,详细描述参见 [数据类型文档](./Data-Type.md)。 + +### 数据点(Data Point) + +**一个“时间戳-值”对**。 + +### 时间序列(Timeseries) + +**一个物理实体的某个物理量在时间轴上的记录**,是数据点的序列。 + +一个实体的一个物理量对应一个时间序列,即实体+物理量=时间序列。 + +时间序列也被称测点(meter)、时间线(timeline)。实时数据库中常被称作标签(tag)、参数(parameter)。 IoTDB管理的测点数量可达数十亿以上。 + +例如,ln 电力集团、wf01 风电场的实体 wt01 有名为 status 的物理量,则它的时间序列可以表示为:`root.ln.wf01.wt01.status`。 + +### 对齐时间序列(Aligned Timeseries) + +在实际应用中,存在某些实体的多个物理量**同时采样**,形成一组时间列相同的时间序列,这样的一组时间序列在Apache IoTDB中可以建模为对齐时间序列。 + +在插入数据时,一组对齐序列的时间戳列在内存和磁盘中仅需存储一次,而不是每个时间序列存储一次。 + +对齐的一组时间序列最好同时创建。 + +不可以在对齐序列所属的实体下创建非对齐的序列,不可以在非对齐序列所属的实体下创建对齐序列。 + +查询数据时,可以对于每一条时间序列单独查询。 + +插入数据时,对齐的时间序列中某列的某些行允许有空值。 + + + +在后续数据定义语言、数据操作语言和 Java 原生接口章节,将对涉及到对齐时间序列的各种操作进行逐一介绍。 + +## 路径(Path) + +路径(`path`)是指符合以下约束的表达式: + +```sql +path + : nodeName ('.' nodeName)* + ; + +nodeName + : wildcard? identifier wildcard? 
+ | wildcard + ; + +wildcard + : '*' + | '**' + ; +``` + +我们称一个路径中由 `'.'` 分割的部分叫做路径结点名(`nodeName`)。例如:`root.a.b.c`为一个层级为 4 的路径。 + +下面是对路径结点名(`nodeName`)的约束: + +* `root` 作为一个保留字符,它只允许出现在下文提到的时间序列的开头,若其他层级出现 `root`,则无法解析,提示报错。 +* 除了时间序列的开头的层级(`root`)外,其他的层级支持的字符如下: + * [ 0-9 a-z A-Z _ ] (字母,数字,下划线) + * ['\u2E80'..'\u9FFF'] (UNICODE 中文字符) +* 特别地,如果系统在 Windows 系统上部署,那么 database 路径结点名是大小写不敏感的。例如,同时创建`root.ln` 和 `root.LN` 是不被允许的。 + +### 特殊字符(反引号) + +如果需要在路径结点名中用特殊字符,可以用反引号引用路径结点名,具体使用方法可以参考[反引号](../Reference/Syntax-Rule.md#反引号)。 + +## 路径模式(Path Pattern) + +为了使得在表达多个时间序列的时候更加方便快捷,IoTDB 为用户提供带通配符`*`或`**`的路径。用户可以利用两种通配符构造出期望的路径模式。通配符可以出现在路径中的任何层。 + +`*`在路径中表示一层。例如`root.vehicle.*.sensor1`代表的是以`root.vehicle`为前缀,以`sensor1`为后缀,层次等于 4 层的路径。 + +`**`在路径中表示是(`*`)+,即为一层或多层`*`。例如`root.vehicle.device1.**`代表的是`root.vehicle.device1.*`, `root.vehicle.device1.*.*`, `root.vehicle.device1.*.*.*`等所有以`root.vehicle.device1`为前缀路径的大于等于 4 层的路径;`root.vehicle.**.sensor1`代表的是以`root.vehicle`为前缀,以`sensor1`为后缀,层次大于等于 4 层的路径。 + +> 注意:`*`和`**`不能放在路径开头。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Navigating_Time_Series_Data.md b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Navigating_Time_Series_Data.md new file mode 100644 index 00000000..96e9fdf9 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Navigating_Time_Series_Data.md @@ -0,0 +1,67 @@ + +# 走进时序数据 + +## 什么叫时序数据? + +万物互联的今天,物联网场景、工业场景等各类场景都在进行数字化转型,人们通过在各类设备上安装传感器对设备的各类状态进行采集。如电机采集电压、电流,风机的叶片转速、角速度、发电功率;车辆采集经纬度、速度、油耗;桥梁的振动频率、挠度、位移量等。传感器的数据采集,已经渗透在各个行业中。 + +![](https://alioss.timecho.com/docs/img/%E6%97%B6%E5%BA%8F%E6%95%B0%E6%8D%AE%E4%BB%8B%E7%BB%8D.png) + + + +通常来说,我们把每个采集点位叫做一个**测点( 也叫物理量、时间序列、时间线、信号量、指标、测量值等)**,每个测点都在随时间的推移不断收集到新的数据信息,从而构成了一条**时间序列**。用表格的方式,每个时间序列就是一个由时间、值两列形成的表格;用图形化的方式,每个时间序列就是一个随时间推移形成的走势图,也可以形象的称之为设备的“心电图”。 + +![](https://alioss.timecho.com/docs/img/%E5%BF%83%E7%94%B5%E5%9B%BE1.png) + +传感器产生的海量时序数据是各行各业数字化转型的基础,因此我们对时序数据的模型梳理主要围绕设备、传感器展开。 + +## 时序数据中的关键概念有哪些? + +时序数据中主要涉及的概念由下至上可分为:数据点、测点、设备。 + +![](https://alioss.timecho.com/docs/img/%E7%99%BD%E6%9D%BF.png) + +### 数据点 + +- 定义:由一个时间戳和一个数值组成,其中时间戳为 long 类型,数值可以为 BOOLEAN、FLOAT、INT32 等各种类型。 +- 示例:如上图中表格形式的时间序列的一行,或图形形式的时间序列的一个点,就是一个数据点。 + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E7%82%B9.png) + +### 测点 + +- 定义:是多个数据点按时间戳递增排列形成的一个时间序列。通常一个测点代表一个采集点位,能够定期采集所在环境的物理量。 +- 又名:物理量、时间序列、时间线、信号量、指标、测量值等 +- 示例: + - 电力场景:电流、电压 + - 能源场景:风速、转速 + - 车联网场景:油量、车速、经度、维度 + - 工厂场景:温度、湿度 + +### 设备 + +- 定义:对应一个实际场景中的物理设备,通常是一组测点的集合,由一到多个标签定位标识 +- 示例 + - 车联网场景:车辆,由车辆识别代码 VIN 标识 + - 工厂场景:机械臂,由物联网平台生成的唯一 ID 标识 + - 能源场景:风机,由区域、场站、线路、机型、实例等标识 + - 监控场景:CPU,由机房、机架、Hostname、设备类型等标识 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_apache.md new file mode 100644 index 00000000..e67ae075 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_apache.md @@ -0,0 +1,1261 @@ + + +# 测点管理 + +## 数据库管理 + +数据库(Database)可以被视为关系数据库中的Database。 + +### 创建数据库 + +我们可以根据存储模型建立相应的数据库。如下所示: + +``` +IoTDB > CREATE DATABASE root.ln +``` + +需要注意的是,推荐创建一个 database. + +Database 的父子节点都不能再设置 database。例如在已经有`root.ln`和`root.sgcc`这两个 database 的情况下,创建`root.ln.wf01` database 是不可行的。系统将给出相应的错误提示,如下所示: + +``` +IoTDB> CREATE DATABASE root.ln.wf01 +Msg: 300: root.ln has already been created as database. 
+``` +Database 节点名只支持中英文字符、数字、下划线、英文句号和反引号的组合,如果想设置为纯数字或者包含下划线和英文句号,需要用反引号(` `` `)把 database 名称引起来。其中` `` `内,两个反引号表示一个反引号,例如 ` ```` ` 表示`` ` ``。 + +还需注意,如果在 Windows 系统上部署,database 名是大小写不敏感的。例如同时创建`root.ln` 和 `root.LN` 是不被允许的。 + +### 查看数据库 + +在 database 创建后,我们可以使用 [SHOW DATABASES](../SQL-Manual/SQL-Manual.md#查看数据库) 语句和 [SHOW DATABASES \](../SQL-Manual/SQL-Manual.md#查看数据库) 来查看 database,SQL 语句如下所示: + +``` +IoTDB> show databases +IoTDB> show databases root.* +IoTDB> show databases root.** +``` + +执行结果为: + +``` ++-------------+----+-------------------------+-----------------------+-----------------------+ +| database| ttl|schema_replication_factor|data_replication_factor|time_partition_interval| ++-------------+----+-------------------------+-----------------------+-----------------------+ +| root.sgcc|null| 2| 2| 604800| +| root.ln|null| 2| 2| 604800| ++-------------+----+-------------------------+-----------------------+-----------------------+ +Total line number = 2 +It costs 0.060s +``` + +### 删除数据库 + +用户可以使用`DELETE DATABASE `语句删除该路径模式匹配的所有的数据库。在删除的过程中,需要注意的是数据库的数据也会被删除。 + +``` +IoTDB > DELETE DATABASE root.ln +IoTDB > DELETE DATABASE root.sgcc +// 删除所有数据,时间序列以及数据库 +IoTDB > DELETE DATABASE root.** +``` + +### 统计数据库数量 + +用户可以使用`COUNT DATABASES `语句统计数据库的数量,允许指定`PathPattern` 用来统计匹配该`PathPattern` 的数据库的数量 + +SQL 语句如下所示: + +``` +IoTDB> show databases +IoTDB> count databases +IoTDB> count databases root.* +IoTDB> count databases root.sgcc.* +IoTDB> count databases root.sgcc +``` + +执行结果为: + +``` ++-------------+ +| database| ++-------------+ +| root.sgcc| +| root.turbine| +| root.ln| ++-------------+ +Total line number = 3 +It costs 0.003s + ++-------------+ +| Database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.003s + ++-------------+ +| Database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| Database| ++-------------+ +| 0| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 1| ++-------------+ +Total line number = 1 +It costs 0.002s +``` + +### 数据存活时间(TTL) + +IoTDB 支持对 device 级别设置数据存活时间(TTL),这使得 IoTDB 可以定期、自动地删除一定时间之前的数据。合理使用 TTL +可以帮助您控制 IoTDB 占用的总磁盘空间以避免出现磁盘写满等异常。并且,随着文件数量的增多,查询性能往往随之下降, +内存占用也会有所提高。及时地删除一些较老的文件有助于使查询性能维持在一个较高的水平和减少内存资源的占用。 + +TTL的默认单位为毫秒,如果配置文件中的时间精度修改为其他单位,设置ttl时仍然使用毫秒单位。 + +当设置 TTL 时,系统会根据设置的路径寻找所包含的所有 device,并为这些 device 设置 TTL 时间,系统会按设备粒度对过期数据进行删除。 +当设备数据过期后,将不能被查询到,但磁盘文件中的数据不能保证立即删除(会在一定时间内删除),但可以保证最终被删除。 +考虑到操作代价,系统不会立即物理删除超过 TTL 的数据,而是通过合并来延迟地物理删除。因此,在数据被物理删除前,如果调小或者解除 TTL,可能会导致之前因 TTL 而不可见的数据重新出现。 +系统中仅能设置至多 1000 条 TTL 规则,达到该上限时,需要先删除部分 TTL 规则才能设置新的规则 + +#### TTL Path 规则 +设置的路径 path 只支持前缀路径(即路径中间不能带 \* , 且必须以 \*\* 结尾),该路径会匹配到设备,也允许用户指定不带星的 path 为具体的 database 或 device,当 path 不带 \* 时,会检查是否匹配到 database,若匹配到 database,则会同时设置 path 和 path.\*\*。 +注意:设备 TTL 设置不会对元数据的存在性进行校验,即允许对一条不存在的设备设置 TTL。 +``` +合格的 path: +root.** +root.db.** +root.db.group1.** +root.db +root.db.group1.d1 + +不合格的 path: +root.*.db +root.**.db.* +root.db.* +``` +#### TTL 适用规则 +当一个设备适用多条TTL规则时,优先适用较精确和较长的规则。例如对于设备“root.bj.hd.dist001.turbine001”来说,规则“root.bj.hd.dist001.turbine001”比“root.bj.hd.dist001.\*\*”优先,而规则“root.bj.hd.dist001.\*\*”比“root.bj.hd.\*\*”优先; +#### 设置 TTL +set ttl 操作可以理解为设置一条 TTL规则,比如 set ttl to root.sg.group1.\*\* 就相当于对所有可以匹配到该路径模式的设备挂载 ttl。 unset ttl 操作表示对相应路径模式卸载 TTL,若不存在对应 TTL,则不做任何事。若想把 TTL 调成无限大,则可以使用 INF 关键字 +设置 TTL 的 SQL 语句如下所示: +``` +set ttl to pathPattern 360000; +``` +pathPattern 是前缀路径,即路径中间不能带 \* 且必须以 \*\* 结尾。 +pathPattern 匹配对应的设备。为了兼容老版本 
SQL 语法,允许用户输入的 pathPattern 匹配到 db,则自动将前缀路径扩展为 path.\*\*。 +例如,写set ttl to root.sg 360000 则会自动转化为set ttl to root.sg.\*\* 360000,转化后的语句对所有 root.sg 下的 device 设置TTL。 +但若写的 pathPattern 无法匹配到 db,则上述逻辑不会生效。 +如写set ttl to root.sg.group 360000 ,由于root.sg.group未匹配到 db,则不会被扩充为root.sg.group.\*\*。 也允许指定具体 device,不带 \*。 +#### 取消 TTL + +取消 TTL 的 SQL 语句如下所示: + +``` +IoTDB> unset ttl from root.ln +``` + +取消设置 TTL 后, `root.ln` 路径下所有的数据都会被保存。 +``` +IoTDB> unset ttl from root.sgcc.** +``` + +取消设置`root.sgcc`路径下的所有的 TTL 。 +``` +IoTDB> unset ttl from root.** +``` + +取消设置所有的 TTL 。 + +新语法 +``` +IoTDB> unset ttl from root.** +``` + +旧语法 +``` +IoTDB> unset ttl to root.** +``` +新旧语法在功能上没有区别并且同时兼容,仅是新语法在用词上更符合常规。 +#### 显示 TTL + +显示 TTL 的 SQL 语句如下所示: +show all ttl + +``` +IoTDB> SHOW ALL TTL ++--------------+--------+ +| path| TTL| +| root.**|55555555| +| root.sg2.a.**|44440000| ++--------------+--------+ +``` + +show ttl on pathPattern +``` +IoTDB> SHOW TTL ON root.db.**; ++--------------+--------+ +| path| TTL| +| root.db.**|55555555| +| root.db.a.**|44440000| ++--------------+--------+ +``` +SHOW ALL TTL 这个例子会给出所有的 TTL。 +SHOW TTL ON pathPattern 这个例子会显示指定路径的 TTL。 + +显示设备的 TTL。 +``` +IoTDB> show devices ++---------------+---------+---------+ +| Device|IsAligned| TTL| ++---------------+---------+---------+ +|root.sg.device1| false| 36000000| +|root.sg.device2| true| INF| ++---------------+---------+---------+ +``` +所有设备都一定会有 TTL,即不可能是 null。INF 表示无穷大。 + + +### 设置异构数据库(进阶操作) + +在熟悉 IoTDB 元数据建模的前提下,用户可以在 IoTDB 中设置异构的数据库,以便应对不同的生产需求。 + +目前支持的数据库异构参数有: + +| 参数名 | 参数类型 | 参数描述 | +|---------------------------|---------|---------------------------| +| TTL | Long | 数据库的 TTL | +| SCHEMA_REPLICATION_FACTOR | Integer | 数据库的元数据副本数 | +| DATA_REPLICATION_FACTOR | Integer | 数据库的数据副本数 | +| SCHEMA_REGION_GROUP_NUM | Integer | 数据库的 SchemaRegionGroup 数量 | +| DATA_REGION_GROUP_NUM | Integer | 数据库的 DataRegionGroup 数量 | + +用户在配置异构参数时需要注意以下三点: ++ TTL 和 TIME_PARTITION_INTERVAL 必须为正整数。 ++ SCHEMA_REPLICATION_FACTOR 和 DATA_REPLICATION_FACTOR 必须小于等于已部署的 DataNode 数量。 ++ SCHEMA_REGION_GROUP_NUM 和 DATA_REGION_GROUP_NUM 的功能与 iotdb-system.properties 配置文件中的 +`schema_region_group_extension_policy` 和 `data_region_group_extension_policy` 参数相关,以 DATA_REGION_GROUP_NUM 为例: +若设置 `data_region_group_extension_policy=CUSTOM`,则 DATA_REGION_GROUP_NUM 将作为 Database 拥有的 DataRegionGroup 的数量; +若设置 `data_region_group_extension_policy=AUTO`,则 DATA_REGION_GROUP_NUM 将作为 Database 拥有的 DataRegionGroup 的配额下界,即当该 Database 开始写入数据时,将至少拥有此数量的 DataRegionGroup。 + +用户可以在创建 Database 时设置任意异构参数,或在单机/分布式 IoTDB 运行时调整部分异构参数。 + +#### 创建 Database 时设置异构参数 + +用户可以在创建 Database 时设置上述任意异构参数,SQL 语句如下所示: + +``` +CREATE DATABASE prefixPath (WITH databaseAttributeClause (COMMA? databaseAttributeClause)*)? +``` + +例如: +``` +CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +#### 运行时调整异构参数 + +用户可以在 IoTDB 运行时调整部分异构参数,SQL 语句如下所示: + +``` +ALTER DATABASE prefixPath WITH databaseAttributeClause (COMMA? databaseAttributeClause)* +``` + +例如: +``` +ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +注意,运行时只能调整下列异构参数: ++ SCHEMA_REGION_GROUP_NUM ++ DATA_REGION_GROUP_NUM + +#### 查看异构数据库 + +用户可以查询每个 Database 的具体异构配置,SQL 语句如下所示: + +``` +SHOW DATABASES DETAILS prefixPath? 
+``` + +例如: + +``` +IoTDB> SHOW DATABASES DETAILS ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval|SchemaRegionGroupNum|MinSchemaRegionGroupNum|MaxSchemaRegionGroupNum|DataRegionGroupNum|MinDataRegionGroupNum|MaxDataRegionGroupNum| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|root.db1| null| 1| 3| 604800000| 0| 1| 1| 0| 2| 2| +|root.db2|86400000| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| +|root.db3| null| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +Total line number = 3 +It costs 0.058s +``` + +各列查询结果依次为: ++ 数据库名称 ++ 数据库的 TTL ++ 数据库的元数据副本数 ++ 数据库的数据副本数 ++ 数据库的时间分区间隔 ++ 数据库当前拥有的 SchemaRegionGroup 数量 ++ 数据库需要拥有的最小 SchemaRegionGroup 数量 ++ 数据库允许拥有的最大 SchemaRegionGroup 数量 ++ 数据库当前拥有的 DataRegionGroup 数量 ++ 数据库需要拥有的最小 DataRegionGroup 数量 ++ 数据库允许拥有的最大 DataRegionGroup 数量 + + +## 设备模板管理 + +IoTDB 支持设备模板功能,实现同类型不同实体的物理量元数据共享,减少元数据内存占用,同时简化同类型实体的管理。 + + +![img](https://alioss.timecho.com/docs/img/%E6%A8%A1%E6%9D%BF.png) + +![img](https://alioss.timecho.com/docs/img/template.jpg) + +### 创建设备模板 + +创建设备模板的 SQL 语法如下: + +```sql +CREATE DEVICE TEMPLATE ALIGNED? '(' [',' ]+ ')' +``` + +**示例1:** 创建包含两个非对齐序列的元数据模板 + +```shell +IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +**示例2:** 创建包含一组对齐序列的元数据模板 + +```shell +IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) +``` + +其中,物理量 `lat` 和 `lon` 是对齐的。 + +### 挂载设备模板 + +元数据模板在创建后,需执行挂载操作,方可用于相应路径下的序列创建与数据写入。 + +**挂载模板前,需确保相关数据库已经创建。** + +**推荐将模板挂载在 database 节点上,不建议将模板挂载到 database 上层的节点上。** + +**模板挂载路径下禁止创建普通序列,已创建了普通序列的前缀路径上不允许挂载模板。** + +挂载元数据模板的 SQL 语句如下所示: + +```shell +IoTDB> set device template t1 to root.sg1.d1 +``` + +### 激活设备模板 + +挂载好设备模板后,且系统开启自动注册序列功能的情况下,即可直接进行数据的写入。例如 database 为 root.sg1,模板 t1 被挂载到了节点 root.sg1.d1,那么可直接向时间序列(如 root.sg1.d1.temperature 和 root.sg1.d1.status)写入时间序列数据,该时间序列已可被当作正常创建的序列使用。 + +**注意**:在插入数据之前或系统未开启自动注册序列功能,模板定义的时间序列不会被创建。可以使用如下SQL语句在插入数据前创建时间序列即激活模板: + +```shell +IoTDB> create timeseries using device template on root.sg1.d1 +``` + +**示例:** 执行以下语句 +```shell +IoTDB> set device template t1 to root.sg1.d1 +IoTDB> set device template t2 to root.sg1.d2 +IoTDB> create timeseries using device template on root.sg1.d1 +IoTDB> create timeseries using device template on root.sg1.d2 +``` + +查看此时的时间序列: +```sql +show timeseries root.sg1.** +``` + +```shell ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +|root.sg1.d1.temperature| null| root.sg1| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.sg1.d1.status| null| root.sg1| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +| root.sg1.d2.lon| null| 
root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| +| root.sg1.d2.lat| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +``` + +查看此时的设备: +```sql +show devices root.sg1.** +``` + +```shell ++---------------+---------+---------+ +| devices|isAligned| Template| ++---------------+---------+---------+ +| root.sg1.d1| false| null| +| root.sg1.d2| true| null| ++---------------+---------+---------+ +``` + +### 查看设备模板 + +- 查看所有设备模板 + +SQL 语句如下所示: + +```shell +IoTDB> show device templates +``` + +执行结果如下: +```shell ++-------------+ +|template name| ++-------------+ +| t2| +| t1| ++-------------+ +``` + +- 查看某个设备模板下的物理量 + +SQL 语句如下所示: + +```shell +IoTDB> show nodes in device template t1 +``` + +执行结果如下: +```shell ++-----------+--------+--------+-----------+ +|child nodes|dataType|encoding|compression| ++-----------+--------+--------+-----------+ +|temperature| FLOAT| RLE| SNAPPY| +| status| BOOLEAN| PLAIN| SNAPPY| ++-----------+--------+--------+-----------+ +``` + +- 查看挂载了某个设备模板的路径 + +```shell +IoTDB> show paths set device template t1 +``` + +执行结果如下: +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +- 查看使用了某个设备模板的路径(即模板在该路径上已激活,序列已创建) + +```shell +IoTDB> show paths using device template t1 +``` + +执行结果如下: +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +### 解除设备模板 + +若需删除模板表示的某一组时间序列,可采用解除模板操作,SQL语句如下所示: + +```shell +IoTDB> delete timeseries of device template t1 from root.sg1.d1 +``` + +或 + +```shell +IoTDB> deactivate device template t1 from root.sg1.d1 +``` + +解除操作支持批量处理,SQL语句如下所示: + +```shell +IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.* +``` + +或 + +```shell +IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.* +``` + +若解除命令不指定模板名称,则会将给定路径涉及的所有模板使用情况均解除。 + +### 卸载设备模板 + +卸载设备模板的 SQL 语句如下所示: + +```shell +IoTDB> unset device template t1 from root.sg1.d1 +``` + +**注意**:不支持卸载仍处于激活状态的模板,需保证执行卸载操作前解除对该模板的所有使用,即删除所有该模板表示的序列。 + +### 删除设备模板 + +删除设备模板的 SQL 语句如下所示: + +```shell +IoTDB> drop device template t1 +``` + +**注意**:不支持删除已经挂载的模板,需在删除操作前保证该模板卸载成功。 + +### 修改设备模板 + +在需要新增物理量的场景中,可以通过修改设备模板来给所有已激活该模板的设备新增物理量。 + +修改设备模板的 SQL 语句如下所示: + +```shell +IoTDB> alter device template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY) +``` + +**向已挂载模板的路径下的设备中写入数据,若写入请求中的物理量不在模板中,将自动扩展模板。** + + +## 时间序列管理 + +### 创建时间序列 + +根据建立的数据模型,我们可以分别在两个数据库中创建相应的时间序列。创建时间序列的 SQL 语句如下所示: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE +``` + +从 v0.13 起,可以使用简化版的 SQL 语句创建时间序列: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature FLOAT encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status BOOLEAN 
encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE +``` + +需要注意的是,当创建时间序列时指定的编码方式与数据类型不对应时,系统会给出相应的错误提示,如下所示: +``` +IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF +error: encoding TS_2DIFF does not support BOOLEAN +``` + +详细的数据类型与编码方式的对应列表请参见 [编码方式](../Basic-Concept/Encoding-and-Compression.md)。 + +### 创建对齐时间序列 + +创建一组对齐时间序列的SQL语句如下所示: + +``` +IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) +``` + +一组对齐序列中的序列可以有不同的数据类型、编码方式以及压缩方式。 + +对齐的时间序列也支持设置别名、标签、属性。 + +### 删除时间序列 + +我们可以使用`(DELETE | DROP) TimeSeries `语句来删除我们之前创建的时间序列。SQL 语句如下所示: + +``` +IoTDB> delete timeseries root.ln.wf01.wt01.status +IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware +IoTDB> delete timeseries root.ln.wf02.* +IoTDB> drop timeseries root.ln.wf02.* +``` + +### 查看时间序列 + +* SHOW LATEST? TIMESERIES pathPattern? timeseriesWhereClause? limitClause? + + SHOW TIMESERIES 中可以有四种可选的子句,查询结果为这些时间序列的所有信息 + +时间序列信息具体包括:时间序列路径名,database,Measurement 别名,数据类型,编码方式,压缩方式,属性和标签。 + +示例: + +* SHOW TIMESERIES + + 展示系统中所有的时间序列信息 + +* SHOW TIMESERIES <`Path`> + + 返回给定路径的下的所有时间序列信息。其中 `Path` 需要为一个时间序列路径或路径模式。例如,分别查看`root`路径和`root.ln`路径下的时间序列,SQL 语句如下所示: + +``` +IoTDB> show timeseries root.** +IoTDB> show timeseries root.ln.** +``` + +执行结果分别为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.016s + ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.ln.wf01.wt01.temperature| 
null| root.ln| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +Total line number = 4 +It costs 0.004s +``` + +* SHOW TIMESERIES LIMIT INT OFFSET INT + + 只返回从指定下标开始的结果,最大返回条数被 LIMIT 限制,用于分页查询。例如: + +``` +show timeseries root.ln.** limit 10 offset 10 +``` + +* SHOW TIMESERIES WHERE TIMESERIES contains 'containStr' + + 对查询结果集根据 timeseries 名称进行字符串模糊匹配过滤。例如: + +``` +show timeseries root.ln.** where timeseries contains 'wf01.wt' +``` + +执行结果为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 2 +It costs 0.016s +``` + +* SHOW TIMESERIES WHERE DataType=type + + 对查询结果集根据时间序列数据类型进行过滤。例如: + +``` +show timeseries root.ln.** where dataType=FLOAT +``` + +执行结果为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 3 +It costs 0.016s + +``` + + +* SHOW LATEST TIMESERIES + + 表示查询出的时间序列需要按照最近插入时间戳降序排列 + + +需要注意的是,当查询路径不存在时,系统会返回 0 条时间序列。 + +### 统计时间序列总数 + +IoTDB 支持使用`COUNT TIMESERIES`来统计一条路径中的时间序列个数。SQL 语句如下所示: + +* 可以通过 `WHERE` 条件对时间序列名称进行字符串模糊匹配,语法为: `COUNT TIMESERIES WHERE TIMESERIES contains 'containStr'` 。 +* 可以通过 `WHERE` 条件对时间序列数据类型进行过滤,语法为: `COUNT TIMESERIES WHERE DataType='`。 +* 可以通过 `WHERE` 条件对标签点进行过滤,语法为: `COUNT TIMESERIES WHERE TAGS(key)='value'` 或 `COUNT TIMESERIES WHERE TAGS(key) contains 'value'`。 +* 可以通过定义`LEVEL`来统计指定层级下的时间序列个数。这条语句可以用来统计每一个设备下的传感器数量,语法为:`COUNT TIMESERIES GROUP BY 
LEVEL=`。 + +``` +IoTDB > COUNT TIMESERIES root.** +IoTDB > COUNT TIMESERIES root.ln.** +IoTDB > COUNT TIMESERIES root.ln.*.*.status +IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' +IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 +``` + +例如有如下时间序列(可以使用`show timeseries`展示所有时间序列): + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| {"unit":"c"}| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| {"description":"test1"}| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.004s +``` + +那么 Metadata Tree 如下所示: + + + +可以看到,`root`被定义为`LEVEL=0`。那么当你输入如下语句时: + +``` +IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 +``` + +你将得到以下结果: + +``` +IoTDB> COUNT TIMESERIES root.** GROUP BY LEVEL=1 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +| root.sgcc| 2| +|root.turbine| 1| +| root.ln| 4| ++------------+-----------------+ +Total line number = 3 +It costs 0.002s + +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf02| 2| +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 2 +It costs 0.002s + +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 1 +It costs 0.002s +``` + +> 注意:时间序列的路径只是过滤条件,与 level 的定义无关。 + +### 标签点管理 + +我们可以在创建时间序列的时候,为它添加别名和额外的标签和属性信息。 + +标签和属性的区别在于: + +* 标签可以用来查询时间序列路径,会在内存中维护标点到时间序列路径的倒排索引:标签 -> 时间序列路径 +* 属性只能用时间序列路径来查询:时间序列路径 -> 属性 + +所用到的扩展的创建时间序列的 SQL 语句如下所示: +``` +create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) 
attributes(attr1=v1, attr2=v2) +``` + +括号里的`temprature`是`s1`这个传感器的别名。 +我们可以在任何用到`s1`的地方,将其用`temprature`代替,这两者是等价的。 + +> IoTDB 同时支持在查询语句中使用 AS 函数设置别名。二者的区别在于:AS 函数设置的别名用于替代整条时间序列名,且是临时的,不与时间序列绑定;而上文中的别名只作为传感器的别名,与其绑定且可与原传感器名等价使用。 + +> 注意:额外的标签和属性信息总的大小不能超过`tag_attribute_total_size`. + + * 标签点属性更新 +创建时间序列后,我们也可以对其原有的标签点属性进行更新,主要有以下六种更新方式: +* 重命名标签或属性 +``` +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` +* 重新设置标签或属性的值 +``` +ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` +* 删除已经存在的标签或属性 +``` +ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +``` +* 添加新的标签 +``` +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` +* 添加新的属性 +``` +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` +* 更新插入别名,标签和属性 +> 如果该别名,标签或属性原来不存在,则插入,否则,用新值更新原来的旧值 +``` +ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +* 使用标签作为过滤条件查询时间序列,使用 TAGS(tagKey) 来标识作为过滤条件的标签 +``` +SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +``` + +返回给定路径的下的所有满足条件的时间序列信息,SQL 语句如下所示: + +``` +ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c +ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 +show timeseries root.ln.** where TAGS(unit)='c' +show timeseries root.ln.** where TAGS(description) contains 'test1' +``` + +执行结果分别为: + +``` ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|{"unit":"c"}| null| null| null| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.005s + ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|{"description":"test1"}| null| null| null| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.004s +``` + +- 使用标签作为过滤条件统计时间序列数量 + +``` +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +COUNT TIMESERIES (<`PathPattern`>)? 
timeseriesWhereClause GROUP BY LEVEL= +``` + +返回给定路径的下的所有满足条件的时间序列的数量,SQL 语句如下所示: + +``` +count timeseries +count timeseries root.** where TAGS(unit)='c' +count timeseries root.** where TAGS(unit)='c' group by level = 2 +``` + +执行结果分别为: + +``` +IoTDB> count timeseries ++-----------------+ +|count(timeseries)| ++-----------------+ +| 6| ++-----------------+ +Total line number = 1 +It costs 0.019s +IoTDB> count timeseries root.** where TAGS(unit)='c' ++-----------------+ +|count(timeseries)| ++-----------------+ +| 2| ++-----------------+ +Total line number = 1 +It costs 0.020s +IoTDB> count timeseries root.** where TAGS(unit)='c' group by level = 2 ++--------------+-----------------+ +| column|count(timeseries)| ++--------------+-----------------+ +| root.ln.wf02| 2| +| root.ln.wf01| 0| +|root.sgcc.wf03| 0| ++--------------+-----------------+ +Total line number = 3 +It costs 0.011s +``` + +> 注意,现在我们只支持一个查询条件,要么是等值条件查询,要么是包含条件查询。当然 where 子句中涉及的必须是标签值,而不能是属性值。 + +创建对齐时间序列 + +``` +create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) +``` + +执行结果如下: + +``` +IoTDB> show timeseries ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| +|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +支持查询: + +``` +IoTDB> show timeseries where TAGS(tag1)='v1' ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +上述对时间序列标签、属性的更新等操作都支持。 + + +## 路径查询 + +### 查看路径的所有子路径 + +``` +SHOW CHILD PATHS pathPattern +``` + +可以查看此路径模式所匹配的所有路径的下一层的所有路径和它对应的节点类型,即pathPattern.*所匹配的路径及其节点类型。 + +节点类型:ROOT -> SG INTERNAL -> DATABASE -> INTERNAL -> DEVICE -> TIMESERIES + +示例: + +* 查询 root.ln 的下一层:show child paths root.ln + +``` ++------------+----------+ +| child paths|node types| ++------------+----------+ +|root.ln.wf01| INTERNAL| +|root.ln.wf02| INTERNAL| ++------------+----------+ +Total line number = 2 +It costs 0.002s +``` + +* 查询形如 root.xx.xx.xx 的路径:show child paths root.\*.\* + +``` ++---------------+ +| child paths| ++---------------+ +|root.ln.wf01.s1| +|root.ln.wf02.s2| ++---------------+ +``` + +### 查看路径的下一级节点 + +``` +SHOW CHILD NODES pathPattern +``` + +可以查看此路径模式所匹配的节点的下一层的所有节点。 + +示例: + +* 查询 
root 的下一层:show child nodes root + +``` ++------------+ +| child nodes| ++------------+ +| ln| ++------------+ +``` + +* 查询 root.ln 的下一层 :show child nodes root.ln + +``` ++------------+ +| child nodes| ++------------+ +| wf01| +| wf02| ++------------+ +``` + +### 统计节点数 + +IoTDB 支持使用`COUNT NODES LEVEL=`来统计当前 Metadata + 树下满足某路径模式的路径中指定层级的节点个数。这条语句可以用来统计带有特定采样点的设备数。例如: + +``` +IoTDB > COUNT NODES root.** LEVEL=2 +IoTDB > COUNT NODES root.ln.** LEVEL=2 +IoTDB > COUNT NODES root.ln.wf01.* LEVEL=3 +IoTDB > COUNT NODES root.**.temperature LEVEL=3 +``` + +对于上面提到的例子和 Metadata Tree,你可以获得如下结果: + +``` ++------------+ +|count(nodes)| ++------------+ +| 4| ++------------+ +Total line number = 1 +It costs 0.003s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 1| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s +``` + +> 注意:时间序列的路径只是过滤条件,与 level 的定义无关。 + +### 查看设备 + +* SHOW DEVICES pathPattern? (WITH DATABASE)? devicesWhereClause? limitClause? + +与 `Show Timeseries` 相似,IoTDB 目前也支持两种方式查看设备。 + +* `SHOW DEVICES` 语句显示当前所有的设备信息,等价于 `SHOW DEVICES root.**`。 +* `SHOW DEVICES ` 语句规定了 `PathPattern`,返回给定的路径模式所匹配的设备信息。 +* `WHERE` 条件中可以使用 `DEVICE contains 'xxx'`,根据 device 名称进行模糊查询。 +* `WHERE` 条件中可以使用 `TEMPLATE = 'xxx'`,`TEMPLATE != 'xxx'`,根据 template 名称进行过滤查询。 +* `WHERE` 条件中可以使用 `TEMPLATE is null`,`TEMPLATE is not null`,根据 template 是否为null(null 表示没激活)进行过滤查询。 + +SQL 语句如下所示: + +``` +IoTDB> show devices +IoTDB> show devices root.ln.** +IoTDB> show devices root.ln.** where device contains 't' +IoTDB> show devices root.ln.** where template = 't1' +IoTDB> show devices root.ln.** where template is null +IoTDB> show devices root.ln.** where template != 't1' +IoTDB> show devices root.ln.** where template is not null +``` + +你可以获得如下数据: + +``` ++-------------------+---------+---------+ +| devices|isAligned| Template| ++-------------------+---------+---------+ +| root.ln.wf01.wt01| false| t1| +| root.ln.wf02.wt02| false| null| +|root.sgcc.wf03.wt01| false| null| +| root.turbine.d1| false| null| ++-------------------+---------+---------+ +Total line number = 4 +It costs 0.002s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 2 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 2 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| ++-----------------+---------+---------+ +Total line number = 1 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 1 +It costs 0.001s +``` + +其中,`isAligned`表示该设备下的时间序列是否对齐, +`Template`显示着该设备所激活的模板名,null 表示没有激活模板。 + +查看设备及其 database 信息,可以使用 `SHOW DEVICES WITH DATABASE` 语句。 + +* `SHOW DEVICES WITH DATABASE` 语句显示当前所有的设备信息和其所在的 database,等价于 `SHOW DEVICES root.**`。 +* `SHOW DEVICES WITH DATABASE` 
语句规定了 `PathPattern`,返回给定的路径模式所匹配的设备信息和其所在的 database。 + +SQL 语句如下所示: + +``` +IoTDB> show devices with database +IoTDB> show devices root.ln.** with database +``` + +你可以获得如下数据: + +``` ++-------------------+-------------+---------+---------+ +| devices| database|isAligned| Template| ++-------------------+-------------+---------+---------+ +| root.ln.wf01.wt01| root.ln| false| t1| +| root.ln.wf02.wt02| root.ln| false| null| +|root.sgcc.wf03.wt01| root.sgcc| false| null| +| root.turbine.d1| root.turbine| false| null| ++-------------------+-------------+---------+---------+ +Total line number = 4 +It costs 0.003s + ++-----------------+-------------+---------+---------+ +| devices| database|isAligned| Template| ++-----------------+-------------+---------+---------+ +|root.ln.wf01.wt01| root.ln| false| t1| +|root.ln.wf02.wt02| root.ln| false| null| ++-----------------+-------------+---------+---------+ +Total line number = 2 +It costs 0.001s +``` + +### 统计设备数量 + +* COUNT DEVICES \ + +上述语句用于统计设备的数量,同时允许指定`PathPattern` 用于统计匹配该`PathPattern` 的设备数量 + +SQL 语句如下所示: + +``` +IoTDB> show devices +IoTDB> count devices +IoTDB> count devices root.ln.** +``` + +你可以获得如下数据: + +``` ++-------------------+---------+---------+ +| devices|isAligned| Template| ++-------------------+---------+---------+ +|root.sgcc.wf03.wt03| false| null| +| root.turbine.d1| false| null| +| root.ln.wf02.wt02| false| null| +| root.ln.wf01.wt01| false| t1| ++-------------------+---------+---------+ +Total line number = 4 +It costs 0.024s + ++--------------+ +|count(devices)| ++--------------+ +| 4| ++--------------+ +Total line number = 1 +It costs 0.004s + ++--------------+ +|count(devices)| ++--------------+ +| 2| ++--------------+ +Total line number = 1 +It costs 0.004s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_timecho.md new file mode 100644 index 00000000..01cf39e7 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Operate-Metadata_timecho.md @@ -0,0 +1,1333 @@ + + +# 测点管理 + +## 数据库管理 + +数据库(Database)可以被视为关系数据库中的Database。 + +### 创建数据库 + +我们可以根据存储模型建立相应的数据库。如下所示: + +``` +IoTDB > CREATE DATABASE root.ln +``` + +需要注意的是,推荐创建一个 database. + +Database 的父子节点都不能再设置 database。例如在已经有`root.ln`和`root.sgcc`这两个 database 的情况下,创建`root.ln.wf01` database 是不可行的。系统将给出相应的错误提示,如下所示: + +``` +IoTDB> CREATE DATABASE root.ln.wf01 +Msg: 300: root.ln has already been created as database. 
+``` +Database 节点名只支持中英文字符、数字、下划线、英文句号和反引号的组合,如果想设置为纯数字或者包含下划线和英文句号,需要用反引号(` `` `)把 database 名称引起来。其中` `` `内,两个反引号表示一个反引号,例如 ` ```` ` 表示`` ` ``。 + +还需注意,如果在 Windows 系统上部署,database 名是大小写不敏感的。例如同时创建`root.ln` 和 `root.LN` 是不被允许的。 + +### 查看数据库 + +在 database 创建后,我们可以使用 [SHOW DATABASES](../SQL-Manual/SQL-Manual.md#查看数据库) 语句和 [SHOW DATABASES \](../SQL-Manual/SQL-Manual.md#查看数据库) 来查看 database,SQL 语句如下所示: + +``` +IoTDB> show databases +IoTDB> show databases root.* +IoTDB> show databases root.** +``` + +执行结果为: + +``` ++-------------+----+-------------------------+-----------------------+-----------------------+ +| database| ttl|schema_replication_factor|data_replication_factor|time_partition_interval| ++-------------+----+-------------------------+-----------------------+-----------------------+ +| root.sgcc|null| 2| 2| 604800| +| root.ln|null| 2| 2| 604800| ++-------------+----+-------------------------+-----------------------+-----------------------+ +Total line number = 2 +It costs 0.060s +``` + +### 删除数据库 + +用户可以使用`DELETE DATABASE `语句删除该路径模式匹配的所有的数据库。在删除的过程中,需要注意的是数据库的数据也会被删除。 + +``` +IoTDB > DELETE DATABASE root.ln +IoTDB > DELETE DATABASE root.sgcc +// 删除所有数据,时间序列以及数据库 +IoTDB > DELETE DATABASE root.** +``` + +### 统计数据库数量 + +用户可以使用`COUNT DATABASES `语句统计数据库的数量,允许指定`PathPattern` 用来统计匹配该`PathPattern` 的数据库的数量 + +SQL 语句如下所示: + +``` +IoTDB> show databases +IoTDB> count databases +IoTDB> count databases root.* +IoTDB> count databases root.sgcc.* +IoTDB> count databases root.sgcc +``` + +执行结果为: + +``` ++-------------+ +| database| ++-------------+ +| root.sgcc| +| root.turbine| +| root.ln| ++-------------+ +Total line number = 3 +It costs 0.003s + ++-------------+ +| Database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.003s + ++-------------+ +| Database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| Database| ++-------------+ +| 0| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 1| ++-------------+ +Total line number = 1 +It costs 0.002s +``` + +### 数据存活时间(TTL) + +IoTDB 支持对 device 级别设置数据存活时间(TTL),这使得 IoTDB 可以定期、自动地删除一定时间之前的数据。合理使用 TTL +可以帮助您控制 IoTDB 占用的总磁盘空间以避免出现磁盘写满等异常。并且,随着文件数量的增多,查询性能往往随之下降, +内存占用也会有所提高。及时地删除一些较老的文件有助于使查询性能维持在一个较高的水平和减少内存资源的占用。 + +TTL的默认单位为毫秒,如果配置文件中的时间精度修改为其他单位,设置ttl时仍然使用毫秒单位。 + +当设置 TTL 时,系统会根据设置的路径寻找所包含的所有 device,并为这些 device 设置 TTL 时间,系统会按设备粒度对过期数据进行删除。 +当设备数据过期后,将不能被查询到,但磁盘文件中的数据不能保证立即删除(会在一定时间内删除),但可以保证最终被删除。 +考虑到操作代价,系统不会立即物理删除超过 TTL 的数据,而是通过合并来延迟地物理删除。因此,在数据被物理删除前,如果调小或者解除 TTL,可能会导致之前因 TTL 而不可见的数据重新出现。 +系统中仅能设置至多 1000 条 TTL 规则,达到该上限时,需要先删除部分 TTL 规则才能设置新的规则 + +#### TTL Path 规则 +设置的路径 path 只支持前缀路径(即路径中间不能带 \* , 且必须以 \*\* 结尾),该路径会匹配到设备,也允许用户指定不带星的 path 为具体的 database 或 device,当 path 不带 \* 时,会检查是否匹配到 database,若匹配到 database,则会同时设置 path 和 path.\*\*。 +注意:设备 TTL 设置不会对元数据的存在性进行校验,即允许对一条不存在的设备设置 TTL。 +``` +合格的 path: +root.** +root.db.** +root.db.group1.** +root.db +root.db.group1.d1 + +不合格的 path: +root.*.db +root.**.db.* +root.db.* +``` +#### TTL 适用规则 +当一个设备适用多条TTL规则时,优先适用较精确和较长的规则。例如对于设备“root.bj.hd.dist001.turbine001”来说,规则“root.bj.hd.dist001.turbine001”比“root.bj.hd.dist001.\*\*”优先,而规则“root.bj.hd.dist001.\*\*”比“root.bj.hd.\*\*”优先; +#### 设置 TTL +set ttl 操作可以理解为设置一条 TTL规则,比如 set ttl to root.sg.group1.\*\* 就相当于对所有可以匹配到该路径模式的设备挂载 ttl。 unset ttl 操作表示对相应路径模式卸载 TTL,若不存在对应 TTL,则不做任何事。若想把 TTL 调成无限大,则可以使用 INF 关键字 +设置 TTL 的 SQL 语句如下所示: +``` +set ttl to pathPattern 360000; +``` +pathPattern 是前缀路径,即路径中间不能带 \* 且必须以 \*\* 结尾。 +pathPattern 匹配对应的设备。为了兼容老版本 
SQL 语法,允许用户输入的 pathPattern 匹配到 db,则自动将前缀路径扩展为 path.\*\*。 +例如,写set ttl to root.sg 360000 则会自动转化为set ttl to root.sg.\*\* 360000,转化后的语句对所有 root.sg 下的 device 设置TTL。 +但若写的 pathPattern 无法匹配到 db,则上述逻辑不会生效。 +如写set ttl to root.sg.group 360000 ,由于root.sg.group未匹配到 db,则不会被扩充为root.sg.group.\*\*。 也允许指定具体 device,不带 \*。 +#### 取消 TTL + +取消 TTL 的 SQL 语句如下所示: + +``` +IoTDB> unset ttl from root.ln +``` + +取消设置 TTL 后, `root.ln` 路径下所有的数据都会被保存。 +``` +IoTDB> unset ttl from root.sgcc.** +``` + +取消设置`root.sgcc`路径下的所有的 TTL 。 +``` +IoTDB> unset ttl from root.** +``` + +取消设置所有的 TTL 。 + +新语法 +``` +IoTDB> unset ttl from root.** +``` + +旧语法 +``` +IoTDB> unset ttl to root.** +``` +新旧语法在功能上没有区别并且同时兼容,仅是新语法在用词上更符合常规。 +#### 显示 TTL + +显示 TTL 的 SQL 语句如下所示: +show all ttl + +``` +IoTDB> SHOW ALL TTL ++--------------+--------+ +| path| TTL| +| root.**|55555555| +| root.sg2.a.**|44440000| ++--------------+--------+ +``` + +show ttl on pathPattern +``` +IoTDB> SHOW TTL ON root.db.**; ++--------------+--------+ +| path| TTL| +| root.db.**|55555555| +| root.db.a.**|44440000| ++--------------+--------+ +``` +SHOW ALL TTL 这个例子会给出所有的 TTL。 +SHOW TTL ON pathPattern 这个例子会显示指定路径的 TTL。 + +显示设备的 TTL。 +``` +IoTDB> show devices ++---------------+---------+---------+ +| Device|IsAligned| TTL| ++---------------+---------+---------+ +|root.sg.device1| false| 36000000| +|root.sg.device2| true| INF| ++---------------+---------+---------+ +``` +所有设备都一定会有 TTL,即不可能是 null。INF 表示无穷大。 + + +### 设置异构数据库(进阶操作) + +在熟悉 IoTDB 元数据建模的前提下,用户可以在 IoTDB 中设置异构的数据库,以便应对不同的生产需求。 + +目前支持的数据库异构参数有: + +| 参数名 | 参数类型 | 参数描述 | +|---------------------------|---------|---------------------------| +| TTL | Long | 数据库的 TTL | +| SCHEMA_REPLICATION_FACTOR | Integer | 数据库的元数据副本数 | +| DATA_REPLICATION_FACTOR | Integer | 数据库的数据副本数 | +| SCHEMA_REGION_GROUP_NUM | Integer | 数据库的 SchemaRegionGroup 数量 | +| DATA_REGION_GROUP_NUM | Integer | 数据库的 DataRegionGroup 数量 | + +用户在配置异构参数时需要注意以下三点: ++ TTL 和 TIME_PARTITION_INTERVAL 必须为正整数。 ++ SCHEMA_REPLICATION_FACTOR 和 DATA_REPLICATION_FACTOR 必须小于等于已部署的 DataNode 数量。 ++ SCHEMA_REGION_GROUP_NUM 和 DATA_REGION_GROUP_NUM 的功能与 iotdb-common.properties 配置文件中的 +`schema_region_group_extension_policy` 和 `data_region_group_extension_policy` 参数相关,以 DATA_REGION_GROUP_NUM 为例: +若设置 `data_region_group_extension_policy=CUSTOM`,则 DATA_REGION_GROUP_NUM 将作为 Database 拥有的 DataRegionGroup 的数量; +若设置 `data_region_group_extension_policy=AUTO`,则 DATA_REGION_GROUP_NUM 将作为 Database 拥有的 DataRegionGroup 的配额下界,即当该 Database 开始写入数据时,将至少拥有此数量的 DataRegionGroup。 + +用户可以在创建 Database 时设置任意异构参数,或在单机/分布式 IoTDB 运行时调整部分异构参数。 + +#### 创建 Database 时设置异构参数 + +用户可以在创建 Database 时设置上述任意异构参数,SQL 语句如下所示: + +``` +CREATE DATABASE prefixPath (WITH databaseAttributeClause (COMMA? databaseAttributeClause)*)? +``` + +例如: +``` +CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +#### 运行时调整异构参数 + +用户可以在 IoTDB 运行时调整部分异构参数,SQL 语句如下所示: + +``` +ALTER DATABASE prefixPath WITH databaseAttributeClause (COMMA? databaseAttributeClause)* +``` + +例如: +``` +ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +注意,运行时只能调整下列异构参数: ++ SCHEMA_REGION_GROUP_NUM ++ DATA_REGION_GROUP_NUM + +#### 查看异构数据库 + +用户可以查询每个 Database 的具体异构配置,SQL 语句如下所示: + +``` +SHOW DATABASES DETAILS prefixPath? 
+``` + +例如: + +``` +IoTDB> SHOW DATABASES DETAILS ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval|SchemaRegionGroupNum|MinSchemaRegionGroupNum|MaxSchemaRegionGroupNum|DataRegionGroupNum|MinDataRegionGroupNum|MaxDataRegionGroupNum| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|root.db1| null| 1| 3| 604800000| 0| 1| 1| 0| 2| 2| +|root.db2|86400000| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| +|root.db3| null| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +Total line number = 3 +It costs 0.058s +``` + +各列查询结果依次为: ++ 数据库名称 ++ 数据库的 TTL ++ 数据库的元数据副本数 ++ 数据库的数据副本数 ++ 数据库的时间分区间隔 ++ 数据库当前拥有的 SchemaRegionGroup 数量 ++ 数据库需要拥有的最小 SchemaRegionGroup 数量 ++ 数据库允许拥有的最大 SchemaRegionGroup 数量 ++ 数据库当前拥有的 DataRegionGroup 数量 ++ 数据库需要拥有的最小 DataRegionGroup 数量 ++ 数据库允许拥有的最大 DataRegionGroup 数量 + + +## 设备模板管理 + +IoTDB 支持设备模板功能,实现同类型不同实体的物理量元数据共享,减少元数据内存占用,同时简化同类型实体的管理。 + + +![img](https://alioss.timecho.com/docs/img/%E6%A8%A1%E6%9D%BF.png) + +![img](https://alioss.timecho.com/docs/img/template.jpg) + +### 创建设备模板 + +创建设备模板的 SQL 语法如下: + +```sql +CREATE DEVICE TEMPLATE ALIGNED? '(' [',' ]+ ')' +``` + +**示例1:** 创建包含两个非对齐序列的元数据模板 + +```shell +IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +**示例2:** 创建包含一组对齐序列的元数据模板 + +```shell +IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) +``` + +其中,物理量 `lat` 和 `lon` 是对齐的。 + +### 挂载设备模板 + +元数据模板在创建后,需执行挂载操作,方可用于相应路径下的序列创建与数据写入。 + +**挂载模板前,需确保相关数据库已经创建。** + +**推荐将模板挂载在 database 节点上,不建议将模板挂载到 database 上层的节点上。** + +**模板挂载路径下禁止创建普通序列,已创建了普通序列的前缀路径上不允许挂载模板。** + +挂载元数据模板的 SQL 语句如下所示: + +```shell +IoTDB> set device template t1 to root.sg1.d1 +``` + +### 激活设备模板 + +挂载好设备模板后,且系统开启自动注册序列功能的情况下,即可直接进行数据的写入。例如 database 为 root.sg1,模板 t1 被挂载到了节点 root.sg1.d1,那么可直接向时间序列(如 root.sg1.d1.temperature 和 root.sg1.d1.status)写入时间序列数据,该时间序列已可被当作正常创建的序列使用。 + +**注意**:在插入数据之前或系统未开启自动注册序列功能,模板定义的时间序列不会被创建。可以使用如下SQL语句在插入数据前创建时间序列即激活模板: + +```shell +IoTDB> create timeseries using device template on root.sg1.d1 +``` + +**示例:** 执行以下语句 +```shell +IoTDB> set device template t1 to root.sg1.d1 +IoTDB> set device template t2 to root.sg1.d2 +IoTDB> create timeseries using device template on root.sg1.d1 +IoTDB> create timeseries using device template on root.sg1.d2 +``` + +查看此时的时间序列: +```sql +show timeseries root.sg1.** +``` + +```shell ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +|root.sg1.d1.temperature| null| root.sg1| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.sg1.d1.status| null| root.sg1| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +| root.sg1.d2.lon| null| 
root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| +| root.sg1.d2.lat| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +``` + +查看此时的设备: +```sql +show devices root.sg1.** +``` + +```shell ++---------------+---------+---------+ +| devices|isAligned| Template| ++---------------+---------+---------+ +| root.sg1.d1| false| null| +| root.sg1.d2| true| null| ++---------------+---------+---------+ +``` + +### 查看设备模板 + +- 查看所有设备模板 + +SQL 语句如下所示: + +```shell +IoTDB> show device templates +``` + +执行结果如下: +```shell ++-------------+ +|template name| ++-------------+ +| t2| +| t1| ++-------------+ +``` + +- 查看某个设备模板下的物理量 + +SQL 语句如下所示: + +```shell +IoTDB> show nodes in device template t1 +``` + +执行结果如下: +```shell ++-----------+--------+--------+-----------+ +|child nodes|dataType|encoding|compression| ++-----------+--------+--------+-----------+ +|temperature| FLOAT| RLE| SNAPPY| +| status| BOOLEAN| PLAIN| SNAPPY| ++-----------+--------+--------+-----------+ +``` + +- 查看挂载了某个设备模板的路径 + +```shell +IoTDB> show paths set device template t1 +``` + +执行结果如下: +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +- 查看使用了某个设备模板的路径(即模板在该路径上已激活,序列已创建) + +```shell +IoTDB> show paths using device template t1 +``` + +执行结果如下: +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +### 解除设备模板 + +若需删除模板表示的某一组时间序列,可采用解除模板操作,SQL语句如下所示: + +```shell +IoTDB> delete timeseries of device template t1 from root.sg1.d1 +``` + +或 + +```shell +IoTDB> deactivate device template t1 from root.sg1.d1 +``` + +解除操作支持批量处理,SQL语句如下所示: + +```shell +IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.* +``` + +或 + +```shell +IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.* +``` + +若解除命令不指定模板名称,则会将给定路径涉及的所有模板使用情况均解除。 + +### 卸载设备模板 + +卸载设备模板的 SQL 语句如下所示: + +```shell +IoTDB> unset device template t1 from root.sg1.d1 +``` + +**注意**:不支持卸载仍处于激活状态的模板,需保证执行卸载操作前解除对该模板的所有使用,即删除所有该模板表示的序列。 + +### 删除设备模板 + +删除设备模板的 SQL 语句如下所示: + +```shell +IoTDB> drop device template t1 +``` + +**注意**:不支持删除已经挂载的模板,需在删除操作前保证该模板卸载成功。 + +### 修改设备模板 + +在需要新增物理量的场景中,可以通过修改设备模板来给所有已激活该模板的设备新增物理量。 + +修改设备模板的 SQL 语句如下所示: + +```shell +IoTDB> alter device template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY) +``` + +**向已挂载模板的路径下的设备中写入数据,若写入请求中的物理量不在模板中,将自动扩展模板。** + + +## 时间序列管理 + +### 创建时间序列 + +根据建立的数据模型,我们可以分别在两个数据库中创建相应的时间序列。创建时间序列的 SQL 语句如下所示: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE +``` + +从 v0.13 起,可以使用简化版的 SQL 语句创建时间序列: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature FLOAT encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status BOOLEAN 
encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE +``` + +需要注意的是,当创建时间序列时指定的编码方式与数据类型不对应时,系统会给出相应的错误提示,如下所示: +``` +IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF +error: encoding TS_2DIFF does not support BOOLEAN +``` + +详细的数据类型与编码方式的对应列表请参见 [编码方式](../Basic-Concept/Encoding-and-Compression.md)。 + +### 创建对齐时间序列 + +创建一组对齐时间序列的SQL语句如下所示: + +``` +IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) +``` + +一组对齐序列中的序列可以有不同的数据类型、编码方式以及压缩方式。 + +对齐的时间序列也支持设置别名、标签、属性。 + +### 删除时间序列 + +我们可以使用`(DELETE | DROP) TimeSeries `语句来删除我们之前创建的时间序列。SQL 语句如下所示: + +``` +IoTDB> delete timeseries root.ln.wf01.wt01.status +IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware +IoTDB> delete timeseries root.ln.wf02.* +IoTDB> drop timeseries root.ln.wf02.* +``` + +### 查看时间序列 + +* SHOW LATEST? TIMESERIES pathPattern? timeseriesWhereClause? limitClause? + + SHOW TIMESERIES 中可以有四种可选的子句,查询结果为这些时间序列的所有信息 + +时间序列信息具体包括:时间序列路径名,database,Measurement 别名,数据类型,编码方式,压缩方式,属性和标签。 + +示例: + +* SHOW TIMESERIES + + 展示系统中所有的时间序列信息 + +* SHOW TIMESERIES <`Path`> + + 返回给定路径的下的所有时间序列信息。其中 `Path` 需要为一个时间序列路径或路径模式。例如,分别查看`root`路径和`root.ln`路径下的时间序列,SQL 语句如下所示: + +``` +IoTDB> show timeseries root.** +IoTDB> show timeseries root.ln.** +``` + +执行结果分别为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.016s + ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.ln.wf01.wt01.temperature| 
null| root.ln| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +Total line number = 4 +It costs 0.004s +``` + +* SHOW TIMESERIES LIMIT INT OFFSET INT + + 只返回从指定下标开始的结果,最大返回条数被 LIMIT 限制,用于分页查询。例如: + +``` +show timeseries root.ln.** limit 10 offset 10 +``` + +* SHOW TIMESERIES WHERE TIMESERIES contains 'containStr' + + 对查询结果集根据 timeseries 名称进行字符串模糊匹配过滤。例如: + +``` +show timeseries root.ln.** where timeseries contains 'wf01.wt' +``` + +执行结果为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 2 +It costs 0.016s +``` + +* SHOW TIMESERIES WHERE DataType=type + + 对查询结果集根据时间序列数据类型进行过滤。例如: + +``` +show timeseries root.ln.** where dataType=FLOAT +``` + +执行结果为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 3 +It costs 0.016s + +``` + + +* SHOW LATEST TIMESERIES + + 表示查询出的时间序列需要按照最近插入时间戳降序排列 + + +需要注意的是,当查询路径不存在时,系统会返回 0 条时间序列。 + +### 统计时间序列总数 + +IoTDB 支持使用`COUNT TIMESERIES`来统计一条路径中的时间序列个数。SQL 语句如下所示: + +* 可以通过 `WHERE` 条件对时间序列名称进行字符串模糊匹配,语法为: `COUNT TIMESERIES WHERE TIMESERIES contains 'containStr'` 。 +* 可以通过 `WHERE` 条件对时间序列数据类型进行过滤,语法为: `COUNT TIMESERIES WHERE DataType='`。 +* 可以通过 `WHERE` 条件对标签点进行过滤,语法为: `COUNT TIMESERIES WHERE TAGS(key)='value'` 或 `COUNT TIMESERIES WHERE TAGS(key) contains 'value'`。 +* 可以通过定义`LEVEL`来统计指定层级下的时间序列个数。这条语句可以用来统计每一个设备下的传感器数量,语法为:`COUNT TIMESERIES GROUP BY 
LEVEL=`。 + +``` +IoTDB > COUNT TIMESERIES root.** +IoTDB > COUNT TIMESERIES root.ln.** +IoTDB > COUNT TIMESERIES root.ln.*.*.status +IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' +IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 +``` + +例如有如下时间序列(可以使用`show timeseries`展示所有时间序列): + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| {"unit":"c"}| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| {"description":"test1"}| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.004s +``` + +那么 Metadata Tree 如下所示: + + + +可以看到,`root`被定义为`LEVEL=0`。那么当你输入如下语句时: + +``` +IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 +``` + +你将得到以下结果: + +``` +IoTDB> COUNT TIMESERIES root.** GROUP BY LEVEL=1 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +| root.sgcc| 2| +|root.turbine| 1| +| root.ln| 4| ++------------+-----------------+ +Total line number = 3 +It costs 0.002s + +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf02| 2| +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 2 +It costs 0.002s + +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 1 +It costs 0.002s +``` + +> 注意:时间序列的路径只是过滤条件,与 level 的定义无关。 + +### 活跃时间序列查询 +我们在原有的时间序列查询和统计上添加新的WHERE时间过滤条件,可以得到在指定时间范围中存在数据的时间序列。 + +需要注意的是, 在带有时间过滤的元数据查询中并不考虑视图的存在,只考虑TsFile中实际存储的时间序列。 + +一个使用样例如下: +``` +IoTDB> insert into root.sg.data(timestamp, s1,s2) values(15000, 1, 2); +IoTDB> insert into root.sg.data2(timestamp, s1,s2) values(15002, 1, 2); +IoTDB> insert into 
root.sg.data3(timestamp, s1,s2) values(16000, 1, 2); +IoTDB> show timeseries; ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| root.sg.data.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +| root.sg.data.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data3.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data3.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data2.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data2.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ + +IoTDB> show timeseries where time >= 15000 and time < 16000; ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| root.sg.data.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +| root.sg.data.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data2.s1| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| +|root.sg.data2.s2| null| root.sg| FLOAT| GORILLA| LZ4|null| null| null| null| BASE| ++----------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ + +IoTDB> count timeseries where time >= 15000 and time < 16000; ++-----------------+ +|count(timeseries)| ++-----------------+ +| 4| ++-----------------+ +``` +关于活跃时间序列的定义,能通过正常查询查出来的数据就是活跃数据,也就是说插入但被删除的时间序列不在考虑范围内。 + +### 标签点管理 + +我们可以在创建时间序列的时候,为它添加别名和额外的标签和属性信息。 + +标签和属性的区别在于: + +* 标签可以用来查询时间序列路径,会在内存中维护标点到时间序列路径的倒排索引:标签 -> 时间序列路径 +* 属性只能用时间序列路径来查询:时间序列路径 -> 属性 + +所用到的扩展的创建时间序列的 SQL 语句如下所示: +``` +create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) +``` + +括号里的`temprature`是`s1`这个传感器的别名。 +我们可以在任何用到`s1`的地方,将其用`temprature`代替,这两者是等价的。 + +> IoTDB 同时支持在查询语句中使用 AS 函数设置别名。二者的区别在于:AS 函数设置的别名用于替代整条时间序列名,且是临时的,不与时间序列绑定;而上文中的别名只作为传感器的别名,与其绑定且可与原传感器名等价使用。 + +> 注意:额外的标签和属性信息总的大小不能超过`tag_attribute_total_size`. + + * 标签点属性更新 +创建时间序列后,我们也可以对其原有的标签点属性进行更新,主要有以下六种更新方式: +* 重命名标签或属性 +``` +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` +* 重新设置标签或属性的值 +``` +ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` +* 删除已经存在的标签或属性 +``` +ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +``` +* 添加新的标签 +``` +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` +* 添加新的属性 +``` +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` +* 更新插入别名,标签和属性 +> 如果该别名,标签或属性原来不存在,则插入,否则,用新值更新原来的旧值 +``` +ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +* 使用标签作为过滤条件查询时间序列,使用 TAGS(tagKey) 来标识作为过滤条件的标签 +``` +SHOW TIMESERIES (<`PathPattern`>)? 
timeseriesWhereClause +``` + +返回给定路径的下的所有满足条件的时间序列信息,SQL 语句如下所示: + +``` +ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c +ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 +show timeseries root.ln.** where TAGS(unit)='c' +show timeseries root.ln.** where TAGS(description) contains 'test1' +``` + +执行结果分别为: + +``` ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|{"unit":"c"}| null| null| null| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.005s + ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|{"description":"test1"}| null| null| null| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.004s +``` + +- 使用标签作为过滤条件统计时间序列数量 + +``` +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +COUNT TIMESERIES (<`PathPattern`>)? 
timeseriesWhereClause GROUP BY LEVEL= +``` + +返回给定路径的下的所有满足条件的时间序列的数量,SQL 语句如下所示: + +``` +count timeseries +count timeseries root.** where TAGS(unit)='c' +count timeseries root.** where TAGS(unit)='c' group by level = 2 +``` + +执行结果分别为: + +``` +IoTDB> count timeseries ++-----------------+ +|count(timeseries)| ++-----------------+ +| 6| ++-----------------+ +Total line number = 1 +It costs 0.019s +IoTDB> count timeseries root.** where TAGS(unit)='c' ++-----------------+ +|count(timeseries)| ++-----------------+ +| 2| ++-----------------+ +Total line number = 1 +It costs 0.020s +IoTDB> count timeseries root.** where TAGS(unit)='c' group by level = 2 ++--------------+-----------------+ +| column|count(timeseries)| ++--------------+-----------------+ +| root.ln.wf02| 2| +| root.ln.wf01| 0| +|root.sgcc.wf03| 0| ++--------------+-----------------+ +Total line number = 3 +It costs 0.011s +``` + +> 注意,现在我们只支持一个查询条件,要么是等值条件查询,要么是包含条件查询。当然 where 子句中涉及的必须是标签值,而不能是属性值。 + +创建对齐时间序列 + +``` +create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) +``` + +执行结果如下: + +``` +IoTDB> show timeseries ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| +|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +支持查询: + +``` +IoTDB> show timeseries where TAGS(tag1)='v1' ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +上述对时间序列标签、属性的更新等操作都支持。 + + +## 路径查询 + +### 查看路径的所有子路径 + +``` +SHOW CHILD PATHS pathPattern +``` + +可以查看此路径模式所匹配的所有路径的下一层的所有路径和它对应的节点类型,即pathPattern.*所匹配的路径及其节点类型。 + +节点类型:ROOT -> SG INTERNAL -> DATABASE -> INTERNAL -> DEVICE -> TIMESERIES + +示例: + +* 查询 root.ln 的下一层:show child paths root.ln + +``` ++------------+----------+ +| child paths|node types| ++------------+----------+ +|root.ln.wf01| INTERNAL| +|root.ln.wf02| INTERNAL| ++------------+----------+ +Total line number = 2 +It costs 0.002s +``` + +* 查询形如 root.xx.xx.xx 的路径:show child paths root.\*.\* + +``` ++---------------+ +| child paths| ++---------------+ +|root.ln.wf01.s1| +|root.ln.wf02.s2| ++---------------+ +``` + +### 查看路径的下一级节点 + +``` +SHOW CHILD NODES pathPattern +``` + +可以查看此路径模式所匹配的节点的下一层的所有节点。 + +示例: + +* 查询 
root 的下一层:show child nodes root + +``` ++------------+ +| child nodes| ++------------+ +| ln| ++------------+ +``` + +* 查询 root.ln 的下一层 :show child nodes root.ln + +``` ++------------+ +| child nodes| ++------------+ +| wf01| +| wf02| ++------------+ +``` + +### 统计节点数 + +IoTDB 支持使用`COUNT NODES LEVEL=`来统计当前 Metadata + 树下满足某路径模式的路径中指定层级的节点个数。这条语句可以用来统计带有特定采样点的设备数。例如: + +``` +IoTDB > COUNT NODES root.** LEVEL=2 +IoTDB > COUNT NODES root.ln.** LEVEL=2 +IoTDB > COUNT NODES root.ln.wf01.* LEVEL=3 +IoTDB > COUNT NODES root.**.temperature LEVEL=3 +``` + +对于上面提到的例子和 Metadata Tree,你可以获得如下结果: + +``` ++------------+ +|count(nodes)| ++------------+ +| 4| ++------------+ +Total line number = 1 +It costs 0.003s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 1| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s +``` + +> 注意:时间序列的路径只是过滤条件,与 level 的定义无关。 + +### 查看设备 + +* SHOW DEVICES pathPattern? (WITH DATABASE)? devicesWhereClause? limitClause? + +与 `Show Timeseries` 相似,IoTDB 目前也支持两种方式查看设备。 + +* `SHOW DEVICES` 语句显示当前所有的设备信息,等价于 `SHOW DEVICES root.**`。 +* `SHOW DEVICES ` 语句规定了 `PathPattern`,返回给定的路径模式所匹配的设备信息。 +* `WHERE` 条件中可以使用 `DEVICE contains 'xxx'`,根据 device 名称进行模糊查询。 +* `WHERE` 条件中可以使用 `TEMPLATE = 'xxx'`,`TEMPLATE != 'xxx'`,根据 template 名称进行过滤查询。 +* `WHERE` 条件中可以使用 `TEMPLATE is null`,`TEMPLATE is not null`,根据 template 是否为null(null 表示没激活)进行过滤查询。 + +SQL 语句如下所示: + +``` +IoTDB> show devices +IoTDB> show devices root.ln.** +IoTDB> show devices root.ln.** where device contains 't' +IoTDB> show devices root.ln.** where template = 't1' +IoTDB> show devices root.ln.** where template is null +IoTDB> show devices root.ln.** where template != 't1' +IoTDB> show devices root.ln.** where template is not null +``` + +你可以获得如下数据: + +``` ++-------------------+---------+---------+ +| devices|isAligned| Template| ++-------------------+---------+---------+ +| root.ln.wf01.wt01| false| t1| +| root.ln.wf02.wt02| false| null| +|root.sgcc.wf03.wt01| false| null| +| root.turbine.d1| false| null| ++-------------------+---------+---------+ +Total line number = 4 +It costs 0.002s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 2 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 2 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| ++-----------------+---------+---------+ +Total line number = 1 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 1 +It costs 0.001s +``` + +其中,`isAligned`表示该设备下的时间序列是否对齐, +`Template`显示着该设备所激活的模板名,null 表示没有激活模板。 + +查看设备及其 database 信息,可以使用 `SHOW DEVICES WITH DATABASE` 语句。 + +* `SHOW DEVICES WITH DATABASE` 语句显示当前所有的设备信息和其所在的 database,等价于 `SHOW DEVICES root.**`。 +* `SHOW DEVICES WITH DATABASE` 
语句规定了 `PathPattern`,返回给定的路径模式所匹配的设备信息和其所在的 database。 + +SQL 语句如下所示: + +``` +IoTDB> show devices with database +IoTDB> show devices root.ln.** with database +``` + +你可以获得如下数据: + +``` ++-------------------+-------------+---------+---------+ +| devices| database|isAligned| Template| ++-------------------+-------------+---------+---------+ +| root.ln.wf01.wt01| root.ln| false| t1| +| root.ln.wf02.wt02| root.ln| false| null| +|root.sgcc.wf03.wt01| root.sgcc| false| null| +| root.turbine.d1| root.turbine| false| null| ++-------------------+-------------+---------+---------+ +Total line number = 4 +It costs 0.003s + ++-----------------+-------------+---------+---------+ +| devices| database|isAligned| Template| ++-----------------+-------------+---------+---------+ +|root.ln.wf01.wt01| root.ln| false| t1| +|root.ln.wf02.wt02| root.ln| false| null| ++-----------------+-------------+---------+---------+ +Total line number = 2 +It costs 0.001s +``` + +### 统计设备数量 + +* COUNT DEVICES \ + +上述语句用于统计设备的数量,同时允许指定`PathPattern` 用于统计匹配该`PathPattern` 的设备数量 + +SQL 语句如下所示: + +``` +IoTDB> show devices +IoTDB> count devices +IoTDB> count devices root.ln.** +``` + +你可以获得如下数据: + +``` ++-------------------+---------+---------+ +| devices|isAligned| Template| ++-------------------+---------+---------+ +|root.sgcc.wf03.wt03| false| null| +| root.turbine.d1| false| null| +| root.ln.wf02.wt02| false| null| +| root.ln.wf01.wt01| false| t1| ++-------------------+---------+---------+ +Total line number = 4 +It costs 0.024s + ++--------------+ +|count(devices)| ++--------------+ +| 4| ++--------------+ +Total line number = 1 +It costs 0.004s + ++--------------+ +|count(devices)| ++--------------+ +| 2| ++--------------+ +Total line number = 1 +It costs 0.004s +``` + +### 活跃设备查询 +和活跃时间序列一样,我们可以在查看和统计设备的基础上添加时间过滤条件来查询在某段时间内存在数据的活跃设备。这里活跃的定义与活跃时间序列相同,使用样例如下: +``` +IoTDB> insert into root.sg.data(timestamp, s1,s2) values(15000, 1, 2); +IoTDB> insert into root.sg.data2(timestamp, s1,s2) values(15002, 1, 2); +IoTDB> insert into root.sg.data3(timestamp, s1,s2) values(16000, 1, 2); +IoTDB> show devices; ++-------------------+---------+ +| devices|isAligned| ++-------------------+---------+ +| root.sg.data| false| +| root.sg.data2| false| +| root.sg.data3| false| ++-------------------+---------+ + +IoTDB> show devices where time >= 15000 and time < 16000; ++-------------------+---------+ +| devices|isAligned| ++-------------------+---------+ +| root.sg.data| false| +| root.sg.data2| false| ++-------------------+---------+ + +IoTDB> count devices where time >= 15000 and time < 16000; ++--------------+ +|count(devices)| ++--------------+ +| 2| ++--------------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Query-Data.md b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Query-Data.md new file mode 100644 index 00000000..9988c1ee --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Query-Data.md @@ -0,0 +1,3041 @@ + + +# 数据查询 +## 概述 + +在 IoTDB 中,使用 `SELECT` 语句从一条或多条时间序列中查询数据,IoTDB 不区分历史数据和实时数据,用户可以用统一的sql语法进行查询,通过 `WHERE` 子句中的时间过滤谓词决定查询的时间范围。 + +### 语法定义 + +```sql +SELECT [LAST] selectExpr [, selectExpr] ... + [INTO intoItem [, intoItem] ...] + FROM prefixPath [, prefixPath] ... + [WHERE whereCondition] + [GROUP BY { + ([startTime, endTime), interval [, slidingStep]) | + LEVEL = levelNum [, levelNum] ... | + TAGS(tagKey [, tagKey] ... 
| + VARIATION(expression[,delta][,ignoreNull=true/false]) | + CONDITION(expression,[keep>/>=/=/ 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; +``` + +其含义为: + +被选择的设备为 ln 集团 wf01 子站 wt01 设备;被选择的时间序列为供电状态(status)和温度传感器(temperature);该语句要求选择出 “2017-11-01T00:05:00.000” 至 “2017-11-01T00:12:00.000” 之间的所选时间序列的值。 + +该 SQL 语句的执行结果如下: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 6 +It costs 0.018s +``` + +#### 示例3:按照多个时间区间选择同一设备的多列数据 + +IoTDB 支持在一次查询中指定多个时间区间条件,用户可以根据需求随意组合时间区间条件。例如, + +SQL 语句为: + +```sql +select status, temperature from root.ln.wf01.wt01 where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` + +其含义为: + +被选择的设备为 ln 集团 wf01 子站 wt01 设备;被选择的时间序列为“供电状态(status)”和“温度传感器(temperature)”;该语句指定了两个不同的时间区间,分别为“2017-11-01T00:05:00.000 至 2017-11-01T00:12:00.000”和“2017-11-01T16:35:00.000 至 2017-11-01T16:37:00.000”;该语句要求选择出满足任一时间区间的被选时间序列的值。 + +该 SQL 语句的执行结果如下: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| +|2017-11-01T16:35:00.000+08:00| true| 23.44| +|2017-11-01T16:36:00.000+08:00| false| 21.98| +|2017-11-01T16:37:00.000+08:00| false| 21.93| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 9 +It costs 0.018s +``` + +#### 示例4:按照多个时间区间选择不同设备的多列数据 + +该系统支持在一次查询中选择任意列的数据,也就是说,被选择的列可以来源于不同的设备。例如,SQL 语句为: + +```sql +select wf01.wt01.status, wf02.wt02.hardware from root.ln where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` + +其含义为: + +被选择的时间序列为 “ln 集团 wf01 子站 wt01 设备的供电状态” 以及 “ln 集团 wf02 子站 wt02 设备的硬件版本”;该语句指定了两个时间区间,分别为 “2017-11-01T00:05:00.000 至 2017-11-01T00:12:00.000” 和 “2017-11-01T16:35:00.000 至 2017-11-01T16:37:00.000”;该语句要求选择出满足任意时间区间的被选时间序列的值。 + +该 SQL 语句的执行结果如下: + +``` ++-----------------------------+------------------------+--------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf02.wt02.hardware| ++-----------------------------+------------------------+--------------------------+ +|2017-11-01T00:06:00.000+08:00| false| v1| +|2017-11-01T00:07:00.000+08:00| false| v1| +|2017-11-01T00:08:00.000+08:00| false| v1| +|2017-11-01T00:09:00.000+08:00| false| v1| +|2017-11-01T00:10:00.000+08:00| true| v2| +|2017-11-01T00:11:00.000+08:00| false| v1| +|2017-11-01T16:35:00.000+08:00| true| v2| +|2017-11-01T16:36:00.000+08:00| false| v1| +|2017-11-01T16:37:00.000+08:00| 
false| v1| ++-----------------------------+------------------------+--------------------------+ +Total line number = 9 +It costs 0.014s +``` + +#### 示例5:根据时间降序返回结果集 + +IoTDB 支持 `order by time` 语句,用于对结果按照时间进行降序展示。例如,SQL 语句为: + +```sql +select * from root.ln.** where time > 1 order by time desc limit 10; +``` + +语句执行的结果为: + +``` ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +|2017-11-07T23:59:00.000+08:00| v1| false| 21.07| false| +|2017-11-07T23:58:00.000+08:00| v1| false| 22.93| false| +|2017-11-07T23:57:00.000+08:00| v2| true| 24.39| true| +|2017-11-07T23:56:00.000+08:00| v2| true| 24.44| true| +|2017-11-07T23:55:00.000+08:00| v2| true| 25.9| true| +|2017-11-07T23:54:00.000+08:00| v1| false| 22.52| false| +|2017-11-07T23:53:00.000+08:00| v2| true| 24.58| true| +|2017-11-07T23:52:00.000+08:00| v1| false| 20.18| false| +|2017-11-07T23:51:00.000+08:00| v1| false| 22.24| false| +|2017-11-07T23:50:00.000+08:00| v2| true| 23.7| true| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +Total line number = 10 +It costs 0.016s +``` + +### 查询执行接口 + +在 IoTDB 中,提供两种方式执行数据查询操作: +- 使用 IoTDB-SQL 执行查询。 +- 常用查询的高效执行接口,包括时间序列原始数据范围查询、最新点查询、简单聚合查询。 + +#### 使用 IoTDB-SQL 执行查询 + +数据查询语句支持在 SQL 命令行终端、JDBC、JAVA / C++ / Python / Go 等编程语言 API、RESTful API 中使用。 + +- 在 SQL 命令行终端中执行查询语句:启动 SQL 命令行终端,直接输入查询语句执行即可,详见 [SQL 命令行终端](../Tools-System/CLI.md)。 + +- 在 JDBC 中执行查询语句,详见 [JDBC](../API/Programming-JDBC.md) 。 + +- 在 JAVA / C++ / Python / Go 等编程语言 API 中执行查询语句,详见应用编程接口一章相应文档。接口原型如下: + + ```java + SessionDataSet executeQueryStatement(String sql); + ``` + +- 在 RESTful API 中使用,详见 [HTTP API V1](../API/RestServiceV1.md) 或者 [HTTP API V2](../API/RestServiceV2.md)。 + +#### 常用查询的高效执行接口 + +各编程语言的 API 为常用的查询提供了高效执行接口,可以省去 SQL 解析等操作的耗时。包括: + +* 时间序列原始数据范围查询: + - 指定的查询时间范围为左闭右开区间,包含开始时间但不包含结束时间。 + +```java +SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime); +``` + +* 最新点查询: + - 查询最后一条时间戳大于等于某个时间点的数据。 + +```java +SessionDataSet executeLastDataQuery(List paths, long lastTime); +``` + +* 聚合查询: + - 支持指定查询时间范围。指定的查询时间范围为左闭右开区间,包含开始时间但不包含结束时间。 + - 支持按照时间区间分段查询。 + +```java +SessionDataSet executeAggregationQuery(List paths, List aggregations); + +SessionDataSet executeAggregationQuery( + List paths, List aggregations, long startTime, long endTime); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval, + long slidingStep); +``` + +## 选择表达式(SELECT FROM 子句) + +`SELECT` 子句指定查询的输出,由若干个 `selectExpr` 组成。 每个 `selectExpr` 定义了查询结果中的一列或多列。 + +**`selectExpr` 是一个由时间序列路径后缀、常量、函数和运算符组成的表达式。即 `selectExpr` 中可以包含:** +- 时间序列路径后缀(支持使用通配符) +- 运算符 + - 算数运算符 + - 比较运算符 + - 逻辑运算符 +- 函数 + - 聚合函数 + - 时间序列生成函数(包括内置函数和用户自定义函数) +- 常量 + +### 使用别名 + +由于 IoTDB 独特的数据模型,在每个传感器前都附带有设备等诸多额外信息。有时,我们只针对某个具体设备查询,而这些前缀信息频繁显示造成了冗余,影响了结果集的显示与分析。 + +IoTDB 支持使用`AS`为查询结果集中的列指定别名。 + +**示例:** + +```sql +select s1 as temperature, s2 as speed from root.ln.wf01.wt01; +``` + +结果集将显示为: + +| Time | temperature | speed | +| ---- | 
----------- | ----- | +| ... | ... | ... | + +### 运算符 + +IoTDB 中支持的运算符列表见文档 [运算符和函数](../SQL-Manual/Operator-and-Expression.md)。 + +### 函数 + +#### 聚合函数 + +聚合函数是多对一函数。它们对一组值进行聚合计算,得到单个聚合结果。 + +**包含聚合函数的查询称为聚合查询**,否则称为时间序列查询。 + +**注意:聚合查询和时间序列查询不能混合使用。** 下列语句是不支持的: + +```sql +select s1, count(s1) from root.sg.d1; +select sin(s1), count(s1) from root.sg.d1; +select s1, count(s1) from root.sg.d1 group by ([10,100),10ms); +``` + +IoTDB 支持的聚合函数见文档 [聚合函数](../SQL-Manual/Operator-and-Expression.md#内置函数)。 + +#### 时间序列生成函数 + +时间序列生成函数接受若干原始时间序列作为输入,产生一列时间序列输出。与聚合函数不同的是,时间序列生成函数的结果集带有时间戳列。 + +所有的时间序列生成函数都可以接受 * 作为输入,都可以与原始时间序列查询混合进行。 + +##### 内置时间序列生成函数 + +IoTDB 中支持的内置函数列表见文档 [运算符和函数](../SQL-Manual/Operator-and-Expression.md)。 + +##### 自定义时间序列生成函数 + +IoTDB 支持通过用户自定义函数(点击查看: [用户自定义函数](../User-Manual/Database-Programming.md#用户自定义函数) )能力进行函数功能扩展。 + +### 嵌套表达式举例 + +IoTDB 支持嵌套表达式,由于聚合查询和时间序列查询不能在一条查询语句中同时出现,我们将支持的嵌套表达式分为时间序列查询嵌套表达式和聚合查询嵌套表达式两类。 + +#### 时间序列查询嵌套表达式 + +IoTDB 支持在 `SELECT` 子句中计算由**时间序列、常量、时间序列生成函数(包括用户自定义函数)和运算符**组成的任意嵌套表达式。 + +**说明:** + +- 当某个时间戳下左操作数和右操作数都不为空(`null`)时,表达式才会有结果,否则表达式值为`null`,且默认不出现在结果集中。 +- 如果表达式中某个操作数对应多条时间序列(如通配符 `*`),那么每条时间序列对应的结果都会出现在结果集中(按照笛卡尔积形式)。 + +**示例 1:** + +```sql +select a, + b, + ((a + 1) * 2 - 1) % 2 + 1.5, + sin(a + sin(a + sin(b))), + -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 +from root.sg1; +``` + +运行结果: + +``` ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Time|root.sg1.a|root.sg1.b|((((root.sg1.a + 1) * 2) - 1) % 2) + 1.5|sin(root.sg1.a + sin(root.sg1.a + sin(root.sg1.b)))|(-root.sg1.a + root.sg1.b * ((sin(root.sg1.a + root.sg1.b) * sin(root.sg1.a + root.sg1.b)) + (cos(root.sg1.a + root.sg1.b) * cos(root.sg1.a + root.sg1.b)))) + 1| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 1| 1| 2.5| 0.9238430524420609| -1.0| +|1970-01-01T08:00:00.020+08:00| 2| 2| 2.5| 0.7903505371876317| -3.0| +|1970-01-01T08:00:00.030+08:00| 3| 3| 2.5| 0.14065207680386618| -5.0| +|1970-01-01T08:00:00.040+08:00| 4| null| 2.5| null| null| +|1970-01-01T08:00:00.050+08:00| null| 5| null| null| null| +|1970-01-01T08:00:00.060+08:00| 6| 6| 2.5| -0.7288037411970916| -11.0| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +Total line number = 6 +It costs 0.048s +``` + +**示例 2:** + +```sql +select (a + b) * 2 + sin(a) from root.sg +``` + +运行结果: + +``` ++-----------------------------+----------------------------------------------+ +| Time|((root.sg.a + root.sg.b) * 2) + sin(root.sg.a)| ++-----------------------------+----------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 59.45597888911063| +|1970-01-01T08:00:00.020+08:00| 100.91294525072763| +|1970-01-01T08:00:00.030+08:00| 139.01196837590714| +|1970-01-01T08:00:00.040+08:00| 
180.74511316047935| +|1970-01-01T08:00:00.050+08:00| 219.73762514629607| +|1970-01-01T08:00:00.060+08:00| 259.6951893788978| +|1970-01-01T08:00:00.070+08:00| 300.7738906815579| +|1970-01-01T08:00:00.090+08:00| 39.45597888911063| +|1970-01-01T08:00:00.100+08:00| 39.45597888911063| ++-----------------------------+----------------------------------------------+ +Total line number = 9 +It costs 0.011s +``` + +**示例 3:** + +```sql +select (a + *) / 2 from root.sg1 +``` + +运行结果: + +``` ++-----------------------------+-----------------------------+-----------------------------+ +| Time|(root.sg1.a + root.sg1.a) / 2|(root.sg1.a + root.sg1.b) / 2| ++-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.010+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.020+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.030+08:00| 3.0| 3.0| +|1970-01-01T08:00:00.040+08:00| 4.0| null| +|1970-01-01T08:00:00.060+08:00| 6.0| 6.0| ++-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.011s +``` + +**示例 4:** + +```sql +select (a + b) * 3 from root.sg, root.ln +``` + +运行结果: + +``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|(root.sg.a + root.sg.b) * 3|(root.sg.a + root.ln.b) * 3|(root.ln.a + root.sg.b) * 3|(root.ln.a + root.ln.b) * 3| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.010+08:00| 90.0| 270.0| 360.0| 540.0| +|1970-01-01T08:00:00.020+08:00| 150.0| 330.0| 690.0| 870.0| +|1970-01-01T08:00:00.030+08:00| 210.0| 450.0| 570.0| 810.0| +|1970-01-01T08:00:00.040+08:00| 270.0| 240.0| 690.0| 660.0| +|1970-01-01T08:00:00.050+08:00| 330.0| null| null| null| +|1970-01-01T08:00:00.060+08:00| 390.0| null| null| null| +|1970-01-01T08:00:00.070+08:00| 450.0| null| null| null| +|1970-01-01T08:00:00.090+08:00| 60.0| null| null| null| +|1970-01-01T08:00:00.100+08:00| 60.0| null| null| null| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +Total line number = 9 +It costs 0.014s +``` + +#### 聚合查询嵌套表达式 + +IoTDB 支持在 `SELECT` 子句中计算由**聚合函数、常量、时间序列生成函数和表达式**组成的任意嵌套表达式。 + +**说明:** +- 当某个时间戳下左操作数和右操作数都不为空(`null`)时,表达式才会有结果,否则表达式值为`null`,且默认不出现在结果集中。但在使用`GROUP BY`子句的聚合查询嵌套表达式中,我们希望保留每个时间窗口的值,所以表达式值为`null`的窗口也包含在结果集中。 +- 如果表达式中某个操作数对应多条时间序列(如通配符`*`),那么每条时间序列对应的结果都会出现在结果集中(按照笛卡尔积形式)。 + +**示例 1:** + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) +from root.ln.wf01.wt01; +``` + +运行结果: + +``` ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|avg(root.ln.wf01.wt01.temperature) + sum(root.ln.wf01.wt01.hardware)| ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +| 15.927999999999999| -0.21826546964855045| 16.927999999999997| -7426.0| 7441.928| 
++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +Total line number = 1 +It costs 0.009s +``` + +**示例 2:** + +```sql +select avg(*), + (avg(*) + 1) * 3 / 2 -1 +from root.sg1 +``` + +运行结果: + +``` ++---------------+---------------+-------------------------------------+-------------------------------------+ +|avg(root.sg1.a)|avg(root.sg1.b)|(avg(root.sg1.a) + 1) * 3 / 2 - 1 |(avg(root.sg1.b) + 1) * 3 / 2 - 1 | ++---------------+---------------+-------------------------------------+-------------------------------------+ +| 3.2| 3.4| 5.300000000000001| 5.6000000000000005| ++---------------+---------------+-------------------------------------+-------------------------------------+ +Total line number = 1 +It costs 0.007s +``` + +**示例 3:** + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) as custom_sum +from root.ln.wf01.wt01 +GROUP BY([10, 90), 10ms); +``` + +运行结果: + +``` ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +| Time|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|custom_sum| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +|1970-01-01T08:00:00.010+08:00| 13.987499999999999| 0.9888207947857667| 14.987499999999999| -3211.0| 3224.9875| +|1970-01-01T08:00:00.020+08:00| 29.6| -0.9701057337071853| 30.6| -3720.0| 3749.6| +|1970-01-01T08:00:00.030+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.040+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.050+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.060+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.070+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.080+08:00| null| null| null| null| null| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +Total line number = 8 +It costs 0.012s +``` + +### 最新点查询 + +最新点查询是时序数据库 Apache IoTDB 中提供的一种特殊查询。它返回指定时间序列中时间戳最大的数据点,即一条序列的最新状态。 + +在物联网数据分析场景中,此功能尤为重要。为了满足了用户对设备实时监控的需求,Apache IoTDB 对最新点查询进行了**缓存优化**,能够提供毫秒级的返回速度。 + +SQL 语法: + +```sql +select last [COMMA ]* from < PrefixPath > [COMMA < PrefixPath >]* [ORDER BY TIMESERIES (DESC | ASC)?] 
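+-- Illustrative sketch only (not part of the formal grammar line above): the optional
+-- whereClause and ORDER BY TIMESERIES can be combined in one statement, reusing the
+-- series and time literal from the examples below; assumes the optional clauses compose
+-- as the grammar suggests.
+-- select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00 order by timeseries desc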
+``` + +其含义是: 查询时间序列 prefixPath.path 中最近时间戳的数据。 + +- `whereClause` 中当前只支持时间过滤条件,任何其他过滤条件都将会返回异常。当缓存的最新点不满足过滤条件时,IoTDB 需要从存储中获取结果,此时性能将会有所下降。 + +- 结果集为四列的结构: + + ``` + +----+----------+-----+--------+ + |Time|timeseries|value|dataType| + +----+----------+-----+--------+ + ``` + +- 可以使用 `ORDER BY TIME/TIMESERIES/VALUE/DATATYPE (DESC | ASC)` 指定结果集按照某一列进行降序/升序排列。当值列包含多种类型的数据时,按照字符串类型来排序。 + +**示例 1:** 查询 root.ln.wf01.wt01.status 的最新数据点 + +``` +IoTDB> select last status from root.ln.wf01.wt01 ++-----------------------------+------------------------+-----+--------+ +| Time| timeseries|value|dataType| ++-----------------------------+------------------------+-----+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.status|false| BOOLEAN| ++-----------------------------+------------------------+-----+--------+ +Total line number = 1 +It costs 0.000s +``` + +**示例 2:** 查询 root.ln.wf01.wt01 下 status,temperature 时间戳大于等于 2017-11-07T23:50:00 的最新数据点。 + +``` +IoTDB> select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00 ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 3:** 查询 root.ln.wf01.wt01 下所有序列的最新数据点,并按照序列名降序排列。 + +``` +IoTDB> select last * from root.ln.wf01.wt01 order by timeseries desc; ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 4:** 查询 root.ln.wf01.wt01 下所有序列的最新数据点,并按照dataType降序排列。 + +``` +IoTDB> select last * from root.ln.wf01.wt01 order by dataType desc; ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` + +## 查询过滤条件(WHERE 子句) + +`WHERE` 子句指定了对数据行的筛选条件,由一个 `whereCondition` 组成。 + +`whereCondition` 是一个逻辑表达式,对于要选择的每一行,其计算结果为真。如果没有 `WHERE` 子句,将选择所有行。 +在 `whereCondition` 中,可以使用除聚合函数之外的任何 IOTDB 支持的函数和运算符。 + +根据过滤条件的不同,可以分为时间过滤条件和值过滤条件。时间过滤条件和值过滤条件可以混合使用。 + +### 时间过滤条件 + +使用时间过滤条件可以筛选特定时间范围的数据。对于时间戳支持的格式,请参考 [时间戳类型](../Basic-Concept/Data-Type.md) 。 + +示例如下: + +1. 选择时间戳大于 2022-01-01T00:05:00.000 的数据: + + ```sql + select s1 from root.sg1.d1 where time > 2022-01-01T00:05:00.000; + ``` + +2. 选择时间戳等于 2022-01-01T00:05:00.000 的数据: + + ```sql + select s1 from root.sg1.d1 where time = 2022-01-01T00:05:00.000; + ``` + +3. 
选择时间区间 [2017-11-01T00:05:00.000, 2017-11-01T00:12:00.000) 内的数据: + + ```sql + select s1 from root.sg1.d1 where time >= 2022-01-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; + ``` + +注:在上述示例中,`time` 也可写做 `timestamp`。 + +### 值过滤条件 + +使用值过滤条件可以筛选数据值满足特定条件的数据。 +**允许**使用 select 子句中未选择的时间序列作为值过滤条件。 + +示例如下: + +1. 选择值大于 36.5 的数据: + + ```sql + select temperature from root.sg1.d1 where temperature > 36.5; + ``` + +2. 选择值等于 true 的数据: + + ```sql + select status from root.sg1.d1 where status = true; + +3. 选择区间 [36.5,40] 内或之外的数据: + + ```sql + select temperature from root.sg1.d1 where temperature between 36.5 and 40; + ```` + ```sql + select temperature from root.sg1.d1 where temperature not between 36.5 and 40; + ```` + +4. 选择值在特定范围内的数据: + + ```sql + select code from root.sg1.d1 where code in ('200', '300', '400', '500'); + ``` + +5. 选择值在特定范围外的数据: + + ```sql + select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); + ``` + +6. 选择值为空的数据: + + ```sql + select code from root.sg1.d1 where temperature is null; + ```` + +7. 选择值为非空的数据: + + ```sql + select code from root.sg1.d1 where temperature is not null; + ```` + +### 模糊查询 + +对于 TEXT 类型的数据,支持使用 `Like` 和 `Regexp` 运算符对数据进行模糊匹配 + +#### 使用 `Like` 进行模糊匹配 + +**匹配规则:** + +- `%` 表示任意0个或多个字符。 +- `_` 表示任意单个字符。 + +**示例 1:** 查询 `root.sg.d1` 下 `value` 含有`'cc'`的数据。 + +``` +IoTDB> select * from root.sg.d1 where value like '%cc%' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 2:** 查询 `root.sg.d1` 下 `value` 中间为 `'b'`、前后为任意单个字符的数据。 + +``` +IoTDB> select * from root.sg.device where value like '_b_' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:02.000+08:00| abc| ++-----------------------------+----------------+ +Total line number = 1 +It costs 0.002s +``` + +#### 使用 `Regexp` 进行模糊匹配 + +需要传入的过滤条件为 **Java 标准库风格的正则表达式**。 + +**常见的正则匹配举例:** + +``` +长度为3-20的所有字符:^.{3,20}$ +大写英文字符:^[A-Z]+$ +数字和英文字符:^[A-Za-z0-9]+$ +以a开头的:^a.* +``` + +**示例 1:** 查询 root.sg.d1 下 value 值为26个英文字符组成的字符串。 + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 2:** 查询 root.sg.d1 下 value 值为26个小写英文字符组成的字符串且时间大于100的。 + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +## 分段分组聚合(GROUP BY 子句) +IoTDB支持通过`GROUP BY`子句对序列进行分段或者分组聚合。 + +分段聚合是指按照时间维度,针对同时间序列中不同数据点之间的时间关系,对数据在行的方向进行分段,每个段得到一个聚合值。目前支持**时间区间分段**、**差值分段**、**条件分段**、**会话分段**和**点数分段**,未来将支持更多分段方式。 + +分组聚合是指针对不同时间序列,在时间序列的潜在业务属性上分组,每个组包含若干条时间序列,每个组得到一个聚合值。支持**按路径层级分组**和**按序列标签分组**两种分组方式。 + +### 分段聚合 + +#### 时间区间分段聚合 + +时间区间分段聚合是一种时序数据典型的查询方式,数据以高频进行采集,需要按照一定的时间间隔进行聚合计算,如计算每天的平均气温,需要将气温的序列按天进行分段,然后计算平均值。 + +在 IoTDB 
中,聚合查询可以通过 `GROUP BY` 子句指定按照时间区间分段聚合。用户可以指定聚合的时间间隔和滑动步长,相关参数如下: + +* 参数 1:时间轴显示时间窗口大小 +* 参数 2:聚合窗口的大小(必须为正数) +* 参数 3:聚合窗口的滑动步长(可选,默认与聚合窗口大小相同) + +下图中指出了这三个参数的含义: + + + +接下来,我们给出几个典型例子: + +##### 未指定滑动步长的时间区间分段聚合查询 + +对应的 SQL 语句是: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d); +``` +这条查询的含义是: + +由于用户没有指定滑动步长,滑动步长将会被默认设置为跟时间间隔参数相同,也就是`1d`。 + +上面这个例子的第一个参数是显示窗口参数,决定了最终的显示范围是 [2017-11-01T00:00:00, 2017-11-07T23:00:00)。 + +上面这个例子的第二个参数是划分时间轴的时间间隔参数,将`1d`当作划分间隔,显示窗口参数的起始时间当作分割原点,时间轴即被划分为连续的时间间隔:[0,1d), [1d, 2d), [2d, 3d) 等等。 + +然后系统将会用 WHERE 子句中的时间和值过滤条件以及 GROUP BY 语句中的第一个参数作为数据的联合过滤条件,获得满足所有过滤条件的数据(在这个例子里是在 [2017-11-01T00:00:00, 2017-11-07 T23:00:00) 这个时间范围的数据),并把这些数据映射到之前分割好的时间轴中(这个例子里是从 2017-11-01T00:00:00 到 2017-11-07T23:00:00:00 的每一天) + +每个时间间隔窗口内都有数据,SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 1440| 26.0| +|2017-11-02T00:00:00.000+08:00| 1440| 26.0| +|2017-11-03T00:00:00.000+08:00| 1440| 25.99| +|2017-11-04T00:00:00.000+08:00| 1440| 26.0| +|2017-11-05T00:00:00.000+08:00| 1440| 26.0| +|2017-11-06T00:00:00.000+08:00| 1440| 25.99| +|2017-11-07T00:00:00.000+08:00| 1380| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 7 +It costs 0.024s +``` + +##### 指定滑动步长的时间区间分段聚合查询 + +对应的 SQL 语句是: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d); +``` + +这条查询的含义是: + +由于用户指定了滑动步长为`1d`,GROUP BY 语句执行时将会每次把时间间隔往后移动一天的步长,而不是默认的 3 小时。 + +也就意味着,我们想要取从 2017-11-01 到 2017-11-07 每一天的凌晨 0 点到凌晨 3 点的数据。 + +上面这个例子的第一个参数是显示窗口参数,决定了最终的显示范围是 [2017-11-01T00:00:00, 2017-11-07T23:00:00)。 + +上面这个例子的第二个参数是划分时间轴的时间间隔参数,将`3h`当作划分间隔,显示窗口参数的起始时间当作分割原点,时间轴即被划分为连续的时间间隔:[2017-11-01T00:00:00, 2017-11-01T03:00:00), [2017-11-02T00:00:00, 2017-11-02T03:00:00), [2017-11-03T00:00:00, 2017-11-03T03:00:00) 等等。 + +上面这个例子的第三个参数是每次时间间隔的滑动步长。 + +然后系统将会用 WHERE 子句中的时间和值过滤条件以及 GROUP BY 语句中的第一个参数作为数据的联合过滤条件,获得满足所有过滤条件的数据(在这个例子里是在 [2017-11-01T00:00:00, 2017-11-07 T23:00:00) 这个时间范围的数据),并把这些数据映射到之前分割好的时间轴中(这个例子里是从 2017-11-01T00:00:00 到 2017-11-07T23:00:00:00 的每一天的凌晨 0 点到凌晨 3 点) + +每个时间间隔窗口内都有数据,SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| 25.98| +|2017-11-02T00:00:00.000+08:00| 180| 25.98| +|2017-11-03T00:00:00.000+08:00| 180| 25.96| +|2017-11-04T00:00:00.000+08:00| 180| 25.96| +|2017-11-05T00:00:00.000+08:00| 180| 26.0| +|2017-11-06T00:00:00.000+08:00| 180| 25.85| +|2017-11-07T00:00:00.000+08:00| 180| 25.99| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 7 +It costs 0.006s +``` + +滑动步长可以小于聚合窗口,此时聚合窗口之间有重叠时间(类似于一个滑动窗口)。 + +例如 SQL: +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-01 10:00:00), 4h, 2h); +``` + +SQL 执行后的结果集如下所示: + +``` 
++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| 25.98| +|2017-11-01T02:00:00.000+08:00| 180| 25.98| +|2017-11-01T04:00:00.000+08:00| 180| 25.96| +|2017-11-01T06:00:00.000+08:00| 180| 25.96| +|2017-11-01T08:00:00.000+08:00| 180| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 5 +It costs 0.006s +``` + +##### 按照自然月份的时间区间分段聚合查询 + +对应的 SQL 语句是: + +```sql +select count(status) from root.ln.wf01.wt01 where time > 2017-11-01T01:00:00 group by([2017-11-01T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +这条查询的含义是: + +由于用户指定了滑动步长为`2mo`,GROUP BY 语句执行时将会每次把时间间隔往后移动 2 个自然月的步长,而不是默认的 1 个自然月。 + +也就意味着,我们想要取从 2017-11-01 到 2019-11-07 每 2 个自然月的第一个月的数据。 + +上面这个例子的第一个参数是显示窗口参数,决定了最终的显示范围是 [2017-11-01T00:00:00, 2019-11-07T23:00:00)。 + +起始时间为 2017-11-01T00:00:00,滑动步长将会以起始时间作为标准按月递增,取当月的 1 号作为时间间隔的起始时间。 + +上面这个例子的第二个参数是划分时间轴的时间间隔参数,将`1mo`当作划分间隔,显示窗口参数的起始时间当作分割原点,时间轴即被划分为连续的时间间隔:[2017-11-01T00:00:00, 2017-12-01T00:00:00), [2018-02-01T00:00:00, 2018-03-01T00:00:00), [2018-05-03T00:00:00, 2018-06-01T00:00:00) 等等。 + +上面这个例子的第三个参数是每次时间间隔的滑动步长。 + +然后系统将会用 WHERE 子句中的时间和值过滤条件以及 GROUP BY 语句中的第一个参数作为数据的联合过滤条件,获得满足所有过滤条件的数据(在这个例子里是在 [2017-11-01T00:00:00, 2019-11-07T23:00:00) 这个时间范围的数据),并把这些数据映射到之前分割好的时间轴中(这个例子里是从 2017-11-01T00:00:00 到 2019-11-07T23:00:00:00 的每两个自然月的第一个月) + +每个时间间隔窗口内都有数据,SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-11-01T00:00:00.000+08:00| 259| +|2018-01-01T00:00:00.000+08:00| 250| +|2018-03-01T00:00:00.000+08:00| 259| +|2018-05-01T00:00:00.000+08:00| 251| +|2018-07-01T00:00:00.000+08:00| 242| +|2018-09-01T00:00:00.000+08:00| 225| +|2018-11-01T00:00:00.000+08:00| 216| +|2019-01-01T00:00:00.000+08:00| 207| +|2019-03-01T00:00:00.000+08:00| 216| +|2019-05-01T00:00:00.000+08:00| 207| +|2019-07-01T00:00:00.000+08:00| 199| +|2019-09-01T00:00:00.000+08:00| 181| +|2019-11-01T00:00:00.000+08:00| 60| ++-----------------------------+-------------------------------+ +``` + +对应的 SQL 语句是: + +```sql +select count(status) from root.ln.wf01.wt01 group by([2017-10-31T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +这条查询的含义是: + +由于用户指定了滑动步长为`2mo`,GROUP BY 语句执行时将会每次把时间间隔往后移动 2 个自然月的步长,而不是默认的 1 个自然月。 + +也就意味着,我们想要取从 2017-10-31 到 2019-11-07 每 2 个自然月的第一个月的数据。 + +与上述示例不同的是起始时间为 2017-10-31T00:00:00,滑动步长将会以起始时间作为标准按月递增,取当月的 31 号(即最后一天)作为时间间隔的起始时间。若起始时间设置为 30 号,滑动步长会将时间间隔的起始时间设置为当月 30 号,若不存在则为最后一天。 + +上面这个例子的第一个参数是显示窗口参数,决定了最终的显示范围是 [2017-10-31T00:00:00, 2019-11-07T23:00:00)。 + +上面这个例子的第二个参数是划分时间轴的时间间隔参数,将`1mo`当作划分间隔,显示窗口参数的起始时间当作分割原点,时间轴即被划分为连续的时间间隔:[2017-10-31T00:00:00, 2017-11-31T00:00:00), [2018-02-31T00:00:00, 2018-03-31T00:00:00), [2018-05-31T00:00:00, 2018-06-31T00:00:00) 等等。 + +上面这个例子的第三个参数是每次时间间隔的滑动步长。 + +然后系统将会用 WHERE 子句中的时间和值过滤条件以及 GROUP BY 语句中的第一个参数作为数据的联合过滤条件,获得满足所有过滤条件的数据(在这个例子里是在 [2017-10-31T00:00:00, 2019-11-07T23:00:00) 这个时间范围的数据),并把这些数据映射到之前分割好的时间轴中(这个例子里是从 2017-10-31T00:00:00 到 2019-11-07T23:00:00:00 的每两个自然月的第一个月) + +每个时间间隔窗口内都有数据,SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| 
++-----------------------------+-------------------------------+ +|2017-10-31T00:00:00.000+08:00| 251| +|2017-12-31T00:00:00.000+08:00| 250| +|2018-02-28T00:00:00.000+08:00| 259| +|2018-04-30T00:00:00.000+08:00| 250| +|2018-06-30T00:00:00.000+08:00| 242| +|2018-08-31T00:00:00.000+08:00| 225| +|2018-10-31T00:00:00.000+08:00| 216| +|2018-12-31T00:00:00.000+08:00| 208| +|2019-02-28T00:00:00.000+08:00| 216| +|2019-04-30T00:00:00.000+08:00| 208| +|2019-06-30T00:00:00.000+08:00| 199| +|2019-08-31T00:00:00.000+08:00| 181| +|2019-10-31T00:00:00.000+08:00| 69| ++-----------------------------+-------------------------------+ +``` + +##### 左开右闭区间 + +每个区间的结果时间戳为区间右端点,对应的 SQL 语句是: + +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d); +``` + +这条查询语句的时间区间是左开右闭的,结果中不会包含时间点 2017-11-01 的数据,但是会包含时间点 2017-11-07 的数据。 + +SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-11-02T00:00:00.000+08:00| 1440| +|2017-11-03T00:00:00.000+08:00| 1440| +|2017-11-04T00:00:00.000+08:00| 1440| +|2017-11-05T00:00:00.000+08:00| 1440| +|2017-11-06T00:00:00.000+08:00| 1440| +|2017-11-07T00:00:00.000+08:00| 1440| +|2017-11-07T23:00:00.000+08:00| 1380| ++-----------------------------+-------------------------------+ +Total line number = 7 +It costs 0.004s +``` + +#### 差值分段聚合 +IoTDB支持通过`GROUP BY VARIATION`语句来根据差值进行分组。`GROUP BY VARIATION`会将第一个点作为一个组的**基准点**,每个新的数据在按照给定规则与基准点进行差值运算后, +如果差值小于给定的阈值则将该新点归于同一组,否则结束当前分组,以这个新的数据为新的基准点开启新的分组。 +该分组方式不会重叠,且没有固定的开始结束时间。其子句语法如下: +```sql +group by variation(controlExpression[,delta][,ignoreNull=true/false]) +``` +不同的参数含义如下 +* controlExpression + +分组所参照的值,**可以是查询数据中的某一列或是多列的表达式 +(多列表达式计算后仍为一个值,使用多列表达式时指定的列必须都为数值列)**, 差值便是根据数据的controlExpression的差值运算。 +* delta + +分组所使用的阈值,同一分组中**每个点的controlExpression对应的值与该组中基准点对应值的差值都小于`delta`**。当`delta=0`时,相当于一个等值分组,所有连续且expression值相同的数据将被分到一组。 + +* ignoreNull + +用于指定`controlExpression`的值为null时对数据的处理方式,当`ignoreNull`为false时,该null值会被视为新的值,`ignoreNull`为true时,则直接跳过对应的点。 + +在`delta`取不同值时,`controlExpression`支持的返回数据类型以及当`ignoreNull`为false时对于null值的处理方式可以见下表: + +| delta | controlExpression支持的返回类型 | ignoreNull=false时对于Null值的处理 | +|----------|--------------------------------------|-----------------------------------------------------------------| +| delta!=0 | INT32、INT64、FLOAT、DOUBLE | 若正在维护分组的值不为null,null视为无穷大/无穷小,结束当前分组。连续的null视为差值相等的值,会被分配在同一个分组 | +| delta=0 | TEXT、BINARY、INT32、INT64、FLOAT、DOUBLE | null被视为新分组中的新值,连续的null属于相同的分组 | + +下图为差值分段的一个分段方式示意图,与组中第一个数据的控制列值的差值在delta内的控制列对应的点属于相同的分组。 + +groupByVariation + +##### 使用注意事项 +1. `controlExpression`的结果应该为唯一值,如果使用通配符拼接后出现多列,则报错。 +2. 对于一个分组,默认Time列输出分组的开始时间,查询时可以使用select `__endTime`的方式来使得结果输出分组的结束时间。 +3. 与`ALIGN BY DEVICE`搭配使用时会对每个device进行单独的分组操作。 +4. 当没有指定`delta`和`ignoreNull`时,`delta`默认为0,`ignoreNull`默认为true。 +5. 
当前暂不支持与`GROUP BY LEVEL`搭配使用。 + +使用如下的原始数据,接下来会给出几个事件分段查询的使用样例 +``` ++-----------------------------+-------+-------+-------+--------+-------+-------+ +| Time| s1| s2| s3| s4| s5| s6| ++-----------------------------+-------+-------+-------+--------+-------+-------+ +|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| +|1970-01-01T08:00:00.010+08:00| null| 19.0| 10.0| 145.0| 19.0| 8.25| +|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| null| 245.0| 29.0| null| +|1970-01-01T08:00:00.030+08:00| 34.5| null| 30.0| 345.0| null| null| +|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| +|1970-01-01T08:00:00.050+08:00| null| 59.0| 50.0| 545.0| 59.0| 6.25| +|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| null| +|1970-01-01T08:00:00.070+08:00| 74.5| 79.0| null| null| 79.0| 3.25| +|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 3.25| +|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 3.25| +|1970-01-01T08:00:00.150+08:00| 66.5| 77.0| 90.0| 945.0| 99.0| 9.25| ++-----------------------------+-------+-------+-------+--------+-------+-------+ +``` +##### delta=0时的等值事件分段 +使用如下sql语句 +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6) +``` +得到如下的查询结果,这里忽略了s6为null的行 +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.040+08:00| 24.5| 3| 50.0| +|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +当指定ignoreNull为false时,会将s6为null的数据也考虑进来 +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, ignoreNull=false) +``` +得到如下的结果 +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| +|1970-01-01T08:00:00.020+08:00|1970-01-01T08:00:00.030+08:00| 29.5| 1| 30.0| +|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.040+08:00| 44.5| 1| 40.0| +|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| +|1970-01-01T08:00:00.060+08:00|1970-01-01T08:00:00.060+08:00| 64.5| 1| 60.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +##### delta!=0时的差值事件分段 +使用如下sql语句 +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, 4) +``` +得到如下的查询结果 +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| 
++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.050+08:00| 24.5| 4| 100.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +group by子句中的controlExpression同样支持列的表达式 + +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6+s5, 10) +``` +得到如下的查询结果 +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| +|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.050+08:00| 44.5| 2| 90.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.080+08:00| 79.5| 2| 80.0| +|1970-01-01T08:00:00.090+08:00|1970-01-01T08:00:00.150+08:00| 80.5| 2| 180.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +#### 条件分段聚合 +当需要根据指定条件对数据进行筛选,并将连续的符合条件的行分为一组进行聚合运算时,可以使用`GROUP BY CONDITION`的分段方式;不满足给定条件的行因为不属于任何分组会被直接简单忽略。 +其语法定义如下: +```sql +group by condition(predict,[keep>/>=/=/<=/<]threshold,[,ignoreNull=true/false]) +``` +* predict + +返回boolean数据类型的合法表达式,用于分组的筛选。 +* keep[>/>=/=/<=/<]threshold + +keep表达式用来指定形成分组所需要连续满足`predict`条件的数据行数,只有行数满足keep表达式的分组才会被输出。keep表达式由一个'keep'字符串和`long`类型的threshold组合或者是单独的`long`类型数据构成。 + +* ignoreNull=true/false + +用于指定遇到predict为null的数据行时的处理方式,为true则跳过该行,为false则结束当前分组。 + +##### 使用注意事项 +1. keep条件在查询中是必需的,但可以省略掉keep字符串给出一个`long`类型常数,默认为`keep=该long型常数`的等于条件。 +2. `ignoreNull`默认为true。 +3. 对于一个分组,默认Time列输出分组的开始时间,查询时可以使用select `__endTime`的方式来使得结果输出分组的结束时间。 +4. 与`ALIGN BY DEVICE`搭配使用时会对每个device进行单独的分组操作。 +5. 
当前暂不支持与`GROUP BY LEVEL`搭配使用。 + + +对于如下原始数据,下面会给出几个查询样例: +``` ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +| Time|root.sg.beijing.car01.soc|root.sg.beijing.car01.charging_status|root.sg.beijing.car01.vehicle_status| ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 14.0| 1| 1| +|1970-01-01T08:00:00.002+08:00| 16.0| 1| 1| +|1970-01-01T08:00:00.003+08:00| 16.0| 0| 1| +|1970-01-01T08:00:00.004+08:00| 16.0| 0| 1| +|1970-01-01T08:00:00.005+08:00| 18.0| 1| 1| +|1970-01-01T08:00:00.006+08:00| 24.0| 1| 1| +|1970-01-01T08:00:00.007+08:00| 36.0| 1| 1| +|1970-01-01T08:00:00.008+08:00| 36.0| null| 1| +|1970-01-01T08:00:00.009+08:00| 45.0| 1| 1| +|1970-01-01T08:00:00.010+08:00| 60.0| 1| 1| ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +``` +查询至少连续两行以上的charging_status=1的数据,sql语句如下: +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoreNull=true) +``` +得到结果如下: +``` ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| +|1970-01-01T08:00:00.005+08:00| 10| 5| 60.0| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +``` +当设置`ignoreNull`为false时,遇到null值为将其视为一个不满足条件的行,会结束正在计算的分组。 +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoreNull=false) +``` +得到如下结果,原先的分组被含null的行拆分: +``` ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| +|1970-01-01T08:00:00.005+08:00| 7| 3| 36.0| +|1970-01-01T08:00:00.009+08:00| 10| 2| 60.0| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +``` +#### 会话分段聚合 +`GROUP BY SESSION`可以根据时间列的间隔进行分组,在结果集的时间列中,时间间隔小于等于设定阈值的数据会被分为一组。例如在工业场景中,设备并不总是连续运行,`GROUP BY SESSION`会将设备每次接入会话所产生的数据分为一组。 +其语法定义如下: +```sql +group by session(timeInterval) +``` +* timeInterval + +设定的时间差阈值,当两条数据时间列的差值大于该阈值,则会给数据创建一个新的分组。 + +下图为`group by session`下的一个分组示意图 + + + +##### 使用注意事项 +1. 对于一个分组,默认Time列输出分组的开始时间,查询时可以使用select `__endTime`的方式来使得结果输出分组的结束时间。 +2. 与`ALIGN BY DEVICE`搭配使用时会对每个device进行单独的分组操作。 +3. 
当前暂不支持与`GROUP BY LEVEL`搭配使用。 + +对于下面的原始数据,给出几个查询样例。 +``` ++-----------------------------+-----------------+-----------+--------+------+ +| Time| Device|temperature|hardware|status| ++-----------------------------+-----------------+-----------+--------+------+ +|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01| 35.7| 11| false| +|1970-01-01T08:00:02.000+08:00|root.ln.wf02.wt01| 35.8| 22| true| +|1970-01-01T08:00:03.000+08:00|root.ln.wf02.wt01| 35.4| 33| false| +|1970-01-01T08:00:04.000+08:00|root.ln.wf02.wt01| 36.4| 44| false| +|1970-01-01T08:00:05.000+08:00|root.ln.wf02.wt01| 36.8| 55| false| +|1970-01-01T08:00:10.000+08:00|root.ln.wf02.wt01| 36.8| 110| false| +|1970-01-01T08:00:20.000+08:00|root.ln.wf02.wt01| 37.8| 220| true| +|1970-01-01T08:00:30.000+08:00|root.ln.wf02.wt01| 37.5| 330| false| +|1970-01-01T08:00:40.000+08:00|root.ln.wf02.wt01| 37.4| 440| false| +|1970-01-01T08:00:50.000+08:00|root.ln.wf02.wt01| 37.9| 550| false| +|1970-01-01T08:01:40.000+08:00|root.ln.wf02.wt01| 38.0| 110| false| +|1970-01-01T08:02:30.000+08:00|root.ln.wf02.wt01| 38.8| 220| true| +|1970-01-01T08:03:20.000+08:00|root.ln.wf02.wt01| 38.6| 330| false| +|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01| 38.4| 440| false| +|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01| 38.3| 550| false| +|1970-01-01T08:06:40.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-01T08:07:50.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-01T08:08:00.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01| 38.2| 110| false| +|1970-01-02T08:08:02.000+08:00|root.ln.wf02.wt01| 37.5| 220| true| +|1970-01-02T08:08:03.000+08:00|root.ln.wf02.wt01| 37.4| 330| false| +|1970-01-02T08:08:04.000+08:00|root.ln.wf02.wt01| 36.8| 440| false| +|1970-01-02T08:08:05.000+08:00|root.ln.wf02.wt01| 37.4| 550| false| ++-----------------------------+-----------------+-----------+--------+------+ +``` +可以按照不同的时间单位设定时间间隔,sql语句如下: +```sql +select __endTime,count(*) from root.** group by session(1d) +``` +得到如下结果: +``` ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +| Time| __endTime|count(root.ln.wf02.wt01.temperature)|count(root.ln.wf02.wt01.hardware)|count(root.ln.wf02.wt01.status)| ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +|1970-01-01T08:00:01.000+08:00|1970-01-01T08:08:00.000+08:00| 15| 18| 15| +|1970-01-02T08:08:01.000+08:00|1970-01-02T08:08:05.000+08:00| 5| 5| 5| ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +``` +也可以和`HAVING`、`ALIGN BY DEVICE`共同使用 +```sql +select __endTime,sum(hardware) from root.ln.wf02.wt01 group by session(50s) having sum(hardware)>0 align by device +``` +得到如下结果,其中排除了`sum(hardware)`为0的部分 +``` ++-----------------------------+-----------------+-----------------------------+-------------+ +| Time| Device| __endTime|sum(hardware)| ++-----------------------------+-----------------+-----------------------------+-------------+ +|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01|1970-01-01T08:03:20.000+08:00| 2475.0| +|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:04:20.000+08:00| 440.0| +|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:05:20.000+08:00| 550.0| 
+|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01|1970-01-02T08:08:05.000+08:00|       1650.0|
++-----------------------------+-----------------+-----------------------------+-------------+
+```
+#### 点数分段聚合
+`GROUP BY COUNT`可以根据点数分组进行聚合运算,将连续的指定数量数据点分为一组,即按照固定的点数进行分组。
+其语法定义如下:
+```sql
+group by count(controlExpression, size[,ignoreNull=true/false])
+```
+* controlExpression
+
+计数参照的对象,可以是结果集的任意列或是列的表达式
+
+* size
+
+一个组中数据点的数量,每`size`个数据点会被分到同一个组
+
+* ignoreNull=true/false
+
+是否忽略`controlExpression`为null的数据点,当ignoreNull为true时,在计数时会跳过`controlExpression`结果为null的数据点
+
+##### 使用注意事项
+1. 对于一个分组,默认Time列输出分组的开始时间,查询时可以使用select `__endTime`的方式来使得结果输出分组的结束时间。
+2. 与`ALIGN BY DEVICE`搭配使用时会对每个device进行单独的分组操作。
+3. 当前暂不支持与`GROUP BY LEVEL`搭配使用。
+4. 当一个分组内最终的点数不满足`size`的数量时,不会输出该分组的结果
+
+对于下面的原始数据,给出几个查询样例。
+```
++-----------------------------+-----------+-----------------------+
+|                         Time|root.sg.soc|root.sg.charging_status|
++-----------------------------+-----------+-----------------------+
+|1970-01-01T08:00:00.001+08:00|       14.0|                      1|
+|1970-01-01T08:00:00.002+08:00|       16.0|                      1|
+|1970-01-01T08:00:00.003+08:00|       16.0|                      0|
+|1970-01-01T08:00:00.004+08:00|       16.0|                      0|
+|1970-01-01T08:00:00.005+08:00|       18.0|                      1|
+|1970-01-01T08:00:00.006+08:00|       24.0|                      1|
+|1970-01-01T08:00:00.007+08:00|       36.0|                      1|
+|1970-01-01T08:00:00.008+08:00|       36.0|                   null|
+|1970-01-01T08:00:00.009+08:00|       45.0|                      1|
+|1970-01-01T08:00:00.010+08:00|       60.0|                      1|
++-----------------------------+-----------+-----------------------+
+```
+sql语句如下
+```sql
+select count(charging_status), first_value(soc) from root.sg group by count(charging_status,5)
+```
+得到如下结果,其中由于第二个1970-01-01T08:00:00.006+08:00到1970-01-01T08:00:00.010+08:00的窗口中包含四个点,不符合`size = 5`的条件,因此不被输出
+```
++-----------------------------+-----------------------------+--------------------------------------+
+|                         Time|                    __endTime|first_value(root.sg.beijing.car01.soc)|
++-----------------------------+-----------------------------+--------------------------------------+
+|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00|                                  14.0|
++-----------------------------+-----------------------------+--------------------------------------+
+```
+而当指定 ignoreNull=false 将null值也考虑进来时,可以得到两个点计数为5的窗口,sql如下
+```sql
+select count(charging_status), first_value(soc) from root.sg group by count(charging_status,5,ignoreNull=false)
+```
+得到如下结果
+```
++-----------------------------+-----------------------------+--------------------------------------+
+|                         Time|                    __endTime|first_value(root.sg.beijing.car01.soc)|
++-----------------------------+-----------------------------+--------------------------------------+
+|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00|                                  14.0|
+|1970-01-01T08:00:00.006+08:00|1970-01-01T08:00:00.010+08:00|                                  24.0|
++-----------------------------+-----------------------------+--------------------------------------+
+```
+### 分组聚合
+
+#### 路径层级分组聚合
+
+在时间序列层级结构中,路径层级分组聚合查询用于**对某一层级下同名的序列进行聚合查询**。
+
+- 使用 `GROUP BY LEVEL = INT` 来指定需要聚合的层级,并约定 `ROOT` 为第 0 层。若统计 "root.ln" 下所有序列则需指定 level 为 1。
+- 路径层次分组聚合查询支持使用所有内置聚合函数。对于 `sum`,`avg`,`min_value`, `max_value`, `extreme` 五种聚合函数,需保证所有聚合的时间序列数据类型相同。其他聚合函数没有此限制。
+
+**示例1:** 不同 database 下均存在名为 status 的序列, 如 "root.ln.wf01.wt01.status", "root.ln.wf02.wt02.status", 以及 "root.sgcc.wf03.wt01.status", 如果需要统计不同 database 下 status 序列的数据点个数,使用以下查询:
+
+```sql
+select count(status) from root.** group by level = 1
+```
+
+运行结果为:
+
+```
++-------------------------+---------------------------+
+|count(root.ln.*.*.status)|count(root.sgcc.*.*.status)|
++-------------------------+---------------------------+ +| 20160| 10080| ++-------------------------+---------------------------+ +Total line number = 1 +It costs 0.003s +``` + +**示例2:** 统计不同设备下 status 序列的数据点个数,可以规定 level = 3, + +```sql +select count(status) from root.** group by level = 3 +``` + +运行结果为: + +``` ++---------------------------+---------------------------+ +|count(root.*.*.wt01.status)|count(root.*.*.wt02.status)| ++---------------------------+---------------------------+ +| 20160| 10080| ++---------------------------+---------------------------+ +Total line number = 1 +It costs 0.003s +``` + +注意,这时会将 database `ln` 和 `sgcc` 下名为 `wt01` 的设备视为同名设备聚合在一起。 + +**示例3:** 统计不同 database 下的不同设备中 status 序列的数据点个数,可以使用以下查询: + +```sql +select count(status) from root.** group by level = 1, 3 +``` + +运行结果为: + +``` ++----------------------------+----------------------------+------------------------------+ +|count(root.ln.*.wt01.status)|count(root.ln.*.wt02.status)|count(root.sgcc.*.wt01.status)| ++----------------------------+----------------------------+------------------------------+ +| 10080| 10080| 10080| ++----------------------------+----------------------------+------------------------------+ +Total line number = 1 +It costs 0.003s +``` + +**示例4:** 查询所有序列下温度传感器 temperature 的最大值,可以使用下列查询语句: + +```sql +select max_value(temperature) from root.** group by level = 0 +``` + +运行结果: + +``` ++---------------------------------+ +|max_value(root.*.*.*.temperature)| ++---------------------------------+ +| 26.0| ++---------------------------------+ +Total line number = 1 +It costs 0.013s +``` + +**示例5:** 上面的查询都是针对某一个传感器,特别地,**如果想要查询某一层级下所有传感器拥有的总数据点数,则需要显式规定测点为 `*`** + +```sql +select count(*) from root.ln.** group by level = 2 +``` + +运行结果: + +``` ++----------------------+----------------------+ +|count(root.*.wf01.*.*)|count(root.*.wf02.*.*)| ++----------------------+----------------------+ +| 20160| 20160| ++----------------------+----------------------+ +Total line number = 1 +It costs 0.013s +``` + +##### 与时间区间分段聚合混合使用 + +通过定义 LEVEL 来统计指定层级下的数据点个数。 + +例如: + +统计降采样后的数据点个数 + +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d), level=1; +``` + +结果: + +``` ++-----------------------------+-------------------------+ +| Time|COUNT(root.ln.*.*.status)| ++-----------------------------+-------------------------+ +|2017-11-02T00:00:00.000+08:00| 1440| +|2017-11-03T00:00:00.000+08:00| 1440| +|2017-11-04T00:00:00.000+08:00| 1440| +|2017-11-05T00:00:00.000+08:00| 1440| +|2017-11-06T00:00:00.000+08:00| 1440| +|2017-11-07T00:00:00.000+08:00| 1440| +|2017-11-07T23:00:00.000+08:00| 1380| ++-----------------------------+-------------------------+ +Total line number = 7 +It costs 0.006s +``` + +加上滑动 Step 的降采样后的结果也可以汇总 + +```sql +select count(status) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d), level=1; +``` + +``` ++-----------------------------+-------------------------+ +| Time|COUNT(root.ln.*.*.status)| ++-----------------------------+-------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| +|2017-11-02T00:00:00.000+08:00| 180| +|2017-11-03T00:00:00.000+08:00| 180| +|2017-11-04T00:00:00.000+08:00| 180| +|2017-11-05T00:00:00.000+08:00| 180| +|2017-11-06T00:00:00.000+08:00| 180| +|2017-11-07T00:00:00.000+08:00| 180| ++-----------------------------+-------------------------+ +Total line number = 7 +It costs 0.004s +``` + +#### 标签分组聚合 + +IoTDB 支持通过 `GROUP BY TAGS` 语句根据时间序列中定义的标签的键值做分组聚合查询。 + +我们先在 IoTDB 
中写入如下示例数据,稍后会以这些数据为例介绍标签聚合查询。 + +这些是某工厂 `factory1` 在多个城市的多个车间的设备温度数据, 时间范围为 [1000, 10000)。 + +时间序列路径中的设备一级是设备唯一标识。城市信息 `city` 和车间信息 `workshop` 则被建模在该设备时间序列的标签中。 +其中,设备 `d1`、`d2` 在 `Beijing` 的 `w1` 车间, `d3`、`d4` 在 `Beijing` 的 `w2` 车间,`d5`、`d6` 在 `Shanghai` 的 `w1` 车间,`d7` 在 `Shanghai` 的 `w2` 车间。 +`d8` 和 `d9` 设备目前处于调试阶段,还未被分配到具体的城市和车间,所以其相应的标签值为空值。 + +```SQL +create database root.factory1; +create timeseries root.factory1.d1.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1); +create timeseries root.factory1.d2.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1); +create timeseries root.factory1.d3.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2); +create timeseries root.factory1.d4.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2); +create timeseries root.factory1.d5.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1); +create timeseries root.factory1.d6.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1); +create timeseries root.factory1.d7.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w2); +create timeseries root.factory1.d8.temperature with datatype=FLOAT; +create timeseries root.factory1.d9.temperature with datatype=FLOAT; + +insert into root.factory1.d1(time, temperature) values(1000, 104.0); +insert into root.factory1.d1(time, temperature) values(3000, 104.2); +insert into root.factory1.d1(time, temperature) values(5000, 103.3); +insert into root.factory1.d1(time, temperature) values(7000, 104.1); + +insert into root.factory1.d2(time, temperature) values(1000, 104.4); +insert into root.factory1.d2(time, temperature) values(3000, 103.7); +insert into root.factory1.d2(time, temperature) values(5000, 103.3); +insert into root.factory1.d2(time, temperature) values(7000, 102.9); + +insert into root.factory1.d3(time, temperature) values(1000, 103.9); +insert into root.factory1.d3(time, temperature) values(3000, 103.8); +insert into root.factory1.d3(time, temperature) values(5000, 102.7); +insert into root.factory1.d3(time, temperature) values(7000, 106.9); + +insert into root.factory1.d4(time, temperature) values(1000, 103.9); +insert into root.factory1.d4(time, temperature) values(5000, 102.7); +insert into root.factory1.d4(time, temperature) values(7000, 106.9); + +insert into root.factory1.d5(time, temperature) values(1000, 112.9); +insert into root.factory1.d5(time, temperature) values(7000, 113.0); + +insert into root.factory1.d6(time, temperature) values(1000, 113.9); +insert into root.factory1.d6(time, temperature) values(3000, 113.3); +insert into root.factory1.d6(time, temperature) values(5000, 112.7); +insert into root.factory1.d6(time, temperature) values(7000, 112.3); + +insert into root.factory1.d7(time, temperature) values(1000, 101.2); +insert into root.factory1.d7(time, temperature) values(3000, 99.3); +insert into root.factory1.d7(time, temperature) values(5000, 100.1); +insert into root.factory1.d7(time, temperature) values(7000, 99.8); + +insert into root.factory1.d8(time, temperature) values(1000, 50.0); +insert into root.factory1.d8(time, temperature) values(3000, 52.1); +insert into root.factory1.d8(time, temperature) values(5000, 50.1); +insert into root.factory1.d8(time, temperature) values(7000, 50.5); + +insert into root.factory1.d9(time, temperature) values(1000, 50.3); +insert into root.factory1.d9(time, temperature) values(3000, 52.1); +``` + +##### 单标签聚合查询 + +用户想统计该工厂每个地区的设备的温度的平均值,可以使用如下查询语句 + +```SQL +SELECT AVG(temperature) FROM root.factory1.** GROUP BY TAGS(city); 
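+-- 补充示意(假设性写法,基于上面的示例数据,输出未在本文中展示):
+-- GROUP BY TAGS 同样可以同时使用多个内置聚合函数,例如在按 city 分组的同时计算平均温度与最高温度:
+-- SELECT AVG(temperature), MAX_VALUE(temperature) FROM root.factory1.** GROUP BY TAGS(city);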
+``` + +该查询会将具有同一个 `city` 标签值的时间序列的所有满足查询条件的点做平均值计算,计算结果如下 + +``` ++--------+------------------+ +| city| avg(temperature)| ++--------+------------------+ +| Beijing|104.04666697184244| +|Shanghai|107.85000076293946| +| NULL| 50.84999910990397| ++--------+------------------+ +Total line number = 3 +It costs 0.231s +``` + +从结果集中可以看到,和分段聚合、按层次分组聚合相比,标签聚合的查询结果的不同点是: +1. 标签聚合查询的聚合结果不会再做去星号展开,而是将多个时间序列的数据作为一个整体进行聚合计算。 +2. 标签聚合查询除了输出聚合结果列,还会输出聚合标签的键值列。该列的列名为聚合指定的标签键,列的值则为所有查询的时间序列中出现的该标签的值。 +如果某些时间序列未设置该标签,则在键值列中有一行单独的 `NULL` ,代表未设置标签的所有时间序列数据的聚合结果。 + +##### 多标签分组聚合查询 + +除了基本的单标签聚合查询外,还可以按顺序指定多个标签进行聚合计算。 + +例如,用户想统计每个城市的每个车间内设备的平均温度。但因为各个城市的车间名称有可能相同,所以不能直接按照 `workshop` 做标签聚合。必须要先按照城市,再按照车间处理。 + +SQL 语句如下 + +```SQL +SELECT avg(temperature) FROM root.factory1.** GROUP BY TAGS(city, workshop); +``` + +查询结果如下 + +``` ++--------+--------+------------------+ +| city|workshop| avg(temperature)| ++--------+--------+------------------+ +| NULL| NULL| 50.84999910990397| +|Shanghai| w1|113.01666768391927| +| Beijing| w2| 104.4000004359654| +|Shanghai| w2|100.10000038146973| +| Beijing| w1|103.73750019073486| ++--------+--------+------------------+ +Total line number = 5 +It costs 0.027s +``` + +从结果集中可以看到,和单标签聚合相比,多标签聚合的查询结果会根据指定的标签顺序,输出相应标签的键值列。 + +##### 基于时间区间的标签聚合查询 + +按照时间区间聚合是时序数据库中最常用的查询需求之一。IoTDB 在基于时间区间的聚合基础上,支持进一步按照标签进行聚合查询。 + +例如,用户想统计时间 `[1000, 10000)` 范围内,每个城市每个车间中的设备每 5 秒内的平均温度。 + +SQL 语句如下 + +```SQL +SELECT AVG(temperature) FROM root.factory1.** GROUP BY ([1000, 10000), 5s), TAGS(city, workshop); +``` + +查询结果如下 + +``` ++-----------------------------+--------+--------+------------------+ +| Time| city|workshop| avg(temperature)| ++-----------------------------+--------+--------+------------------+ +|1970-01-01T08:00:01.000+08:00| NULL| NULL| 50.91999893188476| +|1970-01-01T08:00:01.000+08:00|Shanghai| w1|113.20000076293945| +|1970-01-01T08:00:01.000+08:00| Beijing| w2| 103.4| +|1970-01-01T08:00:01.000+08:00|Shanghai| w2| 100.1999994913737| +|1970-01-01T08:00:01.000+08:00| Beijing| w1|103.81666692097981| +|1970-01-01T08:00:06.000+08:00| NULL| NULL| 50.5| +|1970-01-01T08:00:06.000+08:00|Shanghai| w1| 112.6500015258789| +|1970-01-01T08:00:06.000+08:00| Beijing| w2| 106.9000015258789| +|1970-01-01T08:00:06.000+08:00|Shanghai| w2| 99.80000305175781| +|1970-01-01T08:00:06.000+08:00| Beijing| w1| 103.5| ++-----------------------------+--------+--------+------------------+ +``` + +和标签聚合相比,基于时间区间的标签聚合的查询会首先按照时间区间划定聚合范围,在时间区间内部再根据指定的标签顺序,进行相应数据的聚合计算。在输出的结果集中,会包含一列时间列,该时间列值的含义和时间区间聚合查询的相同。 + +##### 标签分组聚合的限制 + +由于标签聚合功能仍然处于开发阶段,目前有如下未实现功能。 + +> 1. 暂不支持 `HAVING` 子句过滤查询结果。 +> 2. 暂不支持结果按照标签值排序。 +> 3. 暂不支持 `LIMIT`,`OFFSET`,`SLIMIT`,`SOFFSET`。 +> 4. 暂不支持 `ALIGN BY DEVICE`。 +> 5. 暂不支持聚合函数内部包含表达式,例如 `count(s+1)`。 +> 6. 不支持值过滤条件聚合,和分层聚合查询行为保持一致。 + +## 聚合结果过滤(HAVING 子句) + +如果想对聚合查询的结果进行过滤,可以在 `GROUP BY` 子句之后使用 `HAVING` 子句。 + +**注意:** + +1. `HAVING`子句中的过滤条件必须由聚合值构成,原始序列不能单独出现。 + + 下列使用方式是不正确的: + ```sql + select count(s1) from root.** group by ([1,3),1ms) having sum(s1) > s1 + select count(s1) from root.** group by ([1,3),1ms) having s1 > 1 + ``` + +2. 
对`GROUP BY LEVEL`结果进行过滤时,`SELECT`和`HAVING`中出现的PATH只能有一级。 + + 下列使用方式是不正确的: + ```sql + select count(s1) from root.** group by ([1,3),1ms), level=1 having sum(d1.s1) > 1 + select count(d1.s1) from root.** group by ([1,3),1ms), level=1 having sum(s1) > 1 + ``` + +**SQL 示例:** + +- **示例 1:** + + 对于以下聚合结果进行过滤: + + ``` + +-----------------------------+---------------------+---------------------+ + | Time|count(root.test.*.s1)|count(root.test.*.s2)| + +-----------------------------+---------------------+---------------------+ + |1970-01-01T08:00:00.001+08:00| 4| 4| + |1970-01-01T08:00:00.003+08:00| 1| 0| + |1970-01-01T08:00:00.005+08:00| 2| 4| + |1970-01-01T08:00:00.007+08:00| 3| 2| + |1970-01-01T08:00:00.009+08:00| 4| 4| + +-----------------------------+---------------------+---------------------+ + ``` + + ```sql + select count(s1) from root.** group by ([1,11),2ms), level=1 having count(s2) > 2; + ``` + + 执行结果如下: + + ``` + +-----------------------------+---------------------+ + | Time|count(root.test.*.s1)| + +-----------------------------+---------------------+ + |1970-01-01T08:00:00.001+08:00| 4| + |1970-01-01T08:00:00.005+08:00| 2| + |1970-01-01T08:00:00.009+08:00| 4| + +-----------------------------+---------------------+ + ``` + +- **示例 2:** + + 对于以下聚合结果进行过滤: + ``` + +-----------------------------+-------------+---------+---------+ + | Time| Device|count(s1)|count(s2)| + +-----------------------------+-------------+---------+---------+ + |1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| + |1970-01-01T08:00:00.003+08:00|root.test.sg1| 1| 0| + |1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| + |1970-01-01T08:00:00.007+08:00|root.test.sg1| 2| 1| + |1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| + |1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| + |1970-01-01T08:00:00.003+08:00|root.test.sg2| 0| 0| + |1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| + |1970-01-01T08:00:00.007+08:00|root.test.sg2| 1| 1| + |1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| + +-----------------------------+-------------+---------+---------+ + ``` + + ```sql + select count(s1), count(s2) from root.** group by ([1,11),2ms) having count(s2) > 1 align by device; + ``` + + 执行结果如下: + + ``` + +-----------------------------+-------------+---------+---------+ + | Time| Device|count(s1)|count(s2)| + +-----------------------------+-------------+---------+---------+ + |1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| + |1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| + |1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| + |1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| + |1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| + |1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| + +-----------------------------+-------------+---------+---------+ + ``` + + +## 结果集补空值(FILL 子句) + +### 功能介绍 + +当执行一些数据查询时,结果集的某行某列可能没有数据,则此位置结果为空,但这种空值不利于进行数据可视化展示和分析,需要对空值进行填充。 + +在 IoTDB 中,用户可以使用 `FILL` 子句指定数据缺失情况下的填充模式,允许用户按照特定的方法对任何查询的结果集填充空值,如取前一个不为空的值、线性插值等。 + +### 语法定义 + +**`FILL` 子句的语法定义如下:** + +```sql +FILL '(' PREVIOUS | LINEAR | constant ')' +``` + +**注意:** +- 在 `Fill` 语句中只能指定一种填充方法,该方法作用于结果集的全部列。 +- 空值填充不兼容 0.13 版本及以前的语法(即不支持 `FILL(([(, , )?])+)`) + +### 填充方式 + +**IoTDB 目前支持以下三种空值填充方式:** + +- `PREVIOUS` 填充:使用该列前一个非空值进行填充。 +- `LINEAR` 填充:使用该列前一个非空值和下一个非空值的线性插值进行填充。 +- 常量填充:使用指定常量填充。 + +**各数据类型支持的填充方法如下表所示:** + +| 数据类型 | 支持的填充方法 | +| :------- |:------------------------| +| BOOLEAN | `PREVIOUS`、常量 | +| INT32 | `PREVIOUS`、`LINEAR`、常量 | +| INT64 | `PREVIOUS`、`LINEAR`、常量 | +| FLOAT | `PREVIOUS`、`LINEAR`、常量 | +| DOUBLE | 
`PREVIOUS`、`LINEAR`、常量 | +| TEXT | `PREVIOUS`、常量 | + +**注意:** 对于数据类型不支持指定填充方法的列,既不会填充它,也不会报错,只是让那一列保持原样。 + +**下面通过举例进一步说明。** + +如果我们不使用任何填充方式: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000; +``` + +查询结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| null| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +#### `PREVIOUS` 填充 + +**对于查询结果集中的空值,使用该列前一个非空值进行填充。** + +**注意:** 如果结果集的某一列第一个值就为空,则不会填充该值,直到遇到该列第一个非空值为止。 + +例如,使用 `PREVIOUS` 填充,SQL 语句如下: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous); +``` + +`PREVIOUS` 填充后的结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 21.93| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| false| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +**在前值填充时,能够支持指定一个时间间隔,如果当前null值的时间戳与前一个非null值的时间戳的间隔,超过指定的时间间隔,则不进行填充。** + +> 1. 在线性填充和常量填充的情况下,如果指定了第二个参数,会抛出异常 +> 2. 
时间超时参数仅支持整数 + 例如,原始数据如下所示: + +```sql +select s1 from root.db.d1 +``` +``` ++-----------------------------+-------------+ +| Time|root.db.d1.s1| ++-----------------------------+-------------+ +|2023-11-08T16:41:50.008+08:00| 1.0| ++-----------------------------+-------------+ +|2023-11-08T16:46:50.011+08:00| 2.0| ++-----------------------------+-------------+ +|2023-11-08T16:48:50.011+08:00| 3.0| ++-----------------------------+-------------+ +``` + +根据时间分组,每1分钟求一个平均值 + +```sql +select avg(s1) + from root.db.d1 + group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| null| ++-----------------------------+------------------+ +``` + +根据时间分组并用前值填充 + +```sql +select avg(s1) + from root.db.d1 + group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) + FILL(PREVIOUS); +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| 3.0| ++-----------------------------+------------------+ +``` + +根据时间分组并用前值填充,并指定超过2分钟的就不填充 + +```sql +select avg(s1) +from root.db.d1 +group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) + FILL(PREVIOUS, 2m); +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| 1.0| ++-----------------------------+------------------+ 
+|2023-11-08T16:44:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| 3.0| ++-----------------------------+------------------+ +``` + + +#### `LINEAR` 填充 + +**对于查询结果集中的空值,使用该列前一个非空值和下一个非空值的线性插值进行填充。** + +**注意:** +- 如果某个值之前的所有值都为空,或者某个值之后的所有值都为空,则不会填充该值。 +- 如果某列的数据类型为boolean/text,我们既不会填充它,也不会报错,只是让那一列保持原样。 + +例如,使用 `LINEAR` 填充,SQL 语句如下: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(linear); +``` + +`LINEAR` 填充后的结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 22.08| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +#### 常量填充 + +**对于查询结果集中的空值,使用指定常量填充。** + +**注意:** +- 如果某列数据类型与常量类型不兼容,既不填充该列,也不报错,将该列保持原样。对于常量兼容的数据类型,如下表所示: + + | 常量类型 | 能够填充的序列数据类型 | + |:------ |:------------------ | + | `BOOLEAN` | `BOOLEAN` `TEXT` | + | `INT64` | `INT32` `INT64` `FLOAT` `DOUBLE` `TEXT` | + | `DOUBLE` | `FLOAT` `DOUBLE` `TEXT` | + | `TEXT` | `TEXT` | +- 当常量值大于 `INT32` 所能表示的最大值时,对于 `INT32` 类型的列,既不填充该列,也不报错,将该列保持原样。 + +例如,使用 `FLOAT` 类型的常量填充,SQL 语句如下: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(2.0); +``` + +`FLOAT` 类型的常量填充后的结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 2.0| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +再比如,使用 `BOOLEAN` 类型的常量填充,SQL 语句如下: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(true); +``` + +`BOOLEAN` 类型的常量填充后的结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| 
Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| null| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| true| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + + +## 查询结果分页(LIMIT/SLIMIT 子句) + +当查询结果集数据量很大,放在一个页面不利于显示,可以使用 `LIMIT/SLIMIT` 子句和 `OFFSET/SOFFSET `子句进行分页控制。 + +- `LIMIT` 和 `SLIMIT` 子句用于控制查询结果的行数和列数。 +- `OFFSET` 和 `SOFFSET` 子句用于控制结果显示的起始位置。 + +### 按行分页 + +用户可以通过 `LIMIT` 和 `OFFSET` 子句控制查询结果的行数,`LIMIT rowLimit` 指定查询结果的行数,`OFFSET rowOffset` 指定查询结果显示的起始行位置。 + +注意: +- 当 `rowOffset` 超过结果集的大小时,返回空结果集。 +- 当 `rowLimit` 超过结果集的大小时,返回所有查询结果。 +- 当 `rowLimit` 和 `rowOffset` 不是正整数,或超过 `INT64` 允许的最大值时,系统将提示错误。 + +我们将通过以下示例演示如何使用 `LIMIT` 和 `OFFSET` 子句。 + +- **示例 1:** 基本的 `LIMIT` 子句 + +SQL 语句: + +```sql +select status, temperature from root.ln.wf01.wt01 limit 10 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 选择的时间序列是“状态”和“温度”。 SQL 语句要求返回查询结果的前 10 行。 + +结果如下所示: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:00:00.000+08:00| true| 25.96| +|2017-11-01T00:01:00.000+08:00| true| 24.36| +|2017-11-01T00:02:00.000+08:00| false| 20.09| +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 10 +It costs 0.000s +``` + +- **示例 2:** 带 `OFFSET` 的 `LIMIT` 子句 + +SQL 语句: + +```sql +select status, temperature from root.ln.wf01.wt01 limit 5 offset 3 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 选择的时间序列是“状态”和“温度”。 SQL 语句要求返回查询结果的第 3 至 7 行(第一行编号为 0 行)。 + +结果如下所示: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 5 +It costs 0.342s +``` + +- **示例 3:** `LIMIT` 子句与 `WHERE` 子句结合 + +SQL 语句: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2024-07-07T00:05:00.000 and time< 2024-07-12T00:12:00.000 limit 5 offset 3 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 选择的时间序列是“状态”和“温度”。 SQL 语句要求返回时间“ 2024-07-07T00:05:00.000”和“ 2024-07-12T00:12:00.000”之间的状态和温度传感器值的第 3 至 7 行(第一行编号为第 0 行)。 + +结果如下所示: + +``` 
++-----------------------------+------------------------+-----------------------------+
+|                         Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature|
++-----------------------------+------------------------+-----------------------------+
+|2024-07-09T17:32:11.943+08:00|                    true|                    24.941973|
+|2024-07-09T17:32:12.944+08:00|                    true|                     20.05108|
+|2024-07-09T17:32:13.945+08:00|                    true|                    20.541632|
+|2024-07-09T17:32:14.945+08:00|                    null|                     23.09016|
+|2024-07-09T17:32:14.946+08:00|                    true|                         null|
++-----------------------------+------------------------+-----------------------------+
+Total line number = 5
+It costs 0.070s
+```
+
+- **示例 4:** `LIMIT` 子句与 `GROUP BY` 子句组合
+
+SQL 语句:
+
+```sql
+select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) limit 4 offset 3
+```
+
+含义:
+
+SQL 语句子句要求返回查询结果的第 3 至 6 行(第一行编号为 0 行)。
+
+结果如下所示:
+
+```
++-----------------------------+-------------------------------+----------------------------------------+
+|                         Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)|
++-----------------------------+-------------------------------+----------------------------------------+
+|2017-11-04T00:00:00.000+08:00|                           1440|                                    26.0|
+|2017-11-05T00:00:00.000+08:00|                           1440|                                    26.0|
+|2017-11-06T00:00:00.000+08:00|                           1440|                                   25.99|
+|2017-11-07T00:00:00.000+08:00|                           1380|                                    26.0|
++-----------------------------+-------------------------------+----------------------------------------+
+Total line number = 4
+It costs 0.016s
+```
+
+### 按列分页
+
+用户可以通过 `SLIMIT` 和 `SOFFSET` 子句控制查询结果的列数,`SLIMIT seriesLimit` 指定查询结果的列数,`SOFFSET seriesOffset` 指定查询结果显示的起始列位置。
+
+注意:
+- 仅用于控制值列,对时间列和设备列无效。
+- 当 `seriesOffset` 超过结果集的大小时,返回空结果集。
+- 当 `seriesLimit` 超过结果集的大小时,返回所有查询结果。
+- 当 `seriesLimit` 和 `seriesOffset` 不是正整数,或超过 `INT64` 允许的最大值时,系统将提示错误。
+
+我们将通过以下示例演示如何使用 `SLIMIT` 和 `SOFFSET` 子句。
+
+- **示例 1:** 基本的 `SLIMIT` 子句
+
+SQL 语句:
+
+```sql
+select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1
+```
+
+含义:
+
+所选设备为 ln 组 wf01 工厂 wt01 设备; 所选时间序列是该设备下的第二列,即温度。 SQL 语句要求在"2017-11-01T00:05:00.000"和"2017-11-01T00:12:00.000"的时间点之间选择温度传感器值。
+
+结果如下所示:
+
+```
++-----------------------------+-----------------------------+
+|                         Time|root.ln.wf01.wt01.temperature|
++-----------------------------+-----------------------------+
+|2017-11-01T00:06:00.000+08:00|                        20.71|
+|2017-11-01T00:07:00.000+08:00|                        21.45|
+|2017-11-01T00:08:00.000+08:00|                        22.58|
+|2017-11-01T00:09:00.000+08:00|                        20.98|
+|2017-11-01T00:10:00.000+08:00|                        25.52|
+|2017-11-01T00:11:00.000+08:00|                        22.91|
++-----------------------------+-----------------------------+
+Total line number = 6
+It costs 0.000s
+```
+
+- **示例 2:** 带 `SOFFSET` 的 `SLIMIT` 子句
+
+SQL 语句:
+
+```sql
+select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 1
+```
+
+含义:
+
+所选设备为 ln 组 wf01 工厂 wt01 设备; 所选时间序列是该设备下的第一列,即电源状态。 SQL 语句要求在"2017-11-01T00:05:00.000"和"2017-11-01T00:12:00.000"的时间点之间选择状态传感器值。
+
+结果如下所示:
+
+```
++-----------------------------+------------------------+
+|                         Time|root.ln.wf01.wt01.status|
++-----------------------------+------------------------+
+|2017-11-01T00:06:00.000+08:00|                   false|
+|2017-11-01T00:07:00.000+08:00|                   false|
+|2017-11-01T00:08:00.000+08:00|                   false|
+|2017-11-01T00:09:00.000+08:00|                   false|
+|2017-11-01T00:10:00.000+08:00|                    true|
+|2017-11-01T00:11:00.000+08:00|                   false|
++-----------------------------+------------------------+
+Total line number = 6
+It costs 0.003s
+```
+
+- **示例 
3:** `SLIMIT` 子句与 `GROUP BY` 子句结合 + +SQL 语句: + +```sql +select max_value(*) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) slimit 1 soffset 1 +``` + +含义: + +``` ++-----------------------------+-----------------------------------+ +| Time|max_value(root.ln.wf01.wt01.status)| ++-----------------------------+-----------------------------------+ +|2017-11-01T00:00:00.000+08:00| true| +|2017-11-02T00:00:00.000+08:00| true| +|2017-11-03T00:00:00.000+08:00| true| +|2017-11-04T00:00:00.000+08:00| true| +|2017-11-05T00:00:00.000+08:00| true| +|2017-11-06T00:00:00.000+08:00| true| +|2017-11-07T00:00:00.000+08:00| true| ++-----------------------------+-----------------------------------+ +Total line number = 7 +It costs 0.000s +``` + +- **示例 4:** `SLIMIT` 子句与 `LIMIT` 子句结合 + +SQL 语句: + +```sql +select * from root.ln.wf01.wt01 limit 10 offset 100 slimit 2 soffset 0 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 所选时间序列是此设备下的第 0 列至第 1 列(第一列编号为第 0 列)。 SQL 语句子句要求返回查询结果的第 100 至 109 行(第一行编号为 0 行)。 + +结果如下所示: + +``` ++-----------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+-----------------------------+------------------------+ +|2017-11-01T01:40:00.000+08:00| 21.19| false| +|2017-11-01T01:41:00.000+08:00| 22.79| false| +|2017-11-01T01:42:00.000+08:00| 22.98| false| +|2017-11-01T01:43:00.000+08:00| 21.52| false| +|2017-11-01T01:44:00.000+08:00| 23.45| true| +|2017-11-01T01:45:00.000+08:00| 24.06| true| +|2017-11-01T01:46:00.000+08:00| 22.6| false| +|2017-11-01T01:47:00.000+08:00| 23.78| true| +|2017-11-01T01:48:00.000+08:00| 24.72| true| +|2017-11-01T01:49:00.000+08:00| 24.68| true| ++-----------------------------+-----------------------------+------------------------+ +Total line number = 10 +It costs 0.009s +``` + +## 结果集排序(ORDER BY 子句) + +### 时间对齐模式下的排序 +IoTDB的查询结果集默认按照时间对齐,可以使用`ORDER BY TIME`的子句指定时间戳的排列顺序。示例代码如下: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time desc; +``` +执行结果: + +``` ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +|2017-11-01T00:01:00.000+08:00| v2| true| 24.36| true| +|2017-11-01T00:00:00.000+08:00| v2| true| 25.96| true| +|1970-01-01T08:00:00.002+08:00| v2| false| null| null| +|1970-01-01T08:00:00.001+08:00| v1| true| null| null| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +``` +### 设备对齐模式下的排序 +当使用`ALIGN BY DEVICE`查询对齐模式下的结果集时,可以使用`ORDER BY`子句对返回的结果集顺序进行规定。 + +在设备对齐模式下支持4种排序模式的子句,其中包括两种排序键,`DEVICE`和`TIME`,靠前的排序键为主排序键,每种排序键都支持`ASC`和`DESC`两种排列顺序。 +1. ``ORDER BY DEVICE``: 按照设备名的字典序进行排序,排序方式为字典序排序,在这种情况下,相同名的设备会以组的形式进行展示。 + +2. ``ORDER BY TIME``: 按照时间戳进行排序,此时不同的设备对应的数据点会按照时间戳的优先级被打乱排序。 + +3. ``ORDER BY DEVICE,TIME``: 按照设备名的字典序进行排序,设备名相同的数据点会通过时间戳进行排序。 + +4. 
``ORDER BY TIME,DEVICE``: 按照时间戳进行排序,时间戳相同的数据点会通过设备名的字典序进行排序。 + +> 为了保证结果的可观性,当不使用`ORDER BY`子句,仅使用`ALIGN BY DEVICE`时,会为设备视图提供默认的排序方式。其中默认的排序视图为``ORDER BY DEVCE,TIME``,默认的排序顺序为`ASC`, +> 即结果集默认先按照设备名升序排列,在相同设备名内再按照时间戳升序排序。 + + +当主排序键为`DEVICE`时,结果集的格式与默认情况类似:先按照设备名对结果进行排列,在相同的设备名下内按照时间戳进行排序。示例代码如下: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by device desc,time asc align by device; +``` +执行结果: + +``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| ++-----------------------------+-----------------+--------+------+-----------+ +``` +主排序键为`Time`时,结果集会先按照时间戳进行排序,在时间戳相等时按照设备名排序。 +示例代码如下: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time asc,device desc align by device; +``` +执行结果: +``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| ++-----------------------------+-----------------+--------+------+-----------+ +``` +当没有显式指定时,主排序键默认为`Device`,排序顺序默认为`ASC`,示例代码如下: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` +结果如图所示,可以看出,`ORDER BY DEVICE ASC,TIME ASC`就是默认情况下的排序方式,由于`ASC`是默认排序顺序,此处可以省略。 +``` ++-----------------------------+-----------------+--------+------+-----------+ +| Time| Device|hardware|status|temperature| ++-----------------------------+-----------------+--------+------+-----------+ +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| null| true| 25.96| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| true| 24.36| +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| v1| true| null| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| ++-----------------------------+-----------------+--------+------+-----------+ +``` +同样,可以在聚合查询中使用`ALIGN BY DEVICE`和`ORDER BY`子句,对聚合后的结果进行排序,示例代码如下所示: +```sql +select count(*) from root.ln.** group by ((2017-11-01T00:00:00.000+08:00,2017-11-01T00:03:00.000+08:00],1m) order by device asc,time asc align by device +``` +执行结果: +``` ++-----------------------------+-----------------+---------------+-------------+------------------+ +| Time| Device|count(hardware)|count(status)|count(temperature)| ++-----------------------------+-----------------+---------------+-------------+------------------+ +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| 1| 1| +|2017-11-01T00:02:00.000+08:00|root.ln.wf01.wt01| 
null| 0| 0| +|2017-11-01T00:03:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| 1| 1| null| +|2017-11-01T00:02:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| +|2017-11-01T00:03:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| ++-----------------------------+-----------------+---------------+-------------+------------------+ +``` + +### 任意表达式排序 +除了IoTDB中规定的Time,Device关键字外,还可以通过`ORDER BY`子句对指定时间序列中任意列的表达式进行排序。 + +排序在通过`ASC`,`DESC`指定排序顺序的同时,可以通过`NULLS`语法来指定NULL值在排序中的优先级,`NULLS FIRST`默认NULL值在结果集的最上方,`NULLS LAST`则保证NULL值在结果集的最后。如果没有在子句中指定,则默认顺序为`ASC`,`NULLS LAST`。 + +对于如下的数据,将给出几个任意表达式的查询示例供参考: +``` ++-----------------------------+-------------+-------+-------+--------+-------+ +| Time| Device| base| score| bonus| total| ++-----------------------------+-------------+-------+-------+--------+-------+ +|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0| 107.0| +|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0| 105.0| +|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0| 103.0| +|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| +|1970-01-01T08:00:00.010+08:00| root.three| 9| null| 24.0| 33.0| +|1970-01-01T08:00:00.020+08:00| root.three| 8| null| 22.5| 30.5| +|1970-01-01T08:00:00.030+08:00| root.three| 7| null| 23.5| 30.5| +|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| +|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| +|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0| 104.0| +|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0| 102.0| ++-----------------------------+-------------+-------+-------+--------+-------+ +``` + +当需要根据基础分数score对结果进行排序时,可以直接使用 +```Sql +select score from root.** order by score desc align by device +``` +会得到如下结果 + +``` ++-----------------------------+---------+-----+ +| Time| Device|score| ++-----------------------------+---------+-----+ +|1970-01-01T08:00:00.040+08:00|root.five| 54.0| +|1970-01-01T08:00:00.030+08:00|root.five| 53.0| +|1970-01-01T08:00:00.000+08:00| root.one| 50.0| +|1970-01-02T08:00:00.000+08:00| root.one| 50.0| +|1970-01-03T08:00:00.000+08:00| root.one| 50.0| +|1970-01-01T08:00:00.000+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00| root.two| 10.0| ++-----------------------------+---------+-----+ +``` + +当想要根据总分对结果进行排序,可以在order by子句中使用表达式进行计算 +```Sql +select score,total from root.one order by base+score+bonus desc +``` +该sql等价于 +```Sql +select score,total from root.one order by total desc +``` +得到如下结果 + +``` ++-----------------------------+--------------+--------------+ +| Time|root.one.score|root.one.total| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.000+08:00| 50.0| 107.0| +|1970-01-02T08:00:00.000+08:00| 50.0| 105.0| +|1970-01-03T08:00:00.000+08:00| 50.0| 103.0| ++-----------------------------+--------------+--------------+ +``` +而如果要对总分进行排序,且分数相同时依次根据score, base, bonus和提交时间进行排序时,可以通过多个表达式来指定多层排序 + +```Sql +select base, score, bonus, total from root.** order by total desc NULLS Last, + score desc NULLS Last, + bonus desc NULLS Last, + time desc align by device +``` +得到如下结果 +``` ++-----------------------------+----------+----+-----+-----+-----+ +| Time| Device|base|score|bonus|total| ++-----------------------------+----------+----+-----+-----+-----+ +|1970-01-01T08:00:00.000+08:00| 
root.one| 12| 50.0| 45.0|107.0| +|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0|105.0| +|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0|104.0| +|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0|103.0| +|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0|102.0| +|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| +|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| +|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.000+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| +|1970-01-01T08:00:00.010+08:00|root.three| 9| null| 24.0| 33.0| +|1970-01-01T08:00:00.030+08:00|root.three| 7| null| 23.5| 30.5| +|1970-01-01T08:00:00.020+08:00|root.three| 8| null| 22.5| 30.5| ++-----------------------------+----------+----+-----+-----+-----+ +``` +在order by中同样可以使用聚合查询表达式 +```Sql +select min_value(total) from root.** order by min_value(total) asc align by device +``` +得到如下结果 +``` ++----------+----------------+ +| Device|min_value(total)| ++----------+----------------+ +|root.three| 30.5| +| root.two| 33.0| +| root.four| 85.0| +| root.five| 102.0| +| root.one| 103.0| ++----------+----------------+ +``` +当在查询中指定多列,未被排序的列会随着行和排序列一起改变顺序,当排序列相同时行的顺序和具体实现有关(没有固定顺序) +```Sql +select min_value(total),max_value(base) from root.** order by max_value(total) desc align by device +``` +得到结果如下 +· +``` ++----------+----------------+---------------+ +| Device|min_value(total)|max_value(base)| ++----------+----------------+---------------+ +| root.one| 103.0| 12| +| root.five| 102.0| 7| +| root.four| 85.0| 9| +| root.two| 33.0| 9| +|root.three| 30.5| 9| ++----------+----------------+---------------+ +``` + +Order by device, time可以和order by expression共同使用 +```Sql +select score from root.** order by device asc, score desc, time asc align by device +``` +会得到如下结果 +``` ++-----------------------------+---------+-----+ +| Time| Device|score| ++-----------------------------+---------+-----+ +|1970-01-01T08:00:00.040+08:00|root.five| 54.0| +|1970-01-01T08:00:00.030+08:00|root.five| 53.0| +|1970-01-01T08:00:00.010+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00|root.four| 32.0| +|1970-01-01T08:00:00.000+08:00| root.one| 50.0| +|1970-01-02T08:00:00.000+08:00| root.one| 50.0| +|1970-01-03T08:00:00.000+08:00| root.one| 50.0| +|1970-01-01T08:00:00.000+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00| root.two| 50.0| +|1970-01-01T08:00:00.020+08:00| root.two| 10.0| ++-----------------------------+---------+-----+ +``` + +## 查询对齐模式(ALIGN BY DEVICE 子句) + +在 IoTDB 中,查询结果集**默认按照时间对齐**,包含一列时间列和若干个值列,每一行数据各列的时间戳相同。 + +除按照时间对齐外,还支持以下对齐模式: + +- 按设备对齐 `ALIGN BY DEVICE` + +### 按设备对齐 + +在按设备对齐模式下,设备名会单独作为一列出现,查询结果集包含一列时间列、一列设备列和若干个值列。如果 `SELECT` 子句中选择了 `N` 列,则结果集包含 `N + 2` 列(时间列和设备名字列)。 + +在默认情况下,结果集按照 `Device` 进行排列,在每个 `Device` 内按照 `Time` 列升序排序。 + +当查询多个设备时,要求设备之间同名的列数据类型相同。 + +为便于理解,可以按照关系模型进行对应。设备可以视为关系模型中的表,选择的列可以视为表中的列,`Time + Device` 看做其主键。 + +**示例:** + +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` + +执行如下: + +``` ++-----------------------------+-----------------+-----------+------+--------+ +| Time| Device|temperature|status|hardware| ++-----------------------------+-----------------+-----------+------+--------+ +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| 25.96| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| 24.36| true| null| +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| null| true| v1| 
+|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| null| false| v2| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| null| true| v2| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| null| true| v2| ++-----------------------------+-----------------+-----------+------+--------+ +Total line number = 6 +It costs 0.012s +``` +### 设备对齐模式下的排序 +在设备对齐模式下,默认按照设备名的字典序升序排列,每个设备内部按照时间戳大小升序排列,可以通过 `ORDER BY` 子句调整设备列和时间列的排序优先级。 + +详细说明及示例见文档 [结果集排序](../SQL-Manual/Operator-and-Expression.md)。 + +## 查询写回(INTO 子句) + +`SELECT INTO` 语句用于将查询结果写入一系列指定的时间序列中。 + +应用场景如下: +- **实现 IoTDB 内部 ETL**:对原始数据进行 ETL 处理后写入新序列。 +- **查询结果存储**:将查询结果进行持久化存储,起到类似物化视图的作用。 +- **非对齐序列转对齐序列**:对齐序列从0.13版本开始支持,可以通过该功能将非对齐序列的数据写入新的对齐序列中。 + +### 语法定义 + +#### 整体描述 + +```sql +selectIntoStatement + : SELECT + resultColumn [, resultColumn] ... + INTO intoItem [, intoItem] ... + FROM prefixPath [, prefixPath] ... + [WHERE whereCondition] + [GROUP BY groupByTimeClause, groupByLevelClause] + [FILL {PREVIOUS | LINEAR | constant}] + [LIMIT rowLimit OFFSET rowOffset] + [ALIGN BY DEVICE] + ; + +intoItem + : [ALIGNED] intoDevicePath '(' intoMeasurementName [',' intoMeasurementName]* ')' + ; +``` + +#### `INTO` 子句 + +`INTO` 子句由若干个 `intoItem` 构成。 + +每个 `intoItem` 由一个目标设备路径和一个包含若干目标物理量名的列表组成(与 `INSERT` 语句中的 `INTO` 子句写法类似)。 + +其中每个目标物理量名与目标设备路径组成一个目标序列,一个 `intoItem` 包含若干目标序列。例如:`root.sg_copy.d1(s1, s2)` 指定了两条目标序列 `root.sg_copy.d1.s1` 和 `root.sg_copy.d1.s2`。 + +`INTO` 子句指定的目标序列要能够与查询结果集的列一一对应。具体规则如下: + +- **按时间对齐**(默认):全部 `intoItem` 包含的目标序列数量要与查询结果集的列数(除时间列外)一致,且按照表头从左到右的顺序一一对应。 +- **按设备对齐**(使用 `ALIGN BY DEVICE`):全部 `intoItem` 中指定的目标设备数和查询的设备数(即 `FROM` 子句中路径模式匹配的设备数)一致,且按照结果集设备的输出顺序一一对应。 + 为每个目标设备指定的目标物理量数量要与查询结果集的列数(除时间和设备列外)一致,且按照表头从左到右的顺序一一对应。 + +下面通过示例进一步说明: + +- **示例 1**(按时间对齐) +```shell +IoTDB> select s1, s2 into root.sg_copy.d1(t1), root.sg_copy.d2(t1, t2), root.sg_copy.d1(t2) from root.sg.d1, root.sg.d2; ++--------------+-------------------+--------+ +| source column| target timeseries| written| ++--------------+-------------------+--------+ +| root.sg.d1.s1| root.sg_copy.d1.t1| 8000| ++--------------+-------------------+--------+ +| root.sg.d2.s1| root.sg_copy.d2.t1| 10000| ++--------------+-------------------+--------+ +| root.sg.d1.s2| root.sg_copy.d2.t2| 12000| ++--------------+-------------------+--------+ +| root.sg.d2.s2| root.sg_copy.d1.t2| 10000| ++--------------+-------------------+--------+ +Total line number = 4 +It costs 0.725s +``` + +该语句将 `root.sg` database 下四条序列的查询结果写入到 `root.sg_copy` database 下指定的四条序列中。注意,`root.sg_copy.d2(t1, t2)` 也可以写做 `root.sg_copy.d2(t1), root.sg_copy.d2(t2)`。 + +可以看到,`INTO` 子句的写法非常灵活,只要满足组合出的目标序列没有重复,且与查询结果列一一对应即可。 + +> `CLI` 展示的结果集中,各列的含义如下: +> - `source column` 列表示查询结果的列名。 +> - `target timeseries` 表示对应列写入的目标序列。 +> - `written` 表示预期写入的数据量。 + +- **示例 2**(按时间对齐) +```shell +IoTDB> select count(s1 + s2), last_value(s2) into root.agg.count(s1_add_s2), root.agg.last_value(s2) from root.sg.d1 group by ([0, 100), 10ms); ++--------------------------------------+-------------------------+--------+ +| source column| target timeseries| written| ++--------------------------------------+-------------------------+--------+ +| count(root.sg.d1.s1 + root.sg.d1.s2)| root.agg.count.s1_add_s2| 10| ++--------------------------------------+-------------------------+--------+ +| last_value(root.sg.d1.s2)| root.agg.last_value.s2| 10| ++--------------------------------------+-------------------------+--------+ +Total line number = 2 +It costs 0.375s +``` + +该语句将聚合查询的结果存储到指定序列中。 + +- **示例 3**(按设备对齐) +```shell +IoTDB> select s1, s2 
into root.sg_copy.d1(t1, t2), root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; ++--------------+--------------+-------------------+--------+ +| source device| source column| target timeseries| written| ++--------------+--------------+-------------------+--------+ +| root.sg.d1| s1| root.sg_copy.d1.t1| 8000| ++--------------+--------------+-------------------+--------+ +| root.sg.d1| s2| root.sg_copy.d1.t2| 11000| ++--------------+--------------+-------------------+--------+ +| root.sg.d2| s1| root.sg_copy.d2.t1| 12000| ++--------------+--------------+-------------------+--------+ +| root.sg.d2| s2| root.sg_copy.d2.t2| 9000| ++--------------+--------------+-------------------+--------+ +Total line number = 4 +It costs 0.625s +``` + +该语句同样是将 `root.sg` database 下四条序列的查询结果写入到 `root.sg_copy` database 下指定的四条序列中。但在按设备对齐中,`intoItem` 的数量必须和查询的设备数量一致,每个查询设备对应一个 `intoItem`。 + +> 按设备对齐查询时,`CLI` 展示的结果集多出一列 `source device` 列表示查询的设备。 + +- **示例 4**(按设备对齐) +```shell +IoTDB> select s1 + s2 into root.expr.add(d1s1_d1s2), root.expr.add(d2s1_d2s2) from root.sg.d1, root.sg.d2 align by device; ++--------------+--------------+------------------------+--------+ +| source device| source column| target timeseries| written| ++--------------+--------------+------------------------+--------+ +| root.sg.d1| s1 + s2| root.expr.add.d1s1_d1s2| 10000| ++--------------+--------------+------------------------+--------+ +| root.sg.d2| s1 + s2| root.expr.add.d2s1_d2s2| 10000| ++--------------+--------------+------------------------+--------+ +Total line number = 2 +It costs 0.532s +``` + +该语句将表达式计算的结果存储到指定序列中。 + +#### 使用变量占位符 + +特别地,可以使用变量占位符描述目标序列与查询序列之间的对应规律,简化语句书写。目前支持以下两种变量占位符: + +- 后缀复制符 `::`:复制查询设备后缀(或物理量),表示从该层开始一直到设备的最后一层(或物理量),目标设备的节点名(或物理量名)与查询的设备对应的节点名(或物理量名)相同。 +- 单层节点匹配符 `${i}`:表示目标序列当前层节点名与查询序列的第`i`层节点名相同。比如,对于路径`root.sg1.d1.s1`而言,`${1}`表示`sg1`,`${2}`表示`d1`,`${3}`表示`s1`。 + +在使用变量占位符时,`intoItem`与查询结果集列的对应关系不能存在歧义,具体情况分类讨论如下: + +##### 按时间对齐(默认) + +> 注:变量占位符**只能描述序列与序列之间的对应关系**,如果查询中包含聚合、表达式计算,此时查询结果中的列无法与某个序列对应,因此目标设备和目标物理量都不能使用变量占位符。 + +###### (1)目标设备不使用变量占位符 & 目标物理量列表使用变量占位符 + +**限制:** + 1. 每个 `intoItem` 中,物理量列表的长度必须为 1。
(如果长度可以大于1,例如 `root.sg1.d1(::, s1)`,无法确定具体哪些列与`::`匹配) + 2. `intoItem` 数量为 1,或与查询结果集列数一致。
(在每个目标物理量列表长度均为 1 的情况下,若 `intoItem` 只有 1 个,此时表示全部查询序列写入相同设备;若 `intoItem` 数量与查询序列一致,则表示为每个查询序列指定一个目标设备;若 `intoItem` 大于 1 小于查询序列数,此时无法与查询序列一一对应) + +**匹配方法:** 每个查询序列指定目标设备,而目标物理量根据变量占位符生成。 + +**示例:** + +```sql +select s1, s2 +into root.sg_copy.d1(::), root.sg_copy.d2(s1), root.sg_copy.d1(${3}), root.sg_copy.d2(::) +from root.sg.d1, root.sg.d2; +``` +该语句等价于: +```sql +select s1, s2 +into root.sg_copy.d1(s1), root.sg_copy.d2(s1), root.sg_copy.d1(s2), root.sg_copy.d2(s2) +from root.sg.d1, root.sg.d2; +``` +可以看到,在这种情况下,语句并不能得到很好地简化。 + +###### (2)目标设备使用变量占位符 & 目标物理量列表不使用变量占位符 + +**限制:** 全部 `intoItem` 中目标物理量的数量与查询结果集列数一致。 + +**匹配方式:** 为每个查询序列指定了目标物理量,目标设备根据对应目标物理量所在 `intoItem` 的目标设备占位符生成。 + +**示例:** +```sql +select d1.s1, d1.s2, d2.s3, d3.s4 +into ::(s1_1, s2_2), root.sg.d2_2(s3_3), root.${2}_copy.::(s4) +from root.sg; +``` + +###### (3)目标设备使用变量占位符 & 目标物理量列表使用变量占位符 + +**限制:** `intoItem` 只有一个且物理量列表的长度为 1。 + +**匹配方式:** 每个查询序列根据变量占位符可以得到一个目标序列。 + +**示例:** +```sql +select * into root.sg_bk.::(::) from root.sg.**; +``` +将 `root.sg` 下全部序列的查询结果写到 `root.sg_bk`,设备名后缀和物理量名保持不变。 + +##### 按设备对齐(使用 `ALIGN BY DEVICE`) + +> 注:变量占位符**只能描述序列与序列之间的对应关系**,如果查询中包含聚合、表达式计算,此时查询结果中的列无法与某个物理量对应,因此目标物理量不能使用变量占位符。 + +###### (1)目标设备不使用变量占位符 & 目标物理量列表使用变量占位符 + +**限制:** 每个 `intoItem` 中,如果物理量列表使用了变量占位符,则列表的长度必须为 1。 + +**匹配方法:** 每个查询序列指定目标设备,而目标物理量根据变量占位符生成。 + +**示例:** +```sql +select s1, s2, s3, s4 +into root.backup_sg.d1(s1, s2, s3, s4), root.backup_sg.d2(::), root.sg.d3(backup_${4}) +from root.sg.d1, root.sg.d2, root.sg.d3 +align by device; +``` + +###### (2)目标设备使用变量占位符 & 目标物理量列表不使用变量占位符 + +**限制:** `intoItem` 只有一个。(如果出现多个带占位符的 `intoItem`,我们将无法得知每个 `intoItem` 需要匹配哪几个源设备) + +**匹配方式:** 每个查询设备根据变量占位符得到一个目标设备,每个设备下结果集各列写入的目标物理量由目标物理量列表指定。 + +**示例:** +```sql +select avg(s1), sum(s2) + sum(s3), count(s4) +into root.agg_${2}.::(avg_s1, sum_s2_add_s3, count_s4) +from root.** +align by device; +``` + +###### (3)目标设备使用变量占位符 & 目标物理量列表使用变量占位符 + +**限制:** `intoItem` 只有一个且物理量列表的长度为 1。 + +**匹配方式:** 每个查询序列根据变量占位符可以得到一个目标序列。 + +**示例:** +```sql +select * into ::(backup_${4}) from root.sg.** align by device; +``` +将 `root.sg` 下每条序列的查询结果写到相同设备下,物理量名前加`backup_`。 + +#### 指定目标序列为对齐序列 + +通过 `ALIGNED` 关键词可以指定写入的目标设备为对齐写入,每个 `intoItem` 可以独立设置。 + +**示例:** +```sql +select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; +``` +该语句指定了 `root.sg_copy.d1` 是非对齐设备,`root.sg_copy.d2`是对齐设备。 + +#### 不支持使用的查询子句 + +- `SLIMIT`、`SOFFSET`:查询出来的列不确定,功能不清晰,因此不支持。 +- `LAST`查询、`GROUP BY TAGS`、`DISABLE ALIGN`:表结构和写入结构不一致,因此不支持。 + +#### 其他要注意的点 + +- 对于一般的聚合查询,时间戳是无意义的,约定使用 0 来存储。 +- 当目标序列存在时,需要保证源序列和目标时间序列的数据类型兼容。关于数据类型的兼容性,查看文档 [数据类型](../Basic-Concept/Data-Type.md#数据类型兼容性)。 +- 当目标序列不存在时,系统将自动创建目标序列(包括 database)。 +- 当查询的序列不存在或查询的序列不存在数据,则不会自动创建目标序列。 + +### 应用举例 + +#### 实现 IoTDB 内部 ETL +对原始数据进行 ETL 处理后写入新序列。 +```shell +IOTDB > SELECT preprocess_udf(s1, s2) INTO ::(preprocessed_s1, preprocessed_s2) FROM root.sg.* ALIGN BY DEIVCE; ++--------------+-------------------+---------------------------+--------+ +| source device| source column| target timeseries| written| ++--------------+-------------------+---------------------------+--------+ +| root.sg.d1| preprocess_udf(s1)| root.sg.d1.preprocessed_s1| 8000| ++--------------+-------------------+---------------------------+--------+ +| root.sg.d1| preprocess_udf(s2)| root.sg.d1.preprocessed_s2| 10000| ++--------------+-------------------+---------------------------+--------+ +| root.sg.d2| preprocess_udf(s1)| root.sg.d2.preprocessed_s1| 11000| 
++--------------+-------------------+---------------------------+--------+ +| root.sg.d2| preprocess_udf(s2)| root.sg.d2.preprocessed_s2| 9000| ++--------------+-------------------+---------------------------+--------+ +``` +以上语句使用自定义函数对数据进行预处理,将预处理后的结果持久化存储到新序列中。 + +#### 查询结果存储 +将查询结果进行持久化存储,起到类似物化视图的作用。 +```shell +IOTDB > SELECT count(s1), last_value(s1) INTO root.sg.agg_${2}(count_s1, last_value_s1) FROM root.sg1.d1 GROUP BY ([0, 10000), 10ms); ++--------------------------+-----------------------------+--------+ +| source column| target timeseries| written| ++--------------------------+-----------------------------+--------+ +| count(root.sg.d1.s1)| root.sg.agg_d1.count_s1| 1000| ++--------------------------+-----------------------------+--------+ +| last_value(root.sg.d1.s2)| root.sg.agg_d1.last_value_s2| 1000| ++--------------------------+-----------------------------+--------+ +Total line number = 2 +It costs 0.115s +``` +以上语句将降采样查询的结果持久化存储到新序列中。 + +#### 非对齐序列转对齐序列 +对齐序列从 0.13 版本开始支持,可以通过该功能将非对齐序列的数据写入新的对齐序列中。 + +**注意:** 建议配合使用 `LIMIT & OFFSET` 子句或 `WHERE` 子句(时间过滤条件)对数据进行分批,防止单次操作的数据量过大。 + +```shell +IOTDB > SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d WHERE time >= 0 and time < 10000; ++--------------------------+----------------------+--------+ +| source column| target timeseries| written| ++--------------------------+----------------------+--------+ +| root.sg1.non_aligned_d.s1| root.sg1.aligned_d.s1| 10000| ++--------------------------+----------------------+--------+ +| root.sg1.non_aligned_d.s2| root.sg1.aligned_d.s2| 10000| ++--------------------------+----------------------+--------+ +Total line number = 2 +It costs 0.375s +``` +以上语句将一组非对齐的序列的数据迁移到一组对齐序列。 + +### 相关用户权限 + +用户必须有下列权限才能正常执行查询写回语句: + +* 所有 `SELECT` 子句中源序列的 `WRITE_SCHEMA` 权限。 +* 所有 `INTO` 子句中目标序列 `WRITE_DATA` 权限。 + +更多用户权限相关的内容,请参考[权限管理语句](./Authority-Management.md)。 + +### 相关配置参数 + +* `select_into_insert_tablet_plan_row_limit` + + | 参数名 | select_into_insert_tablet_plan_row_limit | + | ---- | ---- | + | 描述 | 写入过程中每一批 `Tablet` 的最大行数 | + | 类型 | int32 | + | 默认值 | 10000 | + | 改后生效方式 | 重启后生效 | diff --git a/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Write-Delete-Data.md b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Write-Delete-Data.md new file mode 100644 index 00000000..f7c7bcc5 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Basic-Concept/Write-Delete-Data.md @@ -0,0 +1,256 @@ + + + +# 数据写入与删除 +## CLI写入数据 + +IoTDB 为用户提供多种插入实时数据的方式,例如在 [Cli/Shell 工具](../Tools-System/CLI.md) 中直接输入插入数据的 INSERT 语句,或使用 Java API(标准 [Java JDBC](../API/Programming-JDBC.md) 接口)单条或批量执行插入数据的 INSERT 语句。 + +本节主要为您介绍实时数据接入的 INSERT 语句在场景中的实际使用示例,有关 INSERT SQL 语句的详细语法请参见本文 [INSERT 语句](../SQL-Manual/SQL-Manual.md#写入数据) 节。 + +注:写入重复时间戳的数据则原时间戳数据被覆盖,可视为更新数据。 + +### 使用 INSERT 语句 + +使用 INSERT 语句可以向指定的已经创建的一条或多条时间序列中插入数据。对于每一条数据,均由一个时间戳类型的时间戳和一个数值或布尔值、字符串类型的传感器采集值组成。 + +在本节的场景实例下,以其中的两个时间序列`root.ln.wf02.wt02.status`和`root.ln.wf02.wt02.hardware`为例 ,它们的数据类型分别为 BOOLEAN 和 TEXT。 + +单列数据插入示例代码如下: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) +IoTDB > insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') +``` + +以上示例代码将长整型的 timestamp 以及值为 true 的数据插入到时间序列`root.ln.wf02.wt02.status`中和将长整型的 timestamp 以及值为”v1”的数据插入到时间序列`root.ln.wf02.wt02.hardware`中。执行成功后会返回执行时间,代表数据插入已完成。 + +> 注意:在 IoTDB 中,TEXT 类型的数据单双引号都可以来表示,上面的插入语句是用的是双引号表示 TEXT 类型数据,下面的示例将使用单引号表示 TEXT 类型数据。 + +INSERT 语句还可以支持在同一个时间点下多列数据的插入,同时向 2 时间点插入上述两个时间序列的值,多列数据插入示例代码如下: + +```sql +IoTDB > insert into 
root.ln.wf02.wt02(timestamp, status, hardware) values (2, false, 'v2') +``` + +此外,INSERT 语句支持一次性插入多行数据,同时向 2 个不同时间点插入上述时间序列的值,示例代码如下: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (3, false, 'v3'),(4, true, 'v4') +``` + +插入数据后我们可以使用 SELECT 语句简单查询已插入的数据。 + +```sql +IoTDB > select * from root.ln.wf02.wt02 where time < 5 +``` + +结果如图所示。由查询结果可以看出,单列、多列数据的插入操作正确执行。 + +``` ++-----------------------------+--------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status| ++-----------------------------+--------------------------+------------------------+ +|1970-01-01T08:00:00.001+08:00| v1| true| +|1970-01-01T08:00:00.002+08:00| v2| false| +|1970-01-01T08:00:00.003+08:00| v3| false| +|1970-01-01T08:00:00.004+08:00| v4| true| ++-----------------------------+--------------------------+------------------------+ +Total line number = 4 +It costs 0.004s +``` + +此外,我们可以省略 timestamp 列,此时系统将使用当前的系统时间作为该数据点的时间戳,示例代码如下: +```sql +IoTDB > insert into root.ln.wf02.wt02(status, hardware) values (false, 'v2') +``` +**注意:** 当一次插入多行数据时必须指定时间戳。 + +### 向对齐时间序列插入数据 + +向对齐时间序列插入数据只需在SQL中增加`ALIGNED`关键词,其他类似。 + +示例代码如下: + +```sql +IoTDB > create aligned timeseries root.sg1.d1(s1 INT32, s2 DOUBLE) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(2, 2, 2), (3, 3, 3) +IoTDB > select * from root.sg1.d1 +``` + +结果如图所示。由查询结果可以看出,数据的插入操作正确执行。 + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1| 1.0| +|1970-01-01T08:00:00.002+08:00| 2| 2.0| +|1970-01-01T08:00:00.003+08:00| 3| 3.0| ++-----------------------------+--------------+--------------+ +Total line number = 3 +It costs 0.004s +``` + +## 原生接口写入 +原生接口 (Session) 是目前IoTDB使用最广泛的系列接口,包含多种写入接口,适配不同的数据采集场景,性能高效且支持多语言。 + +### 多语言接口写入 +* ### Java + 使用Java接口写入之前,你需要先建立连接,参考 [Java原生接口](../API/Programming-Java-Native-API.md)。 + 之后通过 [ JAVA 数据操作接口(DML)](../API/Programming-Java-Native-API.md#数据写入)写入。 + +* ### Python + 参考 [ Python 数据操作接口(DML)](../API/Programming-Python-Native-API.md#数据写入) + +* ### C++ + 参考 [ C++ 数据操作接口(DML)](../API/Programming-Cpp-Native-API.md) + +* ### Go + 参考 [Go 原生接口](../API/Programming-Go-Native-API.md) + +## REST API写入 + +参考 [insertTablet (v1)](../API/RestServiceV1.md#inserttablet) or [insertTablet (v2)](../API/RestServiceV2.md#inserttablet) + +示例如下: +```JSON +{ +      "timestamps": [ +            1, +            2, +            3 +      ], +      "measurements": [ +            "temperature", +            "status" +      ], +      "data_types": [ +            "FLOAT", +            "BOOLEAN" +      ], +      "values": [ +            [ +                  1.1, +                  2.2, +                  3.3 +            ], +            [ +                  false, +                  true, +                  true +            ] +      ], +      "is_aligned": false, +      "device": "root.ln.wf01.wt01" +} +``` + +## MQTT写入 + +参考 [内置 MQTT 服务](../API/Programming-MQTT.md#内置-mqtt-服务) + +## 批量数据导入 + +针对于不同场景,IoTDB 为用户提供多种批量导入数据的操作方式,本章节向大家介绍最为常用的两种方式为 CSV文本形式的导入 和 TsFile文件形式的导入。 + +### TsFile批量导入 + +TsFile 是在 IoTDB 中使用的时间序列的文件格式,您可以通过CLI等工具直接将存有时间序列的一个或多个 TsFile 文件导入到另外一个正在运行的IoTDB实例中。具体操作方式请参考[数据导入](../Tools-System/Data-Import-Tool.md)。 + +### CSV批量导入 + +CSV 是以纯文本形式存储表格数据,您可以在CSV文件中写入多条格式化的数据,并批量的将这些数据导入到 IoTDB 
中,在导入数据之前,建议在IoTDB中创建好对应的元数据信息。如果忘记创建元数据也不要担心,IoTDB 可以自动将CSV中数据推断为其对应的数据类型,前提是你每一列的数据类型必须唯一。除单个文件外,此工具还支持以文件夹的形式导入多个 CSV 文件,并且支持设置如时间精度等优化参数。具体操作方式请参考[数据导入](../Tools-System/Data-Import-Tool.md)。 + +## 删除数据 + +用户使用 [DELETE 语句](../SQL-Manual/SQL-Manual.md#删除数据) 可以删除指定的时间序列中符合时间删除条件的数据。在删除数据时,用户可以选择需要删除的一个或多个时间序列、时间序列的前缀、时间序列带、*路径对某一个时间区间内的数据进行删除。 + +在 JAVA 编程环境中,您可以使用 JDBC API 单条或批量执行 DELETE 语句。 + +### 单传感器时间序列值删除 + +以测控 ln 集团为例,存在这样的使用场景: + +wf02 子站的 wt02 设备在 2017-11-01 16:26:00 之前的供电状态出现多段错误,且无法分析其正确数据,错误数据影响了与其他设备的关联分析。此时,需要将此时间段前的数据删除。进行此操作的 SQL 语句为: + +```sql +delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00; +``` + +如果我们仅仅想要删除 2017 年内的在 2017-11-01 16:26:00 之前的数据,可以使用以下 SQL: +```sql +delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +``` + +IoTDB 支持删除一个时间序列任何一个时间范围内的所有时序点,用户可以使用以下 SQL 语句指定需要删除的时间范围: +```sql +delete from root.ln.wf02.wt02.status where time < 10 +delete from root.ln.wf02.wt02.status where time <= 10 +delete from root.ln.wf02.wt02.status where time < 20 and time > 10 +delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10 +delete from root.ln.wf02.wt02.status where time > 20 +delete from root.ln.wf02.wt02.status where time >= 20 +delete from root.ln.wf02.wt02.status where time = 20 +``` + +需要注意,当前的删除语句不支持 where 子句后的时间范围为多个由 OR 连接成的时间区间。如下删除语句将会解析出错: +``` +delete from root.ln.wf02.wt02.status where time > 4 or time < 0 +Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic +expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND' +``` + +如果 delete 语句中未指定 where 子句,则会删除时间序列中的所有数据。 +```sql +delete from root.ln.wf02.wt02.status +``` + +### 多传感器时间序列值删除 + +当 ln 集团 wf02 子站的 wt02 设备在 2017-11-01 16:26:00 之前的供电状态和设备硬件版本都需要删除,此时可以使用含义更广的 [路径模式(Path Pattern)](../Basic-Concept/Data-Model-and-Terminology.md) 进行删除操作,进行此操作的 SQL 语句为: + + +```sql +delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00; +``` + +需要注意的是,当删除的路径不存在时,IoTDB 不会提示路径不存在,而是显示执行成功,因为 SQL 是一种声明式的编程方式,除非是语法错误、权限不足等,否则都不认为是错误,如下所示。 + +```sql +IoTDB> delete from root.ln.wf03.wt02.status where time < now() +Msg: The statement is executed successfully. +``` + +### 删除时间分区 (实验性功能) +您可以通过如下语句来删除某一个 database 下的指定时间分区: + +```sql +DELETE PARTITION root.ln 0,1,2 +``` + +上例中的 0,1,2 为待删除时间分区的 id,您可以通过查看 IoTDB 的数据文件夹找到它,或者可以通过计算`timestamp / partitionInterval`(向下取整), +手动地将一个时间戳转换为对应的 id,其中的`partitionInterval`可以在 IoTDB 的配置文件中找到(如果您使用的版本支持时间分区)。 + +请注意该功能目前只是实验性的,如果您不是开发者,使用时请务必谨慎。 + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/AINode_Deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/AINode_Deployment_timecho.md new file mode 100644 index 00000000..80d66262 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/AINode_Deployment_timecho.md @@ -0,0 +1,546 @@ + +# AINode 部署 + +## AINode介绍 + +### 能力介绍 + +AINode 是 IoTDB 在 ConfigNode、DataNode 后提供的第三种内生节点,该节点通过与 IoTDB 集群的 DataNode、ConfigNode 的交互,扩展了对时间序列进行机器学习分析的能力,支持从外部引入已有机器学习模型进行注册,并使用注册的模型在指定时序数据上通过简单 SQL 语句完成时序分析任务的过程,将模型的创建、管理及推理融合在数据库引擎中。目前已提供常见时序分析场景(例如预测与异常检测)的机器学习算法或自研模型。 + +### 交付方式 + 是 IoTDB 集群外的额外套件,独立安装包,独立激活(如需试用或使用,请联系天谋科技商务或技术支持)。 + +### 部署模式 +
+ + +
+ +## 安装准备 + +### 安装包获取 + + 用户可以下载AINode的软件安装包,下载并解压后即完成AINode的安装。 + + 解压后安装包(`iotdb-enterprise-ainode-.zip`),安装包解压后目录结构如下: +| **目录** | **类型** | **说明** | +| ------------ | -------- | ------------------------------------------------ | +| lib | 文件夹 | AINode编译后的二进制可执行文件以及相关的代码依赖 | +| sbin | 文件夹 | AINode的运行脚本,可以启动,移除和停止AINode | +| conf | 文件夹 | 包含AINode的配置项,具体包含以下配置项 | +| LICENSE | 文件 | 证书 | +| NOTICE | 文件 | 提示 | +| README_ZH.md | 文件 | markdown格式的中文版说明 | +| `README.md` | 文件 | 使用说明 | + +### 环境准备 +- 建议操作环境: Ubuntu, CentOS, MacOS + +- 运行环境 + - 联网环境下 Python >= 3.8即可,且带有 pip 和 venv 工具;非联网环境下需要使用 Python 3.8版本,并从 [此处](https://cloud.tsinghua.edu.cn/d/4c1342f6c272439aa96c/?p=%2Flibs&mode=list) 下载对应操作系统的zip压缩包(注意下载依赖需选择libs文件夹中的zip压缩包,如下图),并将文件夹下的所有文件拷贝到 `iotdb-enterprise-ainode-` 文件夹中 `lib` 文件夹下,并按下文步骤启动AINode。 + + + + - 环境变量中需存在 Python 解释器且可以通过 `python` 指令直接调用 + - 建议在 `iotdb-enterprise-ainode-` 文件夹下,新建 Python 解释器 venv 虚拟环境。如安装 3.8.0 版本虚拟环境,语句如下: + + ```shell + # 安装3.8.0版本的venv,创建虚拟环境,文件夹名为 `venv` + ../Python-3.8.0/python -m venv `venv` + ``` +## 安装部署及使用 + +### 安装 AINode + +1. AINode 激活 + + 要求 IoTDB 处于正常运行状态,且 license 中有 AINode 模块授权(通常 license 中不具有 AINode 授权,可联系天谋商务或技术支持人员获取 AINode 模块授权)。 + + 激活 AINode 模块授权方式如下: + - 方式一:激活文件拷贝激活 + - 重新启动 confignode 节点后,进入 activation 文件夹, 将 system_info 文件复制给天谋工作人员,并告知工作人员申请 AINode 独立授权; + - 收到工作人员返回的 license 文件; + - 将 license 文件放入对应节点的 activation 文件夹下; +- 方式二:激活脚本激活 + - 获取激活所需机器码,进入安装目录的 `sbin` 目录,执行激活脚本: + ```shell + cd sbin + ./start-activate.sh + ``` + - 显示如下信息,请将机器码(即该串字符)复制给天谋工作人员,并告知工作人员申请 AINode 独立授权: + ```shell + Please copy the system_info's content and send it to Timecho: + 01-KU5LDFFN-PNBEHDRH + Please enter license: + ``` + - 将工作人员返回的激活码输入上一步的命令行提示处 `Please enter license:`,如下提示: + ```shell + Please enter license: + Jw+MmF+AtexsfgNGOFgTm83BgXbq0zT1+fOfPvQsLlj6ZsooHFU6HycUSEGC78eT1g67KPvkcLCUIsz2QpbyVmPLr9x1+kVjBubZPYlVpsGYLqLFc8kgpb5vIrPLd3hGLbJ5Ks8fV1WOVrDDVQq89YF2atQa2EaB9EAeTWd0bRMZ+s9ffjc/1Zmh9NSP/T3VCfJcJQyi7YpXWy5nMtcW0gSV+S6fS5r7a96PjbtE0zXNjnEhqgRzdU+mfO8gVuUNaIy9l375cp1GLpeCh6m6pF+APW1CiXLTSijK9Qh3nsL5bAOXNeob5l+HO5fEMgzrW8OJPh26Vl6ljKUpCvpTiw== + License has been stored to sbin/../activation/license + Import completed. Please start cluster and excute 'show cluster' to verify activation status + ``` +- 更新 license 后,重新启动 DataNode 节点,进入 IoTDB 的 sbin 目录下,启动 datanode: + ```shell + cd sbin + ./start-datanode.sh -d #-d参数将在后台进行启动 + ``` + + 2. 检查Linux的内核架构 + ```shell + uname -m + ``` + + 3. 导入Python环境[下载](https://repo.anaconda.com/miniconda/) + + 推荐下载py311版本应用,导入至用户根目录下 iotdb专用文件夹 中 + + 4. 切换至iotdb专用文件夹安装Python环境 + + 以 Miniconda3-py311_24.5.0-0-Linux-x86_64 为例: + + ```shell + bash ./Miniconda3-py311_24.5.0-0-Linux-x86_64.sh + ``` + > 根据提示键入“回车”、“长按空格”、“回车”、“yes”、“yes”
+ > 关闭当前SSH窗口重新连接 + + 5. 创建专用环境 + + ```shell + conda create -n ainode_py python=3.11.9 + ``` + + 根据提示键入“y” + + 6. 激活专用环境 + + ```shell + conda activate ainode_py + ``` + + 7. 验证Python版本 + + ```shell + python --version + ``` + 8. 下载导入AINode到专用文件夹,切换到专用文件夹并解压安装包 + + ```shell + unzip iotdb-enterprise-ainode-1.3.3.2.zip + ``` + + 9. 配置项修改 + + ```shell + vi iotdb-enterprise-ainode-1.3.3.2/conf/iotdb-ainode.properties + ``` + 配置项修改:[详细信息](#配置项修改) + > ain_seed_config_node=iotdb-1:10710(集群通讯节点IP:通讯节点端口)
+ > ain_inference_rpc_address=iotdb-3(运行AINode的服务器IP) + + 10. 更换Python源 + + ```shell + pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/ + ``` + + 11. 启动AINode节点 + + ```shell + nohup bash iotdb-enterprise-ainode-1.3.3.2/sbin/start-ainode.sh > myout.file 2>& 1 & + ``` + > 回到系统默认环境:conda deactivate + +### 配置项修改 +AINode 支持修改一些必要的参数。可以在 `conf/iotdb-ainode.properties` 文件中找到下列参数并进行持久化的修改: + +| **名称** | **描述** | **类型** | **默认值** | **改后生效方式** | +| :----------------------------- | ------------------------------------------------------------ | ------- | ------------------ | ---------------------------- | +| cluster_name | AINode 要加入集群的标识 | string | defaultCluster | 仅允许在第一次启动服务前修改 | +| ain_seed_config_node | AINode 启动时注册的 ConfigNode 地址 | String | 127.0.0.1:10710 | 仅允许在第一次启动服务前修改 | +| ain_inference_rpc_address | AINode 提供服务与通信的地址 ,内部服务通讯接口 | String | 127.0.0.1 | 仅允许在第一次启动服务前修改 | +| ain_inference_rpc_port | AINode 提供服务与通信的端口 | String | 10810 | 仅允许在第一次启动服务前修改 | +| ain_system_dir | AINode 元数据存储路径,相对路径的起始目录与操作系统相关,建议使用绝对路径 | String | data/AINode/system | 仅允许在第一次启动服务前修改 | +| ain_models_dir | AINode 存储模型文件的路径,相对路径的起始目录与操作系统相关,建议使用绝对路径 | String | data/AINode/models | 仅允许在第一次启动服务前修改 | +| ain_logs_dir | AINode 存储日志的路径,相对路径的起始目录与操作系统相关,建议使用绝对路径 | String | logs/AINode | 重启后生效 | +| ain_thrift_compression_enabled | AINode 是否启用 thrift 的压缩机制,0-不启动、1-启动 | Boolean | 0 | 重启后生效 | +### 启动 AINode + + 在完成 Seed-ConfigNode 的部署后,可以通过添加 AINode 节点来支持模型的注册和推理功能。在配置项中指定 IoTDB 集群的信息后,可以执行相应的指令来启动 AINode,加入 IoTDB 集群。 + +#### 联网环境启动 + +##### 启动命令 + +```shell + # 启动命令 + # Linux 和 MacOS 系统 + bash sbin/start-ainode.sh + + # Windows 系统 + sbin\start-ainode.bat + + # 后台启动命令(长期运行推荐) + # Linux 和 MacOS 系统 + nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & + + # Windows 系统 + nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & + ``` + +##### 详细语法 + +```shell + # 启动命令 + # Linux 和 MacOS 系统 + bash sbin/start-ainode.sh -i -r -n + + # Windows 系统 + sbin\start-ainode.bat -i -r -n + ``` + +##### 参数介绍: + +| **名称** | **标签** | **描述** | **是否必填** | **类型** | **默认值** | **输入方式** | +| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | ---------------------- | +| ain_interpreter_dir | -i | AINode 所安装在的虚拟环境的解释器路径,需要使用绝对路径 | 否 | String | 默认读取环境变量 | 调用时输入或持久化修改 | +| ain_force_reinstall | -r | 该脚本在检查 AINode 安装情况的时候是否检查版本,如果检查则在版本不对的情况下会强制安装 lib 里的 whl 安装包 | 否 | Bool | false | 调用时输入 | +| ain_no_dependencies | -n | 指定在安装 AINode 的时候是否安装依赖,如果指定则仅安装 AINode 主程序而不安装依赖。 | 否 | Bool | false | 调用时输入 | + + 如不想每次启动时指定对应参数,也可以在 `conf` 文件夹下的`ainode-env.sh` 和 `ainode-env.bat` 脚本中持久化修改参数(目前支持持久化修改 ain_interpreter_dir 参数)。 + + `ainode-env.sh` : + ```shell + # The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark + # ain_interpreter_dir= + ``` + `ainode-env.bat` : +```shell + @REM The defaulte venv environment is used if ain_interpreter_dir is not set. 
Please use absolute path without quotation mark + @REM set ain_interpreter_dir= + ``` + 在写入参数值的后解除对应行的注释并保存即可在下一次执行脚本时生效。 + +#### 示例 + +##### 直接启动: + +```shell + # 启动命令 + # Linux 和 MacOS 系统 + bash sbin/start-ainode.sh + # Windows 系统 + sbin\start-ainode.bat + + + # 后台启动命令(长期运行推荐) + # Linux 和 MacOS 系统 + nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & + # Windows 系统 + nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & + ``` + +##### 更新启动: +如果 AINode 的版本进行了更新(如更新了 `lib` 文件夹),可使用此命令。首先要保证 AINode 已经停止运行,然后通过 `-r` 参数重启,该参数会根据 `lib` 下的文件重新安装 AINode。 + +```shell + # 更新启动命令 + # Linux 和 MacOS 系统 + bash sbin/start-ainode.sh -r + # Windows 系统 + sbin\start-ainode.bat -r + + + # 后台更新启动命令(长期运行推荐) + # Linux 和 MacOS 系统 + nohup bash sbin/start-ainode.sh -r > myout.file 2>& 1 & + # Windows 系统 + nohup bash sbin\start-ainode.bat -r > myout.file 2>& 1 & + ``` +#### 非联网环境启动 + +##### 启动命令 + +```shell + # 启动命令 + # Linux 和 MacOS 系统 + bash sbin/start-ainode.sh + + # Windows 系统 + sbin\start-ainode.bat + + # 后台启动命令(长期运行推荐) + # Linux 和 MacOS 系统 + nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & + + # Windows 系统 + nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & + ``` + +##### 详细语法 + +```shell + # 启动命令 + # Linux 和 MacOS 系统 + bash sbin/start-ainode.sh -i -r -n + + # Windows 系统 + sbin\start-ainode.bat -i -r -n + ``` + +##### 参数介绍: + +| **名称** | **标签** | **描述** | **是否必填** | **类型** | **默认值** | **输入方式** | +| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | ---------------------- | +| ain_interpreter_dir | -i | AINode 所安装在的虚拟环境的解释器路径,需要使用绝对路径 | 否 | String | 默认读取环境变量 | 调用时输入或持久化修改 | +| ain_force_reinstall | -r | 该脚本在检查 AINode 安装情况的时候是否检查版本,如果检查则在版本不对的情况下会强制安装 lib 里的 whl 安装包 | 否 | Bool | false | 调用时输入 | + +> 注意:非联网环境下安装失败时,首先检查是否选择了平台对应的安装包,其次确认python版本为3.8(由于下载的安装包限制了python版本,3.7、3.9等其他都不行) + +#### 示例 + +##### 直接启动: + +```shell + # 启动命令 + # Linux 和 MacOS 系统 + bash sbin/start-ainode.sh + # Windows 系统 + sbin\start-ainode.bat + + + # 后台启动命令(长期运行推荐) + # Linux 和 MacOS 系统 + nohup bash sbin/start-ainode.sh > myout.file 2>& 1 & + # Windows 系统 + nohup bash sbin\start-ainode.bat > myout.file 2>& 1 & + ``` + +### 检测 AINode 节点状态 + +AINode 启动过程中会自动将新的 AINode 加入 IoTDB 集群。启动 AINode 后可以在 命令行中输入 SQL 来查询,集群中看到 AINode 节点,其运行状态为 Running(如下展示)表示加入成功。 + +```shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| +| 2| AINode|Running| 127.0.0.1| 10810|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` + +### 停止 AINode + +如果需要停止正在运行的 AINode 节点,则执行相应的关闭脚本。 + +#### 停止命令 + +```shell + # Linux / MacOS + bash sbin/stop-ainode.sh + + #Windows + sbin\stop-ainode.bat + ``` + +##### 详细语法 + +```shell + # Linux / MacOS + bash sbin/stop-ainode.sh -t + + #Windows + sbin\stop-ainode.bat -t + ``` + +##### 参数介绍: + + | **名称** | **标签** | **描述** | **是否必填** | **类型** | **默认值** | **输入方式** | +| ----------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ------ | ---------- | +| ain_remove_target | -t | AINode 关闭时可以指定待移除的目标 AINode 的 Node ID、地址和端口号,格式为`` | 否 | String | 无 | 调用时输入 | + +#### 示例 +```shell + # Linux / MacOS + bash sbin/stop-ainode.sh + + # 
Windows + sbin\stop-ainode.bat + ``` +停止 AINode 后,还可以在集群中看到 AINode 节点,其运行状态为 UNKNOWN(如下展示),此时无法使用 AINode 功能。 + + ```shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| +| 2| AINode|UNKNOWN| 127.0.0.1| 10790|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` +如果需要重新启动该节点,需重新执行启动脚本。 + +### 移除 AINode + +当需要把一个 AINode 节点移出集群时,可以执行移除脚本。移除和停止脚本的差别是:停止是在集群中保留 AINode 节点但停止 AINode 服务,移除则是把 AINode 节点从集群中移除出去。 + + + #### 移除命令 + +```shell + # Linux / MacOS + bash sbin/remove-ainode.sh + + # Windows + sbin\remove-ainode.bat + ``` + +##### 详细语法 + +```shell + # Linux / MacOS + bash sbin/remove-ainode.sh -i -t -r -n + + # Windows + sbin\remove-ainode.bat -i -t -r -n + ``` + +##### 参数介绍: + + | **名称** | **标签** | **描述** | **是否必填** | **类型** | **默认值** | **输入方式** | +| ------------------- | ---- | ------------------------------------------------------------ | -------- | ------ | ---------------- | --------------------- | +| ain_interpreter_dir | -i | AINode 所安装在的虚拟环境的解释器路径,需要使用绝对路径 | 否 | String | 默认读取环境变量 | 调用时输入+持久化修改 | +| ain_remove_target | -t | AINode 关闭时可以指定待移除的目标 AINode 的 Node ID、地址和端口号,格式为`` | 否 | String | 无 | 调用时输入 | +| ain_force_reinstall | -r | 该脚本在检查 AINode 安装情况的时候是否检查版本,如果检查则在版本不对的情况下会强制安装 lib 里的 whl 安装包 | 否 | Bool | false | 调用时输入 | +| ain_no_dependencies | -n | 指定在安装 AINode 的时候是否安装依赖,如果指定则仅安装 AINode 主程序而不安装依赖。 | 否 | Bool | false | 调用时输入 | + + 如不想每次启动时指定对应参数,也可以在 `conf` 文件夹下的`ainode-env.sh` 和 `ainode-env.bat` 脚本中持久化修改参数(目前支持持久化修改 ain_interpreter_dir 参数)。 + + `ainode-env.sh` : + ```shell + # The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark + # ain_interpreter_dir= + ``` + `ainode-env.bat` : +```shell + @REM The defaulte venv environment is used if ain_interpreter_dir is not set. 
Please use absolute path without quotation mark + @REM set ain_interpreter_dir= + ``` + 在写入参数值的后解除对应行的注释并保存即可在下一次执行脚本时生效。 + +#### 示例 + +##### 直接移除: + + ```shell + # Linux / MacOS + bash sbin/remove-ainode.sh + + # Windows + sbin\remove-ainode.bat + ``` + 移除节点后,将无法查询到节点的相关信息。 + + ```shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` +##### 指定移除: + +如果用户丢失了 data 文件夹下的文件,可能 AINode 本地无法主动移除自己,需要用户指定节点号、地址和端口号进行移除,此时我们支持用户按照以下方法输入参数进行删除。 + + ```shell + # Linux / MacOS + bash sbin/remove-ainode.sh -t /: + + # Windows + sbin\remove-ainode.bat -t /: + ``` + +## 常见问题 + +### 启动AINode时出现找不到venv模块的报错 + + 当使用默认方式启动 AINode 时,会在安装包目录下创建一个 python 虚拟环境并安装依赖,因此要求安装 venv 模块。通常来说 python3.8 及以上的版本会自带 venv,但对于一些系统自带的 python 环境可能并不满足这一要求。出现该报错时有两种解决方案(二选一): + + 在本地安装 venv 模块,以 ubuntu 为例,可以通过运行以下命令来安装 python 自带的 venv 模块。或者从 python 官网安装一个自带 venv 的 python 版本。 + + ```shell +apt-get install python3.8-venv +``` + 安装 3.8.0 版本的 venv 到 AINode 里面 在 AINode 路径下 + + ```shell +../Python-3.8.0/python -m venv venv(文件夹名) +``` + 在运行启动脚本时通过 `-i` 指定已有的 python 解释器路径作为 AINode 的运行环境,这样就不再需要创建一个新的虚拟环境。 + + ### python中的SSL模块没有被正确安装和配置,无法处理HTTPS资源 +WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available. +可以安装 OpenSSLS 后,再重新构建 python 来解决这个问题 +> Currently Python versions 3.6 to 3.9 are compatible with OpenSSL 1.0.2, 1.1.0, and 1.1.1. + + Python 要求我们的系统上安装有 OpenSSL,具体安装方法可见[链接](https://stackoverflow.com/questions/56552390/how-to-fix-ssl-module-in-python-is-not-available-in-centos) + + ```shell +sudo apt-get install build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev uuid-dev lzma-dev liblzma-dev +sudo -E ./configure --with-ssl +make +sudo make install +``` + + ### pip版本较低 + + windows下出现类似“error:Microsoft Visual C++ 14.0 or greater is required...”的编译问题 + + 出现对应的报错,通常是 c++版本或是 setuptools 版本不足,可以在 + + ```shell +./python -m pip install --upgrade pip +./python -m pip install --upgrade setuptools +``` + + + ### 安装编译python + + 使用以下指定从官网下载安装包并解压: + ```shell +.wget https://www.python.org/ftp/python/3.8.0/Python-3.8.0.tar.xz +tar Jxf Python-3.8.0.tar.xz +``` + 编译安装对应的 python 包: + ```shell +cd Python-3.8.0 +./configure prefix=/usr/local/python3 +make +sudo make install +python3 --version +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_apache.md new file mode 100644 index 00000000..f762ba60 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_apache.md @@ -0,0 +1,342 @@ + +# 集群版部署 + +本小节将以IoTDB经典集群部署架构3C3D(3个ConfigNode和3个DataNode)为例,介绍如何部署集群,即通常所说的3C3D集群。3C3D集群架构图如下: + +
+ +
+ +## 注意事项 + +1. 安装前请确认系统已参照[系统配置](./Environment-Requirements.md)准备完成。 + +2. 部署时推荐优先使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在目标服务器上配置/etc/hosts,如本机ip是192.168.1.3,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的`cn_internal_address`、`dn_internal_address`。 + + ``` shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. 有些参数首次启动后不能修改,请参考下方的"参数配置"章节来进行设置。 + +4. 无论是在linux还是windows中,请确保IoTDB的安装路径中不含空格和中文,避免软件运行异常。 + +5. 请注意,安装部署IoTDB时需要保持使用同一个用户进行操作,您可以: +- 使用 root 用户(推荐):使用 root 用户可以避免权限等问题。 +- 使用固定的非 root 用户: + - 使用同一用户操作:确保在启动、停止等操作均保持使用同一用户,不要切换用户。 + - 避免使用 sudo:尽量避免使用 sudo 命令,因为它会以 root 用户权限执行命令,可能会引起权限混淆或安全问题。 + +## 准备步骤 + +1. 准备IoTDB数据库安装包 :apache-iotdb-{version}-all-bin.zip(安装包获取见:[链接](../Deployment-and-Maintenance/IoTDB-Package_apache.md)) + +2. 按环境要求配置好操作系统环境(系统环境配置见:[链接](../Deployment-and-Maintenance/Environment-Requirements.md)) + +## 安装步骤 + +假设现在有3台linux服务器,IP地址和服务角色分配如下: + +| 节点ip | 主机名 | 服务 | +| ----------- | ------- | -------------------- | +| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | +| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | +| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | + +### 设置主机名 + +在3台机器上分别配置主机名,设置主机名需要在目标服务器上配置`/etc/hosts`,使用如下命令: + +```Bash +echo "192.168.1.3 iotdb-1" >> /etc/hosts +echo "192.168.1.4 iotdb-2" >> /etc/hosts +echo "192.168.1.5 iotdb-3" >> /etc/hosts +``` + +### 参数配置 + +解压安装包并进入安装目录 + +```Plain +unzip apache-iotdb-{version}-all-bin.zip +cd apache-iotdb-{version}-all-bin +``` + +#### 环境脚本配置 + +- `./conf/confignode-env.sh` 配置 + + | **配置项** | **说明** | **默认值** | **推荐值** | 备注 | + | :---------- | :------------------------------------- | :--------- | :----------------------------------------------- | :----------- | + | MEMORY_SIZE | IoTDB ConfigNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +- `./conf/datanode-env.sh` 配置 + + | **配置项** | **说明** | **默认值** | **推荐值** | 备注 | + | :---------- | :----------------------------------- | :--------- | :----------------------------------------------- | :----------- | + | MEMORY_SIZE | IoTDB DataNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +#### 通用配置 + +打开通用配置文件`./conf/iotdb-system.properties`,可根据部署方式设置以下参数: + +| 配置项 | 说明 | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | +| ------------------------- | ---------------------------------------- | -------------- | -------------- | -------------- | +| cluster_name | 集群名称 | defaultCluster | defaultCluster | defaultCluster | +| schema_replication_factor | 元数据副本数,DataNode数量不应少于此数目 | 3 | 3 | 3 | +| data_replication_factor | 数据副本数,DataNode数量不应少于此数目 | 2 | 2 | 2 | + +#### ConfigNode 配置 + +打开ConfigNode配置文件`./conf/iotdb-system.properties`,设置以下参数 + +| 配置项 | 说明 | 默认 | 推荐值 | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | 备注 | +| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------- | ------------- | ------------- | ------------- | ------------------ | +| cn_internal_address | ConfigNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | iotdb-1 | iotdb-2 | iotdb-3 | 首次启动后不能修改 | +| cn_internal_port | ConfigNode在集群内部通讯使用的端口 | 10710 | 10710 | 10710 | 10710 | 10710 | 首次启动后不能修改 | +| cn_consensus_port | ConfigNode副本组共识协议通信使用的端口 | 10720 | 10720 | 10720 | 10720 | 10720 | 首次启动后不能修改 | +| cn_seed_config_node | 节点注册加入集群时连接的ConfigNode 的地址,cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 第一个CongfigNode的cn_internal_address:cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | 首次启动后不能修改 | + +#### DataNode 配置 + 
+打开DataNode配置文件 `./conf/iotdb-system.properties`,设置以下参数: + +| 配置项 | 说明 | 默认 | 推荐值 | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | 备注 | +| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------- | ------------- | ------------- | ------------- | ------------------ | +| dn_rpc_address | 客户端 RPC 服务的地址 | 127.0.0.1 | 推荐使用所在服务器的**IPV4地址或hostname** | iotdb-1 |iotdb-2 | iotdb-3 | 重启服务生效 | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 6667 | 6667 | 6667 | 6667 | 重启服务生效 | +| dn_internal_address | DataNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | iotdb-1 | iotdb-2 | iotdb-3 | 首次启动后不能修改 | +| dn_internal_port | DataNode在集群内部通信使用的端口 | 10730 | 10730 | 10730 | 10730 | 10730 | 首次启动后不能修改 | +| dn_mpp_data_exchange_port | DataNode用于接收数据流使用的端口 | 10740 | 10740 | 10740 | 10740 | 10740 | 首次启动后不能修改 | +| dn_data_region_consensus_port | DataNode用于数据副本共识协议通信使用的端口 | 10750 | 10750 | 10750 | 10750 | 10750 | 首次启动后不能修改 | +| dn_schema_region_consensus_port | DataNode用于元数据副本共识协议通信使用的端口 | 10760 | 10760 | 10760 | 10760 | 10760 | 首次启动后不能修改 | +| dn_seed_config_node | 节点注册加入集群时连接的ConfigNode地址,即cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 第一个CongfigNode的cn_internal_address:cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | 首次启动后不能修改 | + +> ❗️注意:VSCode Remote等编辑器无自动保存配置功能,请确保修改的文件被持久化保存,否则配置项无法生效 + +### 启动ConfigNode节点 + +先启动第一个iotdb-1的confignode, 保证种子confignode节点先启动,然后依次启动第2和第3个confignode节点 + +```Bash +cd sbin +./start-confignode.sh -d #“-d”参数将在后台进行启动 +``` +如果启动失败,请参考[常见问题](#常见问题)。 + +### 启动DataNode 节点 + + 分别进入iotdb的`sbin`目录下,依次启动3个datanode节点: + +```Bash +cd sbin +./start-datanode.sh -d #“-d”参数将在后台进行启动 +``` + +### 验证部署 + +可直接执行`./sbin`目录下的Cli启动脚本: + +```Plain +./start-cli.sh -h ip(本机ip或域名) -p 端口号(6667) +``` + + 成功启动后,出现如下界面显示IOTDB安装成功。 + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90%E6%88%90%E5%8A%9F.png) + +​ 可以使用`show cluster` 命令查看集群信息: + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90%E7%89%88%20show%20cluter.png) + +> 出现`ACTIVATED(W)`为被动激活,表示此ConfigNode没有license文件(或没有签发时间戳最新的license文件),其激活依赖于集群中其它Activate状态的ConfigNode。此时建议检查license文件是否已放入license文件夹,没有请放入license文件,若已存在license文件,可能是此节点license文件与其他节点信息不一致导致,请联系天谋工作人员重新申请. 
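+
+除 `show cluster` 外,还可以做一次简单的写入和查询,进一步确认集群的读写链路正常。下面是一个示意脚本,仅供参考:假设使用默认用户名密码 root/root,`-h` 可指定任意一个 DataNode 的主机名;默认配置下写入会自动创建序列,示例中临时使用的序列 `root.test.d1.s1` 在验证完成后即可删除。
+
+```Bash
+# 示意脚本:写入并查询一个测试数据点(主机名、端口、用户名密码请按实际部署调整)
+# 写入一个测试数据点
+sbin/start-cli.sh -h iotdb-1 -p 6667 -u root -pw root -e "insert into root.test.d1(timestamp, s1) values(1, 1.0)"
+
+# 查询刚写入的数据,能返回结果即说明写入与查询链路正常
+sbin/start-cli.sh -h iotdb-1 -p 6667 -u root -pw root -e "select s1 from root.test.d1"
+
+# (可选)验证完成后删除测试序列
+sbin/start-cli.sh -h iotdb-1 -p 6667 -u root -pw root -e "delete timeseries root.test.d1.s1"
+```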
+ +## 节点维护步骤 + +### ConfigNode节点维护 + +ConfigNode节点维护分为ConfigNode添加和移除两种操作,有两个常见使用场景: +- 集群扩展:如集群中只有1个ConfigNode时,希望增加ConfigNode以提升ConfigNode节点高可用性,则可以添加2个ConfigNode,使得集群中有3个ConfigNode。 +- 集群故障恢复:1个ConfigNode所在机器发生故障,使得该ConfigNode无法正常运行,此时可以移除该ConfigNode,然后添加一个新的ConfigNode进入集群。 + +> ❗️注意,在完成ConfigNode节点维护后,需要保证集群中有1或者3个正常运行的ConfigNode。2个ConfigNode不具备高可用性,超过3个ConfigNode会导致性能损失。 + +#### 添加ConfigNode节点 + +脚本命令: +```shell +# Linux / MacOS +# 首先切换到IoTDB根目录 +sbin/start-confignode.sh + +# Windows +# 首先切换到IoTDB根目录 +sbin/start-confignode.bat +``` + +参数介绍: + +| 参数 | 描述 | 是否为必填项 | +| :--- | :--------------------------------------------- | :----------- | +| -v | 显示版本信息 | 否 | +| -f | 在前台运行脚本,不将其放到后台 | 否 | +| -d | 以守护进程模式启动,即在后台运行 | 否 | +| -p | 指定一个文件来存放进程ID,用于进程管理 | 否 | +| -c | 指定配置文件夹的路径,脚本会从这里加载配置文件 | 否 | +| -g | 打印垃圾回收(GC)的详细信息 | 否 | +| -H | 指定Java堆转储文件的路径,当JVM内存溢出时使用 | 否 | +| -E | 指定JVM错误日志文件的路径 | 否 | +| -D | 定义系统属性,格式为 key=value | 否 | +| -X | 直接传递 -XX 参数给 JVM | 否 | +| -h | 帮助指令 | 否 | + +#### 移除ConfigNode节点 + +首先通过CLI连接集群,通过`show confignodes`确认想要移除ConfigNode的内部地址与端口号: + +```Bash +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +然后使用脚本将DataNode移除。脚本命令: + +```Bash +# Linux / MacOS +sbin/remove-confignode.sh [confignode_id] + +#Windows +sbin/remove-confignode.bat [confignode_id] + +``` + +### DataNode节点维护 + +DataNode节点维护有两个常见场景: + +- 集群扩容:出于集群能力扩容等目的,添加新的DataNode进入集群 +- 集群故障恢复:一个DataNode所在机器出现故障,使得该DataNode无法正常运行,此时可以移除该DataNode,并添加新的DataNode进入集群 + +> ❗️注意,为了使集群能正常工作,在DataNode节点维护过程中以及维护完成后,正常运行的DataNode总数不得少于数据副本数(通常为2),也不得少于元数据副本数(通常为3)。 + +#### 添加DataNode节点 + +脚本命令: + +```Bash +# Linux / MacOS +# 首先切换到IoTDB根目录 +sbin/start-datanode.sh + +# Windows +# 首先切换到IoTDB根目录 +sbin/start-datanode.bat +``` + +参数介绍: + +| 缩写 | 描述 | 是否为必填项 | +| :--- | :--------------------------------------------- | :----------- | +| -v | 显示版本信息 | 否 | +| -f | 在前台运行脚本,不将其放到后台 | 否 | +| -d | 以守护进程模式启动,即在后台运行 | 否 | +| -p | 指定一个文件来存放进程ID,用于进程管理 | 否 | +| -c | 指定配置文件夹的路径,脚本会从这里加载配置文件 | 否 | +| -g | 打印垃圾回收(GC)的详细信息 | 否 | +| -H | 指定Java堆转储文件的路径,当JVM内存溢出时使用 | 否 | +| -E | 指定JVM错误日志文件的路径 | 否 | +| -D | 定义系统属性,格式为 key=value | 否 | +| -X | 直接传递 -XX 参数给 JVM | 否 | +| -h | 帮助指令 | 否 | + +说明:在添加DataNode后,随着新的写入到来(以及旧数据过期,如果设置了TTL),集群负载会逐渐向新的DataNode均衡,最终在所有节点上达到存算资源的均衡。 + +#### 移除DataNode节点 + +首先通过CLI连接集群,通过`show datanodes`确认想要移除的DataNode的RPC地址与端口号: + +```Bash +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 0.0.0.0| 6667| 0| 0| +| 2|Running| 0.0.0.0| 6668| 1| 1| +| 3|Running| 0.0.0.0| 6669| 1| 0| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 3 +It costs 0.110s +``` + +然后使用脚本将DataNode移除。脚本命令: + +```Bash +# Linux / MacOS +sbin/remove-datanode.sh [datanode_id] + +#Windows +sbin/remove-datanode.bat [datanode_id] +``` + +## 常见问题 + +1. Confignode节点启动失败 + + 步骤 1: 请查看启动日志,检查是否修改了某些首次启动后不可改的参数。 + + 步骤 2: 请查看启动日志,检查是否出现其他异常。日志中若存在异常现象,请联系天谋技术支持人员咨询解决方案。 + + 步骤 3: 如果是首次部署或者数据可删除,也可按下述步骤清理环境,重新部署后,再次启动。 + + 步骤 4: 清理环境: + + a. 结束所有 ConfigNode 和 DataNode 进程。 + + ```Bash + # 1. 
停止 ConfigNode 和 DataNode 服务 + sbin/stop-standalone.sh + + # 2. 检查是否还有进程残留 + jps + # 或者 + ps -ef|gerp iotdb + + # 3. 如果有进程残留,则手动kill + kill -9 + # 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + b. 删除 data 和 logs 目录。 + + 说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + ```Bash + cd /data/iotdb + rm -rf data logs + ``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_timecho.md new file mode 100644 index 00000000..0de5e289 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Cluster-Deployment_timecho.md @@ -0,0 +1,384 @@ + +# 集群版部署 + +本小节描述如何手动部署包括3个ConfigNode和3个DataNode的实例,即通常所说的3C3D集群。 + +
+ +
+ +## 注意事项 + +1. 安装前请确认系统已参照[系统配置](./Environment-Requirements.md)准备完成。 + +2. 部署时推荐优先使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在目标服务器上配置/etc/hosts,如本机ip是192.168.1.3,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的`cn_internal_address`、`dn_internal_address`。`dn_internal_address`。 + + ``` shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. 有些参数首次启动后不能修改,请参考下方的"参数配置"章节来进行设置。 + +4. 无论是在linux还是windows中,请确保IoTDB的安装路径中不含空格和中文,避免软件运行异常。 + +5. 请注意,安装部署(包括激活和使用软件)IoTDB时需要保持使用同一个用户进行操作,您可以: +- 使用 root 用户(推荐):使用 root 用户可以避免权限等问题。 +- 使用固定的非 root 用户: + - 使用同一用户操作:确保在启动、激活、停止等操作均保持使用同一用户,不要切换用户。 + - 避免使用 sudo:尽量避免使用 sudo 命令,因为它会以 root 用户权限执行命令,可能会引起权限混淆或安全问题。 + +6. 推荐部署监控面板,可以对重要运行指标进行监控,随时掌握数据库运行状态,监控面板可以联系商务获取,部署监控面板步骤可以参考:[监控面板部署](./Monitoring-panel-deployment.md) + +## 准备步骤 + +1. 准备IoTDB数据库安装包 :iotdb-enterprise-{version}-bin.zip(安装包获取见:[链接](../Deployment-and-Maintenance/IoTDB-Package_timecho.md)) +2. 按环境要求配置好操作系统环境(系统环境配置见:[链接](../Deployment-and-Maintenance/Environment-Requirements.md)) + +## 安装步骤 + +假设现在有3台linux服务器,IP地址和服务角色分配如下: + +| 节点ip | 主机名 | 服务 | +| ----------- | ------- | -------------------- | +| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | +| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | +| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | + +### 设置主机名 + +在3台机器上分别配置主机名,设置主机名需要在目标服务器上配置`/etc/hosts`,使用如下命令: + +```Bash +echo "192.168.1.3 iotdb-1" >> /etc/hosts +echo "192.168.1.4 iotdb-2" >> /etc/hosts +echo "192.168.1.5 iotdb-3" >> /etc/hosts +``` + +### 参数配置 + +解压安装包并进入安装目录 + +```Plain +unzip iotdb-enterprise-{version}-bin.zip +cd iotdb-enterprise-{version}-bin +``` + +#### 环境脚本配置 + +- `./conf/confignode-env.sh`配置 + + | **配置项** | **说明** | **默认值** | **推荐值** | 备注 | + | :---------- | :------------------------------------- | :--------- | :----------------------------------------------- | :----------- | + | MEMORY_SIZE | IoTDB ConfigNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +- `./conf/datanode-env.sh`配置 + + | **配置项** | **说明** | **默认值** | **推荐值** | 备注 | + | :---------- | :----------------------------------- | :--------- | :----------------------------------------------- | :----------- | + | MEMORY_SIZE | IoTDB DataNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +#### 通用配置 + +打开通用配置文件`./conf/iotdb-system.properties`,可根据部署方式设置以下参数: + +| 配置项 | 说明 | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | +| ------------------------- | ---------------------------------------- | -------------- | -------------- | -------------- | +| cluster_name | 集群名称 | defaultCluster | defaultCluster | defaultCluster | +| schema_replication_factor | 元数据副本数,DataNode数量不应少于此数目 | 3 | 3 | 3 | +| data_replication_factor | 数据副本数,DataNode数量不应少于此数目 | 2 | 2 | 2 | + +#### ConfigNode 配置 + +打开ConfigNode配置文件`./conf/iotdb-system.properties`,设置以下参数 + +| 配置项 | 说明 | 默认 | 推荐值 | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | 备注 | +| ------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------- | ------------- | ------------- | ------------- | ------------------ | +| cn_internal_address | ConfigNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | iotdb-1 | iotdb-2 | iotdb-3 | 首次启动后不能修改 | +| cn_internal_port | ConfigNode在集群内部通讯使用的端口 | 10710 | 10710 | 10710 | 10710 | 10710 | 首次启动后不能修改 | +| cn_consensus_port | ConfigNode副本组共识协议通信使用的端口 | 10720 | 10720 | 10720 | 10720 | 10720 | 首次启动后不能修改 | +| cn_seed_config_node | 节点注册加入集群时连接的ConfigNode 的地址,cn_internal_address:cn_internal_port | 127.0.0.1:10710 
| 第一个CongfigNode的cn_internal_address:cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | 首次启动后不能修改 | + +#### DataNode 配置 + +打开DataNode配置文件 `./conf/iotdb-system.properties`,设置以下参数: + +| 配置项 | 说明 | 默认 | 推荐值 | 192.168.1.3 | 192.168.1.4 | 192.168.1.5 | 备注 | +| ------------------------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------- | ------------- | ------------- | ------------- | ------------------ | +| dn_rpc_address | 客户端 RPC 服务的地址 | 127.0.0.1 | 推荐使用所在服务器的**IPV4地址或hostname** | iotdb-1 |iotdb-2 | iotdb-3 | 重启服务生效 | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 6667 | 6667 | 6667 | 6667 | 重启服务生效 | +| dn_internal_address | DataNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | iotdb-1 | iotdb-2 | iotdb-3 | 首次启动后不能修改 | +| dn_internal_port | DataNode在集群内部通信使用的端口 | 10730 | 10730 | 10730 | 10730 | 10730 | 首次启动后不能修改 | +| dn_mpp_data_exchange_port | DataNode用于接收数据流使用的端口 | 10740 | 10740 | 10740 | 10740 | 10740 | 首次启动后不能修改 | +| dn_data_region_consensus_port | DataNode用于数据副本共识协议通信使用的端口 | 10750 | 10750 | 10750 | 10750 | 10750 | 首次启动后不能修改 | +| dn_schema_region_consensus_port | DataNode用于元数据副本共识协议通信使用的端口 | 10760 | 10760 | 10760 | 10760 | 10760 | 首次启动后不能修改 | +| dn_seed_config_node | 节点注册加入集群时连接的ConfigNode地址,即cn_internal_address:cn_internal_port | 127.0.0.1:10710 | 第一个CongfigNode的cn_internal_address:cn_internal_port | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | 首次启动后不能修改 | + +> ❗️注意:VSCode Remote等编辑器无自动保存配置功能,请确保修改的文件被持久化保存,否则配置项无法生效 + +### 启动ConfigNode节点 + +先启动第一个iotdb-1的confignode, 保证种子confignode节点先启动,然后依次启动第2和第3个confignode节点 + +```Bash +cd sbin +./start-confignode.sh -d #“-d”参数将在后台进行启动 +``` +如果启动失败,请参考[常见问题](#常见问题)。 + +### 激活数据库 + +#### **方式一:激活文件拷贝激活** + +- 依次启动3个confignode节点后,每台机器各自的`activation`文件夹, 分别拷贝每台机器的`system_info`文件给天谋工作人员; +- 工作人员将返回每个ConfigNode节点的license文件,这里会返回3个license文件; +- 将3个license文件分别放入对应的ConfigNode节点的`activation`文件夹下; + +#### 方式二:激活脚本激活 + +- 依次获取3台机器的机器码,分别进入安装目录的`sbin`目录,执行激活脚本`start-activate.sh`: + + ```Bash + cd sbin + ./start-activate.sh + ``` + +- 显示如下信息,这里显示的是1台机器的机器码 : + + ```Bash + Please copy the system_info's content and send it to Timecho: + 01-KU5LDFFN-PNBEHDRH + Please enter license: + ``` + +- 其他2个节点依次执行激活脚本`start-activate.sh`,然后将获取的3台机器的机器码都复制给天谋工作人员 +- 工作人员会返回3段激活码,正常是与提供的3个机器码的顺序对应的,请分别将各自的激活码粘贴到上一步的命令行提示处 `Please enter license:`,如下提示: + + ```Bash + Please enter license: + Jw+MmF+Atxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx5bAOXNeob5l+HO5fEMgzrW8OJPh26Vl6ljKUpCvpTiw== + License has been stored to sbin/../activation/license + Import completed. Please start cluster and excute 'show cluster' to verify activation status + ``` + +### 启动DataNode 节点 + + 分别进入iotdb的`sbin`目录下,依次启动3个datanode节点: + +```Go +cd sbin +./start-datanode.sh -d #-d参数将在后台进行启动 +``` + +### 验证部署 + +可直接执行`./sbin`目录下的Cli启动脚本: + +```Plain +./start-cli.sh -h ip(本机ip或域名) -p 端口号(6667) +``` + + 成功启动后,出现如下界面显示IOTDB安装成功。 + +![](https://alioss.timecho.com/docs/img/%E4%BC%81%E4%B8%9A%E7%89%88%E6%88%90%E5%8A%9F.png) + +出现安装成功界面后,继续看下是否激活成功,使用 `show cluster`命令 + +当看到最右侧显示`ACTIVATED`表示激活成功 + +![](https://alioss.timecho.com/docs/img/%E4%BC%81%E4%B8%9A%E7%89%88%E6%BF%80%E6%B4%BB.png) + +> 出现`ACTIVATED(W)`为被动激活,表示此ConfigNode没有license文件(或没有签发时间戳最新的license文件),其激活依赖于集群中其它Activate状态的ConfigNode。此时建议检查license文件是否已放入license文件夹,没有请放入license文件,若已存在license文件,可能是此节点license文件与其他节点信息不一致导致,请联系天谋工作人员重新申请. 
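+
+确认激活成功后,也可以分别连接 3 个节点各执行一次 `show cluster`,确认任意节点均可对外提供服务、且看到的集群状态一致。下面是一个示意脚本,仅供参考:假设使用默认用户名密码 root/root,主机名与端口请按实际部署情况调整。
+
+```Bash
+# 示意脚本:依次连接 3 个节点执行 show cluster
+for host in iotdb-1 iotdb-2 iotdb-3
+do
+  echo "==== ${host} ===="
+  sbin/start-cli.sh -h ${host} -p 6667 -u root -pw root -e "show cluster"
+done
+```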
+ +## 节点维护步骤 + +### ConfigNode节点维护 + +ConfigNode节点维护分为ConfigNode添加和移除两种操作,有两个常见使用场景: +- 集群扩展:如集群中只有1个ConfigNode时,希望增加ConfigNode以提升ConfigNode节点高可用性,则可以添加2个ConfigNode,使得集群中有3个ConfigNode。 +- 集群故障恢复:1个ConfigNode所在机器发生故障,使得该ConfigNode无法正常运行,此时可以移除该ConfigNode,然后添加一个新的ConfigNode进入集群。 + +> ❗️注意,在完成ConfigNode节点维护后,需要保证集群中有1或者3个正常运行的ConfigNode。2个ConfigNode不具备高可用性,超过3个ConfigNode会导致性能损失。 + +#### 添加ConfigNode节点 + +脚本命令: +```shell +# Linux / MacOS +# 首先切换到IoTDB根目录 +sbin/start-confignode.sh + +# Windows +# 首先切换到IoTDB根目录 +sbin/start-confignode.bat +``` + +参数介绍: + +| 参数 | 描述 | 是否为必填项 | +| :--- | :--------------------------------------------- | :----------- | +| -v | 显示版本信息 | 否 | +| -f | 在前台运行脚本,不将其放到后台 | 否 | +| -d | 以守护进程模式启动,即在后台运行 | 否 | +| -p | 指定一个文件来存放进程ID,用于进程管理 | 否 | +| -c | 指定配置文件夹的路径,脚本会从这里加载配置文件 | 否 | +| -g | 打印垃圾回收(GC)的详细信息 | 否 | +| -H | 指定Java堆转储文件的路径,当JVM内存溢出时使用 | 否 | +| -E | 指定JVM错误日志文件的路径 | 否 | +| -D | 定义系统属性,格式为 key=value | 否 | +| -X | 直接传递 -XX 参数给 JVM | 否 | +| -h | 帮助指令 | 否 | + +#### 移除ConfigNode节点 + +首先通过CLI连接集群,通过`show confignodes`确认想要移除ConfigNode的内部地址与端口号: + +```Bash +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +然后使用脚本将DataNode移除。脚本命令: + +```Bash +# Linux / MacOS +sbin/remove-confignode.sh [confignode_id] + +#Windows +sbin/remove-confignode.bat [confignode_id] + +``` + +### DataNode节点维护 + +DataNode节点维护有两个常见场景: + +- 集群扩容:出于集群能力扩容等目的,添加新的DataNode进入集群 +- 集群故障恢复:一个DataNode所在机器出现故障,使得该DataNode无法正常运行,此时可以移除该DataNode,并添加新的DataNode进入集群 + +> ❗️注意,为了使集群能正常工作,在DataNode节点维护过程中以及维护完成后,正常运行的DataNode总数不得少于数据副本数(通常为2),也不得少于元数据副本数(通常为3)。 + +#### 添加DataNode节点 + +脚本命令: + +```Bash +# Linux / MacOS +# 首先切换到IoTDB根目录 +sbin/start-datanode.sh + +# Windows +# 首先切换到IoTDB根目录 +sbin/start-datanode.bat +``` + +参数介绍: + +| 缩写 | 描述 | 是否为必填项 | +| :--- | :--------------------------------------------- | :----------- | +| -v | 显示版本信息 | 否 | +| -f | 在前台运行脚本,不将其放到后台 | 否 | +| -d | 以守护进程模式启动,即在后台运行 | 否 | +| -p | 指定一个文件来存放进程ID,用于进程管理 | 否 | +| -c | 指定配置文件夹的路径,脚本会从这里加载配置文件 | 否 | +| -g | 打印垃圾回收(GC)的详细信息 | 否 | +| -H | 指定Java堆转储文件的路径,当JVM内存溢出时使用 | 否 | +| -E | 指定JVM错误日志文件的路径 | 否 | +| -D | 定义系统属性,格式为 key=value | 否 | +| -X | 直接传递 -XX 参数给 JVM | 否 | +| -h | 帮助指令 | 否 | + +说明:在添加DataNode后,随着新的写入到来(以及旧数据过期,如果设置了TTL),集群负载会逐渐向新的DataNode均衡,最终在所有节点上达到存算资源的均衡。 + +#### 移除DataNode节点 + +首先通过CLI连接集群,通过`show datanodes`确认想要移除的DataNode的RPC地址与端口号: + +```Bash +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 0.0.0.0| 6667| 0| 0| +| 2|Running| 0.0.0.0| 6668| 1| 1| +| 3|Running| 0.0.0.0| 6669| 1| 0| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 3 +It costs 0.110s +``` + +然后使用脚本将DataNode移除。脚本命令: + +```Bash +# Linux / MacOS +sbin/remove-datanode.sh [datanode_id] + +#Windows +sbin/remove-datanode.bat [datanode_id] +``` + +## 常见问题 + +1. 部署过程中多次提示激活失败 + - 使用 `ls -al` 命令:使用 `ls -al` 命令检查安装包根目录的所有者信息是否为当前用户。 + - 检查激活目录:检查 `./activation` 目录下的所有文件,所有者信息是否为当前用户。 + +2. 
Confignode节点启动失败 + + 步骤 1: 请查看启动日志,检查是否修改了某些首次启动后不可改的参数。 + + 步骤 2: 请查看启动日志,检查是否出现其他异常。日志中若存在异常现象,请联系天谋技术支持人员咨询解决方案。 + + 步骤 3: 如果是首次部署或者数据可删除,也可按下述步骤清理环境,重新部署后,再次启动。 + + 步骤 4: 清理环境: + + a. 结束所有 ConfigNode 和 DataNode 进程。 + + ```Bash + # 1. 停止 ConfigNode 和 DataNode 服务 + sbin/stop-standalone.sh + + # 2. 检查是否还有进程残留 + jps + # 或者 + ps -ef|gerp iotdb + + # 3. 如果有进程残留,则手动kill + kill -9 + # 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + b. 删除 data 和 logs 目录。 + + 说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + ```Bash + cd /data/iotdb + rm -rf data logs + ``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Database-Resources.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Database-Resources.md new file mode 100644 index 00000000..17e09aa0 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Database-Resources.md @@ -0,0 +1,193 @@ + +# 资源规划 +## CPU + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序列数(采集频率<=1HZ)CPU节点数
单机双活分布式
10W以内2核-4核123
30W以内4核-8核123
50W以内8核-16核123
100W以内16核-32核123
200w以内32核-48核123
1000w以内48核12请联系天谋商务咨询
1000w以上请联系天谋商务咨询
+ +## 内存 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序列数(采集频率<=1HZ)内存节点数
单机双活分布式
10W以内4G-8G123
30W以内12G-32G123
50W以内24G-48G123
100W以内32G-96G123
200w以内64G-128G123
1000w以内128G12请联系天谋商务咨询
1000w以上请联系天谋商务咨询
+ +## 存储(磁盘) +### 存储空间 +计算公式:测点数量 * 采样频率(Hz)* 每个数据点大小(Byte,不同数据类型不一样,见下表) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
数据点大小计算表
数据类型 时间戳(字节)值(字节)数据点总大小(字节)
开关量(Boolean)819
整型(INT32)/ 单精度浮点数(FLOAT)8412
长整型(INT64)/ 双精度浮点数(DOUBLE)8816
字符串(TEXT)8平均为a8+a
+ +示例:1000设备,每个设备100 测点,共 100000 序列,INT32 类型。采样频率1Hz(每秒一次),存储1年,3副本。 +- 完整计算公式:1000设备 * 100测点 * 12字节每数据点 * 86400秒每天 * 365天每年 * 3副本/10压缩比=11T +- 简版计算公式:1000 * 100 * 12 * 86400 * 365 * 3 / 10 = 11T +### 存储配置 +1000w 点位以上或查询负载较大,推荐配置 SSD。 +## 网络(网卡) +在写入吞吐不超过1000万点/秒时,需配置千兆网卡;当写入吞吐超过 1000万点/秒时,需配置万兆网卡。 +| **写入吞吐(数据点/秒)** | **网卡速率** | +| ------------------- | ------------- | +| <1000万 | 1Gbps(千兆) | +| >=1000万 | 10Gbps(万兆) | +## 其他说明 +IoTDB 具有集群秒级扩容能力,扩容节点数据可不迁移,因此您无需担心按现有数据情况估算的集群能力有限,未来您可在需要扩容时为集群加入新的节点。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md new file mode 100644 index 00000000..92d4a6c8 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_apache.md @@ -0,0 +1,414 @@ + +# Docker部署 + +## 环境准备 + +### Docker安装 + +```SQL +#以ubuntu为例,其他操作系统可以自行搜索安装方法 +#step1: 安装一些必要的系统工具 +sudo apt-get update +sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common +#step2: 安装GPG证书 +curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add - +#step3: 写入软件源信息 +sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" +#step4: 更新并安装Docker-CE +sudo apt-get -y update +sudo apt-get -y install docker-ce +#step5: 设置docker开机自启动 +sudo systemctl enable docker +#step6: 验证docker是否安装成功 +docker --version #显示版本信息,即安装成功 +``` + +### docker-compose安装 + +```SQL +#安装命令 +curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose +chmod +x /usr/local/bin/docker-compose +ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose +#验证是否安装成功 +docker-compose --version #显示版本信息即安装成功 +``` + +## 单机版 + +本节演示如何部署1C1D的docker单机版。 + +### 拉取镜像文件 + +Apache IoTDB的Docker镜像已经上传至https://hub.docker.com/r/apache/iotdb。 + +以获取1.3.2版本为例,拉取镜像命令: + +```bash +docker pull apache/iotdb:1.3.2-standalone +``` + +查看镜像: + +```bash +docker images +``` + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E6%8B%89%E5%8F%96%E9%95%9C%E5%83%8F.PNG) + +### 创建docker bridge网络 + +```Bash +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +``` + +### 编写docker-compose的yml文件 + +这里我们以把IoTDB安装目录和yml文件统一放在`/docker-iotdb`文件夹下为例: + +文件目录结构为:`/docker-iotdb/iotdb`, `/docker-iotdb/docker-compose-standalone.yml ` + +```bash +docker-iotdb: +├── iotdb #iotdb安装目录 +│── docker-compose-standalone.yml #单机版docker-compose的yml文件 +``` + +完整的docker-compose-standalone.yml 内容如下: + +```bash +version: "3" +services: + iotdb-service: + image: apache/iotdb:1.3.2-standalone #使用的镜像 + hostname: iotdb + container_name: iotdb + restart: always + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb:10710 + - dn_rpc_address=iotdb + - dn_internal_address=iotdb + - dn_rpc_port=6667 + - dn_internal_port=10730 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb:10710 + privileged: true + volumes: + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro + networks: + iotdb: + ipv4_address: 172.18.0.6 +networks: + iotdb: + external: true +``` + +### 启动IoTDB + +使用下面的命令启动: + +```bash +cd /docker-iotdb +docker-compose -f 
docker-compose-standalone.yml up -d #后台启动 +``` + +### 验证部署 + +- 查看日志,有如下字样,表示启动成功 + + ```SQL + docker logs -f iotdb-datanode #查看日志命令 + 2024-07-21 08:22:38,457 [main] INFO o.a.i.db.service.DataNode:227 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! + ``` + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B2.png) + +- 进入容器,查看服务运行状态 + + 查看启动的容器 + + ```SQL + docker ps + ``` + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B22.png) + + 进入容器, 通过cli登录数据库, 使用show cluster命令查看服务状态 + + ```SQL + docker exec -it iotdb /bin/bash #进入容器 + ./start-cli.sh -h iotdb #登录数据库 + IoTDB> show cluster #查看服务状态 + ``` + + 可以看到服务状态都是running, 说明IoTDB部署成功。 + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B23.png) + +### 映射/conf目录(可选) + +后续如果想在物理机中直接修改配置文件,可以把容器中的/conf文件夹映射出来,分三步: + +步骤一:拷贝容器中的/conf目录到`/docker-iotdb/iotdb/conf` + +```bash +docker cp iotdb:/iotdb/conf /docker-iotdb/iotdb/conf +``` + +步骤二:在`docker-compose-standalone.yml`中添加映射 + +```bash + volumes: + - ./iotdb/conf:/iotdb/conf #增加这个/conf文件夹的映射 + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro +``` + +步骤三:重新启动IoTDB + +```bash +docker-compose -f docker-compose-standalone.yml up -d +``` + +## 集群版 + +本小节描述如何手动部署包括3个ConfigNode和3个DataNode的实例,即通常所说的3C3D集群。 + +
+ +
+ +**注意:集群版目前只支持host网络和overlay 网络,不支持bridge网络。** + +下面以host网络为例演示如何部署3C3D集群。 + +### 设置主机名 + +假设现在有3台linux服务器,IP地址和服务角色分配如下: + +| 节点ip | 主机名 | 服务 | +| ----------- | ------- | -------------------- | +| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | +| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | +| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | + +在3台机器上分别配置主机名,设置主机名需要在目标服务器上配置`/etc/hosts`,使用如下命令: + +```Bash +echo "192.168.1.3 iotdb-1" >> /etc/hosts +echo "192.168.1.4 iotdb-2" >> /etc/hosts +echo "192.168.1.5 iotdb-3" >> /etc/hosts +``` + +### 拉取镜像文件 + +Apache IoTDB的Docker镜像已经上传至https://hub.docker.com/r/apache/iotdb。 + +在3台服务器上分别拉取IoTDB镜像,以获取1.3.2版本为例,拉取镜像命令: + +```SQL +docker pull apache/iotdb:1.3.2-standalone +``` + +查看镜像: + +```SQL +docker images +``` + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%881.png) + +### 编写docker-compose的yml文件 + +这里我们以把IoTDB安装目录和yml文件统一放在`/docker-iotdb`文件夹下为例: + +文件目录结构为:`/docker-iotdb/iotdb`, `/docker-iotdb/confignode.yml`,`/docker-iotdb/datanode.yml` + +```SQL +docker-iotdb: +├── confignode.yml #confignode的yml文件 +├── datanode.yml #datanode的yml文件 +└── iotdb #IoTDB安装目录 +``` + +在每台服务器上都要编写2个yml文件,即confignode.yml和datanode.yml,yml示例如下: + +**confignode.yml:** + +```bash +#confignode.yml +version: "3" +services: + iotdb-confignode: + image: apache/iotdb:1.3.2-standalone #使用的镜像 + hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + container_name: iotdb-confignode + command: ["bash", "-c", "entrypoint.sh confignode"] + restart: always + environment: + - cn_internal_address=iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-1:10710 #默认第一台为seed节点 + - schema_replication_factor=3 #元数据副本数 + - data_replication_factor=2 #数据副本数 + privileged: true + volumes: + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro + network_mode: "host" #使用host网络 +``` + +**datanode.yml:** + +```bash +#datanode.yml +version: "3" +services: + iotdb-datanode: + image: iotdb-enterprise:1.3.2.3-standalone #使用的镜像 + hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + container_name: iotdb-datanode + command: ["bash", "-c", "entrypoint.sh datanode"] + restart: always + ports: + - "6667:6667" + privileged: true + environment: + - dn_rpc_address=iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + - dn_internal_address=iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + - dn_seed_config_node=iotdb-1:10710 #默认第1台为seed节点 + - dn_rpc_port=6667 + - dn_internal_port=10730 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - schema_replication_factor=3 #元数据副本数 + - data_replication_factor=2 #数据副本数 + volumes: + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro + network_mode: "host" #使用host网络 +``` + +### 首次启动confignode + +先在3台服务器上分别启动confignode, 注意启动顺序,先启动第1台iotdb-1,再启动iotdb-2和iotdb-3。 + +```bash +cd /docker-iotdb +docker-compose -f confignode.yml up -d #后台启动 +``` + +### 启动datanode + +在3台服务器上分别启动datanode + +```SQL +cd /docker-iotdb +docker-compose -f datanode.yml up -d #后台启动 +``` + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%882.png) + +### 验证部署 + +- 查看日志,有如下字样,表示datanode启动成功 + + ```SQL + docker logs -f iotdb-datanode #查看日志命令 + 2024-07-21 09:40:58,120 [main] INFO o.a.i.db.service.DataNode:227 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! 
+ ``` + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%883.png) + +- 进入容器,查看服务运行状态 + + 查看启动的容器 + + ```SQL + docker ps + ``` + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%884.png) + + 进入任意一个容器, 通过cli登录数据库, 使用show cluster命令查看服务状态 + + ```SQL + docker exec -it iotdb-datanode /bin/bash #进入容器 + ./start-cli.sh -h iotdb-1 #登录数据库 + IoTDB> show cluster #查看服务状态 + ``` + + 可以看到服务状态都是running, 说明IoTDB部署成功。 + + ![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E9%9B%86%E7%BE%A4%E7%89%885.png) + +### 映射/conf目录(可选) + +后续如果想在物理机中直接修改配置文件,可以把容器中的/conf文件夹映射出来,分三步: + +步骤一:在3台服务器中分别拷贝容器中的/conf目录到/docker-iotdb/iotdb/conf + +```bash +docker cp iotdb-confignode:/iotdb/conf /docker-iotdb/iotdb/conf +或者 +docker cp iotdb-datanode:/iotdb/conf /docker-iotdb/iotdb/conf +``` + +步骤二:在3台服务器的confignode.yml和datanode.yml中添加/conf目录映射 + +```bash +#confignode.yml + volumes: + - ./iotdb/conf:/iotdb/conf #增加这个/conf文件夹的映射 + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro + +#datanode.yml + volumes: + - ./iotdb/conf:/iotdb/conf #增加这个/conf文件夹的映射 + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /dev/mem:/dev/mem:ro +``` + +步骤三:在3台服务器上重新启动IoTDB + +```bash +cd /docker-iotdb +docker-compose -f confignode.yml up -d +docker-compose -f datanode.yml up -d +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_timecho.md new file mode 100644 index 00000000..26555d12 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Docker-Deployment_timecho.md @@ -0,0 +1,474 @@ + +# Docker部署 + +## 环境准备 + +### Docker安装 + +```Bash +#以ubuntu为例,其他操作系统可以自行搜索安装方法 +#step1: 安装一些必要的系统工具 +sudo apt-get update +sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common +#step2: 安装GPG证书 +curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add - +#step3: 写入软件源信息 +sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable" +#step4: 更新并安装Docker-CE +sudo apt-get -y update +sudo apt-get -y install docker-ce +#step5: 设置docker开机自启动 +sudo systemctl enable docker +#step6: 验证docker是否安装成功 +docker --version #显示版本信息,即安装成功 +``` + +### docker-compose安装 + +```Bash +#安装命令 +curl -L "https://github.com/docker/compose/releases/download/v2.20.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose +chmod +x /usr/local/bin/docker-compose +ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose +#验证是否安装成功 +docker-compose --version #显示版本信息即安装成功 +``` + +### 安装dmidecode插件 + +默认情况下,linux服务器应该都已安装,如果没有安装的话,可以使用下面的命令安装。 + +```Bash +sudo apt-get install dmidecode +``` + +dmidecode 安装后,查找安装路径:`whereis dmidecode`,这里假设结果为`/usr/sbin/dmidecode`,记住该路径,后面的docker-compose的yml文件会用到。 + +### 获取IoTDB的容器镜像 + +关于IoTDB企业版的容器镜像您可联系商务或技术支持获取。 + +## 单机版部署 + +本节演示如何部署1C1D的docker单机版。 + +### load 镜像文件 + +比如这里获取的IoTDB的容器镜像文件名是:`iotdb-enterprise-1.3.2.3-standalone-docker.tar.gz` + +load镜像: + +```Bash +docker load -i iotdb-enterprise-1.3.2.3-standalone-docker.tar.gz +``` + +查看镜像: + +```Bash +docker images +``` + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E6%9F%A5%E7%9C%8B%E9%95%9C%E5%83%8F.PNG) + +### 创建docker bridge网络 + +```Bash +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +``` + +### 编写docker-compose的yml文件 + 
+这里我们以把IoTDB安装目录和yml文件统一放在`/docker-iotdb` 文件夹下为例: + +文件目录结构为:`/docker-iotdb/iotdb`, `/docker-iotdb/docker-compose-standalone.yml ` + +```Bash +docker-iotdb: +├── iotdb #iotdb安装目录 +│── docker-compose-standalone.yml #单机版docker-compose的yml文件 +``` + +完整的`docker-compose-standalone.yml`内容如下: + +```Bash +version: "3" +services: + iotdb-service: + image: iotdb-enterprise:1.3.2.3-standalone #使用的镜像 + hostname: iotdb + container_name: iotdb + restart: always + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb:10710 + - dn_rpc_address=iotdb + - dn_internal_address=iotdb + - dn_rpc_port=6667 + - dn_internal_port=10730 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb:10710 + privileged: true + volumes: + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + networks: + iotdb: + ipv4_address: 172.18.0.6 +networks: + iotdb: + external: true +``` + +### 首次启动 + +使用下面的命令启动: + +```Bash +cd /docker-iotdb +docker-compose -f docker-compose-standalone.yml up +``` + +由于没有激活,首次启动时会直接退出,属于正常现象,首次启动是为了获取机器码文件,用于后面的激活流程。 + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E6%BF%80%E6%B4%BB.png) + +### 申请激活 + +- 首次启动后,在物理机目录`/docker-iotdb/iotdb/activation`下会生成一个 `system_info`文件,将这个文件拷贝给天谋工作人员。 + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB1.png) + +- 收到工作人员返回的license文件,将license文件拷贝到`/docker-iotdb/iotdb/activation`文件夹下。 + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB2.png) + +### 再次启动IoTDB + +```Bash +docker-compose -f docker-compose-standalone.yml up -d +``` + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8iotdb.png) + +### 验证部署 + +- 查看日志,有如下字样,表示启动成功 + +```Bash +docker logs -f iotdb-datanode #查看日志命令 +2024-07-19 12:02:32,608 [main] INFO o.a.i.db.service.DataNode:231 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! +``` + +![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B21.png) + +- 进入容器,查看服务运行状态及激活信息 + + 查看启动的容器 + + ```Bash + docker ps + ``` + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B22.png) + + 进入容器, 通过cli登录数据库, 使用show cluster命令查看服务状态及激活状态 + + ```Bash + docker exec -it iotdb /bin/bash #进入容器 + ./start-cli.sh -h iotdb #登录数据库 + IoTDB> show cluster #查看状态 + ``` + + 可以看到服务都是running,激活状态显示已激活。 + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E9%AA%8C%E8%AF%81%E9%83%A8%E7%BD%B23.png) + +### 映射/conf目录(可选) + +后续如果想在物理机中直接修改配置文件,可以把容器中的/conf文件夹映射出来,分三步: + +步骤一:拷贝容器中的/conf目录到`/docker-iotdb/iotdb/conf` + +```Bash +docker cp iotdb:/iotdb/conf /docker-iotdb/iotdb/conf +``` + +步骤二:在docker-compose-standalone.yml中添加映射 + +```Bash + volumes: + - ./iotdb/conf:/iotdb/conf #增加这个/conf文件夹的映射 + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro +``` + +步骤三:重新启动IoTDB + +```Bash +docker-compose -f docker-compose-standalone.yml up -d +``` + +## 集群版部署 + +本小节描述如何手动部署包括3个ConfigNode和3个DataNode的实例,即通常所说的3C3D集群。 + +
+ +
+ +**注意:集群版目前只支持host网络和overlay 网络,不支持bridge网络。** + +下面以host网络为例演示如何部署3C3D集群。 + +### 设置主机名 + +假设现在有3台linux服务器,IP地址和服务角色分配如下: + +| 节点ip | 主机名 | 服务 | +| ----------- | ------- | -------------------- | +| 192.168.1.3 | iotdb-1 | ConfigNode、DataNode | +| 192.168.1.4 | iotdb-2 | ConfigNode、DataNode | +| 192.168.1.5 | iotdb-3 | ConfigNode、DataNode | + +在3台机器上分别配置主机名,设置主机名需要在目标服务器上配置/etc/hosts,使用如下命令: + +```Bash +echo "192.168.1.3 iotdb-1" >> /etc/hosts +echo "192.168.1.4 iotdb-2" >> /etc/hosts +echo "192.168.1.5 iotdb-3" >> /etc/hosts +``` + +### load镜像文件 + +比如获取的IoTDB的容器镜像文件名是:`iotdb-enterprise-1.3.2.3-standalone-docker.tar.gz` + +在3台服务器上分别执行load镜像命令: + +```Bash +docker load -i iotdb-enterprise-1.3.2.3-standalone-docker.tar.gz +``` + +查看镜像: + +```Bash +docker images +``` + +![](https://alioss.timecho.com/docs/img/%E9%95%9C%E5%83%8F%E5%8A%A0%E8%BD%BD.png) + +### 编写docker-compose的yml文件 + +这里我们以把IoTDB安装目录和yml文件统一放在/docker-iotdb文件夹下为例: + +文件目录结构为:`/docker-iotdb/iotdb`,`/docker-iotdb/confignode.yml`,`/docker-iotdb/datanode.yml` + +```Bash +docker-iotdb: +├── confignode.yml #confignode的yml文件 +├── datanode.yml #datanode的yml文件 +└── iotdb #IoTDB安装目录 +``` + +在每台服务器上都要编写2个yml文件,即`confignode.yml`和`datanode.yml`,yml示例如下: + +**confignode.yml:** + +```Bash +#confignode.yml +version: "3" +services: + iotdb-confignode: + image: iotdb-enterprise:1.3.2.3-standalone #使用的镜像 + hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + container_name: iotdb-confignode + command: ["bash", "-c", "entrypoint.sh confignode"] + restart: always + environment: + - cn_internal_address=iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-1:10710 #默认第一台为seed节点 + - schema_replication_factor=3 #元数据副本数 + - data_replication_factor=2 #数据副本数 + privileged: true + volumes: + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + network_mode: "host" #使用host网络 +``` + +**datanode.yml:** + +```Bash +#datanode.yml +version: "3" +services: + iotdb-datanode: + image: iotdb-enterprise:1.3.2.3-standalone #使用的镜像 + hostname: iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + container_name: iotdb-datanode + command: ["bash", "-c", "entrypoint.sh datanode"] + restart: always + ports: + - "6667:6667" + privileged: true + environment: + - dn_rpc_address=iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + - dn_internal_address=iotdb-1|iotdb-2|iotdb-3 #根据实际情况选择,三选一 + - dn_seed_config_node=iotdb-1:10710 #默认第1台为seed节点 + - dn_rpc_port=6667 + - dn_internal_port=10730 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - schema_replication_factor=3 #元数据副本数 + - data_replication_factor=2 #数据副本数 + volumes: + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + network_mode: "host" #使用host网络 +``` + +### 首次启动confignode + +先在3台服务器上分别启动confignode, 用来获取机器码,注意启动顺序,先启动第1台iotdb-1,再启动iotdb-2和iotdb-3。 + +```Bash +cd /docker-iotdb +docker-compose -f confignode.yml up -d #后台启动 +``` + +### 申请激活 + +- 首次启动3个confignode后,在每个物理机目录`/docker-iotdb/iotdb/activation`下都会生成一个`system_info`文件,将3个服务器的`system_info`文件拷贝给天谋工作人员; + + ![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB1.png) + +- 将3个license文件分别放入对应的ConfigNode节点的`/docker-iotdb/iotdb/activation`文件夹下; + + 
![](https://alioss.timecho.com/docs/img/%E5%8D%95%E6%9C%BA-%E7%94%B3%E8%AF%B7%E6%BF%80%E6%B4%BB2.png) + +- license放入对应的activation文件夹后,confignode会自动激活,不用重启confignode + +### 启动datanode + +在3台服务器上分别启动datanode + +```Bash +cd /docker-iotdb +docker-compose -f datanode.yml up -d #后台启动 +``` + +![](https://alioss.timecho.com/docs/img/%E9%9B%86%E7%BE%A4%E7%89%88-dn%E5%90%AF%E5%8A%A8.png) + +### 验证部署 + +- 查看日志,有如下字样,表示datanode启动成功 + + ```Bash + docker logs -f iotdb-datanode #查看日志命令 + 2024-07-20 16:50:48,937 [main] INFO o.a.i.db.service.DataNode:231 - Congratulations, IoTDB DataNode is set up successfully. Now, enjoy yourself! + ``` + + ![](https://alioss.timecho.com/docs/img/dn%E5%90%AF%E5%8A%A8.png) + +- 进入任意一个容器,查看服务运行状态及激活信息 + + 查看启动的容器 + + ```Bash + docker ps + ``` + + ![](https://alioss.timecho.com/docs/img/%E6%9F%A5%E7%9C%8B%E5%AE%B9%E5%99%A8.png) + + 进入容器,通过cli登录数据库,使用`show cluster`命令查看服务状态及激活状态 + + ```Bash + docker exec -it iotdb-datanode /bin/bash #进入容器 + ./start-cli.sh -h iotdb-1 #登录数据库 + IoTDB> show cluster #查看状态 + ``` + + 可以看到服务都是running,激活状态显示已激活。 + + ![](https://alioss.timecho.com/docs/img/%E9%9B%86%E7%BE%A4-%E6%BF%80%E6%B4%BB.png) + +### 映射/conf目录(可选) + +后续如果想在物理机中直接修改配置文件,可以把容器中的/conf文件夹映射出来,分三步: + +步骤一:在3台服务器中分别拷贝容器中的/conf目录到`/docker-iotdb/iotdb/conf` + +```Bash +docker cp iotdb-confignode:/iotdb/conf /docker-iotdb/iotdb/conf +或者 +docker cp iotdb-datanode:/iotdb/conf /docker-iotdb/iotdb/conf +``` + +步骤二:在3台服务器的`confignode.yml`和`datanode.yml`中添加/conf目录映射 + +```Bash +#confignode.yml + volumes: + - ./iotdb/conf:/iotdb/conf #增加这个/conf文件夹的映射 + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro + +#datanode.yml + volumes: + - ./iotdb/conf:/iotdb/conf #增加这个/conf文件夹的映射 + - ./iotdb/activation:/iotdb/activation + - ./iotdb/data:/iotdb/data + - ./iotdb/logs:/iotdb/logs + - /usr/sbin/dmidecode:/usr/sbin/dmidecode:ro + - /dev/mem:/dev/mem:ro +``` + +步骤三:在3台服务器上重新启动IoTDB + +```Bash +cd /docker-iotdb +docker-compose -f confignode.yml up -d +docker-compose -f datanode.yml up -d +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md new file mode 100644 index 00000000..dbf590a8 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md @@ -0,0 +1,163 @@ + +# 双活版部署 + +## 什么是双活版? + +双活通常是指两个独立的单机(或集群),实时进行镜像同步,它们的配置完全独立,可以同时接收外界的写入,每一个独立的单机(或集群)都可以将写入到自己的数据同步到另一个单机(或集群)中,两个单机(或集群)的数据可达到最终一致。 + +- 两个单机(或集群)可构成一个高可用组:当其中一个单机(或集群)停止服务时,另一个单机(或集群)不会受到影响。当停止服务的单机(或集群)再次启动时,另一个单机(或集群)会将新写入的数据同步过来。业务可以绑定两个单机(或集群)进行读写,从而达到高可用的目的。 +- 双活部署方案允许在物理节点少于 3 的情况下实现高可用,在部署成本上具备一定优势。同时可以通过电力、网络的双环网,实现两套单机(或集群)的物理供应隔离,保障运行的稳定性。 +- 目前双活能力为企业版功能。 + +![](https://alioss.timecho.com/docs/img/%E5%8F%8C%E6%B4%BB%E5%90%8C%E6%AD%A5.png) + +## 注意事项 + +1. 部署时推荐优先使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在目标服务器上配置`/etc/hosts`,如本机ip是192.168.1.3,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的`cn_internal_address`、`dn_internal_address`。 + + ```Bash + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +2. 有些参数首次启动后不能修改,请参考下方的"安装步骤"章节来进行设置。 + +3. 
推荐部署监控面板,可以对重要运行指标进行监控,随时掌握数据库运行状态,监控面板可以联系商务获取,部署监控面板步骤可以参考[文档](https://www.timecho.com/docs/zh/UserGuide/latest/Deployment-and-Maintenance/Monitoring-panel-deployment.html) + +## 安装步骤 + +我们以两台单机A和B构建的双活版IoTDB为例,A和B的ip分别是192.168.1.3 和 192.168.1.4 ,这里用hostname来表示不同的主机,规划如下: + +| 机器 | 机器ip | 主机名 | +| ---- | ----------- | ------- | +| A | 192.168.1.3 | iotdb-1 | +| B | 192.168.1.4 | iotdb-2 | + +### Step1:分别安装两套独立的 IoTDB + +在2个机器上分别安装 IoTDB,单机版部署文档可参考[文档](../Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md),集群版部署文档可参考[文档](../Deployment-and-Maintenance/Cluster-Deployment_timecho.md)。**推荐 A、B 集群的各项配置保持一致,以实现最佳的双活效果。** + +### Step2:在机器A上创建数据同步任务至机器B + +- 在机器A上创建数据同步流程,即机器A上的数据自动同步到机器B,使用sbin目录下的cli工具连接A上的IoTDB数据库: + + ```Bash + ./sbin/start-cli.sh -h iotdb-1 + ``` + +- 创建并启动数据同步命令,SQL 如下: + + ```Bash + create pipe AB + with source ( + 'source.forwarding-pipe-requests' = 'false' + ) + with sink ( + 'sink'='iotdb-thrift-sink', + 'sink.ip'='iotdb-2', + 'sink.port'='6667' + ) + ``` + +- 注意:为了避免数据无限循环,需要将A和B上的参数`source.forwarding-pipe-requests` 均设置为 `false`,表示不转发从另一pipe传输而来的数据。 + +### Step3:在机器B上创建数据同步任务至机器A + + - 在机器B上创建数据同步流程,即机器B上的数据自动同步到机器A,使用sbin目录下的cli工具连接B上的IoTDB数据库: + + ```Bash + ./sbin/start-cli.sh -h iotdb-2 + ``` + + 创建并启动pipe,SQL 如下: + + ```Bash + create pipe BA + with source ( + 'source.forwarding-pipe-requests' = 'false' + ) + with sink ( + 'sink'='iotdb-thrift-sink', + 'sink.ip'='iotdb-1', + 'sink.port'='6667' + ) + ``` + +- 注意:为了避免数据无限循环,需要将A和B上的参数`source.forwarding-pipe-requests` 均设置为 `false`,表示不转发从另一pipe传输而来的数据。 + +### Step4:验证部署 + +上述数据同步流程创建完成后,即可启动双活集群。 + +#### 检查集群运行状态 + +```Bash +#在2个节点分别执行show cluster命令检查IoTDB服务状态 +show cluster +``` + +**机器A**: + +![](https://alioss.timecho.com/docs/img/%E5%8F%8C%E6%B4%BB-A.png) + +**机器B**: + +![](https://alioss.timecho.com/docs/img/%E5%8F%8C%E6%B4%BB-B.png) + +确保每一个 ConfigNode 和 DataNode 都处于 Running 状态。 + +#### 检查同步状态 + +- 机器A上检查同步状态 + +```Bash +show pipes +``` + +![](https://alioss.timecho.com/docs/img/show%20pipes-A.png) + +- 机器B上检查同步状态 + +```Bash +show pipes +``` + +![](https://alioss.timecho.com/docs/img/show%20pipes-B.png) + +确保每一个 pipe 都处于 RUNNING 状态。 + +### Step5:停止双活版 IoTDB + +- 在机器A的执行下列命令: + + ```SQL + ./sbin/start-cli.sh -h iotdb-1 #登录cli + IoTDB> stop pipe AB #停止数据同步流程 + ./sbin/stop-standalone.sh #停止数据库服务 + ``` + +- 在机器B的执行下列命令: + + ```SQL + ./sbin/start-cli.sh -h iotdb-2 #登录cli + IoTDB> stop pipe BA #停止数据同步流程 + ./sbin/stop-standalone.sh #停止数据库服务 + ``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Environment-Requirements.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Environment-Requirements.md new file mode 100644 index 00000000..75be11d6 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Environment-Requirements.md @@ -0,0 +1,205 @@ + +# 系统配置 + +## 磁盘阵列 + +### 配置建议 + +IoTDB对磁盘阵列配置没有严格运行要求,推荐使用多个磁盘阵列存储IoTDB的数据,以达到多个磁盘阵列并发写入的目标,配置可参考以下建议: + +1. 物理环境 + 系统盘:建议使用2块磁盘做Raid1,仅考虑操作系统自身所占空间即可,可以不为IoTDB预留系统盘空间 + 数据盘 + 建议做Raid,在磁盘维度进行数据保护 + 建议为IoTDB提供多块磁盘(1-6块左右)或磁盘组(不建议将所有磁盘做成一个磁盘阵列,会影响 IoTDB的性能上限) +2. 
虚拟环境 + 建议挂载多块硬盘(1-6块左右) + +### 配置示例 + +- 示例1,4块3.5英寸硬盘 + +因服务器安装的硬盘较少,直接做Raid5即可,无需其他配置。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| ----------- | -------- | -------- | --------- | -------- | +| 系统/数据盘 | RAID5 | 4 | 允许坏1块 | 3 | + +- 示例2,12块3.5英寸硬盘 + +服务器配置12块3.5英寸盘。 + +前2块盘推荐Raid1作系统盘,2组数据盘可分为2组Raid5,每组5块盘实际可用4块。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| -------- | -------- | -------- | --------- | -------- | +| 系统盘 | RAID1 | 2 | 允许坏1块 | 1 | +| 数据盘 | RAID5 | 5 | 允许坏1块 | 4 | +| 数据盘 | RAID5 | 5 | 允许坏1块 | 4 | + +- 示例3,24块2.5英寸盘 + +服务器配置24块2.5英寸盘。 + +前2块盘推荐Raid1作系统盘,后面可分为3组Raid5,每组7块盘实际可用6块。剩余一块可闲置或存储写前日志使用。 + +推荐配置如下: + +| **使用分类** | **Raid类型** | **硬盘数量** | **冗余** | **可用容量** | +| -------- | -------- | -------- | --------- | -------- | +| 系统盘 | RAID1 | 2 | 允许坏1块 | 1 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | RAID5 | 7 | 允许坏1块 | 6 | +| 数据盘 | NoRaid | 1 | 损坏丢失 | 1 | + +## 操作系统 + +### 版本要求 + +IoTDB支持Linux、Windows、MacOS等操作系统,同时企业版支持龙芯、飞腾、鲲鹏等国产 CPU,支持中标麒麟、银河麒麟、统信、凝思等国产服务器操作系统。 + +### 硬盘分区 + +- 建议使用默认的标准分区方式,不推荐LVM扩展和硬盘加密。 +- 系统盘只需满足操作系统的使用空间即可,不需要为IoTDB预留空间。 +- 每个硬盘组只对应一个分区即可,数据盘(里面有多个磁盘组,对应raid)不用再额外分区,所有空间给IoTDB使用。 + +建议的磁盘分区方式如下表所示。 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| **硬盘分类** | **磁盘组** | **对应盘符** | **大小** | **文件系统类型** |
| ------------ | ---------- | ------------ | ------------------ | ---------------- |
| 系统盘       | 磁盘组0    | /boot        | 1GB                | 默认             |
|              |            | /            | 磁盘组剩余全部空间 | 默认             |
| 数据盘       | 磁盘组1    | /data1       | 磁盘组1全部空间    | 默认             |
|              | 磁盘组2    | /data2       | 磁盘组2全部空间    | 默认             |
|              | ......     | ......       | ......             | ......           |
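
作为参考,下面给出把一个数据盘组按单一分区格式化并挂载到 /data1 的命令示意(假设该磁盘组对应的块设备为 /dev/sdb,实际盘符请先用 lsblk 确认,文件系统也可按需换成 xfs 等):

```Bash
# 确认数据盘对应的块设备(此处假设为 /dev/sdb)
lsblk
# 整盘建立单一分区并格式化为 ext4
parted -s /dev/sdb mklabel gpt mkpart primary 0% 100%
mkfs.ext4 /dev/sdb1
# 挂载到 /data1,并写入 /etc/fstab 实现开机自动挂载
mkdir -p /data1
mount /dev/sdb1 /data1
echo "/dev/sdb1 /data1 ext4 defaults 0 0" >> /etc/fstab
```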
+ +### 网络配置 + +1. 关闭防火墙 + +```Bash +# 查看防火墙 +systemctl status firewalld +# 关闭防火墙 +systemctl stop firewalld +# 永久关闭防火墙 +systemctl disable firewalld +``` + +2. 保证所需端口不被占用 + +(1)集群占用端口的检查:在集群默认配置中,ConfigNode 会占用端口 10710 和 10720,DataNode 会占用端口 6667、10730、10740、10750 、10760、9090、9190、3000请确保这些端口未被占用。检查方式如下: + +```Bash +lsof -i:6667 或 netstat -tunp | grep 6667 +lsof -i:10710 或 netstat -tunp | grep 10710 +lsof -i:10720 或 netstat -tunp | grep 10720 +#如果命令有输出,则表示该端口已被占用。 +``` + +(2)集群部署工具占用端口的检查:使用集群管理工具opskit安装部署集群时,需打开SSH远程连接服务配置,并开放22号端口。 + +```Bash +yum install openssh-server #安装ssh服务 +systemctl start sshd #启用22号端口 +``` + +3. 保证服务器之间的网络相互连通 + +### 其他配置 + +1. 将系统 swap 优先级降至最低 + +```Bash +echo "vm.swappiness = 0">> /etc/sysctl.conf +# 一起执行 swapoff -a 和 swapon -a 命令是为了将 swap 里的数据转储回内存,并清空 swap 里的数据。 +# 不可省略 swappiness 设置而只执行 swapoff -a;否则,重启后 swap 会再次自动打开,使得操作失效。 +swapoff -a && swapon -a +# 在不重启的情况下使配置生效。 +sysctl -p +# swap的已使用内存变为0 +free -m +``` + +2. 设置系统最大打开文件数为 65535,以避免出现 "太多的打开文件 "的错误。 + +```Bash +#查看当前限制 +ulimit -n +# 临时修改 +ulimit -n 65535 +# 永久修改 +echo "* soft nofile 65535" >> /etc/security/limits.conf +echo "* hard nofile 65535" >> /etc/security/limits.conf +#退出当前终端会话后查看,预期显示65535 +ulimit -n +``` + +## 软件依赖 + +安装 Java 运行环境 ,Java 版本 >= 1.8,请确保已设置 jdk 环境变量。(V1.3.2.2 及之上版本推荐直接部署JDK17,老版本JDK部分场景下性能有问题,且datanode会出现stop不掉的问题) + +```Bash + #下面以在centos7,使用JDK-17安装为例: + tar -zxvf jdk-17_linux-x64_bin.tar #解压JDK文件 + Vim ~/.bashrc #配置JDK环境 + { export JAVA_HOME=/usr/lib/jvm/jdk-17.0.9 + export PATH=$JAVA_HOME/bin:$PATH + } #添加JDK环境变量 + source ~/.bashrc #配置环境生效 + java -version #检查JDK环境 +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_apache.md new file mode 100644 index 00000000..80e7cb01 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_apache.md @@ -0,0 +1,44 @@ + +# 安装包获取 +## 安装包获取方式 + +安装包可直接在Apache IoTDB官网获取:https://iotdb.apache.org/zh/Download/ + +## 安装包结构 + +解压后安装包(`apache-iotdb--all-bin.zip`),安装包解压后目录结构如下: + +| **目录** | **类型** | **说明** | +| ---------------- | -------- | ------------------------------------------------------------ | +| conf | 文件夹 | 配置文件目录,包含 ConfigNode、DataNode、JMX 和 logback 等配置文件 | +| data | 文件夹 | 默认的数据文件目录,包含 ConfigNode 和 DataNode 的数据文件。(启动程序后才会生成该目录) | +| lib | 文件夹 | IoTDB可执行库文件目录 | +| licenses | 文件夹 | 开源社区证书文件目录 | +| logs | 文件夹 | 默认的日志文件目录,包含 ConfigNode 和 DataNode 的日志文件(启动程序后才会生成该目录) | +| sbin | 文件夹 | 主要脚本目录,包含启、停等脚本等 | +| tools | 文件夹 | 系统周边工具目录 | +| ext | 文件夹 | pipe,trigger,udf插件的相关文件(需要使用时用户自行创建) | +| LICENSE | 文件 | 证书 | +| NOTICE | 文件 | 提示 | +| README_ZH\.md | 文件 | markdown格式的中文版说明 | +| README\.md | 文件 | 使用说明 | +| RELEASE_NOTES\.md | 文件 | 版本说明 | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_timecho.md new file mode 100644 index 00000000..f824da36 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/IoTDB-Package_timecho.md @@ -0,0 +1,46 @@ + +# 安装包获取 + +## 企业版获取方式 + +企业版安装包可通过产品试用申请,或直接联系与您对接的商务人员获取。 + +## 安装包结构 + +解压后安装包(iotdb-enterprise-{version}-bin.zip),安装包解压后目录结构如下: + +| **目录** | **类型** | **说明** | +| ---------------- | -------- | ------------------------------------------------------------ | +| activation | 文件夹 | 激活文件所在目录,包括生成的机器码以及从商务侧获取的企业版激活码(启动ConfigNode后才会生成该目录,即可获取激活码) | +| conf | 文件夹 | 配置文件目录,包含 
ConfigNode、DataNode、JMX 和 logback 等配置文件 | +| data | 文件夹 | 默认的数据文件目录,包含 ConfigNode 和 DataNode 的数据文件。(启动程序后才会生成该目录) | +| lib | 文件夹 | IoTDB可执行库文件目录 | +| licenses | 文件夹 | 开源社区证书文件目录 | +| logs | 文件夹 | 默认的日志文件目录,包含 ConfigNode 和 DataNode 的日志文件(启动程序后才会生成该目录) | +| sbin | 文件夹 | 主要脚本目录,包含启、停等脚本等 | +| tools | 文件夹 | 系统周边工具目录 | +| ext | 文件夹 | pipe,trigger,udf插件的相关文件(需要使用时用户自行创建) | +| LICENSE | 文件 | 证书 | +| NOTICE | 文件 | 提示 | +| README_ZH\.md | 文件 | markdown格式的中文版说明 | +| README\.md | 文件 | 使用说明 | +| RELEASE_NOTES\.md | 文件 | 版本说明 | diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md new file mode 100644 index 00000000..c7fba837 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Monitoring-panel-deployment.md @@ -0,0 +1,682 @@ + +# 监控面板部署 + +IoTDB配套监控面板是IoTDB企业版配套工具之一。它旨在解决IoTDB及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。本文将以常见的3C3D集群(3个Confignode和3个Datanode)为例,为您介绍如何在IoTDB的实例中开启系统监控模块,并且使用Prometheus + Grafana的方式完成对系统监控指标的可视化。 + +## 安装准备 + +1. 安装 IoTDB:需先安装IoTDB V1.0 版本及以上企业版,您可联系商务或技术支持获取 +2. 获取 IoTDB 监控面板安装包:基于企业版 IoTDB 的数据库监控面板,您可联系商务或技术支持获取 + +## 安装步骤 + +### 步骤一:IoTDB开启监控指标采集 + +1. 打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。 + +| 配置项 | 所在配置文件 | 配置说明 | +| :--------------------------------- | :------------------------------- | :----------------------------------------------------------- | +| cn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | +| cn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | +| cn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可 | +| dn_metric_reporter_list | conf/iotdb-system.properties | 将配置项取消注释,值设置为PROMETHEUS | +| dn_metric_level | conf/iotdb-system.properties | 将配置项取消注释,值设置为IMPORTANT | +| dn_metric_prometheus_reporter_port | conf/iotdb-system.properties | 将配置项取消注释,可默认设置为9092,如设置其他端口,不与其他端口冲突即可 | + +以3C3D集群为例,需要修改的监控配置如下: + +| 节点ip | 主机名 | 集群角色 | 配置文件路径 | 配置项 | +| ----------- | ------- | ---------- | -------------------------------- | ------------------------------------------------------------ | +| 192.168.1.3 | iotdb-1 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.4 | iotdb-2 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.5 | iotdb-3 | confignode | conf/iotdb-system.properties | cn_metric_reporter_list=PROMETHEUS cn_metric_level=IMPORTANT cn_metric_prometheus_reporter_port=9091 | +| 192.168.1.3 | iotdb-1 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.4 | iotdb-2 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | +| 192.168.1.5 | iotdb-3 | datanode | conf/iotdb-system.properties | dn_metric_reporter_list=PROMETHEUS dn_metric_level=IMPORTANT dn_metric_prometheus_reporter_port=9092 | + +2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务: + +```shell +./sbin/stop-standalone.sh #先停止confignode和datanode +./sbin/start-confignode.sh -d #启动confignode +./sbin/start-datanode.sh -d #启动datanode +``` + +3. 
重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功: + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8.PNG) + +### 步骤二:安装、配置Prometheus + +> 此处以prometheus安装在服务器192.168.1.3为例。 + +1. 下载 Prometheus 安装包,要求安装 V2.30.3 版本及以上,可前往 Prometheus 官网下载(https://prometheus.io/docs/introduction/first_steps/) +2. 解压安装包,进入解压后的文件夹: + +```Shell +tar xvfz prometheus-*.tar.gz +cd prometheus-* +``` + +3. 修改配置。修改配置文件prometheus.yml如下 + 1. 新增confignode任务收集ConfigNode的监控数据 + 2. 新增datanode任务收集DataNode的监控数据 + +```shell +global: + scrape_interval: 15s + evaluation_interval: 15s +scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] + honor_labels: true +``` + +4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: + +```Shell +./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d +``` + +5. 确认启动成功。在浏览器中输入 http://192.168.1.3:9090,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。 + +
+ + +
+ + + +6. 点击Targets中左侧链接可以跳转到网页监控,查看相应节点的监控信息: + +![](https://alioss.timecho.com/docs/img/%E8%8A%82%E7%82%B9%E7%9B%91%E6%8E%A7.png) + +### 步骤三:安装grafana并配置数据源 + +> 此处以Grafana安装在服务器192.168.1.3为例。 + +1. 下载 Grafana 安装包,要求安装 V8.4.2 版本及以上,可以前往Grafana官网下载(https://grafana.com/grafana/download) +2. 解压并进入对应文件夹 + +```Shell +tar -zxvf grafana-*.tar.gz +cd grafana-* +``` + +3. 启动Grafana: + +```Shell +./bin/grafana-server web +``` + +4. 登录Grafana。在浏览器中输入 http://192.168.1.3:3000(或修改后的端口),进入Grafana,默认初始用户名和密码均为 admin。 + +5. 配置数据源。在Connections中找到Data sources,新增一个data source并配置Data Source为Prometheus + +![](https://alioss.timecho.com/docs/img/%E6%B7%BB%E5%8A%A0%E9%85%8D%E7%BD%AE.png) + +在配置Data Source时注意Prometheus所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 + +![](https://alioss.timecho.com/docs/img/%E9%85%8D%E7%BD%AE%E6%88%90%E5%8A%9F.png) + +### 步骤四:导入IoTDB Grafana看板 + +1. 进入Grafana,选择Dashboards: + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E9%80%89%E6%8B%A9.png) + +2. 点击右侧 Import 按钮 + + ![](https://alioss.timecho.com/docs/img/Import%E6%8C%89%E9%92%AE.png) + +3. 使用upload json file的方式导入Dashboard + + ![](https://alioss.timecho.com/docs/img/%E5%AF%BC%E5%85%A5Dashboard.png) + +4. 选择IoTDB监控面板中其中一个面板的json文件,这里以选择 Apache IoTDB ConfigNode Dashboard为例(监控面板安装包获取参见本文【安装准备】): + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E9%9D%A2%E6%9D%BF.png) + +5. 选择数据源为Prometheus,然后点击Import + + ![](https://alioss.timecho.com/docs/img/%E9%80%89%E6%8B%A9%E6%95%B0%E6%8D%AE%E6%BA%90.png) + +6. 之后就可以看到导入的Apache IoTDB ConfigNode Dashboard监控面板 + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF.png) + +7. 同样地,我们可以导入Apache IoTDB DataNode Dashboard、Apache Performance Overview Dashboard、Apache System Overview Dashboard,可看到如下的监控面板: + +
+ + + +
+ +8. 至此,IoTDB监控面板就全部导入完成了,现在可以随时查看监控信息了。 + + ![](https://alioss.timecho.com/docs/img/%E9%9D%A2%E6%9D%BF%E6%B1%87%E6%80%BB.png) + +## 附录、监控指标详解 + +### 系统面板(System Dashboard) + +该面板展示了当前系统CPU、内存、磁盘、网络资源的使用情况已经JVM的部分状况。 + +#### CPU + +- CPU Core:CPU 核数 +- CPU Load: + - System CPU Load:整个系统在采样时间内 CPU 的平均负载和繁忙程度 + - Process CPU Load:IoTDB 进程在采样时间内占用的 CPU 比例 +- CPU Time Per Minute:系统每分钟内所有进程的 CPU 时间总和 + +#### Memory + +- System Memory:当前系统内存的使用情况。 + - Commited vm size: 操作系统分配给正在运行的进程使用的虚拟内存的大小。 + - Total physical memory:系统可用物理内存的总量。 + - Used physical memory:系统已经使用的内存总量。包含进程实际使用的内存量和操作系统 buffers/cache 占用的内存。 +- System Swap Memory:交换空间(Swap Space)内存用量。 +- Process Memory:IoTDB 进程使用内存的情况。 + - Max Memory:IoTDB 进程能够从操作系统那里最大请求到的内存量。(datanode-env/confignode-env 配置文件中配置分配的内存大小) + - Total Memory:IoTDB 进程当前已经从操作系统中请求到的内存总量。 + - Used Memory:IoTDB 进程当前已经使用的内存总量。 + +#### Disk + +- Disk Space: + - Total disk space:IoTDB 可使用的最大磁盘空间。 + - Used disk space:IoTDB 已经使用的磁盘空间。 +- Log Number Per Minute:采样时间内每分钟 IoTDB 各级别日志数量的平均值。 +- File Count:IoTDB 相关文件数量 + - all:所有文件数量 + - TsFile:TsFile 数量 + - seq:顺序 TsFile 数量 + - unseq:乱序 TsFile 数量 + - wal:WAL 文件数量 + - cross-temp:跨空间合并 temp 文件数量 + - inner-seq-temp:顺序空间内合并 temp 文件数量 + - innser-unseq-temp:乱序空间内合并 temp 文件数量 + - mods:墓碑文件数量 +- Open File Count:系统打开的文件句柄数量 +- File Size:IoTDB 相关文件的大小。各子项分别是对应文件的大小。 +- Disk I/O Busy Rate:等价于 iostat 中的 %util 指标,一定程度上反映磁盘的繁忙程度。各子项分别是对应磁盘的指标。 +- Disk I/O Throughput:系统各磁盘在一段时间 I/O Throughput 的平均值。各子项分别是对应磁盘的指标。 +- Disk I/O Ops:等价于 iostat 中的 r/s 、w/s、rrqm/s、wrqm/s 四个指标,指的是磁盘每秒钟进行 I/O 的次数。read 和 write 指的是磁盘执行单次 I/O 的次数,由于块设备有相应的调度算法,在某些情况下可以将多个相邻的 I/O 合并为一次进行,merged-read 和 merged-write 指的是将多个 I/O 合并为一个 I/O 进行的次数。 +- Disk I/O Avg Time:等价于 iostat 的 await,即每个 I/O 请求的平均时延。读和写请求分开记录。 +- Disk I/O Avg Size:等价于 iostat 的 avgrq-sz,反映了每个 I/O 请求的大小。读和写请求分开记录。 +- Disk I/O Avg Queue Size:等价于 iostat 中的 avgqu-sz,即 I/O 请求队列的平均长度。 +- I/O System Call Rate:进程调用读写系统调用的频率,类似于 IOPS。 +- I/O Throughput:进程进行 I/O 的吞吐量,分为 actual_read/write 和 attempt_read/write 两类。actual read 和 actual write 指的是进程实际导致块设备进行 I/O 的字节数,不包含被 Page Cache 处理的部分。 + +#### JVM + +- GC Time Percentage:节点 JVM 在过去一分钟的时间窗口内,GC 耗时所占的比例 +- GC Allocated/Promoted Size Detail: 节点 JVM 平均每分钟晋升到老年代的对象大小,新生代/老年代和非分代新申请的对象大小 +- GC Data Size Detail:节点 JVM 长期存活的对象大小和对应代际允许的最大值 +- Heap Memory:JVM 堆内存使用情况。 + - Maximum heap memory:JVM 最大可用的堆内存大小。 + - Committed heap memory:JVM 已提交的堆内存大小。 + - Used heap memory:JVM 已经使用的堆内存大小。 + - PS Eden Space:PS Young 区的大小。 + - PS Old Space:PS Old 区的大小。 + - PS Survivor Space:PS Survivor 区的大小。 + - ...(CMS/G1/ZGC 等) +- Off Heap Memory:堆外内存用量。 + - direct memory:堆外直接内存。 + - mapped memory:堆外映射内存。 +- GC Number Per Minute:节点 JVM 平均每分钟进行垃圾回收的次数,包括 YGC 和 FGC +- GC Time Per Minute:节点 JVM 平均每分钟进行垃圾回收的耗时,包括 YGC 和 FGC +- GC Number Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的次数,包括 YGC 和 FGC +- GC Time Per Minute Detail:节点 JVM 平均每分钟由于不同 cause 进行垃圾回收的耗时,包括 YGC 和 FGC +- Time Consumed Of Compilation Per Minute:每分钟 JVM 用于编译的总时间 +- The Number of Class: + - loaded:JVM 目前已经加载的类的数量 + - unloaded:系统启动至今 JVM 卸载的类的数量 +- The Number of Java Thread:IoTDB 目前存活的线程数。各子项分别为各状态的线程数。 + +#### Network + +eno 指的是到公网的网卡,lo 是虚拟网卡。 + +- Net Speed:网卡发送和接收数据的速度 +- Receive/Transmit Data Size:网卡发送或者接收的数据包大小,自系统重启后算起 +- Packet Speed:网卡发送和接收数据包的速度,一次 RPC 请求可以对应一个或者多个数据包 +- Connection Num:当前选定进程的 socket 连接数(IoTDB只有 TCP) + +### 整体性能面板(Performance Overview Dashboard) + +#### Cluster Overview + +- Total CPU Core: 集群机器 CPU 总核数 +- DataNode CPU Load: 集群各DataNode 节点的 CPU 使用率 +- 磁盘 + - Total Disk Space: 集群机器磁盘总大小 + - DataNode Disk Usage: 集群各 
DataNode 的磁盘使用率 +- Total Timeseries: 集群管理的时间序列总数(含副本),实际时间序列数需结合元数据副本数计算 +- Cluster: 集群 ConfigNode 和 DataNode 节点数量 +- Up Time: 集群启动至今的时长 +- Total Write Point Per Second: 集群每秒写入总点数(含副本),实际写入总点数需结合数据副本数分析 +- 内存 + - Total System Memory: 集群机器系统内存总大小 + - Total Swap Memory: 集群机器交换内存总大小 + - DataNode Process Memory Usage: 集群各 DataNode 的内存使用率 +- Total File Number: 集群管理文件总数量 +- Cluster System Overview: 集群机器概述,包括平均 DataNode 节点内存占用率、平均机器磁盘使用率 +- Total DataBase: 集群管理的 Database 总数(含副本) +- Total DataRegion: 集群管理的 DataRegion 总数 +- Total SchemaRegion: 集群管理的 SchemaRegion 总数 + +#### Node Overview + +- CPU Core: 节点所在机器的 CPU 核数 +- Disk Space: 节点所在机器的磁盘大小 +- Timeseries: 节点所在机器管理的时间序列数量(含副本) +- System Overview: 节点所在机器的系统概述,包括 CPU 负载、进程内存使用比率、磁盘使用比率 +- Write Point Per Second: 节点所在机器的每秒写入速度(含副本) +- System Memory: 节点所在机器的系统内存大小 +- Swap Memory: 节点所在机器的交换内存大小 +- File Number: 节点管理的文件数 + +#### Performance + +- Session Idle Time: 节点的 session 连接的总空闲时间和总忙碌时间 +- Client Connection: 节点的客户端连接情况,包括总连接数和活跃连接数 +- Time Consumed Of Operation: 节点的各类型操作耗时,包括平均值和P99 +- Average Time Consumed Of Interface: 节点的各个 thrift 接口平均耗时 +- P99 Time Consumed Of Interface: 节点的各个 thrift 接口的 P99 耗时数 +- Task Number: 节点的各项系统任务数量 +- Average Time Consumed of Task: 节点的各项系统任务的平均耗时 +- P99 Time Consumed of Task: 节点的各项系统任务的 P99 耗时 +- Operation Per Second: 节点的每秒操作数 +- 主流程 + - Operation Per Second Of Stage: 节点主流程各阶段的每秒操作数 + - Average Time Consumed Of Stage: 节点主流程各阶段平均耗时 + - P99 Time Consumed Of Stage: 节点主流程各阶段 P99 耗时 +- Schedule 阶段 + - OPS Of Schedule: 节点 schedule 阶段各子阶段每秒操作数 + - Average Time Consumed Of Schedule Stage: 节点 schedule 阶段各子阶段平均耗时 + - P99 Time Consumed Of Schedule Stage: 节点的 schedule 阶段各子阶段 P99 耗时 +- Local Schedule 各子阶段 + - OPS Of Local Schedule Stage: 节点 local schedule 各子阶段每秒操作数 + - Average Time Consumed Of Local Schedule Stage: 节点 local schedule 阶段各子阶段平均耗时 + - P99 Time Consumed Of Local Schedule Stage: 节点的 local schedule 阶段各子阶段 P99 耗时 +- Storage 阶段 + - OPS Of Storage Stage: 节点 storage 阶段各子阶段每秒操作数 + - Average Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段平均耗时 + - P99 Time Consumed Of Storage Stage: 节点 storage 阶段各子阶段 P99 耗时 +- Engine 阶段 + - OPS Of Engine Stage: 节点 engine 阶段各子阶段每秒操作数 + - Average Time Consumed Of Engine Stage: 节点的 engine 阶段各子阶段平均耗时 + - P99 Time Consumed Of Engine Stage: 节点 engine 阶段各子阶段的 P99 耗时 + +#### System + +- CPU Load: 节点的 CPU 负载 +- CPU Time Per Minute: 节点的每分钟 CPU 时间,最大值和 CPU 核数相关 +- GC Time Per Minute: 节点的平均每分钟 GC 耗时,包括 YGC 和 FGC +- Heap Memory: 节点的堆内存使用情况 +- Off Heap Memory: 节点的非堆内存使用情况 +- The Number Of Java Thread: 节点的 Java 线程数量情况 +- File Count: 节点管理的文件数量情况 +- File Size: 节点管理文件大小情况 +- Log Number Per Minute: 节点的每分钟不同类型日志情况 + +### ConfigNode 面板(ConfigNode Dashboard) + +该面板展示了集群中所有管理节点的表现情况,包括分区、节点信息、客户端连接情况统计等。 + +#### Node Overview + +- Database Count: 节点的数据库数量 +- Region + - DataRegion Count: 节点的 DataRegion 数量 + - DataRegion Current Status: 节点的 DataRegion 的状态 + - SchemaRegion Count: 节点的 SchemaRegion 数量 + - SchemaRegion Current Status: 节点的 SchemaRegion 的状态 +- System Memory: 节点的系统内存大小 +- Swap Memory: 节点的交换区内存大小 +- ConfigNodes: 节点所在集群的 ConfigNode 的运行状态 +- DataNodes: 节点所在集群的 DataNode 情况 +- System Overview: 节点的系统概述,包括系统内存、磁盘使用、进程内存以及CPU负载 + +#### NodeInfo + +- Node Count: 节点所在集群的节点数量,包括 ConfigNode 和 DataNode +- ConfigNode Status: 节点所在集群的 ConfigNode 节点的状态 +- DataNode Status: 节点所在集群的 DataNode 节点的状态 +- SchemaRegion Distribution: 节点所在集群的 SchemaRegion 的分布情况 +- SchemaRegionGroup Leader Distribution: 节点所在集群的 SchemaRegionGroup 的 Leader 分布情况 +- DataRegion Distribution: 节点所在集群的 DataRegion 的分布情况 +- DataRegionGroup Leader Distribution: 
节点所在集群的 DataRegionGroup 的 Leader 分布情况 + +#### Protocol + +- 客户端数量统计 + - Active Client Num: 节点各线程池的活跃客户端数量 + - Idle Client Num: 节点各线程池的空闲客户端数量 + - Borrowed Client Count: 节点各线程池的借用客户端数量 + - Created Client Count: 节点各线程池的创建客户端数量 + - Destroyed Client Count: 节点各线程池的销毁客户端数量 +- 客户端时间情况 + - Client Mean Active Time: 节点各线程池客户端的平均活跃时间 + - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 + - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + +#### Partition Table + +- SchemaRegionGroup Count: 节点所在集群的 Database 的 SchemaRegionGroup 的数量 +- DataRegionGroup Count: 节点所在集群的 Database 的 DataRegionGroup 的数量 +- SeriesSlot Count: 节点所在集群的 Database 的 SeriesSlot 的数量 +- TimeSlot Count: 节点所在集群的 Database 的 TimeSlot 的数量 +- DataRegion Status: 节点所在集群的 DataRegion 状态 +- SchemaRegion Status: 节点所在集群的 SchemaRegion 的状态 + +#### Consensus + +- Ratis Stage Time: 节点的 Ratis 各阶段耗时 +- Write Log Entry: 节点的 Ratis 写 Log 的耗时 +- Remote / Local Write Time: 节点的 Ratis 的远程写入和本地写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 的远程和本地写入的 QPS +- RatisConsensus Memory: 节点 Ratis 共识协议的内存使用 + +### DataNode 面板(DataNode Dashboard) + +该面板展示了集群中所有数据节点的监控情况,包含写入耗时、查询耗时、存储文件数等。 + +#### Node Overview + +- The Number Of Entity: 节点管理的实体情况 +- Write Point Per Second: 节点的每秒写入速度 +- Memory Usage: 节点的内存使用情况,包括 IoT Consensus 各部分内存占用、SchemaRegion内存总占用和各个数据库的内存占用。 + +#### Protocol + +- 节点操作耗时 + - The Time Consumed Of Operation (avg): 节点的各项操作的平均耗时 + - The Time Consumed Of Operation (50%): 节点的各项操作耗时的中位数 + - The Time Consumed Of Operation (99%): 节点的各项操作耗时的P99 +- Thrift统计 + - The QPS Of Interface: 节点各个 Thrift 接口的 QPS + - The Avg Time Consumed Of Interface: 节点各个 Thrift 接口的平均耗时 + - Thrift Connection: 节点的各类型的 Thrfit 连接数量 + - Thrift Active Thread: 节点各类型的活跃 Thrift 连接数量 +- 客户端统计 + - Active Client Num: 节点各线程池的活跃客户端数量 + - Idle Client Num: 节点各线程池的空闲客户端数量 + - Borrowed Client Count: 节点的各线程池借用客户端数量 + - Created Client Count: 节点各线程池的创建客户端数量 + - Destroyed Client Count: 节点各线程池的销毁客户端数量 + - Client Mean Active Time: 节点各线程池的客户端平均活跃时间 + - Client Mean Borrow Wait Time: 节点各线程池的客户端平均借用等待时间 + - Client Mean Idle Time: 节点各线程池的客户端平均空闲时间 + +#### Storage Engine + +- File Count: 节点管理的各类型文件数量 +- File Size: 节点管理的各类型文件大小 +- TsFile + - TsFile Total Size In Each Level: 节点管理的各级别 TsFile 文件总大小 + - TsFile Count In Each Level: 节点管理的各级别 TsFile 文件数量 + - Avg TsFile Size In Each Level: 节点管理的各级别 TsFile 文件的平均大小 +- Task Number: 节点的 Task 数量 +- The Time Consumed of Task: 节点的 Task 的耗时 +- Compaction + - Compaction Read And Write Per Second: 节点的每秒钟合并读写速度 + - Compaction Number Per Minute: 节点的每分钟合并数量 + - Compaction Process Chunk Status: 节点合并不同状态的 Chunk 的数量 + - Compacted Point Num Per Minute: 节点每分钟合并的点数 + +#### Write Performance + +- Write Cost(avg): 节点写入耗时平均值,包括写入 wal 和 memtable +- Write Cost(50%): 节点写入耗时中位数,包括写入 wal 和 memtable +- Write Cost(99%): 节点写入耗时的P99,包括写入 wal 和 memtable +- WAL + - WAL File Size: 节点管理的 WAL 文件总大小 + - WAL File Num: 节点管理的 WAL 文件数量 + - WAL Nodes Num: 节点管理的 WAL Node 数量 + - Make Checkpoint Costs: 节点创建各类型的 CheckPoint 的耗时 + - WAL Serialize Total Cost: 节点 WAL 序列化总耗时 + - Data Region Mem Cost: 节点不同的DataRegion的内存占用、当前实例的DataRegion的内存总占用、当前集群的 DataRegion 的内存总占用 + - Serialize One WAL Info Entry Cost: 节点序列化一个WAL Info Entry 耗时 + - Oldest MemTable Ram Cost When Cause Snapshot: 节点 WAL 触发 oldest MemTable snapshot 时 MemTable 大小 + - Oldest MemTable Ram Cost When Cause Flush: 节点 WAL 触发 oldest MemTable flush 时 MemTable 大小 + - Effective Info Ratio Of WALNode: 节点的不同 WALNode 的有效信息比 + - WAL Buffer + - WAL Buffer Cost: 节点 WAL flush SyncBuffer 耗时,包含同步和异步两种 + - WAL Buffer Used Ratio: 节点的 WAL Buffer 的使用率 + - WAL Buffer Entries Count: 节点的 WAL Buffer 
的条目数量 +- Flush统计 + - Flush MemTable Cost(avg): 节点 Flush 的总耗时和各个子阶段耗时的平均值 + - Flush MemTable Cost(50%): 节点 Flush 的总耗时和各个子阶段耗时的中位数 + - Flush MemTable Cost(99%): 节点 Flush 的总耗时和各个子阶段耗时的 P99 + - Flush Sub Task Cost(avg): 节点的 Flush 平均子任务耗时平均情况,包括排序、编码、IO 阶段 + - Flush Sub Task Cost(50%): 节点的 Flush 各个子任务的耗时中位数情况,包括排序、编码、IO 阶段 + - Flush Sub Task Cost(99%): 节点的 Flush 平均子任务耗时P99情况,包括排序、编码、IO 阶段 +- Pending Flush Task Num: 节点的处于阻塞状态的 Flush 任务数量 +- Pending Flush Sub Task Num: 节点阻塞的 Flush 子任务数量 +- Tsfile Compression Ratio Of Flushing MemTable: 节点刷盘 Memtable 时对应的 TsFile 压缩率 +- Flush TsFile Size Of DataRegions: 节点不同 DataRegion 的每次刷盘时对应的 TsFile 大小 +- Size Of Flushing MemTable: 节点刷盘的 Memtable 的大小 +- Points Num Of Flushing MemTable: 节点不同 DataRegion 刷盘时的点数 +- Series Num Of Flushing MemTable: 节点的不同 DataRegion 的 Memtable 刷盘时的时间序列数 +- Average Point Num Of Flushing MemChunk: 节点 MemChunk 刷盘的平均点数 + +#### Schema Engine + +- Schema Engine Mode: 节点的元数据引擎模式 +- Schema Consensus Protocol: 节点的元数据共识协议 +- Schema Region Number: 节点管理的 SchemaRegion 数量 +- Schema Region Memory Overview: 节点的 SchemaRegion 的内存数量 +- Memory Usgae per SchemaRegion: 节点 SchemaRegion 的平均内存使用大小 +- Cache MNode per SchemaRegion: 节点每个 SchemaRegion 中 cache node 个数 +- MLog Length and Checkpoint: 节点每个 SchemaRegion 的当前 mlog 的总长度和检查点位置(仅 SimpleConsensus 有效) +- Buffer MNode per SchemaRegion: 节点每个 SchemaRegion 中 buffer node 个数 +- Activated Template Count per SchemaRegion: 节点每个SchemaRegion中已激活的模版数 +- 时间序列统计 + - Timeseries Count per SchemaRegion: 节点 SchemaRegion 的平均时间序列数 + - Series Type: 节点不同类型的时间序列数量 + - Time Series Number: 节点的时间序列总数 + - Template Series Number: 节点的模板时间序列总数 + - Template Series Count per SchemaRegion: 节点每个SchemaRegion中通过模版创建的序列数 +- IMNode统计 + - Pinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点数 + - Pinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Pinned 的 IMNode 节点的内存占用大小 + - Unpinned MNode per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点数 + - Unpinned Memory per SchemaRegion: 节点每个 SchemaRegion 中 Unpinned 的 IMNode 节点的内存占用大小 + - Schema File Memory MNode Number: 节点全局 pinned 和 unpinned 的 IMNode 节点数 + - Release and Flush MNode Rate: 节点每秒 release 和 flush 的 IMNode 数量 +- Cache Hit Rate: 节点的缓存命中率 +- Release and Flush Thread Number: 节点当前活跃的 Release 和 Flush 线程数量 +- Time Consumed of Relead and Flush (avg): 节点触发 cache 释放和 buffer 刷盘耗时的平均值 +- Time Consumed of Relead and Flush (99%): 节点触发 cache 释放和 buffer 刷盘的耗时的 P99 + +#### Query Engine + +- 各阶段耗时 + - The time consumed of query plan stages(avg): 节点查询各阶段耗时的平均值 + - The time consumed of query plan stages(50%): 节点查询各阶段耗时的中位数 + - The time consumed of query plan stages(99%): 节点查询各阶段耗时的P99 +- 执行计划分发耗时 + - The time consumed of plan dispatch stages(avg): 节点查询执行计划分发耗时的平均值 + - The time consumed of plan dispatch stages(50%): 节点查询执行计划分发耗时的中位数 + - The time consumed of plan dispatch stages(99%): 节点查询执行计划分发耗时的P99 +- 执行计划执行耗时 + - The time consumed of query execution stages(avg): 节点查询执行计划执行耗时的平均值 + - The time consumed of query execution stages(50%): 节点查询执行计划执行耗时的中位数 + - The time consumed of query execution stages(99%): 节点查询执行计划执行耗时的P99 +- 算子执行耗时 + - The time consumed of operator execution stages(avg): 节点查询算子执行耗时的平均值 + - The time consumed of operator execution(50%): 节点查询算子执行耗时的中位数 + - The time consumed of operator execution(99%): 节点查询算子执行耗时的P99 +- 聚合查询计算耗时 + - The time consumed of query aggregation(avg): 节点聚合查询计算耗时的平均值 + - The time consumed of query aggregation(50%): 节点聚合查询计算耗时的中位数 + - The time consumed of query aggregation(99%): 节点聚合查询计算耗时的P99 +- 文件/内存接口耗时 + - The time consumed of query scan(avg): 
节点查询文件/内存接口耗时的平均值 + - The time consumed of query scan(50%): 节点查询文件/内存接口耗时的中位数 + - The time consumed of query scan(99%): 节点查询文件/内存接口耗时的P99 +- 资源访问数量 + - The usage of query resource(avg): 节点查询资源访问数量的平均值 + - The usage of query resource(50%): 节点查询资源访问数量的中位数 + - The usage of query resource(99%): 节点查询资源访问数量的P99 +- 数据传输耗时 + - The time consumed of query data exchange(avg): 节点查询数据传输耗时的平均值 + - The time consumed of query data exchange(50%): 节点查询数据传输耗时的中位数 + - The time consumed of query data exchange(99%): 节点查询数据传输耗时的P99 +- 数据传输数量 + - The count of Data Exchange(avg): 节点查询的数据传输数量的平均值 + - The count of Data Exchange: 节点查询的数据传输数量的分位数,包括中位数和P99 +- 任务调度数量与耗时 + - The number of query queue: 节点查询任务调度数量 + - The time consumed of query schedule time(avg): 节点查询任务调度耗时的平均值 + - The time consumed of query schedule time(50%): 节点查询任务调度耗时的中位数 + - The time consumed of query schedule time(99%): 节点查询任务调度耗时的P99 + +#### Query Interface + +- 加载时间序列元数据 + - The time consumed of load timeseries metadata(avg): 节点查询加载时间序列元数据耗时的平均值 + - The time consumed of load timeseries metadata(50%): 节点查询加载时间序列元数据耗时的中位数 + - The time consumed of load timeseries metadata(99%): 节点查询加载时间序列元数据耗时的P99 +- 读取时间序列 + - The time consumed of read timeseries metadata(avg): 节点查询读取时间序列耗时的平均值 + - The time consumed of read timeseries metadata(50%): 节点查询读取时间序列耗时的中位数 + - The time consumed of read timeseries metadata(99%): 节点查询读取时间序列耗时的P99 +- 修改时间序列元数据 + - The time consumed of timeseries metadata modification(avg): 节点查询修改时间序列元数据耗时的平均值 + - The time consumed of timeseries metadata modification(50%): 节点查询修改时间序列元数据耗时的中位数 + - The time consumed of timeseries metadata modification(99%): 节点查询修改时间序列元数据耗时的P99 +- 加载Chunk元数据列表 + - The time consumed of load chunk metadata list(avg): 节点查询加载Chunk元数据列表耗时的平均值 + - The time consumed of load chunk metadata list(50%): 节点查询加载Chunk元数据列表耗时的中位数 + - The time consumed of load chunk metadata list(99%): 节点查询加载Chunk元数据列表耗时的P99 +- 修改Chunk元数据 + - The time consumed of chunk metadata modification(avg): 节点查询修改Chunk元数据耗时的平均值 + - The time consumed of chunk metadata modification(50%): 节点查询修改Chunk元数据耗时的总位数 + - The time consumed of chunk metadata modification(99%): 节点查询修改Chunk元数据耗时的P99 +- 按照Chunk元数据过滤 + - The time consumed of chunk metadata filter(avg): 节点查询按照Chunk元数据过滤耗时的平均值 + - The time consumed of chunk metadata filter(50%): 节点查询按照Chunk元数据过滤耗时的中位数 + - The time consumed of chunk metadata filter(99%): 节点查询按照Chunk元数据过滤耗时的P99 +- 构造Chunk Reader + - The time consumed of construct chunk reader(avg): 节点查询构造Chunk Reader耗时的平均值 + - The time consumed of construct chunk reader(50%): 节点查询构造Chunk Reader耗时的中位数 + - The time consumed of construct chunk reader(99%): 节点查询构造Chunk Reader耗时的P99 +- 读取Chunk + - The time consumed of read chunk(avg): 节点查询读取Chunk耗时的平均值 + - The time consumed of read chunk(50%): 节点查询读取Chunk耗时的中位数 + - The time consumed of read chunk(99%): 节点查询读取Chunk耗时的P99 +- 初始化Chunk Reader + - The time consumed of init chunk reader(avg): 节点查询初始化Chunk Reader耗时的平均值 + - The time consumed of init chunk reader(50%): 节点查询初始化Chunk Reader耗时的中位数 + - The time consumed of init chunk reader(99%): 节点查询初始化Chunk Reader耗时的P99 +- 通过 Page Reader 构造 TsBlock + - The time consumed of build tsblock from page reader(avg): 节点查询通过 Page Reader 构造 TsBlock 耗时的平均值 + - The time consumed of build tsblock from page reader(50%): 节点查询通过 Page Reader 构造 TsBlock 耗时的中位数 + - The time consumed of build tsblock from page reader(99%): 节点查询通过 Page Reader 构造 TsBlock 耗时的P99 +- 查询通过 Merge Reader 构造 TsBlock + - The time consumed of build tsblock from merge reader(avg): 节点查询通过 Merge Reader 构造 TsBlock 耗时的平均值 + - 
The time consumed of build tsblock from merge reader(50%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的中位数 + - The time consumed of build tsblock from merge reader(99%): 节点查询通过 Merge Reader 构造 TsBlock 耗时的P99 + +#### Query Data Exchange + +查询的数据交换耗时。 + +- 通过 source handle 获取 TsBlock + - The time consumed of source handle get tsblock(avg): 节点查询通过 source handle 获取 TsBlock 耗时的平均值 + - The time consumed of source handle get tsblock(50%): 节点查询通过 source handle 获取 TsBlock 耗时的中位数 + - The time consumed of source handle get tsblock(99%): 节点查询通过 source handle 获取 TsBlock 耗时的P99 +- 通过 source handle 反序列化 TsBlock + - The time consumed of source handle deserialize tsblock(avg): 节点查询通过 source handle 反序列化 TsBlock 耗时的平均值 + - The time consumed of source handle deserialize tsblock(50%): 节点查询通过 source handle 反序列化 TsBlock 耗时的中位数 + - The time consumed of source handle deserialize tsblock(99%): 节点查询通过 source handle 反序列化 TsBlock 耗时的P99 +- 通过 sink handle 发送 TsBlock + - The time consumed of sink handle send tsblock(avg): 节点查询通过 sink handle 发送 TsBlock 耗时的平均值 + - The time consumed of sink handle send tsblock(50%): 节点查询通过 sink handle 发送 TsBlock 耗时的中位数 + - The time consumed of sink handle send tsblock(99%): 节点查询通过 sink handle 发送 TsBlock 耗时的P99 +- 回调 data block event + - The time consumed of on acknowledge data block event task(avg): 节点查询回调 data block event 耗时的平均值 + - The time consumed of on acknowledge data block event task(50%): 节点查询回调 data block event 耗时的中位数 + - The time consumed of on acknowledge data block event task(99%): 节点查询回调 data block event 耗时的P99 +- 获取 data block task + - The time consumed of get data block task(avg): 节点查询获取 data block task 耗时的平均值 + - The time consumed of get data block task(50%): 节点查询获取 data block task 耗时的中位数 + - The time consumed of get data block task(99%): 节点查询获取 data block task 耗时的 P99 + +#### Query Related Resource + +- MppDataExchangeManager: 节点查询时 shuffle sink handle 和 source handle 的数量 +- LocalExecutionPlanner: 节点可分配给查询分片的剩余内存 +- FragmentInstanceManager: 节点正在运行的查询分片上下文信息和查询分片的数量 +- Coordinator: 节点上记录的查询数量 +- MemoryPool Size: 节点查询相关的内存池情况 +- MemoryPool Capacity: 节点查询相关的内存池的大小情况,包括最大值和剩余可用值 +- DriverScheduler: 节点查询相关的队列任务数量 + +#### Consensus - IoT Consensus + +- 内存使用 + - IoTConsensus Used Memory: 节点的 IoT Consensus 的内存使用情况,包括总使用内存大小、队列使用内存大小、同步使用内存大小 +- 节点间同步情况 + - IoTConsensus Sync Index: 节点的 IoT Consensus 的 不同 DataRegion 的 SyncIndex 大小 + - IoTConsensus Overview: 节点的 IoT Consensus 的总同步差距和缓存的请求数量 + - IoTConsensus Search Index Rate: 节点 IoT Consensus 不同 DataRegion 的写入 SearchIndex 的增长速率 + - IoTConsensus Safe Index Rate: 节点 IoT Consensus 不同 DataRegion 的同步 SafeIndex 的增长速率 + - IoTConsensus LogDispatcher Request Size: 节点 IoT Consensus 不同 DataRegion 同步到其他节点的请求大小 + - Sync Lag: 节点 IoT Consensus 不同 DataRegion 的同步差距大小 + - Min Peer Sync Lag: 节点 IoT Consensus 不同 DataRegion 向不同副本的最小同步差距 + - Sync Speed Diff Of Peers: 节点 IoT Consensus 不同 DataRegion 向不同副本同步的最大差距 + - IoTConsensus LogEntriesFromWAL Rate: 节点 IoT Consensus 不同 DataRegion 从 WAL 获取日志的速率 + - IoTConsensus LogEntriesFromQueue Rate: 节点 IoT Consensus 不同 DataRegion 从 队列获取日志的速率 +- 不同执行阶段耗时 + - The Time Consumed Of Different Stages (avg): 节点 IoT Consensus 不同执行阶段的耗时的平均值 + - The Time Consumed Of Different Stages (50%): 节点 IoT Consensus 不同执行阶段的耗时的中位数 + - The Time Consumed Of Different Stages (99%): 节点 IoT Consensus 不同执行阶段的耗时的P99 + +#### Consensus - DataRegion Ratis Consensus + +- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 +- Write Log Entry: 节点 Ratis 写 Log 不同阶段的耗时 +- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的 QPS +- 
RatisConsensus Memory: 节点 Ratis 的内存使用情况 + +#### Consensus - SchemaRegion Ratis Consensus + +- Ratis Stage Time: 节点 Ratis 不同阶段的耗时 +- Write Log Entry: 节点 Ratis 写 Log 各阶段的耗时 +- Remote / Local Write Time: 节点 Ratis 在本地或者远端写入的耗时 +- Remote / Local Write QPS: 节点 Ratis 在本地或者远端写入的QPS +- RatisConsensus Memory: 节点 Ratis 内存使用情况 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Slow-Query-Management.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Slow-Query-Management.md new file mode 100644 index 00000000..7b8cf604 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Slow-Query-Management.md @@ -0,0 +1,35 @@ + + +# 慢查询管理 + +IoTDB 会将慢查询输出到单独的日志文件 log_datanode_slow_sql.log 中,并记录其执行耗时。 + +## 配置慢查询的阈值 + +IoTDB 在 `iotdb-system.properties` 提供了 `slow_query_threshold` 配置项,单位是毫秒,默认是30秒,超过此参数指定的阈值,便会被判断为慢查询,待其查询执行结束后,将其记录在 log_datanode_slow_sql.log 中。 + +## 慢查询日志示例 + +``` +2023-07-31 20:15:00,533 [pool-27-IoTDB-ClientRPC-Processor-1$20230731_121500_00003_1] INFO o.a.i.d.q.p.Coordinator:225 - Cost: 42593 ms, sql is select * from root.db.** +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md new file mode 100644 index 00000000..b08992d1 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md @@ -0,0 +1,178 @@ + +# 单机版部署 + +## 注意事项 + +1. 安装前请确认系统已参照[系统配置](./Environment-Requirements.md)准备完成。 + +2. 部署时推荐优先使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在目标服务器上配置/etc/hosts,如本机ip是192.168.1.3,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的`cn_internal_address`、dn_internal_address、dn_rpc_address。 + + ```shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. 部分参数首次启动后不能修改,请参考下方的【参数配置】章节进行设置。 + +4. 无论是在linux还是windows中,请确保IoTDB的安装路径中不含空格和中文,避免软件运行异常。 + +5. 
请注意,安装部署IoTDB时需要保持使用同一个用户进行操作,您可以: +- 使用 root 用户(推荐):使用 root 用户可以避免权限等问题。 +- 使用固定的非 root 用户: + - 使用同一用户操作:确保在启动、停止等操作均保持使用同一用户,不要切换用户。 + - 避免使用 sudo:尽量避免使用 sudo 命令,因为它会以 root 用户权限执行命令,可能会引起权限混淆或安全问题。 + +## 安装步骤 + +### 解压安装包并进入安装目录 + +```shell +unzip apache-iotdb-{version}-all-bin.zip +cd apache-iotdb-{version}-all-bin +``` + +### 参数配置 + +#### 环境脚本配置 + +- ./conf/confignode-env.sh(./conf/confignode-env.bat)配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| ----------- | :------------------------------------: | :--------: | :----------------------------------------------: | :----------: | +| MEMORY_SIZE | IoTDB ConfigNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +- ./conf/datanode-env.sh(./conf/datanode-env.bat)配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------: | :----------------------------------: | :--------: | :----------------------------------------------: | :----------: | +| MEMORY_SIZE | IoTDB DataNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +#### 系统通用配置 + +打开通用配置文件(./conf/iotdb-system.properties 文件),设置以下参数: + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :-----------------------: | :------------------------------: | :------------: | :----------------------------------------------: | :-----------------------: | +| cluster_name | 集群名称 | defaultCluster | 可根据需要设置集群名称,如无特殊需要保持默认即可 | 首次启动后不可修改 | +| schema_replication_factor | 元数据副本数,单机版此处设置为 1 | 1 | 1 | 默认1,首次启动后不可修改 | +| data_replication_factor | 数据副本数,单机版此处设置为 1 | 1 | 1 | 默认1,首次启动后不可修改 | + +#### ConfigNode 配置 + +打开ConfigNode配置文件(./conf/iotdb-system.properties文件),设置以下参数: + +| **配置项** | **说明** | **默认** | 推荐值 | **备注** | +| :-----------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------: | :----------------: | +| cn_internal_address | ConfigNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 首次启动后不能修改 | +| cn_internal_port | ConfigNode在集群内部通讯使用的端口 | 10710 | 10710 | 首次启动后不能修改 | +| cn_consensus_port | ConfigNode副本组共识协议通信使用的端口 | 10720 | 10720 | 首次启动后不能修改 | +| cn_seed_config_node | 节点注册加入集群时连接的ConfigNode 的地址,cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | 首次启动后不能修改 | + +#### DataNode 配置 + +打开DataNode配置文件 ./conf/iotdb-system.properties,设置以下参数: + +| **配置项** | **说明** | **默认** | 推荐值 | **备注** | +| :-----------------------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------: | :----------------- | +| dn_rpc_address | 客户端 RPC 服务的地址 | 0.0.0.0 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 重启服务生效 | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 6667 | 重启服务生效 | +| dn_internal_address | DataNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 首次启动后不能修改 | +| dn_internal_port | DataNode在集群内部通信使用的端口 | 10730 | 10730 | 首次启动后不能修改 | +| dn_mpp_data_exchange_port | DataNode用于接收数据流使用的端口 | 10740 | 10740 | 首次启动后不能修改 | +| dn_data_region_consensus_port | DataNode用于数据副本共识协议通信使用的端口 | 10750 | 10750 | 首次启动后不能修改 | +| dn_schema_region_consensus_port | DataNode用于元数据副本共识协议通信使用的端口 | 10760 | 10760 | 首次启动后不能修改 | +| dn_seed_config_node | 节点注册加入集群时连接的ConfigNode地址,即cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | 首次启动后不能修改 | + +> ❗️注意:VSCode Remote等编辑器无自动保存配置功能,请确保修改的文件被持久化保存,否则配置项无法生效 + +### 启动ConfigNode 节点 + +进入iotdb的sbin目录下,启动confignode + +```shell +./start-confignode.sh -d #“-d”参数将在后台进行启动 +``` +如果启动失败,请参考[常见问题](#常见问题)。 + +### 启动DataNode 节点 + + 
进入iotdb的sbin目录下,启动datanode: + +```shell +cd sbin +./start-datanode.sh -d #-d参数将在后台进行启动 +``` + +### 验证部署 + +可直接执行 ./sbin 目录下的 Cli 启动脚本: + +```shell +./start-cli.sh -h ip(本机ip或域名) -p 端口号(6667) +``` + + 成功启动后,出现如下界面显示IoTDB安装成功。 + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90%E7%89%88%E5%90%AF%E5%8A%A8%E6%88%90%E5%8A%9F.png) + +出现安装成功界面后,使用`show cluster`命令查看服务运行状态 + +当看到status都是running表示服务启动成功 + +![](https://alioss.timecho.com/docs/img/%E5%BC%80%E6%BA%90-%E5%8D%95%E6%9C%BAshow.jpeg) + +> 出现`ACTIVATED(W)`为被动激活,表示此ConfigNode没有license文件(或没有签发时间戳最新的license文件)。此时建议检查license文件是否已放入license文件夹,没有请放入license文件,若已存在license文件,可能是此节点license文件与其他节点信息不一致导致,请联系天谋工作人员重新申请. + +## 常见问题 + +1. Confignode节点启动失败 + + 步骤 1: 请查看启动日志,检查是否修改了某些首次启动后不可改的参数。 + + 步骤 2: 请查看启动日志,检查是否出现其他异常。日志中若存在异常现象,请联系天谋技术支持人员咨询解决方案。 + + 步骤 3: 如果是首次部署或者数据可删除,也可按下述步骤清理环境,重新部署后,再次启动。 + + 步骤 4: 清理环境: + + a. 结束所有 ConfigNode 和 DataNode 进程。 + + ```Bash + # 1. 停止 ConfigNode 和 DataNode 服务 + sbin/stop-standalone.sh + + # 2. 检查是否还有进程残留 + jps + # 或者 + ps -ef|gerp iotdb + + # 3. 如果有进程残留,则手动kill + kill -9 + # 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + b. 删除 data 和 logs 目录。 + + 说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + ```Bash + cd /data/iotdb + rm -rf data logs + ``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md new file mode 100644 index 00000000..449d058b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md @@ -0,0 +1,221 @@ + +# 单机版部署 + +本章将介绍如何启动IoTDB单机实例,IoTDB单机实例包括 1 个ConfigNode 和1个DataNode(即通常所说的1C1D)。 + +## 注意事项 + +1. 安装前请确认系统已参照[系统配置](./Environment-Requirements.md)准备完成。 + +2. 部署时推荐优先使用`hostname`进行IP配置,可避免后期修改主机ip导致数据库无法启动的问题。设置hostname需要在目标服务器上配置/etc/hosts,如本机ip是192.168.1.3,hostname是iotdb-1,则可以使用以下命令设置服务器的 hostname,并使用hostname配置IoTDB的`cn_internal_address`、dn_internal_address、dn_rpc_address。 + + ```shell + echo "192.168.1.3 iotdb-1" >> /etc/hosts + ``` + +3. 部分参数首次启动后不能修改,请参考下方的【参数配置】章节进行设置 + +4. 无论是在linux还是windows中,请确保IoTDB的安装路径中不含空格和中文,避免软件运行异常。 + +5. 请注意,安装部署(包括激活和使用软件)IoTDB时需要保持使用同一个用户进行操作,您可以: +- 使用 root 用户(推荐):使用 root 用户可以避免权限等问题。 +- 使用固定的非 root 用户: + - 使用同一用户操作:确保在启动、激活、停止等操作均保持使用同一用户,不要切换用户。 + - 避免使用 sudo:尽量避免使用 sudo 命令,因为它会以 root 用户权限执行命令,可能会引起权限混淆或安全问题。 + +6. 
推荐部署监控面板,可以对重要运行指标进行监控,随时掌握数据库运行状态,监控面板可以联系商务获取,部署监控面板步骤可以参考:[监控面板部署](./Monitoring-panel-deployment.md)。 + +## 安装步骤 + +### 解压安装包并进入安装目录 + +```shell +unzip iotdb-enterprise-{version}-bin.zip +cd iotdb-enterprise-{version}-bin +``` + +### 参数配置 + +#### 环境脚本配置 + +- ./conf/confignode-env.sh(./conf/confignode-env.bat)配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------: | :------------------------------------: | :--------: | :----------------------------------------------: | :----------: | +| MEMORY_SIZE | IoTDB ConfigNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +- ./conf/datanode-env.sh(./conf/datanode-env.bat)配置 + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :---------: | :----------------------------------: | :--------: | :----------------------------------------------: | :----------: | +| MEMORY_SIZE | IoTDB DataNode节点可以使用的内存总量 | 空 | 可按需填写,填写后系统会根据填写的数值来分配内存 | 重启服务生效 | + +#### 系统通用配置 + +打开通用配置文件(./conf/iotdb-system.properties 文件),设置以下参数: + +| **配置项** | **说明** | **默认值** | **推荐值** | 备注 | +| :-----------------------: | :------------------------------: | :------------: | :----------------------------------------------: | :-----------------------: | +| cluster_name | 集群名称 | defaultCluster | 可根据需要设置集群名称,如无特殊需要保持默认即可 | 首次启动后不可修改 | +| schema_replication_factor | 元数据副本数,单机版此处设置为 1 | 1 | 1 | 默认1,首次启动后不可修改 | +| data_replication_factor | 数据副本数,单机版此处设置为 1 | 1 | 1 | 默认1,首次启动后不可修改 | + +#### ConfigNode配置 + +打开ConfigNode配置文件(./conf/iotdb-system.properties文件),设置以下参数: + +| **配置项** | **说明** | **默认** | 推荐值 | **备注** | +| :-----------------: | :----------------------------------------------------------: | :-------------: | :----------------------------------------------: | :----------------: | +| cn_internal_address | ConfigNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 首次启动后不能修改 | +| cn_internal_port | ConfigNode在集群内部通讯使用的端口 | 10710 | 10710 | 首次启动后不能修改 | +| cn_consensus_port | ConfigNode副本组共识协议通信使用的端口 | 10720 | 10720 | 首次启动后不能修改 | +| cn_seed_config_node | 节点注册加入集群时连接的ConfigNode 的地址,cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | 首次启动后不能修改 | + +#### DataNode 配置 + +打开DataNode配置文件 ./conf/iotdb-system.properties,设置以下参数: + +| **配置项** | **说明** | **默认** | 推荐值 | **备注** | +| :------------------------------ | :----------------------------------------------------------- | :-------------- | :----------------------------------------------- | :----------------- | +| dn_rpc_address | 客户端 RPC 服务的地址 | 0.0.0.0 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 重启服务生效 | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 6667 | 重启服务生效 | +| dn_internal_address | DataNode在集群内部通讯使用的地址 | 127.0.0.1 | 所在服务器的IPV4地址或hostname,推荐使用hostname | 首次启动后不能修改 | +| dn_internal_port | DataNode在集群内部通信使用的端口 | 10730 | 10730 | 首次启动后不能修改 | +| dn_mpp_data_exchange_port | DataNode用于接收数据流使用的端口 | 10740 | 10740 | 首次启动后不能修改 | +| dn_data_region_consensus_port | DataNode用于数据副本共识协议通信使用的端口 | 10750 | 10750 | 首次启动后不能修改 | +| dn_schema_region_consensus_port | DataNode用于元数据副本共识协议通信使用的端口 | 10760 | 10760 | 首次启动后不能修改 | +| dn_seed_config_node | 节点注册加入集群时连接的ConfigNode地址,即cn_internal_address:cn_internal_port | 127.0.0.1:10710 | cn_internal_address:cn_internal_port | 首次启动后不能修改 | + +> ❗️注意:VSCode Remote等编辑器无自动保存配置功能,请确保修改的文件被持久化保存,否则配置项无法生效 + +### 启动 ConfigNode 节点 + +进入iotdb的sbin目录下,启动confignode + +```shell +./start-confignode.sh -d #“-d”参数将在后台进行启动 +``` +如果启动失败,请参考[常见问题](#常见问题)。 + +### 激活数据库 + +#### 方式一:激活文件拷贝激活 + +- 启动confignode节点后,进入activation文件夹, 将 system_info文件复制给天谋工作人员 +- 收到工作人员返回的 license文件 +- 
将license文件放入对应节点的activation文件夹下; + +#### 方式二:激活脚本激活 + +- 获取激活所需机器码,进入安装目录的sbin目录,执行激活脚本: + +```shell + cd sbin +./start-activate.sh +``` + +- 显示如下信息,请将机器码(即该串字符)复制给天谋工作人员: + +```shell +Please copy the system_info's content and send it to Timecho: +01-KU5LDFFN-PNBEHDRH +Please enter license: +``` + +- 将工作人员返回的激活码输入上一步的命令行提示处 `Please enter license:`,如下提示: + +```shell +Please enter license: +Jw+MmF+AtexsfgNGOFgTm83BgXbq0zT1+fOfPvQsLlj6ZsooHFU6HycUSEGC78eT1g67KPvkcLCUIsz2QpbyVmPLr9x1+kVjBubZPYlVpsGYLqLFc8kgpb5vIrPLd3hGLbJ5Ks8fV1WOVrDDVQq89YF2atQa2EaB9EAeTWd0bRMZ+s9ffjc/1Zmh9NSP/T3VCfJcJQyi7YpXWy5nMtcW0gSV+S6fS5r7a96PjbtE0zXNjnEhqgRzdU+mfO8gVuUNaIy9l375cp1GLpeCh6m6pF+APW1CiXLTSijK9Qh3nsL5bAOXNeob5l+HO5fEMgzrW8OJPh26Vl6ljKUpCvpTiw== +License has been stored to sbin/../activation/license +Import completed. Please start cluster and excute 'show cluster' to verify activation status +``` + +### 启动DataNode 节点 + +进入iotdb的sbin目录下,启动datanode: + +```shell +cd sbin +./start-datanode.sh -d #-d参数将在后台进行启动 +``` + +### 验证部署 + +可直接执行 ./sbin 目录下的 Cli 启动脚本: + +```shell +./start-cli.sh -h ip(本机ip或域名) -p 端口号(6667) +``` + +成功启动后,出现如下界面显示IOTDB安装成功。 + +![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8%E6%88%90%E5%8A%9F.png) + +出现安装成功界面后,继续看下是否激活成功,使用`show cluster`命令 + +当看到最右侧显示ACTIVATED表示激活成功 + +![](https://alioss.timecho.com/docs/img/show%20cluster.png) + +> 出现`ACTIVATED(W)`为被动激活,表示此ConfigNode没有license文件(或没有签发时间戳最新的license文件)。此时建议检查license文件是否已放入license文件夹,没有请放入license文件,若已存在license文件,可能是此节点license文件与其他节点信息不一致导致,请联系天谋工作人员重新申请. + + +## 常见问题 + +1. 部署过程中多次提示激活失败 + - 使用 `ls -al` 命令:使用 `ls -al` 命令检查安装包根目录的所有者信息是否为当前用户。 + - 检查激活目录:检查 `./activation` 目录下的所有文件,所有者信息是否为当前用户。 + +2. Confignode节点启动失败 + + 步骤 1: 请查看启动日志,检查是否修改了某些首次启动后不可改的参数。 + + 步骤 2: 请查看启动日志,检查是否出现其他异常。日志中若存在异常现象,请联系天谋技术支持人员咨询解决方案。 + + 步骤 3: 如果是首次部署或者数据可删除,也可按下述步骤清理环境,重新部署后,再次启动。 + + 步骤 4: 清理环境: + + a. 结束所有 ConfigNode 和 DataNode 进程。 + + ```Bash + # 1. 停止 ConfigNode 和 DataNode 服务 + sbin/stop-standalone.sh + + # 2. 检查是否还有进程残留 + jps + # 或者 + ps -ef|gerp iotdb + + # 3. 如果有进程残留,则手动kill + kill -9 + # 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 + ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 + ``` + b. 删除 data 和 logs 目录。 + + 说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + ```Bash + cd /data/iotdb + rm -rf data logs + ``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/workbench-deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/workbench-deployment_timecho.md new file mode 100644 index 00000000..9f3611e0 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Deployment-and-Maintenance/workbench-deployment_timecho.md @@ -0,0 +1,204 @@ + +# 可视化控制台部署 + +可视化控制台是IoTDB配套工具之一(类似 Navicat for MySQL)。它用于数据库部署实施、运维管理、应用开发各阶段的官方应用工具体系,让数据库的使用、运维和管理更加简单、高效,真正实现数据库低成本的管理和运维。本文档将帮助您安装Workbench。 + +
+  +  +
+ +## 安装准备 + +| 准备内容 | 名称 | 版本要求 | 官方链接 | +| :------: | :-----------------------: | :----------------------------------------------------------: | :----------------------------------------------------: | +| 操作系统 | Windows或Linux | - | - | +| 安装环境 | JDK | 需要 >= V1.8.0_162(推荐使用 11 或者 17,下载时请根据机器配置选择ARM或x64安装包) | https://www.oracle.com/java/technologies/downloads/ | +| 相关软件 | Prometheus | 需要 >=V2.30.3 | https://prometheus.io/download/ | +| 数据库 | IoTDB | 需要>=V1.2.0企业版 | 您可联系商务或技术支持获取 | +| 控制台 | IoTDB-Workbench-``| - | 您可根据附录版本对照表进行选择后联系商务或技术支持获取 | + +## 安装步骤 + +### 步骤一:IoTDB 开启监控指标采集 + +1. 打开监控配置项。IoTDB中监控有关的配置项默认是关闭的,在部署监控面板前,您需要打开相关配置项(注意开启监控配置后需要重启服务)。 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
配置项所在配置文件配置说明
cn_metric_reporter_listconf/iotdb-system.properties请在配置文件中添加该配置项,值设置为PROMETHEUS
cn_metric_level请在配置文件中添加该配置项,值设置为IMPORTANT
cn_metric_prometheus_reporter_port请在配置文件中添加该配置项,可保持默认设置9091,如设置其他端口,不与其他端口冲突即可
dn_metric_reporter_listconf/iotdb-system.properties请在配置文件中添加该配置项,值设置为PROMETHEUS
dn_metric_level请在配置文件中添加该配置项,值设置为IMPORTANT
dn_metric_prometheus_reporter_port请在配置文件中添加该配置项,可保持默认设置9092,如设置其他端口,不与其他端口冲突即可
dn_metric_internal_reporter_type请在配置文件中添加该配置项,值设置为IOTDB
enable_audit_logconf/iotdb-system.properties请在配置文件中添加该配置项,值设置为true
audit_log_storage请在配置文件中添加该配置项,值设置为IOTDB,LOGGER
audit_log_operation请在配置文件中添加该配置项,值设置为DML,DDL,QUERY
+ +2. 重启所有节点。修改3个节点的监控指标配置后,可重新启动所有节点的confignode和datanode服务: + + ```shell + ./sbin/stop-standalone.sh #先停止confignode和datanode + ./sbin/start-confignode.sh -d #启动confignode + ./sbin/start-datanode.sh -d #启动datanode + ``` + +3. 重启后,通过客户端确认各节点的运行状态,若状态都为Running,则为配置成功: + + ![](https://alioss.timecho.com/docs/img/%E5%90%AF%E5%8A%A8.PNG) + +### 步骤二:安装、配置Prometheus监控 + +1. 确保Prometheus安装完成(官方安装说明可参考:https://prometheus.io/docs/introduction/first_steps/) +2. 解压安装包,进入解压后的文件夹: + + ```Shell + tar xvfz prometheus-*.tar.gz + cd prometheus-* + ``` + +3. 修改配置。修改配置文件prometheus.yml如下 + 1. 新增confignode任务收集ConfigNode的监控数据 + 2. 新增datanode任务收集DataNode的监控数据 + + ```shell + global: + scrape_interval: 15s + evaluation_interval: 15s + scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["iotdb-1:9091","iotdb-2:9091","iotdb-3:9091"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["iotdb-1:9092","iotdb-2:9092","iotdb-3:9092"] + honor_labels: true + ``` + +4. 启动Prometheus。Prometheus 监控数据的默认过期时间为15天,在生产环境中,建议将其调整为180天以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: + + ```Shell + ./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d + ``` + +5. 确认启动成功。在浏览器中输入 `http://IP:port`,进入Prometheus,点击进入Status下的Target界面,当看到State均为Up时表示配置成功并已经联通。 + +
+ + +
+ +### 步骤三:安装Workbench + +1. 进入iotdb-Workbench-``的config目录 + +2. 修改Workbench配置文件:进入`config`文件夹下修改配置文件`application-prod.properties`。若您是在本机安装则无需修改,若是部署在服务器上则需修改IP地址 + > Workbench可以部署在本地或者云服务器,只要能与 IoTDB 连接即可 + + | 配置项 | 修改前 | 修改后 | + | ---------------- | --------------------------------- | -------------------------------------- | + | pipe.callbackUrl | pipe.callbackUrl=`http://127.0.0.1` | pipe.callbackUrl=`http://<部署Workbench的IP地址>` | + + ![](https://alioss.timecho.com/docs/img/workbench-conf-1.png) + +3. 启动程序:请在IoTDB-Workbench-``的sbin文件夹下执行启动命令 + + Windows版: + ```shell + # 后台启动Workbench + start.sh -d + ``` + + Linux版: + ```shell + # 后台启动Workbench + start.bat -d + ``` + +4. 可以通过`jps`命令进行启动是否成功,如图所示即为启动成功: + + ![](https://alioss.timecho.com/docs/img/windows-jps.png) + +5. 验证是否成功:浏览器中打开:"`http://服务器ip:配置文件中端口`"进行访问,例如:"`http://127.0.0.1:9190`",当出现登录界面时即为成功 + + ![](https://alioss.timecho.com/docs/img/workbench.png) + + +## 附录:IoTDB与控制台版本对照表 + +| 控制台版本号 | 版本说明 | 可支持IoTDB版本 | +| :------------: | :------------------------------------------------------------: | :----------------: | +| V1.4.0 | 新增树模型展示及国际化 | V1.3.2及以上版本 | +| V1.3.1 | 分析功能新增分析方式,优化导入模版等功能 | V1.3.2及以上版本 | +| V1.3.0 | 新增数据库配置功能,优化部分版本细节 | V1.3.2及以上版本 | +| V1.2.6 | 优化各模块权限控制功能 | V1.3.1及以上版本 | +| V1.2.5 | 可视化功能新增“常用模版”概念,所有界面优化补充页面缓存等功能 | V1.3.0及以上版本 | +| V1.2.4 | 计算功能新增“导入、导出”功能,测点列表新增“时间对齐”字段 | V1.2.2及以上版本 | +| V1.2.3 | 首页新增“激活详情”,新增分析等功能 | V1.2.2及以上版本 | +| V1.2.2 | 优化“测点描述”展示内容等功能 | V1.2.2及以上版本 | +| V1.2.1 | 数据同步界面新增“监控面板”,优化Prometheus提示信息 | V1.2.2及以上版本 | +| V1.2.0 | 全新Workbench版本升级 | V1.2.0及以上版本 | + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DBeaver.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DBeaver.md new file mode 100644 index 00000000..1959a99d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DBeaver.md @@ -0,0 +1,83 @@ + + +# DBeaver + +DBeaver 是一个 SQL 客户端和数据库管理工具。DBeaver 可以使用 IoTDB 的 JDBC 驱动与 IoTDB 进行交互。 + +## DBeaver 安装 + +* DBeaver 下载地址:https://dbeaver.io/download/ + +## IoTDB 安装 + +* 下载 IoTDB 二进制版本 + * IoTDB 下载地址:https://iotdb.apache.org/Download/ + * 版本 >= 0.13.0 +* 或者从源代码中编译 + * 参考 https://github.com/apache/iotdb + +## 连接 IoTDB 与 DBeaver + +1. 启动 IoTDB 服务 + + ```shell + ./sbin/start-server.sh + ``` +2. 启动 DBeaver + +3. 打开 Driver Manager + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/01.png?raw=true) +4. 为 IoTDB 新建一个驱动类型 + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) + +5. 下载 jdbc 驱动, 点击下列网址 [地址1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/) 或 [地址2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/),选择对应版本的 jar 包,下载后缀 jar-with-dependencies.jar 的包 + ![](https://alioss.timecho.com/docs/img/20230920-192746.jpg) +6. 添加刚刚下载的驱动包,点击 Find Class + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) + +7. 编辑驱动设置 + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/05.png) + +8. 新建 DataBase Connection, 选择 iotdb + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/06.png) + +9. 编辑 JDBC 连接设置 + + ``` + JDBC URL: jdbc:iotdb://127.0.0.1:6667/ + Username: root + Password: root + ``` + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/07.png) + +10. 测试连接 + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/08.png) + +11. 
可以开始通过 DBeaver 使用 IoTDB + + ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/09.png) diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DataEase.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DataEase.md new file mode 100644 index 00000000..411bf151 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/DataEase.md @@ -0,0 +1,229 @@ + +# DataEase + +## 产品概述 + +1. DataEase 简介 + + DataEase 是一个开源的数据可视化与分析工具,提供拖拽式的界面,使得用户能够轻松创建图表和仪表板,已支持 MySQL、SQL Server、Hive、ClickHouse、达梦等多种数据源,并且可以集成到其他应用程序中。能帮助用户快速洞察数据,做出决策。更多介绍详情请参考[DataEase 官网](https://www.fit2cloud.com/dataease/index.html) + +
+ + +
+ +2. DataEase-IoTDB 连接器介绍 + + IoTDB 可以通过API数据源的形式与DataEase实现高效集成,利用API数据源插件通过Session接口访问IoTDB数据。该插件支持定制化的数据处理功能,为用户提供了更大的灵活性和更多样化的数据操作选项。 +
+ +
+ +## 安装要求 + +| **准备内容** | **版本要求** | +| :-------------------- | :----------------------------------------------------------- | +| IoTDB | 版本无要求,安装请参考 IoTDB [部署指导](https://www.timecho.com/docs/zh/UserGuide/latest/Deployment-and-Maintenance/IoTDB-Package_timecho.html) | +| JDK | 建议 JDK11 及以上版本(推荐部署 JDK17 及以上版本) | +| DataEase | 要求 v1 系列 v1.18 版本,安装请参考 DataEase 官网[安装指导](https://dataease.io/docs/v2/installation/offline_INSTL_and_UPG/)(暂不支持 v2.x,其他版本适配请联系天谋商务) | +| DataEase-IoTDB 连接器 | 请联系天谋商务获取 | + +## 安装步骤 + +步骤一:请联系商务获取压缩包,解压缩安装包( iotdb-api-source-1.0.0.zip ) + +步骤二:解压后,修改`config`文件夹中的配置文件`application.properties` + +- 端口`server.port`可以按需进行修改 +- `iotdb.nodeUrls`需配置为待连接的 IoTDB 的实例的地址和端口 +- `iotdb.user`需配置为 IoTDB 的用户名 +- `iotdb.password`需配置为 IoTDB 的密码 + +```Properties +# 启动 IoTDB API Source 监听的端口 +server.port=8097 +# IoTDB 的实例地址,多个 nodeUrls 用 ; 分割 +iotdb.nodeUrls=127.0.0.1:6667 +# IoTDB 用户名 +iotdb.user=root +# IoTDB 密码 +iotdb.password=root +``` + +步骤三:启动 DataEase-IoTDB 连接器 + +- 前台启动 + +```Shell +./sbin/start.sh +``` + +- 后台启动(增加 -d 参数) + +```Shell +./sbin/start.sh -d +``` + +步骤四:启动后可以通过日志来查看是否启动成功。 + +```Shell + lsof -i:8097 // config 里启动 IoTDB API Source 监听的端口 +``` + +## 使用说明 + +### 登录 DataEase + +1. 登录 DataEase,访问地址 : `http://目标服务器IP地址:80` +
+ +
+ +### 配置数据源 + +1. 在导航条中跳转【数据源】界面 +
+ +
+ +2. 点击左上角 【 + 】,滑动到底部,选择【API】数据源 +
+ +
+ +3. 新建 API 数据源,自行设置基本信息中的【显示名称】,在数据表位置点击【添加】 +
+ +
+ +4. 在数据表名称字段中输入自定义的【名称】,请求类型选择 `Post`,地址填写 `http://[IoTDB API Source]:[port]/getData`,如果在本机操作且使用的是默认端口,地址应填写`http://127.0.0.1:8097/getData` +
+ +
+ +5. 在【请求参数】部分,选择【请求体】标签页,并确保格式设置为 JSON。请按照以下示例填写参数,其中: + timeseries:要查询的序列的完整路径(目前只支持查询一条序列) + limit:需要查询的条数(有效范围为 大于 0 且 小于 100000) + + ```JSON + { + "timeseries": "root.ln.wf03.wt03.speed", + "limit": 1000 + } + ``` +
+ +
+ +6. 点击【认证配置】标签页,选择【Basic Auth】作为认证方式,并准确输入 IoTDB 的用户名和密码 +
+ +
+ +7. 点击【下一步】,将在`data`部分看到接口返回结果。如下图展示接口中,返回了`time`、 `rownumber`和`value`信息,同时需要指定各字段数据类型。完成设置后,点击界面右下角的【保存】按钮。 +
+ +
+ +8. 保存后进入新建 API 数据源页面,点击右上角【保存】按钮。 +
+ +
+ +9. 保存数据源:保存后,可在 API 分类菜单下查看该数据源及其详细信息,或编辑该数据源。 +
+ +
+ +### 配置数据集 + +1. 创建 API 数据集:在导航条中跳转至数据集页面,点击页面左上角的 【 + 】 符号,选择【API 数据集】类型,选择此数据集所在的目录,即可进入新建 API 数据集页面。 +
+ + +
+ +2. 在新建 API 数据集页面,选择刚才新建的 API 数据源和包含在数据集中的对应数据表(下图左),并设置数据集名称(下图右)。设置完毕后,点击页面右上角的【保存】按钮以完成数据集的创建。 +
+ + +
+ +3. 选择刚刚创建的数据集,进入【字段管理】标签页,然后将所需的字段(如 rowNum)标记为维度。 +
+ +
+ +4. 配置更新频率:在【更新信息】页面上点击【添加任务】,设置以下信息: + + 任务名称:根据实际情况填写 + + 更新方式:选择【全量更新】 + + 执行频率:根据实际情况设置(考虑DataEase获取速度,建议设置为大于 5 秒更新一次),例如需要设置为每 5 秒更新,则可以选择【表达式设定】并在【cron 表达式】中设置为`0/5 * * * * ? *` + 配置完成后,点击页面右下角的【确认】按钮保存设置。 +
+ +
+ +5. 任务已成功添加。可以通过点击页面左上角的【执行记录】选项查看执行记录。 +
+ +
+ +### 配置仪表板 + +1. 在导航条中跳转至仪表板页面,可以点击【 + 】符号新建目录,并且在对应目录,点击【 + 】符号,然后从弹出的菜单中选择【新建仪表板】 +
+ +
+ +2. 按需进行设置后点击【确定】,以自定义设置为例,确定后进入新建仪表板页面 +
+ +
+ +3. 在新建仪表板页面,点击【视图】按钮以打开添加视图的弹窗。在弹窗中,选择之前创建的数据集,然后点击【下一步】继续操作。 +
+ +
+ +4. 在选择图表类型的步骤中,根据展示需求,选择一个合适的图表类型,如【基础折线图】。选择完毕后,点击【确认】按钮应用选择。 +
+ +
+ +5. 在图表配置界面,通过拖放操作将`rowNum`字段拖拽到类别轴(通常是 X 轴),将`value`字段拖拽到值轴(通常是 Y 轴)。 +
+ +
+ +6. 在图表的类别轴设置中,选择将排序方式设定为升序,这样数据将按照从小到大的顺序展示。设置数据刷新频率以确定图表更新的频率。完成这些设置后,您可以进一步调整图表的其他格式和样式选项,比如颜色、大小等,以满足展示需求。调整完后,点击页面右上角的【保存】按钮来保存图表配置。 +>由于 DataEase 在自动更新数据集后可能会导致原本按升序返回的 API 数据顺序错乱,所以需要在图表配置中手动指定排序方式。 +
+ +
+ +7. 退出编辑后查看效果 +
+ +
\ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-IoTDB.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-IoTDB.md new file mode 100644 index 00000000..6d2e61de --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-IoTDB.md @@ -0,0 +1,121 @@ + + +# Apache Flink(IoTDB) + +IoTDB 与 [Apache Flink](https://flink.apache.org/) 的集成。此模块包含了 iotdb sink,允许 flink job 将时序数据写入 IoTDB。 + +## IoTDBSink + +使用 `IoTDBSink` ,您需要定义一个 `IoTDBOptions` 和一个 `IoTSerializationSchema` 实例。 `IoTDBSink` 默认每次发送一个数据,可以通过调用 `withBatchSize(int)` 进行调整。 + +## 示例 + +该示例演示了如下从一个 Flink job 中发送数据到 IoTDB server 的场景: + +- 一个模拟的 Source `SensorSource` 每秒钟产生一个数据点。 + +- Flink 使用 `IoTDBSink` 消费产生的数据并写入 IoTDB 。 + + ```java + import org.apache.iotdb.flink.options.IoTDBSinkOptions; + import org.apache.iotdb.tsfile.file.metadata.enums.CompressionType; + import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding; + + import com.google.common.collect.Lists; + import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment; + import org.apache.flink.streaming.api.functions.source.SourceFunction; + + import java.security.SecureRandom; + import java.util.HashMap; + import java.util.Map; + import java.util.Random; + + public class FlinkIoTDBSink { + public static void main(String[] args) throws Exception { + // run the flink job on local mini cluster + StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); + + IoTDBSinkOptions options = new IoTDBSinkOptions(); + options.setHost("127.0.0.1"); + options.setPort(6667); + options.setUser("root"); + options.setPassword("root"); + + // If the server enables auto_create_schema, then we do not need to register all timeseries + // here. 
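+      // Each TimeseriesOption below pre-registers a single series by giving its full
+      // path, data type, encoding and compression; it corresponds to the s1 DOUBLE
+      // values that SensorSource emits for device root.sg.d1 later in this example.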
+ options.setTimeseriesOptionList( + Lists.newArrayList( + new IoTDBSinkOptions.TimeseriesOption( + "root.sg.d1.s1", TSDataType.DOUBLE, TSEncoding.GORILLA, CompressionType.SNAPPY))); + + IoTSerializationSchema serializationSchema = new DefaultIoTSerializationSchema(); + IoTDBSink ioTDBSink = + new IoTDBSink(options, serializationSchema) + // enable batching + .withBatchSize(10) + // how many connections to the server will be created for each parallelism + .withSessionPoolSize(3); + + env.addSource(new SensorSource()) + .name("sensor-source") + .setParallelism(1) + .addSink(ioTDBSink) + .name("iotdb-sink"); + + env.execute("iotdb-flink-example"); + } + + private static class SensorSource implements SourceFunction> { + boolean running = true; + Random random = new SecureRandom(); + + @Override + public void run(SourceContext context) throws Exception { + while (running) { + Map tuple = new HashMap(); + tuple.put("device", "root.sg.d1"); + tuple.put("timestamp", String.valueOf(System.currentTimeMillis())); + tuple.put("measurements", "s1"); + tuple.put("types", "DOUBLE"); + tuple.put("values", String.valueOf(random.nextDouble())); + + context.collect(tuple); + Thread.sleep(1000); + } + } + + @Override + public void cancel() { + running = false; + } + } + } + + ``` + + + +## 运行方法 + +* 启动 IoTDB server +* 运行 `org.apache.iotdb.flink.FlinkIoTDBSink.java` 将 Flink job 运行在本地的集群上。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-TsFile.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-TsFile.md new file mode 100644 index 00000000..17a3975e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Flink-TsFile.md @@ -0,0 +1,178 @@ + + +# Apache Flink(TsFile) + +## 关于 TsFile-Flink 连接器 + +TsFile-Flink-Connector 对 Tsfile 类型的外部数据源实现 Flink 的支持。 这使用户可以通过 Flink DataStream/DataSet 进行读取,写入和查询。 + +使用此连接器,您可以 + +* 从本地文件系统或 hdfs 加载单个或多个 TsFile (只支持以 DataSet 的形式)到 Flink 。 +* 将本地文件系统或 hdfs 中特定目录中的所有文件加载到 Flink 中。 + +## 快速开始 + +### TsFileInputFormat 示例 + +1. 使用默认的 RowRowRecordParser 创建 TsFileInputFormat 。 + +```java +String[] filedNames = { + QueryConstant.RESERVED_TIME, + "device_1.sensor_1", + "device_1.sensor_2", + "device_1.sensor_3", + "device_2.sensor_1", + "device_2.sensor_2", + "device_2.sensor_3" +}; +TypeInformation[] typeInformations = new TypeInformation[] { + Types.LONG, + Types.FLOAT, + Types.INT, + Types.INT, + Types.FLOAT, + Types.INT, + Types.INT +}; +List paths = Arrays.stream(filedNames) + .filter(s -> !s.equals(QueryConstant.RESERVED_TIME)) + .map(Path::new) + .collect(Collectors.toList()); +RowTypeInfo rowTypeInfo = new RowTypeInfo(typeInformations, filedNames); +QueryExpression queryExpression = QueryExpression.create(paths, null); +RowRowRecordParser parser = RowRowRecordParser.create(rowTypeInfo, queryExpression.getSelectedSeries()); +TsFileInputFormat inputFormat = new TsFileInputFormat<>(queryExpression, parser); +``` + +2. 
从输入格式读取数据并打印到标准输出 stdout: + +DataStream: + +```java +StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment(); +inputFormat.setFilePath("source.tsfile"); +DataStream source = senv.createInput(inputFormat); +DataStream rowString = source.map(Row::toString); +Iterator result = DataStreamUtils.collect(rowString); +while (result.hasNext()) { + System.out.println(result.next()); +} +``` + +DataSet: + +```java +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); +inputFormat.setFilePath("source.tsfile"); +DataSet source = env.createInput(inputFormat); +List result = source.map(Row::toString).collect(); +for (String s : result) { + System.out.println(s); +} +``` + +### TSRecordOutputFormat 示例 + +1. 使用默认的 RowTSRecordConverter 创建 TSRecordOutputFormat 。 + +```java +String[] filedNames = { + QueryConstant.RESERVED_TIME, + "device_1.sensor_1", + "device_1.sensor_2", + "device_1.sensor_3", + "device_2.sensor_1", + "device_2.sensor_2", + "device_2.sensor_3" +}; +TypeInformation[] typeInformations = new TypeInformation[] { + Types.LONG, + Types.LONG, + Types.LONG, + Types.LONG, + Types.LONG, + Types.LONG, + Types.LONG +}; +RowTypeInfo rowTypeInfo = new RowTypeInfo(typeInformations, filedNames); +Schema schema = new Schema(); +schema.extendTemplate("template", new MeasurementSchema("sensor_1", TSDataType.INT64, TSEncoding.TS_2DIFF)); +schema.extendTemplate("template", new MeasurementSchema("sensor_2", TSDataType.INT64, TSEncoding.TS_2DIFF)); +schema.extendTemplate("template", new MeasurementSchema("sensor_3", TSDataType.INT64, TSEncoding.TS_2DIFF)); +RowTSRecordConverter converter = new RowTSRecordConverter(rowTypeInfo); +TSRecordOutputFormat outputFormat = new TSRecordOutputFormat<>(schema, converter); +``` + +2. 
通过输出格式写数据: + +DataStream: + +```java +StreamExecutionEnvironment senv = StreamExecutionEnvironment.getExecutionEnvironment(); +senv.setParallelism(1); +List data = new ArrayList<>(7); +data.add(new Tuple7(1L, 2L, 3L, 4L, 5L, 6L, 7L)); +data.add(new Tuple7(2L, 3L, 4L, 5L, 6L, 7L, 8L)); +data.add(new Tuple7(3L, 4L, 5L, 6L, 7L, 8L, 9L)); +data.add(new Tuple7(4L, 5L, 6L, 7L, 8L, 9L, 10L)); +data.add(new Tuple7(6L, 6L, 7L, 8L, 9L, 10L, 11L)); +data.add(new Tuple7(7L, 7L, 8L, 9L, 10L, 11L, 12L)); +data.add(new Tuple7(8L, 8L, 9L, 10L, 11L, 12L, 13L)); +outputFormat.setOutputFilePath(new org.apache.flink.core.fs.Path(path)); +DataStream source = senv.fromCollection( + data, Types.TUPLE(Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG)); +source.map(t -> { + Row row = new Row(7); + for (int i = 0; i < 7; i++) { + row.setField(i, t.getField(i)); + } + return row; +}).returns(rowTypeInfo).writeUsingOutputFormat(outputFormat); +senv.execute(); +``` + +DataSet: + +```java +ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); +env.setParallelism(1); +List data = new ArrayList<>(7); +data.add(new Tuple7(1L, 2L, 3L, 4L, 5L, 6L, 7L)); +data.add(new Tuple7(2L, 3L, 4L, 5L, 6L, 7L, 8L)); +data.add(new Tuple7(3L, 4L, 5L, 6L, 7L, 8L, 9L)); +data.add(new Tuple7(4L, 5L, 6L, 7L, 8L, 9L, 10L)); +data.add(new Tuple7(6L, 6L, 7L, 8L, 9L, 10L, 11L)); +data.add(new Tuple7(7L, 7L, 8L, 9L, 10L, 11L, 12L)); +data.add(new Tuple7(8L, 8L, 9L, 10L, 11L, 12L, 13L)); +DataSet source = env.fromCollection( + data, Types.TUPLE(Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG, Types.LONG)); +source.map(t -> { + Row row = new Row(7); + for (int i = 0; i < 7; i++) { + row.setField(i, t.getField(i)); + } + return row; +}).returns(rowTypeInfo).write(outputFormat, path); +env.execute(); +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Connector.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Connector.md new file mode 100644 index 00000000..a0623bbb --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Connector.md @@ -0,0 +1,184 @@ + + +# Grafana(IoTDB) + +Grafana 是开源的指标量监测和可视化工具,可用于展示时序数据和应用程序运行分析。Grafana 支持 Graphite,InfluxDB 等国际主流时序数据库作为数据源。在 IoTDB 项目中,我们开发了 Grafana 展现 IoTDB 中时序数据的连接器 IoTDB-Grafana-Connector,为您提供使用 Grafana 展示 IoTDB 数据库中的时序数据的可视化方法。 + +## Grafana 的安装与部署 + +### 安装 + +* Grafana 组件下载地址:https://grafana.com/grafana/download +* 版本 >= 4.4.1 + +### simple-json-datasource 数据源插件安装 + + +* 插件名称: simple-json-datasource +* 下载地址: https://github.com/grafana/simple-json-datasource + +#### windows系统 +具体下载方法是:到Grafana的插件目录中:`{Grafana文件目录}\data\plugins\`(Windows系统,启动Grafana后会自动创建`data\plugins`目录)或`/var/lib/grafana/plugins` (Linux系统,plugins目录需要手动创建)或`/usr/local/var/lib/grafana/plugins`(MacOS系统,具体位置参看使用`brew install`安装Grafana后命令行给出的位置提示。 + +执行下面的命令: + +``` +Shell > git clone https://github.com/grafana/simple-json-datasource.git +``` + +#### linux系统 +建议使用grafana-cli安装该插件,具体安装命令如下 + +``` +sudo grafana-cli plugins install grafana-simple-json-datasource +sudo service grafana-server restart +``` + +#### 后续操作 +然后重启Grafana服务器,在浏览器中登录Grafana,在“Add data source”页面中“Type”选项出现“SimpleJson”即为安装成功。 + +如果出现如下报错 +``` +Unsigned plugins were found during plugin initialization. Grafana Labs cannot guarantee the integrity of these plugins. We recommend only using signed plugins. 
+The following plugins are disabled and not shown in the list below: +``` + +请找到相关的grafana的配置文件(例如windows下的customer.ini,linux下rpm安装后为/etc/grafana/grafana.ini),并进行如下的配置 + +``` +allow_loading_unsigned_plugins = "grafana-simple-json-datasource" +``` + +### 启动 Grafana + +进入 Grafana 的安装目录,使用以下命令启动 Grafana: +* Windows 系统: +``` +Shell > bin\grafana-server.exe +``` +* Linux 系统: +``` +Shell > sudo service grafana-server start +``` +* MacOS 系统: +``` +Shell > grafana-server --config=/usr/local/etc/grafana/grafana.ini --homepath /usr/local/share/grafana cfg:default.paths.logs=/usr/local/var/log/grafana cfg:default.paths.data=/usr/local/var/lib/grafana cfg:default.paths.plugins=/usr/local/var/lib/grafana/plugins +``` +更多安装详情,请点 [这里](https://grafana.com/docs/grafana/latest/installation/) + +## IoTDB 安装 + +参见 [https://github.com/apache/iotdb](https://github.com/apache/iotdb) + +## Grafana-IoTDB-Connector 连接器安装 + +```shell +git clone https://github.com/apache/iotdb.git +``` + +## 启动 Grafana-IoTDB-Connector + + * 方案一(适合开发者) + +导入整个项目,maven 依赖安装完后,直接运行`iotdb/grafana-connector/rc/main/java/org/apache/iotdb/web/grafana`目录下`TsfileWebDemoApplication.java`,这个 grafana 连接器采用 springboot 开发 + + * 方案二(适合使用者) + +```shell +cd iotdb +mvn clean package -pl iotdb-connector/grafana-connector -am -Dmaven.test.skip=true +cd iotdb-connector/grafana-connector/target +java -jar iotdb-grafana-connector-{version}.war + . ____ _ __ _ _ + /\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \ +( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \ + \\/ ___)| |_)| | | | | || (_| | ) ) ) ) + ' |____| .__|_| |_|_| |_\__, | / / / / + =========|_|==============|___/=/_/_/_/ + :: Spring Boot :: (v1.5.4.RELEASE) +... +``` + +如果您需要配置属性,将`grafana/src/main/resources/application.properties`移动到 war 包同级目录下(`grafana/target`) + +## 使用 Grafana + +Grafana 以网页的 dashboard 形式为您展示数据,在使用时请您打开浏览器,访问 http://\:\ + +默认地址为 http://localhost:3000/ + +注:IP 为您的 Grafana 所在的服务器 IP,Port 为 Grafana 的运行端口(默认 3000)。默认登录的用户名和密码都是“admin”。 + +### 添加 IoTDB 数据源 + +点击左上角的“Grafana”图标,选择`Data Source`选项,然后再点击`Add data source`。 + + +在编辑数据源的时候,`Type`一栏选择`Simplejson`,`URL`一栏填写 http://\:\,IP 为您的 IoTDB-Grafana-Connector 连接器所在的服务器 IP,Port 为运行端口(默认 8888)。之后确保 IoTDB 已经启动,点击“Save & Test”,出现“Data Source is working”提示表示配置成功。 + + +### 操作 Grafana + +进入 Grafana 可视化页面后,可以选择添加时间序列,如下图。您也可以按照 Grafana 官方文档进行相应的操作,详情可参看 Grafana 官方文档:http://docs.grafana.org/guides/getting_started/。 + + + +## 配置 grafana + +``` +# IoTDB 的 IP 和端口 +spring.datasource.url=jdbc:iotdb://127.0.0.1:6667/ +spring.datasource.username=root +spring.datasource.password=root +spring.datasource.driver-class-name=org.apache.iotdb.jdbc.IoTDBDriver +server.port=8888 +# Use this value to set timestamp precision as "ms", "us" or "ns", which must to be same with the timestamp +# precision of Apache IoTDB engine. 
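+# e.g. if the connected IoTDB instance runs with nanosecond precision, change this to "ns".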
+timestamp_precision=ms + +# 是否开启降采样 +isDownSampling=true +# 默认采样 interval +interval=1m +# 用于对连续数据 (int, long, float, double) 进行降采样的聚合函数 +# COUNT, FIRST_VALUE, LAST_VALUE, MAX_TIME, MAX_VALUE, AVG, MIN_TIME, MIN_VALUE, NOW, SUM +continuous_data_function=AVG +# 用于对离散数据 (boolean, string) 进行降采样的聚合函数 +# COUNT, FIRST_VALUE, LAST_VALUE, MAX_TIME, MIN_TIME, NOW +discrete_data_function=LAST_VALUE +``` + +其中 interval 具体配置信息如下 + +<1h: no sampling + +1h~1d : intervals = 1m + +1d~30d:intervals = 1h + +\>30d:intervals = 1d + +配置完后,请重新运行 war 包 + +``` +java -jar iotdb-grafana-connector-{version}.war +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Plugin.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Plugin.md new file mode 100644 index 00000000..7e385301 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Grafana-Plugin.md @@ -0,0 +1,288 @@ + + +# Grafana 插件 + +Grafana 是开源的指标量监测和可视化工具,可用于展示时序数据和应用程序运行分析。 + +在 IoTDB 项目中,我们开发了 Grafana 插件,该插件通过调用 IoTDB REST 服务来展现 IoTDB 中时序数据 ,提供了众多时序数据的可视化方法。Grafana 插件相较于 IoTDB-Grafana-Connector 连接器执行效率更高、支持的查询种类更多。只要在您部署环境允许的情况下,*我们都推荐直接使用 Grafana 插件而不使用 IoTDB-Grafana-Connector 连接器*。 + +## 部署 Grafana 插件 + +### 安装 Grafana + +* Grafana 组件下载地址:https://grafana.com/grafana/download +* 版本 >= 9.3.0 + +### grafana-plugin 获取 + +#### Grafana官方下载 apache-iotdb-datasource + +二进制文件下载地址:https://grafana.com/api/plugins/apache-iotdb-datasource/versions/1.0.0/download + +### grafana-plugin 插件安装 + +### 方式一 使用 grafana-cli 工具安装(推荐) + +* 使用 grafana-cli 工具从命令行安装 apache-iotdb-datasource,命令内容如下: + +```shell +grafana-cli plugins install apache-iotdb-datasource +``` + +### 方式二 使用Grafana 界面安装(推荐) + +从本地 Grafana 点击 Configuration -> Plugins -> 搜索 IoTDB 进行插件安装 + +### 方式三 手动安装grafana-plugin 插件(不推荐) + +* 拷贝上述生成的前端工程目标文件夹到 Grafana 的插件目录中 `${Grafana文件目录}\data\plugins\`。如果没有此目录可以手动建或者启动grafana会自动建立,当然也可以修改plugins的位置,具体请查看下面的修改Grafana 的插件目录位置说明。 + +* 启动Grafana服务,如果 Grafana 服务已启动,则需要停止Grafana服务,然后再启动Grafana。 + +更多有关Grafana详情,请点 [这里](https://grafana.com/docs/grafana/latest/plugins/installation/) + +### 启动 Grafana + +进入 Grafana 的安装目录,使用以下命令启动 Grafana: +* Windows 系统: + +```shell +bin\grafana-server.exe +``` +* Linux 系统: + +```shell +sudo service grafana-server start +``` +* MacOS 系统: + +```shell +brew services start grafana +``` +更多详情,请点 [这里](https://grafana.com/docs/grafana/latest/installation/) + + +### 配置 IoTDB REST 服务 + +进入 `{iotdb 目录}/conf`,打开 `iotdb-system.properties` 文件,并作如下修改: + +```properties +# Is the REST service enabled +enable_rest_service=true + +# the binding port of the REST service +rest_service_port=18080 +``` + +启动(重启)IoTDB 使配置生效,此时 IoTDB REST 服务处于运行状态。 + + + +## 使用 Grafana 插件 + +### 访问 Grafana dashboard + +Grafana 以网页的 dashboard 形式为您展示数据,在使用时请您打开浏览器,访问 `http://:`。 + +注:IP 为您的 Grafana 所在的服务器 IP,Port 为 Grafana 的运行端口(默认 3000)。 + +在本地试用时,Grafana dashboard 的默认地址为 `http://localhost:3000/`。 + +默认登录的用户名和密码都是 `admin`。 + + + +### 添加 IoTDB 数据源 + +点击左侧的 `设置` 图标,选择 `Data Source` 选项,然后再点击 `Add data source`。 + + + + + +选择 `Apache IoTDB` 数据源,`URL` 一栏填写 `http://:`。 + +Ip 为您的 IoTDB 服务器所在的宿主机 IP,port 为 REST 服务的运行端口(默认 18080)。 + +输入 IoTDB 服务器的 username 和 password,点击 `Save & Test`,出现 `Data source is working` 则提示配置成功。 + + + + + +### 创建一个新的 Panel + +点击左侧的 `Dashboards` 图标,选择 `Manage`,如下图所示: + + + +点击右上方的 `New Dashboard` 图标,选择 `Add an empty panel`,如下图所示: + + + +Grafana Plugin 支持SQL: Full Customized和SQL: Drop-down List 两种方式,默认是SQL: Full Customized方式。 + + + +#### SQL: Full Customized 输入方式 + +在 SELECT 输入框、FROM 输入框、WHERE输入框、CONTROL输入框中输入内容,其中 WHERE 和 CONTROL 
输入框为非必填。 + +如果一个查询涉及多个表达式,我们可以点击 SELECT 输入框右侧的 `+` 来添加 SELECT 子句中的表达式,也可以点击 FROM 输入框右侧的 `+` 来添加路径前缀,如下图所示: + + + +SELECT 输入框中的内容可以是时间序列的后缀,可以是函数或自定义函数,可以是算数表达式,也可以是它们的嵌套表达式。您还可以使用 as 子句来重命名需要显示的结果序列名字。 + +下面是 SELECT 输入框中一些合法的输入举例: + +* `s1` +* `top_k(s1, 'k'='1') as top` +* `sin(s1) + cos(s1 + s2)` +* `udf(s1) as "中文别名"` + +FROM 输入框中的内容必须是时间序列的前缀路径,比如 `root.sg.d`。 + +WHERE 输入框为非必须填写项目,填写内容应当是查询的过滤条件,比如 `time > 0` 或者 `s1 < 1024 and s2 > 1024`。 + +CONTROL 输入框为非必须填写项目,填写内容应当是控制查询类型、输出格式的特殊子句。其中GROUP BY 输入框支持使用grafana的全局变量来获取当前时间区间变化`$__from`(起始时间)、`$__to`(结束时间),下面是 CONTROL 输入框中一些合法的输入举例: + +* `GROUP BY ([$__from, $__to), 1d)` +* `GROUP BY ([$__from, $__to),3h,1d)` +* `GROUP BY ([2017-11-01T00:00:00, 2017-11-07T23:00:00), 1d)` +* `GROUP BY ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d)` +* `GROUP BY ([$__from, $__to), 1m) FILL (PREVIOUSUNTILLAST)` +* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (PREVIOUSUNTILLAST)` +* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (PREVIOUS, 1m)` +* `GROUP BY ([2017-11-07T23:50:00, 2017-11-07T23:59:00), 1m) FILL (LINEAR, 5m, 5m)` +* `GROUP BY ((2017-11-01T00:00:00, 2017-11-07T23:00:00], 1d), LEVEL=1` +* `GROUP BY ([0, 20), 2ms, 3ms), LEVEL=1` + +提示:为了避免OOM问题,不推荐使用select * from root.xx.** 这种语句在Grafana plugin中使用。 + +#### SQL: Drop-down List 输入方式 +在 TIME-SERIES 选择框中选择一条时间序列、FUNCTION 选择一个函数、SAMPLING INTERVAL、SLIDING STEP、LEVEL、FILL 输入框中输入内容,其中 TIME-SERIESL 为必填项其余为非必填项。 + + + +### 变量与模板功能的支持 + +SQL: Full Customized和SQL: Drop-down List两种输入方式都支持 Grafana 的变量与模板功能,下面示例中使用SQL: Full Customized输入方式,SQL: Drop-down List与之类似。 + +创建一个新的 Panel 后,点击右上角的设置按钮,如下图所示: + + + +选择 `Variables`,点击 `Add variable` ,如下图所示: + + + +示例一:输入 `Name`,`Label`,选择Type的`Query`、在Query 中输入show child paths xx , 点击 `Update` 按钮,如下图所示: + + + +应用 Variables,在 `grafana panel` 中输入变量点击 `save` 按钮,如下图所示 + + + +示例二:变量嵌套使用,如下图所示 + + + + + + + +示例三:函数变量使用,如下图所示 + + + + +上图中Name 是变量名称也是将来我们在panel中使用的变量名称,Label是变量的展示名称如果为空就显示Name的变量反之则显示Label的名称, +Type下拉中有Query、Custom、Text box、Constant、DataSource、Interval、Ad hoc filters等这些都可以在IoTDB的Grafana Plugin 中使用 +更加详细介绍用法请查看官方手册(https://grafana.com/docs/grafana/latest/variables/) + +除了上面的示例外,还支持下面这些语句: +* `show databases` +* `show timeseries` +* `show child nodes` +* `show all ttl` +* `show latest timeseries` +* `show devices` +* `select xx from root.xxx limit xx 等sql 查询` + +* 提示:如果查询的字段中有布尔类型的数据,会将true转化成1,false转化成0结果值进行显示。 + +### 告警功能 +本插件支持 Grafana alert功能。在Grafana 9告警界面一共有6个Tab,分别是Alert rules、Contact points、Notification policies、Silences、Alert groups、Admin + +* `Alert rules` 告警规则列表,用于展示和配置告警规则 +* `Contact points` 为通知渠道,包括DingDing、Email、Slack、WebHook、Prometheus Alertmanager等 +* `Notification policies` 配置告警发送到哪个通道的路由,以及发送通知的时间和重复频率,静默配置 +* `Silences` 为配置告警静默时间段 +* `Alert groups` 告警组,配置的告警触发后会在这里显示 +* `Admin` 提供通过JSON方式修改告警配置 + +1. 在Grafana panel中,点击alerting按钮,如下图所示: + + + +2. 点击`Create alert rule from this panel`,如下图所示: + + + +3. 在第1步中设置查询和警报条件,Conditions 表示查询条件,可以配置多个组合查询条件。如下图所示: + + + +图中的查询条件:min() OF A IS BELOW 0,表示将A选项卡中的最小值在0一下就会触发条件,单击该函数可将其更改为另一个函数。 + +提示:警报规则中使用的查询不能包含任何模板变量。目前我们只支持条件之间的AND和OR运算符,它们是串行执行的。 +例如,我们按以下顺序有 3 个条件: 条件:B(计算为:TRUE)或条件:C(计算为:FALSE)和条件:D(计算为:TRUE) 所以结果将计算为((对或错)和对)=对。 + +4. 选择完指标及告警规则后点击`Preview`按钮,进行数据预览如下图所示: + + +5. 在第 2 步中,指定警报评估间隔,对于`Evaluate every`,指定评估频率。必须是 10 秒的倍数。例如,1m,30s。 +对于`Evaluate for`,指定在警报触发之前的持续时间。如下图所示: + + +6. 在第 3 步中,添加存储位置、规则组以及与规则关联的其他元数据。 其中`Rule name`指定规则的名称。规则名称必须是唯一的。 + + +7. 在第 4 步中,添加自定义标签。 从下拉列表中选择现有键值对添加自定义标签,或通过输入新键或值来添加新标签。如下图所示: + + + +8. 
单击保存以保存规则或单击保存并退出以保存规则并返回到警报页面。 +9. 告警状态常用的有`Normal`、`Pending`、`Firing`等状态,如下图所示: + + + +10. 我们也可以为告警配置`Contact points`用来接收告警通知,更加详细操作可以参考官方文档(https://grafana.com/docs/grafana/latest/alerting/manage-notifications/create-contact-point/)。 + +想了解alert更多详细的操作可以查看官方文档https://grafana.com/docs/grafana/latest/alerting/ + +## 更多 + +更多关于 Grafana 操作详情可参看 Grafana 官方文档:http://docs.grafana.org/guides/getting_started/。 + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Hive-TsFile.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Hive-TsFile.md new file mode 100644 index 00000000..126a023f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Hive-TsFile.md @@ -0,0 +1,167 @@ + + +# Apache Hive(TsFile) + +## 什么是 TsFile 的 Hive 连接器 + +TsFile 的 Hive 连接器实现了对 Hive 读取外部 Tsfile 类型的文件格式的支持, +使用户能够通过 Hive 操作 Tsfile。 + +有了这个连接器,用户可以 +* 将单个 Tsfile 文件加载进 Hive,不论文件是存储在本地文件系统或者是 HDFS 中 +* 将某个特定目录下的所有文件加载进 Hive,不论文件是存储在本地文件系统或者是 HDFS 中 +* 使用 HQL 查询 tsfile +* 到现在为止,写操作在 hive-connector 中还没有被支持。所以,HQL 中的 insert 操作是不被允许的 + +## 系统环境要求 + +|Hadoop Version |Hive Version | Java Version | TsFile | +|------------- |------------ | ------------ |------------ | +| `2.7.3` or `3.2.1` | `2.3.6` or `3.1.2` | `1.8` | `1.0.0+`| + +## 数据类型对应关系 + +| TsFile 数据类型 | Hive 数据类型 | +| ---------------- | --------------- | +| BOOLEAN | Boolean | +| INT32 | INT | +| INT64 | BIGINT | +| FLOAT | Float | +| DOUBLE | Double | +| TEXT | STRING | + +## 为 Hive 添加依赖 jar 包 + +为了在 Hive 中使用 Tsfile 的 hive 连接器,我们需要把 hive 连接器的 jar 导入进 hive。 + +从 下载完 iotdb 后,你可以使用 `mvn clean package -pl iotdb-connector/hive-connector -am -Dmaven.test.skip=true -P get-jar-with-dependencies`命令得到一个 `hive-connector-X.X.X-SNAPSHOT-jar-with-dependencies.jar`。 + +然后在 hive 的命令行中,使用`add jar XXX`命令添加依赖。例如: + +```shell +hive> add jar /Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar; + +Added [/Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar] to class path +Added resources: [/Users/hive/iotdb/hive-connector/target/hive-connector-1.0.0-jar-with-dependencies.jar] +``` + +## 创建 Tsfile-backed 的 Hive 表 + +为了创建一个 Tsfile-backed 的表,需要将`serde`指定为`org.apache.iotdb.hive.TsFileSerDe`, +将`inputformat`指定为`org.apache.iotdb.hive.TSFHiveInputFormat`, +将`outputformat`指定为`org.apache.iotdb.hive.TSFHiveOutputFormat`。 + +同时要提供一个只包含两个字段的 Schema,这两个字段分别是`time_stamp`和`sensor_id`。 +`time_stamp`代表的是时间序列的时间值,`sensor_id`是你想要从 tsfile 文件中提取出来分析的传感器名称,比如说`sensor_1`。 +表的名字可以是 hive 所支持的任何表名。 + +需要提供一个路径供 hive-connector 从其中拉取最新的数据。 + +这个路径必须是一个指定的文件夹,这个文件夹可以在你的本地文件系统上,也可以在 HDFS 上,如果你启动了 Hadoop 的话。 +如果是本地文件系统,要以这样的形式`file:///data/data/sequence/root.baic2.WWS.leftfrontdoor/` + +最后需要在`TBLPROPERTIES`里指明`device_id` + +例如: + +``` +CREATE EXTERNAL TABLE IF NOT EXISTS only_sensor_1( + time_stamp TIMESTAMP, + sensor_1 BIGINT) +ROW FORMAT SERDE 'org.apache.iotdb.hive.TsFileSerDe' +STORED AS + INPUTFORMAT 'org.apache.iotdb.hive.TSFHiveInputFormat' + OUTPUTFORMAT 'org.apache.iotdb.hive.TSFHiveOutputFormat' +LOCATION '/data/data/sequence/root.baic2.WWS.leftfrontdoor/' +TBLPROPERTIES ('device_id'='root.baic2.WWS.leftfrontdoor.plc1'); +``` + +在这个例子里,我们从`/data/data/sequence/root.baic2.WWS.leftfrontdoor/`中拉取`root.baic2.WWS.leftfrontdoor.plc1.sensor_1`的数据。 +这个表可能产生如下描述: + +``` +hive> describe only_sensor_1; +OK +time_stamp timestamp from deserializer +sensor_1 bigint from deserializer +Time taken: 0.053 seconds, Fetched: 2 row(s) +``` + +到目前为止,Tsfile-backed 的表已经可以像 hive 中其他表一样被操作了。 + +## 从 Tsfile-backed 的 Hive 表中查询 + 
+在做任何查询之前,我们需要通过如下命令,在 hive 中设置`hive.input.format`: + +``` +hive> set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; +``` + +现在,我们已经在 hive 中有了一个名为`only_sensor_1`的外部表。 +我们可以使用 HQL 做任何查询来分析其中的数据。 + +例如: + +### 选择查询语句示例 + +``` +hive> select * from only_sensor_1 limit 10; +OK +1 1000000 +2 1000001 +3 1000002 +4 1000003 +5 1000004 +6 1000005 +7 1000006 +8 1000007 +9 1000008 +10 1000009 +Time taken: 1.464 seconds, Fetched: 10 row(s) +``` + +### 聚合查询语句示例 + +``` +hive> select count(*) from only_sensor_1; +WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. +Query ID = jackietien_20191016202416_d1e3e233-d367-4453-b39a-2aac9327a3b6 +Total jobs = 1 +Launching Job 1 out of 1 +Number of reduce tasks determined at compile time: 1 +In order to change the average load for a reducer (in bytes): + set hive.exec.reducers.bytes.per.reducer= +In order to limit the maximum number of reducers: + set hive.exec.reducers.max= +In order to set a constant number of reducers: + set mapreduce.job.reduces= +Job running in-process (local Hadoop) +2019-10-16 20:24:18,305 Stage-1 map = 0%, reduce = 0% +2019-10-16 20:24:27,443 Stage-1 map = 100%, reduce = 100% +Ended Job = job_local867757288_0002 +MapReduce Jobs Launched: +Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS +Total MapReduce CPU Time Spent: 0 msec +OK +1000000 +Time taken: 11.334 seconds, Fetched: 1 row(s) +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md new file mode 100644 index 00000000..fe001d7e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Ignition-IoTDB-plugin_timecho.md @@ -0,0 +1,272 @@ + +# Ignition + +## 产品概述 + +1. Ignition简介 + +Ignition 是一个基于WEB的监控和数据采集工具(SCADA)- 一个开放且可扩展的通用平台。Ignition可以让你更轻松地控制、跟踪、显示和分析企业的所有数据,提升业务能力。更多介绍详情请参考[Ignition官网](https://docs.inductiveautomation.com/docs/8.1/getting-started/introducing-ignition) + +2. 
Ignition-IoTDB Connector介绍 + + Ignition-IoTDB Connector分为两个模块:Ignition-IoTDB连接器、Ignition-IoTDB With JDBC。其中: + + - Ignition-IoTDB 连接器:提供了将 Ignition 采集到的数据存入 IoTDB 的能力,也支持在Components中进行数据读取,同时注入了 `system.iotdb.insert`和`system.iotdb.query`脚本接口用于方便在Ignition编程使用 + - Ignition-IoTDB With JDBC:Ignition-IoTDB With JDBC 可以在 `Transaction Groups` 模块中使用,不适用于 `Tag Historian`模块,可以用于自定义写入和查询。 + + 两个模块与Ignition的具体关系与内容如下图所示。 + + ![](https://alioss.timecho.com/docs/img/Ignition.png) + +## 安装要求 + +| **准备内容** | **版本要求** | +| :------------------------: | :------------------------------------------------------------: | +| IoTDB | 要求已安装V1.3.1及以上版本,安装请参考 IoTDB [部署指导](../Deployment-and-Maintenance/IoTDB-Package_timecho.md) | +| Ignition | 要求已安装 8.1.x版本(8.1.37及以上)的 8.1 版本,安装请参考 Ignition 官网[安装指导](https://docs.inductiveautomation.com/docs/8.1/getting-started/installing-and-upgrading)(其他版本适配请联系商务了解) | +| Ignition-IoTDB连接器模块 | 请联系商务获取 | +| Ignition-IoTDB With JDBC模块 | 下载地址:https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/ | + +## Ignition-IoTDB连接器使用说明 + +### 简介 + +Ignition-IoTDB连接器模块可以将数据存入与历史数据库提供程序关联的数据库连接中。数据根据其数据类型直接存储到 SQL 数据库中的表中,以及毫秒时间戳。根据每个标签上的值模式和死区设置,仅在更改时存储数据,从而避免重复和不必要的数据存储。 + +Ignition-IoTDB连接器提供了将 Ignition 采集到的数据存入 IoTDB 的能力。 + +### 安装步骤 + +步骤一:进入 `Config` - `System`- `Modules` 模块,点击最下方的`Install or Upgrade a Module...` + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-1.PNG) + +步骤二:选择获取到的 `modl`,选择文件并上传,点击 `Install`,信任相关证书。 + +![](https://alioss.timecho.com/docs/img/ignition-3.png) + +步骤三:安装完成后可以看到如下内容 + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-3.PNG) + +步骤四:进入 `Config` - `Tags`- `History` 模块,点击下方的`Create new Historical Tag Provider...` + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-4.png) + +步骤五:选择 `IoTDB`并填写配置信息 + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%E8%BF%9E%E6%8E%A5%E5%99%A8-5.PNG) + +配置内容如下: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
名称含义默认值备注
Main
Provider NameProvider 名称-
Enabled true为 true 时才能使用该 Provider
Description备注-
IoTDB Settings
Host Name目标IoTDB实例的地址-
Port Number目标IoTDB实例的端口6667
Username目标IoTDB的用户名-
Password目标IoTDB的密码-
Database Name要存储的数据库名称,以 root 开头,如 root.db-
Pool SizeSessionPool 的 Size50可以按需进行配置
Store and Forward Settings保持默认即可
+ + +### 使用说明 + +#### 配置历史数据存储 + +- 配置好 `Provider` 后就可以在 `Designer` 中使用 `IoTDB Tag Historian` 了,就跟使用其他的 `Provider` 一样,右键点击对应 `Tag` 选择 `Edit tag(s)`,在 Tag Editor 中选择 History 分类 + + ![](https://alioss.timecho.com/docs/img/ignition-7.png) + +- 设置 `History Enabled` 为 `true`,并选择 `Storage Provider` 为上一步创建的 `Provider`,按需要配置其它参数,并点击 `OK`,然后保存项目。此时数据将会按照设置的内容持续的存入 `IoTDB` 实例中。 + + ![](https://alioss.timecho.com/docs/img/ignition-8.png) + +#### 读取数据 + +- 也可以在 Report 的 Data 标签下面直接选择存入 IoTDB 的 Tags + + ![](https://alioss.timecho.com/docs/img/ignition-9.png) + +- 在 Components 中也可以直接浏览相关数据 + + ![](https://alioss.timecho.com/docs/img/ignition-10.png) + +#### 脚本模块:该功能能够与 IoTDB 进行交互 + +1. system.iotdb.insert: + + +- 脚本说明:将数据写入到 IoTDB 实例中 + +- 脚本定义: + ``` shell + system.iotdb.insert(historian, deviceId, timestamps, measurementNames, measurementValues) + ``` + +- 参数: + + - `str historian`:对应的 IoTDB Tag Historian Provider 的名称 + - `str deviceId`:写入的 deviceId,不含配置的 database,如 Sine + - `long[] timestamps`:写入的数据点对于的时间戳列表 + - `str[] measurementNames`:写入的物理量的名称列表 + - `str[][] measurementValues`:写入的数据点数据,与时间戳列表和物理量名称列表对应 + +- 返回值:无 + +- 可用范围:Client, Designer, Gateway + +- 使用示例: + + ```shell + system.iotdb.insert("IoTDB", "Sine", [system.date.now()],["measure1","measure2"],[["val1","val2"]]) + ``` + +2. system.iotdb.query: + + +- 脚本说明:查询写到 IoTDB 实例中的数据 + +- 脚本定义: + ```shell + system.iotdb.query(historian, sql) + ``` + +- 参数: + + - `str historian`:对应的 IoTDB Tag Historian Provider 的名称 + - `str sql`:待查询的 sql 语句 + +- 返回值: + 查询的结果:`List>` + +- 可用范围:Client, Designer, Gateway +- 使用示例: + +```shell +system.iotdb.query("IoTDB", "select * from root.db.Sine where time > 1709563427247") +``` + +## Ignition-IoTDB With JDBC + +### 简介 + + Ignition-IoTDB With JDBC提供了一个 JDBC 驱动,允许用户使用标准的JDBC API 连接和查询 lgnition-loTDB 数据库 + +### 安装步骤 + + 步骤一:进入 `Config` - `Databases` -`Drivers` 模块,创建 `Translator` + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%20With%20JDBC-1.png) + + 步骤二:进入 `Config` - `Databases` -`Drivers` 模块,创建 `JDBC Driver`,选择上一步配置的 `Translator`并上传下载的 `IoTDB-JDBC`,Classname 配置为 `org.apache.iotdb.jdbc.IoTDBDriver` + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%20With%20JDBC-2.png) + +步骤三:进入 `Config` - `Databases` -`Connections` 模块,创建新的 `Connections`,`JDBC Driver` 选择上一步创建的 `IoTDB Driver`,配置相关信息后保存即可使用 + +![](https://alioss.timecho.com/docs/img/Ignition-IoTDB%20With%20JDBC-3.png) + +### 使用说明 + +#### 数据写入 + + 在`Transaction Groups`中的 `Data Source`选择之前创建的 `Connection` + +- `Table name` 需设置为 root 开始的完整的设备路径 +- 取消勾选 `Automatically create table` +- `Store timestame to` 配置为 time + +不选择其他项,设置好字段,并 `Enabled` 后 数据会安装设置存入对应的 IoTDB + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E5%86%99%E5%85%A5-1.png) + +#### 数据查询 + +- 在 `Database Query Browser` 中选择`Data Source`选择之前创建的 `Connection`,即可编写 SQL 语句查询 IoTDB 中的数据 + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E6%9F%A5%E8%AF%A2-ponz.png) + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/NiFi-IoTDB.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/NiFi-IoTDB.md new file mode 100644 index 00000000..62aec397 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/NiFi-IoTDB.md @@ -0,0 +1,140 @@ + +# Apache NiFi + +## Apache NiFi简介 + +Apache NiFi 是一个易用的、功能强大的、可靠的数据处理和分发系统。 + +Apache NiFi 支持强大的、可伸缩的数据路由、转换和系统中介逻辑的有向图。 + +Apache NiFi 包含以下功能: + +* 基于浏览器的用户接口: + * 设计、控制、反馈和监控的无缝体验 +* 数据起源跟踪 + * 从头到尾完整的信息族谱 +* 丰富的配置 + * 丢失容忍和保证交付 + * 低延迟和高吞吐 + * 动态优先级策略 + * 运行时可以修改流配置 + * 反向压力控制 +* 扩展设计 + * 用于定制 processors 和 services 的组件体系结构 + 
* 快速开发和迭代测试 +* 安全会话 + * 带有可配置认证策略的 HTTPS 协议 + * 多租户授权和策略管理 + * 包括TLS和SSH的加密通信的标准协议 + +## PutIoTDBRecord + +这是一个用于数据写入的处理器。它使用配置的 Record Reader 将传入 FlowFile 的内容读取为单独的记录,并使用本机接口将它们写入 Apache IoTDB。 + +### PutIoTDBRecord的配置项 + +| 配置项 | 描述 | 默认值 | 是否必填 | +| ------------- |---------------------------------------------------------------------------------------------------------------------------------------------------------------| ------ | -------- | +| Host | IoTDB 的主机名 | null | true | +| Port | IoTDB 的端口 | 6667 | true | +| Username | IoTDB 的用户名 | null | true | +| Password | IoTDB 的密码 | null | true | +| Prefix | 将被写入IoTDB的数据的tsName前缀 以root. 开头
可以使用Nifi expression language做动态替换. | null | true | +| Time | 时间字段名 | null | true | +| Record Reader | 指定一个 Record Reader controller service 来解析数据,并且推断数据格式。 | null | true | +| Schema | IoTDB 需要的 schema 不能很好的被 NiFi 支持,因此你可以在这里自定义 schema。
除此之外,你可以通过这个方式设置编码和压缩类型。如果你没有设置这个配置,就会使用 Record Reader 推断的 schema。
这个配置可以通过 Attributes 的表达式来更新。 | null | false | +| Aligned | 是否使用 aligned 接口?
这个配置可以通过 Attributes 的表达式来更新。 | false | false | +| MaxRowNumber | 指定 tablet 的最大行数。
这个配置可以通过 Attributes 的表达式来更新。 | 1024 | false | + +### Flowfile 的推断数据类型 + +如果要使用推断类型,需要注意以下几点: + +1. 输入的 flowfile 需要能被 `Record Reader` 读取。 +2. flowfile的 schema 中必须包含以时间字段名属性命名的字段 +3. `Time`的数据类型只能是 `STRING` 或者 `LONG `。 +4. 除`Time` 以外的列必须以 `root.` 开头。 +5. 支持的数据类型有: `INT`,`LONG`, `FLOAT`, `DOUBLE`, `BOOLEAN`, `TEXT`。 + +### 通过配置项自定义 schema + +如上所述,通过配置项来自定义 schema 比起推断的 schema来说,是一种更加灵活和强大的方式。 + + `Schema` 配置项的解构如下: + +```json +{ + "fields": [{ + "tsName": "s1", + "dataType": "INT32", + "encoding": "RLE", + "compressionType": "GZIP" + }, { + "tsName": "s2", + "dataType": "INT64", + "encoding": "RLE", + "compressionType": "GZIP" + }] +} +``` + +**注意** + +1. flowfile 的第一列数据必须为 `Time`。剩下的必须与 `fields` 配置中保持一样的顺序。 +1. 定义 shema 的 JSON 中必须包含 `timeType` and `fields` 这两项。 +2. `timeType` 只支持 `LONG` 和 `STRING` 这两个选项。 +3. `tsName` 和 `dataType` 这两项必须被设置。 +4. 当数据插入IoTDB时,Prefix属性会被添加到 tsName以作为插入的字段名。 +5. 支持的 `dataTypes` 有:`INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOLEAN`, `TEXT`。 +6. 支持的 `encoding` 有: `PLAIN`, `DICTIONARY`, `RLE`, `DIFF`, `TS_2DIFF`, `BITMAP`, `GORILLA_V1`, `REGULAR`, `GORILLA`,`ZIGZAG`,`CHIMP`, `SPRINTZ`, `RLBE`。 +7. 支持的 `compressionType` 有: `UNCOMPRESSED`, `SNAPPY`, `GZIP`, `LZO`, `SDT`, `PAA`, `PLA`, `LZ4`, `ZSTD`, `LZMA2`。 + +## Relationships + +| relationship | 描述 | +| ------------ | ----------------------- | +| success | 数据能被正确的写入。 | +| failure | schema 或者数据有异常。 | + +## QueryIoTDBRecord + +这是一个用于数据读取的处理器。它通过读取 FlowFile 的内容中的SQL 查询来对IoTDB的原生接口进行访问,并将查询结果用Record Writer写入 flowfile。 + +### QueryIoTDBRecord的配置项 + +| 配置项 | 描述 | 默认值 | 是否必填 | +| ------------- |--------------------------------------------------------------------------------| ------ | -------- | +| Host | IoTDB 的主机名 | null | true | +| Port | IoTDB 的端口 | 6667 | true | +| Username | IoTDB 的用户名 | null | true | +| Password | IoTDB 的密码 | null | true | +| Record Writer | 指定一个 Record Writer controller service 来写入数据。 | null | true | +| iotdb-query | 需要执行的IoTDB query
。 Note: 如果有连入侧的连接那么查询会从FlowFile的内容中提取,否则使用当前配置的属性 | null | false | +| iotdb-query-chunk-size | 返回的结果可以进行分块,数据流中会返回一批按设置大小切分的数据,而不是一个单一的响应. 分块查询可以返回无限量的行。 注意: 数据分块只有在设置不为0时启用 | 0 | false | + + +## Relationships + +| relationship | 描述 | +| ------------ | ----------------------- | +| success | 数据能被正确的写入。 | +| failure | schema 或者数据有异常。 | diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-IoTDB.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-IoTDB.md new file mode 100644 index 00000000..0376fb1f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-IoTDB.md @@ -0,0 +1,229 @@ + + +# Apache Spark(IoTDB) + +## 版本支持 + +支持的 Spark 与 Scala 版本如下: + +| Spark 版本 | Scala 版本 | +|----------------|--------------| +| `2.4.0-latest` | `2.11, 2.12` | + +## 注意事项 + +1. 当前版本的 `spark-iotdb-connector` 支持 `2.11` 与 `2.12` 两个版本的 Scala,暂不支持 `2.13` 版本。 +2. `spark-iotdb-connector` 支持在 Java、Scala 版本的 Spark 与 PySpark 中使用。 + +## 部署 + +`spark-iotdb-connector` 总共有两个使用场景,分别为 IDE 开发与 spark-shell 调试。 + +### IDE 开发 + +在 IDE 开发时,只需要在 `pom.xml` 文件中添加以下依赖即可: + +``` xml + + org.apache.iotdb + + spark-iotdb-connector_2.12.10 + ${iotdb.version} + +``` + +### `spark-shell` 调试 + +如果需要在 `spark-shell` 中使用 `spark-iotdb-connetcor`,需要先在官网下载 `with-dependencies` 版本的 jar 包。然后再将 Jar 包拷贝到 `${SPARK_HOME}/jars` 目录中即可。 +执行以下命令即可: + +```shell +cp spark-iotdb-connector_2.12.10-${iotdb.version}.jar $SPARK_HOME/jars/ +``` + +此外,为了保证 spark 能使用 JDBC 和 IoTDB 连接,需要进行如下操作: + +运行如下命令来编译 IoTDB-JDBC 连接器: + +```shell +mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies +``` + +编译后的 jar 包在如下目录中: + +```shell +$IoTDB_HOME/iotdb-client/jdbc/target/iotdb-jdbc-{version}-SNAPSHOT-jar-with-dependencies.jar +``` + +最后再将 jar 包拷贝到 `${SPARK_HOME}/jars` 目录中即可。执行以下命令即可: + +```shell +cp iotdb-jdbc-{version}-SNAPSHOT-jar-with-dependencies.jar $SPARK_HOME/jars/ +``` + +## 使用 + +### 参数 + +| 参数 | 描述 | 默认值 | 使用范围 | 能否为空 | +|--------------|------------------------------------------------|------|------------|-------| +| url | 指定 IoTDB 的 JDBC 的 URL | null | read、write | false | +| user | IoTDB 的用户名 | root | read、write | true | +| password | IoTDB 的密码 | root | read、write | true | +| sql | 用于指定查询的 SQL 语句 | null | read | true | +| numPartition | 在 read 中用于指定 DataFrame 的分区数,在 write 中用于设置写入并发数 | 1 | read、write | true | +| lowerBound | 查询的起始时间戳(包含) | 0 | read | true | +| upperBound | 查询的结束时间戳(包含) | 0 | read | true | + +### 从 IoTDB 读取数据 + +以下是一个示例,演示如何从 IoTDB 中读取数据成为 DataFrame。 + +```scala +import org.apache.iotdb.spark.db._ + +val df = spark.read.format("org.apache.iotdb.spark.db") + .option("user", "root") + .option("password", "root") + .option("url", "jdbc:iotdb://127.0.0.1:6667/") + .option("sql", "select ** from root") // 查询 SQL + .option("lowerBound", "0") // 时间戳下界 + .option("upperBound", "100000000") // 时间戳上界 + .option("numPartition", "5") // 分区数 + .load + +df.printSchema() + +df.show() +``` + +### 将数据写入 IoTDB + +以下是一个示例,演示如何将数据写入 IoTDB。 + +```scala +// 构造窄表数据 +val df = spark.createDataFrame(List( + (1L, "root.test.d0", 1, 1L, 1.0F, 1.0D, true, "hello"), + (2L, "root.test.d0", 2, 2L, 2.0F, 2.0D, false, "world"))) + +val dfWithColumn = df.withColumnRenamed("_1", "Time") + .withColumnRenamed("_2", "Device") + .withColumnRenamed("_3", "s0") + .withColumnRenamed("_4", "s1") + .withColumnRenamed("_5", "s2") + .withColumnRenamed("_6", "s3") + .withColumnRenamed("_7", "s4") + .withColumnRenamed("_8", "s5") + +// 写入窄表数据 +dfWithColumn + .write + .format("org.apache.iotdb.spark.db") + .option("url", 
"jdbc:iotdb://127.0.0.1:6667/") + .save + +// 构造宽表数据 +val df = spark.createDataFrame(List( + (1L, 1, 1L, 1.0F, 1.0D, true, "hello"), + (2L, 2, 2L, 2.0F, 2.0D, false, "world"))) + +val dfWithColumn = df.withColumnRenamed("_1", "Time") + .withColumnRenamed("_2", "root.test.d0.s0") + .withColumnRenamed("_3", "root.test.d0.s1") + .withColumnRenamed("_4", "root.test.d0.s2") + .withColumnRenamed("_5", "root.test.d0.s3") + .withColumnRenamed("_6", "root.test.d0.s4") + .withColumnRenamed("_7", "root.test.d0.s5") + +// 写入宽表数据 +dfWithColumn.write.format("org.apache.iotdb.spark.db") + .option("url", "jdbc:iotdb://127.0.0.1:6667/") + .option("numPartition", "10") + .save +``` + +### 宽表与窄表转换 + +以下是如何转换宽表与窄表的示例: + +* 从宽到窄 + +```scala +import org.apache.iotdb.spark.db._ + +val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root.** where time < 1100 and time > 1000").load +val narrow_df = Transformer.toNarrowForm(spark, wide_df) +``` + +* 从窄到宽 + +```scala +import org.apache.iotdb.spark.db._ + +val wide_df = Transformer.toWideForm(spark, narrow_df) +``` + +## 宽表与窄表 + +以下 TsFile 结构为例:TsFile 模式中有三个度量:状态,温度和硬件。 这三种测量的基本信息如下: + +| 名称 | 类型 | 编码 | +|-----|---------|-------| +| 状态 | Boolean | PLAIN | +| 温度 | Float | RLE | +| 硬件 | Text | PLAIN | + +TsFile 中的现有数据如下: + +* `d1:root.ln.wf01.wt01` +* `d2:root.ln.wf02.wt02` + +| time | d1.status | time | d1.temperature | time | d2.hardware | time | d2.status | +|------|-----------|------|----------------|------|-------------|------|-----------| +| 1 | True | 1 | 2.2 | 2 | "aaa" | 1 | True | +| 3 | True | 2 | 2.2 | 4 | "bbb" | 2 | False | +| 5 | False | 3 | 2.1 | 6 | "ccc" | 4 | True | + +宽(默认)表形式如下: + +| Time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware | +|------|-------------------------------|--------------------------|----------------------------|-------------------------------|--------------------------|----------------------------| +| 1 | null | true | null | 2.2 | true | null | +| 2 | null | false | aaa | 2.2 | null | null | +| 3 | null | null | null | 2.1 | true | null | +| 4 | null | true | bbb | null | null | null | +| 5 | null | null | null | null | false | null | +| 6 | null | null | ccc | null | null | null | + +你还可以使用窄表形式,如下所示: + +| Time | Device | status | hardware | temperature | +|------|-------------------|--------|----------|-------------| +| 1 | root.ln.wf02.wt01 | true | null | 2.2 | +| 1 | root.ln.wf02.wt02 | true | null | null | +| 2 | root.ln.wf02.wt01 | null | null | 2.2 | +| 2 | root.ln.wf02.wt02 | false | aaa | null | +| 3 | root.ln.wf02.wt01 | true | null | 2.1 | +| 4 | root.ln.wf02.wt02 | true | bbb | null | +| 5 | root.ln.wf02.wt01 | false | null | null | +| 6 | root.ln.wf02.wt02 | null | ccc | null | diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-TsFile.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-TsFile.md new file mode 100644 index 00000000..0bcc8ce9 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Spark-TsFile.md @@ -0,0 +1,320 @@ + + +# Apache Spark(TsFile) + +## About TsFile-Spark-Connector + +TsFile-Spark-Connector 对 Tsfile 类型的外部数据源实现 Spark 的支持。 这使用户可以通过 Spark 读取,写入和查询 Tsfile。 + +使用此连接器,您可以 + +- 从本地文件系统或 hdfs 加载单个 TsFile 到 Spark +- 将本地文件系统或 hdfs 中特定目录中的所有文件加载到 Spark 中 +- 将数据从 Spark 写入 TsFile + +## System Requirements + +| Spark Version | Scala Version | Java 
Version | TsFile | +| ------------- | ------------- | ------------ | -------- | +| `2.4.3` | `2.11.8` | `1.8` | `1.0.0` | + +> 注意:有关如何下载和使用 TsFile 的更多信息,请参见以下链接:https://github.com/apache/iotdb/tree/master/tsfile +> 注意:spark 版本目前仅支持 2.4.3, 其他版本可能存在不适配的问题,目前已知 2.4.7 的版本存在不适配的问题 + +## 快速开始 + +### 本地模式 + +在本地模式下使用 TsFile-Spark-Connector 启动 Spark: + +``` +./ --jars tsfile-spark-connector.jar,tsfile-{version}-jar-with-dependencies.jar,hadoop-tsfile-{version}-jar-with-dependencies.jar +``` + +- \是您的 spark-shell 的真实路径。 +- 多个 jar 包用逗号分隔,没有任何空格。 +- 有关如何获取 TsFile 的信息,请参见 https://github.com/apache/iotdb/tree/master/tsfile。 +- 获取到 dependency 包:```mvn clean package -DskipTests -P get-jar-with-dependencies``` + +### 分布式模式 + +在分布式模式下使用 TsFile-Spark-Connector 启动 Spark(即,Spark 集群通过 spark-shell 连接): + +``` +. / --jars tsfile-spark-connector.jar,tsfile-{version}-jar-with-dependencies.jar,hadoop-tsfile-{version}-jar-with-dependencies.jar --master spark://ip:7077 +``` + +注意: + +- \是您的 spark-shell 的真实路径。 +- 多个 jar 包用逗号分隔,没有任何空格。 +- 有关如何获取 TsFile 的信息,请参见 https://github.com/apache/iotdb/tree/master/tsfile。 + +## 数据类型对应 + +| TsFile 数据类型 | SparkSQL 数据类型 | +| -------------- | ---------------- | +| BOOLEAN | BooleanType | +| INT32 | IntegerType | +| INT64 | LongType | +| FLOAT | FloatType | +| DOUBLE | DoubleType | +| TEXT | StringType | + +## 模式推断 + +显示 TsFile 的方式取决于架构。 以以下 TsFile 结构为例:TsFile 模式中有三个度量:状态,温度和硬件。 这三种测量的基本信息如下: + +|名称 | 类型 | 编码 | +| ---- | ---- | ---- | +|状态 | Boolean|PLAIN| +|温度 | Float|RLE| +|硬件 |Text|PLAIN| + +TsFile 中的现有数据如下: + + * d1:root.ln.wf01.wt01 + * d2:root.ln.wf02.wt02 + +| time | d1.status | time | d1.temperature | time | d2.hardware | time | d2.status | +| :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | +| 1 | True | 1 | 2.2 | 2 | "aaa" | 1 | True | +| 3 | True | 2 | 2.2 | 4 | "bbb" | 2 | False | +| 5 | False | 3 | 2.1 | 6 | "ccc" | 4 | True | + +相应的 SparkSQL 表如下: + +| time | root.ln.wf02.wt02.temperature | root.ln.wf02.wt02.status | root.ln.wf02.wt02.hardware | root.ln.wf01.wt01.temperature | root.ln.wf01.wt01.status | root.ln.wf01.wt01.hardware | +| ---- | ----------------------------- | ------------------------ | -------------------------- | ----------------------------- | ------------------------ | -------------------------- | +| 1 | null | true | null | 2.2 | true | null | +| 2 | null | false | aaa | 2.2 | null | null | +| 3 | null | null | null | 2.1 | true | null | +| 4 | null | true | bbb | null | null | null | +| 5 | null | null | null | null | false | null | +| 6 | null | null | ccc | null | null | null | + +您还可以使用如下所示的窄表形式:(您可以参阅第 6 部分,了解如何使用窄表形式) + +| time | device_name | status | hardware | temperature | +| ---- | ----------------- | ------ | -------- | ----------- | +| 1 | root.ln.wf02.wt01 | true | null | 2.2 | +| 1 | root.ln.wf02.wt02 | true | null | null | +| 2 | root.ln.wf02.wt01 | null | null | 2.2 | +| 2 | root.ln.wf02.wt02 | false | aaa | null | +| 3 | root.ln.wf02.wt01 | true | null | 2.1 | +| 4 | root.ln.wf02.wt02 | true | bbb | null | +| 5 | root.ln.wf02.wt01 | false | null | null | +| 6 | root.ln.wf02.wt02 | null | ccc | null | + +## Scala API + +注意:请记住预先分配必要的读写权限。 + + * 示例 1:从本地文件系统读取 + +```scala +import org.apache.iotdb.spark.tsfile._ +val wide_df = spark.read.tsfile("test.tsfile") +wide_df.show + +val narrow_df = spark.read.tsfile("test.tsfile", true) +narrow_df.show +``` + + * 示例 2:从 hadoop 文件系统读取 + +```scala +import org.apache.iotdb.spark.tsfile._ +val wide_df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") +wide_df.show + +val 
narrow_df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) +narrow_df.show +``` + + * 示例 3:从特定目录读取 + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/usr/hadoop") +df.show +``` + +注 1:现在不支持目录中所有 TsFile 的全局时间排序。 + +注 2:具有相同名称的度量应具有相同的架构。 + + * 示例 4:广泛形式的查询 + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") +df.createOrReplaceTempView("tsfile_table") +val newDf = spark.sql("select * from tsfile_table where `device_1.sensor_1`>0 and `device_1.sensor_2` < 22") +newDf.show +``` + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") +df.createOrReplaceTempView("tsfile_table") +val newDf = spark.sql("select count(*) from tsfile_table") +newDf.show +``` + + * 示例 5:缩小形式的查询 + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) +df.createOrReplaceTempView("tsfile_table") +val newDf = spark.sql("select * from tsfile_table where device_name = 'root.ln.wf02.wt02' and temperature > 5") +newDf.show +``` + +```scala +import org.apache.iotdb.spark.tsfile._ +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) +df.createOrReplaceTempView("tsfile_table") +val newDf = spark.sql("select count(*) from tsfile_table") +newDf.show +``` + + * 例 6:写宽格式 + +```scala +// we only support wide_form table to write +import org.apache.iotdb.spark.tsfile._ + +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile") +df.show +df.write.tsfile("hdfs://localhost:9000/output") + +val newDf = spark.read.tsfile("hdfs://localhost:9000/output") +newDf.show +``` + + * 例 7:写窄格式 + +```scala +// we only support wide_form table to write +import org.apache.iotdb.spark.tsfile._ + +val df = spark.read.tsfile("hdfs://localhost:9000/test.tsfile", true) +df.show +df.write.tsfile("hdfs://localhost:9000/output", true) + +val newDf = spark.read.tsfile("hdfs://localhost:9000/output", true) +newDf.show +``` + +附录 A:模式推断的旧设计 + +显示 TsFile 的方式与 TsFile Schema 有关。 以以下 TsFile 结构为例:TsFile 架构中有三个度量:状态,温度和硬件。 这三个度量的基本信息如下: + +|名称 | 类型 | 编码 | +| ---- | ---- | ---- | +|状态 | Boolean|PLAIN| +|温度 | Float|RLE| +|硬件|Text|PLAIN| + +文件中的现有数据如下: + + * delta_object1: root.ln.wf01.wt01 + * delta_object2: root.ln.wf02.wt02 + * delta_object3: :root.sgcc.wf03.wt01 + +| time | delta_object1.status | time | delta_object1.temperature | time | delta_object2.hardware | time | delta_object2.status | time | delta_object3.status | time | delta_object3.temperature | +| :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | +| 1 | True | 1 | 2.2 | 2 | "aaa" | 1 | True | 2 | True | 3 | 3.3 | +| 3 | True | 2 | 2.2 | 4 | "bbb" | 2 | False | 3 | True | 6 | 6.6 | +| 5 | False| 3 | 2.1 | 6 | "ccc" | 4 | True | 4 | True | 8 | 8.8 | +| 7 | True | 4 | 2.0 | 8 | "ddd" | 5 | False | 6 | True | 9 | 9.9 | + +有两种显示方法: + + * 默认方式 + +将创建两列来存储设备的完整路径:time(LongType)和 delta_object(StringType)。 + +- `time`:时间戳记,LongType +- `delta_object`:Delta_object ID,StringType + +接下来,为每个度量创建一列以存储特定数据。 SparkSQL 表结构如下: + +|time(LongType)|delta\_object(StringType)|status(BooleanType)|temperature(FloatType)|hardware(StringType)| +|--- |--- |--- |--- |--- | +|1|root.ln.wf01.wt01|True|2.2|null| +|1|root.ln.wf02.wt02|True|null|null| +|2|root.ln.wf01.wt01|null|2.2|null| +|2|root.ln.wf02.wt02|False|null|"aaa"| +|2|root.sgcc.wf03.wt01|True|null|null| +|3|root.ln.wf01.wt01|True|2.1|null| 
+|3|root.sgcc.wf03.wt01|True|3.3|null| +|4|root.ln.wf01.wt01|null|2.0|null| +|4|root.ln.wf02.wt02|True|null|"bbb"| +|4|root.sgcc.wf03.wt01|True|null|null| +|5|root.ln.wf01.wt01|False|null|null| +|5|root.ln.wf02.wt02|False|null|null| +|5|root.sgcc.wf03.wt01|True|null|null| +|6|root.ln.wf02.wt02|null|null|"ccc"| +|6|root.sgcc.wf03.wt01|null|6.6|null| +|7|root.ln.wf01.wt01|True|null|null| +|8|root.ln.wf02.wt02|null|null|"ddd"| +|8|root.sgcc.wf03.wt01|null|8.8|null| +|9|root.sgcc.wf03.wt01|null|9.9|null| + + * 展开 delta_object 列 + +通过“。”将设备列展开为多个列,忽略根目录“root”。方便进行更丰富的聚合操作。如果用户想使用这种显示方式,需要在表创建语句中设置参数“delta\_object\_name”(参考本手册 5.1 节中的示例 5),在本例中,将参数“delta\_object\_name”设置为“root.device.turbine”。路径层的数量必须是一对一的。此时,除了“根”层之外,为设备路径的每一层创建一列。列名是参数中的名称,值是设备相应层的名称。接下来,将为每个度量创建一个列来存储特定的数据。 + +那么 SparkSQL 表结构如下: + +|time(LongType)|group(StringType)|field(StringType)|device(StringType)|status(BooleanType)|temperature(FloatType)|hardware(StringType)| +|--- |--- |--- |--- |--- |--- |--- | +|1|ln|wf01|wt01|True|2.2|null| +|1|ln|wf02|wt02|True|null|null| +|2|ln|wf01|wt01|null|2.2|null| +|2|ln|wf02|wt02|False|null|"aaa"| +|2|sgcc|wf03|wt01|True|null|null| +|3|ln|wf01|wt01|True|2.1|null| +|3|sgcc|wf03|wt01|True|3.3|null| +|4|ln|wf01|wt01|null|2.0|null| +|4|ln|wf02|wt02|True|null|"bbb"| +|4|sgcc|wf03|wt01|True|null|null| +|5|ln|wf01|wt01|False|null|null| +|5|ln|wf02|wt02|False|null|null| +|5|sgcc|wf03|wt01|True|null|null| +|6|ln|wf02|wt02|null|null|"ccc"| +|6|sgcc|wf03|wt01|null|6.6|null| +|7|ln|wf01|wt01|True|null|null| +|8|ln|wf02|wt02|null|null|"ddd"| +|8|sgcc|wf03|wt01|null|8.8|null| +|9|sgcc|wf03|wt01|null|9.9|null| + +TsFile-Spark-Connector 可以通过 SparkSQL 在 SparkSQL 中以表的形式显示一个或多个 tsfile。它还允许用户指定一个目录或使用通配符来匹配多个目录。如果有多个 tsfile,那么所有 tsfile 中的度量值的并集将保留在表中,并且具有相同名称的度量值在默认情况下具有相同的数据类型。注意,如果存在名称相同但数据类型不同的情况,TsFile-Spark-Connector 将不能保证结果的正确性。 + +写入过程是将数据 aframe 写入一个或多个 tsfile。默认情况下,需要包含两个列:time 和 delta_object。其余的列用作测量。如果用户希望将第二个表结构写回 TsFile,可以设置“delta\_object\_name”参数(请参阅本手册 5.1 节的 5.1 节)。 + +附录 B:旧注 + +注意:检查 Spark 根目录中的 jar 软件包,并将 libthrift-0.9.2.jar 和 libfb303-0.9.2.jar 分别替换为 libthrift-0.9.1.jar 和 libfb303-0.9.1.jar。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Telegraf-IoTDB.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Telegraf-IoTDB.md new file mode 100644 index 00000000..e6a80fba --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Telegraf-IoTDB.md @@ -0,0 +1,110 @@ + + +# Telegraf +Telegraf 是一个开源代理工具,用于收集、处理和传输度量数据,由 InfluxData 开发。 +Telegraf 有以下这些特点: +* 插件体系结构: Telegraf 的强大之处在于其广泛的插件生态系统。它支持多种输入、输出和处理器插件,可以与各种数据源和目标无缝集成。 +* 数据收集: Telegraf 擅长从不同来源收集度量数据,例如系统指标、日志、数据库等。其多功能性使其适用于监视应用程序、基础架构和物联网设备。 +* 输出目标: 一旦收集到数据,可以将其发送到各种输出目标,包括流行的数据库如 InfluxDB。这种灵活性使 Telegraf 适应不同的监视和分析设置。 +* 配置简易: Telegraf 的配置使用 TOML 文件进行。这种简单性使用户能够轻松定义输入、输出和处理器,使定制变得简单明了。 +* 社区与支持: 作为开源项目,Telegraf 受益于活跃的社区。用户可以通过论坛和文档贡献插件、报告问题并寻求帮助。 + +# Telegraf IoTDB 输出插件 +这个输出插件保存 Telegraf 中的监控信息到 Apache IoTDB 的后端,支持 session 连接和数据写入。 + +## 注意事项 +1. 在使用这个插件前,需要配置 IP 地址,端口号,用户名,密码以及其他数据库服务器的信息,另外还有数据转换,时间单位和其他配置。 +2. 输出到 IoTDB 的路径需要满足章节 ‘语法约定’ 中的要求 +3. 查看 https://github.com/influxdata/telegraf/tree/master/plugins/outputs/iotdb 了解如何配置 Telegraf IoTDB Output 插件. + +## 示例 +以下是一个使用 Telegraf 收集 CPU 数据输出到 IoTDB 的示例。 +1. 使用 telegraf 命令生成配置文件 +``` +telegraf --sample-config --input-filter cpu --output-filter iotdb > cpu_iotdb.conf +``` +2. 
修改 input cpu 插件的配置 +``` +# Read metrics about cpu usage +[[inputs.cpu]] + ## Whether to report per-cpu stats or not + percpu = true + ## Whether to report total system cpu stats or not + totalcpu = true + ## If true, collect raw CPU time metrics + collect_cpu_time = false + ## If true, compute and report the sum of all non-idle CPU states + report_active = false + ## If true and the info is available then add core_id and physical_id tags + core_tags = false + name_override = "root.demo.telgraf.cpu" +``` +3. 修改 output iotdb 插件的配置 +``` +# Save metrics to an IoTDB Database +[[outputs.iotdb]] + ## Configuration of IoTDB server connection + host = "127.0.0.1" + # port = "6667" + + ## Configuration of authentication + # user = "root" + # password = "root" + + ## Timeout to open a new session. + ## A value of zero means no timeout. + # timeout = "5s" + + ## Configuration of type conversion for 64-bit unsigned int + ## IoTDB currently DOES NOT support unsigned integers (version 13.x). + ## 32-bit unsigned integers are safely converted into 64-bit signed integers by the plugin, + ## however, this is not true for 64-bit values in general as overflows may occur. + ## The following setting allows to specify the handling of 64-bit unsigned integers. + ## Available values are: + ## - "int64" -- convert to 64-bit signed integers and accept overflows + ## - "int64_clip" -- convert to 64-bit signed integers and clip the values on overflow to 9,223,372,036,854,775,807 + ## - "text" -- convert to the string representation of the value + # uint64_conversion = "int64_clip" + + ## Configuration of TimeStamp + ## TimeStamp is always saved in 64bits int. timestamp_precision specifies the unit of timestamp. + ## Available value: + ## "second", "millisecond", "microsecond", "nanosecond"(default) + timestamp_precision = "millisecond" + + ## Handling of tags + ## Tags are not fully supported by IoTDB. + ## A guide with suggestions on how to handle tags can be found here: + ## https://iotdb.apache.org/UserGuide/Master/API/InfluxDB-Protocol.html + ## + ## Available values are: + ## - "fields" -- convert tags to fields in the measurement + ## - "device_id" -- attach tags to the device ID + ## + ## For Example, a metric named "root.sg.device" with the tags `tag1: "private"` and `tag2: "working"` and + ## fields `s1: 100` and `s2: "hello"` will result in the following representations in IoTDB + ## - "fields" -- root.sg.device, s1=100, s2="hello", tag1="private", tag2="working" + ## - "device_id" -- root.sg.device.private.working, s1=100, s2="hello" + convert_tags_to = "fields" +``` +4. 使用这个配置文件运行 Telegraf,一段时间后,可以在 IoTDB 中查询到收集的这些数据 + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Thingsboard.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Thingsboard.md new file mode 100644 index 00000000..f244d9f7 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Thingsboard.md @@ -0,0 +1,99 @@ + +# ThingsBoard + +## 产品概述 + +1. ThingsBoard 简介 + + ThingsBoard 是一个开源物联网平台,可实现物联网项目的快速开发、管理和扩展。更多介绍详情请参考[ ThingsBoard 官网](https://thingsboard.io/docs/getting-started-guides/what-is-thingsboard/)。 + + ![](https://alioss.timecho.com/docs/img/ThingsBoard-1.PNG) + +2. 
ThingsBoard-IoTDB 简介 + + ThingsBoard-IoTDB 提供了将 ThingsBoard 中的数据存储到 IoTDB 的能力,也支持在 ThingsBoard 中读取 root.thingsboard 数据库下的数据信息。详细架构图如下图黄色标识所示。 + +### 关系示意图 + +![](https://alioss.timecho.com/docs/img/Thingsboard-2.png) + +## 安装要求 + +| 准备内容 | 版本要求 | +| :-------------------------- | :----------------------------------------------------------- | +| JDK | 要求已安装 17 及以上版本,具体下载请查看 [Oracle 官网](https://www.oracle.com/java/technologies/downloads/) | +| IoTDB | 要求已安装 V1.3.0 及以上版本,具体安装过程请参考[ 部署指导](https://www.timecho.com/docs/zh/UserGuide/latest/Deployment-and-Maintenance/IoTDB-Package_timecho.html) | +| ThingsBoard(IoTDB 适配版) | 安装包请联系商务获取,具体安装步骤参见下文 | + +## 安装步骤 + +具体安装步骤请参考 [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)。其中: + +- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 2 ThingsBoard 服务安装】使用上方从商务获取的安装包进行安装(使用 ThingsBoard 官方安装包无法使用 iotdb) +- [ThingsBoard 官网](https://thingsboard.io/docs/user-guide/install/ubuntu/)中【步骤 3 配置 ThingsBoard 数据库-ThingsBoard 配置】步骤中需要按照下方内容添加环境变量 + +```Bash +# ThingsBoard 原有配置 +export SPRING_DATASOURCE_URL=jdbc:postgresql://localhost:5432/thingsboard +export SPRING_DATASOURCE_USERNAME=postgres +export SPRING_DATASOURCE_PASSWORD=PUT_YOUR_POSTGRESQL_PASSWORD_HERE ##修改为pg的密码 + +# 使用IoTDB需修改以下变量 +export DATABASE_TS_TYPE=iotdb ## 原配置为sql,将变量值改为iotdb + + +# 使用IoTDB需增加以下变量 +export DATABASE_TS_LATEST_TYPE=iotdb +export IoTDB_HOST=127.0.0.1 ## iotdb所在的ip地址 +export IoTDB_PORT:6667 ## iotdb的端口号,默认为6667 +export IoTDB_USER:root ## iotdb的用户名,默认为root +export IoTDB_PASSWORD:root ## iotdb的密码,默认为root +export IoTDB_CONNECTION_TIMEOUT:5000 ## iotdb超时时间设置 +export IoTDB_FETCH_SIZE:1024 ## 单次请求所拉取的数据条数,推荐设置为1024 +export IoTDB_MAX_SIZE:200 ##sessionpool内的最大数量,推荐设置为>=并发请求数 +export IoTDB_DATABASE:root.thingsboard ##thingsboard数据写入iotdb所存储的数据库,支持自定义 +``` + +## 使用说明 + +1. 创建设备并接入数据:在 Thingsboard 的实体-设备中创建设备并通过工业网关将数据发送到 ThingsBoard 指定设备中 + +![](https://alioss.timecho.com/docs/img/ThingsBoard-3.PNG) + +2. 设置规则链:在规则链库中对于“SD-032F 泵”设置告警规则并将该规则链设置为根链 + +
+  +  +
+ +3. 查看告警记录:对于产生的告警记录已经通过点击“设备-告警”来进行查看 + +![](https://alioss.timecho.com/docs/img/ThingsBoard-6.png) + +4. 数据可视化:在“仪表板”中通过“新建仪表板-绑定设备-关联参数”进行可视化设置 + +
+  +  +
+ diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB_apache.md new file mode 100644 index 00000000..9773004f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB_apache.md @@ -0,0 +1,174 @@ + + +# Apache Zeppelin + +## Zeppelin 简介 + +Apache Zeppelin 是一个基于网页的交互式数据分析系统。用户可以通过 Zeppelin 连接数据源并使用 SQL、Scala 等进行交互式操作。操作可以保存为文档(类似于 Jupyter)。Zeppelin 支持多种数据源,包括 Spark、ElasticSearch、Cassandra 和 InfluxDB 等等。现在,IoTDB 已经支持使用 Zeppelin 进行操作。样例如下: + +![iotdb-note-snapshot](https://alioss.timecho.com/docs/img/github/102752947-520a3e80-43a5-11eb-8fb1-8fac471c8c7e.png) + +## Zeppelin-IoTDB 解释器 + +### 系统环境需求 + +| IoTDB 版本 | Java 版本 | Zeppelin 版本 | +| :--------: | :-----------: | :-----------: | +| >=`0.12.0` | >=`1.8.0_271` | `>=0.9.0` | + +安装 IoTDB:参考 [快速上手](../Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md). 假设 IoTDB 安装在 `$IoTDB_HOME`. + +安装 Zeppelin: +> 方法 1 直接下载:下载 [Zeppelin](https://zeppelin.apache.org/download.html#) 并解压二进制文件。推荐下载 [netinst](http://www.apache.org/dyn/closer.cgi/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-netinst.tgz) 二进制包,此包由于未编译不相关的 interpreter,因此大小相对较小。 +> +> 方法 2 源码编译:参考 [从源码构建 Zeppelin](https://zeppelin.apache.org/docs/latest/setup/basics/how_to_build.html) ,使用命令为 `mvn clean package -pl zeppelin-web,zeppelin-server -am -DskipTests`。 + +假设 Zeppelin 安装在 `$Zeppelin_HOME`. + +### 编译解释器 + +运行如下命令编译 IoTDB Zeppelin 解释器。 + +```shell +cd $IoTDB_HOME + mvn clean package -pl iotdb-connector/zeppelin-interpreter -am -DskipTests -P get-jar-with-dependencies +``` + +编译后的解释器位于如下目录: + +```shell +$IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar +``` + +### 安装解释器 + +当你编译好了解释器,在 Zeppelin 的解释器目录下创建一个新的文件夹`iotdb`,并将 IoTDB 解释器放入其中。 + +```shell +cd $IoTDB_HOME +mkdir -p $Zeppelin_HOME/interpreter/iotdb +cp $IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar $Zeppelin_HOME/interpreter/iotdb +``` + +### 修改 Zeppelin 配置 + +进入 `$Zeppelin_HOME/conf`,使用 template 创建 Zeppelin 配置文件: + +```shell +cp zeppelin-site.xml.template zeppelin-site.xml +``` + +打开 zeppelin-site.xml 文件,将 `zeppelin.server.addr` 项修改为 `0.0.0.0` + +### 启动 Zeppelin 和 IoTDB + +进入 `$Zeppelin_HOME` 并运行 Zeppelin: + +```shell +# Unix/OS X +> ./bin/zeppelin-daemon.sh start + +# Windows +> .\bin\zeppelin.cmd +``` + +进入 `$IoTDB_HOME` 并运行 IoTDB: + +```shell +# Unix/OS X +> nohup sbin/start-server.sh >/dev/null 2>&1 & +or +> nohup sbin/start-server.sh -c -rpc_port >/dev/null 2>&1 & + +# Windows +> sbin\start-server.bat -c -rpc_port +``` + +## 使用 Zeppelin-IoTDB 解释器 + +当 Zeppelin 启动后,访问 [http://127.0.0.1:8080/](http://127.0.0.1:8080/) + +通过如下步骤创建一个新的笔记本页面: + +1. 点击 `Create new node` 按钮 +2. 设置笔记本名 +3. 
选择解释器为 iotdb + +现在可以开始使用 Zeppelin 操作 IoTDB 了。 + +![iotdb-create-note](https://alioss.timecho.com/docs/img/github/102752945-5171a800-43a5-11eb-8614-53b3276a3ce2.png) + +我们提供了一些简单的 SQL 来展示 Zeppelin-IoTDB 解释器的使用: + +```sql +CREATE DATABASE root.ln.wf01.wt01; +CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN; +CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=PLAIN; +CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32, ENCODING=PLAIN; + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (1, 1.1, false, 11); + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (2, 2.2, true, 22); + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (3, 3.3, false, 33); + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (4, 4.4, false, 44); + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (5, 5.5, false, 55); + +SELECT * +FROM root.ln.wf01.wt01 +WHERE time >= 1 + AND time <= 6; +``` + +样例如下: + +![iotdb-note-snapshot2](https://alioss.timecho.com/docs/img/github/102752948-52a2d500-43a5-11eb-9156-0c55667eb4cd.png) + +用户也可以参考 [[1]](https://zeppelin.apache.org/docs/0.9.0/usage/display_system/basic.html) 编写更丰富多彩的文档。 + +以上样例放置于 `$IoTDB_HOME/zeppelin-interpreter/Zeppelin-IoTDB-Demo.zpln` + +## 解释器配置项 + +进入页面 [http://127.0.0.1:8080/#/interpreter](http://127.0.0.1:8080/#/interpreter) 并配置 IoTDB 的连接参数: + +![iotdb-configuration](https://alioss.timecho.com/docs/img/github/102752940-50407b00-43a5-11eb-94fb-3e3be222183c.png) + +可配置参数默认值和解释如下: + +| 属性 | 默认值 | 描述 | +| ---------------------------- | --------- | -------------------------------- | +| iotdb.host | 127.0.0.1 | IoTDB 主机名 | +| iotdb.port | 6667 | IoTDB 端口 | +| iotdb.username | root | 用户名 | +| iotdb.password | root | 密码 | +| iotdb.fetchSize | 10000 | 查询结果分批次返回时,每一批数量 | +| iotdb.zoneId | | 时区 ID | +| iotdb.enable.rpc.compression | FALSE | 是否允许 rpc 压缩 | +| iotdb.time.display.type | default | 时间戳的展示格式 | diff --git a/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB_timecho.md new file mode 100644 index 00000000..555ed677 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Ecosystem-Integration/Zeppelin-IoTDB_timecho.md @@ -0,0 +1,174 @@ + + +# Apache Zeppelin + +## Zeppelin 简介 + +Apache Zeppelin 是一个基于网页的交互式数据分析系统。用户可以通过 Zeppelin 连接数据源并使用 SQL、Scala 等进行交互式操作。操作可以保存为文档(类似于 Jupyter)。Zeppelin 支持多种数据源,包括 Spark、ElasticSearch、Cassandra 和 InfluxDB 等等。现在,IoTDB 已经支持使用 Zeppelin 进行操作。样例如下: + +![iotdb-note-snapshot](https://alioss.timecho.com/docs/img/github/102752947-520a3e80-43a5-11eb-8fb1-8fac471c8c7e.png) + +## Zeppelin-IoTDB 解释器 + +### 系统环境需求 + +| IoTDB 版本 | Java 版本 | Zeppelin 版本 | +| :--------: | :-----------: | :-----------: | +| >=`0.12.0` | >=`1.8.0_271` | `>=0.9.0` | + +安装 IoTDB:参考 [快速上手](../Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md). 假设 IoTDB 安装在 `$IoTDB_HOME`. + +安装 Zeppelin: +> 方法 1 直接下载:下载 [Zeppelin](https://zeppelin.apache.org/download.html#) 并解压二进制文件。推荐下载 [netinst](http://www.apache.org/dyn/closer.cgi/zeppelin/zeppelin-0.9.0/zeppelin-0.9.0-bin-netinst.tgz) 二进制包,此包由于未编译不相关的 interpreter,因此大小相对较小。 +> +> 方法 2 源码编译:参考 [从源码构建 Zeppelin](https://zeppelin.apache.org/docs/latest/setup/basics/how_to_build.html) ,使用命令为 `mvn clean package -pl zeppelin-web,zeppelin-server -am -DskipTests`。 + +假设 Zeppelin 安装在 `$Zeppelin_HOME`. 
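+
+后续章节的命令会反复用到 `$IoTDB_HOME` 与 `$Zeppelin_HOME` 这两个路径。下面给出一个将它们导出为环境变量的示例(仅为示意,其中的路径为假设值,请按实际安装位置调整):
+
+```shell
+# 示例:以下安装路径仅为假设,请替换为实际的 IoTDB 与 Zeppelin 安装目录
+export IoTDB_HOME=/opt/apache-iotdb
+export Zeppelin_HOME=/opt/zeppelin
+```
+
+设置后,下文中的 `cd $IoTDB_HOME`、`cp ... $Zeppelin_HOME/interpreter/iotdb` 等命令即可直接复制执行。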
+ +### 编译解释器 + +运行如下命令编译 IoTDB Zeppelin 解释器。 + +```shell +cd $IoTDB_HOME + mvn clean package -pl iotdb-connector/zeppelin-interpreter -am -DskipTests -P get-jar-with-dependencies +``` + +编译后的解释器位于如下目录: + +```shell +$IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar +``` + +### 安装解释器 + +当你编译好了解释器,在 Zeppelin 的解释器目录下创建一个新的文件夹`iotdb`,并将 IoTDB 解释器放入其中。 + +```shell +cd $IoTDB_HOME +mkdir -p $Zeppelin_HOME/interpreter/iotdb +cp $IoTDB_HOME/zeppelin-interpreter/target/zeppelin-{version}-SNAPSHOT-jar-with-dependencies.jar $Zeppelin_HOME/interpreter/iotdb +``` + +### 修改 Zeppelin 配置 + +进入 `$Zeppelin_HOME/conf`,使用 template 创建 Zeppelin 配置文件: + +```shell +cp zeppelin-site.xml.template zeppelin-site.xml +``` + +打开 zeppelin-site.xml 文件,将 `zeppelin.server.addr` 项修改为 `0.0.0.0` + +### 启动 Zeppelin 和 IoTDB + +进入 `$Zeppelin_HOME` 并运行 Zeppelin: + +```shell +# Unix/OS X +> ./bin/zeppelin-daemon.sh start + +# Windows +> .\bin\zeppelin.cmd +``` + +进入 `$IoTDB_HOME` 并运行 IoTDB: + +```shell +# Unix/OS X +> nohup sbin/start-server.sh >/dev/null 2>&1 & +or +> nohup sbin/start-server.sh -c -rpc_port >/dev/null 2>&1 & + +# Windows +> sbin\start-server.bat -c -rpc_port +``` + +## 使用 Zeppelin-IoTDB 解释器 + +当 Zeppelin 启动后,访问 [http://127.0.0.1:8080/](http://127.0.0.1:8080/) + +通过如下步骤创建一个新的笔记本页面: + +1. 点击 `Create new node` 按钮 +2. 设置笔记本名 +3. 选择解释器为 iotdb + +现在可以开始使用 Zeppelin 操作 IoTDB 了。 + +![iotdb-create-note](https://alioss.timecho.com/docs/img/github/102752945-5171a800-43a5-11eb-8614-53b3276a3ce2.png) + +我们提供了一些简单的 SQL 来展示 Zeppelin-IoTDB 解释器的使用: + +```sql +CREATE DATABASE root.ln.wf01.wt01; +CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN; +CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=PLAIN; +CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32, ENCODING=PLAIN; + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (1, 1.1, false, 11); + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (2, 2.2, true, 22); + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (3, 3.3, false, 33); + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (4, 4.4, false, 44); + +INSERT INTO root.ln.wf01.wt01 (timestamp, temperature, status, hardware) +VALUES (5, 5.5, false, 55); + +SELECT * +FROM root.ln.wf01.wt01 +WHERE time >= 1 + AND time <= 6; +``` + +样例如下: + +![iotdb-note-snapshot2](https://alioss.timecho.com/docs/img/github/102752948-52a2d500-43a5-11eb-9156-0c55667eb4cd.png) + +用户也可以参考 [[1]](https://zeppelin.apache.org/docs/0.9.0/usage/display_system/basic.html) 编写更丰富多彩的文档。 + +以上样例放置于 `$IoTDB_HOME/zeppelin-interpreter/Zeppelin-IoTDB-Demo.zpln` + +## 解释器配置项 + +进入页面 [http://127.0.0.1:8080/#/interpreter](http://127.0.0.1:8080/#/interpreter) 并配置 IoTDB 的连接参数: + +![iotdb-configuration](https://alioss.timecho.com/docs/img/github/102752940-50407b00-43a5-11eb-94fb-3e3be222183c.png) + +可配置参数默认值和解释如下: + +| 属性 | 默认值 | 描述 | +| ---------------------------- | --------- | -------------------------------- | +| iotdb.host | 127.0.0.1 | IoTDB 主机名 | +| iotdb.port | 6667 | IoTDB 端口 | +| iotdb.username | root | 用户名 | +| iotdb.password | root | 密码 | +| iotdb.fetchSize | 10000 | 查询结果分批次返回时,每一批数量 | +| iotdb.zoneId | | 时区 ID | +| iotdb.enable.rpc.compression | FALSE | 是否允许 rpc 压缩 | +| iotdb.time.display.type | default | 时间戳的展示格式 | diff --git a/src/zh/UserGuide/V2.0.1/Tree/FAQ/Frequently-asked-questions.md 
b/src/zh/UserGuide/V2.0.1/Tree/FAQ/Frequently-asked-questions.md new file mode 100644 index 00000000..4d859ab5 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/FAQ/Frequently-asked-questions.md @@ -0,0 +1,261 @@ + + + + +# 常见问题 + +## 一般问题 + +### 如何查询我的IoTDB版本? + +有几种方法可以识别您使用的 IoTDB 版本: + +* 启动 IoTDB 的命令行界面: + +``` +> ./start-cli.sh -p 6667 -pw root -u root -h localhost + _____ _________ ______ ______ +|_ _| | _ _ ||_ _ `.|_ _ \ + | | .--.|_/ | | \_| | | `. \ | |_) | + | | / .'`\ \ | | | | | | | __'. + _| |_| \__. | _| |_ _| |_.' /_| |__) | +|_____|'.__.' |_____| |______.'|_______/ version x.x.x +``` + +* 检查 pom.xml 文件: + +``` +x.x.x +``` + +* 使用 JDBC API: + +``` +String iotdbVersion = tsfileDatabaseMetadata.getDatabaseProductVersion(); +``` + +* 使用命令行接口: + +``` +IoTDB> show version +show version ++---------------+ +|version | ++---------------+ +|x.x.x | ++---------------+ +Total line number = 1 +It costs 0.241s +``` + +### 在哪里可以找到IoTDB的日志? + +假设您的根目录是: + +```shell +$ pwd +/workspace/iotdb + +$ ls -l +server/ +cli/ +pom.xml +Readme.md +... +``` + +假如 `$IOTDB_HOME = /workspace/iotdb/server/target/iotdb-server-{project.version}` + +假如 `$IOTDB_CLI_HOME = /workspace/iotdb/cli/target/iotdb-cli-{project.version}` + +在默认的设置里,logs 文件夹会被存储在```IOTDB_HOME/logs```。您可以在```IOTDB_HOME/conf```目录下的```logback.xml```文件中修改日志的级别和日志的存储路径。 + +### 在哪里可以找到IoTDB的数据文件? + +在默认的设置里,数据文件(包含 TsFile,metadata,WAL)被存储在```IOTDB_HOME/data/datanode```文件夹。 + +### 如何知道IoTDB中存储了多少时间序列? + +使用 IoTDB 的命令行接口: + +``` +IoTDB> show timeseries +``` + +在返回的结果里,会展示`Total timeseries number`,这个数据就是 IoTDB 中 timeseries 的数量。 + +在当前版本中,IoTDB 支持直接使用命令行接口查询时间序列的数量: + +``` +IoTDB> count timeseries +``` + +如果您使用的是 Linux 操作系统,您可以使用以下的 Shell 命令: + +``` +> grep "0,root" $IOTDB_HOME/data/system/schema/mlog.txt | wc -l +> 6 +``` + +### 可以使用Hadoop和Spark读取IoTDB中的TsFile吗? + +是的。IoTDB 与开源生态紧密结合。IoTDB 支持 [Hadoop](https://github.com/apache/iotdb-extras/tree/master/connectors/iotdb-connector/hadoop), [Spark](https://github.com/apache/iotdb-extras/tree/master/connectors/spark-iotdb-connector) 和 [Grafana](https://github.com/apache/iotdb-extras/tree/master/connectors/grafana-connector) 可视化工具。 + +### IoTDB如何处理重复的数据点? + +一个数据点是由一个完整的时间序列路径(例如:```root.vehicle.d0.s0```) 和时间戳唯一标识的。如果您使用与现有点相同的路径和时间戳提交一个新点,那么 IoTDB 将更新这个点的值,而不是插入一个新点。 + +### 我如何知道具体的timeseries的类型? + +在 IoTDB 的命令行接口中使用 SQL ```SHOW TIMESERIES ```: + +例如:如果您想知道所有 timeseries 的类型 \ 应该为 `root.**`。上面的 SQL 应该修改为: + +``` +IoTDB> show timeseries root.** +``` + +如果您想查询一个指定的时间序列,您可以修改 \ 为时间序列的完整路径。比如: + +``` +IoTDB> show timeseries root.fit.d1.s1 +``` + +您还可以在 timeseries 路径中使用通配符: + +``` +IoTDB> show timeseries root.fit.d1.* +``` + +### 如何更改IoTDB的客户端时间显示格式? + +IoTDB 客户端默认显示的时间是人类可读的(比如:```1970-01-01T08:00:00.001```),如果您想显示是时间戳或者其他可读格式,请在启动命令上添加参数```-disableISO8601```: + +``` +> $IOTDB_CLI_HOME/sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root -disableISO8601 +``` + +### 怎么处理来自`org.apache.ratis.grpc.server.GrpcLogAppender`的`IndexOutOfBoundsException`? + +这是我们的依赖Ratis 2.4.1的一个内部错误日志,不会对数据写入和读取造成任何影响。 +已经报告给Ratis社区,并会在未来的版本中修复。 + +### 预估内存不足报错如何处理? 
+ +报错信息: +``` +301: There is not enough memory to execute current fragment instance, current remaining free memory is 86762854, estimated memory usage for current fragment instance is 270139392 +``` +报错分析: +datanode_memory_proportion参数控制分给查询的内存,chunk_timeseriesmeta_free_memory_proportion参数控制查询执行可用的内存。 +默认情况下分给查询的内存为堆内存*30%,查询执行可用的内存为查询内存的20%。 +报错显示当前剩余查询执行可用内存为86762854B=82.74MB,该查询预估使用执行内存270139392B=257.6MB。 + +一些可能的改进项: + +- 在不改变默认参数的前提下,调大IoTDB的堆内存大于 4.2G(4.2G * 1024MB=4300MB),4300M*30%*20%=258M>257.6M,可以满足要求。 +- 更改 datanode_memory_proportion 等参数,使查询执行可用内存>257.6MB。 +- 减少导出的时间序列数量。 +- 给查询语句添加 slimit 限制,也是减少查询时间序列的一种方案。 +- 添加 align by device,会按照device顺序进行输出,内存占用会降低至单device级别。 + + +## 分布式部署 FAQ + +### 集群启停 + +#### ConfigNode初次启动失败,如何排查原因? + +- ConfigNode初次启动时确保已清空data/confignode目录 +- 确保该ConfigNode使用到的没有被占用,没有与已启动的ConfigNode使用到的冲突 +- 确保该ConfigNode的cn_seed_config_node(指向存活的ConfigNode;如果该ConfigNode是启动的第一个ConfigNode,该值指向自身)配置正确 +- 确保该ConfigNode的配置项(共识协议、副本数等)等与cn_seed_config_node对应的ConfigNode集群一致 + +#### ConfigNode初次启动成功,show cluster的结果里为何没有该节点? + +- 检查cn_seed_config_node是否正确指向了正确的地址; 如果cn_seed_config_node指向了自身,则会启动一个新的ConfigNode集群 + +#### DataNode初次启动失败,如何排查原因? + +- DataNode初次启动时确保已清空data/datanode目录。 如果启动结果为“Reject DataNode restart.”则表示启动时可能没有清空data/datanode目录 +- 确保该DataNode使用到的没有被占用,没有与已启动的DataNode使用到的冲突 +- 确保该DataNode的dn_seed_config_node指向存活的ConfigNode + +#### 移除DataNode执行失败,如何排查? + +- 检查remove-datanode脚本的参数是否正确,是否传入了正确的ip:port或正确的dataNodeId +- 只有集群可用节点数量 > max(元数据副本数量, 数据副本数量)时,移除操作才允许被执行 +- 执行移除DataNode的过程会将该DataNode上的数据迁移到其他存活的DataNode,数据迁移以Region为粒度,如果某个Region迁移失败,则被移除的DataNode会一直处于Removing状态 +- 补充:处于Removing状态的节点,其节点上的Region也是Removing或Unknown状态,即不可用状态。 该Remvoing状态的节点也不会接受客户端的请求。如果要使Removing状态的节点变为可用,用户可以使用set system status to running 命令将该节点设置为Running状态;如果要使迁移失败的Region处于可用状态,可以使用migrate region from datanodeId1 to datanodeId2 命令将该不可用的Region迁移到其他存活的节点。另外IoTDB后续也会提供 `remove-datanode.sh -f` 命令,来强制移除节点(迁移失败的Region会直接丢弃) + +#### 挂掉的DataNode是否支持移除? + +- 当前集群副本数量大于1时可以移除。 如果集群副本数量等于1,则不支持移除。 在下个版本会推出强制移除的命令 + +#### 从0.13升级到1.0需要注意什么? + +- 0.13版本与1.0版本的文件目录结构是不同的,不能将0.13的data目录直接拷贝到1.0集群使用。如果需要将0.13的数据导入至1.0,可以使用LOAD功能 +- 0.13版本的默认RPC地址是0.0.0.0,1.0版本的默认RPC地址是127.0.0.1 + + +### 集群重启 + +#### 如何重启集群中的某个ConfigNode? + +- 第一步:通过`stop-confignode.sh`或kill进程方式关闭ConfigNode进程 +- 第二步:通过执行`start-confignode.sh`启动ConfigNode进程实现重启 +- 下个版本IoTDB会提供一键重启的操作 + +#### 如何重启集群中的某个DataNode? + +- 第一步:通过`stop-datanode.sh`或kill进程方式关闭DataNode进程 +- 第二步:通过执行`start-datanode.sh`启动DataNode进程实现重启 +- 下个版本IoTDB会提供一键重启的操作 + +#### 将某个ConfigNode移除后(remove-confignode),能否再利用该ConfigNode的data目录重启? + +- 不能。会报错:Reject ConfigNode restart. Because there are no corresponding ConfigNode(whose nodeId=xx) in the cluster. + +#### 将某个DataNode移除后(remove-datanode),能否再利用该DataNode的data目录重启? + +- 不能正常重启,启动结果为“Reject DataNode restart. Because there are no corresponding DataNode(whose nodeId=xx) in the cluster. Possible solutions are as follows:...” + +#### 用户看到某个ConfigNode/DataNode变成了Unknown状态,在没有kill对应进程的情况下,直接删除掉ConfigNode/DataNode对应的data目录,然后执行`start-confignode.sh`/`start-datanode.sh`,这种情况下能成功吗? + +- 无法启动成功,会报错端口已被占用 + +### 集群运维 + +#### Show cluster执行失败,显示“please check server status”,如何排查? + +- 确保ConfigNode集群一半以上的节点处于存活状态 +- 确保客户端连接的DataNode处于存活状态 + +#### 某一DataNode节点的磁盘文件损坏,如何修复这个节点? + +- 当前只能通过remove-datanode的方式进行实现。remove-datanode执行的过程中会将该DataNode上的数据迁移至其他存活的DataNode节点(前提是集群设置的副本数大于1) +- 下个版本IoTDB会提供一键修复节点的功能 + +#### 如何降低ConfigNode、DataNode使用的内存? 
+ +- 在conf/confignode-env.sh、conf/datanode-env.sh文件可通过调整ON_HEAP_MEMORY、OFF_HEAP_MEMORY等选项可以调整ConfigNode、DataNode使用的最大堆内、堆外内存 + diff --git a/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_apache.md b/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_apache.md new file mode 100644 index 00000000..2fc38471 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_apache.md @@ -0,0 +1,76 @@ + + +# 产品介绍 + +Apache IoTDB 是一款低成本、高性能的物联网原生时序数据库。它可以解决企业组建物联网大数据平台管理时序数据时所遇到的应用场景复杂、数据体量大、采样频率高、数据乱序多、数据处理耗时长、分析需求多样、存储与运维成本高等多种问题。 + +- Github仓库链接:https://github.com/apache/iotdb + +- 开源安装包下载:https://iotdb.apache.org/zh/Download/ + +- 安装部署与使用文档:[快速上手](../QuickStart/QuickStart_apache.md) + + +## 产品体系 + +IoTDB 体系由若干个组件构成,帮助用户高效地管理和分析物联网产生的海量时序数据。 + +
+ Introduction-zh-apache.png +
+ +其中: + +1. **时序数据库(Apache IoTDB)**:时序数据存储的核心组件,其能够为用户提供高压缩存储能力、丰富时序查询能力、实时流处理能力,同时具备数据的高可用和集群的高扩展性,并在安全层面提供全方位保障。同时 TimechoDB 还为用户提供多种应用工具,方便用户配置和管理系统;多语言API和外部系统应用集成能力,方便用户在 TimechoDB 基础上构建业务应用。 +2. **时序数据标准文件格式(Apache TsFile)**:该文件格式是一种专为时序数据设计的存储格式,可以高效地存储和查询海量时序数据。目前 IoTDB、AINode 等模块的底层存储文件均由 Apache TsFile 进行支撑。通过 TsFile,用户可以在采集、管理、应用&分析阶段统一使用相同的文件格式进行数据管理,极大简化了数据采集到分析的整个流程,提高时序数据管理的效率和便捷度。 +3. **时序模型训推一体化引擎(IoTDB AINode)**:针对智能分析场景,IoTDB 提供 AINode 时序模型训推一体化引擎,它提供了一套完整的时序数据分析工具,底层为模型训练引擎,支持训练任务与数据管理,与包括机器学习、深度学习等。通过这些工具,用户可以对存储在 IoTDB 中的数据进行深入分析,挖掘出其中的价值。 + + +## 产品特性 + +Apache IoTDB 具备以下优势和特性: + +- 灵活的部署方式:支持云端一键部署、终端解压即用、终端-云端无缝连接(数据云端同步工具) + +- 低硬件成本的存储解决方案:支持高压缩比的磁盘存储,无需区分历史库与实时库,数据统一管理 + +- 层级化的测点组织管理方式:支持在系统中根据设备实际层级关系进行建模,以实现与工业测点管理结构的对齐,同时支持针对层级结构的目录查看、检索等能力 + +- 高通量的数据读写:支持百万级设备接入、数据高速读写、乱序/多频采集等复杂工业读写场景 + +- 丰富的时间序列查询语义:支持时序数据原生计算引擎,支持查询时时间戳对齐,提供近百种内置聚合与时序计算函数,支持面向时序特征分析和AI能力 + +- 高可用的分布式系统:支持HA分布式架构,系统提供7*24小时不间断的实时数据库服务,一个物理节点宕机或网络故障,不会影响系统的正常运行;支持物理节点的增加、删除或过热,系统会自动进行计算/存储资源的负载均衡处理;支持异构环境,不同类型、不同性能的服务器可以组建集群,系统根据物理机的配置,自动负载均衡 + +- 极低的使用&运维门槛:支持类 SQL 语言、提供多语言原生二次开发接口、具备控制台等完善的工具体系 + +- 丰富的生态环境对接:支持Hadoop、Spark等大数据生态系统组件对接,支持Grafana、Thingsboard、DataEase等设备管理和可视化工具 + +## 商业版本 + +天谋科技在 Apache IoTDB 开源版本的基础上提供了原厂商业化产品 TimechoDB,为企业、商业客户提供企业级产品和服务,它可以解决企业组建物联网大数据平台管理时序数据时所遇到的应用场景复杂、数据体量大、采样频率高、数据乱序多、数据处理耗时长、分析需求多样、存储与运维成本高等多种问题。 + +天谋科技基于 TimechoDB 提供更多样的产品功能、更强大的性能和稳定性、更丰富的效能工具,并为用户提供全方位的企业服务,从而为商业化客户提供更强大的产品能力,和更优质的开发、运维、使用体验。 + +- 天谋科技官网:https://www.timecho.com/ + +- TimechoDB 安装部署与使用文档:[快速上手](../QuickStart/QuickStart_timecho.md) \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_timecho.md new file mode 100644 index 00000000..b5fa25b2 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/IoTDB-Introduction_timecho.md @@ -0,0 +1,265 @@ + + +# 产品介绍 + +TimechoDB 是一款低成本、高性能的物联网原生时序数据库,是天谋科技基于 Apache IoTDB 社区版本提供的原厂商业化产品。它可以解决企业组建物联网大数据平台管理时序数据时所遇到的应用场景复杂、数据体量大、采样频率高、数据乱序多、数据处理耗时长、分析需求多样、存储与运维成本高等多种问题。 + +天谋科技基于 TimechoDB 提供更多样的产品功能、更强大的性能和稳定性、更丰富的效能工具,并为用户提供全方位的企业服务,从而为商业化客户提供更强大的产品能力,和更优质的开发、运维、使用体验。 + +- 下载、部署与使用:[快速上手](../QuickStart/QuickStart_timecho.md) + +## 产品体系 + +天谋产品体系由若干个组件构成,覆盖由【数据采集】到【数据管理】到【数据分析&应用】的全时序数据生命周期,做到“采-存-用”一体化时序数据解决方案,帮助用户高效地管理和分析物联网产生的海量时序数据。 + +
+ Introduction-zh-timecho.png +
+ + +其中: + +1. **时序数据库(TimechoDB,基于 Apache IoTDB 提供的原厂商业化产品)**:时序数据存储的核心组件,其能够为用户提供高压缩存储能力、丰富时序查询能力、实时流处理能力,同时具备数据的高可用和集群的高扩展性,并在安全层面提供全方位保障。同时 TimechoDB 还为用户提供多种应用工具,方便用户配置和管理系统;多语言API和外部系统应用集成能力,方便用户在 TimechoDB 基础上构建业务应用。 +2. **时序数据标准文件格式(Apache TsFile,多位天谋科技核心团队成员主导&贡献代码)**:该文件格式是一种专为时序数据设计的存储格式,可以高效地存储和查询海量时序数据。目前 Timecho 采集、存储、智能分析等模块的底层存储文件均由 Apache TsFile 进行支撑。TsFile 可以被高效地加载至 IoTDB 中,也能够被迁移出来。通过 TsFile,用户可以在采集、管理、应用&分析阶段统一使用相同的文件格式进行数据管理,极大简化了数据采集到分析的整个流程,提高时序数据管理的效率和便捷度。 +3. **时序模型训推一体化引擎(AINode)**:针对智能分析场景,TimechoDB 提供 AINode 时序模型训推一体化引擎,它提供了一套完整的时序数据分析工具,底层为模型训练引擎,支持训练任务与数据管理,与包括机器学习、深度学习等。通过这些工具,用户可以对存储在 TimechoDB 中的数据进行深入分析,挖掘出其中的价值。 +4. **数据采集**:为了更加便捷的对接各类工业采集场景, 天谋科技提供数据采集接入服务,支持多种协议和格式,可以接入各种传感器、设备产生的数据,同时支持断点续传、网闸穿透等特性。更加适配工业领域采集过程中配置难、传输慢、网络弱的特点,让用户的数采变得更加简单、高效。 + + +## 产品特性 + +TimechoDB 具备以下优势和特性: + +- 灵活的部署方式:支持云端一键部署、终端解压即用、终端-云端无缝连接(数据云端同步工具) + +- 低硬件成本的存储解决方案:支持高压缩比的磁盘存储,无需区分历史库与实时库,数据统一管理 + +- 层级化的测点组织管理方式:支持在系统中根据设备实际层级关系进行建模,以实现与工业测点管理结构的对齐,同时支持针对层级结构的目录查看、检索等能力 + +- 高通量的数据读写:支持百万级设备接入、数据高速读写、乱序/多频采集等复杂工业读写场景 + +- 丰富的时间序列查询语义:支持时序数据原生计算引擎,支持查询时时间戳对齐,提供近百种内置聚合与时序计算函数,支持面向时序特征分析和AI能力 + +- 高可用的分布式系统:支持HA分布式架构,系统提供7*24小时不间断的实时数据库服务,一个物理节点宕机或网络故障,不会影响系统的正常运行;支持物理节点的增加、删除或过热,系统会自动进行计算/存储资源的负载均衡处理;支持异构环境,不同类型、不同性能的服务器可以组建集群,系统根据物理机的配置,自动负载均衡 + +- 极低的使用&运维门槛:支持类 SQL 语言、提供多语言原生二次开发接口、具备控制台等完善的工具体系 + +- 丰富的生态环境对接:支持Hadoop、Spark等大数据生态系统组件对接,支持Grafana、Thingsboard、DataEase等设备管理和可视化工具 + +## 企业特性 + +### 更高阶的产品功能 + +TimechoDB 在开源版基础上提供了更多高阶产品功能,在内核层面针对工业生产场景进行原生升级和优化,如多级存储、云边协同、可视化工具、安全增强等功能,能够让用户无需过多关注底层逻辑,将精力聚焦在业务开发中,让工业生产更简单更高效,为企业带来更多的经济效益。如: + +- 双活部署:双活通常是指两个独立的单机(或集群),实时进行镜像同步,它们的配置完全独立,可以同时接收外界的写入,每一个独立的单机(或集群)都可以将写入到自己的数据同步到另一个单机(或集群)中,两个单机(或集群)的数据可达到最终一致。 + +- 数据同步:通过数据库内置的同步模块,支持数据由场站向中心汇聚,支持全量汇聚、部分汇聚、级联汇聚等各类场景,可支持实时数据同步与批量数据同步两种模式。同时提供多种内置插件,支持企业数据同步应用中的网闸穿透、加密传输、压缩传输等相关要求。 + +- 多级存储:通过升级底层存储能力,支持根据访问频率和数据重要性等因素将数据划分为冷、温、热等不同层级的数据,并将其存储在不同介质中(如 SSD、机械硬盘、云存储等),同时在查询过程中也由系统进行数据调度。从而在保证数据访问速度的同时,降低客户数据存储成本。 + +- 安全增强:通过白名单、审计日志等功能加强企业内部管理,降低数据泄露风险。 + +详细功能对比如下: + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
功能Apache IoTDBTimechoDB
部署模式单机部署
分布式部署
双活部署×
容器部署部分支持
数据库功能测点管理
数据写入
数据查询
连续查询
触发器
用户自定义函数
权限管理
数据同步仅文件同步,无内置插件实时同步+文件同步,丰富内置插件
流处理仅框架,无内置插件框架+丰富内置插件
多级存储×
视图×
白名单×
审计日志×
配套工具可视化控制台×
集群管理工具×
系统监控工具×
国产化国产化兼容性认证×
技术支持最佳实践×
使用培训×
+ +### 更高效/稳定的产品性能 + +TimechoDB 在开源版的基础上优化了稳定性与性能,经过企业版技术支持,能够实现10倍以上性能提升,并具有故障及时恢复的性能优势。 + +### 更用户友好的工具体系 + +TimechoDB 将为用户提供更简单、易用的工具体系,通过集群监控面板(IoTDB Grafana)、数据库控制台(IoTDB Workbench)、集群管理工具(IoTDB Deploy Tool,简称 IoTD)等产品帮助用户快速部署、管理、监控数据库集群,降低运维人员工作/学习成本,简化数据库运维工作,使运维过程更加方便、快捷。 + +- 集群监控面板:旨在解决 IoTDB 及其所在操作系统的监控问题,主要包括:操作系统资源监控、IoTDB 性能监控,及上百项内核监控指标,从而帮助用户监控集群健康状态,并进行集群调优和运维。 + +
+

总体概览

+

操作系统资源监控

+

IoTDB 性能监控

+
+
+ + + +
+

+ +- 数据库控制台:旨在提供低门槛的数据库交互工具,通过提供界面化的控制台帮助用户简洁明了的进行元数据管理、数据增删改查、权限管理、系统管理等操作,简化数据库使用难度,提高数据库使用效率。 + + +
+

首页

+

元数据管理

+

SQL 查询

+
+
+ + + +
+

+ + +- 集群管理工具:旨在解决分布式系统多节点的运维难题,主要包括集群部署、集群启停、弹性扩容、配置更新、数据导出等功能,从而实现对复杂数据库集群的一键式指令下发,极大降低管理难度。 + + +
+  +
+ +### 更专业的企业技术服务 + +TimechoDB 客户提供强大的原厂服务,包括但不限于现场安装及培训、专家顾问咨询、现场紧急救助、软件升级、在线自助服务、远程支持、最新开发版使用指导等服务。同时,为了使 IoTDB 更契合工业生产场景,我们会根据企业实际数据结构和读写负载,进行建模方案推荐、读写性能调优、压缩比调优、数据库配置推荐及其他的技术支持。如遇到部分产品未覆盖的工业化定制场景,TimechoDB 将根据用户特点提供定制化开发工具。 + +相较于开源版本,每 2-3 个月一个发版周期,TimechoDB 提供周期更快的发版频率,同时针对客户现场紧急问题,提供天级别的专属修复,确保生产环境稳定。 + + +### 更兼容的国产化适配 + +TimechoDB 代码自研可控,同时兼容大部分主流信创产品(CPU、操作系统等),并完成与多个厂家的兼容认证,确保产品的合规性和安全性。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/Scenario.md b/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/Scenario.md new file mode 100644 index 00000000..ae104b06 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/IoTDB-Introduction/Scenario.md @@ -0,0 +1,95 @@ + + +# 应用场景 + +## 应用1——车联网 + +### 背景 + +> - 难点:设备多,序列多 + +某车企业务体量庞大,需处理车辆多、数据量大,亿级数据测点,每秒超千万条新增数据点,毫秒级采集频率,对数据库的实时写入、存储与处理能力均要求较高。 + +原始架构中使用Hbase集群作为存储数据库,查询延迟高,系统维护难度和成本高。难以满足需求。而IoTDB支持百万级测点数高频数据写入和查询毫秒级响应,高效的数据处理方式可以让用户快速、准确地获取到所需数据,大幅提升了数据处理的效率。 + +因此选择以IoTDB为数据存储层,架构轻量,减轻运维成本,且支持弹性扩缩容和高可用,确保系统的稳定性和可用性。 + +### 架构 + +该车企以IoTDB为时序数据存储引擎的数据管理架构如下图所示。 + + +![img](https://alioss.timecho.com/docs/img/1280X1280.PNG) + +车辆数据基于TCP和工业协议编码后发送至边缘网关,网关将数据发往消息队列Kafka集群,解耦生产和消费两端。Kafka将数据发送至Flink进行实时处理,处理后的数据写入IoTDB中,历史数据和最新数据均在IoTDB中进行查询,最后数据通过API流入可视化平台等进行应用。 + +## 应用2——智能运维 + +### 背景 + +某钢厂旨在搭建低成本、大规模接入能力的远程智能运维软硬件平台,接入数百条产线,百万以上设备,千万级时间序列,实现智能运维远程覆盖。 + +此过程中面临诸多痛点: + +> - 设备种类繁多、协议众多、数据类型众多 +> - 时序数据特别是高频数据,数据量巨大 +> - 海量时序数据下的读写速度无法满足业务需求 +> - 现有时序数据管理组件无法满足各类高级应用需求 + +而选取IoTDB作为智能运维平台的存储数据库后,能稳定写入多频及高频采集数据,覆盖钢铁全工序,并采用复合压缩算法使数据大小缩减10倍以上,节省成本。IoTDB 还有效支持超过10年的历史数据降采样查询,帮助企业挖掘数据趋势,助力企业长远战略分析。 + +### 架构 + +下图为该钢厂的智能运维平台架构设计。 + +![img](https://alioss.timecho.com/docs/img/1280X1280%20(1).PNG) + +## 应用3——智能工厂 + +### 背景 + +> - 难点/亮点:云边协同 + +某卷烟厂希望从“传统工厂”向“高端工厂”完成转型升级,利用物联网和设备监控技术,加强信息管理和服务实现数据在企业内部自由流动,数据和决策的上通下达,帮助企业提高生产力,降低运营成本。 + +### 架构 + +下图为该工厂的物联网系统架构,IoTDB贯穿公司、工厂、车间三级物联网平台,实现设备统一联调联控。车间层面的数据通过边缘层的IoTDB进行实时采集、处理和存储,并实现了一系列的分析任务。经过预处理的数据被发送至平台层的IoTDB,进行业务层面的数据治理,如设备管理、连接管理、服务支持等。最终,数据会被集成到集团层面的IoTDB中,供整个组织进行综合分析和决策。 + +![img](https://alioss.timecho.com/docs/img/1280X1280%20(2).PNG) + + +## 应用4——工况监控 + +### 背景 + +> - 难点/亮点:智慧供热,降本增效 + +某电厂需要对风机锅炉设备、发电机、变电设备等主辅机数万测点进行监控。在以往的供暖供热过程中缺少对于下一阶段的供热量的预判,导致无效供热、过度供热、供热不足等情况。 + +使用IoTDB作为存储与分析引擎后,结合气象数据、楼控数据、户控数据、换热站数据、官网数据、热源侧数据等总和评判供热量,所有数据在IoTDB中进行时间对齐,为智慧供热提供可靠的数据依据,实现智慧供热。同时也解决了按需计费、管网、热站等相关供热过程中各重要组成部分的工况监控,减少了人力投入。 + +### 架构 + +下图为该电厂的供热场景数据管理架构。 + +![img](https://alioss.timecho.com/docs/img/7b7a22ae-6367-4084-a526-53c88190bc50.png) diff --git a/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart.md b/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart.md new file mode 100644 index 00000000..d94d5e0c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart.md @@ -0,0 +1,23 @@ +--- +redirectTo: QuickStart_apache.html +--- + diff --git a/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_apache.md b/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_apache.md new file mode 100644 index 00000000..16043b75 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_apache.md @@ -0,0 +1,91 @@ + + +# 快速上手 + +本篇文档将帮助您了解快速入门 IoTDB 的方法。 + +## 如何安装部署? + +本篇文档将帮助您快速安装部署 IoTDB,您可以通过以下文档的链接快速定位到所需要查看的内容: + +1. 准备所需机器资源:IoTDB 的部署和运行需要考虑多个方面的机器资源配置。具体资源配置可查看 [资源规划](../Deployment-and-Maintenance/Database-Resources.md) + +2. 完成系统配置准备:IoTDB 的系统配置涉及多个方面,关键的系统配置介绍可查看 [系统配置](../Deployment-and-Maintenance/Environment-Requirements.md) + +3. 
获取安装包:您可以在[ Apache IoTDB 官网](https://iotdb.apache.org/zh/Download/)获取获取 IoTDB 安装包。具体安装包结构可查看:[安装包获取](../Deployment-and-Maintenance/IoTDB-Package_apache.md) + +4. 安装数据库:您可以根据实际部署架构选择以下教程进行安装部署: + + - 单机版:[单机版](../Deployment-and-Maintenance/Stand-Alone-Deployment_apache.md) + + - 集群版:[集群版](../Deployment-and-Maintenance/Cluster-Deployment_apache.md) + +> ❗️注意:目前我们仍然推荐直接在物理机/虚拟机上安装部署,如需要 docker 部署,可参考:[Docker 部署](../Deployment-and-Maintenance/Docker-Deployment_apache.md) + +## 如何使用? + +1. 数据库建模设计:数据库建模是创建数据库系统的重要步骤,它涉及到设计数据的结构和关系,以确保数据的组织方式能够满足特定应用的需求,下面的文档将会帮助您快速了解 IoTDB 的建模设计: + + - 时序概念介绍:[走进时序数据](../Basic-Concept/Navigating_Time_Series_Data.md) + + - 建模设计介绍:[数据模型介绍](../Basic-Concept/Data-Model-and-Terminology.md) + + - SQL 语法介绍:[SQL 语法介绍](../Basic-Concept/Operate-Metadata_apache.md) + +2. 数据写入:在数据写入方面,IoTDB 提供了多种方式来插入实时数据,基本的数据写入操作请查看 [数据写入](../Basic-Concept/Write-Delete-Data.md) + +3. 数据查询:IoTDB 提供了丰富的数据查询功能,数据查询的基本介绍请查看 [数据查询](../Basic-Concept/Query-Data.md) + +4. 其他进阶功能:除了数据库常见的写入、查询等功能外,IoTDB 还支持“数据同步、流处理框架、权限管理”等功能,具体使用方法可参见具体文档: + + - 数据同步:[数据同步](../User-Manual/Data-Sync_apache.md) + + - 流处理框架:[流处理框架](../User-Manual/Streaming_apache.md) + + - 权限管理:[权限管理](../User-Manual/Authority-Management.md) + +5. 应用编程接口: IoTDB 提供了多种应用编程接口(API),以便于开发者在应用程序中与 IoTDB 进行交互,目前支持 [Java](../API/Programming-Java-Native-API.md)、[Python](../API/Programming-Python-Native-API.md)、[C++](../API/Programming-Cpp-Native-API.md)等,更多编程接口可参见官网【应用编程接口】其他章节 + +## 还有哪些便捷的周边工具? + +IoTDB 除了自身拥有丰富的功能外,其周边的工具体系包含的种类十分齐全。本篇文档将帮助您快速使用周边工具体系: + + - 测试工具:IoT-benchmark 是一个基于 Java 和大数据环境开发的时序数据库基准测试工具,由清华大学软件学院研发并开源。它支持多种写入和查询方式,能够存储测试信息和结果供进一步查询或分析,并支持与 Tableau 集成以可视化测试结果。具体使用介绍请查看:[测试工具](../Tools-System/Benchmark.md) + + - 数据导入脚本:针对于不同场景,IoTDB 为用户提供多种批量导入数据的操作方式,具体使用介绍请查看:[数据导入](../Tools-System/Data-Import-Tool.md) + + + - 数据导出脚本:针对于不同场景,IoTDB 为用户提供多种批量导出数据的操作方式,具体使用介绍请查看:[数据导出](../Tools-System/Data-Export-Tool.md) + +## 想了解更多技术细节? + +如果您想了解 IoTDB 的更多技术内幕,可以移步至下面的文档: + + - 研究论文:IoTDB 具有列式存储、数据编码、预计算和索引技术,以及其类 SQL 接口和高性能数据处理能力,同时与 Apache Hadoop、MapReduce 和 Apache Spark 无缝集成。相关研究论文请查看 [研究论文](../Technical-Insider/Publication.md) + + - 压缩&编码:IoTDB 通过多样化的编码和压缩技术,针对不同数据类型优化存储效率,想了解更多请查看 [压缩&编码](../Technical-Insider/Encoding-and-Compression.md) + + - 数据分区和负载均衡:IoTDB 基于时序数据特性,精心设计了数据分区策略和负载均衡算法,提升了集群的可用性和性能,想了解更多请查看 [数据分区和负载均衡](../Technical-Insider/Cluster-data-partitioning.md) + +## 使用过程中遇到问题? + +如果您在安装或使用过程中遇到困难,可以移步至 [常见问题](../FAQ/Frequently-asked-questions.md) 中进行查看 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_timecho.md new file mode 100644 index 00000000..bcd0285e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/QuickStart/QuickStart_timecho.md @@ -0,0 +1,109 @@ + + +# 快速上手 + +本篇文档将帮助您了解快速入门 IoTDB 的方法。 + +## 如何安装部署? + +本篇文档将帮助您快速安装部署 IoTDB,您可以通过以下文档的链接快速定位到所需要查看的内容: + +1. 准备所需机器资源:IoTDB 的部署和运行需要考虑多个方面的机器资源配置。具体资源配置可查看 [资源规划](../Deployment-and-Maintenance/Database-Resources.md) + +2. 完成系统配置准备:IoTDB 的系统配置涉及多个方面,关键的系统配置介绍可查看 [系统配置](../Deployment-and-Maintenance/Environment-Requirements.md) + +3. 获取安装包:您可以联系天谋商务获取 IoTDB 安装包,以确保下载的是最新且稳定的版本。具体安装包结构可查看:[安装包获取](../Deployment-and-Maintenance/IoTDB-Package_timecho.md) + +4. 
安装数据库并激活:您可以根据实际部署架构选择以下教程进行安装部署: + + - 单机版:[单机版](../Deployment-and-Maintenance/Stand-Alone-Deployment_timecho.md) + + - 集群版:[集群版](../Deployment-and-Maintenance//Cluster-Deployment_timecho.md) + + - 双活版:[双活版](../Deployment-and-Maintenance/Dual-Active-Deployment_timecho.md) + +> ❗️注意:目前我们仍然推荐直接在物理机/虚拟机上安装部署,如需要 docker 部署,可参考:[Docker 部署](../Deployment-and-Maintenance/Docker-Deployment_timecho.md) + +5. 安装数据库配套工具:企业版数据库提供监控面板、可视化控制台等配套工具,建议在部署企业版时安装,可以帮助您更加便捷的使用 IoTDB: + + - 监控面板:提供了上百个数据库监控指标,用来对 IoTDB 及其所在操作系统进行细致监控,从而进行系统优化、性能优化、发现瓶颈等,安装步骤可查看 [监控面板部署](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) + + - 可视化控制台:是 IoTDB 的可视化界面,支持通过界面交互的形式提供元数据管理、数据查询、数据可视化等功能的操作,帮助用户简单、高效的使用数据库,安装步骤可查看 [可视化控制台部署](../Deployment-and-Maintenance/workbench-deployment_timecho.md) + +## 如何使用? + +1. 数据库建模设计:数据库建模是创建数据库系统的重要步骤,它涉及到设计数据的结构和关系,以确保数据的组织方式能够满足特定应用的需求,下面的文档将会帮助您快速了解 IoTDB 的建模设计: + + - 时序概念介绍:[走进时序数据](../Basic-Concept/Navigating_Time_Series_Data.md) + + - 建模设计介绍:[数据模型介绍](../Basic-Concept/Data-Model-and-Terminology.md) + + - SQL 语法介绍:[SQL 语法介绍](../Basic-Concept/Operate-Metadata_timecho.md) + +2. 数据写入:在数据写入方面,IoTDB 提供了多种方式来插入实时数据,基本的数据写入操作请查看 [数据写入](../Basic-Concept/Write-Delete-Data.md) + +3. 数据查询:IoTDB 提供了丰富的数据查询功能,数据查询的基本介绍请查看 [数据查询](../Basic-Concept/Query-Data.md) + +4. 其他进阶功能:除了数据库常见的写入、查询等功能外,IoTDB 还支持“数据同步、流处理框架、安全控制、权限管理、AI 分析”等功能,具体使用方法可参见具体文档: + + - 数据同步:[数据同步](../User-Manual/Data-Sync_timecho.md) + + - 流处理框架:[流处理框架](../User-Manual/Streaming_timecho.md) + + - 安全控制:[安全控制](../User-Manual/White-List_timecho.md) + + - 权限管理:[权限管理](../User-Manual/Authority-Management.md) + + - AI 分析:[AI 能力](../User-Manual/AINode_timecho.md) + +5. 应用编程接口: IoTDB 提供了多种应用编程接口(API),以便于开发者在应用程序中与 IoTDB 进行交互,目前支持[ Java 原生接口](../API/Programming-Java-Native-API.md)、[Python 原生接口](../API/Programming-Python-Native-API.md)、[C++原生接口](../API/Programming-Cpp-Native-API.md)、[Go 原生接口](../API/Programming-Go-Native-API.md)等,更多编程接口可参见官网【应用编程接口】其他章节 + +## 还有哪些便捷的周边工具? + +IoTDB 除了自身拥有丰富的功能外,其周边的工具体系包含的种类十分齐全。本篇文档将帮助您快速使用周边工具体系: + + - 可视化控制台:workbench 是 IoTDB 的一个支持界面交互的形式的可视化界面,提供直观的元数据管理、数据查询和数据可视化等功能,提升用户操作数据库的便捷性和效率,具体使用介绍请查看 [可视化控制台部署](../Deployment-and-Maintenance/workbench-deployment_timecho.md) + + - 监控面板:是一个对 IoTDB 及其所在操作系统进行细致监控的工具,涵盖数据库性能、系统资源等上百个数据库监控指标,助力系统优化与瓶颈识别等,具体使用介绍请查看 [监控面板部署](../Deployment-and-Maintenance/Monitoring-panel-deployment.md) + + - 测试工具:IoT-benchmark 是一个基于 Java 和大数据环境开发的时序数据库基准测试工具,由清华大学软件学院研发并开源。它支持多种写入和查询方式,能够存储测试信息和结果供进一步查询或分析,并支持与 Tableau 集成以可视化测试结果。具体使用介绍请查看:[测试工具](../Tools-System/Benchmark.md) + + - 数据导入脚本:针对于不同场景,IoTDB 为用户提供多种批量导入数据的操作方式,具体使用介绍请查看:[数据导入](../Tools-System/Data-Import-Tool.md) + + + - 数据导出脚本:针对于不同场景,IoTDB 为用户提供多种批量导出数据的操作方式,具体使用介绍请查看:[数据导出](../Tools-System/Data-Export-Tool.md) + + +## 想了解更多技术细节? + +如果您想了解 IoTDB 的更多技术内幕,可以移步至下面的文档: + + - 研究论文:IoTDB 具有列式存储、数据编码、预计算和索引技术,以及其类 SQL 接口和高性能数据处理能力,同时与 Apache Hadoop、MapReduce 和 Apache Spark 无缝集成。相关研究论文请查看 [研究论文](../Technical-Insider/Publication.md) + + - 压缩&编码:IoTDB 通过多样化的编码和压缩技术,针对不同数据类型优化存储效率,想了解更多请查看 [压缩&编码](../Technical-Insider/Encoding-and-Compression.md) + + - 数据分区和负载均衡:IoTDB 基于时序数据特性,精心设计了数据分区策略和负载均衡算法,提升了集群的可用性和性能,想了解更多请查看 [数据分区和负载均衡](../Technical-Insider/Cluster-data-partitioning.md) + + +## 使用过程中遇到问题? 
+ +如果您在安装或使用过程中遇到困难,可以移步至 [常见问题](../FAQ/Frequently-asked-questions.md) 中进行查看 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Reference/Common-Config-Manual.md b/src/zh/UserGuide/V2.0.1/Tree/Reference/Common-Config-Manual.md new file mode 100644 index 00000000..7aefb2e4 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Reference/Common-Config-Manual.md @@ -0,0 +1,2220 @@ + + +# 配置参数 + +## 公共配置参数 + +IoTDB ConfigNode 和 DataNode 的公共配置参数位于 `conf` 目录下。 + +* `iotdb-system.properties`:IoTDB 集群的公共配置。 + +### 改后生效方式 +不同的配置参数有不同的生效方式,分为以下三种: + ++ **仅允许在第一次启动服务前修改:** 在第一次启动 ConfigNode/DataNode 后即禁止修改,修改会导致 ConfigNode/DataNode 无法启动。 ++ **重启服务生效:** ConfigNode/DataNode 启动后仍可修改,但需要重启 ConfigNode/DataNode 后才生效。 ++ **热加载:** 可在 ConfigNode/DataNode 运行时修改,修改后通过 Session 或 Cli 发送 ```load configuration``` 或 `set configuration` 命令(SQL)至 IoTDB 使配置生效。 + +### 系统配置项 + +#### 副本配置 + +* config\_node\_consensus\_protocol\_class + +| 名字 | config\_node\_consensus\_protocol\_class | +|:------:|:------------------------------------------------| +| 描述 | ConfigNode 副本的共识协议,仅支持 RatisConsensus | +| 类型 | String | +| 默认值 | org.apache.iotdb.consensus.ratis.RatisConsensus | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +* schema\_replication\_factor + +| 名字 | schema\_replication\_factor | +|:------:|:----------------------------| +| 描述 | Database 的默认元数据副本数 | +| 类型 | int32 | +| 默认值 | 1 | +| 改后生效方式 | 重启服务后对**新的 Database** 生效 | + + +* schema\_region\_consensus\_protocol\_class + +| 名字 | schema\_region\_consensus\_protocol\_class | +|:------:|:----------------------------------------------------------------| +| 描述 | 元数据副本的共识协议,多副本时只能使用 RatisConsensus | +| 类型 | String | +| 默认值 | org.apache.iotdb.consensus.ratis.RatisConsensus | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +* data\_replication\_factor + +| 名字 | data\_replication\_factor | +|:------:|:--------------------------| +| 描述 | Database 的默认数据副本数 | +| 类型 | int32 | +| 默认值 | 1 | +| 改后生效方式 | 重启服务后对**新的 Database** 生效 | + +* data\_region\_consensus\_protocol\_class + +| 名字 | data\_region\_consensus\_protocol\_class | +|:------:|:------------------------------------------------------------------------------| +| 描述 | 数据副本的共识协议,多副本时可以使用 IoTConsensus 或 RatisConsensus | +| 类型 | String | +| 默认值 | org.apache.iotdb.consensus.iot.IoTConsensus | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +#### 负载均衡配置 + +* series\_partition\_slot\_num + +| 名字 | series\_slot\_num | +|:------:|:------------------| +| 描述 | 序列分区槽数 | +| 类型 | int32 | +| 默认值 | 10000 | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +* series\_partition\_executor\_class + +| 名字 | series\_partition\_executor\_class | +|:------:|:------------------------------------------------------------------| +| 描述 | 序列分区哈希函数 | +| 类型 | String | +| 默认值 | org.apache.iotdb.commons.partition.executor.hash.BKDRHashExecutor | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +* schema\_region\_group\_extension\_policy + +| 名字 | schema\_region\_group\_extension\_policy | +|:------:|:-----------------------------------------| +| 描述 | SchemaRegionGroup 的扩容策略 | +| 类型 | string | +| 默认值 | AUTO | +| 改后生效方式 | 重启生效 | + +* default\_schema\_region\_group\_num\_per\_database + +| 名字 | default\_schema\_region\_group\_num\_per\_database | +|:------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------| +| 描述 | 当选用 CUSTOM-SchemaRegionGroup 扩容策略时,此参数为每个 Database 拥有的 SchemaRegionGroup 数量;当选用 AUTO-SchemaRegionGroup 扩容策略时,此参数为每个 Database 最少拥有的 SchemaRegionGroup 数量 | +| 类型 | int | +| 默认值 | 1 | +| 改后生效方式 | 重启生效 | + +* 
schema\_region\_per\_data\_node + +| 名字 | schema\_region\_per\_data\_node | +|:------:|:--------------------------------------| +| 描述 | 期望每个 DataNode 可管理的 SchemaRegion 的最大数量 | +| 类型 | double | +| 默认值 | 与 schema_replication_factor 相同 | +| 改后生效方式 | 重启生效 | + +* data\_region\_group\_extension\_policy + +| 名字 | data\_region\_group\_extension\_policy | +|:------:|:---------------------------------------| +| 描述 | DataRegionGroup 的扩容策略 | +| 类型 | string | +| 默认值 | AUTO | +| 改后生效方式 | 重启生效 | + +* default\_data\_region\_group\_num\_per\_database + +| 名字 | default\_data\_region\_group\_per\_database | +|:------:|:------------------------------------------------------------------------------------------------------------------------------------------------| +| 描述 | 当选用 CUSTOM-DataRegionGroup 扩容策略时,此参数为每个 Database 拥有的 DataRegionGroup 数量;当选用 AUTO-DataRegionGroup 扩容策略时,此参数为每个 Database 最少拥有的 DataRegionGroup 数量 | +| 类型 | int | +| 默认值 | 2 | +| 改后生效方式 | 重启生效 | + +* data\_region\_per\_data\_node + +| 名字 | data\_region\_per\_data\_node| +|:------:|:-----------------------------| +| 描述 | 期望每个 DataNode 可管理的 DataRegion 的最大数量 | +| 类型 | double | +| 默认值 | 1.0 | +| 改后生效方式 | 重启生效 | + +* enable\_data\_partition\_inherit\_policy + +| 名字 | enable\_data\_partition\_inherit\_policy | +|:------:|:----------------------------------------------------------------| +| 描述 | 开启 DataPartition 继承策略后,同一个序列分区槽内的 DataPartition 会继承之前时间分区槽的分配结果 | +| 类型 | Boolean | +| 默认值 | false | +| 改后生效方式 | 重启生效 | + +* leader\_distribution\_policy + +| 名字 | leader\_distribution\_policy | +|:------:|:-----------------------------| +| 描述 | 集群 RegionGroup 的 leader 分配策略 | +| 类型 | String | +| 默认值 | MIN_COST_FLOW | +| 改后生效方式 | 重启生效 | + +* enable\_auto\_leader\_balance\_for\_ratis\_consensus + +| 名字 | enable\_auto\_leader\_balance\_for\_ratis\_consensus | +|:------:|:-----------------------------------------------------| +| 描述 | 是否为 Ratis 共识协议开启自动均衡 leader 策略 | +| 类型 | Boolean | +| 默认值 | false | +| 改后生效方式 | 重启生效 | + +* enable\_auto\_leader\_balance\_for\_iot\_consensus + +| 名字 | enable\_auto\_leader\_balance\_for\_iot\_consensus | +|:------:|:---------------------------------------------------| +| 描述 | 是否为 IoT 共识协议开启自动均衡 leader 策略 | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 重启生效 | + +#### 集群管理 + +* cluster\_name + +| 名字 | cluster\_name | +|:----:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| 描述 | 集群名称 | +| 类型 | String | +| 默认值 | default_cluster | +| 修改方式 | CLI 中执行语句 ```set configuration "cluster_name"="xxx"``` (xxx为希望修改成的集群名称) | +| 注意 | 此修改通过网络分发至每个节点。在网络波动或者有节点宕机的情况下,不保证能够在全部节点修改成功。未修改成功的节点重启时无法加入集群,此时需要手动修改该节点的配置文件中的cluster_name项,再重启。正常情况下,不建议通过手动修改配置文件的方式修改集群名称,不建议通过```load configuration```的方式热加载。 | + +* time\_partition\_interval + +| 名字 | time\_partition\_interval | +|:------:|:--------------------------| +| 描述 | Database 默认的数据时间分区间隔 | +| 类型 | Long | +| 单位 | 毫秒 | +| 默认值 | 604800000 | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +* heartbeat\_interval\_in\_ms + +| 名字 | heartbeat\_interval\_in\_ms | +|:------:|:----------------------------| +| 描述 | 集群节点间的心跳间隔 | +| 类型 | Long | +| 单位 | ms | +| 默认值 | 1000 | +| 改后生效方式 | 重启生效 | + +* disk\_space\_warning\_threshold + +| 名字 | disk\_space\_warning\_threshold | +|:------:|:--------------------------------| +| 描述 | DataNode 磁盘剩余阈值 | +| 类型 | double(percentage) | +| 默认值 | 0.05 | +| 改后生效方式 | 重启生效 | + +#### 内存控制配置 + +* datanode\_memory\_proportion + +|名字| datanode\_memory\_proportion | 
+|:---:|:----------------------------------------------------------------------| +|描述| 存储,查询,元数据,流处理引擎,共识层,空闲内存比例 | +|类型| Ratio | +|默认值| 3:3:1:1:1:1 | +|改后生效方式| 重启生效 | + +* schema\_memory\_allocate\_proportion + +|名字| schema\_memory\_allocate\_proportion | +|:---:|:------------------------------------------------------------| +|描述| SchemaRegion, SchemaCache,PartitionCache,LastCache 占元数据内存比例 | +|类型| Ratio | +|默认值| 5:3:1:1 | +|改后生效方式| 重启生效 | + +* storage\_engine\_memory\_proportion + +|名字| storage\_engine\_memory\_proportion | +|:---:|:------------------------------------| +|描述| 写入和合并占存储内存比例 | +|类型| Ratio | +|默认值| 8:2 | +|改后生效方式| 重启生效 | + +* write\_memory\_proportion + +|名字| write\_memory\_proportion | +|:---:|:-------------------------------------| +|描述| Memtable 和 TimePartitionInfo 占写入内存比例 | +|类型| Ratio | +|默认值| 19:1 | +|改后生效方式| 重启生效 | + +* primitive\_array\_size + +| 名字 | primitive\_array\_size | +|:------:|:-----------------------| +| 描述 | 数组池中的原始数组大小(每个数组的长度) | +| 类型 | int32 | +| 默认值 | 64 | +| 改后生效方式 | 重启生效 | + +* flush\_proportion + +| 名字 | flush\_proportion | +| :----------: | :---------------------------------------------------------------------------------------------------------- | +| 描述 | 调用flush disk的写入内存比例,默认0.4,若有极高的写入负载力(比如batch=1000),可以设置为低于默认值,比如0.2 | +| 类型 | Double | +| 默认值 | 0.4 | +| 改后生效方式 | 重启生效 | + +* buffered\_arrays\_memory\_proportion + +| 名字 | buffered\_arrays\_memory\_proportion | +| :----------: | :-------------------------------------- | +| 描述 | 为缓冲数组分配的写入内存比例,默认为0.6 | +| 类型 | Double | +| 默认值 | 0.6 | +| 改后生效方式 | 重启生效 | + +* reject\_proportion + +| 名字 | reject\_proportion | +| :----------: | :----------------------------------------------------------------------------------------------------------------------- | +| 描述 | 拒绝插入的写入内存比例,默认0.8,若有极高的写入负载力(比如batch=1000)并且物理内存足够大,它可以设置为高于默认值,如0.9 | +| 类型 | Double | +| 默认值 | 0.8 | +| 改后生效方式 | 重启生效 | + +* write\_memory\_variation\_report\_proportion + +| 名字 | write\_memory\_variation\_report\_proportion | +| :----------: | :-------------------------------------------------------------------------------- | +| 描述 | 如果 DataRegion 的内存增加超过写入可用内存的一定比例,则向系统报告。默认值为0.001 | +| 类型 | Double | +| 默认值 | 0.001 | +| 改后生效方式 | 重启生效 | + +* check\_period\_when\_insert\_blocked + +|名字| check\_period\_when\_insert\_blocked | +|:---:|:---| +|描述| 当插入被拒绝时,等待时间(以毫秒为单位)去再次检查系统,默认为50。若插入被拒绝,读取负载低,可以设置大一些。 | +|类型| int32 | +|默认值| 50 | +|改后生效方式|重启生效| + +* io\_task\_queue\_size\_for\_flushing + +|名字| io\_task\_queue\_size\_for\_flushing | +|:---:|:---| +|描述| ioTaskQueue 的大小。默认值为10。| +|类型| int32 | +|默认值| 10 | +|改后生效方式|重启生效| + +* enable\_query\_memory\_estimation + +|名字| enable\_query\_memory\_estimation | +|:---:|:----------------------------------| +|描述| 开启后会预估每次查询的内存使用量,如果超过可用内存,会拒绝本次查询 | +|类型| bool | +|默认值| true | +|改后生效方式| 热加载 | + +* partition\_cache\_size + +|名字| partition\_cache\_size | +|:---:|:---| +|描述| 分区信息缓存的最大缓存条目数。| +|类型| Int32 | +|默认值| 1000 | +|改后生效方式|重启生效| + +#### 元数据引擎配置 + +* schema\_engine\_mode + +|名字| schema\_engine\_mode | +|:---:|:---| +|描述| 元数据引擎的运行模式,支持 Memory 和 PBTree;PBTree 模式下支持将内存中暂时不用的序列元数据实时置换到磁盘上,需要使用时再加载进内存;此参数在集群中所有的 DataNode 上务必保持相同。| +|类型| string | +|默认值| Memory | +|改后生效方式|仅允许在第一次启动服务前修改| + +* mlog\_buffer\_size + +|名字| mlog\_buffer\_size | +|:---:|:---| +|描述| mlog 的 buffer 大小 | +|类型| int32 | +|默认值| 1048576 | +|改后生效方式|热加载| + +* sync\_mlog\_period\_in\_ms + +| 名字 | sync\_mlog\_period\_in\_ms | +| :----------: | :---------------------------------------------------------------------------------------------------- | +| 描述 | 
mlog定期刷新到磁盘的周期,单位毫秒。如果该参数为0,则表示每次对元数据的更新操作都会被立即写到磁盘上。 | +| 类型 | Int64 | +| 默认值 | 100 | +| 改后生效方式 | 重启生效 | + +* tag\_attribute\_total\_size + +|名字| tag\_attribute\_total\_size | +|:---:|:---| +|描述| 每个时间序列标签和属性的最大持久化字节数 | +|类型| int32 | +|默认值| 700 | +|改后生效方式|仅允许在第一次启动服务前修改| + +* tag\_attribute\_flush\_interval + +|名字| tag\_attribute\_flush\_interval | +|:---:|:--------------------------------| +|描述| 标签和属性记录的间隔数,达到此记录数量时将强制刷盘 | +|类型| int32 | +|默认值| 1000 | +|改后生效方式| 仅允许在第一次启动服务前修改 | + +* schema\_region\_device\_node\_cache\_size + +|名字| schema\_region\_device\_node\_cache\_size | +|:---:|:--------------------------------| +|描述| schemaRegion中用于加速device节点访问所设置的device节点缓存的大小 | +|类型| Int32 | +|默认值| 10000 | +|改后生效方式| 重启生效 | + +* max\_measurement\_num\_of\_internal\_request + +|名字| max\_measurement\_num\_of\_internal\_request | +|:---:|:--------------------------------| +|描述| 一次注册序列请求中若物理量过多,在系统内部执行时将被拆分为若干个轻量级的子请求,每个子请求中的物理量数目不超过此参数设置的最大值。 | +|类型| Int32 | +|默认值| 10000 | +|改后生效方式| 重启生效 | + +#### 数据类型自动推断 + +* enable\_auto\_create\_schema + +| 名字 | enable\_auto\_create\_schema | +| :----------: | :------------------------------------- | +| 描述 | 当写入的序列不存在时,是否自动创建序列 | +| 取值 | true or false | +| 默认值 | true | +| 改后生效方式 | 重启生效 | + +* default\_storage\_group\_level + +|名字| default\_storage\_group\_level | +|:---:|:---| +|描述| 当写入的数据不存在且自动创建序列时,若需要创建相应的 database,将序列路径的哪一层当做 database。例如,如果我们接到一个新序列 root.sg0.d1.s2, 并且 level=1, 那么 root.sg0 被视为database(因为 root 是 level 0 层)| +|取值| int32 | +|默认值| 1 | +|改后生效方式|重启生效| + +* boolean\_string\_infer\_type + +| 名字 | boolean\_string\_infer\_type | +| :----------: | :----------------------------------------- | +| 描述 | "true" 或者 "false" 字符串被推断的数据类型 | +| 取值 | BOOLEAN 或者 TEXT | +| 默认值 | BOOLEAN | +| 改后生效方式 | 重启生效 | + +* integer\_string\_infer\_type + +| 名字 | integer\_string\_infer\_type | +| :----------: |:----------------------------------| +| 描述 | 整型字符串推断的数据类型 | +| 取值 | INT32, INT64, FLOAT, DOUBLE, TEXT | +| 默认值 | DOUBLE | +| 改后生效方式 | 重启生效 | + +* floating\_string\_infer\_type + +| 名字 | floating\_string\_infer\_type | +| :----------: |:------------------------------| +| 描述 | "6.7"等字符串被推断的数据类型 | +| 取值 | DOUBLE, FLOAT or TEXT | +| 默认值 | DOUBLE | +| 改后生效方式 | 重启生效 | + +* nan\_string\_infer\_type + +| 名字 | nan\_string\_infer\_type | +| :----------: | :--------------------------- | +| 描述 | "NaN" 字符串被推断的数据类型 | +| 取值 | DOUBLE, FLOAT or TEXT | +| 默认值 | DOUBLE | +| 改后生效方式 | 重启生效 | + +* default\_boolean\_encoding + +| 名字 | default\_boolean\_encoding | +| :----------: | :------------------------- | +| 描述 | BOOLEAN 类型编码格式 | +| 取值 | PLAIN, RLE | +| 默认值 | RLE | +| 改后生效方式 | 重启生效 | + +* default\_int32\_encoding + +| 名字 | default\_int32\_encoding | +| :----------: |:----------------------------------------| +| 描述 | int32 类型编码格式 | +| 取值 | PLAIN, RLE, TS\_2DIFF, REGULAR, GORILLA | +| 默认值 | RLE | +| 改后生效方式 | 重启生效 | + +* default\_int64\_encoding + +| 名字 | default\_int64\_encoding | +| :----------: |:----------------------------------------| +| 描述 | int64 类型编码格式 | +| 取值 | PLAIN, RLE, TS\_2DIFF, REGULAR, GORILLA | +| 默认值 | RLE | +| 改后生效方式 | 重启生效 | + +* default\_float\_encoding + +| 名字 | default\_float\_encoding | +| :----------: |:-------------------------------| +| 描述 | float 类型编码格式 | +| 取值 | PLAIN, RLE, TS\_2DIFF, GORILLA | +| 默认值 | GORILLA | +| 改后生效方式 | 重启生效 | + +* default\_double\_encoding + +| 名字 | default\_double\_encoding | +| :----------: |:-------------------------------| +| 描述 | double 类型编码格式 | +| 取值 | PLAIN, RLE, TS\_2DIFF, GORILLA | +| 默认值 | GORILLA | +| 改后生效方式 | 重启生效 | + +* 
default\_text\_encoding + +| 名字 | default\_text\_encoding | +| :----------: | :---------------------- | +| 描述 | text 类型编码格式 | +| 取值 | PLAIN | +| 默认值 | PLAIN | +| 改后生效方式 | 重启生效 | + +#### 查询配置 + +* read\_consistency\_level + +|名字| read\_consistency\_level | +|:---:|:---| +|描述| 查询一致性等级,取值 “strong” 时从 Leader 副本查询,取值 “weak” 时随机查询一个副本。| +|类型| String | +|默认值| strong | +|改后生效方式| 重启生效 | + +* meta\_data\_cache\_enable + +|名字| meta\_data\_cache\_enable | +|:---:|:---| +|描述| 是否缓存元数据(包括 BloomFilter、Chunk Metadata 和 TimeSeries Metadata。)| +|类型|Boolean| +|默认值| true | +|改后生效方式| 重启生效| + +* chunk\_timeseriesmeta\_free\_memory\_proportion + +| 名字 | chunk\_timeseriesmeta\_free\_memory\_proportion | +| :----------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| 描述 | 读取内存分配比例,BloomFilterCache、ChunkCache、TimeseriesMetadataCache、数据集查询的内存和可用内存的查询。参数形式为a : b : c : d : e,其中a、b、c、d、e为整数。 例如“1 : 1 : 1 : 1 : 1” ,“1 : 100 : 200 : 300 : 400” 。 | +| 类型 | String | +| 默认值 | 1 : 100 : 200 : 300 : 400 | +| 改后生效方式 | 重启生效 | + +* enable\_last\_cache + +| 名字 | enable\_last\_cache | +| :----------: | :------------------ | +| 描述 | 是否开启最新点缓存 | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 重启生效 | + +* mpp\_data\_exchange\_core\_pool\_size + +| 名字 | mpp\_data\_exchange\_core\_pool\_size | +| :----------: | :------------------------------------ | +| 描述 | MPP 数据交换线程池核心线程数 | +| 类型 | int32 | +| 默认值 | 10 | +| 改后生效方式 | 重启生效 | + +* mpp\_data\_exchange\_max\_pool\_size + +| 名字 | mpp\_data\_exchange\_max\_pool\_size | +| :----------: | :----------------------------------- | +| 描述 | MPP 数据交换线程池最大线程数 | +| 类型 | int32 | +| 默认值 | 10 | +| 改后生效方式 | 重启生效 | + +* mpp\_data\_exchange\_keep\_alive\_time\_in\_ms + +| 名字 | mpp\_data\_exchange\_keep\_alive\_time\_in\_ms | +| :----------: | :--------------------------------------------- | +| 描述 | MPP 数据交换最大等待时间 | +| 类型 | int32 | +| 默认值 | 1000 | +| 改后生效方式 | 重启生效 | + +* driver\_task\_execution\_time\_slice\_in\_ms + +| 名字 | driver\_task\_execution\_time\_slice\_in\_ms | +| :----------: | :------------------------------------------- | +| 描述 | 单个 DriverTask 最长执行时间 | +| 类型 | int32 | +| 默认值 | 100 | +| 改后生效方式 | 重启生效 | + +* max\_tsblock\_size\_in\_bytes + +| 名字 | max\_tsblock\_size\_in\_bytes | +| :----------: | :---------------------------- | +| 描述 | 单个 TsBlock 的最大容量 | +| 类型 | int32 | +| 默认值 | 1024 * 1024 (1 MB) | +| 改后生效方式 | 重启生效 | + +* max\_tsblock\_line\_numbers + +| 名字 | max\_tsblock\_line\_numbers | +| :----------: | :-------------------------- | +| 描述 | 单个 TsBlock 的最大行数 | +| 类型 | int32 | +| 默认值 | 1000 | +| 改后生效方式 | 重启生效 | + +* slow\_query\_threshold + +|名字| slow\_query\_threshold | +|:---:|:-----------------------| +|描述| 慢查询的时间阈值。单位:毫秒。 | +|类型| Int32 | +|默认值| 30000 | +|改后生效方式| 热加载 | + +* query\_timeout\_threshold + +|名字| query\_timeout\_threshold | +|:---:|:---| +|描述| 查询的最大执行时间。单位:毫秒。| +|类型| Int32 | +|默认值| 60000 | +|改后生效方式| 重启生效| + +* max\_allowed\_concurrent\_queries + +|名字| max\_allowed\_concurrent\_queries | +|:---:|:---| +|描述| 允许的最大并发查询数量。 | +|类型| Int32 | +|默认值| 1000 | +|改后生效方式|重启生效| + +* query\_thread\_count + +|名字| query\_thread\_count | +|:---:|:----------------------------------------------------------------------------| +|描述| 当 IoTDB 对内存中的数据进行查询时,最多启动多少个线程来执行该操作。如果该值小于等于 0,那么采用机器所安装的 CPU 核的数量。 | +|类型| Int32 | +|默认值| CPU 核数 | +|改后生效方式| 重启生效 | + +* batch\_size + +|名字| batch\_size | +|:---:|:---| +|描述| 
服务器中每次迭代的数据量(数据条目,即不同时间戳的数量。) | +|类型| Int32 | +|默认值| 100000 | +|改后生效方式|重启生效| + +#### TTL 配置 +* ttl\_check\_interval + +| 名字 | ttl\_check\_interval | +| :----------: |:-------------------------| +| 描述 | ttl 检查任务的间隔,单位 ms,默认为 2h | +| 类型 | int | +| 默认值 | 7200000 | +| 改后生效方式 | 重启生效 | + +* max\_expired\_time + +| 名字 | max\_expired\_time | +| :----------: |:-----------------------------| +| 描述 | 如果一个文件中存在设备已经过期超过此时间,那么这个文件将被立即整理。单位 ms,默认为一个月 | +| 类型 | int | +| 默认值 | 2592000000 | +| 改后生效方式 | 重启生效 | + +* expired\_data\_ratio + +| 名字 | expired\_data\_ratio | +| :----------: |:----------------------------------------------------------| +| 描述 | 过期设备比例。如果一个文件中过期设备的比率超过这个值,那么这个文件中的过期数据将通过 compaction 清理。 | +| 类型 | float | +| 默认值 | 0.3 | +| 改后生效方式 | 重启生效 | + +#### 存储引擎配置 + +* timestamp\_precision + +| 名字 | timestamp\_precision | +| :----------: | :-------------------------- | +| 描述 | 时间戳精度,支持 ms、us、ns | +| 类型 | String | +| 默认值 | ms | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +* tier\_ttl\_in\_ms + +|名字| tier\_ttl\_in\_ms | +|:---:|:--------------| +|描述| 定义每个层级负责的数据范围,通过 TTL 表示 | +|类型| long | +|默认值| -1 | +|改后生效方式| 重启生效 | + + +* max\_waiting\_time\_when\_insert\_blocked + +| 名字 | max\_waiting\_time\_when\_insert\_blocked | +| :----------: |:------------------------------------------| +| 描述 | 当插入请求等待超过这个时间,则抛出异常,单位 ms | +| 类型 | Int32 | +| 默认值 | 10000 | +| 改后生效方式 | 重启生效 | + +* handle\_system\_error + +| 名字 | handle\_system\_error | +| :----------: |:-----------------------| +| 描述 | 当系统遇到不可恢复的错误时的处理方法 | +| 类型 | String | +| 默认值 | CHANGE\_TO\_READ\_ONLY | +| 改后生效方式 | 重启生效 | + +* enable\_timed\_flush\_seq\_memtable + +| 名字 | enable\_timed\_flush\_seq\_memtable | +|:------:|:------------------------------------| +| 描述 | 是否开启定时刷盘顺序 memtable | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 热加载 | + +* seq\_memtable\_flush\_interval\_in\_ms + +| 名字 | seq\_memtable\_flush\_interval\_in\_ms | +|:------:|:---------------------------------------------| +| 描述 | 当 memTable 的创建时间小于当前时间减去该值时,该 memtable 需要被刷盘 | +| 类型 | int32 | +| 默认值 | 10800000 | +| 改后生效方式 | 热加载 | + +* seq\_memtable\_flush\_check\_interval\_in\_ms + +|名字| seq\_memtable\_flush\_check\_interval\_in\_ms | +|:---:|:---| +|描述| 检查顺序 memtable 是否需要刷盘的时间间隔 | +|类型| int32 | +|默认值| 600000 | +|改后生效方式| 热加载 | + +* enable\_timed\_flush\_unseq\_memtable + +| 名字 | enable\_timed\_flush\_unseq\_memtable | +| :----------: | :------------------------------------ | +| 描述 | 是否开启定时刷新乱序 memtable | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 热加载 | + +* unseq\_memtable\_flush\_interval\_in\_ms + +| 名字 | unseq\_memtable\_flush\_interval\_in\_ms | +|:------:|:---------------------------------------------| +| 描述 | 当 memTable 的创建时间小于当前时间减去该值时,该 memtable 需要被刷盘 | +| 类型 | int32 | +| 默认值 | 600000 | +| 改后生效方式 | 热加载 | + +* unseq\_memtable\_flush\_check\_interval\_in\_ms + +|名字| unseq\_memtable\_flush\_check\_interval\_in\_ms | +|:---:|:---| +|描述| 检查乱序 memtable 是否需要刷盘的时间间隔 | +|类型| int32 | +|默认值| 30000 | +|改后生效方式| 热加载 | + +* tvlist\_sort\_algorithm + +|名字| tvlist\_sort\_algorithm | +|:---:|:------------------------| +|描述| memtable中数据的排序方法 | +|类型| String | +|默认值| TIM | +|改后生效方式| 重启生效 | + +* avg\_series\_point\_number\_threshold + +|名字| avg\_series\_point\_number\_threshold | +|:---:|:--------------------------------------| +|描述| 内存中平均每个时间序列点数最大值,达到触发 flush | +|类型| int32 | +|默认值| 100000 | +|改后生效方式| 重启生效 | + +* flush\_thread\_count + +|名字| flush\_thread\_count | +|:---:|:---| +|描述| 当 IoTDB 将内存中的数据写入磁盘时,最多启动多少个线程来执行该操作。如果该值小于等于 0,那么采用机器所安装的 CPU 核的数量。默认值为 0。| +|类型| int32 | +|默认值| 0 | +|改后生效方式|重启生效| + +* 
enable\_partial\_insert + +| 名字 | enable\_partial\_insert | +| :----------: | :----------------------------------------------------------------- | +| 描述 | 在一次 insert 请求中,如果部分测点写入失败,是否继续写入其他测点。 | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 重启生效 | + +* recovery\_log\_interval\_in\_ms + +| 名字 | recovery\_log\_interval\_in\_ms | +| :----------: |:--------------------------------| +| 描述 | data region的恢复过程中打印日志信息的间隔 | +| 类型 | Int32 | +| 默认值 | 5000 | +| 改后生效方式 | 重启生效 | + +* 0.13\_data\_insert\_adapt + +| 名字 | 0.13\_data\_insert\_adapt | +| :----------: |:----------------------------------| +| 描述 | 如果 0.13 版本客户端进行写入,需要将此配置项设置为 true | +| 类型 | Boolean | +| 默认值 | false | +| 改后生效方式 | 重启生效 | + +* device\_path\_cache\_size + +| 名字 | device\_path\_cache\_size | +| :----------: |:------------------------------------------------------| +| 描述 | Device Path 缓存的最大数量,这个缓存可以避免写入过程中重复的 Device Path 对象创建 | +| 类型 | Int32 | +| 默认值 | 500000 | +| 改后生效方式 | 重启生效 | + +* insert\_multi\_tablet\_enable\_multithreading\_column\_threshold + +| 名字 | insert\_multi\_tablet\_enable\_multithreading\_column\_threshold | +| :----------: |:-----------------------------------------------------------------| +| 描述 | 插入时启用多线程插入列数的阈值 | +| 类型 | int32 | +| 默认值 | 10 | +| 改后生效方式 | 重启生效 | + +#### 合并配置 + +* enable\_seq\_space\_compaction + +| 名字 | enable\_seq\_space\_compaction | +| :----------: | :------------------------------------- | +| 描述 | 顺序空间内合并,开启顺序文件之间的合并 | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 热加载 | + +* enable\_unseq\_space\_compaction + +| 名字 | enable\_unseq\_space\_compaction | +| :----------: |:---------------------------------| +| 描述 | 乱序空间内合并,开启乱序文件之间的合并 | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 热加载 | + +* enable\_cross\_space\_compaction + +| 名字 | enable\_cross\_space\_compaction | +| :----------: | :----------------------------------------- | +| 描述 | 跨空间合并,开启将乱序文件合并到顺序文件中 | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 热加载 | + +* enable\_auto\_repair\_compaction + +| 名字 | enable\_auto\_repair\_compaction | +| :----------: |:---------------------------------| +| 描述 | 修复文件的合并任务 | +| 类型 | Boolean | +| 默认值 | true | +| 改后生效方式 | 热加载 | + + +* cross\_selector + +|名字| cross\_selector | +|:---:|:----------------| +|描述| 跨空间合并任务选择器的类型 | +|类型| String | +|默认值| rewrite | +|改后生效方式| 重启生效 | + +* cross\_performer + +|名字| cross\_performer | +|:---:|:-----------------| +|描述| 跨空间合并任务执行器的类型,可选项是read_point和fast,默认是read_point,fast还在测试中 | +|类型| String | +|默认值| read\_point | +|改后生效方式| 重启生效 | + +* inner\_seq\_selector + + +|名字| inner\_seq\_selector | +|:---:|:----------------------------------------------------------------------------| +|描述| 顺序空间内合并任务选择器的类型,可选 size\_tiered\_single_\target,size\_tiered\_multi\_target | +|类型| String | +|默认值| size\_tiered\_multi\_target | +|改后生效方式| 热加载 | + + +* inner\_seq\_performer + +|名字| inner\_seq\_performer | +|:---:|:------------------------------------------------------------| +|描述| 顺序空间内合并任务执行器的类型,可选项是read_chunk和fast,默认是read_chunk,fast还在测试中 | +|类型| String | +|默认值| read\_chunk | +|改后生效方式| 重启生效 | + +* inner\_unseq\_selector + +|名字| inner\_unseq\_selector | +|:---:|:----------------------------------------------------------------------------| +|描述| 乱序空间内合并任务选择器的类型,可选 size\_tiered\_single_\target,size\_tiered\_multi\_target | +|类型| String | +|默认值| size\_tiered\_multi\_target | +|改后生效方式| 热加载 | + + +* inner\_unseq\_performer + +|名字| inner\_unseq\_performer | +|:---:|:------------------------------------------------------------| +|描述| 乱序空间内合并任务执行器的类型,可选项是read_point和fast,默认是read_point,fast还在测试中 | +|类型| 
String | +|默认值| read\_point | +|改后生效方式| 重启生效 | + +* compaction\_priority + +| 名字 | compaction\_priority | +| :----------: |:------------------------------------------------------------------------------------------| +| 描述 | 合并时的优先级,BALANCE 各种合并平等,INNER_CROSS 优先进行顺序文件和顺序文件或乱序文件和乱序文件的合并,CROSS_INNER 优先将乱序文件合并到顺序文件中 | +| 类型 | String | +| 默认值 | INNER_CROSS | +| 改后生效方式 | 重启服务生效 | + + +* target\_compaction\_file\_size + +| 名字 | target\_compaction\_file\_size | +| :----------: |:-------------------------------| +| 描述 | 合并后的目标文件大小 | +| 类型 | Int64 | +| 默认值 | 2147483648 | +| 改后生效方式 | 重启生效 | + +* target\_chunk\_size + +| 名字 | target\_chunk\_size | +| :----------: | :---------------------- | +| 描述 | 合并时 Chunk 的目标大小 | +| 类型 | Int64 | +| 默认值 | 1048576 | +| 改后生效方式 | 重启生效 | + +* target\_chunk\_point\_num + +|名字| target\_chunk\_point\_num | +|:---:|:---| +|描述| 合并时 Chunk 的目标点数 | +|类型| int32 | +|默认值| 100000 | +|改后生效方式|重启生效| + +* chunk\_size\_lower\_bound\_in\_compaction + +| 名字 | chunk\_size\_lower\_bound\_in\_compaction | +| :----------: |:------------------------------------------| +| 描述 | 合并时源 Chunk 的大小小于这个值,将被解开成点进行合并 | +| 类型 | Int64 | +| 默认值 | 10240 | +| 改后生效方式 | 重启生效 | + +* chunk\_point\_num\_lower\_bound\_in\_compaction + +|名字| chunk\_point\_num\_lower\_bound\_in\_compaction | +|:---:|:------------------------------------------------| +|描述| 合并时源 Chunk 的点数小于这个值,将被解开成点进行合并 | +|类型| int32 | +|默认值| 1000 | +|改后生效方式| 重启生效 | + +* inner\_compaction\_total\_file\_num\_threshold + +|名字| inner\_compaction\_total\_file\_num\_threshold | +|:---:|:---| +|描述| 空间内合并中一次合并最多参与的文件数 | +|类型| int32 | +|默认值| 30| +|改后生效方式|重启生效| + +* inner\_compaction\_total\_file\_size\_threshold + +|名字| inner\_compaction\_total\_file\_size\_threshold | +|:---:|:------------------------------------------------| +|描述| 空间内合并任务最大选中文件总大小,单位:byte | +|类型| int64 | +|默认值| 10737418240 | +|改后生效方式| 热加载 | + +* compaction\_max\_aligned\_series\_num\_in\_one\_batch + +|名字| compaction\_max\_aligned\_series\_num\_in\_one\_batch | +|:---:|:------------------------------------------------------| +|描述| 对齐序列合并一次执行时处理的值列数量 | +|类型| int32 | +|默认值| 10 | +|改后生效方式| 热加载 | + +* max\_level\_gap\_in\_inner\_compaction + +|名字| max\_level\_gap\_in\_inner\_compaction | +|:---:|:---------------------------------------| +|描述| 空间内合并选文件时最大允许跨的文件层级 | +|类型| int32 | +|默认值| 2 | +|改后生效方式| 热加载 | + +* inner\_compaction\_candidate\_file\_num + +|名字| inner\_compaction\_candidate\_file\_num | +|:---:|:----------------------------------------| +|描述| 符合构成一个空间内合并任务的候选文件数量 | +|类型| int32 | +|默认值| 30 | +|改后生效方式| 热加载 | + +* max\_cross\_compaction\_candidate\_file\_num + +|名字| max\_cross\_compaction\_candidate\_file\_num | +|:---:|:---------------------------------------------| +|描述| 跨空间合并中一次合并最多参与的文件数 | +|类型| int32 | +|默认值| 500 | +|改后生效方式| 热加载 | + + +* max\_cross\_compaction\_candidate\_file\_size + +|名字| max\_cross\_compaction\_candidate\_file\_size | +|:---:|:----------------------------------------------| +|描述| 跨空间合并中一次合并最多参与的文件总大小 | +|类型| Int64 | +|默认值| 5368709120 | +|改后生效方式| 热加载 | + + +* compaction\_thread\_count + +|名字| compaction\_thread\_count | +|:---:|:--------------------------| +|描述| 执行合并任务的线程数目 | +|类型| int32 | +|默认值| 10 | +|改后生效方式| 热加载 | + +* compaction\_schedule\_interval\_in\_ms + +| 名字 | compaction\_schedule\_interval\_in\_ms | +| :----------: | :------------------------------------- | +| 描述 | 合并调度的时间间隔 | +| 类型 | Int64 | +| 默认值 | 60000 | +| 改后生效方式 | 重启生效 | + +* compaction\_submission\_interval\_in\_ms + +| 名字 | compaction\_submission\_interval\_in\_ms | +| :----------: | 
:--------------------------------------- | +| 描述 | 合并任务提交的间隔 | +| 类型 | Int64 | +| 默认值 | 60000 | +| 改后生效方式 | 重启生效 | + +* compaction\_write\_throughput\_mb\_per\_sec + +|名字| compaction\_write\_throughput\_mb\_per\_sec | +|:---:|:---| +|描述| 每秒可达到的写入吞吐量合并限制。| +|类型| int32 | +|默认值| 16 | +|改后生效方式| 重启生效| + +* compaction\_read\_throughput\_mb\_per\_sec + +| 名字 | compaction\_read\_throughput\_mb\_per\_sec | +|:---------:|:-------------------------------------------| +| 描述 | 合并每秒读吞吐限制,单位为 byte,设置为 0 代表不限制 | +| 类型 | int32 | +| 默认值 | 0 | +| Effective | 热加载 | + +* compaction\_read\_operation\_per\_sec + +| 名字 | compaction\_read\_operation\_per\_sec | +|:---------:|:--------------------------------------| +| 描述 | 合并每秒读操作数量限制,设置为 0 代表不限制 | +| 类型 | int32 | +| 默认值 | 0 | +| Effective | 热加载 | + +* sub\_compaction\_thread\_count + +|名字| sub\_compaction\_thread\_count | +|:---:|:--------------------------------| +|描述| 每个合并任务的子任务线程数,只对跨空间合并和乱序空间内合并生效 | +|类型| int32 | +|默认值| 4 | +|改后生效方式| 热加载 | + +* enable\_tsfile\_validation + +| 名字 | enable\_tsfile\_validation | +|:---------:|:------------------------------| +| 描述 | Flush, Load 或合并后验证 tsfile 正确性 | +| 类型 | boolean | +| 默认值 | false | +| 改后生效方式 | 热加载 | + +* candidate\_compaction\_task\_queue\_size + +|名字| candidate\_compaction\_task\_queue\_size | +|:---:|:-----------------------------------------| +|描述| 合并任务优先级队列的大小 | +|类型| int32 | +|默认值| 50 | +|改后生效方式| 重启生效 | + +* compaction\_schedule\_thread\_num + +|名字| compaction\_schedule\_thread\_num | +|:---:|:-----------------------------------------| +|描述| 选择合并任务的线程数量 | +|类型| int32 | +|默认值| 4 | +|改后生效方式| 热加载 | + +#### 写前日志配置 + +* wal\_mode + +| 名字 | wal\_mode | +|:------:|:------------------------------------------------------------------------------------| +| 描述 | 写前日志的写入模式. DISABLE 模式下会关闭写前日志;SYNC 模式下写入请求会在成功写入磁盘后返回; ASYNC 模式下写入请求返回时可能尚未成功写入磁盘后。 | +| 类型 | String | +| 默认值 | ASYNC | +| 改后生效方式 | 重启生效 | + +* max\_wal\_nodes\_num + +| 名字 | max\_wal\_nodes\_num | +|:------:|:-----------------------------| +| 描述 | 写前日志节点的最大数量,默认值 0 表示数量由系统控制。 | +| 类型 | int32 | +| 默认值 | 0 | +| 改后生效方式 | 重启生效 | + +* wal\_async\_mode\_fsync\_delay\_in\_ms + +| 名字 | wal\_async\_mode\_fsync\_delay\_in\_ms | +|:------:|:---------------------------------------| +| 描述 | async 模式下写前日志调用 fsync 前的等待时间 | +| 类型 | int32 | +| 默认值 | 1000 | +| 改后生效方式 | 热加载 | + +* wal\_sync\_mode\_fsync\_delay\_in\_ms + +| 名字 | wal\_sync\_mode\_fsync\_delay\_in\_ms | +|:------:|:--------------------------------------| +| 描述 | sync 模式下写前日志调用 fsync 前的等待时间 | +| 类型 | int32 | +| 默认值 | 3 | +| 改后生效方式 | 热加载 | + +* wal\_buffer\_size\_in\_byte + +| 名字 | wal\_buffer\_size\_in\_byte | +|:------:|:----------------------------| +| 描述 | 写前日志的 buffer 大小 | +| 类型 | int32 | +| 默认值 | 33554432 | +| 改后生效方式 | 重启生效 | + +* wal\_buffer\_queue\_capacity + +| 名字 | wal\_buffer\_queue\_capacity | +|:------:|:-----------------------------| +| 描述 | 写前日志阻塞队列大小上限 | +| 类型 | int32 | +| 默认值 | 500 | +| 改后生效方式 | 重启生效 | + +* wal\_file\_size\_threshold\_in\_byte + +| 名字 | wal\_file\_size\_threshold\_in\_byte | +|:------:|:-------------------------------------| +| 描述 | 写前日志文件封口阈值 | +| 类型 | int32 | +| 默认值 | 31457280 | +| 改后生效方式 | 热加载 | + +* wal\_min\_effective\_info\_ratio + +| 名字 | wal\_min\_effective\_info\_ratio | +|:------:|:---------------------------------| +| 描述 | 写前日志最小有效信息比 | +| 类型 | double | +| 默认值 | 0.1 | +| 改后生效方式 | 热加载 | + +* wal\_memtable\_snapshot\_threshold\_in\_byte + +| 名字 | wal\_memtable\_snapshot\_threshold\_in\_byte | +|:------:|:---------------------------------------------| +| 描述 | 
触发写前日志中内存表快照的内存表大小阈值 | +| 类型 | int64 | +| 默认值 | 8388608 | +| 改后生效方式 | 热加载 | + +* max\_wal\_memtable\_snapshot\_num + +| 名字 | max\_wal\_memtable\_snapshot\_num | +|:------:|:----------------------------------| +| 描述 | 写前日志中内存表的最大数量上限 | +| 类型 | int32 | +| 默认值 | 1 | +| 改后生效方式 | 热加载 | + +* delete\_wal\_files\_period\_in\_ms + +| 名字 | delete\_wal\_files\_period\_in\_ms | +|:------:|:-----------------------------------| +| 描述 | 删除写前日志的检查间隔 | +| 类型 | int64 | +| 默认值 | 20000 | +| 改后生效方式 | 热加载 | + +#### TsFile 配置 + +* group\_size\_in\_byte + +|名字| group\_size\_in\_byte | +|:---:|:---| +|描述| 每次将内存中的数据写入到磁盘时的最大写入字节数 | +|类型| int32 | +|默认值| 134217728 | +|改后生效方式|热加载| + +* page\_size\_in\_byte + +|名字| page\_size\_in\_byte | +|:---:|:---| +|描述| 内存中每个列写出时,写成的单页最大的大小,单位为字节 | +|类型| int32 | +|默认值| 65536 | +|改后生效方式|热加载| + +* max\_number\_of\_points\_in\_page + +|名字| max\_number\_of\_points\_in\_page | +|:---:|:----------------------------------| +|描述| 一个页中最多包含的数据点(时间戳-值的二元组)数量 | +|类型| int32 | +|默认值| 10000 | +|改后生效方式| 热加载 | + +* pattern\_matching\_threshold + +|名字| pattern\_matching\_threshold | +|:---:|:-----------------------------| +|描述| 正则表达式匹配时最大的匹配次数 | +|类型| int32 | +|默认值| 1000000 | +|改后生效方式| 热加载 | + +* max\_string\_length + +|名字| max\_string\_length | +|:---:|:---| +|描述| 针对字符串类型的数据,单个字符串最大长度,单位为字符| +|类型| int32 | +|默认值| 128 | +|改后生效方式|热加载| + +* float\_precision + +|名字| float\_precision | +|:---:|:---| +|描述| 浮点数精度,为小数点后数字的位数 | +|类型| int32 | +|默认值| 默认为 2 位。注意:32 位浮点数的十进制精度为 7 位,64 位浮点数的十进制精度为 15 位。如果设置超过机器精度将没有实际意义。 | +|改后生效方式|热加载| + +* value\_encoder + +| 名字 | value\_encoder | +| :----------: | :------------------------------------ | +| 描述 | value 列编码方式 | +| 类型 | 枚举 String: “TS_2DIFF”,“PLAIN”,“RLE” | +| 默认值 | PLAIN | +| 改后生效方式 | 热加载 | + +* compressor + +| 名字 | compressor | +|:------:|:-------------------------------------------------------------| +| 描述 | 数据压缩方法; 对齐序列中时间列的压缩方法 | +| 类型 | 枚举 String : "UNCOMPRESSED", "SNAPPY", "LZ4", "ZSTD", "LZMA2" | +| 默认值 | SNAPPY | +| 改后生效方式 | 热加载 | + +* max\_degree\_of\_index\_node + +|名字| max\_degree\_of\_index\_node | +|:---:|:---| +|描述| 元数据索引树的最大度(即每个节点的最大子节点个数)。 | +|类型| int32 | +|默认值| 256 | +|改后生效方式|仅允许在第一次启动服务前修改| + + +#### 授权配置 + +* authorizer\_provider\_class + +| 名字 | authorizer\_provider\_class | +| :----------: | :------------------------------------------------------ | +| 描述 | 权限服务的类名 | +| 类型 | String | +| 默认值 | org.apache.iotdb.commons.auth.authorizer.LocalFileAuthorizer | +| 改后生效方式 | 重启生效 | +| 其他可选值 | org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer | + +* openID\_url + +| 名字 | openID\_url | +| :----------: | :--------------------------------------------------------- | +| 描述 | openID 服务器地址 (当 OpenIdAuthorizer 被启用时必须设定) | +| 类型 | String(一个 http 地址) | +| 默认值 | 无 | +| 改后生效方式 | 重启生效 | + +* iotdb\_server\_encrypt\_decrypt\_provider + +| 名字 | iotdb\_server\_encrypt\_decrypt\_provider | +| :----------: | :------------------------------------------------------------- | +| 描述 | 用于用户密码加密的类 | +| 类型 | String | +| 默认值 | org.apache.iotdb.commons.security.encrypt.MessageDigestEncrypt | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +* iotdb\_server\_encrypt\_decrypt\_provider\_parameter + +| 名字 | iotdb\_server\_encrypt\_decrypt\_provider\_parameter | +| :----------: | :--------------------------------------------------- | +| 描述 | 用于初始化用户密码加密类的参数 | +| 类型 | String | +| 默认值 | 空 | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +* author\_cache\_size + +| 名字 | author\_cache\_size | +| :----------: | :----------------------- | +| 描述 | 用户缓存与角色缓存的大小 | +| 类型 | int32 | +| 默认值 | 1000 | +| 改后生效方式 | 重启生效 | + 
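+上述权限相关参数均写在 `iotdb-system.properties` 中。下面给出一个仅作示意的配置片段(取值即为上文列出的默认值,实际部署时按需调整;若改用 OpenIdAuthorizer,还需同时配置 openID\_url):
+
+```properties
+# 权限服务实现类,默认使用本地文件存储用户与角色信息
+authorizer_provider_class=org.apache.iotdb.commons.auth.authorizer.LocalFileAuthorizer
+# 用户密码加密类(仅允许在第一次启动服务前修改)
+iotdb_server_encrypt_decrypt_provider=org.apache.iotdb.commons.security.encrypt.MessageDigestEncrypt
+# 用户缓存与角色缓存的条目数
+author_cache_size=1000
+```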
+* author\_cache\_expire\_time + +| 名字 | author\_cache\_expire\_time | +| :----------: | :------------------------------------- | +| 描述 | 用户缓存与角色缓存的有效期,单位为分钟 | +| 类型 | int32 | +| 默认值 | 30 | +| 改后生效方式 | 重启生效 | + +#### UDF查询配置 + +* udf\_initial\_byte\_array\_length\_for\_memory\_control + +|名字| udf\_initial\_byte\_array\_length\_for\_memory\_control | +|:---:|:---| +|描述| 用于评估UDF查询中文本字段的内存使用情况。建议将此值设置为略大于所有文本的平均长度记录。 | +|类型| int32 | +|默认值| 48 | +|改后生效方式|重启生效| + +* udf\_memory\_budget\_in\_mb + +| 名字 | udf\_memory\_budget\_in\_mb | +| :----------: | :----------------------------------------------------------------------------- | +| 描述 | 在一个UDF查询中使用多少内存(以 MB 为单位)。上限为已分配内存的 20% 用于读取。 | +| 类型 | Float | +| 默认值 | 30.0 | +| 改后生效方式 | 重启生效 | + +* udf\_reader\_transformer\_collector\_memory\_proportion + +| 名字 | udf\_reader\_transformer\_collector\_memory\_proportion | +| :----------: | :-------------------------------------------------------- | +| 描述 | UDF内存分配比例。参数形式为a : b : c,其中a、b、c为整数。 | +| 类型 | String | +| 默认值 | 1:1:1 | +| 改后生效方式 | 重启生效 | + +* udf\_lib\_dir + +| 名字 | udf\_lib\_dir | +| :----------: | :--------------------------- | +| 描述 | UDF 日志及jar文件存储路径 | +| 类型 | String | +| 默认值 | ext/udf(Windows:ext\\udf) | +| 改后生效方式 | 重启生效 | + +#### 触发器配置 + +* trigger\_lib\_dir + +| 名字 | trigger\_lib\_dir | +| :----------: |:------------------| +| 描述 | 触发器 JAR 包存放的目录 | +| 类型 | String | +| 默认值 | ext/trigger | +| 改后生效方式 | 重启生效 | + +* stateful\_trigger\_retry\_num\_when\_not\_found + +| 名字 | stateful\_trigger\_retry\_num\_when\_not\_found | +| :----------: |:------------------------------------------------| +| 描述 | 有状态触发器触发无法找到触发器实例时的重试次数 | +| 类型 | Int32 | +| 默认值 | 3 | +| 改后生效方式 | 重启生效 | + + +#### SELECT-INTO配置 + +* into\_operation\_buffer\_size\_in\_byte + +| 名字 | into\_operation\_buffer\_size\_in\_byte | +| :----------: | :-------------------------------------------------------------------- | +| 描述 | 执行 select-into 语句时,待写入数据占用的最大内存(单位:Byte) | +| 类型 | int64 | +| 默认值 | 100MB | +| 改后生效方式 | 热加载 | + +* select\_into\_insert\_tablet\_plan\_row\_limit + +| 名字 | select\_into\_insert\_tablet\_plan\_row\_limit | +| :----------: | :-------------------------------------------------------------------- | +| 描述 | 执行 select-into 语句时,一个 insert-tablet-plan 中可以处理的最大行数 | +| 类型 | int32 | +| 默认值 | 10000 | +| 改后生效方式 | 热加载 | + +* into\_operation\_execution\_thread\_count + +| 名字 | into\_operation\_execution\_thread\_count | +| :---------: | :---------------------------------------- | +| 描述 | SELECT INTO 中执行写入任务的线程池的线程数 | +| 类型 | int32 | +| 默认值 | 2 | +| 改后生效方式 | 重启生效 | + +#### 连续查询配置 + +* continuous\_query\_submit\_thread\_count + +| 名字 | continuous\_query\_execution\_thread | +| :----------: |:----------------------------------| +| 描述 | 执行连续查询任务的线程池的线程数 | +| 类型 | int32 | +| 默认值 | 2 | +| 改后生效方式 | 重启生效 | + +* continuous\_query\_min\_every\_interval\_in\_ms + +| 名字 | continuous\_query\_min\_every\_interval\_in\_ms | +| :----------: |:------------------------------------------------| +| 描述 | 连续查询执行时间间隔的最小值 | +| 类型 | long (duration) | +| 默认值 | 1000 | +| 改后生效方式 | 重启生效 | + +#### PIPE 配置 + +* pipe_lib_dir + +| **名字** | **pipe_lib_dir** | +| ------------ | -------------------------- | +| 描述 | 自定义 Pipe 插件的存放目录 | +| 类型 | string | +| 默认值 | ext/pipe | +| 改后生效方式 | 暂不支持修改 | + +* pipe_subtask_executor_max_thread_num + +| **名字** | **pipe_subtask_executor_max_thread_num** | +| ------------ | ------------------------------------------------------------ | +| 描述 | pipe 子任务 processor、sink 中各自可以使用的最大线程数。实际值将是 min(pipe_subtask_executor_max_thread_num, max(1, 
CPU核心数 / 2))。 | +| 类型 | int | +| 默认值 | 5 | +| 改后生效方式 | 重启生效 | + +* pipe_sink_timeout_ms + +| **名字** | **pipe_sink_timeout_ms** | +| ------------ | --------------------------------------------- | +| 描述 | thrift 客户端的连接超时时间(以毫秒为单位)。 | +| 类型 | int | +| 默认值 | 900000 | +| 改后生效方式 | 重启生效 | + +* pipe_sink_selector_number + +| **名字** | **pipe_sink_selector_number** | +| ------------ | ------------------------------------------------------------ | +| 描述 | 在 iotdb-thrift-async-sink 插件中可以使用的最大执行结果处理线程数量。 建议将此值设置为小于或等于 pipe_sink_max_client_number。 | +| 类型 | int | +| 默认值 | 4 | +| 改后生效方式 | 重启生效 | + +* pipe_sink_max_client_number + +| **名字** | **pipe_sink_max_client_number** | +| ------------ | ----------------------------------------------------------- | +| 描述 | 在 iotdb-thrift-async-sink 插件中可以使用的最大客户端数量。 | +| 类型 | int | +| 默认值 | 16 | +| 改后生效方式 | 重启生效 | + +* pipe_air_gap_receiver_enabled + +| **名字** | **pipe_air_gap_receiver_enabled** | +| ------------ | ------------------------------------------------------------ | +| 描述 | 是否启用通过网闸接收 pipe 数据。接收器只能在 tcp 模式下返回 0 或 1,以指示数据是否成功接收。 | +| 类型 | Boolean | +| 默认值 | false | +| 改后生效方式 | 重启生效 | + +* pipe_air_gap_receiver_port + +| **名字** | **pipe_air_gap_receiver_port** | +| ------------ | ------------------------------------ | +| 描述 | 服务器通过网闸接收 pipe 数据的端口。 | +| 类型 | int | +| 默认值 | 9780 | +| 改后生效方式 | 重启生效 | + +* pipe_all_sinks_rate_limit_bytes_per_second + +| **名字** | **pipe_all_sinks_rate_limit_bytes_per_second** | +| ------------ | ------------------------------------------------------------ | +| 描述 | 所有 pipe sink 每秒可以传输的总字节数。当给定的值小于或等于 0 时,表示没有限制。默认值是 -1,表示没有限制。 | +| 类型 | double | +| 默认值 | -1 | +| 改后生效方式 | 可热加载 | + +#### IoT 共识协议配置 + +当Region配置了IoTConsensus共识协议之后,下述的配置项才会生效 + +* data_region_iot_max_log_entries_num_per_batch + +| 名字 | data_region_iot_max_log_entries_num_per_batch | +| :----------: | :-------------------------------- | +| 描述 | IoTConsensus batch 的最大日志条数 | +| 类型 | int32 | +| 默认值 | 1024 | +| 改后生效方式 | 重启生效 | + +* data_region_iot_max_size_per_batch + +| 名字 | data_region_iot_max_size_per_batch | +| :----------: | :---------------------------- | +| 描述 | IoTConsensus batch 的最大大小 | +| 类型 | int32 | +| 默认值 | 16MB | +| 改后生效方式 | 重启生效 | + +* data_region_iot_max_pending_batches_num + +| 名字 | data_region_iot_max_pending_batches_num | +| :----------: | :---------------------------------- | +| 描述 | IoTConsensus batch 的流水线并发阈值 | +| 类型 | int32 | +| 默认值 | 12 | +| 改后生效方式 | 重启生效 | + +* data_region_iot_max_memory_ratio_for_queue + +| 名字 | data_region_iot_max_memory_ratio_for_queue | +| :----------: | :---------------------------- | +| 描述 | IoTConsensus 队列内存分配比例 | +| 类型 | double | +| 默认值 | 0.6 | +| 改后生效方式 | 重启生效 | + +#### Ratis 共识协议配置 +当Region配置了RatisConsensus共识协议之后,下述的配置项才会生效 + +* config\_node\_ratis\_log\_appender\_buffer\_size\_max + +| 名字 | config\_node\_ratis\_log\_appender\_buffer\_size\_max | +|:------:|:-----------------------------------------------| +| 描述 | confignode 一次同步日志RPC最大的传输字节限制 | +| 类型 | int32 | +| 默认值 | 4MB | +| 改后生效方式 | 重启生效 | + + +* schema\_region\_ratis\_log\_appender\_buffer\_size\_max + +| 名字 | schema\_region\_ratis\_log\_appender\_buffer\_size\_max | +|:------:|:-------------------------------------------------| +| 描述 | schema region 一次同步日志RPC最大的传输字节限制 | +| 类型 | int32 | +| 默认值 | 4MB | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_log\_appender\_buffer\_size\_max + +| 名字 | data\_region\_ratis\_log\_appender\_buffer\_size\_max | +|:------:|:-----------------------------------------------| +| 描述 | data region 一次同步日志RPC最大的传输字节限制 | +| 类型 | int32 | +| 默认值 
| 4MB | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_snapshot\_trigger\_threshold + +| 名字 | config\_node\_ratis\_snapshot\_trigger\_threshold | +|:------:|:---------------------------------------------| +| 描述 | confignode 触发snapshot需要的日志条数 | +| 类型 | int32 | +| 默认值 | 400,000 | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_snapshot\_trigger\_threshold + +| 名字 | schema\_region\_ratis\_snapshot\_trigger\_threshold | +|:------:|:-----------------------------------------------| +| 描述 | schema region 触发snapshot需要的日志条数 | +| 类型 | int32 | +| 默认值 | 400,000 | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_snapshot\_trigger\_threshold + +| 名字 | data\_region\_ratis\_snapshot\_trigger\_threshold | +|:------:|:---------------------------------------------| +| 描述 | data region 触发snapshot需要的日志条数 | +| 类型 | int32 | +| 默认值 | 400,000 | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_log\_unsafe\_flush\_enable + +| 名字 | config\_node\_ratis\_log\_unsafe\_flush\_enable | +|:------:|:------------------------------------------| +| 描述 | confignode 是否允许Raft日志异步刷盘 | +| 类型 | boolean | +| 默认值 | false | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_log\_unsafe\_flush\_enable + +| 名字 | schema\_region\_ratis\_log\_unsafe\_flush\_enable | +|:------:|:--------------------------------------------| +| 描述 | schema region 是否允许Raft日志异步刷盘 | +| 类型 | boolean | +| 默认值 | false | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_log\_unsafe\_flush\_enable + +| 名字 | data\_region\_ratis\_log\_unsafe\_flush\_enable | +|:------:|:------------------------------------------| +| 描述 | data region 是否允许Raft日志异步刷盘 | +| 类型 | boolean | +| 默认值 | false | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_log\_segment\_size\_max\_in\_byte + +| 名字 | config\_node\_ratis\_log\_segment\_size\_max\_in\_byte | +|:------:|:-----------------------------------------------| +| 描述 | confignode 一个RaftLog日志段文件的大小 | +| 类型 | int32 | +| 默认值 | 24MB | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_log\_segment\_size\_max\_in\_byte + +| 名字 | schema\_region\_ratis\_log\_segment\_size\_max\_in\_byte | +|:------:|:-------------------------------------------------| +| 描述 | schema region 一个RaftLog日志段文件的大小 | +| 类型 | int32 | +| 默认值 | 24MB | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_log\_segment\_size\_max\_in\_byte + +| 名字 | data\_region\_ratis\_log\_segment\_size\_max\_in\_byte | +|:------:|:-----------------------------------------------| +| 描述 | data region 一个RaftLog日志段文件的大小 | +| 类型 | int32 | +| 默认值 | 24MB | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_grpc\_flow\_control\_window + +| 名字 | config\_node\_ratis\_grpc\_flow\_control\_window | +|:------:|:-------------------------------------------| +| 描述 | confignode grpc 流式拥塞窗口大小 | +| 类型 | int32 | +| 默认值 | 4MB | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_grpc\_flow\_control\_window + +| 名字 | schema\_region\_ratis\_grpc\_flow\_control\_window | +|:------:|:---------------------------------------------| +| 描述 | schema region grpc 流式拥塞窗口大小 | +| 类型 | int32 | +| 默认值 | 4MB | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_grpc\_flow\_control\_window + +| 名字 | data\_region\_ratis\_grpc\_flow\_control\_window | +|:------:|:-------------------------------------------| +| 描述 | data region grpc 流式拥塞窗口大小 | +| 类型 | int32 | +| 默认值 | 4MB | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_grpc\_leader\_outstanding\_appends\_max + +| 名字 | config\_node\_ratis\_grpc\_leader\_outstanding\_appends\_max | +| :----------: | :----------------------------------------------------- | +| 描述 | config node grpc 流水线并发阈值 | +| 类型 | int32 | +| 默认值 | 128 | +| 改后生效方式 | 重启生效 | 
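+上述 Ratis 相关参数均按 `config_node_`、`schema_region_`、`data_region_` 前缀分别作用于三类 RegionGroup(schema region、data region 对应参数见下文)。以 confignode 为例,下面给出一个仅作示意的 `iotdb-system.properties` 片段,取值为上文所列默认值:
+
+```properties
+# confignode 触发 snapshot 需要的日志条数
+config_node_ratis_snapshot_trigger_threshold=400000
+# confignode grpc 流水线并发阈值
+config_node_ratis_grpc_leader_outstanding_appends_max=128
+```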
+ +* schema\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max + +| 名字 | schema\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max | +| :----------: | :------------------------------------------------------ | +| 描述 | schema region grpc 流水线并发阈值 | +| 类型 | int32 | +| 默认值 | 128 | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max + +| 名字 | data\_region\_ratis\_grpc\_leader\_outstanding\_appends\_max | +| :----------: | :---------------------------------------------------- | +| 描述 | data region grpc 流水线并发阈值 | +| 类型 | int32 | +| 默认值 | 128 | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_log\_force\_sync\_num + +| 名字 | config\_node\_ratis\_log\_force\_sync\_num | +| :----------: | :------------------------------------ | +| 描述 | config node fsync 阈值 | +| 类型 | int32 | +| 默认值 | 128 | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_log\_force\_sync\_num + +| 名字 | schema\_region\_ratis\_log\_force\_sync\_num | +| :----------: | :------------------------------------- | +| 描述 | schema region fsync 阈值 | +| 类型 | int32 | +| 默认值 | 128 | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_log\_force\_sync\_num + +| 名字 | data\_region\_ratis\_log\_force\_sync\_num | +| :----------: | :----------------------------------- | +| 描述 | data region fsync 阈值 | +| 类型 | int32 | +| 默认值 | 128 | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_rpc\_leader\_election\_timeout\_min\_ms + +| 名字 | config\_node\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | +|:------:|:-----------------------------------------------------| +| 描述 | confignode leader 选举超时最小值 | +| 类型 | int32 | +| 默认值 | 2000ms | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms + +| 名字 | schema\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | +|:------:|:-------------------------------------------------------| +| 描述 | schema region leader 选举超时最小值 | +| 类型 | int32 | +| 默认值 | 2000ms | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms + +| 名字 | data\_region\_ratis\_rpc\_leader\_election\_timeout\_min\_ms | +|:------:|:-----------------------------------------------------| +| 描述 | data region leader 选举超时最小值 | +| 类型 | int32 | +| 默认值 | 2000ms | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_rpc\_leader\_election\_timeout\_max\_ms + +| 名字 | config\_node\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | +|:------:|:-----------------------------------------------------| +| 描述 | confignode leader 选举超时最大值 | +| 类型 | int32 | +| 默认值 | 2000ms | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms + +| 名字 | schema\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | +|:------:|:-------------------------------------------------------| +| 描述 | schema region leader 选举超时最大值 | +| 类型 | int32 | +| 默认值 | 2000ms | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms + +| 名字 | data\_region\_ratis\_rpc\_leader\_election\_timeout\_max\_ms | +|:------:|:-----------------------------------------------------| +| 描述 | data region leader 选举超时最大值 | +| 类型 | int32 | +| 默认值 | 2000ms | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_request\_timeout\_ms + +| 名字 | config\_node\_ratis\_request\_timeout\_ms | +|:------:|:-------------------------------------| +| 描述 | confignode Raft 客户端重试超时 | +| 类型 | int32 | +| 默认值 | 10s | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_request\_timeout\_ms + +| 名字 | schema\_region\_ratis\_request\_timeout\_ms | +|:------:|:---------------------------------------| +| 描述 | schema region Raft 客户端重试超时 
| +| 类型 | int32 | +| 默认值 | 10s | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_request\_timeout\_ms + +| 名字 | data\_region\_ratis\_request\_timeout\_ms | +|:------:|:-------------------------------------| +| 描述 | data region Raft 客户端重试超时 | +| 类型 | int32 | +| 默认值 | 10s | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_max\_retry\_attempts + +| 名字 | config\_node\_ratis\_max\_retry\_attempts | +|:------:|:-------------------------------------| +| 描述 | confignode Raft客户端最大重试次数 | +| 类型 | int32 | +| 默认值 | 10 | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_initial\_sleep\_time\_ms + +| 名字 | config\_node\_ratis\_initial\_sleep\_time\_ms | +|:------:|:----------------------------------------| +| 描述 | confignode Raft客户端初始重试睡眠时长 | +| 类型 | int32 | +| 默认值 | 100ms | +| 改后生效方式 | 重启生效 | + +* config\_node\_ratis\_max\_sleep\_time\_ms + +| 名字 | config\_node\_ratis\_max\_sleep\_time\_ms | +|:------:|:------------------------------------| +| 描述 | confignode Raft客户端最大重试睡眠时长 | +| 类型 | int32 | +| 默认值 | 10s | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_max\_retry\_attempts + +| 名字 | schema\_region\_ratis\_max\_retry\_attempts | +|:------:|:---------------------------------------| +| 描述 | schema region Raft客户端最大重试次数 | +| 类型 | int32 | +| 默认值 | 10 | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_initial\_sleep\_time\_ms + +| 名字 | schema\_region\_ratis\_initial\_sleep\_time\_ms | +|:------:|:------------------------------------------| +| 描述 | schema region Raft客户端初始重试睡眠时长 | +| 类型 | int32 | +| 默认值 | 100ms | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_max\_sleep\_time\_ms + +| 名字 | schema\_region\_ratis\_max\_sleep\_time\_ms | +|:------:|:--------------------------------------| +| 描述 | schema region Raft客户端最大重试睡眠时长 | +| 类型 | int32 | +| 默认值 | 10s | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_max\_retry\_attempts + +| 名字 | data\_region\_ratis\_max\_retry\_attempts | +|:------:|:-------------------------------------| +| 描述 | data region Raft客户端最大重试次数 | +| 类型 | int32 | +| 默认值 | 10 | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_initial\_sleep\_time\_ms + +| 名字 | data\_region\_ratis\_initial\_sleep\_time\_ms | +|:------:|:----------------------------------------| +| 描述 | data region Raft客户端初始重试睡眠时长 | +| 类型 | int32 | +| 默认值 | 100ms | +| 改后生效方式 | 重启生效 | + +* data\_region\_ratis\_max\_sleep\_time\_ms + +| 名字 | data\_region\_ratis\_max\_sleep\_time\_ms | +|:------:|:------------------------------------| +| 描述 | data region Raft客户端最大重试睡眠时长 | +| 类型 | int32 | +| 默认值 | 10s | +| 改后生效方式 | 重启生效 | + +* ratis\_first\_election\_timeout\_min\_ms + +| 名字 | ratis\_first\_election\_timeout\_min\_ms | +|:------:|:----------------------------------------------------------------| +| 描述 | Ratis协议首次选举最小超时时间 | +| 类型 | int64 | +| 默认值 | 50 (ms) | +| 改后生效方式 | 重启生效 | + +* ratis\_first\_election\_timeout\_max\_ms + +| 名字 | ratis\_first\_election\_timeout\_max\_ms | +|:------:|:----------------------------------------------------------------| +| 描述 | Ratis协议首次选举最大超时时间 | +| 类型 | int64 | +| 默认值 | 150 (ms) | +| 改后生效方式 | 重启生效 | + + +* config\_node\_ratis\_preserve\_logs\_num\_when\_purge + +| 名字 | config\_node\_ratis\_preserve\_logs\_num\_when\_purge | +|:------:|:-----------------------------------------------| +| 描述 | confignode snapshot后保持一定数量日志不删除 | +| 类型 | int32 | +| 默认值 | 1000 | +| 改后生效方式 | 重启生效 | + +* schema\_region\_ratis\_preserve\_logs\_num\_when\_purge + +| 名字 | schema\_region\_ratis\_preserve\_logs\_num\_when\_purge | +|:------:|:-------------------------------------------------| +| 描述 | schema region snapshot后保持一定数量日志不删除 | +| 类型 | int32 | +| 
默认值 | 1000 |
+| 改后生效方式 | 重启生效 |
+
+* data\_region\_ratis\_preserve\_logs\_num\_when\_purge
+
+| 名字 | data\_region\_ratis\_preserve\_logs\_num\_when\_purge |
+|:------:|:-----------------------------------------------|
+| 描述 | data region snapshot后保持一定数量日志不删除 |
+| 类型 | int32 |
+| 默认值 | 1000 |
+| 改后生效方式 | 重启生效 |
+
+* config\_node\_ratis\_log\_max\_size
+
+| 名字 | config\_node\_ratis\_log\_max\_size |
+|:------:|:----------------------------------------------------------------|
+| 描述 | config node磁盘Raft Log最大占用空间 |
+| 类型 | int64 |
+| 默认值 | 2147483648 (2GB) |
+| 改后生效方式 | 重启生效 |
+
+* schema\_region\_ratis\_log\_max\_size
+
+| 名字 | schema\_region\_ratis\_log\_max\_size |
+|:------:|:----------------------------------------------------------------|
+| 描述 | schema region 磁盘Raft Log最大占用空间 |
+| 类型 | int64 |
+| 默认值 | 2147483648 (2GB) |
+| 改后生效方式 | 重启生效 |
+
+* data\_region\_ratis\_log\_max\_size
+
+| 名字 | data\_region\_ratis\_log\_max\_size |
+|:------:|:----------------------------------------------------------------|
+| 描述 | data region 磁盘Raft Log最大占用空间 |
+| 类型 | int64 |
+| 默认值 | 21474836480 (20GB) |
+| 改后生效方式 | 重启生效 |
+
+
+* config\_node\_ratis\_periodic\_snapshot\_interval
+
+| 名字 | config\_node\_ratis\_periodic\_snapshot\_interval |
+|:------:|:----------------------------------------------------------------|
+| 描述 | config node定期snapshot的间隔时间 |
+| 类型 | int64 |
+| 默认值 | 86400 (秒) |
+| 改后生效方式 | 重启生效 |
+
+* schema\_region\_ratis\_periodic\_snapshot\_interval
+
+| 名字 | schema\_region\_ratis\_periodic\_snapshot\_interval |
+|:------:|:----------------------------------------------------------------|
+| 描述 | schema region定期snapshot的间隔时间 |
+| 类型 | int64 |
+| 默认值 | 86400 (秒) |
+| 改后生效方式 | 重启生效 |
+
+* data\_region\_ratis\_periodic\_snapshot\_interval
+
+| 名字 | data\_region\_ratis\_periodic\_snapshot\_interval |
+|:------:|:----------------------------------------------------------------|
+| 描述 | data region定期snapshot的间隔时间 |
+| 类型 | int64 |
+| 默认值 | 86400 (秒) |
+| 改后生效方式 | 重启生效 |
+
+
+#### Procedure 配置
+
+* procedure\_core\_worker\_thread\_count
+
+| 名字 | procedure\_core\_worker\_thread\_count |
+| :----------: | :--------------------------------- |
+| 描述 | 工作线程数量 |
+| 类型 | int32 |
+| 默认值 | 4 |
+| 改后生效方式 | 重启生效 |
+
+* procedure\_completed\_clean\_interval
+
+| 名字 | procedure\_completed\_clean\_interval |
+| :----------: | :--------------------------------- |
+| 描述 | 清理已完成的 procedure 时间间隔 |
+| 类型 | int32 |
+| 默认值 | 30(s) |
+| 改后生效方式 | 重启生效 |
+
+* procedure\_completed\_evict\_ttl
+
+| 名字 | procedure\_completed\_evict\_ttl |
+| :----------: | :-------------------------------- |
+| 描述 | 已完成的 procedure 的数据保留时间 |
+| 类型 | int32 |
+| 默认值 | 800(s) |
+| 改后生效方式 | 重启生效 |
+
+#### MQTT代理配置
+
+* enable\_mqtt\_service
+
+| 名字 | enable\_mqtt\_service |
+| :----------: | :---------------------- |
+| 描述 | 是否开启MQTT服务 |
+| 类型 | Boolean |
+| 默认值 | false |
+| 改后生效方式 | 热加载 |
+
+* mqtt\_host
+
+| 名字 | mqtt\_host |
+| :----------: | :------------------- |
+| 描述 | MQTT服务绑定的host。 |
+| 类型 | String |
+| 默认值 | 0.0.0.0 |
+| 改后生效方式 | 热加载 |
+
+* mqtt\_port
+
+|名字| mqtt\_port |
+|:---:|:---|
+|描述| MQTT服务绑定的port。 |
+|类型| int32 |
+|默认值| 1883 |
+|改后生效方式|热加载|
+
+* mqtt\_handler\_pool\_size
+
+|名字| mqtt\_handler\_pool\_size |
+|:---:|:---|
+|描述| 用于处理MQTT消息的处理程序池大小。 |
+|类型| int32 |
+|默认值| 1 |
+|改后生效方式|热加载|
+
+* mqtt\_payload\_formatter
+
+| 名字 | mqtt\_payload\_formatter |
+| :----------: | :--------------------------- |
+| 描述 | MQTT消息有效负载格式化程序。 |
+| 类型 | String |
+| 默认值 | json |
+| 改后生效方式 | 热加载 |
+
+* mqtt\_max\_message\_size
+
+|名字| 
mqtt\_max\_message\_size |
+|:---:|:---|
+|描述| MQTT消息的最大长度(以字节为单位)。 |
+|类型| int32 |
+|默认值| 1048576 |
+|改后生效方式|热加载|
+
+#### TsFile 主动监听&加载功能配置
+
+* load\_active\_listening\_enable
+
+|名字| load\_active\_listening\_enable |
+|:---:|:---|
+|描述| 是否开启 DataNode 主动监听并且加载 tsfile 的功能(默认开启)。 |
+|类型| Boolean |
+|默认值| true |
+|改后生效方式|热加载|
+
+* load\_active\_listening\_dirs
+
+|名字| load\_active\_listening\_dirs |
+|:---:|:---|
+|描述| 需要监听的目录(自动包括目录中的子目录),如有多个,使用 “,“ 隔开;默认的目录为 ext/load/pending(支持热加载)。 |
+|类型| String |
+|默认值| ext/load/pending |
+|改后生效方式|热加载|
+
+* load\_active\_listening\_fail\_dir
+
+|名字| load\_active\_listening\_fail\_dir |
+|:---:|:---|
+|描述| 执行加载 tsfile 文件失败后将文件转存的目录,只能配置一个。 |
+|类型| String |
+|默认值| ext/load/failed |
+|改后生效方式|热加载|
+
+* load\_active\_listening\_max\_thread\_num
+
+|名字| load\_active\_listening\_max\_thread\_num |
+|:---:|:---|
+|描述| 同时执行加载 tsfile 任务的最大线程数。参数被注释掉时的默认值为 max(1, CPU 核心数 / 2);当用户设置的值不在区间 [1, CPU 核心数 / 2] 内时,也会被设置为默认值 max(1, CPU 核心数 / 2)。 |
+|类型| Long |
+|默认值| max(1, CPU 核心数 / 2) |
+|改后生效方式|重启后生效|
+
+
+* load\_active\_listening\_check\_interval\_seconds
+
+|名字| load\_active\_listening\_check\_interval\_seconds |
+|:---:|:---|
+|描述| 主动监听轮询间隔,单位秒。主动监听 tsfile 的功能是通过轮询检查文件夹实现的。该配置指定了两次检查 load_active_listening_dirs 的时间间隔,每次检查完成 load_active_listening_check_interval_seconds 秒后,会执行下一次检查。当用户设置的轮询间隔小于 1 时,会被设置为默认值 5 秒。 |
+|类型| Long |
+|默认值| 5|
+|改后生效方式|重启后生效|
\ No newline at end of file
diff --git a/src/zh/UserGuide/V2.0.1/Tree/Reference/ConfigNode-Config-Manual.md b/src/zh/UserGuide/V2.0.1/Tree/Reference/ConfigNode-Config-Manual.md
new file mode 100644
index 00000000..4e08f9e0
--- /dev/null
+++ b/src/zh/UserGuide/V2.0.1/Tree/Reference/ConfigNode-Config-Manual.md
@@ -0,0 +1,210 @@
+
+
+# ConfigNode 配置参数
+
+IoTDB ConfigNode 配置文件均位于 IoTDB 安装目录:`conf`文件夹下。
+
+* `confignode-env.sh/bat`:环境配置项的配置文件,可以配置 ConfigNode 的内存大小。
+
+* `iotdb-system.properties`:IoTDB 的配置文件。
+
+## 环境配置项(confignode-env.sh/bat)
+
+环境配置项主要用于对 ConfigNode 运行的 Java 环境相关参数进行配置,如 JVM 相关配置。ConfigNode 启动时,此部分配置会被传给 JVM,详细配置项说明如下:
+
+* MEMORY\_SIZE
+
+|名字|MEMORY\_SIZE|
+|:---:|:---|
+|描述|IoTDB ConfigNode 启动时分配的内存大小 |
+|类型|String|
+|默认值|取决于操作系统和机器配置。默认为机器内存的十分之三,最多会被设置为 16G。|
+|改后生效方式|重启服务生效|
+
+* ON\_HEAP\_MEMORY
+
+|名字|ON\_HEAP\_MEMORY|
+|:---:|:---|
+|描述|IoTDB ConfigNode 能使用的堆内内存大小, 曾用名: MAX\_HEAP\_SIZE |
+|类型|String|
+|默认值|取决于MEMORY\_SIZE的配置。|
+|改后生效方式|重启服务生效|
+
+* OFF\_HEAP\_MEMORY
+
+|名字|OFF\_HEAP\_MEMORY|
+|:---:|:---|
+|描述|IoTDB ConfigNode 能使用的堆外内存大小, 曾用名: MAX\_DIRECT\_MEMORY\_SIZE |
+|类型|String|
+|默认值|取决于MEMORY\_SIZE的配置。|
+|改后生效方式|重启服务生效|
+
+## 系统配置项(iotdb-system.properties)
+
+IoTDB 集群的全局配置通过 ConfigNode 配置。
+
+### Config Node RPC 配置
+
+* cn\_internal\_address
+
+| 名字 | cn\_internal\_address |
+|:------:|:----------------------|
+| 描述 | ConfigNode 集群内部地址 |
+| 类型 | String |
+| 默认值 | 127.0.0.1 |
+| 改后生效方式 | 仅允许在第一次启动服务前修改 |
+
+* cn\_internal\_port
+
+| 名字 | cn\_internal\_port |
+|:------:|:----------------------|
+| 描述 | ConfigNode 集群服务监听端口 |
+| 类型 | Short Int : [0,65535] |
+| 默认值 | 10710 |
+| 改后生效方式 | 仅允许在第一次启动服务前修改 |
+
+### 共识协议
+
+* cn\_consensus\_port
+
+| 名字 | cn\_consensus\_port |
+|:------:|:----------------------|
+| 描述 | ConfigNode 的共识协议通信端口 |
+| 类型 | Short Int : [0,65535] |
+| 默认值 | 10720 |
+| 改后生效方式 | 仅允许在第一次启动服务前修改 |
+
+### SeedConfigNode 配置
+
+* cn\_seed\_config\_node
+
+| 名字 | cn\_seed\_config\_node |
+|:------:|:--------------------------------------|
+| 描述 | 目标 ConfigNode 地址,ConfigNode 通过此地址加入集群,推荐使用 SeedConfigNode。V1.2.2 及以前曾用名是 cn\_target\_config\_node\_list |
+| 类型 | String |
+| 默认值 | 127.0.0.1:10710 
| +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +### 数据目录 + +* cn\_system\_dir + +|名字| cn\_system\_dir | +|:---:|:---------------------------------------------------------| +|描述| ConfigNode 系统数据存储路径 | +|类型| String | +|默认值| data/confignode/system(Windows:data\\configndoe\\system) | +|改后生效方式| 重启服务生效 | + +* cn\_consensus\_dir + +|名字| cn\_consensus\_dir | +|:---:|:---------------------------------------------------| +|描述| ConfigNode 共识协议数据存储路径 | +|类型| String | +|默认值| data/confignode/consensus(Windows:data\\configndoe\\consensus) | +|改后生效方式| 重启服务生效 | + +### Thrift RPC 配置 + +* cn\_rpc\_thrift\_compression\_enable + +| 名字 | cn\_rpc\_thrift\_compression\_enable | +|:------:|:-------------------------------------| +| 描述 | 是否启用 thrift 的压缩机制。 | +| 类型 | Boolean | +| 默认值 | false | +| 改后生效方式 | 重启服务生效 | + +* cn\_rpc\_advanced\_compression\_enable + +| 名字 | cn\_rpc\_advanced\_compression\_enable | +|:------:|:---------------------------------------| +| 描述 | 是否启用 thrift 的自定制压缩机制。 | +| 类型 | Boolean | +| 默认值 | false | +| 改后生效方式 | 重启服务生效 | + +* cn\_rpc\_max\_concurrent\_client\_num + +| 名字 | cn\_rpc\_max\_concurrent\_client\_num | +|:------:|:--------------------------------------| +| 描述 | 最大连接数。 | +| 类型 | Short Int : [0,65535] | +| 默认值 | 65535 | +| 改后生效方式 | 重启服务生效 | + +* cn\_thrift\_max\_frame\_size + +| 名字 | cn\_thrift\_max\_frame\_size | +|:------:|:---------------------------------------------| +| 描述 | RPC 请求/响应的最大字节数 | +| 类型 | long | +| 默认值 | 536870912 (默认值512MB,应大于等于 512 * 1024 * 1024) | +| 改后生效方式 | 重启服务生效 | + +* cn\_thrift\_init\_buffer\_size + +| 名字 | cn\_thrift\_init\_buffer\_size | +|:------:|:-------------------------------| +| 描述 | 字节数 | +| 类型 | Long | +| 默认值 | 1024 | +| 改后生效方式 | 重启服务生效 | + +* cn\_connection\_timeout\_ms + +| 名字 | cn\_connection\_timeout\_ms | +|:------:|:----------------------------| +| 描述 | 节点连接超时时间 | +| 类型 | int | +| 默认值 | 60000 | +| 改后生效方式 | 重启服务生效 | + +* cn\_selector\_thread\_nums\_of\_client\_manager + +| 名字 | cn\_selector\_thread\_nums\_of\_client\_manager | +|:------:|:------------------------------------------------| +| 描述 | 客户端异步线程管理的选择器线程数量 | +| 类型 | int | +| 默认值 | 1 | +| 改后生效方式 | 重启服务生效 | + +* cn\_core\_client\_count\_for\_each\_node\_in\_client\_manager + +| 名字 | cn\_core\_client\_count\_for\_each\_node\_in\_client\_manager | +|:------:|:--------------------------------------------------------------| +| 描述 | 单 ClientManager 中路由到每个节点的核心 Client 个数 | +| 类型 | int | +| 默认值 | 200 | +| 改后生效方式 | 重启服务生效 | + +* cn\_max\_client\_count\_for\_each\_node\_in\_client\_manager + +| 名字 | cn\_max\_client\_count\_for\_each\_node\_in\_client\_manager | +|:------:|:-------------------------------------------------------------| +| 描述 | 单 ClientManager 中路由到每个节点的最大 Client 个数 | +| 类型 | int | +| 默认值 | 300 | +| 改后生效方式 | 重启服务生效 | + +### Metric 监控配置 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Reference/DataNode-Config-Manual.md b/src/zh/UserGuide/V2.0.1/Tree/Reference/DataNode-Config-Manual.md new file mode 100644 index 00000000..c8884843 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Reference/DataNode-Config-Manual.md @@ -0,0 +1,576 @@ + + +# DataNode 配置参数 + +IoTDB DataNode 与 Standalone 模式共用一套配置文件,均位于 IoTDB 安装目录:`conf`文件夹下。 + +* `datanode-env.sh/bat`:环境配置项的配置文件,可以配置 DataNode 的内存大小。 + +* `iotdb-system.properties`:IoTDB 的配置文件。 + +## 热修改配置项 + +为方便用户使用,IoTDB 为用户提供了热修改功能,即在系统运行过程中修改 `iotdb-system.properties` 中部分配置参数并即时应用到系统中。下面介绍的参数中,改后 生效方式为`热加载` +的均为支持热修改的配置参数。 + +通过 Session 或 Cli 发送 ```load configuration``` 或 `set configuration` 命令(SQL)至 IoTDB 可触发配置热加载。 + +## 
环境配置项(datanode-env.sh/bat) + +环境配置项主要用于对 DataNode 运行的 Java 环境相关参数进行配置,如 JVM 相关配置。DataNode/Standalone 启动时,此部分配置会被传给 JVM,详细配置项说明如下: + +* MEMORY\_SIZE + +|名字|MEMORY\_SIZE| +|:---:|:---| +|描述|IoTDB DataNode 启动时分配的内存大小 | +|类型|String| +|默认值|取决于操作系统和机器配置。默认为机器内存的二分之一。| +|改后生效方式|重启服务生效| + +* ON\_HEAP\_MEMORY + +|名字|ON\_HEAP\_MEMORY| +|:---:|:---| +|描述|IoTDB DataNode 能使用的堆内内存大小, 曾用名: MAX\_HEAP\_SIZE | +|类型|String| +|默认值|取决于MEMORY\_SIZE的配置。| +|改后生效方式|重启服务生效| + +* OFF\_HEAP\_MEMORY + +|名字|OFF\_HEAP\_MEMORY| +|:---:|:---| +|描述|IoTDB DataNode 能使用的堆外内存大小, 曾用名: MAX\_DIRECT\_MEMORY\_SIZE | +|类型|String| +|默认值|取决于MEMORY\_SIZE的配置| +|改后生效方式|重启服务生效| + +* JMX\_LOCAL + +|名字|JMX\_LOCAL| +|:---:|:---| +|描述|JMX 监控模式,配置为 true 表示仅允许本地监控,设置为 false 的时候表示允许远程监控。如想在本地通过网络连接JMX Service,比如nodeTool.sh会尝试连接127.0.0.1:31999,请将JMX_LOCAL设置为false。| +|类型|枚举 String : “true”, “false”| +|默认值|true| +|改后生效方式|重启服务生效| + +* JMX\_PORT + +|名字|JMX\_PORT| +|:---:|:---| +|描述|JMX 监听端口。请确认该端口是不是系统保留端口并且未被占用。| +|类型|Short Int: [0,65535]| +|默认值|31999| +|改后生效方式|重启服务生效| + +## 系统配置项(iotdb-system.properties) + +系统配置项是 IoTDB DataNode/Standalone 运行的核心配置,它主要用于设置 DataNode/Standalone 数据库引擎的参数。 + +### Data Node RPC 服务配置 + +* dn\_rpc\_address + +|名字| dn\_rpc\_address | +|:---:|:-----------------| +|描述| 客户端 RPC 服务监听地址 | +|类型| String | +|默认值| 0.0.0.0 | +|改后生效方式| 重启服务生效 | + +* dn\_rpc\_port + +|名字| dn\_rpc\_port | +|:---:|:---| +|描述| Client RPC 服务监听端口| +|类型| Short Int : [0,65535] | +|默认值| 6667 | +|改后生效方式|重启服务生效| + +* dn\_internal\_address + +|名字| dn\_internal\_address | +|:---:|:---| +|描述| DataNode 内网通信地址 | +|类型| string | +|默认值| 127.0.0.1 | +|改后生效方式|仅允许在第一次启动服务前修改| + +* dn\_internal\_port + +|名字| dn\_internal\_port | +|:---:|:-------------------| +|描述| DataNode 内网通信端口 | +|类型| int | +|默认值| 10730 | +|改后生效方式| 仅允许在第一次启动服务前修改 | + +* dn\_mpp\_data\_exchange\_port + +|名字| dn\_mpp\_data\_exchange\_port | +|:---:|:---| +|描述| MPP 数据交换端口 | +|类型| int | +|默认值| 10740 | +|改后生效方式|仅允许在第一次启动服务前修改| + +* dn\_schema\_region\_consensus\_port + +|名字| dn\_schema\_region\_consensus\_port | +|:---:|:---| +|描述| DataNode 元数据副本的共识协议通信端口 | +|类型| int | +|默认值| 10750 | +|改后生效方式|仅允许在第一次启动服务前修改| + +* dn\_data\_region\_consensus\_port + +|名字| dn\_data\_region\_consensus\_port | +|:---:|:---| +|描述| DataNode 数据副本的共识协议通信端口 | +|类型| int | +|默认值| 10760 | +|改后生效方式|仅允许在第一次启动服务前修改| + +* dn\_join\_cluster\_retry\_interval\_ms + +|名字| dn\_join\_cluster\_retry\_interval\_ms | +|:---:|:---------------------------------------| +|描述| DataNode 再次重试加入集群等待时间 | +|类型| long | +|默认值| 5000 | +|改后生效方式| 重启服务生效 | + + +### SSL 配置 + +* enable\_thrift\_ssl + +|名字| enable\_thrift\_ssl | +|:---:|:----------------------------------------------| +|描述| 当enable\_thrift\_ssl配置为true时,将通过dn\_rpc\_port使用 SSL 加密进行通信 | +|类型| Boolean | +|默认值| false | +|改后生效方式| 重启服务生效 | + +* enable\_https + +|名字| enable\_https | +|:---:|:-------------------------| +|描述| REST Service 是否开启 SSL 配置 | +|类型| Boolean | +|默认值| false | +|改后生效方式| 重启生效 | + +* key\_store\_path + +|名字| key\_store\_path | +|:---:|:-----------------| +|描述| ssl证书路径 | +|类型| String | +|默认值| "" | +|改后生效方式| 重启服务生效 | + +* key\_store\_pwd + +|名字| key\_store\_pwd | +|:---:|:----------------| +|描述| ssl证书密码 | +|类型| String | +|默认值| "" | +|改后生效方式| 重启服务生效 | + + +### SeedConfigNode 配置 + +* dn\_seed\_config\_node + +|名字| dn\_seed\_config\_node | +|:---:|:------------------------------------| +|描述| ConfigNode 地址,DataNode 启动时通过此地址加入集群,推荐使用 SeedConfigNode。V1.2.2 及以前曾用名是 dn\_target\_config\_node\_list | +|类型| String | +|默认值| 127.0.0.1:10710 | +|改后生效方式| 仅允许在第一次启动服务前修改 | + +### 连接配置 + +* 
dn\_session\_timeout\_threshold + +|名字| dn\_session_timeout_threshold | +|:---:|:------------------------------| +|描述| 最大的会话空闲时间 | +|类型| int | +|默认值| 0 | +|改后生效方式| 重启服务生效 | + + +* dn\_rpc\_thrift\_compression\_enable + +|名字| dn\_rpc\_thrift\_compression\_enable | +|:---:|:---------------------------------| +|描述| 是否启用 thrift 的压缩机制 | +|类型| Boolean | +|默认值| false | +|改后生效方式| 重启服务生效 | + +* dn\_rpc\_advanced\_compression\_enable + +|名字| dn\_rpc\_advanced\_compression\_enable | +|:---:|:-----------------------------------| +|描述| 是否启用 thrift 的自定制压缩机制 | +|类型| Boolean | +|默认值| false | +|改后生效方式| 重启服务生效 | + +* dn\_rpc\_selector\_thread\_count + +| 名字 | rpc\_selector\_thread\_count | +|:------:|:-----------------------------| +| 描述 | rpc 选择器线程数量 | +| 类型 | int | +| 默认值 | 1 | +| 改后生效方式 | 重启服务生效 | + +* dn\_rpc\_min\_concurrent\_client\_num + +| 名字 | rpc\_min\_concurrent\_client\_num | +|:------:|:----------------------------------| +| 描述 | 最小连接数 | +| 类型 | Short Int : [0,65535] | +| 默认值 | 1 | +| 改后生效方式 | 重启服务生效 | + +* dn\_rpc\_max\_concurrent\_client\_num + +| 名字 | dn\_rpc\_max\_concurrent\_client\_num | +|:------:|:----------------------------------| +| 描述 | 最大连接数 | +| 类型 | Short Int : [0,65535] | +| 默认值 | 65535 | +| 改后生效方式 | 重启服务生效 | + +* dn\_thrift\_max\_frame\_size + +|名字| dn\_thrift\_max\_frame\_size | +|:---:|:---| +|描述| RPC 请求/响应的最大字节数| +|类型| long | +|默认值| 536870912 (默认值512MB,应大于等于 512 * 1024 * 1024) | +|改后生效方式|重启服务生效| + +* dn\_thrift\_init\_buffer\_size + +|名字| dn\_thrift\_init\_buffer\_size | +|:---:|:---| +|描述| 字节数 | +|类型| long | +|默认值| 1024 | +|改后生效方式|重启服务生效| + +* dn\_connection\_timeout\_ms + +| 名字 | dn\_connection\_timeout\_ms | +|:------:|:----------------------------| +| 描述 | 节点连接超时时间 | +| 类型 | int | +| 默认值 | 60000 | +| 改后生效方式 | 重启服务生效 | + +* dn\_core\_client\_count\_for\_each\_node\_in\_client\_manager + +| 名字 | dn\_core\_client\_count\_for\_each\_node\_in\_client\_manager | +|:------:|:--------------------------------------------------------------| +| 描述 | 单 ClientManager 中路由到每个节点的核心 Client 个数 | +| 类型 | int | +| 默认值 | 200 | +| 改后生效方式 | 重启服务生效 | + +* dn\_max\_client\_count\_for\_each\_node\_in\_client\_manager + +| 名字 | dn\_max\_client\_count\_for\_each\_node\_in\_client\_manager | +|:------:|:-------------------------------------------------------------| +| 描述 | 单 ClientManager 中路由到每个节点的最大 Client 个数 | +| 类型 | int | +| 默认值 | 300 | +| 改后生效方式 | 重启服务生效 | + +### 目录配置 + +* dn\_system\_dir + +| 名字 | dn\_system\_dir | +|:------:|:--------------------------------------------------------------------| +| 描述 | IoTDB 元数据存储路径,默认存放在和 sbin 目录同级的 data 目录下。相对路径的起始目录与操作系统相关,建议使用绝对路径。 | +| 类型 | String | +| 默认值 | data/datanode/system(Windows:data\\datanode\\system) | +| 改后生效方式 | 重启服务生效 | + +* dn\_data\_dirs + +| 名字 | dn\_data\_dirs | +|:------:|:-------------------------------------------------------------------| +| 描述 | IoTDB 数据存储路径,默认存放在和 sbin 目录同级的 data 目录下。相对路径的起始目录与操作系统相关,建议使用绝对路径。 | +| 类型 | String | +| 默认值 | data/datanode/data(Windows:data\\datanode\\data) | +| 改后生效方式 | 重启服务生效 | + +* dn\_multi\_dir\_strategy + +| 名字 | dn\_multi\_dir\_strategy | 
+|:------:|:----------------------------------------------------------------------------------------------------------------| +| 描述 | IoTDB 在 data\_dirs 中为 TsFile 选择目录时采用的策略。可使用简单类名或类名全称。系统提供以下两种策略:<br>
1. SequenceStrategy:IoTDB 按顺序选择目录,依次遍历 data\_dirs 中的所有目录,并不断轮循;<br>
2. MaxDiskUsableSpaceFirstStrategy:IoTDB 优先选择 data\_dirs 中对应磁盘空余空间最大的目录;<br>
您可以通过以下方法完成用户自定义策略:<br>
1. 继承 org.apache.iotdb.db.storageengine.rescon.disk.strategy.DirectoryStrategy 类并实现自身的 Strategy 方法;<br>
2. 将实现的类的完整类名(包名加类名,UserDefineStrategyPackage)填写到该配置项;<br>
3. 将该类 jar 包添加到工程中。 | +| 类型 | String | +| 默认值 | SequenceStrategy | +| 改后生效方式 | 热加载 | + +* dn\_consensus\_dir + +| 名字 | dn\_consensus\_dir | +|:------:|:-------------------------------------------------------------------------| +| 描述 | IoTDB 共识层日志存储路径,默认存放在和 sbin 目录同级的 data 目录下。相对路径的起始目录与操作系统相关,建议使用绝对路径。 | +| 类型 | String | +| 默认值 | data/datanode/consensus(Windows:data\\datanode\\consensus) | +| 改后生效方式 | 重启服务生效 | + +* dn\_wal\_dirs + +| 名字 | dn\_wal\_dirs | +|:------:|:---------------------------------------------------------------------| +| 描述 | IoTDB 写前日志存储路径,默认存放在和 sbin 目录同级的 data 目录下。相对路径的起始目录与操作系统相关,建议使用绝对路径。 | +| 类型 | String | +| 默认值 | data/datanode/wal(Windows:data\\datanode\\wal) | +| 改后生效方式 | 重启服务生效 | + +* dn\_tracing\_dir + +| 名字 | dn\_tracing\_dir | +|:------:|:--------------------------------------------------------------------| +| 描述 | IoTDB 追踪根目录路径,默认存放在和 sbin 目录同级的 data 目录下。相对路径的起始目录与操作系统相关,建议使用绝对路径。 | +| 类型 | String | +| 默认值 | datanode/tracing | +| 改后生效方式 | 重启服务生效 | + +* dn\_sync\_dir + +| 名字 | dn\_sync\_dir | +|:------:|:----------------------------------------------------------------------| +| 描述 | IoTDB sync 存储路径,默认存放在和 sbin 目录同级的 data 目录下。相对路径的起始目录与操作系统相关,建议使用绝对路径。 | +| 类型 | String | +| 默认值 | data/datanode/sync | +| 改后生效方式 | 重启服务生效 | + +### Metric 配置 + +## 开启 GC 日志 + +GC 日志默认是关闭的。为了性能调优,用户可能会需要收集 GC 信息。 +若要打开 GC 日志,则需要在启动 IoTDB Server 的时候加上"printgc"参数: + +```bash +nohup sbin/start-datanode.sh printgc >/dev/null 2>&1 & +``` + +或者 + +```bash +sbin\start-datanode.bat printgc +``` + +GC 日志会被存储在`IOTDB_HOME/logs/gc.log`. 至多会存储 10 个 gc.log 文件,每个文件最多 10MB。 + +#### REST 服务配置 + +* enable\_rest\_service + +|名字| enable\_rest\_service | +|:---:|:--------------------| +|描述| 是否开启Rest服务。 | +|类型| Boolean | +|默认值| false | +|改后生效方式| 重启生效 | + +* rest\_service\_port + +|名字| rest\_service\_port | +|:---:|:------------------| +|描述| Rest服务监听端口号 | +|类型| int32 | +|默认值| 18080 | +|改后生效方式| 重启生效 | + +* enable\_swagger + +|名字| enable\_swagger | +|:---:|:-----------------------| +|描述| 是否启用swagger来展示rest接口信息 | +|类型| Boolean | +|默认值| false | +|改后生效方式| 重启生效 | + +* rest\_query\_default\_row\_size\_limit + +|名字| rest\_query\_default\_row\_size\_limit | +|:---:|:----------------------------------| +|描述| 一次查询能返回的结果集最大行数 | +|类型| int32 | +|默认值| 10000 | +|改后生效方式| 重启生效 | + +* cache\_expire + +|名字| cache\_expire | +|:---:|:--------------| +|描述| 缓存客户登录信息的过期时间 | +|类型| int32 | +|默认值| 28800 | +|改后生效方式| 重启生效 | + +* cache\_max\_num + +|名字| cache\_max\_num | +|:---:|:--------------| +|描述| 缓存中存储的最大用户数量 | +|类型| int32 | +|默认值| 100 | +|改后生效方式| 重启生效 | + +* cache\_init\_num + +|名字| cache\_init\_num | +|:---:|:---------------| +|描述| 缓存初始容量 | +|类型| int32 | +|默认值| 10 | +|改后生效方式| 重启生效 | + +* trust\_store\_path + +|名字| trust\_store\_path | +|:---:|:---------------| +|描述| keyStore 密码(非必填) | +|类型| String | +|默认值| "" | +|改后生效方式| 重启生效 | + +* trust\_store\_pwd + +|名字| trust\_store\_pwd | +|:---:|:---------------| +|描述| trustStore 密码(非必填) | +|类型| String | +|默认值| "" | +|改后生效方式| 重启生效 | + +* idle\_timeout + +|名字| idle\_timeout | +|:---:|:--------------| +|描述| SSL 超时时间,单位为秒 | +|类型| int32 | +|默认值| 5000 | +|改后生效方式| 重启生效 | + + + +#### 多级存储配置 + +* dn\_default\_space\_usage\_thresholds + +|名字| dn\_default\_space\_usage\_thresholds | +|:---:|:--------------| +|描述| 定义每个层级数据目录的最小剩余空间比例;当剩余空间少于该比例时,数据会被自动迁移至下一个层级;当最后一个层级的剩余存储空间到低于此阈值时,会将系统置为 READ_ONLY | +|类型| double | +|默认值| 0.85 | +|改后生效方式| 热加载 | + +* remote\_tsfile\_cache\_dirs + +|名字| remote\_tsfile\_cache\_dirs | +|:---:|:--------------| +|描述| 云端存储在本地的缓存目录 | +|类型| string | +|默认值| 
data/datanode/data/cache | +|改后生效方式| 重启生效 | + +* remote\_tsfile\_cache\_page\_size\_in\_kb + +|名字| remote\_tsfile\_cache\_page\_size\_in\_kb | +|:---:|:--------------| +|描述| 云端存储在本地缓存文件的块大小 | +|类型| int | +|默认值| 20480 | +|改后生效方式| 重启生效 | + +* remote\_tsfile\_cache\_max\_disk\_usage\_in\_mb + +|名字| remote\_tsfile\_cache\_max\_disk\_usage\_in\_mb | +|:---:|:--------------| +|描述| 云端存储本地缓存的最大磁盘占用大小 | +|类型| long | +|默认值| 51200 | +|改后生效方式| 重启生效 | + +* object\_storage\_type + +|名字| object\_storage\_type | +|:---:|:--------------| +|描述| 云端存储类型 | +|类型| string | +|默认值| AWS_S3 | +|改后生效方式| 重启生效 | + +* object\_storage\_bucket + +|名字| object\_storage\_bucket | +|:---:|:--------------| +|描述| 云端存储 bucket 的名称 | +|类型| string | +|默认值| iotdb_data | +|改后生效方式| 重启生效 | + +* object\_storage\_endpoiont + +|名字| object\_storage\_endpoiont | +|:---:|:--------------| +|描述| 云端存储的 endpoint | +|类型| string | +|默认值| 无 | +|改后生效方式| 重启生效 | + +* object\_storage\_access\_key + +|名字| object\_storage\_access\_key | +|:---:|:--------------| +|描述| 云端存储的验证信息 key | +|类型| string | +|默认值| 无 | +|改后生效方式| 重启生效 | + +* object\_storage\_access\_secret + +|名字| object\_storage\_access\_secret | +|:---:|:--------------| +|描述| 云端存储的验证信息 secret | +|类型| string | +|默认值| 无 | +|改后生效方式| 重启生效 | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Reference/Keywords.md b/src/zh/UserGuide/V2.0.1/Tree/Reference/Keywords.md new file mode 100644 index 00000000..0d681ffb --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Reference/Keywords.md @@ -0,0 +1,227 @@ + + +# 关键字 + +保留字(不能用于作为标识符): + +- ROOT +- TIME +- TIMESTAMP + +一般关键字: + +- ADD +- AFTER +- ALIAS +- ALIGN +- ALIGNED +- ALL +- ALTER +- ALTER_TIMESERIES +- ANY +- APPEND +- APPLY_TEMPLATE +- AS +- ASC +- ATTRIBUTES +- BEFORE +- BEGIN +- BLOCKED +- BOUNDARY +- BY +- CACHE +- CHILD +- CLEAR +- CLUSTER +- CONCAT +- CONFIGNODES +- CONFIGURATION +- CONTINUOUS +- COUNT +- CONTAIN +- CQ +- CQS +- CREATE +- CREATE_CONTINUOUS_QUERY +- CREATE_FUNCTION +- CREATE_ROLE +- CREATE_TIMESERIES +- CREATE_TRIGGER +- CREATE_USER +- DATA +- DATABASE +- DATABASES +- DATANODES +- DEACTIVATE +- DEBUG +- DELETE +- DELETE_ROLE +- DELETE_STORAGE_GROUP +- DELETE_TIMESERIES +- DELETE_USER +- DESC +- DESCRIBE +- DEVICE +- DEVICEID +- DEVICES +- DISABLE +- DISCARD +- DROP +- DROP_CONTINUOUS_QUERY +- DROP_FUNCTION +- DROP_TRIGGER +- END +- ENDTIME +- EVERY +- EXPLAIN +- FILL +- FILE +- FLUSH +- FOR +- FROM +- FULL +- FUNCTION +- FUNCTIONS +- GLOBAL +- GRANT +- GRANT_ROLE_PRIVILEGE +- GRANT_USER_PRIVILEGE +- GRANT_USER_ROLE +- GROUP +- HAVING +- HEAD +- INDEX +- INFO +- INSERT +- INSERT_TIMESERIES +- INTO +- KILL +- LABEL +- LAST +- LATEST +- LEVEL +- LIKE +- LIMIT +- LINEAR +- LINK +- LIST +- LIST_ROLE +- LIST_USER +- LOAD +- LOCAL +- LOCK +- MERGE +- METADATA +- MODIFY_PASSWORD +- NODES +- NONE +- NOW +- OF +- OFF +- OFFSET +- ON +- ORDER +- ONSUCCESS +- PARTITION +- PASSWORD +- PATHS +- PIPE +- PIPES +- PIPESINK +- PIPESINKS +- PIPESINKTYPE +- POLICY +- PREVIOUS +- PREVIOUSUNTILLAST +- PRIVILEGES +- PROCESSLIST +- PROPERTY +- PRUNE +- QUERIES +- QUERY +- RANGE +- READONLY +- READ_TEMPLATE +- READ_TEMPLATE_APPLICATION +- READ_TIMESERIES +- REGEXP +- REGIONID +- REGIONS +- REMOVE +- RENAME +- RESAMPLE +- RESOURCE +- REVOKE +- REVOKE_ROLE_PRIVILEGE +- REVOKE_USER_PRIVILEGE +- REVOKE_USER_ROLE +- ROLE +- RUNNING +- SCHEMA +- SELECT +- SERIESSLOTID +- SET +- SET_STORAGE_GROUP +- SETTLE +- SGLEVEL +- SHOW +- SLIMIT +- SOFFSET +- STORAGE +- START +- STARTTIME +- STATELESS +- STATEFUL +- STOP +- SYSTEM +- TAIL +- TAGS +- TASK 
+- TEMPLATE +- TIMEOUT +- TIMESERIES +- TIMESLOTID +- TO +- TOLERANCE +- TOP +- TRACING +- TRIGGER +- TRIGGERS +- TTL +- UNLINK +- UNLOAD +- UNSET +- UPDATE +- UPDATE_TEMPLATE +- UPSERT +- URI +- USER +- USING +- VALUES +- VERIFY +- VERSION +- VIEW +- WATERMARK_EMBEDDING +- WHERE +- WITH +- WITHOUT +- WRITABLE \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Reference/Modify-Config-Manual.md b/src/zh/UserGuide/V2.0.1/Tree/Reference/Modify-Config-Manual.md new file mode 100644 index 00000000..4f2ea63a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Reference/Modify-Config-Manual.md @@ -0,0 +1,71 @@ + + +# 配置项修改介绍 +## 设置方式 +* 使用sql语句修改【推荐】 +* 直接修改配置文件【不推荐】 +## 生效方式 +* 第一次启动后不可修改 (first_start) +* 重启后生效 (restart) +* 热加载 (hot_reload) +# 直接修改配置文件 +可以通过重启或以下命令生效 +## 热加载配置命令 +使支持热加载的配置项改动立即生效。 +对于已经写在配置文件中修改过的配置项,从配置文件中删除或注释后再进行 load configuration 将恢复默认值。 +``` +load configuration +``` +# 配置项操作语句 +设置配置项 +``` +set configuration "key1"="value1" "key2"="value2"... (on nodeId) +``` +### 示例1 +``` +set configuration "enable_cross_space_compaction"="false" +``` +对集群所有节点永久生效,设置 enable_cross_space_compaction 为 false,并写入到 iotdb-system.properties 中。 +### 示例2 +``` +set configuration "enable_cross_space_compaction"="false" "enable_seq_space_compaction"="false" on 1 +``` +对 nodeId 为 1 的节点永久生效,设置 enable_cross_space_compaction 为 false,设置 enable_seq_space_compaction 为 false,并写入到 iotdb-system.properties 中。 +### 示例3 +``` +set configuration "enable_cross_space_compaction"="false" "timestamp_precision"="ns" +``` +对集群所有节点永久生效,设置 enable_cross_space_compaction 为 false,timestamp_precision 为 ns,并写入到 iotdb-system.properties 中。但是,timestamp_precision 是第一次启动后就无法修改的配置项,因此会忽略这个配置项的更新,返回如下。 +``` +Msg: org.apache.iotdb.jdbc.IoTDBSQLException: 301: ignored config items: [timestamp_precision] +``` +# 生效配置项 +支持热加载立即生效的配置项在 iotdb-system.properties.template 文件中标记 effectiveMode 为 hot_reload +示例 +``` +# Used for indicate cluster name and distinguish different cluster. +# If you need to modify the cluster name, it's recommended to use 'set configuration "cluster_name=xxx"' sql. +# Manually modifying configuration file is not recommended, which may cause node restart fail. 
+# effectiveMode: hot_reload +# Datatype: string +cluster_name=defaultCluster +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Reference/Status-Codes.md b/src/zh/UserGuide/V2.0.1/Tree/Reference/Status-Codes.md new file mode 100644 index 00000000..d941f786 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Reference/Status-Codes.md @@ -0,0 +1,178 @@ + + +# 状态码 + +IoTDB 引入了**状态码**这一概念。例如,因为 IoTDB 需要在写入数据之前首先注册时间序列,一种可能的解决方案是: + +``` +try { + writeData(); +} catch (SQLException e) { + // the most case is that the time series does not exist + if (e.getMessage().contains("exist")) { + //However, using the content of the error message is not so efficient + registerTimeSeries(); + //write data once again + writeData(); + } +} + +``` + +利用状态码,我们就可以不必写诸如`if (e.getErrorMessage().contains("exist"))`的代码, +只需要使用`e.getStatusType().getCode() == TSStatusCode.TIME_SERIES_NOT_EXIST_ERROR.getStatusCode()`。 + +这里是状态码和相对应信息的列表: + +| 状态码 | 状态类型 | 状态信息 | +|:-----|:---------------------------------------|:--------------------------| +| 200 | SUCCESS_STATUS | 成功状态 | +| 201 | INCOMPATIBLE_VERSION | 版本不兼容 | +| 202 | CONFIGURATION_ERROR | 配置文件有错误项 | +| 203 | START_UP_ERROR | 启动错误 | +| 204 | SHUT_DOWN_ERROR | 关机错误 | +| 300 | UNSUPPORTED_OPERATION | 不支持的操作 | +| 301 | EXECUTE_STATEMENT_ERROR | 执行语句错误 | +| 302 | MULTIPLE_ERROR | 多行语句执行错误 | +| 303 | ILLEGAL_PARAMETER | 参数错误 | +| 304 | OVERLAP_WITH_EXISTING_TASK | 与正在执行的其他操作冲突 | +| 305 | INTERNAL_SERVER_ERROR | 服务器内部错误 | +| 306 | DISPATCH_ERROR | 分发错误 | +| 400 | REDIRECTION_RECOMMEND | 推荐客户端重定向 | +| 500 | DATABASE_NOT_EXIST | 数据库不存在 | +| 501 | DATABASE_ALREADY_EXISTS | 数据库已存在 | +| 502 | SERIES_OVERFLOW | 序列数量超过阈值 | +| 503 | TIMESERIES_ALREADY_EXIST | 时间序列已存在 | +| 504 | TIMESERIES_IN_BLACK_LIST | 时间序列正在删除 | +| 505 | ALIAS_ALREADY_EXIST | 路径别名已经存在 | +| 506 | PATH_ALREADY_EXIST | 路径已经存在 | +| 507 | METADATA_ERROR | 处理元数据错误 | +| 508 | PATH_NOT_EXIST | 路径不存在 | +| 509 | ILLEGAL_PATH | 路径不合法 | +| 510 | CREATE_TEMPLATE_ERROR | 创建物理量模板失败 | +| 511 | DUPLICATED_TEMPLATE | 元数据模板重复 | +| 512 | UNDEFINED_TEMPLATE | 元数据模板未定义 | +| 513 | TEMPLATE_NOT_SET | 元数据模板未设置 | +| 514 | DIFFERENT_TEMPLATE | 元数据模板不一致 | +| 515 | TEMPLATE_IS_IN_USE | 元数据模板正在使用 | +| 516 | TEMPLATE_INCOMPATIBLE | 元数据模板不兼容 | +| 517 | SEGMENT_NOT_FOUND | 未找到 Segment | +| 518 | PAGE_OUT_OF_SPACE | PBTreeFile 中 Page 空间不够 | +| 519 | RECORD_DUPLICATED | 记录重复 | +| 520 | SEGMENT_OUT_OF_SPACE | PBTreeFile 中 segment 空间不够 | +| 521 | PBTREE_FILE_NOT_EXISTS | PBTreeFile 不存在 | +| 522 | OVERSIZE_RECORD | 记录大小超过元数据文件页面大小 | +| 523 | PBTREE_FILE_REDO_LOG_BROKEN | PBTreeFile 的 redo 日志损坏 | +| 524 | TEMPLATE_NOT_ACTIVATED | 元数据模板未激活 | +| 526 | SCHEMA_QUOTA_EXCEEDED | 集群元数据超过配额上限 | +| 527 | MEASUREMENT_ALREADY_EXISTS_IN_TEMPLATE | 元数据模板中已存在物理量 | +| 600 | SYSTEM_READ_ONLY | IoTDB 系统只读 | +| 601 | STORAGE_ENGINE_ERROR | 存储引擎相关错误 | +| 602 | STORAGE_ENGINE_NOT_READY | 存储引擎还在恢复中,还不能接受读写操作 | +| 603 | DATAREGION_PROCESS_ERROR | DataRegion 相关错误 | +| 604 | TSFILE_PROCESSOR_ERROR | TsFile 处理器相关错误 | +| 605 | WRITE_PROCESS_ERROR | 写入相关错误 | +| 606 | WRITE_PROCESS_REJECT | 写入拒绝错误 | +| 607 | OUT_OF_TTL | 插入时间少于 TTL 时间边界 | +| 608 | COMPACTION_ERROR | 合并错误 | +| 609 | ALIGNED_TIMESERIES_ERROR | 对齐时间序列错误 | +| 610 | WAL_ERROR | WAL 异常 | +| 611 | DISK_SPACE_INSUFFICIENT | 磁盘空间不足 | +| 700 | SQL_PARSE_ERROR | SQL 语句分析错误 | +| 701 | SEMANTIC_ERROR | SQL 语义错误 | +| 702 | GENERATE_TIME_ZONE_ERROR | 生成时区错误 | +| 703 | SET_TIME_ZONE_ERROR | 设置时区错误 | +| 704 | QUERY_NOT_ALLOWED | 查询语句不允许 | +| 705 | LOGICAL_OPERATOR_ERROR | 逻辑符相关错误 | +| 706 | LOGICAL_OPTIMIZE_ERROR | 逻辑优化相关错误 | +| 
707 | UNSUPPORTED_FILL_TYPE | 不支持的填充类型 | +| 708 | QUERY_PROCESS_ERROR | 查询处理相关错误 | +| 709 | MPP_MEMORY_NOT_ENOUGH | MPP 框架中任务执行内存不足 | +| 710 | CLOSE_OPERATION_ERROR | 关闭操作错误 | +| 711 | TSBLOCK_SERIALIZE_ERROR | TsBlock 序列化错误 | +| 712 | INTERNAL_REQUEST_TIME_OUT | MPP 操作超时 | +| 713 | INTERNAL_REQUEST_RETRY_ERROR | 内部操作重试失败 | +| 714 | NO_SUCH_QUERY | 查询不存在 | +| 715 | QUERY_WAS_KILLED | 查询执行时被终止 | +| 800 | UNINITIALIZED_AUTH_ERROR | 授权模块未初始化 | +| 801 | WRONG_LOGIN_PASSWORD | 用户名或密码错误 | +| 802 | NOT_LOGIN | 没有登录 | +| 803 | NO_PERMISSION | 没有操作权限 | +| 804 | USER_NOT_EXIST | 用户不存在 | +| 805 | USER_ALREADY_EXIST | 用户已存在 | +| 806 | USER_ALREADY_HAS_ROLE | 用户拥有对应角色 | +| 807 | USER_NOT_HAS_ROLE | 用户未拥有对应角色 | +| 808 | ROLE_NOT_EXIST | 角色不存在 | +| 809 | ROLE_ALREADY_EXIST | 角色已存在 | +| 810 | ALREADY_HAS_PRIVILEGE | 已拥有对应权限 | +| 811 | NOT_HAS_PRIVILEGE | 未拥有对应权限 | +| 812 | CLEAR_PERMISSION_CACHE_ERROR | 清空权限缓存失败 | +| 813 | UNKNOWN_AUTH_PRIVILEGE | 未知权限 | +| 814 | UNSUPPORTED_AUTH_OPERATION | 不支持的权限操作 | +| 815 | AUTH_IO_EXCEPTION | 权限模块IO异常 | +| 900 | MIGRATE_REGION_ERROR | Region 迁移失败 | +| 901 | CREATE_REGION_ERROR | 创建 region 失败 | +| 902 | DELETE_REGION_ERROR | 删除 region 失败 | +| 903 | PARTITION_CACHE_UPDATE_ERROR | 更新分区缓存失败 | +| 904 | CONSENSUS_NOT_INITIALIZED | 共识层未初始化,不能提供服务 | +| 905 | REGION_LEADER_CHANGE_ERROR | Region leader 迁移失败 | +| 906 | NO_AVAILABLE_REGION_GROUP | 无法找到可用的 Region 副本组 | +| 907 | LACK_DATA_PARTITION_ALLOCATION | 调用创建数据分区方法的返回结果里缺少信息 | +| 1000 | DATANODE_ALREADY_REGISTERED | DataNode 在集群中已经注册 | +| 1001 | NO_ENOUGH_DATANODE | DataNode 数量不足,无法移除节点或创建副本 | +| 1002 | ADD_CONFIGNODE_ERROR | 新增 ConfigNode 失败 | +| 1003 | REMOVE_CONFIGNODE_ERROR | 移除 ConfigNode 失败 | +| 1004 | DATANODE_NOT_EXIST | 此 DataNode 不存在 | +| 1005 | DATANODE_STOP_ERROR | DataNode 关闭失败 | +| 1006 | REMOVE_DATANODE_ERROR | 移除 datanode 失败 | +| 1007 | REGISTER_DATANODE_WITH_WRONG_ID | 注册的 DataNode 中有错误的注册id | +| 1008 | CAN_NOT_CONNECT_DATANODE | 连接 DataNode 失败 | +| 1100 | LOAD_FILE_ERROR | 加载文件错误 | +| 1101 | LOAD_PIECE_OF_TSFILE_ERROR | 加载 TsFile 片段异常 | +| 1102 | DESERIALIZE_PIECE_OF_TSFILE_ERROR | 反序列化 TsFile 片段异常 | +| 1103 | SYNC_CONNECTION_ERROR | 同步连接错误 | +| 1104 | SYNC_FILE_REDIRECTION_ERROR | 同步文件时重定向异常 | +| 1105 | SYNC_FILE_ERROR | 同步文件异常 | +| 1106 | CREATE_PIPE_SINK_ERROR | 创建 PIPE Sink 失败 | +| 1107 | PIPE_ERROR | PIPE 异常 | +| 1108 | PIPESERVER_ERROR | PIPE server 异常 | +| 1109 | VERIFY_METADATA_ERROR | 校验元数据失败 | +| 1200 | UDF_LOAD_CLASS_ERROR | UDF 加载类异常 | +| 1201 | UDF_DOWNLOAD_ERROR | 无法从 ConfigNode 下载 UDF | +| 1202 | CREATE_UDF_ON_DATANODE_ERROR | 在 DataNode 创建 UDF 失败 | +| 1203 | DROP_UDF_ON_DATANODE_ERROR | 在 DataNode 卸载 UDF 失败 | +| 1300 | CREATE_TRIGGER_ERROR | ConfigNode 创建 Trigger 失败 | +| 1301 | DROP_TRIGGER_ERROR | ConfigNode 删除 Trigger 失败 | +| 1302 | TRIGGER_FIRE_ERROR | 触发器执行错误 | +| 1303 | TRIGGER_LOAD_CLASS_ERROR | 触发器加载类异常 | +| 1304 | TRIGGER_DOWNLOAD_ERROR | 从 ConfigNode 下载触发器异常 | +| 1305 | CREATE_TRIGGER_INSTANCE_ERROR | 创建触发器实例异常 | +| 1306 | ACTIVE_TRIGGER_INSTANCE_ERROR | 激活触发器实例异常 | +| 1307 | DROP_TRIGGER_INSTANCE_ERROR | 删除触发器实例异常 | +| 1308 | UPDATE_TRIGGER_LOCATION_ERROR | 更新有状态的触发器所在 DataNode 异常 | +| 1400 | NO_SUCH_CQ | CQ 任务不存在 | +| 1401 | CQ_ALREADY_ACTIVE | CQ 任务已激活 | +| 1402 | CQ_AlREADY_EXIST | CQ 任务已存在 | +| 1403 | CQ_UPDATE_LAST_EXEC_TIME_ERROR | CQ 更新上一次执行时间失败 | + +> 在最新版本中,我们重构了 IoTDB 的异常类。通过将错误信息统一提取到异常类中,并为所有异常添加不同的错误代码,从而当捕获到异常并引发更高级别的异常时,错误代码将保留并传递,以便用户了解详细的错误原因。 +除此之外,我们添加了一个基础异常类“ProcessException”,由所有异常扩展。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/Reference/Syntax-Rule.md 
b/src/zh/UserGuide/V2.0.1/Tree/Reference/Syntax-Rule.md new file mode 100644 index 00000000..1579bc52 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Reference/Syntax-Rule.md @@ -0,0 +1,275 @@ + + +# 标识符 +## 字面值常量 + +该部分对 IoTDB 中支持的字面值常量进行说明,包括字符串常量、数值型常量、时间戳常量、布尔型常量和空值。 + +### 字符串常量 + +在 IoTDB 中,字符串是由**单引号(`'`)或双引号(`"`)字符括起来的字符序列**。示例如下: + +```Plain%20Text +'a string' +"another string" +``` + +#### 使用场景 + +- `INSERT` 或者 `SELECT` 中用于表达 `TEXT` 类型数据的场景。 + + ```SQL + # insert 示例 + insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') + insert into root.ln.wf02.wt02(timestamp,hardware) values(2, '\\') + + +-----------------------------+--------------------------+ + | Time|root.ln.wf02.wt02.hardware| + +-----------------------------+--------------------------+ + |1970-01-01T08:00:00.001+08:00| v1| + +-----------------------------+--------------------------+ + |1970-01-01T08:00:00.002+08:00| \\| + +-----------------------------+--------------------------+ + + # select 示例 + select code from root.sg1.d1 where code in ('string1', 'string2'); + ``` + +- `LOAD` / `REMOVE` / `SETTLE` 指令中的文件路径。 + + ```SQL + # load 示例 + LOAD 'examplePath' + + # remove 示例 + REMOVE 'examplePath' + + # SETTLE 示例 + SETTLE 'examplePath' + ``` + +- 用户密码。 + + ```SQL + # 示例,write_pwd 即为用户密码 + CREATE USER ln_write_user 'write_pwd' + ``` + +- 触发器和 UDF 中的类全类名,示例如下: + + ```SQL + # 触发器示例,AS 后使用字符串表示类全类名 + CREATE TRIGGER `alert-listener-sg1d1s1` + AFTER INSERT + ON root.sg1.d1.s1 + AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' + WITH ( + 'lo' = '0', + 'hi' = '100.0' + ) + + # UDF 示例,AS 后使用字符串表示类全类名 + CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' + ``` + +- Select 子句中可以为结果集中的值指定别名,别名可以被定义为字符串或者标识符,示例如下: + + ```SQL + select s1 as 'temperature', s2 as 'speed' from root.ln.wf01.wt01; + + # 表头如下所示 + +-----------------------------+-----------|-----+ + | Time|temperature|speed| + +-----------------------------+-----------|-----+ + ``` + +- 用于表示键值对,键值对的键和值可以被定义成常量(包括字符串)或者标识符,具体请参考键值对章节。 + +#### 如何在字符串内使用引号 + +- 在单引号引起的字符串内,双引号无需特殊处理。同理,在双引号引起的字符串内,单引号无需特殊处理。 +- 在单引号引起的字符串里,可以通过双写单引号来表示一个单引号,即单引号 ' 可以表示为 ''。 +- 在双引号引起的字符串里,可以通过双写双引号来表示一个双引号,即双引号 " 可以表示为 ""。 + +字符串内使用引号的示例如下: + +```Plain%20Text +'string' // string +'"string"' // "string" +'""string""' // ""string"" +'''string' // 'string + +"string" // string +"'string'" // 'string' +"''string''" // ''string'' +"""string" // "string +``` + +### 数值型常量 + +数值型常量包括整型和浮点型。 + +整型常量是一个数字序列。可以以 `+` 或 `-` 开头表示正负。例如:`1`, `-1`。 + +带有小数部分或由科学计数法表示的为浮点型常量,例如:`.1`, `3.14`, `-2.23`, `+1.70`, `1.2E3`, `1.2E-3`, `-1.2E3`, `-1.2E-3`。 + +在 IoTDB 中,`INT32` 和 `INT64` 表示整数类型(计算是准确的),`FLOAT` 和 `DOUBLE` 表示浮点数类型(计算是近似的)。 + +在浮点上下文中可以使用整数,它会被解释为等效的浮点数。 + +### 时间戳常量 + +时间戳是一个数据到来的时间点,在 IoTDB 中分为绝对时间戳和相对时间戳。详细信息可参考 [数据类型文档](../Basic-Concept/Data-Type.md)。 + +特别地,`NOW()`表示语句开始执行时的服务端系统时间戳。 + +### 布尔型常量 + +布尔值常量 `TRUE` 和 `FALSE` 分别等价于 `1` 和 `0`,它们对大小写不敏感。 + +### 空值 + +`NULL`值表示没有数据。`NULL`对大小写不敏感。 + +## 标识符 + +### 使用场景 + +在 IoTDB 中,触发器名称、UDF函数名、元数据模板名称、用户与角色名、连续查询标识、Pipe、PipeSink、键值对中的键和值、别名等可以作为标识符。 + +### 约束 + +请注意,此处约束是标识符的通用约束,具体标识符可能还附带其它约束条件,如用户名限制字符数大于等于4,更严格的约束请参考具体标识符相关的说明文档。 + +**标识符命名有以下约束:** + +- 不使用反引号括起的标识符中,允许出现以下字符: + - [ 0-9 a-z A-Z _ ] (字母,数字,下划线) + - ['\u2E80'..'\u9FFF'] (UNICODE 中文字符) + +### 反引号 + +**如果出现如下情况,标识符需要使用反引号进行引用:** + +- 标识符包含不允许的特殊字符。 +- 标识符为实数。 + +#### 如何在反引号引起的标识符中使用引号 + +**在反引号引起的标识符中可以直接使用单引号和双引号。** + +**在用反引号引用的标识符中,可以通过双写反引号的方式使用反引号,即 ` 可以表示为 ``。** + +示例如下: + +```SQL +# 创建模板 t1`t +create device template `t1``t` +(temperature FLOAT 
encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) + +# 创建模板 t1't"t +create device template `t1't"t` +(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +#### 反引号相关示例 + +- 触发器名称出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建触发器 alert.`listener-sg1d1s1 + CREATE TRIGGER `alert.``listener-sg1d1s1` + AFTER INSERT + ON root.sg1.d1.s1 + AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' + WITH ( + 'lo' = '0', + 'hi' = '100.0' + ) + ``` + +- UDF 名称出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建名为 111 的 UDF,111 为实数,所以需要用反引号引用。 + CREATE FUNCTION `111` AS 'org.apache.iotdb.udf.UDTFExample' + ``` + +- 元数据模板名称出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建名为 111 的元数据模板,111 为实数,需要用反引号引用。 + create device template `111` + (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) + ``` + +- 用户名、角色名出现上述特殊情况时需使用反引号引用,同时无论是否使用反引号引用,用户名、角色名中均不允许出现空格,具体请参考权限管理章节中的说明。 + + ```sql + # 创建用户 special`user. + CREATE USER `special``user.` 'write_pwd' + + # 创建角色 111 + CREATE ROLE `111` + ``` + +- 连续查询标识出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建连续查询 test.cq + CREATE CONTINUOUS QUERY `test.cq` + BEGIN + SELECT max_value(temperature) + INTO temperature_max + FROM root.ln.*.* + GROUP BY time(10s) + END + ``` + +- Pipe、PipeSink 名称出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建 PipeSink test.*1 + CREATE PIPESINK `test.*1` AS IoTDB ('ip' = '输入你的IP') + + # 创建 Pipe test.*2 + CREATE PIPE `test.*2` TO `test.*1` FROM + (select ** from root WHERE time>=yyyy-mm-dd HH:MM:SS) WITH 'SyncDelOp' = 'true' + ``` + +- Select 子句中可以结果集中的值指定别名,别名可以被定义为字符串或者标识符,示例如下: + + ```sql + select s1 as temperature, s2 as speed from root.ln.wf01.wt01; + # 表头如下所示 + +-----------------------------+-----------+-----+ + | Time|temperature|speed| + +-----------------------------+-----------+-----+ + ``` + +- 用于表示键值对,键值对的键和值可以被定义成常量(包括字符串)或者标识符,具体请参考键值对章节。 + +- 路径中非 database 的节点允许含有`*`符号,在使用时需要把该节点用反引号括起来(如下),但是此种用法只建议在在路径中不可避免含有`*`时使用。 + + ```sql + `root.db.*` + ``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Reference/UDF-Libraries_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Reference/UDF-Libraries_apache.md new file mode 100644 index 00000000..7112666c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Reference/UDF-Libraries_apache.md @@ -0,0 +1,5346 @@ + +# UDF函数库 + +基于用户自定义函数能力,IoTDB 提供了一系列关于时序数据处理的函数,包括数据质量、数据画像、异常检测、 频域分析、数据匹配、数据修复、序列发现、机器学习等,能够满足工业领域对时序数据处理的需求。 + +> 注意:当前UDF函数库中的函数仅支持毫秒级的时间戳精度。 + +## 安装步骤 +1. 请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。 + + | UDF 函数库版本 | 支持的 IoTDB 版本 | 下载链接 | + | --------------- | ----------------- | ------------------------------------------------------------ | + | UDF-1.3.3.zip | V1.3.3及以上 | 请联系天谋商务获取 | + | UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系天谋商务获取 | + +2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下 +3. 在 IoTDB 的 SQL 命令行终端(CLI)或可视化控制台(Workbench)的 SQL 操作界面中,执行下述相应的函数注册语句。 +4. 
批量注册:两种注册方式:注册脚本 或 SQL汇总语句 +- 注册脚本 + - 将压缩包中的注册脚本(register-UDF.sh 或 register-UDF.bat)按需复制到 IoTDB 的 tools 目录下,修改脚本中的参数(默认为host=127.0.0.1,rpcPort=6667,user=root,pass=root); + - 启动 IoTDB 服务,运行注册脚本批量注册 UDF + +- SQL汇总语句 + - 打开压缩包中的SQl文件,复制全部 SQL 语句,在 IoTDB 的 SQL 命令行终端(CLI)或可视化控制台(Workbench)的 SQL 操作界面中,执行全部 SQl 语句批量注册 UDF + +## 数据质量 + +### Completeness + +#### 注册语句 + +```sql +create function completeness as 'org.apache.iotdb.library.dquality.UDTFCompleteness' +``` + +#### 函数简介 + +本函数用于计算时间序列的完整性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的完整性,并输出窗口第一个数据点的时间戳和窗口的完整性。 + +**函数名:** COMPLETENESS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 ++ `downtime`:完整性计算是否考虑停机异常。它的取值为 'true' 或 'false',默认值为 'true'. 在考虑停机异常时,长时间的数据缺失将被视作停机,不对完整性产生影响。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行完整性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算完整性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算完整性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| 
+|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +### Consistency + +#### 注册语句 + +```sql +create function consistency as 'org.apache.iotdb.library.dquality.UDTFConsistency' +``` + +#### 函数简介 + +本函数用于计算时间序列的一致性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的一致性,并输出窗口第一个数据点的时间戳和窗口的时效性。 + +**函数名:** CONSISTENCY + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行一致性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算一致性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算一致性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| 
+|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +### Timeliness + +#### 注册语句 + +```sql +create function timeliness as 'org.apache.iotdb.library.dquality.UDTFTimeliness' +``` + +#### 函数简介 + +本函数用于计算时间序列的时效性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的时效性,并输出窗口第一个数据点的时间戳和窗口的时效性。 + +**函数名:** TIMELINESS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行时效性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算时效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算时效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 
+|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| ++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +### Validity + +#### 注册语句 + +```sql +create function validity as 'org.apache.iotdb.library.dquality.UDTFValidity' +``` + +#### 函数简介 + +本函数用于计算时间序列的有效性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的有效性,并输出窗口第一个数据点的时间戳和窗口的有效性。 + + +**函数名:** VALIDITY + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行有效性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算有效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算有效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 
+|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|validity(root.test.d1.s1, "window"="15")| ++-----------------------------+----------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + + + + +## 数据画像 + +### ACF + +#### 注册语句 + +```sql +create function acf as 'org.apache.iotdb.library.dprofile.UDTFACF' +``` + +#### 函数简介 + +本函数用于计算时间序列的自相关函数值,即序列与自身之间的互相关函数。 + +**函数名:** ACF + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点。 + +**提示:** + ++ 序列中的`NaN`值会被忽略,在计算中表现为0。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +### Distinct + +#### 注册语句 + +```sql +create function distinct as 'org.apache.iotdb.library.dprofile.UDTFDistinct' +``` + +#### 函数简介 + +本函数可以返回输入序列中出现的所有不同的元素。 + +**函数名:** DISTINCT + +**输入序列:** 仅支持单个输入序列,类型可以是任意的 + +**输出序列:** 输出单个序列,类型与输入相同。 + +**提示:** + ++ 输出序列的时间戳是无意义的。输出顺序是任意的。 ++ 缺失值和空值将被忽略,但`NaN`不会被忽略。 ++ 字符串区分大小写 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| 
++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select distinct(s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +### Histogram + +#### 注册语句 + +```sql +create function histogram as 'org.apache.iotdb.library.dprofile.UDTFHistogram' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的分布直方图。 + +**函数名:** HISTOGRAM + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `min`:表示所求数据范围的下限,默认值为 -Double.MAX_VALUE。 ++ `max`:表示所求数据范围的上限,默认值为 Double.MAX_VALUE,`start`的值必须小于或等于`end`。 ++ `count`: 表示直方图分桶的数量,默认值为 1,其值必须为正整数。 + +**输出序列:** 直方图分桶的值,其中第 i 个桶(从 1 开始计数)表示的数据范围下界为$min+ (i-1)\cdot\frac{max-min}{count}$,数据范围上界为$min+ i \cdot \frac{max-min}{count}$。 + + +**提示:** + ++ 如果某个数据点的数值小于`min`,它会被放入第 1 个桶;如果某个数据点的数值大于`max`,它会被放入最后 1 个桶。 ++ 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +### Integral + +#### 注册语句 + +```sql +create function integral as 'org.apache.iotdb.library.dprofile.UDAFIntegral' +``` + +#### 函数简介 + +本函数用于计算时间序列的数值积分,即以时间为横坐标、数值为纵坐标绘制的折线图中折线以下的面积。 + +**函数名:** INTEGRAL + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `unit`:积分求解所用的时间轴单位,取值为 "1S", "1s", "1m", "1H", "1d"(区分大小写),分别表示以毫秒、秒、分钟、小时、天为单位计算积分。 + 缺省情况下取 "1s",以秒为单位。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为积分结果的数据点。 + +**提示:** + ++ 积分值等于折线图中每相邻两个数据点和时间轴形成的直角梯形的面积之和,不同时间单位下相当于横轴进行不同倍数放缩,得到的积分值可直接按放缩倍数转换。 + ++ 数据中`NaN`将会被忽略。折线将以临近两个有值数据点为准。 + +#### 使用示例 + +##### 参数缺省 + 
+缺省情况下积分以1s为时间单位。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +其计算公式为: +$$\frac{1}{2}[(1+2)\times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + + +##### 指定时间单位 + +指定以分钟为时间单位。 + + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +其计算公式为: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+3) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +### IntegralAvg + +#### 注册语句 + +```sql +create function integralavg as 'org.apache.iotdb.library.dprofile.UDAFIntegralAvg' +``` + +#### 函数简介 + +本函数用于计算时间序列的函数均值,即在相同时间单位下的数值积分除以序列总的时间跨度。更多关于数值积分计算的信息请参考`Integral`函数。 + +**函数名:** INTEGRALAVG + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为时间加权平均结果的数据点。 + +**提示:** + ++ 时间加权的平均值等于在任意时间单位`unit`下计算的数值积分(即折线图中每相邻两个数据点和时间轴形成的直角梯形的面积之和), + 除以相同时间单位下输入序列的时间跨度,其值与具体采用的时间单位无关,默认与 IoTDB 时间单位一致。 + ++ 数据中的`NaN`将会被忽略。折线将以临近两个有值数据点为准。 + ++ 输入序列为空时,函数输出结果为 0;仅有一个数据点时,输出结果为该点数值。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +其计算公式为: +$$\frac{1}{2}[(1+2)\times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +### Mad + +#### 注册语句 + +```sql +create function mad as 'org.apache.iotdb.library.dprofile.UDAFMad' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似绝对中位差,绝对中位差为所有数值与其中位数绝对偏移量的中位数。 + +如有数据集$\{1,3,3,5,5,6,7,8,9\}$,其中位数为5,所有数值与中位数的偏移量的绝对值为$\{0,0,1,2,2,2,3,4,4\}$,其中位数为2,故而原数据集的绝对中位差为2。 + +**函数名:** MAD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / 
DOUBLE。 + +**参数:** + ++ `error`:近似绝对中位差的基于数值的误差百分比,取值范围为 [0,1),默认值为 0。如当`error`=0.01 时,记精确绝对中位差为a,近似绝对中位差为b,不等式 $0.99a \le b \le 1.01a$ 成立。当`error`=0 时,计算结果为精确绝对中位差。 + + +**输出序列:** 输出单个序列,类型为DOUBLE,序列仅包含一个时间戳为 0、值为绝对中位差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +##### 精确查询 + +当`error`参数缺省或为0时,本函数计算精确绝对中位差。 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +............ +Total line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select mad(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|median(root.test.s1, "error"="0")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +##### 近似查询 + +当`error`参数取值不为 0 时,本函数计算近似绝对中位差。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select mad(s1, "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s1, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9900000000000001| ++-----------------------------+---------------------------------+ +``` + +### Median + +#### 注册语句 + +```sql +create function median as 'org.apache.iotdb.library.dprofile.UDAFMedian' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似中位数。中位数是顺序排列的一组数据中居于中间位置的数;当序列有偶数个时,中位数为中间二者的平均数。 + +**函数名:** MEDIAN + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `error`:近似中位数的基于排名的误差百分比,取值范围 [0,1),默认值为 0。如当`error`=0.01 时,计算出的中位数的真实排名百分比在 0.49~0.51 之间。当`error`=0 时,计算结果为精确中位数。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为中位数的数据点。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +Total 
line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select median(s1, "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|median(root.test.s1, "error"="0.01")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+------------------------------------+ +``` + +### MinMax + +#### 注册语句 + +```sql +create function minmax as 'org.apache.iotdb.library.dprofile.UDTFMinMax' +``` + +#### 函数简介 + +本函数将输入序列使用 min-max 方法进行标准化。最小值归一至 0,最大值归一至 1. + +**函数名:** MINMAX + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `compute`:若设置为"batch",则将数据全部读入后转换;若设置为 "stream",则需用户提供最大值及最小值进行流式计算转换。默认为 "batch"。 ++ `min`:使用流式计算时的最小值。 ++ `max`:使用流式计算时的最大值。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select minmax(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|minmax(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.200+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.300+08:00| 0.25| +|1970-01-01T08:00:00.400+08:00| 0.08333333333333333| +|1970-01-01T08:00:00.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.700+08:00| 0.0| +|1970-01-01T08:00:00.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.900+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.100+08:00| 0.25| +|1970-01-01T08:00:01.200+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.300+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.400+08:00| 0.25| +|1970-01-01T08:00:01.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.700+08:00| 1.0| +|1970-01-01T08:00:01.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| 0.16666666666666666| ++-----------------------------+--------------------+ +``` + + + +### MvAvg + +#### 注册语句 + +```sql +create function mvavg as 'org.apache.iotdb.library.dprofile.UDTFMvAvg' +``` + +#### 函数简介 + +本函数计算序列的移动平均。 + +**函数名:** MVAVG + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:移动窗口的长度。默认值为 10. 
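作为参考,宽度为 $w$ 的移动平均通常定义为相邻 $w$ 个数据点的算术平均(此处仅给出一般性公式,窗口的具体对齐方式以函数实现及下方使用示例为准):

$$\bar{x}_k=\frac{1}{w}\sum_{j=k-w+1}^{k}x_j$$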
+ +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 指定窗口长度 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +### PACF + +#### 注册语句 + +```sql +create function pacf as 'org.apache.iotdb.library.dprofile.UDTFPACF' +``` + +#### 函数简介 + +本函数通过求解 Yule-Walker 方程,计算序列的偏自相关系数。对于特殊的输入序列,方程可能没有解,此时输出`NaN`。 + +**函数名:** PACF + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `lag`:最大滞后阶数。默认值为$\min(10\log_{10}n,n-1)$,$n$表示数据点个数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 指定滞后阶数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select pacf(s1, "lag"="5") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------+ +| Time|pacf(root.test.d1.s1, "lag"="5")| ++-----------------------------+--------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| -0.5744680851063829| +|2020-01-01T00:00:03.000+08:00| 0.3172297297297296| +|2020-01-01T00:00:04.000+08:00| -0.2977686586304181| +|2020-01-01T00:00:05.000+08:00| -2.0609033521065867| ++-----------------------------+--------------------------------+ +``` + +### 
Percentile + +#### 注册语句 + +```sql +create function percentile as 'org.apache.iotdb.library.dprofile.UDAFPercentile' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似分位数。 + +**函数名:** PERCENTILE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `rank`:所求分位数在所有数据中的排名百分比,取值范围为 (0,1],默认值为 0.5。如当设为 0.5时则计算中位数。 ++ `error`:近似分位数的基于排名的误差百分比,取值范围为 [0,1),默认值为0。如`rank`=0.5 且`error`=0.01,则计算出的分位数的真实排名百分比在 0.49~0.51之间。当`error`=0 时,计算结果为精确分位数。 + +**输出序列:** 输出单个序列,类型与输入序列相同。当`error`=0时,序列仅包含一个时间戳为分位数第一次出现的时间戳、值为分位数的数据点;否则,输出值的时间戳为0。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +用于查询的 SQL 语句: + +```sql +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|percentile(root.test.s0, "rank"="0.2", "error"="0.01")| ++-----------------------------+------------------------------------------------------+ +|2021-03-17T10:35:02.054+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +```输入序列: + +``` ++-----------------------------+-------------+ +| Time|root.test2.s1| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+-------------+ +............ 
+Total line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select percentile(s1, "rank"="0.2", "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|percentile(root.test2.s1, "rank"="0.2", "error"="0.01")| ++-----------------------------+-------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| -1.0| ++-----------------------------+-------------------------------------------------------+ +``` + + +### Quantile + +#### 注册语句 + +```sql +create function quantile as 'org.apache.iotdb.library.dprofile.UDAFQuantile' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的近似分位数。本函数基于KLL sketch算法实现。 + +**函数名:** QUANTILE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `rank`:所求分位数在所有数据中的排名比,取值范围为 (0,1],默认值为 0.5。如当设为 0.5时则计算近似中位数。 ++ `K`:允许维护的KLL sketch大小,最小值为100,默认值为800。如`rank`=0.5 且`K`=800,则计算出的分位数的真实排名比有至少99%的可能性在 0.49~0.51之间。 + +**输出序列:** 输出单个序列,类型与输入序列相同。输出值的时间戳为0。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + + +输入序列: + +``` ++-----------------------------+-------------+ +| Time|root.test1.s1| ++-----------------------------+-------------+ +|2021-03-17T10:32:17.054+08:00| 7| +|2021-03-17T10:32:18.054+08:00| 15| +|2021-03-17T10:32:19.054+08:00| 36| +|2021-03-17T10:32:20.054+08:00| 39| +|2021-03-17T10:32:21.054+08:00| 40| +|2021-03-17T10:32:22.054+08:00| 41| +|2021-03-17T10:32:23.054+08:00| 20| +|2021-03-17T10:32:24.054+08:00| 18| ++-----------------------------+-------------+ +............ +Total line number = 8 +``` + +用于查询的 SQL 语句: + +```sql +select quantile(s1, "rank"="0.2", "K"="800") from root.test1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------+ +| Time|quantile(root.test1.s1, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.000000000000001| ++-----------------------------+------------------------------------------------+ +``` + +### Period + +#### 注册语句 + +```sql +create function period as 'org.apache.iotdb.library.dprofile.UDAFPeriod' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的周期。 + +**函数名:** PERIOD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为 INT32,序列仅包含一个时间戳为 0、值为周期的数据点。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select period(s1) from root.test.d3 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +### QLB + +#### 注册语句 + +```sql +create function qlb as 'org.apache.iotdb.library.dprofile.UDTFQLB' +``` + +#### 函数简介 + +本函数对输入序列计算$Q_{LB} $统计量,并计算对应的p值。p值越小表明序列越有可能为非平稳序列。 + +**函数名:** QLB + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `lag`:计算时用到的最大延迟阶数,取值应为 1 至 n-2 之间的整数,n 为序列采样总数。默认取 n-2。 + +**输出序列:** 输出单个序列,类型为 
DOUBLE。该序列是$Q_{LB} $统计量对应的 p 值,时间标签代表偏移阶数。 + +**提示:** $Q_{LB} $统计量由自相关系数求得,如需得到统计量而非 p 值,可以使用 ACF 函数。 + +#### 使用示例 + +##### 使用默认参数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select QLB(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| +|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +### Resample + +#### 注册语句 + +```sql +create function re_sample as 'org.apache.iotdb.library.dprofile.UDTFResample' +``` + +#### 函数简介 + +本函数对输入序列按照指定的频率进行重采样,包括上采样和下采样。目前,本函数支持的上采样方法包括`NaN`填充法 (NaN)、前值填充法 (FFill)、后值填充法 (BFill) 以及线性插值法 (Linear);本函数支持的下采样方法为分组聚合,聚合方法包括最大值 (Max)、最小值 (Min)、首值 (First)、末值 (Last)、平均值 (Mean)和中位数 (Median)。 + +**函数名:** RESAMPLE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `every`:重采样频率,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。该参数不允许缺省。 ++ `interp`:上采样的插值方法,取值为 'NaN'、'FFill'、'BFill' 或 'Linear'。在缺省情况下,使用`NaN`填充法。 ++ `aggr`:下采样的聚合方法,取值为 'Max'、'Min'、'First'、'Last'、'Mean' 或 'Median'。在缺省情况下,使用平均数聚合。 ++ `start`:重采样的起始时间(包含),是一个格式为 'yyyy-MM-dd HH:mm:ss' 的时间字符串。在缺省情况下,使用第一个有效数据点的时间戳。 ++ `end`:重采样的结束时间(不包含),是一个格式为 'yyyy-MM-dd HH:mm:ss' 的时间字符串。在缺省情况下,使用最后一个有效数据点的时间戳。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。该序列按照重采样频率严格等间隔分布。 + +**提示:** 数据中的`NaN`将会被忽略。 + +#### 使用示例 + +##### 上采样 + +当重采样频率高于数据原始频率时,将会进行上采样。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| 
+|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +##### 下采样 + +当重采样频率低于数据原始频率时,将会进行下采样。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + +###### 指定重采样时间段 + +可以使用`start`和`end`两个参数指定重采样的时间段,超出实际时间范围的部分会被插值填补。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +### Sample + +#### 注册语句 + +```sql +create function sample as 'org.apache.iotdb.library.dprofile.UDTFSample' +``` + +#### 函数简介 + +本函数对输入序列进行采样,即从输入序列中选取指定数量的数据点并输出。目前,本函数支持三种采样方法:**蓄水池采样法 (reservoir sampling)** 对数据进行随机采样,所有数据点被采样的概率相同;**等距采样法 (isometric sampling)** 按照相等的索引间隔对数据进行采样,**最大三角采样法 (triangle sampling)** 对所有数据会按采样率分桶,每个桶内会计算数据点间三角形面积,并保留面积最大的点,该算法通常用于数据的可视化展示中,采用过程可以保证一些关键的突变点在采用中得到保留,更多抽样算法细节可以阅读论文 [here](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf)。 + +**函数名:** SAMPLE + +**输入序列:** 仅支持单个输入序列,类型可以是任意的。 + +**参数:** + ++ `method`:采样方法,取值为 'reservoir','isometric' 或 'triangle' 。在缺省情况下,采用蓄水池采样法。 ++ `k`:采样数,它是一个正整数,在缺省情况下为 1。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列的长度为采样数,序列中的每一个数据点都来自于输入序列。 + +**提示:** 
如果采样数大于序列长度,那么输入序列中所有的数据点都会被输出。 + +#### 使用示例 + + +##### 蓄水池采样 + +当`method`参数为 'reservoir' 或缺省时,采用蓄水池采样法对输入序列进行采样。由于该采样方法具有随机性,下面展示的输出序列只是一种可能的结果。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + + +##### 等距采样 + +当`method`参数为 'isometric' 时,采用等距采样法对输入序列进行采样。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +### Segment + +#### 注册语句 + +```sql +create function segment as 'org.apache.iotdb.library.dprofile.UDTFSegment' +``` + +#### 函数简介 + +本函数按照数据的线性变化趋势将数据划分为多个子序列,返回分段直线拟合后的子序列首值或所有拟合值。 + +**函数名:** SEGMENT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `output`:"all" 输出所有拟合值;"first" 输出子序列起点拟合值。默认为 "first"。 + ++ `error`:判定存在线性趋势的误差允许阈值。误差的定义为子序列进行线性拟合的误差的绝对值的均值。默认为 0.1. 
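+
+按照上述定义,设某一子序列为 $y_1,\dots,y_m$,其线性拟合值为 $\hat{y}_1,\dots,\hat{y}_m$,则该子序列的拟合误差即 $\frac{1}{m}\sum_{i=1}^{m}\left|y_i-\hat{y}_i\right|$;当该误差不超过 `error` 时,认为该子序列具有同一线性趋势(此式仅是对上述文字定义的示意性改写)。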
+ +**输出序列:** 输出单个序列,类型为 DOUBLE。 + +**提示:** 函数默认所有数据等时间间隔分布。函数读取所有数据,若原始数据过多,请先进行降采样处理。拟合采用自底向上方法,子序列的尾值可能会被认作子序列首值输出。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select segment(s1,"error"="0.1") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +### Skew + +#### 注册语句 + +```sql +create function skew as 'org.apache.iotdb.library.dprofile.UDAFSkew' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的总体偏度 + +**函数名:** SKEW + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为总体偏度的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 
10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select skew(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +### Spline + +#### 注册语句 + +```sql +create function spline as 'org.apache.iotdb.library.dprofile.UDTFSpline' +``` + +#### 函数简介 + +本函数提供对原始序列进行三次样条曲线拟合后的插值重采样。 + +**函数名:** SPLINE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `points`:重采样个数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +**提示**:输出序列保留输入序列的首尾值,等时间间隔采样。仅当输入点个数不少于 4 个时才计算插值。 + +#### 使用示例 + +##### 指定插值个数 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select spline(s1, "points"="151") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| 
+|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| +|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 
2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +### Spread + +#### 注册语句 + +```sql +create function spread as 'org.apache.iotdb.library.dprofile.UDAFSpread' +``` + +#### 函数简介 + +本函数用于计算时间序列的极差,即最大值减去最小值的结果。 + +**函数名:** SPREAD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型与输入相同,序列仅包含一个时间戳为 0 、值为极差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 
+|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + + + +### ZScore + +#### 注册语句 + +```sql +create function zscore as 'org.apache.iotdb.library.dprofile.UDTFZScore' +``` + +#### 函数简介 + +本函数将输入序列使用z-score方法进行归一化。 + +**函数名:** ZSCORE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `compute`:若设置为 "batch",则将数据全部读入后转换;若设置为 "stream",则需用户提供均值及方差进行流式计算转换。默认为 "batch"。 ++ `avg`:使用流式计算时的均值。 ++ `sd`:使用流式计算时的标准差。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select zscore(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` + + + +## 异常检测 + +### IQR + +#### 注册语句 + +```sql +create function iqr as 'org.apache.iotdb.library.anomaly.UDTFIQR' +``` + +#### 函数简介 + +本函数用于检验超出上下四分位数1.5倍IQR的数据分布异常。 + +**函数名:** IQR + 
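+基于该检验规则,记下四分位数为 $Q_1$、上四分位数为 $Q_3$,令 $IQR=Q_3-Q_1$,则取值小于 $Q_1-1.5\,IQR$ 或大于 $Q_3+1.5\,IQR$ 的数据点会被判为分布异常(此为对"超出上下四分位数 1.5 倍 IQR"这一描述的示意性展开)。
+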
+**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:若设置为 "batch",则将数据全部读入后检测;若设置为 "stream",则需用户提供上下四分位数进行流式检测。默认为 "batch"。 ++ `q1`:使用流式计算时的下四分位数。 ++ `q3`:使用流式计算时的上四分位数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +**说明**:$IQR=Q_3-Q_1$ + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select iqr(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+-----------------+ +| Time|iqr(root.test.s1)| ++-----------------------------+-----------------+ +|1970-01-01T08:00:01.700+08:00| 10.0| ++-----------------------------+-----------------+ +``` + +### KSigma + +#### 注册语句 + +```sql +create function ksigma as 'org.apache.iotdb.library.anomaly.UDTFKSigma' +``` + +#### 函数简介 + +本函数利用动态 K-Sigma 算法进行异常检测。在一个窗口内,与平均值的差距超过k倍标准差的数据将被视作异常并输出。 + +**函数名:** KSIGMA + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `k`:在动态 K-Sigma 算法中,分布异常的标准差倍数阈值,默认值为 3。 ++ `window`:动态 K-Sigma 算法的滑动窗口大小,默认值为 10000。 + + +**输出序列:** 输出单个序列,类型与输入序列相同。 + +**提示:** k 应大于 0,否则将不做输出。 + +#### 使用示例 + +##### 指定k + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:04.000+08:00| 100.0| +|2020-01-01T00:00:06.000+08:00| 150.0| +|2020-01-01T00:00:08.000+08:00| 200.0| +|2020-01-01T00:00:10.000+08:00| 200.0| +|2020-01-01T00:00:14.000+08:00| 200.0| +|2020-01-01T00:00:15.000+08:00| 200.0| +|2020-01-01T00:00:16.000+08:00| 200.0| +|2020-01-01T00:00:18.000+08:00| 200.0| +|2020-01-01T00:00:20.000+08:00| 150.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +|Time |ksigma(root.test.d1.s1,"k"="3.0")| ++-----------------------------+---------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +### LOF + +#### 注册语句 + +```sql +create function LOF as 'org.apache.iotdb.library.anomaly.UDTFLOF' +``` + +#### 函数简介 + +本函数使用局部离群点检测方法用于查找序列的密度异常。将根据提供的第k距离数及局部离群点因子(lof)阈值,判断输入数据是否为离群点,即异常,并输出各点的 LOF 值。 + +**函数名:** LOF + +**输入序列:** 多个输入序列,类型为 
INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:使用的检测方法。默认为 default,以高维数据计算。设置为 series,将一维时间序列转换为高维数据计算。 ++ `k`:使用第k距离计算局部离群点因子.默认为 3。 ++ `window`:每次读取数据的窗口长度。默认为 10000. ++ `windowsize`:使用series方法时,转化高维数据的维数,即单个窗口的大小。默认为 5。 + +**输出序列:** 输出单时间序列,类型为DOUBLE。 + +**提示:** 不完整的数据行会被忽略,不参与计算,也不标记为离群点。 + + +#### 使用示例 + +##### 默认参数 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +##### 诊断一维时间序列 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| 
+|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 3.77777777777778| ++-----------------------------+--------------------+ +``` + +### MissDetect + +#### 注册语句 + +```sql +create function missdetect as 'org.apache.iotdb.library.anomaly.UDTFMissDetect' +``` + +#### 函数简介 + +本函数用于检测数据中的缺失异常。在一些数据中,缺失数据会被线性插值填补,在数据中出现完美的线性片段,且这些片段往往长度较大。本函数通过在数据中发现这些完美线性片段来检测缺失异常。 + +**函数名:** MISSDETECT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `minlen`:被标记为异常的完美线性片段的最小长度,是一个大于等于 10 的整数,默认值为 10。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN,即该数据点是否为缺失异常。 + +**提示:** 数据中的`NaN`将会被忽略。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 0.0| +|2021-07-01T12:00:01.000+08:00| 1.0| +|2021-07-01T12:00:02.000+08:00| 0.0| +|2021-07-01T12:00:03.000+08:00| 1.0| +|2021-07-01T12:00:04.000+08:00| 0.0| +|2021-07-01T12:00:05.000+08:00| 0.0| +|2021-07-01T12:00:06.000+08:00| 0.0| +|2021-07-01T12:00:07.000+08:00| 0.0| +|2021-07-01T12:00:08.000+08:00| 0.0| +|2021-07-01T12:00:09.000+08:00| 0.0| +|2021-07-01T12:00:10.000+08:00| 0.0| +|2021-07-01T12:00:11.000+08:00| 0.0| +|2021-07-01T12:00:12.000+08:00| 0.0| +|2021-07-01T12:00:13.000+08:00| 0.0| +|2021-07-01T12:00:14.000+08:00| 0.0| +|2021-07-01T12:00:15.000+08:00| 0.0| +|2021-07-01T12:00:16.000+08:00| 1.0| +|2021-07-01T12:00:17.000+08:00| 0.0| +|2021-07-01T12:00:18.000+08:00| 1.0| +|2021-07-01T12:00:19.000+08:00| 0.0| +|2021-07-01T12:00:20.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select missdetect(s2,'minlen'='10') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+ +| Time|missdetect(root.test.d2.s2, "minlen"="10")| ++-----------------------------+------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| false| +|2021-07-01T12:00:01.000+08:00| false| +|2021-07-01T12:00:02.000+08:00| false| +|2021-07-01T12:00:03.000+08:00| false| +|2021-07-01T12:00:04.000+08:00| true| +|2021-07-01T12:00:05.000+08:00| true| +|2021-07-01T12:00:06.000+08:00| true| +|2021-07-01T12:00:07.000+08:00| true| +|2021-07-01T12:00:08.000+08:00| true| +|2021-07-01T12:00:09.000+08:00| true| +|2021-07-01T12:00:10.000+08:00| true| +|2021-07-01T12:00:11.000+08:00| true| +|2021-07-01T12:00:12.000+08:00| true| +|2021-07-01T12:00:13.000+08:00| true| +|2021-07-01T12:00:14.000+08:00| true| +|2021-07-01T12:00:15.000+08:00| true| +|2021-07-01T12:00:16.000+08:00| false| +|2021-07-01T12:00:17.000+08:00| false| +|2021-07-01T12:00:18.000+08:00| false| +|2021-07-01T12:00:19.000+08:00| false| +|2021-07-01T12:00:20.000+08:00| false| ++-----------------------------+------------------------------------------+ +``` + +### Range + +#### 注册语句 + +```sql +create function range as 'org.apache.iotdb.library.anomaly.UDTFRange' +``` + +#### 函数简介 + +本函数用于查找时间序列的范围异常。将根据提供的上界与下界,判断输入数据是否越界,即异常,并输出所有异常点为新的时间序列。 + +**函数名:** RANGE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `lower_bound`:范围异常检测的下界。 ++ `upper_bound`:范围异常检测的上界。 + +**输出序列:** 输出单个序列,类型与输入序列相同。 + +**提示:** 应满足`upper_bound`大于`lower_bound`,否则将不做输出。 + + +#### 使用示例 + +##### 指定上界与下界 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| 
++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------------------+ +|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:28.000+08:00| 126.0| ++-----------------------------+------------------------------------------------------------------+ +``` + +### TwoSidedFilter + +#### 注册语句 + +```sql +create function twosidedfilter as 'org.apache.iotdb.library.anomaly.UDTFTwoSidedFilter' +``` + +#### 函数简介 + +本函数基于双边窗口检测法对输入序列中的异常点进行过滤。 + +**函数名:** TWOSIDEDFILTER + +**输出序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型与输入相同,是输入序列去除异常点后的结果。 + +**参数:** + +- `len`:双边窗口检测法中的窗口大小,取值范围为正整数,默认值为 5.如当`len`=3 时,算法向前、向后各取长度为3的窗口,在窗口中计算异常度。 +- `threshold`:异常度的阈值,取值范围为(0,1),默认值为 0.3。阈值越高,函数对于异常度的判定标准越严格。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:00:37.000+08:00| 1484.0| +|1970-01-01T08:00:38.000+08:00| 1055.0| +|1970-01-01T08:00:39.000+08:00| 1050.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test +``` + +输出序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| 
+|1970-01-01T08:00:05.000+08:00|      1898.0|
+|1970-01-01T08:00:06.000+08:00|      2014.0|
+|1970-01-01T08:00:07.000+08:00|      2052.0|
+|1970-01-01T08:00:08.000+08:00|      1935.0|
+|1970-01-01T08:00:09.000+08:00|      1901.0|
+|1970-01-01T08:00:10.000+08:00|      1972.0|
+|1970-01-01T08:00:11.000+08:00|      1969.0|
+|1970-01-01T08:00:12.000+08:00|      1984.0|
+|1970-01-01T08:00:13.000+08:00|      2018.0|
+|1970-01-01T08:01:05.000+08:00|      1023.0|
+|1970-01-01T08:01:06.000+08:00|      1056.0|
+|1970-01-01T08:01:07.000+08:00|       978.0|
+|1970-01-01T08:01:08.000+08:00|      1050.0|
+|1970-01-01T08:01:09.000+08:00|      1123.0|
+|1970-01-01T08:01:10.000+08:00|      1150.0|
+|1970-01-01T08:01:11.000+08:00|      1034.0|
+|1970-01-01T08:01:12.000+08:00|       950.0|
+|1970-01-01T08:01:13.000+08:00|      1059.0|
++-----------------------------+------------+
+```
+
+### Outlier
+
+#### 注册语句
+
+```sql
+create function outlier as 'org.apache.iotdb.library.anomaly.UDTFOutlier'
+```
+
+#### 函数简介
+
+本函数用于检测基于距离的异常点。在当前窗口中,如果一个点在距离阈值范围内的邻居数量(包括它自己)少于密度阈值,则该点是异常点。
+
+**函数名:** OUTLIER
+
+**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。
+
+**参数:**
+
++ `r`:基于距离异常检测中的距离阈值。
++ `k`:基于距离异常检测中的密度阈值。
++ `w`:用于指定滑动窗口的大小。
++ `s`:用于指定滑动窗口的步长。
+
+**输出序列**:输出单个序列,类型与输入序列相同。
+
+#### 使用示例
+
+##### 指定查询参数
+
+输入序列:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s1|
++-----------------------------+------------+
+|2020-01-04T23:59:55.000+08:00|        56.0|
+|2020-01-04T23:59:56.000+08:00|        55.1|
+|2020-01-04T23:59:57.000+08:00|        54.2|
+|2020-01-04T23:59:58.000+08:00|        56.3|
+|2020-01-04T23:59:59.000+08:00|        59.0|
+|2020-01-05T00:00:00.000+08:00|        60.0|
+|2020-01-05T00:00:01.000+08:00|        60.5|
+|2020-01-05T00:00:02.000+08:00|        64.5|
+|2020-01-05T00:00:03.000+08:00|        69.0|
+|2020-01-05T00:00:04.000+08:00|        64.2|
+|2020-01-05T00:00:05.000+08:00|        62.3|
+|2020-01-05T00:00:06.000+08:00|        58.0|
+|2020-01-05T00:00:07.000+08:00|        58.9|
+|2020-01-05T00:00:08.000+08:00|        52.0|
+|2020-01-05T00:00:09.000+08:00|        62.3|
+|2020-01-05T00:00:10.000+08:00|        61.0|
+|2020-01-05T00:00:11.000+08:00|        64.2|
+|2020-01-05T00:00:12.000+08:00|        61.8|
+|2020-01-05T00:00:13.000+08:00|        64.0|
+|2020-01-05T00:00:14.000+08:00|        63.0|
++-----------------------------+------------+
+```
+
+用于查询的 SQL 语句:
+
+```sql
+select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test
+```
+
+输出序列:
+
+```
++-----------------------------+--------------------------------------------------------+
+|                         Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")|
++-----------------------------+--------------------------------------------------------+
+|2020-01-05T00:00:03.000+08:00|                                                    69.0|
+|2020-01-05T00:00:08.000+08:00|                                                    52.0|
++-----------------------------+--------------------------------------------------------+
+```
+
+### MasterTrain
+
+#### 函数简介
+
+本函数基于主数据训练VAR预测模型。将根据提供的主数据判断时间序列中的数据点是否为错误值,并由连续p+1个非错误值作为训练样本训练VAR模型,输出训练后的模型参数。
+
+**函数名:** MasterTrain
+
+**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。
+
+**参数:**
+
++ `p`:模型阶数。
++ `eta`:错误值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。
+
+**输出序列:** 输出单个序列,类型为DOUBLE。
+
+**安装方式:**
+
+- 从IoTDB代码仓库下载`research/master-detector`分支代码到本地
+- 在根目录运行 `mvn spotless:apply`
+- 在根目录运行 `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies` 编译项目
+- 将 `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar`复制到IoTDB服务器的`./ext/udf/` 路径下。
+- 启动 IoTDB服务器,在客户端中执行 `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'`。
+
+#### 使用示例
+
+输入序列:
+
+```
++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ + +``` + +### MasterDetect + +#### 函数简介 + 
+本函数基于主数据检测并修复时间序列中的错误值。将根据提供的主数据判断时间序列中的数据点是否为错误值,并由MasterTrain训练的模型进行时间序列预测,错误值将由预测值及主数据共同修复。 + +**函数名:** MasterDetect + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `p`:模型阶数。 ++ `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 ++ `eta`:错误值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 ++ `beta`:异常值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 ++ `output_type`:输出结果类型,可选'repair'或'anomaly',即输出修复结果或异常检测结果,在缺省情况下默认为'repair'。 ++ `output_column`:输出列的序号,默认为1,即输出第一列的修复结果。 + +**安装方式:** + +- 从IoTDB代码仓库下载`research/master-detector`分支代码到本地 +- 在根目录运行 `mvn spotless:apply` +- 在根目录运行 `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies` 编译项目 +- 将 `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar`复制到IoTDB服务器的`./ext/udf/` 路径下。 +- 启动 IoTDB服务器,在客户端中执行 `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'`。 + +**输出序列:** 输出单个序列,类型与输入数据中对应列的类型相同,序列为输入列修复后的结果。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 
39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +##### 修复 + +用于查询的 SQL 语句: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +##### 异常检测 + +用于查询的 SQL 语句: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| false| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| +|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| 
++-----------------------------+---------------------------------------------------------------------------------------+ +``` + + + +## 频域分析 + +### Conv + +#### 注册语句 + +```sql +create function conv as 'org.apache.iotdb.library.frequency.UDTFConv' +``` + +#### 函数简介 + +本函数对两个输入序列进行卷积,即多项式乘法。 + + +**函数名:** CONV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为DOUBLE,它是两个序列卷积的结果。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +### Deconv + +#### 注册语句 + +```sql +create function deconv as 'org.apache.iotdb.library.frequency.UDTFDeconv' +``` + +#### 函数简介 + +本函数对两个输入序列进行去卷积,即多项式除法运算。 + +**函数名:** DECONV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `result`:去卷积的结果,取值为'quotient'或'remainder',分别对应于去卷积的商和余数。在缺省情况下,输出去卷积的商。 + +**输出序列:** 输出单个序列,类型为DOUBLE。它是将第二个序列从第一个序列中去卷积(第一个序列除以第二个序列)的结果。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +##### 计算去卷积的商 + +当`result`参数缺省或为'quotient'时,本函数计算去卷积的商。 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s3|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| null| +|1970-01-01T08:00:00.003+08:00| 2.0| null| ++-----------------------------+---------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select deconv(s3,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2)| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +##### 计算去卷积的余数 + +当`result`参数为'remainder'时,本函数计算去卷积的余数。输入序列同上,用于查询的SQL语句如下: + +```sql +select deconv(s3,s2,'result'='remainder') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| ++-----------------------------+--------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 0.0| ++-----------------------------+--------------------------------------------------------------+ +``` + +### DWT + +#### 注册语句 + +```sql +create function dwt as 
'org.apache.iotdb.library.frequency.UDTFDWT' +``` + +#### 函数简介 + +本函数对输入序列进行一维离散小波变换。 + +**函数名:** DWT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:小波滤波的类型,提供'Haar', 'DB4', 'DB6', 'DB8',其中DB指代Daubechies。若不设置该参数,则用户需提供小波滤波的系数。不区分大小写。 ++ `coef`:小波滤波的系数。若提供该参数,请使用英文逗号','分割各项,不添加空格或其它符号。 ++ `layer`:进行变换的次数,最终输出的向量个数等同于$layer+1$.默认取1。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度与输入相等。 + +**提示:** 输入序列长度必须为2的整数次幂。 + +#### 使用示例 + +##### Haar变换 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| +|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| ++-----------------------------+-------------------------------------+ +``` + +### FFT + +#### 注册语句 + +```sql +create function fft as 'org.apache.iotdb.library.frequency.UDTFFFT' +``` + +#### 函数简介 + +本函数对输入序列进行快速傅里叶变换。 + +**函数名:** FFT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:傅里叶变换的类型,取值为'uniform'或'nonuniform',缺省情况下为'uniform'。当取值为'uniform'时,时间戳将被忽略,所有数据点都将被视作等距的,并应用等距快速傅里叶算法;当取值为'nonuniform'时,将根据时间戳应用非等距快速傅里叶算法(未实现)。 ++ `result`:傅里叶变换的结果,取值为'real'、'imag'、'abs'或'angle',分别对应于变换结果的实部、虚部、模和幅角。在缺省情况下,输出变换的模。 ++ `compress`:压缩参数,取值范围(0,1],是有损压缩时保留的能量比例。在缺省情况下,不进行压缩。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度与输入相等。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +##### 等距傅里叶变换 + +当`type`参数缺省或为'uniform'时,本函数进行等距傅里叶变换。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| 
+|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select fft(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 19.999999960195904| +|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| +|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| +|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| ++-----------------------------+----------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此在输出序列中$k=4$和$k=5$处有尖峰。 + +##### 等距傅里叶变换并压缩 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| fft(root.test.d1.s1,| fft(root.test.d1.s1,| +| | "result"="real",| "result"="imag",| +| | "compress"="0.99")| "compress"="0.99")| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + +注:基于傅里叶变换结果的共轭性质,压缩结果只保留前一半;根据给定的压缩参数,从低频到高频保留数据点,直到保留的能量比例超过该值;保留最后一个数据点以表示序列长度。 + +### HighPass + +#### 注册语句 + +```sql +create function highpass as 'org.apache.iotdb.library.frequency.UDTFHighPass' +``` + +#### 函数简介 + +本函数对输入序列进行高通滤波,提取高于截止频率的分量。输入序列的时间戳将被忽略,所有数据点都将被视作等距的。 + +**函数名:** HIGHPASS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / 
INT64 / FLOAT / DOUBLE + +**参数:** + ++ `wpass`:归一化后的截止频率,取值为(0,1),不可缺省。 + +**输出序列:** 输出单个序列,类型为DOUBLE,它是滤波后的序列,长度与时间戳均与输入一致。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此高通滤波之后的输出序列服从$y=sin(2\pi t/4)$。 + +### IFFT + +#### 注册语句 + +```sql +create function ifft as 'org.apache.iotdb.library.frequency.UDTFIFFT' +``` + +#### 函数简介 + +本函数将输入的两个序列作为实部和虚部视作一个复数,进行逆快速傅里叶变换,并输出结果的实部。输入数据的格式参见`FFT`函数的输出,并支持以`FFT`函数压缩后的输出作为本函数的输入。 + +**函数名:** IFFT + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `start`:输出序列的起始时刻,是一个格式为'yyyy-MM-dd HH:mm:ss'的时间字符串。在缺省情况下,为'1970-01-01 08:00:00'。 ++ `interval`:输出序列的时间间隔,是一个有单位的正数。目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,为1s。 + + +**输出序列:** 输出单个序列,类型为DOUBLE。该序列是一个等距时间序列,它的值是将两个输入序列依次作为实部和虚部进行逆快速傅里叶变换的结果。 + +**提示:** 如果某行数据中包含空值或`NaN`,该行数据将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| 
++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +用于查询的SQL语句: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| ++-----------------------------+-------------------------------------------------------+ +``` + +### LowPass + +#### 注册语句 + +```sql +create function lowpass as 'org.apache.iotdb.library.frequency.UDTFLowPass' +``` + +#### 函数简介 + +本函数对输入序列进行低通滤波,提取低于截止频率的分量。输入序列的时间戳将被忽略,所有数据点都将被视作等距的。 + +**函数名:** LOWPASS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `wpass`:归一化后的截止频率,取值为(0,1),不可缺省。 + +**输出序列:** 输出单个序列,类型为DOUBLE,它是滤波后的序列,长度与时间戳均与输入一致。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| 
+|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` +## Envelope + +### 函数简介 + +本函数通过输入一维浮点数数组和用户指定的调制频率,实现对信号的解调和包络提取。解调的目标是从复杂的信号中提取感兴趣的部分,使其更易理解。比如通过解调可以找到信号的包络,即振幅的变化趋势。 + +**函数名:** Envelope + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `frequency`:频率(选填,正数。不填此参数,系统会基于序列对应时间的时间间隔来推断频率)。 ++ `amplification`: 扩增倍数(选填,正整数。输出Time列的结果为正整数的集合,不会输出小数。当频率小1时,可通过此参数对频率进行扩增以展示正常的结果)。 + +**输出序列:** ++ `Time`: 该列返回的值的含义是频率而并非时间,如果输出的格式为时间格式(如:1970-01-01T08:00:19.000+08:00),请将其转为时间戳值。 + ++ `Envelope(Path, 'frequency'='{frequency}')`:输出单个序列,类型为DOUBLE,它是包络分析之后的结果。 + +**提示:** 当解调的原始序列的值不连续时,本函数会视为连续处理,建议被分析的时间序列是一段值完整的时间序列。同时建议指定开始时间与结束时间。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:01.000+08:00| 1.0 | +|1970-01-01T08:00:02.000+08:00| 2.0 | +|1970-01-01T08:00:03.000+08:00| 3.0 | +|1970-01-01T08:00:04.000+08:00| 4.0 | +|1970-01-01T08:00:05.000+08:00| 5.0 | +|1970-01-01T08:00:06.000+08:00| 6.0 | +|1970-01-01T08:00:07.000+08:00| 7.0 | +|1970-01-01T08:00:08.000+08:00| 8.0 | +|1970-01-01T08:00:09.000+08:00| 9.0 | +|1970-01-01T08:00:10.000+08:00| 10.0 | ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: +```sql +set time_display_type=long; +select envelope(s1),envelope(s1,'frequency'='1000'),envelope(s1,'amplification'='10') from root.test.d1; +``` +输出序列: + +``` ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +|Time|envelope(root.test.d1.s1)|envelope(root.test.d1.s1, "frequency"="1000")|envelope(root.test.d1.s1, "amplification"="10")| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +| 0| 6.284350808484124| 6.284350808484124| 6.284350808484124| +| 100| 1.5581923657404393| 1.5581923657404393| null| +| 200| 0.8503211038340728| 
0.8503211038340728| null| +| 300| 0.512808785945551| 0.512808785945551| null| +| 400| 0.26361156774506744| 0.26361156774506744| null| +|1000| null| null| 1.5581923657404393| +|2000| null| null| 0.8503211038340728| +|3000| null| null| 0.512808785945551| +|4000| null| null| 0.26361156774506744| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ + +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此低通滤波之后的输出序列服从$y=2sin(2\pi t/5)$。 + + + +## 数据匹配 + +### Cov + +#### 注册语句 + +```sql +create function cov as 'org.apache.iotdb.library.dmatch.UDAFCov' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的总体协方差。 + +**函数名:** COV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为总体协方差的数据点。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出`NaN`。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +### Dtw + +#### 注册语句 + +```sql +create function dtw as 'org.apache.iotdb.library.dmatch.UDAFDtw' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的 DTW 距离。 + +**函数名:** DTW + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为两个时间序列的 DTW 距离值。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出 0。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 
2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +### Pearson + +#### 注册语句 + +```sql +create function pearson as 'org.apache.iotdb.library.dmatch.UDAFPearson' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的皮尔森相关系数。 + +**函数名:** PEARSON + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为皮尔森相关系数的数据点。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出`NaN`。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +### PtnSym + +#### 注册语句 + +```sql +create function ptnsym as 'org.apache.iotdb.library.dmatch.UDTFPtnSym' +``` + +#### 函数简介 + +本函数用于寻找序列中所有对称度小于阈值的对称子序列。对称度通过 DTW 计算,值越小代表序列对称性越高。 + +**函数名:** PTNSYM + +**输入序列:** 仅支持一个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:对称子序列的长度,是一个正整数,默认值为 10。 ++ `threshold`:对称度阈值,是一个非负数,只有对称度小于等于该值的对称子序列才会被输出。在缺省情况下,所有的子序列都会被输出。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中的每一个数据点对应于一个对称子序列,时间戳为子序列的起始时刻,值为对称度。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| +|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| 
++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +### XCorr + +#### 注册语句 + +```sql +create function xcorr as 'org.apache.iotdb.library.dmatch.UDTFXCorr' +``` + +#### 函数简介 + +本函数用于计算两条时间序列的互相关函数值, +对离散序列而言,互相关函数可以表示为 +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +常用于表征两条序列在不同对齐条件下的相似度。 + +**函数名:** XCORR + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点, +其中正中心的值为两条序列按照预先对齐的结果计算的互相关系数(即等于以上公式的$CR(0)$), +前半部分的值表示将后一条输入序列向前平移时计算的互相关系数, +直至两条序列没有重合的数据点(不包含完全分离时的结果$CR(-N)=0.0$), +后半部分类似。 +用公式可表示为(所有序列的索引从1开始计数): +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**提示:** + ++ 两条序列中的`null` 和`NaN` 值会被忽略,在计算中表现为 0。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` + + + +## 数据修复 + +### TimestampRepair + +#### 注册语句 + +```sql +create function timestamprepair as 'org.apache.iotdb.library.drepair.UDTFTimestampRepair' +``` + +### 函数简介 + +本函数用于时间戳修复。根据给定的标准时间间隔,采用最小化修复代价的方法,通过对数据时间戳的微调,将原本时间戳间隔不稳定的数据修复为严格等间隔的数据。在未给定标准时间间隔的情况下,本函数将使用时间间隔的中位数 (median)、众数 (mode) 或聚类中心 (cluster) 来推算标准时间间隔。 + + +**函数名:** TIMESTAMPREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `interval`: 标准时间间隔(单位是毫秒),是一个正整数。在缺省情况下,将根据指定的方法推算。 ++ `method`:推算标准时间间隔的方法,取值为 'median', 'mode' 或 'cluster',仅在`interval`缺省时有效。在缺省情况下,将使用中位数方法进行推算。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +### 使用示例 + +#### 指定标准时间间隔 + +在给定`interval`参数的情况下,本函数将按照指定的标准时间间隔进行修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:19.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 
5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:01.000+08:00| 7.0| +|2021-07-01T12:01:11.000+08:00| 8.0| +|2021-07-01T12:01:21.000+08:00| 9.0| +|2021-07-01T12:01:31.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timestamprepair(s1,'interval'='10000') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------+ +| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| ++-----------------------------+----------------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+----------------------------------------------------+ +``` + +#### 自动推算标准时间间隔 + +如果`interval`参数没有给定,本函数将按照推算的标准时间间隔进行修复。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select timestamprepair(s1) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------+ +| Time|timestamprepair(root.test.d2.s1)| ++-----------------------------+--------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +### ValueFill + +#### 注册语句 + +```sql +create function valuefill as 'org.apache.iotdb.library.drepair.UDTFValueFill' +``` + +#### 函数简介 + +**函数名:** ValueFill + +**输入序列:** 单列时序数据,类型为INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, 默认为 "linear"。其中,“mean” 指使用均值填补的方法; “previous" 指使用前值填补方法;“linear" 指使用线性插值填补方法;“likelihood” 为基于速度的正态分布的极大似然估计方法;“AR” 指自回归的填补方法;“MA” 指滑动平均的填补方法;"SCREEN" 指约束填补方法;缺省情况下使用 “linear”。 + +**输出序列:** 填补后的单维序列。 + +**备注:** AR 模型采用 AR(1),时序列需满足自相关条件,否则将输出单个数据点 (0, 0.0). 
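
作为对上述备注的补充,这里给出 AR(1)(一阶自回归)模型的一般形式。以下仅为通用定义,用于帮助理解备注中"自相关条件"的含义,并非对本函数内部实现细节的精确描述:

$$x_t = \varphi_1 x_{t-1} + \epsilon_t$$

其中 $\varphi_1$ 为待估计的一阶自回归系数,$\epsilon_t$ 为白噪声项。只有当序列具有足够显著的一阶自相关性、$\varphi_1$ 能够被可靠估计时,基于 AR 方法的填补结果才有意义;否则如备注所述,函数将输出单个数据点 (0, 0.0)。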
+ +#### 使用示例 +##### 使用 linear 方法进行填补 + +当`method`缺省或取值为 'linear' 时,本函数将使用线性插值方法进行填补。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select valuefill(s1) from root.test.d2 +``` + +输出序列: + + + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +##### 使用 previous 方法进行填补 + +当`method`取值为 'previous' 时,本函数将使前值填补方法进行数值填补。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +### ValueRepair + +#### 注册语句 + +```sql +create function valuerepair as 'org.apache.iotdb.library.drepair.UDTFValueRepair' +``` + +#### 函数简介 + +本函数用于对时间序列的数值进行修复。目前,本函数支持两种修复方法:**Screen** 是一种基于速度阈值的方法,在最小改动的前提下使得所有的速度符合阈值要求;**LsGreedy** 是一种基于速度变化似然的方法,将速度变化建模为高斯分布,并采用贪心算法极大化似然函数。 + +**函数名:** VALUEREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:修复时采用的方法,取值为 'Screen' 或 'LsGreedy'. 
在缺省情况下,使用 Screen 方法进行修复。 ++ `minSpeed`:该参数仅在使用 Screen 方法时有效。当速度小于该值时会被视作数值异常点加以修复。在缺省情况下为中位数减去三倍绝对中位差。 ++ `maxSpeed`:该参数仅在使用 Screen 方法时有效。当速度大于该值时会被视作数值异常点加以修复。在缺省情况下为中位数加上三倍绝对中位差。 ++ `center`:该参数仅在使用 LsGreedy 方法时有效。对速度变化分布建立的高斯模型的中心。在缺省情况下为 0。 ++ `sigma` :该参数仅在使用 LsGreedy 方法时有效。对速度变化分布建立的高斯模型的标准差。在缺省情况下为绝对中位差。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +**提示:** 输入序列中的`NaN`在修复之前会先进行线性插值填补。 + +#### 使用示例 + +##### 使用 Screen 方法进行修复 + +当`method`缺省或取值为 'Screen' 时,本函数将使用 Screen 方法进行数值修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +##### 使用 LsGreedy 方法进行修复 + +当`method`取值为 'LsGreedy' 时,本函数将使用 LsGreedy 方法进行数值修复。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select valuerepair(s1,'method'='LsGreedy') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")| ++-----------------------------+-------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------------+ +``` + +### MasterRepair + +#### 函数简介 + +本函数实现基于主数据的时间序列数据修复。 + +**函数名:**MasterRepair + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `omega`:算法窗口大小,非负整数(单位为毫秒), 在缺省情况下,算法根据不同时间差下的两个元组距离自动估计该参数。 +- `eta`:算法距离阈值,正数, 
在缺省情况下,算法根据窗口中元组的距离分布自动估计该参数。 +- `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 +- `output_column`:输出列的序号,默认输出第一列的修复结果。 + +**输出序列:**输出单个序列,类型与输入数据中对应列的类型相同,序列为输入列修复后的结果。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +输出序列: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +### SeasonalRepair + +#### 函数简介 +本函数用于对周期性时间序列的数值进行基于分解的修复。目前,本函数支持两种方法:**Classical**使用经典分解方法得到的残差项检测数值的异常波动,并使用滑动平均修复序列;**Improved**使用改进的分解方法得到的残差项检测数值的异常波动,并使用滑动中值修复序列。 + +**函数名:** SEASONALREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:修复时采用的分解方法,取值为'Classical'或'Improved'。在缺省情况下,使用经典分解方法进行修复。 ++ `period`:序列的周期。 ++ `k`:残差项的范围阈值,用来限制残差项偏离中心的程度。在缺省情况下为9。 ++ `max_iter`:算法的最大迭代次数。在缺省情况下为10。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +**提示:** 输入序列中的`NaN`在修复之前会先进行线性插值填补。 + +#### 使用示例 +##### 使用经典分解方法进行修复 +当`method`缺省或取值为'Classical'时,本函数将使用经典分解方法进行数值修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +输出序列: + +``` 
++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +##### 使用改进的分解方法进行修复 +当`method`取值为'Improved'时,本函数将使用改进的分解方法进行数值修复。 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` + + + +## 序列发现 + +### ConsecutiveSequences + +#### 注册语句 + +```sql +create function consecutivesequences as 'org.apache.iotdb.library.series.UDTFConsecutiveSequences' +``` + +#### 函数简介 + +本函数用于在多维严格等间隔数据中发现局部最长连续子序列。 + +严格等间隔数据是指数据的时间间隔是严格相等的,允许存在数据缺失(包括行缺失和值缺失),但不允许存在数据冗余和时间戳偏移。 + +连续子序列是指严格按照标准时间间隔等距排布,不存在任何数据缺失的子序列。如果某个连续子序列不是任何连续子序列的真子序列,那么它是局部最长的。 + + +**函数名:** CONSECUTIVESEQUENCES + +**输入序列:** 支持多个输入序列,类型可以是任意的,但要满足严格等间隔的要求。 + +**参数:** + ++ `gap`:标准时间间隔,是一个有单位的正数。目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,函数会利用众数估计标准时间间隔。 + +**输出序列:** 输出单个序列,类型为 INT32。输出序列中的每一个数据点对应一个局部最长连续子序列,时间戳为子序列的起始时刻,值为子序列包含的数据点个数。 + +**提示:** 对于不符合要求的输入,本函数不对输出做任何保证。 + +#### 使用示例 + +##### 手动指定标准时间间隔 + +本函数可以通过`gap`参数手动指定标准时间间隔。需要注意的是,错误的参数设置会导致输出产生严重错误。 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, 
"gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + +##### 自动估计标准时间间隔 + +当`gap`参数缺省时,本函数可以利用众数估计标准时间间隔,得到同样的结果。因此,这种用法更受推荐。 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +### ConsecutiveWindows + +#### 注册语句 + +```sql +create function consecutivewindows as 'org.apache.iotdb.library.series.UDTFConsecutiveWindows' +``` + +#### 函数简介 + +本函数用于在多维严格等间隔数据中发现指定长度的连续窗口。 + +严格等间隔数据是指数据的时间间隔是严格相等的,允许存在数据缺失(包括行缺失和值缺失),但不允许存在数据冗余和时间戳偏移。 + +连续窗口是指严格按照标准时间间隔等距排布,不存在任何数据缺失的子序列。 + + +**函数名:** CONSECUTIVEWINDOWS + +**输入序列:** 支持多个输入序列,类型可以是任意的,但要满足严格等间隔的要求。 + +**参数:** + ++ `gap`:标准时间间隔,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,函数会利用众数估计标准时间间隔。 ++ `length`:序列长度,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。该参数不允许缺省。 + +**输出序列:** 输出单个序列,类型为 INT32。输出序列中的每一个数据点对应一个指定长度连续子序列,时间戳为子序列的起始时刻,值为子序列包含的数据点个数。 + +**提示:** 对于不符合要求的输入,本函数不对输出做任何保证。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` + + + +## 机器学习 + +### AR + +#### 注册语句 + +```sql +create function ar as 'org.apache.iotdb.library.dlearn.UDTFAR' +``` +#### 函数简介 + +本函数用于学习数据的自回归模型系数。 + +**函数名:** AR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `p`:自回归模型的阶数。默认为1。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。第一行对应模型的一阶系数,以此类推。 + +**提示:** + +- `p`应为正整数。 + +- 序列中的大部分点为等间隔采样点。 +- 序列中的缺失点通过线性插值进行填补后用于学习过程。 + +#### 使用示例 + +##### 指定阶数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| 
+|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +### Representation + +#### 函数简介 + +本函数用于时间序列的表示。 + +**函数名:** Representation + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `tb`:时间分块数量。默认为10。 +- `vb`:值分块数量。默认为10。 + +**输出序列:** 输出单个序列,类型为INT32,长度为`tb*vb`。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** + +- `tb `,`vb`应为正整数。 + +#### 使用示例 + +##### 指定时间分块数量、值分块数量 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +### RM + +#### 函数简介 + +本函数用于基于时间序列表示的匹配度。 + +**函数名:** RM + +**输入序列:** 仅支持两个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `tb`:时间分块数量。默认为10。 +- `vb`:值分块数量。默认为10。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度为`1`。序列的时间戳从0开始,序列仅有一个数据点,其时间戳为0,值为两个时间序列的匹配度。 + +**提示:** + +- `tb `,`vb`应为正整数。 + +#### 使用示例 + +##### 指定时间分块数量、值分块数量 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| 
++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/Function-and-Expression.md b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/Function-and-Expression.md new file mode 100644 index 00000000..2de545af --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/Function-and-Expression.md @@ -0,0 +1,3203 @@ + + +# 内置函数与表达式 + +## 聚合函数 + +聚合函数是多对一函数。它们对一组值进行聚合计算,得到单个聚合结果。 + +除了 `COUNT()`, `COUNT_IF()`之外,其他所有聚合函数都忽略空值,并在没有输入行或所有值为空时返回空值。 例如,`SUM()` 返回 null 而不是零,而 `AVG()` 在计数中不包括 null 值。 + +IoTDB 支持的聚合函数如下: + +| 函数名 | 功能描述 | 允许的输入类型 | 必要的属性参数 | 输出类型 | +| ------------- | ------------------------------------------------------------ | ------------------------- | ------------------------------------------------------------ | -------------- | +| SUM | 求和。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| COUNT | 计算数据点数。 | 所有类型 | 无 | INT64 | +| AVG | 求平均值。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| STDDEV | STDDEV_SAMP 的别名,求样本标准差。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| STDDEV_POP | 求总体标准差。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| STDDEV_SAMP | 求样本标准差。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| VARIANCE | VAR_SAMP 的别名,求样本方差。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| VAR_POP | 求总体方差。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| VAR_SAMP | 求样本方差。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 | +| MAX_VALUE | 求最大值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 | +| MIN_VALUE | 求最小值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 | +| FIRST_VALUE | 求时间戳最小的值。 | 所有类型 | 无 | 与输入类型一致 | +| LAST_VALUE | 求时间戳最大的值。 | 所有类型 | 无 | 与输入类型一致 | +| MAX_TIME | 求最大时间戳。 | 所有类型 | 无 | Timestamp | +| MIN_TIME | 求最小时间戳。 | 所有类型 | 无 | Timestamp | +| COUNT_IF | 求数据点连续满足某一给定条件,且满足条件的数据点个数(用keep表示)满足指定阈值的次数。 | BOOLEAN | `[keep >=/>/=/!=/= threshold`,`threshold`类型为`INT64`
`ignoreNull`:可选,默认为`true`;为`true`表示忽略null值,即如果中间出现null值,直接忽略,不会打断连续性;为`false`表示不忽略null值,即如果中间出现null值,会打断连续性 | INT64 | +| TIME_DURATION | 求某一列最大一个不为NULL的值所在时间戳与最小一个不为NULL的值所在时间戳的时间戳差 | 所有类型 | 无 | INT64 | +| MODE | 求众数。注意:
1.输入序列的不同值个数过多时会有内存异常风险;
2.如果所有元素出现的频次相同,即没有众数,则返回对应时间戳最小的值;
3.如果有多个众数,则返回对应时间戳最小的众数。 | 所有类型 | 无 | 与输入类型一致 | +| COUNT_TIME | 查询结果集的时间戳的数量。与 align by device 搭配使用时,得到的结果是每个设备的结果集的时间戳的数量。 | 所有类型,输入参数只能为* | 无 | INT64 | +| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x 的值。MAX_BY(time, x) 返回 x 取最大值时对应的时间戳。 | 第一个输入 x 可以是任意类型,第二个输入 y 只能是 INT32 INT64 FLOAT DOUBLE | 无 | 与第一个输入 x 的数据类型一致 | +| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x 的值。MIN_BY(time, x) 返回 x 取最小值时对应的时间戳。 | 第一个输入 x 可以是任意类型,第二个输入 y 只能是 INT32 INT64 FLOAT DOUBLE | 无 | 与第一个输入 x 的数据类型一致 | + + +### COUNT_IF + +#### 语法 +```sql +count_if(predicate, [keep >=/>/=/!=/注意: count_if 当前暂不支持与 group by time 的 SlidingWindow 一起使用 + +#### 使用示例 + +##### 原始数据 + +``` ++-----------------------------+-------------+-------------+ +| Time|root.db.d1.s1|root.db.d1.s2| ++-----------------------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 0| 0| +|1970-01-01T08:00:00.002+08:00| null| 0| +|1970-01-01T08:00:00.003+08:00| 0| 0| +|1970-01-01T08:00:00.004+08:00| 0| 0| +|1970-01-01T08:00:00.005+08:00| 1| 0| +|1970-01-01T08:00:00.006+08:00| 1| 0| +|1970-01-01T08:00:00.007+08:00| 1| 0| +|1970-01-01T08:00:00.008+08:00| 0| 0| +|1970-01-01T08:00:00.009+08:00| 0| 0| +|1970-01-01T08:00:00.010+08:00| 0| 0| ++-----------------------------+-------------+-------------+ +``` + +##### 不使用ignoreNull参数(忽略null) + +SQL: +```sql +select count_if(s1=0 & s2=0, 3), count_if(s1=1 & s2=0, 3) from root.db.d1 +``` + +输出: +``` ++--------------------------------------------------+--------------------------------------------------+ +|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3)|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3)| ++--------------------------------------------------+--------------------------------------------------+ +| 2| 1| ++--------------------------------------------------+-------------------------------------------------- +``` + +##### 使用ignoreNull参数 + +SQL: +```sql +select count_if(s1=0 & s2=0, 3, 'ignoreNull'='false'), count_if(s1=1 & s2=0, 3, 'ignoreNull'='false') from root.db.d1 +``` + +输出: +``` ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")| ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +| 1| 1| ++------------------------------------------------------------------------+------------------------------------------------------------------------+ +``` + +### TIME_DURATION +#### 语法 +```sql + time_duration(Path) +``` +#### 使用示例 +##### 准备数据 +``` ++----------+-------------+ +| Time|root.db.d1.s1| ++----------+-------------+ +| 1| 70| +| 3| 10| +| 4| 303| +| 6| 110| +| 7| 302| +| 8| 110| +| 9| 60| +| 10| 70| +|1677570934| 30| ++----------+-------------+ +``` +##### 写入语句 +```sql +"CREATE DATABASE root.db", +"CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN tags(city=Beijing)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1, 2, 10, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(2, null, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(3, 10, 0, null)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(4, 303, 30, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(5, null, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(6, 110, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) 
values(7, 302, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(8, 110, null, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(9, 60, 20, true)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(10,70, 20, null)", +"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1677570934, 30, 0, true)", +``` + +查询: +```sql +select time_duration(s1) from root.db.d1 +``` + +输出 +``` ++----------------------------+ +|time_duration(root.db.d1.s1)| ++----------------------------+ +| 1677570933| ++----------------------------+ +``` +> 注:若数据点只有一个,则返回0,若数据点为null,则返回null。 + +### COUNT_TIME +#### 语法 +```sql + count_time(*) +``` +#### 使用示例 +##### 准备数据 +``` ++----------+-------------+-------------+-------------+-------------+ +| Time|root.db.d1.s1|root.db.d1.s2|root.db.d2.s1|root.db.d2.s2| ++----------+-------------+-------------+-------------+-------------+ +| 0| 0| null| null| 0| +| 1| null| 1| 1| null| +| 2| null| 2| 2| null| +| 4| 4| null| null| 4| +| 5| 5| 5| 5| 5| +| 7| null| 7| 7| null| +| 8| 8| 8| 8| 8| +| 9| null| 9| null| null| ++----------+-------------+-------------+-------------+-------------+ +``` +##### 写入语句 +```sql +CREATE DATABASE root.db; +CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d1.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d2.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d2.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; +INSERT INTO root.db.d1(time, s1) VALUES(0, 0), (4,4), (5,5), (8,8); +INSERT INTO root.db.d1(time, s2) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8), (9,9); +INSERT INTO root.db.d2(time, s1) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8); +INSERT INTO root.db.d2(time, s2) VALUES(0, 0), (4,4), (5,5), (8,8); +``` + +查询示例1: +```sql +select count_time(*) from root.db.** +``` + +输出 +``` ++-------------+ +|count_time(*)| ++-------------+ +| 8| ++-------------+ +``` + +查询示例2: +```sql +select count_time(*) from root.db.d1, root.db.d2 +``` + +输出 +``` ++-------------+ +|count_time(*)| ++-------------+ +| 8| ++-------------+ +``` + +查询示例3: +```sql +select count_time(*) from root.db.** group by([0, 10), 2ms) +``` + +输出 +``` ++-----------------------------+-------------+ +| Time|count_time(*)| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 1| +|1970-01-01T08:00:00.008+08:00| 2| ++-----------------------------+-------------+ +``` + +查询示例4: +```sql +select count_time(*) from root.db.** group by([0, 10), 2ms) align by device +``` + +输出 +``` ++-----------------------------+----------+-------------+ +| Time| Device|count_time(*)| ++-----------------------------+----------+-------------+ +|1970-01-01T08:00:00.000+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.002+08:00|root.db.d1| 1| +|1970-01-01T08:00:00.004+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.006+08:00|root.db.d1| 1| +|1970-01-01T08:00:00.008+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.000+08:00|root.db.d2| 2| +|1970-01-01T08:00:00.002+08:00|root.db.d2| 1| +|1970-01-01T08:00:00.004+08:00|root.db.d2| 2| +|1970-01-01T08:00:00.006+08:00|root.db.d2| 1| +|1970-01-01T08:00:00.008+08:00|root.db.d2| 1| ++-----------------------------+----------+-------------+ + +``` + +> 注: +> 1. count_time里的表达式只能为*。 +> 2. count_time不能和其他的聚合函数一起使用。 +> 3. having语句里不支持使用count_time, 使用count_time聚合函数时不支持使用having语句。 +> 4. 
count_time不支持与group by level, group by tag一起使用。 + + + +### MAX_BY + +#### 功能定义 +max_by(x, y): 返回 y 最大时对应时间戳下的 x 值。 +- max_by 必须有两个输入参数 x 和 y。 +- 第一个输入可以为 time 关键字, max_by(time, x) 返回 x 取最大值时对应的时间戳。 +- 如果 y 最大时对应的时间戳下 x 为 null,则返回 null。 +- 如果 y 可以在多个时间戳下取得最大值,取最大值中最小时间戳对应的 x 值。 +- 与 IoTDB max_value 保持一致,仅支持 INT32、INT64、FLOAT、DOUBLE 作为 y 的输入,支持所有六种类型作为 x 的输入。 +- x, y 的输入均不允许为具体数值。 + + +#### 语法 +```sql +select max_by(x, y) from root.sg +select max_by(time, x) from root.sg +``` + +#### 使用示例 + +##### 原始数据 +```sql +IoTDB> select * from root.test ++-----------------------------+-----------+-----------+ +| Time|root.test.a|root.test.b| ++-----------------------------+-----------+-----------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 10.0| +|1970-01-01T08:00:00.002+08:00| 2.0| 10.0| +|1970-01-01T08:00:00.003+08:00| 3.0| 3.0| +|1970-01-01T08:00:00.004+08:00| 10.0| 10.0| +|1970-01-01T08:00:00.005+08:00| 10.0| 12.0| +|1970-01-01T08:00:00.006+08:00| 6.0| 6.0| ++-----------------------------+-----------+-----------+ +``` +##### 查询示例 +查询最大值对应的时间戳: +```sql +IoTDB> select max_by(time, a), max_value(a) from root.test ++-------------------------+------------------------+ +|max_by(Time, root.test.a)| max_value(root.test.a)| ++-------------------------+------------------------+ +| 4| 10.0| ++-------------------------+------------------------+ +``` + +求 a 最大时对应的 b 值: +```sql +IoTDB> select max_by(b, a) from root.test ++--------------------------------+ +|max_by(root.test.b, root.test.a)| ++--------------------------------+ +| 10.0| ++--------------------------------+ +``` + +结合表达式使用: +```sql +IoTDB> select max_by(b + 1, a * 2) from root.test ++----------------------------------------+ +|max_by(root.test.b + 1, root.test.a * 2)| ++----------------------------------------+ +| 11.0| ++----------------------------------------+ +``` + +结合 group by time 子句使用: +```sql +IoTDB> select max_by(b, a) from root.test group by ([0,7),4ms) ++-----------------------------+--------------------------------+ +| Time|max_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 3.0| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +结合 having 子句使用: +```sql +IoTDB> select max_by(b, a) from root.test group by ([0,7),4ms) having max_by(b, a) > 4.0 ++-----------------------------+--------------------------------+ +| Time|max_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` +结合 order by 子句使用: +```sql +IoTDB> select max_by(b, a) from root.test group by ([0,7),4ms) order by time desc ++-----------------------------+--------------------------------+ +| Time|max_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 3.0| ++-----------------------------+--------------------------------+ +``` + +#### 功能定义 +min_by(x, y): 返回 y 最小时对应时间戳下的 x 值。 +- min_by 必须有两个输入参数 x 和 y。 +- 第一个输入可以为 time 关键字, min_by(time, x) 返回 x 取最小值时对应的时间戳。 +- 如果 y 最大时对应的时间戳下 x 为 null,则返回 null。 +- 如果 y 可以在多个时间戳下取得最小值,取最小值中最小时间戳对应的 x 值。 +- 与 IoTDB min_value 保持一致,仅支持 INT32、INT64、FLOAT、DOUBLE 作为 y 的输入,支持所有六种类型作为 x 的输入。 +- x, y 的输入均不允许为具体数值。 + +#### 语法 +```sql 
+select min_by(x, y) from root.sg +select min_by(time, x) from root.sg +``` + +#### 使用示例 + +##### 原始数据 +```sql +IoTDB> select * from root.test ++-----------------------------+-----------+-----------+ +| Time|root.test.a|root.test.b| ++-----------------------------+-----------+-----------+ +|1970-01-01T08:00:00.001+08:00| 4.0| 10.0| +|1970-01-01T08:00:00.002+08:00| 3.0| 10.0| +|1970-01-01T08:00:00.003+08:00| 2.0| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 10.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 12.0| +|1970-01-01T08:00:00.006+08:00| 6.0| 6.0| ++-----------------------------+-----------+-----------+ +``` +##### 查询示例 +查询最小值对应的时间戳: +```sql +IoTDB> select min_by(time, a), min_value(a) from root.test ++-------------------------+------------------------+ +|min_by(Time, root.test.a)| min_value(root.test.a)| ++-------------------------+------------------------+ +| 4| 1.0| ++-------------------------+------------------------+ +``` + +求 a 最小时对应的 b 值: +```sql +IoTDB> select min_by(b, a) from root.test ++--------------------------------+ +|min_by(root.test.b, root.test.a)| ++--------------------------------+ +| 10.0| ++--------------------------------+ +``` + +结合表达式使用: +```sql +IoTDB> select min_by(b + 1, a * 2) from root.test ++----------------------------------------+ +|min_by(root.test.b + 1, root.test.a * 2)| ++----------------------------------------+ +| 11.0| ++----------------------------------------+ +``` + +结合 group by time 子句使用: +```sql +IoTDB> select min_by(b, a) from root.test group by ([0,7),4ms) ++-----------------------------+--------------------------------+ +| Time|min_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 3.0| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +结合 having 子句使用: +```sql +IoTDB> select min_by(b, a) from root.test group by ([0,7),4ms) having max_by(b, a) > 4.0 ++-----------------------------+--------------------------------+ +| Time|min_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +结合 order by 子句使用: +```sql +IoTDB> select min_by(b, a) from root.test group by ([0,7),4ms) order by time desc ++-----------------------------+--------------------------------+ +| Time|min_by(root.test.b, root.test.a)| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.004+08:00| 10.0| ++-----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 3.0| ++-----------------------------+--------------------------------+ +``` + + + + +## 算数运算符和函数 + +### 算数运算符 + +#### 一元算数运算符 + +支持的运算符:`+`, `-` + +输入数据类型要求:`INT32`, `INT64`, `FLOAT`, `DOUBLE` + +输出数据类型:与输入数据类型一致 + +#### 二元算数运算符 + +支持的运算符:`+`, `-`, `*`, `/`, `%` + +输入数据类型要求:`INT32`, `INT64`, `FLOAT`和`DOUBLE` + +输出数据类型:`DOUBLE` + +注意:当某个时间戳下左操作数和右操作数都不为空(`null`)时,二元运算操作才会有输出结果 + +#### 使用示例 + +例如: + +```sql +select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 +``` + +结果: + +``` ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +| 
Time|root.sg.d1.s1|-root.sg.d1.s1|root.sg.d1.s2|root.sg.d1.s2|root.sg.d1.s1 + root.sg.d1.s2|root.sg.d1.s1 - root.sg.d1.s2|root.sg.d1.s1 * root.sg.d1.s2|root.sg.d1.s1 / root.sg.d1.s2|root.sg.d1.s1 % root.sg.d1.s2| ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| -1.0| 1.0| 1.0| 2.0| 0.0| 1.0| 1.0| 0.0| +|1970-01-01T08:00:00.002+08:00| 2.0| -2.0| 2.0| 2.0| 4.0| 0.0| 4.0| 1.0| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.0| -3.0| 3.0| 3.0| 6.0| 0.0| 9.0| 1.0| 0.0| +|1970-01-01T08:00:00.004+08:00| 4.0| -4.0| 4.0| 4.0| 8.0| 0.0| 16.0| 1.0| 0.0| +|1970-01-01T08:00:00.005+08:00| 5.0| -5.0| 5.0| 5.0| 10.0| 0.0| 25.0| 1.0| 0.0| ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.014s +``` + +### 数学函数 + +目前 IoTDB 支持下列数学函数,这些数学函数的行为与这些函数在 Java Math 标准库中对应实现的行为一致。 + +| 函数名 | 输入序列类型 | 输出序列类型 | 必要属性参数 | Java 标准库中的对应实现 | +| ------- | ------------------------------ | ------------------------ |-----------| ------------------------------------------------------------ | +| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sin(double) | +| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#cos(double) | +| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#tan(double) | +| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#asin(double) | +| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#acos(double) | +| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#atan(double) | +| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sinh(double) | +| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#cosh(double) | +| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#tanh(double) | +| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#toDegrees(double) | +| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#toRadians(double) | +| ABS | INT32 / INT64 / FLOAT / DOUBLE | 与输入序列的实际类型一致 | | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | +| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#signum(double) | +| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#ceil(double) | +| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#floor(double) | +| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE |`places`:四舍五入有效位数,正数为小数点后面的有效位数,负数为整数位的有效位数 | Math#rint(Math#pow(10,places))/Math#pow(10,places)| +| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#exp(double) | +| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log(double) | +| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log10(double) | +| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sqrt(double) | + +例如: + +``` sql +select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; +``` + +结果: + +``` ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +| Time| root.sg1.d1.s1|sin(root.sg1.d1.s1)| cos(root.sg1.d1.s1)|tan(root.sg1.d1.s1)| ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +|2020-12-10T17:11:49.037+08:00|7360723084922759782| 0.8133527237573284| 0.5817708713544664| 
1.3980636773094157| +|2020-12-10T17:11:49.038+08:00|4377791063319964531|-0.8938962705202537| 0.4482738644511651| -1.994085181866842| +|2020-12-10T17:11:49.039+08:00|7972485567734642915| 0.9627757585308978|-0.27030138509681073|-3.5618602479083545| +|2020-12-10T17:11:49.040+08:00|2508858212791964081|-0.6073417341629443| -0.7944406950452296| 0.7644897069734913| +|2020-12-10T17:11:49.041+08:00|2817297431185141819|-0.8419358900502509| -0.5395775727782725| 1.5603611649667768| ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +Total line number = 5 +It costs 0.008s +``` +#### ROUND + +例如: +```sql +select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1 +``` + +```sql ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +| Time|root.db.d1.s4|ROUND(root.db.d1.s4)|ROUND(root.db.d1.s4,2)|ROUND(root.db.d1.s4,-1)| ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +|1970-01-01T08:00:00.001+08:00| 101.14345| 101.0| 101.14| 100.0| +|1970-01-01T08:00:00.002+08:00| 20.144346| 20.0| 20.14| 20.0| +|1970-01-01T08:00:00.003+08:00| 20.614372| 21.0| 20.61| 20.0| +|1970-01-01T08:00:00.005+08:00| 20.814346| 21.0| 20.81| 20.0| +|1970-01-01T08:00:00.006+08:00| 60.71443| 61.0| 60.71| 60.0| +|2023-03-13T16:16:19.764+08:00| 10.143425| 10.0| 10.14| 10.0| ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +Total line number = 6 +It costs 0.059s +``` + + + +## 比较运算符和函数 + +### 基本比较运算符 + +- 输入数据类型: `INT32`, `INT64`, `FLOAT`, `DOUBLE`。 +- 注意:会将所有数据转换为`DOUBLE`类型后进行比较。`==`和`!=`可以直接比较两个`BOOLEAN`。 +- 返回类型:`BOOLEAN`。 + +|运算符 |含义| +|----------------------------|-----------| +|`>` |大于| +|`>=` |大于等于| +|`<` |小于| +|`<=` |小于等于| +|`==` |等于| +|`!=` / `<>` |不等于| + +**示例:** + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +运行结果 +``` +IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| +|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| +|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| +|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| +|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| +|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +``` + +### `BETWEEN ... AND ...` 运算符 + +|运算符 |含义| +|----------------------------|-----------| +|`BETWEEN ... AND ...` |在指定范围内| +|`NOT BETWEEN ... 
AND ...` |不在指定范围内| + +**示例:** 选择区间 [36.5,40] 内或之外的数据: + +```sql +select temperature from root.sg1.d1 where temperature between 36.5 and 40; +``` + +```sql +select temperature from root.sg1.d1 where temperature not between 36.5 and 40; +``` + +### 模糊匹配运算符 + +对于 TEXT 类型的数据,支持使用 `Like` 和 `Regexp` 运算符对数据进行模糊匹配 + +|运算符 |含义| +|----------------------------|-----------| +|`LIKE` |匹配简单模式| +|`NOT LIKE` |无法匹配简单模式| +|`REGEXP` |匹配正则表达式| +|`NOT REGEXP` |无法匹配正则表达式| + +输入数据类型:`TEXT` + +返回类型:`BOOLEAN` + +#### 使用 `Like` 进行模糊匹配 + +**匹配规则:** + +- `%` 表示任意0个或多个字符。 +- `_` 表示任意单个字符。 + +**示例 1:** 查询 `root.sg.d1` 下 `value` 含有`'cc'`的数据。 + +```shell +IoTDB> select * from root.sg.d1 where value like '%cc%' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 2:** 查询 `root.sg.d1` 下 `value` 中间为 `'b'`、前后为任意单个字符的数据。 + +```shell +IoTDB> select * from root.sg.device where value like '_b_' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:02.000+08:00| abc| ++-----------------------------+----------------+ +Total line number = 1 +It costs 0.002s +``` + +#### 使用 `Regexp` 进行模糊匹配 + +需要传入的过滤条件为 **Java 标准库风格的正则表达式**。 + +**常见的正则匹配举例:** + +``` +长度为3-20的所有字符:^.{3,20}$ +大写英文字符:^[A-Z]+$ +数字和英文字符:^[A-Za-z0-9]+$ +以a开头的:^a.* +``` + +**示例 1:** 查询 root.sg.d1 下 value 值为26个英文字符组成的字符串。 + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 2:** 查询 root.sg.d1 下 value 值为26个小写英文字符组成的字符串且时间大于100的。 + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 3:** + +```sql +select b, b like '1%', b regexp '[0-2]' from root.test; +``` + +运行结果 +``` ++-----------------------------+-----------+-------------------------+--------------------------+ +| Time|root.test.b|root.test.b LIKE '^1.*?$'|root.test.b REGEXP '[0-2]'| ++-----------------------------+-----------+-------------------------+--------------------------+ +|1970-01-01T08:00:00.001+08:00| 111test111| true| true| +|1970-01-01T08:00:00.003+08:00| 333test333| false| false| ++-----------------------------+-----------+-------------------------+--------------------------+ +``` + +### `IS NULL` 运算符 + +|运算符 |含义| +|----------------------------|-----------| +|`IS NULL` |是空值| +|`IS NOT NULL` |不是空值| + +**示例 1:** 选择值为空的数据: + +```sql +select code from root.sg1.d1 where temperature is null; +``` + +**示例 2:** 选择值为非空的数据: + +```sql +select code from root.sg1.d1 where temperature is not null; +``` + +### `IN` 运算符 + +|运算符 |含义| +|----------------------------|-----------| +|`IN` / `CONTAINS` |是指定列表中的值| +|`NOT IN` / `NOT CONTAINS` |不是指定列表中的值| + +输入数据类型:`All Types` + +返回类型 `BOOLEAN` + +**注意:请确保集合中的值可以被转为输入数据的类型。** 
+> 例如: +> +>`s1 in (1, 2, 3, 'test')`,`s1`的数据类型是`INT32` +> +> 我们将会抛出异常,因为`'test'`不能被转为`INT32`类型 + +**示例 1:** 选择值在特定范围内的数据: + +```sql +select code from root.sg1.d1 where code in ('200', '300', '400', '500'); +``` + +**示例 2:** 选择值在特定范围外的数据: + +```sql +select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); +``` + +**示例 3:** + +```sql +select a, a in (1, 2) from root.test; +``` + +输出: +``` ++-----------------------------+-----------+--------------------+ +|                         Time|root.test.a|root.test.a IN (1,2)| ++-----------------------------+-----------+--------------------+ +|1970-01-01T08:00:00.001+08:00|          1|                true| +|1970-01-01T08:00:00.003+08:00|          3|               false| ++-----------------------------+-----------+--------------------+ +``` + +### 条件函数 + +条件函数针对每个数据点进行条件判断,返回布尔值。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能描述 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`:DOUBLE类型 | BOOLEAN 类型 | 返回`ts_value >= threshold`的bool值 | +| IN_RANGE | INT32 / INT64 / FLOAT / DOUBLE | `lower`:DOUBLE类型

`upper`:DOUBLE类型 | BOOLEAN类型 | 返回`ts_value >= lower && ts_value <= upper`的bool值 | | + +测试数据: + +``` +IoTDB> select ts from root.test; ++-----------------------------+------------+ +| Time|root.test.ts| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 3| +|1970-01-01T08:00:00.004+08:00| 4| ++-----------------------------+------------+ +``` + +**示例 1:** + +SQL语句: +```sql +select ts, on_off(ts, 'threshold'='2') from root.test; +``` + +输出: +``` +IoTDB> select ts, on_off(ts, 'threshold'='2') from root.test; ++-----------------------------+------------+-------------------------------------+ +| Time|root.test.ts|on_off(root.test.ts, "threshold"="2")| ++-----------------------------+------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| false| +|1970-01-01T08:00:00.002+08:00| 2| true| +|1970-01-01T08:00:00.003+08:00| 3| true| +|1970-01-01T08:00:00.004+08:00| 4| true| ++-----------------------------+------------+-------------------------------------+ +``` + +**示例 2:** + +Sql语句: +```sql +select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; +``` + +输出: +``` +IoTDB> select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; ++-----------------------------+------------+--------------------------------------------------+ +| Time|root.test.ts|in_range(root.test.ts, "lower"="2", "upper"="3.1")| ++-----------------------------+------------+--------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| false| +|1970-01-01T08:00:00.002+08:00| 2| true| +|1970-01-01T08:00:00.003+08:00| 3| true| +|1970-01-01T08:00:00.004+08:00| 4| false| ++-----------------------------+------------+--------------------------------------------------+ +``` + + + +## 逻辑运算符 + +### 一元逻辑运算符 + +- 支持运算符:`!` +- 输入数据类型:`BOOLEAN`。 +- 输出数据类型:`BOOLEAN`。 +- 注意:`!`的优先级很高,记得使用括号调整优先级。 + +### 二元逻辑运算符 + +- 支持运算符 + - AND:`and`,`&`, `&&` + - OR:`or`,`|`,`||` + +- 输入数据类型:`BOOLEAN`。 + +- 返回类型 `BOOLEAN`。 + +- 注意:当某个时间戳下左操作数和右操作数都为`BOOLEAN`类型时,二元逻辑操作才会有输出结果。 + +**示例:** + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +运行结果 +``` +IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| +|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| +|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| +|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| +|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| +|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +``` + + + +## 字符串处理 + +### STRING_CONTAINS + +#### 函数简介 + +本函数判断字符串中是否存在子串 `s` + +**函数名:** STRING_CONTAINS + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** 
++ `s`: 待搜寻的字符串。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN。 + +#### 使用示例 + +``` sql +select s1, string_contains(s1, 's'='warn') from root.sg1.d4; +``` + +结果: + +``` ++-----------------------------+--------------+-------------------------------------------+ +| Time|root.sg1.d4.s1|string_contains(root.sg1.d4.s1, "s"="warn")| ++-----------------------------+--------------+-------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| warn:-8721| true| +|1970-01-01T08:00:00.002+08:00| error:-37229| false| +|1970-01-01T08:00:00.003+08:00| warn:1731| true| ++-----------------------------+--------------+-------------------------------------------+ +Total line number = 3 +It costs 0.007s +``` + +### STRING_MATCHES + +#### 函数简介 + +本函数判断字符串是否能够被正则表达式`regex`匹配。 + +**函数名:** STRING_MATCHES + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** ++ `regex`: Java 标准库风格的正则表达式。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN。 + +#### 使用示例 + +``` sql +select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4; +``` + +结果: + +``` ++-----------------------------+--------------+------------------------------------------------------+ +| Time|root.sg1.d4.s1|string_matches(root.sg1.d4.s1, "regex"="[^\\s]+37229")| ++-----------------------------+--------------+------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| warn:-8721| false| +|1970-01-01T08:00:00.002+08:00| error:-37229| true| +|1970-01-01T08:00:00.003+08:00| warn:1731| false| ++-----------------------------+--------------+------------------------------------------------------+ +Total line number = 3 +It costs 0.007s +``` + +### Length + +#### 函数简介 + +本函数用于获取输入序列的长度。 + +**函数名:** LENGTH + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**输出序列:** 输出单个序列,类型为 INT32。 + +**提示:** 如果输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, length(s1) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+----------------------+ +| Time|root.sg1.d1.s1|length(root.sg1.d1.s1)| ++-----------------------------+--------------+----------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 6| +|1970-01-01T08:00:00.002+08:00| 22test22| 8| ++-----------------------------+--------------+----------------------+ +``` + +### Locate + +#### 函数简介 + +本函数用于获取`target`子串第一次出现在输入序列的位置,如果输入序列中不包含`target`则返回 -1 。 + +**函数名:** LOCATE + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `target`: 需要被定位的子串。 ++ `reverse`: 指定是否需要倒序定位,默认值为`false`, 即从左至右定位。 + +**输出序列:** 输出单个序列,类型为INT32。 + +**提示:** 下标从 0 开始。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, locate(s1, "target"="1") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+------------------------------------+ +| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 0| +|1970-01-01T08:00:00.002+08:00| 22test22| -1| 
++-----------------------------+--------------+------------------------------------+ +``` + +另一个用于查询的 SQL 语句: + +```sql +select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+------------------------------------------------------+ +| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1", "reverse"="true")| ++-----------------------------+--------------+------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 5| +|1970-01-01T08:00:00.002+08:00| 22test22| -1| ++-----------------------------+--------------+------------------------------------------------------+ +``` + +### StartsWith + +#### 函数简介 + +本函数用于判断输入序列是否有指定前缀。 + +**函数名:** STARTSWITH + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** ++ `target`: 需要匹配的前缀。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN。 + +**提示:** 如果输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, startswith(s1, "target"="1") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+----------------------------------------+ +| Time|root.sg1.d1.s1|startswith(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| true| +|1970-01-01T08:00:00.002+08:00| 22test22| false| ++-----------------------------+--------------+----------------------------------------+ +``` + +### EndsWith + +#### 函数简介 + +本函数用于判断输入序列是否有指定后缀。 + +**函数名:** ENDSWITH + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** ++ `target`: 需要匹配的后缀。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN。 + +**提示:** 如果输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, endswith(s1, "target"="1") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|endswith(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| true| +|1970-01-01T08:00:00.002+08:00| 22test22| false| ++-----------------------------+--------------+--------------------------------------+ +``` + +### Concat + +#### 函数简介 + +本函数用于拼接输入序列和`target`字串。 + +**函数名:** CONCAT + +**输入序列:** 至少一个输入序列,类型为 TEXT。 + +**参数:** ++ `targets`: 一系列 K-V, key需要以`target`为前缀且不重复, value是待拼接的字符串。 ++ `series_behind`: 指定拼接时时间序列是否在后面,默认为`false`。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** ++ 如果输入序列是NULL, 跳过该序列的拼接。 ++ 函数只能将输入序列和`targets`区分开各自拼接。`concat(s1, "target1"="IoT", s2, "target2"="DB")`和 + `concat(s1, s2, "target1"="IoT", "target2"="DB")`得到的结果是一样的。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| ++-----------------------------+--------------+--------------+ +``` + +用于查询的 SQL 
语句: + +```sql +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB")| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| 1test1IoTDB| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 22test222222testIoTDB| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +``` + +另一个用于查询的 SQL 语句: + +```sql +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB", "series_behind"="true")| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| IoTDB1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| IoTDB22test222222test| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +``` + +### Substring + +#### 函数简介 +提取字符串的子字符串,从指定的第一个字符开始,并在指定的字符数之后停止。下标从1开始。from 和 for的范围是 INT32 类型取值范围。 + +**函数名:** SUBSTRING + + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**参数:** ++ `from`: 指定子串开始下标。 ++ `for`: 指定多少个字符数后停止。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, substring(s1 from 1 for 2) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|SUBSTRING(root.sg1.d1.s1 FROM 1 FOR 2)| ++-----------------------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1t| +|1970-01-01T08:00:00.002+08:00| 22test22| 22| ++-----------------------------+--------------+--------------------------------------+ +``` + +### Replace + +#### 函数简介 +将输入序列中的子串替换成目标子串。 + +**函数名:** REPLACE + + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**参数:** ++ 第一个参数: 需要替换的目标子串。 ++ 第二个参数: 要替换成的子串。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, replace(s1, 'es', 'tt') from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+-----------------------------------+ +| Time|root.sg1.d1.s1|REPLACE(root.sg1.d1.s1, 'es', 'tt')| 
++-----------------------------+--------------+-----------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1tttt1| +|1970-01-01T08:00:00.002+08:00| 22test22| 22tttt22| ++-----------------------------+--------------+-----------------------------------+ +``` + +### Upper + +#### 函数简介 + +本函数用于将输入序列转化为大写。 + +**函数名:** UPPER + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, upper(s1) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+---------------------+ +| Time|root.sg1.d1.s1|upper(root.sg1.d1.s1)| ++-----------------------------+--------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1TEST1| +|1970-01-01T08:00:00.002+08:00| 22test22| 22TEST22| ++-----------------------------+--------------+---------------------+ +``` + +### Lower + +#### 函数简介 + +本函数用于将输入序列转换为小写。 + +**函数名:** LOWER + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1TEST1| +|1970-01-01T08:00:00.002+08:00| 22TEST22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, lower(s1) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+---------------------+ +| Time|root.sg1.d1.s1|lower(root.sg1.d1.s1)| ++-----------------------------+--------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 1TEST1| 1test1| +|1970-01-01T08:00:00.002+08:00| 22TEST22| 22test22| ++-----------------------------+--------------+---------------------+ +``` + +### Trim + +#### 函数简介 + +本函数用于移除输入序列前后的空格。 + +**函数名:** TRIM + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s3| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.002+08:00| 3querytest3| +|1970-01-01T08:00:00.003+08:00| 3querytest3 | ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s3, trim(s3) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------------+ +| Time|root.sg1.d1.s3|trim(root.sg1.d1.s3)| ++-----------------------------+--------------+--------------------+ +|1970-01-01T08:00:00.002+08:00| 3querytest3| 3querytest3| +|1970-01-01T08:00:00.003+08:00| 3querytest3 | 3querytest3| ++-----------------------------+--------------+--------------------+ +``` + +### StrCmp + +#### 函数简介 + +本函数用于比较两个输入序列。 如果值相同返回 `0` , 序列1的值小于序列2的值返回一个`负数`,序列1的值大于序列2的值返回一个`正数`。 + +**函数名:** StrCmp + +**输入序列:** 输入两个序列,类型均为 TEXT。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果任何一个输入是NULL,返回NULL。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| ++-----------------------------+--------------+--------------+ +``` + +用于查询的 
SQL 语句: + +```sql +select s1, s2, strcmp(s1, s2) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|strcmp(root.sg1.d1.s1, root.sg1.d1.s2)| ++-----------------------------+--------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 66| ++-----------------------------+--------------+--------------+--------------------------------------+ +``` + +### StrReplace + +#### 函数简介 + +**非内置函数,需要注册数据质量函数库后才能使用**。本函数用于将文本中的子串替换为指定的字符串。 + +**函数名:** STRREPLACE + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `target`: 需要替换的字符子串 ++ `replace`: 替换后的字符串。 ++ `limit`: 替换次数,大于等于 -1 的整数,默认为 -1 表示所有匹配的子串都会被替换。 ++ `offset`: 需要跳过的匹配次数,即前`offset`次匹配到的字符子串并不会被替换,默认为 0。 ++ `reverse`: 是否需要反向计数,默认为 false 即按照从左向右的次序。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| +|2021-01-01T00:00:03.000+08:00| B+,B,B| +|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-,B,B| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|strreplace(root.test.d1.s1, "target"=",",| +| | "replace"="/", "limit"="2")| ++-----------------------------+-----------------------------------------+ +|2021-01-01T00:00:01.000+08:00| A/B/A+,B-| +|2021-01-01T00:00:02.000+08:00| A/A+/A,B+| +|2021-01-01T00:00:03.000+08:00| B+/B/B| +|2021-01-01T00:00:04.000+08:00| A+/A/A+,A| +|2021-01-01T00:00:05.000+08:00| A/B-/B,B| ++-----------------------------+-----------------------------------------+ +``` + +另一个用于查询的 SQL 语句: + +```sql +select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|strreplace(root.test.d1.s1, "target"=",", "replace"= | +| | "|", "limit"="1", "offset"="1", "reverse"="true")| ++-----------------------------+-----------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| A,B/A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+/A,B+| +|2021-01-01T00:00:03.000+08:00| B+/B,B| +|2021-01-01T00:00:04.000+08:00| A+,A/A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-/B,B| ++-----------------------------+-----------------------------------------------------+ +``` + +### RegexMatch + +#### 函数简介 + +**非内置函数,需要注册数据质量函数库后才能使用**。本函数用于正则表达式匹配文本中的具体内容并返回。 + +**函数名:** REGEXMATCH + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `regex`: 匹配的正则表达式,支持所有 Java 正则表达式语法,比如`\d+\.\d+\.\d+\.\d+`将会匹配任意 IPv4 地址. 
++ `group`: 输出的匹配组序号,根据 java.util.regex 规定,第 0 组为整个正则表达式,此后的组按照左括号出现的顺序依次编号。 + 如`A(B(CD))`中共有三个组,第 0 组`A(B(CD))`,第 1 组`B(CD)`和第 2 组`CD`。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 空值或无法匹配给定的正则表达式的数据点没有输出结果。 + +#### 使用示例 + + +输入序列: + +``` ++-----------------------------+-------------------------------+ +| Time| root.test.d1.s1| ++-----------------------------+-------------------------------+ +|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| ++-----------------------------+-------------------------------+ +``` + +用于查询的 SQL 语句: + +```sql +select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------------------------+ +| Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")| ++-----------------------------+----------------------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| 192.168.0.1| +|2021-01-01T00:00:02.000+08:00| 192.168.0.24| +|2021-01-01T00:00:03.000+08:00| 192.168.0.2| +|2021-01-01T00:00:04.000+08:00| 192.168.0.5| +|2021-01-01T00:00:05.000+08:00| 192.168.0.124| ++-----------------------------+----------------------------------------------------------------------+ +``` + +### RegexReplace + +##### 函数简介 + +**非内置函数,需要注册数据质量函数库后才能使用**。本函数用于将文本中符合正则表达式的匹配结果替换为指定的字符串。 + +**函数名:** REGEXREPLACE + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `regex`: 需要替换的正则表达式,支持所有 Java 正则表达式语法。 ++ `replace`: 替换后的字符串,支持 Java 正则表达式中的后向引用, + 形如'$1'指代了正则表达式`regex`中的第一个分组,并会在替换时自动填充匹配到的子串。 ++ `limit`: 替换次数,大于等于 -1 的整数,默认为 -1 表示所有匹配的子串都会被替换。 ++ `offset`: 需要跳过的匹配次数,即前`offset`次匹配到的字符子串并不会被替换,默认为 0。 ++ `reverse`: 是否需要反向计数,默认为 false 即按照从左向右的次序。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +##### 使用示例 + +输入序列: + +``` ++-----------------------------+-------------------------------+ +| Time| root.test.d1.s1| ++-----------------------------+-------------------------------+ +|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| ++-----------------------------+-------------------------------+ +``` + +用于查询的 SQL 语句: + +```sql +select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------------+ +| Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",| +| | "replace"="cluster-$1", "limit"="1")| ++-----------------------------+-----------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]| ++-----------------------------+-----------------------------------------------------------+ +``` + +#### RegexSplit + +##### 函数简介 + +**非内置函数,需要注册数据质量函数库后才能使用**。本函数用于使用给定的正则表达式切分文本,并返回指定的项。 + +**函数名:** REGEXSPLIT + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `regex`: 
用于分割文本的正则表达式,支持所有 Java 正则表达式语法,比如`['"]`将会匹配任意的英文引号`'`和`"`。 ++ `index`: 输出结果在切分后数组中的序号,需要是大于等于 -1 的整数,默认值为 -1 表示返回切分后数组的长度,其它非负整数即表示返回数组中对应位置的切分结果(数组的秩从 0 开始计数)。 + +**输出序列:** 输出单个序列,在`index`为 -1 时输出数据类型为 INT32,否则为 TEXT。 + +**提示:** 如果`index`超出了切分后结果数组的秩范围,例如使用`,`切分`0,1,2`时输入`index`为 3,则该数据点没有输出结果。 + +##### 使用示例 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| +|2021-01-01T00:00:03.000+08:00| B+,B,B| +|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-,B,B| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| 4| +|2021-01-01T00:00:02.000+08:00| 4| +|2021-01-01T00:00:03.000+08:00| 3| +|2021-01-01T00:00:04.000+08:00| 4| +|2021-01-01T00:00:05.000+08:00| 4| ++-----------------------------+------------------------------------------------------+ +``` + +另一个查询的 SQL 语句: + +```sql +select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")| ++-----------------------------+-----------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| B-| +|2021-01-01T00:00:02.000+08:00| B+| +|2021-01-01T00:00:04.000+08:00| A| +|2021-01-01T00:00:05.000+08:00| B| ++-----------------------------+-----------------------------------------------------+ +``` + + + +## 数据类型转换 + +### CAST + +#### 函数简介 + +当前 IoTDB 支持6种数据类型,其中包括 INT32、INT64、FLOAT、DOUBLE、BOOLEAN 以及 TEXT。当我们对数据进行查询或者计算时可能需要进行数据类型的转换, 比如说将 TEXT 转换为 INT32,或者提高数据精度,比如说将 FLOAT 转换为 DOUBLE。IoTDB 支持使用cast 函数对数据类型进行转换。 + +语法示例如下: + +```sql +SELECT cast(s1 as INT32) from root.sg +``` + +cast 函数语法形式上与 PostgreSQL 一致,AS 后指定的数据类型表明要转换成的目标类型,目前 IoTDB 支持的六种数据类型均可以在 cast 函数中使用,遵循的转换规则如下表所示,其中行表示原始数据类型,列表示要转化成的目标数据类型: + +| | **INT32** | **INT64** | **FLOAT** | **DOUBLE** | **BOOLEAN** | **TEXT** | +| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ----------------------------------------------- | ----------------------- | ------------------------------------------------------------ | -------------------------------- | +| **INT32** | 不转化 | 直接转化 | 直接转化 | 直接转化 | !=0 : true
==0: false | String.valueOf() | +| **INT64** | 超出 INT32 范围:执行抛异常
否则:直接转化 | 不转化 | 直接转化 | 直接转化 | !=0L : true
==0: false | String.valueOf() | +| **FLOAT** | 超出 INT32 范围:执行抛异常
否则:四舍五入(Math.round()) | 超出 INT64 范围:执行抛异常
否则:四舍五入(Math.round()) | 不转化 | 直接转化 | !=0.0f : true
==0: false | String.valueOf() | +| **DOUBLE** | 超出 INT32 范围:执行抛异常
否则:四舍五入(Math.round()) | 超出 INT64 范围:执行抛异常
否则:四舍五入(Math.round()) | 超出 FLOAT 范围:执行抛异常
否则:直接转化 | 不转化 | !=0.0 : true
==0: false | String.valueOf() | +| **BOOLEAN** | true: 1
false: 0 | true: 1L
false: 0 | true: 1.0f
false: 0 | true: 1.0
false: 0 | 不转化 | true: "true"
false: "false" | +| **TEXT** | Integer.parseInt() | Long.parseLong() | Float.parseFloat() | Double.parseDouble() | text.toLowerCase =="true" : true
text.toLowerCase() == "false" : false
其它情况:执行抛异常 | 不转化 | + +#### 使用示例 + +``` +// timeseries +IoTDB> show timeseries root.sg.d1.** ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters| ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ +|root.sg.d1.s3| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s4| null| root.sg| DOUBLE| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s5| null| root.sg| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s6| null| root.sg| TEXT| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s1| null| root.sg| INT32| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s2| null| root.sg| INT64| PLAIN| SNAPPY|null| null| null| null| ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ + +// data of timeseries +IoTDB> select * from root.sg.d1; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d1.s3|root.sg.d1.s4|root.sg.d1.s5|root.sg.d1.s6|root.sg.d1.s1|root.sg.d1.s2| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| false| 10000| 0| 0| +|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| false| 3| 1| 1| +|1970-01-01T08:00:00.002+08:00| 2.7| 2.7| true| TRue| 2| 2| +|1970-01-01T08:00:00.003+08:00| 3.33| 3.33| true| faLse| 3| 3| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ + +// cast BOOLEAN to other types +IoTDB> select cast(s5 as INT32), cast(s5 as INT64),cast(s5 as FLOAT),cast(s5 as DOUBLE), cast(s5 as TEXT) from root.sg.d1 ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ +| Time|CAST(root.sg.d1.s5 AS INT32)|CAST(root.sg.d1.s5 AS INT64)|CAST(root.sg.d1.s5 AS FLOAT)|CAST(root.sg.d1.s5 AS DOUBLE)|CAST(root.sg.d1.s5 AS TEXT)| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ +|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.001+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.003+08:00| 1| 1| 1.0| 1.0| true| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ + +// cast TEXT to numeric types +IoTDB> select cast(s6 as INT32), cast(s6 as INT64), cast(s6 as FLOAT), cast(s6 as DOUBLE) from root.sg.d1 where time < 2 ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ +| Time|CAST(root.sg.d1.s6 AS INT32)|CAST(root.sg.d1.s6 AS INT64)|CAST(root.sg.d1.s6 AS FLOAT)|CAST(root.sg.d1.s6 AS DOUBLE)| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 10000| 10000| 10000.0| 10000.0| +|1970-01-01T08:00:00.001+08:00| 3| 3| 3.0| 3.0| 
++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ + +// cast TEXT to BOOLEAN +IoTDB> select cast(s6 as BOOLEAN) from root.sg.d1 where time >= 2 ++-----------------------------+------------------------------+ +| Time|CAST(root.sg.d1.s6 AS BOOLEAN)| ++-----------------------------+------------------------------+ +|1970-01-01T08:00:00.002+08:00| true| +|1970-01-01T08:00:00.003+08:00| false| ++-----------------------------+------------------------------+ +``` + + + +## 常序列生成函数 + +常序列生成函数用于生成所有数据点的值都相同的时间序列。 + +常序列生成函数接受一个或者多个时间序列输入,其输出的数据点的时间戳集合是这些输入序列时间戳集合的并集。 + +目前 IoTDB 支持如下常序列生成函数: + +| 函数名 | 必要的属性参数 | 输出序列类型 | 功能描述 | +| ------ | ------------------------------------------------------------ | -------------------------- | ------------------------------------------------------------ | +| CONST | `value`: 输出的数据点的值
`type`: 输出的数据点的类型,只能是 INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 由输入属性参数 `type` 决定 | 根据输入属性 `value` 和 `type` 输出用户指定的常序列。 | +| PI | 无 | DOUBLE | 常序列的值:`π` 的 `double` 值,圆的周长与其直径的比值,即圆周率,等于 *Java标准库* 中的`Math.PI`。 | +| E | 无 | DOUBLE | 常序列的值:`e` 的 `double` 值,自然对数的底,它等于 *Java 标准库* 中的 `Math.E`。 | + +例如: + +``` sql +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; +``` + +结果: + +``` +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|const(root.sg1.d1.s1, "value"="1024", "type"="INT64")|pi(root.sg1.d1.s2)|e(root.sg1.d1.s1, root.sg1.d1.s2)| ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 1024| 3.141592653589793| 2.718281828459045| +|1970-01-01T08:00:00.001+08:00| 1.0| null| 1024| null| 2.718281828459045| +|1970-01-01T08:00:00.002+08:00| 2.0| null| 1024| null| 2.718281828459045| +|1970-01-01T08:00:00.003+08:00| null| 3.0| null| 3.141592653589793| 2.718281828459045| +|1970-01-01T08:00:00.004+08:00| null| 4.0| null| 3.141592653589793| 2.718281828459045| ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +Total line number = 5 +It costs 0.005s +``` + + + +## 选择函数 + +目前 IoTDB 支持如下选择函数: + +| 函数名 | 输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能描述 | +| -------- | ------------------------------------- | ------------------------------------------------- | ------------------------ | ------------------------------------------------------------ | +| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最大的`k`个数据点。若多于`k`个数据点的值并列最大,则返回时间戳最小的数据点。 | +| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最小的`k`个数据点。若多于`k`个数据点的值并列最小,则返回时间戳最小的数据点。 | + +例如: + +``` sql +select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; +``` + +结果: + +``` ++-----------------------------+--------------------+------------------------------+---------------------------------+ +| Time| root.sg1.d2.s1|top_k(root.sg1.d2.s1, "k"="2")|bottom_k(root.sg1.d2.s1, "k"="2")| ++-----------------------------+--------------------+------------------------------+---------------------------------+ +|2020-12-10T20:36:15.531+08:00| 1531604122307244742| 1531604122307244742| null| +|2020-12-10T20:36:15.532+08:00|-7426070874923281101| null| null| +|2020-12-10T20:36:15.533+08:00|-7162825364312197604| -7162825364312197604| null| +|2020-12-10T20:36:15.534+08:00|-8581625725655917595| null| -8581625725655917595| +|2020-12-10T20:36:15.535+08:00|-7667364751255535391| null| -7667364751255535391| ++-----------------------------+--------------------+------------------------------+---------------------------------+ +Total line number = 5 +It costs 0.006s +``` + + + +## 区间查询函数 + +### 连续满足区间函数 + +连续满足条件区间函数用来查询所有满足指定条件的连续区间。 + +按返回值可分为两类: +1. 返回满足条件连续区间的起始时间戳和时间跨度(时间跨度为0表示此处只有起始时间这一个数据点满足条件) +2. 
返回满足条件连续区间的起始时间戳和后面连续满足条件的点的个数(个数为1表示此处只有起始时间这一个数据点满足条件) + +| 函数名 | 输入序列类型 | 属性参数 | 输出序列类型 | 功能描述 | +|-------------------|--------------------------------------|------------------------------------------------|-------|------------------------------------------------------------------| +| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续为0(false)的开始时间与持续时间,持续时间t(单位ms)满足`t >= min && t <= max` | +| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0
`max`:可选,默认值`Long.MAX_VALUE` | Long  | 返回时间序列连续不为0(false)的开始时间与持续时间,持续时间t(单位ms)满足`t >= min && t <= max` |
+| ZERO_COUNT        | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1
`max`:可选,默认值`Long.MAX_VALUE` | Long  | 返回时间序列连续为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` |
+| NON_ZERO_COUNT    | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` | | + +测试数据: +``` +IoTDB> select s1,s2,s3,s4,s5 from root.sg.d2; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d2.s1|root.sg.d2.s2|root.sg.d2.s3|root.sg.d2.s4|root.sg.d2.s5| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.001+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.003+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.004+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.005+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.006+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.007+08:00| 1| 1| 1.0| 1.0| true| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +``` + +sql: +```sql +select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; +``` + +结果: +``` ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +| Time|root.sg.d2.s1|zero_count(root.sg.d2.s1)|non_zero_count(root.sg.d2.s2)|zero_duration(root.sg.d2.s3)|non_zero_duration(root.sg.d2.s4)| ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0| 1| null| 0| null| +|1970-01-01T08:00:00.001+08:00| 1| null| 2| null| 1| +|1970-01-01T08:00:00.002+08:00| 1| null| null| null| null| +|1970-01-01T08:00:00.003+08:00| 0| 1| null| 0| null| +|1970-01-01T08:00:00.004+08:00| 1| null| 1| null| 0| +|1970-01-01T08:00:00.005+08:00| 0| 2| null| 1| null| +|1970-01-01T08:00:00.006+08:00| 0| null| null| null| null| +|1970-01-01T08:00:00.007+08:00| 1| null| 1| null| 0| ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +``` + + + +## 趋势计算函数 + +目前 IoTDB 支持如下趋势计算函数: + +| 函数名 | 输入序列类型 | 属性参数 | 输出序列类型 | 功能描述 | +| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | +| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 无 | INT64 | 统计序列中某数据点的时间戳与前一数据点时间戳的差。范围内第一个数据点没有对应的结果输出。 | +| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | 无 | 与输入序列的实际类型一致 | 统计序列中某数据点的值与前一数据点的值的差。范围内第一个数据点没有对应的结果输出。 | +| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | 无 | 与输入序列的实际类型一致 | 统计序列中某数据点的值与前一数据点的值的差的绝对值。范围内第一个数据点没有对应的结果输出。 | +| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | 无 | DOUBLE | 统计序列中某数据点相对于前一数据点的变化率,数量上等同于 DIFFERENCE / TIME_DIFFERENCE。范围内第一个数据点没有对应的结果输出。 | +| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | 无 | DOUBLE | 统计序列中某数据点相对于前一数据点的变化率的绝对值,数量上等同于 NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE。范围内第一个数据点没有对应的结果输出。 | +| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:可选,默认为true;为true时,前一个数据点值为null时,忽略该数据点继续向前找到第一个出现的不为null的值;为false时,如果前一个数据点为null,则不忽略,使用null进行相减,结果也为null | DOUBLE | 统计序列中某数据点的值与前一数据点的值的差。第一个数据点没有对应的结果输出,输出值为null | + +例如: + +``` sql +select s1, time_difference(s1), difference(s1), non_negative_difference(s1), 
derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; +``` + +结果: + +``` ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +| Time| root.sg1.d1.s1|time_difference(root.sg1.d1.s1)|difference(root.sg1.d1.s1)|non_negative_difference(root.sg1.d1.s1)|derivative(root.sg1.d1.s1)|non_negative_derivative(root.sg1.d1.s1)| ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +|2020-12-10T17:11:49.037+08:00|7360723084922759782| 1| -8431715764844238876| 8431715764844238876| -8.4317157648442388E18| 8.4317157648442388E18| +|2020-12-10T17:11:49.038+08:00|4377791063319964531| 1| -2982932021602795251| 2982932021602795251| -2.982932021602795E18| 2.982932021602795E18| +|2020-12-10T17:11:49.039+08:00|7972485567734642915| 1| 3594694504414678384| 3594694504414678384| 3.5946945044146785E18| 3.5946945044146785E18| +|2020-12-10T17:11:49.040+08:00|2508858212791964081| 1| -5463627354942678834| 5463627354942678834| -5.463627354942679E18| 5.463627354942679E18| +|2020-12-10T17:11:49.041+08:00|2817297431185141819| 1| 308439218393177738| 308439218393177738| 3.0843921839317773E17| 3.0843921839317773E17| ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +Total line number = 5 +It costs 0.014s +``` + +### 使用示例 + +#### 原始数据 + +``` ++-----------------------------+------------+------------+ +| Time|root.test.s1|root.test.s2| ++-----------------------------+------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1| 1.0| +|1970-01-01T08:00:00.002+08:00| 2| null| +|1970-01-01T08:00:00.003+08:00| null| 3.0| +|1970-01-01T08:00:00.004+08:00| 4| null| +|1970-01-01T08:00:00.005+08:00| 5| 5.0| +|1970-01-01T08:00:00.006+08:00| null| 6.0| ++-----------------------------+------------+------------+ +``` + +#### 不使用ignoreNull参数(忽略null) + +SQL: +```sql +SELECT DIFF(s1), DIFF(s2) from root.test; +``` + +输出: +``` ++-----------------------------+------------------+------------------+ +| Time|DIFF(root.test.s1)|DIFF(root.test.s2)| ++-----------------------------+------------------+------------------+ +|1970-01-01T08:00:00.001+08:00| null| null| +|1970-01-01T08:00:00.002+08:00| 1.0| null| +|1970-01-01T08:00:00.003+08:00| null| 2.0| +|1970-01-01T08:00:00.004+08:00| 2.0| null| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| null| 1.0| ++-----------------------------+------------------+------------------+ +``` + +#### 使用ignoreNull参数 + +SQL: +```sql +SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root.test; +``` + +输出: +``` ++-----------------------------+----------------------------------------+----------------------------------------+ +| Time|DIFF(root.test.s1, "ignoreNull"="false")|DIFF(root.test.s2, "ignoreNull"="false")| ++-----------------------------+----------------------------------------+----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| null| null| +|1970-01-01T08:00:00.002+08:00| 1.0| null| +|1970-01-01T08:00:00.003+08:00| null| null| +|1970-01-01T08:00:00.004+08:00| null| null| +|1970-01-01T08:00:00.005+08:00| 1.0| null| 
+|1970-01-01T08:00:00.006+08:00| null| 1.0| ++-----------------------------+----------------------------------------+----------------------------------------+ +``` + + + +## 采样函数 + +### 等数量分桶降采样函数 + +本函数对输入序列进行等数量分桶采样,即根据用户给定的降采样比例和降采样方法将输入序列按固定点数等分为若干桶。在每个桶内通过给定的采样方法进行采样。 + +#### 等数量分桶随机采样 + +对等数量分桶后,桶内进行随机采样。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | 降采样比例 `proportion`,取值范围为`(0, 1]`,默认为`0.1` | INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶随机采样 | + +##### 示例 + +输入序列:`root.ln.wf01.wt01.temperature`从`0.0-99.0`共`100`条数据。 + +``` +IoTDB> select temperature from root.ln.wf01.wt01; ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| +|1970-01-01T08:00:00.005+08:00| 5.0| +|1970-01-01T08:00:00.006+08:00| 6.0| +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.009+08:00| 9.0| +|1970-01-01T08:00:00.010+08:00| 10.0| +|1970-01-01T08:00:00.011+08:00| 11.0| +|1970-01-01T08:00:00.012+08:00| 12.0| +|.............................|.............................| +|1970-01-01T08:00:00.089+08:00| 89.0| +|1970-01-01T08:00:00.090+08:00| 90.0| +|1970-01-01T08:00:00.091+08:00| 91.0| +|1970-01-01T08:00:00.092+08:00| 92.0| +|1970-01-01T08:00:00.093+08:00| 93.0| +|1970-01-01T08:00:00.094+08:00| 94.0| +|1970-01-01T08:00:00.095+08:00| 95.0| +|1970-01-01T08:00:00.096+08:00| 96.0| +|1970-01-01T08:00:00.097+08:00| 97.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+-----------------------------+ +``` +sql: +```sql +select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; +``` +结果: +``` ++-----------------------------+-------------+ +| Time|random_sample| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.014+08:00| 14.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.035+08:00| 35.0| +|1970-01-01T08:00:00.047+08:00| 47.0| +|1970-01-01T08:00:00.059+08:00| 59.0| +|1970-01-01T08:00:00.063+08:00| 63.0| +|1970-01-01T08:00:00.079+08:00| 79.0| +|1970-01-01T08:00:00.086+08:00| 86.0| +|1970-01-01T08:00:00.096+08:00| 96.0| ++-----------------------------+-------------+ +Total line number = 10 +It costs 0.024s +``` + +#### 等数量分桶聚合采样 + +采用聚合采样法对输入序列进行采样,用户需要另外提供一个聚合函数参数即 +- `type`:聚合类型,取值为`avg`或`max`或`min`或`sum`或`extreme`或`variance`。在缺省情况下,采用`avg`。其中`extreme`表示等分桶中,绝对值最大的值。`variance`表示采样等分桶中的方差。 + +每个桶采样输出的时间戳为这个桶第一个点的时间戳 + + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`
`type`:取值类型有`avg`, `max`, `min`, `sum`, `extreme`, `variance`, 默认为`avg` | INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶聚合采样 | + +##### 示例 + +输入序列:`root.ln.wf01.wt01.temperature`从`0.0-99.0`共`100`条有序数据,同等分桶随机采样的测试数据。 + +sql: +```sql +select equal_size_bucket_agg_sample(temperature, 'type'='avg','proportion'='0.1') as agg_avg, equal_size_bucket_agg_sample(temperature, 'type'='max','proportion'='0.1') as agg_max, equal_size_bucket_agg_sample(temperature,'type'='min','proportion'='0.1') as agg_min, equal_size_bucket_agg_sample(temperature, 'type'='sum','proportion'='0.1') as agg_sum, equal_size_bucket_agg_sample(temperature, 'type'='extreme','proportion'='0.1') as agg_extreme, equal_size_bucket_agg_sample(temperature, 'type'='variance','proportion'='0.1') as agg_variance from root.ln.wf01.wt01; +``` +结果: +``` ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +| Time| agg_avg|agg_max|agg_min|agg_sum|agg_extreme|agg_variance| ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| +|1970-01-01T08:00:00.010+08:00| 14.5| 19.0| 10.0| 145.0| 19.0| 8.25| +|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| 20.0| 245.0| 29.0| 8.25| +|1970-01-01T08:00:00.030+08:00| 34.5| 39.0| 30.0| 345.0| 39.0| 8.25| +|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| +|1970-01-01T08:00:00.050+08:00| 54.5| 59.0| 50.0| 545.0| 59.0| 8.25| +|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| 8.25| +|1970-01-01T08:00:00.070+08:00|74.50000000000001| 79.0| 70.0| 745.0| 79.0| 8.25| +|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 8.25| +|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 8.25| ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +Total line number = 10 +It costs 0.044s +``` + +#### 等数量分桶 M4 采样 + +采用M4采样法对输入序列进行采样。即对于每个桶采样首、尾、最小和最大值。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`| INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶M4采样 | + +##### 示例 + +输入序列:`root.ln.wf01.wt01.temperature`从`0.0-99.0`共`100`条有序数据,同等分桶随机采样的测试数据。 + +sql: +```sql +select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01; +``` +结果: +``` ++-----------------------------+---------+ +| Time|M4_sample| ++-----------------------------+---------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.038+08:00| 38.0| +|1970-01-01T08:00:00.039+08:00| 39.0| +|1970-01-01T08:00:00.040+08:00| 40.0| +|1970-01-01T08:00:00.041+08:00| 41.0| +|1970-01-01T08:00:00.078+08:00| 78.0| +|1970-01-01T08:00:00.079+08:00| 79.0| +|1970-01-01T08:00:00.080+08:00| 80.0| +|1970-01-01T08:00:00.081+08:00| 81.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+---------+ +Total line number = 12 +It costs 0.065s +``` + +#### 等数量分桶离群值采样 + +本函数对输入序列进行等数量分桶离群值采样,即根据用户给定的降采样比例和桶内采样个数将输入序列按固定点数等分为若干桶,在每个桶内通过给定的离群值采样方法进行采样。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| 
EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`
`type`取值为`avg`或`stendis`或`cos`或`prenextdis`,默认为`avg`
`number`取值应大于0,默认`3`| INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例和桶内采样个数的等分桶离群值采样 | + +参数说明 +- `proportion`: 采样比例 + - `number`: 每个桶内的采样个数,默认`3` +- `type`: 离群值采样方法,取值为 + - `avg`: 取桶内数据点的平均值,并根据采样比例,找到距离均值最远的`top number`个 + - `stendis`: 取桶内每一个数据点距离桶的首末数据点连成直线的垂直距离,并根据采样比例,找到距离最大的`top number`个 + - `cos`: 设桶内一个数据点为b,b左边的数据点为a,b右边的数据点为c,则取ab与bc向量的夹角的余弦值,值越小,说明形成的角度越大,越可能是异常值。找到cos值最小的`top number`个 + - `prenextdis`: 设桶内一个数据点为b,b左边的数据点为a,b右边的数据点为c,则取ab与bc的长度之和作为衡量标准,和越大越可能是异常值,找到最大的`top number`个 + +##### 示例 + +测试数据:`root.ln.wf01.wt01.temperature`从`0.0-99.0`共`100`条数据,其中为了加入离群值,我们使得个位数为5的值自增100。 +``` +IoTDB> select temperature from root.ln.wf01.wt01; ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| +|1970-01-01T08:00:00.005+08:00| 105.0| +|1970-01-01T08:00:00.006+08:00| 6.0| +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.009+08:00| 9.0| +|1970-01-01T08:00:00.010+08:00| 10.0| +|1970-01-01T08:00:00.011+08:00| 11.0| +|1970-01-01T08:00:00.012+08:00| 12.0| +|1970-01-01T08:00:00.013+08:00| 13.0| +|1970-01-01T08:00:00.014+08:00| 14.0| +|1970-01-01T08:00:00.015+08:00| 115.0| +|1970-01-01T08:00:00.016+08:00| 16.0| +|.............................|.............................| +|1970-01-01T08:00:00.092+08:00| 92.0| +|1970-01-01T08:00:00.093+08:00| 93.0| +|1970-01-01T08:00:00.094+08:00| 94.0| +|1970-01-01T08:00:00.095+08:00| 195.0| +|1970-01-01T08:00:00.096+08:00| 96.0| +|1970-01-01T08:00:00.097+08:00| 97.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+-----------------------------+ +``` +sql: +```sql +select equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='avg', 'number'='2') as outlier_avg_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='stendis', 'number'='2') as outlier_stendis_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='cos', 'number'='2') as outlier_cos_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='prenextdis', 'number'='2') as outlier_prenextdis_sample from root.ln.wf01.wt01; +``` +结果: +``` ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +| Time|outlier_avg_sample|outlier_stendis_sample|outlier_cos_sample|outlier_prenextdis_sample| ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +|1970-01-01T08:00:00.005+08:00| 105.0| 105.0| 105.0| 105.0| +|1970-01-01T08:00:00.015+08:00| 115.0| 115.0| 115.0| 115.0| +|1970-01-01T08:00:00.025+08:00| 125.0| 125.0| 125.0| 125.0| +|1970-01-01T08:00:00.035+08:00| 135.0| 135.0| 135.0| 135.0| +|1970-01-01T08:00:00.045+08:00| 145.0| 145.0| 145.0| 145.0| +|1970-01-01T08:00:00.055+08:00| 155.0| 155.0| 155.0| 155.0| +|1970-01-01T08:00:00.065+08:00| 165.0| 165.0| 165.0| 165.0| +|1970-01-01T08:00:00.075+08:00| 175.0| 175.0| 175.0| 175.0| +|1970-01-01T08:00:00.085+08:00| 185.0| 185.0| 185.0| 185.0| +|1970-01-01T08:00:00.095+08:00| 195.0| 195.0| 195.0| 195.0| ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +Total line number = 10 
+It costs 0.041s +``` + +### M4函数 + +#### 函数简介 + +M4用于在窗口内采样第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`): + +- 第一个点是拥有这个窗口内最小时间戳的点; +- 最后一个点是拥有这个窗口内最大时间戳的点; +- 最小值点是拥有这个窗口内最小值的点(如果有多个这样的点,M4只返回其中一个); +- 最大值点是拥有这个窗口内最大值的点(如果有多个这样的点,M4只返回其中一个)。 + +image + +| 函数名 | 可接收的输入序列类型 | 属性参数 | 输出序列类型 | 功能类型 | +| ------ | ------------------------------ | ------------------------------------------------------------ | ------------------------------ | ------------------------------------------------------------ | +| M4 | INT32 / INT64 / FLOAT / DOUBLE | 包含固定点数的窗口和滑动时间窗口使用不同的属性参数。包含固定点数的窗口使用属性`windowSize`和`slidingStep`。滑动时间窗口使用属性`timeInterval`、`slidingStep`、`displayWindowBegin`和`displayWindowEnd`。更多细节见下文。 | INT32 / INT64 / FLOAT / DOUBLE | 返回每个窗口内的第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`)。在一个窗口内的聚合点输出之前,M4会将它们按照时间戳递增排序并且去重。 | + +#### 属性参数 + +**(1) 包含固定点数的窗口(SlidingSizeWindowAccessStrategy)使用的属性参数:** + ++ `windowSize`: 一个窗口内的点数。Int数据类型。必需的属性参数。 ++ `slidingStep`: 按照设定的点数来滑动窗口。Int数据类型。可选的属性参数;如果没有设置,默认取值和`windowSize`一样。 + +image + +**(2) 滑动时间窗口(SlidingTimeWindowAccessStrategy)使用的属性参数:** + ++ `timeInterval`: 一个窗口的时间长度。Long数据类型。必需的属性参数。 ++ `slidingStep`: 按照设定的时长来滑动窗口。Long数据类型。可选的属性参数;如果没有设置,默认取值和`timeInterval`一样。 ++ `displayWindowBegin`: 窗口滑动的起始时间戳位置(包含在内)。Long数据类型。可选的属性参数;如果没有设置,默认取值为Long.MIN_VALUE,意为使用输入的时间序列的第一个点的时间戳作为窗口滑动的起始时间戳位置。 ++ `displayWindowEnd`: 结束时间限制(不包含在内;本质上和`WHERE time < displayWindowEnd`起的效果是一样的)。Long数据类型。可选的属性参数;如果没有设置,默认取值为Long.MAX_VALUE,意为除了输入的时间序列自身数据读取完毕之外没有增加额外的结束时间过滤条件限制。 + +groupBy window + +#### 示例 + +输入的时间序列: + +```sql ++-----------------------------+------------------+ +| Time|root.vehicle.d1.s1| ++-----------------------------+------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.002+08:00| 15.0| +|1970-01-01T08:00:00.005+08:00| 10.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.010+08:00| 30.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.025+08:00| 8.0| +|1970-01-01T08:00:00.027+08:00| 20.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.033+08:00| 9.0| +|1970-01-01T08:00:00.035+08:00| 10.0| +|1970-01-01T08:00:00.040+08:00| 20.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+------------------+ +``` + +查询语句1: + +```sql +select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1 +``` + +输出结果1: + +```sql ++-----------------------------+-----------------------------------------------------------------------------------------------+ +| Time|M4(root.vehicle.d1.s1, "timeInterval"="25", "displayWindowBegin"="0", "displayWindowEnd"="100")| ++-----------------------------+-----------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.010+08:00| 30.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.025+08:00| 8.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+-----------------------------------------------------------------------------------------------+ +Total line number = 8 +``` + +查询语句2: + +```sql +select M4(s1,'windowSize'='10') from root.vehicle.d1 +``` + +输出结果2: + +```sql ++-----------------------------+-----------------------------------------+ +| Time|M4(root.vehicle.d1.s1, "windowSize"="10")| 
++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.033+08:00| 9.0| +|1970-01-01T08:00:00.035+08:00| 10.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+-----------------------------------------+ +Total line number = 7 +``` + +#### 推荐的使用场景 + +**(1) 使用场景:保留极端点的降采样** + +由于M4为每个窗口聚合其第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`),因此M4通常保留了极值点,因此比其他下采样方法(如分段聚合近似 (PAA))能更好地保留模式。如果你想对时间序列进行下采样并且希望保留极值点,你可以试试 M4。 + +**(2) 使用场景:基于M4降采样的大规模时间序列的零误差双色折线图可视化** + +参考论文["M4: A Visualization-Oriented Time Series Data Aggregation"](http://www.vldb.org/pvldb/vol7/p797-jugel.pdf),作为大规模时间序列可视化的降采样方法,M4可以做到双色折线图的零变形。 + +假设屏幕画布的像素宽乘高是`w*h`,假设时间序列要可视化的时间范围是`[tqs,tqe)`,并且(tqe-tqs)是w的整数倍,那么落在第i个时间跨度`Ii=[tqs+(tqe-tqs)/w*(i-1),tqs+(tqe-tqs)/w*i)` 内的点将会被画在第i个像素列中,i=1,2,...,w。于是从可视化驱动的角度出发,使用查询语句:`"select M4(s1,'timeInterval'='(tqe-tqs)/w','displayWindowBegin'='tqs','displayWindowEnd'='tqe') from root.vehicle.d1"`,来采集每个时间跨度内的第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`)。降采样时间序列的结果点数不会超过`4*w`个,与此同时,使用这些聚合点画出来的二色折线图与使用原始数据画出来的在像素级别上是完全一致的。 + +为了免除参数值硬编码的麻烦,当Grafana用于可视化时,我们推荐使用Grafana的[模板变量](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#global-variables)`$ __interval_ms`,如下所示: + +```sql +select M4(s1,'timeInterval'='$__interval_ms') from root.sg1.d1 +``` + +其中`timeInterval`自动设置为`(tqe-tqs)/w`。请注意,这里的时间精度假定为毫秒。 + +#### 和其它函数的功能比较 + +| SQL | 是否支持M4聚合 | 滑动窗口类型 | 示例 | 相关文档 | +| ------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| 1. 带有Group By子句的内置聚合函数 | 不支持,缺少`BOTTOM_TIME`和`TOP_TIME`,即缺少最小值点和最大值点的时间戳。 | Time Window | `select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d)` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#built-in-aggregate-functions
https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#downsampling-aggregate-query | +| 2. EQUAL_SIZE_BUCKET_M4_SAMPLE (内置UDF) | 支持* | Size Window. `windowSize = 4*(int)(1/proportion)` | `select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Select-Expression.html#time-series-generating-functions | +| **3. M4 (内置UDF)** | 支持* | Size Window, Time Window | (1) Size Window: `select M4(s1,'windowSize'='10') from root.vehicle.d1`
(2) Time Window: `select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1` | 本文档 | +| 4. 扩展带有Group By子句的内置聚合函数来支持M4聚合 | 未实施 | 未实施 | 未实施 | 未实施 | + +进一步比较`EQUAL_SIZE_BUCKET_M4_SAMPLE`和`M4`: + +**(1) 不同的M4聚合函数定义:** + +在每个窗口内,`EQUAL_SIZE_BUCKET_M4_SAMPLE`从排除了第一个点和最后一个点之后剩余的点中提取最小值点和最大值点。 + +而`M4`则是从窗口内所有点中(包括第一个点和最后一个点)提取最小值点和最大值点,这个定义与元数据中保存的`max_value`和`min_value`的语义更加一致。 + +值得注意的是,在一个窗口内的聚合点输出之前,`EQUAL_SIZE_BUCKET_M4_SAMPLE`和`M4`都会将它们按照时间戳递增排序并且去重。 + +**(2) 不同的滑动窗口:** + +`EQUAL_SIZE_BUCKET_M4_SAMPLE`使用SlidingSizeWindowAccessStrategy,并且通过采样比例(`proportion`)来间接控制窗口点数(`windowSize`),转换公式是`windowSize = 4*(int)(1/proportion)`。 + +`M4`支持两种滑动窗口:SlidingSizeWindowAccessStrategy和SlidingTimeWindowAccessStrategy,并且`M4`通过相应的参数直接控制窗口的点数或者时长。 + + + +## 时间序列处理 + +### CHANGE_POINTS + +#### 函数简介 + +本函数用于去除输入序列中的连续相同值。如输入序列`1,1,2,2,3`输出序列为`1,2,3`。 + +**函数名:** CHANGE_POINTS + +**输入序列:** 仅支持输入1个序列。 + +**参数:** 无 + +#### 使用示例 + +原始数据: + +``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|root.testChangePoints.d1.s1|root.testChangePoints.d1.s2|root.testChangePoints.d1.s3|root.testChangePoints.d1.s4|root.testChangePoints.d1.s5|root.testChangePoints.d1.s6| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.002+08:00| true| 2| 2| 2.0| 1.0| 2test2| +|1970-01-01T08:00:00.003+08:00| false| 1| 2| 1.0| 1.0| 2test2| +|1970-01-01T08:00:00.004+08:00| true| 1| 3| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.005+08:00| true| 1| 3| 1.0| 1.0| 1test1| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +``` + +用于查询的SQL语句: + +```sql +select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +| Time|change_points(root.testChangePoints.d1.s1)|change_points(root.testChangePoints.d1.s2)|change_points(root.testChangePoints.d1.s3)|change_points(root.testChangePoints.d1.s4)|change_points(root.testChangePoints.d1.s5)|change_points(root.testChangePoints.d1.s6)| ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.002+08:00| null| 2| 2| 2.0| null| 2test2| +|1970-01-01T08:00:00.003+08:00| false| 1| null| 1.0| null| null| +|1970-01-01T08:00:00.004+08:00| true| null| 3| null| null| 1test1| 
++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +``` + + + +## Lambda 表达式 + +### JEXL 自定义函数 + +#### 函数简介 + +Java Expression Language (JEXL) 是一个表达式语言引擎。我们使用 JEXL 来扩展 UDF,在命令行中,通过简易的 lambda 表达式来实现 UDF。 + +lambda 表达式中支持的运算符详见链接 [JEXL 中 lambda 表达式支持的运算符](https://commons.apache.org/proper/commons-jexl/apidocs/org/apache/commons/jexl3/package-summary.html#customization) 。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr`是一个支持标准的一元或多元参数的lambda表达式,符合`x -> {...}`或`(x, y, z) -> {...}`的格式,例如`x -> {x * 2}`, `(x, y, z) -> {x + y * z}`| INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | 返回将输入的时间序列通过lambda表达式变换的序列 | + +#### 使用示例 + +输入序列: +``` +IoTDB> select * from root.ln.wf01.wt01; ++-----------------------------+---------------------+--------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.str|root.ln.wf01.wt01.st|root.ln.wf01.wt01.temperature| ++-----------------------------+---------------------+--------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| str| 10.0| 0.0| +|1970-01-01T08:00:00.001+08:00| str| 20.0| 1.0| +|1970-01-01T08:00:00.002+08:00| str| 30.0| 2.0| +|1970-01-01T08:00:00.003+08:00| str| 40.0| 3.0| +|1970-01-01T08:00:00.004+08:00| str| 50.0| 4.0| +|1970-01-01T08:00:00.005+08:00| str| 60.0| 5.0| +|1970-01-01T08:00:00.006+08:00| str| 70.0| 6.0| +|1970-01-01T08:00:00.007+08:00| str| 80.0| 7.0| +|1970-01-01T08:00:00.008+08:00| str| 90.0| 8.0| +|1970-01-01T08:00:00.009+08:00| str| 100.0| 9.0| +|1970-01-01T08:00:00.010+08:00| str| 110.0| 10.0| ++-----------------------------+---------------------+--------------------+-----------------------------+ +``` + +用于查询的SQL语句: +```sql +select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01; +``` + +输出序列: +``` ++-----------------------------+-----+-----+-----+------+-----+--------+ +| Time|jexl1|jexl2|jexl3| jexl4|jexl5| jexl6| ++-----------------------------+-----+-----+-----+------+-----+--------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 0.0| 0.0| 10.0| 10.0str| +|1970-01-01T08:00:00.001+08:00| 2.0| 3.0| 1.0| 100.0| 21.0| 21.0str| +|1970-01-01T08:00:00.002+08:00| 4.0| 6.0| 4.0| 200.0| 32.0| 32.0str| +|1970-01-01T08:00:00.003+08:00| 6.0| 9.0| 9.0| 300.0| 43.0| 43.0str| +|1970-01-01T08:00:00.004+08:00| 8.0| 12.0| 16.0| 400.0| 54.0| 54.0str| +|1970-01-01T08:00:00.005+08:00| 10.0| 15.0| 25.0| 500.0| 65.0| 65.0str| +|1970-01-01T08:00:00.006+08:00| 12.0| 18.0| 36.0| 600.0| 76.0| 76.0str| +|1970-01-01T08:00:00.007+08:00| 14.0| 21.0| 49.0| 700.0| 87.0| 87.0str| +|1970-01-01T08:00:00.008+08:00| 16.0| 24.0| 64.0| 800.0| 98.0| 98.0str| +|1970-01-01T08:00:00.009+08:00| 18.0| 27.0| 81.0| 900.0|109.0|109.0str| +|1970-01-01T08:00:00.010+08:00| 20.0| 30.0|100.0|1000.0|120.0|120.0str| 
++-----------------------------+-----+-----+-----+------+-----+--------+ +Total line number = 11 +It costs 0.118s +``` + + + +## 条件表达式 + +### CASE + +CASE表达式是一种条件表达式,可用于根据特定条件返回不同的值,功能类似于其它语言中的if-else。 +CASE表达式由以下部分组成: +- CASE关键字:表示开始CASE表达式。 +- WHEN-THEN子句:可能存在多个,用于定义条件与给出结果。此子句又分为WHEN和THEN两个部分,WHEN部分表示条件,THEN部分表示结果表达式。如果WHEN条件为真,则返回对应的THEN结果。 +- ELSE子句:如果没有任何WHEN-THEN子句的条件为真,则返回ELSE子句中的结果。可以不存在ELSE子句。 +- END关键字:表示结束CASE表达式。 + +CASE表达式是一种标量运算,可以配合任何其它的标量运算或聚合函数使用。 + +下文把所有THEN部分和ELSE子句并称为结果子句。 + +#### 语法示例 + +CASE表达式支持两种格式。 + +语法示例如下: +- 格式1: +```sql + CASE + WHEN condition1 THEN expression1 + [WHEN condition2 THEN expression2] ... + [ELSE expression_end] + END +``` + 从上至下检查WHEN子句中的condition。 + + condition为真时返回对应THEN子句中的expression,condition为假时继续检查下一个WHEN子句中的condition。 +- 格式2: +```sql + CASE caseValue + WHEN whenValue1 THEN expression1 + [WHEN whenValue2 THEN expression2] ... + [ELSE expression_end] + END +``` + + 从上至下检查WHEN子句中的whenValue是否与caseValue相等。 + + 满足caseValue=whenValue时返回对应THEN子句中的expression,不满足时继续检查下一个WHEN子句中的whenValue。 + + 格式2会被iotdb转换成等效的格式1,例如以上sql语句会转换成: +```sql + CASE + WHEN caseValue=whenValue1 THEN expression1 + [WHEN caseValue=whenValue1 THEN expression1] ... + [ELSE expression_end] + END +``` + +如果格式1中的condition均不为真,或格式2中均不满足caseVaule=whenValue,则返回ELSE子句中的expression_end;不存在ELSE子句则返回null。 + +#### 注意事项 + +- 格式1中,所有WHEN子句必须返回BOOLEAN类型。 +- 格式2中,所有WHEN子句必须能够与CASE子句进行判等。 +- 一个CASE表达式中所有结果子句的返回值类型需要满足一定的条件: + - BOOLEAN类型不能与其它类型共存,存在其它类型会报错。 + - TEXT类型不能与其它类型共存,存在其它类型会报错。 + - 其它四种数值类型可以共存,最终结果会为DOUBLE类型,转换过程可能会存在精度损失。 +- CASE表达式没有实现惰性计算,即所有子句都会被计算。 +- CASE表达式不支持与UDF混用。 +- CASE表达式内部不能存在聚合函数,但CASE表达式的结果可以提供给聚合函数。 +- 使用CLI时,由于CASE表达式字符串较长,推荐用as为表达式提供别名。 + +#### 使用示例 + +##### 示例1 + +CASE表达式可对数据进行直观地分析,例如: + +- 某种化学产品的制备需要温度和压力都处于特定范围之内 +- 在制备过程中传感器会侦测温度和压力,在iotdb中形成T(temperature)和P(pressure)两个时间序列 + +这种应用场景下,CASE表达式可以指出哪些时间的参数是合适的,哪些时间的参数不合适,以及为什么不合适。 + +数据: +```sql +IoTDB> select * from root.test1 ++-----------------------------+------------+------------+ +| Time|root.test1.P|root.test1.T| ++-----------------------------+------------+------------+ +|2023-03-29T11:25:54.724+08:00| 1000000.0| 1025.0| +|2023-03-29T11:26:13.445+08:00| 1000094.0| 1040.0| +|2023-03-29T11:27:36.988+08:00| 1000095.0| 1041.0| +|2023-03-29T11:27:56.446+08:00| 1000095.0| 1059.0| +|2023-03-29T11:28:20.838+08:00| 1200000.0| 1040.0| ++-----------------------------+------------+------------+ +``` + +SQL语句: +```sql +select T, P, case +when 1000=1050 then "bad temperature" +when P<=1000000 or P>=1100000 then "bad pressure" +end as `result` +from root.test1 +``` + + +输出: +``` ++-----------------------------+------------+------------+---------------+ +| Time|root.test1.T|root.test1.P| result| ++-----------------------------+------------+------------+---------------+ +|2023-03-29T11:25:54.724+08:00| 1025.0| 1000000.0| bad pressure| +|2023-03-29T11:26:13.445+08:00| 1040.0| 1000094.0| good!| +|2023-03-29T11:27:36.988+08:00| 1041.0| 1000095.0| good!| +|2023-03-29T11:27:56.446+08:00| 1059.0| 1000095.0|bad temperature| +|2023-03-29T11:28:20.838+08:00| 1040.0| 1200000.0| bad pressure| ++-----------------------------+------------+------------+---------------+ +``` + + +##### 示例2 + +CASE表达式可实现结果的自由转换,例如将具有某种模式的字符串转换成另一种字符串。 + +数据: +```sql +IoTDB> select * from root.test2 ++-----------------------------+--------------+ +| Time|root.test2.str| ++-----------------------------+--------------+ +|2023-03-27T18:23:33.427+08:00| abccd| +|2023-03-27T18:23:39.389+08:00| abcdd| +|2023-03-27T18:23:43.463+08:00| abcdefg| 
++-----------------------------+--------------+ +``` + +SQL语句: +```sql +select str, case +when str like "%cc%" then "has cc" +when str like "%dd%" then "has dd" +else "no cc and dd" end as `result` +from root.test2 +``` + +输出: +``` ++-----------------------------+--------------+------------+ +| Time|root.test2.str| result| ++-----------------------------+--------------+------------+ +|2023-03-27T18:23:33.427+08:00| abccd| has cc| +|2023-03-27T18:23:39.389+08:00| abcdd| has dd| +|2023-03-27T18:23:43.463+08:00| abcdefg|no cc and dd| ++-----------------------------+--------------+------------+ +``` + +##### 示例3:搭配聚合函数 + +###### 合法:聚合函数←CASE表达式 + +CASE表达式可作为聚合函数的参数。例如,与聚合函数COUNT搭配,可实现同时按多个条件进行数据统计。 + +数据: +```sql +IoTDB> select * from root.test3 ++-----------------------------+------------+ +| Time|root.test3.x| ++-----------------------------+------------+ +|2023-03-27T18:11:11.300+08:00| 0.0| +|2023-03-27T18:11:14.658+08:00| 1.0| +|2023-03-27T18:11:15.981+08:00| 2.0| +|2023-03-27T18:11:17.668+08:00| 3.0| +|2023-03-27T18:11:19.112+08:00| 4.0| +|2023-03-27T18:11:20.822+08:00| 5.0| +|2023-03-27T18:11:22.462+08:00| 6.0| +|2023-03-27T18:11:24.174+08:00| 7.0| +|2023-03-27T18:11:25.858+08:00| 8.0| +|2023-03-27T18:11:27.979+08:00| 9.0| ++-----------------------------+------------+ +``` + +SQL语句: + +```sql +select +count(case when x<=1 then 1 end) as `(-∞,1]`, +count(case when 1 select * from root.test4 ++-----------------------------+------------+ +| Time|root.test4.x| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| ++-----------------------------+------------+ +``` + +SQL语句: +```sql +select x, case x when 1 then "one" when 2 then "two" else "other" end from root.test4 +``` + +输出: +``` ++-----------------------------+------------+-----------------------------------------------------------------------------------+ +| Time|root.test4.x|CASE WHEN root.test4.x = 1 THEN "one" WHEN root.test4.x = 2 THEN "two" ELSE "other"| ++-----------------------------+------------+-----------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| one| +|1970-01-01T08:00:00.002+08:00| 2.0| two| +|1970-01-01T08:00:00.003+08:00| 3.0| other| +|1970-01-01T08:00:00.004+08:00| 4.0| other| ++-----------------------------+------------+-----------------------------------------------------------------------------------+ +``` + +##### 示例5:结果子句类型 + +CASE表达式的结果子句的返回值需要满足一定的类型限制。 + +此示例中,继续使用示例4中的数据。 + +###### 非法:BOOLEAN与其它类型共存 + +SQL语句: +```sql +select x, case x when 1 then true when 2 then 2 end from root.test4 +``` + +输出: +``` +Msg: 701: CASE expression: BOOLEAN and other types cannot exist at same time +``` + +###### 合法:只存在BOOLEAN类型 + +SQL语句: +```sql +select x, case x when 1 then true when 2 then false end as `result` from root.test4 +``` + +输出: +``` ++-----------------------------+------------+------+ +| Time|root.test4.x|result| ++-----------------------------+------------+------+ +|1970-01-01T08:00:00.001+08:00| 1.0| true| +|1970-01-01T08:00:00.002+08:00| 2.0| false| +|1970-01-01T08:00:00.003+08:00| 3.0| null| +|1970-01-01T08:00:00.004+08:00| 4.0| null| ++-----------------------------+------------+------+ +``` + +###### 非法:TEXT与其它类型共存 + +SQL语句: +```sql +select x, case x when 1 then 1 when 2 then "str" end from root.test4 +``` + +输出: +``` +Msg: 701: CASE expression: TEXT and other types cannot exist at same time +``` + 
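+在实际使用中,若确实需要在同一个 CASE 表达式中同时返回数值与文本结果,可以考虑先用内置的 CAST 函数把数值分支显式转换为 TEXT,使所有结果子句的类型一致。下面给出一个基于上文 root.test4 数据的示意写法(仅作思路示例,未逐一验证):
+
+```sql
+select x, case x
+when 1 then cast(x as TEXT)
+when 2 then "str"
+else "other"
+end as `result`
+from root.test4
+```
+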
+###### 合法:只存在TEXT类型 + +见示例1。 + +###### 合法:数值类型共存 + +SQL语句: +```sql +select x, case x +when 1 then 1 +when 2 then 222222222222222 +when 3 then 3.3 +when 4 then 4.4444444444444 +end as `result` +from root.test4 +``` + +输出: +``` ++-----------------------------+------------+-------------------+ +| Time|root.test4.x| result| ++-----------------------------+------------+-------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0|2.22222222222222E14| +|1970-01-01T08:00:00.003+08:00| 3.0| 3.299999952316284| +|1970-01-01T08:00:00.004+08:00| 4.0| 4.44444465637207| ++-----------------------------+------------+-------------------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/Operator-and-Expression.md b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/Operator-and-Expression.md new file mode 100644 index 00000000..6afc120a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/Operator-and-Expression.md @@ -0,0 +1,529 @@ + + +# 函数与运算符 + +## 运算符 +### 算数运算符 +|运算符 |含义| +|----------------------------|-----------| +|`+` |取正(单目)| +|`-` |取负(单目)| +|`*` |乘| +|`/` |除| +|`%` |取余| +|`+` |加| +|`-` |减| + +详细说明及示例见文档 [算数运算符和函数](../Reference/Function-and-Expression.md#算数运算符)。 + +### 比较运算符 +|运算符 |含义| +|----------------------------|-----------| +|`>` |大于| +|`>=` |大于等于| +|`<` |小于| +|`<=` |小于等于| +|`==` |等于| +|`!=` / `<>` |不等于| +|`BETWEEN ... AND ...` |在指定范围内| +|`NOT BETWEEN ... AND ...` |不在指定范围内| +|`LIKE` |匹配简单模式| +|`NOT LIKE` |无法匹配简单模式| +|`REGEXP` |匹配正则表达式| +|`NOT REGEXP` |无法匹配正则表达式| +|`IS NULL` |是空值| +|`IS NOT NULL` |不是空值| +|`IN` / `CONTAINS` |是指定列表中的值| +|`NOT IN` / `NOT CONTAINS` |不是指定列表中的值| + +详细说明及示例见文档 [比较运算符和函数](../Reference/Function-and-Expression.md#比较运算符和函数)。 + +### 逻辑运算符 +|运算符 |含义| +|----------------------------|-----------| +|`NOT` / `!` |取非(单目)| +|`AND` / `&` / `&&` |逻辑与| +|`OR`/ | / || |逻辑或| + +详细说明及示例见文档 [逻辑运算符](../Reference/Function-and-Expression.md#逻辑运算符)。 + +### 运算符优先级 + +运算符的优先级从高到低排列如下,同一行的运算符优先级相同。 + +```sql +!, - (unary operator), + (unary operator) +*, /, DIV, %, MOD +-, + +=, ==, <=>, >=, >, <=, <, <>, != +LIKE, REGEXP, NOT LIKE, NOT REGEXP +BETWEEN ... AND ..., NOT BETWEEN ... AND ... 
+IS NULL, IS NOT NULL +IN, CONTAINS, NOT IN, NOT CONTAINS +AND, &, && +OR, |, || +``` + +## 内置函数 + +列表中的函数无须注册即可在 IoTDB 中使用,数据函数质量库中的函数需要参考注册步骤进行注册后才能使用。 + +### 聚合函数 + +| 函数名 | 功能描述 | 允许的输入类型 | 输出类型 | +| ----------- | ------------------------------------------------------------ |------------------------------------------------------| -------------- | +| SUM | 求和。 | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| COUNT | 计算数据点数。 | 所有类型 | INT | +| AVG | 求平均值。 | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| STDDEV | STDDEV_SAMP 的别名,求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| STDDEV_POP | 求总体标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| STDDEV_SAMP | 求样本标准差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VARIANCE | VAR_SAMP 的别名,求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VAR_POP | 求总体方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| VAR_SAMP | 求样本方差。 | INT32 INT64 FLOAT DOUBLE | DOUBLE | +| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE | 与输入类型一致 | +| MAX_VALUE | 求最大值。 | INT32 INT64 FLOAT DOUBLE STRING TIMESTAMP DATE | 与输入类型一致 | +| MIN_VALUE | 求最小值。 | INT32 INT64 FLOAT DOUBLE STRING TIMESTAMP DATE | 与输入类型一致 | +| FIRST_VALUE | 求时间戳最小的值。 | 所有类型 | 与输入类型一致 | +| LAST_VALUE | 求时间戳最大的值。 | 所有类型 | 与输入类型一致 | +| MAX_TIME | 求最大时间戳。 | 所有类型 | Timestamp | +| MIN_TIME | 求最小时间戳。 | 所有类型 | Timestamp | +| MAX_BY | MAX_BY(x, y) 求二元输入 x 和 y 在 y 最大时对应的 x 的值。MAX_BY(time, x) 返回 x 取最大值时对应的时间戳。 | 第一个输入 x 可以是任意类型,第二个输入 y 只能是 INT32 INT64 FLOAT DOUBLE STRING TIMESTAMP DATE | 与第一个输入 x 的数据类型一致 | +| MIN_BY | MIN_BY(x, y) 求二元输入 x 和 y 在 y 最小时对应的 x 的值。MIN_BY(time, x) 返回 x 取最小值时对应的时间戳。 | 第一个输入 x 可以是任意类型,第二个输入 y 只能是 INT32 INT64 FLOAT DOUBLE STRING TIMESTAMP DATE | 与第一个输入 x 的数据类型一致 | + +详细说明及示例见文档 [聚合函数](../Reference/Function-and-Expression.md#聚合函数)。 + +### 数学函数 + +| 函数名 | 输入序列类型 | 输出序列类型 | 必要属性参数 | Java 标准库中的对应实现 | +| ------- | ------------------------------ | ------------------------ |----------------------------------------------|-------------------------------------------------------------------| +| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sin(double) | +| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#cos(double) | +| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#tan(double) | +| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#asin(double) | +| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#acos(double) | +| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#atan(double) | +| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sinh(double) | +| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#cosh(double) | +| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#tanh(double) | +| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#toDegrees(double) | +| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#toRadians(double) | +| ABS | INT32 / INT64 / FLOAT / DOUBLE | 与输入序列的实际类型一致 | | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | +| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#signum(double) | +| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#ceil(double) | +| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#floor(double) | +| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | `places`:四舍五入有效位数,正数为小数点后面的有效位数,负数为整数位的有效位数 | Math#rint(Math#pow(10,places))/Math#pow(10,places) | +| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#exp(double) | +| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log(double) | +| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log10(double) | +| SQRT 
| INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sqrt(double) | + + +详细说明及示例见文档 [数学函数](../Reference/Function-and-Expression.md#数学函数)。 + +### 比较函数 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`:DOUBLE | BOOLEAN | 返回`ts_value >= threshold`的bool值 | +| IN_RANGE | INT32 / INT64 / FLOAT / DOUBLE | `lower`:DOUBLE
`upper`:DOUBLE | BOOLEAN | 返回`ts_value >= lower && ts_value <= upper`的bool值 |
+
+详细说明及示例见文档 [比较运算符和函数](../Reference/Function-and-Expression.md#比较运算符和函数)。
+
+### 字符串函数
+
+| 函数名          | 输入序列类型 | 必要的属性参数                                                                                              | 输出序列类型 | 功能描述                                                                  |
+|-----------------|-------------|-----------------------------------------------------------------------------------------------------------| ------------ |-------------------------------------------------------------------------|
+| STRING_CONTAINS | TEXT STRING | `s`: 待搜寻的字符串                                                                                          | BOOLEAN | 判断字符串中是否存在`s` |
+| STRING_MATCHES  | TEXT STRING | `regex`: Java 标准库风格的正则表达式                                                                          | BOOLEAN | 判断字符串是否能够被正则表达式`regex`匹配 |
+| LENGTH          | TEXT STRING | 无                                                                                                           | INT32 | 返回字符串的长度 |
+| LOCATE          | TEXT STRING | `target`: 需要被定位的子串
`reverse`: 指定是否需要倒序定位,默认值为`false`, 即从左至右定位 | INT32 | 获取`target`子串第一次出现在输入序列的位置,如果输入序列中不包含`target`则返回 -1 | +| STARTSWITH | TEXT STRING | `target`: 需要匹配的前缀 | BOOLEAN | 判断字符串是否有指定前缀 | +| ENDSWITH | TEXT STRING | `target`: 需要匹配的后缀 | BOOLEAN | 判断字符串是否有指定后缀 | +| CONCAT | TEXT STRING | `targets`: 一系列 K-V, key需要以`target`为前缀且不重复, value是待拼接的字符串。
`series_behind`: 指定拼接时时间序列是否在后面,默认为`false`。 | TEXT | 拼接字符串和`target`字串 | +| SUBSTRING | TEXT STRING | `from`: 指定子串开始下标
`for`: 指定的字符个数之后停止 | TEXT | 提取字符串的子字符串,从指定的第一个字符开始,并在指定的字符数之后停止。下标从1开始。from 和 for的范围是 INT32 类型取值范围。 | +| REPLACE | TEXT STRING | 第一个参数: 需要替换的目标子串
第二个参数:要替换成的子串 | TEXT | 将输入序列中的子串替换成目标子串 | +| UPPER | TEXT STRING | 无 | TEXT | 将字符串转化为大写 | +| LOWER | TEXT STRING | 无 | TEXT | 将字符串转化为小写 | +| TRIM | TEXT STRING | 无 | TEXT | 移除字符串前后的空格 | +| STRCMP | TEXT STRING | 无 | TEXT | 用于比较两个输入序列,如果值相同返回 `0` , 序列1的值小于序列2的值返回一个`负数`,序列1的值大于序列2的值返回一个`正数` | + +详细说明及示例见文档 [字符串处理函数](../Reference/Function-and-Expression.md#字符串处理)。 + +### 数据类型转换函数 + +| 函数名 | 必要的属性参数 | 输出序列类型 | 功能类型 | +| ------ | ------------------------------------------------------------ | ------------------------ | ---------------------------------- | +| CAST | `type`:输出的数据点的类型,只能是 INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 由输入属性参数`type`决定 | 将数据转换为`type`参数指定的类型。 | + +详细说明及示例见文档 [数据类型转换](../Reference/Function-and-Expression.md#数据类型转换)。 + +### 常序列生成函数 + +| 函数名 | 必要的属性参数 | 输出序列类型 | 功能描述 | +| ------ | ------------------------------------------------------------ | -------------------------- | ------------------------------------------------------------ | +| CONST | `value`: 输出的数据点的值
`type`: 输出的数据点的类型,只能是 INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 由输入属性参数 `type` 决定 | 根据输入属性 `value` 和 `type` 输出用户指定的常序列。       |
+| PI     | 无                                                           | DOUBLE                     | 常序列的值:`π` 的 `double` 值,圆的周长与其直径的比值,即圆周率,等于 *Java标准库* 中的`Math.PI`。 |
+| E      | 无                                                           | DOUBLE                     | 常序列的值:`e` 的 `double` 值,自然对数的底,它等于 *Java 标准库* 中的 `Math.E`。 |
+
+详细说明及示例见文档 [常序列生成函数](../Reference/Function-and-Expression.md#常序列生成函数)。
+
+### 选择函数
+
+| 函数名   | 输入序列类型                                                       | 必要的属性参数                                     | 输出序列类型             | 功能描述                                                     |
+| -------- |-------------------------------------------------------------------| ------------------------------------------------- | ------------------------ | ------------------------------------------------------------ |
+| TOP_K    | INT32 / INT64 / FLOAT / DOUBLE / TEXT / STRING / DATE / TIMESTAMP  | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最大的`k`个数据点。若多于`k`个数据点的值并列最大,则返回时间戳最小的数据点。 |
+| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT / STRING / DATE / TIMESTAMP  | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最小的`k`个数据点。若多于`k`个数据点的值并列最小,则返回时间戳最小的数据点。 |
+
+详细说明及示例见文档 [选择函数](../Reference/Function-and-Expression.md#选择函数)。
+
+### 区间查询函数
+
+| 函数名               | 输入序列类型                               | 属性参数                                        | 输出序列类型 | 功能描述                                                             |
+|-------------------|--------------------------------------|------------------------------------------------|-------|------------------------------------------------------------------|
+| ZERO_DURATION     | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0
+| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与持续时间,持续时间t(单位ms)满足`t >= min && t <= max` |
+| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` |
+| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` | | + +详细说明及示例见文档 [区间查询函数](../Reference/Function-and-Expression.md#区间查询函数)。 + +### 趋势计算函数 + +| 函数名 | 输入序列类型 | 输出序列类型 | 功能描述 | +| ----------------------- | ----------------------------------------------- | ------------------------ | ------------------------------------------------------------ | +| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | INT64 | 统计序列中某数据点的时间戳与前一数据点时间戳的差。范围内第一个数据点没有对应的结果输出。 | +| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | 与输入序列的实际类型一致 | 统计序列中某数据点的值与前一数据点的值的差。范围内第一个数据点没有对应的结果输出。 | +| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | 与输入序列的实际类型一致 | 统计序列中某数据点的值与前一数据点的值的差的绝对值。范围内第一个数据点没有对应的结果输出。 | +| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | 统计序列中某数据点相对于前一数据点的变化率,数量上等同于 DIFFERENCE / TIME_DIFFERENCE。范围内第一个数据点没有对应的结果输出。 | +| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | 统计序列中某数据点相对于前一数据点的变化率的绝对值,数量上等同于 NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE。范围内第一个数据点没有对应的结果输出。 | + + +| 函数名 | 输入序列类型 | 参数 | 输出序列类型 | 功能描述 | +|------|--------------------------------|------------------------------------------------------------------------------------------------------------------------|--------|------------------------------------------------| +| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:可选,默认为true;为true时,前一个数据点值为null时,忽略该数据点继续向前找到第一个出现的不为null的值;为false时,如果前一个数据点为null,则不忽略,使用null进行相减,结果也为null | DOUBLE | 统计序列中某数据点的值与前一数据点的值的差。第一个数据点没有对应的结果输出,输出值为null | + +详细说明及示例见文档 [趋势计算函数](../Reference/Function-and-Expression.md#趋势计算函数)。 + +### 采样函数 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | 降采样比例 `proportion`,取值范围为`(0, 1]`,默认为`0.1` | INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶随机采样 | +| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`
`type`:取值类型有`avg`, `max`, `min`, `sum`, `extreme`, `variance`, 默认为`avg` | INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶聚合采样 | +| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`| INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶M4采样 | +| EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`
`type`取值为`avg`或`stendis`或`cos`或`prenextdis`,默认为`avg`
`number`取值应大于0,默认`3`| INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例和桶内采样个数的等分桶离群值采样 | +| M4 | INT32 / INT64 / FLOAT / DOUBLE | 包含固定点数的窗口和滑动时间窗口使用不同的属性参数。包含固定点数的窗口使用属性`windowSize`和`slidingStep`。滑动时间窗口使用属性`timeInterval`、`slidingStep`、`displayWindowBegin`和`displayWindowEnd`。更多细节见下文。 | INT32 / INT64 / FLOAT / DOUBLE | 返回每个窗口内的第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`)。在一个窗口内的聚合点输出之前,M4会将它们按照时间戳递增排序并且去重。 | + +详细说明及示例见文档 [采样函数](../Reference/Function-and-Expression.md#采样函数)。 +### 时间序列处理函数 + +| 函数名 | 输入序列类型 | 参数 | 输出序列类型 | 功能描述 | +| ------------- | ------------------------------ | ---- | ------------------------ | -------------------------- | +| CHANGE_POINTS | INT32 / INT64 / FLOAT / DOUBLE | / | 与输入序列的实际类型一致 | 去除输入序列中的连续相同值 | + +详细说明及示例见文档 [时间序列处理](../Reference/Function-and-Expression.md#时间序列处理)。 + + +## Lambda 表达式 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +| ------ | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------- | ---------------------------------------------- | +| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr`是一个支持标准的一元或多元参数的lambda表达式,符合`x -> {...}`或`(x, y, z) -> {...}`的格式,例如`x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | 返回将输入的时间序列通过lambda表达式变换的序列 | + +详细说明及示例见文档 [Lambda 表达式](../Reference/Function-and-Expression.md#Lambda表达式) + +## 条件表达式 + +| 表达式名称 | 含义 | +|---------------------------|-----------| +| `CASE` | 类似if else | + +详细说明及示例见文档 [条件表达式](../Reference/Function-and-Expression.md#条件表达式) + +## SELECT 表达式 + +`SELECT` 子句指定查询的输出,由若干个 `selectExpr` 组成。 每个 `selectExpr` 定义了查询结果中的一列或多列。 + +**`selectExpr` 是一个由时间序列路径后缀、常量、函数和运算符组成的表达式。即 `selectExpr` 中可以包含:** + +- 时间序列路径后缀(支持使用通配符) +- 运算符 + - 算数运算符 + - 比较运算符 + - 逻辑运算符 +- 函数 + - 聚合函数 + - 时间序列生成函数(包括内置函数和用户自定义函数) +- 常量 + +#### 使用别名 + +由于 IoTDB 独特的数据模型,在每个传感器前都附带有设备等诸多额外信息。有时,我们只针对某个具体设备查询,而这些前缀信息频繁显示造成了冗余,影响了结果集的显示与分析。 + +IoTDB 支持使用`AS`为查询结果集中的列指定别名。 + +**示例:** + +```sql +select s1 as temperature, s2 as speed from root.ln.wf01.wt01; +``` + +结果集将显示为: + +| Time | temperature | speed | +| ---- | ----------- | ----- | +| ... | ... | ... 
| + +#### 运算符 + +IoTDB 中支持的运算符列表见文档 [运算符和函数](../Reference/Function-and-Expression.md#算数运算符和函数)。 + +#### 函数 + +##### 聚合函数 + +聚合函数是多对一函数。它们对一组值进行聚合计算,得到单个聚合结果。 + +**包含聚合函数的查询称为聚合查询**,否则称为时间序列查询。 + +**注意:聚合查询和时间序列查询不能混合使用。** 下列语句是不支持的: + +```sql +select s1, count(s1) from root.sg.d1; +select sin(s1), count(s1) from root.sg.d1; +select s1, count(s1) from root.sg.d1 group by ([10,100),10ms); +``` + +IoTDB 支持的聚合函数见文档 [聚合函数](../Reference/Function-and-Expression.md#聚合函数)。 + +##### 时间序列生成函数 + +时间序列生成函数接受若干原始时间序列作为输入,产生一列时间序列输出。与聚合函数不同的是,时间序列生成函数的结果集带有时间戳列。 + +所有的时间序列生成函数都可以接受 * 作为输入,都可以与原始时间序列查询混合进行。 + +###### 内置时间序列生成函数 + +IoTDB 中支持的内置函数列表见文档 [运算符和函数](../Reference/Function-and-Expression.md#算数运算符)。 + +###### 自定义时间序列生成函数 + +IoTDB 支持通过用户自定义函数(点击查看: [用户自定义函数](../Reference/UDF-Libraries.md) )能力进行函数功能扩展。 + +#### 嵌套表达式举例 + +IoTDB 支持嵌套表达式,由于聚合查询和时间序列查询不能在一条查询语句中同时出现,我们将支持的嵌套表达式分为时间序列查询嵌套表达式和聚合查询嵌套表达式两类。 + +##### 时间序列查询嵌套表达式 + +IoTDB 支持在 `SELECT` 子句中计算由**时间序列、常量、时间序列生成函数(包括用户自定义函数)和运算符**组成的任意嵌套表达式。 + +**说明:** + +- 当某个时间戳下左操作数和右操作数都不为空(`null`)时,表达式才会有结果,否则表达式值为`null`,且默认不出现在结果集中。 +- 如果表达式中某个操作数对应多条时间序列(如通配符 `*`),那么每条时间序列对应的结果都会出现在结果集中(按照笛卡尔积形式)。 + +**示例 1:** + +```sql +select a, + b, + ((a + 1) * 2 - 1) % 2 + 1.5, + sin(a + sin(a + sin(b))), + -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 +from root.sg1; +``` + +运行结果: + +``` ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Time|root.sg1.a|root.sg1.b|((((root.sg1.a + 1) * 2) - 1) % 2) + 1.5|sin(root.sg1.a + sin(root.sg1.a + sin(root.sg1.b)))|(-root.sg1.a + root.sg1.b * ((sin(root.sg1.a + root.sg1.b) * sin(root.sg1.a + root.sg1.b)) + (cos(root.sg1.a + root.sg1.b) * cos(root.sg1.a + root.sg1.b)))) + 1| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 1| 1| 2.5| 0.9238430524420609| -1.0| +|1970-01-01T08:00:00.020+08:00| 2| 2| 2.5| 0.7903505371876317| -3.0| +|1970-01-01T08:00:00.030+08:00| 3| 3| 2.5| 0.14065207680386618| -5.0| +|1970-01-01T08:00:00.040+08:00| 4| null| 2.5| null| null| +|1970-01-01T08:00:00.050+08:00| null| 5| null| null| null| +|1970-01-01T08:00:00.060+08:00| 6| 6| 2.5| -0.7288037411970916| -11.0| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +Total line number = 6 +It costs 0.048s +``` + +**示例 2:** + +```sql +select (a + b) * 2 + sin(a) from root.sg +``` + +运行结果: + +``` ++-----------------------------+----------------------------------------------+ +| Time|((root.sg.a + root.sg.b) * 2) + sin(root.sg.a)| ++-----------------------------+----------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 59.45597888911063| +|1970-01-01T08:00:00.020+08:00| 100.91294525072763| +|1970-01-01T08:00:00.030+08:00| 139.01196837590714| +|1970-01-01T08:00:00.040+08:00| 180.74511316047935| 
+|1970-01-01T08:00:00.050+08:00| 219.73762514629607| +|1970-01-01T08:00:00.060+08:00| 259.6951893788978| +|1970-01-01T08:00:00.070+08:00| 300.7738906815579| +|1970-01-01T08:00:00.090+08:00| 39.45597888911063| +|1970-01-01T08:00:00.100+08:00| 39.45597888911063| ++-----------------------------+----------------------------------------------+ +Total line number = 9 +It costs 0.011s +``` + +**示例 3:** + +```sql +select (a + *) / 2 from root.sg1 +``` + +运行结果: + +``` ++-----------------------------+-----------------------------+-----------------------------+ +| Time|(root.sg1.a + root.sg1.a) / 2|(root.sg1.a + root.sg1.b) / 2| ++-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.010+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.020+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.030+08:00| 3.0| 3.0| +|1970-01-01T08:00:00.040+08:00| 4.0| null| +|1970-01-01T08:00:00.060+08:00| 6.0| 6.0| ++-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.011s +``` + +**示例 4:** + +```sql +select (a + b) * 3 from root.sg, root.ln +``` + +运行结果: + +``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|(root.sg.a + root.sg.b) * 3|(root.sg.a + root.ln.b) * 3|(root.ln.a + root.sg.b) * 3|(root.ln.a + root.ln.b) * 3| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.010+08:00| 90.0| 270.0| 360.0| 540.0| +|1970-01-01T08:00:00.020+08:00| 150.0| 330.0| 690.0| 870.0| +|1970-01-01T08:00:00.030+08:00| 210.0| 450.0| 570.0| 810.0| +|1970-01-01T08:00:00.040+08:00| 270.0| 240.0| 690.0| 660.0| +|1970-01-01T08:00:00.050+08:00| 330.0| null| null| null| +|1970-01-01T08:00:00.060+08:00| 390.0| null| null| null| +|1970-01-01T08:00:00.070+08:00| 450.0| null| null| null| +|1970-01-01T08:00:00.090+08:00| 60.0| null| null| null| +|1970-01-01T08:00:00.100+08:00| 60.0| null| null| null| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +Total line number = 9 +It costs 0.014s +``` + +##### 聚合查询嵌套表达式 + +IoTDB 支持在 `SELECT` 子句中计算由**聚合函数、常量、时间序列生成函数和表达式**组成的任意嵌套表达式。 + +**说明:** + +- 当某个时间戳下左操作数和右操作数都不为空(`null`)时,表达式才会有结果,否则表达式值为`null`,且默认不出现在结果集中。但在使用`GROUP BY`子句的聚合查询嵌套表达式中,我们希望保留每个时间窗口的值,所以表达式值为`null`的窗口也包含在结果集中。 +- 如果表达式中某个操作数对应多条时间序列(如通配符`*`),那么每条时间序列对应的结果都会出现在结果集中(按照笛卡尔积形式)。 + +**示例 1:** + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) +from root.ln.wf01.wt01; +``` + +运行结果: + +``` ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|avg(root.ln.wf01.wt01.temperature) + sum(root.ln.wf01.wt01.hardware)| ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +| 15.927999999999999| -0.21826546964855045| 16.927999999999997| -7426.0| 7441.928| 
++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +Total line number = 1 +It costs 0.009s +``` + +**示例 2:** + +```sql +select avg(*), + (avg(*) + 1) * 3 / 2 -1 +from root.sg1 +``` + +运行结果: + +``` ++---------------+---------------+-------------------------------------+-------------------------------------+ +|avg(root.sg1.a)|avg(root.sg1.b)|(avg(root.sg1.a) + 1) * 3 / 2 - 1 |(avg(root.sg1.b) + 1) * 3 / 2 - 1 | ++---------------+---------------+-------------------------------------+-------------------------------------+ +| 3.2| 3.4| 5.300000000000001| 5.6000000000000005| ++---------------+---------------+-------------------------------------+-------------------------------------+ +Total line number = 1 +It costs 0.007s +``` + +**示例 3:** + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) as custom_sum +from root.ln.wf01.wt01 +GROUP BY([10, 90), 10ms); +``` + +运行结果: + +``` ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +| Time|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|custom_sum| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +|1970-01-01T08:00:00.010+08:00| 13.987499999999999| 0.9888207947857667| 14.987499999999999| -3211.0| 3224.9875| +|1970-01-01T08:00:00.020+08:00| 29.6| -0.9701057337071853| 30.6| -3720.0| 3749.6| +|1970-01-01T08:00:00.030+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.040+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.050+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.060+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.070+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.080+08:00| null| null| null| null| null| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +Total line number = 8 +It costs 0.012s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/SQL-Manual.md b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/SQL-Manual.md new file mode 100644 index 00000000..c3591e03 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/SQL-Manual.md @@ -0,0 +1,1973 @@ +# SQL手册 + +## 元数据操作 + +### 数据库管理 + +#### 创建数据库 + +```sql +CREATE DATABASE root.ln +``` + +#### 查看数据库 + +```sql +show databases +show databases root.* +show databases root.** +``` + +#### 删除数据库 + +```sql +DELETE DATABASE root.ln +DELETE DATABASE root.sgcc +DELETE DATABASE root.** +``` + +#### 统计数据库数量 + +```sql +count databases +count databases root.* +count databases root.sgcc.* +count databases root.sgcc +``` + +### 时间序列管理 + +#### 创建时间序列 + +```sql +create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN +create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE +create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN +create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN +create timeseries 
root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN +create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE +``` + +- 简化版 + +```sql +create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN +create timeseries root.ln.wf01.wt01.temperature FLOAT encoding=RLE +create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN +create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN +create timeseries root.sgcc.wf03.wt01.status BOOLEAN encoding=PLAIN +create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE +``` + +- 错误提示 + +```sql +create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF +> error: encoding TS_2DIFF does not support BOOLEAN +``` + +#### 创建对齐时间序列 + +```sql +CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) +``` + +#### 删除时间序列 + +```sql +delete timeseries root.ln.wf01.wt01.status +delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware +delete timeseries root.ln.wf02.* +drop timeseries root.ln.wf02.* +``` + +#### 查看时间序列 + +```sql +SHOW TIMESERIES +SHOW TIMESERIES +SHOW TIMESERIES root.** +SHOW TIMESERIES root.ln.** +SHOW TIMESERIES root.ln.** limit 10 offset 10 +SHOW TIMESERIES root.ln.** where timeseries contains 'wf01.wt' +SHOW TIMESERIES root.ln.** where dataType=FLOAT +SHOW TIMESERIES root.ln.** where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +SHOW LATEST TIMESERIES +``` + +#### 统计时间序列数量 + +```sql +COUNT TIMESERIES root.** +COUNT TIMESERIES root.ln.** +COUNT TIMESERIES root.ln.*.*.status +COUNT TIMESERIES root.ln.wf01.wt01.status +COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' +COUNT TIMESERIES root.** WHERE DATATYPE = INT64 +COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' +COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' +COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 +COUNT TIMESERIES root.** WHERE time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +COUNT TIMESERIES root.** GROUP BY LEVEL=1 +COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 +``` + +#### 标签点管理 + +```sql +create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) +``` + +- 重命名标签或属性 + +```sql +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` + +- 重新设置标签或属性的值 + +```sql +ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` + +- 删除已经存在的标签或属性 + +```sql +ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +``` + +- 添加新的标签 + +```sql +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` + +- 添加新的属性 + +```sql +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` + +- 更新插入别名,标签和属性 + +```sql +ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +- 使用标签作为过滤条件查询时间序列 + +```sql +SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +``` + +返回给定路径的下的所有满足条件的时间序列信息: + +```sql +ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c +ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 +show timeseries root.ln.** where TAGS(unit)='c' +show timeseries root.ln.** where TAGS(description) contains 'test1' +``` + +- 使用标签作为过滤条件统计时间序列数量 + +```sql +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +COUNT TIMESERIES (<`PathPattern`>)? 
timeseriesWhereClause GROUP BY LEVEL= +``` + +返回给定路径的下的所有满足条件的时间序列的数量: + +```sql +count timeseries +count timeseries root.** where TAGS(unit)='c' +count timeseries root.** where TAGS(unit)='c' group by level = 2 +``` + +创建对齐时间序列: + +```sql +create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) +``` + +支持查询: + +```sql +show timeseries where TAGS(tag1)='v1' +``` + +### 时间序列路径管理 + +#### 查看路径的所有子路径 + +```sql +SHOW CHILD PATHS pathPattern +- 查询 root.ln 的下一层:show child paths root.ln +- 查询形如 root.xx.xx.xx 的路径:show child paths root.*.* +``` +#### 查看路径的所有子节点 + +```sql +SHOW CHILD NODES pathPattern + +- 查询 root 的下一层:show child nodes root +- 查询 root.ln 的下一层 :show child nodes root.ln +``` +#### 查看设备 + +```sql +IoTDB> show devices + +IoTDB> show devices root.ln.** + +IoTDB> show devices where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +``` +##### 查看设备及其 database 信息 + +```sql +IoTDB> show devices with database + +IoTDB> show devices root.ln.** with database +``` +#### 统计节点数 + +```sql +IoTDB > COUNT NODES root.** LEVEL=2 + +IoTDB > COUNT NODES root.ln.** LEVEL=2 + +IoTDB > COUNT NODES root.ln.wf01.* LEVEL=3 + +IoTDB > COUNT NODES root.**.temperature LEVEL=3 +``` +#### 统计设备数量 + +```sql + +IoTDB> count devices + +IoTDB> count devices root.ln.** + +IoTDB> count devices where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +``` +### 设备模板管理 + + +![img](https://alioss.timecho.com/docs/img/%E6%A8%A1%E6%9D%BF.png) +![img](https://alioss.timecho.com/docs/img/template.jpg) + + + +#### 创建设备模板 + +```Go +CREATE DEVICE TEMPLATE ALIGNED? '(' [',' ]+ ')' +``` + +创建包含两个非对齐序列的设备模板 +```sql +IoTDB> create device template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` +创建包含一组对齐序列的设备模板 +```sql +IoTDB> create device template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) +``` +#### 挂载设备模板 +```sql +IoTDB> set DEVICE TEMPLATE t1 to root.sg1 +``` +#### 激活设备模板 +```sql +IoTDB> create timeseries using DEVICE TEMPLATE on root.sg1.d1 + +IoTDB> set DEVICE TEMPLATE t1 to root.sg1.d1 + +IoTDB> set DEVICE TEMPLATE t2 to root.sg1.d2 + +IoTDB> create timeseries using device template on root.sg1.d1 + +IoTDB> create timeseries using device template on root.sg1.d2 +``` +#### 查看设备模板 +```sql +IoTDB> show device templates +``` +- 查看某个设备模板下的物理量 +```sql +IoTDB> show nodes in device template t1 +``` +- 查看挂载了某个设备模板的路径 +```sql +IoTDB> show paths set device template t1 +``` +- 查看使用了某个设备模板的路径(即模板在该路径上已激活,序列已创建) +```sql +IoTDB> show paths using device template t1 +``` +#### 解除设备模板 +```sql +IoTDB> delete timeseries of device template t1 from root.sg1.d1 +``` +```sql +IoTDB> deactivate device template t1 from root.sg1.d1 +``` +批量处理 +```sql +IoTDB> delete timeseries of device template t1 from root.sg1.*, root.sg2.* +``` +```sql +IoTDB> deactivate device template t1 from root.sg1.*, root.sg2.* +``` +#### 卸载设备模板 +```sql +IoTDB> unset device template t1 from root.sg1.d1 +``` +#### 删除设备模板 +```sql +IoTDB> drop device template t1 +``` +### 数据存活时间管理 + +#### 设置 TTL +```sql +IoTDB> set ttl to root.ln 3600000 +``` +```sql +IoTDB> set ttl to root.sgcc.** 3600000 +``` +```sql +IoTDB> set ttl to root.** 3600000 +``` +#### 取消 TTL +```sql +IoTDB> unset ttl from root.ln +``` +```sql +IoTDB> unset ttl from root.sgcc.** +``` +```sql +IoTDB> unset ttl from root.** +``` + +#### 显示 TTL +```sql +IoTDB> SHOW ALL TTL +``` +```sql +IoTDB> SHOW TTL ON pathPattern +``` +```sql +IoTDB> 
show DEVICES +``` +## 写入数据 + +### 写入单列数据 +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) +``` +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1'),(2, 'v1') +``` +### 写入多列数据 +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) values (2, false, 'v2') +``` +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (3, false, 'v3'),(4, true, 'v4') +``` +### 使用服务器时间戳 +```sql +IoTDB > insert into root.ln.wf02.wt02(status, hardware) values (false, 'v2') +``` +### 写入对齐时间序列数据 +```sql +IoTDB > create aligned timeseries root.sg1.d1(s1 INT32, s2 DOUBLE) +``` +```sql +IoTDB > insert into root.sg1.d1(timestamp, s1, s2) aligned values(1, 1, 1) +``` +```sql +IoTDB > insert into root.sg1.d1(timestamp, s1, s2) aligned values(2, 2, 2), (3, 3, 3) +``` +```sql +IoTDB > select * from root.sg1.d1 +``` +### 加载 TsFile 文件数据 + +load '' [sglevel=int][onSuccess=delete/none] + +#### 通过指定文件路径(绝对路径)加载单 tsfile 文件 + +- `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` +- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` +- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` +- `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1 onSuccess=delete` + + +#### 通过指定文件夹路径(绝对路径)批量加载文件 + +- `load '/Users/Desktop/data'` +- `load '/Users/Desktop/data' sglevel=1` +- `load '/Users/Desktop/data' onSuccess=delete` +- `load '/Users/Desktop/data' sglevel=1 onSuccess=delete` + +## 删除数据 + +### 删除单列数据 +```sql +delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00; +``` +```sql +delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +``` +```sql +delete from root.ln.wf02.wt02.status where time < 10 +``` +```sql +delete from root.ln.wf02.wt02.status where time <= 10 +``` +```sql +delete from root.ln.wf02.wt02.status where time < 20 and time > 10 +``` +```sql +delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10 +``` +```sql +delete from root.ln.wf02.wt02.status where time > 20 +``` +```sql +delete from root.ln.wf02.wt02.status where time >= 20 +``` +```sql +delete from root.ln.wf02.wt02.status where time = 20 +``` +出错: +```sql +delete from root.ln.wf02.wt02.status where time > 4 or time < 0 + +Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic + +expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND' +``` + +删除时间序列中的所有数据: +```sql +delete from root.ln.wf02.wt02.status +``` +### 删除多列数据 +```sql +delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00; +``` +声明式的编程方式: +```sql +IoTDB> delete from root.ln.wf03.wt02.status where time < now() + +Msg: The statement is executed successfully. 
+``` +## 数据查询 + +### 基础查询 + +#### 时间过滤查询 +```sql +select temperature from root.ln.wf01.wt01 where time < 2017-11-01T00:08:00.000 +``` +#### 根据一个时间区间选择多列数据 +```sql +select status, temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; +``` +#### 按照多个时间区间选择同一设备的多列数据 +```sql +select status, temperature from root.ln.wf01.wt01 where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` +#### 按照多个时间区间选择不同设备的多列数据 +```sql +select wf01.wt01.status, wf02.wt02.hardware from root.ln where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` +#### 根据时间降序返回结果集 +```sql +select * from root.ln.** where time > 1 order by time desc limit 10; +``` +### 选择表达式 + +#### 使用别名 +```sql +select s1 as temperature, s2 as speed from root.ln.wf01.wt01; +``` +#### 运算符 + +#### 函数 + +不支持: +```sql +select s1, count(s1) from root.sg.d1; + +select sin(s1), count(s1) from root.sg.d1; + +select s1, count(s1) from root.sg.d1 group by ([10,100),10ms); +``` +##### 时间序列查询嵌套表达式 + +示例 1: +```sql +select a, + +​ b, + +​ ((a + 1) * 2 - 1) % 2 + 1.5, + +​ sin(a + sin(a + sin(b))), + +​ -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 + +from root.sg1; +``` +示例 2: +```sql +select (a + b) * 2 + sin(a) from root.sg + +示例 3: + +select (a + *) / 2 from root.sg1 + +示例 4: + +select (a + b) * 3 from root.sg, root.ln +``` +##### 聚合查询嵌套表达式 + +示例 1: +```sql +select avg(temperature), + +​ sin(avg(temperature)), + +​ avg(temperature) + 1, + +​ -sum(hardware), + +​ avg(temperature) + sum(hardware) + +from root.ln.wf01.wt01; +``` +示例 2: +```sql +select avg(*), + +​ (avg(*) + 1) * 3 / 2 -1 + +from root.sg1 +``` +示例 3: +```sql +select avg(temperature), + +​ sin(avg(temperature)), + +​ avg(temperature) + 1, + +​ -sum(hardware), + +​ avg(temperature) + sum(hardware) as custom_sum + +from root.ln.wf01.wt01 + +GROUP BY([10, 90), 10ms); +``` +#### 最新点查询 + +SQL 语法: + +```Go +select last [COMMA ]* from < PrefixPath > [COMMA < PrefixPath >]* [ORDER BY TIMESERIES (DESC | ASC)?] 
+``` + +查询 root.ln.wf01.wt01.status 的最新数据点 +```sql +IoTDB> select last status from root.ln.wf01.wt01 +``` +查询 root.ln.wf01.wt01 下 status,temperature 时间戳大于等于 2017-11-07T23:50:00 的最新数据点 +```sql +IoTDB> select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00 +``` + 查询 root.ln.wf01.wt01 下所有序列的最新数据点,并按照序列名降序排列 +```sql +IoTDB> select last * from root.ln.wf01.wt01 order by timeseries desc; +``` +### 查询过滤条件 + +#### 时间过滤条件 + +选择时间戳大于 2022-01-01T00:05:00.000 的数据: +```sql +select s1 from root.sg1.d1 where time > 2022-01-01T00:05:00.000; +``` +选择时间戳等于 2022-01-01T00:05:00.000 的数据: +```sql +select s1 from root.sg1.d1 where time = 2022-01-01T00:05:00.000; +``` +选择时间区间 [2017-11-01T00:05:00.000, 2017-11-01T00:12:00.000) 内的数据: +```sql +select s1 from root.sg1.d1 where time >= 2022-01-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; +``` +#### 值过滤条件 + +选择值大于 36.5 的数据: +```sql +select temperature from root.sg1.d1 where temperature > 36.5; +``` +选择值等于 true 的数据: +```sql +select status from root.sg1.d1 where status = true; +``` +选择区间 [36.5,40] 内或之外的数据: +```sql +select temperature from root.sg1.d1 where temperature between 36.5 and 40; +``` +```sql +select temperature from root.sg1.d1 where temperature not between 36.5 and 40; +``` +选择值在特定范围内的数据: +```sql +select code from root.sg1.d1 where code in ('200', '300', '400', '500'); +``` +选择值在特定范围外的数据: +```sql +select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); +``` +选择值为空的数据: +```sql +select code from root.sg1.d1 where temperature is null; +``` +选择值为非空的数据: +```sql +select code from root.sg1.d1 where temperature is not null; +``` +#### 模糊查询 + +查询 `root.sg.d1` 下 `value` 含有`'cc'`的数据 +```sql +IoTDB> select * from root.sg.d1 where value like '%cc%' +``` +查询 `root.sg.d1` 下 `value` 中间为 `'b'`、前后为任意单个字符的数据 +```sql +IoTDB> select * from root.sg.device where value like '_b_' +``` +查询 root.sg.d1 下 value 值为26个英文字符组成的字符串 +```sql +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' +``` + +查询 root.sg.d1 下 value 值为26个小写英文字符组成的字符串且时间大于100的 +```sql +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 +``` + +### 分段分组聚合 + +#### 未指定滑动步长的时间区间分组聚合查询 +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d); +``` +#### 指定滑动步长的时间区间分组聚合查询 +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d); +``` +滑动步长可以小于聚合窗口 +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-01 10:00:00), 4h, 2h); +``` +#### 按照自然月份的时间区间分组聚合查询 +```sql +select count(status) from root.ln.wf01.wt01 where time > 2017-11-01T01:00:00 group by([2017-11-01T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` +每个时间间隔窗口内都有数据 +```sql +select count(status) from root.ln.wf01.wt01 group by([2017-10-31T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` +#### 左开右闭区间 +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d); +``` +#### 与分组聚合混合使用 + +统计降采样后的数据点个数 +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d), level=1; +``` +加上滑动 Step 的降采样后的结果也可以汇总 +```sql +select count(status) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d), level=1; +``` +#### 路径层级分组聚合 + +统计不同 database 下 status 序列的数据点个数 +```sql +select count(status) from root.** group by level = 1 +``` + 统计不同设备下 status 序列的数据点个数 
+```sql +select count(status) from root.** group by level = 3 +``` +统计不同 database 下的不同设备中 status 序列的数据点个数 +```sql +select count(status) from root.** group by level = 1, 3 +``` +查询所有序列下温度传感器 temperature 的最大值 +```sql +select max_value(temperature) from root.** group by level = 0 +``` +查询某一层级下所有传感器拥有的总数据点数 +```sql +select count(*) from root.ln.** group by level = 2 +``` +#### 标签分组聚合 + +##### 单标签聚合查询 +```sql +SELECT AVG(temperature) FROM root.factory1.** GROUP BY TAGS(city); +``` +##### 多标签聚合查询 +```sql +SELECT avg(temperature) FROM root.factory1.** GROUP BY TAGS(city, workshop); +``` +##### 基于时间区间的标签聚合查询 +```sql +SELECT AVG(temperature) FROM root.factory1.** GROUP BY ([1000, 10000), 5s), TAGS(city, workshop); +``` +#### 差值分段聚合 +```sql +group by variation(controlExpression[,delta][,ignoreNull=true/false]) +``` +##### delta=0时的等值事件分段 +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6) +``` +指定ignoreNull为false +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, ignoreNull=false) +``` +##### delta!=0时的差值事件分段 +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, 4) +``` +#### 条件分段聚合 +```sql +group by condition(predict,[keep>/>=/=/<=/<]threshold,[,ignoreNull=true/false]) +``` +查询至少连续两行以上的charging_status=1的数据 +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoreNull=true) +``` +当设置`ignoreNull`为false时,遇到null值为将其视为一个不满足条件的行,得到结果原先的分组被含null的行拆分 +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoreNull=false) +``` +#### 会话分段聚合 +```sql +group by session(timeInterval) +``` +按照不同的时间单位设定时间间隔 +```sql +select __endTime,count(*) from root.** group by session(1d) +``` +和`HAVING`、`ALIGN BY DEVICE`共同使用 +```sql +select __endTime,sum(hardware) from root.ln.wf02.wt01 group by session(50s) having sum(hardware)>0 align by device +``` +#### 点数分段聚合 +```sql +group by count(controlExpression, size[,ignoreNull=true/false]) +``` +select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5) + +当使用ignoreNull将null值也考虑进来 +```sql +select count(charging_stauts), first_value(soc) from root.sg group by count(charging_status,5,ignoreNull=false) +``` +### 聚合结果过滤 + +不正确的: +```sql +select count(s1) from root.** group by ([1,3),1ms) having sum(s1) > s1 + +select count(s1) from root.** group by ([1,3),1ms) having s1 > 1 + +select count(s1) from root.** group by ([1,3),1ms), level=1 having sum(d1.s1) > 1 + +select count(d1.s1) from root.** group by ([1,3),1ms), level=1 having sum(s1) > 1 +``` +SQL 示例: +```sql + select count(s1) from root.** group by ([1,11),2ms), level=1 having count(s2) > 2; + + select count(s1), count(s2) from root.** group by ([1,11),2ms) having count(s2) > 1 align by device; +``` +### 结果集补空值 +```sql +FILL '(' PREVIOUS | LINEAR | constant (, interval=DURATION_LITERAL)? 
')' +``` +#### `PREVIOUS` 填充 +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous); +``` +#### `PREVIOUS` 填充并指定填充超时阈值 +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous, 2m); +``` +#### `LINEAR` 填充 +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(linear); +``` +#### 常量填充 +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(2.0); +``` +使用 `BOOLEAN` 类型的常量填充 +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(true); +``` +### 查询结果分页 + +#### 按行分页 + + 基本的 `LIMIT` 子句 +```sql +select status, temperature from root.ln.wf01.wt01 limit 10 +``` +带 `OFFSET` 的 `LIMIT` 子句 +```sql +select status, temperature from root.ln.wf01.wt01 limit 5 offset 3 +``` +`LIMIT` 子句与 `WHERE` 子句结合 +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time< 2017-11-01T00:12:00.000 limit 5 offset 3 +``` + `LIMIT` 子句与 `GROUP BY` 子句组合 +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) limit 4 offset 3 +``` +#### 按列分页 + + 基本的 `SLIMIT` 子句 +```sql +select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 +``` +带 `SOFFSET` 的 `SLIMIT` 子句 +```sql +select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 1 +``` +`SLIMIT` 子句与 `GROUP BY` 子句结合 +```sql +select max_value(*) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) slimit 1 soffset 1 +``` +`SLIMIT` 子句与 `LIMIT` 子句结合 +```sql +select * from root.ln.wf01.wt01 limit 10 offset 100 slimit 2 soffset 0 +``` +### 排序 + +时间对齐模式下的排序 +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time desc; +``` +设备对齐模式下的排序 +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by device desc,time asc align by device; +``` +在时间戳相等时按照设备名排序 +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time asc,device desc align by device; +``` +没有显式指定时 +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` +对聚合后的结果进行排序 +```sql +select count(*) from root.ln.** group by ((2017-11-01T00:00:00.000+08:00,2017-11-01T00:03:00.000+08:00],1m) order by device asc,time asc align by device +``` +### 查询对齐模式 + +#### 按设备对齐 +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` +### 查询写回(SELECT INTO) + +#### 整体描述 +```sql +selectIntoStatement + +​ : SELECT + +​ resultColumn [, resultColumn] ... + +​ INTO intoItem [, intoItem] ... + +​ FROM prefixPath [, prefixPath] ... 
+ +​ [WHERE whereCondition] + +​ [GROUP BY groupByTimeClause, groupByLevelClause] + +​ [FILL ({PREVIOUS | LINEAR | constant} (, interval=DURATION_LITERAL)?)] + +​ [LIMIT rowLimit OFFSET rowOffset] + +​ [ALIGN BY DEVICE] + +​ ; + + + +intoItem + +​ : [ALIGNED] intoDevicePath '(' intoMeasurementName [',' intoMeasurementName]* ')' + +​ ; +``` +按时间对齐,将 `root.sg` database 下四条序列的查询结果写入到 `root.sg_copy` database 下指定的四条序列中 +```sql +IoTDB> select s1, s2 into root.sg_copy.d1(t1), root.sg_copy.d2(t1, t2), root.sg_copy.d1(t2) from root.sg.d1, root.sg.d2; +``` +按时间对齐,将聚合查询的结果存储到指定序列中 +```sql +IoTDB> select count(s1 + s2), last_value(s2) into root.agg.count(s1_add_s2), root.agg.last_value(s2) from root.sg.d1 group by ([0, 100), 10ms); +``` +按设备对齐 +```sql +IoTDB> select s1, s2 into root.sg_copy.d1(t1, t2), root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; +``` +按设备对齐,将表达式计算的结果存储到指定序列中 +```sql +IoTDB> select s1 + s2 into root.expr.add(d1s1_d1s2), root.expr.add(d2s1_d2s2) from root.sg.d1, root.sg.d2 align by device; +``` +#### 使用变量占位符 + +##### 按时间对齐(默认) + +###### 目标设备不使用变量占位符 & 目标物理量列表使用变量占位符 +``` + +select s1, s2 + +into root.sg_copy.d1(::), root.sg_copy.d2(s1), root.sg_copy.d1(${3}), root.sg_copy.d2(::) + +from root.sg.d1, root.sg.d2; +``` + +该语句等价于: +``` + +select s1, s2 + +into root.sg_copy.d1(s1), root.sg_copy.d2(s1), root.sg_copy.d1(s2), root.sg_copy.d2(s2) + +from root.sg.d1, root.sg.d2; +``` + +###### 目标设备使用变量占位符 & 目标物理量列表不使用变量占位符 + +``` +select d1.s1, d1.s2, d2.s3, d3.s4 + +into ::(s1_1, s2_2), root.sg.d2_2(s3_3), root.${2}_copy.::(s4) + +from root.sg; +``` + +###### 目标设备使用变量占位符 & 目标物理量列表使用变量占位符 + +``` +select * into root.sg_bk.::(::) from root.sg.**; +``` + +##### 按设备对齐(使用 `ALIGN BY DEVICE`) + +###### 目标设备不使用变量占位符 & 目标物理量列表使用变量占位符 +``` + +select s1, s2, s3, s4 + +into root.backup_sg.d1(s1, s2, s3, s4), root.backup_sg.d2(::), root.sg.d3(backup_${4}) + +from root.sg.d1, root.sg.d2, root.sg.d3 + +align by device; +``` + +###### 目标设备使用变量占位符 & 目标物理量列表不使用变量占位符 +``` + +select avg(s1), sum(s2) + sum(s3), count(s4) + +into root.agg_${2}.::(avg_s1, sum_s2_add_s3, count_s4) + +from root.** + +align by device; +``` + +###### 目标设备使用变量占位符 & 目标物理量列表使用变量占位符 +``` + +select * into ::(backup_${4}) from root.sg.** align by device; +``` + +#### 指定目标序列为对齐序列 +``` + +select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; +``` +## 运维语句 +生成对应的查询计划 +``` +explain select s1,s2 from root.sg.d1 +``` +执行对应的查询语句,并获取分析结果 +``` +explain analyze select s1,s2 from root.sg.d1 order by s1 +``` +## 运算符 + +更多见文档[Operator-and-Expression](./Operator-and-Expression.md) + +### 算数运算符 + +更多见文档 [Arithmetic Operators and Functions](./Operator-and-Expression.md#算数运算符) + +```sql +select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 +``` + +### 比较运算符 + +更多见文档[Comparison Operators and Functions](./Operator-and-Expression.md#比较运算符) + +```sql +# Basic comparison operators +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; + +# `BETWEEN ... 
AND ...` operator +select temperature from root.sg1.d1 where temperature between 36.5 and 40; +select temperature from root.sg1.d1 where temperature not between 36.5 and 40; + +# Fuzzy matching operator: Use `Like` for fuzzy matching +select * from root.sg.d1 where value like '%cc%' +select * from root.sg.device where value like '_b_' + +# Fuzzy matching operator: Use `Regexp` for fuzzy matching +select * from root.sg.d1 where value regexp '^[A-Za-z]+$' +select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 +select b, b like '1%', b regexp '[0-2]' from root.test; + +# `IS NULL` operator +select code from root.sg1.d1 where temperature is null; +select code from root.sg1.d1 where temperature is not null; + +# `IN` operator +select code from root.sg1.d1 where code in ('200', '300', '400', '500'); +select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); +select a, a in (1, 2) from root.test; +``` + +### 逻辑运算符 + +更多见文档[Logical Operators](./Operator-and-Expression.md#逻辑运算符) + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +## 内置函数 + +更多见文档[Operator-and-Expression](./Operator-and-Expression.md#聚合函数) + +### Aggregate Functions + +更多见文档[Aggregate Functions](./Operator-and-Expression.md#聚合函数) + +```sql +select count(status) from root.ln.wf01.wt01; + +select count_if(s1=0 & s2=0, 3), count_if(s1=1 & s2=0, 3) from root.db.d1; +select count_if(s1=0 & s2=0, 3, 'ignoreNull'='false'), count_if(s1=1 & s2=0, 3, 'ignoreNull'='false') from root.db.d1; + +select time_duration(s1) from root.db.d1; +``` + +### 算数函数 + +更多见文档[Arithmetic Operators and Functions](./Operator-and-Expression.md#数学函数) + +```sql +select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; +select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1; +``` + +### 比较函数 + +更多见文档[Comparison Operators and Functions](./Operator-and-Expression.md#比较函数) + +```sql +select ts, on_off(ts, 'threshold'='2') from root.test; +select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; +``` + +### 字符串处理函数 + +更多见文档[String Processing](./Operator-and-Expression.md#字符串函数) + +```sql +select s1, string_contains(s1, 's'='warn') from root.sg1.d4; +select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4; +select s1, length(s1) from root.sg1.d1 +select s1, locate(s1, "target"="1") from root.sg1.d1 +select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1 +select s1, startswith(s1, "target"="1") from root.sg1.d1 +select s1, endswith(s1, "target"="1") from root.sg1.d1 +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1 +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1 +select s1, substring(s1 from 1 for 2) from root.sg1.d1 +select s1, replace(s1, 'es', 'tt') from root.sg1.d1 +select s1, upper(s1) from root.sg1.d1 +select s1, lower(s1) from root.sg1.d1 +select s3, trim(s3) from root.sg1.d1 +select s1, s2, strcmp(s1, s2) from root.sg1.d1 +select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 +select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 +select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 +select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1 +select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1 +select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 
+``` + +### 数据类型转换函数 + +更多见文档[Data Type Conversion Function](./Operator-and-Expression.md#数据类型转换函数) + +```sql +SELECT cast(s1 as INT32) from root.sg +``` + +### 常序列生成函数 + +更多见文档[Constant Timeseries Generating Functions](./Operator-and-Expression.md#常序列生成函数) + +```sql +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; +``` + +### 选择函数 + +更多见文档[Selector Functions](./Operator-and-Expression.md#选择函数) + +```sql +select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; +``` + +### 区间查询函数 + +更多见文档[Continuous Interval Functions](./Operator-and-Expression.md#区间查询函数) + +```sql +select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; +``` + +### 趋势计算函数 + +更多见文档[Variation Trend Calculation Functions](./Operator-and-Expression.md#趋势计算函数) + +```sql +select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; + +SELECT DIFF(s1), DIFF(s2) from root.test; +SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root.test; +``` + +### 采样函数 + +更多见文档[Sample Functions](./Operator-and-Expression.md#采样函数)。 +### 时间序列处理函数 + +更多见文档[Sample Functions](./Operator-and-Expression.md#时间序列处理函数)。 + +```sql +select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; +select equal_size_bucket_agg_sample(temperature, 'type'='avg','proportion'='0.1') as agg_avg, equal_size_bucket_agg_sample(temperature, 'type'='max','proportion'='0.1') as agg_max, equal_size_bucket_agg_sample(temperature,'type'='min','proportion'='0.1') as agg_min, equal_size_bucket_agg_sample(temperature, 'type'='sum','proportion'='0.1') as agg_sum, equal_size_bucket_agg_sample(temperature, 'type'='extreme','proportion'='0.1') as agg_extreme, equal_size_bucket_agg_sample(temperature, 'type'='variance','proportion'='0.1') as agg_variance from root.ln.wf01.wt01; +select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01; +select equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='avg', 'number'='2') as outlier_avg_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='stendis', 'number'='2') as outlier_stendis_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='cos', 'number'='2') as outlier_cos_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='prenextdis', 'number'='2') as outlier_prenextdis_sample from root.ln.wf01.wt01; + +select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1 +select M4(s1,'windowSize'='10') from root.vehicle.d1 +``` + +### 时间序列处理函数 + +更多见文档[Time-Series](./Operator-and-Expression.md#时间序列处理函数) + +```sql +select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 +``` + +## 数据质量函数库 + +更多见文档[UDF-Libraries](./UDF-Libraries_timecho.md) + +### 数据质量 + +更多见文档[Data-Quality](./UDF-Libraries_timecho.md#数据质量) + +```sql +# Completeness +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 + +# Consistency +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +select consistency(s1,"window"="15") from root.test.d1 
where time <= 2020-01-01 00:01:00 + +# Timeliness +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 + +# Validity +select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 + +# Accuracy +select Accuracy(t1,t2,t3,m1,m2,m3) from root.test +``` + +### 数据画像 + +更多见文档[Data-Profiling](./UDF-Libraries_timecho.md#数据画像) + +```sql +# ACF +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 + +# Distinct +select distinct(s2) from root.test.d2 + +# Histogram +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 + +# Integral +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 + +# IntegralAvg +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 + +# Mad +select mad(s0) from root.test +select mad(s0, "error"="0.01") from root.test + +# Median +select median(s0, "error"="0.01") from root.test + +# MinMax +select minmax(s1) from root.test + +# Mode +select mode(s2) from root.test.d2 + +# MvAvg +select mvavg(s1, "window"="3") from root.test + +# PACF +select pacf(s1, "lag"="5") from root.test + +# Percentile +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test + +# Quantile +select quantile(s0, "rank"="0.2", "K"="800") from root.test + +# Period +select period(s1) from root.test.d3 + +# QLB +select QLB(s1) from root.test.d1 + +# Resample +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 + +# Sample +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +select sample(s1,'method'='isometric','k'='5') from root.test.d1 + +# Segment +select segment(s1, "error"="0.1") from root.test + +# Skew +select skew(s1) from root.test.d1 + +# Spline +select spline(s1, "points"="151") from root.test + +# Spread +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 + +# Stddev +select stddev(s1) from root.test.d1 + +# ZScore +select zscore(s1) from root.test +``` + +### 异常检测 + +更多见文档[Anomaly-Detection](./UDF-Libraries_timecho.md#异常检测) + +```sql +# IQR +select iqr(s1) from root.test + +# KSigma +select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 + +# LOF +select lof(s1,s2) from root.test.d1 where time<1000 +select lof(s1, "method"="series") from root.test.d1 where time<1000 + +# MissDetect +select missdetect(s2,'minlen'='10') from root.test.d2 + +# Range +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 + +# TwoSidedFilter +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test + +# Outlier +select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test + +# MasterTrain +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test + +# MasterDetect +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +### 频域分析 + +更多见文档[Frequency-Domain](./UDF-Libraries_timecho.md#频域分析) + +```sql +# Conv +select conv(s1,s2) from root.test.d2 + +# Deconv +select 
deconv(s3,s2) from root.test.d2 +select deconv(s3,s2,'result'='remainder') from root.test.d2 + +# DWT +select dwt(s1,"method"="haar") from root.test.d1 + +# FFT +select fft(s1) from root.test.d1 +select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1 + +# HighPass +select highpass(s1,'wpass'='0.45') from root.test.d1 + +# IFFT +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 + +# LowPass +select lowpass(s1,'wpass'='0.45') from root.test.d1 + +# Envelope +select envelope(s1) from root.test.d1 +``` + +### 数据匹配 + +更多见文档[Data-Matching](./UDF-Libraries_timecho.md#数据匹配) + +```sql +# Cov +select cov(s1,s2) from root.test.d2 + +# DTW +select dtw(s1,s2) from root.test.d2 + +# Pearson +select pearson(s1,s2) from root.test.d2 + +# PtnSym +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 + +# XCorr +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +### 数据修复 + +更多见文档[Data-Repairing](./UDF-Libraries_timecho.md#数据修复) + +```sql +# TimestampRepair +select timestamprepair(s1,'interval'='10000') from root.test.d2 +select timestamprepair(s1) from root.test.d2 + +# ValueFill +select valuefill(s1) from root.test.d2 +select valuefill(s1,"method"="previous") from root.test.d2 + +# ValueRepair +select valuerepair(s1) from root.test.d2 +select valuerepair(s1,'method'='LsGreedy') from root.test.d2 + +# MasterRepair +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test + +# SeasonalRepair +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +### 序列发现 + +更多见文档[Series-Discovery](./UDF-Libraries_timecho.md#序列发现) + +```sql +# ConsecutiveSequences +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +select consecutivesequences(s1,s2) from root.test.d1 + +# ConsecutiveWindows +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +### 机器学习 + +更多见文档[Machine-Learning](./UDF-Libraries_timecho.md#机器学习) + +```sql +# AR +select ar(s0,"p"="2") from root.test.d0 + +# Representation +select representation(s0,"tb"="3","vb"="2") from root.test.d0 + +# RM +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +## Lambda 表达式 + +更多见文档[Lambda](./Operator-and-Expression.md#lambda-表达式) + +```sql +select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01;``` +``` + +## 条件表达式 + +更多见文档[Conditional Expressions](./Operator-and-Expression.md#条件表达式) + +```sql +select T, P, case +when 1000=1050 then "bad temperature" +when P<=1000000 or P>=1100000 then "bad pressure" +end as `result` +from root.test1 + +select str, case +when str like "%cc%" then "has cc" +when str like "%dd%" then "has dd" +else "no cc and dd" end as `result` +from root.test2 + +select +count(case when x<=1 then 1 end) as `(-∞,1]`, +count(case when 1 +[RESAMPLE + [EVERY ] + [BOUNDARY ] + [RANGE [, end_time_offset]] +] +[TIMEOUT POLICY BLOCKED|DISCARD] +BEGIN + SELECT CLAUSE + INTO CLAUSE + FROM CLAUSE + [WHERE CLAUSE] + [GROUP BY([, ]) [, level = ]] + [HAVING CLAUSE] + [FILL ({PREVIOUS | LINEAR | constant} (, interval=DURATION_LITERAL)?)] + [LIMIT rowLimit OFFSET rowOffset] + [ALIGN BY DEVICE] 
+END +``` + +#### 配置连续查询执行的周期性间隔 +```sql +CREATE CONTINUOUS QUERY cq1 + +RESAMPLE EVERY 20s + +BEGIN + + SELECT max_value(temperature) + + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + + FROM root.ln.*.* + + GROUP BY(10s) + +END + + + +\> SELECT temperature_max from root.ln.*.*; +``` +#### 配置连续查询的时间窗口大小 +``` +CREATE CONTINUOUS QUERY cq2 + +RESAMPLE RANGE 40s + +BEGIN + + SELECT max_value(temperature) + + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + + FROM root.ln.*.* + + GROUP BY(10s) + +END + + +\> SELECT temperature_max from root.ln.*.*; +``` +#### 同时配置连续查询执行的周期性间隔和时间窗口大小 +```sql +CREATE CONTINUOUS QUERY cq3 + +RESAMPLE EVERY 20s RANGE 40s + +BEGIN + + SELECT max_value(temperature) + + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + + FROM root.ln.*.* + + GROUP BY(10s) + + FILL(100.0) + +END + + + +\> SELECT temperature_max from root.ln.*.*; +``` +#### 配置连续查询每次查询执行时间窗口的结束时间 +```sql +CREATE CONTINUOUS QUERY cq4 + +RESAMPLE EVERY 20s RANGE 40s, 20s + +BEGIN + + SELECT max_value(temperature) + + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + + FROM root.ln.*.* + + GROUP BY(10s) + + FILL(100.0) + +END + + + +\> SELECT temperature_max from root.ln.*.*; +``` +#### 没有GROUP BY TIME子句的连续查询 +```sql +CREATE CONTINUOUS QUERY cq5 + +RESAMPLE EVERY 20s + +BEGIN + + SELECT temperature + 1 + + INTO root.precalculated_sg.::(temperature) + + FROM root.ln.*.* + + align by device + +END + + + +\> SELECT temperature from root.precalculated_sg.*.* align by device; +``` +### 连续查询的管理 + +#### 查询系统已有的连续查询 + +展示集群中所有的已注册的连续查询 +```sql +SHOW (CONTINUOUS QUERIES | CQS) +``` +```sql +SHOW CONTINUOUS QUERIES; +``` +#### 删除已有的连续查询 + +删除指定的名为cq_id的连续查询: + +```sql +DROP (CONTINUOUS QUERY | CQ) +``` +```sql +DROP CONTINUOUS QUERY s1_count_cq; +``` +#### 作为子查询的替代品 + +1. 创建一个连续查询 +```sql +CREATE CQ s1_count_cq + +BEGIN + +​ SELECT count(s1) + +​ INTO root.sg_count.d.count_s1 + +​ FROM root.sg.d + +​ GROUP BY(30m) + +END +``` +1. 查询连续查询的结果 +```sql +SELECT avg(count_s1) from root.sg_count.d; +``` +## 用户自定义函数 + +### UDFParameters +```sql +SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; +``` +### UDF 注册 + +```sql +CREATE FUNCTION AS (USING URI URI-STRING)? 
+``` + +#### 不指定URI +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' +``` +#### 指定URI +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar' +``` +### UDF 卸载 + +```sql +DROP FUNCTION +``` +```sql +DROP FUNCTION example +``` +### UDF 查询 + +#### 带自定义输入参数的查询 +```sql +SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; +``` +```sql +SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; +``` +#### 与其他查询的嵌套查询 +```sql +SELECT s1, s2, example(s1, s2) FROM root.sg.d1; + +SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; + +SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; + +SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; +``` +### 查看所有注册的 UDF +```sql +SHOW FUNCTIONS +``` +## 权限管理 + +### 用户与角色相关 + +- 创建用户(需 MANAGE_USER 权限) + + +```SQL +CREATE USER +eg: CREATE USER user1 'passwd' +``` + +- 删除用户 (需 MANEGE_USER 权限) + + +```SQL +DROP USER +eg: DROP USER user1 +``` + +- 创建角色 (需 MANAGE_ROLE 权限) + +```SQL +CREATE ROLE +eg: CREATE ROLE role1 +``` + +- 删除角色 (需 MANAGE_ROLE 权限) + + +```SQL +DROP ROLE +eg: DROP ROLE role1 +``` + +- 赋予用户角色 (需 MANAGE_ROLE 权限) + + +```SQL +GRANT ROLE TO +eg: GRANT ROLE admin TO user1 +``` + +- 移除用户角色 (需 MANAGE_ROLE 权限) + + +```SQL +REVOKE ROLE FROM +eg: REVOKE ROLE admin FROM user1 +``` + +- 列出所有用户 (需 MANEGE_USER 权限) + +```SQL +LIST USER +``` + +- 列出所有角色 (需 MANAGE_ROLE 权限) + +```SQL +LIST ROLE +``` + +- 列出指定角色下所有用户 (需 MANEGE_USER 权限) + +```SQL +LIST USER OF ROLE +eg: LIST USER OF ROLE roleuser +``` + +- 列出指定用户下所有角色 + +用户可以列出自己的角色,但列出其他用户的角色需要拥有 MANAGE_ROLE 权限。 + +```SQL +LIST ROLE OF USER +eg: LIST ROLE OF USER tempuser +``` + +- 列出用户所有权限 + +用户可以列出自己的权限信息,但列出其他用户的权限需要拥有 MANAGE_USER 权限。 + +```SQL +LIST PRIVILEGES OF USER ; +eg: LIST PRIVILEGES OF USER tempuser; + +``` + +- 列出角色所有权限 + +用户可以列出自己具有的角色的权限信息,列出其他角色的权限需要有 MANAGE_ROLE 权限。 + +```SQL +LIST PRIVILEGES OF ROLE ; +eg: LIST PRIVILEGES OF ROLE actor; +``` + +- 更新密码 + +用户可以更新自己的密码,但更新其他用户密码需要具备MANAGE_USER 权限。 + +```SQL +ALTER USER SET PASSWORD ; +eg: ALTER USER tempuser SET PASSWORD 'newpwd'; +``` + +### 授权与取消授权 + +用户使用授权语句对赋予其他用户权限,语法如下: + +```SQL +GRANT ON TO ROLE/USER [WITH GRANT OPTION]; +eg: GRANT READ ON root.** TO ROLE role1; +eg: GRANT READ_DATA, WRITE_DATA ON root.t1.** TO USER user1; +eg: GRANT READ_DATA, WRITE_DATA ON root.t1.**,root.t2.** TO USER user1; +eg: GRANT MANAGE_ROLE ON root.** TO USER user1 WITH GRANT OPTION; +eg: GRANT ALL ON root.** TO USER user1 WITH GRANT OPTION; +``` + +用户使用取消授权语句可以将其他的权限取消,语法如下: + +```SQL +REVOKE ON FROM ROLE/USER ; +eg: REVOKE READ ON root.** FROM ROLE role1; +eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.** FROM USER user1; +eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.**, root.t2.** FROM USER user1; +eg: REVOKE MANAGE_ROLE ON root.** FROM USER user1; +eg: REVOKE ALL ON ROOT.** FROM USER user1; +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_apache.md b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_apache.md new file mode 100644 index 00000000..7112666c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_apache.md @@ -0,0 +1,5346 @@ + +# UDF函数库 + +基于用户自定义函数能力,IoTDB 提供了一系列关于时序数据处理的函数,包括数据质量、数据画像、异常检测、 频域分析、数据匹配、数据修复、序列发现、机器学习等,能够满足工业领域对时序数据处理的需求。 + +> 注意:当前UDF函数库中的函数仅支持毫秒级的时间戳精度。 + +## 安装步骤 +1. 
请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。 + + | UDF 函数库版本 | 支持的 IoTDB 版本 | 下载链接 | + | --------------- | ----------------- | ------------------------------------------------------------ | + | UDF-1.3.3.zip | V1.3.3及以上 | 请联系天谋商务获取 | + | UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系天谋商务获取 | + +2. 将获取的压缩包中的 library-udf.jar 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下 +3. 在 IoTDB 的 SQL 命令行终端(CLI)或可视化控制台(Workbench)的 SQL 操作界面中,执行下述相应的函数注册语句。 +4. 批量注册:两种注册方式:注册脚本 或 SQL汇总语句 +- 注册脚本 + - 将压缩包中的注册脚本(register-UDF.sh 或 register-UDF.bat)按需复制到 IoTDB 的 tools 目录下,修改脚本中的参数(默认为host=127.0.0.1,rpcPort=6667,user=root,pass=root); + - 启动 IoTDB 服务,运行注册脚本批量注册 UDF + +- SQL汇总语句 + - 打开压缩包中的SQl文件,复制全部 SQL 语句,在 IoTDB 的 SQL 命令行终端(CLI)或可视化控制台(Workbench)的 SQL 操作界面中,执行全部 SQl 语句批量注册 UDF + +## 数据质量 + +### Completeness + +#### 注册语句 + +```sql +create function completeness as 'org.apache.iotdb.library.dquality.UDTFCompleteness' +``` + +#### 函数简介 + +本函数用于计算时间序列的完整性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的完整性,并输出窗口第一个数据点的时间戳和窗口的完整性。 + +**函数名:** COMPLETENESS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 ++ `downtime`:完整性计算是否考虑停机异常。它的取值为 'true' 或 'false',默认值为 'true'. 在考虑停机异常时,长时间的数据缺失将被视作停机,不对完整性产生影响。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行完整性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算完整性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算完整性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| 
+|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +### Consistency + +#### 注册语句 + +```sql +create function consistency as 'org.apache.iotdb.library.dquality.UDTFConsistency' +``` + +#### 函数简介 + +本函数用于计算时间序列的一致性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的一致性,并输出窗口第一个数据点的时间戳和窗口的时效性。 + +**函数名:** CONSISTENCY + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行一致性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算一致性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算一致性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| 
+|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +### Timeliness + +#### 注册语句 + +```sql +create function timeliness as 'org.apache.iotdb.library.dquality.UDTFTimeliness' +``` + +#### 函数简介 + +本函数用于计算时间序列的时效性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的时效性,并输出窗口第一个数据点的时间戳和窗口的时效性。 + +**函数名:** TIMELINESS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. 
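+
+`window` 参数除了填写数据点数,也可以填写带单位的时间跨度。下面给出一个按时间跨度设窗的示意查询(假设仍以 root.test.d1.s1 为输入,实际输出取决于数据分布):
+
+```sql
+select timeliness(s1, "window"="1m") from root.test.d1
+```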
+ +**提示:** 只有当窗口内的数据点数目超过10时,才会进行时效性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算时效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算时效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| ++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +### Validity + +#### 注册语句 + +```sql +create function validity as 'org.apache.iotdb.library.dquality.UDTFValidity' +``` + +#### 函数简介 + +本函数用于计算时间序列的有效性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的有效性,并输出窗口第一个数据点的时间戳和窗口的有效性。 + + +**函数名:** VALIDITY + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / 
FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行有效性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算有效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算有效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|validity(root.test.d1.s1, "window"="15")| ++-----------------------------+----------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + + + + +## 数据画像 + +### ACF + +#### 注册语句 + +```sql +create 
function acf as 'org.apache.iotdb.library.dprofile.UDTFACF' +``` + +#### 函数简介 + +本函数用于计算时间序列的自相关函数值,即序列与自身之间的互相关函数。 + +**函数名:** ACF + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点。 + +**提示:** + ++ 序列中的`NaN`值会被忽略,在计算中表现为0。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +### Distinct + +#### 注册语句 + +```sql +create function distinct as 'org.apache.iotdb.library.dprofile.UDTFDistinct' +``` + +#### 函数简介 + +本函数可以返回输入序列中出现的所有不同的元素。 + +**函数名:** DISTINCT + +**输入序列:** 仅支持单个输入序列,类型可以是任意的 + +**输出序列:** 输出单个序列,类型与输入相同。 + +**提示:** + ++ 输出序列的时间戳是无意义的。输出顺序是任意的。 ++ 缺失值和空值将被忽略,但`NaN`不会被忽略。 ++ 字符串区分大小写 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select distinct(s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +### Histogram + +#### 注册语句 + +```sql +create function histogram as 'org.apache.iotdb.library.dprofile.UDTFHistogram' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的分布直方图。 + +**函数名:** HISTOGRAM + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `min`:表示所求数据范围的下限,默认值为 -Double.MAX_VALUE。 ++ `max`:表示所求数据范围的上限,默认值为 Double.MAX_VALUE,`start`的值必须小于或等于`end`。 ++ `count`: 表示直方图分桶的数量,默认值为 1,其值必须为正整数。 + +**输出序列:** 直方图分桶的值,其中第 i 个桶(从 1 开始计数)表示的数据范围下界为$min+ (i-1)\cdot\frac{max-min}{count}$,数据范围上界为$min+ i \cdot \frac{max-min}{count}$。 + + +**提示:** + ++ 如果某个数据点的数值小于`min`,它会被放入第 1 个桶;如果某个数据点的数值大于`max`,它会被放入最后 1 个桶。 ++ 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| 
+|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +### Integral + +#### 注册语句 + +```sql +create function integral as 'org.apache.iotdb.library.dprofile.UDAFIntegral' +``` + +#### 函数简介 + +本函数用于计算时间序列的数值积分,即以时间为横坐标、数值为纵坐标绘制的折线图中折线以下的面积。 + +**函数名:** INTEGRAL + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `unit`:积分求解所用的时间轴单位,取值为 "1S", "1s", "1m", "1H", "1d"(区分大小写),分别表示以毫秒、秒、分钟、小时、天为单位计算积分。 + 缺省情况下取 "1s",以秒为单位。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为积分结果的数据点。 + +**提示:** + ++ 积分值等于折线图中每相邻两个数据点和时间轴形成的直角梯形的面积之和,不同时间单位下相当于横轴进行不同倍数放缩,得到的积分值可直接按放缩倍数转换。 + ++ 数据中`NaN`将会被忽略。折线将以临近两个有值数据点为准。 + +#### 使用示例 + +##### 参数缺省 + +缺省情况下积分以1s为时间单位。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +其计算公式为: +$$\frac{1}{2}[(1+2)\times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + + +##### 指定时间单位 + +指定以分钟为时间单位。 + + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +其计算公式为: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+3) \times 1 + (5+6) \times 1 + (6+7) 
\times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +### IntegralAvg + +#### 注册语句 + +```sql +create function integralavg as 'org.apache.iotdb.library.dprofile.UDAFIntegralAvg' +``` + +#### 函数简介 + +本函数用于计算时间序列的函数均值,即在相同时间单位下的数值积分除以序列总的时间跨度。更多关于数值积分计算的信息请参考`Integral`函数。 + +**函数名:** INTEGRALAVG + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为时间加权平均结果的数据点。 + +**提示:** + ++ 时间加权的平均值等于在任意时间单位`unit`下计算的数值积分(即折线图中每相邻两个数据点和时间轴形成的直角梯形的面积之和), + 除以相同时间单位下输入序列的时间跨度,其值与具体采用的时间单位无关,默认与 IoTDB 时间单位一致。 + ++ 数据中的`NaN`将会被忽略。折线将以临近两个有值数据点为准。 + ++ 输入序列为空时,函数输出结果为 0;仅有一个数据点时,输出结果为该点数值。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +其计算公式为: +$$\frac{1}{2}[(1+2)\times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +### Mad + +#### 注册语句 + +```sql +create function mad as 'org.apache.iotdb.library.dprofile.UDAFMad' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似绝对中位差,绝对中位差为所有数值与其中位数绝对偏移量的中位数。 + +如有数据集$\{1,3,3,5,5,6,7,8,9\}$,其中位数为5,所有数值与中位数的偏移量的绝对值为$\{0,0,1,2,2,2,3,4,4\}$,其中位数为2,故而原数据集的绝对中位差为2。 + +**函数名:** MAD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `error`:近似绝对中位差的基于数值的误差百分比,取值范围为 [0,1),默认值为 0。如当`error`=0.01 时,记精确绝对中位差为a,近似绝对中位差为b,不等式 $0.99a \le b \le 1.01a$ 成立。当`error`=0 时,计算结果为精确绝对中位差。 + + +**输出序列:** 输出单个序列,类型为DOUBLE,序列仅包含一个时间戳为 0、值为绝对中位差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +##### 精确查询 + +当`error`参数缺省或为0时,本函数计算精确绝对中位差。 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +............ 
+Total line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select mad(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|median(root.test.s1, "error"="0")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +##### 近似查询 + +当`error`参数取值不为 0 时,本函数计算近似绝对中位差。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select mad(s1, "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s1, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9900000000000001| ++-----------------------------+---------------------------------+ +``` + +### Median + +#### 注册语句 + +```sql +create function median as 'org.apache.iotdb.library.dprofile.UDAFMedian' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似中位数。中位数是顺序排列的一组数据中居于中间位置的数;当序列有偶数个时,中位数为中间二者的平均数。 + +**函数名:** MEDIAN + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `error`:近似中位数的基于排名的误差百分比,取值范围 [0,1),默认值为 0。如当`error`=0.01 时,计算出的中位数的真实排名百分比在 0.49~0.51 之间。当`error`=0 时,计算结果为精确中位数。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为中位数的数据点。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +Total line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select median(s1, "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|median(root.test.s1, "error"="0.01")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+------------------------------------+ +``` + +### MinMax + +#### 注册语句 + +```sql +create function minmax as 'org.apache.iotdb.library.dprofile.UDTFMinMax' +``` + +#### 函数简介 + +本函数将输入序列使用 min-max 方法进行标准化。最小值归一至 0,最大值归一至 1. 
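+
+按该方法,序列中每个值 $x_i$ 的标准化结果可示意为(记序列最小值为 $\min$、最大值为 $\max$):
+
+$$y_i = \frac{x_i - \min}{\max - \min}$$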
+ +**函数名:** MINMAX + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `compute`:若设置为"batch",则将数据全部读入后转换;若设置为 "stream",则需用户提供最大值及最小值进行流式计算转换。默认为 "batch"。 ++ `min`:使用流式计算时的最小值。 ++ `max`:使用流式计算时的最大值。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select minmax(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|minmax(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.200+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.300+08:00| 0.25| +|1970-01-01T08:00:00.400+08:00| 0.08333333333333333| +|1970-01-01T08:00:00.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.700+08:00| 0.0| +|1970-01-01T08:00:00.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.900+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.100+08:00| 0.25| +|1970-01-01T08:00:01.200+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.300+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.400+08:00| 0.25| +|1970-01-01T08:00:01.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.700+08:00| 1.0| +|1970-01-01T08:00:01.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| 0.16666666666666666| ++-----------------------------+--------------------+ +``` + + + +### MvAvg + +#### 注册语句 + +```sql +create function mvavg as 'org.apache.iotdb.library.dprofile.UDTFMvAvg' +``` + +#### 函数简介 + +本函数计算序列的移动平均。 + +**函数名:** MVAVG + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:移动窗口的长度。默认值为 10. 
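+
+设窗口长度为 $w$,则从第 $w$ 个点起,第 $t$ 个数据点的移动平均可示意为窗口内各点的算术平均(示意公式,具体窗口对齐方式以实际输出为准):
+
+$$y_t = \frac{1}{w}\sum_{i=t-w+1}^{t} x_i$$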
+ +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 指定窗口长度 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +### PACF + +#### 注册语句 + +```sql +create function pacf as 'org.apache.iotdb.library.dprofile.UDTFPACF' +``` + +#### 函数简介 + +本函数通过求解 Yule-Walker 方程,计算序列的偏自相关系数。对于特殊的输入序列,方程可能没有解,此时输出`NaN`。 + +**函数名:** PACF + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `lag`:最大滞后阶数。默认值为$\min(10\log_{10}n,n-1)$,$n$表示数据点个数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 指定滞后阶数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select pacf(s1, "lag"="5") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------+ +| Time|pacf(root.test.d1.s1, "lag"="5")| ++-----------------------------+--------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| -0.5744680851063829| +|2020-01-01T00:00:03.000+08:00| 0.3172297297297296| +|2020-01-01T00:00:04.000+08:00| -0.2977686586304181| +|2020-01-01T00:00:05.000+08:00| -2.0609033521065867| ++-----------------------------+--------------------------------+ +``` + +### 
Percentile + +#### 注册语句 + +```sql +create function percentile as 'org.apache.iotdb.library.dprofile.UDAFPercentile' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似分位数。 + +**函数名:** PERCENTILE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `rank`:所求分位数在所有数据中的排名百分比,取值范围为 (0,1],默认值为 0.5。如当设为 0.5时则计算中位数。 ++ `error`:近似分位数的基于排名的误差百分比,取值范围为 [0,1),默认值为0。如`rank`=0.5 且`error`=0.01,则计算出的分位数的真实排名百分比在 0.49~0.51之间。当`error`=0 时,计算结果为精确分位数。 + +**输出序列:** 输出单个序列,类型与输入序列相同。当`error`=0时,序列仅包含一个时间戳为分位数第一次出现的时间戳、值为分位数的数据点;否则,输出值的时间戳为0。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +用于查询的 SQL 语句: + +```sql +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|percentile(root.test.s0, "rank"="0.2", "error"="0.01")| ++-----------------------------+------------------------------------------------------+ +|2021-03-17T10:35:02.054+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +```输入序列: + +``` ++-----------------------------+-------------+ +| Time|root.test2.s1| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+-------------+ +............ 
+Total line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select percentile(s1, "rank"="0.2", "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|percentile(root.test2.s1, "rank"="0.2", "error"="0.01")| ++-----------------------------+-------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| -1.0| ++-----------------------------+-------------------------------------------------------+ +``` + + +### Quantile + +#### 注册语句 + +```sql +create function quantile as 'org.apache.iotdb.library.dprofile.UDAFQuantile' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的近似分位数。本函数基于KLL sketch算法实现。 + +**函数名:** QUANTILE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `rank`:所求分位数在所有数据中的排名比,取值范围为 (0,1],默认值为 0.5。如当设为 0.5时则计算近似中位数。 ++ `K`:允许维护的KLL sketch大小,最小值为100,默认值为800。如`rank`=0.5 且`K`=800,则计算出的分位数的真实排名比有至少99%的可能性在 0.49~0.51之间。 + +**输出序列:** 输出单个序列,类型与输入序列相同。输出值的时间戳为0。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + + +输入序列: + +``` ++-----------------------------+-------------+ +| Time|root.test1.s1| ++-----------------------------+-------------+ +|2021-03-17T10:32:17.054+08:00| 7| +|2021-03-17T10:32:18.054+08:00| 15| +|2021-03-17T10:32:19.054+08:00| 36| +|2021-03-17T10:32:20.054+08:00| 39| +|2021-03-17T10:32:21.054+08:00| 40| +|2021-03-17T10:32:22.054+08:00| 41| +|2021-03-17T10:32:23.054+08:00| 20| +|2021-03-17T10:32:24.054+08:00| 18| ++-----------------------------+-------------+ +............ +Total line number = 8 +``` + +用于查询的 SQL 语句: + +```sql +select quantile(s1, "rank"="0.2", "K"="800") from root.test1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------+ +| Time|quantile(root.test1.s1, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.000000000000001| ++-----------------------------+------------------------------------------------+ +``` + +### Period + +#### 注册语句 + +```sql +create function period as 'org.apache.iotdb.library.dprofile.UDAFPeriod' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的周期。 + +**函数名:** PERIOD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为 INT32,序列仅包含一个时间戳为 0、值为周期的数据点。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select period(s1) from root.test.d3 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +### QLB + +#### 注册语句 + +```sql +create function qlb as 'org.apache.iotdb.library.dprofile.UDTFQLB' +``` + +#### 函数简介 + +本函数对输入序列计算$Q_{LB} $统计量,并计算对应的p值。p值越小表明序列越有可能为非平稳序列。 + +**函数名:** QLB + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `lag`:计算时用到的最大延迟阶数,取值应为 1 至 n-2 之间的整数,n 为序列采样总数。默认取 n-2。 + +**输出序列:** 输出单个序列,类型为 
DOUBLE。该序列是$Q_{LB} $统计量对应的 p 值,时间标签代表偏移阶数。 + +**提示:** $Q_{LB} $统计量由自相关系数求得,如需得到统计量而非 p 值,可以使用 ACF 函数。 + +#### 使用示例 + +##### 使用默认参数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select QLB(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| +|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +### Resample + +#### 注册语句 + +```sql +create function re_sample as 'org.apache.iotdb.library.dprofile.UDTFResample' +``` + +#### 函数简介 + +本函数对输入序列按照指定的频率进行重采样,包括上采样和下采样。目前,本函数支持的上采样方法包括`NaN`填充法 (NaN)、前值填充法 (FFill)、后值填充法 (BFill) 以及线性插值法 (Linear);本函数支持的下采样方法为分组聚合,聚合方法包括最大值 (Max)、最小值 (Min)、首值 (First)、末值 (Last)、平均值 (Mean)和中位数 (Median)。 + +**函数名:** RESAMPLE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `every`:重采样频率,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。该参数不允许缺省。 ++ `interp`:上采样的插值方法,取值为 'NaN'、'FFill'、'BFill' 或 'Linear'。在缺省情况下,使用`NaN`填充法。 ++ `aggr`:下采样的聚合方法,取值为 'Max'、'Min'、'First'、'Last'、'Mean' 或 'Median'。在缺省情况下,使用平均数聚合。 ++ `start`:重采样的起始时间(包含),是一个格式为 'yyyy-MM-dd HH:mm:ss' 的时间字符串。在缺省情况下,使用第一个有效数据点的时间戳。 ++ `end`:重采样的结束时间(不包含),是一个格式为 'yyyy-MM-dd HH:mm:ss' 的时间字符串。在缺省情况下,使用最后一个有效数据点的时间戳。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。该序列按照重采样频率严格等间隔分布。 + +**提示:** 数据中的`NaN`将会被忽略。 + +#### 使用示例 + +##### 上采样 + +当重采样频率高于数据原始频率时,将会进行上采样。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| 
+|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +##### 下采样 + +当重采样频率低于数据原始频率时,将会进行下采样。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + +###### 指定重采样时间段 + +可以使用`start`和`end`两个参数指定重采样的时间段,超出实际时间范围的部分会被插值填补。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +### Sample + +#### 注册语句 + +```sql +create function sample as 'org.apache.iotdb.library.dprofile.UDTFSample' +``` + +#### 函数简介 + +本函数对输入序列进行采样,即从输入序列中选取指定数量的数据点并输出。目前,本函数支持三种采样方法:**蓄水池采样法 (reservoir sampling)** 对数据进行随机采样,所有数据点被采样的概率相同;**等距采样法 (isometric sampling)** 按照相等的索引间隔对数据进行采样,**最大三角采样法 (triangle sampling)** 对所有数据会按采样率分桶,每个桶内会计算数据点间三角形面积,并保留面积最大的点,该算法通常用于数据的可视化展示中,采用过程可以保证一些关键的突变点在采用中得到保留,更多抽样算法细节可以阅读论文 [here](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf)。 + +**函数名:** SAMPLE + +**输入序列:** 仅支持单个输入序列,类型可以是任意的。 + +**参数:** + ++ `method`:采样方法,取值为 'reservoir','isometric' 或 'triangle' 。在缺省情况下,采用蓄水池采样法。 ++ `k`:采样数,它是一个正整数,在缺省情况下为 1。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列的长度为采样数,序列中的每一个数据点都来自于输入序列。 + +**提示:** 
如果采样数大于序列长度,那么输入序列中所有的数据点都会被输出。 + +#### 使用示例 + + +##### 蓄水池采样 + +当`method`参数为 'reservoir' 或缺省时,采用蓄水池采样法对输入序列进行采样。由于该采样方法具有随机性,下面展示的输出序列只是一种可能的结果。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + + +##### 等距采样 + +当`method`参数为 'isometric' 时,采用等距采样法对输入序列进行采样。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +### Segment + +#### 注册语句 + +```sql +create function segment as 'org.apache.iotdb.library.dprofile.UDTFSegment' +``` + +#### 函数简介 + +本函数按照数据的线性变化趋势将数据划分为多个子序列,返回分段直线拟合后的子序列首值或所有拟合值。 + +**函数名:** SEGMENT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `output`:"all" 输出所有拟合值;"first" 输出子序列起点拟合值。默认为 "first"。 + ++ `error`:判定存在线性趋势的误差允许阈值。误差的定义为子序列进行线性拟合的误差的绝对值的均值。默认为 0.1. 
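+
+即对长度为 $n$ 的子序列,记其线性拟合值为 $\hat{x}_i$,误差可示意为:
+
+$$\mathrm{error} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i-\hat{x}_i\right|$$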
+ +**输出序列:** 输出单个序列,类型为 DOUBLE。 + +**提示:** 函数默认所有数据等时间间隔分布。函数读取所有数据,若原始数据过多,请先进行降采样处理。拟合采用自底向上方法,子序列的尾值可能会被认作子序列首值输出。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select segment(s1,"error"="0.1") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +### Skew + +#### 注册语句 + +```sql +create function skew as 'org.apache.iotdb.library.dprofile.UDAFSkew' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的总体偏度 + +**函数名:** SKEW + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为总体偏度的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 
10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select skew(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +### Spline + +#### 注册语句 + +```sql +create function spline as 'org.apache.iotdb.library.dprofile.UDTFSpline' +``` + +#### 函数简介 + +本函数提供对原始序列进行三次样条曲线拟合后的插值重采样。 + +**函数名:** SPLINE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `points`:重采样个数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +**提示**:输出序列保留输入序列的首尾值,等时间间隔采样。仅当输入点个数不少于 4 个时才计算插值。 + +#### 使用示例 + +##### 指定插值个数 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select spline(s1, "points"="151") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| 
+|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| +|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 
2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +### Spread + +#### 注册语句 + +```sql +create function spread as 'org.apache.iotdb.library.dprofile.UDAFSpread' +``` + +#### 函数简介 + +本函数用于计算时间序列的极差,即最大值减去最小值的结果。 + +**函数名:** SPREAD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型与输入相同,序列仅包含一个时间戳为 0 、值为极差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 
+|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + + + +### ZScore + +#### 注册语句 + +```sql +create function zscore as 'org.apache.iotdb.library.dprofile.UDTFZScore' +``` + +#### 函数简介 + +本函数将输入序列使用z-score方法进行归一化。 + +**函数名:** ZSCORE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `compute`:若设置为 "batch",则将数据全部读入后转换;若设置为 "stream",则需用户提供均值及方差进行流式计算转换。默认为 "batch"。 ++ `avg`:使用流式计算时的均值。 ++ `sd`:使用流式计算时的标准差。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select zscore(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` + + + +## 异常检测 + +### IQR + +#### 注册语句 + +```sql +create function iqr as 'org.apache.iotdb.library.anomaly.UDTFIQR' +``` + +#### 函数简介 + +本函数用于检验超出上下四分位数1.5倍IQR的数据分布异常。 + +**函数名:** IQR + 
+**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:若设置为 "batch",则将数据全部读入后检测;若设置为 "stream",则需用户提供上下四分位数进行流式检测。默认为 "batch"。 ++ `q1`:使用流式计算时的下四分位数。 ++ `q3`:使用流式计算时的上四分位数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +**说明**:$IQR=Q_3-Q_1$ + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select iqr(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+-----------------+ +| Time|iqr(root.test.s1)| ++-----------------------------+-----------------+ +|1970-01-01T08:00:01.700+08:00| 10.0| ++-----------------------------+-----------------+ +``` + +### KSigma + +#### 注册语句 + +```sql +create function ksigma as 'org.apache.iotdb.library.anomaly.UDTFKSigma' +``` + +#### 函数简介 + +本函数利用动态 K-Sigma 算法进行异常检测。在一个窗口内,与平均值的差距超过k倍标准差的数据将被视作异常并输出。 + +**函数名:** KSIGMA + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `k`:在动态 K-Sigma 算法中,分布异常的标准差倍数阈值,默认值为 3。 ++ `window`:动态 K-Sigma 算法的滑动窗口大小,默认值为 10000。 + + +**输出序列:** 输出单个序列,类型与输入序列相同。 + +**提示:** k 应大于 0,否则将不做输出。 + +#### 使用示例 + +##### 指定k + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:04.000+08:00| 100.0| +|2020-01-01T00:00:06.000+08:00| 150.0| +|2020-01-01T00:00:08.000+08:00| 200.0| +|2020-01-01T00:00:10.000+08:00| 200.0| +|2020-01-01T00:00:14.000+08:00| 200.0| +|2020-01-01T00:00:15.000+08:00| 200.0| +|2020-01-01T00:00:16.000+08:00| 200.0| +|2020-01-01T00:00:18.000+08:00| 200.0| +|2020-01-01T00:00:20.000+08:00| 150.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +|Time |ksigma(root.test.d1.s1,"k"="3.0")| ++-----------------------------+---------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +### LOF + +#### 注册语句 + +```sql +create function LOF as 'org.apache.iotdb.library.anomaly.UDTFLOF' +``` + +#### 函数简介 + +本函数使用局部离群点检测方法用于查找序列的密度异常。将根据提供的第k距离数及局部离群点因子(lof)阈值,判断输入数据是否为离群点,即异常,并输出各点的 LOF 值。 + +**函数名:** LOF + +**输入序列:** 多个输入序列,类型为 
INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:使用的检测方法。默认为 default,以高维数据计算。设置为 series,将一维时间序列转换为高维数据计算。 ++ `k`:使用第k距离计算局部离群点因子.默认为 3。 ++ `window`:每次读取数据的窗口长度。默认为 10000. ++ `windowsize`:使用series方法时,转化高维数据的维数,即单个窗口的大小。默认为 5。 + +**输出序列:** 输出单时间序列,类型为DOUBLE。 + +**提示:** 不完整的数据行会被忽略,不参与计算,也不标记为离群点。 + + +#### 使用示例 + +##### 默认参数 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +##### 诊断一维时间序列 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| 
+|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 3.77777777777778| ++-----------------------------+--------------------+ +``` + +### MissDetect + +#### 注册语句 + +```sql +create function missdetect as 'org.apache.iotdb.library.anomaly.UDTFMissDetect' +``` + +#### 函数简介 + +本函数用于检测数据中的缺失异常。在一些数据中,缺失数据会被线性插值填补,在数据中出现完美的线性片段,且这些片段往往长度较大。本函数通过在数据中发现这些完美线性片段来检测缺失异常。 + +**函数名:** MISSDETECT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `minlen`:被标记为异常的完美线性片段的最小长度,是一个大于等于 10 的整数,默认值为 10。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN,即该数据点是否为缺失异常。 + +**提示:** 数据中的`NaN`将会被忽略。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 0.0| +|2021-07-01T12:00:01.000+08:00| 1.0| +|2021-07-01T12:00:02.000+08:00| 0.0| +|2021-07-01T12:00:03.000+08:00| 1.0| +|2021-07-01T12:00:04.000+08:00| 0.0| +|2021-07-01T12:00:05.000+08:00| 0.0| +|2021-07-01T12:00:06.000+08:00| 0.0| +|2021-07-01T12:00:07.000+08:00| 0.0| +|2021-07-01T12:00:08.000+08:00| 0.0| +|2021-07-01T12:00:09.000+08:00| 0.0| +|2021-07-01T12:00:10.000+08:00| 0.0| +|2021-07-01T12:00:11.000+08:00| 0.0| +|2021-07-01T12:00:12.000+08:00| 0.0| +|2021-07-01T12:00:13.000+08:00| 0.0| +|2021-07-01T12:00:14.000+08:00| 0.0| +|2021-07-01T12:00:15.000+08:00| 0.0| +|2021-07-01T12:00:16.000+08:00| 1.0| +|2021-07-01T12:00:17.000+08:00| 0.0| +|2021-07-01T12:00:18.000+08:00| 1.0| +|2021-07-01T12:00:19.000+08:00| 0.0| +|2021-07-01T12:00:20.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select missdetect(s2,'minlen'='10') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+ +| Time|missdetect(root.test.d2.s2, "minlen"="10")| ++-----------------------------+------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| false| +|2021-07-01T12:00:01.000+08:00| false| +|2021-07-01T12:00:02.000+08:00| false| +|2021-07-01T12:00:03.000+08:00| false| +|2021-07-01T12:00:04.000+08:00| true| +|2021-07-01T12:00:05.000+08:00| true| +|2021-07-01T12:00:06.000+08:00| true| +|2021-07-01T12:00:07.000+08:00| true| +|2021-07-01T12:00:08.000+08:00| true| +|2021-07-01T12:00:09.000+08:00| true| +|2021-07-01T12:00:10.000+08:00| true| +|2021-07-01T12:00:11.000+08:00| true| +|2021-07-01T12:00:12.000+08:00| true| +|2021-07-01T12:00:13.000+08:00| true| +|2021-07-01T12:00:14.000+08:00| true| +|2021-07-01T12:00:15.000+08:00| true| +|2021-07-01T12:00:16.000+08:00| false| +|2021-07-01T12:00:17.000+08:00| false| +|2021-07-01T12:00:18.000+08:00| false| +|2021-07-01T12:00:19.000+08:00| false| +|2021-07-01T12:00:20.000+08:00| false| ++-----------------------------+------------------------------------------+ +``` + +### Range + +#### 注册语句 + +```sql +create function range as 'org.apache.iotdb.library.anomaly.UDTFRange' +``` + +#### 函数简介 + +本函数用于查找时间序列的范围异常。将根据提供的上界与下界,判断输入数据是否越界,即异常,并输出所有异常点为新的时间序列。 + +**函数名:** RANGE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `lower_bound`:范围异常检测的下界。 ++ `upper_bound`:范围异常检测的上界。 + +**输出序列:** 输出单个序列,类型与输入序列相同。 + +**提示:** 应满足`upper_bound`大于`lower_bound`,否则将不做输出。 + + +#### 使用示例 + +##### 指定上界与下界 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| 
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00|          100.0|
+|2020-01-01T00:00:03.000+08:00|          101.0|
+|2020-01-01T00:00:04.000+08:00|          102.0|
+|2020-01-01T00:00:06.000+08:00|          104.0|
+|2020-01-01T00:00:08.000+08:00|          126.0|
+|2020-01-01T00:00:10.000+08:00|          108.0|
+|2020-01-01T00:00:14.000+08:00|          112.0|
+|2020-01-01T00:00:15.000+08:00|          113.0|
+|2020-01-01T00:00:16.000+08:00|          114.0|
+|2020-01-01T00:00:18.000+08:00|          116.0|
+|2020-01-01T00:00:20.000+08:00|          118.0|
+|2020-01-01T00:00:22.000+08:00|          120.0|
+|2020-01-01T00:00:26.000+08:00|          124.0|
+|2020-01-01T00:00:28.000+08:00|          126.0|
+|2020-01-01T00:00:30.000+08:00|            NaN|
++-----------------------------+---------------+
+```
+
+用于查询的 SQL 语句:
+
+```sql
+select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30
+```
+
+输出序列:
+
+```
++-----------------------------+------------------------------------------------------------------+
+|Time                         |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")|
++-----------------------------+------------------------------------------------------------------+
+|2020-01-01T00:00:02.000+08:00|                                                              100.0|
+|2020-01-01T00:00:08.000+08:00|                                                              126.0|
+|2020-01-01T00:00:28.000+08:00|                                                              126.0|
++-----------------------------+------------------------------------------------------------------+
+```
+
+### TwoSidedFilter
+
+#### 注册语句
+
+```sql
+create function twosidedfilter as 'org.apache.iotdb.library.anomaly.UDTFTwoSidedFilter'
+```
+
+#### 函数简介
+
+本函数基于双边窗口检测法对输入序列中的异常点进行过滤。
+
+**函数名:** TWOSIDEDFILTER
+
+**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE
+
+**输出序列:** 输出单个序列,类型与输入相同,是输入序列去除异常点后的结果。
+
+**参数:**
+
+- `len`:双边窗口检测法中的窗口大小,取值范围为正整数,默认值为 5。如当`len`=3 时,算法向前、向后各取长度为3的窗口,在窗口中计算异常度。
+- `threshold`:异常度的阈值,取值范围为(0,1),默认值为 0.3。阈值越高,函数对于异常度的判定标准越严格。
+
+#### 使用示例
+
+输入序列:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s0|
++-----------------------------+------------+
+|1970-01-01T08:00:00.000+08:00|      2002.0|
+|1970-01-01T08:00:01.000+08:00|      1946.0|
+|1970-01-01T08:00:02.000+08:00|      1958.0|
+|1970-01-01T08:00:03.000+08:00|      2012.0|
+|1970-01-01T08:00:04.000+08:00|      2051.0|
+|1970-01-01T08:00:05.000+08:00|      1898.0|
+|1970-01-01T08:00:06.000+08:00|      2014.0|
+|1970-01-01T08:00:07.000+08:00|      2052.0|
+|1970-01-01T08:00:08.000+08:00|      1935.0|
+|1970-01-01T08:00:09.000+08:00|      1901.0|
+|1970-01-01T08:00:10.000+08:00|      1972.0|
+|1970-01-01T08:00:11.000+08:00|      1969.0|
+|1970-01-01T08:00:12.000+08:00|      1984.0|
+|1970-01-01T08:00:13.000+08:00|      2018.0|
+|1970-01-01T08:00:37.000+08:00|      1484.0|
+|1970-01-01T08:00:38.000+08:00|      1055.0|
+|1970-01-01T08:00:39.000+08:00|      1050.0|
+|1970-01-01T08:01:05.000+08:00|      1023.0|
+|1970-01-01T08:01:06.000+08:00|      1056.0|
+|1970-01-01T08:01:07.000+08:00|       978.0|
+|1970-01-01T08:01:08.000+08:00|      1050.0|
+|1970-01-01T08:01:09.000+08:00|      1123.0|
+|1970-01-01T08:01:10.000+08:00|      1150.0|
+|1970-01-01T08:01:11.000+08:00|      1034.0|
+|1970-01-01T08:01:12.000+08:00|       950.0|
+|1970-01-01T08:01:13.000+08:00|      1059.0|
++-----------------------------+------------+
+```
+
+用于查询的 SQL 语句:
+
+```sql
+select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test
+```
+
+输出序列:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s0|
++-----------------------------+------------+
+|1970-01-01T08:00:00.000+08:00|      2002.0|
+|1970-01-01T08:00:01.000+08:00|      1946.0|
+|1970-01-01T08:00:02.000+08:00|      1958.0|
+|1970-01-01T08:00:03.000+08:00|      2012.0|
+|1970-01-01T08:00:04.000+08:00|      2051.0|
+|1970-01-01T08:00:05.000+08:00|      1898.0|
+|1970-01-01T08:00:06.000+08:00|      2014.0|
+|1970-01-01T08:00:07.000+08:00|      2052.0|
+|1970-01-01T08:00:08.000+08:00|      1935.0|
+|1970-01-01T08:00:09.000+08:00|      1901.0|
+|1970-01-01T08:00:10.000+08:00|      1972.0|
+|1970-01-01T08:00:11.000+08:00|      1969.0|
+|1970-01-01T08:00:12.000+08:00|      1984.0|
+|1970-01-01T08:00:13.000+08:00|      2018.0|
+|1970-01-01T08:01:05.000+08:00|      1023.0|
+|1970-01-01T08:01:06.000+08:00|      1056.0|
+|1970-01-01T08:01:07.000+08:00|       978.0|
+|1970-01-01T08:01:08.000+08:00|      1050.0|
+|1970-01-01T08:01:09.000+08:00|      1123.0|
+|1970-01-01T08:01:10.000+08:00|      1150.0|
+|1970-01-01T08:01:11.000+08:00|      1034.0|
+|1970-01-01T08:01:12.000+08:00|       950.0|
+|1970-01-01T08:01:13.000+08:00|      1059.0|
++-----------------------------+------------+
+```
+
+### Outlier
+
+#### 注册语句
+
+```sql
+create function outlier as 'org.apache.iotdb.library.anomaly.UDTFOutlier'
+```
+
+#### 函数简介
+
+本函数用于检测基于距离的异常点。在当前窗口中,如果一个点距离阈值范围内的邻居数量(包括它自己)少于密度阈值,则该点是异常点。
+
+**函数名:** OUTLIER
+
+**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。
+
+**参数:**
+
++ `r`:基于距离异常检测中的距离阈值。
++ `k`:基于距离异常检测中的密度阈值。
++ `w`:用于指定滑动窗口的大小。
++ `s`:用于指定滑动窗口的步长。
+
+**输出序列**:输出单个序列,类型与输入序列相同。
+
+#### 使用示例
+
+##### 指定查询参数
+
+输入序列:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s1|
++-----------------------------+------------+
+|2020-01-04T23:59:55.000+08:00|        56.0|
+|2020-01-04T23:59:56.000+08:00|        55.1|
+|2020-01-04T23:59:57.000+08:00|        54.2|
+|2020-01-04T23:59:58.000+08:00|        56.3|
+|2020-01-04T23:59:59.000+08:00|        59.0|
+|2020-01-05T00:00:00.000+08:00|        60.0|
+|2020-01-05T00:00:01.000+08:00|        60.5|
+|2020-01-05T00:00:02.000+08:00|        64.5|
+|2020-01-05T00:00:03.000+08:00|        69.0|
+|2020-01-05T00:00:04.000+08:00|        64.2|
+|2020-01-05T00:00:05.000+08:00|        62.3|
+|2020-01-05T00:00:06.000+08:00|        58.0|
+|2020-01-05T00:00:07.000+08:00|        58.9|
+|2020-01-05T00:00:08.000+08:00|        52.0|
+|2020-01-05T00:00:09.000+08:00|        62.3|
+|2020-01-05T00:00:10.000+08:00|        61.0|
+|2020-01-05T00:00:11.000+08:00|        64.2|
+|2020-01-05T00:00:12.000+08:00|        61.8|
+|2020-01-05T00:00:13.000+08:00|        64.0|
+|2020-01-05T00:00:14.000+08:00|        63.0|
++-----------------------------+------------+
+```
+
+用于查询的 SQL 语句:
+
+```sql
+select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test
+```
+
+输出序列:
+
+```
++-----------------------------+--------------------------------------------------------+
+|                         Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")|
++-----------------------------+--------------------------------------------------------+
+|2020-01-05T00:00:03.000+08:00|                                                    69.0|
+|2020-01-05T00:00:08.000+08:00|                                                    52.0|
++-----------------------------+--------------------------------------------------------+
+```
+
+### MasterTrain
+
+#### 函数简介
+
+本函数基于主数据训练VAR预测模型。将根据提供的主数据判断时间序列中的数据点是否为错误值,并由连续p+1个非错误值作为训练样本训练VAR模型,输出训练后的模型参数。
+
+**函数名:** MasterTrain
+
+**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。
+
+**参数:**
+
++ `p`:模型阶数。
++ `eta`:错误值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。
+
+**输出序列:** 输出单个序列,类型为DOUBLE。
+
+**安装方式:**
+
+- 从IoTDB代码仓库下载`research/master-detector`分支代码到本地
+- 在根目录运行 `mvn spotless:apply`
+- 在根目录运行 `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies` 编译项目
+- 将 `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar`复制到IoTDB服务器的`./ext/udf/` 路径下。
+- 启动 IoTDB服务器,在客户端中执行 `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'`。
+
+#### 使用示例
+
+输入序列:
+
+```
++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ + +``` + +### MasterDetect + +#### 函数简介 + 
+本函数基于主数据检测并修复时间序列中的错误值。将根据提供的主数据判断时间序列中的数据点是否为错误值,并由MasterTrain训练的模型进行时间序列预测,错误值将由预测值及主数据共同修复。 + +**函数名:** MasterDetect + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `p`:模型阶数。 ++ `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 ++ `eta`:错误值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 ++ `beta`:异常值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 ++ `output_type`:输出结果类型,可选'repair'或'anomaly',即输出修复结果或异常检测结果,在缺省情况下默认为'repair'。 ++ `output_column`:输出列的序号,默认为1,即输出第一列的修复结果。 + +**安装方式:** + +- 从IoTDB代码仓库下载`research/master-detector`分支代码到本地 +- 在根目录运行 `mvn spotless:apply` +- 在根目录运行 `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies` 编译项目 +- 将 `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar`复制到IoTDB服务器的`./ext/udf/` 路径下。 +- 启动 IoTDB服务器,在客户端中执行 `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'`。 + +**输出序列:** 输出单个序列,类型与输入数据中对应列的类型相同,序列为输入列修复后的结果。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 
39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +##### 修复 + +用于查询的 SQL 语句: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +##### 异常检测 + +用于查询的 SQL 语句: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| false| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| +|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| 
++-----------------------------+---------------------------------------------------------------------------------------+ +``` + + + +## 频域分析 + +### Conv + +#### 注册语句 + +```sql +create function conv as 'org.apache.iotdb.library.frequency.UDTFConv' +``` + +#### 函数简介 + +本函数对两个输入序列进行卷积,即多项式乘法。 + + +**函数名:** CONV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为DOUBLE,它是两个序列卷积的结果。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +### Deconv + +#### 注册语句 + +```sql +create function deconv as 'org.apache.iotdb.library.frequency.UDTFDeconv' +``` + +#### 函数简介 + +本函数对两个输入序列进行去卷积,即多项式除法运算。 + +**函数名:** DECONV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `result`:去卷积的结果,取值为'quotient'或'remainder',分别对应于去卷积的商和余数。在缺省情况下,输出去卷积的商。 + +**输出序列:** 输出单个序列,类型为DOUBLE。它是将第二个序列从第一个序列中去卷积(第一个序列除以第二个序列)的结果。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +##### 计算去卷积的商 + +当`result`参数缺省或为'quotient'时,本函数计算去卷积的商。 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s3|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| null| +|1970-01-01T08:00:00.003+08:00| 2.0| null| ++-----------------------------+---------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select deconv(s3,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2)| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +##### 计算去卷积的余数 + +当`result`参数为'remainder'时,本函数计算去卷积的余数。输入序列同上,用于查询的SQL语句如下: + +```sql +select deconv(s3,s2,'result'='remainder') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| ++-----------------------------+--------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 0.0| ++-----------------------------+--------------------------------------------------------------+ +``` + +### DWT + +#### 注册语句 + +```sql +create function dwt as 
'org.apache.iotdb.library.frequency.UDTFDWT' +``` + +#### 函数简介 + +本函数对输入序列进行一维离散小波变换。 + +**函数名:** DWT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:小波滤波的类型,提供'Haar', 'DB4', 'DB6', 'DB8',其中DB指代Daubechies。若不设置该参数,则用户需提供小波滤波的系数。不区分大小写。 ++ `coef`:小波滤波的系数。若提供该参数,请使用英文逗号','分割各项,不添加空格或其它符号。 ++ `layer`:进行变换的次数,最终输出的向量个数等同于$layer+1$.默认取1。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度与输入相等。 + +**提示:** 输入序列长度必须为2的整数次幂。 + +#### 使用示例 + +##### Haar变换 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| +|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| ++-----------------------------+-------------------------------------+ +``` + +### FFT + +#### 注册语句 + +```sql +create function fft as 'org.apache.iotdb.library.frequency.UDTFFFT' +``` + +#### 函数简介 + +本函数对输入序列进行快速傅里叶变换。 + +**函数名:** FFT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:傅里叶变换的类型,取值为'uniform'或'nonuniform',缺省情况下为'uniform'。当取值为'uniform'时,时间戳将被忽略,所有数据点都将被视作等距的,并应用等距快速傅里叶算法;当取值为'nonuniform'时,将根据时间戳应用非等距快速傅里叶算法(未实现)。 ++ `result`:傅里叶变换的结果,取值为'real'、'imag'、'abs'或'angle',分别对应于变换结果的实部、虚部、模和幅角。在缺省情况下,输出变换的模。 ++ `compress`:压缩参数,取值范围(0,1],是有损压缩时保留的能量比例。在缺省情况下,不进行压缩。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度与输入相等。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +##### 等距傅里叶变换 + +当`type`参数缺省或为'uniform'时,本函数进行等距傅里叶变换。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| 
+|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select fft(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 19.999999960195904| +|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| +|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| +|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| ++-----------------------------+----------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此在输出序列中$k=4$和$k=5$处有尖峰。 + +##### 等距傅里叶变换并压缩 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| fft(root.test.d1.s1,| fft(root.test.d1.s1,| +| | "result"="real",| "result"="imag",| +| | "compress"="0.99")| "compress"="0.99")| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + +注:基于傅里叶变换结果的共轭性质,压缩结果只保留前一半;根据给定的压缩参数,从低频到高频保留数据点,直到保留的能量比例超过该值;保留最后一个数据点以表示序列长度。 + +### HighPass + +#### 注册语句 + +```sql +create function highpass as 'org.apache.iotdb.library.frequency.UDTFHighPass' +``` + +#### 函数简介 + +本函数对输入序列进行高通滤波,提取高于截止频率的分量。输入序列的时间戳将被忽略,所有数据点都将被视作等距的。 + +**函数名:** HIGHPASS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / 
INT64 / FLOAT / DOUBLE + +**参数:** + ++ `wpass`:归一化后的截止频率,取值为(0,1),不可缺省。 + +**输出序列:** 输出单个序列,类型为DOUBLE,它是滤波后的序列,长度与时间戳均与输入一致。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此高通滤波之后的输出序列服从$y=sin(2\pi t/4)$。 + +### IFFT + +#### 注册语句 + +```sql +create function ifft as 'org.apache.iotdb.library.frequency.UDTFIFFT' +``` + +#### 函数简介 + +本函数将输入的两个序列作为实部和虚部视作一个复数,进行逆快速傅里叶变换,并输出结果的实部。输入数据的格式参见`FFT`函数的输出,并支持以`FFT`函数压缩后的输出作为本函数的输入。 + +**函数名:** IFFT + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `start`:输出序列的起始时刻,是一个格式为'yyyy-MM-dd HH:mm:ss'的时间字符串。在缺省情况下,为'1970-01-01 08:00:00'。 ++ `interval`:输出序列的时间间隔,是一个有单位的正数。目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,为1s。 + + +**输出序列:** 输出单个序列,类型为DOUBLE。该序列是一个等距时间序列,它的值是将两个输入序列依次作为实部和虚部进行逆快速傅里叶变换的结果。 + +**提示:** 如果某行数据中包含空值或`NaN`,该行数据将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| 
++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +用于查询的SQL语句: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| ++-----------------------------+-------------------------------------------------------+ +``` + +### LowPass + +#### 注册语句 + +```sql +create function lowpass as 'org.apache.iotdb.library.frequency.UDTFLowPass' +``` + +#### 函数简介 + +本函数对输入序列进行低通滤波,提取低于截止频率的分量。输入序列的时间戳将被忽略,所有数据点都将被视作等距的。 + +**函数名:** LOWPASS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `wpass`:归一化后的截止频率,取值为(0,1),不可缺省。 + +**输出序列:** 输出单个序列,类型为DOUBLE,它是滤波后的序列,长度与时间戳均与输入一致。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| 
+|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` +## Envelope + +### 函数简介 + +本函数通过输入一维浮点数数组和用户指定的调制频率,实现对信号的解调和包络提取。解调的目标是从复杂的信号中提取感兴趣的部分,使其更易理解。比如通过解调可以找到信号的包络,即振幅的变化趋势。 + +**函数名:** Envelope + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `frequency`:频率(选填,正数。不填此参数,系统会基于序列对应时间的时间间隔来推断频率)。 ++ `amplification`: 扩增倍数(选填,正整数。输出Time列的结果为正整数的集合,不会输出小数。当频率小1时,可通过此参数对频率进行扩增以展示正常的结果)。 + +**输出序列:** ++ `Time`: 该列返回的值的含义是频率而并非时间,如果输出的格式为时间格式(如:1970-01-01T08:00:19.000+08:00),请将其转为时间戳值。 + ++ `Envelope(Path, 'frequency'='{frequency}')`:输出单个序列,类型为DOUBLE,它是包络分析之后的结果。 + +**提示:** 当解调的原始序列的值不连续时,本函数会视为连续处理,建议被分析的时间序列是一段值完整的时间序列。同时建议指定开始时间与结束时间。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:01.000+08:00| 1.0 | +|1970-01-01T08:00:02.000+08:00| 2.0 | +|1970-01-01T08:00:03.000+08:00| 3.0 | +|1970-01-01T08:00:04.000+08:00| 4.0 | +|1970-01-01T08:00:05.000+08:00| 5.0 | +|1970-01-01T08:00:06.000+08:00| 6.0 | +|1970-01-01T08:00:07.000+08:00| 7.0 | +|1970-01-01T08:00:08.000+08:00| 8.0 | +|1970-01-01T08:00:09.000+08:00| 9.0 | +|1970-01-01T08:00:10.000+08:00| 10.0 | ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: +```sql +set time_display_type=long; +select envelope(s1),envelope(s1,'frequency'='1000'),envelope(s1,'amplification'='10') from root.test.d1; +``` +输出序列: + +``` ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +|Time|envelope(root.test.d1.s1)|envelope(root.test.d1.s1, "frequency"="1000")|envelope(root.test.d1.s1, "amplification"="10")| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +| 0| 6.284350808484124| 6.284350808484124| 6.284350808484124| +| 100| 1.5581923657404393| 1.5581923657404393| null| +| 200| 0.8503211038340728| 
0.8503211038340728| null| +| 300| 0.512808785945551| 0.512808785945551| null| +| 400| 0.26361156774506744| 0.26361156774506744| null| +|1000| null| null| 1.5581923657404393| +|2000| null| null| 0.8503211038340728| +|3000| null| null| 0.512808785945551| +|4000| null| null| 0.26361156774506744| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ + +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此低通滤波之后的输出序列服从$y=2sin(2\pi t/5)$。 + + + +## 数据匹配 + +### Cov + +#### 注册语句 + +```sql +create function cov as 'org.apache.iotdb.library.dmatch.UDAFCov' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的总体协方差。 + +**函数名:** COV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为总体协方差的数据点。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出`NaN`。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +### Dtw + +#### 注册语句 + +```sql +create function dtw as 'org.apache.iotdb.library.dmatch.UDAFDtw' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的 DTW 距离。 + +**函数名:** DTW + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为两个时间序列的 DTW 距离值。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出 0。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 
2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +### Pearson + +#### 注册语句 + +```sql +create function pearson as 'org.apache.iotdb.library.dmatch.UDAFPearson' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的皮尔森相关系数。 + +**函数名:** PEARSON + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为皮尔森相关系数的数据点。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出`NaN`。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +### PtnSym + +#### 注册语句 + +```sql +create function ptnsym as 'org.apache.iotdb.library.dmatch.UDTFPtnSym' +``` + +#### 函数简介 + +本函数用于寻找序列中所有对称度小于阈值的对称子序列。对称度通过 DTW 计算,值越小代表序列对称性越高。 + +**函数名:** PTNSYM + +**输入序列:** 仅支持一个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:对称子序列的长度,是一个正整数,默认值为 10。 ++ `threshold`:对称度阈值,是一个非负数,只有对称度小于等于该值的对称子序列才会被输出。在缺省情况下,所有的子序列都会被输出。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中的每一个数据点对应于一个对称子序列,时间戳为子序列的起始时刻,值为对称度。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| +|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| 
++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +### XCorr + +#### 注册语句 + +```sql +create function xcorr as 'org.apache.iotdb.library.dmatch.UDTFXCorr' +``` + +#### 函数简介 + +本函数用于计算两条时间序列的互相关函数值, +对离散序列而言,互相关函数可以表示为 +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +常用于表征两条序列在不同对齐条件下的相似度。 + +**函数名:** XCORR + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点, +其中正中心的值为两条序列按照预先对齐的结果计算的互相关系数(即等于以上公式的$CR(0)$), +前半部分的值表示将后一条输入序列向前平移时计算的互相关系数, +直至两条序列没有重合的数据点(不包含完全分离时的结果$CR(-N)=0.0$), +后半部分类似。 +用公式可表示为(所有序列的索引从1开始计数): +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**提示:** + ++ 两条序列中的`null` 和`NaN` 值会被忽略,在计算中表现为 0。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` + + + +## 数据修复 + +### TimestampRepair + +#### 注册语句 + +```sql +create function timestamprepair as 'org.apache.iotdb.library.drepair.UDTFTimestampRepair' +``` + +### 函数简介 + +本函数用于时间戳修复。根据给定的标准时间间隔,采用最小化修复代价的方法,通过对数据时间戳的微调,将原本时间戳间隔不稳定的数据修复为严格等间隔的数据。在未给定标准时间间隔的情况下,本函数将使用时间间隔的中位数 (median)、众数 (mode) 或聚类中心 (cluster) 来推算标准时间间隔。 + + +**函数名:** TIMESTAMPREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `interval`: 标准时间间隔(单位是毫秒),是一个正整数。在缺省情况下,将根据指定的方法推算。 ++ `method`:推算标准时间间隔的方法,取值为 'median', 'mode' 或 'cluster',仅在`interval`缺省时有效。在缺省情况下,将使用中位数方法进行推算。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +### 使用示例 + +#### 指定标准时间间隔 + +在给定`interval`参数的情况下,本函数将按照指定的标准时间间隔进行修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:19.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 
5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:01.000+08:00| 7.0| +|2021-07-01T12:01:11.000+08:00| 8.0| +|2021-07-01T12:01:21.000+08:00| 9.0| +|2021-07-01T12:01:31.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timestamprepair(s1,'interval'='10000') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------+ +| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| ++-----------------------------+----------------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+----------------------------------------------------+ +``` + +#### 自动推算标准时间间隔 + +如果`interval`参数没有给定,本函数将按照推算的标准时间间隔进行修复。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select timestamprepair(s1) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------+ +| Time|timestamprepair(root.test.d2.s1)| ++-----------------------------+--------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +### ValueFill + +#### 注册语句 + +```sql +create function valuefill as 'org.apache.iotdb.library.drepair.UDTFValueFill' +``` + +#### 函数简介 + +**函数名:** ValueFill + +**输入序列:** 单列时序数据,类型为INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, 默认为 "linear"。其中,“mean” 指使用均值填补的方法; “previous" 指使用前值填补方法;“linear" 指使用线性插值填补方法;“likelihood” 为基于速度的正态分布的极大似然估计方法;“AR” 指自回归的填补方法;“MA” 指滑动平均的填补方法;"SCREEN" 指约束填补方法;缺省情况下使用 “linear”。 + +**输出序列:** 填补后的单维序列。 + +**备注:** AR 模型采用 AR(1),时序列需满足自相关条件,否则将输出单个数据点 (0, 0.0). 
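+
+为便于理解缺省的 "linear" 方法,这里给出一个示意性的 Java 片段(并非库内实现,插值方式系根据下文示例输出整理的一种理解):当前后两个有效值之间存在 k 个连续缺失点时,第 j 个缺失点按 prev + j * (next - prev) / (k + 1) 填补。
+
+```java
+public class LinearFillSketch {
+    public static void main(String[] args) {
+        // 示意:前一个有效值 prev 与后一个有效值 next 之间有 k 个连续缺失点,
+        // 第 j 个缺失点填补为 prev + j * (next - prev) / (k + 1)
+        double prev = 116.0;  // 对应下文 linear 示例中 00:00:18 处的有效值
+        double next = 124.0;  // 对应下文 linear 示例中 00:00:26 处的有效值
+        int k = 2;            // 00:00:20 与 00:00:22 两个缺失点
+        for (int j = 1; j <= k; j++) {
+            double filled = prev + j * (next - prev) / (k + 1);
+            System.out.println(filled);  // 约 118.67 与 121.33,与下文输出中的 118.7、121.3 相符
+        }
+    }
+}
+```
+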
+ +#### 使用示例 +##### 使用 linear 方法进行填补 + +当`method`缺省或取值为 'linear' 时,本函数将使用线性插值方法进行填补。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select valuefill(s1) from root.test.d2 +``` + +输出序列: + + + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +##### 使用 previous 方法进行填补 + +当`method`取值为 'previous' 时,本函数将使前值填补方法进行数值填补。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +### ValueRepair + +#### 注册语句 + +```sql +create function valuerepair as 'org.apache.iotdb.library.drepair.UDTFValueRepair' +``` + +#### 函数简介 + +本函数用于对时间序列的数值进行修复。目前,本函数支持两种修复方法:**Screen** 是一种基于速度阈值的方法,在最小改动的前提下使得所有的速度符合阈值要求;**LsGreedy** 是一种基于速度变化似然的方法,将速度变化建模为高斯分布,并采用贪心算法极大化似然函数。 + +**函数名:** VALUEREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:修复时采用的方法,取值为 'Screen' 或 'LsGreedy'. 
在缺省情况下,使用 Screen 方法进行修复。 ++ `minSpeed`:该参数仅在使用 Screen 方法时有效。当速度小于该值时会被视作数值异常点加以修复。在缺省情况下为中位数减去三倍绝对中位差。 ++ `maxSpeed`:该参数仅在使用 Screen 方法时有效。当速度大于该值时会被视作数值异常点加以修复。在缺省情况下为中位数加上三倍绝对中位差。 ++ `center`:该参数仅在使用 LsGreedy 方法时有效。对速度变化分布建立的高斯模型的中心。在缺省情况下为 0。 ++ `sigma` :该参数仅在使用 LsGreedy 方法时有效。对速度变化分布建立的高斯模型的标准差。在缺省情况下为绝对中位差。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +**提示:** 输入序列中的`NaN`在修复之前会先进行线性插值填补。 + +#### 使用示例 + +##### 使用 Screen 方法进行修复 + +当`method`缺省或取值为 'Screen' 时,本函数将使用 Screen 方法进行数值修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +##### 使用 LsGreedy 方法进行修复 + +当`method`取值为 'LsGreedy' 时,本函数将使用 LsGreedy 方法进行数值修复。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select valuerepair(s1,'method'='LsGreedy') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")| ++-----------------------------+-------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------------+ +``` + +### MasterRepair + +#### 函数简介 + +本函数实现基于主数据的时间序列数据修复。 + +**函数名:**MasterRepair + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `omega`:算法窗口大小,非负整数(单位为毫秒), 在缺省情况下,算法根据不同时间差下的两个元组距离自动估计该参数。 +- `eta`:算法距离阈值,正数, 
在缺省情况下,算法根据窗口中元组的距离分布自动估计该参数。 +- `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 +- `output_column`:输出列的序号,默认输出第一列的修复结果。 + +**输出序列:**输出单个序列,类型与输入数据中对应列的类型相同,序列为输入列修复后的结果。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +输出序列: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +### SeasonalRepair + +#### 函数简介 +本函数用于对周期性时间序列的数值进行基于分解的修复。目前,本函数支持两种方法:**Classical**使用经典分解方法得到的残差项检测数值的异常波动,并使用滑动平均修复序列;**Improved**使用改进的分解方法得到的残差项检测数值的异常波动,并使用滑动中值修复序列。 + +**函数名:** SEASONALREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:修复时采用的分解方法,取值为'Classical'或'Improved'。在缺省情况下,使用经典分解方法进行修复。 ++ `period`:序列的周期。 ++ `k`:残差项的范围阈值,用来限制残差项偏离中心的程度。在缺省情况下为9。 ++ `max_iter`:算法的最大迭代次数。在缺省情况下为10。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +**提示:** 输入序列中的`NaN`在修复之前会先进行线性插值填补。 + +#### 使用示例 +##### 使用经典分解方法进行修复 +当`method`缺省或取值为'Classical'时,本函数将使用经典分解方法进行数值修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +输出序列: + +``` 
++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +##### 使用改进的分解方法进行修复 +当`method`取值为'Improved'时,本函数将使用改进的分解方法进行数值修复。 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` + + + +## 序列发现 + +### ConsecutiveSequences + +#### 注册语句 + +```sql +create function consecutivesequences as 'org.apache.iotdb.library.series.UDTFConsecutiveSequences' +``` + +#### 函数简介 + +本函数用于在多维严格等间隔数据中发现局部最长连续子序列。 + +严格等间隔数据是指数据的时间间隔是严格相等的,允许存在数据缺失(包括行缺失和值缺失),但不允许存在数据冗余和时间戳偏移。 + +连续子序列是指严格按照标准时间间隔等距排布,不存在任何数据缺失的子序列。如果某个连续子序列不是任何连续子序列的真子序列,那么它是局部最长的。 + + +**函数名:** CONSECUTIVESEQUENCES + +**输入序列:** 支持多个输入序列,类型可以是任意的,但要满足严格等间隔的要求。 + +**参数:** + ++ `gap`:标准时间间隔,是一个有单位的正数。目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,函数会利用众数估计标准时间间隔。 + +**输出序列:** 输出单个序列,类型为 INT32。输出序列中的每一个数据点对应一个局部最长连续子序列,时间戳为子序列的起始时刻,值为子序列包含的数据点个数。 + +**提示:** 对于不符合要求的输入,本函数不对输出做任何保证。 + +#### 使用示例 + +##### 手动指定标准时间间隔 + +本函数可以通过`gap`参数手动指定标准时间间隔。需要注意的是,错误的参数设置会导致输出产生严重错误。 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, 
"gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + +##### 自动估计标准时间间隔 + +当`gap`参数缺省时,本函数可以利用众数估计标准时间间隔,得到同样的结果。因此,这种用法更受推荐。 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +### ConsecutiveWindows + +#### 注册语句 + +```sql +create function consecutivewindows as 'org.apache.iotdb.library.series.UDTFConsecutiveWindows' +``` + +#### 函数简介 + +本函数用于在多维严格等间隔数据中发现指定长度的连续窗口。 + +严格等间隔数据是指数据的时间间隔是严格相等的,允许存在数据缺失(包括行缺失和值缺失),但不允许存在数据冗余和时间戳偏移。 + +连续窗口是指严格按照标准时间间隔等距排布,不存在任何数据缺失的子序列。 + + +**函数名:** CONSECUTIVEWINDOWS + +**输入序列:** 支持多个输入序列,类型可以是任意的,但要满足严格等间隔的要求。 + +**参数:** + ++ `gap`:标准时间间隔,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,函数会利用众数估计标准时间间隔。 ++ `length`:序列长度,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。该参数不允许缺省。 + +**输出序列:** 输出单个序列,类型为 INT32。输出序列中的每一个数据点对应一个指定长度连续子序列,时间戳为子序列的起始时刻,值为子序列包含的数据点个数。 + +**提示:** 对于不符合要求的输入,本函数不对输出做任何保证。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` + + + +## 机器学习 + +### AR + +#### 注册语句 + +```sql +create function ar as 'org.apache.iotdb.library.dlearn.UDTFAR' +``` +#### 函数简介 + +本函数用于学习数据的自回归模型系数。 + +**函数名:** AR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `p`:自回归模型的阶数。默认为1。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。第一行对应模型的一阶系数,以此类推。 + +**提示:** + +- `p`应为正整数。 + +- 序列中的大部分点为等间隔采样点。 +- 序列中的缺失点通过线性插值进行填补后用于学习过程。 + +#### 使用示例 + +##### 指定阶数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| 
+|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +### Representation + +#### 函数简介 + +本函数用于时间序列的表示。 + +**函数名:** Representation + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `tb`:时间分块数量。默认为10。 +- `vb`:值分块数量。默认为10。 + +**输出序列:** 输出单个序列,类型为INT32,长度为`tb*vb`。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** + +- `tb `,`vb`应为正整数。 + +#### 使用示例 + +##### 指定时间分块数量、值分块数量 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +### RM + +#### 函数简介 + +本函数用于基于时间序列表示的匹配度。 + +**函数名:** RM + +**输入序列:** 仅支持两个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `tb`:时间分块数量。默认为10。 +- `vb`:值分块数量。默认为10。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度为`1`。序列的时间戳从0开始,序列仅有一个数据点,其时间戳为0,值为两个时间序列的匹配度。 + +**提示:** + +- `tb `,`vb`应为正整数。 + +#### 使用示例 + +##### 指定时间分块数量、值分块数量 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| 
++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_timecho.md new file mode 100644 index 00000000..aebf800a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/SQL-Manual/UDF-Libraries_timecho.md @@ -0,0 +1,5333 @@ + +# UDF函数库 + +基于用户自定义函数能力,IoTDB 提供了一系列关于时序数据处理的函数,包括数据质量、数据画像、异常检测、 频域分析、数据匹配、数据修复、序列发现、机器学习等,能够满足工业领域对时序数据处理的需求。 + +> 注意:当前UDF函数库中的函数仅支持毫秒级的时间戳精度。 + +## 安装步骤 +1. 请获取与 IoTDB 版本兼容的 UDF 函数库 JAR 包的压缩包。 + + | UDF 函数库版本 | 支持的 IoTDB 版本 | 下载链接 | + | --------------- | ----------------- | ------------------------------------------------------------ | + | UDF-1.3.3.zip | V1.3.3及以上 |请联系天谋商务获取 | + | UDF-1.3.2.zip | V1.0.0~V1.3.2 | 请联系天谋商务获取 | + +2. 将获取的压缩包中的 `library-udf.jar` 文件放置在 IoTDB 集群所有节点的 `/ext/udf` 的目录下 +3. 在 IoTDB 的 SQL 命令行终端(CLI)或可视化控制台(Workbench)的 SQL 操作界面中,执行下述相应的函数注册语句。 +4. 批量注册:两种注册方式:注册脚本 或 SQL汇总语句 +- 注册脚本 + - 将压缩包中的注册脚本(`register-UDF.sh` 或 `register-UDF.bat`)按需复制到 IoTDB 的 tools 目录下,修改脚本中的参数(默认为host=127.0.0.1,rpcPort=6667,user=root,pass=root); + - 启动 IoTDB 服务,运行注册脚本批量注册 UDF + +- SQL汇总语句 + - 打开压缩包中的SQl文件,复制全部 SQL 语句,在 IoTDB 的 SQL 命令行终端(CLI)或可视化控制台(Workbench)的 SQL 操作界面中,执行全部 SQl 语句批量注册 UDF + +## 数据质量 + +### Completeness + +#### 注册语句 + +```sql +create function completeness as 'org.apache.iotdb.library.dquality.UDTFCompleteness' +``` + +#### 函数简介 + +本函数用于计算时间序列的完整性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的完整性,并输出窗口第一个数据点的时间戳和窗口的完整性。 + +**函数名:** COMPLETENESS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 ++ `downtime`:完整性计算是否考虑停机异常。它的取值为 'true' 或 'false',默认值为 'true'. 在考虑停机异常时,长时间的数据缺失将被视作停机,不对完整性产生影响。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. 
+ +**提示:** 只有当窗口内的数据点数目超过10时,才会进行完整性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算完整性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算完整性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +### Consistency + +#### 注册语句 + +```sql +create function consistency as 'org.apache.iotdb.library.dquality.UDTFConsistency' +``` + +#### 函数简介 + +本函数用于计算时间序列的一致性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的一致性,并输出窗口第一个数据点的时间戳和窗口的时效性。 + +**函数名:** CONSISTENCY + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / 
INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行一致性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算一致性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算一致性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +### Timeliness + 
+#### 注册语句 + +```sql +create function timeliness as 'org.apache.iotdb.library.dquality.UDTFTimeliness' +``` + +#### 函数简介 + +本函数用于计算时间序列的时效性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的时效性,并输出窗口第一个数据点的时间戳和窗口的时效性。 + +**函数名:** TIMELINESS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行时效性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算时效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算时效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| 
++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +### Validity + +#### 注册语句 + +```sql +create function validity as 'org.apache.iotdb.library.dquality.UDTFValidity' +``` + +#### 函数简介 + +本函数用于计算时间序列的有效性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的有效性,并输出窗口第一个数据点的时间戳和窗口的有效性。 + + +**函数名:** VALIDITY + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行有效性计算。否则,该窗口将被忽略,不做任何输出。 + + +#### 使用示例 + +##### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算有效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +##### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算有效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select 
validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|validity(root.test.d1.s1, "window"="15")| ++-----------------------------+----------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + + + + +## 数据画像 + +### ACF + +#### 注册语句 + +```sql +create function acf as 'org.apache.iotdb.library.dprofile.UDTFACF' +``` + +#### 函数简介 + +本函数用于计算时间序列的自相关函数值,即序列与自身之间的互相关函数。 + +**函数名:** ACF + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点。 + +**提示:** + ++ 序列中的`NaN`值会被忽略,在计算中表现为0。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +### Distinct + +#### 注册语句 + +```sql +create function distinct as 'org.apache.iotdb.library.dprofile.UDTFDistinct' +``` + +#### 函数简介 + +本函数可以返回输入序列中出现的所有不同的元素。 + +**函数名:** DISTINCT + +**输入序列:** 仅支持单个输入序列,类型可以是任意的 + +**输出序列:** 输出单个序列,类型与输入相同。 + +**提示:** + ++ 输出序列的时间戳是无意义的。输出顺序是任意的。 ++ 缺失值和空值将被忽略,但`NaN`不会被忽略。 ++ 字符串区分大小写 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select distinct(s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +### Histogram + +#### 注册语句 + +```sql +create function histogram as 'org.apache.iotdb.library.dprofile.UDTFHistogram' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的分布直方图。 + +**函数名:** HISTOGRAM + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `min`:表示所求数据范围的下限,默认值为 -Double.MAX_VALUE。 ++ `max`:表示所求数据范围的上限,默认值为 Double.MAX_VALUE,`start`的值必须小于或等于`end`。 ++ `count`: 表示直方图分桶的数量,默认值为 1,其值必须为正整数。 + +**输出序列:** 直方图分桶的值,其中第 i 个桶(从 1 开始计数)表示的数据范围下界为$min+ (i-1)\cdot\frac{max-min}{count}$,数据范围上界为$min+ i \cdot \frac{max-min}{count}$。 + + 
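+
+分桶边界的含义可以结合下面的示意性 Java 片段理解(并非库内实现,桶序号的计算方式仅为与上述边界公式对应的一种假设写法):先由 `min`、`max` 与 `count` 求出桶宽,再据此确定数值应落入的桶序号,越界值的处理与下方提示一致。
+
+```java
+public class HistogramBucketSketch {
+    // 示意:计算数值 v 应落入的桶序号(1 到 count),与上文的分桶边界公式对应
+    static int bucketOf(double v, double min, double max, int count) {
+        double width = (max - min) / count;              // 桶宽
+        int index = (int) Math.floor((v - min) / width) + 1;
+        if (index < 1) index = 1;                        // 小于 min 的值放入第 1 个桶
+        if (index > count) index = count;                // 大于 max 的值放入最后 1 个桶
+        return index;
+    }
+
+    public static void main(String[] args) {
+        // 对应下文示例:min=1, max=20, count=10,桶宽为 1.9
+        System.out.println(bucketOf(1.0, 1, 20, 10));    // 1
+        System.out.println(bucketOf(10.0, 1, 20, 10));   // 5
+        System.out.println(bucketOf(20.0, 1, 20, 10));   // 10
+    }
+}
+```
+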
+**提示:** + ++ 如果某个数据点的数值小于`min`,它会被放入第 1 个桶;如果某个数据点的数值大于`max`,它会被放入最后 1 个桶。 ++ 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +### Integral + +#### 注册语句 + +```sql +create function integral as 'org.apache.iotdb.library.dprofile.UDAFIntegral' +``` + +#### 函数简介 + +本函数用于计算时间序列的数值积分,即以时间为横坐标、数值为纵坐标绘制的折线图中折线以下的面积。 + +**函数名:** INTEGRAL + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `unit`:积分求解所用的时间轴单位,取值为 "1S", "1s", "1m", "1H", "1d"(区分大小写),分别表示以毫秒、秒、分钟、小时、天为单位计算积分。 + 缺省情况下取 "1s",以秒为单位。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为积分结果的数据点。 + +**提示:** + ++ 积分值等于折线图中每相邻两个数据点和时间轴形成的直角梯形的面积之和,不同时间单位下相当于横轴进行不同倍数放缩,得到的积分值可直接按放缩倍数转换。 + ++ 数据中`NaN`将会被忽略。折线将以临近两个有值数据点为准。 + +#### 使用示例 + +##### 参数缺省 + +缺省情况下积分以1s为时间单位。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +其计算公式为: +$$\frac{1}{2}[(1+2)\times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + + 
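+
+上述梯形求和过程可以用下面这个仅作示意的 Python 片段复现,其中的函数名与参数均为说明用的假设,并非本函数的实际实现:
+
+```python
+# 仅作示意:以秒为时间单位的梯形积分,假设数据已按时间升序排列且 NaN 已被剔除
+def trapezoid_integral(times_s, values):
+    total = 0.0
+    for i in range(1, len(times_s)):
+        total += (values[i - 1] + values[i]) * (times_s[i] - times_s[i - 1]) / 2
+    return total
+
+# 对应上面的示例数据(忽略 t=9 处的 NaN)
+print(trapezoid_integral([1, 2, 3, 4, 5, 8, 10], [1, 2, 5, 6, 7, 8, 10]))  # 57.5
+```
+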
+##### 指定时间单位 + +指定以分钟为时间单位。 + + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +其计算公式为: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+3) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +### IntegralAvg + +#### 注册语句 + +```sql +create function integralavg as 'org.apache.iotdb.library.dprofile.UDAFIntegralAvg' +``` + +#### 函数简介 + +本函数用于计算时间序列的函数均值,即在相同时间单位下的数值积分除以序列总的时间跨度。更多关于数值积分计算的信息请参考`Integral`函数。 + +**函数名:** INTEGRALAVG + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为时间加权平均结果的数据点。 + +**提示:** + ++ 时间加权的平均值等于在任意时间单位`unit`下计算的数值积分(即折线图中每相邻两个数据点和时间轴形成的直角梯形的面积之和), + 除以相同时间单位下输入序列的时间跨度,其值与具体采用的时间单位无关,默认与 IoTDB 时间单位一致。 + ++ 数据中的`NaN`将会被忽略。折线将以临近两个有值数据点为准。 + ++ 输入序列为空时,函数输出结果为 0;仅有一个数据点时,输出结果为该点数值。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +其计算公式为: +$$\frac{1}{2}[(1+2)\times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +### Mad + +#### 注册语句 + +```sql +create function mad as 'org.apache.iotdb.library.dprofile.UDAFMad' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似绝对中位差,绝对中位差为所有数值与其中位数绝对偏移量的中位数。 + +如有数据集$\{1,3,3,5,5,6,7,8,9\}$,其中位数为5,所有数值与中位数的偏移量的绝对值为$\{0,0,1,2,2,2,3,4,4\}$,其中位数为2,故而原数据集的绝对中位差为2。 + +**函数名:** MAD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `error`:近似绝对中位差的基于数值的误差百分比,取值范围为 [0,1),默认值为 0。如当`error`=0.01 时,记精确绝对中位差为a,近似绝对中位差为b,不等式 $0.99a \le b \le 1.01a$ 成立。当`error`=0 时,计算结果为精确绝对中位差。 + + +**输出序列:** 输出单个序列,类型为DOUBLE,序列仅包含一个时间戳为 0、值为绝对中位差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +##### 精确查询 + +当`error`参数缺省或为0时,本函数计算精确绝对中位差。 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 
0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +............ +Total line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select mad(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|median(root.test.s1, "error"="0")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +##### 近似查询 + +当`error`参数取值不为 0 时,本函数计算近似绝对中位差。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select mad(s1, "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s1, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9900000000000001| ++-----------------------------+---------------------------------+ +``` + +### Median + +#### 注册语句 + +```sql +create function median as 'org.apache.iotdb.library.dprofile.UDAFMedian' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似中位数。中位数是顺序排列的一组数据中居于中间位置的数;当序列有偶数个时,中位数为中间二者的平均数。 + +**函数名:** MEDIAN + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `error`:近似中位数的基于排名的误差百分比,取值范围 [0,1),默认值为 0。如当`error`=0.01 时,计算出的中位数的真实排名百分比在 0.49~0.51 之间。当`error`=0 时,计算结果为精确中位数。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为中位数的数据点。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +Total line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select median(s1, "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|median(root.test.s1, "error"="0.01")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| ++-----------------------------+------------------------------------+ +``` + +### MinMax + +#### 注册语句 + +```sql +create function minmax as 'org.apache.iotdb.library.dprofile.UDTFMinMax' +``` + +#### 函数简介 + +本函数将输入序列使用 min-max 方法进行标准化。最小值归一至 0,最大值归一至 1. 
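+
+即按下式进行归一化(由上述描述整理,$\min$、$\max$ 分别为整个序列的最小值与最大值):
+
+$$x_i' = \frac{x_i - \min}{\max - \min}$$
+
+例如在下文示例数据中,最小值为 -2、最大值为 10,因此 0.0 被映射为 $\frac{0-(-2)}{10-(-2)}\approx 0.1667$,与输出结果一致。
+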
+ +**函数名:** MINMAX + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `compute`:若设置为"batch",则将数据全部读入后转换;若设置为 "stream",则需用户提供最大值及最小值进行流式计算转换。默认为 "batch"。 ++ `min`:使用流式计算时的最小值。 ++ `max`:使用流式计算时的最大值。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select minmax(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|minmax(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.200+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.300+08:00| 0.25| +|1970-01-01T08:00:00.400+08:00| 0.08333333333333333| +|1970-01-01T08:00:00.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.700+08:00| 0.0| +|1970-01-01T08:00:00.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.900+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.100+08:00| 0.25| +|1970-01-01T08:00:01.200+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.300+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.400+08:00| 0.25| +|1970-01-01T08:00:01.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.700+08:00| 1.0| +|1970-01-01T08:00:01.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| 0.16666666666666666| ++-----------------------------+--------------------+ +``` + + + +### MvAvg + +#### 注册语句 + +```sql +create function mvavg as 'org.apache.iotdb.library.dprofile.UDTFMvAvg' +``` + +#### 函数简介 + +本函数计算序列的移动平均。 + +**函数名:** MVAVG + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:移动窗口的长度。默认值为 10. 
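+
+滑动平均的基本思想可以用下面这个仅作示意的 Python 片段理解:长度为 `window` 的窗口在序列上滑动,每个位置输出窗口内数值的平均值。函数名与窗口对齐方式均为说明用的假设,实际的边界处理以本函数的实现为准。
+
+```python
+# 仅作示意:长度为 window 的简单滑动平均,结果与窗口末端的数据点对齐
+def moving_average(values, window=10):
+    return [sum(values[i - window + 1 : i + 1]) / window
+            for i in range(window - 1, len(values))]
+
+print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))  # [2.0, 3.0, 4.0]
+```
+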
+ +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 指定窗口长度 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +### PACF + +#### 注册语句 + +```sql +create function pacf as 'org.apache.iotdb.library.dprofile.UDTFPACF' +``` + +#### 函数简介 + +本函数通过求解 Yule-Walker 方程,计算序列的偏自相关系数。对于特殊的输入序列,方程可能没有解,此时输出`NaN`。 + +**函数名:** PACF + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `lag`:最大滞后阶数。默认值为$\min(10\log_{10}n,n-1)$,$n$表示数据点个数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 指定滞后阶数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select pacf(s1, "lag"="5") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------+ +| Time|pacf(root.test.d1.s1, "lag"="5")| ++-----------------------------+--------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| -0.5744680851063829| +|2020-01-01T00:00:03.000+08:00| 0.3172297297297296| +|2020-01-01T00:00:04.000+08:00| -0.2977686586304181| +|2020-01-01T00:00:05.000+08:00| -2.0609033521065867| ++-----------------------------+--------------------------------+ +``` + +### 
Percentile + +#### 注册语句 + +```sql +create function percentile as 'org.apache.iotdb.library.dprofile.UDAFPercentile' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的精确或近似分位数。 + +**函数名:** PERCENTILE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `rank`:所求分位数在所有数据中的排名百分比,取值范围为 (0,1],默认值为 0.5。如当设为 0.5时则计算中位数。 ++ `error`:近似分位数的基于排名的误差百分比,取值范围为 [0,1),默认值为0。如`rank`=0.5 且`error`=0.01,则计算出的分位数的真实排名百分比在 0.49~0.51之间。当`error`=0 时,计算结果为精确分位数。 + +**输出序列:** 输出单个序列,类型与输入序列相同。当`error`=0时,序列仅包含一个时间戳为分位数第一次出现的时间戳、值为分位数的数据点;否则,输出值的时间戳为0。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +用于查询的 SQL 语句: + +```sql +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|percentile(root.test.s0, "rank"="0.2", "error"="0.01")| ++-----------------------------+------------------------------------------------------+ +|2021-03-17T10:35:02.054+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +```输入序列: + +``` ++-----------------------------+-------------+ +| Time|root.test2.s1| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+-------------+ +............ 
+Total line number = 20 +``` + +用于查询的 SQL 语句: + +```sql +select percentile(s1, "rank"="0.2", "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|percentile(root.test2.s1, "rank"="0.2", "error"="0.01")| ++-----------------------------+-------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| -1.0| ++-----------------------------+-------------------------------------------------------+ +``` + + +### Quantile + +#### 注册语句 + +```sql +create function quantile as 'org.apache.iotdb.library.dprofile.UDAFQuantile' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的近似分位数。本函数基于KLL sketch算法实现。 + +**函数名:** QUANTILE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `rank`:所求分位数在所有数据中的排名比,取值范围为 (0,1],默认值为 0.5。如当设为 0.5时则计算近似中位数。 ++ `K`:允许维护的KLL sketch大小,最小值为100,默认值为800。如`rank`=0.5 且`K`=800,则计算出的分位数的真实排名比有至少99%的可能性在 0.49~0.51之间。 + +**输出序列:** 输出单个序列,类型与输入序列相同。输出值的时间戳为0。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + + +输入序列: + +``` ++-----------------------------+-------------+ +| Time|root.test1.s1| ++-----------------------------+-------------+ +|2021-03-17T10:32:17.054+08:00| 7| +|2021-03-17T10:32:18.054+08:00| 15| +|2021-03-17T10:32:19.054+08:00| 36| +|2021-03-17T10:32:20.054+08:00| 39| +|2021-03-17T10:32:21.054+08:00| 40| +|2021-03-17T10:32:22.054+08:00| 41| +|2021-03-17T10:32:23.054+08:00| 20| +|2021-03-17T10:32:24.054+08:00| 18| ++-----------------------------+-------------+ +............ +Total line number = 8 +``` + +用于查询的 SQL 语句: + +```sql +select quantile(s1, "rank"="0.2", "K"="800") from root.test1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------+ +| Time|quantile(root.test1.s1, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.000000000000001| ++-----------------------------+------------------------------------------------+ +``` + +### Period + +#### 注册语句 + +```sql +create function period as 'org.apache.iotdb.library.dprofile.UDAFPeriod' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的周期。 + +**函数名:** PERIOD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为 INT32,序列仅包含一个时间戳为 0、值为周期的数据点。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select period(s1) from root.test.d3 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +### QLB + +#### 注册语句 + +```sql +create function qlb as 'org.apache.iotdb.library.dprofile.UDTFQLB' +``` + +#### 函数简介 + +本函数对输入序列计算$Q_{LB} $统计量,并计算对应的p值。p值越小表明序列越有可能为非平稳序列。 + +**函数名:** QLB + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `lag`:计算时用到的最大延迟阶数,取值应为 1 至 n-2 之间的整数,n 为序列采样总数。默认取 n-2。 + +**输出序列:** 输出单个序列,类型为 
DOUBLE。该序列是$Q_{LB} $统计量对应的 p 值,时间标签代表偏移阶数。 + +**提示:** $Q_{LB} $统计量由自相关系数求得,如需得到统计量而非 p 值,可以使用 ACF 函数。 + +#### 使用示例 + +##### 使用默认参数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select QLB(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| +|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +### Resample + +#### 注册语句 + +```sql +create function re_sample as 'org.apache.iotdb.library.dprofile.UDTFResample' +``` + +#### 函数简介 + +本函数对输入序列按照指定的频率进行重采样,包括上采样和下采样。目前,本函数支持的上采样方法包括`NaN`填充法 (NaN)、前值填充法 (FFill)、后值填充法 (BFill) 以及线性插值法 (Linear);本函数支持的下采样方法为分组聚合,聚合方法包括最大值 (Max)、最小值 (Min)、首值 (First)、末值 (Last)、平均值 (Mean)和中位数 (Median)。 + +**函数名:** RESAMPLE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `every`:重采样频率,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。该参数不允许缺省。 ++ `interp`:上采样的插值方法,取值为 'NaN'、'FFill'、'BFill' 或 'Linear'。在缺省情况下,使用`NaN`填充法。 ++ `aggr`:下采样的聚合方法,取值为 'Max'、'Min'、'First'、'Last'、'Mean' 或 'Median'。在缺省情况下,使用平均数聚合。 ++ `start`:重采样的起始时间(包含),是一个格式为 'yyyy-MM-dd HH:mm:ss' 的时间字符串。在缺省情况下,使用第一个有效数据点的时间戳。 ++ `end`:重采样的结束时间(不包含),是一个格式为 'yyyy-MM-dd HH:mm:ss' 的时间字符串。在缺省情况下,使用最后一个有效数据点的时间戳。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。该序列按照重采样频率严格等间隔分布。 + +**提示:** 数据中的`NaN`将会被忽略。 + +#### 使用示例 + +##### 上采样 + +当重采样频率高于数据原始频率时,将会进行上采样。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| 
+|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +##### 下采样 + +当重采样频率低于数据原始频率时,将会进行下采样。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + +###### 指定重采样时间段 + +可以使用`start`和`end`两个参数指定重采样的时间段,超出实际时间范围的部分会被插值填补。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +### Sample + +#### 注册语句 + +```sql +create function sample as 'org.apache.iotdb.library.dprofile.UDTFSample' +``` + +#### 函数简介 + +本函数对输入序列进行采样,即从输入序列中选取指定数量的数据点并输出。目前,本函数支持三种采样方法:**蓄水池采样法 (reservoir sampling)** 对数据进行随机采样,所有数据点被采样的概率相同;**等距采样法 (isometric sampling)** 按照相等的索引间隔对数据进行采样,**最大三角采样法 (triangle sampling)** 对所有数据会按采样率分桶,每个桶内会计算数据点间三角形面积,并保留面积最大的点,该算法通常用于数据的可视化展示中,采用过程可以保证一些关键的突变点在采用中得到保留,更多抽样算法细节可以阅读论文 [here](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf)。 + +**函数名:** SAMPLE + +**输入序列:** 仅支持单个输入序列,类型可以是任意的。 + +**参数:** + ++ `method`:采样方法,取值为 'reservoir','isometric' 或 'triangle' 。在缺省情况下,采用蓄水池采样法。 ++ `k`:采样数,它是一个正整数,在缺省情况下为 1。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列的长度为采样数,序列中的每一个数据点都来自于输入序列。 + +**提示:** 
如果采样数大于序列长度,那么输入序列中所有的数据点都会被输出。 + +#### 使用示例 + + +##### 蓄水池采样 + +当`method`参数为 'reservoir' 或缺省时,采用蓄水池采样法对输入序列进行采样。由于该采样方法具有随机性,下面展示的输出序列只是一种可能的结果。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + + +##### 等距采样 + +当`method`参数为 'isometric' 时,采用等距采样法对输入序列进行采样。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +### Segment + +#### 注册语句 + +```sql +create function segment as 'org.apache.iotdb.library.dprofile.UDTFSegment' +``` + +#### 函数简介 + +本函数按照数据的线性变化趋势将数据划分为多个子序列,返回分段直线拟合后的子序列首值或所有拟合值。 + +**函数名:** SEGMENT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `output`:"all" 输出所有拟合值;"first" 输出子序列起点拟合值。默认为 "first"。 + ++ `error`:判定存在线性趋势的误差允许阈值。误差的定义为子序列进行线性拟合的误差的绝对值的均值。默认为 0.1. 
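+
+其中“线性拟合误差”的含义可以用下面这个仅作示意的 Python 片段理解(假设等时间间隔采样,函数名为说明用的假设,并非本函数的实际实现):对子序列做最小二乘直线拟合,取残差绝对值的均值。
+
+```python
+# 仅作示意:按上文定义计算一段子序列的线性拟合误差
+import numpy as np
+
+def fit_error(values):
+    y = np.asarray(values, dtype=float)
+    x = np.arange(len(y), dtype=float)
+    k, b = np.polyfit(x, y, 1)                    # 一次(直线)最小二乘拟合
+    return float(np.mean(np.abs(y - (k * x + b))))
+```
+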
+ +**输出序列:** 输出单个序列,类型为 DOUBLE。 + +**提示:** 函数默认所有数据等时间间隔分布。函数读取所有数据,若原始数据过多,请先进行降采样处理。拟合采用自底向上方法,子序列的尾值可能会被认作子序列首值输出。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select segment(s1,"error"="0.1") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +### Skew + +#### 注册语句 + +```sql +create function skew as 'org.apache.iotdb.library.dprofile.UDAFSkew' +``` + +#### 函数简介 + +本函数用于计算单列数值型数据的总体偏度 + +**函数名:** SKEW + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为总体偏度的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 
10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select skew(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +### Spline + +#### 注册语句 + +```sql +create function spline as 'org.apache.iotdb.library.dprofile.UDTFSpline' +``` + +#### 函数简介 + +本函数提供对原始序列进行三次样条曲线拟合后的插值重采样。 + +**函数名:** SPLINE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `points`:重采样个数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +**提示**:输出序列保留输入序列的首尾值,等时间间隔采样。仅当输入点个数不少于 4 个时才计算插值。 + +#### 使用示例 + +##### 指定插值个数 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select spline(s1, "points"="151") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| 
+|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| +|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 
2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +### Spread + +#### 注册语句 + +```sql +create function spread as 'org.apache.iotdb.library.dprofile.UDAFSpread' +``` + +#### 函数简介 + +本函数用于计算时间序列的极差,即最大值减去最小值的结果。 + +**函数名:** SPREAD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型与输入相同,序列仅包含一个时间戳为 0 、值为极差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 
+|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + + + +### ZScore + +#### 注册语句 + +```sql +create function zscore as 'org.apache.iotdb.library.dprofile.UDTFZScore' +``` + +#### 函数简介 + +本函数将输入序列使用z-score方法进行归一化。 + +**函数名:** ZSCORE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `compute`:若设置为 "batch",则将数据全部读入后转换;若设置为 "stream",则需用户提供均值及方差进行流式计算转换。默认为 "batch"。 ++ `avg`:使用流式计算时的均值。 ++ `sd`:使用流式计算时的标准差。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select zscore(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` + + + +## 异常检测 + +### IQR + +#### 注册语句 + +```sql +create function iqr as 'org.apache.iotdb.library.anomaly.UDTFIQR' +``` + +#### 函数简介 + +本函数用于检验超出上下四分位数1.5倍IQR的数据分布异常。 + +**函数名:** IQR + 
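+
+“超出上下四分位数 1.5 倍 IQR 即视为异常”这一判定规则可以用下面这个仅作示意的 Python 片段表示,其中的函数名与四分位数的具体算法均为说明用的假设,并非本函数的实际实现:
+
+```python
+# 仅作示意:IQR = Q3 - Q1,落在 [Q1 - 1.5*IQR, Q3 + 1.5*IQR] 之外的点视为异常
+import statistics
+
+def iqr_outliers(values):
+    q1, _, q3 = statistics.quantiles(values, n=4)
+    iqr = q3 - q1
+    return [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
+```
+
+以下文的示例数据为例,按该规则只有 10.0 会被判为异常,与示例输出一致。
+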
+**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:若设置为 "batch",则将数据全部读入后检测;若设置为 "stream",则需用户提供上下四分位数进行流式检测。默认为 "batch"。 ++ `q1`:使用流式计算时的下四分位数。 ++ `q3`:使用流式计算时的上四分位数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +**说明**:$IQR=Q_3-Q_1$ + +#### 使用示例 + +##### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select iqr(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+-----------------+ +| Time|iqr(root.test.s1)| ++-----------------------------+-----------------+ +|1970-01-01T08:00:01.700+08:00| 10.0| ++-----------------------------+-----------------+ +``` + +### KSigma + +#### 注册语句 + +```sql +create function ksigma as 'org.apache.iotdb.library.anomaly.UDTFKSigma' +``` + +#### 函数简介 + +本函数利用动态 K-Sigma 算法进行异常检测。在一个窗口内,与平均值的差距超过k倍标准差的数据将被视作异常并输出。 + +**函数名:** KSIGMA + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `k`:在动态 K-Sigma 算法中,分布异常的标准差倍数阈值,默认值为 3。 ++ `window`:动态 K-Sigma 算法的滑动窗口大小,默认值为 10000。 + + +**输出序列:** 输出单个序列,类型与输入序列相同。 + +**提示:** k 应大于 0,否则将不做输出。 + +#### 使用示例 + +##### 指定k + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:04.000+08:00| 100.0| +|2020-01-01T00:00:06.000+08:00| 150.0| +|2020-01-01T00:00:08.000+08:00| 200.0| +|2020-01-01T00:00:10.000+08:00| 200.0| +|2020-01-01T00:00:14.000+08:00| 200.0| +|2020-01-01T00:00:15.000+08:00| 200.0| +|2020-01-01T00:00:16.000+08:00| 200.0| +|2020-01-01T00:00:18.000+08:00| 200.0| +|2020-01-01T00:00:20.000+08:00| 150.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +|Time |ksigma(root.test.d1.s1,"k"="3.0")| ++-----------------------------+---------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +### LOF + +#### 注册语句 + +```sql +create function LOF as 'org.apache.iotdb.library.anomaly.UDTFLOF' +``` + +#### 函数简介 + +本函数使用局部离群点检测方法用于查找序列的密度异常。将根据提供的第k距离数及局部离群点因子(lof)阈值,判断输入数据是否为离群点,即异常,并输出各点的 LOF 值。 + +**函数名:** LOF + +**输入序列:** 多个输入序列,类型为 
INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:使用的检测方法。默认为 default,以高维数据计算。设置为 series,将一维时间序列转换为高维数据计算。 ++ `k`:使用第k距离计算局部离群点因子.默认为 3。 ++ `window`:每次读取数据的窗口长度。默认为 10000. ++ `windowsize`:使用series方法时,转化高维数据的维数,即单个窗口的大小。默认为 5。 + +**输出序列:** 输出单时间序列,类型为DOUBLE。 + +**提示:** 不完整的数据行会被忽略,不参与计算,也不标记为离群点。 + + +#### 使用示例 + +##### 默认参数 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +##### 诊断一维时间序列 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| 
+|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 3.77777777777778| ++-----------------------------+--------------------+ +``` + +### MissDetect + +#### 注册语句 + +```sql +create function missdetect as 'org.apache.iotdb.library.anomaly.UDTFMissDetect' +``` + +#### 函数简介 + +本函数用于检测数据中的缺失异常。在一些数据中,缺失数据会被线性插值填补,在数据中出现完美的线性片段,且这些片段往往长度较大。本函数通过在数据中发现这些完美线性片段来检测缺失异常。 + +**函数名:** MISSDETECT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `minlen`:被标记为异常的完美线性片段的最小长度,是一个大于等于 10 的整数,默认值为 10。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN,即该数据点是否为缺失异常。 + +**提示:** 数据中的`NaN`将会被忽略。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 0.0| +|2021-07-01T12:00:01.000+08:00| 1.0| +|2021-07-01T12:00:02.000+08:00| 0.0| +|2021-07-01T12:00:03.000+08:00| 1.0| +|2021-07-01T12:00:04.000+08:00| 0.0| +|2021-07-01T12:00:05.000+08:00| 0.0| +|2021-07-01T12:00:06.000+08:00| 0.0| +|2021-07-01T12:00:07.000+08:00| 0.0| +|2021-07-01T12:00:08.000+08:00| 0.0| +|2021-07-01T12:00:09.000+08:00| 0.0| +|2021-07-01T12:00:10.000+08:00| 0.0| +|2021-07-01T12:00:11.000+08:00| 0.0| +|2021-07-01T12:00:12.000+08:00| 0.0| +|2021-07-01T12:00:13.000+08:00| 0.0| +|2021-07-01T12:00:14.000+08:00| 0.0| +|2021-07-01T12:00:15.000+08:00| 0.0| +|2021-07-01T12:00:16.000+08:00| 1.0| +|2021-07-01T12:00:17.000+08:00| 0.0| +|2021-07-01T12:00:18.000+08:00| 1.0| +|2021-07-01T12:00:19.000+08:00| 0.0| +|2021-07-01T12:00:20.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select missdetect(s2,'minlen'='10') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+ +| Time|missdetect(root.test.d2.s2, "minlen"="10")| ++-----------------------------+------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| false| +|2021-07-01T12:00:01.000+08:00| false| +|2021-07-01T12:00:02.000+08:00| false| +|2021-07-01T12:00:03.000+08:00| false| +|2021-07-01T12:00:04.000+08:00| true| +|2021-07-01T12:00:05.000+08:00| true| +|2021-07-01T12:00:06.000+08:00| true| +|2021-07-01T12:00:07.000+08:00| true| +|2021-07-01T12:00:08.000+08:00| true| +|2021-07-01T12:00:09.000+08:00| true| +|2021-07-01T12:00:10.000+08:00| true| +|2021-07-01T12:00:11.000+08:00| true| +|2021-07-01T12:00:12.000+08:00| true| +|2021-07-01T12:00:13.000+08:00| true| +|2021-07-01T12:00:14.000+08:00| true| +|2021-07-01T12:00:15.000+08:00| true| +|2021-07-01T12:00:16.000+08:00| false| +|2021-07-01T12:00:17.000+08:00| false| +|2021-07-01T12:00:18.000+08:00| false| +|2021-07-01T12:00:19.000+08:00| false| +|2021-07-01T12:00:20.000+08:00| false| ++-----------------------------+------------------------------------------+ +``` + +### Range + +#### 注册语句 + +```sql +create function range as 'org.apache.iotdb.library.anomaly.UDTFRange' +``` + +#### 函数简介 + +本函数用于查找时间序列的范围异常。将根据提供的上界与下界,判断输入数据是否越界,即异常,并输出所有异常点为新的时间序列。 + +**函数名:** RANGE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `lower_bound`:范围异常检测的下界。 ++ `upper_bound`:范围异常检测的上界。 + +**输出序列:** 输出单个序列,类型与输入序列相同。 + +**提示:** 应满足`upper_bound`大于`lower_bound`,否则将不做输出。 + + +#### 使用示例 + +##### 指定上界与下界 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| 
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00|          100.0|
+|2020-01-01T00:00:03.000+08:00|          101.0|
+|2020-01-01T00:00:04.000+08:00|          102.0|
+|2020-01-01T00:00:06.000+08:00|          104.0|
+|2020-01-01T00:00:08.000+08:00|          126.0|
+|2020-01-01T00:00:10.000+08:00|          108.0|
+|2020-01-01T00:00:14.000+08:00|          112.0|
+|2020-01-01T00:00:15.000+08:00|          113.0|
+|2020-01-01T00:00:16.000+08:00|          114.0|
+|2020-01-01T00:00:18.000+08:00|          116.0|
+|2020-01-01T00:00:20.000+08:00|          118.0|
+|2020-01-01T00:00:22.000+08:00|          120.0|
+|2020-01-01T00:00:26.000+08:00|          124.0|
+|2020-01-01T00:00:28.000+08:00|          126.0|
+|2020-01-01T00:00:30.000+08:00|            NaN|
++-----------------------------+---------------+
+```
+
+用于查询的 SQL 语句:
+
+```sql
+select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30
+```
+
+输出序列:
+
+```
++-----------------------------+------------------------------------------------------------------+
+|Time                         |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")|
++-----------------------------+------------------------------------------------------------------+
+|2020-01-01T00:00:02.000+08:00|                                                              100.0|
+|2020-01-01T00:00:08.000+08:00|                                                              126.0|
+|2020-01-01T00:00:28.000+08:00|                                                              126.0|
++-----------------------------+------------------------------------------------------------------+
+```
+
+### TwoSidedFilter
+
+#### 注册语句
+
+```sql
+create function twosidedfilter as 'org.apache.iotdb.library.anomaly.UDTFTwoSidedFilter'
+```
+
+#### 函数简介
+
+本函数基于双边窗口检测法对输入序列中的异常点进行过滤。
+
+**函数名:** TWOSIDEDFILTER
+
+**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE
+
+**参数:**
+
+- `len`:双边窗口检测法中的窗口大小,取值范围为正整数,默认值为 5。例如当`len`=3 时,算法向前、向后各取长度为 3 的窗口,在窗口中计算异常度。
+- `threshold`:异常度的阈值,取值范围为(0,1),默认值为 0.3。阈值越高,函数对于异常度的判定标准越严格。
+
+**输出序列:** 输出单个序列,类型与输入相同,是输入序列去除异常点后的结果。
+
+#### 使用示例
+
+输入序列:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s0|
++-----------------------------+------------+
+|1970-01-01T08:00:00.000+08:00|      2002.0|
+|1970-01-01T08:00:01.000+08:00|      1946.0|
+|1970-01-01T08:00:02.000+08:00|      1958.0|
+|1970-01-01T08:00:03.000+08:00|      2012.0|
+|1970-01-01T08:00:04.000+08:00|      2051.0|
+|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +### Outlier + +#### 注册语句 + +```sql +create function outlier as 'org.apache.iotdb.library.anomaly.UDTFOutlier' +``` + +#### 函数简介 + +本函数用于检测基于距离的异常点。在当前窗口中,如果一个点距离阈值范围内的邻居数量(包括它自己)少于密度阈值,则该点是异常点。 + +**函数名:** OUTLIER + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `r`:基于距离异常检测中的距离阈值。 ++ `k`:基于距离异常检测中的密度阈值。 ++ `w`:用于指定滑动窗口的大小。 ++ `s`:用于指定滑动窗口的步长。 + +**输出序列**:输出单个序列,类型与输入序列相同。 + +#### 使用示例 + +##### 指定查询参数 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2020-01-04T23:59:55.000+08:00| 56.0| +|2020-01-04T23:59:56.000+08:00| 55.1| +|2020-01-04T23:59:57.000+08:00| 54.2| +|2020-01-04T23:59:58.000+08:00| 56.3| +|2020-01-04T23:59:59.000+08:00| 59.0| +|2020-01-05T00:00:00.000+08:00| 60.0| +|2020-01-05T00:00:01.000+08:00| 60.5| +|2020-01-05T00:00:02.000+08:00| 64.5| +|2020-01-05T00:00:03.000+08:00| 69.0| +|2020-01-05T00:00:04.000+08:00| 64.2| +|2020-01-05T00:00:05.000+08:00| 62.3| +|2020-01-05T00:00:06.000+08:00| 58.0| +|2020-01-05T00:00:07.000+08:00| 58.9| +|2020-01-05T00:00:08.000+08:00| 52.0| +|2020-01-05T00:00:09.000+08:00| 62.3| +|2020-01-05T00:00:10.000+08:00| 61.0| +|2020-01-05T00:00:11.000+08:00| 64.2| +|2020-01-05T00:00:12.000+08:00| 61.8| +|2020-01-05T00:00:13.000+08:00| 64.0| +|2020-01-05T00:00:14.000+08:00| 63.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:03.000+08:00| 69.0| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:08.000+08:00| 52.0| ++-----------------------------+--------------------------------------------------------+ +``` + +### MasterTrain + +#### 函数简介 + +本函数基于主数据训练VAR预测模型。将根据提供的主数据判断时间序列中的数据点是否为错误值,并由连续p+1个非错误值作为训练样本训练VAR模型,输出训练后的模型参数。 + +**函数名:** MasterTrain + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `p`:模型阶数。 ++ `eta`:错误值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 + +**输出序列:** 输出单个序列,类型为DOUBLE。 + +**安装方式:** + +- 从IoTDB代码仓库下载`research/master-detector`分支代码到本地 +- 在根目录运行 `mvn spotless:apply` +- 在根目录运行 `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies` 编译项目 +- 将 `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar`复制到IoTDB服务器的`./ext/udf/` 路径下。 +- 启动 IoTDB服务器,在客户端中执行 `create function MasterTrain as org.apache.iotdb.library.anomaly.UDTFMasterTrain'`。 + +#### 使用示例 + +输入序列: + +``` 
++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ + +``` + +### MasterDetect + +#### 函数简介 + 
+本函数基于主数据检测并修复时间序列中的错误值。将根据提供的主数据判断时间序列中的数据点是否为错误值,并由MasterTrain训练的模型进行时间序列预测,错误值将由预测值及主数据共同修复。 + +**函数名:** MasterDetect + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `p`:模型阶数。 ++ `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 ++ `eta`:错误值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 ++ `beta`:异常值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 ++ `output_type`:输出结果类型,可选'repair'或'anomaly',即输出修复结果或异常检测结果,在缺省情况下默认为'repair'。 ++ `output_column`:输出列的序号,默认为1,即输出第一列的修复结果。 + +**安装方式:** + +- 从IoTDB代码仓库下载`research/master-detector`分支代码到本地 +- 在根目录运行 `mvn spotless:apply` +- 在根目录运行 `mvn clean package -pl library-udf -DskipTests -am -P get-jar-with-dependencies` 编译项目 +- 将 `./library-UDF/target/library-udf-1.2.0-SNAPSHOT-jar-with-dependencies.jar`复制到IoTDB服务器的`./ext/udf/` 路径下。 +- 启动 IoTDB服务器,在客户端中执行 `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'`。 + +**输出序列:** 输出单个序列,类型与输入数据中对应列的类型相同,序列为输入列修复后的结果。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 
39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +##### 修复 + +用于查询的 SQL 语句: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +##### 异常检测 + +用于查询的 SQL 语句: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| false| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| +|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| 
++-----------------------------+---------------------------------------------------------------------------------------+ +``` + + + +## 频域分析 + +### Conv + +#### 注册语句 + +```sql +create function conv as 'org.apache.iotdb.library.frequency.UDTFConv' +``` + +#### 函数简介 + +本函数对两个输入序列进行卷积,即多项式乘法。 + + +**函数名:** CONV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为DOUBLE,它是两个序列卷积的结果。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +### Deconv + +#### 注册语句 + +```sql +create function deconv as 'org.apache.iotdb.library.frequency.UDTFDeconv' +``` + +#### 函数简介 + +本函数对两个输入序列进行去卷积,即多项式除法运算。 + +**函数名:** DECONV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `result`:去卷积的结果,取值为'quotient'或'remainder',分别对应于去卷积的商和余数。在缺省情况下,输出去卷积的商。 + +**输出序列:** 输出单个序列,类型为DOUBLE。它是将第二个序列从第一个序列中去卷积(第一个序列除以第二个序列)的结果。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +##### 计算去卷积的商 + +当`result`参数缺省或为'quotient'时,本函数计算去卷积的商。 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s3|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| null| +|1970-01-01T08:00:00.003+08:00| 2.0| null| ++-----------------------------+---------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select deconv(s3,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2)| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +##### 计算去卷积的余数 + +当`result`参数为'remainder'时,本函数计算去卷积的余数。输入序列同上,用于查询的SQL语句如下: + +```sql +select deconv(s3,s2,'result'='remainder') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| ++-----------------------------+--------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 0.0| ++-----------------------------+--------------------------------------------------------------+ +``` + +### DWT + +#### 注册语句 + +```sql +create function dwt as 
'org.apache.iotdb.library.frequency.UDTFDWT' +``` + +#### 函数简介 + +本函数对输入序列进行一维离散小波变换。 + +**函数名:** DWT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:小波滤波的类型,提供'Haar', 'DB4', 'DB6', 'DB8',其中DB指代Daubechies。若不设置该参数,则用户需提供小波滤波的系数。不区分大小写。 ++ `coef`:小波滤波的系数。若提供该参数,请使用英文逗号','分割各项,不添加空格或其它符号。 ++ `layer`:进行变换的次数,最终输出的向量个数等同于$layer+1$.默认取1。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度与输入相等。 + +**提示:** 输入序列长度必须为2的整数次幂。 + +#### 使用示例 + +##### Haar变换 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| +|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| ++-----------------------------+-------------------------------------+ +``` + +### FFT + +#### 注册语句 + +```sql +create function fft as 'org.apache.iotdb.library.frequency.UDTFFFT' +``` + +#### 函数简介 + +本函数对输入序列进行快速傅里叶变换。 + +**函数名:** FFT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:傅里叶变换的类型,取值为'uniform'或'nonuniform',缺省情况下为'uniform'。当取值为'uniform'时,时间戳将被忽略,所有数据点都将被视作等距的,并应用等距快速傅里叶算法;当取值为'nonuniform'时,将根据时间戳应用非等距快速傅里叶算法(未实现)。 ++ `result`:傅里叶变换的结果,取值为'real'、'imag'、'abs'或'angle',分别对应于变换结果的实部、虚部、模和幅角。在缺省情况下,输出变换的模。 ++ `compress`:压缩参数,取值范围(0,1],是有损压缩时保留的能量比例。在缺省情况下,不进行压缩。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度与输入相等。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +##### 等距傅里叶变换 + +当`type`参数缺省或为'uniform'时,本函数进行等距傅里叶变换。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| 
+|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select fft(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 19.999999960195904| +|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| +|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| +|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| ++-----------------------------+----------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此在输出序列中$k=4$和$k=5$处有尖峰。 + +##### 等距傅里叶变换并压缩 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| fft(root.test.d1.s1,| fft(root.test.d1.s1,| +| | "result"="real",| "result"="imag",| +| | "compress"="0.99")| "compress"="0.99")| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + +注:基于傅里叶变换结果的共轭性质,压缩结果只保留前一半;根据给定的压缩参数,从低频到高频保留数据点,直到保留的能量比例超过该值;保留最后一个数据点以表示序列长度。 + +### HighPass + +#### 注册语句 + +```sql +create function highpass as 'org.apache.iotdb.library.frequency.UDTFHighPass' +``` + +#### 函数简介 + +本函数对输入序列进行高通滤波,提取高于截止频率的分量。输入序列的时间戳将被忽略,所有数据点都将被视作等距的。 + +**函数名:** HIGHPASS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / 
INT64 / FLOAT / DOUBLE + +**参数:** + ++ `wpass`:归一化后的截止频率,取值为(0,1),不可缺省。 + +**输出序列:** 输出单个序列,类型为DOUBLE,它是滤波后的序列,长度与时间戳均与输入一致。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此高通滤波之后的输出序列服从$y=sin(2\pi t/4)$。 + +### IFFT + +#### 注册语句 + +```sql +create function ifft as 'org.apache.iotdb.library.frequency.UDTFIFFT' +``` + +#### 函数简介 + +本函数将输入的两个序列作为实部和虚部视作一个复数,进行逆快速傅里叶变换,并输出结果的实部。输入数据的格式参见`FFT`函数的输出,并支持以`FFT`函数压缩后的输出作为本函数的输入。 + +**函数名:** IFFT + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `start`:输出序列的起始时刻,是一个格式为'yyyy-MM-dd HH:mm:ss'的时间字符串。在缺省情况下,为'1970-01-01 08:00:00'。 ++ `interval`:输出序列的时间间隔,是一个有单位的正数。目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,为1s。 + + +**输出序列:** 输出单个序列,类型为DOUBLE。该序列是一个等距时间序列,它的值是将两个输入序列依次作为实部和虚部进行逆快速傅里叶变换的结果。 + +**提示:** 如果某行数据中包含空值或`NaN`,该行数据将会被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| 
++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +用于查询的SQL语句: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| ++-----------------------------+-------------------------------------------------------+ +``` + +### LowPass + +#### 注册语句 + +```sql +create function lowpass as 'org.apache.iotdb.library.frequency.UDTFLowPass' +``` + +#### 函数简介 + +本函数对输入序列进行低通滤波,提取低于截止频率的分量。输入序列的时间戳将被忽略,所有数据点都将被视作等距的。 + +**函数名:** LOWPASS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `wpass`:归一化后的截止频率,取值为(0,1),不可缺省。 + +**输出序列:** 输出单个序列,类型为DOUBLE,它是滤波后的序列,长度与时间戳均与输入一致。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| 
+|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此低通滤波之后的输出序列服从$y=2sin(2\pi t/5)$。 + + +### Envelope + +#### 注册语句 + +```sql +create function envelope as 'org.apache.iotdb.library.frequency.UDFEnvelopeAnalysis' +``` + +#### 函数简介 + +本函数通过输入一维浮点数数组和用户指定的调制频率,实现对信号的解调和包络提取。解调的目标是从复杂的信号中提取感兴趣的部分,使其更易理解。比如通过解调可以找到信号的包络,即振幅的变化趋势。 + +**函数名:** Envelope + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `frequency`:频率(选填,正数。不填此参数,系统会基于序列对应时间的时间间隔来推断频率)。 ++ `amplification`: 扩增倍数(选填,正整数。输出Time列的结果为正整数的集合,不会输出小数。当频率小1时,可通过此参数对频率进行扩增以展示正常的结果)。 + +**输出序列:** ++ `Time`: 该列返回的值的含义是频率而并非时间,如果输出的格式为时间格式(如:1970-01-01T08:00:19.000+08:00),请将其转为时间戳值。 + ++ `Envelope(Path, 'frequency'='{frequency}')`:输出单个序列,类型为DOUBLE,它是包络分析之后的结果。 + +**提示:** 当解调的原始序列的值不连续时,本函数会视为连续处理,建议被分析的时间序列是一段值完整的时间序列。同时建议指定开始时间与结束时间。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:01.000+08:00| 1.0 | +|1970-01-01T08:00:02.000+08:00| 2.0 | +|1970-01-01T08:00:03.000+08:00| 3.0 | +|1970-01-01T08:00:04.000+08:00| 4.0 | +|1970-01-01T08:00:05.000+08:00| 5.0 | +|1970-01-01T08:00:06.000+08:00| 6.0 | +|1970-01-01T08:00:07.000+08:00| 7.0 | +|1970-01-01T08:00:08.000+08:00| 8.0 | +|1970-01-01T08:00:09.000+08:00| 9.0 | +|1970-01-01T08:00:10.000+08:00| 10.0 | ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: +```sql +set time_display_type=long; +select envelope(s1),envelope(s1,'frequency'='1000'),envelope(s1,'amplification'='10') from root.test.d1; +``` +输出序列: + +``` ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +|Time|envelope(root.test.d1.s1)|envelope(root.test.d1.s1, "frequency"="1000")|envelope(root.test.d1.s1, "amplification"="10")| 
++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +| 0| 6.284350808484124| 6.284350808484124| 6.284350808484124| +| 100| 1.5581923657404393| 1.5581923657404393| null| +| 200| 0.8503211038340728| 0.8503211038340728| null| +| 300| 0.512808785945551| 0.512808785945551| null| +| 400| 0.26361156774506744| 0.26361156774506744| null| +|1000| null| null| 1.5581923657404393| +|2000| null| null| 0.8503211038340728| +|3000| null| null| 0.512808785945551| +|4000| null| null| 0.26361156774506744| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ + +``` + +## 数据匹配 + +### Cov + +#### 注册语句 + +```sql +create function cov as 'org.apache.iotdb.library.dmatch.UDAFCov' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的总体协方差。 + +**函数名:** COV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为总体协方差的数据点。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出`NaN`。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +### Dtw + +#### 注册语句 + +```sql +create function dtw as 'org.apache.iotdb.library.dmatch.UDAFDtw' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的 DTW 距离。 + +**函数名:** DTW + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为两个时间序列的 DTW 距离值。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出 0。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| 
+|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +### Pearson + +#### 注册语句 + +```sql +create function pearson as 'org.apache.iotdb.library.dmatch.UDAFPearson' +``` + +#### 函数简介 + +本函数用于计算两列数值型数据的皮尔森相关系数。 + +**函数名:** PEARSON + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为皮尔森相关系数的数据点。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出`NaN`。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +### PtnSym + +#### 注册语句 + +```sql +create function ptnsym as 'org.apache.iotdb.library.dmatch.UDTFPtnSym' +``` + +#### 函数简介 + +本函数用于寻找序列中所有对称度小于阈值的对称子序列。对称度通过 DTW 计算,值越小代表序列对称性越高。 + +**函数名:** PTNSYM + +**输入序列:** 仅支持一个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:对称子序列的长度,是一个正整数,默认值为 10。 ++ `threshold`:对称度阈值,是一个非负数,只有对称度小于等于该值的对称子序列才会被输出。在缺省情况下,所有的子序列都会被输出。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中的每一个数据点对应于一个对称子序列,时间戳为子序列的起始时刻,值为对称度。 + + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| 
+|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +### XCorr + +#### 注册语句 + +```sql +create function xcorr as 'org.apache.iotdb.library.dmatch.UDTFXCorr' +``` + +#### 函数简介 + +本函数用于计算两条时间序列的互相关函数值, +对离散序列而言,互相关函数可以表示为 +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +常用于表征两条序列在不同对齐条件下的相似度。 + +**函数名:** XCORR + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点, +其中正中心的值为两条序列按照预先对齐的结果计算的互相关系数(即等于以上公式的$CR(0)$), +前半部分的值表示将后一条输入序列向前平移时计算的互相关系数, +直至两条序列没有重合的数据点(不包含完全分离时的结果$CR(-N)=0.0$), +后半部分类似。 +用公式可表示为(所有序列的索引从1开始计数): +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**提示:** + ++ 两条序列中的`null` 和`NaN` 值会被忽略,在计算中表现为 0。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` + + + +## 数据修复 + +### TimestampRepair + +#### 注册语句 + +```sql +create function timestamprepair as 'org.apache.iotdb.library.drepair.UDTFTimestampRepair' +``` + +### 函数简介 + +本函数用于时间戳修复。根据给定的标准时间间隔,采用最小化修复代价的方法,通过对数据时间戳的微调,将原本时间戳间隔不稳定的数据修复为严格等间隔的数据。在未给定标准时间间隔的情况下,本函数将使用时间间隔的中位数 (median)、众数 (mode) 或聚类中心 (cluster) 来推算标准时间间隔。 + + +**函数名:** TIMESTAMPREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `interval`: 标准时间间隔(单位是毫秒),是一个正整数。在缺省情况下,将根据指定的方法推算。 ++ `method`:推算标准时间间隔的方法,取值为 'median', 'mode' 或 'cluster',仅在`interval`缺省时有效。在缺省情况下,将使用中位数方法进行推算。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +### 使用示例 + +#### 指定标准时间间隔 + +在给定`interval`参数的情况下,本函数将按照指定的标准时间间隔进行修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| 
++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:19.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:01.000+08:00| 7.0| +|2021-07-01T12:01:11.000+08:00| 8.0| +|2021-07-01T12:01:21.000+08:00| 9.0| +|2021-07-01T12:01:31.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timestamprepair(s1,'interval'='10000') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------+ +| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| ++-----------------------------+----------------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+----------------------------------------------------+ +``` + +#### 自动推算标准时间间隔 + +如果`interval`参数没有给定,本函数将按照推算的标准时间间隔进行修复。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select timestamprepair(s1) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------+ +| Time|timestamprepair(root.test.d2.s1)| ++-----------------------------+--------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +### ValueFill + +#### 注册语句 + +```sql +create function valuefill as 'org.apache.iotdb.library.drepair.UDTFValueFill' +``` + +#### 函数简介 + +**函数名:** ValueFill + +**输入序列:** 单列时序数据,类型为INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, 默认为 "linear"。其中,“mean” 指使用均值填补的方法; “previous" 指使用前值填补方法;“linear" 指使用线性插值填补方法;“likelihood” 为基于速度的正态分布的极大似然估计方法;“AR” 指自回归的填补方法;“MA” 指滑动平均的填补方法;"SCREEN" 指约束填补方法;缺省情况下使用 “linear”。 + +**输出序列:** 填补后的单维序列。 + +**备注:** AR 模型采用 AR(1),时序列需满足自相关条件,否则将输出单个数据点 (0, 0.0). 
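+
+作为补充说明,下面给出一个仅作示意的 "linear"(线性插值)填补思路草图:假设输入为按时间升序排列的时间戳数组与数值数组,`NaN` 表示待填补的缺失值。其中的类名与方法名均为假设性示例,并非 UDTFValueFill 的真实实现,仅用于帮助理解上述默认填补逻辑。
+
+```java
+import java.util.Arrays;
+
+// 示意性草图:对每个 NaN,向前、向后各寻找最近的一个非 NaN 观测值,按时间距离做线性插值
+public class LinearFillSketch {
+    public static double[] fill(long[] times, double[] values) {
+        double[] out = Arrays.copyOf(values, values.length);
+        for (int i = 0; i < out.length; i++) {
+            if (!Double.isNaN(values[i])) {
+                continue;
+            }
+            int prev = i - 1;
+            int next = i + 1;
+            while (prev >= 0 && Double.isNaN(values[prev])) {
+                prev--;
+            }
+            while (next < values.length && Double.isNaN(values[next])) {
+                next++;
+            }
+            if (prev >= 0 && next < values.length) {
+                out[i] = values[prev]
+                        + (values[next] - values[prev]) * (times[i] - times[prev])
+                                / (times[next] - times[prev]);
+            }
+            // 序列两端找不到双侧邻居的缺失值保持 NaN,
+            // 与下文 linear 示例中首行 NaN 未被填补的现象一致
+        }
+        return out;
+    }
+
+    public static void main(String[] args) {
+        long[] t = {2000, 3000, 4000, 6000};
+        double[] v = {Double.NaN, 101.0, Double.NaN, 104.0};
+        // 打印 [NaN, 101.0, 102.0, 104.0]
+        System.out.println(Arrays.toString(fill(t, v)));
+    }
+}
+```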
+ +#### 使用示例 +##### 使用 linear 方法进行填补 + +当`method`缺省或取值为 'linear' 时,本函数将使用线性插值方法进行填补。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select valuefill(s1) from root.test.d2 +``` + +输出序列: + + + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +##### 使用 previous 方法进行填补 + +当`method`取值为 'previous' 时,本函数将使前值填补方法进行数值填补。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +### ValueRepair + +#### 注册语句 + +```sql +create function valuerepair as 'org.apache.iotdb.library.drepair.UDTFValueRepair' +``` + +#### 函数简介 + +本函数用于对时间序列的数值进行修复。目前,本函数支持两种修复方法:**Screen** 是一种基于速度阈值的方法,在最小改动的前提下使得所有的速度符合阈值要求;**LsGreedy** 是一种基于速度变化似然的方法,将速度变化建模为高斯分布,并采用贪心算法极大化似然函数。 + +**函数名:** VALUEREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:修复时采用的方法,取值为 'Screen' 或 'LsGreedy'. 
在缺省情况下,使用 Screen 方法进行修复。 ++ `minSpeed`:该参数仅在使用 Screen 方法时有效。当速度小于该值时会被视作数值异常点加以修复。在缺省情况下为中位数减去三倍绝对中位差。 ++ `maxSpeed`:该参数仅在使用 Screen 方法时有效。当速度大于该值时会被视作数值异常点加以修复。在缺省情况下为中位数加上三倍绝对中位差。 ++ `center`:该参数仅在使用 LsGreedy 方法时有效。对速度变化分布建立的高斯模型的中心。在缺省情况下为 0。 ++ `sigma` :该参数仅在使用 LsGreedy 方法时有效。对速度变化分布建立的高斯模型的标准差。在缺省情况下为绝对中位差。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +**提示:** 输入序列中的`NaN`在修复之前会先进行线性插值填补。 + +#### 使用示例 + +##### 使用 Screen 方法进行修复 + +当`method`缺省或取值为 'Screen' 时,本函数将使用 Screen 方法进行数值修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +##### 使用 LsGreedy 方法进行修复 + +当`method`取值为 'LsGreedy' 时,本函数将使用 LsGreedy 方法进行数值修复。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select valuerepair(s1,'method'='LsGreedy') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")| ++-----------------------------+-------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------------+ +``` + +### MasterRepair + +#### 函数简介 + +本函数实现基于主数据的时间序列数据修复。 + +**函数名:**MasterRepair + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `omega`:算法窗口大小,非负整数(单位为毫秒), 在缺省情况下,算法根据不同时间差下的两个元组距离自动估计该参数。 +- `eta`:算法距离阈值,正数, 
在缺省情况下,算法根据窗口中元组的距离分布自动估计该参数。 +- `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 +- `output_column`:输出列的序号,默认输出第一列的修复结果。 + +**输出序列:**输出单个序列,类型与输入数据中对应列的类型相同,序列为输入列修复后的结果。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +输出序列: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +### SeasonalRepair + +#### 函数简介 +本函数用于对周期性时间序列的数值进行基于分解的修复。目前,本函数支持两种方法:**Classical**使用经典分解方法得到的残差项检测数值的异常波动,并使用滑动平均修复序列;**Improved**使用改进的分解方法得到的残差项检测数值的异常波动,并使用滑动中值修复序列。 + +**函数名:** SEASONALREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:修复时采用的分解方法,取值为'Classical'或'Improved'。在缺省情况下,使用经典分解方法进行修复。 ++ `period`:序列的周期。 ++ `k`:残差项的范围阈值,用来限制残差项偏离中心的程度。在缺省情况下为9。 ++ `max_iter`:算法的最大迭代次数。在缺省情况下为10。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +**提示:** 输入序列中的`NaN`在修复之前会先进行线性插值填补。 + +#### 使用示例 +##### 使用经典分解方法进行修复 +当`method`缺省或取值为'Classical'时,本函数将使用经典分解方法进行数值修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +输出序列: + +``` 
++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +##### 使用改进的分解方法进行修复 +当`method`取值为'Improved'时,本函数将使用改进的分解方法进行数值修复。 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` + + + +## 序列发现 + +### ConsecutiveSequences + +#### 注册语句 + +```sql +create function consecutivesequences as 'org.apache.iotdb.library.series.UDTFConsecutiveSequences' +``` + +#### 函数简介 + +本函数用于在多维严格等间隔数据中发现局部最长连续子序列。 + +严格等间隔数据是指数据的时间间隔是严格相等的,允许存在数据缺失(包括行缺失和值缺失),但不允许存在数据冗余和时间戳偏移。 + +连续子序列是指严格按照标准时间间隔等距排布,不存在任何数据缺失的子序列。如果某个连续子序列不是任何连续子序列的真子序列,那么它是局部最长的。 + + +**函数名:** CONSECUTIVESEQUENCES + +**输入序列:** 支持多个输入序列,类型可以是任意的,但要满足严格等间隔的要求。 + +**参数:** + ++ `gap`:标准时间间隔,是一个有单位的正数。目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,函数会利用众数估计标准时间间隔。 + +**输出序列:** 输出单个序列,类型为 INT32。输出序列中的每一个数据点对应一个局部最长连续子序列,时间戳为子序列的起始时刻,值为子序列包含的数据点个数。 + +**提示:** 对于不符合要求的输入,本函数不对输出做任何保证。 + +#### 使用示例 + +##### 手动指定标准时间间隔 + +本函数可以通过`gap`参数手动指定标准时间间隔。需要注意的是,错误的参数设置会导致输出产生严重错误。 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, 
"gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + +##### 自动估计标准时间间隔 + +当`gap`参数缺省时,本函数可以利用众数估计标准时间间隔,得到同样的结果。因此,这种用法更受推荐。 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +### ConsecutiveWindows + +#### 注册语句 + +```sql +create function consecutivewindows as 'org.apache.iotdb.library.series.UDTFConsecutiveWindows' +``` + +#### 函数简介 + +本函数用于在多维严格等间隔数据中发现指定长度的连续窗口。 + +严格等间隔数据是指数据的时间间隔是严格相等的,允许存在数据缺失(包括行缺失和值缺失),但不允许存在数据冗余和时间戳偏移。 + +连续窗口是指严格按照标准时间间隔等距排布,不存在任何数据缺失的子序列。 + + +**函数名:** CONSECUTIVEWINDOWS + +**输入序列:** 支持多个输入序列,类型可以是任意的,但要满足严格等间隔的要求。 + +**参数:** + ++ `gap`:标准时间间隔,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,函数会利用众数估计标准时间间隔。 ++ `length`:序列长度,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。该参数不允许缺省。 + +**输出序列:** 输出单个序列,类型为 INT32。输出序列中的每一个数据点对应一个指定长度连续子序列,时间戳为子序列的起始时刻,值为子序列包含的数据点个数。 + +**提示:** 对于不符合要求的输入,本函数不对输出做任何保证。 + +#### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` + + + +## 机器学习 + +### AR + +#### 注册语句 + +```sql +create function ar as 'org.apache.iotdb.library.dlearn.UDTFAR' +``` +#### 函数简介 + +本函数用于学习数据的自回归模型系数。 + +**函数名:** AR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `p`:自回归模型的阶数。默认为1。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。第一行对应模型的一阶系数,以此类推。 + +**提示:** + +- `p`应为正整数。 + +- 序列中的大部分点为等间隔采样点。 +- 序列中的缺失点通过线性插值进行填补后用于学习过程。 + +#### 使用示例 + +##### 指定阶数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| 
+|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +### Representation + +#### 函数简介 + +本函数用于时间序列的表示。 + +**函数名:** Representation + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `tb`:时间分块数量。默认为10。 +- `vb`:值分块数量。默认为10。 + +**输出序列:** 输出单个序列,类型为INT32,长度为`tb*vb`。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** + +- `tb `,`vb`应为正整数。 + +#### 使用示例 + +##### 指定时间分块数量、值分块数量 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +### RM + +#### 函数简介 + +本函数用于基于时间序列表示的匹配度。 + +**函数名:** RM + +**输入序列:** 仅支持两个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `tb`:时间分块数量。默认为10。 +- `vb`:值分块数量。默认为10。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度为`1`。序列的时间戳从0开始,序列仅有一个数据点,其时间戳为0,值为两个时间序列的匹配度。 + +**提示:** + +- `tb `,`vb`应为正整数。 + +#### 使用示例 + +##### 指定时间分块数量、值分块数量 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| 
++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Cluster-data-partitioning.md b/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Cluster-data-partitioning.md new file mode 100644 index 00000000..3d188f07 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Cluster-data-partitioning.md @@ -0,0 +1,110 @@ + + +# 负载均衡 +本文档介绍 IoTDB 中的分区策略和负载均衡策略。根据时序数据的特性,IoTDB 按序列和时间维度对其进行分区。结合序列分区与时间分区创建一个分区,作为划分的基本单元。为了提高吞吐量并降低管理成本,这些分区被均匀分配到分片(Region)中,分片是复制的基本单元。分片的副本决定了数据的存储位置,主副本负责主要负载的管理。在此过程中,副本放置算法决定哪些节点将持有分片副本,而主副本选择算法则指定哪个副本将成为主副本。 + +## 分区策略和分区分配 +IoTDB 为时间序列数据实现了量身定制的分区算法。在此基础上,缓存于配置节点和数据节点上的分区信息不仅易于管理,而且能够清晰区分冷热数据。随后,平衡的分区被均匀分配到集群的分片中,以实现存储均衡。 + +### 分区策略 +IoTDB 将生产环境中的每个传感器映射为一个时间序列。然后,使用序列分区算法对时间序列进行分区以管理其元数据,再结合时间分区算法来管理其数据。下图展示了 IoTDB 如何对时序数据进行分区。 + + + +#### 分区算法 +由于生产环境中通常部署大量设备和传感器,IoTDB 使用序列分区算法以确保分区信息的大小可控。由于生成的时间序列与时间戳相关联,IoTDB 使用时间分区算法来清晰区分冷热分区。 + +##### 序列分区算法 +默认情况下,IoTDB 将序列分区的数量限制为 1000,并将序列分区算法配置为哈希分区算法。这带来以下收益: ++ 由于序列分区的数量是固定常量,序列与序列分区之间的映射保持稳定。因此,IoTDB 不需要频繁进行数据迁移。 ++ 序列分区的负载相对均衡,因为序列分区的数量远小于生产环境中部署的传感器数量。 + +更进一步,如果能够更准确地估计生产环境中的实际负载情况,序列分区算法可以配置为自定义的哈希分区或列表分区,以在所有序列分区中实现更均匀的负载分布。 + +##### 时间分区算法 +时间分区算法通过下式将给定的时间戳转换为相应的时间分区 + +$$\left\lfloor\frac{\text{Timestamp}-\text{StartTimestamp}}{\text{TimePartitionInterval}}\right\rfloor\text{。}$$ + +在此式中,$\text{StartTimestamp}$ 和 $\text{TimePartitionInterval}$ 都是可配置参数,以适应不同的生产环境。$\text{StartTimestamp}$ 表示第一个时间分区的起始时间,而 $\text{TimePartitionInterval}$ 定义了每个时间分区的持续时间。默认情况下,$\text{TimePartitionInterval}$ 设置为七天。 + +#### 元数据分区 +由于序列分区算法对时间序列进行了均匀分区,每个序列分区对应一个元数据分区。这些元数据分区随后被均匀分配到 元数据分片 中,以实现元数据的均衡分布。 + +#### 数据分区 +结合序列分区与时间分区创建数据分区。由于序列分区算法对时间序列进行了均匀分区,特定时间分区内的数据分区负载保持均衡。这些数据分区随后被均匀分配到数据分片中,以实现数据的均衡分布。 + +### 分区分配 +IoTDB 使用分片来实现时间序列的弹性存储,集群中分片的数量由所有数据节点的总资源决定。由于分片的数量是动态的,IoTDB 可以轻松扩展。元数据分片和数据分片都遵循相同的分区分配算法,即均匀划分所有序列分区。下图展示了分区分配过程,其中动态扩展的分片匹配不断扩展的时间序列和集群。 + + + +#### 分片扩容 +分片的数量由下式给出 + +$$\text{RegionGroupNumber}=\left\lfloor\frac{\sum_{i=1}^{DataNodeNumber}\text{RegionNumber}_i}{\text{ReplicationFactor}}\right\rfloor\text{。}$$ + +在此式中,$\text{RegionNumber}_i$ 表示期望在第 $i$ 个数据节点上放置的副本数量,而 $\text{ReplicationFactor}$ 表示每个分片中的副本数量。$\text{RegionNumber}_i$ 和 $\text{ReplicationFactor}$ 都是可配置的参数。$\text{RegionNumber}_i$ 可以根据第 $i$ 个数据节点上的可用硬件资源(如 CPU 核心数量、内存大小等)确定,以适应不同的物理服务器。$\text{ReplicationFactor}$ 可以调整以确保不同级别的容错能力。 + +#### 分配策略 +元数据分片和数据分片都遵循相同的分配策略,即均匀划分所有序列分区。因此,每个元数据分片持有相同数量的元数据分区,以确保元数据存储均衡。同样,对于每个时间分区,每个数据分片 获取与其持有的序列分区对应的数据分区。因此,时间分区内的数据分区均匀分布在所有数据分片中,确保每个时间分区内的数据存储均衡。 + +值得注意的是,IoTDB 有效利用了时序数据的特性。当配置了 TTL(生存时间)时,IoTDB 可实现无需迁移的时序数据弹性存储,该功能在集群扩展时最小化了对在线操作的影响。上图展示了该功能的一个实例:新生成的数据分区被均匀分配到每个数据分片,过期数据会自动归档。因此,集群的存储最终将保持平衡。 + +## 均衡策略 +为了提高集群的可用性和性能,IoTDB 采用了精心设计的存储均衡和计算均衡算法。 + +### 存储均衡 +数据节点持有的副本数量反映了它的存储负载。如果数据节点之间的副本数量差异较大,拥有更多副本的数据节点可能成为存储瓶颈。尽管简单的轮询(Round Robin)放置算法可以通过确保每个数据节点持有等量副本来实现存储均衡,但它会降低集群的容错能力,如下所示: + + + ++ 假设集群有 4 个数据节点,4 个分片,并且副本因子为 2。 ++ 将分片 $r_1$ 的 2 个副本放置在数据节点 $n_1$ 和 $n_2$ 上。 ++ 将分片 $r_2$ 的 2 个副本放置在数据节点 $n_3$ 和 $n_4$ 上。 ++ 将分片 $r_3$ 的 2 个副本放置在数据节点 $n_1$ 和 $n_3$ 上。 ++ 将分片 $r_4$ 的 2 个副本放置在数据节点 $n_2$ 和 $n_4$ 上。 + +在这种情况下,如果数据节点 $n_2$ 发生故障,由它先前负责的负载将只能全部转移到数据节点 $n_1$,可能导致其过载。 + +为了解决这个问题,IoTDB 采用了一种副本放置算法,该算法不仅将副本均匀放置到所有数据节点上,还确保每个 数据节点在发生故障时,能够将其负载转移到足够多的其他数据节点。因此,集群实现了存储分布的均衡,并具备较高的容错能力,从而确保其可用性。 + +### 计算均衡 
+数据节点持有的主副本数量反映了它的计算负载。如果数据节点之间持有主副本数量差异较大,拥有更多主副本的数据节点可能成为计算瓶颈。如果主副本选择过程使用直观的贪心算法,当副本以容错算法放置时,可能会导致主副本分布不均,如下所示: + + + ++ 假设集群有 4 个数据节点,4 个分片,并且副本因子为 2。 ++ 选择分片 $r_5$ 在数据节点 $n_5$ 上的副本作为主副本。 ++ 选择分片 $r_6$ 在数据节点 $n_7$ 上的副本作为主副本。 ++ 选择分片 $r_7$ 在数据节点 $n_7$ 上的副本作为主副本。 ++ 选择分片 $r_8$ 在数据节点 $n_8$ 上的副本作为主副本。 + +请注意,以上步骤严格遵循贪心算法。然而,到第 3 步时,无论在数据节点 $n_5$ 或 $n_7$ 上选择分片 $r_7$ 的主副本,都会导致主副本分布不均衡。根本原因在于每一步贪心选择都缺乏全局视角,最终导致局部最优解。 + +为了解决这个问题,IoTDB 采用了一种主副本选择算法,能够持续平衡集群中的主副本分布。因此,集群实现了计算负载的均衡分布,确保了其性能。 + +## Source Code ++ [数据分区](https://github.com/apache/iotdb/tree/master/iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/partition) ++ [分区分配](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/partition) ++ [副本放置](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/副本) ++ [主副本选择](https://github.com/apache/iotdb/tree/master/iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/manager/load/balancer/router/主副本) \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Encoding-and-Compression.md b/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Encoding-and-Compression.md new file mode 100644 index 00000000..0fd4bbdd --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Encoding-and-Compression.md @@ -0,0 +1,124 @@ + + +# 编码和压缩 + +## 编码方式 + +### 基本编码方式 + +为了提高数据的存储效率,需要在数据写入的过程中对数据进行编码,从而减少磁盘空间的使用量。在写数据以及读数据的过程中都能够减少 I/O 操作的数据量从而提高性能。IoTDB 支持多种针对不同类型的数据的编码方法: + +1. PLAIN 编码(PLAIN) + + PLAIN 编码,默认的编码方式,即不编码,支持多种数据类型,压缩和解压缩的时间效率较高,但空间存储效率较低。 + +2. 二阶差分编码(TS_2DIFF) + + 二阶差分编码,比较适合编码单调递增或者递减的序列数据,不适合编码波动较大的数据。 + +3. 游程编码(RLE) + + 游程编码,比较适合存储某些数值连续出现的序列,不适合编码大部分情况下前后值不一样的序列数据。 + + 游程编码也可用于对浮点数进行编码,但在创建时间序列的时候需指定保留小数位数(MAX_POINT_NUMBER,具体指定方式参见本文 [SQL 参考文档](../SQL-Manual/SQL-Manual.md))。比较适合存储某些浮点数值连续出现的序列数据,不适合存储对小数点后精度要求较高以及前后波动较大的序列数据。 + + > 游程编码(RLE)和二阶差分编码(TS_2DIFF)对 float 和 double 的编码是有精度限制的,默认保留 2 位小数。推荐使用 GORILLA。 + +4. GORILLA 编码(GORILLA) + + GORILLA 编码是一种无损编码,它比较适合编码前后值比较接近的数值序列,不适合编码前后波动较大的数据。 + + 当前系统中存在两个版本的 GORILLA 编码实现,推荐使用`GORILLA`,不推荐使用`GORILLA_V1`(已过时)。 + + 使用限制:使用 Gorilla 编码 INT32 数据时,需要保证序列中不存在值为`Integer.MIN_VALUE`的数据点;使用 Gorilla 编码 INT64 数据时,需要保证序列中不存在值为`Long.MIN_VALUE`的数据点。 + +5. 字典编码 (DICTIONARY) + + 字典编码是一种无损编码。它适合编码基数小的数据(即数据去重后唯一值数量小)。不推荐用于基数大的数据。 + +6. ZIGZAG 编码 + + ZigZag编码将有符号整型映射到无符号整型,适合比较小的整数。 + +7. CHIMP 编码 + + CHIMP 是一种无损编码。它是一种新的流式浮点数据压缩算法,可以节省存储空间。这个编码适用于前后值比较接近的数值序列,对波动小和随机噪声少的序列数据更加友好。 + + 使用限制:如果对 INT32 类型数据使用 CHIMP 编码,需要确保数据点中没有 `Integer.MIN_VALUE`。 如果对 INT64 类型数据使用 CHIMP 编码,需要确保数据点中没有 `Long.MIN_VALUE`。 + +8. SPRINTZ 编码 + + SPRINTZ编码是一种无损编码,将原始时序数据分别进行预测、Zigzag编码、位填充和游程编码。SPRINTZ编码适合差分值的绝对值较小(即波动较小)的时序数据,不适合差分值较大(即波动较大)的时序数据。 + +9. 
RLBE 编码 + + RLBE编码是一种无损编码,将差分编码,位填充编码,游程长度,斐波那契编码和拼接等编码思想结合到一起。RLBE编码适合递增且递增值较小的时序数据,不适合波动较大的时序数据。 + +### 数据类型与编码的对应关系 + +前文介绍的五种编码适用于不同的数据类型,若对应关系错误,则无法正确创建时间序列。数据类型与支持其编码的编码方式对应关系总结如下表所示。 + +| **数据类型** | **最佳的编码(默认)** | **支持的编码** | +| ------------ | ---------------------- | ----------------------------------------------------------- | +| BOOLEAN | RLE | PLAIN, RLE | +| INT32 | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | +| DATE | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | +| INT64 | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | +| TIMESTAMP | TS_2DIFF | PLAIN, RLE, TS_2DIFF, GORILLA, ZIGZAG, CHIMP, SPRINTZ, RLBE | +| FLOAT | GORILLA | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE | +| DOUBLE | GORILLA | PLAIN, RLE, TS_2DIFF, GORILLA, CHIMP, SPRINTZ, RLBE | +| TEXT | PLAIN | PLAIN, DICTIONARY | +| STRING | PLAIN | PLAIN, DICTIONARY | +| BLOB | PLAIN | PLAIN | + +当用户输入的数据类型与编码方式不对应时,系统会提示错误。如下所示,二阶差分编码不支持布尔类型: + +``` +IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF +Msg: 507: encoding TS_2DIFF does not support BOOLEAN +``` + +## 压缩方式 + +当时间序列写入并按照指定的类型编码为二进制数据后,IoTDB 会使用压缩技术对该数据进行压缩,进一步提升空间存储效率。虽然编码和压缩都旨在提升存储效率,但编码技术通常只适合特定的数据类型(如二阶差分编码只适合与 INT32 或者 INT64 编码,存储浮点数需要先将他们乘以 10m 以转换为整数),然后将它们转换为二进制流。压缩方式(SNAPPY)针对二进制流进行压缩,因此压缩方式的使用不再受数据类型的限制。 + +### 基本压缩方式 + +IoTDB 允许在创建一个时间序列的时候指定该列的压缩方式。现阶段 IoTDB 支持以下几种压缩方式: + +* UNCOMPRESSED(不压缩) +* SNAPPY 压缩 +* LZ4 压缩(最佳压缩方式) +* GZIP 压缩 +* ZSTD 压缩 +* LZMA2 压缩 + +压缩方式的指定语法详见本文[SQL 参考文档](../SQL-Manual/SQL-Manual.md)。 + +### 压缩比统计信息 + +压缩比统计信息文件:data/datanode/system/compression_ratio + +* ratio_sum: memtable压缩比的总和 +* memtable_flush_time: memtable刷盘的总次数 + +通过 `ratio_sum / memtable_flush_time` 可以计算出平均压缩比 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Publication.md b/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Publication.md new file mode 100644 index 00000000..bd7f431d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Technical-Insider/Publication.md @@ -0,0 +1,41 @@ + + +# 学术成果 + +Apache IoTDB 始于清华大学软件学院。IoTDB 是一个用于管理大量时间序列数据的数据库,它采用了列式存储、数据编码、预计算和索引技术,具有类 SQL 的接口,可支持每秒每节点写入数百万数据点,可以秒级获得超过数万亿个数据点的查询结果。它还可以很容易地与 Apache Hadoop、MapReduce 和 Apache Spark 集成以进行分析。 + +相关研究论文如下: +* [Grouping Time Series for Efficient Columnar Storage](https://sxsong.github.io/doc/23sigmod-group.pdf), Chenguang Fang, Shaoxu Song, Haoquan Guan, Xiangdong Huang, Chen Wang, Jianmin Wang. SIGMOD 2023. +* [Learning Autoregressive Model in LSM-Tree based Store](https://sxsong.github.io/doc/23kdd.pdf), Yunxiang Su, Wenxuan Ma, Shaoxu Song. SIGMOD 2023. +* [TsQuality: Measuring Time Series Data Quality in Apache IoTDB](https://sxsong.github.io/doc/23vldb-qaulity.pdf), Yuanhui Qiu, Chenguang Fang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang. VLDB 2023. +* [Frequency Domain Data Encoding in Apache IoTDB](https://sxsong.github.io/doc/22vldb-frequency.pdf), Haoyu Wang, Shaoxu Song. VLDB 2023. +* [Non-Blocking Raft for High Throughput IoT Data](https://sxsong.github.io/doc/23icde-raft.pdf), Tian Jiang, Xiangdong Huang, Shaoxu Song, Chen Wang, Jianmin Wang, Ruibo Li, Jincheng Sun. ICDE 2023. +* [Backward-Sort for Time Series in Apache IoTDB](https://sxsong.github.io/doc/23icde-sort.pdf), Xiaojian Zhang, Hongyin Zhang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang. ICDE 2023. 
+* [Time Series Data Encoding for Efficient Storage: A Comparative Analysis in Apache IoTDB](https://sxsong.github.io/doc/22vldb-encoding.pdf), Jinzhao Xiao, Yuxiang Huang, Changyu Hu, Shaoxu Song, Xiangdong Huang, Jianmin Wang. VLDB 2022. +* [Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree](https://sxsong.github.io/doc/22icde-separation.pdf), Yuyuan Kang, Xiangdong Huang, Shaoxu Song, Lingzhe Zhang, Jialin Qiao, Chen Wang, Jianmin Wang, Julian Feinauer. ICDE 2022. +* [Dual-PISA: An index for aggregation operations on time series data](https://www.sciencedirect.com/science/article/pii/S0306437918305489), Jialin Qiao, Xiangdong Huang, Jianmin Wang, Raymond K Wong. IS 2020. +* [Apache IoTDB: time-series database for internet of things](http://www.vldb.org/pvldb/vol13/p2901-wang.pdf), Chen Wang, Xiangdong Huang, Jialin Qiao, Tian Jiang, Lei Rui, Jinrui Zhang, Rong Kang, Julian Feinauer, Kevin A. McGrail, Peng Wang, Jun Yuan, Jianmin Wang, Jiaguang Sun. VLDB 2020. +* [KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping](https://www.semanticscholar.org/paper/KV-match%3A-A-Subsequence-Matching-Approach-and-Time-Wu-Wang/9ed84cb15b7e5052028fc5b4d667248713ac8592), Jiaye Wu and Peng Wang and Chen Wang and Wei Wang and Jianmin Wang. ICDE 2019. +* [The Design of Apache IoTDB distributed framework](http://ndbc2019.sdu.edu.cn/info/1002/1044.htm), Tianan Li, Jianmin Wang, Xiangdong Huang, Yi Xu, Dongfang Mao, Jun Yuan. NDBC 2019. +* [Matching Consecutive Subpatterns over Streaming Time Series](https://link.springer.com/chapter/10.1007/978-3-319-96893-3_8), Rong Kang and Chen Wang and Peng Wang and Yuting Ding and Jianmin Wang. APWeb/WAIM 2018. +* [PISA: An Index for Aggregating Big Time Series Data](https://dl.acm.org/citation.cfm?id=2983775&dl=ACM&coll=DL), Xiangdong Huang and Jianmin Wang and Raymond K. Wong and Jinrui Zhang and Chen Wang. CIKM 2016. + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Benchmark.md b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Benchmark.md new file mode 100644 index 00000000..9c89db2a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Benchmark.md @@ -0,0 +1,352 @@ + + +# 测试工具 + +## 概述 + +IoT-benchmark 是基于 Java 和大数据环境开发的时序数据库基准测试工具,由清华大学软件学院研发并开源。它使用方便,支持多种写入以及查询方式,支持存储测试信息和结果以供进一步查询或分析,支持与 Tableau 集成以可视化测试结果。 + +下图1-1囊括了测试基准流程及其他扩展功能。这些流程可以由IoT-benchmark 统一来完成。IoT Benchmark 支持多种工作负载,包括**纯写入、纯查询、写入查询混合**等,支持**软硬件系统监控、测试指标度量**等监控功能,还实现了**初始化数据库自动化、测试数据分析及系统参数优化**等功能。 + +![](https://alioss.timecho.com/docs/img/bm1.PNG) + +图1-1 + +借鉴 YCSB 测试工具将工作负载生成、性能指标测量和数据库接口三个组件分离的设计思想,IoT-benchmark 的模块化设计如图1-2所示。与基于 YCSB 的测试工具系统不同的是,IoT-benchmark 增加了系统监控模块,支持测试数据和系统指标监控数据的持久化。此外也增加了一些特别针对时序数据场景的特殊负载测试功能,如支持物联网场景的批量写入和多种乱序数据写入模式。 + +![](https://alioss.timecho.com/docs/img/bm2.PNG) + +图1-2 + + + +目前 IoT-benchmark 支持如下时间序列数据库、版本和连接方式: + +| 数据库 | 版本 | 连接方式 | +| --------------- | ------- | -------------------------------------------------------- | +| InfluxDB | v1.x
v2.0 | SDK |
+| TimescaleDB | -- | jdbc |
+| OpenTSDB | -- | Http Request |
+| QuestDB | v6.0.7 | jdbc |
+| TDengine | v2.2.0.2 | jdbc |
+| VictoriaMetrics | v1.64.0 | Http Request |
+| KairosDB | -- | Http Request |
+| IoTDB | v1.x
v0.13 | jdbc、sessionByTablet、sessionByRecord、sessionByRecords | + +表1-1大数据测试基准对比 + +### 软件安装与环境搭建 + +#### IoT Benchmark 运行的前置条件 + +1. Java 8 +2. Maven 3.6+ +3. 对应的合适版本的数据库,如 Apache IoTDB 1.0 + + + +#### IoT Benchmark 的获取方式 + +- **获取二进制包**:进入https://github.com/thulab/iot-benchmark/releases 下载需要的安装包。下载下来为一个压缩文件,选择文件夹解压即可使用。 +- 源代码编译(可用户 Apache IoTDB 1.0 的测试): + - 第一步(编译 IoTDB Session 最新包):进入官网 https://github.com/apache/iotdb/tree/rel/1.0 下载 IoTDB 源码,在根目录下运行命令 mvn clean package install -pl session -am -DskipTests 编译 IoTDB Session 的最新包。 + - 第二步(编译 IoTDB Benchmark 测试包):进入官网 https://github.com/thulab/iot-benchmark 下载源码,在根目录下运行 mvn clean package install -pl iotdb-1.0 -am -DskipTests 编译测试 Apache IoTDB 1.0版本的测试包,测试包位置与根目录的相对路径为 ./iotdb-1.0/target/iotdb-1.0-0.0.1/iotdb-1.0-0.0.1 + + + +#### IoT Benchmark 的测试包结构 + +测试包的目录结构如下图1-3所示。其中测试配置文件为conf/config.properties,测试启动脚本为benchmark\.sh (Linux & MacOS) 和 benchmark.bat (Windows),详细文件用途见表1-2所示。 + +![](https://alioss.timecho.com/docs/img/bm3.png) + +图1-3文件和文件夹列表 + +| 名称 | 子文件 | 用途 | +| ---------------- | ----------------- | ------------------------- | +| benchmark.bat | - | Windows环境运行启动脚本 | +| benchmark\.sh | - | Linux/Mac环境运行启动脚本 | +| conf | config.properties | 测试场景配置文件 | +| logback.xml | 日志输出配置文件 | | +| lib | - | 依赖库文件 | +| LICENSE | - | 许可文件 | +| bin | startup\.sh | 初始化脚本文件夹 | +| ser-benchmark\.sh | - | 监控模式启动脚本 | + +表1-2文件和文件夹列表用途 + +#### IoT Benchmark 执行测试 + +1. 按照测试需求修改配置文件,主要参数介绍见 1.2 节,对应配置文件为conf/config.properties,**比如测试Apache** **IoTDB 1.0,则需要修改 DB_SWITCH=IoTDB-100-SESSION_BY_TABLET** +2. 启动被测时间序列数据库 +3. 通过运行 +4. 启动IoT-benchmark执行测试。执行中观测被测时间序列数据库和IoT-benchmark状态,执行完毕后查看结果和分析测试过程。 + +#### IoT Benchmark 结果说明 + +测试的所有日志文件被存放于 logs 文件夹下,测试的结果在测试完成后被存放到 data/csvOutput 文件夹下,例如测试后我们得到了如下的结果矩阵: + +![](https://alioss.timecho.com/docs/img/bm4.png) + +- Result Matrix + - OkOperation:成功的对应操作次数 + - OkPoint:对于写入操作,是成功写入的点数;对于查询操作,是成功查询到的点数。 + - FailOperation:失败的对应操作次数 + - FailPoint:对于写入操作是写入失败的点数 +- Latency(mx) Matrix + - AVG:操作平均耗时 + - MIN:操作最小耗时 + - Pn:操作整体分布的对应分位值,比如P25是下四分位数 + + + +### 主要参数介绍 + +本节重点解释说明了主要参数的用途和配置方法。 + +#### 工作模式和操作比例 + +- 工作模式参数“BENCHMARK_WORK_MODE”可选项为“默认模式”和“服务器监控”;其中“服务器监控”模式可直接通过执行ser-benchmark.sh脚本启动,脚本会自动修改该参数。“默认模式”为常用测试模式,结合配置OPERATION_PROPORTION参数达到“纯写入”、“纯查询”和“读写混合”的测试操作比例定义 + +- 当运行ServerMode来执行被测时序数据库运行环境监控时IoT-benchmark依赖sysstat软件相关命令;如果需要持久化测试过程数据时选择MySQL或IoTDB,则需要安装该类数据库;ServerMode和CSV的记录模式只能在Linux系统中使用,记录测试过程中的相关系统信息。因此我们建议使用MacOs或Linux系统,本文以Linux(Centos7)系统为例,如果使用Windows系统,可以使用conf文件夹下的benchmark.bat脚本启动IoT-benchmark。 + + 表1-3测试模式 + +| 模式名称 | BENCHMARK_WORK_MODE | 模式内容 | +| ---------------------- | ------------------- | ------------------------------------------------------------ | +| 常规测试模式 | testWithDefaultPath | 支持多种读和写操作的混合负载 | +| 服务器资源使用监控模式 | serverMODE | 服务器资源使用监控模式(该模式下运行通过ser-benchmark.sh脚本启动,无需手动配置该参数 | + +#### 服务器连接信息 + +工作模式指定后,被测时序数据库的信息要如何告知IoT-benchmark呢?当前通过“DB_SWITCH”告知被测时序数据库类型;通过“HOST”告知被测时序数据库网络地址;通过“PORT”告知被测时序数据库网络端口;通过“USERNAME”告知被测时序数据库登录用户名;通过“PASSWORD”告知被测时序数据库登录用户的密码;通过“DB_NAME”告知被测时序数据库名称;通过“TOKEN”告知被测时序数据库连接认证Token(InfluxDB 2.0使用); + +#### 写入场景构建参数 + +表1-4写入场景构建参数 + +| 参数名称 | 类型 | 示例 | 系统描述 | +| -------------------------- | ------ | ------------------------- | ------------------------------------------------------------ | +| CLIENT_NUMBER | 整数 | 100 | 客户端总数 | +| GROUP_NUMBER | 整数 | 20 | 数据库的数量;仅针对IoTDB。 | +| DEVICE_NUMBER | 整数 | 100 | 设备总数 | +| SENSOR_NUMBER | 整数 | 300 | 每个设备的传感器总数 | +| INSERT_DATATYPE_PROPORTION | 字符串 | 1:1:1:1:1:1 | 设备的数据类型比例,BOOLEAN:INT32:INT64:FLOAT:DOUBLE:TEXT | +| 
POINT_STEP | 整数 | 1000 | 数据间时间戳间隔,即生成的数据两个时间戳之间的固定长度。 | +| OP_MIN_INTERVAL | 整数 | 0 | 操作最小执行间隔:若操作耗时大于该值则立即执行下一个,否则等待 (OP_MIN_INTERVAL-实际执行时间) ms;如果为0,则参数不生效;如果为-1,则其值和POINT_STEP一致 | +| IS_OUT_OF_ORDER | 布尔 | false | 是否乱序写入 | +| OUT_OF_ORDER_RATIO | 浮点数 | 0.3 | 乱序写入的数据比例 | +| BATCH_SIZE_PER_WRITE | 整数 | 1 | 批写入数据行数(一次写入多少行数据) | +| START_TIME | 时间 | 2022-10-30T00:00:00+08:00 | 写入数据的开始时间戳;以该时间戳为起点开始模拟创建数据时间戳。 | +| LOOP | 整数 | 86400 | 总操作次数:具体每种类型操作会按OPERATION_PROPORTION定义的比例划分 | +| OPERATION_PROPORTION | 字符 | 1:0:0:0:0:0:0:0:0:0:0 | # 各操作的比例,按照顺序为 写入:Q1:Q2:Q3:Q4:Q5:Q6:Q7:Q8:Q9:Q10, 请注意使用英文冒号。比例中的每一项是整数。 | + +按照表1-4配置参数启动可描述测试场景为:向被测时序数据库压力写入30000个(100个设备,每个设备300个传感器)时间序列2022年10月30日一天的顺序数据,总计25.92亿个数据点。其中每个设备的300个传感器数据类型分别为50个布尔、50个整数、50个长整数、50个浮点、50个双精度、50个字符。如果我们将表格中IS_OUT_OF_ORDER的值改为true,那么他表示的场景为:向被测时序数据库压力写入30000个时间序列2022年10月30日一天的数据,其中存在30%的乱序数据(到达时序数据库时间晚于其他生成时间晚于自身的数据点)。 + +#### 查询场景构建参数 + +表1-5查询场景构建参数 + +| 参数名称 | 类型 | 示例 | 系统描述 | +| -------------------- | ---- | --------------------- | ------------------------------------------------------------ | +| QUERY_DEVICE_NUM | 整数 | 2 | 每条查询语句中查询涉及到的设备数量 | +| QUERY_SENSOR_NUM | 整数 | 2 | 每条查询语句中查询涉及到的传感器数量 | +| QUERY_AGGREGATE_FUN | 字符 | count | 在聚集查询中使用的聚集函数,比如count、avg、sum、max_time等 | +| STEP_SIZE | 整数 | 1 | 时间过滤条件的时间起点变化步长,若设为0则每个查询的时间过滤条件是一样的,单位:POINT_STEP | +| QUERY_INTERVAL | 整数 | 250000 | 起止时间的查询中开始时间与结束时间之间的时间间隔,和Group By中的时间间隔 | +| QUERY_LOWER_VALUE | 整数 | -5 | 条件查询子句时的参数,where xxx > QUERY_LOWER_VALUE | +| GROUP_BY_TIME_UNIT | 整数 | 20000 | Group by语句中的组的大小 | +| LOOP | 整数 | 10 | 总操作次数:具体每种类型操作会按OPERATION_PROPORTION定义的比例划分 | +| OPERATION_PROPORTION | 字符 | 0:0:0:0:0:0:0:0:0:0:1 | 写入:Q1:Q2:Q3:Q4:Q5:Q6:Q7:Q8:Q9:Q10 | + +表1-6查询类型及示例 SQL + +| 编号 | 查询类型 | IoTDB 示例 SQL | +| ---- | ---------------------------- | ------------------------------------------------------------ | +| Q1 | 精确点查询 | select v1 from root.db.d1 where time = ? | +| Q2 | 时间范围查询 | select v1 from root.db.d1 where time > ? and time < ? | +| Q3 | 带值过滤的时间范围查询 | select v1 from root.db.d1 where time > ? and time < ? and v1 > ? | +| Q4 | 时间范围聚合查询 | select count(v1) from root.db.d1 where and time > ? and time < ? | +| Q5 | 带值过滤的全时间范围聚合查询 | select count(v1) from root.db.d1 where v1 > ? | +| Q6 | 带值过滤的时间范围聚合查询 | select count(v1) from root.db.d1 where v1 > ? and time > ? and time < ? | +| Q7 | 时间分组聚合查询 | select count(v1) from root.db.d1 group by ([?, ?), ?, ?) | +| Q8 | 最新点查询 | select last v1 from root.db.d1 | +| Q9 | 倒序范围查询 | select v1 from root.sg.d1 where time > ? and time < ? order by time desc | +| Q10 | 倒序带值过滤的范围查询 | select v1 from root.sg.d1 where time > ? and time < ? and v1 > ? 
order by time desc | + + + + + +按照表1-5配置参数启动可描述测试场景为:从被测时序数据库执行10次2个设备2个传感器的倒序带值过滤的范围查询,SQL语句为:select s_0,s_31from data where time >2022-10-30T00:00:00+08:00 and time < 2022-10-30T00:04:10+08:00 and s_0 > -5 and device in d_21,d_46 order by time desc。 + +#### 测试过程和测试结果持久化 + +IoT-benchmark目前支持通过配置参数“TEST_DATA_PERSISTENCE”将测试过程和测试结果持久化到IoTDB、MySQL和CSV;其中写入到MySQL和CSV可以定义分库分表的行数上限,例如“RECORD_SPLIT=true、RECORD_SPLIT_MAX_LINE=10000000”表示每个数据库表或CSV文件按照总行数为1千万切分存放;如果记录到MySQL或IoTDB需要提供数据库链接信息,分别包括“TEST_DATA_STORE_IP”数据库的IP地址、“TEST_DATA_STORE_PORT”数据库的端口号、“TEST_DATA_STORE_DB”数据库的名称、“TEST_DATA_STORE_USER”数据库用户名、“TEST_DATA_STORE_PW”数据库用户密码。 + +如果我们设置“TEST_DATA_PERSISTENCE=CSV”,测试执行时和执行完毕后我们可以在IoT-benchmark根目录下看到新生成的data文件夹,其下包含csv文件夹记录测试过程;csvOutput文件夹记录测试结果。如果我们设置“TEST_DATA_PERSISTENCE=MySQL”,它会在测试开始前在指定的MySQL数据库中创建命名如“testWithDefaultPath_被测数据库名称_备注_测试启动时间”的数据表记录测试过程;会在名为“CONFIG”的数据表(如果不存在则创建该表),写入本次测试的配置信息;当测试完成时会在名为“FINAL_RESULT”的数据表(如果不存在则创建该表)中写入本次测试结果。 + +## 实际案例 + +我们以中车青岛四方车辆研究所有限公司应用为例,参考《ApacheIoTDB在智能运维平台存储中的应用》中描述的场景进行实际操作说明。 + +测试目标:模拟中车青岛四方所场景因切换时间序列数据库实际需求,对比预期使用的IoTDB和原有系统使用的KairosDB性能。 + +测试环境:为了保证在实验过程中消除其他无关服务与进程对数据库性能的影响,以及不同数据库之间的相互影响,本实验中的本地数据库均部署并运行在资源配置相同的多个独立的虚拟机上。因此,本实验搭建了 4 台 Linux( CentOS7 /x86) 虚拟机,并分别在上面部署了IoT-benchmark、 IoTDB数据库、KairosDB数据库、MySQL数据库。每一台虚拟机的具体资源配置如表2-1所示。每一台虚拟机的具体用途如表2-2所示。 + +表2-1虚拟机配置信息 + +| 硬件配置信息 | 系统描述 | +| ------------ | -------- | +| OS System | CentOS7 | +| CPU核数 | 16 | +| 内存 | 32G | +| 硬盘 | 200G | +| 网卡 | 千兆 | + + + +表2-2虚拟机用途 + +| IP | 用途 | +| ---------- | ------------- | +| 172.21.4.2 | IoT-benchmark | +| 172.21.4.3 | Apache-iotdb | +| 172.21.4.4 | KaiosDB | +| 172.21.4.5 | MySQL | + +### 写入测试 + +场景描述:创建100个客户端来模拟100列车、每列车3000个传感器、数据类型为DOUBLE类型、数据时间间隔为500ms(2Hz)、顺序发送。参考以上需求我们需要修改IoT-benchmark配置参数如表2-3中所列。 + +表2-3配置参数信息 + +| 参数名称 | IoTDB值 | KairosDB值 | +| -------------------------- | --------------------------- | ---------- | +| DB_SWITCH | IoTDB-013-SESSION_BY_TABLET | KairosDB | +| HOST | 172.21.4.3 | 172.21.4.4 | +| PORT | 6667 | 8080 | +| BENCHMARK_WORK_MODE | testWithDefaultPath | | +| OPERATION_PROPORTION | 1:0:0:0:0:0:0:0:0:0:0 | | +| CLIENT_NUMBER | 100 | | +| GROUP_NUMBER | 10 | | +| DEVICE_NUMBER | 100 | | +| SENSOR_NUMBER | 3000 | | +| INSERT_DATATYPE_PROPORTION | 0:0:0:0:1:0 | | +| POINT_STEP | 500 | | +| OP_MIN_INTERVAL | 0 | | +| IS_OUT_OF_ORDER | false | | +| BATCH_SIZE_PER_WRITE | 1 | | +| LOOP | 10000 | | +| TEST_DATA_PERSISTENCE | MySQL | | +| TEST_DATA_STORE_IP | 172.21.4.5 | | +| TEST_DATA_STORE_PORT | 3306 | | +| TEST_DATA_STORE_DB | demo | | +| TEST_DATA_STORE_USER | root | | +| TEST_DATA_STORE_PW | admin | | +| REMARK | demo | | + +首先在172.21.4.3和172.21.4.4上分别启动被测时间序列数据库Apache-IoTDB和KairosDB,之后在172.21.4.2、172.21.4.3和172.21.4.4上通过ser-benchamrk.sh脚本启动服务器资源监控(图2-1)。然后按照表2-3在172.21.4.2分别修改iotdb-0.13-0.0.1和kairosdb-0.0.1文件夹内的conf/config.properties文件满足测试需求。先后使用benchmark.sh启动对Apache-IoTDB和KairosDB的写入测试。 + +![img](https://alioss.timecho.com/docs/img/bm5.png) + +图2-1服务器监控任务 + +​ 例如我们首先启动对KairosDB的测试,IoT-benchmark会在MySQL数据库中创建CONFIG数据表存放本次测试配置信息(图2-2),测试执行中会有日志输出当前测试进度(图2-3)。测试完成时会输出本次测试结果(图2-3),同时将结果写入FINAL_RESULT数据表中(图2-4)。 + +![](https://alioss.timecho.com/docs/img/bm6.png) + +图2-2测试配置信息表 + +![](https://alioss.timecho.com/docs/img/bm7.png) +![](https://alioss.timecho.com/docs/img/bm8.png) +![](https://alioss.timecho.com/docs/img/bm9.png) +![](https://alioss.timecho.com/docs/img/bm10.png) + +图2-3测试进度和结果 + +![](https://alioss.timecho.com/docs/img/bm11.png) + +图2-4测试结果表 + 
+之后我们再启动对Apache-IoTDB的测试,同样的IoT-benchmark会在MySQL数据库CONFIG数据表中写入本次测试配置信息,测试执行中会有日志输出当前测试进度。测试完成时会输出本次测试结果,同时将结果写入FINAL_RESULT数据表中。 + +依照测试结果信息我们知道同样的配置写入Apache-IoTDB和KairosDB写入延时时间分别为:55.98ms和1324.45ms;写入吞吐分别为:5,125,600.86点/秒和224,819.01点/秒;测试分别执行了585.30秒和11777.99秒。并且KairosDB有写入失败出现,排查后发现是数据磁盘使用率已达到100%,无磁盘空间继续接收数据。而Apache-IoTDB无写入失败现象,全部数据写入完毕后占用磁盘空间仅为4.7G(如图2-5所示);从写入吞吐和磁盘占用情况上看Apache-IoTDB均优于KairosDB。当然后续还有其他测试来从多方面观察和对比,比如查询性能、文件压缩比、数据安全性等。 + +![](https://alioss.timecho.com/docs/img/bm12.png) + +图2-5磁盘使用情况 + +那么测试过程中各个服务器资源使用情况如何呢?每个写操作具体的表现如何呢?这个时候我们就可以通过安装和使用Tableau来可视化服务器监控表和测试过程记录表内的数据了。Tableau的使用本文不展开介绍,通过它连接测试数据持久化的数据表后具体结果下如图(以Apache-IoTDB为例): + +![](https://alioss.timecho.com/docs/img/bm13.png) +![](https://alioss.timecho.com/docs/img/bm14.png) + +图2-6Tableau可视化测试过程 + + + +### 查询测试 + +场景描述:在写入测试场景下模拟10个客户端对时序数据库Apache-IoTDB内存放的数据进行全类型查询任务。配置如下: + +表2-4配置参数信息 + +| 参数名称 | 示例 | +| -------------------- | --------------------- | +| CLIENT_NUMBER | 10 | +| QUERY_DEVICE_NUM | 2 | +| QUERY_SENSOR_NUM | 2 | +| QUERY_AGGREGATE_FUN | count | +| STEP_SIZE | 1 | +| QUERY_INTERVAL | 250000 | +| QUERY_LOWER_VALUE | -5 | +| GROUP_BY_TIME_UNIT | 20000 | +| LOOP | 30 | +| OPERATION_PROPORTION | 0:1:1:1:1:1:1:1:1:1:1 | + +执行结果: + +![](https://alioss.timecho.com/docs/img/bm15.png) + +图2-7查询测试结果 + +### 其他参数说明 + +之前章节中针对Apache-IoTDB和KairosDB进行写入性能对比,但是用户如果要执行模拟真实写入速率测试该如何配置?测试时间过长该如何控制呢?生成的模拟数据有哪些规律吗?如果IoT-Benchmark服务器配置较低,可以使用多台机器模拟压力输出吗? + +表2-5配置参数信息 + +| 场景 | 参数 | 值 | 说明 | +| ------------------------------------------------------------ | -------------------------- | ------------------------------------------------------------ | --------------------------------- | +| 模拟真实写入速率 | OP_INTERVAL | -1 | 也可输入整数控制操作间隔 | +| 指定测试时长(1小时) | TEST_MAX_TIME | 3600000 | 单位 ms;需要LOOP执行时间大于该值 | +| 定义模拟数据规律:支持全部数据类型,数量平均分类;支持五种数据分布,数量平均分布;字符串长度为10;小数位数为2 | INSERT_DATATYPE_PROPORTION | 1:1:1:1:1:1 | 数据类型分布比率 | +| LINE_RATIO | 1 | 线性 | | +| SIN_RATIO | 1 | 傅里叶函数 | | +| SQUARE_RATIO | 1 | 方波 | | +| RANDOM_RATIO | 1 | 随机数 | | +| CONSTANT_RATIO | 1 | 常数 | | +| STRING_LENGTH | 10 | 字符串长度 | | +| DOUBLE_LENGTH | 2 | 小数位数 | | +| 三台机器模拟300台设备数据写入 | BENCHMARK_CLUSTER | true | 开启多benchmark模式 | +| BENCHMARK_INDEX | 0、1、3 | 以[写入测试](./Benchmark.md#写入测试)写入参数为例:0号负责设备编号0-99数据写入;1号负责设备编号100-199数据写入;2号负责设备编号200-299数据写入; | | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/CLI.md b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/CLI.md new file mode 100644 index 00000000..80728257 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/CLI.md @@ -0,0 +1,276 @@ + + +# SQL 命令行终端 (CLI) + +IOTDB 为用户提供 cli/Shell 工具用于启动客户端和服务端程序。下面介绍每个 cli/Shell 工具的运行方式和相关参数。 +> \$IOTDB\_HOME 表示 IoTDB 的安装目录所在路径。 + +## 安装 +如果使用源码版,可以在 iotdb 的根目录下执行 + +```shell +> mvn clean package -pl cli -am -DskipTests +``` + +在生成完毕之后,IoTDB 的 Cli 工具位于文件夹"cli/target/iotdb-cli-{project.version}"中。 + +如果你下载的是二进制版,那么 Cli 可以在 sbin 文件夹下直接找到。 + +## 运行 + +### Cli 运行方式 +安装后的 IoTDB 中有一个默认用户:`root`,默认密码为`root`。用户可以使用该用户尝试运行 IoTDB 客户端以测试服务器是否正常启动。客户端启动脚本为$IOTDB_HOME/sbin 文件夹下的`start-cli`脚本。启动脚本时需要指定运行 IP 和 RPC PORT。以下为服务器在本机启动,且用户未更改运行端口号的示例,默认端口为 6667。若用户尝试连接远程服务器或更改了服务器运行的端口号,请在-h 和-p 项处使用服务器的 IP 和 RPC PORT。
+用户也可以在启动脚本的最前方设置自己的环境变量,如 JAVA_HOME 等 (对于 linux 用户,脚本路径为:"/sbin/start-cli.sh"; 对于 windows 用户,脚本路径为:"/sbin/start-cli.bat") + +Linux 系统与 MacOS 系统启动命令如下: + +```shell +Shell > bash sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root +``` +Windows 系统启动命令如下: + +```shell +Shell > sbin\start-cli.bat -h 127.0.0.1 -p 6667 -u root -pw root +``` +回车后即可成功启动客户端。启动后出现如图提示即为启动成功。 + +``` + _____ _________ ______ ______ +|_ _| | _ _ ||_ _ `.|_ _ \ + | | .--.|_/ | | \_| | | `. \ | |_) | + | | / .'`\ \ | | | | | | | __'. + _| |_| \__. | _| |_ _| |_.' /_| |__) | +|_____|'.__.' |_____| |______.'|_______/ version + +Successfully login at 127.0.0.1:6667 +``` +输入`quit`或`exit`可退出 cli 结束本次会话,cli 输出`quit normally`表示退出成功。 + +### Cli 运行参数 + +|参数名|参数类型|是否为必需参数| 说明| 例子 | +|:---|:---|:---|:---|:---| +|-disableISO8601 |没有参数 | 否 |如果设置了这个参数,IoTDB 将以数字的形式打印时间戳 (timestamp)。|-disableISO8601| +|-h <`host`> |string 类型,不需要引号|是|IoTDB 客户端连接 IoTDB 服务器的 IP 地址。|-h 10.129.187.21| +|-help|没有参数|否|打印 IoTDB 的帮助信息|-help| +|-p <`rpcPort`>|int 类型|是|IoTDB 连接服务器的端口号,IoTDB 默认运行在 6667 端口。|-p 6667| +|-pw <`password`>|string 类型,不需要引号|否|IoTDB 连接服务器所使用的密码。如果没有输入密码 IoTDB 会在 Cli 端提示输入密码。|-pw root| +|-u <`username`>|string 类型,不需要引号|是|IoTDB 连接服务器锁使用的用户名。|-u root| +|-maxPRC <`maxPrintRowCount`>|int 类型|否|设置 IoTDB 返回客户端命令行中所显示的最大行数。|-maxPRC 10| +|-e <`execute`> |string 类型|否|在不进入客户端输入模式的情况下,批量操作 IoTDB|-e "show databases"| +|-c | 空 | 否 | 如果服务器设置了 `rpc_thrift_compression_enable=true`, 则 CLI 必须使用 `-c` | -c | + +下面展示一条客户端命令,功能是连接 IP 为 10.129.187.21 的主机,端口为 6667 ,用户名为 root,密码为 root,以数字的形式打印时间戳,IoTDB 命令行显示的最大行数为 10。 + +Linux 系统与 MacOS 系统启动命令如下: + +```shell +Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 +``` +Windows 系统启动命令如下: + +```shell +Shell > sbin\start-cli.bat -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 +``` + +### CLI 特殊命令 +下面列举了一些CLI的特殊命令。 + +| 命令 | 描述 / 例子 | +|:---|:---| +| `set time_display_type=xxx` | 例如: long, default, ISO8601, yyyy-MM-dd HH:mm:ss | +| `show time_display_type` | 显示时间显示方式 | +| `set time_zone=xxx` | 例如: +08:00, Asia/Shanghai | +| `show time_zone` | 显示CLI的时区 | +| `set fetch_size=xxx` | 设置从服务器查询数据时的读取条数 | +| `show fetch_size` | 显示读取条数的大小 | +| `set max_display_num=xxx` | 设置 CLI 一次展示的最大数据条数, 设置为-1表示无限制 | +| `help` | 获取CLI特殊命令的提示 | +| `exit/quit` | 退出CLI | + +### 使用 OpenID 作为用户名认证登录 + +OpenID Connect (OIDC) 使用 keycloack 作为 OIDC 服务权限认证服务。 + +#### 配置 +配置位于 iotdb-system.properties,设定 authorizer_provider_class 为 org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer 则开启了 openID 服务,默认情况下值为 org.apache.iotdb.commons.auth.authorizer.LocalFileAuthorizer 表示没有开启 openID 服务。 + +``` +authorizer_provider_class=org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer +``` +如果开启了 openID 服务则 openID_url 为必填项,openID_url 值为 http://ip:port/realms/{realmsName} + +``` +openID_url=http://127.0.0.1:8080/realms/iotdb/ +``` +####keycloack 配置 + +1、下载 keycloack 程序(此教程为21.1.0版本),在 keycloack/bin 中启动 keycloack + +```shell +Shell > cd bin +Shell > ./kc.sh start-dev +``` +2、使用 https://ip:port 登陆 keycloack, 首次登陆需要创建用户 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/login_keycloak.png?raw=true) + +3、点击 Administration Console 进入管理端 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/AdministrationConsole.png?raw=true) + +4、在左侧的 Master 菜单点击 Create Realm, 输入 Realm Name 创建一个新的 Realm + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_Realm_1.jpg?raw=true) + 
+![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_Realm_2.jpg?raw=true) + +5、点击左侧菜单 Clients,创建 client + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/client.jpg?raw=true) + +6、点击左侧菜单 User,创建 user + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/user.jpg?raw=true) + +7、点击新创建的用户 id,点击 Credentials 导航输入密码和关闭 Temporary 选项,至此 keyclork 配置完成 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/pwd.jpg?raw=true) + +8、创建角色,点击左侧菜单的 Roles然后点击Create Role 按钮添加角色 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role1.jpg?raw=true) + +9、在Role Name 中输入`iotdb_admin`,点击save 按钮。提示:这里的`iotdb_admin`不能为其他名称否则即使登陆成功后也将无权限使用iotdb的查询、插入、创建 database、添加用户、角色等功能 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role2.jpg?raw=true) + +10、点击左侧的User 菜单然后点击用户列表中的用户为该用户添加我们刚创建的`iotdb_admin`角色 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role3.jpg?raw=true) + +11、选择Role Mappings ,在Assign role选择`iotdb_admin`增加角色 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role4.jpg?raw=true) + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role5.jpg?raw=true) + +提示:如果用户角色有调整需要重新生成token并且重新登陆iotdb才会生效 + +以上步骤提供了一种 keycloak 登陆 iotdb 方式,更多方式请参考 keycloak 配置 + +若对应的 IoTDB 服务器开启了使用 OpenID Connect (OIDC) 作为权限认证服务,那么就不再需要使用用户名密码进行登录。 +替而代之的是使用 Token,以及空密码。 +此时,登录命令如下: + +```shell +Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u {my-access-token} -pw "" +``` + +其中,需要将{my-access-token} (注意,包括{})替换成你的 token,即 access_token 对应的值。密码为空需要再次确认。 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/iotdbpw.jpeg?raw=true) + +如何获取 token 取决于你的 OIDC 设置。 最简单的一种情况是使用`password-grant`。例如,假设你在用 keycloack 作为你的 OIDC 服务, +并且你在 keycloack 中有一个被定义成 public 的`iotdb`客户的 realm,那么你可以使用如下`curl`命令获得 token。 +(注意例子中的{}和里面的内容需要替换成具体的服务器地址和 realm 名字): +```shell +curl -X POST "http://{your-keycloack-server}/realms/{your-realm}/protocol/openid-connect/token" \ -H "Content-Type: application/x-www-form-urlencoded" \ + -d "username={username}" \ + -d "password={password}" \ + -d 'grant_type=password' \ + -d "client_id=iotdb-client" +``` + +示例结果如下: + +```json 
+{"access_token":"eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJxMS1XbTBvelE1TzBtUUg4LVNKYXAyWmNONE1tdWNXd25RV0tZeFpKNG93In0.eyJleHAiOjE1OTAzOTgwNzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNjA0ZmYxMDctN2NiNy00NTRmLWIwYmQtY2M2ZDQwMjFiNGU4IiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiYWNjb3VudCIsInN1YiI6ImJhMzJlNDcxLWM3NzItNGIzMy04ZGE2LTZmZThhY2RhMDA3MyIsInR5cCI6IkJlYXJlciIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsImFjciI6IjEiLCJhbGxvd2VkLW9yaWdpbnMiOlsibG9jYWxob3N0OjgwODAiXSwicmVhbG1fYWNjZXNzIjp7InJvbGVzIjpbIm9mZmxpbmVfYWNjZXNzIiwidW1hX2F1dGhvcml6YXRpb24iLCJpb3RkYl9hZG1pbiJdfSwicmVzb3VyY2VfYWNjZXNzIjp7ImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoiZW1haWwgcHJvZmlsZSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJ1c2VyIn0.nwbrJkWdCNjzFrTDwKNuV5h9dDMg5ytRKGOXmFIajpfsbOutJytjWTCB2WpA8E1YI3KM6gU6Jx7cd7u0oPo5syHhfCz119n_wBiDnyTZkFOAPsx0M2z20kvBLN9k36_VfuCMFUeddJjO31MeLTmxB0UKg2VkxdczmzMH3pnalhxqpnWWk3GnrRrhAf2sZog0foH4Ae3Ks0lYtYzaWK_Yo7E4Px42-gJpohy3JevOC44aJ4auzJR1RBj9LUbgcRinkBy0JLi6XXiYznSC2V485CSBHW3sseXn7pSXQADhnmGQrLfFGO5ZljmPO18eFJaimdjvgSChsrlSEmTDDsoo5Q","expires_in":300,"refresh_expires_in":1800,"refresh_token":"eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJhMzZlMGU0NC02MWNmLTQ5NmMtOGRlZi03NTkwNjQ5MzQzMjEifQ.eyJleHAiOjE1OTAzOTk1NzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNmMxNTBiY2EtYmE5NC00NTgxLWEwODEtYjI2YzhhMmI5YmZmIiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwic3ViIjoiYmEzMmU0NzEtYzc3Mi00YjMzLThkYTYtNmZlOGFjZGEwMDczIiwidHlwIjoiUmVmcmVzaCIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsInNjb3BlIjoiZW1haWwgcHJvZmlsZSJ9.ayNpXdNX28qahodX1zowrMGiUCw2AodlHBQFqr8Ui7c","token_type":"bearer","not-before-policy":0,"session_state":"060d2862-14ed-42fe-baf7-8d1f784657f1","scope":"email profile"} +``` + +### Cli 的批量操作 +当您想要通过脚本的方式通过 Cli / Shell 对 IoTDB 进行批量操作时,可以使用-e 参数。通过使用该参数,您可以在不进入客户端输入模式的情况下操作 IoTDB。 + +为了避免 SQL 语句和其他参数混淆,现在只支持-e 参数作为最后的参数使用。 + +针对 cli/Shell 工具的-e 参数用法如下: + +Linux 系统与 MacOS 指令: + +```shell +Shell > bash sbin/start-cli.sh -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} +``` + +Windows 系统指令 +```shell +Shell > sbin\start-cli.bat -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} +``` + +在 Windows 环境下,-e 参数的 SQL 语句需要使用` `` `对于`" "`进行替换 + +为了更好的解释-e 参数的使用,可以参考下面在 Linux 上执行的例子。 + +假设用户希望对一个新启动的 IoTDB 进行如下操作: + +1. 创建名为 root.demo 的 database + +2. 创建名为 root.demo.s1 的时间序列 + +3. 向创建的时间序列中插入三个数据点 + +4. 
查询验证数据是否插入成功 + +那么通过使用 cli/Shell 工具的 -e 参数,可以采用如下的脚本: + +```shell +# !/bin/bash + +host=127.0.0.1 +rpcPort=6667 +user=root +pass=root + +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "CREATE DATABASE root.demo" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "create timeseries root.demo.s1 WITH DATATYPE=INT32, ENCODING=RLE" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(1,10)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(2,11)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(3,12)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "select s1 from root.demo" +``` + +打印出来的结果显示如下,通过这种方式进行的操作与客户端的输入模式以及通过 JDBC 进行操作结果是一致的。 + +```shell + Shell > bash ./shell.sh ++-----------------------------+------------+ +| Time|root.demo.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 10| +|1970-01-01T08:00:00.002+08:00| 11| +|1970-01-01T08:00:00.003+08:00| 12| ++-----------------------------+------------+ +Total line number = 3 +It costs 0.267s +``` + +需要特别注意的是,在脚本中使用 -e 参数时要对特殊字符进行转义。 + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Data-Export-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Data-Export-Tool.md new file mode 100644 index 00000000..361d5076 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Data-Export-Tool.md @@ -0,0 +1,199 @@ +# 数据导出 + +## 1. 导出工具介绍 + +导出工具可以将 SQL 查询的数据导出为指定的格式,包含用于导出 TsFile 文件的 `export-tsfile.sh/bat` 脚本和支持 CSV 和 SQL 格式的导出的 `export-data.sh/bat` 脚本。 + +## 2. 支持的数据类型 + +- CSV:纯文本格式,存储格式化数据,需按照下文指定 CSV 格式进行构造 + +- SQL:包含自定义 SQL 语句的文件 + +- TsFile: IoTDB 中使用的时间序列的文件格式 + +## 3. 
export-tsfile 脚本 + +支持 TsFile: IoTDB 中使用的时间序列的文件格式 + +#### 3.1 运行命令 + +```Bash +# Unix/OS X +tools/export-tsfile.sh -h -p -u -pw -td [-f -q -s ] + +# Windows +tools\export-tsfile.bat -h -p -u -pw -td [-f -q -s ] +``` + +#### 3.2 参数介绍 + +| **参数** | **定义** | **是否必填** | **默认** | +| -------- | ------------------------------------------------------------ | ------------ | --------- | +| -h | 主机名 | 否 | 127.0.0.1 | +| -p | 端口号 | 否 | 6667 | +| -u | 用户名 | 否 | root | +| -pw | 密码 | 否 | root | +| -t | 目标文件目录,用于指定输出文件应该保存到的目录 | 是 | - | +| -tfn | 导出文件的名称 | 否 | - | +| -q | 想要执行的查询命令的数量,可能用于批量执行查询 | 否 | - | +| -s | SQL 文件路径,用于指定包含要执行的 SQL 语句的文件位置 | 否 | - | +| -timeout | 会话查询的超时时间,用于指定查询操作在自动终止前允许的最长时间 | 否 | - | + +除此之外,如果没有使用`-s`和`-q`参数,在导出脚本被启动之后你需要按照程序提示输入查询语句,不同的查询结果会被保存到不同的TsFile文件中。 + +#### 3.3 运行示例 + +```Bash +# Unix/OS X +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 + +# Windows +tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 +``` + +## 4. export-data 脚本 + +支持 CSV:纯文本格式,存储格式化数据,需按照下文指定 CSV 格式进行构造 + +支持 SQL:包含自定义 SQL 语句的文件 + +#### 4.1 运行命令 + +```Bash +# Unix/OS X +>tools/export-data.sh -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] + +# Windows +>tools\export-data.bat -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] +``` + +#### 4.2 参数介绍 + +| **参数** | **定义** | **是否必填** | **默认** | +| --------- | ------------------------------------------------------------ | ------------ | ------------------------ | +| -h | 主机名 | 否 | 127.0.0.1 | +| -p | 端口号 | 否 | 6667 | +| -u | 用户名 | 否 | root | +| -pw | 密码 | 否 | root | +| -t | 导出的 CSV 或 SQL 文件的输出路径(V1.3.2版本参数是`-td`) | 是 | | +| -datatype | 是否在 CSV 文件的 header 中时间序列的后面打印出对应的数据类型,选项为 true 或者 false | 否 | true | +| -q | 在命令中直接指定想要执行的查询语句(目前仅支持部分语句,详细明细见下表)说明:-q 与 -s 参数必填其一,同时填写则 -q 生效。详细支持的 SQL 语句示例,请参考下方“SQL语句支持明细” | 否 | | +| -s | 指定 SQL 文件,该文件可包含一条或多条 SQL 语句。如果包含多条 SQL 语句,语句之间应该用换行(回车)进行分割。每一条 SQL 语句对应一个或多个输出的CSV或 SQL 文件说明:-q 与 -s 参数必填其一,同时填写则-q生效。详细支持的 SQL 语句示例,请参考下方“SQL语句支持规则” | 否 | | +| -type | 指定导出的文件类型,选项为 csv 或者 sql | 否 | csv | +| -tf | 指定时间格式。时间格式必须遵守[ISO 8601](https://calendars.wikia.org/wiki/ISO_8601)标准,或时间戳(`timestamp`) 说明:只在 -type 为 csv 时生效 | 否 | yyyy-MM-dd HH:mm:ss.SSSz | +| -lpf | 指定导出的 dump 文件最大行数(V1.3.2版本参数是`-linesPerFile`) | 否 | 10000 | +| -timeout | 指定 session 查询时的超时时间,单位为ms | 否 | -1 | + +#### 4.3 SQL 语句支持规则 + +1. 只支持查询语句,非查询语句(如:元数据管理、系统管理等语句)不支持。对于不支持的 SQL ,程序会自动跳过,同时输出错误信息。 +2. 
查询语句中目前版本仅支持原始数据的导出,如果有使用 group by、聚合函数、udf、操作运算符等则不支持导出为 SQL。原始数据导出时请注意,若导出多个设备数据,请使用 align by device 语句。详细示例如下: + +| | **支持导出** | **示例** | +| ----------------------------------------- | ------------ | --------------------------------------------- | +| 原始数据单设备查询 | 支持 | select * from root.s_0.d_0 | +| 原始数据多设备查询(aligin by device) | 支持 | select * from root.** align by device | +| 原始数据多设备查询(无 aligin by device) | 不支持 | select * from root.**select * from root.s_0.* | + +#### 4.4 运行示例 + +- 导出某 SQL 执行范围下的所有数据至 CSV 文件。 + +```Bash +# Unix/OS X +>tools/export-data.sh -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] + +# Windows +>tools\export-data.bat -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] +``` + +- 导出结果 + +```Bash +Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice +2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 +2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 +``` + +- 导出 SQL 文件内所有 SQL 执行范围下的所有数据至 CSV 文件。 + +```Bash +# Unix/OS X +>tools/export-data.sh -t ./data/ -s export.sql +# Windows +>tools/export-data.bat -t ./data/ -s export.sql +``` + +- export.sql 文件内容(-s 参数指向的文件) + +```Bash +select * from root.stock.** limit 100 +select * from root.db.** limit 100 +``` + +- 导出结果文件1 + +```Bash +Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice +2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 +2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 +``` + +- 导出结果文件2 + +```Bash +Time,root.db.Random.RandomBoolean +2024-07-22T17:16:05.820+08:00,true +2024-07-22T17:16:02.597+08:00,false +``` + +- 将 IoTDB 数据库中在 SQL 文件内定义的数据,以对齐的格式将其导出为 SQL 语句。 + +```Bash +# Unix/OS X +>tools/export-data.sh -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true +# Windows +>tools/export-data.bat -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true +``` + +- 导出结果 + +```Bash +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249629831,0.62308747,2.0,0.012206747854849653,-6.0,false,0.14164352); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249630834,0.7520042,3.0,0.22760657101910464,-5.0,true,0.089064896); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249631835,0.3981064,3.0,0.6254559288663467,-6.0,false,0.9767922); +``` + +- 将某 SQL 执行范围下的所有数据导出至 CSV 文件,指定导出的时间格式为`yyyy-MM-dd HH:mm:ss`,且表头时间序列的后面打印出对应的数据类型。 + +```Bash +# Unix/OS X +>tools/export-data.sh -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true +# Windows +>tools/export-data.bat -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true +``` + +- 导出结果 + +```Bash +Time,root.stock.Legacy.0700HK.L1_BidPrice(DOUBLE),root.stock.Legacy.0700HK.Type(DOUBLE),root.stock.Legacy.0700HK.L1_BidSize(DOUBLE),root.stock.Legacy.0700HK.Domain(DOUBLE),root.stock.Legacy.0700HK.L1_BuyNo(BOOLEAN),root.stock.Legacy.0700HK.L1_AskPrice(DOUBLE) +2024-07-30 
10:33:55,0.44574088,3.0,0.21476832811611501,-4.0,true,0.5951748 +2024-07-30 10:33:56,0.6880933,3.0,0.6289119476165305,-5.0,false,0.114634395 +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Data-Import-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Data-Import-Tool.md new file mode 100644 index 00000000..08659116 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Data-Import-Tool.md @@ -0,0 +1,206 @@ +# 数据导入 + +## 1. IoTDB 数据导入 + +IoTDB 目前支持将 CSV、SQL、及TsFile(IoTDB底层开放式时间序列文件格式)格式的数据导入数据库。具体功能如下: + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| 文件格式 | IoTDB工具 | 具体介绍 |
+| -------- | ----------------------- | ------------------------------------------------------------ |
+| CSV | import-data.sh/bat | 可用于单个或一个目录的 CSV 文件批量导入 IoTDB |
+| SQL | import-data.sh/bat | 可用于单个或一个目录的 SQL 文件批量导入 IoTDB |
+| TsFile | load-tsfile.sh/bat | 可用于单个或一个目录的 TsFile 文件批量导入 IoTDB |
+| TsFile | 主动监听&加载功能 | 根据用户配置,监听指定路径下TsFile文件的变化,将新增加的TsFile文件加载入IoTDB |
+ +## 2. import-data 脚本 + +- 支持格式:CSV、SQL + +### 2.1 运行命令: + +```Bash +# Unix/OS X +>tools/import-data.sh -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] + +# Windows +>tools\import-data.bat -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] +``` + +### 2.2 参数介绍: + +| **参数** | **定义** | **是否必填** | **默认** | +| ---------- | ------------------------------------------------------------ | ------------ | ------------------------- | +| -h | 数据库IP地址 | 否 | 127.0.0.1 | +| -p | 数据库端口 | 否 | 6667 | +| -u | 数据库连接用户名 | 否 | root | +| -pw | 数据库连接密码 | 否 | root | +| -s | 指定想要导入的数据,这里可以指定文件或者文件夹。如果指定的是文件夹,将会把文件夹中所有的后缀为 csv 或者 sql 的文件进行批量导入(V1.3.2版本参数是`-f`) | 是 | | +| -fd | 指定存放失败 SQL 文件的目录,如果未指定这个参数,失败的文件将会被保存到源数据的目录中。 说明:对于不支持的 SQL ,不合法的 SQL ,执行失败的 SQL 都会放到失败目录下的失败文件里(默认为 文件名.failed) | 否 | 源文件名加上`.failed`后缀 | +| -aligned | 指定是否使用`aligned`接口,选项为 true 或者 false 说明:这个参数只在导入文件为csv文件时生效 | 否 | false | +| -batch | 用于指定每一批插入的数据的点数(最小值为1,最大值为 Integer.*MAX_VALUE*)。如果程序报了`org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`这个错的话,就可以适当的调低这个参数。 | 否 | 100000 | +| -tp | 指定时间精度,可选值包括`ms`(毫秒),`ns`(纳秒),`us`(微秒) | 否 | ms | +| -lpf | 指定每个导入失败文件写入数据的行数(V1.3.2版本参数是`-linesPerFailedFile`) | 否 | 10000 | +| -typeInfer | 用于指定类型推断规则,如。 说明:用于指定类型推断规则.`srcTsDataType` 包括 `boolean`,`int`,`long`,`float`,`double`,`NaN`.`dstTsDataType` 包括 `boolean`,`int`,`long`,`float`,`double`,`text`.当`srcTsDataType`为`boolean`, `dstTsDataType`只能为`boolean`或`text`.当`srcTsDataType`为`NaN`, `dstTsDataType`只能为`float`, `double`或`text`.当`srcTsDataType`为数值类型, `dstTsDataType`的精度需要高于`srcTsDataType`.例如:`-typeInfer boolean=text,float=double` | 否 | | + +### 2.3 运行示例: + +- 导入当前`data`目录下的`dump0_0.sql`数据到本机 IoTDB 数据库中。 + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/dump0_0.sql +# Windows +>tools/import-data.bat -s ./data/dump0_0.sql +``` + +- 将当前`data`目录下的所有数据以对齐的方式导入到本机 IoTDB 数据库中。 + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/ -fd ./failed/ -aligned true +# Windows +>tools/import-data.bat -s ./data/ -fd ./failed/ -aligned true +``` + +- 导入当前`data`目录下的`dump0_0.csv`数据到本机 IoTDB 数据库中。 + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/dump0_0.csv -fd ./failed/ +# Windows +>tools/import-data.bat -s ./data/dump0_0.csv -fd ./failed/ +``` + +- 将当前`data`目录下的`dump0_0.csv`数据以对齐的方式,一批导入100000条导入到`192.168.100.1`IP所在主机的 IoTDB 数据库中,失败的记录记在当前`failed`目录下,每个文件最多记1000条。 + +```Bash +# Unix/OS X +>tools/import-data.sh -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 +# Windows +>tools/import-data.bat -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 +``` + +## 3. load-tsfile 脚本 + +- 支持格式:TsFile + +### 3.1 运行命令 + +```Bash +# Unix/OS X +>tools/load-tsfile.sh -h -p -u -pw -s -os [-sd ] -of [-fd ] [-tn ] + +# Windows +>tools\load-tsfile.bat -h -p -u -pw -s -os [-sd ] -of [-fd ] [-tn ] +``` + +### 3.2 参数介绍 + +| **参数** | **定义** | **是否必填** | **默认** | +| -------- | ------------------------------------------------------------ | ----------------------------------- | ------------------- | +| -h | 主机名 | 否 | root | +| -p | 端口号 | 否 | root | +| -u | 用户名 | 否 | 127.0.0.1 | +| -pw | 密码 | 否 | 6667 | +| -s | 待加载的脚本文件(夹)的本地目录路径 | 是 | | +| -os | none:不删除
mv:移动成功的文件到目标文件夹
cp:硬连接(拷贝)成功的文件到目标文件夹
delete:删除 | 是 | |
+| -sd | 当--on_success为mv或cp时,mv或cp的目标文件夹。文件的文件名变为文件夹打平后拼接原有文件名 | 当--on_success为mv或cp时需要填写 | ${EXEC_DIR}/success |
+| -of | none:跳过
mv:移动失败的文件到目标文件夹
cp:硬连接(拷贝)失败的文件到目标文件夹
delete:删除 | 是 | | +| -fd | 当--on_fail指定为mv或cp时,mv或cp的目标文件夹。文件的文件名变为文件夹打平后拼接原有文件名 | 当--on_fail指定为mv或cp时需要填写 | ${EXEC_DIR}/fail | +| -tn | 最大并行线程数 | 是 | 8 | + +### 3.3 运行示例: + +```Bash +# Unix/OS X +> tools/load-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os delete -of delete -tn 8 +> tools/load-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os mv -of cp -sd /path/success/dir -fd /path/failure/dir -tn 8 + +# Windows +> tools/load_data.bat -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os mv -of cp -sd /path/success/dir -fd /path/failure/dir -tn 8 +> tools/load_data.bat -h 127.0.0.1 -p 6667 -u root -pw root -s /path/sql -os delete -of delete -tn 8 +``` + +## 4. TsFile 主动监听&加载功能 + +TsFile 主动监听&加载功能能够主动监听指定目标路径(用户配置)下TsFile的文件变化,并将目标路径下的TsFile文件自动同步至指定接收路径(用户配置)。通过此功能,IoTDB 能自动检测并加载这些文件,无需手动执行任何额外的加载操作。这种自动化流程不仅简化了用户的操作步骤,还减少了操作过程中可能出现的错误,有效降低了用户在使用过程中的复杂性。 + +![](https://alioss.timecho.com/docs/img/Data-import1.png) + + +### 4.1 配置参数 + +可通过从配置文件模版 `iotdb-system.properties.template` 中找到下列参数,添加到 IoTDB 配置文件 `iotdb-system.properties` 中开启TsFile 主动监听&加载功能。完整配置如下:: + + +| **配置参数** | **参数说明** | **value 取值范围** | **是否必填** | **默认值** | **加载方式** | +| -------------------------------------------- | ------------------------------------------------------------ | -------------------------- | ------------ | ---------------------- | ---------------- | +| load_active_listening_enable | 是否开启 DataNode 主动监听并且加载 tsfile 的功能(默认开启)。 | Boolean: true,false | 选填 | true | 热加载 | +| load_active_listening_dirs | 需要监听的目录(自动包括目录中的子目录),如有多个使用 “,“ 隔开默认的目录为 ext/load/pending(支持热装载) | String: 一个或多个文件目录 | 选填 | ext/load/pending | 热加载 | +| load_active_listening_fail_dir | 执行加载 tsfile 文件失败后将文件转存的目录,只能配置一个 | String: 一个文件目录 | 选填 | ext/load/failed | 热加载 | +| load_active_listening_max_thread_num | 同时执行加载 tsfile 任务的最大线程数,参数被注释掉时的默值为 max(1, CPU 核心数 / 2),当用户设置的值不在这个区间[1, CPU核心数 /2]内时,会设置为默认值 (1, CPU 核心数 / 2) | Long: [1, Long.MAX_VALUE] | 选填 | max(1, CPU 核心数 / 2) | 重启后生效 | +| load_active_listening_check_interval_seconds | 主动监听轮询间隔,单位秒。主动监听 tsfile 的功能是通过轮询检查文件夹实现的。该配置指定了两次检查 load_active_listening_dirs 的时间间隔,每次检查完成 load_active_listening_check_interval_seconds 秒后,会执行下一次检查。当用户设置的轮询间隔小于 1 时,会被设置为默认值 5 秒 | Long: [1, Long.MAX_VALUE] | 选填 | 5 | 重启后生效 | + +### 4.2 注意事项 + +1. 如果待加载的文件中,存在 mods 文件,应优先将 mods 文件移动到监听目录下面,然后再移动 tsfile 文件,且 mods 文件应和对应的 tsfile 文件处于同一目录。防止加载到 tsfile 文件时,加载不到对应的 mods 文件 + +```SQL +FUNCTION moveFilesToListeningDirectory(sourceDirectory, listeningDirectory) + // 移动 mods 文件 + modsFiles = searchFiles(sourceDirectory, "*mods*") + IF modsFiles IS NOT EMPTY + FOR EACH file IN modsFiles + MOVE(file, listeningDirectory) + END FOR + END IF + + // 移动 tsfile 文件 + tsfileFiles = searchFiles(sourceDirectory, "*tsfile*") + IF tsfileFiles IS NOT EMPTY + FOR EACH file IN tsfileFiles + MOVE(file, listeningDirectory) + END FOR + END IF +END FUNCTION + +FUNCTION searchFiles(directory, pattern) + matchedFiles = [] + FOR EACH file IN directory.files + IF file.name MATCHES pattern + APPEND file TO matchedFiles + END IF + END FOR + RETURN matchedFiles +END FUNCTION + +FUNCTION MOVE(sourceFile, targetDirectory) + // 实现文件从 sourceFile 移动到 targetDirectory 的逻辑 +END FUNCTION +``` + +2. 禁止设置 Pipe 的 receiver 目录、存放数据的 data 目录等作为监听目录 + +3. 禁止 `load_active_listening_fail_dir` 与 `load_active_listening_dirs` 存在相同的目录,或者互相嵌套 + +4. 
保证 `load_active_listening_dirs` 目录有足够的权限,在加载成功之后,文件将会被删除,如果没有删除权限,则会重复加载 + diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_apache.md new file mode 100644 index 00000000..57b527cc --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_apache.md @@ -0,0 +1,229 @@ + + +# 集群管理工具 + +## 数据文件夹概览工具 + +IoTDB数据文件夹概览工具用于打印出数据文件夹的结构概览信息,工具位置为 tools/tsfile/print-iotdb-data-dir。 + +### 用法 + +- Windows: + +```bash +.\print-iotdb-data-dir.bat (<输出结果的存储路径>) +``` + +- Linux or MacOs: + +```shell +./print-iotdb-data-dir.sh (<输出结果的存储路径>) +``` + +注意:如果没有设置输出结果的存储路径, 将使用相对路径"IoTDB_data_dir_overview.txt"作为默认值。 + +### 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-iotdb-data-dir.bat D:\github\master\iotdb\data\datanode\data +```````````````````````` +Starting Printing the IoTDB Data Directory Overview +```````````````````````` +output save path:IoTDB_data_dir_overview.txt +data dir num:1 +143 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +|============================================================== +|D:\github\master\iotdb\data\datanode\data +|--sequence +| |--root.redirect0 +| | |--1 +| | | |--0 +| |--root.redirect1 +| | |--2 +| | | |--0 +| |--root.redirect2 +| | |--3 +| | | |--0 +| |--root.redirect3 +| | |--4 +| | | |--0 +| |--root.redirect4 +| | |--5 +| | | |--0 +| |--root.redirect5 +| | |--6 +| | | |--0 +| |--root.sg1 +| | |--0 +| | | |--0 +| | | |--2760 +|--unsequence +|============================================================== +````````````````````````` + +## TsFile概览工具 + +TsFile概览工具用于以概要模式打印出一个TsFile的内容,工具位置为 tools/tsfile/print-tsfile。 + +### 用法 + +- Windows: + +```bash +.\print-tsfile-sketch.bat (<输出结果的存储路径>) +``` + +- Linux or MacOs: + +```shell +./print-tsfile-sketch.sh (<输出结果的存储路径>) +``` + +注意:如果没有设置输出结果的存储路径, 将使用相对路径"TsFile_sketch_view.txt"作为默认值。 + +### 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-tsfile.bat D:\github\master\1669359533965-1-0-0.tsfile D:\github\master\sketch.txt +```````````````````````` +Starting Printing the TsFile Sketch +```````````````````````` +TsFile path:D:\github\master\1669359533965-1-0-0.tsfile +Sketch save path:D:\github\master\sketch.txt +148 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. 
+-------------------------------- TsFile Sketch -------------------------------- +file path: D:\github\master\1669359533965-1-0-0.tsfile +file length: 2974 + + POSITION| CONTENT + -------- ------- + 0| [magic head] TsFile + 6| [version number] 3 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1, num of Chunks:3 + 7| [Chunk Group Header] + | [marker] 0 + | [deviceID] root.sg1.d1 + 20| [Chunk] of root.sg1.d1.s1, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [chunk header] marker=5, measurementID=s1, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 893| [Chunk] of root.sg1.d1.s2, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [chunk header] marker=5, measurementID=s2, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 1766| [Chunk] of root.sg1.d1.s3, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [chunk header] marker=5, measurementID=s3, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1 ends + 2656| [marker] 2 + 2657| [TimeseriesIndex] of root.sg1.d1.s1, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [ChunkIndex] offset=20 + 2728| [TimeseriesIndex] of root.sg1.d1.s2, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [ChunkIndex] offset=893 + 2799| [TimeseriesIndex] of root.sg1.d1.s3, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [ChunkIndex] offset=1766 + 2870| [IndexOfTimerseriesIndex Node] type=LEAF_MEASUREMENT + | + | +||||||||||||||||||||| [TsFileMetadata] begins + 2891| [IndexOfTimerseriesIndex Node] type=LEAF_DEVICE + | + | + | [meta offset] 2656 + | [bloom filter] bit vector byte array length=31, filterSize=256, hashFunctionSize=5 +||||||||||||||||||||| [TsFileMetadata] ends + 2964| [TsFileMetadataSize] 73 + 2968| [magic tail] TsFile + 2974| END of TsFile +---------------------------- IndexOfTimerseriesIndex Tree ----------------------------- + [MetadataIndex:LEAF_DEVICE] + └──────[root.sg1.d1,2870] + [MetadataIndex:LEAF_MEASUREMENT] + └──────[s1,2657] +---------------------------------- TsFile Sketch End ---------------------------------- +````````````````````````` + +解释: + +- 以"|"为分隔,左边是在TsFile文件中的实际位置,右边是梗概内容。 +- "|||||||||||||||||||||"是为增强可读性而添加的导引信息,不是TsFile中实际存储的数据。 +- 最后打印的"IndexOfTimerseriesIndex Tree"是对TsFile文件末尾的元数据索引树的重新整理打印,便于直观理解,不是TsFile中存储的实际数据。 + 
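+以 Linux/MacOS 为例,用法与 Windows 相同,下面给出一个调用示意(其中 TsFile 路径与输出文件路径均为假设的示例值,请替换为实际路径):
+
+```shell
+# 打印指定 TsFile 的概览,并将结果保存到 sketch.txt(路径仅为示例)
+./print-tsfile-sketch.sh ./1669359533965-1-0-0.tsfile ./sketch.txt
+```
+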
+## TsFile Resource概览工具 + +TsFile resource概览工具用于打印出TsFile resource文件的内容,工具位置为 tools/tsfile/print-tsfile-resource-files。 + +### 用法 + +- Windows: + +```bash +.\print-tsfile-resource-files.bat +``` + +- Linux or MacOs: + +``` +./print-tsfile-resource-files.sh +``` + +### 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +147 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +230 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +231 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +233 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +237 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file folder D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 finished. +````````````````````````` + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +178 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +186 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +187 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +188 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +192 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource finished. 
+````````````````````````` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_timecho.md new file mode 100644 index 00000000..63d69149 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Maintenance-Tool_timecho.md @@ -0,0 +1,1013 @@ + + +# 集群管理工具 + +## 集群管理工具 + +IoTDB 集群管理工具是一款易用的运维工具(企业版工具)。旨在解决 IoTDB 分布式系统多节点的运维难题,主要包括集群部署、集群启停、弹性扩容、配置更新、数据导出等功能,从而实现对复杂数据库集群的一键式指令下发,极大降低管理难度。本文档将说明如何用集群管理工具远程部署、配置、启动和停止 IoTDB 集群实例。 + +### 环境准备 + +本工具为 TimechoDB(基于IoTDB的企业版数据库)配套工具,您可以联系您的销售获取工具下载方式。 + +IoTDB 要部署的机器需要依赖jdk 8及以上版本、lsof、netstat、unzip功能如果没有请自行安装,可以参考文档最后的一节环境所需安装命令。 + +提示:IoTDB集群管理工具需要使用有root权限的账号 + +### 部署方法 + +#### 下载安装 + +本工具为TimechoDB(基于IoTDB的企业版数据库)配套工具,您可以联系您的销售获取工具下载方式。 + +注意:由于二进制包仅支持GLIBC2.17 及以上版本,因此最低适配Centos7版本 + +* 在iotd目录内输入以下指令后: + +```bash +bash install-iotdbctl.sh +``` + +即可在之后的 shell 内激活 iotdbctl 关键词,如检查部署前所需的环境指令如下所示: + +```bash +iotdbctl cluster check example +``` + +* 也可以不激活iotd直接使用 <iotdbctl absolute path>/sbin/iotdbctl 来执行命令,如检查部署前所需的环境: + +```bash +/sbin/iotdbctl cluster check example +``` + +### 系统结构 + +IoTDB集群管理工具主要由config、logs、doc、sbin目录组成。 + +* `config`存放要部署的集群配置文件如果要使用集群部署工具需要修改里面的yaml文件。 +* `logs` 存放部署工具日志,如果想要查看部署工具执行日志请查看`logs/iotd_yyyy_mm_dd.log`。 +* `sbin` 存放集群部署工具所需的二进制包。 +* `doc` 存放用户手册、开发手册和推荐部署手册。 + + +### 集群配置文件介绍 + +* 在`iotdbctl/config` 目录下有集群配置的yaml文件,yaml文件名字就是集群名字yaml 文件可以有多个,为了方便用户配置yaml文件在iotd/config目录下面提供了`default_cluster.yaml`示例。 +* yaml 文件配置由`global`、`confignode_servers`、`datanode_servers`、`grafana_server`、`prometheus_server`四大部分组成 +* global 是通用配置主要配置机器用户名密码、IoTDB本地安装文件、Jdk配置等。在`iotdbctl/config`目录中提供了一个`default_cluster.yaml`样例数据, + 用户可以复制修改成自己集群名字并参考里面的说明进行配置IoTDB集群,在`default_cluster.yaml`样例中没有注释的均为必填项,已经注释的为非必填项。 + +例如要执行`default_cluster.yaml`检查命令则需要执行命令`iotdbctl cluster check default_cluster`即可, +更多详细命令请参考下面命令列表。 + + + +| 参数 | 说明 | 是否必填 | +|-------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| +| iotdb\_zip\_dir | IoTDB 部署分发目录,如果值为空则从`iotdb_download_url`指定地址下载 | 非必填 | +| iotdb\_download\_url | IoTDB 下载地址,如果`iotdb_zip_dir` 没有值则从指定地址下载 | 非必填 | +| jdk\_tar\_dir | jdk 本地目录,可使用该 jdk 路径进行上传部署至目标节点。 | 非必填 | +| jdk\_deploy\_dir | jdk 远程机器部署目录,会将 jdk 部署到该目录下面,与下面的`jdk_dir_name`参数构成完整的jdk部署目录即 `/` | 非必填 | +| jdk\_dir\_name | jdk 解压后的目录名称默认是jdk_iotdb | 非必填 | +| iotdb\_lib\_dir | IoTDB lib 目录或者IoTDB 的lib 压缩包仅支持.zip格式 ,仅用于IoTDB升级,默认处于注释状态,如需升级请打开注释修改路径即可。如果使用zip文件请使用zip 命令压缩iotdb/lib目录例如 zip -r lib.zip apache\-iotdb\-1.2.0/lib/* | 非必填 | +| user | ssh登陆部署机器的用户名 | 必填 | +| password | ssh登录的密码, 如果password未指定使用pkey登陆, 请确保已配置节点之间ssh登录免密钥 | 非必填 | +| pkey | 密钥登陆如果password有值优先使用password否则使用pkey登陆 | 非必填 | +| ssh\_port | ssh登录端口 | 必填 | +| iotdb\_admin_user | iotdb服务用户名默认root | 非必填 | +| iotdb\_admin_password | iotdb服务密码默认root | 非必填 | +| deploy\_dir | IoTDB 部署目录,会把 IoTDB 部署到该目录下面与下面的`iotdb_dir_name`参数构成完整的IoTDB 部署目录即 `/` | 必填 | +| iotdb\_dir\_name | IoTDB 解压后的目录名称默认是iotdb | 非必填 | +| datanode-env.sh | 对应`iotdb/config/datanode-env.sh` ,在`global`与`confignode_servers`同时配置值时优先使用`confignode_servers`中的值 | 非必填 | +| confignode-env.sh | 对应`iotdb/config/confignode-env.sh`,在`global`与`datanode_servers`同时配置值时优先使用`datanode_servers`中的值 | 非必填 | +| iotdb-common.properties | 对应`iotdb/config/iotdb-common.properties` | 非必填 | +| cn\_seed\_config\_node | 
集群配置地址指向存活的ConfigNode,默认指向confignode\_x,在`global`与`confignode_servers`同时配置值时优先使用`confignode_servers`中的值,对应`iotdb/config/iotdb-system.properties`中的`cn_seed_config_node` | 必填 | +| dn\_seed\_config\_node | 集群配置地址指向存活的ConfigNode,默认指向confignode\_x,在`global`与`datanode_servers`同时配置值时优先使用`datanode_servers`中的值,对应`iotdb/config/iotdb-system.properties`中的`dn_seed_config_node` | 必填 | + +其中datanode-env.sh 和confignode-env.sh 可以配置额外参数extra_opts,当该参数配置后会在datanode-env.sh 和confignode-env.sh 后面追加对应的值,可参考default\_cluster.yaml,配置示例如下: +datanode-env.sh: +extra_opts: | +IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:+UseG1GC" +IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:MaxGCPauseMillis=200" + + +* confignode_servers 是部署IoTDB Confignodes配置,里面可以配置多个Confignode + 默认将第一个启动的ConfigNode节点node1当作Seed-ConfigNode + +| 参数 | 说明 | 是否必填 | +|-----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| +| name | Confignode 名称 | 必填 | +| deploy\_dir | IoTDB config node 部署目录 | 必填| | +| cn\_internal\_address | 对应iotdb/内部通信地址,对应`iotdb/config/iotdb-system.properties`中的`cn_internal_address` | 必填 | +| cn\_seed\_config\_node | 集群配置地址指向存活的ConfigNode,默认指向confignode_x,在`global`与`confignode_servers`同时配置值时优先使用`confignode_servers`中的值,对应`iotdb/config/iotdb-confignode.properties`中的`cn_seed_config_node` | 必填 | +| cn\_internal\_port | 内部通信端口,对应`iotdb/config/iotdb-system.properties`中的`cn_internal_port` | 必填 | +| cn\_consensus\_port | 对应`iotdb/config/iotdb-system.properties`中的`cn_consensus_port` | 非必填 | +| cn\_data\_dir | 对应`iotdb/config/iotdb-system.properties`中的`cn_data_dir` | 必填 | +| iotdb-system.properties | 对应`iotdb/config/iotdb-system.properties`在`global`与`confignode_servers`同时配置值优先使用confignode\_servers中的值 | 非必填 | + +* datanode_servers 是部署IoTDB Datanodes配置,里面可以配置多个Datanode + +| 参数 | 说明 | 是否必填 | +| -------------------------- | ------------------------------------------------------------ | -------- | +| name | Datanode 名称 | 必填 | +| deploy_dir | IoTDB data node 部署目录,注:该目录不能与下面的IoTDB config node部署目录相同 | 必填 | +| dn_rpc_address | datanode rpc 地址对应`iotdb/config/iotdb-system.properties`中的`dn_rpc_address` | 必填 | +| dn_internal_address | 内部通信地址,对应`iotdb/config/iotdb-system.properties`中的`dn_internal_address` | 必填 | +| dn_seed_config_node | 集群配置地址指向存活的ConfigNode,默认指向confignode_x,在`global`与`datanode_servers`同时配置值时优先使用`datanode_servers`中的值,对应`iotdb/config/iotdb-datanode.properties`中的`dn_seed_config_node`,推荐使用 SeedConfigNode | 必填 | +| dn_rpc_port | datanode rpc端口地址,对应`iotdb/config/iotdb-system.properties`中的`dn_rpc_port` | 必填 | +| dn_internal_port | 内部通信端口,对应`iotdb/config/iotdb-system.properties`中的`dn_internal_port` | 必填 | +| iotdb-system.properties | 对应`iotdb/config/iotdb-system.properties`在`global`与`datanode_servers`同时配置值优先使用`datanode_servers`中的值 | 非必填 | + + +| 参数 | 说明 |是否必填| +|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--- | +| name | Datanode 名称 |必填| +| deploy\_dir | IoTDB data node 部署目录 |必填| +| dn\_rpc\_address | datanode rpc 地址对应`iotdb/config/iotdb-system.properties`中的`dn_rpc_address` |必填| +| dn\_internal\_address | 内部通信地址,对应`iotdb/config/iotdb-system.properties`中的`dn_internal_address` |必填| +| dn\_seed\_config\_node | 
集群配置地址指向存活的ConfigNode,默认指向confignode_x,在`global`与`datanode_servers`同时配置值时优先使用`datanode_servers`中的值,对应`iotdb/config/iotdb-system.properties`中的`dn_seed_config_node` |必填| +| dn\_rpc\_port | datanode rpc端口地址,对应`iotdb/config/iotdb-system.properties`中的`dn_rpc_port` |必填| +| dn\_internal\_port | 内部通信端口,对应`iotdb/config/iotdb-system.properties`中的`dn_internal_port` |必填| +| iotdb-system.properties | 对应`iotdb/config/iotdb-common.properties`在`global`与`datanode_servers`同时配置值优先使用`datanode_servers`中的值 |非必填| + +* grafana_server 是部署Grafana 相关配置 + +| 参数 | 说明 | 是否必填 | +|--------------------|------------------|-------------------| +| grafana\_dir\_name | grafana 解压目录名称 | 非必填默认grafana_iotdb | +| host | grafana 部署的服务器ip | 必填 | +| grafana\_port | grafana 部署机器的端口 | 非必填,默认3000 | +| deploy\_dir | grafana 部署服务器目录 | 必填 | +| grafana\_tar\_dir | grafana 压缩包位置 | 必填 | +| dashboards | dashboards 所在的位置 | 非必填,多个用逗号隔开 | + +* prometheus_server 是部署Prometheus 相关配置 + +| 参数 | 说明 | 是否必填 | +|--------------------------------|------------------|-----------------------| +| prometheus_dir\_name | prometheus 解压目录名称 | 非必填默认prometheus_iotdb | +| host | prometheus 部署的服务器ip | 必填 | +| prometheus\_port | prometheus 部署机器的端口 | 非必填,默认9090 | +| deploy\_dir | prometheus 部署服务器目录 | 必填 | +| prometheus\_tar\_dir | prometheus 压缩包位置 | 必填 | +| storage\_tsdb\_retention\_time | 默认保存数据天数 默认15天 | 非必填 | +| storage\_tsdb\_retention\_size | 指定block可以保存的数据大小默认512M ,注意单位KB, MB, GB, TB, PB, EB | 非必填 | + +如果在config/xxx.yaml的`iotdb-system.properties`和`iotdb-system.properties`中配置了metrics,则会自动把配置放入到promethues无需手动修改 + +注意:如何配置yaml key对应的值包含特殊字符如:等建议整个value使用双引号,对应的文件路径中不要使用包含空格的路径,防止出现识别出现异常问题。 + +### 使用场景 + +#### 清理数据场景 + +* 清理集群数据场景会删除IoTDB集群中的data目录以及yaml文件中配置的`cn_system_dir`、`cn_consensus_dir`、 + `dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`、`logs`和`ext`目录。 +* 首先执行停止集群命令、然后在执行集群清理命令。 +```bash +iotdbctl cluster stop default_cluster +iotdbctl cluster clean default_cluster +``` + +#### 集群销毁场景 + +* 集群销毁场景会删除IoTDB集群中的`data`、`cn_system_dir`、`cn_consensus_dir`、 + `dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`、`logs`、`ext`、`IoTDB`部署目录、 + grafana部署目录和prometheus部署目录。 +* 首先执行停止集群命令、然后在执行集群销毁命令。 + + +```bash +iotdbctl cluster stop default_cluster +iotdbctl cluster destroy default_cluster +``` + +#### 集群升级场景 + +* 集群升级首先需要在config/xxx.yaml中配置`iotdb_lib_dir`为要上传到服务器的jar所在目录路径(例如iotdb/lib)。 +* 如果使用zip文件上传请使用zip 命令压缩iotdb/lib目录例如 zip -r lib.zip apache-iotdb-1.2.0/lib/* +* 执行上传命令、然后执行重启IoTDB集群命令即可完成集群升级 + +```bash +iotdbctl cluster dist-lib default_cluster +iotdbctl cluster restart default_cluster +``` + +#### 集群配置文件的热部署场景 + +* 首先修改在config/xxx.yaml中配置。 +* 执行分发命令、然后执行热部署命令即可完成集群配置的热部署 + +```bash +iotdbctl cluster dist-conf default_cluster +iotdbctl cluster reload default_cluster +``` + +#### 集群扩容场景 + +* 首先修改在config/xxx.yaml中添加一个datanode 或者confignode 节点。 +* 执行集群扩容命令 +```bash +iotdbctl cluster scaleout default_cluster +``` + +#### 集群缩容场景 + +* 首先在config/xxx.yaml中找到要缩容的节点名字或者ip+port(其中confignode port 是cn_internal_port、datanode port 是rpc_port) +* 执行集群缩容命令 +```bash +iotdbctl cluster scalein default_cluster +``` + +#### 已有IoTDB集群,使用集群部署工具场景 + +* 配置服务器的`user`、`passwod`或`pkey`、`ssh_port` +* 修改config/xxx.yaml中IoTDB 部署路径,`deploy_dir`(IoTDB 部署目录)、`iotdb_dir_name`(IoTDB解压目录名称,默认是iotdb) + 例如IoTDB 部署完整路径是`/home/data/apache-iotdb-1.1.1`则需要修改yaml文件`deploy_dir:/home/data/`、`iotdb_dir_name:apache-iotdb-1.1.1` +* 如果服务器不是使用的java_home则修改`jdk_deploy_dir`(jdk 部署目录)、`jdk_dir_name`(jdk解压后的目录名称,默认是jdk_iotdb),如果使用的是java_home 则不需要修改配置 + 
例如jdk部署完整路径是`/home/data/jdk_1.8.2`则需要修改yaml文件`jdk_deploy_dir:/home/data/`、`jdk_dir_name:jdk_1.8.2` +* 配置`cn_seed_config_node`、`dn_seed_config_node` +* 配置`confignode_servers`中`iotdb-system.properties`里面的`cn_internal_address`、`cn_internal_port`、`cn_consensus_port`、`cn_system_dir`、 + `cn_consensus_dir`里面的值不是IoTDB默认的则需要配置否则可不必配置 +* 配置`datanode_servers`中`iotdb-system.properties`里面的`dn_rpc_address`、`dn_internal_address`、`dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`等 +* 执行初始化命令 + +```bash +iotdbctl cluster init default_cluster +``` + +#### 一键部署IoTDB、Grafana和Prometheus 场景 + +* 配置`iotdb-system.properties` 打开metrics接口 +* 配置Grafana 配置,如果`dashboards` 有多个就用逗号隔开,名字不能重复否则会被覆盖。 +* 配置Prometheus配置,IoTDB 集群配置了metrics 则无需手动修改Prometheus 配置会根据哪个节点配置了metrics,自动修改Prometheus 配置。 +* 启动集群 + +```bash +iotdbctl cluster start default_cluster +``` + +更加详细参数请参考上方的 集群配置文件介绍 + + +### 命令格式 + +本工具的基本用法为: +```bash +iotdbctl cluster [params (Optional)] +``` +* key 表示了具体的命令。 + +* cluster name 表示集群名称(即`iotdbctl/config` 文件中yaml文件名字)。 + +* params 表示了命令的所需参数(选填)。 + +* 例如部署default_cluster集群的命令格式为: + +```bash +iotdbctl cluster deploy default_cluster +``` + +* 集群的功能及参数列表如下: + +| 命令 | 功能 | 参数 | +|-----------------|-------------------------------|-------------------------------------------------------------------------------------------------------------------------| +| check | 检测集群是否可以部署 | 集群名称列表 | +| clean | 清理集群 | 集群名称 | +| deploy/dist-all | 部署集群 | 集群名称 ,-N,模块名称(iotdb、grafana、prometheus可选),-op force(可选) | +| list | 打印集群及状态列表 | 无 | +| start | 启动集群 | 集群名称,-N,节点名称(nodename、grafana、prometheus可选) | +| stop | 关闭集群 | 集群名称,-N,节点名称(nodename、grafana、prometheus可选) ,-op force(nodename、grafana、prometheus可选) | +| restart | 重启集群 | 集群名称,-N,节点名称(nodename、grafana、prometheus可选),-op force(强制停止)/rolling(滚动重启) | +| show | 查看集群信息,details字段表示展示集群信息细节 | 集群名称, details(可选) | +| destroy | 销毁集群 | 集群名称,-N,模块名称(iotdb、grafana、prometheus可选) | +| scaleout | 集群扩容 | 集群名称 | +| scalein | 集群缩容 | 集群名称,-N,集群节点名字或集群节点ip+port | +| reload | 集群热加载 | 集群名称 | +| dist-conf | 集群配置文件分发 | 集群名称 | +| dumplog | 备份指定集群日志 | 集群名称,-N,集群节点名字 -h 备份至目标机器ip -pw 备份至目标机器密码 -p 备份至目标机器端口 -path 备份的目录 -startdate 起始时间 -enddate 结束时间 -loglevel 日志类型 -l 传输速度 | +| dumpdata | 备份指定集群数据 | 集群名称, -h 备份至目标机器ip -pw 备份至目标机器密码 -p 备份至目标机器端口 -path 备份的目录 -startdate 起始时间 -enddate 结束时间 -l 传输速度 | +| dist-lib | lib 包升级 | 集群名字(升级完后请重启) | +| init | 已有集群使用集群部署工具时,初始化集群配置 | 集群名字,初始化集群配置 | +| status | 查看进程状态 | 集群名字 | +| acitvate | 激活集群 | 集群名字 | +| dist-plugin | 上传plugin(udf,trigger,pipe)到集群 | 集群名字,-type 类型 U(udf)/T(trigger)/P(pipe) -file /xxxx/trigger.jar,上传完成后需手动执行创建udf、pipe、trigger命令 | +| upgrade | 滚动升级 | 集群名字 | +| health_check | 健康检查 | 集群名字,-N,节点名称(可选) | +| backup | 停机备份 | 集群名字,-N,节点名称(可选) | +| importschema | 元数据导入 | 集群名字,-N,节点名称(必填) -param 参数 | +| exportschema | 元数据导出 | 集群名字,-N,节点名称(必填) -param 参数 | + + +### 详细命令执行过程 + +下面的命令都是以default_cluster.yaml 为示例执行的,用户可以修改成自己的集群文件来执行 + +#### 检查集群部署环境命令 + +```bash +iotdbctl cluster check default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 验证目标节点是否能够通过 SSH 登录 + +* 验证对应节点上的 JDK 版本是否满足IoTDB jdk1.8及以上版本、服务器是否按照unzip、是否安装lsof 或者netstat + +* 如果看到下面提示`Info:example check successfully!` 证明服务器已经具备安装的要求, + 如果输出`Error:example check fail!` 证明有部分条件没有满足需求可以查看上面的输出的Error日志(例如:`Error:Server (ip:172.20.31.76) iotdb port(10713) is listening`)进行修复, + 如果检查jdk没有满足要求,我们可以自己在yaml 文件中配置一个jdk1.8 及以上版本的进行部署不影响后面使用, + 如果检查lsof、netstat或者unzip 不满足要求需要在服务器上自行安装。 + +#### 部署集群命令 + +```bash +iotdbctl cluster deploy default_cluster +``` + +* 根据 cluster-name 
找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 根据`confignode_servers` 和`datanode_servers`中的节点信息上传IoTDB压缩包和jdk压缩包(如果yaml中配置`jdk_tar_dir`和`jdk_deploy_dir`值) + +* 根据yaml文件节点配置信息生成并上传`iotdb-system.properties` + +```bash +iotdbctl cluster deploy default_cluster -op force +``` +注意:该命令会强制执行部署,具体过程会删除已存在的部署目录重新部署 + +*部署单个模块* +```bash +# 部署grafana模块 +iotdbctl cluster deploy default_cluster -N grafana +# 部署prometheus模块 +iotdbctl cluster deploy default_cluster -N prometheus +# 部署iotdb模块 +iotdbctl cluster deploy default_cluster -N iotdb +``` + +#### 启动集群命令 + +```bash +iotdbctl cluster start default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 启动confignode,根据yaml配置文件中`confignode_servers`中的顺序依次启动同时根据进程id检查confignode是否正常,第一个confignode 为seek config + +* 启动datanode,根据yaml配置文件中`datanode_servers`中的顺序依次启动同时根据进程id检查datanode是否正常 + +* 如果根据进程id检查进程存在后,通过cli依次检查集群列表中每个服务是否正常,如果cli链接失败则每隔10s重试一次直到成功最多重试5次 + + +*启动单个节点命令* +```bash +#按照IoTDB 节点名称启动 +iotdbctl cluster start default_cluster -N datanode_1 +#按照IoTDB 集群ip+port启动,其中port对应confignode的cn_internal_port、datanode的rpc_port +iotdbctl cluster start default_cluster -N 192.168.1.5:6667 +#启动grafana +iotdbctl cluster start default_cluster -N grafana +#启动prometheus +iotdbctl cluster start default_cluster -N prometheus +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件 + +* 根据提供的节点名称或者ip:port找到对于节点位置信息,如果启动的节点是`data_node`则ip使用yaml 文件中的`dn_rpc_address`、port 使用的是yaml文件中datanode_servers 中的`dn_rpc_port`。 + 如果启动的节点是`config_node`则ip使用的是yaml文件中confignode_servers 中的`cn_internal_address` 、port 使用的是`cn_internal_port` + +* 启动该节点 + +说明:由于集群部署工具仅是调用了IoTDB集群中的start-confignode.sh和start-datanode.sh 脚本, +在实际输出结果失败时有可能是集群还未正常启动,建议使用status命令进行查看当前集群状态(iotdbctl cluster status xxx) + + +#### 查看IoTDB集群状态命令 + +```bash +iotdbctl cluster show default_cluster +#查看IoTDB集群详细信息 +iotdbctl cluster show default_cluster details +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 依次在datanode通过cli执行`show cluster details` 如果有一个节点执行成功则不会在后续节点继续执行cli直接返回结果 + + +#### 停止集群命令 + + +```bash +iotdbctl cluster stop default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 根据`datanode_servers`中datanode节点信息,按照配置先后顺序依次停止datanode节点 + +* 根据`confignode_servers`中confignode节点信息,按照配置依次停止confignode节点 + +*强制停止集群命令* + +```bash +iotdbctl cluster stop default_cluster -op force +``` +会直接执行kill -9 pid 命令强制停止集群 + +*停止单个节点命令* + +```bash +#按照IoTDB 节点名称停止 +iotdbctl cluster stop default_cluster -N datanode_1 +#按照IoTDB 集群ip+port停止(ip+port是按照datanode中的ip+dn_rpc_port获取唯一节点或confignode中的ip+cn_internal_port获取唯一节点) +iotdbctl cluster stop default_cluster -N 192.168.1.5:6667 +#停止grafana +iotdbctl cluster stop default_cluster -N grafana +#停止prometheus +iotdbctl cluster stop default_cluster -N prometheus +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件 + +* 根据提供的节点名称或者ip:port找到对应节点位置信息,如果停止的节点是`data_node`则ip使用yaml 文件中的`dn_rpc_address`、port 使用的是yaml文件中datanode_servers 中的`dn_rpc_port`。 + 如果停止的节点是`config_node`则ip使用的是yaml文件中confignode_servers 中的`cn_internal_address` 、port 使用的是`cn_internal_port` + +* 停止该节点 + +说明:由于集群部署工具仅是调用了IoTDB集群中的stop-confignode.sh和stop-datanode.sh 脚本,在某些情况下有可能iotdb集群并未停止。 + + +#### 清理集群数据命令 + +```bash +iotdbctl cluster clean default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`配置信息 + +* 根据`confignode_servers`、`datanode_servers`中的信息,检查是否还有服务正在运行, + 如果有任何一个服务正在运行则不会执行清理命令 + +* 删除IoTDB集群中的data目录以及yaml文件中配置的`cn_system_dir`、`cn_consensus_dir`、 + 
`dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`、`logs`和`ext`目录。 + + + +#### 重启集群命令 + +```bash +iotdbctl cluster restart default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`配置信息 + +* 执行上述的停止集群命令(stop),然后执行启动集群命令(start) 具体参考上面的start 和stop 命令 + +*强制重启集群命令* + +```bash +iotdbctl cluster restart default_cluster -op force +``` +会直接执行kill -9 pid 命令强制停止集群,然后启动集群 + +*重启单个节点命令* + +```bash +#按照IoTDB 节点名称重启datanode_1 +iotdbctl cluster restart default_cluster -N datanode_1 +#按照IoTDB 节点名称重启confignode_1 +iotdbctl cluster restart default_cluster -N confignode_1 +#重启grafana +iotdbctl cluster restart default_cluster -N grafana +#重启prometheus +iotdbctl cluster restart default_cluster -N prometheus +``` + +#### 集群缩容命令 + +```bash +#按照节点名称缩容 +iotdbctl cluster scalein default_cluster -N nodename +#按照ip+port缩容(ip+port按照datanode中的ip+dn_rpc_port获取唯一节点,confignode中的ip+cn_internal_port获取唯一节点) +iotdbctl cluster scalein default_cluster -N ip:port +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 判断要缩容的confignode节点和datanode是否只剩一个,如果只剩一个则不能执行缩容 + +* 然后根据ip:port或者nodename 获取要缩容的节点信息,执行缩容命令,然后销毁该节点目录,如果缩容的节点是`data_node`则ip使用yaml 文件中的`dn_rpc_address`、port 使用的是yaml文件中datanode_servers 中的`dn_rpc_port`。 + 如果缩容的节点是`config_node`则ip使用的是yaml文件中confignode_servers 中的`cn_internal_address` 、port 使用的是`cn_internal_port` + + +提示:目前一次仅支持一个节点缩容 + +#### 集群扩容命令 + +```bash +iotdbctl cluster scaleout default_cluster +``` +* 修改config/xxx.yaml 文件添加一个datanode 节点或者confignode节点 + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 找到要扩容的节点,执行上传IoTDB压缩包和jdb包(如果yaml中配置`jdk_tar_dir`和`jdk_deploy_dir`值)并解压 + +* 根据yaml文件节点配置信息生成并上传`iotdb-system.properties` + +* 执行启动该节点命令并校验节点是否启动成功 + +提示:目前一次仅支持一个节点扩容 + +#### 销毁集群命令 +```bash +iotdbctl cluster destroy default_cluster +``` + +* cluster-name 找到默认位置的 yaml 文件 + +* 根据`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`中node节点信息,检查是否节点还在运行, + 如果有任何一个节点正在运行则停止销毁命令 + +* 删除IoTDB集群中的`data`以及yaml文件配置的`cn_system_dir`、`cn_consensus_dir`、 + `dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`、`logs`、`ext`、`IoTDB`部署目录、 + grafana部署目录和prometheus部署目录 + +*销毁单个模块* +```bash +# 销毁grafana模块 +iotdbctl cluster destroy default_cluster -N grafana +# 销毁prometheus模块 +iotdbctl cluster destroy default_cluster -N prometheus +# 销毁iotdb模块 +iotdbctl cluster destroy default_cluster -N iotdb +``` + +#### 分发集群配置命令 +```bash +iotdbctl cluster dist-conf default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`配置信息 + +* 根据yaml文件节点配置信息生成并依次上传`iotdb-system.properties`到指定节点 + +#### 热加载集群配置命令 +```bash +iotdbctl cluster reload default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 根据yaml文件节点配置信息依次在cli中执行`load configuration` + +#### 集群节点日志备份 +```bash +iotdbctl cluster dumplog default_cluster -N datanode_1,confignode_1 -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/logs' -logs '/root/data/db/iotdb/logs' +``` +* 根据 cluster-name 找到默认位置的 yaml 文件 + +* 该命令会根据yaml文件校验datanode_1,confignode_1 是否存在,然后根据配置的起止日期(startdate<=logtime<=enddate)备份指定节点datanode_1,confignode_1 的日志数据到指定服务`192.168.9.48` 端口`36000` 数据备份路径是 `/iotdb/logs` ,IoTDB日志存储路径在`/root/data/db/iotdb/logs`(非必填,如果不填写-logs xxx 默认从IoTDB安装路径/logs下面备份日志) + +| 命令 | 功能 | 是否必填 | +|------------|------------------------------------| ---| +| -h | 存放备份数据机器ip |否| +| -u | 存放备份数据机器用户名 |否| +| -pw | 
存放备份数据机器密码 |否| +| -p | 存放备份数据机器端口(默认22) |否| +| -path | 存放备份数据的路径(默认当前路径) |否| +| -loglevel | 日志基本有all、info、error、warn(默认是全部) |否| +| -l | 限速(默认不限速范围0到104857601 单位Kbit/s) |否| +| -N | 配置文件集群名称多个用逗号隔开 |是| +| -startdate | 起始时间(包含默认1970-01-01) |否| +| -enddate | 截止时间(包含) |否| +| -logs | IoTDB 日志存放路径,默认是({iotdb}/logs) |否| + +#### 集群节点数据备份 +```bash +iotdbctl cluster dumpdata default_cluster -granularity partition -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/datas' +``` +* 该命令会根据yaml文件获取leader 节点,然后根据起止日期(startdate<=logtime<=enddate)备份数据到192.168.9.48 服务上的/iotdb/datas 目录下 + +| 命令 | 功能 | 是否必填 | +| ---|---------------------------------| ---| +|-h| 存放备份数据机器ip |否| +|-u| 存放备份数据机器用户名 |否| +|-pw| 存放备份数据机器密码 |否| +|-p| 存放备份数据机器端口(默认22) |否| +|-path| 存放备份数据的路径(默认当前路径) |否| +|-granularity| 类型partition |是| +|-l| 限速(默认不限速范围0到104857601 单位Kbit/s) |否| +|-startdate| 起始时间(包含) |是| +|-enddate| 截止时间(包含) |是| + +#### 集群lib包上传(升级) +```bash +iotdbctl cluster dist-lib default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 上传lib包 + +注意执行完升级后请重启IoTDB 才能生效 + +#### 集群初始化 +```bash +iotdbctl cluster init default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`配置信息 +* 初始化集群配置 + +#### 查看集群进程状态 +```bash +iotdbctl cluster status default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`配置信息 +* 展示集群的存活状态 + +#### 集群授权激活 + +集群激活默认是通过输入激活码激活,也可以通过-op license_path 通过license路径激活 + +* 默认激活方式 +```bash +iotdbctl cluster activate default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`配置信息 +* 读取里面的机器码 +* 等待输入激活码 + +```bash +Machine code: +Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== +Please enter the activation code: +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= +Activation successful +``` +* 激活单个节点 + +```bash +iotdbctl cluster activate default_cluster -N confignode1 +``` + +* 通过license路径方式激活 + +```bash +iotdbctl cluster activate default_cluster -op license_path +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`配置信息 +* 读取里面的机器码 +* 等待输入激活码 + +```bash +Machine code: +Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== +Please enter the activation code: +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= +Activation successful +``` +* 激活单个节点 + +```bash +iotdbctl cluster activate default_cluster -N confignode1 -op license_path +``` + +* 通过license路径方式激活 + +```bash +iotdbctl cluster activate default_cluster -op license_path +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`配置信息 +* 读取里面的机器码 +* 等待输入激活码 + +### 集群plugin分发 +```bash +#分发udf +iotdbctl cluster dist-plugin default_cluster -type U -file /xxxx/udf.jar +#分发trigger +iotdbctl cluster dist-plugin default_cluster -type T -file /xxxx/trigger.jar +#分发pipe +iotdbctl cluster dist-plugin default_cluster -type P -file /xxxx/pipe.jar +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取 `datanode_servers`配置信息 + +* 上传udf/trigger/pipe jar包 + +上传完成后需要手动执行创建udf/trigger/pipe命令 + +### 集群滚动升级 +```bash +iotdbctl cluster upgrade default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 上传lib包 +* confignode 
执行停止、替换lib包、启动,然后datanode执行停止、替换lib包、启动 + + + +### 集群健康检查 +```bash +iotdbctl cluster health_check default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 +* 每个节点执行health_check.sh + +* 单个节点健康检查 +```bash +iotdbctl cluster health_check default_cluster -N datanode_1 +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`datanode_servers`配置信息 +* datanode1 执行health_check.sh + + +### 集群停机备份 +```bash +iotdbctl cluster backup default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 +* 每个节点执行backup.sh + +* 单个节点健康检查 +```bash +iotdbctl cluster backup default_cluster -N datanode_1 +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`datanode_servers`配置信息 +* datanode1 执行backup.sh + +说明:多个节点部署到单台机器,只支持 quick 模式 + +### 集群元数据导入 + +```bash +iotdbctl cluster importschema default_cluster -N datanode1 -param "-s ./dump0.csv -fd ./failed/ -lpf 10000" +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`datanode_servers`配置信息 +* datanode1 执行元数据导入import-schema.sh + +其中 -param的参数如下: + +| 命令 | 功能 | 是否必填 | +|-----|---------------------------------|------| +| -s |指定想要导入的数据文件,这里可以指定文件或者文件夹。如果指定的是文件夹,将会把文件夹中所有的后缀为csv的文件进行批量导入。 | 是 | +| -fd |指定一个目录来存放导入失败的文件,如果没有指定这个参数,失败的文件将会被保存到源数据的目录中,文件名为是源文件名加上.failed的后缀。 | 否 | +| -lpf |用于指定每个导入失败文件写入数据的行数,默认值为10000 | 否 | + + + +### 集群元数据导出 + +```bash +iotdbctl cluster exportschema default_cluster -N datanode1 -param "-t ./ -pf ./pattern.txt -lpf 10 -t 10000" +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`datanode_servers`配置信息 +* datanode1 执行元数据导入export-schema.sh + +其中 -param的参数如下: + +| 命令 | 功能 | 是否必填 | +|-----|------------------------------------------------------------|------| +| -t | 为导出的CSV文件指定输出路径 | 是 | +| -path |指定导出元数据的path pattern,指定该参数后会忽略-s参数例如:root.stock.** | 否 | +| -pf |如果未指定-path,则需指定该参数,指定查询元数据路径所在文件路径,支持 txt 文件格式,每个待导出的路径为一行。 | 否 | +| -lpf |指定导出的dump文件最大行数,默认值为10000。 | 否 | +| -timeout |指定session查询时的超时时间,单位为ms | 否 | + + + +### 集群部署工具样例介绍 +在集群部署工具安装目录中config/example 下面有3个yaml样例,如果需要可以复制到config 中进行修改即可 + +| 名称 | 说明 | +|-----------------------------|------------------------------------------------| +| default\_1c1d.yaml | 1个confignode和1个datanode 配置样例 | +| default\_3c3d.yaml | 3个confignode和3个datanode 配置样例 | +| default\_3c3d\_grafa\_prome | 3个confignode和3个datanode、Grafana、Prometheus配置样例 | + +## 数据文件夹概览工具 + +IoTDB数据文件夹概览工具用于打印出数据文件夹的结构概览信息,工具位置为 tools/tsfile/print-iotdb-data-dir。 + +### 用法 + +- Windows: + +```bash +.\print-iotdb-data-dir.bat (<输出结果的存储路径>) +``` + +- Linux or MacOs: + +```shell +./print-iotdb-data-dir.sh (<输出结果的存储路径>) +``` + +注意:如果没有设置输出结果的存储路径, 将使用相对路径"IoTDB_data_dir_overview.txt"作为默认值。 + +### 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-iotdb-data-dir.bat D:\github\master\iotdb\data\datanode\data +```````````````````````` +Starting Printing the IoTDB Data Directory Overview +```````````````````````` +output save path:IoTDB_data_dir_overview.txt +data dir num:1 +143 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. 
+|============================================================== +|D:\github\master\iotdb\data\datanode\data +|--sequence +| |--root.redirect0 +| | |--1 +| | | |--0 +| |--root.redirect1 +| | |--2 +| | | |--0 +| |--root.redirect2 +| | |--3 +| | | |--0 +| |--root.redirect3 +| | |--4 +| | | |--0 +| |--root.redirect4 +| | |--5 +| | | |--0 +| |--root.redirect5 +| | |--6 +| | | |--0 +| |--root.sg1 +| | |--0 +| | | |--0 +| | | |--2760 +|--unsequence +|============================================================== +````````````````````````` + +## TsFile概览工具 + +TsFile概览工具用于以概要模式打印出一个TsFile的内容,工具位置为 tools/tsfile/print-tsfile。 + +### 用法 + +- Windows: + +```bash +.\print-tsfile-sketch.bat (<输出结果的存储路径>) +``` + +- Linux or MacOs: + +```shell +./print-tsfile-sketch.sh (<输出结果的存储路径>) +``` + +注意:如果没有设置输出结果的存储路径, 将使用相对路径"TsFile_sketch_view.txt"作为默认值。 + +### 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-tsfile.bat D:\github\master\1669359533965-1-0-0.tsfile D:\github\master\sketch.txt +```````````````````````` +Starting Printing the TsFile Sketch +```````````````````````` +TsFile path:D:\github\master\1669359533965-1-0-0.tsfile +Sketch save path:D:\github\master\sketch.txt +148 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +-------------------------------- TsFile Sketch -------------------------------- +file path: D:\github\master\1669359533965-1-0-0.tsfile +file length: 2974 + + POSITION| CONTENT + -------- ------- + 0| [magic head] TsFile + 6| [version number] 3 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1, num of Chunks:3 + 7| [Chunk Group Header] + | [marker] 0 + | [deviceID] root.sg1.d1 + 20| [Chunk] of root.sg1.d1.s1, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [chunk header] marker=5, measurementID=s1, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 893| [Chunk] of root.sg1.d1.s2, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [chunk header] marker=5, measurementID=s2, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 1766| [Chunk] of root.sg1.d1.s3, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [chunk header] marker=5, measurementID=s3, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1 ends + 2656| [marker] 2 + 2657| [TimeseriesIndex] of root.sg1.d1.s1, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [ChunkIndex] offset=20 + 2728| [TimeseriesIndex] of root.sg1.d1.s2, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 
[minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [ChunkIndex] offset=893 + 2799| [TimeseriesIndex] of root.sg1.d1.s3, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [ChunkIndex] offset=1766 + 2870| [IndexOfTimerseriesIndex Node] type=LEAF_MEASUREMENT + | + | +||||||||||||||||||||| [TsFileMetadata] begins + 2891| [IndexOfTimerseriesIndex Node] type=LEAF_DEVICE + | + | + | [meta offset] 2656 + | [bloom filter] bit vector byte array length=31, filterSize=256, hashFunctionSize=5 +||||||||||||||||||||| [TsFileMetadata] ends + 2964| [TsFileMetadataSize] 73 + 2968| [magic tail] TsFile + 2974| END of TsFile +---------------------------- IndexOfTimerseriesIndex Tree ----------------------------- + [MetadataIndex:LEAF_DEVICE] + └──────[root.sg1.d1,2870] + [MetadataIndex:LEAF_MEASUREMENT] + └──────[s1,2657] +---------------------------------- TsFile Sketch End ---------------------------------- +````````````````````````` + +解释: + +- 以"|"为分隔,左边是在TsFile文件中的实际位置,右边是梗概内容。 +- "|||||||||||||||||||||"是为增强可读性而添加的导引信息,不是TsFile中实际存储的数据。 +- 最后打印的"IndexOfTimerseriesIndex Tree"是对TsFile文件末尾的元数据索引树的重新整理打印,便于直观理解,不是TsFile中存储的实际数据。 + +## TsFile Resource概览工具 + +TsFile resource概览工具用于打印出TsFile resource文件的内容,工具位置为 tools/tsfile/print-tsfile-resource-files。 + +### 用法 + +- Windows: + +```bash +.\print-tsfile-resource-files.bat +``` + +- Linux or MacOs: + +``` +./print-tsfile-resource-files.sh +``` + +### 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +147 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +230 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +231 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +233 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +237 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file folder D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 finished. 
+````````````````````````` + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +178 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +186 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +187 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +188 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +192 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource finished. +````````````````````````` diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_apache.md b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_apache.md new file mode 100644 index 00000000..cc0f8ee0 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_apache.md @@ -0,0 +1,168 @@ + + + +# Prometheus + +## 监控指标的 Prometheus 映射关系 + +> 对于 Metric Name 为 name, Tags 为 K1=V1, ..., Kn=Vn 的监控指标有如下映射,其中 value 为具体值 + +| 监控指标类型 | 映射关系 | +| ---------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Counter | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value | +| AutoGauge、Gauge | name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value | +| Histogram | name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.5"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.99"} value | +| Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="mean"} value | +| Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.5"} value value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.99"} value | + +## 修改配置文件 + +1) 以 DataNode 为例,修改 iotdb-system.properties 配置文件如下: + +```properties +dn_metric_reporter_list=PROMETHEUS +dn_metric_level=CORE +dn_metric_prometheus_reporter_port=9091 +``` + +2) 启动 IoTDB DataNode + +3) 打开浏览器或者用```curl``` 访问 ```http://servier_ip:9091/metrics```, 就能得到如下 metric 数据: + +``` +... +# HELP file_count +# TYPE file_count gauge +file_count{name="wal",} 0.0 +file_count{name="unseq",} 0.0 +file_count{name="seq",} 2.0 +... +``` + +## Prometheus + Grafana + +如上所示,IoTDB 对外暴露出标准的 Prometheus 格式的监控指标数据,可以使用 Prometheus 采集并存储监控指标,使用 Grafana +可视化监控指标。 + +IoTDB、Prometheus、Grafana三者的关系如下图所示: + +![iotdb_prometheus_grafana](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Metrics/iotdb_prometheus_grafana.png) + +1. IoTDB在运行过程中持续收集监控指标数据。 +2. Prometheus以固定的间隔(可配置)从IoTDB的HTTP接口拉取监控指标数据。 +3. Prometheus将拉取到的监控指标数据存储到自己的TSDB中。 +4. Grafana以固定的间隔(可配置)从Prometheus查询监控指标数据并绘图展示。 + +从交互流程可以看出,我们需要做一些额外的工作来部署和配置Prometheus和Grafana。 + +比如,你可以对Prometheus进行如下的配置(部分参数可以自行调整)来从IoTDB获取监控数据 + +```yaml +job_name: pull-metrics +honor_labels: true +honor_timestamps: true +scrape_interval: 15s +scrape_timeout: 10s +metrics_path: /metrics +scheme: http +follow_redirects: true +static_configs: + - targets: + - localhost:9091 +``` + +更多细节可以参考下面的文档: + +[Prometheus安装使用文档](https://prometheus.io/docs/prometheus/latest/getting_started/) + +[Prometheus从HTTP接口拉取metrics数据的配置说明](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) + +[Grafana安装使用文档](https://grafana.com/docs/grafana/latest/getting-started/getting-started/) + +[Grafana从Prometheus查询数据并绘图的文档](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) + +## Apache IoTDB Dashboard + +`Apache IoTDB Dashboard`是 IoTDB 企业版的配套产品,支持统一集中式运维管理,可通过一个监控面板监控多个集群。你可以联系商务获取到 Dashboard 的 Json文件。 + + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20default%20cluster.png) + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20cluster2.png) + + +### 集群概览 + +可以监控包括但不限于: +- 集群总CPU核数、总内存空间、总硬盘空间 +- 集群包含多少个ConfigNode与DataNode +- 集群启动时长 +- 集群写入速度 +- 集群各节点当前CPU、内存、磁盘使用率 +- 分节点的信息 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%A6%82%E8%A7%88.png) + +### 数据写入 + +可以监控包括但不限于: +- 写入平均耗时、耗时中位数、99%分位耗时 +- WAL文件数量与尺寸 +- 节点 WAL flush SyncBuffer 耗时 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%86%99%E5%85%A5.png) + +### 数据查询 + +可以监控包括但不限于: +- 节点查询加载时间序列元数据耗时 +- 节点查询读取时间序列耗时 +- 节点查询修改时间序列元数据耗时 +- 节点查询加载Chunk元数据列表耗时 +- 节点查询修改Chunk元数据耗时 +- 节点查询按照Chunk元数据过滤耗时 +- 节点查询构造Chunk Reader耗时的平均值 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%9F%A5%E8%AF%A2.png) + +### 存储引擎 + +可以监控包括但不限于: +- 分类型的文件数量、大小 +- 处于各阶段的TsFile数量、大小 +- 各类任务的数量与耗时 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E.png) + +### 系统监控 + +可以监控包括但不限于: +- 系统内存、交换内存、进程内存 +- 磁盘空间、文件数、文件尺寸 +- JVM GC时间占比、分类型的GC次数、GC数据量、各年代的堆内存占用 +- 网络传输速率、包发送速率 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E5%86%85%E5%AD%98%E4%B8%8E%E7%A1%AC%E7%9B%98.png) + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9Fjvm.png) + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E7%BD%91%E7%BB%9C.png) diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_timecho.md 
b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_timecho.md new file mode 100644 index 00000000..850c75ea --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Monitor-Tool_timecho.md @@ -0,0 +1,168 @@ + + + +# 监控工具 + +## 监控指标的 Prometheus 映射关系 + +> 对于 Metric Name 为 name, Tags 为 K1=V1, ..., Kn=Vn 的监控指标有如下映射,其中 value 为具体值 + +| 监控指标类型 | 映射关系 | +| ---------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Counter | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value | +| AutoGauge、Gauge | name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value | +| Histogram | name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.5"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.99"} value | +| Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="mean"} value | +| Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.5"} value value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.99"} value | + +## 修改配置文件 + +1) 以 DataNode 为例,修改 iotdb-system.properties 配置文件如下: + +```properties +dn_metric_reporter_list=PROMETHEUS +dn_metric_level=CORE +dn_metric_prometheus_reporter_port=9091 +``` + +2) 启动 IoTDB DataNode + +3) 打开浏览器或者用```curl``` 访问 ```http://servier_ip:9091/metrics```, 就能得到如下 metric 数据: + +``` +... +# HELP file_count +# TYPE file_count gauge +file_count{name="wal",} 0.0 +file_count{name="unseq",} 0.0 +file_count{name="seq",} 2.0 +... +``` + +## Prometheus + Grafana + +如上所示,IoTDB 对外暴露出标准的 Prometheus 格式的监控指标数据,可以使用 Prometheus 采集并存储监控指标,使用 Grafana +可视化监控指标。 + +IoTDB、Prometheus、Grafana三者的关系如下图所示: + +![iotdb_prometheus_grafana](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Metrics/iotdb_prometheus_grafana.png) + +1. IoTDB在运行过程中持续收集监控指标数据。 +2. Prometheus以固定的间隔(可配置)从IoTDB的HTTP接口拉取监控指标数据。 +3. Prometheus将拉取到的监控指标数据存储到自己的TSDB中。 +4. Grafana以固定的间隔(可配置)从Prometheus查询监控指标数据并绘图展示。 + +从交互流程可以看出,我们需要做一些额外的工作来部署和配置Prometheus和Grafana。 + +比如,你可以对Prometheus进行如下的配置(部分参数可以自行调整)来从IoTDB获取监控数据 + +```yaml +job_name: pull-metrics +honor_labels: true +honor_timestamps: true +scrape_interval: 15s +scrape_timeout: 10s +metrics_path: /metrics +scheme: http +follow_redirects: true +static_configs: + - targets: + - localhost:9091 +``` + +更多细节可以参考下面的文档: + +[Prometheus安装使用文档](https://prometheus.io/docs/prometheus/latest/getting_started/) + +[Prometheus从HTTP接口拉取metrics数据的配置说明](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) + +[Grafana安装使用文档](https://grafana.com/docs/grafana/latest/getting-started/getting-started/) + +[Grafana从Prometheus查询数据并绘图的文档](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) + +## Apache IoTDB Dashboard + +我们提供了Apache IoTDB Dashboard,支持统一集中式运维管理,可通过一个监控面板监控多个集群。 + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20default%20cluster.png) + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20cluster2.png) + +你可以在企业版中获取到 Dashboard 的 Json文件。 + +### 集群概览 + +可以监控包括但不限于: +- 集群总CPU核数、总内存空间、总硬盘空间 +- 集群包含多少个ConfigNode与DataNode +- 集群启动时长 +- 集群写入速度 +- 集群各节点当前CPU、内存、磁盘使用率 +- 分节点的信息 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%A6%82%E8%A7%88.png) + +### 数据写入 + +可以监控包括但不限于: +- 写入平均耗时、耗时中位数、99%分位耗时 +- WAL文件数量与尺寸 +- 节点 WAL flush SyncBuffer 耗时 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%86%99%E5%85%A5.png) + +### 数据查询 + +可以监控包括但不限于: +- 节点查询加载时间序列元数据耗时 +- 节点查询读取时间序列耗时 +- 节点查询修改时间序列元数据耗时 +- 节点查询加载Chunk元数据列表耗时 +- 节点查询修改Chunk元数据耗时 +- 节点查询按照Chunk元数据过滤耗时 +- 节点查询构造Chunk Reader耗时的平均值 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%9F%A5%E8%AF%A2.png) + +### 存储引擎 + +可以监控包括但不限于: +- 分类型的文件数量、大小 +- 处于各阶段的TsFile数量、大小 +- 各类任务的数量与耗时 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E.png) + +### 系统监控 + +可以监控包括但不限于: +- 系统内存、交换内存、进程内存 +- 磁盘空间、文件数、文件尺寸 +- JVM GC时间占比、分类型的GC次数、GC数据量、各年代的堆内存占用 +- 网络传输速率、包发送速率 + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E5%86%85%E5%AD%98%E4%B8%8E%E7%A1%AC%E7%9B%98.png) + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9Fjvm.png) + +![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E7%B3%BB%E7%BB%9F%20%E7%BD%91%E7%BB%9C.png) diff --git a/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Workbench_timecho.md 
b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Workbench_timecho.md new file mode 100644 index 00000000..0dc256d3 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/Tools-System/Workbench_timecho.md @@ -0,0 +1,31 @@ +# 可视化控制台 +## 第1章 产品介绍 +IoTDB可视化控制台是在IoTDB企业版时序数据库基础上针对工业场景的实时数据收集、存储与分析一体化的数据管理场景开发的扩展组件,旨在为用户提供高效、可靠的实时数据存储和查询解决方案。它具有体量轻、性能高、易使用的特点,完美对接 Hadoop 与 Spark 生态,适用于工业物联网应用中海量时间序列数据高速写入和复杂分析查询的需求。 + +## 第2章 使用说明 +IoTDB的可视化控制台包含以下功能模块: +| **功能模块** | **功能说明** | +| ------------ | ------------------------------------------------------------ | +| 实例管理 | 支持对连接实例进行统一管理,支持创建、编辑和删除,同时可以可视化呈现多实例的关系,帮助客户更清晰的管理多数据库实例 | +| 首页 | 支持查看数据库实例中各节点的服务运行状态(如是否激活、是否运行、IP信息等),支持查看集群、ConfigNode、DataNode运行监控状态,对数据库运行健康度进行监控,判断实例是否有潜在运行问题 | +| 测点列表 | 支持直接查看实例中的测点信息,包括所在数据库信息(如数据库名称、数据保存时间、设备数量等),及测点信息(测点名称、数据类型、压缩编码等),同时支持单条或批量创建、导出、删除测点 | +| 数据模型 | 支持查看各层级从属关系,将层级模型直观展示 | +| 数据查询 | 支持对常用数据查询场景提供界面式查询交互,并对查询数据进行批量导入、导出 | +| 统计查询 | 支持对常用数据统计场景提供界面式查询交互,如最大值、最小值、平均值、总和的结果输出。 | +| SQL操作 | 支持对数据库SQL进行界面式交互,单条或多条语句执行,结果的展示和导出 | +| 趋势 | 支持一键可视化查看数据整体趋势,对选中测点进行实时&历史数据绘制,观察测点实时&历史运行状态 | +| 分析 | 支持将数据通过不同的分析方式(如傅里叶变换等)进行可视化展示 | +| 视图 | 支持通过界面来查看视图名称、视图描述、结果测点以及表达式等信息,同时还可以通过界面交互快速的创建、编辑、删除视图 | +| 数据同步 | 支持对数据库间的数据同步任务进行直观创建、查看、管理,支持直接查看任务运行状态、同步数据和目标地址,还可以通过界面实时观察到同步状态的监控指标变化 | +| 权限管理 | 支持对权限进行界面管控,用于管理和控制数据库用户访问和操作数据库的权限 | +| 审计日志 | 支持对用户在数据库上的操作进行详细记录,包括DDL、DML和查询操作。帮助用户追踪和识别潜在的安全威胁、数据库错误和滥用行为 | + +主要功能展示: +* 首页 +![首页.png](https://alioss.timecho.com/docs/img/%E9%A6%96%E9%A1%B5.png) +* 测点列表 +![测点列表.png](https://alioss.timecho.com/docs/img/workbench-1.png) +* 数据查询 +![数据查询.png](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E6%9F%A5%E8%AF%A2.png) +* 趋势 +![历史趋势.png](https://alioss.timecho.com/docs/img/%E5%8E%86%E5%8F%B2%E8%B6%8B%E5%8A%BF.png) \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/AINode_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/AINode_timecho.md new file mode 100644 index 00000000..291d2a1e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/AINode_timecho.md @@ -0,0 +1,650 @@ + + +# AI能力(AINode) + +AINode 是 IoTDB 在ConfigNode、DataNode后提供的第三种内生节点,该节点通过与 IoTDB 集群的 DataNode、ConfigNode 的交互,扩展了对时间序列进行机器学习分析的能力,支持从外部引入已有机器学习模型进行注册,并使用注册的模型在指定时序数据上通过简单 SQL 语句完成时序分析任务的过程,将模型的创建、管理及推理融合在数据库引擎中。目前已提供常见时序分析场景(例如预测与异常检测)的机器学习算法或自研模型。 + +系统架构如下图所示: +::: center + +::: +三种节点的职责如下: + +- **ConfigNode**:负责保存和管理模型的元信息;负责分布式节点管理。 +- **DataNode**:负责接收并解析用户的 SQL请求;负责存储时间序列数据;负责数据的预处理计算。 +- **AINode**:负责模型文件的导入创建以及模型推理。 + +## 优势特点 + +与单独构建机器学习服务相比,具有以下优势: + +- **简单易用**:无需使用 Python 或 Java 编程,使用 SQL 语句即可完成机器学习模型管理与推理的完整流程。如创建模型可使用CREATE MODEL语句、使用模型进行推理可使用CALL INFERENCE(...)语句等,使用更加简单便捷。 + +- **避免数据迁移**:使用 IoTDB 原生机器学习可以将存储在 IoTDB 中的数据直接应用于机器学习模型的推理,无需将数据移动到单独的机器学习服务平台,从而加速数据处理、提高安全性并降低成本。 + +![](https://alioss.timecho.com/docs/img/h1.PNG) + +- **内置先进算法**:支持业内领先机器学习分析算法,覆盖典型时序分析任务,为时序数据库赋能原生数据分析能力。如: + - **时间序列预测(Time Series Forecasting)**:从过去时间序列中学习变化模式;从而根据给定过去时间的观测值,输出未来序列最可能的预测。 + - **时序异常检测(Anomaly Detection for Time Series)**:在给定的时间序列数据中检测和识别异常值,帮助发现时间序列中的异常行为。 + - **时间序列标注(Time Series Annotation)**:为每个数据点或特定时间段添加额外的信息或标记,例如事件发生、异常点、趋势变化等,以便更好地理解和分析数据。 + + +## 基本概念 + +- **模型(Model)**:机器学习模型,以时序数据作为输入,输出分析任务的结果或决策。模型是AINode 的基本管理单元,支持模型的增(注册)、删、查、用(推理)。 +- **创建(Create)**: 将外部设计或训练好的模型文件或算法加载到MLNode中,由IoTDB统一管理与使用。 +- **推理(Inference)**:使用创建的模型在指定时序数据上完成该模型适用的时序分析任务的过程。 +- **内置能力(Built-in)**:AINode 自带常见时序分析场景(例如预测与异常检测)的机器学习算法或自研模型。 + +::: center + +::: + +## 安装部署 + +AINode 的部署可参考文档 [部署指导](../Deployment-and-Maintenance/AINode_Deployment_timecho.md#AINode-部署) 章节。 + +## 
使用指导 + +AINode 对时序数据相关的深度学习模型提供了模型创建及删除的流程,内置模型无需创建及删除,可直接使用,并且在完成推理后创建的内置模型实例将自动销毁。 + +### 注册模型 + +通过指定模型输入输出的向量维度,可以注册训练好的深度学习模型,从而用于模型推理。 + +符合以下内容的模型可以注册到AINode中: + 1. AINode 支持的PyTorch 2.1.0、 2.2.0版本训练的模型,需避免使用2.2.0版本以上的特性。 + 2. AINode支持使用PyTorch JIT存储的模型,模型文件需要包含模型的参数和结构。 + 3. 模型输入序列可以包含一列或多列,若有多列,需要和模型能力、模型配置文件对应。 + 4. 模型的输入输出维度必须在`config.yaml`配置文件中明确定义。使用模型时,必须严格按照`config.yaml`配置文件中定义的输入输出维度。如果输入输出列数不匹配配置文件,将会导致错误。 + +下方为模型注册的SQL语法定义。 + +```SQL +create model using uri +``` + +SQL中参数的具体含义如下: + +- model_name:模型的全局唯一标识,不可重复。模型名称具备以下约束: + + - 允许出现标识符 [ 0-9 a-z A-Z _ ] (字母,数字,下划线) + - 长度限制为2-64字符 + - 大小写敏感 + +- uri:模型注册文件的资源路径,路径下应包含**模型权重model.pt文件和模型的元数据描述文件config.yaml** + + - 模型权重文件:深度学习模型训练完成后得到的权重文件,目前支持pytorch训练得到的.pt文件 + + - yaml元数据描述文件:模型注册时需要提供的与模型结构有关的参数,其中必须包含模型的输入输出维度用于模型推理: + + - | **参数名** | **参数描述** | **示例** | + | ------------ | ---------------------------- | -------- | + | input_shape | 模型输入的行列,用于模型推理 | [96,2] | + | output_shape | 模型输出的行列,用于模型推理 | [48,2] | + + - ​ 除了模型推理外,还可以指定模型输入输出的数据类型: + + - | **参数名** | **参数描述** | **示例** | + | ----------- | ------------------ | --------------------- | + | input_type | 模型输入的数据类型 | ['float32','float32'] | + | output_type | 模型输出的数据类型 | ['float32','float32'] | + + - ​ 除此之外,可以额外指定备注信息用于在模型管理时进行展示 + + - | **参数名** | **参数描述** | **示例** | + | ---------- | ---------------------------------------------- | ------------------------------------------- | + | attributes | 可选,用户自行设定的模型备注信息,用于模型展示 | 'model_type': 'dlinear','kernel_size': '25' | + + +除了本地模型文件的注册,还可以通过URI来指定远程资源路径来进行注册,使用开源的模型仓库(例如HuggingFace)。 + +#### 示例 + +在当前的example文件夹下,包含model.pt和config.yaml文件,model.pt为训练得到,config.yaml的内容如下: + +```YAML +configs: + # 必选项 + input_shape: [96, 2] # 表示模型接收的数据为96行x2列 + output_shape: [48, 2] # 表示模型输出的数据为48行x2列 + + # 可选项 默认为全部float32,列数为shape对应的列数 + input_type: ["int64","int64"] #输入对应的数据类型,需要与输入列数匹配 + output_type: ["text","int64"] #输出对应的数据类型,需要与输出列数匹配 + +attributes: # 可选项 为用户自定义的备注信息 + 'model_type': 'dlinear' + 'kernel_size': '25' +``` + +指定该文件夹作为加载路径就可以注册该模型 + +```SQL +IoTDB> create model dlinear_example using uri "file://./example" +``` + +也可以从huggingFace上下载对应的模型文件进行注册 + +```SQL +IoTDB> create model dlinear_example using uri "https://huggingface.com/IoTDBML/dlinear/" +``` + +SQL执行后会异步进行注册的流程,可以通过模型展示查看模型的注册状态(见模型展示章节),注册成功的耗时主要受到模型文件大小的影响。 + +模型注册完成后,就可以通过使用正常查询的方式调用具体函数,进行模型推理。 + +### 查看模型 + +注册成功的模型可以通过show models指令查询模型的具体信息。其SQL定义如下: + +```SQL +show models + +show models +``` + +除了直接展示所有模型的信息外,可以指定model id来查看某一具体模型的信息。模型展示的结果中包含如下信息: + +| **ModelId** | **State** | **Configs** | **Attributes** | +| ------------ | ------------------------------------- | ---------------------------------------------- | -------------- | +| 模型唯一标识 | 模型注册状态(LOADING,ACTIVE,DROPPING) | InputShape, outputShapeInputTypes, outputTypes | 模型备注信息 | + +其中,State用于展示当前模型注册的状态,包含以下三个阶段 + +- **LOADING**:已经在configNode中添加对应的模型元信息,正将模型文件传输到AINode节点上 +- **ACTIVE:** 模型已经设置完成,模型处于可用状态 +- **DROPPING**:模型删除中,正在从configNode以及AINode处删除模型相关信息 +- **UNAVAILABLE**: 模型创建失败,可以通过drop model删除创建失败的model_name。 + +#### 示例 + +```SQL +IoTDB> show models + + ++---------------------+--------------------------+-----------+----------------------------+-----------------------+ +| ModelId| ModelType| State| Configs| Notes| ++---------------------+--------------------------+-----------+----------------------------+-----------------------+ +| dlinear_example| USER_DEFINED| ACTIVE| inputShape:[96,2]| | +| | | | outputShape:[48,2]| | +| | | | inputDataType:[float,float]| | +| | | |outputDataType:[float,float]| | +| 
_STLForecaster| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| +| _NaiveForecaster| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| +| _ARIMA| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| +|_ExponentialSmoothing| BUILT_IN_FORECAST| ACTIVE| |Built-in model in IoTDB| +| _GaussianHMM|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| +| _GMMHMM|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| +| _Stray|BUILT_IN_ANOMALY_DETECTION| ACTIVE| |Built-in model in IoTDB| ++---------------------+--------------------------+-----------+------------------------------------------------------------+-----------------------+ +``` + +我们前面已经注册了对应的模型,可以通过对应的指定查看模型状态,active表明模型注册成功,可用于推理。 + +### 删除模型 + +对于注册成功的模型,用户可以通过SQL进行删除。该操作除了删除configNode上的元信息外,还会删除所有AINode下的相关模型文件。其SQL如下: + +```SQL +drop model +``` + +需要指定已经成功注册的模型model_name来删除对应的模型。由于模型删除涉及多个节点上的数据删除,操作不会立即完成,此时模型的状态为DROPPING,该状态的模型不能用于模型推理。 + +### 使用内置模型推理 + +SQL语法如下: + + +```SQL +call inference(,sql[,=]) +``` + +内置模型推理无需注册流程,通过call关键字,调用inference函数就可以使用模型的推理功能,其对应的参数介绍如下: + +- **built_in_model_name:** 内置模型名称 +- **parameterName**:参数名 +- **parameterValue**:参数值 + +#### 内置模型及参数说明 + +目前已内置如下机器学习模型,具体参数说明请参考以下链接。 + +| 模型 | built_in_model_name | 任务类型 | 参数说明 | +| -------------------- | --------------------- | -------- | ------------------------------------------------------------ | +| Arima | _Arima | 预测 | [Arima参数说明](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.arima.ARIMA.html?highlight=Arima) | +| STLForecaster | _STLForecaster | 预测 | [STLForecaster参数说明](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.trend.STLForecaster.html#sktime.forecasting.trend.STLForecaster) | +| NaiveForecaster | _NaiveForecaster | 预测 | [NaiveForecaster参数说明](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.naive.NaiveForecaster.html#naiveforecaster) | +| ExponentialSmoothing | _ExponentialSmoothing | 预测 | [ExponentialSmoothing参数说明](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.forecasting.exp_smoothing.ExponentialSmoothing.html) | +| GaussianHMM | _GaussianHMM | 标注 | [GaussianHMM参数说明](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.hmm_learn.gaussian.GaussianHMM.html) | +| GMMHMM | _GMMHMM | 标注 | [GMMHMM参数说明](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.hmm_learn.gmm.GMMHMM.html) | +| Stray | _Stray | 异常检测 | [Stray参数说明](https://www.sktime.net/en/latest/api_reference/auto_generated/sktime.annotation.stray.STRAY.html) | + +#### 示例 + +下面是使用内置模型推理的一个操作示例,使用内置的Stray模型进行异常检测算法,输入为`[144,1]`,输出为`[144,1]`,我们通过SQL使用其进行推理。 + +```SQL +IoTDB> select * from root.eg.airline ++-----------------------------+------------------+ +| Time|root.eg.airline.s0| ++-----------------------------+------------------+ +|1949-01-31T00:00:00.000+08:00| 224.0| +|1949-02-28T00:00:00.000+08:00| 118.0| +|1949-03-31T00:00:00.000+08:00| 132.0| +|1949-04-30T00:00:00.000+08:00| 129.0| +...... +|1960-09-30T00:00:00.000+08:00| 508.0| +|1960-10-31T00:00:00.000+08:00| 461.0| +|1960-11-30T00:00:00.000+08:00| 390.0| +|1960-12-31T00:00:00.000+08:00| 432.0| ++-----------------------------+------------------+ +Total line number = 144 + +IoTDB> call inference(_Stray, "select s0 from root.eg.airline", k=2) ++-------+ +|output0| ++-------+ +| 0| +| 0| +| 0| +| 0| +...... 
+| 1| +| 1| +| 0| +| 0| +| 0| +| 0| ++-------+ +Total line number = 144 +``` + +### 使用深度学习模型推理 + +SQL语法如下: + +```SQL +call inference(,sql[,window=]) + + +window_function: + head(window_size) + tail(window_size) + count(window_size,sliding_step) +``` + +在完成模型的注册后,通过call关键字,调用inference函数就可以使用模型的推理功能,其对应的参数介绍如下: + +- **model_name**: 对应一个已经注册的模型 +- **sql**:sql查询语句,查询的结果作为模型的输入进行模型推理。查询的结果中行列的维度需要与具体模型config中指定的大小相匹配。(这里的sql不建议使用`SELECT *`子句,因为在IoTDB中,`*`并不会对列进行排序,因此列的顺序是未定义的,可以使用`SELECT s0,s1`的方式确保列的顺序符合模型输入的预期) +- **window_function**: 推理过程中可以使用的窗口函数,目前提供三种类型的窗口函数用于辅助模型推理: + - **head(window_size)**: 获取数据中最前的window_size个点用于模型推理,该窗口可用于数据裁剪 + ![](https://alioss.timecho.com/docs/img/AINode-call1.png) + + - **tail(window_size)**:获取数据中最后的window_size个点用于模型推,该窗口可用于数据裁剪 + ![](https://alioss.timecho.com/docs/img/AINode-call2.png) + + - **count(window_size, sliding_step)**:基于点数的滑动窗口,每个窗口的数据会分别通过模型进行推理,如下图示例所示,window_size为2的窗口函数将输入数据集分为三个窗口,每个窗口分别进行推理运算生成结果。该窗口可用于连续推理 + ![](https://alioss.timecho.com/docs/img/AINode-call3.png) + +**说明1: window可以用来解决sql查询结果和模型的输入行数要求不一致时的问题,对行进行裁剪。需要注意的是,当列数不匹配或是行数直接少于模型需求时,推理无法进行,会返回错误信息。** + +**说明2: 在深度学习应用中,经常将时间戳衍生特征(数据中的时间列)作为生成式任务的协变量,一同输入到模型中以提升模型的效果,但是在模型的输出结果中一般不包含时间列。为了保证实现的通用性,模型推理结果只对应模型的真实输出,如果模型不输出时间列,则结果中不会包含。** + + +#### 示例 + +下面是使用深度学习模型推理的一个操作示例,针对上面提到的输入为`[96,2]`,输出为`[48,2]`的`dlinear`预测模型,我们通过SQL使用其进行推理。 + +```Shell +IoTDB> select s1,s2 from root.** ++-----------------------------+-------------------+-------------------+ +| Time| root.eg.etth.s0| root.eg.etth.s1| ++-----------------------------+-------------------+-------------------+ +|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| +|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| +|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| +|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| +|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| +|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| +|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| +...... +|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| +|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| +|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| +|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| +|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| +|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| +|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| ++-----------------------------+-------------------+-------------------+ +Total line number = 96 + +IoTDB> call inference(dlinear_example,"select s0,s1 from root.**") ++--------------------------------------------+-----------------------------+ +| _result_0| _result_1| ++--------------------------------------------+-----------------------------+ +| 0.726302981376648| 1.6549958229064941| +| 0.7354921698570251| 1.6482787370681763| +| 0.7238251566886902| 1.6278168201446533| +...... +| 0.7692174911499023| 1.654654049873352| +| 0.7685555815696716| 1.6625318765640259| +| 0.7856493592262268| 1.6508299350738525| ++--------------------------------------------+-----------------------------+ +Total line number = 48 +``` + +#### 使用tail/head窗口函数的示例 + +当数据量不定且想要取96行最新数据用于推理时,可以使用对应的窗口函数tail。head函数的用法与其类似,不同点在于其取的是最早的96个点。 + +```Shell +IoTDB> select s1,s2 from root.** ++-----------------------------+-------------------+-------------------+ +| Time| root.eg.etth.s0| root.eg.etth.s1| ++-----------------------------+-------------------+-------------------+ +|1988-01-01T00:00:00.000+08:00| 0.7355| 1.211| +...... 
+|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| +|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| +|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| +|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| +|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| +|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| +|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| +...... +|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| +|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| +|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| +|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| +|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| +|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| +|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| ++-----------------------------+-------------------+-------------------+ +Total line number = 996 + +IoTDB> call inference(dlinear_example,"select s0,s1 from root.**",window=tail(96)) ++--------------------------------------------+-----------------------------+ +| _result_0| _result_1| ++--------------------------------------------+-----------------------------+ +| 0.726302981376648| 1.6549958229064941| +| 0.7354921698570251| 1.6482787370681763| +| 0.7238251566886902| 1.6278168201446533| +...... +| 0.7692174911499023| 1.654654049873352| +| 0.7685555815696716| 1.6625318765640259| +| 0.7856493592262268| 1.6508299350738525| ++--------------------------------------------+-----------------------------+ +Total line number = 48 +``` + +#### 使用count窗口函数的示例 + +该窗口主要用于计算式任务,当任务对应的模型一次只能处理固定行数据而最终想要的确实多组预测结果时,使用该窗口函数可以使用点数滑动窗口进行连续推理。假设我们现在有一个异常检测模型anomaly_example(input: [24,2], output[1,1]),对每24行数据会生成一个0/1的标签,其使用示例如下: + +```Shell +IoTDB> select s1,s2 from root.** ++-----------------------------+-------------------+-------------------+ +| Time| root.eg.etth.s0| root.eg.etth.s1| ++-----------------------------+-------------------+-------------------+ +|1990-01-01T00:00:00.000+08:00| 0.7855| 1.611| +|1990-01-02T00:00:00.000+08:00| 0.7818| 1.61| +|1990-01-03T00:00:00.000+08:00| 0.7867| 1.6293| +|1990-01-04T00:00:00.000+08:00| 0.786| 1.637| +|1990-01-05T00:00:00.000+08:00| 0.7849| 1.653| +|1990-01-06T00:00:00.000+08:00| 0.7866| 1.6537| +|1990-01-07T00:00:00.000+08:00| 0.7886| 1.662| +...... 
+|1990-03-31T00:00:00.000+08:00| 0.7585| 1.678| +|1990-04-01T00:00:00.000+08:00| 0.7587| 1.6763| +|1990-04-02T00:00:00.000+08:00| 0.76| 1.6813| +|1990-04-03T00:00:00.000+08:00| 0.7669| 1.684| +|1990-04-04T00:00:00.000+08:00| 0.7645| 1.677| +|1990-04-05T00:00:00.000+08:00| 0.7625| 1.68| +|1990-04-06T00:00:00.000+08:00| 0.7617| 1.6917| ++-----------------------------+-------------------+-------------------+ +Total line number = 96 + +IoTDB> call inference(anomaly_example,"select s0,s1 from root.**",window=count(24,24)) ++-------------------------+ +| _result_0| ++-------------------------+ +| 0| +| 1| +| 1| +| 0| ++-------------------------+ +Total line number = 4 +``` + +其中结果集中每行的标签对应每24行数据为一组,输入该异常检测模型后的输出。 + +## 权限管理 + +使用AINode相关的功能时,可以使用IoTDB本身的鉴权去做一个权限管理,用户只有在具备 USE_MODEL 权限时,才可以使用模型管理的相关功能。当使用推理功能时,用户需要有访问输入模型的SQL对应的源序列的权限。 + +| 权限名称 | 权限范围 | 管理员用户(默认ROOT) | 普通用户 | 路径相关 | +| --------- | --------------------------------- | ---------------------- | -------- | -------- | +| USE_MODEL | create model / show models / drop model | √ | √ | x | +| READ_DATA | call inference | √ | √ | √ | + +## 实际案例 + +### 电力负载预测 + +在部分工业场景下,会存在预测电力负载的需求,预测结果可用于优化电力供应、节约能源和资源、支持规划和扩展以及增强电力系统的可靠性。 + +我们所使用的 ETTh1 的测试集的数据为[ETTh1](https://alioss.timecho.com/docs/img/ETTh1.csv)。 + + +包含间隔1h采集一次的电力数据,每条数据由负载和油温构成,分别为:High UseFul Load, High UseLess Load, Middle UseLess Load, Low UseFul Load, Low UseLess Load, Oil Temperature。 + +在该数据集上,IoTDB-ML的模型推理功能可以通过以往高中低三种负载的数值和对应时间戳油温的关系,预测未来一段时间内的油温,赋能电网变压器的自动调控和监视。 + +#### 步骤一:数据导入 + +用户可以使用tools文件夹中的`import-csv.sh` 向 IoTDB 中导入 ETT 数据集 + +```Bash +bash ./import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ../../ETTh1.csv +``` + +#### 步骤二:模型导入 + +我们可以在iotdb-cli 中输入以下SQL从 huggingface 上拉取一个已经训练好的模型进行注册,用于后续的推理。 + +```SQL +create model dlinear using uri 'https://huggingface.co/hvlgo/dlinear/tree/main' +``` + +该模型基于较为轻量化的深度模型DLinear训练而得,能够以相对快的推理速度尽可能多地捕捉到序列内部的变化趋势和变量间的数据变化关系,相较于其他更深的模型更适用于快速实时预测。 + +#### 步骤三:模型推理 + +```Shell +IoTDB> select s0,s1,s2,s3,s4,s5,s6 from root.eg.etth LIMIT 96 ++-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ +| Time|root.eg.etth.s0|root.eg.etth.s1|root.eg.etth.s2|root.eg.etth.s3|root.eg.etth.s4|root.eg.etth.s5|root.eg.etth.s6| ++-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ +|2017-10-20T00:00:00.000+08:00| 10.449| 3.885| 8.706| 2.025| 2.041| 0.944| 8.864| +|2017-10-20T01:00:00.000+08:00| 11.119| 3.952| 8.813| 2.31| 2.071| 1.005| 8.442| +|2017-10-20T02:00:00.000+08:00| 9.511| 2.88| 7.533| 1.564| 1.949| 0.883| 8.16| +|2017-10-20T03:00:00.000+08:00| 9.645| 2.21| 7.249| 1.066| 1.828| 0.914| 7.949| +...... 
+|2017-10-23T20:00:00.000+08:00| 8.105| 0.938| 4.371| -0.569| 3.533| 1.279| 9.708| +|2017-10-23T21:00:00.000+08:00| 7.167| 1.206| 4.087| -0.462| 3.107| 1.432| 8.723| +|2017-10-23T22:00:00.000+08:00| 7.1| 1.34| 4.015| -0.32| 2.772| 1.31| 8.864| +|2017-10-23T23:00:00.000+08:00| 9.176| 2.746| 7.107| 1.635| 2.65| 1.097| 9.004| ++-----------------------------+---------------+---------------+---------------+---------------+---------------+---------------+---------------+ +Total line number = 96 + +IoTDB> call inference(dlinear_example, "select s0,s1,s2,s3,s4,s5,s6 from root.eg.etth", window=head(96)) ++-----------+----------+----------+------------+---------+----------+----------+ +| output0| output1| output2| output3| output4| output5| output6| ++-----------+----------+----------+------------+---------+----------+----------+ +| 10.319546| 3.1450553| 7.877341| 1.5723765|2.7303758| 1.1362307| 8.867775| +| 10.443649| 3.3286757| 7.8593454| 1.7675098| 2.560634| 1.1177158| 8.920919| +| 10.883752| 3.2341104| 8.47036| 1.6116762|2.4874182| 1.1760603| 8.798939| +...... +| 8.0115595| 1.2995274| 6.9900327|-0.098746896| 3.04923| 1.176214| 9.548782| +| 8.612427| 2.5036244| 5.6790237| 0.66474205|2.8870275| 1.2051733| 9.330128| +| 10.096699| 3.399722| 6.9909| 1.7478468|2.7642853| 1.1119363| 9.541455| ++-----------+----------+----------+------------+---------+----------+----------+ +Total line number = 48 +``` + +我们将对油温的预测的结果和真实结果进行对比,可以得到以下的图像。 + +图中10/24 00:00之前的数据为输入模型的过去数据,10/24 00:00后的蓝色线条为模型给出的油温预测结果,而红色为数据集中实际的油温数据(用于进行对比)。 + +![](https://alioss.timecho.com/docs/img/AINode-analysis1.png) + +可以看到,我们使用了过去96个小时(4天)的六个负载信息和对应时间油温的关系,基于之前学习到的序列间相互关系对未来48个小时(2天)的油温这一数据的可能变化进行了建模,可以看到可视化后预测曲线与实际结果在趋势上保持了较高程度的一致性。 + +### 功率预测 + +变电站需要对电流、电压、功率等数据进行电力监控,用于检测潜在的电网问题、识别电力系统中的故障、有效管理电网负载以及分析电力系统的性能和趋势等。 + +我们利用某变电站中的电流、电压和功率等数据构成了真实场景下的数据集。该数据集包括变电站近四个月时间跨度,每5 - 6s 采集一次的 A相电压、B相电压、C相电压等数据。 + +测试集数据内容为[data](https://alioss.timecho.com/docs/img/data.csv)。 + +在该数据集上,IoTDB-ML的模型推理功能可以通过以往A相电压,B相电压和C相电压的数值和对应时间戳,预测未来一段时间内的C相电压,赋能变电站的监视管理。 + +#### 步骤一:数据导入 + +用户可以使用tools文件夹中的`import-csv.sh` 导入数据集 + +```Bash +bash ./import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ../../data.csv +``` + +#### 步骤二:模型导入 + +我们可以在iotdb-cli 中选择内置模型或已经注册好的模型用于后续的推理。 + +我们采用内置模型STLForecaster进行预测,STLForecaster 是一个基于 statsmodels 库中 STL 实现的时间序列预测方法。 + +#### 步骤三:模型推理 + +```Shell +IoTDB> select * from root.eg.voltage limit 96 ++-----------------------------+------------------+------------------+------------------+ +| Time|root.eg.voltage.s0|root.eg.voltage.s1|root.eg.voltage.s2| ++-----------------------------+------------------+------------------+------------------+ +|2023-02-14T20:38:32.000+08:00| 2038.0| 2028.0| 2041.0| +|2023-02-14T20:38:38.000+08:00| 2014.0| 2005.0| 2018.0| +|2023-02-14T20:38:44.000+08:00| 2014.0| 2005.0| 2018.0| +...... +|2023-02-14T20:47:52.000+08:00| 2024.0| 2016.0| 2027.0| +|2023-02-14T20:47:57.000+08:00| 2024.0| 2016.0| 2027.0| +|2023-02-14T20:48:03.000+08:00| 2024.0| 2016.0| 2027.0| ++-----------------------------+------------------+------------------+------------------+ +Total line number = 96 + +IoTDB> call inference(_STLForecaster, "select s0,s1,s2 from root.eg.voltage", window=head(96),predict_length=48) ++---------+---------+---------+ +| output0| output1| output2| ++---------+---------+---------+ +|2026.3601|2018.2953|2029.4257| +|2019.1538|2011.4361|2022.0888| +|2025.5074|2017.4522|2028.5199| +...... 
+ +|2022.2336|2015.0290|2025.1023| +|2015.7241|2008.8975|2018.5085| +|2022.0777|2014.9136|2024.9396| +|2015.5682|2008.7821|2018.3458| ++---------+---------+---------+ +Total line number = 48 +``` +我们将对C相电压的预测的结果和真实结果进行对比,可以得到以下的图像。 + +图中 02/14 20:48 之前的数据为输入模型的过去数据, 02/14 20:48 后的蓝色线条为模型给出的C相电压预测结果,而红色为数据集中实际的C相电压数据(用于进行对比)。 + +![](https://alioss.timecho.com/docs/img/AINode-analysis2.png) + +可以看到,我们使用了过去10分钟的电压的数据,基于之前学习到的序列间相互关系对未来5分钟的C相电压这一数据的可能变化进行了建模,可以看到可视化后预测曲线与实际结果在趋势上保持了一定的同步性。 + +### 异常检测 + +在民航交通运输业,存在着对乘机旅客数量进行异常检测的需求。异常检测的结果可用于指导调整航班的调度,以使得企业获得更大效益。 + +Airline Passengers一个时间序列数据集,该数据集记录了1949年至1960年期间国际航空乘客数量,间隔一个月进行一次采样。该数据集共含一条时间序列。数据集为[airline](https://alioss.timecho.com/docs/img/airline.csv)。 +在该数据集上,IoTDB-ML的模型推理功能可以通过捕捉序列的变化规律以对序列时间点进行异常检测,赋能交通运输业。 + +#### 步骤一:数据导入 + +用户可以使用tools文件夹中的`import-csv.sh` 导入数据集 + +```Bash +bash ./import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f ../../data.csv +``` + +#### 步骤二:模型推理 + +IoTDB内置有部分可以直接使用的机器学习算法,使用其中的异常检测算法进行预测的样例如下: + +```Shell +IoTDB> select * from root.eg.airline ++-----------------------------+------------------+ +| Time|root.eg.airline.s0| ++-----------------------------+------------------+ +|1949-01-31T00:00:00.000+08:00| 224.0| +|1949-02-28T00:00:00.000+08:00| 118.0| +|1949-03-31T00:00:00.000+08:00| 132.0| +|1949-04-30T00:00:00.000+08:00| 129.0| +...... +|1960-09-30T00:00:00.000+08:00| 508.0| +|1960-10-31T00:00:00.000+08:00| 461.0| +|1960-11-30T00:00:00.000+08:00| 390.0| +|1960-12-31T00:00:00.000+08:00| 432.0| ++-----------------------------+------------------+ +Total line number = 144 + +IoTDB> call inference(_Stray, "select s0 from root.eg.airline", k=2) ++-------+ +|output0| ++-------+ +| 0| +| 0| +| 0| +| 0| +...... +| 1| +| 1| +| 0| +| 0| +| 0| +| 0| ++-------+ +Total line number = 144 +``` + +我们将检测为异常的结果进行绘制,可以得到以下图像。其中蓝色曲线为原时间序列,用红色点特殊标注的时间点为算法检测为异常的时间点。 + +![](https://alioss.timecho.com/docs/img/s6.png) + +可以看到,Stray模型对输入序列变化进行了建模,成功检测出出现异常的时间点。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Audit-Log_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Audit-Log_timecho.md new file mode 100644 index 00000000..0165a99d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Audit-Log_timecho.md @@ -0,0 +1,108 @@ + + + +# 审计日志 + +## 功能背景 + + 审计日志是数据库的记录凭证,通过审计日志功能可以查询到用户在数据库中增删改查等各项操作,以保证信息安全。关于IoTDB的审计日志功能可以实现以下场景的需求: + +- 可以按链接来源(是否人为操作)决定是否记录审计日志,如:非人为操作如硬件采集器写入的数据不需要记录审计日志,人为操作如普通用户通过cli、workbench等工具操作的数据需要记录审计日志。 +- 过滤掉系统级别的写入操作,如IoTDB监控体系本身记录的写入操作等。 + + + +### 场景说明 + + + +#### 对所有用户的所有操作(增、删、改、查)进行记录 + +通过审计日志功能追踪到所有用户在数据中的各项操作。其中所记录的信息要包含数据操作(新增、删除、查询)及元数据操作(新增、修改、删除、查询)、客户端登录信息(用户名、ip地址)。 + + + +客户端的来源 + +- Cli、workbench、Zeppelin、Grafana、通过 Session/JDBC/MQTT 等协议传入的请求 + +![审计日志](https://alioss.timecho.com/docs/img/%E5%AE%A1%E8%AE%A1%E6%97%A5%E5%BF%97.PNG) + + +#### 可关闭部分用户连接的审计日志 + + + +如非人为操作,硬件采集器通过 Session/JDBC/MQTT 写入的数据不需要记录审计日志 + + + +## 功能定义 + + + +通过配置可以实现: + +- 决定是否开启审计功能 +- 决定审计日志的输出位置,支持输出至一项或多项 + 1. 日志文件 + 2. IoTDB存储 +- 决定是否屏蔽原生接口的写入,防止记录审计日志过多影响性能 +- 决定审计日志内容类别,支持记录一项或多项 + 1. 数据的新增、删除操作 + 2. 数据和元数据的查询操作 + 3. 元数据类的新增、修改、删除操作 + +### 配置项 + + 在iotdb-system.properties中修改以下几项配置 + +```YAML +#################### +### Audit log Configuration +#################### + +# whether to enable the audit log. 
+# Datatype: Boolean +# enable_audit_log=false + +# Output location of audit logs +# Datatype: String +# IOTDB: the stored time series is: root.__system.audit._{user} +# LOGGER: log_audit.log in the log directory +# audit_log_storage=IOTDB,LOGGER + +# whether enable audit log for DML operation of data +# whether enable audit log for DDL operation of schema +# whether enable audit log for QUERY operation of data and schema +# Datatype: String +# audit_log_operation=DML,DDL,QUERY + +# whether the local write api records audit logs +# Datatype: Boolean +# This contains Session insert api: insertRecord(s), insertTablet(s),insertRecordsOfOneDevice +# MQTT insert api +# RestAPI insert api +# This parameter will cover the DML in audit_log_operation +# enable_audit_log_for_native_insert_api=true +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Authority-Management.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Authority-Management.md new file mode 100644 index 00000000..814762e8 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Authority-Management.md @@ -0,0 +1,510 @@ + + +# 权限管理 + +IoTDB 为用户提供了权限管理操作,为用户提供对数据与集群系统的权限管理功能,保障数据与系统安全。 +本篇介绍IoTDB 中权限模块的基本概念、用户定义、权限管理、鉴权逻辑与功能用例。在 JAVA 编程环境中,您可以使用 [JDBC API](../API/Programming-JDBC.md) 单条或批量执行权限管理类语句。 + +## 基本概念 + +### 用户 + +用户即数据库的合法使用者。一个用户与一个唯一的用户名相对应,并且拥有密码作为身份验证的手段。一个人在使用数据库之前,必须先提供合法的(即存于数据库中的)用户名与密码,作为用户成功登录。 + +### 权限 + +数据库提供多种操作,但并非所有的用户都能执行所有操作。如果一个用户可以执行某项操作,则称该用户有执行该操作的权限。权限通常需要一个路径来限定其生效范围,可以使用[路径模式](../Basic-Concept/Data-Model-and-Terminology.md)灵活管理权限。 + +### 角色 + +角色是若干权限的集合,并且有一个唯一的角色名作为标识符。角色通常和一个现实身份相对应(例如交通调度员),而一个现实身份可能对应着多个用户。这些具有相同现实身份的用户往往具有相同的一些权限,角色就是为了能对这样的权限进行统一的管理的抽象。 + +### 默认用户与角色 + +安装初始化后的 IoTDB 中有一个默认用户:root,默认密码为 root。该用户为管理员用户,固定拥有所有权限,无法被赋予、撤销权限,也无法被删除,数据库内仅有一个管理员用户。 + +一个新创建的用户或角色不具备任何权限。 + +## 用户定义 + +拥有 MANAGE_USER、MANAGE_ROLE 的用户或者管理员可以创建用户或者角色,需要满足以下约束: + +### 用户名限制 + +4~32个字符,支持使用英文大小写字母、数字、特殊字符(`!@#$%^&*()_+-=`) + +用户无法创建和管理员用户同名的用户。 + +### 密码限制 + +4~32个字符,可使用大写小写字母、数字、特殊字符(`!@#$%^&*()_+-=`),密码默认采用 MD5 进行加密。 + +### 角色名限制 + +4~32个字符,支持使用英文大小写字母、数字、特殊字符(`!@#$%^&*()_+-=`) + +用户无法创建和管理员用户同名的角色。 + +## 权限管理 + +IoTDB 主要有两类权限:序列权限、全局权限。 + +### 序列权限 + +序列权限约束了用户访问数据的范围与方式,支持对绝对路径与前缀匹配路径授权,可对timeseries 粒度生效。 + +下表描述了这类权限的种类与范围: + +| 权限名称 | 描述 | +|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| READ_DATA | 允许读取授权路径下的序列数据。 | +| WRITE_DATA | 允许读取授权路径下的序列数据。
允许插入、删除授权路径下的序列数据。
允许在授权路径下导入、加载数据,在导入数据时,需要拥有对应路径的 WRITE_DATA 权限,在自动创建数据库与序列时,需要有 MANAGE_DATABASE 与 WRITE_SCHEMA 权限。 | +| READ_SCHEMA | 允许获取授权路径下元数据树的详细信息:
包括:路径下的数据库、子路径、子节点、设备、序列、模版、视图等。 | +| WRITE_SCHEMA | 允许获取授权路径下元数据树的详细信息。
允许在授权路径下对序列、模版、视图等进行创建、删除、修改操作。
在创建或修改 view 的时候,会检查 view 路径的 WRITE_SCHEMA 权限、数据源的 READ_SCHEMA 权限。
在对 view 进行查询、插入时,会检查 view 路径的 READ_DATA 权限、WRITE_DATA 权限。
允许在授权路径下设置、取消、查看TTL。
允许在授权路径下挂载或者解除挂载模板。 |
+
+### 全局权限
+
+全局权限约束了用户使用的数据库功能、限制了用户执行改变系统状态与任务状态的命令,用户获得全局授权后,可对数据库进行管理。
+
+下表描述了系统权限的种类:
+
+| 权限名称 | 描述 |
+|:---------------:|:------------------------------------------------------------------|
+| MANAGE_DATABASE | - 允许用户创建、删除数据库。 |
+| MANAGE_USER | - 允许用户创建、删除、修改、查看用户。 |
+| MANAGE_ROLE | - 允许用户创建、删除、查看角色。
允许用户将角色授予给其他用户,或取消其他用户的角色。 | +| USE_TRIGGER | - 允许用户创建、删除、查看触发器。
与触发器的数据源权限检查相独立。 | +| USE_UDF | - 允许用户创建、删除、查看用户自定义函数。
与自定义函数的数据源权限检查相独立。 | +| USE_CQ | - 允许用户创建、开始、停止、删除、查看管道。
允许用户创建、删除、查看管道插件。
与管道的数据源权限检查相独立。 | +| USE_PIPE | - 允许用户注册、开始、停止、卸载、查询流处理任务。
- 允许用户注册、卸载、查询流处理任务插件。 |
+| EXTEND_TEMPLATE | - 允许自动扩展模板。 |
+| MAINTAIN | - 允许用户查询、取消查询。
允许用户查看变量。
允许用户查看集群状态。 | +| USE_MODEL | - 允许用户创建、删除、查询深度学习模型 | + +关于模板权限: + +1. 模板的创建、删除、修改、查询、挂载、卸载仅允许管理员操作。 +2. 激活模板需要拥有激活路径的 WRITE_SCHEMA 权限 +3. 若开启了自动创建,在向挂载了模板的不存在路径写入时,数据库会自动扩展模板并写入数据,因此需要有 EXTEND_TEMPLATE 权限与写入序列的 WRITE_DATA 权限。 +4. 解除模板,需要拥有挂载模板路径的 WRITE_SCHEMA 权限。 +5. 查询使用了某个元数据模板的路径,需要有路径的 READ_SCHEMA 权限,否则将返回为空。 + +### 权限授予与取消 + +在 IoTDB 中,用户可以由三种途径获得权限: + +1. 由超级管理员授予,超级管理员可以控制其他用户的权限。 +2. 由允许权限授权的用户授予,该用户获得权限时被指定了 grant option 关键字。 +3. 由超级管理员或者有 MANAGE_ROLE 的用户授予某个角色进而获取权限。 + +取消用户的权限,可以由以下几种途径: + +1. 由超级管理员取消用户的权限。 +2. 由允许权限授权的用户取消权限,该用户获得权限时被指定了 grant option 关键字。 +3. 由超级管理员或者MANAGE_ROLE 的用户取消用户的某个角色进而取消权限。 + +- 在授权时,必须指定路径。全局权限需要指定为 root.**, 而序列相关权限必须为绝对路径或者以双通配符结尾的前缀路径。 +- 当授予角色权限时,可以为该权限指定 with grant option 关键字,意味着用户可以转授其授权路径上的权限,也可以取消其他用户的授权路径上的权限。例如用户 A 在被授予`集团1.公司1.**`的读权限时制定了 grant option 关键字,那么 A 可以将`集团1.公司1`以下的任意节点、序列的读权限转授给他人, 同样也可以取消其他用户 `集团1.公司1` 下任意节点的读权限。 +- 在取消授权时,取消授权语句会与用户所有的权限路径进行匹配,将匹配到的权限路径进行清理,例如用户A 具有 `集团1.公司1.工厂1 `的读权限, 在取消 `集团1.公司1.** `的读权限时,会清除用户A 的 `集团1.公司1.工厂1` 的读权限。 + + + +## 鉴权 + +用户权限主要由三部分组成:权限生效范围(路径), 权限类型, with grant option 标记: + +``` +userTest1 : + root.t1.** - read_schema, read_data - with grant option + root.** - write_schema, write_data - with grant option +``` + +每个用户都有一个这样的权限访问列表,标识他们获得的所有权限,可以通过 `LIST PRIVILEGES OF USER ` 查看他们的权限。 + +在对一个路径进行鉴权时,数据库会进行路径与权限的匹配。例如检查 `root.t1.t2` 的 read_schema 权限时,首先会与权限访问列表的 `root.t1.**`进行匹配,匹配成功,则检查该路径是否包含待鉴权的权限,否则继续下一条路径-权限的匹配,直到匹配成功或者匹配结束。 + +在进行多路径鉴权时,对于多路径查询任务,数据库只会将有权限的数据呈现出来,无权限的数据不会包含在结果中;对于多路径写入任务,数据库要求必须所有的目标序列都获得了对应的权限,才能进行写入。 + +请注意,下面的操作需要检查多重权限 +1. 开启了自动创建序列功能,在用户将数据插入到不存在的序列中时,不仅需要对应序列的写入权限,还需要序列的元数据修改权限。 +2. 执行 select into 语句时,需要检查源序列的读权限与目标序列的写权限。需要注意的是源序列数据可能因为权限不足而仅能获取部分数据,目标序列写入权限不足时会报错终止任务。 +3. View 权限与数据源的权限是独立的,向 view 执行读写操作仅会检查 view 的权限,而不再对源路径进行权限校验。 + + + +## 功能语法与示例 + +IoTDB 提供了组合权限,方便用户授权: + +| 权限名称 | 权限范围 | +|-------|-------------------------| +| ALL | 所有权限 | +| READ | READ_SCHEMA、READ_DATA | +| WRITE | WRITE_SCHEMA、WRITE_DATA | + +组合权限并不是一种具体的权限,而是一种简写方式,与直接书写对应的权限名称没有差异。 + +下面将通过一系列具体的用例展示权限语句的用法,非管理员执行下列语句需要提前获取权限,所需的权限标记在操作描述后。 + +### 用户与角色相关 + +- 创建用户(需 MANAGE_USER 权限) + + +```SQL +CREATE USER +eg: CREATE USER user1 'passwd' +``` + +- 删除用户 (需 MANEGE_USER 权限) + + +```SQL +DROP USER +eg: DROP USER user1 +``` + +- 创建角色 (需 MANAGE_ROLE 权限) + +```SQL +CREATE ROLE +eg: CREATE ROLE role1 +``` + +- 删除角色 (需 MANAGE_ROLE 权限) + + +```SQL +DROP ROLE +eg: DROP ROLE role1 +``` + +- 赋予用户角色 (需 MANAGE_ROLE 权限) + + +```SQL +GRANT ROLE TO +eg: GRANT ROLE admin TO user1 +``` + +- 移除用户角色 (需 MANAGE_ROLE 权限) + + +```SQL +REVOKE ROLE FROM +eg: REVOKE ROLE admin FROM user1 +``` + +- 列出所有用户 (需 MANEGE_USER 权限) + +```SQL +LIST USER +``` + +- 列出所有角色 (需 MANAGE_ROLE 权限) + +```SQL +LIST ROLE +``` + +- 列出指定角色下所有用户 (需 MANEGE_USER 权限) + +```SQL +LIST USER OF ROLE +eg: LIST USER OF ROLE roleuser +``` + +- 列出指定用户下所有角色 + +用户可以列出自己的角色,但列出其他用户的角色需要拥有 MANAGE_ROLE 权限。 + +```SQL +LIST ROLE OF USER +eg: LIST ROLE OF USER tempuser +``` + +- 列出用户所有权限 + +用户可以列出自己的权限信息,但列出其他用户的权限需要拥有 MANAGE_USER 权限。 + +```SQL +LIST PRIVILEGES OF USER ; +eg: LIST PRIVILEGES OF USER tempuser; + +``` + +- 列出角色所有权限 + +用户可以列出自己具有的角色的权限信息,列出其他角色的权限需要有 MANAGE_ROLE 权限。 + +```SQL +LIST PRIVILEGES OF ROLE ; +eg: LIST PRIVILEGES OF ROLE actor; +``` + +- 更新密码 + +用户可以更新自己的密码,但更新其他用户密码需要具备MANAGE_USER 权限。 + +```SQL +ALTER USER SET PASSWORD ; +eg: ALTER USER tempuser SET PASSWORD 'newpwd'; +``` + +### 授权与取消授权 + +用户使用授权语句对赋予其他用户权限,语法如下: + +```SQL +GRANT ON TO ROLE/USER [WITH GRANT OPTION]; +eg: GRANT READ ON root.** TO ROLE role1; +eg: GRANT READ_DATA, WRITE_DATA 
ON root.t1.** TO USER user1; +eg: GRANT READ_DATA, WRITE_DATA ON root.t1.**,root.t2.** TO USER user1; +eg: GRANT MANAGE_ROLE ON root.** TO USER user1 WITH GRANT OPTION; +eg: GRANT ALL ON root.** TO USER user1 WITH GRANT OPTION; +``` + +用户使用取消授权语句可以将其他的权限取消,语法如下: + +```SQL +REVOKE ON FROM ROLE/USER ; +eg: REVOKE READ ON root.** FROM ROLE role1; +eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.** FROM USER user1; +eg: REVOKE READ_DATA, WRITE_DATA ON root.t1.**, root.t2.** FROM USER user1; +eg: REVOKE MANAGE_ROLE ON root.** FROM USER user1; +eg: REVOKE ALL ON ROOT.** FROM USER user1; +``` + +- **非管理员用户执行授权/取消授权语句时,需要对\ 有\ 权限,并且该权限是被标记带有 WITH GRANT OPTION 的。** + +- 在授予取消全局权限时,或者语句中包含全局权限时(ALL 展开会包含全局权限),须指定 path 为 root.**。 例如,以下授权/取消授权语句是合法的: + + ```SQL + GRANT MANAGE_USER ON root.** TO USER user1; + GRANT MANAGE_ROLE ON root.** TO ROLE role1 WITH GRANT OPTION; + GRANT ALL ON root.** TO role role1 WITH GRANT OPTION; + REVOKE MANAGE_USER ON root.** FROM USER user1; + REVOKE MANAGE_ROLE ON root.** FROM ROLE role1; + REVOKE ALL ON root.** FROM ROLE role1; + ``` + 下面的语句是非法的: + + ```SQL + GRANT READ, MANAGE_ROLE ON root.t1.** TO USER user1; + GRANT ALL ON root.t1.t2 TO USER user1 WITH GRANT OPTION; + REVOKE ALL ON root.t1.t2 FROM USER user1; + REVOKE READ, MANAGE_ROLE ON root.t1.t2 FROM ROLE ROLE1; + ``` + +- \ 必须为全路径或者以双通配符结尾的匹配路径,以下路径是合法的: + + ```SQL + root.** + root.t1.t2.** + root.t1.t2.t3 + ``` + + 以下的路径是非法的: + + ```SQL + root.t1.* + root.t1.**.t2 + root.t1*.t2.t3 + ``` + +## 示例 + +根据本文中描述的 [样例数据](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt) 内容,IoTDB 的样例数据可能同时属于 ln, sgcc 等不同发电集团,不同的发电集团不希望其他发电集团获取自己的数据库数据,因此我们需要将不同的数据在集团层进行权限隔离。 + +### 创建用户 + +使用 `CREATE USER ` 创建用户。例如,我们可以使用具有所有权限的root用户为 ln 和 sgcc 集团创建两个用户角色,名为 ln_write_user, sgcc_write_user,密码均为 write_pwd。建议使用反引号(`)包裹用户名。SQL 语句为: + +```SQL +CREATE USER `ln_write_user` 'write_pwd' +CREATE USER `sgcc_write_user` 'write_pwd' +``` +此时使用展示用户的 SQL 语句: + +```SQL +LIST USER +``` + +我们可以看到这两个已经被创建的用户,结果如下: + +```SQL +IoTDB> CREATE USER `ln_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> LIST USER; ++---------------+ +| user| ++---------------+ +| ln_write_user| +| root| +|sgcc_write_user| ++---------------+ +Total line number = 3 +It costs 0.012s +``` + +### 赋予用户权限 + +此时,虽然两个用户已经创建,但是他们不具有任何权限,因此他们并不能对数据库进行操作,例如我们使用 ln_write_user 用户对数据库中的数据进行写入,SQL 语句为: + +```SQL +INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +``` + +此时,系统不允许用户进行此操作,会提示错误: + +```SQL +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +Msg: 803: No permissions for this operation, please add privilege WRITE_DATA on [root.ln.wf01.wt01.status] +``` + +现在,我们用 root 用户分别赋予他们向对应路径的写入权限. + +我们使用 `GRANT ON TO USER ` 语句赋予用户权限,例如: +```SQL +GRANT WRITE_DATA ON root.ln.** TO USER `ln_write_user` +GRANT WRITE_DATA ON root.sgcc1.**, root.sgcc2.** TO USER `sgcc_write_user` +``` + +执行状态如下所示: + +```SQL +IoTDB> GRANT WRITE_DATA ON root.ln.** TO USER `ln_write_user` +Msg: The statement is executed successfully. +IoTDB> GRANT WRITE_DATA ON root.sgcc1.**, root.sgcc2.** TO USER `sgcc_write_user` +Msg: The statement is executed successfully. +``` + +接着使用ln_write_user再尝试写入数据 + +```SQL +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: The statement is executed successfully. 
+``` + +### 撤销用户权限 +授予用户权限后,我们可以使用 `REVOKE ON FROM USER `来撤销已经授予用户的权限。例如,用root用户撤销ln_write_user和sgcc_write_user的权限: + +``` SQL +REVOKE WRITE_DATA ON root.ln.** FROM USER `ln_write_user` +REVOKE WRITE_DATA ON root.sgcc1.**, root.sgcc2.** FROM USER `sgcc_write_user` +``` + +执行状态如下所示: +``` SQL +IoTDB> REVOKE WRITE_DATA ON root.ln.** FROM USER `ln_write_user` +Msg: The statement is executed successfully. +IoTDB> REVOKE WRITE_DATA ON root.sgcc1.**, root.sgcc2.** FROM USER `sgcc_write_user` +Msg: The statement is executed successfully. +``` + +撤销权限后,ln_write_user就没有向root.ln.**写入数据的权限了。 + +``` SQL +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: 803: No permissions for this operation, please add privilege WRITE_DATA on [root.ln.wf01.wt01.status] +``` + +## 其他说明 + +角色是权限的集合,而权限和角色都是用户的一种属性。即一个角色可以拥有若干权限。一个用户可以拥有若干角色与权限(称为用户自身权限)。 + +目前在 IoTDB 中并不存在相互冲突的权限,因此一个用户真正具有的权限是用户自身权限与其所有的角色的权限的并集。即要判定用户是否能执行某一项操作,就要看用户自身权限或用户的角色的所有权限中是否有一条允许了该操作。用户自身权限与其角色权限,他的多个角色的权限之间可能存在相同的权限,但这并不会产生影响。 + +需要注意的是:如果一个用户自身有某种权限(对应操作 A),而他的某个角色有相同的权限。那么如果仅从该用户撤销该权限无法达到禁止该用户执行操作 A 的目的,还需要从这个角色中也撤销对应的权限,或者从这个用户将该角色撤销。同样,如果仅从上述角色将权限撤销,也不能禁止该用户执行操作 A。 + +同时,对角色的修改会立即反映到所有拥有该角色的用户上,例如对角色增加某种权限将立即使所有拥有该角色的用户都拥有对应权限,删除某种权限也将使对应用户失去该权限(除非用户本身有该权限)。 + +## 升级说明 + +在 1.3 版本前,权限类型较多,在这一版实现中,权限类型做了精简,并且添加了对权限路径的约束。 + +数据库 1.3 版本的权限路径必须为全路径或者以双通配符结尾的匹配路径,在系统升级时,会自动转换不合法的权限路径和权限类型。 +路径上首个非法节点会被替换为`**`, 不在支持的权限类型也会映射到当前系统支持的权限上。 + +例如: + +| 权限类型 | 权限路径 | 映射之后的权限类型 | 权限路径 | +| ----------------- | --------------- |-----------------| ------------- | +| CREATE_DATBASE | root.db.t1.* | MANAGE_DATABASE | root.** | +| INSERT_TIMESERIES | root.db.t2.*.t3 | WRITE_DATA | root.db.t2.** | +| CREATE_TIMESERIES | root.db.t2*c.t3 | WRITE_SCHEMA | root.db.** | +| LIST_ROLE | root.** | (忽略) | | + + +新旧版本的权限类型对照可以参照下面的表格(--IGNORE 表示新版本忽略该权限): + +| 权限名称 | 是否路径相关 | 新权限名称 | 是否路径相关 | +|---------------------------|--------|-----------------|--------| +| CREATE_DATABASE | 是 | MANAGE_DATABASE | 否 | +| INSERT_TIMESERIES | 是 | WRITE_DATA | 是 | +| UPDATE_TIMESERIES | 是 | WRITE_DATA | 是 | +| READ_TIMESERIES | 是 | READ_DATA | 是 | +| CREATE_TIMESERIES | 是 | WRITE_SCHEMA | 是 | +| DELETE_TIMESERIES | 是 | WRITE_SCHEMA | 是 | +| CREATE_USER | 否 | MANAGE_USER | 否 | +| DELETE_USER | 否 | MANAGE_USER | 否 | +| MODIFY_PASSWORD | 否 | -- IGNORE | | +| LIST_USER | 否 | -- IGNORE | | +| GRANT_USER_PRIVILEGE | 否 | -- IGNORE | | +| REVOKE_USER_PRIVILEGE | 否 | -- IGNORE | | +| GRANT_USER_ROLE | 否 | MANAGE_ROLE | 否 | +| REVOKE_USER_ROLE | 否 | MANAGE_ROLE | 否 | +| CREATE_ROLE | 否 | MANAGE_ROLE | 否 | +| DELETE_ROLE | 否 | MANAGE_ROLE | 否 | +| LIST_ROLE | 否 | -- IGNORE | | +| GRANT_ROLE_PRIVILEGE | 否 | -- IGNORE | | +| REVOKE_ROLE_PRIVILEGE | 否 | -- IGNORE | | +| CREATE_FUNCTION | 否 | USE_UDF | 否 | +| DROP_FUNCTION | 否 | USE_UDF | 否 | +| CREATE_TRIGGER | 是 | USE_TRIGGER | 否 | +| DROP_TRIGGER | 是 | USE_TRIGGER | 否 | +| START_TRIGGER | 是 | USE_TRIGGER | 否 | +| STOP_TRIGGER | 是 | USE_TRIGGER | 否 | +| CREATE_CONTINUOUS_QUERY | 否 | USE_CQ | 否 | +| DROP_CONTINUOUS_QUERY | 否 | USE_CQ | 否 | +| ALL | 否 | All privilegs | | +| DELETE_DATABASE | 是 | MANAGE_DATABASE | 否 | +| ALTER_TIMESERIES | 是 | WRITE_SCHEMA | 是 | +| UPDATE_TEMPLATE | 否 | -- IGNORE | | +| READ_TEMPLATE | 否 | -- IGNORE | | +| APPLY_TEMPLATE | 是 | WRITE_SCHEMA | 是 | +| READ_TEMPLATE_APPLICATION | 否 | -- IGNORE | | +| SHOW_CONTINUOUS_QUERIES | 否 | -- IGNORE | | +| CREATE_PIPEPLUGIN | 否 | USE_PIPE | 否 | +| DROP_PIPEPLUGINS | 否 | USE_PIPE | 否 | +| SHOW_PIPEPLUGINS | 否 | -- IGNORE | | +| CREATE_PIPE 
| 否 | USE_PIPE | 否 | +| START_PIPE | 否 | USE_PIPE | 否 | +| STOP_PIPE | 否 | USE_PIPE | 否 | +| DROP_PIPE | 否 | USE_PIPE | 否 | +| SHOW_PIPES | 否 | -- IGNORE | | +| CREATE_VIEW | 是 | WRITE_SCHEMA | 是 | +| ALTER_VIEW | 是 | WRITE_SCHEMA | 是 | +| RENAME_VIEW | 是 | WRITE_SCHEMA | 是 | +| DELETE_VIEW | 是 | WRITE_SCHEMA | 是 | diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_apache.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_apache.md new file mode 100644 index 00000000..b882513b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_apache.md @@ -0,0 +1,527 @@ + + +# 数据同步 +数据同步是工业物联网的典型需求,通过数据同步机制,可实现 IoTDB 之间的数据共享,搭建完整的数据链路来满足内网外网数据互通、端边云同步、数据迁移、数据备份等需求。 + +## 功能概述 + +### 数据同步 + +一个数据同步任务包含 3 个阶段: + +![](https://alioss.timecho.com/docs/img/dataSync01.png) + +- 抽取(Source)阶段:该部分用于从源 IoTDB 抽取数据,在 SQL 语句中的 source 部分定义 +- 处理(Process)阶段:该部分用于处理从源 IoTDB 抽取出的数据,在 SQL 语句中的 processor 部分定义 +- 发送(Sink)阶段:该部分用于向目标 IoTDB 发送数据,在 SQL 语句中的 sink 部分定义 + +通过 SQL 语句声明式地配置 3 个部分的具体内容,可实现灵活的数据同步能力。目前数据同步支持以下信息的同步,您可以在创建同步任务时对同步范围进行选择(默认选择 data.insert,即同步新写入的数据): + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
同步范围同步内容说明
all所有范围
data(数据)insert(增量)同步新写入的数据
delete(删除)同步被删除的数据
schema(元数据)database(数据库)同步数据库的创建、修改或删除操作
timeseries(时间序列)同步时间序列的定义和属性
TTL(数据到期时间)同步数据的存活时间
auth(权限)-同步用户权限和访问控制
+ +### 功能限制及说明 + +元数据(schema)、权限(auth)同步功能存在如下限制: + +- 使用元数据同步时,要求`Schema region`、`ConfigNode` 的共识协议必须为默认的 ratis 协议,即`iotdb-system.properties`配置文件中是否包含`config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus`、`schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus`,不包含即为默认值ratis 协议。 + +- 为了防止潜在的冲突,请在开启元数据同步时关闭接收端自动创建元数据功能。可通过修改 `iotdb-system.properties`配置文件中的`enable_auto_create_schema`配置项为 false,关闭元数据自动创建功能。 + +- 开启元数据同步时,不支持使用自定义插件。 + +- 在进行数据同步任务时,请避免执行任何删除操作,防止两端状态不一致。 + +## 使用说明 + +数据同步任务有三种状态:RUNNING、STOPPED 和 DROPPED。任务状态转换如下图所示: + +![](https://alioss.timecho.com/docs/img/Data-Sync01.png) + +创建后任务会直接启动,同时当任务发生异常停止后,系统会自动尝试重启任务。 + +提供以下 SQL 语句对同步任务进行状态管理。 + +### 创建任务 + +使用 `CREATE PIPE` 语句来创建一条数据同步任务,下列属性中`PipeId`和`sink`必填,`source`和`processor`为选填项,输入 SQL 时注意 `SOURCE`与 `SINK` 插件顺序不能替换。 + +SQL 示例如下: + +```SQL +CREATE PIPE [IF NOT EXISTS] -- PipeId 是能够唯一标定任务的名字 +-- 数据抽取插件,可选插件 +WITH SOURCE ( + [ = ,], +) +-- 数据处理插件,可选插件 +WITH PROCESSOR ( + [ = ,], +) +-- 数据连接插件,必填插件 +WITH SINK ( + [ = ,], +) +``` + +**IF NOT EXISTS 语义**:用于创建操作中,确保当指定 Pipe 不存在时,执行创建命令,防止因尝试创建已存在的 Pipe 而导致报错。 + +### 开始任务 + +开始处理数据: + +```SQL +START PIPE +``` + +### 停止任务 + +停止处理数据: + +```SQL +STOP PIPE +``` + +### 删除任务 + +删除指定任务: + +```SQL +DROP PIPE [IF EXISTS] +``` + +**IF EXISTS 语义**:用于删除操作中,确保当指定 Pipe 存在时,执行删除命令,防止因尝试删除不存在的 Pipe 而导致报错。 + +删除任务不需要先停止同步任务。 + +### 查看任务 + +查看全部任务: + +```SQL +SHOW PIPES +``` + +查看指定任务: + +```SQL +SHOW PIPE +``` + + pipe 的 show pipes 结果示例: + +```SQL ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +| ID| CreationTime| State|PipeSource|PipeProcessor| PipeSink|ExceptionMessage|RemainingEventCount|EstimatedRemainingSeconds| ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +|59abf95db892428b9d01c5fa318014ea|2024-06-17T14:03:44.189|RUNNING| {}| {}|{sink=iotdb-thrift-sink, sink.ip=127.0.0.1, sink.port=6668}| | 128| 1.03| ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +``` + + +其中各列含义如下: + +- **ID**:同步任务的唯一标识符 +- **CreationTime**:同步任务的创建的时间 +- **State**:同步任务的状态 +- **PipeSource**:同步数据流的来源 +- **PipeProcessor**:同步数据流在传输过程中的处理逻辑 +- **PipeSink**:同步数据流的目的地 +- **ExceptionMessage**:显示同步任务的异常信息 +- **RemainingEventCount(统计存在延迟)**:剩余 event 数,当前数据同步任务中的所有 event 总数,包括数据和元数据同步的 event,以及系统和用户自定义的 event。 +- **EstimatedRemainingSeconds(统计存在延迟)**:剩余时间,基于当前 event 个数和 pipe 处速率,预估完成传输的剩余时间。 + +### 同步插件 + +为了使得整体架构更加灵活以匹配不同的同步场景需求,我们支持在同步任务框架中进行插件组装。系统为您预置了一些常用插件可直接使用,同时您也可以自定义 processor 插件 和 Sink 插件,并加载至 IoTDB 系统进行使用。查看系统中的插件(含自定义与内置插件)可以用以下语句: + +```SQL +SHOW PIPEPLUGINS +``` + +返回结果如下: + +```SQL +IoTDB> SHOW PIPEPLUGINS ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ +| PluginName|PluginType| ClassName| PluginJar| ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ +| 
DO-NOTHING-PROCESSOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.processor.donothing.DoNothingProcessor| | +| DO-NOTHING-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.donothing.DoNothingConnector| | +| IOTDB-SOURCE| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.extractor.iotdb.IoTDBExtractor| | +| IOTDB-THRIFT-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftConnector| | +| IOTDB-THRIFT-SSL-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftSslConnector| | ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ + +``` + +预置插件详细介绍如下(各插件的详细参数可参考本文[参数说明](#参考参数说明)): + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 类型 | 自定义插件 | 插件名称 | 介绍 | 适用版本 |
| ---- | ---------- | -------- | ---- | -------- |
| source 插件 | 不支持 | iotdb-source | 默认的 extractor 插件,用于抽取 IoTDB 历史或实时数据 | 1.2.x |
| processor 插件 | 支持 | do-nothing-processor | 默认的 processor 插件,不对传入的数据做任何的处理 | 1.2.x |
| sink 插件 | 支持 | do-nothing-sink | 不对发送出的数据做任何的处理 | 1.2.x |
| sink 插件 | 支持 | iotdb-thrift-sink | 默认的 sink 插件(V1.3.1 及以上),用于 IoTDB(V1.2.0 及以上)与 IoTDB(V1.2.0 及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景 | 1.2.x |
| sink 插件 | 支持 | iotdb-thrift-ssl-sink | 用于 IoTDB(V1.3.1 及以上)与 IoTDB(V1.2.0 及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,单线程 sync blocking IO 模型,适用于安全需求较高的场景 | 1.3.1+ |
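
结合上表中的内置插件,下面给出一条显式指定 source、processor、sink 三类插件的创建语句示意(任务名与目标地址均为示例假设,仅作演示):

```SQL
create pipe full_sync_example           -- 任务名仅为示例
with source (
  'source' = 'iotdb-source'             -- 内置抽取插件,抽取历史及实时数据
)
with processor (
  'processor' = 'do-nothing-processor'  -- 内置处理插件,不对数据做任何处理
)
with sink (
  'sink' = 'iotdb-thrift-sink',         -- 内置发送插件
  'node-urls' = '127.0.0.1:6668'        -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url,示例地址
)
```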
+ +导入自定义插件可参考[流处理框架](./Streaming_timecho.md#自定义流处理插件管理)章节。 + +## 使用示例 + +### 全量数据同步 + +本例子用来演示将一个 IoTDB 的所有数据同步至另一个 IoTDB,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E5%90%8C%E6%AD%A51.png) + +在这个例子中,我们可以创建一个名为 A2B 的同步任务,用来同步 A IoTDB 到 B IoTDB 间的全量数据,这里需要用到用到 sink 的 iotdb-thrift-sink 插件(内置插件),需通过 node-urls 配置目标端 IoTDB 中 DataNode 节点的数据服务端口的 url,如下面的示例语句: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 部分数据同步 + +本例子用来演示同步某个历史时间范围( 2023 年 8 月 23 日 8 点到 2023 年 10 月 23 日 8 点)的数据至另一个 IoTDB,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E5%90%8C%E6%AD%A51.png) + +在这个例子中,我们可以创建一个名为 A2B 的同步任务。首先我们需要在 source 中定义传输数据的范围,由于传输的是历史数据(历史数据是指同步任务创建之前存在的数据),需要配置数据的起止时间 start-time 和 end-time 以及传输的模式 mode。通过 node-urls 配置目标端 IoTDB 中 DataNode 节点的数据服务端口的 url。 + +详细语句如下: + +```SQL +create pipe A2B +WITH SOURCE ( + 'source'= 'iotdb-source', + 'realtime.mode' = 'stream' -- 新插入数据(pipe创建后)的抽取模式 + 'start-time' = '2023.08.23T08:00:00+00:00', -- 同步所有数据的开始 event time,包含 start-time + 'end-time' = '2023.10.23T08:00:00+00:00' -- 同步所有数据的结束 event time,包含 end-time +) +with SINK ( + 'sink'='iotdb-thrift-async-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 边云数据传输 + +本例子用来演示多个 IoTDB 之间边云传输数据的场景,数据由 B 、C、D 集群分别都同步至 A 集群,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/dataSync03.png) + +在这个例子中,为了将 B 、C、D 集群的数据同步至 A,在 BA 、CA、DA 之间的 pipe 需要配置`path`限制范围,以及要保持边侧和云侧的数据一致 pipe 需要配置`inclusion=all`来同步全量数据和元数据,详细语句如下: + +在 B IoTDB 上执行下列语句,将 B 中数据同步至 A: + +```SQL +create pipe BA +with source ( + 'inclusion'='all', -- 表示同步全量数据、元数据和权限 + 'path'='root.db.**', -- 限制范围 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6667', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +在 C IoTDB 上执行下列语句,将 C 中数据同步至 A: + +```SQL +create pipe CA +with source ( + 'inclusion'='all', -- 表示同步全量数据、元数据和权限 + 'path'='root.db.**', -- 限制范围 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +在 D IoTDB 上执行下列语句,将 D 中数据同步至 A: + +```SQL +create pipe DA +with source ( + 'inclusion'='all', -- 表示同步全量数据、元数据和权限 + 'path'='root.db.**', -- 限制范围 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6669', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 级联数据传输 + +本例子用来演示多个 IoTDB 之间级联传输数据的场景,数据由 A 集群同步至 B 集群,再同步至 C 集群,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/1706698610134.jpg) + +在这个例子中,为了将 A 集群的数据同步至 C,在 BC 之间的 pipe 需要将 `forwarding-pipe-requests` 配置为`true`,详细语句如下: + +在 A IoTDB 上执行下列语句,将 A 中数据同步至 B: + +```SQL +create pipe AB +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +在 B IoTDB 上执行下列语句,将 B 中数据同步至 C: + +```SQL +create pipe BC +with source ( + 'forwarding-pipe-requests' = 'true' --是否转发由其他 Pipe 写入的数据 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6669', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 压缩同步 + +IoTDB 支持在同步过程中指定数据压缩方式。可通过配置 `compressor` 参数,实现数据的实时压缩和传输。`compressor`目前支持 snappy / gzip / lz4 / zstd / lzma2 5 种可选算法,且可以选择多种压缩算法组合,按配置的顺序进行压缩。`rate-limit-bytes-per-second`(V1.3.3 及以后版本支持)每秒最大允许传输的byte数,计算压缩后的byte,若小于0则不限制。 + +如创建一个名为 A2B 的同步任务: + +```SQL +create pipe A2B +with sink ( + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url + 'compressor' = 'snappy,lz4' -- 压缩算法 +) +``` + +### 加密同步 + +IoTDB 
支持在同步过程中使用 SSL 加密,从而在不同的 IoTDB 实例之间安全地传输数据。通过配置 SSL 相关的参数,如证书地址和密码(`ssl.trust-store-path`)、(`ssl.trust-store-pwd`)可以确保数据在同步过程中被 SSL 加密所保护。 + +如创建名为 A2B 的同步任务: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-thrift-ssl-sink', + 'node-urls'='127.0.0.1:6667', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url + 'ssl.trust-store-path'='pki/trusted', -- 连接目标端 DataNode 所需的 trust store 证书路径 + 'ssl.trust-store-pwd'='root' -- 连接目标端 DataNode 所需的 trust store 证书密码 +) +``` + +## 参考:注意事项 + +可通过修改 IoTDB 配置文件(`iotdb-system.properties`)以调整数据同步的参数,如同步数据存储目录等。完整配置如下:: + +V1.3.3+: + +```Properties +# pipe_receiver_file_dir +# If this property is unset, system will save the data in the default relative path directory under the IoTDB folder(i.e., %IOTDB_HOME%/${cn_system_dir}/pipe/receiver). +# If it is absolute, system will save the data in the exact location it points to. +# If it is relative, system will save the data in the relative path directory it indicates under the IoTDB folder. +# Note: If pipe_receiver_file_dir is assigned an empty string(i.e.,zero-size), it will be handled as a relative path. +# effectiveMode: restart +# For windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is absolute. Otherwise, it is relative. +# pipe_receiver_file_dir=data\\confignode\\system\\pipe\\receiver +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_receiver_file_dir=data/confignode/system/pipe/receiver + +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# effectiveMode: first_start +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# effectiveMode: restart +# Datatype: int +pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# effectiveMode: restart +# Datatype: int +pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# effectiveMode: restart +# Datatype: int +pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. +# effectiveMode: restart +# Datatype: int +pipe_sink_max_client_number=16 + +# The total bytes that all pipe sinks can transfer per second. +# When given a value less than or equal to 0, it means no limit. +# default value is -1, which means no limit. 
+# effectiveMode: hot_reload +# Datatype: double +pipe_all_sinks_rate_limit_bytes_per_second=-1 +``` + +## 参考:参数说明 + +### source 参数(V1.3.3) + +| 参数 | 描述 | value 取值范围 | 是否必填 | 默认取值 | +| ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | -------------- | +| source | iotdb-source | String: iotdb-source | 必填 | - | +| inclusion | 用于指定数据同步任务中需要同步范围,分为数据、元数据和权限 | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | 选填 | data.insert | +| inclusion.exclusion | 用于从 inclusion 指定的同步范围内排除特定的操作,减少同步的数据量 | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | 选填 | 空字符串 | +| mode | 用于在每个 data region 发送完毕时分别发送结束事件,并在全部 data region 发送完毕后自动 drop pipe。query:结束,subscribe:不结束。 | String: query / subscribe | 选填 | subscribe | +| path | 用于筛选待同步的时间序列及其相关元数据 / 数据的路径模式元数据同步只能用pathpath 是精确匹配,参数必须为前缀路径或完整路径,即不能含有 `"*"`,最多在 path参数的尾部含有一个 `"**"` | String:IoTDB 的 pattern | 选填 | root.** | +| pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | 选填 | root | +| start-time | 同步所有数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | 选填 | Long.MIN_VALUE | +| end-time | 同步所有数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | 选填 | Long.MAX_VALUE | +| realtime.mode | 新插入数据(pipe 创建后)的抽取模式 | String: batch | 选填 | batch | +| forwarding-pipe-requests | 是否转发由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true | 选填 | true | +| history.loose-range | tsfile传输时,是否放宽历史数据(pipe创建前)范围。"":不放宽范围,严格按照设置的条件挑选数据"time":放宽时间范围,避免对TsFile进行拆分,可以提升同步效率"path":放宽路径范围,避免对TsFile进行拆分,可以提升同步效率"time, path" 、 "path, time" 、"all" : 放宽所有范围,避免对TsFile进行拆分,可以提升同步效率 | String: "" 、 "time" 、 "path" 、 "time, path" 、 "path, time" 、 "all" | 选填 | "" | +| realtime.loose-range | tsfile传输时,是否放宽实时数据(pipe创建前)范围。"":不放宽范围,严格按照设置的条件挑选数据"time":放宽时间范围,避免对TsFile进行拆分,可以提升同步效率"path":放宽路径范围,避免对TsFile进行拆分,可以提升同步效率"time, path" 、 "path, time" 、"all" : 放宽所有范围,避免对TsFile进行拆分,可以提升同步效率 | String: "" 、 "time" 、 "path" 、 "time, path" 、 "path, time" 、 "all" | 选填 | "" | +| mods.enable | 是否发送 tsfile 的 mods 文件 | Boolean: true / false | 选填 | false | + +> 💎 **说明**:为保持低版本兼容,history.enable、history.start-time、history.end-time、realtime.enable 仍可使用,但在新版本中不推荐。 +> +> 💎 **说明:数据抽取模式 batch 的含义** +> - **batch**:该模式下,任务将对数据进行批量(按底层数据文件)处理、发送,其特点是低时效、高吞吐 + + +### sink 参数 + +> 在 1.3.3 及以上的版本中,只包含sink的情况下,不再需要额外增加with sink 前缀 + +#### iotdb-thrift-sink + +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| ----------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------ | +| sink | iotdb-thrift-sink 或 iotdb-thrift-async-sink | String: iotdb-thrift-sink 或 iotdb-thrift-async-sink | 必填 | - | +| node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url(请注意同步任务不支持向自身服务进行转发) | String. 
例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 必填 | - | +| batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | 选填 | true | +| batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | +| batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填 | 16*1024*1024 | + +#### iotdb-thrift-ssl-sink + +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| ----------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------ | +| sink | iotdb-thrift-ssl-sink | String: iotdb-thrift-ssl-sink | 必填 | - | +| node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url(请注意同步任务不支持向自身服务进行转发) | String. 例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 必填 | - | +| batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | 选填 | true | +| batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | +| batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填 | 16*1024*1024 | +| ssl.trust-store-path | 连接目标端 DataNode 所需的 trust store 证书路径 | String.Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 必填 | - | +| ssl.trust-store-pwd | 连接目标端 DataNode 所需的 trust store 证书密码 | Integer | 必填 | - | + + diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_timecho.md new file mode 100644 index 00000000..db6e76bd --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-Sync_timecho.md @@ -0,0 +1,607 @@ + + +# 数据同步 +数据同步是工业物联网的典型需求,通过数据同步机制,可实现 IoTDB 之间的数据共享,搭建完整的数据链路来满足内网外网数据互通、端边云同步、数据迁移、数据备份等需求。 + +## 功能概述 + +### 数据同步 + +一个数据同步任务包含 3 个阶段: + +![](https://alioss.timecho.com/docs/img/dataSync01.png) + +- 抽取(Source)阶段:该部分用于从源 IoTDB 抽取数据,在 SQL 语句中的 source 部分定义 +- 处理(Process)阶段:该部分用于处理从源 IoTDB 抽取出的数据,在 SQL 语句中的 processor 部分定义 +- 发送(Sink)阶段:该部分用于向目标 IoTDB 发送数据,在 SQL 语句中的 sink 部分定义 + +通过 SQL 语句声明式地配置 3 个部分的具体内容,可实现灵活的数据同步能力。目前数据同步支持以下信息的同步,您可以在创建同步任务时对同步范围进行选择(默认选择 data.insert,即同步新写入的数据): + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 同步范围 | 同步内容 | 说明 |
| -------- | -------- | ---- |
| all | - | 所有范围 |
| data(数据) | insert(增量) | 同步新写入的数据 |
| data(数据) | delete(删除) | 同步被删除的数据 |
| schema(元数据) | database(数据库) | 同步数据库的创建、修改或删除操作 |
| schema(元数据) | timeseries(时间序列) | 同步时间序列的定义和属性 |
| schema(元数据) | TTL(数据到期时间) | 同步数据的存活时间 |
| auth(权限) | - | 同步用户权限和访问控制 |
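
例如,若希望同步全量范围但排除删除操作,可以组合使用 inclusion 与 inclusion.exclusion 参数(排除项的点号写法参照默认值 data.insert 类推,任务名与地址均为示例假设):

```SQL
create pipe scope_example
with source (
  'inclusion' = 'all',                   -- 同步全量数据、元数据和权限
  'inclusion.exclusion' = 'data.delete'  -- 从同步范围中排除删除操作(示例写法)
)
with sink (
  'sink' = 'iotdb-thrift-sink',
  'node-urls' = '127.0.0.1:6668'         -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url,示例地址
)
```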
+ +### 功能限制及说明 + +元数据(schema)、权限(auth)同步功能存在如下限制: + +- 使用元数据同步时,要求`Schema region`、`ConfigNode` 的共识协议必须为默认的 ratis 协议,即`iotdb-system.properties`配置文件中是否包含`config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus`、`schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus`,不包含即为默认值ratis 协议。 + +- 为了防止潜在的冲突,请在开启元数据同步时关闭接收端自动创建元数据功能。可通过修改 `iotdb-system.properties`配置文件中的`enable_auto_create_schema`配置项为 false,关闭元数据自动创建功能。 + +- 开启元数据同步时,不支持使用自定义插件。 + +- 双活集群中元数据同步需避免两端同时操作。 + +- 在进行数据同步任务时,请避免执行任何删除操作,防止两端状态不一致。 + +## 使用说明 + +数据同步任务有三种状态:RUNNING、STOPPED 和 DROPPED。任务状态转换如下图所示: + +![](https://alioss.timecho.com/docs/img/Data-Sync01.png) + +创建后任务会直接启动,同时当任务发生异常停止后,系统会自动尝试重启任务。 + +提供以下 SQL 语句对同步任务进行状态管理。 + +### 创建任务 + +使用 `CREATE PIPE` 语句来创建一条数据同步任务,下列属性中`PipeId`和`sink`必填,`source`和`processor`为选填项,输入 SQL 时注意 `SOURCE`与 `SINK` 插件顺序不能替换。 + +SQL 示例如下: + +```SQL +CREATE PIPE [IF NOT EXISTS] -- PipeId 是能够唯一标定任务的名字 +-- 数据抽取插件,可选插件 +WITH SOURCE ( + [ = ,], +) +-- 数据处理插件,可选插件 +WITH PROCESSOR ( + [ = ,], +) +-- 数据连接插件,必填插件 +WITH SINK ( + [ = ,], +) +``` + +**IF NOT EXISTS 语义**:用于创建操作中,确保当指定 Pipe 不存在时,执行创建命令,防止因尝试创建已存在的 Pipe 而导致报错。 + +### 开始任务 + +开始处理数据: + +```SQL +START PIPE +``` + +### 停止任务 + +停止处理数据: + +```SQL +STOP PIPE +``` + +### 删除任务 + +删除指定任务: + +```SQL +DROP PIPE [IF EXISTS] +``` + +**IF EXISTS 语义**:用于删除操作中,确保当指定 Pipe 存在时,执行删除命令,防止因尝试删除不存在的 Pipe 而导致报错。 + +删除任务不需要先停止同步任务。 + +### 查看任务 + +查看全部任务: + +```SQL +SHOW PIPES +``` + +查看指定任务: + +```SQL +SHOW PIPE +``` + + pipe 的 show pipes 结果示例: + +```SQL ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +| ID| CreationTime| State|PipeSource|PipeProcessor| PipeSink|ExceptionMessage|RemainingEventCount|EstimatedRemainingSeconds| ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +|59abf95db892428b9d01c5fa318014ea|2024-06-17T14:03:44.189|RUNNING| {}| {}|{sink=iotdb-thrift-sink, sink.ip=127.0.0.1, sink.port=6668}| | 128| 1.03| ++--------------------------------+-----------------------+-------+----------+-------------+-----------------------------------------------------------+----------------+-------------------+-------------------------+ +``` + +其中各列含义如下: + +- **ID**:同步任务的唯一标识符 +- **CreationTime**:同步任务的创建的时间 +- **State**:同步任务的状态 +- **PipeSource**:同步数据流的来源 +- **PipeProcessor**:同步数据流在传输过程中的处理逻辑 +- **PipeSink**:同步数据流的目的地 +- **ExceptionMessage**:显示同步任务的异常信息 +- **RemainingEventCount(统计存在延迟)**:剩余 event 数,当前数据同步任务中的所有 event 总数,包括数据和元数据同步的 event,以及系统和用户自定义的 event。 +- **EstimatedRemainingSeconds(统计存在延迟)**:剩余时间,基于当前 event 个数和 pipe 处速率,预估完成传输的剩余时间。 + +### 同步插件 + +为了使得整体架构更加灵活以匹配不同的同步场景需求,我们支持在同步任务框架中进行插件组装。系统为您预置了一些常用插件可直接使用,同时您也可以自定义 processor 插件 和 Sink 插件,并加载至 IoTDB 系统进行使用。查看系统中的插件(含自定义与内置插件)可以用以下语句: + +```SQL +SHOW PIPEPLUGINS +``` + +返回结果如下: + +```SQL +IoTDB> SHOW PIPEPLUGINS ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ +| PluginName|PluginType| ClassName| PluginJar| 
++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ +| DO-NOTHING-PROCESSOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.processor.donothing.DoNothingProcessor| | +| DO-NOTHING-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.donothing.DoNothingConnector| | +| IOTDB-AIR-GAP-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.airgap.IoTDBAirGapConnector| | +| IOTDB-SOURCE| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.extractor.iotdb.IoTDBExtractor| | +| IOTDB-THRIFT-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftConnector| | +| IOTDB-THRIFT-SSL-SINK| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.iotdb.thrift.IoTDBThriftSslConnector| | ++------------------------------+----------+--------------------------------------------------------------------------------------------------+----------------------------------------------------+ + +``` + +预置插件详细介绍如下(各插件的详细参数可参考本文[参数说明](#参考参数说明)): + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 类型 | 自定义插件 | 插件名称 | 介绍 | 适用版本 |
| ---- | ---------- | -------- | ---- | -------- |
| source 插件 | 不支持 | iotdb-source | 默认的 extractor 插件,用于抽取 IoTDB 历史或实时数据 | 1.2.x |
| processor 插件 | 支持 | do-nothing-processor | 默认的 processor 插件,不对传入的数据做任何的处理 | 1.2.x |
| sink 插件 | 支持 | do-nothing-sink | 不对发送出的数据做任何的处理 | 1.2.x |
| sink 插件 | 支持 | iotdb-thrift-sink | 默认的 sink 插件(V1.3.1 及以上),用于 IoTDB(V1.2.0 及以上)与 IoTDB(V1.2.0 及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景 | 1.2.x |
| sink 插件 | 支持 | iotdb-air-gap-sink | 用于 IoTDB(V1.2.2 及以上)向 IoTDB(V1.2.2 及以上)跨单向数据网闸的数据同步。支持的网闸型号包括南瑞 Syskeeper 2000 等 | 1.2.x |
| sink 插件 | 支持 | iotdb-thrift-ssl-sink | 用于 IoTDB(V1.3.1 及以上)与 IoTDB(V1.2.0 及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,单线程 sync blocking IO 模型,适用于安全需求较高的场景 | 1.3.1+ |
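
若需使用上表中的 iotdb-air-gap-sink 进行跨网闸同步,接收端还需在 `iotdb-system.properties` 中开启网闸接收功能(配置项见下文"参考:注意事项"一节,以下取值仅为示意):

```Properties
# 接收端:开启经由单向网闸接收 pipe 数据
pipe_air_gap_receiver_enabled=true
# 接收端:接收 pipe 数据的端口(默认 9780)
pipe_air_gap_receiver_port=9780
```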
+ +导入自定义插件可参考[流处理框架](./Streaming_timecho.md#自定义流处理插件管理)章节。 + +## 使用示例 + +### 全量数据同步 + +本例子用来演示将一个 IoTDB 的所有数据同步至另一个 IoTDB,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E5%90%8C%E6%AD%A51.png) + +在这个例子中,我们可以创建一个名为 A2B 的同步任务,用来同步 A IoTDB 到 B IoTDB 间的全量数据,这里需要用到用到 sink 的 iotdb-thrift-sink 插件(内置插件),需通过 node-urls 配置目标端 IoTDB 中 DataNode 节点的数据服务端口的 url,如下面的示例语句: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 部分数据同步 + +本例子用来演示同步某个历史时间范围( 2023 年 8 月 23 日 8 点到 2023 年 10 月 23 日 8 点)的数据至另一个 IoTDB,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E5%90%8C%E6%AD%A51.png) + +在这个例子中,我们可以创建一个名为 A2B 的同步任务。首先我们需要在 source 中定义传输数据的范围,由于传输的是历史数据(历史数据是指同步任务创建之前存在的数据),需要配置数据的起止时间 start-time 和 end-time 以及传输的模式 mode。通过 node-urls 配置目标端 IoTDB 中 DataNode 节点的数据服务端口的 url。 + +详细语句如下: + +```SQL +create pipe A2B +WITH SOURCE ( + 'source'= 'iotdb-source', + 'realtime.mode' = 'stream' -- 新插入数据(pipe创建后)的抽取模式 + 'start-time' = '2023.08.23T08:00:00+00:00', -- 同步所有数据的开始 event time,包含 start-time + 'end-time' = '2023.10.23T08:00:00+00:00' -- 同步所有数据的结束 event time,包含 end-time +) +with SINK ( + 'sink'='iotdb-thrift-async-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 双向数据传输 + +本例子用来演示两个 IoTDB 之间互为双活的场景,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/1706698592139.jpg) + +在这个例子中,为了避免数据无限循环,需要将 A 和 B 上的参数`forwarding-pipe-requests` 均设置为 `false`,表示不转发从另一 pipe 传输而来的数据,以及要保持两侧的数据一致 pipe 需要配置`inclusion=all`来同步全量数据和元数据。 + +详细语句如下: + +在 A IoTDB 上执行下列语句: + +```SQL +create pipe AB +with source ( + 'inclusion'='all', -- 表示同步全量数据、元数据和权限 + 'forwarding-pipe-requests' = 'false' --不转发由其他 Pipe 写入的数据 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +在 B IoTDB 上执行下列语句: + +```SQL +create pipe BA +with source ( + 'inclusion'='all', -- 表示同步全量数据、元数据和权限 + 'forwarding-pipe-requests' = 'false' --是否转发由其他 Pipe 写入的数据 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6667', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` +### 边云数据传输 + +本例子用来演示多个 IoTDB 之间边云传输数据的场景,数据由 B 、C、D 集群分别都同步至 A 集群,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/dataSync03.png) + +在这个例子中,为了将 B 、C、D 集群的数据同步至 A,在 BA 、CA、DA 之间的 pipe 需要配置`path`限制范围,以及要保持边侧和云侧的数据一致 pipe 需要配置`inclusion=all`来同步全量数据和元数据,详细语句如下: + +在 B IoTDB 上执行下列语句,将 B 中数据同步至 A: + +```SQL +create pipe BA +with source ( + 'inclusion'='all', -- 表示同步全量数据、元数据和权限 + 'path'='root.db.**', -- 限制范围 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6667', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +在 C IoTDB 上执行下列语句,将 C 中数据同步至 A: + +```SQL +create pipe CA +with source ( + 'inclusion'='all', -- 表示同步全量数据、元数据和权限 + 'path'='root.db.**', -- 限制范围 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +在 D IoTDB 上执行下列语句,将 D 中数据同步至 A: + +```SQL +create pipe DA +with source ( + 'inclusion'='all', -- 表示同步全量数据、元数据和权限 + 'path'='root.db.**', -- 限制范围 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6669', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 级联数据传输 + +本例子用来演示多个 IoTDB 之间级联传输数据的场景,数据由 A 集群同步至 B 集群,再同步至 C 集群,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/1706698610134.jpg) + +在这个例子中,为了将 A 集群的数据同步至 C,在 BC 之间的 pipe 需要将 `forwarding-pipe-requests` 配置为`true`,详细语句如下: + +在 A IoTDB 上执行下列语句,将 A 中数据同步至 B: + 
+```SQL +create pipe AB +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +在 B IoTDB 上执行下列语句,将 B 中数据同步至 C: + +```SQL +create pipe BC +with source ( + 'forwarding-pipe-requests' = 'true' --是否转发由其他 Pipe 写入的数据 +) +with sink ( + 'sink'='iotdb-thrift-sink', + 'node-urls' = '127.0.0.1:6669', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 跨网闸数据传输 + +本例子用来演示将一个 IoTDB 的数据,经过单向网闸,同步至另一个 IoTDB 的场景,数据链路如下图所示: + +![](https://alioss.timecho.com/docs/img/%E6%95%B0%E6%8D%AE%E4%BC%A0%E8%BE%931.png) + +在这个例子中,需要使用 sink 任务中的 iotdb-air-gap-sink 插件(目前支持部分型号网闸,具体型号请联系天谋科技工作人员确认),配置网闸后,在 A IoTDB 上执行下列语句,其中 node-urls 填写网闸配置的目标端 IoTDB 中 DataNode 节点的数据服务端口的 url,详细语句如下: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-air-gap-sink', + 'node-urls' = '10.53.53.53:9780', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url +) +``` + +### 压缩同步 + +IoTDB 支持在同步过程中指定数据压缩方式。可通过配置 `compressor` 参数,实现数据的实时压缩和传输。`compressor`目前支持 snappy / gzip / lz4 / zstd / lzma2 5 种可选算法,且可以选择多种压缩算法组合,按配置的顺序进行压缩。`rate-limit-bytes-per-second`(V1.3.3 及以后版本支持)每秒最大允许传输的byte数,计算压缩后的byte,若小于0则不限制。 + +如创建一个名为 A2B 的同步任务: + +```SQL +create pipe A2B +with sink ( + 'node-urls' = '127.0.0.1:6668', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url + 'compressor' = 'snappy,lz4' -- + 'rate-limit-bytes-per-second'='1048576' -- 每秒最大允许传输的byte数 +) +``` + +### 加密同步 + +IoTDB 支持在同步过程中使用 SSL 加密,从而在不同的 IoTDB 实例之间安全地传输数据。通过配置 SSL 相关的参数,如证书地址和密码(`ssl.trust-store-path`)、(`ssl.trust-store-pwd`)可以确保数据在同步过程中被 SSL 加密所保护。 + +如创建名为 A2B 的同步任务: + +```SQL +create pipe A2B +with sink ( + 'sink'='iotdb-thrift-ssl-sink', + 'node-urls'='127.0.0.1:6667', -- 目标端 IoTDB 中 DataNode 节点的数据服务端口的 url + 'ssl.trust-store-path'='pki/trusted', -- 连接目标端 DataNode 所需的 trust store 证书路径 + 'ssl.trust-store-pwd'='root' -- 连接目标端 DataNode 所需的 trust store 证书密码 +) +``` + +## 参考:注意事项 + +可通过修改 IoTDB 配置文件(`iotdb-system.properties`)以调整数据同步的参数,如同步数据存储目录等。完整配置如下:: + +V1.3.3+: + +```Properties +# pipe_receiver_file_dir +# If this property is unset, system will save the data in the default relative path directory under the IoTDB folder(i.e., %IOTDB_HOME%/${cn_system_dir}/pipe/receiver). +# If it is absolute, system will save the data in the exact location it points to. +# If it is relative, system will save the data in the relative path directory it indicates under the IoTDB folder. +# Note: If pipe_receiver_file_dir is assigned an empty string(i.e.,zero-size), it will be handled as a relative path. +# effectiveMode: restart +# For windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is absolute. Otherwise, it is relative. +# pipe_receiver_file_dir=data\\confignode\\system\\pipe\\receiver +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_receiver_file_dir=data/confignode/system/pipe/receiver + +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# effectiveMode: first_start +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. 
+# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# effectiveMode: restart +# Datatype: int +pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# effectiveMode: restart +# Datatype: int +pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# effectiveMode: restart +# Datatype: int +pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. +# effectiveMode: restart +# Datatype: int +pipe_sink_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# effectiveMode: restart +# Datatype: Boolean +pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. +# Datatype: int +# effectiveMode: restart +pipe_air_gap_receiver_port=9780 + +# The total bytes that all pipe sinks can transfer per second. +# When given a value less than or equal to 0, it means no limit. +# default value is -1, which means no limit. +# effectiveMode: hot_reload +# Datatype: double +pipe_all_sinks_rate_limit_bytes_per_second=-1 +``` + +## 参考:参数说明 + +### source 参数(V1.3.3) + +| 参数 | 描述 | value 取值范围 | 是否必填 | 默认取值 | +| ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | -------------- | +| source | iotdb-source | String: iotdb-source | 必填 | - | +| inclusion | 用于指定数据同步任务中需要同步范围,分为数据、元数据和权限 | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | 选填 | data.insert | +| inclusion.exclusion | 用于从 inclusion 指定的同步范围内排除特定的操作,减少同步的数据量 | String:all, data(insert,delete), schema(database,timeseries,ttl), auth | 选填 | 空字符串 | +| mode | 用于在每个 data region 发送完毕时分别发送结束事件,并在全部 data region 发送完毕后自动 drop pipe。query:结束,subscribe:不结束。 | String: query / subscribe | 选填 | subscribe | +| path | 用于筛选待同步的时间序列及其相关元数据 / 数据的路径模式元数据同步只能用pathpath 是精确匹配,参数必须为前缀路径或完整路径,即不能含有 `"*"`,最多在 path参数的尾部含有一个 `"**"` | String:IoTDB 的 pattern | 选填 | root.** | +| pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | 选填 | root | +| start-time | 同步所有数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | 选填 | Long.MIN_VALUE | +| end-time | 同步所有数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | 选填 | Long.MAX_VALUE | +| realtime.mode | 新插入数据(pipe创建后)的抽取模式 | String: stream, batch | 选填 | batch | +| forwarding-pipe-requests | 是否转发由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | 选填 | true | +| history.loose-range | tsfile传输时,是否放宽历史数据(pipe创建前)范围。"":不放宽范围,严格按照设置的条件挑选数据"time":放宽时间范围,避免对TsFile进行拆分,可以提升同步效率"path":放宽路径范围,避免对TsFile进行拆分,可以提升同步效率"time, path" 、 "path, time" 、"all" : 放宽所有范围,避免对TsFile进行拆分,可以提升同步效率 | String: "" 、 "time" 、 "path" 、 "time, path" 、 "path, time" 、 "all" | 选填 | "" | +| realtime.loose-range | tsfile传输时,是否放宽实时数据(pipe创建前)范围。"":不放宽范围,严格按照设置的条件挑选数据"time":放宽时间范围,避免对TsFile进行拆分,可以提升同步效率"path":放宽路径范围,避免对TsFile进行拆分,可以提升同步效率"time, path" 、 "path, time" 、"all" : 放宽所有范围,避免对TsFile进行拆分,可以提升同步效率 | String: "" 、 "time" 、 "path" 、 "time, path" 、 "path, time" 、 "all" | 选填 | "" | +| mods.enable | 是否发送 tsfile 的 mods 文件 | Boolean: true / false | 选填 | false | + +> 💎 **说明**:为保持低版本兼容,history.enable、history.start-time、history.end-time、realtime.enable 仍可使用,但在新版本中不推荐。 +> +> 💎 **说明:数据抽取模式 stream 和 batch 
的差异** +> - **stream(推荐)**:该模式下,任务将对数据进行实时处理、发送,其特点是高时效、低吞吐 +> - **batch**:该模式下,任务将对数据进行批量(按底层数据文件)处理、发送,其特点是低时效、高吞吐 + + +## sink **参数** + +> 在 1.3.3 及以上的版本中,只包含sink的情况下,不再需要额外增加with sink 前缀 + +#### iotdb-thrift-sink + +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| ----------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------ | +| sink | iotdb-thrift-sink 或 iotdb-thrift-async-sink | String: iotdb-thrift-sink 或 iotdb-thrift-async-sink | 必填 | - | +| node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url(请注意同步任务不支持向自身服务进行转发) | String. 例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 必填 | - | +| batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | 选填 | true | +| batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | +| batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填 | 16*1024*1024 | + +#### iotdb-air-gap-sink + +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| ---------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | -------- | +| sink | iotdb-air-gap-sink | String: iotdb-air-gap-sink | 必填 | - | +| node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String. 例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 必填 | - | +| air-gap.handshake-timeout-ms | 发送端与接收端在首次尝试建立连接时握手请求的超时时长,单位:毫秒 | Integer | 选填 | 5000 | + +#### iotdb-thrift-ssl-sink + +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| ----------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------ | +| sink | iotdb-thrift-ssl-sink | String: iotdb-thrift-ssl-sink | 必填 | - | +| node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url(请注意同步任务不支持向自身服务进行转发) | String. 例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 必填 | - | +| batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | 选填 | true | +| batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | +| batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填 | 16*1024*1024 | +| ssl.trust-store-path | 连接目标端 DataNode 所需的 trust store 证书路径 | String.Example: '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 必填 | - | +| ssl.trust-store-pwd | 连接目标端 DataNode 所需的 trust store 证书密码 | Integer | 必填 | - | diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-subscription.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-subscription.md new file mode 100644 index 00000000..18dd6d46 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Data-subscription.md @@ -0,0 +1,144 @@ +# 数据订阅 + +## 1. 功能介绍 + +IoTDB 数据订阅模块(又称 IoTDB 订阅客户端)是IoTDB V1.3.3 版本后支持的功能,它为用户提供了一种区别于数据查询的流式数据消费方式。它参考了 Kafka 等消息队列产品的基本概念和逻辑,**提供数据订阅和消费接口**,但并不是为了完全替代这些消费队列的产品,更多的是在简单流式获取数据的场景为用户提供更加便捷的数据订阅服务。 + +在下面应用场景中,使用 IoTDB 订阅客户端消费数据会有显著的优势: + +1. **持续获取最新数据**:使用订阅的方式,比定时查询更实时、应用编程更简单、系统负担更小; +2. **简化数据推送至第三方系统**:无需在 IoTDB 内部开发不同系统的数据推送组件,可以在第三方系统内实现数据的流式获取,更方便将数据发送至 Flink、Kafka、DataX、Camel、MySQL、PG 等系统。 + +## 2. 主要概念 + +IoTDB 订阅客户端包含 3 个核心概念:Topic、Consumer、Consumer Group,具体关系如下图 + +
+ +
+ +1. **Topic(主题)**: Topic 是 IoTDB 的数据空间,由路径和时间范围表示(如 root.** 的全时间范围)。消费者可以订阅这些主题的数据(当前已有的和未来写入的)。不同于 Kafka,IoTDB 可在数据入库后再创建 Topic,且输出格式可选择 Message 或 TsFile 两种。 + +2. **Consumer(消费者)**: Consumer 是 IoTDB 的订阅客户端,负责接收和处理发布到特定 Topic 的数据。Consumer 从队列中获取数据并进行相应的处理。在 IoTDB 订阅客户端中提供了两种类型的 Consumers: + - 一种是 `SubscriptionPullConsumer`,对应的是消息队列中的 pull 消费模式,用户代码需要主动调用数据获取逻辑 + - 一种是 `SubscriptionPushConsumer`,对应的是消息队列中的 push 消费模式,用户代码由新到达的数据事件触发 + +3. **Consumer Group(消费者组)**: Consumer Group 是一组 Consumers 的集合,拥有相同 Consumer Group ID 的消费者属于同一个消费者组。Consumer Group 有以下特点: + - Consumer Group 与 Consumer 为一对多的关系。即一个 consumer group 中的 consumers 可以有任意多个,但不允许一个 consumer 同时加入多个 consumer groups + - 允许一个 Consumer Group 中有不同类型的 Consumer(`SubscriptionPullConsumer` 和 `SubscriptionPushConsumer`) + - 一个 topic 不需要被一个 consumer group 中的所有 consumer 订阅 + - 当同一个 Consumer Group 中不同的 Consumers 订阅了相同的 Topic 时,该 Topic 下的每条数据只会被组内的一个 Consumer 处理,确保数据不会被重复处理 + +## 3. SQL 语句 + +### 3.1 Topic 管理 + +IoTDB 支持通过 SQL 语句对 Topic 进行创建、删除、查看操作。Topic状态变化如下图所示: + +
+ +
+ +#### 3.1.1 创建 Topic + +SQL 语句为: + +```SQL + CREATE TOPIC [IF NOT EXISTS] + WITH ( + [ = ,], + ); +``` +**IF NOT EXISTS 语义**:用于创建操作中,确保当指定 Topic 不存在时,执行创建命令,防止因尝试创建已存在的 Topic 而导致报错。 + +各参数详细解释如下: + +| 参数 | 是否必填(默认值) | 参数含义 | +| :-------------------------------------------- | :--------------------------------- | :----------------------------------------------------------- | +| **path** | optional: `root.**` | topic 对应订阅数据时间序列的路径 path,表示一组需要订阅的时间序列集合 | +| **start-time** | optional: `MIN_VALUE` | topic 对应订阅数据时间序列的开始时间(event time)可以为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00也可以为 long 值,含义为裸时间戳,单位与数据库时间戳精度一致支持特殊 value **`now`**,含义为 topic 的创建时间,当 start-time 为 `now` 且 end-time 为 MAX_VALUE 时表示只订阅实时数据 | +| **end-time** | optional: `MAX_VALUE` | topic 对应订阅数据时间序列的结束时间(event time)可以为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00也可以为 long 值,含义为裸时间戳,单位与数据库时间戳精度一致支持特殊 value `now`,含义为 topic 的创建时间,当 end-time 为 `now` 且 start-time 为 MIN_VALUE 时表示只订阅历史数据 | +| **processor** | optional: `do-nothing-processor` | processor 插件名及其参数配置,表示对原始订阅数据应用的自定义处理逻辑,可以通过类似 pipe processor 插件的方式指定 | +| **format** | optional: `SessionDataSetsHandler` | 表示从该主题订阅出的数据呈现形式,目前支持下述两种数据形式:`SessionDataSetsHandler`:使用 `SubscriptionSessionDataSetsHandler` 获取从该主题订阅出的数据,消费者可以按行消费每条数据`TsFileHandler`:使用 `SubscriptionTsFileHandler` 获取从该主题订阅出的数据,消费者可以直接订阅到存储相应数据的 TsFile | +| **mode** **(v1.3.3.2 及之后版本支持)** | option: `live` | topic 对应的订阅模式,有两个选项:`live`:订阅该主题时,订阅的数据集模式为动态数据集,即可以不断消费到最新的数据`snapshot`:consumer 订阅该主题时,订阅的数据集模式为静态数据集,即 consumer group 订阅该主题的时刻(不是创建主题的时刻)数据的 snapshot;形成订阅后的静态数据集不支持 TTL | +| **loose-range** **(v1.3.3.2 及之后版本支持)** | option: `""` | String: 是否严格按照 path 和 time range 来筛选该 topic 对应的数据,例如:`""`:严格按照 path 和 time range 来筛选该 topic 对应的数据`"time"`:不严格按照 time range 来筛选该 topic 对应的数据(粗筛);严格按照 path 来筛选该 topic 对应的数据`"path"`:不严格按照 path 来筛选该 topic 对应的数据(粗筛);严格按照 time range 来筛选该 topic 对应的数据`"time, path"` / `"path, time"` / `"all"`:不严格按照 path 和 time range 来筛选该 topic 对应的数据(粗筛) | + +示例如下: + +```SQL +-- 全量订阅 +CREATE TOPIC root_all; + +-- 自定义订阅 +CREATE TOPIC IF NOT EXISTS db_timerange +WITH ( + 'path' = 'root.db.**', + 'start-time' = '2023-01-01', + 'end-time' = '2023-12-31', +); +``` + +#### 3.1.2 删除 Topic + +Topic 在没有被订阅的情况下,才能被删除,Topic 被删除时,其相关的消费进度都会被清理 + +```SQL +DROP TOPIC [IF EXISTS] ; +``` + +**IF EXISTS 语义**:用于删除操作中,确保当指定 Topic 存在时,执行删除命令,防止因尝试删除不存在的 Topic 而导致报错。 + +#### 3.1.3 查看 Topic + +```SQL +SHOW TOPICS; +SHOW TOPIC ; +``` + +结果集: + +```SQL +[TopicName|TopicConfigs] +``` + +- TopicName:主题 ID +- TopicConfigs:主题配置 + +### 3.2 查看订阅状态 + +查看所有订阅关系: + +```SQL +-- 查询所有的 topics 与 consumer group 的订阅关系 +SHOW SUBSCRIPTIONS +-- 查询某个 topic 下所有的 subscriptions +SHOW SUBSCRIPTIONS ON +``` + +结果集: + +```SQL +[TopicName|ConsumerGroupName|SubscribedConsumers] +``` + +- TopicName:主题 ID +- ConsumerGroupName:用户代码中指定的消费者组 ID +- SubscribedConsumers:该消费者组中订阅了该主题的所有客户端 ID + +## 4. API 接口 + +除 SQL 语句外,IoTDB 还支持通过 Java 原生接口使用数据订阅功能。详细语法参见页面:Java 原生接口([链接](../API/Programming-Java-Native-API.md))。 + +## 5. 常见问题 + +### 5.1 IoTDB 数据订阅与 Kafka 的区别是什么? + +1. 消费有序性 + +- **Kafka 保证消息在单个 partition 内是有序的**,当某个 topic 仅对应一个 partition 且只有一个 consumer 订阅了这个 topic,即可保证该 consumer(单线程) 消费该 topic 数据的顺序即为数据写入的顺序。 +- IoTDB 订阅客户端**不保证** consumer 消费数据的顺序即为数据写入的顺序,但会尽量反映数据写入的顺序。 + +2. 
消息送达语义 + +- Kafka 可以通过配置实现 Producer 和 Consumer 的 Exactly once 语义。 +- IoTDB 订阅客户端目前无法提供 Consumer 的 Exactly once 语义。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Database-Programming.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Database-Programming.md new file mode 100644 index 00000000..72b9570b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Database-Programming.md @@ -0,0 +1,586 @@ + + +# 连续查询 + +## 简介 +连续查询(Continuous queries, aka CQ) 是对实时数据周期性地自动执行的查询,并将查询结果写入指定的时间序列中。 + +用户可以通过连续查询实现滑动窗口流式计算,如计算某个序列每小时平均温度,并写入一个新序列中。用户可以自定义 `RESAMPLE` 子句去创建不同的滑动窗口,可以实现对于乱序数据一定程度的容忍。 + +## 语法 + +```sql +CREATE (CONTINUOUS QUERY | CQ) +[RESAMPLE + [EVERY ] + [BOUNDARY ] + [RANGE [, end_time_offset]] +] +[TIMEOUT POLICY BLOCKED|DISCARD] +BEGIN + SELECT CLAUSE + INTO CLAUSE + FROM CLAUSE + [WHERE CLAUSE] + [GROUP BY([, ]) [, level = ]] + [HAVING CLAUSE] + [FILL {PREVIOUS | LINEAR | constant}] + [LIMIT rowLimit OFFSET rowOffset] + [ALIGN BY DEVICE] +END +``` + +> 注意: +> 1. 如果where子句中出现任何时间过滤条件,IoTDB将会抛出异常,因为IoTDB会自动为每次查询执行指定时间范围。 +> 2. GROUP BY TIME CLAUSE在连续查询中的语法稍有不同,它不能包含原来的第一个参数,即 [start_time, end_time),IoTDB会自动填充这个缺失的参数。如果指定,IoTDB将会抛出异常。 +> 3. 如果连续查询中既没有GROUP BY TIME子句,也没有指定EVERY子句,IoTDB将会抛出异常。 + +### 连续查询语法中参数含义的描述 + +- `` 为连续查询指定一个全局唯一的标识。 +- `` 指定了连续查询周期性执行的间隔。现在支持的时间单位有:ns, us, ms, s, m, h, d, w, 并且它的值不能小于用户在`iotdb-system.properties`配置文件中指定的`continuous_query_min_every_interval`。这是一个可选参数,默认等于group by子句中的`group_by_interval`。 +- `` 指定了每次查询执行窗口的开始时间,即`now()-`。现在支持的时间单位有:ns, us, ms, s, m, h, d, w。这是一个可选参数,默认等于`EVERY`子句中的`every_interval`。 +- `` 指定了每次查询执行窗口的结束时间,即`now()-`。现在支持的时间单位有:ns, us, ms, s, m, h, d, w。这是一个可选参数,默认等于`0`. +- `` 表示用户期待的连续查询的首个周期任务的执行时间。(因为连续查询只会对当前实时的数据流做计算,所以该连续查询实际首个周期任务的执行时间并不一定等于用户指定的时间,具体计算逻辑如下所示) + - `` 可以早于、等于或者迟于当前时间。 + - 这个参数是可选的,默认等于`0`。 + - 首次查询执行窗口的开始时间为` - `. + - 首次查询执行窗口的结束时间为` - `. + - 第i个查询执行窗口的时间范围是`[ - + (i - 1) * , - + (i - 1) * )`。 + - 如果当前时间早于或等于, 那连续查询的首个周期任务的执行时间就是用户指定的`execution_boundary_time`. + - 如果当前时间迟于用户指定的`execution_boundary_time`,那么连续查询的首个周期任务的执行时间就是`execution_boundary_time + i * `中第一个大于或等于当前时间的值。 + +> - 都应该大于 0 +> - 应该小于等于 +> - 用户应该根据实际需求,为 指定合适的值 +> - 如果大于,在每一次查询执行的时间窗口上会有部分重叠 +> - 如果小于,在连续的两次查询执行的时间窗口中间将会有未覆盖的时间范围 +> - start_time_offset 应该大于end_time_offset + +#### ``等于`` + +![1](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic1.png?raw=true) + +#### ``大于`` + +![2](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic2.png?raw=true) + +#### ``小于`` + +![3](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic3.png?raw=true) + +#### ``不为0 + +![4](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic4.png?raw=true) + +- `TIMEOUT POLICY` 指定了我们如何处理“前一个时间窗口还未执行完时,下一个窗口的执行时间已经到达的场景,默认值是`BLOCKED`. 
+ - `BLOCKED`意味着即使下一个窗口的执行时间已经到达,我们依旧需要阻塞等待前一个时间窗口的查询执行完再开始执行下一个窗口。如果使用`BLOCKED`策略,所有的时间窗口都将会被依此执行,但是如果遇到执行查询的时间长于周期性间隔时,连续查询的结果会迟于最新的时间窗口范围。 + - `DISCARD`意味着如果前一个时间窗口还未执行完,我们会直接丢弃下一个窗口的执行时间。如果使用`DISCARD`策略,可能会有部分时间窗口得不到执行。但是一旦前一个查询执行完后,它将会使用最新的时间窗口,所以它的执行结果总能赶上最新的时间窗口范围,当然是以部分时间窗口得不到执行为代价。 + + +## 连续查询的用例 + +下面是用例数据,这是一个实时的数据流,我们假设数据都按时到达。 + +```` ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +| Time|root.ln.wf02.wt02.temperature|root.ln.wf02.wt01.temperature|root.ln.wf01.wt02.temperature|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +|2021-05-11T22:18:14.598+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:19.941+08:00| 0.0| 68.0| 68.0| 103.0| +|2021-05-11T22:18:24.949+08:00| 122.0| 45.0| 11.0| 14.0| +|2021-05-11T22:18:29.967+08:00| 47.0| 14.0| 59.0| 181.0| +|2021-05-11T22:18:34.979+08:00| 182.0| 113.0| 29.0| 180.0| +|2021-05-11T22:18:39.990+08:00| 42.0| 11.0| 52.0| 19.0| +|2021-05-11T22:18:44.995+08:00| 78.0| 38.0| 123.0| 52.0| +|2021-05-11T22:18:49.999+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:55.003+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +```` + +### 配置连续查询执行的周期性间隔 + +在`RESAMPLE`子句中使用`EVERY`参数指定连续查询的执行间隔,如果没有指定,默认等于`group_by_interval`。 + +```sql +CREATE CONTINUOUS QUERY cq1 +RESAMPLE EVERY 20s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) +END +``` + +`cq1`计算出`temperature`传感器每10秒的平均值,并且将查询结果存储在`temperature_max`传感器下,传感器路径前缀使用跟原来一样的前缀。 + +`cq1`每20秒执行一次,每次执行的查询的时间窗口范围是从过去20秒到当前时间。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq1`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. +`cq1` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. 
+`cq1` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +`cq1`并不会处理当前时间窗口以外的数据,即`2021-05-11T22:18:20.000+08:00`以前的数据,所以我们会得到如下结果: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### 配置连续查询的时间窗口大小 + +使用`RANGE`子句中的`start_time_offset`参数指定连续查询每次执行的时间窗口的开始时间偏移,如果没有指定,默认值等于`EVERY`参数。 + +```sql +CREATE CONTINUOUS QUERY cq2 +RESAMPLE RANGE 40s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) +END +``` + +`cq2`计算出`temperature`传感器每10秒的平均值,并且将查询结果存储在`temperature_max`传感器下,传感器路径前缀使用跟原来一样的前缀。 + +`cq2`每10秒执行一次,每次执行的查询的时间窗口范围是从过去40秒到当前时间。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq2`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. 
+`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| NULL| NULL| NULL| NULL| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:18:50.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:10, 2021-05-11T22:18:50)`. +`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. 
+`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +`cq2`并不会写入全是null值的行,值得注意的是`cq2`会多次计算某些区间的聚合值,下面是计算结果: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### 同时配置连续查询执行的周期性间隔和时间窗口大小 + +使用`RESAMPLE`子句中的`EVERY`参数和`RANGE`参数分别指定连续查询的执行间隔和窗口大小。并且使用`fill()`来填充没有值的时间区间。 + +```sql +CREATE CONTINUOUS QUERY cq3 +RESAMPLE EVERY 20s RANGE 40s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) + FILL(100.0) +END +``` + +`cq3`计算出`temperature`传感器每10秒的平均值,并且将查询结果存储在`temperature_max`传感器下,传感器路径前缀使用跟原来一样的前缀。如果某些区间没有值,用`100.0`填充。 + +`cq3`每20秒执行一次,每次执行的查询的时间窗口范围是从过去40秒到当前时间。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq3`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. 
+`cq3` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. +`cq3` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +值得注意的是`cq3`会多次计算某些区间的聚合值,下面是计算结果: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### 配置连续查询每次查询执行时间窗口的结束时间 + +使用`RESAMPLE`子句中的`EVERY`参数和`RANGE`参数分别指定连续查询的执行间隔和窗口大小。并且使用`fill()`来填充没有值的时间区间。 + +```sql +CREATE CONTINUOUS QUERY cq4 +RESAMPLE EVERY 20s RANGE 40s, 20s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) + FILL(100.0) +END +``` + 
+`cq4`计算出`temperature`传感器每10秒的平均值,并且将查询结果存储在`temperature_max`传感器下,传感器路径前缀使用跟原来一样的前缀。如果某些区间没有值,用`100.0`填充。 + +`cq4`每20秒执行一次,每次执行的查询的时间窗口范围是从过去40秒到过去20秒。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq4`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:20)`. +`cq4` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. +`cq4` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +值得注意的是`cq4`只会计算每个聚合区间一次,并且每次开始执行计算的时间都会比当前的时间窗口结束时间迟20s, 下面是计算结果: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### 没有GROUP BY TIME子句的连续查询 + +不使用`GROUP BY TIME`子句,并在`RESAMPLE`子句中显式使用`EVERY`参数指定连续查询的执行间隔。 + +```sql +CREATE CONTINUOUS QUERY cq5 +RESAMPLE EVERY 20s +BEGIN + SELECT temperature + 1 + INTO root.precalculated_sg.::(temperature) + FROM root.ln.*.* + align by device +END +``` + +`cq5`计算以`root.ln`为前缀的所有`temperature + 1`的值,并将结果储存在另一个 database `root.precalculated_sg`中。除 database 名称不同外,目标序列与源序列路径名均相同。 + 
+`cq5`每20秒执行一次,每次执行的查询的时间窗口范围是从过去20秒到当前时间。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq5`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. +`cq5` generate 16 lines: +> ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| ++-----------------------------+-------------------------------+-----------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. 
+`cq5` generate 12 lines: +> ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| ++-----------------------------+-------------------------------+-----------+ +> +```` + +`cq5`并不会处理当前时间窗口以外的数据,即`2021-05-11T22:18:20.000+08:00`以前的数据,所以我们会得到如下结果: + +```` +> SELECT temperature from root.precalculated_sg.*.* align by device; ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| 
++-----------------------------+-------------------------------+-----------+ +```` + +## 连续查询的管理 + +### 查询系统已有的连续查询 + +展示集群中所有的已注册的连续查询 + +```sql +SHOW (CONTINUOUS QUERIES | CQS) +``` + +`SHOW (CONTINUOUS QUERIES | CQS)`会将结果集按照`cq_id`排序。 + +#### 例子 + +```sql +SHOW CONTINUOUS QUERIES; +``` + +执行以上sql,我们将会得到如下的查询结果: + +| cq_id | query | state | +|:-------------|---------------------------------------------------------------------------------------------------------------------------------------|-------| +| s1_count_cq | CREATE CQ s1_count_cq
BEGIN<br>SELECT count(s1)<br>INTO root.sg_count.d.count_s1<br>FROM root.sg.d<br>GROUP BY(30m)<br>
END | active | + + +### 删除已有的连续查询 + +删除指定的名为cq_id的连续查询: + +```sql +DROP (CONTINUOUS QUERY | CQ) +``` + +DROP CQ并不会返回任何结果集。 + +#### 例子 + +删除名为s1_count_cq的连续查询: + +```sql +DROP CONTINUOUS QUERY s1_count_cq; +``` + +### 修改已有的连续查询 + +目前连续查询一旦被创建就不能再被修改。如果想要修改某个连续查询,只能先用`DROP`命令删除它,然后再用`CREATE`命令重新创建。 + + +## 连续查询的使用场景 + +### 对数据进行降采样并对降采样后的数据使用不同的保留策略 + +可以使用连续查询,定期将高频率采样的原始数据(如每秒1000个点),降采样(如每秒仅保留一个点)后保存到另一个 database 的同名序列中。高精度的原始数据所在 database 的`TTL`可能设置的比较短,比如一天,而低精度的降采样后的数据所在的 database `TTL`可以设置的比较长,比如一个月,从而达到快速释放磁盘空间的目的。 + +### 预计算代价昂贵的查询 + +我们可以通过连续查询对一些重复的查询进行预计算,并将查询结果保存在某些目标序列中,这样真实查询并不需要真的再次去做计算,而是直接查询目标序列的结果,从而缩短了查询的时间。 + +> 预计算查询结果尤其对一些可视化工具渲染时序图和工作台时有很大的加速作用。 + +### 作为子查询的替代品 + +IoTDB现在不支持子查询,但是我们可以通过创建连续查询得到相似的功能。我们可以将子查询注册为一个连续查询,并将子查询的结果物化到目标序列中,外层查询再直接查询哪个目标序列。 + +#### 例子 + +IoTDB并不会接收如下的嵌套子查询。这个查询会计算s1序列每隔30分钟的非空值数量的平均值: + +```sql +SELECT avg(count_s1) from (select count(s1) as count_s1 from root.sg.d group by([0, now()), 30m)); +``` + +为了得到相同的结果,我们可以: + +**1. 创建一个连续查询** + +这一步执行内层子查询部分。下面创建的连续查询每隔30分钟计算一次`root.sg.d.s1`序列的非空值数量,并将结果写入目标序列`root.sg_count.d.count_s1`中。 + +```sql +CREATE CQ s1_count_cq +BEGIN + SELECT count(s1) + INTO root.sg_count.d.count_s1 + FROM root.sg.d + GROUP BY(30m) +END +``` + +**2. 查询连续查询的结果** + +这一步执行外层查询的avg([...])部分。 + +查询序列`root.sg_count.d.count_s1`的值,并计算平均值: + +```sql +SELECT avg(count_s1) from root.sg_count.d; +``` + + +## 连续查询相关的配置参数 +| 参数名 | 描述 | 类型 | 默认值 | +| :---------------------------------- |----------------------|----------|---------------| +| `continuous_query_submit_thread` | 用于周期性提交连续查询执行任务的线程数 | int32 | 2 | +| `continuous_query_min_every_interval_in_ms` | 系统允许的连续查询最小的周期性时间间隔 | duration | 1000 | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/IoTDB-View_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/IoTDB-View_timecho.md new file mode 100644 index 00000000..b7b77617 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/IoTDB-View_timecho.md @@ -0,0 +1,548 @@ + + +# 视图 + +## 序列视图应用背景 + +### 应用场景1 时间序列重命名(PI资产管理) + +实际应用中,采集数据的设备可能使用人类难以理解的标识号来命名,这给业务层带来了查询上的困难。 + +而序列视图能够重新组织管理这些序列,在不改变原有序列内容、无需新建或拷贝序列的情况下,使用新的模型结构来访问他们。 + +**例如**:一台云端设备使用自己的网卡MAC地址组成实体编号,存储数据时写入如下时间序列:`root.db.0800200A8C6D.xvjeifg`. + +对于用户来说,它是难以理解的。但此时,用户能够使用序列视图功能对它重命名,将它映射到一个序列视图中去,使用`root.view.device001.temperature`来访问采集到的数据。 + +### 应用场景2 简化业务层查询逻辑 + +有时用户有大量设备,管理着大量时间序列。在进行某项业务时,用户希望仅处理其中的部分序列,此时就可以通过序列视图功能挑选出关注重点,方便反复查询、写入。 + +**例如**:用户管理一条产品流水线,各环节的设备有大量时间序列。温度检测员仅需要关注设备温度,就可以抽取温度相关的序列,组成序列视图。 + +### 应用场景3 辅助权限管理 + +生产过程中,不同业务负责的范围一般不同,出于安全考虑往往需要通过权限管理来限制业务员的访问范围。 + +**例如**:安全管理部门现在仅需要监控某生产线上各设备的温度,但这些数据与其他机密数据存放在同一数据库。此时,就可以创建若干新的视图,视图中仅含有生产线上与温度有关的时间序列,接着,向安全员只赋予这些序列视图的权限,从而达到权限限制的目的。 + +### 设计序列视图功能的动机 + +结合上述两类使用场景,设计序列视图功能的动机,主要有: + +1. 时间序列重命名。 +2. 简化业务层查询逻辑。 +3. 辅助权限管理,通过视图向特定用户开放数据。 + +## 序列视图概念 + +### 术语概念 + +约定:若无特殊说明,本文档所指定的视图均是**序列视图**,未来可能引入设备视图等新功能。 + +### 序列视图 + +序列视图是一种组织管理时间序列的方式。 + +在传统关系型数据库中,数据都必须存放在一个表中,而在IoTDB等时序数据库中,序列才是存储单元。因此,IoTDB中序列视图的概念也是建立在序列上的。 + +一个序列视图就是一条虚拟的时间序列,每条虚拟的时间序列都像是一条软链接或快捷方式,映射到某个视图外部的序列或者某种计算逻辑。换言之,一个虚拟序列要么映射到某个确定的外部序列,要么由多个外部序列运算得来。 + +用户可以使用复杂的SQL查询创建视图,此时序列视图就像一条被存储的查询语句,当从视图中读取数据时,就把被存储的查询语句作为数据来源,放在FROM子句中。 + +### 别名序列 + +在序列视图中,有一类特殊的存在,他们满足如下所有条件: + +1. 数据来源为单一的时间序列 +2. 没有任何计算逻辑 +3. 
没有任何筛选条件(例如无WHERE子句的限制) + +这样的序列视图,被称为**别名序列**,或别名序列视图。不完全满足上述所有条件的序列视图,就称为非别名序列视图。他们之间的区别是:只有别名序列支持写入功能。 + +**所有序列视图包括别名序列目前均不支持触发器功能(Trigger)。** + +### 嵌套视图 + +用户可能想从一个现有的序列视图中选出若干序列,组成一个新的序列视图,就称之为嵌套视图。 + +**当前版本不支持嵌套视图功能**。 + +### IoTDB中对序列视图的一些约束 + +#### 限制1 序列视图必须依赖于一个或者若干个时间序列 + +一个序列视图有两种可能的存在形式: + +1. 它映射到一条时间序列 +2. 它由一条或若干条时间序列计算得来 + +前种存在形式已在前文举例,易于理解;而此处的后一种存在形式,则是因为序列视图允许计算逻辑的存在。 + +比如,用户在同一个锅炉安装了两个温度计,现在需要计算两个温度值的平均值作为测量结果。用户采集到的是如下两个序列:`root.db.d01.temperature01`、`root.db.d01.temperature02`。 + +此时,用户可以使用两个序列求平均值,作为视图中的一条序列:`root.db.d01.avg_temperature`。 + +该例子会3.1.2详细展开。 + +#### 限制2 非别名序列视图是只读的 + +不允许向非别名序列视图写入。 + +只有别名序列视图是支持写入的。 + +#### 限制3 不允许嵌套视图 + +不能选定现有序列视图中的某些列来创建序列视图,无论是直接的还是间接的。 + +本限制将在3.1.3给出示例。 + +#### 限制4 序列视图与时间序列不能重名 + +序列视图和时间序列都位于同一棵树下,所以他们不能重名。 + +任何一条序列的名称(路径)都应该是唯一确定的。 + +#### 限制5 序列视图与时间序列的时序数据共用,标签等元数据不共用 + +序列视图是指向时间序列的映射,所以它们完全共用时序数据,由时间序列负责持久化存储。 + +但是它们的tag、attributes等元数据不共用。 + +这是因为进行业务查询时,面向视图的用户关心的是当前视图的结构,而如果使用group by tag等方式做查询,显然希望是得到视图下含有对应tag的分组效果,而非时间序列的tag的分组效果(用户甚至对那些时间序列毫无感知)。 + +## 序列视图功能介绍 + +### 创建视图 + +创建一个序列视图与创建一条时间序列类似,区别在于需要通过AS关键字指定数据来源,即原始序列。 + +#### 创建视图的SQL + +用户可以选取一些序列创建一个视图: + +```SQL +CREATE VIEW root.view.device.status +AS + SELECT s01 + FROM root.db.device +``` + +它表示用户从现有设备`root.db.device`中选出了`s01`这条序列,创建了序列视图`root.view.device.status`。 + +序列视图可以与时间序列存在于同一实体下,例如: + +```SQL +CREATE VIEW root.db.device.status +AS + SELECT s01 + FROM root.db.device +``` + +这样,`root.db.device`下就有了`s01`的一份虚拟拷贝,但是使用不同的名字`status`。 + +可以发现,上述两个例子中的序列视图,都是别名序列,我们给用户提供一种针对该序列的更方便的创建方式: + +```SQL +CREATE VIEW root.view.device.status +AS + root.db.device.s01 +``` + +#### 创建含有计算逻辑的视图 + +沿用2.2章节限制1中的例子: + +> 用户在同一个锅炉安装了两个温度计,现在需要计算两个温度值的平均值作为测量结果。用户采集到的是如下两个序列:`root.db.d01.temperature01`、`root.db.d01.temperature02`。 +> +> 此时,用户可以使用两个序列求平均值,作为视图中的一条序列:`root.view.device01.avg_temperature`。 + +如果不使用视图,用户可以这样查询两个温度的平均值: + +```SQL +SELECT (temperature01 + temperature02) / 2 +FROM root.db.d01 +``` + +而如果使用序列视图,用户可以这样创建一个视图来简化将来的查询: + +```SQL +CREATE VIEW root.db.d01.avg_temperature +AS + SELECT (temperature01 + temperature02) / 2 + FROM root.db.d01 +``` + +然后用户可以这样查询: + +```SQL +SELECT avg_temperature FROM root.db.d01 +``` + +#### 不支持嵌套序列视图 + +继续沿用3.1.2中的例子,现在用户想使用序列视图`root.db.d01.avg_temperature`创建一个新的视图,这是不允许的。我们目前不支持嵌套视图,无论它是否是别名序列,都不支持。 + +比如下列SQL语句会报错: + +```SQL +CREATE VIEW root.view.device.avg_temp_copy +AS + root.db.d01.avg_temperature -- 不支持。不允许嵌套视图 +``` + +#### 一次创建多条序列视图 + +一次只能指定一个序列视图对用户来说使用不方便,则可以一次指定多条序列,比如: + +```SQL +CREATE VIEW root.db.device.status, root.db.device.sub.hardware +AS + SELECT s01, s02 + FROM root.db.device +``` + +此外,上述写法可以做简化: + +```SQL +CREATE VIEW root.db.device(status, sub.hardware) +AS + SELECT s01, s02 + FROM root.db.device +``` + +上述两条语句都等价于如下写法: + +```SQL +CREATE VIEW root.db.device.status +AS + SELECT s01 + FROM root.db.device; + +CREATE VIEW root.db.device.sub.hardware +AS + SELECT s02 + FROM root.db.device +``` + +也等价于如下写法 + +```SQL +CREATE VIEW root.db.device.status, root.db.device.sub.hardware +AS + root.db.device.s01, root.db.device.s02 + +-- 或者 + +CREATE VIEW root.db.device(status, sub.hardware) +AS + root.db.device(s01, s02) +``` + +##### 所有序列间的映射关系为静态存储 + +有时,SELECT子句中可能包含运行时才能确定的语句个数,比如如下的语句: + +```SQL +SELECT s01, s02 +FROM root.db.d01, root.db.d02 +``` + +上述语句能匹配到的序列数量是并不确定的,和系统状态有关。即便如此,用户也可以使用它创建视图。 + +不过需要特别注意,所有序列间的映射关系为静态存储(创建时固定)!请看以下示例: + +当前数据库中仅含有`root.db.d01.s01`、`root.db.d02.s01`、`root.db.d02.s02`三条序列,接着创建视图: + +```SQL +CREATE VIEW root.view.d(alpha, beta, gamma) +AS + SELECT s01, s02 + FROM 
root.db.d01, root.db.d02 +``` + +时间序列之间映射关系如下: + +| 序号 | 时间序列 | 序列视图 | +| ---- | ----------------- | ----------------- | +| 1 | `root.db.d01.s01` | root.view.d.alpha | +| 2 | `root.db.d02.s01` | root.view.d.beta | +| 3 | `root.db.d02.s02` | root.view.d.gamma | + +此后,用户新增了序列`root.db.d01.s02`,则它不对应到任何视图;接着,用户删除`root.db.d01.s01`,则查询`root.view.d.alpha`会直接报错,它也不会对应到`root.db.d01.s02`。 + +请时刻注意,序列间映射关系是静态地、固化地存储的。 + +#### 批量创建序列视图 + +现有若干个设备,每个设备都有一个温度数值,例如: + +1. root.db.d1.temperature +2. root.db.d2.temperature +3. ... + +这些设备下可能存储了很多其他序列(例如`root.db.d1.speed`),但目前可以创建一个视图,只包含这些设备的温度值,而不关系其他序列: + +```SQL +CREATE VIEW root.db.view(${2}_temperature) +AS + SELECT temperature FROM root.db.* +``` + +这里仿照了查询写回(`SELECT INTO`)对命名规则的约定,使用变量占位符来指定命名规则。可以参考:[查询写回(SELECT INTO)](../User-Manual/Query-Data.md#查询写回(INTO-子句)) + +这里`root.db.*.temperature`指定了有哪些时间序列会被包含在视图中;`${2}`则指定了从时间序列中的哪个节点提取出名字来命名序列视图。 + +此处,`${2}`指代的是`root.db.*.temperature`的层级2(从 0 开始),也就是`*`的匹配结果;`${2}_temperature`则是将匹配结果与`temperature`通过下划线拼接了起来,构成视图下各序列的节点名称。 + +上述创建视图的语句,和下列写法是等价的: + +```SQL +CREATE VIEW root.db.view(${2}_${3}) +AS + SELECT temperature from root.db.* +``` + +最终视图中含有这些序列: + +1. root.db.view.d1_temperature +2. root.db.view.d2_temperature +3. ... + +使用通配符创建,只会存储创建时刻的静态映射关系。 + +#### 创建视图时SELECT子句受到一定限制 + +创建序列视图时,使用的SELECT子句受到一定限制。主要限制如下: + +1. 不能使用`WHERE`子句。 +2. 不能使用`GROUP BY`子句。 +3. 不能使用`MAX_VALUE`等聚合函数。 + +简单来说,`AS`后只能使用`SELECT ... FROM ... `的结构,且该查询语句的结果必须能构成一条时间序列。 + +### 视图数据查询 + +对于可以支持的数据查询功能,在执行时序数据查询时,序列视图与时间序列可以无差别使用,行为完全一致。 + +**目前序列视图不支持的查询类型如下:** + +1. **align by device 查询** +2. **group by tags 查询** + +用户也可以在同一个SELECT语句中混合查询时间序列与序列视图,比如: + +```SQL +SELECT temperature01, temperature02, avg_temperature +FROM root.db.d01 +WHERE temperature01 < temperature02 +``` + +但是,如果用户想要查询序列的元数据,例如tag、attributes等,则查询到的是序列视图的结果,而并非序列视图所引用的时间序列的结果。 + +此外,对于别名序列,如果用户想要得到时间序列的tag、attributes等信息,则需要先查询视图列的映射,找到对应的时间序列,再向时间序列查询tag、attributes等信息。查询视图列的映射的方法将会在3.5部分说明。 + +### 视图修改 + +对视图的修改,例如改名、修改计算逻辑、删除等操作,都和创建新的视图类似,需要重新指定整个视图的全部列相关的描述。 + +#### 修改视图数据来源 + +```SQL +ALTER VIEW root.view.device.status +AS + SELECT s01 + FROM root.ln.wf.d01 +``` + +#### 修改视图的计算逻辑 + +```SQL +ALTER VIEW root.db.d01.avg_temperature +AS + SELECT (temperature01 + temperature02 + temperature03) / 3 + FROM root.db.d01 +``` + +#### 标签点管理 + +- 添加新的标签 + +```SQL +ALTER view root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` + +- 添加新的属性 + +```SQL +ALTER view root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` + +- 重命名标签或属性 + +```SQL +ALTER view root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` + +- 重新设置标签或属性的值 + +```SQL +ALTER view root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` + +- 删除已经存在的标签或属性 + +```SQL +ALTER view root.turbine.d1.s1 DROP tag1, tag2 +``` + +- 更新插入别名,标签和属性 + +> 如果该别名,标签或属性原来不存在,则插入,否则,用新值更新原来的旧值 + +```SQL +ALTER view root.turbine.d1.s1 UPSERT TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +#### 删除视图 + +因为一个视图就是一条序列,因此可以像删除时间序列一样删除一个视图。 + +```SQL +DELETE VIEW root.view.device.avg_temperatue +``` + +### 视图同步 + +序列视图的数据总是经由实时的查询获得,因此天然支持数据同步。 + +#### 如果依赖的原序列被删除了 + +当序列视图查询时(序列解析时),如果依赖的时间序列不存在,则**返回空结果集**。 + +这和查询一个不存在的序列的反馈类似,但是有区别:如果依赖的时间序列无法解析,空结果集是包含表头的,以此来提醒用户该视图是存在问题的。 + +此外,被依赖的时间序列删除时,不会去查找是否有依赖于该列的视图,用户不会收到任何警告。 + +#### 不支持非别名序列的数据写入 + +不支持向非别名序列的写入。 + +详情请参考前文 2.1.6 限制2 + +#### 序列的元数据不共用 + +详情请参考前文2.1.6 限制5 + +### 视图元数据查询 + +视图元数据查询,特指查询视图本身的元数据(例如视图有多少列),以及数据库内视图的信息(例如有哪些视图)。 + +#### 查看当前的视图列 + +用户有两种查询方式: + +1. 使用`SHOW TIMESERIES`进行查询,该查询既包含时间序列,也包含序列视图。但是只能显示视图的部分属性 +2. 
使用`SHOW VIEW`进行查询,该查询只包含序列视图。能完整显示序列视图的属性。 + +举例: + +```Shell +IoTDB> show timeseries; ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters|ViewType| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.device.s01 | null| root.db| INT32| RLE| SNAPPY|null| null| null| null| BASE| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.view.status | null| root.db| INT32| RLE| SNAPPY|null| null| null| null| VIEW| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.d01.temp01 | null| root.db| FLOAT| RLE| SNAPPY|null| null| null| null| BASE| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.d01.temp02 | null| root.db| FLOAT| RLE| SNAPPY|null| null| null| null| BASE| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +|root.db.d01.avg_temp| null| root.db| FLOAT| null| null|null| null| null| null| VIEW| ++--------------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+--------+ +Total line number = 5 +It costs 0.789s +IoTDB> +``` + +最后一列`ViewType`中显示了该序列的类型,时间序列为BASE,序列视图是VIEW。 + +此外,某些序列视图的属性会缺失,比如`root.db.d01.avg_temp`是由温度均值计算得来,所以`Encoding`和`Compression`属性都为空值。 + +此外,`SHOW TIMESERIES`语句的查询结果主要分为两部分: + +1. 时序数据的信息,例如数据类型,压缩方式,编码等 +2. 其他元数据信息,例如tag,attribute,所属database等 + +对于序列视图,展示的时序数据信息与其原始序列一致或者为空值(比如计算得到的平均温度有数据类型但是无压缩方式);展示的元数据信息则是视图的内容。 + +如果要得知视图的更多信息,需要使用`SHOW ``VIEW`。`SHOW ``VIEW`中展示视图的数据来源等。 + +```Shell +IoTDB> show VIEW root.**; ++--------------------+--------+--------+----+----------+--------+-----------------------------------------+ +| Timeseries|Database|DataType|Tags|Attributes|ViewType| SOURCE| ++--------------------+--------+--------+----+----------+--------+-----------------------------------------+ +|root.db.view.status | root.db| INT32|null| null| VIEW| root.db.device.s01| ++--------------------+--------+--------+----+----------+--------+-----------------------------------------+ +|root.db.d01.avg_temp| root.db| FLOAT|null| null| VIEW|(root.db.d01.temp01+root.db.d01.temp02)/2| ++--------------------+--------+--------+----+----------+--------+-----------------------------------------+ +Total line number = 2 +It costs 0.789s +IoTDB> +``` + +最后一列`SOURCE`显示了该序列视图的数据来源,列出了创建该序列的SQL语句。 + +##### 关于数据类型 + +上述两种查询都涉及视图的数据类型。视图的数据类型是根据定义视图的查询语句或别名序列的原始时间序列类型推断出来的。这个数据类型是根据当前系统的状态实时计算出来的,因此在不同时刻查询到的数据类型可能是改变的。 + +## FAQ + +#### Q1:我想让视图实现类型转换的功能。例如,原有一个int32类型的时间序列,和其他int64类型的序列被放在了同一个视图中。我现在希望通过视图查询到的数据,都能自动转换为int64类型。 + +> Ans:这不是序列视图的职能范围。但是可以使用`CAST`进行转换,比如: + +```SQL +CREATE VIEW root.db.device.int64_status +AS + SELECT CAST(s1, 'type'='INT64') from root.db.device +``` + +> 这样,查询`root.view.status`时,就会得到int64类型的结果。 +> +> 请特别注意,上述例子中,序列视图的数据是通过`CAST`转换得到的,因此`root.db.device.int64_status`并不是一条别名序列,也就**不支持写入**。 + +#### Q2:是否支持默认命名?选择若干时间序列,创建视图;但是我不指定每条序列的名字,由数据库自动命名? + +> Ans:不支持。用户必须明确指定命名。 + +#### Q3:在原有体系中,创建时间序列`root.db.device.s01`,可以发现自动创建了database`root.db`,自动创建了device`root.db.device`。接着删除时间序列`root.db.device.s01`,可以发现`root.db.device`被自动删除,`root.db`却还是保留的。对于创建视图,会沿用这一机制吗?出于什么考虑呢? 
+ +> Ans:保持原有的行为不变,引入视图功能不会改变原有的这些逻辑。 + +#### Q4:是否支持序列视图重命名? + +> A:当前版本不支持重命名,可以自行创建新名称的视图投入使用。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Maintennance.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Maintennance.md new file mode 100644 index 00000000..b1cb1941 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Maintennance.md @@ -0,0 +1,351 @@ + +# 运维语句 + +## Explain/Explain Analyze 语句 + +查询分析的意义在于帮助用户理解查询的执行机制和性能瓶颈,从而实现查询优化和性能提升。这不仅关乎到查询的执行效率,也直接影响到应用的用户体验和资源的有效利用。为了进行有效的查询分析,**IoTDB** **V1.3.2及以上版本**提供了查询分析语句:Explain 和 Explain Analyze。 + +- Explain 语句:允许用户预览查询 SQL 的执行计划,包括 IoTDB 如何组织数据检索和处理。 +- Explain Analyze 语句:在 Explain 语句基础上增加了性能分析,完整执行SQL并展示查询执行过程中的时间和资源消耗。为IoTDB用户深入理解查询详情以及进行查询优化提供了详细的相关信息。与其他常用的 IoTDB 排查手段相比,Explain Analyze 没有部署负担,同时能够针对单条 sql 进行分析,能够更好定位问题。各类方法对比如下: + +| 方法 | 安装难度 | 业务影响 | 功能范围 | +| :------------------ | :----------------------------------------------------------- | :--------------------------------------------------- | :----------------------------------------------------- | +| Explain Analyze语句 | 低。无需安装额外组件,为IoTDB内置SQL语句 | 低。只会影响当前分析的单条查询,对线上其他负载无影响 | 支持分布式,可支持对单条SQL进行追踪 | +| 监控面板 | 中。需要安装IoTDB监控面板工具(企业版工具),并开启IoTDB监控服务 | 中。IoTDB监控服务记录指标会带来额外耗时 | 支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | +| Arthas抽样 | 中。需要安装Java Arthas工具(部分内网无法直接安装Arthas,且安装后,有时需要重启应用) | 高。CPU 抽样可能会影响线上业务的响应速度 | 不支持分布式,仅支持对数据库整体查询负载和耗时进行分析 | + +### Explain 语句 + +#### 语法 + +Explain命令允许用户查看SQL查询的执行计划。执行计划以算子的形式展示,描述了IoTDB会如何执行查询。其语法如下,其中SELECT_STATEMENT是查询相关的SQL语句: + +```SQL +EXPLAIN +``` + +Explain的返回结果包括了数据访问策略、过滤条件是否下推以及查询计划在不同节点的分配等信息,为用户提供了一种手段,以可视化查询的内部执行逻辑。 + +#### 示例 + +```SQL +# 插入数据 +insert into root.explain.data(timestamp, column1, column2) values(1710494762, "hello", "explain") + +# 执行explain语句 +explain select * from root.explain.data +``` + +执行上方SQL,会得到如下结果。不难看出,IoTDB分别通过两个SeriesScan节点去获取column1和column2的数据,最后通过fullOuterTimeJoin将其连接。 + +```Plain ++-----------------------------------------------------------------------+ +| distribution plan| ++-----------------------------------------------------------------------+ +| ┌───────────────────┐ | +| │FullOuterTimeJoin-3│ | +| │Order: ASC │ | +| └───────────────────┘ | +| ┌─────────────────┴─────────────────┐ | +| │ │ | +|┌─────────────────────────────────┐ ┌─────────────────────────────────┐| +|│SeriesScan-4 │ │SeriesScan-5 │| +|│Series: root.explain.data.column1│ │Series: root.explain.data.column2│| +|│Partition: 3 │ │Partition: 3 │| +|└─────────────────────────────────┘ └─────────────────────────────────┘| ++-----------------------------------------------------------------------+ +``` + +### Explain Analyze 语句 + +#### 语法 + +Explain Analyze 是 IOTDB 查询引擎自带的性能分析 SQL,与 Explain 不同,它会执行对应的查询计划并统计执行信息,可以用于追踪一条查询的具体性能分布,用于对资源进行观察,进行性能调优与异常分析。其语法如下: + +```SQL +EXPLAIN ANALYZE [VERBOSE] +``` + +其中SELECT_STATEMENT对应需要分析的查询语句;VERBOSE为打印详细分析结果,不填写VERBOSE时EXPLAIN ANALYZE将会省略部分信息。 + +在EXPLAIN ANALYZE的结果集中,会包含如下信息: + +![explain-analyze-1.png](https://alioss.timecho.com/upload/explain-analyze-1.png) + +其中: + +- QueryStatistics包含查询层面进的统计信息,主要包含规划解析阶段耗时,Fragment元数据等信息。 +- FragmentInstance是IoTDB在一个节点上查询计划的封装,每一个节点都会在结果集中输出一份Fragment信息,主要包含FragmentStatistics和算子信息。FragmentStastistics包含Fragment的统计信息,包括总实际耗时(墙上时间),所涉及到的TsFile,调度信息等情况。在一个Fragment的信息输出同时会以节点树层级的方式展示该Fragment下计划节点的统计信息,主要包括:CPU运行时间、输出的数据行数、指定接口被调用的次数、所占用的内存、节点专属的定制信息。 + +#### 特别说明 + +1. 
Explain Analyze 语句的结果简化 + +由于在 Fragment 中会输出当前节点中执行的所有节点信息,当一次查询涉及的序列过多时,每个节点都被输出,会导致 Explain Analyze 返回的结果集过大,因此当相同类型的节点超过 10 个时,系统会自动合并当前 Fragment 下所有相同类型的节点,合并后统计信息也被累积,对于一些无法合并的定制信息会直接丢弃(如下图)。 + +![explain-analyze-2.png](https://alioss.timecho.com/upload/explain-analyze-2.png) + +用户也可以自行修改 iotdb-system.properties 中的配置项`merge_threshold_of_explain_analyze`来设置触发合并的节点阈值,该参数支持热加载。 + +2. 查询超时场景使用 Explain Analyze 语句 + +Explain Analyze 本身是一种特殊的查询,当执行超时的时候,无法使用Explain Analyze语句进行分析。为了在查询超时的情况下也可以通过分析结果排查超时原因,Explain Analyze还提供了**定时日志**机制(无需用户配置),每经过一定的时间间隔会将 Explain Analyze 的当前结果以文本的形式输出到专门的日志中。当查询超时时,用户可以前往logs/log_explain_analyze.log中查看对应的日志进行排查。 + +日志的时间间隔基于查询的超时时间进行计算,可以保证在超时的情况下至少会有两次的结果记录。 + +#### 示例 + +下面是Explain Analyze的一个例子: + +```SQL +# 插入数据 +insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494762, "hello", "explain", "analyze") +insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494862, "hello2", "explain2", "analyze2") +insert into root.explain.analyze.data(timestamp, column1, column2, column3) values(1710494962, "hello3", "explain3", "analyze3") + +# 执行explain analyze语句 +explain analyze select column2 from root.explain.analyze.data order by column1 +``` + +得到输出如下: + +```Plain ++-------------------------------------------------------------------------------------------------+ +| Explain Analyze| ++-------------------------------------------------------------------------------------------------+ +|Analyze Cost: 1.739 ms | +|Fetch Partition Cost: 0.940 ms | +|Fetch Schema Cost: 0.066 ms | +|Logical Plan Cost: 0.000 ms | +|Logical Optimization Cost: 0.000 ms | +|Distribution Plan Cost: 0.000 ms | +|Fragment Instances Count: 1 | +| | +|FRAGMENT-INSTANCE[Id: 20240315_115800_00030_1.2.0][IP: 127.0.0.1][DataRegion: 4][State: FINISHED]| +| Total Wall Time: 25 ms | +| Cost of initDataQuerySource: 0.175 ms | +| Seq File(unclosed): 0, Seq File(closed): 1 | +| UnSeq File(unclosed): 0, UnSeq File(closed): 0 | +| ready queued time: 0.280 ms, blocked queued time: 2.456 ms | +| [PlanNodeId 10]: IdentitySinkNode(IdentitySinkOperator) | +| CPU Time: 0.780 ms | +| output: 1 rows | +| HasNext() Called Count: 3 | +| Next() Called Count: 2 | +| Estimated Memory Size: : 1245184 | +| [PlanNodeId 5]: TransformNode(TransformOperator) | +| CPU Time: 0.764 ms | +| output: 1 rows | +| HasNext() Called Count: 3 | +| Next() Called Count: 2 | +| Estimated Memory Size: : 1245184 | +| [PlanNodeId 4]: SortNode(SortOperator) | +| CPU Time: 0.721 ms | +| output: 1 rows | +| HasNext() Called Count: 3 | +| Next() Called Count: 2 | +| sortCost/ns: 1125 | +| sortedDataSize: 272 | +| prepareCost/ns: 610834 | +| [PlanNodeId 3]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) | +| CPU Time: 0.706 ms | +| output: 1 rows | +| HasNext() Called Count: 5 | +| Next() Called Count: 1 | +| [PlanNodeId 7]: SeriesScanNode(SeriesScanOperator) | +| CPU Time: 1.085 ms | +| output: 1 rows | +| HasNext() Called Count: 2 | +| Next() Called Count: 1 | +| SeriesPath: root.explain.analyze.data.column2 | +| [PlanNodeId 8]: SeriesScanNode(SeriesScanOperator) | +| CPU Time: 1.091 ms | +| output: 1 rows | +| HasNext() Called Count: 2 | +| Next() Called Count: 1 | +| SeriesPath: root.explain.analyze.data.column1 | ++-------------------------------------------------------------------------------------------------+ +``` + +触发合并后的部分结果示例如下: + +```Plain +Analyze Cost: 143.679 ms +Fetch Partition Cost: 22.023 ms +Fetch Schema Cost: 63.086 ms +Logical Plan Cost: 0.000 ms +Logical 
Optimization Cost: 0.000 ms +Distribution Plan Cost: 0.000 ms +Fragment Instances Count: 2 + +FRAGMENT-INSTANCE[Id: 20240311_041502_00001_1.2.0][IP: 192.168.130.9][DataRegion: 14] + Total Wall Time: 39964 ms + Cost of initDataQuerySource: 1.834 ms + Seq File(unclosed): 0, Seq File(closed): 3 + UnSeq File(unclosed): 0, UnSeq File(closed): 0 + ready queued time: 504.334 ms, blocked queued time: 25356.419 ms + [PlanNodeId 20793]: IdentitySinkNode(IdentitySinkOperator) Count: * 1 + CPU Time: 24440.724 ms + input: 71216 rows + HasNext() Called Count: 35963 + Next() Called Count: 35962 + Estimated Memory Size: : 33882112 + [PlanNodeId 10385]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) Count: * 8 + CPU Time: 41437.708 ms + input: 243011 rows + HasNext() Called Count: 41965 + Next() Called Count: 41958 + Estimated Memory Size: : 33882112 + [PlanNodeId 11569]: SeriesScanNode(SeriesScanOperator) Count: * 1340 + CPU Time: 1397.822 ms + input: 134000 rows + HasNext() Called Count: 2353 + Next() Called Count: 1340 + Estimated Memory Size: : 32833536 + [PlanNodeId 20778]: ExchangeNode(ExchangeOperator) Count: * 7 + CPU Time: 109.245 ms + input: 71891 rows + HasNext() Called Count: 1431 + Next() Called Count: 1431 + +FRAGMENT-INSTANCE[Id: 20240311_041502_00001_1.3.0][IP: 192.168.130.9][DataRegion: 11] + Total Wall Time: 39912 ms + Cost of initDataQuerySource: 15.439 ms + Seq File(unclosed): 0, Seq File(closed): 2 + UnSeq File(unclosed): 0, UnSeq File(closed): 0 + ready queued time: 152.988 ms, blocked queued time: 37775.356 ms + [PlanNodeId 20786]: IdentitySinkNode(IdentitySinkOperator) Count: * 1 + CPU Time: 2020.258 ms + input: 48800 rows + HasNext() Called Count: 978 + Next() Called Count: 978 + Estimated Memory Size: : 42336256 + [PlanNodeId 20771]: FullOuterTimeJoinNode(FullOuterTimeJoinOperator) Count: * 8 + CPU Time: 5255.307 ms + input: 195800 rows + HasNext() Called Count: 2455 + Next() Called Count: 2448 + Estimated Memory Size: : 42336256 + [PlanNodeId 11867]: SeriesScanNode(SeriesScanOperator) Count: * 1680 + CPU Time: 1248.080 ms + input: 168000 rows + HasNext() Called Count: 3198 + Next() Called Count: 1680 + Estimated Memory Size: : 41287680 + +...... +``` + +### 常见问题 + +#### WALL TIME(墙上时间)和 CPU TIME(CPU时间)的区别? + +CPU 时间也称为处理器时间或处理器使用时间,指的是程序在执行过程中实际占用 CPU 进行计算的时间,显示的是程序实际消耗的处理器资源。 + +墙上时间也称为实际时间或物理时间,指的是从程序开始执行到程序结束的总时间,包括了所有等待时间。 + +1. WALL TIME < CPU TIME 的场景:比如一个查询分片最后被调度器使用两个线程并行执行,真实物理世界上是 10s 过去了,但两个线程,可能一直占了两个 cpu 核跑了 10s,那 cpu time 就是 20s,wall time就是 10s +2. WALL TIME > CPU TIME 的场景:因为系统内可能会存在多个查询并行执行,但查询的执行线程数和内存是固定的, + 1. 所以当查询分片被某些资源阻塞住时(比如没有足够的内存进行数据传输、或等待上游数据)就会放入Blocked Queue,此时查询分片并不会占用 CPU TIME,但WALL TIME(真实物理时间的时间)是在向前流逝的 + 2. 或者当查询线程资源不足时,比如当前共有16个查询线程,但系统内并发有20个查询分片,即使所有查询都没有被阻塞,也只会同时并行运行16个查询分片,另外四个会被放入 READY QUEUE,等待被调度执行,此时查询分片并不会占用 CPU TIME,但WALL TIME(真实物理时间的时间)是在向前流逝的 + +#### Explain Analyze 是否有额外开销,测出的耗时是否与查询真实执行时有差别? + +几乎没有,因为 explain analyze operator 是单独的线程执行,收集原查询的统计信息,且这些统计信息,即使不explain analyze,原来的查询也会生成,只是没有人去取。并且 explain analyze 是纯 next 遍历结果集,不会打印,所以与原来查询真实执行时的耗时不会有显著差别。 + +#### IO 耗时主要关注几个指标? + +可能涉及 IO 耗时的主要有个指标,loadTimeSeriesMetadataDiskSeqTime, loadTimeSeriesMetadataDiskUnSeqTime 以及 construct[NonAligned/Aligned]ChunkReadersDiskTime + +TimeSeriesMetadata 的加载分别统计了顺序和乱序文件,但 Chunk 的读取暂时未分开统计,但顺乱序比例可以通过TimeseriesMetadata 顺乱序的比例计算出来。 + +#### 乱序数据对查询性能的影响能否有一些指标展示出来? + +乱序数据产生的影响主要有两个: + +1. 需要在内存中多做一个归并排序(一般认为这个耗时是比较短的,毕竟是纯内存的 cpu 操作) +2. 乱序数据会产生数据块间的时间范围重叠,导致统计信息无法使用 + 1. 无法利用统计信息直接 skip 掉整个不满足值过滤要求的 chunk + 1. 一般用户的查询都是只包含时间过滤条件,则不会有影响 + 2. 
无法利用统计信息直接计算出聚合值,无需读取数据 + +单独对于乱序数据的性能影响,目前并没有有效的观测手段,除非就是在有乱序数据的时候,执行一遍查询耗时,然后等乱序合完了,再执行一遍,才能对比出来。 + +因为即使乱序这部分数据进了顺序,也是需要 IO、加压缩、decode,这个耗时少不了,不会因为乱序数据被合并进乱序了,就减少了。 + +#### 执行 explain analyze 时,查询超时后,为什么结果没有输出在 log_explain_analyze.log 中? + +升级时,只替换了 lib 包,没有替换 conf/logback-datanode.xml,需要替换一下 conf/logback-datanode.xml,然后不需要重启(该文件内容可以被热加载),大约等待 1 分钟后,重新执行 explain analyze verbose。 + +### 实战案例 + +#### 案例一:查询涉及文件数量过多,磁盘IO成为瓶颈,导致查询速度变慢 + +![explain-analyze-3.png](https://alioss.timecho.com/upload/explain-analyze-3.png) + +查询总耗时为 938 ms,其中从文件中读取索引区和数据区的耗时占据 918 ms,涉及了总共 289 个文件,假设查询涉及 N 个 TsFile,磁盘单次 seek 耗时为 t_seek,读取文件尾部索引的耗时为 t_index,读取文件的数据块的耗时为 t_chunk,则首次查询(未命中缓存)的理论耗时为 `cost = N * (t_seek + t_index + t_seek + t_chunk)`,按经验值,HDD 磁盘的一次 seek 耗时约为 5-10ms,所以查询涉及的文件数越多,查询延迟会越大。 + +最终优化方案为: + +1. 调整合并参数,降低文件数量 +2. 更换 HDD 为 SSD,降低磁盘单次 IO 的延迟 + +#### 案例二:like 谓词执行慢导致查询超时 + +执行如下 sql 时,查询超时(默认超时时间为 60s) + +```SQL +select count(s1) as total from root.db.d1 where s1 like '%XXXXXXXX%' +``` + +执行 explain analyze verbose 时,即使查询超时,也会每隔 15s,将阶段性的采集结果输出到 log_explain_analyze.log 中,从 log_explain_analyze.log 中得到最后两次的输出结果如下: + +![explain-analyze-4.png](https://alioss.timecho.com/upload/explain-analyze-4.png) + +![explain-analyze-5.png](https://alioss.timecho.com/upload/explain-analyze-5.png) + +观察结果,我们发现是因为查询未加时间条件,涉及的数据太多 constructAlignedChunkReadersDiskTime 和 pageReadersDecodeAlignedDiskTime的耗时一直在涨,意味着一直在读新的 chunk。但 AlignedSeriesScanNode 的输出信息一直是 0,这是因为算子只有在输出至少一行满足条件的数据时,才会让出时间片,并更新信息。从总的读取耗时(loadTimeSeriesMetadataAlignedDiskSeqTime + loadTimeSeriesMetadataAlignedDiskUnSeqTime + constructAlignedChunkReadersDiskTime + pageReadersDecodeAlignedDiskTime=约13.4秒)来看,其他耗时(60s - 13.4 = 46.6)应该都是在执行过滤条件上(like 谓词的执行很耗时)。 + +最终优化方案为:增加时间过滤条件,避免全表扫描 + +## Start/Stop Repair Data 语句 +用于修复由于系统 bug 导致的乱序 +### START REPAIR DATA + +启动一个数据修复任务,扫描创建修复任务的时间之前产生的 tsfile 文件并修复有乱序错误的文件。 + +```sql +IoTDB> START REPAIR DATA +IoTDB> START REPAIR DATA ON LOCAL +IoTDB> START REPAIR DATA ON CLUSTER +``` + +### STOP REPAIR DATA + +停止一个进行中的修复任务。如果需要再次恢复一个已停止的数据修复任务的进度,可以重新执行 `START REPAIR DATA`. 
+ +```sql +IoTDB> STOP REPAIR DATA +IoTDB> STOP REPAIR DATA ON LOCAL +IoTDB> STOP REPAIR DATA ON CLUSTER +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Streaming_apache.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Streaming_apache.md new file mode 100644 index 00000000..bef07327 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Streaming_apache.md @@ -0,0 +1,817 @@ + + +# IoTDB 流处理框架 + +IoTDB 流处理框架允许用户实现自定义的流处理逻辑,可以实现对存储引擎变更的监听和捕获、实现对变更数据的变形、实现对变形后数据的向外推送等逻辑。 + +我们将一个数据流处理任务称为 Pipe。一个流处理任务(Pipe)包含三个子任务: + +- 抽取(Source) +- 处理(Process) +- 发送(Sink) + +流处理框架允许用户使用 Java 语言自定义编写三个子任务的处理逻辑,通过类似 UDF 的方式处理数据。 +在一个 Pipe 中,上述的三个子任务分别由三种插件执行实现,数据会依次经过这三个插件进行处理: +Pipe Source 用于抽取数据,Pipe Processor 用于处理数据,Pipe Sink 用于发送数据,最终数据将被发至外部系统。 + +**Pipe 任务的模型如下:** + +![任务模型图](https://alioss.timecho.com/docs/img/1706697228308.jpg) + +描述一个数据流处理任务,本质就是描述 Pipe Source、Pipe Processor 和 Pipe Sink 插件的属性。 +用户可以通过 SQL 语句声明式地配置三个子任务的具体属性,通过组合不同的属性,实现灵活的数据 ETL 能力。 + +利用流处理框架,可以搭建完整的数据链路来满足端*边云同步、异地灾备、读写负载分库*等需求。 + +## 自定义流处理插件开发 + +### 编程开发依赖 + +推荐采用 maven 构建项目,在`pom.xml`中添加以下依赖。请注意选择和 IoTDB 服务器版本相同的依赖版本。 + +```xml + + org.apache.iotdb + pipe-api + 1.3.1 + provided + +``` + +### 事件驱动编程模型 + +流处理插件的用户编程接口设计,参考了事件驱动编程模型的通用设计理念。事件(Event)是用户编程接口中的数据抽象,而编程接口与具体的执行方式解耦,只需要专注于描述事件(数据)到达系统后,系统期望的处理方式即可。 + +在流处理插件的用户编程接口中,事件是数据库数据写入操作的抽象。事件由单机流处理引擎捕获,按照流处理三个阶段的流程,依次传递至 PipeSource 插件,PipeProcessor 插件和 PipeSink 插件,并依次在三个插件中触发用户逻辑的执行。 + +为了兼顾端侧低负载场景下的流处理低延迟和端侧高负载场景下的流处理高吞吐,流处理引擎会动态地在操作日志和数据文件中选择处理对象,因此,流处理的用户编程接口要求用户提供下列两类事件的处理逻辑:操作日志写入事件 TabletInsertionEvent 和数据文件写入事件 TsFileInsertionEvent。 + +#### **操作日志写入事件(TabletInsertionEvent)** + +操作日志写入事件(TabletInsertionEvent)是对用户写入请求的高层数据抽象,它通过提供统一的操作接口,为用户提供了操纵写入请求底层数据的能力。 + +对于不同的数据库部署方式,操作日志写入事件对应的底层存储结构是不一样的。对于单机部署的场景,操作日志写入事件是对写前日志(WAL)条目的封装;对于分布式部署的场景,操作日志写入事件是对单个节点共识协议操作日志条目的封装。 + +对于数据库不同写入请求接口生成的写入操作,操作日志写入事件对应的请求结构体的数据结构也是不一样的。IoTDB 提供了 InsertRecord、InsertRecords、InsertTablet、InsertTablets 等众多的写入接口,每一种写入请求都使用了完全不同的序列化方式,生成的二进制条目也不尽相同。 + +操作日志写入事件的存在,为用户提供了一种统一的数据操作视图,它屏蔽了底层数据结构的实现差异,极大地降低了用户的编程门槛,提升了功能的易用性。 + +```java +/** TabletInsertionEvent is used to define the event of data insertion. */ +public interface TabletInsertionEvent extends Event { + + /** + * The consumer processes the data row by row and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processRowByRow(BiConsumer consumer); + + /** + * The consumer processes the Tablet directly and collects the results by RowCollector. 
+ * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processTablet(BiConsumer consumer); +} +``` + +#### **数据文件写入事件(TsFileInsertionEvent)** + +数据文件写入事件(TsFileInsertionEvent) 是对数据库文件落盘操作的高层抽象,它是若干操作日志写入事件(TabletInsertionEvent)的数据集合。 + +IoTDB 的存储引擎是 LSM 结构的。数据写入时会先将写入操作落盘到日志结构的文件里,同时将写入数据保存在内存里。当内存达到控制上限,则会触发刷盘行为,即将内存中的数据转换为数据库文件,同时删除之前预写的操作日志。当内存中的数据转换为数据库文件中的数据时,会经过编码压缩和通用压缩两次压缩处理,因此数据库文件的数据相比内存中的原始数据占用的空间更少。 + +在极端的网络情况下,直接传输数据文件相比传输数据写入的操作要更加经济,它会占用更低的网络带宽,能实现更快的传输速度。当然,天下没有免费的午餐,对文件中的数据进行计算处理,相比直接对内存中的数据进行计算处理时,需要额外付出文件 I/O 的代价。但是,正是磁盘数据文件和内存写入操作两种结构各有优劣的存在,给了系统做动态权衡调整的机会,也正是基于这样的观察,插件的事件模型中才引入了数据文件写入事件。 + +综上,数据文件写入事件出现在流处理插件的事件流中,存在下面两种情况: + +(1)历史数据抽取:一个流处理任务开始前,所有已经落盘的写入数据都会以 TsFile 的形式存在。一个流处理任务开始后,采集历史数据时,历史数据将以 TsFileInsertionEvent 作为抽象; + +(2)实时数据抽取:一个流处理任务进行时,当数据流中实时处理操作日志写入事件的速度慢于写入请求速度一定进度之后,未来得及处理的操作日志写入事件会被被持久化至磁盘,以 TsFile 的形式存在,这一些数据被流处理引擎抽取到后,会以 TsFileInsertionEvent 作为抽象。 + +```java +/** + * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks, + * which is compressed and encoded, and requires IO cost for computational processing. + */ +public interface TsFileInsertionEvent extends Event { + + /** + * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents. + * + * @return {@code Iterable} the list of TabletInsertionEvent + */ + Iterable toTabletInsertionEvents(); +} +``` + +### 自定义流处理插件编程接口定义 + +基于自定义流处理插件编程接口,用户可以轻松编写数据抽取插件、数据处理插件和数据发送插件,从而使得流处理功能灵活适配各种工业场景。 + +#### 数据抽取插件接口 + +数据抽取是流处理数据从数据抽取到数据发送三阶段的第一阶段。数据抽取插件(PipeSource)是流处理引擎和存储引擎的桥梁,它通过监听存储引擎的行为, +捕获各种数据写入事件。 + +```java +/** + * PipeSource + * + *

PipeSource is responsible for capturing events from sources. + * + *

Various data sources can be supported by implementing different PipeSource classes. + * + *

The lifecycle of a PipeSource is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH SOURCE` clause in SQL are + * parsed and the validation method {@link PipeSource#validate(PipeParameterValidator)} will + * be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} will be called to + * config the runtime behavior of the PipeSource. + *
  • Then the method {@link PipeSource#start()} will be called to start the PipeSource. + *
  • While the collaboration task is in progress, the method {@link PipeSource#supply()} will be + * called to capture events from sources and then the events will be passed to the + * PipeProcessor. + *
  • The method {@link PipeSource#close()} will be called when the collaboration task is + * cancelled (the `DROP PIPE` command is executed). + *
+ */ +public interface PipeSource extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeSource. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeSourceRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link PipeSource#validate(PipeParameterValidator)} + * is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeSource + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeSourceRuntimeConfiguration configuration) + throws Exception; + + /** + * Start the Source. After this method is called, events should be ready to be supplied by + * {@link PipeSource#supply()}. This method is called after {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. + * + * @throws Exception the user can throw errors if necessary + */ + void start() throws Exception; + + /** + * Supply single event from the Source and the caller will send the event to the processor. + * This method is called after {@link PipeSource#start()} is called. + * + * @return the event to be supplied. the event may be null if the Source has no more events at + * the moment, but the Source is still running for more events. + * @throws Exception the user can throw errors if necessary + */ + Event supply() throws Exception; +} +``` + +#### 数据处理插件接口 + +数据处理是流处理数据从数据抽取到数据发送三阶段的第二阶段。数据处理插件(PipeProcessor)主要用于过滤和转换由数据抽取插件(PipeSource)捕获的 +各种事件。 + +```java +/** + * PipeProcessor + * + *

PipeProcessor is used to filter and transform the Event formed by the PipeSource. + * + *

The lifecycle of a PipeProcessor is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are + * parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called + * to config the runtime behavior of the PipeProcessor. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeSource captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeSink. The + * following 3 methods will be called: {@link + * PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link + * PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link + * PipeProcessor#process(Event, EventCollector)}. + *
    • PipeSink serializes the events into binaries and send them to sinks. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeProcessor#close() } method will be called. + *
+ */ +public interface PipeProcessor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeProcessor. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeProcessorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the + * events processing. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeProcessor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is called to process the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) + throws Exception; + + /** + * This method is called to process the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + default void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) + throws Exception { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + process(tabletInsertionEvent, eventCollector); + } + } + + /** + * This method is called to process the Event. + * + * @param event Event to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(Event event, EventCollector eventCollector) throws Exception; +} +``` + +#### 数据发送插件接口 + +数据发送是流处理数据从数据抽取到数据发送三阶段的第三阶段。数据发送插件(PipeSink)主要用于发送经由数据处理插件(PipeProcessor)处理过后的 +各种事件,它作为流处理框架的网络实现层,接口上应允许接入多种实时通信协议和多种连接器。 + +```java +/** + * PipeSink + * + *

PipeSink is responsible for sending events to sinks. + * + *

Various network protocols can be supported by implementing different PipeSink classes. + * + *

The lifecycle of a PipeSink is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH SINK` clause in SQL are + * parsed and the validation method {@link PipeSink#validate(PipeParameterValidator)} will be + * called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link PipeSink#customize(PipeParameters, + * PipeSinkRuntimeConfiguration)} will be called to config the runtime behavior of the + * PipeSink and the method {@link PipeSink#handshake()} will be called to create a connection + * with sink. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeSource captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeSink. + *
    • PipeSink serializes the events into binaries and send them to sinks. The following 3 + * methods will be called: {@link PipeSink#transfer(TabletInsertionEvent)}, {@link + * PipeSink#transfer(TsFileInsertionEvent)} and {@link PipeSink#transfer(Event)}. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeSink#close() } method will be called. + *
+ * + *

In addition, the method {@link PipeSink#heartbeat()} will be called periodically to check + * whether the connection with sink is still alive. The method {@link PipeSink#handshake()} will be + * called to create a new connection with the sink when the method {@link PipeSink#heartbeat()} + * throws exceptions. + */ +public interface PipeSink extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeSink. In this method, the user can do the following + * things: + * + *

    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeSinkRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link PipeSink#validate(PipeParameterValidator)} is + * called and before the method {@link PipeSink#handshake()} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeSink + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeSinkRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is used to create a connection with sink. This method will be called after the + * method {@link PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called or + * will be called when the method {@link PipeSink#heartbeat()} throws exceptions. + * + * @throws Exception if the connection is failed to be created + */ + void handshake() throws Exception; + + /** + * This method will be called periodically to check whether the connection with sink is still + * alive. + * + * @throws Exception if the connection dies + */ + void heartbeat() throws Exception; + + /** + * This method is used to transfer the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; + + /** + * This method is used to transfer the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + default void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception { + try { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + transfer(tabletInsertionEvent); + } + } finally { + tsFileInsertionEvent.close(); + } + } + + /** + * This method is used to transfer the generic events, including HeartbeatEvent. 
+ * + * @param event Event to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(Event event) throws Exception; +} +``` + +## 自定义流处理插件管理 + +为了保证用户自定义插件在实际生产中的灵活性和易用性,系统还需要提供对插件进行动态统一管理的能力。 +本章节介绍的流处理插件管理语句提供了对插件进行动态统一管理的入口。 + +### 加载插件语句 + +在 IoTDB 中,若要在系统中动态载入一个用户自定义插件,则首先需要基于 PipeSource、 PipeProcessor 或者 PipeSink 实现一个具体的插件类, +然后需要将插件类编译打包成 jar 可执行文件,最后使用加载插件的管理语句将插件载入 IoTDB。 + +加载插件的管理语句的语法如图所示。 + +```sql +CREATE PIPEPLUGIN [IF NOT EXISTS] <别名> +AS <全类名> +USING +``` + +**IF NOT EXISTS 语义**:用于创建操作中,确保当指定 Pipe Plugin 不存在时,执行创建命令,防止因尝试创建已存在的 Pipe Plugin 而导致报错。 + +例如,用户实现了一个全类名为 edu.tsinghua.iotdb.pipe.ExampleProcessor 的数据处理插件, +打包后的 jar 资源包存放到了 https://example.com:8080/iotdb/pipe-plugin.jar 上,用户希望在流处理引擎中使用这个插件, +将插件标记为 example。那么,这个数据处理插件的创建语句如图所示。 + +```sql +CREATE PIPEPLUGIN IF NOT EXISTS example +AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' +USING URI '' +``` + +### 删除插件语句 + +当用户不再想使用一个插件,需要将插件从系统中卸载时,可以使用如图所示的删除插件语句。 + +```sql +DROP PIPEPLUGIN [IF EXISTS] <别名> +``` + +**IF EXISTS 语义**:用于删除操作中,确保当指定 Pipe Plugin 存在时,执行删除命令,防止因尝试删除不存在的 Pipe Plugin 而导致报错。 + +### 查看插件语句 + +用户也可以按需查看系统中的插件。查看插件的语句如图所示。 + +```sql +SHOW PIPEPLUGINS +``` + +## 系统预置的流处理插件 + +### 预置 source 插件 + +#### iotdb-source + +作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 + + +| key | value | value 取值范围 | required or optional with default | +|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|-----------------------------------| +| source | iotdb-source | String: iotdb-source | required | +| source.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | +| source.history.start-time | 抽取的历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| source.history.end-time | 抽取的历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| start-time(V1.3.1+) | start of synchronizing all data event time,including start-time. Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| end-time(V1.3.1+) | end of synchronizing all data event time,including end-time. 
Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | + +> 🚫 **source.pattern 参数说明** +> +> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * 在底层实现中,当检测到 pattern 为 root(默认值)时,抽取效率较高,其他任意格式都将降低性能 +> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'source.pattern'='root.aligned.1' 的 pipe 时: + > + > * root.aligned.1TS +> * root.aligned.1TS.\`1\` +> * root.aligned.100T + > + > 的数据会被抽取; + > + > * root.aligned.\`1\` +> * root.aligned.\`123\` + > + > 的数据不会被抽取。 + +> ❗️**source.history 的 start-time,end-time 参数说明** +> +> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 + +> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> +> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 +> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> +> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 + +> 💎 **iotdb-source 的工作可以拆分成两个阶段** +> +> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 +> 2. 实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> +> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** + +### 预置 processor 插件 + +#### do-nothing-processor + +作用:不对 source 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +|-----------|----------------------|------------------------------|-----------------------------------| +| processor | do-nothing-processor | String: do-nothing-processor | required | + +### 预置 sink 插件 + +#### do-nothing-sink + +作用:不对 processor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +|------|-----------------|-------------------------|-----------------------------------| +| sink | do-nothing-sink | String: do-nothing-sink | required | + +## 流处理任务管理 + +### 创建流处理任务 + +使用 `CREATE PIPE` 语句来创建流处理任务。以数据同步流处理任务的创建为例,示例 SQL 语句如下: + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +WITH SOURCE ( + -- 默认的 IoTDB 数据抽取插件 + 'source' = 'iotdb-source', + -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + 'source.pattern' = 'root.timecho', + -- 描述被抽取的历史数据的时间范围,表示最早时间 + 'source.history.start-time' = '2011.12.03T10:15:30+01:00', + -- 描述被抽取的历史数据的时间范围,表示最晚时间 + 'source.history.end-time' = '2022.12.03T10:15:30+01:00', +) +WITH PROCESSOR ( + -- 默认的数据处理插件,即不做任何处理 + 'processor' = 'do-nothing-processor', +) +WITH SINK ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'sink' = 'iotdb-thrift-sink', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'sink.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'sink.port' = '6667', +) +``` + +**创建流处理任务时需要配置 PipeId 以及三个插件部分的参数:** + + +| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +|-----------|--------------------------------|---------------------------|----------------------|------------------------------|--------------------------| +| PipeId | 全局唯一标定一个流处理任务的名称 | 必填 | - | - | - | +| source | Pipe Source 插件,负责在数据库底层抽取流处理数据 | 选填 | iotdb-source | 将数据库的全量历史数据和后续到达的实时数据接入流处理任务 | 否 | +| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | +| sink | Pipe Sink 插件,负责发送数据 | 必填 | - | - | | + +示例中,使用了 iotdb-source、do-nothing-processor 和 iotdb-thrift-sink 插件构建数据流处理任务。IoTDB 还内置了其他的流处理插件,**请查看“系统预置流处理插件”一节**。 + +**一个最简的 CREATE PIPE 语句示例如下:** + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +WITH SINK ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'sink' = 
'iotdb-thrift-sink', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'sink.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'sink.port' = '6667', +) +``` + +其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 + +**注意:** + +- SOURCE 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 +- SINK 为必填配置,需要在 CREATE PIPE 语句中声明式配置 +- SINK 具备自复用能力。对于不同的流处理任务,如果他们的 SINK 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 SINK 实例**,以实现对连接资源的复用。 + + - 例如,有下面 pipe1, pipe2 两个流处理任务的声明: + + ```sql + CREATE PIPE pipe1 + WITH SINK ( + 'sink' = 'iotdb-thrift-sink', + 'sink.ip' = 'localhost', + 'sink.port' = '9999', + ) + + CREATE PIPE pipe2 + WITH SINK ( + 'sink' = 'iotdb-thrift-sink', + 'sink.port' = '9999', + 'sink.ip' = 'localhost', + ) + ``` + + - 因为它们对 SINK 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 SINK 进行复用,最终 pipe1, pipe2 的 SINK 将会是同一个实例。 +- 请不要构建出包含数据循环同步的应用场景(会导致无限循环): + + - IoTDB A -> IoTDB B -> IoTDB A + - IoTDB A -> IoTDB A + +### 启动流处理任务 + +CREATE PIPE 语句成功执行后,流处理任务相关实例会被创建,但整个流处理任务的运行状态会被置为 STOPPED(V1.3.0),即流处理任务不会立刻处理数据。在 1.3.1 及以上的版本,流处理任务的运行状态在创建后将被立即置为 RUNNING。 + +可以使用 START PIPE 语句使流处理任务开始处理数据: + +```sql +START PIPE +``` + +### 停止流处理任务 + +使用 STOP PIPE 语句使流处理任务停止处理数据: + +```sql +STOP PIPE +``` + +### 删除流处理任务 + +使用 DROP PIPE 语句使流处理任务停止处理数据(当流处理任务状态为 RUNNING 时),然后删除整个流处理任务流处理任务: + +```sql +DROP PIPE +``` + +用户在删除流处理任务前,不需要执行 STOP 操作。 + +### 展示流处理任务 + +使用 SHOW PIPES 语句查看所有流处理任务: + +```sql +SHOW PIPES +``` + +查询结果如下: + +```sql ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +| ID| CreationTime| State|PipeSource|PipeProcessor|PipeSink|ExceptionMessage| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| {}| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +``` + +可以使用 `` 指定想看的某个流处理任务状态: + +```sql +SHOW PIPE +``` + +您也可以通过 where 子句,判断某个 \ 使用的 Pipe Sink 被复用的情况。 + +```sql +SHOW PIPES +WHERE SINK USED BY +``` + +### 流处理任务运行状态迁移 + +一个流处理 pipe 在其生命周期中会经过多种状态: + +- **RUNNING:** pipe 正在正常工作 + - 当一个 pipe 被成功创建之后,其初始状态为工作状态(V1.3.1+) +- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: + - 当一个 pipe 被成功创建之后,其初始状态为暂停状态(V1.3.0) + - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED + - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED +- **DROPPED:** pipe 任务被永久删除 + +下图表明了所有状态以及状态的迁移: + +![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +## 权限管理 + +### 流处理任务 + + +| 权限名称 | 描述 | +|----------|---------------| +| USE_PIPE | 注册流处理任务。路径无关。 | +| USE_PIPE | 开启流处理任务。路径无关。 | +| USE_PIPE | 停止流处理任务。路径无关。 | +| USE_PIPE | 卸载流处理任务。路径无关。 | +| USE_PIPE | 查询流处理任务。路径无关。 | + +### 流处理任务插件 + + +| 权限名称 | 描述 | +|:---------|-----------------| +| USE_PIPE | 注册流处理任务插件。路径无关。 | +| USE_PIPE | 卸载流处理任务插件。路径无关。 | +| USE_PIPE | 查询流处理任务插件。路径无关。 | + +## 配置参数 + +在 iotdb-system.properties 中: + +V1.3.0: +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. 
+# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 + +# The maximum number of selectors that can be used in the async connector. +# pipe_async_connector_selector_number=1 + +# The core number of clients that can be used in the async connector. +# pipe_async_connector_core_client_number=8 + +# The maximum number of clients that can be used in the async connector. +# pipe_async_connector_max_client_number=16 +``` + +V1.3.1+: +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. 
+# pipe_sink_max_client_number=16 +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Streaming_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Streaming_timecho.md new file mode 100644 index 00000000..0386f4e6 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Streaming_timecho.md @@ -0,0 +1,862 @@ + + +# IoTDB 流处理框架 + +IoTDB 流处理框架允许用户实现自定义的流处理逻辑,可以实现对存储引擎变更的监听和捕获、实现对变更数据的变形、实现对变形后数据的向外推送等逻辑。 + +我们将一个数据流处理任务称为 Pipe。一个流处理任务(Pipe)包含三个子任务: + +- 抽取(Source) +- 处理(Process) +- 发送(Sink) + +流处理框架允许用户使用 Java 语言自定义编写三个子任务的处理逻辑,通过类似 UDF 的方式处理数据。 +在一个 Pipe 中,上述的三个子任务分别由三种插件执行实现,数据会依次经过这三个插件进行处理: +Pipe Source 用于抽取数据,Pipe Processor 用于处理数据,Pipe Sink 用于发送数据,最终数据将被发至外部系统。 + +**Pipe 任务的模型如下:** + +![任务模型图](https://alioss.timecho.com/docs/img/1706697228308.jpg) + +描述一个数据流处理任务,本质就是描述 Pipe Source、Pipe Processor 和 Pipe Sink 插件的属性。 +用户可以通过 SQL 语句声明式地配置三个子任务的具体属性,通过组合不同的属性,实现灵活的数据 ETL 能力。 + +利用流处理框架,可以搭建完整的数据链路来满足端*边云同步、异地灾备、读写负载分库*等需求。 + +## 自定义流处理插件开发 + +### 编程开发依赖 + +推荐采用 maven 构建项目,在`pom.xml`中添加以下依赖。请注意选择和 IoTDB 服务器版本相同的依赖版本。 + +```xml + + org.apache.iotdb + pipe-api + 1.3.1 + provided + +``` + +### 事件驱动编程模型 + +流处理插件的用户编程接口设计,参考了事件驱动编程模型的通用设计理念。事件(Event)是用户编程接口中的数据抽象,而编程接口与具体的执行方式解耦,只需要专注于描述事件(数据)到达系统后,系统期望的处理方式即可。 + +在流处理插件的用户编程接口中,事件是数据库数据写入操作的抽象。事件由单机流处理引擎捕获,按照流处理三个阶段的流程,依次传递至 PipeSource 插件,PipeProcessor 插件和 PipeSink 插件,并依次在三个插件中触发用户逻辑的执行。 + +为了兼顾端侧低负载场景下的流处理低延迟和端侧高负载场景下的流处理高吞吐,流处理引擎会动态地在操作日志和数据文件中选择处理对象,因此,流处理的用户编程接口要求用户提供下列两类事件的处理逻辑:操作日志写入事件 TabletInsertionEvent 和数据文件写入事件 TsFileInsertionEvent。 + +#### **操作日志写入事件(TabletInsertionEvent)** + +操作日志写入事件(TabletInsertionEvent)是对用户写入请求的高层数据抽象,它通过提供统一的操作接口,为用户提供了操纵写入请求底层数据的能力。 + +对于不同的数据库部署方式,操作日志写入事件对应的底层存储结构是不一样的。对于单机部署的场景,操作日志写入事件是对写前日志(WAL)条目的封装;对于分布式部署的场景,操作日志写入事件是对单个节点共识协议操作日志条目的封装。 + +对于数据库不同写入请求接口生成的写入操作,操作日志写入事件对应的请求结构体的数据结构也是不一样的。IoTDB 提供了 InsertRecord、InsertRecords、InsertTablet、InsertTablets 等众多的写入接口,每一种写入请求都使用了完全不同的序列化方式,生成的二进制条目也不尽相同。 + +操作日志写入事件的存在,为用户提供了一种统一的数据操作视图,它屏蔽了底层数据结构的实现差异,极大地降低了用户的编程门槛,提升了功能的易用性。 + +```java +/** TabletInsertionEvent is used to define the event of data insertion. */ +public interface TabletInsertionEvent extends Event { + + /** + * The consumer processes the data row by row and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processRowByRow(BiConsumer consumer); + + /** + * The consumer processes the Tablet directly and collects the results by RowCollector. 
+ * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processTablet(BiConsumer consumer); +} +``` + +#### **数据文件写入事件(TsFileInsertionEvent)** + +数据文件写入事件(TsFileInsertionEvent) 是对数据库文件落盘操作的高层抽象,它是若干操作日志写入事件(TabletInsertionEvent)的数据集合。 + +IoTDB 的存储引擎是 LSM 结构的。数据写入时会先将写入操作落盘到日志结构的文件里,同时将写入数据保存在内存里。当内存达到控制上限,则会触发刷盘行为,即将内存中的数据转换为数据库文件,同时删除之前预写的操作日志。当内存中的数据转换为数据库文件中的数据时,会经过编码压缩和通用压缩两次压缩处理,因此数据库文件的数据相比内存中的原始数据占用的空间更少。 + +在极端的网络情况下,直接传输数据文件相比传输数据写入的操作要更加经济,它会占用更低的网络带宽,能实现更快的传输速度。当然,天下没有免费的午餐,对文件中的数据进行计算处理,相比直接对内存中的数据进行计算处理时,需要额外付出文件 I/O 的代价。但是,正是磁盘数据文件和内存写入操作两种结构各有优劣的存在,给了系统做动态权衡调整的机会,也正是基于这样的观察,插件的事件模型中才引入了数据文件写入事件。 + +综上,数据文件写入事件出现在流处理插件的事件流中,存在下面两种情况: + +(1)历史数据抽取:一个流处理任务开始前,所有已经落盘的写入数据都会以 TsFile 的形式存在。一个流处理任务开始后,采集历史数据时,历史数据将以 TsFileInsertionEvent 作为抽象; + +(2)实时数据抽取:一个流处理任务进行时,当数据流中实时处理操作日志写入事件的速度慢于写入请求速度一定进度之后,未来得及处理的操作日志写入事件会被被持久化至磁盘,以 TsFile 的形式存在,这一些数据被流处理引擎抽取到后,会以 TsFileInsertionEvent 作为抽象。 + +```java +/** + * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks, + * which is compressed and encoded, and requires IO cost for computational processing. + */ +public interface TsFileInsertionEvent extends Event { + + /** + * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents. + * + * @return {@code Iterable} the list of TabletInsertionEvent + */ + Iterable toTabletInsertionEvents(); +} +``` + +### 自定义流处理插件编程接口定义 + +基于自定义流处理插件编程接口,用户可以轻松编写数据抽取插件、数据处理插件和数据发送插件,从而使得流处理功能灵活适配各种工业场景。 + +#### 数据抽取插件接口 + +数据抽取是流处理数据从数据抽取到数据发送三阶段的第一阶段。数据抽取插件(PipeSource)是流处理引擎和存储引擎的桥梁,它通过监听存储引擎的行为, +捕获各种数据写入事件。 + +```java +/** + * PipeSource + * + *
+ * <p>PipeSource is responsible for capturing events from sources.
+ *
+ * <p>Various data sources can be supported by implementing different PipeSource classes.
+ *
+ * <p>The lifecycle of a PipeSource is as follows:
+ *
+ * <ul>
+ *   <li>When a collaboration task is created, the KV pairs of `WITH SOURCE` clause in SQL are
+ *       parsed and the validation method {@link PipeSource#validate(PipeParameterValidator)} will
+ *       be called to validate the parameters.
+ *   <li>Before the collaboration task starts, the method {@link
+ *       PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} will be called to
+ *       config the runtime behavior of the PipeSource.
+ *   <li>Then the method {@link PipeSource#start()} will be called to start the PipeSource.
+ *   <li>While the collaboration task is in progress, the method {@link PipeSource#supply()} will be
+ *       called to capture events from sources and then the events will be passed to the
+ *       PipeProcessor.
+ *   <li>The method {@link PipeSource#close()} will be called when the collaboration task is
+ *       cancelled (the `DROP PIPE` command is executed).
+ * </ul>
+ */ +public interface PipeSource extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeSource. In this method, the user can do the + * following things: + * + *
    + *
+ * <ul>
+ *   <li>Use PipeParameters to parse key-value pair attributes entered by the user.
+ *   <li>Set the running configurations in PipeSourceRuntimeConfiguration.
+ * </ul>
+ *
This method is called after the method {@link PipeSource#validate(PipeParameterValidator)} + * is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeSource + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeSourceRuntimeConfiguration configuration) + throws Exception; + + /** + * Start the Source. After this method is called, events should be ready to be supplied by + * {@link PipeSource#supply()}. This method is called after {@link + * PipeSource#customize(PipeParameters, PipeSourceRuntimeConfiguration)} is called. + * + * @throws Exception the user can throw errors if necessary + */ + void start() throws Exception; + + /** + * Supply single event from the Source and the caller will send the event to the processor. + * This method is called after {@link PipeSource#start()} is called. + * + * @return the event to be supplied. the event may be null if the Source has no more events at + * the moment, but the Source is still running for more events. + * @throws Exception the user can throw errors if necessary + */ + Event supply() throws Exception; +} +``` + +#### 数据处理插件接口 + +数据处理是流处理数据从数据抽取到数据发送三阶段的第二阶段。数据处理插件(PipeProcessor)主要用于过滤和转换由数据抽取插件(PipeSource)捕获的 +各种事件。 + +```java +/** + * PipeProcessor + * + *
+ * <p>PipeProcessor is used to filter and transform the Event formed by the PipeSource.
+ *
+ * <p>The lifecycle of a PipeProcessor is as follows:
+ *
+ * <ul>
+ *   <li>When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are
+ *       parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)}
+ *       will be called to validate the parameters.
+ *   <li>Before the collaboration task starts, the method {@link
+ *       PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called
+ *       to config the runtime behavior of the PipeProcessor.
+ *   <li>While the collaboration task is in progress:
+ *       <ul>
+ *         <li>PipeSource captures the events and wraps them into three types of Event instances.
+ *         <li>PipeProcessor processes the event and then passes them to the PipeSink. The
+ *             following 3 methods will be called: {@link
+ *             PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link
+ *             PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link
+ *             PipeProcessor#process(Event, EventCollector)}.
+ *         <li>PipeSink serializes the events into binaries and send them to sinks.
+ *       </ul>
+ *   <li>When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link
+ *       PipeProcessor#close() } method will be called.
+ * </ul>
+ */ +public interface PipeProcessor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeProcessor. In this method, the user can do the + * following things: + * + *
    + *
+ * <ul>
+ *   <li>Use PipeParameters to parse key-value pair attributes entered by the user.
+ *   <li>Set the running configurations in PipeProcessorRuntimeConfiguration.
+ * </ul>
+ *
This method is called after the method {@link + * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the + * events processing. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeProcessor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is called to process the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) + throws Exception; + + /** + * This method is called to process the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + default void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) + throws Exception { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + process(tabletInsertionEvent, eventCollector); + } + } + + /** + * This method is called to process the Event. + * + * @param event Event to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(Event event, EventCollector eventCollector) throws Exception; +} +``` + +#### 数据发送插件接口 + +数据发送是流处理数据从数据抽取到数据发送三阶段的第三阶段。数据发送插件(PipeSink)主要用于发送经由数据处理插件(PipeProcessor)处理过后的 +各种事件,它作为流处理框架的网络实现层,接口上应允许接入多种实时通信协议和多种连接器。 + +```java +/** + * PipeSink + * + *
+ * <p>PipeSink is responsible for sending events to sinks.
+ *
+ * <p>Various network protocols can be supported by implementing different PipeSink classes.
+ *
+ * <p>The lifecycle of a PipeSink is as follows:
+ *
+ * <ul>
+ *   <li>When a collaboration task is created, the KV pairs of `WITH SINK` clause in SQL are
+ *       parsed and the validation method {@link PipeSink#validate(PipeParameterValidator)} will be
+ *       called to validate the parameters.
+ *   <li>Before the collaboration task starts, the method {@link PipeSink#customize(PipeParameters,
+ *       PipeSinkRuntimeConfiguration)} will be called to config the runtime behavior of the
+ *       PipeSink and the method {@link PipeSink#handshake()} will be called to create a connection
+ *       with sink.
+ *   <li>While the collaboration task is in progress:
+ *       <ul>
+ *         <li>PipeSource captures the events and wraps them into three types of Event instances.
+ *         <li>PipeProcessor processes the event and then passes them to the PipeSink.
+ *         <li>PipeSink serializes the events into binaries and send them to sinks. The following 3
+ *             methods will be called: {@link PipeSink#transfer(TabletInsertionEvent)}, {@link
+ *             PipeSink#transfer(TsFileInsertionEvent)} and {@link PipeSink#transfer(Event)}.
+ *       </ul>
+ *   <li>When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link
+ *       PipeSink#close() } method will be called.
+ * </ul>
+ *
In addition, the method {@link PipeSink#heartbeat()} will be called periodically to check + * whether the connection with sink is still alive. The method {@link PipeSink#handshake()} will be + * called to create a new connection with the sink when the method {@link PipeSink#heartbeat()} + * throws exceptions. + */ +public interface PipeSink extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeSink. In this method, the user can do the following + * things: + * + *
    + *
+ * <ul>
+ *   <li>Use PipeParameters to parse key-value pair attributes entered by the user.
+ *   <li>Set the running configurations in PipeSinkRuntimeConfiguration.
+ * </ul>
+ *
This method is called after the method {@link PipeSink#validate(PipeParameterValidator)} is + * called and before the method {@link PipeSink#handshake()} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeSink + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeSinkRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is used to create a connection with sink. This method will be called after the + * method {@link PipeSink#customize(PipeParameters, PipeSinkRuntimeConfiguration)} is called or + * will be called when the method {@link PipeSink#heartbeat()} throws exceptions. + * + * @throws Exception if the connection is failed to be created + */ + void handshake() throws Exception; + + /** + * This method will be called periodically to check whether the connection with sink is still + * alive. + * + * @throws Exception if the connection dies + */ + void heartbeat() throws Exception; + + /** + * This method is used to transfer the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; + + /** + * This method is used to transfer the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + default void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception { + try { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + transfer(tabletInsertionEvent); + } + } finally { + tsFileInsertionEvent.close(); + } + } + + /** + * This method is used to transfer the generic events, including HeartbeatEvent. 
+ * + * @param event Event to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(Event event) throws Exception; +} +``` + +## 自定义流处理插件管理 + +为了保证用户自定义插件在实际生产中的灵活性和易用性,系统还需要提供对插件进行动态统一管理的能力。 +本章节介绍的流处理插件管理语句提供了对插件进行动态统一管理的入口。 + +### 加载插件语句 + +在 IoTDB 中,若要在系统中动态载入一个用户自定义插件,则首先需要基于 PipeSource、 PipeProcessor 或者 PipeSink 实现一个具体的插件类, +然后需要将插件类编译打包成 jar 可执行文件,最后使用加载插件的管理语句将插件载入 IoTDB。 + +加载插件的管理语句的语法如图所示。 + +```sql +CREATE PIPEPLUGIN [IF NOT EXISTS] <别名> +AS <全类名> +USING +``` + +**IF NOT EXISTS 语义**:用于创建操作中,确保当指定 Pipe Plugin 不存在时,执行创建命令,防止因尝试创建已存在的 Pipe Plugin 而导致报错。 + +示例:假如用户实现了一个全类名为edu.tsinghua.iotdb.pipe.ExampleProcessor 的数据处理插件,打包后的jar包为 pipe-plugin.jar ,用户希望在流处理引擎中使用这个插件,将插件标记为 example。插件包有两种使用方式,一种为上传到URI服务器,一种为上传到集群本地目录,两种方法任选一种即可。 + +【方式一】上传到URI服务器 + +准备工作:使用该种方式注册,您需要提前将 JAR 包上传到 URI 服务器上并确保执行注册语句的IoTDB实例能够访问该 URI 服务器。例如 https://example.com:8080/iotdb/pipe-plugin.jar 。 + +创建语句: + +```sql +CREATE PIPEPLUGIN IF NOT EXISTS example +AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' +USING URI '' +``` + +【方式二】上传到集群本地目录 + +准备工作:使用该种方式注册,您需要提前将 JAR 包放置到DataNode节点所在机器的任意路径下,推荐您将JAR包放在IoTDB安装路径的/ext/pipe目录下(安装包中已有,无需新建)。例如:iotdb-1.x.x-bin/ext/pipe/pipe-plugin.jar。(**注意:如果您使用的是集群,那么需要将 JAR 包放置到每个 DataNode 节点所在机器的该路径下)** + +创建语句: + +```sql +CREATE PIPEPLUGIN IF NOT EXISTS example +AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' +USING URI '' +``` + +### 删除插件语句 + +当用户不再想使用一个插件,需要将插件从系统中卸载时,可以使用如图所示的删除插件语句。 + +```sql +DROP PIPEPLUGIN [IF EXISTS] <别名> +``` + +**IF EXISTS 语义**:用于删除操作中,确保当指定 Pipe Plugin 存在时,执行删除命令,防止因尝试删除不存在的 Pipe Plugin 而导致报错。 + +### 查看插件语句 + +用户也可以按需查看系统中的插件。查看插件的语句如图所示。 + +```sql +SHOW PIPEPLUGINS +``` + +## 系统预置的流处理插件 + +### 预置 source 插件 + +#### iotdb-source + +作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 + + +| key | value | value 取值范围 | required or optional with default | +|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|-----------------------------------| +| source | iotdb-source | String: iotdb-source | required | +| source.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | +| source.history.start-time | 抽取的历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| source.history.end-time | 抽取的历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| start-time(V1.3.1+) | start of synchronizing all data event time,including start-time. Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| end-time(V1.3.1+) | end of synchronizing all data event time,including end-time. 
Will disable "history.start-time" "history.end-time" if configured | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| source.realtime.mode | 实时数据的抽取模式 | String: hybrid, log, file | optional: hybrid | +| source.forwarding-pipe-requests | 是否抽取由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | optional: true | + +> 🚫 **source.pattern 参数说明** +> +> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * 在底层实现中,当检测到 pattern 为 root(默认值)时,抽取效率较高,其他任意格式都将降低性能 +> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'source.pattern'='root.aligned.1' 的 pipe 时: + > + > * root.aligned.1TS + > * root.aligned.1TS.\`1\` +> * root.aligned.100T + > + > 的数据会被抽取; + > + > * root.aligned.\`1\` +> * root.aligned.\`123\` + > + > 的数据不会被抽取。 + +> ❗️**source.history 的 start-time,end-time 参数说明** +> +> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 + +> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> +> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 +> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> +> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 + +> 💎 **iotdb-source 的工作可以拆分成两个阶段** +> +> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 +> 2. 实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> +> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** + +> 📌 **source.realtime.mode:数据抽取的模式** +> +> * log:该模式下,任务仅使用操作日志进行数据处理、发送 +> * file:该模式下,任务仅使用数据文件进行数据处理、发送 +> * hybrid:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 + +> 🍕 **source.forwarding-pipe-requests:是否允许转发从另一 pipe 传输而来的数据** +> +> * 如果要使用 pipe 构建 A -> B -> C 的数据同步,那么 B -> C 的 pipe 需要将该参数为 true 后,A -> B 中 A 通过 pipe 写入 B 的数据才能被正确转发到 C +> * 如果要使用 pipe 构建 A \<-> B 的双向数据同步(双活),那么 A -> B 和 B -> A 的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发 + +### 预置 processor 插件 + +#### do-nothing-processor + +作用:不对 source 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +|-----------|----------------------|------------------------------|-----------------------------------| +| processor | do-nothing-processor | String: do-nothing-processor | required | + +### 预置 sink 插件 + +#### do-nothing-sink + +作用:不对 processor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +|------|-----------------|-------------------------|-----------------------------------| +| sink | do-nothing-sink | String: do-nothing-sink | required | + +## 流处理任务管理 + +### 创建流处理任务 + +使用 `CREATE PIPE` 语句来创建流处理任务。以数据同步流处理任务的创建为例,示例 SQL 语句如下: + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +WITH SOURCE ( + -- 默认的 IoTDB 数据抽取插件 + 'source' = 'iotdb-source', + -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + 'source.pattern' = 'root.timecho', + -- 是否抽取历史数据 + 'source.history.enable' = 'true', + -- 描述被抽取的历史数据的时间范围,表示最早时间 + 'source.history.start-time' = '2011.12.03T10:15:30+01:00', + -- 描述被抽取的历史数据的时间范围,表示最晚时间 + 'source.history.end-time' = '2022.12.03T10:15:30+01:00', + -- 是否抽取实时数据 + 'source.realtime.enable' = 'true', + -- 描述实时数据的抽取方式 + 'source.realtime.mode' = 'hybrid', +) +WITH PROCESSOR ( + -- 默认的数据处理插件,即不做任何处理 + 'processor' = 'do-nothing-processor', +) +WITH SINK ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'sink' = 
'iotdb-thrift-sink', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'sink.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'sink.port' = '6667', +) +``` + +**创建流处理任务时需要配置 PipeId 以及三个插件部分的参数:** + + +| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +|-----------|--------------------------------|---------------------------|----------------------|------------------------------|--------------------------| +| PipeId | 全局唯一标定一个流处理任务的名称 | 必填 | - | - | - | +| source | Pipe Source 插件,负责在数据库底层抽取流处理数据 | 选填 | iotdb-source | 将数据库的全量历史数据和后续到达的实时数据接入流处理任务 | 否 | +| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | +| sink | Pipe Sink 插件,负责发送数据 | 必填 | - | - | | + +示例中,使用了 iotdb-source、do-nothing-processor 和 iotdb-thrift-sink 插件构建数据流处理任务。IoTDB 还内置了其他的流处理插件,**请查看“系统预置流处理插件”一节**。 + +**一个最简的 CREATE PIPE 语句示例如下:** + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +WITH SINK ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'sink' = 'iotdb-thrift-sink', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'sink.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'sink.port' = '6667', +) +``` + +其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 + +**注意:** + +- SOURCE 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 +- SINK 为必填配置,需要在 CREATE PIPE 语句中声明式配置 +- SINK 具备自复用能力。对于不同的流处理任务,如果他们的 SINK 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 SINK 实例**,以实现对连接资源的复用。 + + - 例如,有下面 pipe1, pipe2 两个流处理任务的声明: + + ```sql + CREATE PIPE pipe1 + WITH SINK ( + 'sink' = 'iotdb-thrift-sink', + 'sink.ip' = 'localhost', + 'sink.port' = '9999', + ) + + CREATE PIPE pipe2 + WITH SINK ( + 'sink' = 'iotdb-thrift-sink', + 'sink.port' = '9999', + 'sink.ip' = 'localhost', + ) + ``` + + - 因为它们对 SINK 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 SINK 进行复用,最终 pipe1, pipe2 的 SINK 将会是同一个实例。 +- 在 source 为默认的 iotdb-source,且 source.forwarding-pipe-requests 为默认值 true 时,请不要构建出包含数据循环同步的应用场景(会导致无限循环): + + - IoTDB A -> IoTDB B -> IoTDB A + - IoTDB A -> IoTDB A + +### 启动流处理任务 + +CREATE PIPE 语句成功执行后,流处理任务相关实例会被创建,但整个流处理任务的运行状态会被置为 STOPPED,即流处理任务不会立刻处理数据(V1.3.0)。在 1.3.1 及以上的版本,流处理任务的运行状态在创建后将被立即置为 RUNNING。 + +可以使用 START PIPE 语句使流处理任务开始处理数据: + +```sql +START PIPE +``` + +### 停止流处理任务 + +使用 STOP PIPE 语句使流处理任务停止处理数据: + +```sql +STOP PIPE +``` + +### 删除流处理任务 + +使用 DROP PIPE 语句使流处理任务停止处理数据(当流处理任务状态为 RUNNING 时),然后删除整个流处理任务流处理任务: + +```sql +DROP PIPE +``` + +用户在删除流处理任务前,不需要执行 STOP 操作。 + +### 展示流处理任务 + +使用 SHOW PIPES 语句查看所有流处理任务: + +```sql +SHOW PIPES +``` + +查询结果如下: + +```sql ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +| ID| CreationTime | State|PipeSource|PipeProcessor|PipeSink|ExceptionMessage| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| {}| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| ++-----------+-----------------------+-------+----------+-------------+--------+----------------+ +``` + +可以使用 `` 指定想看的某个流处理任务状态: + +```sql +SHOW PIPE +``` + +您也可以通过 where 子句,判断某个 \ 使用的 Pipe Sink 被复用的情况。 + +```sql +SHOW PIPES +WHERE SINK USED BY +``` + +### 流处理任务运行状态迁移 + +一个流处理 pipe 在其的生命周期中会经过多种状态: + +- **RUNNING:** pipe 正在正常工作 + - 当一个 pipe 被成功创建之后,其初始状态为工作状态(V1.3.1+) +- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: + - 当一个 pipe 被成功创建之后,其初始状态为暂停状态(V1.3.0) + - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED + - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 
RUNNING 变为 STOPPED +- **DROPPED:** pipe 任务被永久删除 + +下图表明了所有状态以及状态的迁移: + +![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +## 权限管理 + +### 流处理任务 + + +| 权限名称 | 描述 | +|----------|---------------| +| USE_PIPE | 注册流处理任务。路径无关。 | +| USE_PIPE | 开启流处理任务。路径无关。 | +| USE_PIPE | 停止流处理任务。路径无关。 | +| USE_PIPE | 卸载流处理任务。路径无关。 | +| USE_PIPE | 查询流处理任务。路径无关。 | + +### 流处理任务插件 + + +| 权限名称 | 描述 | +|----------|-----------------| +| USE_PIPE | 注册流处理任务插件。路径无关。 | +| USE_PIPE | 卸载流处理任务插件。路径无关。 | +| USE_PIPE | 查询流处理任务插件。路径无关。 | + +## 配置参数 + +在 iotdb-system.properties 中: + +V1.3.0+: +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 + +# The maximum number of selectors that can be used in the async connector. +# pipe_async_connector_selector_number=1 + +# The core number of clients that can be used in the async connector. +# pipe_async_connector_core_client_number=8 + +# The maximum number of clients that can be used in the async connector. +# pipe_async_connector_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. +# pipe_air_gap_receiver_port=9780 +``` + +V1.3.1+: +```Properties +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_sink_timeout_ms=900000 + +# The maximum number of selectors that can be used in the sink. +# Recommend to set this value to less than or equal to pipe_sink_max_client_number. +# pipe_sink_selector_number=4 + +# The maximum number of clients that can be used in the sink. +# pipe_sink_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. 
+# pipe_air_gap_receiver_port=9780 +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Tiered-Storage_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Tiered-Storage_timecho.md new file mode 100644 index 00000000..501a3966 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Tiered-Storage_timecho.md @@ -0,0 +1,97 @@ + + +# 多级存储 +## 概述 + +多级存储功能向用户提供多种存储介质管理的能力,用户可以使用多级存储功能为 IoTDB 配置不同类型的存储介质,并为存储介质进行分级。具体的,在 IoTDB 中,多级存储的配置体现为多目录的管理。用户可以将多个存储目录归为同一类,作为一个“层级”向 IoTDB 中配置,这种“层级”我们称之为 storage tier;同时,用户可以根据数据的冷热进行分类,并将不同类别的数据存储到指定的“层级”中。当前 IoTDB 支持通过数据的 TTL 进行冷热数据的分类,当一个层级中的数据不满足当前层级定义的 TTL 规则时,该数据会被自动迁移至下一层级中。 + +## 参数定义 + +在 IoTDB 中开启多级存储,需要进行以下几个方面的配置: + +1. 配置数据目录,并将数据目录分为不同的层级 +2. 配置每个层级所管理的数据的 TTL,以区分不同层级管理的冷热数据类别。 +3. 配置每个层级的最小剩余存储空间比例,当该层级的存储空间触发该阈值时,该层级的数据会被自动迁移至下一层级(可选)。 + +具体的参数定义及其描述如下。 + +| 配置项 | 默认值 | 说明 | 约束 | +| ---------------------------------------- | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | +| dn_data_dirs | data/datanode/data | 用来指定不同的存储目录,并将存储目录进行层级划分 | 每级存储使用分号分隔,单级内使用逗号分隔;云端配置只能作为最后一级存储且第一级不能作为云端存储;最多配置一个云端对象;远端存储目录使用 OBJECT_STORAGE 来表示 | +| tier_ttl_in_ms | -1 | 定义每个层级负责的数据范围,通过 TTL 表示 | 每级存储使用分号分隔;层级数量需与 dn_data_dirs 定义的层级数一致;"-1" 表示"无限制" | +| dn_default_space_usage_thresholds | 0.85 | 定义每个层级数据目录的最小剩余空间比例;当剩余空间少于该比例时,数据会被自动迁移至下一个层级;当最后一个层级的剩余存储空间到低于此阈值时,会将系统置为 READ_ONLY | 每级存储使用分号分隔;层级数量需与 dn_data_dirs 定义的层级数一致 | +| object_storage_type | AWS_S3 | 云端存储类型 | IoTDB 当前只支持 AWS S3 作为远端存储类型,此参数不支持修改 | +| object_storage_bucket | iotdb_data | 云端存储 bucket 的名称 | AWS S3 中的 bucket 定义;如果未使用远端存储,无需配置 | +| object_storage_endpoiont | | 云端存储的 endpoint | AWS S3 的 endpoint;如果未使用远端存储,无需配置 | +| object_storage_access_key | | 云端存储的验证信息 key | AWS S3 的 credential key;如果未使用远端存储,无需配置 | +| object_storage_access_secret | | 云端存储的验证信息 secret | AWS S3 的 credential secret;如果未使用远端存储,无需配置 | +| remote_tsfile_cache_dirs | data/datanode/data/cache | 云端存储在本地的缓存目录 | 如果未使用远端存储,无需配置 | +| remote_tsfile_cache_page_size_in_kb | 20480 | 云端存储在本地缓存文件的块大小 | 如果未使用远端存储,无需配置 | +| remote_tsfile_cache_max_disk_usage_in_mb | 51200 | 云端存储本地缓存的最大磁盘占用大小 | 如果未使用远端存储,无需配置 | + + +## 本地多级存储配置示例 + +以下以本地两级存储的配置示例。 + +```JavaScript +// 必须配置项 +dn_data_dirs=/data1/data;/data2/data,/data3/data; +tier_ttl_in_ms=86400000;-1 +dn_default_space_usage_thresholds=0.2;0.1 +``` + +在该示例中,共配置了两个层级的存储,具体为: + +| **层级** | **数据目录** | **数据范围** | **磁盘最小剩余空间阈值** | +| -------- | -------------------------------------- | --------------- | ------------------------ | +| 层级一 | 目录一:/data1/data | 最近 1 天的数据 | 20% | +| 层级二 | 目录一:/data2/data目录二:/data3/data | 1 天以前的数据 | 10% | + +## 远端多级存储配置示例 + +以下以三级存储为例: + +```JavaScript +// 必须配置项 +dn_data_dirs=/data1/data;/data2/data,/data3/data;OBJECT_STORAGE +tier_ttl_in_ms=86400000;864000000;-1 +dn_default_space_usage_thresholds=0.2;0.15;0.1 +object_storage_name=AWS_S3 +object_storage_bucket=iotdb +object_storage_endpoiont= +object_storage_access_key= +object_storage_access_secret= + +// 可选配置项 +remote_tsfile_cache_dirs=data/datanode/data/cache +remote_tsfile_cache_page_size_in_kb=20971520 +remote_tsfile_cache_max_disk_usage_in_mb=53687091200 +``` + +在该示例中,共配置了三个层级的存储,具体为: + +| **层级** | **数据目录** | **数据范围** | **磁盘最小剩余空间阈值** | +| -------- | -------------------------------------- | ---------------------------- | ------------------------ | +| 层级一 | 目录一:/data1/data | 最近 1 天的数据 | 20% | +| 层级二 | 目录一:/data2/data目录二:/data3/data | 过去1 天至过去 10 天内的数据 | 15% | +| 层级三 | 远端 AWS S3 存储 | 过去 10 天以前的数据 | 10% | \ No 
newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Trigger.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Trigger.md new file mode 100644 index 00000000..aacc1991 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/Trigger.md @@ -0,0 +1,467 @@ + + +# 触发器 + +## 使用说明 + +触发器提供了一种侦听序列数据变动的机制。配合用户自定义逻辑,可完成告警、数据转发等功能。 + +触发器基于 Java 反射机制实现。用户通过简单实现 Java 接口,即可实现数据侦听。IoTDB 允许用户动态注册、卸载触发器,在注册、卸载期间,无需启停服务器。 + +### 侦听模式 + +IoTDB 的单个触发器可用于侦听符合特定模式的时间序列的数据变动,如时间序列 root.sg.a 上的数据变动,或者符合路径模式 root.**.a 的时间序列上的数据变动。您在注册触发器时可以通过 SQL 语句指定触发器侦听的路径模式。 + +### 触发器类型 + +目前触发器分为两类,您在注册触发器时可以通过 SQL 语句指定类型: + +- 有状态的触发器。该类触发器的执行逻辑可能依赖前后的多条数据,框架会将不同节点写入的数据汇总到同一个触发器实例进行计算,来保留上下文信息,通常用于采样或者统计一段时间的数据聚合信息。集群中只有一个节点持有有状态触发器的实例。 +- 无状态的触发器。触发器的执行逻辑只和当前输入的数据有关,框架无需将不同节点的数据汇总到同一个触发器实例中,通常用于单行数据的计算和异常检测等。集群中每个节点均持有无状态触发器的实例。 + +### 触发时机 + +触发器的触发时机目前有两种,后续会拓展其它触发时机。您在注册触发器时可以通过 SQL 语句指定触发时机: + +- BEFORE INSERT,即在数据持久化之前触发。请注意,目前触发器并不支持数据清洗,不会对要持久化的数据本身进行变动。 +- AFTER INSERT,即在数据持久化之后触发。 + +## 编写触发器 + +### 触发器依赖 + +触发器的逻辑需要您编写 Java 类进行实现。 +在编写触发器逻辑时,需要使用到下面展示的依赖。如果您使用 [Maven](http://search.maven.org/),则可以直接从 [Maven 库](http://search.maven.org/)中搜索到它们。请注意选择和目标服务器版本相同的依赖版本。 + +``` xml + + org.apache.iotdb + iotdb-server + 1.0.0 + provided + +``` + +### 接口说明 + +编写一个触发器需要实现 `org.apache.iotdb.trigger.api.Trigger` 类。 + +```java +import org.apache.iotdb.trigger.api.enums.FailureStrategy; +import org.apache.iotdb.tsfile.write.record.Tablet; + +public interface Trigger { + + /** + * This method is mainly used to validate {@link TriggerAttributes} before calling {@link + * Trigger#onCreate(TriggerAttributes)}. + * + * @param attributes TriggerAttributes + * @throws Exception e + */ + default void validate(TriggerAttributes attributes) throws Exception {} + + /** + * This method will be called when creating a trigger after validation. + * + * @param attributes TriggerAttributes + * @throws Exception e + */ + default void onCreate(TriggerAttributes attributes) throws Exception {} + + /** + * This method will be called when dropping a trigger. + * + * @throws Exception e + */ + default void onDrop() throws Exception {} + + /** + * When restarting a DataNode, Triggers that have been registered will be restored and this method + * will be called during the process of restoring. + * + * @throws Exception e + */ + default void restore() throws Exception {} + + /** + * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} + * is the default strategy. + * + * @return {@link FailureStrategy} + */ + default FailureStrategy getFailureStrategy() { + return FailureStrategy.OPTIMISTIC; + } + + /** + * @param tablet see {@link Tablet} for detailed information of data structure. Data that is + * inserted will be constructed as a Tablet and you can define process logic with {@link + * Tablet}. 
+ * @return true if successfully fired + * @throws Exception e + */ + default boolean fire(Tablet tablet) throws Exception { + return true; + } +} +``` + +该类主要提供了两类编程接口:**生命周期相关接口**和**数据变动侦听相关接口**。该类中所有的接口都不是必须实现的,当您不实现它们时,它们不会对流经的数据操作产生任何响应。您可以根据实际需要,只实现其中若干接口。 + +下面是所有可供用户进行实现的接口的说明。 + +#### 生命周期相关接口 + +| 接口定义 | 描述 | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| *default void validate(TriggerAttributes attributes) throws Exception {}* | 用户在使用 `CREATE TRIGGER` 语句创建触发器时,可以指定触发器需要使用的参数,该接口会用于验证参数正确性。 | +| *default void onCreate(TriggerAttributes attributes) throws Exception {}* | 当您使用`CREATE TRIGGER`语句创建触发器后,该接口会被调用一次。在每一个触发器实例的生命周期内,该接口会且仅会被调用一次。该接口主要有如下作用:帮助用户解析 SQL 语句中的自定义属性(使用`TriggerAttributes`)。 可以创建或申请资源,如建立外部链接、打开文件等。 | +| *default void onDrop() throws Exception {}* | 当您使用`DROP TRIGGER`语句删除触发器后,该接口会被调用。在每一个触发器实例的生命周期内,该接口会且仅会被调用一次。该接口主要有如下作用:可以进行资源释放的操作。可以用于持久化触发器计算的结果。 | +| *default void restore() throws Exception {}* | 当重启 DataNode 时,集群会恢复 DataNode 上已经注册的触发器实例,在此过程中会为该 DataNode 上的有状态触发器调用一次该接口。有状态触发器实例所在的 DataNode 宕机后,集群会在另一个可用 DataNode 上恢复该触发器的实例,在此过程中会调用一次该接口。该接口可以用于自定义恢复逻辑。 | + +#### 数据变动侦听相关接口 + +##### 侦听接口 + +```java + /** + * @param tablet see {@link Tablet} for detailed information of data structure. Data that is + * inserted will be constructed as a Tablet and you can define process logic with {@link + * Tablet}. + * @return true if successfully fired + * @throws Exception e + */ + default boolean fire(Tablet tablet) throws Exception { + return true; + } +``` + +数据变动时,触发器以 Tablet 作为触发操作的单位。您可以通过 Tablet 获取相应序列的元数据和数据,然后进行相应的触发操作,触发成功则返回值应当为 true。该接口返回 false 或是抛出异常我们均认为触发失败。在触发失败时,我们会根据侦听策略接口进行相应的操作。 + +进行一次 INSERT 操作时,对于其中的每条时间序列,我们会检测是否有侦听该路径模式的触发器,然后将符合同一个触发器所侦听的路径模式的时间序列数据组装成一个新的 Tablet 用于触发器的 fire 接口。可以理解成: + +```java +Map> pathToTriggerListMap => Map +``` + +**请注意,目前我们不对触发器的触发顺序有任何保证。** + +下面是示例: + +假设有三个触发器,触发器的触发时机均为 BEFORE INSERT + +- 触发器 Trigger1 侦听路径模式:root.sg.* +- 触发器 Trigger2 侦听路径模式:root.sg.a +- 触发器 Trigger3 侦听路径模式:root.sg.b + +写入语句: + +```sql +insert into root.sg(time, a, b) values (1, 1, 1); +``` + +序列 root.sg.a 匹配 Trigger1 和 Trigger2,序列 root.sg.b 匹配 Trigger1 和 Trigger3,那么: + +- root.sg.a 和 root.sg.b 的数据会被组装成一个新的 tablet1,在相应的触发时机进行 Trigger1.fire(tablet1) +- root.sg.a 的数据会被组装成一个新的 tablet2,在相应的触发时机进行 Trigger2.fire(tablet2) +- root.sg.b 的数据会被组装成一个新的 tablet3,在相应的触发时机进行 Trigger3.fire(tablet3) + +##### 侦听策略接口 + +在触发器触发失败时,我们会根据侦听策略接口设置的策略进行相应的操作,您可以通过下述接口设置 `org.apache.iotdb.trigger.api.enums.FailureStrategy`,目前有乐观和悲观两种策略: + +- 乐观策略:触发失败的触发器不影响后续触发器的触发,也不影响写入流程,即我们不对触发失败涉及的序列做额外处理,仅打日志记录失败,最后返回用户写入数据成功,但触发部分失败。 +- 悲观策略:失败触发器影响后续所有 Pipeline 的处理,即我们认为该 Trigger 触发失败会导致后续所有触发流程不再进行。如果该触发器的触发时机为 BEFORE INSERT,那么写入也不再进行,直接返回写入失败。 + +```java + /** + * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} + * is the default strategy. + * + * @return {@link FailureStrategy} + */ + default FailureStrategy getFailureStrategy() { + return FailureStrategy.OPTIMISTIC; + } +``` + +您可以参考下图辅助理解,其中 Trigger1 配置采用乐观策略,Trigger2 配置采用悲观策略。Trigger1 和 Trigger2 的触发时机是 BEFORE INSERT,Trigger3 和 Trigger4 的触发时机是 AFTER INSERT。 正常执行流程如下: + + + + + + +### 示例 + +如果您使用 [Maven](http://search.maven.org/),可以参考我们编写的示例项目 trigger-example。您可以在 [这里](https://github.com/apache/iotdb/tree/master/example/trigger) 找到它。后续我们会加入更多的示例项目供您参考。 + +下面是其中一个示例项目的代码: + +```java +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. 
See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.trigger; + +import org.apache.iotdb.db.engine.trigger.sink.alertmanager.AlertManagerConfiguration; +import org.apache.iotdb.db.engine.trigger.sink.alertmanager.AlertManagerEvent; +import org.apache.iotdb.db.engine.trigger.sink.alertmanager.AlertManagerHandler; +import org.apache.iotdb.trigger.api.Trigger; +import org.apache.iotdb.trigger.api.TriggerAttributes; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; +import org.apache.iotdb.tsfile.write.record.Tablet; +import org.apache.iotdb.tsfile.write.schema.MeasurementSchema; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; + +public class ClusterAlertingExample implements Trigger { + private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class); + + private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler(); + + private final AlertManagerConfiguration alertManagerConfiguration = + new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts"); + + private String alertname; + + private final HashMap labels = new HashMap<>(); + + private final HashMap annotations = new HashMap<>(); + + @Override + public void onCreate(TriggerAttributes attributes) throws Exception { + alertname = "alert_test"; + + labels.put("series", "root.ln.wf01.wt01.temperature"); + labels.put("value", ""); + labels.put("severity", ""); + + annotations.put("summary", "high temperature"); + annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}"); + + alertManagerHandler.open(alertManagerConfiguration); + } + + @Override + public void onDrop() throws IOException { + alertManagerHandler.close(); + } + + @Override + public boolean fire(Tablet tablet) throws Exception { + List measurementSchemaList = tablet.getSchemas(); + for (int i = 0, n = measurementSchemaList.size(); i < n; i++) { + if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) { + // for example, we only deal with the columns of Double type + double[] values = (double[]) tablet.values[i]; + for (double value : values) { + if (value > 100.0) { + LOGGER.info("trigger value > 100"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "critical"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } else if (value > 50.0) { + LOGGER.info("trigger value > 50"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "warning"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } + } + } + } + return true; + } +} +``` +## 管理触发器 + +您可以通过 SQL 语句注册和卸载一个触发器实例,您也可以通过 SQL 
语句查询到所有已经注册的触发器。 + +**我们建议您在注册触发器时停止写入。** + +### 注册触发器 + +触发器可以注册在任意路径模式上。被注册有触发器的序列将会被触发器侦听,当序列上有数据变动时,触发器中对应的触发方法将会被调用。 + +注册一个触发器可以按如下流程进行: + +1. 按照编写触发器章节的说明,实现一个完整的 Trigger 类,假定这个类的全类名为 `org.apache.iotdb.trigger.ClusterAlertingExample` +2. 将项目打成 JAR 包。 +3. 使用 SQL 语句注册该触发器。注册过程中会仅只会调用一次触发器的 `validate` 和 `onCreate` 接口,具体请参考编写触发器章节。 + +完整 SQL 语法如下: + +```sql +// Create Trigger +createTrigger + : CREATE triggerType TRIGGER triggerName=identifier triggerEventClause ON pathPattern AS className=STRING_LITERAL uriClause? triggerAttributeClause? + ; + +triggerType + : STATELESS | STATEFUL + ; + +triggerEventClause + : (BEFORE | AFTER) INSERT + ; + +uriClause + : USING URI uri + ; + +uri + : STRING_LITERAL + ; + +triggerAttributeClause + : WITH LR_BRACKET triggerAttribute (COMMA triggerAttribute)* RR_BRACKET + ; + +triggerAttribute + : key=attributeKey operator_eq value=attributeValue + ; +``` + +下面对 SQL 语法进行说明,您可以结合使用说明章节进行理解: + +- triggerName:触发器 ID,该 ID 是全局唯一的,用于区分不同触发器,大小写敏感。 +- triggerType:触发器类型,分为无状态(STATELESS)和有状态(STATEFUL)两类。 +- triggerEventClause:触发时机,目前仅支持写入前(BEFORE INSERT)和写入后(AFTER INSERT)两种。 +- pathPattern:触发器侦听的路径模式,可以包含通配符 * 和 **。 +- className:触发器实现类的类名。 +- uriClause:可选项,当不指定该选项时,我们默认 DBA 已经在各个 DataNode 节点的 trigger_root_dir 目录(配置项,默认为 IOTDB_HOME/ext/trigger)下放置好创建该触发器需要的 JAR 包。当指定该选项时,我们会将该 URI 对应的文件资源下载并分发到各 DataNode 的 trigger_root_dir/install 目录下。 +- triggerAttributeClause:用于指定触发器实例创建时需要设置的参数,SQL 语法中该部分是可选项。 + +下面是一个帮助您理解的 SQL 语句示例: + +```sql +CREATE STATELESS TRIGGER triggerTest +BEFORE INSERT +ON root.sg.** +AS 'org.apache.iotdb.trigger.ClusterAlertingExample' +USING URI 'http://jar/ClusterAlertingExample.jar' +WITH ( + "name" = "trigger", + "limit" = "100" +) +``` + +上述 SQL 语句创建了一个名为 triggerTest 的触发器: + +- 该触发器是无状态的(STATELESS) +- 在写入前触发(BEFORE INSERT) +- 该触发器侦听路径模式为 root.sg.** +- 所编写的触发器类名为 org.apache.iotdb.trigger.ClusterAlertingExample +- JAR 包的 URI 为 http://jar/ClusterAlertingExample.jar +- 创建该触发器实例时会传入 name 和 limit 两个参数。 + +### 卸载触发器 + +可以通过指定触发器 ID 的方式卸载触发器,卸载触发器的过程中会且仅会调用一次触发器的 `onDrop` 接口。 + +卸载触发器的 SQL 语法如下: + +```sql +// Drop Trigger +dropTrigger + : DROP TRIGGER triggerName=identifier +; +``` + +下面是示例语句: + +```sql +DROP TRIGGER triggerTest1 +``` + +上述语句将会卸载 ID 为 triggerTest1 的触发器。 + +### 查询触发器 + +可以通过 SQL 语句查询集群中存在的触发器的信息。SQL 语法如下: + +```sql +SHOW TRIGGERS +``` + +该语句的结果集格式如下: + +| TriggerName | Event | Type | State | PathPattern | ClassName | NodeId | +| ------------ | ---------------------------- | -------------------- | ------------------------------------------- | ----------- | --------------------------------------- | --------------------------------------- | +| triggerTest1 | BEFORE_INSERT / AFTER_INSERT | STATELESS / STATEFUL | INACTIVE / ACTIVE / DROPPING / TRANSFFERING | root.** | org.apache.iotdb.trigger.TriggerExample | ALL(STATELESS) / DATA_NODE_ID(STATEFUL) | + + +### 触发器状态说明 + +在集群中注册以及卸载触发器的过程中,我们维护了触发器的状态,下面是对这些状态的说明: + +| 状态 | 描述 | 是否建议写入进行 | +| ------------ | ------------------------------------------------------------ | ---------------- | +| INACTIVE | 执行 `CREATE TRIGGER` 的中间状态,集群刚在 ConfigNode 上记录该触发器的信息,还未在任何 DataNode 上激活该触发器 | 否 | +| ACTIVE | 执行 `CREATE TRIGGE` 成功后的状态,集群所有 DataNode 上的该触发器都已经可用 | 是 | +| DROPPING | 执行 `DROP TRIGGER` 的中间状态,集群正处在卸载该触发器的过程中 | 否 | +| TRANSFERRING | 集群正在进行该触发器实例位置的迁移 | 否 | + +## 重要注意事项 + +- 触发器从注册时开始生效,不对已有的历史数据进行处理。**即只有成功注册触发器之后发生的写入请求才会被触发器侦听到。** +- 触发器目前采用**同步触发**,所以编写时需要保证触发器效率,否则可能会大幅影响写入性能。**您需要自己保证触发器内部的并发安全性**。 +- 集群中**不能注册过多触发器**。因为触发器信息全量保存在 ConfigNode 中,并且在所有 DataNode 都有一份该信息的副本。 +- 
**建议注册触发器时停止写入**。注册触发器并不是一个原子操作,注册触发器时,会出现集群内部分节点已经注册了该触发器,部分节点尚未注册成功的中间状态。为了避免部分节点上的写入请求被触发器侦听到,部分节点上没有被侦听到的情况,我们建议注册触发器时不要执行写入。 +- 触发器将作为进程内程序执行,如果您的触发器编写不慎,内存占用过多,由于 IoTDB 并没有办法监控触发器所使用的内存,所以有 OOM 的风险。 +- 持有有状态触发器实例的节点宕机时,我们会尝试在另外的节点上恢复相应实例,在恢复过程中我们会调用一次触发器类的 restore 接口,您可以在该接口中实现恢复触发器所维护的状态的逻辑。 +- 触发器 JAR 包有大小限制,必须小于 min(`config_node_ratis_log_appender_buffer_size_max`, 2G),其中 `config_node_ratis_log_appender_buffer_size_max` 是一个配置项,具体含义可以参考 IOTDB 配置项说明。 +- **不同的 JAR 包中最好不要有全类名相同但功能实现不一样的类**。例如:触发器 trigger1、trigger2 分别对应资源 trigger1.jar、trigger2.jar。如果两个 JAR 包里都包含一个 `org.apache.iotdb.trigger.example.AlertListener` 类,当 `CREATE TRIGGER` 使用到这个类时,系统会随机加载其中一个 JAR 包中的类,最终导致触发器执行行为不一致以及其他的问题。 + +## 配置参数 + +| 配置项 | 含义 | +| ------------------------------------------------- | ---------------------------------------------- | +| *trigger_lib_dir* | 保存触发器 jar 包的目录位置 | +| *stateful\_trigger\_retry\_num\_when\_not\_found* | 有状态触发器触发无法找到触发器实例时的重试次数 | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/UDF-development.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/UDF-development.md new file mode 100644 index 00000000..d2ecb5dc --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/UDF-development.md @@ -0,0 +1,721 @@ +# UDF 开发 + +## 1. UDF 开发 + +### 1.1 UDF 依赖 + +如果您使用 [Maven](http://search.maven.org/) ,可以从 [Maven 库](http://search.maven.org/) 中搜索下面示例中的依赖。请注意选择和目标 IoTDB 服务器版本相同的依赖版本。 + +``` xml + + org.apache.iotdb + udf-api + 1.0.0 + provided + +``` + +### 1.2 UDTF(User Defined Timeseries Generating Function) + +编写一个 UDTF 需要继承`org.apache.iotdb.udf.api.UDTF`类,并至少实现`beforeStart`方法和一种`transform`方法。 + +#### 接口说明: + +| 接口定义 | 描述 | 是否必须 | +| :----------------------------------------------------------- | :----------------------------------------------------------- | ------------------------- | +| void validate(UDFParameterValidator validator) throws Exception | 在初始化方法`beforeStart`调用前执行,用于检测`UDFParameters`中用户输入的参数是否合法。 | 否 | +| void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception | 初始化方法,在 UDTF 处理输入数据前,调用用户自定义的初始化行为。用户每执行一次 UDTF 查询,框架就会构造一个新的 UDF 类实例,该方法在每个 UDF 类实例被初始化时调用一次。在每一个 UDF 类实例的生命周期内,该方法只会被调用一次。 | 是 | +| Object transform(Row row) throws Exception` | 这个方法由框架调用。当您在`beforeStart`中选择以`MappableRowByRowAccessStrategy`的策略消费原始数据时,可以选用该方法进行数据处理。输入参数以`Row`的形式传入,输出结果通过返回值`Object`输出。 | 所有`transform`方法四选一 | +| void transform(Column[] columns, ColumnBuilder builder) throws Exception | 这个方法由框架调用。当您在`beforeStart`中选择以`MappableRowByRowAccessStrategy`的策略消费原始数据时,可以选用该方法进行数据处理。输入参数以`Column[]`的形式传入,输出结果通过`ColumnBuilder`输出。您需要在该方法内自行调用`builder`提供的数据收集方法,以决定最终的输出数据。 | 所有`transform`方法四选一 | +| void transform(Row row, PointCollector collector) throws Exception | 这个方法由框架调用。当您在`beforeStart`中选择以`RowByRowAccessStrategy`的策略消费原始数据时,这个数据处理方法就会被调用。输入参数以`Row`的形式传入,输出结果通过`PointCollector`输出。您需要在该方法内自行调用`collector`提供的数据收集方法,以决定最终的输出数据。 | 所有`transform`方法四选一 | +| void transform(RowWindow rowWindow, PointCollector collector) throws Exception | 这个方法由框架调用。当您在`beforeStart`中选择以`SlidingSizeWindowAccessStrategy`或者`SlidingTimeWindowAccessStrategy`的策略消费原始数据时,这个数据处理方法就会被调用。输入参数以`RowWindow`的形式传入,输出结果通过`PointCollector`输出。您需要在该方法内自行调用`collector`提供的数据收集方法,以决定最终的输出数据。 | 所有`transform`方法四选一 | +| void terminate(PointCollector collector) throws Exception | 这个方法由框架调用。该方法会在所有的`transform`调用执行完成后,在`beforeDestory`方法执行前被调用。在一个 UDF 查询过程中,该方法会且只会调用一次。您需要在该方法内自行调用`collector`提供的数据收集方法,以决定最终的输出数据。 | 否 | +| void beforeDestroy() | UDTF 的结束方法。此方法由框架调用,并且只会被调用一次,即在处理完最后一条记录之后被调用。 | 否 | + +在一个完整的 UDTF 
实例生命周期中,各个方法的调用顺序如下: + +1. void validate(UDFParameterValidator validator) throws Exception +2. void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception +3. Object transform(Row row) throws Exception 或着 void transform(Column[] columns, ColumnBuilder builder) throws Exception 或者 void transform(Row row, PointCollector collector) throws Exception 或者 void transform(RowWindow rowWindow, PointCollector collector) throws Exception +4. void terminate(PointCollector collector) throws Exception +5. void beforeDestroy() + +> 注意,框架每执行一次 UDTF 查询,都会构造一个全新的 UDF 类实例,查询结束时,对应的 UDF 类实例即被销毁,因此不同 UDTF 查询(即使是在同一个 SQL 语句中)UDF 类实例内部的数据都是隔离的。您可以放心地在 UDTF 中维护一些状态数据,无需考虑并发对 UDF 类实例内部状态数据的影响。 + +#### 接口详细介绍: + +1. **void validate(UDFParameterValidator validator) throws Exception** + + `validate`方法能够对用户输入的参数进行验证。 + + 您可以在该方法中限制输入序列的数量和类型,检查用户输入的属性或者进行自定义逻辑的验证。 + + `UDFParameterValidator`的使用方法请见 Javadoc。 + +2. **void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception** + + `beforeStart`方法有两个作用: + 1. 帮助用户解析 SQL 语句中的 UDF 参数 + 2. 配置 UDF 运行时必要的信息,即指定 UDF 访问原始数据时采取的策略和输出结果序列的类型 + 3. 创建资源,比如建立外部链接,打开文件等 + +2.1 **UDFParameters** + +`UDFParameters`的作用是解析 SQL 语句中的 UDF 参数(SQL 中 UDF 函数名称后括号中的部分)。参数包括序列类型参数和字符串 key-value 对形式输入的属性参数。 + +示例: + +``` sql +SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; +``` + +用法: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + String stringValue = parameters.getString("key1"); // iotdb + Float floatValue = parameters.getFloat("key2"); // 123.45 + Double doubleValue = parameters.getDouble("key3"); // null + int intValue = parameters.getIntOrDefault("key4", 678); // 678 + // do something + + // configurations + // ... +} +``` + +2.2 **UDTFConfigurations** + +您必须使用 `UDTFConfigurations` 指定 UDF 访问原始数据时采取的策略和输出结果序列的类型。 + +用法: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(Type.INT32); +} +``` + +其中`setAccessStrategy`方法用于设定 UDF 访问原始数据时采取的策略,`setOutputDataType`用于设定输出结果序列的类型。 + + 2.2.1 **setAccessStrategy** + +注意,您在此处设定的原始数据访问策略决定了框架会调用哪一种`transform`方法 ,请实现与原始数据访问策略对应的`transform`方法。当然,您也可以根据`UDFParameters`解析出来的属性参数,动态决定设定哪一种策略,因此,实现两种`transform`方法也是被允许的。 + +下面是您可以设定的访问原始数据的策略: + +| 接口定义 | 描述 | 调用的`transform`方法 | +| ------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| MappableRowByRowStrategy | 自定义标量函数
框架会为每一行原始数据输入调用一次`transform`方法,输入 k 列时间序列 1 行数据,输出 1 列时间序列 1 行数据,可用于标量函数出现的任何子句和表达式中,如select子句、where子句等。 | void transform(Column[] columns, ColumnBuilder builder) throws ExceptionObject transform(Row row) throws Exception | +| RowByRowAccessStrategy | 自定义时间序列生成函数,逐行地处理原始数据。
框架会为每一行原始数据输入调用一次`transform`方法,输入 k 列时间序列 1 行数据,输出 1 列时间序列 n 行数据。
当输入一个序列时,该行就作为输入序列的一个数据点。
当输入多个序列时,输入序列按时间对齐后,每一行作为的输入序列的一个数据点。
(一行数据中,可能存在某一列为`null`值,但不会全部都是`null`) | void transform(Row row, PointCollector collector) throws Exception | +| SlidingTimeWindowAccessStrategy | 自定义时间序列生成函数,以滑动时间窗口的方式处理原始数据。
框架会为每一个原始数据输入窗口调用一次`transform`方法,输入 k 列时间序列 m 行数据,输出 1 列时间序列 n 行数据。
一个窗口可能存在多行数据,输入序列按时间对齐后,每个窗口作为的输入序列的一个数据点。
(每个窗口可能存在 i 行,每行数据可能存在某一列为`null`值,但不会全部都是`null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | +| SlidingSizeWindowAccessStrategy | 自定义时间序列生成函数,以固定行数的方式处理原始数据,即每个数据处理窗口都会包含固定行数的数据(最后一个窗口除外)。
框架会为每一个原始数据输入窗口调用一次`transform`方法,输入 k 列时间序列 m 行数据,输出 1 列时间序列 n 行数据。
一个窗口可能存在多行数据,输入序列按时间对齐后,每个窗口作为的输入序列的一个数据点。
(每个窗口可能存在 i 行,每行数据可能存在某一列为`null`值,但不会全部都是`null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | +| SessionTimeWindowAccessStrategy | 自定义时间序列生成函数,以会话窗口的方式处理原始数据。
框架会为每一个原始数据输入窗口调用一次`transform`方法,输入 k 列时间序列 m 行数据,输出 1 列时间序列 n 行数据。
一个窗口可能存在多行数据,输入序列按时间对齐后,每个窗口作为的输入序列的一个数据点。
(每个窗口可能存在 i 行,每行数据可能存在某一列为`null`值,但不会全部都是`null`) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | +| StateWindowAccessStrategy | 自定义时间序列生成函数,以状态窗口的方式处理原始数据。
框架会为每一个原始数据输入窗口调用一次`transform`方法,输入 1 列时间序列 m 行数据,输出 1 列时间序列 n 行数据。
一个窗口可能存在多行数据,目前仅支持对一个物理量也就是一列数据进行开窗。 | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | + +#### 接口详情: + +- `MappableRowByRowStrategy` 和 `RowByRowAccessStrategy`的构造不需要任何参数。 + +- `SlidingTimeWindowAccessStrategy` + +开窗示意图: + + + +`SlidingTimeWindowAccessStrategy`有多种构造方法,您可以向构造方法提供 3 类参数: + +1. 时间轴显示时间窗开始和结束时间 + +时间轴显示时间窗开始和结束时间不是必须要提供的。当您不提供这类参数时,时间轴显示时间窗开始时间会被定义为整个查询结果集中最小的时间戳,时间轴显示时间窗结束时间会被定义为整个查询结果集中最大的时间戳。 + +2. 划分时间轴的时间间隔参数(必须为正数) +3. 滑动步长(不要求大于等于时间间隔,但是必须为正数) + +滑动步长参数也不是必须的。当您不提供滑动步长参数时,滑动步长会被设定为划分时间轴的时间间隔。 + +3 类参数的关系可见下图。策略的构造方法详见 Javadoc。 + + + +> 注意,最后的一些时间窗口的实际时间间隔可能小于规定的时间间隔参数。另外,可能存在某些时间窗口内数据行数量为 0 的情况,这种情况框架也会为该窗口调用一次`transform`方法。 + +- `SlidingSizeWindowAccessStrategy` + +开窗示意图: + + + +`SlidingSizeWindowAccessStrategy`有多种构造方法,您可以向构造方法提供 2 个参数: + +1. 窗口大小,即一个数据处理窗口包含的数据行数。注意,最后一些窗口的数据行数可能少于规定的数据行数。 +2. 滑动步长,即下一窗口第一个数据行与当前窗口第一个数据行间的数据行数(不要求大于等于窗口大小,但是必须为正数) + +滑动步长参数不是必须的。当您不提供滑动步长参数时,滑动步长会被设定为窗口大小。 + +- `SessionTimeWindowAccessStrategy` + +开窗示意图:**时间间隔小于等于给定的最小时间间隔 sessionGap 则分为一组。** + + + + +`SessionTimeWindowAccessStrategy`有多种构造方法,您可以向构造方法提供 2 类参数: + +1. 时间轴显示时间窗开始和结束时间。 +2. 会话窗口之间的最小时间间隔。 + +- `StateWindowAccessStrategy` + +开窗示意图:**对于数值型数据,状态差值小于等于给定的阈值 delta 则分为一组。** + + + +`StateWindowAccessStrategy`有四种构造方法: + +1. 针对数值型数据,可以提供时间轴显示时间窗开始和结束时间以及对于单个窗口内部允许变化的阈值delta。 +2. 针对文本数据以及布尔数据,可以提供时间轴显示时间窗开始和结束时间。对于这两种数据类型,单个窗口内的数据是相同的,不需要提供变化阈值。 +3. 针对数值型数据,可以只提供单个窗口内部允许变化的阈值delta,时间轴显示时间窗开始时间会被定义为整个查询结果集中最小的时间戳,时间轴显示时间窗结束时间会被定义为整个查询结果集中最大的时间戳。 +4. 针对文本数据以及布尔数据,可以不提供任何参数,开始与结束时间戳见3中解释。 + +StateWindowAccessStrategy 目前只能接收一列输入。策略的构造方法详见 Javadoc。 + + 2.2.2 **setOutputDataType** + +注意,您在此处设定的输出结果序列的类型,决定了`transform`方法中`PointCollector`实际能够接收的数据类型。`setOutputDataType`中设定的输出类型和`PointCollector`实际能够接收的数据输出类型关系如下: + +| `setOutputDataType`中设定的输出类型 | `PointCollector`实际能够接收的输出类型 | +| :---------------------------------- | :----------------------------------------------------------- | +| INT32 | int | +| INT64 | long | +| FLOAT | float | +| DOUBLE | double | +| BOOLEAN | boolean | +| TEXT | java.lang.String 和 org.apache.iotdb.udf.api.type.Binary | + +UDTF 输出序列的类型是运行时决定的。您可以根据输入序列类型动态决定输出序列类型。 + +示例: + +```java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(parameters.getDataType(0)); +} +``` + +3. 
**Object transform(Row row) throws Exception** + +当您在`beforeStart`方法中指定 UDF 读取原始数据的策略为 `MappableRowByRowAccessStrategy`,您就需要该方法和下面的`void transform(Column[] columns, ColumnBuilder builder) throws Exception` 二选一来实现,在该方法中增加对原始数据处理的逻辑。 + +该方法每次处理原始数据的一行。原始数据由`Row`读入,由返回值输出。您必须在一次`transform`方法调用中,根据每个输入的数据点输出一个对应的数据点,即输入和输出依然是一对一的。需要注意的是,输出数据点的类型必须与您在`beforeStart`方法中设置的一致,而输出数据点的时间戳必须是严格单调递增的。 + +下面是一个实现了`Object transform(Row row) throws Exception`方法的完整 UDF 示例。它是一个加法器,接收两列时间序列输入,输出这两个数据点的代数和。 + +```java +import org.apache.iotdb.udf.api.UDTF; +import org.apache.iotdb.udf.api.access.Row; +import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameterValidator; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.udf.api.customizer.strategy.MappableRowByRowAccessStrategy; +import org.apache.iotdb.udf.api.type.Type; + +public class Adder implements UDTF { + private Type dataType; + + @Override + public void validate(UDFParameterValidator validator) throws Exception { + validator + .validateInputSeriesNumber(2) + .validateInputSeriesDataType(0, Type.INT64) + .validateInputSeriesDataType(1, Type.INT64); + } + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + dataType = parameters.getDataType(0); + configurations + .setAccessStrategy(new MappableRowByRowAccessStrategy()) + .setOutputDataType(dataType); + } + + @Override + public Object transform(Row row) throws Exception { + return row.getLong(0) + row.getLong(1); + } +} +``` + +4. **void transform(Column[] columns, ColumnBuilder builder) throws Exception** + +当您在`beforeStart`方法中指定 UDF 读取原始数据的策略为 `MappableRowByRowAccessStrategy`,您就需要实现该方法,在该方法中增加对原始数据处理的逻辑。 + +该方法每次处理原始数据的多行,经过性能测试,我们发现一次性处理多行的 UDTF 比一次处理一行的 UDTF 性能更好。原始数据由`Column[]`读入,由`ColumnBuilder`输出。您必须在一次`transform`方法调用中,根据每个输入的数据点输出一个对应的数据点,即输入和输出依然是一对一的。需要注意的是,输出数据点的类型必须与您在`beforeStart`方法中设置的一致,而输出数据点的时间戳必须是严格单调递增的。 + +下面是一个实现了`void transform(Column[] columns, ColumnBuilder builder) throws Exceptionn`方法的完整 UDF 示例。它是一个加法器,接收两列时间序列输入,输出这两个数据点的代数和。 + +``` java +import org.apache.iotdb.tsfile.read.common.block.column.Column; +import org.apache.iotdb.tsfile.read.common.block.column.ColumnBuilder; +import org.apache.iotdb.udf.api.UDTF; +import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameterValidator; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.udf.api.customizer.strategy.MappableRowByRowAccessStrategy; +import org.apache.iotdb.udf.api.type.Type; + +public class Adder implements UDTF { + private Type type; + + @Override + public void validate(UDFParameterValidator validator) throws Exception { + validator + .validateInputSeriesNumber(2) + .validateInputSeriesDataType(0, Type.INT64) + .validateInputSeriesDataType(1, Type.INT64); + } + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + type = parameters.getDataType(0); + configurations.setAccessStrategy(new MappableRowByRowAccessStrategy()).setOutputDataType(type); + } + + @Override + public void transform(Column[] columns, ColumnBuilder builder) throws Exception { + long[] inputs1 = columns[0].getLongs(); + long[] inputs2 = columns[1].getLongs(); + + int count = columns[0].getPositionCount(); + for (int i = 0; i < count; i++) { + builder.writeLong(inputs1[i] + inputs2[i]); + } + } +} +``` + +5. 
**void transform(Row row, PointCollector collector) throws Exception**
+
+当您在`beforeStart`方法中指定 UDF 读取原始数据的策略为 `RowByRowAccessStrategy` 时,您就需要实现该方法,在该方法中增加对原始数据处理的逻辑。
+
+该方法每次处理原始数据的一行。原始数据由`Row`读入,由`PointCollector`输出。您可以选择在一次`transform`方法调用中输出任意数量的数据点。需要注意的是,输出数据点的类型必须与您在`beforeStart`方法中设置的一致,而输出数据点的时间戳必须是严格单调递增的。
+
+下面是一个实现了`void transform(Row row, PointCollector collector) throws Exception`方法的完整 UDF 示例。它是一个加法器,接收两列时间序列输入,当这两个数据点都不为`null`时,输出这两个数据点的代数和。
+
+``` java
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Adder implements UDTF {
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT64)
+        .setAccessStrategy(new RowByRowAccessStrategy());
+  }
+
+  @Override
+  public void transform(Row row, PointCollector collector) throws Exception {
+    if (row.isNull(0) || row.isNull(1)) {
+      return;
+    }
+    collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1));
+  }
+}
+```
+
+6. **void transform(RowWindow rowWindow, PointCollector collector) throws Exception**
+
+当您在`beforeStart`方法中指定 UDF 读取原始数据的策略为 `SlidingTimeWindowAccessStrategy`、`SlidingSizeWindowAccessStrategy`、`SessionTimeWindowAccessStrategy` 或 `StateWindowAccessStrategy` 等开窗类策略时,您就需要实现该方法,在该方法中增加对原始数据处理的逻辑。
+
+该方法每次处理一个窗口内的一批数据(例如固定行数或者固定时间间隔内的数据),我们称包含这一批数据的容器为窗口。原始数据由`RowWindow`读入,由`PointCollector`输出。`RowWindow`能够帮助您访问某一批次的`Row`,它提供了对这一批次的`Row`进行随机访问和迭代访问的接口。您可以选择在一次`transform`方法调用中输出任意数量的数据点。需要注意的是,输出数据点的类型必须与您在`beforeStart`方法中设置的一致,而输出数据点的时间戳必须是严格单调递增的。
+
+下面是一个实现了`void transform(RowWindow rowWindow, PointCollector collector) throws Exception`方法的完整 UDF 示例。它是一个计数器,接收任意列数的时间序列输入,作用是统计并输出指定时间范围内每一个时间窗口中的数据行数。
+
+```java
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.RowWindow;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Counter implements UDTF {
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT32)
+        .setAccessStrategy(new SlidingTimeWindowAccessStrategy(
+            parameters.getLong("time_interval"),
+            parameters.getLong("sliding_step"),
+            parameters.getLong("display_window_begin"),
+            parameters.getLong("display_window_end")));
+  }
+
+  @Override
+  public void transform(RowWindow rowWindow, PointCollector collector) throws Exception {
+    if (rowWindow.windowSize() != 0) {
+      collector.putInt(rowWindow.windowStartTime(), rowWindow.windowSize());
+    }
+  }
+}
+```
+
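+上面的 `Counter` 通过 `parameters.getLong(...)` 读取开窗所需的四个参数,这些参数在查询时以键值对属性的形式传入。下面是一个假设性的调用示例(假设该 UDF 已注册为 `counter`,时间序列 `root.sg.d1.s1` 与各参数取值均为示意):
+
+```sql
+SELECT counter(s1, 'time_interval'='10000', 'sliding_step'='10000', 'display_window_begin'='0', 'display_window_end'='100000') FROM root.sg.d1;
+```
+
+7. 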
**void terminate(PointCollector collector) throws Exception**
+
+在一些场景下,UDF 需要遍历完所有的原始数据后才能得到最后的输出结果。`terminate`接口为这类 UDF 提供了支持。
+
+该方法会在所有的`transform`调用执行完成后,在`beforeDestroy`方法执行前被调用。您可以选择使用`transform`方法进行单纯的数据处理,最后使用`terminate`将处理结果输出。
+
+结果需要由`PointCollector`输出。您可以选择在一次`terminate`方法调用中输出任意数量的数据点。需要注意的是,输出数据点的类型必须与您在`beforeStart`方法中设置的一致,而输出数据点的时间戳必须是严格单调递增的。
+
+下面是一个实现了`void terminate(PointCollector collector) throws Exception`方法的完整 UDF 示例。它接收一个`INT32`类型的时间序列输入,作用是输出该序列的最大值点。
+
+```java
+import java.io.IOException;
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Max implements UDTF {
+
+  private Long time;
+  private int value;
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT32)
+        .setAccessStrategy(new RowByRowAccessStrategy());
+  }
+
+  @Override
+  public void transform(Row row, PointCollector collector) {
+    if (row.isNull(0)) {
+      return;
+    }
+    int candidateValue = row.getInt(0);
+    if (time == null || value < candidateValue) {
+      time = row.getTime();
+      value = candidateValue;
+    }
+  }
+
+  @Override
+  public void terminate(PointCollector collector) throws IOException {
+    if (time != null) {
+      collector.putInt(time, value);
+    }
+  }
+}
+```
+
+8. **void beforeDestroy()**
+
+UDTF 的结束方法,您可以在此方法中进行一些资源释放等的操作。
+
+此方法由框架调用。对于一个 UDF 类实例而言,生命周期中会且只会被调用一次,即在处理完最后一条记录之后被调用。
+
+### 1.3 UDAF(User Defined Aggregation Function)
+
+一个完整的 UDAF 定义涉及到 State 和 UDAF 两个类。
+
+#### State 类
+
+编写一个 State 类需要实现`org.apache.iotdb.udf.api.State`接口,下表是需要实现的方法说明。
+
+#### 接口说明:
+
+| 接口定义                         | 描述                                                         | 是否必须 |
+| -------------------------------- | ------------------------------------------------------------ | -------- |
+| void reset()                     | 将 `State` 对象重置为初始的状态,您需要像编写构造函数一样,在该方法内填入 `State` 类中各个字段的初始值。 | 是       |
+| byte[] serialize()               | 将 `State` 序列化为二进制数据。该方法用于 IoTDB 内部的 `State` 对象传递,注意序列化的顺序必须和下面的反序列化方法一致。 | 是       |
+| void deserialize(byte[] bytes)   | 将二进制数据反序列化为 `State`。该方法用于 IoTDB 内部的 `State` 对象传递,注意反序列化的顺序必须和上面的序列化方法一致。 | 是       |
+
+#### 接口详细介绍:
+
+1. **void reset()**
+
+该方法的作用是将 `State` 重置为初始的状态,您需要在该方法内填写 `State` 对象中各个字段的初始值。出于优化上的考量,IoTDB 在内部会尽可能地复用 `State`,而不是为每一个分组都创建一个新的 `State`(那样会引入不必要的开销)。当 `State` 更新完一个组中的数据之后,框架就会调用这个方法将其重置为初始状态,以此来处理下一个组。
+
+以求平均数(也就是 `avg`)的 `State` 为例,您需要数据的总和 `sum` 与数据的条数 `count`,并在 `reset()` 方法中将二者初始化为 0。
+
+```java
+class AvgState implements State {
+  double sum;
+
+  long count;
+
+  @Override
+  public void reset() {
+    sum = 0;
+    count = 0;
+  }
+
+  // other methods
+}
+```
+
+2. 
**byte[] serialize()/void deserialize(byte[] bytes)**
+
+这两个方法的作用分别是将 State 序列化为二进制数据,以及从二进制数据中反序列化出 State。IoTDB 作为分布式数据库,涉及到在不同节点中传递数据,因此您需要编写这两个方法,来实现 State 在不同节点中的传递。注意序列化和反序列化的顺序必须一致。
+
+还是以求平均数(也就是求 avg)的 State 为例,您可以通过任意途径将 State 的内容转化为 `byte[]` 数组,以及从 `byte[]` 数组中读取出 State 的内容,下面展示的是使用 `ByteBuffer` 进行序列化/反序列化的代码:
+
+```java
+@Override
+public byte[] serialize() {
+  ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES);
+  buffer.putDouble(sum);
+  buffer.putLong(count);
+
+  return buffer.array();
+}
+
+@Override
+public void deserialize(byte[] bytes) {
+  ByteBuffer buffer = ByteBuffer.wrap(bytes);
+  sum = buffer.getDouble();
+  count = buffer.getLong();
+}
+```
+
+#### UDAF 类
+
+编写一个 UDAF 类需要实现`org.apache.iotdb.udf.api.UDAF`接口,下表是需要实现的方法说明。
+
+#### 接口说明:
+
+| 接口定义 | 描述 | 是否必须 |
+| ------------------------------------------------------------ | ------------------------------------------------------------ | -------- |
+| void validate(UDFParameterValidator validator) throws Exception | 在初始化方法`beforeStart`调用前执行,用于检测`UDFParameters`中用户输入的参数是否合法。该方法与 UDTF 的`validate`相同。 | 否 |
+| void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception | 初始化方法,在 UDAF 处理输入数据前,调用用户自定义的初始化行为。与 UDTF 不同的是,这里的 configuration 是 `UDAFConfigurations` 类型。 | 是 |
+| State createState() | 创建`State`对象,一般只需要调用默认构造函数,然后按需修改默认的初始值即可。 | 是 |
+| void addInput(State state, Column[] columns, BitMap bitMap) | 根据传入的数据`Column[]`批量地更新`State`对象,注意最后一列,也就是 `columns[columns.length - 1]` 总是代表时间列。另外`BitMap`表示之前已经被过滤掉的数据,您在编写该方法时需要手动判断对应的数据是否被过滤掉。 | 是 |
+| void combineState(State state, State rhs) | 将`rhs`状态合并至`state`状态中。在分布式场景下,同一组的数据可能分布在不同节点上,IoTDB 会为每个节点上的部分数据生成一个`State`对象,然后调用该方法合并成完整的`State`。 | 是 |
+| void outputFinal(State state, ResultValue resultValue) | 根据`State`中的数据,计算出最终的聚合结果。注意根据聚合的语义,每一组只能输出一个值。 | 是 |
+| void beforeDestroy() | UDAF 的结束方法。此方法由框架调用,并且只会被调用一次,即在处理完最后一条记录之后被调用。 | 否 |
+
+在一个完整的 UDAF 实例生命周期中,各个方法的调用顺序如下:
+
+1. State createState()
+2. void validate(UDFParameterValidator validator) throws Exception
+3. void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception
+4. void addInput(State state, Column[] columns, BitMap bitMap)
+5. void combineState(State state, State rhs)
+6. void outputFinal(State state, ResultValue resultValue)
+7. void beforeDestroy()
+
+和 UDTF 类似,框架每执行一次 UDAF 查询,都会构造一个全新的 UDF 类实例,查询结束时,对应的 UDF 类实例即被销毁,因此不同 UDAF 查询(即使是在同一个 SQL 语句中)UDF 类实例内部的数据都是隔离的。您可以放心地在 UDAF 中维护一些状态数据,无需考虑并发对 UDF 类实例内部状态数据的影响。
+
+#### 接口详细介绍:
+
+1. **void validate(UDFParameterValidator validator) throws Exception**
+
+同 UDTF 一样,`validate`方法能够对用户输入的参数进行验证。
+
+您可以在该方法中限制输入序列的数量和类型,检查用户输入的属性或者进行自定义逻辑的验证。
+
+2. **void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception**
+
+ `beforeStart`方法的作用与 UDTF 中基本相同:
+
+ 1. 帮助用户解析 SQL 语句中的 UDF 参数
+ 2. 配置 UDF 运行时必要的信息,即指定 UDF 访问原始数据时采取的策略和输出结果序列的类型
+ 3. 创建资源,比如建立外部连接,打开文件等。
+
+其中,`UDFParameters` 类型的作用可以参照上文。
+
+2.2 **UDAFConfigurations**
+
+和 UDTF 的区别在于,UDAF 使用了 `UDAFConfigurations` 作为 `configuration` 对象的类型。
+
+目前,该类仅支持设置输出数据的类型。
+
+```java
+void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception {
+  // parameters
+  // ...
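+  // 假设性示例:UDAF 同样可以在这里解析 SQL 中以键值对形式传入的属性,
+  // 例如:long threshold = parameters.getLong("threshold");(其中 "threshold" 仅为示意的参数名)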
+ + // configurations + configurations + .setOutputDataType(Type.INT32); +} +``` + +`setOutputDataType` 中设定的输出类型和 `ResultValue` 实际能够接收的数据输出类型关系如下: + +| `setOutputDataType`中设定的输出类型 | `ResultValue`实际能够接收的输出类型 | +| :---------------------------------- | :------------------------------------- | +| INT32 | int | +| INT64 | long | +| FLOAT | float | +| DOUBLE | double | +| BOOLEAN | boolean | +| TEXT | org.apache.iotdb.udf.api.type.Binary | + +UDAF 输出序列的类型也是运行时决定的。您可以根据输入序列类型动态决定输出序列类型。 + +示例: + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setOutputDataType(parameters.getDataType(0)); +} +``` + +3. **State createState()** + +为 UDAF 创建并初始化 `State`。由于 Java 语言本身的限制,您只能调用 `State` 类的默认构造函数。默认构造函数会为类中所有的字段赋一个默认的初始值,如果该初始值并不符合您的要求,您需要在这个方法内进行手动的初始化。 + +下面是一个包含手动初始化的例子。假设您要实现一个累乘的聚合函数,`State` 的初始值应该设置为 1,但是默认构造函数会初始化为 0,因此您需要在调用默认构造函数之后,手动对 `State` 进行初始化: + +```java +public State createState() { + MultiplyState state = new MultiplyState(); + state.result = 1; + return state; +} +``` + +4. **void addInput(State state, Column[] columns, BitMap bitMap)** + +该方法的作用是,通过原始的输入数据来更新 `State` 对象。出于性能上的考量,也是为了和 IoTDB 向量化的查询引擎相对齐,原始的输入数据不再是一个数据点,而是列的数组 `Column[]`。注意最后一列(也就是 `columns[columns.length - 1]` )总是时间列,因此您也可以在 UDAF 中根据时间进行不同的操作。 + +由于输入参数的类型不是一个数据点,而是多个列,您需要手动对列中的部分数据进行过滤处理,这就是第三个参数 `BitMap` 存在的意义。它用来标识这些列中哪些数据被过滤掉了,您在任何情况下都无需考虑被过滤掉的数据。 + +下面是一个用于统计数据条数(也就是 count)的 `addInput()` 示例。它展示了您应该如何使用 `BitMap` 来忽视那些已经被过滤掉的数据。注意还是由于 Java 语言本身的限制,您需要在方法的开头将接口中定义的 `State` 类型强制转化为自定义的 `State` 类型,不然后续无法正常使用该 `State` 对象。 + +```java +public void addInput(State state, Column[] columns, BitMap bitMap) { + CountState countState = (CountState) state; + + int count = columns[0].getPositionCount(); + for (int i = 0; i < count; i++) { + if (bitMap != null && !bitMap.isMarked(i)) { + continue; + } + if (!columns[0].isNull(i)) { + countState.count++; + } + } +} +``` + +5. **void combineState(State state, State rhs)** + +该方法的作用是合并两个 `State`,更加准确的说,是用第二个 `State` 对象来更新第一个 `State` 对象。IoTDB 是分布式数据库,同一组的数据可能分布在多个不同的节点上。出于性能考虑,IoTDB 会为每个节点上的部分数据先进行聚合成 `State`,然后再将不同节点上的、属于同一个组的 `State` 进行合并,这就是 `combineState` 的作用。 + +下面是一个用于求平均数(也就是 avg)的 `combineState()` 示例。和 `addInput` 类似,您都需要在开头对两个 `State` 进行强制类型转换。另外需要注意是用第二个 `State` 的内容来更新第一个 `State` 的值。 + +```java +public void combineState(State state, State rhs) { + AvgState avgState = (AvgState) state; + AvgState avgRhs = (AvgState) rhs; + + avgState.count += avgRhs.count; + avgState.sum += avgRhs.sum; +} +``` + +6. **void outputFinal(State state, ResultValue resultValue)** + +该方法的作用是从 `State` 中计算出最终的结果。您需要访问 `State` 中的各个字段,求出最终的结果,并将最终的结果设置到 `ResultValue` 对象中。IoTDB 内部会为每个组在最后调用一次这个方法。注意根据聚合的语义,最终的结果只能是一个值。 + +下面还是一个用于求平均数(也就是 avg)的 `outputFinal` 示例。除了开头的强制类型转换之外,您还将看到 `ResultValue` 对象的具体用法,即通过 `setXXX`(其中 `XXX` 是类型名)来设置最后的结果。 + +```java +public void outputFinal(State state, ResultValue resultValue) { + AvgState avgState = (AvgState) state; + + if (avgState.count != 0) { + resultValue.setDouble(avgState.sum / avgState.count); + } else { + resultValue.setNull(); + } +} +``` + +7. **void beforeDestroy()** + +UDAF 的结束方法,您可以在此方法中进行一些资源释放等的操作。 + +此方法由框架调用。对于一个 UDF 类实例而言,生命周期中会且只会被调用一次,即在处理完最后一条记录之后被调用。 + +### 1.4 完整 Maven 项目示例 + +如果您使用 [Maven](http://search.maven.org/),可以参考我们编写的示例项目**udf-example**。您可以在 [这里](https://github.com/apache/iotdb/tree/master/example/udf) 找到它。 + + +## 2. 为iotdb贡献通用的内置UDF函数 + +该部分主要讲述了外部用户如何将自己编写的 UDF 贡献给 IoTDB 社区。 + +## 2.1 前提条件 + +1. 
UDF 具有通用性。 + + 通用性主要指的是:UDF 在某些业务场景下,可以被广泛使用。换言之,就是 UDF 具有复用价值,可被社区内其他用户直接使用。 + + 如果不确定自己写的 UDF 是否具有通用性,可以发邮件到 `dev@iotdb.apache.org` 或直接创建 ISSUE 发起讨论。 + +2. UDF 已经完成测试,且能够正常运行在用户的生产环境中。 + +### 2.2 贡献清单 + +1. UDF 的源代码 +2. UDF 的测试用例 +3. UDF 的使用说明 + +### 2.3 贡献内容 + +#### 2.3.1 源代码 + +1. 在`iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin`中创建 UDF 主类和相关的辅助类。 +2. 在`iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin/BuiltinTimeSeriesGeneratingFunction.java`中注册编写的 UDF。 + +#### 2.3.2 测试用例 + +至少需要为贡献的 UDF 编写集成测试。 + +可以在`integration-test/src/test/java/org/apache/iotdb/db/it/udf`中为贡献的 UDF 新增一个测试类进行测试。 + +#### 2.3.3 使用说明 + +使用说明需要包含:UDF 的名称、UDF 的作用、执行函数必须的属性参数、函数的适用的场景以及使用示例等。 + +使用说明需包含中英文两个版本。应分别在 `docs/zh/UserGuide/Operation Manual/DML Data Manipulation Language.md` 和 `docs/UserGuide/Operation Manual/DML Data Manipulation Language.md` 中新增使用说明。 + +#### 2.3.4 提交 PR + +当准备好源代码、测试用例和使用说明后,就可以将 UDF 贡献到 IoTDB 社区了。在 [Github](https://github.com/apache/iotdb) 上面提交 Pull Request (PR) 即可。具体提交方式见:[贡献指南](https://iotdb.apache.org/zh/Community/Development-Guide.html)。 + +当 PR 评审通过并被合并后, UDF 就已经贡献给 IoTDB 社区了! \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_apache.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_apache.md new file mode 100644 index 00000000..7c085f60 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_apache.md @@ -0,0 +1,209 @@ +# 用户自定义函数 + +## 1. UDF 介绍 + +UDF(User Defined Function)即用户自定义函数,IoTDB 提供多种内建的面向时序处理的函数,也支持扩展自定义函数来满足更多的计算需求。 + +IoTDB 支持两种类型的 UDF 函数,如下表所示。 + + + + + + + + + + + + + + + + + + + + + +
UDF 分类数据访问策略描述
UDTFMAPPABLE_ROW_BY_ROW自定义标量函数,输入 k 列时间序列 1 行数据,输出 1 列时间序列 1 行数据,可用于标量函数出现的任何子句和表达式中,如select子句、where子句等。
ROW_BY_ROW
SLIDING_TIME_WINDOW
SLIDING_SIZE_WINDOW
SESSION_TIME_WINDOW
STATE_WINDOW
自定义时间序列生成函数,输入 k 列时间序列 m 行数据,输出 1 列时间序列 n 行数据,输入行数 m 可以与输出行数 n 不相同,只能用于SELECT子句中。
UDAF-自定义聚合函数,输入 k 列时间序列 m 行数据,输出 1 列时间序列 1 行数据,可用于聚合函数出现的任何子句和表达式中,如select子句、having子句等。
+ +### 1.1 UDF 使用 + +UDF 的使用方法与普通内建函数类似,可以直接在 SELECT 语句中像调用普通函数一样使用UDF。 + +#### 1.支持的基础 SQL 语法 + +* `SLIMIT` / `SOFFSET` +* `LIMIT` / `OFFSET` +* 支持值过滤 +* 支持时间过滤 + + +#### 2. 带 * 查询 + +假定现在有时间序列 `root.sg.d1.s1`和 `root.sg.d1.s2`。 + +* **执行`SELECT example(*) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1)`和`example(root.sg.d1.s2)`的结果。 + +* **执行`SELECT example(s1, *) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1, root.sg.d1.s1)`和`example(root.sg.d1.s1, root.sg.d1.s2)`的结果。 + +* **执行`SELECT example(*, *) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1, root.sg.d1.s1)`,`example(root.sg.d1.s2, root.sg.d1.s1)`,`example(root.sg.d1.s1, root.sg.d1.s2)` 和 `example(root.sg.d1.s2, root.sg.d1.s2)`的结果。 + +#### 3. 带自定义输入参数的查询 + +可以在进行 UDF 查询的时候,向 UDF 传入任意数量的键值对参数。键值对中的键和值都需要被单引号或者双引号引起来。注意,键值对参数只能在所有时间序列后传入。下面是一组例子: + + 示例: +``` sql +SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; +SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; +``` + +#### 4. 与其他查询的嵌套查询 + + 示例: +``` sql +SELECT s1, s2, example(s1, s2) FROM root.sg.d1; +SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; +SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; +SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; +``` + + +## 2. UDF 开发 + +可以参考 UDF函数开发:[开发指导](./UDF-development.md) + +## 3. UDF 管理 + +### 3.1 UDF 注册 + +注册一个 UDF 可以按如下流程进行: + +1. 实现一个完整的 UDF 类,假定这个类的全类名为`org.apache.iotdb.udf.UDTFExample` +2. 将项目打成 JAR 包,如果使用 Maven 管理项目,可以参考 [Maven 项目示例](https://github.com/apache/iotdb/tree/master/example/udf)的写法 +3. 进行注册前的准备工作,根据注册方式的不同需要做不同的准备,具体可参考以下例子 +4. 使用以下 SQL 语句注册 UDF + +```sql +CREATE FUNCTION AS (USING URI URI-STRING) +``` + +#### 示例:注册名为`example`的 UDF,以下两种注册方式任选其一即可 + +#### 方式一:手动放置jar包 + +准备工作: +使用该种方式注册时,需要提前将 JAR 包放置到集群所有节点的 `ext/udf`目录下(该目录可配置)。 + +注册语句: + +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' +``` + +#### 方式二:集群通过URI自动安装jar包 + +准备工作: +使用该种方式注册时,需要提前将 JAR 包上传到 URI 服务器上并确保执行注册语句的 IoTDB 实例能够访问该 URI 服务器。 + +注册语句: + +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar' +``` + +IoTDB 会下载 JAR 包并同步到整个集群。 + +#### 注意 + +1. 由于 IoTDB 的 UDF 是通过反射技术动态装载的,因此在装载过程中无需启停服务器。 + +2. UDF 函数名称是大小写不敏感的。 + +3. 请不要给 UDF 函数注册一个内置函数的名字。使用内置函数的名字给 UDF 注册会失败。 + +4. 不同的 JAR 包中最好不要有全类名相同但实现功能逻辑不一样的类。例如 UDF(UDAF/UDTF):`udf1`、`udf2`分别对应资源`udf1.jar`、`udf2.jar`。如果两个 JAR 包里都包含一个`org.apache.iotdb.udf.UDTFExample`类,当同一个 SQL 中同时使用到这两个 UDF 时,系统会随机加载其中一个类,导致 UDF 执行行为不一致。 + +### 3.2 UDF 卸载 + +SQL 语法如下: + +```sql +DROP FUNCTION +``` + +示例:卸载上述例子的 UDF: + +```sql +DROP FUNCTION example +``` + + +### 3.3 查看所有注册的 UDF + +``` sql +SHOW FUNCTIONS +``` + +### 3.4 UDF 配置 + +- 允许在 `iotdb-system.properties` 中配置 udf 的存储目录.: + ``` Properties +# UDF lib dir + +udf_lib_dir=ext/udf +``` + +- 使用自定义函数时,提示内存不足,更改 `iotdb-system.properties` 中下述配置参数并重启服务。 + ``` Properties + +# Used to estimate the memory usage of text fields in a UDF query. +# It is recommended to set this value to be slightly larger than the average length of all text +# effectiveMode: restart +# Datatype: int +udf_initial_byte_array_length_for_memory_control=48 + +# How much memory may be used in ONE UDF query (in MB). +# The upper limit is 20% of allocated memory for read. +# effectiveMode: restart +# Datatype: float +udf_memory_budget_in_mb=30.0 + +# UDF memory allocation ratio. +# The parameter form is a:b:c, where a, b, and c are integers. 
+# effectiveMode: restart +udf_reader_transformer_collector_memory_proportion=1:1:1 +``` + +### 3.5 UDF 用户权限 + +用户在使用 UDF 时会涉及到 `USE_UDF` 权限,具备该权限的用户才被允许执行 UDF 注册、卸载和查询操作。 + +更多用户权限相关的内容,请参考 [权限管理语句](./Authority-Management.md##权限管理)。 + + +## 4. UDF 函数库 + +基于用户自定义函数能力,IoTDB 提供了一系列关于时序数据处理的函数,包括数据质量、数据画像、异常检测、 频域分析、数据匹配、数据修复、序列发现、机器学习等,能够满足工业领域对时序数据处理的需求。 + +可以参考 [UDF 函数库](../SQL-Manual/UDF-Libraries_apache.md)文档,查找安装步骤及每个函数对应的注册语句,以确保正确注册所有需要的函数。 + +## 5. 常见问题: + +1. 如何修改已经注册的 UDF? + +答:假设 UDF 的名称为`example`,全类名为`org.apache.iotdb.udf.UDTFExample`,由`example.jar`引入 + +1. 首先卸载已经注册的`example`函数,执行`DROP FUNCTION example` +2. 删除 `iotdb-server-1.0.0-all-bin/ext/udf` 目录下的`example.jar` +3. 修改`org.apache.iotdb.udf.UDTFExample`中的逻辑,重新打包,JAR 包的名字可以仍然为`example.jar` +4. 将新的 JAR 包上传至 `iotdb-server-1.0.0-all-bin/ext/udf` 目录下 +5. 装载新的 UDF,执行`CREATE FUNCTION example AS "org.apache.iotdb.udf.UDTFExample"` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_timecho.md new file mode 100644 index 00000000..2125951b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/User-defined-function_timecho.md @@ -0,0 +1,209 @@ +# 用户自定义函数 + +## 1. UDF 介绍 + +UDF(User Defined Function)即用户自定义函数,IoTDB 提供多种内建的面向时序处理的函数,也支持扩展自定义函数来满足更多的计算需求。 + +IoTDB 支持两种类型的 UDF 函数,如下表所示。 + + + + + + + + + + + + + + + + + + + + + +
UDF 分类数据访问策略描述
UDTFMAPPABLE_ROW_BY_ROW自定义标量函数,输入 k 列时间序列 1 行数据,输出 1 列时间序列 1 行数据,可用于标量函数出现的任何子句和表达式中,如select子句、where子句等。
ROW_BY_ROW
SLIDING_TIME_WINDOW
SLIDING_SIZE_WINDOW
SESSION_TIME_WINDOW
STATE_WINDOW
自定义时间序列生成函数,输入 k 列时间序列 m 行数据,输出 1 列时间序列 n 行数据,输入行数 m 可以与输出行数 n 不相同,只能用于SELECT子句中。
UDAF-自定义聚合函数,输入 k 列时间序列 m 行数据,输出 1 列时间序列 1 行数据,可用于聚合函数出现的任何子句和表达式中,如select子句、having子句等。
+ +### 1.1 UDF 使用 + +UDF 的使用方法与普通内建函数类似,可以直接在 SELECT 语句中像调用普通函数一样使用UDF。 + +#### 1.支持的基础 SQL 语法 + +* `SLIMIT` / `SOFFSET` +* `LIMIT` / `OFFSET` +* 支持值过滤 +* 支持时间过滤 + + +#### 2. 带 * 查询 + +假定现在有时间序列 `root.sg.d1.s1`和 `root.sg.d1.s2`。 + +* **执行`SELECT example(*) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1)`和`example(root.sg.d1.s2)`的结果。 + +* **执行`SELECT example(s1, *) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1, root.sg.d1.s1)`和`example(root.sg.d1.s1, root.sg.d1.s2)`的结果。 + +* **执行`SELECT example(*, *) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1, root.sg.d1.s1)`,`example(root.sg.d1.s2, root.sg.d1.s1)`,`example(root.sg.d1.s1, root.sg.d1.s2)` 和 `example(root.sg.d1.s2, root.sg.d1.s2)`的结果。 + +#### 3. 带自定义输入参数的查询 + +可以在进行 UDF 查询的时候,向 UDF 传入任意数量的键值对参数。键值对中的键和值都需要被单引号或者双引号引起来。注意,键值对参数只能在所有时间序列后传入。下面是一组例子: + + 示例: +``` sql +SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; +SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; +``` + +#### 4. 与其他查询的嵌套查询 + + 示例: +``` sql +SELECT s1, s2, example(s1, s2) FROM root.sg.d1; +SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; +SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; +SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; +``` + + +## 2. UDF 开发 + +可以参考 UDF函数开发:[开发指导](./UDF-development.md) + +## 3. UDF 管理 + +### 3.1 UDF 注册 + +注册一个 UDF 可以按如下流程进行: + +1. 实现一个完整的 UDF 类,假定这个类的全类名为`org.apache.iotdb.udf.UDTFExample` +2. 将项目打成 JAR 包,如果使用 Maven 管理项目,可以参考 [Maven 项目示例](https://github.com/apache/iotdb/tree/master/example/udf)的写法 +3. 进行注册前的准备工作,根据注册方式的不同需要做不同的准备,具体可参考以下例子 +4. 使用以下 SQL 语句注册 UDF + +```sql +CREATE FUNCTION AS (USING URI URI-STRING) +``` + +#### 示例:注册名为`example`的 UDF,以下两种注册方式任选其一即可 + +#### 方式一:手动放置jar包 + +准备工作: +使用该种方式注册时,需要提前将 JAR 包放置到集群所有节点的 `ext/udf`目录下(该目录可配置)。 + +注册语句: + +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' +``` + +#### 方式二:集群通过URI自动安装jar包 + +准备工作: +使用该种方式注册时,需要提前将 JAR 包上传到 URI 服务器上并确保执行注册语句的 IoTDB 实例能够访问该 URI 服务器。 + +注册语句: + +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar' +``` + +IoTDB 会下载 JAR 包并同步到整个集群。 + +#### 注意 + +1. 由于 IoTDB 的 UDF 是通过反射技术动态装载的,因此在装载过程中无需启停服务器。 + +2. UDF 函数名称是大小写不敏感的。 + +3. 请不要给 UDF 函数注册一个内置函数的名字。使用内置函数的名字给 UDF 注册会失败。 + +4. 不同的 JAR 包中最好不要有全类名相同但实现功能逻辑不一样的类。例如 UDF(UDAF/UDTF):`udf1`、`udf2`分别对应资源`udf1.jar`、`udf2.jar`。如果两个 JAR 包里都包含一个`org.apache.iotdb.udf.UDTFExample`类,当同一个 SQL 中同时使用到这两个 UDF 时,系统会随机加载其中一个类,导致 UDF 执行行为不一致。 + +### 3.2 UDF 卸载 + +SQL 语法如下: + +```sql +DROP FUNCTION +``` + +示例:卸载上述例子的 UDF: + +```sql +DROP FUNCTION example +``` + + +### 3.3 查看所有注册的 UDF + +``` sql +SHOW FUNCTIONS +``` + +### 3.4 UDF 配置 + +- 允许在 `iotdb-system.properties` 中配置 udf 的存储目录.: + ``` Properties +# UDF lib dir + +udf_lib_dir=ext/udf +``` + +- 使用自定义函数时,提示内存不足,更改 `iotdb-system.properties` 中下述配置参数并重启服务。 + ``` Properties + +# Used to estimate the memory usage of text fields in a UDF query. +# It is recommended to set this value to be slightly larger than the average length of all text +# effectiveMode: restart +# Datatype: int +udf_initial_byte_array_length_for_memory_control=48 + +# How much memory may be used in ONE UDF query (in MB). +# The upper limit is 20% of allocated memory for read. +# effectiveMode: restart +# Datatype: float +udf_memory_budget_in_mb=30.0 + +# UDF memory allocation ratio. +# The parameter form is a:b:c, where a, b, and c are integers. 
+# effectiveMode: restart +udf_reader_transformer_collector_memory_proportion=1:1:1 +``` + +### 3.5 UDF 用户权限 + +用户在使用 UDF 时会涉及到 `USE_UDF` 权限,具备该权限的用户才被允许执行 UDF 注册、卸载和查询操作。 + +更多用户权限相关的内容,请参考 [权限管理语句](./Authority-Management.md##权限管理)。 + + +## 4. UDF 函数库 + +基于用户自定义函数能力,IoTDB 提供了一系列关于时序数据处理的函数,包括数据质量、数据画像、异常检测、 频域分析、数据匹配、数据修复、序列发现、机器学习等,能够满足工业领域对时序数据处理的需求。 + +可以参考 [UDF 函数库](../SQL-Manual/UDF-Libraries_timecho.md)文档,查找安装步骤及每个函数对应的注册语句,以确保正确注册所有需要的函数。 + +## 5. 常见问题: + +1. 如何修改已经注册的 UDF? + +答:假设 UDF 的名称为`example`,全类名为`org.apache.iotdb.udf.UDTFExample`,由`example.jar`引入 + +1. 首先卸载已经注册的`example`函数,执行`DROP FUNCTION example` +2. 删除 `iotdb-server-1.0.0-all-bin/ext/udf` 目录下的`example.jar` +3. 修改`org.apache.iotdb.udf.UDTFExample`中的逻辑,重新打包,JAR 包的名字可以仍然为`example.jar` +4. 将新的 JAR 包上传至 `iotdb-server-1.0.0-all-bin/ext/udf` 目录下 +5. 装载新的 UDF,执行`CREATE FUNCTION example AS "org.apache.iotdb.udf.UDTFExample"` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/User-Manual/White-List_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/White-List_timecho.md new file mode 100644 index 00000000..4b43a03f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/User-Manual/White-List_timecho.md @@ -0,0 +1,70 @@ + + + +# 白名单 + +**功能描述** + +允许哪些客户端地址能连接 IoTDB + +**配置文件** + +conf/iotdb-system.properties + +conf/white.list + +**配置项** + +iotdb-system.properties: + +决定是否开启白名单功能 + +```YAML +# 是否开启白名单功能 +enable_white_list=true +``` + +white.list: + +决定哪些IP地址能够连接IoTDB + +```YAML +# 支持注释 +# 支持精确匹配,每行一个ip +10.2.3.4 + +# 支持*通配符,每行一个ip +10.*.1.3 +10.100.0.* +``` + +**注意事项** + +1. 如果通过session客户端取消本身的白名单,当前连接并不会立即断开。在下次创建连接的时候拒绝。 +2. 如果直接修改white.list,一分钟内生效。如果通过session客户端修改,立即生效,更新内存中的值和white.list磁盘文件 +3. 开启白名单功能,没有white.list 文件,启动DB服务成功,但是,拒绝所有连接。 +4. DB服务运行中,删除 white.list 文件,至多一分钟后,拒绝所有连接。 +5. 是否开启白名单功能的配置,可以热加载。 +6. 
使用Java 原生接口修改白名单,必须是root用户才能修改,拒绝非root用户修改;修改内容必须合法,否则会抛出StatementExecutionException异常。 + +![白名单](https://alioss.timecho.com/docs/img/%E7%99%BD%E5%90%8D%E5%8D%95.PNG) + diff --git a/src/zh/UserGuide/V2.0.1/Tree/UserGuideReadme.md b/src/zh/UserGuide/V2.0.1/Tree/UserGuideReadme.md new file mode 100644 index 00000000..d0bf223c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/UserGuideReadme.md @@ -0,0 +1,30 @@ + +# IoTDB 用户手册 Toc + +我们一直都在致力于不断向 IOTDB 中引入更多功能,因此不同的发行版本的用户手册文档也不尽相同。 + +"In Progress Version" 用于匹配 IOTDB 源代码存储库的 master 分支。 +其他文档用于匹配 IoTDB 发布的版本。 + +- [In progress version](https://iotdb.apache.org/UserGuide/Master/QuickStart/QuickStart_apache.html) +- [Version 1.0.x](https://iotdb.apache.org/UserGuide/V1.0.x/QuickStart/QuickStart.html) +- [Version 0.13.x](https://iotdb.apache.org/UserGuide/V0.13.x/QuickStart/QuickStart.html) diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/AINode_Deployment.md b/src/zh/UserGuide/V2.0.1/Tree/stage/AINode_Deployment.md new file mode 100644 index 00000000..6bfb403e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/AINode_Deployment.md @@ -0,0 +1,329 @@ + +# AINode 部署 + +## 安装环境 + +### 建议操作系统 + +Ubuntu, CentOS, MacOS + +### 运行环境 + +AINode目前要求系统3.8以上的Python,且带有pip和venv工具 + +如果是联网的情况,AINode会创建虚拟环境并自动下载运行时的依赖包,不需要额外配置。 + +如果是非联网的环境,可以从 https://cloud.tsinghua.edu.cn/d/4c1342f6c272439aa96c/ 中获取安装所需要的依赖包并离线安装。 + +## 安装步骤 + +用户可以下载AINode的软件安装包,下载并解压后即完成AINode的安装。也可以从代码仓库中下载源码并编译来获取安装包。 + +## 软件目录结构 + +下载软件安装包并解压后,可以得到如下的目录结构 + +```Shell +|-- apache-iotdb-AINode-bin + |-- lib # 打包的二进制可执行文件,包含环境依赖 + |-- conf # 存放配置文件 + - iotdb-AINode.properties + |-- sbin # AINode相关启动脚本 + - start-AINode.sh + - start-AINode.bat + - stop-AINode.sh + - stop-AINode.bat + - remove-AINode.sh + - remove-AINode.bat + |-- licenses + - LICENSE + - NOTICE + - README.md + - README_ZH.md + - RELEASE_NOTES.md +``` + +- **lib:**AINode编译后的二进制可执行文件以及相关的代码依赖 +- **conf:**包含AINode的配置项,具体包含以下配置项 +- **sbin:**AINode的运行脚本,可以启动,移除和停止AINode + +## 启动AINode + +在完成Seed-ConfigNode的部署后,可以通过添加AINode节点来支持模型的注册和推理功能。在配置项中指定IoTDB集群的信息后,可以执行相应的指令来启动AINode,加入IoTDB集群。 + +注意:启动AINode需要系统环境中含有3.8及以上的Python解释器作为默认解释器,用户在使用前请检查环境变量中是否存在Python解释器且可以通过`python`指令直接调用。 + +### 直接启动 + +在获得安装包的文件后,用户可以直接进行AINode的初次启动。 + +在Linux和MacOS上的启动指令如下: + +```Shell +> bash sbin/start-AINode.sh +``` + +在windows上的启动指令如下: + +```Shell +> sbin\start-AINode.bat +``` + +如果首次启动AINode且没有指定解释器路径,那么脚本将在程序根目录使用系统Python解释器新建venv虚拟环境,并在这个环境中自动先后安装AINode的第三方依赖和AINode主程序。**这个过程将产生大小约为1GB的虚拟环境,请预留好安装的空间**。在后续启动时,如果未指定解释器路径,脚本将自动寻找上面新建的venv环境并启动AINode,无需重复安装程序和依赖。 + +注意,如果希望在某次启动时强制重新安装AINode本体,可以通过-r激活reinstall,该参数会根据lib下的文件重新安装AINode。 + +Linux和MacOS: + +```Shell +> bash sbin/start-AINode.sh -r +``` + +Windows: + +```Shell +> sbin\start-AINode.bat -r +``` + +例如,用户在lib中更换了更新版本的AINode安装包,但该安装包并不会安装到用户的常用环境中。此时用户即需要在启动时添加-r选项来指示脚本强制重新安装虚拟环境中的AINode主程序,实现版本的更新 + +### 指定自定义虚拟环境 + +在启动AINode时,可以通过指定一个虚拟环境解释器路径来将AINode主程序及其依赖安装到特定的位置。具体需要指定参数ain_interpreter_dir的值。 + +Linux和MacOS: + +```Shell +> bash sbin/start-AINode.sh -i xxx/bin/python +``` + +Windows: + +```Shell +> sbin\start-AINode.bat -i xxx\Scripts\python.exe +``` + +在指定Python解释器的时候请输入虚拟环境中Python解释器的**可执行文件**的地址。目前AINode**支持venv、****conda****等虚拟环境**,**不支持输入系统Python解释器作为安装位置**。为了保证脚本能够正常识别,请**尽可能使用绝对路径** + +### 加入集群 + +AINode启动过程中会自动将新的AINode加入IoTDB集群。启动AINode后可以通过在IoTDB的cli命令行中输入集群查询的SQL来验证节点是否加入成功。 + +```Shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| 
++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| +| 2| AINode|Running| 127.0.0.1| 10810|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ + +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | |UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730| | 0.0.0.0| 6667| 10740| 10750| 10760|UNKNOWN|190e303-dev| +| 2| AINode|Running| 127.0.0.1| 10810| | 0.0.0.0| 10810| | | |UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+-------+-----------+ + +IoTDB> show AINodes ++------+-------+----------+-------+ +|NodeID| Status|RpcAddress|RpcPort| ++------+-------+----------+-------+ +| 2|Running| 127.0.0.1| 10810| ++------+-------+----------+-------+ +``` + +## 移除AINode + +当需要把一个已经连接的AINode移出集群时,可以执行对应的移除脚本。 + +在Linux和MacOS上的指令如下: + +```Shell +> bash sbin/remove-AINode.sh +``` + +在windows上的启动指令如下: + +```Shell +> sbin\remove-AINode.bat +``` + +移除节点后,将无法查询到节点的相关信息。 + +```Shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` + +另外,如果之前自定义了AINode安装的位置,那么在调用remove脚本的时候也需要附带相应的路径作为参数: + +Linux和MacOS: + +```Shell +> bash sbin/remove-AINode.sh -i xxx/bin/python +``` + +Windows: + +```Shell +> sbin\remove-AINode.bat -i 1 xxx\Scripts\python.exe +``` + +类似地,在env脚本中持久化修改的脚本参数同样会在执行移除的时候生效。 + +如果用户丢失了data文件夹下的文件,可能AINode本地无法主动移除自己,需要用户指定节点号、地址和端口号进行移除,此时我们支持用户按照以下方法输入参数进行删除 + +Linux和MacOS: + +```Shell +> bash sbin/remove-AINode.sh -t /: +``` + +Windows: + +```Shell +> sbin\remove-AINode.bat -t /: +``` + +## 停止AINode + +如果需要停止正在运行的AINode节点,则执行相应的关闭脚本。 + +在Linux和MacOS上的指令如下: + +```Shell +> bash sbin/stop-AINode.sh +``` + +在windows上的启动指令如下: + +```Shell +> sbin\stop-AINode.bat +``` + +此时无法获取节点的具体状态,也就无法使用对应的管理和推理功能。如果需要重新启动该节点,再次执行启动脚本即可。 + +```Shell +IoTDB> show cluster ++------+----------+-------+---------------+------------+-------+-----------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version| BuildInfo| ++------+----------+-------+---------------+------------+-------+-----------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710|UNKNOWN|190e303-dev| +| 1| DataNode|Running| 127.0.0.1| 10730|UNKNOWN|190e303-dev| +| 2| AINode|UNKNOWN| 127.0.0.1| 10790|UNKNOWN|190e303-dev| ++------+----------+-------+---------------+------------+-------+-----------+ +``` + +## 脚本参数详情 + +AINode启动过程中支持两种参数,其具体的作用如下图所示: + +| **名称** | **作用脚本** | 标签 | **描述** | **类型** | **默认值** | 输入方式 | +| 
------------------- | ---------------- | ---- | ------------------------------------------------------------ | -------- | ---------------- | --------------------- | +| ain_interpreter_dir | start remove env | -i | AINode所安装在的虚拟环境的解释器路径,需要使用绝对路径 | String | 默认读取环境变量 | 调用时输入+持久化修改 | +| ain_remove_target | remove stop | -t | AINode关闭时可以指定待移除的目标AINode的Node ID、地址和端口号,格式为`/:` | String | 无 | 调用时输入 | +| ain_force_reinstall | start remove env | -r | 该脚本在检查AINode安装情况的时候是否检查版本,如果检查则在版本不对的情况下会强制安装lib里的whl安装包 | Bool | false | 调用时输入 | +| ain_no_dependencies | start remove env | -n | 指定在安装AINode的时候是否安装依赖,如果指定则仅安装AINode主程序而不安装依赖。 | Bool | false | 调用时输入 | + +除了按照上文所述的方法在执行脚本时传入上述参数外,也可以在`conf`文件夹下的`AINode-env.sh`和`AINode-env.bat`脚本中持久化地修改部分参数。 + +`AINode-env.sh`: + +```Bash +# The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark +# ain_interpreter_dir= +``` + +`AINode-env.bat`: + +```Plain +@REM The defaulte venv environment is used if ain_interpreter_dir is not set. Please use absolute path without quotation mark +@REM set ain_interpreter_dir= +``` + +在写入参数值的后解除对应行的注释并保存即可在下一次执行脚本时生效。 + +## AINode配置项 + +AINode支持修改一些必要的参数。可以在`conf/iotdb-AINode.properties`文件中找到下列参数并进行持久化的修改: + +| **名称** | **描述** | **类型** | **默认值** | **改后生效方式** | +| --------------------------- | ------------------------------------------------------------ | -------- | ------------------ | ---------------------------- | +| ain_seed_config_node | AINode启动时注册的ConfigNode地址 | String | 10710 | 仅允许在第一次启动服务前修改 | +| ain_inference_rpc_address | AINode提供服务与通信的地址 | String | 127.0.0.1 | 重启后生效 | +| ain_inference_rpc_port | AINode提供服务与通信的端口 | String | 10810 | 重启后生效 | +| ain_system_dir | AINode元数据存储路径,相对路径的起始目录与操作系统相关,建议使用绝对路径。 | String | data/AINode/system | 重启后生效 | +| ain_models_dir | AINode存储模型文件的路径,相对路径的起始目录与操作系统相关,建议使用绝对路径。 | String | data/AINode/models | 重启后生效 | +| ain_logs_dir | AINode存储日志的路径,相对路径的起始目录与操作系统相关,建议使用绝对路径。 | String | logs/AINode | 重启后生效 | + +## 常见问题解答 + +1. **启动AINode时出现找不到venv模块的报错** + +当使用默认方式启动AINode时,会在安装包目录下创建一个python虚拟环境并安装依赖,因此要求安装venv模块。通常来说python3.8及以上的版本会自带venv,但对于一些系统自带的python环境可能并不满足这一要求。出现该报错时有两种解决方案(二选一): + +- 在本地安装venv模块,以ubuntu为例,可以通过运行以下命令来安装python自带的venv模块。或者从python官网安装一个自带venv的python版本 + +```SQL +apt-get install python3.8-venv +``` + +- 在运行启动脚本时通过-i指定已有的python解释器路径作为AINode的运行环境,这样就不再需要创建一个新的虚拟环境。 + +2. **在CentOS7中编译python环境** + +在centos7的新环境中(自带python3.6)不满足启动mlnode的要求,需要自行编译python3.8+(python在centos7中未提供二进制包) + +- 安装OpenSSL + +> Currently Python versions 3.6 to 3.9 are compatible with OpenSSL 1.0.2, 1.1.0, and 1.1.1. + +Python要求我们的系统上安装有OpenSSL,具体安装方法可见https://stackoverflow.com/questions/56552390/how-to-fix-ssl-module-in-python-is-not-available-in-centos + +- 安装编译python + +使用以下指定从官网下载安装包并解压 + +```SQL +wget https://www.python.org/ftp/python/3.8.1/Python-3.8.1.tgz +tar -zxvf Python-3.8.1.tgz +``` + +编译安装对应的python包 + +```SQL +./configure prefix=/usr/local/python3 -with-openssl=/usr/local/openssl +make && make install +``` + +3. 
**windows下出现类似“error:Microsoft Visual** **C++** **14.0 or greater is required...”的编译问题** + +出现对应的报错,通常是c++版本或是setuptools版本不足,可以在https://stackoverflow.com/questions/44951456/pip-error-microsoft-visual-c-14-0-is-required中查找适合的解决方案。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Administration-Management/Administration.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Administration-Management/Administration.md new file mode 100644 index 00000000..babb7793 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Administration-Management/Administration.md @@ -0,0 +1,536 @@ + + +# 权限管理 + +IoTDB 为用户提供了权限管理操作,从而为用户提供对于数据的权限管理功能,保障数据的安全。 + +我们将通过以下几个具体的例子为您示范基本的用户权限操作,详细的 SQL 语句及使用方式详情请参见本文 [数据模式与概念章节](../Basic-Concept/Data-Model-and-Terminology.md)。同时,在 JAVA 编程环境中,您可以使用 [JDBC API](../API/Programming-JDBC.md) 单条或批量执行权限管理类语句。 + +## 基本概念 + +### 用户 + +用户即数据库的合法使用者。一个用户与一个唯一的用户名相对应,并且拥有密码作为身份验证的手段。一个人在使用数据库之前,必须先提供合法的(即存于数据库中的)用户名与密码,使得自己成为用户。 + +### 权限 + +数据库提供多种操作,并不是所有的用户都能执行所有操作。如果一个用户可以执行某项操作,则称该用户有执行该操作的权限。权限可分为数据管理权限(如对数据进行增删改查)以及权限管理权限(用户、角色的创建与删除,权限的赋予与撤销等)。数据管理权限往往需要一个路径来限定其生效范围,可使用[路径模式](../Basic-Concept/Data-Model-and-Terminology.md)灵活管理权限。 + +### 角色 + +角色是若干权限的集合,并且有一个唯一的角色名作为标识符。用户通常和一个现实身份相对应(例如交通调度员),而一个现实身份可能对应着多个用户。这些具有相同现实身份的用户往往具有相同的一些权限。角色就是为了能对这样的权限进行统一的管理的抽象。 + +### 默认用户及其具有的角色 + +初始安装后的 IoTDB 中有一个默认用户:root,默认密码为 root。该用户为管理员用户,固定拥有所有权限,无法被赋予、撤销权限,也无法被删除。 + +## 权限操作示例 + +根据本文中描述的 [样例数据](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt) 内容,IoTDB 的样例数据可能同时属于 ln, sgcc 等不同发电集团,不同的发电集团不希望其他发电集团获取自己的数据库数据,因此我们需要将不同的数据在集团层进行权限隔离。 + +### 创建用户 + +使用 `CREATE USER ` 创建用户。例如,我们可以使用具有所有权限的root用户为 ln 和 sgcc 集团创建两个用户角色,名为 ln_write_user, sgcc_write_user,密码均为 write_pwd。建议使用反引号(`)包裹用户名。SQL 语句为: + +``` +CREATE USER `ln_write_user` 'write_pwd' +CREATE USER `sgcc_write_user` 'write_pwd' +``` +此时使用展示用户的 SQL 语句: + +``` +LIST USER +``` +我们可以看到这两个已经被创建的用户,结果如下: + +``` +IoTDB> CREATE USER `ln_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' +Msg: The statement is executed successfully. +IoTDB> LIST USER ++---------------+ +| user| ++---------------+ +| ln_write_user| +| root| +|sgcc_write_user| ++---------------+ +Total line number = 3 +It costs 0.157s +``` + +### 赋予用户权限 + +此时,虽然两个用户已经创建,但是他们不具有任何权限,因此他们并不能对数据库进行操作,例如我们使用 ln_write_user 用户对数据库中的数据进行写入,SQL 语句为: + +``` +INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +``` +此时,系统不允许用户进行此操作,会提示错误: + +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. +``` + +现在,我们用root用户分别赋予他们向对应 database 数据的写入权限. + +我们使用 `GRANT USER PRIVILEGES ON ` 语句赋予用户权限(注:其中,创建用户权限无需指定路径),例如: + +``` +GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +GRANT USER `ln_write_user` PRIVILEGES CREATE_USER +``` +执行状态如下所示: + +``` +IoTDB> GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +Msg: The statement is executed successfully. +IoTDB> GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +Msg: The statement is executed successfully. +IoTDB> GRANT USER `ln_write_user` PRIVILEGES CREATE_USER +Msg: The statement is executed successfully. 
+``` + +接着使用ln_write_user再尝试写入数据 +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: The statement is executed successfully. +``` + +### 撤销用户权限 + +授予用户权限后,我们可以使用 `REVOKE USER PRIVILEGES ON ` 来撤销已授予的用户权限(注:其中,撤销创建用户权限无需指定路径)。例如,用root用户撤销ln_write_user和sgcc_write_user的权限: + +``` +REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER +``` + +执行状态如下所示: + +``` +REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** +Msg: The statement is executed successfully. +REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** +Msg: The statement is executed successfully. +REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER +Msg: The statement is executed successfully. +``` + +撤销权限后,ln_write_user就没有向root.ln.**写入数据的权限了。 +``` +INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) +Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. +``` + +### SQL 语句 + +与权限相关的语句包括: + +* 创建用户 + +``` +CREATE USER ; +Eg: IoTDB > CREATE USER `thulab` 'passwd'; +``` + +* 删除用户 + +``` +DROP USER ; +Eg: IoTDB > DROP USER `xiaoming`; +``` + +* 创建角色 + +``` +CREATE ROLE ; +Eg: IoTDB > CREATE ROLE `admin`; +``` + +* 删除角色 + +``` +DROP ROLE ; +Eg: IoTDB > DROP ROLE `admin`; +``` + +* 赋予用户权限 + +``` +GRANT USER PRIVILEGES ON ; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES on root.ln.**, root.sgcc.**; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES CREATE_ROLE; +``` + +- 赋予用户全部的权限 + +``` +GRANT USER PRIVILEGES ALL; +Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES ALL; +``` + +* 赋予角色权限 + +``` +GRANT ROLE PRIVILEGES ON ; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES ON root.sgcc.**, root.ln.**; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES CREATE_ROLE; +``` + +- 赋予角色全部的权限 + +``` +GRANT ROLE PRIVILEGES ALL; +Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES ALL; +``` + +* 赋予用户角色 + +``` +GRANT TO ; +Eg: IoTDB > GRANT `temprole` TO tempuser; +``` + +* 撤销用户权限 + +``` +REVOKE USER PRIVILEGES ON ; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES DELETE_TIMESERIES on root.ln.**; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES CREATE_ROLE; +``` + +- 移除用户所有权限 + +``` +REVOKE USER PRIVILEGES ALL; +Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES ALL; +``` + +* 撤销角色权限 + +``` +REVOKE ROLE PRIVILEGES ON ; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES DELETE_TIMESERIES ON root.ln.**; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES CREATE_ROLE; +``` + +- 撤销角色全部的权限 + +``` +REVOKE ROLE PRIVILEGES ALL; +Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES ALL; +``` + +* 撤销用户角色 + +``` +REVOKE FROM ; +Eg: IoTDB > REVOKE `temprole` FROM tempuser; +``` + +* 列出所有用户 + +``` +LIST USER +Eg: IoTDB > LIST USER +``` + +* 列出指定角色下所有用户 + +``` +LIST USER OF ROLE ; +Eg: IoTDB > LIST USER OF ROLE `roleuser`; +``` + +* 列出所有角色 + +``` +LIST ROLE +Eg: IoTDB > LIST ROLE +``` + +* 列出指定用户下所有角色 + +``` +LIST ROLE OF USER ; +Eg: IoTDB > LIST ROLE OF USER `tempuser`; +``` + +* 列出用户所有权限 + +``` +LIST PRIVILEGES USER ; +Eg: IoTDB > LIST PRIVILEGES USER `tempuser`; +``` + +* 列出用户在具体路径上相关联的权限 + +``` +LIST PRIVILEGES USER ON ; +Eg: IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.**, root.ln.wf01.**; ++--------+-----------------------------------+ +| role| privilege| ++--------+-----------------------------------+ +| | root.ln.** : 
ALTER_TIMESERIES| +|temprole|root.ln.wf01.** : CREATE_TIMESERIES| ++--------+-----------------------------------+ +Total line number = 2 +It costs 0.005s +IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.wf01.wt01.**; ++--------+-----------------------------------+ +| role| privilege| ++--------+-----------------------------------+ +| | root.ln.** : ALTER_TIMESERIES| +|temprole|root.ln.wf01.** : CREATE_TIMESERIES| ++--------+-----------------------------------+ +Total line number = 2 +It costs 0.005s +``` + +* 列出角色所有权限 + +``` +LIST PRIVILEGES ROLE ; +Eg: IoTDB > LIST PRIVILEGES ROLE `actor`; +``` + +* 列出角色在具体路径上相关联的权限 + +``` +LIST PRIVILEGES ROLE ON ; +Eg: IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.**, root.ln.wf01.wt01.**; ++-----------------------------------+ +| privilege| ++-----------------------------------+ +|root.ln.wf01.** : CREATE_TIMESERIES| ++-----------------------------------+ +Total line number = 1 +It costs 0.005s +IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.wf01.wt01.**; ++-----------------------------------+ +| privilege| ++-----------------------------------+ +|root.ln.wf01.** : CREATE_TIMESERIES| ++-----------------------------------+ +Total line number = 1 +It costs 0.005s +``` + +* 更新密码 + +``` +ALTER USER SET PASSWORD ; +Eg: IoTDB > ALTER USER `tempuser` SET PASSWORD 'newpwd'; +``` + + +## 其他说明 + +### 用户、权限与角色的关系 + +角色是权限的集合,而权限和角色都是用户的一种属性。即一个角色可以拥有若干权限。一个用户可以拥有若干角色与权限(称为用户自身权限)。 + +目前在 IoTDB 中并不存在相互冲突的权限,因此一个用户真正具有的权限是用户自身权限与其所有的角色的权限的并集。即要判定用户是否能执行某一项操作,就要看用户自身权限或用户的角色的所有权限中是否有一条允许了该操作。用户自身权限与其角色权限,他的多个角色的权限之间可能存在相同的权限,但这并不会产生影响。 + +需要注意的是:如果一个用户自身有某种权限(对应操作 A),而他的某个角色有相同的权限。那么如果仅从该用户撤销该权限无法达到禁止该用户执行操作 A 的目的,还需要从这个角色中也撤销对应的权限,或者从这个用户将该角色撤销。同样,如果仅从上述角色将权限撤销,也不能禁止该用户执行操作 A。 + +同时,对角色的修改会立即反映到所有拥有该角色的用户上,例如对角色增加某种权限将立即使所有拥有该角色的用户都拥有对应权限,删除某种权限也将使对应用户失去该权限(除非用户本身有该权限)。 + +### 系统所含权限列表 + +**系统所含权限列表** + +| 权限名称 | 说明 | 示例 | +|:--------------------------|:----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| CREATE\_DATABASE | 创建 database。包含设置 database 的权限和TTL。路径相关 | Eg1: `CREATE DATABASE root.ln;`
Eg2:`set ttl to root.ln 3600000;`
Eg3:`unset ttl to root.ln;` | +| DELETE\_DATABASE | 删除 database。路径相关 | Eg: `delete database root.ln;` | +| CREATE\_TIMESERIES | 创建时间序列。路径相关 | Eg1: 创建时间序列
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: 创建对齐时间序列
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | +| INSERT\_TIMESERIES | 插入数据。路径相关 | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | +| ALTER\_TIMESERIES | 修改时间序列标签。路径相关 | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | +| READ\_TIMESERIES | 查询数据。路径相关 | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [数据查询](../Query-Data/Overview.md)(这一节之下的查询语句均使用该权限)
Eg8: CVS格式数据导出
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: 查询性能追踪
`tracing select * from root.**`
Eg10: UDF查询
`select example(*) from root.sg.d1`
Eg11: 查询触发器
`show triggers`
Eg12: 统计查询
`count devices` | +| DELETE\_TIMESERIES | 删除数据或时间序列。路径相关 | Eg1: 删除时间序列
`delete timeseries root.ln.wf01.wt01.status`
Eg2: 删除数据
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: 使用DROP关键字
`drop timeseries root.ln.wf01.wt01.status` | +| CREATE\_USER | 创建用户。路径无关 | Eg: `create user thulab 'passwd';` | +| DELETE\_USER | 删除用户。路径无关 | Eg: `drop user xiaoming;` | +| MODIFY\_PASSWORD | 修改所有用户的密码。路径无关。(没有该权限者仍然能够修改自己的密码。) | Eg: `alter user tempuser SET PASSWORD 'newpwd';` | +| LIST\_USER | 列出所有用户,列出具有某角色的所有用户,列出用户在指定路径下相关权限。路径无关 | Eg1: `list user;`
Eg2: `list user of role 'wirte_role';`
Eg3: `list privileges user admin;`
Eg4: `list privileges user 'admin' on root.sgcc.**;` | +| GRANT\_USER\_PRIVILEGE | 赋予用户权限。路径无关 | Eg: `grant user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | +| REVOKE\_USER\_PRIVILEGE | 撤销用户权限。路径无关 | Eg: `revoke user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | +| GRANT\_USER\_ROLE | 赋予用户角色。路径无关 | Eg: `grant temprole to tempuser;` | +| REVOKE\_USER\_ROLE | 撤销用户角色。路径无关 | Eg: `revoke temprole from tempuser;` | +| CREATE\_ROLE | 创建角色。路径无关 | Eg: `create role admin;` | +| DELETE\_ROLE | 删除角色。路径无关 | Eg: `drop role admin;` | +| LIST\_ROLE | 列出所有角色,列出某用户下所有角色,列出角色在指定路径下相关权限。路径无关 | Eg1: `list role`
Eg2: `list role of user 'actor';`
Eg3: `list privileges role wirte_role;`
Eg4: `list privileges role wirte_role ON root.sgcc;` | +| GRANT\_ROLE\_PRIVILEGE | 赋予角色权限。路径无关 | Eg: `grant role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | +| REVOKE\_ROLE\_PRIVILEGE | 撤销角色权限。路径无关 | Eg: `revoke role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | +| CREATE_FUNCTION | 注册 UDF。路径无关 | Eg: `create function example AS 'org.apache.iotdb.udf.UDTFExample';` | +| DROP_FUNCTION | 卸载 UDF。路径无关 | Eg: `drop function example` | +| CREATE_TRIGGER | 创建触发器。路径相关 | Eg1: `CREATE TRIGGER BEFORE INSERT ON AS `
Eg2: `CREATE TRIGGER AFTER INSERT ON AS ` | +| DROP_TRIGGER | 卸载触发器。路径相关 | Eg: `drop trigger 'alert-listener-sg1d1s1'` | +| CREATE_CONTINUOUS_QUERY | 创建连续查询。路径无关 | Eg: `CREATE CONTINUOUS QUERY cq1 RESAMPLE RANGE 40s BEGIN END` | +| DROP_CONTINUOUS_QUERY | 卸载连续查询。路径无关 | Eg1: `DROP CONTINUOUS QUERY cq3`
Eg2: `DROP CQ cq3` | +| SHOW_CONTINUOUS_QUERIES | 展示所有连续查询。路径无关 | Eg1: `SHOW CONTINUOUS QUERIES`
Eg2: `SHOW cqs` | +| UPDATE_TEMPLATE | 创建、删除模板。路径无关。 | Eg1: `create schema template t1(s1 int32)`
Eg2: `drop schema template t1` | +| READ_TEMPLATE | 查看所有模板、模板内容。 路径无关 | Eg1: `show schema templates`
Eg2: `show nodes in template t1` | +| APPLY_TEMPLATE | 挂载、卸载、激活、解除模板。路径有关。 | Eg1: `set schema template t1 to root.sg.d`
Eg2: `unset schema template t1 from root.sg.d`
Eg3: `create timeseries of schema template on root.sg.d`
Eg4: `delete timeseries of schema template on root.sg.d` | +| READ_TEMPLATE_APPLICATION | 查看模板的挂载路径和激活路径。路径无关 | Eg1: `show paths set schema template t1`
Eg2: `show paths using schema template t1` | + +注意: 路径无关的权限只能在路径root.**下赋予或撤销; + +注意: 下述sql语句需要赋予多个权限才可以使用: + +- 导入数据,需要赋予`READ_TIMESERIES`,`INSERT_TIMESERIES`两种权限。 + +``` +Eg: IoTDB > ./import-csv.bat -h 127.0.0.1 -p 6667 -u renyuhua -pw root -f dump0.csv +``` + +- 查询写回(SELECT_INTO) + - 需要所有 `select` 子句中源序列的 `READ_TIMESERIES` 权限 + - 需要所有 `into` 子句中目标序列 `INSERT_TIMESERIES` 权限 + +``` +Eg: IoTDB > select s1, s1 into t1, t2 from root.sg.d1 limit 5 offset 1000 +``` + +### 用户名限制 + +IoTDB 规定用户名的字符长度不小于 4,其中用户名不能包含空格。 + +### 密码限制 + +IoTDB 规定密码的字符长度不小于 4,其中密码不能包含空格,密码默认采用 MD5 进行加密。 + +### 角色名限制 + +IoTDB 规定角色名的字符长度不小于 4,其中角色名不能包含空格。 + +### 权限管理中的路径模式 + +一个路径模式的结果集包含了它的子模式的结果集的所有元素。例如,`root.sg.d.*`是`root.sg.*.*`的子模式,而`root.sg.**`不是`root.sg.*.*`的子模式。当用户被授予对某个路径模式的权限时,在他的DDL或DML中使用的模式必须是该路径模式的子模式,这保证了用户访问时间序列时不会超出他的权限范围。 + +### 权限缓存 + +在分布式相关的权限操作中,在进行除了创建用户和角色之外的其他权限更改操作时,都会先清除与该用户(角色)相关的所有的`dataNode`的缓存信息,如果任何一台`dataNode`缓存信息清楚失败,这个权限更改的任务就会失败。 + +### 非root用户限制进行的操作 + +目前以下IoTDB支持的sql语句只有`root`用户可以进行操作,且没有对应的权限可以赋予新用户。 + +#### TsFile管理 + +- 加载TsFile + +``` +Eg: IoTDB > load '/Users/Desktop/data/1575028885956-101-0.tsfile' +``` + +- 删除TsFile文件 + +``` +Eg: IoTDB > remove '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' +``` + +- 卸载TsFile文件到指定目录 + +``` +Eg: IoTDB > unload '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' '/data/data/tmp' +``` + +#### 删除时间分区(实验性功能) + +- 删除时间分区(实验性功能) + +``` +Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 +``` + +#### 连续查询 + +- 连续查询(CQ) + +``` +Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END +``` + +#### 运维命令 + +- FLUSH + +``` +Eg: IoTDB > flush +``` + +- MERGE + +``` +Eg: IoTDB > MERGE +Eg: IoTDB > FULL MERGE +``` + +- CLEAR CACHE + +```sql +Eg: IoTDB > CLEAR CACHE +``` + +- START REPAIR DATA + +```sql +Eg: IoTDB > START REPAIR DATA +``` + +- STOP REPAIR DATA + +```sql +Eg: IoTDB > STOP REPAIR DATA +``` + +- SET SYSTEM TO READONLY / WRITABLE + +``` +Eg: IoTDB > SET SYSTEM TO READONLY / WRITABLE +``` + +- 查询终止 + +``` +Eg: IoTDB > KILL QUERY 1 +``` + +#### 水印工具 + +- 为新用户施加水印 + +``` +Eg: IoTDB > grant watermark_embedding to Alice +``` + +- 撤销水印 + +``` +Eg: IoTDB > revoke watermark_embedding from Alice +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Architecture.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Architecture.md new file mode 100644 index 00000000..dc66884d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Architecture.md @@ -0,0 +1,44 @@ + + +# 系统架构 + +IoTDB 套件由若干个组件构成,共同形成“数据收集-数据写入-数据存储-数据查询-数据可视化-数据分析”等一系列功能。 + +如下图展示了使用 IoTDB 套件全部组件后形成的整体应用架构。下文称所有组件形成 IoTDB 套件,而 IoTDB 特指其中的时间序列数据库组件。 + + + +在上图中,用户可以通过 JDBC 将来自设备上传感器采集的时序数据、服务器负载和 CPU 内存等系统状态数据、消息队列中的时序数据、应用程序的时序数据或者其他数据库中的时序数据导入到本地或者远程的 IoTDB 中。用户还可以将上述数据直接写成本地(或位于 HDFS 上)的 TsFile 文件。 + +可以将 TsFile 文件写入到 HDFS 上,进而实现在 Hadoop 或 Spark 的数据处理平台上的诸如异常检测、机器学习等数据处理任务。 + +对于写入到 HDFS 或者本地的 TsFile 文件,可以利用 TsFile-Hadoop 或 TsFile-Spark 连接器允许 Hadoop 或 Spark 进行数据处理。 + +对于分析的结果,可以写回成 TsFile 文件。 + +IoTDB 和 TsFile 还提供了相应的客户端工具,满足用户查看和写入数据的 SQL 形式、脚本形式和图形化形式等多种需求。 + +IoTDB 提供了单机部署和集群部署两种模式。在集群部署模式下,IoTDB支持自动故障转移,确保系统在节点故障时能够快速切换到备用节点。切换时间可以达到秒级,从而最大限度地减少系统中断时间,且可保证切换后数据不丢失。当故障节点恢复正常,系统会自动将其重新纳入集群,确保集群的高可用性和可伸缩性。 + +IoTDB还支持读写分离模式部署,可以将读操作和写操作分别分配给不同的节点,从而实现负载均衡和提高系统的并发处理能力。 + +通过这些特性,IoTDB能够避免单点性能瓶颈和单点故障(SPOF),提供高可用性和可靠性的数据存储和管理解决方案。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment.md new file mode 100644 index 
00000000..ac4f213a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment.md @@ -0,0 +1,613 @@ + + +# 集群版部署 +## 集群版部署 +以本地环境为例,演示 IoTDB 集群的启动、扩容与缩容。 + +**注意:本文档为使用本地不同端口,进行伪分布式环境部署的教程,仅用于练习。在真实环境部署时,一般不需要修改节点端口,仅需配置节点`IP地址`或者`hostname(机器名/域名)`即可。** + +### 1. 准备启动环境 + +解压 apache-iotdb-1.3.0-all-bin.zip 至 cluster0 目录。 + +### 2. 启动最小集群 + +在 Linux 环境中,部署 1 个 ConfigNode 和 1 个 DataNode(1C1D)集群版,默认 1 副本: + +``` +./cluster0/sbin/start-confignode.sh +./cluster0/sbin/start-datanode.sh +``` + +### 3. 验证最小集群 + ++ 最小集群启动成功,启动 Cli 进行验证: + +``` +./cluster0/sbin/start-cli.sh +``` + ++ 在 Cli 执行 `show cluster details` + 指令,结果如下所示: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+--------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort |SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+--------+-------------------+-----------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | | +| 1| DataNode|Running| 127.0.0.1| 10730| | 127.0.0.1| 6667| 10740| 10750| 10760| ++------+----------+-------+---------------+------------+-------------------+----------+-------+--------+-------------------+-----------------+ +Total line number = 2 +It costs 0.242s +``` + +### 4. 准备扩容环境 + +解压 apache-iotdb-1.3.0-all-bin.zip 至 cluster1 目录和 cluster2 目录 + +### 5. 修改节点配置文件 + +对于 cluster1 目录: + ++ 修改 ConfigNode 配置: + +| **配置项** | **值** | +| ------------------------------ | --------------- | +| cn_internal_address | 127.0.0.1 | +| cn_internal_port | 10711 | +| cn_consensus_port | 10721 | +| cn_seed_config_node | 127.0.0.1:10710 | + ++ 修改 DataNode 配置: + +| **配置项** | **值** | +| ----------------------------------- | --------------- | +| dn_rpc_address | 127.0.0.1 | +| dn_rpc_port | 6668 | +| dn_internal_address | 127.0.0.1 | +| dn_internal_port | 10731 | +| dn_mpp_data_exchange_port | 10741 | +| dn_schema_region_consensus_port | 10751 | +| dn_data_region_consensus_port | 10761 | +| dn_seed_config_node | 127.0.0.1:10710 | + +对于 cluster2 目录: + ++ 修改 ConfigNode 配置: + +| **配置项** | **值** | +| ------------------------------ | --------------- | +| cn_internal_address | 127.0.0.1 | +| cn_internal_port | 10712 | +| cn_consensus_port | 10722 | +| cn_seed_config_node | 127.0.0.1:10710 | + ++ 修改 DataNode 配置: + +| **配置项** | **值** | +| ----------------------------------- | --------------- | +| dn_rpc_address | 127.0.0.1 | +| dn_rpc_port | 6669 | +| dn_internal_address | 127.0.0.1 | +| dn_internal_port | 10732 | +| dn_mpp_data_exchange_port | 10742 | +| dn_schema_region_consensus_port | 10752 | +| dn_data_region_consensus_port | 10762 | +| dn_seed_config_node | 127.0.0.1:10710 | + +### 6. 集群扩容 + +将集群扩容至 3 个 ConfigNode 和 3 个 DataNode(3C3D)集群版, +指令执行顺序为先启动 ConfigNode,再启动 DataNode: + +``` +./cluster1/sbin/start-confignode.sh +./cluster2/sbin/start-confignode.sh +./cluster1/sbin/start-datanode.sh +./cluster2/sbin/start-datanode.sh +``` + +### 7. 
验证扩容结果 + +在 Cli 执行 `show cluster details`,结果如下: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | | +| 2|ConfigNode|Running| 127.0.0.1| 10711| 10721| | | | | | +| 3|ConfigNode|Running| 127.0.0.1| 10712| 10722| | | | | | +| 1| DataNode|Running| 127.0.0.1| 10730| | 127.0.0.1| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| 127.0.0.1| 10731| | 127.0.0.1| 6668| 10741| 10751| 10761| +| 5| DataNode|Running| 127.0.0.1| 10732| | 127.0.0.1| 6669| 10742| 10752| 10762| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.012s +``` + +### 8. 集群缩容 + ++ 缩容一个 ConfigNode: + +``` +# 使用 ip:port 移除 +./cluster0/sbin/remove-confignode.sh 127.0.0.1:10711 + +# 使用节点编号移除 +./cluster0/sbin/remove-confignode.sh 2 +``` + ++ 缩容一个 DataNode: + +``` +# 使用 ip:port 移除 +./cluster0/sbin/remove-datanode.sh 127.0.0.1:6668 + +# 使用节点编号移除 +./cluster0/sbin/remove-confignode.sh 4 +``` + +### 9. 验证缩容结果 + +在 Cli 执行 `show cluster details`,结果如下: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | | +| 3|ConfigNode|Running| 127.0.0.1| 10712| 10722| | | | | | +| 1| DataNode|Running| 127.0.0.1| 10730| | 127.0.0.1| 6667| 10740| 10750| 10760| +| 5| DataNode|Running| 127.0.0.1| 10732| | 127.0.0.1| 6669| 10742| 10752| 10762| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +Total line number = 4 +It costs 0.005s +``` + +## 手动部署 + +### 前置检查 + +1. JDK>=1.8 的运行环境,并配置好 JAVA_HOME 环境变量。 +2. 设置最大文件打开数为 65535。 +3. 关闭交换内存。 +4. 首次启动ConfigNode节点时,确保已清空ConfigNode节点的data/confignode目录;首次启动DataNode节点时,确保已清空DataNode节点的data/datanode目录。 +5. 如果整个集群处在可信环境下,可以关闭机器上的防火墙选项。 +6. 在集群默认配置中,ConfigNode 会占用端口 10710 和 10720,DataNode 会占用端口 6667、10730、10740、10750 和 10760, + 请确保这些端口未被占用,或者手动修改配置文件中的端口配置。 + +### 安装包获取 + +你可以选择下载二进制文件或从源代码编译。 + +#### 下载二进制文件 + +1. 打开官网[Download Page](https://iotdb.apache.org/Download/)。 +2. 下载 IoTDB 1.3.0 版本的二进制文件。 +3. 解压得到 apache-iotdb-1.3.0-all-bin 目录。 + +#### 使用源码编译 + +##### 下载源码 + +**Git** + +``` +git clone https://github.com/apache/iotdb.git +git checkout v1.3.0 +``` + +**官网下载** + +1. 打开官网[Download Page](https://iotdb.apache.org/Download/)。 +2. 下载 IoTDB 1.3.0 版本的源码。 +3. 
解压得到 apache-iotdb-1.3.0 目录。 + +##### 编译源码 + +在 IoTDB 源码根目录下: + +``` +mvn clean package -pl distribution -am -DskipTests +``` + +编译成功后,可在目录 +**distribution/target/apache-iotdb-1.3.0-all-bin/apache-iotdb-1.3.0-all-bin** +找到集群版本的二进制文件。 + +### 安装包说明 + +打开 apache-iotdb-1.3.0-all-bin,可见以下目录: + +| **目录** | **说明** | +| -------- | ------------------------------------------------------------ | +| conf | 配置文件目录,包含 ConfigNode、DataNode、JMX 和 logback 等配置文件 | +| data | 数据文件目录,包含 ConfigNode 和 DataNode 的数据文件 | +| lib | 库文件目录 | +| licenses | 证书文件目录 | +| logs | 日志文件目录,包含 ConfigNode 和 DataNode 的日志文件 | +| sbin | 脚本目录,包含 ConfigNode 和 DataNode 的启停移除脚本,以及 Cli 的启动脚本等 | +| tools | 系统工具目录 | + +### 集群安装配置 + +#### 集群安装 + +`apache-iotdb-1.3.0-all-bin` 包含 ConfigNode 和 DataNode, +请将安装包部署于你目标集群的所有机器上,推荐将安装包部署于所有服务器的相同目录下。 + +如果你希望先在一台服务器上尝试部署 IoTDB 集群,请参考 +[Cluster Quick Start](../QuickStart/ClusterQuickStart.md)。 + +#### 集群配置 + +接下来需要修改每个服务器上的配置文件,登录服务器, +并将工作路径切换至 `apache-iotdb-1.3.0-all-bin`, +配置文件在 `./conf` 目录内。 + +对于所有部署 ConfigNode 的服务器,需要修改[通用配置](../Reference/Common-Config-Manual.md)和 [ConfigNode 配置](../Reference/ConfigNode-Config-Manual.md)。 + +对于所有部署 DataNode 的服务器,需要修改[通用配置](../Reference/Common-Config-Manual.md)和 [DataNode 配置](../Reference/DataNode-Config-Manual.md)。 + +##### 通用配置 + +打开通用配置文件 ./conf/iotdb-system.properties, +可根据 [部署推荐](./Deployment-Recommendation.md) +设置以下参数: + +| **配置项** | **说明** | **默认** | +| ------------------------------------------ | ------------------------------------------------------------ | ----------------------------------------------- | +| cluster_name | 节点希望加入的集群的名称 | defaultCluster | +| config_node_consensus_protocol_class | ConfigNode 使用的共识协议 | org.apache.iotdb.consensus.ratis.RatisConsensus | +| schema_replication_factor | 元数据副本数,DataNode 数量不应少于此数目 | 1 | +| schema_region_consensus_protocol_class | 元数据副本组的共识协议 | org.apache.iotdb.consensus.ratis.RatisConsensus | +| data_replication_factor | 数据副本数,DataNode 数量不应少于此数目 | 1 | +| data_region_consensus_protocol_class | 数据副本组的共识协议。注:RatisConsensus 目前不支持多数据目录 | org.apache.iotdb.consensus.iot.IoTConsensus | + +**注意:上述配置项在集群启动后即不可更改,且务必保证所有节点的通用配置完全一致,否则节点无法启动。** + +##### ConfigNode 配置 + +打开 ConfigNode 配置文件 ./conf/iotdb-system.properties,根据服务器/虚拟机的 IP 地址和可用端口,设置以下参数: + +| **配置项** | **说明** | **默认** | **用法** | +| ------------------------------ | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | +| cn_internal_address | ConfigNode 在集群内部通讯使用的地址 | 127.0.0.1 | 设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| cn_internal_port | ConfigNode 在集群内部通讯使用的端口 | 10710 | 设置为任意未占用端口 | +| cn_consensus_port | ConfigNode 副本组共识协议通信使用的端口 | 10720 | 设置为任意未占用端口 | +| cn_seed_config_node | 节点注册加入集群时连接的 ConfigNode 的地址。注:只能配置一个 | 127.0.0.1:10710 | 对于 Seed-ConfigNode,设置为自己的 cn_internal_address:cn_internal_port;对于其它 ConfigNode,设置为另一个正在运行的 ConfigNode 的 cn_internal_address:cn_internal_port | + +**注意:上述配置项在节点启动后即不可更改,且务必保证所有端口均未被占用,否则节点无法启动。** + +##### DataNode 配置 + +打开 DataNode 配置文件 ./conf/iotdb-system.properties,根据服务器/虚拟机的 IP 地址和可用端口,设置以下参数: + +| **配置项** | **说明** | **默认** | **用法** | +| ----------------------------------- | ----------------------------------------- | --------------- | ------------------------------------------------------------ | +| dn_rpc_address | 客户端 RPC 服务的地址 | 127.0.0.1 | 设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 设置为任意未占用端口 | +| dn_internal_address | DataNode 在集群内部接收控制流使用的地址 | 127.0.0.1 | 设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| 
dn_internal_port | DataNode 在集群内部接收控制流使用的端口 | 10730 | 设置为任意未占用端口 | +| dn_mpp_data_exchange_port | DataNode 在集群内部接收数据流使用的端口 | 10740 | 设置为任意未占用端口 | +| dn_data_region_consensus_port | DataNode 的数据副本间共识协议通信的端口 | 10750 | 设置为任意未占用端口 | +| dn_schema_region_consensus_port | DataNode 的元数据副本间共识协议通信的端口 | 10760 | 设置为任意未占用端口 | +| dn_seed_config_node | 集群中正在运行的 ConfigNode 地址 | 127.0.0.1:10710 | 设置为任意正在运行的 ConfigNode 的 cn_internal_address:cn_internal_port,可设置多个,用逗号(",")隔开 | + +**注意:上述配置项在节点启动后即不可更改,且务必保证所有端口均未被占用,否则节点无法启动。** + +### 集群操作 + +#### 启动集群 + +本小节描述如何启动包括若干 ConfigNode 和 DataNode 的集群。 +集群可以提供服务的标准是至少启动一个 ConfigNode 且启动 不小于(数据/元数据)副本个数 的 DataNode。 + +总体启动流程分为三步: + +1. 启动种子 ConfigNode +2. 增加 ConfigNode(可选) +3. 增加 DataNode + +##### 启动 Seed-ConfigNode + +**集群第一个启动的节点必须是 ConfigNode,第一个启动的 ConfigNode 必须遵循本小节教程。** + +第一个启动的 ConfigNode 是 Seed-ConfigNode,标志着新集群的创建。 +在启动 Seed-ConfigNode 前,请打开通用配置文件 ./conf/iotdb-system.properties,并检查如下参数: + +| **配置项** | **检查** | +| ------------------------------------------ | -------------------------- | +| cluster_name | 已设置为期望的集群名称 | +| config_node_consensus_protocol_class | 已设置为期望的共识协议 | +| schema_replication_factor | 已设置为期望的元数据副本数 | +| schema_region_consensus_protocol_class | 已设置为期望的共识协议 | +| data_replication_factor | 已设置为期望的数据副本数 | +| data_region_consensus_protocol_class | 已设置为期望的共识协议 | + +**注意:** 请根据[部署推荐](./Deployment-Recommendation.md)配置合适的通用参数,这些参数在首次配置后即不可修改。 + +接着请打开它的配置文件 ./conf/iotdb-system.properties,并检查如下参数: + +| **配置项** | **检查** | +| ------------------------------ | ------------------------------------------------------------ | +| cn_internal_address | 已设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| cn_internal_port | 该端口未被占用 | +| cn_consensus_port | 该端口未被占用 | +| cn_seed_config_node | 已设置为自己的内部通讯地址,即 cn_internal_address:cn_internal_port | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-confignode.sh + +# Linux 后台启动 +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +ConfigNode 的其它配置参数可参考 +[ConfigNode 配置参数](../Reference/ConfigNode-Config-Manual.md)。 + +##### 增加更多 ConfigNode(可选) + +**只要不是第一个启动的 ConfigNode 就必须遵循本小节教程。** + +可向集群添加更多 ConfigNode,以保证 ConfigNode 的高可用。常用的配置为额外增加两个 ConfigNode,使集群共有三个 ConfigNode。 + +新增的 ConfigNode 需要保证 ./conf/iotdb-common.properites 中的所有配置参数与 Seed-ConfigNode 完全一致,否则可能启动失败或产生运行时错误。 +因此,请着重检查通用配置文件中的以下参数: + +| **配置项** | **检查** | +| ------------------------------------------ | --------------------------- | +| cluster_name | 与 Seed-ConfigNode 保持一致 | +| config_node_consensus_protocol_class | 与 Seed-ConfigNode 保持一致 | +| schema_replication_factor | 与 Seed-ConfigNode 保持一致 | +| schema_region_consensus_protocol_class | 与 Seed-ConfigNode 保持一致 | +| data_replication_factor | 与 Seed-ConfigNode 保持一致 | +| data_region_consensus_protocol_class | 与 Seed-ConfigNode 保持一致 | + +接着请打开它的配置文件 ./conf/iotdb-system.properties,并检查以下参数: + +| **配置项** | **检查** | +| ------------------------------ | ------------------------------------------------------------ | +| cn_internal_address | 已设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| cn_internal_port | 该端口未被占用 | +| cn_consensus_port | 该端口未被占用 | +| cn_seed_config_node | 已设置为另一个正在运行的 ConfigNode 的内部通讯地址,推荐使用 Seed-ConfigNode 的内部通讯地址 | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-confignode.sh + +# Linux 后台启动 +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +ConfigNode 的其它配置参数可参考 +[ConfigNode 配置参数](../Reference/ConfigNode-Config-Manual.md)。 + +##### 增加 DataNode + +**确保集群已有正在运行的 ConfigNode 后,才能开始增加 
DataNode。** + +可以向集群中添加任意个 DataNode。 +在添加新的 DataNode 前,请先打开通用配置文件 ./conf/iotdb-system.properties 并检查以下参数: + +| **配置项** | **检查** | +| ------------- | --------------------------- | +| cluster_name | 与 Seed-ConfigNode 保持一致 | + +接着打开它的配置文件 ./conf/iotdb-system.properties 并检查以下参数: + +| **配置项** | **检查** | +| ----------------------------------- | ------------------------------------------------------------ | +| dn_rpc_address | 已设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| dn_rpc_port | 该端口未被占用 | +| dn_internal_address | 已设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| dn_internal_port | 该端口未被占用 | +| dn_mpp_data_exchange_port | 该端口未被占用 | +| dn_data_region_consensus_port | 该端口未被占用 | +| dn_schema_region_consensus_port | 该端口未被占用 | +| dn_seed_config_node | 已设置为正在运行的 ConfigNode 的内部通讯地址,推荐使用 Seed-ConfigNode 的内部通讯地址 | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-datanode.sh + +# Linux 后台启动 +nohup bash ./sbin/start-datanode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-datanode.bat +``` + +DataNode 的其它配置参数可参考 +[DataNode配置参数](../Reference/DataNode-Config-Manual.md) 。 + +**注意:当且仅当集群拥有不少于副本个数(max{schema_replication_factor, data_replication_factor})的 DataNode 后,集群才可以提供服务** + +#### 启动 Cli + +若搭建的集群仅用于本地调试,可直接执行 ./sbin 目录下的 Cli 启动脚本: + +``` +# Linux +./sbin/start-cli.sh + +# Windows +.\sbin\start-cli.bat +``` + +若希望通过 Cli 连接生产环境的集群, +请阅读 [Cli 使用手册](../Tools-System/CLI.md)。 + +#### 验证集群 + +以在6台服务器上启动的3C3D(3个ConfigNode 和 3个DataNode)集群为例, +这里假设3个ConfigNode依次为iotdb-1(192.168.1.10)、iotdb-2(192.168.1.11)、iotdb-3(192.168.1.12),且3个ConfigNode启动时均使用了默认的端口10710与10720; +3个DataNode依次为iotdb-4(192.168.1.20)、iotdb-5(192.168.1.21)、iotdb-6(192.168.1.22),且3个DataNode启动时均使用了默认的端口6667、10730、10740、10750与10760。 + +我们为这六台地址全部设置 hostname +```shell +echo "192.168.132.10 iotdb-1" >> /etc/hosts +echo "192.168.132.11 iotdb-2" >> /etc/hosts +echo "192.168.132.12 iotdb-3" >> /etc/hosts +echo "192.168.132.20 iotdb-4" >> /etc/hosts +echo "192.168.132.21 iotdb-5" >> /etc/hosts +echo "192.168.132.22 iotdb-6" >> /etc/hosts +``` + +当按照6.1步骤成功启动集群后,在 Cli 执行 `show cluster details`,看到的结果应当如下: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| iotdb-1 | 10710| 10720| | | | | | +| 2|ConfigNode|Running| iotdb-2 | 10710| 10720| | | | | | +| 3|ConfigNode|Running| iotdb-3 | 10710| 10720| | | | | | +| 1| DataNode|Running| iotdb-4 | 10730| | iotdb-4| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| iotdb-5 | 10730| | iotdb-5| 6667| 10740| 10750| 10760| +| 5| DataNode|Running| iotdb-6 | 10730| | iotdb-6| 6667| 10740| 10750| 10760| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.012s +``` + +若所有节点的状态均为 **Running**,则说明集群部署成功; +否则,请阅读启动失败节点的运行日志,并检查对应的配置参数。 + +#### 停止 IoTDB 进程 + +本小节描述如何手动关闭 IoTDB 的 ConfigNode 或 DataNode 进程。 + +##### 使用脚本停止 ConfigNode + +执行停止 ConfigNode 脚本: + +``` +# Linux +./sbin/stop-confignode.sh + +# Windows +.\sbin\stop-confignode.bat +``` + +##### 使用脚本停止 DataNode + +执行停止 DataNode 脚本: + +``` +# Linux +./sbin/stop-datanode.sh + +# Windows +.\sbin\stop-datanode.bat +``` + +##### 
停止节点进程 + +首先获取节点的进程号: + +``` +jps + +# 或 + +ps aux | grep iotdb +``` + +结束进程: + +``` +kill -9 +``` + +**注意:有些端口的信息需要 root 权限才能获取,在此情况下请使用 sudo** + +#### 集群缩容 + +本小节描述如何将 ConfigNode 或 DataNode 移出集群。 + +##### 移除 ConfigNode + +在移除 ConfigNode 前,请确保移除后集群至少还有一个活跃的 ConfigNode。 +在活跃的 ConfigNode 上执行 remove-confignode 脚本: + +``` +# Linux +## 根据 confignode_id 移除节点 +./sbin/remove-confignode.sh + +## 根据 ConfigNode 内部通讯地址和端口移除节点 +./sbin/remove-confignode.sh : + + +# Windows +## 根据 confignode_id 移除节点 +.\sbin\remove-confignode.bat + +## 根据 ConfigNode 内部通讯地址和端口移除节点 +.\sbin\remove-confignode.bat : +``` + +##### 移除 DataNode + +在移除 DataNode 前,请确保移除后集群至少还有不少于(数据/元数据)副本个数的 DataNode。 +在活跃的 DataNode 上执行 remove-datanode 脚本: + +``` +# Linux +## 根据 datanode_id 移除节点 +./sbin/remove-datanode.sh + +## 根据 DataNode RPC 服务地址和端口移除节点 +./sbin/remove-datanode.sh : + + +# Windows +## 根据 datanode_id 移除节点 +.\sbin\remove-datanode.bat + +## 根据 DataNode RPC 服务地址和端口移除节点 +.\sbin\remove-datanode.bat : +``` + +### 常见问题 + +请参考 [分布式部署FAQ](../FAQ/Frequently-asked-questions.md#分布式部署-faq) \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment_timecho.md new file mode 100644 index 00000000..e852a80f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster-Deployment_timecho.md @@ -0,0 +1,1109 @@ + + +# 集群版部署 + +## 集群版部署(使用集群管理工具) + +IoTDB 集群管理工具是一款易用的运维工具(企业版工具)。旨在解决 IoTDB 分布式系统多节点的运维难题,主要包括集群部署、集群启停、弹性扩容、配置更新、数据导出等功能,从而实现对复杂数据库集群的一键式指令下发, +极大降低管理难度。本文档将说明如何用集群管理工具远程部署、配置、启动和停止 IoTDB 集群实例。 + +### 环境准备 + +本工具为 TimechoDB(基于IoTDB的企业版数据库)配套工具,您可以联系您的销售获取工具下载方式。 + +IoTDB 要部署的机器需要依赖jdk 8及以上版本、lsof、netstat、unzip功能如果没有请自行安装,可以参考文档最后的一节环境所需安装命令。 + +提示:IoTDB集群管理工具需要使用有root权限的账号 + +### 部署方法 + +#### 下载安装 + +本工具为TimechoDB(基于IoTDB的企业版数据库)配套工具,您可以联系您的销售获取工具下载方式。 + +注意:由于二进制包仅支持GLIBC2.17 及以上版本,因此最低适配Centos7版本 + +* 在iotdbctl目录内输入以下指令后: + +```bash +bash install-iotdbctl.sh +``` + +即可在之后的 shell 内激活 iotdbctl 关键词,如检查部署前所需的环境指令如下所示: + +```bash +iotdbctl cluster check example +``` + +* 也可以不激活iotd直接使用 <iotdbctl absolute path>/sbin/iotdbctl 来执行命令,如检查部署前所需的环境: + +```bash +/sbin/iotdbctl cluster check example +``` + +### 系统结构 + +IoTDB集群管理工具主要由config、logs、doc、sbin目录组成。 + +* `config`存放要部署的集群配置文件如果要使用集群部署工具需要修改里面的yaml文件。 + +* `logs` 存放部署工具日志,如果想要查看部署工具执行日志请查看`logs/iotd_yyyy_mm_dd.log`。 + +* `sbin` 存放集群部署工具所需的二进制包。 + +* `doc` 存放用户手册、开发手册和推荐部署手册。 + +### 集群配置文件介绍 + +* 在`iotdbctl/config` 目录下有集群配置的yaml文件,yaml文件名字就是集群名字yaml 文件可以有多个,为了方便用户配置yaml文件在iotd/config目录下面提供了`default_cluster.yaml`示例。 +* yaml 文件配置由`global`、`confignode_servers`、`datanode_servers`、`grafana_server`、`prometheus_server`四大部分组成 +* global 是通用配置主要配置机器用户名密码、IoTDB本地安装文件、Jdk配置等。在`iotdbctl/config`目录中提供了一个`default_cluster.yaml`样例数据, + 用户可以复制修改成自己集群名字并参考里面的说明进行配置IoTDB集群,在`default_cluster.yaml`样例中没有注释的均为必填项,已经注释的为非必填项。 + +例如要执行`default_cluster.yaml`检查命令则需要执行命令`iotdbctl cluster check default_cluster`即可, +更多详细命令请参考下面命令列表。 + + +| 参数 | 说明 | 是否必填 | +|-------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| +| `iotdb_zip_dir` | IoTDB 部署分发目录,如果值为空则从`iotdb_download_url`指定地址下载 | 非必填 | +| `iotdb_download_url` | IoTDB 下载地址,如果`iotdb_zip_dir` 没有值则从指定地址下载 | 非必填 | +| `jdk_tar_dir` | jdk 本地目录,可使用该 jdk 路径进行上传部署至目标节点。 | 非必填 | +| `jdk_deploy_dir` | jdk 远程机器部署目录,会将 jdk 部署到该目录下面,与下面的`jdk_dir_name`参数构成完整的jdk部署目录即 `/` | 非必填 | +| `jdk_dir_name` | jdk 解压后的目录名称默认是jdk_iotdb | 非必填 | +| 
`iotdb_lib_dir` | IoTDB lib 目录或者IoTDB 的lib 压缩包仅支持.zip格式 ,仅用于IoTDB升级,默认处于注释状态,如需升级请打开注释修改路径即可。如果使用zip文件请使用zip 命令压缩iotdb/lib目录例如 zip -r lib.zip apache\-iotdb\-1.2.0/lib/* | 非必填 | +| `user` | ssh登陆部署机器的用户名 | 必填 | +| `password` | ssh登录的密码, 如果password未指定使用pkey登陆, 请确保已配置节点之间ssh登录免密钥 | 非必填 | +| `pkey` | 密钥登陆如果password有值优先使用password否则使用pkey登陆 | 非必填 | +| `ssh_port` | ssh登录端口 | 必填 | +| `deploy_dir` | IoTDB 部署目录,会把 IoTDB 部署到该目录下面与下面的`iotdb_dir_name`参数构成完整的IoTDB 部署目录即 `/` | 必填 | +| `iotdb_dir_name` | IoTDB 解压后的目录名称默认是iotdb | 非必填 | +| `datanode-env.sh` | 对应`iotdb/config/datanode-env.sh` ,在`global`与`confignode_servers`同时配置值时优先使用`confignode_servers`中的值 | 非必填 | +| `confignode-env.sh` | 对应`iotdb/config/confignode-env.sh`,在`global`与`datanode_servers`同时配置值时优先使用`datanode_servers`中的值 | 非必填 | +| `iotdb-system.properties` | 对应`iotdb/config/iotdb-system.properties` | 非必填 | +| `cn_seed_config_node` | 集群配置地址指向存活的ConfigNode,默认指向confignode_x,在`global`与`confignode_servers`同时配置值时优先使用`confignode_servers`中的值,对应`iotdb/config/iotdb-system.properties`中的`cn_seed_config_node` | 必填 | +| `dn_seed_config_node` | 集群配置地址指向存活的ConfigNode,默认指向confignode_x,在`global`与`datanode_servers`同时配置值时优先使用`datanode_servers`中的值,对应`iotdb/config/iotdb-system.properties`中的`dn_seed_config_node` | 必填 | + +其中 `datanode-env.sh` 和 `confignode-env.sh` 可以配置额外参数`extra_opts`,当该参数配置后会在`datanode-env.sh` 和`confignode-env.sh` 后面追加对应的值,可参考`default_cluster.yaml`,配置示例如下: +``` yaml +datanode-env.sh: + extra_opts: | + IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:+UseG1GC" + IOTDB_JMX_OPTS="$IOTDB_JMX_OPTS -XX:MaxGCPauseMillis=200" +``` + +* confignode_servers 是部署IoTDB Confignodes配置,里面可以配置多个Confignode + 默认将第一个启动的ConfigNode节点node1当作Seed-ConfigNode + +| 参数 | 说明 | 是否必填 | +|-----------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------| +| `name` | Confignode 名称 | 必填 | +| `deploy_dir` | IoTDB config node 部署目录 | 必填| | +| `cn_internal_address` | 对应iotdb/内部通信地址,对应`iotdb/config/iotdb-system.properties`中的`cn_internal_address` | 必填 | +| `cn_seed_config_node` | 集群配置地址指向存活的ConfigNode,默认指向confignode_x,在`global`与`confignode_servers`同时配置值时优先使用`confignode_servers`中的值,对应`iotdb/config/iotdb-system.properties`中的`cn_seed_config_node` | 必填 | +| `cn_internal_port` | 内部通信端口,对应`iotdb/config/iotdb-system.properties`中的`cn_internal_port` | 必填 | +| `cn_consensus_port` | 对应`iotdb/config/iotdb-system.properties`中的`cn_consensus_port` | 非必填 | +| `cn_data_dir` | 对应`iotdb/config/iotdb-system.properties`中的`cn_data_dir` | 必填 | +| `iotdb-system.properties` | 对应`iotdb/config/iotdb-system.properties`在`global`与`confignode_servers`同时配置值优先使用confignode_servers中的值 | 非必填 | + +* datanode_servers 是部署IoTDB Datanodes配置,里面可以配置多个Datanode + +| 参数 | 说明 |是否必填| +|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--- | +| name | Datanode 名称 |必填| +| `deploy_dir` | IoTDB data node 部署目录 |必填| +| `dn_rpc_address` | datanode rpc 地址对应`iotdb/config/iotdb-system.properties`中的`dn_rpc_address` |必填| +| `dn_internal_address` | 内部通信地址,对应`iotdb/config/iotdb-system.properties`中的`dn_internal_address` |必填| +| `dn_seed_config_node` | 集群配置地址指向存活的ConfigNode,默认指向confignode_x,在`global`与`datanode_servers`同时配置值时优先使用`datanode_servers`中的值,对应`iotdb/config/iotdb-system.properties`中的`dn_seed_config_node` |必填| +| `dn_rpc_port` | datanode 
rpc端口地址,对应`iotdb/config/iotdb-system.properties`中的`dn_rpc_port` |必填| +| `dn_internal_port` | 内部通信端口,对应`iotdb/config/iotdb-system.properties`中的`dn_internal_port` |必填| +| `iotdb-system.properties` | 对应`iotdb/config/iotdb-system.properties`在`global`与`datanode_servers`同时配置值优先使用`datanode_servers`中的值 |非必填| + +* grafana_server 是部署Grafana 相关配置 + +| 参数 | 说明 | 是否必填 | +|--------------------|------------------|-------------------| +| `grafana_dir_name` | grafana 解压目录名称 | 非必填默认grafana_iotdb | +| `host` | grafana 部署的服务器ip | 必填 | +| `grafana_port` | grafana 部署机器的端口 | 非必填,默认3000 | +| `deploy_dir` | grafana 部署服务器目录 | 必填 | +| `grafana_tar_dir` | grafana 压缩包位置 | 必填 | +| `dashboards` | dashboards 所在的位置 | 非必填,多个用逗号隔开 | + +* prometheus_server 是部署Prometheus 相关配置 + +| 参数 | 说明 | 是否必填 | +|--------------------------------|------------------|-----------------------| +| `prometheus_dir_name` | prometheus 解压目录名称 | 非必填默认prometheus_iotdb | +| `host` | prometheus 部署的服务器ip | 必填 | +| `prometheus_port` | prometheus 部署机器的端口 | 非必填,默认9090 | +| `deploy_dir` | prometheus 部署服务器目录 | 必填 | +| `prometheus_tar_dir` | prometheus 压缩包位置 | 必填 | +| `storage_tsdb_retention_time` | 默认保存数据天数 默认15天 | 非必填 | +| `storage_tsdb_retention_size` | 指定block可以保存的数据大小 ,注意单位KB, MB, GB, TB, PB, EB | 非必填 | + +如果在config/xxx.yaml的`iotdb-system.properties`中配置了metrics,则会自动把配置放入到promethues无需手动修改 + +注意:如何配置yaml key对应的值包含特殊字符如:等建议整个value使用双引号,对应的文件路径中不要使用包含空格的路径,防止出现识别出现异常问题。 + +### 使用场景 + +#### 清理数据场景 + +* 清理集群数据场景会删除IoTDB集群中的data目录以及yaml文件中配置的`cn_system_dir`、`cn_consensus_dir`、 + `dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`、`logs`和`ext`目录。 +* 首先执行停止集群命令、然后在执行集群清理命令。 +```bash +iotdbctl cluster stop default_cluster +iotdbctl cluster clean default_cluster +``` + +#### 集群销毁场景 + +* 集群销毁场景会删除IoTDB集群中的`data`、`cn_system_dir`、`cn_consensus_dir`、 + `dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`、`logs`、`ext`、`IoTDB`部署目录、 + grafana部署目录和prometheus部署目录。 +* 首先执行停止集群命令、然后在执行集群销毁命令。 + + +```bash +iotdbctl cluster stop default_cluster +iotdbctl cluster destroy default_cluster +``` + +#### 集群升级场景 + +* 集群升级首先需要在config/xxx.yaml中配置`iotdb_lib_dir`为要上传到服务器的jar所在目录路径(例如iotdb/lib)。 +* 如果使用zip文件上传请使用zip 命令压缩iotdb/lib目录例如 zip -r lib.zip apache-iotdb-1.2.0/lib/* +* 执行上传命令、然后执行重启IoTDB集群命令即可完成集群升级 + +```bash +iotdbctl cluster upgrade default_cluster +iotdbctl cluster restart default_cluster +``` + +#### 集群配置文件的热部署场景 + +* 首先修改在config/xxx.yaml中配置。 +* 执行分发命令、然后执行热部署命令即可完成集群配置的热部署 + +```bash +iotdbctl cluster distribute default_cluster +iotdbctl cluster reload default_cluster +``` + +#### 集群扩容场景 + +* 首先修改在config/xxx.yaml中添加一个datanode 或者confignode 节点。 +* 执行集群扩容命令 +```bash +iotdbctl cluster scaleout default_cluster +``` + +#### 集群缩容场景 + +* 首先在config/xxx.yaml中找到要缩容的节点名字或者ip+port(其中confignode port 是cn_internal_port、datanode port 是rpc_port) +* 执行集群缩容命令 +```bash +iotdbctl cluster scalein default_cluster +``` + +#### 已有IoTDB集群,使用集群部署工具场景 + +* 配置服务器的`user`、`passwod`或`pkey`、`ssh_port` +* 修改config/xxx.yaml中IoTDB 部署路径,`deploy_dir`(IoTDB 部署目录)、`iotdb_dir_name`(IoTDB解压目录名称,默认是iotdb) + 例如IoTDB 部署完整路径是`/home/data/apache-iotdb-1.1.1`则需要修改yaml文件`deploy_dir:/home/data/`、`iotdb_dir_name:apache-iotdb-1.1.1` +* 如果服务器不是使用的java_home则修改`jdk_deploy_dir`(jdk 部署目录)、`jdk_dir_name`(jdk解压后的目录名称,默认是jdk_iotdb),如果使用的是java_home 则不需要修改配置 + 例如jdk部署完整路径是`/home/data/jdk_1.8.2`则需要修改yaml文件`jdk_deploy_dir:/home/data/`、`jdk_dir_name:jdk_1.8.2` +* 配置`cn_seed_config_node`、`dn_seed_config_node` +* 配置`confignode_servers`中`iotdb-system.properties`里面的`cn_internal_address`、`cn_internal_port`、`cn_consensus_port`、`cn_system_dir`、 + 
`cn_consensus_dir`和`iotdb-system.properties`里面的值不是IoTDB默认的则需要配置否则可不必配置 +* 配置`datanode_servers`中`iotdb-system.properties`里面的`dn_rpc_address`、`dn_internal_address`、`dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`和`iotdb-system.properties`等 +* 执行初始化命令 + +```bash +iotdbctl cluster init default_cluster +``` + +#### 一键部署IoTDB、Grafana和Prometheus 场景 + +* 配置`iotdb-system.properties` 打开metrics接口 +* 配置Grafana 配置,如果`dashboards` 有多个就用逗号隔开,名字不能重复否则会被覆盖。 +* 配置Prometheus配置,IoTDB 集群配置了metrics 则无需手动修改Prometheus 配置会根据哪个节点配置了metrics,自动修改Prometheus 配置。 +* 启动集群 + +```bash +iotdbctl cluster start default_cluster +``` + +更加详细参数请参考上方的 集群配置文件介绍 + + +### 命令格式 + +本工具的基本用法为: +```bash +iotdbctl cluster [params (Optional)] +``` +* key 表示了具体的命令。 + +* cluster name 表示集群名称(即`iotdbctl/config` 文件中yaml文件名字)。 + +* params 表示了命令的所需参数(选填)。 + +* 例如部署default_cluster集群的命令格式为: + +```bash +iotdbctl cluster deploy default_cluster +``` + +* 集群的功能及参数列表如下: + +| 命令 | 功能 | 参数 | +|------------|----------------------------|-------------------------------------------------------------------------------------------------------------------------| +| check | 检测集群是否可以部署 | 集群名称列表 | +| clean | 清理集群 | 集群名称 | +| deploy | 部署集群 | 集群名称 ,-N,模块名称(iotdb、grafana、prometheus可选),-op force(可选) | +| list | 打印集群及状态列表 | 无 | +| start | 启动集群 | 集群名称,-N,节点名称(nodename、grafana、prometheus可选) | +| stop | 关闭集群 | 集群名称,-N,节点名称(nodename、grafana、prometheus可选) ,-op force(nodename、grafana、prometheus可选) | +| restart | 重启集群 | 集群名称,-N,节点名称(nodename、grafana、prometheus可选),-op force(nodename、grafana、prometheus可选) | +| show | 查看集群信息,details字段表示展示集群信息细节 | 集群名称, details(可选) | +| destroy | 销毁集群 | 集群名称,-N,模块名称(iotdb、grafana、prometheus可选) | +| scaleout | 集群扩容 | 集群名称 | +| scalein | 集群缩容 | 集群名称,-N,集群节点名字或集群节点ip+port | +| reload | 集群热加载 | 集群名称 | +| distribute | 集群配置文件分发 | 集群名称 | +| dumplog | 备份指定集群日志 | 集群名称,-N,集群节点名字 -h 备份至目标机器ip -pw 备份至目标机器密码 -p 备份至目标机器端口 -path 备份的目录 -startdate 起始时间 -enddate 结束时间 -loglevel 日志类型 -l 传输速度 | +| dumpdata | 备份指定集群数据 | 集群名称, -h 备份至目标机器ip -pw 备份至目标机器密码 -p 备份至目标机器端口 -path 备份的目录 -startdate 起始时间 -enddate 结束时间 -l 传输速度 | +| upgrade | lib 包升级 | 集群名字(升级完后请重启) | +| init | 已有集群使用集群部署工具时,初始化集群配置 | 集群名字,初始化集群配置 | +| status | 查看进程状态 | 集群名字 | +| acitvate | 激活集群 | 集群名字 | +### 详细命令执行过程 + +下面的命令都是以default_cluster.yaml 为示例执行的,用户可以修改成自己的集群文件来执行 + +#### 检查集群部署环境命令 + +```bash +iotdbctl cluster check default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 验证目标节点是否能够通过 SSH 登录 + +* 验证对应节点上的 JDK 版本是否满足IoTDB jdk1.8及以上版本、服务器是否按照unzip、是否安装lsof 或者netstat + +* 如果看到下面提示`Info:example check successfully!` 证明服务器已经具备安装的要求, + 如果输出`Error:example check fail!` 证明有部分条件没有满足需求可以查看上面的输出的Error日志(例如:`Error:Server (ip:172.20.31.76) iotdb port(10713) is listening`)进行修复, + 如果检查jdk没有满足要求,我们可以自己在yaml 文件中配置一个jdk1.8 及以上版本的进行部署不影响后面使用, + 如果检查lsof、netstat或者unzip 不满足要求需要在服务器上自行安装。 + +#### 部署集群命令 + +```bash +iotdbctl cluster deploy default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 根据`confignode_servers` 和`datanode_servers`中的节点信息上传IoTDB压缩包和jdk压缩包(如果yaml中配置`jdk_tar_dir`和`jdk_deploy_dir`值) + +* 根据yaml文件节点配置信息生成并上传`iotdb-system.properties` + +```bash +iotdbctl cluster deploy default_cluster -op force +``` +注意:该命令会强制执行部署,具体过程会删除已存在的部署目录重新部署 + +*部署单个模块* +```bash +# 部署grafana模块 +iotdbctl cluster deploy default_cluster -N grafana +# 部署prometheus模块 +iotdbctl cluster deploy default_cluster -N prometheus +# 部署iotdb模块 +iotdbctl cluster deploy default_cluster -N iotdb +``` + +#### 启动集群命令 + +```bash +iotdbctl cluster start 
default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 启动confignode,根据yaml配置文件中`confignode_servers`中的顺序依次启动同时根据进程id检查confignode是否正常,第一个confignode 为seek config + +* 启动datanode,根据yaml配置文件中`datanode_servers`中的顺序依次启动同时根据进程id检查datanode是否正常 + +* 如果根据进程id检查进程存在后,通过cli依次检查集群列表中每个服务是否正常,如果cli链接失败则每隔10s重试一次直到成功最多重试5次 + + +*启动单个节点命令* +```bash +#按照IoTDB 节点名称启动 +iotdbctl cluster start default_cluster -N datanode_1 +#按照IoTDB 集群ip+port启动,其中port对应confignode的cn_internal_port、datanode的rpc_port +iotdbctl cluster start default_cluster -N 192.168.1.5:6667 +#启动grafana +iotdbctl cluster start default_cluster -N grafana +#启动prometheus +iotdbctl cluster start default_cluster -N prometheus +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件 + +* 根据提供的节点名称或者ip:port找到对于节点位置信息,如果启动的节点是`data_node`则ip使用yaml 文件中的`dn_rpc_address`、port 使用的是yaml文件中datanode_servers 中的`dn_rpc_port`。 + 如果启动的节点是`config_node`则ip使用的是yaml文件中confignode_servers 中的`cn_internal_address` 、port 使用的是`cn_internal_port` + +* 启动该节点 + +说明:由于集群部署工具仅是调用了IoTDB集群中的start-confignode.sh和start-datanode.sh 脚本, +在实际输出结果失败时有可能是集群还未正常启动,建议使用status命令进行查看当前集群状态(iotdbctl cluster status xxx) + + +#### 查看IoTDB集群状态命令 + +```bash +iotdbctl cluster show default_cluster +#查看IoTDB集群详细信息 +iotdbctl cluster show default_cluster details +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 依次在datanode通过cli执行`show cluster details` 如果有一个节点执行成功则不会在后续节点继续执行cli直接返回结果 + + +#### 停止集群命令 + + +```bash +iotdbctl cluster stop default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 根据`datanode_servers`中datanode节点信息,按照配置先后顺序依次停止datanode节点 + +* 根据`confignode_servers`中confignode节点信息,按照配置依次停止confignode节点 + +*强制停止集群命令* + +```bash +iotdbctl cluster stop default_cluster -op force +``` +会直接执行kill -9 pid 命令强制停止集群 + +*停止单个节点命令* + +```bash +#按照IoTDB 节点名称停止 +iotdbctl cluster stop default_cluster -N datanode_1 +#按照IoTDB 集群ip+port停止(ip+port是按照datanode中的ip+dn_rpc_port获取唯一节点或confignode中的ip+cn_internal_port获取唯一节点) +iotdbctl cluster stop default_cluster -N 192.168.1.5:6667 +#停止grafana +iotdbctl cluster stop default_cluster -N grafana +#停止prometheus +iotdbctl cluster stop default_cluster -N prometheus +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件 + +* 根据提供的节点名称或者ip:port找到对应节点位置信息,如果停止的节点是`data_node`则ip使用yaml 文件中的`dn_rpc_address`、port 使用的是yaml文件中datanode_servers 中的`dn_rpc_port`。 + 如果停止的节点是`config_node`则ip使用的是yaml文件中confignode_servers 中的`cn_internal_address` 、port 使用的是`cn_internal_port` + +* 停止该节点 + +说明:由于集群部署工具仅是调用了IoTDB集群中的stop-confignode.sh和stop-datanode.sh 脚本,在某些情况下有可能iotdb集群并未停止。 + + +#### 清理集群数据命令 + +```bash +iotdbctl cluster clean default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`配置信息 + +* 根据`confignode_servers`、`datanode_servers`中的信息,检查是否还有服务正在运行, + 如果有任何一个服务正在运行则不会执行清理命令 + +* 删除IoTDB集群中的data目录以及yaml文件中配置的`cn_system_dir`、`cn_consensus_dir`、 + `dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`、`logs`和`ext`目录。 + + + +#### 重启集群命令 + +```bash +iotdbctl cluster restart default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`配置信息 + +* 执行上述的停止集群命令(stop),然后执行启动集群命令(start) 具体参考上面的start 和stop 命令 + +*强制重启集群命令* + +```bash +iotdbctl cluster restart default_cluster -op force +``` +会直接执行kill -9 pid 命令强制停止集群,然后启动集群 + +*重启单个节点命令* + +```bash +#按照IoTDB 节点名称重启datanode_1 +iotdbctl cluster restart default_cluster -N datanode_1 +#按照IoTDB 节点名称重启confignode_1 +iotdbctl cluster restart default_cluster 
-N confignode_1 +#重启grafana +iotdbctl cluster restart default_cluster -N grafana +#重启prometheus +iotdbctl cluster restart default_cluster -N prometheus +``` + +#### 集群缩容命令 + +```bash +#按照节点名称缩容 +iotdbctl cluster scalein default_cluster -N nodename +#按照ip+port缩容(ip+port按照datanode中的ip+dn_rpc_port获取唯一节点,confignode中的ip+cn_internal_port获取唯一节点) +iotdbctl cluster scalein default_cluster -N ip:port +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 判断要缩容的confignode节点和datanode是否只剩一个,如果只剩一个则不能执行缩容 + +* 然后根据ip:port或者nodename 获取要缩容的节点信息,执行缩容命令,然后销毁该节点目录,如果缩容的节点是`data_node`则ip使用yaml 文件中的`dn_rpc_address`、port 使用的是yaml文件中datanode_servers 中的`dn_rpc_port`。 + 如果缩容的节点是`config_node`则ip使用的是yaml文件中confignode_servers 中的`cn_internal_address` 、port 使用的是`cn_internal_port` + + +提示:目前一次仅支持一个节点缩容 + +#### 集群扩容命令 + +```bash +iotdbctl cluster scaleout default_cluster +``` +* 修改config/xxx.yaml 文件添加一个datanode 节点或者confignode节点 + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 找到要扩容的节点,执行上传IoTDB压缩包和jdb包(如果yaml中配置`jdk_tar_dir`和`jdk_deploy_dir`值)并解压 + +* 根据yaml文件节点配置信息生成并上传`iotdb-system.properties` + +* 执行启动该节点命令并校验节点是否启动成功 + +提示:目前一次仅支持一个节点扩容 + +#### 销毁集群命令 +```bash +iotdbctl cluster destroy default_cluster +``` + +* cluster-name 找到默认位置的 yaml 文件 + +* 根据`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`中node节点信息,检查是否节点还在运行, + 如果有任何一个节点正在运行则停止销毁命令 + +* 删除IoTDB集群中的`data`以及yaml文件配置的`cn_system_dir`、`cn_consensus_dir`、 + `dn_data_dirs`、`dn_consensus_dir`、`dn_system_dir`、`logs`、`ext`、`IoTDB`部署目录、 + grafana部署目录和prometheus部署目录 + +*销毁单个模块* +```bash +# 销毁grafana模块 +iotdbctl cluster destroy default_cluster -N grafana +# 销毁prometheus模块 +iotdbctl cluster destroy default_cluster -N prometheus +# 销毁iotdb模块 +iotdbctl cluster destroy default_cluster -N iotdb +``` + +#### 分发集群配置命令 +```bash +iotdbctl cluster distribute default_cluster +``` + +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`配置信息 + +* 根据yaml文件节点配置信息生成并依次上传`iotdb-system.properties`到指定节点 + +#### 热加载集群配置命令 +```bash +iotdbctl cluster reload default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 根据yaml文件节点配置信息依次在cli中执行`load configuration` + +#### 集群节点日志备份 +```bash +iotdbctl cluster dumplog default_cluster -N datanode_1,confignode_1 -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/logs' -logs '/root/data/db/iotdb/logs' +``` +* 根据 cluster-name 找到默认位置的 yaml 文件 + +* 该命令会根据yaml文件校验datanode_1,confignode_1 是否存在,然后根据配置的起止日期(startdate<=logtime<=enddate)备份指定节点datanode_1,confignode_1 的日志数据到指定服务`192.168.9.48` 端口`36000` 数据备份路径是 `/iotdb/logs` ,IoTDB日志存储路径在`/root/data/db/iotdb/logs`(非必填,如果不填写-logs xxx 默认从IoTDB安装路径/logs下面备份日志) + +| 命令 | 功能 | 是否必填 | +|------------|------------------------------------| ---| +| -h | 存放备份数据机器ip |否| +| -u | 存放备份数据机器用户名 |否| +| -pw | 存放备份数据机器密码 |否| +| -p | 存放备份数据机器端口(默认22) |否| +| -path | 存放备份数据的路径(默认当前路径) |否| +| -loglevel | 日志基本有all、info、error、warn(默认是全部) |否| +| -l | 限速(默认不限速范围0到104857601 单位Kbit/s) |否| +| -N | 配置文件集群名称多个用逗号隔开 |是| +| -startdate | 起始时间(包含默认1970-01-01) |否| +| -enddate | 截止时间(包含) |否| +| -logs | IoTDB 日志存放路径,默认是({iotdb}/logs) |否| + +#### 集群节点数据备份 +```bash +iotdbctl cluster dumpdata default_cluster -granularity partition -startdate '2023-04-11' -enddate '2023-04-26' -h 192.168.9.48 -p 36000 -u root -pw root -path '/iotdb/datas' +``` +* 该命令会根据yaml文件获取leader 节点,然后根据起止日期(startdate<=logtime<=enddate)备份数据到192.168.9.48 
服务上的/iotdb/datas 目录下 + +| 命令 | 功能 | 是否必填 | +| ---|---------------------------------| ---| +|-h| 存放备份数据机器ip |否| +|-u| 存放备份数据机器用户名 |否| +|-pw| 存放备份数据机器密码 |否| +|-p| 存放备份数据机器端口(默认22) |否| +|-path| 存放备份数据的路径(默认当前路径) |否| +|-granularity| 类型partition |是| +|-l| 限速(默认不限速范围0到104857601 单位Kbit/s) |否| +|-startdate| 起始时间(包含) |是| +|-enddate| 截止时间(包含) |是| + +#### 集群升级 +```bash +iotdbctl cluster upgrade default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`和`datanode_servers`配置信息 + +* 上传lib包 + +注意执行完升级后请重启IoTDB 才能生效 + +#### 集群初始化 +```bash +iotdbctl cluster init default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`配置信息 +* 初始化集群配置 + +#### 查看集群进程状态 +```bash +iotdbctl cluster status default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`、`datanode_servers`、`grafana`、`prometheus`配置信息 +* 展示集群的存活状态 + +#### 集群授权激活 + +集群激活默认是通过输入激活码激活,也可以通过-op license_path 通过license路径激活 + +* 默认激活方式 +```bash +iotdbctl cluster activate default_cluster +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`配置信息 +* 读取里面的机器码 +* 等待输入激活码 + +```bash +Machine code: +Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== +Please enter the activation code: +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= +Activation successful +``` +* 激活单个节点 + +```bash +iotdbctl cluster activate default_cluster -N confignode1 +``` + +* 通过license路径方式激活 + +```bash +iotdbctl cluster activate default_cluster -op license_path +``` +* 根据 cluster-name 找到默认位置的 yaml 文件,获取`confignode_servers`配置信息 +* 读取里面的机器码 +* 等待输入激活码 + +```bash +Machine code: +Kt8NfGP73FbM8g4Vty+V9qU5lgLvwqHEF3KbLN/SGWYCJ61eFRKtqy7RS/jw03lHXt4MwdidrZJ== +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKgGXEGzMms25+u== +Please enter the activation code: +JHQpXu97IKwv3rzbaDwoPLUuzNCm5aEeC9ZEBW8ndKg=,lTF1Dur1AElXIi/5jPV9h0XCm8ziPd9/R+tMYLsze1oAPxE87+Nwws= +Activation successful +``` +* 激活单个节点 + +```bash +iotdbctl cluster activate default_cluster -N confignode1 -op license_path +``` + + +### 集群部署工具样例介绍 +在集群部署工具安装目录中config/example 下面有3个yaml样例,如果需要可以复制到config 中进行修改即可 + +| 名称 | 说明 | +|-----------------------------|------------------------------------------------| +| default_1c1d.yaml | 1个confignode和1个datanode 配置样例 | +| default_3c3d.yaml | 3个confignode和3个datanode 配置样例 | +| default_3c3d_grafa_prome | 3个confignode和3个datanode、Grafana、Prometheus配置样例 | + +## 手动部署 + +### 前置检查 + +1. JDK>=1.8 的运行环境,并配置好 JAVA_HOME 环境变量。 +2. 设置最大文件打开数为 65535。 +3. 关闭交换内存。 +4. 首次启动ConfigNode节点时,确保已清空ConfigNode节点的data/confignode目录;首次启动DataNode节点时,确保已清空DataNode节点的data/datanode目录。 +5. 如果整个集群处在可信环境下,可以关闭机器上的防火墙选项。 +6. 在集群默认配置中,ConfigNode 会占用端口 10710 和 10720,DataNode 会占用端口 6667、10730、10740、10750 和 10760, + 请确保这些端口未被占用,或者手动修改配置文件中的端口配置。 + +### 安装包获取 + +你可以选择下载二进制文件或从源代码编译。 + +#### 下载二进制文件 + +1. 打开官网[Download Page](https://iotdb.apache.org/Download/)。 +2. 下载 IoTDB 1.3.0 版本的二进制文件。 +3. 解压得到 apache-iotdb-1.3.0-all-bin 目录。 + +#### 使用源码编译 + +##### 下载源码 + +**Git** + +``` +git clone https://github.com/apache/iotdb.git +git checkout v1.3.0 +``` + +**官网下载** + +1. 打开官网[Download Page](https://iotdb.apache.org/Download/)。 +2. 下载 IoTDB 1.3.0 版本的源码。 +3. 
解压得到 apache-iotdb-1.3.0 目录。 + +##### 编译源码 + +在 IoTDB 源码根目录下: + +``` +mvn clean package -pl distribution -am -DskipTests +``` + +编译成功后,可在目录 +**distribution/target/apache-iotdb-1.3.0-all-bin/apache-iotdb-1.3.0-all-bin** +找到集群版本的二进制文件。 + +### 安装包说明 + +打开 apache-iotdb-1.3.0-all-bin,可见以下目录: + +| **目录** | **说明** | +| -------- | ------------------------------------------------------------ | +| conf | 配置文件目录,包含 ConfigNode、DataNode、JMX 和 logback 等配置文件 | +| data | 数据文件目录,包含 ConfigNode 和 DataNode 的数据文件 | +| lib | 库文件目录 | +| licenses | 证书文件目录 | +| logs | 日志文件目录,包含 ConfigNode 和 DataNode 的日志文件 | +| sbin | 脚本目录,包含 ConfigNode 和 DataNode 的启停移除脚本,以及 Cli 的启动脚本等 | +| tools | 系统工具目录 | + +### 集群安装配置 + +#### 集群安装 + +`apache-iotdb-1.3.0-all-bin` 包含 ConfigNode 和 DataNode, +请将安装包部署于你目标集群的所有机器上,推荐将安装包部署于所有服务器的相同目录下。 + +如果你希望先在一台服务器上尝试部署 IoTDB 集群,请参考 +[Cluster Quick Start](../QuickStart/ClusterQuickStart.md)。 + +#### 集群配置 + +接下来需要修改每个服务器上的配置文件,登录服务器, +并将工作路径切换至 `apache-iotdb-1.3.0-all-bin`, +配置文件在 `./conf` 目录内。 + +对于所有部署 ConfigNode 的服务器,需要修改[通用配置](../Reference/Common-Config-Manual.md)和 [ConfigNode 配置](../Reference/ConfigNode-Config-Manual.md)。 + +对于所有部署 DataNode 的服务器,需要修改[通用配置](../Reference/Common-Config-Manual.md)和 [DataNode 配置](../Reference/DataNode-Config-Manual.md)。 + +##### 通用配置 + +打开通用配置文件 ./conf/iotdb-system.properties, +可根据 [部署推荐](./Deployment-Recommendation.md) +设置以下参数: + +| **配置项** | **说明** | **默认** | +| ------------------------------------------ | ------------------------------------------------------------ | ----------------------------------------------- | +| cluster_name | 节点希望加入的集群的名称 | defaultCluster | +| config_node_consensus_protocol_class | ConfigNode 使用的共识协议 | org.apache.iotdb.consensus.ratis.RatisConsensus | +| schema_replication_factor | 元数据副本数,DataNode 数量不应少于此数目 | 1 | +| schema_region_consensus_protocol_class | 元数据副本组的共识协议 | org.apache.iotdb.consensus.ratis.RatisConsensus | +| data_replication_factor | 数据副本数,DataNode 数量不应少于此数目 | 1 | +| data_region_consensus_protocol_class | 数据副本组的共识协议。注:RatisConsensus 目前不支持多数据目录 | org.apache.iotdb.consensus.iot.IoTConsensus | + +**注意:上述配置项在集群启动后即不可更改,且务必保证所有节点的通用配置完全一致,否则节点无法启动。** + +##### ConfigNode 配置 + +打开 ConfigNode 配置文件 ./conf/iotdb-system.properties,根据服务器/虚拟机的 IP 地址和可用端口,设置以下参数: + +| **配置项** | **说明** | **默认** | **用法** | +| ------------------------------ | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------ | +| cn_internal_address | ConfigNode 在集群内部通讯使用的地址 | 127.0.0.1 | 设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| cn_internal_port | ConfigNode 在集群内部通讯使用的端口 | 10710 | 设置为任意未占用端口 | +| cn_consensus_port | ConfigNode 副本组共识协议通信使用的端口 | 10720 | 设置为任意未占用端口 | +| cn_seed_config_node | 节点注册加入集群时连接的 ConfigNode 的地址。注:只能配置一个 | 127.0.0.1:10710 | 对于 Seed-ConfigNode,设置为自己的 cn_internal_address:cn_internal_port;对于其它 ConfigNode,设置为另一个正在运行的 ConfigNode 的 cn_internal_address:cn_internal_port | + +**注意:上述配置项在节点启动后即不可更改,且务必保证所有端口均未被占用,否则节点无法启动。** + +##### DataNode 配置 + +打开 DataNode 配置文件 ./conf/iotdb-system.properties,根据服务器/虚拟机的 IP 地址和可用端口,设置以下参数: + +| **配置项** | **说明** | **默认** | **用法** | +| ----------------------------------- | ----------------------------------------- | --------------- | ------------------------------------------------------------ | +| dn_rpc_address | 客户端 RPC 服务的地址 | 127.0.0.1 | 设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| dn_rpc_port | 客户端 RPC 服务的端口 | 6667 | 设置为任意未占用端口 | +| dn_internal_address | DataNode 在集群内部接收控制流使用的地址 | 127.0.0.1 | 设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| 
dn_internal_port | DataNode 在集群内部接收控制流使用的端口 | 10730 | 设置为任意未占用端口 | +| dn_mpp_data_exchange_port | DataNode 在集群内部接收数据流使用的端口 | 10740 | 设置为任意未占用端口 | +| dn_data_region_consensus_port | DataNode 的数据副本间共识协议通信的端口 | 10750 | 设置为任意未占用端口 | +| dn_schema_region_consensus_port | DataNode 的元数据副本间共识协议通信的端口 | 10760 | 设置为任意未占用端口 | +| dn_seed_config_node | 集群中正在运行的 ConfigNode 地址 | 127.0.0.1:10710 | 设置为任意正在运行的 ConfigNode 的 cn_internal_address:cn_internal_port,可设置多个,用逗号(",")隔开 | + +**注意:上述配置项在节点启动后即不可更改,且务必保证所有端口均未被占用,否则节点无法启动。** + +### 集群操作 + +#### 启动集群 + +本小节描述如何启动包括若干 ConfigNode 和 DataNode 的集群。 +集群可以提供服务的标准是至少启动一个 ConfigNode 且启动 不小于(数据/元数据)副本个数 的 DataNode。 + +总体启动流程分为三步: + +1. 启动种子 ConfigNode +2. 增加 ConfigNode(可选) +3. 增加 DataNode + +##### 启动 Seed-ConfigNode + +**集群第一个启动的节点必须是 ConfigNode,第一个启动的 ConfigNode 必须遵循本小节教程。** + +第一个启动的 ConfigNode 是 Seed-ConfigNode,标志着新集群的创建。 +在启动 Seed-ConfigNode 前,请打开通用配置文件 ./conf/iotdb-system.properties,并检查如下参数: + +| **配置项** | **检查** | +| ------------------------------------------ | -------------------------- | +| cluster_name | 已设置为期望的集群名称 | +| config_node_consensus_protocol_class | 已设置为期望的共识协议 | +| schema_replication_factor | 已设置为期望的元数据副本数 | +| schema_region_consensus_protocol_class | 已设置为期望的共识协议 | +| data_replication_factor | 已设置为期望的数据副本数 | +| data_region_consensus_protocol_class | 已设置为期望的共识协议 | + +**注意:** 请根据[部署推荐](./Deployment-Recommendation.md)配置合适的通用参数,这些参数在首次配置后即不可修改。 + +接着请打开它的配置文件 ./conf/iotdb-system.properties,并检查如下参数: + +| **配置项** | **检查** | +| ------------------------------ | ------------------------------------------------------------ | +| cn_internal_address | 已设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| cn_internal_port | 该端口未被占用 | +| cn_consensus_port | 该端口未被占用 | +| cn_seed_config_node | 已设置为自己的内部通讯地址,即 cn_internal_address:cn_internal_port | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-confignode.sh + +# Linux 后台启动 +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +ConfigNode 的其它配置参数可参考 +[ConfigNode 配置参数](../Reference/ConfigNode-Config-Manual.md)。 + +##### 增加更多 ConfigNode(可选) + +**只要不是第一个启动的 ConfigNode 就必须遵循本小节教程。** + +可向集群添加更多 ConfigNode,以保证 ConfigNode 的高可用。常用的配置为额外增加两个 ConfigNode,使集群共有三个 ConfigNode。 + +新增的 ConfigNode 需要保证 ./conf/iotdb-common.properites 中的所有配置参数与 Seed-ConfigNode 完全一致,否则可能启动失败或产生运行时错误。 +因此,请着重检查通用配置文件中的以下参数: + +| **配置项** | **检查** | +| ------------------------------------------ | --------------------------- | +| cluster_name | 与 Seed-ConfigNode 保持一致 | +| config_node_consensus_protocol_class | 与 Seed-ConfigNode 保持一致 | +| schema_replication_factor | 与 Seed-ConfigNode 保持一致 | +| schema_region_consensus_protocol_class | 与 Seed-ConfigNode 保持一致 | +| data_replication_factor | 与 Seed-ConfigNode 保持一致 | +| data_region_consensus_protocol_class | 与 Seed-ConfigNode 保持一致 | + +接着请打开它的配置文件 ./conf/iotdb-system.properties,并检查以下参数: + +| **配置项** | **检查** | +| ------------------------------ | ------------------------------------------------------------ | +| cn_internal_address | 已设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| cn_internal_port | 该端口未被占用 | +| cn_consensus_port | 该端口未被占用 | +| cn_seed_config_node | 已设置为另一个正在运行的 ConfigNode 的内部通讯地址,推荐使用 Seed-ConfigNode 的内部通讯地址 | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-confignode.sh + +# Linux 后台启动 +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +ConfigNode 的其它配置参数可参考 +[ConfigNode 配置参数](../Reference/ConfigNode-Config-Manual.md)。 + +##### 增加 DataNode + +**确保集群已有正在运行的 ConfigNode 后,才能开始增加 
DataNode。** + +可以向集群中添加任意个 DataNode。 +在添加新的 DataNode 前,请先打开通用配置文件 ./conf/iotdb-system.properties 并检查以下参数: + +| **配置项** | **检查** | +| ------------- | --------------------------- | +| cluster_name | 与 Seed-ConfigNode 保持一致 | + +接着打开它的配置文件 ./conf/iotdb-system.properties 并检查以下参数: + +| **配置项** | **检查** | +| ----------------------------------- | ------------------------------------------------------------ | +| dn_rpc_address | 已设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| dn_rpc_port | 该端口未被占用 | +| dn_internal_address | 已设置为服务器的 `IP地址`或`hostname(机器名/域名)` | +| dn_internal_port | 该端口未被占用 | +| dn_mpp_data_exchange_port | 该端口未被占用 | +| dn_data_region_consensus_port | 该端口未被占用 | +| dn_schema_region_consensus_port | 该端口未被占用 | +| dn_seed_config_node | 已设置为正在运行的 ConfigNode 的内部通讯地址,推荐使用 Seed-ConfigNode 的内部通讯地址 | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-datanode.sh + +# Linux 后台启动 +nohup bash ./sbin/start-datanode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-datanode.bat +``` + +DataNode 的其它配置参数可参考 +[DataNode配置参数](../Reference/DataNode-Config-Manual.md) 。 + +**注意:当且仅当集群拥有不少于副本个数(max{schema_replication_factor, data_replication_factor})的 DataNode 后,集群才可以提供服务** + +#### 启动 Cli + +若搭建的集群仅用于本地调试,可直接执行 ./sbin 目录下的 Cli 启动脚本: + +``` +# Linux +./sbin/start-cli.sh + +# Windows +.\sbin\start-cli.bat +``` + +若希望通过 Cli 连接生产环境的集群, +请阅读 [Cli 使用手册](../Tools-System/CLI.md)。 + +#### 验证集群 + +以在6台服务器上启动的3C3D(3个ConfigNode 和 3个DataNode)集群为例, +这里假设3个ConfigNode依次为iotdb-1(192.168.1.10)、iotdb-2(192.168.1.11)、iotdb-3(192.168.1.12),且3个ConfigNode启动时均使用了默认的端口10710与10720; +3个DataNode依次为iotdb-4(192.168.1.20)、iotdb-5(192.168.1.21)、iotdb-6(192.168.1.22),且3个DataNode启动时均使用了默认的端口6667、10730、10740、10750与10760。 + +我们为这六台地址全部设置 hostname +```shell +echo "192.168.132.10 iotdb-1" >> /etc/hosts +echo "192.168.132.11 iotdb-2" >> /etc/hosts +echo "192.168.132.12 iotdb-3" >> /etc/hosts +echo "192.168.132.20 iotdb-4" >> /etc/hosts +echo "192.168.132.21 iotdb-5" >> /etc/hosts +echo "192.168.132.22 iotdb-6" >> /etc/hosts +``` + +当按照6.1步骤成功启动集群后,在 Cli 执行 `show cluster details`,看到的结果应当如下: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| iotdb-1 | 10710| 10720| | | | | | +| 2|ConfigNode|Running| iotdb-2 | 10710| 10720| | | | | | +| 3|ConfigNode|Running| iotdb-3 | 10710| 10720| | | | | | +| 1| DataNode|Running| iotdb-4 | 10730| | iotdb-4| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| iotdb-5 | 10730| | iotdb-5| 6667| 10740| 10750| 10760| +| 5| DataNode|Running| iotdb-6 | 10730| | iotdb-6| 6667| 10740| 10750| 10760| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.012s +``` + +若所有节点的状态均为 **Running**,则说明集群部署成功; +否则,请阅读启动失败节点的运行日志,并检查对应的配置参数。 + +#### 停止 IoTDB 进程 + +本小节描述如何手动关闭 IoTDB 的 ConfigNode 或 DataNode 进程。 + +##### 使用脚本停止 ConfigNode + +执行停止 ConfigNode 脚本: + +``` +# Linux +./sbin/stop-confignode.sh + +# Windows +.\sbin\stop-confignode.bat +``` + +##### 使用脚本停止 DataNode + +执行停止 DataNode 脚本: + +``` +# Linux +./sbin/stop-datanode.sh + +# Windows +.\sbin\stop-datanode.bat +``` + +##### 
停止节点进程 + +首先获取节点的进程号: + +``` +jps + +# 或 + +ps aux | grep iotdb +``` + +结束进程: + +``` +kill -9 +``` + +**注意:有些端口的信息需要 root 权限才能获取,在此情况下请使用 sudo** + +#### 集群缩容 + +本小节描述如何将 ConfigNode 或 DataNode 移出集群。 + +##### 移除 ConfigNode + +在移除 ConfigNode 前,请确保移除后集群至少还有一个活跃的 ConfigNode。 +在活跃的 ConfigNode 上执行 remove-confignode 脚本: + +``` +# Linux +## 根据 confignode_id 移除节点 +./sbin/remove-confignode.sh + +## 根据 ConfigNode 内部通讯地址和端口移除节点 +./sbin/remove-confignode.sh : + + +# Windows +## 根据 confignode_id 移除节点 +.\sbin\remove-confignode.bat + +## 根据 ConfigNode 内部通讯地址和端口移除节点 +.\sbin\remove-confignode.bat : +``` + +##### 移除 DataNode + +在移除 DataNode 前,请确保移除后集群至少还有不少于(数据/元数据)副本个数的 DataNode。 +在活跃的 DataNode 上执行 remove-datanode 脚本: + +``` +# Linux +## 根据 datanode_id 移除节点 +./sbin/remove-datanode.sh + +## 根据 DataNode RPC 服务地址和端口移除节点 +./sbin/remove-datanode.sh : + + +# Windows +## 根据 datanode_id 移除节点 +.\sbin\remove-datanode.bat + +## 根据 DataNode RPC 服务地址和端口移除节点 +.\sbin\remove-datanode.bat : +``` + +### 常见问题 + +请参考 [分布式部署FAQ](../FAQ/Frequently-asked-questions.md#分布式部署-faq) diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Concept.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Concept.md new file mode 100644 index 00000000..f503ae15 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Concept.md @@ -0,0 +1,118 @@ + + +# 分布式 +## 集群基本概念 + +Apache IoTDB 集群版包含两种角色的节点,ConfigNode 和 DataNode,分别为不同的进程,可独立部署。 + +集群架构示例如下图: + + + + +ConfigNode 是集群的控制节点,管理集群的节点状态、分区信息等,集群所有 ConfigNode 组成一个高可用组,数据全量备份。 + +注意:ConfigNode 的副本数是集群当前加入的 ConfigNode 个数,一半以上的 ConfigNode 存活集群才能提供服务。 + +DataNode 是集群的数据节点,管理多个数据分片、元数据分片,数据即时间序列中的时间戳和值,元数据为时间序列的路径信息、数据类型等。 + +Client 只能通过 DataNode 进行数据读写。 + +### 名词解释 + +| 名词 | 类型 | 解释 | +|:------------------|:--------|:---------------------------------------------------| +| ConfigNode | 节点角色 | 配置节点,管理集群节点信息、分区信息,监控集群状态、控制负载均衡 | +| DataNode | 节点角色 | 数据节点,管理数据、元数据 | +| Database | 元数据 | 数据库,不同数据库的数据物理隔离 | +| DeviceId | 设备名 | 元数据树中从 root 到倒数第二级的全路径表示一个设备名 | +| SeriesSlot | 元数据分区 | 每个 Database 包含多个元数据分区,根据设备名进行分区 | +| SchemaRegion | 一组元数据分区 | 多个 SeriesSlot 的集合 | +| SchemaRegionGroup | 逻辑概念 | 包含元数据副本数个 SchemaRegion,管理相同的元数据,互为备份 | +| SeriesTimeSlot | 数据分区 | 一个元数据分区的一段时间的数据对应一个数据分区,每个元数据分区对应多个数据分区,根据时间范围进行分区 | +| DataRegion | 一组数据分区 | 多个 SeriesTimeSlot 的集合 | +| DataRegionGroup | 逻辑概念 | 包含数据副本数个 DataRegion,管理相同的数据,互为备份 | + +## 集群特点 + +* 原生分布式 + * IoTDB 各模块原生支持分布式。 + * Standalone 是分布式的一种特殊的部署形态。 +* 扩展性 + * 支持秒级增加节点,无需进行数据迁移。 +* 大规模并行处理架构 MPP + * 采用大规模并行处理架构及火山模型进行数据处理,具有高扩展性。 +* 可根据不同场景需求选择不同的共识协议 + * 数据副本组和元数据副本组,可以采用不同的共识协议。 +* 可扩展分区策略 + * 集群采用分区表管理数据和元数据分区,自定义灵活的分配策略。 +* 内置监控框架 + * 内置集群监控,可以监控集群节点。 + +## 分区策略 + +分区策略将数据和元数据划分到不同的 RegionGroup 中,并把 RegionGroup 的 Region 分配到不同的 DataNode。 + +推荐设置 1 个 database,集群会根据节点数和核数动态分配资源。 + +Database 包含多个 SchemaRegion 和 DataRegion,由 DataNode 管理。 + +* 元数据分区策略 + * 对于一条未使用模板的时间序列的元数据,ConfigNode 会根据设备 ID (从 root 到倒数第二层节点的全路径)映射到一个序列分区,并将此序列分区分配到一组 SchemaRegion 中。 + +* 数据分区策略 + * 对于一个时间序列数据点,ConfigNode 会根据设备 ID 映射到一个序列分区(纵向分区),再根据时间戳映射到一个序列时间分区(横向分区),并将此序列时间分区分配到一组 DataRegion 中。 + +IoTDB 使用了基于槽的分区策略,因此分区信息的大小是可控的,不会随时间序列或设备数无限增长。 + +Region 会分配到不同的 DataNode 上,分配 Region 时会保证不同 DataNode 的负载均衡。 + +## 复制策略 + +复制策略将数据复制多份,互为副本,多个副本可以一起提供高可用服务,容忍部分副本失效的情况。 + +Region 是数据复制的基本单位,一个 Region 的多个副本构成了一个高可用复制组,数据互为备份。 + +* 集群内的副本组 + * ConfigNodeGroup:由所有 ConfigNode 组成。 + * SchemaRegionGroup:集群有多个元数据组,每个 SchemaRegionGroup 内有多个 ID 相同的 SchemaRegion。 + * DataRegionGroup:集群有多个数据组,每个 DataRegionGroup 内有多个 ID 相同的 DataRegion。 + + +完整的集群分区复制的示意图如下: + + + +图中包含 1 个 
SchemaRegionGroup,元数据采用 3 副本,因此 3 个白色的 SchemaRegion-0 组成了一个副本组。 + +图中包含 3 个 DataRegionGroup,数据采用 3 副本,因此一共有 9 个 DataRegion。 + +## 共识协议(一致性协议) + +每个副本组的多个副本之间,都通过一个具体的共识协议保证数据一致性,共识协议会将读写请求应用到多个副本上。 + +* 现有的共识协议 + * SimpleConsensus:提供强一致性,仅单副本时可用,一致性协议的极简实现,效率最高。 + * IoTConsensus:提供最终一致性,任意副本数可用,2 副本时可容忍 1 节点失效,当前仅可用于 DataRegion 的副本上,写入可以在任一副本进行,并异步复制到其他副本。 + * RatisConsensus:提供强一致性,Raft 协议的一种实现,任意副本数可用,当前可用于任意副本组上。目前DataRegion使用RatisConsensus时暂不支持多数据目录,预计会在后续版本中支持这一功能。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Maintenance.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Maintenance.md new file mode 100644 index 00000000..b3a5d36e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Maintenance.md @@ -0,0 +1,717 @@ + + +# 集群运维命令 + +## 展示集群配置 + +当前 IoTDB 支持使用如下 SQL 展示集群的关键参数: +``` +SHOW VARIABLES +``` + +示例: +``` +IoTDB> show variables ++----------------------------------+-----------------------------------------------------------------+ +| Variables| Value| ++----------------------------------+-----------------------------------------------------------------+ +| ClusterName| defaultCluster| +| DataReplicationFactor| 1| +| SchemaReplicationFactor| 1| +| DataRegionConsensusProtocolClass| org.apache.iotdb.consensus.iot.IoTConsensus| +|SchemaRegionConsensusProtocolClass| org.apache.iotdb.consensus.ratis.RatisConsensus| +| ConfigNodeConsensusProtocolClass| org.apache.iotdb.consensus.ratis.RatisConsensus| +| TimePartitionInterval| 604800000| +| DefaultTTL(ms)| 9223372036854775807| +| ReadConsistencyLevel| strong| +| SchemaRegionPerDataNode| 1.0| +| DataRegionPerDataNode| 5.0| +| LeastDataRegionGroupNum| 5| +| SeriesSlotNum| 10000| +| SeriesSlotExecutorClass|org.apache.iotdb.commons.partition.executor.hash.BKDRHashExecutor| +| DiskSpaceWarningThreshold| 0.05| ++----------------------------------+-----------------------------------------------------------------+ +Total line number = 15 +It costs 0.225s +``` + +**注意:** 必须保证该 SQL 展示的所有配置参数在同一集群各个节点完全一致 + +## 展示 ConfigNode 信息 + +当前 IoTDB 支持使用如下 SQL 展示 ConfigNode 的信息: +``` +SHOW CONFIGNODES +``` + +示例: +``` +IoTDB> show confignodes ++------+-------+---------------+------------+--------+ +|NodeID| Status|InternalAddress|InternalPort| Role| ++------+-------+---------------+------------+--------+ +| 0|Running| 127.0.0.1| 10710| Leader| +| 1|Running| 127.0.0.1| 10711|Follower| +| 2|Running| 127.0.0.1| 10712|Follower| ++------+-------+---------------+------------+--------+ +Total line number = 3 +It costs 0.030s +``` + +### ConfigNode 状态定义 +对 ConfigNode 各状态定义如下: + +- **Running**: ConfigNode 正常运行 +- **Unknown**: ConfigNode 未正常上报心跳 + - 无法接收其它 ConfigNode 同步来的数据 + - 不会被选为集群的 ConfigNode-leader + +## 展示 DataNode 信息 + +当前 IoTDB 支持使用如下 SQL 展示 DataNode 的信息: +``` +SHOW DATANODES +``` + +示例: +``` +IoTDB> create timeseries root.sg.d1.s1 with datatype=BOOLEAN,encoding=PLAIN +Msg: The statement is executed successfully. +IoTDB> create timeseries root.sg.d2.s1 with datatype=BOOLEAN,encoding=PLAIN +Msg: The statement is executed successfully. +IoTDB> create timeseries root.ln.d1.s1 with datatype=BOOLEAN,encoding=PLAIN +Msg: The statement is executed successfully. 
+IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 127.0.0.1| 6667| 0| 1| +| 2|Running| 127.0.0.1| 6668| 0| 1| ++------+-------+----------+-------+-------------+---------------+ + +Total line number = 2 +It costs 0.007s + +IoTDB> insert into root.ln.d1(timestamp,s1) values(1,true) +Msg: The statement is executed successfully. +IoTDB> show datanodes ++------+-------+----------+-------+-------------+---------------+ +|NodeID| Status|RpcAddress|RpcPort|DataRegionNum|SchemaRegionNum| ++------+-------+----------+-------+-------------+---------------+ +| 1|Running| 127.0.0.1| 6667| 1| 1| +| 2|Running| 127.0.0.1| 6668| 0| 1| ++------+-------+----------+-------+-------------+---------------+ +Total line number = 2 +It costs 0.006s +``` + +### DataNode 状态定义 +DataNode 的状态机如下图所示: + + +对 DataNode 各状态定义如下: + +- **Running**: DataNode 正常运行,可读可写 +- **Unknown**: DataNode 未正常上报心跳,ConfigNode 认为该 DataNode 不可读写 + - 少数 Unknown DataNode 不影响集群读写 +- **Removing**: DataNode 正在移出集群,不可读写 + - 少数 Removing DataNode 不影响集群读写 +- **ReadOnly**: DataNode 磁盘剩余空间低于 disk_warning_threshold(默认 5%),DataNode 可读但不能写入,不能同步数据 + - 少数 ReadOnly DataNode 不影响集群读写 + - ReadOnly DataNode 可以查询元数据和数据 + - ReadOnly DataNode 可以删除元数据和数据 + - ReadOnly DataNode 可以创建元数据,不能写入数据 + - 所有 DataNode 处于 ReadOnly 状态时,集群不能写入数据,仍可以创建 Database 和元数据 + +**对于一个 DataNode**,不同状态元数据查询、创建、删除的影响如下表所示: + +| DataNode 状态 | 可读 | 可创建 | 可删除 | +|-------------|-----|-----|-----| +| Running | 是 | 是 | 是 | +| Unknown | 否 | 否 | 否 | +| Removing | 否 | 否 | 否 | +| ReadOnly | 是 | 是 | 是 | + +**对于一个 DataNode**,不同状态数据查询、写入、删除的影响如下表所示: + +| DataNode 状态 | 可读 | 可写 | 可删除 | +|-------------|-----|-----|-----| +| Running | 是 | 是 | 是 | +| Unknown | 否 | 否 | 否 | +| Removing | 否 | 否 | 否 | +| ReadOnly | 是 | 否 | 是 | + +## 展示全部节点信息 + +当前 IoTDB 支持使用如下 SQL 展示全部节点的信息: +``` +SHOW CLUSTER +``` + +示例: +``` +IoTDB> show cluster ++------+----------+-------+---------------+------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort| ++------+----------+-------+---------------+------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| +| 1|ConfigNode|Running| 127.0.0.1| 10711| +| 2|ConfigNode|Running| 127.0.0.1| 10712| +| 3| DataNode|Running| 127.0.0.1| 10730| +| 4| DataNode|Running| 127.0.0.1| 10731| +| 5| DataNode|Running| 127.0.0.1| 10732| ++------+----------+-------+---------------+------------+ +Total line number = 6 +It costs 0.011s +``` + +在节点被关停后,它的状态也会改变,如下所示: +``` +IoTDB> show cluster ++------+----------+-------+---------------+------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort| ++------+----------+-------+---------------+------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| +| 1|ConfigNode|Unknown| 127.0.0.1| 10711| +| 2|ConfigNode|Running| 127.0.0.1| 10712| +| 3| DataNode|Running| 127.0.0.1| 10730| +| 4| DataNode|Running| 127.0.0.1| 10731| +| 5| DataNode|Running| 127.0.0.1| 10732| ++------+----------+-------+---------------+------------+ +Total line number = 6 +It costs 0.012s +``` + +展示全部节点的详细配置信息: +``` +SHOW CLUSTER DETAILS +``` + +示例: +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort|RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| 
++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 127.0.0.1| 10710| 10720| | | | | | +| 1|ConfigNode|Running| 127.0.0.1| 10711| 10721| | | | | | +| 2|ConfigNode|Running| 127.0.0.1| 10712| 10722| | | | | | +| 3| DataNode|Running| 127.0.0.1| 10730| | 127.0.0.1| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| 127.0.0.1| 10731| | 127.0.0.1| 6668| 10741| 10751| 10761| +| 5| DataNode|Running| 127.0.0.1| 10732| | 127.0.0.1| 6669| 10742| 10752| 10762| ++------+----------+-------+---------------+------------+-------------------+----------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.340s +``` + +## 展示 Region 信息 + +集群中以 SchemaRegion/DataRegion 作为元数据/数据的复制和管理单元,Region 的状态和分布对于系统运维和测试有很大帮助,如以下场景: + +- 查看集群中各个 Region 被分配到了哪些 DataNode,是否均衡 +- 查看集群中各个 Region 被分配了哪些分区,是否均衡 +- 查看集群中各个 RegionGroup 的 leader 被分配到了哪些 DataNode,是否均衡 + +当前 IoTDB 支持使用如下 SQL 展示 Region 信息: + +- `SHOW REGIONS`: 展示所有 Region 分布 +- `SHOW SCHEMA REGIONS`: 展示所有 SchemaRegion 分布 +- `SHOW DATA REGIONS`: 展示所有 DataRegion 分布 +- `SHOW (DATA|SCHEMA)? REGIONS OF DATABASE `: 展示指定数据库 对应的 Region 分布 +- `SHOW (DATA|SCHEMA)? REGIONS ON NODEID `: 展示指定节点 对应的 Region 分布 +- `SHOW (DATA|SCHEMA)? REGIONS (OF DATABASE )? (ON NODEID )?`: 展示指定数据库 在指定节点 对应的 Region 分布 + +展示所有 Region 的分布: +``` +IoTDB> show regions ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0| DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:20.011| +| 2| DataRegion|Running|root.sg2| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:20.395| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.637| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 12 +It costs 0.165s +``` +其中,SeriesSlotNum 指的是 region 内 seriesSlot 的个数。同样地,TimeSlotNum 也指 region 内 timeSlot 的个数。 + +展示 SchemaRegion 或 DataRegion 的分布: +``` +IoTDB> show data regions ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| 
++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0| DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:20.011| +| 2| DataRegion|Running|root.sg2| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:20.395| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.011s + +IoTDB> show schema regions ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.637| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.012s +``` + +展示指定数据库 对应的 Region 分布: +``` +IoTDB> show regions of database root.sg1 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+-- -----+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0| DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.007s + +IoTDB> show regions of database root.sg1, root.sg2 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| 
++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0| DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:20.011| +| 2| DataRegion|Running|root.sg2| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:20.395| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.637| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 12 +It costs 0.009s + +IoTDB> show data regions of database root.sg1, root.sg2 ++--------+----------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+----------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0|DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0|DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 0|DataRegion|Running|root.sg1| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:19.013| +| 2|DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2|DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:20.011| +| 2|DataRegion|Running|root.sg2| 1| 1| 3| 127.0.0.1| 6669|Follower|2023-03-07T17:32:20.395| ++--------+----------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.007s + +IoTDB> show schema regions of database root.sg1, root.sg2 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 3| 127.0.0.1| 6669| Leader|2023-03-07T17:32:18.398| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 3| 127.0.0.1| 
6669|Follower|2023-03-07T17:32:19.637| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 6 +It costs 0.009s +``` + +展示指定节点 对应的 Region 分布: +``` +IoTDB> show regions on nodeid 1 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 4 +It costs 0.165s + +IoTDB> show regions on nodeid 1, 2 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 0| DataRegion|Running|root.sg1| 1| 1| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:18.749| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:18.245| +| 2| DataRegion|Running|root.sg2| 1| 1| 1| 127.0.0.1| 6667| Leader|2023-03-07T17:32:19.834| +| 2| DataRegion|Running|root.sg2| 1| 1| 2| 127.0.0.1| 6668|Follower|2023-03-07T17:32:19.011| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:19.232| +| 3|SchemaRegion|Running|root.sg2| 1| 0| 2| 127.0.0.1| 6668| Leader|2023-03-07T17:32:19.450| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 8 +It costs 0.165s +``` + +展示指定数据库 在指定节点 对应的 Region 分布: +``` +IoTDB> show regions of database root.sg1 on nodeid 1 ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +|RegionId| Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort| Role| CreateTime| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +| 0| DataRegion|Running|root.sg1| 1| 1| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.520| +| 1|SchemaRegion|Running|root.sg1| 1| 0| 1| 127.0.0.1| 6667|Follower|2023-03-07T17:32:18.111| ++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+ +Total line number = 2 +It costs 0.165s + +IoTDB> show data regions of database root.sg1, root.sg2 on nodeid 1, 2 
++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+
+|RegionId|        Type| Status|Database|SeriesSlotNum|TimeSlotNum|DataNodeId|RpcAddress|RpcPort|    Role|             CreateTime|
++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+
+|       0|  DataRegion|Running|root.sg1|            1|          1|         1| 127.0.0.1|   6667|Follower|2023-03-07T17:32:18.520|
+|       0|  DataRegion|Running|root.sg1|            1|          1|         2| 127.0.0.1|   6668|  Leader|2023-03-07T17:32:18.749|
+|       2|  DataRegion|Running|root.sg2|            1|          1|         1| 127.0.0.1|   6667|  Leader|2023-03-07T17:32:19.834|
+|       2|  DataRegion|Running|root.sg2|            1|          1|         2| 127.0.0.1|   6668|Follower|2023-03-07T17:32:19.011|
++--------+------------+-------+--------+-------------+-----------+----------+----------+-------+--------+-----------------------+
+Total line number = 4
+It costs 0.165s
+```
+
+### Region 状态定义
+Region 继承所在 DataNode 的状态,对 Region 各状态定义如下:
+
+- **Running**: Region 所在 DataNode 正常运行,Region 可读可写
+- **Unknown**: Region 所在 DataNode 未正常上报心跳,ConfigNode 认为该 Region 不可读写
+- **Removing**: Region 所在 DataNode 正在被移出集群,Region 不可读写
+- **ReadOnly**: Region 所在 DataNode 的磁盘剩余空间低于 disk_warning_threshold(默认 5%),Region 可读,但不能写入,不能同步数据
+
+**单个 Region 的状态切换不会影响所属 RegionGroup 的运行**,
+在设置多副本集群时(即元数据副本数和数据副本数大于 1),
+同 RegionGroup 其它 Running 状态的 Region 能保证该 RegionGroup 的高可用性。
+
+**对于一个 RegionGroup:**
+- 当且仅当严格多于一半的 Region 处于 Running 状态时,该 RegionGroup 可进行数据的查询、写入和删除操作
+- 如果处于 Running 状态的 Region 不超过一半,该 RegionGroup 不可进行数据的查询、写入和删除操作
+
+## 展示集群槽信息
+
+集群使用分区来管理元数据和数据,分区定义如下:
+
+- **元数据分区**:SeriesSlot
+- **数据分区**:SeriesSlot 与 SeriesTimeSlot
+
+在文档[Cluster-Concept](./Cluster-Concept.md)中可以查看详细信息。
+
+可以使用以下 SQL 来查询分区对应信息:
+
+### 展示数据分区所在的 DataRegion
+
+展示某数据库或某设备的数据分区所在的 DataRegion:
+- `SHOW DATA REGIONID WHERE (DATABASE=root.xxx | DEVICE=root.xxx.xxx) (AND TIME=xxxxx)?`
+
+
+有如下几点说明:
+
+1. DEVICE 为设备名对应唯一的 SeriesSlot,TIME 为时间戳或者通用时间对应唯一的 SeriesTimeSlot。
+
+2. DATABASE 和 DEVICE 必须以 root 开头,如果是不存在的路径时返回空,不报错,下同。
+
+3. DATABASE 和 DEVICE 目前不支持通配符匹配或者批量查询,如果包含 * 或 ** 的通配符或者输入多个 DATABASE 和 DEVICE 则会报错,下同。
+
+4. 
TIME 支持时间戳和通用日期。对于时间戳,必须得大于等于0,对于通用日期,需要不早于1970-01-01 00:00:00 + + +示例: +``` +IoTDB> show data regionid where device=root.sg.m1.d1 ++--------+ +|RegionId| ++--------+ +| 1| +| 2| ++--------+ +Total line number = 2 +It costs 0.006s + +IoTDB> show data regionid where device=root.sg.m1.d1 and time=604800000 ++--------+ +|RegionId| ++--------+ +| 1| ++--------+ +Total line number = 1 +It costs 0.006s + +IoTDB> show data regionid where device=root.sg.m1.d1 and time=1970-01-08T00:00:00.000 ++--------+ +|RegionId| ++--------+ +| 1| ++--------+ +Total line number = 1 +It costs 0.006s + +IoTDB> show data regionid where database=root.sg ++--------+ +|RegionId| ++--------+ +| 1| +| 2| ++--------+ +Total line number = 2 +It costs 0.006s + +IoTDB> show data regionid where database=root.sg and time=604800000 ++--------+ +|RegionId| ++--------+ +| 1| ++--------+ +Total line number = 1 +It costs 0.006s + +IoTDB> show data regionid where database=root.sg and time=1970-01-08T00:00:00.000 ++--------+ +|RegionId| ++--------+ +| 1| ++--------+ +Total line number = 1 +It costs 0.006s +``` + +### 展示元数据分区所在的 SchemaRegion + +展示某数据库或某设备的元数据分区所在的 SchemaRegion: +- `SHOW SCHEMA REGIONID WHERE (DATABASE=root.xxx | DEVICE=root.xxx.xxx)` + + +示例: +``` +IoTDB> show schema regionid where device=root.sg.m1.d2 ++--------+ +|RegionId| ++--------+ +| 0| ++--------+ +Total line number = 1 +It costs 0.007s + +IoTDB> show schema regionid where database=root.sg ++--------+ +|RegionId| ++--------+ +| 0| ++--------+ +Total line number = 1 +It costs 0.007s +``` +### 展示数据库的序列槽 +展示某数据库内数据或元数据的序列槽(SeriesSlot): +- `SHOW (DATA|SCHEMA) SERIESSLOTID WHERE DATABASE=root.xxx` + +示例: +``` +IoTDB> show data seriesslotid where database = root.sg ++------------+ +|SeriesSlotId| ++------------+ +| 5286| ++------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> show schema seriesslotid where database = root.sg ++------------+ +|SeriesSlotId| ++------------+ +| 5286| ++------------+ +Total line number = 1 +It costs 0.006s +``` + +### 展示过滤条件下的时间分区 +展示某设备或某数据库或某dataRegion的时间分区(TimePartition): +- `SHOW TIMEPARTITION WHERE (DEVICE=root.a.b |REGIONID = r0 | DATABASE=root.xxx) (AND STARTTIME=t1)?(AND ENDTIME=t2)?` + +有如下几点说明: + +1. TimePartition 是 SeriesTimeSlotId 的简称。 + +2. REGIONID 如果为 schemaRegion 的 Id 返回空,不报错。 +3. REGIONID 不支持批量查询,如果输入多个 REGIONID 则会报错,下同。 + +4. STARTTIME 和 ENDTIME 支持时间戳和通用日期。对于时间戳,必须得大于等于0,对于通用日期,需要不早于1970-01-01 00:00:00。 + +5. 
返回结果中的 StartTime 为 TimePartition 对应时间区间的起始时间。 + +示例: +``` +IoTDB> show timePartition where device=root.sg.m1.d1 ++-------------------------------------+ +|TimePartition| StartTime| ++-------------------------------------+ +| 0|1970-01-01T00:00:00.000| ++-------------------------------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> show timePartition where regionId = 1 ++-------------------------------------+ +|TimePartition| StartTime| ++-------------------------------------+ +| 0|1970-01-01T00:00:00.000| ++-------------------------------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> show timePartition where database = root.sg ++-------------------------------------+ +|TimePartition| StartTime| ++-------------------------------------+ +| 0|1970-01-01T00:00:00.000| ++-------------------------------------+ +| 1|1970-01-08T00:00:00.000| ++-------------------------------------+ +Total line number = 2 +It costs 0.007s +``` + +### 统计过滤条件下的时间分区个数 + +统计某设备或某数据库或某dataRegion的时间分区(TimePartition): + +- `COUNT TIMEPARTITION WHERE (DEVICE=root.a.b |REGIONID = r0 | DATABASE=root.xxx) (AND STARTTIME=t1)?(AND ENDTIME=t2)?` + +``` +IoTDB> count timePartition where device=root.sg.m1.d1 ++--------------------+ +|count(timePartition)| ++--------------------+ +| 1| ++--------------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> count timePartition where regionId = 1 ++--------------------+ +|count(timePartition)| ++--------------------+ +| 1| ++--------------------+ +Total line number = 1 +It costs 0.007s + +IoTDB> count timePartition where database = root.sg ++--------------------+ +|count(timePartition)| ++--------------------+ +| 2| ++--------------------+ +Total line number = 1 +It costs 0.007s +``` + + +## 迁移 Region +以下 SQL 语句可以被用于手动迁移一个 region, 可用于负载均衡或其他目的。 +``` +MIGRATE REGION FROM TO +``` +示例: +``` +IoTDB> SHOW REGIONS ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +|RegionId| Type| Status| Database|SeriesSlotId|TimeSlotId|DataNodeId|RpcAddress|RpcPort| Role| ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 3| 127.0.0.1| 6670| Leader| +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 4| 127.0.0.1| 6681|Follower| +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 5| 127.0.0.1| 6668|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 1| 127.0.0.1| 6667|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 3| 127.0.0.1| 6670|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 7| 127.0.0.1| 6669| Leader| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 3| 127.0.0.1| 6670| Leader| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 4| 127.0.0.1| 6681|Follower| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 5| 127.0.0.1| 6668|Follower| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 1| 127.0.0.1| 6667|Follower| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 5| 127.0.0.1| 6668| Leader| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 7| 127.0.0.1| 6669|Follower| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 3| 127.0.0.1| 6670|Follower| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 4| 127.0.0.1| 6681| Leader| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 7| 127.0.0.1| 6669|Follower| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 1| 127.0.0.1| 6667| Leader| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 4| 127.0.0.1| 6681|Follower| +| 5| 
DataRegion|Running|root.test.g_0| 179| 200| 5| 127.0.0.1| 6668|Follower| ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +Total line number = 18 +It costs 0.161s + +IoTDB> MIGRATE REGION 1 FROM 3 TO 4 +Msg: The statement is executed successfully. + +IoTDB> SHOW REGIONS ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +|RegionId| Type| Status| Database|SeriesSlotId|TimeSlotId|DataNodeId|RpcAddress|RpcPort| Role| ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 3| 127.0.0.1| 6670| Leader| +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 4| 127.0.0.1| 6681|Follower| +| 0|SchemaRegion|Running|root.test.g_0| 500| 0| 5| 127.0.0.1| 6668|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 1| 127.0.0.1| 6667|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 4| 127.0.0.1| 6681|Follower| +| 1| DataRegion|Running|root.test.g_0| 183| 200| 7| 127.0.0.1| 6669| Leader| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 3| 127.0.0.1| 6670| Leader| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 4| 127.0.0.1| 6681|Follower| +| 2| DataRegion|Running|root.test.g_0| 181| 200| 5| 127.0.0.1| 6668|Follower| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 1| 127.0.0.1| 6667|Follower| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 5| 127.0.0.1| 6668| Leader| +| 3| DataRegion|Running|root.test.g_0| 180| 200| 7| 127.0.0.1| 6669|Follower| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 3| 127.0.0.1| 6670|Follower| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 4| 127.0.0.1| 6681| Leader| +| 4| DataRegion|Running|root.test.g_0| 179| 200| 7| 127.0.0.1| 6669|Follower| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 1| 127.0.0.1| 6667| Leader| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 4| 127.0.0.1| 6681|Follower| +| 5| DataRegion|Running|root.test.g_0| 179| 200| 5| 127.0.0.1| 6668|Follower| ++--------+------------+-------+-------------+------------+----------+----------+----------+-------+--------+ +Total line number = 18 +It costs 0.165s +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Setup.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Setup.md new file mode 100644 index 00000000..9e17dd98 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Cluster-Setup.md @@ -0,0 +1,436 @@ + + +# 集群安装和启动 + +## 1. 目标 + +本文档为 IoTDB 集群版(1.0.0)的安装及启动教程。 + +## 2. 前置检查 + +1. JDK>=1.8 的运行环境,并配置好 JAVA_HOME 环境变量。 +2. 设置最大文件打开数为 65535。 +3. 关闭交换内存。 +4. 首次启动ConfigNode节点时,确保已清空ConfigNode节点的data/confignode目录;首次启动DataNode节点时,确保已清空DataNode节点的data/datanode目录。 +5. 如果整个集群处在可信环境下,可以关闭机器上的防火墙选项。 +6. 在集群默认配置中,ConfigNode 会占用端口 10710 和 10720,DataNode 会占用端口 6667、10730、10740、10750 和 10760, +请确保这些端口未被占用,或者手动修改配置文件中的端口配置。 + +## 3. 安装包获取 + +你可以选择下载二进制文件(见 3.1)或从源代码编译(见 3.2)。 + +### 3.1 下载二进制文件 + +1. 打开官网[Download Page](https://iotdb.apache.org/Download/)。 +2. 下载 IoTDB 1.0.0 版本的二进制文件。 +3. 解压得到 apache-iotdb-1.0.0-all-bin 目录。 + +### 3.2 使用源码编译 + +#### 3.2.1 下载源码 + +**Git** +``` +git clone https://github.com/apache/iotdb.git +git checkout v1.0.0 +``` + +**官网下载** +1. 打开官网[Download Page](https://iotdb.apache.org/Download/)。 +2. 下载 IoTDB 1.0.0 版本的源码。 +3. 
解压得到 apache-iotdb-1.0.0 目录。 + +#### 3.2.2 编译源码 + +在 IoTDB 源码根目录下: +``` +mvn clean package -pl distribution -am -DskipTests +``` + +编译成功后,可在目录 +**distribution/target/apache-iotdb-1.0.0-SNAPSHOT-all-bin/apache-iotdb-1.0.0-SNAPSHOT-all-bin** +找到集群版本的二进制文件。 + +## 4. 安装包说明 + +打开 apache-iotdb-1.0.0-SNAPSHOT-all-bin,可见以下目录: + +| **目录** | **说明** | +|----------|-----------------------------------------------------| +| conf | 配置文件目录,包含 ConfigNode、DataNode、JMX 和 logback 等配置文件 | +| data | 数据文件目录,包含 ConfigNode 和 DataNode 的数据文件 | +| lib | 库文件目录 | +| licenses | 证书文件目录 | +| logs | 日志文件目录,包含 ConfigNode 和 DataNode 的日志文件 | +| sbin | 脚本目录,包含 ConfigNode 和 DataNode 的启停移除脚本,以及 Cli 的启动脚本等 | +| tools | 系统工具目录 | + +## 5. 集群安装配置 + +### 5.1 集群安装 + +`apache-iotdb-1.0.0-SNAPSHOT-all-bin` 包含 ConfigNode 和 DataNode, +请将安装包部署于你目标集群的所有机器上,推荐将安装包部署于所有服务器的相同目录下。 + +如果你希望先在一台服务器上尝试部署 IoTDB 集群,请参考 +[Cluster Quick Start](https://iotdb.apache.org/zh/UserGuide/Master/QuickStart/ClusterQuickStart.html)。 + +### 5.2 集群配置 + +接下来需要修改每个服务器上的配置文件,登录服务器, +并将工作路径切换至 `apache-iotdb-1.0.0-SNAPSHOT-all-bin`, +配置文件在 `./conf` 目录内。 + +对于所有部署 ConfigNode 的服务器,需要修改通用配置(见 5.2.1)和 ConfigNode 配置(见 5.2.2)。 + +对于所有部署 DataNode 的服务器,需要修改通用配置(见 5.2.1)和 DataNode 配置(见 5.2.3)。 + +#### 5.2.1 通用配置 + +打开通用配置文件 ./conf/iotdb-system.properties, +可根据 [部署推荐](https://iotdb.apache.org/zh/UserGuide/Master/Cluster/Deployment-Recommendation.html) +设置以下参数: + +| **配置项** | **说明** | **默认** | +|--------------------------------------------|----------------------------------------|-------------------------------------------------| +| cluster\_name | 节点希望加入的集群的名称 | defaultCluster | +| config\_node\_consensus\_protocol\_class | ConfigNode 使用的共识协议 | org.apache.iotdb.consensus.ratis.RatisConsensus | +| schema\_replication\_factor | 元数据副本数,DataNode 数量不应少于此数目 | 1 | +| schema\_region\_consensus\_protocol\_class | 元数据副本组的共识协议 | org.apache.iotdb.consensus.ratis.RatisConsensus | +| data\_replication\_factor | 数据副本数,DataNode 数量不应少于此数目 | 1 | +| data\_region\_consensus\_protocol\_class | 数据副本组的共识协议。注:RatisConsensus 目前不支持多数据目录 | org.apache.iotdb.consensus.iot.IoTConsensus | + +**注意:上述配置项在集群启动后即不可更改,且务必保证所有节点的通用配置完全一致,否则节点无法启动。** + +#### 5.2.2 ConfigNode 配置 + +打开 ConfigNode 配置文件 ./conf/iotdb-system.properties,根据服务器/虚拟机的 IP 地址和可用端口,设置以下参数: + +| **配置项** | **说明** | **默认** | **用法** | +|--------------------------------|--------------------------------------|-----------------|------------------------------------------------------------------------------------------------------------------------------------------------------| +| cn\_internal\_address | ConfigNode 在集群内部通讯使用的地址 | 127.0.0.1 | 设置为服务器的 IPV4 地址或域名 | +| cn\_internal\_port | ConfigNode 在集群内部通讯使用的端口 | 10710 | 设置为任意未占用端口 | +| cn\_consensus\_port | ConfigNode 副本组共识协议通信使用的端口 | 10720 | 设置为任意未占用端口 | +| cn\_target\_config\_node\_list | 节点注册加入集群时连接的 ConfigNode 的地址。注:只能配置一个 | 127.0.0.1:10710 | 对于 Seed-ConfigNode,设置为自己的 cn\_internal\_address:cn\_internal\_port;对于其它 ConfigNode,设置为另一个正在运行的 ConfigNode 的 cn\_internal\_address:cn\_internal\_port | + +**注意:上述配置项在节点启动后即不可更改,且务必保证所有端口均未被占用,否则节点无法启动。** + +#### 5.2.3 DataNode 配置 + +打开 DataNode 配置文件 ./conf/iotdb-system.properties,根据服务器/虚拟机的 IP 地址和可用端口,设置以下参数: + +| **配置项** | **说明** | **默认** | **用法** | +|-------------------------------------|---------------------------|-----------------|-----------------------------------------------------------------------------------| +| dn\_rpc\_address | 客户端 RPC 服务的地址 | 127.0.0.1 | 设置为服务器的 IPV4 地址或域名 | +| dn\_rpc\_port | 客户端 RPC 服务的端口 | 6667 | 设置为任意未占用端口 | +| 
dn\_internal\_address | DataNode 在集群内部接收控制流使用的地址 | 127.0.0.1 | 设置为服务器的 IPV4 地址或域名 | +| dn\_internal\_port | DataNode 在集群内部接收控制流使用的端口 | 10730 | 设置为任意未占用端口 | +| dn\_mpp\_data\_exchange\_port | DataNode 在集群内部接收数据流使用的端口 | 10740 | 设置为任意未占用端口 | +| dn\_data\_region\_consensus\_port | DataNode 的数据副本间共识协议通信的端口 | 10750 | 设置为任意未占用端口 | +| dn\_schema\_region\_consensus\_port | DataNode 的元数据副本间共识协议通信的端口 | 10760 | 设置为任意未占用端口 | +| dn\_target\_config\_node\_list | 集群中正在运行的 ConfigNode 地址 | 127.0.0.1:10710 | 设置为任意正在运行的 ConfigNode 的 cn\_internal\_address:cn\_internal\_port,可设置多个,用逗号(",")隔开 | + +**注意:上述配置项在节点启动后即不可更改,且务必保证所有端口均未被占用,否则节点无法启动。** + +## 6. 集群操作 + +### 6.1 启动集群 + +本小节描述如何启动包括若干 ConfigNode 和 DataNode 的集群。 +集群可以提供服务的标准是至少启动一个 ConfigNode 且启动 不小于(数据/元数据)副本个数 的 DataNode。 + +总体启动流程分为三步: + +1. 启动种子 ConfigNode +2. 增加 ConfigNode(可选) +3. 增加 DataNode + +#### 6.1.1 启动 Seed-ConfigNode + +**集群第一个启动的节点必须是 ConfigNode,第一个启动的 ConfigNode 必须遵循本小节教程。** + +第一个启动的 ConfigNode 是 Seed-ConfigNode,标志着新集群的创建。 +在启动 Seed-ConfigNode 前,请打开通用配置文件 ./conf/iotdb-system.properties,并检查如下参数: + +| **配置项** | **检查** | +|--------------------------------------------|---------------| +| cluster\_name | 已设置为期望的集群名称 | +| config\_node\_consensus\_protocol\_class | 已设置为期望的共识协议 | +| schema\_replication\_factor | 已设置为期望的元数据副本数 | +| schema\_region\_consensus\_protocol\_class | 已设置为期望的共识协议 | +| data\_replication\_factor | 已设置为期望的数据副本数 | +| data\_region\_consensus\_protocol\_class | 已设置为期望的共识协议 | + +**注意:** 请根据[部署推荐](https://iotdb.apache.org/zh/UserGuide/Master/Cluster/Deployment-Recommendation.html)配置合适的通用参数,这些参数在首次配置后即不可修改。 + +接着请打开它的配置文件 ./conf/iotdb-system.properties,并检查如下参数: + +| **配置项** | **检查** | +|--------------------------------|----------------------------------------------------------| +| cn\_internal\_address | 已设置为服务器的 IPV4 地址或域名 | +| cn\_internal\_port | 该端口未被占用 | +| cn\_consensus\_port | 该端口未被占用 | +| cn\_target\_config\_node\_list | 已设置为自己的内部通讯地址,即 cn\_internal\_address:cn\_internal\_port | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-confignode.sh + +# Linux 后台启动 +nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +ConfigNode 的其它配置参数可参考 +[ConfigNode 配置参数](https://iotdb.apache.org/zh/UserGuide/Master/Reference/ConfigNode-Config-Manual.html)。 + +#### 6.1.2 增加更多 ConfigNode(可选) + +**只要不是第一个启动的 ConfigNode 就必须遵循本小节教程。** + +可向集群添加更多 ConfigNode,以保证 ConfigNode 的高可用。常用的配置为额外增加两个 ConfigNode,使集群共有三个 ConfigNode。 + +新增的 ConfigNode 需要保证 ./conf/iotdb-common.properites 中的所有配置参数与 Seed-ConfigNode 完全一致,否则可能启动失败或产生运行时错误。 +因此,请着重检查通用配置文件中的以下参数: + +| **配置项** | **检查** | +|--------------------------------------------|------------------------| +| cluster\_name | 与 Seed-ConfigNode 保持一致 | +| config\_node\_consensus\_protocol\_class | 与 Seed-ConfigNode 保持一致 | +| schema\_replication\_factor | 与 Seed-ConfigNode 保持一致 | +| schema\_region\_consensus\_protocol\_class | 与 Seed-ConfigNode 保持一致 | +| data\_replication\_factor | 与 Seed-ConfigNode 保持一致 | +| data\_region\_consensus\_protocol\_class | 与 Seed-ConfigNode 保持一致 | + +接着请打开它的配置文件 ./conf/iotdb-system.properties,并检查以下参数: + +| **配置项** | **检查** | +|--------------------------------|--------------------------------------------------------------| +| cn\_internal\_address | 已设置为服务器的 IPV4 地址或域名 | +| cn\_internal\_port | 该端口未被占用 | +| cn\_consensus\_port | 该端口未被占用 | +| cn\_target\_config\_node\_list | 已设置为另一个正在运行的 ConfigNode 的内部通讯地址,推荐使用 Seed-ConfigNode 的内部通讯地址 | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-confignode.sh + +# Linux 后台启动 
+nohup bash ./sbin/start-confignode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-confignode.bat +``` + +ConfigNode 的其它配置参数可参考 +[ConfigNode配置参数](https://iotdb.apache.org/zh/UserGuide/Master/Reference/ConfigNode-Config-Manual.html)。 + +#### 6.1.3 增加 DataNode + +**确保集群已有正在运行的 ConfigNode 后,才能开始增加 DataNode。** + +可以向集群中添加任意个 DataNode。 +在添加新的 DataNode 前,请先打开通用配置文件 ./conf/iotdb-system.properties 并检查以下参数: + +| **配置项** | **检查** | +|--------------------------------------------|------------------------| +| cluster\_name | 与 Seed-ConfigNode 保持一致 | + +接着打开它的配置文件 ./conf/iotdb-system.properties 并检查以下参数: + +| **配置项** | **检查** | +|-------------------------------------|-----------------------------------------------------------| +| dn\_rpc\_address | 已设置为服务器的 IPV4 地址或域名 | +| dn\_rpc\_port | 该端口未被占用 | +| dn\_internal\_address | 已设置为服务器的 IPV4 地址或域名 | +| dn\_internal\_port | 该端口未被占用 | +| dn\_mpp\_data\_exchange\_port | 该端口未被占用 | +| dn\_data\_region\_consensus\_port | 该端口未被占用 | +| dn\_schema\_region\_consensus\_port | 该端口未被占用 | +| dn\_target\_config\_node\_list | 已设置为正在运行的 ConfigNode 的内部通讯地址,推荐使用 Seed-ConfigNode 的内部通讯地址 | + +检查完毕后,即可在服务器上运行启动脚本: + +``` +# Linux 前台启动 +bash ./sbin/start-datanode.sh + +# Linux 后台启动 +nohup bash ./sbin/start-datanode.sh >/dev/null 2>&1 & + +# Windows +.\sbin\start-datanode.bat +``` + +DataNode 的其它配置参数可参考 +[DataNode配置参数](https://iotdb.apache.org/zh/UserGuide/Master/Reference/DataNode-Config-Manual.html)。 + +**注意:当且仅当集群拥有不少于副本个数(max{schema\_replication\_factor, data\_replication\_factor})的 DataNode 后,集群才可以提供服务** + +### 6.2 启动 Cli + +若搭建的集群仅用于本地调试,可直接执行 ./sbin 目录下的 Cli 启动脚本: + +``` +# Linux +./sbin/start-cli.sh + +# Windows +.\sbin\start-cli.bat +``` + +若希望通过 Cli 连接生产环境的集群, +请阅读 [Cli 使用手册](https://iotdb.apache.org/zh/UserGuide/Master/QuickStart/Command-Line-Interface.html)。 + +### 6.3 验证集群 + +以在6台服务器上启动的3C3D(3个ConfigNode 和 3个DataNode)集群为例, +这里假设3个ConfigNode的IP地址依次为192.168.1.10、192.168.1.11、192.168.1.12,且3个ConfigNode启动时均使用了默认的端口10710与10720; +3个DataNode的IP地址依次为192.168.1.20、192.168.1.21、192.168.1.22,且3个DataNode启动时均使用了默认的端口6667、10730、10740、10750与10760。 + +当按照6.1步骤成功启动集群后,在 Cli 执行 `show cluster details`,看到的结果应当如下: + +``` +IoTDB> show cluster details ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|ConfigConsensusPort| RpcAddress|RpcPort|MppPort|SchemaConsensusPort|DataConsensusPort| ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +| 0|ConfigNode|Running| 192.168.1.10| 10710| 10720| | | | | | +| 2|ConfigNode|Running| 192.168.1.11| 10710| 10720| | | | | | +| 3|ConfigNode|Running| 192.168.1.12| 10710| 10720| | | | | | +| 1| DataNode|Running| 192.168.1.20| 10730| |192.168.1.20| 6667| 10740| 10750| 10760| +| 4| DataNode|Running| 192.168.1.21| 10730| |192.168.1.21| 6667| 10740| 10750| 10760| +| 5| DataNode|Running| 192.168.1.22| 10730| |192.168.1.22| 6667| 10740| 10750| 10760| ++------+----------+-------+---------------+------------+-------------------+------------+-------+-------+-------------------+-----------------+ +Total line number = 6 +It costs 0.012s +``` + +若所有节点的状态均为 **Running**,则说明集群部署成功; +否则,请阅读启动失败节点的运行日志,并检查对应的配置参数。 + +### 6.4 停止 IoTDB 进程 + +本小节描述如何手动关闭 IoTDB 的 ConfigNode 或 DataNode 进程。 + +#### 6.4.1 使用脚本停止 ConfigNode + +执行停止 ConfigNode 脚本: + +``` +# Linux +./sbin/stop-confignode.sh + +# Windows +.\sbin\stop-confignode.bat +``` + +#### 
6.4.2 使用脚本停止 DataNode + +执行停止 DataNode 脚本: + +``` +# Linux +./sbin/stop-datanode.sh + +# Windows +.\sbin\stop-datanode.bat +``` + +#### 6.4.3 停止节点进程 + +首先获取节点的进程号: + +``` +jps + +# 或 + +ps aux | grep iotdb +``` + +结束进程: + +``` +kill -9 +``` + +**注意:有些端口的信息需要 root 权限才能获取,在此情况下请使用 sudo** + +### 6.5 集群缩容 + +本小节描述如何将 ConfigNode 或 DataNode 移出集群。 + +#### 6.5.1 移除 ConfigNode + +在移除 ConfigNode 前,请确保移除后集群至少还有一个活跃的 ConfigNode。 +在活跃的 ConfigNode 上执行 remove-confignode 脚本: + +``` +# Linux +## 根据 confignode_id 移除节点 +./sbin/remove-confignode.sh + +## 根据 ConfigNode 内部通讯地址和端口移除节点 +./sbin/remove-confignode.sh : + + +# Windows +## 根据 confignode_id 移除节点 +.\sbin\remove-confignode.bat + +## 根据 ConfigNode 内部通讯地址和端口移除节点 +.\sbin\remove-confignode.bat : +``` + +#### 6.5.2 移除 DataNode + +在移除 DataNode 前,请确保移除后集群至少还有不少于(数据/元数据)副本个数的 DataNode。 +在活跃的 DataNode 上执行 remove-datanode 脚本: + +``` +# Linux +## 根据 datanode_id 移除节点 +./sbin/remove-datanode.sh + +## 根据 DataNode RPC 服务地址和端口移除节点 +./sbin/remove-datanode.sh : + + +# Windows +## 根据 datanode_id 移除节点 +.\sbin\remove-datanode.bat + +## 根据 DataNode RPC 服务地址和端口移除节点 +.\sbin\remove-datanode.bat : +``` + +## 7. 常见问题 + +请参考 [分布式部署FAQ](https://iotdb.apache.org/zh/UserGuide/Master/FAQ/FAQ-for-cluster-setup.html) \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Get-Installation-Package.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Get-Installation-Package.md new file mode 100644 index 00000000..cbd11bdf --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Cluster/Get-Installation-Package.md @@ -0,0 +1,213 @@ + + +# 安装包获取 + +IoTDB 为您提供了三种安装方式,您可以参考下面的建议,任选其中一种: + +第一种,从官网下载安装包。这是我们推荐使用的安装方式,通过该方式,您将得到一个可以立即使用的、打包好的二进制可执行文件。 + +第二种,使用源码编译。若您需要自行修改代码,可以使用该安装方式。 + +第三种,使用 Docker 镜像。dockerfile 文件位于[github](https://github.com/apache/iotdb/blob/master/docker) + +## 安装环境要求 + +安装前请保证您的电脑上配有 JDK>=1.8 的运行环境,并配置好 JAVA_HOME 环境变量。 + +如果您需要从源码进行编译,还需要安装: + +1. 
Maven >= 3.6 的运行环境,具体安装方法可以参考以下链接:[https://maven.apache.org/install.html](https://maven.apache.org/install.html)。 + +> 注: 也可以选择不安装,使用我们提供的'mvnw' 或 'mvnw.cmd' 工具。使用时请用'mvnw' 或 'mvnw.cmd'命令代替下文的'mvn'命令。 + +## 从官网下载二进制可执行文件 + +您可以从 [http://iotdb.apache.org/Download/](http://iotdb.apache.org/Download/) 上下载已经编译好的可执行程序 iotdb-xxx.zip,该压缩包包含了 IoTDB 系统运行所需的所有必要组件。 + +下载后,您可使用以下操作对 IoTDB 的压缩包进行解压: + +``` +Shell > unzip iotdb-.zip +``` + +## 使用源码编译 + +您可以获取已发布的源码 [https://iotdb.apache.org/Download/](https://iotdb.apache.org/Download/) ,或者从 [https://github.com/apache/iotdb/tree/master](https://github.com/apache/iotdb/tree/master) git 仓库获取 + +源码克隆后,进入到源码文件夹目录下。如果您想编译已经发布过的版本,可以先用`git checkout -b my_{project.version} v{project.version}`命令新建并切换分支。比如您要编译0.12.4这个版本,您可以用如下命令去切换分支: + +```shell +> git checkout -b my_0.12.4 v0.12.4 +``` + +切换分支之后就可以使用以下命令进行编译: + +``` +> mvn clean package -pl server -am -Dmaven.test.skip=true +``` + +编译后,IoTDB 服务器会在 "server/target/iotdb-server-{project.version}" 文件夹下,包含以下内容: + +``` ++- sbin/ <-- script files +| ++- conf/ <-- configuration files +| ++- lib/ <-- project dependencies +| ++- tools/ <-- system tools +``` + +如果您想要编译项目中的某个模块,您可以在源码文件夹中使用`mvn clean package -pl {module.name} -am -DskipTests`命令进行编译。如果您需要的是带依赖的 jar 包,您可以在编译命令后面加上`-P get-jar-with-dependencies`参数。比如您想编译带依赖的 jdbc jar 包,您就可以使用以下命令进行编译: + +```shell +> mvn clean package -pl jdbc -am -DskipTests -P get-jar-with-dependencies +``` + +编译完成后就可以在`{module.name}/target`目录中找到需要的包了。 + +## 通过 Docker 安装 + +Apache IoTDB 的 Docker 镜像已经上传至 [https://hub.docker.com/r/apache/iotdb](https://hub.docker.com/r/apache/iotdb)。 +Apache IoTDB 的配置项以环境变量形式添加到容器内。 + +### 简单尝试 +```shell +# 获取镜像 +docker pull apache/iotdb:1.1.0-standalone +# 创建 docker bridge 网络 +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +# 创建 docker 容器 +# 注意:必须固定IP部署。IP改变会导致 confignode 启动失败。 +docker run -d --name iotdb-service \ + --hostname iotdb-service \ + --network iotdb \ + --ip 172.18.0.6 \ + -p 6667:6667 \ + -e cn_internal_address=iotdb-service \ + -e cn_seed_config_node=iotdb-service:10710 \ + -e cn_internal_port=10710 \ + -e cn_consensus_port=10720 \ + -e dn_rpc_address=iotdb-service \ + -e dn_internal_address=iotdb-service \ + -e dn_seed_config_node=iotdb-service:10710 \ + -e dn_mpp_data_exchange_port=10740 \ + -e dn_schema_region_consensus_port=10750 \ + -e dn_data_region_consensus_port=10760 \ + -e dn_rpc_port=6667 \ + apache/iotdb:1.1.0-standalone +# 尝试使用命令行执行SQL +docker exec -ti iotdb-service /iotdb/sbin/start-cli.sh -h iotdb-service +``` +外部连接: +```shell +# <主机IP/hostname> 是物理机的真实IP或域名。如果在同一台物理机,可以是127.0.0.1。 +$IOTDB_HOME/sbin/start-cli.sh -h <主机IP/hostname> -p 6667 +``` +```yaml +# docker-compose-1c1d.yml +version: "3" +services: + iotdb-service: + image: apache/iotdb:1.1.0-standalone + hostname: iotdb-service + container_name: iotdb-service + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb-service + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-service:10710 + - dn_rpc_address=iotdb-service + - dn_internal_address=iotdb-service + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb-service:10710 + volumes: + - ./data:/iotdb/data + - ./logs:/iotdb/logs + networks: + iotdb: + ipv4_address: 172.18.0.6 + +networks: + iotdb: + external: true +``` +### 集群部署 +目前只支持 host 网络和 overlay 网络,不支持 bridge 网络。overlay 
网络参照[1C2D](https://github.com/apache/iotdb/tree/master/docker/src/main/DockerCompose/docker-compose-cluster-1c2d.yml)的写法,host 网络如下。 + +假如有三台物理机,它们的hostname分别是iotdb-1、iotdb-2、iotdb-3。依次启动。 +以 iotdb-2 节点的docker-compose文件为例: +```yaml +version: "3" +services: + iotdb-confignode: + image: apache/iotdb:1.1.0-confignode + container_name: iotdb-confignode + environment: + - cn_internal_address=iotdb-2 + - cn_seed_config_node=iotdb-1:10710 + - schema_replication_factor=3 + - cn_internal_port=10710 + - cn_consensus_port=10720 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - data_replication_factor=3 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/confignode:/iotdb/data + - ./logs/confignode:/iotdb/logs + network_mode: "host" + + iotdb-datanode: + image: apache/iotdb:1.1.0-datanode + container_name: iotdb-datanode + environment: + - dn_rpc_address=iotdb-2 + - dn_internal_address=iotdb-2 + - dn_seed_config_node=iotdb-1:10710 + - data_replication_factor=3 + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/datanode:/iotdb/data/ + - ./logs/datanode:/iotdb/logs/ + network_mode: "host" +``` +注意: +1. `dn_seed_config_node`所有节点配置一样,需要配置第一个启动的节点,这里为`iotdb-1`。 +2. 上面docker-compose文件中,`iotdb-2`需要替换为每个节点的 hostname、域名或者IP地址。 +3. 需要映射`/etc/hosts`,文件内配置了 iotdb-1、iotdb-2、iotdb-3 与IP的映射。或者可以在 docker-compose 文件中增加 `extra_hosts` 配置。 +4. 首次启动时,必须首先启动 `iotdb-1`。 +5. 
如果部署失败要重新部署集群,必须将所有节点上的IoTDB服务停止并删除,然后清除`data`和`logs`文件夹后,再启动。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/ClusterQuickStart.md b/src/zh/UserGuide/V2.0.1/Tree/stage/ClusterQuickStart.md new file mode 100644 index 00000000..0c736f23 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/ClusterQuickStart.md @@ -0,0 +1,276 @@ + + +# 快速上手(集群版) +本文将简单介绍 IoTDB 集群的安装配置、扩容和缩容等常规操作。 +遇到问题可以看: +[FAQ](../FAQ/Frequently-asked-questions.md) + +## 安装部署 + +部署集群时我们推荐优先使用`hostname`来配置集群,这样可以避免一些网络问题。需要在每个节点上分别配置`/etc/hosts` ,windows 上为`C:\Windows\System32\drivers\etc\hosts`。 + +我们将以最小的改动,启动一个含有3个 ConfigNode 和3个DataNode(3C3D)集群: +- 数据/元数据副本数为1 +- 集群名称为defaultCluster +- Confignode JVM 的最大堆内存配置为机器内存的 1/4 +- Datanode JVM 的最大堆内存配置为机器内存的 1/4 + +假设有3台物理机(下面称节点),操作系统为Linux,并且已经安装配置好了JAVA环境(具体见[单机版对安装环境说明](../QuickStart/QuickStart.md)),安装目录均为`/data/iotdb`。 +IP地址和服务角色分配如下: + +| 节点IP | 192.168.132.10 | 192.168.132.11 | 192.168.132.12 | +|--------|:---------------|:---------------|:---------------| +| `hostname` | iotdb-1 | iotdb-2 | iotdb-3 | +| 服务 | ConfigNode | ConfigNode | ConfigNode | +| 服务 | DataNode | DataNode | DataNode | + +端口占用: + +| 服务 | ConfigNode | DataNode | +|---|---|---| +| 端口 | 10710, 10720 | 6667, 10730, 10740, 10750, 10760 | + +**说明:** +- 可以使用`IP地址`或者`hostname(机器名/域名)`来安装配置 IoTDB 集群,本文以`hostname(机器名/域名)`为例。使用`hostname(机器名/域名)`,需要配置`/etc/hosts`。 +- 优先推荐使用 `hostname(机器名/域名)` 进行配置,这样可以避免一些网络问题,也更方便迁移集群。 +- JVM堆内存配置: `confignode-env.sh` 和 `datanode-env.sh` 内配置`ON_HEAP_MEMORY`, 建议设置值大于等于1G。ConfigNode 1~2G就足够了,DataNode的内存配置则要取决于数据接入的数据量和查询数据量。 + +### 下载安装包 +在每个节点,将安装包[下载](https://iotdb.apache.org/Download/)后,解压到安装目录,这里为`/data/iotdb`。 +目录结构: +```shell +/data/iotdb/ +├── conf # 配置文件 +├── lib # jar library +├── sbin # 启动/停止等脚本 +└── tools # 其他工具 +``` + +### 修改节点配置文件 + +在每个节点均配置 hosts +```shell +echo "192.168.132.10 iotdb-1" >> /etc/hosts +echo "192.168.132.11 iotdb-2" >> /etc/hosts +echo "192.168.132.12 iotdb-3" >> /etc/hosts +``` + +配置文件在 `/data/iotdb/conf`目录下。 +按照下表修改相应的配置文件: + +| 配置| 配置项 |IP:192.168.132.10 | IP:192.168.132.11 | IP:192.168.132.12 | +|------------|:-------------------------------|----------------------|----------------------|:---------------------| +| iotdb-system.properties | cn_internal_address | iotdb-1 | iotdb-2 | iotdb-3 | +| iotdb-system.properties | cn_seed_config_node | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | +| iotdb-system.properties | dn_rpc_address | iotdb-1 | iotdb-2 | iotdb-3 | +| iotdb-system.properties | dn_internal_address | iotdb-1 | iotdb-2 | iotdb-3 | +| iotdb-system.properties | dn_seed_config_node | iotdb-1:10710 | iotdb-1:10710 | iotdb-1:10710 | + +**注意:** +我们推荐所有节点的 iotdb-system.properties 和 JVM 的内存配置是一致的。 + +### 启动集群 +启动集群前,需保证配置正确,保证 IoTDB 安装目录下没有数据(`data`目录)。 +#### 启动第一个节点 +即上面表格中`cn_seed_config_node`配置的节点。 +登录该节点 `iotdb-1(192.168.132.10)`,执行下面命令: +```shell +cd /data/iotdb +# 启动 ConfigNode 和 DataNode 服务 +sbin/start-standalone.sh + +# 查看 DataNode 日志以确定启动成功 +tail -f logs/log_datanode_all.log +# 期望看见类似下方的日志 +# 2023-07-21 20:26:01,881 [main] INFO o.a.i.db.service.DataNode:192 - Congratulation, IoTDB DataNode is set up successfully. Now, enjoy yourself! 
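+
+# 补充示例(假设环境中可用 JDK 自带的 jps 命令,进程名以实际输出为准):
+# 除查看日志外,也可以通过进程列表再次确认 ConfigNode 与 DataNode 进程均已启动
+jps | grep -E "ConfigNode|DataNode"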
+``` + +如果没有看到上面所说的日志或者看到了 Exception,那么代表启动失败了。请查看 `/data/iotdb/logs` 目录内的`log_confignode_all.log` 和 `log_datanode_all.log` 日志文件。 + +**注意**: +- 要保证第一个节点启动成功后,再启动其他节点。确切的说,要先保证第一个 ConfigNode 服务启动成功,即`cn_seed_config_node`配置的节点。 +- 如果启动失败,需要[清理环境](#【附录】清理环境)后,再次启动。 +- ConfigNode 和 DataNode 服务都可以单独启动: +```shell +# 单独启动 ConfigNode, 后台启动 +sbin/start-confignode.sh -d +# 单独启动 DataNode,后台启动 +sbin/start-datanode.sh -d +``` + +#### 启动其他两个节点的 ConfigNode 和 DataNode +在节点 `iotdb-2(192.168.132.11)` 和 `iotdb-3(192.168.132.12)` 两个节点上分别执行: +```shell +cd /data/iotdb +# 启动 ConfigNode 和 DataNode 服务 +sbin/start-standalone.sh +``` +如果启动失败,需要在所有节点执行[清理环境](#【附录】清理环境)后,然后从启动第一个节点开始,再重新执行一次。 + +#### 检验集群状态 +在任意节点上,在 Cli 执行 `show cluster`: +```shell +/data/iotdb/sbin/start-cli.sh -h iotdb-1 +IoTDB>show cluster; +# 示例结果如下: ++------+----------+-------+---------------+------------+-------+---------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version|BuildInfo| ++------+----------+-------+---------------+------------+-------+---------+ +| 0|ConfigNode|Running| iotdb-1 | 10710|1.x.x | xxxxxxx| +| 1| DataNode|Running| iotdb-1 | 10730|1.x.x | xxxxxxx| +| 2|ConfigNode|Running| iotdb-2 | 10710|1.x.x | xxxxxxx| +| 3| DataNode|Running| iotdb-2 | 10730|1.x.x | xxxxxxx| +| 4|ConfigNode|Running| iotdb-3 | 10710|1.x.x | xxxxxxx| +| 5| DataNode|Running| iotdb-3 | 10730|1.x.x | xxxxxxx| ++------+----------+-------+---------------+------------+-------+---------+ +``` +**说明:** +`start-cli.sh -h` 后指定的IP地址,可以是任意一个 DataNode 的IP地址。 + + +### 【附录】清理环境 +在所有节点执行: +1. 结束 ConfigNode 和 DataNode 进程。 +```shell +# 1. 停止 ConfigNode 和 DataNode 服务 +sbin/stop-standalone.sh + +# 2. 检查是否还有进程残留 +jps +# 或者 +ps -ef|grep iotdb + +# 3. 如果有进程残留,则手动kill +kill -9 +# 如果确定机器上仅有1个iotdb,可以使用下面命令清理残留进程 +ps -ef|grep iotdb|grep -v grep|tr -s ' ' ' ' |cut -d ' ' -f2|xargs kill -9 +``` + +2. 
删除 data 和 logs 目录。 +```shell +cd /data/iotdb +rm -rf data logs +``` + +说明:删除 data 目录是必要的,删除 logs 目录是为了纯净日志,非必需。 + + +## 集群扩容 +扩容方式与上方启动其他节点相同。也就是,在要添加的节点上,下载IoTDB的安装包,解压,修改配置,然后启动。这里要添加节点的IP为 `192.168.132.13` +**注意:** +- 扩容的节点必须是干净的节点,不能有数据(也就是`data`目录) +- iotdb-system.properties中的`cluster_name`的配置必须和已有集群一致。 +- `cn_seed_config_node` 和 `dn_seed_config_node`的配置必须和已有集群一致。 +- 原有数据不会移动到新节点,新创建的元数据分区和数据分区很可能在新的节点。 + +### 修改配置 +在原节点上新增一行 hosts +```shell +echo "192.168.132.13 iotdb-4" >> /etc/hosts +``` + +在节点设置 hosts +```shell +echo "192.168.132.10 iotdb-1" >> /etc/hosts +echo "192.168.132.11 iotdb-2" >> /etc/hosts +echo "192.168.132.12 iotdb-3" >> /etc/hosts +echo "192.168.132.13 iotdb-4" >> /etc/hosts +``` +按照下表修改相应的配置文件: + +| 配置 | 配置项 | IP:192.168.132.13 | +|------------|:-------------------------------|:---------------------| +| iotdb-system.properties | cn_internal_address | iotdb-4 | +| iotdb-system.properties | cn_seed_config_node | iotdb-1:10710 | +| iotdb-system.properties | dn_rpc_address | iotdb-4 | +| iotdb-system.properties | dn_internal_address | iotdb-4 | +| iotdb-system.properties | dn_seed_config_node | iotdb-1:10710 | + +### 扩容 +在新增节点`iotdb-4(192.168.132.13)`上,执行: +```shell +cd /data/iotdb +# 启动 ConfigNode 和 DataNode 服务 +sbin/start-standalone.sh +``` + +### 验证扩容结果 +在 Cli 执行 `show cluster`,结果如下: +```shell +/data/iotdb/sbin/start-cli.sh -h iotdb-1 +IoTDB>show cluster; +# 示例结果如下: ++------+----------+-------+---------------+------------+-------+---------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version|BuildInfo| ++------+----------+-------+---------------+------------+-------+---------+ +| 0|ConfigNode|Running| iotdb-1 | 10710|1.x.x | xxxxxxx| +| 1| DataNode|Running| iotdb-1 | 10730|1.x.x | xxxxxxx| +| 2|ConfigNode|Running| iotdb-2 | 10710|1.x.x | xxxxxxx| +| 3| DataNode|Running| iotdb-2 | 10730|1.x.x | xxxxxxx| +| 4|ConfigNode|Running| iotdb-3 | 10710|1.x.x | xxxxxxx| +| 5| DataNode|Running| iotdb-3 | 10730|1.x.x | xxxxxxx| +| 6|ConfigNode|Running| iotdb-4 | 10710|1.x.x | xxxxxxx| +| 7| DataNode|Running| iotdb-4 | 10730|1.x.x | xxxxxxx| ++------+----------+-------+---------------+------------+-------+---------+ +``` + +## 集群缩容 +**注意:** +- 可以在任何一个集群内的节点上,执行缩容操作。 +- 集群内的任意节点都可以被缩容。但是存留的 DataNode 服务不能小于副本数设置。 +- 请耐心等待缩容脚本执行结束,并仔细阅读日志说明,尤其是结束前的指南说明。 + +### 缩容一个 ConfigNode +```shell +cd /data/iotdb +# 方式一:使用 ip:port 移除 +sbin/remove-confignode.sh iotdb-4:10710 + +# 方式二:使用节点编号移除, `show cluster`中的 NodeID +sbin/remove-confignode.sh 6 +``` + +### 缩容一个 DataNode +```shell +cd /data/iotdb +# 方式一:使用 ip:port 移除 +sbin/remove-datanode.sh iotdb-4:6667 + +# 方式二:使用节点编号移除, `show cluster`中的 NodeID +sbin/remove-datanode.sh 7 +``` + +### 验证缩容结果 + +在 Cli 执行 `show cluster`,结果如下: +```shell ++------+----------+-------+---------------+------------+-------+---------+ +|NodeID| NodeType| Status|InternalAddress|InternalPort|Version|BuildInfo| ++------+----------+-------+---------------+------------+-------+---------+ +| 0|ConfigNode|Running| iotdb-1 | 10710|1.x.x | xxxxxxx| +| 1| DataNode|Running| iotdb-1 | 10730|1.x.x | xxxxxxx| +| 2|ConfigNode|Running| iotdb-2 | 10710|1.x.x | xxxxxxx| +| 3| DataNode|Running| iotdb-2 | 10730|1.x.x | xxxxxxx| +| 4|ConfigNode|Running| iotdb-3 | 10710|1.x.x | xxxxxxx| +| 5| DataNode|Running| iotdb-3 | 10730|1.x.x | xxxxxxx| ++------+----------+-------+---------------+------------+-------+---------+ +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Command-Line-Interface.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Command-Line-Interface.md new file mode 100644 index 
00000000..1f29df1d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Command-Line-Interface.md @@ -0,0 +1,275 @@ + + +# SQL 命令行终端 (CLI) + +IOTDB 为用户提供 cli/Shell 工具用于启动客户端和服务端程序。下面介绍每个 cli/Shell 工具的运行方式和相关参数。 +> \$IOTDB\_HOME 表示 IoTDB 的安装目录所在路径。 + +## 安装 +如果使用源码版,可以在 iotdb 的根目录下执行 + +```shell +> mvn clean package -pl iotdb-client/cli -am -DskipTests +``` + +在生成完毕之后,IoTDB 的 Cli 工具位于文件夹"cli/target/iotdb-cli-{project.version}"中。 + +如果你下载的是二进制版,那么 Cli 可以在 sbin 文件夹下直接找到。 + +## 运行 + +### Cli 运行方式 +安装后的 IoTDB 中有一个默认用户:`root`,默认密码为`root`。用户可以使用该用户尝试运行 IoTDB 客户端以测试服务器是否正常启动。客户端启动脚本为$IOTDB_HOME/sbin 文件夹下的`start-cli`脚本。启动脚本时需要指定运行 IP 和 RPC PORT。以下为服务器在本机启动,且用户未更改运行端口号的示例,默认端口为 6667。若用户尝试连接远程服务器或更改了服务器运行的端口号,请在-h 和-p 项处使用服务器的 IP 和 RPC PORT。
+用户也可以在启动脚本的最前方设置自己的环境变量,如 JAVA_HOME 等 (对于 linux 用户,脚本路径为:"/sbin/start-cli.sh"; 对于 windows 用户,脚本路径为:"/sbin/start-cli.bat") + +Linux 系统与 MacOS 系统启动命令如下: + +```shell +Shell > bash sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root +``` +Windows 系统启动命令如下: + +```shell +Shell > sbin\start-cli.bat -h 127.0.0.1 -p 6667 -u root -pw root +``` +回车后即可成功启动客户端。启动后出现如图提示即为启动成功。 + +``` + _____ _________ ______ ______ +|_ _| | _ _ ||_ _ `.|_ _ \ + | | .--.|_/ | | \_| | | `. \ | |_) | + | | / .'`\ \ | | | | | | | __'. + _| |_| \__. | _| |_ _| |_.' /_| |__) | +|_____|'.__.' |_____| |______.'|_______/ version + +Successfully login at 127.0.0.1:6667 +``` +输入`quit`或`exit`可退出 cli 结束本次会话,cli 输出`quit normally`表示退出成功。 + +### Cli 运行参数 + +|参数名|参数类型|是否为必需参数| 说明| 例子 | +|:---|:---|:---|:---|:---| +|-disableISO8601 |没有参数 | 否 |如果设置了这个参数,IoTDB 将以数字的形式打印时间戳 (timestamp)。|-disableISO8601| +|-h <`host`> |string 类型,不需要引号|是|IoTDB 客户端连接 IoTDB 服务器的 IP 地址。|-h 10.129.187.21| +|-help|没有参数|否|打印 IoTDB 的帮助信息|-help| +|-p <`rpcPort`>|int 类型|是|IoTDB 连接服务器的端口号,IoTDB 默认运行在 6667 端口。|-p 6667| +|-pw <`password`>|string 类型,不需要引号|否|IoTDB 连接服务器所使用的密码。如果没有输入密码 IoTDB 会在 Cli 端提示输入密码。|-pw root| +|-u <`username`>|string 类型,不需要引号|是|IoTDB 连接服务器锁使用的用户名。|-u root| +|-maxPRC <`maxPrintRowCount`>|int 类型|否|设置 IoTDB 返回客户端命令行中所显示的最大行数。|-maxPRC 10| +|-e <`execute`> |string 类型|否|在不进入客户端输入模式的情况下,批量操作 IoTDB|-e "show databases"| +|-c | 空 | 否 | 如果服务器设置了 `rpc_thrift_compression_enable=true`, 则 CLI 必须使用 `-c` | -c | + +下面展示一条客户端命令,功能是连接 IP 为 10.129.187.21 的主机,端口为 6667 ,用户名为 root,密码为 root,以数字的形式打印时间戳,IoTDB 命令行显示的最大行数为 10。 + +Linux 系统与 MacOS 系统启动命令如下: + +```shell +Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 +``` +Windows 系统启动命令如下: + +```shell +Shell > sbin\start-cli.bat -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10 +``` + +### CLI 特殊命令 +下面列举了一些CLI的特殊命令。 + +| 命令 | 描述 / 例子 | +|:---|:---| +| `set time_display_type=xxx` | 例如: long, default, ISO8601, yyyy-MM-dd HH:mm:ss | +| `show time_display_type` | 显示时间显示方式 | +| `set time_zone=xxx` | 例如: +08:00, Asia/Shanghai | +| `show time_zone` | 显示CLI的时区 | +| `set fetch_size=xxx` | 设置从服务器查询数据时的读取条数 | +| `show fetch_size` | 显示读取条数的大小 | +| `set max_display_num=xxx` | 设置 CLI 一次展示的最大数据条数, 设置为-1表示无限制 | +| `help` | 获取CLI特殊命令的提示 | +| `exit/quit` | 退出CLI | + +### 使用 OpenID 作为用户名认证登录 + +OpenID Connect (OIDC) 使用 keycloack 作为 OIDC 服务权限认证服务。 + +#### 配置 +配置位于 iotdb-system.properties,设定 authorizer_provider_class 为 org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer 则开启了 openID 服务,默认情况下值为 org.apache.iotdb.commons.auth.authorizer.LocalFileAuthorizer 表示没有开启 openID 服务。 + +``` +authorizer_provider_class=org.apache.iotdb.commons.auth.authorizer.OpenIdAuthorizer +``` +如果开启了 openID 服务则 openID_url 为必填项,openID_url 值为 http://ip:port/realms/{realmsName} + +``` +openID_url=http://127.0.0.1:8080/realms/iotdb/ +``` +####keycloack 配置 + +1、下载 keycloack 程序(此教程为21.1.0版本),在 keycloack/bin 中启动 keycloack + +```shell +Shell > cd bin +Shell > ./kc.sh start-dev +``` +2、使用 https://ip:port 登陆 keycloack, 首次登陆需要创建用户 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/login_keycloak.png?raw=true) + +3、点击 Administration Console 进入管理端 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/AdministrationConsole.png?raw=true) + +4、在左侧的 Master 菜单点击 Create Realm, 输入 Realm Name 创建一个新的 Realm + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_Realm_1.jpg?raw=true) + 
+![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_Realm_2.jpg?raw=true) + +5、点击左侧菜单 Clients,创建 client + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/client.jpg?raw=true) + +6、点击左侧菜单 User,创建 user + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/user.jpg?raw=true) + +7、点击新创建的用户 id,点击 Credentials 导航输入密码和关闭 Temporary 选项,至此 keyclork 配置完成 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/pwd.jpg?raw=true) + +8、创建角色,点击左侧菜单的 Roles然后点击Create Role 按钮添加角色 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role1.jpg?raw=true) + +9、在Role Name 中输入`iotdb_admin`,点击save 按钮。提示:这里的`iotdb_admin`不能为其他名称否则即使登陆成功后也将无权限使用iotdb的查询、插入、创建 database、添加用户、角色等功能 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role2.jpg?raw=true) + +10、点击左侧的User 菜单然后点击用户列表中的用户为该用户添加我们刚创建的`iotdb_admin`角色 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role3.jpg?raw=true) + +11、选择Role Mappings ,在Assign role选择`iotdb_admin`增加角色 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role4.jpg?raw=true) + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/add_role5.jpg?raw=true) + +提示:如果用户角色有调整需要重新生成token并且重新登陆iotdb才会生效 + +以上步骤提供了一种 keycloak 登陆 iotdb 方式,更多方式请参考 keycloak 配置 + +若对应的 IoTDB 服务器开启了使用 OpenID Connect (OIDC) 作为权限认证服务,那么就不再需要使用用户名密码进行登录。 +替而代之的是使用 Token,以及空密码。 +此时,登录命令如下: + +```shell +Shell > bash sbin/start-cli.sh -h 10.129.187.21 -p 6667 -u {my-access-token} -pw "" +``` + +其中,需要将{my-access-token} (注意,包括{})替换成你的 token,即 access_token 对应的值。密码为空需要再次确认。 + +![avatar](https://alioss.timecho.com/docs/img/UserGuide/CLI/Command-Line-Interface/iotdbpw.jpeg?raw=true) + +如何获取 token 取决于你的 OIDC 设置。 最简单的一种情况是使用`password-grant`。例如,假设你在用 keycloack 作为你的 OIDC 服务, +并且你在 keycloack 中有一个被定义成 public 的`iotdb`客户的 realm,那么你可以使用如下`curl`命令获得 token。 +(注意例子中的{}和里面的内容需要替换成具体的服务器地址和 realm 名字): +```shell +curl -X POST "http://{your-keycloack-server}/realms/{your-realm}/protocol/openid-connect/token" \ -H "Content-Type: application/x-www-form-urlencoded" \ + -d "username={username}" \ + -d "password={password}" \ + -d 'grant_type=password' \ + -d "client_id=iotdb-client" +``` + +示例结果如下: + +```json 
+{"access_token":"eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJxMS1XbTBvelE1TzBtUUg4LVNKYXAyWmNONE1tdWNXd25RV0tZeFpKNG93In0.eyJleHAiOjE1OTAzOTgwNzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNjA0ZmYxMDctN2NiNy00NTRmLWIwYmQtY2M2ZDQwMjFiNGU4IiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiYWNjb3VudCIsInN1YiI6ImJhMzJlNDcxLWM3NzItNGIzMy04ZGE2LTZmZThhY2RhMDA3MyIsInR5cCI6IkJlYXJlciIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsImFjciI6IjEiLCJhbGxvd2VkLW9yaWdpbnMiOlsibG9jYWxob3N0OjgwODAiXSwicmVhbG1fYWNjZXNzIjp7InJvbGVzIjpbIm9mZmxpbmVfYWNjZXNzIiwidW1hX2F1dGhvcml6YXRpb24iLCJpb3RkYl9hZG1pbiJdfSwicmVzb3VyY2VfYWNjZXNzIjp7ImFjY291bnQiOnsicm9sZXMiOlsibWFuYWdlLWFjY291bnQiLCJtYW5hZ2UtYWNjb3VudC1saW5rcyIsInZpZXctcHJvZmlsZSJdfX0sInNjb3BlIjoiZW1haWwgcHJvZmlsZSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJ1c2VyIn0.nwbrJkWdCNjzFrTDwKNuV5h9dDMg5ytRKGOXmFIajpfsbOutJytjWTCB2WpA8E1YI3KM6gU6Jx7cd7u0oPo5syHhfCz119n_wBiDnyTZkFOAPsx0M2z20kvBLN9k36_VfuCMFUeddJjO31MeLTmxB0UKg2VkxdczmzMH3pnalhxqpnWWk3GnrRrhAf2sZog0foH4Ae3Ks0lYtYzaWK_Yo7E4Px42-gJpohy3JevOC44aJ4auzJR1RBj9LUbgcRinkBy0JLi6XXiYznSC2V485CSBHW3sseXn7pSXQADhnmGQrLfFGO5ZljmPO18eFJaimdjvgSChsrlSEmTDDsoo5Q","expires_in":300,"refresh_expires_in":1800,"refresh_token":"eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJhMzZlMGU0NC02MWNmLTQ5NmMtOGRlZi03NTkwNjQ5MzQzMjEifQ.eyJleHAiOjE1OTAzOTk1NzEsImlhdCI6MTU5MDM5Nzc3MSwianRpIjoiNmMxNTBiY2EtYmE5NC00NTgxLWEwODEtYjI2YzhhMmI5YmZmIiwiaXNzIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwiYXVkIjoiaHR0cDovL2F1dGguZGVtby5wcmFnbWF0aWNpbmR1c3RyaWVzLmRlL2F1dGgvcmVhbG1zL0lvVERCIiwic3ViIjoiYmEzMmU0NzEtYzc3Mi00YjMzLThkYTYtNmZlOGFjZGEwMDczIiwidHlwIjoiUmVmcmVzaCIsImF6cCI6ImlvdGRiIiwic2Vzc2lvbl9zdGF0ZSI6IjA2MGQyODYyLTE0ZWQtNDJmZS1iYWY3LThkMWY3ODQ2NTdmMSIsInNjb3BlIjoiZW1haWwgcHJvZmlsZSJ9.ayNpXdNX28qahodX1zowrMGiUCw2AodlHBQFqr8Ui7c","token_type":"bearer","not-before-policy":0,"session_state":"060d2862-14ed-42fe-baf7-8d1f784657f1","scope":"email profile"} +``` + +### Cli 的批量操作 +当您想要通过脚本的方式通过 Cli / Shell 对 IoTDB 进行批量操作时,可以使用-e 参数。通过使用该参数,您可以在不进入客户端输入模式的情况下操作 IoTDB。 + +为了避免 SQL 语句和其他参数混淆,现在只支持-e 参数作为最后的参数使用。 + +针对 cli/Shell 工具的-e 参数用法如下: + +Linux 系统与 MacOS 指令: + +```shell +Shell > bash sbin/start-cli.sh -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} +``` + +Windows 系统指令 +```shell +Shell > sbin\start-cli.bat -h {host} -p {rpcPort} -u {user} -pw {password} -e {sql for iotdb} +``` + +在 Windows 环境下,-e 参数的 SQL 语句需要使用` `` `对于`" "`进行替换 + +为了更好的解释-e 参数的使用,可以参考下面在 Linux 上执行的例子。 + +假设用户希望对一个新启动的 IoTDB 进行如下操作: + +1. 创建名为 root.demo 的 database + +2. 创建名为 root.demo.s1 的时间序列 + +3. 向创建的时间序列中插入三个数据点 + +4. 
查询验证数据是否插入成功 + +那么通过使用 cli/Shell 工具的 -e 参数,可以采用如下的脚本: + +```shell +# !/bin/bash + +host=127.0.0.1 +rpcPort=6667 +user=root +pass=root + +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "CREATE DATABASE root.demo" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "create timeseries root.demo.s1 WITH DATATYPE=INT32, ENCODING=RLE" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(1,10)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(2,11)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "insert into root.demo(timestamp,s1) values(3,12)" +bash ./sbin/start-cli.sh -h ${host} -p ${rpcPort} -u ${user} -pw ${pass} -e "select s1 from root.demo" +``` + +打印出来的结果显示如下,通过这种方式进行的操作与客户端的输入模式以及通过 JDBC 进行操作结果是一致的。 + +```shell + Shell > bash ./shell.sh ++-----------------------------+------------+ +| Time|root.demo.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 10| +|1970-01-01T08:00:00.002+08:00| 11| +|1970-01-01T08:00:00.003+08:00| 12| ++-----------------------------+------------+ +Total line number = 3 +It costs 0.267s +``` + +需要特别注意的是,在脚本中使用 -e 参数时要对特殊字符进行转义。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Import-Export-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Import-Export-Tool.md new file mode 100644 index 00000000..34d68309 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Import-Export-Tool.md @@ -0,0 +1,278 @@ + + +# 数据导入导出脚本 + +IoTDB 提供了数据导入导出脚本(tools/export-data、tools/import-data,V1.3.2 及之后版本支持;历史版本可使用 tools/export-csv、tools/import-csv 脚本,使用参考[文档](./TsFile-Import-Export-Tool.md)),用于实现 IoTDB 内部数据与外部文件的交互,适用于单个文件或目录文件批量操作。 + +## 支持的数据格式 + +- **CSV**:纯文本格式,存储格式化数据,需按照下文指定 CSV 格式进行构造 +- **SQL**:包含自定义 SQL 语句的文件 + +## export-data 脚本(数据导出) + +### 运行命令 + +```Bash +# Unix/OS X +>tools/export-data.sh -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] + +# Windows +>tools\export-data.bat -h -p -u -pw -t [-tf -datatype -q -s -tfn -lpf -type -aligned ] +``` + +参数介绍: + +| 参数 | 定义 | 是否必填 | 默认 | +| :-------- |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------| :----------------------- | +| -h | 数据库IP地址 | 否 | 127.0.0.1 | +| -p | 数据库端口 | 否 | 6667 | +| -u | 数据库连接用户名 | 否 | root | +| -pw | 数据库连接密码 | 否 | root | +| -t | 导出的 CSV 或 SQL 文件的输出路径(V1.3.2版本参数是`-td`) | 是 | | +| -datatype | 是否在 CSV 文件的 header 中时间序列的后面打印出对应的数据类型,选项为 true 或者 false | 否 | true | +| -q | 在命令中直接指定想要执行的查询语句(目前仅支持部分语句,详细明细见下表)
说明:-q 与 -s 参数必填其一,同时填写则 -q 生效。详细支持的 SQL 语句示例,请参考下方“SQL语句支持规则” | 否 |  |
+| -s | 指定 SQL 文件,该文件可包含一条或多条 SQL 语句。如果包含多条 SQL 语句,语句之间应该用换行(回车)进行分割。每一条 SQL 语句对应一个或多个输出的CSV或 SQL 文件
说明:-q 与 -s 参数必填其一,同时填写则 -q 生效。详细支持的 SQL 语句示例,请参考下方“SQL语句支持规则” | 否 |  |
+| -type | 指定导出的文件类型,选项为 csv 或者 sql | 否 | csv |
+| -tf | 指定时间格式。时间格式必须遵守[ISO 8601](https://calendars.wikia.org/wiki/ISO_8601)标准,或时间戳(`timestamp`)
说明:只在 -type 为 csv 时生效 | 否 | yyyy-MM-dd HH:mm:ss.SSSz |
+| -lpf | 指定导出的 dump 文件最大行数(V1.3.2版本参数是`-linesPerFile`) | 否 | 10000 |
+| -timeout | 指定 session 查询时的超时时间,单位为ms | 否 | -1 |
+
+SQL 语句支持规则:
+
+1. 只支持查询语句,非查询语句(如:元数据管理、系统管理等语句)不支持。对于不支持的 SQL ,程序会自动跳过,同时输出错误信息。
+2. 查询语句中目前版本仅支持原始数据的导出,如果有使用 group by、聚合函数、udf、操作运算符等则不支持导出为 SQL。原始数据导出时请注意,若导出多个设备数据,请使用 align by device 语句。详细示例如下:
+
+| | 支持导出 | 示例 |
+|-------------------------------| -------- | --------------------------------------------- |
+| 原始数据单设备查询 | 支持 | select * from root.s_0.d_0 |
+| 原始数据多设备查询(align by device) | 支持 | select * from root.** align by device |
+| 原始数据多设备查询(无 align by device) | 不支持 | select * from root.**
select * from root.s_0.* | + +### 运行示例 + +- 导出某 SQL 执行范围下的所有数据至 CSV 文件。 + ```Bash + # Unix/OS X + >tools/export-data.sh -t ./data/ -q 'select * from root.stock.**' + # Windows + >tools/export-data.bat -t ./data/ -q 'select * from root.stock.**' + ``` + +- 导出结果 + ```Bash + Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice + 2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 + 2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 + ``` +- 导出 SQL 文件内所有 SQL 执行范围下的所有数据至 CSV 文件。 + ```Bash + # Unix/OS X + >tools/export-data.sh -t ./data/ -s export.sql + # Windows + >tools/export-data.bat -t ./data/ -s export.sql + ``` + +- export.sql 文件内容(-s 参数指向的文件) + ```SQL + select * from root.stock.** limit 100 + select * from root.db.** limit 100 + ``` + +- 导出结果文件1 + ```Bash + Time,root.stock.Legacy.0700HK.L1_BidPrice,root.stock.Legacy.0700HK.Type,root.stock.Legacy.0700HK.L1_BidSize,root.stock.Legacy.0700HK.Domain,root.stock.Legacy.0700HK.L1_BuyNo,root.stock.Legacy.0700HK.L1_AskPrice + 2024-07-29T18:37:18.700+08:00,0.9666617,3.0,0.021367407654674264,-6.0,false,0.8926191 + 2024-07-29T18:37:19.701+08:00,0.3057328,3.0,0.9965377284981661,-5.0,false,0.15167356 + ``` + +- 导出结果文件2 + ```Bash + Time,root.db.Random.RandomBoolean + 2024-07-22T17:16:05.820+08:00,true + 2024-07-22T17:16:02.597+08:00,false + ``` +- 将 IoTDB 数据库中在 SQL 文件内定义的数据,以对齐的格式将其导出为 SQL 语句。 + ```Bash + # Unix/OS X + >tools/export-data.sh -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true + # Windows + >tools/export-data.bat -h 127.0.0.1 -p 6667 -u root -p root -t ./data/ -s export.sql -type sql -aligned true + ``` + +- 导出结果 + ```Bash + INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249629831,0.62308747,2.0,0.012206747854849653,-6.0,false,0.14164352); + INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249630834,0.7520042,3.0,0.22760657101910464,-5.0,true,0.089064896); + INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) ALIGNED VALUES (1722249631835,0.3981064,3.0,0.6254559288663467,-6.0,false,0.9767922); + ``` +- 将某 SQL 执行范围下的所有数据导出至 CSV 文件,指定导出的时间格式为`yyyy-MM-dd HH:mm:ss`,且表头时间序列的后面打印出对应的数据类型。 + ```Bash + # Unix/OS X + >tools/export-data.sh -t ./data/ -tf 'yyyy-MM-dd HH:mm:ss' -datatype true -q "select * from root.stock.**" -type csv + # Windows + >tools/export-data.bat -t ./data/ -tf 'yyyy-MM-dd HH:mm:ss' -datatype true -q "select * from root.stock.**" -type csv + ``` + +- 导出结果 + ```Bash + Time,root.stock.Legacy.0700HK.L1_BidPrice(DOUBLE),root.stock.Legacy.0700HK.Type(DOUBLE),root.stock.Legacy.0700HK.L1_BidSize(DOUBLE),root.stock.Legacy.0700HK.Domain(DOUBLE),root.stock.Legacy.0700HK.L1_BuyNo(BOOLEAN),root.stock.Legacy.0700HK.L1_AskPrice(DOUBLE) + 2024-07-30 10:33:55,0.44574088,3.0,0.21476832811611501,-4.0,true,0.5951748 + 2024-07-30 10:33:56,0.6880933,3.0,0.6289119476165305,-5.0,false,0.114634395 + ``` + +## import-data 脚本(数据导入) + +### 导入文件示例 + +#### CSV 文件示例 + +注意,在导入 CSV 数据前,需要特殊处理下列的字符: + +1. 如果 Text 类型的字段中包含特殊字符如`,`需要使用`\`来进行转义。 +2. 可以导入像`yyyy-MM-dd'T'HH:mm:ss`, `yyy-MM-dd HH:mm:ss`, 或者 `yyyy-MM-dd'T'HH:mm:ss.SSSZ`格式的时间。 +3. 
时间列`Time`应该始终放在第一列。 + +示例一:时间对齐,并且 header 中不包含数据类型的数据。 + +```SQL +Time,root.test.t1.str,root.test.t2.str,root.test.t2.var +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,"123",, +``` + +示例二:时间对齐,并且 header 中包含数据类型的数据(Text 类型数据支持加双引号和不加双引号) + +```SQL +Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.var(INT32) +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,123,hello world,123 +1970-01-01T08:00:00.003+08:00,"123",, +1970-01-01T08:00:00.004+08:00,123,,12 +``` + +示例三:设备对齐,并且 header 中不包含数据类型的数据 + +```SQL +Time,Device,str,var +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +``` + +示例四:设备对齐,并且 header 中包含数据类型的数据(Text 类型数据支持加双引号和不加双引号) + +```SQL +Time,Device,str(TEXT),var(INT32) +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123 +``` + +#### SQL 文件示例 + +> 对于不支持的 SQL ,不合法的 SQL ,执行失败的 SQL 都会放到失败目录下的失败文件里(默认为 文件名.failed) + +```SQL +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728578812,0.21911979,4.0,0.7129878488375604,-5.0,false,0.65362453); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728579812,0.35814416,3.0,0.04674720094979623,-5.0,false,0.9365247); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728580813,0.20012152,3.0,0.9910098187911393,-4.0,true,0.70040536); +INSERT INTO root.stock.Legacy.0700HK(TIMESTAMP,L1_BidPrice,Type,L1_BidSize,Domain,L1_BuyNo,L1_AskPrice) VALUES (1721728581814,0.034122765,4.0,0.9313345284181858,-4.0,true,0.9945297); +``` + +### 运行命令 + +```Bash +# Unix/OS X +>tools/import-data.sh -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] + +# Windows +>tools\import-data.bat -h -p -u -pw -s [-fd <./failedDirectory> -aligned -batch -tp -typeInfer -lpf ] +``` + +> 虽然 IoTDB 具有类型推断的能力,但我们仍然推荐在导入数据前创建元数据,因为这可以避免不必要的类型转换错误。如下: + +```SQL +CREATE DATABASE root.fit.d1; +CREATE DATABASE root.fit.d2; +CREATE DATABASE root.fit.p; +CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; +CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; +``` + +参数介绍: + +| 参数 | 定义 | 是否必填 | 默认 | +|:-------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| :------- | :------------------------ | +| -h | 数据库 IP 地址 | 否 | 127.0.0.1 | +| -p | 数据库端口 | 否 | 6667 | +| -u | 数据库连接用户名 | 否 | root | +| -pw | 数据库连接密码 | 否 | root | +| -s | 指定想要导入的数据,这里可以指定文件或者文件夹。如果指定的是文件夹,将会把文件夹中所有的后缀为 csv 或者 sql 的文件进行批量导入(V1.3.2版本参数是`-f`) | 是 | | +| 
-fd | 指定存放失败 SQL 文件的目录,如果未指定这个参数,失败的文件将会被保存到源数据的目录中。
说明:对于不支持的 SQL ,不合法的 SQL ,执行失败的 SQL 都会放到失败目录下的失败文件里(默认为 文件名.failed) | 否 | 源文件名加上`.failed`后缀 | +| -aligned | 指定是否使用`aligned`接口,选项为 true 或者 false
说明:这个参数只在导入文件为csv文件时生效 | 否 | false |
+| -batch | 用于指定每一批插入的数据的点数(最小值为1,最大值为 `Integer.MAX_VALUE`)。如果程序报了`org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`这个错的话,就可以适当的调低这个参数。 | 否 | `100000` |
+| -tp | 指定时间精度,可选值包括`ms`(毫秒),`ns`(纳秒),`us`(微秒) | 否 | `ms` |
+| -lpf | 指定每个导入失败文件写入数据的行数(V1.3.2版本参数是`-linesPerFailedFile`) | 否 | 10000 |
+| -typeInfer | 用于指定类型推断规则,如
说明:用于指定类型推断规则.`srcTsDataType` 包括 `boolean`,`int`,`long`,`float`,`double`,`NaN`.`dstTsDataType` 包括 `boolean`,`int`,`long`,`float`,`double`,`text`.当`srcTsDataType`为`boolean`, `dstTsDataType`只能为`boolean`或`text`.当`srcTsDataType`为`NaN`, `dstTsDataType`只能为`float`, `double`或`text`.当`srcTsDataType`为数值类型, `dstTsDataType`的精度需要高于`srcTsDataType`.例如:`-typeInfer boolean=text,float=double` | 否 | | + +### 运行示例 + +- 导入当前`data`目录下的`dump0_0.sql`数据到本机 IoTDB 数据库中。 + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/dump0_0.sql +# Windows +>tools/import-data.bat -s ./data/dump0_0.sql +``` + +- 将当前`data`目录下的所有数据以对齐的方式导入到本机 IoTDB 数据库中。 + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/ -fd ./failed/ -aligned true +# Windows +>tools/import-data.bat -s ./data/ -fd ./failed/ -aligned true +``` + +- 导入当前`data`目录下的`dump0_0.csv`数据到本机 IoTDB 数据库中。 + +```Bash +# Unix/OS X +>tools/import-data.sh -s ./data/dump0_0.csv -fd ./failed/ +# Windows +>tools/import-data.bat -s ./data/dump0_0.csv -fd ./failed/ +``` + +- 将当前`data`目录下的`dump0_0.csv`数据以对齐的方式,一批导入100000条导入到`192.168.100.1`IP所在主机的 IoTDB 数据库中,失败的记录记在当前`failed`目录下,每个文件最多记1000条。 + +```Bash +# Unix/OS X +>tools/import-data.sh -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 +# Windows +>tools/import-data.bat -h 192.168.100.1 -p 6667 -u root -pw root -s ./data/dump0_0.csv -fd ./failed/ -aligned true -batch 100000 -tp ms -typeInfer boolean=text,float=double -lpf 1000 +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Modeling/DataRegion.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Modeling/DataRegion.md new file mode 100644 index 00000000..dfec5c17 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Modeling/DataRegion.md @@ -0,0 +1,55 @@ + + +# Data Region + +## 背景 + +Database 由用户显示指定,使用语句"CREATE DATABASE"来指定 database,每一个 database 有多个对应的 data region + +为了确保最终一致性,每一个 data region 有一个数据插入锁(排它锁)来同步每一次插入操作。 +所以服务端数据写入的并行度为 data region的数量。 + +## 问题 + +从背景中可知,IoTDB数据写入的并行度为 max(客户端数量,服务端数据写入的并行度),也就是max(客户端数量,data region 数量) + +在生产实践中,存储组的概念往往与特定真实世界实体相关(例如工厂,地点,国家等)。 +因此存储组的数量可能会比较小,这会导致IoTDB写入并行度不足。即使我们开再多的客户端写入线程,也无法走出这种困境。 + +## 解决方案 + +我们的方案是将一个存储组下的设备分为若干个设备组(称为 data region),将同步粒度从存储组级别改为 data region 粒度。 + +更具体的,我们使用哈希将设备分到不同的 data region 下,例如: +对于一个名为"root.sg.d"的设备(假设其存储组为"root.sg"),它属于的 data region 为"root.sg.[hash("root.sg.d") mod num_of_data_region]" + +## 使用方法 + +通过改变如下配置来设置每一个 database 下 data region 的数量: + +``` +data_region_num +``` + +推荐值为[data region number] = [CPU core number] / [user-defined database number] + +参考[配置手册](../Reference/DataNode-Config-Manual.md)以获取更多信息。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Modeling/SchemaRegion-rocksdb.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Modeling/SchemaRegion-rocksdb.md new file mode 100644 index 00000000..404c0d49 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Data-Modeling/SchemaRegion-rocksdb.md @@ -0,0 +1,105 @@ + + +# SchemaRegion RocksDB (基于rocksDB的元数据存储方式) + +## 背景 + +在IoTDB服务启动时,通过加载日志文件`mlog.bin`组织元数据信息,并将结果长期持有在内存中;随着元数据的不断增长,内存会持续上涨;为支持海量元数据场景下,内存在可控范围内波动,我们提供了基于rocksDB的元数据存储方式。 + +## 使用 + +首先使用下面的命令将 `schema-engine-rocksdb` 打包 + +```shell +mvn clean package -pl schema-engine-rocksdb -am -DskipTests +``` + +命令运行结束后,在其 target/schema-engine-rocksdb 中会有一个 lib 文件夹和 conf 文件夹。将 conf 文件夹下的文件拷贝到 server 的 conf 文件夹中,将 lib 文件夹下的文件也拷贝到 +server 的 lib 的文件夹中。 + 
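+下面给出一个拷贝上述 conf 与 lib 文件的命令示意(假设在 IoTDB 源码根目录下执行了上述打包命令,且 server 的安装目录为 /path/to/iotdb-server,路径仅为示意,请按实际部署环境调整):
+
+```shell
+# 将 rocksdb 元数据引擎所需的配置文件与依赖包拷贝到 server 对应目录(路径仅为示意)
+cp schema-engine-rocksdb/target/schema-engine-rocksdb/conf/* /path/to/iotdb-server/conf/
+cp schema-engine-rocksdb/target/schema-engine-rocksdb/lib/*  /path/to/iotdb-server/lib/
+```
+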
+在系统配置文件`iotdb-system.properties`中,将配置项`schema_engine_mode`修改为`Rocksdb_based`, 如: + +``` +#################### +### Schema Engine Configuration +#################### +# Choose the mode of schema engine. The value could be Memory,PBTree and Rocksdb_based. If the provided value doesn't match any pre-defined value, Memory mode will be used as default. +# Datatype: string +schema_engine_mode=Rocksdb_based +``` + +当指定rocksdb作为元数据的存储方式时,我们开放了rocksdb相关的配置参数,您可以通过修改配置文件`schema-rocksdb.properties`,根据自己的需求,进行合理的参数调整,例如查询的缓存等。如没有特殊需求,使用默认值即可。 + +## 功能支持说明 + +该模块仍在不断完善中,部分功能暂不支持,功能模块支持情况如下: + +| 功能 | 支持情况 | +| :-----| ----: | +| 时间序列增删 | 支持 | +| 路径通配符(* 及 **)查询 | 支持 | +| tag增删 | 支持 | +| 对齐时间序列 | 支持 | +| 节点名称(*)通配 | 不支持 | +| 元数据模板template | 不支持 | +| tag索引 | 不支持 | +| continuous query | 不支持 | + + +## 附: 所有接口支持情况 + +外部接口,即客户端可以感知到,相关sql不支持; + +内部接口,即服务内部的其他模块调用逻辑,与外部sql无直接依赖关系; + +| 接口名称 | 接口类型 | 支持情况 | 说明 | +| :-----| ----: | :----: | :----: | +| createTimeseries | 外部接口 | 支持 | | +| createAlignedTimeSeries | 外部接口 | 支持 | | +| showTimeseries | 外部接口 | 部分支持 | 不支持LATEST | +| changeAlias | 外部接口 | 支持 | | +| upsertTagsAndAttributes | 外部接口 | 支持 | | +| addAttributes | 外部接口 | 支持 | | +| addTags | 外部接口 | 支持 | | +| dropTagsOrAttributes | 外部接口 | 支持 | | +| setTagsOrAttributesValue | 外部接口 | 支持 | | +| renameTagOrAttributeKey | 外部接口 | 支持 | | +| *template | 外部接口 | 不支持 | | +| *trigger | 外部接口 | 不支持 | | +| deleteSchemaRegion | 内部接口 | 支持 | | +| autoCreateDeviceMNode | 内部接口 | 不支持 | | +| isPathExist | 内部接口 | 支持 | | +| getAllTimeseriesCount | 内部接口 | 支持 | | +| getDevicesNum | 内部接口 | 支持 | | +| getNodesCountInGivenLevel | 内部接口 | 有条件支持 | 路径不支持通配 | +| getMeasurementCountGroupByLevel | 内部接口 | 支持 | | +| getNodesListInGivenLevel | 内部接口 | 有条件支持 | 路径不支持通配 | +| getChildNodePathInNextLevel | 内部接口 | 有条件支持 | 路径不支持通配 | +| getChildNodeNameInNextLevel | 内部接口 | 有条件支持 | 路径不支持通配 | +| getBelongedDevices | 内部接口 | 支持 | | +| getMatchedDevices | 内部接口 | 支持 | | +| getMeasurementPaths | 内部接口 | 支持 | | +| getMeasurementPathsWithAlias | 内部接口 | 支持 | | +| getAllMeasurementByDevicePath | 内部接口 | 支持 | | +| getDeviceNode | 内部接口 | 支持 | | +| getMeasurementMNodes | 内部接口 | 支持 | | +| getSeriesSchemasAndReadLockDevice | 内部接口 | 支持 | | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Deadband-Process.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Deadband-Process.md new file mode 100644 index 00000000..cac24d28 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Deadband-Process.md @@ -0,0 +1,108 @@ + + +# 死区处理 + +## 旋转门压缩 + +旋转门压缩(SDT)算法是一种死区处理算法。SDT 的计算复杂度较低,并使用线性趋势来表示大量数据。 + +在 IoTDB 中,SDT 在刷新到磁盘时会压缩并丢弃数据。 + +IoTDB 允许您在创建时间序列时指定 SDT 的属性,并支持以下三个属性: + +* CompDev (Compression Deviation,压缩偏差) + +CompDev 是 SDT 中最重要的参数,它表示当前样本与当前线性趋势之间的最大差值。CompDev 设置的值需要大于 0。 + +* CompMinTime (Compression Minimum Time Interval,最小压缩时间间隔) + +CompMinTime 是测量两个存储的数据点之间的时间距离的参数,用于减少噪声。 +如果当前点和最后存储的点之间的时间间隔小于或等于其值,则无论压缩偏差如何,都不会存储当前点。 +默认值为 0,单位为毫秒。 + +* CompMaxTime (Compression Maximum Time Interval,最大压缩时间间隔) + +CompMaxTime 是测量两个存储的数据点之间的时间距离的参数。 +如果当前点和最后一个存储点之间的时间间隔大于或等于其值, +无论压缩偏差如何,都将存储当前点。 +默认值为 9,223,372,036,854,775,807,单位为毫秒。 + +支持的数据类型: + +* INT32(整型) +* INT64(长整型) +* FLOAT(单精度浮点数) +* DOUBLE(双精度浮点数) + +SDT 的指定语法详见本文 [SQL 参考文档](../Reference/SQL-Reference.md)。 + +以下是使用 SDT 压缩的示例。 + +``` +IoTDB> CREATE TIMESERIES root.sg1.d0.s0 WITH DATATYPE=INT32,ENCODING=PLAIN,DEADBAND=SDT,COMPDEV=2 +``` + +刷入磁盘和 SDT 压缩之前,结果如下所示: + +``` +IoTDB> SELECT s0 FROM root.sg1.d0 ++-----------------------------+--------------+ +| Time|root.sg1.d0.s0| 
++-----------------------------+--------------+ +|2017-11-01T00:06:00.001+08:00| 1| +|2017-11-01T00:06:00.002+08:00| 1| +|2017-11-01T00:06:00.003+08:00| 1| +|2017-11-01T00:06:00.004+08:00| 1| +|2017-11-01T00:06:00.005+08:00| 1| +|2017-11-01T00:06:00.006+08:00| 1| +|2017-11-01T00:06:00.007+08:00| 1| +|2017-11-01T00:06:00.015+08:00| 10| +|2017-11-01T00:06:00.016+08:00| 20| +|2017-11-01T00:06:00.017+08:00| 1| +|2017-11-01T00:06:00.018+08:00| 30| ++-----------------------------+--------------+ +Total line number = 11 +It costs 0.008s +``` + +刷入磁盘和 SDT 压缩之后,结果如下所示: +``` +IoTDB> FLUSH +IoTDB> SELECT s0 FROM root.sg1.d0 ++-----------------------------+--------------+ +| Time|root.sg1.d0.s0| ++-----------------------------+--------------+ +|2017-11-01T00:06:00.001+08:00| 1| +|2017-11-01T00:06:00.007+08:00| 1| +|2017-11-01T00:06:00.015+08:00| 10| +|2017-11-01T00:06:00.016+08:00| 20| +|2017-11-01T00:06:00.017+08:00| 1| ++-----------------------------+--------------+ +Total line number = 5 +It costs 0.044s +``` + +SDT 在刷新到磁盘时进行压缩。 SDT 算法始终存储第一个点,并且不存储最后一个点。 + +时间范围在 [2017-11-01T00:06:00.001, 2017-11-01T00:06:00.007] 的数据在压缩偏差内,因此被压缩和丢弃。 +之所以存储时间为 2017-11-01T00:06:00.007 的数据点,是因为下一个数据点 2017-11-01T00:06:00.015 的值超过压缩偏差。 +当一个数据点超过压缩偏差时,SDT 将存储上一个读取的数据点,并重新计算上下压缩边界。作为最后一个数据点,不存储时间 2017-11-01T00:06:00.018。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data.md new file mode 100644 index 00000000..672b9885 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data.md @@ -0,0 +1,160 @@ + + +# 删除数据 + +用户使用 [DELETE 语句](../Reference/SQL-Reference.md) 可以删除指定的时间序列中符合时间删除条件的数据。在删除数据时,用户可以选择需要删除的一个或多个时间序列、时间序列的前缀、时间序列带、*路径对某一个时间区间内的数据进行删除。 + +在 JAVA 编程环境中,您可以使用 JDBC API 单条或批量执行 DELETE 语句。 + +## 单传感器时间序列值删除 + +以测控 ln 集团为例,存在这样的使用场景: + +wf02 子站的 wt02 设备在 2017-11-01 16:26:00 之前的供电状态出现多段错误,且无法分析其正确数据,错误数据影响了与其他设备的关联分析。此时,需要将此时间段前的数据删除。进行此操作的 SQL 语句为: + +```sql +delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00; +``` + +如果我们仅仅想要删除 2017 年内的在 2017-11-01 16:26:00 之前的数据,可以使用以下 SQL: +```sql +delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +``` + +IoTDB 支持删除一个时间序列任何一个时间范围内的所有时序点,用户可以使用以下 SQL 语句指定需要删除的时间范围: +```sql +delete from root.ln.wf02.wt02.status where time < 10 +delete from root.ln.wf02.wt02.status where time <= 10 +delete from root.ln.wf02.wt02.status where time < 20 and time > 10 +delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10 +delete from root.ln.wf02.wt02.status where time > 20 +delete from root.ln.wf02.wt02.status where time >= 20 +delete from root.ln.wf02.wt02.status where time = 20 +``` + +需要注意,当前的删除语句不支持 where 子句后的时间范围为多个由 OR 连接成的时间区间。如下删除语句将会解析出错: +``` +delete from root.ln.wf02.wt02.status where time > 4 or time < 0 +Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic +expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND' +``` + +如果 delete 语句中未指定 where 子句,则会删除时间序列中的所有数据。 +```sql +delete from root.ln.wf02.wt02.status +``` + +## 多传感器时间序列值删除 + +当 ln 集团 wf02 子站的 wt02 设备在 2017-11-01 16:26:00 之前的供电状态和设备硬件版本都需要删除,此时可以使用含义更广的 [路径模式(Path Pattern)](../Basic-Concept/Data-Model-and-Terminology.md) 进行删除操作,进行此操作的 SQL 语句为: + + +```sql +delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00; +``` + +需要注意的是,当删除的路径不存在时,IoTDB 不会提示路径不存在,而是显示执行成功,因为 SQL 是一种声明式的编程方式,除非是语法错误、权限不足等,否则都不认为是错误,如下所示。 + +```sql +IoTDB> delete from root.ln.wf03.wt02.status where time < now() +Msg: The statement is 
executed successfully. +``` + +## 删除时间分区 (实验性功能) +您可以通过如下语句来删除某一个 database 下的指定时间分区: + +```sql +DELETE PARTITION root.ln 0,1,2 +``` + +上例中的 0,1,2 为待删除时间分区的 id,您可以通过查看 IoTDB 的数据文件夹找到它,或者可以通过计算`timestamp / partitionInterval`(向下取整), +手动地将一个时间戳转换为对应的 id,其中的`partitionInterval`可以在 IoTDB 的配置文件中找到(如果您使用的版本支持时间分区)。 + +请注意该功能目前只是实验性的,如果您不是开发者,使用时请务必谨慎。 + +## 数据存活时间(TTL) + +IoTDB 支持对 database 级别设置数据存活时间(TTL),这使得 IoTDB 可以定期、自动地删除一定时间之前的数据。合理使用 TTL +可以帮助您控制 IoTDB 占用的总磁盘空间以避免出现磁盘写满等异常。并且,随着文件数量的增多,查询性能往往随之下降, +内存占用也会有所提高。及时地删除一些较老的文件有助于使查询性能维持在一个较高的水平和减少内存资源的占用。 + +TTL的默认单位为毫秒,如果配置文件中的时间精度修改为其他单位,设置ttl时仍然使用毫秒单位。 + +### 设置 TTL + +设置 TTL 的 SQL 语句如下所示: +``` +IoTDB> set ttl to root.ln 3600000 +``` +这个例子表示在`root.ln`数据库中,只有3600000毫秒,即最近一个小时的数据将会保存,旧数据会被移除或不可见。 +``` +IoTDB> set ttl to root.sgcc.** 3600000 +``` +支持给某一路径下的 database 设置TTL,这个例子表示`root.sgcc`路径下的所有 database 设置TTL。 +``` +IoTDB> set ttl to root.** 3600000 +``` +表示给所有 database 设置TTL。 + +### 取消 TTL + +取消 TTL 的 SQL 语句如下所示: + +``` +IoTDB> unset ttl to root.ln +``` + +取消设置 TTL 后, database `root.ln`中所有的数据都会被保存。 +``` +IoTDB> unset ttl to root.sgcc.** +``` + +取消设置`root.sgcc`路径下的所有 database 的 TTL 。 +``` +IoTDB> unset ttl to root.** +``` + +取消设置所有 database 的 TTL 。 + +### 显示 TTL + +显示 TTL 的 SQL 语句如下所示: + +``` +IoTDB> SHOW ALL TTL +IoTDB> SHOW TTL ON StorageGroupNames +``` + +SHOW ALL TTL 这个例子会给出所有 database 的 TTL。 +SHOW TTL ON root.ln,root.sgcc,root.DB 这个例子会显示指定的三个 database 的 TTL。 +注意:没有设置 TTL 的 database 的 TTL 将显示为 null。 + +``` +IoTDB> show all ttl ++-------------+-------+ +| database|ttl(ms)| ++-------------+-------+ +| root.ln|3600000| +| root.sgcc| null| +| root.DB|3600000| ++-------------+-------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data/Delete-Data.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data/Delete-Data.md new file mode 100644 index 00000000..9a708308 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data/Delete-Data.md @@ -0,0 +1,92 @@ + + +# 删除数据 + +用户使用 [DELETE 语句](../Reference/SQL-Reference.md) 可以删除指定的时间序列中符合时间删除条件的数据。在删除数据时,用户可以选择需要删除的一个或多个时间序列、时间序列的前缀、时间序列带、*路径对某一个时间区间内的数据进行删除。 + +在 JAVA 编程环境中,您可以使用 JDBC API 单条或批量执行 DELETE 语句。 + +## 单传感器时间序列值删除 + +以测控 ln 集团为例,存在这样的使用场景: + +wf02 子站的 wt02 设备在 2017-11-01 16:26:00 之前的供电状态出现多段错误,且无法分析其正确数据,错误数据影响了与其他设备的关联分析。此时,需要将此时间段前的数据删除。进行此操作的 SQL 语句为: + +```sql +delete from root.ln.wf02.wt02.status where time<=2017-11-01T16:26:00; +``` + +如果我们仅仅想要删除 2017 年内的在 2017-11-01 16:26:00 之前的数据,可以使用以下 SQL: +```sql +delete from root.ln.wf02.wt02.status where time>=2017-01-01T00:00:00 and time<=2017-11-01T16:26:00; +``` + +IoTDB 支持删除一个时间序列任何一个时间范围内的所有时序点,用户可以使用以下 SQL 语句指定需要删除的时间范围: +```sql +delete from root.ln.wf02.wt02.status where time < 10 +delete from root.ln.wf02.wt02.status where time <= 10 +delete from root.ln.wf02.wt02.status where time < 20 and time > 10 +delete from root.ln.wf02.wt02.status where time <= 20 and time >= 10 +delete from root.ln.wf02.wt02.status where time > 20 +delete from root.ln.wf02.wt02.status where time >= 20 +delete from root.ln.wf02.wt02.status where time = 20 +``` + +需要注意,当前的删除语句不支持 where 子句后的时间范围为多个由 OR 连接成的时间区间。如下删除语句将会解析出错: +``` +delete from root.ln.wf02.wt02.status where time > 4 or time < 0 +Msg: 303: Check metadata error: For delete statement, where clause can only contain atomic +expressions like : time > XXX, time <= XXX, or two atomic expressions connected by 'AND' +``` + +如果 delete 语句中未指定 where 子句,则会删除时间序列中的所有数据。 +```sql +delete from root.ln.wf02.wt02.status +``` + +## 多传感器时间序列值删除 + +当 ln 集团 wf02 子站的 wt02 设备在 
2017-11-01 16:26:00 之前的供电状态和设备硬件版本都需要删除,此时可以使用含义更广的 [路径模式(Path Pattern)](../Basic-Concept/Data-Model-and-Terminology.md) 进行删除操作,进行此操作的 SQL 语句为: + + +```sql +delete from root.ln.wf02.wt02.* where time <= 2017-11-01T16:26:00; +``` + +需要注意的是,当删除的路径不存在时,IoTDB 不会提示路径不存在,而是显示执行成功,因为 SQL 是一种声明式的编程方式,除非是语法错误、权限不足等,否则都不认为是错误,如下所示。 + +```sql +IoTDB> delete from root.ln.wf03.wt02.status where time < now() +Msg: The statement is executed successfully. +``` + +## 删除时间分区 (实验性功能) +您可以通过如下语句来删除某一个 database 下的指定时间分区: + +```sql +DELETE PARTITION root.ln 0,1,2 +``` + +上例中的 0,1,2 为待删除时间分区的 id,您可以通过查看 IoTDB 的数据文件夹找到它,或者可以通过计算`timestamp / partitionInterval`(向下取整), +手动地将一个时间戳转换为对应的 id,其中的`partitionInterval`可以在 IoTDB 的配置文件中找到(如果您使用的版本支持时间分区)。 + +请注意该功能目前只是实验性的,如果您不是开发者,使用时请务必谨慎。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data/TTL.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data/TTL.md new file mode 100644 index 00000000..3350b147 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Delete-Data/TTL.md @@ -0,0 +1,130 @@ + + +# 数据存活时间(TTL) + +IoTDB 支持对 device 级别设置数据存活时间(TTL),这使得 IoTDB 可以定期、自动地删除一定时间之前的数据。合理使用 TTL +可以帮助您控制 IoTDB 占用的总磁盘空间以避免出现磁盘写满等异常。并且,随着文件数量的增多,查询性能往往随之下降, +内存占用也会有所提高。及时地删除一些较老的文件有助于使查询性能维持在一个较高的水平和减少内存资源的占用。 + +TTL的默认单位为毫秒,如果配置文件中的时间精度修改为其他单位,设置ttl时仍然使用毫秒单位。 + +当设置 TTL 时,系统会根据设置的路径寻找所包含的所有 device,并为这些 device 设置 TTL 时间,系统会按设备粒度对过期数据进行删除。 +当设备数据过期后,将不能被查询到,但磁盘文件中的数据不能保证立即删除(会在一定时间内删除),但可以保证最终被删除。 +考虑到操作代价,系统不会立即物理删除超过 TTL 的数据,而是通过合并来延迟地物理删除。因此,在数据被物理删除前,如果调小或者解除 TTL,可能会导致之前因 TTL 而不可见的数据重新出现。 +系统中仅能设置至多 1000 条 TTL 规则,达到该上限时,需要先删除部分 TTL 规则才能设置新的规则 + +## TTL Path 规则 +设置的路径 path 只支持前缀路径(即路径中间不能带 \* , 且必须以 \*\* 结尾),该路径会匹配到设备,也允许用户指定不带星的 path 为具体的 database 或 device,当 path 不带 \* 时,会检查是否匹配到 database,若匹配到 database,则会同时设置 path 和 path.\*\*。 +注意:设备 TTL 设置不会对元数据的存在性进行校验,即允许对一条不存在的设备设置 TTL。 +``` +合格的 path: +root.** +root.db.** +root.db.group1.** +root.db +root.db.group1.d1 + +不合格的 path: +root.*.db +root.**.db.* +root.db.* +``` +## TTL 适用规则 +当一个设备适用多条TTL规则时,优先适用较精确和较长的规则。例如对于设备“root.bj.hd.dist001.turbine001”来说,规则“root.bj.hd.dist001.turbine001”比“root.bj.hd.dist001.\*\*”优先,而规则“root.bj.hd.dist001.\*\*”比“root.bj.hd.\*\*”优先; +## 设置 TTL +set ttl 操作可以理解为设置一条 TTL规则,比如 set ttl to root.sg.group1.\*\* 就相当于对所有可以匹配到该路径模式的设备挂载 ttl。 unset ttl 操作表示对相应路径模式卸载 TTL,若不存在对应 TTL,则不做任何事。若想把 TTL 调成无限大,则可以使用 INF 关键字 +设置 TTL 的 SQL 语句如下所示: +``` +set ttl to pathPattern 360000; +``` +pathPattern 是前缀路径,即路径中间不能带 \* 且必须以 \*\* 结尾。 +pathPattern 匹配对应的设备。为了兼容老版本 SQL 语法,允许用户输入的 pathPattern 匹配到 db,则自动将前缀路径扩展为 path.\*\*。 +例如,写set ttl to root.sg 360000 则会自动转化为set ttl to root.sg.\*\* 360000,转化后的语句对所有 root.sg 下的 device 设置TTL。 +但若写的 pathPattern 无法匹配到 db,则上述逻辑不会生效。 +如写set ttl to root.sg.group 360000 ,由于root.sg.group未匹配到 db,则不会被扩充为root.sg.group.\*\*。 也允许指定具体 device,不带 \*。 +## 取消 TTL + +取消 TTL 的 SQL 语句如下所示: + +``` +IoTDB> unset ttl from root.ln +``` + +取消设置 TTL 后, `root.ln` 路径下所有的数据都会被保存。 +``` +IoTDB> unset ttl from root.sgcc.** +``` + +取消设置`root.sgcc`路径下的所有的 TTL 。 +``` +IoTDB> unset ttl from root.** +``` + +取消设置所有的 TTL 。 + +新语法 +``` +IoTDB> unset ttl from root.** +``` + +旧语法 +``` +IoTDB> unset ttl to root.** +``` +新旧语法在功能上没有区别并且同时兼容,仅是新语法在用词上更符合常规。 +## 显示 TTL + +显示 TTL 的 SQL 语句如下所示: +show all ttl + +``` +IoTDB> SHOW ALL TTL ++--------------+--------+ +| path| TTL| +| root.**|55555555| +| root.sg2.a.**|44440000| ++--------------+--------+ +``` + +show ttl on pathPattern +``` +IoTDB> SHOW TTL ON root.db.**; ++--------------+--------+ +| path| TTL| +| root.db.**|55555555| +| root.db.a.**|44440000| ++--------------+--------+ +``` +SHOW ALL TTL 
这个例子会给出所有的 TTL。 +SHOW TTL ON pathPattern 这个例子会显示指定路径的 TTL。 + +显示设备的 TTL。 +``` +IoTDB> show devices ++---------------+---------+---------+ +| Device|IsAligned| TTL| ++---------------+---------+---------+ +|root.sg.device1| false| 36000000| +|root.sg.device2| true| INF| ++---------------+---------+---------+ +``` +所有设备都一定会有 TTL,即不可能是 null。INF 表示无穷大。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Deployment-Preparation.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Deployment-Preparation.md new file mode 100644 index 00000000..6c3a412f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Deployment-Preparation.md @@ -0,0 +1,40 @@ + + +## 环境要求 + +要使用IoTDB,你需要具备以下条件: + +* Java >= 1.8 +> 1.8, 11到17都是经过验证的,推荐使用17。请确保环境路径已被相应设置。 + +* Maven >= 3.6 +> 如果你想从源代码编译和安装IoTDB。 +* 建议为每个节点配置 hostname。 +* 设置最大打开文件数为65535,以避免出现 "太多的打开文件 "的错误。 +* (可选)将somaxconn设置为65535,以避免系统在高负载时出现 "连接重置 "错误。 + + +> **# Linux**
`sudo sysctl -w net.core.somaxconn=65535`
**# FreeBSD 或 Darwin**
`sudo sysctl -w kern.ipc.somaxconn=65535` + +## 安装包获取 + +企业版安装包可经由商务获取。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Deployment-Recommendation.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Deployment-Recommendation.md new file mode 100644 index 00000000..6710c79f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Deployment-Recommendation.md @@ -0,0 +1,178 @@ + + +# IoTDB 部署推荐 +## 背景 + +系统能力 +- 性能需求:系统读写速度,压缩比 +- 扩展性:系统能够用多节点管理数据,本质上是数据是否可分区管理 +- 高可用:系统能够容忍节点失效,本质上是数据是否有副本 +- 一致性:当数据有多副本时,不同副本是否一致,本质上用户是否能将数据库当做单机看待 + +缩写 +- C:ConfigNode +- D:DataNode +- aCbD:a 个 ConfigNode 和 b 个 DataNode + +## 部署模式选型 + +| 模式 | 性能 | 扩展性 | 高可用 | 一致性 | +|:------------:|:----|:----|:----|:----| +| 轻量单机模式 | 最高 | 无 | 无 | 高 | +| 可扩展单节点模式 (默认) | 高 | 高 | 中 | 高 | +| 高性能分布式模式 | 高 | 高 | 高 | 中 | +| 强一致分布式模式 | 中 | 高 | 高 | 高 | + + +| 配置 | 轻量单机模式 | 可扩展单节点模式 | 高性能分布式模式 | 强一致分布式模式 | +|:------------------------------------------------------:|:-------|:---------|:---------|:---------| +| ConfigNode 个数 | 1 | ≥1 (奇数) | ≥1 (奇数) | ≥1(奇数) | +| DataNode 个数 | 1 | ≥1 | ≥3 | ≥3 | +| 元数据副本 schema_replication_factor | 1 | 1 | 3 | 3 | +| 数据副本 data_replication_factor | 1 | 1 | 2 | 3 | +| ConfigNode 协议 config_node_consensus_protocol_class | Simple | Ratis | Ratis | Ratis | +| SchemaRegion 协议 schema_region_consensus_protocol_class | Simple | Ratis | Ratis | Ratis | +| DataRegion 协议 data_region_consensus_protocol_class | Simple | IoT | IoT | Ratis | + + +## 部署配置推荐 + +### 从 0.13 版本升级到 1.0 + +场景: +已在 0.13 版本存储了部分数据,希望迁移到 1.0 版本,并且与 0.13 表现保持一致。 + +可选方案: +1. 升级到 1C1D 单机版,ConfigNode 分配 2G 内存,DataNode 与 0.13 一致。 +2. 升级到 3C3D 高性能分布式,ConfigNode 分配 2G 内存,DataNode 与 0.13 一致。 + +配置修改: +1.0 配置参数修改: +- 数据目录不要指向0.13原有数据目录 +- region_group_extension_strategy=COSTOM +- data_region_group_per_database + - 如果是 3C3D 高性能分布式:则改为:集群 CPU 总核数/ 数据副本数 + - 如果是 1C1D,则改为:等于 0.13 的 virtual_storage_group_num 即可 ("database"一词 与 0.13 中的 "sg" 同义) + +数据迁移: +配置修改完成后,通过 load-tsfile 工具将 0.13 的 TsFile 都加载进 1.0 的 IoTDB 中,即可使用。 + +### 直接使用 1.0 + +**推荐用户仅设置 1 个 Database** + +#### 内存设置方法 +##### 根据活跃序列数估计内存 + +集群 DataNode 总堆内内存(GB) = 活跃序列数/100000 * 数据副本数 + +每个 DataNode 堆内内存(GB)= 集群DataNode总堆内内存 / DataNode 个数 + +> 假设需要用3C3D管理100万条序列,数据采用3副本,则: +> - 集群 DataNode 总堆内内存(GB):1,000,000 / 100,000 * 3 = 30G +> - 每台 DataNode 的堆内内存配置为:30 / 3 = 10G + +##### 根据总序列数估计内存 + +集群 DataNode 总堆内内存 (B) = 20 * (180 + 2 * 序列的全路径的平均字符数)* 序列总量 * 元数据副本数 + +每个 DataNode 内存配置推荐:集群 DataNode 总堆内内存 / DataNode 数目 + +> 假设需要用3C3D管理100万条序列,元数据采用3副本,序列名形如 root.sg_1.d_10.s_100(约20字符),则: +> - 集群 DataNode 总堆内内存:20 * (180 + 2 * 20)* 1,000,000 * 3 = 13.2 GB +> - 每台 DataNode 的堆内内存配置为:13.2 GB / 3 = 4.4 GB + +#### 磁盘估计 + +IoTDB 存储空间=数据存储空间 + 元数据存储空间 + 临时存储空间 + +##### 数据磁盘空间 + +序列数量 * 采样频率 * 每个数据点大小 * 存储时长 * 副本数 / 10 倍压缩比 + +| 数据类型 \ 数据点大小 | 时间戳(字节) | 值(字节) | 总共(字节) | +|:-------------------------:|:--------|:------|:-------| +| 开关量(Boolean) | 8 | 1 | 9 | +| 整型(INT32)/ 单精度浮点数(FLOAT) | 8 | 4 | 12 | +| 长整型(INT64)/ 双精度浮点数(DOUBLE) | 8 | 8 | 16 | +| 字符串(TEXT) | 8 | 假设为 a | 8+a | + + +> 示例:1000设备,每个设备100 测点,共 100000 序列。整型。采样频率1Hz(每秒一次),存储1年,3副本,压缩比按 10 算,则数据存储空间占用: +> * 简版:1000 * 100 * 12 * 86400 * 365 * 3 / 10 = 11T +> * 完整版:1000设备 * 100测点 * 12字节每数据点 * 86400秒每天 * 365天每年 * 3副本 / 10压缩比 = 11T + +##### 元数据磁盘空间 + +每条序列在磁盘日志文件中大约占用 序列字符数 + 20 字节。 +若序列有tag描述信息,则仍需加上约 tag 总字符数字节的空间。 + +##### 临时磁盘空间 + +临时磁盘空间 = 写前日志 + 共识协议 + 合并临时空间 + +1. 
写前日志 + +最大写前日志空间占用 = memtable 总内存占用 ÷ 最小有效信息占比 +- memtable 总内存占用和 datanode_memory_proportion、storage_engine_memory_proportion、write_memory_proportion 三个参数有关 +- 最小有效信息占比由 wal_min_effective_info_ratio 决定 + +> 示例:为 IoTDB 分配 16G 内存,配置文件如下 +> datanode_memory_proportion=3:3:1:1:1:1 +> storage_engine_memory_proportion=8:2 +> write_memory_proportion=19:1 +> wal_min_effective_info_ratio=0.1 +> 最大写前日志空间占用 = 16 * (3 / 10) * (8 / 10) * (19 / 20) ÷ 0.1 = 36.48G + +2. 共识协议 + +Ratis共识协议 +采用Ratis共识协议的情况下,需要额外磁盘空间存储Raft Log。Raft Log会在每一次状态机 Snapshot 之后删除,因此可以通过调整 trigger_snapshot_threshold 参数控制Raft Log最大空间占用。 + +每一个 Region Raft Log占用最大空间 = 平均请求大小 * trigger_snapshot_threshold。 +集群中一个Region总的Raft Log占用空间和Raft数据副本数成正比。 + +> 示例:DataRegion, 平均每一次插入20k数据,data_region_trigger_snapshot_threshold = 400,000,那么Raft Log最大占用 = 20K * 400,000 = 8G。 +Raft Log会从0增长到8G,接着在snapshot之后重新变成0。平均占用为4G。 +当副本数为3时,集群中这个DataRegion总Raft Log最大占用 3 * 8G = 24G。 + +此外,可以通过data_region_ratis_log_max_size规定每一个DataRegion的Raft Log磁盘占用最大值, +默认为20G,能够保障运行全程中单DataRegion Raft Log总大小不超过20G。 + +3. 合并临时空间 + + - 空间内合并 + 临时文件的磁盘空间 = 源文件大小总和 + + > 示例:10个源文件,每个文件大小为100M + > 临时文件的磁盘空间 = 10 * 100 = 1000M + + + - 跨空间合并 + 跨空间合并的临时文件大小与源文件大小和顺乱序数据的重叠度有关,当乱序数据与顺序数据有相同的时间戳时,就认为有重叠。 + 乱序数据的重叠度 = 重叠的乱序数据量 / 总的乱序数据量 + + 临时文件的磁盘空间 = 源顺序文件总大小 + 源乱序文件总大小 *(1 - 重叠度) + > 示例:10个顺序文件,10个乱序文件,每个顺序文件100M,每个乱序文件50M,每个乱序文件里有一半的数据与顺序文件有相同的时间戳 + > 乱序数据的重叠度 = 25M/50M * 100% = 50% + > 临时文件的磁盘空间 = 10 * 100 + 10 * 50 * 50% = 1250M + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Docker-Install.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Docker-Install.md new file mode 100644 index 00000000..c1a3be16 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Docker-Install.md @@ -0,0 +1,181 @@ + + +# docker部署 + +Apache IoTDB 的 Docker 镜像已经上传至 [https://hub.docker.com/r/apache/iotdb](https://hub.docker.com/r/apache/iotdb)。 +Apache IoTDB 的配置项以环境变量形式添加到容器内。 + +## docker镜像安装(单机版) + +```shell +# 获取镜像 +docker pull apache/iotdb:1.3.0-standalone +# 创建 docker bridge 网络 +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +# 创建 docker 容器 +# 注意:必须固定IP部署。IP改变会导致 confignode 启动失败。 +docker run -d --name iotdb-service \ + --hostname iotdb-service \ + --network iotdb \ + --ip 172.18.0.6 \ + -p 6667:6667 \ + -e cn_internal_address=iotdb-service \ + -e cn_seed_config_node=iotdb-service:10710 \ + -e cn_internal_port=10710 \ + -e cn_consensus_port=10720 \ + -e dn_rpc_address=iotdb-service \ + -e dn_internal_address=iotdb-service \ + -e dn_seed_config_node=iotdb-service:10710 \ + -e dn_mpp_data_exchange_port=10740 \ + -e dn_schema_region_consensus_port=10750 \ + -e dn_data_region_consensus_port=10760 \ + -e dn_rpc_port=6667 \ + apache/iotdb:1.3.0-standalone +# 尝试使用命令行执行SQL +docker exec -ti iotdb-service /iotdb/sbin/start-cli.sh -h iotdb-service +``` + +外部连接: + +```shell +# <主机IP/hostname> 是物理机的真实IP或域名。如果在同一台物理机,可以是127.0.0.1。 +$IOTDB_HOME/sbin/start-cli.sh -h <主机IP/hostname> -p 6667 +``` + +```yaml +# docker-compose-1c1d.yml +version: "3" +services: + iotdb-service: + image: apache/iotdb:1.3.0-standalone + hostname: iotdb-service + container_name: iotdb-service + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb-service + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-service:10710 + - dn_rpc_address=iotdb-service + - dn_internal_address=iotdb-service + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - 
dn_seed_config_node=iotdb-service:10710 + volumes: + - ./data:/iotdb/data + - ./logs:/iotdb/logs + networks: + iotdb: + ipv4_address: 172.18.0.6 + +networks: + iotdb: + external: true +``` +如果需要修改内存配置,需要将 IoTDB 的配置文件夹 conf 映射出来。 +1. 在 volumes 配置内增加一个映射 `./iotdb-conf:/iotdb/conf` 启动 docker 容器。执行后,iotdb-conf 目录下有了 IoTDB 的所有配置。 +2. 修改目录 iotdb-conf 下的 confignode-env.sh 和 datanode-env.sh 内的相关配置。再次启动 docker 容器。 + +## docker镜像安装(集群版) + +目前只支持 host 网络和 overlay 网络,不支持 bridge 网络。overlay 网络参照[1C2D](https://github.com/apache/iotdb/tree/master/docker/src/main/DockerCompose/docker-compose-cluster-1c2d.yml)的写法,host 网络如下。 + +假如有三台物理机,它们的hostname分别是iotdb-1、iotdb-2、iotdb-3。依次启动。 +以 iotdb-2 节点的docker-compose文件为例: + +```yaml +version: "3" +services: + iotdb-confignode: + image: apache/iotdb:1.3.0-confignode + container_name: iotdb-confignode + environment: + - cn_internal_address=iotdb-2 + - cn_seed_config_node=iotdb-1:10710 + - schema_replication_factor=3 + - cn_internal_port=10710 + - cn_consensus_port=10720 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - data_replication_factor=3 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/confignode:/iotdb/data + - ./logs/confignode:/iotdb/logs + network_mode: "host" + + iotdb-datanode: + image: apache/iotdb:1.3.0-datanode + container_name: iotdb-datanode + environment: + - dn_rpc_address=iotdb-2 + - dn_internal_address=iotdb-2 + - dn_seed_config_node=iotdb-1:10710 + - data_replication_factor=3 + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/datanode:/iotdb/data/ + - ./logs/datanode:/iotdb/logs/ + network_mode: "host" +``` + +注意: + +1. `cn_seed_config_node`和`dn_seed_config_node`所有节点配置一样,需要配置第一个启动的节点,这里为`iotdb-1`。 +2. 上面docker-compose文件中,`iotdb-2`需要替换为每个节点的 hostname、域名或者IP地址。 +3. 需要映射`/etc/hosts`,文件内配置了 iotdb-1、iotdb-2、iotdb-3 与IP的映射。或者可以在 docker-compose 文件中增加 `extra_hosts` 配置。 +4. 首次启动时,必须首先启动 `iotdb-1`。 +5. 如果部署失败要重新部署集群,必须将所有节点上的IoTDB服务停止并删除,然后清除`data`和`logs`文件夹后,再启动。 + +## 配置 +IoTDB 的配置文件,都在安装目录的conf目录下。 +IoTDB 本身配置都可以在 docker-compose 文件的 environment 中进行配置。 +如果对日志和内存进行了自定义配置,那么需要将`conf`目录映射出来。 + +### 修改日志级别 +日志配置文件为 logback-confignode.xml 和 logback-datanode.xml,可以根据需要进行精细配置。 + +### 修改内存配置 +内存配置文件为 confignode-env.sh 和 datanode-env.sh。堆内存 ON_HEAP_MEMORY, 堆外内存 OFF_HEAP_MEMORY。例如:`ON_HEAP_MEMORY=8G, OFF_HEAP_MEMORY=2G` + +## 升级 +1. 获取新的镜像 +2. 修改 docker-compose 文件的 image +3. 使用 docker stop 和 docker rm 命令,停止运行的 docker 容器 +4. 启动 IoTDB: `docker-compose -f docker-compose-standalone.yml up -d` + +## 设置开机自启动 +1. 修改 docker-compose 文件,每个docker 容器配置:`restart: always` +2. 
将 docker 服务设置为开机自启动 +以 CentOS 操作系统为例: `systemctl enable docker` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Edge-Cloud-Collaboration/Sync-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Edge-Cloud-Collaboration/Sync-Tool.md new file mode 100644 index 00000000..1ead9f51 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Edge-Cloud-Collaboration/Sync-Tool.md @@ -0,0 +1,362 @@ + + +# TsFile 同步 + +## 1.介绍 + +同步工具是持续将边缘端(发送端) IoTDB 中的时间序列数据上传并加载至云端(接收端) IoTDB 的套件工具。 + +IoTDB 同步工具内嵌于 IoTDB 引擎,与下游接收端相连,下游支持 IoTDB(单机/集群)。 + +可以在发送端使用 SQL 命令来启动或者关闭一个同步任务,并且可以随时查看同步任务的状态。在接收端,您可以通过设置 IP 白名单来规定准入 IP 地址范围。 + +## 2.模型定义 + +![pipe2.png](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Sync-Tool/pipe2.png?raw=true) + +TsFile 同步工具实现了数据从 "流入-> IoTDB ->流出" 的闭环。假设目前有两台机器A和B都安装了IoTDB,希望将 A 上的数据不断同步至 B 中。为了更好地描述这个过程,我们引入以下概念。 + +- Pipe + - 指一次同步任务,在上述案例中,我们可以看作在 A 和 B 之间有一根数据流管道连接了 A 和 B。 + - 一个正常运行的 Pipe 有两种状态:RUNNING 表示正在向接收端同步数据,STOP 表示暂停向接收端同步数据。 +- PipeSink + - 指接收端,在上述案例中,PipeSink 即是 B 这台机器。PipeSink 的类型目前仅支持 IoTDB,即接收端为 B 上安装的 IoTDB 实例。 + +## 3.注意事项 + +- 同步工具的发送端目前仅支持 IoTDB 1.0 版本**单数据副本配置**,接收端支持 IoTDB 1.0 版本任意配置。 +- 当有一个或多个发送端指向一个接收端时,这些发送端和接收端各自的设备路径集合之间应当没有交集,否则可能产生不可预料错误 + - 例如:当发送端A包括路径`root.sg.d.s`,发送端B也包括路径`root.sg.d.s`,当发送端A删除`root.sg` database 时将也会在接收端删除所有B在接收端的`root.sg.d.s`中存放的数据。 +- 两个“端”之间目前不支持相互同步。 +- 同步工具仅同步数据写入,若接收端未创建 database,自动创建与发送端同级 database。当前版本删除操作不保证被同步,不支持 TTL 的设置、Trigger、CQ 等其他操作的同步。 + - 若在发送端设置了 TTL,则启动 Pipe 时候 IoTDB 中所有未过期的数据以及未来所有的数据写入和删除都会被同步至接收端 +- 对同步任务进行操作时,需保证 `SHOW DATANODES` 中所有处于 Running 状态的 DataNode 节点均可连通,否则将执行失败。 + +## 4.快速上手 + +在发送端和接收端执行如下语句即可快速开始两个 IoTDB 之间的数据同步,完整的 SQL 语句和配置事项请查看`配置参数`和`SQL`两节,更多使用范例请参考`使用范例`节。 + +- 启动发送端 IoTDB 与接收端 IoTDB +- 创建接收端为 IoTDB 类型的 Pipe Sink + +``` +IoTDB> CREATE PIPESINK my_iotdb AS IoTDB (ip='接收端IP', port='接收端端口') +``` + +- 创建同步任务Pipe(请确保接收端 IoTDB 已经启动) + +``` +IoTDB> CREATE PIPE my_pipe TO my_iotdb +``` + +- 开始同步任务 + +``` +IoTDB> START PIPE my_pipe +``` + +- 显示所有同步任务状态 + +``` +IoTDB> SHOW PIPES +``` + +- 暂停任务 + +``` +IoTDB> STOP PIPE my_pipe +``` + +- 继续被暂停的任务 + +``` +IoTDB> START PIPE my_pipe +``` + +- 关闭任务(状态信息将被删除) + +``` +IoTDB> DROP PIPE my_pipe +``` + +## 5.配置参数 + +所有参数修改均在`$IOTDB_HOME$/conf/iotdb-system.properties`中,所有修改完成之后执行`load configuration`之后即可立刻生效。 + +### 5.1发送端相关 + +| **参数名** | **max_number_of_sync_file_retry** | +| ---------- | ------------------------------------------ | +| 描述 | 发送端同步文件到接收端失败时的最大重试次数 | +| 类型 | Int : [0,2147483647] | +| 默认值 | 5 | + +### 5.2接收端相关 + +| **参数名** | **ip_white_list** | +| ---------- | ------------------------------------------------------------ | +| 描述 | 设置同步功能发送端 IP 地址的白名单,以网段的形式表示,多个网段之间用逗号分隔。发送端向接收端同步数据时,只有当该发送端 IP 地址处于该白名单设置的网段范围内,接收端才允许同步操作。如果白名单为空,则接收端不允许任何发送端同步数据。默认接收端拒绝除了本地以外的全部 IP 的同步请求。 对该参数进行配置时,需要保证发送端所有 DataNode 地址均被覆盖。 | +| 类型 | String | +| 默认值 | 127.0.0.1/32 | + +## 6.SQL + +### SHOW PIPESINKTYPE + +- 显示当前所能支持的 PipeSink 类型。 + +```Plain%20Text +IoTDB> SHOW PIPESINKTYPE +IoTDB> ++-----+ +| type| ++-----+ +|IoTDB| ++-----+ +``` + +### CREATE PIPESINK + +- 创建接收端为 IoTDB 类型的 PipeSink,其中IP和port是可选参数。当接收端为集群时,填写任意一个 DataNode 的 `rpc_address` 与 `rpc_port`。 + +``` +IoTDB> CREATE PIPESINK AS IoTDB [(ip='127.0.0.1',port=6667);] +``` + +### DROP PIPESINK + +- 删除 PipeSink。当 PipeSink 正在被同步任务使用时,无法删除 PipeSink。 + +``` +IoTDB> DROP PIPESINK +``` + +### SHOW PIPESINK + +- 显示当前所有 PipeSink 定义,结果集有三列,分别表示 PipeSink 的名字,PipeSink 的类型,PipeSink 的属性。 + +``` +IoTDB> SHOW PIPESINKS +IoTDB> SHOW PIPESINK [PipeSinkName] +IoTDB> 
++-----------+-----+------------------------+ +| name| type| attributes| ++-----------+-----+------------------------+ +|my_pipesink|IoTDB|ip='127.0.0.1',port=6667| ++-----------+-----+------------------------+ +``` + +### CREATE PIPE + +- 创建同步任务 + - 其中 select 语句目前仅支持`**`(即所有序列中的数据),from 语句目前仅支持`root`,where语句仅支持指定 time 的起始时间。起始时间的指定形式可以是 yyyy-mm-dd HH:MM:SS或时间戳。 + +```Plain%20Text +IoTDB> CREATE PIPE my_pipe TO my_iotdb [FROM (select ** from root WHERE time>=yyyy-mm-dd HH:MM:SS)] +``` + +### STOP PIPE + +- 暂停任务 + +``` +IoTDB> STOP PIPE +``` + +### START PIPE + +- 开始任务 + +``` +IoTDB> START PIPE +``` + +### DROP PIPE + +- 关闭任务(状态信息可被删除) + +``` +IoTDB> DROP PIPE +``` + +### SHOW PIPE + +> 该指令在发送端和接收端均可执行 + +- 显示所有同步任务状态 + + - `create time`:Pipe 的创建时间 + + - `name`:Pipe 的名字 + + - `role`:当前 IoTDB 在 Pipe 中的角色,可能有两种角色: + - sender,当前 IoTDB 为同步发送端 + - receiver,当前 IoTDB 为同步接收端 + + - `remote`:Pipe的对端信息 + - 当 role 为 sender 时,这一字段值为 PipeSink 名称 + - 当 role 为 receiver 时,这一字段值为发送端 IP + + - `status`:Pipe状态 + + - `attributes`:Pipe属性 + - 当 role 为 sender 时,这一字段值为 Pipe 的同步起始时间和是否同步删除操作 + - 当 role 为 receiver 时,这一字段值为当前 DataNode 上创建的同步连接对应的数据库名称 + + - `message`:Pipe运行信息,当 Pipe 正常运行时,这一字段通常为NORMAL,当出现异常时,可能出现两种状态: + - WARN 状态,这表明发生了数据丢失或者其他错误,但是 Pipe 会保持运行 + - ERROR 状态,这表明出现了网络连接正常但数据无法传输的问题,例如发送端 IP 不在接收端白名单中,或是发送端与接收端版本不兼容 + - 当出现 ERROR 状态时,建议 STOP PIPE 后查看 DataNode 日志,检查接收端配置或网络情况后重新 START PIPE + +```Plain%20Text +IoTDB> SHOW PIPES +IoTDB> ++-----------------------+--------+--------+-------------+---------+------------------------------------+-------+ +| create time| name | role| remote| status| attributes|message| ++-----------------------+--------+--------+-------------+---------+------------------------------------+-------+ +|2022-03-30T20:58:30.689|my_pipe1| sender| my_pipesink| STOP|SyncDelOp=false,DataStartTimestamp=0| NORMAL| ++-----------------------+--------+--------+-------------+---------+------------------------------------+-------+ +|2022-03-31T12:55:28.129|my_pipe2|receiver|192.168.11.11| RUNNING| Database='root.vehicle'| NORMAL| ++-----------------------+--------+--------+-------------+---------+------------------------------------+-------+ +``` + +- 显示指定同步任务状态,当未指定PipeName时,与`SHOW PIPES`等效 + +``` +IoTDB> SHOW PIPE [PipeName] +``` + +## 7.使用示例 + +### 目标 + +- 创建一个从边端 IoTDB 到 云端 IoTDB 的 同步工作 +- 边端希望同步从2022年3月30日0时之后的数据 +- 边端不希望同步所有的删除操作 +- 云端 IoTDB 仅接受来自边端的 IoTDB 的数据 + +### 接收端操作 + +`vi conf/iotdb-system.properties` 配置云端参数,将白名单设置为仅接收来自 IP 为 192.168.0.1 的边端的数据 + +``` +#################### +### PIPE Server Configuration +#################### +# White IP list of Sync client. 
+# Please use the form of network segment to present the range of IP, for example: 192.168.0.0/16 +# If there are more than one IP segment, please separate them by commas +# The default is to allow all IP to sync +# Datatype: String +ip_white_list=192.168.0.1/32 +``` + +### 发送端操作 + +创建云端 PipeSink,指定类型为 IoTDB,指定云端 IP 地址为 192.168.0.1,指定云端的 PipeServer 服务端口为6667 + +``` +IoTDB> CREATE PIPESINK my_iotdb AS IoTDB (ip='192.168.0.1',port=6667) +``` + +创建Pipe,指定连接到my_iotdb的PipeSink,在WHREE子句中输入开始时间点2022年3月30日0时。以下两条执行语句等价。 + +``` +IoTDB> CREATE PIPE p TO my_iotdb FROM (select ** from root where time>=2022-03-30 00:00:00) +IoTDB> CREATE PIPE p TO my_iotdb FROM (select ** from root where time>= 1648569600000) +``` + +启动Pipe + +```Plain%20Text +IoTDB> START PIPE p +``` + +显示同步任务状态 + +``` +IoTDB> SHOW PIPE p +``` + +### 结果验证 + +在发送端执行以下 SQL + +```SQL +CREATE DATABASE root.vehicle; +CREATE TIMESERIES root.vehicle.d0.s0 WITH DATATYPE=INT32, ENCODING=RLE; +CREATE TIMESERIES root.vehicle.d0.s1 WITH DATATYPE=TEXT, ENCODING=PLAIN; +CREATE TIMESERIES root.vehicle.d1.s2 WITH DATATYPE=FLOAT, ENCODING=RLE; +CREATE TIMESERIES root.vehicle.d1.s3 WITH DATATYPE=BOOLEAN, ENCODING=PLAIN; +insert into root.vehicle.d0(timestamp,s0) values(now(),10); +insert into root.vehicle.d0(timestamp,s0,s1) values(now(),12,'12'); +insert into root.vehicle.d0(timestamp,s1) values(now(),'14'); +insert into root.vehicle.d1(timestamp,s2) values(now(),16.0); +insert into root.vehicle.d1(timestamp,s2,s3) values(now(),18.0,true); +insert into root.vehicle.d1(timestamp,s3) values(now(),false); +flush; +``` + +在发送端和接受端执行查询,可查询到相同的结果 + +```Plain%20Text +IoTDB> select ** from root.vehicle ++-----------------------------+------------------+------------------+------------------+------------------+ +| Time|root.vehicle.d0.s0|root.vehicle.d0.s1|root.vehicle.d1.s3|root.vehicle.d1.s2| ++-----------------------------+------------------+------------------+------------------+------------------+ +|2022-04-03T20:08:17.127+08:00| 10| null| null| null| +|2022-04-03T20:08:17.358+08:00| 12| 12| null| null| +|2022-04-03T20:08:17.393+08:00| null| 14| null| null| +|2022-04-03T20:08:17.538+08:00| null| null| null| 16.0| +|2022-04-03T20:08:17.753+08:00| null| null| true| 18.0| +|2022-04-03T20:08:18.263+08:00| null| null| false| null| ++-----------------------------+------------------+------------------+------------------+------------------+ +Total line number = 6 +It costs 0.134s +``` + +## 8.常见问题 + +- 执行 `CREATE PIPESINK demo as IoTDB` 提示 `PIPESINK [demo] already exists in IoTDB.` + - 原因:当前 PipeSink 已存在 + - 解决方案:删除 PipeSink 后重新创建 +- 执行 `DROP PIPESINK pipesinkName` 提示 `Can not drop PIPESINK [demo], because PIPE [mypipe] is using it.` + - 原因:不允许删除有正在运行的PIPE所使用的 PipeSink + - 解决方案:在发送端执行 `SHOW PIPE`,停止使用该 PipeSink 的 PIPE +- 执行 `CREATE PIPE p to demo` 提示 `PIPE [p] is STOP, please retry after drop it.` + - 原因:当前 Pipe 已存在 + - 解决方案:删除 Pipe 后重新创建 +- 执行 `CREATE PIPE p to demo`提示 `Fail to create PIPE [p] because Connection refused on DataNode: {id=2, internalEndPoint=TEndPoint(ip:127.0.0.1, port:10732)}.` + - 原因:存在状态为 Running 的 DataNode 无法连通 + - 解决方案:执行 `SHOW DATANODES` 语句,检查无法连通的 DataNode 网络,或等待其状态变为 Unknown 后重新执行语句。 +- 执行 `START PIPE p` 提示 `Fail to start PIPE [p] because Connection refused on DataNode: {id=2, internalEndPoint=TEndPoint(ip:127.0.0.1, port:10732)}.` + - 原因:存在状态为 Running 的 DataNode 无法连通 + - 解决方案:执行 `SHOW DATANODES` 语句,检查无法连通的 DataNode 网络,或等待其状态变为 Unknown 后重新执行语句。 +- 执行 `STOP PIPE p` 提示 `Fail to stop PIPE [p] because Connection refused on DataNode: {id=2, 
internalEndPoint=TEndPoint(ip:127.0.0.1, port:10732)}.` + - 原因:存在状态为 Running 的 DataNode 无法连通 + - 解决方案:执行 `SHOW DATANODES` 语句,检查无法连通的 DataNode 网络,或等待其状态变为 Unknown 后重新执行语句。 +- 执行 `DROP PIPE p` 提示 `Fail to DROP_PIPE because Fail to drop PIPE [p] because Connection refused on DataNode: {id=2, internalEndPoint=TEndPoint(ip:127.0.0.1, port:10732)}. Please execute [DROP PIPE p] later to retry.` + - 原因:存在状态为 Running 的 DataNode 无法连通,Pipe 已在部分节点上被删除,状态被置为 ***DROP***。 + - 解决方案:执行 `SHOW DATANODES` 语句,检查无法连通的 DataNode 网络,或等待其状态变为 Unknown 后重新执行语句。 +- 运行时日志提示 `org.apache.iotdb.commons.exception.IoTDBException: root.** already been created as database` + - 原因:同步工具试图在接收端自动创建发送端的Database,属于正常现象 + - 解决方案:无需干预 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Features.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Features.md new file mode 100644 index 00000000..7c1d7bbc --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Features.md @@ -0,0 +1,59 @@ + + +# 主要功能特点 + +IoTDB 具有以下特点: + +* 灵活的部署方式 + * 云端一键部署 + * 终端解压即用 + * 终端-云端无缝连接(数据云端同步工具) +* 低硬件成本的存储解决方案 + * 高压缩比的磁盘存储(10 亿数据点硬盘成本低于 1.4 元) +* 目录结构的时间序列组织管理方式 + * 支持复杂结构的智能网联设备的时间序列组织 + * 支持大量同类物联网设备的时间序列组织 + * 可用模糊方式对海量复杂的时间序列目录结构进行检索 +* 高通量的时间序列数据读写 + * 支持百万级低功耗强连接设备数据接入(海量) + * 支持智能网联设备数据高速读写(高速) + * 以及同时具备上述特点的混合负载 +* 面向时间序列的丰富查询语义 + * 跨设备、跨传感器的时间序列时间对齐 + * 面向时序数据特征的计算 + * 提供面向时间维度的丰富聚合函数支持 +* 极低的学习门槛 + * 支持类 SQL 的数据操作 + * 提供 JDBC 的编程接口 + * 完善的导入导出工具 +* 完美对接开源生态环境 + * 支持开源数据分析生态系统:Hadoop、Spark + * 支持开源可视化工具对接:Grafana +* 统一的数据访问模式 + * 无需进行分库分表处理 + * 无需区分实时库和历史库 +* 高可用性支持 + * 支持HA分布式架构,系统提供7*24小时不间断的实时数据库服务 + * 应用访问系统,可以连接集群中的任何一个节点进行 + * 一个物理节点宕机或网络故障,不会影响系统的正常运行 + * 物理节点的增加、删除或过热,系统会自动进行计算/存储资源的负载均衡处理 + * 支持异构环境,不同类型、不同性能的服务器可以组建集群,系统根据物理机的配置,自动负载均衡 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Files.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Files.md new file mode 100644 index 00000000..802a788e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Files.md @@ -0,0 +1,125 @@ + + +# 数据文件存储 + +本节将介绍 IoTDB 的数据存储方式,便于您对 IoTDB 的数据管理有一个直观的了解。 + +IoTDB 需要存储的数据分为三类,分别为数据文件、系统文件以及写前日志文件。 + +## 数据文件 +> 在 basedir/data/目录下 + +数据文件存储了用户写入 IoTDB 系统的所有数据。包含 TsFile 文件和其他文件,可通过 [data_dirs 配置项](../Reference/DataNode-Config-Manual.md) 进行配置。 + +为了更好的支持用户对于磁盘空间扩展等存储需求,IoTDB 为 TsFile 的存储配置增加了多文件目录的存储方式,用户可自主配置多个存储路径作为数据的持久化位置(详情见 [data_dirs 配置项](../Reference/DataNode-Config-Manual.md)),并可以指定或自定义目录选择策略(详情见 [multi_dir_strategy 配置项](../Reference/DataNode-Config-Manual.md))。 + +### TsFile +> 在 basedir/data/sequence or unsequence/{DatabaseName}/{DataRegionId}/{TimePartitionId}/目录下 +1. {time}-{version}-{inner_compaction_count}-{cross_compaction_count}.tsfile + + 数据文件 +2. {TsFileName}.tsfile.mod + + 更新文件,主要记录删除操作 + +### TsFileResource +1. {TsFileName}.tsfile.resource + + TsFile 的概要与索引文件 + +### 与合并相关的数据文件 +> 在 basedir/data/sequence or unsequence/{DatabaseName}/目录下 + +1. 后缀为`.cross ` 或者 `.inner` + + 合并过程中产生的临时文件 +2. 后缀为`.inner-compaction.log` 或者 `.cross-compaction.log` + + 记录合并进展的日志文件 +3. 后缀为`.compaction.mods` + + 记录合并过程中发生的删除等操作 +4. 后缀为`.meta`的文件 + + 合并过程生成的元数据临时文件 + +## 系统文件 + +系统 Schema 文件,存储了数据文件的元数据信息。可通过 system_dir 配置项进行配置(详情见 [system_dir 配置项](../Reference/DataNode-Config-Manual.md))。 + +### 元数据相关文件 +> 在 basedir/system/schema 目录下 + +#### 元数据 +1. mlog.bin + + 记录的是元数据操作 +2. mtree-1.snapshot + + 元数据快照 +3. mtree-1.snapshot.tmp + + 临时文件,防止快照更新时,损坏旧快照文件 + +#### 标签和属性 +1. tlog.txt + + 存储每个时序的标签和属性 + + 默认情况下每个时序 700 字节 + +### 其他系统文件 +#### Version +> 在 basedir/system/database/{DatabaseName}/{TimePartitionId} or upgrade 目录下 +1. 
Version-{version} + + 版本号文件,使用文件名来记录当前最大的版本号 + +#### Upgrade +> 在 basedir/system/upgrade 目录下 +1. upgrade.txt + + 记录升级进度 + +#### Authority +> 在 basedir/system/users/目录下是用户信息 +> +> 在 basedir/system/roles/目录下是角色信息 + +#### CompressRatio +> 在 basedir/system/compression_ration 目录下 +1. Ration-{compressionRatioSum}-{calTimes} + + 记录每个文件的压缩率 + +## 写前日志文件 + +写前日志文件存储了系统的写前日志。可通过`wal_dir`配置项进行配置(详情见 [wal_dir 配置项](../Reference/DataNode-Config-Manual.md))。 +> 在 basedir/wal 目录下 +1. {DatabaseName}-{TsFileName}/wal1 + + 每个 memtable 会对应一个 wal 文件 + + +## 数据存储目录设置举例 + +接下来我们将举一个数据目录配置的例子,来具体说明如何配置数据的存储目录。 + +IoTDB 涉及到的所有数据目录路径有:data_dirs, multi_dir_strategy, system_dir 和 wal_dir,它们分别涉及的是 IoTDB 的数据文件、数据文件多目录存储策略、系统文件以及写前日志文件。您可以选择输入路径自行配置,也可以不进行任何操作使用系统默认的配置项。 + +以下我们给出一个用户对五个目录都进行自行配置的例子。 + +``` +dn_system_dir = $IOTDB_HOME/data/datanode/system +dn_data_dirs = /data1/datanode/data, /data2/datanode/data, /data3/datanode/data +dn_multi_dir_strategy=MaxDiskUsableSpaceFirstStrategy +dn_wal_dirs= $IOTDB_HOME/data/datanode/wal +``` +按照上述配置,系统会: + +* 将 TsFile 存储在路径 /data1/datanode/data、路径 /data2/datanode/data 和路径 /data3/datanode/data 中。且对这三个路径的选择策略是:`优先选择磁盘剩余空间最大的目录`,即在每次数据持久化到磁盘时系统会自动选择磁盘剩余空间最大的一个目录将数据进行写入 +* 将系统文件存储在$IOTDB_HOME/data/datanode/data +* 将写前日志文件存储在$IOTDB_HOME/data/datanode/wal diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Flink-SQL-IoTDB.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Flink-SQL-IoTDB.md new file mode 100644 index 00000000..77d02b4c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Flink-SQL-IoTDB.md @@ -0,0 +1,529 @@ +# Apache Flink(SQL) + +flink-sql-iotdb-connector 将 Flink SQL 或者 Flink Table 与 IoTDB 无缝衔接了起来,使得在 Flink 的任务中可以对 IoTDB 进行实时读写,具体可以应用到如下场景中: + +1. 实时数据同步:将数据从一个数据库实时同步到另一个数据库。 +2. 实时数据管道:构建实时数据处理管道,处理和分析数据库中的数据。 +3. 实时数据分析:实时分析数据库中的数据,提供实时的业务洞察。 +4. 实时应用:将数据库中的数据实时应用于实时应用程序,如实时报表、实时推荐等。 +5. 实时监控:实时监控数据库中的数据,检测异常和错误。 + +## 读写模式 + +| 读模式(Source) | 写模式(Sink) | +| ------------------------- | -------------------------- | +| Bounded Scan, Lookup, CDC | Streaming Sink, Batch Sink | + +### 读模式(Source) + +* **Bounded Scan:** bounded scan 的主要实现方式是通过指定 `时间序列` 以及 `查询条件的上下界(可选)`来进行查询,并且查询结果通常为多行数据。这种查询无法获取到查询之后更新的数据。 + +* **Lookup:** lookup 查询模式与 scan 查询模式不同,bounded scan 是对一个时间范围内的数据进行查询,而 `lookup` 查询只会对一个精确的时间点进行查询,所以查询结果只有一行数据。另外只有 `lookup join` 的右表才能使用 lookup 查询模式。 + +* **CDC:** 主要用于 Flink 的 `ETL` 任务当中。当 IoTDB 中的数据发生变化时,flink 会通过我们提供的 `CDC connector` 感知到,我们可以将感知到的变化数据转发给其他的外部数据源,以此达到 ETL 的目的。 + +### 写模式(Sink) + +* **Streaming sink:** 用于 Flink 的 streaming mode 中,会将 Flink 中 Dynamic Table 的增删改记录实时的同步到 IoTDB 中。 + +* **Batch sink:** 用于 Flink 的 batch mode 中,用于将 Flink 的批量计算结果一次性写入 IoTDB 中。 + +## 使用方式 + +我们提供的 flink-sql-iotdb-connector 总共提供两种使用方式,一种是在项目开发过程中通过 Maven 的方式引用,另外一种是在 Flink 的 sql-client 中使用。我们将分别介绍这两种使用方式。 + +> 📌注:flink 版本要求 1.17.0 及以上 +### Maven + +我们只需要在项目的 pom 文件中添加以下依赖即可: + +```xml + + org.apache.iotdb + flink-sql-iotdb-connector + ${iotdb.version} + +``` + +### sql-client + +如果需要在 sql-client 中使用 flink-sql-iotdb-connector,先通过以下步骤来配置环境: + +1. 在 [官网](https://iotdb.apache.org/Download/) 下载带依赖的 flink-sql-iotdb-connector 的 jar 包。 + +2. 将 jar 包拷贝到 `$FLINK_HOME/lib` 目录下。 + +3. 启动 Flink 集群。 + +4. 
启动 sql-client。 + +此时就可以在 sql-client 中使用 flink-sql-iotdb-connector 了。 + +## 表结构规范 + +无论使用哪种类型的连接器,都需要满足以下的表结构规范: + +- 所有使用 `IoTDB connector` 的表,第一列的列名必须是 `Time_`,而且数据类型必须是 `BIGINT` 类型。 +- 除了 `Time_` 列以外的列名必须以 `root.` 开头。另外列名中的任意节点不能是纯数字,如果有纯数字,或者其他非法字符,必须使用反引号扩起来。比如:路径 root.sg.d0.123 是一个非法路径,但是 root.sg.d0.\`123\` 就是一个合法路径。 +- 无论使用 `pattern` 或者 `sql` 从 IoTDB 中查询数据,查询结果的时间序列名需要包含 Flink 中除了 `Time_` 以外的所有列名。如果没有查询结果中没有相应的列名,则该列将用 null 去填充。 +- flink-sql-iotdb-connector 中支持的数据类型有:`INT`, `BIGINT`, `FLOAT`, `DOUBLE`, `BOOLEAN`, `STRING`。Flink Table 中每一列的数据类型与其 IoTDB 中对应的时间序列类型都要匹配上,否则将会报错,并退出 Flink 任务。 + +以下用几个例子来说明 IoTDB 中的时间序列与 Flink Table 中列的对应关系。 + +## 读模式(Source) + +### Scan Table (Bounded) + +#### 参数 + +| 参数 | 必填 | 默认 | 类型 | 描述 | +| ------------------------ | ---- | -------------- | ------ | ------------------------------------------------------------ | +| nodeUrls | 否 | 127.0.0.1:6667 | String | 用来指定 IoTDB 的 datanode 地址,如果 IoTDB 是用集群模式搭建的话,可以指定多个地址,每个节点用逗号隔开。 | +| user | 否 | root | String | IoTDB 用户名 | +| password | 否 | root | String | IoTDB 密码 | +| scan.bounded.lower-bound | 否 | -1L | Long | bounded 的 scan 查询时的时间戳下界(包括),参数大于`0`时有效。 | +| scan.bounded.upper-bound | 否 | -1L | Long | bounded 的 scan 查询时的时间戳下界(包括),参数大于`0`时有效。 | +| sql | 是 | 无 | String | 用于在 IoTDB 端做查询。 | + +#### 示例 + +该示例演示了如何在一个 Flink Table Job 中从 IoTDB 中通过`scan table`的方式读取数据: +当前 IoTDB 中的数据如下: +```text +IoTDB> select ** from root; ++-----------------------------+-------------+-------------+-------------+ +| Time|root.sg.d0.s0|root.sg.d1.s0|root.sg.d1.s1| ++-----------------------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 1.0833644| 2.34874| 1.2414109| +|1970-01-01T08:00:00.002+08:00| 4.929185| 3.1885583| 4.6980085| +|1970-01-01T08:00:00.003+08:00| 3.5206156| 3.5600138| 4.8080945| +|1970-01-01T08:00:00.004+08:00| 1.3449302| 2.8781595| 3.3195343| +|1970-01-01T08:00:00.005+08:00| 3.3079383| 3.3840187| 3.7278645| ++-----------------------------+-------------+-------------+-------------+ +Total line number = 5 +It costs 0.028s +``` + +```java +import org.apache.flink.table.api.*; + +public class BoundedScanTest { + public static void main(String[] args) throws Exception { + // setup table environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inStreamingMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + // setup schema + Schema iotdbTableSchema = + Schema.newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + // register table + TableDescriptor iotdbDescriptor = + TableDescriptor.forConnector("IoTDB") + .schema(iotdbTableSchema) + .option("nodeUrls", "127.0.0.1:6667") + .option("sql", "select ** from root") + .build(); + tableEnv.createTemporaryTable("iotdbTable", iotdbDescriptor); + + // output table + tableEnv.from("iotdbTable").execute().print(); + } +} +``` +执行完以上任务后,Flink 的控制台中输出的表如下: +```text ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +| op | Time_ | root.sg.d0.s0 | root.sg.d1.s0 | root.sg.d1.s1 | ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +| +I | 1 | 1.0833644 | 2.34874 | 1.2414109 | +| +I | 2 | 4.929185 | 3.1885583 | 4.6980085 | +| +I | 3 | 3.5206156 | 3.5600138 | 4.8080945 | +| +I | 4 | 1.3449302 | 
2.8781595 | 3.3195343 | +| +I | 5 | 3.3079383 | 3.3840187 | 3.7278645 | ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +``` + +### Lookup Point + +#### 参数 + +| 参数 | 必填 | 默认 | 类型 | 描述 | +| --------------------- | ---- | -------------- | ------- | ------------------------------------------------------------ | +| nodeUrls | 否 | 127.0.0.1:6667 | String | 用来指定 IoTDB 的 datanode 地址,如果 IoTDB 是用集群模式搭建的话,可以指定多个地址,每个节点用逗号隔开。 | +| user | 否 | root | String | IoTDB 用户名 | +| password | 否 | root | String | IoTDB 密码 | +| lookup.cache.max-rows | 否 | -1 | Integer | lookup 查询时,缓存表的最大行数,参数大于`0`时生效。 | +| lookup.cache.ttl-sec | 否 | -1 | Integer | lookup 查询时,单点数据的丢弃时间,单位为`秒`。 | +| sql | 是 | 无 | String | 用于在 IoTDB 端做查询。 | + +#### 示例 + +该示例演示了如何将 IoTDB 中的`device`作为维度表进行`lookup`查询: + +* 使用 `datagen connector` 生成两个字段作为 `Lookup Join` 的左表。第一个字段为自增字段,用来表示时间戳。第二个字段为随机字段,用来表示一个 + measurement 产生的时间序列。 +* 通过 `IoTDB connector` 注册一个表作为 `Lookup Join` 的右表。 +* 将两个表 join 起来。 + +当前 IoTDB 中的数据如下: + +```text +IoTDB> select ** from root; ++-----------------------------+-------------+-------------+-------------+ +| Time|root.sg.d0.s0|root.sg.d1.s0|root.sg.d1.s1| ++-----------------------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 1.0833644| 2.34874| 1.2414109| +|1970-01-01T08:00:00.002+08:00| 4.929185| 3.1885583| 4.6980085| +|1970-01-01T08:00:00.003+08:00| 3.5206156| 3.5600138| 4.8080945| +|1970-01-01T08:00:00.004+08:00| 1.3449302| 2.8781595| 3.3195343| +|1970-01-01T08:00:00.005+08:00| 3.3079383| 3.3840187| 3.7278645| ++-----------------------------+-------------+-------------+-------------+ +Total line number = 5 +It costs 0.028s +``` + +```java +import org.apache.flink.table.api.DataTypes; +import org.apache.flink.table.api.EnvironmentSettings; +import org.apache.flink.table.api.Schema; +import org.apache.flink.table.api.TableDescriptor; +import org.apache.flink.table.api.TableEnvironment; + +public class LookupTest { + public static void main(String[] args) { + // setup environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inStreamingMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + + // register left table + Schema dataGenTableSchema = + Schema.newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("s0", DataTypes.INT()) + .build(); + + TableDescriptor datagenDescriptor = + TableDescriptor.forConnector("datagen") + .schema(dataGenTableSchema) + .option("fields.Time_.kind", "sequence") + .option("fields.Time_.start", "1") + .option("fields.Time_.end", "5") + .option("fields.s0.min", "1") + .option("fields.s0.max", "1") + .build(); + tableEnv.createTemporaryTable("leftTable", datagenDescriptor); + + // register right table + Schema iotdbTableSchema = + Schema.newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + + TableDescriptor iotdbDescriptor = + TableDescriptor.forConnector("IoTDB") + .schema(iotdbTableSchema) + .option("sql", "select ** from root") + .build(); + tableEnv.createTemporaryTable("rightTable", iotdbDescriptor); + + // join + String sql = + "SELECT l.Time_, l.s0,r.`root.sg.d0.s0`, r.`root.sg.d1.s0`, r.`root.sg.d1.s1`" + + "FROM (select *,PROCTIME() as proc_time from leftTable) AS l " + + "JOIN rightTable FOR SYSTEM_TIME AS OF l.proc_time AS r " + + "ON l.Time_ = r.Time_"; + + 
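+      // note: a Flink lookup join is executed as a processing-time temporal join, which is why
+      // the left table exposes a PROCTIME() attribute (proc_time) and the IoTDB table is
+      // referenced with "FOR SYSTEM_TIME AS OF l.proc_time"; "l.Time_ = r.Time_" is the lookup
+      // key, so each left-table row triggers a point query against IoTDB at that exact timestamp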
// output table + tableEnv.sqlQuery(sql).execute().print(); + } +} +``` +执行完以上任务后,Flink 的控制台中输出的表如下: +```text ++----+----------------------+-------------+---------------+----------------------+--------------------------------+ +| op | Time_ | s0 | root.sg.d0.s0 | root.sg.d1.s0 | root.sg.d1.s1 | ++----+----------------------+-------------+---------------+----------------------+--------------------------------+ +| +I | 5 | 1 | 3.3079383 | 3.3840187 | 3.7278645 | +| +I | 2 | 1 | 4.929185 | 3.1885583 | 4.6980085 | +| +I | 1 | 1 | 1.0833644 | 2.34874 | 1.2414109 | +| +I | 4 | 1 | 1.3449302 | 2.8781595 | 3.3195343 | +| +I | 3 | 1 | 3.5206156 | 3.5600138 | 4.8080945 | ++----+----------------------+-------------+---------------+----------------------+--------------------------------+ +``` + +### CDC + +#### 参数 + +| 参数 | 必填 | 默认 | 类型 | 描述 | +| ------------- | ---- | -------------- | ------- | ------------------------------------------------------------ | +| nodeUrls | 否 | 127.0.0.1:6667 | String | 用来指定 IoTDB 的 datanode 地址,如果 IoTDB 是用集群模式搭建的话,可以指定多个地址,每个节点用逗号隔开。 | +| user | 否 | root | String | IoTDB 用户名 | +| password | 否 | root | String | IoTDB 密码 | +| mode | 是 | BOUNDED | ENUM | **必须将此参数设置为 `CDC` 才能启动** | +| sql | 是 | 无 | String | 用于在 IoTDB 端做查询。 | +| cdc.port | 否 | 8080 | Integer | 在 IoTDB 端提供 CDC 服务的端口号。 | +| cdc.task.name | 是 | 无 | String | 当 mode 参数设置为 CDC 时是必填项。用于在 IoTDB 端创建 Pipe 任务。 | +| cdc.pattern | 是 | 无 | String | 当 mode 参数设置为 CDC 时是必填项。用于在 IoTDB 端作为发送数据的过滤条件。 | + +#### 示例 + +该示例演示了如何通过 `CDC Connector` 去获取 IoTDB 中指定路径下的变化数据: + +* 通过 `CDC Connector` 创建一张 `CDC` 表。 +* 将 `CDC` 表打印出来。 + +```java +import org.apache.flink.table.api.*; + +public class CDCTest { + public static void main(String[] args) { + // setup environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inStreamingMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + // setup schema + Schema iotdbTableSchema = Schema + .newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + + // register table + TableDescriptor iotdbDescriptor = TableDescriptor + .forConnector("IoTDB") + .schema(iotdbTableSchema) + .option("mode", "CDC") + .option("cdc.task.name", "test") + .option("cdc.pattern", "root.sg") + .build(); + tableEnv.createTemporaryTable("iotdbTable", iotdbDescriptor); + + // output table + tableEnv.from("iotdbTable").execute().print(); + } +} +``` +运行以上的 Flink CDC 任务,然后在 IoTDB-cli 中执行以下 SQL: +```sql +insert into root.sg.d1(timestamp,s0,s1) values(6,1.0,1.0); +insert into root.sg.d1(timestamp,s0,s1) values(7,1.0,1.0); +insert into root.sg.d1(timestamp,s0,s1) values(6,2.0,1.0); +insert into root.sg.d0(timestamp,s0) values(7,2.0); +``` +然后,Flink 的控制台中将打印该条数据: +```text ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +| op | Time_ | root.sg.d0.s0 | root.sg.d1.s0 | root.sg.d1.s1 | ++----+----------------------+--------------------------------+--------------------------------+--------------------------------+ +| +I | 7 | | 1.0 | 1.0 | +| +I | 6 | | 1.0 | 1.0 | +| +I | 6 | | 2.0 | 1.0 | +| +I | 7 | 2.0 | | | +``` + +## 写模式(Sink) + +### Streaming Sink + +#### 参数 + +| 参数 | 必填 | 默认 | 类型 | 描述 | +| -------- | ---- | -------------- | ------- | ------------------------------------------------------------ | +| nodeUrls | 否 | 127.0.0.1:6667 | String | 用来指定 IoTDB 的 
datanode 地址,如果 IoTDB 是用集群模式搭建的话,可以指定多个地址,每个节点用逗号隔开。 | +| user | 否 | root | String | IoTDB 用户名 | +| password | 否 | root | String | IoTDB 密码 | +| aligned | 否 | false | Boolean | 向 IoTDB 写入数据时是否调用`aligned`接口。 | + +#### 示例 + +该示例演示了如何在一个 Flink Table 的 Streaming Job 中如何将数据写入到 IoTDB 中: + +* 通过 `datagen connector` 生成一张源数据表。 +* 通过 `IoTDB connector` 注册一个输出表。 +* 将数据源表的数据插入到输出表中。 + +```java +import org.apache.flink.table.api.DataTypes; +import org.apache.flink.table.api.EnvironmentSettings; +import org.apache.flink.table.api.Schema; +import org.apache.flink.table.api.Table; +import org.apache.flink.table.api.TableDescriptor; +import org.apache.flink.table.api.TableEnvironment; + +public class StreamingSinkTest { + public static void main(String[] args) { + // setup environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inStreamingMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + + // create data source table + Schema dataGenTableSchema = Schema + .newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + TableDescriptor descriptor = TableDescriptor + .forConnector("datagen") + .schema(dataGenTableSchema) + .option("rows-per-second", "1") + .option("fields.Time_.kind", "sequence") + .option("fields.Time_.start", "1") + .option("fields.Time_.end", "5") + .option("fields.root.sg.d0.s0.min", "1") + .option("fields.root.sg.d0.s0.max", "5") + .option("fields.root.sg.d1.s0.min", "1") + .option("fields.root.sg.d1.s0.max", "5") + .option("fields.root.sg.d1.s1.min", "1") + .option("fields.root.sg.d1.s1.max", "5") + .build(); + // register source table + tableEnv.createTemporaryTable("dataGenTable", descriptor); + Table dataGenTable = tableEnv.from("dataGenTable"); + + // create iotdb sink table + TableDescriptor iotdbDescriptor = TableDescriptor + .forConnector("IoTDB") + .schema(dataGenTableSchema) + .build(); + tableEnv.createTemporaryTable("iotdbSinkTable", iotdbDescriptor); + + // insert data + dataGenTable.executeInsert("iotdbSinkTable").print(); + } +} +``` + +上述任务执行完成后,在 IoTDB 的 cli 中查询结果如下: + +```text +IoTDB> select ** from root; ++-----------------------------+-------------+-------------+-------------+ +| Time|root.sg.d0.s0|root.sg.d1.s0|root.sg.d1.s1| ++-----------------------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 1.0833644| 2.34874| 1.2414109| +|1970-01-01T08:00:00.002+08:00| 4.929185| 3.1885583| 4.6980085| +|1970-01-01T08:00:00.003+08:00| 3.5206156| 3.5600138| 4.8080945| +|1970-01-01T08:00:00.004+08:00| 1.3449302| 2.8781595| 3.3195343| +|1970-01-01T08:00:00.005+08:00| 3.3079383| 3.3840187| 3.7278645| ++-----------------------------+-------------+-------------+-------------+ +Total line number = 5 +It costs 0.054s +``` + +### Batch Sink + +#### 参数 + +| 参数 | 必填 | 默认 | 类型 | 描述 | +| -------- | ---- | -------------- | ------- | ------------------------------------------------------------ | +| nodeUrls | 否 | 127.0.0.1:6667 | String | 用来指定 IoTDB 的 datanode 地址,如果 IoTDB 是用集群模式搭建的话,可以指定多个地址,每个节点用逗号隔开。 | +| user | 否 | root | String | IoTDB 用户名 | +| password | 否 | root | String | IoTDB 密码 | +| aligned | 否 | false | Boolean | 向 IoTDB 写入数据时是否调用`aligned`接口。 | + +#### 示例 + +该示例演示了如何在一个 Flink Table 的 Batch Job 中如何将数据写入到 IoTDB 中: + +* 通过 `IoTDB connector` 生成一张源数据表。 +* 通过 `IoTDB connector` 注册一个输出表。 +* 将原数据表中的列重命名后写入写回 IoTDB。 + +```java +import 
org.apache.flink.table.api.DataTypes; +import org.apache.flink.table.api.EnvironmentSettings; +import org.apache.flink.table.api.Schema; +import org.apache.flink.table.api.Table; +import org.apache.flink.table.api.TableDescriptor; +import org.apache.flink.table.api.TableEnvironment; + +import static org.apache.flink.table.api.Expressions.$; + +public class BatchSinkTest { + public static void main(String[] args) { + // setup environment + EnvironmentSettings settings = EnvironmentSettings + .newInstance() + .inBatchMode() + .build(); + TableEnvironment tableEnv = TableEnvironment.create(settings); + + // create source table + Schema sourceTableSchema = Schema + .newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d0.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s0", DataTypes.FLOAT()) + .column("root.sg.d1.s1", DataTypes.FLOAT()) + .build(); + TableDescriptor sourceTableDescriptor = TableDescriptor + .forConnector("IoTDB") + .schema(sourceTableSchema) + .option("sql", "select ** from root.sg.d0,root.sg.d1") + .build(); + + tableEnv.createTemporaryTable("sourceTable", sourceTableDescriptor); + Table sourceTable = tableEnv.from("sourceTable"); + // register sink table + Schema sinkTableSchema = Schema + .newBuilder() + .column("Time_", DataTypes.BIGINT()) + .column("root.sg.d2.s0", DataTypes.FLOAT()) + .column("root.sg.d3.s0", DataTypes.FLOAT()) + .column("root.sg.d3.s1", DataTypes.FLOAT()) + .build(); + TableDescriptor sinkTableDescriptor = TableDescriptor + .forConnector("IoTDB") + .schema(sinkTableSchema) + .build(); + tableEnv.createTemporaryTable("sinkTable", sinkTableDescriptor); + + // insert data + sourceTable.renameColumns( + $("root.sg.d0.s0").as("root.sg.d2.s0"), + $("root.sg.d1.s0").as("root.sg.d3.s0"), + $("root.sg.d1.s1").as("root.sg.d3.s1") + ).insertInto("sinkTable").execute().print(); + } +} +``` + +上述任务执行完成后,在 IoTDB 的 cli 中查询结果如下: + +```text +IoTDB> select ** from root; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d0.s0|root.sg.d1.s0|root.sg.d1.s1|root.sg.d2.s0|root.sg.d3.s0|root.sg.d3.s1| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 1.0833644| 2.34874| 1.2414109| 1.0833644| 2.34874| 1.2414109| +|1970-01-01T08:00:00.002+08:00| 4.929185| 3.1885583| 4.6980085| 4.929185| 3.1885583| 4.6980085| +|1970-01-01T08:00:00.003+08:00| 3.5206156| 3.5600138| 4.8080945| 3.5206156| 3.5600138| 4.8080945| +|1970-01-01T08:00:00.004+08:00| 1.3449302| 2.8781595| 3.3195343| 1.3449302| 2.8781595| 3.3195343| +|1970-01-01T08:00:00.005+08:00| 3.3079383| 3.3840187| 3.7278645| 3.3079383| 3.3840187| 3.7278645| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +Total line number = 5 +It costs 0.015s +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/General-SQL-Statements.md b/src/zh/UserGuide/V2.0.1/Tree/stage/General-SQL-Statements.md new file mode 100644 index 00000000..6446edb5 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/General-SQL-Statements.md @@ -0,0 +1,171 @@ + + +# 常用SQL语句 + +## 数据库管理 + +数据库(Database)类似关系数据库中的 Database,是一组结构化的时序数据的集合。 + +* 创建数据库 + + 创建一个名为 root.ln 的数据库,语法如下: +``` +CREATE DATABASE root.ln +``` +* 查看数据库 + + +查看所有数据库: +``` +SHOW DATABASES +``` +* 删除数据库 + + +删除名为 root.ln 的数据库: +``` +DELETE DATABASE root.ln +``` +* 统计数据库数量 + + +统计数据库的总数 +``` +COUNT DATABASES +``` +## 
时间序列管理 + +时间序列(Timeseries)是以时间为索引的数据点的集合,在IoTDB中时间序列指的是一个测点的完整序列,本节主要介绍时间序列的管理方式。 + +* 创建时间序列 + +需指定编码方式与数据类型。例如创建一条名为root.ln.wf01.wt01.temperature的时间序列: +``` +CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH datatype=FLOAT,ENCODING=RLE +``` +* 查看时间序列 + + +查看所有时间序列: +``` +SHOW TIMESERIES +``` + +使用通配符匹配数据库root.ln下的时间序列: + +``` +SHOW TIMESERIES root.ln.** +``` +* 删除时间序列 + + +删除名为 root.ln.wf01.wt01.temperature 的时间序列 +``` +DELETE TIMESERIES root.ln.wf01.wt01.temperature +``` +* 统计时间序列 + + +统计时间序列的总数 +``` +COUNT TIMESERIES root.** +``` +统计某通配符路径下的时间序列数量: +``` +COUNT TIMESERIES root.ln.** +``` +## 时间序列路径管理 + +除时间序列概念外,IoTDB中还有子路径、设备的概念。 + +**子路径:**是一条完整时间序列名称中的一部分路径,如时间序列名称为root.ln.wf01.wt01.temperature,则root.ln、root.ln.wf01、root.ln.wf01.wt01都是其子路径。 + +**设备:**是一组时间序列的组合,在 IoTDB 中设备是由root至倒数第二级节点的子路径,如时间序列名称为root.ln.wf01.wt01.temperature,则root.ln.wf01.wt01是其设备 + +* 查看设备 +``` +SHOW DEVICES +``` + +* 查看子路径 + + +查看 root.ln 的下一层: +``` +SHOW CHILD PATHS root.ln +``` +* 查看子节点 + + +查看 root.ln 的下一层: +``` +SHOW CHILD NODES root.ln +``` +* 统计设备数量 + + +统计所有设备 +``` +COUNT DEVICES +``` +* 统计节点数 + + +统计路径中指定层级的节点个数 +``` +COUNT NODES root.ln.** LEVEL=2 +``` +## 查询数据 + +以下为IoTDB中常用查询语句。 + +* 查询指定时间序列的数据 + +查询root.ln.wf01.wt01设备下的所有时间序列的数据 + +``` +SELECT * FROM root.ln.wf01.wt01 +``` + +* 查询某时间范围内的时间序列数据 + +查询root.ln.wf01.wt01.temperature时间序列中时间戳大于 2022-01-01T00:05:00.000 的数据 + +``` +SELECT temperature FROM root.ln.wf01.wt01 WHERE time > 2022-01-01T00:05:00.000 +``` + +* 查询数值在指定范围内的时间序列数据 + +查询root.ln.wf01.wt01.temperature时间序列中数值大于 36.5 的数据: + +``` +SELECT temperature FROM root.ln.wf01.wt01 WHERE temperature > 36.5 +``` + +* 使用 last 查询最新点数据 +``` +SELECT last * FROM root.ln.wf01.wt01 +``` + + + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/InfluxDB-Protocol.md b/src/zh/UserGuide/V2.0.1/Tree/stage/InfluxDB-Protocol.md new file mode 100644 index 00000000..e10cb1b2 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/InfluxDB-Protocol.md @@ -0,0 +1,347 @@ + + +## 0.引入依赖 + +```xml + + org.apache.iotdb + influxdb-protocol + 1.0.0 + +``` + +这里是一些使用 InfluxDB-Protocol 适配器连接 IoTDB 的[示例](https://github.com/apache/iotdb/blob/rel/1.1/example/influxdb-protocol-example/src/main/java/org/apache/iotdb/influxdb/InfluxDBExample.java) + + +## 1.切换方案 + +假如您原先接入 InfluxDB 的业务代码如下: + +```java +InfluxDB influxDB = InfluxDBFactory.connect(openurl, username, password); +``` + +您只需要将 InfluxDBFactory 替换为 **IoTDBInfluxDBFactory** 即可实现业务向 IoTDB 的切换: + +```java +InfluxDB influxDB = IoTDBInfluxDBFactory.connect(openurl, username, password); +``` + +## 2.方案设计 + +### 2.1 InfluxDB-Protocol适配器 + +该适配器以 IoTDB Java ServiceProvider 接口为底层基础,实现了 InfluxDB 的 Java 接口 `interface InfluxDB`,对用户提供了所有 InfluxDB 的接口方法,最终用户可以无感知地使用 InfluxDB 协议向 IoTDB 发起写入和读取请求。 + +![architecture-design](https://alioss.timecho.com/docs/img/UserGuide/API/IoTDB-InfluxDB/architecture-design.png?raw=true) + +![class-diagram](https://alioss.timecho.com/docs/img/UserGuide/API/IoTDB-InfluxDB/class-diagram.png?raw=true) + + +### 2.2 元数据格式转换 + +InfluxDB 的元数据是 tag-field 模型,IoTDB 的元数据是树形模型。为了使适配器能够兼容 InfluxDB 协议,需要把 InfluxDB 的元数据模型转换成 IoTDB 的元数据模型。 + +#### 2.2.1 InfluxDB 元数据 + +1. database: 数据库名。 +2. measurement: 测量指标名。 +3. tags : 各种有索引的属性。 +4. fields : 各种记录值(没有索引的属性)。 + +![influxdb-data](https://alioss.timecho.com/docs/img/UserGuide/API/IoTDB-InfluxDB/influxdb-data.png?raw=true) + +#### 2.2.2 IoTDB 元数据 + +1. database: 数据库。 +2. path(time series ID):存储路径。 +3. 
measurement: 物理量。 + +![iotdb-data](https://alioss.timecho.com/docs/img/UserGuide/API/IoTDB-InfluxDB/iotdb-data.png?raw=true) + +#### 2.2.3 两者映射关系 + +InfluxDB 元数据和 IoTDB 元数据有着如下的映射关系: +1. InfluxDB 中的 database 和 measurement 组合起来作为 IoTDB 中的 database。 +2. InfluxDB 中的 field key 作为 IoTDB 中 measurement 路径,InfluxDB 中的 field value 即是该路径下记录的测点值。 +3. InfluxDB 中的 tag 在 IoTDB 中使用 database 和 measurement 之间的路径表达。InfluxDB 的 tag key 由 database 和 measurement 之间路径的顺序隐式表达,tag value 记录为对应顺序的路径的名称。 + +InfluxDB 元数据向 IoTDB 元数据的转换关系可以由下面的公示表示: + +`root.{database}.{measurement}.{tag value 1}.{tag value 2}...{tag value N-1}.{tag value N}.{field key}` + +![influxdb-vs-iotdb-data](https://alioss.timecho.com/docs/img/UserGuide/API/IoTDB-InfluxDB/influxdb-vs-iotdb-data.png?raw=true) + +如上图所示,可以看出: + +我们在 IoTDB 中使用 database 和 measurement 之间的路径来表达 InfluxDB tag 的概念,也就是图中右侧绿色方框的部分。 + +database 和 measurement 之间的每一层都代表一个 tag。如果 tag key 的数量为 N,那么 database 和 measurement 之间的路径的层数就是 N。我们对 database 和 measurement 之间的每一层进行顺序编号,每一个序号都和一个 tag key 一一对应。同时,我们使用 database 和 measurement 之间每一层 **路径的名字** 来记 tag value,tag key 可以通过自身的序号找到对应路径层级下的 tag value. + +#### 2.2.4 关键问题 + +在 InfluxDB 的 SQL 语句中,tag 出现的顺序的不同并不会影响实际的执行结果。 + +例如:`insert factory, workshop=A1, production=B1 temperature=16.9` 和 `insert factory, production=B1, workshop=A1 temperature=16.9` 两条 InfluxDB SQL 的含义(以及执行结果)相等。 + +但在 IoTDB 中,上述插入的数据点可以存储在 `root.monitor.factory.A1.B1.temperature` 下,也可以存储在 `root.monitor.factory.B1.A1.temperature` 下。因此,IoTDB 路径中储存的 InfluxDB 的 tag 的顺序是需要被特别考虑的,因为 `root.monitor.factory.A1.B1.temperature` 和 +`root.monitor.factory.B1.A1.temperature` 是两条不同的序列。我们可以认为,IoTDB 元数据模型对 tag 顺序的处理是“敏感”的。 + +基于上述的考虑,我们还需要在 IoTDB 中记录 InfluxDB 每个 tag 对应在 IoTDB 路径中的层级顺序,以确保在执行 InfluxDB SQL 时,不论 InfluxDB SQL 中 tag 出现的顺序如何,只要该 SQL 表达的是对同一个时间序列上的操作,那么适配器都可以唯一对应到 IoTDB 中的一条时间序列上进行操作。 + +这里还需要考虑的另一个问题是:InfluxDB 的 tag key 及对应顺序关系应该如何持久化到 IoTDB 数据库中,以确保不会丢失相关信息。 + +**解决方案:** + +**tag key 对应顺序关系在内存中的形式** + +通过利用内存中的`Map>` 这样一个 Map 结构,来维护 tag 在 IoTDB 路径层级上的顺序。 + +``` java + Map> measurementTagOrder +``` + +可以看出 Map 是一个两层的结构。 + +第一层的 Key 是 String 类型的 InfluxDB measurement,第一层的 Value 是一个 结构的 Map。 + +第二层的 Key 是 String 类型的 InfluxDB tag key,第二层的 Value 是 Integer 类型的 tag order,也就是 tag 在 IoTDB 路径层级上的顺序。 + +使用时,就可以先通过 InfluxDB measurement 定位,再通过 InfluxDB tag key 定位,最后就可以获得 tag 在 IoTDB 路径层级上的顺序了。 + +**tag key 对应顺序关系的持久化方案** + +Database 为`root.TAG_INFO`,分别用 database 下的 `database_name`, `measurement_name`, `tag_name` 和 `tag_order` 测点来存储 tag key及其对应的顺序关系。 + +``` ++-----------------------------+---------------------------+------------------------------+----------------------+-----------------------+ +| Time|root.TAG_INFO.database_name|root.TAG_INFO.measurement_name|root.TAG_INFO.tag_name|root.TAG_INFO.tag_order| ++-----------------------------+---------------------------+------------------------------+----------------------+-----------------------+ +|2021-10-12T01:21:26.907+08:00| monitor| factory| workshop| 1| +|2021-10-12T01:21:27.310+08:00| monitor| factory| production| 2| +|2021-10-12T01:21:27.313+08:00| monitor| factory| cell| 3| +|2021-10-12T01:21:47.314+08:00| building| cpu| tempture| 1| ++-----------------------------+---------------------------+------------------------------+----------------------+-----------------------+ +``` + + + +### 2.3 实例 + +#### 2.3.1 插入数据 + +1. 假定按照以下的顺序插入三条数据到 InfluxDB 中 (database=monitor): + + (1)`insert student,name=A,phone=B,sex=C score=99` + + (2)`insert student,address=D score=98` + + (3)`insert student,name=A,phone=B,sex=C,address=D score=97` + +2. 
简单对上述 InfluxDB 的时序进行解释,database 是 monitor; measurement 是student;tag 分别是 name,phone、sex 和 address;field 是 score。 + +对应的InfluxDB的实际存储为: + +``` +time address name phone sex socre +---- ------- ---- ----- --- ----- +1633971920128182000 A B C 99 +1633971947112684000 D 98 +1633971963011262000 D A B C 97 +``` + + +3. IoTDB顺序插入三条数据的过程如下: + + (1)插入第一条数据时,需要将新出现的三个 tag key 更新到 table 中,IoTDB 对应的记录 tag 顺序的 table 为: + + | database | measurement | tag_key | Order | + | -------- | ----------- | ------- | ----- | + | monitor | student | name | 0 | + | monitor | student | phone | 1 | + | monitor | student | sex | 2 | + + (2)插入第二条数据时,由于此时记录 tag 顺序的 table 中已经有了三个 tag key,因此需要将出现的第四个 tag key=address 更新记录。IoTDB 对应的记录 tag 顺序的 table 为: + + | database | measurement | tag_key | order | + | -------- | ----------- | ------- | ----- | + | monitor | student | name | 0 | + | monitor | student | phone | 1 | + | monitor | student | sex | 2 | + | monitor | student | address | 3 | + + (3)插入第三条数据时,此时四个 tag key 都已经记录过,所以不需要更新记录,IoTDB 对应的记录 tag 顺序的 table 为: + + | database | measurement | tag_key | order | + | -------- | ----------- | ------- | ----- | + | monitor | student | name | 0 | + | monitor | student | phone | 1 | + | monitor | student | sex | 2 | + | monitor | student | address | 3 | + +4. (1)第一条插入数据对应 IoTDB 时序为 root.monitor.student.A.B.C + + (2)第二条插入数据对应 IoTDB 时序为 root.monitor.student.PH.PH.PH.D (其中PH表示占位符)。 + + 需要注意的是,由于该条数据的 tag key=address 是第四个出现的,但是自身却没有对应的前三个 tag 值,因此需要用 PH 占位符来代替。这样做的目的是保证每条数据中的 tag 顺序不会乱,是符合当前顺序表中的顺序,从而查询数据的时候可以进行指定 tag 过滤。 + + (3)第三条插入数据对应 IoTDB 时序为 root.monitor.student.A.B.C.D + + 对应的 IoTDB 的实际存储为: + +``` ++-----------------------------+--------------------------------+-------------------------------------+----------------------------------+ +| Time|root.monitor.student.A.B.C.score|root.monitor.student.PH.PH.PH.D.score|root.monitor.student.A.B.C.D.score| ++-----------------------------+--------------------------------+-------------------------------------+----------------------------------+ +|2021-10-12T01:21:26.907+08:00| 99| NULL| NULL| +|2021-10-12T01:21:27.310+08:00| NULL| 98| NULL| +|2021-10-12T01:21:27.313+08:00| NULL| NULL| 97| ++-----------------------------+--------------------------------+-------------------------------------+----------------------------------+ +``` + +5. 如果上面三条数据插入的顺序不一样,我们可以看到对应的实际path路径也就发生了改变,因为InfluxDB数据中的Tag出现顺序发生了变化,所对应的到IoTDB中的path节点顺序也就发生了变化。 + + 但是这样实际并不会影响查询的正确性,因为一旦Influxdb的Tag顺序确定之后,查询也会按照这个顺序表记录的顺序进行Tag值过滤。所以并不会影响查询的正确性。 + +#### 2.3.2 查询数据 + +1. 查询student中phone=B的数据。在database=monitor,measurement=student中tag=phone的顺序为1,order最大值是3,对应到IoTDB的查询为: + + ```sql + select * from root.monitor.student.*.B + ``` + +2. 查询student中phone=B且score>97的数据,对应到IoTDB的查询为: + + ```sql + select * from root.monitor.student.*.B where score>97 + ``` + +3. 查询student中phone=B且score>97且时间在最近七天内的的数据,对应到IoTDB的查询为: + + ```sql + select * from root.monitor.student.*.B where score>97 and time > now()-7d + ``` + + +4. 查询student中name=A或score>97,由于tag存储在路径中,因此没有办法用一次查询同时完成tag和field的**或**语义查询,因此需要多次查询进行或运算求并集,对应到IoTDB的查询为: + + ```sql + select * from root.monitor.student.A + select * from root.monitor.student where score>97 + ``` + 最后手动对上面两次查询结果求并集。 + +5. 
查询student中(name=A或phone=B或sex=C)且score>97,由于tag存储在路径中,因此没有办法用一次查询完成tag的**或**语义, 因此需要多次查询进行或运算求并集,对应到IoTDB的查询为: + + ```sql + select * from root.monitor.student.A where score>97 + select * from root.monitor.student.*.B where score>97 + select * from root.monitor.student.*.*.C where score>97 + ``` + 最后手动对上面三次查询结果求并集。 + +## 3 支持情况 + +### 3.1 InfluxDB版本支持情况 + +目前支持InfluxDB 1.x 版本,暂不支持InfluxDB 2.x 版本。 + +`influxdb-java`的maven依赖支持2.21+,低版本未进行测试。 + +### 3.2 函数接口支持情况 + +目前支持的接口函数如下: + +```java +public Pong ping(); + +public String version(); + +public void flush(); + +public void close(); + +public InfluxDB setDatabase(final String database); + +public QueryResult query(final Query query); + +public void write(final Point point); + +public void write(final String records); + +public void write(final List records); + +public void write(final String database,final String retentionPolicy,final Point point); + +public void write(final int udpPort,final Point point); + +public void write(final BatchPoints batchPoints); + +public void write(final String database,final String retentionPolicy, +final ConsistencyLevel consistency,final String records); + +public void write(final String database,final String retentionPolicy, +final ConsistencyLevel consistency,final TimeUnit precision,final String records); + +public void write(final String database,final String retentionPolicy, +final ConsistencyLevel consistency,final List records); + +public void write(final String database,final String retentionPolicy, +final ConsistencyLevel consistency,final TimeUnit precision,final List records); + +public void write(final int udpPort,final String records); + +public void write(final int udpPort,final List records); +``` + +### 3.3 查询语法支持情况 + +目前支持的查询sql语法为 + +```sql +SELECT [, , ] +FROM +WHERE [( AND | OR) [...]] +``` + +WHERE子句在`field`,`tag`和`timestamp`上支持`conditional_expressions`. + +#### field + +```sql +field_key ['string' | boolean | float | integer] +``` + +#### tag + +```sql +tag_key ['tag_value'] +``` + +#### timestamp + +```sql +timestamp ['time'] +``` + +目前timestamp的过滤条件只支持now()有关表达式,如:now()-7D,具体的时间戳暂不支持。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Integration-Test/Integration-Test-refactoring-tutorial.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Integration-Test/Integration-Test-refactoring-tutorial.md new file mode 100644 index 00000000..a81cf45c --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Integration-Test/Integration-Test-refactoring-tutorial.md @@ -0,0 +1,225 @@ + + +# 集成测试开发者文档 + +**集成测试**是软件测试中的一个阶段。在该阶段中,各个软件模块被组合起来作为一个整体进行测试。进行集成测试是为了评估某系统或某组件是否符合指定的功能需求。 + + +## Apache IoTDB 集成测试规范 + +### Apache IoTDB 集成测试的环境 + +Apache IoTDB 集成测试的环境一共有3种,分别为**本地单机测试环境、本地集群测试环境和远程测试环境。** Apache IOTDB 的集群测试需要在其中的1种或多种环境下完成。对于这三类环境的说明如下: +1. 本地单机测试环境:该环境用于完成本地的 Apache IoTDB 单机版的集成测试。若需要变更该环境的具体配置,需要在 IoTDB 实例启动前替换相应的配置文件,再启动 IoTDB 并进行测试。 +2. 本地集群测试环境:该环境用于完成本地的 Apache IoTDB 分布式版(伪分布式)的集成测试。若需要变更该环境的具体配置,需要在 IoTDB 集群启动前替换相应的配置文件,再启动 IoTDB 集群并进行测试。 +3. 远程测试环境:该环境用于测试远程 Apache IoTDB 的功能,连接的 IoTDB 实例可能是一个单机版的实例,也可以是远程集群的某一个节点。远程测试环境的具体配置的修改受到限制,暂不支持在测试时修改。 +集成测试开发者在编写测试程序时需要指定这三种环境的1种或多种。具体指定方法见后文。 + +### 黑盒测试 + +**黑盒测试** 是一种软件测试方法,它检验程序的功能,而不考虑其内部结构或工作方式。开发者不需要了解待测程序的内部逻辑即可完成测试。**Apache IoTDB 的集成测试以黑盒测试的方式进行。通过 JDBC 或 Session API 的接口实现测试输入的用例即为黑盒测试用例。** 因此,测试用例的输出验证也应该通过 JDBC 或 Session API 的返回结果实现。 + +### 集成测试的步骤 + +集成测试的步骤主要分为三步,即 (1) 构建测试类和标注测试环境、(2) 设置测试前的准备工作以及测试后的清理工作以及 (3) 实现集成测试逻辑。如果需要测试非默认环境下的 IoTDB,还需要修改 IoTDB 的配置,修改方法对应小结的第4部分。 + + + +#### 1. 
集成测试类和注解 + +构建的集成测试类时,开发者需要在 Apache IoTDB 的 [integration-test](https://github.com/apache/iotdb/tree/master/integration-test) 模块中创建测试类。类名应当能够精简准确地表述该集成测试的目的。除用于服务其他测试用例的类外,含集成测试用例用于测试 Apache IoTDB 功能的类,应当命名为“功能+IT”。例如,用于测试IoTDB自动注册元数据功能的集成测试命名为“IoTDBAutoCreateSchemaIT”。 + +- Category 注解:**在构建集成测试类时,需要显式地通过引入```@Category```注明测试环境** ,测试环境用```LocalStandaloneIT.class```、```ClusterIT.class``` 和 ```RemoteIT.class```来表示,分别与“Apache IoTDB 集成测试的环境”中的本地单机测试环境、本地集群测试环境和远程测试环境对应。标签内是测试环境的集合,可以包含多个元素,表示在多种环境下分别测试。**一般情况下,标签```LocalStandaloneIT.class``` 和 ```ClusterIT.class``` 是必须添加的。** 当某些功能仅支持单机版 IoTDB 时可以只保留```LocalStandaloneIT.class```。 +- RunWith 注解: 每一个集成测试类上都需要添加 ```@RunWith(IoTDBTestRunner.class)``` 标签。 + +```java +// 给 IoTDBAliasIT 测试类加标签,分别在本地单机测试环境、 +// 本地集群测试环境和远程测试环境完成测试。 +@RunWith(IoTDBTestRunner.class) +@Category({LocalStandaloneIT.class, ClusterIT.class, RemoteIT.class}) +public class IoTDBAliasIT { + ... +} + +// 给 IoTDBAlignByDeviceIT 测试类加标签,分别在本地单机 +// 测试环境和本地集群测试环境完成测试。 +@RunWith(IoTDBTestRunner.class) +@Category({LocalStandaloneIT.class, ClusterIT.class}) +public class IoTDBAlignByDeviceIT { + ... +} +``` + +#### 2. 设置测试前的准备工作以及测试后的清理工作 + +测试前的准备工作包括启动 IoTDB(单机或集群)实例和测试用的数据准备。这些逻辑在setUp方法内实现。其中setUp方法前需要添加```@BeforeClass``` 或 ```@Before``` 标签,前者表示该方法为当前集成测试执行的第 1 个方法,并且在集成测试运行时只执行 1 次,后者表示在运行当前集成测试的每 1 个测试方法前,该方法都会被执行 1 次。 +- IoTDB 实例启动通过调用工厂类来实现,即```EnvFactory.getEnv().initBeforeClass()```。 +- 测试用的数据准备包括按测试需要提前注册 database 、注册时间序列、写入时间序列数据等。建议在测试类内实现单独的方法来准备数据,如insertData()。若需要写入多条数据,请使用批量写入的接口(JDBC中的executeBatch接口,或Session API 中的 insertRecords、insertTablets 等接口)。 + +```java +@BeforeClass +public static void setUp() throws Exception { + // 启动 IoTDB 实例 + EnvFactory.getEnv().initBeforeClass(); + ... // 准备数据 +} +``` + +测试后需要清理相关的环境,其中需要断开还没有关闭的连接。这些逻辑在 tearDown 方法内实现。其中 tearDown 方法前需要添加```@AfterClass``` 或 ```@After``` 标签,前者表示该方法为当前集成测试执行的最后一个方法,并且在集成测试运行时只执行 1 次,后者表示在运行当前集成测试的每一个测试方法后,该方法都会被执行 1 次。 +- 如果 IoTDB 连接以测试类成员变量的形式声明,并且在测试后没有断开连接,则需要在 tearDown 方法内显式断开。 +- IoTDB 环境的清理通过调用工厂类来实现,即```EnvFactory.getEnv().cleanAfterClass()```。 + +```java +@AfterClass +public static void tearDown() throws Exception { + ... // 断开连接等 + // 清理 IoTDB 实例的环境 + EnvFactory.getEnv().cleanAfterClass(); +} +``` + +#### 3. 实现集成测试逻辑 + +Apache IoTDB 的集成测试以黑盒测试的方式进行,测试方法的名称为“测试的功能点+Test”,例如“selectWithAliasTest”。测试通过 JDBC 或 Session API 的接口来完成。 + +1、使用JDBC接口 + +使用JDBC接口时,建议将连接建立在 try 语句内,以这种方式建立的连接无需在 tearDown 方法内关闭。连接需要通过工厂类来建立,即```EnvFactory.getEnv().getConnection()```,不要指定具体的 ip 地址或端口号。示例代码如下所示。 + +```java +@Test +public void someFunctionTest(){ + try (Connection connection = EnvFactory.getEnv().getConnection(); + Statement statement = connection.createStatement()) { + ... 
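+    // 注:Connection 与 Statement 通过 try-with-resources 创建,测试方法结束时会自动关闭,
+    // 因此无需在 tearDown 方法中再手动断开该连接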
// 执行相应语句并做测试 + } catch (Exception e) { + e.printStackTrace(); + Assert.fail(); + } +} +``` +注意: +- **查询操作必须使用```executeQuery()```方法,返回ResultSet;** 对于**更新数据库等无返回值的操作,必须使用```execute()```方法。** 示例代码如下。 + +```java +@Test +public void exampleTest() throws Exception { + try (Connection connection = EnvFactory.getEnv().getConnection(); + Statement statement = connection.createStatement()) { + // 使用 execute() 方法设置存储组 + statement.execute("CREATE DATABASE root.sg"); + // 使用 executeQuery() 方法查询存储组 + try (ResultSet resultSet = statement.executeQuery("show databases")) { + if (resultSet.next()) { + String storageGroupPath = resultSet.getString("database"); + Assert.assertEquals("root.sg", storageGroupPath); + } else { + Assert.fail("This ResultSet is empty."); + } + } + } +} +``` + +2、使用 Session API + +目前暂不支持使用 Session API 来做集成测试。 + +3、测试方法的环境标签 +对于测试方法,开发者也可以指定特定的测试环境,只需要在对应的测试方法前注明环境即可。值得注意的是,有额外测试环境标注的用例,不但会在所指定的环境中进行测试,还会在该用例隶属的测试类所对应的环境中进行测试。示例代码如下。 + + +```java +@RunWith(IoTDBTestRunner.class) +@Category({LocalStandaloneIT.class}) +public class IoTDBExampleIT { + + // 该用例只会在本地单机测试环境中进行测试 + @Test + public void theStandaloneCaseTest() { + ... + } + + // 该用例会在本地单机测试环境、本地集群测试环境和远程测试环境中进行测试 + @Test + @Category({ClusterIT.class, RemoteIT.class}) + public void theAllEnvCaseTest() { + ... + } +} +``` + +#### 4. 测试中 IoTDB 配置参数的修改 + +有时,为了测试 IoTDB 在特定配置条件下的功能需要更改其配置。由于远程的机器配置无法修改,因此,需要更改配置的测试不支持远程测试环境,只支持本地单机测试环境和本地集群测试环境。配置文件的修改需要在setUp方法中实现,在```EnvFactory.getEnv().initBeforeClass()```之前执行,应当使用 ConfigFactory 提供的方法来实现。在 tearDown 方法内,需要将 IoTDB 的配置恢复到原默认设置,这一步在环境清理(```EnvFactory.getEnv().cleanAfterTest()```)后通过调用ConfigFactory提供的方法来执行。实例代码如下。 + +```java +@RunWith(IoTDBTestRunner.class) +@Category({LocalStandaloneIT.class, ClusterIT.class}) +public class IoTDBAlignedSeriesQueryIT { + + protected static boolean enableSeqSpaceCompaction; + protected static boolean enableUnseqSpaceCompaction; + protected static boolean enableCrossSpaceCompaction; + + @BeforeClass + public static void setUp() throws Exception { + // 获取默认配置 + enableSeqSpaceCompaction = ConfigFactory.getConfig().isEnableSeqSpaceCompaction(); + enableUnseqSpaceCompaction = ConfigFactory.getConfig().isEnableUnseqSpaceCompaction(); + enableCrossSpaceCompaction = ConfigFactory.getConfig().isEnableCrossSpaceCompaction(); + // 更新配置 + ConfigFactory.getConfig().setEnableSeqSpaceCompaction(false); + ConfigFactory.getConfig().setEnableUnseqSpaceCompaction(false); + ConfigFactory.getConfig().setEnableCrossSpaceCompaction(false); + EnvFactory.getEnv().initBeforeClass(); + AlignedWriteUtil.insertData(); + } + + @AfterClass + public static void tearDown() throws Exception { + EnvFactory.getEnv().cleanAfterClass(); + // 恢复为默认配置 + ConfigFactory.getConfig().setEnableSeqSpaceCompaction(enableSeqSpaceCompaction); + ConfigFactory.getConfig().setEnableUnseqSpaceCompaction(enableUnseqSpaceCompaction); + ConfigFactory.getConfig().setEnableCrossSpaceCompaction(enableCrossSpaceCompaction); + } +} +``` + +## Q&A +### CI 出错后查看日志的方法 +1、点击出错的测试对应的 Details + + + +2、查看和下载日志 + + + +也可以点击左上角的 summary 然后查看和下载其他错误日志。 + + + +### 运行集成测试的命令 + +请参考 [《Integration Test For the MPP Architecture》](https://github.com/apache/iotdb/blob/master/integration-test/README.md) 文档。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Interface-Comparison.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Interface-Comparison.md new file mode 100644 index 00000000..de16a635 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Interface-Comparison.md @@ -0,0 +1,50 @@ + + +# 原生接口对比 + 
+此章节主要为Java原生接口与Python原生接口的差异性对比,主要为方便区分Java原生接口与Python原生的不同之处。 + + + +| 序号 | 接口名称以及作用 | Java接口函数 | Python接口函数 |
接口对比说明
| +| ---- | ------------------------- |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------------------ | ------------------------------------------------------------ | +| 1 | 初始化Session | `Session.Builder.build(); Session.Builder().host(String host).port(int port).build(); Session.Builder().nodeUrls(List nodeUrls).build(); Session.Builder().fetchSize(int fetchSize).username(String username).password(String password).thriftDefaultBufferSize(int thriftDefaultBufferSize).thriftMaxFrameSize(int thriftMaxFrameSize).enableRedirection(boolean enableCacheLeader).version(Version version).build();` | `Session(ip, port_, username_, password_,fetch_size=1024, zone_id="UTC+8")` | 1.Python原生接口缺少使用默认配置初始化session 2.Python原生接口缺少指定多个可连接节点初始化session 3.Python原生接口缺失使用其他配置项初始化session | +| 2 | 开启 Session | `void open() void open(boolean enableRPCCompression)` | `session.open(enable_rpc_compression=False)` | | +| 3 | 关闭 Session | `void close()` | `session.close()` | | +| 4 | 设置 Database | `void setStorageGroup(String storageGroupId)` | `session.set_storage_group(group_name)` | | +| 5 | 删除 database | `void deleteStorageGroup(String storageGroup) void deleteStorageGroups(List storageGroups)` | `session.delete_storage_group(group_name) session.delete_storage_groups(group_name_lst)` | | +| 6 | 创建时间序列 | `void createTimeseries(String path, TSDataType dataType,TSEncoding encoding, CompressionType compressor, Map props,Map tags, Map attributes, String measurementAlias) void createMultiTimeseries(List paths, List dataTypes,List encodings, List compressors,List> propsList, List> tagsList,List> attributesList, List measurementAliasList)` | `session.create_time_series(ts_path, data_type, encoding, compressor,props=None, tags=None, attributes=None, alias=None) session.create_multi_time_series(ts_path_lst, data_type_lst, encoding_lst, compressor_lst,props_lst=None, tags_lst=None, attributes_lst=None, alias_lst=None)` | | +| 7 | 创建对齐时间序列 | `void createAlignedTimeseries(String prefixPath, List measurements,List dataTypes, List encodings,CompressionType compressor, List measurementAliasList);` | `session.create_aligned_time_series(device_id, measurements_lst, data_type_lst, encoding_lst, compressor_lst)` | | +| 8 | 删除时间序列 | `void deleteTimeseries(String path) void deleteTimeseries(List paths)` | `session.delete_time_series(paths_list)` | Python原生接口缺少删除一个时间序列的接口 | +| 9 | 检测时间序列是否存在 | `boolean checkTimeseriesExists(String path)` | `session.check_time_series_exists(path)` | | +| 10 | 元数据模版 | `public void createSchemaTemplate(Template template);` | | | +| 11 | 插入Tablet | `void insertTablet(Tablet tablet) void insertTablets(Map tablets)` | `session.insert_tablet(tablet_) session.insert_tablets(tablet_lst)` | | +| 12 | 插入Record | `void insertRecord(String prefixPath, long time, List measurements,List types, List values) void insertRecords(List deviceIds,List times,List> measurementsList,List> typesList,List> 
valuesList) void insertRecordsOfOneDevice(String deviceId, List times,List> valuesList)` | `session.insert_record(device_id, timestamp, measurements_, data_types_, values_) session.insert_records(device_ids_, time_list_, measurements_list_, data_type_list_, values_list_) session.insert_records_of_one_device(device_id, time_list, measurements_list, data_types_list, values_list)` | | +| 13 | 带有类型推断的写入 | `void insertRecord(String prefixPath, long time, List measurements, List values) void insertRecords(List deviceIds, List times,List> measurementsList, List> valuesList) void insertStringRecordsOfOneDevice(String deviceId, List times,List> measurementsList, List> valuesList)` | `session.insert_str_record(device_id, timestamp, measurements, string_values)` | 1.Python原生接口缺少插入多个 Record的接口 2.Python原生接口缺少插入同属于一个 device 的多个 Record | +| 14 | 对齐时间序列的写入 | `insertAlignedRecord insertAlignedRecords insertAlignedRecordsOfOneDevice insertAlignedStringRecordsOfOneDevice insertAlignedTablet insertAlignedTablets` | `insert_aligned_record insert_aligned_records insert_aligned_records_of_one_device insert_aligned_tablet insert_aligned_tablets` | Python原生接口缺少带有判断类型的对齐时间序列的写入 | +| 15 | 数据删除 | `void deleteData(String path, long endTime) void deleteData(List paths, long endTime)` | | 1.Python原生接口缺少删除一条数据的接口 2.Python原生接口缺少删除多条数据的接口 | +| 16 | 数据查询 | `SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime) SessionDataSet executeLastDataQuery(List paths, long LastTime)` | | 1.Python原生接口缺少原始数据查询的接口 2.Python原生接口缺少查询最后一条时间戳大于等于某个时间点的数据的接口 | +| 17 | IoTDB-SQL 接口-查询语句 | `SessionDataSet executeQueryStatement(String sql)` | `session.execute_query_statement(sql)` | | +| 18 | IoTDB-SQL 接口-非查询语句 | `void executeNonQueryStatement(String sql)` | `session.execute_non_query_statement(sql)` | | +| 19 | 测试接口 | `void testInsertRecord(String deviceId, long time, List measurements, List values) void testInsertRecord(String deviceId, long time, List measurements,List types, List values) void testInsertRecords(List deviceIds, List times,List> measurementsList, List> valuesList) void testInsertRecords(List deviceIds, List times,List> measurementsList, List> typesList,List> valuesList) void testInsertTablet(Tablet tablet) void testInsertTablets(Map tablets)` | Python 客户端对测试的支持是基于testcontainers库 | Python接口无原生的测试接口 | +| 20 | 针对原生接口的连接池 | `SessionPool` | | Python接口无针对原生接口的连接池 | +| 21 | 集群信息相关的接口 | `iotdb-thrift-cluster` | | Python接口不支持集群信息相关的接口 | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/IoTDB-Data-Pipe_timecho.md b/src/zh/UserGuide/V2.0.1/Tree/stage/IoTDB-Data-Pipe_timecho.md new file mode 100644 index 00000000..fd78dc85 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/IoTDB-Data-Pipe_timecho.md @@ -0,0 +1,945 @@ + + +# IoTDB 数据订阅 + +**IoTDB 数据订阅功能可以将 IoTDB 的数据传输到另一个数据平台,我们将一个数据订阅任务称为 Pipe。** + +**一个 Pipe 包含三个子任务(插件):** + +- 抽取(Extract) +- 处理(Process) +- 发送(Connect) + +**Pipe 允许用户自定义三个子任务的处理逻辑,通过类似 UDF 的方式处理数据。**在一个 Pipe 中,上述的子任务分别由三种插件执行实现,数据会依次经过这三个插件进行处理:Pipe Extractor 用于抽取数据,Pipe Processor 用于处理数据,Pipe Connector 用于发送数据,最终数据将被发至外部系统。 + +**Pipe 任务的模型如下:** + +![任务模型图](https://alioss.timecho.com/docs/img/%E4%BB%BB%E5%8A%A1%E6%A8%A1%E5%9E%8B%E5%9B%BE.png) + + + +描述一个数据订阅任务,本质就是描述 Pipe Extractor、Pipe Processor 和 Pipe Connector 插件的属性。用户可以通过 SQL 语句声明式地配置三个子任务的具体属性,通过组合不同的属性,实现灵活的数据 ETL 能力。 + +利用数据订阅功能,可以搭建完整的数据链路来满足端*边云同步、异地灾备、读写负载分库*等需求。 + +# 快速开始 + +**🎯 目标:实现 IoTDB A -> IoTDB B 的全量数据订阅** + +- 启动两个 IoTDB,A(datanode -> 127.0.0.1:6667) B(datanode -> 127.0.0.1:6668) + +- 创建 A -> B 的 Pipe,在 A 
上执行 + + ```sql + create pipe a2b + with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 'connector.port'='6668' + ) + ``` + +- 启动 A -> B 的 Pipe,在 A 上执行 + + ```sql + start pipe a2b + ``` + +- 向 A 写入数据 + + ```sql + INSERT INTO root.db.d(time, m) values (1, 1) + ``` + +- 在 B 检查由 A 同步过来的数据 + + ```sql + SELECT ** FROM root + ``` + +> ❗️**注:目前的 IoTDB -> IoTDB 的数据订阅实现并不支持 DDL 同步** +> +> 即:不支持 ttl,trigger,别名,模板,视图,创建/删除序列,创建/删除数据库等操作**IoTDB -> IoTDB 的数据订阅要求目标端 IoTDB:** +> +> * 开启自动创建元数据:需要人工配置数据类型的编码和压缩与发送端保持一致 +> * 不开启自动创建元数据:手工创建与源端一致的元数据 + +# Pipe 同步任务管理 + +## 创建流水线 + +可以使用 `CREATE PIPE` 语句来创建一条数据订阅任务,SQL 语句如下所示: + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流水线任务的名字 +WITH EXTRACTOR ( + -- 默认的 IoTDB 数据抽取插件 + 'extractor' = 'iotdb-extractor', + -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + 'extractor.pattern' = 'root.timecho', + -- 是否抽取历史数据 + 'extractor.history.enable' = 'true', + -- 描述被抽取的历史数据的时间范围,表示最早时间 + 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', + -- 描述被抽取的历史数据的时间范围,表示最晚时间 + 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', + -- 是否抽取实时数据 + 'extractor.realtime.enable' = 'true', + -- 描述实时数据的抽取方式 + 'extractor.realtime.mode' = 'hybrid', +) +WITH PROCESSOR ( + -- 默认的数据处理插件,即不做任何处理 + 'processor' = 'do-nothing-processor', +) +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +**创建流水线时需要配置 PipeId 以及三个插件部分的参数:** + +| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +| --------- | ------------------------------------------------- | --------------------------- | -------------------- | -------------------------------------------------------- | ------------------------- | +| PipeId | 全局唯一标定一个同步流水线的名称 | 必填 | - | - | - | +| extractor | Pipe Extractor 插件,负责在数据库底层抽取同步数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入同步流水线 | 否 | +| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | +| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | + +示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据订阅任务。IoTDB 还内置了其他的数据订阅插件,**请查看“系统预置数据订阅插件”一节**。 + +**一个最简的 CREATE PIPE 语句示例如下:** + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流水线任务的名字 +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 + +**注意:** + +- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 + +- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 + +- CONNECTOR 具备自复用能力。对于不同的流水线,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 + + - 例如,有下面 pipe1, pipe2 两个流水线的声明: + + ```sql + CREATE PIPE pipe1 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.host' = 'localhost', + 'connector.thrift.port' = '9999', + ) + + CREATE PIPE pipe2 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.port' = '9999', + 'connector.thrift.host' = 'localhost', + 'connector.id' = '1', + ) + ``` + + - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 + +- 请不要构建出包含数据循环同步的应用场景(会导致无限循环): + + - IoTDB A -> IoTDB B -> IoTDB A + - IoTDB A -> IoTDB A + +## 启动流水线 + +CREATE PIPE 
语句成功执行后,流水线相关实例会被创建,但整个流水线的运行状态会被置为 STOPPED,即流水线不会立刻处理数据。 + +可以使用 START PIPE 语句使流水线开始处理数据: + +```sql +START PIPE +``` + +## 停止流水线 + +使用 STOP PIPE 语句使流水线停止处理数据: + +```sql +STOP PIPE +``` + +## 删除流水线 + +使用 DROP PIPE 语句使流水线停止处理数据(当流水线状态为 RUNNING 时),然后删除整个流水线同步任务: + +```sql +DROP PIPE +``` + +用户在删除流水线前,不需要执行 STOP 操作。 + +## 展示流水线 + +使用 SHOW PIPES 语句查看所有流水线: + +```sql +SHOW PIPES +``` + +查询结果如下: + +```sql ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +| ID| CreationTime | State|PipeExtractor|PipeProcessor|PipeConnector|ExceptionMessage| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| None| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +``` + +可以使用 `` 指定想看的某个同步任务状态: + +```sql +SHOW PIPE +``` + +您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 + +```sql +SHOW PIPES +WHERE CONNECTOR USED BY +``` + +## 流水线运行状态迁移 + +一个数据订阅 pipe 在其被管理的生命周期中会经过多种状态: + +- **STOPPED:**pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: + - 当一个 pipe 被成功创建之后,其初始状态为暂停状态 + - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED + - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED +- **RUNNING:**pipe 正在正常工作 +- **DROPPED:**pipe 任务被永久删除 + +下图表明了所有状态以及状态的迁移: + +![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +# **系统预置数据订阅插件** + +## 预置 extractor + +### iotdb-extractor + +作用:抽取 IoTDB 内部的历史或实时数据进入流水线。 + +| key | value | value 取值范围 | required or optional with default | +| ---------------------------- | ---------------------------------------------- | -------------------------------------- | --------------------------------- | +| extractor | iotdb-extractor | String: iotdb-extractor | required | +| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | +| extractor.history.enable | 是否同步历史数据 | Boolean: true, false | optional: true | +| extractor.history.start-time | 同步历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | 同步历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | 是否同步实时数据 | Boolean: true, false | optional: true | +| extractor.realtime.mode | 实时数据的抽取模式 | String: hybrid, log, file | optional: hybrid | + +> 🚫 **extractor.pattern 参数说明** +> +> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> +> * 在底层实现中,当检测到 pattern 为 root(默认值)时,同步效率较高,其他任意格式都将降低性能 +> +> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: +> +> * root.aligned.1TS +> * root.aligned.1TS.\`1\` +> * root.aligned.100TS +> +> 的数据会被同步; +> +> * root.aligned.\`1\` +> * root.aligned.\`123\` +> +> 的数据不会被同步。 +> +> * root.\_\_system 的数据不会被 pipe 抽取,即不会被同步到目标端。用户虽然可以在 extractor.pattern 中包含任意前缀,包括带有(或覆盖) root.\__system 的前缀,但是 root.__system 下的数据总是会被 pipe 忽略的 + + + +> ❗️**extractor.history 的 start-time,end-time 参数说明** +> +> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 + + + +> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> +> * **event 
time:**数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 +> * **arrival time:**数据到达 IoTDB 系统内的时间。 +> +> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 + + + +> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> +> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 +> 2. 实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> +> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> +> 用户可以指定 iotdb-extractor 进行: +> +> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * 禁止同时设置 extractor.history.enable 和 extractor.relatime.enable 为 false + + + +> 📌 **extractor.realtime.mode:数据抽取的模式** +> +> * log:该模式下,流水线仅使用操作日志进行数据处理、发送 +> * file:该模式下,流水线仅使用数据文件进行数据处理、发送 +> * hybrid:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 + +## 预置 processor + +### do-nothing-processor + +作用:不对 extractor 传入的事件做任何的处理。 + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| processor | do-nothing-processor | String: do-nothing-processor | required | + +## 预置 connector + +### iotdb-thrift-connector-v1(别名:iotdb-thrift-connector) + +作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。使用 Thrift RPC 框架传输数据,单线程 blocking IO 模型。保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致。 + +限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 + +| key | value | value 取值范围 | required or optional with default | +| -------------- | --------------------------------------------------- | ----------------------------------------------------------- | --------------------------------- | +| connector | iotdb-thrift-connector 或 iotdb-thrift-connector-v1 | String: iotdb-thrift-connector 或 iotdb-thrift-connector-v1 | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | required | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +### iotdb-thrift-connector-v2 + +作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景。不保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致,但是保证数据发送的完整性(at-least-once)。 + +限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 + +| key | value | value 取值范围 | required or optional with default | +| ------------------- | ------------------------------------------------------- | ------------------------------------------------------------ | --------------------------------- | +| connector | iotdb-thrift-connector-v2 | String: iotdb-thrift-connector-v2 | required | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669''127.0.0.1:6667' | required | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +### iotdb-sync-connector + +作用:主要用于 IoTDB(v1.2.0+)向更低版本的 IoTDB 传输数据,使用 v1.2.0 版本前的数据同步(Sync)协议。使用 Thrift RPC 框架传输数据。单线程 sync blocking IO 模型,传输性能较弱。 + +限制:源端 IoTDB 版本需要在 v1.2.0+,目标端 IoTDB 版本可以是 v1.2.0+、v1.1.x(更低版本的 IoTDB 理论上也支持,但是未经测试)。 + +注意:理论上 v1.2.0+ IoTDB 可作为 v1.2.0 版本前的任意版本的数据同步(Sync)接收端。 + +| key | value | value 取值范围 | required or optional with 
default | +| ------------------ | ------------------------------------------------------------ | ---------------------------- | --------------------------------- | +| connector | iotdb-sync-connector | String: iotdb-sync-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | required | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | +| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | +| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | +| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | optional: 1.1 | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +### do-nothing-connector + +作用:不对 processor 传入的事件做任何的处理。 + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| connector | do-nothing-connector | String: do-nothing-connector | required | + +# 自定义数据订阅插件开发 + +## 编程开发依赖 + +推荐采用 maven 构建项目,在`pom.xml`中添加以下依赖。请注意选择和 IoTDB 服务器版本相同的依赖版本。 + +```xml + + org.apache.iotdb + pipe-api + 1.2.0 + provided + +``` + +## 事件驱动编程模型 + +数据订阅插件的用户编程接口设计,参考了事件驱动编程模型的通用设计理念。事件(Event)是用户编程接口中的数据抽象,而编程接口与具体的执行方式解耦,只需要专注于描述事件(数据)到达系统后,系统期望的处理方式即可。 + +在数据订阅插件的用户编程接口中,事件是数据库数据写入操作的抽象。事件由单机同步引擎捕获,按照同步三个阶段的流程,依次传递至 PipeExtractor 插件,PipeProcessor 插件和 PipeConnector 插件,并依次在三个插件中触发用户逻辑的执行。 + +为了兼顾端侧低负载场景下的同步低延迟和端侧高负载场景下的同步高吞吐,同步引擎会动态地在操作日志和数据文件中选择处理对象,因此,同步的用户编程接口要求用户提供下列两类事件的处理逻辑:操作日志写入事件 TabletInsertionEvent 和数据文件写入事件 TsFileInsertionEvent。 + +### **操作日志写入事件(TabletInsertionEvent)** + +操作日志写入事件(TabletInsertionEvent)是对用户写入请求的高层数据抽象,它通过提供统一的操作接口,为用户提供了操纵写入请求底层数据的能力。 + +对于不同的数据库部署方式,操作日志写入事件对应的底层存储结构是不一样的。对于单机部署的场景,操作日志写入事件是对写前日志(WAL)条目的封装;对于分布式部署的场景,操作日志写入事件是对单个节点共识协议操作日志条目的封装。 + +对于数据库不同写入请求接口生成的写入操作,操作日志写入事件对应的请求结构体的数据结构也是不一样的。IoTDB 提供了 InsertRecord、InsertRecords、InsertTablet、InsertTablets 等众多的写入接口,每一种写入请求都使用了完全不同的序列化方式,生成的二进制条目也不尽相同。 + +操作日志写入事件的存在,为用户提供了一种统一的数据操作视图,它屏蔽了底层数据结构的实现差异,极大地降低了用户的编程门槛,提升了功能的易用性。 + +```java +/** TabletInsertionEvent is used to define the event of data insertion. */ +public interface TabletInsertionEvent extends Event { + + /** + * The consumer processes the data row by row and collects the results by RowCollector. + * + * @return Iterable a list of new TabletInsertionEvent contains the results + * collected by the RowCollector + */ + Iterable processRowByRow(BiConsumer consumer); + + /** + * The consumer processes the Tablet directly and collects the results by RowCollector. + * + * @return Iterable a list of new TabletInsertionEvent contains the results + * collected by the RowCollector + */ + Iterable processTablet(BiConsumer consumer); +} +``` + +### **数据文件写入事件(TsFileInsertionEvent)** + +数据文件写入事件(TsFileInsertionEvent) 是对数据库文件落盘操作的高层抽象,它是若干操作日志写入事件(TabletInsertionEvent)的数据集合。 + +IoTDB 的存储引擎是 LSM 结构的。数据写入时会先将写入操作落盘到日志结构的文件里,同时将写入数据保存在内存里。当内存达到控制上限,则会触发刷盘行为,即将内存中的数据转换为数据库文件,同时删除之前预写的操作日志。当内存中的数据转换为数据库文件中的数据时,会经过编码压缩和通用压缩两次压缩处理,因此数据库文件的数据相比内存中的原始数据占用的空间更少。 + +在极端的网络情况下,直接传输数据文件相比传输数据写入的操作要更加经济,它会占用更低的网络带宽,能实现更快的传输速度。当然,天下没有免费的午餐,对文件中的数据进行计算处理,相比直接对内存中的数据进行计算处理时,需要额外付出文件 I/O 的代价。但是,正是磁盘数据文件和内存写入操作两种结构各有优劣的存在,给了系统做动态权衡调整的机会,也正是基于这样的观察,插件的事件模型中才引入了数据文件写入事件。 + +综上,数据文件写入事件出现在同步插件的事件流中,存在下面两种情况: + +(1)历史数据抽取:一个同步任务开始前,所有已经落盘的写入数据都会以 TsFile 的形式存在。一个同步任务开始后,采集历史数据时,历史数据将以 TsFileInsertionEvent 作为抽象; + +1. 
(2)实时数据抽取:一个同步任务进行时,当数据流中实时处理操作日志写入事件的速度慢于写入请求速度一定进度之后,未来得及处理的操作日志写入事件会被被持久化至磁盘,以 TsFile 的形式存在,这一些数据被同步引擎采集到后,会以 TsFileInsertionEvent 作为抽象。 + +```java +/** + * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks, + * which is compressed and encoded, and requires IO cost for computational processing. + */ +public interface TsFileInsertionEvent extends Event { + + /** + * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents. + * + * @return the list of TsFileInsertionEvent + */ + Iterable toTabletInsertionEvents(); +} +``` + +## 自定义数据订阅插件编程接口定义 + +基于自定义数据订阅插件编程接口,用户可以轻松编写数据抽取插件、 数据处理插件和数据发送插件,从而使得同步功能灵活适配各种工业场景。 + +### 数据抽取插件接口 + +数据抽取是同步数据从数据抽取到数据发送三阶段的第一阶段。数据抽取插件(PipeExtractor)是同步引擎和存储引擎的桥梁,它通过监听存储引擎的行为,捕获各种数据写入事件。 + +```java +/** + * PipeExtractor + * + *

PipeExtractor is responsible for capturing events from sources. + * + *

Various data sources can be supported by implementing different PipeExtractor classes. + * + *

The lifecycle of a PipeExtractor is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH EXTRACTOR` clause in SQL are + * parsed and the validation method {@link PipeExtractor#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} will be called + * to config the runtime behavior of the PipeExtractor. + *
  • Then the method {@link PipeExtractor#start()} will be called to start the PipeExtractor. + *
  • While the collaboration task is in progress, the method {@link PipeExtractor#supply()} will + * be called to capture events from sources and then the events will be passed to the + * PipeProcessor. + *
  • The method {@link PipeExtractor#close()} will be called when the collaboration task is + * cancelled (the `DROP PIPE` command is executed). + *
+ */ +public interface PipeExtractor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeExtractor. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeExtractorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeExtractor#validate(PipeParameterValidator)} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeExtractor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeExtractorRuntimeConfiguration configuration) + throws Exception; + + /** + * Start the extractor. After this method is called, events should be ready to be supplied by + * {@link PipeExtractor#supply()}. This method is called after {@link + * PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} is called. + * + * @throws Exception the user can throw errors if necessary + */ + void start() throws Exception; + + /** + * Supply single event from the extractor and the caller will send the event to the processor. + * This method is called after {@link PipeExtractor#start()} is called. + * + * @return the event to be supplied. the event may be null if the extractor has no more events at + * the moment, but the extractor is still running for more events. + * @throws Exception the user can throw errors if necessary + */ + Event supply() throws Exception; +} +``` + +### 数据处理插件接口 + +数据处理是同步数据从数据抽取到数据发送三阶段的第二阶段。数据处理插件(PipeProcessor)主要用于过滤和转换由数据抽取插件(PipeExtractor)捕获的各种事件。 + +```java +/** + * PipeProcessor + * + *

PipeProcessor is used to filter and transform the Event formed by the PipeExtractor. + * + *

The lifecycle of a PipeProcessor is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are + * parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called + * to config the runtime behavior of the PipeProcessor. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeExtractor captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeConnector. The + * following 3 methods will be called: {@link + * PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link + * PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link + * PipeProcessor#process(Event, EventCollector)}. + *
    • PipeConnector serializes the events into binaries and send them to sinks. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeProcessor#close() } method will be called. + *
+ */ +public interface PipeProcessor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeProcessor. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeProcessorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the + * events processing. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeProcessor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is called to process the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) + throws Exception; + + /** + * This method is called to process the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) + throws Exception; + + /** + * This method is called to process the Event. + * + * @param event Event to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(Event event, EventCollector eventCollector) throws Exception; +} +``` + +### 数据发送插件接口 + +数据发送是同步数据从数据抽取到数据发送三阶段的第三阶段。数据发送插件(PipeConnector)主要用于发送经由数据处理插件(PipeProcessor)处理过后的各种事件,它作为数据订阅框架的网络实现层,接口上应允许接入多种实时通信协议和多种连接器。 + +```java +/** + * PipeConnector + * + *

PipeConnector is responsible for sending events to sinks. + * + *

Various network protocols can be supported by implementing different PipeConnector classes. + * + *

The lifecycle of a PipeConnector is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH CONNECTOR` clause in SQL are + * parsed and the validation method {@link PipeConnector#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} will be called + * to config the runtime behavior of the PipeConnector and the method {@link + * PipeConnector#handshake()} will be called to create a connection with sink. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeExtractor captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeConnector. + *
    • PipeConnector serializes the events into binaries and send them to sinks. The + * following 3 methods will be called: {@link + * PipeConnector#transfer(TabletInsertionEvent)}, {@link + * PipeConnector#transfer(TsFileInsertionEvent)} and {@link + * PipeConnector#transfer(Event)}. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeConnector#close() } method will be called. + *
+ * + *

In addition, the method {@link PipeConnector#heartbeat()} will be called periodically to check + * whether the connection with sink is still alive. The method {@link PipeConnector#handshake()} + * will be called to create a new connection with the sink when the method {@link + * PipeConnector#heartbeat()} throws exceptions. + */ +public interface PipeConnector extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeConnector. In this method, the user can do the + * following things: + * + *

    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeConnectorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeConnector#validate(PipeParameterValidator)} is called and before the method {@link + * PipeConnector#handshake()} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeConnector + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeConnectorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is used to create a connection with sink. This method will be called after the + * method {@link PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} is + * called or will be called when the method {@link PipeConnector#heartbeat()} throws exceptions. + * + * @throws Exception if the connection is failed to be created + */ + void handshake() throws Exception; + + /** + * This method will be called periodically to check whether the connection with sink is still + * alive. + * + * @throws Exception if the connection dies + */ + void heartbeat() throws Exception; + + /** + * This method is used to transfer the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; + + /** + * This method is used to transfer the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception; + + /** + * This method is used to transfer the Event. 
+ * + * @param event Event to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(Event event) throws Exception; +} +``` + +# 自定义数据订阅插件管理 + +为了保证用户自定义插件在实际生产中的灵活性和易用性,系统还需要提供对插件进行动态统一管理的能力。本章节介绍的数据订阅插件管理语句提供了对插件进行动态统一管理的入口。 + +## 加载插件语句 + +在 IoTDB 中,若要在系统中动态载入一个用户自定义插件,则首先需要基于 PipeExtractor、 PipeProcessor 或者 PipeConnector 实现一个具体的插件类,然后需要将插件类编译打包成 jar 可执行文件,最后使用加载插件的管理语句将插件载入 IoTDB。 + +加载插件的管理语句的语法如图所示。 + +```sql +CREATE PIPEPLUGIN <别名> +AS <全类名> +USING +``` + +例如,用户实现了一个全类名为 edu.tsinghua.iotdb.pipe.ExampleProcessor 的数据处理插件,打包后的 jar 资源包存放到了 https://example.com:8080/iotdb/pipe-plugin.jar 上,用户希望在同步引擎中使用这个插件,将插件标记为 example。那么,这个数据处理插件的创建语句如图所示。 + +```sql +CREATE PIPEPLUGIN example +AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' +USING URI '' +``` + +## 删除插件语句 + +当用户不再想使用一个插件,需要将插件从系统中卸载时,可以使用如图所示的删除插件语句。 + +```sql +DROP PIPEPLUGIN <别名> +``` + +## 查看插件语句 + +用户也可以按需查看系统中的插件。查看插件的语句如图所示。 + +```sql +SHOW PIPEPLUGINS +``` + +# 权限管理 + +## Pipe 任务 + +| 权限名称 | 描述 | +|----------|-------------| +| USE_PIPE | 注册流水线。路径无关。 | +| USE_PIPE | 开启流水线。路径无关。 | +| USE_PIPE | 停止流水线。路径无关。 | +| USE_PIPE | 卸载流水线。路径无关。 | +| USE_PIPE | 查询流水线。路径无关。 | + +## Pipe 插件 + +| 权限名称 | 描述 | +|----------|---------------| +| USE_PIPE | 注册流水线插件。路径无关。 | +| USE_PIPE | 开启流水线插件。路径无关。 | +| USE_PIPE | 查询流水线插件。路径无关。 | + +# 功能特性 + +## 最少一次语义保证 **at-least-once** + +数据订阅功能向外部系统传输数据时,提供 at-least-once 的传输语义。在大部分场景下,同步功能可提供 exactly-once 保证,即所有数据被恰好同步一次。 + +但是在以下场景中,可能存在部分数据被同步多次**(断点续传)**的情况: + +- 临时的网络故障:某次数据传输请求失败后,系统会进行重试发送,直至到达最大尝试次数 +- Pipe 插件逻辑实现异常:插件运行中抛出错误,系统会进行重试发送,直至到达最大尝试次数 +- 数据节点宕机、重启等导致的数据分区切主:分区变更完成后,受影响的数据会被重新传输 +- 集群不可用:集群可用后,受影响的数据会重新传输 + +## 源端:数据写入与 Pipe 处理、发送数据异步解耦 + +数据订阅功能中,数据传输采用的是异步复制模式。 + +数据订阅与写入操作完全脱钩,不存在对写入关键路径的影响。该机制允许框架在保证持续数据订阅的前提下,保持时序数据库的写入速度。 + +## 源端:可自适应数据写入负载的数据传输策略 + +支持根据写入负载,动态调整数据传输方式,同步默认使用 TsFile 文件与操作流动态混合传输(`'extractor.realtime.mode'='hybrid'`)。 + +在数据写入负载高时,优先选择 TsFile 传输的方式。TsFile 压缩比高,节省网络带宽。 + +在数据写入负载低时,优先选择操作流同步传输的方式。操作流传输实时性高。 + +## 源端:高可用集群部署时,Pipe 服务高可用 + +当发送端 IoTDB 为高可用集群部署模式时,数据订阅服务也将是高可用的。 数据订阅框架将监控每个数据节点的数据订阅进度,并定期做轻量级的分布式一致性快照以保存同步状态。 + +- 当发送端集群某数据节点宕机时,数据订阅框架可以利用一致性快照以及保存在副本上的数据快速恢复同步,以此实现数据订阅服务的高可用。 +- 当发送端集群整体宕机并重启时,数据订阅框架也能使用快照恢复同步服务。 + +# 配置参数 + +在 iotdb-system.properties 中: + +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Window platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +pipe_lib_dir=ext/pipe + +# The name of the directory that stores the tsfiles temporarily hold or generated by the pipe module. +# The directory is located in the data directory of IoTDB. +pipe_hardlink_tsfile_dir_name=pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +pipe_subtask_executor_max_thread_num=5 + +# The number of events that need to be consumed before a checkpoint is triggered. +pipe_subtask_executor_basic_check_point_interval_by_consumed_event_count=10000 + +# The time duration (in milliseconds) between checkpoints. +pipe_subtask_executor_basic_check_point_interval_by_time_duration=10000 + +# The maximum blocking time (in milliseconds) for the pending queue. 
+pipe_subtask_executor_pending_queue_max_blocking_time_ms=1000 + +# The default size of ring buffer in the realtime extractor's disruptor queue. +pipe_extractor_assigner_disruptor_ring_buffer_size=65536 + +# The maximum number of entries the deviceToExtractorsCache can hold. +pipe_extractor_matcher_cache_size=1024 + +# The capacity for the number of tablet events that can be stored in the pending queue of the hybrid realtime extractor. +pipe_extractor_pending_queue_capacity=128 + +# The limit for the number of tablet events that can be held in the pending queue of the hybrid realtime extractor. +# Noted that: this should be less than or equals to realtimeExtractorPendingQueueCapacity +pipe_extractor_pending_queue_tablet_limit=64 + +# The buffer size used for reading file during file transfer. +pipe_connector_read_file_buffer_size=8388608 + +# The delay period (in milliseconds) between each retry when a connection failure occurs. +pipe_connector_retry_interval_ms=1000 + +# The size of the pending queue for the PipeConnector to store the events. +pipe_connector_pending_queue_size=1024 + +# The number of heartbeat loop cycles before collecting pipe meta once +pipe_heartbeat_loop_cycles_for_collecting_pipe_meta=100 + +# The initial delay before starting the PipeMetaSyncer service. +pipe_meta_syncer_initial_sync_delay_minutes=3 + +# The sync regular interval (in minutes) for the PipeMetaSyncer service. +pipe_meta_syncer_sync_interval_minutes=3 +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Last-Query.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Last-Query.md new file mode 100644 index 00000000..ffe61bf0 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Last-Query.md @@ -0,0 +1,113 @@ + + +# 最新点查询 + +最新点查询是时序数据库 Apache IoTDB 中提供的一种特殊查询。它返回指定时间序列中时间戳最大的数据点,即一条序列的最新状态。 + +在物联网数据分析场景中,此功能尤为重要。为了满足了用户对设备实时监控的需求,Apache IoTDB 对最新点查询进行了**缓存优化**,能够提供毫秒级的返回速度。 + +### 相关配置项 +IoTDB 在 `iotdb-system.properties` 中提供了 `enable_last_cache` 和 `schema_memory_proportion` 两个配置参数,分别用于开启/关闭最新点缓存,以及控制打开最新点缓存后的内存占用。 + +#### enable_last_cache + +`enable_last_cache` 为 `true` 时,开启最新点缓存;为 `false` 时,关闭最新点缓存。 + +#### schema_memory_proportion + +指定了 SchemaRegion, SchemaCache 以及 PartitionCache的内存分配比例,最新点缓存在 SchemaCache 中,所以可以通过调整这个参数,达到调整最新点缓存内存占用的效果。 +默认为 `5:4:1`,即最新点缓存所在的 SchemaCache,占用元数据内存的 40%。 + +### SQL 语法: + +```sql +select last [COMMA ]* from < PrefixPath > [COMMA < PrefixPath >]* [ORDER BY TIMESERIES (DESC | ASC)?] 
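+-- 一个示意用法(沿用下文示例 2 中的序列路径):
+-- select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00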
+``` + +其含义是: 查询时间序列 prefixPath.path 中最近时间戳的数据。 + +- `whereClause` 中当前只支持时间过滤条件,任何其他过滤条件都将会返回异常。当缓存的最新点不满足过滤条件时,IoTDB 需要从存储中获取结果,此时性能将会有所下降。 + +- 结果集为四列的结构: + + ``` + +----+----------+-----+--------+ + |Time|timeseries|value|dataType| + +----+----------+-----+--------+ + ``` + +- 可以使用 `ORDER BY TIME/TIMESERIES/VALUE/DATATYPE (DESC | ASC)` 指定结果集按照某一列进行降序/升序排列。当值列包含多种类型的数据时,按照字符串类型来排序。 + +**示例 1:** 查询 root.ln.wf01.wt01.status 的最新数据点 + +``` +IoTDB> select last status from root.ln.wf01.wt01 ++-----------------------------+------------------------+-----+--------+ +| Time| timeseries|value|dataType| ++-----------------------------+------------------------+-----+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.status|false| BOOLEAN| ++-----------------------------+------------------------+-----+--------+ +Total line number = 1 +It costs 0.000s +``` + +**示例 2:** 查询 root.ln.wf01.wt01 下 status,temperature 时间戳大于等于 2017-11-07T23:50:00 的最新数据点。 + +``` +IoTDB> select last status, temperature from root.ln.wf01.wt01 where time >= 2017-11-07T23:50:00 ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 3:** 查询 root.ln.wf01.wt01 下所有序列的最新数据点,并按照序列名降序排列。 + +``` +IoTDB> select last * from root.ln.wf01.wt01 order by timeseries desc; ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 4:** 查询 root.ln.wf01.wt01 下所有序列的最新数据点,并按照dataType降序排列。 + +``` +IoTDB> select last * from root.ln.wf01.wt01 order by dataType desc; ++-----------------------------+-----------------------------+---------+--------+ +| Time| timeseries| value|dataType| ++-----------------------------+-----------------------------+---------+--------+ +|2017-11-07T23:59:00.000+08:00|root.ln.wf01.wt01.temperature|21.067368| DOUBLE| +|2017-11-07T23:59:00.000+08:00| root.ln.wf01.wt01.status| false| BOOLEAN| ++-----------------------------+-----------------------------+---------+--------+ +Total line number = 2 +It costs 0.002s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/CSV-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/CSV-Tool.md new file mode 100644 index 00000000..c8f32d37 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/CSV-Tool.md @@ -0,0 +1,261 @@ + + +# 导入导出 CSV + +CSV 工具可帮您将 CSV 格式的数据导入到 IoTDB 或者将数据从 IoTDB 导出到 CSV 文件。 + +## 使用 export-csv.sh + +### 运行方法 + +```shell +# Unix/OS X +> tools/export-csv.sh -h -p -u -pw -td [-tf -datatype -q -s ] + +# Windows +> tools\export-csv.bat -h -p -u -pw -td [-tf -datatype -q -s ] +``` + +参数: + +* `-datatype`: + - true (默认): 在CSV文件的header中时间序列的后面打印出对应的数据类型。例如:`Time, root.sg1.d1.s1(INT32), root.sg1.d1.s2(INT64)`. 
+ - false: 只在CSV的header中打印出时间序列的名字, `Time, root.sg1.d1.s1 , root.sg1.d1.s2` +* `-q `: + - 在命令中直接指定想要执行的查询语句。 + - 例如: `select * from root.** limit 100`, or `select * from root.** limit 100 align by device` +* `-s `: + - 指定一个SQL文件,里面包含一条或多条SQL语句。如果一个SQL文件中包含多条SQL语句,SQL语句之间应该用换行符进行分割。每一条SQL语句对应一个输出的CSV文件。 +* `-td `: + - 为导出的CSV文件指定输出路径。 +* `-tf `: + - 指定一个你想要得到的时间格式。时间格式必须遵守[ISO 8601](https://calendars.wikia.org/wiki/ISO_8601)标准。如果说你想要以时间戳来保存时间,那就设置为`-tf timestamp`。 + - 例如: `-tf yyyy-MM-dd\ HH:mm:ss` or `-tf timestamp` +* `-linesPerFile `: + - 指定导出的dump文件最大行数,默认值为`10000`。 + - 例如: `-linesPerFile 1` +* `-t `: + - 指定session查询时的超时时间,单位为ms + +除此之外,如果你没有使用`-s`和`-q`参数,在导出脚本被启动之后你需要按照程序提示输入查询语句,不同的查询结果会被保存到不同的CSV文件中。 + +### 运行示例 + +```shell +# Unix/OS X +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss +# or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 + +# Windows +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss +# or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 +``` + +### SQL 文件示例 + +```sql +select * from root.**; +select * from root.** align by device; +``` + +`select * from root.**`的执行结果: + +```sql +Time,root.ln.wf04.wt04.status(BOOLEAN),root.ln.wf03.wt03.hardware(TEXT),root.ln.wf02.wt02.status(BOOLEAN),root.ln.wf02.wt02.hardware(TEXT),root.ln.wf01.wt01.hardware(TEXT),root.ln.wf01.wt01.status(BOOLEAN) +1970-01-01T08:00:00.001+08:00,true,"v1",true,"v1",v1,true +1970-01-01T08:00:00.002+08:00,true,"v1",,,,true +``` + +`select * from root.** align by device`的执行结果: + +```sql +Time,Device,hardware(TEXT),status(BOOLEAN) +1970-01-01T08:00:00.001+08:00,root.ln.wf01.wt01,"v1",true +1970-01-01T08:00:00.002+08:00,root.ln.wf01.wt01,,true +1970-01-01T08:00:00.001+08:00,root.ln.wf02.wt02,"v1",true +1970-01-01T08:00:00.001+08:00,root.ln.wf03.wt03,"v1", +1970-01-01T08:00:00.002+08:00,root.ln.wf03.wt03,"v1", +1970-01-01T08:00:00.001+08:00,root.ln.wf04.wt04,,true +1970-01-01T08:00:00.002+08:00,root.ln.wf04.wt04,,true +``` + +布尔类型的数据用`true`或者`false`来表示,此处没有用双引号括起来。文本数据需要使用双引号括起来。 + +### 注意 + +注意,如果导出字段存在如下特殊字符: + +1. 
`,`: 导出程序会在`,`字符前加`\`来进行转义。 + +## 使用 import-csv.sh + +### 创建元数据 (可选) + +```sql +CREATE DATABASE root.fit.d1; +CREATE DATABASE root.fit.d2; +CREATE DATABASE root.fit.p; +CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; +CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; +``` + +IoTDB 具有类型推断的能力,因此在数据导入前创建元数据不是必须的。但我们仍然推荐在使用 CSV 导入工具导入数据前创建元数据,因为这可以避免不必要的类型转换错误。 + +### 待导入 CSV 文件示例 + +通过时间对齐,并且header中不包含数据类型的数据。 + +```sql +Time,root.test.t1.str,root.test.t2.str,root.test.t2.int +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,"123",, +``` + +通过时间对齐,并且header中包含数据类型的数据。(Text类型数据支持加双引号和不加双引号) + +```sql +Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.int(INT32) +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,123,hello world,123 +1970-01-01T08:00:00.003+08:00,"123",, +1970-01-01T08:00:00.004+08:00,123,,12 +``` + +通过设备对齐,并且header中不包含数据类型的数据。 + +```sql +Time,Device,str,int +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +``` + +通过设备对齐,并且header中包含数据类型的数据。(Text类型数据支持加双引号和不加双引号) + +```sql +Time,Device,str(TEXT),int(INT32) +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123 +``` + +### 运行方法 + +```shell +# Unix/OS X +>tools/import-csv.sh -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] [-linesPerFailedFile ] +# Windows +>tools\import-csv.bat -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] [-linesPerFailedFile ] +``` + +参数: + +* `-f`: + - 指定你想要导入的数据,这里可以指定文件或者文件夹。如果指定的是文件夹,将会把文件夹中所有的后缀为txt与csv的文件进行批量导入。 + - 例如: `-f filename.csv` + +* `-fd`: + - 指定一个目录来存放保存失败的行的文件,如果你没有指定这个参数,失败的文件将会被保存到源数据的目录中,然后文件名是源文件名加上`.failed`的后缀。 + - 例如: `-fd ./failed/` + +* `-aligned`: + - 是否使用`aligned`接口? 默认参数为`false`。 + - 例如: `-aligned true` + +* `-batch`: + - 用于指定每一批插入的数据的点数。如果程序报了`org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`这个错的话,就可以适当的调低这个参数。 + - 例如: `-batch 100000`,`100000`是默认值。 + +* `-tp`: + - 用于指定时间精度,可选值包括`ms`(毫秒),`ns`(纳秒),`us`(微秒),默认值为`ms`。 + +* `-typeInfer `: + - 用于指定类型推断规则. + - `srcTsDataType` 包括 `boolean`,`int`,`long`,`float`,`double`,`NaN`. + - `dstTsDataType` 包括 `boolean`,`int`,`long`,`float`,`double`,`text`. + - 当`srcTsDataType`为`boolean`, `dstTsDataType`只能为`boolean`或`text`. + - 当`srcTsDataType`为`NaN`, `dstTsDataType`只能为`float`, `double`或`text`. + - 当`srcTsDataType`为数值类型, `dstTsDataType`的精度需要高于`srcTsDataType`. 
+ - 例如:`-typeInfer boolean=text,float=double` + +* `-linesPerFailedFile `: + - 用于指定每个导入失败文件写入数据的行数,默认值为10000。 + - 例如:`-linesPerFailedFile 1` + +### 运行示例 + +```sh +# Unix/OS X +>tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed +# or +>tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 +# Windows +>tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv +# or +>tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 +``` + +### 注意 + +注意,在导入数据前,需要特殊处理下列的字符: + +1. `,` :如果text类型的字段中包含`,`那么需要用`\`来进行转义。 +2. 你可以导入像`yyyy-MM-dd'T'HH:mm:ss`, `yyy-MM-dd HH:mm:ss`, 或者 `yyyy-MM-dd'T'HH:mm:ss.SSSZ`格式的时间。 +3. `Time`这一列应该放在第一列。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/IoTDB-Data-Dir-Overview-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/IoTDB-Data-Dir-Overview-Tool.md new file mode 100644 index 00000000..6c0e88b7 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/IoTDB-Data-Dir-Overview-Tool.md @@ -0,0 +1,82 @@ + + +# IoTDB数据文件夹概览工具 + +IoTDB数据文件夹概览工具用于打印出数据文件夹的结构概览信息,工具位置为 tools/tsfile/print-iotdb-data-dir。 + +## 用法 + +- Windows: + +```bash +.\print-iotdb-data-dir.bat (<输出结果的存储路径>) +``` + +- Linux or MacOs: + +```shell +./print-iotdb-data-dir.sh (<输出结果的存储路径>) +``` + +注意:如果没有设置输出结果的存储路径, 将使用相对路径"IoTDB_data_dir_overview.txt"作为默认值。 + +## 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-iotdb-data-dir.bat D:\github\master\iotdb\data\datanode\data +```````````````````````` +Starting Printing the IoTDB Data Directory Overview +```````````````````````` +output save path:IoTDB_data_dir_overview.txt +data dir num:1 +143 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. 
+|============================================================== +|D:\github\master\iotdb\data\datanode\data +|--sequence +| |--root.redirect0 +| | |--1 +| | | |--0 +| |--root.redirect1 +| | |--2 +| | | |--0 +| |--root.redirect2 +| | |--3 +| | | |--0 +| |--root.redirect3 +| | |--4 +| | | |--0 +| |--root.redirect4 +| | |--5 +| | | |--0 +| |--root.redirect5 +| | |--6 +| | | |--0 +| |--root.sg1 +| | |--0 +| | | |--0 +| | | |--2760 +|--unsequence +|============================================================== +````````````````````````` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/JMX-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/JMX-Tool.md new file mode 100644 index 00000000..9d382bbd --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/JMX-Tool.md @@ -0,0 +1,59 @@ + + +# JMX 工具 + +Java VisualVM 提供了一个可视化的界面,用于查看 Java 应用程序在 Java 虚拟机(JVM)上运行的详细信息,并对这些应用程序进行故障排除和分析。 + +## 使用 + +第一步:获得 IoTDB-server。 + +第二步:编辑配置文件 + +* IoTDB 在本地 +查看`$IOTDB_HOME/conf/jmx.password`,使用默认用户或者在此添加新用户 +若新增用户,编辑`$IOTDB_HOME/conf/jmx.access`,添加新增用户权限 + +* IoTDB 不在本地 +编辑`$IOTDB_HOME/conf/datanode-env.sh` +修改以下参数: +``` +JMX_LOCAL="false" +JMX_IP="the_real_iotdb_server_ip" # 填写实际 IoTDB 的 IP 地址 +``` +查看`$IOTDB_HOME/conf/jmx.password`,使用默认用户或者在此添加新用户 +若新增用户,编辑`$IOTDB_HOME/conf/jmx.access`,添加新增用户权限 + +第三步:启动 IoTDB-server。 + +第四步:使用 jvisualvm +1. 确保安装 jdk 8。jdk 8 以上需要 [下载 visualvm](https://visualvm.github.io/download.html) +2. 打开 jvisualvm +3. 在左侧导航栏空白处右键 -> 添加 JMX 连接 + + +4. 填写信息进行登录,按下图分别填写,注意需要勾选”不要求 SSL 连接”。 +例如: +连接:192.168.130.15:31999 +用户名:iotdb +口令:passw!d + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Load-Tsfile.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Load-Tsfile.md new file mode 100644 index 00000000..b92bde4f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Load-Tsfile.md @@ -0,0 +1,110 @@ + + +# 加载 TsFile + +## 介绍 +加载外部 tsfile 文件工具允许用户向正在运行中的 Apache IoTDB 中加载 tsfile 文件。或者您也可以使用脚本的方式将tsfile加载进IoTDB。 + +## 使用SQL加载 +用户通过 Cli 工具或 JDBC 向 Apache IoTDB 系统发送指定命令实现文件加载的功能。 + +### 加载 tsfile 文件 + +加载 tsfile 文件的指令为:`load '' [sglevel=int][verify=true/false][onSuccess=delete/none]` + +该指令有两种用法: + +1. 
通过指定文件路径(绝对路径)加载单 tsfile 文件。 + +第一个参数表示待加载的 tsfile 文件的路径。load 命令有三个可选项,分别是 sglevel,值域为整数,verify,值域为 true/false,onSuccess,值域为delete/none。不同选项之间用空格隔开,选项之间无顺序要求。 + +SGLEVEL 选项,当 tsfile 对应的 database 不存在时,用户可以通过 sglevel 参数的值来制定 database 的级别,默认为`iotdb-system.properties`中设置的级别。例如当设置 level 参数为1时表明此 tsfile 中所有时间序列中层级为1的前缀路径是 database,即若存在设备 root.sg.d1.s1,此时 root.sg 被指定为 database。 + +VERIFY 选项表示是否对载入的 tsfile 中的所有时间序列进行元数据检查,默认为 true。开启时,若载入的 tsfile 中的时间序列在当前 iotdb 中也存在,则会比较该时间序列的所有 Measurement 的数据类型是否一致,如果出现不一致将会导致载入失败,关闭该选项会跳过检查,载入更快。 + +ONSUCCESS选项表示对于成功载入的tsfile的处置方式,默认为delete,即tsfile成功加载后将被删除,如果是none表明tsfile成功加载之后依然被保留在源文件夹。 + +若待加载的 tsfile 文件对应的`.resource`文件存在,会被一并加载至 Apache IoTDB 数据文件的目录和引擎中,否则将通过 tsfile 文件重新生成对应的`.resource`文件,即加载的 tsfile 文件所对应的`.resource`文件不是必要的。 + +示例: + +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true onSuccess=none` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1 onSuccess=delete` + + +2. 通过指定文件夹路径(绝对路径)批量加载文件。 + +第一个参数表示待加载的 tsfile 文件夹的路径。选项意义与加载单个 tsfile 文件相同。 + +示例: + +* `load '/Users/Desktop/data'` +* `load '/Users/Desktop/data' verify=false` +* `load '/Users/Desktop/data' verify=true` +* `load '/Users/Desktop/data' verify=true sglevel=1` +* `load '/Users/Desktop/data' verify=false sglevel=1 onSuccess=delete` + +**注意**,如果`$IOTDB_HOME$/conf/iotdb-system.properties`中`enable_auto_create_schema=true`时会在加载tsfile的时候自动创建tsfile中的元数据,否则不会自动创建。 + +## 使用脚本加载 + +若您在Windows环境中,请运行`$IOTDB_HOME/tools/load-tsfile.bat`,若为Linux或Unix,请运行`load-tsfile.sh` + +```bash +./load-tsfile.bat -f filePath [-h host] [-p port] [-u username] [-pw password] [--sgLevel int] [--verify true/false] [--onSuccess none/delete] +-f 待加载的文件或文件夹路径,必要字段 +-h IoTDB的Host地址,可选,默认127.0.0.1 +-p IoTDB的端口,可选,默认6667 +-u IoTDb登录用户名,可选,默认root +-pw IoTDB登录密码,可选,默认root +--sgLevel 加载TsFile自动创建Database的路径层级,可选,默认值为iotdb-system.properties指定值 +--verify 是否对加载TsFile进行元数据校验,可选,默认为True +--onSuccess 对成功加载的TsFile的处理方法,可选,默认为delete,成功加载之后删除源TsFile,设为none时会 保留源TsFile +``` + +### 使用范例 + +假定服务器192.168.0.101:6667上运行一个IoTDB实例,想从将本地保存的TsFile备份文件夹D:\IoTDB\data中的所有的TsFile文件都加载进此IoTDB实例。 + +首先移动至`$IOTDB_HOME/tools/`,打开命令行,然后执行 + +```bash +./load-tsfile.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root +``` + +等待脚本执行完成之后,可以检查IoTDB实例中数据已经被正确加载 + +### 常见问题 + +- 找不到或无法加载主类 + - 可能是由于未设置环境变量$IOTDB_HOME,请设置环境变量之后重试 +- 提示-f option must be set! 
+ - 输入命令缺少待-f字段(加载文件或文件夹路径),请添加之后重新执行 +- 执行到中途崩溃了想重新加载怎么办 + - 重新执行刚才的命令,重新加载数据不会影响加载之后的正确性 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Log-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Log-Tool.md new file mode 100644 index 00000000..f2e24856 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Log-Tool.md @@ -0,0 +1,68 @@ + + +# 系统日志 + +IoTDB 支持用户通过修改日志配置文件的方式对 IoTDB 系统日志(如日志输出级别等)进行配置,系统日志配置文件默认位置在$IOTDB_HOME/conf 文件夹下。 + +默认的日志配置文件名为 logback.xml。用户可以通过增加或更改其中的 xml 树型节点参数对系统运行日志的相关配置进行修改。需要注意的是,使用日志配置文件配置系统日志并不会在修改后立即生效,而是在重启系统后生效。详细配置说明参看本文日志文件配置说明。 + +同时,为了方便在系统运行过程中运维人员对系统的调试,我们为系统运维人员提供了动态修改日志配置的 JMX 接口,能够在系统不重启的前提下实时对系统的 Log 模块进行配置。详细使用方法参看动态系统日志配置说明)。 + +## 动态系统日志配置说明 + +### 连接 JMX + +本节以 Jconsole 为例介绍连接 JMX 并进入动态系统日志配置模块的方法。启动 Jconsole 控制页面,在新建连接处建立与 IoTDB Server 的 JMX 连接(可以选择本地进程或给定 IoTDB 的 IP 及 PORT 进行远程连接,IoTDB 的 JMX 服务默认运行端口为 31999),如下图使用远程进程连接 Localhost 下运行在 31999 端口的 IoTDB JMX 服务。 + + + +连接到 JMX 后,您可以通过 MBean 选项卡找到名为`ch.qos.logback.classic`的`MBean`,如下图所示。 + + + +在`ch.qos.logback.classic`的 MBean 操作(Operations)选项中,可以看到当前动态系统日志配置支持的 6 种接口,您可以通过使用相应的方法,来执行相应的操作,操作页面如图。 + + + +### 动态系统日志接口说明 + +* reloadDefaultConfiguration 接口 + +该方法为重新加载默认的 logback 配置文件,用户可以先对默认的配置文件进行修改,然后调用该方法将修改后的配置文件重新加载到系统中,使其生效。 + +* reloadByFileName 接口 + +该方法为加载一个指定路径的 logback 配置文件,并使其生效。该方法接受一个名为 p1 的 String 类型的参数,该参数为需要指定加载的配置文件路径。 + +* getLoggerEffectiveLevel 接口 + +该方法为获取指定 Logger 当前生效的日志级别。该方法接受一个名为 p1 的 String 类型的参数,该参数为指定 Logger 的名称。该方法返回指定 Logger 当前生效的日志级别。 + +* getLoggerLevel 接口 + +该方法为获取指定 Logger 的日志级别。该方法接受一个名为 p1 的 String 类型的参数,该参数为指定 Logger 的名称。该方法返回指定 Logger 的日志级别。 + +需要注意的是,该方法与`getLoggerEffectiveLevel`方法的区别在于,该方法返回的是指定 Logger 在配置文件中被设定的日志级别,如果用户没有对该 Logger 进行日志级别的设定,则返回空。按照 Logback 的日志级别继承机制,如果一个 Logger 没有被显示地设定日志级别,其将会从其最近的祖先继承日志级别的设定。这时,调用`getLoggerEffectiveLevel`方法将返回该 Logger 生效的日志级别;而调用本节所述方法,将返回空。 + +* setLoggerLevel 接口 + +该方法为设置指定 Logger 的日志级别。该方法接受一个名为 p1 的 String 类型的参数和一个名为 p2 的 String 类型的参数,分别指定 Logger 的名称和目标的日志等级。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/MLogParser-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/MLogParser-Tool.md new file mode 100644 index 00000000..64473e1f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/MLogParser-Tool.md @@ -0,0 +1,39 @@ + + +# Mlog 解析工具 + +0.12.x 版本之后,IoTDB 将元数据文件编码成二进制。 + +如果想要将二进制元数据解析为人可读的字符串格式,可以使用本工具。 + +该工具目前仅支持解析 mlog.bin 文件。 + +在分布式场景下,若 SchemaRegion 的共识协议采用的是 RatisConsensus,IoTDB 不会使用 mlog.bin 文件来存储元数据,也将不会生成 mlog.bin 文件。 + +## 使用方式 + +Linux/MacOS +> ./print-schema-log.sh -f /your path/mlog.bin -o /your path/mlog.txt + +Windows + +> .\print-schema-log.bat -f \your path\mlog.bin -o \your path\mlog.txt diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Maintenance-Command.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Maintenance-Command.md new file mode 100644 index 00000000..10b99406 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Maintenance-Command.md @@ -0,0 +1,222 @@ + +# 运维命令 + +## FLUSH + +将指定 database 的内存缓存区 Memory Table 的数据持久化到磁盘上,并将数据文件封口。在集群模式下,我们提供了持久化本节点的指定 database 的缓存、持久化整个集群指定 database 的缓存命令。 + +注意:此命令客户端不需要手动调用,IoTDB 有 wal 保证数据安全,IoTDB 会选择合适的时机进行 flush。 +如果频繁调用 flush 会导致数据文件很小,降低查询性能。 + +```sql +IoTDB> FLUSH +IoTDB> FLUSH ON LOCAL +IoTDB> FLUSH ON CLUSTER +IoTDB> FLUSH root.ln +IoTDB> FLUSH root.sg1,root.sg2 ON LOCAL +IoTDB> FLUSH root.sg1,root.sg2 ON CLUSTER +``` + +## CLEAR CACHE + + +手动清除chunk, chunk 
metadata和timeseries metadata的缓存,在内存资源紧张时,可以通过此命令,释放查询时缓存所占的内存空间。在集群模式下,我们提供了清空本节点缓存、清空整个集群缓存命令。 + +```sql +IoTDB> CLEAR CACHE +IoTDB> CLEAR CACHE ON LOCAL +IoTDB> CLEAR CACHE ON CLUSTER +``` + +## START REPAIR DATA + +启动一个数据修复任务,扫描创建修复任务的时间之前产生的 tsfile 文件并修复有乱序错误的文件。 + +```sql +IoTDB> START REPAIR DATA +IoTDB> START REPAIR DATA ON LOCAL +IoTDB> START REPAIR DATA ON CLUSTER +``` + +## STOP REPAIR DATA + +停止一个进行中的修复任务。如果需要再次恢复一个已停止的数据修复任务的进度,可以重新执行 `START REPAIR DATA`. + +```sql +IoTDB> STOP REPAIR DATA +IoTDB> STOP REPAIR DATA ON LOCAL +IoTDB> STOP REPAIR DATA ON CLUSTER +``` + +## SET SYSTEM TO READONLY / RUNNING + +手动设置系统为正常运行、只读状态。在集群模式下,我们提供了设置本节点状态、设置整个集群状态的命令,默认对整个集群生效。 + +```sql +IoTDB> SET SYSTEM TO RUNNING +IoTDB> SET SYSTEM TO READONLY ON LOCAL +IoTDB> SET SYSTEM TO READONLY ON CLUSTER +``` + +## 终止查询 + +IoTDB 支持设置 Session 连接超时和查询超时时间,并支持手动终止正在执行的查询。 + +### Session 超时 + +Session 超时控制何时关闭空闲 Session。空闲 Session 指在一段时间内没有发起任何操作的 Session。 + +Session 超时默认未开启。可以在配置文件中通过 `dn_session_timeout_threshold` 参数进行配置。 + +### 查询超时 + +对于执行时间过长的查询,IoTDB 将强行中断该查询,并抛出超时异常,如下所示: + +```sql +IoTDB> select * from root.**; +Msg: 701 Current query is time out, please check your statement or modify timeout parameter. +``` + +系统默认的超时时间为 60000 ms,可以在配置文件中通过 `query_timeout_threshold` 参数进行自定义配置。 + +如果您使用 JDBC 或 Session,还支持对单个查询设置超时时间(单位为 ms): + +```java +((IoTDBStatement) statement).executeQuery(String sql, long timeoutInMS) +session.executeQueryStatement(String sql, long timeout) +``` + +> 如果不配置超时时间参数或将超时时间设置为负数,将使用服务器端默认的超时时间。 +> 如果超时时间设置为0,则会禁用超时功能。 + +### 查询终止 + +除了被动地等待查询超时外,IoTDB 还支持主动地终止查询: + +#### 终止指定查询 + +```sql +KILL QUERY +``` + +通过指定 `queryId` 可以中止指定的查询,`queryId`是一个字符串,所以使用时需要添加引号。 + +为了获取正在执行的查询 id,用户可以使用 [show queries](#show-queries) 命令,该命令将显示所有正在执行的查询列表。 + +##### 示例 +```sql +kill query '20221205_114444_00003_5' +``` + +#### 终止所有查询 + +```sql +KILL ALL QUERIES +``` + +终止所有DataNode上的所有查询。 + +## SHOW QUERIES + +该命令用于显示所有正在执行的查询,有以下使用场景: +- 想要中止某个查询时,需要获取查询对应的queryId +- 中止某个查询后验证查询是否已被中止 + +### 语法 + +```SQL +SHOW QUERIES | (QUERY PROCESSLIST) +[WHERE whereCondition] +[ORDER BY sortKey {ASC | DESC}] +[LIMIT rowLimit] [OFFSET rowOffset] +``` +注意: +- 兼容旧语法`show query processlist` +- 使用WHERE时请保证过滤的目标列是结果集中存在的列 +- 使用ORDER BY时请保证sortKey是结果集中存在的列 + +### 结果集 +Time:查询开始时间,数据类型为`INT64` +QueryId:集群级别唯一的查询标识,数据类型为`TEXT`,格式为`yyyyMMdd_HHmmss_index_dataNodeId` +DataNodeId:执行该查询的节点,数据类型为`INT32` +ElapsedTime:查询已执行时间(不完全精确),以`秒`为单位,数据类型为`FLOAT` +Statement:查询的原始语句,数据类型为`TEXT` + +``` ++-----------------------------+-----------------------+----------+-----------+------------+ +| Time| QueryId|DataNodeId|ElapsedTime| Statement| ++-----------------------------+-----------------------+----------+-----------+------------+ +|2022-12-30T13:26:47.260+08:00|20221230_052647_00005_1| 1| 0.019|show queries| ++-----------------------------+-----------------------+----------+-----------+------------+ +``` +注意: +- 结果集默认按照Time列升序排列,如需按其他key进行排序,请使用ORDER BY子句 + +### SQL示例 +#### 示例1:获取当前所有执行时间大于30s的查询 + +SQL 语句为: +```SQL +SHOW QUERIES WHERE ElapsedTime > 30 +``` + +该 SQL 语句的执行结果如下: +``` ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +| Time| QueryId|DataNodeId|ElapsedTime| Statement| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:44.515+08:00|20221205_114444_00002_2| 2| 31.111| select * from root.test1| 
++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:45.515+08:00|20221205_114445_00003_2| 2| 30.111| select * from root.test2| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:43.515+08:00|20221205_114443_00001_3| 3| 32.111| select * from root.**| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +``` + +#### 示例2:获取当前执行耗时Top5的查询 + +SQL 语句为: +```SQL +SHOW QUERIES limit 5 +``` + +等价于 +```SQL +SHOW QUERIES ORDER BY ElapsedTime DESC limit 5 +``` + +该 SQL 语句的执行结果如下: +``` ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +| Time| QueryId|DataNodeId|ElapsedTime| Statement| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:44.515+08:00|20221205_114444_00003_5| 5| 31.111| select * from root.test1| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:45.515+08:00|20221205_114445_00003_2| 2| 30.111| select * from root.test2| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:46.515+08:00|20221205_114446_00003_3| 3| 29.111| select * from root.test3| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:47.515+08:00|20221205_114447_00003_2| 2| 28.111| select * from root.test4| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +|2022-12-05T11:44:48.515+08:00|20221205_114448_00003_4| 4| 27.111| select * from root.test5| ++-----------------------------+-----------------------+----------+-----------+-----------------------------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Overlap-Validation-And-Repair-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Overlap-Validation-And-Repair-Tool.md new file mode 100644 index 00000000..c08f119d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Overlap-Validation-And-Repair-Tool.md @@ -0,0 +1,41 @@ + + +# Overlap validation and repair 工具 + +Overlap Validation And Repair 工具用于验证顺序空间内 tsfile 的 resource 文件的重叠情况并进行修复。 + +验证功能可以在任意场景下运行,在找出所有存在重叠的文件后,需要输入 'y' 以确认是否进行修复。 + +**修复功能必须在相关的 DataNode 停止之后执行,且对应的数据目录中不存在未完成的合并任务。** +为了确保没有尚未完成的合并任务,你可以修改配置文件中开启合并相关的配置项为 false,然后重启 DataNode 并等待合并恢复任务的完成,停止 DataNode,再运行这个工具。 +## 使用方法 +```shell +#MacOs or Linux +./check-overlap-sequence-files-and-repair.sh [sequence_data_dir1] [sequence_data_dir2]... +# Windows +.\check-overlap-sequence-files-and-repair.bat [sequence_data_dir1] [sequence_data_dir2]... 
+``` +## 示例 +```shell +./check-overlap-sequence-files-and-repair.sh /data1/sequence/ /data2/sequence +``` +这个示例指定了配置的两个数据目录进行扫描: /data1/sequence/, /data2/sequence。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/SchemaFileSketch-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/SchemaFileSketch-Tool.md new file mode 100644 index 00000000..db7fae1e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/SchemaFileSketch-Tool.md @@ -0,0 +1,35 @@ + + +# PBTreeFile 解析工具 + +自 1.1 版本起,IoTDB 将每个 database 下序列的元数据存储为 pbtree.pst 文件。 + +如果需要将该文件转为便于阅读的的格式,可以使用本工具来解析指定 pbtree.pst 。 + +## 使用方式 + +Linux/MacOS +> ./print-pbtree-file.sh -f your/path/to/pbtree.pst -o /your/path/to/sketch.txt + +Windows + +> ./print-pbtree-file.bat -f your/path/to/pbtree.pst -o /your/path/to/sketch.txt diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Load-Export-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Load-Export-Tool.md new file mode 100644 index 00000000..6706e397 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Load-Export-Tool.md @@ -0,0 +1,181 @@ + + +# TsFile导入导出工具 + +## TsFile导入工具 + +### 介绍 +加载外部 tsfile 文件工具允许用户向正在运行中的 Apache IoTDB 中加载 tsfile 文件。或者您也可以使用脚本的方式将tsfile加载进IoTDB。 + +### 使用SQL加载 +用户通过 Cli 工具或 JDBC 向 Apache IoTDB 系统发送指定命令实现文件加载的功能。 + +#### 加载 tsfile 文件 + +加载 tsfile 文件的指令为:`load '' [sglevel=int][verify=true/false][onSuccess=delete/none]` + +该指令有两种用法: + +1. 通过指定文件路径(绝对路径)加载单 tsfile 文件。 + +第一个参数表示待加载的 tsfile 文件的路径。load 命令有三个可选项,分别是 sglevel,值域为整数,verify,值域为 true/false,onSuccess,值域为delete/none。不同选项之间用空格隔开,选项之间无顺序要求。 + +SGLEVEL 选项,当 tsfile 对应的 database 不存在时,用户可以通过 sglevel 参数的值来制定 database 的级别,默认为`iotdb-system.properties`中设置的级别。例如当设置 level 参数为1时表明此 tsfile 中所有时间序列中层级为1的前缀路径是 database,即若存在设备 root.sg.d1.s1,此时 root.sg 被指定为 database。 + +VERIFY 选项表示是否对载入的 tsfile 中的所有时间序列进行元数据检查,默认为 true。开启时,若载入的 tsfile 中的时间序列在当前 iotdb 中也存在,则会比较该时间序列的所有 Measurement 的数据类型是否一致,如果出现不一致将会导致载入失败,关闭该选项会跳过检查,载入更快。 + +ONSUCCESS选项表示对于成功载入的tsfile的处置方式,默认为delete,即tsfile成功加载后将被删除,如果是none表明tsfile成功加载之后依然被保留在源文件夹。 + +若待加载的 tsfile 文件对应的`.resource`文件存在,会被一并加载至 Apache IoTDB 数据文件的目录和引擎中,否则将通过 tsfile 文件重新生成对应的`.resource`文件,即加载的 tsfile 文件所对应的`.resource`文件不是必要的。 + +示例: + +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true onSuccess=none` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1 onSuccess=delete` + + +2. 
通过指定文件夹路径(绝对路径)批量加载文件。 + +第一个参数表示待加载的 tsfile 文件夹的路径。选项意义与加载单个 tsfile 文件相同。 + +示例: + +* `load '/Users/Desktop/data'` +* `load '/Users/Desktop/data' verify=false` +* `load '/Users/Desktop/data' verify=true` +* `load '/Users/Desktop/data' verify=true sglevel=1` +* `load '/Users/Desktop/data' verify=false sglevel=1 onSuccess=delete` + +**注意**,如果`$IOTDB_HOME$/conf/iotdb-system.properties`中`enable_auto_create_schema=true`时会在加载tsfile的时候自动创建tsfile中的元数据,否则不会自动创建。 + +### 使用脚本加载 + +若您在Windows环境中,请运行`$IOTDB_HOME/tools/load-tsfile.bat`,若为Linux或Unix,请运行`load-tsfile.sh` + +```bash +./load-tsfile.bat -f filePath [-h host] [-p port] [-u username] [-pw password] [--sgLevel int] [--verify true/false] [--onSuccess none/delete] +-f 待加载的文件或文件夹路径,必要字段 +-h IoTDB的Host地址,可选,默认127.0.0.1 +-p IoTDB的端口,可选,默认6667 +-u IoTDb登录用户名,可选,默认root +-pw IoTDB登录密码,可选,默认root +--sgLevel 加载TsFile自动创建Database的路径层级,可选,默认值为iotdb-system.properties指定值 +--verify 是否对加载TsFile进行元数据校验,可选,默认为True +--onSuccess 对成功加载的TsFile的处理方法,可选,默认为delete,成功加载之后删除源TsFile,设为none时会 保留源TsFile +``` + +#### 使用范例 + +假定服务器192.168.0.101:6667上运行一个IoTDB实例,想从将本地保存的TsFile备份文件夹D:\IoTDB\data中的所有的TsFile文件都加载进此IoTDB实例。 + +首先移动至`$IOTDB_HOME/tools/`,打开命令行,然后执行 + +```bash +./load-tsfile.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root +``` + +等待脚本执行完成之后,可以检查IoTDB实例中数据已经被正确加载 + +#### 常见问题 + +- 找不到或无法加载主类 + - 可能是由于未设置环境变量$IOTDB_HOME,请设置环境变量之后重试 +- 提示-f option must be set! + - 输入命令缺少待-f字段(加载文件或文件夹路径),请添加之后重新执行 +- 执行到中途崩溃了想重新加载怎么办 + - 重新执行刚才的命令,重新加载数据不会影响加载之后的正确性 + +## TsFile导出工具 + +TsFile 工具可帮您 通过执行指定sql、命令行sql、sql文件的方式将结果集以TsFile文件的格式导出至指定路径. + +### 运行方法 + +```shell +# Unix/OS X +> tools/export-tsfile.sh -h -p -u -pw -td [-f -q -s ] + +# Windows +> tools\export-tsfile.bat -h -p -u -pw -td [-f -q -s ] +``` + +参数: +* `-h `: + - IoTDB服务的主机地址。 +* `-p `: + - IoTDB服务的端口号。 +* `-u `: + - IoTDB服务的用户名。 +* `-pw `: + - IoTDB服务的密码。 +* `-td `: + - 为导出的TsFile文件指定输出路径。 +* `-f `: + - 为导出的TsFile文件的文件名,只需写文件名称,不能包含文件路径和后缀。如果sql文件或控制台输入时包含多个sql,会按照sql顺序生成多个TsFile文件。 + - 例如:文件中或命令行共有3个SQL,-f 为"dump",那么会在目标路径下生成 dump0.tsfile、dump1.tsfile、dump2.tsfile三个TsFile文件。 +* `-q `: + - 在命令中直接指定想要执行的查询语句。 + - 例如: `select * from root.** limit 100` +* `-s `: + - 指定一个SQL文件,里面包含一条或多条SQL语句。如果一个SQL文件中包含多条SQL语句,SQL语句之间应该用换行符进行分割。每一条SQL语句对应一个输出的TsFile文件。 +* `-t `: + - 指定session查询时的超时时间,单位为ms + + +除此之外,如果你没有使用`-s`和`-q`参数,在导出脚本被启动之后你需要按照程序提示输入查询语句,不同的查询结果会被保存到不同的TsFile文件中。 + +### 运行示例 + +```shell +# Unix/OS X +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 + +# Windows +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 +``` + +### Q&A + +- 建议在导入数据时不要同时执行写入数据命令,这将有可能导致JVM内存不足的情况。 \ No newline at end of file diff --git 
a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Resource-Sketch-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Resource-Sketch-Tool.md new file mode 100644 index 00000000..d9eb0032 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Resource-Sketch-Tool.md @@ -0,0 +1,79 @@ + + +# TsFile Resource概览工具 + +TsFile resource概览工具用于打印出TsFile resource文件的内容,工具位置为 tools/tsfile/print-tsfile-resource-files。 + +## 用法 + +- Windows: + +```bash +.\print-tsfile-resource-files.bat +``` + +- Linux or MacOs: + +``` +./print-tsfile-resource-files.sh +``` + +## 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +147 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +230 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +231 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +233 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +237 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file folder D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0 finished. +````````````````````````` + +`````````````````````````bash +.\print-tsfile-resource-files.bat D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource +```````````````````````` +Starting Printing the TsFileResources +```````````````````````` +178 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +186 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. +187 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +188 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Cannot find IOTDB_HOME or IOTDB_CONF environment variable when loading config file iotdb-system.properties, use default configuration +192 [main] WARN o.a.iotdb.db.conf.IoTDBDescriptor - Couldn't load the configuration iotdb-system.properties from any of the known sources. +Analyzing D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile ... + +Resource plan index range [9223372036854775807, -9223372036854775808] +device root.sg1.d1, start time 0 (1970-01-01T08:00+08:00[GMT+08:00]), end time 99 (1970-01-01T08:00:00.099+08:00[GMT+08:00]) + +Analyzing the resource file D:\github\master\iotdb\data\datanode\data\sequence\root.sg1\0\0\1669359533489-1-0-0.tsfile.resource finished. 
+````````````````````````` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Settle-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Settle-Tool.md new file mode 100644 index 00000000..5d11f29d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Settle-Tool.md @@ -0,0 +1,42 @@ + + +# TsFile Settle工具 +TsFile Settle工具用于将一个或多个存在修改记录文件的TsFile重写,通过向DataNode发送RPC的方式提交TsFile合并任务来重写TsFile。 +## 使用方式 +```shell +#MacOs or Linux +./settle-tsfile.sh -h [host] -p [port] -f [filePaths] +# Windows +.\settle-tsfile.bat -h [host] -p [port] -f [filePaths] +``` +其中host和port参数为DataNodeInternalRPCService的host和port,如果不指定默认值分别为127.0.0.1和10730, filePaths参数指定要作为一个compaction任务提交的所有TsFile在此DataNode上的绝对路径,以空格分隔,需要传入至少一个路径。 + +## 使用示例 +```shell +./settle-tsfile.sh -h 127.0.0.1 -p 10730 -f /data/sequence/root.sg/0/0/1672133354759-2-0-0.tsfile /data/sequence/root.sg/0/0/1672306417865-3-0-0.tsfile /data/sequence/root.sg/0/0/1672306417865-3-0-0.tsfile +``` +## 使用要求 +* 最少指定一个TsFile +* 所有指定的TsFile都在同一个空间内且连续,不支持跨空间合并 +* 指定的文件路径为指定DataNode所在节点的该TsFile的绝对路径 +* 指定的DataNode上配置了允许输入的TsFile所在的空间执行合并操作 +* 指定的TsFile中至少有一个存在对应的.mods文件 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Sketch-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Sketch-Tool.md new file mode 100644 index 00000000..3c0a2cca --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Sketch-Tool.md @@ -0,0 +1,108 @@ + + +# TsFile概览工具 + +TsFile概览工具用于以概要模式打印出一个TsFile的内容,工具位置为 tools/tsfile/print-tsfile。 + +## 用法 + +- Windows: + +```bash +.\print-tsfile-sketch.bat (<输出结果的存储路径>) +``` + +- Linux or MacOs: + +```shell +./print-tsfile-sketch.sh (<输出结果的存储路径>) +``` + +注意:如果没有设置输出结果的存储路径, 将使用相对路径"TsFile_sketch_view.txt"作为默认值。 + +## 示例 + +以Windows系统为例: + +`````````````````````````bash +.\print-tsfile.bat D:\github\master\1669359533965-1-0-0.tsfile D:\github\master\sketch.txt +```````````````````````` +Starting Printing the TsFile Sketch +```````````````````````` +TsFile path:D:\github\master\1669359533965-1-0-0.tsfile +Sketch save path:D:\github\master\sketch.txt +148 [main] WARN o.a.i.t.c.conf.TSFileDescriptor - not found iotdb-system.properties, use the default configs. 
+-------------------------------- TsFile Sketch -------------------------------- +file path: D:\github\master\1669359533965-1-0-0.tsfile +file length: 2974 + + POSITION| CONTENT + -------- ------- + 0| [magic head] TsFile + 6| [version number] 3 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1, num of Chunks:3 + 7| [Chunk Group Header] + | [marker] 0 + | [deviceID] root.sg1.d1 + 20| [Chunk] of root.sg1.d1.s1, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [chunk header] marker=5, measurementID=s1, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 893| [Chunk] of root.sg1.d1.s2, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [chunk header] marker=5, measurementID=s2, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 + 1766| [Chunk] of root.sg1.d1.s3, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [chunk header] marker=5, measurementID=s3, dataSize=864, dataType=INT64, compressionType=SNAPPY, encodingType=RLE + | [page] UncompressedSize:862, CompressedSize:860 +||||||||||||||||||||| [Chunk Group] of root.sg1.d1 ends + 2656| [marker] 2 + 2657| [TimeseriesIndex] of root.sg1.d1.s1, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9032452783138882770,maxValue:9117677033041335123,firstValue:7068645577795875906,lastValue:-5833792328174747265,sumValue:5.795959009889246E19] + | [ChunkIndex] offset=20 + 2728| [TimeseriesIndex] of root.sg1.d1.s2, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-8806861312244965718,maxValue:9192550740609853234,firstValue:1150295375739457693,lastValue:-2839553973758938646,sumValue:8.2822564314572677E18] + | [ChunkIndex] offset=893 + 2799| [TimeseriesIndex] of root.sg1.d1.s3, tsDataType:INT64, startTime: 1669359533948 endTime: 1669359534047 count: 100 [minValue:-9076669333460323191,maxValue:9175278522960949594,firstValue:2537897870994797700,lastValue:7194625271253769397,sumValue:-2.126008424849926E19] + | [ChunkIndex] offset=1766 + 2870| [IndexOfTimerseriesIndex Node] type=LEAF_MEASUREMENT + | + | +||||||||||||||||||||| [TsFileMetadata] begins + 2891| [IndexOfTimerseriesIndex Node] type=LEAF_DEVICE + | + | + | [meta offset] 2656 + | [bloom filter] bit vector byte array length=31, filterSize=256, hashFunctionSize=5 +||||||||||||||||||||| [TsFileMetadata] ends + 2964| [TsFileMetadataSize] 73 + 2968| [magic tail] TsFile + 2974| END of TsFile +---------------------------- IndexOfTimerseriesIndex Tree ----------------------------- + [MetadataIndex:LEAF_DEVICE] + └──────[root.sg1.d1,2870] + [MetadataIndex:LEAF_MEASUREMENT] + └──────[s1,2657] +---------------------------------- TsFile Sketch End ---------------------------------- +````````````````````````` + +解释: + +- 以"|"为分隔,左边是在TsFile文件中的实际位置,右边是梗概内容。 +- "|||||||||||||||||||||"是为增强可读性而添加的导引信息,不是TsFile中实际存储的数据。 +- 最后打印的"IndexOfTimerseriesIndex Tree"是对TsFile文件末尾的元数据索引树的重新整理打印,便于直观理解,不是TsFile中存储的实际数据。 \ No 
newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Split-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Split-Tool.md new file mode 100644 index 00000000..e6e669a7 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFile-Split-Tool.md @@ -0,0 +1,48 @@ + + +# TsFile 拆分工具 + +TsFile 拆分工具用来将一个 TsFile 拆分为多个 TsFile,工具位置为 tools/tsfile/split-tsfile-tool + +使用方式: + +Windows: + +``` +.\split-tsfile-tool.bat (-level <新生成文件名的空间内合并次数,默认为10>) (-size <新生成文件的大小(字节),默认为 1048576000>) +``` + + +Linux or MacOs: + +``` +./split-tsfile-tool.sh (-level <新生成文件名的空间内合并次数,默认为10>) (-size <新生成文件的大小(字节),默认为 1048576000>) +``` + +> 例如,需要指定生成 100MB 的文件,且空间内合并次数为 6,则命令为 `./split-tsfile-tool.sh test.tsfile -level 6 -size 1048576000` (Linux or MacOs) + +使用拆分工具需要注意如下事项: + +1. 拆分工具针对单个已经封口的 TsFile 进行操作,需要确保此 TsFile 已经封口,如 TsFile 在 IoTDB 内,则需要有对应的 `.resource` 文件。 +2. 拆分过程需确保文件已经从 IoTDB 中卸载。 +3. 目前未处理 TsFile 对应的 mods 文件,如果希望拆分后继续放入 IoTDB 目录中通过重启加载,需要手动将 mods 文件拷贝多份,并修改命名,为每个新生成的文件配备一个 mods 文件。 +4. 拆分工具目前尚不支持保存对齐时间序列的 TsFile。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFileSelfCheck-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFileSelfCheck-Tool.md new file mode 100644 index 00000000..17bf0a87 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/TsFileSelfCheck-Tool.md @@ -0,0 +1,42 @@ + + +# TsFile 自检工具 +IoTDB Server 提供了 TsFile 自检工具,目前该工具可以检查 TsFile 文件中的基本格式、TimeseriesMetadata 的正确性以及 TsFile 中各部分存储的 Statistics 的正确性和一致性。 + +## 使用 +第一步:创建一个 TsFileSelfCheckTool 类的对象。 + +``` java +TsFileSelfCheckTool tool = new TsFileSelfCheckTool(); +``` + +第二步:调用自检工具的 check 方法。第一个参数 path 是要检测的 TsFile 的路径。第二个参数是是否只检测 TsFile 开头和结尾的 Magic String 和 Version Number。 + +``` java +tool.check(path, false); +``` + +* check 方法的返回值有四种。 +* 返回值为 0 表示 TsFile 自检无错。 +* 返回值为 -1 表示 TsFile 存在 Statistics 不一致问题。具体会有两种异常,一种是 TimeSeriesMetadata 的 Statistics 与其后面的 ChunkMetadata 的聚合统计的 Statistics 不一致。另一种是 ChunkMetadata 的 Statistics 与其索引的 Chunk 中的 Page 聚合统计的 Statistics 不一致。 +* 返回值为 -2 表示 TsFile 版本不兼容。 +* 返回值为 -3 表示给定路径不存在 TsFile 文件。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Watermark-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Watermark-Tool.md new file mode 100644 index 00000000..1800b611 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Maintenance-Tools/Watermark-Tool.md @@ -0,0 +1,196 @@ + +# 水印工具 + +这个工具提供了 1)IoTDB 查询结果水印嵌入功能,2)可疑数据的水印检测功能。 + +## 水印嵌入 + +### 配置 + +IoTDB 默认关闭水印嵌入功能。为了使用这个功能,第一步要做的事情是修改配置文件`iotdb-system.properties`中的以下各项: + +| 名称 | 示例 | 解释 | +| ----------------------- | ------------------------------------------------------ | ----------------------------------- | +| watermark_module_opened | false | `true`打开水印嵌入功能;`false`关闭 | +| watermark_secret_key | IoTDB*2019@Beijing | 自定义密钥 | +| watermark_bit_string | 100101110100 | 要被嵌入的 0-1 比特串 | +| watermark_method | GroupBasedLSBMethod(embed_row_cycle=2,embed_lsb_num=5) | 指定水印算法及其参数 | + +注意: + +- `watermark_module_opened`: 如果您想使用水印嵌入功能,请将其设置成`true`。 +- `watermark_secret_key`: 不能使用字符 '&'。密钥长度没有限制,一般来说密钥越长,攻击难度就越高。 +- `watermark_bit_string`: 比特串长度没有限制(除了不能为空字符串),但是当长度过短时,水印检测可能达不到要求的显著性水平。 +- `watermark_method`: 现在仅支持一种算法 GroupBasedLSBMethod,因此您实际上可以修改的只有这个算法的两个参数`embed_row_cycle`和`embed_lsb_num`的值: + - 均是正整数 + - `embed_row_cycle`控制了被嵌入水印的行占总行数的比例。`embed_row_cycle`越小,被嵌入水印的行的比例就越大。当`embed_row_cycle`等于 1 的时候,所有的行都将嵌入水印。 + - GroupBasedLSBMethod 使用 LSB 
嵌入。`embed_lsb_num`控制了允许嵌入水印的最低有效位的数量。`embed_lsb_num`越大,数值的可变化范围就越大。 +- `watermark_secret_key`, `watermark_bit_string`和`watermark_method`都不应该被攻击者获得。您需要自己负责配置文件`iotdb-system.properties`的安全管理。 + +### 使用示例 + + * 第一步:创建一个新用户 Alice,授予读权限,然后查询 + +一个新创建的用户默认不使用水印。因此查询结果就是数据库中的原始数据。 + +``` +.\start-cli.bat -u root -pw root +create user Alice '1234' +grant user Alice privileges READ_TIMESERIES on root.vehicle +exit + +.\start-cli.bat -u Alice -pw 1234 +select * from root ++-----------------------------------+------------------+ +| Time|root.vehicle.d0.s0| ++-----------------------------------+------------------+ +| 1970-01-01T08:00:00.001+08:00| 21.5| +| 1970-01-01T08:00:00.002+08:00| 22.5| +| 1970-01-01T08:00:00.003+08:00| 23.5| +| 1970-01-01T08:00:00.004+08:00| 24.5| +| 1970-01-01T08:00:00.005+08:00| 25.5| +| 1970-01-01T08:00:00.006+08:00| 26.5| +| 1970-01-01T08:00:00.007+08:00| 27.5| +| 1970-01-01T08:00:00.008+08:00| 28.5| +| 1970-01-01T08:00:00.009+08:00| 29.5| +| 1970-01-01T08:00:00.010+08:00| 30.5| +| 1970-01-01T08:00:00.011+08:00| 31.5| +| 1970-01-01T08:00:00.012+08:00| 32.5| +| 1970-01-01T08:00:00.013+08:00| 33.5| +| 1970-01-01T08:00:00.014+08:00| 34.5| +| 1970-01-01T08:00:00.015+08:00| 35.5| +| 1970-01-01T08:00:00.016+08:00| 36.5| +| 1970-01-01T08:00:00.017+08:00| 37.5| +| 1970-01-01T08:00:00.018+08:00| 38.5| +| 1970-01-01T08:00:00.019+08:00| 39.5| +| 1970-01-01T08:00:00.020+08:00| 40.5| +| 1970-01-01T08:00:00.021+08:00| 41.5| +| 1970-01-01T08:00:00.022+08:00| 42.5| +| 1970-01-01T08:00:00.023+08:00| 43.5| +| 1970-01-01T08:00:00.024+08:00| 44.5| +| 1970-01-01T08:00:00.025+08:00| 45.5| +| 1970-01-01T08:00:00.026+08:00| 46.5| +| 1970-01-01T08:00:00.027+08:00| 47.5| +| 1970-01-01T08:00:00.028+08:00| 48.5| +| 1970-01-01T08:00:00.029+08:00| 49.5| +| 1970-01-01T08:00:00.030+08:00| 50.5| +| 1970-01-01T08:00:00.031+08:00| 51.5| +| 1970-01-01T08:00:00.032+08:00| 52.5| +| 1970-01-01T08:00:00.033+08:00| 53.5| ++-----------------------------------+------------------+ +``` + + * 第二步:给 Alice 施加水印嵌入 + +sql 用法:`grant watermark_embedding to Alice` + +您可以使用`grant watermark_embedding to user1,user2,...`来同时给多个用户施加水印嵌入。 + +只有 root 用户有权限运行该指令。在 root 给 Alice 施加水印嵌入之后,Alice 的所有查询结果都将被嵌入水印。 + +``` +.\start-cli.bat -u root -pw root +grant watermark_embedding to Alice +exit + +.\start-cli.bat -u Alice -pw 1234 +select * from root + ++-----------------------------------+------------------+ +| Time|root.vehicle.d0.s0| ++-----------------------------------+------------------+ +| 1970-01-01T08:00:00.001+08:00| 21.5| +| 1970-01-01T08:00:00.002+08:00| 22.5| +| 1970-01-01T08:00:00.003+08:00| 23.500008| +| 1970-01-01T08:00:00.004+08:00| 24.500015| +| 1970-01-01T08:00:00.005+08:00| 25.5| +| 1970-01-01T08:00:00.006+08:00| 26.500015| +| 1970-01-01T08:00:00.007+08:00| 27.5| +| 1970-01-01T08:00:00.008+08:00| 28.500004| +| 1970-01-01T08:00:00.009+08:00| 29.5| +| 1970-01-01T08:00:00.010+08:00| 30.5| +| 1970-01-01T08:00:00.011+08:00| 31.5| +| 1970-01-01T08:00:00.012+08:00| 32.5| +| 1970-01-01T08:00:00.013+08:00| 33.5| +| 1970-01-01T08:00:00.014+08:00| 34.5| +| 1970-01-01T08:00:00.015+08:00| 35.500004| +| 1970-01-01T08:00:00.016+08:00| 36.5| +| 1970-01-01T08:00:00.017+08:00| 37.5| +| 1970-01-01T08:00:00.018+08:00| 38.5| +| 1970-01-01T08:00:00.019+08:00| 39.5| +| 1970-01-01T08:00:00.020+08:00| 40.5| +| 1970-01-01T08:00:00.021+08:00| 41.5| +| 1970-01-01T08:00:00.022+08:00| 42.500015| +| 1970-01-01T08:00:00.023+08:00| 43.5| +| 1970-01-01T08:00:00.024+08:00| 44.500008| +| 1970-01-01T08:00:00.025+08:00| 45.50003| +| 
1970-01-01T08:00:00.026+08:00| 46.500008| +| 1970-01-01T08:00:00.027+08:00| 47.500008| +| 1970-01-01T08:00:00.028+08:00| 48.5| +| 1970-01-01T08:00:00.029+08:00| 49.5| +| 1970-01-01T08:00:00.030+08:00| 50.5| +| 1970-01-01T08:00:00.031+08:00| 51.500008| +| 1970-01-01T08:00:00.032+08:00| 52.5| +| 1970-01-01T08:00:00.033+08:00| 53.5| ++-----------------------------------+------------------+ +``` + + * 第三步:撤销 Alice 的水印嵌入 + +sql 用法:`revoke watermark_embedding from Alice` + +您可以使用`revoke watermark_embedding from user1,user2,...`来同时撤销多个用户的水印嵌入。 + +只有 root 用户有权限运行该指令。在 root 撤销 Alice 的水印嵌入之后,Alice 的所有查询结果就又是数据库中的原始数据了。 + +## 水印检测 + +`detect-watermark.sh` 和 `detect-watermark.bat` 是给不同平台提供的功能相同的工具脚本。 + +用法: ./detect-watermark.sh [filePath] [secretKey] [watermarkBitString] [embed_row_cycle] [embed_lsb_num] [alpha] [columnIndex] [dataType: int/float/double] + +示例: ./detect-watermark.sh /home/data/dump1.csv IoTDB*2019@Beijing 100101110100 2 5 0.05 1 float + +| Args | 示例 | 解释 | +| ------------------ | -------------------- | ---------------------------------------------- | +| filePath | /home/data/dump1.csv | 可疑数据的文件路径 | +| secretKey | IoTDB*2019@Beijing | 参见水印嵌入小节 | +| watermarkBitString | 100101110100 | 参见水印嵌入小节 | +| embed_row_cycle | 2 | 参见水印嵌入小节 | +| embed_lsb_num | 5 | 参见水印嵌入小节 | +| alpha | 0.05 | 显著性水平 | +| columnIndex | 1 | 指定可疑数据的某一列进行检测 | +| dataType | float | 指定检测列的数据类型;int/float/double 任选其一 | + +注意: + +- `filePath`: 您可以使用 export-csv 工具来生成这样的数据文件。第一行是表头, 第一列是时间列。文件中的数据示例如下: + +| Time | root.vehicle.d0.s1 | root.vehicle.d0.s1 | +| ----------------------------- | ------------------ | ------------------ | +| 1970-01-01T08:00:00.001+08:00 | 100 | null | +| ... | ... | ... | + +- `watermark_secret_key`, `watermark_bit_string`, `embed_row_cycle`和`embed_lsb_num`应该和水印嵌入过程使用的值保持一致。 + +- `alpha`: 取值范围 [0,1]。水印检测基于显著性检验,`alpha`越小,没有嵌入水印的数据被检测成嵌入水印的可能性越低,从而检测出嵌入水印的结果的可信度越高。 + +- `columnIndex`: 正整数 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/MapReduce-TsFile.md b/src/zh/UserGuide/V2.0.1/Tree/stage/MapReduce-TsFile.md new file mode 100644 index 00000000..6518c84a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/MapReduce-TsFile.md @@ -0,0 +1,200 @@ + + +# Hadoop-TsFile + +TsFile 的 Hadoop 连接器实现了对 Hadoop 读取外部 Tsfile 类型的文件格式的支持。让用户可以使用 Hadoop 的 map、reduce 等操作对 Tsfile 文件进行读取、写入和查询。 + +有了这个连接器,用户可以 +* 将单个 Tsfile 文件加载进 Hadoop,不论文件是存储在本地文件系统或者是 HDFS 中 +* 将某个特定目录下的所有文件加载进 Hadoop,不论文件是存储在本地文件系统或者是 HDFS 中 +* 将 Hadoop 处理完后的结果以 Tsfile 的格式保存 + +## 系统环境要求 + +|Hadoop 版本 | Java 版本 | TsFile 版本 | +|------------- | ------------ |------------ | +| `2.7.3` | `1.8` | `1.0.0+`| + +>注意:关于如何下载和使用 Tsfile, 请参考以下链接:https://github.com/apache/iotdb/tree/master/tsfile. + +## 数据类型对应关系 + +| TsFile 数据类型 | Hadoop writable | +| ---------------- | --------------- | +| BOOLEAN | BooleanWritable | +| INT32 | IntWritable | +| INT64 | LongWritable | +| FLOAT | FloatWritable | +| DOUBLE | DoubleWritable | +| TEXT | Text | + +## 关于 TSFInputFormat 的说明 + +TSFInputFormat 继承了 Hadoop 中 FileInputFormat 类,重写了其中切片的方法。 + +目前的切片方法是根据每个 ChunkGroup 的中点的 offset 是否属于 Hadoop 所切片的 startOffset 和 endOffset 之间,来判断是否将该 ChunkGroup 放入此切片。 + +TSFInputFormat 将 tsfile 中的数据以多个`MapWritable`记录的形式返回给用户。 + +假设我们想要从 Tsfile 中获得名为`d1`的设备的数据,该设备有三个传感器,名称分别为`s1`, `s2`, `s3`。 + +`s1`的类型是`BOOLEAN`, `s2`的类型是 `DOUBLE`, `s3`的类型是`TEXT`. 
+ +`MapWritable`的结构如下所示: +``` +{ + "time_stamp": 10000000, + "device_id": d1, + "s1": true, + "s2": 3.14, + "s3": "middle" +} +``` + +在 Hadoop 的 Map job 中,你可以采用如下方法获得你想要的任何值 + +`mapwritable.get(new Text("s1"))` +> 注意:`MapWritable`中所有的键值类型都是`Text`。 + +## 使用示例 + +### 读示例:求和 + +首先,我们需要在 TSFInputFormat 中配置我们需要哪些数据 + +``` +// configure reading time enable +TSFInputFormat.setReadTime(job, true); +// configure reading deviceId enable +TSFInputFormat.setReadDeviceId(job, true); +// configure reading which deltaObjectIds +String[] deviceIds = {"device_1"}; +TSFInputFormat.setReadDeviceIds(job, deltaObjectIds); +// configure reading which measurementIds +String[] measurementIds = {"sensor_1", "sensor_2", "sensor_3"}; +TSFInputFormat.setReadMeasurementIds(job, measurementIds); +``` + +然后,必须指定 mapper 和 reducer 输出的键和值类型 + +``` +// set inputformat and outputformat +job.setInputFormatClass(TSFInputFormat.class); +// set mapper output key and value +job.setMapOutputKeyClass(Text.class); +job.setMapOutputValueClass(DoubleWritable.class); +// set reducer output key and value +job.setOutputKeyClass(Text.class); +job.setOutputValueClass(DoubleWritable.class); +``` +接着,就可以编写包含具体的处理数据逻辑的`mapper`和`reducer`类了。 + +``` +public static class TSMapper extends Mapper { + + @Override + protected void map(NullWritable key, MapWritable value, + Mapper.Context context) + throws IOException, InterruptedException { + + Text deltaObjectId = (Text) value.get(new Text("device_id")); + context.write(deltaObjectId, (DoubleWritable) value.get(new Text("sensor_3"))); + } +} + +public static class TSReducer extends Reducer { + + @Override + protected void reduce(Text key, Iterable values, + Reducer.Context context) + throws IOException, InterruptedException { + + double sum = 0; + for (DoubleWritable value : values) { + sum = sum + value.get(); + } + context.write(key, new DoubleWritable(sum)); + } +} +``` + +> 注意:完整的代码示例可以在如下链接中找到:https://github.com/apache/iotdb/blob/master/example/hadoop/src/main/java/org/apache/iotdb/hadoop/tsfile/TSFMRReadExample.java + +## 写示例:计算平均数并写入 Tsfile 中 + +除了`OutputFormatClass`,剩下的配置代码跟上面的读示例是一样的 + +``` +job.setOutputFormatClass(TSFOutputFormat.class); +// set reducer output key and value +job.setOutputKeyClass(NullWritable.class); +job.setOutputValueClass(HDFSTSRecord.class); +``` + +然后,是包含具体的处理数据逻辑的`mapper`和`reducer`类。 + +``` +public static class TSMapper extends Mapper { + + @Override + protected void map(NullWritable key, MapWritable value, + Mapper.Context context) + throws IOException, InterruptedException { + + Text deltaObjectId = (Text) value.get(new Text("device_id")); + long timestamp = ((LongWritable)value.get(new Text("timestamp"))).get(); + if (timestamp % 100000 == 0) { + context.write(deltaObjectId, new MapWritable(value)); + } + } +} + +/** + * This reducer calculate the average value. 
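+ * For each device key, it sums the values of sensor_1, sensor_2 and sensor_3, counts the records,
+ * and emits one HDFSTSRecord containing the average of each sensor.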
+ */ +public static class TSReducer extends Reducer { + + @Override + protected void reduce(Text key, Iterable values, + Reducer.Context context) throws IOException, InterruptedException { + long sensor1_value_sum = 0; + long sensor2_value_sum = 0; + double sensor3_value_sum = 0; + long num = 0; + for (MapWritable value : values) { + num++; + sensor1_value_sum += ((LongWritable)value.get(new Text("sensor_1"))).get(); + sensor2_value_sum += ((LongWritable)value.get(new Text("sensor_2"))).get(); + sensor3_value_sum += ((DoubleWritable)value.get(new Text("sensor_3"))).get(); + } + HDFSTSRecord tsRecord = new HDFSTSRecord(1L, key.toString()); + DataPoint dPoint1 = new LongDataPoint("sensor_1", sensor1_value_sum / num); + DataPoint dPoint2 = new LongDataPoint("sensor_2", sensor2_value_sum / num); + DataPoint dPoint3 = new DoubleDataPoint("sensor_3", sensor3_value_sum / num); + tsRecord.addTuple(dPoint1); + tsRecord.addTuple(dPoint2); + tsRecord.addTuple(dPoint3); + context.write(NullWritable.get(), tsRecord); + } +} +``` +> 注意:完整的代码示例可以在如下链接中找到:https://github.com/apache/iotdb/blob/master/example/hadoop/src/main/java/org/apache/iotdb/hadoop/tsfile/TSMRWriteExample.java diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Alerting.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Alerting.md new file mode 100644 index 00000000..77c2fc0a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Alerting.md @@ -0,0 +1,370 @@ + + +# 告警 + +## 概览 +IoTDB 告警功能预计支持两种模式: + +* 写入触发:用户写入原始数据到原始时间序列,每插入一条数据都会触发 `Trigger` 的判断逻辑, +若满足告警要求则发送告警到下游数据接收器, +数据接收器再转发告警到外部终端。这种模式: + * 适合需要即时监控每一条数据的场景。 + * 由于触发器中的运算会影响数据写入性能,适合对原始数据写入性能不敏感的场景。 + +* 持续查询:用户写入原始数据到原始时间序列, + `ContinousQuery` 定时查询原始时间序列,将查询结果写入新的时间序列, + 每一次写入触发 `Trigger` 的判断逻辑, + 若满足告警要求则发送告警到下游数据接收器, + 数据接收器再转发告警到外部终端。这种模式: + * 适合需要定时查询数据在某一段时间内的情况的场景。 + * 适合需要将原始数据降采样并持久化的场景。 + * 由于定时查询几乎不影响原始时间序列的写入,适合对原始数据写入性能敏感的场景。 + +随着 [Trigger](../Trigger/Instructions.md) 模块的引入,可以实现写入触发模式的告警。 + +## 部署 AlertManager + +### 安装与运行 +#### 二进制文件 +预编译好的二进制文件可在 [这里](https://prometheus.io/download/) 下载。 + +运行方法: +````shell +./alertmanager --config.file= +```` + +#### Docker 镜像 +可在 [Quay.io](https://hub.docker.com/r/prom/alertmanager/) +或 [Docker Hub](https://quay.io/repository/prometheus/alertmanager) 获得。 + +运行方法: +````shell +docker run --name alertmanager -d -p 127.0.0.1:9093:9093 quay.io/prometheus/alertmanager +```` + +### 配置 + +如下是一个示例,可以覆盖到大部分配置规则,详细的配置规则参见 +[这里](https://prometheus.io/docs/alerting/latest/configuration/)。 + +示例: +``` yaml +# alertmanager.yml + +global: + # The smarthost and SMTP sender used for mail notifications. + smtp_smarthost: 'localhost:25' + smtp_from: 'alertmanager@example.org' + +# The root route on which each incoming alert enters. +route: + # The root route must not have any matchers as it is the entry point for + # all alerts. It needs to have a receiver configured so alerts that do not + # match any of the sub-routes are sent to someone. + receiver: 'team-X-mails' + + # The labels by which incoming alerts are grouped together. For example, + # multiple alerts coming in for cluster=A and alertname=LatencyHigh would + # be batched into a single group. + # + # To aggregate by all possible labels use '...' as the sole label name. + # This effectively disables aggregation entirely, passing through all + # alerts as-is. This is unlikely to be what you want, unless you have + # a very low alert volume or your upstream notification system performs + # its own grouping. Example: group_by: [...] 
+ group_by: ['alertname', 'cluster'] + + # When a new group of alerts is created by an incoming alert, wait at + # least 'group_wait' to send the initial notification. + # This way ensures that you get multiple alerts for the same group that start + # firing shortly after another are batched together on the first + # notification. + group_wait: 30s + + # When the first notification was sent, wait 'group_interval' to send a batch + # of new alerts that started firing for that group. + group_interval: 5m + + # If an alert has successfully been sent, wait 'repeat_interval' to + # resend them. + repeat_interval: 3h + + # All the above attributes are inherited by all child routes and can + # overwritten on each. + + # The child route trees. + routes: + # This routes performs a regular expression match on alert labels to + # catch alerts that are related to a list of services. + - match_re: + service: ^(foo1|foo2|baz)$ + receiver: team-X-mails + + # The service has a sub-route for critical alerts, any alerts + # that do not match, i.e. severity != critical, fall-back to the + # parent node and are sent to 'team-X-mails' + routes: + - match: + severity: critical + receiver: team-X-pager + + - match: + service: files + receiver: team-Y-mails + + routes: + - match: + severity: critical + receiver: team-Y-pager + + # This route handles all alerts coming from a database service. If there's + # no team to handle it, it defaults to the DB team. + - match: + service: database + + receiver: team-DB-pager + # Also group alerts by affected database. + group_by: [alertname, cluster, database] + + routes: + - match: + owner: team-X + receiver: team-X-pager + + - match: + owner: team-Y + receiver: team-Y-pager + +# Inhibition rules allow to mute a set of alerts given that another alert is +# firing. +# We use this to mute any warning-level notifications if the same alert is +# already critical. +inhibit_rules: +- source_match: + severity: 'critical' + target_match: + severity: 'warning' + # Apply inhibition if the alertname is the same. + # CAUTION: + # If all label names listed in `equal` are missing + # from both the source and target alerts, + # the inhibition rule will apply! 
+ equal: ['alertname'] + +receivers: +- name: 'team-X-mails' + email_configs: + - to: 'team-X+alerts@example.org, team-Y+alerts@example.org' + +- name: 'team-X-pager' + email_configs: + - to: 'team-X+alerts-critical@example.org' + pagerduty_configs: + - routing_key: + +- name: 'team-Y-mails' + email_configs: + - to: 'team-Y+alerts@example.org' + +- name: 'team-Y-pager' + pagerduty_configs: + - routing_key: + +- name: 'team-DB-pager' + pagerduty_configs: + - routing_key: +``` + +在后面的示例中,我们采用的配置如下: +````yaml +# alertmanager.yml + +global: + smtp_smarthost: '' + smtp_from: '' + smtp_auth_username: '' + smtp_auth_password: '' + smtp_require_tls: false + +route: + group_by: ['alertname'] + group_wait: 1m + group_interval: 10m + repeat_interval: 10h + receiver: 'email' + +receivers: + - name: 'email' + email_configs: + - to: '' + +inhibit_rules: + - source_match: + severity: 'critical' + target_match: + severity: 'warning' + equal: ['alertname'] +```` + +### API +`AlertManager` API 分为 `v1` 和 `v2` 两个版本,当前 `AlertManager` API 版本为 `v2` +(配置参见 +[api/v2/openapi.yaml](https://github.com/prometheus/alertmanager/blob/master/api/v2/openapi.yaml))。 + +默认配置的前缀为 `/api/v1` 或 `/api/v2`, +发送告警的 endpoint 为 `/api/v1/alerts` 或 `/api/v2/alerts`。 +如果用户指定了 `--web.route-prefix`, +例如 `--web.route-prefix=/alertmanager/`, +那么前缀将会变为 `/alertmanager/api/v1` 或 `/alertmanager/api/v2`, +发送告警的 endpoint 变为 `/alertmanager/api/v1/alerts` +或 `/alertmanager/api/v2/alerts`。 + +## 创建 trigger + +### 编写 trigger 类 + +用户通过自行创建 Java 类、编写钩子中的逻辑来定义一个触发器。 +具体配置流程参见 [Trigger](../Trigger/Implement-Trigger.md)。 + +下面的示例创建了 `org.apache.iotdb.trigger.ClusterAlertingExample` 类, +其 `alertManagerHandler` +成员变量可发送告警至地址为 `http://127.0.0.1:9093/` 的 AlertManager 实例。 + +当 `value > 100.0` 时,发送 `severity` 为 `critical` 的告警; +当 `50.0 < value <= 100.0` 时,发送 `severity` 为 `warning` 的告警。 + +```java +package org.apache.iotdb.trigger; + +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerConfiguration; +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerEvent; +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerHandler; +import org.apache.iotdb.trigger.api.Trigger; +import org.apache.iotdb.trigger.api.TriggerAttributes; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; +import org.apache.iotdb.tsfile.write.record.Tablet; +import org.apache.iotdb.tsfile.write.schema.MeasurementSchema; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; + +public class ClusterAlertingExample implements Trigger { + private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class); + + private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler(); + + private final AlertManagerConfiguration alertManagerConfiguration = + new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts"); + + private String alertname; + + private final HashMap labels = new HashMap<>(); + + private final HashMap annotations = new HashMap<>(); + + @Override + public void onCreate(TriggerAttributes attributes) throws Exception { + alertname = "alert_test"; + + labels.put("series", "root.ln.wf01.wt01.temperature"); + labels.put("value", ""); + labels.put("severity", ""); + + annotations.put("summary", "high temperature"); + annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}"); + + alertManagerHandler.open(alertManagerConfiguration); + } + + @Override + 
public void onDrop() throws IOException { + alertManagerHandler.close(); + } + + @Override + public boolean fire(Tablet tablet) throws Exception { + List measurementSchemaList = tablet.getSchemas(); + for (int i = 0, n = measurementSchemaList.size(); i < n; i++) { + if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) { + // for example, we only deal with the columns of Double type + double[] values = (double[]) tablet.values[i]; + for (double value : values) { + if (value > 100.0) { + LOGGER.info("trigger value > 100"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "critical"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } else if (value > 50.0) { + LOGGER.info("trigger value > 50"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "warning"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } + } + } + } + return true; + } +} +``` + +### 创建 trigger + +如下的 sql 语句在 `root.ln.wf01.wt01.temperature` +时间序列上注册了名为 `root-ln-wf01-wt01-alert`、 +运行逻辑由 `org.apache.iotdb.trigger.ClusterAlertingExample` +类定义的触发器。 + +``` sql + CREATE STATELESS TRIGGER `root-ln-wf01-wt01-alert` + AFTER INSERT + ON root.ln.wf01.wt01.temperature + AS "org.apache.iotdb.trigger.ClusterAlertingExample" + USING URI 'http://jar/ClusterAlertingExample.jar' +``` + +## 写入数据 + +当我们完成 AlertManager 的部署和启动、Trigger 的创建, +可以通过向时间序列写入数据来测试告警功能。 + +``` sql +INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (1, 0); +INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (2, 30); +INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (3, 60); +INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (4, 90); +INSERT INTO root.ln.wf01.wt01(timestamp, temperature) VALUES (5, 120); +``` + +执行完上述写入语句后,可以收到告警邮件。由于我们的 `AlertManager` 配置中设定 `severity` 为 `critical` 的告警 +会抑制 `severity` 为 `warning` 的告警,我们收到的告警邮件中只包含写入 +`(5, 120)` 后触发的告警。 + +alerting diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Metric-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Metric-Tool.md new file mode 100644 index 00000000..aafbdcdc --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Monitor-Alert/Metric-Tool.md @@ -0,0 +1,641 @@ + + +# 监控告警 +在 IoTDB 的运行过程中,我们希望对 IoTDB 的状态进行观测,以便于排查系统问题或者及时发现系统潜在的风险,能够**反映系统运行状态的一系列指标**就是系统监控指标。 + +## 1. 什么场景下会使用到监控? + +那么什么时候会用到监控框架呢?下面列举一些常见的场景。 + +1. 系统变慢了 + + 系统变慢几乎是最常见也最头疼的问题,这时候我们需要尽可能多的信息来帮助我们找到系统变慢的原因,比如: + + - JVM信息:是不是有FGC?GC耗时多少?GC后内存有没有恢复?是不是有大量的线程? + - 系统信息:CPU使用率是不是太高了?磁盘IO是不是很频繁? + - 连接数:当前连接是不是太多? + - 接口:当前TPS是多少?各个接口耗时有没有变化? + - 线程池:系统中各种任务是否有积压? + - 缓存命中率 + +2. 磁盘快满了 + + 这时候我们迫切想知道最近一段时间数据文件的增长情况,看看是不是某种文件有突增。 + +3. 系统运行是否正常 + + 此时我们可能需要通过错误日志的数量、集群节点的状态等指标来判断系统是否在正常运行。 + +## 2. 什么人需要使用监控? + +所有关注系统状态的人员都可以使用,包括但不限于研发、测试、运维、DBA等等 + +## 3. 什么是监控指标? + +### 3.1. 
监控指标名词解释 + +在 IoTDB 的监控模块,每个监控指标被 `Metric Name` 和 `Tags` 唯一标识。 + +- `Metric Name`:指标类型名称,比如`logback_events`表示日志事件。 +- `Tags`:指标分类,形式为Key-Value对,每个指标下面可以有0到多个分类,常见的Key-Value对: + - `name = xxx`:被监控对象的名称,是对**业务逻辑**的说明。比如对于`Metric Name = entry_seconds_count` + 类型的监控项,name的含义是指被监控的业务接口。 + - `type = xxx`:监控指标类型细分,是对**监控指标**本身的说明。比如对于`Metric Name = point` + 类型的监控项,type的含义是指监控具体是什么类型的点数。 + - `status = xxx`:被监控对象的状态,是对**业务逻辑**的说明。比如对于`Metric Name = Task`类型的监控项可以通过该参数,从而区分被监控对象的状态。 + - `user = xxx`:被监控对象的相关用户,是对**业务逻辑**的说明。比如统计`root`用户的写入总点数。 + - 根据具体情况自定义:比如logback_events_total下有一个level的分类,用来表示特定级别下的日志数量。 +- `Metric Level`:**指标管理级别**,默认启动级别为`Core`级别,建议启动级别为`Important级别` + ,审核严格程度`Core > Important > Normal > All` + - `Core`:系统的核心指标,供**系统内核和运维人员**使用,关乎系统的**性能、稳定性、安全性**,比如实例的状况,系统的负载等。 + - `Important`:模块的重要指标,供**运维和测试人员**使用,直接关乎**每个模块的运行状态**,比如合并文件个数、执行情况等。 + - `Normal`:模块的一般指标,供**开发人员**使用,方便在出现问题时**定位模块**,比如合并中的特定关键操作情况。 + - `All`:模块的全部指标,供**模块开发人员**使用,往往在复现问题的时候使用,从而快速解决问题。 + +### 3.2. 监控指标对外获取数据格式 + +- IoTDB 对外提供 JMX、 Prometheus 和 IoTDB 格式的监控指标: + - 对于 JMX ,可以通过```org.apache.iotdb.metrics```获取系统监控指标指标。 + - 对于 Prometheus ,可以通过对外暴露的端口获取监控指标的值 + - 对于 IoTDB 方式对外暴露:可以通过执行 IoTDB 的查询来获取监控指标 + +## 4. 监控指标有哪些? + +目前,IoTDB 对外提供一些主要模块的监控指标,并且随着新功能的开发以及系统优化或者重构,监控指标也会同步添加和更新。如果想自己在 +IoTDB +中添加更多系统监控指标埋点,可以参考[IoTDB Metrics Framework](https://github.com/apache/iotdb/tree/master/metrics)使用说明。 + +### 4.1. Core 级别监控指标 + +Core 级别的监控指标在系统运行中默认开启,每一个 Core 级别的监控指标的添加都需要经过谨慎的评估,目前 Core 级别的监控指标如下所述: + +#### 4.1.1. 集群运行状态 + +| Metric | Tags | Type | Description | +| ------------------------- | ----------------------------------------------- | --------- |----------------------------| +| up_time | - | AutoGauge | IoTDB 启动的运行时间 | +| config_node | name="total",status="Registered/Online/Unknown" | AutoGauge | 已注册/在线/离线 confignode 的节点数量 | +| data_node | name="total",status="Registered/Online/Unknown" | AutoGauge | 已注册/在线/离线 datanode 的节点数量 | +| cluster_node_leader_count | name="{ip}:{port}" | Gauge | 节点上共识组Leader的数量 | +| cluster_node_status | name="{ip}:{port}",type="ConfigNode/DataNode" | Gauge | 节点的状态,0=Unkonwn 1=online | +| entry | name="{interface}" | Timer | Client 建立的 Thrift 的耗时情况 | +| mem | name="IoTConsensus" | AutoGauge | IoT共识协议的内存占用,单位为byte | + +#### 4.1.2. 接口层统计 + +| Metric | Tags | Type | Description | +| --------------------- | ---------------------------------- | --------- | ----------------------------------- | +| thrift_connections | name="ConfigNodeRPC" | AutoGauge | ConfigNode 的内部 Thrift 连接数 | +| thrift_connections | name="InternalRPC" | AutoGauge | DataNode 的内部 Thrift 连接数 | +| thrift_connections | name="MPPDataExchangeRPC" | AutoGauge | MPP 框架的内部 Thrift 连接数 | +| thrift_connections | name="ClientRPC" | AutoGauge | Client 建立的 Thrift 连接数 | +| thrift_active_threads | name="ConfigNodeRPC-Service" | AutoGauge | ConfigNode 的内部活跃 Thrift 连接数 | +| thrift_active_threads | name="DataNodeInternalRPC-Service" | AutoGauge | DataNode 的内部活跃 Thrift 连接数 | +| thrift_active_threads | name="MPPDataExchangeRPC-Service" | AutoGauge | MPP 框架的内部活跃 Thrift 连接数 | +| thrift_active_threads | name="ClientRPC-Service" | AutoGauge | Client 建立的活跃 Thrift 连接数 | +| session_idle_time | name = "sessionId" | Histogram | 不同 Session 的空闲时间分布情况 | + +#### 4.1.3. 
节点统计 +| Metric | Tags | Type | Description | +| -------- | ----------------------------------- | --------- | --------------------------- | +| quantity | name="database" | AutoGauge | 系统数据库数量 | +| quantity | name="timeSeries" | AutoGauge | 系统时间序列数量 | +| quantity | name="pointsIn" | Counter | 系统累计写入点数 | +| points | database="{database}", type="flush" | Gauge | 最新一个刷盘的memtale的点数 | + +#### 4.1.4. 集群全链路 +| Metric | Tags | Type | Description | +| ------------------------------------ | ------------------------------------------------ | ----- | -------------------------- | +| performance_overview | interface="{interface}", type="{statement_type}" | Timer | 客户端执行的操作的耗时情况 | +| performance_overview_detail | stage="authority" | Timer | 权限认证总耗时 | +| performance_overview_detail | stage="parser" | Timer | 解析构造总耗时 | +| performance_overview_detail | stage="analyzer" | Timer | 语句分析总耗时 | +| performance_overview_detail | stage="planner" | Timer | 请求规划总耗时 | +| performance_overview_detail | stage="scheduler" | Timer | 请求执行总耗时 | +| performance_overview_schedule_detail | stage="local_scheduler" | Timer | 本地请求执行总耗时 | +| performance_overview_schedule_detail | stage="remote_scheduler" | Timer | 远程请求执行总耗时 | +| performance_overview_local_detail | stage="schema_validate" | Timer | 元数据验证总耗时 | +| performance_overview_local_detail | stage="trigger" | Timer | Trigger 触发总耗时 | +| performance_overview_local_detail | stage="storage" | Timer | 共识层总耗时 | +| performance_overview_storage_detail | stage="engine" | Timer | DataRegion 抢锁总耗时 | +| performance_overview_engine_detail | stage="lock" | Timer | DataRegion 抢锁总耗时 | +| performance_overview_engine_detail | stage="create_memtable_block" | Timer | 创建新的 Memtable 耗时 | +| performance_overview_engine_detail | stage="memory_block" | Timer | 内存控制阻塞总耗时 | +| performance_overview_engine_detail | stage="wal" | Timer | 写入 Wal 总耗时 | +| performance_overview_engine_detail | stage="memtable" | Timer | 写入 Memtable 总耗时 | +| performance_overview_engine_detail | stage="last_cache" | Timer | 更新 LastCache 总耗时 | + +#### 4.1.5. 任务统计 + +| Metric | Tags | Type | Description | +| --------- | ------------------------------------------------- | --------- | ---------------- | +| queue | name="compaction_inner", status="running/waiting" | Gauge | 空间内合并任务数 | +| queue | name="compaction_cross", status="running/waiting" | Gauge | 跨空间合并任务数 | +| queue | name="flush",status="running/waiting" | AutoGauge | 刷盘任务数 | +| cost_task | name="inner_compaction/cross_compaction/flush" | Gauge | 任务耗时情况 | + +#### 4.1.6. IoTDB 进程运行状态 + +| Metric | Tags | Type | Description | +| ----------------- | -------------- | --------- | ----------------------------------- | +| process_cpu_load | name="process" | AutoGauge | IoTDB 进程的 CPU 占用率,单位为% | +| process_cpu_time | name="process" | AutoGauge | IoTDB 进程占用的 CPU 时间,单位为ns | +| process_max_mem | name="memory" | AutoGauge | IoTDB 进程最大可用内存 | +| process_total_mem | name="memory" | AutoGauge | IoTDB 进程当前已申请内存 | +| process_free_mem | name="memory" | AutoGauge | IoTDB 进程当前剩余可用内存 | + +#### 4.1.7. 
系统运行状态 + +| Metric | Tags | Type | Description | +| ------------------------------ | ------------- | --------- | ---------------------------------------- | +| sys_cpu_load | name="system" | AutoGauge | 系统的 CPU 占用率,单位为% | +| sys_cpu_cores | name="system" | Gauge | 系统的可用处理器数 | +| sys_total_physical_memory_size | name="memory" | Gauge | 系统的最大物理内存 | +| sys_free_physical_memory_size | name="memory" | AutoGauge | 系统的剩余可用内存 | +| sys_total_swap_space_size | name="memory" | AutoGauge | 系统的交换区最大空间 | +| sys_free_swap_space_size | name="memory" | AutoGauge | 系统的交换区剩余可用空间 | +| sys_committed_vm_size | name="memory" | AutoGauge | 系统保证可用于正在运行的进程的虚拟内存量 | +| sys_disk_total_space | name="disk" | AutoGauge | 系统磁盘总大小 | +| sys_disk_free_space | name="disk" | AutoGauge | 系统磁盘可用大小 | + +#### 4.1.8. IoTDB 日志统计 + +| Metric | Tags | Type | Description | +| -------------- | ----------------------------------- | ------- | ------------------ | +| logback_events | level="trace/debug/info/warn/error" | Counter | 不同类型的日志个数 | + +#### 4.1.9. 文件统计信息 + +| Metric | Tags | Type | Description | +| ---------- | ------------------------- | --------- | ---------------------------------------- | +| file_size | name="wal" | AutoGauge | 写前日志总大小,单位为byte | +| file_size | name="seq" | AutoGauge | 顺序TsFile总大小,单位为byte | +| file_size | name="unseq" | AutoGauge | 乱序TsFile总大小,单位为byte | +| file_size | name="inner-seq-temp" | AutoGauge | 顺序空间内合并临时文件大小,单位为byte | +| file_size | name="inner-unseq-temp" | AutoGauge | 乱序空间内合并临时文件大小,单位为byte | +| file_size | name="cross-temp" | AutoGauge | 跨空间合并临时文件大小,单位为byte | +| file_size | name="mods" | AutoGauge | Modification 文件的大小 | +| file_count | name="wal" | AutoGauge | 写前日志文件个数 | +| file_count | name="seq" | AutoGauge | 顺序TsFile文件个数 | +| file_count | name="unseq" | AutoGauge | 乱序TsFile文件个数 | +| file_count | name="inner-seq-temp" | AutoGauge | 顺序空间内合并临时文件个数 | +| file_count | name="inner-unseq-temp" | AutoGauge | 乱序空间内合并临时文件个数 | +| file_count | name="cross-temp" | AutoGauge | 跨空间合并临时文件个数 | +| file_count | name="open_file_handlers" | AutoGauge | IoTDB 进程打开文件数,仅支持Linux和MacOS | +| file_count | name="mods | AutoGauge | Modification 文件的数目 | + +#### 4.1.10. JVM 内存统计 + +| Metric | Tags | Type | Description | +| ------------------------------- | ------------------------------- | --------- | -------------------- | +| jvm_buffer_memory_used_bytes | id="direct/mapped" | AutoGauge | 已经使用的缓冲区大小 | +| jvm_buffer_total_capacity_bytes | id="direct/mapped" | AutoGauge | 最大缓冲区大小 | +| jvm_buffer_count_buffers | id="direct/mapped" | AutoGauge | 当前缓冲区数量 | +| jvm_memory_committed_bytes | {area="heap/nonheap",id="xxx",} | AutoGauge | 当前申请的内存大小 | +| jvm_memory_max_bytes | {area="heap/nonheap",id="xxx",} | AutoGauge | 最大内存 | +| jvm_memory_used_bytes | {area="heap/nonheap",id="xxx",} | AutoGauge | 已使用内存大小 | + +#### 4.1.11. JVM 线程统计 + +| Metric | Tags | Type | Description | +| -------------------------- | ------------------------------------------------------------- | --------- | ------------------------ | +| jvm_threads_live_threads | | AutoGauge | 当前线程数 | +| jvm_threads_daemon_threads | | AutoGauge | 当前 Daemon 线程数 | +| jvm_threads_peak_threads | | AutoGauge | 峰值线程数 | +| jvm_threads_states_threads | state="runnable/blocked/waiting/timed-waiting/new/terminated" | AutoGauge | 当前处于各种状态的线程数 | + +#### 4.1.12. 
JVM GC 统计 + +| Metric | Tags | Type | Description | +| ----------------------------- | ----------------------------------------------------- | --------- | -------------------------------------- | +| jvm_gc_pause | action="end of major GC/end of minor GC",cause="xxxx" | Timer | 不同原因的Young GC/Full GC的次数与耗时 | +| | +| jvm_gc_concurrent_phase_time | action="{action}",cause="{cause}" | Timer | 不同原因的Young GC/Full GC的次数与耗时 | +| | +| jvm_gc_max_data_size_bytes | | AutoGauge | 老年代内存的历史最大值 | +| jvm_gc_live_data_size_bytes | | AutoGauge | 老年代内存的使用值 | +| jvm_gc_memory_promoted_bytes | | Counter | 老年代内存正向增长累计值 | +| jvm_gc_memory_allocated_bytes | | Counter | GC分配内存正向增长累计值 | + +### 4.2. Important 级别监控指标 + +目前 Important 级别的监控指标如下所述: + +#### 4.2.1. 节点统计 + +| Metric | Tags | Type | Description | +| ------ | -------------------------------------- | --------- | ------------------------------------ | +| region | name="total",type="SchemaRegion" | AutoGauge | 分区表中 SchemaRegion 总数量 | +| region | name="total",type="DataRegion" | AutoGauge | 分区表中 DataRegion 总数量 | +| region | name="{ip}:{port}",type="SchemaRegion" | Gauge | 分区表中对应节点上 DataRegion 总数量 | +| region | name="{ip}:{port}",type="DataRegion" | Gauge | 分区表中对应节点上 DataRegion 总数量 | + +#### 4.2.2. Ratis共识协议统计 + +| Metric | Tags | Type | Description | +| --------------------- | -------------------------- | ----- | ------------------------------------------------------ | +| ratis_consensus_write | stage="writeLocally" | Timer | 本地写入阶段的时间 | +| ratis_consensus_write | stage="writeRemotely" | Timer | 远程写入阶段的时间 | +| ratis_consensus_write | stage="writeStateMachine" | Timer | 写入状态机阶段的时间 | +| ratis_server | clientWriteRequest | Timer | 处理来自客户端写请求的时间 | +| ratis_server | followerAppendEntryLatency | Timer | 跟随者追加日志条目的总时间 | +| ratis_log_worker | appendEntryLatency | Timer | 领导者追加日志条目的总时间 | +| ratis_log_worker | queueingDelay | Timer | 一个 Raft 日志操作被请求后进入队列的时间,等待队列未满 | +| ratis_log_worker | enqueuedTime | Timer | 一个 Raft 日志操作在队列中的时间 | +| ratis_log_worker | writelogExecutionTime | Timer | 一个 Raft 日志写入操作完成执行的时间 | +| ratis_log_worker | flushTime | Timer | 刷新日志的时间 | +| ratis_log_worker | closedSegmentsSizeInBytes | Gauge | 关闭的 Raft 日志段的总大小 | +| ratis_log_worker | openSegmentSizeInBytes | Gauge | 打开的 Raft 日志段的总大小 | + +#### 4.2.3. 
IoT共识协议统计 + +| Metric | Tags | Type | Description | +| ------------- | -------------------------------------------------------------------------------------- | --------- | --------------------------------- | +| iot_consensus | name="logDispatcher-{IP}:{Port}", region="{region}", type="currentSyncIndex" | AutoGauge | 副本组同步线程的当前同步进度 | +| iot_consensus | name="logDispatcher-{IP}:{Port}", region="{region}", type="cachedRequestInMemoryQueue" | AutoGauge | 副本组同步线程缓存队列请求总大小 | +| iot_consensus | name="IoTConsensusServerImpl", region="{region}", type="searchIndex" | AutoGauge | 副本组主流程写入进度 | +| iot_consensus | name="IoTConsensusServerImpl", region="{region}", type="safeIndex" | AutoGauge | 副本组同步进度 | +| iot_consensus | name="IoTConsensusServerImpl", region="{region}", type="syncLag" | AutoGauge | 副本组写入进度与同步进度差 | +| iot_consensus | name="IoTConsensusServerImpl", region="{region}", type="LogEntriesFromWAL" | AutoGauge | 副本组Batch中来自WAL的日志项数量 | +| iot_consensus | name="IoTConsensusServerImpl", region="{region}", type="LogEntriesFromQueue" | AutoGauge | 副本组Batch中来自队列的日志项数量 | +| stage | name="iot_consensus", region="{region}", type="getStateMachineLock" | Histogram | 主流程获取状态机锁耗时 | +| stage | name="iot_consensus", region="{region}", type="checkingBeforeWrite" | Histogram | 主流程写入状态机检查耗时 | +| stage | name="iot_consensus", region="{region}", type="writeStateMachine" | Histogram | 主流程写入状态机耗时 | +| stage | name="iot_consensus", region="{region}", type="offerRequestToQueue" | Histogram | 主流程尝试添加队列耗时 | +| stage | name="iot_consensus", region="{region}", type="consensusWrite" | Histogram | 主流程全写入耗时 | +| stage | name="iot_consensus", region="{region}", type="constructBatch" | Histogram | 同步线程构造 Batch 耗时 | +| stage | name="iot_consensus", region="{region}", type="syncLogTimePerRequest" | Histogram | 异步回调流程同步日志耗时 | + +#### 4.2.4. 缓存统计 + +| Metric | Tags | Type | Description | +| --------- |------------------------------------| --------- |-----------------------------------------------| +| cache_hit | name="chunk" | AutoGauge | ChunkCache的命中率,单位为% | +| cache_hit | name="schema" | AutoGauge | SchemaCache的命中率,单位为% | +| cache_hit | name="timeSeriesMeta" | AutoGauge | TimeseriesMetadataCache的命中率,单位为% | +| cache_hit | name="bloomFilter" | AutoGauge | TimeseriesMetadataCache中的bloomFilter的拦截率,单位为% | +| cache | name="Database", type="hit" | Counter | Database Cache 的命中次数 | +| cache | name="Database", type="all" | Counter | Database Cache 的访问次数 | +| cache | name="SchemaPartition", type="hit" | Counter | SchemaPartition Cache 的命中次数 | +| cache | name="SchemaPartition", type="all" | Counter | SchemaPartition Cache 的访问次数 | +| cache | name="DataPartition", type="hit" | Counter | DataPartition Cache 的命中次数 | +| cache | name="DataPartition", type="all" | Counter | DataPartition Cache 的访问次数 | +| cache | name="SchemaCache", type="hit" | Counter | SchemaCache 的命中次数 | +| cache | name="SchemaCache", type="all" | Counter | SchemaCache 的访问次数 | + +#### 4.2.5. 
内存统计 + +| Metric | Tags | Type | Description | +| ------ | ------------------------------------ | --------- | ------------------------------------------------- | +| mem | name="database_{name}" | AutoGauge | DataNode内对应DataRegion的内存占用,单位为byte | +| mem | name="chunkMetaData_{name}" | AutoGauge | 写入TsFile时的ChunkMetaData的内存占用,单位为byte | +| mem | name="IoTConsensus" | AutoGauge | IoT共识协议的内存占用,单位为byte | +| mem | name="IoTConsensusQueue" | AutoGauge | IoT共识协议用于队列的内存占用,单位为byte | +| mem | name="IoTConsensusSync" | AutoGauge | IoT共识协议用于同步的内存占用,单位为byte | +| mem | name="schema_region_total_usage" | AutoGauge | 所有SchemaRegion的总内存占用,单位为byte | + +#### 4.2.6. 合并统计 + +| Metric | Tags | Type | Description | +| --------------------- | --------------------------------------------------- | ------- | ------------------ | +| data_written | name="compaction", type="aligned/not-aligned/total" | Counter | 合并时写入量 | +| data_read | name="compaction" | Counter | 合并时的读取量 | +| compaction_task_count | name = "inner_compaction", type="sequence" | Counter | 顺序空间内合并次数 | +| compaction_task_count | name = "inner_compaction", type="unsequence" | Counter | 乱序空间内合并次数 | +| compaction_task_count | name = "cross_compaction", type="cross" | Counter | 跨空间合并次数 | + +#### 4.2.7. IoTDB 进程统计 + +| Metric | Tags | Type | Description | +| --------------------- | -------------- | --------- | ------------------------------------ | +| process_used_mem | name="memory" | AutoGauge | IoTDB 进程当前使用内存 | +| process_mem_ratio | name="memory" | AutoGauge | IoTDB 进程的内存占用比例 | +| process_threads_count | name="process" | AutoGauge | IoTDB 进程当前线程数 | +| process_status | name="process" | AutoGauge | IoTDB 进程存活状态,1为存活,0为终止 | + +#### 4.2.8. JVM 类加载统计 + +| Metric | Tags | Type | Description | +| ---------------------------- | ---- | --------- | ------------------- | +| jvm_classes_unloaded_classes | | AutoGauge | 累计卸载的class数量 | +| jvm_classes_loaded_classes | | AutoGauge | 累计加载的class数量 | + +#### 4.2.9. JVM 编译时间统计 + +| Metric | Tags | Type | Description | +| ----------------------- | --------------------------------------------- | --------- | ------------------ | +| jvm_compilation_time_ms | {compiler="HotSpot 64-Bit Tiered Compilers",} | AutoGauge | 耗费在编译上的时间 | + +#### 4.2.10. 查询规划耗时统计 + +| Metric | Tags | Type | Description | +| --------------- | ---------------------------- | ----- | -------------------------- | +| query_plan_cost | stage="analyzer" | Timer | 查询语句分析耗时 | +| query_plan_cost | stage="logical_planner" | Timer | 查询逻辑计划规划耗时 | +| query_plan_cost | stage="distribution_planner" | Timer | 查询分布式执行计划规划耗时 | +| query_plan_cost | stage="partition_fetcher" | Timer | 分区信息拉取耗时 | +| query_plan_cost | stage="schema_fetcher" | Timer | 元数据信息拉取耗时 | + +#### 4.2.11. 执行计划分发耗时统计 + +| Metric | Tags | Type | Description | +| ---------- | ------------------------- | ----- | -------------------- | +| dispatcher | stage="wait_for_dispatch" | Timer | 分发执行计划耗时 | +| dispatcher | stage="dispatch_read" | Timer | 查询执行计划发送耗时 | + +#### 4.2.12. 查询资源访问统计 + +| Metric | Tags | Type | Description | +| -------------- | ------------------------ | ---- | -------------------------- | +| query_resource | type="sequence_tsfile" | Rate | 顺序文件访问频率 | +| query_resource | type="unsequence_tsfile" | Rate | 乱序文件访问频率 | +| query_resource | type="flushing_memtable" | Rate | flushing memtable 访问频率 | +| query_resource | type="working_memtable" | Rate | working memtable 访问频率 | + +#### 4.2.13. 
数据传输模块统计 + +| Metric | Tags | Type | Description | +|---------------------|------------------------------------------------------------------------|-----------|-----------------------------------| +| data_exchange_cost | operation="source_handle_get_tsblock", type="local/remote" | Timer | source handle 接收 TsBlock 耗时 | +| data_exchange_cost | operation="source_handle_deserialize_tsblock", type="local/remote" | Timer | source handle 反序列化 TsBlock 耗时 | +| data_exchange_cost | operation="sink_handle_send_tsblock", type="local/remote" | Timer | sink handle 发送 TsBlock 耗时 | +| data_exchange_cost | operation="send_new_data_block_event_task", type="server/caller" | Timer | sink handle 发送 TsBlock RPC 耗时 | +| data_exchange_cost | operation="get_data_block_task", type="server/caller" | Timer | source handle 接收 TsBlock RPC 耗时 | +| data_exchange_cost | operation="on_acknowledge_data_block_event_task", type="server/caller" | Timer | source handle 确认接收 TsBlock RPC 耗时 | +| data_exchange_count | name="send_new_data_block_num", type="server/caller" | Histogram | sink handle 发送 TsBlock数量 | +| data_exchange_count | name="get_data_block_num", type="server/caller" | Histogram | source handle 接收 TsBlock 数量 | +| data_exchange_count | name="on_acknowledge_data_block_num", type="server/caller" | Histogram | source handle 确认接收 TsBlock 数量 | +| data_exchange_count | name="shuffle_sink_handle_size" | AutoGauge | sink handle 数量 | +| data_exchange_count | name="source_handle_size" | AutoGauge | source handle 数量 | +#### 4.2.14. 查询任务调度统计 + +| Metric | Tags | Type | Description | +|------------------|----------------------------------|-----------|---------------| +| driver_scheduler | name="ready_queued_time" | Timer | 就绪队列排队时间 | +| driver_scheduler | name="block_queued_time" | Timer | 阻塞队列排队时间 | +| driver_scheduler | name="ready_queue_task_count" | AutoGauge | 就绪队列排队任务数 | +| driver_scheduler | name="block_queued_task_count" | AutoGauge | 阻塞队列排队任务数 | +| driver_scheduler | name="timeout_queued_task_count" | AutoGauge | 超时队列排队任务数 | +| driver_scheduler | name="query_map_size" | AutoGauge | 记录在查询调度器中的查询数 | +#### 4.2.15. 
查询执行耗时统计 + +| Metric | Tags | Type | Description | +| ------------------------ | ----------------------------------------------------------------------------------- | ------- | ---------------------------------------------- | +| query_execution | stage="local_execution_planner" | Timer | 算子树构造耗时 | +| query_execution | stage="query_resource_init" | Timer | 查询资源初始化耗时 | +| query_execution | stage="get_query_resource_from_mem" | Timer | 查询资源内存查询与构造耗时 | +| query_execution | stage="driver_internal_process" | Timer | Driver 执行耗时 | +| query_execution | stage="wait_for_result" | Timer | 从resultHandle 获取一次查询结果的耗时 | +| operator_execution_cost | name="{operator_name}" | Timer | 算子执行耗时 | +| operator_execution_count | name="{operator_name}" | Counter | 算子调用次数(以 next 方法调用次数计算) | +| aggregation | from="raw_data" | Timer | 从一批原始数据进行一次聚合计算的耗时 | +| aggregation | from="statistics" | Timer | 使用统计信息更新一次聚合值的耗时 | +| series_scan_cost | stage="load_timeseries_metadata", type="aligned/non_aligned", from="mem/disk" | Timer | 加载 TimeseriesMetadata 耗时 | +| series_scan_cost | stage="read_timeseries_metadata", type="", from="cache/file" | Timer | 读取一个文件的 Metadata 耗时 | +| series_scan_cost | stage="timeseries_metadata_modification", type="aligned/non_aligned", from="null" | Timer | 过滤删除的 TimeseriesMetadata 耗时 | +| series_scan_cost | stage="load_chunk_metadata_list", type="aligned/non_aligned", from="mem/disk" | Timer | 加载 ChunkMetadata 列表耗时 | +| series_scan_cost | stage="chunk_metadata_modification", type="aligned/non_aligned", from="mem/disk" | Timer | 过滤删除的 ChunkMetadata 耗时 | +| series_scan_cost | stage="chunk_metadata_filter", type="aligned/non_aligned", from="mem/disk" | Timer | 根据查询过滤条件过滤 ChunkMetadata 耗时 | +| series_scan_cost | stage="construct_chunk_reader", type="aligned/non_aligned", from="mem/disk" | Timer | 构造 ChunkReader 耗时 | +| series_scan_cost | stage="read_chunk", type="", from="cache/file" | Timer | 读取 Chunk 的耗时 | +| series_scan_cost | stage="init_chunk_reader", type="aligned/non_aligned", from="mem/disk" | Timer | 初始化 ChunkReader(构造 PageReader) 耗时 | +| series_scan_cost | stage="build_tsblock_from_page_reader", type="aligned/non_aligned", from="mem/disk" | Timer | 从 PageReader 构造 Tsblock 耗时 | +| series_scan_cost | stage="build_tsblock_from_merge_reader", type="aligned/non_aligned", from="null" | Timer | 从 MergeReader 构造 Tsblock (解乱序数据)耗时 | + +#### 4.2.16. 协调模块统计 + +| Metric | Tags | Type | Description | +|-------------|---------------------------------|-----------|-------------------| +| coordinator | name="query_execution_map_size" | AutoGauge | 当前DataNode上记录的查询数 | + +#### 4.2.17. 查询实例管理模块统计 + +| Metric | Tags | Type | Description | +|---------------------------|--------------------------------|-----------|--------------------------------------------------| +| fragment_instance_manager | name="instance_context_size" | AutoGauge | 当前 DataNode 上的查询分片 context 数 | +| fragment_instance_manager | name="instance_execution_size" | AutoGauge | 当前 DataNode 上的查询分片数 | + +#### 4.2.18. 内存池统计 + +| Metric | Tags | Type | Description | +|-------------|--------------------------------------|-----------|-------------------------------------| +| memory_pool | name="max_bytes" | Gauge | 用于数据交换的最大内存 | +| memory_pool | name="remaining_bytes" | AutoGauge | 用于数据交换的剩余内存 | +| memory_pool | name="query_memory_reservation_size" | AutoGauge | 申请内存的查询数 | +| memory_pool | name="memory_reservation_size" | AutoGauge | 申请内存的 sink handle 和 source handle 数 | + +#### 4.2.19. 
本地查询分片调度模块统计 + +| Metric | Tags | Type | Description | +|-------------------------|----------------------------------|-----------|---------------------| +| local_execution_planner | name="free_memory_for_operators" | AutoGauge | 可分配给operator执行的剩余内存 | + +#### 4.2.20. 元数据引擎统计 + +| Metric | Tags | Type | Description | +| ------------- | ------------------------------------------------------------ | --------- | ---------------------------------- | +| schema_engine | name="schema_region_total_mem_usage" | AutoGauge | SchemaRegion 全局内存使用量 | +| schema_engine | name="schema_region_mem_capacity" | AutoGauge | SchemaRegion 全局可用内存 | +| schema_engine | name="schema_engine_mode" | Gauge | SchemaEngine 模式 | +| schema_engine | name="schema_region_consensus" | Gauge | 元数据管理引擎共识协议 | +| schema_engine | name="schema_region_number" | AutoGauge | SchemaRegion 个数 | +| quantity | name="template_series_cnt" | AutoGauge | 模板序列数 | +| schema_region | name="schema_region_mem_usage", region="SchemaRegion[{regionId}]" | AutoGauge | 每个 SchemaRegion 分别的内存使用量 | +| schema_region | name="schema_region_series_cnt", region="SchemaRegion[{regionId}]" | AutoGauge | 每个 SchemaRegion 分别的时间序列数 | +| schema_region | name="activated_template_cnt", region="SchemaRegion[{regionId}]" | AutoGauge | 每个 SchemaRegion 激活的模板数 | +| schema_region | name="template_series_cnt", region="SchemaRegion[{regionId}]" | AutoGauge | 每个 SchemaRegion 的模板序列数 | + +#### 4.2.21. 写入指标统计 + +| Metric | Tags | Type | Description | +|---------------------------|:----------------------------------------------------------------------|-----------|------------------------------------------| +| wal_node_num | name="wal_nodes_num" | AutoGauge | WALNode数量 | +| wal_cost | stage="make_checkpoint" type="" | Timer | 创建各种类型的Checkpoint耗时 | +| wal_cost | type="serialize_one_wal_info_entry" | Timer | 对每个WALInfoEntry serialize耗时 | +| wal_cost | stage="sync_wal_buffer" type="" | Timer | WAL flush SyncBuffer耗时 | +| wal_buffer | name="used_ratio" | Histogram | WALBuffer利用率 | +| wal_buffer | name="entries_count" | Histogram | WALBuffer条目数量 | +| wal_cost | stage="serialize_wal_entry" type="serialize_wal_entry_total" | Timer | WALBuffer serialize任务耗时 | +| wal_node_info | name="effective_info_ratio" type="" | Histogram | WALNode有效信息占比 | +| wal_node_info | name="oldest_mem_table_ram_when_cause_snapshot" type="" | Histogram | WAL触发oldest MemTable snapshot时MemTable大小 | +| wal_node_info | name="oldest_mem_table_ram_when_cause_flush" type="" | Histogram | WAL触发oldest MemTable flush时MemTable大小 | +| flush_sub_task_cost | type="sort_task" | Timer | 排序阶段中的每个series排序耗时 | +| flush_sub_task_cost | type="encoding_task" | Timer | 编码阶段中处理每个encodingTask耗时 | +| flush_sub_task_cost | type="io_task" | Timer | IO阶段中处理每个ioTask耗时 | +| flush_cost | stage="write_plan_indices" | Timer | writePlanIndices耗时 | +| flush_cost | stage="sort" | Timer | 排序阶段总耗时 | +| flush_cost | stage="encoding" | Timer | 编码阶段总耗时 | +| flush_cost | stage="io" | Timer | IO阶段总耗时 | +| pending_flush_task | type="pending_task_num" | AutoGauge | 阻塞的Task数 | +| pending_flush_task | type="pending_sub_task_num" | AutoGauge | 阻塞的SubTask数 | +| flushing_mem_table_status | name="mem_table_size" region="DataRegion[]" | Histogram | Flush时MemTable大小 | +| flushing_mem_table_status | name="total_point_num" region="DataRegion[]" | Histogram | Flush时MemTable中point数量 | +| flushing_mem_table_status | name="series_num" region="DataRegion[]" | Histogram | Flush时MemTable中series数量 | +| flushing_mem_table_status | name="avg_series_points_num" region="DataRegion[]" | 
Histogram | Flush时该memTable内平均每个Memchunk中的point数量 | +| flushing_mem_table_status | name="tsfile_compression_ratio" region="DataRegion[]" | Histogram | Flush MemTable时对应的TsFile的压缩率 | +| flushing_mem_table_status | name="flush_tsfile_size" region="DataRegion[]" | Histogram | Flush的MemTable对应的TsFile大小 | +| data_region_mem_cost | name="data_region_mem_cost" | AutoGauge | DataRegion内存占用 | + +### 4.3. Normal 级别监控指标 + +#### 4.3.1. 集群 + +| Metric | Tags | Type | Description | +| ------ | ------------------------------------------------------------ | --------- | ------------------------------------------------------- | +| region | name="{DatabaseName}",type="SchemaRegion/DataRegion" | AutoGauge | 特定节点上不同 Database 的 DataRegion/SchemaRegion 个数 | +| slot | name="{DatabaseName}",type="schemaSlotNumber/dataSlotNumber" | AutoGauge | 特定节点上不同 Database 的 DataSlot/SchemaSlot 个数 | + +### 4.4. All 级别监控指标 + +目前还没有All级别的监控指标,后续会持续添加。 + +## 5. 怎样获取这些系统监控? + +- 监控模块的相关配置均在`conf/iotdb-{datanode/confignode}.properties`中,所有配置项支持通过`load configuration`命令热加载。 + +### 5.1. 使用 JMX 方式 + +对于使用 JMX 对外暴露的指标,可以通过 Jconsole 来进行查看。在进入 Jconsole 监控页面后,首先会看到 IoTDB +的各类运行情况的概览。在这里,您可以看到堆内存信息、线程信息、类信息以及服务器的 CPU 使用情况。 + +#### 5.1.1. 获取监控指标数据 + +连接到 JMX 后,您可以通过 "MBeans" 标签找到名为 "org.apache.iotdb.metrics" 的 "MBean",可以在侧边栏中查看所有监控指标的具体值。 + +metric-jmx + +#### 5.1.2. 获取其他相关数据 + +连接到 JMX 后,您可以通过 "MBeans" 标签找到名为 "org.apache.iotdb.service" 的 "MBean",如下图所示,了解服务的基本状态 + +
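+
+除了使用 Jconsole 图形界面,上述 MBean 也可以通过标准的 JMX API 以编程方式读取。下面是一个仅供参考的最小示例:其中 JMX 地址 `127.0.0.1:31999` 为示意值,实际地址与端口取决于部署时的 JMX 配置;示例仅列出 `org.apache.iotdb.metrics` 域下注册的 MBean 名称,具体属性名请结合 Jconsole 中看到的内容再行读取。
+
+```java
+import javax.management.MBeanServerConnection;
+import javax.management.ObjectName;
+import javax.management.remote.JMXConnector;
+import javax.management.remote.JMXConnectorFactory;
+import javax.management.remote.JMXServiceURL;
+
+import java.util.Set;
+
+public class IoTDBMetricsJmxExample {
+  public static void main(String[] args) throws Exception {
+    // JMX 服务地址仅为示意,请替换为实际部署中开放的 JMX 地址与端口
+    JMXServiceURL url =
+        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:31999/jmxrmi");
+    try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
+      MBeanServerConnection connection = connector.getMBeanServerConnection();
+      // 列出 org.apache.iotdb.metrics 域下注册的全部监控指标 MBean
+      Set<ObjectName> names =
+          connection.queryNames(new ObjectName("org.apache.iotdb.metrics:*"), null);
+      for (ObjectName name : names) {
+        System.out.println(name);
+      }
+    }
+  }
+}
+```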
+ +为了提高查询性能,IOTDB 对 ChunkMetaData 和 TsFileMetaData 进行了缓存。用户可以使用 MXBean +,展开侧边栏`org.apache.iotdb.db.service`查看缓存命中率: + + + +### 5.2. 使用 Prometheus 方式 + +#### 5.2.1. 监控指标的 Prometheus 映射关系 + +> 对于 Metric Name 为 name, Tags 为 K1=V1, ..., Kn=Vn 的监控指标有如下映射,其中 value 为具体值 + +| 监控指标类型 | 映射关系 | +| ---------------- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Counter | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value | +| AutoGauge、Gauge | name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value | +| Histogram | name_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.5"} value
name{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.99"} value | +| Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="mean"} value | +| Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.5"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.99"} value | + +#### 5.2.2. 修改配置文件 + +1) 以 DataNode 为例,修改 iotdb-system.properties 配置文件如下: + +```properties +dn_metric_reporter_list=PROMETHEUS +dn_metric_level=CORE +dn_metric_prometheus_reporter_port=9091 +``` + +2) 启动 IoTDB DataNode + +3) 打开浏览器或者用```curl``` 访问 ```http://servier_ip:9091/metrics```, 就能得到如下 metric 数据: + +``` +... +# HELP file_count +# TYPE file_count gauge +file_count{name="wal",} 0.0 +file_count{name="unseq",} 0.0 +file_count{name="seq",} 2.0 +... +``` + +#### 5.2.3. Prometheus + Grafana + +如上所示,IoTDB 对外暴露出标准的 Prometheus 格式的监控指标数据,可以使用 Prometheus 采集并存储监控指标,使用 Grafana +可视化监控指标。 + +IoTDB、Prometheus、Grafana三者的关系如下图所示: + +![iotdb_prometheus_grafana](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Metrics/iotdb_prometheus_grafana.png) + +1. IoTDB在运行过程中持续收集监控指标数据。 +2. Prometheus以固定的间隔(可配置)从IoTDB的HTTP接口拉取监控指标数据。 +3. Prometheus将拉取到的监控指标数据存储到自己的TSDB中。 +4. Grafana以固定的间隔(可配置)从Prometheus查询监控指标数据并绘图展示。 + +从交互流程可以看出,我们需要做一些额外的工作来部署和配置Prometheus和Grafana。 + +比如,你可以对Prometheus进行如下的配置(部分参数可以自行调整)来从IoTDB获取监控数据 + +```yaml +job_name: pull-metrics +honor_labels: true +honor_timestamps: true +scrape_interval: 15s +scrape_timeout: 10s +metrics_path: /metrics +scheme: http +follow_redirects: true +static_configs: + - targets: + - localhost:9091 +``` + +更多细节可以参考下面的文档: + +[Prometheus安装使用文档](https://prometheus.io/docs/prometheus/latest/getting_started/) + +[Prometheus从HTTP接口拉取metrics数据的配置说明](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) + +[Grafana安装使用文档](https://grafana.com/docs/grafana/latest/getting-started/getting-started/) + +[Grafana从Prometheus查询数据并绘图的文档](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) + +#### 5.2.4. Apache IoTDB Dashboard + +我们提供了Apache IoTDB Dashboard,在Grafana中显示的效果图如下所示: + +![Apache IoTDB Dashboard](https://alioss.timecho.com/docs/img/UserGuide/System-Tools/Metrics/dashboard.png) + +你可以在企业版中获取到 Dashboard 的 Json文件。 + +### 5.3. 使用 IoTDB 方式 + +#### 5.3.1. 
监控指标的 IoTDB 映射关系 + +> 对于 Metric Name 为 name, Tags 为 K1=V1, ..., Kn=Vn 的监控指标有如下映射,以默认写到 root.__system.metric.`clusterName`.`nodeType`.`nodeId` 为例 + +| 监控指标类型 | 映射关系 | +| ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Counter | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.value | +| AutoGauge、Gauge | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.value | +| Histogram | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.count
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.max
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.sum
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p0
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p50
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p75
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p99
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p999 | +| Rate | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.count
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.mean
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m1
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m5
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m15 | +| Timer | root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.count
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.max
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.mean
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.sum
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p0
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p50
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p75
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p99
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.p999
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m1
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m5
root.__system.metric.`clusterName`.`nodeType`.`nodeId`.name.`K1=V1`...`Kn=Vn`.m15 | + +#### 5.3.2. 获取监控指标 + +根据如上的映射关系,可以构成相关的 IoTDB 查询语句获取监控指标 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Monitoring-Board-Install-and-Deploy.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Monitoring-Board-Install-and-Deploy.md new file mode 100644 index 00000000..c690e3f8 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Monitoring-Board-Install-and-Deploy.md @@ -0,0 +1,208 @@ + + +# 1. 监控面板安装部署 +从 IoTDB 1.0 版本开始,我们引入了系统监控模块,可以完成对 IoTDB 的重要运行指标进行监控,本文介绍了如何在 IoTDB 分布式开启系统监控模块,并且使用 Prometheus + Grafana 的方式完成对系统监控指标的可视化。 + +## 1.1 前期准备 + +### 1.1.1 软件要求 + +1. IoTDB:1.0 版本及以上,可以前往官网下载:https://iotdb.apache.org/Download/ +2. Prometheus:2.30.3 版本及以上,可以前往官网下载:https://prometheus.io/download/ +3. Grafana:8.4.2 版本及以上,可以前往官网下载:https://grafana.com/grafana/download +4. IoTDB 监控面板:基于企业版IoTDB的数据库监控面板,您可联系商务获取 + + +### 1.1.2 启动 ConfigNode +> 本文以 3C3D 为例 + +1. 进入`iotdb-enterprise-1.3.x.x-bin`包 +2. 修改配置文件`conf/iotdb-system.properties`,修改如下配置,其他配置保持不变: + +```properties +cn_metric_reporter_list=PROMETHEUS +cn_metric_level=IMPORTANT +cn_metric_prometheus_reporter_port=9091 +``` + +3. 运行脚本启动 ConfigNode:`./sbin/start-confignode.sh`,出现如下提示则为启动成功: + +![](https://spricoder.oss-cn-shanghai.aliyuncs.com/Apache%20IoTDB/metric/cluster-introduce/1.png) + +4. 在浏览器进入http://localhost:9091/metrics网址,可以查看到如下的监控项信息: + +![](https://spricoder.oss-cn-shanghai.aliyuncs.com/Apache%20IoTDB/metric/cluster-introduce/2.png) + +5. 同样地,另外两个 ConfigNode 节点可以分别配置到 9092 和 9093 端口。 + +### 1.1.3 启动 DataNode +1. 进入`iotdb-enterprise-1.3.x.x-bin`包 +2. 修改配置文件`conf/iotdb-system.properties`,修改如下配置,其他配置保持不变: + +```properties +dn_metric_reporter_list=PROMETHEUS +dn_metric_level=IMPORTANT +dn_metric_prometheus_reporter_port=9094 +``` + +3. 运行脚本启动 DataNode:`./sbin/start-datanode.sh`,出现如下提示则为启动成功: + +![](https://spricoder.oss-cn-shanghai.aliyuncs.com/Apache%20IoTDB/metric/cluster-introduce/3.png) + +4. 在浏览器进入`http://localhost:9094/metrics`网址,可以查看到如下的监控项信息: + +![](https://spricoder.oss-cn-shanghai.aliyuncs.com/Apache%20IoTDB/metric/cluster-introduce/4.png) + +5. 同样地,另外两个 DataNode 可以配置到 9095 和 9096 端口。 + +### 1.1.4 说明 + +进行以下操作前请确认IoTDB集群已启动。 + +本文将在一台机器(3 个 ConfigNode 和 3 个 DataNode)环境上进行监控面板搭建,其他集群配置是类似的,用户可以根据自己的集群情况(ConfigNode 和 DataNode 的数量)进行配置调整。本文搭建的集群的基本配置信息如下表所示。 + +| 集群角色 | 节点IP | 监控模块推送器 | 监控模块级别 | 监控 Port | +| ---------- | --------- | -------------- | ------------ | --------- | +| ConfigNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9091 | +| ConfigNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9092 | +| ConfigNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9093 | +| DataNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9094 | +| DataNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9095 | +| DataNode | 127.0.0.1 | PROMETHEUS | IMPORTANT | 9096 | + +## 1.2 配置 Prometheus 采集监控指标 + +1. 下载安装包。下载Prometheus的二进制包到本地,解压后进入对应文件夹: + +```Shell +tar xvfz prometheus-*.tar.gz +cd prometheus-* +``` + +2. 修改配置。修改Prometheus的配置文件prometheus.yml如下 + a. 新增 confignode 任务收集 ConfigNode 的监控数据 + b. 新增 datanode 任务收集 DataNode 的监控数据 + +```YAML +global: + scrape_interval: 15s + +scrape_configs: + - job_name: "prometheus" + static_configs: + - targets: ["localhost:9090"] + - job_name: "confignode" + static_configs: + - targets: ["localhost:9091", "localhost:9092", "localhost:9093"] + honor_labels: true + - job_name: "datanode" + static_configs: + - targets: ["localhost:9094", "localhost:9095", "localhost:9096"] + honor_labels: true +``` + +3. 
启动Promethues。Prometheus 监控数据的默认过期时间为 15d。在生产环境中,建议将其调整为 180d 以上,以对更长时间的历史监控数据进行追踪,启动命令如下所示: + +```Shell +./prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=180d +``` + +4. 确认启动成功。在浏览器中输入 http://localhost:9090,进入Prometheus,点击进入Status下的Target界面(如下图1),当看到State均为Up时表示配置成功并已经联通(如下图2),点击左侧链接可以跳转到网页监控。 + +![](https://alioss.timecho.com/docs/img/1a.PNG) +![](https://alioss.timecho.com/docs/img/2a.PNG) + +## 1.3 使用 Grafana 查看监控数据 + +### 1.3.1 Step1:Grafana 安装、配置与启动 + +1. 下载Grafana的二进制包到本地,解压后进入对应文件夹: + +```Shell +tar -zxvf grafana-*.tar.gz +cd grafana-* +``` + +2. 启动Grafana并进入: + +```Shell +./bin/grafana-server web +``` + +3. 在浏览器中输入 http://localhost:3000,进入Grafana,默认初始用户名和密码均为 admin。 +4. 首先我们在 Configuration 中配置 Data Source 为 Prometheus + +![](https://alioss.timecho.com/docs/img/3a.png) + +5. 在配置 Data Source 时注意 Prometheus 所在的URL,配置好后点击Save & Test 出现 Data source is working 提示则为配置成功 + +![](https://alioss.timecho.com/docs/img/4a.png) + +### 1.3.2 Step2:导入 IoTDB 监控看板 + +1. 进入 Grafana,选择 Dashboards 的 Browse + +![](https://alioss.timecho.com/docs/img/5a.png) + +2. 点击右侧 Import 按钮 + +![](https://alioss.timecho.com/docs/img/6a.png) + +3. 选择一种方式导入 Dashboard + a. 上传本地已下载的 Dashboard 的 Json 文件 + b. 将 Dashboard 的 Json 文件内容直接粘贴进入 + +![](https://alioss.timecho.com/docs/img/7a.png) + +1. 选择 Dashboard 的 Prometheus 为刚刚配置好的 Data Source,然后点击 Import + +![](https://alioss.timecho.com/docs/img/8a.png) + +5. 之后进入 Apache ConfigNode Dashboard,就看到如下的监控面板 + +![](https://alioss.timecho.com/docs/img/confignode.png) + +6. 同样,我们可以导入 Apache DataNode Dashboard,看到如下的监控面板: + +![](https://alioss.timecho.com/docs/img/datanode.png) + +7. 同样,我们可以导入 Apache Performance Overview Dashboard,看到如下的监控面板: + +![](https://alioss.timecho.com/docs/img/performance.png) + +8. 同样,我们可以导入 Apache System Overview Dashboard,看到如下的监控面板: + +![](https://alioss.timecho.com/docs/img/system.png) + +### 1.3.3 Step3:创建新的 Dashboard 进行数据可视化 + +1. 首先创建Dashboard,然后创建Panel + +![](https://alioss.timecho.com/docs/img/11a.png) + +2. 之后就可以在面板根据自己的需求对监控相关的数据进行可视化(所有相关的监控指标可以先在job中选择confignode/datanode筛选) + +![](https://alioss.timecho.com/docs/img/12a.png) + +3. 选择关注的监控指标可视化完成后,我们就得到了这样的面板: + +![](https://alioss.timecho.com/docs/img/13a.png) \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Auto-Create-MetaData.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Auto-Create-MetaData.md new file mode 100644 index 00000000..5f7c0ee9 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Auto-Create-MetaData.md @@ -0,0 +1,111 @@ + + +# 自动创建元数据 + +自动创建元数据指的是根据写入数据的特征自动创建出用户未定义的时间序列, +这既能解决海量序列场景下设备及测点难以提前预测与建模的难题,又能为用户提供开箱即用的写入体验。 + +## 自动创建 database + +* enable\_auto\_create\_schema + +| 名字 | enable\_auto\_create\_schema | +|:---:|:---| +| 描述 | 是否开启自动创建元数据功能 | +| 类型 | boolean | +| 默认值 | true | +| 改后生效方式 | 重启服务生效 | + +* default\_storage\_group\_level + +| 名字 | default\_storage\_group\_level | +|:---:|:---| +| 描述 | 指定 database 在时间序列所处的层级,默认为第 1 层(root为第 0 层) | +| 类型 | int | +| 默认值 | 1 | +| 改后生效方式 | 仅允许在第一次启动服务前修改 | + +以下图为例: + +* 当 default_storage_group_level=1 时,将使用 root.turbine1 和 root.turbine2 作为 database。 + +* 当 default_storage_group_level=2 时,将使用 root.turbine1.d1、root.turbine1.d2、root.turbine2.d1 和 root.turbine2.d2 作为 database。 + +auto create database example + +## 自动创建序列的元数据(前端指定数据类型) + +* 用户在写入时精确指定数据类型: + + * Session中的insertTablet接口。 + * Session中带有TSDataType的insert接口。 + ``` + public void insertRecord(String deviceId, long time, List measurements, List types, Object... 
values); + public void insertRecords(List deviceIds, List times, List> measurementsList, List> typesList, List> valuesList); + ``` + * ...... + +* 插入数据的同时自动创建序列,效率较高。 + +## 自动创建序列的元数据(类型推断) + +* 在写入时直接传入字符串,数据库推断数据类型: + + * CLI的insert命令。 + * Session中不带有TSDataType的insert接口。 + ``` + public void insertRecord(String deviceId, long time, List measurements, List types, List values); + public void insertRecords(List deviceIds, List times, List> measurementsList, List> valuesList); + ``` + * ...... + +* 由于类型推断会增加写入时间,所以通过类型推断自动创建序列元数据的效率要低于通过前端指定数据类型自动创建序列元数据,建议用户在可行时先选用前端指定数据类型的方式自动创建序列的元数据。 + +### 类型推断 + +| 数据(String) | 字符串格式 | iotdb-system.properties配置项 | 默认值 | +|:---:|:---|:------------------------------|:---| +| true | boolean | boolean\_string\_infer\_type | BOOLEAN | +| 1 | integer | integer\_string\_infer\_type | FLOAT | +| 17000000(大于 2^24 的整数) | integer | long\_string\_infer\_type | DOUBLE | +| 1.2 | floating | floating\_string\_infer\_type | FLOAT | +| NaN | nan | nan\_string\_infer\_type | DOUBLE | +| 'I am text' | text | 无 | 无 | + +* 可配置的数据类型包括:BOOLEAN, INT32, INT64, FLOAT, DOUBLE, TEXT + +* long_string_infer_type 配置项的目的是防止使用 FLOAT 推断 integer_string_infer_type 而造成精度缺失。 + +### 编码方式 + +| 数据类型 | iotdb-system.properties配置项 | 默认值 | +|:---|:-----------------------------|:---| +| BOOLEAN | default\_boolean\_encoding | RLE | +| INT32 | default\_int32\_encoding | RLE | +| INT64 | default\_int64\_encoding | RLE | +| FLOAT | default\_float\_encoding | GORILLA | +| DOUBLE | default\_double\_encoding | GORILLA | +| TEXT | default\_text\_encoding | PLAIN | + +* 可配置的编码方式包括:PLAIN, RLE, TS_2DIFF, GORILLA, DICTIONARY + +* 数据类型与编码方式的对应关系详见 [编码方式](../Basic-Concept/Encoding.md)。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Database.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Database.md new file mode 100644 index 00000000..b574f8ca --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Database.md @@ -0,0 +1,227 @@ + + +# 元数据操作 +## 数据库管理 + +数据库(Database)可以被视为关系数据库中的Database。 + +### 创建数据库 + +我们可以根据存储模型建立相应的数据库。如下所示: + +``` +IoTDB > CREATE DATABASE root.ln +``` + +需要注意的是,database 的父子节点都不能再设置 database。例如在已经有`root.ln`和`root.sgcc`这两个 database 的情况下,创建`root.ln.wf01` database 是不可行的。系统将给出相应的错误提示,如下所示: + +``` +IoTDB> CREATE DATABASE root.ln.wf01 +Msg: 300: root.ln has already been created as database. 
+``` +Database 节点名只支持中英文字符、数字、下划线的组合,如果想设置为纯数字或者包含其他字符,需要用反引号(``)把 database 名称引起来。 + +还需注意,如果在 Windows 系统上部署,database 名是大小写不敏感的。例如同时创建`root.ln` 和 `root.LN` 是不被允许的。 + +### 查看数据库 + +在 database 创建后,我们可以使用 [SHOW DATABASES](../Reference/SQL-Reference.md) 语句和 [SHOW DATABASES \](../Reference/SQL-Reference.md) 来查看 database,SQL 语句如下所示: + +``` +IoTDB> show databases +IoTDB> show databases root.* +IoTDB> show databases root.** +``` + +执行结果为: + +``` ++-------------+----+-------------------------+-----------------------+-----------------------+ +| database| ttl|schema_replication_factor|data_replication_factor|time_partition_interval| ++-------------+----+-------------------------+-----------------------+-----------------------+ +| root.sgcc|null| 2| 2| 604800| +| root.ln|null| 2| 2| 604800| ++-------------+----+-------------------------+-----------------------+-----------------------+ +Total line number = 2 +It costs 0.060s +``` + +### 删除数据库 + +用户可以使用`DELETE DATABASE `语句删除该路径模式匹配的所有的数据库。在删除的过程中,需要注意的是数据库的数据也会被删除。 + +``` +IoTDB > DELETE DATABASE root.ln +IoTDB > DELETE DATABASE root.sgcc +// 删除所有数据,时间序列以及数据库 +IoTDB > DELETE DATABASE root.** +``` + +### 统计数据库数量 + +用户可以使用`COUNT DATABASES `语句统计数据库的数量,允许指定`PathPattern` 用来统计匹配该`PathPattern` 的数据库的数量 + +SQL 语句如下所示: + +``` +IoTDB> show databases +IoTDB> count databases +IoTDB> count databases root.* +IoTDB> count databases root.sgcc.* +IoTDB> count databases root.sgcc +``` + +执行结果为: + +``` ++-------------+ +| database| ++-------------+ +| root.sgcc| +| root.turbine| +| root.ln| ++-------------+ +Total line number = 3 +It costs 0.003s + ++-------------+ +| Database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.003s + ++-------------+ +| Database| ++-------------+ +| 3| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| Database| ++-------------+ +| 0| ++-------------+ +Total line number = 1 +It costs 0.002s + ++-------------+ +| database| ++-------------+ +| 1| ++-------------+ +Total line number = 1 +It costs 0.002s +``` + +### 设置异构数据库(进阶操作) + +在熟悉 IoTDB 元数据建模的前提下,用户可以在 IoTDB 中设置异构的数据库,以便应对不同的生产需求。 + +目前支持的数据库异构参数有: + +| 参数名 | 参数类型 | 参数描述 | +|---------------------------|---------|---------------------------| +| TTL | Long | 数据库的 TTL | +| SCHEMA_REPLICATION_FACTOR | Integer | 数据库的元数据副本数 | +| DATA_REPLICATION_FACTOR | Integer | 数据库的数据副本数 | +| SCHEMA_REGION_GROUP_NUM | Integer | 数据库的 SchemaRegionGroup 数量 | +| DATA_REGION_GROUP_NUM | Integer | 数据库的 DataRegionGroup 数量 | + +用户在配置异构参数时需要注意以下三点: ++ TTL 和 TIME_PARTITION_INTERVAL 必须为正整数。 ++ SCHEMA_REPLICATION_FACTOR 和 DATA_REPLICATION_FACTOR 必须小于等于已部署的 DataNode 数量。 ++ SCHEMA_REGION_GROUP_NUM 和 DATA_REGION_GROUP_NUM 的功能与 iotdb-system.properties 配置文件中的 +`schema_region_group_extension_policy` 和 `data_region_group_extension_policy` 参数相关,以 DATA_REGION_GROUP_NUM 为例: +若设置 `data_region_group_extension_policy=CUSTOM`,则 DATA_REGION_GROUP_NUM 将作为 Database 拥有的 DataRegionGroup 的数量; +若设置 `data_region_group_extension_policy=AUTO`,则 DATA_REGION_GROUP_NUM 将作为 Database 拥有的 DataRegionGroup 的配额下界,即当该 Database 开始写入数据时,将至少拥有此数量的 DataRegionGroup。 + +用户可以在创建 Database 时设置任意异构参数,或在单机/分布式 IoTDB 运行时调整部分异构参数。 + +#### 创建 Database 时设置异构参数 + +用户可以在创建 Database 时设置上述任意异构参数,SQL 语句如下所示: + +``` +CREATE DATABASE prefixPath (WITH databaseAttributeClause (COMMA? databaseAttributeClause)*)? 
+``` + +例如: +``` +CREATE DATABASE root.db WITH SCHEMA_REPLICATION_FACTOR=1, DATA_REPLICATION_FACTOR=3, SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +#### 运行时调整异构参数 + +用户可以在 IoTDB 运行时调整部分异构参数,SQL 语句如下所示: + +``` +ALTER DATABASE prefixPath WITH databaseAttributeClause (COMMA? databaseAttributeClause)* +``` + +例如: +``` +ALTER DATABASE root.db WITH SCHEMA_REGION_GROUP_NUM=1, DATA_REGION_GROUP_NUM=2; +``` + +注意,运行时只能调整下列异构参数: ++ SCHEMA_REGION_GROUP_NUM ++ DATA_REGION_GROUP_NUM + +#### 查看异构数据库 + +用户可以查询每个 Database 的具体异构配置,SQL 语句如下所示: + +``` +SHOW DATABASES DETAILS prefixPath? +``` + +例如: + +``` +IoTDB> SHOW DATABASES DETAILS ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval|SchemaRegionGroupNum|MinSchemaRegionGroupNum|MaxSchemaRegionGroupNum|DataRegionGroupNum|MinDataRegionGroupNum|MaxDataRegionGroupNum| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +|root.db1| null| 1| 3| 604800000| 0| 1| 1| 0| 2| 2| +|root.db2|86400000| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| +|root.db3| null| 1| 1| 604800000| 0| 1| 1| 0| 2| 2| ++--------+--------+-----------------------+---------------------+---------------------+--------------------+-----------------------+-----------------------+------------------+---------------------+---------------------+ +Total line number = 3 +It costs 0.058s +``` + +各列查询结果依次为: ++ 数据库名称 ++ 数据库的 TTL ++ 数据库的元数据副本数 ++ 数据库的数据副本数 ++ 数据库的时间分区间隔 ++ 数据库当前拥有的 SchemaRegionGroup 数量 ++ 数据库需要拥有的最小 SchemaRegionGroup 数量 ++ 数据库允许拥有的最大 SchemaRegionGroup 数量 ++ 数据库当前拥有的 DataRegionGroup 数量 ++ 数据库需要拥有的最小 DataRegionGroup 数量 ++ 数据库允许拥有的最大 DataRegionGroup 数量 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Node.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Node.md new file mode 100644 index 00000000..fe461820 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Node.md @@ -0,0 +1,294 @@ + + +# 节点管理 + +## 查看子路径 + +``` +SHOW CHILD PATHS pathPattern +``` + +可以查看此路径模式所匹配的所有路径的下一层的所有路径和它对应的节点类型,即pathPattern.*所匹配的路径及其节点类型。 + +节点类型:ROOT -> SG INTERNAL -> DATABASE -> INTERNAL -> DEVICE -> TIMESERIES + +示例: + +* 查询 root.ln 的下一层:show child paths root.ln + +``` ++------------+----------+ +| child paths|node types| ++------------+----------+ +|root.ln.wf01| INTERNAL| +|root.ln.wf02| INTERNAL| ++------------+----------+ +Total line number = 2 +It costs 0.002s +``` + +* 查询形如 root.xx.xx.xx 的路径:show child paths root.\*.\* + +``` ++---------------+ +| child paths| ++---------------+ +|root.ln.wf01.s1| +|root.ln.wf02.s2| ++---------------+ +``` + +## 查看子节点 + +``` +SHOW CHILD NODES pathPattern +``` + +可以查看此路径模式所匹配的节点的下一层的所有节点。 + +示例: + +* 查询 root 的下一层:show child nodes root + +``` ++------------+ +| child nodes| ++------------+ +| ln| ++------------+ +``` + +* 查询 root.ln 的下一层 :show child nodes root.ln + +``` ++------------+ +| child nodes| ++------------+ +| wf01| +| wf02| ++------------+ +``` + +## 统计节点数 + +IoTDB 支持使用`COUNT NODES LEVEL=`来统计当前 Metadata + 树下满足某路径模式的路径中指定层级的节点个数。这条语句可以用来统计带有特定采样点的设备数。例如: + +``` +IoTDB > COUNT NODES root.** LEVEL=2 +IoTDB > COUNT NODES root.ln.** LEVEL=2 +IoTDB > COUNT NODES root.ln.wf01.* 
LEVEL=3 +IoTDB > COUNT NODES root.**.temperature LEVEL=3 +``` + +对于上面提到的例子和 Metadata Tree,你可以获得如下结果: + +``` ++------------+ +|count(nodes)| ++------------+ +| 4| ++------------+ +Total line number = 1 +It costs 0.003s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 1| ++------------+ +Total line number = 1 +It costs 0.002s + ++------------+ +|count(nodes)| ++------------+ +| 2| ++------------+ +Total line number = 1 +It costs 0.002s +``` + +> 注意:时间序列的路径只是过滤条件,与 level 的定义无关。 + +## 查看设备 + +* SHOW DEVICES pathPattern? (WITH DATABASE)? devicesWhereClause? limitClause? + +与 `Show Timeseries` 相似,IoTDB 目前也支持两种方式查看设备。 + +* `SHOW DEVICES` 语句显示当前所有的设备信息,等价于 `SHOW DEVICES root.**`。 +* `SHOW DEVICES ` 语句规定了 `PathPattern`,返回给定的路径模式所匹配的设备信息。 +* `WHERE` 条件中可以使用 `DEVICE contains 'xxx'`,根据 device 名称进行模糊查询。 +* `WHERE` 条件中可以使用 `TEMPLATE = 'xxx'`,`TEMPLATE != 'xxx'`,根据 template 名称进行过滤查询。 +* `WHERE` 条件中可以使用 `TEMPLATE is null`,`TEMPLATE is not null`,根据 template 是否为null(null 表示没激活)进行过滤查询。 + +SQL 语句如下所示: + +``` +IoTDB> show devices +IoTDB> show devices root.ln.** +IoTDB> show devices root.ln.** where device contains 't' +IoTDB> show devices root.ln.** where template = 't1' +IoTDB> show devices root.ln.** where template is null +``` + +你可以获得如下数据: + +``` ++-------------------+---------+---------+ +| devices|isAligned| Template| ++-------------------+---------+---------+ +| root.ln.wf01.wt01| false| t1| +| root.ln.wf02.wt02| false| null| +|root.sgcc.wf03.wt01| false| null| +| root.turbine.d1| false| null| ++-------------------+---------+---------+ +Total line number = 4 +It costs 0.002s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 2 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 2 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf01.wt01| false| t1| ++-----------------+---------+---------+ +Total line number = 1 +It costs 0.001s + ++-----------------+---------+---------+ +| devices|isAligned| Template| ++-----------------+---------+---------+ +|root.ln.wf02.wt02| false| null| ++-----------------+---------+---------+ +Total line number = 1 +It costs 0.001s +``` + +其中,`isAligned`表示该设备下的时间序列是否对齐, +`Template`显示着该设备所激活的模板名,null 表示没有激活模板。 + +查看设备及其 database 信息,可以使用 `SHOW DEVICES WITH DATABASE` 语句。 + +* `SHOW DEVICES WITH DATABASE` 语句显示当前所有的设备信息和其所在的 database,等价于 `SHOW DEVICES root.**`。 +* `SHOW DEVICES WITH DATABASE` 语句规定了 `PathPattern`,返回给定的路径模式所匹配的设备信息和其所在的 database。 + +SQL 语句如下所示: + +``` +IoTDB> show devices with database +IoTDB> show devices root.ln.** with database +``` + +你可以获得如下数据: + +``` ++-------------------+-------------+---------+---------+ +| devices| database|isAligned| Template| ++-------------------+-------------+---------+---------+ +| root.ln.wf01.wt01| root.ln| false| t1| +| root.ln.wf02.wt02| root.ln| false| null| +|root.sgcc.wf03.wt01| root.sgcc| false| null| +| root.turbine.d1| root.turbine| false| null| ++-------------------+-------------+---------+---------+ +Total line number = 4 +It 
costs 0.003s + ++-----------------+-------------+---------+---------+ +| devices| database|isAligned| Template| ++-----------------+-------------+---------+---------+ +|root.ln.wf01.wt01| root.ln| false| t1| +|root.ln.wf02.wt02| root.ln| false| null| ++-----------------+-------------+---------+---------+ +Total line number = 2 +It costs 0.001s +``` + +## 统计设备数量 + +* COUNT DEVICES \ + +上述语句用于统计设备的数量,同时允许指定`PathPattern` 用于统计匹配该`PathPattern` 的设备数量 + +SQL 语句如下所示: + +``` +IoTDB> show devices +IoTDB> count devices +IoTDB> count devices root.ln.** +``` + +你可以获得如下数据: + +``` ++-------------------+---------+---------+ +| devices|isAligned| Template| ++-------------------+---------+---------+ +|root.sgcc.wf03.wt03| false| null| +| root.turbine.d1| false| null| +| root.ln.wf02.wt02| false| null| +| root.ln.wf01.wt01| false| t1| ++-------------------+---------+---------+ +Total line number = 4 +It costs 0.024s + ++--------------+ +|count(devices)| ++--------------+ +| 4| ++--------------+ +Total line number = 1 +It costs 0.004s + ++--------------+ +|count(devices)| ++--------------+ +| 2| ++--------------+ +Total line number = 1 +It costs 0.004s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Template.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Template.md new file mode 100644 index 00000000..8086a087 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Template.md @@ -0,0 +1,240 @@ + + +# 元数据模板 + +IoTDB 支持元数据模板功能,实现同类型不同实体的物理量元数据共享,减少元数据内存占用,同时简化同类型实体的管理。 + +注:以下语句中的 `schema` 关键字可以省略。 + +## 创建元数据模板 + +创建元数据模板的 SQL 语法如下: + +```sql +CREATE SCHEMA TEMPLATE ALIGNED? '(' [',' ]+ ')' +``` + +**示例1:** 创建包含两个非对齐序列的元数据模板 + +```shell +IoTDB> create schema template t1 (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +**示例2:** 创建包含一组对齐序列的元数据模板 + +```shell +IoTDB> create schema template t2 aligned (lat FLOAT encoding=Gorilla, lon FLOAT encoding=Gorilla) +``` + +其中,物理量 `lat` 和 `lon` 是对齐的。 + +## 挂载元数据模板 + +元数据模板在创建后,需执行挂载操作,方可用于相应路径下的序列创建与数据写入。 + +**挂载模板前,需确保相关数据库已经创建。** + +**推荐将模板挂载在 database 节点上,不建议将模板挂载到 database 上层的节点上。** + +**模板挂载路径下禁止创建普通序列,已创建了普通序列的前缀路径上不允许挂载模板。** + +挂载元数据模板的 SQL 语句如下所示: + +```shell +IoTDB> set schema template t1 to root.sg1.d1 +``` + +## 激活元数据模板 + +挂载好元数据模板后,且系统开启自动注册序列功能的情况下,即可直接进行数据的写入。例如 database 为 root.sg1,模板 t1 被挂载到了节点 root.sg1.d1,那么可直接向时间序列(如 root.sg1.d1.temperature 和 root.sg1.d1.status)写入时间序列数据,该时间序列已可被当作正常创建的序列使用。 + +**注意**:在插入数据之前或系统未开启自动注册序列功能,模板定义的时间序列不会被创建。可以使用如下SQL语句在插入数据前创建时间序列即激活模板: + +```shell +IoTDB> create timeseries using schema template on root.sg1.d1 +``` + +**示例:** 执行以下语句 +```shell +IoTDB> set schema template t1 to root.sg1.d1 +IoTDB> set schema template t2 to root.sg1.d2 +IoTDB> create timeseries using schema template on root.sg1.d1 +IoTDB> create timeseries using schema template on root.sg1.d2 +``` + +查看此时的时间序列: +```sql +show timeseries root.sg1.** +``` + +```shell ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +|root.sg1.d1.temperature| null| root.sg1| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.sg1.d1.status| null| root.sg1| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +| root.sg1.d2.lon| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| +| 
root.sg1.d2.lat| null| root.sg1| FLOAT| GORILLA| SNAPPY|null| null| null| null| ++-----------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +``` + +查看此时的设备: +```sql +show devices root.sg1.** +``` + +```shell ++---------------+---------+---------+ +| devices|isAligned| Template| ++---------------+---------+---------+ +| root.sg1.d1| false| null| +| root.sg1.d2| true| null| ++---------------+---------+---------+ +``` + +## 查看元数据模板 + +- 查看所有元数据模板 + +SQL 语句如下所示: + +```shell +IoTDB> show schema templates +``` + +执行结果如下: +```shell ++-------------+ +|template name| ++-------------+ +| t2| +| t1| ++-------------+ +``` + +- 查看某个元数据模板下的物理量 + +SQL 语句如下所示: + +```shell +IoTDB> show nodes in schema template t1 +``` + +执行结果如下: +```shell ++-----------+--------+--------+-----------+ +|child nodes|dataType|encoding|compression| ++-----------+--------+--------+-----------+ +|temperature| FLOAT| RLE| SNAPPY| +| status| BOOLEAN| PLAIN| SNAPPY| ++-----------+--------+--------+-----------+ +``` + +- 查看挂载了某个元数据模板的路径 + +```shell +IoTDB> show paths set schema template t1 +``` + +执行结果如下: +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +- 查看使用了某个元数据模板的路径(即模板在该路径上已激活,序列已创建) + +```shell +IoTDB> show paths using schema template t1 +``` + +执行结果如下: +```shell ++-----------+ +|child paths| ++-----------+ +|root.sg1.d1| ++-----------+ +``` + +## 解除元数据模板 + +若需删除模板表示的某一组时间序列,可采用解除模板操作,SQL语句如下所示: + +```shell +IoTDB> delete timeseries of schema template t1 from root.sg1.d1 +``` + +或 + +```shell +IoTDB> deactivate schema template t1 from root.sg1.d1 +``` + +解除操作支持批量处理,SQL语句如下所示: + +```shell +IoTDB> delete timeseries of schema template t1 from root.sg1.*, root.sg2.* +``` + +或 + +```shell +IoTDB> deactivate schema template t1 from root.sg1.*, root.sg2.* +``` + +若解除命令不指定模板名称,则会将给定路径涉及的所有模板使用情况均解除。 + +## 卸载元数据模板 + +卸载元数据模板的 SQL 语句如下所示: + +```shell +IoTDB> unset schema template t1 from root.sg1.d1 +``` + +**注意**:不支持卸载仍处于激活状态的模板,需保证执行卸载操作前解除对该模板的所有使用,即删除所有该模板表示的序列。 + +## 删除元数据模板 + +删除元数据模板的 SQL 语句如下所示: + +```shell +IoTDB> drop schema template t1 +``` + +**注意**:不支持删除已经挂载的模板,需在删除操作前保证该模板卸载成功。 + +## 修改元数据模板 + +在需要新增物理量的场景中,可以通过修改元数据模板来给所有已激活该模板的设备新增物理量。 + +修改元数据模板的 SQL 语句如下所示: + +```shell +IoTDB> alter schema template t1 add (speed FLOAT encoding=RLE, FLOAT TEXT encoding=PLAIN compression=SNAPPY) +``` + +**向已挂载模板的路径下的设备中写入数据,若写入请求中的物理量不在模板中,将自动扩展模板。** diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Timeseries.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Timeseries.md new file mode 100644 index 00000000..d8c31c36 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operate-Metadata/Timeseries.md @@ -0,0 +1,438 @@ + + +# 时间序列管理 + +## 创建时间序列 + +根据建立的数据模型,我们可以分别在两个数据库中创建相应的时间序列。创建时间序列的 SQL 语句如下所示: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.ln.wf01.wt01.temperature with datatype=FLOAT,encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware with datatype=TEXT,encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status with datatype=BOOLEAN,encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature with datatype=FLOAT,encoding=RLE +``` + +从 v0.13 起,可以使用简化版的 SQL 语句创建时间序列: + +``` +IoTDB > create timeseries root.ln.wf01.wt01.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries 
root.ln.wf01.wt01.temperature FLOAT encoding=RLE +IoTDB > create timeseries root.ln.wf02.wt02.hardware TEXT encoding=PLAIN +IoTDB > create timeseries root.ln.wf02.wt02.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.status BOOLEAN encoding=PLAIN +IoTDB > create timeseries root.sgcc.wf03.wt01.temperature FLOAT encoding=RLE +``` + +需要注意的是,当创建时间序列时指定的编码方式与数据类型不对应时,系统会给出相应的错误提示,如下所示: +``` +IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF +error: encoding TS_2DIFF does not support BOOLEAN +``` + +详细的数据类型与编码方式的对应列表请参见 [编码方式](../Basic-Concept/Encoding.md)。 + +## 创建对齐时间序列 + +创建一组对齐时间序列的SQL语句如下所示: + +``` +IoTDB> CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY) +``` + +一组对齐序列中的序列可以有不同的数据类型、编码方式以及压缩方式。 + +对齐的时间序列也支持设置别名、标签、属性。 + +## 删除时间序列 + +我们可以使用`(DELETE | DROP) TimeSeries `语句来删除我们之前创建的时间序列。SQL 语句如下所示: + +``` +IoTDB> delete timeseries root.ln.wf01.wt01.status +IoTDB> delete timeseries root.ln.wf01.wt01.temperature, root.ln.wf02.wt02.hardware +IoTDB> delete timeseries root.ln.wf02.* +IoTDB> drop timeseries root.ln.wf02.* +``` + +## 查看时间序列 + +* SHOW LATEST? TIMESERIES pathPattern? timeseriesWhereClause? limitClause? + + SHOW TIMESERIES 中可以有四种可选的子句,查询结果为这些时间序列的所有信息 + +时间序列信息具体包括:时间序列路径名,database,Measurement 别名,数据类型,编码方式,压缩方式,属性和标签。 + +示例: + +* SHOW TIMESERIES + + 展示系统中所有的时间序列信息 + +* SHOW TIMESERIES <`Path`> + + 返回给定路径的下的所有时间序列信息。其中 `Path` 需要为一个时间序列路径或路径模式。例如,分别查看`root`路径和`root.ln`路径下的时间序列,SQL 语句如下所示: + +``` +IoTDB> show timeseries root.** +IoTDB> show timeseries root.ln.** +``` + +执行结果分别为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.016s + ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes|deadband|deadband parameters| 
++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|null| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY|null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+--------+-------------------+ +Total line number = 4 +It costs 0.004s +``` + +* SHOW TIMESERIES LIMIT INT OFFSET INT + + 只返回从指定下标开始的结果,最大返回条数被 LIMIT 限制,用于分页查询。例如: + +``` +show timeseries root.ln.** limit 10 offset 10 +``` + +* SHOW TIMESERIES WHERE TIMESERIES contains 'containStr' + + 对查询结果集根据 timeseries 名称进行字符串模糊匹配过滤。例如: + +``` +show timeseries root.ln.** where timeseries contains 'wf01.wt' +``` + +执行结果为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 2 +It costs 0.016s +``` + +* SHOW TIMESERIES WHERE DataType=type + + 对查询结果集根据时间序列数据类型进行过滤。例如: + +``` +show timeseries root.ln.** where dataType=FLOAT +``` + +执行结果为: + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 3 +It costs 0.016s + +``` + + +* SHOW LATEST TIMESERIES + + 表示查询出的时间序列需要按照最近插入时间戳降序排列 + + +需要注意的是,当查询路径不存在时,系统会返回 0 条时间序列。 + +## 统计时间序列总数 + +IoTDB 支持使用`COUNT TIMESERIES`来统计一条路径中的时间序列个数。SQL 语句如下所示: + +* 可以通过 `WHERE` 条件对时间序列名称进行字符串模糊匹配,语法为: `COUNT 
TIMESERIES WHERE TIMESERIES contains 'containStr'` 。 +* 可以通过 `WHERE` 条件对时间序列数据类型进行过滤,语法为: `COUNT TIMESERIES WHERE DataType='`。 +* 可以通过 `WHERE` 条件对标签点进行过滤,语法为: `COUNT TIMESERIES WHERE TAGS(key)='value'` 或 `COUNT TIMESERIES WHERE TAGS(key) contains 'value'`。 +* 可以通过定义`LEVEL`来统计指定层级下的时间序列个数。这条语句可以用来统计每一个设备下的传感器数量,语法为:`COUNT TIMESERIES GROUP BY LEVEL=`。 + +``` +IoTDB > COUNT TIMESERIES root.** +IoTDB > COUNT TIMESERIES root.ln.** +IoTDB > COUNT TIMESERIES root.ln.*.*.status +IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' +IoTDB > COUNT TIMESERIES root.** WHERE DATATYPE = INT64 +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) contains 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TAGS(unit) = 'c' +IoTDB > COUNT TIMESERIES root.** WHERE TIMESERIES contains 'sgcc' group by level = 1 +``` + +例如有如下时间序列(可以使用`show timeseries`展示所有时间序列): + +``` ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +| timeseries| alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +|root.sgcc.wf03.wt01.temperature| null| root.sgcc| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.sgcc.wf03.wt01.status| null| root.sgcc| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| +| root.turbine.d1.s1|newAlias| root.turbine| FLOAT| RLE| SNAPPY|{"newTag1":"newV1","tag4":"v4","tag3":"v3"}|{"attr2":"v2","attr1":"newV1","attr4":"v4","attr3":"v3"}| null| null| +| root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY| {"unit":"c"}| null| null| null| +| root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| {"description":"test1"}| null| null| null| +| root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY| null| null| null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY| null| null| null| null| ++-------------------------------+--------+-------------+--------+--------+-----------+-------------------------------------------+--------------------------------------------------------+--------+-------------------+ +Total line number = 7 +It costs 0.004s +``` + +那么 Metadata Tree 如下所示: + + + +可以看到,`root`被定义为`LEVEL=0`。那么当你输入如下语句时: + +``` +IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 +``` + +你将得到以下结果: + +``` +IoTDB> COUNT TIMESERIES root.** GROUP BY LEVEL=1 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +| root.sgcc| 2| +|root.turbine| 1| +| root.ln| 4| ++------------+-----------------+ +Total line number = 3 +It costs 0.002s + +IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf02| 2| +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 2 +It costs 0.002s + +IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=2 ++------------+-----------------+ +| column|count(timeseries)| ++------------+-----------------+ +|root.ln.wf01| 2| ++------------+-----------------+ +Total line number = 1 +It costs 0.002s +``` + +> 
注意:时间序列的路径只是过滤条件,与 level 的定义无关。 + +## 标签点管理 + +我们可以在创建时间序列的时候,为它添加别名和额外的标签和属性信息。 + +标签和属性的区别在于: + +* 标签可以用来查询时间序列路径,会在内存中维护标点到时间序列路径的倒排索引:标签 -> 时间序列路径 +* 属性只能用时间序列路径来查询:时间序列路径 -> 属性 + +所用到的扩展的创建时间序列的 SQL 语句如下所示: +``` +create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=RLE, compression=SNAPPY tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2) +``` + +括号里的`temprature`是`s1`这个传感器的别名。 +我们可以在任何用到`s1`的地方,将其用`temprature`代替,这两者是等价的。 + +> IoTDB 同时支持在查询语句中 [使用 AS 函数](../Reference/SQL-Reference.md#数据管理语句) 设置别名。二者的区别在于:AS 函数设置的别名用于替代整条时间序列名,且是临时的,不与时间序列绑定;而上文中的别名只作为传感器的别名,与其绑定且可与原传感器名等价使用。 + +> 注意:额外的标签和属性信息总的大小不能超过`tag_attribute_total_size`. + + * 标签点属性更新 +创建时间序列后,我们也可以对其原有的标签点属性进行更新,主要有以下六种更新方式: +* 重命名标签或属性 +``` +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` +* 重新设置标签或属性的值 +``` +ALTER timeseries root.turbine.d1.s1 SET newTag1=newV1, attr1=newV1 +``` +* 删除已经存在的标签或属性 +``` +ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +``` +* 添加新的标签 +``` +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +``` +* 添加新的属性 +``` +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +``` +* 更新插入别名,标签和属性 +> 如果该别名,标签或属性原来不存在,则插入,否则,用新值更新原来的旧值 +``` +ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +* 使用标签作为过滤条件查询时间序列,使用 TAGS(tagKey) 来标识作为过滤条件的标签 +``` +SHOW TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +``` + +返回给定路径的下的所有满足条件的时间序列信息,SQL 语句如下所示: + +``` +ALTER timeseries root.ln.wf02.wt02.hardware ADD TAGS unit=c +ALTER timeseries root.ln.wf02.wt02.status ADD TAGS description=test1 +show timeseries root.ln.** where TAGS(unit)='c' +show timeseries root.ln.** where TAGS(description) contains 'test1' +``` + +执行结果分别为: + +``` ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.hardware| null| root.ln| TEXT| PLAIN| SNAPPY|{"unit":"c"}| null| null| null| ++--------------------------+-----+-------------+--------+--------+-----------+------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.005s + ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags|attributes|deadband|deadband parameters| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +|root.ln.wf02.wt02.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|{"description":"test1"}| null| null| null| ++------------------------+-----+-------------+--------+--------+-----------+-----------------------+----------+--------+-------------------+ +Total line number = 1 +It costs 0.004s +``` + +- 使用标签作为过滤条件统计时间序列数量 + +``` +COUNT TIMESERIES (<`PathPattern`>)? timeseriesWhereClause +COUNT TIMESERIES (<`PathPattern`>)? 
timeseriesWhereClause GROUP BY LEVEL= +``` + +返回给定路径的下的所有满足条件的时间序列的数量,SQL 语句如下所示: + +``` +count timeseries +count timeseries root.** where TAGS(unit)='c' +count timeseries root.** where TAGS(unit)='c' group by level = 2 +``` + +执行结果分别为: + +``` +IoTDB> count timeseries ++-----------------+ +|count(timeseries)| ++-----------------+ +| 6| ++-----------------+ +Total line number = 1 +It costs 0.019s +IoTDB> count timeseries root.** where TAGS(unit)='c' ++-----------------+ +|count(timeseries)| ++-----------------+ +| 2| ++-----------------+ +Total line number = 1 +It costs 0.020s +IoTDB> count timeseries root.** where TAGS(unit)='c' group by level = 2 ++--------------+-----------------+ +| column|count(timeseries)| ++--------------+-----------------+ +| root.ln.wf02| 2| +| root.ln.wf01| 0| +|root.sgcc.wf03| 0| ++--------------+-----------------+ +Total line number = 3 +It costs 0.011s +``` + +> 注意,现在我们只支持一个查询条件,要么是等值条件查询,要么是包含条件查询。当然 where 子句中涉及的必须是标签值,而不能是属性值。 + +创建对齐时间序列 + +``` +create aligned timeseries root.sg1.d1(s1 INT32 tags(tag1=v1, tag2=v2) attributes(attr1=v1, attr2=v2), s2 DOUBLE tags(tag3=v3, tag4=v4) attributes(attr3=v3, attr4=v4)) +``` + +执行结果如下: + +``` +IoTDB> show timeseries ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| +|root.sg1.d1.s2| null| root.sg1| DOUBLE| GORILLA| SNAPPY|{"tag4":"v4","tag3":"v3"}|{"attr4":"v4","attr3":"v3"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +支持查询: + +``` +IoTDB> show timeseries where TAGS(tag1)='v1' ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +| timeseries|alias| database|dataType|encoding|compression| tags| attributes|deadband|deadband parameters| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +|root.sg1.d1.s1| null| root.sg1| INT32| RLE| SNAPPY|{"tag1":"v1","tag2":"v2"}|{"attr2":"v2","attr1":"v1"}| null| null| ++--------------+-----+-------------+--------+--------+-----------+-------------------------+---------------------------+--------+-------------------+ +``` + +上述对时间序列标签、属性的更新等操作都支持。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Aggregation.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Aggregation.md new file mode 100644 index 00000000..7d4366e0 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Aggregation.md @@ -0,0 +1,275 @@ + + +# 聚合函数 + +聚合函数是多对一函数。它们对一组值进行聚合计算,得到单个聚合结果。 + +除了 `COUNT()`, `COUNT_IF()`之外,其他所有聚合函数都忽略空值,并在没有输入行或所有值为空时返回空值。 例如,`SUM()` 返回 null 而不是零,而 `AVG()` 在计数中不包括 null 值。 + +IoTDB 支持的聚合函数如下: + +| 函数名 | 功能描述 | 允许的输入类型 | 必要的属性参数 | 输出类型 | 
+|---------------|------------------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------------------|-----------|
+| SUM           | 求和。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE |
+| COUNT         | 计算数据点数。 | 所有类型 | 无 | INT64 |
+| AVG           | 求平均值。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE |
+| EXTREME       | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 |
+| MAX_VALUE     | 求最大值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 |
+| MIN_VALUE     | 求最小值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 |
+| FIRST_VALUE   | 求时间戳最小的值。 | 所有类型 | 无 | 与输入类型一致 |
+| LAST_VALUE    | 求时间戳最大的值。 | 所有类型 | 无 | 与输入类型一致 |
+| MAX_TIME      | 求最大时间戳。 | 所有类型 | 无 | Timestamp |
+| MIN_TIME      | 求最小时间戳。 | 所有类型 | 无 | Timestamp |
+| COUNT_IF      | 求数据点连续满足某一给定条件,且满足条件的数据点个数(用keep表示)满足指定阈值的次数。 | BOOLEAN | `[keep >=/>/=/!=/<=/<]threshold`:被指定的阈值或阈值条件,若只使用`threshold`则等价于`keep >= threshold`,`threshold`类型为`INT64`
`ignoreNull`:可选,默认为`true`;为`true`表示忽略null值,即如果中间出现null值,直接忽略,不会打断连续性;为`false`表示不忽略null值,即如果中间出现null值,会打断连续性 | INT64 | +| TIME_DURATION | 求某一列最大一个不为NULL的值所在时间戳与最小一个不为NULL的值所在时间戳的时间戳差 | 所有类型 | 无 | INT64 | +| MODE | 求众数。注意:
1.输入序列的不同值个数过多时会有内存异常风险;
2.如果所有元素出现的频次相同,即没有众数,则返回对应时间戳最小的值;
3.如果有多个众数,则返回对应时间戳最小的众数。 | 所有类型 | 无 | 与输入类型一致 |
+| COUNT_TIME | 查询结果集的时间戳的数量。与 align by device 搭配使用时,得到的结果是每个设备的结果集的时间戳的数量。 | 所有类型,输入参数只能为* | 无 | INT64 |
+
+
+## COUNT_IF
+
+### 语法
+```sql
+count_if(predicate, [keep >=/>/=/!=/<=/<]threshold[, 'ignoreNull'='true/false'])
+```
+
+**参数:**
+- `predicate`:返回类型为`BOOLEAN`的合法表达式
+- `[keep >=/>/=/!=/<=/<]threshold`:被指定的阈值或阈值条件,若只使用`threshold`则等价于`keep >= threshold`,`threshold`类型为`INT64`
+- `ignoreNull`:可选,默认为`true`;为`true`表示忽略null值,即如果中间出现null值,直接忽略,不会打断连续性;为`false`表示不忽略null值,即如果中间出现null值,会打断连续性
+
+**返回值类型:** `INT64`
+
+> 注意: count_if 当前暂不支持与 group by time 的 SlidingWindow 一起使用
+
+### 使用示例
+
+#### 原始数据
+
+```
++-----------------------------+-------------+-------------+
+|                         Time|root.db.d1.s1|root.db.d1.s2|
++-----------------------------+-------------+-------------+
+|1970-01-01T08:00:00.001+08:00|            0|            0|
+|1970-01-01T08:00:00.002+08:00|         null|            0|
+|1970-01-01T08:00:00.003+08:00|            0|            0|
+|1970-01-01T08:00:00.004+08:00|            0|            0|
+|1970-01-01T08:00:00.005+08:00|            1|            0|
+|1970-01-01T08:00:00.006+08:00|            1|            0|
+|1970-01-01T08:00:00.007+08:00|            1|            0|
+|1970-01-01T08:00:00.008+08:00|            0|            0|
+|1970-01-01T08:00:00.009+08:00|            0|            0|
+|1970-01-01T08:00:00.010+08:00|            0|            0|
++-----------------------------+-------------+-------------+
+```
+
+#### 不使用ignoreNull参数(忽略null)
+
+SQL:
+```sql
+select count_if(s1=0 & s2=0, 3), count_if(s1=1 & s2=0, 3) from root.db.d1
+```
+
+输出:
+```
++--------------------------------------------------+--------------------------------------------------+
+|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3)|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3)|
++--------------------------------------------------+--------------------------------------------------+
+|                                                 2|                                                 1|
++--------------------------------------------------+--------------------------------------------------+
+```
+
+#### 使用ignoreNull参数
+
+SQL:
+```sql
+select count_if(s1=0 & s2=0, 3, 'ignoreNull'='false'), count_if(s1=1 & s2=0, 3, 'ignoreNull'='false') from root.db.d1
+```
+
+输出:
+```
++------------------------------------------------------------------------+------------------------------------------------------------------------+
+|count_if(root.db.d1.s1 = 0 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")|count_if(root.db.d1.s1 = 1 & root.db.d1.s2 = 0, 3, "ignoreNull"="false")|
++------------------------------------------------------------------------+------------------------------------------------------------------------+
+|                                                                        1|                                                                        1|
++------------------------------------------------------------------------+------------------------------------------------------------------------+
+```
+
+## TIME_DURATION
+### 语法
+```sql
+    time_duration(Path)
+```
+### 使用示例
+#### 准备数据
+```
++----------+-------------+
+|      Time|root.db.d1.s1|
++----------+-------------+
+|         1|           70|
+|         3|           10|
+|         4|          303|
+|         6|          110|
+|         7|          302|
+|         8|          110|
+|         9|           60|
+|        10|           70|
+|1677570934|           30|
++----------+-------------+
+```
+#### 写入语句
+```sql
+"CREATE DATABASE root.db",
+"CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN tags(city=Beijing)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1, 2, 10, true)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(2, null, 20, true)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(3, 10, 0, null)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(4, 303, 30, true)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(5, null, 20, true)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(6, 110, 20, true)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(7, 302, 20, true)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(8, 110, null, true)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(9, 60, 20, true)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(10,70, 20, null)",
+"INSERT INTO root.db.d1(timestamp,s1,s2,s3) values(1677570934, 30, 0, true)",
+```
+
+查询:
+```sql
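+-- 说明性注释:time_duration(s1) 返回 s1 列最晚一个非空值与最早一个非空值的时间戳之差,本例中为 1677570934 - 1 = 1677570933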
+select time_duration(s1) from root.db.d1 +``` + +输出 +``` ++----------------------------+ +|time_duration(root.db.d1.s1)| ++----------------------------+ +| 1677570933| ++----------------------------+ +``` +> 注:若数据点只有一个,则返回0,若数据点为null,则返回null。 + +## COUNT_TIME +### 语法 +```sql + count_time(*) +``` +### 使用示例 +#### 准备数据 +``` ++----------+-------------+-------------+-------------+-------------+ +| Time|root.db.d1.s1|root.db.d1.s2|root.db.d2.s1|root.db.d2.s2| ++----------+-------------+-------------+-------------+-------------+ +| 0| 0| null| null| 0| +| 1| null| 1| 1| null| +| 2| null| 2| 2| null| +| 4| 4| null| null| 4| +| 5| 5| 5| 5| 5| +| 7| null| 7| 7| null| +| 8| 8| 8| 8| 8| +| 9| null| 9| null| null| ++----------+-------------+-------------+-------------+-------------+ +``` +#### 写入语句 +```sql +CREATE DATABASE root.db; +CREATE TIMESERIES root.db.d1.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d1.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d2.s1 WITH DATATYPE=INT32, ENCODING=PLAIN; +CREATE TIMESERIES root.db.d2.s2 WITH DATATYPE=INT32, ENCODING=PLAIN; +INSERT INTO root.db.d1(time, s1) VALUES(0, 0), (4,4), (5,5), (8,8); +INSERT INTO root.db.d1(time, s2) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8), (9,9); +INSERT INTO root.db.d2(time, s1) VALUES(1, 1), (2,2), (5,5), (7,7), (8,8); +INSERT INTO root.db.d2(time, s2) VALUES(0, 0), (4,4), (5,5), (8,8); +``` + +查询示例1: +```sql +select count_time(*) from root.db.** +``` + +输出 +``` ++-------------+ +|count_time(*)| ++-------------+ +| 8| ++-------------+ +``` + +查询示例2: +```sql +select count_time(*) from root.db.d1, root.db.d2 +``` + +输出 +``` ++-------------+ +|count_time(*)| ++-------------+ +| 8| ++-------------+ +``` + +查询示例3: +```sql +select count_time(*) from root.db.** group by([0, 10), 2ms) +``` + +输出 +``` ++-----------------------------+-------------+ +| Time|count_time(*)| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 1| +|1970-01-01T08:00:00.008+08:00| 2| ++-----------------------------+-------------+ +``` + +查询示例4: +```sql +select count_time(*) from root.db.** group by([0, 10), 2ms) align by device +``` + +输出 +``` ++-----------------------------+----------+-------------+ +| Time| Device|count_time(*)| ++-----------------------------+----------+-------------+ +|1970-01-01T08:00:00.000+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.002+08:00|root.db.d1| 1| +|1970-01-01T08:00:00.004+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.006+08:00|root.db.d1| 1| +|1970-01-01T08:00:00.008+08:00|root.db.d1| 2| +|1970-01-01T08:00:00.000+08:00|root.db.d2| 2| +|1970-01-01T08:00:00.002+08:00|root.db.d2| 1| +|1970-01-01T08:00:00.004+08:00|root.db.d2| 2| +|1970-01-01T08:00:00.006+08:00|root.db.d2| 1| +|1970-01-01T08:00:00.008+08:00|root.db.d2| 1| ++-----------------------------+----------+-------------+ + +``` + +> 注: +> 1. count_time里的表达式只能为*。 +> 2. count_time不能和其他的聚合函数一起使用。 +> 3. having语句里不支持使用count_time, 使用count_time聚合函数时不支持使用having语句。 +> 4. 
count_time不支持与group by level, group by tag一起使用。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Anomaly-Detection.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Anomaly-Detection.md new file mode 100644 index 00000000..59434de4 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Anomaly-Detection.md @@ -0,0 +1,835 @@ + + +# 异常检测 + +## IQR + +### 函数简介 + +本函数用于检验超出上下四分位数1.5倍IQR的数据分布异常。 + +**函数名:** IQR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:若设置为 "batch",则将数据全部读入后检测;若设置为 "stream",则需用户提供上下四分位数进行流式检测。默认为 "batch"。 ++ `q1`:使用流式计算时的下四分位数。 ++ `q3`:使用流式计算时的上四分位数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +**说明**:$IQR=Q_3-Q_1$ + +### 使用示例 + +#### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select iqr(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+-----------------+ +| Time|iqr(root.test.s1)| ++-----------------------------+-----------------+ +|1970-01-01T08:00:01.700+08:00| 10.0| ++-----------------------------+-----------------+ +``` + +## KSigma + +### 函数简介 + +本函数利用动态 K-Sigma 算法进行异常检测。在一个窗口内,与平均值的差距超过k倍标准差的数据将被视作异常并输出。 + +**函数名:** KSIGMA + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `k`:在动态 K-Sigma 算法中,分布异常的标准差倍数阈值,默认值为 3。 ++ `window`:动态 K-Sigma 算法的滑动窗口大小,默认值为 10000。 + + +**输出序列:** 输出单个序列,类型与输入序列相同。 + +**提示:** k 应大于 0,否则将不做输出。 + +### 使用示例 + +#### 指定k + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:04.000+08:00| 100.0| +|2020-01-01T00:00:06.000+08:00| 150.0| +|2020-01-01T00:00:08.000+08:00| 200.0| +|2020-01-01T00:00:10.000+08:00| 200.0| +|2020-01-01T00:00:14.000+08:00| 200.0| +|2020-01-01T00:00:15.000+08:00| 200.0| +|2020-01-01T00:00:16.000+08:00| 200.0| +|2020-01-01T00:00:18.000+08:00| 200.0| +|2020-01-01T00:00:20.000+08:00| 150.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 50.0| +|2020-01-01T00:00:28.000+08:00| 0.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +|Time |ksigma(root.test.d1.s1,"k"="3.0")| ++-----------------------------+---------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.0| +|2020-01-01T00:00:03.000+08:00| 50.0| +|2020-01-01T00:00:26.000+08:00| 50.0| 
+|2020-01-01T00:00:28.000+08:00| 0.0| ++-----------------------------+---------------------------------+ +``` + +## LOF + +### 函数简介 + +本函数使用局部离群点检测方法用于查找序列的密度异常。将根据提供的第k距离数及局部离群点因子(lof)阈值,判断输入数据是否为离群点,即异常,并输出各点的 LOF 值。 + +**函数名:** LOF + +**输入序列:** 多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:使用的检测方法。默认为 default,以高维数据计算。设置为 series,将一维时间序列转换为高维数据计算。 ++ `k`:使用第k距离计算局部离群点因子.默认为 3。 ++ `window`:每次读取数据的窗口长度。默认为 10000. ++ `windowsize`:使用series方法时,转化高维数据的维数,即单个窗口的大小。默认为 5。 + +**输出序列:** 输出单时间序列,类型为DOUBLE。 + +**提示:** 不完整的数据行会被忽略,不参与计算,也不标记为离群点。 + + +### 使用示例 + +#### 默认参数 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +#### 诊断一维时间序列 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 
6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| +|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 3.77777777777778| ++-----------------------------+--------------------+ +``` + +## MissDetect + +### 函数简介 + +本函数用于检测数据中的缺失异常。在一些数据中,缺失数据会被线性插值填补,在数据中出现完美的线性片段,且这些片段往往长度较大。本函数通过在数据中发现这些完美线性片段来检测缺失异常。 + +**函数名:** MISSDETECT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `minlen`:被标记为异常的完美线性片段的最小长度,是一个大于等于 10 的整数,默认值为 10。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN,即该数据点是否为缺失异常。 + +**提示:** 数据中的`NaN`将会被忽略。 + + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 0.0| +|2021-07-01T12:00:01.000+08:00| 1.0| +|2021-07-01T12:00:02.000+08:00| 0.0| +|2021-07-01T12:00:03.000+08:00| 1.0| +|2021-07-01T12:00:04.000+08:00| 0.0| +|2021-07-01T12:00:05.000+08:00| 0.0| +|2021-07-01T12:00:06.000+08:00| 0.0| +|2021-07-01T12:00:07.000+08:00| 0.0| +|2021-07-01T12:00:08.000+08:00| 0.0| +|2021-07-01T12:00:09.000+08:00| 0.0| +|2021-07-01T12:00:10.000+08:00| 0.0| +|2021-07-01T12:00:11.000+08:00| 0.0| +|2021-07-01T12:00:12.000+08:00| 0.0| +|2021-07-01T12:00:13.000+08:00| 0.0| +|2021-07-01T12:00:14.000+08:00| 0.0| +|2021-07-01T12:00:15.000+08:00| 0.0| +|2021-07-01T12:00:16.000+08:00| 1.0| +|2021-07-01T12:00:17.000+08:00| 0.0| +|2021-07-01T12:00:18.000+08:00| 1.0| +|2021-07-01T12:00:19.000+08:00| 0.0| +|2021-07-01T12:00:20.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select missdetect(s2,'minlen'='10') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+ +| Time|missdetect(root.test.d2.s2, "minlen"="10")| ++-----------------------------+------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| false| +|2021-07-01T12:00:01.000+08:00| false| +|2021-07-01T12:00:02.000+08:00| false| +|2021-07-01T12:00:03.000+08:00| false| +|2021-07-01T12:00:04.000+08:00| true| +|2021-07-01T12:00:05.000+08:00| true| +|2021-07-01T12:00:06.000+08:00| true| +|2021-07-01T12:00:07.000+08:00| true| +|2021-07-01T12:00:08.000+08:00| true| +|2021-07-01T12:00:09.000+08:00| true| +|2021-07-01T12:00:10.000+08:00| true| +|2021-07-01T12:00:11.000+08:00| true| +|2021-07-01T12:00:12.000+08:00| true| +|2021-07-01T12:00:13.000+08:00| true| +|2021-07-01T12:00:14.000+08:00| true| +|2021-07-01T12:00:15.000+08:00| true| +|2021-07-01T12:00:16.000+08:00| false| +|2021-07-01T12:00:17.000+08:00| false| +|2021-07-01T12:00:18.000+08:00| false| +|2021-07-01T12:00:19.000+08:00| false| +|2021-07-01T12:00:20.000+08:00| false| ++-----------------------------+------------------------------------------+ +``` + +## Range + +### 函数简介 + +本函数用于查找时间序列的范围异常。将根据提供的上界与下界,判断输入数据是否越界,即异常,并输出所有异常点为新的时间序列。 + +**函数名:** RANGE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `lower_bound`:范围异常检测的下界。 ++ `upper_bound`:范围异常检测的上界。 + +**输出序列:** 输出单个序列,类型与输入序列相同。 + +**提示:** 应满足`upper_bound`大于`lower_bound`,否则将不做输出。 + + +### 使用示例 + +#### 指定上界与下界 + +输入序列: + +``` ++-----------------------------+---------------+ +| 
Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------------------+ +|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:28.000+08:00| 126.0| ++-----------------------------+------------------------------------------------------------------+ +``` + +## TwoSidedFilter + +### 函数简介 + +本函数基于双边窗口检测法对输入序列中的异常点进行过滤。 + +**函数名:** TWOSIDEDFILTER + +**输出序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型与输入相同,是输入序列去除异常点后的结果。 + +**参数:** + +- `len`:双边窗口检测法中的窗口大小,取值范围为正整数,默认值为 5.如当`len`=3 时,算法向前、向后各取长度为3的窗口,在窗口中计算异常度。 +- `threshold`:异常度的阈值,取值范围为(0,1),默认值为 0.3。阈值越高,函数对于异常度的判定标准越严格。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:00:37.000+08:00| 1484.0| +|1970-01-01T08:00:38.000+08:00| 1055.0| +|1970-01-01T08:00:39.000+08:00| 1050.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test +``` + +输出序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| 
+|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +## Outlier + +### 函数简介 + +本函数用于检测基于距离的异常点。在当前窗口中,如果一个点距离阈值范围内的邻居数量(包括它自己)少于密度阈值,则该点是异常点。 + +**函数名:** OUTLIER + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `r`:基于距离异常检测中的距离阈值。 ++ `k`:基于距离异常检测中的密度阈值。 ++ `w`:用于指定滑动窗口的大小。 ++ `s`:用于指定滑动窗口的步长。 + +**输出序列**:输出单个序列,类型与输入序列相同。 + +### 使用示例 + +#### 指定查询参数 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2020-01-04T23:59:55.000+08:00| 56.0| +|2020-01-04T23:59:56.000+08:00| 55.1| +|2020-01-04T23:59:57.000+08:00| 54.2| +|2020-01-04T23:59:58.000+08:00| 56.3| +|2020-01-04T23:59:59.000+08:00| 59.0| +|2020-01-05T00:00:00.000+08:00| 60.0| +|2020-01-05T00:00:01.000+08:00| 60.5| +|2020-01-05T00:00:02.000+08:00| 64.5| +|2020-01-05T00:00:03.000+08:00| 69.0| +|2020-01-05T00:00:04.000+08:00| 64.2| +|2020-01-05T00:00:05.000+08:00| 62.3| +|2020-01-05T00:00:06.000+08:00| 58.0| +|2020-01-05T00:00:07.000+08:00| 58.9| +|2020-01-05T00:00:08.000+08:00| 52.0| +|2020-01-05T00:00:09.000+08:00| 62.3| +|2020-01-05T00:00:10.000+08:00| 61.0| +|2020-01-05T00:00:11.000+08:00| 64.2| +|2020-01-05T00:00:12.000+08:00| 61.8| +|2020-01-05T00:00:13.000+08:00| 64.0| +|2020-01-05T00:00:14.000+08:00| 63.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:03.000+08:00| 69.0| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:08.000+08:00| 52.0| ++-----------------------------+--------------------------------------------------------+ +``` + +## MasterTrain + +### 函数简介 + +本函数基于主数据训练VAR预测模型。将根据提供的主数据判断时间序列中的数据点是否为错误值,并由连续p+1个非错误值作为训练样本训练VAR模型,输出训练后的模型参数。 + +**函数名:** MasterTrain + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `p`:模型阶数。 ++ `eta`:错误值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 + +**输出序列:** 输出单个序列,类型为DOUBLE。 + +**安装方式:** + +- 从IoTDB代码仓库下载`research/master-detector`分支代码到本地 +- 在根目录运行 `mvn clean package -am -Dmaven.test.skip=true` 编译项目 +- 将 `./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar`复制到IoTDB服务器的`./ext/udf/` 路径下。 +- 启动 IoTDB服务器,在客户端中执行 `create function MasterTrain as org.apache.iotdb.library.anomaly.UDTFMasterTrain'`。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 
39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ + +``` + +## MasterDetect + +### 函数简介 + +本函数基于主数据检测并修复时间序列中的错误值。将根据提供的主数据判断时间序列中的数据点是否为错误值,并由MasterTrain训练的模型进行时间序列预测,错误值将由预测值及主数据共同修复。 + +**函数名:** MasterDetect + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `p`:模型阶数。 ++ `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 ++ `eta`:错误值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 ++ `beta`:异常值判定阈值,在缺省情况下,算法根据3-sigma原则自动估计该参数。 ++ 
`output_type`:输出结果类型,可选'repair'或'anomaly',即输出修复结果或异常检测结果,在缺省情况下默认为'repair'。 ++ `output_column`:输出列的序号,默认为1,即输出第一列的修复结果。 + +**安装方式:** + +- 从IoTDB代码仓库下载`research/master-detector`分支代码到本地 +- 在根目录运行 `mvn clean package -am -Dmaven.test.skip=true` 编译项目 +- 将 `./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar`复制到IoTDB服务器的`./ext/udf/` 路径下。 +- 启动 IoTDB服务器,在客户端中执行 `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'`。 + +**输出序列:** 输出单个序列,类型与输入数据中对应列的类型相同,序列为输入列修复后的结果。 + + +### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +#### 修复 + +用于查询的 SQL 语句: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` 
++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +#### 异常检测 + +用于查询的 SQL 语句: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| false| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| +|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Comparison.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Comparison.md new file mode 100644 index 00000000..71ac9309 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Comparison.md @@ -0,0 +1,309 @@ + + 
+# 比较运算符和函数 + +## 基本比较运算符 + +- 输入数据类型: `INT32`, `INT64`, `FLOAT`, `DOUBLE`。 +- 注意:会将所有数据转换为`DOUBLE`类型后进行比较。`==`和`!=`可以直接比较两个`BOOLEAN`。 +- 返回类型:`BOOLEAN`。 + +|运算符 |含义| +|----------------------------|-----------| +|`>` |大于| +|`>=` |大于等于| +|`<` |小于| +|`<=` |小于等于| +|`==` |等于| +|`!=` / `<>` |不等于| + +**示例:** + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +运行结果 +``` +IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| +|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| +|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| +|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| +|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| +|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +``` + +## `BETWEEN ... AND ...` 运算符 + +|运算符 |含义| +|----------------------------|-----------| +|`BETWEEN ... AND ...` |在指定范围内| +|`NOT BETWEEN ... AND ...` |不在指定范围内| + +**示例:** 选择区间 [36.5,40] 内或之外的数据: + +```sql +select temperature from root.sg1.d1 where temperature between 36.5 and 40; +``` + +```sql +select temperature from root.sg1.d1 where temperature not between 36.5 and 40; +``` + +## 模糊匹配运算符 + +对于 TEXT 类型的数据,支持使用 `Like` 和 `Regexp` 运算符对数据进行模糊匹配 + +|运算符 |含义| +|----------------------------|-----------| +|`LIKE` |匹配简单模式| +|`NOT LIKE` |无法匹配简单模式| +|`REGEXP` |匹配正则表达式| +|`NOT REGEXP` |无法匹配正则表达式| + +输入数据类型:`TEXT` + +返回类型:`BOOLEAN` + +### 使用 `Like` 进行模糊匹配 + +**匹配规则:** + +- `%` 表示任意0个或多个字符。 +- `_` 表示任意单个字符。 + +**示例 1:** 查询 `root.sg.d1` 下 `value` 含有`'cc'`的数据。 + +```shell +IoTDB> select * from root.sg.d1 where value like '%cc%' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 2:** 查询 `root.sg.d1` 下 `value` 中间为 `'b'`、前后为任意单个字符的数据。 + +```shell +IoTDB> select * from root.sg.device where value like '_b_' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:02.000+08:00| abc| ++-----------------------------+----------------+ +Total line number = 1 +It costs 0.002s +``` + +### 使用 `Regexp` 进行模糊匹配 + +需要传入的过滤条件为 **Java 标准库风格的正则表达式**。 + +**常见的正则匹配举例:** + +``` +长度为3-20的所有字符:^.{3,20}$ +大写英文字符:^[A-Z]+$ +数字和英文字符:^[A-Za-z0-9]+$ +以a开头的:^a.* +``` + +**示例 1:** 查询 root.sg.d1 下 value 值为26个英文字符组成的字符串。 + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ 
+|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 2:** 查询 root.sg.d1 下 value 值为26个小写英文字符组成的字符串且时间大于100的。 + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 3:** + +```sql +select b, b like '1%', b regexp '[0-2]' from root.test; +``` + +运行结果 +``` ++-----------------------------+-----------+-------------------------+--------------------------+ +| Time|root.test.b|root.test.b LIKE '^1.*?$'|root.test.b REGEXP '[0-2]'| ++-----------------------------+-----------+-------------------------+--------------------------+ +|1970-01-01T08:00:00.001+08:00| 111test111| true| true| +|1970-01-01T08:00:00.003+08:00| 333test333| false| false| ++-----------------------------+-----------+-------------------------+--------------------------+ +``` + +## `IS NULL` 运算符 + +|运算符 |含义| +|----------------------------|-----------| +|`IS NULL` |是空值| +|`IS NOT NULL` |不是空值| + +**示例 1:** 选择值为空的数据: + +```sql +select code from root.sg1.d1 where temperature is null; +``` + +**示例 2:** 选择值为非空的数据: + +```sql +select code from root.sg1.d1 where temperature is not null; +``` + +## `IN` 运算符 + +|运算符 |含义| +|----------------------------|-----------| +|`IN` / `CONTAINS` |是指定列表中的值| +|`NOT IN` / `NOT CONTAINS` |不是指定列表中的值| + +输入数据类型:`All Types` + +返回类型 `BOOLEAN` + +**注意:请确保集合中的值可以被转为输入数据的类型。** +> 例如: +> +>`s1 in (1, 2, 3, 'test')`,`s1`的数据类型是`INT32` +> +> 我们将会抛出异常,因为`'test'`不能被转为`INT32`类型 + +**示例 1:** 选择值在特定范围内的数据: + +```sql +select code from root.sg1.d1 where code in ('200', '300', '400', '500'); +``` + +**示例 2:** 选择值在特定范围外的数据: + +```sql +select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); +``` + +**示例 3:** + +```sql +select a, a in (1, 2) from root.test; +``` + +输出2: +``` ++-----------------------------+-----------+--------------------+ +| Time|root.test.a|root.test.a IN (1,2)| ++-----------------------------+-----------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1| true| +|1970-01-01T08:00:00.003+08:00| 3| false| ++-----------------------------+-----------+--------------------+ +``` + +## 条件函数 + +条件函数针对每个数据点进行条件判断,返回布尔值。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`:DOUBLE类型 | BOOLEAN 类型 | 返回`ts_value >= threshold`的bool值 | +| IN_RANGE | INT32 / INT64 / FLOAT / DOUBLE | `lower`:DOUBLE类型
`upper`:DOUBLE类型 | BOOLEAN类型 | 返回`ts_value >= lower && ts_value <= upper`的bool值 | | + +测试数据: + +``` +IoTDB> select ts from root.test; ++-----------------------------+------------+ +| Time|root.test.ts| ++-----------------------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 3| +|1970-01-01T08:00:00.004+08:00| 4| ++-----------------------------+------------+ +``` + +**示例 1:** + +SQL语句: +```sql +select ts, on_off(ts, 'threshold'='2') from root.test; +``` + +输出: +``` +IoTDB> select ts, on_off(ts, 'threshold'='2') from root.test; ++-----------------------------+------------+-------------------------------------+ +| Time|root.test.ts|on_off(root.test.ts, "threshold"="2")| ++-----------------------------+------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| false| +|1970-01-01T08:00:00.002+08:00| 2| true| +|1970-01-01T08:00:00.003+08:00| 3| true| +|1970-01-01T08:00:00.004+08:00| 4| true| ++-----------------------------+------------+-------------------------------------+ +``` + +**示例 2:** + +Sql语句: +```sql +select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; +``` + +输出: +``` +IoTDB> select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; ++-----------------------------+------------+--------------------------------------------------+ +| Time|root.test.ts|in_range(root.test.ts, "lower"="2", "upper"="3.1")| ++-----------------------------+------------+--------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| false| +|1970-01-01T08:00:00.002+08:00| 2| true| +|1970-01-01T08:00:00.003+08:00| 3| true| +|1970-01-01T08:00:00.004+08:00| 4| false| ++-----------------------------+------------+--------------------------------------------------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conditional.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conditional.md new file mode 100644 index 00000000..0d1508c6 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conditional.md @@ -0,0 +1,345 @@ + + +# 条件表达式 + +## CASE + +CASE表达式是一种条件表达式,可用于根据特定条件返回不同的值,功能类似于其它语言中的if-else。 +CASE表达式由以下部分组成: +- CASE关键字:表示开始CASE表达式。 +- WHEN-THEN子句:可能存在多个,用于定义条件与给出结果。此子句又分为WHEN和THEN两个部分,WHEN部分表示条件,THEN部分表示结果表达式。如果WHEN条件为真,则返回对应的THEN结果。 +- ELSE子句:如果没有任何WHEN-THEN子句的条件为真,则返回ELSE子句中的结果。可以不存在ELSE子句。 +- END关键字:表示结束CASE表达式。 + +CASE表达式是一种标量运算,可以配合任何其它的标量运算或聚合函数使用。 + +下文把所有THEN部分和ELSE子句并称为结果子句。 + +### 语法示例 + +CASE表达式支持两种格式。 + +语法示例如下: +- 格式1: +```sql + CASE + WHEN condition1 THEN expression1 + [WHEN condition2 THEN expression2] ... + [ELSE expression_end] + END +``` + 从上至下检查WHEN子句中的condition。 + + condition为真时返回对应THEN子句中的expression,condition为假时继续检查下一个WHEN子句中的condition。 +- 格式2: +```sql + CASE caseValue + WHEN whenValue1 THEN expression1 + [WHEN whenValue2 THEN expression2] ... + [ELSE expression_end] + END +``` + + 从上至下检查WHEN子句中的whenValue是否与caseValue相等。 + + 满足caseValue=whenValue时返回对应THEN子句中的expression,不满足时继续检查下一个WHEN子句中的whenValue。 + + 格式2会被iotdb转换成等效的格式1,例如以上sql语句会转换成: +```sql + CASE + WHEN caseValue=whenValue1 THEN expression1 + [WHEN caseValue=whenValue1 THEN expression1] ... 
+ [ELSE expression_end]
+ END
+```
+
+如果格式1中的condition均不为真,或格式2中均不满足caseValue=whenValue,则返回ELSE子句中的expression_end;不存在ELSE子句则返回null。
+
+### 注意事项
+
+- 格式1中,所有WHEN子句必须返回BOOLEAN类型。
+- 格式2中,所有WHEN子句必须能够与CASE子句进行判等。
+- 一个CASE表达式中所有结果子句的返回值类型需要满足一定的条件:
+  - BOOLEAN类型不能与其它类型共存,存在其它类型会报错。
+  - TEXT类型不能与其它类型共存,存在其它类型会报错。
+  - 其它四种数值类型可以共存,最终结果会为DOUBLE类型,转换过程可能会存在精度损失。
+- CASE表达式没有实现惰性计算,即所有子句都会被计算。
+- CASE表达式不支持与UDF混用。
+- CASE表达式内部不能存在聚合函数,但CASE表达式的结果可以提供给聚合函数。
+- 使用CLI时,由于CASE表达式字符串较长,推荐用as为表达式提供别名。
+
+### 使用示例
+
+#### 示例1
+
+CASE表达式可对数据进行直观的分析,例如:
+
+- 某种化学产品的制备需要温度和压力都处于特定范围之内
+- 在制备过程中传感器会侦测温度和压力,在IoTDB中形成T(temperature)和P(pressure)两个时间序列
+
+这种应用场景下,CASE表达式可以指出哪些时间的参数是合适的,哪些时间的参数不合适,以及为什么不合适。
+
+数据:
+```sql
+IoTDB> select * from root.test1
++-----------------------------+------------+------------+
+|                         Time|root.test1.P|root.test1.T|
++-----------------------------+------------+------------+
+|2023-03-29T11:25:54.724+08:00|   1000000.0|      1025.0|
+|2023-03-29T11:26:13.445+08:00|   1000094.0|      1040.0|
+|2023-03-29T11:27:36.988+08:00|   1000095.0|      1041.0|
+|2023-03-29T11:27:56.446+08:00|   1000095.0|      1059.0|
+|2023-03-29T11:28:20.838+08:00|   1200000.0|      1040.0|
++-----------------------------+------------+------------+
+```
+
+SQL语句:
+```sql
+select T, P, case
+when 1000<T and T<1050 and 1000000<P and P<1100000 then "good!"
+when T<=1000 or T>=1050 then "bad temperature"
+when P<=1000000 or P>=1100000 then "bad pressure"
+end as `result`
+from root.test1
+```
+
+
+输出:
+```
++-----------------------------+------------+------------+---------------+
+|                         Time|root.test1.T|root.test1.P|         result|
++-----------------------------+------------+------------+---------------+
+|2023-03-29T11:25:54.724+08:00|      1025.0|   1000000.0|   bad pressure|
+|2023-03-29T11:26:13.445+08:00|      1040.0|   1000094.0|          good!|
+|2023-03-29T11:27:36.988+08:00|      1041.0|   1000095.0|          good!|
+|2023-03-29T11:27:56.446+08:00|      1059.0|   1000095.0|bad temperature|
+|2023-03-29T11:28:20.838+08:00|      1040.0|   1200000.0|   bad pressure|
++-----------------------------+------------+------------+---------------+
+```
+
+
+#### 示例2
+
+CASE表达式可实现结果的自由转换,例如将具有某种模式的字符串转换成另一种字符串。
+
+数据:
+```sql
+IoTDB> select * from root.test2
++-----------------------------+--------------+
+|                         Time|root.test2.str|
++-----------------------------+--------------+
+|2023-03-27T18:23:33.427+08:00|         abccd|
+|2023-03-27T18:23:39.389+08:00|         abcdd|
+|2023-03-27T18:23:43.463+08:00|       abcdefg|
++-----------------------------+--------------+
+```
+
+SQL语句:
+```sql
+select str, case
+when str like "%cc%" then "has cc"
+when str like "%dd%" then "has dd"
+else "no cc and dd" end as `result`
+from root.test2
+```
+
+输出:
+```
++-----------------------------+--------------+------------+
+|                         Time|root.test2.str|      result|
++-----------------------------+--------------+------------+
+|2023-03-27T18:23:33.427+08:00|         abccd|      has cc|
+|2023-03-27T18:23:39.389+08:00|         abcdd|      has dd|
+|2023-03-27T18:23:43.463+08:00|       abcdefg|no cc and dd|
++-----------------------------+--------------+------------+
+```
+
+#### 示例3:搭配聚合函数
+
+##### 合法:聚合函数←CASE表达式
+
+CASE表达式可作为聚合函数的参数。例如,与聚合函数COUNT搭配,可实现同时按多个条件进行数据统计。
+
+数据:
+```sql
+IoTDB> select * from root.test3
++-----------------------------+------------+
+|                         Time|root.test3.x|
++-----------------------------+------------+
+|2023-03-27T18:11:11.300+08:00|         0.0|
+|2023-03-27T18:11:14.658+08:00|         1.0|
+|2023-03-27T18:11:15.981+08:00|         2.0|
+|2023-03-27T18:11:17.668+08:00|         3.0|
+|2023-03-27T18:11:19.112+08:00|         4.0|
+|2023-03-27T18:11:20.822+08:00|         5.0|
+|2023-03-27T18:11:22.462+08:00|         6.0|
+|2023-03-27T18:11:24.174+08:00|         7.0|
+|2023-03-27T18:11:25.858+08:00|         8.0|
+|2023-03-27T18:11:27.979+08:00|         9.0|
++-----------------------------+------------+
+```
+
+SQL语句:
+
+```sql
+select
+count(case when x<=1 then 1 end) as `(-∞,1]`,
+count(case when 1<x and x<=3 then 1 end) as `(1,3]`,
+count(case when 3<x and x<=7 then 1 end) as `(3,7]`,
+count(case when 7<x then 1 end) as `(7,∞)`
+from root.test3
+```
+
+输出:
+```
++------+-----+-----+-----+
+|(-∞,1]|(1,3]|(3,7]|(7,∞)|
++------+-----+-----+-----+
+|     2|    2|    4|    2|
++------+-----+-----+-----+
+```
+
+#### 示例4
+
+此示例使用格式2,根据x的取值将其映射为不同的字符串。
+
+数据:
+```sql
+IoTDB> select * from root.test4
++-----------------------------+------------+
+|                         Time|root.test4.x|
++-----------------------------+------------+
+|1970-01-01T08:00:00.001+08:00|         1.0|
+|1970-01-01T08:00:00.002+08:00|         2.0|
+|1970-01-01T08:00:00.003+08:00|         3.0|
+|1970-01-01T08:00:00.004+08:00|         4.0|
++-----------------------------+------------+
+```
+
+SQL语句:
+```sql
+select x, case x when 1 then "one" when 2 then "two" else "other" end from root.test4
+```
+
+输出:
+```
++-----------------------------+------------+-----------------------------------------------------------------------------------+
+|                         Time|root.test4.x|CASE WHEN root.test4.x = 1 THEN "one" WHEN root.test4.x = 2 THEN "two" ELSE "other"|
++-----------------------------+------------+-----------------------------------------------------------------------------------+
+|1970-01-01T08:00:00.001+08:00|         1.0|                                                                                 one|
+|1970-01-01T08:00:00.002+08:00|         2.0|                                                                                 two|
+|1970-01-01T08:00:00.003+08:00|         3.0|                                                                               other|
+|1970-01-01T08:00:00.004+08:00|         4.0|                                                                               other|
++-----------------------------+------------+-----------------------------------------------------------------------------------+
+```
+
+#### 示例5:结果子句类型
+
+CASE表达式的结果子句的返回值需要满足一定的类型限制。
+
+此示例中,继续使用示例4中的数据。
+
+##### 非法:BOOLEAN与其它类型共存
+
+SQL语句:
+```sql
+select x, case x when 1 then true when 2 then 2 end from root.test4
+```
+
+输出:
+```
+Msg: 701: CASE expression: BOOLEAN and other types cannot exist at same time
+```
+
+##### 合法:只存在BOOLEAN类型
+
+SQL语句:
+```sql
+select x, case x when 1 then true when 2 then false end as `result` from root.test4
+```
+
+输出:
+```
++-----------------------------+------------+------+
+|                         Time|root.test4.x|result|
++-----------------------------+------------+------+
+|1970-01-01T08:00:00.001+08:00|         1.0|  true|
+|1970-01-01T08:00:00.002+08:00|         2.0| false|
+|1970-01-01T08:00:00.003+08:00|         3.0|  null|
+|1970-01-01T08:00:00.004+08:00|         4.0|  null|
++-----------------------------+------------+------+
+```
+
+##### 非法:TEXT与其它类型共存
+
+SQL语句:
+```sql
+select x, case x when 1 then 1 when 2 then "str" end from root.test4
+```
+
+输出:
+```
+Msg: 701: CASE expression: TEXT and other types cannot exist at same time
+```
+
+##### 合法:只存在TEXT类型
+
+见示例1。
+
+##### 合法:数值类型共存
+
+SQL语句:
+```sql
+select x, case x
+when 1 then 1
+when 2 then 222222222222222
+when 3 then 3.3
+when 4 then 4.4444444444444
+end as `result`
+from root.test4
+```
+
+输出:
+```
++-----------------------------+------------+-------------------+
+|                         Time|root.test4.x|             result|
++-----------------------------+------------+-------------------+
+|1970-01-01T08:00:00.001+08:00|         1.0|                1.0|
+|1970-01-01T08:00:00.002+08:00|         2.0|2.22222222222222E14|
+|1970-01-01T08:00:00.003+08:00|         3.0|  3.299999952316284|
+|1970-01-01T08:00:00.004+08:00|         4.0|   4.44444465637207|
++-----------------------------+------------+-------------------+
+```
\ No newline at end of file
diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Constant.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Constant.md
new file mode 100644
index 00000000..6825bd32
--- /dev/null
+++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Constant.md
@@ -0,0 +1,57 @@
+
+
+# 常序列生成函数
+
+常序列生成函数用于生成所有数据点的值都相同的时间序列。
+
+常序列生成函数接受一个或者多个时间序列输入,其输出的数据点的时间戳集合是这些输入序列时间戳集合的并集。
+
+目前 IoTDB 支持如下常序列生成函数:
+
+| 函数名 | 必要的属性参数 | 输出序列类型 | 功能描述 |
+| ------ | 
------------------------------------------------------------ | -------------------------- | ------------------------------------------------------------ | +| CONST | `value`: 输出的数据点的值
`type`: 输出的数据点的类型,只能是 INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 由输入属性参数 `type` 决定 | 根据输入属性 `value` 和 `type` 输出用户指定的常序列。 | +| PI | 无 | DOUBLE | 常序列的值:`π` 的 `double` 值,圆的周长与其直径的比值,即圆周率,等于 *Java标准库* 中的`Math.PI`。 | +| E | 无 | DOUBLE | 常序列的值:`e` 的 `double` 值,自然对数的底,它等于 *Java 标准库* 中的 `Math.E`。 | + +例如: + +``` sql +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; +``` + +结果: + +``` +select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|const(root.sg1.d1.s1, "value"="1024", "type"="INT64")|pi(root.sg1.d1.s2)|e(root.sg1.d1.s1, root.sg1.d1.s2)| ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 1024| 3.141592653589793| 2.718281828459045| +|1970-01-01T08:00:00.001+08:00| 1.0| null| 1024| null| 2.718281828459045| +|1970-01-01T08:00:00.002+08:00| 2.0| null| 1024| null| 2.718281828459045| +|1970-01-01T08:00:00.003+08:00| null| 3.0| null| 3.141592653589793| 2.718281828459045| +|1970-01-01T08:00:00.004+08:00| null| 4.0| null| 3.141592653589793| 2.718281828459045| ++-----------------------------+--------------+--------------+-----------------------------------------------------+------------------+---------------------------------+ +Total line number = 5 +It costs 0.005s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Continuous-Interval.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Continuous-Interval.md new file mode 100644 index 00000000..c704a7e3 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Continuous-Interval.md @@ -0,0 +1,75 @@ + + +# 区间查询函数 + +## 连续满足区间函数 + +连续满足条件区间函数用来查询所有满足指定条件的连续区间。 + +按返回值可分为两类: +1. 返回满足条件连续区间的起始时间戳和时间跨度(时间跨度为0表示此处只有起始时间这一个数据点满足条件) +2. 返回满足条件连续区间的起始时间戳和后面连续满足条件的点的个数(个数为1表示此处只有起始时间这一个数据点满足条件) + +| 函数名 | 输入序列类型 | 属性参数 | 输出序列类型 | 功能描述 | +|-------------------|--------------------------------------|------------------------------------------------|-------|------------------------------------------------------------------| +| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续为0(false)的开始时间与持续时间,持续时间t(单位ms)满足`t >= min && t <= max` | +| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与持续时间,持续时间t(单位ms)满足`t >= min && t <= max` | | +| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` | | +| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` | | + +测试数据: +``` +IoTDB> select s1,s2,s3,s4,s5 from root.sg.d2; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d2.s1|root.sg.d2.s2|root.sg.d2.s3|root.sg.d2.s4|root.sg.d2.s5| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.001+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.003+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.004+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.005+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.006+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.007+08:00| 1| 1| 1.0| 1.0| true| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+ +``` + +sql: +```sql +select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; +``` + +结果: +``` ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +| Time|root.sg.d2.s1|zero_count(root.sg.d2.s1)|non_zero_count(root.sg.d2.s2)|zero_duration(root.sg.d2.s3)|non_zero_duration(root.sg.d2.s4)| ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0| 1| null| 0| null| +|1970-01-01T08:00:00.001+08:00| 1| null| 2| null| 1| +|1970-01-01T08:00:00.002+08:00| 1| null| null| null| null| +|1970-01-01T08:00:00.003+08:00| 0| 1| null| 0| null| +|1970-01-01T08:00:00.004+08:00| 1| null| 1| null| 0| +|1970-01-01T08:00:00.005+08:00| 0| 2| null| 1| null| +|1970-01-01T08:00:00.006+08:00| 0| null| null| null| null| +|1970-01-01T08:00:00.007+08:00| 1| null| 1| null| 0| ++-----------------------------+-------------+-------------------------+-----------------------------+----------------------------+--------------------------------+ +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conversion.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conversion.md new file mode 100644 index 00000000..6bb5d867 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Conversion.md @@ -0,0 +1,102 @@ + + +# 数据类型转换 + +## CAST + +### 函数简介 + +当前 IoTDB 支持6种数据类型,其中包括 INT32、INT64、FLOAT、DOUBLE、BOOLEAN 以及 TEXT。当我们对数据进行查询或者计算时可能需要进行数据类型的转换, 比如说将 TEXT 转换为 INT32,或者提高数据精度,比如说将 FLOAT 转换为 DOUBLE。IoTDB 支持使用cast 函数对数据类型进行转换。 + +语法示例如下: + +```sql +SELECT cast(s1 as INT32) from root.sg +``` + +cast 函数语法形式上与 PostgreSQL 一致,AS 后指定的数据类型表明要转换成的目标类型,目前 IoTDB 支持的六种数据类型均可以在 cast 函数中使用,遵循的转换规则如下表所示,其中行表示原始数据类型,列表示要转化成的目标数据类型: + +| | **INT32** | **INT64** | **FLOAT** | **DOUBLE** | **BOOLEAN** | **TEXT** | +| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ----------------------------------------------- | ----------------------- | ------------------------------------------------------------ | -------------------------------- | +| **INT32** | 不转化 | 直接转化 | 直接转化 | 直接转化 | !=0 : true
==0: false | String.valueOf() | +| **INT64** | 超出 INT32 范围:执行抛异常
否则:直接转化 | 不转化 | 直接转化 | 直接转化 | !=0L : true
==0: false | String.valueOf() | +| **FLOAT** | 超出 INT32 范围:执行抛异常
否则:四舍五入(Math.round()) | 超出 INT64 范围:执行抛异常
否则:四舍五入(Math.round()) | 不转化 | 直接转化 | !=0.0f : true
==0: false | String.valueOf() | +| **DOUBLE** | 超出 INT32 范围:执行抛异常
否则:四舍五入(Math.round()) | 超出 INT64 范围:执行抛异常
否则:四舍五入(Math.round()) | 超出 FLOAT 范围:执行抛异常
否则:直接转化 | 不转化 | !=0.0 : true
==0: false | String.valueOf() | +| **BOOLEAN** | true: 1
false: 0 | true: 1L
false: 0 | true: 1.0f
false: 0 | true: 1.0
false: 0 | 不转化 | true: "true"
false: "false" | +| **TEXT** | Integer.parseInt() | Long.parseLong() | Float.parseFloat() | Double.parseDouble() | text.toLowerCase =="true" : true
text.toLowerCase =="false" : false
其它情况:执行抛异常 | 不转化 | + +### 使用示例 + +``` +// timeseries +IoTDB> show timeseries root.sg.d1.** ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ +| Timeseries|Alias|Database|DataType|Encoding|Compression|Tags|Attributes|Deadband|DeadbandParameters| ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ +|root.sg.d1.s3| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s4| null| root.sg| DOUBLE| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s5| null| root.sg| BOOLEAN| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s6| null| root.sg| TEXT| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s1| null| root.sg| INT32| PLAIN| SNAPPY|null| null| null| null| +|root.sg.d1.s2| null| root.sg| INT64| PLAIN| SNAPPY|null| null| null| null| ++-------------+-----+--------+--------+--------+-----------+----+----------+--------+------------------+ + +// data of timeseries +IoTDB> select * from root.sg.d1; ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +| Time|root.sg.d1.s3|root.sg.d1.s4|root.sg.d1.s5|root.sg.d1.s6|root.sg.d1.s1|root.sg.d1.s2| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| false| 10000| 0| 0| +|1970-01-01T08:00:00.001+08:00| 1.0| 1.0| false| 3| 1| 1| +|1970-01-01T08:00:00.002+08:00| 2.7| 2.7| true| TRue| 2| 2| +|1970-01-01T08:00:00.003+08:00| 3.33| 3.33| true| faLse| 3| 3| ++-----------------------------+-------------+-------------+-------------+-------------+-------------+-------------+ + +// cast BOOLEAN to other types +IoTDB> select cast(s5 as INT32), cast(s5 as INT64),cast(s5 as FLOAT),cast(s5 as DOUBLE), cast(s5 as TEXT) from root.sg.d1 ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ +| Time|CAST(root.sg.d1.s5 AS INT32)|CAST(root.sg.d1.s5 AS INT64)|CAST(root.sg.d1.s5 AS FLOAT)|CAST(root.sg.d1.s5 AS DOUBLE)|CAST(root.sg.d1.s5 AS TEXT)| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ +|1970-01-01T08:00:00.000+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.001+08:00| 0| 0| 0.0| 0.0| false| +|1970-01-01T08:00:00.002+08:00| 1| 1| 1.0| 1.0| true| +|1970-01-01T08:00:00.003+08:00| 1| 1| 1.0| 1.0| true| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+---------------------------+ + +// cast TEXT to numeric types +IoTDB> select cast(s6 as INT32), cast(s6 as INT64), cast(s6 as FLOAT), cast(s6 as DOUBLE) from root.sg.d1 where time < 2 ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ +| Time|CAST(root.sg.d1.s6 AS INT32)|CAST(root.sg.d1.s6 AS INT64)|CAST(root.sg.d1.s6 AS FLOAT)|CAST(root.sg.d1.s6 AS DOUBLE)| ++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 10000| 10000| 10000.0| 10000.0| +|1970-01-01T08:00:00.001+08:00| 3| 3| 3.0| 3.0| 
++-----------------------------+----------------------------+----------------------------+----------------------------+-----------------------------+ + +// cast TEXT to BOOLEAN +IoTDB> select cast(s6 as BOOLEAN) from root.sg.d1 where time >= 2 ++-----------------------------+------------------------------+ +| Time|CAST(root.sg.d1.s6 AS BOOLEAN)| ++-----------------------------+------------------------------+ +|1970-01-01T08:00:00.002+08:00| true| +|1970-01-01T08:00:00.003+08:00| false| ++-----------------------------+------------------------------+ +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Matching.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Matching.md new file mode 100644 index 00000000..124f8ea3 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Matching.md @@ -0,0 +1,333 @@ + + +# 数据匹配 + +## Cov + +### 函数简介 + +本函数用于计算两列数值型数据的总体协方差。 + +**函数名:** COV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为总体协方差的数据点。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出`NaN`。 + + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +## Dtw + +### 函数简介 + +本函数用于计算两列数值型数据的 DTW 距离。 + +**函数名:** DTW + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为两个时间序列的 DTW 距离值。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出 0。 + + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| 
+|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +## Pearson + +### 函数简介 + +本函数用于计算两列数值型数据的皮尔森相关系数。 + +**函数名:** PEARSON + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为皮尔森相关系数的数据点。 + +**提示:** + ++ 如果某行数据中包含空值、缺失值或`NaN`,该行数据将会被忽略; ++ 如果数据中所有的行都被忽略,函数将会输出`NaN`。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +## PtnSym + +### 函数简介 + +本函数用于寻找序列中所有对称度小于阈值的对称子序列。对称度通过 DTW 计算,值越小代表序列对称性越高。 + +**函数名:** PTNSYM + +**输入序列:** 仅支持一个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:对称子序列的长度,是一个正整数,默认值为 10。 ++ `threshold`:对称度阈值,是一个非负数,只有对称度小于等于该值的对称子序列才会被输出。在缺省情况下,所有的子序列都会被输出。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中的每一个数据点对应于一个对称子序列,时间戳为子序列的起始时刻,值为对称度。 + + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| +|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + 
+```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +## XCorr + +### 函数简介 + +本函数用于计算两条时间序列的互相关函数值, +对离散序列而言,互相关函数可以表示为 +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +常用于表征两条序列在不同对齐条件下的相似度。 + +**函数名:** XCORR + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点, +其中正中心的值为两条序列按照预先对齐的结果计算的互相关系数(即等于以上公式的$CR(0)$), +前半部分的值表示将后一条输入序列向前平移时计算的互相关系数, +直至两条序列没有重合的数据点(不包含完全分离时的结果$CR(-N)=0.0$), +后半部分类似。 +用公式可表示为(所有序列的索引从1开始计数): +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**提示:** + ++ 两条序列中的`null` 和`NaN` 值会被忽略,在计算中表现为 0。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Profiling.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Profiling.md new file mode 100644 index 00000000..21399d5f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Profiling.md @@ -0,0 +1,1878 @@ + + +# 数据画像 + +## ACF + +### 函数简介 + +本函数用于计算时间序列的自相关函数值,即序列与自身之间的互相关函数,详情参见[XCorr](./Data-Matching.md#XCorr)函数文档。 + +**函数名:** ACF + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列中共包含$2N-1$个数据点,每个值的具体含义参见[XCorr](./Data-Matching.md#XCorr)函数文档。 + +**提示:** + ++ 序列中的`NaN`值会被忽略,在计算中表现为0。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +输出序列: + +``` 
++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +## Distinct + +### 函数简介 + +本函数可以返回输入序列中出现的所有不同的元素。 + +**函数名:** DISTINCT + +**输入序列:** 仅支持单个输入序列,类型可以是任意的 + +**输出序列:** 输出单个序列,类型与输入相同。 + +**提示:** + ++ 输出序列的时间戳是无意义的。输出顺序是任意的。 ++ 缺失值和空值将被忽略,但`NaN`不会被忽略。 ++ 字符串区分大小写 + + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select distinct(s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +## Histogram + +### 函数简介 + +本函数用于计算单列数值型数据的分布直方图。 + +**函数名:** HISTOGRAM + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `min`:表示所求数据范围的下限,默认值为 -Double.MAX_VALUE。 ++ `max`:表示所求数据范围的上限,默认值为 Double.MAX_VALUE,`start`的值必须小于或等于`end`。 ++ `count`: 表示直方图分桶的数量,默认值为 1,其值必须为正整数。 + +**输出序列:** 直方图分桶的值,其中第 i 个桶(从 1 开始计数)表示的数据范围下界为$min+ (i-1)\cdot\frac{max-min}{count}$,数据范围上界为$min+ i \cdot \frac{max-min}{count}$。 + + +**提示:** + ++ 如果某个数据点的数值小于`min`,它会被放入第 1 个桶;如果某个数据点的数值大于`max`,它会被放入最后 1 个桶。 ++ 数据中的空值、缺失值和`NaN`将会被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 
2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +## Integral + +### 函数简介 + +本函数用于计算时间序列的数值积分,即以时间为横坐标、数值为纵坐标绘制的折线图中折线以下的面积。 + +**函数名:** INTEGRAL + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `unit`:积分求解所用的时间轴单位,取值为 "1S", "1s", "1m", "1H", "1d"(区分大小写),分别表示以毫秒、秒、分钟、小时、天为单位计算积分。 + 缺省情况下取 "1s",以秒为单位。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为积分结果的数据点。 + +**提示:** + ++ 积分值等于折线图中每相邻两个数据点和时间轴形成的直角梯形的面积之和,不同时间单位下相当于横轴进行不同倍数放缩,得到的积分值可直接按放缩倍数转换。 + ++ 数据中`NaN`将会被忽略。折线将以临近两个有值数据点为准。 + +### 使用示例 + +#### 参数缺省 + +缺省情况下积分以1s为时间单位。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +其计算公式为: +$$\frac{1}{2}[(1+2)\times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + + +#### 指定时间单位 + +指定以分钟为时间单位。 + + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +其计算公式为: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+3) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +## IntegralAvg + +### 函数简介 + +本函数用于计算时间序列的函数均值,即在相同时间单位下的数值积分除以序列总的时间跨度。更多关于数值积分计算的信息请参考`Integral`函数。 + +**函数名:** INTEGRALAVG + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为时间加权平均结果的数据点。 + +**提示:** + ++ 时间加权的平均值等于在任意时间单位`unit`下计算的数值积分(即折线图中每相邻两个数据点和时间轴形成的直角梯形的面积之和), + 除以相同时间单位下输入序列的时间跨度,其值与具体采用的时间单位无关,默认与 IoTDB 时间单位一致。 + ++ 数据中的`NaN`将会被忽略。折线将以临近两个有值数据点为准。 + ++ 输入序列为空时,函数输出结果为 0;仅有一个数据点时,输出结果为该点数值。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + 
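+`INTEGRALAVG` reuses the trapezoidal sum already described for `INTEGRAL`: the integral is computed first and then divided by the time span of the series. As a sanity check, the stand-alone sketch below recomputes the 57.5 integral from the `INTEGRAL` example above, skipping the `NaN` point in the same way. The class and method names are ours and this is only an illustration under those assumptions, not IoTDB's internal code.
+
+```java
+// Stand-alone sketch of the trapezoidal sum behind INTEGRAL / INTEGRALAVG; not IoTDB's implementation.
+public class IntegralSketch {
+
+    // Trapezoidal integral over (timestamp, value) pairs; NaN rows are skipped,
+    // so the polyline connects the neighbouring valid points. Time unit here: seconds.
+    static double integral(long[] timesSec, double[] values) {
+        double area = 0.0;
+        long prevT = 0;
+        double prevV = Double.NaN;
+        boolean hasPrev = false;
+        for (int i = 0; i < values.length; i++) {
+            if (Double.isNaN(values[i])) {
+                continue; // ignore NaN points, as described in the INTEGRAL notes above
+            }
+            if (hasPrev) {
+                area += (prevV + values[i]) / 2.0 * (timesSec[i] - prevT);
+            }
+            prevT = timesSec[i];
+            prevV = values[i];
+            hasPrev = true;
+        }
+        return area;
+    }
+
+    public static void main(String[] args) {
+        // The example series from the INTEGRAL section (seconds, values), with a NaN at t = 9.
+        long[] t = {1, 2, 3, 4, 5, 8, 9, 10};
+        double[] v = {1, 2, 5, 6, 7, 8, Double.NaN, 10};
+        System.out.println(integral(t, v)); // prints 57.5, matching the documented result
+    }
+}
+```
+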
+输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +其计算公式为: +$$\frac{1}{2}[(1+2)\times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +## Mad + +### 函数简介 + +本函数用于计算单列数值型数据的精确或近似绝对中位差,绝对中位差为所有数值与其中位数绝对偏移量的中位数。 + +如有数据集$\{1,3,3,5,5,6,7,8,9\}$,其中位数为5,所有数值与中位数的偏移量的绝对值为$\{0,0,1,2,2,2,3,4,4\}$,其中位数为2,故而原数据集的绝对中位差为2。 + +**函数名:** MAD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `error`:近似绝对中位差的基于数值的误差百分比,取值范围为 [0,1),默认值为 0。如当`error`=0.01 时,记精确绝对中位差为a,近似绝对中位差为b,不等式 $0.99a \le b \le 1.01a$ 成立。当`error`=0 时,计算结果为精确绝对中位差。 + + +**输出序列:** 输出单个序列,类型为DOUBLE,序列仅包含一个时间戳为 0、值为绝对中位差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +### 使用示例 + +#### 精确查询 + +当`error`参数缺省或为0时,本函数计算精确绝对中位差。 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ 
+Total line number = 10000 +``` + +用于查询的 SQL 语句: + +```sql +select mad(s0) from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------+ +| Time| mad(root.test.s0)| ++-----------------------------+------------------+ +|1970-01-01T08:00:00.000+08:00|0.6806197166442871| ++-----------------------------+------------------+ +``` + +#### 近似查询 + +当`error`参数取值不为 0 时,本函数计算近似绝对中位差。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select mad(s0, "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s0, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.6806616245859518| ++-----------------------------+---------------------------------+ +``` + +## Median + +### 函数简介 + +本函数用于计算单列数值型数据的精确或近似中位数。中位数是顺序排列的一组数据中居于中间位置的数;当序列有偶数个时,中位数为中间二者的平均数。 + +**函数名:** MEDIAN + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `error`:近似中位数的基于排名的误差百分比,取值范围 [0,1),默认值为 0。如当`error`=0.01 时,计算出的中位数的真实排名百分比在 0.49~0.51 之间。当`error`=0 时,计算结果为精确中位数。 + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为中位数的数据点。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +用于查询的 SQL 语句: + +```sql +select median(s0, "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|median(root.test.s0, "error"="0.01")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.021884560585022| ++-----------------------------+------------------------------------+ +``` + +## MinMax + +### 函数简介 + +本函数将输入序列使用 min-max 方法进行标准化。最小值归一至 0,最大值归一至 1. 
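+
+To make the transform concrete, here is a minimal stand-alone sketch that rescales an in-memory array exactly as described: the minimum maps to 0, the maximum maps to 1, and every other value is mapped linearly in between. The class and method names are ours and this is not IoTDB's internal implementation; it corresponds to the default "batch" mode, whereas the "stream" mode described below takes `min` and `max` from the caller instead of computing them from the data.
+
+```java
+import java.util.Arrays;
+
+// Minimal sketch of min-max rescaling, assuming the whole series fits in memory ("batch" mode).
+public class MinMaxSketch {
+
+    // Maps the minimum to 0, the maximum to 1, and everything else linearly in between.
+    static double[] minMax(double[] values) {
+        double min = Arrays.stream(values).min().orElse(Double.NaN);
+        double max = Arrays.stream(values).max().orElse(Double.NaN);
+        double range = max - min; // assumed non-zero here; a constant series would need special handling
+        double[] out = new double[values.length];
+        for (int i = 0; i < values.length; i++) {
+            out[i] = (values[i] - min) / range;
+        }
+        return out;
+    }
+
+    public static void main(String[] args) {
+        // A few points of the example series below: min = -2.0 and max = 10.0,
+        // so 0.0 maps to 2/12 = 0.1667 and 1.0 maps to 0.25, matching the documented output.
+        double[] s1 = {0.0, 0.0, 1.0, -1.0, -2.0, 2.0, 10.0};
+        System.out.println(Arrays.toString(minMax(s1)));
+    }
+}
+```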
+ +**函数名:** MINMAX + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `compute`:若设置为"batch",则将数据全部读入后转换;若设置为 "stream",则需用户提供最大值及最小值进行流式计算转换。默认为 "batch"。 ++ `min`:使用流式计算时的最小值。 ++ `max`:使用流式计算时的最大值。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +### 使用示例 + +#### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select minmax(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|minmax(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.200+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.300+08:00| 0.25| +|1970-01-01T08:00:00.400+08:00| 0.08333333333333333| +|1970-01-01T08:00:00.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.700+08:00| 0.0| +|1970-01-01T08:00:00.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.900+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.100+08:00| 0.25| +|1970-01-01T08:00:01.200+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.300+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.400+08:00| 0.25| +|1970-01-01T08:00:01.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.700+08:00| 1.0| +|1970-01-01T08:00:01.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| 0.16666666666666666| ++-----------------------------+--------------------+ +``` + +## Mode + +### 函数简介 + +本函数用于计算时间序列的众数,即出现次数最多的元素。 + +**函数名:** MODE + +**输入序列:** 仅支持单个输入序列,类型可以是任意的。 + +**输出序列:** 输出单个序列,类型与输入相同,序列仅包含一个时间戳为众数第一次出现的时间戳、值为众数的数据点。 + +**提示:** + ++ 如果有多个出现次数最多的元素,将会输出任意一个。 ++ 数据中的空值和缺失值将会被忽略,但`NaN`不会被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| Hello| +|1970-01-01T08:00:00.004+08:00| World| +|1970-01-01T08:00:00.005+08:00| World| +|1970-01-01T08:00:01.600+08:00| World| +|1970-01-15T09:37:34.451+08:00| Hello| +|1970-01-15T09:37:34.452+08:00| hello| +|1970-01-15T09:37:34.453+08:00| Hello| +|1970-01-15T09:37:34.454+08:00| World| +|1970-01-15T09:37:34.455+08:00| World| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select mode(s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+---------------------+ +| Time|mode(root.test.d2.s2)| 
++-----------------------------+---------------------+ +|1970-01-01T08:00:00.004+08:00| World| ++-----------------------------+---------------------+ +``` + +## MvAvg + +### 函数简介 + +本函数计算序列的移动平均。 + +**函数名:** MVAVG + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:移动窗口的长度。默认值为 10. + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +### 使用示例 + +#### 指定窗口长度 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +输出序列: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +## PACF + +### 函数简介 + +本函数通过求解 Yule-Walker 方程,计算序列的偏自相关系数。对于特殊的输入序列,方程可能没有解,此时输出`NaN`。 + +**函数名:** PACF + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `lag`:最大滞后阶数。默认值为$\min(10\log_{10}n,n-1)$,$n$表示数据点个数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +### 使用示例 + +#### 指定滞后阶数 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2019-12-27T00:00:00.000+08:00| 5.0| +|2019-12-27T00:05:00.000+08:00| 5.0| +|2019-12-27T00:10:00.000+08:00| 5.0| +|2019-12-27T00:15:00.000+08:00| 5.0| +|2019-12-27T00:20:00.000+08:00| 6.0| +|2019-12-27T00:25:00.000+08:00| 5.0| +|2019-12-27T00:30:00.000+08:00| 6.0| +|2019-12-27T00:35:00.000+08:00| 6.0| +|2019-12-27T00:40:00.000+08:00| 6.0| +|2019-12-27T00:45:00.000+08:00| 6.0| +|2019-12-27T00:50:00.000+08:00| 6.0| +|2019-12-27T00:55:00.000+08:00| 5.982609| +|2019-12-27T01:00:00.000+08:00| 5.9652176| +|2019-12-27T01:05:00.000+08:00| 5.947826| +|2019-12-27T01:10:00.000+08:00| 5.9304347| +|2019-12-27T01:15:00.000+08:00| 5.9130435| 
+|2019-12-27T01:20:00.000+08:00| 5.8956523| +|2019-12-27T01:25:00.000+08:00| 5.878261| +|2019-12-27T01:30:00.000+08:00| 5.8608694| +|2019-12-27T01:35:00.000+08:00| 5.843478| +............ +Total line number = 18066 +``` + +用于查询的 SQL 语句: + +```sql +select pacf(s1, "lag"="5") from root.test +``` + +输出序列: + +``` ++-----------------------------+-----------------------------+ +| Time|pacf(root.test.s1, "lag"="5")| ++-----------------------------+-----------------------------+ +|2019-12-27T00:00:00.000+08:00| 1.0| +|2019-12-27T00:05:00.000+08:00| 0.3528915091942786| +|2019-12-27T00:10:00.000+08:00| 0.1761346122516304| +|2019-12-27T00:15:00.000+08:00| 0.1492391973294682| +|2019-12-27T00:20:00.000+08:00| 0.03560059645868398| +|2019-12-27T00:25:00.000+08:00| 0.0366222998995286| ++-----------------------------+-----------------------------+ +``` + +## Percentile + +### 函数简介 + +本函数用于计算单列数值型数据的精确或近似分位数。 + +**函数名:** PERCENTILE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `rank`:所求分位数在所有数据中的排名百分比,取值范围为 (0,1],默认值为 0.5。如当设为 0.5时则计算中位数。 ++ `error`:近似分位数的基于排名的误差百分比,取值范围为 [0,1),默认值为0。如`rank`=0.5 且`error`=0.01,则计算出的分位数的真实排名百分比在 0.49~0.51之间。当`error`=0 时,计算结果为精确分位数。 + +**输出序列:** 输出单个序列,类型与输入序列相同。当`error`=0时,序列仅包含一个时间戳为分位数第一次出现的时间戳、值为分位数的数据点;否则,输出值的时间戳为0。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ 
+Total line number = 10000 +``` + +用于查询的 SQL 语句: + +```sql +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|percentile(root.test.s0, "rank"="0.2", "error"="0.01")| ++-----------------------------+------------------------------------------------------+ +|2021-03-17T10:35:02.054+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +``` + +## Quantile + +### 函数简介 + +本函数用于计算单列数值型数据的近似分位数。本函数基于KLL sketch算法实现。 + +**函数名:** QUANTILE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `rank`:所求分位数在所有数据中的排名比,取值范围为 (0,1],默认值为 0.5。如当设为 0.5时则计算近似中位数。 ++ `K`:允许维护的KLL sketch大小,最小值为100,默认值为800。如`rank`=0.5 且`K`=800,则计算出的分位数的真实排名比有至少99%的可能性在 0.49~0.51之间。 + +**输出序列:** 输出单个序列,类型与输入序列相同。输出值的时间戳为0。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +### 使用示例 + + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ 
+Total line number = 10000 +``` + +用于查询的 SQL 语句: + +```sql +select quantile(s0, "rank"="0.2", "K"="800") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|quantile(root.test.s0, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +``` + +## Period + +### 函数简介 + +本函数用于计算单列数值型数据的周期。 + +**函数名:** PERIOD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为 INT32,序列仅包含一个时间戳为 0、值为周期的数据点。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select period(s1) from root.test.d3 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +## QLB + +### 函数简介 + +本函数对输入序列计算$Q_{LB} $统计量,并计算对应的p值。p值越小表明序列越有可能为非平稳序列。 + +**函数名:** QLB + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `lag`:计算时用到的最大延迟阶数,取值应为 1 至 n-2 之间的整数,n 为序列采样总数。默认取 n-2。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。该序列是$Q_{LB} $统计量对应的 p 值,时间标签代表偏移阶数。 + +**提示:** $Q_{LB} $统计量由自相关系数求得,如需得到统计量而非 p 值,可以使用 ACF 函数。 + +### 使用示例 + +#### 使用默认参数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select QLB(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| 
+|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +## Resample + +### 函数简介 + +本函数对输入序列按照指定的频率进行重采样,包括上采样和下采样。目前,本函数支持的上采样方法包括`NaN`填充法 (NaN)、前值填充法 (FFill)、后值填充法 (BFill) 以及线性插值法 (Linear);本函数支持的下采样方法为分组聚合,聚合方法包括最大值 (Max)、最小值 (Min)、首值 (First)、末值 (Last)、平均值 (Mean)和中位数 (Median)。 + +**函数名:** RESAMPLE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `every`:重采样频率,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。该参数不允许缺省。 ++ `interp`:上采样的插值方法,取值为 'NaN'、'FFill'、'BFill' 或 'Linear'。在缺省情况下,使用`NaN`填充法。 ++ `aggr`:下采样的聚合方法,取值为 'Max'、'Min'、'First'、'Last'、'Mean' 或 'Median'。在缺省情况下,使用平均数聚合。 ++ `start`:重采样的起始时间(包含),是一个格式为 'yyyy-MM-dd HH:mm:ss' 的时间字符串。在缺省情况下,使用第一个有效数据点的时间戳。 ++ `end`:重采样的结束时间(不包含),是一个格式为 'yyyy-MM-dd HH:mm:ss' 的时间字符串。在缺省情况下,使用最后一个有效数据点的时间戳。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。该序列按照重采样频率严格等间隔分布。 + +**提示:** 数据中的`NaN`将会被忽略。 + +### 使用示例 + +#### 上采样 + +当重采样频率高于数据原始频率时,将会进行上采样。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +#### 下采样 + +当重采样频率低于数据原始频率时,将会进行下采样。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| 
+|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + +##### 指定重采样时间段 + +可以使用`start`和`end`两个参数指定重采样的时间段,超出实际时间范围的部分会被插值填补。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +## Sample + +### 函数简介 + +本函数对输入序列进行采样,即从输入序列中选取指定数量的数据点并输出。目前,本函数支持三种采样方法:**蓄水池采样法 (reservoir sampling)** 对数据进行随机采样,所有数据点被采样的概率相同;**等距采样法 (isometric sampling)** 按照相等的索引间隔对数据进行采样,**最大三角采样法 (triangle sampling)** 对所有数据会按采样率分桶,每个桶内会计算数据点间三角形面积,并保留面积最大的点,该算法通常用于数据的可视化展示中,采用过程可以保证一些关键的突变点在采用中得到保留,更多抽样算法细节可以阅读论文 [here](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf)。 + +**函数名:** SAMPLE + +**输入序列:** 仅支持单个输入序列,类型可以是任意的。 + +**参数:** + ++ `method`:采样方法,取值为 'reservoir','isometric' 或 'triangle' 。在缺省情况下,采用蓄水池采样法。 ++ `k`:采样数,它是一个正整数,在缺省情况下为 1。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列的长度为采样数,序列中的每一个数据点都来自于输入序列。 + +**提示:** 如果采样数大于序列长度,那么输入序列中所有的数据点都会被输出。 + +### 使用示例 + + +#### 蓄水池采样 + +当`method`参数为 'reservoir' 或缺省时,采用蓄水池采样法对输入序列进行采样。由于该采样方法具有随机性,下面展示的输出序列只是一种可能的结果。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + + +用于查询的 SQL 语句: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + + +#### 等距采样 + +当`method`参数为 'isometric' 时,采用等距采样法对输入序列进行采样。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 
7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +## Segment + +### 函数简介 + +本函数按照数据的线性变化趋势将数据划分为多个子序列,返回分段直线拟合后的子序列首值或所有拟合值。 + +**函数名:** SEGMENT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `output`:"all" 输出所有拟合值;"first" 输出子序列起点拟合值。默认为 "first"。 + ++ `error`:判定存在线性趋势的误差允许阈值。误差的定义为子序列进行线性拟合的误差的绝对值的均值。默认为 0.1. + +**输出序列:** 输出单个序列,类型为 DOUBLE。 + +**提示:** 函数默认所有数据等时间间隔分布。函数读取所有数据,若原始数据过多,请先进行降采样处理。拟合采用自底向上方法,子序列的尾值可能会被认作子序列首值输出。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select segment(s1,"error"="0.1") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +## Skew + +### 函数简介 + +本函数用于计算单列数值型数据的总体偏度 + +**函数名:** SKEW + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为 DOUBLE,序列仅包含一个时间戳为 0、值为总体偏度的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| 
+|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select skew(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +## Spline + +### 函数简介 + +本函数提供对原始序列进行三次样条曲线拟合后的插值重采样。 + +**函数名:** SPLINE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `points`:重采样个数。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +**提示**:输出序列保留输入序列的首尾值,等时间间隔采样。仅当输入点个数不少于 4 个时才计算插值。 + +### 使用示例 + +#### 指定插值个数 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select spline(s1, "points"="151") from root.test +``` + +输出序列: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 
1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| +|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| 
+|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +## Spread + +### 函数简介 + +本函数用于计算时间序列的极差,即最大值减去最小值的结果。 + +**函数名:** SPREAD + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型与输入相同,序列仅包含一个时间戳为 0 、值为极差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 
+|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + +## Stddev + +### 函数简介 + +本函数用于计算单列数值型数据的总体标准差。 + +**函数名:** STDDEV + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。序列仅包含一个时间戳为 0、值为总体标准差的数据点。 + +**提示:** 数据中的空值、缺失值和`NaN`将会被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select stddev(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------+ +| Time|stddev(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 5.7662812973353965| ++-----------------------------+-----------------------+ +``` + +## ZScore + +### 函数简介 + +本函数将输入序列使用z-score方法进行归一化。 + +**函数名:** ZSCORE + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `compute`:若设置为 "batch",则将数据全部读入后转换;若设置为 "stream",则需用户提供均值及方差进行流式计算转换。默认为 "batch"。 ++ `avg`:使用流式计算时的均值。 ++ `sd`:使用流式计算时的标准差。 + +**输出序列**:输出单个序列,类型为 DOUBLE。 + +### 使用示例 + +#### 全数据计算 + +输入序列: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ 
+``` + +用于查询的 SQL 语句: + +```sql +select zscore(s1) from root.test +``` + +输出序列: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Quality.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Quality.md new file mode 100644 index 00000000..66c22108 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Quality.md @@ -0,0 +1,579 @@ + + +# 数据质量 + +## Completeness + +### 函数简介 + +本函数用于计算时间序列的完整性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的完整性,并输出窗口第一个数据点的时间戳和窗口的完整性。 + +**函数名:** COMPLETENESS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 ++ `downtime`:完整性计算是否考虑停机异常。它的取值为 'true' 或 'false',默认值为 'true'. 在考虑停机异常时,长时间的数据缺失将被视作停机,不对完整性产生影响。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. 
+ +**提示:** 只有当窗口内的数据点数目超过10时,才会进行完整性计算。否则,该窗口将被忽略,不做任何输出。 + + +### 使用示例 + +#### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算完整性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +#### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算完整性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +## Consistency + +### 函数简介 + +本函数用于计算时间序列的一致性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的一致性,并输出窗口第一个数据点的时间戳和窗口的时效性。 + +**函数名:** CONSISTENCY + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ 
`window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行一致性计算。否则,该窗口将被忽略,不做任何输出。 + + +### 使用示例 + +#### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算一致性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +#### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算一致性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +## Timeliness + +### 函数简介 + 
+本函数用于计算时间序列的时效性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的时效性,并输出窗口第一个数据点的时间戳和窗口的时效性。 + +**函数名:** TIMELINESS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行时效性计算。否则,该窗口将被忽略,不做任何输出。 + + +### 使用示例 + +#### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算时效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +#### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算时效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| ++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| 
+|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +## Validity + +### 函数简介 + +本函数用于计算时间序列的有效性。将输入序列划分为若干个连续且不重叠的窗口,分别计算每一个窗口的有效性,并输出窗口第一个数据点的时间戳和窗口的有效性。 + + +**函数名:** VALIDITY + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `window`:窗口大小,它是一个大于0的整数或者一个有单位的正数。前者代表每一个窗口包含的数据点数目,最后一个窗口的数据点数目可能会不足;后者代表窗口的时间跨度,目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。缺省情况下,全部输入数据都属于同一个窗口。 + +**输出序列:** 输出单个序列,类型为DOUBLE,其中每一个数据点的值的范围都是 [0,1]. + +**提示:** 只有当窗口内的数据点数目超过10时,才会进行有效性计算。否则,该窗口将被忽略,不做任何输出。 + + +### 使用示例 + +#### 参数缺省 + +在参数缺省的情况下,本函数将会把全部输入数据都作为同一个窗口计算有效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +输出序列: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +#### 指定窗口大小 + +在指定窗口大小的情况下,本函数会把输入数据划分为若干个窗口计算有效性。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|validity(root.test.d1.s1, "window"="15")| 
++-----------------------------+----------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +## Accuracy + +### 函数简介 + +本函数基于主数据计算原始时间序列的准确性。 + +**函数名**:Accuracy + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `omega`:算法窗口大小,非负整数(单位为毫秒), 在缺省情况下,算法根据不同时间差下的两个元组距离自动估计该参数。 +- `eta`:算法距离阈值,正数, 在缺省情况下,算法根据窗口中元组的距离分布自动估计该参数。 +- `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 + +**输出序列**:输出单个值,类型为DOUBLE,值的范围为[0,1]。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select Accuracy(t1,t2,t3,m1,m2,m3) from root.test +``` + +输出序列: + + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|Accuracy(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+---------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 0.875| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Repairing.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Repairing.md new file mode 100644 index 00000000..54c3786a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Data-Repairing.md @@ -0,0 +1,510 @@ + + +# 数据修复 + +## TimestampRepair + +## 函数简介 + +本函数用于时间戳修复。根据给定的标准时间间隔,采用最小化修复代价的方法,通过对数据时间戳的微调,将原本时间戳间隔不稳定的数据修复为严格等间隔的数据。在未给定标准时间间隔的情况下,本函数将使用时间间隔的中位数 (median)、众数 (mode) 或聚类中心 (cluster) 来推算标准时间间隔。 + + +**函数名:** TIMESTAMPREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `interval`: 标准时间间隔(单位是毫秒),是一个正整数。在缺省情况下,将根据指定的方法推算。 ++ `method`:推算标准时间间隔的方法,取值为 'median', 'mode' 或 'cluster',仅在`interval`缺省时有效。在缺省情况下,将使用中位数方法进行推算。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +## 使用示例 + +### 指定标准时间间隔 + +在给定`interval`参数的情况下,本函数将按照指定的标准时间间隔进行修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:19.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| 
+|2021-07-01T12:01:01.000+08:00| 7.0| +|2021-07-01T12:01:11.000+08:00| 8.0| +|2021-07-01T12:01:21.000+08:00| 9.0| +|2021-07-01T12:01:31.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select timestamprepair(s1,'interval'='10000') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------+ +| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| ++-----------------------------+----------------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+----------------------------------------------------+ +``` + +### 自动推算标准时间间隔 + +如果`interval`参数没有给定,本函数将按照推算的标准时间间隔进行修复。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select timestamprepair(s1) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------+ +| Time|timestamprepair(root.test.d2.s1)| ++-----------------------------+--------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +## ValueFill + +### 函数简介 + +**函数名:** ValueFill + +**输入序列:** 单列时序数据,类型为INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, 默认为 "linear"。其中,“mean” 指使用均值填补的方法; “previous" 指使用前值填补方法;“linear" 指使用线性插值填补方法;“likelihood” 为基于速度的正态分布的极大似然估计方法;“AR” 指自回归的填补方法;“MA” 指滑动平均的填补方法;"SCREEN" 指约束填补方法;缺省情况下使用 “linear”。 + +**输出序列:** 填补后的单维序列。 + +**备注:** AR 模型采用 AR(1),时序列需满足自相关条件,否则将输出单个数据点 (0, 0.0). 
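+
+为便于理解缺省的 linear(线性插值)填补方式,下面给出一个极简的 Python 示意脚本:它按数据点的先后顺序(而非时间戳间隔)在相邻有效值之间做线性插值。该脚本并非 IoTDB 的官方实现,也未涉及 previous、AR、MA、SCREEN 等其他方法,仅用于说明概念:
+
+```python
+import math
+
+def linear_fill(values):
+    """对序列中的 NaN 按前后最近的有效值做线性插值;两端无法插值的 NaN 保持原样。"""
+    filled = list(values)
+    n = len(values)
+    for i, v in enumerate(values):
+        if not math.isnan(v):
+            continue
+        # 分别向前、向后寻找最近的非 NaN 数据点
+        left = next((j for j in range(i - 1, -1, -1) if not math.isnan(values[j])), None)
+        right = next((j for j in range(i + 1, n) if not math.isnan(values[j])), None)
+        if left is not None and right is not None:
+            ratio = (i - left) / (right - left)
+            filled[i] = values[left] + ratio * (values[right] - values[left])
+    return filled
+
+# 以下文示例中 116.0 与 124.0 之间的两个缺失点为例:
+print(linear_fill([116.0, float("nan"), float("nan"), 124.0]))
+# 输出约为 [116.0, 118.67, 121.33, 124.0],与示例输出中的 118.7、121.3 相符
+```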
+ +### 使用示例 +#### 使用 linear 方法进行填补 + +当`method`缺省或取值为 'linear' 时,本函数将使用线性插值方法进行填补。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select valuefill(s1) from root.test.d2 +``` + +输出序列: + + + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +#### 使用 previous 方法进行填补 + +当`method`取值为 'previous' 时,本函数将使前值填补方法进行数值填补。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +## ValueRepair + +### 函数简介 + +本函数用于对时间序列的数值进行修复。目前,本函数支持两种修复方法:**Screen** 是一种基于速度阈值的方法,在最小改动的前提下使得所有的速度符合阈值要求;**LsGreedy** 是一种基于速度变化似然的方法,将速度变化建模为高斯分布,并采用贪心算法极大化似然函数。 + +**函数名:** VALUEREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:修复时采用的方法,取值为 'Screen' 或 'LsGreedy'. 
在缺省情况下,使用 Screen 方法进行修复。 ++ `minSpeed`:该参数仅在使用 Screen 方法时有效。当速度小于该值时会被视作数值异常点加以修复。在缺省情况下为中位数减去三倍绝对中位差。 ++ `maxSpeed`:该参数仅在使用 Screen 方法时有效。当速度大于该值时会被视作数值异常点加以修复。在缺省情况下为中位数加上三倍绝对中位差。 ++ `center`:该参数仅在使用 LsGreedy 方法时有效。对速度变化分布建立的高斯模型的中心。在缺省情况下为 0。 ++ `sigma` :该参数仅在使用 LsGreedy 方法时有效。对速度变化分布建立的高斯模型的标准差。在缺省情况下为绝对中位差。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +**提示:** 输入序列中的`NaN`在修复之前会先进行线性插值填补。 + +### 使用示例 + +#### 使用 Screen 方法进行修复 + +当`method`缺省或取值为 'Screen' 时,本函数将使用 Screen 方法进行数值修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +#### 使用 LsGreedy 方法进行修复 + +当`method`取值为 'LsGreedy' 时,本函数将使用 LsGreedy 方法进行数值修复。 + +输入序列同上,用于查询的 SQL 语句如下: + +```sql +select valuerepair(s1,'method'='LsGreedy') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")| ++-----------------------------+-------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------------+ +``` + +## MasterRepair + +### 函数简介 + +本函数实现基于主数据的时间序列数据修复。 + +**函数名:**MasterRepair + +**输入序列:** 支持多个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `omega`:算法窗口大小,非负整数(单位为毫秒), 在缺省情况下,算法根据不同时间差下的两个元组距离自动估计该参数。 +- `eta`:算法距离阈值,正数, 
在缺省情况下,算法根据窗口中元组的距离分布自动估计该参数。 +- `k`:主数据中的近邻数量,正整数, 在缺省情况下,算法根据主数据中的k个近邻的元组距离自动估计该参数。 +- `output_column`:输出列的序号,默认输出第一列的修复结果。 + +**输出序列:**输出单个序列,类型与输入数据中对应列的类型相同,序列为输入列修复后的结果。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +用于查询的 SQL 语句: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +输出序列: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +## SeasonalRepair + +### 函数简介 +本函数用于对周期性时间序列的数值进行基于分解的修复。目前,本函数支持两种方法:**Classical**使用经典分解方法得到的残差项检测数值的异常波动,并使用滑动平均修复序列;**Improved**使用改进的分解方法得到的残差项检测数值的异常波动,并使用滑动中值修复序列。 + +**函数名:** SEASONALREPAIR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + ++ `method`:修复时采用的分解方法,取值为'Classical'或'Improved'。在缺省情况下,使用经典分解方法进行修复。 ++ `period`:序列的周期。 ++ `k`:残差项的范围阈值,用来限制残差项偏离中心的程度。在缺省情况下为9。 ++ `max_iter`:算法的最大迭代次数。在缺省情况下为10。 + +**输出序列:** 输出单个序列,类型与输入序列相同。该序列是修复后的输入序列。 + +**提示:** 输入序列中的`NaN`在修复之前会先进行线性插值填补。 + +### 使用示例 +#### 使用经典分解方法进行修复 +当`method`缺省或取值为'Classical'时,本函数将使用经典分解方法进行数值修复。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +输出序列: + +``` 
++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +#### 使用改进的分解方法进行修复 +当`method`取值为'Improved'时,本函数将使用改进的分解方法进行数值修复。 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Frequency-Domain.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Frequency-Domain.md new file mode 100644 index 00000000..b9b5ecfd --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Frequency-Domain.md @@ -0,0 +1,667 @@ + + +# 频域分析 + +## Conv + +### 函数简介 + +本函数对两个输入序列进行卷积,即多项式乘法。 + + +**函数名:** CONV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**输出序列:** 输出单个序列,类型为DOUBLE,它是两个序列卷积的结果。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +## Deconv + +### 函数简介 + +本函数对两个输入序列进行去卷积,即多项式除法运算。 + +**函数名:** DECONV + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `result`:去卷积的结果,取值为'quotient'或'remainder',分别对应于去卷积的商和余数。在缺省情况下,输出去卷积的商。 + +**输出序列:** 
输出单个序列,类型为DOUBLE。它是将第二个序列从第一个序列中去卷积(第一个序列除以第二个序列)的结果。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +### 使用示例 + +#### 计算去卷积的商 + +当`result`参数缺省或为'quotient'时,本函数计算去卷积的商。 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s3|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| null| +|1970-01-01T08:00:00.003+08:00| 2.0| null| ++-----------------------------+---------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select deconv(s3,s2) from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2)| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +#### 计算去卷积的余数 + +当`result`参数为'remainder'时,本函数计算去卷积的余数。输入序列同上,用于查询的SQL语句如下: + +```sql +select deconv(s3,s2,'result'='remainder') from root.test.d2 +``` + +输出序列: + +``` ++-----------------------------+--------------------------------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| ++-----------------------------+--------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 0.0| ++-----------------------------+--------------------------------------------------------------+ +``` + +## DWT + +### 函数简介 + +本函数对输入序列进行一维离散小波变换。 + +**函数名:** DWT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:小波滤波的类型,提供'Haar', 'DB4', 'DB6', 'DB8',其中DB指代Daubechies。若不设置该参数,则用户需提供小波滤波的系数。不区分大小写。 ++ `coef`:小波滤波的系数。若提供该参数,请使用英文逗号','分割各项,不添加空格或其它符号。 ++ `layer`:进行变换的次数,最终输出的向量个数等同于$layer+1$.默认取1。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度与输入相等。 + +**提示:** 输入序列长度必须为2的整数次幂。 + +### 使用示例 + +#### Haar变换 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| 
+|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| ++-----------------------------+-------------------------------------+ +``` + +## FFT + +### 函数简介 + +本函数对输入序列进行快速傅里叶变换。 + +**函数名:** FFT + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `method`:傅里叶变换的类型,取值为'uniform'或'nonuniform',缺省情况下为'uniform'。当取值为'uniform'时,时间戳将被忽略,所有数据点都将被视作等距的,并应用等距快速傅里叶算法;当取值为'nonuniform'时,将根据时间戳应用非等距快速傅里叶算法(未实现)。 ++ `result`:傅里叶变换的结果,取值为'real'、'imag'、'abs'或'angle',分别对应于变换结果的实部、虚部、模和幅角。在缺省情况下,输出变换的模。 ++ `compress`:压缩参数,取值范围(0,1],是有损压缩时保留的能量比例。在缺省情况下,不进行压缩。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度与输入相等。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +### 使用示例 + +#### 等距傅里叶变换 + +当`type`参数缺省或为'uniform'时,本函数进行等距傅里叶变换。 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select fft(s1) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 19.999999960195904| +|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| +|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| 
+|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| ++-----------------------------+----------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此在输出序列中$k=4$和$k=5$处有尖峰。 + +#### 等距傅里叶变换并压缩 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| fft(root.test.d1.s1,| fft(root.test.d1.s1,| +| | "result"="real",| "result"="imag",| +| | "compress"="0.99")| "compress"="0.99")| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + +注:基于傅里叶变换结果的共轭性质,压缩结果只保留前一半;根据给定的压缩参数,从低频到高频保留数据点,直到保留的能量比例超过该值;保留最后一个数据点以表示序列长度。 + +## HighPass + +### 函数简介 + +本函数对输入序列进行高通滤波,提取高于截止频率的分量。输入序列的时间戳将被忽略,所有数据点都将被视作等距的。 + +**函数名:** HIGHPASS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `wpass`:归一化后的截止频率,取值为(0,1),不可缺省。 + +**输出序列:** 输出单个序列,类型为DOUBLE,它是滤波后的序列,长度与时间戳均与输入一致。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +### 使用示例 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| 
-3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此高通滤波之后的输出序列服从$y=sin(2\pi t/4)$。 + +## IFFT + +### 函数简介 + +本函数将输入的两个序列作为实部和虚部视作一个复数,进行逆快速傅里叶变换,并输出结果的实部。输入数据的格式参见`FFT`函数的输出,并支持以`FFT`函数压缩后的输出作为本函数的输入。 + +**函数名:** IFFT + +**输入序列:** 仅支持两个输入序列,类型均为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `start`:输出序列的起始时刻,是一个格式为'yyyy-MM-dd HH:mm:ss'的时间字符串。在缺省情况下,为'1970-01-01 08:00:00'。 ++ `interval`:输出序列的时间间隔,是一个有单位的正数。目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,为1s。 + + +**输出序列:** 输出单个序列,类型为DOUBLE。该序列是一个等距时间序列,它的值是将两个输入序列依次作为实部和虚部进行逆快速傅里叶变换的结果。 + +**提示:** 如果某行数据中包含空值或`NaN`,该行数据将会被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +用于查询的SQL语句: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| 
++-----------------------------+-------------------------------------------------------+ +``` + +## LowPass + +### 函数简介 + +本函数对输入序列进行低通滤波,提取低于截止频率的分量。输入序列的时间戳将被忽略,所有数据点都将被视作等距的。 + +**函数名:** LOWPASS + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `wpass`:归一化后的截止频率,取值为(0,1),不可缺省。 + +**输出序列:** 输出单个序列,类型为DOUBLE,它是滤波后的序列,长度与时间戳均与输入一致。 + +**提示:** 输入序列中的`NaN`将被忽略。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + + +用于查询的SQL语句: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` + +注:输入序列服从$y=sin(2\pi t/4)+2sin(2\pi t/5)$,长度为20,因此低通滤波之后的输出序列服从$y=2sin(2\pi t/5)$。 + + + +## Envelope + +### 函数简介 + +本函数通过输入一维浮点数数组和用户指定的调制频率,实现对信号的解调和包络提取。解调的目标是从复杂的信号中提取感兴趣的部分,使其更易理解。比如通过解调可以找到信号的包络,即振幅的变化趋势。 + +**函数名:** Envelope + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE + +**参数:** + ++ `frequency`:频率(选填,正数。不填此参数,系统会基于序列对应时间的时间间隔来推断频率)。 ++ `amplification`: 扩增倍数(选填,正整数。输出Time列的结果为正整数的集合,不会输出小数。当频率小1时,可通过此参数对频率进行扩增以展示正常的结果)。 + +**输出序列:** ++ `Time`: 该列返回的值的含义是频率而并非时间,如果输出的格式为时间格式(如:1970-01-01T08:00:19.000+08:00),请将其转为时间戳值。 + ++ `Envelope(Path, 'frequency'='{frequency}')`:输出单个序列,类型为DOUBLE,它是包络分析之后的结果。 + +**提示:** 
当解调的原始序列的值不连续时,本函数会视为连续处理,建议被分析的时间序列是一段值完整的时间序列。同时建议指定开始时间与结束时间。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:01.000+08:00| 1.0 | +|1970-01-01T08:00:02.000+08:00| 2.0 | +|1970-01-01T08:00:03.000+08:00| 3.0 | +|1970-01-01T08:00:04.000+08:00| 4.0 | +|1970-01-01T08:00:05.000+08:00| 5.0 | +|1970-01-01T08:00:06.000+08:00| 6.0 | +|1970-01-01T08:00:07.000+08:00| 7.0 | +|1970-01-01T08:00:08.000+08:00| 8.0 | +|1970-01-01T08:00:09.000+08:00| 9.0 | +|1970-01-01T08:00:10.000+08:00| 10.0 | ++-----------------------------+---------------+ +``` + +用于查询的SQL语句: +```sql +set time_display_type=long; +select envelope(s1),envelope(s1,'frequency'='1000'),envelope(s1,'amplification'='10') from root.test.d1; +``` +输出序列: + +``` ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +|Time|envelope(root.test.d1.s1)|envelope(root.test.d1.s1, "frequency"="1000")|envelope(root.test.d1.s1, "amplification"="10")| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +| 0| 6.284350808484124| 6.284350808484124| 6.284350808484124| +| 100| 1.5581923657404393| 1.5581923657404393| null| +| 200| 0.8503211038340728| 0.8503211038340728| null| +| 300| 0.512808785945551| 0.512808785945551| null| +| 400| 0.26361156774506744| 0.26361156774506744| null| +|1000| null| null| 1.5581923657404393| +|2000| null| null| 0.8503211038340728| +|3000| null| null| 0.512808785945551| +|4000| null| null| 0.26361156774506744| ++----+-------------------------+---------------------------------------------+-----------------------------------------------+ +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Lambda.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Lambda.md new file mode 100644 index 00000000..af753b75 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Lambda.md @@ -0,0 +1,83 @@ + + +# Lambda 表达式 + +## JEXL 自定义函数 + +### 函数简介 + +Java Expression Language (JEXL) 是一个表达式语言引擎。我们使用 JEXL 来扩展 UDF,在命令行中,通过简易的 lambda 表达式来实现 UDF。 + +lambda 表达式中支持的运算符详见链接 [JEXL 中 lambda 表达式支持的运算符](https://commons.apache.org/proper/commons-jexl/apidocs/org/apache/commons/jexl3/package-summary.html#customization) 。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr`是一个支持标准的一元或多元参数的lambda表达式,符合`x -> {...}`或`(x, y, z) -> {...}`的格式,例如`x -> {x * 2}`, `(x, y, z) -> {x + y * z}`| INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | 返回将输入的时间序列通过lambda表达式变换的序列 | + +### 使用示例 + +输入序列: +``` +IoTDB> select * from root.ln.wf01.wt01; ++-----------------------------+---------------------+--------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.str|root.ln.wf01.wt01.st|root.ln.wf01.wt01.temperature| ++-----------------------------+---------------------+--------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| str| 10.0| 0.0| +|1970-01-01T08:00:00.001+08:00| str| 20.0| 1.0| +|1970-01-01T08:00:00.002+08:00| str| 30.0| 2.0| +|1970-01-01T08:00:00.003+08:00| str| 40.0| 3.0| +|1970-01-01T08:00:00.004+08:00| str| 50.0| 4.0| +|1970-01-01T08:00:00.005+08:00| str| 60.0| 5.0| +|1970-01-01T08:00:00.006+08:00| str| 
70.0| 6.0| +|1970-01-01T08:00:00.007+08:00| str| 80.0| 7.0| +|1970-01-01T08:00:00.008+08:00| str| 90.0| 8.0| +|1970-01-01T08:00:00.009+08:00| str| 100.0| 9.0| +|1970-01-01T08:00:00.010+08:00| str| 110.0| 10.0| ++-----------------------------+---------------------+--------------------+-----------------------------+ +``` + +用于查询的SQL语句: +```sql +select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01; +``` + +输出序列: +``` ++-----------------------------+-----+-----+-----+------+-----+--------+ +| Time|jexl1|jexl2|jexl3| jexl4|jexl5| jexl6| ++-----------------------------+-----+-----+-----+------+-----+--------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| 0.0| 0.0| 10.0| 10.0str| +|1970-01-01T08:00:00.001+08:00| 2.0| 3.0| 1.0| 100.0| 21.0| 21.0str| +|1970-01-01T08:00:00.002+08:00| 4.0| 6.0| 4.0| 200.0| 32.0| 32.0str| +|1970-01-01T08:00:00.003+08:00| 6.0| 9.0| 9.0| 300.0| 43.0| 43.0str| +|1970-01-01T08:00:00.004+08:00| 8.0| 12.0| 16.0| 400.0| 54.0| 54.0str| +|1970-01-01T08:00:00.005+08:00| 10.0| 15.0| 25.0| 500.0| 65.0| 65.0str| +|1970-01-01T08:00:00.006+08:00| 12.0| 18.0| 36.0| 600.0| 76.0| 76.0str| +|1970-01-01T08:00:00.007+08:00| 14.0| 21.0| 49.0| 700.0| 87.0| 87.0str| +|1970-01-01T08:00:00.008+08:00| 16.0| 24.0| 64.0| 800.0| 98.0| 98.0str| +|1970-01-01T08:00:00.009+08:00| 18.0| 27.0| 81.0| 900.0|109.0|109.0str| +|1970-01-01T08:00:00.010+08:00| 20.0| 30.0|100.0|1000.0|120.0|120.0str| ++-----------------------------+-----+-----+-----+------+-----+--------+ +Total line number = 11 +It costs 0.118s +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Logical.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Logical.md new file mode 100644 index 00000000..42c6909b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Logical.md @@ -0,0 +1,63 @@ + + +# 逻辑运算符 + +## 一元逻辑运算符 + +- 支持运算符:`!` +- 输入数据类型:`BOOLEAN`。 +- 输出数据类型:`BOOLEAN`。 +- 注意:`!`的优先级很高,记得使用括号调整优先级。 + +## 二元逻辑运算符 + +- 支持运算符 + - AND:`and`,`&`, `&&` + - OR:`or`,`|`,`||` + +- 输入数据类型:`BOOLEAN`。 + +- 返回类型 `BOOLEAN`。 + +- 注意:当某个时间戳下左操作数和右操作数都为`BOOLEAN`类型时,二元逻辑操作才会有输出结果。 + +**示例:** + +```sql +select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; +``` + +运行结果 +``` +IoTDB> select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +| Time|root.test.a|root.test.b|root.test.a > 10|root.test.a <= root.test.b|!root.test.a <= root.test.b|(root.test.a > 10) & (root.test.a > root.test.b)| ++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 23| 10.0| true| false| true| true| +|1970-01-01T08:00:00.002+08:00| 33| 21.0| true| false| true| true| +|1970-01-01T08:00:00.004+08:00| 13| 15.0| true| true| false| false| +|1970-01-01T08:00:00.005+08:00| 26| 0.0| true| false| true| true| +|1970-01-01T08:00:00.008+08:00| 1| 22.0| false| true| false| false| +|1970-01-01T08:00:00.010+08:00| 23| 12.0| true| false| true| true| 
++-----------------------------+-----------+-----------+----------------+--------------------------+---------------------------+------------------------------------------------+ +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Machine-Learning.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Machine-Learning.md new file mode 100644 index 00000000..3a884033 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Machine-Learning.md @@ -0,0 +1,208 @@ + + +# 机器学习 + +## AR + +### 函数简介 + +本函数用于学习数据的自回归模型系数。 + +**函数名:** AR + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `p`:自回归模型的阶数。默认为1。 + +**输出序列:** 输出单个序列,类型为 DOUBLE。第一行对应模型的一阶系数,以此类推。 + +**提示:** + +- `p`应为正整数。 + +- 序列中的大部分点为等间隔采样点。 +- 序列中的缺失点通过线性插值进行填补后用于学习过程。 + +### 使用示例 + +#### 指定阶数 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +## Representation + +### 函数简介 + +本函数用于时间序列的表示。 + +**函数名:** Representation + +**输入序列:** 仅支持单个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `tb`:时间分块数量。默认为10。 +- `vb`:值分块数量。默认为10。 + +**输出序列:** 输出单个序列,类型为INT32,长度为`tb*vb`。序列的时间戳从0开始,仅用于表示顺序。 + +**提示:** + +- `tb `,`vb`应为正整数。 + +### 使用示例 + +#### 指定时间分块数量、值分块数量 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +## RM + +### 函数简介 + +本函数用于基于时间序列表示的匹配度。 + +**函数名:** RM + +**输入序列:** 仅支持两个输入序列,类型为 INT32 / INT64 / FLOAT / DOUBLE。 + +**参数:** + +- `tb`:时间分块数量。默认为10。 +- `vb`:值分块数量。默认为10。 + +**输出序列:** 输出单个序列,类型为DOUBLE,长度为`1`。序列的时间戳从0开始,序列仅有一个数据点,其时间戳为0,值为两个时间序列的匹配度。 + +**提示:** + +- `tb `,`vb`应为正整数。 + +### 
使用示例 + +#### 指定时间分块数量、值分块数量 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| ++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Mathematical.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Mathematical.md new file mode 100644 index 00000000..77331e57 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Mathematical.md @@ -0,0 +1,136 @@ + + +# 算数运算符和函数 + +## 算数运算符 + +### 一元算数运算符 + +支持的运算符:`+`, `-` + +输入数据类型要求:`INT32`, `INT64`, `FLOAT`, `DOUBLE` + +输出数据类型:与输入数据类型一致 + +### 二元算数运算符 + +支持的运算符:`+`, `-`, `*`, `/`, `%` + +输入数据类型要求:`INT32`, `INT64`, `FLOAT`和`DOUBLE` + +输出数据类型:`DOUBLE` + +注意:当某个时间戳下左操作数和右操作数都不为空(`null`)时,二元运算操作才会有输出结果 + +### 使用示例 + +例如: + +```sql +select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 +``` + +结果: + +``` ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +| Time|root.sg.d1.s1|-root.sg.d1.s1|root.sg.d1.s2|root.sg.d1.s2|root.sg.d1.s1 + root.sg.d1.s2|root.sg.d1.s1 - root.sg.d1.s2|root.sg.d1.s1 * root.sg.d1.s2|root.sg.d1.s1 / root.sg.d1.s2|root.sg.d1.s1 % root.sg.d1.s2| ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| -1.0| 1.0| 1.0| 2.0| 0.0| 1.0| 1.0| 0.0| +|1970-01-01T08:00:00.002+08:00| 2.0| -2.0| 2.0| 2.0| 4.0| 0.0| 4.0| 1.0| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.0| -3.0| 3.0| 3.0| 6.0| 0.0| 9.0| 1.0| 0.0| +|1970-01-01T08:00:00.004+08:00| 4.0| -4.0| 4.0| 4.0| 8.0| 0.0| 16.0| 1.0| 0.0| +|1970-01-01T08:00:00.005+08:00| 5.0| -5.0| 5.0| 5.0| 10.0| 0.0| 25.0| 1.0| 0.0| ++-----------------------------+-------------+--------------+-------------+-------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.014s +``` + +## 数学函数 + +目前 IoTDB 支持下列数学函数,这些数学函数的行为与这些函数在 Java Math 标准库中对应实现的行为一致。 + +| 函数名 | 输入序列类型 | 输出序列类型 | 必要属性参数 | Java 标准库中的对应实现 | +| ------- | ------------------------------ | ------------------------ |-----------| ------------------------------------------------------------ | +| 
SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sin(double) | +| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#cos(double) | +| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#tan(double) | +| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#asin(double) | +| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#acos(double) | +| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#atan(double) | +| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sinh(double) | +| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#cosh(double) | +| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#tanh(double) | +| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#toDegrees(double) | +| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#toRadians(double) | +| ABS | INT32 / INT64 / FLOAT / DOUBLE | 与输入序列的实际类型一致 | | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | +| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#signum(double) | +| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#ceil(double) | +| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#floor(double) | +| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE |`places`:四舍五入有效位数,正数为小数点后面的有效位数,负数为整数位的有效位数 | Math#rint(Math#pow(10,places))/Math#pow(10,places)| +| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#exp(double) | +| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log(double) | +| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log10(double) | +| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sqrt(double) | + +例如: + +``` sql +select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; +``` + +结果: + +``` ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +| Time| root.sg1.d1.s1|sin(root.sg1.d1.s1)| cos(root.sg1.d1.s1)|tan(root.sg1.d1.s1)| ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +|2020-12-10T17:11:49.037+08:00|7360723084922759782| 0.8133527237573284| 0.5817708713544664| 1.3980636773094157| +|2020-12-10T17:11:49.038+08:00|4377791063319964531|-0.8938962705202537| 0.4482738644511651| -1.994085181866842| +|2020-12-10T17:11:49.039+08:00|7972485567734642915| 0.9627757585308978|-0.27030138509681073|-3.5618602479083545| +|2020-12-10T17:11:49.040+08:00|2508858212791964081|-0.6073417341629443| -0.7944406950452296| 0.7644897069734913| +|2020-12-10T17:11:49.041+08:00|2817297431185141819|-0.8419358900502509| -0.5395775727782725| 1.5603611649667768| ++-----------------------------+-------------------+-------------------+--------------------+-------------------+ +Total line number = 5 +It costs 0.008s +``` +### ROUND + +例如: +```sql +select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1 +``` + +```sql ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +| Time|root.db.d1.s4|ROUND(root.db.d1.s4)|ROUND(root.db.d1.s4,2)|ROUND(root.db.d1.s4,-1)| ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +|1970-01-01T08:00:00.001+08:00| 101.14345| 101.0| 101.14| 100.0| +|1970-01-01T08:00:00.002+08:00| 20.144346| 20.0| 20.14| 20.0| +|1970-01-01T08:00:00.003+08:00| 20.614372| 21.0| 20.61| 20.0| +|1970-01-01T08:00:00.005+08:00| 20.814346| 21.0| 20.81| 20.0| +|1970-01-01T08:00:00.006+08:00| 60.71443| 61.0| 60.71| 60.0| +|2023-03-13T16:16:19.764+08:00| 
10.143425|                10.0|                 10.14|                   10.0| ++-----------------------------+-------------+--------------------+----------------------+-----------------------+ +Total line number = 6 +It costs 0.059s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Overview.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Overview.md new file mode 100644 index 00000000..a5f98927 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Overview.md @@ -0,0 +1,284 @@ + + +# 运算符和函数 + +本章介绍 IoTDB 支持的运算符和函数。IoTDB 提供了丰富的内置运算符和函数来满足您的计算需求,同时支持通过[用户自定义函数](./User-Defined-Function.md)能力进行扩展。 + +可以使用 `SHOW FUNCTIONS` 显示所有可用函数的列表,包括内置函数和自定义函数。 + +关于运算符和函数在 SQL 中的行为,可以查看文档 [选择表达式](../Query-Data/Select-Expression.md)。 + +## 运算符列表 + +### 算数运算符 +|运算符 |含义| +|----------------------------|-----------| +|`+` |取正(单目)| +|`-` |取负(单目)| +|`*` |乘| +|`/` |除| +|`%` |取余| +|`+` |加| +|`-` |减| + +详细说明及示例见文档 [算数运算符和函数](./Mathematical.md)。 + +### 比较运算符 +|运算符 |含义| +|----------------------------|-----------| +|`>` |大于| +|`>=` |大于等于| +|`<` |小于| +|`<=` |小于等于| +|`==` |等于| +|`!=` / `<>` |不等于| +|`BETWEEN ... AND ...` |在指定范围内| +|`NOT BETWEEN ... AND ...` |不在指定范围内| +|`LIKE` |匹配简单模式| +|`NOT LIKE` |无法匹配简单模式| +|`REGEXP` |匹配正则表达式| +|`NOT REGEXP` |无法匹配正则表达式| +|`IS NULL` |是空值| +|`IS NOT NULL` |不是空值| +|`IN` / `CONTAINS` |是指定列表中的值| +|`NOT IN` / `NOT CONTAINS` |不是指定列表中的值| + +详细说明及示例见文档 [比较运算符和函数](./Comparison.md)。 + +### 逻辑运算符 +|运算符 |含义| +|----------------------------|-----------| +|`NOT` / `!` |取非(单目)| +|`AND` / `&` / `&&` |逻辑与| +|`OR` / `\|` / `\|\|` |逻辑或| + +详细说明及示例见文档 [逻辑运算符](./Logical.md)。 + +### 运算符优先级 + +运算符的优先级从高到低如下所示排列,同一行的运算符具有相同的优先级。 +```sql +!, - (单目), + (单目) +*, /, DIV, %, MOD +-, + +=, ==, <=>, >=, >, <=, <, <>, != +LIKE, REGEXP, NOT LIKE, NOT REGEXP +BETWEEN ... AND ..., NOT BETWEEN ... AND ... 
+IS NULL, IS NOT NULL +IN, CONTAINS, NOT IN, NOT CONTAINS +AND, &, && +OR, |, || +``` + +## 内置函数列表 + +列表中的函数无须注册即可在 IoTDB 中使用,数据函数质量库中的函数需要参考注册步骤进行注册后才能使用。 + +### 聚合函数 + +| 函数名 | 功能描述 | 允许的输入类型 | 必要的属性参数 | 输出类型 | +| ------------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | -------------- | +| SUM | 求和。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| COUNT | 计算数据点数。 | 所有类型 | 无 | INT64 | +| AVG | 求平均值。 | INT32 INT64 FLOAT DOUBLE | 无 | DOUBLE | +| EXTREME | 求具有最大绝对值的值。如果正值和负值的最大绝对值相等,则返回正值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 | +| MAX_VALUE | 求最大值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 | +| MIN_VALUE | 求最小值。 | INT32 INT64 FLOAT DOUBLE | 无 | 与输入类型一致 | +| FIRST_VALUE | 求时间戳最小的值。 | 所有类型 | 无 | 与输入类型一致 | +| LAST_VALUE | 求时间戳最大的值。 | 所有类型 | 无 | 与输入类型一致 | +| MAX_TIME | 求最大时间戳。 | 所有类型 | 无 | Timestamp | +| MIN_TIME | 求最小时间戳。 | 所有类型 | 无 | Timestamp | +| COUNT_IF | 求数据点连续满足某一给定条件,且满足条件的数据点个数(用keep表示)满足指定阈值的次数。 | BOOLEAN | `[keep >=/>/=/!=/= threshold`,`threshold`类型为`INT64` `ignoreNull`:可选,默认为`true`;为`true`表示忽略null值,即如果中间出现null值,直接忽略,不会打断连续性;为`false`表示不忽略null值,即如果中间出现null值,会打断连续性 | INT64 | +| TIME_DURATION | 求某一列最大一个不为NULL的值所在时间戳与最小一个不为NULL的值所在时间戳的时间戳差 | 所有类型 | 无 | INT64 | +| MODE | 求众数。注意: 1.输入序列的不同值个数过多时会有内存异常风险; 2.如果所有元素出现的频次相同,即没有众数,则返回对应时间戳最小的值; 3.如果有多个众数,则返回对应时间戳最小的众数。 | 所有类型 | 无 | 与输入类型一致 | + +详细说明及示例见文档 [聚合函数](./Aggregation.md)。 + +### 数学函数 + +| 函数名 | 输入序列类型 | 输出序列类型 | 必要属性参数 | Java 标准库中的对应实现 | +| ------- | ------------------------------ | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | +| SIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sin(double) | +| COS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#cos(double) | +| TAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#tan(double) | +| ASIN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#asin(double) | +| ACOS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#acos(double) | +| ATAN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#atan(double) | +| SINH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sinh(double) | +| COSH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#cosh(double) | +| TANH | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#tanh(double) | +| DEGREES | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#toDegrees(double) | +| RADIANS | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#toRadians(double) | +| ABS | INT32 / INT64 / FLOAT / DOUBLE | 与输入序列的实际类型一致 | | Math#abs(int) / Math#abs(long) /Math#abs(float) /Math#abs(double) | +| SIGN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#signum(double) | +| CEIL | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#ceil(double) | +| FLOOR | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#floor(double) | +| ROUND | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | `places`:四舍五入有效位数,正数为小数点后面的有效位数,负数为整数位的有效位数 | Math#rint(Math#pow(10,places))/Math#pow(10,places) | +| EXP | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#exp(double) | +| LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log(double) | +| LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log10(double) | +| SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sqrt(double) | + + +详细说明及示例见文档 [算数运算符和函数](./Mathematical.md)。 + +### 比较函数 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +| -------- | ------------------------------ | ------------------------------------- | 
------------ | ---------------------------------------------------- | +| ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`:DOUBLE类型 | BOOLEAN 类型 | 返回`ts_value >= threshold`的bool值 | +| IN_RANGE | INT32 / INT64 / FLOAT / DOUBLE | `lower`:DOUBLE类型 `upper`:DOUBLE类型 | BOOLEAN类型 | 返回`ts_value >= lower && ts_value <= upper`的bool值 | + +详细说明及示例见文档 [比较运算符和函数](./Comparison.md)。 + +### 字符串处理函数 + +| 函数名 | 输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能描述 | +|-----------------| ------------ |-----------------------------------------------------------------------------------------------------------| ------------ |-------------------------------------------------------------------------| +| STRING_CONTAINS | TEXT | `s`: 待搜寻的字符串 | BOOLEAN | 判断字符串中是否存在`s` | +| STRING_MATCHES | TEXT | `regex`: Java 标准库风格的正则表达式 | BOOLEAN | 判断字符串是否能够被正则表达式`regex`匹配 | +| LENGTH | TEXT | 无 | INT32 | 返回字符串的长度 | +| LOCATE | TEXT | `target`: 需要被定位的子串
`reverse`: 指定是否需要倒序定位,默认值为`false`,即从左至右定位 | INT32 | 获取`target`子串第一次出现在输入序列的位置,如果输入序列中不包含`target`则返回 -1 | +| STARTSWITH | TEXT | `target`: 需要匹配的前缀 | BOOLEAN | 判断字符串是否有指定前缀 | +| ENDSWITH | TEXT | `target`: 需要匹配的后缀 | BOOLEAN | 判断字符串是否有指定后缀 | +| CONCAT | TEXT | `targets`: 一系列 K-V,key需要以`target`为前缀且不重复,value是待拼接的字符串。
`series_behind`: 指定拼接时时间序列是否在后面,默认为`false`。 | TEXT | 拼接字符串和`target`字串 | +| SUBSTRING | TEXT | `from`: 指定子串开始下标
`for`: 指定的字符个数之后停止 | TEXT | 提取字符串的子字符串,从指定的第一个字符开始,并在指定的字符数之后停止。下标从1开始。from 和 for的范围是 INT32 类型取值范围。 | +| REPLACE | TEXT | 第一个参数: 需要替换的目标子串
第二个参数:要替换成的子串 | TEXT | 将输入序列中的子串替换成目标子串 | +| UPPER | TEXT | 无 | TEXT | 将字符串转化为大写 | +| LOWER | TEXT | 无 | TEXT | 将字符串转化为小写 | +| TRIM | TEXT | 无 | TEXT | 移除字符串前后的空格 | +| STRCMP | TEXT | 无 | TEXT | 用于比较两个输入序列,如果值相同返回 `0` , 序列1的值小于序列2的值返回一个`负数`,序列1的值大于序列2的值返回一个`正数` | + +详细说明及示例见文档 [字符串处理函数](./String.md)。 + +### 数据类型转换函数 + +| 函数名 | 必要的属性参数 | 输出序列类型 | 功能类型 | +| ------ | ------------------------------------------------------------ | ------------------------ | ---------------------------------- | +| CAST | `type`:输出的数据点的类型,只能是 INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 由输入属性参数`type`决定 | 将数据转换为`type`参数指定的类型。 | + +详细说明及示例见文档 [数据类型转换](./Conversion.md)。 + +### 常序列生成函数 + +| 函数名 | 必要的属性参数 | 输出序列类型 | 功能描述 | +| ------ | ------------------------------------------------------------ | -------------------------- | ------------------------------------------------------------ | +| CONST | `value`: 输出的数据点的值 `type`: 输出的数据点的类型,只能是 INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 由输入属性参数 `type` 决定 | 根据输入属性 `value` 和 `type` 输出用户指定的常序列。 | +| PI | 无 | DOUBLE | 常序列的值:`π` 的 `double` 值,圆的周长与其直径的比值,即圆周率,等于 *Java标准库* 中的`Math.PI`。 | +| E | 无 | DOUBLE | 常序列的值:`e` 的 `double` 值,自然对数的底,它等于 *Java 标准库* 中的 `Math.E`。 | + +详细说明及示例见文档 [常序列生成函数](./Constant.md)。 + +### 选择函数 + +| 函数名 | 输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能描述 | +| -------- | ------------------------------------- | ------------------------------------------------- | ------------------------ | ------------------------------------------------------------ | +| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最大的`k`个数据点。若多于`k`个数据点的值并列最大,则返回时间戳最小的数据点。 | +| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最小的`k`个数据点。若多于`k`个数据点的值并列最小,则返回时间戳最小的数据点。 | + +详细说明及示例见文档 [选择函数](./Selection.md)。 + +### 区间查询函数 + +| 函数名 | 输入序列类型 | 属性参数 | 输出序列类型 | 功能描述 | +| ----------------- | ------------------------------------ | ------------------------------------------------------ | ------------ | ------------------------------------------------------------ | +| ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0 `max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续为0(false)的开始时间与持续时间,持续时间t(单位ms)满足`t >= min && t <= max` | +| NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0 `max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与持续时间,持续时间t(单位ms)满足`t >= min && t <= max` | +| ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1 `max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` | +| NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1 `max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` | + +详细说明及示例见文档 [区间查询函数](./Continuous-Interval.md)。 + +### 趋势计算函数 + +| 函数名 | 输入序列类型 | 属性参数 | 输出序列类型 | 功能描述 | +| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | +| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 无 | INT64 | 统计序列中某数据点的时间戳与前一数据点时间戳的差。范围内第一个数据点没有对应的结果输出。 | +| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | 无 | 与输入序列的实际类型一致 | 统计序列中某数据点的值与前一数据点的值的差。范围内第一个数据点没有对应的结果输出。 | +| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | 无 | 与输入序列的实际类型一致 | 统计序列中某数据点的值与前一数据点的值的差的绝对值。范围内第一个数据点没有对应的结果输出。 | +| DERIVATIVE | INT32 / INT64 / FLOAT / 
DOUBLE | 无 | DOUBLE | 统计序列中某数据点相对于前一数据点的变化率,数量上等同于 DIFFERENCE / TIME_DIFFERENCE。范围内第一个数据点没有对应的结果输出。 | +| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | 无 | DOUBLE | 统计序列中某数据点相对于前一数据点的变化率的绝对值,数量上等同于 NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE。范围内第一个数据点没有对应的结果输出。 | +| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:可选,默认为true;为true时,前一个数据点值为null时,忽略该数据点继续向前找到第一个出现的不为null的值;为false时,如果前一个数据点为null,则不忽略,使用null进行相减,结果也为null | DOUBLE | 统计序列中某数据点的值与前一数据点的值的差。第一个数据点没有对应的结果输出,输出值为null | + +详细说明及示例见文档 [趋势计算函数](./Variation-Trend.md)。 + +### 采样函数 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | 降采样比例 `proportion`,取值范围为`(0, 1]`,默认为`0.1` | INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶随机采样 | +| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`
`type`:取值类型有`avg`, `max`, `min`, `sum`, `extreme`, `variance`, 默认为`avg` | INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶聚合采样 | +| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`| INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶M4采样 | +| EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`
`type`取值为`avg`或`stendis`或`cos`或`prenextdis`,默认为`avg`
`number`取值应大于0,默认`3`| INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例和桶内采样个数的等分桶离群值采样 | +| M4 | INT32 / INT64 / FLOAT / DOUBLE | 包含固定点数的窗口和滑动时间窗口使用不同的属性参数。包含固定点数的窗口使用属性`windowSize`和`slidingStep`。滑动时间窗口使用属性`timeInterval`、`slidingStep`、`displayWindowBegin`和`displayWindowEnd`。更多细节见下文。 | INT32 / INT64 / FLOAT / DOUBLE | 返回每个窗口内的第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`)。在一个窗口内的聚合点输出之前,M4会将它们按照时间戳递增排序并且去重。 | + +详细说明及示例见文档 [采样函数](./Sample.md)。 + +### 时间序列处理函数 + +| 函数名 | 输入序列类型 | 参数 | 输出序列类型 | 功能描述 | +| ------------- | ------------------------------ | ---- | ------------------------ | -------------------------- | +| CHANGE_POINTS | INT32 / INT64 / FLOAT / DOUBLE | / | 与输入序列的实际类型一致 | 去除输入序列中的连续相同值 | + +详细说明及示例见文档 [时间序列处理](./Time-Series.md) + +## Lambda 表达式 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +| ------ | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------- | ---------------------------------------------- | +| JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr`是一个支持标准的一元或多元参数的lambda表达式,符合`x -> {...}`或`(x, y, z) -> {...}`的格式,例如`x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | 返回将输入的时间序列通过lambda表达式变换的序列 | + +详细说明及示例见文档 [Lambda 表达式](./Lambda.md) + +## 条件表达式 + +| 表达式名称 | 含义 | +|---------------------------|-----------| +| `CASE` | 类似if else | + +详细说明及示例见文档 [条件表达式](./Conditional.md)。 + +## 数据质量函数库 + +### 关于 + +对基于时序数据的应用而言,数据质量至关重要。基于用户自定义函数能力,IoTDB 提供了一系列关于数据质量的函数,包括数据画像、数据质量评估与修复等,能够满足工业领域对数据质量的需求。 + +### 快速上手 + +**该函数库中的函数不是内置函数,使用前要先加载到系统中。** 操作流程如下: + +1. 下载包含全部依赖的 jar 包和注册脚本 [【点击下载】](https://archive.apache.org/dist/iotdb/1.0.1/apache-iotdb-1.0.1-library-udf-bin.zip) ; +2. 将 jar 包复制到 IoTDB 程序目录的 `ext\udf` 目录下 (若您使用的是集群,请将jar包复制到所有DataNode的该目录下); +3. 启动 IoTDB; +4. 将注册脚本复制到 IoTDB 的程序目录下(与`sbin`目录同级的根目录下),修改脚本中的参数(如果需要)并运行注册脚本以注册 UDF。 + +### 已经实现的函数 + +1. [Data-Quality](../Operators-Functions/Data-Quality.md) 数据质量 +2. [Data-Profiling](../Operators-Functions/Data-Profiling.md) 数据画像 +3. [Anomaly-Detection](../Operators-Functions/Anomaly-Detection.md) 异常检测 +4. [Frequency-Domain](../Operators-Functions/Frequency-Domain.md) 频域分析 +5. [Data-Matching](../Operators-Functions/Data-Matching.md) 数据匹配 +6. [Data-Repairing](../Operators-Functions/Data-Repairing.md) 数据修复 +7. [Series-Discovery](../Operators-Functions/Series-Discovery.md) 序列发现 +8. 
[Machine-Learning](../Operators-Functions/Machine-Learning.md) 机器学习 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Sample.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Sample.md new file mode 100644 index 00000000..11d524f9 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Sample.md @@ -0,0 +1,402 @@ + + +# 采样函数 + +## 等数量分桶降采样函数 + +本函数对输入序列进行等数量分桶采样,即根据用户给定的降采样比例和降采样方法将输入序列按固定点数等分为若干桶。在每个桶内通过给定的采样方法进行采样。 + +### 等数量分桶随机采样 + +对等数量分桶后,桶内进行随机采样。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_RANDOM_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | 降采样比例 `proportion`,取值范围为`(0, 1]`,默认为`0.1` | INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶随机采样 | + +#### 示例 + +输入序列:`root.ln.wf01.wt01.temperature`从`0.0-99.0`共`100`条数据。 + +``` +IoTDB> select temperature from root.ln.wf01.wt01; ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| +|1970-01-01T08:00:00.005+08:00| 5.0| +|1970-01-01T08:00:00.006+08:00| 6.0| +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.009+08:00| 9.0| +|1970-01-01T08:00:00.010+08:00| 10.0| +|1970-01-01T08:00:00.011+08:00| 11.0| +|1970-01-01T08:00:00.012+08:00| 12.0| +|.............................|.............................| +|1970-01-01T08:00:00.089+08:00| 89.0| +|1970-01-01T08:00:00.090+08:00| 90.0| +|1970-01-01T08:00:00.091+08:00| 91.0| +|1970-01-01T08:00:00.092+08:00| 92.0| +|1970-01-01T08:00:00.093+08:00| 93.0| +|1970-01-01T08:00:00.094+08:00| 94.0| +|1970-01-01T08:00:00.095+08:00| 95.0| +|1970-01-01T08:00:00.096+08:00| 96.0| +|1970-01-01T08:00:00.097+08:00| 97.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+-----------------------------+ +``` +sql: +```sql +select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; +``` +结果: +``` ++-----------------------------+-------------+ +| Time|random_sample| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.014+08:00| 14.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.035+08:00| 35.0| +|1970-01-01T08:00:00.047+08:00| 47.0| +|1970-01-01T08:00:00.059+08:00| 59.0| +|1970-01-01T08:00:00.063+08:00| 63.0| +|1970-01-01T08:00:00.079+08:00| 79.0| +|1970-01-01T08:00:00.086+08:00| 86.0| +|1970-01-01T08:00:00.096+08:00| 96.0| ++-----------------------------+-------------+ +Total line number = 10 +It costs 0.024s +``` + +### 等数量分桶聚合采样 + +采用聚合采样法对输入序列进行采样,用户需要另外提供一个聚合函数参数即 +- `type`:聚合类型,取值为`avg`或`max`或`min`或`sum`或`extreme`或`variance`。在缺省情况下,采用`avg`。其中`extreme`表示等分桶中,绝对值最大的值。`variance`表示采样等分桶中的方差。 + +每个桶采样输出的时间戳为这个桶第一个点的时间戳 + + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_AGG_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`
`type`:取值类型有`avg`, `max`, `min`, `sum`, `extreme`, `variance`, 默认为`avg` | INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶聚合采样 | + +#### 示例 + +输入序列:`root.ln.wf01.wt01.temperature`从`0.0-99.0`共`100`条有序数据,同等分桶随机采样的测试数据。 + +sql: +```sql +select equal_size_bucket_agg_sample(temperature, 'type'='avg','proportion'='0.1') as agg_avg, equal_size_bucket_agg_sample(temperature, 'type'='max','proportion'='0.1') as agg_max, equal_size_bucket_agg_sample(temperature,'type'='min','proportion'='0.1') as agg_min, equal_size_bucket_agg_sample(temperature, 'type'='sum','proportion'='0.1') as agg_sum, equal_size_bucket_agg_sample(temperature, 'type'='extreme','proportion'='0.1') as agg_extreme, equal_size_bucket_agg_sample(temperature, 'type'='variance','proportion'='0.1') as agg_variance from root.ln.wf01.wt01; +``` +结果: +``` ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +| Time| agg_avg|agg_max|agg_min|agg_sum|agg_extreme|agg_variance| ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| +|1970-01-01T08:00:00.010+08:00| 14.5| 19.0| 10.0| 145.0| 19.0| 8.25| +|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| 20.0| 245.0| 29.0| 8.25| +|1970-01-01T08:00:00.030+08:00| 34.5| 39.0| 30.0| 345.0| 39.0| 8.25| +|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| +|1970-01-01T08:00:00.050+08:00| 54.5| 59.0| 50.0| 545.0| 59.0| 8.25| +|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| 8.25| +|1970-01-01T08:00:00.070+08:00|74.50000000000001| 79.0| 70.0| 745.0| 79.0| 8.25| +|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 8.25| +|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 8.25| ++-----------------------------+-----------------+-------+-------+-------+-----------+------------+ +Total line number = 10 +It costs 0.044s +``` + +### 等数量分桶 M4 采样 + +采用M4采样法对输入序列进行采样。即对于每个桶采样首、尾、最小和最大值。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`| INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例的等分桶M4采样 | + +#### 示例 + +输入序列:`root.ln.wf01.wt01.temperature`从`0.0-99.0`共`100`条有序数据,同等分桶随机采样的测试数据。 + +sql: +```sql +select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01; +``` +结果: +``` ++-----------------------------+---------+ +| Time|M4_sample| ++-----------------------------+---------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.038+08:00| 38.0| +|1970-01-01T08:00:00.039+08:00| 39.0| +|1970-01-01T08:00:00.040+08:00| 40.0| +|1970-01-01T08:00:00.041+08:00| 41.0| +|1970-01-01T08:00:00.078+08:00| 78.0| +|1970-01-01T08:00:00.079+08:00| 79.0| +|1970-01-01T08:00:00.080+08:00| 80.0| +|1970-01-01T08:00:00.081+08:00| 81.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+---------+ +Total line number = 12 +It costs 0.065s +``` + +### 等数量分桶离群值采样 + +本函数对输入序列进行等数量分桶离群值采样,即根据用户给定的降采样比例和桶内采样个数将输入序列按固定点数等分为若干桶,在每个桶内通过给定的离群值采样方法进行采样。 + +| 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | +|----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| +| 
EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion`取值范围为`(0, 1]`,默认为`0.1`
`type`取值为`avg`或`stendis`或`cos`或`prenextdis`,默认为`avg`
`number`取值应大于0,默认`3`| INT32 / INT64 / FLOAT / DOUBLE | 返回符合采样比例和桶内采样个数的等分桶离群值采样 | + +参数说明 +- `proportion`: 采样比例 + - `number`: 每个桶内的采样个数,默认`3` +- `type`: 离群值采样方法,取值为 + - `avg`: 取桶内数据点的平均值,并根据采样比例,找到距离均值最远的`top number`个 + - `stendis`: 取桶内每一个数据点距离桶的首末数据点连成直线的垂直距离,并根据采样比例,找到距离最大的`top number`个 + - `cos`: 设桶内一个数据点为b,b左边的数据点为a,b右边的数据点为c,则取ab与bc向量的夹角的余弦值,值越小,说明形成的角度越大,越可能是异常值。找到cos值最小的`top number`个 + - `prenextdis`: 设桶内一个数据点为b,b左边的数据点为a,b右边的数据点为c,则取ab与bc的长度之和作为衡量标准,和越大越可能是异常值,找到最大的`top number`个 + +#### 示例 + +测试数据:`root.ln.wf01.wt01.temperature`从`0.0-99.0`共`100`条数据,其中为了加入离群值,我们使得个位数为5的值自增100。 +``` +IoTDB> select temperature from root.ln.wf01.wt01; ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 4.0| +|1970-01-01T08:00:00.005+08:00| 105.0| +|1970-01-01T08:00:00.006+08:00| 6.0| +|1970-01-01T08:00:00.007+08:00| 7.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.009+08:00| 9.0| +|1970-01-01T08:00:00.010+08:00| 10.0| +|1970-01-01T08:00:00.011+08:00| 11.0| +|1970-01-01T08:00:00.012+08:00| 12.0| +|1970-01-01T08:00:00.013+08:00| 13.0| +|1970-01-01T08:00:00.014+08:00| 14.0| +|1970-01-01T08:00:00.015+08:00| 115.0| +|1970-01-01T08:00:00.016+08:00| 16.0| +|.............................|.............................| +|1970-01-01T08:00:00.092+08:00| 92.0| +|1970-01-01T08:00:00.093+08:00| 93.0| +|1970-01-01T08:00:00.094+08:00| 94.0| +|1970-01-01T08:00:00.095+08:00| 195.0| +|1970-01-01T08:00:00.096+08:00| 96.0| +|1970-01-01T08:00:00.097+08:00| 97.0| +|1970-01-01T08:00:00.098+08:00| 98.0| +|1970-01-01T08:00:00.099+08:00| 99.0| ++-----------------------------+-----------------------------+ +``` +sql: +```sql +select equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='avg', 'number'='2') as outlier_avg_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='stendis', 'number'='2') as outlier_stendis_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='cos', 'number'='2') as outlier_cos_sample, equal_size_bucket_outlier_sample(temperature, 'proportion'='0.1', 'type'='prenextdis', 'number'='2') as outlier_prenextdis_sample from root.ln.wf01.wt01; +``` +结果: +``` ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +| Time|outlier_avg_sample|outlier_stendis_sample|outlier_cos_sample|outlier_prenextdis_sample| ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +|1970-01-01T08:00:00.005+08:00| 105.0| 105.0| 105.0| 105.0| +|1970-01-01T08:00:00.015+08:00| 115.0| 115.0| 115.0| 115.0| +|1970-01-01T08:00:00.025+08:00| 125.0| 125.0| 125.0| 125.0| +|1970-01-01T08:00:00.035+08:00| 135.0| 135.0| 135.0| 135.0| +|1970-01-01T08:00:00.045+08:00| 145.0| 145.0| 145.0| 145.0| +|1970-01-01T08:00:00.055+08:00| 155.0| 155.0| 155.0| 155.0| +|1970-01-01T08:00:00.065+08:00| 165.0| 165.0| 165.0| 165.0| +|1970-01-01T08:00:00.075+08:00| 175.0| 175.0| 175.0| 175.0| +|1970-01-01T08:00:00.085+08:00| 185.0| 185.0| 185.0| 185.0| +|1970-01-01T08:00:00.095+08:00| 195.0| 195.0| 195.0| 195.0| ++-----------------------------+------------------+----------------------+------------------+-------------------------+ +Total line number = 10 
+It costs 0.041s +``` + +## M4函数 + +### 函数简介 + +M4用于在窗口内采样第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`): + +- 第一个点是拥有这个窗口内最小时间戳的点; +- 最后一个点是拥有这个窗口内最大时间戳的点; +- 最小值点是拥有这个窗口内最小值的点(如果有多个这样的点,M4只返回其中一个); +- 最大值点是拥有这个窗口内最大值的点(如果有多个这样的点,M4只返回其中一个)。 + +image + +| 函数名 | 可接收的输入序列类型 | 属性参数 | 输出序列类型 | 功能类型 | +| ------ | ------------------------------ | ------------------------------------------------------------ | ------------------------------ | ------------------------------------------------------------ | +| M4 | INT32 / INT64 / FLOAT / DOUBLE | 包含固定点数的窗口和滑动时间窗口使用不同的属性参数。包含固定点数的窗口使用属性`windowSize`和`slidingStep`。滑动时间窗口使用属性`timeInterval`、`slidingStep`、`displayWindowBegin`和`displayWindowEnd`。更多细节见下文。 | INT32 / INT64 / FLOAT / DOUBLE | 返回每个窗口内的第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`)。在一个窗口内的聚合点输出之前,M4会将它们按照时间戳递增排序并且去重。 | + +### 属性参数 + +**(1) 包含固定点数的窗口(SlidingSizeWindowAccessStrategy)使用的属性参数:** + ++ `windowSize`: 一个窗口内的点数。Int数据类型。必需的属性参数。 ++ `slidingStep`: 按照设定的点数来滑动窗口。Int数据类型。可选的属性参数;如果没有设置,默认取值和`windowSize`一样。 + +image + +**(2) 滑动时间窗口(SlidingTimeWindowAccessStrategy)使用的属性参数:** + ++ `timeInterval`: 一个窗口的时间长度。Long数据类型。必需的属性参数。 ++ `slidingStep`: 按照设定的时长来滑动窗口。Long数据类型。可选的属性参数;如果没有设置,默认取值和`timeInterval`一样。 ++ `displayWindowBegin`: 窗口滑动的起始时间戳位置(包含在内)。Long数据类型。可选的属性参数;如果没有设置,默认取值为Long.MIN_VALUE,意为使用输入的时间序列的第一个点的时间戳作为窗口滑动的起始时间戳位置。 ++ `displayWindowEnd`: 结束时间限制(不包含在内;本质上和`WHERE time < displayWindowEnd`起的效果是一样的)。Long数据类型。可选的属性参数;如果没有设置,默认取值为Long.MAX_VALUE,意为除了输入的时间序列自身数据读取完毕之外没有增加额外的结束时间过滤条件限制。 + +groupBy window + +### 示例 + +输入的时间序列: + +```sql ++-----------------------------+------------------+ +| Time|root.vehicle.d1.s1| ++-----------------------------+------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.002+08:00| 15.0| +|1970-01-01T08:00:00.005+08:00| 10.0| +|1970-01-01T08:00:00.008+08:00| 8.0| +|1970-01-01T08:00:00.010+08:00| 30.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.025+08:00| 8.0| +|1970-01-01T08:00:00.027+08:00| 20.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.033+08:00| 9.0| +|1970-01-01T08:00:00.035+08:00| 10.0| +|1970-01-01T08:00:00.040+08:00| 20.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+------------------+ +``` + +查询语句1: + +```sql +select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1 +``` + +输出结果1: + +```sql ++-----------------------------+-----------------------------------------------------------------------------------------------+ +| Time|M4(root.vehicle.d1.s1, "timeInterval"="25", "displayWindowBegin"="0", "displayWindowEnd"="100")| ++-----------------------------+-----------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 5.0| +|1970-01-01T08:00:00.010+08:00| 30.0| +|1970-01-01T08:00:00.020+08:00| 20.0| +|1970-01-01T08:00:00.025+08:00| 8.0| +|1970-01-01T08:00:00.030+08:00| 40.0| +|1970-01-01T08:00:00.045+08:00| 30.0| +|1970-01-01T08:00:00.052+08:00| 8.0| +|1970-01-01T08:00:00.054+08:00| 18.0| ++-----------------------------+-----------------------------------------------------------------------------------------------+ +Total line number = 8 +``` + +查询语句2: + +```sql +select M4(s1,'windowSize'='10') from root.vehicle.d1 +``` + +输出结果2: + +```sql ++-----------------------------+-----------------------------------------+ +| Time|M4(root.vehicle.d1.s1, "windowSize"="10")| 
++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.001+08:00|                                      5.0| +|1970-01-01T08:00:00.030+08:00|                                     40.0| +|1970-01-01T08:00:00.033+08:00|                                      9.0| +|1970-01-01T08:00:00.035+08:00|                                     10.0| +|1970-01-01T08:00:00.045+08:00|                                     30.0| +|1970-01-01T08:00:00.052+08:00|                                      8.0| +|1970-01-01T08:00:00.054+08:00|                                     18.0| ++-----------------------------+-----------------------------------------+ +Total line number = 7 +``` + +### 推荐的使用场景 + +**(1) 使用场景:保留极端点的降采样** + +由于M4为每个窗口聚合其第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`),因此M4通常保留了极值点,比其他下采样方法(如分段聚合近似 (PAA))能更好地保留模式。如果你想对时间序列进行下采样并且希望保留极值点,你可以试试 M4。 + +**(2) 使用场景:基于M4降采样的大规模时间序列的零误差双色折线图可视化** + +参考论文["M4: A Visualization-Oriented Time Series Data Aggregation"](http://www.vldb.org/pvldb/vol7/p797-jugel.pdf),作为大规模时间序列可视化的降采样方法,M4可以做到双色折线图的零变形。 + +假设屏幕画布的像素宽乘高是`w*h`,假设时间序列要可视化的时间范围是`[tqs,tqe)`,并且(tqe-tqs)是w的整数倍,那么落在第i个时间跨度`Ii=[tqs+(tqe-tqs)/w*(i-1),tqs+(tqe-tqs)/w*i)` 内的点将会被画在第i个像素列中,i=1,2,...,w。于是从可视化驱动的角度出发,使用查询语句:`"select M4(s1,'timeInterval'='(tqe-tqs)/w','displayWindowBegin'='tqs','displayWindowEnd'='tqe') from root.vehicle.d1"`,来采集每个时间跨度内的第一个点(`first`)、最后一个点(`last`)、最小值点(`bottom`)、最大值点(`top`)。降采样时间序列的结果点数不会超过`4*w`个,与此同时,使用这些聚合点画出来的双色折线图与使用原始数据画出来的在像素级别上是完全一致的。 + +为了免除参数值硬编码的麻烦,当Grafana用于可视化时,我们推荐使用Grafana的[模板变量](https://grafana.com/docs/grafana/latest/dashboards/variables/add-template-variables/#global-variables)`$__interval_ms`,如下所示: + +```sql +select M4(s1,'timeInterval'='$__interval_ms') from root.sg1.d1 +``` + +其中`timeInterval`自动设置为`(tqe-tqs)/w`。请注意,这里的时间精度假定为毫秒。 + +### 和其它函数的功能比较 + +| SQL | 是否支持M4聚合 | 滑动窗口类型 | 示例 | 相关文档 | +| --------------------------------------------------- | ------------------------------------------------------------ | --------------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| 1. 带有Group By子句的内置聚合函数 | 不支持,缺少`BOTTOM_TIME`和`TOP_TIME`,即缺少最小值点和最大值点的时间戳。 | Time Window | `select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d)` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#built-in-aggregate-functions
https://iotdb.apache.org/UserGuide/Master/Query-Data/Aggregate-Query.html#downsampling-aggregate-query | +| 2. EQUAL_SIZE_BUCKET_M4_SAMPLE (内置UDF) | 支持* | Size Window. `windowSize = 4*(int)(1/proportion)` | `select equal_size_bucket_m4_sample(temperature, 'proportion'='0.1') as M4_sample from root.ln.wf01.wt01` | https://iotdb.apache.org/UserGuide/Master/Query-Data/Select-Expression.html#time-series-generating-functions | +| **3. M4 (内置UDF)** | 支持* | Size Window, Time Window | (1) Size Window: `select M4(s1,'windowSize'='10') from root.vehicle.d1`
(2) Time Window: `select M4(s1,'timeInterval'='25','displayWindowBegin'='0','displayWindowEnd'='100') from root.vehicle.d1` | 本文档 | +| 4. 扩展带有Group By子句的内置聚合函数来支持M4聚合 | 未实施 | 未实施 | 未实施 | 未实施 | + +进一步比较`EQUAL_SIZE_BUCKET_M4_SAMPLE`和`M4`: + +**(1) 不同的M4聚合函数定义:** + +在每个窗口内,`EQUAL_SIZE_BUCKET_M4_SAMPLE`从排除了第一个点和最后一个点之后剩余的点中提取最小值点和最大值点。 + +而`M4`则是从窗口内所有点中(包括第一个点和最后一个点)提取最小值点和最大值点,这个定义与元数据中保存的`max_value`和`min_value`的语义更加一致。 + +值得注意的是,在一个窗口内的聚合点输出之前,`EQUAL_SIZE_BUCKET_M4_SAMPLE`和`M4`都会将它们按照时间戳递增排序并且去重。 + +**(2) 不同的滑动窗口:** + +`EQUAL_SIZE_BUCKET_M4_SAMPLE`使用SlidingSizeWindowAccessStrategy,并且通过采样比例(`proportion`)来间接控制窗口点数(`windowSize`),转换公式是`windowSize = 4*(int)(1/proportion)`。 + +`M4`支持两种滑动窗口:SlidingSizeWindowAccessStrategy和SlidingTimeWindowAccessStrategy,并且`M4`通过相应的参数直接控制窗口的点数或者时长。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Selection.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Selection.md new file mode 100644 index 00000000..b818b3c9 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Selection.md @@ -0,0 +1,51 @@ + + +# 选择函数 + +目前 IoTDB 支持如下选择函数: + +| 函数名 | 输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能描述 | +| -------- | ------------------------------------- | ------------------------------------------------- | ------------------------ | ------------------------------------------------------------ | +| TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最大的`k`个数据点。若多于`k`个数据点的值并列最大,则返回时间戳最小的数据点。 | +| BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最小的`k`个数据点。若多于`k`个数据点的值并列最小,则返回时间戳最小的数据点。 | + +例如: + +``` sql +select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; +``` + +结果: + +``` ++-----------------------------+--------------------+------------------------------+---------------------------------+ +| Time| root.sg1.d2.s1|top_k(root.sg1.d2.s1, "k"="2")|bottom_k(root.sg1.d2.s1, "k"="2")| ++-----------------------------+--------------------+------------------------------+---------------------------------+ +|2020-12-10T20:36:15.531+08:00| 1531604122307244742| 1531604122307244742| null| +|2020-12-10T20:36:15.532+08:00|-7426070874923281101| null| null| +|2020-12-10T20:36:15.533+08:00|-7162825364312197604| -7162825364312197604| null| +|2020-12-10T20:36:15.534+08:00|-8581625725655917595| null| -8581625725655917595| +|2020-12-10T20:36:15.535+08:00|-7667364751255535391| null| -7667364751255535391| ++-----------------------------+--------------------+------------------------------+---------------------------------+ +Total line number = 5 +It costs 0.006s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Series-Discovery.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Series-Discovery.md new file mode 100644 index 00000000..0a1d1839 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Series-Discovery.md @@ -0,0 +1,173 @@ + + +# 序列发现 + +## ConsecutiveSequences + +### 函数简介 + +本函数用于在多维严格等间隔数据中发现局部最长连续子序列。 + +严格等间隔数据是指数据的时间间隔是严格相等的,允许存在数据缺失(包括行缺失和值缺失),但不允许存在数据冗余和时间戳偏移。 + +连续子序列是指严格按照标准时间间隔等距排布,不存在任何数据缺失的子序列。如果某个连续子序列不是任何连续子序列的真子序列,那么它是局部最长的。 + + +**函数名:** CONSECUTIVESEQUENCES + +**输入序列:** 支持多个输入序列,类型可以是任意的,但要满足严格等间隔的要求。 + +**参数:** + ++ `gap`:标准时间间隔,是一个有单位的正数。目前支持五种单位,分别是'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,函数会利用众数估计标准时间间隔。 + +**输出序列:** 输出单个序列,类型为 
INT32。输出序列中的每一个数据点对应一个局部最长连续子序列,时间戳为子序列的起始时刻,值为子序列包含的数据点个数。 + +**提示:** 对于不符合要求的输入,本函数不对输出做任何保证。 + +### 使用示例 + +#### 手动指定标准时间间隔 + +本函数可以通过`gap`参数手动指定标准时间间隔。需要注意的是,错误的参数设置会导致输出产生严重错误。 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + +#### 自动估计标准时间间隔 + +当`gap`参数缺省时,本函数可以利用众数估计标准时间间隔,得到同样的结果。因此,这种用法更受推荐。 + +输入序列同上,用于查询的SQL语句如下: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +## ConsecutiveWindows + +### 函数简介 + +本函数用于在多维严格等间隔数据中发现指定长度的连续窗口。 + +严格等间隔数据是指数据的时间间隔是严格相等的,允许存在数据缺失(包括行缺失和值缺失),但不允许存在数据冗余和时间戳偏移。 + +连续窗口是指严格按照标准时间间隔等距排布,不存在任何数据缺失的子序列。 + + +**函数名:** CONSECUTIVEWINDOWS + +**输入序列:** 支持多个输入序列,类型可以是任意的,但要满足严格等间隔的要求。 + +**参数:** + ++ `gap`:标准时间间隔,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。在缺省情况下,函数会利用众数估计标准时间间隔。 ++ `length`:序列长度,是一个有单位的正数。目前支持五种单位,分别是 'ms'(毫秒)、's'(秒)、'm'(分钟)、'h'(小时)和'd'(天)。该参数不允许缺省。 + +**输出序列:** 输出单个序列,类型为 INT32。输出序列中的每一个数据点对应一个指定长度连续子序列,时间戳为子序列的起始时刻,值为子序列包含的数据点个数。 + +**提示:** 对于不符合要求的输入,本函数不对输出做任何保证。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +用于查询的SQL语句: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +输出序列: + +``` 
++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/String.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/String.md new file mode 100644 index 00000000..26853b9b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/String.md @@ -0,0 +1,904 @@ + + +# 字符串处理 + +## STRING_CONTAINS + +### 函数简介 + +本函数判断字符串中是否存在子串 `s` + +**函数名:** STRING_CONTAINS + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** ++ `s`: 待搜寻的字符串。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN。 + +### 使用示例 + +``` sql +select s1, string_contains(s1, 's'='warn') from root.sg1.d4; +``` + +结果: + +``` ++-----------------------------+--------------+-------------------------------------------+ +| Time|root.sg1.d4.s1|string_contains(root.sg1.d4.s1, "s"="warn")| ++-----------------------------+--------------+-------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| warn:-8721| true| +|1970-01-01T08:00:00.002+08:00| error:-37229| false| +|1970-01-01T08:00:00.003+08:00| warn:1731| true| ++-----------------------------+--------------+-------------------------------------------+ +Total line number = 3 +It costs 0.007s +``` + +## STRING_MATCHES + +### 函数简介 + +本函数判断字符串是否能够被正则表达式`regex`匹配。 + +**函数名:** STRING_MATCHES + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** ++ `regex`: Java 标准库风格的正则表达式。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN。 + +### 使用示例 + +``` sql +select s1, string_matches(s1, 'regex'='[^\\s]+37229') from root.sg1.d4; +``` + +结果: + +``` ++-----------------------------+--------------+------------------------------------------------------+ +| Time|root.sg1.d4.s1|string_matches(root.sg1.d4.s1, "regex"="[^\\s]+37229")| ++-----------------------------+--------------+------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| warn:-8721| false| +|1970-01-01T08:00:00.002+08:00| error:-37229| true| +|1970-01-01T08:00:00.003+08:00| warn:1731| false| ++-----------------------------+--------------+------------------------------------------------------+ +Total line number = 3 +It costs 0.007s +``` + +## Length + +### 函数简介 + +本函数用于获取输入序列的长度。 + +**函数名:** LENGTH + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**输出序列:** 输出单个序列,类型为 INT32。 + +**提示:** 如果输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, length(s1) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+----------------------+ +| Time|root.sg1.d1.s1|length(root.sg1.d1.s1)| ++-----------------------------+--------------+----------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 6| +|1970-01-01T08:00:00.002+08:00| 22test22| 8| ++-----------------------------+--------------+----------------------+ +``` + +## Locate + +### 函数简介 + +本函数用于获取`target`子串第一次出现在输入序列的位置,如果输入序列中不包含`target`则返回 -1 。 + +**函数名:** LOCATE + 
+**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `target`: 需要被定位的子串。 ++ `reverse`: 指定是否需要倒序定位,默认值为`false`, 即从左至右定位。 + +**输出序列:** 输出单个序列,类型为INT32。 + +**提示:** 下标从 0 开始。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, locate(s1, "target"="1") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+------------------------------------+ +| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 0| +|1970-01-01T08:00:00.002+08:00| 22test22| -1| ++-----------------------------+--------------+------------------------------------+ +``` + +另一个用于查询的 SQL 语句: + +```sql +select s1, locate(s1, "target"="1", "reverse"="true") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+------------------------------------------------------+ +| Time|root.sg1.d1.s1|locate(root.sg1.d1.s1, "target"="1", "reverse"="true")| ++-----------------------------+--------------+------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 5| +|1970-01-01T08:00:00.002+08:00| 22test22| -1| ++-----------------------------+--------------+------------------------------------------------------+ +``` + +## StartsWith + +### 函数简介 + +本函数用于判断输入序列是否有指定前缀。 + +**函数名:** STARTSWITH + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** ++ `target`: 需要匹配的前缀。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN。 + +**提示:** 如果输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, startswith(s1, "target"="1") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+----------------------------------------+ +| Time|root.sg1.d1.s1|startswith(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| true| +|1970-01-01T08:00:00.002+08:00| 22test22| false| ++-----------------------------+--------------+----------------------------------------+ +``` + +## EndsWith + +### 函数简介 + +本函数用于判断输入序列是否有指定后缀。 + +**函数名:** ENDSWITH + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** ++ `target`: 需要匹配的后缀。 + +**输出序列:** 输出单个序列,类型为 BOOLEAN。 + +**提示:** 如果输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, endswith(s1, "target"="1") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|endswith(root.sg1.d1.s1, "target"="1")| ++-----------------------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| true| +|1970-01-01T08:00:00.002+08:00| 22test22| false| 
++-----------------------------+--------------+--------------------------------------+ +``` + +## Concat + +### 函数简介 + +本函数用于拼接输入序列和`target`字串。 + +**函数名:** CONCAT + +**输入序列:** 至少一个输入序列,类型为 TEXT。 + +**参数:** ++ `targets`: 一系列 K-V, key需要以`target`为前缀且不重复, value是待拼接的字符串。 ++ `series_behind`: 指定拼接时时间序列是否在后面,默认为`false`。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** ++ 如果输入序列是NULL, 跳过该序列的拼接。 ++ 函数只能将输入序列和`targets`区分开各自拼接。`concat(s1, "target1"="IoT", s2, "target2"="DB")`和 + `concat(s1, s2, "target1"="IoT", "target2"="DB")`得到的结果是一样的。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| ++-----------------------------+--------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB")| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| 1test1IoTDB| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 22test222222testIoTDB| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------+ +``` + +另一个用于查询的 SQL 语句: + +```sql +select s1, s2, concat(s1, s2, "target1"="IoT", "target2"="DB", "series_behind"="true") from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|concat(root.sg1.d1.s1, root.sg1.d1.s2, "target1"="IoT", "target2"="DB", "series_behind"="true")| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| IoTDB1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| IoTDB22test222222test| ++-----------------------------+--------------+--------------+-----------------------------------------------------------------------------------------------+ +``` + +## Substring + +### 函数简介 +提取字符串的子字符串,从指定的第一个字符开始,并在指定的字符数之后停止。下标从1开始。from 和 for的范围是 INT32 类型取值范围。 + +**函数名:** SUBSTRING + + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**参数:** ++ `from`: 指定子串开始下标。 ++ `for`: 指定多少个字符数后停止。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, substring(s1 from 1 for 2) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|SUBSTRING(root.sg1.d1.s1 FROM 1 FOR 2)| ++-----------------------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1t| +|1970-01-01T08:00:00.002+08:00| 
22test22| 22| ++-----------------------------+--------------+--------------------------------------+ +``` + +## Replace + +### 函数简介 +将输入序列中的子串替换成目标子串。 + +**函数名:** REPLACE + + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**参数:** ++ 第一个参数: 需要替换的目标子串。 ++ 第二个参数: 要替换成的子串。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, replace(s1, 'es', 'tt') from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+-----------------------------------+ +| Time|root.sg1.d1.s1|REPLACE(root.sg1.d1.s1, 'es', 'tt')| ++-----------------------------+--------------+-----------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1tttt1| +|1970-01-01T08:00:00.002+08:00| 22test22| 22tttt22| ++-----------------------------+--------------+-----------------------------------+ +``` + +## Upper + +### 函数简介 + +本函数用于将输入序列转化为大写。 + +**函数名:** UPPER + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| +|1970-01-01T08:00:00.002+08:00| 22test22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, upper(s1) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+---------------------+ +| Time|root.sg1.d1.s1|upper(root.sg1.d1.s1)| ++-----------------------------+--------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| 1TEST1| +|1970-01-01T08:00:00.002+08:00| 22test22| 22TEST22| ++-----------------------------+--------------+---------------------+ +``` + +## Lower + +### 函数简介 + +本函数用于将输入序列转换为小写。 + +**函数名:** LOWER + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s1| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1TEST1| +|1970-01-01T08:00:00.002+08:00| 22TEST22| ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, lower(s1) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+---------------------+ +| Time|root.sg1.d1.s1|lower(root.sg1.d1.s1)| ++-----------------------------+--------------+---------------------+ +|1970-01-01T08:00:00.001+08:00| 1TEST1| 1test1| +|1970-01-01T08:00:00.002+08:00| 22TEST22| 22test22| ++-----------------------------+--------------+---------------------+ +``` + +## Trim + +### 函数简介 + +本函数用于移除输入序列前后的空格。 + +**函数名:** TRIM + +**输入序列:** 仅支持单个输入序列,类型为TEXT。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+ +| Time|root.sg1.d1.s3| ++-----------------------------+--------------+ +|1970-01-01T08:00:00.002+08:00| 3querytest3| +|1970-01-01T08:00:00.003+08:00| 3querytest3 | ++-----------------------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s3, trim(s3) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------------+ +| Time|root.sg1.d1.s3|trim(root.sg1.d1.s3)| 
++-----------------------------+--------------+--------------------+ +|1970-01-01T08:00:00.002+08:00| 3querytest3| 3querytest3| +|1970-01-01T08:00:00.003+08:00| 3querytest3 | 3querytest3| ++-----------------------------+--------------+--------------------+ +``` + +## StrCmp + +### 函数简介 + +本函数用于比较两个输入序列。 如果值相同返回 `0` , 序列1的值小于序列2的值返回一个`负数`,序列1的值大于序列2的值返回一个`正数`。 + +**函数名:** StrCmp + +**输入序列:** 输入两个序列,类型均为 TEXT。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 如果任何一个输入是NULL,返回NULL。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| ++-----------------------------+--------------+--------------+ +``` + +用于查询的 SQL 语句: + +```sql +select s1, s2, strcmp(s1, s2) from root.sg1.d1 +``` + +输出序列: + +``` ++-----------------------------+--------------+--------------+--------------------------------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2|strcmp(root.sg1.d1.s1, root.sg1.d1.s2)| ++-----------------------------+--------------+--------------+--------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1test1| null| null| +|1970-01-01T08:00:00.002+08:00| 22test22| 2222test| 66| ++-----------------------------+--------------+--------------+--------------------------------------+ +``` + +## StrReplace + +### 函数简介 + +**非内置函数,需要注册数据质量函数库后才能使用**。本函数用于将文本中的子串替换为指定的字符串。 + +**函数名:** STRREPLACE + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `target`: 需要替换的字符子串 ++ `replace`: 替换后的字符串。 ++ `limit`: 替换次数,大于等于 -1 的整数,默认为 -1 表示所有匹配的子串都会被替换。 ++ `offset`: 需要跳过的匹配次数,即前`offset`次匹配到的字符子串并不会被替换,默认为 0。 ++ `reverse`: 是否需要反向计数,默认为 false 即按照从左向右的次序。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| +|2021-01-01T00:00:03.000+08:00| B+,B,B| +|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-,B,B| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------+ +| Time|strreplace(root.test.d1.s1, "target"=",",| +| | "replace"="/", "limit"="2")| ++-----------------------------+-----------------------------------------+ +|2021-01-01T00:00:01.000+08:00| A/B/A+,B-| +|2021-01-01T00:00:02.000+08:00| A/A+/A,B+| +|2021-01-01T00:00:03.000+08:00| B+/B/B| +|2021-01-01T00:00:04.000+08:00| A+/A/A+,A| +|2021-01-01T00:00:05.000+08:00| A/B-/B,B| ++-----------------------------+-----------------------------------------+ +``` + +另一个用于查询的 SQL 语句: + +```sql +select strreplace(s1, "target"=",", "replace"="/", "limit"="1", "offset"="1", "reverse"="true") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|strreplace(root.test.d1.s1, "target"=",", "replace"= | +| | "|", "limit"="1", "offset"="1", "reverse"="true")| ++-----------------------------+-----------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| A,B/A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+/A,B+| +|2021-01-01T00:00:03.000+08:00| B+/B,B| +|2021-01-01T00:00:04.000+08:00| A+,A/A+,A| 
+|2021-01-01T00:00:05.000+08:00| A,B-/B,B| ++-----------------------------+-----------------------------------------------------+ +``` + +## RegexMatch + +### 函数简介 + +**非内置函数,需要注册数据质量函数库后才能使用**。本函数用于正则表达式匹配文本中的具体内容并返回。 + +**函数名:** REGEXMATCH + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `regex`: 匹配的正则表达式,支持所有 Java 正则表达式语法,比如`\d+\.\d+\.\d+\.\d+`将会匹配任意 IPv4 地址. ++ `group`: 输出的匹配组序号,根据 java.util.regex 规定,第 0 组为整个正则表达式,此后的组按照左括号出现的顺序依次编号。 + 如`A(B(CD))`中共有三个组,第 0 组`A(B(CD))`,第 1 组`B(CD)`和第 2 组`CD`。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +**提示:** 空值或无法匹配给定的正则表达式的数据点没有输出结果。 + +### 使用示例 + + +输入序列: + +``` ++-----------------------------+-------------------------------+ +| Time| root.test.d1.s1| ++-----------------------------+-------------------------------+ +|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| ++-----------------------------+-------------------------------+ +``` + +用于查询的 SQL 语句: + +```sql +select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+----------------------------------------------------------------------+ +| Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")| ++-----------------------------+----------------------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| 192.168.0.1| +|2021-01-01T00:00:02.000+08:00| 192.168.0.24| +|2021-01-01T00:00:03.000+08:00| 192.168.0.2| +|2021-01-01T00:00:04.000+08:00| 192.168.0.5| +|2021-01-01T00:00:05.000+08:00| 192.168.0.124| ++-----------------------------+----------------------------------------------------------------------+ +``` + +## RegexReplace + +### 函数简介 + +**非内置函数,需要注册数据质量函数库后才能使用**。本函数用于将文本中符合正则表达式的匹配结果替换为指定的字符串。 + +**函数名:** REGEXREPLACE + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `regex`: 需要替换的正则表达式,支持所有 Java 正则表达式语法。 ++ `replace`: 替换后的字符串,支持 Java 正则表达式中的后向引用, + 形如'$1'指代了正则表达式`regex`中的第一个分组,并会在替换时自动填充匹配到的子串。 ++ `limit`: 替换次数,大于等于 -1 的整数,默认为 -1 表示所有匹配的子串都会被替换。 ++ `offset`: 需要跳过的匹配次数,即前`offset`次匹配到的字符子串并不会被替换,默认为 0。 ++ `reverse`: 是否需要反向计数,默认为 false 即按照从左向右的次序。 + +**输出序列:** 输出单个序列,类型为 TEXT。 + +### 使用示例 + +输入序列: + +``` ++-----------------------------+-------------------------------+ +| Time| root.test.d1.s1| ++-----------------------------+-------------------------------+ +|2021-01-01T00:00:01.000+08:00| [192.168.0.1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [192.168.0.24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [192.168.0.2] [FAIL]| +|2021-01-01T00:00:04.000+08:00| [192.168.0.5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [192.168.0.124] [SUCCESS]| ++-----------------------------+-------------------------------+ +``` + +用于查询的 SQL 语句: + +```sql +select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------------+ +| Time|regexreplace(root.test.d1.s1, "regex"="192\.168\.0\.(\d+)",| +| | "replace"="cluster-$1", "limit"="1")| ++-----------------------------+-----------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| [cluster-1] [SUCCESS]| +|2021-01-01T00:00:02.000+08:00| [cluster-24] [SUCCESS]| +|2021-01-01T00:00:03.000+08:00| [cluster-2] [FAIL]| 
+|2021-01-01T00:00:04.000+08:00| [cluster-5] [SUCCESS]| +|2021-01-01T00:00:05.000+08:00| [cluster-124] [SUCCESS]| ++-----------------------------+-----------------------------------------------------------+ +``` + +## RegexSplit + +### 函数简介 + +**非内置函数,需要注册数据质量函数库后才能使用**。本函数用于使用给定的正则表达式切分文本,并返回指定的项。 + +**函数名:** REGEXSPLIT + +**输入序列:** 仅支持单个输入序列,类型为 TEXT。 + +**参数:** + ++ `regex`: 用于分割文本的正则表达式,支持所有 Java 正则表达式语法,比如`['"]`将会匹配任意的英文引号`'`和`"`。 ++ `index`: 输出结果在切分后数组中的序号,需要是大于等于 -1 的整数,默认值为 -1 表示返回切分后数组的长度,其它非负整数即表示返回数组中对应位置的切分结果(数组的秩从 0 开始计数)。 + +**输出序列:** 输出单个序列,在`index`为 -1 时输出数据类型为 INT32,否则为 TEXT。 + +**提示:** 如果`index`超出了切分后结果数组的秩范围,例如使用`,`切分`0,1,2`时输入`index`为 3,则该数据点没有输出结果。 + +### 使用示例 + + +输入序列: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-01-01T00:00:01.000+08:00| A,B,A+,B-| +|2021-01-01T00:00:02.000+08:00| A,A+,A,B+| +|2021-01-01T00:00:03.000+08:00| B+,B,B| +|2021-01-01T00:00:04.000+08:00| A+,A,A+,A| +|2021-01-01T00:00:05.000+08:00| A,B-,B,B| ++-----------------------------+---------------+ +``` + +用于查询的 SQL 语句: + +```sql +select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| 4| +|2021-01-01T00:00:02.000+08:00| 4| +|2021-01-01T00:00:03.000+08:00| 3| +|2021-01-01T00:00:04.000+08:00| 4| +|2021-01-01T00:00:05.000+08:00| 4| ++-----------------------------+------------------------------------------------------+ +``` + +另一个查询的 SQL 语句: + +```sql +select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 +``` + +输出序列: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="3")| ++-----------------------------+-----------------------------------------------------+ +|2021-01-01T00:00:01.000+08:00| B-| +|2021-01-01T00:00:02.000+08:00| B+| +|2021-01-01T00:00:04.000+08:00| A| +|2021-01-01T00:00:05.000+08:00| B| ++-----------------------------+-----------------------------------------------------+ +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Time-Series.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Time-Series.md new file mode 100644 index 00000000..777fe861 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Time-Series.md @@ -0,0 +1,69 @@ + + +# 时间序列处理 + +## CHANGE_POINTS + +### 函数简介 + +本函数用于去除输入序列中的连续相同值。如输入序列`1,1,2,2,3`输出序列为`1,2,3`。 + +**函数名:** CHANGE_POINTS + +**输入序列:** 仅支持输入1个序列。 + +**参数:** 无 + +### 使用示例 + +原始数据: + +``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|root.testChangePoints.d1.s1|root.testChangePoints.d1.s2|root.testChangePoints.d1.s3|root.testChangePoints.d1.s4|root.testChangePoints.d1.s5|root.testChangePoints.d1.s6| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.002+08:00| true| 2| 2| 2.0| 1.0| 2test2| 
+|1970-01-01T08:00:00.003+08:00| false| 1| 2| 1.0| 1.0| 2test2| +|1970-01-01T08:00:00.004+08:00| true| 1| 3| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.005+08:00| true| 1| 3| 1.0| 1.0| 1test1| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +``` + +用于查询的SQL语句: + +```sql +select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 +``` + +输出序列: + +``` ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +| Time|change_points(root.testChangePoints.d1.s1)|change_points(root.testChangePoints.d1.s2)|change_points(root.testChangePoints.d1.s3)|change_points(root.testChangePoints.d1.s4)|change_points(root.testChangePoints.d1.s5)|change_points(root.testChangePoints.d1.s6)| ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| true| 1| 1| 1.0| 1.0| 1test1| +|1970-01-01T08:00:00.002+08:00| null| 2| 2| 2.0| null| 2test2| +|1970-01-01T08:00:00.003+08:00| false| 1| null| 1.0| null| null| +|1970-01-01T08:00:00.004+08:00| true| null| 3| null| null| 1test1| ++-----------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/User-Defined-Function.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/User-Defined-Function.md new file mode 100644 index 00000000..3618f7ea --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/User-Defined-Function.md @@ -0,0 +1,592 @@ + + +# 用户自定义函数 + +UDF(User Defined Function)即用户自定义函数。IoTDB 提供多种内建函数来满足您的计算需求,同时您还可以通过创建自定义函数来满足更多的计算需求。 + +根据此文档,您将会很快学会 UDF 的编写、注册、使用等操作。 + +## UDF 类型 + +IoTDB 支持两种类型的 UDF 函数,如下表所示。 + +| UDF 分类 | 描述 | +| --------------------------------------------------- | ------------------------------------------------------------ | +| UDTF(User Defined Timeseries Generating Function) | 自定义时间序列生成函数。该类函数允许接收多条时间序列,最终会输出一条时间序列,生成的时间序列可以有任意多数量的数据点。 | +| UDAF(User Defined Aggregation Function) | 正在开发,敬请期待。 | + +## UDF 依赖 + +如果您使用 [Maven](http://search.maven.org/) ,可以从 [Maven 库](http://search.maven.org/) 中搜索下面示例中的依赖。请注意选择和目标 IoTDB 服务器版本相同的依赖版本。 + +``` xml + + org.apache.iotdb + udf-api + 1.0.0 + provided + +``` + +## UDTF(User Defined Timeseries Generating Function) + +编写一个 UDTF 需要继承`org.apache.iotdb.udf.api.UDTF`类,并至少实现`beforeStart`方法和一种`transform`方法。 + +下表是所有可供用户实现的接口说明。 + +| 接口定义 | 描述 | 是否必须 | +| :----------------------------------------------------------- | :----------------------------------------------------------- | ------------------ | +| `void validate(UDFParameterValidator validator) throws Exception` | 在初始化方法`beforeStart`调用前执行,用于检测`UDFParameters`中用户输入的参数是否合法。 | 否 | +| `void beforeStart(UDFParameters 
parameters, UDTFConfigurations configurations) throws Exception` | 初始化方法,在 UDTF 处理输入数据前,调用用户自定义的初始化行为。用户每执行一次 UDTF 查询,框架就会构造一个新的 UDF 类实例,该方法在每个 UDF 类实例被初始化时调用一次。在每一个 UDF 类实例的生命周期内,该方法只会被调用一次。 | 是 | +| `void transform(Row row, PointCollector collector) throws Exception` | 这个方法由框架调用。当您在`beforeStart`中选择以`RowByRowAccessStrategy`的策略消费原始数据时,这个数据处理方法就会被调用。输入参数以`Row`的形式传入,输出结果通过`PointCollector`输出。您需要在该方法内自行调用`collector`提供的数据收集方法,以决定最终的输出数据。 | 与下面的方法二选一 | +| `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | 这个方法由框架调用。当您在`beforeStart`中选择以`SlidingSizeWindowAccessStrategy`或者`SlidingTimeWindowAccessStrategy`的策略消费原始数据时,这个数据处理方法就会被调用。输入参数以`RowWindow`的形式传入,输出结果通过`PointCollector`输出。您需要在该方法内自行调用`collector`提供的数据收集方法,以决定最终的输出数据。 | 与上面的方法二选一 | +| `void terminate(PointCollector collector) throws Exception` | 这个方法由框架调用。该方法会在所有的`transform`调用执行完成后,在`beforeDestory`方法执行前被调用。在一个 UDF 查询过程中,该方法会且只会调用一次。您需要在该方法内自行调用`collector`提供的数据收集方法,以决定最终的输出数据。 | 否 | +| `void beforeDestroy() ` | UDTF 的结束方法。此方法由框架调用,并且只会被调用一次,即在处理完最后一条记录之后被调用。 | 否 | + +在一个完整的 UDTF 实例生命周期中,各个方法的调用顺序如下: + +1. `void validate(UDFParameterValidator validator) throws Exception` +2. `void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception` +3. `void transform(Row row, PointCollector collector) throws Exception`或者`void transform(RowWindow rowWindow, PointCollector collector) throws Exception` +4. `void terminate(PointCollector collector) throws Exception` +5. `void beforeDestroy() ` + +注意,框架每执行一次 UDTF 查询,都会构造一个全新的 UDF 类实例,查询结束时,对应的 UDF 类实例即被销毁,因此不同 UDTF 查询(即使是在同一个 SQL 语句中)UDF 类实例内部的数据都是隔离的。您可以放心地在 UDTF 中维护一些状态数据,无需考虑并发对 UDF 类实例内部状态数据的影响。 + +下面将详细介绍各个接口的使用方法。 + + * void validate(UDFParameterValidator validator) throws Exception + + `validate`方法能够对用户输入的参数进行验证。 + + 您可以在该方法中限制输入序列的数量和类型,检查用户输入的属性或者进行自定义逻辑的验证。 + + `UDFParameterValidator`的使用方法请见 Javadoc。 + + * void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception + + `beforeStart`方法有两个作用: + + 1. 帮助用户解析 SQL 语句中的 UDF 参数 + 2. 配置 UDF 运行时必要的信息,即指定 UDF 访问原始数据时采取的策略和输出结果序列的类型 + 3. 创建资源,比如建立外部链接,打开文件等。 + +### UDFParameters + +`UDFParameters`的作用是解析 SQL 语句中的 UDF 参数(SQL 中 UDF 函数名称后括号中的部分)。参数包括序列类型参数和字符串 key-value 对形式输入的属性参数。 + +例子: + +``` sql +SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; +``` + +用法: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + String stringValue = parameters.getString("key1"); // iotdb + Float floatValue = parameters.getFloat("key2"); // 123.45 + Double doubleValue = parameters.getDouble("key3"); // null + int intValue = parameters.getIntOrDefault("key4", 678); // 678 + // do something + + // configurations + // ... +} +``` + +### UDTFConfigurations + +您必须使用 `UDTFConfigurations` 指定 UDF 访问原始数据时采取的策略和输出结果序列的类型。 + +用法: + +``` java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + // ... 
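+  // 示意:可在此处解析 SQL 中传入的属性参数(用法见上文 UDFParameters 小节),
+  // 并据此动态决定下方的访问策略与输出类型,例如:
+  // String key1 = parameters.getString("key1");
+  // int key4 = parameters.getIntOrDefault("key4", 678);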
+ + // configurations + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(Type.INT32); +} +``` + +其中`setAccessStrategy`方法用于设定 UDF 访问原始数据时采取的策略,`setOutputDataType`用于设定输出结果序列的类型。 + + * setAccessStrategy + +注意,您在此处设定的原始数据访问策略决定了框架会调用哪一种`transform`方法 ,请实现与原始数据访问策略对应的`transform`方法。当然,您也可以根据`UDFParameters`解析出来的属性参数,动态决定设定哪一种策略,因此,实现两种`transform`方法也是被允许的。 + +下面是您可以设定的访问原始数据的策略: + +| 接口定义 | 描述 | 调用的`transform`方法 | +| :-------------------------------- |:-----------------------------------------------------------------------------------------------------------------------------------------------------------| ------------------------------------------------------------ | +| `RowByRowAccessStrategy` | 逐行地处理原始数据。框架会为每一行原始数据输入调用一次`transform`方法。当 UDF 只有一个输入序列时,一行输入就是该输入序列中的一个数据点。当 UDF 有多个输入序列时,一行输入序列对应的是这些输入序列按时间对齐后的结果(一行数据中,可能存在某一列为`null`值,但不会全部都是`null`)。 | `void transform(Row row, PointCollector collector) throws Exception` | +| `SlidingTimeWindowAccessStrategy` | 以滑动时间窗口的方式处理原始数据。框架会为每一个原始数据输入窗口调用一次`transform`方法。一个窗口可能存在多行数据,每一行数据对应的是输入序列按时间对齐后的结果(一行数据中,可能存在某一列为`null`值,但不会全部都是`null`)。 | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | +| `SlidingSizeWindowAccessStrategy` | 以固定行数的方式处理原始数据,即每个数据处理窗口都会包含固定行数的数据(最后一个窗口除外)。框架会为每一个原始数据输入窗口调用一次`transform`方法。一个窗口可能存在多行数据,每一行数据对应的是输入序列按时间对齐后的结果(一行数据中,可能存在某一列为`null`值,但不会全部都是`null`)。 | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | +| `SessionTimeWindowAccessStrategy` | 以会话窗口的方式处理原始数据,框架会为每一个原始数据输入窗口调用一次`transform`方法。一个窗口可能存在多行数据,每一行数据对应的是输入序列按时间对齐后的结果(一行数据中,可能存在某一列为`null`值,但不会全部都是`null`)。 | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | +| `StateWindowAccessStrategy` | 以状态窗口的方式处理原始数据,框架会为每一个原始数据输入窗口调用一次`transform`方法。一个窗口可能存在多行数据。目前仅支持对一个物理量也就是一列数据进行开窗。 | `void transform(RowWindow rowWindow, PointCollector collector) throws Exception` | + +`RowByRowAccessStrategy`的构造不需要任何参数。 + +如图是`SlidingTimeWindowAccessStrategy`的开窗示意图。 + + +`SlidingTimeWindowAccessStrategy`有多种构造方法,您可以向构造方法提供 3 类参数: + +1. 时间轴显示时间窗开始和结束时间 +2. 划分时间轴的时间间隔参数(必须为正数) +3. 滑动步长(不要求大于等于时间间隔,但是必须为正数) + +时间轴显示时间窗开始和结束时间不是必须要提供的。当您不提供这类参数时,时间轴显示时间窗开始时间会被定义为整个查询结果集中最小的时间戳,时间轴显示时间窗结束时间会被定义为整个查询结果集中最大的时间戳。 + +滑动步长参数也不是必须的。当您不提供滑动步长参数时,滑动步长会被设定为划分时间轴的时间间隔。 + +3 类参数的关系可见下图。策略的构造方法详见 Javadoc。 + + + +注意,最后的一些时间窗口的实际时间间隔可能小于规定的时间间隔参数。另外,可能存在某些时间窗口内数据行数量为 0 的情况,这种情况框架也会为该窗口调用一次`transform`方法。 + +如图是`SlidingSizeWindowAccessStrategy`的开窗示意图。 + + +`SlidingSizeWindowAccessStrategy`有多种构造方法,您可以向构造方法提供 2 个参数: + +1. 窗口大小,即一个数据处理窗口包含的数据行数。注意,最后一些窗口的数据行数可能少于规定的数据行数。 +2. 滑动步长,即下一窗口第一个数据行与当前窗口第一个数据行间的数据行数(不要求大于等于窗口大小,但是必须为正数) + +滑动步长参数不是必须的。当您不提供滑动步长参数时,滑动步长会被设定为窗口大小。 + +如图是`SessionTimeWindowAccessStrategy`的开窗示意图。**时间间隔小于等于给定的最小时间间隔 sessionGap 则分为一组。** + + +`SessionTimeWindowAccessStrategy`有多种构造方法,您可以向构造方法提供 2 类参数: +1. 时间轴显示时间窗开始和结束时间。 +2. 会话窗口之间的最小时间间隔。 + + +如图是`StateWindowAccessStrategy`的开窗示意图。**对于数值型数据,状态差值小于等于给定的阈值 delta 则分为一组。** + + +`StateWindowAccessStrategy`有四种构造方法。 +1. 针对数值型数据,可以提供时间轴显示时间窗开始和结束时间以及对于单个窗口内部允许变化的阈值delta。 +2. 针对文本数据以及布尔数据,可以提供时间轴显示时间窗开始和结束时间。对于这两种数据类型,单个窗口内的数据是相同的,不需要提供变化阈值。 +3. 针对数值型数据,可以只提供单个窗口内部允许变化的阈值delta,时间轴显示时间窗开始时间会被定义为整个查询结果集中最小的时间戳,时间轴显示时间窗结束时间会被定义为整个查询结果集中最大的时间戳。 +4. 
针对文本数据以及布尔数据,可以不提供任何参数,开始与结束时间戳见3中解释。 + +StateWindowAccessStrategy 目前只能接收一列输入。策略的构造方法详见 Javadoc。 + + * setOutputDataType + +注意,您在此处设定的输出结果序列的类型,决定了`transform`方法中`PointCollector`实际能够接收的数据类型。`setOutputDataType`中设定的输出类型和`PointCollector`实际能够接收的数据输出类型关系如下: + +| `setOutputDataType`中设定的输出类型 | `PointCollector`实际能够接收的输出类型 | +| :---------------------------------- | :----------------------------------------------------------- | +| `INT32` | `int` | +| `INT64` | `long` | +| `FLOAT` | `float` | +| `DOUBLE` | `double` | +| `BOOLEAN` | `boolean` | +| `TEXT` | `java.lang.String` 和 `org.apache.iotdb.udf.api.type.Binary` | + +UDTF 输出序列的类型是运行时决定的。您可以根据输入序列类型动态决定输出序列类型。 + +下面是一个简单的例子: + +```java +void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(parameters.getDataType(0)); +} +``` + +* void transform(Row row, PointCollector collector) throws Exception + +当您在`beforeStart`方法中指定 UDF 读取原始数据的策略为 `RowByRowAccessStrategy`,您就需要实现该方法,在该方法中增加对原始数据处理的逻辑。 + +该方法每次处理原始数据的一行。原始数据由`Row`读入,由`PointCollector`输出。您可以选择在一次`transform`方法调用中输出任意数量的数据点。需要注意的是,输出数据点的类型必须与您在`beforeStart`方法中设置的一致,而输出数据点的时间戳必须是严格单调递增的。 + +下面是一个实现了`void transform(Row row, PointCollector collector) throws Exception`方法的完整 UDF 示例。它是一个加法器,接收两列时间序列输入,当这两个数据点都不为`null`时,输出这两个数据点的代数和。 + +``` java +import org.apache.iotdb.udf.api.UDTF; +import org.apache.iotdb.udf.api.access.Row; +import org.apache.iotdb.udf.api.collector.PointCollector; +import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy; +import org.apache.iotdb.udf.api.type.Type; + +public class Adder implements UDTF { + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + .setOutputDataType(Type.INT64) + .setAccessStrategy(new RowByRowAccessStrategy()); + } + + @Override + public void transform(Row row, PointCollector collector) throws Exception { + if (row.isNull(0) || row.isNull(1)) { + return; + } + collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1)); + } +} +``` + + * void transform(RowWindow rowWindow, PointCollector collector) throws Exception + +当您在`beforeStart`方法中指定 UDF 读取原始数据的策略为 `SlidingTimeWindowAccessStrategy`或者`SlidingSizeWindowAccessStrategy`时,您就需要实现该方法,在该方法中增加对原始数据处理的逻辑。 + +该方法每次处理固定行数或者固定时间间隔内的一批数据,我们称包含这一批数据的容器为窗口。原始数据由`RowWindow`读入,由`PointCollector`输出。`RowWindow`能够帮助您访问某一批次的`Row`,它提供了对这一批次的`Row`进行随机访问和迭代访问的接口。您可以选择在一次`transform`方法调用中输出任意数量的数据点,需要注意的是,输出数据点的类型必须与您在`beforeStart`方法中设置的一致,而输出数据点的时间戳必须是严格单调递增的。 + +下面是一个实现了`void transform(RowWindow rowWindow, PointCollector collector) throws Exception`方法的完整 UDF 示例。它是一个计数器,接收任意列数的时间序列输入,作用是统计并输出指定时间范围内每一个时间窗口中的数据行数。 + +```java +import java.io.IOException; +import org.apache.iotdb.udf.api.UDTF; +import org.apache.iotdb.udf.api.access.RowWindow; +import org.apache.iotdb.udf.api.collector.PointCollector; +import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations; +import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters; +import org.apache.iotdb.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy; +import org.apache.iotdb.udf.api.type.Type; + +public class Counter implements UDTF { + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + 
.setOutputDataType(Type.INT32)
+        .setAccessStrategy(new SlidingTimeWindowAccessStrategy(
+            parameters.getLong("time_interval"),
+            parameters.getLong("sliding_step"),
+            parameters.getLong("display_window_begin"),
+            parameters.getLong("display_window_end")));
+  }
+
+  @Override
+  public void transform(RowWindow rowWindow, PointCollector collector) throws Exception {
+    if (rowWindow.windowSize() != 0) {
+      collector.putInt(rowWindow.windowStartTime(), rowWindow.windowSize());
+    }
+  }
+}
+```
+
+ * void terminate(PointCollector collector) throws Exception
+
+在一些场景下,UDF 需要遍历完所有的原始数据后才能得到最后的输出结果。`terminate`接口为这类 UDF 提供了支持。
+
+该方法会在所有的`transform`调用执行完成后,在`beforeDestroy`方法执行前被调用。您可以选择使用`transform`方法进行单纯的数据处理,最后使用`terminate`将处理结果输出。
+
+结果需要由`PointCollector`输出。您可以选择在一次`terminate`方法调用中输出任意数量的数据点。需要注意的是,输出数据点的类型必须与您在`beforeStart`方法中设置的一致,而输出数据点的时间戳必须是严格单调递增的。
+
+下面是一个实现了`void terminate(PointCollector collector) throws Exception`方法的完整 UDF 示例。它接收一个`INT32`类型的时间序列输入,作用是输出该序列的最大值点。
+
+```java
+import java.io.IOException;
+import org.apache.iotdb.udf.api.UDTF;
+import org.apache.iotdb.udf.api.access.Row;
+import org.apache.iotdb.udf.api.collector.PointCollector;
+import org.apache.iotdb.udf.api.customizer.config.UDTFConfigurations;
+import org.apache.iotdb.udf.api.customizer.parameter.UDFParameters;
+import org.apache.iotdb.udf.api.customizer.strategy.RowByRowAccessStrategy;
+import org.apache.iotdb.udf.api.type.Type;
+
+public class Max implements UDTF {
+
+  private Long time;
+  private int value;
+
+  @Override
+  public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) {
+    configurations
+        .setOutputDataType(Type.INT32)
+        .setAccessStrategy(new RowByRowAccessStrategy());
+  }
+
+  @Override
+  public void transform(Row row, PointCollector collector) {
+    if (row.isNull(0)) {
+      return;
+    }
+    int candidateValue = row.getInt(0);
+    if (time == null || value < candidateValue) {
+      time = row.getTime();
+      value = candidateValue;
+    }
+  }
+
+  @Override
+  public void terminate(PointCollector collector) throws IOException {
+    if (time != null) {
+      collector.putInt(time, value);
+    }
+  }
+}
+```
+
+ * void beforeDestroy()
+
+UDTF 的结束方法,您可以在此方法中进行一些资源释放等的操作。
+
+此方法由框架调用。对于一个 UDF 类实例而言,生命周期中会且只会被调用一次,即在处理完最后一条记录之后被调用。
+
+## 完整 Maven 项目示例
+
+如果您使用 [Maven](http://search.maven.org/),可以参考我们编写的示例项目**udf-example**。您可以在 [这里](https://github.com/apache/iotdb/tree/master/example/udf) 找到它。
+
+## UDF 注册
+
+注册一个 UDF 可以按如下流程进行:
+
+1. 实现一个完整的 UDF 类,假定这个类的全类名为`org.apache.iotdb.udf.UDTFExample`
+2. 将项目打成 JAR 包,如果您使用 Maven 管理项目,可以参考上述 Maven 项目示例的写法
+3. 进行注册前的准备工作,根据注册方式的不同需要做不同的准备,具体可参考以下例子
+4. 使用以下 SQL 语句注册 UDF
+```sql
+CREATE FUNCTION <UDF-NAME> AS <CLASS-NAME> (USING URI <URI-STRING>)?
+``` + +### 示例:注册名为`example`的 UDF,以下两种注册方式任选其一即可 + +#### 不指定URI + +准备工作: +使用该种方式注册时,您需要提前将 JAR 包放置到目录 `iotdb-server-1.0.0-all-bin/ext/udf`(该目录可配置) 下。 +**注意,如果您使用的是集群,那么需要将 JAR 包放置到所有 DataNode 的该目录下** + +注册语句: +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' +``` + +#### 指定URI + +准备工作: +使用该种方式注册时,您需要提前将 JAR 包上传到 URI 服务器上并确保执行注册语句的 IoTDB 实例能够访问该 URI 服务器。 +**注意,您无需手动放置 JAR 包,IoTDB 会下载 JAR 包并正确同步到整个集群** + +注册语句: +```sql +CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' USING URI 'http://jar/example.jar' +``` + +### 注意 +由于 IoTDB 的 UDF 是通过反射技术动态装载的,因此您在装载过程中无需启停服务器。 + +UDF 函数名称是大小写不敏感的。 + +请不要给 UDF 函数注册一个内置函数的名字。使用内置函数的名字给 UDF 注册会失败。 + +不同的 JAR 包中最好不要有全类名相同但实现功能逻辑不一样的类。例如 UDF(UDAF/UDTF):`udf1`、`udf2`分别对应资源`udf1.jar`、`udf2.jar`。如果两个 JAR 包里都包含一个`org.apache.iotdb.udf.UDTFExample`类,当同一个 SQL 中同时使用到这两个 UDF 时,系统会随机加载其中一个类,导致 UDF 执行行为不一致。 + +## UDF 卸载 + +卸载 UDF 的 SQL 语法如下: + +```sql +DROP FUNCTION +``` + +可以通过如下 SQL 语句卸载上面例子中的 UDF: + +```sql +DROP FUNCTION example +``` + +## UDF 查询 + +UDF 的使用方法与普通内建函数的类似。 + +### 支持的基础 SQL 语法 + +* `SLIMIT` / `SOFFSET` +* `LIMIT` / `OFFSET` +* 支持值过滤 +* 支持时间过滤 + + +### 带 * 查询 + +假定现在有时间序列 `root.sg.d1.s1`和 `root.sg.d1.s2`。 + +* **执行`SELECT example(*) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1)`和`example(root.sg.d1.s2)`的结果。 + +* **执行`SELECT example(s1, *) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1, root.sg.d1.s1)`和`example(root.sg.d1.s1, root.sg.d1.s2)`的结果。 + +* **执行`SELECT example(*, *) from root.sg.d1`** + +那么结果集中将包括`example(root.sg.d1.s1, root.sg.d1.s1)`,`example(root.sg.d1.s2, root.sg.d1.s1)`,`example(root.sg.d1.s1, root.sg.d1.s2)` 和 `example(root.sg.d1.s2, root.sg.d1.s2)`的结果。 + +### 带自定义输入参数的查询 + +您可以在进行 UDF 查询的时候,向 UDF 传入任意数量的键值对参数。键值对中的键和值都需要被单引号或者双引号引起来。注意,键值对参数只能在所有时间序列后传入。下面是一组例子: + +``` sql +SELECT example(s1, 'key1'='value1', 'key2'='value2'), example(*, 'key3'='value3') FROM root.sg.d1; +SELECT example(s1, s2, 'key1'='value1', 'key2'='value2') FROM root.sg.d1; +``` + +### 与其他查询的嵌套查询 + +``` sql +SELECT s1, s2, example(s1, s2) FROM root.sg.d1; +SELECT *, example(*) FROM root.sg.d1 DISABLE ALIGN; +SELECT s1 * example(* / s1 + s2) FROM root.sg.d1; +SELECT s1, s2, s1 + example(s1, s2), s1 - example(s1 + example(s1, s2) / s2) FROM root.sg.d1; +``` + +## 查看所有注册的 UDF + +``` sql +SHOW FUNCTIONS +``` + +## 用户权限管理 + +用户在使用 UDF 时会涉及到 3 种权限: + +* `CREATE_FUNCTION`:具备该权限的用户才被允许执行 UDF 注册操作 +* `DROP_FUNCTION`:具备该权限的用户才被允许执行 UDF 卸载操作 +* `READ_TIMESERIES`:具备该权限的用户才被允许使用 UDF 进行查询 + +更多用户权限相关的内容,请参考 [权限管理语句](../Administration-Management/Administration.md)。 + +## 配置项 + +使用配置项 `udf_lib_dir` 来配置 udf 的存储目录. +在 SQL 语句中使用自定义函数时,可能提示内存不足。这种情况下,您可以通过更改配置文件`iotdb-system.properties`中的`udf_initial_byte_array_length_for_memory_control`,`udf_memory_budget_in_mb`和`udf_reader_transformer_collector_memory_proportion`并重启服务来解决此问题。 + +## 贡献 UDF + + + +该部分主要讲述了外部用户如何将自己编写的 UDF 贡献给 IoTDB 社区。 + +### 前提条件 + +1. UDF 具有通用性。 + + 通用性主要指的是:UDF 在某些业务场景下,可以被广泛使用。换言之,就是 UDF 具有复用价值,可被社区内其他用户直接使用。 + + 如果您不确定自己写的 UDF 是否具有通用性,可以发邮件到 `dev@iotdb.apache.org` 或直接创建 ISSUE 发起讨论。 + +2. UDF 已经完成测试,且能够正常运行在用户的生产环境中。 + +### 贡献清单 + +1. UDF 的源代码 +2. UDF 的测试用例 +3. UDF 的使用说明 + +#### 源代码 + +1. 在`iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin`中创建 UDF 主类和相关的辅助类。 +2. 
在`iotdb-core/node-commons/src/main/java/org/apache/iotdb/commons/udf/builtin/BuiltinTimeSeriesGeneratingFunction.java`中注册您编写的 UDF。 + +#### 测试用例 + +您至少需要为您贡献的 UDF 编写集成测试。 + +您可以在`integration-test/src/test/java/org/apache/iotdb/db/it/udf`中为您贡献的 UDF 新增一个测试类进行测试。 + +#### 使用说明 + +使用说明需要包含:UDF 的名称、UDF 的作用、执行函数必须的属性参数、函数的适用的场景以及使用示例等。 + +使用说明需包含中英文两个版本。应分别在 `docs/zh/UserGuide/Operation Manual/DML Data Manipulation Language.md` 和 `docs/UserGuide/Operation Manual/DML Data Manipulation Language.md` 中新增使用说明。 + +### 提交 PR + +当您准备好源代码、测试用例和使用说明后,就可以将 UDF 贡献到 IoTDB 社区了。在 [Github](https://github.com/apache/iotdb) 上面提交 Pull Request (PR) 即可。具体提交方式见:[Pull Request Guide](https://iotdb.apache.org/Development/HowToCommit.html)。 + +当 PR 评审通过并被合并后,您的 UDF 就已经贡献给 IoTDB 社区了! + +## 已知实现的UDF + +### 内置UDF + +1. [Aggregate Functions](../Operators-Functions/Aggregation.md) 聚合函数 +2. [Arithmetic Operators and Functions](../Operators-Functions/Mathematical.md) 算数函数 +3. [Comparison Operators and Functions](../Operators-Functions/Comparison.md) 比较函数 +4. [String Processing](../Operators-Functions/String.md) 字符串处理函数 +5. [Data Type Conversion Function](../Operators-Functions/Conversion.md) 数据类型转换函数 +6. [Constant Timeseries Generating Functions](../Operators-Functions/Constant.md) 常序列生成函数 +7. [Selector Functions](../Operators-Functions/Selection.md) 选择函数 +8. [Continuous Interval Functions](../Operators-Functions/Continuous-Interval.md) 区间查询函数 +9. [Variation Trend Calculation Functions](../Operators-Functions/Variation-Trend.md) 趋势计算函数 +10. [Sample Functions](../Operators-Functions/Sample.md) 采样函数 +11. [Time-Series](../Operators-Functions/Time-Series.md) 时间序列处理函数 + +### 数据质量函数库 + +#### 关于 + +对基于时序数据的应用而言,数据质量至关重要。基于用户自定义函数能力,IoTDB 提供了一系列关于数据质量的函数,包括数据画像、数据质量评估与修复等,能够满足工业领域对数据质量的需求。 + +#### 快速上手 + +**该函数库中的函数不是内置函数,使用前要先加载到系统中。** 操作流程如下: + +1. 下载包含全部依赖的 jar 包和注册脚本 [【点击下载】](https://archive.apache.org/dist/iotdb/1.0.1/apache-iotdb-1.0.1-library-udf-bin.zip) ; +2. 将 jar 包复制到 IoTDB 程序目录的 `ext\udf` 目录下 (若您使用的是集群,请将jar包复制到所有DataNode的该目录下); +3. 启动 IoTDB; +4. 将注册脚本复制到 IoTDB 的程序目录下(与`sbin`目录同级的根目录下),修改脚本中的参数(如果需要)并运行注册脚本以注册 UDF。 + +#### 已经实现的函数 + +1. [Data-Quality](../Operators-Functions/Data-Quality.md) 数据质量 +2. [Data-Profiling](../Operators-Functions/Data-Profiling.md) 数据画像 +3. [Anomaly-Detection](../Operators-Functions/Anomaly-Detection.md) 异常检测 +4. [Frequency-Domain](../Operators-Functions/Frequency-Domain.md) 频域分析 +5. [Data-Matching](../Operators-Functions/Data-Matching.md) 数据匹配 +6. [Data-Repairing](../Operators-Functions/Data-Repairing.md) 数据修复 +7. [Series-Discovery](../Operators-Functions/Series-Discovery.md) 序列发现 +8. [Machine-Learning](../Operators-Functions/Machine-Learning.md) 机器学习 + +## Q&A + +Q1: 如何修改已经注册的 UDF? + +A1: 假设 UDF 的名称为`example`,全类名为`org.apache.iotdb.udf.UDTFExample`,由`example.jar`引入 + +1. 首先卸载已经注册的`example`函数,执行`DROP FUNCTION example` +2. 删除 `iotdb-server-1.0.0-all-bin/ext/udf` 目录下的`example.jar` +3. 修改`org.apache.iotdb.udf.UDTFExample`中的逻辑,重新打包,JAR 包的名字可以仍然为`example.jar` +4. 将新的 JAR 包上传至 `iotdb-server-1.0.0-all-bin/ext/udf` 目录下 +5. 
装载新的 UDF,执行`CREATE FUNCTION example AS "org.apache.iotdb.udf.UDTFExample"` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Variation-Trend.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Variation-Trend.md new file mode 100644 index 00000000..a993b58b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Operators-Functions/Variation-Trend.md @@ -0,0 +1,114 @@ + + +# 趋势计算函数 + +目前 IoTDB 支持如下趋势计算函数: + +| 函数名 | 输入序列类型 | 属性参数 | 输出序列类型 | 功能描述 | +| ----------------------- | ----------------------------------------------- | ------------------------------------------------------------ | ------------------------ | ------------------------------------------------------------ | +| TIME_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 无 | INT64 | 统计序列中某数据点的时间戳与前一数据点时间戳的差。范围内第一个数据点没有对应的结果输出。 | +| DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | 无 | 与输入序列的实际类型一致 | 统计序列中某数据点的值与前一数据点的值的差。范围内第一个数据点没有对应的结果输出。 | +| NON_NEGATIVE_DIFFERENCE | INT32 / INT64 / FLOAT / DOUBLE | 无 | 与输入序列的实际类型一致 | 统计序列中某数据点的值与前一数据点的值的差的绝对值。范围内第一个数据点没有对应的结果输出。 | +| DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | 无 | DOUBLE | 统计序列中某数据点相对于前一数据点的变化率,数量上等同于 DIFFERENCE / TIME_DIFFERENCE。范围内第一个数据点没有对应的结果输出。 | +| NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | 无 | DOUBLE | 统计序列中某数据点相对于前一数据点的变化率的绝对值,数量上等同于 NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE。范围内第一个数据点没有对应的结果输出。 | +| DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:可选,默认为true;为true时,前一个数据点值为null时,忽略该数据点继续向前找到第一个出现的不为null的值;为false时,如果前一个数据点为null,则不忽略,使用null进行相减,结果也为null | DOUBLE | 统计序列中某数据点的值与前一数据点的值的差。第一个数据点没有对应的结果输出,输出值为null | + +例如: + +``` sql +select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; +``` + +结果: + +``` ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +| Time| root.sg1.d1.s1|time_difference(root.sg1.d1.s1)|difference(root.sg1.d1.s1)|non_negative_difference(root.sg1.d1.s1)|derivative(root.sg1.d1.s1)|non_negative_derivative(root.sg1.d1.s1)| ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +|2020-12-10T17:11:49.037+08:00|7360723084922759782| 1| -8431715764844238876| 8431715764844238876| -8.4317157648442388E18| 8.4317157648442388E18| +|2020-12-10T17:11:49.038+08:00|4377791063319964531| 1| -2982932021602795251| 2982932021602795251| -2.982932021602795E18| 2.982932021602795E18| +|2020-12-10T17:11:49.039+08:00|7972485567734642915| 1| 3594694504414678384| 3594694504414678384| 3.5946945044146785E18| 3.5946945044146785E18| +|2020-12-10T17:11:49.040+08:00|2508858212791964081| 1| -5463627354942678834| 5463627354942678834| -5.463627354942679E18| 5.463627354942679E18| +|2020-12-10T17:11:49.041+08:00|2817297431185141819| 1| 308439218393177738| 308439218393177738| 3.0843921839317773E17| 3.0843921839317773E17| ++-----------------------------+-------------------+-------------------------------+--------------------------+---------------------------------------+--------------------------+---------------------------------------+ +Total line number = 5 +It costs 0.014s +``` + +## 使用示例 + +### 原始数据 + +``` ++-----------------------------+------------+------------+ +| Time|root.test.s1|root.test.s2| 
++-----------------------------+------------+------------+ +|1970-01-01T08:00:00.001+08:00| 1| 1.0| +|1970-01-01T08:00:00.002+08:00| 2| null| +|1970-01-01T08:00:00.003+08:00| null| 3.0| +|1970-01-01T08:00:00.004+08:00| 4| null| +|1970-01-01T08:00:00.005+08:00| 5| 5.0| +|1970-01-01T08:00:00.006+08:00| null| 6.0| ++-----------------------------+------------+------------+ +``` + +### 不使用ignoreNull参数(忽略null) + +SQL: +```sql +SELECT DIFF(s1), DIFF(s2) from root.test; +``` + +输出: +``` ++-----------------------------+------------------+------------------+ +| Time|DIFF(root.test.s1)|DIFF(root.test.s2)| ++-----------------------------+------------------+------------------+ +|1970-01-01T08:00:00.001+08:00| null| null| +|1970-01-01T08:00:00.002+08:00| 1.0| null| +|1970-01-01T08:00:00.003+08:00| null| 2.0| +|1970-01-01T08:00:00.004+08:00| 2.0| null| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| null| 1.0| ++-----------------------------+------------------+------------------+ +``` + +### 使用ignoreNull参数 + +SQL: +```sql +SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root.test; +``` + +输出: +``` ++-----------------------------+----------------------------------------+----------------------------------------+ +| Time|DIFF(root.test.s1, "ignoreNull"="false")|DIFF(root.test.s2, "ignoreNull"="false")| ++-----------------------------+----------------------------------------+----------------------------------------+ +|1970-01-01T08:00:00.001+08:00| null| null| +|1970-01-01T08:00:00.002+08:00| 1.0| null| +|1970-01-01T08:00:00.003+08:00| null| null| +|1970-01-01T08:00:00.004+08:00| null| null| +|1970-01-01T08:00:00.005+08:00| 1.0| null| +|1970-01-01T08:00:00.006+08:00| null| 1.0| ++-----------------------------+----------------------------------------+----------------------------------------+ +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Performance.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Performance.md new file mode 100644 index 00000000..cfd3d5de --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Performance.md @@ -0,0 +1,36 @@ + + +# 性能特点 + +本章节从数据库连接、数据库读写性能及存储性能角度介绍IoTDB的性能特点,测试工具使用开源的时序数据库基准测试工具 [iot-benchmark](../Tools-System/Benchmark.md)。 + +## 数据库连接 + +- 支持高并发连接,单台服务器可支持数万次并发连接/秒。 + +## 数据库读写性能 + +- 具备高写入吞吐的特点,单核处理写入请求大于数万次/秒,单台服务器写入性能达到数千万点/秒;集群可线性扩展,集群的写入性能可达数亿点/秒。 +- 具备高查询吞吐、低查询延迟的特点,单台服务器支持数千万点/秒查询吞吐,可在毫秒级聚合百亿数据点。 + +## 存储性能 + +- 支持存储海量数据,具备PB级数据的存储和处理能力。 +- 支持高压缩比,无损压缩能够达到20倍压缩比,有损压缩能够达到100倍压缩比。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Programming-Thrift.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Programming-Thrift.md new file mode 100644 index 00000000..2adc82c6 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Programming-Thrift.md @@ -0,0 +1,155 @@ + + +# 通信服务协议 + +## Thrift rpc 接口 + +### 简介 + +Thrift 是一个远程方法调用软件框架,用来进行可扩展且跨语言的服务的开发。 +它结合了功能强大的软件堆栈和代码生成引擎, +以构建在 C++, Java, Go,Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml 这些编程语言间无缝结合的、高效的服务。 + +IoTDB 服务端和客户端之间使用 thrift 进行通信,实际使用中建议使用 IoTDB 提供的原生客户端封装: +Session 或 Session Pool。如有特殊需要,您也可以直接针对 RPC 接口进行编程 + +默认 IoTDB 服务端使用 6667 端口作为 RPC 通信端口,可修改配置项中的 +``` +rpc_port=6667 +``` +更改默认接口 + +### rpc 接口 + +``` +// 打开一个 session +TSOpenSessionResp openSession(1:TSOpenSessionReq req); + +// 关闭一个 session +TSStatus closeSession(1:TSCloseSessionReq req); + +// 执行一条 SQL 语句 +TSExecuteStatementResp executeStatement(1:TSExecuteStatementReq req); + +// 批量执行 SQL 语句 +TSStatus 
executeBatchStatement(1:TSExecuteBatchStatementReq req); + +// 执行查询 SQL 语句 +TSExecuteStatementResp executeQueryStatement(1:TSExecuteStatementReq req); + +// 执行插入、删除 SQL 语句 +TSExecuteStatementResp executeUpdateStatement(1:TSExecuteStatementReq req); + +// 向服务器取下一批查询结果 +TSFetchResultsResp fetchResults(1:TSFetchResultsReq req) + +// 获取元数据 +TSFetchMetadataResp fetchMetadata(1:TSFetchMetadataReq req) + +// 取消某次查询操作 +TSStatus cancelOperation(1:TSCancelOperationReq req); + +// 关闭查询操作数据集,释放资源 +TSStatus closeOperation(1:TSCloseOperationReq req); + +// 获取时区信息 +TSGetTimeZoneResp getTimeZone(1:i64 sessionId); + +// 设置时区 +TSStatus setTimeZone(1:TSSetTimeZoneReq req); + +// 获取服务端配置 +ServerProperties getProperties(); + +// 设置 database +TSStatus setStorageGroup(1:i64 sessionId, 2:string storageGroup); + +// 创建时间序列 +TSStatus createTimeseries(1:TSCreateTimeseriesReq req); + +// 创建多条时间序列 +TSStatus createMultiTimeseries(1:TSCreateMultiTimeseriesReq req); + +// 删除时间序列 +TSStatus deleteTimeseries(1:i64 sessionId, 2:list path) + +// 删除 database +TSStatus deleteStorageGroups(1:i64 sessionId, 2:list storageGroup); + +// 按行插入数据 +TSStatus insertRecord(1:TSInsertRecordReq req); + +// 按 String 格式插入一条数据 +TSStatus insertStringRecord(1:TSInsertStringRecordReq req); + +// 按列插入数据 +TSStatus insertTablet(1:TSInsertTabletReq req); + +// 按列批量插入数据 +TSStatus insertTablets(1:TSInsertTabletsReq req); + +// 按行批量插入数据 +TSStatus insertRecords(1:TSInsertRecordsReq req); + +// 按行批量插入同属于某个设备的数据 +TSStatus insertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// 按 String 格式批量按行插入数据 +TSStatus insertStringRecords(1:TSInsertStringRecordsReq req); + +// 测试按列插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertTablet(1:TSInsertTabletReq req); + +// 测试批量按列插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertTablets(1:TSInsertTabletsReq req); + +// 测试按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecord(1:TSInsertRecordReq req); + +// 测试按 String 格式按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertStringRecord(1:TSInsertStringRecordReq req); + +// 测试按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecords(1:TSInsertRecordsReq req); + +// 测试按行批量插入同属于某个设备的数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertRecordsOfOneDevice(1:TSInsertRecordsOfOneDeviceReq req); + +// 测试按 String 格式批量按行插入数据的延迟,注意:该接口不真实插入数据,只用来测试网络延迟 +TSStatus testInsertStringRecords(1:TSInsertStringRecordsReq req); + +// 删除数据 +TSStatus deleteData(1:TSDeleteDataReq req); + +// 执行原始数据查询 +TSExecuteStatementResp executeRawDataQuery(1:TSRawDataQueryReq req); + +// 向服务器申请一个查询语句 ID +i64 requestStatementId(1:i64 sessionId); +``` + +### IDL 定义文件位置 +IDL 定义文件的路径是 thrift/src/main/thrift/rpc.thrift,其中包括了结构体定义与函数定义 + +### 生成文件位置 +在 mvn 编译过程中,会调用 thrift 编译 IDL 文件,生成最终的。class 文件 +生成的文件夹路径为 thrift/target/classes/org/apache/iotdb/service/rpc/thrift diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Programming-TsFile-API.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Programming-TsFile-API.md new file mode 100644 index 00000000..f7b47fa7 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Programming-TsFile-API.md @@ -0,0 +1,561 @@ + + +# TsFile API + +TsFile 是在 IoTDB 中使用的时间序列的文件格式。在这个章节中,我们将介绍这种文件格式的用法。 + +## 安装 TsFile library + +在您自己的项目中有两种方法使用 TsFile . + +* 使用 jar 包:编译源码生成 jar 包 + +```shell +git clone https://github.com/apache/iotdb.git +cd iotdb-core/tsfile/ +mvn clean package -Dmaven.test.skip=true +``` + +命令执行完成之后,所有的 jar 包都可以从 `target/` 目录下找到。之后您可以在自己的工程中导入 `target/tsfile-1.0.0.jar`. + +* 使用 Maven 依赖: + +编译源码并且部署到您的本地仓库中需要 3 步: + + 1. 
下载源码 + + ```shell +git clone https://github.com/apache/iotdb.git + ``` + 2. 编译源码和部署到本地仓库 + + ```shell +cd iotdb-core/tsfile/ +mvn clean install -Dmaven.test.skip=true + ``` + 3. 在您自己的工程中增加依赖: + + ```xml + + org.apache.iotdb + tsfile + 0.12.0 + + ``` + +或者,您可以直接使用官方的 Maven 仓库: + + 1. 首先,在`${username}\.m2\settings.xml`目录下的`settings.xml`文件中`` + 节中增加``,内容如下: + + ```xml + + allow-snapshots + true + + + apache.snapshots + Apache Development Snapshot Repository + https://repository.apache.org/content/repositories/snapshots/ + + false + + + true + + + + + ``` + 2. 之后您可以在您的工程中增加如下依赖: + + ```xml + + org.apache.iotdb + tsfile + 1.0.0 + + ``` + +## TsFile 的使用 + +本章节演示 TsFile 的详细用法。 + +时序数据 (Time-series Data) +一个时序是由 4 个序列组成,分别是 device, measurement, time, value。 + +* **measurement**: 时间序列描述的是一个物理或者形式的测量 (measurement),比如:城市的温度,一些商品的销售数量或者是火车在不同时间的速度。 +传统的传感器(如温度计)也采用单次测量 (measurement) 并产生时间序列,我们将在下面交替使用测量 (measurement) 和传感器。 + +* **device**: 一个设备指的是一个正在进行多次测量(产生多个时间序列)的实体,例如, + ​ ​ ​ 一列正在运行的火车监控它的速度、油表、它已经运行的英里数,当前的乘客每个都被传送到一个时间序列。 + +**单行数据**: 在许多工业应用程序中,一个设备通常包含多个传感器,这些传感器可能同时具有多个值,这称为一行数据。 + +在形式上,一行数据包含一个`device_id`,它是一个时间戳,表示从 1970 年 1 月 1 日 00:00:00 开始的毫秒数, +以及由`measurement_id`和相应的`value`组成的几个数据对。一行中的所有数据对都属于这个`device_id`,并且具有相同的时间戳。 +如果其中一个度量值`measurements`在某个时间戳`timestamp`没有值`value`,将使用一个空格表示(实际上 TsFile 并不存储 null 值)。 +其格式如下: + +``` +device_id, timestamp, ... +``` + +示例数据如下所示。在本例中,两个度量值 (measurement) 的数据类型分别是`INT32`和`FLOAT`。 + +``` +device_1, 1490860659000, m1, 10, m2, 12.12 +``` + +### 写入 TsFile + +TsFile 可以通过以下三个步骤生成,完整的代码参见"写入 TsFile 示例"章节。 + +1. 构造一个`TsFileWriter`实例。 + + 以下是可用的构造函数: + + * 没有预定义 schema + + ```java + public TsFileWriter(File file) throws IOException + ``` + * 预定义 schema + + ```java + public TsFileWriter(File file, Schema schema) throws IOException + ``` + 这个是用于使用 HDFS 文件系统的。`TsFileOutput`可以是`HDFSOutput`类的一个实例。 + + ```java + public TsFileWriter(TsFileOutput output, Schema schema) throws IOException + ``` + + 如果你想自己设置一些 TSFile 的配置,你可以使用`config`参数。比如: + + ```java + TSFileConfig conf = new TSFileConfig(); + conf.setTSFileStorageFs("HDFS"); + TsFileWriter tsFileWriter = new TsFileWriter(file, schema, conf); + ``` + + 在上面的例子中,数据文件将存储在 HDFS 中,而不是本地文件系统中。如果你想在本地文件系统中存储数据文件,你可以使用`conf.setTSFileStorageFs("LOCAL")`,这也是默认的配置。 + + 您还可以通过`config.setHdfsIp(...)`和`config.setHdfsPort(...)`来配置 HDFS 的 IP 和端口。默认的 IP 是`localhost`,默认的`RPC`端口是`9000`. + + **参数:** + + * file : 写入 TsFile 数据的文件 + * schema : 文件的 schemas,将在下章进行介绍 + * config : TsFile 的一些配置项 + +2. 添加测量值 (measurement) + + 你也可以先创建一个`Schema`类的实例然后把它传递给`TsFileWriter`类的构造函数 + + `Schema`类保存的是一个映射关系,key 是一个 measurement 的名字,value 是 measurement schema. 
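+
+ 在列出这些接口之前,先给出一个"创建 Schema 并据此构造 TsFileWriter"的最小示意代码(仅为草稿:代码基于下文列出的 `Schema`/`MeasurementSchema` 接口与上文的 `TsFileWriter(File, Schema)` 构造函数;其中的 import 包路径、文件名 `test.tsfile` 与测量名 `sensor_1` 均为举例假设,不同 TsFile 版本的包路径可能略有差异):
+
+ ```java
+import java.io.File;
+
+import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
+import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
+import org.apache.iotdb.tsfile.write.TsFileWriter;
+import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;
+import org.apache.iotdb.tsfile.write.schema.Schema;
+
+public class TsFileSchemaSketch {
+  public static void main(String[] args) throws Exception {
+    // 1. 创建 Schema:内部即 measurement 名字到 MeasurementSchema 的映射
+    Schema schema = new Schema();
+    schema.registerMeasurement(
+        new MeasurementSchema("sensor_1", TSDataType.FLOAT, TSEncoding.RLE));
+
+    // 2. 用预定义 schema 构造 TsFileWriter(见上文构造函数说明)
+    TsFileWriter tsFileWriter = new TsFileWriter(new File("test.tsfile"), schema);
+
+    // 3. 按后文步骤 3 写入 TSRecord 之后,调用 close 完成写入
+    tsFileWriter.close();
+  }
+}
+ ```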
+ + 下面是一系列接口: + + ```java + // Create an empty Schema or from an existing map + public Schema() + public Schema(Map measurements) + // Use this two interfaces to add measurements + public void registerMeasurement(MeasurementSchema descriptor) + public void registerMeasurements(Map measurements) + // Some useful getter and checker + public TSDataType getMeasurementDataType(String measurementId) + public MeasurementSchema getMeasurementSchema(String measurementId) + public Map getAllMeasurementSchema() + public boolean hasMeasurement(String measurementId) + ``` + + 你可以在`TsFileWriter`类中使用以下接口来添加额外的测量 (measurement): + ​ + ```java + public void addMeasurement(MeasurementSchema measurementSchema) throws WriteProcessException + ``` + + `MeasurementSchema`类保存了一个测量 (measurement) 的信息,有几个构造函数: + + ```java + public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding) + public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType) + public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType, + Map props) + ``` + + **参数:** + ​ + + * measurementID: 测量的名称,通常是传感器的名称。 + + * type: 数据类型,现在支持十种类型:`BOOLEAN`, `INT32`, `DATE`, `INT64`, `TIMESTAMP`, `FLOAT`, `DOUBLE`, `TEXT`, `STRING`, `BLOB`; + + * encoding: 编码类型。 + + * compression: 压缩方式。现在支持 `UNCOMPRESSED` 和 `SNAPPY`. + + * props: 特殊数据类型的属性。比如说`FLOAT`和`DOUBLE`可以设置`max_point_number`,`TEXT`可以设置`max_string_length`。 + 可以使用 Map 来保存键值对,比如 ("max_point_number", "3")。 + + > **注意:** 虽然一个测量 (measurement) 的名字可以被用在多个 deltaObjects 中,但是它的参数是不允许被修改的。比如: + 不允许多次为同一个测量 (measurement) 名添加不同类型的编码。下面是一个错误示例: + + ```java + // The measurement "sensor_1" is float type + addMeasurement(new MeasurementSchema("sensor_1", TSDataType.FLOAT, TSEncoding.RLE)); + // This call will throw a WriteProcessException exception + addMeasurement(new MeasurementSchema("sensor_1", TSDataType.INT32, TSEncoding.RLE)); + ``` +3. 插入和写入数据。 + + 使用这个接口创建一个新的`TSRecord`(时间戳和设备对)。 + + ```java + public TSRecord(long timestamp, String deviceId) + ``` + + 然后创建一个`DataPoint`(度量 (measurement) 和值的对应),并使用 addTuple 方法将数据 DataPoint 添加正确的值到 TsRecord。 + + 用下面这种方法写 + + ```java + public void write(TSRecord record) throws IOException, WriteProcessException + ``` + +4. 调用`close`方法来完成写入过程。 + + ```java + public void close() throws IOException + ``` + +我们也支持将数据写入已关闭的 TsFile 文件中。 + +1. 使用`ForceAppendTsFileWriter`打开已经关闭的文件。 + + ```java + public ForceAppendTsFileWriter(File file) throws IOException + ``` +2. 调用 `doTruncate` 去掉文件的 Metadata 部分 + +3. 使用 `ForceAppendTsFileWriter` 构造另一个`TsFileWriter` + + ```java + public TsFileWriter(TsFileIOWriter fileWriter) throws IOException + ``` +请注意 此时需要重新添加测量值 (measurement) 再进行上述写入操作。 + +### 写入 TsFile 示例 + +您需要安装 TsFile 到本地的 Maven 仓库中。 + +```shell +mvn clean install -pl iotdb-core/tsfile -am -DskipTests +``` + +如果存在**非对齐**的时序数据(比如:不是所有的传感器都有值),您可以通过构造** TSRecord **来写入。 + +更详细的例子可以在 + +``` +/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithTSRecord.java +``` + +中查看 + +如果所有时序数据都是**对齐**的,您可以通过构造** Tablet **来写入数据。 + +更详细的例子可以在 + +``` +/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithTablet.java +``` +中查看 + +在已关闭的 TsFile 文件中写入新数据的详细例子可以在 + +``` +/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileForceAppendWrite.java +``` +中查看 + +### 读取 TsFile 接口 + + * 路径的定义 + +路径是一个点 (.) 
分隔的字符串,它唯一地标识 TsFile 中的时间序列,例如:"root.area_1.device_1.sensor_1"。 +最后一部分"sensor_1"称为"measurementId",其余部分"root.area_1.device_1"称为 deviceId。 +正如之前提到的,不同设备中的相同测量 (measurement) 具有相同的数据类型和编码,设备也是唯一的。 + +在 read 接口中,参数`paths`表示要选择的测量值 (measurement)。 +Path 实例可以很容易地通过类`Path`来构造。例如: + +```java +Path p = new Path("device_1.sensor_1"); +``` + +我们可以为查询传递一个 ArrayList 路径,以支持多个路径查询。 + +```java +List paths = new ArrayList(); +paths.add(new Path("device_1.sensor_1")); +paths.add(new Path("device_1.sensor_3")); +``` + +> **注意:** 在构造路径时,参数的格式应该是一个点 (.) 分隔的字符串,最后一部分是 measurement,其余部分确认为 deviceId。 + + * 定义 Filter + + * 使用条件过滤 +在 TsFile 读取过程中使用 Filter 来选择满足一个或多个给定条件的数据。 + + * IExpression +`IExpression`是一个过滤器表达式接口,它将被传递给系统查询时调用。 +我们创建一个或多个筛选器表达式,并且可以使用`Binary Filter Operators`将它们连接形成最终表达式。 + +* **创建一个 Filter 表达式** + + 有两种类型的过滤器。 + + * TimeFilter: 使用时序数据中的`time`过滤。 + + ```java + IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter); + ``` + + 使用以下关系获得一个`TimeFilter`对象(值是一个 long 型变量)。 + +|Relationship|Description| +|----|----| +|TimeFilter.eq(value)|选择时间等于值的数据| +|TimeFilter.lt(value)|选择时间小于值的数据| +|TimeFilter.gt(value)|选择时间大于值的数据| +|TimeFilter.ltEq(value)|选择时间小于等于值的数据| +|TimeFilter.gtEq(value)|选择时间大于等于值的数据| +|TimeFilter.notEq(value)|选择时间不等于值的数据| +|TimeFilter.not(TimeFilter)|选择时间不满足另一个时间过滤器的数据| + + * ValueFilter: 使用时序数据中的`value`过滤。 + + +```java +IExpression valueFilterExpr = new SingleSeriesExpression(Path, ValueFilter); +``` + + `ValueFilter`的用法与`TimeFilter`相同,只是需要确保值的类型等于 measurement(在路径中定义)的类型。 + +* **Binary Filter Operators** + + Binary filter operators 可以用来连接两个单独的表达式。 + + * BinaryExpression.and(Expression, Expression): 选择同时满足两个表达式的数据。 + * BinaryExpression.or(Expression, Expression): 选择满足任意一个表达式值的数据。 + + +Filter Expression 示例 + +* **TimeFilterExpression 示例** + +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.eq(15)); // series time = 15 +``` +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.ltEq(15)); // series time <= 15 +``` +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.lt(15)); // series time < 15 +``` +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.gtEq(15)); // series time >= 15 +``` +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.notEq(15)); // series time != 15 +``` +```java +IExpression timeFilterExpr = BinaryExpression.and( + new GlobalTimeExpression(TimeFilter.gtEq(15L)), + new GlobalTimeExpression(TimeFilter.lt(25L))); // 15 <= series time < 25 +``` +```java +IExpression timeFilterExpr = BinaryExpression.or( + new GlobalTimeExpression(TimeFilter.gtEq(15L)), + new GlobalTimeExpression(TimeFilter.lt(25L))); // series time >= 15 or series time < 25 +``` + +* 读取接口 + +首先,我们打开 TsFile 并从文件路径`path`中获取一个`ReadOnlyTsFile`实例。 + +```java +TsFileSequenceReader reader = new TsFileSequenceReader(path); +ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader); +``` +接下来,我们准备路径数组和查询表达式,然后通过这个接口得到最终的`QueryExpression`对象: + +```java +QueryExpression queryExpression = QueryExpression.create(paths, statement); +``` + +ReadOnlyTsFile 类有两个`query`方法来执行查询。 + +```java +public QueryDataSet query(QueryExpression queryExpression) throws IOException +public QueryDataSet query(QueryExpression queryExpression, long partitionStartOffset, long partitionEndOffset) throws IOException +``` + +此方法是为高级应用(如 TsFile-Spark 连接器)设计的。 + +* **参数** : 对于第二个方法,添加了两个额外的参数来支持部分查询 (Partial Query): + * `partitionStartOffset`: TsFile 的开始偏移量 + * `partitionEndOffset`: TsFile 的结束偏移量 + +>什么是部分查询? 
+ +> 在一些分布式文件系统中(比如:HDFS), 文件被分成几个部分,这些部分被称为"Blocks"并存储在不同的节点中。在涉及的每个节点上并行执行查询可以提高效率。因此需要部分查询 (Partial Query)。部分查询 (Partial Query) 仅支持查询 TsFile 中被`QueryConstant.PARTITION_START_OFFSET`和`QueryConstant.PARTITION_END_OFFSET`分割的部分。 + +* QueryDataset 接口 + + 上面执行的查询将返回一个`QueryDataset`对象。 + + 以下是一些用户常用的接口: + + * `bool hasNext();` + + 如果该数据集仍然有数据,则返回 true。 + * `List getPaths()` + + 获取这个数据集中的路径。 + * `List getDataTypes();` + + 获取数据类型。 + + * `RowRecord next() throws IOException;` + + 获取下一条记录。 + + `RowRecord`类包含一个`long`类型的时间戳和一个`List`,用于不同传感器中的数据,我们可以使用两个 getter 方法来获取它们。 + + ```java + long getTimestamp(); + List getFields(); + ``` + + 要从一个字段获取数据,请使用以下方法: + + ```java + TSDataType getDataType(); + Object getObjectValue(); + ``` + +### 读取现有 TsFile 示例 + +您需要安装 TsFile 到本地的 Maven 仓库中。 + +有关查询语句的更详细示例,请参见 +`/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileRead.java` + +```java +package org.apache.iotdb.tsfile; +import java.io.IOException; +import java.util.ArrayList; +import org.apache.iotdb.tsfile.read.ReadOnlyTsFile; +import org.apache.iotdb.tsfile.read.TsFileSequenceReader; +import org.apache.iotdb.tsfile.read.common.Path; +import org.apache.iotdb.tsfile.read.expression.IExpression; +import org.apache.iotdb.tsfile.read.expression.QueryExpression; +import org.apache.iotdb.tsfile.read.expression.impl.BinaryExpression; +import org.apache.iotdb.tsfile.read.expression.impl.GlobalTimeExpression; +import org.apache.iotdb.tsfile.read.expression.impl.SingleSeriesExpression; +import org.apache.iotdb.tsfile.read.filter.TimeFilter; +import org.apache.iotdb.tsfile.read.filter.ValueFilter; +import org.apache.iotdb.tsfile.read.query.dataset.QueryDataSet; + +/** + * The class is to show how to read TsFile file named "test.tsfile". + * The TsFile file "test.tsfile" is generated from class TsFileWrite. 
+ * Run TsFileWrite to generate the test.tsfile first + */ +public class TsFileRead { + private static final String DEVICE1 = "device_1"; + + private static void queryAndPrint(ArrayList paths, ReadOnlyTsFile readTsFile, IExpression statement) + throws IOException { + QueryExpression queryExpression = QueryExpression.create(paths, statement); + QueryDataSet queryDataSet = readTsFile.query(queryExpression); + while (queryDataSet.hasNext()) { + System.out.println(queryDataSet.next()); + } + System.out.println("------------"); + } + + public static void main(String[] args) throws IOException { + + // file path + String path = "test.tsfile"; + + // create reader and get the readTsFile interface + try (TsFileSequenceReader reader = new TsFileSequenceReader(path); + ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader)){ + + // use these paths(all sensors) for all the queries + ArrayList paths = new ArrayList<>(); + paths.add(new Path(DEVICE1, "sensor_1")); + paths.add(new Path(DEVICE1, "sensor_2")); + paths.add(new Path(DEVICE1, "sensor_3")); + + // no filter, should select 1 2 3 4 6 7 8 + queryAndPrint(paths, readTsFile, null); + + // time filter : 4 <= time <= 10, should select 4 6 7 8 + IExpression timeFilter = + BinaryExpression.and( + new GlobalTimeExpression(TimeFilter.gtEq(4L)), + new GlobalTimeExpression(TimeFilter.ltEq(10L))); + queryAndPrint(paths, readTsFile, timeFilter); + + // value filter : device_1.sensor_2 <= 20, should select 1 2 4 6 7 + IExpression valueFilter = + new SingleSeriesExpression(new Path(DEVICE1, "sensor_2"), ValueFilter.ltEq(20L)); + queryAndPrint(paths, readTsFile, valueFilter); + + // time filter : 4 <= time <= 10, value filter : device_1.sensor_3 >= 20, should select 4 7 8 + timeFilter = + BinaryExpression.and( + new GlobalTimeExpression(TimeFilter.gtEq(4L)), + new GlobalTimeExpression(TimeFilter.ltEq(10L))); + valueFilter = + new SingleSeriesExpression(new Path(DEVICE1, "sensor_3"), ValueFilter.gtEq(20L)); + IExpression finalFilter = BinaryExpression.and(timeFilter, valueFilter); + queryAndPrint(paths, readTsFile, finalFilter); + } + } +} +``` + +## 修改 TsFile 配置项 + +```java +TSFileConfig config = TSFileDescriptor.getInstance().getConfig(); +config.setXXX(); +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Align-By.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Align-By.md new file mode 100644 index 00000000..77bc9fc3 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Align-By.md @@ -0,0 +1,65 @@ + + +# 查询对齐模式 + +在 IoTDB 中,查询结果集**默认按照时间对齐**,包含一列时间列和若干个值列,每一行数据各列的时间戳相同。 + +除按照时间对齐外,还支持以下对齐模式: + +- 按设备对齐 `ALIGN BY DEVICE` + +## 按设备对齐 + +在按设备对齐模式下,设备名会单独作为一列出现,查询结果集包含一列时间列、一列设备列和若干个值列。如果 `SELECT` 子句中选择了 `N` 列,则结果集包含 `N + 2` 列(时间列和设备名字列)。 + +在默认情况下,结果集按照 `Device` 进行排列,在每个 `Device` 内按照 `Time` 列升序排序。 + +当查询多个设备时,要求设备之间同名的列数据类型相同。 + +为便于理解,可以按照关系模型进行对应。设备可以视为关系模型中的表,选择的列可以视为表中的列,`Time + Device` 看做其主键。 + +**示例:** + +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device; +``` + +执行如下: + +``` ++-----------------------------+-----------------+-----------+------+--------+ +| Time| Device|temperature|status|hardware| ++-----------------------------+-----------------+-----------+------+--------+ +|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01| 25.96| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| 24.36| true| null| +|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02| null| true| v1| +|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| null| false| v2| 
+|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| null| true| v2|
+|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| null| true| v2|
++-----------------------------+-----------------+-----------+------+--------+
+Total line number = 6
+It costs 0.012s
+```
+## 设备对齐模式下的排序
+在设备对齐模式下,默认按照设备名的字典序升序排列,每个设备内部按照时间戳大小升序排列,可以通过 `ORDER BY` 子句调整设备列和时间列的排序优先级。
+
+详细说明及示例见文档 [结果集排序](./Order-By.md)。
\ No newline at end of file
diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Continuous-Query.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Continuous-Query.md
new file mode 100644
index 00000000..8d08321d
--- /dev/null
+++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Continuous-Query.md
@@ -0,0 +1,585 @@
+
+
+# 连续查询(Continuous Query, CQ)
+
+## 简介
+连续查询(Continuous queries, aka CQ)是对实时数据周期性地自动执行的查询,并将查询结果写入指定的时间序列中。
+
+用户可以通过连续查询实现滑动窗口流式计算,如计算某个序列每小时的平均温度,并写入一个新序列中。用户可以自定义 `RESAMPLE` 子句去创建不同的滑动窗口,可以实现对于乱序数据一定程度的容忍。
+
+```sql
+CREATE (CONTINUOUS QUERY | CQ) <cq_id>
+[RESAMPLE
+  [EVERY <every_interval>]
+  [BOUNDARY <execution_boundary_time>]
+  [RANGE <start_time_offset>[, <end_time_offset>]]
+]
+[TIMEOUT POLICY BLOCKED|DISCARD]
+BEGIN
+  SELECT CLAUSE
+  INTO CLAUSE
+  FROM CLAUSE
+  [WHERE CLAUSE]
+  [GROUP BY(<group_by_interval>[, <sliding_step>]) [, level = <level>]]
+  [HAVING CLAUSE]
+  [FILL {PREVIOUS | LINEAR | constant}]
+  [LIMIT rowLimit OFFSET rowOffset]
+  [ALIGN BY DEVICE]
+END
+```
+
+> 注意:
+> 1. 如果 where 子句中出现任何时间过滤条件,IoTDB 将会抛出异常,因为 IoTDB 会自动为每次查询执行指定时间范围。
+> 2. GROUP BY TIME 子句在连续查询中的语法稍有不同,它不能包含原来的第一个参数,即 [start_time, end_time),IoTDB 会自动填充这个缺失的参数。如果指定,IoTDB 将会抛出异常。
+> 3. 如果连续查询中既没有 GROUP BY TIME 子句,也没有指定 EVERY 子句,IoTDB 将会抛出异常。
+
+### 连续查询语法中参数含义的描述
+
+- `<cq_id>` 为连续查询指定一个全局唯一的标识。
+- `<every_interval>` 指定了连续查询周期性执行的间隔。现在支持的时间单位有:ns, us, ms, s, m, h, d, w,并且它的值不能小于用户在`iotdb-system.properties`配置文件中指定的`continuous_query_min_every_interval`。这是一个可选参数,默认等于 group by 子句中的`<group_by_interval>`。
+- `<start_time_offset>` 指定了每次查询执行窗口的开始时间,即`now()-<start_time_offset>`。现在支持的时间单位有:ns, us, ms, s, m, h, d, w。这是一个可选参数,默认等于`EVERY`子句中的`<every_interval>`。
+- `<end_time_offset>` 指定了每次查询执行窗口的结束时间,即`now()-<end_time_offset>`。现在支持的时间单位有:ns, us, ms, s, m, h, d, w。这是一个可选参数,默认等于`0`。
+- `<execution_boundary_time>` 表示用户期待的连续查询的首个周期任务的执行时间。(因为连续查询只会对当前实时的数据流做计算,所以该连续查询实际首个周期任务的执行时间并不一定等于用户指定的时间,具体计算逻辑如下所示)
+  - `<execution_boundary_time>` 可以早于、等于或者迟于当前时间。
+  - 这个参数是可选的,默认等于`0`。
+  - 首次查询执行窗口的开始时间为`<execution_boundary_time> - <start_time_offset>`。
+  - 首次查询执行窗口的结束时间为`<execution_boundary_time> - <end_time_offset>`。
+  - 第 i 个查询执行窗口的时间范围是`[<execution_boundary_time> - <start_time_offset> + (i - 1) * <every_interval>, <execution_boundary_time> - <end_time_offset> + (i - 1) * <every_interval>)`。
+  - 如果当前时间早于或等于`<execution_boundary_time>`,那么连续查询的首个周期任务的执行时间就是用户指定的`<execution_boundary_time>`。
+  - 如果当前时间迟于用户指定的`<execution_boundary_time>`,那么连续查询的首个周期任务的执行时间就是`<execution_boundary_time> + i * <every_interval>`中第一个大于或等于当前时间的值。
+
+> - `<every_interval>`、`<start_time_offset>` 和 `<group_by_interval>` 都应该大于 0
+> - `<group_by_interval>` 应该小于等于 `<start_time_offset>`
+> - 用户应该根据实际需求,为 `<start_time_offset>` 和 `<every_interval>` 指定合适的值
+> - 如果 `<start_time_offset>` 大于 `<every_interval>`,在每一次查询执行的时间窗口上会有部分重叠
+> - 如果 `<start_time_offset>` 小于 `<every_interval>`,在连续的两次查询执行的时间窗口中间将会有未覆盖的时间范围
+> - `<start_time_offset>` 应该大于 `<end_time_offset>`
+
+#### `<every_interval>` 等于 `<start_time_offset>`
+
+![1](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic1.png?raw=true)
+
+#### `<every_interval>` 大于 `<start_time_offset>`
+
+![2](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic2.png?raw=true)
+
+#### `<every_interval>` 小于 `<start_time_offset>`
+
+![3](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic3.png?raw=true)
+
+#### `<end_time_offset>` 不为 0
+
+![4](https://alioss.timecho.com/docs/img/UserGuide/Process-Data/Continuous-Query/pic4.png?raw=true)
+
+- `TIMEOUT POLICY` 指定了我们如何处理“前一个时间窗口还未执行完时,下一个窗口的执行时间已经到达”的场景,默认值是`BLOCKED`。
+ - `BLOCKED`意味着即使下一个窗口的执行时间已经到达,我们依旧需要阻塞等待前一个时间窗口的查询执行完再开始执行下一个窗口。如果使用`BLOCKED`策略,所有的时间窗口都将会被依此执行,但是如果遇到执行查询的时间长于周期性间隔时,连续查询的结果会迟于最新的时间窗口范围。 + - `DISCARD`意味着如果前一个时间窗口还未执行完,我们会直接丢弃下一个窗口的执行时间。如果使用`DISCARD`策略,可能会有部分时间窗口得不到执行。但是一旦前一个查询执行完后,它将会使用最新的时间窗口,所以它的执行结果总能赶上最新的时间窗口范围,当然是以部分时间窗口得不到执行为代价。 + + +## 连续查询的用例 + +下面是用例数据,这是一个实时的数据流,我们假设数据都按时到达。 + +```` ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +| Time|root.ln.wf02.wt02.temperature|root.ln.wf02.wt01.temperature|root.ln.wf01.wt02.temperature|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +|2021-05-11T22:18:14.598+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:19.941+08:00| 0.0| 68.0| 68.0| 103.0| +|2021-05-11T22:18:24.949+08:00| 122.0| 45.0| 11.0| 14.0| +|2021-05-11T22:18:29.967+08:00| 47.0| 14.0| 59.0| 181.0| +|2021-05-11T22:18:34.979+08:00| 182.0| 113.0| 29.0| 180.0| +|2021-05-11T22:18:39.990+08:00| 42.0| 11.0| 52.0| 19.0| +|2021-05-11T22:18:44.995+08:00| 78.0| 38.0| 123.0| 52.0| +|2021-05-11T22:18:49.999+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:55.003+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+ +```` + +### 配置连续查询执行的周期性间隔 + +在`RESAMPLE`子句中使用`EVERY`参数指定连续查询的执行间隔,如果没有指定,默认等于`group_by_interval`。 + +```sql +CREATE CONTINUOUS QUERY cq1 +RESAMPLE EVERY 20s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) +END +``` + +`cq1`计算出`temperature`传感器每10秒的平均值,并且将查询结果存储在`temperature_max`传感器下,传感器路径前缀使用跟原来一样的前缀。 + +`cq1`每20秒执行一次,每次执行的查询的时间窗口范围是从过去20秒到当前时间。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq1`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. +`cq1` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq1` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. 
+`cq1` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +`cq1`并不会处理当前时间窗口以外的数据,即`2021-05-11T22:18:20.000+08:00`以前的数据,所以我们会得到如下结果: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### 配置连续查询的时间窗口大小 + +使用`RANGE`子句中的`start_time_offset`参数指定连续查询每次执行的时间窗口的开始时间偏移,如果没有指定,默认值等于`EVERY`参数。 + +```sql +CREATE CONTINUOUS QUERY cq2 +RESAMPLE RANGE 40s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) +END +``` + +`cq2`计算出`temperature`传感器每10秒的平均值,并且将查询结果存储在`temperature_max`传感器下,传感器路径前缀使用跟原来一样的前缀。 + +`cq2`每10秒执行一次,每次执行的查询的时间窗口范围是从过去40秒到当前时间。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq2`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. 
+`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| NULL| NULL| NULL| NULL| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:18:50.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:10, 2021-05-11T22:18:50)`. +`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq2` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. 
+`cq2` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +`cq2`并不会写入全是null值的行,值得注意的是`cq2`会多次计算某些区间的聚合值,下面是计算结果: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### 同时配置连续查询执行的周期性间隔和时间窗口大小 + +使用`RESAMPLE`子句中的`EVERY`参数和`RANGE`参数分别指定连续查询的执行间隔和窗口大小。并且使用`fill()`来填充没有值的时间区间。 + +```sql +CREATE CONTINUOUS QUERY cq3 +RESAMPLE EVERY 20s RANGE 40s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) + FILL(100.0) +END +``` + +`cq3`计算出`temperature`传感器每10秒的平均值,并且将查询结果存储在`temperature_max`传感器下,传感器路径前缀使用跟原来一样的前缀。如果某些区间没有值,用`100.0`填充。 + +`cq3`每20秒执行一次,每次执行的查询的时间窗口范围是从过去40秒到当前时间。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq3`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:40)`. 
+`cq3` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq3` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:19:00)`. +`cq3` generate 4 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +值得注意的是`cq3`会多次计算某些区间的聚合值,下面是计算结果: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| +|2021-05-11T22:18:40.000+08:00| 137.0| 172.0| 135.0| 193.0| +|2021-05-11T22:18:50.000+08:00| 16.0| 124.0| 183.0| 18.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### 配置连续查询每次查询执行时间窗口的结束时间 + +使用`RESAMPLE`子句中的`EVERY`参数和`RANGE`参数分别指定连续查询的执行间隔和窗口大小。并且使用`fill()`来填充没有值的时间区间。 + +```sql +CREATE CONTINUOUS QUERY cq4 +RESAMPLE EVERY 20s RANGE 40s, 20s +BEGIN + SELECT max_value(temperature) + INTO root.ln.wf02.wt02(temperature_max), root.ln.wf02.wt01(temperature_max), root.ln.wf01.wt02(temperature_max), root.ln.wf01.wt01(temperature_max) + FROM root.ln.*.* + GROUP BY(10s) + FILL(100.0) +END +``` + 
+`cq4`计算出`temperature`传感器每10秒的平均值,并且将查询结果存储在`temperature_max`传感器下,传感器路径前缀使用跟原来一样的前缀。如果某些区间没有值,用`100.0`填充。 + +`cq4`每20秒执行一次,每次执行的查询的时间窗口范围是从过去40秒到过去20秒。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq4`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:00, 2021-05-11T22:18:20)`. +`cq4` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq4` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. +`cq4` generate 2 lines: +> ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +> +```` + +值得注意的是`cq4`只会计算每个聚合区间一次,并且每次开始执行计算的时间都会比当前的时间窗口结束时间迟20s, 下面是计算结果: + +```` +> SELECT temperature_max from root.ln.*.*; ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +| Time|root.ln.wf02.wt02.temperature_max|root.ln.wf02.wt01.temperature_max|root.ln.wf01.wt02.temperature_max|root.ln.wf01.wt01.temperature_max| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +|2021-05-11T22:18:00.000+08:00| 100.0| 100.0| 100.0| 100.0| +|2021-05-11T22:18:10.000+08:00| 121.0| 72.0| 183.0| 115.0| +|2021-05-11T22:18:20.000+08:00| 122.0| 45.0| 59.0| 181.0| +|2021-05-11T22:18:30.000+08:00| 182.0| 113.0| 52.0| 180.0| ++-----------------------------+---------------------------------+---------------------------------+---------------------------------+---------------------------------+ +```` + +### 没有GROUP BY TIME子句的连续查询 + +不使用`GROUP BY TIME`子句,并在`RESAMPLE`子句中显式使用`EVERY`参数指定连续查询的执行间隔。 + +```sql +CREATE CONTINUOUS QUERY cq5 +RESAMPLE EVERY 20s +BEGIN + SELECT temperature + 1 + INTO root.precalculated_sg.::(temperature) + FROM root.ln.*.* + align by device +END +``` + +`cq5`计算以`root.ln`为前缀的所有`temperature + 1`的值,并将结果储存在另一个 database `root.precalculated_sg`中。除 database 名称不同外,目标序列与源序列路径名均相同。 + 
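+
+在把这类带 `INTO` 子句的查询注册为连续查询之前,也可以先把其中的 SELECT ... INTO 语句作为普通查询、手动加上时间范围单独执行一次,确认目标路径的映射符合预期。下面是一个示意写法(假设当前环境支持单独执行 SELECT INTO,时间范围请按实际数据替换):
+
+```sql
+select temperature + 1
+into root.precalculated_sg.::(temperature)
+from root.ln.*.*
+where time >= 2021-05-11T22:18:20 and time < 2021-05-11T22:18:40
+align by device;
+```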
+`cq5`每20秒执行一次,每次执行的查询的时间窗口范围是从过去20秒到当前时间。 + +假设当前时间是`2021-05-11T22:18:40.000+08:00`,如果把日志等级设置为DEBUG,我们可以在`cq5`执行的DataNode上看到如下的输出: + +```` +At **2021-05-11T22:18:40.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:20, 2021-05-11T22:18:40)`. +`cq5` generate 16 lines: +> ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| ++-----------------------------+-------------------------------+-----------+ +> +At **2021-05-11T22:19:00.000+08:00**, `cq5` executes a query within the time range `[2021-05-11T22:18:40, 2021-05-11T22:19:00)`. 
+`cq5` generate 12 lines: +> ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| ++-----------------------------+-------------------------------+-----------+ +> +```` + +`cq5`并不会处理当前时间窗口以外的数据,即`2021-05-11T22:18:20.000+08:00`以前的数据,所以我们会得到如下结果: + +```` +> SELECT temperature from root.precalculated_sg.*.* align by device; ++-----------------------------+-------------------------------+-----------+ +| Time| Device|temperature| ++-----------------------------+-------------------------------+-----------+ +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt02| 123.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt02| 48.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt02| 183.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt02| 45.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt02| 79.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt02| 138.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt02| 17.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf02.wt01| 46.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf02.wt01| 15.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf02.wt01| 114.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf02.wt01| 12.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf02.wt01| 39.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf02.wt01| 173.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf02.wt01| 125.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt02| 12.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt02| 60.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt02| 30.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt02| 53.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt02| 124.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt02| 136.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt02| 184.0| +|2021-05-11T22:18:24.949+08:00|root.precalculated_sg.wf01.wt01| 15.0| +|2021-05-11T22:18:29.967+08:00|root.precalculated_sg.wf01.wt01| 182.0| +|2021-05-11T22:18:34.979+08:00|root.precalculated_sg.wf01.wt01| 181.0| +|2021-05-11T22:18:39.990+08:00|root.precalculated_sg.wf01.wt01| 20.0| +|2021-05-11T22:18:44.995+08:00|root.precalculated_sg.wf01.wt01| 53.0| +|2021-05-11T22:18:49.999+08:00|root.precalculated_sg.wf01.wt01| 194.0| +|2021-05-11T22:18:55.003+08:00|root.precalculated_sg.wf01.wt01| 19.0| 
++-----------------------------+-------------------------------+-----------+ +```` + +## 连续查询的管理 + +### 查询系统已有的连续查询 + +展示集群中所有的已注册的连续查询 + +```sql +SHOW (CONTINUOUS QUERIES | CQS) +``` + +`SHOW (CONTINUOUS QUERIES | CQS)`会将结果集按照`cq_id`排序。 + +#### 例子 + +```sql +SHOW CONTINUOUS QUERIES; +``` + +执行以上sql,我们将会得到如下的查询结果: + +| cq_id | query | state | +|:-------------|---------------------------------------------------------------------------------------------------------------------------------------|-------| +| s1_count_cq | CREATE CQ s1_count_cq
BEGIN
SELECT count(s1)
INTO root.sg_count.d.count_s1
FROM root.sg.d
GROUP BY(30m)
END | active | + + +### 删除已有的连续查询 + +删除指定的名为cq_id的连续查询: + +```sql +DROP (CONTINUOUS QUERY | CQ) +``` + +DROP CQ并不会返回任何结果集。 + +#### 例子 + +删除名为s1_count_cq的连续查询: + +```sql +DROP CONTINUOUS QUERY s1_count_cq; +``` + +### 修改已有的连续查询 + +目前连续查询一旦被创建就不能再被修改。如果想要修改某个连续查询,只能先用`DROP`命令删除它,然后再用`CREATE`命令重新创建。 + + +## 连续查询的使用场景 + +### 对数据进行降采样并对降采样后的数据使用不同的保留策略 + +可以使用连续查询,定期将高频率采样的原始数据(如每秒1000个点),降采样(如每秒仅保留一个点)后保存到另一个 database 的同名序列中。高精度的原始数据所在 database 的`TTL`可能设置的比较短,比如一天,而低精度的降采样后的数据所在的 database `TTL`可以设置的比较长,比如一个月,从而达到快速释放磁盘空间的目的。 + +### 预计算代价昂贵的查询 + +我们可以通过连续查询对一些重复的查询进行预计算,并将查询结果保存在某些目标序列中,这样真实查询并不需要真的再次去做计算,而是直接查询目标序列的结果,从而缩短了查询的时间。 + +> 预计算查询结果尤其对一些可视化工具渲染时序图和工作台时有很大的加速作用。 + +### 作为子查询的替代品 + +IoTDB现在不支持子查询,但是我们可以通过创建连续查询得到相似的功能。我们可以将子查询注册为一个连续查询,并将子查询的结果物化到目标序列中,外层查询再直接查询哪个目标序列。 + +#### 例子 + +IoTDB并不会接收如下的嵌套子查询。这个查询会计算s1序列每隔30分钟的非空值数量的平均值: + +```sql +SELECT avg(count_s1) from (select count(s1) as count_s1 from root.sg.d group by([0, now()), 30m)); +``` + +为了得到相同的结果,我们可以: + +**1. 创建一个连续查询** + +这一步执行内层子查询部分。下面创建的连续查询每隔30分钟计算一次`root.sg.d.s1`序列的非空值数量,并将结果写入目标序列`root.sg_count.d.count_s1`中。 + +```sql +CREATE CQ s1_count_cq +BEGIN + SELECT count(s1) + INTO root.sg_count.d.count_s1 + FROM root.sg.d + GROUP BY(30m) +END +``` + +**2. 查询连续查询的结果** + +这一步执行外层查询的avg([...])部分。 + +查询序列`root.sg_count.d.count_s1`的值,并计算平均值: + +```sql +SELECT avg(count_s1) from root.sg_count.d; +``` + + +## 连续查询相关的配置参数 +| 参数名 | 描述 | 类型 | 默认值 | +| :---------------------------------- |----------------------|----------|---------------| +| `continuous_query_submit_thread` | 用于周期性提交连续查询执行任务的线程数 | int32 | 2 | +| `continuous_query_min_every_interval_in_ms` | 系统允许的连续查询最小的周期性时间间隔 | duration | 1000 | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Fill.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Fill.md new file mode 100644 index 00000000..8c4f28cf --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Fill.md @@ -0,0 +1,331 @@ + + +# 结果集补空值 + +## 功能介绍 + +当执行一些数据查询时,结果集的某行某列可能没有数据,则此位置结果为空,但这种空值不利于进行数据可视化展示和分析,需要对空值进行填充。 + +在 IoTDB 中,用户可以使用 `FILL` 子句指定数据缺失情况下的填充模式,允许用户按照特定的方法对任何查询的结果集填充空值,如取前一个不为空的值、线性插值等。 + +## 语法定义 + +**`FILL` 子句的语法定义如下:** + +```sql +FILL '(' PREVIOUS | LINEAR | constant ')' +``` + +**注意:** +- 在 `Fill` 语句中只能指定一种填充方法,该方法作用于结果集的全部列。 +- 空值填充不兼容 0.13 版本及以前的语法(即不支持 `FILL(([(, , )?])+)`) + +## 填充方式 + +**IoTDB 目前支持以下三种空值填充方式:** + +- `PREVIOUS` 填充:使用该列前一个非空值进行填充。 +- `LINEAR` 填充:使用该列前一个非空值和下一个非空值的线性插值进行填充。 +- 常量填充:使用指定常量填充。 + +**各数据类型支持的填充方法如下表所示:** + +| 数据类型 | 支持的填充方法 | +| :------- |:------------------------| +| BOOLEAN | `PREVIOUS`、常量 | +| INT32 | `PREVIOUS`、`LINEAR`、常量 | +| INT64 | `PREVIOUS`、`LINEAR`、常量 | +| FLOAT | `PREVIOUS`、`LINEAR`、常量 | +| DOUBLE | `PREVIOUS`、`LINEAR`、常量 | +| TEXT | `PREVIOUS`、常量 | + +**注意:** 对于数据类型不支持指定填充方法的列,既不会填充它,也不会报错,只是让那一列保持原样。 + +**下面通过举例进一步说明。** + +如果我们不使用任何填充方式: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000; +``` + +查询结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| null| false| ++-----------------------------+-------------------------------+--------------------------+ 
+|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +### `PREVIOUS` 填充 + +**对于查询结果集中的空值,使用该列前一个非空值进行填充。** + +**注意:** 如果结果集的某一列第一个值就为空,则不会填充该值,直到遇到该列第一个非空值为止。 + +例如,使用 `PREVIOUS` 填充,SQL 语句如下: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(previous); +``` + +`PREVIOUS` 填充后的结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 21.93| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| false| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +**在前值填充时,能够支持指定一个时间间隔,如果当前null值的时间戳与前一个非null值的时间戳的间隔,超过指定的时间间隔,则不进行填充。** + +> 1. 在线性填充和常量填充的情况下,如果指定了第二个参数,会抛出异常 +> 2. 时间超时参数仅支持整数 + +例如,原始数据如下所示: + +```sql +select s1 from root.db.d1 +``` +``` ++-----------------------------+-------------+ +| Time|root.db.d1.s1| ++-----------------------------+-------------+ +|2023-11-08T16:41:50.008+08:00| 1.0| ++-----------------------------+-------------+ +|2023-11-08T16:46:50.011+08:00| 2.0| ++-----------------------------+-------------+ +|2023-11-08T16:48:50.011+08:00| 3.0| ++-----------------------------+-------------+ +``` + +根据时间分组,每1分钟求一个平均值 + +```sql +select avg(s1) + from root.db.d1 + group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| null| ++-----------------------------+------------------+ +``` + +根据时间分组并用前值填充 + +```sql +select avg(s1) + from root.db.d1 + group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) + FILL(PREVIOUS); +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ 
+|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| 3.0| ++-----------------------------+------------------+ +``` + +根据时间分组并用前值填充,并指定超过2分钟的就不填充 + +```sql +select avg(s1) +from root.db.d1 +group by([2023-11-08T16:40:00.008+08:00, 2023-11-08T16:50:00.008+08:00), 1m) + FILL(PREVIOUS, 2m); +``` +``` ++-----------------------------+------------------+ +| Time|avg(root.db.d1.s1)| ++-----------------------------+------------------+ +|2023-11-08T16:40:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:41:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:42:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:43:00.008+08:00| 1.0| ++-----------------------------+------------------+ +|2023-11-08T16:44:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:45:00.008+08:00| null| ++-----------------------------+------------------+ +|2023-11-08T16:46:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:47:00.008+08:00| 2.0| ++-----------------------------+------------------+ +|2023-11-08T16:48:00.008+08:00| 3.0| ++-----------------------------+------------------+ +|2023-11-08T16:49:00.008+08:00| 3.0| ++-----------------------------+------------------+ +``` + + +### `LINEAR` 填充 + +**对于查询结果集中的空值,使用该列前一个非空值和下一个非空值的线性插值进行填充。** + +**注意:** +- 如果某个值之前的所有值都为空,或者某个值之后的所有值都为空,则不会填充该值。 +- 如果某列的数据类型为boolean/text,我们既不会填充它,也不会报错,只是让那一列保持原样。 + +例如,使用 `LINEAR` 填充,SQL 语句如下: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(linear); +``` + +`LINEAR` 填充后的结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 22.08| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +### 常量填充 + +**对于查询结果集中的空值,使用指定常量填充。** + +**注意:** +- 如果某列数据类型与常量类型不兼容,既不填充该列,也不报错,将该列保持原样。对于常量兼容的数据类型,如下表所示: + + | 常量类型 | 能够填充的序列数据类型 | + |:------ |:------------------ | + | `BOOLEAN` | `BOOLEAN` 
`TEXT` | + | `INT64` | `INT32` `INT64` `FLOAT` `DOUBLE` `TEXT` | + | `DOUBLE` | `FLOAT` `DOUBLE` `TEXT` | + | `TEXT` | `TEXT` | +- 当常量值大于 `INT32` 所能表示的最大值时,对于 `INT32` 类型的列,既不填充该列,也不报错,将该列保持原样。 + +例如,使用 `FLOAT` 类型的常量填充,SQL 语句如下: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(2.0); +``` + +`FLOAT` 类型的常量填充后的结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| 2.0| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| null| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| null| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` + +再比如,使用 `BOOLEAN` 类型的常量填充,SQL 语句如下: + +```sql +select temperature, status from root.sgcc.wf03.wt01 where time >= 2017-11-01T16:37:00.000 and time <= 2017-11-01T16:40:00.000 fill(true); +``` + +`BOOLEAN` 类型的常量填充后的结果如下: + +``` ++-----------------------------+-------------------------------+--------------------------+ +| Time|root.sgcc.wf03.wt01.temperature|root.sgcc.wf03.wt01.status| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:37:00.000+08:00| 21.93| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:38:00.000+08:00| null| false| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:39:00.000+08:00| 22.23| true| ++-----------------------------+-------------------------------+--------------------------+ +|2017-11-01T16:40:00.000+08:00| 23.43| true| ++-----------------------------+-------------------------------+--------------------------+ +Total line number = 4 +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Group-By.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Group-By.md new file mode 100644 index 00000000..b7f042bb --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Group-By.md @@ -0,0 +1,913 @@ + + +# 分段分组聚合 +IoTDB支持通过`GROUP BY`子句对序列进行分段或者分组聚合。 + +分段聚合是指按照时间维度,针对同时间序列中不同数据点之间的时间关系,对数据在行的方向进行分段,每个段得到一个聚合值。目前支持**时间区间分段**、**差值分段**、**条件分段**、**会话分段**和**点数分段**,未来将支持更多分段方式。 + +分组聚合是指针对不同时间序列,在时间序列的潜在业务属性上分组,每个组包含若干条时间序列,每个组得到一个聚合值。支持**按路径层级分组**和**按序列标签分组**两种分组方式。 + +## 分段聚合 + +### 时间区间分段聚合 + +时间区间分段聚合是一种时序数据典型的查询方式,数据以高频进行采集,需要按照一定的时间间隔进行聚合计算,如计算每天的平均气温,需要将气温的序列按天进行分段,然后计算平均值。 + +在 IoTDB 中,聚合查询可以通过 `GROUP BY` 子句指定按照时间区间分段聚合。用户可以指定聚合的时间间隔和滑动步长,相关参数如下: + +* 参数 1:时间轴显示时间窗口大小 +* 参数 2:聚合窗口的大小(必须为正数) +* 参数 3:聚合窗口的滑动步长(可选,默认与聚合窗口大小相同) + +下图中指出了这三个参数的含义: + + + +接下来,我们给出几个典型例子: + +#### 未指定滑动步长的时间区间分段聚合查询 + +对应的 SQL 语句是: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d); +``` +这条查询的含义是: + +由于用户没有指定滑动步长,滑动步长将会被默认设置为跟时间间隔参数相同,也就是`1d`。 + +上面这个例子的第一个参数是显示窗口参数,决定了最终的显示范围是 [2017-11-01T00:00:00, 2017-11-07T23:00:00)。 + 
+上面这个例子的第二个参数是划分时间轴的时间间隔参数,将`1d`当作划分间隔,显示窗口参数的起始时间当作分割原点,时间轴即被划分为连续的时间间隔:[0,1d), [1d, 2d), [2d, 3d) 等等。 + +然后系统将会用 WHERE 子句中的时间和值过滤条件以及 GROUP BY 语句中的第一个参数作为数据的联合过滤条件,获得满足所有过滤条件的数据(在这个例子里是在 [2017-11-01T00:00:00, 2017-11-07 T23:00:00) 这个时间范围的数据),并把这些数据映射到之前分割好的时间轴中(这个例子里是从 2017-11-01T00:00:00 到 2017-11-07T23:00:00:00 的每一天) + +每个时间间隔窗口内都有数据,SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 1440| 26.0| +|2017-11-02T00:00:00.000+08:00| 1440| 26.0| +|2017-11-03T00:00:00.000+08:00| 1440| 25.99| +|2017-11-04T00:00:00.000+08:00| 1440| 26.0| +|2017-11-05T00:00:00.000+08:00| 1440| 26.0| +|2017-11-06T00:00:00.000+08:00| 1440| 25.99| +|2017-11-07T00:00:00.000+08:00| 1380| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 7 +It costs 0.024s +``` + +#### 指定滑动步长的时间区间分段聚合查询 + +对应的 SQL 语句是: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d); +``` + +这条查询的含义是: + +由于用户指定了滑动步长为`1d`,GROUP BY 语句执行时将会每次把时间间隔往后移动一天的步长,而不是默认的 3 小时。 + +也就意味着,我们想要取从 2017-11-01 到 2017-11-07 每一天的凌晨 0 点到凌晨 3 点的数据。 + +上面这个例子的第一个参数是显示窗口参数,决定了最终的显示范围是 [2017-11-01T00:00:00, 2017-11-07T23:00:00)。 + +上面这个例子的第二个参数是划分时间轴的时间间隔参数,将`3h`当作划分间隔,显示窗口参数的起始时间当作分割原点,时间轴即被划分为连续的时间间隔:[2017-11-01T00:00:00, 2017-11-01T03:00:00), [2017-11-02T00:00:00, 2017-11-02T03:00:00), [2017-11-03T00:00:00, 2017-11-03T03:00:00) 等等。 + +上面这个例子的第三个参数是每次时间间隔的滑动步长。 + +然后系统将会用 WHERE 子句中的时间和值过滤条件以及 GROUP BY 语句中的第一个参数作为数据的联合过滤条件,获得满足所有过滤条件的数据(在这个例子里是在 [2017-11-01T00:00:00, 2017-11-07 T23:00:00) 这个时间范围的数据),并把这些数据映射到之前分割好的时间轴中(这个例子里是从 2017-11-01T00:00:00 到 2017-11-07T23:00:00:00 的每一天的凌晨 0 点到凌晨 3 点) + +每个时间间隔窗口内都有数据,SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| 25.98| +|2017-11-02T00:00:00.000+08:00| 180| 25.98| +|2017-11-03T00:00:00.000+08:00| 180| 25.96| +|2017-11-04T00:00:00.000+08:00| 180| 25.96| +|2017-11-05T00:00:00.000+08:00| 180| 26.0| +|2017-11-06T00:00:00.000+08:00| 180| 25.85| +|2017-11-07T00:00:00.000+08:00| 180| 25.99| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 7 +It costs 0.006s +``` + +滑动步长可以小于聚合窗口,此时聚合窗口之间有重叠时间(类似于一个滑动窗口)。 + +例如 SQL: +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-01 10:00:00), 4h, 2h); +``` + +SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| 25.98| +|2017-11-01T02:00:00.000+08:00| 180| 25.98| +|2017-11-01T04:00:00.000+08:00| 180| 25.96| +|2017-11-01T06:00:00.000+08:00| 180| 25.96| +|2017-11-01T08:00:00.000+08:00| 180| 26.0| 
++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 5 +It costs 0.006s +``` + +#### 按照自然月份的时间区间分段聚合查询 + +对应的 SQL 语句是: + +```sql +select count(status) from root.ln.wf01.wt01 where time > 2017-11-01T01:00:00 group by([2017-11-01T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +这条查询的含义是: + +由于用户指定了滑动步长为`2mo`,GROUP BY 语句执行时将会每次把时间间隔往后移动 2 个自然月的步长,而不是默认的 1 个自然月。 + +也就意味着,我们想要取从 2017-11-01 到 2019-11-07 每 2 个自然月的第一个月的数据。 + +上面这个例子的第一个参数是显示窗口参数,决定了最终的显示范围是 [2017-11-01T00:00:00, 2019-11-07T23:00:00)。 + +起始时间为 2017-11-01T00:00:00,滑动步长将会以起始时间作为标准按月递增,取当月的 1 号作为时间间隔的起始时间。 + +上面这个例子的第二个参数是划分时间轴的时间间隔参数,将`1mo`当作划分间隔,显示窗口参数的起始时间当作分割原点,时间轴即被划分为连续的时间间隔:[2017-11-01T00:00:00, 2017-12-01T00:00:00), [2018-02-01T00:00:00, 2018-03-01T00:00:00), [2018-05-03T00:00:00, 2018-06-01T00:00:00) 等等。 + +上面这个例子的第三个参数是每次时间间隔的滑动步长。 + +然后系统将会用 WHERE 子句中的时间和值过滤条件以及 GROUP BY 语句中的第一个参数作为数据的联合过滤条件,获得满足所有过滤条件的数据(在这个例子里是在 [2017-11-01T00:00:00, 2019-11-07T23:00:00) 这个时间范围的数据),并把这些数据映射到之前分割好的时间轴中(这个例子里是从 2017-11-01T00:00:00 到 2019-11-07T23:00:00:00 的每两个自然月的第一个月) + +每个时间间隔窗口内都有数据,SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-11-01T00:00:00.000+08:00| 259| +|2018-01-01T00:00:00.000+08:00| 250| +|2018-03-01T00:00:00.000+08:00| 259| +|2018-05-01T00:00:00.000+08:00| 251| +|2018-07-01T00:00:00.000+08:00| 242| +|2018-09-01T00:00:00.000+08:00| 225| +|2018-11-01T00:00:00.000+08:00| 216| +|2019-01-01T00:00:00.000+08:00| 207| +|2019-03-01T00:00:00.000+08:00| 216| +|2019-05-01T00:00:00.000+08:00| 207| +|2019-07-01T00:00:00.000+08:00| 199| +|2019-09-01T00:00:00.000+08:00| 181| +|2019-11-01T00:00:00.000+08:00| 60| ++-----------------------------+-------------------------------+ +``` + +对应的 SQL 语句是: + +```sql +select count(status) from root.ln.wf01.wt01 group by([2017-10-31T00:00:00, 2019-11-07T23:00:00), 1mo, 2mo); +``` + +这条查询的含义是: + +由于用户指定了滑动步长为`2mo`,GROUP BY 语句执行时将会每次把时间间隔往后移动 2 个自然月的步长,而不是默认的 1 个自然月。 + +也就意味着,我们想要取从 2017-10-31 到 2019-11-07 每 2 个自然月的第一个月的数据。 + +与上述示例不同的是起始时间为 2017-10-31T00:00:00,滑动步长将会以起始时间作为标准按月递增,取当月的 31 号(即最后一天)作为时间间隔的起始时间。若起始时间设置为 30 号,滑动步长会将时间间隔的起始时间设置为当月 30 号,若不存在则为最后一天。 + +上面这个例子的第一个参数是显示窗口参数,决定了最终的显示范围是 [2017-10-31T00:00:00, 2019-11-07T23:00:00)。 + +上面这个例子的第二个参数是划分时间轴的时间间隔参数,将`1mo`当作划分间隔,显示窗口参数的起始时间当作分割原点,时间轴即被划分为连续的时间间隔:[2017-10-31T00:00:00, 2017-11-31T00:00:00), [2018-02-31T00:00:00, 2018-03-31T00:00:00), [2018-05-31T00:00:00, 2018-06-31T00:00:00) 等等。 + +上面这个例子的第三个参数是每次时间间隔的滑动步长。 + +然后系统将会用 WHERE 子句中的时间和值过滤条件以及 GROUP BY 语句中的第一个参数作为数据的联合过滤条件,获得满足所有过滤条件的数据(在这个例子里是在 [2017-10-31T00:00:00, 2019-11-07T23:00:00) 这个时间范围的数据),并把这些数据映射到之前分割好的时间轴中(这个例子里是从 2017-10-31T00:00:00 到 2019-11-07T23:00:00:00 的每两个自然月的第一个月) + +每个时间间隔窗口内都有数据,SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-10-31T00:00:00.000+08:00| 251| +|2017-12-31T00:00:00.000+08:00| 250| +|2018-02-28T00:00:00.000+08:00| 259| +|2018-04-30T00:00:00.000+08:00| 250| +|2018-06-30T00:00:00.000+08:00| 242| +|2018-08-31T00:00:00.000+08:00| 225| +|2018-10-31T00:00:00.000+08:00| 216| +|2018-12-31T00:00:00.000+08:00| 208| +|2019-02-28T00:00:00.000+08:00| 216| +|2019-04-30T00:00:00.000+08:00| 208| +|2019-06-30T00:00:00.000+08:00| 199| +|2019-08-31T00:00:00.000+08:00| 181| +|2019-10-31T00:00:00.000+08:00| 
69| ++-----------------------------+-------------------------------+ +``` + +#### 左开右闭区间 + +每个区间的结果时间戳为区间右端点,对应的 SQL 语句是: + +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d); +``` + +这条查询语句的时间区间是左开右闭的,结果中不会包含时间点 2017-11-01 的数据,但是会包含时间点 2017-11-07 的数据。 + +SQL 执行后的结果集如下所示: + +``` ++-----------------------------+-------------------------------+ +| Time|count(root.ln.wf01.wt01.status)| ++-----------------------------+-------------------------------+ +|2017-11-02T00:00:00.000+08:00| 1440| +|2017-11-03T00:00:00.000+08:00| 1440| +|2017-11-04T00:00:00.000+08:00| 1440| +|2017-11-05T00:00:00.000+08:00| 1440| +|2017-11-06T00:00:00.000+08:00| 1440| +|2017-11-07T00:00:00.000+08:00| 1440| +|2017-11-07T23:00:00.000+08:00| 1380| ++-----------------------------+-------------------------------+ +Total line number = 7 +It costs 0.004s +``` + +### 差值分段聚合 +IoTDB支持通过`GROUP BY VARIATION`语句来根据差值进行分组。`GROUP BY VARIATION`会将第一个点作为一个组的**基准点**,每个新的数据在按照给定规则与基准点进行差值运算后, +如果差值小于给定的阈值则将该新点归于同一组,否则结束当前分组,以这个新的数据为新的基准点开启新的分组。 +该分组方式不会重叠,且没有固定的开始结束时间。其子句语法如下: +```sql +group by variation(controlExpression[,delta][,ignoreNull=true/false]) +``` +不同的参数含义如下 +* controlExpression + +分组所参照的值,**可以是查询数据中的某一列或是多列的表达式 +(多列表达式计算后仍为一个值,使用多列表达式时指定的列必须都为数值列)**, 差值便是根据数据的controlExpression的差值运算。 +* delta + +分组所使用的阈值,同一分组中**每个点的controlExpression对应的值与该组中基准点对应值的差值都小于`delta`**。当`delta=0`时,相当于一个等值分组,所有连续且expression值相同的数据将被分到一组。 + +* ignoreNull + +用于指定`controlExpression`的值为null时对数据的处理方式,当`ignoreNull`为false时,该null值会被视为新的值,`ignoreNull`为true时,则直接跳过对应的点。 + +在`delta`取不同值时,`controlExpression`支持的返回数据类型以及当`ignoreNull`为false时对于null值的处理方式可以见下表: + +| delta | controlExpression支持的返回类型 | ignoreNull=false时对于Null值的处理 | +|----------|--------------------------------------|-----------------------------------------------------------------| +| delta!=0 | INT32、INT64、FLOAT、DOUBLE | 若正在维护分组的值不为null,null视为无穷大/无穷小,结束当前分组。连续的null视为差值相等的值,会被分配在同一个分组 | +| delta=0 | TEXT、BINARY、INT32、INT64、FLOAT、DOUBLE | null被视为新分组中的新值,连续的null属于相同的分组 | + +下图为差值分段的一个分段方式示意图,与组中第一个数据的控制列值的差值在delta内的控制列对应的点属于相同的分组。 + +groupByVariation + +#### 使用注意事项 +1. `controlExpression`的结果应该为唯一值,如果使用通配符拼接后出现多列,则报错。 +2. 对于一个分组,默认Time列输出分组的开始时间,查询时可以使用select `__endTime`的方式来使得结果输出分组的结束时间。 +3. 与`ALIGN BY DEVICE`搭配使用时会对每个device进行单独的分组操作。 +4. 当没有指定`delta`和`ignoreNull`时,`delta`默认为0,`ignoreNull`默认为true。 +5. 
当前暂不支持与`GROUP BY LEVEL`搭配使用。 + +使用如下的原始数据,接下来会给出几个事件分段查询的使用样例 +``` ++-----------------------------+-------+-------+-------+--------+-------+-------+ +| Time| s1| s2| s3| s4| s5| s6| ++-----------------------------+-------+-------+-------+--------+-------+-------+ +|1970-01-01T08:00:00.000+08:00| 4.5| 9.0| 0.0| 45.0| 9.0| 8.25| +|1970-01-01T08:00:00.010+08:00| null| 19.0| 10.0| 145.0| 19.0| 8.25| +|1970-01-01T08:00:00.020+08:00| 24.5| 29.0| null| 245.0| 29.0| null| +|1970-01-01T08:00:00.030+08:00| 34.5| null| 30.0| 345.0| null| null| +|1970-01-01T08:00:00.040+08:00| 44.5| 49.0| 40.0| 445.0| 49.0| 8.25| +|1970-01-01T08:00:00.050+08:00| null| 59.0| 50.0| 545.0| 59.0| 6.25| +|1970-01-01T08:00:00.060+08:00| 64.5| 69.0| 60.0| 645.0| 69.0| null| +|1970-01-01T08:00:00.070+08:00| 74.5| 79.0| null| null| 79.0| 3.25| +|1970-01-01T08:00:00.080+08:00| 84.5| 89.0| 80.0| 845.0| 89.0| 3.25| +|1970-01-01T08:00:00.090+08:00| 94.5| 99.0| 90.0| 945.0| 99.0| 3.25| +|1970-01-01T08:00:00.150+08:00| 66.5| 77.0| 90.0| 945.0| 99.0| 9.25| ++-----------------------------+-------+-------+-------+--------+-------+-------+ +``` +#### delta=0时的等值事件分段 +使用如下sql语句 +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6) +``` +得到如下的查询结果,这里忽略了s6为null的行 +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.040+08:00| 24.5| 3| 50.0| +|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +当指定ignoreNull为false时,会将s6为null的数据也考虑进来 +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, ignoreNull=false) +``` +得到如下的结果 +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| +|1970-01-01T08:00:00.020+08:00|1970-01-01T08:00:00.030+08:00| 29.5| 1| 30.0| +|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.040+08:00| 44.5| 1| 40.0| +|1970-01-01T08:00:00.050+08:00|1970-01-01T08:00:00.050+08:00| null| 1| 50.0| +|1970-01-01T08:00:00.060+08:00|1970-01-01T08:00:00.060+08:00| 64.5| 1| 60.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +#### delta!=0时的差值事件分段 +使用如下sql语句 +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6, 4) +``` +得到如下的查询结果 +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| 
++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.050+08:00| 24.5| 4| 100.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.090+08:00| 84.5| 3| 170.0| +|1970-01-01T08:00:00.150+08:00|1970-01-01T08:00:00.150+08:00| 66.5| 1| 90.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +group by子句中的controlExpression同样支持列的表达式 + +```sql +select __endTime, avg(s1), count(s2), sum(s3) from root.sg.d group by variation(s6+s5, 10) +``` +得到如下的查询结果 +``` ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +| Time| __endTime|avg(root.sg.d.s1)|count(root.sg.d.s2)|sum(root.sg.d.s3)| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +|1970-01-01T08:00:00.000+08:00|1970-01-01T08:00:00.010+08:00| 4.5| 2| 10.0| +|1970-01-01T08:00:00.040+08:00|1970-01-01T08:00:00.050+08:00| 44.5| 2| 90.0| +|1970-01-01T08:00:00.070+08:00|1970-01-01T08:00:00.080+08:00| 79.5| 2| 80.0| +|1970-01-01T08:00:00.090+08:00|1970-01-01T08:00:00.150+08:00| 80.5| 2| 180.0| ++-----------------------------+-----------------------------+-----------------+-------------------+-----------------+ +``` +### 条件分段聚合 +当需要根据指定条件对数据进行筛选,并将连续的符合条件的行分为一组进行聚合运算时,可以使用`GROUP BY CONDITION`的分段方式;不满足给定条件的行因为不属于任何分组会被直接简单忽略。 +其语法定义如下: +```sql +group by condition(predict,[keep>/>=/=/<=/<]threshold,[,ignoreNull=true/false]) +``` +* predict + +返回boolean数据类型的合法表达式,用于分组的筛选。 +* keep[>/>=/=/<=/<]threshold + +keep表达式用来指定形成分组所需要连续满足`predict`条件的数据行数,只有行数满足keep表达式的分组才会被输出。keep表达式由一个'keep'字符串和`long`类型的threshold组合或者是单独的`long`类型数据构成。 + +* ignoreNull=true/false + +用于指定遇到predict为null的数据行时的处理方式,为true则跳过该行,为false则结束当前分组。 + +#### 使用注意事项 +1. keep条件在查询中是必需的,但可以省略掉keep字符串给出一个`long`类型常数,默认为`keep=该long型常数`的等于条件。 +2. `ignoreNull`默认为true。 +3. 对于一个分组,默认Time列输出分组的开始时间,查询时可以使用select `__endTime`的方式来使得结果输出分组的结束时间。 +4. 与`ALIGN BY DEVICE`搭配使用时会对每个device进行单独的分组操作。 +5. 
当前暂不支持与`GROUP BY LEVEL`搭配使用。 + + +对于如下原始数据,下面会给出几个查询样例: +``` ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +| Time|root.sg.beijing.car01.soc|root.sg.beijing.car01.charging_status|root.sg.beijing.car01.vehicle_status| ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 14.0| 1| 1| +|1970-01-01T08:00:00.002+08:00| 16.0| 1| 1| +|1970-01-01T08:00:00.003+08:00| 16.0| 0| 1| +|1970-01-01T08:00:00.004+08:00| 16.0| 0| 1| +|1970-01-01T08:00:00.005+08:00| 18.0| 1| 1| +|1970-01-01T08:00:00.006+08:00| 24.0| 1| 1| +|1970-01-01T08:00:00.007+08:00| 36.0| 1| 1| +|1970-01-01T08:00:00.008+08:00| 36.0| null| 1| +|1970-01-01T08:00:00.009+08:00| 45.0| 1| 1| +|1970-01-01T08:00:00.010+08:00| 60.0| 1| 1| ++-----------------------------+-------------------------+-------------------------------------+------------------------------------+ +``` +查询至少连续两行以上的charging_status=1的数据,sql语句如下: +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoreNull=true) +``` +得到结果如下: +``` ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| +|1970-01-01T08:00:00.005+08:00| 10| 5| 60.0| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +``` +当设置`ignoreNull`为false时,遇到null值为将其视为一个不满足条件的行,会结束正在计算的分组。 +```sql +select max_time(charging_status),count(vehicle_status),last_value(soc) from root.** group by condition(charging_status=1,KEEP>=2,ignoreNull=false) +``` +得到如下结果,原先的分组被含null的行拆分: +``` ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +| Time|max_time(root.sg.beijing.car01.charging_status)|count(root.sg.beijing.car01.vehicle_status)|last_value(root.sg.beijing.car01.soc)| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 2| 2| 16.0| +|1970-01-01T08:00:00.005+08:00| 7| 3| 36.0| +|1970-01-01T08:00:00.009+08:00| 10| 2| 60.0| ++-----------------------------+-----------------------------------------------+-------------------------------------------+-------------------------------------+ +``` +### 会话分段聚合 +`GROUP BY SESSION`可以根据时间列的间隔进行分组,在结果集的时间列中,时间间隔小于等于设定阈值的数据会被分为一组。例如在工业场景中,设备并不总是连续运行,`GROUP BY SESSION`会将设备每次接入会话所产生的数据分为一组。 +其语法定义如下: +```sql +group by session(timeInterval) +``` +* timeInterval + +设定的时间差阈值,当两条数据时间列的差值大于该阈值,则会给数据创建一个新的分组。 + +下图为`group by session`下的一个分组示意图 + + + +#### 使用注意事项 +1. 对于一个分组,默认Time列输出分组的开始时间,查询时可以使用select `__endTime`的方式来使得结果输出分组的结束时间。 +2. 与`ALIGN BY DEVICE`搭配使用时会对每个device进行单独的分组操作。 +3. 
当前暂不支持与`GROUP BY LEVEL`搭配使用。 + +对于下面的原始数据,给出几个查询样例。 +``` ++-----------------------------+-----------------+-----------+--------+------+ +| Time| Device|temperature|hardware|status| ++-----------------------------+-----------------+-----------+--------+------+ +|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01| 35.7| 11| false| +|1970-01-01T08:00:02.000+08:00|root.ln.wf02.wt01| 35.8| 22| true| +|1970-01-01T08:00:03.000+08:00|root.ln.wf02.wt01| 35.4| 33| false| +|1970-01-01T08:00:04.000+08:00|root.ln.wf02.wt01| 36.4| 44| false| +|1970-01-01T08:00:05.000+08:00|root.ln.wf02.wt01| 36.8| 55| false| +|1970-01-01T08:00:10.000+08:00|root.ln.wf02.wt01| 36.8| 110| false| +|1970-01-01T08:00:20.000+08:00|root.ln.wf02.wt01| 37.8| 220| true| +|1970-01-01T08:00:30.000+08:00|root.ln.wf02.wt01| 37.5| 330| false| +|1970-01-01T08:00:40.000+08:00|root.ln.wf02.wt01| 37.4| 440| false| +|1970-01-01T08:00:50.000+08:00|root.ln.wf02.wt01| 37.9| 550| false| +|1970-01-01T08:01:40.000+08:00|root.ln.wf02.wt01| 38.0| 110| false| +|1970-01-01T08:02:30.000+08:00|root.ln.wf02.wt01| 38.8| 220| true| +|1970-01-01T08:03:20.000+08:00|root.ln.wf02.wt01| 38.6| 330| false| +|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01| 38.4| 440| false| +|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01| 38.3| 550| false| +|1970-01-01T08:06:40.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-01T08:07:50.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-01T08:08:00.000+08:00|root.ln.wf02.wt01| null| 0| null| +|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01| 38.2| 110| false| +|1970-01-02T08:08:02.000+08:00|root.ln.wf02.wt01| 37.5| 220| true| +|1970-01-02T08:08:03.000+08:00|root.ln.wf02.wt01| 37.4| 330| false| +|1970-01-02T08:08:04.000+08:00|root.ln.wf02.wt01| 36.8| 440| false| +|1970-01-02T08:08:05.000+08:00|root.ln.wf02.wt01| 37.4| 550| false| ++-----------------------------+-----------------+-----------+--------+------+ +``` +可以按照不同的时间单位设定时间间隔,sql语句如下: +```sql +select __endTime,count(*) from root.** group by session(1d) +``` +得到如下结果: +``` ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +| Time| __endTime|count(root.ln.wf02.wt01.temperature)|count(root.ln.wf02.wt01.hardware)|count(root.ln.wf02.wt01.status)| ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +|1970-01-01T08:00:01.000+08:00|1970-01-01T08:08:00.000+08:00| 15| 18| 15| +|1970-01-02T08:08:01.000+08:00|1970-01-02T08:08:05.000+08:00| 5| 5| 5| ++-----------------------------+-----------------------------+------------------------------------+---------------------------------+-------------------------------+ +``` +也可以和`HAVING`、`ALIGN BY DEVICE`共同使用 +```sql +select __endTime,sum(hardware) from root.ln.wf02.wt01 group by session(50s) having sum(hardware)>0 align by device +``` +得到如下结果,其中排除了`sum(hardware)`为0的部分 +``` ++-----------------------------+-----------------+-----------------------------+-------------+ +| Time| Device| __endTime|sum(hardware)| ++-----------------------------+-----------------+-----------------------------+-------------+ +|1970-01-01T08:00:01.000+08:00|root.ln.wf02.wt01|1970-01-01T08:03:20.000+08:00| 2475.0| +|1970-01-01T08:04:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:04:20.000+08:00| 440.0| +|1970-01-01T08:05:20.000+08:00|root.ln.wf02.wt01|1970-01-01T08:05:20.000+08:00| 550.0| 
+|1970-01-02T08:08:01.000+08:00|root.ln.wf02.wt01|1970-01-02T08:08:05.000+08:00|       1650.0|
++-----------------------------+-----------------+-----------------------------+-------------+
+```
+### 点数分段聚合
+`GROUP BY COUNT`可以根据点数分组进行聚合运算,将连续的指定数量数据点分为一组,即按照固定的点数进行分组。
+其语法定义如下:
+```sql
+group by count(controlExpression, size[,ignoreNull=true/false])
+```
+* controlExpression
+
+计数参照的对象,可以是结果集的任意列或是列的表达式
+
+* size
+
+一个组中数据点的数量,每`size`个数据点会被分到同一个组
+
+* ignoreNull=true/false
+
+是否忽略`controlExpression`为null的数据点,当ignoreNull为true时,在计数时会跳过`controlExpression`结果为null的数据点
+
+#### 使用注意事项
+1. 对于一个分组,默认Time列输出分组的开始时间,查询时可以使用select `__endTime`的方式来使得结果输出分组的结束时间。
+2. 与`ALIGN BY DEVICE`搭配使用时会对每个device进行单独的分组操作。
+3. 当前暂不支持与`GROUP BY LEVEL`搭配使用。
+4. 当一个分组内最终的点数不满足`size`的数量时,不会输出该分组的结果。
+
+对于下面的原始数据,给出几个查询样例。
+```
++-----------------------------+-----------+-----------------------+
+|                         Time|root.sg.soc|root.sg.charging_status|
++-----------------------------+-----------+-----------------------+
+|1970-01-01T08:00:00.001+08:00|       14.0|                      1|
+|1970-01-01T08:00:00.002+08:00|       16.0|                      1|
+|1970-01-01T08:00:00.003+08:00|       16.0|                      0|
+|1970-01-01T08:00:00.004+08:00|       16.0|                      0|
+|1970-01-01T08:00:00.005+08:00|       18.0|                      1|
+|1970-01-01T08:00:00.006+08:00|       24.0|                      1|
+|1970-01-01T08:00:00.007+08:00|       36.0|                      1|
+|1970-01-01T08:00:00.008+08:00|       36.0|                   null|
+|1970-01-01T08:00:00.009+08:00|       45.0|                      1|
+|1970-01-01T08:00:00.010+08:00|       60.0|                      1|
++-----------------------------+-----------+-----------------------+
+```
+SQL 语句如下
+```sql
+select count(charging_status), first_value(soc) from root.sg group by count(charging_status,5)
+```
+得到如下结果,其中第二个窗口(1970-01-01T08:00:00.006+08:00 到 1970-01-01T08:00:00.010+08:00)内只有四个非null点,不符合`size = 5`的条件,因此不被输出
+```
++-----------------------------+-----------------------------+--------------------------------------+
+|                         Time|                    __endTime|first_value(root.sg.beijing.car01.soc)|
++-----------------------------+-----------------------------+--------------------------------------+
+|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00|                                  14.0|
++-----------------------------+-----------------------------+--------------------------------------+
+```
+而当设置ignoreNull=false,将null值也考虑进来时,可以得到两个点计数为5的窗口,SQL 如下
+```sql
+select count(charging_status), first_value(soc) from root.sg group by count(charging_status,5,ignoreNull=false)
+```
+得到如下结果
+```
++-----------------------------+-----------------------------+--------------------------------------+
+|                         Time|                    __endTime|first_value(root.sg.beijing.car01.soc)|
++-----------------------------+-----------------------------+--------------------------------------+
+|1970-01-01T08:00:00.001+08:00|1970-01-01T08:00:00.005+08:00|                                  14.0|
+|1970-01-01T08:00:00.006+08:00|1970-01-01T08:00:00.010+08:00|                                  24.0|
++-----------------------------+-----------------------------+--------------------------------------+
+```
+## 分组聚合
+
+### 路径层级分组聚合
+
+在时间序列层级结构中,路径层级分组聚合查询用于**对某一层级下同名的序列进行聚合查询**。
+
+- 使用 `GROUP BY LEVEL = INT` 来指定需要聚合的层级,并约定 `ROOT` 为第 0 层。若统计 "root.ln" 下所有序列则需指定 level 为 1。
+- 路径层次分组聚合查询支持使用所有内置聚合函数。对于 `sum`,`avg`,`min_value`, `max_value`, `extreme` 五种聚合函数,需保证所有聚合的时间序列数据类型相同。其他聚合函数没有此限制。
+
+**示例1:** 不同 database 下均存在名为 status 的序列, 如 "root.ln.wf01.wt01.status", "root.ln.wf02.wt02.status", 以及 "root.sgcc.wf03.wt01.status", 如果需要统计不同 database 下 status 序列的数据点个数,使用以下查询:
+
+```sql
+select count(status) from root.** group by level = 1
+```
+
+运行结果为:
+
+```
++-------------------------+---------------------------+
+|count(root.ln.*.*.status)|count(root.sgcc.*.*.status)|
++-------------------------+---------------------------+ +| 20160| 10080| ++-------------------------+---------------------------+ +Total line number = 1 +It costs 0.003s +``` + +**示例2:** 统计不同设备下 status 序列的数据点个数,可以规定 level = 3, + +```sql +select count(status) from root.** group by level = 3 +``` + +运行结果为: + +``` ++---------------------------+---------------------------+ +|count(root.*.*.wt01.status)|count(root.*.*.wt02.status)| ++---------------------------+---------------------------+ +| 20160| 10080| ++---------------------------+---------------------------+ +Total line number = 1 +It costs 0.003s +``` + +注意,这时会将 database `ln` 和 `sgcc` 下名为 `wt01` 的设备视为同名设备聚合在一起。 + +**示例3:** 统计不同 database 下的不同设备中 status 序列的数据点个数,可以使用以下查询: + +```sql +select count(status) from root.** group by level = 1, 3 +``` + +运行结果为: + +``` ++----------------------------+----------------------------+------------------------------+ +|count(root.ln.*.wt01.status)|count(root.ln.*.wt02.status)|count(root.sgcc.*.wt01.status)| ++----------------------------+----------------------------+------------------------------+ +| 10080| 10080| 10080| ++----------------------------+----------------------------+------------------------------+ +Total line number = 1 +It costs 0.003s +``` + +**示例4:** 查询所有序列下温度传感器 temperature 的最大值,可以使用下列查询语句: + +```sql +select max_value(temperature) from root.** group by level = 0 +``` + +运行结果: + +``` ++---------------------------------+ +|max_value(root.*.*.*.temperature)| ++---------------------------------+ +| 26.0| ++---------------------------------+ +Total line number = 1 +It costs 0.013s +``` + +**示例5:** 上面的查询都是针对某一个传感器,特别地,**如果想要查询某一层级下所有传感器拥有的总数据点数,则需要显式规定测点为 `*`** + +```sql +select count(*) from root.ln.** group by level = 2 +``` + +运行结果: + +``` ++----------------------+----------------------+ +|count(root.*.wf01.*.*)|count(root.*.wf02.*.*)| ++----------------------+----------------------+ +| 20160| 20160| ++----------------------+----------------------+ +Total line number = 1 +It costs 0.013s +``` + +#### 与时间区间分段聚合混合使用 + +通过定义 LEVEL 来统计指定层级下的数据点个数。 + +例如: + +统计降采样后的数据点个数 + +```sql +select count(status) from root.ln.wf01.wt01 group by ((2017-11-01T00:00:00, 2017-11-07T23:00:00],1d), level=1; +``` + +结果: + +``` ++-----------------------------+-------------------------+ +| Time|COUNT(root.ln.*.*.status)| ++-----------------------------+-------------------------+ +|2017-11-02T00:00:00.000+08:00| 1440| +|2017-11-03T00:00:00.000+08:00| 1440| +|2017-11-04T00:00:00.000+08:00| 1440| +|2017-11-05T00:00:00.000+08:00| 1440| +|2017-11-06T00:00:00.000+08:00| 1440| +|2017-11-07T00:00:00.000+08:00| 1440| +|2017-11-07T23:00:00.000+08:00| 1380| ++-----------------------------+-------------------------+ +Total line number = 7 +It costs 0.006s +``` + +加上滑动 Step 的降采样后的结果也可以汇总 + +```sql +select count(status) from root.ln.wf01.wt01 group by ([2017-11-01 00:00:00, 2017-11-07 23:00:00), 3h, 1d), level=1; +``` + +``` ++-----------------------------+-------------------------+ +| Time|COUNT(root.ln.*.*.status)| ++-----------------------------+-------------------------+ +|2017-11-01T00:00:00.000+08:00| 180| +|2017-11-02T00:00:00.000+08:00| 180| +|2017-11-03T00:00:00.000+08:00| 180| +|2017-11-04T00:00:00.000+08:00| 180| +|2017-11-05T00:00:00.000+08:00| 180| +|2017-11-06T00:00:00.000+08:00| 180| +|2017-11-07T00:00:00.000+08:00| 180| ++-----------------------------+-------------------------+ +Total line number = 7 +It costs 0.004s +``` + +### 标签分组聚合 + +IoTDB 支持通过 `GROUP BY TAGS` 语句根据时间序列中定义的标签的键值做分组聚合查询。 + +我们先在 IoTDB 
中写入如下示例数据,稍后会以这些数据为例介绍标签聚合查询。 + +这些是某工厂 `factory1` 在多个城市的多个车间的设备温度数据, 时间范围为 [1000, 10000)。 + +时间序列路径中的设备一级是设备唯一标识。城市信息 `city` 和车间信息 `workshop` 则被建模在该设备时间序列的标签中。 +其中,设备 `d1`、`d2` 在 `Beijing` 的 `w1` 车间, `d3`、`d4` 在 `Beijing` 的 `w2` 车间,`d5`、`d6` 在 `Shanghai` 的 `w1` 车间,`d7` 在 `Shanghai` 的 `w2` 车间。 +`d8` 和 `d9` 设备目前处于调试阶段,还未被分配到具体的城市和车间,所以其相应的标签值为空值。 + +```SQL +create database root.factory1; +create timeseries root.factory1.d1.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1); +create timeseries root.factory1.d2.temperature with datatype=FLOAT tags(city=Beijing, workshop=w1); +create timeseries root.factory1.d3.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2); +create timeseries root.factory1.d4.temperature with datatype=FLOAT tags(city=Beijing, workshop=w2); +create timeseries root.factory1.d5.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1); +create timeseries root.factory1.d6.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w1); +create timeseries root.factory1.d7.temperature with datatype=FLOAT tags(city=Shanghai, workshop=w2); +create timeseries root.factory1.d8.temperature with datatype=FLOAT; +create timeseries root.factory1.d9.temperature with datatype=FLOAT; + +insert into root.factory1.d1(time, temperature) values(1000, 104.0); +insert into root.factory1.d1(time, temperature) values(3000, 104.2); +insert into root.factory1.d1(time, temperature) values(5000, 103.3); +insert into root.factory1.d1(time, temperature) values(7000, 104.1); + +insert into root.factory1.d2(time, temperature) values(1000, 104.4); +insert into root.factory1.d2(time, temperature) values(3000, 103.7); +insert into root.factory1.d2(time, temperature) values(5000, 103.3); +insert into root.factory1.d2(time, temperature) values(7000, 102.9); + +insert into root.factory1.d3(time, temperature) values(1000, 103.9); +insert into root.factory1.d3(time, temperature) values(3000, 103.8); +insert into root.factory1.d3(time, temperature) values(5000, 102.7); +insert into root.factory1.d3(time, temperature) values(7000, 106.9); + +insert into root.factory1.d4(time, temperature) values(1000, 103.9); +insert into root.factory1.d4(time, temperature) values(5000, 102.7); +insert into root.factory1.d4(time, temperature) values(7000, 106.9); + +insert into root.factory1.d5(time, temperature) values(1000, 112.9); +insert into root.factory1.d5(time, temperature) values(7000, 113.0); + +insert into root.factory1.d6(time, temperature) values(1000, 113.9); +insert into root.factory1.d6(time, temperature) values(3000, 113.3); +insert into root.factory1.d6(time, temperature) values(5000, 112.7); +insert into root.factory1.d6(time, temperature) values(7000, 112.3); + +insert into root.factory1.d7(time, temperature) values(1000, 101.2); +insert into root.factory1.d7(time, temperature) values(3000, 99.3); +insert into root.factory1.d7(time, temperature) values(5000, 100.1); +insert into root.factory1.d7(time, temperature) values(7000, 99.8); + +insert into root.factory1.d8(time, temperature) values(1000, 50.0); +insert into root.factory1.d8(time, temperature) values(3000, 52.1); +insert into root.factory1.d8(time, temperature) values(5000, 50.1); +insert into root.factory1.d8(time, temperature) values(7000, 50.5); + +insert into root.factory1.d9(time, temperature) values(1000, 50.3); +insert into root.factory1.d9(time, temperature) values(3000, 52.1); +``` + +#### 单标签聚合查询 + +用户想统计该工厂每个地区的设备的温度的平均值,可以使用如下查询语句 + +```SQL +SELECT AVG(temperature) FROM root.factory1.** GROUP BY TAGS(city); 
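+-- 说明(示意注释,非查询本身的一部分,执行时可去掉):
+-- GROUP BY TAGS(city) 会把 city 标签值相同的所有序列的数据合并为一个整体做聚合;
+-- 未设置 city 标签的序列(如上文的 d8、d9)会被归入标签值为 NULL 的分组,对应下文结果中的 NULL 行。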
+``` + +该查询会将具有同一个 `city` 标签值的时间序列的所有满足查询条件的点做平均值计算,计算结果如下 + +``` ++--------+------------------+ +| city| avg(temperature)| ++--------+------------------+ +| Beijing|104.04666697184244| +|Shanghai|107.85000076293946| +| NULL| 50.84999910990397| ++--------+------------------+ +Total line number = 3 +It costs 0.231s +``` + +从结果集中可以看到,和分段聚合、按层次分组聚合相比,标签聚合的查询结果的不同点是: +1. 标签聚合查询的聚合结果不会再做去星号展开,而是将多个时间序列的数据作为一个整体进行聚合计算。 +2. 标签聚合查询除了输出聚合结果列,还会输出聚合标签的键值列。该列的列名为聚合指定的标签键,列的值则为所有查询的时间序列中出现的该标签的值。 +如果某些时间序列未设置该标签,则在键值列中有一行单独的 `NULL` ,代表未设置标签的所有时间序列数据的聚合结果。 + +#### 多标签分组聚合查询 + +除了基本的单标签聚合查询外,还可以按顺序指定多个标签进行聚合计算。 + +例如,用户想统计每个城市的每个车间内设备的平均温度。但因为各个城市的车间名称有可能相同,所以不能直接按照 `workshop` 做标签聚合。必须要先按照城市,再按照车间处理。 + +SQL 语句如下 + +```SQL +SELECT avg(temperature) FROM root.factory1.** GROUP BY TAGS(city, workshop); +``` + +查询结果如下 + +``` ++--------+--------+------------------+ +| city|workshop| avg(temperature)| ++--------+--------+------------------+ +| NULL| NULL| 50.84999910990397| +|Shanghai| w1|113.01666768391927| +| Beijing| w2| 104.4000004359654| +|Shanghai| w2|100.10000038146973| +| Beijing| w1|103.73750019073486| ++--------+--------+------------------+ +Total line number = 5 +It costs 0.027s +``` + +从结果集中可以看到,和单标签聚合相比,多标签聚合的查询结果会根据指定的标签顺序,输出相应标签的键值列。 + +#### 基于时间区间的标签聚合查询 + +按照时间区间聚合是时序数据库中最常用的查询需求之一。IoTDB 在基于时间区间的聚合基础上,支持进一步按照标签进行聚合查询。 + +例如,用户想统计时间 `[1000, 10000)` 范围内,每个城市每个车间中的设备每 5 秒内的平均温度。 + +SQL 语句如下 + +```SQL +SELECT AVG(temperature) FROM root.factory1.** GROUP BY ([1000, 10000), 5s), TAGS(city, workshop); +``` + +查询结果如下 + +``` ++-----------------------------+--------+--------+------------------+ +| Time| city|workshop| avg(temperature)| ++-----------------------------+--------+--------+------------------+ +|1970-01-01T08:00:01.000+08:00| NULL| NULL| 50.91999893188476| +|1970-01-01T08:00:01.000+08:00|Shanghai| w1|113.20000076293945| +|1970-01-01T08:00:01.000+08:00| Beijing| w2| 103.4| +|1970-01-01T08:00:01.000+08:00|Shanghai| w2| 100.1999994913737| +|1970-01-01T08:00:01.000+08:00| Beijing| w1|103.81666692097981| +|1970-01-01T08:00:06.000+08:00| NULL| NULL| 50.5| +|1970-01-01T08:00:06.000+08:00|Shanghai| w1| 112.6500015258789| +|1970-01-01T08:00:06.000+08:00| Beijing| w2| 106.9000015258789| +|1970-01-01T08:00:06.000+08:00|Shanghai| w2| 99.80000305175781| +|1970-01-01T08:00:06.000+08:00| Beijing| w1| 103.5| ++-----------------------------+--------+--------+------------------+ +``` + +和标签聚合相比,基于时间区间的标签聚合的查询会首先按照时间区间划定聚合范围,在时间区间内部再根据指定的标签顺序,进行相应数据的聚合计算。在输出的结果集中,会包含一列时间列,该时间列值的含义和时间区间聚合查询的相同。 + +#### 标签分组聚合的限制 + +由于标签聚合功能仍然处于开发阶段,目前有如下未实现功能。 + +> 1. 暂不支持 `HAVING` 子句过滤查询结果。 +> 2. 暂不支持结果按照标签值排序。 +> 3. 暂不支持 `LIMIT`,`OFFSET`,`SLIMIT`,`SOFFSET`。 +> 4. 暂不支持 `ALIGN BY DEVICE`。 +> 5. 暂不支持聚合函数内部包含表达式,例如 `count(s+1)`。 +> 6. 不支持值过滤条件聚合,和分层聚合查询行为保持一致。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Having-Condition.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Having-Condition.md new file mode 100644 index 00000000..96695579 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Having-Condition.md @@ -0,0 +1,115 @@ + + +# 聚合结果过滤 + +如果想对聚合查询的结果进行过滤,可以在 `GROUP BY` 子句之后使用 `HAVING` 子句。 + +**注意:** + +1. `HAVING`子句中的过滤条件必须由聚合值构成,原始序列不能单独出现。 + + 下列使用方式是不正确的: + ```sql + select count(s1) from root.** group by ([1,3),1ms) having sum(s1) > s1 + select count(s1) from root.** group by ([1,3),1ms) having s1 > 1 + ``` + +2. 
对`GROUP BY LEVEL`结果进行过滤时,`SELECT`和`HAVING`中出现的PATH只能有一级。 + + 下列使用方式是不正确的: + ```sql + select count(s1) from root.** group by ([1,3),1ms), level=1 having sum(d1.s1) > 1 + select count(d1.s1) from root.** group by ([1,3),1ms), level=1 having sum(s1) > 1 + ``` + +**SQL 示例:** + +- **示例 1:** + + 对于以下聚合结果进行过滤: + + ``` + +-----------------------------+---------------------+---------------------+ + | Time|count(root.test.*.s1)|count(root.test.*.s2)| + +-----------------------------+---------------------+---------------------+ + |1970-01-01T08:00:00.001+08:00| 4| 4| + |1970-01-01T08:00:00.003+08:00| 1| 0| + |1970-01-01T08:00:00.005+08:00| 2| 4| + |1970-01-01T08:00:00.007+08:00| 3| 2| + |1970-01-01T08:00:00.009+08:00| 4| 4| + +-----------------------------+---------------------+---------------------+ + ``` + + ```sql + select count(s1) from root.** group by ([1,11),2ms), level=1 having count(s2) > 2; + ``` + + 执行结果如下: + + ``` + +-----------------------------+---------------------+ + | Time|count(root.test.*.s1)| + +-----------------------------+---------------------+ + |1970-01-01T08:00:00.001+08:00| 4| + |1970-01-01T08:00:00.005+08:00| 2| + |1970-01-01T08:00:00.009+08:00| 4| + +-----------------------------+---------------------+ + ``` + +- **示例 2:** + + 对于以下聚合结果进行过滤: + ``` + +-----------------------------+-------------+---------+---------+ + | Time| Device|count(s1)|count(s2)| + +-----------------------------+-------------+---------+---------+ + |1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| + |1970-01-01T08:00:00.003+08:00|root.test.sg1| 1| 0| + |1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| + |1970-01-01T08:00:00.007+08:00|root.test.sg1| 2| 1| + |1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| + |1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| + |1970-01-01T08:00:00.003+08:00|root.test.sg2| 0| 0| + |1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| + |1970-01-01T08:00:00.007+08:00|root.test.sg2| 1| 1| + |1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| + +-----------------------------+-------------+---------+---------+ + ``` + + ```sql + select count(s1), count(s2) from root.** group by ([1,11),2ms) having count(s2) > 1 align by device; + ``` + + 执行结果如下: + + ``` + +-----------------------------+-------------+---------+---------+ + | Time| Device|count(s1)|count(s2)| + +-----------------------------+-------------+---------+---------+ + |1970-01-01T08:00:00.001+08:00|root.test.sg1| 1| 2| + |1970-01-01T08:00:00.005+08:00|root.test.sg1| 1| 2| + |1970-01-01T08:00:00.009+08:00|root.test.sg1| 2| 2| + |1970-01-01T08:00:00.001+08:00|root.test.sg2| 2| 2| + |1970-01-01T08:00:00.005+08:00|root.test.sg2| 1| 2| + |1970-01-01T08:00:00.009+08:00|root.test.sg2| 2| 2| + +-----------------------------+-------------+---------+---------+ + ``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Order-By.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Order-By.md new file mode 100644 index 00000000..f6821cd9 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Order-By.md @@ -0,0 +1,277 @@ + + +# 结果集排序 + +## 时间对齐模式下的排序 +IoTDB的查询结果集默认按照时间对齐,可以使用`ORDER BY TIME`的子句指定时间戳的排列顺序。示例代码如下: +```sql +select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time desc; +``` +执行结果: + +``` ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| 
++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+
+|2017-11-01T00:01:00.000+08:00|                        v2|                    true|                        24.36|                    true|
+|2017-11-01T00:00:00.000+08:00|                        v2|                    true|                        25.96|                    true|
+|1970-01-01T08:00:00.002+08:00|                        v2|                   false|                         null|                    null|
+|1970-01-01T08:00:00.001+08:00|                        v1|                    true|                         null|                    null|
++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+
+```
+## 设备对齐模式下的排序
+当使用`ALIGN BY DEVICE`查询对齐模式下的结果集时,可以使用`ORDER BY`子句对返回的结果集顺序进行规定。
+
+在设备对齐模式下支持4种排序模式的子句,其中包括两种排序键,`DEVICE`和`TIME`,靠前的排序键为主排序键,每种排序键都支持`ASC`和`DESC`两种排列顺序。
+1. ``ORDER BY DEVICE``: 按照设备名的字典序进行排序,在这种情况下,同名的设备会以组的形式进行展示。
+
+2. ``ORDER BY TIME``: 按照时间戳进行排序,此时不同设备对应的数据点会按照时间戳交叉排列。
+
+3. ``ORDER BY DEVICE,TIME``: 按照设备名的字典序进行排序,设备名相同的数据点会通过时间戳进行排序。
+
+4. ``ORDER BY TIME,DEVICE``: 按照时间戳进行排序,时间戳相同的数据点会通过设备名的字典序进行排序。
+
+> 为了保证结果的可观性,当不使用`ORDER BY`子句,仅使用`ALIGN BY DEVICE`时,会为设备视图提供默认的排序方式。其中默认的排序方式为``ORDER BY DEVICE,TIME``,默认的排序顺序为`ASC`,
+> 即结果集默认先按照设备名升序排列,在相同设备名内再按照时间戳升序排序。
+
+
+当主排序键为`DEVICE`时,结果集的格式与默认情况类似:先按照设备名对结果进行排列,在相同的设备名内再按照时间戳进行排序。示例代码如下:
+```sql
+select * from root.ln.** where time <= 2017-11-01T00:01:00 order by device desc,time asc align by device;
+```
+执行结果:
+
+```
++-----------------------------+-----------------+--------+------+-----------+
+|                         Time|           Device|hardware|status|temperature|
++-----------------------------+-----------------+--------+------+-----------+
+|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02|      v1|  true|       null|
+|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02|      v2| false|       null|
+|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02|      v2|  true|       null|
+|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02|      v2|  true|       null|
+|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01|    null|  true|      25.96|
+|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01|    null|  true|      24.36|
++-----------------------------+-----------------+--------+------+-----------+
+```
+主排序键为`Time`时,结果集会先按照时间戳进行排序,在时间戳相等时按照设备名排序。
+示例代码如下:
+```sql
+select * from root.ln.** where time <= 2017-11-01T00:01:00 order by time asc,device desc align by device;
+```
+执行结果:
+```
++-----------------------------+-----------------+--------+------+-----------+
+|                         Time|           Device|hardware|status|temperature|
++-----------------------------+-----------------+--------+------+-----------+
+|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02|      v1|  true|       null|
+|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02|      v2| false|       null|
+|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02|      v2|  true|       null|
+|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01|    null|  true|      25.96|
+|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02|      v2|  true|       null|
+|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01|    null|  true|      24.36|
++-----------------------------+-----------------+--------+------+-----------+
+```
+当没有显式指定时,主排序键默认为`Device`,排序顺序默认为`ASC`,示例代码如下:
+```sql
+select * from root.ln.** where time <= 2017-11-01T00:01:00 align by device;
+```
+结果如下所示,可以看出,`ORDER BY DEVICE ASC,TIME ASC`就是默认情况下的排序方式,由于`ASC`是默认排序顺序,此处可以省略。
+```
++-----------------------------+-----------------+--------+------+-----------+
+|                         Time|           Device|hardware|status|temperature|
++-----------------------------+-----------------+--------+------+-----------+
+|2017-11-01T00:00:00.000+08:00|root.ln.wf01.wt01|    null|  true|      25.96|
+|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01|    null|  true|      24.36|
+|1970-01-01T08:00:00.001+08:00|root.ln.wf02.wt02|      v1|  true|       null|
+|1970-01-01T08:00:00.002+08:00|root.ln.wf02.wt02| v2| false| null| +|2017-11-01T00:00:00.000+08:00|root.ln.wf02.wt02| v2| true| null| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| v2| true| null| ++-----------------------------+-----------------+--------+------+-----------+ +``` +同样,可以在聚合查询中使用`ALIGN BY DEVICE`和`ORDER BY`子句,对聚合后的结果进行排序,示例代码如下所示: +```sql +select count(*) from root.ln.** group by ((2017-11-01T00:00:00.000+08:00,2017-11-01T00:03:00.000+08:00],1m) order by device asc,time asc align by device +``` +执行结果: +``` ++-----------------------------+-----------------+---------------+-------------+------------------+ +| Time| Device|count(hardware)|count(status)|count(temperature)| ++-----------------------------+-----------------+---------------+-------------+------------------+ +|2017-11-01T00:01:00.000+08:00|root.ln.wf01.wt01| null| 1| 1| +|2017-11-01T00:02:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| +|2017-11-01T00:03:00.000+08:00|root.ln.wf01.wt01| null| 0| 0| +|2017-11-01T00:01:00.000+08:00|root.ln.wf02.wt02| 1| 1| null| +|2017-11-01T00:02:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| +|2017-11-01T00:03:00.000+08:00|root.ln.wf02.wt02| 0| 0| null| ++-----------------------------+-----------------+---------------+-------------+------------------+ +``` + +## 任意表达式排序 +除了IoTDB中规定的Time,Device关键字外,还可以通过`ORDER BY`子句对指定时间序列中任意列的表达式进行排序。 + +排序在通过`ASC`,`DESC`指定排序顺序的同时,可以通过`NULLS`语法来指定NULL值在排序中的优先级,`NULLS FIRST`默认NULL值在结果集的最上方,`NULLS LAST`则保证NULL值在结果集的最后。如果没有在子句中指定,则默认顺序为`ASC`,`NULLS LAST`。 + +对于如下的数据,将给出几个任意表达式的查询示例供参考: +``` ++-----------------------------+-------------+-------+-------+--------+-------+ +| Time| Device| base| score| bonus| total| ++-----------------------------+-------------+-------+-------+--------+-------+ +|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0| 107.0| +|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0| 105.0| +|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0| 103.0| +|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| +|1970-01-01T08:00:00.010+08:00| root.three| 9| null| 24.0| 33.0| +|1970-01-01T08:00:00.020+08:00| root.three| 8| null| 22.5| 30.5| +|1970-01-01T08:00:00.030+08:00| root.three| 7| null| 23.5| 30.5| +|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| +|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| +|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0| 104.0| +|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0| 102.0| ++-----------------------------+-------------+-------+-------+--------+-------+ +``` + +当需要根据基础分数score对结果进行排序时,可以直接使用 +```Sql +select score from root.** order by score desc align by device +``` +会得到如下结果 + +``` ++-----------------------------+---------+-----+ +| Time| Device|score| ++-----------------------------+---------+-----+ +|1970-01-01T08:00:00.040+08:00|root.five| 54.0| +|1970-01-01T08:00:00.030+08:00|root.five| 53.0| +|1970-01-01T08:00:00.000+08:00| root.one| 50.0| +|1970-01-02T08:00:00.000+08:00| root.one| 50.0| +|1970-01-03T08:00:00.000+08:00| root.one| 50.0| +|1970-01-01T08:00:00.000+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00| root.two| 10.0| ++-----------------------------+---------+-----+ +``` + +当想要根据总分对结果进行排序,可以在order by子句中使用表达式进行计算 +```Sql +select score,total from root.one order by base+score+bonus desc +``` +该sql等价于 +```Sql 
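+-- 等价性说明(示意注释,非查询本身的一部分,执行时可去掉):
+-- 在上文样例数据中 total 恰为 base + score + bonus,
+-- 因此按表达式 base+score+bonus 排序与直接按已有的 total 列排序,结果一致。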
+select score,total from root.one order by total desc +``` +得到如下结果 + +``` ++-----------------------------+--------------+--------------+ +| Time|root.one.score|root.one.total| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.000+08:00| 50.0| 107.0| +|1970-01-02T08:00:00.000+08:00| 50.0| 105.0| +|1970-01-03T08:00:00.000+08:00| 50.0| 103.0| ++-----------------------------+--------------+--------------+ +``` +而如果要对总分进行排序,且分数相同时依次根据score, base, bonus和提交时间进行排序时,可以通过多个表达式来指定多层排序 + +```Sql +select base, score, bonus, total from root.** order by total desc NULLS Last, + score desc NULLS Last, + bonus desc NULLS Last, + time desc align by device +``` +得到如下结果 +``` ++-----------------------------+----------+----+-----+-----+-----+ +| Time| Device|base|score|bonus|total| ++-----------------------------+----------+----+-----+-----+-----+ +|1970-01-01T08:00:00.000+08:00| root.one| 12| 50.0| 45.0|107.0| +|1970-01-02T08:00:00.000+08:00| root.one| 10| 50.0| 45.0|105.0| +|1970-01-01T08:00:00.030+08:00| root.five| 7| 53.0| 44.0|104.0| +|1970-01-03T08:00:00.000+08:00| root.one| 8| 50.0| 45.0|103.0| +|1970-01-01T08:00:00.040+08:00| root.five| 6| 54.0| 42.0|102.0| +|1970-01-01T08:00:00.010+08:00| root.four| 9| 32.0| 45.0| 86.0| +|1970-01-01T08:00:00.020+08:00| root.four| 8| 32.0| 45.0| 85.0| +|1970-01-01T08:00:00.010+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.000+08:00| root.two| 9| 50.0| 15.0| 74.0| +|1970-01-01T08:00:00.020+08:00| root.two| 8| 10.0| 15.0| 33.0| +|1970-01-01T08:00:00.010+08:00|root.three| 9| null| 24.0| 33.0| +|1970-01-01T08:00:00.030+08:00|root.three| 7| null| 23.5| 30.5| +|1970-01-01T08:00:00.020+08:00|root.three| 8| null| 22.5| 30.5| ++-----------------------------+----------+----+-----+-----+-----+ +``` +在order by中同样可以使用聚合查询表达式 +```Sql +select min_value(total) from root.** order by min_value(total) asc align by device +``` +得到如下结果 +``` ++----------+----------------+ +| Device|min_value(total)| ++----------+----------------+ +|root.three| 30.5| +| root.two| 33.0| +| root.four| 85.0| +| root.five| 102.0| +| root.one| 103.0| ++----------+----------------+ +``` +当在查询中指定多列,未被排序的列会随着行和排序列一起改变顺序,当排序列相同时行的顺序和具体实现有关(没有固定顺序) +```Sql +select min_value(total),max_value(base) from root.** order by max_value(total) desc align by device +``` +得到结果如下 +· +``` ++----------+----------------+---------------+ +| Device|min_value(total)|max_value(base)| ++----------+----------------+---------------+ +| root.one| 103.0| 12| +| root.five| 102.0| 7| +| root.four| 85.0| 9| +| root.two| 33.0| 9| +|root.three| 30.5| 9| ++----------+----------------+---------------+ +``` + +Order by device, time可以和order by expression共同使用 +```Sql +select score from root.** order by device asc, score desc, time asc align by device +``` +会得到如下结果 +``` ++-----------------------------+---------+-----+ +| Time| Device|score| ++-----------------------------+---------+-----+ +|1970-01-01T08:00:00.040+08:00|root.five| 54.0| +|1970-01-01T08:00:00.030+08:00|root.five| 53.0| +|1970-01-01T08:00:00.010+08:00|root.four| 32.0| +|1970-01-01T08:00:00.020+08:00|root.four| 32.0| +|1970-01-01T08:00:00.000+08:00| root.one| 50.0| +|1970-01-02T08:00:00.000+08:00| root.one| 50.0| +|1970-01-03T08:00:00.000+08:00| root.one| 50.0| +|1970-01-01T08:00:00.000+08:00| root.two| 50.0| +|1970-01-01T08:00:00.010+08:00| root.two| 50.0| +|1970-01-01T08:00:00.020+08:00| root.two| 10.0| ++-----------------------------+---------+-----+ +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Overview.md 
b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Overview.md new file mode 100644 index 00000000..6ce6120d --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Overview.md @@ -0,0 +1,342 @@ + + +# 数据查询 +## 概述 + +在 IoTDB 中,使用 `SELECT` 语句从一条或多条时间序列中查询数据。 + +### 语法定义 + +```sql +SELECT [LAST] selectExpr [, selectExpr] ... + [INTO intoItem [, intoItem] ...] + FROM prefixPath [, prefixPath] ... + [WHERE whereCondition] + [GROUP BY { + ([startTime, endTime), interval [, slidingStep]) | + LEVEL = levelNum [, levelNum] ... | + TAGS(tagKey [, tagKey] ... | + VARIATION(expression[,delta][,ignoreNull=true/false]) | + CONDITION(expression,[keep>/>=/=/ 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; +``` + +其含义为: + +被选择的设备为 ln 集团 wf01 子站 wt01 设备;被选择的时间序列为供电状态(status)和温度传感器(temperature);该语句要求选择出 “2017-11-01T00:05:00.000” 至 “2017-11-01T00:12:00.000” 之间的所选时间序列的值。 + +该 SQL 语句的执行结果如下: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 6 +It costs 0.018s +``` + +#### 示例3:按照多个时间区间选择同一设备的多列数据 + +IoTDB 支持在一次查询中指定多个时间区间条件,用户可以根据需求随意组合时间区间条件。例如, + +SQL 语句为: + +```sql +select status, temperature from root.ln.wf01.wt01 where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` + +其含义为: + +被选择的设备为 ln 集团 wf01 子站 wt01 设备;被选择的时间序列为“供电状态(status)”和“温度传感器(temperature)”;该语句指定了两个不同的时间区间,分别为“2017-11-01T00:05:00.000 至 2017-11-01T00:12:00.000”和“2017-11-01T16:35:00.000 至 2017-11-01T16:37:00.000”;该语句要求选择出满足任一时间区间的被选时间序列的值。 + +该 SQL 语句的执行结果如下: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| +|2017-11-01T00:10:00.000+08:00| true| 25.52| +|2017-11-01T00:11:00.000+08:00| false| 22.91| +|2017-11-01T16:35:00.000+08:00| true| 23.44| +|2017-11-01T16:36:00.000+08:00| false| 21.98| +|2017-11-01T16:37:00.000+08:00| false| 21.93| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 9 +It costs 0.018s +``` + +#### 示例4:按照多个时间区间选择不同设备的多列数据 + +该系统支持在一次查询中选择任意列的数据,也就是说,被选择的列可以来源于不同的设备。例如,SQL 语句为: + +```sql +select wf01.wt01.status, wf02.wt02.hardware from root.ln where (time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000) or (time >= 2017-11-01T16:35:00.000 and time <= 2017-11-01T16:37:00.000); +``` + +其含义为: + +被选择的时间序列为 “ln 集团 wf01 子站 wt01 设备的供电状态” 以及 “ln 集团 wf02 子站 wt02 设备的硬件版本”;该语句指定了两个时间区间,分别为 “2017-11-01T00:05:00.000 至 2017-11-01T00:12:00.000” 和 “2017-11-01T16:35:00.000 至 2017-11-01T16:37:00.000”;该语句要求选择出满足任意时间区间的被选时间序列的值。 + +该 SQL 语句的执行结果如下: + +``` 
++-----------------------------+------------------------+--------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf02.wt02.hardware| ++-----------------------------+------------------------+--------------------------+ +|2017-11-01T00:06:00.000+08:00| false| v1| +|2017-11-01T00:07:00.000+08:00| false| v1| +|2017-11-01T00:08:00.000+08:00| false| v1| +|2017-11-01T00:09:00.000+08:00| false| v1| +|2017-11-01T00:10:00.000+08:00| true| v2| +|2017-11-01T00:11:00.000+08:00| false| v1| +|2017-11-01T16:35:00.000+08:00| true| v2| +|2017-11-01T16:36:00.000+08:00| false| v1| +|2017-11-01T16:37:00.000+08:00| false| v1| ++-----------------------------+------------------------+--------------------------+ +Total line number = 9 +It costs 0.014s +``` + +#### 示例5:根据时间降序返回结果集 + +IoTDB 支持 `order by time` 语句,用于对结果按照时间进行降序展示。例如,SQL 语句为: + +```sql +select * from root.ln.** where time > 1 order by time desc limit 10; +``` + +语句执行的结果为: + +``` ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +|2017-11-07T23:59:00.000+08:00| v1| false| 21.07| false| +|2017-11-07T23:58:00.000+08:00| v1| false| 22.93| false| +|2017-11-07T23:57:00.000+08:00| v2| true| 24.39| true| +|2017-11-07T23:56:00.000+08:00| v2| true| 24.44| true| +|2017-11-07T23:55:00.000+08:00| v2| true| 25.9| true| +|2017-11-07T23:54:00.000+08:00| v1| false| 22.52| false| +|2017-11-07T23:53:00.000+08:00| v2| true| 24.58| true| +|2017-11-07T23:52:00.000+08:00| v1| false| 20.18| false| +|2017-11-07T23:51:00.000+08:00| v1| false| 22.24| false| +|2017-11-07T23:50:00.000+08:00| v2| true| 23.7| true| ++-----------------------------+--------------------------+------------------------+-----------------------------+------------------------+ +Total line number = 10 +It costs 0.016s +``` + +### 查询执行接口 + +在 IoTDB 中,提供两种方式执行数据查询操作: +- 使用 IoTDB-SQL 执行查询。 +- 常用查询的高效执行接口,包括时间序列原始数据范围查询、最新点查询、简单聚合查询。 + +#### 使用 IoTDB-SQL 执行查询 + +数据查询语句支持在 SQL 命令行终端、JDBC、JAVA / C++ / Python / Go 等编程语言 API、RESTful API 中使用。 + +- 在 SQL 命令行终端中执行查询语句:启动 SQL 命令行终端,直接输入查询语句执行即可,详见 [SQL 命令行终端](../QuickStart/Command-Line-Interface.md)。 + +- 在 JDBC 中执行查询语句,详见 [JDBC](../API/Programming-JDBC.md) 。 + +- 在 JAVA / C++ / Python / Go 等编程语言 API 中执行查询语句,详见应用编程接口一章相应文档。接口原型如下: + + ```java + SessionDataSet executeQueryStatement(String sql); + ``` + +- 在 RESTful API 中使用,详见 [HTTP API V1](../API/RestServiceV1.md) 或者 [HTTP API V2](../API/RestServiceV2.md)。 + +#### 常用查询的高效执行接口 + +各编程语言的 API 为常用的查询提供了高效执行接口,可以省去 SQL 解析等操作的耗时。包括: + +* 时间序列原始数据范围查询: + - 指定的查询时间范围为左闭右开区间,包含开始时间但不包含结束时间。 + +```java +SessionDataSet executeRawDataQuery(List paths, long startTime, long endTime); +``` + +* 最新点查询: + - 查询最后一条时间戳大于等于某个时间点的数据。 + +```java +SessionDataSet executeLastDataQuery(List paths, long lastTime); +``` + +* 聚合查询: + - 支持指定查询时间范围。指定的查询时间范围为左闭右开区间,包含开始时间但不包含结束时间。 + - 支持按照时间区间分段查询。 + +```java +SessionDataSet executeAggregationQuery(List paths, List aggregations); + +SessionDataSet executeAggregationQuery( + List paths, List aggregations, long startTime, long endTime); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long startTime, + long endTime, + long interval); + +SessionDataSet executeAggregationQuery( + List paths, + List aggregations, + long 
startTime, + long endTime, + long interval, + long slidingStep); +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Pagination.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Pagination.md new file mode 100644 index 00000000..05612588 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Pagination.md @@ -0,0 +1,283 @@ + + +# 查询结果分页 + +当查询结果集数据量很大,放在一个页面不利于显示,可以使用 `LIMIT/SLIMIT` 子句和 `OFFSET/SOFFSET `子句进行分页控制。 + +- `LIMIT` 和 `SLIMIT` 子句用于控制查询结果的行数和列数。 +- `OFFSET` 和 `SOFFSET` 子句用于控制结果显示的起始位置。 + +## 按行分页 + +用户可以通过 `LIMIT` 和 `OFFSET` 子句控制查询结果的行数,`LIMIT rowLimit` 指定查询结果的行数,`OFFSET rowOffset` 指定查询结果显示的起始行位置。 + +注意: +- 当 `rowOffset` 超过结果集的大小时,返回空结果集。 +- 当 `rowLimit` 超过结果集的大小时,返回所有查询结果。 +- 当 `rowLimit` 和 `rowOffset` 不是正整数,或超过 `INT64` 允许的最大值时,系统将提示错误。 + +我们将通过以下示例演示如何使用 `LIMIT` 和 `OFFSET` 子句。 + +- **示例 1:** 基本的 `LIMIT` 子句 + +SQL 语句: + +```sql +select status, temperature from root.ln.wf01.wt01 limit 10 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 选择的时间序列是“状态”和“温度”。 SQL 语句要求返回查询结果的前 10 行。 + +结果如下所示: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:00:00.000+08:00| true| 25.96| +|2017-11-01T00:01:00.000+08:00| true| 24.36| +|2017-11-01T00:02:00.000+08:00| false| 20.09| +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| +|2017-11-01T00:08:00.000+08:00| false| 22.58| +|2017-11-01T00:09:00.000+08:00| false| 20.98| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 10 +It costs 0.000s +``` + +- **示例 2:** 带 `OFFSET` 的 `LIMIT` 子句 + +SQL 语句: + +```sql +select status, temperature from root.ln.wf01.wt01 limit 5 offset 3 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 选择的时间序列是“状态”和“温度”。 SQL 语句要求返回查询结果的第 3 至 7 行(第一行编号为 0 行)。 + +结果如下所示: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| +|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 5 +It costs 0.342s +``` + +- **示例 3:** `LIMIT` 子句与 `WHERE` 子句结合 + +SQL 语句: + +```sql +select status,temperature from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time< 2017-11-01T00:12:00.000 limit 5 offset 3 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 选择的时间序列是“状态”和“温度”。 SQL 语句要求返回时间“ 2017-11-01T00:05:00.000”和“ 2017-11-01T00:12:00.000”之间的状态和温度传感器值的第 3 至 4 行(第一行) 编号为第 0 行)。 + +结果如下所示: + +``` ++-----------------------------+------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------------+------------------------+-----------------------------+ +|2017-11-01T00:03:00.000+08:00| false| 20.18| +|2017-11-01T00:04:00.000+08:00| false| 21.13| +|2017-11-01T00:05:00.000+08:00| false| 22.72| 
+|2017-11-01T00:06:00.000+08:00| false| 20.71| +|2017-11-01T00:07:00.000+08:00| false| 21.45| ++-----------------------------+------------------------+-----------------------------+ +Total line number = 5 +It costs 0.000s +``` + +- **示例 4:** `LIMIT` 子句与 `GROUP BY` 子句组合 + +SQL 语句: + +```sql +select count(status), max_value(temperature) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) limit 4 offset 3 +``` + +含义: + +SQL 语句子句要求返回查询结果的第 3 至 6 行(第一行编号为 0 行)。 + +结果如下所示: + +``` ++-----------------------------+-------------------------------+----------------------------------------+ +| Time|count(root.ln.wf01.wt01.status)|max_value(root.ln.wf01.wt01.temperature)| ++-----------------------------+-------------------------------+----------------------------------------+ +|2017-11-04T00:00:00.000+08:00| 1440| 26.0| +|2017-11-05T00:00:00.000+08:00| 1440| 26.0| +|2017-11-06T00:00:00.000+08:00| 1440| 25.99| +|2017-11-07T00:00:00.000+08:00| 1380| 26.0| ++-----------------------------+-------------------------------+----------------------------------------+ +Total line number = 4 +It costs 0.016s +``` + +## 按列分页 + +用户可以通过 `SLIMIT` 和 `SOFFSET` 子句控制查询结果的列数,`SLIMIT seriesLimit` 指定查询结果的列数,`SOFFSET seriesOffset` 指定查询结果显示的起始列位置。 + +注意: +- 仅用于控制值列,对时间列和设备列无效。 +- 当 `seriesOffset` 超过结果集的大小时,返回空结果集。 +- 当 `seriesLimit` 超过结果集的大小时,返回所有查询结果。 +- 当 `seriesLimit` 和 `seriesOffset` 不是正整数,或超过 `INT64` 允许的最大值时,系统将提示错误。 + +我们将通过以下示例演示如何使用 `SLIMIT` 和 `SOFFSET` 子句。 + +- **示例 1:** 基本的 `SLIMIT` 子句 + +SQL 语句: + +```sql +select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 所选时间序列是该设备下的第二列,即温度。 SQL 语句要求在"2017-11-01T00:05:00.000"和"2017-11-01T00:12:00.000"的时间点之间选择温度传感器值。 + +结果如下所示: + +``` ++-----------------------------+-----------------------------+ +| Time|root.ln.wf01.wt01.temperature| ++-----------------------------+-----------------------------+ +|2017-11-01T00:06:00.000+08:00| 20.71| +|2017-11-01T00:07:00.000+08:00| 21.45| +|2017-11-01T00:08:00.000+08:00| 22.58| +|2017-11-01T00:09:00.000+08:00| 20.98| +|2017-11-01T00:10:00.000+08:00| 25.52| +|2017-11-01T00:11:00.000+08:00| 22.91| ++-----------------------------+-----------------------------+ +Total line number = 6 +It costs 0.000s +``` + +- **示例 2:** 带 `SOFFSET` 的 `SLIMIT` 子句 + +SQL 语句: + +```sql +select * from root.ln.wf01.wt01 where time > 2017-11-01T00:05:00.000 and time < 2017-11-01T00:12:00.000 slimit 1 soffset 1 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 所选时间序列是该设备下的第一列,即电源状态。 SQL 语句要求在" 2017-11-01T00:05:00.000"和"2017-11-01T00:12:00.000"的时间点之间选择状态传感器值。 + +结果如下所示: + +``` ++-----------------------------+------------------------+ +| Time|root.ln.wf01.wt01.status| ++-----------------------------+------------------------+ +|2017-11-01T00:06:00.000+08:00| false| +|2017-11-01T00:07:00.000+08:00| false| +|2017-11-01T00:08:00.000+08:00| false| +|2017-11-01T00:09:00.000+08:00| false| +|2017-11-01T00:10:00.000+08:00| true| +|2017-11-01T00:11:00.000+08:00| false| ++-----------------------------+------------------------+ +Total line number = 6 +It costs 0.003s +``` + +- **示例 3:** `SLIMIT` 子句与 `GROUP BY` 子句结合 + +SQL 语句: + +```sql +select max_value(*) from root.ln.wf01.wt01 group by ([2017-11-01T00:00:00, 2017-11-07T23:00:00),1d) slimit 1 soffset 1 +``` + +含义: + +``` ++-----------------------------+-----------------------------------+ +| Time|max_value(root.ln.wf01.wt01.status)| ++-----------------------------+-----------------------------------+ 
+|2017-11-01T00:00:00.000+08:00| true| +|2017-11-02T00:00:00.000+08:00| true| +|2017-11-03T00:00:00.000+08:00| true| +|2017-11-04T00:00:00.000+08:00| true| +|2017-11-05T00:00:00.000+08:00| true| +|2017-11-06T00:00:00.000+08:00| true| +|2017-11-07T00:00:00.000+08:00| true| ++-----------------------------+-----------------------------------+ +Total line number = 7 +It costs 0.000s +``` + +- **示例 4:** `SLIMIT` 子句与 `LIMIT` 子句结合 + +SQL 语句: + +```sql +select * from root.ln.wf01.wt01 limit 10 offset 100 slimit 2 soffset 0 +``` + +含义: + +所选设备为 ln 组 wf01 工厂 wt01 设备; 所选时间序列是此设备下的第 0 列至第 1 列(第一列编号为第 0 列)。 SQL 语句子句要求返回查询结果的第 100 至 109 行(第一行编号为 0 行)。 + +结果如下所示: + +``` ++-----------------------------+-----------------------------+------------------------+ +| Time|root.ln.wf01.wt01.temperature|root.ln.wf01.wt01.status| ++-----------------------------+-----------------------------+------------------------+ +|2017-11-01T01:40:00.000+08:00| 21.19| false| +|2017-11-01T01:41:00.000+08:00| 22.79| false| +|2017-11-01T01:42:00.000+08:00| 22.98| false| +|2017-11-01T01:43:00.000+08:00| 21.52| false| +|2017-11-01T01:44:00.000+08:00| 23.45| true| +|2017-11-01T01:45:00.000+08:00| 24.06| true| +|2017-11-01T01:46:00.000+08:00| 22.6| false| +|2017-11-01T01:47:00.000+08:00| 23.78| true| +|2017-11-01T01:48:00.000+08:00| 24.72| true| +|2017-11-01T01:49:00.000+08:00| 24.68| true| ++-----------------------------+-----------------------------+------------------------+ +Total line number = 10 +It costs 0.009s +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Expression.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Expression.md new file mode 100644 index 00000000..59f92c70 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Expression.md @@ -0,0 +1,286 @@ + + +# 选择表达式 + +`SELECT` 子句指定查询的输出,由若干个 `selectExpr` 组成。 每个 `selectExpr` 定义了查询结果中的一列或多列。 + +**`selectExpr` 是一个由时间序列路径后缀、常量、函数和运算符组成的表达式。即 `selectExpr` 中可以包含:** +- 时间序列路径后缀(支持使用通配符) +- 运算符 + - 算数运算符 + - 比较运算符 + - 逻辑运算符 +- 函数 + - 聚合函数 + - 时间序列生成函数(包括内置函数和用户自定义函数) +- 常量 + +## 使用别名 + +由于 IoTDB 独特的数据模型,在每个传感器前都附带有设备等诸多额外信息。有时,我们只针对某个具体设备查询,而这些前缀信息频繁显示造成了冗余,影响了结果集的显示与分析。 + +IoTDB 支持使用`AS`为查询结果集中的列指定别名。 + +**示例:** + +```sql +select s1 as temperature, s2 as speed from root.ln.wf01.wt01; +``` + +结果集将显示为: + +| Time | temperature | speed | +| ---- | ----------- | ----- | +| ... | ... | ... 
| + +## 运算符 + +IoTDB 中支持的运算符列表见文档 [运算符和函数](../Operators-Functions/Overview.md)。 + +## 函数 + +### 聚合函数 + +聚合函数是多对一函数。它们对一组值进行聚合计算,得到单个聚合结果。 + +**包含聚合函数的查询称为聚合查询**,否则称为时间序列查询。 + +**注意:聚合查询和时间序列查询不能混合使用。** 下列语句是不支持的: + +```sql +select s1, count(s1) from root.sg.d1; +select sin(s1), count(s1) from root.sg.d1; +select s1, count(s1) from root.sg.d1 group by ([10,100),10ms); +``` + +IoTDB 支持的聚合函数见文档 [聚合函数](../Operators-Functions/Aggregation.md)。 + +### 时间序列生成函数 + +时间序列生成函数接受若干原始时间序列作为输入,产生一列时间序列输出。与聚合函数不同的是,时间序列生成函数的结果集带有时间戳列。 + +所有的时间序列生成函数都可以接受 * 作为输入,都可以与原始时间序列查询混合进行。 + +#### 内置时间序列生成函数 + +IoTDB 中支持的内置函数列表见文档 [运算符和函数](../Operators-Functions/Overview.md)。 + +#### 自定义时间序列生成函数 + +IoTDB 支持通过用户自定义函数(点击查看: [用户自定义函数](../Operators-Functions/User-Defined-Function.md) )能力进行函数功能扩展。 + +## 嵌套表达式举例 + +IoTDB 支持嵌套表达式,由于聚合查询和时间序列查询不能在一条查询语句中同时出现,我们将支持的嵌套表达式分为时间序列查询嵌套表达式和聚合查询嵌套表达式两类。 + +### 时间序列查询嵌套表达式 + +IoTDB 支持在 `SELECT` 子句中计算由**时间序列、常量、时间序列生成函数(包括用户自定义函数)和运算符**组成的任意嵌套表达式。 + +**说明:** + +- 当某个时间戳下左操作数和右操作数都不为空(`null`)时,表达式才会有结果,否则表达式值为`null`,且默认不出现在结果集中。 +- 如果表达式中某个操作数对应多条时间序列(如通配符 `*`),那么每条时间序列对应的结果都会出现在结果集中(按照笛卡尔积形式)。 + +**示例 1:** + +```sql +select a, + b, + ((a + 1) * 2 - 1) % 2 + 1.5, + sin(a + sin(a + sin(b))), + -(a + b) * (sin(a + b) * sin(a + b) + cos(a + b) * cos(a + b)) + 1 +from root.sg1; +``` + +运行结果: + +``` ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Time|root.sg1.a|root.sg1.b|((((root.sg1.a + 1) * 2) - 1) % 2) + 1.5|sin(root.sg1.a + sin(root.sg1.a + sin(root.sg1.b)))|(-root.sg1.a + root.sg1.b * ((sin(root.sg1.a + root.sg1.b) * sin(root.sg1.a + root.sg1.b)) + (cos(root.sg1.a + root.sg1.b) * cos(root.sg1.a + root.sg1.b)))) + 1| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 1| 1| 2.5| 0.9238430524420609| -1.0| +|1970-01-01T08:00:00.020+08:00| 2| 2| 2.5| 0.7903505371876317| -3.0| +|1970-01-01T08:00:00.030+08:00| 3| 3| 2.5| 0.14065207680386618| -5.0| +|1970-01-01T08:00:00.040+08:00| 4| null| 2.5| null| null| +|1970-01-01T08:00:00.050+08:00| null| 5| null| null| null| +|1970-01-01T08:00:00.060+08:00| 6| 6| 2.5| -0.7288037411970916| -11.0| ++-----------------------------+----------+----------+----------------------------------------+---------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+ +Total line number = 6 +It costs 0.048s +``` + +**示例 2:** + +```sql +select (a + b) * 2 + sin(a) from root.sg +``` + +运行结果: + +``` ++-----------------------------+----------------------------------------------+ +| Time|((root.sg.a + root.sg.b) * 2) + sin(root.sg.a)| ++-----------------------------+----------------------------------------------+ +|1970-01-01T08:00:00.010+08:00| 59.45597888911063| +|1970-01-01T08:00:00.020+08:00| 100.91294525072763| +|1970-01-01T08:00:00.030+08:00| 139.01196837590714| +|1970-01-01T08:00:00.040+08:00| 180.74511316047935| +|1970-01-01T08:00:00.050+08:00| 219.73762514629607| 
+|1970-01-01T08:00:00.060+08:00| 259.6951893788978| +|1970-01-01T08:00:00.070+08:00| 300.7738906815579| +|1970-01-01T08:00:00.090+08:00| 39.45597888911063| +|1970-01-01T08:00:00.100+08:00| 39.45597888911063| ++-----------------------------+----------------------------------------------+ +Total line number = 9 +It costs 0.011s +``` + +**示例 3:** + +```sql +select (a + *) / 2 from root.sg1 +``` + +运行结果: + +``` ++-----------------------------+-----------------------------+-----------------------------+ +| Time|(root.sg1.a + root.sg1.a) / 2|(root.sg1.a + root.sg1.b) / 2| ++-----------------------------+-----------------------------+-----------------------------+ +|1970-01-01T08:00:00.010+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.020+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.030+08:00| 3.0| 3.0| +|1970-01-01T08:00:00.040+08:00| 4.0| null| +|1970-01-01T08:00:00.060+08:00| 6.0| 6.0| ++-----------------------------+-----------------------------+-----------------------------+ +Total line number = 5 +It costs 0.011s +``` + +**示例 4:** + +```sql +select (a + b) * 3 from root.sg, root.ln +``` + +运行结果: + +``` ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +| Time|(root.sg.a + root.sg.b) * 3|(root.sg.a + root.ln.b) * 3|(root.ln.a + root.sg.b) * 3|(root.ln.a + root.ln.b) * 3| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +|1970-01-01T08:00:00.010+08:00| 90.0| 270.0| 360.0| 540.0| +|1970-01-01T08:00:00.020+08:00| 150.0| 330.0| 690.0| 870.0| +|1970-01-01T08:00:00.030+08:00| 210.0| 450.0| 570.0| 810.0| +|1970-01-01T08:00:00.040+08:00| 270.0| 240.0| 690.0| 660.0| +|1970-01-01T08:00:00.050+08:00| 330.0| null| null| null| +|1970-01-01T08:00:00.060+08:00| 390.0| null| null| null| +|1970-01-01T08:00:00.070+08:00| 450.0| null| null| null| +|1970-01-01T08:00:00.090+08:00| 60.0| null| null| null| +|1970-01-01T08:00:00.100+08:00| 60.0| null| null| null| ++-----------------------------+---------------------------+---------------------------+---------------------------+---------------------------+ +Total line number = 9 +It costs 0.014s +``` + +### 聚合查询嵌套表达式 + +IoTDB 支持在 `SELECT` 子句中计算由**聚合函数、常量、时间序列生成函数和表达式**组成的任意嵌套表达式。 + +**说明:** +- 当某个时间戳下左操作数和右操作数都不为空(`null`)时,表达式才会有结果,否则表达式值为`null`,且默认不出现在结果集中。但在使用`GROUP BY`子句的聚合查询嵌套表达式中,我们希望保留每个时间窗口的值,所以表达式值为`null`的窗口也包含在结果集中。 +- 如果表达式中某个操作数对应多条时间序列(如通配符`*`),那么每条时间序列对应的结果都会出现在结果集中(按照笛卡尔积形式)。 + +**示例 1:** + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) +from root.ln.wf01.wt01; +``` + +运行结果: + +``` ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|avg(root.ln.wf01.wt01.temperature) + sum(root.ln.wf01.wt01.hardware)| ++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +| 15.927999999999999| -0.21826546964855045| 16.927999999999997| -7426.0| 7441.928| 
++----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+--------------------------------------------------------------------+ +Total line number = 1 +It costs 0.009s +``` + +**示例 2:** + +```sql +select avg(*), + (avg(*) + 1) * 3 / 2 -1 +from root.sg1 +``` + +运行结果: + +``` ++---------------+---------------+-------------------------------------+-------------------------------------+ +|avg(root.sg1.a)|avg(root.sg1.b)|(avg(root.sg1.a) + 1) * 3 / 2 - 1 |(avg(root.sg1.b) + 1) * 3 / 2 - 1 | ++---------------+---------------+-------------------------------------+-------------------------------------+ +| 3.2| 3.4| 5.300000000000001| 5.6000000000000005| ++---------------+---------------+-------------------------------------+-------------------------------------+ +Total line number = 1 +It costs 0.007s +``` + +**示例 3:** + +```sql +select avg(temperature), + sin(avg(temperature)), + avg(temperature) + 1, + -sum(hardware), + avg(temperature) + sum(hardware) as custom_sum +from root.ln.wf01.wt01 +GROUP BY([10, 90), 10ms); +``` + +运行结果: + +``` ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +| Time|avg(root.ln.wf01.wt01.temperature)|sin(avg(root.ln.wf01.wt01.temperature))|avg(root.ln.wf01.wt01.temperature) + 1|-sum(root.ln.wf01.wt01.hardware)|custom_sum| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +|1970-01-01T08:00:00.010+08:00| 13.987499999999999| 0.9888207947857667| 14.987499999999999| -3211.0| 3224.9875| +|1970-01-01T08:00:00.020+08:00| 29.6| -0.9701057337071853| 30.6| -3720.0| 3749.6| +|1970-01-01T08:00:00.030+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.040+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.050+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.060+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.070+08:00| null| null| null| null| null| +|1970-01-01T08:00:00.080+08:00| null| null| null| null| null| ++-----------------------------+----------------------------------+---------------------------------------+--------------------------------------+--------------------------------+----------+ +Total line number = 8 +It costs 0.012s +``` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Into.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Into.md new file mode 100644 index 00000000..5330768b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Select-Into.md @@ -0,0 +1,350 @@ + + +# 查询写回(SELECT INTO) + +`SELECT INTO` 语句用于将查询结果写入一系列指定的时间序列中。 + +应用场景如下: +- **实现 IoTDB 内部 ETL**:对原始数据进行 ETL 处理后写入新序列。 +- **查询结果存储**:将查询结果进行持久化存储,起到类似物化视图的作用。 +- **非对齐序列转对齐序列**:对齐序列从0.13版本开始支持,可以通过该功能将非对齐序列的数据写入新的对齐序列中。 + +## 语法定义 + +### 整体描述 + +```sql +selectIntoStatement + : SELECT + resultColumn [, resultColumn] ... + INTO intoItem [, intoItem] ... + FROM prefixPath [, prefixPath] ... 
+ [WHERE whereCondition] + [GROUP BY groupByTimeClause, groupByLevelClause] + [FILL {PREVIOUS | LINEAR | constant}] + [LIMIT rowLimit OFFSET rowOffset] + [ALIGN BY DEVICE] + ; + +intoItem + : [ALIGNED] intoDevicePath '(' intoMeasurementName [',' intoMeasurementName]* ')' + ; +``` + +### `INTO` 子句 + +`INTO` 子句由若干个 `intoItem` 构成。 + +每个 `intoItem` 由一个目标设备路径和一个包含若干目标物理量名的列表组成(与 `INSERT` 语句中的 `INTO` 子句写法类似)。 + +其中每个目标物理量名与目标设备路径组成一个目标序列,一个 `intoItem` 包含若干目标序列。例如:`root.sg_copy.d1(s1, s2)` 指定了两条目标序列 `root.sg_copy.d1.s1` 和 `root.sg_copy.d1.s2`。 + +`INTO` 子句指定的目标序列要能够与查询结果集的列一一对应。具体规则如下: + +- **按时间对齐**(默认):全部 `intoItem` 包含的目标序列数量要与查询结果集的列数(除时间列外)一致,且按照表头从左到右的顺序一一对应。 +- **按设备对齐**(使用 `ALIGN BY DEVICE`):全部 `intoItem` 中指定的目标设备数和查询的设备数(即 `FROM` 子句中路径模式匹配的设备数)一致,且按照结果集设备的输出顺序一一对应。 + 为每个目标设备指定的目标物理量数量要与查询结果集的列数(除时间和设备列外)一致,且按照表头从左到右的顺序一一对应。 + +下面通过示例进一步说明: + +- **示例 1**(按时间对齐) +```shell +IoTDB> select s1, s2 into root.sg_copy.d1(t1), root.sg_copy.d2(t1, t2), root.sg_copy.d1(t2) from root.sg.d1, root.sg.d2; ++--------------+-------------------+--------+ +| source column| target timeseries| written| ++--------------+-------------------+--------+ +| root.sg.d1.s1| root.sg_copy.d1.t1| 8000| ++--------------+-------------------+--------+ +| root.sg.d2.s1| root.sg_copy.d2.t1| 10000| ++--------------+-------------------+--------+ +| root.sg.d1.s2| root.sg_copy.d2.t2| 12000| ++--------------+-------------------+--------+ +| root.sg.d2.s2| root.sg_copy.d1.t2| 10000| ++--------------+-------------------+--------+ +Total line number = 4 +It costs 0.725s +``` + +该语句将 `root.sg` database 下四条序列的查询结果写入到 `root.sg_copy` database 下指定的四条序列中。注意,`root.sg_copy.d2(t1, t2)` 也可以写做 `root.sg_copy.d2(t1), root.sg_copy.d2(t2)`。 + +可以看到,`INTO` 子句的写法非常灵活,只要满足组合出的目标序列没有重复,且与查询结果列一一对应即可。 + +> `CLI` 展示的结果集中,各列的含义如下: +> - `source column` 列表示查询结果的列名。 +> - `target timeseries` 表示对应列写入的目标序列。 +> - `written` 表示预期写入的数据量。 + +- **示例 2**(按时间对齐) +```shell +IoTDB> select count(s1 + s2), last_value(s2) into root.agg.count(s1_add_s2), root.agg.last_value(s2) from root.sg.d1 group by ([0, 100), 10ms); ++--------------------------------------+-------------------------+--------+ +| source column| target timeseries| written| ++--------------------------------------+-------------------------+--------+ +| count(root.sg.d1.s1 + root.sg.d1.s2)| root.agg.count.s1_add_s2| 10| ++--------------------------------------+-------------------------+--------+ +| last_value(root.sg.d1.s2)| root.agg.last_value.s2| 10| ++--------------------------------------+-------------------------+--------+ +Total line number = 2 +It costs 0.375s +``` + +该语句将聚合查询的结果存储到指定序列中。 + +- **示例 3**(按设备对齐) +```shell +IoTDB> select s1, s2 into root.sg_copy.d1(t1, t2), root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; ++--------------+--------------+-------------------+--------+ +| source device| source column| target timeseries| written| ++--------------+--------------+-------------------+--------+ +| root.sg.d1| s1| root.sg_copy.d1.t1| 8000| ++--------------+--------------+-------------------+--------+ +| root.sg.d1| s2| root.sg_copy.d1.t2| 11000| ++--------------+--------------+-------------------+--------+ +| root.sg.d2| s1| root.sg_copy.d2.t1| 12000| ++--------------+--------------+-------------------+--------+ +| root.sg.d2| s2| root.sg_copy.d2.t2| 9000| ++--------------+--------------+-------------------+--------+ +Total line number = 4 +It costs 0.625s +``` + +该语句同样是将 `root.sg` database 下四条序列的查询结果写入到 `root.sg_copy` database 下指定的四条序列中。但在按设备对齐中,`intoItem` 
的数量必须和查询的设备数量一致,每个查询设备对应一个 `intoItem`。 + +> 按设备对齐查询时,`CLI` 展示的结果集多出一列 `source device` 列表示查询的设备。 + +- **示例 4**(按设备对齐) +```shell +IoTDB> select s1 + s2 into root.expr.add(d1s1_d1s2), root.expr.add(d2s1_d2s2) from root.sg.d1, root.sg.d2 align by device; ++--------------+--------------+------------------------+--------+ +| source device| source column| target timeseries| written| ++--------------+--------------+------------------------+--------+ +| root.sg.d1| s1 + s2| root.expr.add.d1s1_d1s2| 10000| ++--------------+--------------+------------------------+--------+ +| root.sg.d2| s1 + s2| root.expr.add.d2s1_d2s2| 10000| ++--------------+--------------+------------------------+--------+ +Total line number = 2 +It costs 0.532s +``` + +该语句将表达式计算的结果存储到指定序列中。 + +### 使用变量占位符 + +特别地,可以使用变量占位符描述目标序列与查询序列之间的对应规律,简化语句书写。目前支持以下两种变量占位符: + +- 后缀复制符 `::`:复制查询设备后缀(或物理量),表示从该层开始一直到设备的最后一层(或物理量),目标设备的节点名(或物理量名)与查询的设备对应的节点名(或物理量名)相同。 +- 单层节点匹配符 `${i}`:表示目标序列当前层节点名与查询序列的第`i`层节点名相同。比如,对于路径`root.sg1.d1.s1`而言,`${1}`表示`sg1`,`${2}`表示`d1`,`${3}`表示`s1`。 + +在使用变量占位符时,`intoItem`与查询结果集列的对应关系不能存在歧义,具体情况分类讨论如下: + +#### 按时间对齐(默认) + +> 注:变量占位符**只能描述序列与序列之间的对应关系**,如果查询中包含聚合、表达式计算,此时查询结果中的列无法与某个序列对应,因此目标设备和目标物理量都不能使用变量占位符。 + +##### (1)目标设备不使用变量占位符 & 目标物理量列表使用变量占位符 + +**限制:** + 1. 每个 `intoItem` 中,物理量列表的长度必须为 1。
(如果长度可以大于1,例如 `root.sg1.d1(::, s1)`,无法确定具体哪些列与`::`匹配) + 2. `intoItem` 数量为 1,或与查询结果集列数一致。
(在每个目标物理量列表长度均为 1 的情况下,若 `intoItem` 只有 1 个,此时表示全部查询序列写入相同设备;若 `intoItem` 数量与查询序列一致,则表示为每个查询序列指定一个目标设备;若 `intoItem` 大于 1 小于查询序列数,此时无法与查询序列一一对应) + +**匹配方法:** 每个查询序列指定目标设备,而目标物理量根据变量占位符生成。 + +**示例:** + +```sql +select s1, s2 +into root.sg_copy.d1(::), root.sg_copy.d2(s1), root.sg_copy.d1(${3}), root.sg_copy.d2(::) +from root.sg.d1, root.sg.d2; +``` +该语句等价于: +```sql +select s1, s2 +into root.sg_copy.d1(s1), root.sg_copy.d2(s1), root.sg_copy.d1(s2), root.sg_copy.d2(s2) +from root.sg.d1, root.sg.d2; +``` +可以看到,在这种情况下,语句并不能得到很好地简化。 + +##### (2)目标设备使用变量占位符 & 目标物理量列表不使用变量占位符 + +**限制:** 全部 `intoItem` 中目标物理量的数量与查询结果集列数一致。 + +**匹配方式:** 为每个查询序列指定了目标物理量,目标设备根据对应目标物理量所在 `intoItem` 的目标设备占位符生成。 + +**示例:** +```sql +select d1.s1, d1.s2, d2.s3, d3.s4 +into ::(s1_1, s2_2), root.sg.d2_2(s3_3), root.${2}_copy.::(s4) +from root.sg; +``` + +##### (3)目标设备使用变量占位符 & 目标物理量列表使用变量占位符 + +**限制:** `intoItem` 只有一个且物理量列表的长度为 1。 + +**匹配方式:** 每个查询序列根据变量占位符可以得到一个目标序列。 + +**示例:** +```sql +select * into root.sg_bk.::(::) from root.sg.**; +``` +将 `root.sg` 下全部序列的查询结果写到 `root.sg_bk`,设备名后缀和物理量名保持不变。 + +#### 按设备对齐(使用 `ALIGN BY DEVICE`) + +> 注:变量占位符**只能描述序列与序列之间的对应关系**,如果查询中包含聚合、表达式计算,此时查询结果中的列无法与某个物理量对应,因此目标物理量不能使用变量占位符。 + +##### (1)目标设备不使用变量占位符 & 目标物理量列表使用变量占位符 + +**限制:** 每个 `intoItem` 中,如果物理量列表使用了变量占位符,则列表的长度必须为 1。 + +**匹配方法:** 每个查询序列指定目标设备,而目标物理量根据变量占位符生成。 + +**示例:** +```sql +select s1, s2, s3, s4 +into root.backup_sg.d1(s1, s2, s3, s4), root.backup_sg.d2(::), root.sg.d3(backup_${4}) +from root.sg.d1, root.sg.d2, root.sg.d3 +align by device; +``` + +##### (2)目标设备使用变量占位符 & 目标物理量列表不使用变量占位符 + +**限制:** `intoItem` 只有一个。(如果出现多个带占位符的 `intoItem`,我们将无法得知每个 `intoItem` 需要匹配哪几个源设备) + +**匹配方式:** 每个查询设备根据变量占位符得到一个目标设备,每个设备下结果集各列写入的目标物理量由目标物理量列表指定。 + +**示例:** +```sql +select avg(s1), sum(s2) + sum(s3), count(s4) +into root.agg_${2}.::(avg_s1, sum_s2_add_s3, count_s4) +from root.** +align by device; +``` + +##### (3)目标设备使用变量占位符 & 目标物理量列表使用变量占位符 + +**限制:** `intoItem` 只有一个且物理量列表的长度为 1。 + +**匹配方式:** 每个查询序列根据变量占位符可以得到一个目标序列。 + +**示例:** +```sql +select * into ::(backup_${4}) from root.sg.** align by device; +``` +将 `root.sg` 下每条序列的查询结果写到相同设备下,物理量名前加`backup_`。 + +### 指定目标序列为对齐序列 + +通过 `ALIGNED` 关键词可以指定写入的目标设备为对齐写入,每个 `intoItem` 可以独立设置。 + +**示例:** +```sql +select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from root.sg.d1, root.sg.d2 align by device; +``` +该语句指定了 `root.sg_copy.d1` 是非对齐设备,`root.sg_copy.d2`是对齐设备。 + +### 不支持使用的查询子句 + +- `SLIMIT`、`SOFFSET`:查询出来的列不确定,功能不清晰,因此不支持。 +- `LAST`查询、`GROUP BY TAGS`、`DISABLE ALIGN`:表结构和写入结构不一致,因此不支持。 + +### 其他要注意的点 + +- 对于一般的聚合查询,时间戳是无意义的,约定使用 0 来存储。 +- 当目标序列存在时,需要保证源序列和目标时间序列的数据类型兼容。关于数据类型的兼容性,查看文档 [数据类型](../Basic-Concept/Data-Type.md#数据类型兼容性)。 +- 当目标序列不存在时,系统将自动创建目标序列(包括 database)。 +- 当查询的序列不存在或查询的序列不存在数据,则不会自动创建目标序列。 + +## 应用举例 + +### 实现 IoTDB 内部 ETL +对原始数据进行 ETL 处理后写入新序列。 +```shell +IOTDB > SELECT preprocess_udf(s1, s2) INTO ::(preprocessed_s1, preprocessed_s2) FROM root.sg.* ALIGN BY DEIVCE; ++--------------+-------------------+---------------------------+--------+ +| source device| source column| target timeseries| written| ++--------------+-------------------+---------------------------+--------+ +| root.sg.d1| preprocess_udf(s1)| root.sg.d1.preprocessed_s1| 8000| ++--------------+-------------------+---------------------------+--------+ +| root.sg.d1| preprocess_udf(s2)| root.sg.d1.preprocessed_s2| 10000| ++--------------+-------------------+---------------------------+--------+ +| root.sg.d2| preprocess_udf(s1)| root.sg.d2.preprocessed_s1| 11000| 
++--------------+-------------------+---------------------------+--------+ +| root.sg.d2| preprocess_udf(s2)| root.sg.d2.preprocessed_s2| 9000| ++--------------+-------------------+---------------------------+--------+ +``` +以上语句使用自定义函数对数据进行预处理,将预处理后的结果持久化存储到新序列中。 + +### 查询结果存储 +将查询结果进行持久化存储,起到类似物化视图的作用。 +```shell +IOTDB > SELECT count(s1), last_value(s1) INTO root.sg.agg_${2}(count_s1, last_value_s1) FROM root.sg1.d1 GROUP BY ([0, 10000), 10ms); ++--------------------------+-----------------------------+--------+ +| source column| target timeseries| written| ++--------------------------+-----------------------------+--------+ +| count(root.sg.d1.s1)| root.sg.agg_d1.count_s1| 1000| ++--------------------------+-----------------------------+--------+ +| last_value(root.sg.d1.s2)| root.sg.agg_d1.last_value_s2| 1000| ++--------------------------+-----------------------------+--------+ +Total line number = 2 +It costs 0.115s +``` +以上语句将降采样查询的结果持久化存储到新序列中。 + +### 非对齐序列转对齐序列 +对齐序列从 0.13 版本开始支持,可以通过该功能将非对齐序列的数据写入新的对齐序列中。 + +**注意:** 建议配合使用 `LIMIT & OFFSET` 子句或 `WHERE` 子句(时间过滤条件)对数据进行分批,防止单次操作的数据量过大。 + +```shell +IOTDB > SELECT s1, s2 INTO ALIGNED root.sg1.aligned_d(s1, s2) FROM root.sg1.non_aligned_d WHERE time >= 0 and time < 10000; ++--------------------------+----------------------+--------+ +| source column| target timeseries| written| ++--------------------------+----------------------+--------+ +| root.sg1.non_aligned_d.s1| root.sg1.aligned_d.s1| 10000| ++--------------------------+----------------------+--------+ +| root.sg1.non_aligned_d.s2| root.sg1.aligned_d.s2| 10000| ++--------------------------+----------------------+--------+ +Total line number = 2 +It costs 0.375s +``` +以上语句将一组非对齐的序列的数据迁移到一组对齐序列。 + +## 相关用户权限 + +用户必须有下列权限才能正常执行查询写回语句: + +* 所有 `SELECT` 子句中源序列的 `READ_TIMESERIES` 权限。 +* 所有 `INTO` 子句中目标序列 `INSERT_TIMESERIES` 权限。 + +更多用户权限相关的内容,请参考[权限管理语句](../Administration-Management/Administration.md)。 + +## 相关配置参数 + +* `select_into_insert_tablet_plan_row_limit` + + | 参数名 | select_into_insert_tablet_plan_row_limit | + | ---- | ---- | + | 描述 | 写入过程中每一批 `Tablet` 的最大行数 | + | 类型 | int32 | + | 默认值 | 10000 | + | 改后生效方式 | 重启后生效 | diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Where-Condition.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Where-Condition.md new file mode 100644 index 00000000..66de1200 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Query-Data/Where-Condition.md @@ -0,0 +1,185 @@ + + +# 查询过滤条件 + +`WHERE` 子句指定了对数据行的筛选条件,由一个 `whereCondition` 组成。 + +`whereCondition` 是一个逻辑表达式,对于要选择的每一行,其计算结果为真。如果没有 `WHERE` 子句,将选择所有行。 +在 `whereCondition` 中,可以使用除聚合函数之外的任何 IOTDB 支持的函数和运算符。 + +根据过滤条件的不同,可以分为时间过滤条件和值过滤条件。时间过滤条件和值过滤条件可以混合使用。 + +## 时间过滤条件 + +使用时间过滤条件可以筛选特定时间范围的数据。对于时间戳支持的格式,请参考 [时间戳类型](../Basic-Concept/Data-Type.md) 。 + +示例如下: + +1. 选择时间戳大于 2022-01-01T00:05:00.000 的数据: + + ```sql + select s1 from root.sg1.d1 where time > 2022-01-01T00:05:00.000; + ``` + +2. 选择时间戳等于 2022-01-01T00:05:00.000 的数据: + + ```sql + select s1 from root.sg1.d1 where time = 2022-01-01T00:05:00.000; + ``` + +3. 选择时间区间 [2017-11-01T00:05:00.000, 2017-11-01T00:12:00.000) 内的数据: + + ```sql + select s1 from root.sg1.d1 where time >= 2022-01-01T00:05:00.000 and time < 2017-11-01T00:12:00.000; + ``` + +注:在上述示例中,`time` 也可写做 `timestamp`。 + +## 值过滤条件 + +使用值过滤条件可以筛选数据值满足特定条件的数据。 +**允许**使用 select 子句中未选择的时间序列作为值过滤条件。 + +示例如下: + +1. 选择值大于 36.5 的数据: + + ```sql + select temperature from root.sg1.d1 where temperature > 36.5; + ``` + +2. 
选择值等于 true 的数据: + + ```sql + select status from root.sg1.d1 where status = true; + +3. 选择区间 [36.5,40] 内或之外的数据: + + ```sql + select temperature from root.sg1.d1 where temperature between 36.5 and 40; + ```` + ```sql + select temperature from root.sg1.d1 where temperature not between 36.5 and 40; + ```` + +4. 选择值在特定范围内的数据: + + ```sql + select code from root.sg1.d1 where code in ('200', '300', '400', '500'); + ``` + +5. 选择值在特定范围外的数据: + + ```sql + select code from root.sg1.d1 where code not in ('200', '300', '400', '500'); + ``` + +6. 选择值为空的数据: + + ```sql + select code from root.sg1.d1 where temperature is null; + ```` + +7. 选择值为非空的数据: + + ```sql + select code from root.sg1.d1 where temperature is not null; + ```` + +## 模糊查询 + +对于 TEXT 类型的数据,支持使用 `Like` 和 `Regexp` 运算符对数据进行模糊匹配 + +### 使用 `Like` 进行模糊匹配 + +**匹配规则:** + +- `%` 表示任意0个或多个字符。 +- `_` 表示任意单个字符。 + +**示例 1:** 查询 `root.sg.d1` 下 `value` 含有`'cc'`的数据。 + +``` +IoTDB> select * from root.sg.d1 where value like '%cc%' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 2:** 查询 `root.sg.d1` 下 `value` 中间为 `'b'`、前后为任意单个字符的数据。 + +``` +IoTDB> select * from root.sg.device where value like '_b_' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:02.000+08:00| abc| ++-----------------------------+----------------+ +Total line number = 1 +It costs 0.002s +``` + +### 使用 `Regexp` 进行模糊匹配 + +需要传入的过滤条件为 **Java 标准库风格的正则表达式**。 + +**常见的正则匹配举例:** + +``` +长度为3-20的所有字符:^.{3,20}$ +大写英文字符:^[A-Z]+$ +数字和英文字符:^[A-Za-z0-9]+$ +以a开头的:^a.* +``` + +**示例 1:** 查询 root.sg.d1 下 value 值为26个英文字符组成的字符串。 + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[A-Za-z]+$' ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` + +**示例 2:** 查询 root.sg.d1 下 value 值为26个小写英文字符组成的字符串且时间大于100的。 + +```shell +IoTDB> select * from root.sg.d1 where value regexp '^[a-z]+$' and time > 100 ++-----------------------------+----------------+ +| Time|root.sg.d1.value| ++-----------------------------+----------------+ +|2017-11-01T00:00:00.000+08:00| aabbccdd| +|2017-11-01T00:00:01.000+08:00| cc| ++-----------------------------+----------------+ +Total line number = 2 +It costs 0.002s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/QuickStart.md b/src/zh/UserGuide/V2.0.1/Tree/stage/QuickStart.md new file mode 100644 index 00000000..ba897d15 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/QuickStart.md @@ -0,0 +1,273 @@ + + +# 快速上手(单机版) + +本文将介绍关于 IoTDB 使用的基本流程,如果需要更多信息,请浏览我们官网的 [指引](../IoTDB-Introduction/What-is-IoTDB.md). 
+ +## 安装环境 + +安装前需要保证设备上配有 JDK>=1.8 的运行环境,并配置好 JAVA_HOME 环境变量。 + +设置最大文件打开数为 65535。 + +## 安装步骤 + +IoTDB 支持多种安装途径。用户可以使用三种方式对 IoTDB 进行安装——下载二进制可运行程序、使用源码、使用 docker 镜像。 + +* 使用源码:您可以从代码仓库下载源码并编译,具体编译方法见下方。 + +* 二进制可运行程序:请从 [下载](https://iotdb.apache.org/Download/) 页面下载最新的安装包,解压后即完成安装。 + +* 使用 Docker 镜像:dockerfile 文件位于[github](https://github.com/apache/iotdb/blob/master/docker/src/main) + +## 软件目录结构 + +* sbin 启动和停止脚本目录 +* conf 配置文件目录 +* tools 系统工具目录 +* lib 依赖包目录 + +## IoTDB 试用 + +用户可以根据以下操作对 IoTDB 进行简单的试用,若以下操作均无误,则说明 IoTDB 安装成功。 + +### 启动 IoTDB +IoTDB 是一个基于分布式系统的数据库。要启动 IoTDB ,你可以先启动单机版(一个 ConfigNode 和一个 DataNode)来检查安装。 + +用户可以使用 sbin 文件夹下的 start-standalone 脚本启动 IoTDB。 + +Linux 系统与 MacOS 系统启动命令如下: + +``` +> bash sbin/start-standalone.sh +``` + +Windows 系统启动命令如下: + +``` +> sbin\start-standalone.bat +``` + +注意:目前,要使用单机模式,你需要保证所有的地址设置为 127.0.0.1,如果需要从非 IoTDB 所在的机器访问此IoTDB,请将配置项 `dn_rpc_address` 修改为 IoTDB 所在的机器 IP。副本数设置为1。并且,推荐使用 IoTConsensus,因为这会带来额外的效率。这些现在都是默认配置。 + +### 使用 Cli 工具 + +IoTDB 为用户提供多种与服务器交互的方式,在此我们介绍使用 Cli 工具进行写入、查询数据的基本步骤。 + +初始安装后的 IoTDB 中有一个默认用户:root,默认密码为 root。用户可以使用该用户运行 Cli 工具操作 IoTDB。Cli 工具启动脚本为 sbin 文件夹下的 start-cli 脚本。启动脚本时需要指定运行 ip、port、username 和 password。若脚本未给定对应参数,则默认参数为"-h 127.0.0.1 -p 6667 -u root -pw -root" + +以下启动语句为服务器在本机运行,且用户未更改运行端口号的示例。 + +Linux 系统与 MacOS 系统启动命令如下: + +``` +> bash sbin/start-cli.sh -h 127.0.0.1 -p 6667 -u root -pw root +``` + +Windows 系统启动命令如下: + +``` +> sbin\start-cli.bat -h 127.0.0.1 -p 6667 -u root -pw root +``` + +启动后出现如图提示即为启动成功。 + +``` + _____ _________ ______ ______ +|_ _| | _ _ ||_ _ `.|_ _ \ + | | .--.|_/ | | \_| | | `. \ | |_) | + | | / .'`\ \ | | | | | | | __'. + _| |_| \__. | _| |_ _| |_.' /_| |__) | +|_____|'.__.' |_____| |______.'|_______/ version x.x.x + +Successfully login at 127.0.0.1:6667 +IoTDB> +``` + +### IoTDB 的基本操作 + +在这里,我们首先介绍一下使用 Cli 工具创建时间序列、插入数据并查看数据的方法。 + +数据在 IoTDB 中的组织形式是以时间序列为单位,每一个时间序列中有若干个数据-时间点对,每一个时间序列属于一个 database。在定义时间序列之前,要首先使用 CREATE DATABASE 语句创建数据库。SQL 语句如下: + +``` +IoTDB> CREATE DATABASE root.ln +``` + +我们可以使用 SHOW DATABASES 语句来查看系统当前所有的 database,SQL 语句如下: + +``` +IoTDB> SHOW DATABASES +``` + +执行结果为: + +``` ++---------------+----+-----------------------+---------------------+---------------------+ +| Database| TTL|SchemaReplicationFactor|DataReplicationFactor|TimePartitionInterval| ++---------------+----+-----------------------+---------------------+---------------------+ +| root.ln|null| 1| 1| 604800000| ++---------------+----+-----------------------+---------------------+---------------------+ +Total line number = 1 +``` + +Database 设定后,使用 CREATE TIMESERIES 语句可以创建新的时间序列,创建时间序列时需要定义数据的类型和编码方式。此处我们创建两个时间序列,SQL 语句如下: + +``` +IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN +IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE +``` + +为了查看指定的时间序列,我们可以使用 SHOW TIMESERIES \语句,其中、表示时间序列对应的路径,默认值为空,表示查看系统中所有的时间序列。下面是两个例子: + +使用 SHOW TIMESERIES 语句查看系统中存在的所有时间序列,SQL 语句如下: + +``` +IoTDB> SHOW TIMESERIES +``` + +执行结果为: + +``` ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes| ++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+ +|root.ln.wf01.wt01.temperature| null| root.ln| FLOAT| RLE| SNAPPY|null| null| +| root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| 
++-----------------------------+-----+-------------+--------+--------+-----------+----+----------+ +Total line number = 2 +``` + +查看具体的时间序列 root.ln.wf01.wt01.status 的 SQL 语句如下: + +``` +IoTDB> SHOW TIMESERIES root.ln.wf01.wt01.status +``` + +执行结果为: + +``` ++------------------------+-----+-------------+--------+--------+-----------+----+----------+ +| timeseries|alias| database|dataType|encoding|compression|tags|attributes| ++------------------------+-----+-------------+--------+--------+-----------+----+----------+ +|root.ln.wf01.wt01.status| null| root.ln| BOOLEAN| PLAIN| SNAPPY|null| null| ++------------------------+-----+-------------+--------+--------+-----------+----+----------+ +Total line number = 1 +``` + +接下来,我们使用 INSERT 语句向 root.ln.wf01.wt01.status 时间序列中插入数据,在插入数据时需要首先指定时间戳和路径后缀名称: + +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(100,true); +``` + +我们也可以向多个时间序列中同时插入数据,这些时间序列同属于一个时间戳: + +``` +IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) values(200,false,20.71) +``` + +最后,我们查询之前插入的数据。使用 SELECT 语句我们可以查询指定的时间序列的数据结果,SQL 语句如下: + +``` +IoTDB> SELECT status FROM root.ln.wf01.wt01 +``` + +查询结果如下: + +``` ++-----------------------+------------------------+ +| Time|root.ln.wf01.wt01.status| ++-----------------------+------------------------+ +|1970-01-01T08:00:00.100| true| +|1970-01-01T08:00:00.200| false| ++-----------------------+------------------------+ +Total line number = 2 +``` + +我们也可以查询多个时间序列的数据结果,SQL 语句如下: + +``` +IoTDB> SELECT * FROM root.ln.wf01.wt01 +``` + +查询结果如下: + +``` ++-----------------------+--------------------------+-----------------------------+ +| Time| root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature| ++-----------------------+--------------------------+-----------------------------+ +|1970-01-01T08:00:00.100| true| null| +|1970-01-01T08:00:00.200| false| 20.71| ++-----------------------+--------------------------+-----------------------------+ +Total line number = 2 +``` + +输入 quit 或 exit 可退出 Cli 结束本次会话。 + +``` +IoTDB> quit +``` +或 + +``` +IoTDB> exit +``` + +想要浏览更多 IoTDB 数据库支持的命令,请浏览 [SQL Reference](../SQL-Manual/SQL-Manual.md)。 + +### 停止 IoTDB + +用户可以使用$IOTDB_HOME/sbin 文件夹下的 stop-standalone 脚本停止 IoTDB。 + +Linux 系统与 MacOS 系统停止命令如下: + +``` +> sudo bash sbin/stop-standalone.sh +``` + +Windows 系统停止命令如下: + +``` +> sbin\stop-standalone.bat +``` +注意:在 Linux 下,执行停止脚本时,请尽量加上 sudo 语句,不然停止可能会失败。更多的解释在分布式/分布式部署中。 + +### IoTDB 的权限管理 + +初始安装后的 IoTDB 中有一个默认用户:root,默认密码为 root。该用户为管理员用户,固定拥有所有权限,无法被赋予、撤销权限,也无法被删除。 + +您可以通过以下命令修改其密码: +``` +ALTER USER SET PASSWORD ; +Example: IoTDB > ALTER USER root SET PASSWORD 'newpwd'; +``` + +权限管理的具体内容可以参考:[权限管理](../User-Manual/Authority-Management.md) + +## 基础配置 + +配置文件在"conf"文件夹下,包括: + + * 环境配置模块 (`datanode-env.bat`, `datanode-env.sh`,`confignode-env.bat`,`confignode-env.sh`), + * 系统配置模块 (`iotdb-system.properties`,`iotdb-cluster.properties`) + * 日志配置模块 (`logback.xml`). 
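+
+下面给出一个修改系统配置模块的最小示例(仅为示意:假设需要从其他机器访问本单机实例,且 IoTDB 所在机器的 IP 为 192.168.1.10;实际可用的配置项及默认值请以 `conf/iotdb-system.properties` 文件中的注释说明为准):
+
+```properties
+# conf/iotdb-system.properties
+# 单机版默认仅监听 127.0.0.1,如需从其他机器访问,可将 RPC 地址改为本机 IP
+dn_rpc_address=192.168.1.10
+```
+
+修改配置文件后,一般需要重启 IoTDB 服务使配置生效。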
diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/SQL-Reference.md b/src/zh/UserGuide/V2.0.1/Tree/stage/SQL-Reference.md new file mode 100644 index 00000000..fc7d92f5 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/SQL-Reference.md @@ -0,0 +1,1290 @@ + + +# SQL 参考文档 + +## 显示版本号 + +```sql +show version +``` + +``` ++---------------+ +| version| ++---------------+ +|1.0.0| ++---------------+ +Total line number = 1 +It costs 0.417s +``` + +## Schema 语句 + +* 设置 database + +``` SQL +CREATE DATABASE +Eg: IoTDB > CREATE DATABASE root.ln.wf01.wt01 +Note: FullPath can not include wildcard `*` or `**` +``` +* 删除 database + +``` +DELETE DATABASE [COMMA ]* +Eg: IoTDB > DELETE DATABASE root.ln +Eg: IoTDB > DELETE DATABASE root.* +Eg: IoTDB > DELETE DATABASE root.** +``` + +* 创建时间序列语句 +``` +CREATE TIMESERIES WITH +alias + : LR_BRACKET ID RR_BRACKET + ; +attributeClauses + : DATATYPE OPERATOR_EQ + COMMA ENCODING OPERATOR_EQ + (COMMA (COMPRESSOR | COMPRESSION) OPERATOR_EQ )? + (COMMA property)* + tagClause + attributeClause + ; +attributeClause + : ATTRIBUTES LR_BRACKET propertyClause (COMMA propertyClause)* RR_BRACKET + ; +tagClause + : TAGS LR_BRACKET propertyClause (COMMA propertyClause)* RR_BRACKET + ; +propertyClause + : name=ID OPERATOR_EQ propertyValue + ; +DataTypeValue: BOOLEAN | DOUBLE | FLOAT | INT32 | INT64 | TEXT +EncodingValue: GORILLA | PLAIN | RLE | TS_2DIFF | REGULAR +CompressorValue: UNCOMPRESSED | SNAPPY +AttributesType: SDT | COMPDEV | COMPMINTIME | COMPMAXTIME +PropertyValue: ID | constant +Eg: CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, COMPRESSOR=SNAPPY, MAX_POINT_NUMBER=3 +Eg: CREATE TIMESERIES root.turbine.d0.s0(temperature) WITH DATATYPE=FLOAT, ENCODING=RLE, COMPRESSOR=SNAPPY tags(unit=f, description='turbine this is a test1') attributes(H_Alarm=100, M_Alarm=50) +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, DEADBAND=SDT, COMPDEV=0.01 +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, DEADBAND=SDT, COMPDEV=0.01, COMPMINTIME=3 +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, DEADBAND=SDT, COMPDEV=0.01, COMPMINTIME=2, COMPMAXTIME=15 +Note: Datatype and encoding type must be corresponding. Please check Chapter 3 Encoding Section for details. +Note: When propertyValue is SDT, it is required to set compression deviation COMPDEV, which is the maximum absolute difference between values. +Note: For SDT, values withtin COMPDEV will be discarded. +Note: For SDT, it is optional to set compression minimum COMPMINTIME, which is the minimum time difference between stored values for purpose of noise reduction. +Note: For SDT, it is optional to set compression maximum COMPMAXTIME, which is the maximum time difference between stored values regardless of COMPDEV. +``` + +* 创建时间序列语句(简化版本,从v0.13起支持) +``` +CREATE TIMESERIES +SimplifiedAttributeClauses + : WITH? (DATATYPE OPERATOR_EQ)? + ENCODING OPERATOR_EQ + ((COMPRESSOR | COMPRESSION) OPERATOR_EQ )? 
+ (COMMA property)* + tagClause + attributeClause + ; +Eg: CREATE TIMESERIES root.ln.wf01.wt01.status BOOLEAN ENCODING=PLAIN +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE COMPRESSOR=SNAPPY MAX_POINT_NUMBER=3 +Eg: CREATE TIMESERIES root.turbine.d0.s0(temperature) FLOAT ENCODING=RLE COMPRESSOR=SNAPPY tags(unit=f, description='turbine this is a test1') attributes(H_Alarm=100, M_Alarm=50) +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE DEADBAND=SDT COMPDEV=0.01 +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE DEADBAND=SDT COMPDEV=0.01 COMPMINTIME=3 +Eg: CREATE TIMESERIES root.ln.wf01.wt01.temperature FLOAT ENCODING=RLE DEADBAND=SDT COMPDEV=0.01 COMPMINTIME=2 COMPMAXTIME=15 +``` + +* 创建对齐时间序列语句 +``` +CREATE ALIGNED TIMESERIES alignedMeasurements +alignedMeasurements + : LR_BRACKET nodeNameWithoutWildcard attributeClauses + (COMMA nodeNameWithoutWildcard attributeClauses)+ RR_BRACKET + ; +Eg: CREATE ALIGNED TIMESERIES root.ln.wf01.GPS(lat FLOAT ENCODING=GORILLA, lon FLOAT ENCODING=GORILLA COMPRESSOR=SNAPPY) +Note: It is not supported to set different compression for a group of aligned timeseries. +Note: It is not currently supported to set an alias, tag, and attribute for aligned timeseries. +``` + +* 创建元数据模板语句 +``` +CREATE SCHEMA TEMPLATE LR_BRACKET (COMMA plateMeasurementClause>)* RR_BRACKET +templateMeasurementClause + : suffixPath attributeClauses #nonAlignedTemplateMeasurement + | suffixPath LR_BRACKET nodeNameWithoutWildcard attributeClauses + (COMMA nodeNameWithoutWildcard attributeClauses)+ RR_BRACKET #alignedTemplateMeasurement + ; +Eg: CREATE SCHEMA TEMPLATE temp1( + s1 INT32 encoding=Gorilla, compression=SNAPPY, + vector1( + s1 INT32 encoding=Gorilla, + s2 FLOAT encoding=RLE, compression=SNAPPY) + ) +``` + +* 挂载元数据模板语句 +``` +SET SCHEMA TEMPLATE TO +Eg: SET SCHEMA TEMPLATE temp1 TO root.beijing +``` + +* 根据元数据模板创建时间序列语句 +``` +CREATE TIMESERIES OF SCHEMA TEMPLATE ON +Eg: CREATE TIMESERIES OF SCHEMA TEMPLATE ON root.beijing +``` + +* 卸载元数据模板语句 +``` +UNSET SCHEMA TEMPLATE FROM +Eg: UNSET SCHEMA TEMPLATE temp1 FROM root.beijing +``` + +* 删除时间序列语句 + +``` +(DELETE | DROP) TIMESERIES [COMMA ]* +Eg: IoTDB > DELETE TIMESERIES root.ln.wf01.wt01.status +Eg: IoTDB > DELETE TIMESERIES root.ln.wf01.wt01.status, root.ln.wf01.wt01.temperature +Eg: IoTDB > DELETE TIMESERIES root.ln.wf01.wt01.* +Eg: IoTDB > DROP TIMESERIES root.ln.wf01.wt01.* +``` + +* 修改时间序列标签属性语句 + +``` +ALTER TIMESERIES fullPath alterClause +alterClause + : RENAME beforeName=ID TO currentName=ID + | SET property (COMMA property)* + | DROP ID (COMMA ID)* + | ADD TAGS property (COMMA property)* + | ADD ATTRIBUTES property (COMMA property)* + | UPSERT tagClause attributeClause + ; +attributeClause + : (ATTRIBUTES LR_BRACKET property (COMMA property)* RR_BRACKET)? + ; +tagClause + : (TAGS LR_BRACKET property (COMMA property)* RR_BRACKET)? 
+ ; +Eg: ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +Eg: ALTER timeseries root.turbine.d1.s1 SET tag1=newV1, attr1=newV1 +Eg: ALTER timeseries root.turbine.d1.s1 DROP tag1, tag2 +Eg: ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4 +Eg: ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3=v3, attr4=v4 +EG: ALTER timeseries root.turbine.d1.s1 UPSERT TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4) +``` + +* 显示所有时间序列语句 + +``` +SHOW TIMESERIES +Eg: IoTDB > SHOW TIMESERIES +Note: This statement can only be used in IoTDB Client. If you need to show all timeseries in JDBC, please use `DataBaseMetadata` interface. +``` + +* 显示特定时间序列语句 + +``` +SHOW TIMESERIES +Eg: IoTDB > SHOW TIMESERIES root.** +Eg: IoTDB > SHOW TIMESERIES root.ln.** +Eg: IoTDB > SHOW TIMESERIES root.ln.*.*.status +Eg: IoTDB > SHOW TIMESERIES root.ln.wf01.wt01.status +Note: The path can be timeseries path or path pattern +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* 显示满足条件的时间序列语句 + +``` +SHOW TIMESERIES pathPattern? showWhereClause? +showWhereClause + : WHERE (property | containsExpression) + ; +containsExpression + : name=ID OPERATOR_CONTAINS value=propertyValue + ; + +Eg: show timeseries root.ln.** where unit='c' +Eg: show timeseries root.ln.** where description contains 'test1' +``` + +* 分页显示满足条件的时间序列语句 + +``` +SHOW TIMESERIES pathPattern? showWhereClause? limitClause? + +showWhereClause + : WHERE (property | containsExpression) + ; +containsExpression + : name=ID OPERATOR_CONTAINS value=propertyValue + ; +limitClause + : LIMIT INT offsetClause? + | offsetClause? LIMIT INT + ; + +Eg: show timeseries root.ln.** where unit='c' +Eg: show timeseries root.ln.** where description contains 'test1' +Eg: show timeseries root.ln.** where unit='c' limit 10 offset 10 +``` + +* 查看所有 database 语句 + +``` +SHOW DATABASES +Eg: IoTDB > SHOW DATABASES +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* 显示特定 database + +``` +SHOW DATABASES +Eg: IoTDB > SHOW DATABASES root.* +Eg: IoTDB > SHOW DATABASES root.** +Eg: IoTDB > SHOW DATABASES root.ln +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* 显示 Merge 状态语句 + +``` +SHOW MERGE INFO +Eg: IoTDB > SHOW MERGE INFO +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* 显示指定路径下时间序列数语句 + +``` +COUNT TIMESERIES +Eg: IoTDB > COUNT TIMESERIES root.** +Eg: IoTDB > COUNT TIMESERIES root.ln.** +Eg: IoTDB > COUNT TIMESERIES root.ln.*.*.status +Eg: IoTDB > COUNT TIMESERIES root.ln.wf01.wt01.status +Note: The path can be timeseries path or path pattern. +Note: This statement can be used in IoTDB Client and JDBC. +``` + +``` +COUNT TIMESERIES GROUP BY LEVEL= +Eg: IoTDB > COUNT TIMESERIES root.** GROUP BY LEVEL=1 +Eg: IoTDB > COUNT TIMESERIES root.ln.** GROUP BY LEVEL=2 +Eg: IoTDB > COUNT TIMESERIES root.ln.wf01.* GROUP BY LEVEL=3 +Note: The path can be timeseries path or path pattern. +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* 显示指定路径下特定层级的节点数语句 + +``` +COUNT NODES LEVEL= +Eg: IoTDB > COUNT NODES root.** LEVEL=2 +Eg: IoTDB > COUNT NODES root.ln.** LEVEL=2 +Eg: IoTDB > COUNT NODES root.ln.*.* LEVEL=3 +Eg: IoTDB > COUNT NODES root.ln.wf01.* LEVEL=3 +Note: The path can be timeseries path or path pattern. +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* 显示所有设备语句 + +``` +SHOW DEVICES (WITH DATABASE)? limitClause? +Eg: IoTDB > SHOW DEVICES +Eg: IoTDB > SHOW DEVICES WITH DATABASE +Note: This statement can be used in IoTDB Client and JDBC. 
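+Note: limitClause is optional. The following is a supplementary example (for illustration only), paging the result with LIMIT/OFFSET as defined by limitClause above:
+Eg: IoTDB > SHOW DEVICES LIMIT 10 OFFSET 2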
+``` + +* 显示特定设备语句 + +``` +SHOW DEVICES (WITH DATABASE)? limitClause? +Eg: IoTDB > SHOW DEVICES root.** +Eg: IoTDB > SHOW DEVICES root.ln.* +Eg: IoTDB > SHOW DEVICES root.*.wf01 +Eg: IoTDB > SHOW DEVICES root.ln.* WITH DATABASE +Eg: IoTDB > SHOW DEVICES root.*.wf01 WITH DATABASE +Note: The path can be path pattern. +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* 显示 ROOT 节点的子节点名称语句 + +``` +SHOW CHILD PATHS +Eg: IoTDB > SHOW CHILD PATHS +Note: This statement can be used in IoTDB Client and JDBC. +``` + +* 显示子节点名称语句 + +``` +SHOW CHILD PATHS +Eg: IoTDB > SHOW CHILD PATHS root +Eg: IoTDB > SHOW CHILD PATHS root.ln +Eg: IoTDB > SHOW CHILD PATHS root.*.wf01 +Eg: IoTDB > SHOW CHILD PATHS root.ln.wf* +Note: This statement can be used in IoTDB Client and JDBC. +``` + +## 数据管理语句 + +* 插入记录语句 + +``` +INSERT INTO LPAREN TIMESTAMP COMMA [COMMA ]* RPAREN VALUES LPAREN , [COMMA ]* RPAREN +Sensor : Identifier +Eg: IoTDB > INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) +Eg: IoTDB > INSERT INTO root.ln.wf01.wt01(timestamp,status) VALUES(NOW(), false) +Eg: IoTDB > INSERT INTO root.ln.wf01.wt01(timestamp,temperature) VALUES(2017-11-01T00:17:00.000+08:00,24.22028) +Eg: IoTDB > INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) VALUES (1509466680000,false,20.060787) +Eg: IoTDB > INSERT INTO root.sg.d1(timestamp,(s1,s2),(s3,s4)) VALUES (1509466680000,(1.0,2),(NULL,4)) +Note: the statement needs to satisfy this constraint: + = +Note: The order of Sensor and PointValue need one-to-one correspondence +``` + +* 删除记录语句 + +``` +DELETE FROM [COMMA ]* [WHERE ]? +WhereClause : [(AND) ]* +Condition : [(AND) ]* +TimeExpr : TIME PrecedenceEqualOperator ( | ) +Eg: DELETE FROM root.ln.wf01.wt01.temperature WHERE time > 2016-01-05T00:15:00+08:00 and time < 2017-11-1T00:05:00+08:00 +Eg: DELETE FROM root.ln.wf01.wt01.status, root.ln.wf01.wt01.temperature WHERE time < NOW() +Eg: DELETE FROM root.ln.wf01.wt01.* WHERE time >= 1509466140000 +``` + +* 选择记录语句 + +``` +SELECT FROM [WHERE ]? +SelectClause : (COMMA )* +SelectPath : LPAREN RPAREN | +FUNCTION : ‘COUNT’ , ‘MIN_TIME’, ‘MAX_TIME’, ‘MIN_VALUE’, ‘MAX_VALUE’ +FromClause : (COMMA )? +WhereClause : [(AND | OR) ]* +Condition : [(AND | OR) ]* +Expression : [NOT | !]? | [NOT | !]? +TimeExpr : TIME PrecedenceEqualOperator ( | ) +RelativeTimeDurationUnit = Integer ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS') +RelativeTime : (now() | ) [(+|-) RelativeTimeDurationUnit]+ +SensorExpr : ( | ) PrecedenceEqualOperator +Eg: IoTDB > SELECT status, temperature FROM root.ln.wf01.wt01 WHERE temperature < 24 and time > 2017-11-01 00:13:00 +Eg. IoTDB > SELECT ** FROM root +Eg. IoTDB > SELECT * FROM root.** +Eg. IoTDB > SELECT * FROM root.** where time > now() - 5m +Eg. IoTDB > SELECT * FROM root.ln.*.wf* +Eg. IoTDB > SELECT COUNT(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 25 +Eg. IoTDB > SELECT MIN_TIME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 25 +Eg. IoTDB > SELECT MAX_TIME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature > 24 +Eg. IoTDB > SELECT MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature > 23 +Eg. IoTDB > SELECT MAX_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 25 +Eg. 
IoTDB > SELECT COUNT(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 25 GROUP BY LEVEL=1 +Note: the statement needs to satisfy this constraint: (SelectClause) + (FromClause) = +Note: If the (WhereClause) is started with and not with ROOT, the statement needs to satisfy this constraint: (FromClause) + (SensorExpr) = +Note: In Version 0.7.0, if includes `OR`, time filter can not be used. +Note: There must be a space on both sides of the plus and minus operator appearing in the time expression +``` + +* Group By 语句 + +``` +SELECT FROM WHERE GROUP BY +SelectClause : [COMMA < Function >]* +Function : LPAREN RPAREN +FromClause : +WhereClause : [(AND | OR) ]* +Condition : [(AND | OR) ]* +Expression : [NOT | !]? | [NOT | !]? +TimeExpr : TIME PrecedenceEqualOperator ( | ) +RelativeTimeDurationUnit = Integer ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS') +RelativeTime : (now() | ) [(+|-) RelativeTimeDurationUnit]+ +SensorExpr : ( | ) PrecedenceEqualOperator +GroupByTimeClause : LPAREN COMMA (COMMA )? RPAREN +TimeInterval: LSBRACKET COMMA RRBRACKET | LRBRACKET COMMA RSBRACKET +TimeUnit : Integer +DurationUnit : "ms" | "s" | "m" | "h" | "d" | "w" | "mo" +Eg: SELECT COUNT(status), COUNT(temperature) FROM root.ln.wf01.wt01 where temperature < 24 GROUP BY([1509465720000, 1509466380000), 5m) +Eg: SELECT COUNT(status), COUNT(temperature) FROM root.ln.wf01.wt01 where temperature < 24 GROUP BY((1509465720000, 1509466380000], 5m) +Eg. SELECT COUNT (status), MAX_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE time < 1509466500000 GROUP BY([1509465720000, 1509466380000), 5m, 10m) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ([1509466140000, 1509466380000), 3m, 5ms) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ((1509466140000, 1509466380000], 3m, 5ms) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ((1509466140000, 1509466380000], 1mo) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ((1509466140000, 1509466380000], 1mo, 1mo) +Eg. SELECT MIN_TIME(status), MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE temperature < 25 GROUP BY ((1509466140000, 1509466380000], 1mo, 2mo) +Note: the statement needs to satisfy this constraint: (SelectClause) + (FromClause) = +Note: If the (WhereClause) is started with and not with ROOT, the statement needs to satisfy this constraint: (FromClause) + (SensorExpr) = +Note: (TimeInterval) needs to be greater than 0 +Note: First (TimeInterval) in needs to be smaller than second (TimeInterval) +Note: needs to be greater than 0 +Note: Third if set shouldn't be smaller than second +Note: If the second is "mo", the third need to be in month +Note: If the third is "mo", the second can be in any unit +``` + +* Fill 语句 + +``` +SELECT FROM WHERE FILL +SelectClause : [COMMA ]* +FromClause : < PrefixPath > [COMMA < PrefixPath >]* +WhereClause : +WhereExpression : TIME EQUAL +FillClause : LPAREN [COMMA ]* RPAREN +TypeClause : | | | | | +Int32Clause: INT32 LBRACKET ( | ) RBRACKET +Int64Clause: INT64 LBRACKET ( | ) RBRACKET +FloatClause: FLOAT LBRACKET ( | ) RBRACKET +DoubleClause: DOUBLE LBRACKET ( | ) RBRACKET +BoolClause: BOOLEAN LBRACKET ( | ) RBRACKET +TextClause: TEXT LBRACKET ( | ) RBRACKET +PreviousClause : PREVIOUS [COMMA ]? +LinearClause : LINEAR [COMMA COMMA ]? 
+ValidPreviousTime, ValidBehindTime: +TimeUnit : Integer +DurationUnit : "ms" | "s" | "m" | "h" | "d" | "w" +Eg: SELECT temperature FROM root.ln.wf01.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL(float[previous, 1m]) +Eg: SELECT temperature,status FROM root.ln.wf01.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL (float[linear, 1m, 1m], boolean[previous, 1m]) +Eg: SELECT temperature,status,hardware FROM root.ln.wf01.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL (float[linear, 1m, 1m], boolean[previous, 1m], text[previous]) +Eg: SELECT temperature,status,hardware FROM root.ln.wf01.wt01 WHERE time = 2017-11-01T16:37:50.000 FILL (float[linear], boolean[previous, 1m], text[previous]) +Note: the statement needs to satisfy this constraint: (FromClause) + (SelectClause) = +Note: Integer in needs to be greater than 0 +``` + +* Group By Fill 语句 + +``` +# time 区间规则为:只能为左开右闭或左闭右开,例如:[20, 100) + +SELECT FROM WHERE GROUP BY (FILL )? +GroupByClause : LPAREN COMMA RPAREN +GROUPBYFillClause : LPAREN RPAREN +TypeClause : | | | | | | +AllClause: ALL LBRACKET ( | ) RBRACKET +Int32Clause: INT32 LBRACKET ( | ) RBRACKET +Int64Clause: INT64 LBRACKET ( | ) RBRACKET +FloatClause: FLOAT LBRACKET ( | ) RBRACKET +DoubleClause: DOUBLE LBRACKET ( | ) RBRACKET +BoolClause: BOOLEAN LBRACKET ( | ) RBRACKET +TextClause: TEXT LBRACKET ( | ) RBRACKET +PreviousClause : PREVIOUS +PreviousUntilLastClause : PREVIOUSUNTILLAST +Eg: SELECT last_value(temperature) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (float[PREVIOUS]) +Eg: SELECT last_value(temperature) FROM root.ln.wf01.wt01 GROUP BY((15, 100], 5m) FILL (float[PREVIOUS]) +Eg: SELECT last_value(power) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (int32[PREVIOUSUNTILLAST]) +Eg: SELECT last_value(power) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (int32[PREVIOUSUNTILLAST, 5m]) +Eg: SELECT last_value(temperature), last_value(power) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (ALL[PREVIOUS]) +Eg: SELECT last_value(temperature), last_value(power) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (ALL[PREVIOUS, 5m]) +Note: In group by fill, sliding step is not supported in group by clause +Note: Now, only last_value aggregation function is supported in group by fill. +Note: Linear fill is not supported in group by fill. +``` + +* Order by time 语句 + +``` +SELECT FROM WHERE GROUP BY (FILL )? orderByTimeClause? +orderByTimeClause: order by time (asc | desc)? + +Eg: SELECT last_value(temperature) FROM root.ln.wf01.wt01 GROUP BY([20, 100), 5m) FILL (float[PREVIOUS]) order by time desc +Eg: SELECT * from root.** order by time desc +Eg: SELECT * from root.** order by time desc align by device +Eg: SELECT * from root.** order by time desc disable align +Eg: SELECT last * from root order by time desc +``` + +* Limit & SLimit 语句 + +``` +SELECT FROM [WHERE ] [] [] +SelectClause : [ | Function]+ +Function : LPAREN RPAREN +FromClause : +WhereClause : [(AND | OR) ]* +Condition : [(AND | OR) ]* +Expression: [NOT|!]? | [NOT|!]? +TimeExpr : TIME PrecedenceEqualOperator ( | ) +RelativeTimeDurationUnit = Integer ('Y'|'MO'|'W'|'D'|'H'|'M'|'S'|'MS'|'US'|'NS') +RelativeTime : (now() | ) [(+|-) RelativeTimeDurationUnit]+ +SensorExpr : (|) PrecedenceEqualOperator +LIMITClause : LIMIT [OFFSETClause]? +N : Integer +OFFSETClause : OFFSET +OFFSETValue : Integer +SLIMITClause : SLIMIT [SOFFSETClause]? 
+SN : Integer +SOFFSETClause : SOFFSET +SOFFSETValue : Integer +Eg: IoTDB > SELECT status, temperature FROM root.ln.wf01.wt01 WHERE temperature < 24 and time > 2017-11-01 00:13:00 LIMIT 3 OFFSET 2 +Eg. IoTDB > SELECT COUNT (status), MAX_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE time < 1509466500000 GROUP BY([1509465720000, 1509466380000), 5m) LIMIT 3 +Note: N, OFFSETValue, SN and SOFFSETValue must be greater than 0. +Note: The order of and does not affect the grammatical correctness. +Note: can not use but not . +``` + +* Align by device 语句 + +``` +AlignbyDeviceClause : ALIGN BY DEVICE + +规则: +1. 大小写不敏感。 +正例:select * from root.sg1.** align by device +正例:select * from root.sg1.** ALIGN BY DEVICE + +2. AlignbyDeviceClause 只能放在末尾。 +正例:select * from root.sg1.** where time > 10 align by device +错例:select * from root.sg1.** align by device where time > 10 + +3. Select 子句中的 path 只能是单层,或者通配符,不允许有 path 分隔符"."。 +正例:select s0,s1 from root.sg1.* align by device +正例:select s0,s1 from root.sg1.d0, root.sg1.d1 align by device +正例:select * from root.sg1.* align by device +正例:select * from root.** align by device +正例:select s0,s1,* from root.*.* align by device +错例:select d0.s1, d0.s2, d1.s0 from root.sg1 align by device +错例:select *.s0, *.s1 from root.* align by device +错例:select *.*.* from root align by device + +4. 相同 measurement 的各设备的数据类型必须都相同, + +正例:select s0 from root.sg1.d0,root.sg1.d1 align by device +root.sg1.d0.s0 and root.sg1.d1.s0 are both INT32. + +正例:select count(s0) from root.sg1.d0,root.sg1.d1 align by device +count(root.sg1.d0.s0) and count(root.sg1.d1.s0) are both INT64. + +错例:select s0 from root.sg1.d0, root.sg2.d3 align by device +root.sg1.d0.s0 is INT32 while root.sg2.d3.s0 is FLOAT. + +5. 结果集的展示规则:对于 select 中给出的列,不论是否有数据(是否被注册),均会被显示。此外,select 子句中还支持常数列(例如,'a', '123'等等)。 +例如,"select s0,s1,s2,'abc',s1,s2 from root.sg.d0, root.sg.d1, root.sg.d2 align by device". 假设只有下述三列有数据: +- root.sg.d0.s0 +- root.sg.d0.s1 +- root.sg.d1.s0 + +结果集形如: + +| Time | Device | s0 | s1 | s2 | 'abc' | s1 | s2 | +| --- | --- | ---| ---| null | 'abc' | ---| null | +| 1 |root.sg.d0| 20 | 2.5| null | 'abc' | 2.5| null | +| 2 |root.sg.d0| 23 | 3.1| null | 'abc' | 3.1| null | +| ... | ... | ...| ...| null | 'abc' | ...| null | +| 1 |root.sg.d1| 12 |null| null | 'abc' |null| null | +| 2 |root.sg.d1| 19 |null| null | 'abc' |null| null | +| ... | ... | ...| ...| null | 'abc' | ...| null | + +注意注意 设备'root.sg.d1'的's0'的值全为 null + +6. 在 From 中重复写设备名字或者设备前缀是没有任何作用的。 +例如,"select s0,s1 from root.sg.d0,root.sg.d0,root.sg.d1 align by device" 等于 "select s0,s1 from root.sg.d0,root.sg.d1 align by device". +例如。"select s0,s1 from root.sg.*,root.sg.d0 align by device" 等于 "select s0,s1 from root.sg.* align by device". + +7. 在 Select 子句中重复写列名是生效的。例如,"select s0,s0,s1 from root.sg.* align by device" 不等于 "select s0,s1 from root.sg.* align by device". + +8. 在 Where 子句中时间过滤条件和值过滤条件均可以使用,值过滤条件可以使用叶子节点 path,或以 root 开头的整个 path,不允许存在通配符。例如, +- select * from root.sg.* where time = 1 align by device +- select * from root.sg.* where s0 < 100 align by device +- select * from root.sg.* where time < 20 AND s0 > 50 align by device +- select * from root.sg.d0 where root.sg.d0.s0 = 15 align by device + +9. 
更多正例: + - select * from root.vehicle.* align by device + - select s0,s0,s1 from root.vehicle.* align by device + - select s0,s1 from root.vehicle.* limit 10 offset 1 align by device + - select * from root.vehicle.* slimit 10 soffset 2 align by device + - select * from root.vehicle.* where time > 10 align by device + - select * from root.vehicle.* where time < 10 AND s0 > 25 align by device + - select * from root.vehicle.* where root.vehicle.d0.s0>0 align by device + - select count(*) from root.vehicle.* align by device + - select sum(*) from root.vehicle.* GROUP BY (20ms,0,[2,50]) align by device + - select * from root.vehicle.* where time = 3 Fill(int32[previous, 5ms]) align by device +``` + +* Disable align 语句 + +``` +规则: +1. 大小写均可。 +正例:select * from root.sg1.* disable align +正例:select * from root.sg1.* DISABLE ALIGN + +2. Disable Align 只能用于查询语句句尾。 +正例:select * from root.sg1.* where time > 10 disable align +错例:select * from root.sg1.* disable align where time > 10 + +3. Disable Align 不能用于聚合查询、Fill 语句、Group by 或 Group by device 语句,但可用于 Limit 语句。 +正例:select * from root.sg1.* limit 3 offset 2 disable align +正例:select * from root.sg1.* slimit 3 soffset 2 disable align +错例:select count(s0),count(s1) from root.sg1.d1 disable align +错例:select * from root.vehicle.* where root.vehicle.d0.s0>0 disable align +错例:select * from root.vehicle.* align by device disable align + +4. 结果显示若无数据显示为空白。 + +查询结果样式如下表: +| Time | root.sg.d0.s1 | Time | root.sg.d0.s2 | Time | root.sg.d1.s1 | +| --- | --- | --- | --- | --- | --- | +| 1 | 100 | 20 | 300 | 400 | 600 | +| 2 | 300 | 40 | 800 | 700 | 900 | +| 4 | 500 | | | 800 | 1000 | +| | | | | 900 | 8000 | + +5. 一些正确使用样例: + - select * from root.vehicle.* disable align + - select s0,s0,s1 from root.vehicle.* disable align + - select s0,s1 from root.vehicle.* limit 10 offset 1 disable align + - select * from root.vehicle.* slimit 10 soffset 2 disable align + - select * from root.vehicle.* where time > 10 disable align + +``` + +* Last 语句 + +Last 语句返回所要查询时间序列的最近时间戳的一条数据 + +``` +SELECT LAST FROM WHERE +Select Clause : [COMMA ]* +FromClause : < PrefixPath > [COMMA < PrefixPath >]* +WhereClause : [(AND | OR) ]* +TimeExpr : TIME PrecedenceEqualOperator ( | ) + +Eg. SELECT LAST s1 FROM root.sg.d1 +Eg. SELECT LAST s1, s2 FROM root.sg.d1 +Eg. SELECT LAST s1 FROM root.sg.d1, root.sg.d2 +Eg. SELECT LAST s1 FROM root.sg.d1 where time > 100 +Eg. SELECT LAST s1, s2 FROM root.sg.d1 where time >= 500 + +规则: +1. 需要满足 PrefixPath.Path 为一条完整的时间序列,即 + = + +2. 当前 SELECT LAST 语句只支持包含'>'或'>='的时间过滤条件 + +3. 结果集以四列的表格的固定形式返回。 +例如 "select last s1, s2 from root.sg.d1, root.sg.d2", 结果集返回如下: + +| Time | timeseries | value | dataType | +| --- | ------------- | ----- | -------- | +| 5 | root.sg.d1.s1 | 100 | INT32 | +| 2 | root.sg.d1.s2 | 400 | INT32 | +| 4 | root.sg.d2.s1 | 250 | INT32 | +| 9 | root.sg.d2.s2 | 600 | INT32 | + +4. 注意 LAST 语句不支持与"disable align"关键词一起使用。 + +``` + +* As 语句 + +As 语句为 SELECT 语句中出现的时间序列规定一个别名 + +``` +在每个查询中都可以使用 As 语句来规定时间序列的别名,但是对于通配符的使用有一定限制。 + +1. 原始数据查询: +select s1 as speed, s2 as temperature from root.sg.d1 + +结果集将显示为: +| Time | speed | temperature | +| ... | ... | .... | + +2. 聚合查询 +select count(s1) as s1_num, max_value(s2) as s2_max from root.sg.d1 + +3. 降频聚合查询 +select count(s1) as s1_num from root.sg.d1 group by ([100,500), 80ms) + +4. 按设备对齐查询 +select s1 as speed, s2 as temperature from root.sg.d1 align by device + +select count(s1) as s1_num, count(s2), count(s3) as s3_num from root.sg.d2 align by device + +5. 
最新数据查询
+select last s1 as speed, s2 from root.sg.d1
+
+规则:
+1. 除按设备对齐查询外,每一个 AS 语句必须唯一对应一个时间序列。
+
+E.g. select s1 as temperature from root.sg.*
+
+此时如果 database root.sg.* 中含有多个设备,则会抛出异常。
+
+2. 按设备对齐查询中,每个 AS 语句对应的前缀路径可以含多个设备,而后缀路径不能含多个传感器。
+
+E.g. select s1 as temperature from root.sg.*
+
+这种情况即使有多个设备,也可以正常显示。
+
+E.g. select * as temperature from root.sg.d1
+
+这种情况如果 * 匹配多个传感器,则无法正常显示。
+
+```
+* Regexp 语句
+
+Regexp 语句仅支持对数据类型为 TEXT 的列进行过滤,传入的过滤条件为 Java 标准库风格的正则表达式。
+```
+SELECT <SelectClause> FROM <FromClause> WHERE <WhereClause>
+Select Clause : <Path> [COMMA <Path>]*
+FromClause : < PrefixPath > [COMMA < PrefixPath >]*
+WhereClause : andExpression (OPERATOR_OR andExpression)*
+andExpression : predicate (OPERATOR_AND predicate)*
+predicate : (suffixPath | fullPath) REGEXP regularExpression
+regularExpression : Java standard regular expression, like '^[a-z][0-9]$', [details](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html)
+
+Eg. select s1 from root.sg.d1 where s1 regexp '^[0-9]*$'
+Eg. select s1, s2 FROM root.sg.d1 where s1 regexp '^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$' and s2 regexp '^\d{15}|\d{18}$'
+Eg. select * from root.sg.d1 where s1 regexp '^[a-zA-Z]\w{5,17}$'
+Eg. select * from root.sg.d1 where s1 regexp '^\d{4}-\d{1,2}-\d{1,2}' and time > 100
+```
+
+* Like 语句
+
+Like 语句的用法和 MySQL 相同,但是仅支持对数据类型为 TEXT 的列进行过滤。
+```
+SELECT <SelectClause> FROM <FromClause> WHERE <WhereClause>
+Select Clause : <Path> [COMMA <Path>]*
+FromClause : < PrefixPath > [COMMA < PrefixPath >]*
+WhereClause : andExpression (OPERATOR_OR andExpression)*
+andExpression : predicate (OPERATOR_AND predicate)*
+predicate : (suffixPath | fullPath) LIKE likeExpression
+likeExpression : a string that may contain "%" or "_", where "%value" means a string that ends with the value, "value%" means a string that starts with the value, "%value%" means a string that contains the value, and "_" represents any single character.
+
+Eg. select s1 from root.sg.d1 where s1 like 'abc'
+Eg. select s1, s2 from root.sg.d1 where s1 like 'abc%'
+Eg. select * from root.sg.d1 where s1 like 'abc_'
+Eg. 
select * from root.sg.d1 where s1 like 'abc\%' +这种情况,'\%'表示'%'将会被转义 +结果集将显示为: +| Time | Path | Value | +| --- | ------------ | ----- | +| 200 | root.sg.d1.s1| abc% | +``` + +## 数据库管理语句 + +* 创建用户 + +``` +CREATE USER ; +userName:=identifier +password:=string +Eg: IoTDB > CREATE USER thulab 'passwd'; +``` + +* 删除用户 + +``` +DROP USER ; +userName:=identifier +Eg: IoTDB > DROP USER xiaoming; +``` + +* 创建角色 + +``` +CREATE ROLE ; +roleName:=identifie +Eg: IoTDB > CREATE ROLE admin; +``` + +* 删除角色 + +``` +DROP ROLE ; +roleName:=identifier +Eg: IoTDB > DROP ROLE admin; +``` + +* 赋予用户权限 + +``` +GRANT USER PRIVILEGES ON ; +userName:=identifier +nodeName:=identifier (DOT identifier)* +privileges:= string (COMMA string)* +Eg: IoTDB > GRANT USER tempuser PRIVILEGES DELETE_TIMESERIES on root.ln; +``` + +* 赋予角色权限 + +``` +GRANT ROLE PRIVILEGES ON ; +privileges:= string (COMMA string)* +roleName:=identifier +nodeName:=identifier (DOT identifier)* +Eg: IoTDB > GRANT ROLE temprole PRIVILEGES DELETE_TIMESERIES ON root.ln; +``` + +* 赋予用户角色 + +``` +GRANT TO ; +roleName:=identifier +userName:=identifier +Eg: IoTDB > GRANT temprole TO tempuser; +``` + +* 撤销用户权限 + +``` +REVOKE USER PRIVILEGES ON ; +privileges:= string (COMMA string)* +userName:=identifier +nodeName:=identifier (DOT identifier)* +Eg: IoTDB > REVOKE USER tempuser PRIVILEGES DELETE_TIMESERIES on root.ln; +``` + +* 撤销角色权限 + +``` +REVOKE ROLE PRIVILEGES ON ; +privileges:= string (COMMA string)* +roleName:= identifier +nodeName:=identifier (DOT identifier)* +Eg: IoTDB > REVOKE ROLE temprole PRIVILEGES DELETE_TIMESERIES ON root.ln; +``` + +* 撤销用户角色 + +``` +REVOKE FROM ; +roleName:=identifier +userName:=identifier +Eg: IoTDB > REVOKE temprole FROM tempuser; +``` + +* 列出用户 + +``` +LIST USER +Eg: IoTDB > LIST USER +``` + +* 列出角色 + +``` +LIST ROLE +Eg: IoTDB > LIST ROLE +``` + +* 列出权限 + +``` +LIST PRIVILEGES USER ON ; +username:=identifier +path=‘root’ (DOT identifier)* +Eg: IoTDB > LIST PRIVILEGES USER sgcc_wirte_user ON root.sgcc; +``` + +* 列出角色权限 + +``` +LIST ROLE PRIVILEGES +roleName:=identifier +Eg: IoTDB > LIST ROLE PRIVILEGES actor; +``` + +* 列出角色在具体路径上的权限 + +``` +LIST PRIVILEGES ROLE ON ; +roleName:=identifier +path=‘root’ (DOT identifier)* +Eg: IoTDB > LIST PRIVILEGES ROLE wirte_role ON root.sgcc; +``` + +* 列出用户权限 + +``` +LIST USER PRIVILEGES ; +username:=identifier +Eg: IoTDB > LIST USER PRIVILEGES tempuser; +``` + +* 列出用户角色 + +``` +LIST ALL ROLE OF USER ; +username:=identifier +Eg: IoTDB > LIST ALL ROLE OF USER tempuser; +``` + +* 列出角色用户 + +``` +LIST ALL USER OF ROLE ; +roleName:=identifier +Eg: IoTDB > LIST ALL USER OF ROLE roleuser; +``` + +* 更新密码 + +``` +ALTER USER SET PASSWORD ; +roleName:=identifier +password:=string +Eg: IoTDB > ALTER USER tempuser SET PASSWORD 'newpwd'; +``` + +## 功能 + +* COUNT + +``` +SELECT COUNT(Path) (COMMA COUNT(Path))* FROM [WHERE ]? +Eg. SELECT COUNT(status), COUNT(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* FIRST_VALUE +原有的 `FIRST` 方法在 `v0.10.0` 版本更名为 `FIRST_VALUE`。 + +``` +SELECT FIRST_VALUE (Path) (COMMA FIRST_VALUE (Path))* FROM [WHERE ]? +Eg. SELECT FIRST_VALUE (status), FIRST_VALUE (temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* LAST_VALUE + +``` +SELECT LAST_VALUE (Path) (COMMA LAST_VALUE (Path))* FROM [WHERE ]? +Eg. 
SELECT LAST_VALUE (status), LAST_VALUE (temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* MAX_TIME + +``` +SELECT MAX_TIME (Path) (COMMA MAX_TIME (Path))* FROM [WHERE ]? +Eg. SELECT MAX_TIME(status), MAX_TIME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* MAX_VALUE + +``` +SELECT MAX_VALUE (Path) (COMMA MAX_VALUE (Path))* FROM [WHERE ]? +Eg. SELECT MAX_VALUE(status), MAX_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* EXTREME +极值:具有最大绝对值的值(正值优先) +``` +SELECT EXTREME (Path) (COMMA EXT (Path))* FROM [WHERE ]? +Eg. SELECT EXTREME(status), EXTREME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* AVG +原有的 `MEAN` 方法在 `v0.9.0` 版本更名为 `AVG`。 + +``` +SELECT AVG (Path) (COMMA AVG (Path))* FROM [WHERE ]? +Eg. SELECT AVG (temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* MIN_TIME + +``` +SELECT MIN_TIME (Path) (COMMA MIN_TIME (Path))*FROM [WHERE ]? +Eg. SELECT MIN_TIME(status), MIN_TIME(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* MIN_VALUE + +``` +SELECT MIN_VALUE (Path) (COMMA MIN_VALUE (Path))* FROM [WHERE ]? +Eg. SELECT MIN_VALUE(status),MIN_VALUE(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +* NOW + +``` +NOW() +Eg. INSERT INTO root.ln.wf01.wt01(timestamp,status) VALUES(NOW(), false) +Eg. DELETE FROM root.ln.wf01.wt01.status, root.ln.wf01.wt01.temperature WHERE time < NOW() +Eg. SELECT * FROM root.** WHERE time < NOW() +Eg. SELECT COUNT(temperature) FROM root.ln.wf01.wt01 WHERE time < NOW() +``` + +* SUM + +``` +SELECT SUM(Path) (COMMA SUM(Path))* FROM [WHERE ]? +Eg. 
SELECT SUM(temperature) FROM root.ln.wf01.wt01 WHERE root.ln.wf01.wt01.temperature < 24 +Note: the statement needs to satisfy this constraint: + = +``` + +## 数据存活时间(TTL) + +IoTDB 支持对 device 级别设置数据存活时间(TTL),这使得 IoTDB 可以定期、自动地删除一定时间之前的数据。合理使用 TTL +可以帮助您控制 IoTDB 占用的总磁盘空间以避免出现磁盘写满等异常。并且,随着文件数量的增多,查询性能往往随之下降, +内存占用也会有所提高。及时地删除一些较老的文件有助于使查询性能维持在一个较高的水平和减少内存资源的占用。 + +TTL的默认单位为毫秒,如果配置文件中的时间精度修改为其他单位,设置ttl时仍然使用毫秒单位。 + +当设置 TTL 时,系统会根据设置的路径寻找所包含的所有 device,并为这些 device 设置 TTL 时间,系统会按设备粒度对过期数据进行删除。 +当设备数据过期后,将不能被查询到,但磁盘文件中的数据不能保证立即删除(会在一定时间内删除),但可以保证最终被删除。 +考虑到操作代价,系统不会立即物理删除超过 TTL 的数据,而是通过合并来延迟地物理删除。因此,在数据被物理删除前,如果调小或者解除 TTL,可能会导致之前因 TTL 而不可见的数据重新出现。 +系统中仅能设置至多 1000 条 TTL 规则,达到该上限时,需要先删除部分 TTL 规则才能设置新的规则 + +### TTL Path 规则 +设置的路径 path 只支持前缀路径(即路径中间不能带 \* , 且必须以 \*\* 结尾),该路径会匹配到设备,也允许用户指定不带星的 path 为具体的 database 或 device,当 path 不带 \* 时,会检查是否匹配到 database,若匹配到 database,则会同时设置 path 和 path.\*\*。 +注意:设备 TTL 设置不会对元数据的存在性进行校验,即允许对一条不存在的设备设置 TTL。 +``` +合格的 path: +root.** +root.db.** +root.db.group1.** +root.db +root.db.group1.d1 + +不合格的 path: +root.*.db +root.**.db.* +root.db.* +``` +### TTL 适用规则 +当一个设备适用多条TTL规则时,优先适用较精确和较长的规则。例如对于设备“root.bj.hd.dist001.turbine001”来说,规则“root.bj.hd.dist001.turbine001”比“root.bj.hd.dist001.\*\*”优先,而规则“root.bj.hd.dist001.\*\*”比“root.bj.hd.\*\*”优先; +### 设置 TTL +set ttl 操作可以理解为设置一条 TTL规则,比如 set ttl to root.sg.group1.\*\* 就相当于对所有可以匹配到该路径模式的设备挂载 ttl。 unset ttl 操作表示对相应路径模式卸载 TTL,若不存在对应 TTL,则不做任何事。若想把 TTL 调成无限大,则可以使用 INF 关键字 +设置 TTL 的 SQL 语句如下所示: +``` +set ttl to pathPattern 360000; +``` +pathPattern 是前缀路径,即路径中间不能带 \* 且必须以 \*\* 结尾。 +pathPattern 匹配对应的设备。为了兼容老版本 SQL 语法,允许用户输入的 pathPattern 匹配到 db,则自动将前缀路径扩展为 path.\*\*。 +例如,写set ttl to root.sg 360000 则会自动转化为set ttl to root.sg.\*\* 360000,转化后的语句对所有 root.sg 下的 device 设置TTL。 +但若写的 pathPattern 无法匹配到 db,则上述逻辑不会生效。 +如写set ttl to root.sg.group 360000 ,由于root.sg.group未匹配到 db,则不会被扩充为root.sg.group.\*\*。 也允许指定具体 device,不带 \*。 +### 取消 TTL + +取消 TTL 的 SQL 语句如下所示: + +``` +IoTDB> unset ttl from root.ln +``` + +取消设置 TTL 后, `root.ln` 路径下所有的数据都会被保存。 +``` +IoTDB> unset ttl from root.sgcc.** +``` + +取消设置`root.sgcc`路径下的所有的 TTL 。 +``` +IoTDB> unset ttl from root.** +``` + +取消设置所有的 TTL 。 + +新语法 +``` +IoTDB> unset ttl from root.** +``` + +旧语法 +``` +IoTDB> unset ttl to root.** +``` +新旧语法在功能上没有区别并且同时兼容,仅是新语法在用词上更符合常规。 +### 显示 TTL + +显示 TTL 的 SQL 语句如下所示: +show all ttl + +``` +IoTDB> SHOW ALL TTL ++--------------+--------+ +| path| TTL| +| root.**|55555555| +| root.sg2.a.**|44440000| ++--------------+--------+ +``` + +show ttl on pathPattern +``` +IoTDB> SHOW TTL ON root.db.**; ++--------------+--------+ +| path| TTL| +| root.db.**|55555555| +| root.db.a.**|44440000| ++--------------+--------+ +``` +SHOW ALL TTL 这个例子会给出所有的 TTL。 +SHOW TTL ON pathPattern 这个例子会显示指定路径的 TTL。 + +显示设备的 TTL。 +``` +IoTDB> show devices ++---------------+---------+---------+ +| Device|IsAligned| TTL| ++---------------+---------+---------+ +|root.sg.device1| false| 36000000| +|root.sg.device2| true| INF| ++---------------+---------+---------+ +``` +所有设备都一定会有 TTL,即不可能是 null。INF 表示无穷大。 + +* 删除时间分区 (实验性功能) + +``` +DELETE PARTITION StorageGroupName INT(COMMA INT)* +Eg DELETE PARTITION root.sg1 0,1,2 +该例子将删除 database root.sg1 的前三个时间分区 +``` +partitionId 可以通过查看数据文件夹获取,或者是计算 `timestamp / partitionInterval`得到。 + +## 中止查询 + +- 显示正在执行的查询列表 + +``` +SHOW QUERY PROCESSLIST +``` + +- 中止查询 + +``` +KILL QUERY INT? +E.g. KILL QUERY +E.g. 
KILL QUERY 2 +``` + + +## 设置系统为只读/可写入模式 + + +``` +IoTDB> SET SYSTEM TO READONLY +IoTDB> SET SYSTEM TO WRITABLE +``` + +## 标识符列表 + +``` +QUOTE := '\''; +DOT := '.'; +COLON : ':' ; +COMMA := ',' ; +SEMICOLON := ';' ; +LPAREN := '(' ; +RPAREN := ')' ; +LBRACKET := '['; +RBRACKET := ']'; +EQUAL := '=' | '=='; +NOTEQUAL := '<>' | '!='; +LESSTHANOREQUALTO := '<='; +LESSTHAN := '<'; +GREATERTHANOREQUALTO := '>='; +GREATERTHAN := '>'; +DIVIDE := '/'; +PLUS := '+'; +MINUS := '-'; +STAR := '*'; +Letter := 'a'..'z' | 'A'..'Z'; +HexDigit := 'a'..'f' | 'A'..'F'; +Digit := '0'..'9'; +Boolean := TRUE | FALSE | 0 | 1 (case insensitive) + +``` + +``` +StringLiteral := ( '\'' ( ~('\'') )* '\''; +eg. 'abc' +``` + +``` +Integer := ('-' | '+')? Digit+; +eg. 123 +eg. -222 +``` + +``` +Float := ('-' | '+')? Digit+ DOT Digit+ (('e' | 'E') ('-' | '+')? Digit+)?; +eg. 3.1415 +eg. 1.2E10 +eg. -1.33 +``` + +``` +Identifier := (Letter | '_') (Letter | Digit | '_' | MINUS)*; +eg. a123 +eg. _abc123 + +``` + +## 常量列表 + +``` +PointValue : Integer | Float | StringLiteral | Boolean +``` +TimeValue : Integer | DateTime | ISO8601 | NOW() +Note: Integer means timestamp type. + +DateTime : +eg. 2016-11-16T16:22:33+08:00 +eg. 2016-11-16 16:22:33+08:00 +eg. 2016-11-16T16:22:33.000+08:00 +eg. 2016-11-16 16:22:33.000+08:00 +Note: DateTime Type can support several types, see Chapter 3 Datetime section for details. +``` +PrecedenceEqualOperator : EQUAL | NOTEQUAL | LESSTHANOREQUALTO | LESSTHAN | GREATERTHANOREQUALTO | GREATERTHAN +``` +Timeseries : ROOT [DOT \]* DOT \ +LayerName : Identifier +SensorName : Identifier +eg. root.ln.wf01.wt01.status +eg. root.sgcc.wf03.wt01.temperature +Note: Timeseries must be start with `root`(case insensitive) and end with sensor name. +``` + +``` +PrefixPath : ROOT (DOT \)* +LayerName : Identifier | STAR +eg. root.sgcc +eg. root.* +``` +Path: (ROOT | ) (DOT )* +LayerName: Identifier | STAR +eg. root.ln.wf01.wt01.status +eg. root.*.wf01.wt01.status +eg. root.ln.wf01.wt01.* +eg. *.wt01.* +eg. * +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Schema-Template.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Schema-Template.md new file mode 100644 index 00000000..490f2910 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Schema-Template.md @@ -0,0 +1,125 @@ + + +# 元数据模板 + +## 问题背景 + +对于大量的同类型的实体,每一个实体下的物理量都相同,为每个序列注册时间序列一方面时间序列的元数据将占用较多的内存资源,另一方面,大量序列的维护工作也会十分复杂。 + +为了实现同类型不同实体的物理量元数据共享,减少元数据内存占用,同时简化同类型实体的管理,IoTDB引入元数据模板功能。 + +下图展示了一个燃油车场景的数据模型,各地区的多台燃油车的速度、油量、加速度、角速度四个物理量将会被采集,显然这些燃油车实体具备相同的物理量。 + +example without template + +## 概念定义 + +元数据模板(Schema template) + +实际应用中有许多实体所采集的物理量相同,即具有相同的工况名称和类型,可以声明一个**元数据模板**来定义可采集的物理量集合。 + +将元数据模版挂载在树形数据模式的任意节点上,表示该节点下的所有实体具有相同的物理量集合。 + +目前每一条路径节点仅允许挂载一个元数据模板,即当一个节点被挂载元数据模板后,它的祖先节点和后代节点都不能再挂载元数据模板。实体将使用其自身或祖先的元数据模板作为有效模板。 + +特别需要说明的是,挂载模板与使用模板的概念不同。一个节点挂载模板后,其所有后代节点都可以使用这个模板,因此可以通过向同类实体的祖先节点挂载模板来简化操作。当系统向挂载模板的节点(或其后代节点)插入模板中定义的物理量时,这个节点就被设置为“正在使用模板”。 + +使用元数据模板后,问题背景中示例的燃油车数据模型将会转变至下图所示的形式。所有的物理量元数据仅在模板中保存一份,所有的实体共享模板中的元数据。 + +example with template + +### 生命周期 + +了解元数据的生命周期及相关名词,有助于更顺畅地使用元数据模板。在描述这一概念时,有六个关键词分别指向特定的过程,分别是“创建”、“挂载”、“激活”、“解除”、“卸载”、和“删除”。下图展示了一个模板从创建、挂到删除的全部过程。当用户操作执行其中任一过程时,系统都会执行对应条件检查,如条件检查通过,则操作成功,否则,操作会被拒绝: + +1. 创建模板时,检查确认正在创建的模板名称与所有已存在的模板不重复; +2. 在某节点挂载模板,需检查该节点的所有祖先节点与子孙节点,确认均未挂载任何模板; +3. 在某节点激活模板,需检查确认该节点或其祖先已挂载对应模板,且该节点下不存在与模板中同名的物理量; +4. 在某节点解除模板时,需确认该节点已经激活了模板,请注意,解除模板会删除该节点通过模板实例化的物理量及其数据点; +5. 在某节点卸载模板时,需检查确认该节点曾挂载该模板,且其所有子孙节点均不处于模板激活状态; +6. 
删除模板时,需检查确认模板没有挂载在任何一个节点上。 + +最后需要补充的是,**对挂载模板与激活模板进行区分,是为了服务一种常见的场景**:在 Apache IoTDB 元数据模型 MTree 中,经常需要在数量众多的节点上“应用”元数据模板,而这些节点一般拥有共同的祖先节点。因此,可以在其共同祖先节点**挂载**模板,而不必对其大量的孩子节点进行挂载操作。对于需要“应用”模板的节点,则应该使用**激活模板**的操作。 + +example with template + +## 使用 + +目前,用户可以通过 Session 编程接口或 IoTDB-SQL 来使用元数据模板,包括模板的创建、修改、挂载与卸载等。Session 编程接口的详细文档可参见[此处](../API/Programming-Java-Native-API.md),IoTDB-SQL 的详细文档可参加[此处](../Operate-Metadata/Template.md)。下文将以 Session 中使用方法为例,进行简要介绍。 + + +* 创建元数据模板 + +在 Session 中创建元数据模板,可以通过先后创建 Template、MeasurementNode 的对象,构造模板内部物理量结构,并通过以下接口创建模板 + +```java +public void createSchemaTemplate(Template template); + +Class Template { + private String name; + private boolean directShareTime; + Map children; + public Template(String name, boolean isShareTime); + + public void addToTemplate(Node node); + public void deleteFromTemplate(String name); + public void setShareTime(boolean shareTime); +} + +Abstract Class Node { + private String name; + public void addChild(Node node); + public void deleteChild(Node node); +} + +Class MeasurementNode extends Node { + TSDataType dataType; + TSEncoding encoding; + CompressionType compressor; + public MeasurementNode(String name, + TSDataType dataType, + TSEncoding encoding, + CompressionType compressor); +} +``` + +* 构造元数据模板 + +构造上图中的元数据模板,并挂载到对应节点,可参考如下代码。**请注意,我们强烈建议您将模板设置在 database 或 database 下层的节点中,以更好地适配未来地更新及各模块的协作。** + +``` java +MeasurementNode nodeV = new MeasurementNode("velocity", TSDataType.FLOAT, TSEncoding.RLE, CompressionType.SNAPPY); +MeasurementNode nodeF = new MeasurementNode("fuel_amount", TSDataType.FLOAT, TSEncoding.RLE, CompressionType.SNAPPY); +MeasurementNode nodeA = new MeasurementNode("acceleration", TSDataType.DOUBLE, TSEncoding.GORILLA, CompressionType.SNAPPY); +MeasurementNode nodeAng = new MeasurementNode("angular_velocity", TSDataType.DOUBLE, TSEncoding.GORILLA, CompressionType.SNAPPY); + +Template template = new Template("template"); +template.addToTemplate(nodeV); +template.addToTemplate(nodeF); +template.addToTemplate(nodeA); +template.addToTemplate(nodeAng); + +createSchemaTemplate(template); +setSchemaTemplate("template", "root.Beijing"); +``` + +挂载元数据模板后,即可进行数据的写入。如按上述代码创建并挂载模板,并在 root.Beijing 路径上设置了 database 后,即可写入例如 root.Beijing.petro_vehicle.velocity 等时间序列数据,系统将自动创建 petro_vehicle 节点,并设置其“正在使用模板”,对写入数据应用模板中为 velocity 定义的元数据信息。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/ServerFileList.md b/src/zh/UserGuide/V2.0.1/Tree/stage/ServerFileList.md new file mode 100644 index 00000000..1b397ab4 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/ServerFileList.md @@ -0,0 +1,110 @@ + + +> 下面是 IoTDB 生成或使用的文件 +> +> 持续更新中。.. + +# 单机模式 + +## 配置文件 +> conf 目录下 +1. iotdb-system.properties +2. logback.xml +3. datanode-env.sh +4. jmx.access +5. jmx.password +6. iotdb-sync-client.properties + + 只有 Sync 工具会使用 + +> 在 basedir/system/schema 目录下 +1. system.properties + + 记录的是所有不能变动的配置,启动时会检查,防止系统错误 + +## 状态相关的文件 + +### 元数据相关文件 +> 在 basedir/system/schema 目录下 + +#### 元数据 +1. mlog.bin + + 记录的是元数据操作 +2. mtree-1.snapshot + + 元数据快照 +3. mtree-1.snapshot.tmp + + 临时文件,防止快照更新时,损坏旧快照文件 + +#### 标签和属性 +1. tlog.txt + + 存储每个时序的标签和属性 + + 默认情况下每个时序 700 字节 + +### 数据相关文件 +> 在 basedir/data/目录下 + +#### WAL +> 在 basedir/wal 目录下 +1. {StroageName}-{TsFileName}/wal1 + + 每个 memtable 会对应一个 wal 文件 + +#### TsFile +> 在 basedir/data/sequence or unsequence/{DatabaseName}/{DataRegionId}/{TimePartitionId}/目录下 +1. {time}-{version}-{mergeCnt}.tsfile + + 数据文件 +2. {TsFileName}.tsfile.mod + + 更新文件,主要记录删除操作 + +#### TsFileResource +1. 
{TsFileName}.tsfile.resource + + TsFile 的概要与索引文件 +2. {TsFileName}.tsfile.resource.temp + + 临时文件,用于避免更新 tsfile.resource 时损坏 tsfile.resource +3. {TsFileName}.tsfile.resource.closing + + 关闭标记文件,用于标记 TsFile 处于关闭状态,重启后可以据此选择是关闭或继续写入该文件 + +#### Version +> 在 basedir/system/databases/{DatabaseName}/{DataRegionId}/{TimePartitionId} or upgrade 目录下 +1. Version-{version} + + 版本号文件,使用文件名来记录当前最大的版本号 + +#### Upgrade +> 在 basedir/system/upgrade 目录下 +1. upgrade.txt + + 记录升级进度 + +#### Merge +> 在 basedir/system/databases/{DatabaseName}/目录下 +1. merge.mods + + 记录合并过程中发生的删除等操作 +2. merge.log + + 记录合并进展 +3. tsfile.merge + + 临时文件,每个顺序文件在合并时会产生一个对应的 merge 文件,用于存放临时数据 + +#### Authority +> 在 basedir/system/users/目录下是用户信息 +> 在 basedir/system/roles/目录下是角色信息 + +#### CompressRatio +> 在 basedir/system/compression_ration 目录下 +1. Ration-{compressionRatioSum}-{calTimes} + + 记录每个文件的压缩率 + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Detailed-Grammar.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Detailed-Grammar.md new file mode 100644 index 00000000..0d263596 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Detailed-Grammar.md @@ -0,0 +1,28 @@ + + +# 词法与文法详细定义 + +请阅读代码仓库中的词法和语法描述文件: + +词法文件:`antlr/src/main/antlr4/org/apache/iotdb/db/qp/sql/IoTDBSqlLexer.g4` + +语法文件:`antlr/src/main/antlr4/org/apache/iotdb/db/qp/sql/IoTDBSqlParser.g4` \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Identifier.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Identifier.md new file mode 100644 index 00000000..69edc327 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Identifier.md @@ -0,0 +1,142 @@ + + +# 标识符 + +## 使用场景 + +在 IoTDB 中,触发器名称、UDF函数名、元数据模板名称、用户与角色名、连续查询标识、Pipe、PipeSink、键值对中的键和值、别名等可以作为标识符。 + +## 约束 + +请注意,此处约束是标识符的通用约束,具体标识符可能还附带其它约束条件,如用户名限制字符数大于等于4,更严格的约束请参考具体标识符相关的说明文档。 + +**标识符命名有以下约束:** + +- 不使用反引号括起的标识符中,允许出现以下字符: + - [ 0-9 a-z A-Z _ ] (字母,数字,下划线) + - ['\u2E80'..'\u9FFF'] (UNICODE 中文字符) + +- 标识符允许使用数字开头、不使用反引号括起的标识符不能全部为数字。 + +- 标识符是大小写敏感的。 + +- 标识符允许为关键字。 + +**如果出现如下情况,标识符需要使用反引号进行引用:** + +- 标识符包含不允许的特殊字符。 +- 标识符为实数。 + +## 如何在反引号引起的标识符中使用引号 + +**在反引号引起的标识符中可以直接使用单引号和双引号。** + +**在用反引号引用的标识符中,可以通过双写反引号的方式使用反引号,即 ` 可以表示为 ``**,示例如下: + +```SQL +# 创建模板 t1`t +create schema template `t1``t` +(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) + +# 创建模板 t1't"t +create schema template `t1't"t` +(temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) +``` + +## 特殊情况示例 + +需要使用反引号进行引用的部分情况示例: + +- 触发器名称出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建触发器 alert.`listener-sg1d1s1 + CREATE TRIGGER `alert.``listener-sg1d1s1` + AFTER INSERT + ON root.sg1.d1.s1 + AS 'org.apache.iotdb.db.storageengine.trigger.example.AlertListener' + WITH ( + 'lo' = '0', + 'hi' = '100.0' + ) + ``` + +- UDF 名称出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建名为 111 的 UDF,111 为实数,所以需要用反引号引用。 + CREATE FUNCTION `111` AS 'org.apache.iotdb.udf.UDTFExample' + ``` + +- 元数据模板名称出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建名为 111 的元数据模板,111 为实数,需要用反引号引用。 + create schema template `111` + (temperature FLOAT encoding=RLE, status BOOLEAN encoding=PLAIN compression=SNAPPY) + ``` + +- 用户名、角色名出现上述特殊情况时需使用反引号引用,同时无论是否使用反引号引用,用户名、角色名中均不允许出现空格,具体请参考权限管理章节中的说明。 + + ```sql + # 创建用户 special`user. 
+ CREATE USER `special``user.` 'write_pwd' + + # 创建角色 111 + CREATE ROLE `111` + ``` + +- 连续查询标识出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建连续查询 test.cq + CREATE CONTINUOUS QUERY `test.cq` + BEGIN + SELECT max_value(temperature) + INTO temperature_max + FROM root.ln.*.* + GROUP BY time(10s) + END + ``` + +- Pipe、PipeSink 名称出现上述特殊情况时需使用反引号引用: + + ```sql + # 创建 PipeSink test.*1 + CREATE PIPESINK `test.*1` AS IoTDB ('ip' = '输入你的IP') + + # 创建 Pipe test.*2 + CREATE PIPE `test.*2` TO `test.*1` FROM + (select ** from root WHERE time>=yyyy-mm-dd HH:MM:SS) WITH 'SyncDelOp' = 'true' + ``` + +- Select 子句中可以结果集中的值指定别名,别名可以被定义为字符串或者标识符,示例如下: + + ```sql + select s1 as temperature, s2 as speed from root.ln.wf01.wt01; + # 表头如下所示 + +-----------------------------+-----------+-----+ + | Time|temperature|speed| + +-----------------------------+-----------+-----+ + ``` + +- 用于表示键值对,键值对的键和值可以被定义成常量(包括字符串)或者标识符,具体请参考键值对章节。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/KeyValue-Pair.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/KeyValue-Pair.md new file mode 100644 index 00000000..604bd76e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/KeyValue-Pair.md @@ -0,0 +1,119 @@ + + +# 键值对 + +**键值对的键和值可以被定义为标识符或者常量。** + +下面将介绍键值对的使用场景。 + +- 触发器中表示触发器属性的键值对。参考示例语句中 WITH 后的属性键值对。 + +```SQL +# 以字符串形式表示键值对 +CREATE TRIGGER `alert-listener-sg1d1s1` +AFTER INSERT +ON root.sg1.d1.s1 +AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' +WITH ( + 'lo' = '0', + 'hi' = '100.0' +) + +# 以标识符和常量形式表示键值对 +CREATE TRIGGER `alert-listener-sg1d1s1` +AFTER INSERT +ON root.sg1.d1.s1 +AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' +WITH ( + lo = 0, + hi = 100.0 +) +``` + +- 时间序列中用于表示标签和属性的键值对。 + +```sql +# 创建时间序列时设定标签和属性,用字符串来表示键值对。 +CREATE timeseries root.turbine.d1.s1(temprature) +WITH datatype = FLOAT, encoding = RLE, compression = SNAPPY, 'max_point_number' = '5' +TAGS('tag1' = 'v1', 'tag2'= 'v2') ATTRIBUTES('attr1' = 'v1', 'attr2' = 'v2') + +# 创建时间序列时设定标签和属性,用标识符和常量来表示键值对。 +CREATE timeseries root.turbine.d1.s1(temprature) +WITH datatype = FLOAT, encoding = RLE, compression = SNAPPY, max_point_number = 5 +TAGS(tag1 = v1, tag2 = v2) ATTRIBUTES(attr1 = v1, attr2 = v2) +``` + +```sql +# 修改时间序列的标签和属性 +ALTER timeseries root.turbine.d1.s1 SET 'newTag1' = 'newV1', 'attr1' = 'newV1' + +ALTER timeseries root.turbine.d1.s1 SET newTag1 = newV1, attr1 = newV1 +``` + +```sql +# 修改标签名 +ALTER timeseries root.turbine.d1.s1 RENAME 'tag1' TO 'newTag1' + +ALTER timeseries root.turbine.d1.s1 RENAME tag1 TO newTag1 +``` + +```sql +# 插入别名、标签、属性 +ALTER timeseries root.turbine.d1.s1 UPSERT +ALIAS='newAlias' TAGS('tag2' = 'newV2', 'tag3' = 'v3') ATTRIBUTES('attr3' ='v3', 'attr4'='v4') + +ALTER timeseries root.turbine.d1.s1 UPSERT +ALIAS = newAlias TAGS(tag2 = newV2, tag3 = v3) ATTRIBUTES(attr3 = v3, attr4 = v4) +``` + +```sql +# 添加新的标签 +ALTER timeseries root.turbine.d1.s1 ADD TAGS 'tag3' = 'v3', 'tag4' = 'v4' + +ALTER timeseries root.turbine.d1.s1 ADD TAGS tag3 = v3, tag4 = v4 +``` + +```sql +# 添加新的属性 +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES 'attr3' = 'v3', 'attr4' = 'v4' + +ALTER timeseries root.turbine.d1.s1 ADD ATTRIBUTES attr3 = v3, attr4 = v4 +``` + +```sql +# 查询符合条件的时间序列信息 +SHOW timeseries root.ln.** WHRER 'unit' = 'c' + +SHOW timeseries root.ln.** WHRER unit = c +``` + +- 创建 Pipe 以及 PipeSink 时表示属性的键值对。 + +```SQL +# 创建 PipeSink 时表示属性 +CREATE PIPESINK my_iotdb AS IoTDB ('ip' = '输入你的IP') + +# 创建 Pipe 时在 WITH 子句中表示属性 +CREATE PIPE my_pipe TO my_iotdb FROM +(select ** from root 
WHERE time>=yyyy-mm-dd HH:MM:SS) WITH 'SyncDelOp' = 'true' +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Keywords-And-Reserved-Words.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Keywords-And-Reserved-Words.md new file mode 100644 index 00000000..8e613158 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Keywords-And-Reserved-Words.md @@ -0,0 +1,26 @@ + + +# 关键字 + +关键字是在 SQL 具有特定含义的词,可以作为标识符。保留字是关键字的一个子集,保留字不能用于标识符。 + +关于 IoTDB 的关键字列表,可以查看 [关键字](https://iotdb.apache.org/zh/UserGuide/Master/Reference/Keywords.html) 。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Literal-Values.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Literal-Values.md new file mode 100644 index 00000000..f2ad963b --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Literal-Values.md @@ -0,0 +1,151 @@ + + +# 语法约定 +## 字面值常量 + +该部分对 IoTDB 中支持的字面值常量进行说明,包括字符串常量、数值型常量、时间戳常量、布尔型常量和空值。 + +### 字符串常量 + +在 IoTDB 中,字符串是由**单引号(`'`)或双引号(`"`)字符括起来的字符序列**。示例如下: + +```Plain%20Text +'a string' +"another string" +``` + +#### 使用场景 + +- `INSERT` 或者 `SELECT` 中用于表达 `TEXT` 类型数据的场景。 + + ```SQL + # insert 示例 + insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') + insert into root.ln.wf02.wt02(timestamp,hardware) values(2, '\\') + + +-----------------------------+--------------------------+ + | Time|root.ln.wf02.wt02.hardware| + +-----------------------------+--------------------------+ + |1970-01-01T08:00:00.001+08:00| v1| + +-----------------------------+--------------------------+ + |1970-01-01T08:00:00.002+08:00| \\| + +-----------------------------+--------------------------+ + + # select 示例 + select code from root.sg1.d1 where code in ('string1', 'string2'); + ``` + +- `LOAD` / `REMOVE` / `SETTLE` 指令中的文件路径。 + + ```SQL + # load 示例 + LOAD 'examplePath' + + # remove 示例 + REMOVE 'examplePath' + + # SETTLE 示例 + SETTLE 'examplePath' + ``` + +- 用户密码。 + + ```SQL + # 示例,write_pwd 即为用户密码 + CREATE USER ln_write_user 'write_pwd' + ``` + +- 触发器和 UDF 中的类全类名,示例如下: + + ```SQL + # 触发器示例,AS 后使用字符串表示类全类名 + CREATE TRIGGER `alert-listener-sg1d1s1` + AFTER INSERT + ON root.sg1.d1.s1 + AS 'org.apache.iotdb.db.engine.trigger.example.AlertListener' + WITH ( + 'lo' = '0', + 'hi' = '100.0' + ) + + # UDF 示例,AS 后使用字符串表示类全类名 + CREATE FUNCTION example AS 'org.apache.iotdb.udf.UDTFExample' + ``` + +- Select 子句中可以为结果集中的值指定别名,别名可以被定义为字符串或者标识符,示例如下: + + ```SQL + select s1 as 'temperature', s2 as 'speed' from root.ln.wf01.wt01; + + # 表头如下所示 + +-----------------------------+-----------|-----+ + | Time|temperature|speed| + +-----------------------------+-----------|-----+ + ``` + +- 用于表示键值对,键值对的键和值可以被定义成常量(包括字符串)或者标识符,具体请参考键值对章节。 + +#### 如何在字符串内使用引号 + +- 在单引号引起的字符串内,双引号无需特殊处理。同理,在双引号引起的字符串内,单引号无需特殊处理。 +- 在单引号引起的字符串里,可以通过双写单引号来表示一个单引号,即单引号 ' 可以表示为 ''。 +- 在双引号引起的字符串里,可以通过双写双引号来表示一个双引号,即双引号 " 可以表示为 ""。 + +字符串内使用引号的示例如下: + +```Plain%20Text +'string' // string +'"string"' // "string" +'""string""' // ""string"" +'''string' // 'string + +"string" // string +"'string'" // 'string' +"''string''" // ''string'' +"""string" // "string +``` + +### 数值型常量 + +数值型常量包括整型和浮点型。 + +整型常量是一个数字序列。可以以 `+` 或 `-` 开头表示正负。例如:`1`, `-1`。 + +带有小数部分或由科学计数法表示的为浮点型常量,例如:`.1`, `3.14`, `-2.23`, `+1.70`, `1.2E3`, `1.2E-3`, `-1.2E3`, `-1.2E-3`。 + +在 IoTDB 中,`INT32` 和 `INT64` 表示整数类型(计算是准确的),`FLOAT` 和 `DOUBLE` 表示浮点数类型(计算是近似的)。 + +在浮点上下文中可以使用整数,它会被解释为等效的浮点数。 + +### 时间戳常量 + +时间戳是一个数据到来的时间点,在 IoTDB 中分为绝对时间戳和相对时间戳。详细信息可参考 
[数据类型文档](https://iotdb.apache.org/zh/UserGuide/Master/Data-Concept/Data-Type.html)。 + +特别地,`NOW()`表示语句开始执行时的服务端系统时间戳。 + +### 布尔型常量 + +布尔值常量 `TRUE` 和 `FALSE` 分别等价于 `1` 和 `0`,它们对大小写不敏感。 + +### 空值 + +`NULL`值表示没有数据。`NULL`对大小写不敏感。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/NodeName-In-Path.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/NodeName-In-Path.md new file mode 100644 index 00000000..2b03d056 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/NodeName-In-Path.md @@ -0,0 +1,120 @@ + + +# 路径结点名 + +路径结点名是特殊的标识符,其还可以是通配符 \* 或 \*\*。在创建时间序列时,各层级的路径结点名不能为通配符 \* 或 \*\*。在查询语句中,可以用通配符 \* 或 \*\* 来表示路径结点名,以匹配一层或多层路径。 + +## 通配符 + +`*`在路径中表示一层。例如`root.vehicle.*.sensor1`代表的是以`root.vehicle`为前缀,以`sensor1`为后缀,层次等于 4 层的路径。 + +`**`在路径中表示是(`*`)+,即为一层或多层`*`。例如`root.vehicle.device1.**`代表的是`root.vehicle.device1.*`, `root.vehicle.device1.*.*`, `root.vehicle.device1.*.*.*`等所有以`root.vehicle.device1`为前缀路径的大于等于 4 层的路径;`root.vehicle.**.sensor1`代表的是以`root.vehicle`为前缀,以`sensor1`为后缀,层次大于等于 4 层的路径。 + +由于通配符 * 在查询表达式中也可以表示乘法符号,下述例子用于帮助您区分两种情况: + +```SQL +# 创建时间序列 root.sg.`a*b` +create timeseries root.sg.`a*b` with datatype=FLOAT,encoding=PLAIN; +# 请注意,如标识符部分所述,a*b包含特殊字符,需要用``括起来使用 +# create timeseries root.sg.a*b with datatype=FLOAT,encoding=PLAIN 是错误用法 + +# 创建时间序列 root.sg.a +create timeseries root.sg.a with datatype=FLOAT,encoding=PLAIN; + +# 创建时间序列 root.sg.b +create timeseries root.sg.b with datatype=FLOAT,encoding=PLAIN; + +# 查询时间序列 root.sg.`a*b` +select `a*b` from root.sg +# 其结果集表头为 +|Time|root.sg.a*b| + +# 查询时间序列 root.sg.a 和 root.sg.b的乘积 +select a*b from root.sg +# 其结果集表头为 +|Time|root.sg.a * root.sg.b| +``` + +## 标识符 + +路径结点名不为通配符时,使用方法和标识符一致。**在 SQL 中需要使用反引号引用的路径结点,在结果集中也会用反引号引起。** + +需要使用反引号进行引用的部分特殊情况示例: + +- 创建时间序列时,如下情况需要使用反引号对特殊节点名进行引用: + +```SQL +# 路径结点名中包含特殊字符,时间序列各结点为["root","sg","www.`baidu.com"] +create timeseries root.sg.`www.``baidu.com`.a with datatype=FLOAT,encoding=PLAIN; + +# 路径结点名为实数 +create timeseries root.sg.`111` with datatype=FLOAT,encoding=PLAIN; +``` + +依次执行示例中语句后,执行 show timeseries,结果如下: + +```SQL ++---------------------------+-----+-------------+--------+--------+-----------+----+----------+ +| timeseries|alias|database|dataType|encoding|compression|tags|attributes| ++---------------------------+-----+-------------+--------+--------+-----------+----+----------+ +| root.sg.`111`.a| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| +|root.sg.`www.``baidu.com`.a| null| root.sg| FLOAT| PLAIN| SNAPPY|null| null| ++---------------------------+-----+-------------+--------+--------+-----------+----+----------+ +``` + +- 插入数据时,如下情况需要使用反引号对特殊节点名进行引用: + +```SQL +# 路径结点名中包含特殊字符 +insert into root.sg.`www.``baidu.com`(timestamp, a) values(1, 2); + +# 路径结点名为实数 +insert into root.sg(timestamp, `111`) values (1, 2); +``` + +- 查询数据时,如下情况需要使用反引号对特殊节点名进行引用: + +```SQL +# 路径结点名中包含特殊字符 +select a from root.sg.`www.``baidu.com`; + +# 路径结点名为实数 +select `111` from root.sg +``` + +结果集分别为: + +```SQL +# select a from root.sg.`www.``baidu.com` 结果集 ++-----------------------------+---------------------------+ +| Time|root.sg.`www.``baidu.com`.a| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 2.0| ++-----------------------------+---------------------------+ + +# select `111` from root.sg 结果集 ++-----------------------------+-------------+ +| Time|root.sg.`111`| ++-----------------------------+-------------+ +|1970-01-01T08:00:00.001+08:00| 2.0| ++-----------------------------+-------------+ +``` 
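+
+如果通过 Java 原生接口(Session)执行上述查询,SQL 字符串中的反引号同样需要原样保留。下面是一段简要的示意代码(假设已按上文创建 root.sg.\`a\*b\`、root.sg.a、root.sg.b 三条序列,并使用本地默认连接参数;具体包名与接口请以所用版本的 Session SDK 为准):
+
+```java
+import org.apache.iotdb.isession.SessionDataSet;
+import org.apache.iotdb.rpc.IoTDBConnectionException;
+import org.apache.iotdb.rpc.StatementExecutionException;
+import org.apache.iotdb.session.Session;
+
+public class NodeNameQueryExample {
+  public static void main(String[] args)
+      throws IoTDBConnectionException, StatementExecutionException {
+    Session session =
+        new Session.Builder().host("127.0.0.1").port(6667).username("root").password("root").build();
+    session.open();
+
+    // 查询时间序列 root.sg.`a*b`:反引号需原样写在 SQL 字符串中
+    SessionDataSet withBackQuote = session.executeQueryStatement("select `a*b` from root.sg");
+    System.out.println(withBackQuote.getColumnNames());
+
+    // 不加反引号时,a*b 被解析为 root.sg.a 与 root.sg.b 的乘积
+    SessionDataSet multiplication = session.executeQueryStatement("select a*b from root.sg");
+    System.out.println(multiplication.getColumnNames());
+
+    session.close();
+  }
+}
+```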
diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Session-And-TsFile-API.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Session-And-TsFile-API.md new file mode 100644 index 00000000..1e209599 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Syntax-Conventions/Session-And-TsFile-API.md @@ -0,0 +1,119 @@ + + +# Session And TsFile API + +在使用Session、TsFIle API时,如果您调用的方法需要以字符串形式传入物理量(measurement)、设备(device)、数据库(database)、路径(path)等参数,**请保证所传入字符串与使用 SQL 语句时的写法一致**,下面是一些帮助您理解的例子。具体代码示例可以参考:`example/session/src/main/java/org/apache/iotdb/SyntaxConventionRelatedExample.java` + +1. 以创建时间序列 createTimeseries 为例: + +```Java +public void createTimeseries( + String path, + TSDataType dataType, + TSEncoding encoding, + CompressionType compressor) + throws IoTDBConnectionException, StatementExecutionException; +``` + +如果您希望创建时间序列 root.sg.a,root.sg.\`a.\`\`"b\`,root.sg.\`111\`,您使用的 SQL 语句应该如下所示: + +```SQL +create timeseries root.sg.a with datatype=FLOAT,encoding=PLAIN,compressor=SNAPPY; + +# 路径结点名中包含特殊字符,时间序列各结点为["root","sg","a.`\"b"] +create timeseries root.sg.`a.``"b` with datatype=FLOAT,encoding=PLAIN,compressor=SNAPPY; + +# 路径结点名为实数 +create timeseries root.sg.`111` with datatype=FLOAT,encoding=PLAIN,compressor=SNAPPY; +``` + +您在调用 createTimeseries 方法时,应该按照如下方法赋值 path 字符串,保证 path 字符串内容与使用 SQL 时一致: + +```Java +// 时间序列 root.sg.a +String path = "root.sg.a"; + +// 时间序列 root.sg.`a.``"b` +String path = "root.sg.`a.``\"b`"; + +// 时间序列 root.sg.`111` +String path = "root.sg.`111`"; +``` + +2. 以插入数据 insertRecord 为例: + +```Java +public void insertRecord( + String deviceId, + long time, + List measurements, + List types, + Object... values) + throws IoTDBConnectionException, StatementExecutionException; +``` + +如果您希望向时间序列 root.sg.a,root.sg.\`a.\`\`"b\`,root.sg.\`111\`中插入数据,您使用的 SQL 语句应该如下所示: + +```SQL +insert into root.sg(timestamp, a, `a.``"b`, `111`) values (1, 2, 2, 2); +``` + +您在调用 insertRecord 方法时,应该按照如下方法赋值 deviceId 和 measurements: + +```Java +// deviceId 为 root.sg +String deviceId = "root.sg"; + +// measurements +String[] measurements = new String[]{"a", "`a.``\"b`", "`111`"}; +List measurementList = Arrays.asList(measurements); +``` + +3. 以查询数据 executeRawDataQuery 为例: + +```Java +public SessionDataSet executeRawDataQuery( + List paths, + long startTime, + long endTime) + throws StatementExecutionException, IoTDBConnectionException; +``` + +如果您希望查询时间序列 root.sg.a,root.sg.\`a.\`\`"b\`,root.sg.\`111\`的数据,您使用的 SQL 语句应该如下所示: + +```SQL +select a from root.sg + +# 路径结点名中包含特殊字符 +select `a.``"b` from root.sg; + +# 路径结点名为实数 +select `111` from root.sg +``` + +您在调用 executeRawDataQuery 方法时,应该按照如下方法赋值 paths: + +```Java +// paths +String[] paths = new String[]{"root.sg.a", "root.sg.`a.``\"b`", "root.sg.`111`"}; +List pathList = Arrays.asList(paths); +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/TSDB-Comparison.md b/src/zh/UserGuide/V2.0.1/Tree/stage/TSDB-Comparison.md new file mode 100644 index 00000000..4439d102 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/TSDB-Comparison.md @@ -0,0 +1,359 @@ + + +# 时间序列数据库比较 + +## Overview + +![TSDB Comparison](https://alioss.timecho.com/docs/img/github/119833923-182ffc00-bf32-11eb-8b3f-9f95d3729ad2.png) + +**表格外观启发自 [Andriy Zabavskyy: How to Select Time Series DB](https://towardsdatascience.com/how-to-select-time-series-db-123b0eb4ab82)* + +## 1. 
已知的时间序列数据库 + +随着时间序列数据变得越来越重要,一些开源的时间序列数据库(Time Series Databases,or TSDB)诞生了。 + +但是,它们中很少有专门为物联网(IoT)或者工业物联网(Industrial IoT,缩写 IIoT)场景开发的。 + +本文把 IoTDB 和下述三种类型的时间序列数据库进行了比较: + +- InfluxDB - 原生时间序列数据库 + + InfluxDB 是最流行的时间序列数据库之一。 + + 接口:InfluxQL and HTTP API + +- OpenTSDB 和 KairosDB - 基于 NoSQL 的时间序列数据库 + + 这两种数据库是相似的,但是 OpenTSDB 基于 HBase 而 KairosDB 基于 Cassandra。 + + 它们两个都提供 RESTful 风格的 API。 + + 接口:Restful API + +- TimescaleDB - 基于关系型数据库的时间序列数据库 + + 接口:SQL + +Prometheus 和 Druid 也因为时间序列数据管理而闻名,但是 Prometheus 聚焦在数据采集、可视化和报警,Druid 聚焦在 OLAP 负载的数据分析,因此本文省略了 Prometheus 和 Druid。 + +## 2. 比较 + +本文将从以下两个角度比较时间序列数据库:功能比较、性能比较。 + +### 2.1 功能比较 + +以下两节分别是时间序列数据库的基础功能比较(2.1.1)和高级功能比较(2.1.2)。 + +表格中符号的含义: + +- `++`:强大支持 +- `+`:支持 +- `+-`:支持但欠佳 +- `-`:不支持 +- `?`:未知 + +### 2.1.1 基础功能 + +| TSDB | IoTDB | InfluxDB | OpenTSDB | KairosDB | TimescaleDB | +| ----------------------------- | :---------------------: | :--------: | :--------: | :--------: | :---------: | +| *OpenSource* | **+** | + | + | **+** | + | +| *SQL\-like* | + | + | - | - | **++** | +| *Schema* | Tree\-based, tag\-based | tag\-based | tag\-based | tag\-based | Relational | +| *Writing out\-of\-order data* | + | + | + | + | + | +| *Schema\-less* | + | + | + | + | + | +| *Batch insertion* | + | + | + | + | + | +| *Time range filter* | + | + | + | + | + | +| *Order by time* | **++** | + | - | - | + | +| *Value filter* | + | + | - | - | + | +| *Downsampling* | **++** | + | + | + | + | +| *Fill* | **++** | + | + | - | + | +| *LIMIT* | + | + | + | + | + | +| *SLIMIT* | + | + | - | - | ? | +| *Latest value* | ++ | + | + | - | + | + +具体地: + +- *OpenSource*: + + - IoTDB 使用 Apache License 2.0。 + - InfluxDB 使用 MIT license。但是,**它的集群版本没有开源**。 + - OpenTSDB 使用 LGPL2.1,**和 Apache License 不兼容**。 + - KairosDB 使用 Apache License 2.0。 + - TimescaleDB 使用 Timescale License,对企业来说不是免费的。 + +- *SQL-like*: + + - IoTDB 和 InfluxDB 支持 SQL-like 语言。 + - OpenTSDB 和 KairosDB 只支持 Rest API。IoTDB 也支持 Rest API。 + - TimescaleDB 使用的是和 PostgreSQL 一样的 SQL。 + +- *Schema*: + + - IoTDB:IoTDB 提出了一种 [基于树的 schema](http://iotdb.apache.org/zh/UserGuide/Master/Data-Concept/Data-Model-and-Terminology.html)。这和其它时间序列数据库很不一样。这种 schema 有以下优点: + - 在许多工业场景里,设备管理是有层次的,而不是扁平的。因此我们认为基于树的 schema 比基于 tag-value 的 schema 更好。 + - 在许多现实应用中,tag 的名字是不变的。例如:风力发电机制造商总是用风机所在的国家、所属的风场以及在风场中的 ID 来标识一个风机,因此,一个 4 层高的树(“root.the-country-name.the-farm-name.the-id”)来表示就足矣。你不需要重复告诉 IoTDB”树的第二层是国家名”、“树的第三层是风场名“等等这种信息。 + - 这样的基于路径的时间序列 ID 定义还能够支持灵活的查询,例如:”root.\*.a.b.\*“,其中、*是一个通配符。 + - InfluxDB, KairosDB, OpenTSDB:使用基于 tag-value 的 schema。现在比较流行这种 schema。 + - TimescaleDB 使用关系表。 + +- *Order by time*: + + 对于时间序列数据库来说,Order by time 好像是一个琐碎的功能。但是当我们考虑另一个叫做”align by time“的功能时,事情就变得有趣起来。这就是为什么我们把 OpenTSDB 和 KairosDB 标记为”不支持“。事实上,所有时间序列数据库都支持单条时间序列的按时间戳排序。但是,OpenTSDB 和 KairosDB 不支持多条时间序列的按时间戳排序。 + + 下面考虑一个新的例子:这里有两条时间序列,一条是风场 1 中的风速,一条是风场 1 中的风机 1 产生的电能。如果我们想要研究风速和产生电能之间的关系,我们首先需要知道二者在相同时间戳下的值。也就是说,我们需要按照时间戳对齐这两条时间序列。因此,结果应该是: + + | 时间戳 | 风场 1 中的风速 | 风场 1 中的风机 1 产生的电能 | + | ------ | ------------- | ------------------------ | + | 1 | 5.0 | 13.1 | + | 2 | 6.0 | 13.3 | + | 3 | null | 13.1 | + + 或者: + + | 时间戳 | 时间序列名 | 值 | + | ------ | ------------------------ | ---- | + | 1 | 风场 1 中的风速 | 5.0 | + | 1 | 风场 1 中的风机 1 产生的电能 | 13.1 | + | 2 | 风场 1 中的风速 | 6.0 | + | 2 | 风场 1 中的风机 1 产生的电能 | 13.3 | + | 3 | 风场 1 中的风机 1 产生的电能 | 13.1 | + + 虽然第二个表格没有按照时间戳对齐两条时间序列,但是只需要逐行扫描数据就可以很容易地在客户端实现这个功能。 + + IoTDB 支持第一种表格格式(叫做 align by time),InfluxDB 支持第二种表格格式。 + +- *Downsampling*: + + Downsampling(降采样)用于改变时间序列的粒度,例如:从 10Hz 到 1Hz,或者每天 1 个点。 + + 和其他数据库不同的是,IoTDB 
能够实时降采样数据,而其它时间序列数据库在磁盘上序列化降采样数据。 + + 也就是说: + + - IoTDB 支持在任意时间对数据进行即席(ad-hoc)降采样。例如:一条 SQL 返回从 2020-04-27 08:00:00 开始的每 5 分钟采样 1 个点的降采样数据,另一条 SQL 返回从 2020-04-27 08:00:01 开始的每 5 分 10 秒采样 1 个点的降采样数据。 + + (InfluxDB 也支持即席降采样,但是性能似乎并不好。) + + - IoTDB 的降采样不占用磁盘。 + +- *Fill*: + + 有时候我们认为数据是按照某种固定的频率采集的,比如 1Hz(即每秒 1 个点)。但是通常我们会丢失一些数据点,可能由于网络不稳定、机器繁忙、机器宕机等等。在这些场景下,填充这些数据空洞是重要的。数据科学家可以因此避免很多所谓的”dirty work“比如数据清洗。 + + InfluxDB 和 OpenTSDB 只支持在 group by 语句里使用 fill,而 IoTDB 能支持给定一个特定的时间戳的 fill。此外,IoTDB 还支持多种填充策略。 + +- *Slimit*: + + Slimit 是指返回指定数量的 measurements(或者,InfluxDB 中的 fields)。 + + 例如:一个风机有 1000 个测点(风速、电压等等),使用 slimit 和 soffset 可以只返回其中的一部分测点。 + +- *Latest value*: + + 最基础的时间序列应用之一是监视最新数据。因此,返回一条时间序列的最新点是非常重要的查询功能。 + + IoTDB 和 OpenTSDB 使用一个特殊的 SQL 或 API 来支持这个功能,而 InfluxDB 使用聚合函数来支持。 + + IoTDB 提供一个特殊的 SQL 的原因是 IoTDB 专门优化了查询。 + +**结论:** + +通过对基础功能的比较,我们可以发现: + +- OpenTSDB 和 KairosDB 缺少一些重要的查询功能。 +- TimescaleDB 不能被企业免费使用。 +- IoTDB 和 InfluxDB 可以满足时间序列数据管理的大部分需求,同时它俩之间有一些不同之处。 + +### 2.1.2 高级功能 + +| TSDB | IoTDB | InfluxDB | OpenTSDB | KairosDB | TimescaleDB | +| ---------------------------- | :----: | :------: | :------: | :------: |:-----------:| +| *Align by time* | **++** | + | - | - | + | +| *Compression* | **++** | +- | +- | +- | +- | +| *MQTT support* | **++** | + | - | - | +- | +| *Run on Edge-side Device* | **++** | + | - | +- | + | +| *Multi\-instance Sync* | **++** | - | - | - | - | +| *JDBC Driver* | **+** | - | - | - | ++ | +| *Standard SQL* | + | - | - | - | **++** | +| *Spark integration* | **++** | - | - | - | - | +| *Hive integration* | **++** | - | - | - | - | +| *Writing data to NFS (HDFS)* | **++** | - | + | - | - | +| *Flink integration* | **++** | - | - | - | - | + +具体地: + +- *Align by time*:上文已经介绍过,这里不再赘述。 + +- *Compression*: + + - IoTDB 支持许多时间序列编码和压缩方法,比如 RLE, 2DIFF, Gorilla 等等,以及 Snappy 压缩。在 IoTDB 里,你可以根据数据分布选择你想要的编码方法。更多信息参考 [这里](http://iotdb.apache.org/UserGuide/Master/Data-Concept/Encoding.html)。 + - InfluxDB 也支持编码和压缩,但是你不能定义你想要的编码方法,编码只取决于数据类型。更多信息参考 [这里](https://docs.influxdata.com/influxdb/v1.7/concepts/storage_engine/)。 + - OpenTSDB 和 KairosDB 在后端使用 HBase 和 Cassandra,并且没有针对时间序列的特殊编码。 + +- *MQTT protocol support*: + + MQTT protocol 是一个被工业用户广泛知晓的国际标准。只有 IoTDB 和 InfluxDB 支持用户使用 MQTT 客户端来写数据。 + +- *Running on Edge-side Device*: + + 现在,边缘计算变得越来越重要,边缘设备有越来越强大的计算资源。 + + 在边缘侧部署时间序列数据库,对于管理边缘侧数据、服务于边缘计算来说,是有用的。 + + 由于 OpenTSDB 和 KairosDB 依赖另外的数据库,它们的体系结构是臃肿的。特别是很难在边缘侧运行 Hadoop。 + +- *Multi-instance Sync*: + + 现在假设我们在边缘侧有许多时间序列数据库实例,考虑如何把它们的数据上传到数据中心去形成一个数据湖。 + + 一个解决方法是从这些实例读取数据,然后逐点写入到数据中心。 + + IoTDB 提供了另一个选项:把数据文件增量上传到数据中心,然后数据中心可以支持在数据上的服务。 + +- *JDBC driver*: + + 现在只有 IoTDB 支持了 JDBC driver(虽然不是所有接口都实现),这使得 IoTDB 可以整合许多其它的基于 JDBC driver 的软件。 + + +- *Spark and Hive integration*: + + 让大数据分析软件访问数据库中的数据来完成复杂数据分析是非常重要的。 + + IoTDB 支持 Hive-connector 和 Spark-connector 来完成更好的整合。 + +- *Writing data to NFS (HDFS)*: + + Sharing nothing 的体系结构是好的,但是有时候你不得不增加新的服务器,即便你的 CPU 和内存都是空闲的而磁盘已经满了。 + + 此外,如果我们能直接把数据文件存储到 HDFS 中,用 Spark 和其它软件来分析数据将会更加简单,不需要 ETL。 + + - IoTDB 支持往本地或者 HDFS 写数据。IoTDB 还允许用户扩展实现在其它 NFS 上存储数据。 + - InfluxDB 和 KairosDB 只能往本地写数据。 + - OpenTSDB 只能往 HDFS 写数据。 + +**结论:** + +IoTDB 拥有许多其它时间序列数据库不支持的强大功能。 + +## 2.2 性能比较 + +如果你觉得:”如果我只需要基础功能的话,IoTDB 好像和其它的时间序列数据库没有什么不同。“ + +这好像是有道理的。但是如果考虑性能的话,你也许会改变你的想法。 + +### 2.2.1 快速浏览 + +| TSDB | IoTDB | InfluxDB | KairosDB | TimescaleDB | +| -------------------- | :---: | :------: | :------: | :---------: | +| *Scalable Writes* | ++ | + | + | + | +| *Raw Data Query* | ++ | + | + | + | +| *Aggregation Query* | ++ | + | + | + | +| *Downsampling 
Query* | ++ | + | +- | +- | +| *Latest Query* | ++ | + | +- | + | + +#### 写入性能 + +我们从两个方面来测试写性能:batch size 和 client num。存储组的数量是 10。有 1000 个设备,每个设备有 100 个传感器,也就是说一共有 100K 条时间序列。 + +测试使用的 IoTDB 版本是`v0.11.1`。 + +* 改变 batch size + +10 个客户端并发地写数据。IoTDB 使用 batch insertion API,batch size 从 1ms 到 1min 变化(每次调用 write API 写 N 个数据点)。 + +写入吞吐率(points/second)如下图所示: + +Batch Size with Write Throughput (points/second) + +
Figure 1. Batch Size with Write throughput (points/second) IoTDB v0.11.1
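+
+上文提到的 batch insertion API 指按 Tablet 批量写入的 Session 接口。下面给出一段简要的 Java 示意代码,仅用于说明“每次调用 write API 写 N 个数据点”的含义,并非基准测试所用代码;其中设备 root.test.d1、物理量 s1 以及本地默认连接参数均为假设,具体接口细节请以所用版本的 Session SDK 为准:
+
+```java
+import java.util.Collections;
+import java.util.List;
+
+import org.apache.iotdb.session.Session;
+import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
+import org.apache.iotdb.tsfile.write.record.Tablet;
+import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;
+
+public class BatchInsertionExample {
+  public static void main(String[] args) throws Exception {
+    Session session =
+        new Session.Builder().host("127.0.0.1").port(6667).username("root").password("root").build();
+    session.open();
+
+    // 一个 Tablet 对应一个设备,这里假设设备为 root.test.d1,物理量为 s1
+    List<MeasurementSchema> schemas =
+        Collections.singletonList(new MeasurementSchema("s1", TSDataType.DOUBLE));
+    // 第三个参数为一个批次可容纳的行数,对应文中的 batch size(此处取 100)
+    Tablet tablet = new Tablet("root.test.d1", schemas, 100);
+
+    // 先在 Tablet 中攒够一个批次的数据点
+    for (long ts = 0; ts < 100; ts++) {
+      int row = tablet.rowSize++;
+      tablet.addTimestamp(row, ts);
+      tablet.addValue("s1", row, ts * 1.0);
+    }
+
+    // 一次调用写入整个批次
+    session.insertTablet(tablet);
+
+    session.close();
+  }
+}
+```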
+ +* 改变 client num + +client num 从 1 到 50 变化。IoTDB 使用 batch insertion API,batch size 是 100(每次调用 write API 写 100 个数据点)。 + +写入吞吐率(points/second)如下图所示: + +![Client Num with Write Throughput (points/second) (ms)](https://alioss.timecho.com/docs/img/github/106251411-e5aa1700-624f-11eb-8ca8-00c0627b1e96.png) + +
Figure 3. Client Num with Write Throughput (points/second) IoTDB v0.11.1
+ +#### 查询性能 + +10 个客户端并发地读数据。存储组的数量是 10。有 10 个设备,每个设备有 10 个传感器,也就是说一共有 100 条时间序列。 + +数据类型是* double*,编码类型是* GORILLA*。 + +测试使用的 IoTDB 版本是`v0.11.1`。 + +测试结果如下图所示: + +![Raw data query 1 col](https://alioss.timecho.com/docs/img/github/106251377-daef8200-624f-11eb-9678-b1d5440be2de.png) + +
Figure 4. Raw data query 1 col time cost(ms) IoTDB v0.11.1
+ +![Aggregation query](https://alioss.timecho.com/docs/img/github/106251336-cf03c000-624f-11eb-8395-de5e349f47b5.png) + +
Figure 5. Aggregation query time cost(ms) IoTDB v0.11.1
+ +![Downsampling query](https://alioss.timecho.com/docs/img/github/106251353-d32fdd80-624f-11eb-80c1-fdb4197939fe.png) + +
Figure 6. Downsampling query time cost(ms) IoTDB v0.11.1
+ +![Latest query](https://alioss.timecho.com/docs/img/github/106251369-d7f49180-624f-11eb-9d19-fc7341582b90.png) + +
Figure 7. Latest query time cost(ms) IoTDB v0.11.1
+ +可以看到,IoTDB 的 raw data query、aggregation query、downsampling query、latest query 查询性能表现都超越了其它数据库。 + +### 2.2.2 更多细节 + +我们提供了一个 benchmark 工具,叫做 [IoTDB-benchamrk](https://github.com/thulab/iotdb-benchmark)(你可以用 dev branch 来编译它)。它支持 IoTDB, InfluxDB, KairosDB, TimescaleDB, OpenTSDB。 + +我们有一篇文章关于用这个 benchmark 工具比较这些时间序列数据库:[Benchmarking Time Series Databases with IoTDB-Benchmark for IoT Scenarios](https://arxiv.org/abs/1901.08304)。我们发表这个文章的时候,IoTDB 才刚刚加入 Apache incubator,所以我们在那篇文章里删去了 IoTDB 的性能测试。但是在比较之后,一些结果展示在这里: + +- 对于 InfluxDB,我们把 cache-max-memory-size 和 max-series-perbase 设置成 unlimited(否则它很快就会超时)。 +- 对于 KairosDB,我们把 Cassandra 的 read_repair_chance 设置为 0.1(但是这没有什么影响,因为我们只有一个结点)。 +- 对于 TimescaleDB,我们用 PGTune 工具来优化 PostgreSQL。 + +所有的时间序列数据库运行的机器配置是:Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz, (8 cores 16 threads), 32GB memory, 256G SSD and 10T HDD, OS: Ubuntu 16.04.7 LTS, 64bits. + +所有的客户端运行的机器配置是:Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz,(6 cores 12 threads), 16GB memory, 256G SSD, OS: Ubuntu 16.04.7 LTS, 64bits. + +## 3. 结论 + +从以上所有实验中,我们可以看到 IoTDB 的性能大大优于其他数据库。 + +IoTDB 具有最小的写入延迟。批处理大小越大,IoTDB 的写入吞吐量就越高。这表明 IoTDB 最适合批处理数据写入方案。 + +在高并发方案中,IoTDB 也可以保持吞吐量的稳定增长。 (每秒 1200 万个点可能已达到千兆网卡的限制) + +在原始数据查询中,随着查询范围的扩大,IoTDB 的优势开始显现。因为数据块的粒度更大,列式存储的优势体现出来,所以基于列的压缩和列迭代器都将加速查询。 + +在聚合查询中,我们使用文件层的统计信息并缓存统计信息。因此,多个查询仅需要执行内存计算(不需要遍历原始数据点,也不需要访问磁盘),因此聚合性能优势显而易见。 + +降采样查询场景更加有趣,因为时间分区越来越大,IoTDB 的查询性能逐渐提高。它可能上升了两倍,这对应于 2 个粒度(3 小时和 4.5 天)的预先计算的信息。因此,分别加快了 1 天和 1 周范围内的查询。其他数据库仅上升一次,表明它们只有一个粒度统计。 + +如果您正在为您的 IIoT 应用程序考虑使用 TSDB,那么新的时间序列数据库 Apache IoTDB 是您的最佳选择。 + +发布新版本并完成实验后,我们将更新此页面。 + +我们也欢迎更多的贡献者更正本文,为 IoTDB 做出贡献或复现实验。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Time-Partition.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Time-Partition.md new file mode 100644 index 00000000..f45f60db --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Time-Partition.md @@ -0,0 +1,53 @@ + + +# 时间分区 + +## 主要功能 + +时间分区按照时间分割数据,一个时间分区用于保存某个时间范围内的所有数据。时间分区编号使用自然数表示,0 表示 1970 年 1 月 1 日,每隔 partition_interval 毫秒后加一。数据通过计算 timestamp / partition_interval 得到自己所在的时间分区编号,主要配置项如下所示: + +* time\_partition\_interval + +|名字| time\_partition\_interval | + |:---------------------------------------------------:|:----------------------------------------| +|描述| Database 分区的时间段长度,用户指定的 database 下会使用该时间段进行分区,单位:毫秒 | +|类型| Int64 | +|默认值| 604800000 | +|改后生效方式| 仅允许在第一次启动服务前修改 | + +## 配置示例 + +开启时间分区功能,并设置 partition_interval 为 86400000(一天),则数据的分布情况如下图所示: + +time partition example + +* 插入一条时间戳为 0 的数据,计算 0 / 86400000 = 0,则该数据会被存储到 0 号文件夹下的TsFile中 + +* 插入一条时间戳为 1609459200010 的数据,计算 1609459200010 / 86400000 = 18628,则该数据会被存储到 18628 号文件夹下的TsFile中 + +## 使用建议 + +使用时间分区功能时,建议同时打开 Memtable 的定时刷盘功能,共 6 个相关配置参数(详情见 [timed_flush配置项](../Reference/DataNode-Config-Manual.md))。 + +* enable_timed_flush_unseq_memtable: 是否开启乱序 Memtable 的定时刷盘,默认打开。 + +* enable_timed_flush_seq_memtable: 是否开启顺序 Memtable 的定时刷盘,默认关闭。应当在开启时间分区后打开,定时刷盘非活跃时间分区下的 Memtable,为定时关闭 TsFileProcessor 作准备。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Time-zone.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Time-zone.md new file mode 100644 index 00000000..f6280a63 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Time-zone.md @@ -0,0 +1,90 @@ + + +# 时区 + +客户端连接 IoTDB 服务器时,可以指定该连接所要使用的时区。如果未指定,则**默认以客户端所在的时区作为连接的时区。** + +在 JDBC 和 Session 原生接口连接中均可以设置时区,使用方法如下: + +```java +JDBC: (IoTDBConnection) connection.setTimeZone("+08:00"); + +Session: session.setTimeZone("+08:00"); +``` + +在 CLI 命令行工具中,通过命令手动设置时区的方式为: + +```sql +SET time_zone=+08:00 +``` + +查看当前连接使用的时区的方法如下: + +```java +JDBC: 
(IoTDBConnection) connection.getTimeZone(); + +Session: session.getTimeZone(); +``` + +CLI 中的方法为: + +```sql +SHOW time_zone +``` + +## 时区使用场景 + +IoTDB 服务器只针对时间戳进行存储和处理,时区只用来与客户端进行交互,具体场景如下: + +1. 将客户端传来的日期格式的字符串转化为相应的时间戳。 + + 例如,执行写入 `insert into root.sg.d1(timestamp, s1) values(2021-07-01T08:00:00.000, 3.14)` + + 则 `2021-07-01T08:00:00.000`将会根据客户端所在的时区转换为相应的时间戳,如果在东八区,则会转化为`1625097600000` ,等价于 0 时区 `2021-07-01T00:00:00.000` 的时间戳值。 + + > Note:同一时刻,不同时区的日期不同,但时间戳相同。 + + + +2. 将服务器返回给客户端结果中包含的时间戳转化为日期格式的字符串。 + + 以上述情况为例,执行查询 `select * from root.sg.d1`,则服务器会返回 (1625097600000, 3.14) 的时间戳值对,如果使用 CLI 命令行客户端,则 `1625097600000` 又会被根据时区转化为日期格式的字符串,如下图所示: + + ``` + +-----------------------------+-------------+ + | Time|root.sg.d1.s1| + +-----------------------------+-------------+ + |2021-07-01T08:00:00.000+08:00| 3.14| + +-----------------------------+-------------+ + ``` + + 而如果在 0 时区的客户端执行查询,则显示结果将是: + + ``` + +-----------------------------+-------------+ + | Time|root.sg.d1.s1| + +-----------------------------+-------------+ + |2021-07-01T00:00:00.000+00:00| 3.14| + +-----------------------------+-------------+ + ``` + + 注意,此时返回的时间戳是相同的,只是不同时区的日期不同。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Configuration-Parameters.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Configuration-Parameters.md new file mode 100644 index 00000000..4e47a6a2 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Configuration-Parameters.md @@ -0,0 +1,29 @@ + + + + +# 配置参数 + +| 配置项 | 含义 | +| ------------------------------------------------- | ---------------------------------------------- | +| *trigger_lib_dir* | 保存触发器 jar 包的目录位置 | +| *stateful\_trigger\_retry\_num\_when\_not\_found* | 有状态触发器触发无法找到触发器实例时的重试次数 | \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Implement-Trigger.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Implement-Trigger.md new file mode 100644 index 00000000..a509e499 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Implement-Trigger.md @@ -0,0 +1,297 @@ + + + + +# 编写触发器 + +## 触发器依赖 + +触发器的逻辑需要您编写 Java 类进行实现。 +在编写触发器逻辑时,需要使用到下面展示的依赖。如果您使用 [Maven](http://search.maven.org/),则可以直接从 [Maven 库](http://search.maven.org/)中搜索到它们。请注意选择和目标服务器版本相同的依赖版本。 + +``` xml + + org.apache.iotdb + iotdb-server + 1.0.0 + provided + +``` + +## 接口说明 + +编写一个触发器需要实现 `org.apache.iotdb.trigger.api.Trigger` 类。 + +```java +import org.apache.iotdb.trigger.api.enums.FailureStrategy; +import org.apache.iotdb.tsfile.write.record.Tablet; + +public interface Trigger { + + /** + * This method is mainly used to validate {@link TriggerAttributes} before calling {@link + * Trigger#onCreate(TriggerAttributes)}. + * + * @param attributes TriggerAttributes + * @throws Exception e + */ + default void validate(TriggerAttributes attributes) throws Exception {} + + /** + * This method will be called when creating a trigger after validation. + * + * @param attributes TriggerAttributes + * @throws Exception e + */ + default void onCreate(TriggerAttributes attributes) throws Exception {} + + /** + * This method will be called when dropping a trigger. + * + * @throws Exception e + */ + default void onDrop() throws Exception {} + + /** + * When restarting a DataNode, Triggers that have been registered will be restored and this method + * will be called during the process of restoring. 
+ * + * @throws Exception e + */ + default void restore() throws Exception {} + + /** + * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} + * is the default strategy. + * + * @return {@link FailureStrategy} + */ + default FailureStrategy getFailureStrategy() { + return FailureStrategy.OPTIMISTIC; + } + + /** + * @param tablet see {@link Tablet} for detailed information of data structure. Data that is + * inserted will be constructed as a Tablet and you can define process logic with {@link + * Tablet}. + * @return true if successfully fired + * @throws Exception e + */ + default boolean fire(Tablet tablet) throws Exception { + return true; + } +} +``` + +该类主要提供了两类编程接口:**生命周期相关接口**和**数据变动侦听相关接口**。该类中所有的接口都不是必须实现的,当您不实现它们时,它们不会对流经的数据操作产生任何响应。您可以根据实际需要,只实现其中若干接口。 + +下面是所有可供用户进行实现的接口的说明。 + +### 生命周期相关接口 + +| 接口定义 | 描述 | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| *default void validate(TriggerAttributes attributes) throws Exception {}* | 用户在使用 `CREATE TRIGGER` 语句创建触发器时,可以指定触发器需要使用的参数,该接口会用于验证参数正确性。 | +| *default void onCreate(TriggerAttributes attributes) throws Exception {}* | 当您使用`CREATE TRIGGER`语句创建触发器后,该接口会被调用一次。在每一个触发器实例的生命周期内,该接口会且仅会被调用一次。该接口主要有如下作用:帮助用户解析 SQL 语句中的自定义属性(使用`TriggerAttributes`)。 可以创建或申请资源,如建立外部链接、打开文件等。 | +| *default void onDrop() throws Exception {}* | 当您使用`DROP TRIGGER`语句删除触发器后,该接口会被调用。在每一个触发器实例的生命周期内,该接口会且仅会被调用一次。该接口主要有如下作用:可以进行资源释放的操作。可以用于持久化触发器计算的结果。 | +| *default void restore() throws Exception {}* | 当重启 DataNode 时,集群会恢复 DataNode 上已经注册的触发器实例,在此过程中会为该 DataNode 上的有状态触发器调用一次该接口。有状态触发器实例所在的 DataNode 宕机后,集群会在另一个可用 DataNode 上恢复该触发器的实例,在此过程中会调用一次该接口。该接口可以用于自定义恢复逻辑。 | + +### 数据变动侦听相关接口 + +#### 侦听接口 + +```java + /** + * @param tablet see {@link Tablet} for detailed information of data structure. Data that is + * inserted will be constructed as a Tablet and you can define process logic with {@link + * Tablet}. + * @return true if successfully fired + * @throws Exception e + */ + default boolean fire(Tablet tablet) throws Exception { + return true; + } +``` + +数据变动时,触发器以 Tablet 作为触发操作的单位。您可以通过 Tablet 获取相应序列的元数据和数据,然后进行相应的触发操作,触发成功则返回值应当为 true。该接口返回 false 或是抛出异常我们均认为触发失败。在触发失败时,我们会根据侦听策略接口进行相应的操作。 + +进行一次 INSERT 操作时,对于其中的每条时间序列,我们会检测是否有侦听该路径模式的触发器,然后将符合同一个触发器所侦听的路径模式的时间序列数据组装成一个新的 Tablet 用于触发器的 fire 接口。可以理解成: + +```java +Map> pathToTriggerListMap => Map +``` + +**请注意,目前我们不对触发器的触发顺序有任何保证。** + +下面是示例: + +假设有三个触发器,触发器的触发时机均为 BEFORE INSERT + +- 触发器 Trigger1 侦听路径模式:root.sg.* +- 触发器 Trigger2 侦听路径模式:root.sg.a +- 触发器 Trigger3 侦听路径模式:root.sg.b + +写入语句: + +```sql +insert into root.sg(time, a, b) values (1, 1, 1); +``` + +序列 root.sg.a 匹配 Trigger1 和 Trigger2,序列 root.sg.b 匹配 Trigger1 和 Trigger3,那么: + +- root.sg.a 和 root.sg.b 的数据会被组装成一个新的 tablet1,在相应的触发时机进行 Trigger1.fire(tablet1) +- root.sg.a 的数据会被组装成一个新的 tablet2,在相应的触发时机进行 Trigger2.fire(tablet2) +- root.sg.b 的数据会被组装成一个新的 tablet3,在相应的触发时机进行 Trigger3.fire(tablet3) + +#### 侦听策略接口 + +在触发器触发失败时,我们会根据侦听策略接口设置的策略进行相应的操作,您可以通过下述接口设置 `org.apache.iotdb.trigger.api.enums.FailureStrategy`,目前有乐观和悲观两种策略: + +- 乐观策略:触发失败的触发器不影响后续触发器的触发,也不影响写入流程,即我们不对触发失败涉及的序列做额外处理,仅打日志记录失败,最后返回用户写入数据成功,但触发部分失败。 +- 悲观策略:失败触发器影响后续所有 Pipeline 的处理,即我们认为该 Trigger 触发失败会导致后续所有触发流程不再进行。如果该触发器的触发时机为 BEFORE INSERT,那么写入也不再进行,直接返回写入失败。 + +```java + /** + * Overrides this method to set the expected FailureStrategy, {@link FailureStrategy#OPTIMISTIC} + * is the default strategy. 
+ * + * @return {@link FailureStrategy} + */ + default FailureStrategy getFailureStrategy() { + return FailureStrategy.OPTIMISTIC; + } +``` + +您可以参考下图辅助理解,其中 Trigger1 配置采用乐观策略,Trigger2 配置采用悲观策略。Trigger1 和 Trigger2 的触发时机是 BEFORE INSERT,Trigger3 和 Trigger4 的触发时机是 AFTER INSERT。 正常执行流程如下: + + + + + + +## 示例 + +如果您使用 [Maven](http://search.maven.org/),可以参考我们编写的示例项目 trigger-example。您可以在 [这里](https://github.com/apache/iotdb/tree/master/example/trigger) 找到它。后续我们会加入更多的示例项目供您参考。 + +下面是其中一个示例项目的代码: + +```java +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.iotdb.trigger; + +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerConfiguration; +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerEvent; +import org.apache.iotdb.db.storageengine.trigger.sink.alertmanager.AlertManagerHandler; +import org.apache.iotdb.trigger.api.Trigger; +import org.apache.iotdb.trigger.api.TriggerAttributes; +import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; +import org.apache.iotdb.tsfile.write.record.Tablet; +import org.apache.iotdb.tsfile.write.schema.MeasurementSchema; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.HashMap; +import java.util.List; + +public class ClusterAlertingExample implements Trigger { + private static final Logger LOGGER = LoggerFactory.getLogger(ClusterAlertingExample.class); + + private final AlertManagerHandler alertManagerHandler = new AlertManagerHandler(); + + private final AlertManagerConfiguration alertManagerConfiguration = + new AlertManagerConfiguration("http://127.0.0.1:9093/api/v2/alerts"); + + private String alertname; + + private final HashMap labels = new HashMap<>(); + + private final HashMap annotations = new HashMap<>(); + + @Override + public void onCreate(TriggerAttributes attributes) throws Exception { + alertname = "alert_test"; + + labels.put("series", "root.ln.wf01.wt01.temperature"); + labels.put("value", ""); + labels.put("severity", ""); + + annotations.put("summary", "high temperature"); + annotations.put("description", "{{.alertname}}: {{.series}} is {{.value}}"); + + alertManagerHandler.open(alertManagerConfiguration); + } + + @Override + public void onDrop() throws IOException { + alertManagerHandler.close(); + } + + @Override + public boolean fire(Tablet tablet) throws Exception { + List measurementSchemaList = tablet.getSchemas(); + for (int i = 0, n = measurementSchemaList.size(); i < n; i++) { + if (measurementSchemaList.get(i).getType().equals(TSDataType.DOUBLE)) { + // for example, we only deal with the columns of Double type + double[] values = (double[]) tablet.values[i]; + for (double value : values) { + if (value > 100.0) { + LOGGER.info("trigger 
value > 100"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "critical"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } else if (value > 50.0) { + LOGGER.info("trigger value > 50"); + labels.put("value", String.valueOf(value)); + labels.put("severity", "warning"); + AlertManagerEvent alertManagerEvent = + new AlertManagerEvent(alertname, labels, annotations); + alertManagerHandler.onEvent(alertManagerEvent); + } + } + } + } + return true; + } +} +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Instructions.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Instructions.md new file mode 100644 index 00000000..4774619e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Instructions.md @@ -0,0 +1,44 @@ + + +# 使用说明 + +触发器提供了一种侦听序列数据变动的机制。配合用户自定义逻辑,可完成告警、数据转发等功能。 + +触发器基于 Java 反射机制实现。用户通过简单实现 Java 接口,即可实现数据侦听。IoTDB 允许用户动态注册、卸载触发器,在注册、卸载期间,无需启停服务器。 + +## 侦听模式 + +IoTDB 的单个触发器可用于侦听符合特定模式的时间序列的数据变动,如时间序列 root.sg.a 上的数据变动,或者符合路径模式 root.**.a 的时间序列上的数据变动。您在注册触发器时可以通过 SQL 语句指定触发器侦听的路径模式。 + +## 触发器类型 + +目前触发器分为两类,您在注册触发器时可以通过 SQL 语句指定类型: + +- 有状态的触发器。该类触发器的执行逻辑可能依赖前后的多条数据,框架会将不同节点写入的数据汇总到同一个触发器实例进行计算,来保留上下文信息,通常用于采样或者统计一段时间的数据聚合信息。集群中只有一个节点持有有状态触发器的实例。 +- 无状态的触发器。触发器的执行逻辑只和当前输入的数据有关,框架无需将不同节点的数据汇总到同一个触发器实例中,通常用于单行数据的计算和异常检测等。集群中每个节点均持有无状态触发器的实例。 + +## 触发时机 + +触发器的触发时机目前有两种,后续会拓展其它触发时机。您在注册触发器时可以通过 SQL 语句指定触发时机: + +- BEFORE INSERT,即在数据持久化之前触发。请注意,目前触发器并不支持数据清洗,不会对要持久化的数据本身进行变动。 +- AFTER INSERT,即在数据持久化之后触发。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Notes.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Notes.md new file mode 100644 index 00000000..153fb16e --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Notes.md @@ -0,0 +1,33 @@ + + + + +# 重要注意事项 + +- 触发器从注册时开始生效,不对已有的历史数据进行处理。**即只有成功注册触发器之后发生的写入请求才会被触发器侦听到。** +- 触发器目前采用**同步触发**,所以编写时需要保证触发器效率,否则可能会大幅影响写入性能。**您需要自己保证触发器内部的并发安全性**。 +- 集群中**不能注册过多触发器**。因为触发器信息全量保存在 ConfigNode 中,并且在所有 DataNode 都有一份该信息的副本。 +- **建议注册触发器时停止写入**。注册触发器并不是一个原子操作,注册触发器时,会出现集群内部分节点已经注册了该触发器,部分节点尚未注册成功的中间状态。为了避免部分节点上的写入请求被触发器侦听到,部分节点上没有被侦听到的情况,我们建议注册触发器时不要执行写入。 +- 触发器将作为进程内程序执行,如果您的触发器编写不慎,内存占用过多,由于 IoTDB 并没有办法监控触发器所使用的内存,所以有 OOM 的风险。 +- 持有有状态触发器实例的节点宕机时,我们会尝试在另外的节点上恢复相应实例,在恢复过程中我们会调用一次触发器类的 restore 接口,您可以在该接口中实现恢复触发器所维护的状态的逻辑。 +- 触发器 JAR 包有大小限制,必须小于 min(`config_node_ratis_log_appender_buffer_size_max`, 2G),其中 `config_node_ratis_log_appender_buffer_size_max` 是一个配置项,具体含义可以参考 IOTDB 配置项说明。 +- **不同的 JAR 包中最好不要有全类名相同但功能实现不一样的类**。例如:触发器 trigger1、trigger2 分别对应资源 trigger1.jar、trigger2.jar。如果两个 JAR 包里都包含一个 `org.apache.iotdb.trigger.example.AlertListener` 类,当 `CREATE TRIGGER` 使用到这个类时,系统会随机加载其中一个 JAR 包中的类,最终导致触发器执行行为不一致以及其他的问题。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Trigger-Management.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Trigger-Management.md new file mode 100644 index 00000000..d0faa3ed --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Trigger/Trigger-Management.md @@ -0,0 +1,152 @@ + + + + +# 管理触发器 + +您可以通过 SQL 语句注册和卸载一个触发器实例,您也可以通过 SQL 语句查询到所有已经注册的触发器。 + +**我们建议您在注册触发器时停止写入。** + +## 注册触发器 + +触发器可以注册在任意路径模式上。被注册有触发器的序列将会被触发器侦听,当序列上有数据变动时,触发器中对应的触发方法将会被调用。 + +注册一个触发器可以按如下流程进行: + +1. 按照编写触发器章节的说明,实现一个完整的 Trigger 类,假定这个类的全类名为 `org.apache.iotdb.trigger.ClusterAlertingExample` +2. 将项目打成 JAR 包。 +3. 
使用 SQL 语句注册该触发器。注册过程中会仅只会调用一次触发器的 `validate` 和 `onCreate` 接口,具体请参考编写触发器章节。 + +完整 SQL 语法如下: + +```sql +// Create Trigger +createTrigger + : CREATE triggerType TRIGGER triggerName=identifier triggerEventClause ON pathPattern AS className=STRING_LITERAL uriClause? triggerAttributeClause? + ; + +triggerType + : STATELESS | STATEFUL + ; + +triggerEventClause + : (BEFORE | AFTER) INSERT + ; + +uriClause + : USING URI uri + ; + +uri + : STRING_LITERAL + ; + +triggerAttributeClause + : WITH LR_BRACKET triggerAttribute (COMMA triggerAttribute)* RR_BRACKET + ; + +triggerAttribute + : key=attributeKey operator_eq value=attributeValue + ; +``` + +下面对 SQL 语法进行说明,您可以结合使用说明章节进行理解: + +- triggerName:触发器 ID,该 ID 是全局唯一的,用于区分不同触发器,大小写敏感。 +- triggerType:触发器类型,分为无状态(STATELESS)和有状态(STATEFUL)两类。 +- triggerEventClause:触发时机,目前仅支持写入前(BEFORE INSERT)和写入后(AFTER INSERT)两种。 +- pathPattern:触发器侦听的路径模式,可以包含通配符 * 和 **。 +- className:触发器实现类的类名。 +- uriClause:可选项,当不指定该选项时,我们默认 DBA 已经在各个 DataNode 节点的 trigger_root_dir 目录(配置项,默认为 IOTDB_HOME/ext/trigger)下放置好创建该触发器需要的 JAR 包。当指定该选项时,我们会将该 URI 对应的文件资源下载并分发到各 DataNode 的 trigger_root_dir/install 目录下。 +- triggerAttributeClause:用于指定触发器实例创建时需要设置的参数,SQL 语法中该部分是可选项。 + +下面是一个帮助您理解的 SQL 语句示例: + +```sql +CREATE STATELESS TRIGGER triggerTest +BEFORE INSERT +ON root.sg.** +AS 'org.apache.iotdb.trigger.ClusterAlertingExample' +USING URI 'http://jar/ClusterAlertingExample.jar' +WITH ( + "name" = "trigger", + "limit" = "100" +) +``` + +上述 SQL 语句创建了一个名为 triggerTest 的触发器: + +- 该触发器是无状态的(STATELESS) +- 在写入前触发(BEFORE INSERT) +- 该触发器侦听路径模式为 root.sg.** +- 所编写的触发器类名为 org.apache.iotdb.trigger.ClusterAlertingExample +- JAR 包的 URI 为 http://jar/ClusterAlertingExample.jar +- 创建该触发器实例时会传入 name 和 limit 两个参数。 + +## 卸载触发器 + +可以通过指定触发器 ID 的方式卸载触发器,卸载触发器的过程中会且仅会调用一次触发器的 `onDrop` 接口。 + +卸载触发器的 SQL 语法如下: + +```sql +// Drop Trigger +dropTrigger + : DROP TRIGGER triggerName=identifier +; +``` + +下面是示例语句: + +```sql +DROP TRIGGER triggerTest1 +``` + +上述语句将会卸载 ID 为 triggerTest1 的触发器。 + +## 查询触发器 + +可以通过 SQL 语句查询集群中存在的触发器的信息。SQL 语法如下: + +```sql +SHOW TRIGGERS +``` + +该语句的结果集格式如下: + +| TriggerName | Event | Type | State | PathPattern | ClassName | NodeId | +| ------------ | ---------------------------- | -------------------- | ------------------------------------------- | ----------- | --------------------------------------- | --------------------------------------- | +| triggerTest1 | BEFORE_INSERT / AFTER_INSERT | STATELESS / STATEFUL | INACTIVE / ACTIVE / DROPPING / TRANSFFERING | root.** | org.apache.iotdb.trigger.TriggerExample | ALL(STATELESS) / DATA_NODE_ID(STATEFUL) | + + +## 触发器状态说明 + +在集群中注册以及卸载触发器的过程中,我们维护了触发器的状态,下面是对这些状态的说明: + +| 状态 | 描述 | 是否建议写入进行 | +| ------------ | ------------------------------------------------------------ | ---------------- | +| INACTIVE | 执行 `CREATE TRIGGER` 的中间状态,集群刚在 ConfigNode 上记录该触发器的信息,还未在任何 DataNode 上激活该触发器 | 否 | +| ACTIVE | 执行 `CREATE TRIGGE` 成功后的状态,集群所有 DataNode 上的该触发器都已经可用 | 是 | +| DROPPING | 执行 `DROP TRIGGER` 的中间状态,集群正处在卸载该触发器的过程中 | 否 | +| TRANSFERRING | 集群正在进行该触发器实例位置的迁移 | 否 | + diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/TsFile-Import-Export-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/TsFile-Import-Export-Tool.md new file mode 100644 index 00000000..3e83bafc --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/TsFile-Import-Export-Tool.md @@ -0,0 +1,427 @@ + + +# TsFile 导入导出脚本 + +针对于不同场景,IoTDB 为用户提供多种批量导入数据的操作方式,本章节向大家介绍最为常用的两种方式为 CSV文本形式的导入 和 TsFile文件形式的导入。 + +## TsFile 导入导出脚本 + +TsFile 是在 IoTDB 中使用的时间序列的文件格式,您可以通过CLI等工具直接将存有时间序列的一个或多个 TsFile 文件导入到另外一个正在运行的IoTDB实例中。 +### 
介绍 +加载外部 tsfile 文件工具允许用户向正在运行中的 Apache IoTDB 中加载 tsfile 文件。或者您也可以使用脚本的方式将tsfile加载进IoTDB。 + +### 使用 SQL 加载 +用户通过 Cli 工具或 JDBC 向 Apache IoTDB 系统发送指定命令实现文件加载的功能。 + +#### 加载 tsfile 文件 + +加载 tsfile 文件的指令为:`load '' [sglevel=int][verify=true/false][onSuccess=delete/none]` + +该指令有两种用法: + +1. 通过指定文件路径(绝对路径)加载单 tsfile 文件。 + +第一个参数表示待加载的 tsfile 文件的路径。load 命令有三个可选项,分别是 sglevel,值域为整数,verify,值域为 true/false,onSuccess,值域为delete/none。不同选项之间用空格隔开,选项之间无顺序要求。 + +SGLEVEL 选项,当 tsfile 对应的 database 不存在时,用户可以通过 sglevel 参数的值来制定 database 的级别,默认为`iotdb-system.properties`中设置的级别。例如当设置 level 参数为1时表明此 tsfile 中所有时间序列中层级为1的前缀路径是 database,即若存在设备 root.sg.d1.s1,此时 root.sg 被指定为 database。 + +VERIFY 选项表示是否对载入的 tsfile 中的所有时间序列进行元数据检查,默认为 true。开启时,若载入的 tsfile 中的时间序列在当前 iotdb 中也存在,则会比较该时间序列的所有 Measurement 的数据类型是否一致,如果出现不一致将会导致载入失败,关闭该选项会跳过检查,载入更快。 + +ONSUCCESS选项表示对于成功载入的tsfile的处置方式,默认为delete,即tsfile成功加载后将被删除,如果是none表明tsfile成功加载之后依然被保留在源文件夹。 + +若待加载的 tsfile 文件对应的`.resource`文件存在,会被一并加载至 Apache IoTDB 数据文件的目录和引擎中,否则将通过 tsfile 文件重新生成对应的`.resource`文件,即加载的 tsfile 文件所对应的`.resource`文件不是必要的。 + +示例: + +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile'` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' onSuccess=delete` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=true onSuccess=none` +* `load '/Users/Desktop/data/1575028885956-101-0.tsfile' verify=false sglevel=1 onSuccess=delete` + + +2. 通过指定文件夹路径(绝对路径)批量加载文件。 + +第一个参数表示待加载的 tsfile 文件夹的路径。选项意义与加载单个 tsfile 文件相同。 + +示例: + +* `load '/Users/Desktop/data'` +* `load '/Users/Desktop/data' verify=false` +* `load '/Users/Desktop/data' verify=true` +* `load '/Users/Desktop/data' verify=true sglevel=1` +* `load '/Users/Desktop/data' verify=false sglevel=1 onSuccess=delete` + +**注意**,如果`$IOTDB_HOME$/conf/iotdb-system.properties`中`enable_auto_create_schema=true`时会在加载tsfile的时候自动创建tsfile中的元数据,否则不会自动创建。 + +### 使用脚本加载 + +若您在Windows环境中,请运行`$IOTDB_HOME/tools/load-tsfile.bat`,若为Linux或Unix,请运行`load-tsfile.sh` + +```bash +./load-tsfile.bat -f filePath [-h host] [-p port] [-u username] [-pw password] [--sgLevel int] [--verify true/false] [--onSuccess none/delete] +-f 待加载的文件或文件夹路径,必要字段 +-h IoTDB的Host地址,可选,默认127.0.0.1 +-p IoTDB的端口,可选,默认6667 +-u IoTDb登录用户名,可选,默认root +-pw IoTDB登录密码,可选,默认root +--sgLevel 加载TsFile自动创建Database的路径层级,可选,默认值为iotdb-system.properties指定值 +--verify 是否对加载TsFile进行元数据校验,可选,默认为True +--onSuccess 对成功加载的TsFile的处理方法,可选,默认为delete,成功加载之后删除源TsFile,设为none时会 保留源TsFile +``` + +#### 使用范例 + +假定服务器192.168.0.101:6667上运行一个IoTDB实例,想从将本地保存的TsFile备份文件夹D:\IoTDB\data中的所有的TsFile文件都加载进此IoTDB实例。 + +首先移动至`$IOTDB_HOME/tools/`,打开命令行,然后执行 + +```bash +./load-tsfile.bat -f D:\IoTDB\data -h 192.168.0.101 -p 6667 -u root -pw root +``` + +等待脚本执行完成之后,可以检查IoTDB实例中数据已经被正确加载 + +#### 常见问题 + +- 找不到或无法加载主类 + - 可能是由于未设置环境变量$IOTDB_HOME,请设置环境变量之后重试 +- 提示-f option must be set! + - 输入命令缺少待-f字段(加载文件或文件夹路径),请添加之后重新执行 +- 执行到中途崩溃了想重新加载怎么办 + - 重新执行刚才的命令,重新加载数据不会影响加载之后的正确性 + +## 导出 TsFile + +TsFile 工具可帮您 通过执行指定sql、命令行sql、sql文件的方式将结果集以TsFile文件的格式导出至指定路径. 
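例如,下面给出一个通过 SQL 文件批量导出 TsFile 的简单示意(其中的查询语句、输出文件名均为假设的示例,各脚本参数的具体含义见下文说明):

```shell
# 准备一个 SQL 文件,文件中的每条查询各对应一个导出的 TsFile(查询内容仅为示例)
cat > sql.txt <<'EOF'
select * from root.ln.** where time > 100
select * from root.sg.** limit 1000
EOF

# 使用 -s 指定 SQL 文件、-f 指定输出文件名进行导出
tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f dump

# 上述两条查询预期会在目标路径下生成 dump0.tsfile 与 dump1.tsfile 两个文件
```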
+ +### 使用 export-tsfile.sh + +#### 运行方法 + +```shell +# Unix/OS X +> tools/export-tsfile.sh -h -p -u -pw -td [-f -q -s ] + +# Windows +> tools\export-tsfile.bat -h -p -u -pw -td [-f -q -s ] +``` + +参数: +* `-h `: + - IoTDB服务的主机地址。 +* `-p `: + - IoTDB服务的端口号。 +* `-u `: + - IoTDB服务的用户名。 +* `-pw `: + - IoTDB服务的密码。 +* `-td `: + - 为导出的TsFile文件指定输出路径。 +* `-f `: + - 为导出的TsFile文件的文件名,只需写文件名称,不能包含文件路径和后缀。如果sql文件或控制台输入时包含多个sql,会按照sql顺序生成多个TsFile文件。 + - 例如:文件中或命令行共有3个SQL,-f 为"dump",那么会在目标路径下生成 dump0.tsfile、dump1.tsfile、dump2.tsfile三个TsFile文件。 +* `-q `: + - 在命令中直接指定想要执行的查询语句。 + - 例如: `select * from root.** limit 100` +* `-s `: + - 指定一个SQL文件,里面包含一条或多条SQL语句。如果一个SQL文件中包含多条SQL语句,SQL语句之间应该用换行符进行分割。每一条SQL语句对应一个输出的TsFile文件。 +* `-t `: + - 指定session查询时的超时时间,单位为ms + + +除此之外,如果你没有使用`-s`和`-q`参数,在导出脚本被启动之后你需要按照程序提示输入查询语句,不同的查询结果会被保存到不同的TsFile文件中。 + +#### 运行示例 + +```shell +# Unix/OS X +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 + +# Windows +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.**" +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt +# Or +> tools/export-tsfile.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile +# Or +> tools/export-tsfile.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s ./sql.txt -f myTsFile -t 10000 +``` + +#### Q&A + +- 建议在导入数据时不要同时执行写入数据命令,这将有可能导致JVM内存不足的情况。 + +## CSV导入导出工具 + +CSV 是以纯文本形式存储表格数据,您可以在CSV文件中写入多条格式化的数据,并批量的将这些数据导入到 IoTDB 中,在导入数据之前,建议在IoTDB中创建好对应的元数据信息。如果忘记创建元数据也不要担心,IoTDB 可以自动将CSV中数据推断为其对应的数据类型,前提是你每一列的数据类型必须唯一。除单个文件外,此工具还支持以文件夹的形式导入多个 CSV 文件,并且支持设置如时间精度等优化参数。 + +### 使用 export-csv.sh + +#### 运行方法 + +```shell +# Unix/OS X +> tools/export-csv.sh -h -p -u -pw -td [-tf -datatype -q -s ] + +# Windows +> tools\export-csv.bat -h -p -u -pw -td [-tf -datatype -q -s ] +``` + +参数: + +* `-datatype`: + - true (默认): 在CSV文件的header中时间序列的后面打印出对应的数据类型。例如:`Time, root.sg1.d1.s1(INT32), root.sg1.d1.s2(INT64)`. 
+ - false: 只在CSV的header中打印出时间序列的名字, `Time, root.sg1.d1.s1 , root.sg1.d1.s2` +* `-q `: + - 在命令中直接指定想要执行的查询语句。 + - 例如: `select * from root.** limit 100`, or `select * from root.** limit 100 align by device` +* `-s `: + - 指定一个SQL文件,里面包含一条或多条SQL语句。如果一个SQL文件中包含多条SQL语句,SQL语句之间应该用换行符进行分割。每一条SQL语句对应一个输出的CSV文件。 +* `-td `: + - 为导出的CSV文件指定输出路径。 +* `-tf `: + - 指定一个你想要得到的时间格式。时间格式必须遵守[ISO 8601](https://calendars.wikia.org/wiki/ISO_8601)标准。如果说你想要以时间戳来保存时间,那就设置为`-tf timestamp`。 + - 例如: `-tf yyyy-MM-dd\ HH:mm:ss` or `-tf timestamp` +* `-linesPerFile `: + - 指定导出的dump文件最大行数,默认值为`10000`。 + - 例如: `-linesPerFile 1` +* `-t `: + - 指定session查询时的超时时间,单位为ms + +除此之外,如果你没有使用`-s`和`-q`参数,在导出脚本被启动之后你需要按照程序提示输入查询语句,不同的查询结果会被保存到不同的CSV文件中。 + +#### 运行示例 + +```shell +# Unix/OS X +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss +# or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 + +# Windows +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss +# or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -q "select * from root.** align by device" +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -s sql.txt +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt +# Or +> tools/export-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 +# Or +> tools/export-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt -linesPerFile 10 -t 10000 +``` + +#### SQL 文件示例 + +```sql +select * from root.**; +select * from root.** align by device; +``` + +`select * from root.**`的执行结果: + +```sql +Time,root.ln.wf04.wt04.status(BOOLEAN),root.ln.wf03.wt03.hardware(TEXT),root.ln.wf02.wt02.status(BOOLEAN),root.ln.wf02.wt02.hardware(TEXT),root.ln.wf01.wt01.hardware(TEXT),root.ln.wf01.wt01.status(BOOLEAN) +1970-01-01T08:00:00.001+08:00,true,"v1",true,"v1",v1,true +1970-01-01T08:00:00.002+08:00,true,"v1",,,,true +``` + +`select * from root.** align by device`的执行结果: + +```sql +Time,Device,hardware(TEXT),status(BOOLEAN) +1970-01-01T08:00:00.001+08:00,root.ln.wf01.wt01,"v1",true +1970-01-01T08:00:00.002+08:00,root.ln.wf01.wt01,,true +1970-01-01T08:00:00.001+08:00,root.ln.wf02.wt02,"v1",true +1970-01-01T08:00:00.001+08:00,root.ln.wf03.wt03,"v1", +1970-01-01T08:00:00.002+08:00,root.ln.wf03.wt03,"v1", +1970-01-01T08:00:00.001+08:00,root.ln.wf04.wt04,,true +1970-01-01T08:00:00.002+08:00,root.ln.wf04.wt04,,true +``` + +布尔类型的数据用`true`或者`false`来表示,此处没有用双引号括起来。文本数据需要使用双引号括起来。 + +#### 注意 + +注意,如果导出字段存在如下特殊字符: + +1. 
`,`: 导出程序会在`,`字符前加`\`来进行转义。 + +### 使用 import-csv.sh + +#### 创建元数据 (可选) + +```sql +CREATE DATABASE root.fit.d1; +CREATE DATABASE root.fit.d2; +CREATE DATABASE root.fit.p; +CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; +CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; +CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; +``` + +IoTDB 具有类型推断的能力,因此在数据导入前创建元数据不是必须的。但我们仍然推荐在使用 CSV 导入工具导入数据前创建元数据,因为这可以避免不必要的类型转换错误。 + +#### 待导入 CSV 文件示例 + +通过时间对齐,并且header中不包含数据类型的数据。 + +```sql +Time,root.test.t1.str,root.test.t2.str,root.test.t2.int +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,"123",, +``` + +通过时间对齐,并且header中包含数据类型的数据。(Text类型数据支持加双引号和不加双引号) + +```sql +Time,root.test.t1.str(TEXT),root.test.t2.str(TEXT),root.test.t2.int(INT32) +1970-01-01T08:00:00.001+08:00,"123hello world","123\,abc",100 +1970-01-01T08:00:00.002+08:00,123,hello world,123 +1970-01-01T08:00:00.003+08:00,"123",, +1970-01-01T08:00:00.004+08:00,123,,12 +``` + +通过设备对齐,并且header中不包含数据类型的数据。 + +```sql +Time,Device,str,int +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +``` + +通过设备对齐,并且header中包含数据类型的数据。(Text类型数据支持加双引号和不加双引号) + +```sql +Time,Device,str(TEXT),int(INT32) +1970-01-01T08:00:00.001+08:00,root.test.t1,"123hello world", +1970-01-01T08:00:00.002+08:00,root.test.t1,"123", +1970-01-01T08:00:00.001+08:00,root.test.t2,"123\,abc",100 +1970-01-01T08:00:00.002+08:00,root.test.t1,hello world,123 +``` + +#### 运行方法 + +```shell +# Unix/OS X +>tools/import-csv.sh -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] [-linesPerFailedFile ] +# Windows +>tools\import-csv.bat -h -p -u -pw -f [-fd <./failedDirectory>] [-aligned ] [-tp ] [-typeInfer ] [-linesPerFailedFile ] +``` + +参数: + +* `-f`: + - 指定你想要导入的数据,这里可以指定文件或者文件夹。如果指定的是文件夹,将会把文件夹中所有的后缀为txt与csv的文件进行批量导入。 + - 例如: `-f filename.csv` + +* `-fd`: + - 指定一个目录来存放保存失败的行的文件,如果你没有指定这个参数,失败的文件将会被保存到源数据的目录中,然后文件名是源文件名加上`.failed`的后缀。 + - 例如: `-fd ./failed/` + +* `-aligned`: + - 是否使用`aligned`接口? 默认参数为`false`。 + - 例如: `-aligned true` + +* `-batch`: + - 用于指定每一批插入的数据的点数。如果程序报了`org.apache.thrift.transport.TTransportException: Frame size larger than protect max size`这个错的话,就可以适当的调低这个参数。 + - 例如: `-batch 100000`,`100000`是默认值。 + +* `-tp`: + - 用于指定时间精度,可选值包括`ms`(毫秒),`ns`(纳秒),`us`(微秒),默认值为`ms`。 + +* `-typeInfer `: + - 用于指定类型推断规则. + - `srcTsDataType` 包括 `boolean`,`int`,`long`,`float`,`double`,`NaN`. + - `dstTsDataType` 包括 `boolean`,`int`,`long`,`float`,`double`,`text`. + - 当`srcTsDataType`为`boolean`, `dstTsDataType`只能为`boolean`或`text`. + - 当`srcTsDataType`为`NaN`, `dstTsDataType`只能为`float`, `double`或`text`. + - 当`srcTsDataType`为数值类型, `dstTsDataType`的精度需要高于`srcTsDataType`. 
+ - 例如:`-typeInfer boolean=text,float=double` + +* `-linesPerFailedFile `: + - 用于指定每个导入失败文件写入数据的行数,默认值为10000。 + - 例如:`-linesPerFailedFile 1` + +#### 运行示例 + +```sh +# Unix/OS X +>tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed +# or +>tools/import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double +# or +> tools\import-csv.sh -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd ./failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 +# Windows +>tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv +# or +>tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double +# or +> tools\import-csv.bat -h 127.0.0.1 -p 6667 -u root -pw root -f example-filename.csv -fd .\failed -tp ns -typeInfer boolean=text,float=double -linesPerFailedFile 10 +``` + +#### 注意 + +注意,在导入数据前,需要特殊处理下列的字符: + +1. `,` :如果text类型的字段中包含`,`那么需要用`\`来进行转义。 +2. 你可以导入像`yyyy-MM-dd'T'HH:mm:ss`, `yyy-MM-dd HH:mm:ss`, 或者 `yyyy-MM-dd'T'HH:mm:ss.SSSZ`格式的时间。 +3. `Time`这一列应该放在第一列。 \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/WayToGetIoTDB.md b/src/zh/UserGuide/V2.0.1/Tree/stage/WayToGetIoTDB.md new file mode 100644 index 00000000..1a4cf93a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/WayToGetIoTDB.md @@ -0,0 +1,212 @@ + + +# 下载与安装 + +IoTDB 为您提供了两种安装方式,您可以参考下面的建议,任选其中一种: + +第一种,从官网下载安装包。这是我们推荐使用的安装方式,通过该方式,您将得到一个可以立即使用的、打包好的二进制可执行文件。 + +第二种,使用源码编译。若您需要自行修改代码,可以使用该安装方式。 + +## 安装环境要求 + +安装前请保证您的电脑上配有 JDK>=1.8 的运行环境,并配置好 JAVA_HOME 环境变量。 + +如果您需要从源码进行编译,还需要安装: + +1. 
Maven >= 3.6 的运行环境,具体安装方法可以参考以下链接:[https://maven.apache.org/install.html](https://maven.apache.org/install.html)。 + +> 注: 也可以选择不安装,使用我们提供的'mvnw' 或 'mvnw.cmd' 工具。使用时请用'mvnw' 或 'mvnw.cmd'命令代替下文的'mvn'命令。 + +## 从官网下载二进制可执行文件 + +您可以从 [http://iotdb.apache.org/Download/](http://iotdb.apache.org/Download/) 上下载已经编译好的可执行程序 iotdb-xxx.zip,该压缩包包含了 IoTDB 系统运行所需的所有必要组件。 + +下载后,您可使用以下操作对 IoTDB 的压缩包进行解压: + +``` +Shell > unzip iotdb-.zip +``` + +## 使用源码编译 + +您可以获取已发布的源码 [https://iotdb.apache.org/Download/](https://iotdb.apache.org/Download/) ,或者从 [https://github.com/apache/iotdb/tree/master](https://github.com/apache/iotdb/tree/master) git 仓库获取 + +源码克隆后,进入到源码文件夹目录下。如果您想编译已经发布过的版本,可以先用`git checkout -b my_{project.version} v{project.version}`命令新建并切换分支。比如您要编译0.12.4这个版本,您可以用如下命令去切换分支: + +```shell +> git checkout -b my_0.12.4 v0.12.4 +``` + +切换分支之后就可以使用以下命令进行编译: + +``` +> mvn clean package -pl iotdb-core/datanode -am -Dmaven.test.skip=true +``` + +编译后,IoTDB 服务器会在 "server/target/iotdb-server-{project.version}" 文件夹下,包含以下内容: + +``` ++- sbin/ <-- script files +| ++- conf/ <-- configuration files +| ++- lib/ <-- project dependencies +| ++- tools/ <-- system tools +``` + +如果您想要编译项目中的某个模块,您可以在源码文件夹中使用`mvn clean package -pl {module.name} -am -DskipTests`命令进行编译。如果您需要的是带依赖的 jar 包,您可以在编译命令后面加上`-P get-jar-with-dependencies`参数。比如您想编译带依赖的 jdbc jar 包,您就可以使用以下命令进行编译: + +```shell +> mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies +``` + +编译完成后就可以在`{module.name}/target`目录中找到需要的包了。 + + +## 通过 Docker 安装 + +Apache IoTDB 的 Docker 镜像已经上传至 [https://hub.docker.com/r/apache/iotdb](https://hub.docker.com/r/apache/iotdb)。 +Apache IoTDB 的配置项以环境变量形式添加到容器内。 + +### 简单尝试 +```shell +# 获取镜像 +docker pull apache/iotdb:1.1.0-standalone +# 创建 docker bridge 网络 +docker network create --driver=bridge --subnet=172.18.0.0/16 --gateway=172.18.0.1 iotdb +# 创建 docker 容器 +# 注意:必须固定IP部署。IP改变会导致 confignode 启动失败。 +docker run -d --name iotdb-service \ + --hostname iotdb-service \ + --network iotdb \ + --ip 172.18.0.6 \ + -p 6667:6667 \ + -e cn_internal_address=iotdb-service \ + -e cn_seed_config_node=iotdb-service:10710 \ + -e cn_internal_port=10710 \ + -e cn_consensus_port=10720 \ + -e dn_rpc_address=iotdb-service \ + -e dn_internal_address=iotdb-service \ + -e dn_seed_config_node=iotdb-service:10710 \ + -e dn_mpp_data_exchange_port=10740 \ + -e dn_schema_region_consensus_port=10750 \ + -e dn_data_region_consensus_port=10760 \ + -e dn_rpc_port=6667 \ + apache/iotdb:1.1.0-standalone +# 尝试使用命令行执行SQL +docker exec -ti iotdb-service /iotdb/sbin/start-cli.sh -h iotdb-service +``` +外部连接: +```shell +# <主机IP/hostname> 是物理机的真实IP或域名。如果在同一台物理机,可以是127.0.0.1。 +$IOTDB_HOME/sbin/start-cli.sh -h <主机IP/hostname> -p 6667 +``` +```yaml +# docker-compose-1c1d.yml +version: "3" +services: + iotdb-service: + image: apache/iotdb:1.1.0-standalone + hostname: iotdb-service + container_name: iotdb-service + ports: + - "6667:6667" + environment: + - cn_internal_address=iotdb-service + - cn_internal_port=10710 + - cn_consensus_port=10720 + - cn_seed_config_node=iotdb-service:10710 + - dn_rpc_address=iotdb-service + - dn_internal_address=iotdb-service + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - dn_seed_config_node=iotdb-service:10710 + volumes: + - ./data:/iotdb/data + - ./logs:/iotdb/logs + networks: + iotdb: + ipv4_address: 172.18.0.6 + +networks: + iotdb: + external: true +``` +### 集群部署 +目前只支持 host 网络和 overlay 网络,不支持 bridge 网络。overlay 
网络参照[1C2D](https://github.com/apache/iotdb/tree/master/docker/src/main/DockerCompose/docker-compose-cluster-1c2d.yml)的写法,host 网络如下。 + +假如有三台物理机,它们的hostname分别是iotdb-1、iotdb-2、iotdb-3。依次启动。 +以 iotdb-2 节点的docker-compose文件为例: +```yaml +version: "3" +services: + iotdb-confignode: + image: apache/iotdb:1.1.0-confignode + container_name: iotdb-confignode + environment: + - cn_internal_address=iotdb-2 + - cn_seed_config_node=iotdb-1:10710 + - schema_replication_factor=3 + - cn_internal_port=10710 + - cn_consensus_port=10720 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - data_replication_factor=3 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/confignode:/iotdb/data + - ./logs/confignode:/iotdb/logs + network_mode: "host" + + iotdb-datanode: + image: apache/iotdb:1.1.0-datanode + container_name: iotdb-datanode + environment: + - dn_rpc_address=iotdb-2 + - dn_internal_address=iotdb-2 + - dn_seed_config_node=iotdb-1:10710 + - data_replication_factor=3 + - dn_rpc_port=6667 + - dn_mpp_data_exchange_port=10740 + - dn_schema_region_consensus_port=10750 + - dn_data_region_consensus_port=10760 + - data_region_consensus_protocol_class=org.apache.iotdb.consensus.iot.IoTConsensus + - schema_replication_factor=3 + - schema_region_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + - config_node_consensus_protocol_class=org.apache.iotdb.consensus.ratis.RatisConsensus + volumes: + - /etc/hosts:/etc/hosts:ro + - ./data/datanode:/iotdb/data/ + - ./logs/datanode:/iotdb/logs/ + network_mode: "host" +``` +注意: +1. `dn_seed_config_node`所有节点配置一样,需要配置第一个启动的节点,这里为`iotdb-1`。 +2. 上面docker-compose文件中,`iotdb-2`需要替换为每个节点的 hostname、域名或者IP地址。 +3. 需要映射`/etc/hosts`,文件内配置了 iotdb-1、iotdb-2、iotdb-3 与IP的映射。或者可以在 docker-compose 文件中增加 `extra_hosts` 配置。 +4. 首次启动时,必须首先启动 `iotdb-1`。 +5. 
如果部署失败要重新部署集群,必须将所有节点上的IoTDB服务停止并删除,然后清除`data`和`logs`文件夹后,再启动。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Batch-Load-Tool.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Batch-Load-Tool.md new file mode 100644 index 00000000..ce767979 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Batch-Load-Tool.md @@ -0,0 +1,32 @@ + + +# 批量数据导入 + +针对于不同场景,IoTDB 为用户提供多种批量导入数据的操作方式,本章节向大家介绍最为常用的两种方式为 CSV文本形式的导入 和 TsFile文件形式的导入。 + +## TsFile批量导入 + +TsFile 是在 IoTDB 中使用的时间序列的文件格式,您可以通过CLI等工具直接将存有时间序列的一个或多个 TsFile 文件导入到另外一个正在运行的IoTDB实例中。具体操作方式请参考[TsFile 导入工具](../Maintenance-Tools/Load-Tsfile.md),[TsFile 导出工具](../Maintenance-Tools/TsFile-Load-Export-Tool.md)。 + +## CSV批量导入 + +CSV 是以纯文本形式存储表格数据,您可以在CSV文件中写入多条格式化的数据,并批量的将这些数据导入到 IoTDB 中,在导入数据之前,建议在IoTDB中创建好对应的元数据信息。如果忘记创建元数据也不要担心,IoTDB 可以自动将CSV中数据推断为其对应的数据类型,前提是你每一列的数据类型必须唯一。除单个文件外,此工具还支持以文件夹的形式导入多个 CSV 文件,并且支持设置如时间精度等优化参数。具体操作方式请参考 [CSV 导入导出工具](../Maintenance-Tools/CSV-Tool.md)。 diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/MQTT.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/MQTT.md new file mode 100644 index 00000000..59af6e6a --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/MQTT.md @@ -0,0 +1,24 @@ + + +# MQTT写入 + +参考 [内置 MQTT 服务](../API/Programming-MQTT.md#内置-mqtt-服务) \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/REST-API.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/REST-API.md new file mode 100644 index 00000000..15a5edc4 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/REST-API.md @@ -0,0 +1,57 @@ + + +# REST API写入 + +参考 [insertTablet (v1)](../API/RestServiceV1.md#inserttablet) or [insertTablet (v2)](../API/RestServiceV2.md#inserttablet) + +示例如下: +```JSON +{ +      "timestamps": [ +            1, +            2, +            3 +      ], +      "measurements": [ +            "temperature", +            "status" +      ], +      "data_types": [ +            "FLOAT", +            "BOOLEAN" +      ], +      "values": [ +            [ +                  1.1, +                  2.2, +                  3.3 +            ], +            [ +                  false, +                  true, +                  true +            ] +      ], +      "is_aligned": false, +      "device": "root.ln.wf01.wt01" +} +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Session.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Session.md new file mode 100644 index 00000000..693127d8 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Session.md @@ -0,0 +1,37 @@ + + +# 原生接口写入 +原生接口 (Session) 是目前IoTDB使用最广泛的系列接口,包含多种写入接口,适配不同的数据采集场景,性能高效且支持多语言。 + +## 多语言接口写入 +* ### Java + 使用Java接口写入之前,你需要先建立连接,参考 [Java原生接口](../API/Programming-Java-Native-API.md)。 + 之后通过 [ JAVA 数据操作接口(DML)](../API/Programming-Java-Native-API.md#数据写入)写入。 + +* ### Python + 参考 [ Python 数据操作接口(DML)](../API/Programming-Python-Native-API.md#数据写入) + +* ### C++ + 参考 [ C++ 数据操作接口(DML)](../API/Programming-Cpp-Native-API.md) + +* ### Go + 参考 [Go 原生接口](../API/Programming-Go-Native-API.md) \ No newline at end of file diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Write-Data.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Write-Data.md new file mode 100644 index 00000000..4a4eff04 --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Write-Data/Write-Data.md @@ -0,0 +1,112 @@ + + + +# 写入数据 + +IoTDB 为用户提供多种插入实时数据的方式,例如在 [Cli/Shell 工具](../QuickStart/Command-Line-Interface.md) 中直接输入插入数据的 INSERT 语句,或使用 Java API(标准 [Java JDBC](../API/Programming-JDBC.md) 
接口)单条或批量执行插入数据的 INSERT 语句。 + +本节主要为您介绍实时数据接入的 INSERT 语句在场景中的实际使用示例,有关 INSERT SQL 语句的详细语法请参见本文 [INSERT 语句](../Reference/SQL-Reference.md) 节。 + +注:写入重复时间戳的数据则原时间戳数据被覆盖,可视为更新数据。 + +## 使用 INSERT 语句 + +使用 INSERT 语句可以向指定的已经创建的一条或多条时间序列中插入数据。对于每一条数据,均由一个时间戳类型的时间戳和一个数值或布尔值、字符串类型的传感器采集值组成。 + +在本节的场景实例下,以其中的两个时间序列`root.ln.wf02.wt02.status`和`root.ln.wf02.wt02.hardware`为例 ,它们的数据类型分别为 BOOLEAN 和 TEXT。 + +单列数据插入示例代码如下: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp,status) values(1,true) +IoTDB > insert into root.ln.wf02.wt02(timestamp,hardware) values(1, 'v1') +``` + +以上示例代码将长整型的 timestamp 以及值为 true 的数据插入到时间序列`root.ln.wf02.wt02.status`中和将长整型的 timestamp 以及值为”v1”的数据插入到时间序列`root.ln.wf02.wt02.hardware`中。执行成功后会返回执行时间,代表数据插入已完成。 + +> 注意:在 IoTDB 中,TEXT 类型的数据单双引号都可以来表示,上面的插入语句是用的是双引号表示 TEXT 类型数据,下面的示例将使用单引号表示 TEXT 类型数据。 + +INSERT 语句还可以支持在同一个时间点下多列数据的插入,同时向 2 时间点插入上述两个时间序列的值,多列数据插入示例代码如下: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) values (2, false, 'v2') +``` + +此外,INSERT 语句支持一次性插入多行数据,同时向 2 个不同时间点插入上述时间序列的值,示例代码如下: + +```sql +IoTDB > insert into root.ln.wf02.wt02(timestamp, status, hardware) VALUES (3, false, 'v3'),(4, true, 'v4') +``` + +插入数据后我们可以使用 SELECT 语句简单查询已插入的数据。 + +```sql +IoTDB > select * from root.ln.wf02.wt02 where time < 5 +``` + +结果如图所示。由查询结果可以看出,单列、多列数据的插入操作正确执行。 + +``` ++-----------------------------+--------------------------+------------------------+ +| Time|root.ln.wf02.wt02.hardware|root.ln.wf02.wt02.status| ++-----------------------------+--------------------------+------------------------+ +|1970-01-01T08:00:00.001+08:00| v1| true| +|1970-01-01T08:00:00.002+08:00| v2| false| +|1970-01-01T08:00:00.003+08:00| v3| false| +|1970-01-01T08:00:00.004+08:00| v4| true| ++-----------------------------+--------------------------+------------------------+ +Total line number = 4 +It costs 0.004s +``` + +此外,我们可以省略 timestamp 列,此时系统将使用当前的系统时间作为该数据点的时间戳,示例代码如下: +```sql +IoTDB > insert into root.ln.wf02.wt02(status, hardware) values (false, 'v2') +``` +**注意:** 当一次插入多行数据时必须指定时间戳。 + +## 向对齐时间序列插入数据 + +向对齐时间序列插入数据只需在SQL中增加`ALIGNED`关键词,其他类似。 + +示例代码如下: + +```sql +IoTDB > create aligned timeseries root.sg1.d1(s1 INT32, s2 DOUBLE) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1) +IoTDB > insert into root.sg1.d1(time, s1, s2) aligned values(2, 2, 2), (3, 3, 3) +IoTDB > select * from root.sg1.d1 +``` + +结果如图所示。由查询结果可以看出,数据的插入操作正确执行。 + +``` ++-----------------------------+--------------+--------------+ +| Time|root.sg1.d1.s1|root.sg1.d1.s2| ++-----------------------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 1| 1.0| +|1970-01-01T08:00:00.002+08:00| 2| 2.0| +|1970-01-01T08:00:00.003+08:00| 3| 3.0| ++-----------------------------+--------------+--------------+ +Total line number = 3 +It costs 0.004s +``` diff --git a/src/zh/UserGuide/V2.0.1/Tree/stage/Writing-Data-on-HDFS.md b/src/zh/UserGuide/V2.0.1/Tree/stage/Writing-Data-on-HDFS.md new file mode 100644 index 00000000..7cb8995f --- /dev/null +++ b/src/zh/UserGuide/V2.0.1/Tree/stage/Writing-Data-on-HDFS.md @@ -0,0 +1,171 @@ + + +# HDFS 集成 + +## 存储共享架构 + +当前,TSFile(包括 TSFile 文件和相关的数据文件)支持存储在本地文件系统和 Hadoop 分布式文件系统(HDFS)。配置使用 HDFS 存储 TSFile 十分容易。 + +## 系统架构 + +当你配置使用 HDFS 存储 TSFile 之后,你的数据文件将会被分布式存储。系统架构如下: + + + +## Config and usage + +如果你希望将 TSFile 存储在 HDFS 上,可以遵循以下步骤: + +首先下载对应版本的源码发布版或者下载 github 仓库 + +使用 maven 打包 server 和 Hadoop 模块:`mvn clean package -pl iotdb-core/datanode,iotdb-connector/hadoop -am -Dmaven.test.skip=true -P get-jar-with-dependencies` + +然后,将 Hadoop 模块的 
target jar 包`hadoop-tsfile-X.X.X-jar-with-dependencies.jar`复制到 server 模块的 target lib 文件夹 `.../server/target/iotdb-server-X.X.X/lib`下。 + +编辑`iotdb-system.properties`中的用户配置。相关配置项包括: + +* tsfile\_storage\_fs + +|名字| tsfile\_storage\_fs | +|:---:|:---| +|描述| Tsfile 和相关数据文件的存储文件系统。目前支持 LOCAL(本地文件系统)和 HDFS 两种| +|类型| String | +|默认值|LOCAL | +|改后生效方式|仅允许在第一次启动服务器前修改| + +* core\_site\_path + +|Name| core\_site\_path | +|:---:|:---| +|描述| 在 Tsfile 和相关数据文件存储到 HDFS 的情况下用于配置 core-site.xml 的绝对路径| +|类型| String | +|默认值|/etc/hadoop/conf/core-site.xml | +|改后生效方式|重启服务器生效| + +* hdfs\_site\_path + +|Name| hdfs\_site\_path | +|:---:|:---| +|描述| 在 Tsfile 和相关数据文件存储到 HDFS 的情况下用于配置 hdfs-site.xml 的绝对路径| +|类型| String | +|默认值|/etc/hadoop/conf/hdfs-site.xml | +|改后生效方式|重启服务器生效| + +* hdfs\_ip + +|名字| hdfs\_ip | +|:---:|:---| +|描述| 在 Tsfile 和相关数据文件存储到 HDFS 的情况下用于配置 HDFS 的 IP。**如果配置了多于 1 个 hdfs\_ip,则表明启用了 Hadoop HA**| +|类型| String | +|默认值|localhost | +|改后生效方式|重启服务器生效| + +* hdfs\_port + +|名字| hdfs\_port | +|:---:|:---| +|描述| 在 Tsfile 和相关数据文件存储到 HDFS 的情况下用于配置 HDFS 的端口| +|类型| String | +|默认值|9000 | +|改后生效方式|重启服务器生效| + +* dfs\_nameservices + +|名字| hdfs\_nameservices | +|:---:|:---| +|描述| 在使用 Hadoop HA 的情况下用于配置 HDFS 的 nameservices| +|类型| String | +|默认值|hdfsnamespace | +|改后生效方式|重启服务器生效| + +* dfs\_ha\_namenodes + +|名字| hdfs\_ha\_namenodes | +|:---:|:---| +|描述| 在使用 Hadoop HA 的情况下用于配置 HDFS 的 nameservices 下的 namenodes| +|类型| String | +|默认值|nn1,nn2 | +|改后生效方式|重启服务器生效| + +* dfs\_ha\_automatic\_failover\_enabled + +|名字| dfs\_ha\_automatic\_failover\_enabled | +|:---:|:---| +|描述| 在使用 Hadoop HA 的情况下用于配置是否使用失败自动切换| +|类型| Boolean | +|默认值|true | +|改后生效方式|重启服务器生效| + +* dfs\_client\_failover\_proxy\_provider + +|名字| dfs\_client\_failover\_proxy\_provider | +|:---:|:---| +|描述| 在使用 Hadoop HA 且使用失败自动切换的情况下配置失败自动切换的实现方式| +|类型| String | +|默认值|org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider | +|改后生效方式|重启服务器生效 + +* hdfs\_use\_kerberos + +|名字| hdfs\_use\_kerberos | +|:---:|:---| +|描述| 是否使用 kerberos 验证访问 hdfs| +|类型| String | +|默认值|false | +|改后生效方式|重启服务器生效| + +* kerberos\_keytab\_file_path + +|名字| kerberos\_keytab\_file_path | +|:---:|:---| +|描述| kerberos keytab file 的完整路径| +|类型| String | +|默认值|/path | +|改后生效方式|重启服务器生效| + +* kerberos\_principal + +|名字| kerberos\_principal | +|:---:|:---| +|描述| Kerberos 认证原则| +|类型| String | +|默认值|your principal | +|改后生效方式|重启服务器生效| + +启动 server, Tsfile 将会被存储到 HDFS 上。 + +如果你想要恢复将 TSFile 存储到本地文件系统,只需编辑配置项`tsfile_storage_fs`为`LOCAL`。在这种情况下,如果你已经在 HDFS 上存储了一些数据文件,你需要将它们下载到本地,并移动到你所配置的数据文件文件夹(默认为`../server/target/iotdb-server-X.X.X/data/data`), 或者重新开始你的整个导入数据过程。 + +## 常见问题 + +1. 这个功能支持哪些 Hadoop 版本? + +A: Hadoop 2.x and Hadoop 3.x 均可以支持。 + +2. 当启动服务器或创建时间序列时,我遇到了如下错误: +``` +ERROR org.apache.iotdb.tsfile.fileSystem.fsFactory.HDFSFactory:62 - Failed to get Hadoop file system. Please check your dependency of Hadoop module. +``` + +A: 这表明你没有将 Hadoop 模块的依赖放到 IoTDB server 中。你可以这样解决: +* 使用 Maven 打包 Hadoop 模块:`mvn clean package -pl iotdb-connector/hadoop -am -Dmaven.test.skip=true -P get-jar-with-dependencies` +* 将 Hadoop 模块的 target jar 包`hadoop-tsfile-X.X.X-jar-with-dependencies.jar`复制到 server 模块的 target lib 文件夹 `.../server/target/iotdb-server-X.X.X/lib`下。
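下面将上述两个步骤整理为命令行示意(仅供参考:假设打包产物位于 `iotdb-connector/hadoop/target/` 目录下,`X.X.X` 需替换为实际版本号):

```shell
# 在 IoTDB 源码根目录下打包 Hadoop 模块(带依赖)
mvn clean package -pl iotdb-connector/hadoop -am -Dmaven.test.skip=true -P get-jar-with-dependencies

# 将打包得到的 jar 复制到 server 模块的 lib 目录(jar 所在目录为假设,请按实际打包输出调整)
cp iotdb-connector/hadoop/target/hadoop-tsfile-X.X.X-jar-with-dependencies.jar \
   server/target/iotdb-server-X.X.X/lib/
```

复制完成并重启 server 后再重试即可。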