From 67cc9710e22939d2cca63627fdf54de82b53cf59 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Fri, 15 Sep 2023 18:58:30 +0800 Subject: [PATCH 01/27] fix English version --- src/UserGuide/V1.2.x/API/Programming-Java-Native-API.md | 2 +- .../V1.2.x/Basic-Concept/Data-Model-and-Terminology.md | 4 ++-- src/UserGuide/V1.2.x/Basic-Concept/Data-Type.md | 2 +- src/UserGuide/V1.2.x/QuickStart/QuickStart.md | 4 ++-- src/UserGuide/V1.2.x/User-Manual/Authority-Management.md | 2 +- src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md | 4 ++-- src/UserGuide/V1.2.x/User-Manual/Query-Data.md | 4 ++-- src/UserGuide/V1.2.x/User-Manual/Security-Management.md | 4 ++-- .../V1.2.x/User-Manual/Security-Management_timecho.md | 4 ++-- src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md | 6 +++--- 10 files changed, 18 insertions(+), 18 deletions(-) diff --git a/src/UserGuide/V1.2.x/API/Programming-Java-Native-API.md b/src/UserGuide/V1.2.x/API/Programming-Java-Native-API.md index 921d3ead..4c1a6bb6 100644 --- a/src/UserGuide/V1.2.x/API/Programming-Java-Native-API.md +++ b/src/UserGuide/V1.2.x/API/Programming-Java-Native-API.md @@ -47,7 +47,7 @@ In root directory: ## Syntax Convention -- **IoTDB-SQL interface:** The input SQL parameter needs to conform to the [syntax conventions](../Syntax-Conventions/Literal-Values.md) and be escaped for JAVA strings. For example, you need to add a backslash before the double-quotes. (That is: after JAVA escaping, it is consistent with the SQL statement executed on the command line.) +- **IoTDB-SQL interface:** The input SQL parameter needs to conform to the [syntax conventions](../User-Manual/Syntax-Rule.md#LiteralValues) and be escaped for JAVA strings. For example, you need to add a backslash before the double-quotes. (That is: after JAVA escaping, it is consistent with the SQL statement executed on the command line.) - **Other interfaces:** - The node names in path or path prefix as parameter: The node names which should be escaped by backticks (`) in the SQL statement, escaping is required here. - Identifiers (such as template names) as parameters: The identifiers which should be escaped by backticks (`) in the SQL statement, and escaping is not required here. diff --git a/src/UserGuide/V1.2.x/Basic-Concept/Data-Model-and-Terminology.md b/src/UserGuide/V1.2.x/Basic-Concept/Data-Model-and-Terminology.md index 7057074c..a49d4b26 100644 --- a/src/UserGuide/V1.2.x/Basic-Concept/Data-Model-and-Terminology.md +++ b/src/UserGuide/V1.2.x/Basic-Concept/Data-Model-and-Terminology.md @@ -85,7 +85,7 @@ The following are the constraints on the `nodeName`: * [ 0-9 a-z A-Z _ ] (letters, numbers, underscore) * ['\u2E80'..'\u9FFF'] (Chinese characters) * In particular, if the system is deployed on a Windows machine, the database layer name will be case-insensitive. For example, creating both `root.ln` and `root.LN` at the same time is not allowed. -* If you want to use special characters in `nodeName`, you can quote it with back quote, detailed information can be found from charpter Syntax-Conventions,click here: [Syntax-Conventions](https://iotdb.apache.org/UserGuide/Master/Syntax-Conventions/Literal-Values.html). +* If you want to use special characters in `nodeName`, you can quote it with back quote, detailed information can be found from charpter Syntax-Conventions,click here: [Syntax-Conventions](../User-Manual/Syntax-Rule.md). 
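For instance, a node name containing special characters can be wrapped in backquotes both when the series is created and when it is queried. A minimal illustrative sketch (the paths below are made up for demonstration):

```sql
create timeseries root.sg1.`d-1`.`temperature+01` with datatype=FLOAT, encoding=RLE;
select `temperature+01` from root.sg1.`d-1`;
```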
### Path Pattern @@ -140,6 +140,6 @@ In the following chapters of data definition language, data operation language a ## Schema Template -In the actual scenario, many entities collect the same measurements, that is, they have the same measurements name and type. A **schema template** can be declared to define the collectable measurements set. Schema template helps save memory by implementing schema sharing. For detailed description, please refer to [Schema Template doc](./Schema-Template.md). +In the actual scenario, many entities collect the same measurements, that is, they have the same measurements name and type. A **schema template** can be declared to define the collectable measurements set. Schema template helps save memory by implementing schema sharing. For detailed description, please refer to [Schema Template doc](../User-Manual/Operate-Metadata.md#OperateMetadata). In the following chapters of, data definition language, data operation language and Java Native Interface, various operations related to schema template will be introduced one by one. diff --git a/src/UserGuide/V1.2.x/Basic-Concept/Data-Type.md b/src/UserGuide/V1.2.x/Basic-Concept/Data-Type.md index d63d0e5e..e09535c8 100644 --- a/src/UserGuide/V1.2.x/Basic-Concept/Data-Type.md +++ b/src/UserGuide/V1.2.x/Basic-Concept/Data-Type.md @@ -34,7 +34,7 @@ IoTDB supports the following data types: ### Float Precision -The time series of **FLOAT** and **DOUBLE** type can specify (MAX\_POINT\_NUMBER, see [this page](../Reference/SQL-Reference.md) for more information on how to specify), which is the number of digits after the decimal point of the floating point number, if the encoding method is [RLE](Encoding-and-Compression.md) or [TS\_2DIFF](Encoding-and-Compression.md). If MAX\_POINT\_NUMBER is not specified, the system will use [float\_precision](../Reference/DataNode-Config-Manual.md) in the configuration file `iotdb-common.properties`. +The time series of **FLOAT** and **DOUBLE** type can specify (MAX\_POINT\_NUMBER, see [this page](../SQL-Manual/SQL-Manual.md) for more information on how to specify), which is the number of digits after the decimal point of the floating point number, if the encoding method is [RLE](../Basic-Concept/Encoding-and-Compression.md) or [TS\_2DIFF](../Basic-Concept/Encoding-and-Compression.md). If MAX\_POINT\_NUMBER is not specified, the system will use [float\_precision](../Reference/DataNode-Config-Manual.md) in the configuration file `iotdb-common.properties`. ```sql CREATE TIMESERIES root.vehicle.d0.s0 WITH DATATYPE=FLOAT, ENCODING=RLE, 'MAX_POINT_NUMBER'='2'; diff --git a/src/UserGuide/V1.2.x/QuickStart/QuickStart.md b/src/UserGuide/V1.2.x/QuickStart/QuickStart.md index 4d74ab7c..dec3bea1 100644 --- a/src/UserGuide/V1.2.x/QuickStart/QuickStart.md +++ b/src/UserGuide/V1.2.x/QuickStart/QuickStart.md @@ -204,7 +204,7 @@ or IoTDB> exit ``` -For more on what commands are supported by IoTDB SQL, see [SQL Reference](../Reference/SQL-Reference.md). +For more on what commands are supported by IoTDB SQL, see [SQL Reference](../SQL-Manual/SQL-Manual.md). 
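As a minimal taste of such statements, the following sketch uses hypothetical example paths rather than commands taken from the quick start itself:

```sql
create database root.ln;
insert into root.ln.wf02.wt02(timestamp, status) values(1, true);
select * from root.ln.**;
```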
### Stop IoTDB @@ -230,7 +230,7 @@ ALTER USER SET PASSWORD ; Example: IoTDB > ALTER USER root SET PASSWORD 'newpwd'; ``` -More about administration management:[Administration Management](https://iotdb.apache.org/UserGuide/V1.0.x/Administration-Management/Administration.html) +More about administration management:[Administration Management](../User-Manual/Security-Management.md) ## Basic configuration diff --git a/src/UserGuide/V1.2.x/User-Manual/Authority-Management.md b/src/UserGuide/V1.2.x/User-Manual/Authority-Management.md index 274181b7..842e6dac 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Authority-Management.md +++ b/src/UserGuide/V1.2.x/User-Manual/Authority-Management.md @@ -23,7 +23,7 @@ IoTDB provides users with account privilege management operations, so as to ensure data security. -We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../Reference/SQL-Reference.md). +We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../SQL-Manual/SQL-Manual.md). At the same time, in the JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute privilege management statements in a single or batch mode. ## Basic Concepts diff --git a/src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md b/src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md index 7f10fe77..46e39743 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md +++ b/src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md @@ -49,7 +49,7 @@ Besides, if deploy on Windows system, the LayerName is case-insensitive, which m ### Show Databases -After creating the database, we can use the [SHOW DATABASES](../Reference/SQL-Reference.md) statement and [SHOW DATABASES \](../Reference/SQL-Reference.md) to view the databases. The SQL statements are as follows: +After creating the database, we can use the [SHOW DATABASES](../SQL-Manual/SQL-Manual.md) statement and [SHOW DATABASES \](../SQL-Manual/SQL-Manual.md) to view the databases. The SQL statements are as follows: ``` IoTDB> SHOW DATABASES @@ -811,7 +811,7 @@ create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=R The `temprature` in the brackets is an alias for the sensor `s1`. So we can use `temprature` to replace `s1` anywhere. -> IoTDB also supports [using AS function](../Reference/SQL-Reference.md#data-management-statement) to set alias. The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. +> IoTDB also supports [using AS function](.../SQL-Manual/SQL-Manual.md) to set alias. The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. > Notice that the size of the extra tag and attribute information shouldn't exceed the `tag_attribute_total_size`. 
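To make the distinction concrete, compare the two usages below. This is a hedged sketch that reuses the `temprature` alias created above, while `temp` is a hypothetical AS alias: the bound sensor alias can stand in for `s1` directly, whereas the AS alias only renames the result column of that single query.

```sql
select temprature from root.turbine.d1;
select s1 as temp from root.turbine.d1;
```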
diff --git a/src/UserGuide/V1.2.x/User-Manual/Query-Data.md b/src/UserGuide/V1.2.x/User-Manual/Query-Data.md index 61b863f6..9c2dcad0 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Query-Data.md +++ b/src/UserGuide/V1.2.x/User-Manual/Query-Data.md @@ -275,7 +275,7 @@ In IoTDB, there are two ways to execute data query: Data query statements can be used in SQL command-line terminals, JDBC, JAVA / C++ / Python / Go and other native APIs, and RESTful APIs. -- Execute the query statement in the SQL command line terminal: start the SQL command line terminal, and directly enter the query statement to execute, see [SQL command line terminal](../QuickStart/Command-Line-Interface.md). +- Execute the query statement in the SQL command line terminal: start the SQL command line terminal, and directly enter the query statement to execute, see [SQL command line terminal](../Tools-System/CLI.md). - Execute query statements in JDBC, see [JDBC](../API/Programming-JDBC.md) for details. @@ -2880,7 +2880,7 @@ The user must have the following permissions to execute a query write-back state * All `READ_TIMESERIES` permissions for the source series in the `select` clause. * All `INSERT_TIMESERIES` permissions for the target series in the `into` clause. -For more user permissions related content, please refer to [Account Management Statements](../Administration-Management/Administration.md). +For more user permissions related content, please refer to [Account Management Statements](../User-Manual/Security-Management.md). ### Configurable Properties diff --git a/src/UserGuide/V1.2.x/User-Manual/Security-Management.md b/src/UserGuide/V1.2.x/User-Manual/Security-Management.md index f5df881d..206086cd 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Security-Management.md +++ b/src/UserGuide/V1.2.x/User-Manual/Security-Management.md @@ -25,7 +25,7 @@ IoTDB provides users with account privilege management operations, so as to ensure data security. -We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../Reference/SQL-Reference.md). +We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../SQL-Manual/SQL-Manual.md). At the same time, in the JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute privilege management statements in a single or batch mode. ### Basic Concepts @@ -372,7 +372,7 @@ At the same time, changes to roles are immediately reflected on all users who ow | CREATE\_TIMESERIES | create timeseries; path dependent | Eg1: create timeseries
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: create aligned timeseries
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | | INSERT\_TIMESERIES | insert data; path dependent | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | | ALTER\_TIMESERIES | alter timeseries; path dependent | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | -| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](../Query-Data/Overview.md)(The query statements under this section all use this permission)
Eg8: CVS format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | +| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](./Query-Data.md#OVERVIEW) (The query statements under this section all use this permission)
Eg8: CSV format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | | DELETE\_TIMESERIES | delete data or timeseries; path dependent | Eg1: delete timeseries
`delete timeseries root.ln.wf01.wt01.status`
Eg2: delete data
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: use drop semantic
`drop timeseries root.ln.wf01.wt01.status | | CREATE\_USER | create users; path independent | Eg: `create user thulab 'passwd';` | | DELETE\_USER | delete users; path independent | Eg: `drop user xiaoming;` | diff --git a/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md index 39cc43ed..021baac2 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md +++ b/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md @@ -33,7 +33,7 @@ TODO IoTDB provides users with account privilege management operations, so as to ensure data security. -We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../Reference/SQL-Reference.md). +We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../User-Manual/Security-Management_timecho.md). At the same time, in the JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute privilege management statements in a single or batch mode. ### Basic Concepts @@ -380,7 +380,7 @@ At the same time, changes to roles are immediately reflected on all users who ow | CREATE\_TIMESERIES | create timeseries; path dependent | Eg1: create timeseries
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: create aligned timeseries
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | | INSERT\_TIMESERIES | insert data; path dependent | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | | ALTER\_TIMESERIES | alter timeseries; path dependent | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | -| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](../Query-Data/Overview.md)(The query statements under this section all use this permission)
Eg8: CVS format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | +| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](./Query-Data.md#OVERVIEW) (The query statements under this section all use this permission)
Eg8: CSV format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | | DELETE\_TIMESERIES | delete data or timeseries; path dependent | Eg1: delete timeseries
`delete timeseries root.ln.wf01.wt01.status`
Eg2: delete data
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: use drop semantic
`drop timeseries root.ln.wf01.wt01.status | | CREATE\_USER | create users; path independent | Eg: `create user thulab 'passwd';` | | DELETE\_USER | delete users; path independent | Eg: `drop user xiaoming;` | diff --git a/src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md b/src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md index 7f4fee24..6f19b56d 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md +++ b/src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md @@ -23,9 +23,9 @@ # Write & Delete Data ## CLI INSERT -IoTDB provides users with a variety of ways to insert real-time data, such as directly inputting [INSERT SQL statement](../Reference/SQL-Reference.md) in [Client/Shell tools](../QuickStart/Command-Line-Interface.md), or using [Java JDBC](../API/Programming-JDBC.md) to perform single or batch execution of [INSERT SQL statement](../Reference/SQL-Reference.md). +IoTDB provides users with a variety of ways to insert real-time data, such as directly inputting [INSERT SQL statement](../SQL-Manual/SQL-Manual.md) in [Client/Shell tools](../Tools-System/CLI.md), or using [Java JDBC](../API/Programming-JDBC.md) to perform single or batch execution of [INSERT SQL statement](../SQL-Manual/SQL-Manual.md). -NOTE: This section mainly introduces the use of [INSERT SQL statement](../Reference/SQL-Reference.md) for real-time data import in the scenario. +NOTE: This section mainly introduces the use of [INSERT SQL statement](../SQL-Manual/SQL-Manual.md) for real-time data import in the scenario. Writing a repeat timestamp covers the original timestamp data, which can be regarded as updated data. @@ -193,7 +193,7 @@ CSV stores table data in plain text. You can write multiple formatted data into ## DELETE -Users can delete data that meet the deletion condition in the specified timeseries by using the [DELETE statement](../Reference/SQL-Reference.md). When deleting data, users can select one or more timeseries paths, prefix paths, or paths with star to delete data within a certain time interval. +Users can delete data that meet the deletion condition in the specified timeseries by using the [DELETE statement](../SQL-Manual/SQL-Manual.md). When deleting data, users can select one or more timeseries paths, prefix paths, or paths with star to delete data within a certain time interval. In a JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute single or batch UPDATE statements. From 8dfead27cbfa2baccd7d982b3d3fe92a8572024a Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Fri, 15 Sep 2023 19:08:27 +0800 Subject: [PATCH 02/27] fix little wrong --- src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md b/src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md index 46e39743..e1b1606f 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md +++ b/src/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md @@ -811,7 +811,7 @@ create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=R The `temprature` in the brackets is an alias for the sensor `s1`. So we can use `temprature` to replace `s1` anywhere. -> IoTDB also supports [using AS function](.../SQL-Manual/SQL-Manual.md) to set alias. 
The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. +> IoTDB also supports [using AS function](../SQL-Manual/SQL-Manual.md) to set alias. The difference between the two is: the alias set by the AS function is used to replace the whole time series name, temporary and not bound with the time series; while the alias mentioned above is only used as the alias of the sensor, which is bound with it and can be used equivalent to the original sensor name. > Notice that the size of the extra tag and attribute information shouldn't exceed the `tag_attribute_total_size`. From ca720d0a3617658b0f0689552e6bd4f22e4e5ff1 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Mon, 18 Sep 2023 18:28:32 +0800 Subject: [PATCH 03/27] fix more wrong --- .../Basic-Concept/Encoding-and-Compression.md | 2 +- .../User-Manual/Authority-Management.md | 2 +- .../User-Manual/Database-Programming.md | 20 ++++- .../User-Manual/Operator-and-Expression.md | 68 +++++++-------- .../V1.2.x/User-Manual/Write-Delete-Data.md | 2 +- .../Basic-Concept/Encoding-and-Compression.md | 4 +- .../User-Manual/Database-Programming.md | 16 +++- .../V1.2.x/User-Manual/Operate-Metadata.md | 2 +- .../User-Manual/Operator-and-Expression.md | 86 +++++++++---------- .../V1.2.x/User-Manual/Query-Data.md | 10 +-- 10 files changed, 118 insertions(+), 94 deletions(-) diff --git a/src/UserGuide/V1.2.x/Basic-Concept/Encoding-and-Compression.md b/src/UserGuide/V1.2.x/Basic-Concept/Encoding-and-Compression.md index 5116b84b..36a5b6e5 100644 --- a/src/UserGuide/V1.2.x/Basic-Concept/Encoding-and-Compression.md +++ b/src/UserGuide/V1.2.x/Basic-Concept/Encoding-and-Compression.md @@ -116,7 +116,7 @@ IoTDB allows you to specify the compression method of the column when creating a * LZMA2 -The specified syntax for compression is detailed in [Create Timeseries Statement](../Reference/SQL-Reference.md). +The specified syntax for compression is detailed in [Create Timeseries Statement](../SQL-Manual/SQL-Manual.md). ### Compression Ratio Statistics diff --git a/src/UserGuide/V1.2.x/User-Manual/Authority-Management.md b/src/UserGuide/V1.2.x/User-Manual/Authority-Management.md index 842e6dac..1b9a31f0 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Authority-Management.md +++ b/src/UserGuide/V1.2.x/User-Manual/Authority-Management.md @@ -376,7 +376,7 @@ At the same time, changes to roles are immediately reflected on all users who ow | CREATE\_TIMESERIES | create timeseries; path dependent | Eg1: create timeseries
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: create aligned timeseries
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | | INSERT\_TIMESERIES | insert data; path dependent | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | | ALTER\_TIMESERIES | alter timeseries; path dependent | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | -| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](../Query-Data/Overview.md)(The query statements under this section all use this permission)
Eg8: CVS format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | +| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](./Query-Data.md#Overview) (The query statements under this section all use this permission)
Eg8: CSV format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | | DELETE\_TIMESERIES | delete data or timeseries; path dependent | Eg1: delete timeseries
`delete timeseries root.ln.wf01.wt01.status`
Eg2: delete data
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: use drop semantic
`drop timeseries root.ln.wf01.wt01.status | | CREATE\_USER | create users; path independent | Eg: `create user thulab 'passwd';` | | DELETE\_USER | delete users; path independent | Eg: `drop user xiaoming;` | diff --git a/src/UserGuide/V1.2.x/User-Manual/Database-Programming.md b/src/UserGuide/V1.2.x/User-Manual/Database-Programming.md index 08032c00..32af4467 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Database-Programming.md +++ b/src/UserGuide/V1.2.x/User-Manual/Database-Programming.md @@ -1620,7 +1620,19 @@ When you have prepared the UDF source code, test cases, and instructions, you ar ### Known Implementations #### Built-in UDF - +See [Built-in Functions](../User-Manual/Operator-and-Expression.md#OPERATORS),containing the following function types: +Aggregate Functions +Mathematical +Comparison +String +Conversion +Constant +Selection +Continuous-Interval +Variation-Trend +Sample +Time-Series + #### Data Quality Function Library ##### About @@ -1649,7 +1661,7 @@ The functions in this function library are not built-in functions, and must be l 4. Copy the script to the directory of IoTDB system (under the root directory, at the same level as `sbin`), modify the parameters in the script if needed and run it to register UDF. ##### Implemented Functions - + ### Q&A diff --git a/src/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md b/src/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md index 1dea32ba..40acd006 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md +++ b/src/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md @@ -21,11 +21,11 @@ # Overview -This chapter describes the operators and functions supported by IoTDB. IoTDB provides a wealth of built-in operators and functions to meet your computing needs, and supports extensions through the [User-Defined Function](./User-Defined-Function.md). +This chapter describes the operators and functions supported by IoTDB. IoTDB provides a wealth of built-in operators and functions to meet your computing needs, and supports extensions through the [User-Defined Function](../User-Manual/Database-Programming.md#USER-DEFINEDFUNCTION (UDF)). A list of all available functions, both built-in and custom, can be displayed with `SHOW FUNCTIONS` command. -See the documentation [Select-Expression](../Query-Data/Select-Expression.md) for the behavior of operators and functions in SQL. +See the documentation [Select-Expression](../User-Manual/Query-Data.md) for the behavior of operators and functions in SQL. ## OPERATORS @@ -40,9 +40,9 @@ See the documentation [Select-Expression](../Query-Data/Select-Expression.md) fo | `%` | modulo | | `+` | addition | | `-` | subtraction | - + ### Comparison Operators | Operator | Meaning | @@ -63,9 +63,9 @@ For details and examples, see the document [Arithmetic Operators and Functions]( | `IS NOT NULL` | is not null | | `IN` / `CONTAINS` | is a value in the specified list | | `NOT IN` / `NOT CONTAINS` | is not a value in the specified list | - + ### Logical Operators | Operator | Meaning | @@ -73,9 +73,9 @@ For details and examples, see the document [Comparison Operators and Functions]( | `NOT` / `!` | logical negation (unary operator) | | `AND` / `&` / `&&` | logical AND | | `OR`/ | / || | logical OR | - + ### Operator Precedence The precedence of operators is arranged as shown below from high to low, and operators on the same row have the same precedence. 
@@ -114,9 +114,9 @@ The built-in functions can be used in IoTDB without registration, and the functi | COUNT_IF | Find the number of data points that continuously meet a given condition and the number of data points that meet the condition (represented by keep) meet the specified threshold. | BOOLEAN | `[keep >=/>/=/!=/= threshold` if `threshold` is used alone, type of `threshold` is `INT64` `ignoreNull`:Optional, default value is `true`;If the value is `true`, null values are ignored, it means that if there is a null value in the middle, the value is ignored without interrupting the continuity. If the value is `true`, null values are not ignored, it means that if there are null values in the middle, continuity will be broken | INT64 | | TIME_DURATION | Find the difference between the timestamp of the largest non-null value and the timestamp of the smallest non-null value in a column | All data Types | / | INT64 | | MODE | Find the mode. Note: 1.Having too many different values in the input series risks a memory exception; 2.If all the elements have the same number of occurrences, that is no Mode, return the value with earliest time; 3.If there are many Modes, return the Mode with earliest time. | All data Types | / | Consistent with the input data type | - + ### Arithmetic Functions | Function Name | Allowed Input Series Data Types | Output Series Data Type | Required Attributes | Corresponding Implementation in the Java Standard Library | @@ -141,18 +141,18 @@ For details and examples, see the document [Aggregate Functions](../Operators-Fu | LN | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log(double) | | LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#log10(double) | | SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | / | Math#sqrt(double) | - + ### Comparison Functions | Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | | ------------- | ------------------------------- | ----------------------------------------- | ----------------------- | --------------------------------------------- | | ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`: a double type variate | BOOLEAN | Return `ts_value >= threshold`. | | IN_RANGR | INT32 / INT64 / FLOAT / DOUBLE | `lower`: DOUBLE type `upper`: DOUBLE type | BOOLEAN | Return `ts_value >= lower && value <= upper`. | - + ### String Processing Functions | Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | @@ -170,17 +170,17 @@ For details and examples, see the document [Comparison Operators and Functions]( | LOWER | TEXT | / | TEXT | Get the string of input series with all characters changed to lowercase. | | TRIM | TEXT | / | TEXT | Get the string whose value is same to input series, with all leading and trailing space removed. | | STRCMP | TEXT | / | TEXT | Get the compare result of two input series. Returns `0` if series value are the same, a `negative integer` if value of series1 is smaller than series2,
a `positive integer` if value of series1 is more than series2. | - + ### Data Type Conversion Function | Function Name | Required Attributes | Output Series Data Type | Description | | ------------- | ------------------------------------------------------------ | ----------------------- | ------------------------------------------------------------ | | CAST | `type`: Output data type, INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | determined by `type` | Convert the data to the type specified by the `type` parameter. | - + ### Constant Timeseries Generating Functions | Function Name | Required Attributes | Output Series Data Type | Description | @@ -188,18 +188,18 @@ For details and examples, see the document [Data Type Conversion Function](../Op | CONST | `value`: the value of the output data point `type`: the type of the output data point, it can only be INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | Determined by the required attribute `type` | Output the user-specified constant timeseries according to the attributes `value` and `type`. | | PI | None | DOUBLE | Data point value: a `double` value of `π`, the ratio of the circumference of a circle to its diameter, which is equals to `Math.PI` in the *Java Standard Library*. | | E | None | DOUBLE | Data point value: a `double` value of `e`, the base of the natural logarithms, which is equals to `Math.E` in the *Java Standard Library*. | - + ### Selector Functions | Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | | ------------- | ------------------------------------- | ------------------------------------------------------------ | ----------------------------- | ------------------------------------------------------------ | | TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the largest values in a time series. | | BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: the maximum number of selected data points, must be greater than 0 and less than or equal to 1000 | Same type as the input series | Returns `k` data points with the smallest values in a time series. | - + ### Continuous Interval Functions | Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | @@ -208,9 +208,9 @@ For details and examples, see the document [Selector Functions](../Operators-Fun | NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `0L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and duration times in which the value is always not 0, and the duration time `t` satisfy `t >= min && t <= max`. The unit of `t` is ms | | ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always 0(false). Data points number `n` satisfy `n >= min && n <= max` | | NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:Optional with default value `1L` `max`:Optional with default value `Long.MAX_VALUE` | Long | Return intervals' start times and the number of data points in the interval in which the value is always not 0(false). 
Data points number `n` satisfy `n >= min && n <= max` | - + ### Variation Trend Calculation Functions | Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | @@ -221,9 +221,9 @@ For details and examples, see the document [Continuous Interval Functions](../Op | DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the rate of change of a data point compared to the previous data point, the result is equals to DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | | NON_NEGATIVE_DERIVATIVE | INT32 / INT64 / FLOAT / DOUBLE | / | DOUBLE | Calculates the absolute value of the rate of change of a data point compared to the previous data point, the result is equals to NON_NEGATIVE_DIFFERENCE / TIME_DIFFERENCE. There is no corresponding output for the first data point. | | DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:optional,default is true. If is true, the previous data point is ignored when it is null and continues to find the first non-null value forwardly. If the value is false, previous data point is not ignored when it is null, the result is also null because null is used for subtraction | DOUBLE | Calculates the difference between the value of a data point and the value of the previous data point. There is no corresponding output for the first data point, so output is null | - + ### Sample Functions | Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | @@ -233,17 +233,17 @@ For details and examples, see the document [Variation Trend Calculation Function | EQUAL_SIZE_BUCKET_M4_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | `proportion` The value range is `(0, 1]`, the default is `0.1` | INT32 / INT64 / FLOAT / DOUBLE | Returns equal bucket M4 samples that match the sampling ratio | | EQUAL_SIZE_BUCKET_OUTLIER_SAMPLE | INT32 / INT64 / FLOAT / DOUBLE | The value range of `proportion` is `(0, 1]`, the default is `0.1`
The value of `type` is `avg` or `stendis` or `cos` or `prenextdis`, the default is `avg`
The value of `number` should be greater than 0, the default is `3` | INT32 / INT64 / FLOAT / DOUBLE | Returns outlier samples in equal buckets that match the sampling ratio and the number of samples in the bucket | | M4 | INT32 / INT64 / FLOAT / DOUBLE | Different attributes used by the size window and the time window. The size window uses attributes `windowSize` and `slidingStep`. The time window uses attributes `timeInterval`, `slidingStep`, `displayWindowBegin`, and `displayWindowEnd`. More details see below. | INT32 / INT64 / FLOAT / DOUBLE | Returns the `first, last, bottom, top` points in each sliding window. M4 sorts and deduplicates the aggregated points within the window before outputting them. | - + ### Change Points Function | Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Description | | ------------- | ------------------------------- | ------------------- | ----------------------------- | ----------------------------------------------------------- | | CHANGE_POINTS | INT32 / INT64 / FLOAT / DOUBLE | / | Same type as the input series | Remove consecutive identical values from an input sequence. | - + ## DATA QUALITY FUNCTION LIBRARY ### About @@ -275,17 +275,17 @@ The functions in this function library are not built-in functions, and must be l | Function Name | Allowed Input Series Data Types | Required Attributes | Output Series Data Type | Series Data Type Description | | ------------- | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------- | ------------------------------------------------------------ | | JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr` is a lambda expression that supports standard one or multi arguments in the form `x -> {...}` or `(x, y, z) -> {...}`, e.g. `x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | Returns the input time series transformed by a lambda expression | - + ## CONDITIONAL EXPRESSION | Expression Name | Description | | --------------- | -------------------- | | `CASE` | similar to "if else" | - + ## SELECT EXPRESSION The `SELECT` clause specifies the output of the query, consisting of several `selectExpr`. Each `selectExpr` defines one or more columns in the query result. diff --git a/src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md b/src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md index 6f19b56d..d726434d 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md +++ b/src/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md @@ -31,7 +31,7 @@ Writing a repeat timestamp covers the original timestamp data, which can be rega ### Use of INSERT Statements -The [INSERT SQL statement](../Reference/SQL-Reference.md) statement is used to insert data into one or more specified timeseries created. For each point of data inserted, it consists of a [timestamp](../Basic-Concept/Data-Model-and-Terminology.md) and a sensor acquisition value (see [Data Type](../Basic-Concept/Data-Type.md)). +The [INSERT SQL statement](../SQL-Manual/SQL-Manual.md) statement is used to insert data into one or more specified timeseries created. For each point of data inserted, it consists of a [timestamp](../Basic-Concept/Data-Model-and-Terminology.md) and a sensor acquisition value (see [Data Type](../Basic-Concept/Data-Type.md)). 
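As a minimal sketch of the statement shape (the timestamp and values are made up, and the target series are the two example series introduced in the next paragraph):

```sql
insert into root.ln.wf02.wt02(timestamp, status, hardware) values (1, true, 'v1');
```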
In the scenario of this section, take two timeseries `root.ln.wf02.wt02.status` and `root.ln.wf02.wt02.hardware` as an example, and their data types are BOOLEAN and TEXT, respectively. diff --git a/src/zh/UserGuide/V1.2.x/Basic-Concept/Encoding-and-Compression.md b/src/zh/UserGuide/V1.2.x/Basic-Concept/Encoding-and-Compression.md index 2fd7db57..5eff25af 100644 --- a/src/zh/UserGuide/V1.2.x/Basic-Concept/Encoding-and-Compression.md +++ b/src/zh/UserGuide/V1.2.x/Basic-Concept/Encoding-and-Compression.md @@ -39,7 +39,7 @@ PLAIN 编码,默认的编码方式,即不编码,支持多种数据类型 游程编码,比较适合存储某些数值连续出现的序列,不适合编码大部分情况下前后值不一样的序列数据。 -游程编码也可用于对浮点数进行编码,但在创建时间序列的时候需指定保留小数位数(MAX_POINT_NUMBER,具体指定方式参见本文 [SQL 参考文档](../Reference/SQL-Reference.md))。比较适合存储某些浮点数值连续出现的序列数据,不适合存储对小数点后精度要求较高以及前后波动较大的序列数据。 +游程编码也可用于对浮点数进行编码,但在创建时间序列的时候需指定保留小数位数(MAX_POINT_NUMBER,具体指定方式参见本文 [SQL 参考文档](../SQL-Manual/SQL-Manual.md))。比较适合存储某些浮点数值连续出现的序列数据,不适合存储对小数点后精度要求较高以及前后波动较大的序列数据。 > 游程编码(RLE)和二阶差分编码(TS_2DIFF)对 float 和 double 的编码是有精度限制的,默认保留 2 位小数。推荐使用 GORILLA。 @@ -111,7 +111,7 @@ IoTDB 允许在创建一个时间序列的时候指定该列的压缩方式。 * ZSTD 压缩 * LZMA2 压缩 -压缩方式的指定语法详见本文 [SQL 参考文档](../Reference/SQL-Reference.md)。 +压缩方式的指定语法详见本文 [SQL 参考文档](../SQL-Manual/SQL-Manual.md)。 ### 压缩比统计信息 diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Database-Programming.md b/src/zh/UserGuide/V1.2.x/User-Manual/Database-Programming.md index 2fbfe95a..f2b86c0b 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Database-Programming.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Database-Programming.md @@ -1558,7 +1558,19 @@ SHOW FUNCTIONS ### 已知实现的UDF #### 内置UDF - +请参考[内置函数](../User-Manual/Operator-and-Expression.md#内置函数),包含以下函数类型: +聚合函数 +算数函数 +比较函数 +字符串处理函数 +数据类型转换函数 +常序列生成函数 +选择函数 +区间查询函数 +趋势计算函数 +采样函数 +时间序列处理函数 + #### 数据质量函数库 ##### 关于 diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md b/src/zh/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md index f151a76a..8435c32a 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Operate-Metadata.md @@ -791,7 +791,7 @@ create timeseries root.turbine.d1.s1(temprature) with datatype=FLOAT, encoding=R 括号里的`temprature`是`s1`这个传感器的别名。 我们可以在任何用到`s1`的地方,将其用`temprature`代替,这两者是等价的。 -> IoTDB 同时支持在查询语句中 [使用 AS 函数](../Reference/SQL-Reference.md#数据管理语句) 设置别名。二者的区别在于:AS 函数设置的别名用于替代整条时间序列名,且是临时的,不与时间序列绑定;而上文中的别名只作为传感器的别名,与其绑定且可与原传感器名等价使用。 +> IoTDB 同时支持在查询语句中 [使用 AS 函数](../SQL-Manual/SQL-Manual.md) 设置别名。二者的区别在于:AS 函数设置的别名用于替代整条时间序列名,且是临时的,不与时间序列绑定;而上文中的别名只作为传感器的别名,与其绑定且可与原传感器名等价使用。 > 注意:额外的标签和属性信息总的大小不能超过`tag_attribute_total_size`. 
diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md b/src/zh/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md index 694c30dd..360bf29b 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md @@ -33,9 +33,9 @@ | `%` | modulo | | `+` | addition | | `-` | subtraction | - + ### 比较运算符 | Operator | Meaning | @@ -56,9 +56,9 @@ | `IS NOT NULL` | is not null | | `IN` / `CONTAINS` | is a value in the specified list | | `NOT IN` / `NOT CONTAINS` | is not a value in the specified list | - + ### 逻辑运算符 | Operator | Meaning | @@ -66,9 +66,9 @@ | `NOT` / `!` | logical negation (unary operator) | | `AND` / `&` / `&&` | logical AND | | `OR`/ \| / \|\| | logical OR | - + ### 运算符优先级 运算符的优先级从高到低排列如下,同一行的运算符优先级相同。 @@ -104,9 +104,9 @@ OR, |, || | LAST_VALUE | 求时间戳最大的值。 | 所有类型 | 与输入类型一致 | | MAX_TIME | 求最大时间戳。 | 所有类型 | Timestamp | | MIN_TIME | 求最小时间戳。 | 所有类型 | Timestamp | - + ### 数学函数 | 函数名 | 输入序列类型 | 输出序列类型 | 必要属性参数 | Java 标准库中的对应实现 | @@ -132,18 +132,18 @@ OR, |, || | LOG10 | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#log10(double) | | SQRT | INT32 / INT64 / FLOAT / DOUBLE | DOUBLE | | Math#sqrt(double) | - + ### 比较函数 | 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | |----------|--------------------------------|---------------------------------------|------------|--------------------------------------------------| | ON_OFF | INT32 / INT64 / FLOAT / DOUBLE | `threshold`:DOUBLE | BOOLEAN | 返回`ts_value >= threshold`的bool值 | | IN_RANGE | INT32 / INT64 / FLOAT / DOUBLE | `lower`:DOUBLE
`upper`:DOUBLE | BOOLEAN | 返回`ts_value >= lower && ts_value <= upper`的bool值 | | - + ### 字符串函数 | 函数名 | 输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能描述 | @@ -161,17 +161,17 @@ OR, |, || | LOWER | TEXT | 无 | TEXT | 将字符串转化为小写 | | TRIM | TEXT | 无 | TEXT | 移除字符串前后的空格 | | STRCMP | TEXT | 无 | TEXT | 用于比较两个输入序列,如果值相同返回 `0` , 序列1的值小于序列2的值返回一个`负数`,序列1的值大于序列2的值返回一个`正数` | - + ### 数据类型转换函数 | 函数名 | 必要的属性参数 | 输出序列类型 | 功能类型 | | ------ | ------------------------------------------------------------ | ------------------------ | ---------------------------------- | | CAST | `type`:输出的数据点的类型,只能是 INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 由输入属性参数`type`决定 | 将数据转换为`type`参数指定的类型。 | - + ### 常序列生成函数 | 函数名 | 必要的属性参数 | 输出序列类型 | 功能描述 | @@ -179,18 +179,18 @@ OR, |, || | CONST | `value`: 输出的数据点的值
`type`: 输出的数据点的类型,只能是 INT32 / INT64 / FLOAT / DOUBLE / BOOLEAN / TEXT | 由输入属性参数 `type` 决定 | 根据输入属性 `value` 和 `type` 输出用户指定的常序列。 | | PI | 无 | DOUBLE | 常序列的值:`π` 的 `double` 值,圆的周长与其直径的比值,即圆周率,等于 *Java标准库* 中的`Math.PI`。 | | E | 无 | DOUBLE | 常序列的值:`e` 的 `double` 值,自然对数的底,它等于 *Java 标准库* 中的 `Math.E`。 | - + ### 选择函数 | 函数名 | 输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能描述 | | -------- | ------------------------------------- | ------------------------------------------------- | ------------------------ | ------------------------------------------------------------ | | TOP_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最大的`k`个数据点。若多于`k`个数据点的值并列最大,则返回时间戳最小的数据点。 | | BOTTOM_K | INT32 / INT64 / FLOAT / DOUBLE / TEXT | `k`: 最多选择的数据点数,必须大于 0 小于等于 1000 | 与输入序列的实际类型一致 | 返回某时间序列中值最小的`k`个数据点。若多于`k`个数据点的值并列最小,则返回时间戳最小的数据点。 | - + ### 区间查询函数 | 函数名 | 输入序列类型 | 属性参数 | 输出序列类型 | 功能描述 | @@ -199,9 +199,9 @@ OR, |, || | NON_ZERO_DURATION | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值0
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与持续时间,持续时间t(单位ms)满足`t >= min && t <= max` | | | ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` | | | NON_ZERO_COUNT | INT32/ INT64/ FLOAT/ DOUBLE/ BOOLEAN | `min`:可选,默认值1
`max`:可选,默认值`Long.MAX_VALUE` | Long | 返回时间序列连续不为0(false)的开始时间与其后数据点的个数,数据点个数n满足`n >= min && n <= max` | | - + ### 趋势计算函数 | 函数名 | 输入序列类型 | 输出序列类型 | 功能描述 | @@ -216,9 +216,9 @@ OR, |, || | 函数名 | 输入序列类型 | 参数 | 输出序列类型 | 功能描述 | |------|--------------------------------|------------------------------------------------------------------------------------------------------------------------|--------|------------------------------------------------| | DIFF | INT32 / INT64 / FLOAT / DOUBLE | `ignoreNull`:可选,默认为true;为true时,前一个数据点值为null时,忽略该数据点继续向前找到第一个出现的不为null的值;为false时,如果前一个数据点为null,则不忽略,使用null进行相减,结果也为null | DOUBLE | 统计序列中某数据点的值与前一数据点的值的差。第一个数据点没有对应的结果输出,输出值为null | - + ### 采样函数 | 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | @@ -234,9 +234,9 @@ OR, |, || | 函数名 | 输入序列类型 | 参数 | 输出序列类型 | 功能描述 | | ------------- | ------------------------------ | ---- | ------------------------ | -------------------------- | | CHANGE_POINTS | INT32 / INT64 / FLOAT / DOUBLE | / | 与输入序列的实际类型一致 | 去除输入序列中的连续相同值 | - + ## 数据质量函数库 ### 关于 @@ -254,31 +254,31 @@ OR, |, || ### 已经实现的函数 -1. [Data-Quality](../Operators-Functions/Data-Quality.md) 数据质量 -2. [Data-Profiling](../Operators-Functions/Data-Profiling.md) 数据画像 -3. [Anomaly-Detection](../Operators-Functions/Anomaly-Detection.md) 异常检测 -4. [Frequency-Domain](../Operators-Functions/Frequency-Domain.md) 频域分析 -5. [Data-Matching](../Operators-Functions/Data-Matching.md) 数据匹配 -6. [Data-Repairing](../Operators-Functions/Data-Repairing.md) 数据修复 -7. [Series-Discovery](../Operators-Functions/Series-Discovery.md) 序列发现 -8. [Machine-Learning](../Operators-Functions/Machine-Learning.md) 机器学习 +1. [Data-Quality](../Reference/UDF-Libraries.md#数据质量) 数据质量 +2. [Data-Profiling](../Reference/UDF-Libraries.md#数据画像) 数据画像 +3. [Anomaly-Detection](../Reference/UDF-Libraries.md#异常检测) 异常检测 +4. [Frequency-Domain](../Reference/UDF-Libraries.md#频域分析) 频域分析 +5. [Data-Matching](../Reference/UDF-Libraries.md#数据匹配) 数据匹配 +6. [Data-Repairing](../Reference/UDF-Libraries.md#数据修复) 数据修复 +7. [Series-Discovery](../Reference/UDF-Libraries.md#序列发现) 序列发现 +8. 
[Machine-Learning](../Reference/UDF-Libraries.md#机器学习) 机器学习 ## Lambda 表达式 | 函数名 | 可接收的输入序列类型 | 必要的属性参数 | 输出序列类型 | 功能类型 | | ------ | ----------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------- | ---------------------------------------------- | | JEXL | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | `expr`是一个支持标准的一元或多元参数的lambda表达式,符合`x -> {...}`或`(x, y, z) -> {...}`的格式,例如`x -> {x * 2}`, `(x, y, z) -> {x + y * z}` | INT32 / INT64 / FLOAT / DOUBLE / TEXT / BOOLEAN | 返回将输入的时间序列通过lambda表达式变换的序列 | - + ## 条件表达式 | 表达式名称 | 含义 | |---------------------------|-----------| | `CASE` | 类似if else | - + ## SELECT 表达式 `SELECT` 子句指定查询的输出,由若干个 `selectExpr` 组成。 每个 `selectExpr` 定义了查询结果中的一列或多列。 @@ -315,7 +315,7 @@ select s1 as temperature, s2 as speed from root.ln.wf01.wt01; #### 运算符 -IoTDB 中支持的运算符列表见文档 [运算符和函数](../Operators-Functions/Overview.md)。 +IoTDB 中支持的运算符列表见文档 [运算符和函数](../User-Manual/Operator-and-Expression.md)。 #### 函数 @@ -332,9 +332,9 @@ select s1, count(s1) from root.sg.d1; select sin(s1), count(s1) from root.sg.d1; select s1, count(s1) from root.sg.d1 group by ([10,100),10ms); ``` - + ##### 时间序列生成函数 时间序列生成函数接受若干原始时间序列作为输入,产生一列时间序列输出。与聚合函数不同的是,时间序列生成函数的结果集带有时间戳列。 @@ -343,11 +343,11 @@ IoTDB 支持的聚合函数见文档 [聚合函数](../Operators-Functions/Aggre ###### 内置时间序列生成函数 -IoTDB 中支持的内置函数列表见文档 [运算符和函数](../Operators-Functions/Overview.md)。 +IoTDB 中支持的内置函数列表见文档 [运算符和函数](../User-Manual/Operator-and-Expression.md)。 ###### 自定义时间序列生成函数 -IoTDB 支持通过用户自定义函数(点击查看: [用户自定义函数](../Operators-Functions/User-Defined-Function.md) )能力进行函数功能扩展。 +IoTDB 支持通过用户自定义函数(点击查看: [用户自定义函数](../User-Manual/Database-Programming.md#用户自定义函数) )能力进行函数功能扩展。 #### 嵌套表达式举例 diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Query-Data.md b/src/zh/UserGuide/V1.2.x/User-Manual/Query-Data.md index 54706a53..7ee7ff69 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Query-Data.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Query-Data.md @@ -368,7 +368,7 @@ select s1 as temperature, s2 as speed from root.ln.wf01.wt01; ### 运算符 -IoTDB 中支持的运算符列表见文档 [运算符和函数](../Operators-Functions/Overview.md)。 +IoTDB 中支持的运算符列表见文档 [运算符和函数](../User-Manual/Operator-and-Expression.md)。 ### 函数 @@ -386,7 +386,7 @@ select sin(s1), count(s1) from root.sg.d1; select s1, count(s1) from root.sg.d1 group by ([10,100),10ms); ``` -IoTDB 支持的聚合函数见文档 [聚合函数](../Operators-Functions/Aggregation.md)。 +IoTDB 支持的聚合函数见文档 [聚合函数](../User-Manual/Operator-and-Expression.md#内置函数)。 #### 时间序列生成函数 @@ -396,7 +396,7 @@ IoTDB 支持的聚合函数见文档 [聚合函数](../Operators-Functions/Aggre ##### 内置时间序列生成函数 -IoTDB 中支持的内置函数列表见文档 [运算符和函数](../Operators-Functions/Overview.md)。 +IoTDB 中支持的内置函数列表见文档 [运算符和函数](../User-Manual/Operator-and-Expression.md)。 ##### 自定义时间序列生成函数 @@ -2585,7 +2585,7 @@ It costs 0.012s ### 设备对齐模式下的排序 在设备对齐模式下,默认按照设备名的字典序升序排列,每个设备内部按照时间戳大小升序排列,可以通过 `ORDER BY` 子句调整设备列和时间列的排序优先级。 -详细说明及示例见文档 [结果集排序](./Order-By.md)。 +详细说明及示例见文档 [结果集排序](../User-Manual/Operator-and-Expression.md)。 ## 查询写回(INTO 子句) @@ -2904,7 +2904,7 @@ It costs 0.375s * 所有 `SELECT` 子句中源序列的 `READ_TIMESERIES` 权限。 * 所有 `INTO` 子句中目标序列 `INSERT_TIMESERIES` 权限。 -更多用户权限相关的内容,请参考[权限管理语句](../Administration-Management/Administration.md)。 +更多用户权限相关的内容,请参考[权限管理语句](../User-Manual/Security-Management_timecho.md#权限管理)。 ### 相关配置参数 From c8fb2167ed614f1f6438bb46b422a505aaa40e8b Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Tue, 19 Sep 2023 18:24:50 +0800 Subject: [PATCH 04/27] add udf of English and fix the rest url --- src/.vuepress/sidebar/V1.2.x/en.ts | 1 + 
src/.vuepress/sidebar_timecho/V1.2.x/en.ts | 1 + .../V1.2.x/Reference/UDF-Libraries.md | 5131 +++++++++++++++++ src/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md | 48 +- .../User-Manual/Database-Programming.md | 2 +- .../User-Manual/Operator-and-Expression.md | 16 +- .../UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md | 48 +- 7 files changed, 5190 insertions(+), 57 deletions(-) create mode 100644 src/UserGuide/V1.2.x/Reference/UDF-Libraries.md diff --git a/src/.vuepress/sidebar/V1.2.x/en.ts b/src/.vuepress/sidebar/V1.2.x/en.ts index cb7ad306..12044e5d 100644 --- a/src/.vuepress/sidebar/V1.2.x/en.ts +++ b/src/.vuepress/sidebar/V1.2.x/en.ts @@ -169,6 +169,7 @@ export const enSidebar = { prefix: 'Reference/', // children: 'structure', children: [ + { text: 'UDF Libraries', link: 'UDF-Libraries' }, { text: 'Common Config Manual', link: 'Common-Config-Manual' }, { text: 'Status Codes', link: 'Status-Codes' }, { text: 'Keywords', link: 'Keywords' }, diff --git a/src/.vuepress/sidebar_timecho/V1.2.x/en.ts b/src/.vuepress/sidebar_timecho/V1.2.x/en.ts index 29cc3459..ab734169 100644 --- a/src/.vuepress/sidebar_timecho/V1.2.x/en.ts +++ b/src/.vuepress/sidebar_timecho/V1.2.x/en.ts @@ -171,6 +171,7 @@ export const enSidebar = { prefix: 'Reference/', // children: 'structure', children: [ + { text: 'UDF Libraries', link: 'UDF-Libraries' }, { text: 'Common Config Manual', link: 'Common-Config-Manual' }, { text: 'ConfigNode Config Manual', link: 'ConfigNode-Config-Manual' }, { text: 'DataNode Config Manual', link: 'DataNode-Config-Manual' }, diff --git a/src/UserGuide/V1.2.x/Reference/UDF-Libraries.md b/src/UserGuide/V1.2.x/Reference/UDF-Libraries.md new file mode 100644 index 00000000..9eadabad --- /dev/null +++ b/src/UserGuide/V1.2.x/Reference/UDF-Libraries.md @@ -0,0 +1,5131 @@ + + +# UDF Libraries + +## Data Quality + +### Completeness + +#### Usage + +This function is used to calculate the completeness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the completeness of each window will be output. + +**Name:** COMPLETENESS + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. ++ `downtime`: Whether the downtime exception is considered in the calculation of completeness. It is 'true' or 'false' (default). When considering the downtime exception, long-term missing data will be considered as downtime exception without any influence on completeness. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------------+ +| Time|completeness(root.test.d1.s1)| ++-----------------------------+-----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| ++-----------------------------+-----------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select completeness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------+ +| Time|completeness(root.test.d1.s1, "window"="15")| ++-----------------------------+--------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.875| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+--------------------------------------------+ +``` + +### Consistency + +#### Usage + +This function is used to calculate the consistency of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the consistency of each window will be output. 
+ +**Name:** CONSISTENCY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|consistency(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+----------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select consistency(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|consistency(root.test.d1.s1, "window"="15")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+-------------------------------------------+ +``` + +### Timeliness + +#### Usage + +This function is used to calculate the timeliness of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the timeliness of each window will be output. + +**Name:** TIMELINESS + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+---------------------------+ +| Time|timeliness(root.test.d1.s1)| ++-----------------------------+---------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| ++-----------------------------+---------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| +|2020-01-01T00:00:32.000+08:00| 130.0| +|2020-01-01T00:00:34.000+08:00| 132.0| +|2020-01-01T00:00:36.000+08:00| 134.0| +|2020-01-01T00:00:38.000+08:00| 136.0| +|2020-01-01T00:00:40.000+08:00| 138.0| +|2020-01-01T00:00:42.000+08:00| 140.0| +|2020-01-01T00:00:44.000+08:00| 142.0| +|2020-01-01T00:00:46.000+08:00| 144.0| +|2020-01-01T00:00:48.000+08:00| 146.0| +|2020-01-01T00:00:50.000+08:00| 148.0| +|2020-01-01T00:00:52.000+08:00| 150.0| +|2020-01-01T00:00:54.000+08:00| 152.0| +|2020-01-01T00:00:56.000+08:00| 154.0| +|2020-01-01T00:00:58.000+08:00| 156.0| +|2020-01-01T00:01:00.000+08:00| 158.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timeliness(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|timeliness(root.test.d1.s1, "window"="15")| ++-----------------------------+------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.9333333333333333| +|2020-01-01T00:00:32.000+08:00| 1.0| ++-----------------------------+------------------------------------------+ +``` + +### Validity + +#### Usage + +This function is used to calculate the Validity of time series. The input series are divided into several continuous and non overlapping windows. The timestamp of the first data point and the Validity of each window will be output. 
+ +**Name:** VALIDITY + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `window`: The size of each window. It is a positive integer or a positive number with an unit. The former is the number of data points in each window. The number of data points in the last window may be less than it. The latter is the time of the window. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, all input data belongs to the same window. + +**Output Series:** Output a single series. The type is DOUBLE. The range of each value is [0,1]. + +**Note:** Only when the number of data points in the window exceeds 10, the calculation will be performed. Otherwise, the window will be ignored and nothing will be output. + +#### Examples + +##### Default Parameters + +With default parameters, this function will regard all input data as the same window. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select Validity(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|validity(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|2020-01-01T00:00:02.000+08:00| 0.8833333333333333| ++-----------------------------+-------------------------+ +``` + +##### Specific Window Size + +When the window size is given, this function will divide the input data as multiple windows. 
+

Input series:

```
+-----------------------------+---------------+
|                         Time|root.test.d1.s1|
+-----------------------------+---------------+
|2020-01-01T00:00:02.000+08:00|          100.0|
|2020-01-01T00:00:03.000+08:00|          101.0|
|2020-01-01T00:00:04.000+08:00|          102.0|
|2020-01-01T00:00:06.000+08:00|          104.0|
|2020-01-01T00:00:08.000+08:00|          126.0|
|2020-01-01T00:00:10.000+08:00|          108.0|
|2020-01-01T00:00:14.000+08:00|          112.0|
|2020-01-01T00:00:15.000+08:00|          113.0|
|2020-01-01T00:00:16.000+08:00|          114.0|
|2020-01-01T00:00:18.000+08:00|          116.0|
|2020-01-01T00:00:20.000+08:00|          118.0|
|2020-01-01T00:00:22.000+08:00|          120.0|
|2020-01-01T00:00:26.000+08:00|          124.0|
|2020-01-01T00:00:28.000+08:00|          126.0|
|2020-01-01T00:00:30.000+08:00|            NaN|
|2020-01-01T00:00:32.000+08:00|          130.0|
|2020-01-01T00:00:34.000+08:00|          132.0|
|2020-01-01T00:00:36.000+08:00|          134.0|
|2020-01-01T00:00:38.000+08:00|          136.0|
|2020-01-01T00:00:40.000+08:00|          138.0|
|2020-01-01T00:00:42.000+08:00|          140.0|
|2020-01-01T00:00:44.000+08:00|          142.0|
|2020-01-01T00:00:46.000+08:00|          144.0|
|2020-01-01T00:00:48.000+08:00|          146.0|
|2020-01-01T00:00:50.000+08:00|          148.0|
|2020-01-01T00:00:52.000+08:00|          150.0|
|2020-01-01T00:00:54.000+08:00|          152.0|
|2020-01-01T00:00:56.000+08:00|          154.0|
|2020-01-01T00:00:58.000+08:00|          156.0|
|2020-01-01T00:01:00.000+08:00|          158.0|
+-----------------------------+---------------+
```

SQL for query:

```sql
select Validity(s1,"window"="15") from root.test.d1 where time <= 2020-01-01 00:01:00
```

Output series:

```
+-----------------------------+----------------------------------------+
|                         Time|validity(root.test.d1.s1, "window"="15")|
+-----------------------------+----------------------------------------+
|2020-01-01T00:00:02.000+08:00|                      0.8833333333333333|
|2020-01-01T00:00:32.000+08:00|                                     1.0|
+-----------------------------+----------------------------------------+
```

### Accuracy

#### Usage

This function is used to calculate the accuracy of time series based on master data.

**Name:** Accuracy

**Input Series:** Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.

**Parameters:**

+ `omega`: The window size. It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences.
+ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows.
+ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data.

**Output Series:** Output a single value. The type is DOUBLE. The range is [0,1].
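All three parameters are optional and are omitted in the example below. If they need to be set explicitly, they would presumably be passed as quoted attributes after the series arguments, as the other functions in this document do; the parameter values in the following sketch are illustrative assumptions only, not taken from the example:

```sql
select Accuracy(t1,t2,t3,m1,m2,m3,"omega"="1000","eta"="1.0","k"="3") from root.test
```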
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +SQL for query: + +```sql +select Accuracy(t1,t2,t3,m1,m2,m3) from root.test +``` + +Output series: + + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|Accuracy(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+---------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 0.875| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + + + + +## Data Profiling + +### ACF + +#### Usage + +This function is used to calculate the auto-correlation factor of the input time series, +which equals to cross correlation between the same series. +For more information, please refer to [XCorr](./Data-Matching.md#XCorr) function. + +**Name:** ACF + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. +There are $2N-1$ data points in the series, and the values are interpreted in details in [XCorr](./Data-Matching.md#XCorr) function. + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
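The detailed definition is deferred to the XCorr function above. As a sketch that is consistent with the example below (with `null` and `NaN` treated as 0, as noted): for an input of length $N$, the output contains $2N-1$ points, and the value at lag $\tau$ is

$$\hat{R}(\tau)=\frac{1}{N}\sum_{t=1}^{N-|\tau|}x_t\,x_{t+|\tau|},\qquad \tau=-(N-1),\dots,N-1.$$

For the five-point series below this gives $\hat{R}(0)=(1+9+25)/5=7.0$ and $\hat{R}(\pm 2)=(1\cdot 3+3\cdot 5)/5=3.6$, matching the output.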
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| null| +|2020-01-01T00:00:03.000+08:00| 3| +|2020-01-01T00:00:04.000+08:00| NaN| +|2020-01-01T00:00:05.000+08:00| 5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select acf(s1) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|acf(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 3.6| +|1970-01-01T08:00:00.004+08:00| 0.0| +|1970-01-01T08:00:00.005+08:00| 7.0| +|1970-01-01T08:00:00.006+08:00| 0.0| +|1970-01-01T08:00:00.007+08:00| 3.6| +|1970-01-01T08:00:00.008+08:00| 0.0| +|1970-01-01T08:00:00.009+08:00| 1.0| ++-----------------------------+--------------------+ +``` + +### Distinct + +#### Usage + +This function returns all unique values in time series. + +**Name:** DISTINCT + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** + ++ The timestamp of the output series is meaningless. The output order is arbitrary. ++ Missing points and null points in the input series will be ignored, but `NaN` will not. ++ Case Sensitive. + + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2020-01-01T08:00:00.001+08:00| Hello| +|2020-01-01T08:00:00.002+08:00| hello| +|2020-01-01T08:00:00.003+08:00| Hello| +|2020-01-01T08:00:00.004+08:00| World| +|2020-01-01T08:00:00.005+08:00| World| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select distinct(s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|distinct(root.test.d2.s2)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| World| ++-----------------------------+-------------------------+ +``` + +### Histogram + +#### Usage + +This function is used to calculate the distribution histogram of a single column of numerical data. + +**Name:** HISTOGRAM + +**Input Series:** Only supports a single input sequence, the type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameters:** + ++ `min`: The lower limit of the requested data range, the default value is -Double.MAX_VALUE. ++ `max`: The upper limit of the requested data range, the default value is Double.MAX_VALUE, and the value of start must be less than or equal to end. ++ `count`: The number of buckets of the histogram, the default value is 1. It must be a positive integer. + +**Output Series:** The value of the bucket of the histogram, where the lower bound represented by the i-th bucket (index starts from 1) is $min+ (i-1)\cdot\frac{max-min}{count}$ and the upper bound is $min + i \cdot \frac{max-min}{count}$. + +**Note:** + ++ If the value is lower than `min`, it will be put into the 1st bucket. If the value is larger than `max`, it will be put into the last bucket. ++ Missing points, null points and `NaN` in the input series will be ignored. 
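To make the bucket boundaries concrete, here is the arithmetic for the parameters used in the example below (`min`=1, `max`=20, `count`=10). The bucket width is

$$\frac{max-min}{count}=\frac{20-1}{10}=1.9,$$

so bucket 1 covers $[1, 2.9)$, bucket 2 covers $[2.9, 4.8)$, ..., and bucket 10 covers $[18.1, 20]$ (the maximum itself falls into the last bucket). With the integer values 1 to 20, each bucket therefore receives exactly two points, which matches the output below.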
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select histogram(s1,"min"="1","max"="20","count"="10") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------+ +| Time|histogram(root.test.d1.s1, "min"="1", "max"="20", "count"="10")| ++-----------------------------+---------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 2| +|1970-01-01T08:00:00.001+08:00| 2| +|1970-01-01T08:00:00.002+08:00| 2| +|1970-01-01T08:00:00.003+08:00| 2| +|1970-01-01T08:00:00.004+08:00| 2| +|1970-01-01T08:00:00.005+08:00| 2| +|1970-01-01T08:00:00.006+08:00| 2| +|1970-01-01T08:00:00.007+08:00| 2| +|1970-01-01T08:00:00.008+08:00| 2| +|1970-01-01T08:00:00.009+08:00| 2| ++-----------------------------+---------------------------------------------------------------+ +``` + +### Integral + +#### Usage + +This function is used to calculate the integration of time series, +which equals to the area under the curve with time as X-axis and values as Y-axis. + +**Name:** INTEGRAL + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `unit`: The unit of time used when computing the integral. + The value should be chosen from "1S", "1s", "1m", "1H", "1d"(case-sensitive), + and each represents taking one millisecond / second / minute / hour / day as 1.0 while calculating the area and integral. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the integration. + +**Note:** + ++ The integral value equals to the sum of the areas of right-angled trapezoids consisting of each two adjacent points and the time-axis. + Choosing different `unit` implies different scaling of time axis, thus making it apparent to convert the value among those results with constant coefficient. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + +#### Examples + +##### Default Parameters + +With default parameters, this function will take one second as 1.0. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integral(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 57.5| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 57.5$$ + +##### Specific time unit + +With time unit specified as "1m", this function will take one minute as 1.0. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select integral(s1, "unit"="1m") from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+-------------------------+ +| Time|integral(root.test.d1.s1)| ++-----------------------------+-------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.958| ++-----------------------------+-------------------------+ +``` + +Calculation expression: +$$\frac{1}{2\times 60}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] = 0.958$$ + +### IntegralAvg + +#### Usage + +This function is used to calculate the function average of time series. +The output equals to the area divided by the time interval using the same time `unit`. +For more information of the area under the curve, please refer to `Integral` function. + +**Name:** INTEGRALAVG + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the time-weighted average. + +**Note:** + ++ The time-weighted value equals to the integral value with any `unit` divided by the time interval of input series. + The result is irrelevant to the time unit used in integral, and it's consistent with the timestamp precision of IoTDB by default. + ++ `NaN` values in the input series will be ignored. The curve or trapezoids will skip these points and use the next valid point. + ++ If the input series is empty, the output value will be 0.0, but if there is only one data point, the value will equal to the input value. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1| +|2020-01-01T00:00:02.000+08:00| 2| +|2020-01-01T00:00:03.000+08:00| 5| +|2020-01-01T00:00:04.000+08:00| 6| +|2020-01-01T00:00:05.000+08:00| 7| +|2020-01-01T00:00:08.000+08:00| 8| +|2020-01-01T00:00:09.000+08:00| NaN| +|2020-01-01T00:00:10.000+08:00| 10| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select integralavg(s1) from root.test.d1 where time <= 2020-01-01 00:00:10 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|integralavg(root.test.d1.s1)| ++-----------------------------+----------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.75| ++-----------------------------+----------------------------+ +``` + +Calculation expression: +$$\frac{1}{2}[(1+2) \times 1 + (2+5) \times 1 + (5+6) \times 1 + (6+7) \times 1 + (7+8) \times 3 + (8+10) \times 2] / 10 = 5.75$$ + +### Mad + +#### Usage + +The function is used to compute the exact or approximate median absolute deviation (MAD) of a numeric time series. MAD is the median of the deviation of each element from the elements' median. + +Take a dataset $\{1,3,3,5,5,6,7,8,9\}$ as an instance. Its median is 5 and the deviation of each element from the median is $\{0,0,1,2,2,2,3,4,4\}$, whose median is 2. Therefore, the MAD of the original dataset is 2. + +**Name:** MAD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The relative error of the approximate MAD. It should be within [0,1) and the default value is 0. Taking `error`=0.01 as an instance, suppose the exact MAD is $a$ and the approximate MAD is $b$, we have $0.99a \le b \le 1.01a$. With `error`=0, the output is the exact MAD. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the MAD. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +##### Exact Query + +With the default `error`(`error`=0), the function queries the exact MAD. 
+ +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +SQL for query: + +```sql +select mad(s0) from root.test +``` + +Output series: + +``` ++-----------------------------+------------------+ +| Time| mad(root.test.s0)| ++-----------------------------+------------------+ +|1970-01-01T08:00:00.000+08:00|0.6806197166442871| ++-----------------------------+------------------+ +``` + +##### Approximate Query + +By setting `error` within (0,1), the function queries the approximate MAD. + +SQL for query: + +```sql +select mad(s0, "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mad(root.test.s0, "error"="0.01")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.6806616245859518| ++-----------------------------+---------------------------------+ +``` + +### Median + +#### Usage + +The function is used to compute the exact or approximate median of a numeric time series. Median is the value separating the higher half from the lower half of a data sample. + +**Name:** MEDIAN + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `error`: The rank error of the approximate median. It should be within [0,1) and the default value is 0. For instance, a median with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact median. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the median. 
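Since the default `error` is 0, omitting the parameter requests the exact median. A minimal query of that form, shown here only as a sketch (its result for the sample data below is not reproduced):

```sql
select median(s0) from root.test
```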
+

#### Examples

Input series:

```
+-----------------------------+------------+
|                         Time|root.test.s0|
+-----------------------------+------------+
|2021-03-17T10:32:17.054+08:00|   0.5319929|
|2021-03-17T10:32:18.054+08:00|   0.9304316|
|2021-03-17T10:32:19.054+08:00|  -1.4800133|
|2021-03-17T10:32:20.054+08:00|   0.6114087|
|2021-03-17T10:32:21.054+08:00|   2.5163336|
|2021-03-17T10:32:22.054+08:00|  -1.0845392|
|2021-03-17T10:32:23.054+08:00|   1.0562582|
|2021-03-17T10:32:24.054+08:00|   1.3867859|
|2021-03-17T10:32:25.054+08:00| -0.45429882|
|2021-03-17T10:32:26.054+08:00|   1.0353678|
|2021-03-17T10:32:27.054+08:00|   0.7307929|
|2021-03-17T10:32:28.054+08:00|   2.3167255|
|2021-03-17T10:32:29.054+08:00|    2.342443|
|2021-03-17T10:32:30.054+08:00|   1.5809103|
|2021-03-17T10:32:31.054+08:00|   1.4829416|
|2021-03-17T10:32:32.054+08:00|   1.5800357|
|2021-03-17T10:32:33.054+08:00|   0.7124368|
|2021-03-17T10:32:34.054+08:00| -0.78597564|
|2021-03-17T10:32:35.054+08:00|   1.2058644|
|2021-03-17T10:32:36.054+08:00|   1.4215064|
|2021-03-17T10:32:37.054+08:00|   1.2808295|
|2021-03-17T10:32:38.054+08:00|  -0.6173715|
|2021-03-17T10:32:39.054+08:00|  0.06644377|
|2021-03-17T10:32:40.054+08:00|    2.349338|
|2021-03-17T10:32:41.054+08:00|   1.7335888|
|2021-03-17T10:32:42.054+08:00|   1.5872132|
............
Total line number = 10000
```

SQL for query:

```sql
select median(s0, "error"="0.01") from root.test
```

Output series:

```
+-----------------------------+------------------------------------+
|                         Time|median(root.test.s0, "error"="0.01")|
+-----------------------------+------------------------------------+
|1970-01-01T08:00:00.000+08:00|                   1.021884560585022|
+-----------------------------+------------------------------------+
```

### MinMax

#### Usage

This function is used to standardize the input series with min-max. Minimum value is transformed to 0; maximum value is transformed to 1.

**Name:** MINMAX

**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

**Parameters:**

+ `compute`: When set to "batch", the transformation is performed after all data points are imported; when set to "stream", the minimum and maximum values must be provided. The default method is "batch".
+ `min`: The minimum value used when `compute` is set to "stream".
+ `max`: The maximum value used when `compute` is set to "stream".

**Output Series:** Output a single series. The type is DOUBLE.
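Written as a formula, the transformation described above is

$$y_i=\frac{x_i-\min_j x_j}{\max_j x_j-\min_j x_j}.$$

In the batch example below the minimum is -2 and the maximum is 10, so the value 0 is mapped to $(0-(-2))/12\approx 0.1667$, which matches the output.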
+ +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select minmax(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|minmax(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.200+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.300+08:00| 0.25| +|1970-01-01T08:00:00.400+08:00| 0.08333333333333333| +|1970-01-01T08:00:00.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:00.700+08:00| 0.0| +|1970-01-01T08:00:00.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.900+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.100+08:00| 0.25| +|1970-01-01T08:00:01.200+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.300+08:00| 0.08333333333333333| +|1970-01-01T08:00:01.400+08:00| 0.25| +|1970-01-01T08:00:01.500+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.600+08:00| 0.16666666666666666| +|1970-01-01T08:00:01.700+08:00| 1.0| +|1970-01-01T08:00:01.800+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| 0.16666666666666666| ++-----------------------------+--------------------+ +``` + +### Mode + +#### Usage + +This function is used to calculate the mode of time series, that is, the value that occurs most frequently. + +**Name:** MODE + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is the same as which the first mode value has and value is the mode. + +**Note:** + ++ If there are multiple values with the most occurrences, the arbitrary one will be output. ++ Missing points and null points in the input series will be ignored, but `NaN` will not. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| Hello| +|1970-01-01T08:00:00.002+08:00| hello| +|1970-01-01T08:00:00.003+08:00| Hello| +|1970-01-01T08:00:00.004+08:00| World| +|1970-01-01T08:00:00.005+08:00| World| +|1970-01-01T08:00:01.600+08:00| World| +|1970-01-15T09:37:34.451+08:00| Hello| +|1970-01-15T09:37:34.452+08:00| hello| +|1970-01-15T09:37:34.453+08:00| Hello| +|1970-01-15T09:37:34.454+08:00| World| +|1970-01-15T09:37:34.455+08:00| World| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select mode(s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+---------------------+ +| Time|mode(root.test.d2.s2)| ++-----------------------------+---------------------+ +|1970-01-01T08:00:00.004+08:00| World| ++-----------------------------+---------------------+ +``` + +### MvAvg + +#### Usage + +This function is used to calculate moving average of input series. + +**Name:** MVAVG + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `window`: Length of the moving window. Default value is 10. + +**Output Series:** Output a single series. The type is DOUBLE. + +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select mvavg(s1, "window"="3") from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------+ +| Time|mvavg(root.test.s1, "window"="3")| ++-----------------------------+---------------------------------+ +|1970-01-01T08:00:00.300+08:00| 0.3333333333333333| +|1970-01-01T08:00:00.400+08:00| 0.0| +|1970-01-01T08:00:00.500+08:00| -0.3333333333333333| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -0.6666666666666666| +|1970-01-01T08:00:00.800+08:00| 0.0| +|1970-01-01T08:00:00.900+08:00| 0.6666666666666666| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.200+08:00| 0.0| +|1970-01-01T08:00:01.300+08:00| -0.6666666666666666| +|1970-01-01T08:00:01.400+08:00| 0.0| +|1970-01-01T08:00:01.500+08:00| 0.3333333333333333| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 3.3333333333333335| +|1970-01-01T08:00:01.800+08:00| 4.0| +|1970-01-01T08:00:01.900+08:00| 0.0| +|1970-01-01T08:00:02.000+08:00| -0.6666666666666666| ++-----------------------------+---------------------------------+ +``` + +### PACF + +#### Usage + +This 
function is used to calculate partial autocorrelation of input series by solving Yule-Walker equation. For some cases, the equation may not be solved, and NaN will be output. + +**Name:** PACF + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lag`: Maximum lag of pacf to calculate. The default value is $\min(10\log_{10}n,n-1)$, where $n$ is the number of data points. + +**Output Series:** Output a single series. The type is DOUBLE. + +#### Examples + +##### Assigning maximum lag + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2019-12-27T00:00:00.000+08:00| 5.0| +|2019-12-27T00:05:00.000+08:00| 5.0| +|2019-12-27T00:10:00.000+08:00| 5.0| +|2019-12-27T00:15:00.000+08:00| 5.0| +|2019-12-27T00:20:00.000+08:00| 6.0| +|2019-12-27T00:25:00.000+08:00| 5.0| +|2019-12-27T00:30:00.000+08:00| 6.0| +|2019-12-27T00:35:00.000+08:00| 6.0| +|2019-12-27T00:40:00.000+08:00| 6.0| +|2019-12-27T00:45:00.000+08:00| 6.0| +|2019-12-27T00:50:00.000+08:00| 6.0| +|2019-12-27T00:55:00.000+08:00| 5.982609| +|2019-12-27T01:00:00.000+08:00| 5.9652176| +|2019-12-27T01:05:00.000+08:00| 5.947826| +|2019-12-27T01:10:00.000+08:00| 5.9304347| +|2019-12-27T01:15:00.000+08:00| 5.9130435| +|2019-12-27T01:20:00.000+08:00| 5.8956523| +|2019-12-27T01:25:00.000+08:00| 5.878261| +|2019-12-27T01:30:00.000+08:00| 5.8608694| +|2019-12-27T01:35:00.000+08:00| 5.843478| +............ +Total line number = 18066 +``` + +SQL for query: + +```sql +select pacf(s1, "lag"="5") from root.test +``` + +Output series: + +``` ++-----------------------------+-----------------------------+ +| Time|pacf(root.test.s1, "lag"="5")| ++-----------------------------+-----------------------------+ +|2019-12-27T00:00:00.000+08:00| 1.0| +|2019-12-27T00:05:00.000+08:00| 0.3528915091942786| +|2019-12-27T00:10:00.000+08:00| 0.1761346122516304| +|2019-12-27T00:15:00.000+08:00| 0.1492391973294682| +|2019-12-27T00:20:00.000+08:00| 0.03560059645868398| +|2019-12-27T00:25:00.000+08:00| 0.0366222998995286| ++-----------------------------+-----------------------------+ +``` + +### Percentile + +#### Usage + +The function is used to compute the exact or approximate percentile of a numeric time series. A percentile is value of element in the certain rank of the sorted series. + +**Name:** PERCENTILE + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `rank`: The rank percentage of the percentile. It should be (0,1] and the default value is 0.5. For instance, a percentile with `rank`=0.5 is the median. ++ `error`: The rank error of the approximate percentile. It should be within [0,1) and the default value is 0. For instance, a 0.5-percentile with `error`=0.01 is the value of the element with rank percentage 0.49~0.51. With `error`=0, the output is the exact percentile. + +**Output Series:** Output a single series. The type is the same as input series. If `error`=0, there is only one data point in the series, whose timestamp is the same has which the first percentile value has, and value is the percentile, otherwise the timestamp of the only data point is 0. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
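To make the effect of `error` concrete for the query below: the sample series has $n=10000$ points, so with `rank`=0.2 and `error`=0.01 the returned value is that of an element whose rank lies between

$$(0.2-0.01)\times 10000=1900 \quad\text{and}\quad (0.2+0.01)\times 10000=2100,$$

i.e. somewhere between the 1900th and the 2100th smallest value; with `error`=0 it would be exactly the 2000th.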
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +SQL for query: + +```sql +select percentile(s0, "rank"="0.2", "error"="0.01") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|percentile(root.test.s0, "rank"="0.2", "error"="0.01")| ++-----------------------------+------------------------------------------------------+ +|2021-03-17T10:35:02.054+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +``` + +### Quantile + +#### Usage + +The function is used to compute the approximate quantile of a numeric time series. A quantile is value of element in the certain rank of the sorted series. + +**Name:** QUANTILE + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + ++ `rank`: The rank of the quantile. It should be (0,1] and the default value is 0.5. For instance, a quantile with `rank`=0.5 is the median. ++ `K`: The size of KLL sketch maintained in the query. It should be within [100,+inf) and the default value is 800. For instance, the 0.5-quantile computed by a KLL sketch with K=800 items is a value with rank quantile 0.49~0.51 with a confidence of at least 99%. The result will be more accurate as K increases. + +**Output Series:** Output a single series. The type is the same as input series. The timestamp of the only data point is 0. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|2021-03-17T10:32:17.054+08:00| 0.5319929| +|2021-03-17T10:32:18.054+08:00| 0.9304316| +|2021-03-17T10:32:19.054+08:00| -1.4800133| +|2021-03-17T10:32:20.054+08:00| 0.6114087| +|2021-03-17T10:32:21.054+08:00| 2.5163336| +|2021-03-17T10:32:22.054+08:00| -1.0845392| +|2021-03-17T10:32:23.054+08:00| 1.0562582| +|2021-03-17T10:32:24.054+08:00| 1.3867859| +|2021-03-17T10:32:25.054+08:00| -0.45429882| +|2021-03-17T10:32:26.054+08:00| 1.0353678| +|2021-03-17T10:32:27.054+08:00| 0.7307929| +|2021-03-17T10:32:28.054+08:00| 2.3167255| +|2021-03-17T10:32:29.054+08:00| 2.342443| +|2021-03-17T10:32:30.054+08:00| 1.5809103| +|2021-03-17T10:32:31.054+08:00| 1.4829416| +|2021-03-17T10:32:32.054+08:00| 1.5800357| +|2021-03-17T10:32:33.054+08:00| 0.7124368| +|2021-03-17T10:32:34.054+08:00| -0.78597564| +|2021-03-17T10:32:35.054+08:00| 1.2058644| +|2021-03-17T10:32:36.054+08:00| 1.4215064| +|2021-03-17T10:32:37.054+08:00| 1.2808295| +|2021-03-17T10:32:38.054+08:00| -0.6173715| +|2021-03-17T10:32:39.054+08:00| 0.06644377| +|2021-03-17T10:32:40.054+08:00| 2.349338| +|2021-03-17T10:32:41.054+08:00| 1.7335888| +|2021-03-17T10:32:42.054+08:00| 1.5872132| +............ +Total line number = 10000 +``` + +SQL for query: + +```sql +select quantile(s0, "rank"="0.2", "K"="800") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|quantile(root.test.s0, "rank"="0.2", "K"="800")| ++-----------------------------+------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.1801469624042511| ++-----------------------------+------------------------------------------------------+ +``` + +### Period + +#### Usage + +The function is used to compute the period of a numeric time series. + +**Name:** PERIOD + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is INT32. There is only one data point in the series, whose timestamp is 0 and value is the period. + +#### Examples + +Input series: + + +``` ++-----------------------------+---------------+ +| Time|root.test.d3.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| +|1970-01-01T08:00:00.002+08:00| 2.0| +|1970-01-01T08:00:00.003+08:00| 3.0| +|1970-01-01T08:00:00.004+08:00| 1.0| +|1970-01-01T08:00:00.005+08:00| 2.0| +|1970-01-01T08:00:00.006+08:00| 3.0| +|1970-01-01T08:00:00.007+08:00| 1.0| +|1970-01-01T08:00:00.008+08:00| 2.0| +|1970-01-01T08:00:00.009+08:00| 3.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select period(s1) from root.test.d3 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|period(root.test.d3.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 3| ++-----------------------------+-----------------------+ +``` + +### QLB + +#### Usage + +This function is used to calculate Ljung-Box statistics $Q_{LB}$ for time series, and convert it to p value. + +**Name:** QLB + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters**: + +`lag`: max lag to calculate. Legal input shall be integer from 1 to n-2, where n is the sample number. Default value is n-2. 
+ +**Output Series:** Output a single series. The type is DOUBLE. The output series is p value, and timestamp means lag. + +**Note:** If you want to calculate Ljung-Box statistics $Q_{LB}$ instead of p value, you may use ACF function. + +#### Examples + +##### Using Default Parameter + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T00:00:00.100+08:00| 1.22| +|1970-01-01T00:00:00.200+08:00| -2.78| +|1970-01-01T00:00:00.300+08:00| 1.53| +|1970-01-01T00:00:00.400+08:00| 0.70| +|1970-01-01T00:00:00.500+08:00| 0.75| +|1970-01-01T00:00:00.600+08:00| -0.72| +|1970-01-01T00:00:00.700+08:00| -0.22| +|1970-01-01T00:00:00.800+08:00| 0.28| +|1970-01-01T00:00:00.900+08:00| 0.57| +|1970-01-01T00:00:01.000+08:00| -0.22| +|1970-01-01T00:00:01.100+08:00| -0.72| +|1970-01-01T00:00:01.200+08:00| 1.34| +|1970-01-01T00:00:01.300+08:00| -0.25| +|1970-01-01T00:00:01.400+08:00| 0.17| +|1970-01-01T00:00:01.500+08:00| 2.51| +|1970-01-01T00:00:01.600+08:00| 1.42| +|1970-01-01T00:00:01.700+08:00| -1.34| +|1970-01-01T00:00:01.800+08:00| -0.01| +|1970-01-01T00:00:01.900+08:00| -0.49| +|1970-01-01T00:00:02.000+08:00| 1.63| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select QLB(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|QLB(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T00:00:00.001+08:00| 0.2168702295315677| +|1970-01-01T00:00:00.002+08:00| 0.3068948509261751| +|1970-01-01T00:00:00.003+08:00| 0.4217859150918444| +|1970-01-01T00:00:00.004+08:00| 0.5114539874276656| +|1970-01-01T00:00:00.005+08:00| 0.6560619525616759| +|1970-01-01T00:00:00.006+08:00| 0.7722398654053280| +|1970-01-01T00:00:00.007+08:00| 0.8532491661465290| +|1970-01-01T00:00:00.008+08:00| 0.9028575017542528| +|1970-01-01T00:00:00.009+08:00| 0.9434989988192729| +|1970-01-01T00:00:00.010+08:00| 0.8950280161464689| +|1970-01-01T00:00:00.011+08:00| 0.7701048398839656| +|1970-01-01T00:00:00.012+08:00| 0.7845536060001281| +|1970-01-01T00:00:00.013+08:00| 0.5943030981705825| +|1970-01-01T00:00:00.014+08:00| 0.4618413512531093| +|1970-01-01T00:00:00.015+08:00| 0.2645948244673964| +|1970-01-01T00:00:00.016+08:00| 0.3167530476666645| +|1970-01-01T00:00:00.017+08:00| 0.2330010780351453| +|1970-01-01T00:00:00.018+08:00| 0.0666611237622325| ++-----------------------------+--------------------+ +``` + +### Resample + +#### Usage + +This function is used to resample the input series according to a given frequency, +including up-sampling and down-sampling. +Currently, the supported up-sampling methods are +NaN (filling with `NaN`), +FFill (filling with previous value), +BFill (filling with next value) and +Linear (filling with linear interpolation). +Down-sampling relies on group aggregation, +which supports Max, Min, First, Last, Mean and Median. + +**Name:** RESAMPLE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + + ++ `every`: The frequency of resampling, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. ++ `interp`: The interpolation method of up-sampling, which is 'NaN', 'FFill', 'BFill' or 'Linear'. By default, NaN is used. 
++ `aggr`: The aggregation method of down-sampling, which is 'Max', 'Min', 'First', 'Last', 'Mean' or 'Median'. By default, Mean is used. ++ `start`: The start time (inclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the first valid data point. ++ `end`: The end time (exclusive) of resampling with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is the timestamp of the last valid data point. + +**Output Series:** Output a single series. The type is DOUBLE. It is strictly equispaced with the frequency `every`. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + +##### Up-sampling + +When the frequency of resampling is higher than the original frequency, up-sampling starts. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2021-03-06T16:00:00.000+08:00| 3.09| +|2021-03-06T16:15:00.000+08:00| 3.53| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:45:00.000+08:00| 3.51| +|2021-03-06T17:00:00.000+08:00| 3.41| ++-----------------------------+---------------+ +``` + + +SQL for query: + +```sql +select resample(s1,'every'='5m','interp'='linear') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="5m", "interp"="linear")| ++-----------------------------+----------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:05:00.000+08:00| 3.2366665999094644| +|2021-03-06T16:10:00.000+08:00| 3.3833332856496177| +|2021-03-06T16:15:00.000+08:00| 3.5299999713897705| +|2021-03-06T16:20:00.000+08:00| 3.5199999809265137| +|2021-03-06T16:25:00.000+08:00| 3.509999990463257| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T16:35:00.000+08:00| 3.503333330154419| +|2021-03-06T16:40:00.000+08:00| 3.506666660308838| +|2021-03-06T16:45:00.000+08:00| 3.509999990463257| +|2021-03-06T16:50:00.000+08:00| 3.4766666889190674| +|2021-03-06T16:55:00.000+08:00| 3.443333387374878| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+----------------------------------------------------------+ +``` + +##### Down-sampling + +When the frequency of resampling is lower than the original frequency, down-sampling starts. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','aggr'='first') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "aggr"="first")| ++-----------------------------+--------------------------------------------------------+ +|2021-03-06T16:00:00.000+08:00| 3.0899999141693115| +|2021-03-06T16:30:00.000+08:00| 3.5| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+--------------------------------------------------------+ +``` + + + +##### Specify the time period + +The time period of resampling can be specified with `start` and `end`. +The period outside the actual time range will be interpolated. 
+ +Input series is the same as above, the SQL for query is shown below: + +```sql +select resample(s1,'every'='30m','start'='2021-03-06 15:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------------------------------------+ +| Time|resample(root.test.d1.s1, "every"="30m", "start"="2021-03-06 15:00:00")| ++-----------------------------+-----------------------------------------------------------------------+ +|2021-03-06T15:00:00.000+08:00| NaN| +|2021-03-06T15:30:00.000+08:00| NaN| +|2021-03-06T16:00:00.000+08:00| 3.309999942779541| +|2021-03-06T16:30:00.000+08:00| 3.5049999952316284| +|2021-03-06T17:00:00.000+08:00| 3.4100000858306885| ++-----------------------------+-----------------------------------------------------------------------+ +``` + +### Sample + +#### Usage + +This function is used to sample the input series, +that is, select a specified number of data points from the input series and output them. +Currently, three sampling methods are supported: +**Reservoir sampling** randomly selects data points. +All of the points have the same probability of being sampled. +**Isometric sampling** selects data points at equal index intervals. +**Triangle sampling** assigns data points to the buckets based on the number of sampling. +Then it calculates the area of the triangle based on these points inside the bucket and selects the point with the largest area of the triangle. +For more detail, please read [paper](http://skemman.is/stream/get/1946/15343/37285/3/SS_MSthesis.pdf) + +**Name:** SAMPLE + +**Input Series:** Only support a single input series. The type is arbitrary. + +**Parameters:** + ++ `method`: The method of sampling, which is 'reservoir', 'isometric' or 'triangle'. By default, reservoir sampling is used. ++ `k`: The number of sampling, which is a positive integer. By default, it's 1. + +**Output Series:** Output a single series. The type is the same as the input. The length of the output series is `k`. Each data point in the output series comes from the input series. + +**Note:** If `k` is greater than the length of input series, all data points in the input series will be output. + +#### Examples + +##### Reservoir Sampling + +When `method` is 'reservoir' or the default, reservoir sampling is used. +Due to the randomness of this method, the output series shown below is only a possible result. 
+ + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:04.000+08:00| 4.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:06.000+08:00| 6.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:09.000+08:00| 9.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select sample(s1,'method'='reservoir','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="reservoir", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 2.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:08.000+08:00| 8.0| +|2020-01-01T00:00:10.000+08:00| 10.0| ++-----------------------------+------------------------------------------------------+ +``` + +##### Isometric Sampling + +When `method` is 'isometric', isometric sampling is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select sample(s1,'method'='isometric','k'='5') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|sample(root.test.d1.s1, "method"="isometric", "k"="5")| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:01.000+08:00| 1.0| +|2020-01-01T00:00:03.000+08:00| 3.0| +|2020-01-01T00:00:05.000+08:00| 5.0| +|2020-01-01T00:00:07.000+08:00| 7.0| +|2020-01-01T00:00:09.000+08:00| 9.0| ++-----------------------------+------------------------------------------------------+ +``` + +### Segment + +#### Usage + +This function is used to segment a time series into subsequences according to linear trend, and returns linear fitted values of first values in each subsequence or every data point. + +**Name:** SEGMENT + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `output` :"all" to output all fitted points; "first" to output first fitted points in each subsequence. + ++ `error`: error allowed at linear regression. It is defined as mean absolute error of a subsequence. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note:** This function treat input series as equal-interval sampled. All data are loaded, so downsample input series first if there are too many data points. 
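+
+As an illustration of the idea, the sketch below greedily grows a segment while a least-squares line still fits it with mean absolute error no larger than `error`, assuming equal-interval samples. It is only a simplified sketch of linear segmentation; the segmentation strategy actually used by the UDF may differ.
+
+```java
+public class SegmentSketch {
+
+    // Grow each segment while the least-squares line over it keeps
+    // the mean absolute error within the given bound.
+    static void segment(double[] y, double error) {
+        int start = 0;
+        while (start < y.length) {
+            int end = start + 1;
+            while (end < y.length && maeOfFit(y, start, end + 1) <= error) {
+                end++;
+            }
+            double[] fit = fitLine(y, start, end); // {intercept, slope} over the index
+            System.out.printf("segment [%d, %d), fitted first value %.3f%n",
+                    start, end, fit[0] + fit[1] * start);
+            start = end;
+        }
+    }
+
+    // Ordinary least squares over indices [from, to).
+    static double[] fitLine(double[] y, int from, int to) {
+        int n = to - from;
+        double sx = 0, sy = 0, sxx = 0, sxy = 0;
+        for (int i = from; i < to; i++) {
+            sx += i; sy += y[i]; sxx += (double) i * i; sxy += i * y[i];
+        }
+        double slope = (n == 1) ? 0 : (n * sxy - sx * sy) / (n * sxx - sx * sx);
+        return new double[]{(sy - slope * sx) / n, slope};
+    }
+
+    static double maeOfFit(double[] y, int from, int to) {
+        double[] f = fitLine(y, from, to);
+        double mae = 0;
+        for (int i = from; i < to; i++) {
+            mae += Math.abs(y[i] - (f[0] + f[1] * i));
+        }
+        return mae / (to - from);
+    }
+
+    public static void main(String[] args) {
+        double[] y = {5.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0};
+        segment(y, 0.1);
+    }
+}
+```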
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:00.300+08:00| 2.0| +|1970-01-01T08:00:00.400+08:00| 3.0| +|1970-01-01T08:00:00.500+08:00| 4.0| +|1970-01-01T08:00:00.600+08:00| 5.0| +|1970-01-01T08:00:00.700+08:00| 6.0| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 8.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:01.100+08:00| 9.1| +|1970-01-01T08:00:01.200+08:00| 9.2| +|1970-01-01T08:00:01.300+08:00| 9.3| +|1970-01-01T08:00:01.400+08:00| 9.4| +|1970-01-01T08:00:01.500+08:00| 9.5| +|1970-01-01T08:00:01.600+08:00| 9.6| +|1970-01-01T08:00:01.700+08:00| 9.7| +|1970-01-01T08:00:01.800+08:00| 9.8| +|1970-01-01T08:00:01.900+08:00| 9.9| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:02.100+08:00| 8.0| +|1970-01-01T08:00:02.200+08:00| 6.0| +|1970-01-01T08:00:02.300+08:00| 4.0| +|1970-01-01T08:00:02.400+08:00| 2.0| +|1970-01-01T08:00:02.500+08:00| 0.0| +|1970-01-01T08:00:02.600+08:00| -2.0| +|1970-01-01T08:00:02.700+08:00| -4.0| +|1970-01-01T08:00:02.800+08:00| -6.0| +|1970-01-01T08:00:02.900+08:00| -8.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.100+08:00| 10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| +|1970-01-01T08:00:03.300+08:00| 10.0| +|1970-01-01T08:00:03.400+08:00| 10.0| +|1970-01-01T08:00:03.500+08:00| 10.0| +|1970-01-01T08:00:03.600+08:00| 10.0| +|1970-01-01T08:00:03.700+08:00| 10.0| +|1970-01-01T08:00:03.800+08:00| 10.0| +|1970-01-01T08:00:03.900+08:00| 10.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select segment(s1, "error"="0.1") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|segment(root.test.s1, "error"="0.1")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 5.0| +|1970-01-01T08:00:00.200+08:00| 1.0| +|1970-01-01T08:00:01.000+08:00| 9.0| +|1970-01-01T08:00:02.000+08:00| 10.0| +|1970-01-01T08:00:03.000+08:00| -10.0| +|1970-01-01T08:00:03.200+08:00| 10.0| ++-----------------------------+------------------------------------+ +``` + +### Skew + +#### Usage + +This function is used to calculate the population skewness. + +**Name:** SKEW + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population skewness. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
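+
+The population skewness follows the usual moment definition, $m_3 / m_2^{3/2}$, where $m_2$ and $m_3$ are the second and third central moments. A minimal Java sketch of that computation (an illustration, not the UDF's implementation) is shown below; on the data of the example below it evaluates to roughly -0.9998.
+
+```java
+public class SkewSketch {
+
+    // Population skewness = m3 / m2^1.5, with m2 and m3 the central moments.
+    static double populationSkewness(double[] x) {
+        int n = x.length;
+        double mean = 0;
+        for (double v : x) mean += v;
+        mean /= n;
+        double m2 = 0, m3 = 0;
+        for (double v : x) {
+            double d = v - mean;
+            m2 += d * d;
+            m3 += d * d * d;
+        }
+        m2 /= n;
+        m3 /= n;
+        return m3 / Math.pow(m2, 1.5);
+    }
+
+    public static void main(String[] args) {
+        // Values from the Skew example below.
+        double[] x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+                      10, 10, 10, 10, 10, 10, 10, 10, 10, 10};
+        System.out.println(populationSkewness(x)); // ≈ -0.9998
+    }
+}
+```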
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 10.0| +|2020-01-01T00:00:11.000+08:00| 10.0| +|2020-01-01T00:00:12.000+08:00| 10.0| +|2020-01-01T00:00:13.000+08:00| 10.0| +|2020-01-01T00:00:14.000+08:00| 10.0| +|2020-01-01T00:00:15.000+08:00| 10.0| +|2020-01-01T00:00:16.000+08:00| 10.0| +|2020-01-01T00:00:17.000+08:00| 10.0| +|2020-01-01T00:00:18.000+08:00| 10.0| +|2020-01-01T00:00:19.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select skew(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time| skew(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| -0.9998427402292644| ++-----------------------------+-----------------------+ +``` + +### Spline + +#### Usage + +This function is used to calculate cubic spline interpolation of input series. + +**Name:** SPLINE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `points`: Number of resampling points. + +**Output Series:** Output a single series. The type is DOUBLE. + +**Note**: Output series retains the first and last timestamps of input series. Interpolation points are selected at equal intervals. The function tries to calculate only when there are no less than 4 points in input series. 
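+
+For reference, the sketch below performs natural cubic spline interpolation on the sample points of the example below and resamples it at equally spaced timestamps. It assumes the Apache Commons Math library (`commons-math3`) is available; that dependency and the exact resampled values are illustrative assumptions, not necessarily what the UDF uses internally.
+
+```java
+import org.apache.commons.math3.analysis.interpolation.SplineInterpolator;
+import org.apache.commons.math3.analysis.polynomials.PolynomialSplineFunction;
+
+public class SplineSketch {
+    public static void main(String[] args) {
+        // Sample points (time in ms, value); at least 4 points are required.
+        double[] t = {0, 300, 500, 700, 900, 1100, 1200, 1300, 1400, 1500};
+        double[] v = {0.0, 1.2, 1.7, 2.0, 2.1, 2.0, 1.8, 1.2, 1.0, 1.6};
+
+        // Fit a natural cubic spline, then evaluate it at "points" equally
+        // spaced timestamps between the first and last input timestamps.
+        PolynomialSplineFunction spline = new SplineInterpolator().interpolate(t, v);
+        int points = 11;
+        double step = (t[t.length - 1] - t[0]) / (points - 1);
+        for (int i = 0; i < points; i++) {
+            double x = t[0] + i * step;
+            System.out.printf("%.0f -> %.4f%n", x, spline.value(x));
+        }
+    }
+}
+```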
+ +#### Examples + +##### Assigning number of interpolation points + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select spline(s1, "points"="151") from root.test +``` + +Output series: + +``` ++-----------------------------+------------------------------------+ +| Time|spline(root.test.s1, "points"="151")| ++-----------------------------+------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.010+08:00| 0.04870000251134237| +|1970-01-01T08:00:00.020+08:00| 0.09680000495910646| +|1970-01-01T08:00:00.030+08:00| 0.14430000734329226| +|1970-01-01T08:00:00.040+08:00| 0.19120000966389972| +|1970-01-01T08:00:00.050+08:00| 0.23750001192092896| +|1970-01-01T08:00:00.060+08:00| 0.2832000141143799| +|1970-01-01T08:00:00.070+08:00| 0.32830001624425253| +|1970-01-01T08:00:00.080+08:00| 0.3728000183105469| +|1970-01-01T08:00:00.090+08:00| 0.416700020313263| +|1970-01-01T08:00:00.100+08:00| 0.4600000222524008| +|1970-01-01T08:00:00.110+08:00| 0.5027000241279602| +|1970-01-01T08:00:00.120+08:00| 0.5448000259399414| +|1970-01-01T08:00:00.130+08:00| 0.5863000276883443| +|1970-01-01T08:00:00.140+08:00| 0.627200029373169| +|1970-01-01T08:00:00.150+08:00| 0.6675000309944153| +|1970-01-01T08:00:00.160+08:00| 0.7072000325520833| +|1970-01-01T08:00:00.170+08:00| 0.7463000340461731| +|1970-01-01T08:00:00.180+08:00| 0.7848000354766846| +|1970-01-01T08:00:00.190+08:00| 0.8227000368436178| +|1970-01-01T08:00:00.200+08:00| 0.8600000381469728| +|1970-01-01T08:00:00.210+08:00| 0.8967000393867494| +|1970-01-01T08:00:00.220+08:00| 0.9328000405629477| +|1970-01-01T08:00:00.230+08:00| 0.9683000416755676| +|1970-01-01T08:00:00.240+08:00| 1.0032000427246095| +|1970-01-01T08:00:00.250+08:00| 1.037500043710073| +|1970-01-01T08:00:00.260+08:00| 1.071200044631958| +|1970-01-01T08:00:00.270+08:00| 1.1043000454902647| +|1970-01-01T08:00:00.280+08:00| 1.1368000462849934| +|1970-01-01T08:00:00.290+08:00| 1.1687000470161437| +|1970-01-01T08:00:00.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:00.310+08:00| 1.2307000483103594| +|1970-01-01T08:00:00.320+08:00| 1.2608000489139557| +|1970-01-01T08:00:00.330+08:00| 1.2903000494873524| +|1970-01-01T08:00:00.340+08:00| 1.3192000500233967| +|1970-01-01T08:00:00.350+08:00| 1.3475000505149364| +|1970-01-01T08:00:00.360+08:00| 1.3752000509548186| +|1970-01-01T08:00:00.370+08:00| 1.402300051335891| +|1970-01-01T08:00:00.380+08:00| 1.4288000516510009| +|1970-01-01T08:00:00.390+08:00| 1.4547000518929958| +|1970-01-01T08:00:00.400+08:00| 1.480000052054723| +|1970-01-01T08:00:00.410+08:00| 1.5047000521290301| +|1970-01-01T08:00:00.420+08:00| 1.5288000521087646| +|1970-01-01T08:00:00.430+08:00| 1.5523000519867738| +|1970-01-01T08:00:00.440+08:00| 1.575200051755905| +|1970-01-01T08:00:00.450+08:00| 1.597500051409006| +|1970-01-01T08:00:00.460+08:00| 1.619200050938924| +|1970-01-01T08:00:00.470+08:00| 1.6403000503385066| +|1970-01-01T08:00:00.480+08:00| 1.660800049600601| +|1970-01-01T08:00:00.490+08:00| 
1.680700048718055| +|1970-01-01T08:00:00.500+08:00| 1.7000000476837158| +|1970-01-01T08:00:00.510+08:00| 1.7188475466453037| +|1970-01-01T08:00:00.520+08:00| 1.7373800457262996| +|1970-01-01T08:00:00.530+08:00| 1.7555825448831923| +|1970-01-01T08:00:00.540+08:00| 1.7734400440724702| +|1970-01-01T08:00:00.550+08:00| 1.790937543250622| +|1970-01-01T08:00:00.560+08:00| 1.8080600423741364| +|1970-01-01T08:00:00.570+08:00| 1.8247925413995016| +|1970-01-01T08:00:00.580+08:00| 1.8411200402832066| +|1970-01-01T08:00:00.590+08:00| 1.8570275389817397| +|1970-01-01T08:00:00.600+08:00| 1.8725000374515897| +|1970-01-01T08:00:00.610+08:00| 1.8875225356492449| +|1970-01-01T08:00:00.620+08:00| 1.902080033531194| +|1970-01-01T08:00:00.630+08:00| 1.9161575310539258| +|1970-01-01T08:00:00.640+08:00| 1.9297400281739288| +|1970-01-01T08:00:00.650+08:00| 1.9428125248476913| +|1970-01-01T08:00:00.660+08:00| 1.9553600210317021| +|1970-01-01T08:00:00.670+08:00| 1.96736751668245| +|1970-01-01T08:00:00.680+08:00| 1.9788200117564232| +|1970-01-01T08:00:00.690+08:00| 1.9897025062101101| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.710+08:00| 2.0097024933913334| +|1970-01-01T08:00:00.720+08:00| 2.0188199867081615| +|1970-01-01T08:00:00.730+08:00| 2.027367479995188| +|1970-01-01T08:00:00.740+08:00| 2.0353599732971155| +|1970-01-01T08:00:00.750+08:00| 2.0428124666586482| +|1970-01-01T08:00:00.760+08:00| 2.049739960124489| +|1970-01-01T08:00:00.770+08:00| 2.056157453739342| +|1970-01-01T08:00:00.780+08:00| 2.06207994754791| +|1970-01-01T08:00:00.790+08:00| 2.067522441594897| +|1970-01-01T08:00:00.800+08:00| 2.072499935925006| +|1970-01-01T08:00:00.810+08:00| 2.07702743058294| +|1970-01-01T08:00:00.820+08:00| 2.081119925613404| +|1970-01-01T08:00:00.830+08:00| 2.0847924210611| +|1970-01-01T08:00:00.840+08:00| 2.0880599169707317| +|1970-01-01T08:00:00.850+08:00| 2.0909374133870027| +|1970-01-01T08:00:00.860+08:00| 2.0934399103546166| +|1970-01-01T08:00:00.870+08:00| 2.0955824079182768| +|1970-01-01T08:00:00.880+08:00| 2.0973799061226863| +|1970-01-01T08:00:00.890+08:00| 2.098847405012549| +|1970-01-01T08:00:00.900+08:00| 2.0999999046325684| +|1970-01-01T08:00:00.910+08:00| 2.1005574051201332| +|1970-01-01T08:00:00.920+08:00| 2.1002599065303778| +|1970-01-01T08:00:00.930+08:00| 2.0991524087846245| +|1970-01-01T08:00:00.940+08:00| 2.0972799118041947| +|1970-01-01T08:00:00.950+08:00| 2.0946874155104105| +|1970-01-01T08:00:00.960+08:00| 2.0914199198245944| +|1970-01-01T08:00:00.970+08:00| 2.0875224246680673| +|1970-01-01T08:00:00.980+08:00| 2.083039929962151| +|1970-01-01T08:00:00.990+08:00| 2.0780174356281687| +|1970-01-01T08:00:01.000+08:00| 2.0724999415874406| +|1970-01-01T08:00:01.010+08:00| 2.06653244776129| +|1970-01-01T08:00:01.020+08:00| 2.060159954071038| +|1970-01-01T08:00:01.030+08:00| 2.053427460438006| +|1970-01-01T08:00:01.040+08:00| 2.046379966783517| +|1970-01-01T08:00:01.050+08:00| 2.0390624730288924| +|1970-01-01T08:00:01.060+08:00| 2.031519979095454| +|1970-01-01T08:00:01.070+08:00| 2.0237974849045237| +|1970-01-01T08:00:01.080+08:00| 2.015939990377423| +|1970-01-01T08:00:01.090+08:00| 2.0079924954354746| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.110+08:00| 1.9907018211101906| +|1970-01-01T08:00:01.120+08:00| 1.9788509124245144| +|1970-01-01T08:00:01.130+08:00| 1.9645127287932083| +|1970-01-01T08:00:01.140+08:00| 1.9477527250665083| +|1970-01-01T08:00:01.150+08:00| 1.9286363560946513| +|1970-01-01T08:00:01.160+08:00| 1.9072290767278735| +|1970-01-01T08:00:01.170+08:00| 
1.8835963418164114| +|1970-01-01T08:00:01.180+08:00| 1.8578036062105014| +|1970-01-01T08:00:01.190+08:00| 1.8299163247603802| +|1970-01-01T08:00:01.200+08:00| 1.7999999523162842| +|1970-01-01T08:00:01.210+08:00| 1.7623635841923329| +|1970-01-01T08:00:01.220+08:00| 1.7129696477516976| +|1970-01-01T08:00:01.230+08:00| 1.6543635959181928| +|1970-01-01T08:00:01.240+08:00| 1.5890908816156328| +|1970-01-01T08:00:01.250+08:00| 1.5196969577678319| +|1970-01-01T08:00:01.260+08:00| 1.4487272772986044| +|1970-01-01T08:00:01.270+08:00| 1.3787272931317647| +|1970-01-01T08:00:01.280+08:00| 1.3122424581911272| +|1970-01-01T08:00:01.290+08:00| 1.251818225400506| +|1970-01-01T08:00:01.300+08:00| 1.2000000476837158| +|1970-01-01T08:00:01.310+08:00| 1.1548000470995912| +|1970-01-01T08:00:01.320+08:00| 1.1130667107899999| +|1970-01-01T08:00:01.330+08:00| 1.0756000393033045| +|1970-01-01T08:00:01.340+08:00| 1.043200033187868| +|1970-01-01T08:00:01.350+08:00| 1.016666692992053| +|1970-01-01T08:00:01.360+08:00| 0.9968000192642223| +|1970-01-01T08:00:01.370+08:00| 0.9844000125527389| +|1970-01-01T08:00:01.380+08:00| 0.9802666734059655| +|1970-01-01T08:00:01.390+08:00| 0.9852000023722649| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.410+08:00| 1.023999999165535| +|1970-01-01T08:00:01.420+08:00| 1.0559999990463256| +|1970-01-01T08:00:01.430+08:00| 1.0959999996423722| +|1970-01-01T08:00:01.440+08:00| 1.1440000009536744| +|1970-01-01T08:00:01.450+08:00| 1.2000000029802322| +|1970-01-01T08:00:01.460+08:00| 1.264000005722046| +|1970-01-01T08:00:01.470+08:00| 1.3360000091791153| +|1970-01-01T08:00:01.480+08:00| 1.4160000133514405| +|1970-01-01T08:00:01.490+08:00| 1.5040000182390214| +|1970-01-01T08:00:01.500+08:00| 1.600000023841858| ++-----------------------------+------------------------------------+ +``` + +### Spread + +#### Usage + +This function is used to calculate the spread of time series, that is, the maximum value minus the minimum value. + +**Name:** SPREAD + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is the same as the input. There is only one data point in the series, whose timestamp is 0 and value is the spread. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. 
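+
+Conceptually the computation is just the maximum minus the minimum with `NaN` points skipped, as in the short illustrative sketch below.
+
+```java
+public class SpreadSketch {
+
+    // spread = max - min, ignoring NaN points
+    static double spread(double[] values) {
+        double min = Double.POSITIVE_INFINITY;
+        double max = Double.NEGATIVE_INFINITY;
+        for (double v : values) {
+            if (Double.isNaN(v)) continue; // NaN points are ignored
+            min = Math.min(min, v);
+            max = Math.max(max, v);
+        }
+        return max - min;
+    }
+
+    public static void main(String[] args) {
+        double[] values = {100.0, 101.0, 126.0, 108.0, Double.NaN};
+        System.out.println(spread(values)); // 26.0
+    }
+}
+```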
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select spread(s1) from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|spread(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 26.0| ++-----------------------------+-----------------------+ +``` + +### Stddev + +#### Usage + +This function is used to calculate the population standard deviation. + +**Name:** STDDEV + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population standard deviation. + +**Note:** Missing points, null points and `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| +|2020-01-01T00:00:01.000+08:00| 2.0| +|2020-01-01T00:00:02.000+08:00| 3.0| +|2020-01-01T00:00:03.000+08:00| 4.0| +|2020-01-01T00:00:04.000+08:00| 5.0| +|2020-01-01T00:00:05.000+08:00| 6.0| +|2020-01-01T00:00:06.000+08:00| 7.0| +|2020-01-01T00:00:07.000+08:00| 8.0| +|2020-01-01T00:00:08.000+08:00| 9.0| +|2020-01-01T00:00:09.000+08:00| 10.0| +|2020-01-01T00:00:10.000+08:00| 11.0| +|2020-01-01T00:00:11.000+08:00| 12.0| +|2020-01-01T00:00:12.000+08:00| 13.0| +|2020-01-01T00:00:13.000+08:00| 14.0| +|2020-01-01T00:00:14.000+08:00| 15.0| +|2020-01-01T00:00:15.000+08:00| 16.0| +|2020-01-01T00:00:16.000+08:00| 17.0| +|2020-01-01T00:00:17.000+08:00| 18.0| +|2020-01-01T00:00:18.000+08:00| 19.0| +|2020-01-01T00:00:19.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select stddev(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|stddev(root.test.d1.s1)| ++-----------------------------+-----------------------+ +|1970-01-01T08:00:00.000+08:00| 5.7662812973353965| ++-----------------------------+-----------------------+ +``` + +### ZScore + +#### Usage + +This function is used to standardize the input series with z-score. + +**Name:** ZSCORE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `compute`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide mean and standard deviation. The default method is "batch". ++ `avg`: Mean value when method is set to "stream". 
++ `sd`: Standard deviation when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. + +#### Examples + +##### Batch computing + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| +|1970-01-01T08:00:00.300+08:00| 1.0| +|1970-01-01T08:00:00.400+08:00| -1.0| +|1970-01-01T08:00:00.500+08:00| 0.0| +|1970-01-01T08:00:00.600+08:00| 0.0| +|1970-01-01T08:00:00.700+08:00| -2.0| +|1970-01-01T08:00:00.800+08:00| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 1.0| +|1970-01-01T08:00:01.200+08:00| -1.0| +|1970-01-01T08:00:01.300+08:00| -1.0| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 0.0| +|1970-01-01T08:00:01.600+08:00| 0.0| +|1970-01-01T08:00:01.700+08:00| 10.0| +|1970-01-01T08:00:01.800+08:00| 2.0| +|1970-01-01T08:00:01.900+08:00| -2.0| +|1970-01-01T08:00:02.000+08:00| 0.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select zscore(s1) from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|zscore(root.test.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.200+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.300+08:00| 0.20672455764868078| +|1970-01-01T08:00:00.400+08:00| -0.6201736729460423| +|1970-01-01T08:00:00.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:00.700+08:00| -1.033622788243404| +|1970-01-01T08:00:00.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:00.900+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.000+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.100+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.200+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.300+08:00| -0.6201736729460423| +|1970-01-01T08:00:01.400+08:00| 0.20672455764868078| +|1970-01-01T08:00:01.500+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.600+08:00|-0.20672455764868078| +|1970-01-01T08:00:01.700+08:00| 3.9277665953249348| +|1970-01-01T08:00:01.800+08:00| 0.6201736729460423| +|1970-01-01T08:00:01.900+08:00| -1.033622788243404| +|1970-01-01T08:00:02.000+08:00|-0.20672455764868078| ++-----------------------------+--------------------+ +``` + + +## Anomaly Detection + +### IQR + +#### Usage + +This function is used to detect anomalies based on IQR. Points distributing beyond 1.5 times IQR are selected. + +**Name:** IQR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `method`: When set to "batch", anomaly test is conducted after importing all data points; when set to "stream", it is required to provide upper and lower quantiles. The default method is "batch". ++ `q1`: The lower quantile when method is set to "stream". ++ `q3`: The upper quantile when method is set to "stream". + +**Output Series:** Output a single series. The type is DOUBLE. 
+
+**Note:** $IQR=Q_3-Q_1$
+
+#### Examples
+
+##### Batch computing
+
+Input series:
+
+```
++-----------------------------+------------+
+|                         Time|root.test.s1|
++-----------------------------+------------+
+|1970-01-01T08:00:00.100+08:00|         0.0|
+|1970-01-01T08:00:00.200+08:00|         0.0|
+|1970-01-01T08:00:00.300+08:00|         1.0|
+|1970-01-01T08:00:00.400+08:00|        -1.0|
+|1970-01-01T08:00:00.500+08:00|         0.0|
+|1970-01-01T08:00:00.600+08:00|         0.0|
+|1970-01-01T08:00:00.700+08:00|        -2.0|
+|1970-01-01T08:00:00.800+08:00|         2.0|
+|1970-01-01T08:00:00.900+08:00|         0.0|
+|1970-01-01T08:00:01.000+08:00|         0.0|
+|1970-01-01T08:00:01.100+08:00|         1.0|
+|1970-01-01T08:00:01.200+08:00|        -1.0|
+|1970-01-01T08:00:01.300+08:00|        -1.0|
+|1970-01-01T08:00:01.400+08:00|         1.0|
+|1970-01-01T08:00:01.500+08:00|         0.0|
+|1970-01-01T08:00:01.600+08:00|         0.0|
+|1970-01-01T08:00:01.700+08:00|        10.0|
+|1970-01-01T08:00:01.800+08:00|         2.0|
+|1970-01-01T08:00:01.900+08:00|        -2.0|
+|1970-01-01T08:00:02.000+08:00|         0.0|
++-----------------------------+------------+
+```
+
+SQL for query:
+
+```sql
+select iqr(s1) from root.test
+```
+
+Output series:
+
+```
++-----------------------------+-----------------+
+|                         Time|iqr(root.test.s1)|
++-----------------------------+-----------------+
+|1970-01-01T08:00:01.700+08:00|             10.0|
++-----------------------------+-----------------+
+```
+
+### KSigma
+
+#### Usage
+
+This function is used to detect anomalies based on the Dynamic K-Sigma Algorithm.
+Within a sliding window, any input value that deviates from the average by more than k times the standard deviation is output as an anomaly.
+
+**Name:** KSIGMA
+
+**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `k`: The number of standard deviations beyond which a value is regarded as an anomaly. The default value is 3.
++ `window`: The window size of the Dynamic K-Sigma Algorithm. The default value is 10000.
+
+**Output Series:** Output a single series. The type is the same as the input series.
+
+**Note:** Anomaly detection is performed only when `k` is larger than 0. Otherwise, nothing will be output.
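+
+Conceptually, when the window is large enough to cover the whole series (as with the default window of 10000 and the 14 valid points of the example below), the detection reduces to the check sketched here. This is only an illustrative batch version, not the UDF's sliding-window implementation.
+
+```java
+public class KSigmaSketch {
+    public static void main(String[] args) {
+        // Values from the KSigma example below (the trailing NaN is dropped).
+        double[] data = {0, 50, 100, 150, 200, 200, 200, 200, 200, 200, 150, 100, 50, 0};
+        double k = 1.0;
+
+        double mean = 0;
+        for (double v : data) mean += v;
+        mean /= data.length;
+
+        double var = 0;
+        for (double v : data) var += (v - mean) * (v - mean);
+        double std = Math.sqrt(var / data.length);
+
+        // A point is an anomaly when it deviates from the mean by more than k standard deviations.
+        for (double v : data) {
+            if (Math.abs(v - mean) > k * std) {
+                System.out.println("anomaly: " + v);   // prints 0.0, 50.0, 50.0, 0.0
+            }
+        }
+    }
+}
+```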
+
+#### Examples
+
+##### Assigning k
+
+Input series:
+
+```
++-----------------------------+---------------+
+|                         Time|root.test.d1.s1|
++-----------------------------+---------------+
+|2020-01-01T00:00:02.000+08:00|            0.0|
+|2020-01-01T00:00:03.000+08:00|           50.0|
+|2020-01-01T00:00:04.000+08:00|          100.0|
+|2020-01-01T00:00:06.000+08:00|          150.0|
+|2020-01-01T00:00:08.000+08:00|          200.0|
+|2020-01-01T00:00:10.000+08:00|          200.0|
+|2020-01-01T00:00:14.000+08:00|          200.0|
+|2020-01-01T00:00:15.000+08:00|          200.0|
+|2020-01-01T00:00:16.000+08:00|          200.0|
+|2020-01-01T00:00:18.000+08:00|          200.0|
+|2020-01-01T00:00:20.000+08:00|          150.0|
+|2020-01-01T00:00:22.000+08:00|          100.0|
+|2020-01-01T00:00:26.000+08:00|           50.0|
+|2020-01-01T00:00:28.000+08:00|            0.0|
+|2020-01-01T00:00:30.000+08:00|            NaN|
++-----------------------------+---------------+
+```
+
+SQL for query:
+
+```sql
+select ksigma(s1,"k"="1.0") from root.test.d1 where time <= 2020-01-01 00:00:30
+```
+
+Output series:
+
+```
++-----------------------------+---------------------------------+
+|Time                         |ksigma(root.test.d1.s1,"k"="1.0")|
++-----------------------------+---------------------------------+
+|2020-01-01T00:00:02.000+08:00|                              0.0|
+|2020-01-01T00:00:03.000+08:00|                             50.0|
+|2020-01-01T00:00:26.000+08:00|                             50.0|
+|2020-01-01T00:00:28.000+08:00|                              0.0|
++-----------------------------+---------------------------------+
+```
+
+### LOF
+
+#### Usage
+
+This function is used to detect density anomalies in a time series. Based on the k-th distance, it computes the local outlier factor (LOF) for each set of input values and outputs it, so that points with a large LOF can be judged as density anomalies.
+
+**Name:** LOF
+
+**Input Series:** Multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.
+
+**Parameters:**
+
++ `method`: The detection method. The default value is "default", which is used when the input data has multiple dimensions. The alternative is "series", in which case a single input series is transformed into a high-dimensional one.
++ `k`: The k-th distance used to calculate the LOF. The default value is 3.
++ `window`: The size of the window used to split the original data points. The default value is 10000.
++ `windowsize`: The dimension into which the data is transformed when `method` is "series". The default value is 5.
+
+**Output Series:** Output a single series. The type is DOUBLE.
+
+**Note:** Incomplete rows will be ignored. They are neither calculated nor marked as anomalies.
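+
+For reference, the textbook definition of the local outlier factor on which this function is based can be written as follows, where $N_k(p)$ is the k-distance neighborhood of $p$; the UDF may differ in implementation details:
+
+$$\text{reach-dist}_k(p,o)=\max\{k\text{-distance}(o),\,d(p,o)\}$$
+
+$$\mathrm{lrd}_k(p)=\left(\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\text{reach-dist}_k(p,o)\right)^{-1}$$
+
+$$\mathrm{LOF}_k(p)=\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}$$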
+ +#### Examples + +##### Using default parameters + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.200+08:00| 0.0| 1.0| +|1970-01-01T08:00:00.300+08:00| 1.0| 1.0| +|1970-01-01T08:00:00.400+08:00| 1.0| 0.0| +|1970-01-01T08:00:00.500+08:00| 0.0| -1.0| +|1970-01-01T08:00:00.600+08:00| -1.0| -1.0| +|1970-01-01T08:00:00.700+08:00| -1.0| 0.0| +|1970-01-01T08:00:00.800+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.900+08:00| 0.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1,s2) from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|lof(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.100+08:00| 3.8274824267668244| +|1970-01-01T08:00:00.200+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.300+08:00| 2.838155437762879| +|1970-01-01T08:00:00.400+08:00| 3.0117631741126156| +|1970-01-01T08:00:00.500+08:00| 2.73518261244453| +|1970-01-01T08:00:00.600+08:00| 2.371440975708148| +|1970-01-01T08:00:00.700+08:00| 2.73518261244453| +|1970-01-01T08:00:00.800+08:00| 1.7561416374270742| ++-----------------------------+-------------------------------------+ +``` + +##### Diagnosing 1d timeseries + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.100+08:00| 1.0| +|1970-01-01T08:00:00.200+08:00| 2.0| +|1970-01-01T08:00:00.300+08:00| 3.0| +|1970-01-01T08:00:00.400+08:00| 4.0| +|1970-01-01T08:00:00.500+08:00| 5.0| +|1970-01-01T08:00:00.600+08:00| 6.0| +|1970-01-01T08:00:00.700+08:00| 7.0| +|1970-01-01T08:00:00.800+08:00| 8.0| +|1970-01-01T08:00:00.900+08:00| 9.0| +|1970-01-01T08:00:01.000+08:00| 10.0| +|1970-01-01T08:00:01.100+08:00| 11.0| +|1970-01-01T08:00:01.200+08:00| 12.0| +|1970-01-01T08:00:01.300+08:00| 13.0| +|1970-01-01T08:00:01.400+08:00| 14.0| +|1970-01-01T08:00:01.500+08:00| 15.0| +|1970-01-01T08:00:01.600+08:00| 16.0| +|1970-01-01T08:00:01.700+08:00| 17.0| +|1970-01-01T08:00:01.800+08:00| 18.0| +|1970-01-01T08:00:01.900+08:00| 19.0| +|1970-01-01T08:00:02.000+08:00| 20.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lof(s1, "method"="series") from root.test.d1 where time<1000 +``` + +Output series: + +``` ++-----------------------------+--------------------+ +| Time|lof(root.test.d1.s1)| ++-----------------------------+--------------------+ +|1970-01-01T08:00:00.100+08:00| 3.77777777777778| +|1970-01-01T08:00:00.200+08:00| 4.32727272727273| +|1970-01-01T08:00:00.300+08:00| 4.85714285714286| +|1970-01-01T08:00:00.400+08:00| 5.40909090909091| +|1970-01-01T08:00:00.500+08:00| 5.94999999999999| +|1970-01-01T08:00:00.600+08:00| 6.43243243243243| +|1970-01-01T08:00:00.700+08:00| 6.79999999999999| +|1970-01-01T08:00:00.800+08:00| 7.0| +|1970-01-01T08:00:00.900+08:00| 7.0| +|1970-01-01T08:00:01.000+08:00| 6.79999999999999| +|1970-01-01T08:00:01.100+08:00| 6.43243243243243| +|1970-01-01T08:00:01.200+08:00| 5.94999999999999| +|1970-01-01T08:00:01.300+08:00| 5.40909090909091| +|1970-01-01T08:00:01.400+08:00| 4.85714285714286| +|1970-01-01T08:00:01.500+08:00| 4.32727272727273| +|1970-01-01T08:00:01.600+08:00| 
3.77777777777778| ++-----------------------------+--------------------+ +``` + +### MissDetect + +#### Usage + +This function is used to detect missing anomalies. +In some datasets, missing values are filled by linear interpolation. +Thus, there are several long perfect linear segments. +By discovering these perfect linear segments, +missing anomalies are detected. + +**Name:** MISSDETECT + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameter:** + +`error`: The minimum length of the detected missing anomalies, which is an integer greater than or equal to 10. By default, it is 10. + +**Output Series:** Output a single series. The type is BOOLEAN. Each data point which is miss anomaly will be labeled as true. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s2| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 0.0| +|2021-07-01T12:00:01.000+08:00| 1.0| +|2021-07-01T12:00:02.000+08:00| 0.0| +|2021-07-01T12:00:03.000+08:00| 1.0| +|2021-07-01T12:00:04.000+08:00| 0.0| +|2021-07-01T12:00:05.000+08:00| 0.0| +|2021-07-01T12:00:06.000+08:00| 0.0| +|2021-07-01T12:00:07.000+08:00| 0.0| +|2021-07-01T12:00:08.000+08:00| 0.0| +|2021-07-01T12:00:09.000+08:00| 0.0| +|2021-07-01T12:00:10.000+08:00| 0.0| +|2021-07-01T12:00:11.000+08:00| 0.0| +|2021-07-01T12:00:12.000+08:00| 0.0| +|2021-07-01T12:00:13.000+08:00| 0.0| +|2021-07-01T12:00:14.000+08:00| 0.0| +|2021-07-01T12:00:15.000+08:00| 0.0| +|2021-07-01T12:00:16.000+08:00| 1.0| +|2021-07-01T12:00:17.000+08:00| 0.0| +|2021-07-01T12:00:18.000+08:00| 1.0| +|2021-07-01T12:00:19.000+08:00| 0.0| +|2021-07-01T12:00:20.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select missdetect(s2,'minlen'='10') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------+ +| Time|missdetect(root.test.d2.s2, "minlen"="10")| ++-----------------------------+------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| false| +|2021-07-01T12:00:01.000+08:00| false| +|2021-07-01T12:00:02.000+08:00| false| +|2021-07-01T12:00:03.000+08:00| false| +|2021-07-01T12:00:04.000+08:00| true| +|2021-07-01T12:00:05.000+08:00| true| +|2021-07-01T12:00:06.000+08:00| true| +|2021-07-01T12:00:07.000+08:00| true| +|2021-07-01T12:00:08.000+08:00| true| +|2021-07-01T12:00:09.000+08:00| true| +|2021-07-01T12:00:10.000+08:00| true| +|2021-07-01T12:00:11.000+08:00| true| +|2021-07-01T12:00:12.000+08:00| true| +|2021-07-01T12:00:13.000+08:00| true| +|2021-07-01T12:00:14.000+08:00| true| +|2021-07-01T12:00:15.000+08:00| true| +|2021-07-01T12:00:16.000+08:00| false| +|2021-07-01T12:00:17.000+08:00| false| +|2021-07-01T12:00:18.000+08:00| false| +|2021-07-01T12:00:19.000+08:00| false| +|2021-07-01T12:00:20.000+08:00| false| ++-----------------------------+------------------------------------------+ +``` + +### Range + +#### Usage + +This function is used to detect range anomaly of time series. According to upper bound and lower bound parameters, the function judges if a input value is beyond range, aka range anomaly, and a new time series of anomaly will be output. + +**Name:** RANGE + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `lower_bound`:lower bound of range anomaly detection. ++ `upper_bound`:upper bound of range anomaly detection. 
+ +**Output Series:** Output a single series. The type is the same as the input. + +**Note:** Only when `upper_bound` is larger than `lower_bound`, the anomaly detection will be performed. Otherwise, nothing will be output. + + + +#### Examples + +##### Assigning Lower and Upper Bound + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select range(s1,"lower_bound"="101.0","upper_bound"="125.0") from root.test.d1 where time <= 2020-01-01 00:00:30 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +|Time |range(root.test.d1.s1,"lower_bound"="101.0","upper_bound"="125.0")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:28.000+08:00| 126.0| ++-----------------------------+------------------------------------------------------------------+ +``` + +### TwoSidedFilter + +#### Usage + +The function is used to filter anomalies of a numeric time series based on two-sided window detection. + +**Name:** TWOSIDEDFILTER + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE + +**Output Series:** Output a single series. The type is the same as the input. It is the input without anomalies. + +**Parameter:** + +- `len`: The size of the window, which is a positive integer. By default, it's 5. When `len`=3, the algorithm detects forward window and backward window with length 3 and calculates the outlierness of the current point. + +- `threshold`: The threshold of outlierness, which is a floating number in (0,1). By default, it's 0.3. The strict standard of detecting anomalies is in proportion to the threshold. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:00:37.000+08:00| 1484.0| +|1970-01-01T08:00:38.000+08:00| 1055.0| +|1970-01-01T08:00:39.000+08:00| 1050.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select TwoSidedFilter(s0, 'len'='5', 'threshold'='0.3') from root.test +``` + +Output series: + +``` ++-----------------------------+------------+ +| Time|root.test.s0| ++-----------------------------+------------+ +|1970-01-01T08:00:00.000+08:00| 2002.0| +|1970-01-01T08:00:01.000+08:00| 1946.0| +|1970-01-01T08:00:02.000+08:00| 1958.0| +|1970-01-01T08:00:03.000+08:00| 2012.0| +|1970-01-01T08:00:04.000+08:00| 2051.0| +|1970-01-01T08:00:05.000+08:00| 1898.0| +|1970-01-01T08:00:06.000+08:00| 2014.0| +|1970-01-01T08:00:07.000+08:00| 2052.0| +|1970-01-01T08:00:08.000+08:00| 1935.0| +|1970-01-01T08:00:09.000+08:00| 1901.0| +|1970-01-01T08:00:10.000+08:00| 1972.0| +|1970-01-01T08:00:11.000+08:00| 1969.0| +|1970-01-01T08:00:12.000+08:00| 1984.0| +|1970-01-01T08:00:13.000+08:00| 2018.0| +|1970-01-01T08:01:05.000+08:00| 1023.0| +|1970-01-01T08:01:06.000+08:00| 1056.0| +|1970-01-01T08:01:07.000+08:00| 978.0| +|1970-01-01T08:01:08.000+08:00| 1050.0| +|1970-01-01T08:01:09.000+08:00| 1123.0| +|1970-01-01T08:01:10.000+08:00| 1150.0| +|1970-01-01T08:01:11.000+08:00| 1034.0| +|1970-01-01T08:01:12.000+08:00| 950.0| +|1970-01-01T08:01:13.000+08:00| 1059.0| ++-----------------------------+------------+ +``` + +### Outlier + +#### Usage + +This function is used to detect distance-based outliers. For each point in the current window, if the number of its neighbors within the distance of neighbor distance threshold is less than the neighbor count threshold, the point in detected as an outlier. + +**Name:** OUTLIER + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + ++ `r`:the neighbor distance threshold. ++ `k`:the neighbor count threshold. ++ `w`:the window size. ++ `s`:the slide size. + +**Output Series:** Output a single series. The type is the same as the input. 
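+
+Within one window, the check sketched below is applied: a point is reported when fewer than `k` other points of the window lie within distance `r` of it. The sketch uses the first window of the example below and is only illustrative, not the UDF's implementation.
+
+```java
+public class DistanceOutlierSketch {
+
+    // A point is an outlier if fewer than k other points lie within distance r.
+    static void detect(double[] window, double r, int k) {
+        for (int i = 0; i < window.length; i++) {
+            int neighbors = 0;
+            for (int j = 0; j < window.length; j++) {
+                if (i != j && Math.abs(window[i] - window[j]) <= r) neighbors++;
+            }
+            if (neighbors < k) System.out.println("outlier: " + window[i]);
+        }
+    }
+
+    public static void main(String[] args) {
+        // First 10 values of the Outlier example below (w=10, r=5.0, k=4).
+        double[] window = {56.0, 55.1, 54.2, 56.3, 59.0, 60.0, 60.5, 64.5, 69.0, 64.2};
+        detect(window, 5.0, 4); // 69.0 has only 2 neighbors within 5.0, so it is reported
+    }
+}
+```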
+ +#### Examples + +##### Assigning Parameters of Queries + +Input series: + +``` ++-----------------------------+------------+ +| Time|root.test.s1| ++-----------------------------+------------+ +|2020-01-04T23:59:55.000+08:00| 56.0| +|2020-01-04T23:59:56.000+08:00| 55.1| +|2020-01-04T23:59:57.000+08:00| 54.2| +|2020-01-04T23:59:58.000+08:00| 56.3| +|2020-01-04T23:59:59.000+08:00| 59.0| +|2020-01-05T00:00:00.000+08:00| 60.0| +|2020-01-05T00:00:01.000+08:00| 60.5| +|2020-01-05T00:00:02.000+08:00| 64.5| +|2020-01-05T00:00:03.000+08:00| 69.0| +|2020-01-05T00:00:04.000+08:00| 64.2| +|2020-01-05T00:00:05.000+08:00| 62.3| +|2020-01-05T00:00:06.000+08:00| 58.0| +|2020-01-05T00:00:07.000+08:00| 58.9| +|2020-01-05T00:00:08.000+08:00| 52.0| +|2020-01-05T00:00:09.000+08:00| 62.3| +|2020-01-05T00:00:10.000+08:00| 61.0| +|2020-01-05T00:00:11.000+08:00| 64.2| +|2020-01-05T00:00:12.000+08:00| 61.8| +|2020-01-05T00:00:13.000+08:00| 64.0| +|2020-01-05T00:00:14.000+08:00| 63.0| ++-----------------------------+------------+ +``` + +SQL for query: + +```sql +select outlier(s1,"r"="5.0","k"="4","w"="10","s"="5") from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------+ +| Time|outlier(root.test.s1,"r"="5.0","k"="4","w"="10","s"="5")| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:03.000+08:00| 69.0| ++-----------------------------+--------------------------------------------------------+ +|2020-01-05T00:00:08.000+08:00| 52.0| ++-----------------------------+--------------------------------------------------------+ +``` + + +### MasterTrain + +#### Usage + +This function is used to train the VAR model based on master data. The model is trained on learning samples consisting of p+1 consecutive non-error points. + +**Name:** MasterTrain + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn clean package -am -Dmaven.test.skip=true`. +- Copy `./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterTrain as 'org.apache.iotdb.library.anomaly.UDTFMasterTrain'` in client. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| ++-----------------------------+------------+------------+--------------+--------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| ++-----------------------------+------------+------------+--------------+--------------+ +``` + +SQL for query: + +```sql +select MasterTrain(lo,la,m_lo,m_la,'p'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------------+ +| Time|MasterTrain(root.test.lo, root.test.la, root.test.m_lo, root.test.m_la, "p"="3", "eta"="1.0")| ++-----------------------------+---------------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| -0.07863708480736269| ++-----------------------------+---------------------------------------------------------------------------------------------+ +``` + +### MasterDetect + +#### Usage + +This function is used to detect 
time series and repair errors based on master data. The VAR model is trained by MasterTrain. + +**Name:** MasterDetect + +**Input Series:** Support multiple input series. The types are are in INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `p`: The order of the model. ++ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data. ++ `eta`: The distance threshold. By default, it will be estimated based on the 3-sigma rule. ++ `eta`: The detection threshold. By default, it will be estimated based on the 3-sigma rule. ++ `output_type`: The type of output. 'repair' for repairing and 'anomaly' for anomaly detection. ++ `output_column`: The repaired column to output, defaults to 1 which means output the repair result of the first column. + +**Output Series:** Output a single series. The type is the same as the input. + +**Installation** +- Install IoTDB from branch `research/master-detector`. +- Run `mvn clean package -am -Dmaven.test.skip=true`. +- Copy `./distribution/target/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/apache-iotdb-1.2.0-SNAPSHOT-library-udf-bin/ext/udf/library-udf.jar` to `./ext/udf/`. +- Start IoTDB server and run `create function MasterDetect as 'org.apache.iotdb.library.anomaly.UDTFMasterDetect'` in client. + +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +| Time|root.test.lo|root.test.la|root.test.m_la|root.test.m_lo| root.test.model| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +|1970-01-01T08:00:00.001+08:00| 39.99982556| 116.327274| 116.3271939| 39.99984748| 0.13656607660463288| +|1970-01-01T08:00:00.002+08:00| 39.99983865| 116.327305| 116.3272269| 39.99984748| 0.8291884323013894| +|1970-01-01T08:00:00.003+08:00| 40.00019038| 116.3273291| 116.3272634| 39.99984769| 0.05012816073171693| +|1970-01-01T08:00:00.004+08:00| 39.99982556| 116.327342| 116.3273015| 39.9998483| -0.5495287787485761| +|1970-01-01T08:00:00.005+08:00| 39.99982991| 116.3273744| 116.327339| 39.99984892| 0.03740486307345578| +|1970-01-01T08:00:00.006+08:00| 39.99982716| 116.3274117| 116.3273759| 39.99984892| 1.0500132150475212| +|1970-01-01T08:00:00.007+08:00| 39.9998259| 116.3274396| 116.3274163| 39.99984953| 0.04583944643116993| +|1970-01-01T08:00:00.008+08:00| 39.99982597| 116.3274668| 116.3274525| 39.99985014|-0.07863708480736269| +|1970-01-01T08:00:00.009+08:00| 39.99982226| 116.3275026| 116.3274915| 39.99985076| null| +|1970-01-01T08:00:00.010+08:00| 39.99980988| 116.3274967| 116.3275235| 39.99985137| null| +|1970-01-01T08:00:00.011+08:00| 39.99984873| 116.3274929| 116.3275611| 39.99985199| null| +|1970-01-01T08:00:00.012+08:00| 39.99981589| 116.3274745| 116.3275974| 39.9998526| null| +|1970-01-01T08:00:00.013+08:00| 39.9998259| 116.3275095| 116.3276338| 39.99985384| null| +|1970-01-01T08:00:00.014+08:00| 39.99984873| 116.3274787| 116.3276695| 39.99985446| null| +|1970-01-01T08:00:00.015+08:00| 39.9998343| 116.3274693| 116.3277045| 39.99985569| null| +|1970-01-01T08:00:00.016+08:00| 39.99983316| 116.3274941| 116.3277389| 39.99985631| null| +|1970-01-01T08:00:00.017+08:00| 39.99983311| 116.3275401| 116.3277747| 39.99985693| null| +|1970-01-01T08:00:00.018+08:00| 39.99984113| 116.3275713| 116.3278041| 39.99985756| null| +|1970-01-01T08:00:00.019+08:00| 39.99983602| 116.3276003| 116.3278379| 39.99985818| 
null| +|1970-01-01T08:00:00.020+08:00| 39.9998355| 116.3276308| 116.3278723| 39.9998588| null| +|1970-01-01T08:00:00.021+08:00| 40.00012176| 116.3276107| 116.3279026| 39.99985942| null| +|1970-01-01T08:00:00.022+08:00| 39.9998404| 116.3276684| null| null| null| +|1970-01-01T08:00:00.023+08:00| 39.99983942| 116.3277016| null| null| null| +|1970-01-01T08:00:00.024+08:00| 39.99984113| 116.3277284| null| null| null| +|1970-01-01T08:00:00.025+08:00| 39.99984283| 116.3277562| null| null| null| ++-----------------------------+------------+------------+--------------+--------------+--------------------+ +``` + +##### Repairing + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='repair','p'='3','k'='3','eta'='1.0')| ++-----------------------------+--------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 116.327274| +|1970-01-01T08:00:00.002+08:00| 116.327305| +|1970-01-01T08:00:00.003+08:00| 116.3273291| +|1970-01-01T08:00:00.004+08:00| 116.327342| +|1970-01-01T08:00:00.005+08:00| 116.3273744| +|1970-01-01T08:00:00.006+08:00| 116.3274117| +|1970-01-01T08:00:00.007+08:00| 116.3274396| +|1970-01-01T08:00:00.008+08:00| 116.3274668| +|1970-01-01T08:00:00.009+08:00| 116.3275026| +|1970-01-01T08:00:00.010+08:00| 116.3274967| +|1970-01-01T08:00:00.011+08:00| 116.3274929| +|1970-01-01T08:00:00.012+08:00| 116.3274745| +|1970-01-01T08:00:00.013+08:00| 116.3275095| +|1970-01-01T08:00:00.014+08:00| 116.3274787| +|1970-01-01T08:00:00.015+08:00| 116.3274693| +|1970-01-01T08:00:00.016+08:00| 116.3274941| +|1970-01-01T08:00:00.017+08:00| 116.3275401| +|1970-01-01T08:00:00.018+08:00| 116.3275713| +|1970-01-01T08:00:00.019+08:00| 116.3276003| +|1970-01-01T08:00:00.020+08:00| 116.3276308| +|1970-01-01T08:00:00.021+08:00| 116.3276338| +|1970-01-01T08:00:00.022+08:00| 116.3276684| +|1970-01-01T08:00:00.023+08:00| 116.3277016| +|1970-01-01T08:00:00.024+08:00| 116.3277284| +|1970-01-01T08:00:00.025+08:00| 116.3277562| ++-----------------------------+--------------------------------------------------------------------------------------+ +``` + +##### Anomaly Detection + +SQL for query: + +```sql +select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0') from root.test +``` + +Output series: + +``` ++-----------------------------+---------------------------------------------------------------------------------------+ +| Time|MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3','eta'='1.0')| ++-----------------------------+---------------------------------------------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| false| +|1970-01-01T08:00:00.002+08:00| false| +|1970-01-01T08:00:00.003+08:00| false| +|1970-01-01T08:00:00.004+08:00| false| +|1970-01-01T08:00:00.005+08:00| true| +|1970-01-01T08:00:00.006+08:00| true| +|1970-01-01T08:00:00.007+08:00| false| +|1970-01-01T08:00:00.008+08:00| false| +|1970-01-01T08:00:00.009+08:00| false| +|1970-01-01T08:00:00.010+08:00| false| +|1970-01-01T08:00:00.011+08:00| false| +|1970-01-01T08:00:00.012+08:00| false| +|1970-01-01T08:00:00.013+08:00| false| +|1970-01-01T08:00:00.014+08:00| true| +|1970-01-01T08:00:00.015+08:00| false| 
+|1970-01-01T08:00:00.016+08:00| false| +|1970-01-01T08:00:00.017+08:00| false| +|1970-01-01T08:00:00.018+08:00| false| +|1970-01-01T08:00:00.019+08:00| false| +|1970-01-01T08:00:00.020+08:00| false| +|1970-01-01T08:00:00.021+08:00| false| +|1970-01-01T08:00:00.022+08:00| false| +|1970-01-01T08:00:00.023+08:00| false| +|1970-01-01T08:00:00.024+08:00| false| +|1970-01-01T08:00:00.025+08:00| false| ++-----------------------------+---------------------------------------------------------------------------------------+ +``` + + + +## Frequency Domain Analysis + +### Conv + +#### Usage + +This function is used to calculate the convolution, i.e. polynomial multiplication. + +**Name:** CONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output:** Output a single series. The type is DOUBLE. It is the result of convolution whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 0.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select conv(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------+ +| Time|conv(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+--------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| +|1970-01-01T08:00:00.003+08:00| 2.0| ++-----------------------------+--------------------------------------+ +``` + +### Deconv + +#### Usage + +This function is used to calculate the deconvolution, i.e. polynomial division. + +**Name:** DECONV + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `result`: The result of deconvolution, which is 'quotient' or 'remainder'. By default, the quotient will be output. + +**Output:** Output a single series. The type is DOUBLE. It is the result of deconvolving the second series from the first series (dividing the first series by the second series) whose timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + + +##### Calculate the quotient + +When `result` is 'quotient' or the default, this function calculates the quotient of the deconvolution. 
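
Before the concrete IoTDB example, the arithmetic behind CONV and DECONV can be sketched in a few lines: polynomial multiplication and polynomial long division over coefficient arrays. The class and method names below are made up for this illustration and are not part of the library; it only shows the quotient/remainder semantics the UDFs expose.

```java
// Polynomial multiplication (convolution) and long division (deconvolution), illustration only.
// Index i of each array holds the coefficient of x^i, mirroring how CONV / DECONV read the inputs.
public class PolySketch {
    static double[] conv(double[] a, double[] b) {
        double[] c = new double[a.length + b.length - 1];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b.length; j++)
                c[i + j] += a[i] * b[j];
        return c;
    }

    // Divides a by b and returns {quotient, remainder}.
    static double[][] deconv(double[] a, double[] b) {
        double[] rem = a.clone();
        int qLen = a.length - b.length + 1;
        double[] q = new double[Math.max(qLen, 1)];
        for (int i = qLen - 1; i >= 0; i--) {
            q[i] = rem[i + b.length - 1] / b[b.length - 1];
            for (int j = 0; j < b.length; j++) rem[i + j] -= q[i] * b[j];
        }
        return new double[][]{q, rem};
    }

    public static void main(String[] args) {
        double[] s1 = {1, 0, 1};      // 1 + x^2
        double[] s2 = {7, 2};         // 7 + 2x
        System.out.println(java.util.Arrays.toString(conv(s1, s2)));   // [7.0, 2.0, 7.0, 2.0]
        double[] s3 = {8, 2, 7, 2};   // 8 + 2x + 7x^2 + 2x^3
        double[][] qr = deconv(s3, s2);
        System.out.println(java.util.Arrays.toString(qr[0]));          // quotient  [1.0, 0.0, 1.0]
        System.out.println(java.util.Arrays.toString(qr[1]));          // remainder [1.0, 0.0, 0.0, 0.0]
    }
}
```
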
+ +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s3|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 8.0| 7.0| +|1970-01-01T08:00:00.001+08:00| 2.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 7.0| null| +|1970-01-01T08:00:00.003+08:00| 2.0| null| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select deconv(s3,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2)| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 1.0| ++-----------------------------+----------------------------------------+ +``` + +##### Calculate the remainder + +When `result` is 'remainder', this function calculates the remainder of the deconvolution. + +Input series is the same as above, the SQL for query is shown below: + + +```sql +select deconv(s3,s2,'result'='remainder') from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------+ +| Time|deconv(root.test.d2.s3, root.test.d2.s2, "result"="remainder")| ++-----------------------------+--------------------------------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.0| +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 0.0| +|1970-01-01T08:00:00.003+08:00| 0.0| ++-----------------------------+--------------------------------------------------------------+ +``` + +### DWT + +#### Usage + +This function is used to calculate 1d discrete wavelet transform of a numerical series. + +**Name:** DWT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of wavelet. May select 'Haar', 'DB4', 'DB6', 'DB8', where DB means Daubechies. User may offer coefficients of wavelet transform and ignore this parameter. Case ignored. ++ `coef`: Coefficients of wavelet transform. When providing this parameter, use comma ',' to split them, and leave no spaces or other punctuations. ++ `layer`: Times to transform. The number of output vectors equals $layer+1$. Default is 1. + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. + +**Note:** The length of input series must be an integer number power of 2. 
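
For the single-layer Haar case the transform is easy to write down by hand. The sketch below is an illustration only, not the DWT UDF's code, and assumes `layer` is 1: the first half of the output holds the pairwise sums divided by √2 and the second half the pairwise differences divided by √2, which is exactly the structure of the Haar example in the next subsection.

```java
import java.util.Arrays;

// Minimal single-level Haar DWT (layer = 1), for illustration only.
// The first n/2 outputs are approximation coefficients, the last n/2 are detail coefficients.
public class HaarDwtSketch {
    static double[] haar(double[] x) {
        int n = x.length;                       // must be a power of 2
        double[] out = new double[n];
        double sqrt2 = Math.sqrt(2.0);
        for (int i = 0; i < n / 2; i++) {
            out[i] = (x[2 * i] + x[2 * i + 1]) / sqrt2;          // approximation
            out[n / 2 + i] = (x[2 * i] - x[2 * i + 1]) / sqrt2;  // detail
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {0.0, 0.2, 1.5, 1.2, 0.6, 1.7, 0.8, 2.0,
                      2.5, 2.1, 0.0, 2.0, 1.8, 1.2, 1.0, 1.6};
        System.out.println(Arrays.toString(haar(x)));
        // First value is (0.0 + 0.2) / sqrt(2) ≈ 0.1414, matching the example output below.
    }
}
```
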
+ +#### Examples + + +##### Haar wavelet transform + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.100+08:00| 0.2| +|1970-01-01T08:00:00.200+08:00| 1.5| +|1970-01-01T08:00:00.300+08:00| 1.2| +|1970-01-01T08:00:00.400+08:00| 0.6| +|1970-01-01T08:00:00.500+08:00| 1.7| +|1970-01-01T08:00:00.600+08:00| 0.8| +|1970-01-01T08:00:00.700+08:00| 2.0| +|1970-01-01T08:00:00.800+08:00| 2.5| +|1970-01-01T08:00:00.900+08:00| 2.1| +|1970-01-01T08:00:01.000+08:00| 0.0| +|1970-01-01T08:00:01.100+08:00| 2.0| +|1970-01-01T08:00:01.200+08:00| 1.8| +|1970-01-01T08:00:01.300+08:00| 1.2| +|1970-01-01T08:00:01.400+08:00| 1.0| +|1970-01-01T08:00:01.500+08:00| 1.6| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select dwt(s1,"method"="haar") from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dwt(root.test.d1.s1, "method"="haar")| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.14142135834465192| +|1970-01-01T08:00:00.100+08:00| 1.909188342921157| +|1970-01-01T08:00:00.200+08:00| 1.6263456473052773| +|1970-01-01T08:00:00.300+08:00| 1.9798989957517026| +|1970-01-01T08:00:00.400+08:00| 3.252691126023161| +|1970-01-01T08:00:00.500+08:00| 1.414213562373095| +|1970-01-01T08:00:00.600+08:00| 2.1213203435596424| +|1970-01-01T08:00:00.700+08:00| 1.8384776479437628| +|1970-01-01T08:00:00.800+08:00| -0.14142135834465192| +|1970-01-01T08:00:00.900+08:00| 0.21213200063848547| +|1970-01-01T08:00:01.000+08:00| -0.7778174761639416| +|1970-01-01T08:00:01.100+08:00| -0.8485281289944873| +|1970-01-01T08:00:01.200+08:00| 0.2828427799095765| +|1970-01-01T08:00:01.300+08:00| -1.414213562373095| +|1970-01-01T08:00:01.400+08:00| 0.42426400127697095| +|1970-01-01T08:00:01.500+08:00| -0.42426408557066786| ++-----------------------------+-------------------------------------+ +``` + +### FFT + +#### Usage + +This function is used to calculate the fast Fourier transform (FFT) of a numerical series. + +**Name:** FFT + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The type of FFT, which is 'uniform' (by default) or 'nonuniform'. If the value is 'uniform', the timestamps will be ignored and all data points will be regarded as equidistant. Thus, the equidistant fast Fourier transform algorithm will be applied. If the value is 'nonuniform' (TODO), the non-equidistant fast Fourier transform algorithm will be applied based on timestamps. ++ `result`: The result of FFT, which is 'real', 'imag', 'abs' or 'angle', corresponding to the real part, imaginary part, magnitude and phase angle. By default, the magnitude will be output. ++ `compress`: The parameter of compression, which is within (0,1]. It is the reserved energy ratio of lossy compression. By default, there is no compression. + + +**Output:** Output a single series. The type is DOUBLE. The length is the same as the input. The timestamps starting from 0 only indicate the order. + +**Note:** `NaN` in the input series will be ignored. + +#### Examples + + +##### Uniform FFT + +With the default `type`, uniform FFT is applied. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select fft(s1) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------+ +| Time| fft(root.test.d1.s1)| ++-----------------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| +|1970-01-01T08:00:00.001+08:00| 1.2727111142703152E-8| +|1970-01-01T08:00:00.002+08:00| 2.385520799101839E-7| +|1970-01-01T08:00:00.003+08:00| 8.723291723972645E-8| +|1970-01-01T08:00:00.004+08:00| 19.999999960195904| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| +|1970-01-01T08:00:00.006+08:00| 3.2260694930700566E-7| +|1970-01-01T08:00:00.007+08:00| 8.723291605373329E-8| +|1970-01-01T08:00:00.008+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.009+08:00| 1.2727110997246171E-8| +|1970-01-01T08:00:00.010+08:00|1.9852334701272664E-23| +|1970-01-01T08:00:00.011+08:00| 1.2727111194499847E-8| +|1970-01-01T08:00:00.012+08:00| 1.108657103979944E-7| +|1970-01-01T08:00:00.013+08:00| 8.723291785769131E-8| +|1970-01-01T08:00:00.014+08:00| 3.226069493070057E-7| +|1970-01-01T08:00:00.015+08:00| 9.999999850988388| +|1970-01-01T08:00:00.016+08:00| 19.999999960195904| +|1970-01-01T08:00:00.017+08:00| 8.723291747109068E-8| +|1970-01-01T08:00:00.018+08:00| 2.3855207991018386E-7| +|1970-01-01T08:00:00.019+08:00| 1.2727112069910878E-8| ++-----------------------------+----------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, there are peaks in $k=4$ and $k=5$ of the output. 
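
To reproduce these magnitudes outside IoTDB, a plain O(N²) discrete Fourier transform is enough to see the two peaks; the sketch below (with illustrative names) is only for checking the example and is not the FFT algorithm used by the UDF.

```java
// Naive O(N^2) DFT magnitude, for cross-checking the example above; not the UDF's FFT code.
public class DftSketch {
    static double[] magnitude(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);   // same quantity as the default 'result'='abs'
        }
        return mag;
    }

    public static void main(String[] args) {
        int n = 20;
        double[] x = new double[n];
        for (int t = 0; t < n; t++) {
            x[t] = Math.sin(2 * Math.PI * t / 4.0) + 2 * Math.sin(2 * Math.PI * t / 5.0);
        }
        double[] mag = magnitude(x);
        System.out.printf("k=4: %.3f, k=5: %.3f%n", mag[4], mag[5]);
        // prints roughly 20 and 10, the two peaks visible in the example output above
    }
}
```
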

##### Uniform FFT with Compression

Input series is the same as above, the SQL for query is shown below:

```sql
select fft(s1, 'result'='real', 'compress'='0.99'), fft(s1, 'result'='imag','compress'='0.99') from root.test.d1
```

Output series:

```
+-----------------------------+----------------------+----------------------+
|                         Time|  fft(root.test.d1.s1,|  fft(root.test.d1.s1,|
|                             |      "result"="real",|      "result"="imag",|
|                             |    "compress"="0.99")|    "compress"="0.99")|
+-----------------------------+----------------------+----------------------+
|1970-01-01T08:00:00.000+08:00|                   0.0|                   0.0|
|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8|
|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7|
|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8|  5.127422242345858E-8|
|1970-01-01T08:00:00.004+08:00|    19.021130288047125|    -6.180339875198807|
|1970-01-01T08:00:00.005+08:00|     9.999999850988388| 3.501852745067114E-16|
|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8|
+-----------------------------+----------------------+----------------------+
```

Note: Because of the conjugate symmetry of the Fourier transform result, only the first half of the compressed result is kept.
According to the given parameter, data points are kept from low frequency to high frequency until the retained energy ratio exceeds it.
The last data point is kept to indicate the length of the original series.

### HighPass

#### Usage

This function performs high-pass filtering on the input series and extracts components above the cutoff frequency.
The timestamps of input will be ignored and all data points will be regarded as equidistant.

**Name:** HIGHPASS

**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

**Parameters:**

+ `wpass`: The normalized cutoff frequency, which takes values in (0,1). This parameter is required.

**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input.

**Note:** `NaN` in the input series will be ignored.
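
One way to picture the effect of `wpass` is an ideal (brick-wall) filter in the frequency domain: transform, zero every bin whose normalized frequency is below the cutoff, and transform back. The sketch below illustrates that idea on the example signal; it is an assumption-level illustration, not necessarily the filter design used inside the UDF. The complementary LOWPASS function described later keeps the bins below the cutoff instead.

```java
// Sketch of an ideal (brick-wall) high-pass filter in the frequency domain, illustration only.
// 'wpass' is the cutoff normalized so that 1 corresponds to the Nyquist frequency.
public class HighPassSketch {
    static double[] highpass(double[] x, double wpass) {
        int n = x.length;
        double[] re = new double[n], im = new double[n];
        for (int k = 0; k < n; k++) {                    // forward DFT
            for (int t = 0; t < n; t++) {
                double a = -2.0 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(a);
                im[k] += x[t] * Math.sin(a);
            }
        }
        for (int k = 0; k < n; k++) {                    // zero the low-frequency bins
            double freq = 2.0 * Math.min(k, n - k) / n;  // normalized frequency of bin k
            if (freq < wpass) { re[k] = 0.0; im[k] = 0.0; }
        }
        double[] y = new double[n];
        for (int t = 0; t < n; t++) {                    // inverse DFT, real part
            for (int k = 0; k < n; k++) {
                double a = 2.0 * Math.PI * k * t / n;
                y[t] += (re[k] * Math.cos(a) - im[k] * Math.sin(a)) / n;
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[] x = new double[20];
        for (int t = 0; t < 20; t++) {
            x[t] = Math.sin(2 * Math.PI * t / 4.0) + 2 * Math.sin(2 * Math.PI * t / 5.0);
        }
        double[] y = highpass(x, 0.45);
        System.out.printf("y[0]=%.4f y[1]=%.4f%n", y[0], y[1]);
        // approximately sin(2*pi*t/4): 0, 1, 0, -1, ... as in the example below
    }
}
```
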
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select highpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|highpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.9999999534830373| +|1970-01-01T08:00:01.000+08:00| 1.7462829277628608E-8| +|1970-01-01T08:00:02.000+08:00| -0.9999999593178128| +|1970-01-01T08:00:03.000+08:00| -4.1115269056426626E-8| +|1970-01-01T08:00:04.000+08:00| 0.9999999925494194| +|1970-01-01T08:00:05.000+08:00| 3.328126513330016E-8| +|1970-01-01T08:00:06.000+08:00| -1.0000000183304454| +|1970-01-01T08:00:07.000+08:00| 6.260191433311374E-10| +|1970-01-01T08:00:08.000+08:00| 1.0000000018134796| +|1970-01-01T08:00:09.000+08:00| -3.097210911744423E-17| +|1970-01-01T08:00:10.000+08:00| -1.0000000018134794| +|1970-01-01T08:00:11.000+08:00| -6.260191627862097E-10| +|1970-01-01T08:00:12.000+08:00| 1.0000000183304454| +|1970-01-01T08:00:13.000+08:00| -3.328126501424346E-8| +|1970-01-01T08:00:14.000+08:00| -0.9999999925494196| +|1970-01-01T08:00:15.000+08:00| 4.111526915498874E-8| +|1970-01-01T08:00:16.000+08:00| 0.9999999593178128| +|1970-01-01T08:00:17.000+08:00| -1.7462829341296528E-8| +|1970-01-01T08:00:18.000+08:00| -0.9999999534830369| +|1970-01-01T08:00:19.000+08:00| -1.035237222742873E-16| ++-----------------------------+-----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=sin(2\pi t/4)$ after high-pass filtering. + +### IFFT + +#### Usage + +This function treats the two input series as the real and imaginary part of a complex series, performs an inverse fast Fourier transform (IFFT), and outputs the real part of the result. +For the input format, please refer to the output format of `FFT` function. +Moreover, the compressed output of `FFT` function is also supported. + +**Name:** IFFT + +**Input:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `start`: The start time of the output series with the format 'yyyy-MM-dd HH:mm:ss'. By default, it is '1970-01-01 08:00:00'. ++ `interval`: The interval of the output series, which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it is 1s. 
+ +**Output:** Output a single series. The type is DOUBLE. It is strictly equispaced. The values are the results of IFFT. + +**Note:** If a row contains null points or `NaN`, it will be ignored. + +#### Examples + + +Input series: + +``` ++-----------------------------+----------------------+----------------------+ +| Time| root.test.d1.re| root.test.d1.im| ++-----------------------------+----------------------+----------------------+ +|1970-01-01T08:00:00.000+08:00| 0.0| 0.0| +|1970-01-01T08:00:00.001+08:00| -3.932894010461041E-9| 1.2104201863039066E-8| +|1970-01-01T08:00:00.002+08:00|-1.4021739447490164E-7| 1.9299268669082926E-7| +|1970-01-01T08:00:00.003+08:00| -7.057291240286645E-8| 5.127422242345858E-8| +|1970-01-01T08:00:00.004+08:00| 19.021130288047125| -6.180339875198807| +|1970-01-01T08:00:00.005+08:00| 9.999999850988388| 3.501852745067114E-16| +|1970-01-01T08:00:00.019+08:00| -3.932894898639461E-9|-1.2104202549376264E-8| ++-----------------------------+----------------------+----------------------+ +``` + + +SQL for query: + +```sql +select ifft(re, im, 'interval'='1m', 'start'='2021-01-01 00:00:00') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------+ +| Time|ifft(root.test.d1.re, root.test.d1.im, "interval"="1m",| +| | "start"="2021-01-01 00:00:00")| ++-----------------------------+-------------------------------------------------------+ +|2021-01-01T00:00:00.000+08:00| 2.902112992431231| +|2021-01-01T00:01:00.000+08:00| 1.1755704705132448| +|2021-01-01T00:02:00.000+08:00| -2.175570513757101| +|2021-01-01T00:03:00.000+08:00| -1.9021130389094498| +|2021-01-01T00:04:00.000+08:00| 0.9999999925494194| +|2021-01-01T00:05:00.000+08:00| 1.902113046743454| +|2021-01-01T00:06:00.000+08:00| 0.17557053610884188| +|2021-01-01T00:07:00.000+08:00| -1.1755704886020932| +|2021-01-01T00:08:00.000+08:00| -0.9021130371347148| +|2021-01-01T00:09:00.000+08:00| 3.552713678800501E-16| +|2021-01-01T00:10:00.000+08:00| 0.9021130371347154| +|2021-01-01T00:11:00.000+08:00| 1.1755704886020932| +|2021-01-01T00:12:00.000+08:00| -0.17557053610884144| +|2021-01-01T00:13:00.000+08:00| -1.902113046743454| +|2021-01-01T00:14:00.000+08:00| -0.9999999925494196| +|2021-01-01T00:15:00.000+08:00| 1.9021130389094498| +|2021-01-01T00:16:00.000+08:00| 2.1755705137571004| +|2021-01-01T00:17:00.000+08:00| -1.1755704705132448| +|2021-01-01T00:18:00.000+08:00| -2.902112992431231| +|2021-01-01T00:19:00.000+08:00| -3.552713678800501E-16| ++-----------------------------+-------------------------------------------------------+ +``` + +### LowPass + +#### Usage + +This function performs low-pass filtering on the input series and extracts components below the cutoff frequency. +The timestamps of input will be ignored and all data points will be regarded as equidistant. + +**Name:** LOWPASS + +**Input:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `wpass`: The normalized cutoff frequency which values (0,1). This parameter cannot be lacked. + +**Output:** Output a single series. The type is DOUBLE. It is the input after filtering. The length and timestamps of output are the same as the input. + +**Note:** `NaN` in the input series will be ignored. 
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s1| ++-----------------------------+---------------+ +|1970-01-01T08:00:00.000+08:00| 2.902113| +|1970-01-01T08:00:01.000+08:00| 1.1755705| +|1970-01-01T08:00:02.000+08:00| -2.1755705| +|1970-01-01T08:00:03.000+08:00| -1.9021131| +|1970-01-01T08:00:04.000+08:00| 1.0| +|1970-01-01T08:00:05.000+08:00| 1.9021131| +|1970-01-01T08:00:06.000+08:00| 0.1755705| +|1970-01-01T08:00:07.000+08:00| -1.1755705| +|1970-01-01T08:00:08.000+08:00| -0.902113| +|1970-01-01T08:00:09.000+08:00| 0.0| +|1970-01-01T08:00:10.000+08:00| 0.902113| +|1970-01-01T08:00:11.000+08:00| 1.1755705| +|1970-01-01T08:00:12.000+08:00| -0.1755705| +|1970-01-01T08:00:13.000+08:00| -1.9021131| +|1970-01-01T08:00:14.000+08:00| -1.0| +|1970-01-01T08:00:15.000+08:00| 1.9021131| +|1970-01-01T08:00:16.000+08:00| 2.1755705| +|1970-01-01T08:00:17.000+08:00| -1.1755705| +|1970-01-01T08:00:18.000+08:00| -2.902113| +|1970-01-01T08:00:19.000+08:00| 0.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select lowpass(s1,'wpass'='0.45') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+----------------------------------------+ +| Time|lowpass(root.test.d1.s1, "wpass"="0.45")| ++-----------------------------+----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 1.9021130073323922| +|1970-01-01T08:00:01.000+08:00| 1.1755704705132448| +|1970-01-01T08:00:02.000+08:00| -1.1755705286582614| +|1970-01-01T08:00:03.000+08:00| -1.9021130389094498| +|1970-01-01T08:00:04.000+08:00| 7.450580419288145E-9| +|1970-01-01T08:00:05.000+08:00| 1.902113046743454| +|1970-01-01T08:00:06.000+08:00| 1.1755705212076808| +|1970-01-01T08:00:07.000+08:00| -1.1755704886020932| +|1970-01-01T08:00:08.000+08:00| -1.9021130222335536| +|1970-01-01T08:00:09.000+08:00| 3.552713678800501E-16| +|1970-01-01T08:00:10.000+08:00| 1.9021130222335536| +|1970-01-01T08:00:11.000+08:00| 1.1755704886020932| +|1970-01-01T08:00:12.000+08:00| -1.1755705212076801| +|1970-01-01T08:00:13.000+08:00| -1.902113046743454| +|1970-01-01T08:00:14.000+08:00| -7.45058112983088E-9| +|1970-01-01T08:00:15.000+08:00| 1.9021130389094498| +|1970-01-01T08:00:16.000+08:00| 1.1755705286582616| +|1970-01-01T08:00:17.000+08:00| -1.1755704705132448| +|1970-01-01T08:00:18.000+08:00| -1.9021130073323924| +|1970-01-01T08:00:19.000+08:00| -2.664535259100376E-16| ++-----------------------------+----------------------------------------+ +``` + +Note: The input is $y=sin(2\pi t/4)+2sin(2\pi t/5)$ with a length of 20. Thus, the output is $y=2sin(2\pi t/5)$ after low-pass filtering. + + + +## Data Matching + +### Cov + +#### Usage + +This function is used to calculate the population covariance. + +**Name:** COV + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the population covariance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
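
The covariance in the example below can be checked by hand over the pairwise-complete rows. The stand-alone sketch that follows (names are illustrative; it is not the UDF source) applies the population formula E[XY] − E[X]E[Y] after skipping rows that contain null or NaN, as described in the notes above.

```java
// Population covariance over pairwise-complete rows: rows with null or NaN are skipped.
// Returns NaN when every row is skipped.
public class CovSketch {
    static double cov(Double[] x, Double[] y) {
        double sx = 0, sy = 0, sxy = 0;
        int n = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == null || y[i] == null || x[i].isNaN() || y[i].isNaN()) continue;
            sx += x[i]; sy += y[i]; sxy += x[i] * y[i];
            n++;
        }
        if (n == 0) return Double.NaN;
        return sxy / n - (sx / n) * (sy / n);   // E[XY] - E[X]E[Y]
    }

    public static void main(String[] args) {
        Double[] s1 = {100.0, 101.0, 102.0, 104.0, 126.0, 108.0, null, 112.0,
                       113.0, 114.0, 116.0, 118.0, 100.0, 124.0, 126.0, Double.NaN};
        Double[] s2 = {101.0, null, 101.0, 102.0, 102.0, 103.0, 103.0, 104.0,
                       null, 104.0, 105.0, 105.0, 106.0, 108.0, 108.0, 108.0};
        System.out.println(cov(s1, s2));   // about 12.29, as in the example below
    }
}
```
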
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select cov(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|cov(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 12.291666666666666| ++-----------------------------+-------------------------------------+ +``` + +### DTW + +#### Usage + +This function is used to calculate the DTW distance between two input series. + +**Name:** DTW + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the DTW distance. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `0` will be output. 
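
The value in the example below can be reproduced with the textbook dynamic-programming recurrence, using the absolute difference as the point-wise cost and no path normalization. The sketch is an illustration only, not the UDF source.

```java
// Classic DTW with |a - b| as the point-wise cost and no normalization.
// For two constant series of length 20 (1.0 vs 2.0) this returns 20.0, as in the example below.
public class DtwSketch {
    static double dtw(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] row : d) java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double cost = Math.abs(a[i - 1] - b[j - 1]);
                d[i][j] = cost + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        double[] a = new double[20], b = new double[20];
        java.util.Arrays.fill(a, 1.0);
        java.util.Arrays.fill(b, 2.0);
        System.out.println(dtw(a, b));   // 20.0
    }
}
```
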
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|1970-01-01T08:00:00.001+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.002+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.003+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.004+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.005+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.006+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.007+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.008+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.009+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.010+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.011+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.012+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.013+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.014+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.015+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.016+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.017+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.018+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.019+08:00| 1.0| 2.0| +|1970-01-01T08:00:00.020+08:00| 1.0| 2.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select dtw(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------+ +| Time|dtw(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 20.0| ++-----------------------------+-------------------------------------+ +``` + +### Pearson + +#### Usage + +This function is used to calculate the Pearson Correlation Coefficient. + +**Name:** PEARSON + +**Input Series:** Only support two input series. The types are both INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the Pearson Correlation Coefficient. + +**Note:** + ++ If a row contains missing points, null points or `NaN`, it will be ignored; ++ If all rows are ignored, `NaN` will be output. 
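
As with COV, the coefficient can be recomputed over the pairwise-complete rows; the sketch below (illustrative names, not the UDF source) divides the sample covariance by the product of the two standard deviations.

```java
// Pearson correlation over pairwise-complete rows (null / NaN rows skipped, as noted above).
public class PearsonSketch {
    static double pearson(Double[] x, Double[] y) {
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        int n = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == null || y[i] == null || x[i].isNaN() || y[i].isNaN()) continue;
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
            n++;
        }
        if (n == 0) return Double.NaN;
        double cov = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n;
        double varY = syy - sy * sy / n;
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        Double[] s1 = {100.0, 101.0, 102.0, 104.0, 126.0, 108.0, null, 112.0,
                       113.0, 114.0, 116.0, 118.0, 100.0, 124.0, 126.0, Double.NaN};
        Double[] s2 = {101.0, null, 101.0, 102.0, 102.0, 103.0, 103.0, 104.0,
                       null, 104.0, 105.0, 105.0, 106.0, 108.0, 108.0, 108.0};
        System.out.println(pearson(s1, s2));   // about 0.563, as in the example below
    }
}
```
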
+ + +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d2.s1|root.test.d2.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| 101.0| +|2020-01-01T00:00:03.000+08:00| 101.0| null| +|2020-01-01T00:00:04.000+08:00| 102.0| 101.0| +|2020-01-01T00:00:06.000+08:00| 104.0| 102.0| +|2020-01-01T00:00:08.000+08:00| 126.0| 102.0| +|2020-01-01T00:00:10.000+08:00| 108.0| 103.0| +|2020-01-01T00:00:12.000+08:00| null| 103.0| +|2020-01-01T00:00:14.000+08:00| 112.0| 104.0| +|2020-01-01T00:00:15.000+08:00| 113.0| null| +|2020-01-01T00:00:16.000+08:00| 114.0| 104.0| +|2020-01-01T00:00:18.000+08:00| 116.0| 105.0| +|2020-01-01T00:00:20.000+08:00| 118.0| 105.0| +|2020-01-01T00:00:22.000+08:00| 100.0| 106.0| +|2020-01-01T00:00:26.000+08:00| 124.0| 108.0| +|2020-01-01T00:00:28.000+08:00| 126.0| 108.0| +|2020-01-01T00:00:30.000+08:00| NaN| 108.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select pearson(s1,s2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------------------------+ +| Time|pearson(root.test.d2.s1, root.test.d2.s2)| ++-----------------------------+-----------------------------------------+ +|1970-01-01T08:00:00.000+08:00| 0.5630881927754872| ++-----------------------------+-----------------------------------------+ +``` + +### PtnSym + +#### Usage + +This function is used to find all symmetric subseries in the input whose degree of symmetry is less than the threshold. +The degree of symmetry is calculated by DTW. +The smaller the degree, the more symmetrical the series is. + +**Name:** PATTERNSYMMETRIC + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE + +**Parameter:** + ++ `window`: The length of the symmetric subseries. It's a positive integer and the default value is 10. ++ `threshold`: The threshold of the degree of symmetry. It's non-negative. Only the subseries whose degree of symmetry is below it will be output. By default, all subseries will be output. + + +**Output Series:** Output a single series. The type is DOUBLE. Each data point in the output series corresponds to a symmetric subseries. The output timestamp is the starting timestamp of the subseries and the output value is the degree of symmetry. 
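
A straightforward reading of the description above is to compare every window of length `window` with its own reversal using DTW and keep the windows whose degree of symmetry does not exceed the threshold. The sketch below follows that reading and reproduces the example in the next subsection, but it is only an interpretation for illustration and is not guaranteed to match the UDF's internal procedure.

```java
// For every window of length w, compute DTW between the window and its reversal;
// report windows whose degree of symmetry is not above the threshold.
// This follows the prose description above; the exact PATTERNSYMMETRIC procedure may differ.
public class PtnSymSketch {
    static double dtw(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] d = new double[n + 1][m + 1];
        for (double[] r : d) java.util.Arrays.fill(r, Double.POSITIVE_INFINITY);
        d[0][0] = 0.0;
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
                d[i][j] = Math.abs(a[i - 1] - b[j - 1])
                        + Math.min(d[i - 1][j - 1], Math.min(d[i - 1][j], d[i][j - 1]));
        return d[n][m];
    }

    public static void main(String[] args) {
        double[] s = {1, 2, 3, 2, 1, 1, 1, 1, 2, 3, 2, 1};
        int w = 5;
        double threshold = 0.0;   // like 'threshold'='0' in the example: keep exactly symmetric windows
        for (int start = 0; start + w <= s.length; start++) {
            double[] win = java.util.Arrays.copyOfRange(s, start, start + w);
            double[] rev = new double[w];
            for (int i = 0; i < w; i++) rev[i] = win[w - 1 - i];
            double degree = dtw(win, rev);
            if (degree <= threshold) {
                System.out.println("window starting at index " + start + ": " + degree);
            }
        }
        // Prints index 0 and index 7 with degree 0.0, matching the example below.
    }
}
```
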
+ +#### Example + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d1.s4| ++-----------------------------+---------------+ +|2021-01-01T12:00:00.000+08:00| 1.0| +|2021-01-01T12:00:01.000+08:00| 2.0| +|2021-01-01T12:00:02.000+08:00| 3.0| +|2021-01-01T12:00:03.000+08:00| 2.0| +|2021-01-01T12:00:04.000+08:00| 1.0| +|2021-01-01T12:00:05.000+08:00| 1.0| +|2021-01-01T12:00:06.000+08:00| 1.0| +|2021-01-01T12:00:07.000+08:00| 1.0| +|2021-01-01T12:00:08.000+08:00| 2.0| +|2021-01-01T12:00:09.000+08:00| 3.0| +|2021-01-01T12:00:10.000+08:00| 2.0| +|2021-01-01T12:00:11.000+08:00| 1.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ptnsym(s4, 'window'='5', 'threshold'='0') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|ptnsym(root.test.d1.s4, "window"="5", "threshold"="0")| ++-----------------------------+------------------------------------------------------+ +|2021-01-01T12:00:00.000+08:00| 0.0| +|2021-01-01T12:00:07.000+08:00| 0.0| ++-----------------------------+------------------------------------------------------+ +``` + +### XCorr + +#### Usage + +This function is used to calculate the cross correlation function of given two time series. +For discrete time series, cross correlation is given by +$$CR(n) = \frac{1}{N} \sum_{m=1}^N S_1[m]S_2[m+n]$$ +which represent the similarities between two series with different index shifts. + +**Name:** XCORR + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Output Series:** Output a single series with DOUBLE as datatype. +There are $2N-1$ data points in the series, the center of which represents the cross correlation +calculated with pre-aligned series(that is $CR(0)$ in the formula above), +and the previous(or post) values represent those with shifting the latter series forward(or backward otherwise) +until the two series are no longer overlapped(not included). +In short, the values of output series are given by(index starts from 1) +$$OS[i] = CR(-N+i) = \frac{1}{N} \sum_{m=1}^{i} S_1[m]S_2[N-i+m],\ if\ i <= N$$ +$$OS[i] = CR(i-N) = \frac{1}{N} \sum_{m=1}^{2N-i} S_1[i-N+m]S_2[m],\ if\ i > N$$ + +**Note:** + ++ `null` and `NaN` values in the input series will be ignored and treated as 0. 
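
The two formulas above translate directly into code once null and NaN have been replaced by 0. The sketch below reproduces the example that follows; it is an illustration with made-up names, not the UDF source.

```java
// Direct implementation of the cross correlation formulas above.
// null and NaN are replaced by 0 before the computation, as stated in the note.
public class XCorrSketch {
    static double[] xcorr(Double[] s1, Double[] s2) {
        int n = s1.length;                       // both series aligned to the same n rows
        double[] a = new double[n], b = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = (s1[i] == null || s1[i].isNaN()) ? 0.0 : s1[i];
            b[i] = (s2[i] == null || s2[i].isNaN()) ? 0.0 : s2[i];
        }
        double[] out = new double[2 * n - 1];
        for (int i = 1; i <= 2 * n - 1; i++) {   // 1-based index, as in the formulas
            double sum = 0.0;
            if (i <= n) {
                for (int m = 1; m <= i; m++) sum += a[m - 1] * b[n - i + m - 1];
            } else {
                for (int m = 1; m <= 2 * n - i; m++) sum += a[i - n + m - 1] * b[m - 1];
            }
            out[i - 1] = sum / n;
        }
        return out;
    }

    public static void main(String[] args) {
        Double[] s1 = {null, 2.0, 3.0, 4.0, 5.0};
        Double[] s2 = {6.0, 7.0, Double.NaN, 9.0, 10.0};
        System.out.println(java.util.Arrays.toString(xcorr(s1, s2)));
        // [0.0, 4.0, 9.6, 13.4, 20.0, 15.6, 9.2, 11.8, 6.0] -- same values as the example below
    }
}
```
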
+ +#### Examples + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| null| 6| +|2020-01-01T00:00:02.000+08:00| 2| 7| +|2020-01-01T00:00:03.000+08:00| 3| NaN| +|2020-01-01T00:00:04.000+08:00| 4| 9| +|2020-01-01T00:00:05.000+08:00| 5| 10| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 +``` + +Output series: + +``` ++-----------------------------+---------------------------------------+ +| Time|xcorr(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+---------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.0| +|1970-01-01T08:00:00.002+08:00| 4.0| +|1970-01-01T08:00:00.003+08:00| 9.6| +|1970-01-01T08:00:00.004+08:00| 13.4| +|1970-01-01T08:00:00.005+08:00| 20.0| +|1970-01-01T08:00:00.006+08:00| 15.6| +|1970-01-01T08:00:00.007+08:00| 9.2| +|1970-01-01T08:00:00.008+08:00| 11.8| +|1970-01-01T08:00:00.009+08:00| 6.0| ++-----------------------------+---------------------------------------+ +``` + + + +## Data Repairing + +### TimestampRepair + +This function is used for timestamp repair. +According to the given standard time interval, +the method of minimizing the repair cost is adopted. +By fine-tuning the timestamps, +the original data with unstable timestamp interval is repaired to strictly equispaced data. +If no standard time interval is given, +this function will use the **median**, **mode** or **cluster** of the time interval to estimate the standard time interval. + +**Name:** TIMESTAMPREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `interval`: The standard time interval whose unit is millisecond. It is a positive integer. By default, it will be estimated according to the given method. ++ `method`: The method to estimate the standard time interval, which is 'median', 'mode' or 'cluster'. This parameter is only valid when `interval` is not given. By default, median will be used. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +#### Examples + +##### Manually Specify the Standard Time Interval + +When `interval` is given, this function repairs according to the given standard time interval. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:19.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:01.000+08:00| 7.0| +|2021-07-01T12:01:11.000+08:00| 8.0| +|2021-07-01T12:01:21.000+08:00| 9.0| +|2021-07-01T12:01:31.000+08:00| 10.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select timestamprepair(s1,'interval'='10000') from root.test.d2 +``` + +Output series: + + +``` ++-----------------------------+----------------------------------------------------+ +| Time|timestamprepair(root.test.d2.s1, "interval"="10000")| ++-----------------------------+----------------------------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+----------------------------------------------------+ +``` + +##### Automatically Estimate the Standard Time Interval + +When `interval` is default, this function estimates the standard time interval. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select timestamprepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------+ +| Time|timestamprepair(root.test.d2.s1)| ++-----------------------------+--------------------------------+ +|2021-07-01T12:00:00.000+08:00| 1.0| +|2021-07-01T12:00:10.000+08:00| 2.0| +|2021-07-01T12:00:20.000+08:00| 3.0| +|2021-07-01T12:00:30.000+08:00| 4.0| +|2021-07-01T12:00:40.000+08:00| 5.0| +|2021-07-01T12:00:50.000+08:00| 6.0| +|2021-07-01T12:01:00.000+08:00| 7.0| +|2021-07-01T12:01:10.000+08:00| 8.0| +|2021-07-01T12:01:20.000+08:00| 9.0| +|2021-07-01T12:01:30.000+08:00| 10.0| ++-----------------------------+--------------------------------+ +``` + +### ValueFill + +#### Usage + +This function is used to impute time series. Several methods are supported. + +**Name**: ValueFill +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: {"mean", "previous", "linear", "likelihood", "AR", "MA", "SCREEN"}, default "linear". + Method to use for imputation in series. "mean": use global mean value to fill holes; "previous": propagate last valid observation forward to next valid. "linear": simplest interpolation method; "likelihood":Maximum likelihood estimation based on the normal distribution of speed; "AR": auto regression; "MA": moving average; "SCREEN": speed constraint. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** AR method use AR(1) model. Input value should be auto-correlated, or the function would output a single point (0, 0.0). + +#### Examples + +##### Fill with linear + +When `method` is "linear" or the default, Screen method is used to impute. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| NaN| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| NaN| +|2020-01-01T00:00:22.000+08:00| NaN| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuefill(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-----------------------+ +| Time|valuefill(root.test.d2)| ++-----------------------------+-----------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 108.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.7| +|2020-01-01T00:00:22.000+08:00| 121.3| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-----------------------+ +``` + +##### Previous Fill + +When `method` is "previous", previous method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select valuefill(s1,"method"="previous") from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------+ +| Time|valuefill(root.test.d2,"method"="previous")| ++-----------------------------+-------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| NaN| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 110.5| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 116.0| +|2020-01-01T00:00:22.000+08:00| 116.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+-------------------------------------------+ +``` + +### ValueRepair + +#### Usage + +This function is used to repair the value of the time series. +Currently, two methods are supported: +**Screen** is a method based on speed threshold, which makes all speeds meet the threshold requirements under the premise of minimum changes; +**LsGreedy** is a method based on speed change likelihood, which models speed changes as Gaussian distribution, and uses a greedy algorithm to maximize the likelihood. + + +**Name:** VALUEREPAIR + +**Input Series:** Only support a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The method used to repair, which is 'Screen' or 'LsGreedy'. By default, Screen is used. 
++ `minSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds below it will be regarded as outliers. By default, it is the median minus 3 times of median absolute deviation. ++ `maxSpeed`: This parameter is only valid with Screen. It is the speed threshold. Speeds above it will be regarded as outliers. By default, it is the median plus 3 times of median absolute deviation. ++ `center`: This parameter is only valid with LsGreedy. It is the center of the Gaussian distribution of speed changes. By default, it is 0. ++ `sigma`: This parameter is only valid with LsGreedy. It is the standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +#### Examples + +##### Repair with Screen + +When `method` is 'Screen' or the default, Screen method is used. + +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 126.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 100.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| NaN| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select valuerepair(s1) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+----------------------------+ +| Time|valuerepair(root.test.d2.s1)| ++-----------------------------+----------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:03.000+08:00| 101.0| +|2020-01-01T00:00:04.000+08:00| 102.0| +|2020-01-01T00:00:06.000+08:00| 104.0| +|2020-01-01T00:00:08.000+08:00| 106.0| +|2020-01-01T00:00:10.000+08:00| 108.0| +|2020-01-01T00:00:14.000+08:00| 112.0| +|2020-01-01T00:00:15.000+08:00| 113.0| +|2020-01-01T00:00:16.000+08:00| 114.0| +|2020-01-01T00:00:18.000+08:00| 116.0| +|2020-01-01T00:00:20.000+08:00| 118.0| +|2020-01-01T00:00:22.000+08:00| 120.0| +|2020-01-01T00:00:26.000+08:00| 124.0| +|2020-01-01T00:00:28.000+08:00| 126.0| +|2020-01-01T00:00:30.000+08:00| 128.0| ++-----------------------------+----------------------------+ +``` + +##### Repair with LsGreedy + +When `method` is 'LsGreedy', LsGreedy method is used. 

Input series is the same as above, the SQL for query is shown below:

```sql
select valuerepair(s1,'method'='LsGreedy') from root.test.d2
```

Output series:

```
+-----------------------------+-------------------------------------------------+
|                         Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")|
+-----------------------------+-------------------------------------------------+
|2020-01-01T00:00:02.000+08:00|                                            100.0|
|2020-01-01T00:00:03.000+08:00|                                            101.0|
|2020-01-01T00:00:04.000+08:00|                                            102.0|
|2020-01-01T00:00:06.000+08:00|                                            104.0|
|2020-01-01T00:00:08.000+08:00|                                            106.0|
|2020-01-01T00:00:10.000+08:00|                                            108.0|
|2020-01-01T00:00:14.000+08:00|                                            112.0|
|2020-01-01T00:00:15.000+08:00|                                            113.0|
|2020-01-01T00:00:16.000+08:00|                                            114.0|
|2020-01-01T00:00:18.000+08:00|                                            116.0|
|2020-01-01T00:00:20.000+08:00|                                            118.0|
|2020-01-01T00:00:22.000+08:00|                                            120.0|
|2020-01-01T00:00:26.000+08:00|                                            124.0|
|2020-01-01T00:00:28.000+08:00|                                            126.0|
|2020-01-01T00:00:30.000+08:00|                                            128.0|
+-----------------------------+-------------------------------------------------+
```

### MasterRepair

#### Usage

This function is used to clean time series with master data.

**Name:** MasterRepair

**Input Series:** Support multiple input series. The types are INT32 / INT64 / FLOAT / DOUBLE.

**Parameters:**

+ `omega`: The window size. It is a non-negative integer whose unit is millisecond. By default, it will be estimated according to the distances of two tuples with various time differences.
+ `eta`: The distance threshold. It is a positive number. By default, it will be estimated according to the distance distribution of tuples in windows.
+ `k`: The number of neighbors in master data. It is a positive integer. By default, it will be estimated according to the tuple distance of the k-th nearest neighbor in the master data.
+ `output_column`: The repaired column to output, defaults to 1, which means outputting the repair result of the first column.

**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing.
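
As a rough intuition for what cleaning against master data means, the deliberately simplified sketch below replaces any tuple whose distance to its closest master tuple exceeds `eta` with that master tuple. It ignores the `omega` window and the k-nearest-neighbour logic entirely, so it is an assumption-heavy illustration of the idea rather than the behaviour of the MasterRepair UDF itself.

```java
// Simplified illustration of master-data-based cleaning: each tuple whose distance to the
// closest master tuple exceeds eta is replaced by that closest master tuple.
// The real UDF also uses the omega window and k nearest neighbours; this sketch does not.
public class MasterRepairSketch {
    static double distance(double[] t, double[] m) {
        double s = 0.0;
        for (int i = 0; i < t.length; i++) s += (t[i] - m[i]) * (t[i] - m[i]);
        return Math.sqrt(s);
    }

    static double[][] repair(double[][] data, double[][] master, double eta) {
        double[][] repaired = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            double best = Double.POSITIVE_INFINITY;
            double[] nearest = null;
            for (double[] m : master) {
                double d = distance(data[r], m);
                if (d < best) { best = d; nearest = m; }
            }
            repaired[r] = best > eta ? nearest.clone() : data[r].clone();
        }
        return repaired;
    }

    public static void main(String[] args) {
        double[][] data   = {{1704, 1154.55, 0.195}, {1694, 1151.55, 0.193}};
        double[][] master = {{1704, 1154.55, 0.195}, {1704, 1151.55, 0.193}};
        double[][] out = repair(data, master, 3.0);
        System.out.println(out[1][0]);   // 1704.0: the drifted value 1694 snaps to the master tuple
    }
}
```
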
+ +#### Examples + +Input series: + +``` ++-----------------------------+------------+------------+------------+------------+------------+------------+ +| Time|root.test.t1|root.test.t2|root.test.t3|root.test.m1|root.test.m2|root.test.m3| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +|2021-07-01T12:00:01.000+08:00| 1704| 1154.55| 0.195| 1704| 1154.55| 0.195| +|2021-07-01T12:00:02.000+08:00| 1702| 1152.30| 0.193| 1702| 1152.30| 0.193| +|2021-07-01T12:00:03.000+08:00| 1702| 1148.65| 0.192| 1702| 1148.65| 0.192| +|2021-07-01T12:00:04.000+08:00| 1701| 1145.20| 0.194| 1701| 1145.20| 0.194| +|2021-07-01T12:00:07.000+08:00| 1703| 1150.55| 0.195| 1703| 1150.55| 0.195| +|2021-07-01T12:00:08.000+08:00| 1694| 1151.55| 0.193| 1704| 1151.55| 0.193| +|2021-07-01T12:01:09.000+08:00| 1705| 1153.55| 0.194| 1705| 1153.55| 0.194| +|2021-07-01T12:01:10.000+08:00| 1706| 1152.30| 0.190| 1706| 1152.30| 0.190| ++-----------------------------+------------+------------+------------+------------+------------+------------+ +``` + +SQL for query: + +```sql +select MasterRepair(t1,t2,t3,m1,m2,m3) from root.test +``` + +Output series: + + +``` ++-----------------------------+-------------------------------------------------------------------------------------------+ +| Time|MasterRepair(root.test.t1,root.test.t2,root.test.t3,root.test.m1,root.test.m2,root.test.m3)| ++-----------------------------+-------------------------------------------------------------------------------------------+ +|2021-07-01T12:00:01.000+08:00| 1704| +|2021-07-01T12:00:02.000+08:00| 1702| +|2021-07-01T12:00:03.000+08:00| 1702| +|2021-07-01T12:00:04.000+08:00| 1701| +|2021-07-01T12:00:07.000+08:00| 1703| +|2021-07-01T12:00:08.000+08:00| 1704| +|2021-07-01T12:01:09.000+08:00| 1705| +|2021-07-01T12:01:10.000+08:00| 1706| ++-----------------------------+-------------------------------------------------------------------------------------------+ +``` + +### SeasonalRepair + +#### Usage +This function is used to repair the value of the seasonal time series via decomposition. Currently, two methods are supported: **Classical** - detect irregular fluctuations through residual component decomposed by classical decomposition, and repair them through moving average; **Improved** - detect irregular fluctuations through residual component decomposed by improved decomposition, and repair them through moving median. + +**Name:** SEASONALREPAIR + +**Input Series:** Only support a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + ++ `method`: The decomposition method used to repair, which is 'Classical' or 'Improved'. By default, classical decomposition is used. ++ `period`: It is the period of the time series. ++ `k`: It is the range threshold of residual term, which limits the degree to which the residual term is off-center. By default, it is 9. ++ `max_iter`: It is the maximum number of iterations for the algorithm. By default, it is 10. + +**Output Series:** Output a single series. The type is the same as the input. This series is the input after repairing. + +**Note:** `NaN` will be filled with linear interpolation before repairing. + +#### Examples + +##### Repair with Classical + +When `method` is 'Classical' or default value, classical decomposition method is used. 
+ +Input series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d2.s1| ++-----------------------------+---------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 101.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select seasonalrepair(s1,'period'=3,'k'=2) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------+ +| Time|seasonalrepair(root.test.d2.s1, 'period'=4, 'k'=2)| ++-----------------------------+--------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 87.0| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+--------------------------------------------------+ +``` + +##### Repair with Improved +When `method` is 'Improved', improved decomposition method is used. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 +``` + +Output series: + +``` ++-----------------------------+-------------------------------------------------------------+ +| Time|valuerepair(root.test.d2.s1, 'method'='improved', 'period'=3)| ++-----------------------------+-------------------------------------------------------------+ +|2020-01-01T00:00:02.000+08:00| 100.0| +|2020-01-01T00:00:04.000+08:00| 120.0| +|2020-01-01T00:00:06.000+08:00| 80.0| +|2020-01-01T00:00:08.000+08:00| 100.5| +|2020-01-01T00:00:10.000+08:00| 119.5| +|2020-01-01T00:00:12.000+08:00| 81.5| +|2020-01-01T00:00:14.000+08:00| 99.5| +|2020-01-01T00:00:16.000+08:00| 119.0| +|2020-01-01T00:00:18.000+08:00| 80.5| +|2020-01-01T00:00:20.000+08:00| 99.0| +|2020-01-01T00:00:22.000+08:00| 121.0| +|2020-01-01T00:00:24.000+08:00| 79.5| ++-----------------------------+-------------------------------------------------------------+ +``` + + + +## Series Discovery + +### ConsecutiveSequences + +#### Usage + +This function is used to find locally longest consecutive subsequences in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive subsequence is the subsequence that is strictly equispaced with the standard time interval without any missing data. If a consecutive subsequence is not a proper subsequence of any consecutive subsequence, it is locally longest. + +**Name:** CONSECUTIVESEQUENCES + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. 
+ +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a locally longest consecutive subsequence. The output timestamp is the starting timestamp of the subsequence and the output value is the number of data points in the subsequence. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +#### Examples + +##### Manually Specify the Standard Time Interval + +It's able to manually specify the standard time interval by the parameter `gap`. It's notable that false parameter leads to false output. + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivesequences(s1,s2,'gap'='5m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2, "gap"="5m")| ++-----------------------------+------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------------------+ +``` + + +##### Automatically Estimate the Standard Time Interval + +When `gap` is default, this function estimates the standard time interval by the mode of time intervals and gets the same results. Therefore, this usage is more recommended. + +Input series is the same as above, the SQL for query is shown below: + +```sql +select consecutivesequences(s1,s2) from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+------------------------------------------------------+ +| Time|consecutivesequences(root.test.d1.s1, root.test.d1.s2)| ++-----------------------------+------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 4| +|2020-01-01T00:45:00.000+08:00| 2| ++-----------------------------+------------------------------------------------------+ +``` + +### ConsecutiveWindows + +#### Usage + +This function is used to find consecutive windows of specified length in strictly equispaced multidimensional data. + +Strictly equispaced data is the data whose time intervals are strictly equal. Missing data, including missing rows and missing values, is allowed in it, while data redundancy and timestamp drift is not allowed. + +Consecutive window is the subsequence that is strictly equispaced with the standard time interval without any missing data. 
+ +**Name:** CONSECUTIVEWINDOWS + +**Input Series:** Support multiple input series. The type is arbitrary but the data is strictly equispaced. + +**Parameters:** + ++ `gap`: The standard time interval which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. By default, it will be estimated by the mode of time intervals. ++ `length`: The length of the window which is a positive number with an unit. The unit is 'ms' for millisecond, 's' for second, 'm' for minute, 'h' for hour and 'd' for day. This parameter cannot be lacked. + +**Output Series:** Output a single series. The type is INT32. Each data point in the output series corresponds to a consecutive window. The output timestamp is the starting timestamp of the window and the output value is the number of data points in the window. + +**Note:** For input series that is not strictly equispaced, there is no guarantee on the output. + +#### Examples + + +Input series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d1.s1|root.test.d1.s2| ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:05:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:10:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:20:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:25:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:30:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:35:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:40:00.000+08:00| 1.0| null| +|2020-01-01T00:45:00.000+08:00| 1.0| 1.0| +|2020-01-01T00:50:00.000+08:00| 1.0| 1.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 +``` + +Output series: + +``` ++-----------------------------+--------------------------------------------------------------------+ +| Time|consecutivewindows(root.test.d1.s1, root.test.d1.s2, "length"="10m")| ++-----------------------------+--------------------------------------------------------------------+ +|2020-01-01T00:00:00.000+08:00| 3| +|2020-01-01T00:20:00.000+08:00| 3| +|2020-01-01T00:25:00.000+08:00| 3| ++-----------------------------+--------------------------------------------------------------------+ +``` + + + +## Machine Learning + +### AR + +#### Usage + +This function is used to learn the coefficients of the autoregressive models for a time series. + +**Name:** AR + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `p`: The order of the autoregressive model. Its default value is 1. + +**Output Series:** Output a single series. The type is DOUBLE. The first line corresponds to the first order coefficient, and so on. + +**Note:** + +- Parameter `p` should be a positive integer. +- Most points in the series should be sampled at a constant time interval. +- Linear interpolation is applied for the missing points in the series. 
+ +#### Examples + +##### Assigning Model Order + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select ar(s0,"p"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+---------------------------+ +| Time|ar(root.test.d0.s0,"p"="2")| ++-----------------------------+---------------------------+ +|1970-01-01T08:00:00.001+08:00| 0.9429| +|1970-01-01T08:00:00.002+08:00| -0.2571| ++-----------------------------+---------------------------+ +``` + +### Representation + +#### Usage + +This function is used to represent a time series. + +**Name:** Representation + +**Input Series:** Only support a single input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is INT32. The length is `tb*vb`. The timestamps starting from 0 only indicate the order. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. + +#### Examples + +##### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+ +| Time|root.test.d0.s0| ++-----------------------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| +|2020-01-01T00:00:03.000+08:00| -2.0| +|2020-01-01T00:00:04.000+08:00| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| ++-----------------------------+---------------+ +``` + +SQL for query: + +```sql +select representation(s0,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-------------------------------------------------+ +| Time|representation(root.test.d0.s0,"tb"="3","vb"="2")| ++-----------------------------+-------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1| +|1970-01-01T08:00:00.002+08:00| 1| +|1970-01-01T08:00:00.003+08:00| 0| +|1970-01-01T08:00:00.004+08:00| 0| +|1970-01-01T08:00:00.005+08:00| 1| +|1970-01-01T08:00:00.006+08:00| 1| ++-----------------------------+-------------------------------------------------+ +``` + +### RM + +#### Usage + +This function is used to calculate the matching score of two time series according to the representation. + +**Name:** RM + +**Input Series:** Only support two input numeric series. The type is INT32 / INT64 / FLOAT / DOUBLE. + +**Parameters:** + +- `tb`: The number of timestamp blocks. Its default value is 10. +- `vb`: The number of value blocks. Its default value is 10. + +**Output Series:** Output a single series. The type is DOUBLE. There is only one data point in the series, whose timestamp is 0 and value is the matching score. + +**Note:** + +- Parameters `tb` and `vb` should be positive integers. 
+ +#### Examples + +##### Assigning Window Size and Dimension + +Input Series: + +``` ++-----------------------------+---------------+---------------+ +| Time|root.test.d0.s0|root.test.d0.s1 ++-----------------------------+---------------+---------------+ +|2020-01-01T00:00:01.000+08:00| -4.0| -4.0| +|2020-01-01T00:00:02.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:03.000+08:00| -3.0| -3.0| +|2020-01-01T00:00:04.000+08:00| -1.0| -1.0| +|2020-01-01T00:00:05.000+08:00| 0.0| 0.0| +|2020-01-01T00:00:06.000+08:00| 1.0| 1.0| +|2020-01-01T00:00:07.000+08:00| 2.0| 2.0| +|2020-01-01T00:00:08.000+08:00| 3.0| 3.0| +|2020-01-01T00:00:09.000+08:00| 4.0| 4.0| ++-----------------------------+---------------+---------------+ +``` + +SQL for query: + +```sql +select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 +``` + +Output Series: + +``` ++-----------------------------+-----------------------------------------------------+ +| Time|rm(root.test.d0.s0,root.test.d0.s1,"tb"="3","vb"="2")| ++-----------------------------+-----------------------------------------------------+ +|1970-01-01T08:00:00.001+08:00| 1.00| ++-----------------------------+-----------------------------------------------------+ +``` + diff --git a/src/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md b/src/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md index e6f62f5f..e1fc68de 100644 --- a/src/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md +++ b/src/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md @@ -913,7 +913,7 @@ For more details, see document [Operator-and-Expression](../User-Manual/Operator ### Arithmetic Operators -For details and examples, see the document [Arithmetic Operators and Functions](../Operators-Functions/Mathematical.md). +For details and examples, see the document [Arithmetic Operators and Functions](../User-Manual/Operator-and-Expression.md). ```sql select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 @@ -921,7 +921,7 @@ select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root ### Comparison Operators -For details and examples, see the document [Comparison Operators and Functions](../Operators-Functions/Comparison.md). +For details and examples, see the document [Comparison Operators and Functions](../User-Manual/Operator-and-Expression.md). ```sql # Basic comparison operators @@ -952,7 +952,7 @@ select a, a in (1, 2) from root.test; ### Logical Operators -For details and examples, see the document [Logical Operators](../Operators-Functions/Logical.md). +For details and examples, see the document [Logical Operators](../User-Manual/Operator-and-Expression.md). ```sql select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; @@ -964,7 +964,7 @@ For more details, see document [Operator-and-Expression](../User-Manual/Operator ### Aggregate Functions -For details and examples, see the document [Aggregate Functions](../Operators-Functions/Aggregation.md). +For details and examples, see the document [Aggregate Functions](../User-Manual/Operator-and-Expression.md). ```sql select count(status) from root.ln.wf01.wt01; @@ -977,7 +977,7 @@ select time_duration(s1) from root.db.d1; ### Arithmetic Functions -For details and examples, see the document [Arithmetic Operators and Functions](../Operators-Functions/Mathematical.md). +For details and examples, see the document [Arithmetic Operators and Functions](../User-Manual/Operator-and-Expression.md). 
```sql select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; @@ -986,7 +986,7 @@ select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1; ### Comparison Functions -For details and examples, see the document [Comparison Operators and Functions](../Operators-Functions/Comparison.md). +For details and examples, see the document [Comparison Operators and Functions](../User-Manual/Operator-and-Expression.md). ```sql select ts, on_off(ts, 'threshold'='2') from root.test; @@ -995,7 +995,7 @@ select ts, in_range(ts, 'lower'='2', 'upper'='3.1') from root.test; ### String Processing Functions -For details and examples, see the document [String Processing](../Operators-Functions/String.md). +For details and examples, see the document [String Processing](../User-Manual/Operator-and-Expression.md). ```sql select s1, string_contains(s1, 's'='warn') from root.sg1.d4; @@ -1023,7 +1023,7 @@ select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 ### Data Type Conversion Function -For details and examples, see the document [Data Type Conversion Function](../Operators-Functions/Conversion.md). +For details and examples, see the document [Data Type Conversion Function](../User-Manual/Operator-and-Expression.md). ```sql SELECT cast(s1 as INT32) from root.sg @@ -1031,7 +1031,7 @@ SELECT cast(s1 as INT32) from root.sg ### Constant Timeseries Generating Functions -For details and examples, see the document [Constant Timeseries Generating Functions](../Operators-Functions/Constant.md). +For details and examples, see the document [Constant Timeseries Generating Functions](../User-Manual/Operator-and-Expression.md). ```sql select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; @@ -1039,7 +1039,7 @@ select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from ### Selector Functions -For details and examples, see the document [Selector Functions](../Operators-Functions/Selection.md). +For details and examples, see the document [Selector Functions](../User-Manual/Operator-and-Expression.md). ```sql select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; @@ -1047,7 +1047,7 @@ select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time ### Continuous Interval Functions -For details and examples, see the document [Continuous Interval Functions](../Operators-Functions/Continuous-Interval.md). +For details and examples, see the document [Continuous Interval Functions](../User-Manual/Operator-and-Expression.md). ```sql select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; @@ -1055,7 +1055,7 @@ select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_durat ### Variation Trend Calculation Functions -For details and examples, see the document [Variation Trend Calculation Functions](../Operators-Functions/Variation-Trend.md). +For details and examples, see the document [Variation Trend Calculation Functions](../User-Manual/Operator-and-Expression.md). ```sql select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; @@ -1066,7 +1066,7 @@ SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root. ### Sample Functions -For details and examples, see the document [Sample Functions](../Operators-Functions/Sample.md). 
+For details and examples, see the document [Sample Functions](../User-Manual/Operator-and-Expression.md). ```sql select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; @@ -1080,7 +1080,7 @@ select M4(s1,'windowSize'='10') from root.vehicle.d1 ### Change Points Function -For details and examples, see the document [Time-Series](../Operators-Functions/Time-Series.md). +For details and examples, see the document [Time-Series](../User-Manual/Operator-and-Expression.md). ```sql select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 @@ -1092,7 +1092,7 @@ For more details, see document [Operator-and-Expression](../User-Manual/Operator ### Data Quality -For details and examples, see the document [Data-Quality](../Operators-Functions/Data-Quality.md). +For details and examples, see the document [Data-Quality](../User-Manual/Operator-and-Expression.md). ```sql # Completeness @@ -1117,7 +1117,7 @@ select Accuracy(t1,t2,t3,m1,m2,m3) from root.test ### Data Profiling -For details and examples, see the document [Data-Profiling](../Operators-Functions/Data-Profiling.md). +For details and examples, see the document [Data-Profiling](../User-Manual/Operator-and-Expression.md). ```sql # ACF @@ -1197,7 +1197,7 @@ select zscore(s1) from root.test ### Anomaly Detection -For details and examples, see the document [Anomaly-Detection](../Operators-Functions/Anomaly-Detection.md). +For details and examples, see the document [Anomaly-Detection](../User-Manual/Operator-and-Expression.md). ```sql # IQR @@ -1232,7 +1232,7 @@ select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3 ### Frequency Domain -For details and examples, see the document [Frequency-Domain](../Operators-Functions/Frequency-Domain.md). +For details and examples, see the document [Frequency-Domain](../User-Manual/Operator-and-Expression.md). ```sql # Conv @@ -1261,7 +1261,7 @@ select lowpass(s1,'wpass'='0.45') from root.test.d1 ### Data Matching -For details and examples, see the document [Data-Matching](../Operators-Functions/Data-Matching.md). +For details and examples, see the document [Data-Matching](../User-Manual/Operator-and-Expression.md). ```sql # Cov @@ -1282,7 +1282,7 @@ select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 ### Data Repairing -For details and examples, see the document [Data-Repairing](../Operators-Functions/Data-Repairing.md). +For details and examples, see the document [Data-Repairing](../User-Manual/Operator-and-Expression.md). ```sql # TimestampRepair @@ -1307,7 +1307,7 @@ select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 ### Series Discovery -For details and examples, see the document [Series-Discovery](../Operators-Functions/Series-Discovery.md). +For details and examples, see the document [Series-Discovery](../User-Manual/Operator-and-Expression.md). ```sql # ConsecutiveSequences @@ -1320,7 +1320,7 @@ select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 ### Machine Learning -For details and examples, see the document [Machine-Learning](../Operators-Functions/Machine-Learning.md). +For details and examples, see the document [Machine-Learning](../User-Manual/Operator-and-Expression.md). ```sql # AR @@ -1335,7 +1335,7 @@ select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 ## LAMBDA EXPRESSION -For details and examples, see the document [Lambda](../Operators-Functions/Lambda.md). 
+For details and examples, see the document [Lambda](../User-Manual/Operator-and-Expression.md). ```sql select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01;``` @@ -1343,7 +1343,7 @@ select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'exp ## CONDITIONAL EXPRESSION -For details and examples, see the document [Conditional Expressions](../Operators-Functions/Conditional.md). +For details and examples, see the document [Conditional Expressions](../User-Manual/Operator-and-Expression.md). ```sql select T, P, case diff --git a/src/UserGuide/V1.2.x/User-Manual/Database-Programming.md b/src/UserGuide/V1.2.x/User-Manual/Database-Programming.md index 32af4467..2620a77a 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Database-Programming.md +++ b/src/UserGuide/V1.2.x/User-Manual/Database-Programming.md @@ -1551,7 +1551,7 @@ There are 3 types of user permissions related to UDF: * `DROP_FUNCTION`: Only users with this permission are allowed to deregister UDFs * `READ_TIMESERIES`: Only users with this permission are allowed to use UDFs for queries -For more user permissions related content, please refer to [Account Management Statements](../Administration-Management/Administration.md). +For more user permissions related content, please refer to [Account Management Statements](../User-Manual/Authority-Management.md). diff --git a/src/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md b/src/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md index 40acd006..083b0286 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md +++ b/src/UserGuide/V1.2.x/User-Manual/Operator-and-Expression.md @@ -261,14 +261,14 @@ The functions in this function library are not built-in functions, and must be l ### Implemented Functions -1. Data Quality related functions, such as `Completeness`. For details and examples, see the document [Data-Quality](../Operators-Functions/Data-Quality.md). -2. Data Profiling related functions, such as `ACF`. For details and examples, see the document [Data-Profiling](../Operators-Functions/Data-Profiling.md). -3. Anomaly Detection related functions, such as `IQR`. For details and examples, see the document [Anomaly-Detection](../Operators-Functions/Anomaly-Detection.md). -4. Frequency Domain Analysis related functions, such as `Conv`. For details and examples, see the document [Frequency-Domain](../Operators-Functions/Frequency-Domain.md). -5. Data Matching related functions, such as `DTW`. For details and examples, see the document [Data-Matching](../Operators-Functions/Data-Matching.md). -6. Data Repairing related functions, such as `TimestampRepair`. For details and examples, see the document [Data-Repairing](../Operators-Functions/Data-Repairing.md). -7. Series Discovery related functions, such as `ConsecutiveSequences`. For details and examples, see the document [Series-Discovery](../Operators-Functions/Series-Discovery.md). -8. Machine Learning related functions, such as `AR`. For details and examples, see the document [Machine-Learning](../Operators-Functions/Machine-Learning.md). +1. Data Quality related functions, such as `Completeness`. 
For details and examples, see the document [Data-Quality](../Reference/UDF-Libraries.md#Data-Quality). +2. Data Profiling related functions, such as `ACF`. For details and examples, see the document [Data-Profiling](../Reference/UDF-Libraries.md#Data-Profiling). +3. Anomaly Detection related functions, such as `IQR`. For details and examples, see the document [Anomaly-Detection](../Reference/UDF-Libraries.md#Anomaly-Detection). +4. Frequency Domain Analysis related functions, such as `Conv`. For details and examples, see the document [Frequency-Domain](../Reference/UDF-Libraries.md#Frequency-Domain). +5. Data Matching related functions, such as `DTW`. For details and examples, see the document [Data-Matching](../Reference/UDF-Libraries.md#Data-Matching). +6. Data Repairing related functions, such as `TimestampRepair`. For details and examples, see the document [Data-Repairing](../Reference/UDF-Libraries.md#Data-Repairing). +7. Series Discovery related functions, such as `ConsecutiveSequences`. For details and examples, see the document [Series-Discovery](../Reference/UDF-Libraries.md#Series-Discovery). +8. Machine Learning related functions, such as `AR`. For details and examples, see the document [Machine-Learning](../Reference/UDF-Libraries.md#Machine-Learning). ## LAMBDA EXPRESSION diff --git a/src/zh/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md b/src/zh/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md index 7457e0f8..2853dbf4 100644 --- a/src/zh/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md +++ b/src/zh/UserGuide/V1.2.x/SQL-Manual/SQL-Manual.md @@ -1009,7 +1009,7 @@ select s1, s2 into root.sg_copy.d1(t1, t2), aligned root.sg_copy.d2(t1, t2) from ### 算数运算符 -更多见文档 [Arithmetic Operators and Functions](../Operators-Functions/Mathematical.md) +更多见文档 [Arithmetic Operators and Functions](../User-Manual/Operator-and-Expression.md) ```sql select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root.sg.d1 @@ -1017,7 +1017,7 @@ select s1, - s1, s2, + s2, s1 + s2, s1 - s2, s1 * s2, s1 / s2, s1 % s2 from root ### 比较运算符 -更多见文档[Comparison Operators and Functions](../Operators-Functions/Comparison.md) +更多见文档[Comparison Operators and Functions](../User-Manual/Operator-and-Expression.md) ```sql # Basic comparison operators @@ -1048,7 +1048,7 @@ select a, a in (1, 2) from root.test; ### 逻辑运算符 -更多见文档[Logical Operators](../Operators-Functions/Logical.md) +更多见文档[Logical Operators](../User-Manual/Operator-and-Expression.md) ```sql select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; @@ -1060,7 +1060,7 @@ select a, b, a > 10, a <= b, !(a <= b), a > 10 && a > b from root.test; ### Aggregate Functions -更多见文档[Aggregate Functions](../Operators-Functions/Aggregation.md) +更多见文档[Aggregate Functions](../User-Manual/Operator-and-Expression.md) ```sql select count(status) from root.ln.wf01.wt01; @@ -1073,7 +1073,7 @@ select time_duration(s1) from root.db.d1; ### 算数函数 -更多见文档[Arithmetic Operators and Functions](../Operators-Functions/Mathematical.md) +更多见文档[Arithmetic Operators and Functions](../User-Manual/Operator-and-Expression.md) ```sql select s1, sin(s1), cos(s1), tan(s1) from root.sg1.d1 limit 5 offset 1000; @@ -1082,7 +1082,7 @@ select s4,round(s4),round(s4,2),round(s4,-1) from root.sg1.d1; ### 比较函数 -更多见文档[Comparison Operators and Functions](../Operators-Functions/Comparison.md) +更多见文档[Comparison Operators and Functions](../User-Manual/Operator-and-Expression.md) ```sql select ts, on_off(ts, 'threshold'='2') from root.test; @@ -1091,7 +1091,7 @@ select ts, in_range(ts, 'lower'='2', 
'upper'='3.1') from root.test; ### 字符串处理函数 -更多见文档[String Processing](../Operators-Functions/String.md) +更多见文档[String Processing](../User-Manual/Operator-and-Expression.md) ```sql select s1, string_contains(s1, 's'='warn') from root.sg1.d4; @@ -1119,7 +1119,7 @@ select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1 ### 数据类型转换函数 -更多见文档[Data Type Conversion Function](../Operators-Functions/Conversion.md) +更多见文档[Data Type Conversion Function](../User-Manual/Operator-and-Expression.md) ```sql SELECT cast(s1 as INT32) from root.sg @@ -1127,7 +1127,7 @@ SELECT cast(s1 as INT32) from root.sg ### 常序列生成函数 -更多见文档[Constant Timeseries Generating Functions](../Operators-Functions/Constant.md) +更多见文档[Constant Timeseries Generating Functions](../User-Manual/Operator-and-Expression.md) ```sql select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from root.sg1.d1; @@ -1135,7 +1135,7 @@ select s1, s2, const(s1, 'value'='1024', 'type'='INT64'), pi(s2), e(s1, s2) from ### 选择函数 -更多见文档[Selector Functions](../Operators-Functions/Selection.md) +更多见文档[Selector Functions](../User-Manual/Operator-and-Expression.md) ```sql select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time > 2020-12-10T20:36:15.530+08:00; @@ -1143,7 +1143,7 @@ select s1, top_k(s1, 'k'='2'), bottom_k(s1, 'k'='2') from root.sg1.d2 where time ### 区间查询函数 -更多见文档[Continuous Interval Functions](../Operators-Functions/Continuous-Interval.md) +更多见文档[Continuous Interval Functions](../User-Manual/Operator-and-Expression.md) ```sql select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_duration(s4) from root.sg.d2; @@ -1151,7 +1151,7 @@ select s1, zero_count(s1), non_zero_count(s2), zero_duration(s3), non_zero_durat ### 趋势计算函数 -更多见文档[Variation Trend Calculation Functions](../Operators-Functions/Variation-Trend.md) +更多见文档[Variation Trend Calculation Functions](../User-Manual/Operator-and-Expression.md) ```sql select s1, time_difference(s1), difference(s1), non_negative_difference(s1), derivative(s1), non_negative_derivative(s1) from root.sg1.d1 limit 5 offset 1000; @@ -1162,7 +1162,7 @@ SELECT DIFF(s1, 'ignoreNull'='false'), DIFF(s2, 'ignoreNull'='false') from root. 
### 采样函数 -更多见文档[Sample Functions](../Operators-Functions/Sample.md) +更多见文档[Sample Functions](../User-Manual/Operator-and-Expression.md) ```sql select equal_size_bucket_random_sample(temperature,'proportion'='0.1') as random_sample from root.ln.wf01.wt01; @@ -1176,7 +1176,7 @@ select M4(s1,'windowSize'='10') from root.vehicle.d1 ### 时间序列处理函数 -更多见文档[Time-Series](../Operators-Functions/Time-Series.md) +更多见文档[Time-Series](../User-Manual/Operator-and-Expression.md) ```sql select change_points(s1), change_points(s2), change_points(s3), change_points(s4), change_points(s5), change_points(s6) from root.testChangePoints.d1 @@ -1188,7 +1188,7 @@ select change_points(s1), change_points(s2), change_points(s3), change_points(s4 ### 数据质量 -更多见文档[Data-Quality](../Operators-Functions/Data-Quality.md) +更多见文档[Data-Quality](../User-Manual/Operator-and-Expression.md) ```sql # Completeness @@ -1213,7 +1213,7 @@ select Accuracy(t1,t2,t3,m1,m2,m3) from root.test ### 数据画像 -更多见文档[Data-Profiling](../Operators-Functions/Data-Profiling.md) +更多见文档[Data-Profiling](../User-Manual/Operator-and-Expression.md) ```sql # ACF @@ -1293,7 +1293,7 @@ select zscore(s1) from root.test ### 异常检测 -更多见文档[Anomaly-Detection](../Operators-Functions/Anomaly-Detection.md) +更多见文档[Anomaly-Detection](../User-Manual/Operator-and-Expression.md) ```sql # IQR @@ -1328,7 +1328,7 @@ select MasterDetect(lo,la,m_lo,m_la,model,'output_type'='anomaly','p'='3','k'='3 ### 频域分析 -更多见文档[Frequency-Domain](../Operators-Functions/Frequency-Domain.md) +更多见文档[Frequency-Domain](../User-Manual/Operator-and-Expression.md) ```sql # Conv @@ -1357,7 +1357,7 @@ select lowpass(s1,'wpass'='0.45') from root.test.d1 ### 数据匹配 -更多见文档[Data-Matching](../Operators-Functions/Data-Matching.md) +更多见文档[Data-Matching](../User-Manual/Operator-and-Expression.md) ```sql # Cov @@ -1378,7 +1378,7 @@ select xcorr(s1, s2) from root.test.d1 where time <= 2020-01-01 00:00:05 ### 数据修复 -更多见文档[Data-Repairing](../Operators-Functions/Data-Repairing.md) +更多见文档[Data-Repairing](../User-Manual/Operator-and-Expression.md) ```sql # TimestampRepair @@ -1403,7 +1403,7 @@ select seasonalrepair(s1,'method'='improved','period'=3) from root.test.d2 ### 序列发现 -更多见文档[Series-Discovery](../Operators-Functions/Series-Discovery.md) +更多见文档[Series-Discovery](../User-Manual/Operator-and-Expression.md) ```sql # ConsecutiveSequences @@ -1416,7 +1416,7 @@ select consecutivewindows(s1,s2,'length'='10m') from root.test.d1 ### 机器学习 -更多见文档[Machine-Learning](../Operators-Functions/Machine-Learning.md) +更多见文档[Machine-Learning](../User-Manual/Operator-and-Expression.md) ```sql # AR @@ -1431,7 +1431,7 @@ select rm(s0, s1,"tb"="3","vb"="2") from root.test.d0 ## Lambda 表达式 -更多见文档[Lambda](../Operators-Functions/Lambda.md) +更多见文档[Lambda](../User-Manual/Operator-and-Expression.md) ```sql select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'expr'='x -> {x * 3}') as jexl2, jexl(temperature, 'expr'='x -> {x * x}') as jexl3, jexl(temperature, 'expr'='x -> {multiply(x, 100)}') as jexl4, jexl(temperature, st, 'expr'='(x, y) -> {x + y}') as jexl5, jexl(temperature, st, str, 'expr'='(x, y, z) -> {x + y + z}') as jexl6 from root.ln.wf01.wt01;``` @@ -1439,7 +1439,7 @@ select jexl(temperature, 'expr'='x -> {x + x}') as jexl1, jexl(temperature, 'exp ## 条件表达式 -更多见文档[Conditional Expressions](../Operators-Functions/Conditional.md) +更多见文档[Conditional Expressions](../User-Manual/Operator-and-Expression.md) ```sql select T, P, case From be2e06d6f8f8d5dfd76800418ca7973bcfb1fd5b Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Wed, 
20 Sep 2023 18:46:31 +0800 Subject: [PATCH 05/27] add url of dbeaver-jar --- .../Master/Ecosystem-Integration/DBeaver.md | 17 +++++++---------- .../V1.2.x/Ecosystem-Integration/DBeaver.md | 17 +++++++---------- .../Master/Ecosystem-Integration/DBeaver.md | 19 ++++++++----------- .../V1.2.x/Ecosystem-Integration/DBeaver.md | 19 ++++++++----------- 4 files changed, 30 insertions(+), 42 deletions(-) diff --git a/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md b/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md index 4c8b9061..7c3d36cf 100644 --- a/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md +++ b/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md @@ -51,24 +51,21 @@ DBeaver is a SQL client software application and a database administration tool. ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) -5. Download [Sources](https://iotdb.apache.org/Download/),unzip it and compile jdbc driver by the following command - - ```shell - mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies - ``` +5. Download `iotdb-jdbc` , from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar). + 6. Find and add a lib named `apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`, which should be under `jdbc/target/`, then select `Find Class`. ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) -8. Edit the driver Settings +7. Edit the driver Settings ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/05.png) -9. Open New DataBase Connection and select iotdb +8. Open New DataBase Connection and select iotdb ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/06.png) -10. Edit JDBC Connection Settings +9. Edit JDBC Connection Settings ``` JDBC URL: jdbc:iotdb://127.0.0.1:6667/ @@ -77,10 +74,10 @@ DBeaver is a SQL client software application and a database administration tool. ``` ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/07.png) -11. Test Connection +10. Test Connection ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/08.png) -12. Enjoy IoTDB with DBeaver +11. Enjoy IoTDB with DBeaver ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/09.png) diff --git a/src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md b/src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md index 4c8b9061..7e336557 100644 --- a/src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md +++ b/src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md @@ -51,24 +51,21 @@ DBeaver is a SQL client software application and a database administration tool. ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) -5. Download [Sources](https://iotdb.apache.org/Download/),unzip it and compile jdbc driver by the following command - - ```shell - mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies - ``` +5. Download `iotdb-jdbc` , from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar). + 6. 
Find and add a lib named `apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`, which should be under `jdbc/target/`, then select `Find Class`. ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) -8. Edit the driver Settings +7. Edit the driver Settings ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/05.png) -9. Open New DataBase Connection and select iotdb +8. Open New DataBase Connection and select iotdb ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/06.png) -10. Edit JDBC Connection Settings +9. Edit JDBC Connection Settings ``` JDBC URL: jdbc:iotdb://127.0.0.1:6667/ @@ -77,10 +74,10 @@ DBeaver is a SQL client software application and a database administration tool. ``` ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/07.png) -11. Test Connection +10. Test Connection ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/08.png) -12. Enjoy IoTDB with DBeaver +11. Enjoy IoTDB with DBeaver ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/09.png) diff --git a/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md b/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md index b775002c..5c3c2b17 100644 --- a/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md +++ b/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md @@ -51,24 +51,21 @@ DBeaver 是一个 SQL 客户端和数据库管理工具。DBeaver 可以使用 I ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) -5. 下载[源代码](https://iotdb.apache.org/zh/Download/),解压并运行下面的命令编译 jdbc 驱动 - - ```shell - mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies - ``` -7. 在`jdbc/target/`下找到并添加名为`apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`的库,点击 `Find Class`。 +5. 下载 jdbc 驱动, 可点击下载 [地址1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) 或 [地址2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar)。 + +6. 在`jdbc/target/`下找到并添加名为`apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`的库,点击 `Find Class`。 ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) -8. 编辑驱动设置 +7. 编辑驱动设置 ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/05.png) -9. 新建 DataBase Connection, 选择 iotdb +8. 新建 DataBase Connection, 选择 iotdb ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/06.png) -10. 编辑 JDBC 连接设置 +9. 编辑 JDBC 连接设置 ``` JDBC URL: jdbc:iotdb://127.0.0.1:6667/ @@ -77,10 +74,10 @@ DBeaver 是一个 SQL 客户端和数据库管理工具。DBeaver 可以使用 I ``` ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/07.png) -11. 测试连接 +10. 测试连接 ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/08.png) -12. 可以开始通过 DBeaver 使用 IoTDB +11. 可以开始通过 DBeaver 使用 IoTDB ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/09.png) diff --git a/src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md b/src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md index b775002c..5c3c2b17 100644 --- a/src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md +++ b/src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md @@ -51,24 +51,21 @@ DBeaver 是一个 SQL 客户端和数据库管理工具。DBeaver 可以使用 I ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) -5. 
下载[源代码](https://iotdb.apache.org/zh/Download/),解压并运行下面的命令编译 jdbc 驱动 - - ```shell - mvn clean package -pl iotdb-client/jdbc -am -DskipTests -P get-jar-with-dependencies - ``` -7. 在`jdbc/target/`下找到并添加名为`apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`的库,点击 `Find Class`。 +5. 下载 jdbc 驱动, 可点击下载 [地址1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) 或 [地址2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar)。 + +6. 在`jdbc/target/`下找到并添加名为`apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`的库,点击 `Find Class`。 ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) -8. 编辑驱动设置 +7. 编辑驱动设置 ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/05.png) -9. 新建 DataBase Connection, 选择 iotdb +8. 新建 DataBase Connection, 选择 iotdb ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/06.png) -10. 编辑 JDBC 连接设置 +9. 编辑 JDBC 连接设置 ``` JDBC URL: jdbc:iotdb://127.0.0.1:6667/ @@ -77,10 +74,10 @@ DBeaver 是一个 SQL 客户端和数据库管理工具。DBeaver 可以使用 I ``` ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/07.png) -11. 测试连接 +10. 测试连接 ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/08.png) -12. 可以开始通过 DBeaver 使用 IoTDB +11. 可以开始通过 DBeaver 使用 IoTDB ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/09.png) From 3c2fc92dc2421a70663a7b752fe32b8f48671b04 Mon Sep 17 00:00:00 2001 From: wanghui42 <105700158+wanghui42@users.noreply.github.com> Date: Wed, 20 Sep 2023 19:03:01 +0800 Subject: [PATCH 06/27] Update src/UserGuide/Master/Ecosystem-Integration/DBeaver.md Co-authored-by: Haonan --- src/UserGuide/Master/Ecosystem-Integration/DBeaver.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md b/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md index 7c3d36cf..7fcff93c 100644 --- a/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md +++ b/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md @@ -51,7 +51,7 @@ DBeaver is a SQL client software application and a database administration tool. ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) -5. Download `iotdb-jdbc` , from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar). +5. Download `iotdb-jdbc` , from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar). 6. Find and add a lib named `apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`, which should be under `jdbc/target/`, then select `Find Class`. 
From d7a55e1805896f353e8fa998954975cb230d1024 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Wed, 20 Sep 2023 19:11:01 +0800 Subject: [PATCH 07/27] adjust no.6 --- src/UserGuide/Master/Ecosystem-Integration/DBeaver.md | 2 +- src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md | 2 +- src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md | 2 +- src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md b/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md index 7c3d36cf..5ca1c001 100644 --- a/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md +++ b/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md @@ -53,7 +53,7 @@ DBeaver is a SQL client software application and a database administration tool. 5. Download `iotdb-jdbc` , from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar). -6. Find and add a lib named `apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`, which should be under `jdbc/target/`, then select `Find Class`. +6. Add the downloaded jar file, then select `Find Class`. ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) diff --git a/src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md b/src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md index 7e336557..b22e75e1 100644 --- a/src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md +++ b/src/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md @@ -53,7 +53,7 @@ DBeaver is a SQL client software application and a database administration tool. 5. Download `iotdb-jdbc` , from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar). -6. Find and add a lib named `apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`, which should be under `jdbc/target/`, then select `Find Class`. +6. Add the downloaded jar file, then select `Find Class`. ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) diff --git a/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md b/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md index 5c3c2b17..3ea4d50f 100644 --- a/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md +++ b/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md @@ -53,7 +53,7 @@ DBeaver 是一个 SQL 客户端和数据库管理工具。DBeaver 可以使用 I 5. 下载 jdbc 驱动, 可点击下载 [地址1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) 或 [地址2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar)。 -6. 在`jdbc/target/`下找到并添加名为`apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`的库,点击 `Find Class`。 +6. 添加刚刚下载的驱动包,点击 Find Class。 ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) diff --git a/src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md b/src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md index 5c3c2b17..3ea4d50f 100644 --- a/src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md +++ b/src/zh/UserGuide/V1.2.x/Ecosystem-Integration/DBeaver.md @@ -53,7 +53,7 @@ DBeaver 是一个 SQL 客户端和数据库管理工具。DBeaver 可以使用 I 5. 
下载 jdbc 驱动, 可点击下载 [地址1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) 或 [地址2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar)。 -6. 在`jdbc/target/`下找到并添加名为`apache-iotdb-jdbc-{version}-jar-with-dependencies.jar`的库,点击 `Find Class`。 +6. 添加刚刚下载的驱动包,点击 Find Class。 ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) From bf05d0275cf6513f8abe9af36c8c97b5a9c1e01b Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Wed, 20 Sep 2023 19:37:10 +0800 Subject: [PATCH 08/27] adjust master --- src/UserGuide/Master/Ecosystem-Integration/DBeaver.md | 3 ++- src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md | 6 +++--- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md b/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md index 3b37385f..ad506382 100644 --- a/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md +++ b/src/UserGuide/Master/Ecosystem-Integration/DBeaver.md @@ -51,7 +51,8 @@ DBeaver is a SQL client software application and a database administration tool. ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) -5. Download `iotdb-jdbc` , from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar). +5. Download `iotdb-jdbc`, from [source1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/) or [source2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/),choose the corresponding jar file,download the suffix `jar-with-dependencies.jar` file. + ![](https://alioss.timecho.com/docs/img/20230920-192746.jpg) 6. Add the downloaded jar file, then select `Find Class`. diff --git a/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md b/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md index 3ea4d50f..bd6f65b4 100644 --- a/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md +++ b/src/zh/UserGuide/Master/Ecosystem-Integration/DBeaver.md @@ -51,9 +51,9 @@ DBeaver 是一个 SQL 客户端和数据库管理工具。DBeaver 可以使用 I ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/02.png) -5. 下载 jdbc 驱动, 可点击下载 [地址1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar) 或 [地址2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/1.2.1/iotdb-jdbc-1.2.1-jar-with-dependencies.jar)。 - -6. 添加刚刚下载的驱动包,点击 Find Class。 +5. 下载 jdbc 驱动, 点击下列网址 [地址1](https://maven.proxy.ustclug.org/maven2/org/apache/iotdb/iotdb-jdbc/) 或 [地址2](https://repo1.maven.org/maven2/org/apache/iotdb/iotdb-jdbc/),选择对应版本的 jar 包,下载后缀 jar-with-dependencies.jar 的包 + ![](https://alioss.timecho.com/docs/img/20230920-192746.jpg) +6. 
添加刚刚下载的驱动包,点击 Find Class ![](https://alioss.timecho.com/docs/img/UserGuide/Ecosystem-Integration/DBeaver/03.png) From 21ee0219c55b35db757d7ae95fad3fb911f69810 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Fri, 22 Sep 2023 18:38:39 +0800 Subject: [PATCH 09/27] pipe-English --- src/UserGuide/V1.2.x/User-Manual/Data-Sync.md | 464 +++++++++++++++++ .../V1.2.x/User-Manual/Data-Sync_timecho.md | 469 ++++++++++++++++++ src/UserGuide/V1.2.x/User-Manual/Streaming.md | 24 + .../V1.2.x/User-Manual/Streaming_timecho.md | 24 + 4 files changed, 981 insertions(+) create mode 100644 src/UserGuide/V1.2.x/User-Manual/Data-Sync.md create mode 100644 src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md create mode 100644 src/UserGuide/V1.2.x/User-Manual/Streaming.md create mode 100644 src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md new file mode 100644 index 00000000..17b934fd --- /dev/null +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md @@ -0,0 +1,464 @@ + + +# IoTDB Data Sync +**The IoTDB data sync transfers data from IoTDB to another data platform, and a data sync task is called a Pipe.** + +**A Pipe consists of three subtasks (plugins): ** + +- Extract +- Process +- Connect + +**Pipe allows users to customize the processing logic of these three subtasks, just like handling data using UDF (User-Defined Functions)**. Within a Pipe, the aforementioned subtasks are executed and implemented by three types of plugins. Data flows through these three plugins sequentially: Pipe Extractor is used to extract data, Pipe Processor is used to process data, and Pipe Connector is used to send data to an external system. + +**The model of a Pipe task is as follows: ** + +![Task model diagram](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) + +It describes a data synchronization task, which essentially describes the attributes of the Pipe Extractor, Pipe Processor, and Pipe Connector plugins. Users can declaratively configure the specific attributes of the three subtasks through SQL statements. By combining different attributes, flexible data ETL (Extract, Transform, Load) capabilities can be achieved. + +By utilizing the data synchronization functionality, a complete data pipeline can be built to fulfill various requirements such as edge-to-cloud synchronization, remote disaster recovery, and read-write workload distribution across multiple databases. + +## Quick Start + +**🎯 Goal: Achieve full data synchronisation of IoTDB A -> IoTDB B** + +- Start two IoTDBs,A(datanode -> 127.0.0.1:6667) B(datanode -> 127.0.0.1:6668) +- create a Pipe from A -> B, and execute on A + + ```sql + create pipe a2b + with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 'connector.port'='6668' + ) + ``` +- start a Pipe from A -> B, and execute on A + + ```sql + start pipe a2b + ``` +- Write data to A + + ```sql + INSERT INTO root.db.d(time, m) values (1, 1) + ``` +- Checking data synchronised from A at B + ```sql + SELECT ** FROM root + ``` + +> ❗️**Note: The current IoTDB -> IoTDB implementation of data synchronisation does not support DDL synchronisation** +> +> That is: ttl, trigger, alias, template, view, create/delete sequence, create/delete storage group, etc. are not supported. 
+> +> **IoTDB -> IoTDB data synchronisation requires the target IoTDB:** +> +> * Enable automatic metadata creation: manual configuration of encoding and compression of data types to be consistent with the sender is required +> * Do not enable automatic metadata creation: manually create metadata that is consistent with the source + +## Synchronization task management + +### Create a synchronization task + +A data synchronisation task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: + +```sql +CREATE PIPE -- PipeId is the name that uniquely identifies the synchronisation task +WITH EXTRACTOR ( + -- Default IoTDB Data Extraction Plugin + 'extractor' = 'iotdb-extractor', + -- Path prefix, only data that can match the path prefix will be extracted for subsequent processing and delivery + 'extractor.pattern' = 'root.timecho', + -- Whether to extract historical data + 'extractor.history.enable' = 'true', + -- Describes the time range of the historical data being extracted, indicating the earliest possible time + 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', + -- Describes the time range of the extracted historical data, indicating the latest time + 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', + -- Whether to extract real-time data + 'extractor.realtime.enable' = 'true', +) +WITH PROCESSOR ( + -- Default data processing plugin, means no processing + 'processor' = 'do-nothing-processor', +) +WITH CONNECTOR ( + -- IoTDB data sending plugin with target IoTDB + 'connector' = 'iotdb-thrift-connector', + -- Data service for one of the DataNode nodes on the target IoTDB ip + 'connector.ip' = '127.0.0.1', + -- Data service port of one of the DataNode nodes of the target IoTDB + 'connector.port' = '6667', +) +``` + +**To create a synchronisation task it is necessary to configure the PipeId and the parameters of the three plugin sections:** + + +| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +| --------- | ------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- | +| PipeId | 全局唯一标定一个同步任务的名称 | 必填 | - | - | - | +| extractor | Pipe Extractor 插件,负责在数据库底层抽取同步数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入同步任务 | 否 | +| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | +| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | + +In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plug-ins, **see the section "System pre-built data synchronisation plug-ins" **. See the "System Preconfigured Data Synchronisation Plugins" section**. + +**An example of a minimalist CREATE PIPE statement is as follows:** + +```sql +CREATE PIPE -- PipeId 是能够唯一标定任务任务的名字 +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +The expressed semantics are: synchronise the full amount of historical data and subsequent arrivals of real-time data from this database instance to the IoTDB instance with target 127.0.0.1:6667. 

**Note:**

- EXTRACTOR and PROCESSOR are optional configurations. If no configuration parameters are provided, the system will use the corresponding default implementations.
- CONNECTOR is a required configuration and needs to be declared in the CREATE PIPE statement.
- CONNECTOR can be reused. For different tasks, if their CONNECTORs have exactly the same KV attributes (the values corresponding to all attribute keys are the same), **the system will ultimately create only one CONNECTOR instance** in order to reuse the connection resources.

  - For example, consider the following declarations of two tasks, pipe1 and pipe2:

    ```sql
    CREATE PIPE pipe1
    WITH CONNECTOR (
      'connector' = 'iotdb-thrift-connector',
      'connector.thrift.host' = 'localhost',
      'connector.thrift.port' = '9999',
    )

    CREATE PIPE pipe2
    WITH CONNECTOR (
      'connector' = 'iotdb-thrift-connector',
      'connector.thrift.port' = '9999',
      'connector.thrift.host' = 'localhost',
    )
    ```

  - Since their declarations of the CONNECTOR are exactly the same (**even though the order in which some attributes are declared is different**), the framework will automatically reuse the CONNECTOR they declared, so the CONNECTOR of pipe1 and pipe2 will be the same instance.
- Please do not build application scenarios that involve circular data synchronization (it will lead to an infinite loop):

  - IoTDB A -> IoTDB B -> IoTDB A
  - IoTDB A -> IoTDB A

### Start a task

After the CREATE PIPE statement is executed successfully, the task-related instances are created, but the running status of the whole task is set to STOPPED, which means the task does not process data immediately.

You can use the START PIPE statement to make the task start processing data:

```sql
START PIPE <PipeId>
```

### Stop a task

Use the STOP PIPE statement to make the task stop processing data:

```sql
STOP PIPE <PipeId>
```

### Drop a task

Use the DROP PIPE statement to make the task stop processing data (when the task status is RUNNING) and then delete the whole synchronization task:

```sql
DROP PIPE <PipeId>
```

Users do not need to execute a STOP operation before dropping a task.

### Show tasks

Use the SHOW PIPES statement to view all tasks:

```sql
SHOW PIPES
```

The query result is as follows:

```sql
+-----------+-----------------------+-------+-------------+-------------+-------------+----------------+
|         ID|           CreationTime|  State|PipeExtractor|PipeProcessor|PipeConnector|ExceptionMessage|
+-----------+-----------------------+-------+-------------+-------------+-------------+----------------+
|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING|          ...|          ...|          ...|            None|
+-----------+-----------------------+-------+-------------+-------------+-------------+----------------+
|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED|          ...|          ...|          ...| TException: ...|
+-----------+-----------------------+-------+-------------+-------------+-------------+----------------+
```

You can use `<PipeId>` to view the status of a particular synchronization task:

```sql
SHOW PIPE <PipeId>
```

You can also use the WHERE clause to check whether the Pipe Connector used by a specific `<PipeId>` is being reused:

```sql
SHOW PIPES
WHERE CONNECTOR USED BY <PipeId>
```

### Task running status migration

A data synchronization pipe passes through several statuses during its managed lifecycle:

- **STOPPED:** The pipe is stopped. When the pipe is in this status, there are the following possibilities:
  - After a pipe is successfully created, its initial status is the stopped status
  - The user manually stops a pipe that is running normally, and its status passively changes from RUNNING to STOPPED
  - When an unrecoverable error occurs while a pipe is running, its status automatically changes from RUNNING to STOPPED
- **RUNNING:** The pipe is working normally
- **DROPPED:** The pipe task is permanently deleted

The following diagram shows all the statuses and the status transitions:

![State transition diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png)

## System Pre-built Data Synchronization Plugins

### View pre-built plugins

Users can view the plugins in the system on demand. The statement for viewing plugins is shown below:

```sql
SHOW PIPEPLUGINS
```

### Pre-built extractor plugins

#### iotdb-extractor

Function: Extract historical or real-time data inside IoTDB into a pipe.


| key                          | value                                                                      | value range                            | required or optional with default |
| ---------------------------- | --------------------------------------------------------------------------- | -------------------------------------- | --------------------------------- |
| extractor                    | iotdb-extractor                                                             | String: iotdb-extractor                | required                          |
| extractor.pattern            | path prefix used to filter time series                                      | String: any time series prefix         | optional: root                    |
| extractor.history.enable     | whether to synchronize historical data                                      | Boolean: true, false                   | optional: true                    |
| extractor.history.start-time | start event time for synchronizing historical data, including start-time    | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE          |
| extractor.history.end-time   | end event time for synchronizing historical data, including end-time        | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE          |
| extractor.realtime.enable    | whether to synchronize real-time data                                       | Boolean: true, false                   | optional: true                    |

> 🚫 **Description of the extractor.pattern parameter**
>
> * Pattern
needs to use backquotes to quote illegal characters or illegal path nodes. For example, if you want to filter root.\`a@b\` or root.\`123\`, the pattern should be set to root.\`a@b\` or root.\`123\` (for details, refer to [When to use single and double quotes and backquotes](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明))
> * In the underlying implementation, when the pattern is detected to be root (the default value), synchronization is highly efficient; any other format will reduce performance
> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'extractor.pattern'='root.aligned.1':
>
>   * root.aligned.1TS
>   * root.aligned.1TS.\`1\`
>   * root.aligned.100TS
>
>   will be synchronized;
>
>   * root.aligned.\`1\`
>   * root.aligned.\`123\`
>
>   will not be synchronized.

> ❗️**Description of the start-time and end-time parameters of extractor.history**
>
> * start-time and end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00

> ✅ **From its production to its storage in IoTDB, a piece of data involves two key time concepts**
>
> * **event time:** the time when the data is actually produced (or the generation time assigned to the data by the data production system; it is the time item in a data point), also called the event time.
> * **arrival time:** the time when the data arrives in the IoTDB system.
>
> The out-of-order data we often talk about refers to data whose **event time** is far behind the current system time (or the maximum **event time** already stored) when the data arrives. On the other hand, whether data is out of order or in order, as long as it is newly arriving in the system, its **arrival time** increases in the order in which the data arrives at IoTDB.

> 💎 **The work of iotdb-extractor can be split into two stages**
>
> 1. Historical data extraction: all data with **arrival time** < the **current system time** when the pipe is created is called historical data
> 2. Real-time data extraction: all data with **arrival time** >= the **current system time** when the pipe is created is called real-time data
>
> The historical data transmission stage and the real-time data transmission stage are **executed serially: the real-time data transmission stage starts only after the historical data transmission stage is completed.**
>
> Users can configure iotdb-extractor to perform:
>
> * historical data extraction (`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'`)
> * real-time data extraction (`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'`)
> * full data extraction (`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'`)
> * setting both `extractor.history.enable` and `extractor.realtime.enable` to `false` is not allowed

### Pre-built processor plugins

#### do-nothing-processor

Function: Do not do anything with the events passed in by the extractor.


| key       | value                | value range                  | required or optional with default |
| --------- | -------------------- | ---------------------------- | --------------------------------- |
| processor | do-nothing-processor | String: do-nothing-processor | required                          |

### Pre-built connector plugins

#### iotdb-thrift-sync-connector (alias: iotdb-thrift-connector)

Function: Primarily used for data transfer between IoTDB instances (v1.2.0+). Data is transmitted using the Thrift RPC framework and a single-threaded blocking IO model. It guarantees that the receiving end applies the data in the same order as the sending end receives the write requests.

Limitation: Both the source and target IoTDB versions need to be v1.2.0+.


| key                 | value                                                                                   | value range                                                                    | required or optional with default                             |
| ------------------- | ----------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| connector           | iotdb-thrift-connector or iotdb-thrift-sync-connector                                     | String: iotdb-thrift-connector or iotdb-thrift-sync-connector                     | required                                                          |
| connector.ip        | the data service ip of one of the DataNode nodes on the target IoTDB                      | String                                                                            | optional: fill in either this or connector.node-urls              |
| connector.port      | the data service port of one of the DataNode nodes on the target IoTDB                    | Integer                                                                           | optional: fill in either this or connector.node-urls              |
| connector.node-urls | the urls of the data service ports of any number of DataNode nodes on the target IoTDB    | String. e.g. '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667'     | optional: fill in either this or connector.ip:connector.port      |

> 📌 Please ensure that the receiving end has already created all the time series present in the sending end or has enabled automatic metadata creation. Otherwise, it may result in the failure of the pipe operation.
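
For instance, a minimal sketch of a `CREATE PIPE` declaration that uses this connector with `connector.node-urls` (instead of a single ip/port pair) might look like the following; the pipe name and node addresses are placeholders, and the omitted EXTRACTOR and PROCESSOR sections fall back to their defaults:

```sql
CREATE PIPE a2cluster -- placeholder pipe name
WITH CONNECTOR (
  -- send data with the synchronous Thrift connector
  'connector' = 'iotdb-thrift-sync-connector',
  -- data service urls of several DataNodes on the target cluster (placeholder addresses)
  'connector.node-urls' = '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669'
)
```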
+ +#### iotdb-thrift-async-connector + +Function: Primarily used for data transfer between IoTDB instances (v1.2.0+). +Data is transmitted using the Thrift RPC framework, employing a multi-threaded async non-blocking IO model, resulting in high transfer performance. It is particularly suitable for distributed scenarios on the target end. +It does not guarantee that the receiving end applies the data in the same order as the sending end receives the write requests, but it guarantees data integrity (at-least-once). + +Limitation: Both the source and target IoTDB versions need to be v1.2.0+. + + +| key | value | value range | required or optional with default | +| --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | +| connector | iotdb-thrift-async-connector | String: iotdb-thrift-async-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | + +> 📌 Please ensure that the receiving end has already created all the time series present in the sending end or has enabled automatic metadata creation. Otherwise, it may result in the failure of the pipe operation. + +#### iotdb-legacy-pipe-connector + +Function: Mainly used to transfer data from IoTDB (v1.2.0+) to lower versions of IoTDB, using the data synchronization (Sync) protocol before version v1.2.0. +Data is transmitted using the Thrift RPC framework. It employs a single-threaded sync blocking IO model, resulting in weak transfer performance. + +Limitation: The source IoTDB version needs to be v1.2.0+. The target IoTDB version can be either v1.2.0+, v1.1.x (lower versions of IoTDB are theoretically supported but untested). + +Note: In theory, any version prior to v1.2.0 of IoTDB can serve as the data synchronization (Sync) receiver for v1.2.0+. + + +| key | value | value range | required or optional with default | +| ------------------ | --------------------------------------------------------------------- | ----------------------------------- | --------------------------------- | +| connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | required | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | +| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | +| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | +| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | optional: 1.1 | + +> 📌 Make sure that the receiver has created all the time series on the sender side, or that automatic metadata creation is turned on, otherwise the pipe run will fail. + +#### do-nothing-connector + +Function: Does not do anything with the events passed in by the processor. 
+ + +| key | value | value range | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| connector | do-nothing-connector | String: do-nothing-connector | required | + +## Authority Management + +| Authority Name | Description | +| ----------- | -------------------- | +| CREATE_PIPE | Register task,path-independent | +| START_PIPE | Start task,path-independent | +| STOP_PIPE | Stop task,path-independent | +| DROP_PIPE | Uninstall task,path-independent | +| SHOW_PIPES | Query task,path-independent | + +## Configure Parameters + +In iotdb-common.properties : + +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 + +# The maximum number of selectors that can be used in the async connector. +# pipe_async_connector_selector_number=1 + +# The core number of clients that can be used in the async connector. +# pipe_async_connector_core_client_number=8 + +# The maximum number of clients that can be used in the async connector. +# pipe_async_connector_max_client_number=16 +``` + +## Functionality Features + +### At least one semantic guarantee **at-least-once** + +The data synchronization feature provides an at-least-once delivery semantic when transferring data to external systems. In most scenarios, the synchronization feature guarantees exactly-once delivery, ensuring that all data is synchronized exactly once. + +However, in the following scenarios, it is possible for some data to be synchronized multiple times **(due to resumable transmission)**: + +- Temporary network failures: If a data transmission request fails, the system will retry sending it until reaching the maximum retry attempts. +- Abnormal implementation of the Pipe plugin logic: If an error is thrown during the plugin's execution, the system will retry sending the data until reaching the maximum retry attempts. +- Data partition switching due to node failures or restarts: After the partition change is completed, the affected data will be retransmitted. +- Cluster unavailability: Once the cluster becomes available again, the affected data will be retransmitted. + +### Source End: Data Writing with Pipe Processing and Asynchronous Decoupling of Data Transmission + +In the data synchronization feature, data transfer adopts an asynchronous replication mode. + +Data synchronization is completely decoupled from the writing operation, eliminating any impact on the critical path of writing. This mechanism allows the framework to maintain the writing speed of a time-series database while ensuring continuous data synchronization. 
+ +### Source End: High Availability of Pipe Service in a Highly Available Cluster Deployment + +When the sender end IoTDB is deployed in a high availability cluster mode, the data synchronization service will also be highly available. The data synchronization framework monitors the data synchronization progress of each data node and periodically takes lightweight distributed consistent snapshots to preserve the synchronization state. + +- In the event of a failure of a data node in the sender cluster, the data synchronization framework can leverage the consistent snapshot and the data stored in replicas to quickly recover and resume synchronization, thus achieving high availability of the data synchronization service. +- In the event of a complete failure and restart of the sender cluster, the data synchronization framework can also use snapshots to recover the synchronization service. diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md new file mode 100644 index 00000000..e96087ca --- /dev/null +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md @@ -0,0 +1,469 @@ + + +# IoTDB Data Sync +**The IoTDB data sync transfers data from IoTDB to another data platform, and a data sync task is called a Pipe.** + +**一个 Pipe 包含三个子任务(插件):** + +- 抽取(Extract) +- 处理(Process) +- 发送(Connect) + +**Pipe 允许用户自定义三个子任务的处理逻辑,通过类似 UDF 的方式处理数据。** 在一个 Pipe 中,上述的子任务分别由三种插件执行实现,数据会依次经过这三个插件进行处理:Pipe Extractor 用于抽取数据,Pipe Processor 用于处理数据,Pipe Connector 用于发送数据,最终数据将被发至外部系统。 + +**Pipe 任务的模型如下:** + +![任务模型图](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) + +描述一个数据同步任务,本质就是描述 Pipe Extractor、Pipe Processor 和 Pipe Connector 插件的属性。用户可以通过 SQL 语句声明式地配置三个子任务的具体属性,通过组合不同的属性,实现灵活的数据 ETL 能力。 + +利用数据同步功能,可以搭建完整的数据链路来满足端*边云同步、异地灾备、读写负载分库*等需求。 + +## 快速开始 + +**🎯 目标:实现 IoTDB A -> IoTDB B 的全量数据同步** + +- 启动两个 IoTDB,A(datanode -> 127.0.0.1:6667) B(datanode -> 127.0.0.1:6668) +- 创建 A -> B 的 Pipe,在 A 上执行 + + ```sql + create pipe a2b + with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 'connector.port'='6668' + ) + ``` +- 启动 A -> B 的 Pipe,在 A 上执行 + + ```sql + start pipe a2b + ``` +- 向 A 写入数据 + + ```sql + INSERT INTO root.db.d(time, m) values (1, 1) + ``` +- 在 B 检查由 A 同步过来的数据 + + ```sql + SELECT ** FROM root + ``` + +> ❗️**注:目前的 IoTDB -> IoTDB 的数据同步实现并不支持 DDL 同步** +> +> 即:不支持 ttl,trigger,别名,模板,视图,创建/删除序列,创建/删除存储组等操作 +> +> **IoTDB -> IoTDB 的数据同步要求目标端 IoTDB:** +> +> * 开启自动创建元数据:需要人工配置数据类型的编码和压缩与发送端保持一致 +> * 不开启自动创建元数据:手工创建与源端一致的元数据 + +## 同步任务管理 + +### 创建同步任务 + +可以使用 `CREATE PIPE` 语句来创建一条数据同步任务,示例 SQL 语句如下所示: + +```sql +CREATE PIPE -- PipeId 是能够唯一标定同步任务任务的名字 +WITH EXTRACTOR ( + -- 默认的 IoTDB 数据抽取插件 + 'extractor' = 'iotdb-extractor', + -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + 'extractor.pattern' = 'root.timecho', + -- 是否抽取历史数据 + 'extractor.history.enable' = 'true', + -- 描述被抽取的历史数据的时间范围,表示最早时间 + 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', + -- 描述被抽取的历史数据的时间范围,表示最晚时间 + 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', + -- 是否抽取实时数据 + 'extractor.realtime.enable' = 'true', +) +WITH PROCESSOR ( + -- 默认的数据处理插件,即不做任何处理 + 'processor' = 'do-nothing-processor', +) +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +**创建同步任务时需要配置 PipeId 以及三个插件部分的参数:** + + +| 配置项 | 说明 | 是否必填 | 默认实现 | 
默认实现说明 | 是否允许自定义实现 | +| --------- | ------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- | +| PipeId | 全局唯一标定一个同步任务的名称 | 必填 | - | - | - | +| extractor | Pipe Extractor 插件,负责在数据库底层抽取同步数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入同步任务 | 否 | +| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | +| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | + +示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据同步任务。IoTDB 还内置了其他的数据同步插件,**请查看“系统预置数据同步插件”一节**。 + +**一个最简的 CREATE PIPE 语句示例如下:** + +```sql +CREATE PIPE -- PipeId 是能够唯一标定任务任务的名字 +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 + +**注意:** + +- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 +- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 +- CONNECTOR 具备自复用能力。对于不同的任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 + + - 例如,有下面 pipe1, pipe2 两个任务的声明: + + ```sql + CREATE PIPE pipe1 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.host' = 'localhost', + 'connector.thrift.port' = '9999', + ) + + CREATE PIPE pipe2 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.port' = '9999', + 'connector.thrift.host' = 'localhost', + ) + ``` + + - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 +- 请不要构建出包含数据循环同步的应用场景(会导致无限循环): + + - IoTDB A -> IoTDB B -> IoTDB A + - IoTDB A -> IoTDB A + +### 启动任务 + +CREATE PIPE 语句成功执行后,任务相关实例会被创建,但整个任务的运行状态会被置为 STOPPED,即任务不会立刻处理数据。 + +可以使用 START PIPE 语句使任务开始处理数据: + +```sql +START PIPE +``` + +### 停止任务 + +使用 STOP PIPE 语句使任务停止处理数据: + +```sql +STOP PIPE +``` + +### 删除任务 + +使用 DROP PIPE 语句使任务停止处理数据(当任务状态为 RUNNING 时),然后删除整个任务同步任务: + +```sql +DROP PIPE +``` + +用户在删除任务前,不需要执行 STOP 操作。 + +### 展示任务 + +使用 SHOW PIPES 语句查看所有任务: + +```sql +SHOW PIPES +``` + +查询结果如下: + +```sql ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +| ID| CreationTime | State|PipeExtractor|PipeProcessor|PipeConnector|ExceptionMessage| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| None| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +``` + +可以使用 `` 指定想看的某个同步任务状态: + +```sql +SHOW PIPE +``` + +您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 + +```sql +SHOW PIPES +WHERE CONNECTOR USED BY +``` + +### 任务运行状态迁移 + +一个数据同步 pipe 在其被管理的生命周期中会经过多种状态: + +- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: + - 当一个 pipe 被成功创建之后,其初始状态为暂停状态 + - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED + - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED +- **RUNNING:** pipe 正在正常工作 +- **DROPPED:** pipe 任务被永久删除 + +下图表明了所有状态以及状态的迁移: + +![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +## 系统预置数据同步插件 + 
+### 查看预置插件 + +用户可以按需查看系统中的插件。查看插件的语句如图所示。 + +```sql +SHOW PIPEPLUGINS +``` + +### 预置 extractor 插件 + +#### iotdb-extractor + +作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 + + +| key | value | value 取值范围 | required or optional with default | +| ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | +| extractor | iotdb-extractor | String: iotdb-extractor | required | +| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | +| extractor.history.enable | 是否同步历史数据 | Boolean: true, false | optional: true | +| extractor.history.start-time | 同步历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | 同步历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | 是否同步实时数据 | Boolean: true, false | optional: true | + +> 🚫 **extractor.pattern 参数说明** +> +> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * 在底层实现中,当检测到 pattern 为 root(默认值)时,同步效率较高,其他任意格式都将降低性能 +> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: +> +> * root.aligned.1TS +> * root.aligned.1TS.\`1\` +> * root.aligned.100TS +> +> 的数据会被同步; +> +> * root.aligned.\`1\` +> * root.aligned.\`123\` +> +> 的数据不会被同步。 + +> ❗️**extractor.history 的 start-time,end-time 参数说明** +> +> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 + +> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> +> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 +> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> +> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 + +> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> +> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 +> 2. 
实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> +> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> +> 用户可以指定 iotdb-extractor 进行: +> +> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` + +### 预置 processor 插件 + +#### do-nothing-processor + +作用:不对 extractor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| processor | do-nothing-processor | String: do-nothing-processor | required | + +### 预置 connector 插件 + +#### iotdb-thrift-sync-connector(别名:iotdb-thrift-connector) + +作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。 +使用 Thrift RPC 框架传输数据,单线程 blocking IO 模型。 +保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致。 + +限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 + + +| key | value | value 取值范围 | required or optional with default | +| --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | +| connector | iotdb-thrift-connector 或 iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +#### iotdb-thrift-async-connector + +作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。 +使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景。 +不保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致,但是保证数据发送的完整性(at-least-once)。 + +限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 + + +| key | value | value 取值范围 | required or optional with default | +| --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | +| connector | iotdb-thrift-async-connector | String: iotdb-thrift-async-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +#### iotdb-legacy-pipe-connector + +作用:主要用于 IoTDB(v1.2.0+)向更低版本的 IoTDB 传输数据,使用 v1.2.0 版本前的数据同步(Sync)协议。 +使用 Thrift RPC 框架传输数据。单线程 sync blocking IO 模型,传输性能较弱。 + +限制:源端 IoTDB 版本需要在 v1.2.0+,目标端 IoTDB 版本可以是 v1.2.0+、v1.1.x(更低版本的 IoTDB 理论上也支持,但是未经测试)。 + +注意:理论上 v1.2.0+ IoTDB 可作为 v1.2.0 版本前的任意版本的数据同步(Sync)接收端。 + + +| key | value | value 取值范围 | required or 
optional with default | +| ------------------ | --------------------------------------------------------------------- | ----------------------------------- | --------------------------------- | +| connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | required | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | +| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | +| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | +| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | optional: 1.1 | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +#### do-nothing-connector + +作用:不对 processor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| connector | do-nothing-connector | String: do-nothing-connector | required | + +## 权限管理 + +| 权限名称 | 描述 | +| ----------- | -------------------- | +| CREATE_PIPE | 注册任务。路径无关。 | +| START_PIPE | 开启任务。路径无关。 | +| STOP_PIPE | 停止任务。路径无关。 | +| DROP_PIPE | 卸载任务。路径无关。 | +| SHOW_PIPES | 查询任务。路径无关。 | + +## 配置参数 + +在 iotdb-common.properties 中: + +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 + +# The maximum number of selectors that can be used in the async connector. +# pipe_async_connector_selector_number=1 + +# The core number of clients that can be used in the async connector. +# pipe_async_connector_core_client_number=8 + +# The maximum number of clients that can be used in the async connector. 
+# pipe_async_connector_max_client_number=16 +``` + +## 功能特性 + +### 最少一次语义保证 **at-least-once** + +数据同步功能向外部系统传输数据时,提供 at-least-once 的传输语义。在大部分场景下,同步功能可提供 exactly-once 保证,即所有数据被恰好同步一次。 + +但是在以下场景中,可能存在部分数据被同步多次 **(断点续传)** 的情况: + +- 临时的网络故障:某次数据传输请求失败后,系统会进行重试发送,直至到达最大尝试次数 +- Pipe 插件逻辑实现异常:插件运行中抛出错误,系统会进行重试发送,直至到达最大尝试次数 +- 数据节点宕机、重启等导致的数据分区切主:分区变更完成后,受影响的数据会被重新传输 +- 集群不可用:集群可用后,受影响的数据会重新传输 + +### 源端:数据写入与 Pipe 处理、发送数据异步解耦 + +数据同步功能中,数据传输采用的是异步复制模式。 + +数据同步与写入操作完全脱钩,不存在对写入关键路径的影响。该机制允许框架在保证持续数据同步的前提下,保持时序数据库的写入速度。 + +### 源端:高可用集群部署时,Pipe 服务高可用 + +当发送端 IoTDB 为高可用集群部署模式时,数据同步服务也将是高可用的。 数据同步框架将监控每个数据节点的数据同步进度,并定期做轻量级的分布式一致性快照以保存同步状态。 + +- 当发送端集群某数据节点宕机时,数据同步框架可以利用一致性快照以及保存在副本上的数据快速恢复同步,以此实现数据同步服务的高可用。 +- 当发送端集群整体宕机并重启时,数据同步框架也能使用快照恢复同步服务。 + + diff --git a/src/UserGuide/V1.2.x/User-Manual/Streaming.md b/src/UserGuide/V1.2.x/User-Manual/Streaming.md new file mode 100644 index 00000000..c5ac54a5 --- /dev/null +++ b/src/UserGuide/V1.2.x/User-Manual/Streaming.md @@ -0,0 +1,24 @@ + + +# Tiered Storage + +TODO \ No newline at end of file diff --git a/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md new file mode 100644 index 00000000..c5ac54a5 --- /dev/null +++ b/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md @@ -0,0 +1,24 @@ + + +# Tiered Storage + +TODO \ No newline at end of file From d46019ea93da281dce3afec430701f0f90139a46 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Wed, 27 Sep 2023 18:53:55 +0800 Subject: [PATCH 10/27] 1.2-url-English-2 --- src/UserGuide/V1.2.x/User-Manual/Data-Sync.md | 68 +++++++++---------- 1 file changed, 34 insertions(+), 34 deletions(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md index 17b934fd..52c51228 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md @@ -22,7 +22,7 @@ # IoTDB Data Sync **The IoTDB data sync transfers data from IoTDB to another data platform, and a data sync task is called a Pipe.** -**A Pipe consists of three subtasks (plugins): ** +**A Pipe consists of three subtasks (plugins):** - Extract - Process @@ -30,7 +30,7 @@ **Pipe allows users to customize the processing logic of these three subtasks, just like handling data using UDF (User-Defined Functions)**. Within a Pipe, the aforementioned subtasks are executed and implemented by three types of plugins. Data flows through these three plugins sequentially: Pipe Extractor is used to extract data, Pipe Processor is used to process data, and Pipe Connector is used to send data to an external system. -**The model of a Pipe task is as follows: ** +**The model of a Pipe task is as follows:** ![Task model diagram](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) @@ -141,13 +141,13 @@ WITH CONNECTOR ( The expressed semantics are: synchronise the full amount of historical data and subsequent arrivals of real-time data from this database instance to the IoTDB instance with target 127.0.0.1:6667. -**注意:** +**Note:** -- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 -- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 -- CONNECTOR 具备自复用能力。对于不同的任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 +- EXTRACTOR and PROCESSOR are optional, if no configuration parameters are filled in, the system will use the corresponding default implementation. 
+- The CONNECTOR is a mandatory configuration that needs to be declared in the CREATE PIPE statement for configuring purposes. +- The CONNECTOR exhibits self-reusability. For different tasks, if their CONNECTOR possesses identical KV properties (where the value corresponds to every key), **the system will ultimately create only one instance of the CONNECTOR** to achieve resource reuse for connections. - - 例如,有下面 pipe1, pipe2 两个任务的声明: + - For example, there are the following pipe1, pipe2 task declarations: ```sql CREATE PIPE pipe1 @@ -165,49 +165,49 @@ The expressed semantics are: synchronise the full amount of historical data and ) ``` - - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 -- 请不要构建出包含数据循环同步的应用场景(会导致无限循环): + - Since they have identical CONNECTOR declarations (**even if the order of some properties is different**), the framework will automatically reuse the CONNECTOR declared by them. Hence, the CONNECTOR instances for pipe1 and pipe2 will be the same. +- Please note that we should avoid constructing application scenarios that involve data cycle synchronization (as it can result in an infinite loop): - IoTDB A -> IoTDB B -> IoTDB A - IoTDB A -> IoTDB A -### 启动任务 +### STARE TASK -CREATE PIPE 语句成功执行后,任务相关实例会被创建,但整个任务的运行状态会被置为 STOPPED,即任务不会立刻处理数据。 +After the successful execution of the CREATE PIPE statement, task-related instances will be created. However, the overall task's running status will be set to STOPPED, meaning the task will not immediately process data. -可以使用 START PIPE 语句使任务开始处理数据: +You can use the START PIPE statement to begin processing data for a task: ```sql START PIPE ``` -### 停止任务 +### STOP TASK -使用 STOP PIPE 语句使任务停止处理数据: +the STOP PIPE statement can be used to halt the data processing: ```sql STOP PIPE ``` -### 删除任务 +### DELETE TASK -使用 DROP PIPE 语句使任务停止处理数据(当任务状态为 RUNNING 时),然后删除整个任务同步任务: +If a task is in the RUNNING state, you can use the DROP PIPE statement to stop the data processing and delete the entire task: ```sql DROP PIPE ``` -用户在删除任务前,不需要执行 STOP 操作。 +Before deleting a task, there is no need to execute the STOP operation. -### 展示任务 +### SHOw TASK -使用 SHOW PIPES 语句查看所有任务: +ou can use the SHOW PIPES statement to view all tasks: ```sql SHOW PIPES ``` -查询结果如下: +The query results are as follows: ```sql +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ @@ -219,33 +219,33 @@ SHOW PIPES +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ ``` -可以使用 `` 指定想看的某个同步任务状态: +You can use to specify the status of a particular synchronization task: ```sql SHOW PIPE ``` -您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 +Additionally, the WHERE clause can be used to determine if the Pipe Connector used by a specific \ is being reused. ```sql SHOW PIPES WHERE CONNECTOR USED BY ``` -### 任务运行状态迁移 +### Task Running Status Migration -一个数据同步 pipe 在其被管理的生命周期中会经过多种状态: +The task running status can transition through several states during the lifecycle of a data synchronization pipe: -- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: - - 当一个 pipe 被成功创建之后,其初始状态为暂停状态 - - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED - - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED -- **RUNNING:** pipe 正在正常工作 -- **DROPPED:** pipe 任务被永久删除 +- **STOPPED:** The pipe is in a stopped state. 
It can have the following possibilities: + - After the successful creation of a pipe, its initial state is set to stopped + - The user manually pauses a pipe that is in normal running state, transitioning its status from RUNNING to STOPPED + - If a pipe encounters an unrecoverable error during execution, its status automatically changes from RUNNING to STOPPED. +- **RUNNING:** The pipe is actively processing data +- **DROPPED:** The pipe is permanently deleted -下图表明了所有状态以及状态的迁移: +The following diagram illustrates the different states and their transitions: -![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) +![state migration diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) ## 系统预置数据同步插件 @@ -337,7 +337,7 @@ Limitation: Both the source and target IoTDB versions need to be v1.2.0+. | key | value | value range | required or optional with default | | --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | -| connector | iotdb-thrift-connector 或 iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | required | +| connector | iotdb-thrift-connector or iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | required | | connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | | connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | | connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | @@ -375,7 +375,7 @@ Note: In theory, any version prior to v1.2.0 of IoTDB can serve as the data sync | key | value | value range | required or optional with default | | ------------------ | --------------------------------------------------------------------- | ----------------------------------- | --------------------------------- | | connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | required | +| connector.ip | Data service of one DataNode node of the target IoTDB ip | String | required | | connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | | connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | | connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | From 3759ea57cf3dcb7aa7fabd7fb682e8ad3d2e1c7b Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Thu, 28 Sep 2023 18:54:24 +0800 Subject: [PATCH 11/27] 3 --- src/UserGuide/V1.2.x/User-Manual/Data-Sync.md | 115 +++++++++--------- 1 file changed, 57 insertions(+), 58 deletions(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md index 52c51228..1bdeb9fe 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md @@ -96,7 +96,7 @@ WITH EXTRACTOR ( 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', -- Describes the time range of the extracted historical data, indicating the latest time 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', - -- Whether to extract 
real-time data + -- Whether to extract realtime data 'extractor.realtime.enable' = 'true', ) WITH PROCESSOR ( @@ -116,30 +116,30 @@ WITH CONNECTOR ( **To create a synchronisation task it is necessary to configure the PipeId and the parameters of the three plugin sections:** -| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +| configuration item | description | Required or not | default implementation | Default implementation description | Whether to allow custom implementations | | --------- | ------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- | -| PipeId | 全局唯一标定一个同步任务的名称 | 必填 | - | - | - | -| extractor | Pipe Extractor 插件,负责在数据库底层抽取同步数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入同步任务 | 否 | -| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | -| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | +| pipeId | Globally uniquely identifies the name of a sync task | 必填 | - | - | - | +| extractor | pipe Extractor plug-in, for extracting synchronized data at the bottom of the database | 选填 | iotdb-extractor | Integrate all historical data of the database and subsequent realtime data into the sync task | 否 | +| processor | Pipe Processor plug-in, for processing data | 选填 | do-nothing-processor | no processing of incoming data | | +| connector | Pipe Connector plug-in,for sending data | 必填 | - | - | | In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plug-ins, **see the section "System pre-built data synchronisation plug-ins" **. See the "System Preconfigured Data Synchronisation Plugins" section**. **An example of a minimalist CREATE PIPE statement is as follows:** ```sql -CREATE PIPE -- PipeId 是能够唯一标定任务任务的名字 +CREATE PIPE -- PipeId is a name that uniquely identifies the task. WITH CONNECTOR ( - -- IoTDB 数据发送插件,目标端为 IoTDB + -- IoTDB data sending plugin with target IoTDB 'connector' = 'iotdb-thrift-connector', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + -- Data service for one of the DataNode nodes on the target IoTDB ip 'connector.ip' = '127.0.0.1', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + -- Data service port of one of the DataNode nodes of the target IoTDB 'connector.port' = '6667', ) ``` -The expressed semantics are: synchronise the full amount of historical data and subsequent arrivals of real-time data from this database instance to the IoTDB instance with target 127.0.0.1:6667. +The expressed semantics are: synchronise the full amount of historical data and subsequent arrivals of realtime data from this database instance to the IoTDB instance with target 127.0.0.1:6667. **Note:** @@ -247,73 +247,72 @@ The following diagram illustrates the different states and their transitions: ![state migration diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) -## 系统预置数据同步插件 - -### 查看预置插件 +## System Pre-installed Data Sync Plug-in -用户可以按需查看系统中的插件。查看插件的语句如图所示。 +### View pre-built plug-in +User can view the plug-ins in the system on demand. The statement for viewing plug-ins is shown below. ```sql SHOW PIPEPLUGINS ``` -### 预置 extractor 插件 +### Pre-built extractor plugin #### iotdb-extractor -作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 +Function: Extract historical or realtime data inside IoTDB into pipe. 
| key | value | value range | required or optional with default | | ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | | extractor | iotdb-extractor | String: iotdb-extractor | required | -| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | -| extractor.history.enable | 是否同步历史数据 | Boolean: true, false | optional: true | -| extractor.history.start-time | 同步历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| extractor.history.end-time | 同步历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| extractor.realtime.enable | 是否同步实时数据 | Boolean: true, false | optional: true | +| extractor.pattern | path prefix for filtering time series | String: any time series prefix | optional: root | +| extractor.history.enable | whether to synchronize historical data | Boolean: true, false | optional: true | +| extractor.history.start-time | start of synchronizing historical data event time,Include start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | end of synchronizing historical data event time,Include end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | Whether to synchronize realtime data | Boolean: true, false | optional: true | -> 🚫 **extractor.pattern 参数说明** +> 🚫 **extractor.pattern Parameter Description** > -> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) -> * 在底层实现中,当检测到 pattern 为 root(默认值)时,同步效率较高,其他任意格式都将降低性能 -> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: +> * Pattern should use backquotes to modify illegal characters or illegal path nodes, for example, if you want to filter root.\`a@b\` or root.\`123\`, you should set the pattern to root.\`a@b\` or root.\`123\`(Refer specifically to [Timing of single and double quotes and backquotes](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * In the underlying implementation, when pattern is detected as root (default value), synchronization efficiency is higher, and any other format will reduce performance. +> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'extractor.pattern'='root.aligned.1': > > * root.aligned.1TS > * root.aligned.1TS.\`1\` > * root.aligned.100TS > -> 的数据会被同步; +> the data will be synchronized; > > * root.aligned.\`1\` > * root.aligned.\`123\` > -> 的数据不会被同步。 +> the data will not be synchronized. 
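To make the prefix semantics above concrete, a sketch of a pipe that only extracts series under `root.aligned.1` could look like the statement below; the pipe name and target address are illustrative only:

```sql
CREATE PIPE filter_by_pattern -- hypothetical pipe name
WITH EXTRACTOR (
  'extractor' = 'iotdb-extractor',
  -- only paths beginning with this prefix are extracted
  'extractor.pattern' = 'root.aligned.1'
)
WITH CONNECTOR (
  'connector' = 'iotdb-thrift-connector',
  'connector.ip' = '127.0.0.1',
  'connector.port' = '6667'
)
```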
-> ❗️**extractor.history 的 start-time,end-time 参数说明** +> ❗️**start-time, end-time parameter description of extractor.history** > -> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 +> * start-time, end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00 -> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> ✅ **a piece of data from production to IoTDB contains two key concepts of time** > -> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 -> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> * **event time:** the time when the data is actually produced (or the generation time assigned to the data by the data production system, which is a time item in the data point), also called the event time. +> * **arrival time:** the time the data arrived in the IoTDB system. > -> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 +> The out-of-order data we often refer to refers to data whose **event time** is far behind the current system time (or the maximum **event time** that has been dropped) when the data arrives. On the other hand, whether it is out-of-order data or sequential data, as long as they arrive newly in the system, their **arrival time** will increase with the order in which the data arrives at IoTDB. -> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> 💎 **the work of iotdb-extractor can be split into two stages** > -> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 -> 2. 实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> 1. Historical data extraction: All data with **arrival time** < **current system time** when creating the pipe is called historical data +> 2. Realtime data extraction: All data with **arrival time** >= **current system time** when the pipe is created is called realtime data > -> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> The historical data transmission phase and the realtime data transmission phase are executed serially. Only when the historical data transmission phase is completed, the realtime data transmission phase is executed.** > -> 用户可以指定 iotdb-extractor 进行: +> Users can specify iotdb-extractor to: > -> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) -> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) -> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) -> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` +> * Historical data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * Realtime data extraction(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * Full data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * Disable simultaneous sets `extractor.history.enable` and `extractor.realtime.enable` to `false` ### pre-processor plugin @@ -337,10 +336,10 @@ Limitation: Both the source and target IoTDB versions need to be v1.2.0+. 
| key | value | value range | required or optional with default | | --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | -| connector | iotdb-thrift-connector or iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | -| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | +| connector | iotdb-thrift-connector or iotdb-thrift-sync-connector | String: iotdb-thrift-connector or iotdb-thrift-sync-connector | required | +| connector.ip | the data service IP of one of the DataNode nodes in the target IoTDB | String | optional: and connector.node-urls fill in either one | +| connector.port | the data service port of one of the DataNode nodes in the target IoTDB | Integer | optional: and connector.node-urls fill in either one | +| connector.node-urls | the URL of the data service port of any multiple DataNode nodes in the target IoTDB | String。eg:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: and connector.ip:connector.port fill in either one | > 📌 Please ensure that the receiving end has already created all the time series present in the sending end or has enabled automatic metadata creation. Otherwise, it may result in the failure of the pipe operation. @@ -356,9 +355,9 @@ Limitation: Both the source and target IoTDB versions need to be v1.2.0+. | key | value | value range | required or optional with default | | --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | | connector | iotdb-thrift-async-connector | String: iotdb-thrift-async-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | -| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | +| connector.ip | the data service IP of one of the DataNode nodes in the target IoTDB | String | optional: and connector.node-urls fill in either one | +| connector.port | the data service port of one of the DataNode nodes in the target IoTDB | Integer | optional: and connector.node-urls fill in either one | +| connector.node-urls | the URL of the data service port of any multiple DataNode nodes in the target IoTDB | String。eg:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: and connector.ip:connector.port fill in either one | > 📌 Please ensure that the receiving end has already created all the time series present in the sending end or has enabled automatic metadata creation. Otherwise, it may result in the failure of the pipe operation. 
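For instance, an asynchronous pipe towards a distributed target could be sketched as below; because this connector does not preserve write order, it is assumed here that the receiving side can tolerate out-of-order application (the pipe name and addresses are placeholders):

```sql
CREATE PIPE async_to_cluster -- hypothetical pipe name
WITH CONNECTOR (
  'connector' = 'iotdb-thrift-async-connector',
  -- several DataNodes of the target cluster; transfers run on multiple
  -- threads, so apply order on the receiver is not guaranteed
  'connector.node-urls' = '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669'
)
```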
@@ -374,12 +373,12 @@ Note: In theory, any version prior to v1.2.0 of IoTDB can serve as the data sync | key | value | value range | required or optional with default | | ------------------ | --------------------------------------------------------------------- | ----------------------------------- | --------------------------------- | -| connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | required | -| connector.ip | Data service of one DataNode node of the target IoTDB ip | String | required | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | -| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | -| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | -| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | optional: 1.1 | +| connector | iotdb-legacy-pipe-connector | string: iotdb-legacy-pipe-connector | required | +| connector.ip | data service of one DataNode node of the target IoTDB ip | string | required | +| connector.port | the data service port of one of the DataNode nodes in the target IoTDB | integer | required | +| connector.user | the user name of the target IoTDB. Note that the user needs to support data writing and TsFile Load permissions. | string | optional: root | +| connector.password | the password of the target IoTDB. Note that the user needs to support data writing and TsFile Load permissions. | string | optional: root | +| connector.version | the version of the target IoTDB, used to disguise its actual version and bypass the version consistency check of the target. | string | optional: 1.1 | > 📌 Make sure that the receiver has created all the time series on the sender side, or that automatic metadata creation is turned on, otherwise the pipe run will fail. @@ -450,13 +449,13 @@ However, in the following scenarios, it is possible for some data to be synchron - Data partition switching due to node failures or restarts: After the partition change is completed, the affected data will be retransmitted. - Cluster unavailability: Once the cluster becomes available again, the affected data will be retransmitted. -### Source End: Data Writing with Pipe Processing and Asynchronous Decoupling of Data Transmission +### Source: Data Writing with Pipe Processing and Asynchronous Decoupling of Data Transmission In the data synchronization feature, data transfer adopts an asynchronous replication mode. Data synchronization is completely decoupled from the writing operation, eliminating any impact on the critical path of writing. This mechanism allows the framework to maintain the writing speed of a time-series database while ensuring continuous data synchronization. -### Source End: High Availability of Pipe Service in a Highly Available Cluster Deployment +### Source: High Availability of Pipe Service in a Highly Available Cluster Deployment When the sender end IoTDB is deployed in a high availability cluster mode, the data synchronization service will also be highly available. The data synchronization framework monitors the data synchronization progress of each data node and periodically takes lightweight distributed consistent snapshots to preserve the synchronization state. 
From 96d7fc916ce42e8f8260b0756529f8cd919465f0 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Sat, 7 Oct 2023 18:50:07 +0800 Subject: [PATCH 12/27] 4 --- src/UserGuide/V1.2.x/User-Manual/Data-Sync.md | 14 +- src/UserGuide/V1.2.x/User-Manual/Streaming.md | 751 +++++++++++++++++- 2 files changed, 756 insertions(+), 9 deletions(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md index 1bdeb9fe..9fdc707c 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md @@ -118,10 +118,10 @@ WITH CONNECTOR ( | configuration item | description | Required or not | default implementation | Default implementation description | Whether to allow custom implementations | | --------- | ------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- | -| pipeId | Globally uniquely identifies the name of a sync task | 必填 | - | - | - | -| extractor | pipe Extractor plug-in, for extracting synchronized data at the bottom of the database | 选填 | iotdb-extractor | Integrate all historical data of the database and subsequent realtime data into the sync task | 否 | -| processor | Pipe Processor plug-in, for processing data | 选填 | do-nothing-processor | no processing of incoming data | | -| connector | Pipe Connector plug-in,for sending data | 必填 | - | - | | +| pipeId | Globally uniquely identifies the name of a sync task | required | - | - | - | +| extractor | pipe Extractor plug-in, for extracting synchronized data at the bottom of the database | Optional | iotdb-extractor | Integrate all historical data of the database and subsequent realtime data into the sync task | no | +| processor | Pipe Processor plug-in, for processing data | Optional | do-nothing-processor | no processing of incoming data | yes | +| connector | Pipe Connector plug-in,for sending data | required | - | - | yes | In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plug-ins, **see the section "System pre-built data synchronisation plug-ins" **. See the "System Preconfigured Data Synchronisation Plugins" section**. @@ -171,7 +171,7 @@ The expressed semantics are: synchronise the full amount of historical data and - IoTDB A -> IoTDB B -> IoTDB A - IoTDB A -> IoTDB A -### STARE TASK +### START TASK After the successful execution of the CREATE PIPE statement, task-related instances will be created. However, the overall task's running status will be set to STOPPED, meaning the task will not immediately process data. @@ -199,9 +199,9 @@ DROP PIPE Before deleting a task, there is no need to execute the STOP operation. 
-### SHOw TASK +### SHOW TASK -ou can use the SHOW PIPES statement to view all tasks: +You can use the SHOW PIPES statement to view all tasks: ```sql SHOW PIPES diff --git a/src/UserGuide/V1.2.x/User-Manual/Streaming.md b/src/UserGuide/V1.2.x/User-Manual/Streaming.md index c5ac54a5..95ae03eb 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Streaming.md +++ b/src/UserGuide/V1.2.x/User-Manual/Streaming.md @@ -19,6 +19,753 @@ --> -# Tiered Storage +# IoTDB Stream Processing Framework -TODO \ No newline at end of file +The IoTDB stream processing framework allows users to implement customized stream processing logic, which can monitor and capture storage engine changes, transform changed data, and push transformed data outward. + +We call a data flow processing task a Pipe. A stream processing task (Pipe) contains three subtasks: + +- Extract +- Process +- Send (Connect) + +The stream processing framework allows users to customize the processing logic of three subtasks using Java language and process data in a UDF-like manner. +In a Pipe, the three subtasks mentioned above are executed and implemented by three types of plugins. Data flows through these three plugins sequentially for processing: +Pipe Extractor is used to extract data, Pipe Processor is used to process data, Pipe Connector is used to send data, and the final data will be sent to an external system. + +**The model for a Pipe task is as follows:** + +![任务模型图](https://alioss.timecho.com/docs/img/%E5%90%8C%E6%AD%A5%E5%BC%95%E6%93%8E.jpeg) +A data stream processing task essentially describes the attributes of the Pipe Extractor, Pipe Processor, and Pipe Connector plugins. + +Users can configure the specific attributes of these three subtasks declaratively using SQL statements. By combining different attributes, flexible data ETL (Extract, Transform, Load) capabilities can be achieved. + +Using the stream processing framework, it is possible to build a complete data pipeline to fulfill various requirements such as *edge-to-cloud synchronization, remote disaster recovery, and read/write load balancing across multiple databases*. + +## Custom Stream Processing Plugin Development + +### Programming development dependencies + +It is recommended to use Maven to build the project. Add the following dependencies in the `pom.xml` file. Please make sure to choose dependencies with the same version as the IoTDB server version. + +```xml + + org.apache.iotdb + pipe-api + 1.2.1 + provided + +``` + +### Event-Driven Programming Model + +The design of user programming interfaces for stream processing plugins follows the principles of the event-driven programming model. In this model, events serve as the abstraction of data in the user programming interface. The programming interface is decoupled from the specific execution method, allowing the focus to be on describing how the system expects events (data) to be processed upon arrival. + +In the user programming interface of stream processing plugins, events abstract the write operations of database data. Events are captured by the local stream processing engine and passed sequentially through the three stages of stream processing, namely Pipe Extractor, Pipe Processor, and Pipe Connector plugins. User logic is triggered and executed within these three plugins. 
+ +To accommodate both low-latency stream processing in low-load scenarios and high-throughput stream processing in high-load scenarios at the edge, the stream processing engine dynamically chooses the processing objects from operation logs and data files. Therefore, the user programming interface for stream processing requires the user to provide the handling logic for two types of events: TabletInsertionEvent for operation log write events and TsFileInsertionEvent for data file write events. + +#### **TabletInsertionEvent** + +The TabletInsertionEvent is a high-level data abstraction for user write requests, which provides the ability to manipulate the underlying data of the write request by providing a unified operation interface. + +For different database deployments, the underlying storage structure corresponding to the operation log write event is different. For stand-alone deployment scenarios, the operation log write event is an encapsulation of write-ahead log (WAL) entries; for distributed deployment scenarios, the operation log write event is an encapsulation of individual node consensus protocol operation log entries. + +For write operations generated by different write request interfaces of the database, the data structure of the request structure corresponding to the operation log write event is also different.IoTDB provides many write interfaces such as InsertRecord, InsertRecords, InsertTablet, InsertTablets, and so on, and each kind of write request uses a completely different serialisation method to generate a write request. completely different serialisation methods and generate different binary entries. + +The existence of operation log write events provides users with a unified view of data operations, which shields the implementation differences of the underlying data structures, greatly reduces the programming threshold for users, and improves the ease of use of the functionality. + +```java +/** TabletInsertionEvent is used to define the event of data insertion. */ +public interface TabletInsertionEvent extends Event { + + /** + * The consumer processes the data row by row and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processRowByRow(BiConsumer consumer); + + /** + * The consumer processes the Tablet directly and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processTablet(BiConsumer consumer); +} +``` + +#### **TsFileInsertionEvent** + +The TsFileInsertionEvent represents a high-level abstraction of the database's disk flush operation and is a collection of multiple TabletInsertionEvents. + +IoTDB's storage engine is based on the LSM (Log-Structured Merge) structure. When data is written, the write operations are first flushed to log-structured files, while the written data is also stored in memory. When the memory reaches its capacity limit, a flush operation is triggered, converting the data in memory into a database file while deleting the previously written log entries. During the conversion from memory data to database file data, two compression processes, encoding compression and universal compression, are applied. As a result, the data in the database file occupies less space compared to the original data in memory. 
+ +In extreme network conditions, directly transferring data files is more cost-effective than transmitting individual write operations. It consumes lower network bandwidth and achieves faster transmission speed. However, there is no such thing as a free lunch. Performing calculations on data in the disk file incurs additional costs for file I/O compared to performing calculations directly on data in memory. Nevertheless, the coexistence of disk data files and memory write operations permits dynamic trade-offs and adjustments. It is based on this observation that the data file write event is introduced into the event model of the plugin. + +In summary, the data file write event appears in the event stream of stream processing plugins in the following two scenarios: + +1. Historical data extraction: Before a stream processing task starts, all persisted write data exists in the form of TsFiles. When collecting historical data at the beginning of a stream processing task, the historical data is abstracted as TsFileInsertionEvent. + +2. Real-time data extraction: During the execution of a stream processing task, if the speed of processing the log entries representing real-time operations is slower than the rate of write requests, the unprocessed log entries will be persisted to disk in the form of TsFiles. When these data are extracted by the stream processing engine, they are abstracted as TsFileInsertionEvent. + +```java +/** + * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks, + * which is compressed and encoded, and requires IO cost for computational processing. + */ +public interface TsFileInsertionEvent extends Event { + + /** + * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents. + * + * @return {@code Iterable} the list of TabletInsertionEvent + */ + Iterable toTabletInsertionEvents(); +} +``` + +### Custom Stream Processing Plugin Programming Interface Definition + +Based on the custom stream processing plugin programming interface, users can easily write data extraction plugins, data processing plugins, and data sending plugins, allowing the stream processing functionality to adapt flexibly to various industrial scenarios. +#### Data Extraction Plugin Interface + +Data extraction is the first stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data extraction plugin (PipeExtractor) serves as a bridge between the stream processing engine and the storage engine. It captures various data write events by listening to the behavior of the storage engine. +```java +/** + * PipeExtractor + * + *

PipeExtractor is responsible for capturing events from sources. + * + *

Various data sources can be supported by implementing different PipeExtractor classes. + * + *

The lifecycle of a PipeExtractor is as follows: + * + *

    + *
+ * <ul>
+ *   <li>When a collaboration task is created, the KV pairs of `WITH EXTRACTOR` clause in SQL are
+ *       parsed and the validation method {@link PipeExtractor#validate(PipeParameterValidator)}
+ *       will be called to validate the parameters.
+ *   <li>Before the collaboration task starts, the method {@link
+ *       PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} will be called
+ *       to config the runtime behavior of the PipeExtractor.
+ *   <li>Then the method {@link PipeExtractor#start()} will be called to start the PipeExtractor.
+ *   <li>While the collaboration task is in progress, the method {@link PipeExtractor#supply()} will
+ *       be called to capture events from sources and then the events will be passed to the
+ *       PipeProcessor.
+ *   <li>The method {@link PipeExtractor#close()} will be called when the collaboration task is
+ *       cancelled (the `DROP PIPE` command is executed).
+ * </ul>
+ */ +public interface PipeExtractor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeExtractor. In this method, the user can do the + * following things: + * + *
    + *
+ * <ul>
+ *   <li>Use PipeParameters to parse key-value pair attributes entered by the user.
+ *   <li>Set the running configurations in PipeExtractorRuntimeConfiguration.
+ * </ul>
+ * + *

This method is called after the method {@link + * PipeExtractor#validate(PipeParameterValidator)} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeExtractor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeExtractorRuntimeConfiguration configuration) + throws Exception; + + /** + * Start the extractor. After this method is called, events should be ready to be supplied by + * {@link PipeExtractor#supply()}. This method is called after {@link + * PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} is called. + * + * @throws Exception the user can throw errors if necessary + */ + void start() throws Exception; + + /** + * Supply single event from the extractor and the caller will send the event to the processor. + * This method is called after {@link PipeExtractor#start()} is called. + * + * @return the event to be supplied. the event may be null if the extractor has no more events at + * the moment, but the extractor is still running for more events. + * @throws Exception the user can throw errors if necessary + */ + Event supply() throws Exception; +} +``` + +#### Data Processing Plugin Interface + +Data processing is the second stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data processing plugin (PipeProcessor) is primarily used for filtering and transforming the various events captured by the data extraction plugin (PipeExtractor). + +```java +/** + * PipeProcessor + * + *

PipeProcessor is used to filter and transform the Event formed by the PipeExtractor. + * + *

The lifecycle of a PipeProcessor is as follows: + * + *

    + *
+ * <ul>
+ *   <li>When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are
+ *       parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)}
+ *       will be called to validate the parameters.
+ *   <li>Before the collaboration task starts, the method {@link
+ *       PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called
+ *       to config the runtime behavior of the PipeProcessor.
+ *   <li>While the collaboration task is in progress:
+ *       <ul>
+ *         <li>PipeExtractor captures the events and wraps them into three types of Event instances.
+ *         <li>PipeProcessor processes the event and then passes them to the PipeConnector. The
+ *             following 3 methods will be called: {@link
+ *             PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link
+ *             PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link
+ *             PipeProcessor#process(Event, EventCollector)}.
+ *         <li>PipeConnector serializes the events into binaries and send them to sinks.
+ *       </ul>
+ *   <li>When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link
+ *       PipeProcessor#close() } method will be called.
+ * </ul>
+ */ +public interface PipeProcessor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeProcessor. In this method, the user can do the + * following things: + * + *
    + *
+ * <ul>
+ *   <li>Use PipeParameters to parse key-value pair attributes entered by the user.
+ *   <li>Set the running configurations in PipeProcessorRuntimeConfiguration.
+ * </ul>
+ * + *

This method is called after the method {@link + * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the + * events processing. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeProcessor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is called to process the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) + throws Exception; + + /** + * This method is called to process the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + default void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) + throws Exception { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + process(tabletInsertionEvent, eventCollector); + } + } + + /** + * This method is called to process the Event. + * + * @param event Event to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(Event event, EventCollector eventCollector) throws Exception; +} +``` + +#### Data Sending Plugin Interface + +Data sending is the third stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data sending plugin (PipeConnector) is responsible for sending the various events processed by the data processing plugin (PipeProcessor). It serves as the network implementation layer of the stream processing framework and should support multiple real-time communication protocols and connectors in its interface. + +```java +/** + * PipeConnector + * + *

PipeConnector is responsible for sending events to sinks. + * + *

Various network protocols can be supported by implementing different PipeConnector classes. + * + *

The lifecycle of a PipeConnector is as follows: + * + *

    + *
+ * <ul>
+ *   <li>When a collaboration task is created, the KV pairs of `WITH CONNECTOR` clause in SQL are
+ *       parsed and the validation method {@link PipeConnector#validate(PipeParameterValidator)}
+ *       will be called to validate the parameters.
+ *   <li>Before the collaboration task starts, the method {@link
+ *       PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} will be called
+ *       to config the runtime behavior of the PipeConnector and the method {@link
+ *       PipeConnector#handshake()} will be called to create a connection with sink.
+ *   <li>While the collaboration task is in progress:
+ *       <ul>
+ *         <li>PipeExtractor captures the events and wraps them into three types of Event instances.
+ *         <li>PipeProcessor processes the event and then passes them to the PipeConnector.
+ *         <li>PipeConnector serializes the events into binaries and send them to sinks. The
+ *             following 3 methods will be called: {@link
+ *             PipeConnector#transfer(TabletInsertionEvent)}, {@link
+ *             PipeConnector#transfer(TsFileInsertionEvent)} and {@link
+ *             PipeConnector#transfer(Event)}.
+ *       </ul>
+ *   <li>When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link
+ *       PipeConnector#close() } method will be called.
+ * </ul>
+ * + *

In addition, the method {@link PipeConnector#heartbeat()} will be called periodically to check + * whether the connection with sink is still alive. The method {@link PipeConnector#handshake()} + * will be called to create a new connection with the sink when the method {@link + * PipeConnector#heartbeat()} throws exceptions. + */ +public interface PipeConnector extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeConnector. In this method, the user can do the + * following things: + * + *

    + *
+ * <ul>
+ *   <li>Use PipeParameters to parse key-value pair attributes entered by the user.
+ *   <li>Set the running configurations in PipeConnectorRuntimeConfiguration.
+ * </ul>
+ * + *

This method is called after the method {@link + * PipeConnector#validate(PipeParameterValidator)} is called and before the method {@link + * PipeConnector#handshake()} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeConnector + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeConnectorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is used to create a connection with sink. This method will be called after the + * method {@link PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} is + * called or will be called when the method {@link PipeConnector#heartbeat()} throws exceptions. + * + * @throws Exception if the connection is failed to be created + */ + void handshake() throws Exception; + + /** + * This method will be called periodically to check whether the connection with sink is still + * alive. + * + * @throws Exception if the connection dies + */ + void heartbeat() throws Exception; + + /** + * This method is used to transfer the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; + + /** + * This method is used to transfer the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + default void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + transfer(tabletInsertionEvent); + } + } + + /** + * This method is used to transfer the Event. + * + * @param event Event to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(Event event) throws Exception; +} +``` + +## Custom Stream Processing Plugin Management + +To ensure the flexibility and usability of user-defined plugins in production environments, the system needs to provide the capability to dynamically manage plugins. This section introduces the management statements for stream processing plugins, which enable the dynamic and unified management of plugins. + +### Load Plugin Statement + +In IoTDB, to dynamically load a user-defined plugin into the system, you first need to implement a specific plugin class based on PipeExtractor, PipeProcessor, or PipeConnector. Then, you need to compile and package the plugin class into an executable jar file. Finally, you can use the loading plugin management statement to load the plugin into IoTDB. + +The syntax of the loading plugin management statement is as follows: + +```sql +CREATE PIPEPLUGIN +AS +USING +``` + +For example, if a user implements a data processing plugin with the fully qualified class name "edu.tsinghua.iotdb.pipe.ExampleProcessor" and packages it into a jar file, which is stored at "https://example.com:8080/iotdb/pipe-plugin.jar", and the user wants to use this plugin in the stream processing engine, marking the plugin as "example". 
The creation statement for this data processing plugin is as follows: + +```sql +CREATE PIPEPLUGIN example +AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' +USING URI '' +``` + +### 删除插件语句 + +当用户不再想使用一个插件,需要将插件从系统中卸载时,可以使用如图所示的删除插件语句。 + +```sql +DROP PIPEPLUGIN <别名> +``` + +### 查看插件语句 + +用户也可以按需查看系统中的插件。查看插件的语句如图所示。 + +```sql +SHOW PIPEPLUGINS +``` + +## 系统预置的流处理插件 + +### 预置 extractor 插件 + +#### iotdb-extractor + +作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 + + +| key | value | value 取值范围 | required or optional with default | +| ---------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | +| extractor | iotdb-extractor | String: iotdb-extractor | required | +| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | +| extractor.history.enable | 是否抽取历史数据 | Boolean: true, false | optional: true | +| extractor.history.start-time | 抽取的历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | 抽取的历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | 是否抽取实时数据 | Boolean: true, false | optional: true | + +> 🚫 **extractor.pattern 参数说明** +> +> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * 在底层实现中,当检测到 pattern 为 root(默认值)时,抽取效率较高,其他任意格式都将降低性能 +> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: +> +> * root.aligned.1TS +> * root.aligned.1TS.\`1\` +> * root.aligned.100T +> +> 的数据会被抽取; +> +> * root.aligned.\`1\` +> * root.aligned.\`123\` +> +> 的数据不会被抽取。 + +> ❗️**extractor.history 的 start-time,end-time 参数说明** +> +> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 + +> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> +> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 +> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> +> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 + +> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> +> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 +> 2. 
实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> +> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> +> 用户可以指定 iotdb-extractor 进行: +> +> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` + +### 预置 processor 插件 + +#### do-nothing-processor + +作用:不对 extractor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| processor | do-nothing-processor | String: do-nothing-processor | required | + +### 预置 connector 插件 + +#### do-nothing-connector + +作用:不对 processor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| connector | do-nothing-connector | String: do-nothing-connector | required | + +## 流处理任务管理 + +### 创建流处理任务 + +使用 `CREATE PIPE` 语句来创建流处理任务。以数据同步流处理任务的创建为例,示例 SQL 语句如下: + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +WITH EXTRACTOR ( + -- 默认的 IoTDB 数据抽取插件 + 'extractor' = 'iotdb-extractor', + -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + 'extractor.pattern' = 'root.timecho', + -- 是否抽取历史数据 + 'extractor.history.enable' = 'true', + -- 描述被抽取的历史数据的时间范围,表示最早时间 + 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', + -- 描述被抽取的历史数据的时间范围,表示最晚时间 + 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', + -- 是否抽取实时数据 + 'extractor.realtime.enable' = 'true', +) +WITH PROCESSOR ( + -- 默认的数据处理插件,即不做任何处理 + 'processor' = 'do-nothing-processor', +) +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +**创建流处理任务时需要配置 PipeId 以及三个插件部分的参数:** + + +| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +| --------- | --------------------------------------------------- | --------------------------- | -------------------- | -------------------------------------------------------- | ------------------------- | +| PipeId | 全局唯一标定一个流处理任务的名称 | 必填 | - | - | - | +| extractor | Pipe Extractor 插件,负责在数据库底层抽取流处理数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入流处理任务 | 否 | +| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | +| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | + +示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据流处理任务。IoTDB 还内置了其他的流处理插件,**请查看“系统预置流处理插件”一节**。 + +**一个最简的 CREATE PIPE 语句示例如下:** + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 + +**注意:** + +- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 +- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 +- CONNECTOR 具备自复用能力。对于不同的流处理任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 + + - 例如,有下面 pipe1, pipe2 两个流处理任务的声明: + + ```sql + CREATE PIPE pipe1 + WITH CONNECTOR 
( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.host' = 'localhost', + 'connector.thrift.port' = '9999', + ) + + CREATE PIPE pipe2 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.port' = '9999', + 'connector.thrift.host' = 'localhost', + ) + ``` + + - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 +- 请不要构建出包含数据循环同步的应用场景(会导致无限循环): + + - IoTDB A -> IoTDB B -> IoTDB A + - IoTDB A -> IoTDB A + +### 启动流处理任务 + +CREATE PIPE 语句成功执行后,流处理任务相关实例会被创建,但整个流处理任务的运行状态会被置为 STOPPED,即流处理任务不会立刻处理数据。 + +可以使用 START PIPE 语句使流处理任务开始处理数据: + +```sql +START PIPE +``` + +### 停止流处理任务 + +使用 STOP PIPE 语句使流处理任务停止处理数据: + +```sql +STOP PIPE +``` + +### 删除流处理任务 + +使用 DROP PIPE 语句使流处理任务停止处理数据(当流处理任务状态为 RUNNING 时),然后删除整个流处理任务流处理任务: + +```sql +DROP PIPE +``` + +用户在删除流处理任务前,不需要执行 STOP 操作。 + +### 展示流处理任务 + +使用 SHOW PIPES 语句查看所有流处理任务: + +```sql +SHOW PIPES +``` + +查询结果如下: + +```sql ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +| ID| CreationTime | State|PipeExtractor|PipeProcessor|PipeConnector|ExceptionMessage| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| None| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +``` + +可以使用 `` 指定想看的某个流处理任务状态: + +```sql +SHOW PIPE +``` + +您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 + +```sql +SHOW PIPES +WHERE CONNECTOR USED BY +``` + +### Stream Processing Task Running Status Migration + +A stream processing task status can transition through several states during the lifecycle of a data synchronization pipe: + +- **STOPPED:** The pipe is in a stopped state. It can have the following possibilities: + - After the successful creation of a pipe, its initial state is set to stopped + - The user manually pauses a pipe that is in normal running state, transitioning its status from RUNNING to STOPPED + - If a pipe encounters an unrecoverable error during execution, its status automatically changes from RUNNING to STOPPED. 
+- **RUNNING:** The pipe is actively processing data +- **DROPPED:** The pipe is permanently deleted + +The following diagram illustrates the different states and their transitions: + +![state migration diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +## Authority Management + +### Stream Processing Task + +| Authority Name | Description | +| ----------- | -------------------- | +| CREATE_PIPE | Register task,path-independent | +| START_PIPE | Start task,path-independent | +| STOP_PIPE | Stop task,path-independent | +| DROP_PIPE | Uninstall task,path-independent | +| SHOW_PIPES | Query task,path-independent | +### Stream Processing Task Plugin + + +| Authority Name | Description | +| ----------------- | ------------------------------ | +| CREATE_PIPEPLUGIN | Register stream processing task plugin,path-independent | +| DROP_PIPEPLUGIN | Start stream processing task plugin,path-independent | +| SHOW_PIPEPLUGINS | Query stream processing task plugin,path-independent | + +## Configure Parameters + +In iotdb-common.properties : + +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 +``` \ No newline at end of file From 8989685303d3f6a747e9e0f093975f546142e7cc Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Sun, 8 Oct 2023 18:50:39 +0800 Subject: [PATCH 13/27] 5 --- src/UserGuide/V1.2.x/User-Manual/Data-Sync.md | 15 +- src/UserGuide/V1.2.x/User-Manual/Streaming.md | 194 +++++++++--------- 2 files changed, 101 insertions(+), 108 deletions(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md index 9fdc707c..8959e7fa 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md @@ -84,7 +84,7 @@ By utilizing the data synchronization functionality, a complete data pipeline ca A data synchronisation task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: ```sql -CREATE PIPE -- PipeId is the name that uniquely identifies the synchronisation task +CREATE PIPE -- PipeId is the name that uniquely identifies the sync task WITH EXTRACTOR ( -- Default IoTDB Data Extraction Plugin 'extractor' = 'iotdb-extractor', @@ -123,8 +123,7 @@ WITH CONNECTOR ( | processor | Pipe Processor plug-in, for processing data | Optional | do-nothing-processor | no processing of incoming data | yes | | connector | Pipe Connector plug-in,for sending data | required | - | - | yes | -In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plug-ins, **see the section "System pre-built data synchronisation plug-ins" **. 
See the "System Preconfigured Data Synchronisation Plugins" section**. - +In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plug-ins, **see the section "System Pre-built Data Sync Plugin"**. **An example of a minimalist CREATE PIPE statement is as follows:** ```sql @@ -166,7 +165,7 @@ The expressed semantics are: synchronise the full amount of historical data and ``` - Since they have identical CONNECTOR declarations (**even if the order of some properties is different**), the framework will automatically reuse the CONNECTOR declared by them. Hence, the CONNECTOR instances for pipe1 and pipe2 will be the same. -- Please note that we should avoid constructing application scenarios that involve data cycle synchronization (as it can result in an infinite loop): +- Please note that we should avoid constructing application scenarios that involve data cycle sync (as it can result in an infinite loop): - IoTDB A -> IoTDB B -> IoTDB A - IoTDB A -> IoTDB A @@ -247,16 +246,16 @@ The following diagram illustrates the different states and their transitions: ![state migration diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) -## System Pre-installed Data Sync Plug-in +## System Pre-built Data Sync Plugin -### View pre-built plug-in +### View pre-built plugin User can view the plug-ins in the system on demand. The statement for viewing plug-ins is shown below. ```sql SHOW PIPEPLUGINS ``` -### Pre-built extractor plugin +### Pre-built Extractor Plugin #### iotdb-extractor @@ -314,7 +313,7 @@ Function: Extract historical or realtime data inside IoTDB into pipe. > * Full data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) > * Disable simultaneous sets `extractor.history.enable` and `extractor.realtime.enable` to `false` -### pre-processor plugin +### Pre-built Processor Plugin #### do-nothing-processor diff --git a/src/UserGuide/V1.2.x/User-Manual/Streaming.md b/src/UserGuide/V1.2.x/User-Manual/Streaming.md index 95ae03eb..f597cd83 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Streaming.md +++ b/src/UserGuide/V1.2.x/User-Manual/Streaming.md @@ -456,175 +456,172 @@ AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' USING URI '' ``` -### 删除插件语句 - -当用户不再想使用一个插件,需要将插件从系统中卸载时,可以使用如图所示的删除插件语句。 +### Delete Plugin Statement +When user no longer wants to use a plugin and needs to uninstall the plug-in from the system, you can use the Remove plugin statement as shown below. ```sql -DROP PIPEPLUGIN <别名> +DROP PIPEPLUGIN ``` -### 查看插件语句 - -用户也可以按需查看系统中的插件。查看插件的语句如图所示。 +### Show Plugin Statement +User can also view the plugin in the system on need. The statement to view plugin is as follows. ```sql SHOW PIPEPLUGINS ``` -## 系统预置的流处理插件 +## System Pre-installed Stream Processing Plugin -### 预置 extractor 插件 +### Pre-built extractor Plugin #### iotdb-extractor -作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 +Function: Extract historical or realtime data inside IoTDB into pipe. 
-| key | value | value 取值范围 | required or optional with default | -| ---------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | -| extractor | iotdb-extractor | String: iotdb-extractor | required | -| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | -| extractor.history.enable | 是否抽取历史数据 | Boolean: true, false | optional: true | -| extractor.history.start-time | 抽取的历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| extractor.history.end-time | 抽取的历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| extractor.realtime.enable | 是否抽取实时数据 | Boolean: true, false | optional: true | +| key | value | value range | required or optional with default | +| ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | +| extractor | iotdb-extractor | String: iotdb-extractor | required | +| extractor.pattern | path prefix for filtering time series | String: any time series prefix | optional: root | +| extractor.history.enable | whether to synchronize historical data | Boolean: true, false | optional: true | +| extractor.history.start-time | start of synchronizing historical data event time,Include start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | end of synchronizing historical data event time,Include end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | Whether to synchronize realtime data | Boolean: true, false | optional: true | -> 🚫 **extractor.pattern 参数说明** +> 🚫 **extractor.pattern Parameter Description** > -> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) -> * 在底层实现中,当检测到 pattern 为 root(默认值)时,抽取效率较高,其他任意格式都将降低性能 -> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: +> * Pattern should use backquotes to modify illegal characters or illegal path nodes, for example, if you want to filter root.\`a@b\` or root.\`123\`, you should set the pattern to root.\`a@b\` or root.\`123\`(Refer specifically to [Timing of single and double quotes and backquotes](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * In the underlying implementation, when pattern is detected as root (default value), synchronization efficiency is higher, and any other format will reduce performance. +> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'extractor.pattern'='root.aligned.1': > > * root.aligned.1TS > * root.aligned.1TS.\`1\` -> * root.aligned.100T -> -> 的数据会被抽取; -> +> * root.aligned.100TS +> +> the data will be synchronized; +> > * root.aligned.\`1\` > * root.aligned.\`123\` > -> 的数据不会被抽取。 +> the data will not be synchronized. 
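For illustration, the options above can be combined in a single declaration. The pipe name, pattern, time window, and connector address below are placeholder values, and the timestamps use the ISO format that the history start/end time parameters expect:

```sql
CREATE PIPE example_history_pipe
WITH EXTRACTOR (
  'extractor' = 'iotdb-extractor',
  -- only series under this prefix are extracted
  'extractor.pattern' = 'root.db',
  -- extract only historical data whose event time falls inside the window
  'extractor.history.enable' = 'true',
  'extractor.history.start-time' = '2023-01-01T00:00:00+08:00',
  'extractor.history.end-time' = '2023-06-30T23:59:59+08:00',
  'extractor.realtime.enable' = 'false'
)
WITH CONNECTOR (
  'connector' = 'iotdb-thrift-connector',
  'connector.ip' = '127.0.0.1',
  'connector.port' = '6667'
)
```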
-> ❗️**extractor.history 的 start-time,end-time 参数说明** +> ❗️**start-time, end-time parameter description of extractor.history** > -> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 +> * start-time, end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00 -> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> ✅ **a piece of data from production to IoTDB contains two key concepts of time** > -> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 -> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> * **event time:** the time when the data is actually produced (or the generation time assigned to the data by the data production system, which is a time item in the data point), also called the event time. +> * **arrival time:** the time the data arrived in the IoTDB system. > -> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 +> The out-of-order data we often refer to refers to data whose **event time** is far behind the current system time (or the maximum **event time** that has been dropped) when the data arrives. On the other hand, whether it is out-of-order data or sequential data, as long as they arrive newly in the system, their **arrival time** will increase with the order in which the data arrives at IoTDB. -> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> 💎 **the work of iotdb-extractor can be split into two stages** > -> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 -> 2. 实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> 1. Historical data extraction: All data with **arrival time** < **current system time** when creating the pipe is called historical data +> 2. Realtime data extraction: All data with **arrival time** >= **current system time** when the pipe is created is called realtime data > -> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> The historical data transmission phase and the realtime data transmission phase are executed serially. Only when the historical data transmission phase is completed, the realtime data transmission phase is executed.** > -> 用户可以指定 iotdb-extractor 进行: +> Users can specify iotdb-extractor to: > -> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) -> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) -> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) -> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` +> * Historical data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * Realtime data extraction(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * Full data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * Disable simultaneous sets `extractor.history.enable` and `extractor.realtime.enable` to `false` -### 预置 processor 插件 +### Pre-built Processor Plugin #### do-nothing-processor -作用:不对 extractor 传入的事件做任何的处理。 +Function: Do not do anything with the events passed in by the extractor. 
-| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | --------- | -------------------- | ---------------------------- | --------------------------------- | | processor | do-nothing-processor | String: do-nothing-processor | required | - -### 预置 connector 插件 +### Pre-built Connector Plugin #### do-nothing-connector -作用:不对 processor 传入的事件做任何的处理。 +Function: Does not do anything with the events passed in by the processor. -| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | --------- | -------------------- | ---------------------------- | --------------------------------- | | connector | do-nothing-connector | String: do-nothing-connector | required | -## 流处理任务管理 +## Stream Processing Task Management -### 创建流处理任务 +### Create Stream Processing Task -使用 `CREATE PIPE` 语句来创建流处理任务。以数据同步流处理任务的创建为例,示例 SQL 语句如下: +A stream processing task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: ```sql -CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +CREATE PIPE -- PipeId is the name that uniquely identifies the sync task WITH EXTRACTOR ( - -- 默认的 IoTDB 数据抽取插件 + -- Default IoTDB Data Extraction Plugin 'extractor' = 'iotdb-extractor', - -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + -- Path prefix, only data that can match the path prefix will be extracted for subsequent processing and delivery 'extractor.pattern' = 'root.timecho', - -- 是否抽取历史数据 + -- Whether to extract historical data 'extractor.history.enable' = 'true', - -- 描述被抽取的历史数据的时间范围,表示最早时间 + -- Describes the time range of the historical data being extracted, indicating the earliest possible time 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', - -- 描述被抽取的历史数据的时间范围,表示最晚时间 + -- Describes the time range of the extracted historical data, indicating the latest time 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', - -- 是否抽取实时数据 + -- Whether to extract realtime data 'extractor.realtime.enable' = 'true', ) WITH PROCESSOR ( - -- 默认的数据处理插件,即不做任何处理 + -- Default data processing plugin, means no processing 'processor' = 'do-nothing-processor', ) WITH CONNECTOR ( - -- IoTDB 数据发送插件,目标端为 IoTDB + -- IoTDB data sending plugin with target IoTDB 'connector' = 'iotdb-thrift-connector', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + -- Data service for one of the DataNode nodes on the target IoTDB ip 'connector.ip' = '127.0.0.1', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + -- Data service port of one of the DataNode nodes of the target IoTDB 'connector.port' = '6667', ) ``` -**创建流处理任务时需要配置 PipeId 以及三个插件部分的参数:** +**To create a stream processing task it is necessary to configure the PipeId and the parameters of the three plugin sections:** -| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | -| --------- | --------------------------------------------------- | --------------------------- | -------------------- | -------------------------------------------------------- | ------------------------- | -| PipeId | 全局唯一标定一个流处理任务的名称 | 必填 | - | - | - | -| extractor | Pipe Extractor 插件,负责在数据库底层抽取流处理数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入流处理任务 | 否 | -| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | -| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | +| configuration item | description | Required or not | default implementation | Default implementation description | Whether to allow custom implementations | +| --------- | 
------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- | +| pipeId | Globally uniquely identifies the name of a sync task | required | - | - | - | +| extractor | pipe Extractor plug-in, for extracting synchronized data at the bottom of the database | Optional | iotdb-extractor | Integrate all historical data of the database and subsequent realtime data into the sync task | no | +| processor | Pipe Processor plug-in, for processing data | Optional | do-nothing-processor | no processing of incoming data | yes | +| connector | Pipe Connector plug-in,for sending data | required | - | - | yes | -示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据流处理任务。IoTDB 还内置了其他的流处理插件,**请查看“系统预置流处理插件”一节**。 +In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plug-ins, **see the section "System pre-built data synchronisation plug-ins" **. See the "System Pre-installed Stream Processing Plugin" section**. -**一个最简的 CREATE PIPE 语句示例如下:** +**An example of a minimalist CREATE PIPE statement is as follows:** ```sql -CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +CREATE PIPE -- PipeId is a name that uniquely identifies the task. WITH CONNECTOR ( - -- IoTDB 数据发送插件,目标端为 IoTDB + -- IoTDB data sending plugin with target IoTDB 'connector' = 'iotdb-thrift-connector', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + -- Data service for one of the DataNode nodes on the target IoTDB ip 'connector.ip' = '127.0.0.1', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + -- Data service port of one of the DataNode nodes of the target IoTDB 'connector.port' = '6667', ) ``` -其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 +The expressed semantics are: synchronise the full amount of historical data and subsequent arrivals of realtime data from this database instance to the IoTDB instance with target 127.0.0.1:6667. -**注意:** +**Note:** -- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 -- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 -- CONNECTOR 具备自复用能力。对于不同的流处理任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 +- EXTRACTOR and PROCESSOR are optional, if no configuration parameters are filled in, the system will use the corresponding default implementation. +- The CONNECTOR is a mandatory configuration that needs to be declared in the CREATE PIPE statement for configuring purposes. +- The CONNECTOR exhibits self-reusability. For different tasks, if their CONNECTOR possesses identical KV properties (where the value corresponds to every key), **the system will ultimately create only one instance of the CONNECTOR** to achieve resource reuse for connections. - - 例如,有下面 pipe1, pipe2 两个流处理任务的声明: + - For example, there are the following pipe1, pipe2 task declarations: ```sql CREATE PIPE pipe1 @@ -642,49 +639,47 @@ WITH CONNECTOR ( ) ``` - - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 -- 请不要构建出包含数据循环同步的应用场景(会导致无限循环): + - Since they have identical CONNECTOR declarations (**even if the order of some properties is different**), the framework will automatically reuse the CONNECTOR declared by them. Hence, the CONNECTOR instances for pipe1 and pipe2 will be the same. 
+- Please note that we should avoid constructing application scenarios that involve data cycle sync (as it can result in an infinite loop): - IoTDB A -> IoTDB B -> IoTDB A - IoTDB A -> IoTDB A -### 启动流处理任务 - -CREATE PIPE 语句成功执行后,流处理任务相关实例会被创建,但整个流处理任务的运行状态会被置为 STOPPED,即流处理任务不会立刻处理数据。 +### Start Stream Processing Task -可以使用 START PIPE 语句使流处理任务开始处理数据: +After the successful execution of the CREATE PIPE statement, an instance of the stream processing task is created, but the overall task's running status will be set to STOPPED, meaning the task will not immediately process data. +You can use the START PIPE statement to make the stream processing task start processing data: ```sql START PIPE ``` -### 停止流处理任务 +### Stop Stream Processing Task -使用 STOP PIPE 语句使流处理任务停止处理数据: +Use the STOP PIPE statement to stop the stream processing task from processing data: ```sql STOP PIPE ``` -### 删除流处理任务 +### Delete Stream Processing Task -使用 DROP PIPE 语句使流处理任务停止处理数据(当流处理任务状态为 RUNNING 时),然后删除整个流处理任务流处理任务: +If a stream processing task is in the RUNNING state, you can use the DROP PIPE statement to stop it and delete the entire task: ```sql DROP PIPE ``` -用户在删除流处理任务前,不需要执行 STOP 操作。 +Before deleting a stream processing task, there is no need to execute the STOP operation. -### 展示流处理任务 - -使用 SHOW PIPES 语句查看所有流处理任务: +### Show Stream Processing Task +Use the SHOW PIPES statement to view all stream processing tasks: ```sql SHOW PIPES ``` -查询结果如下: +The query results are as follows: ```sql +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ @@ -696,13 +691,12 @@ SHOW PIPES +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ ``` -可以使用 `` 指定想看的某个流处理任务状态: - +You can use `` to specify the status of a stream processing task you want to see: ```sql SHOW PIPE ``` -您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 +Additionally, the WHERE clause can be used to determine if the Pipe Connector used by a specific \ is being reused. ```sql SHOW PIPES From 78bd683ce96412f842b0f888b075c1622e0c91e1 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Mon, 9 Oct 2023 19:22:39 +0800 Subject: [PATCH 14/27] 4 --- src/UserGuide/V1.2.x/User-Manual/Data-Sync.md | 20 +- .../V1.2.x/User-Manual/Data-Sync_timecho.md | 416 +++++----- .../V1.2.x/User-Manual/Streaming_timecho.md | 771 +++++++++++++++++- 3 files changed, 1011 insertions(+), 196 deletions(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md index 8959e7fa..42263aec 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md @@ -34,13 +34,13 @@ ![Task model diagram](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) -It describes a data synchronization task, which essentially describes the attributes of the Pipe Extractor, Pipe Processor, and Pipe Connector plugins. Users can declaratively configure the specific attributes of the three subtasks through SQL statements. By combining different attributes, flexible data ETL (Extract, Transform, Load) capabilities can be achieved. +It describes a data sync task, which essentially describes the attributes of the Pipe Extractor, Pipe Processor, and Pipe Connector plugins. Users can declaratively configure the specific attributes of the three subtasks through SQL statements. By combining different attributes, flexible data ETL (Extract, Transform, Load) capabilities can be achieved. 
-By utilizing the data synchronization functionality, a complete data pipeline can be built to fulfill various requirements such as edge-to-cloud synchronization, remote disaster recovery, and read-write workload distribution across multiple databases. +By utilizing the data sync functionality, a complete data pipeline can be built to fulfill various requirements such as edge-to-cloud sync, remote disaster recovery, and read-write workload distribution across multiple databases. ## Quick Start -**🎯 Goal: Achieve full data synchronisation of IoTDB A -> IoTDB B** +**🎯 Goal: Achieve full data sync of IoTDB A -> IoTDB B** - Start two IoTDBs,A(datanode -> 127.0.0.1:6667) B(datanode -> 127.0.0.1:6668) - create a Pipe from A -> B, and execute on A @@ -68,20 +68,20 @@ By utilizing the data synchronization functionality, a complete data pipeline ca SELECT ** FROM root ``` -> ❗️**Note: The current IoTDB -> IoTDB implementation of data synchronisation does not support DDL synchronisation** +> ❗️**Note: The current IoTDB -> IoTDB implementation of data sync does not support DDL sync** > > That is: ttl, trigger, alias, template, view, create/delete sequence, create/delete storage group, etc. are not supported. > -> **IoTDB -> IoTDB data synchronisation requires the target IoTDB:** +> **IoTDB -> IoTDB data sync requires the target IoTDB:** > > * Enable automatic metadata creation: manual configuration of encoding and compression of data types to be consistent with the sender is required > * Do not enable automatic metadata creation: manually create metadata that is consistent with the source -## Synchronization task management +## Sync Task Management -### Create a synchronization task +### Create a sync task -A data synchronisation task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: +A data sync task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: ```sql CREATE PIPE -- PipeId is the name that uniquely identifies the sync task @@ -113,7 +113,7 @@ WITH CONNECTOR ( ) ``` -**To create a synchronisation task it is necessary to configure the PipeId and the parameters of the three plugin sections:** +**To create a sync task it is necessary to configure the PipeId and the parameters of the three plugin sections:** | configuration item | description | Required or not | default implementation | Default implementation description | Whether to allow custom implementations | @@ -123,7 +123,7 @@ WITH CONNECTOR ( | processor | Pipe Processor plug-in, for processing data | Optional | do-nothing-processor | no processing of incoming data | yes | | connector | Pipe Connector plug-in,for sending data | required | - | - | yes | -In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plug-ins, **see the section "System Pre-built Data Sync Plugin"**. +In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data sync task. iotdb has other built-in data sync plug-ins, **see the section "System Pre-built Data Sync Plugin"**. 
**An example of a minimalist CREATE PIPE statement is as follows:** ```sql diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md index e96087ca..422cab13 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md @@ -7,9 +7,9 @@ to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -22,28 +22,28 @@ # IoTDB Data Sync **The IoTDB data sync transfers data from IoTDB to another data platform, and a data sync task is called a Pipe.** -**一个 Pipe 包含三个子任务(插件):** +**A Pipe consists of three subtasks (plugins):** -- 抽取(Extract) -- 处理(Process) -- 发送(Connect) +- Extract +- Process +- Connect -**Pipe 允许用户自定义三个子任务的处理逻辑,通过类似 UDF 的方式处理数据。** 在一个 Pipe 中,上述的子任务分别由三种插件执行实现,数据会依次经过这三个插件进行处理:Pipe Extractor 用于抽取数据,Pipe Processor 用于处理数据,Pipe Connector 用于发送数据,最终数据将被发至外部系统。 +**Pipe allows users to customize the processing logic of these three subtasks, just like handling data using UDF (User-Defined Functions)**. Within a Pipe, the aforementioned subtasks are executed and implemented by three types of plugins. Data flows through these three plugins sequentially: Pipe Extractor is used to extract data, Pipe Processor is used to process data, and Pipe Connector is used to send data to an external system. -**Pipe 任务的模型如下:** +**The model of a Pipe task is as follows:** -![任务模型图](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) +![Task model diagram](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) -描述一个数据同步任务,本质就是描述 Pipe Extractor、Pipe Processor 和 Pipe Connector 插件的属性。用户可以通过 SQL 语句声明式地配置三个子任务的具体属性,通过组合不同的属性,实现灵活的数据 ETL 能力。 +It describes a data sync task, which essentially describes the attributes of the Pipe Extractor, Pipe Processor, and Pipe Connector plugins. Users can declaratively configure the specific attributes of the three subtasks through SQL statements. By combining different attributes, flexible data ETL (Extract, Transform, Load) capabilities can be achieved. -利用数据同步功能,可以搭建完整的数据链路来满足端*边云同步、异地灾备、读写负载分库*等需求。 +By utilizing the data sync functionality, a complete data pipeline can be built to fulfill various requirements such as edge-to-cloud sync, remote disaster recovery, and read-write workload distribution across multiple databases. 
-## 快速开始 +## Quick Start -**🎯 目标:实现 IoTDB A -> IoTDB B 的全量数据同步** +**🎯 Goal: Achieve full data sync of IoTDB A -> IoTDB B** -- 启动两个 IoTDB,A(datanode -> 127.0.0.1:6667) B(datanode -> 127.0.0.1:6668) -- 创建 A -> B 的 Pipe,在 A 上执行 +- Start two IoTDBs,A(datanode -> 127.0.0.1:6667) B(datanode -> 127.0.0.1:6668) +- create a Pipe from A -> B, and execute on A ```sql create pipe a2b @@ -53,102 +53,100 @@ 'connector.port'='6668' ) ``` -- 启动 A -> B 的 Pipe,在 A 上执行 +- start a Pipe from A -> B, and execute on A ```sql start pipe a2b ``` -- 向 A 写入数据 +- Write data to A ```sql INSERT INTO root.db.d(time, m) values (1, 1) ``` -- 在 B 检查由 A 同步过来的数据 - +- Checking data synchronised from A at B ```sql SELECT ** FROM root ``` -> ❗️**注:目前的 IoTDB -> IoTDB 的数据同步实现并不支持 DDL 同步** +> ❗️**Note: The current IoTDB -> IoTDB implementation of data sync does not support DDL sync** > -> 即:不支持 ttl,trigger,别名,模板,视图,创建/删除序列,创建/删除存储组等操作 +> That is: ttl, trigger, alias, template, view, create/delete sequence, create/delete storage group, etc. are not supported. > -> **IoTDB -> IoTDB 的数据同步要求目标端 IoTDB:** +> **IoTDB -> IoTDB data sync requires the target IoTDB:** > -> * 开启自动创建元数据:需要人工配置数据类型的编码和压缩与发送端保持一致 -> * 不开启自动创建元数据:手工创建与源端一致的元数据 +> * Enable automatic metadata creation: manual configuration of encoding and compression of data types to be consistent with the sender is required +> * Do not enable automatic metadata creation: manually create metadata that is consistent with the source -## 同步任务管理 +## Sync Task Management -### 创建同步任务 +### Create a sync task -可以使用 `CREATE PIPE` 语句来创建一条数据同步任务,示例 SQL 语句如下所示: +A data sync task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: ```sql -CREATE PIPE -- PipeId 是能够唯一标定同步任务任务的名字 +CREATE PIPE -- PipeId is the name that uniquely identifies the sync task WITH EXTRACTOR ( - -- 默认的 IoTDB 数据抽取插件 + -- Default IoTDB Data Extraction Plugin 'extractor' = 'iotdb-extractor', - -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + -- Path prefix, only data that can match the path prefix will be extracted for subsequent processing and delivery 'extractor.pattern' = 'root.timecho', - -- 是否抽取历史数据 + -- Whether to extract historical data 'extractor.history.enable' = 'true', - -- 描述被抽取的历史数据的时间范围,表示最早时间 + -- Describes the time range of the historical data being extracted, indicating the earliest possible time 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', - -- 描述被抽取的历史数据的时间范围,表示最晚时间 + -- Describes the time range of the extracted historical data, indicating the latest time 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', - -- 是否抽取实时数据 + -- Whether to extract realtime data 'extractor.realtime.enable' = 'true', ) WITH PROCESSOR ( - -- 默认的数据处理插件,即不做任何处理 + -- Default data processing plugin, means no processing 'processor' = 'do-nothing-processor', ) WITH CONNECTOR ( - -- IoTDB 数据发送插件,目标端为 IoTDB + -- IoTDB data sending plugin with target IoTDB 'connector' = 'iotdb-thrift-connector', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + -- Data service for one of the DataNode nodes on the target IoTDB ip 'connector.ip' = '127.0.0.1', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + -- Data service port of one of the DataNode nodes of the target IoTDB 'connector.port' = '6667', ) ``` -**创建同步任务时需要配置 PipeId 以及三个插件部分的参数:** +**To create a sync task it is necessary to configure the PipeId and the parameters of the three plugin sections:** -| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +| configuration item | description | Required or not | default implementation | Default implementation 
description | Whether to allow custom implementations |
| --------- | ------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- |
-| PipeId | 全局唯一标定一个同步任务的名称 | 必填 | - | - | - |
-| extractor | Pipe Extractor 插件,负责在数据库底层抽取同步数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入同步任务 | 否 |
-| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | |
-| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | |
-
-示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据同步任务。IoTDB 还内置了其他的数据同步插件,**请查看“系统预置数据同步插件”一节**。
+| PipeId | The name that globally and uniquely identifies a sync task | Required | - | - | - |
+| extractor | Pipe Extractor plugin, responsible for extracting the data to be synced at the bottom of the database | Optional | iotdb-extractor | Feeds all historical data of the database and subsequently arriving realtime data into the sync task | No |
+| processor | Pipe Processor plugin, responsible for processing data | Optional | do-nothing-processor | Does not process incoming data | Yes |
+| connector | Pipe Connector plugin, responsible for sending data | Required | - | - | Yes |

-**一个最简的 CREATE PIPE 语句示例如下:**
+In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plugins are used to build the data sync task. IoTDB also provides other built-in data sync plugins; **see the section "System Pre-built Data Sync Plugin"**.
+**An example of a minimal CREATE PIPE statement is as follows:**
```sql
-CREATE PIPE -- PipeId 是能够唯一标定任务任务的名字
+CREATE PIPE -- PipeId is a name that uniquely identifies the sync task
WITH CONNECTOR (
-  -- IoTDB 数据发送插件,目标端为 IoTDB
+  -- IoTDB data sending plugin, with IoTDB as the target
  'connector' = 'iotdb-thrift-connector',
-  -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip
+  -- The data service IP of one of the DataNode nodes in the target IoTDB
  'connector.ip' = '127.0.0.1',
-  -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port
+  -- The data service port of one of the DataNode nodes in the target IoTDB
  'connector.port' = '6667',
)
```
-其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。
+This statement means: synchronize all historical data of this database instance, together with subsequently arriving realtime data, to the IoTDB instance at 127.0.0.1:6667.

-**注意:**
+**Note:**

-- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现
-- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置
-- CONNECTOR 具备自复用能力。对于不同的任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。
+- EXTRACTOR and PROCESSOR are optional; if no configuration parameters are provided, the system uses the corresponding default implementations.
+- CONNECTOR is a mandatory configuration and must be declared in the CREATE PIPE statement.
+- CONNECTOR instances can be reused. For different tasks, if their CONNECTOR declarations have exactly the same KV properties (the values of all keys are identical), **the system will ultimately create only one CONNECTOR instance** to reuse the connection resources.
- - 例如,有下面 pipe1, pipe2 两个任务的声明: + - For example, there are the following pipe1, pipe2 task declarations: ```sql CREATE PIPE pipe1 @@ -166,49 +164,50 @@ WITH CONNECTOR ( ) ``` - - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 -- 请不要构建出包含数据循环同步的应用场景(会导致无限循环): + - Since they have identical CONNECTOR declarations (**even if the order of some properties is different**), the framework will automatically reuse the CONNECTOR declared by them. Hence, the CONNECTOR instances for pipe1 and pipe2 will be the same. + + - When extractor is the default iotdb-extractor, and extractor.forwarding-pipe-requests is the default value true, please do not build an application scenario that involve data cycle sync (as it can result in an infinite loop): - IoTDB A -> IoTDB B -> IoTDB A - IoTDB A -> IoTDB A -### 启动任务 +### START TASK -CREATE PIPE 语句成功执行后,任务相关实例会被创建,但整个任务的运行状态会被置为 STOPPED,即任务不会立刻处理数据。 +After the successful execution of the CREATE PIPE statement, task-related instances will be created. However, the overall task's running status will be set to STOPPED, meaning the task will not immediately process data. -可以使用 START PIPE 语句使任务开始处理数据: +You can use the START PIPE statement to begin processing data for a task: ```sql START PIPE ``` -### 停止任务 +### STOP TASK -使用 STOP PIPE 语句使任务停止处理数据: +the STOP PIPE statement can be used to halt the data processing: ```sql STOP PIPE ``` -### 删除任务 +### DELETE TASK -使用 DROP PIPE 语句使任务停止处理数据(当任务状态为 RUNNING 时),然后删除整个任务同步任务: +If a task is in the RUNNING state, you can use the DROP PIPE statement to stop the data processing and delete the entire task: ```sql DROP PIPE ``` -用户在删除任务前,不需要执行 STOP 操作。 +Before deleting a task, there is no need to execute the STOP operation. -### 展示任务 +### SHOW TASK -使用 SHOW PIPES 语句查看所有任务: +You can use the SHOW PIPES statement to view all tasks: ```sql SHOW PIPES ``` -查询结果如下: +The query results are as follows: ```sql +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ @@ -220,194 +219,235 @@ SHOW PIPES +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ ``` -可以使用 `` 指定想看的某个同步任务状态: +You can use to specify the status of a particular synchronization task: ```sql SHOW PIPE ``` -您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 +Additionally, the WHERE clause can be used to determine if the Pipe Connector used by a specific \ is being reused. ```sql SHOW PIPES WHERE CONNECTOR USED BY ``` -### 任务运行状态迁移 +### Task Running Status Migration -一个数据同步 pipe 在其被管理的生命周期中会经过多种状态: +The task running status can transition through several states during the lifecycle of a data synchronization pipe: -- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: - - 当一个 pipe 被成功创建之后,其初始状态为暂停状态 - - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED - - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED -- **RUNNING:** pipe 正在正常工作 -- **DROPPED:** pipe 任务被永久删除 +- **STOPPED:** The pipe is in a stopped state. It can have the following possibilities: + - After the successful creation of a pipe, its initial state is set to stopped + - The user manually pauses a pipe that is in normal running state, transitioning its status from RUNNING to STOPPED + - If a pipe encounters an unrecoverable error during execution, its status automatically changes from RUNNING to STOPPED. 
+- **RUNNING:** The pipe is actively processing data +- **DROPPED:** The pipe is permanently deleted -下图表明了所有状态以及状态的迁移: +The following diagram illustrates the different states and their transitions: -![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) +![state migration diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) +## System Pre-built Data Sync Plugin -## 系统预置数据同步插件 - -### 查看预置插件 - -用户可以按需查看系统中的插件。查看插件的语句如图所示。 +### View pre-built plugin +User can view the plug-ins in the system on demand. The statement for viewing plug-ins is shown below. ```sql SHOW PIPEPLUGINS ``` -### 预置 extractor 插件 +### Pre-built Extractor Plugin #### iotdb-extractor -作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 - +Function: Extract historical or realtime data inside IoTDB into pipe. -| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | | extractor | iotdb-extractor | String: iotdb-extractor | required | -| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | -| extractor.history.enable | 是否同步历史数据 | Boolean: true, false | optional: true | -| extractor.history.start-time | 同步历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| extractor.history.end-time | 同步历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| extractor.realtime.enable | 是否同步实时数据 | Boolean: true, false | optional: true | - -> 🚫 **extractor.pattern 参数说明** +| extractor.pattern | path prefix for filtering time series | String: any time series prefix | optional: root | +| extractor.history.enable | whether to synchronize historical data | Boolean: true, false | optional: true | +| extractor.history.start-time | start of synchronizing historical data event time,Include start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | end of synchronizing historical data event time,Include end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | Whether to sync realtime data | Boolean: true, false | optional: true | +| extractor.realtime.mode | Extraction pattern for realtime data | String: hybrid, log, file | optional: hybrid | +| extractor.forwarding-pipe-requests | Whether or not to forward data written by another Pipe (usually Data Sync) | Boolean: true, false | optional: true | + +> 🚫 **extractor.pattern Parameter Description** > -> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) -> * 在底层实现中,当检测到 pattern 为 root(默认值)时,同步效率较高,其他任意格式都将降低性能 -> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: +> * Pattern should use backquotes to modify illegal characters or illegal path nodes, for example, if you want to filter root.\`a@b\` or root.\`123\`, you should set the pattern to root.\`a@b\` or root.\`123\`(Refer specifically to [Timing of single and double quotes and backquotes](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * In the underlying implementation, when pattern is detected as root (default value), synchronization efficiency is 
higher, and any other format will reduce performance. +> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'extractor.pattern'='root.aligned.1': > > * root.aligned.1TS > * root.aligned.1TS.\`1\` > * root.aligned.100TS > -> 的数据会被同步; +> the data will be synchronized; > > * root.aligned.\`1\` > * root.aligned.\`123\` > -> 的数据不会被同步。 +> the data will not be synchronized. -> ❗️**extractor.history 的 start-time,end-time 参数说明** +> ❗️**start-time, end-time parameter description of extractor.history** > -> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 +> * start-time, end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00 -> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> ✅ **a piece of data from production to IoTDB contains two key concepts of time** > -> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 -> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> * **event time:** the time when the data is actually produced (or the generation time assigned to the data by the data production system, which is a time item in the data point), also called the event time. +> * **arrival time:** the time the data arrived in the IoTDB system. > -> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 +> The out-of-order data we often refer to refers to data whose **event time** is far behind the current system time (or the maximum **event time** that has been dropped) when the data arrives. On the other hand, whether it is out-of-order data or sequential data, as long as they arrive newly in the system, their **arrival time** will increase with the order in which the data arrives at IoTDB. -> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> 💎 **the work of iotdb-extractor can be split into two stages** +> +> 1. Historical data extraction: All data with **arrival time** < **current system time** when creating the pipe is called historical data +> 2. Realtime data extraction: All data with **arrival time** >= **current system time** when the pipe is created is called realtime data +> +> The historical data transmission phase and the realtime data transmission phase are executed serially. Only when the historical data transmission phase is completed, the realtime data transmission phase is executed.** > -> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 -> 2. 
实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> Users can specify iotdb-extractor to: > -> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> * Historical data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * Realtime data extraction(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * Full data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * Disable simultaneous sets `extractor.history.enable` and `extractor.realtime.enable` to `false` > -> 用户可以指定 iotdb-extractor 进行: +> 📌 **extractor.realtime.mode: mode in which data is extracted** > -> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) -> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) -> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) -> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` +> * log: in this mode, the task uses only operation logs for data processing and sending. +> * file: in this mode, the task uses only data files for data processing and sending. +> * hybrid: This mode takes into account the characteristics of low latency but low throughput when sending data item by item according to the operation log and high throughput but high latency when sending data in batches according to the data file, and is able to automatically switch to a suitable data extraction method under different write loads. When data backlog is generated, it automatically switches to data file-based data extraction to ensure high sending throughput, and when the backlog is eliminated, it automatically switches back to operation log-based data extraction, which avoids the problem that it is difficult to balance the data sending latency or throughput by using a single data extraction algorithm. +> 🍕 **extractor.forwarding-pipe-requests: whether to allow forwarding of data transferred from another pipe**. +> +> * If pipe is to be used to build A -> B -> C data sync, then the pipe of B -> C needs to have this parameter set to true for the data written from A -> B to B via the pipe to be forwarded to C correctly. +> * If you want to use pipe to build a bi-directional data sync between A \<-> B, then the pipe for A -> B and B -> A need to be set to false, otherwise it will result in an endless loop of data being forwarded between clusters. -### 预置 processor 插件 +### Pre-built Processor Plugin #### do-nothing-processor -作用:不对 extractor 传入的事件做任何的处理。 +Function: Do not do anything with the events passed in by the extractor. -| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | --------- | -------------------- | ---------------------------- | --------------------------------- | | processor | do-nothing-processor | String: do-nothing-processor | required | -### 预置 connector 插件 - -#### iotdb-thrift-sync-connector(别名:iotdb-thrift-connector) +### pre-connector plugin -作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。 -使用 Thrift RPC 框架传输数据,单线程 blocking IO 模型。 -保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致。 +#### iotdb-thrift-sync-connector(alias:iotdb-thrift-connector) -限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 +Function: Primarily used for data transfer between IoTDB instances (v1.2.0+). Data is transmitted using the Thrift RPC framework and a single-threaded blocking IO model. 
It guarantees that the receiving end applies the data in the same order as the sending end receives the write requests. +Limitation: Both the source and target IoTDB versions need to be v1.2.0+. -| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | -| connector | iotdb-thrift-connector 或 iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | -| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | +| connector | iotdb-thrift-connector or iotdb-thrift-sync-connector | String: iotdb-thrift-connector or iotdb-thrift-sync-connector | required | +| connector.ip | the data service IP of one of the DataNode nodes in the target IoTDB | String | optional: and connector.node-urls fill in either one | +| connector.port | the data service port of one of the DataNode nodes in the target IoTDB | Integer | optional: and connector.node-urls fill in either one | +| connector.node-urls | the URL of the data service port of any multiple DataNode nodes in the target IoTDB | String。eg:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: and connector.ip:connector.port fill in either one | +| connector.batch.enable | Whether to enable log accumulation and batch sending mode to improve transmission throughput and reduce IOPS | Boolean: true, false | optional: true | +| connector.batch.max-delay-seconds | Effective when the log save and send mode is turned on, indicates the longest time a batch of data waits before being sent (unit: s) | Integer | optional: 1 | +| connector.batch.size-bytes | Effective when log saving and delivery mode is enabled, indicates the maximum saving size of a batch of data (unit: byte) | Long | optional: 16 * 1024 * 1024 (16MiB) | -> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 +> 📌 Make sure that the receiver has created all the time series on the sender side, or that automatic metadata creation is turned on, otherwise the pipe run will fail. #### iotdb-thrift-async-connector -作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。 -使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景。 -不保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致,但是保证数据发送的完整性(at-least-once)。 +Function: Primarily used for data transfer between IoTDB instances (v1.2.0+). +Data is transmitted using the Thrift RPC framework, employing a multi-threaded async non-blocking IO model, resulting in high transfer performance. It is particularly suitable for distributed scenarios on the target end. +It does not guarantee that the receiving end applies the data in the same order as the sending end receives the write requests, but it guarantees data integrity (at-least-once). -限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 +Limitation: Both the source and target IoTDB versions need to be v1.2.0+. 
-| key | value | value 取值范围 | required or optional with default |
+| key | value | value range | required or optional with default |
| --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- |
| connector | iotdb-thrift-async-connector | String: iotdb-thrift-async-connector | required |
-| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 |
-| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 |
-| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 |
+| connector.ip | The data service IP of one of the DataNode nodes in the target IoTDB | String | optional: fill in either this or connector.node-urls |
+| connector.port | The data service port of one of the DataNode nodes in the target IoTDB | Integer | optional: fill in either this or connector.node-urls |
+| connector.node-urls | The URLs of the data service ports of one or more DataNode nodes in the target IoTDB | String. e.g. '127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: fill in either this or connector.ip:connector.port |
+| connector.batch.enable | Whether to enable log accumulation and batch sending mode to improve transmission throughput and reduce IOPS | Boolean: true, false | optional: true |
+| connector.batch.max-delay-seconds | Effective when batch sending mode is enabled; indicates the maximum time a batch of data waits before being sent (unit: s) | Integer | optional: 1 |
+| connector.batch.size-bytes | Effective when batch sending mode is enabled; indicates the maximum accumulated size of a batch of data (unit: byte) | Long | optional: 16 * 1024 * 1024 (16MiB) |

-> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。
+> 📌 Please ensure that the receiving end has already created all the time series present on the sending end, or has enabled automatic metadata creation; otherwise the pipe will fail to run.

#### iotdb-legacy-pipe-connector

-作用:主要用于 IoTDB(v1.2.0+)向更低版本的 IoTDB 传输数据,使用 v1.2.0 版本前的数据同步(Sync)协议。
-使用 Thrift RPC 框架传输数据。单线程 sync blocking IO 模型,传输性能较弱。
+Function: Mainly used to transfer data from IoTDB (v1.2.0+) to lower versions of IoTDB, using the data synchronization (Sync) protocol introduced before v1.2.0.
+Data is transmitted using the Thrift RPC framework. It employs a single-threaded sync blocking IO model, so its transfer performance is relatively weak.

-限制:源端 IoTDB 版本需要在 v1.2.0+,目标端 IoTDB 版本可以是 v1.2.0+、v1.1.x(更低版本的 IoTDB 理论上也支持,但是未经测试)。
+Limitation: The source IoTDB version needs to be v1.2.0+. The target IoTDB version can be v1.2.0+ or v1.1.x (lower versions of IoTDB are theoretically supported but untested).

-注意:理论上 v1.2.0+ IoTDB 可作为 v1.2.0 版本前的任意版本的数据同步(Sync)接收端。
+Note: In theory, a v1.2.0+ IoTDB can serve as the data synchronization (Sync) receiver for any version prior to v1.2.0.
-| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | ------------------ | --------------------------------------------------------------------- | ----------------------------------- | --------------------------------- | -| connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | required | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | -| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | -| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | -| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | optional: 1.1 | +| connector | iotdb-legacy-pipe-connector | string: iotdb-legacy-pipe-connector | required | +| connector.ip | data service of one DataNode node of the target IoTDB ip | string | required | +| connector.port | the data service port of one of the DataNode nodes in the target IoTDB | integer | required | +| connector.user | the user name of the target IoTDB. Note that the user needs to support data writing and TsFile Load permissions. | string | optional: root | +| connector.password | the password of the target IoTDB. Note that the user needs to support data writing and TsFile Load permissions. | string | optional: root | +| connector.version | the version of the target IoTDB, used to disguise its actual version and bypass the version consistency check of the target. | string | optional: 1.1 | -> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 +> 📌 Make sure that the receiver has created all the time series on the sender side, or that automatic metadata creation is turned on, otherwise the pipe run will fail. +#### iotdb-air-gap-connector + +Function: Used for data sync from IoTDB (v1.2.2+) to IoTDB (v1.2.2+) across one-way data gatekeepers. Supported gatekeeper models include NARI Syskeeper 2000, etc. +This Connector uses Java's own Socket to implement data transmission, a single-thread blocking IO model, and its performance is comparable to iotdb-thrift-sync-connector. +Ensure that the order in which the receiving end applies data is consistent with the order in which the sending end accepts write requests. + +Scenario: For example, in the specification of power systems + +> 1. Applications between Zone I/II and Zone III are prohibited from using SQL commands to access the database and bidirectional data transmission based on B/S mode. +> +> 2. For data communication between Zone I/II and Zone III, the transmission end is initiated by the intranet. The reverse response message is not allowed to carry data. The response message of the application layer is at most 1 byte and 1 word. The section has two states: all 0s or all 1s. + +limit: + +1. Both the source IoTDB and target IoTDB versions need to be v1.2.2+. +2. The one-way data gatekeeper needs to allow TCP requests to cross, and each request can return a byte of all 1s or all 0s. +3. The target IoTDB needs to be configured in iotdb-common.properties + a. pipe_air_gap_receiver_enabled=true + b. 
pipe_air_gap_receiver_port configures the receiving port of the receiver + + +| key | value | value range | required or optional with default | +| -------------------------------------- | ---------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | +| connector | iotdb-air-gap-connector | String: iotdb-air-gap-connector | required | +| connector.ip | the data service IP of one of the DataNode nodes in the target IoTDB | String | optional: and connector.node-urls fill in either one | +| connector.port | the data service port of one of the DataNode nodes in the target IoTDB | Integer | optional: and connector.node-urls fill in either one | +| connector.node-urls | the URL of the data service port of any multiple DataNode nodes in the target IoTDB | String. eg:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port fill in either one | +| connector.air-gap.handshake-timeout-ms | The timeout period for the handshake request when the source and target try to establish a connection for the first time, unit: milliseconds | Integer | optional: 5000 | + +> 📌 Make sure that the receiver has created all the time series on the sender side or that automatic metadata creation is turned on, otherwise the pipe run will fail. #### do-nothing-connector -作用:不对 processor 传入的事件做任何的处理。 +Function: Does not do anything with the events passed in by the processor. -| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | --------- | -------------------- | ---------------------------- | --------------------------------- | | connector | do-nothing-connector | String: do-nothing-connector | required | -## 权限管理 +## Authority Management -| 权限名称 | 描述 | +| Authority Name | Description | | ----------- | -------------------- | -| CREATE_PIPE | 注册任务。路径无关。 | -| START_PIPE | 开启任务。路径无关。 | -| STOP_PIPE | 停止任务。路径无关。 | -| DROP_PIPE | 卸载任务。路径无关。 | -| SHOW_PIPES | 查询任务。路径无关。 | +| CREATE_PIPE | Register task,path-independent | +| START_PIPE | Start task,path-independent | +| STOP_PIPE | Stop task,path-independent | +| DROP_PIPE | Uninstall task,path-independent | +| SHOW_PIPES | Query task,path-independent | -## 配置参数 +## Configure Parameters -在 iotdb-common.properties 中: +In iotdb-common.properties : ```Properties #################### @@ -438,32 +478,44 @@ SHOW PIPEPLUGINS # The maximum number of clients that can be used in the async connector. # pipe_async_connector_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. +# pipe_air_gap_receiver_port=9780 ``` -## 功能特性 +## Functionality Features + +### At least one semantic guarantee **at-least-once** -### 最少一次语义保证 **at-least-once** +The data synchronization feature provides an at-least-once delivery semantic when transferring data to external systems. In most scenarios, the synchronization feature guarantees exactly-once delivery, ensuring that all data is synchronized exactly once. 
-数据同步功能向外部系统传输数据时,提供 at-least-once 的传输语义。在大部分场景下,同步功能可提供 exactly-once 保证,即所有数据被恰好同步一次。

+However, in the following scenarios, it is possible for some data to be synchronized multiple times **(due to resumable transmission)**:

-但是在以下场景中,可能存在部分数据被同步多次 **(断点续传)** 的情况:

+- Temporary network failures: If a data transmission request fails, the system will retry sending it until reaching the maximum number of retry attempts.
+- Abnormal implementation of the Pipe plugin logic: If an error is thrown during the plugin's execution, the system will retry sending the data until reaching the maximum number of retry attempts.
+- Data partition leadership changes caused by node failures or restarts: After the partition change is completed, the affected data will be retransmitted.
+- Cluster unavailability: Once the cluster becomes available again, the affected data will be retransmitted.

-- 临时的网络故障:某次数据传输请求失败后,系统会进行重试发送,直至到达最大尝试次数
-- Pipe 插件逻辑实现异常:插件运行中抛出错误,系统会进行重试发送,直至到达最大尝试次数
-- 数据节点宕机、重启等导致的数据分区切主:分区变更完成后,受影响的数据会被重新传输
-- 集群不可用:集群可用后,受影响的数据会重新传输

+### Source: Data Writing Asynchronously Decoupled from Pipe Processing and Sending

-### 源端:数据写入与 Pipe 处理、发送数据异步解耦

+In the data sync feature, data transfer adopts an asynchronous replication mode.

-数据同步功能中,数据传输采用的是异步复制模式。

+Data sync is completely decoupled from the write operation, eliminating any impact on the critical write path. This mechanism allows the framework to maintain the write speed of the time-series database while ensuring continuous data sync.

-数据同步与写入操作完全脱钩,不存在对写入关键路径的影响。该机制允许框架在保证持续数据同步的前提下,保持时序数据库的写入速度。

+### Source: Adaptive Data Transfer Policy for the Write Load

-### 源端:高可用集群部署时,Pipe 服务高可用

+The data transfer mode is adjusted dynamically according to the write load. By default, sync uses a dynamic hybrid of TsFile-based and operation-stream-based transfer (`'extractor.realtime.mode'='hybrid'`); a sketch that pins the mode explicitly is given at the end of this page.

-当发送端 IoTDB 为高可用集群部署模式时,数据同步服务也将是高可用的。 数据同步框架将监控每个数据节点的数据同步进度,并定期做轻量级的分布式一致性快照以保存同步状态。

+When the write load is high, TsFile transfer is preferred: its high compression ratio saves network bandwidth.

-- 当发送端集群某数据节点宕机时,数据同步框架可以利用一致性快照以及保存在副本上的数据快速恢复同步,以此实现数据同步服务的高可用。
-- 当发送端集群整体宕机并重启时,数据同步框架也能使用快照恢复同步服务。

+When the write load is low, operation-stream transfer is preferred: it delivers data with lower latency and better real-time performance.

+### Source: High Availability of the Pipe Service in a Highly Available Cluster Deployment

+When the sender IoTDB is deployed in high availability cluster mode, the data sync service is also highly available. The data sync framework monitors the data sync progress of each data node and periodically takes lightweight distributed consistent snapshots to preserve the sync state.

+- In the event of a failure of a data node in the sender cluster, the data sync framework can leverage the consistent snapshot and the data stored in replicas to quickly recover and resume sync, thus achieving high availability of the data sync service.
+- In the event of a complete failure and restart of the sender cluster, the data sync framework can also use snapshots to recover the sync service.
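The adaptive transfer policy described above can also be overridden explicitly. The sketch below is illustrative only: the pipe name is made up, while `extractor.realtime.mode` and the connector parameters are the ones documented in the tables earlier on this page. It pins realtime extraction to TsFile-based transfer instead of relying on the `hybrid` default:

```sql
-- Sketch only: pins realtime extraction to TsFile-based (file) transfer.
-- Use 'log' instead for per-entry, low-latency sending, or omit the
-- property to keep the default 'hybrid' adaptive behavior.
CREATE PIPE high_throughput_sync
WITH EXTRACTOR (
  'extractor' = 'iotdb-extractor',
  'extractor.realtime.mode' = 'file'
)
WITH CONNECTOR (
  'connector' = 'iotdb-thrift-connector',
  'connector.ip' = '127.0.0.1',
  'connector.port' = '6667'
)
```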
diff --git a/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md index c5ac54a5..c9309b06 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md +++ b/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md @@ -7,9 +7,9 @@ to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -19,6 +19,769 @@ --> -# Tiered Storage +# IoTDB Stream Processing Framework -TODO \ No newline at end of file +The IoTDB stream processing framework allows users to implement customized stream processing logic, which can monitor and capture storage engine changes, transform changed data, and push transformed data outward. + +We call a data flow processing task a Pipe. A stream processing task (Pipe) contains three subtasks: + +- Extract +- Process +- Send (Connect) + +The stream processing framework allows users to customize the processing logic of three subtasks using Java language and process data in a UDF-like manner. +In a Pipe, the three subtasks mentioned above are executed and implemented by three types of plugins. Data flows through these three plugins sequentially for processing: +Pipe Extractor is used to extract data, Pipe Processor is used to process data, Pipe Connector is used to send data, and the final data will be sent to an external system. + +**The model for a Pipe task is as follows:** + +![任务模型图](https://alioss.timecho.com/docs/img/%E5%90%8C%E6%AD%A5%E5%BC%95%E6%93%8E.jpeg) +A data stream processing task essentially describes the attributes of the Pipe Extractor, Pipe Processor, and Pipe Connector plugins. + +Users can configure the specific attributes of these three subtasks declaratively using SQL statements. By combining different attributes, flexible data ETL (Extract, Transform, Load) capabilities can be achieved. + +Using the stream processing framework, it is possible to build a complete data pipeline to fulfill various requirements such as *edge-to-cloud synchronization, remote disaster recovery, and read/write load balancing across multiple databases*. + +## Custom Stream Processing Plugin Development + +### Programming development dependencies + +It is recommended to use Maven to build the project. Add the following dependencies in the `pom.xml` file. Please make sure to choose dependencies with the same version as the IoTDB server version. + +```xml + + org.apache.iotdb + pipe-api + 1.2.1 + provided + +``` + +### Event-Driven Programming Model + +The design of user programming interfaces for stream processing plugins follows the principles of the event-driven programming model. In this model, events serve as the abstraction of data in the user programming interface. The programming interface is decoupled from the specific execution method, allowing the focus to be on describing how the system expects events (data) to be processed upon arrival. + +In the user programming interface of stream processing plugins, events abstract the write operations of database data. Events are captured by the local stream processing engine and passed sequentially through the three stages of stream processing, namely Pipe Extractor, Pipe Processor, and Pipe Connector plugins. 
User logic is triggered and executed within these three plugins. + +To accommodate both low-latency stream processing in low-load scenarios and high-throughput stream processing in high-load scenarios at the edge, the stream processing engine dynamically chooses the processing objects from operation logs and data files. Therefore, the user programming interface for stream processing requires the user to provide the handling logic for two types of events: TabletInsertionEvent for operation log write events and TsFileInsertionEvent for data file write events. + +#### **TabletInsertionEvent** + +The TabletInsertionEvent is a high-level data abstraction for user write requests, which provides the ability to manipulate the underlying data of the write request by providing a unified operation interface. + +For different database deployments, the underlying storage structure corresponding to the operation log write event is different. For stand-alone deployment scenarios, the operation log write event is an encapsulation of write-ahead log (WAL) entries; for distributed deployment scenarios, the operation log write event is an encapsulation of individual node consensus protocol operation log entries. + +For write operations generated by different write request interfaces of the database, the data structure of the request structure corresponding to the operation log write event is also different.IoTDB provides many write interfaces such as InsertRecord, InsertRecords, InsertTablet, InsertTablets, and so on, and each kind of write request uses a completely different serialisation method to generate a write request. completely different serialisation methods and generate different binary entries. + +The existence of operation log write events provides users with a unified view of data operations, which shields the implementation differences of the underlying data structures, greatly reduces the programming threshold for users, and improves the ease of use of the functionality. + +```java +/** TabletInsertionEvent is used to define the event of data insertion. */ +public interface TabletInsertionEvent extends Event { + + /** + * The consumer processes the data row by row and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processRowByRow(BiConsumer consumer); + + /** + * The consumer processes the Tablet directly and collects the results by RowCollector. + * + * @return {@code Iterable} a list of new TabletInsertionEvent contains the + * results collected by the RowCollector + */ + Iterable processTablet(BiConsumer consumer); +} +``` + +#### **TsFileInsertionEvent** + +The TsFileInsertionEvent represents a high-level abstraction of the database's disk flush operation and is a collection of multiple TabletInsertionEvents. + +IoTDB's storage engine is based on the LSM (Log-Structured Merge) structure. When data is written, the write operations are first flushed to log-structured files, while the written data is also stored in memory. When the memory reaches its capacity limit, a flush operation is triggered, converting the data in memory into a database file while deleting the previously written log entries. During the conversion from memory data to database file data, two compression processes, encoding compression and universal compression, are applied. As a result, the data in the database file occupies less space compared to the original data in memory. 
+ +In extreme network conditions, directly transferring data files is more cost-effective than transmitting individual write operations. It consumes lower network bandwidth and achieves faster transmission speed. However, there is no such thing as a free lunch. Performing calculations on data in the disk file incurs additional costs for file I/O compared to performing calculations directly on data in memory. Nevertheless, the coexistence of disk data files and memory write operations permits dynamic trade-offs and adjustments. It is based on this observation that the data file write event is introduced into the event model of the plugin. + +In summary, the data file write event appears in the event stream of stream processing plugins in the following two scenarios: + +1. Historical data extraction: Before a stream processing task starts, all persisted write data exists in the form of TsFiles. When collecting historical data at the beginning of a stream processing task, the historical data is abstracted as TsFileInsertionEvent. + +2. Real-time data extraction: During the execution of a stream processing task, if the speed of processing the log entries representing real-time operations is slower than the rate of write requests, the unprocessed log entries will be persisted to disk in the form of TsFiles. When these data are extracted by the stream processing engine, they are abstracted as TsFileInsertionEvent. + +```java +/** + * TsFileInsertionEvent is used to define the event of writing TsFile. Event data stores in disks, + * which is compressed and encoded, and requires IO cost for computational processing. + */ +public interface TsFileInsertionEvent extends Event { + + /** + * The method is used to convert the TsFileInsertionEvent into several TabletInsertionEvents. + * + * @return {@code Iterable} the list of TabletInsertionEvent + */ + Iterable toTabletInsertionEvents(); +} +``` + +### Custom Stream Processing Plugin Programming Interface Definition + +Based on the custom stream processing plugin programming interface, users can easily write data extraction plugins, data processing plugins, and data sending plugins, allowing the stream processing functionality to adapt flexibly to various industrial scenarios. +#### Data Extraction Plugin Interface + +Data extraction is the first stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data extraction plugin (PipeExtractor) serves as a bridge between the stream processing engine and the storage engine. It captures various data write events by listening to the behavior of the storage engine. +```java +/** + * PipeExtractor + * + *

PipeExtractor is responsible for capturing events from sources. + * + *

Various data sources can be supported by implementing different PipeExtractor classes. + * + *

The lifecycle of a PipeExtractor is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH EXTRACTOR` clause in SQL are + * parsed and the validation method {@link PipeExtractor#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} will be called + * to config the runtime behavior of the PipeExtractor. + *
  • Then the method {@link PipeExtractor#start()} will be called to start the PipeExtractor. + *
  • While the collaboration task is in progress, the method {@link PipeExtractor#supply()} will + * be called to capture events from sources and then the events will be passed to the + * PipeProcessor. + *
  • The method {@link PipeExtractor#close()} will be called when the collaboration task is + * cancelled (the `DROP PIPE` command is executed). + *
+ */ +public interface PipeExtractor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeExtractor. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeExtractorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeExtractor#validate(PipeParameterValidator)} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeExtractor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeExtractorRuntimeConfiguration configuration) + throws Exception; + + /** + * Start the extractor. After this method is called, events should be ready to be supplied by + * {@link PipeExtractor#supply()}. This method is called after {@link + * PipeExtractor#customize(PipeParameters, PipeExtractorRuntimeConfiguration)} is called. + * + * @throws Exception the user can throw errors if necessary + */ + void start() throws Exception; + + /** + * Supply single event from the extractor and the caller will send the event to the processor. + * This method is called after {@link PipeExtractor#start()} is called. + * + * @return the event to be supplied. the event may be null if the extractor has no more events at + * the moment, but the extractor is still running for more events. + * @throws Exception the user can throw errors if necessary + */ + Event supply() throws Exception; +} +``` + +#### Data Processing Plugin Interface + +Data processing is the second stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data processing plugin (PipeProcessor) is primarily used for filtering and transforming the various events captured by the data extraction plugin (PipeExtractor). + +```java +/** + * PipeProcessor + * + *

PipeProcessor is used to filter and transform the Event formed by the PipeExtractor. + * + *

The lifecycle of a PipeProcessor is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH PROCESSOR` clause in SQL are + * parsed and the validation method {@link PipeProcessor#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} will be called + * to config the runtime behavior of the PipeProcessor. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeExtractor captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeConnector. The + * following 3 methods will be called: {@link + * PipeProcessor#process(TabletInsertionEvent, EventCollector)}, {@link + * PipeProcessor#process(TsFileInsertionEvent, EventCollector)} and {@link + * PipeProcessor#process(Event, EventCollector)}. + *
    • PipeConnector serializes the events into binaries and send them to sinks. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeProcessor#close() } method will be called. + *
+ */ +public interface PipeProcessor extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeProcessor#customize(PipeParameters, PipeProcessorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeProcessor. In this method, the user can do the + * following things: + * + *
    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeProcessorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeProcessor#validate(PipeParameterValidator)} is called and before the beginning of the + * events processing. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeProcessor + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeProcessorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is called to process the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(TabletInsertionEvent tabletInsertionEvent, EventCollector eventCollector) + throws Exception; + + /** + * This method is called to process the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + default void process(TsFileInsertionEvent tsFileInsertionEvent, EventCollector eventCollector) + throws Exception { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + process(tabletInsertionEvent, eventCollector); + } + } + + /** + * This method is called to process the Event. + * + * @param event Event to be processed + * @param eventCollector used to collect result events after processing + * @throws Exception the user can throw errors if necessary + */ + void process(Event event, EventCollector eventCollector) throws Exception; +} +``` + +#### Data Sending Plugin Interface + +Data sending is the third stage of the three-stage process of stream processing, which includes data extraction, data processing, and data sending. The data sending plugin (PipeConnector) is responsible for sending the various events processed by the data processing plugin (PipeProcessor). It serves as the network implementation layer of the stream processing framework and should support multiple real-time communication protocols and connectors in its interface. + +```java +/** + * PipeConnector + * + *

PipeConnector is responsible for sending events to sinks. + * + *

Various network protocols can be supported by implementing different PipeConnector classes. + * + *

The lifecycle of a PipeConnector is as follows: + * + *

    + *
  • When a collaboration task is created, the KV pairs of `WITH CONNECTOR` clause in SQL are + * parsed and the validation method {@link PipeConnector#validate(PipeParameterValidator)} + * will be called to validate the parameters. + *
  • Before the collaboration task starts, the method {@link + * PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} will be called + * to config the runtime behavior of the PipeConnector and the method {@link + * PipeConnector#handshake()} will be called to create a connection with sink. + *
  • While the collaboration task is in progress: + *
      + *
    • PipeExtractor captures the events and wraps them into three types of Event instances. + *
    • PipeProcessor processes the event and then passes them to the PipeConnector. + *
    • PipeConnector serializes the events into binaries and send them to sinks. The + * following 3 methods will be called: {@link + * PipeConnector#transfer(TabletInsertionEvent)}, {@link + * PipeConnector#transfer(TsFileInsertionEvent)} and {@link + * PipeConnector#transfer(Event)}. + *
    + *
  • When the collaboration task is cancelled (the `DROP PIPE` command is executed), the {@link + * PipeConnector#close() } method will be called. + *
+ * + *

In addition, the method {@link PipeConnector#heartbeat()} will be called periodically to check + * whether the connection with sink is still alive. The method {@link PipeConnector#handshake()} + * will be called to create a new connection with the sink when the method {@link + * PipeConnector#heartbeat()} throws exceptions. + */ +public interface PipeConnector extends PipePlugin { + + /** + * This method is mainly used to validate {@link PipeParameters} and it is executed before {@link + * PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} is called. + * + * @param validator the validator used to validate {@link PipeParameters} + * @throws Exception if any parameter is not valid + */ + void validate(PipeParameterValidator validator) throws Exception; + + /** + * This method is mainly used to customize PipeConnector. In this method, the user can do the + * following things: + * + *

    + *
  • Use PipeParameters to parse key-value pair attributes entered by the user. + *
  • Set the running configurations in PipeConnectorRuntimeConfiguration. + *
+ * + *

This method is called after the method {@link + * PipeConnector#validate(PipeParameterValidator)} is called and before the method {@link + * PipeConnector#handshake()} is called. + * + * @param parameters used to parse the input parameters entered by the user + * @param configuration used to set the required properties of the running PipeConnector + * @throws Exception the user can throw errors if necessary + */ + void customize(PipeParameters parameters, PipeConnectorRuntimeConfiguration configuration) + throws Exception; + + /** + * This method is used to create a connection with sink. This method will be called after the + * method {@link PipeConnector#customize(PipeParameters, PipeConnectorRuntimeConfiguration)} is + * called or will be called when the method {@link PipeConnector#heartbeat()} throws exceptions. + * + * @throws Exception if the connection is failed to be created + */ + void handshake() throws Exception; + + /** + * This method will be called periodically to check whether the connection with sink is still + * alive. + * + * @throws Exception if the connection dies + */ + void heartbeat() throws Exception; + + /** + * This method is used to transfer the TabletInsertionEvent. + * + * @param tabletInsertionEvent TabletInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(TabletInsertionEvent tabletInsertionEvent) throws Exception; + + /** + * This method is used to transfer the TsFileInsertionEvent. + * + * @param tsFileInsertionEvent TsFileInsertionEvent to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + default void transfer(TsFileInsertionEvent tsFileInsertionEvent) throws Exception { + for (final TabletInsertionEvent tabletInsertionEvent : + tsFileInsertionEvent.toTabletInsertionEvents()) { + transfer(tabletInsertionEvent); + } + } + + /** + * This method is used to transfer the Event. + * + * @param event Event to be transferred + * @throws PipeConnectionException if the connection is broken + * @throws Exception the user can throw errors if necessary + */ + void transfer(Event event) throws Exception; +} +``` + +## Custom Stream Processing Plugin Management + +To ensure the flexibility and usability of user-defined plugins in production environments, the system needs to provide the capability to dynamically manage plugins. This section introduces the management statements for stream processing plugins, which enable the dynamic and unified management of plugins. + +### Load Plugin Statement + +In IoTDB, to dynamically load a user-defined plugin into the system, you first need to implement a specific plugin class based on PipeExtractor, PipeProcessor, or PipeConnector. Then, you need to compile and package the plugin class into an executable jar file. Finally, you can use the loading plugin management statement to load the plugin into IoTDB. + +The syntax of the loading plugin management statement is as follows: + +```sql +CREATE PIPEPLUGIN +AS +USING +``` + +For example, if a user implements a data processing plugin with the fully qualified class name "edu.tsinghua.iotdb.pipe.ExampleProcessor" and packages it into a jar file, which is stored at "https://example.com:8080/iotdb/pipe-plugin.jar", and the user wants to use this plugin in the stream processing engine, marking the plugin as "example". 
The creation statement for this data processing plugin is as follows: + +```sql +CREATE PIPEPLUGIN example +AS 'edu.tsinghua.iotdb.pipe.ExampleProcessor' +USING URI '' +``` + +### Delete Plugin Statement + +When user no longer wants to use a plugin and needs to uninstall the plug-in from the system, you can use the Remove plugin statement as shown below. +```sql +DROP PIPEPLUGIN +``` + +### Show Plugin Statement + +User can also view the plugin in the system on need. The statement to view plugin is as follows. +```sql +SHOW PIPEPLUGINS +``` + +## System Pre-installed Stream Processing Plugin + +### Pre-built extractor Plugin + +#### iotdb-extractor + +Function: Extract historical or realtime data inside IoTDB into pipe. + + +| key | value | value 取值范围 | required or optional with default | +| ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | +| extractor | iotdb-extractor | String: iotdb-extractor | required | +| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | +| extractor.history.enable | 是否抽取历史数据 | Boolean: true, false | optional: true | +| extractor.history.start-time | 抽取的历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | 抽取的历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | 是否抽取实时数据 | Boolean: true, false | optional: true | +| extractor.realtime.mode | 实时数据的抽取模式 | String: hybrid, log, file | optional: hybrid | +| extractor.forwarding-pipe-requests | 是否抽取由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | optional: true | + +> 🚫 **extractor.pattern 参数说明** +> +> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * 在底层实现中,当检测到 pattern 为 root(默认值)时,抽取效率较高,其他任意格式都将降低性能 +> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: + > + > * root.aligned.1TS +> * root.aligned.1TS.\`1\` +> * root.aligned.100T + > + > 的数据会被抽取; + > + > * root.aligned.\`1\` +> * root.aligned.\`123\` + > + > 的数据不会被抽取。 +> * root.\_\_system 的数据不会被 pipe 抽取。用户虽然可以在 extractor.pattern 中包含任意前缀,包括带有(或覆盖) root.\__system 的前缀,但是 root.__system 下的数据总是会被 pipe 忽略的 + +> ❗️**extractor.history 的 start-time,end-time 参数说明** +> +> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 + +> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> +> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 +> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> +> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 + +> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> +> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 +> 2. 
实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> +> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> +> 用户可以指定 iotdb-extractor 进行: +> +> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` + +> 📌 **extractor.realtime.mode:数据抽取的模式** +> +> * log:该模式下,任务仅使用操作日志进行数据处理、发送 +> * file:该模式下,任务仅使用数据文件进行数据处理、发送 +> * hybrid:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 + +> 🍕 **extractor.forwarding-pipe-requests:是否允许转发从另一 pipe 传输而来的数据** +> +> * 如果要使用 pipe 构建 A -> B -> C 的数据同步,那么 B -> C 的 pipe 需要将该参数为 true 后,A -> B 中 A 通过 pipe 写入 B 的数据才能被正确转发到 C +> * 如果要使用 pipe 构建 A \<-> B 的双向数据同步(双活),那么 A -> B 和 B -> A 的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发 + +### 预置 processor 插件 + +#### do-nothing-processor + +作用:不对 extractor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| processor | do-nothing-processor | String: do-nothing-processor | required | + +### 预置 connector 插件 + +#### do-nothing-connector + +作用:不对 processor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| connector | do-nothing-connector | String: do-nothing-connector | required | + +## 流处理任务管理 + +### 创建流处理任务 + +使用 `CREATE PIPE` 语句来创建流处理任务。以数据同步流处理任务的创建为例,示例 SQL 语句如下: + +```sql +CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +WITH EXTRACTOR ( + -- 默认的 IoTDB 数据抽取插件 + 'extractor' = 'iotdb-extractor', + -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + 'extractor.pattern' = 'root.timecho', + -- 是否抽取历史数据 + 'extractor.history.enable' = 'true', + -- 描述被抽取的历史数据的时间范围,表示最早时间 + 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', + -- 描述被抽取的历史数据的时间范围,表示最晚时间 + 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', + -- 是否抽取实时数据 + 'extractor.realtime.enable' = 'true', + -- 描述实时数据的抽取方式 + 'extractor.realtime.mode' = 'hybrid', +) +WITH PROCESSOR ( + -- 默认的数据处理插件,即不做任何处理 + 'processor' = 'do-nothing-processor', +) +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +**创建流处理任务时需要配置 PipeId 以及三个插件部分的参数:** + + +| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +| --------- | --------------------------------------------------- | --------------------------- | -------------------- | -------------------------------------------------------- | ------------------------- | +| PipeId | 全局唯一标定一个流处理任务的名称 | 必填 | - | - | - | +| extractor | Pipe Extractor 插件,负责在数据库底层抽取流处理数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入流处理任务 | 否 | +| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | +| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | + +示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据流处理任务。IoTDB 还内置了其他的流处理插件,**请查看“系统预置流处理插件”一节**。 + +**一个最简的 CREATE PIPE 语句示例如下:** + +```sql +CREATE PIPE -- 
PipeId 是能够唯一标定流处理任务的名字 +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 + +**注意:** + +- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 +- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 +- CONNECTOR 具备自复用能力。对于不同的流处理任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 + + - 例如,有下面 pipe1, pipe2 两个流处理任务的声明: + + ```sql + CREATE PIPE pipe1 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.host' = 'localhost', + 'connector.thrift.port' = '9999', + ) + + CREATE PIPE pipe2 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.port' = '9999', + 'connector.thrift.host' = 'localhost', + ) + ``` + + - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 +- 在 extractor 为默认的 iotdb-extractor,且 extractor.forwarding-pipe-requests 为默认值 true 时,请不要构建出包含数据循环同步的应用场景(会导致无限循环): + + - IoTDB A -> IoTDB B -> IoTDB A + - IoTDB A -> IoTDB A + +### 启动流处理任务 + +CREATE PIPE 语句成功执行后,流处理任务相关实例会被创建,但整个流处理任务的运行状态会被置为 STOPPED,即流处理任务不会立刻处理数据。 + +可以使用 START PIPE 语句使流处理任务开始处理数据: + +```sql +START PIPE +``` + +### 停止流处理任务 + +使用 STOP PIPE 语句使流处理任务停止处理数据: + +```sql +STOP PIPE +``` + +### 删除流处理任务 + +使用 DROP PIPE 语句使流处理任务停止处理数据(当流处理任务状态为 RUNNING 时),然后删除整个流处理任务流处理任务: + +```sql +DROP PIPE +``` + +用户在删除流处理任务前,不需要执行 STOP 操作。 + +### 展示流处理任务 + +使用 SHOW PIPES 语句查看所有流处理任务: + +```sql +SHOW PIPES +``` + +查询结果如下: + +```sql ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +| ID| CreationTime | State|PipeExtractor|PipeProcessor|PipeConnector|ExceptionMessage| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| None| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +``` + +可以使用 `` 指定想看的某个流处理任务状态: + +```sql +SHOW PIPE +``` + +您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 + +```sql +SHOW PIPES +WHERE CONNECTOR USED BY +``` + +### 流处理任务运行状态迁移 + +一个流处理 pipe 在其被管理的生命周期中会经过多种状态: + +- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: + - 当一个 pipe 被成功创建之后,其初始状态为暂停状态 + - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED + - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED +- **RUNNING:** pipe 正在正常工作 +- **DROPPED:** pipe 任务被永久删除 + +下图表明了所有状态以及状态的迁移: + +![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +## 权限管理 + +### 流处理任务 + + +| 权限名称 | 描述 | +| ----------- | -------------------------- | +| CREATE_PIPE | 注册流处理任务。路径无关。 | +| START_PIPE | 开启流处理任务。路径无关。 | +| STOP_PIPE | 停止流处理任务。路径无关。 | +| DROP_PIPE | 卸载流处理任务。路径无关。 | +| SHOW_PIPES | 查询流处理任务。路径无关。 | + +### 流处理任务插件 + + +| 权限名称 | 描述 | +| ----------------- | ------------------------------ | +| CREATE_PIPEPLUGIN | 注册流处理任务插件。路径无关。 | +| DROP_PIPEPLUGIN | 开启流处理任务插件。路径无关。 | +| SHOW_PIPEPLUGINS | 查询流处理任务插件。路径无关。 | + +## 配置参数 + +在 iotdb-common.properties 中: + +```Properties +#################### +### Pipe Configuration +#################### + +# 
Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 +``` From a3b9ff3e86b14fef9d8b5f8f2fa85c43b6f0c8a3 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Tue, 10 Oct 2023 11:41:32 +0800 Subject: [PATCH 15/27] 5 --- src/UserGuide/V1.2.x/User-Manual/Streaming.md | 6 +- .../V1.2.x/User-Manual/Streaming_timecho.md | 256 +++++++++--------- 2 files changed, 128 insertions(+), 134 deletions(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Streaming.md b/src/UserGuide/V1.2.x/User-Manual/Streaming.md index f597cd83..da694d24 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Streaming.md +++ b/src/UserGuide/V1.2.x/User-Manual/Streaming.md @@ -483,10 +483,10 @@ Function: Extract historical or realtime data inside IoTDB into pipe. | ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | | extractor | iotdb-extractor | String: iotdb-extractor | required | | extractor.pattern | path prefix for filtering time series | String: any time series prefix | optional: root | -| extractor.history.enable | whether to synchronize historical data | Boolean: true, false | optional: true | +| extractor.history.enable | whether to sync historical data | Boolean: true, false | optional: true | | extractor.history.start-time | start of synchronizing historical data event time,Include start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | | extractor.history.end-time | end of synchronizing historical data event time,Include end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| extractor.realtime.enable | Whether to synchronize realtime data | Boolean: true, false | optional: true | +| extractor.realtime.enable | Whether to sync realtime data | Boolean: true, false | optional: true | > 🚫 **extractor.pattern Parameter Description** > @@ -735,7 +735,7 @@ The following diagram illustrates the different states and their transitions: | Authority Name | Description | | ----------------- | ------------------------------ | | CREATE_PIPEPLUGIN | Register stream processing task plugin,path-independent | -| DROP_PIPEPLUGIN | Start stream processing task plugin,path-independent | +| DROP_PIPEPLUGIN | Delete stream processing task plugin,path-independent | | SHOW_PIPEPLUGINS | Query stream processing task plugin,path-independent | ## Configure Parameters diff --git a/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md index c9309b06..61cd393b 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md +++ b/src/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md @@ -479,166 +479,165 @@ SHOW PIPEPLUGINS Function: Extract historical or realtime data inside IoTDB into pipe. 
-| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | | extractor | iotdb-extractor | String: iotdb-extractor | required | -| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | -| extractor.history.enable | 是否抽取历史数据 | Boolean: true, false | optional: true | -| extractor.history.start-time | 抽取的历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| extractor.history.end-time | 抽取的历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| extractor.realtime.enable | 是否抽取实时数据 | Boolean: true, false | optional: true | -| extractor.realtime.mode | 实时数据的抽取模式 | String: hybrid, log, file | optional: hybrid | -| extractor.forwarding-pipe-requests | 是否抽取由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | optional: true | - -> 🚫 **extractor.pattern 参数说明** +| extractor.pattern | path prefix for filtering time series | String: any time series prefix | optional: root | +| extractor.history.enable | whether to sync historical data | Boolean: true, false | optional: true | +| extractor.history.start-time | start of synchronizing historical data event time,Include start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | end of synchronizing historical data event time,Include end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | Whether to sync realtime data | Boolean: true, false | optional: true | +| extractor.realtime.mode | Extraction pattern for realtime data | String: hybrid, log, file | optional: hybrid | +| extractor.forwarding-pipe-requests | Whether to extract data written by other pipes (usually Data sync) | Boolean: true, false | optional: true | + +> 🚫 **extractor.pattern Parameter Description** > -> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) -> * 在底层实现中,当检测到 pattern 为 root(默认值)时,抽取效率较高,其他任意格式都将降低性能 -> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: - > - > * root.aligned.1TS +> * Pattern should use backquotes to modify illegal characters or illegal path nodes, for example, if you want to filter root.\`a@b\` or root.\`123\`, you should set the pattern to root.\`a@b\` or root.\`123\`(Refer specifically to [Timing of single and double quotes and backquotes](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * In the underlying implementation, when pattern is detected as root (default value), synchronization efficiency is higher, and any other format will reduce performance. +> * The path prefix does not need to form a complete path. For example, when creating a pipe with the parameter 'extractor.pattern'='root.aligned.1': +> +> * root.aligned.1TS > * root.aligned.1TS.\`1\` -> * root.aligned.100T - > - > 的数据会被抽取; - > - > * root.aligned.\`1\` +> * root.aligned.100TS +> +> the data will be synchronized; +> +> * root.aligned.\`1\` > * root.aligned.\`123\` - > - > 的数据不会被抽取。 -> * root.\_\_system 的数据不会被 pipe 抽取。用户虽然可以在 extractor.pattern 中包含任意前缀,包括带有(或覆盖) root.\__system 的前缀,但是 root.__system 下的数据总是会被 pipe 忽略的 +> +> the data will not be synchronized. 
+> * Data under root.\_\_system will not be extracted by the pipe. Although the user can include any prefix in extractor.pattern, including prefixes with (or overriding) root.\__system, data under root.\__system will always be ignored by pipe -> ❗️**extractor.history 的 start-time,end-time 参数说明** +> ❗️**start-time, end-time parameter description of extractor.history** > -> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 +> * start-time, end-time should be in ISO format, such as 2011-12-03T10:15:30 or 2011-12-03T10:15:30+01:00 -> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> ✅ **a piece of data from production to IoTDB contains two key concepts of time** > -> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 -> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> * **event time:** the time when the data is actually produced (or the generation time assigned to the data by the data production system, which is a time item in the data point), also called the event time. +> * **arrival time:** the time the data arrived in the IoTDB system. > -> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 +> The out-of-order data we often refer to refers to data whose **event time** is far behind the current system time (or the maximum **event time** that has been dropped) when the data arrives. On the other hand, whether it is out-of-order data or sequential data, as long as they arrive newly in the system, their **arrival time** will increase with the order in which the data arrives at IoTDB. -> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> 💎 **the work of iotdb-extractor can be split into two stages** > -> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 -> 2. 实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> 1. Historical data extraction: All data with **arrival time** < **current system time** when creating the pipe is called historical data +> 2. Realtime data extraction: All data with **arrival time** >= **current system time** when the pipe is created is called realtime data > -> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> The historical data transmission phase and the realtime data transmission phase are executed serially. 
Only when the historical data transmission phase is completed, the realtime data transmission phase is executed.** > -> 用户可以指定 iotdb-extractor 进行: +> Users can specify iotdb-extractor to: > -> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) -> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) -> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) -> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` +> * Historical data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * Realtime data extraction(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * Full data extraction(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * Disable simultaneous sets `extractor.history.enable` and `extractor.realtime.enable` to `false` -> 📌 **extractor.realtime.mode:数据抽取的模式** +> 📌 **extractor.realtime.mode: mode in which data is extracted** > -> * log:该模式下,任务仅使用操作日志进行数据处理、发送 -> * file:该模式下,任务仅使用数据文件进行数据处理、发送 -> * hybrid:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 +> * log: in this mode, the task uses only operation logs for data processing and sending. +> * file: in this mode, the task uses only data files for data processing and sending. +> * hybrid: This mode takes into account the characteristics of low latency but low throughput when sending data item by item according to the operation log and high throughput but high latency when sending data in batches according to the data file, and is able to automatically switch to a suitable data extraction method under different write loads. When data backlog is generated, it automatically switches to data file-based data extraction to ensure high sending throughput, and when the backlog is eliminated, it automatically switches back to operation log-based data extraction, which avoids the problem that it is difficult to balance the data sending latency or throughput by using a single data extraction algorithm. -> 🍕 **extractor.forwarding-pipe-requests:是否允许转发从另一 pipe 传输而来的数据** +> 🍕 **extractor.forwarding-pipe-requests: whether to allow forwarding of data transferred from another pipe**. > -> * 如果要使用 pipe 构建 A -> B -> C 的数据同步,那么 B -> C 的 pipe 需要将该参数为 true 后,A -> B 中 A 通过 pipe 写入 B 的数据才能被正确转发到 C -> * 如果要使用 pipe 构建 A \<-> B 的双向数据同步(双活),那么 A -> B 和 B -> A 的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发 +> * If pipe is to be used to build A -> B -> C data sync, then the pipe of B -> C needs to have this parameter set to true for the data written from A -> B to B via the pipe to be forwarded to C correctly. +> * If using pipe to build bi-directional data syncn for A \<-> B (dual-living), then the pipe for A -> B and B -> A need to be set to false, otherwise it will result in an endless loop of data being forwarded between clusters. -### 预置 processor 插件 +### Pre-built Processor Plugin #### do-nothing-processor -作用:不对 extractor 传入的事件做任何的处理。 +Function: Do not do anything with the events passed in by the extractor. 
-| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | --------- | -------------------- | ---------------------------- | --------------------------------- | | processor | do-nothing-processor | String: do-nothing-processor | required | - -### 预置 connector 插件 +### Pre-built Connector Plugin #### do-nothing-connector -作用:不对 processor 传入的事件做任何的处理。 +Function: Does not do anything with the events passed in by the processor. -| key | value | value 取值范围 | required or optional with default | +| key | value | value range | required or optional with default | | --------- | -------------------- | ---------------------------- | --------------------------------- | | connector | do-nothing-connector | String: do-nothing-connector | required | -## 流处理任务管理 +## Stream Processing Task Management -### 创建流处理任务 +### Create Stream Processing Task -使用 `CREATE PIPE` 语句来创建流处理任务。以数据同步流处理任务的创建为例,示例 SQL 语句如下: +A stream processing task can be created using the `CREATE PIPE` statement, a sample SQL statement is shown below: ```sql -CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +CREATE PIPE -- PipeId is the name that uniquely identifies the sync task WITH EXTRACTOR ( - -- 默认的 IoTDB 数据抽取插件 + -- Default IoTDB Data Extraction Plugin 'extractor' = 'iotdb-extractor', - -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + -- Path prefix, only data that can match the path prefix will be extracted for subsequent processing and delivery 'extractor.pattern' = 'root.timecho', - -- 是否抽取历史数据 + -- Whether to extract historical data 'extractor.history.enable' = 'true', - -- 描述被抽取的历史数据的时间范围,表示最早时间 + -- Describes the time range of the historical data being extracted, indicating the earliest possible time 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', - -- 描述被抽取的历史数据的时间范围,表示最晚时间 + -- Describes the time range of the extracted historical data, indicating the latest time 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', - -- 是否抽取实时数据 + -- Whether to extract realtime data 'extractor.realtime.enable' = 'true', -- 描述实时数据的抽取方式 'extractor.realtime.mode' = 'hybrid', ) WITH PROCESSOR ( - -- 默认的数据处理插件,即不做任何处理 + -- Default data processing plugin, means no processing 'processor' = 'do-nothing-processor', ) WITH CONNECTOR ( - -- IoTDB 数据发送插件,目标端为 IoTDB + -- IoTDB data sending plugin with target IoTDB 'connector' = 'iotdb-thrift-connector', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + -- Data service for one of the DataNode nodes on the target IoTDB ip 'connector.ip' = '127.0.0.1', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + -- Data service port of one of the DataNode nodes of the target IoTDB 'connector.port' = '6667', ) ``` -**创建流处理任务时需要配置 PipeId 以及三个插件部分的参数:** +**To create a stream processing task it is necessary to configure the PipeId and the parameters of the three plugin sections:** -| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | -| --------- | --------------------------------------------------- | --------------------------- | -------------------- | -------------------------------------------------------- | ------------------------- | -| PipeId | 全局唯一标定一个流处理任务的名称 | 必填 | - | - | - | -| extractor | Pipe Extractor 插件,负责在数据库底层抽取流处理数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入流处理任务 | 否 | -| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | -| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | +| configuration item | description | Required or not | default implementation | Default implementation description | Whether to allow 
custom implementations | +| --------- | ------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- | +| pipeId | Globally uniquely identifies the name of a sync task | required | - | - | - | +| extractor | pipe Extractor plug-in, for extracting synchronized data at the bottom of the database | Optional | iotdb-extractor | Integrate all historical data of the database and subsequent realtime data into the sync task | no | +| processor | Pipe Processor plug-in, for processing data | Optional | do-nothing-processor | no processing of incoming data | yes | +| connector | Pipe Connector plug-in,for sending data | required | - | - | yes | -示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据流处理任务。IoTDB 还内置了其他的流处理插件,**请查看“系统预置流处理插件”一节**。 +In the example, the iotdb-extractor, do-nothing-processor, and iotdb-thrift-connector plug-ins are used to build the data synchronisation task. iotdb has other built-in data synchronisation plug-ins, **see the section "System pre-built data synchronisation plug-ins" **. See the "System Pre-installed Stream Processing Plugin" section**. -**一个最简的 CREATE PIPE 语句示例如下:** +**An example of a minimalist CREATE PIPE statement is as follows:** ```sql -CREATE PIPE -- PipeId 是能够唯一标定流处理任务的名字 +CREATE PIPE -- PipeId is a name that uniquely identifies the task. WITH CONNECTOR ( - -- IoTDB 数据发送插件,目标端为 IoTDB + -- IoTDB data sending plugin with target IoTDB 'connector' = 'iotdb-thrift-connector', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + -- Data service for one of the DataNode nodes on the target IoTDB ip 'connector.ip' = '127.0.0.1', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + -- Data service port of one of the DataNode nodes of the target IoTDB 'connector.port' = '6667', ) ``` -其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 +The expressed semantics are: synchronise the full amount of historical data and subsequent arrivals of realtime data from this database instance to the IoTDB instance with target 127.0.0.1:6667. -**注意:** +**Note:** -- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 -- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 -- CONNECTOR 具备自复用能力。对于不同的流处理任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 +- EXTRACTOR and PROCESSOR are optional, if no configuration parameters are filled in, the system will use the corresponding default implementation. +- The CONNECTOR is a mandatory configuration that needs to be declared in the CREATE PIPE statement for configuring purposes. +- The CONNECTOR exhibits self-reusability. For different tasks, if their CONNECTOR possesses identical KV properties (where the value corresponds to every key), **the system will ultimately create only one instance of the CONNECTOR** to achieve resource reuse for connections. - - 例如,有下面 pipe1, pipe2 两个流处理任务的声明: + - For example, there are the following pipe1, pipe2 task declarations: ```sql CREATE PIPE pipe1 @@ -656,49 +655,48 @@ WITH CONNECTOR ( ) ``` - - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 -- 在 extractor 为默认的 iotdb-extractor,且 extractor.forwarding-pipe-requests 为默认值 true 时,请不要构建出包含数据循环同步的应用场景(会导致无限循环): + - Since they have identical CONNECTOR declarations (**even if the order of some properties is different**), the framework will automatically reuse the CONNECTOR declared by them. 
Hence, the CONNECTOR instances for pipe1 and pipe2 will be the same. +- Please note that we should avoid constructing application scenarios that involve data cycle sync (as it can result in an infinite loop): - IoTDB A -> IoTDB B -> IoTDB A - IoTDB A -> IoTDB A -### 启动流处理任务 -CREATE PIPE 语句成功执行后,流处理任务相关实例会被创建,但整个流处理任务的运行状态会被置为 STOPPED,即流处理任务不会立刻处理数据。 +### Start Stream Processing Task -可以使用 START PIPE 语句使流处理任务开始处理数据: +After the successful execution of the CREATE PIPE statement, an instance of the stream processing task is created, but the overall task's running status will be set to STOPPED, meaning the task will not immediately process data. +You can use the START PIPE statement to make the stream processing task start processing data: ```sql START PIPE ``` -### 停止流处理任务 +### Stop Stream Processing Task -使用 STOP PIPE 语句使流处理任务停止处理数据: +Use the STOP PIPE statement to stop the stream processing task from processing data: ```sql STOP PIPE ``` -### 删除流处理任务 +### Delete Stream Processing Task -使用 DROP PIPE 语句使流处理任务停止处理数据(当流处理任务状态为 RUNNING 时),然后删除整个流处理任务流处理任务: +If a stream processing task is in the RUNNING state, you can use the DROP PIPE statement to stop it and delete the entire task: ```sql DROP PIPE ``` -用户在删除流处理任务前,不需要执行 STOP 操作。 +Before deleting a stream processing task, there is no need to execute the STOP operation. -### 展示流处理任务 - -使用 SHOW PIPES 语句查看所有流处理任务: +### Show Stream Processing Task +Use the SHOW PIPES statement to view all stream processing tasks: ```sql SHOW PIPES ``` -查询结果如下: +The query results are as follows: ```sql +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ @@ -710,59 +708,55 @@ SHOW PIPES +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ ``` -可以使用 `` 指定想看的某个流处理任务状态: - +You can use `` to specify the status of a stream processing task you want to see: ```sql SHOW PIPE ``` -您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 +Additionally, the WHERE clause can be used to determine if the Pipe Connector used by a specific \ is being reused. ```sql SHOW PIPES WHERE CONNECTOR USED BY ``` +### Stream Processing Task Running Status Migration -### 流处理任务运行状态迁移 - -一个流处理 pipe 在其被管理的生命周期中会经过多种状态: - -- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: - - 当一个 pipe 被成功创建之后,其初始状态为暂停状态 - - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED - - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED -- **RUNNING:** pipe 正在正常工作 -- **DROPPED:** pipe 任务被永久删除 - -下图表明了所有状态以及状态的迁移: +A stream processing task status can transition through several states during the lifecycle of a data synchronization pipe: -![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) +- **STOPPED:** The pipe is in a stopped state. It can have the following possibilities: + - After the successful creation of a pipe, its initial state is set to stopped + - The user manually pauses a pipe that is in normal running state, transitioning its status from RUNNING to STOPPED + - If a pipe encounters an unrecoverable error during execution, its status automatically changes from RUNNING to STOPPED. 
+- **RUNNING:** The pipe is actively processing data +- **DROPPED:** The pipe is permanently deleted -## 权限管理 +The following diagram illustrates the different states and their transitions: -### 流处理任务 +![state migration diagram](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) +## Authority Management -| 权限名称 | 描述 | -| ----------- | -------------------------- | -| CREATE_PIPE | 注册流处理任务。路径无关。 | -| START_PIPE | 开启流处理任务。路径无关。 | -| STOP_PIPE | 停止流处理任务。路径无关。 | -| DROP_PIPE | 卸载流处理任务。路径无关。 | -| SHOW_PIPES | 查询流处理任务。路径无关。 | +### Stream Processing Task -### 流处理任务插件 +| Authority Name | Description | +| ----------- | -------------------- | +| CREATE_PIPE | Register task,path-independent | +| START_PIPE | Start task,path-independent | +| STOP_PIPE | Stop task,path-independent | +| DROP_PIPE | Uninstall task,path-independent | +| SHOW_PIPES | Query task,path-independent | +### Stream Processing Task Plugin -| 权限名称 | 描述 | +| Authority Name | Description | | ----------------- | ------------------------------ | -| CREATE_PIPEPLUGIN | 注册流处理任务插件。路径无关。 | -| DROP_PIPEPLUGIN | 开启流处理任务插件。路径无关。 | -| SHOW_PIPEPLUGINS | 查询流处理任务插件。路径无关。 | +| CREATE_PIPEPLUGIN | Register stream processing task plugin,path-independent | +| DROP_PIPEPLUGIN | Delete stream processing task plugin,path-independent | +| SHOW_PIPEPLUGINS | Query stream processing task plugin,path-independent | -## 配置参数 +## Configure Parameters -在 iotdb-common.properties 中: +In iotdb-common.properties : ```Properties #################### From c72607d50368e2b409c47e5b477fa48c95211039 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Tue, 10 Oct 2023 11:54:12 +0800 Subject: [PATCH 16/27] 6 --- src/zh/UserGuide/V1.2.x/User-Manual/Streaming.md | 2 +- src/zh/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Streaming.md b/src/zh/UserGuide/V1.2.x/User-Manual/Streaming.md index 787d55b7..0f25baca 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Streaming.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Streaming.md @@ -752,7 +752,7 @@ WHERE CONNECTOR USED BY | 权限名称 | 描述 | | ----------------- | ------------------------------ | | CREATE_PIPEPLUGIN | 注册流处理任务插件。路径无关。 | -| DROP_PIPEPLUGIN | 开启流处理任务插件。路径无关。 | +| DROP_PIPEPLUGIN | 卸载流处理任务插件。路径无关。 | | SHOW_PIPEPLUGINS | 查询流处理任务插件。路径无关。 | ## 配置参数 diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md b/src/zh/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md index 02e1df00..b5abd3fc 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Streaming_timecho.md @@ -768,7 +768,7 @@ WHERE CONNECTOR USED BY | 权限名称 | 描述 | | ----------------- | ------------------------------ | | CREATE_PIPEPLUGIN | 注册流处理任务插件。路径无关。 | -| DROP_PIPEPLUGIN | 开启流处理任务插件。路径无关。 | +| DROP_PIPEPLUGIN | 卸载流处理任务插件。路径无关。 | | SHOW_PIPEPLUGINS | 查询流处理任务插件。路径无关。 | ## 配置参数 From ed18a0034bba826da414db577651c052d9d17517 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Tue, 10 Oct 2023 17:43:39 +0800 Subject: [PATCH 17/27] 7 --- src/UserGuide/V1.2.x/User-Manual/Data-Sync.md | 2 +- src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md index 42263aec..040b6605 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync.md @@ -218,7 +218,7 @@ The query results 
are as follows: +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ ``` -You can use to specify the status of a particular synchronization task: +You can use \ to specify the status of a particular synchronization task: ```sql SHOW PIPE diff --git a/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md index 422cab13..81bd9f68 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md +++ b/src/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md @@ -219,7 +219,7 @@ The query results are as follows: +-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ ``` -You can use to specify the status of a particular synchronization task: +You can use \ to specify the status of a particular synchronization task: ```sql SHOW PIPE From 478c588e291cc3d48707a1e3484a8e6adf541444 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Wed, 11 Oct 2023 18:55:36 +0800 Subject: [PATCH 18/27] fix title bug and add English doc --- .../V1.2.x/IoTDB-Introduction/Publication.md | 8 +- .../V1.2.x/User-Manual/IoTDB-View_timecho.md | 2 +- .../Security-Management_timecho.md | 4 +- .../User-Manual/Tiered-Storage_timecho.md | 76 ++++++++++++++++++- .../Environmental-Requirement.md | 2 +- .../V1.2.x/IoTDB-Introduction/Publication.md | 9 +-- .../V1.2.x/Tools-System/Monitor-Tool.md | 22 +++--- .../User-Manual/Database-Programming.md | 2 +- .../V1.2.x/User-Manual/Write-Delete-Data.md | 2 +- 9 files changed, 94 insertions(+), 33 deletions(-) diff --git a/src/UserGuide/V1.2.x/IoTDB-Introduction/Publication.md b/src/UserGuide/V1.2.x/IoTDB-Introduction/Publication.md index db23cdb2..94413e4a 100644 --- a/src/UserGuide/V1.2.x/IoTDB-Introduction/Publication.md +++ b/src/UserGuide/V1.2.x/IoTDB-Introduction/Publication.md @@ -19,9 +19,7 @@ --> -# Publication - -## Research Papers +# Research Papers Apache IoTDB starts at Tsinghua University, School of Software. IoTDB is a database for managing large amount of time series data with columnar storage, data encoding, pre-computation, and index techniques. It has SQL-like interface to write millions of data points per second per node and is optimized to get query results in few seconds over trillions of data points. It can also be easily integrated with Apache Hadoop MapReduce and Apache Spark for analytics. @@ -35,8 +33,4 @@ The research papers related are as follows: * [The Design of Apache IoTDB distributed framework](http://ndbc2019.sdu.edu.cn/info/1002/1044.htm), Tianan Li, Jianmin Wang, Xiangdong Huang, Yi Xu, Dongfang Mao, Jun Yuan. NDBC 2019 * [Dual-PISA: An index for aggregation operations on time series data](https://www.sciencedirect.com/science/article/pii/S0306437918305489), Jialin Qiao, Xiangdong Huang, Jianmin Wang, Raymond K Wong. 
IS 2020
 
-## Benchmark tools
-
-We also developed Benchmark tools for time series databases
-[https://github.com/thulab/iot-benchmark](https://github.com/thulab/iot-benchmark)
diff --git a/src/UserGuide/V1.2.x/User-Manual/IoTDB-View_timecho.md b/src/UserGuide/V1.2.x/User-Manual/IoTDB-View_timecho.md
index a9ea5a7b..20d3a9d3 100644
--- a/src/UserGuide/V1.2.x/User-Manual/IoTDB-View_timecho.md
+++ b/src/UserGuide/V1.2.x/User-Manual/IoTDB-View_timecho.md
@@ -21,4 +21,4 @@
 
 # IoTDB View
 
-TODO
\ No newline at end of file
+coming soon
\ No newline at end of file
diff --git a/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md
index 021baac2..222820bf 100644
--- a/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md
+++ b/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md
@@ -23,11 +23,11 @@
 
 ## White List
 
-TODO
+coming soon
 
 ## Audit Log
 
-TODO
+coming soon
 
 ## Administration Management
 
diff --git a/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md
index c5ac54a5..6f45cc81 100644
--- a/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md
+++ b/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md
@@ -19,6 +19,78 @@
 -->
 
-# Tiered Storage
+# Tiered Storage
+## Overview
 
-TODO
\ No newline at end of file
+The tiered storage function provides users with the ability to manage different types of storage media in IoTDB and to organise them into tiers. Simply by setting configuration parameters, IoTDB can tier data across memory, SSD, ordinary hard disks and network/remote storage according to how hot or cold the data is. Specifically, tiered storage in IoTDB is realised through the management of multiple data directories: users group storage directories of the same kind together and configure each group into IoTDB as a "tier" (storage tier), and then store data of different hot/cold categories in the designated tiers. Currently, IoTDB classifies hot and cold data by TTL: when the data in one tier no longer meets the TTL rule defined for that tier, it is automatically migrated to the next tier.
+
+## Parameter Definition
+
+To enable tiered storage in IoTDB, you need to configure the following aspects:
+
+1. configure the data directories and divide them into different tiers
+2. configure the TTL of the data managed in each tier, to distinguish the hot and cold data categories managed by different tiers.
+3. configure the minimum remaining storage space ratio for each tier, so that when the storage space of a tier falls below this threshold, the data of that tier is automatically migrated to the next tier (optional).
+
+The specific parameter definitions and their descriptions are as follows.
+
+| Configuration | Default | Description | Constraint |
+| ---------------------------------------- | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| dn_data_dirs | None | Specify the storage directories and divide them into tiers | Tiers are separated by semicolons, and directories within a single tier are separated by commas; cloud storage can only be configured as the last tier and cannot be used as the first tier; at most one cloud storage tier is allowed; the remote storage directory is denoted by OBJECT_STORAGE |
+| default_ttl_in_ms | None | Define the range of data each tier is responsible for, expressed as a TTL | Tiers are separated by semicolons; the number of tiers must match the number defined by dn_data_dirs |
+| dn_default_space_move_thresholds | 0.15 | Define the minimum ratio of remaining space for the data directories of each tier; when the remaining space falls below this ratio, the data is automatically migrated to the next tier; when the remaining storage space of the last tier falls below this threshold, the system is set to READ_ONLY | Tiers are separated by semicolons; the number of tiers must match the number defined by dn_data_dirs |
+| object_storage_type | AWS_S3 | Cloud storage type | IoTDB currently only supports AWS S3 as the remote storage type, and this parameter cannot be modified |
+| object_storage_bucket | None | Name of the cloud storage bucket | Bucket definition in AWS S3; no need to configure if remote storage is not used |
+| object_storage_endpoiont | | Endpoint of the cloud storage | Endpoint of AWS S3; no need to configure if remote storage is not used |
+| object_storage_access_key | | Authentication information for the cloud storage: key | Credential key for AWS S3; no need to configure if remote storage is not used |
+| object_storage_access_secret | | Authentication information for the cloud storage: secret | Credential secret for AWS S3; no need to configure if remote storage is not used |
+| remote_tsfile_cache_dirs | data/datanode/data/cache | Local cache directory for data stored in the cloud | No need to configure if remote storage is not used |
+| remote_tsfile_cache_page_size_in_kb | 20480 | Block size of locally cached files for cloud storage | No need to configure if remote storage is not used |
+| remote_tsfile_cache_max_disk_usage_in_mb | 51200 | Maximum disk space the local cache for cloud storage may occupy | No need to configure if remote storage is not used |
+
+## local tiered storage configuration example
+
+The following is an example of a local two-level storage configuration.
+ +```JavaScript +//Required configuration items +dn_data_dirs=/data1/data;/data2/data,/data3/data; +default_ttl_in_ms=86400000;-1 +dn_default_space_move_thresholds=0.2;0.1 +``` + +In this example, two levels of storage are configured, specifically: + +| **层级** | **数据目录** | **数据范围** | **磁盘最小剩余空间阈值** | +| -------- | -------------------------------------- | --------------- | ------------------------ | +| 层级一 | 目录一:/data1/data | 最近 1 天的数据 | 20% | +| 层级二 | 目录一:/data2/data目录二:/data3/data | 1 天以前的数据 | 10% | + +## remote tiered storag configuration example + +The following takes three-level storage as an example: + +```JavaScript +//Required configuration items +dn_data_dirs=/data1/data;/data2/data,/data3/data;OBJECT_STORAGE +default_ttl_in_ms=86400000;864000000;-1 +dn_default_space_move_thresholds=0.2;0.15;0.1 +object_storage_name=AWS_S3 +object_storage_bucket=iotdb +object_storage_endpoiont= +object_storage_access_key= +object_storage_access_secret= + +// Optional configuration items +remote_tsfile_cache_dirs=data/datanode/data/cache +remote_tsfile_cache_page_size_in_kb=20971520 +remote_tsfile_cache_max_disk_usage_in_mb=53687091200 +``` + +In this example, a total of three levels of storage are configured, specifically: + +| **层级** | **数据目录** | **数据范围** | **磁盘最小剩余空间阈值** | +| -------- | -------------------------------------- | ---------------------------- | ------------------------ | +| 层级一 | 目录一:/data1/data | 最近 1 天的数据 | 20% | +| 层级二 | 目录一:/data2/data目录二:/data3/data | 过去1 天至过去 10 天内的数据 | 15% | +| 层级三 | 远端 AWS S3 存储 | 过去 10 天以前的数据 | 10% | diff --git a/src/zh/UserGuide/V1.2.x/Deployment-and-Maintenance/Environmental-Requirement.md b/src/zh/UserGuide/V1.2.x/Deployment-and-Maintenance/Environmental-Requirement.md index 60b6fce4..74646a40 100644 --- a/src/zh/UserGuide/V1.2.x/Deployment-and-Maintenance/Environmental-Requirement.md +++ b/src/zh/UserGuide/V1.2.x/Deployment-and-Maintenance/Environmental-Requirement.md @@ -19,7 +19,7 @@ --> -## 环境要求 +# 环境要求 要使用IoTDB,你需要具备以下条件: diff --git a/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Publication.md b/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Publication.md index 4605b179..d842aee3 100644 --- a/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Publication.md +++ b/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Publication.md @@ -19,9 +19,9 @@ --> -# 公开发表 -## 研究论文 + +# 研究论文 Apache IoTDB 始于清华大学软件学院。IoTDB 是一个用于管理大量时间序列数据的数据库,它采用了列式存储、数据编码、预计算和索引技术,具有类 SQL 的接口,可支持每秒每节点写入数百万数据点,可以秒级获得超过数万亿个数据点的查询结果。它还可以很容易地与 Apache Hadoop、MapReduce 和 Apache Spark 集成以进行分析。 @@ -34,8 +34,3 @@ Apache IoTDB 始于清华大学软件学院。IoTDB 是一个用于管理大量 * [The Design of Apache IoTDB distributed framework](http://ndbc2019.sdu.edu.cn/info/1002/1044.htm), Tianan Li, Jianmin Wang, Xiangdong Huang, Yi Xu, Dongfang Mao, Jun Yuan. NDBC 2019 * [Dual-PISA: An index for aggregation operations on time series data](https://www.sciencedirect.com/science/article/pii/S0306437918305489), Jialin Qiao, Xiangdong Huang, Jianmin Wang, Raymond K Wong. 
IS 2020 -## Benchmark 工具 - -我们还研发了面向时间序列数据库的 Benchmark 工具: - -[https://github.com/thulab/iot-benchmark](https://github.com/thulab/iot-benchmark) diff --git a/src/zh/UserGuide/V1.2.x/Tools-System/Monitor-Tool.md b/src/zh/UserGuide/V1.2.x/Tools-System/Monitor-Tool.md index 4df4ffcf..4232582a 100644 --- a/src/zh/UserGuide/V1.2.x/Tools-System/Monitor-Tool.md +++ b/src/zh/UserGuide/V1.2.x/Tools-System/Monitor-Tool.md @@ -19,10 +19,10 @@ --> +# 监控工具 +## Prometheus -# Prometheus - -## 监控指标的 Prometheus 映射关系 +### 监控指标的 Prometheus 映射关系 > 对于 Metric Name 为 name, Tags 为 K1=V1, ..., Kn=Vn 的监控指标有如下映射,其中 value 为具体值 @@ -34,7 +34,7 @@ | Rate | name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m1"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m5"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="m15"} value
name_total{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", rate="mean"} value | | Timer | name_seconds_max{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds_sum{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds_count{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn"} value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.5"} value value
name_seconds{cluster="clusterName", nodeType="nodeType", nodeId="nodeId",k1="V1" , ..., Kn="Vn", quantile="0.99"} value | -## 修改配置文件 +### 修改配置文件 1) 以 DataNode 为例,修改 iotdb-datanode.properties 配置文件如下: @@ -58,7 +58,7 @@ file_count{name="seq",} 2.0 ... ``` -## Prometheus + Grafana +### Prometheus + Grafana 如上所示,IoTDB 对外暴露出标准的 Prometheus 格式的监控指标数据,可以使用 Prometheus 采集并存储监控指标,使用 Grafana 可视化监控指标。 @@ -100,7 +100,7 @@ static_configs: [Grafana从Prometheus查询数据并绘图的文档](https://prometheus.io/docs/visualization/grafana/#grafana-support-for-prometheus) -## Apache IoTDB Dashboard +### Apache IoTDB Dashboard 我们提供了Apache IoTDB Dashboard,支持统一集中式运维管理,可通过一个监控面板监控多个集群。 @@ -110,7 +110,7 @@ static_configs: 你可以在企业版中获取到 Dashboard 的 Json文件。 -### 集群概览 +#### 集群概览 可以监控包括但不限于: - 集群总CPU核数、总内存空间、总硬盘空间 @@ -122,7 +122,7 @@ static_configs: ![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%A6%82%E8%A7%88.png) -### 数据写入 +#### 数据写入 可以监控包括但不限于: - 写入平均耗时、耗时中位数、99%分位耗时 @@ -131,7 +131,7 @@ static_configs: ![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%86%99%E5%85%A5.png) -### 数据查询 +#### 数据查询 可以监控包括但不限于: - 节点查询加载时间序列元数据耗时 @@ -144,7 +144,7 @@ static_configs: ![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E6%9F%A5%E8%AF%A2.png) -### 存储引擎 +#### 存储引擎 可以监控包括但不限于: - 分类型的文件数量、大小 @@ -153,7 +153,7 @@ static_configs: ![](https://alioss.timecho.com/docs/img/%E7%9B%91%E6%8E%A7%20%E5%AD%98%E5%82%A8%E5%BC%95%E6%93%8E.png) -### 系统监控 +#### 系统监控 可以监控包括但不限于: - 系统内存、交换内存、进程内存 diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Database-Programming.md b/src/zh/UserGuide/V1.2.x/User-Manual/Database-Programming.md index f2b86c0b..ffadee05 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Database-Programming.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Database-Programming.md @@ -19,7 +19,7 @@ --> - +# 数据库编程 ## 触发器 ### 使用说明 diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md b/src/zh/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md index 42a84cf0..0dc511b4 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Write-Delete-Data.md @@ -20,7 +20,7 @@ --> -# 写入和删除数据 +# 数据增删 ## CLI写入数据 IoTDB 为用户提供多种插入实时数据的方式,例如在 [Cli/Shell 工具](../Tools-System/CLI.md) 中直接输入插入数据的 INSERT 语句,或使用 Java API(标准 [Java JDBC](../API/Programming-JDBC.md) 接口)单条或批量执行插入数据的 INSERT 语句。 From 5c1c10756e40f1604f7627a28a347a2d644bafa8 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Tue, 17 Oct 2023 18:38:00 +0800 Subject: [PATCH 19/27] 4 --- .../V1.2.x/User-Manual/Tiered-Storage_timecho.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md index 6f45cc81..23ea05e4 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md +++ b/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md @@ -61,10 +61,10 @@ dn_default_space_move_thresholds=0.2;0.1 In this example, two levels of storage are configured, specifically: -| **层级** | **数据目录** | **数据范围** | **磁盘最小剩余空间阈值** | +| **tier** | **data path** | **data range** | **threshold for minimum remaining disk space** | | -------- | -------------------------------------- | --------------- | ------------------------ | -| 层级一 | 目录一:/data1/data | 最近 1 天的数据 | 20% | -| 层级二 | 目录一:/data2/data目录二:/data3/data | 1 天以前的数据 | 10% | +| tier 1 | path 1:/data1/data | data for last 1 day | 20% | +| tier 2 | path 2:/data2/data path 2:/data3/data | data from 1 day ago | 10% | ## remote tiered storag 
configuration example @@ -89,8 +89,8 @@ remote_tsfile_cache_max_disk_usage_in_mb=53687091200 In this example, a total of three levels of storage are configured, specifically: -| **层级** | **数据目录** | **数据范围** | **磁盘最小剩余空间阈值** | +| **tier** | **data path** | **data range** | **threshold for minimum remaining disk space** | | -------- | -------------------------------------- | ---------------------------- | ------------------------ | -| 层级一 | 目录一:/data1/data | 最近 1 天的数据 | 20% | -| 层级二 | 目录一:/data2/data目录二:/data3/data | 过去1 天至过去 10 天内的数据 | 15% | -| 层级三 | 远端 AWS S3 存储 | 过去 10 天以前的数据 | 10% | +| tier一 | path 1:/data1/data | data for last 1 day | 20% | +| tier二 | path 1:/data2/data path 2:/data3/data | data from past 1 day to past 10 days | 15% | +| tier三 | Remote AWS S3 Storage | data from 1 day ago | 10% | From f7e39a1c82f79cb3ed38bef703cb45371744fc96 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Tue, 17 Oct 2023 18:56:33 +0800 Subject: [PATCH 20/27] 5 --- .../User-Manual/IoTDB-Data-Pipe_timecho.md | 24 ------------------- 1 file changed, 24 deletions(-) delete mode 100644 src/UserGuide/V1.2.x/User-Manual/IoTDB-Data-Pipe_timecho.md diff --git a/src/UserGuide/V1.2.x/User-Manual/IoTDB-Data-Pipe_timecho.md b/src/UserGuide/V1.2.x/User-Manual/IoTDB-Data-Pipe_timecho.md deleted file mode 100644 index f4f038a8..00000000 --- a/src/UserGuide/V1.2.x/User-Manual/IoTDB-Data-Pipe_timecho.md +++ /dev/null @@ -1,24 +0,0 @@ - - -# IoTDB Data Pipe - -TODO \ No newline at end of file From 01a3eacd20b84b2431ab8f10fd5fa4aa521d4aca Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Thu, 19 Oct 2023 16:05:52 +0800 Subject: [PATCH 21/27] fix words bug and delete excess doc --- src/.vuepress/sidebar/V1.2.x/en.ts | 1 - src/.vuepress/sidebar/V1.2.x/zh.ts | 1 - .../V1.2.x/User-Manual/Security-Management.md | 524 ----------------- .../Security-Management_timecho.md | 501 ----------------- .../User-Manual/Tiered-Storage_timecho.md | 16 +- .../V1.2.x/User-Manual/Security-Management.md | 527 ------------------ .../Security-Management_timecho.md | 504 ----------------- 7 files changed, 8 insertions(+), 2066 deletions(-) delete mode 100644 src/UserGuide/V1.2.x/User-Manual/Security-Management.md delete mode 100644 src/zh/UserGuide/V1.2.x/User-Manual/Security-Management.md diff --git a/src/.vuepress/sidebar/V1.2.x/en.ts b/src/.vuepress/sidebar/V1.2.x/en.ts index 12044e5d..3c085b38 100644 --- a/src/.vuepress/sidebar/V1.2.x/en.ts +++ b/src/.vuepress/sidebar/V1.2.x/en.ts @@ -88,7 +88,6 @@ export const enSidebar = { { text: 'Streaming', link: 'Streaming' }, { text: 'Data Sync', link: 'Data-Sync' }, { text: 'Database Programming', link: 'Database-Programming' }, - { text: 'Security Management', link: 'Security-Management' }, { text: 'Authority Management', link: 'Authority-Management' }, ], }, diff --git a/src/.vuepress/sidebar/V1.2.x/zh.ts b/src/.vuepress/sidebar/V1.2.x/zh.ts index 52d0af4e..8c6d1885 100644 --- a/src/.vuepress/sidebar/V1.2.x/zh.ts +++ b/src/.vuepress/sidebar/V1.2.x/zh.ts @@ -88,7 +88,6 @@ export const zhSidebar = { { text: '流处理', link: 'Streaming' }, { text: '数据同步', link: 'Data-Sync' }, { text: '数据库编程', link: 'Database-Programming' }, - { text: '安全控制', link: 'Security-Management' }, { text: '权限管理', link: 'Authority-Management' }, ], }, diff --git a/src/UserGuide/V1.2.x/User-Manual/Security-Management.md b/src/UserGuide/V1.2.x/User-Manual/Security-Management.md deleted file mode 100644 index 206086cd..00000000 --- a/src/UserGuide/V1.2.x/User-Manual/Security-Management.md +++ /dev/null @@ -1,524 +0,0 @@ - - -# Security 
Management - -## Administration Management - -IoTDB provides users with account privilege management operations, so as to ensure data security. - -We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../SQL-Manual/SQL-Manual.md). -At the same time, in the JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute privilege management statements in a single or batch mode. - -### Basic Concepts - -#### User - -The user is the legal user of the database. A user corresponds to a unique username and has a password as a means of authentication. Before using a database, a person must first provide a legitimate username and password to make himself/herself a user. - -#### Privilege - -The database provides a variety of operations, and not all users can perform all operations. If a user can perform an operation, the user is said to have the privilege to perform the operation. privileges are divided into data management privilege (such as adding, deleting and modifying data) and authority management privilege (such as creation and deletion of users and roles, granting and revoking of privileges, etc.). Data management privilege often needs a path to limit its effective range. It is flexible that using [path pattern](../Basic-Concept/Data-Model-and-Terminology.md) to manage privileges. - -#### Role - -A role is a set of privileges and has a unique role name as an identifier. A user usually corresponds to a real identity (such as a traffic dispatcher), while a real identity may correspond to multiple users. These users with the same real identity tend to have the same privileges. Roles are abstractions that can unify the management of such privileges. - -#### Default User - -There is a default user in IoTDB after the initial installation: root, and the default password is root. This user is an administrator user, who cannot be deleted and has all the privileges. Neither can new privileges be granted to the root user nor can privileges owned by the root user be deleted. - -### Privilege Management Operation Examples - -According to the [sample data](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt), the sample data of IoTDB might belong to different power generation groups such as ln, sgcc, etc. Different power generation groups do not want others to obtain their own database data, so we need to have data privilege isolated at the group layer. - -#### Create User - -We use `CREATE USER ` to create users. For example, we can use root user who has all privileges to create two users for ln and sgcc groups, named ln\_write\_user and sgcc\_write\_user, with both passwords being write\_pwd. It is recommended to wrap the username in backtick(`). The SQL statement is: - -``` -CREATE USER `ln_write_user` 'write_pwd' -CREATE USER `sgcc_write_user` 'write_pwd' -``` -Then use the following SQL statement to show the user: - -``` -LIST USER -``` -As can be seen from the result shown below, the two users have been created: - -``` -IoTDB> CREATE USER `ln_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' -Msg: The statement is executed successfully. 
-IoTDB> LIST USER -+---------------+ -| user| -+---------------+ -| ln_write_user| -| root| -|sgcc_write_user| -+---------------+ -Total line number = 3 -It costs 0.157s -``` - -#### Grant User Privilege - -At this point, although two users have been created, they do not have any privileges, so they can not operate on the database. For example, we use ln_write_user to write data in the database, the SQL statement is: - -``` -INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -``` -The SQL statement will not be executed and the corresponding error prompt is given as follows: - -``` -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. -``` - -Now, we use root user to grant the two users write privileges to the corresponding databases. - -We use `GRANT USER PRIVILEGES ON ` to grant user privileges(ps: grant create user does not need path). For example: - -``` -GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -GRANT USER `ln_write_user` PRIVILEGES CREATE_USER -``` -The execution result is as follows: - -``` -IoTDB> GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -Msg: The statement is executed successfully. -IoTDB> GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -Msg: The statement is executed successfully. -IoTDB> GRANT USER `ln_write_user` PRIVILEGES CREATE_USER -Msg: The statement is executed successfully. -``` - -Next, use ln_write_user to try to write data again. -``` -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: The statement is executed successfully. -``` - -#### Revoker User Privilege - -After granting user privileges, we could use `REVOKE USER PRIVILEGES ON ` to revoke the granted user privileges(ps: revoke create user does not need path). For example, use root user to revoke the privilege of ln_write_user and sgcc_write_user: - -``` -REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER -``` - -The execution result is as follows: - -``` -REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -Msg: The statement is executed successfully. -REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -Msg: The statement is executed successfully. -REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER -Msg: The statement is executed successfully. -``` - -After revoking, ln_write_user has no permission to writing data to root.ln.** -``` -INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. 
-``` - -#### SQL Statements - -Here are all related SQL statements: - -* Create User - -``` -CREATE USER ; -Eg: IoTDB > CREATE USER `thulab` 'pwd'; -``` - -* Delete User - -``` -DROP USER ; -Eg: IoTDB > DROP USER `xiaoming`; -``` - -* Create Role - -``` -CREATE ROLE ; -Eg: IoTDB > CREATE ROLE `admin`; -``` - -* Delete Role - -``` -DROP ROLE ; -Eg: IoTDB > DROP ROLE `admin`; -``` - -* Grant User Privileges - -``` -GRANT USER PRIVILEGES ON ; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES on root.ln.**, root.sgcc.**; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES CREATE_ROLE; -``` - -- Grant User All Privileges - -``` -GRANT USER PRIVILEGES ALL; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES ALL; -``` - -* Grant Role Privileges - -``` -GRANT ROLE PRIVILEGES ON ; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES ON root.sgcc.**, root.ln.**; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES CREATE_ROLE; -``` - -- Grant Role All Privileges - -``` -GRANT ROLE PRIVILEGES ALL ON ; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES ALL; -``` - -* Grant User Role - -``` -GRANT TO ; -Eg: IoTDB > GRANT `temprole` TO tempuser; -``` - -* Revoke User Privileges - -``` -REVOKE USER PRIVILEGES ON ; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES DELETE_TIMESERIES on root.ln.**; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES CREATE_ROLE; -``` - -* Revoke User All Privileges - -``` -REVOKE USER PRIVILEGES ALL; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES ALL; -``` - -* Revoke Role Privileges - -``` -REVOKE ROLE PRIVILEGES ON ; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES DELETE_TIMESERIES ON root.ln.**; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES CREATE_ROLE; -``` - -* Revoke All Role Privileges - -``` -REVOKE ROLE PRIVILEGES ALL; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES ALL; -``` - -* Revoke Role From User - -``` -REVOKE FROM ; -Eg: IoTDB > REVOKE `temprole` FROM tempuser; -``` - -* List Users - -``` -LIST USER -Eg: IoTDB > LIST USER -``` - -* List User of Specific Role - -``` -LIST USER OF ROLE ; -Eg: IoTDB > LIST USER OF ROLE `roleuser`; -``` - -* List Roles - -``` -LIST ROLE -Eg: IoTDB > LIST ROLE -``` - -* List Roles of Specific User - -``` -LIST ROLE OF USER ; -Eg: IoTDB > LIST ROLE OF USER `tempuser`; -``` - -* List All Privileges of Users - -``` -LIST PRIVILEGES USER ; -Eg: IoTDB > LIST PRIVILEGES USER `tempuser`; -``` - -* List Related Privileges of Users(On Specific Paths) - -``` -LIST PRIVILEGES USER ON ; -Eg: IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.**, root.ln.wf01.**; -+--------+-----------------------------------+ -| role| privilege| -+--------+-----------------------------------+ -| | root.ln.** : ALTER_TIMESERIES| -|temprole|root.ln.wf01.** : CREATE_TIMESERIES| -+--------+-----------------------------------+ -Total line number = 2 -It costs 0.005s -IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.wf01.wt01.**; -+--------+-----------------------------------+ -| role| privilege| -+--------+-----------------------------------+ -| | root.ln.** : ALTER_TIMESERIES| -|temprole|root.ln.wf01.** : CREATE_TIMESERIES| -+--------+-----------------------------------+ -Total line number = 2 -It costs 0.005s -``` - -* List All Privileges of Roles - -``` -LIST PRIVILEGES ROLE -Eg: IoTDB > LIST PRIVILEGES ROLE `actor`; -``` - -* List Related Privileges of Roles(On Specific Paths) - -``` -LIST PRIVILEGES ROLE ON ; -Eg: IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.**, root.ln.wf01.wt01.**; -+-----------------------------------+ 
-| privilege| -+-----------------------------------+ -|root.ln.wf01.** : CREATE_TIMESERIES| -+-----------------------------------+ -Total line number = 1 -It costs 0.005s -IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.wf01.wt01.**; -+-----------------------------------+ -| privilege| -+-----------------------------------+ -|root.ln.wf01.** : CREATE_TIMESERIES| -+-----------------------------------+ -Total line number = 1 -It costs 0.005s -``` - -* Alter Password - -``` -ALTER USER SET PASSWORD ; -Eg: IoTDB > ALTER USER `tempuser` SET PASSWORD 'newpwd'; -``` - - -### Other Instructions - -#### The Relationship among Users, Privileges and Roles - -A Role is a set of privileges, and privileges and roles are both attributes of users. That is, a role can have several privileges and a user can have several roles and privileges (called the user's own privileges). - -At present, there is no conflicting privilege in IoTDB, so the real privileges of a user is the union of the user's own privileges and the privileges of the user's roles. That is to say, to determine whether a user can perform an operation, it depends on whether one of the user's own privileges or the privileges of the user's roles permits the operation. The user's own privileges and privileges of the user's roles may overlap, but it does not matter. - -It should be noted that if users have a privilege (corresponding to operation A) themselves and their roles contain the same privilege, then revoking the privilege from the users themselves alone can not prohibit the users from performing operation A, since it is necessary to revoke the privilege from the role, or revoke the role from the user. Similarly, revoking the privilege from the users's roles alone can not prohibit the users from performing operation A. - -At the same time, changes to roles are immediately reflected on all users who own the roles. For example, adding certain privileges to roles will immediately give all users who own the roles corresponding privileges, and deleting certain privileges will also deprive the corresponding users of the privileges (unless the users themselves have the privileges). - -#### List of Privileges Included in the System - -| privilege Name | Interpretation | Example | -|:--------------------------|:-------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| CREATE\_DATABASE | create database; set/unset database ttl; path dependent | Eg1: `CREATE DATABASE root.ln;`
Eg2:`set ttl to root.ln 3600000;`
Eg3:`unset ttl to root.ln;` | -| DELETE\_DATABASE | delete databases; path dependent | Eg: `delete database root.ln;` | -| CREATE\_TIMESERIES | create timeseries; path dependent | Eg1: create timeseries
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: create aligned timeseries
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | -| INSERT\_TIMESERIES | insert data; path dependent | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | -| ALTER\_TIMESERIES | alter timeseries; path dependent | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | -| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](./Query-Data.md#OVERVIEW)(The query statements under this section all use this permission)
Eg8: CSV format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | -| DELETE\_TIMESERIES | delete data or timeseries; path dependent | Eg1: delete timeseries
`delete timeseries root.ln.wf01.wt01.status`
Eg2: delete data
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: use drop semantic
`drop timeseries root.ln.wf01.wt01.status` | -| CREATE\_USER | create users; path independent | Eg: `create user thulab 'passwd';` | -| DELETE\_USER | delete users; path independent | Eg: `drop user xiaoming;` | -| MODIFY\_PASSWORD | modify passwords for all users; path independent; (Those who do not have this privilege can still change their own passwords. ) | Eg: `alter user tempuser SET PASSWORD 'newpwd';` | -| LIST\_USER | list all users; list all user of specific role; list a user's related privileges on specific paths; path independent | Eg1: `list user;`
Eg2: `list user of role 'wirte_role';`
Eg3: `list privileges user admin;`
Eg4: `list privileges user 'admin' on root.sgcc.**;` | -| GRANT\_USER\_PRIVILEGE | grant user privileges; path independent | Eg: `grant user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | -| REVOKE\_USER\_PRIVILEGE | revoke user privileges; path independent | Eg: `revoke user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | -| GRANT\_USER\_ROLE | grant user roles; path independent | Eg: `grant temprole to tempuser;` | -| REVOKE\_USER\_ROLE | revoke user roles; path independent | Eg: `revoke temprole from tempuser;` | -| CREATE\_ROLE | create roles; path independent | Eg: `create role admin;` | -| DELETE\_ROLE | delete roles; path independent | Eg: `drop role admin;` | -| LIST\_ROLE | list all roles; list all roles of specific user; list a role's related privileges on speciific paths; path independent | Eg1: `list role`
Eg2: `list role of user 'actor';`
Eg3: `list privileges role wirte_role;`
Eg4: `list privileges role wirte_role ON root.sgcc;` | -| GRANT\_ROLE\_PRIVILEGE | grant role privileges; path independent | Eg: `grant role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | -| REVOKE\_ROLE\_PRIVILEGE | revoke role privileges; path independent | Eg: `revoke role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | -| CREATE_FUNCTION | register UDFs; path independent | Eg: `create function example AS 'org.apache.iotdb.udf.UDTFExample';` | -| DROP_FUNCTION | deregister UDFs; path independent | Eg: `drop function example` | -| CREATE_TRIGGER | create triggers; path dependent | Eg1: `CREATE TRIGGER BEFORE INSERT ON AS `
Eg2: `CREATE TRIGGER AFTER INSERT ON AS ` | -| DROP_TRIGGER | drop triggers; path dependent | Eg: `drop trigger 'alert-listener-sg1d1s1'` | -| CREATE_CONTINUOUS_QUERY | create continuous queries; path independent | Eg: `CREATE CONTINUOUS QUERY cq1 RESAMPLE RANGE 40s BEGIN END` | -| DROP_CONTINUOUS_QUERY | drop continuous queries; path independent | Eg1: `DROP CONTINUOUS QUERY cq3`
Eg2: `DROP CQ cq3` | -| SHOW_CONTINUOUS_QUERIES | show continuous queries; path independent | Eg1: `SHOW CONTINUOUS QUERIES`
Eg2: `SHOW cqs` | -| UPDATE_TEMPLATE | create and drop schema template; path independent | Eg1: `create schema template t1(s1 int32)`
Eg2: `drop schema template t1` | -| READ_TEMPLATE | show schema templates and show nodes in schema template; path independent | Eg1: `show schema templates`
Eg2: `show nodes in template t1` | -| APPLY_TEMPLATE | set, unset and activate schema template; path dependent | Eg1: `set schema template t1 to root.sg.d`
Eg2: `unset schema template t1 from root.sg.d`
Eg3: `create timeseries of schema template on root.sg.d`
Eg4: `delete timeseries of schema template on root.sg.d` | -| READ_TEMPLATE_APPLICATION | show paths set and using schema template; path independent | Eg1: `show paths set schema template t1`
Eg2: `show paths using schema template t1` | - -Note that path dependent privileges can only be granted or revoked on root.**; - -Note that the following SQL statements need to be granted multiple permissions before they can be used: - -- Import data: Need to assign `READ_TIMESERIES`,`INSERT_TIMESERIES` two permissions.。 - -``` -Eg: IoTDB > ./import-csv.bat -h 127.0.0.1 -p 6667 -u renyuhua -pw root -f dump0.csv -``` - -- Query Write-back (SELECT INTO) -- - `READ_TIMESERIES` permission of source sequence in all `select` clauses is required -- `INSERT_TIMESERIES` permission of target sequence in all `into` clauses is required - -``` -Eg: IoTDB > select s1, s1 into t1, t2 from root.sg.d1 limit 5 offset 1000 -``` - -#### Username Restrictions - -IoTDB specifies that the character length of a username should not be less than 4, and the username cannot contain spaces. - -#### Password Restrictions - -IoTDB specifies that the character length of a password should have no less than 4 character length, and no spaces. The password is encrypted with MD5. - -#### Role Name Restrictions - -IoTDB specifies that the character length of a role name should have no less than 4 character length, and no spaces. - -#### Path pattern in Administration Management - -A path pattern's result set contains all the elements of its sub pattern's -result set. For example, `root.sg.d.*` is a sub pattern of -`root.sg.*.*`, while `root.sg.**` is not a sub pattern of -`root.sg.*.*`. When a user is granted privilege on a pattern, the pattern used in his DDL or DML must be a sub pattern of the privilege pattern, which guarantees that the user won't access the timeseries exceed his privilege scope. - -#### Permission cache - -In distributed related permission operations, when changing permissions other than creating users and roles, all the cache information of `dataNode` related to the user (role) will be cleared first. If any `dataNode` cache information is clear and fails, the permission change task will fail. - -#### Operations restricted by non root users - -At present, the following SQL statements supported by iotdb can only be operated by the `root` user, and no corresponding permission can be given to the new user. 
- -##### TsFile Management - -- Load TsFiles - -``` -Eg: IoTDB > load '/Users/Desktop/data/1575028885956-101-0.tsfile' -``` - -- remove a tsfile - -``` -Eg: IoTDB > remove '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' -``` - -- unload a tsfile and move it to a target directory - -``` -Eg: IoTDB > unload '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' '/data/data/tmp' -``` - -##### Delete Time Partition (experimental) - -``` -Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 -``` - -##### Continuous Query,CQ - -``` -Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END -``` - -##### Maintenance Command - -- FLUSH - -``` -Eg: IoTDB > flush -``` - -- MERGE - -``` -Eg: IoTDB > MERGE -Eg: IoTDB > FULL MERGE -``` - -- CLEAR CACHE - -```sql -Eg: IoTDB > CLEAR CACHE -``` - -- SET STSTEM TO READONLY / WRITABLE - -``` -Eg: IoTDB > SET STSTEM TO READONLY / WRITABLE -``` - -- Query abort - -``` -Eg: IoTDB > KILL QUERY 1 -``` - -##### Watermark Tool - -- Watermark new users - -``` -Eg: IoTDB > grant watermark_embedding to Alice -``` - -- Watermark Detection - -``` -Eg: IoTDB > revoke watermark_embedding from Alice -``` diff --git a/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md index 222820bf..baf57752 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md +++ b/src/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md @@ -29,504 +29,3 @@ coming soon coming soon -## Administration Management - -IoTDB provides users with account privilege management operations, so as to ensure data security. - -We will show you basic user privilege management operations through the following specific examples. Detailed SQL syntax and usage details can be found in [SQL Documentation](../User-Manual/Security-Management_timecho.md). -At the same time, in the JAVA programming environment, you can use the [Java JDBC](../API/Programming-JDBC.md) to execute privilege management statements in a single or batch mode. - -### Basic Concepts - -#### User - -The user is the legal user of the database. A user corresponds to a unique username and has a password as a means of authentication. Before using a database, a person must first provide a legitimate username and password to make himself/herself a user. - -#### Privilege - -The database provides a variety of operations, and not all users can perform all operations. If a user can perform an operation, the user is said to have the privilege to perform the operation. privileges are divided into data management privilege (such as adding, deleting and modifying data) and authority management privilege (such as creation and deletion of users and roles, granting and revoking of privileges, etc.). Data management privilege often needs a path to limit its effective range. It is flexible that using [path pattern](../Basic-Concept/Data-Model-and-Terminology.md) to manage privileges. - -#### Role - -A role is a set of privileges and has a unique role name as an identifier. A user usually corresponds to a real identity (such as a traffic dispatcher), while a real identity may correspond to multiple users. These users with the same real identity tend to have the same privileges. Roles are abstractions that can unify the management of such privileges. 
- -#### Default User - -There is a default user in IoTDB after the initial installation: root, and the default password is root. This user is an administrator user, who cannot be deleted and has all the privileges. Neither can new privileges be granted to the root user nor can privileges owned by the root user be deleted. - -### Privilege Management Operation Examples - -According to the [sample data](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt), the sample data of IoTDB might belong to different power generation groups such as ln, sgcc, etc. Different power generation groups do not want others to obtain their own database data, so we need to have data privilege isolated at the group layer. - -#### Create User - -We use `CREATE USER ` to create users. For example, we can use root user who has all privileges to create two users for ln and sgcc groups, named ln\_write\_user and sgcc\_write\_user, with both passwords being write\_pwd. It is recommended to wrap the username in backtick(`). The SQL statement is: - -``` -CREATE USER `ln_write_user` 'write_pwd' -CREATE USER `sgcc_write_user` 'write_pwd' -``` -Then use the following SQL statement to show the user: - -``` -LIST USER -``` -As can be seen from the result shown below, the two users have been created: - -``` -IoTDB> CREATE USER `ln_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> LIST USER -+---------------+ -| user| -+---------------+ -| ln_write_user| -| root| -|sgcc_write_user| -+---------------+ -Total line number = 3 -It costs 0.157s -``` - -#### Grant User Privilege - -At this point, although two users have been created, they do not have any privileges, so they can not operate on the database. For example, we use ln_write_user to write data in the database, the SQL statement is: - -``` -INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -``` -The SQL statement will not be executed and the corresponding error prompt is given as follows: - -``` -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. -``` - -Now, we use root user to grant the two users write privileges to the corresponding databases. - -We use `GRANT USER PRIVILEGES ON ` to grant user privileges(ps: grant create user does not need path). For example: - -``` -GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -GRANT USER `ln_write_user` PRIVILEGES CREATE_USER -``` -The execution result is as follows: - -``` -IoTDB> GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -Msg: The statement is executed successfully. -IoTDB> GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -Msg: The statement is executed successfully. -IoTDB> GRANT USER `ln_write_user` PRIVILEGES CREATE_USER -Msg: The statement is executed successfully. -``` - -Next, use ln_write_user to try to write data again. -``` -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: The statement is executed successfully. -``` - -#### Revoker User Privilege - -After granting user privileges, we could use `REVOKE USER PRIVILEGES ON ` to revoke the granted user privileges(ps: revoke create user does not need path). 
For example, use root user to revoke the privilege of ln_write_user and sgcc_write_user: - -``` -REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER -``` - -The execution result is as follows: - -``` -REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -Msg: The statement is executed successfully. -REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -Msg: The statement is executed successfully. -REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER -Msg: The statement is executed successfully. -``` - -After revoking, ln_write_user has no permission to writing data to root.ln.** -``` -INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. -``` - -#### SQL Statements - -Here are all related SQL statements: - -* Create User - -``` -CREATE USER ; -Eg: IoTDB > CREATE USER `thulab` 'pwd'; -``` - -* Delete User - -``` -DROP USER ; -Eg: IoTDB > DROP USER `xiaoming`; -``` - -* Create Role - -``` -CREATE ROLE ; -Eg: IoTDB > CREATE ROLE `admin`; -``` - -* Delete Role - -``` -DROP ROLE ; -Eg: IoTDB > DROP ROLE `admin`; -``` - -* Grant User Privileges - -``` -GRANT USER PRIVILEGES ON ; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES on root.ln.**, root.sgcc.**; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES CREATE_ROLE; -``` - -- Grant User All Privileges - -``` -GRANT USER PRIVILEGES ALL; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES ALL; -``` - -* Grant Role Privileges - -``` -GRANT ROLE PRIVILEGES ON ; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES ON root.sgcc.**, root.ln.**; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES CREATE_ROLE; -``` - -- Grant Role All Privileges - -``` -GRANT ROLE PRIVILEGES ALL ON ; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES ALL; -``` - -* Grant User Role - -``` -GRANT TO ; -Eg: IoTDB > GRANT `temprole` TO tempuser; -``` - -* Revoke User Privileges - -``` -REVOKE USER PRIVILEGES ON ; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES DELETE_TIMESERIES on root.ln.**; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES CREATE_ROLE; -``` - -* Revoke User All Privileges - -``` -REVOKE USER PRIVILEGES ALL; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES ALL; -``` - -* Revoke Role Privileges - -``` -REVOKE ROLE PRIVILEGES ON ; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES DELETE_TIMESERIES ON root.ln.**; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES CREATE_ROLE; -``` - -* Revoke All Role Privileges - -``` -REVOKE ROLE PRIVILEGES ALL; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES ALL; -``` - -* Revoke Role From User - -``` -REVOKE FROM ; -Eg: IoTDB > REVOKE `temprole` FROM tempuser; -``` - -* List Users - -``` -LIST USER -Eg: IoTDB > LIST USER -``` - -* List User of Specific Role - -``` -LIST USER OF ROLE ; -Eg: IoTDB > LIST USER OF ROLE `roleuser`; -``` - -* List Roles - -``` -LIST ROLE -Eg: IoTDB > LIST ROLE -``` - -* List Roles of Specific User - -``` -LIST ROLE OF USER ; -Eg: IoTDB > LIST ROLE OF USER `tempuser`; -``` - -* List All Privileges of Users - -``` -LIST PRIVILEGES USER ; -Eg: IoTDB > LIST PRIVILEGES USER `tempuser`; -``` - -* List Related Privileges of Users(On Specific Paths) - -``` -LIST PRIVILEGES USER ON ; -Eg: IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.**, 
root.ln.wf01.**; -+--------+-----------------------------------+ -| role| privilege| -+--------+-----------------------------------+ -| | root.ln.** : ALTER_TIMESERIES| -|temprole|root.ln.wf01.** : CREATE_TIMESERIES| -+--------+-----------------------------------+ -Total line number = 2 -It costs 0.005s -IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.wf01.wt01.**; -+--------+-----------------------------------+ -| role| privilege| -+--------+-----------------------------------+ -| | root.ln.** : ALTER_TIMESERIES| -|temprole|root.ln.wf01.** : CREATE_TIMESERIES| -+--------+-----------------------------------+ -Total line number = 2 -It costs 0.005s -``` - -* List All Privileges of Roles - -``` -LIST PRIVILEGES ROLE -Eg: IoTDB > LIST PRIVILEGES ROLE `actor`; -``` - -* List Related Privileges of Roles(On Specific Paths) - -``` -LIST PRIVILEGES ROLE ON ; -Eg: IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.**, root.ln.wf01.wt01.**; -+-----------------------------------+ -| privilege| -+-----------------------------------+ -|root.ln.wf01.** : CREATE_TIMESERIES| -+-----------------------------------+ -Total line number = 1 -It costs 0.005s -IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.wf01.wt01.**; -+-----------------------------------+ -| privilege| -+-----------------------------------+ -|root.ln.wf01.** : CREATE_TIMESERIES| -+-----------------------------------+ -Total line number = 1 -It costs 0.005s -``` - -* Alter Password - -``` -ALTER USER SET PASSWORD ; -Eg: IoTDB > ALTER USER `tempuser` SET PASSWORD 'newpwd'; -``` - - -### Other Instructions - -#### The Relationship among Users, Privileges and Roles - -A Role is a set of privileges, and privileges and roles are both attributes of users. That is, a role can have several privileges and a user can have several roles and privileges (called the user's own privileges). - -At present, there is no conflicting privilege in IoTDB, so the real privileges of a user is the union of the user's own privileges and the privileges of the user's roles. That is to say, to determine whether a user can perform an operation, it depends on whether one of the user's own privileges or the privileges of the user's roles permits the operation. The user's own privileges and privileges of the user's roles may overlap, but it does not matter. - -It should be noted that if users have a privilege (corresponding to operation A) themselves and their roles contain the same privilege, then revoking the privilege from the users themselves alone can not prohibit the users from performing operation A, since it is necessary to revoke the privilege from the role, or revoke the role from the user. Similarly, revoking the privilege from the users's roles alone can not prohibit the users from performing operation A. - -At the same time, changes to roles are immediately reflected on all users who own the roles. For example, adding certain privileges to roles will immediately give all users who own the roles corresponding privileges, and deleting certain privileges will also deprive the corresponding users of the privileges (unless the users themselves have the privileges). 
- -#### List of Privileges Included in the System - -| privilege Name | Interpretation | Example | -|:--------------------------|:-------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| CREATE\_DATABASE | create database; set/unset database ttl; path dependent | Eg1: `CREATE DATABASE root.ln;`
Eg2:`set ttl to root.ln 3600000;`
Eg3:`unset ttl to root.ln;` | -| DELETE\_DATABASE | delete databases; path dependent | Eg: `delete database root.ln;` | -| CREATE\_TIMESERIES | create timeseries; path dependent | Eg1: create timeseries
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: create aligned timeseries
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | -| INSERT\_TIMESERIES | insert data; path dependent | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | -| ALTER\_TIMESERIES | alter timeseries; path dependent | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | -| READ\_TIMESERIES | query data; path dependent | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [Query-Data](./Query-Data.md#OVERVIEW)(The query statements under this section all use this permission)
Eg8: CSV format data export
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: Performance Tracing Tool
`tracing select * from root.**`
Eg10: UDF-Query
`select example(*) from root.sg.d1`
Eg11: Triggers-Query
`show triggers`
Eg12: Count-Query
`count devices` | -| DELETE\_TIMESERIES | delete data or timeseries; path dependent | Eg1: delete timeseries
`delete timeseries root.ln.wf01.wt01.status`
Eg2: delete data
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: use drop semantic
`drop timeseries root.ln.wf01.wt01.status` | -| CREATE\_USER | create users; path independent | Eg: `create user thulab 'passwd';` | -| DELETE\_USER | delete users; path independent | Eg: `drop user xiaoming;` | -| MODIFY\_PASSWORD | modify passwords for all users; path independent; (Those who do not have this privilege can still change their own passwords. ) | Eg: `alter user tempuser SET PASSWORD 'newpwd';` | -| LIST\_USER | list all users; list all user of specific role; list a user's related privileges on specific paths; path independent | Eg1: `list user;`
Eg2: `list user of role 'wirte_role';`
Eg3: `list privileges user admin;`
Eg4: `list privileges user 'admin' on root.sgcc.**;` | -| GRANT\_USER\_PRIVILEGE | grant user privileges; path independent | Eg: `grant user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | -| REVOKE\_USER\_PRIVILEGE | revoke user privileges; path independent | Eg: `revoke user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | -| GRANT\_USER\_ROLE | grant user roles; path independent | Eg: `grant temprole to tempuser;` | -| REVOKE\_USER\_ROLE | revoke user roles; path independent | Eg: `revoke temprole from tempuser;` | -| CREATE\_ROLE | create roles; path independent | Eg: `create role admin;` | -| DELETE\_ROLE | delete roles; path independent | Eg: `drop role admin;` | -| LIST\_ROLE | list all roles; list all roles of specific user; list a role's related privileges on speciific paths; path independent | Eg1: `list role`
Eg2: `list role of user 'actor';`
Eg3: `list privileges role wirte_role;`
Eg4: `list privileges role wirte_role ON root.sgcc;` | -| GRANT\_ROLE\_PRIVILEGE | grant role privileges; path independent | Eg: `grant role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | -| REVOKE\_ROLE\_PRIVILEGE | revoke role privileges; path independent | Eg: `revoke role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | -| CREATE_FUNCTION | register UDFs; path independent | Eg: `create function example AS 'org.apache.iotdb.udf.UDTFExample';` | -| DROP_FUNCTION | deregister UDFs; path independent | Eg: `drop function example` | -| CREATE_TRIGGER | create triggers; path dependent | Eg1: `CREATE TRIGGER BEFORE INSERT ON AS `
Eg2: `CREATE TRIGGER AFTER INSERT ON AS ` | -| DROP_TRIGGER | drop triggers; path dependent | Eg: `drop trigger 'alert-listener-sg1d1s1'` | -| CREATE_CONTINUOUS_QUERY | create continuous queries; path independent | Eg: `CREATE CONTINUOUS QUERY cq1 RESAMPLE RANGE 40s BEGIN END` | -| DROP_CONTINUOUS_QUERY | drop continuous queries; path independent | Eg1: `DROP CONTINUOUS QUERY cq3`
Eg2: `DROP CQ cq3` | -| SHOW_CONTINUOUS_QUERIES | show continuous queries; path independent | Eg1: `SHOW CONTINUOUS QUERIES`
Eg2: `SHOW cqs` | -| UPDATE_TEMPLATE | create and drop schema template; path independent | Eg1: `create schema template t1(s1 int32)`
Eg2: `drop schema template t1` | -| READ_TEMPLATE | show schema templates and show nodes in schema template; path independent | Eg1: `show schema templates`
Eg2: `show nodes in template t1` | -| APPLY_TEMPLATE | set, unset and activate schema template; path dependent | Eg1: `set schema template t1 to root.sg.d`
Eg2: `unset schema template t1 from root.sg.d`
Eg3: `create timeseries of schema template on root.sg.d`
Eg4: `delete timeseries of schema template on root.sg.d` | -| READ_TEMPLATE_APPLICATION | show paths set and using schema template; path independent | Eg1: `show paths set schema template t1`
Eg2: `show paths using schema template t1` | - -Note that path dependent privileges can only be granted or revoked on root.**; - -Note that the following SQL statements need to be granted multiple permissions before they can be used: - -- Import data: Need to assign `READ_TIMESERIES`,`INSERT_TIMESERIES` two permissions.。 - -``` -Eg: IoTDB > ./import-csv.bat -h 127.0.0.1 -p 6667 -u renyuhua -pw root -f dump0.csv -``` - -- Query Write-back (SELECT INTO) -- - `READ_TIMESERIES` permission of source sequence in all `select` clauses is required -- `INSERT_TIMESERIES` permission of target sequence in all `into` clauses is required - -``` -Eg: IoTDB > select s1, s1 into t1, t2 from root.sg.d1 limit 5 offset 1000 -``` - -#### Username Restrictions - -IoTDB specifies that the character length of a username should not be less than 4, and the username cannot contain spaces. - -#### Password Restrictions - -IoTDB specifies that the character length of a password should have no less than 4 character length, and no spaces. The password is encrypted with MD5. - -#### Role Name Restrictions - -IoTDB specifies that the character length of a role name should have no less than 4 character length, and no spaces. - -#### Path pattern in Administration Management - -A path pattern's result set contains all the elements of its sub pattern's -result set. For example, `root.sg.d.*` is a sub pattern of -`root.sg.*.*`, while `root.sg.**` is not a sub pattern of -`root.sg.*.*`. When a user is granted privilege on a pattern, the pattern used in his DDL or DML must be a sub pattern of the privilege pattern, which guarantees that the user won't access the timeseries exceed his privilege scope. - -#### Permission cache - -In distributed related permission operations, when changing permissions other than creating users and roles, all the cache information of `dataNode` related to the user (role) will be cleared first. If any `dataNode` cache information is clear and fails, the permission change task will fail. - -#### Operations restricted by non root users - -At present, the following SQL statements supported by iotdb can only be operated by the `root` user, and no corresponding permission can be given to the new user. 
- -##### TsFile Management - -- Load TsFiles - -``` -Eg: IoTDB > load '/Users/Desktop/data/1575028885956-101-0.tsfile' -``` - -- remove a tsfile - -``` -Eg: IoTDB > remove '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' -``` - -- unload a tsfile and move it to a target directory - -``` -Eg: IoTDB > unload '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' '/data/data/tmp' -``` - -##### Delete Time Partition (experimental) - -``` -Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 -``` - -##### Continuous Query,CQ - -``` -Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END -``` - -##### Maintenance Command - -- FLUSH - -``` -Eg: IoTDB > flush -``` - -- MERGE - -``` -Eg: IoTDB > MERGE -Eg: IoTDB > FULL MERGE -``` - -- CLEAR CACHE - -```sql -Eg: IoTDB > CLEAR CACHE -``` - -- SET STSTEM TO READONLY / WRITABLE - -``` -Eg: IoTDB > SET STSTEM TO READONLY / WRITABLE -``` - -- Query abort - -``` -Eg: IoTDB > KILL QUERY 1 -``` - -##### Watermark Tool - -- Watermark new users - -``` -Eg: IoTDB > grant watermark_embedding to Alice -``` - -- Watermark Detection - -``` -Eg: IoTDB > revoke watermark_embedding from Alice -``` diff --git a/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md b/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md index 23ea05e4..3fe5792f 100644 --- a/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md +++ b/src/UserGuide/V1.2.x/User-Manual/Tiered-Storage_timecho.md @@ -22,7 +22,7 @@ # Tiered Storage ## Overview -Tiered storage function provides users with the ability to manage tiered storage media. users can use the tiered storage function to configure different types of storage media for IoTDB and to classify the storage media. ioTDB can support tiered storage from memory, SSD, normal hard disc to network hard disc by parameter configuration only according to the degree of hot and cold data. Specifically, in IoTDB, the configuration of tiered storage is reflected in the management of multiple directories. Users can group tiered storage directories into the same category and configure them into IoTDB as a "tier", which is called storage tier; at the same time, users can categorize data according to hot or cold, and store different categories of data into designated storage tiers. Meanwhile, users can categorise data according to hot or cold and store different categories of data in the specified tier. Currently, IoTDB supports the classification of hot and cold data by TTL, when the data in one tier does not meet the TTL rules defined in the current tier, the data will be automatically migrated to the next tier. +The Tiered storage functionality allows users to define multiple layers of storage, spanning across multiple types of storage media (Memory mapped directory, SSD, rotational hard discs or cloud storage). While memory and cloud storage is usually singular, the local file system storages can consist of multiple directories joined together into one tier. Meanwhile, users can classify data based on its hot or cold nature and store data of different categories in specified "tier". Currently, IoTDB supports the classification of hot and cold data through TTL (Time to live / age) of data. When the data in one tier does not meet the TTL rules defined in the current tier, the data will be automatically migrated to the next tier. 
## Parameter Definition @@ -36,14 +36,14 @@ The specific parameter definitions and their descriptions are as follows. | Configuration | Default | Description | Constraint | | ---------------------------------------- | ------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ | -| dn_data_dirs | None | specify different storage directories and divide the storage directories into tiers | Each level of storage uses a semicolon to separate, and commas to separate within a single level; cloud configuration can only be used as the last level of storage and the first level can't be used as cloud storage; a cloud object at most; the remote storage directory is denoted by OBJECT_STORAGE | -| default_ttl_in_ms | None | Define the scope of data for which each tier is responsible, expressed through a TTL | Each level of storage is separated by a semicolon; the number of levels should match the number of levels defined by dn_data_dirs | +| dn_data_dirs | None | specify different storage directories and divide the storage directories into tiers | Each level of storage uses a semicolon to separate, and commas to separate within a single level; cloud (OBJECT_STORAGE) configuration can only be used as the last level of storage and the first level can't be used as cloud storage; a cloud object at most; the remote storage directory is denoted by OBJECT_STORAGE | +| default_ttl_in_ms | None | Define the maximum age of data for which each tier is responsible | Each level of storage is separated by a semicolon; the number of levels should match the number of levels defined by dn_data_dirs | | dn_default_space_move_thresholds | 0.15 | Define the minimum remaining space ratio for each tier data catalogue; when the remaining space is less than this ratio, the data will be automatically migrated to the next tier; when the remaining storage space of the last tier falls below this threshold, the system will be set to READ_ONLY | Each level of storage is separated by a semicolon; the number of levels should match the number of levels defined by dn_data_dirs | | object_storage_type | AWS_S3 | Cloud Storage Type | IoTDB currently only supports AWS S3 as a remote storage type, and this parameter can't be modified | | object_storage_bucket | None | Name of cloud storage bucket | Bucket definition in AWS S3; no need to configure if remote storage is not used | | object_storage_endpoiont | | endpoint of cloud storage | endpoint of AWS S3;If remote storage is not used, no configuration required | -| object_storage_access_key | | Authentication information stored in the cloud: key | AWS S3 的 credential key;If remote storage is not used, no configuration required | -| object_storage_access_secret | | Authentication information stored in the cloud: secret | AWS S3 的 credential secret;If remote storage is not used, no configuration required | +| object_storage_access_key | | Authentication information stored in the cloud: key | AWS S3 credential key;If remote storage is not used, no configuration required | +| object_storage_access_secret | | Authentication information stored in the cloud: secret | AWS S3 credential secret;If remote storage is not used, no configuration required | | remote_tsfile_cache_dirs | data/datanode/data/cache | Cache directory stored locally in the cloud | If remote storage is not used, no configuration required | | remote_tsfile_cache_page_size_in_kb | 20480 |Block size of locally cached files stored in the 
cloud | If remote storage is not used, no configuration required | | remote_tsfile_cache_max_disk_usage_in_mb | 51200 | Maximum Disk Occupancy Size for Cloud Storage Local Cache | If remote storage is not used, no configuration required | @@ -91,6 +91,6 @@ In this example, a total of three levels of storage are configured, specifically | **tier** | **data path** | **data range** | **threshold for minimum remaining disk space** | | -------- | -------------------------------------- | ---------------------------- | ------------------------ | -| tier一 | path 1:/data1/data | data for last 1 day | 20% | -| tier二 | path 1:/data2/data path 2:/data3/data | data from past 1 day to past 10 days | 15% | -| tier三 | Remote AWS S3 Storage | data from 1 day ago | 10% | +| tier1 | path 1:/data1/data | data for last 1 day | 20% | +| tier2 | path 1:/data2/data path 2:/data3/data | data from past 1 day to past 10 days | 15% | +| tier3 | Remote AWS S3 Storage | data from 1 day ago | 10% | diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Security-Management.md b/src/zh/UserGuide/V1.2.x/User-Manual/Security-Management.md deleted file mode 100644 index bb137734..00000000 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Security-Management.md +++ /dev/null @@ -1,527 +0,0 @@ - - -# 安全控制 - -## 权限管理 - -IoTDB 为用户提供了权限管理操作,从而为用户提供对于数据的权限管理功能,保障数据的安全。 - -我们将通过以下几个具体的例子为您示范基本的用户权限操作,详细的 SQL 语句及使用方式详情请参见本文 [数据模式与概念章节](../Basic-Concept/Data-Model-and-Terminology.md)。同时,在 JAVA 编程环境中,您可以使用 [JDBC API](../API/Programming-JDBC.md) 单条或批量执行权限管理类语句。 - -### 基本概念 - -#### 用户 - -用户即数据库的合法使用者。一个用户与一个唯一的用户名相对应,并且拥有密码作为身份验证的手段。一个人在使用数据库之前,必须先提供合法的(即存于数据库中的)用户名与密码,使得自己成为用户。 - -#### 权限 - -数据库提供多种操作,并不是所有的用户都能执行所有操作。如果一个用户可以执行某项操作,则称该用户有执行该操作的权限。权限可分为数据管理权限(如对数据进行增删改查)以及权限管理权限(用户、角色的创建与删除,权限的赋予与撤销等)。数据管理权限往往需要一个路径来限定其生效范围,可使用[路径模式](../Basic-Concept/Data-Model-and-Terminology.md)灵活管理权限。 - -#### 角色 - -角色是若干权限的集合,并且有一个唯一的角色名作为标识符。用户通常和一个现实身份相对应(例如交通调度员),而一个现实身份可能对应着多个用户。这些具有相同现实身份的用户往往具有相同的一些权限。角色就是为了能对这样的权限进行统一的管理的抽象。 - -#### 默认用户及其具有的角色 - -初始安装后的 IoTDB 中有一个默认用户:root,默认密码为 root。该用户为管理员用户,固定拥有所有权限,无法被赋予、撤销权限,也无法被删除。 - -### 权限操作示例 - -根据本文中描述的 [样例数据](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt) 内容,IoTDB 的样例数据可能同时属于 ln, sgcc 等不同发电集团,不同的发电集团不希望其他发电集团获取自己的数据库数据,因此我们需要将不同的数据在集团层进行权限隔离。 - -#### 创建用户 - -使用 `CREATE USER ` 创建用户。例如,我们可以使用具有所有权限的root用户为 ln 和 sgcc 集团创建两个用户角色,名为 ln_write_user, sgcc_write_user,密码均为 write_pwd。建议使用反引号(`)包裹用户名。SQL 语句为: - -``` -CREATE USER `ln_write_user` 'write_pwd' -CREATE USER `sgcc_write_user` 'write_pwd' -``` - -此时使用展示用户的 SQL 语句: - -``` -LIST USER -``` - -我们可以看到这两个已经被创建的用户,结果如下: - -``` -IoTDB> CREATE USER `ln_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> LIST USER -+---------------+ -| user| -+---------------+ -| ln_write_user| -| root| -|sgcc_write_user| -+---------------+ -Total line number = 3 -It costs 0.157s -``` - -#### 赋予用户权限 - -此时,虽然两个用户已经创建,但是他们不具有任何权限,因此他们并不能对数据库进行操作,例如我们使用 ln_write_user 用户对数据库中的数据进行写入,SQL 语句为: - -``` -INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -``` - -此时,系统不允许用户进行此操作,会提示错误: - -``` -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. -``` - -现在,我们用root用户分别赋予他们向对应 database 数据的写入权限. 
- -我们使用 `GRANT USER PRIVILEGES ON ` 语句赋予用户权限(注:其中,创建用户权限无需指定路径),例如: - -``` -GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -GRANT USER `ln_write_user` PRIVILEGES CREATE_USER -``` - -执行状态如下所示: - -``` -IoTDB> GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -Msg: The statement is executed successfully. -IoTDB> GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -Msg: The statement is executed successfully. -IoTDB> GRANT USER `ln_write_user` PRIVILEGES CREATE_USER -Msg: The statement is executed successfully. -``` - -接着使用ln_write_user再尝试写入数据 - -``` -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: The statement is executed successfully. -``` - -#### 撤销用户权限 - -授予用户权限后,我们可以使用 `REVOKE USER PRIVILEGES ON ` 来撤销已授予的用户权限(注:其中,撤销创建用户权限无需指定路径)。例如,用root用户撤销ln_write_user和sgcc_write_user的权限: - -``` -REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER -``` - -执行状态如下所示: - -``` -REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -Msg: The statement is executed successfully. -REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -Msg: The statement is executed successfully. -REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER -Msg: The statement is executed successfully. -``` - -撤销权限后,ln_write_user就没有向root.ln.**写入数据的权限了。 - -``` -INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. 
-``` - -#### SQL 语句 - -与权限相关的语句包括: - -* 创建用户 - -``` -CREATE USER ; -Eg: IoTDB > CREATE USER `thulab` 'passwd'; -``` - -* 删除用户 - -``` -DROP USER ; -Eg: IoTDB > DROP USER `xiaoming`; -``` - -* 创建角色 - -``` -CREATE ROLE ; -Eg: IoTDB > CREATE ROLE `admin`; -``` - -* 删除角色 - -``` -DROP ROLE ; -Eg: IoTDB > DROP ROLE `admin`; -``` - -* 赋予用户权限 - -``` -GRANT USER PRIVILEGES ON ; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES on root.ln.**, root.sgcc.**; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES CREATE_ROLE; -``` - -- 赋予用户全部的权限 - -``` -GRANT USER PRIVILEGES ALL; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES ALL; -``` - -* 赋予角色权限 - -``` -GRANT ROLE PRIVILEGES ON ; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES ON root.sgcc.**, root.ln.**; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES CREATE_ROLE; -``` - -- 赋予角色全部的权限 - -``` -GRANT ROLE PRIVILEGES ALL; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES ALL; -``` - -* 赋予用户角色 - -``` -GRANT TO ; -Eg: IoTDB > GRANT `temprole` TO tempuser; -``` - -* 撤销用户权限 - -``` -REVOKE USER PRIVILEGES ON ; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES DELETE_TIMESERIES on root.ln.**; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES CREATE_ROLE; -``` - -- 移除用户所有权限 - -``` -REVOKE USER PRIVILEGES ALL; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES ALL; -``` - -* 撤销角色权限 - -``` -REVOKE ROLE PRIVILEGES ON ; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES DELETE_TIMESERIES ON root.ln.**; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES CREATE_ROLE; -``` - -- 撤销角色全部的权限 - -``` -REVOKE ROLE PRIVILEGES ALL; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES ALL; -``` - -* 撤销用户角色 - -``` -REVOKE FROM ; -Eg: IoTDB > REVOKE `temprole` FROM tempuser; -``` - -* 列出所有用户 - -``` -LIST USER -Eg: IoTDB > LIST USER -``` - -* 列出指定角色下所有用户 - -``` -LIST USER OF ROLE ; -Eg: IoTDB > LIST USER OF ROLE `roleuser`; -``` - -* 列出所有角色 - -``` -LIST ROLE -Eg: IoTDB > LIST ROLE -``` - -* 列出指定用户下所有角色 - -``` -LIST ROLE OF USER ; -Eg: IoTDB > LIST ROLE OF USER `tempuser`; -``` - -* 列出用户所有权限 - -``` -LIST PRIVILEGES USER ; -Eg: IoTDB > LIST PRIVILEGES USER `tempuser`; -``` - -* 列出用户在具体路径上相关联的权限 - -``` -LIST PRIVILEGES USER ON ; -Eg: IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.**, root.ln.wf01.**; -+--------+-----------------------------------+ -| role| privilege| -+--------+-----------------------------------+ -| | root.ln.** : ALTER_TIMESERIES| -|temprole|root.ln.wf01.** : CREATE_TIMESERIES| -+--------+-----------------------------------+ -Total line number = 2 -It costs 0.005s -IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.wf01.wt01.**; -+--------+-----------------------------------+ -| role| privilege| -+--------+-----------------------------------+ -| | root.ln.** : ALTER_TIMESERIES| -|temprole|root.ln.wf01.** : CREATE_TIMESERIES| -+--------+-----------------------------------+ -Total line number = 2 -It costs 0.005s -``` - -* 列出角色所有权限 - -``` -LIST PRIVILEGES ROLE ; -Eg: IoTDB > LIST PRIVILEGES ROLE `actor`; -``` - -* 列出角色在具体路径上相关联的权限 - -``` -LIST PRIVILEGES ROLE ON ; -Eg: IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.**, root.ln.wf01.wt01.**; -+-----------------------------------+ -| privilege| -+-----------------------------------+ -|root.ln.wf01.** : CREATE_TIMESERIES| -+-----------------------------------+ -Total line number = 1 -It costs 0.005s -IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.wf01.wt01.**; -+-----------------------------------+ -| privilege| -+-----------------------------------+ -|root.ln.wf01.** : 
CREATE_TIMESERIES| -+-----------------------------------+ -Total line number = 1 -It costs 0.005s -``` - -* 更新密码 - -``` -ALTER USER SET PASSWORD ; -Eg: IoTDB > ALTER USER `tempuser` SET PASSWORD 'newpwd'; -``` - - -### 其他说明 - -#### 用户、权限与角色的关系 - -角色是权限的集合,而权限和角色都是用户的一种属性。即一个角色可以拥有若干权限。一个用户可以拥有若干角色与权限(称为用户自身权限)。 - -目前在 IoTDB 中并不存在相互冲突的权限,因此一个用户真正具有的权限是用户自身权限与其所有的角色的权限的并集。即要判定用户是否能执行某一项操作,就要看用户自身权限或用户的角色的所有权限中是否有一条允许了该操作。用户自身权限与其角色权限,他的多个角色的权限之间可能存在相同的权限,但这并不会产生影响。 - -需要注意的是:如果一个用户自身有某种权限(对应操作 A),而他的某个角色有相同的权限。那么如果仅从该用户撤销该权限无法达到禁止该用户执行操作 A 的目的,还需要从这个角色中也撤销对应的权限,或者从这个用户将该角色撤销。同样,如果仅从上述角色将权限撤销,也不能禁止该用户执行操作 A。 - -同时,对角色的修改会立即反映到所有拥有该角色的用户上,例如对角色增加某种权限将立即使所有拥有该角色的用户都拥有对应权限,删除某种权限也将使对应用户失去该权限(除非用户本身有该权限)。 - -#### 系统所含权限列表 - -| 权限名称 | 说明 | 示例 | -| :------------------------ | :----------------------------------------------------------- | ------------------------------------------------------------ | -| CREATE\_DATABASE | 创建 database。包含设置 database 的权限和TTL。路径相关 | Eg1: `CREATE DATABASE root.ln;`
Eg2:`set ttl to root.ln 3600000;`
Eg3:`unset ttl to root.ln;` | -| DELETE\_DATABASE | 删除 database。路径相关 | Eg: `delete database root.ln;` | -| CREATE\_TIMESERIES | 创建时间序列。路径相关 | Eg1: 创建时间序列
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: 创建对齐时间序列
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | -| INSERT\_TIMESERIES | 插入数据。路径相关 | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | -| ALTER\_TIMESERIES | 修改时间序列标签。路径相关 | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | -| READ\_TIMESERIES | 查询数据。路径相关 | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [数据查询](./Query-Data.md#概述)(这一节之下的查询语句均使用该权限)
Eg8: CSV格式数据导出<br />
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: 查询性能追踪
`tracing select * from root.**`
Eg10: UDF查询
`select example(*) from root.sg.d1`
Eg11: 查询触发器
`show triggers`
Eg12: 统计查询
`count devices` | -| DELETE\_TIMESERIES | 删除数据或时间序列。路径相关 | Eg1: 删除时间序列
`delete timeseries root.ln.wf01.wt01.status`
Eg2: 删除数据
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: 使用DROP关键字
`drop timeseries root.ln.wf01.wt01.status` | -| CREATE\_USER | 创建用户。路径无关 | Eg: `create user thulab 'passwd';` | -| DELETE\_USER | 删除用户。路径无关 | Eg: `drop user xiaoming;` | -| MODIFY\_PASSWORD | 修改所有用户的密码。路径无关。(没有该权限者仍然能够修改自己的密码。) | Eg: `alter user tempuser SET PASSWORD 'newpwd';` | -| LIST\_USER | 列出所有用户,列出具有某角色的所有用户,列出用户在指定路径下相关权限。路径无关 | Eg1: `list user;`
Eg2: `list user of role 'write_role';`<br />
Eg3: `list privileges user admin;`
Eg4: `list privileges user 'admin' on root.sgcc.**;` | -| GRANT\_USER\_PRIVILEGE | 赋予用户权限。路径无关 | Eg: `grant user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | -| REVOKE\_USER\_PRIVILEGE | 撤销用户权限。路径无关 | Eg: `revoke user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | -| GRANT\_USER\_ROLE | 赋予用户角色。路径无关 | Eg: `grant temprole to tempuser;` | -| REVOKE\_USER\_ROLE | 撤销用户角色。路径无关 | Eg: `revoke temprole from tempuser;` | -| CREATE\_ROLE | 创建角色。路径无关 | Eg: `create role admin;` | -| DELETE\_ROLE | 删除角色。路径无关 | Eg: `drop role admin;` | -| LIST\_ROLE | 列出所有角色,列出某用户下所有角色,列出角色在指定路径下相关权限。路径无关 | Eg1: `list role`
Eg2: `list role of user 'actor';`
Eg3: `list privileges role wirte_role;`
Eg4: `list privileges role wirte_role ON root.sgcc;` | -| GRANT\_ROLE\_PRIVILEGE | 赋予角色权限。路径无关 | Eg: `grant role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | -| REVOKE\_ROLE\_PRIVILEGE | 撤销角色权限。路径无关 | Eg: `revoke role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | -| CREATE_FUNCTION | 注册 UDF。路径无关 | Eg: `create function example AS 'org.apache.iotdb.udf.UDTFExample';` | -| DROP_FUNCTION | 卸载 UDF。路径无关 | Eg: `drop function example` | -| CREATE_TRIGGER | 创建触发器。路径相关 | Eg1: `CREATE TRIGGER BEFORE INSERT ON AS `
Eg2: `CREATE TRIGGER AFTER INSERT ON AS ` | -| DROP_TRIGGER | 卸载触发器。路径相关 | Eg: `drop trigger 'alert-listener-sg1d1s1'` | -| CREATE_CONTINUOUS_QUERY | 创建连续查询。路径无关 | Eg: `CREATE CONTINUOUS QUERY cq1 RESAMPLE RANGE 40s BEGIN END` | -| DROP_CONTINUOUS_QUERY | 卸载连续查询。路径无关 | Eg1: `DROP CONTINUOUS QUERY cq3`
Eg2: `DROP CQ cq3` | -| SHOW_CONTINUOUS_QUERIES | 展示所有连续查询。路径无关 | Eg1: `SHOW CONTINUOUS QUERIES`
Eg2: `SHOW cqs` | -| UPDATE_TEMPLATE | 创建、删除模板。路径无关。 | Eg1: `create schema template t1(s1 int32)`
Eg2: `drop schema template t1` | -| READ_TEMPLATE | 查看所有模板、模板内容。 路径无关 | Eg1: `show schema templates`
Eg2: `show nodes in template t1` | -| APPLY_TEMPLATE | 挂载、卸载、激活、解除模板。路径有关。 | Eg1: `set schema template t1 to root.sg.d`
Eg2: `unset schema template t1 from root.sg.d`
Eg3: `create timeseries of schema template on root.sg.d`
Eg4: `delete timeseries of schema template on root.sg.d` | -| READ_TEMPLATE_APPLICATION | 查看模板的挂载路径和激活路径。路径无关 | Eg1: `show paths set schema template t1`
Eg2: `show paths using schema template t1` | - -注意: 路径无关的权限只能在路径root.**下赋予或撤销; - -注意: 下述sql语句需要赋予多个权限才可以使用: - -- 导入数据,需要赋予`READ_TIMESERIES`,`INSERT_TIMESERIES`两种权限。 - -``` -Eg: IoTDB > ./import-csv.bat -h 127.0.0.1 -p 6667 -u renyuhua -pw root -f dump0.csv -``` - -- 查询写回(SELECT_INTO) - - 需要所有 `select` 子句中源序列的 `READ_TIMESERIES` 权限 - - 需要所有 `into` 子句中目标序列 `INSERT_TIMESERIES` 权限 - - -``` -Eg: IoTDB > select s1, s1 into t1, t2 from root.sg.d1 limit 5 offset 1000 -``` - -#### 用户名限制 - -IoTDB 规定用户名的字符长度不小于 4,其中用户名不能包含空格。 - -#### 密码限制 - -IoTDB 规定密码的字符长度不小于 4,其中密码不能包含空格,密码默认采用 MD5 进行加密。 - -#### 角色名限制 - -IoTDB 规定角色名的字符长度不小于 4,其中角色名不能包含空格。 - -#### 权限管理中的路径模式 - -一个路径模式的结果集包含了它的子模式的结果集的所有元素。例如,`root.sg.d.*`是`root.sg.*.*`的子模式,而`root.sg.**`不是`root.sg.*.*`的子模式。当用户被授予对某个路径模式的权限时,在他的DDL或DML中使用的模式必须是该路径模式的子模式,这保证了用户访问时间序列时不会超出他的权限范围。 - -#### 权限缓存 - -在分布式相关的权限操作中,在进行除了创建用户和角色之外的其他权限更改操作时,都会先清除与该用户(角色)相关的所有的`dataNode`的缓存信息,如果任何一台`dataNode`缓存信息清楚失败,这个权限更改的任务就会失败。 - -#### 非root用户限制进行的操作 - -目前以下IoTDB支持的sql语句只有`root`用户可以进行操作,且没有对应的权限可以赋予新用户。 - -##### TsFile管理 - -- 加载TsFile - -``` -Eg: IoTDB > load '/Users/Desktop/data/1575028885956-101-0.tsfile' -``` - -- 删除TsFile文件 - -``` -Eg: IoTDB > remove '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' -``` - -- 卸载TsFile文件到指定目录 - -``` -Eg: IoTDB > unload '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' '/data/data/tmp' -``` - -##### 删除时间分区(实验性功能) - -``` -Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 -``` - -##### 连续查询 - -``` -Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END -``` - -##### 运维命令 - -- FLUSH - -``` -Eg: IoTDB > flush -``` - -- MERGE - -``` -Eg: IoTDB > MERGE -Eg: IoTDB > FULL MERGE -``` - -- CLEAR CACHE - -```sql -Eg: IoTDB > CLEAR CACHE -``` - -- SET STSTEM TO READONLY / WRITABLE - -``` -Eg: IoTDB > SET STSTEM TO READONLY / WRITABLE -``` - -- 查询终止 - -``` -Eg: IoTDB > KILL QUERY 1 -``` - -##### 水印工具 - -- 为新用户施加水印 - -``` -Eg: IoTDB > grant watermark_embedding to Alice -``` - -- 撤销水印 - -``` -Eg: IoTDB > revoke watermark_embedding from Alice -``` \ No newline at end of file diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md b/src/zh/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md index dc22e5b1..54b48c4f 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Security-Management_timecho.md @@ -156,507 +156,3 @@ white.list: # enable_audit_log_for_native_insert_api=true ``` -## 权限管理 - -IoTDB 为用户提供了权限管理操作,从而为用户提供对于数据的权限管理功能,保障数据的安全。 - -我们将通过以下几个具体的例子为您示范基本的用户权限操作,详细的 SQL 语句及使用方式详情请参见本文 [数据模式与概念章节](../Basic-Concept/Data-Model-and-Terminology.md)。同时,在 JAVA 编程环境中,您可以使用 [JDBC API](../API/Programming-JDBC.md) 单条或批量执行权限管理类语句。 - -### 基本概念 - -#### 用户 - -用户即数据库的合法使用者。一个用户与一个唯一的用户名相对应,并且拥有密码作为身份验证的手段。一个人在使用数据库之前,必须先提供合法的(即存于数据库中的)用户名与密码,使得自己成为用户。 - -#### 权限 - -数据库提供多种操作,并不是所有的用户都能执行所有操作。如果一个用户可以执行某项操作,则称该用户有执行该操作的权限。权限可分为数据管理权限(如对数据进行增删改查)以及权限管理权限(用户、角色的创建与删除,权限的赋予与撤销等)。数据管理权限往往需要一个路径来限定其生效范围,可使用[路径模式](../Basic-Concept/Data-Model-and-Terminology.md)灵活管理权限。 - -#### 角色 - -角色是若干权限的集合,并且有一个唯一的角色名作为标识符。用户通常和一个现实身份相对应(例如交通调度员),而一个现实身份可能对应着多个用户。这些具有相同现实身份的用户往往具有相同的一些权限。角色就是为了能对这样的权限进行统一的管理的抽象。 - -#### 默认用户及其具有的角色 - -初始安装后的 IoTDB 中有一个默认用户:root,默认密码为 root。该用户为管理员用户,固定拥有所有权限,无法被赋予、撤销权限,也无法被删除。 - -### 权限操作示例 - -根据本文中描述的 [样例数据](https://github.com/thulab/iotdb/files/4438687/OtherMaterial-Sample.Data.txt) 内容,IoTDB 的样例数据可能同时属于 ln, sgcc 
等不同发电集团,不同的发电集团不希望其他发电集团获取自己的数据库数据,因此我们需要将不同的数据在集团层进行权限隔离。 - -#### 创建用户 - -使用 `CREATE USER ` 创建用户。例如,我们可以使用具有所有权限的root用户为 ln 和 sgcc 集团创建两个用户角色,名为 ln_write_user, sgcc_write_user,密码均为 write_pwd。建议使用反引号(`)包裹用户名。SQL 语句为: - -``` -CREATE USER `ln_write_user` 'write_pwd' -CREATE USER `sgcc_write_user` 'write_pwd' -``` - -此时使用展示用户的 SQL 语句: - -``` -LIST USER -``` - -我们可以看到这两个已经被创建的用户,结果如下: - -``` -IoTDB> CREATE USER `ln_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> CREATE USER `sgcc_write_user` 'write_pwd' -Msg: The statement is executed successfully. -IoTDB> LIST USER -+---------------+ -| user| -+---------------+ -| ln_write_user| -| root| -|sgcc_write_user| -+---------------+ -Total line number = 3 -It costs 0.157s -``` - -#### 赋予用户权限 - -此时,虽然两个用户已经创建,但是他们不具有任何权限,因此他们并不能对数据库进行操作,例如我们使用 ln_write_user 用户对数据库中的数据进行写入,SQL 语句为: - -``` -INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -``` - -此时,系统不允许用户进行此操作,会提示错误: - -``` -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(1509465600000,true) -Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. -``` - -现在,我们用root用户分别赋予他们向对应 database 数据的写入权限. - -我们使用 `GRANT USER PRIVILEGES ON ` 语句赋予用户权限(注:其中,创建用户权限无需指定路径),例如: - -``` -GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -GRANT USER `ln_write_user` PRIVILEGES CREATE_USER -``` - -执行状态如下所示: - -``` -IoTDB> GRANT USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -Msg: The statement is executed successfully. -IoTDB> GRANT USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -Msg: The statement is executed successfully. -IoTDB> GRANT USER `ln_write_user` PRIVILEGES CREATE_USER -Msg: The statement is executed successfully. -``` - -接着使用ln_write_user再尝试写入数据 - -``` -IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: The statement is executed successfully. -``` - -#### 撤销用户权限 - -授予用户权限后,我们可以使用 `REVOKE USER PRIVILEGES ON ` 来撤销已授予的用户权限(注:其中,撤销创建用户权限无需指定路径)。例如,用root用户撤销ln_write_user和sgcc_write_user的权限: - -``` -REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER -``` - -执行状态如下所示: - -``` -REVOKE USER `ln_write_user` PRIVILEGES INSERT_TIMESERIES on root.ln.** -Msg: The statement is executed successfully. -REVOKE USER `sgcc_write_user` PRIVILEGES INSERT_TIMESERIES on root.sgcc1.**, root.sgcc2.** -Msg: The statement is executed successfully. -REVOKE USER `ln_write_user` PRIVILEGES CREATE_USER -Msg: The statement is executed successfully. -``` - -撤销权限后,ln_write_user就没有向root.ln.**写入数据的权限了。 - -``` -INSERT INTO root.ln.wf01.wt01(timestamp, status) values(1509465600000, true) -Msg: 602: No permissions for this operation, please add privilege INSERT_TIMESERIES. 
-``` - -#### SQL 语句 - -与权限相关的语句包括: - -* 创建用户 - -``` -CREATE USER ; -Eg: IoTDB > CREATE USER `thulab` 'passwd'; -``` - -* 删除用户 - -``` -DROP USER ; -Eg: IoTDB > DROP USER `xiaoming`; -``` - -* 创建角色 - -``` -CREATE ROLE ; -Eg: IoTDB > CREATE ROLE `admin`; -``` - -* 删除角色 - -``` -DROP ROLE ; -Eg: IoTDB > DROP ROLE `admin`; -``` - -* 赋予用户权限 - -``` -GRANT USER PRIVILEGES ON ; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES on root.ln.**, root.sgcc.**; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES CREATE_ROLE; -``` - -- 赋予用户全部的权限 - -``` -GRANT USER PRIVILEGES ALL; -Eg: IoTDB > GRANT USER `tempuser` PRIVILEGES ALL; -``` - -* 赋予角色权限 - -``` -GRANT ROLE PRIVILEGES ON ; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES INSERT_TIMESERIES, DELETE_TIMESERIES ON root.sgcc.**, root.ln.**; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES CREATE_ROLE; -``` - -- 赋予角色全部的权限 - -``` -GRANT ROLE PRIVILEGES ALL; -Eg: IoTDB > GRANT ROLE `temprole` PRIVILEGES ALL; -``` - -* 赋予用户角色 - -``` -GRANT TO ; -Eg: IoTDB > GRANT `temprole` TO tempuser; -``` - -* 撤销用户权限 - -``` -REVOKE USER PRIVILEGES ON ; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES DELETE_TIMESERIES on root.ln.**; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES CREATE_ROLE; -``` - -- 移除用户所有权限 - -``` -REVOKE USER PRIVILEGES ALL; -Eg: IoTDB > REVOKE USER `tempuser` PRIVILEGES ALL; -``` - -* 撤销角色权限 - -``` -REVOKE ROLE PRIVILEGES ON ; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES DELETE_TIMESERIES ON root.ln.**; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES CREATE_ROLE; -``` - -- 撤销角色全部的权限 - -``` -REVOKE ROLE PRIVILEGES ALL; -Eg: IoTDB > REVOKE ROLE `temprole` PRIVILEGES ALL; -``` - -* 撤销用户角色 - -``` -REVOKE FROM ; -Eg: IoTDB > REVOKE `temprole` FROM tempuser; -``` - -* 列出所有用户 - -``` -LIST USER -Eg: IoTDB > LIST USER -``` - -* 列出指定角色下所有用户 - -``` -LIST USER OF ROLE ; -Eg: IoTDB > LIST USER OF ROLE `roleuser`; -``` - -* 列出所有角色 - -``` -LIST ROLE -Eg: IoTDB > LIST ROLE -``` - -* 列出指定用户下所有角色 - -``` -LIST ROLE OF USER ; -Eg: IoTDB > LIST ROLE OF USER `tempuser`; -``` - -* 列出用户所有权限 - -``` -LIST PRIVILEGES USER ; -Eg: IoTDB > LIST PRIVILEGES USER `tempuser`; -``` - -* 列出用户在具体路径上相关联的权限 - -``` -LIST PRIVILEGES USER ON ; -Eg: IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.**, root.ln.wf01.**; -+--------+-----------------------------------+ -| role| privilege| -+--------+-----------------------------------+ -| | root.ln.** : ALTER_TIMESERIES| -|temprole|root.ln.wf01.** : CREATE_TIMESERIES| -+--------+-----------------------------------+ -Total line number = 2 -It costs 0.005s -IoTDB> LIST PRIVILEGES USER `tempuser` ON root.ln.wf01.wt01.**; -+--------+-----------------------------------+ -| role| privilege| -+--------+-----------------------------------+ -| | root.ln.** : ALTER_TIMESERIES| -|temprole|root.ln.wf01.** : CREATE_TIMESERIES| -+--------+-----------------------------------+ -Total line number = 2 -It costs 0.005s -``` - -* 列出角色所有权限 - -``` -LIST PRIVILEGES ROLE ; -Eg: IoTDB > LIST PRIVILEGES ROLE `actor`; -``` - -* 列出角色在具体路径上相关联的权限 - -``` -LIST PRIVILEGES ROLE ON ; -Eg: IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.**, root.ln.wf01.wt01.**; -+-----------------------------------+ -| privilege| -+-----------------------------------+ -|root.ln.wf01.** : CREATE_TIMESERIES| -+-----------------------------------+ -Total line number = 1 -It costs 0.005s -IoTDB> LIST PRIVILEGES ROLE `temprole` ON root.ln.wf01.wt01.**; -+-----------------------------------+ -| privilege| -+-----------------------------------+ -|root.ln.wf01.** : 
CREATE_TIMESERIES| -+-----------------------------------+ -Total line number = 1 -It costs 0.005s -``` - -* 更新密码 - -``` -ALTER USER SET PASSWORD ; -Eg: IoTDB > ALTER USER `tempuser` SET PASSWORD 'newpwd'; -``` - - -### 其他说明 - -#### 用户、权限与角色的关系 - -角色是权限的集合,而权限和角色都是用户的一种属性。即一个角色可以拥有若干权限。一个用户可以拥有若干角色与权限(称为用户自身权限)。 - -目前在 IoTDB 中并不存在相互冲突的权限,因此一个用户真正具有的权限是用户自身权限与其所有的角色的权限的并集。即要判定用户是否能执行某一项操作,就要看用户自身权限或用户的角色的所有权限中是否有一条允许了该操作。用户自身权限与其角色权限,他的多个角色的权限之间可能存在相同的权限,但这并不会产生影响。 - -需要注意的是:如果一个用户自身有某种权限(对应操作 A),而他的某个角色有相同的权限。那么如果仅从该用户撤销该权限无法达到禁止该用户执行操作 A 的目的,还需要从这个角色中也撤销对应的权限,或者从这个用户将该角色撤销。同样,如果仅从上述角色将权限撤销,也不能禁止该用户执行操作 A。 - -同时,对角色的修改会立即反映到所有拥有该角色的用户上,例如对角色增加某种权限将立即使所有拥有该角色的用户都拥有对应权限,删除某种权限也将使对应用户失去该权限(除非用户本身有该权限)。 - -#### 系统所含权限列表 - -| 权限名称 | 说明 | 示例 | -| :------------------------ | :----------------------------------------------------------- | ------------------------------------------------------------ | -| CREATE\_DATABASE | 创建 database。包含设置 database 的权限和TTL。路径相关 | Eg1: `CREATE DATABASE root.ln;`
Eg2: `set ttl to root.ln 3600000;`<br />
Eg3:`unset ttl to root.ln;` | -| DELETE\_DATABASE | 删除 database。路径相关 | Eg: `delete database root.ln;` | -| CREATE\_TIMESERIES | 创建时间序列。路径相关 | Eg1: 创建时间序列
`create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN;`
Eg2: 创建对齐时间序列
`create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY);` | -| INSERT\_TIMESERIES | 插入数据。路径相关 | Eg1: `insert into root.ln.wf02(timestamp,status) values(1,true);`
Eg2: `insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1)` | -| ALTER\_TIMESERIES | 修改时间序列标签。路径相关 | Eg1: `alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4;`
Eg2: `ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4);` | -| READ\_TIMESERIES | 查询数据。路径相关 | Eg1: `SHOW DATABASES;`
Eg2: `show child paths root.ln, show child nodes root.ln;`
Eg3: `show devices;`
Eg4: `show timeseries root.**;`
Eg5: `show schema templates;`
Eg6: `show all ttl`
Eg7: [数据查询](./Query-Data.md#概述)(这一节之下的查询语句均使用该权限)
Eg8: CSV格式数据导出<br />
`./export-csv.bat -h 127.0.0.1 -p 6667 -u tempuser -pw root -td ./`
Eg9: 查询性能追踪
`tracing select * from root.**`
Eg10: UDF查询
`select example(*) from root.sg.d1`
Eg11: 查询触发器
`show triggers`
Eg12: 统计查询
`count devices` | -| DELETE\_TIMESERIES | 删除数据或时间序列。路径相关 | Eg1: 删除时间序列
`delete timeseries root.ln.wf01.wt01.status`
Eg2: 删除数据
`delete from root.ln.wf02.wt02.status where time < 10`
Eg3: 使用DROP关键字
`drop timeseries root.ln.wf01.wt01.status` | -| CREATE\_USER | 创建用户。路径无关 | Eg: `create user thulab 'passwd';` | -| DELETE\_USER | 删除用户。路径无关 | Eg: `drop user xiaoming;` | -| MODIFY\_PASSWORD | 修改所有用户的密码。路径无关。(没有该权限者仍然能够修改自己的密码。) | Eg: `alter user tempuser SET PASSWORD 'newpwd';` | -| LIST\_USER | 列出所有用户,列出具有某角色的所有用户,列出用户在指定路径下相关权限。路径无关 | Eg1: `list user;`
Eg2: `list user of role 'write_role';`<br />
Eg3: `list privileges user admin;`
Eg4: `list privileges user 'admin' on root.sgcc.**;` | -| GRANT\_USER\_PRIVILEGE | 赋予用户权限。路径无关 | Eg: `grant user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | -| REVOKE\_USER\_PRIVILEGE | 撤销用户权限。路径无关 | Eg: `revoke user tempuser privileges DELETE_TIMESERIES on root.ln.**;` | -| GRANT\_USER\_ROLE | 赋予用户角色。路径无关 | Eg: `grant temprole to tempuser;` | -| REVOKE\_USER\_ROLE | 撤销用户角色。路径无关 | Eg: `revoke temprole from tempuser;` | -| CREATE\_ROLE | 创建角色。路径无关 | Eg: `create role admin;` | -| DELETE\_ROLE | 删除角色。路径无关 | Eg: `drop role admin;` | -| LIST\_ROLE | 列出所有角色,列出某用户下所有角色,列出角色在指定路径下相关权限。路径无关 | Eg1: `list role`
Eg2: `list role of user 'actor';`
Eg3: `list privileges role wirte_role;`
Eg4: `list privileges role wirte_role ON root.sgcc;` | -| GRANT\_ROLE\_PRIVILEGE | 赋予角色权限。路径无关 | Eg: `grant role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | -| REVOKE\_ROLE\_PRIVILEGE | 撤销角色权限。路径无关 | Eg: `revoke role temprole privileges DELETE_TIMESERIES ON root.ln.**;` | -| CREATE_FUNCTION | 注册 UDF。路径无关 | Eg: `create function example AS 'org.apache.iotdb.udf.UDTFExample';` | -| DROP_FUNCTION | 卸载 UDF。路径无关 | Eg: `drop function example` | -| CREATE_TRIGGER | 创建触发器。路径相关 | Eg1: `CREATE TRIGGER BEFORE INSERT ON AS `
Eg2: `CREATE TRIGGER AFTER INSERT ON AS ` | -| DROP_TRIGGER | 卸载触发器。路径相关 | Eg: `drop trigger 'alert-listener-sg1d1s1'` | -| CREATE_CONTINUOUS_QUERY | 创建连续查询。路径无关 | Eg: `CREATE CONTINUOUS QUERY cq1 RESAMPLE RANGE 40s BEGIN END` | -| DROP_CONTINUOUS_QUERY | 卸载连续查询。路径无关 | Eg1: `DROP CONTINUOUS QUERY cq3`
Eg2: `DROP CQ cq3` | -| SHOW_CONTINUOUS_QUERIES | 展示所有连续查询。路径无关 | Eg1: `SHOW CONTINUOUS QUERIES`
Eg2: `SHOW cqs` | -| UPDATE_TEMPLATE | 创建、删除模板。路径无关。 | Eg1: `create schema template t1(s1 int32)`
Eg2: `drop schema template t1` | -| READ_TEMPLATE | 查看所有模板、模板内容。 路径无关 | Eg1: `show schema templates`
Eg2: `show nodes in template t1` | -| APPLY_TEMPLATE | 挂载、卸载、激活、解除模板。路径有关。 | Eg1: `set schema template t1 to root.sg.d`
Eg2: `unset schema template t1 from root.sg.d`
Eg3: `create timeseries of schema template on root.sg.d`
Eg4: `delete timeseries of schema template on root.sg.d` | -| READ_TEMPLATE_APPLICATION | 查看模板的挂载路径和激活路径。路径无关 | Eg1: `show paths set schema template t1`
Eg2: `show paths using schema template t1` | - -注意: 路径无关的权限只能在路径root.**下赋予或撤销; - -注意: 下述sql语句需要赋予多个权限才可以使用: - -- 导入数据,需要赋予`READ_TIMESERIES`,`INSERT_TIMESERIES`两种权限。 - -``` -Eg: IoTDB > ./import-csv.bat -h 127.0.0.1 -p 6667 -u renyuhua -pw root -f dump0.csv -``` - -- 查询写回(SELECT_INTO) - - 需要所有 `select` 子句中源序列的 `READ_TIMESERIES` 权限 - - 需要所有 `into` 子句中目标序列 `INSERT_TIMESERIES` 权限 - - -``` -Eg: IoTDB > select s1, s1 into t1, t2 from root.sg.d1 limit 5 offset 1000 -``` - -#### 用户名限制 - -IoTDB 规定用户名的字符长度不小于 4,其中用户名不能包含空格。 - -#### 密码限制 - -IoTDB 规定密码的字符长度不小于 4,其中密码不能包含空格,密码默认采用 MD5 进行加密。 - -#### 角色名限制 - -IoTDB 规定角色名的字符长度不小于 4,其中角色名不能包含空格。 - -#### 权限管理中的路径模式 - -一个路径模式的结果集包含了它的子模式的结果集的所有元素。例如,`root.sg.d.*`是`root.sg.*.*`的子模式,而`root.sg.**`不是`root.sg.*.*`的子模式。当用户被授予对某个路径模式的权限时,在他的DDL或DML中使用的模式必须是该路径模式的子模式,这保证了用户访问时间序列时不会超出他的权限范围。 - -#### 权限缓存 - -在分布式相关的权限操作中,在进行除了创建用户和角色之外的其他权限更改操作时,都会先清除与该用户(角色)相关的所有的`dataNode`的缓存信息,如果任何一台`dataNode`缓存信息清楚失败,这个权限更改的任务就会失败。 - -#### 非root用户限制进行的操作 - -目前以下IoTDB支持的sql语句只有`root`用户可以进行操作,且没有对应的权限可以赋予新用户。 - -##### TsFile管理 - -- 加载TsFile - -``` -Eg: IoTDB > load '/Users/Desktop/data/1575028885956-101-0.tsfile' -``` - -- 删除TsFile文件 - -``` -Eg: IoTDB > remove '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' -``` - -- 卸载TsFile文件到指定目录 - -``` -Eg: IoTDB > unload '/Users/Desktop/data/data/root.vehicle/0/0/1575028885956-101-0.tsfile' '/data/data/tmp' -``` - -##### 删除时间分区(实验性功能) - -``` -Eg: IoTDB > DELETE PARTITION root.ln 0,1,2 -``` - -##### 连续查询 - -``` -Eg: IoTDB > CREATE CONTINUOUS QUERY cq1 BEGIN SELECT max_value(temperature) INTO temperature_max FROM root.ln.*.* GROUP BY time(10s) END -``` - -##### 运维命令 - -- FLUSH - -``` -Eg: IoTDB > flush -``` - -- MERGE - -``` -Eg: IoTDB > MERGE -Eg: IoTDB > FULL MERGE -``` - -- CLEAR CACHE - -```sql -Eg: IoTDB > CLEAR CACHE -``` - -- SET STSTEM TO READONLY / WRITABLE - -``` -Eg: IoTDB > SET STSTEM TO READONLY / WRITABLE -``` - -- 查询终止 - -``` -Eg: IoTDB > KILL QUERY 1 -``` - -##### 水印工具 - -- 为新用户施加水印 - -``` -Eg: IoTDB > grant watermark_embedding to Alice -``` - -- 撤销水印 - -``` -Eg: IoTDB > revoke watermark_embedding from Alice -``` \ No newline at end of file From c02f35f2bfecfa5f8c80ef50f6701c0d450efd17 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Wed, 25 Oct 2023 15:47:46 +0800 Subject: [PATCH 22/27] add words --- src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md | 4 ++-- src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md b/src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md index dd809353..ac11edf7 100644 --- a/src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md +++ b/src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md @@ -32,9 +32,9 @@ IoTDB 具有以下特点: * 终端解压即用 * 终端-云端无缝连接(数据云端同步工具) * 低硬件成本的存储解决方案 - * 高压缩比的磁盘存储(10 亿数据点硬盘成本低于 1.4 元) + * 高压缩比的磁盘存储(无损压缩比可达 20:1以上) * 目录结构的时间序列组织管理方式 - * 支持复杂结构的智能网联设备的时间序列组织 + * 支持复杂结构的智能网联设备的时间序列组织(多层树形结构,层级数量无限制) * 支持大量同类物联网设备的时间序列组织 * 可用模糊方式对海量复杂的时间序列目录结构进行检索 * 高通量的时间序列数据读写 diff --git a/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md b/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md index 2e8bf051..0fab66b0 100644 --- a/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md +++ b/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md @@ -28,9 +28,9 @@ IoTDB 具有以下特点: * 终端解压即用 * 终端-云端无缝连接(数据云端同步工具) * 低硬件成本的存储解决方案 - * 高压缩比的磁盘存储(10 亿数据点硬盘成本低于 1.4 元) + * 高压缩比的磁盘存储(无损压缩比可达 20:1以上) * 目录结构的时间序列组织管理方式 - * 支持复杂结构的智能网联设备的时间序列组织 + * 
支持复杂结构的智能网联设备的时间序列组织(多层树形结构,层级数量无限制) * 支持大量同类物联网设备的时间序列组织 * 可用模糊方式对海量复杂的时间序列目录结构进行检索 * 高通量的时间序列数据读写 From 428fa339acebc40aefa7386a5118a26d306d5255 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Wed, 25 Oct 2023 16:02:12 +0800 Subject: [PATCH 23/27] add words2 --- src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md | 4 +++- src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md b/src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md index ac11edf7..2229a89e 100644 --- a/src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md +++ b/src/zh/UserGuide/V1.1.x/IoTDB-Introduction/Features.md @@ -32,7 +32,9 @@ IoTDB 具有以下特点: * 终端解压即用 * 终端-云端无缝连接(数据云端同步工具) * 低硬件成本的存储解决方案 - * 高压缩比的磁盘存储(无损压缩比可达 20:1以上) + * 高压缩比的磁盘存储 + * 无损压缩比可达 20:1以上 + * 10 亿数据点硬盘成本低于 1.4 元 * 目录结构的时间序列组织管理方式 * 支持复杂结构的智能网联设备的时间序列组织(多层树形结构,层级数量无限制) * 支持大量同类物联网设备的时间序列组织 diff --git a/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md b/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md index 0fab66b0..9cf2417d 100644 --- a/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md +++ b/src/zh/UserGuide/V1.2.x/IoTDB-Introduction/Features.md @@ -28,7 +28,9 @@ IoTDB 具有以下特点: * 终端解压即用 * 终端-云端无缝连接(数据云端同步工具) * 低硬件成本的存储解决方案 - * 高压缩比的磁盘存储(无损压缩比可达 20:1以上) + * 高压缩比的磁盘存储 + * 无损压缩比可达 20:1以上 + * 10 亿数据点硬盘成本低于 1.4 元 * 目录结构的时间序列组织管理方式 * 支持复杂结构的智能网联设备的时间序列组织(多层树形结构,层级数量无限制) * 支持大量同类物联网设备的时间序列组织 From 5e99ef93e316e63e659d87157c87c8ca060fc29a Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Tue, 31 Oct 2023 18:53:39 +0800 Subject: [PATCH 24/27] update pipe doc --- .../V1.2.x/User-Manual/Data-Sync_timecho.md | 594 +++++++----------- .../User-Manual/Stage_Data-Sync_timecho.md | 536 ++++++++++++++++ 2 files changed, 777 insertions(+), 353 deletions(-) create mode 100644 src/zh/UserGuide/V1.2.x/User-Manual/Stage_Data-Sync_timecho.md diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md b/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md index 47997443..2f39036a 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md @@ -7,9 +7,9 @@ to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY @@ -19,123 +19,48 @@ --> -# IoTDB 数据同步 +数据同步是工业物联网的典型需求,通过数据同步机制,可实现不同数据库的数据共享,搭建完整的数据链路来满足内网外网数据互通、端*边云同步、**数据迁移、**异地灾备、读写负载分库*等需求。 -**IoTDB 数据同步功能可以将 IoTDB 的数据传输到另一个数据平台,我们将一个数据同步任务称为 Pipe。** +# 功能介绍 -**一个 Pipe 包含三个子任务(插件):** +## 同步任务 - 整体框架 -- 抽取(Extract) -- 处理(Process) -- 发送(Connect) - -**Pipe 允许用户自定义三个子任务的处理逻辑,通过类似 UDF 的方式处理数据。** 在一个 Pipe 中,上述的子任务分别由三种插件执行实现,数据会依次经过这三个插件进行处理:Pipe Extractor 用于抽取数据,Pipe Processor 用于处理数据,Pipe Connector 用于发送数据,最终数据将被发至外部系统。 - -**Pipe 任务的模型如下:** +一个数据同步任务称为 Pipe,如图所示,一个 Pipe 包含三个子阶段: ![任务模型图](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) -描述一个数据同步任务,本质就是描述 Pipe Extractor、Pipe Processor 和 Pipe Connector 插件的属性。用户可以通过 SQL 语句声明式地配置三个子任务的具体属性,通过组合不同的属性,实现灵活的数据 ETL 能力。 - -利用数据同步功能,可以搭建完整的数据链路来满足端*边云同步、异地灾备、读写负载分库*等需求。 - -## 快速开始 - -**🎯 目标:实现 IoTDB A -> IoTDB B 的全量数据同步** - -- 启动两个 IoTDB,A(datanode -> 127.0.0.1:6667) B(datanode -> 127.0.0.1:6668) -- 创建 A -> B 的 Pipe,在 A 上执行 - - ```sql - create pipe a2b - with connector ( - 'connector'='iotdb-thrift-connector', - 'connector.ip'='127.0.0.1', - 'connector.port'='6668' - ) - ``` -- 启动 A -> B 的 Pipe,在 A 上执行 - - ```sql - start pipe a2b - ``` -- 向 A 写入数据 +- 抽取(Extract):由 Extractor 插件实现,用于抽取数据 +- 处理(Process):由 Processor 插件实现,用于处理数据 +- 发送(Connect):由 Connector 插件实现,用于发送数据 - ```sql - INSERT INTO root.db.d(time, m) values (1, 1) - ``` -- 在 B 检查由 A 同步过来的数据 +通过 SQL 语句声明式地配置三个子任务的具体插件,组合每个插件不同的属性,可实现灵活的数据 ETL 能力。 - ```sql - SELECT ** FROM root - ``` +## 同步任务 - 创建 -> ❗️**注:目前的 IoTDB -> IoTDB 的数据同步实现并不支持 DDL 同步** -> -> 即:不支持 ttl,trigger,别名,模板,视图,创建/删除序列,创建/删除存储组等操作 -> -> **IoTDB -> IoTDB 的数据同步要求目标端 IoTDB:** -> -> * 开启自动创建元数据:需要人工配置数据类型的编码和压缩与发送端保持一致 -> * 不开启自动创建元数据:手工创建与源端一致的元数据 +使用 `CREATE PIPE` 语句来创建一条数据同步任务,下列属性中`PipeId`和`connector`必填,`extractor`和`processor`选填,输入SQL时注意 `EXTRACTOR `与 `CONNECTOR` 插件顺序不能替换。 -## 同步任务管理 +SQL 示例如下: -### 创建同步任务 - -可以使用 `CREATE PIPE` 语句来创建一条数据同步任务,示例 SQL 语句如下所示: - -```sql -CREATE PIPE -- PipeId 是能够唯一标定同步任务任务的名字 +```Go +CREATE PIPE -- PipeId 是能够唯一标定任务任务的名字 WITH EXTRACTOR ( - -- 默认的 IoTDB 数据抽取插件 + -- IoTDB 数据抽取插件,默认为 'iotdb-extractor' 'extractor' = 'iotdb-extractor', - -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + -- 以下为 IoTDB 数据抽取插件参数,此处为示例,详细参数见本文extractor参数部分 'extractor.pattern' = 'root.timecho', - -- 是否抽取历史数据 'extractor.history.enable' = 'true', - -- 描述被抽取的历史数据的时间范围,表示最早时间 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', - -- 描述被抽取的历史数据的时间范围,表示最晚时间 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', - -- 是否抽取实时数据 'extractor.realtime.enable' = 'true', - -- 描述实时数据的抽取方式 'extractor.realtime.mode' = 'hybrid', -) + 'extractor.forwarding-pipe-requests' = 'hybrid', WITH PROCESSOR ( - -- 默认的数据处理插件,即不做任何处理 + -- 数据处理插件,即不做任何处理 'processor' = 'do-nothing-processor', ) WITH CONNECTOR ( -- IoTDB 数据发送插件,目标端为 IoTDB - 'connector' = 'iotdb-thrift-connector', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip - 'connector.ip' = '127.0.0.1', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port - 'connector.port' = '6667', -) -``` - -**创建同步任务时需要配置 PipeId 以及三个插件部分的参数:** - - -| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | -| --------- | ------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- | -| 
PipeId | 全局唯一标定一个同步任务的名称 | 必填 | - | - | - | -| extractor | Pipe Extractor 插件,负责在数据库底层抽取同步数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入同步任务 | 否 | -| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | -| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | - -示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据同步任务。IoTDB 还内置了其他的数据同步插件,**请查看“系统预置数据同步插件”一节**。 - -**一个最简的 CREATE PIPE 语句示例如下:** - -```sql -CREATE PIPE -- PipeId 是能够唯一标定任务任务的名字 -WITH CONNECTOR ( - -- IoTDB 数据发送插件,目标端为 IoTDB - 'connector' = 'iotdb-thrift-connector', + 'connector' = '', -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip 'connector.ip' = '127.0.0.1', -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port @@ -143,327 +68,260 @@ WITH CONNECTOR ( ) ``` -其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 +## 同步任务 - 管理 -**注意:** +数据同步任务有三种状态:RUNNING、STOPPED和DROPPED。 -- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 -- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 -- CONNECTOR 具备自复用能力。对于不同的任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 +任务状态转换如下图所示: +![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) - - 例如,有下面 pipe1, pipe2 两个任务的声明: +- 任务的初始状态为停止状态(STOPPED)。可以使用SQL语句启动任务,将状态从STOPPED转换为RUNNING。 - ```sql - CREATE PIPE pipe1 - WITH CONNECTOR ( - 'connector' = 'iotdb-thrift-connector', - 'connector.thrift.host' = 'localhost', - 'connector.thrift.port' = '9999', - ) +- 用户可以使用SQL语句手动将运行状态的任务停止,将状态从RUNNING转换为STOPPED。 - CREATE PIPE pipe2 - WITH CONNECTOR ( - 'connector' = 'iotdb-thrift-connector', - 'connector.thrift.port' = '9999', - 'connector.thrift.host' = 'localhost', - ) - ``` +- 当任务遇到无法恢复的错误时,其状态会自动从RUNNING转换为STOPPED,这表示任务无法继续执行数据同步操作。 - - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 -- 在 extractor 为默认的 iotdb-extractor,且 extractor.forwarding-pipe-requests 为默认值 true 时,请不要构建出包含数据循环同步的应用场景(会导致无限循环): +- 如果需要删除一个任务,可以使用相应命令。删除之前无需转换为STOPPED状态。 - - IoTDB A -> IoTDB B -> IoTDB A - - IoTDB A -> IoTDB A +我们提供以下SQL语句对同步任务进行状态管理。 ### 启动任务 -CREATE PIPE 语句成功执行后,任务相关实例会被创建,但整个任务的运行状态会被置为 STOPPED,即任务不会立刻处理数据。 +创建之后,任务不会立即被处理,需要启动任务。使用`START PIPE`语句来启动任务,从而开始处理数据: -可以使用 START PIPE 语句使任务开始处理数据: - -```sql -START PIPE +```Go +START PIPE ``` ### 停止任务 -使用 STOP PIPE 语句使任务停止处理数据: +停止处理数据: -```sql +```Go STOP PIPE ``` -### 删除任务 +### 删除任务 -使用 DROP PIPE 语句使任务停止处理数据(当任务状态为 RUNNING 时),然后删除整个任务同步任务: +删除指定任务: -```sql +```Go DROP PIPE ``` -用户在删除任务前,不需要执行 STOP 操作。 - -### 展示任务 +### 查看任务 -使用 SHOW PIPES 语句查看所有任务: +查看全部任务: -```sql +```Go SHOW PIPES ``` -查询结果如下: - -```sql -+-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ -| ID| CreationTime | State|PipeExtractor|PipeProcessor|PipeConnector|ExceptionMessage| -+-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ -|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| None| -+-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ -|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| -+-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ -``` - -可以使用 `` 指定想看的某个同步任务状态: +查看指定任务: -```sql +```Go SHOW PIPE ``` -您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 +## 插件 -```sql -SHOW PIPES -WHERE CONNECTOR USED BY -``` 
+为了使得整体架构更加灵活以匹配不同的同步场景需求,在上述同步任务框架中IoTDB支持进行插件组装。系统为您预置了一些常用插件可直接使用,同时您也可以自定义 processor 插件和 connector 插件,并加载至IoTDB系统进行使用。 -### 任务运行状态迁移 +| 模块 | 插件 | 预置插件 | 自定义插件 | +| --- | --- | --- | --- | +| 抽取(Extract) | Extractor 插件 | iotdb-extractor | 不支持 | +| 处理(Process) | Processor 插件 | do-nothing-processor | 支持 | +| 发送(Connect) | Connector 插件 | iotdb-thrift-sync-connector iotdb-thrift-async-connector iotdb-legacy-pipe-connector iotdb-air-gap-connector websocket - connector | 支持 | -一个数据同步 pipe 在其被管理的生命周期中会经过多种状态: +### 预置插件 -- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: - - 当一个 pipe 被成功创建之后,其初始状态为暂停状态 - - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED - - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED -- **RUNNING:** pipe 正在正常工作 -- **DROPPED:** pipe 任务被永久删除 +预置插件如下: -下图表明了所有状态以及状态的迁移: +| 插件名称 | 类型 | 介绍 | 适用版本 | +| ---------------------------- | ---- | ------------------------------------------------------------ | --------- | +| iotdb-extractor | extractor 插件 | 抽取 IoTDB 内部的历史或实时数据进入 pipe | 1.2.x | +| do-nothing-processor | processor 插件 | 不对 extractor 传入的事件做任何的处理 | 1.2.x | +| iotdb-thrift-sync-connector | connector 插件 | 主要用于 IoTDB(v1.2.0及以上)与 IoTDB(v1.2.0及以上)之间的数据传输。 使用 Thrift RPC 框架传输数据,单线程 blocking IO 模型 | 1.2.x | +| iotdb-thrift-async-connector | connector 插件 | 用于 IoTDB(v1.2.0及以上)与 IoTDB(v1.2.0及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景 | 1.2.x | +| iotdb-legacy-pipe-connector | connector 插件 | 用于 IoTDB(v1.2.0及以上)与低版本的 IoTDB (V1.2.0以前)之间的数据传输。 使用 Thrift RPC 框架传输数据 | 1.2.x | +| iotdb-air-gap-connector | connector 插件 | 用于 IoTDB(v1.2.2+)向 IoTDB(v1.2.2+)跨单向数据网闸的数据同步。支持的网闸型号包括南瑞 Syskeeper 2000 等 | 1.2.1以上 | +| websocket - connector | connector 插件 | 用于flink sql connector 传输数据 | 1.2.2以上 | -![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) +每个插件的属性参考[参数说明](#connector-参数)。 +### 自定义插件 -## 系统预置数据同步插件 +自定义插件方法参考[自定义流处理插件开发](Streaming_timecho.md#自定义流处理插件开发)一章。 -### 查看预置插件 +### 查看插件 -用户可以按需查看系统中的插件。查看插件的语句如图所示。 +查看系统中的插件(含自定义与内置插件)可以用以下语句: -```sql +```Go SHOW PIPEPLUGINS ``` -### 预置 extractor 插件 - -#### iotdb-extractor - -作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 - - -| key | value | value 取值范围 | required or optional with default | -| ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | -| extractor | iotdb-extractor | String: iotdb-extractor | required | -| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | -| extractor.history.enable | 是否同步历史数据 | Boolean: true, false | optional: true | -| extractor.history.start-time | 同步历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | -| extractor.history.end-time | 同步历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | -| extractor.realtime.enable | 是否同步实时数据 | Boolean: true, false | optional: true | -| extractor.realtime.mode | 实时数据的抽取模式 | String: hybrid, log, file | optional: hybrid | -| extractor.forwarding-pipe-requests | 是否转发由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | optional: true | - -> 🚫 **extractor.pattern 参数说明** -> -> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) -> * 在底层实现中,当检测到 pattern 为 root(默认值)时,同步效率较高,其他任意格式都将降低性能 -> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 
时: -> -> * root.aligned.1TS -> * root.aligned.1TS.\`1\` -> * root.aligned.100TS -> -> 的数据会被同步; -> -> * root.aligned.\`1\` -> * root.aligned.\`123\` -> -> 的数据不会被同步。 -> * root.\_\_system 的数据不会被 pipe 抽取,即不会被同步到目标端。用户虽然可以在 extractor.pattern 中包含任意前缀,包括带有(或覆盖) root.\__system 的前缀,但是 root.__system 下的数据总是会被 pipe 忽略的 - -> ❗️**extractor.history 的 start-time,end-time 参数说明** -> -> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 - -> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** -> -> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 -> * **arrival time:** 数据到达 IoTDB 系统内的时间。 -> -> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 - -> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** -> -> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 -> 2. 实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 -> -> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** -> -> 用户可以指定 iotdb-extractor 进行: -> -> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) -> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) -> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) -> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` - -> 📌 **extractor.realtime.mode:数据抽取的模式** -> -> * log:该模式下,任务仅使用操作日志进行数据处理、发送 -> * file:该模式下,任务仅使用数据文件进行数据处理、发送 -> * hybrid:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 - -> 🍕 **extractor.forwarding-pipe-requests:是否允许转发从另一 pipe 传输而来的数据** -> -> * 如果要使用 pipe 构建 A -> B -> C 的数据同步,那么 B -> C 的 pipe 需要将该参数为 true 后,A -> B 中 A 通过 pipe 写入 B 的数据才能被正确转发到 C -> * 如果要使用 pipe 构建 A \<-> B 的双向数据同步(双活),那么 A -> B 和 B -> A 的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发 - -### 预置 processor 插件 - -#### do-nothing-processor - -作用:不对 extractor 传入的事件做任何的处理。 - - -| key | value | value 取值范围 | required or optional with default | -| --------- | -------------------- | ---------------------------- | --------------------------------- | -| processor | do-nothing-processor | String: do-nothing-processor | required | - -### 预置 connector 插件 +返回结果如下(1.2.2 版本): + +```Go +IoTDB> SHOW PIPEPLUGINS ++----------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------------+ +| PluginName|PluginType| ClassName| PluginJar| ++----------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------------+ +| DO-NOTHING-CONNECTOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.DoNothingConnector| | +| DO-NOTHING-PROCESSOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.processor.DoNothingProcessor| | +| IOTDB-AIR-GAP-CONNECTOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.IoTDBAirGapConnector| | +| IOTDB-EXTRACTOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.extractor.IoTDBExtractor| | +| IOTDB-LEGACY-PIPE-CONNECTOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.IoTDBLegacyPipeConnector| | +|IOTDB-THRIFT-ASYNC-CONNECTOR| Builtin|org.apache.iotdb.commons.pipe.plugin.builtin.connector.IoTDBThriftAsyncConnector| | +| IOTDB-THRIFT-CONNECTOR| Builtin| 
org.apache.iotdb.commons.pipe.plugin.builtin.connector.IoTDBThriftConnector| | +| IOTDB-THRIFT-SYNC-CONNECTOR| Builtin| org.apache.iotdb.commons.pipe.plugin.builtin.connector.IoTDBThriftSyncConnector| | ++----------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------------+ +``` -#### iotdb-thrift-sync-connector(别名:iotdb-thrift-connector) +# 使用示例 -作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。 -使用 Thrift RPC 框架传输数据,单线程 blocking IO 模型。 -保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致。 +## 全量数据同步 -限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 +创建一个名为 A2B, 功能为同步 A IoTDB 到 B IoTDB 间的全量数据,数据链路如下图所示: +![](https://alioss.timecho.com/docs/img/w1.png) +可使用如下语句: -| key | value | value 取值范围 | required or optional with default | -| --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | -| connector | iotdb-thrift-connector 或 iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | -| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | -| connector.batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | optional: true | -| connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | optional: 1 | -| connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | optional: 16 * 1024 * 1024 (16MiB) | +```Go +create pipe A2B +with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 'connector.port'='6668' +) +``` -> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 +## 部分数据同步 -#### iotdb-thrift-async-connector +创建一个名为 实时数据, 功能为同步 A IoTDB 到 B IoTDB 间的2023年8月23日8点到2023年10月23日8点的数据,数据链路如下图所示。 -作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。 -使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景。 -不保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致,但是保证数据发送的完整性(at-least-once)。 +![](https://alioss.timecho.com/docs/img/w2.png) -限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 +可使用如下语句: +```Go +create pipe A2B +WITH EXTRACTOR ( +'extractor'= 'iotdb-extractor', +'extractor.realtime.enable' = 'false', +'extractor.realtime.mode'='file', +'extractor.history.start-time' = '2023.08.23T08:00:00+00:00', +'extractor.history.end-time' = '2023.10.23T08:00:00+00:00') +with connector ( +'connector'='iotdb-thrift-async-connector', +'connector.node-urls'='xxxx:6668', +'connector.batch.enable'='false') +``` +> 📌 'extractor.realtime.mode'='file'表示实时数据的抽取模式为 file 模式,该模式下,任务仅使用数据文件进行数据处理、发送。 -| key | value | value 取值范围 | required or optional with default | -| --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | -| connector | iotdb-thrift-async-connector | String: iotdb-thrift-async-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 
connector.node-urls 任选其一填写 | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | -| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | -| connector.batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | optional: true | -| connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | optional: 1 | -| connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | optional: 16 * 1024 * 1024 (16MiB) | +> 📌'extractor.realtime.enable' = 'false', 表示不同步实时数据,即创建该任务后到达的数据都不传输。 -> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 +> 📌 start-time,end-time 应为 ISO 格式。 +## 双向数据传输 -#### iotdb-legacy-pipe-connector +要实现两个 IoTDB 之间互相备份,实时同步的功能,如下图所示: -作用:主要用于 IoTDB(v1.2.0+)向更低版本的 IoTDB 传输数据,使用 v1.2.0 版本前的数据同步(Sync)协议。 -使用 Thrift RPC 框架传输数据。单线程 sync blocking IO 模型,传输性能较弱。 +![](https://alioss.timecho.com/docs/img/w3.png) -限制:源端 IoTDB 版本需要在 v1.2.0+,目标端 IoTDB 版本可以是 v1.2.0+、v1.1.x(更低版本的 IoTDB 理论上也支持,但是未经测试)。 +可创建两个子任务, 功能为双向同步 A IoTDB 到 B IoTDB 间的实时数据,在 A IoTDB 上执行下列语句: -注意:理论上 v1.2.0+ IoTDB 可作为 v1.2.0 版本前的任意版本的数据同步(Sync)接收端。 +```Go +create pipe AB +with extractor ( + 'extractor.history.enable' = 'false', + 'extractor.forwarding-pipe-requests' = 'false', +with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 'connector.port'='6668' +) +``` +在 B IoTDB 上执行下列语句: + +```Go +create pipe BA +with extractor ( + 'extractor.history.enable' = 'false', + 'extractor.forwarding-pipe-requests' = 'false', +with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 'connector.port'='6667' +) +``` +> 📌 'extractor.history.enable' = 'false'表示不传输历史数据,即不同步创建该任务前的数据。 -| key | value | value 取值范围 | required or optional with default | -| ------------------ | --------------------------------------------------------------------- | ----------------------------------- | --------------------------------- | -| connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | required | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | -| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | -| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | -| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | optional: 1.1 | +> 📌 'extractor.forwarding-pipe-requests' = 'false'表示不转发从另一 pipe 传输而来的数据,A 和 B 上的的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发。 -> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 -#### iotdb-air-gap-connector +## 级联数据传输 -作用:用于 IoTDB(v1.2.2+)向 IoTDB(v1.2.2+)跨单向数据网闸的数据同步。支持的网闸型号包括南瑞 Syskeeper 2000 等。 -该 Connector 使用 Java 自带的 Socket 实现数据传输,单线程 blocking IO 模型,其性能与 iotdb-thrift-sync-connector 相当。 -保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致。 +要实现 A IoTDB 到 B IoTDB 到 C IoTDB 之间的级联数据传输链路,如下图所示: -场景:例如,在电力系统的规范中 +![](https://alioss.timecho.com/docs/img/w4.png) -> 1.I/II 区与 III 区之间的应用程序禁止采用 SQL 命令访问数据库和基于 B/S 方式的双向数据传输 -> -> 2.I/II 区与 III 区之间的数据通信,传输的启动端由内网发起,反向的应答报文不容许携带数据,应用层的应答报文最多为 1 个字节,并且 1 个字节为全 0 或者全 1 两种状态 +创建一个名为 AB 的pipe,在 A IoTDB 上执行下列语句: -限制: +```Go +create pipe AB +with extractor ( + 'extractor.forwarding-pipe-requests', +with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 
'connector.port'='6668' +) +``` -1. 源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.2+。 -2. 单向数据网闸需要允许 TCP 请求跨越,且每一个请求可返回一个全 1 或全 0 的 byte。 -3. 目标端 IoTDB 需要在 iotdb-common.properties 内,配置 - a. pipe_air_gap_receiver_enabled=true - b. pipe_air_gap_receiver_port 配置 receiver 的接收端口 +创建一个名为 BC 的pipe,在 B IoTDB 上执行下列语句: +```Go +create pipe BC +with extractor ( + 'extractor.forwarding-pipe-requests' = 'false', +with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 'connector.port'='6669' +) +``` -| key | value | value 取值范围 | required or optional with default | -| -------------------------------------- | ---------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | -| connector | iotdb-air-gap-connector | String: iotdb-air-gap-connector | required | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | -| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | -| connector.air-gap.handshake-timeout-ms | 发送端与接收端在首次尝试建立连接时握手请求的超时时长,单位:毫秒 | Integer | optional: 5000 | +## 跨网闸数据传输 -> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 +创建一个名为 A2B 的pipe,实现内网服务器上的 A,经由单向网闸,传输数据到外网服务器上的B,如下图所示: -#### do-nothing-connector +![](https://alioss.timecho.com/docs/img/w5.png) -作用:不对 processor 传入的事件做任何的处理。 +配置网闸后,在 A IoTDB 上执行下列语句: -| key | value | value 取值范围 | required or optional with default | -| --------- | -------------------- | ---------------------------- | --------------------------------- | -| connector | do-nothing-connector | String: do-nothing-connector | required | +```Go +create pipe A2B +with connector ( + 'connector'='iotdb-air-gap-connector', + 'connector.ip'='10.53.53.53', + 'connector.port'='9780' +) +``` -## 权限管理 +# 参考:注意事项 -| 权限名称 | 描述 | -| ----------- | -------------------- | -| CREATE_PIPE | 注册任务。路径无关。 | -| START_PIPE | 开启任务。路径无关。 | -| STOP_PIPE | 停止任务。路径无关。 | -| DROP_PIPE | 卸载任务。路径无关。 | -| SHOW_PIPES | 查询任务。路径无关。 | +- 使用数据同步功能,请保证接收端开启自动创建元数据; +- Pipe 中的数据含义: -## 配置参数 +1. 历史数据抽取:所有 arrival time < 创建 pipe 时当前系统时间的数据称为历史数据 +2. 实时数据抽取:所有 arrival time >= 创建 pipe 时当前系统时间的数据称为实时数据 +3. 
全量数据 = 历史数据 + 实时数据 -在 iotdb-common.properties 中: +- 可通过修改 IoTDB 配置文件(iotdb-common.properties)以调整数据同步的参数,如同步数据存储目录等。完整配置如下: -```Properties +```Go #################### ### Pipe Configuration #################### @@ -501,36 +359,66 @@ SHOW PIPEPLUGINS # pipe_air_gap_receiver_port=9780 ``` -## 功能特性 +# 参考:参数说明 -### 最少一次语义保证 **at-least-once** +## extractor 参数 -数据同步功能向外部系统传输数据时,提供 at-least-once 的传输语义。在大部分场景下,同步功能可提供 exactly-once 保证,即所有数据被恰好同步一次。 +| key | value | value 取值范围 | 是否必填 |默认取值| +| ---------------------------------- | ------------------------------------------------ | -------------------------------------- | -------- |------| +| extractor | iotdb-extractor | String: iotdb-extractor | 必填 | - | +| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | 选填 | root | +| extractor.history.enable | 是否同步历史数据 | Boolean: true, false | 选填 | true | +| extractor.history.start-time | 同步历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | 选填 | Long.MIN_VALUE | +| extractor.history.end-time | 同步历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | 选填 | Long.MAX_VALUE | +| extractor.realtime.enable | 是否同步实时数据 | Boolean: true, false | 选填 | true | +| extractor.realtime.mode | 实时数据的抽取模式 | String: hybrid, log, file | 选填 | hybrid | +| extractor.forwarding-pipe-requests | 是否转发由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | 选填 | true | -但是在以下场景中,可能存在部分数据被同步多次 **(断点续传)** 的情况: -- 临时的网络故障:某次数据传输请求失败后,系统会进行重试发送,直至到达最大尝试次数 -- Pipe 插件逻辑实现异常:插件运行中抛出错误,系统会进行重试发送,直至到达最大尝试次数 -- 数据节点宕机、重启等导致的数据分区切主:分区变更完成后,受影响的数据会被重新传输 -- 集群不可用:集群可用后,受影响的数据会重新传输 +## connector 参数 -### 源端:数据写入与 Pipe 处理、发送数据异步解耦 - -数据同步功能中,数据传输采用的是异步复制模式。 +#### iotdb-thrift-sync-connector(别名:iotdb-thrift-connector) -数据同步与写入操作完全脱钩,不存在对写入关键路径的影响。该机制允许框架在保证持续数据同步的前提下,保持时序数据库的写入速度。 +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| --------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------------------------------------- | +| connector | iotdb-thrift-connector 或 iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | 必填 | | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | 选填 | 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | 选填 | 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 选填 | 与 connector.ip:connector.port 任选其一填写 | +| connector.batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | 选填 | true | +| connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | +| connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填 -### 源端:可自适应数据写入负载的数据传输策略 +#### iotdb-thrift-async-connector -支持根据写入负载,动态调整数据传输方式,同步默认使用 TsFile 文件与操作流动态混合传输(`'extractor.realtime.mode'='hybrid'`)。 +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| --------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------------------------------------- | +| connector | iotdb-thrift-async-connector | String: iotdb-thrift-async-connector | 必填 | | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | 选填 | 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | 选填 | 与 
connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 选填 | 与 connector.ip:connector.port 任选其一填写 | +| connector.batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | 选填 | true | +| connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | +| connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填 | 16 * 1024 * 1024 (16MiB) | -在数据写入负载高时,优先选择 TsFile 传输的方式。TsFile 压缩比高,节省网络带宽。 -在数据写入负载低时,优先选择操作流同步传输的方式。操作流传输实时性高。 +#### iotdb-legacy-pipe-connector -### 源端:高可用集群部署时,Pipe 服务高可用 +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| ------------------ | ------------------------------------------------------------ | ----------------------------------- | -------- | -------- | +| connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | 必填 | - | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | 选填 | - | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | 选填 | - | +| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | 选填 | root | +| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | 选填 | root | +| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | 选填 | 1.1 | -当发送端 IoTDB 为高可用集群部署模式时,数据同步服务也将是高可用的。 数据同步框架将监控每个数据节点的数据同步进度,并定期做轻量级的分布式一致性快照以保存同步状态。 +#### iotdb-air-gap-connector -- 当发送端集群某数据节点宕机时,数据同步框架可以利用一致性快照以及保存在副本上的数据快速恢复同步,以此实现数据同步服务的高可用。 -- 当发送端集群整体宕机并重启时,数据同步框架也能使用快照恢复同步服务。 +| key | value | value 取值范围 | 是否必填 | 默认取值 | +| -------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------------------------------------- | +| connector | iotdb-air-gap-connector | String: iotdb-air-gap-connector | 必填 | | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | 选填 | 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | 选填 | 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 选填 | 与 connector.ip:connector.port 任选其一填写 | +| connector.air-gap.handshake-timeout-ms | 发送端与接收端在首次尝试建立连接时握手请求的超时时长,单位:毫秒 | Integer | 选填 | 5000 | \ No newline at end of file diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Stage_Data-Sync_timecho.md b/src/zh/UserGuide/V1.2.x/User-Manual/Stage_Data-Sync_timecho.md new file mode 100644 index 00000000..47997443 --- /dev/null +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Stage_Data-Sync_timecho.md @@ -0,0 +1,536 @@ + + +# IoTDB 数据同步 + +**IoTDB 数据同步功能可以将 IoTDB 的数据传输到另一个数据平台,我们将一个数据同步任务称为 Pipe。** + +**一个 Pipe 包含三个子任务(插件):** + +- 抽取(Extract) +- 处理(Process) +- 发送(Connect) + +**Pipe 允许用户自定义三个子任务的处理逻辑,通过类似 UDF 的方式处理数据。** 在一个 Pipe 中,上述的子任务分别由三种插件执行实现,数据会依次经过这三个插件进行处理:Pipe Extractor 用于抽取数据,Pipe Processor 用于处理数据,Pipe Connector 用于发送数据,最终数据将被发至外部系统。 + +**Pipe 任务的模型如下:** + +![任务模型图](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) + +描述一个数据同步任务,本质就是描述 Pipe Extractor、Pipe Processor 和 Pipe Connector 插件的属性。用户可以通过 SQL 语句声明式地配置三个子任务的具体属性,通过组合不同的属性,实现灵活的数据 ETL 能力。 + +利用数据同步功能,可以搭建完整的数据链路来满足端*边云同步、异地灾备、读写负载分库*等需求。 + +## 快速开始 + +**🎯 目标:实现 IoTDB A -> IoTDB B 的全量数据同步** + +- 启动两个 IoTDB,A(datanode -> 127.0.0.1:6667) B(datanode -> 127.0.0.1:6668) +- 创建 A -> B 的 Pipe,在 A 上执行 + + ```sql + 
create pipe a2b + with connector ( + 'connector'='iotdb-thrift-connector', + 'connector.ip'='127.0.0.1', + 'connector.port'='6668' + ) + ``` +- 启动 A -> B 的 Pipe,在 A 上执行 + + ```sql + start pipe a2b + ``` +- 向 A 写入数据 + + ```sql + INSERT INTO root.db.d(time, m) values (1, 1) + ``` +- 在 B 检查由 A 同步过来的数据 + + ```sql + SELECT ** FROM root + ``` + +> ❗️**注:目前的 IoTDB -> IoTDB 的数据同步实现并不支持 DDL 同步** +> +> 即:不支持 ttl,trigger,别名,模板,视图,创建/删除序列,创建/删除存储组等操作 +> +> **IoTDB -> IoTDB 的数据同步要求目标端 IoTDB:** +> +> * 开启自动创建元数据:需要人工配置数据类型的编码和压缩与发送端保持一致 +> * 不开启自动创建元数据:手工创建与源端一致的元数据 + +## 同步任务管理 + +### 创建同步任务 + +可以使用 `CREATE PIPE` 语句来创建一条数据同步任务,示例 SQL 语句如下所示: + +```sql +CREATE PIPE -- PipeId 是能够唯一标定同步任务任务的名字 +WITH EXTRACTOR ( + -- 默认的 IoTDB 数据抽取插件 + 'extractor' = 'iotdb-extractor', + -- 路径前缀,只有能够匹配该路径前缀的数据才会被抽取,用作后续的处理和发送 + 'extractor.pattern' = 'root.timecho', + -- 是否抽取历史数据 + 'extractor.history.enable' = 'true', + -- 描述被抽取的历史数据的时间范围,表示最早时间 + 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', + -- 描述被抽取的历史数据的时间范围,表示最晚时间 + 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', + -- 是否抽取实时数据 + 'extractor.realtime.enable' = 'true', + -- 描述实时数据的抽取方式 + 'extractor.realtime.mode' = 'hybrid', +) +WITH PROCESSOR ( + -- 默认的数据处理插件,即不做任何处理 + 'processor' = 'do-nothing-processor', +) +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +**创建同步任务时需要配置 PipeId 以及三个插件部分的参数:** + + +| 配置项 | 说明 | 是否必填 | 默认实现 | 默认实现说明 | 是否允许自定义实现 | +| --------- | ------------------------------------------------- | --------------------------- | -------------------- | ------------------------------------------------------ | ------------------------- | +| PipeId | 全局唯一标定一个同步任务的名称 | 必填 | - | - | - | +| extractor | Pipe Extractor 插件,负责在数据库底层抽取同步数据 | 选填 | iotdb-extractor | 将数据库的全量历史数据和后续到达的实时数据接入同步任务 | 否 | +| processor | Pipe Processor 插件,负责处理数据 | 选填 | do-nothing-processor | 对传入的数据不做任何处理 | | +| connector | Pipe Connector 插件,负责发送数据 | 必填 | - | - | | + +示例中,使用了 iotdb-extractor、do-nothing-processor 和 iotdb-thrift-connector 插件构建数据同步任务。IoTDB 还内置了其他的数据同步插件,**请查看“系统预置数据同步插件”一节**。 + +**一个最简的 CREATE PIPE 语句示例如下:** + +```sql +CREATE PIPE -- PipeId 是能够唯一标定任务任务的名字 +WITH CONNECTOR ( + -- IoTDB 数据发送插件,目标端为 IoTDB + 'connector' = 'iotdb-thrift-connector', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip + 'connector.ip' = '127.0.0.1', + -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port + 'connector.port' = '6667', +) +``` + +其表达的语义是:将本数据库实例中的全量历史数据和后续到达的实时数据,同步到目标为 127.0.0.1:6667 的 IoTDB 实例上。 + +**注意:** + +- EXTRACTOR 和 PROCESSOR 为选填配置,若不填写配置参数,系统则会采用相应的默认实现 +- CONNECTOR 为必填配置,需要在 CREATE PIPE 语句中声明式配置 +- CONNECTOR 具备自复用能力。对于不同的任务,如果他们的 CONNECTOR 具备完全相同 KV 属性的(所有属性的 key 对应的 value 都相同),**那么系统最终只会创建一个 CONNECTOR 实例**,以实现对连接资源的复用。 + + - 例如,有下面 pipe1, pipe2 两个任务的声明: + + ```sql + CREATE PIPE pipe1 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.host' = 'localhost', + 'connector.thrift.port' = '9999', + ) + + CREATE PIPE pipe2 + WITH CONNECTOR ( + 'connector' = 'iotdb-thrift-connector', + 'connector.thrift.port' = '9999', + 'connector.thrift.host' = 'localhost', + ) + ``` + + - 因为它们对 CONNECTOR 的声明完全相同(**即使某些属性声明时的顺序不同**),所以框架会自动对它们声明的 CONNECTOR 进行复用,最终 pipe1, pipe2 的CONNECTOR 将会是同一个实例。 +- 在 extractor 为默认的 iotdb-extractor,且 extractor.forwarding-pipe-requests 为默认值 true 时,请不要构建出包含数据循环同步的应用场景(会导致无限循环): + + - IoTDB A -> IoTDB B -> IoTDB A + - IoTDB A -> IoTDB A + +### 启动任务 + 
+CREATE PIPE 语句成功执行后,任务相关实例会被创建,但整个任务的运行状态会被置为 STOPPED,即任务不会立刻处理数据。 + +可以使用 START PIPE 语句使任务开始处理数据: + +```sql +START PIPE +``` + +### 停止任务 + +使用 STOP PIPE 语句使任务停止处理数据: + +```sql +STOP PIPE +``` + +### 删除任务 + +使用 DROP PIPE 语句使任务停止处理数据(当任务状态为 RUNNING 时),然后删除整个任务同步任务: + +```sql +DROP PIPE +``` + +用户在删除任务前,不需要执行 STOP 操作。 + +### 展示任务 + +使用 SHOW PIPES 语句查看所有任务: + +```sql +SHOW PIPES +``` + +查询结果如下: + +```sql ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +| ID| CreationTime | State|PipeExtractor|PipeProcessor|PipeConnector|ExceptionMessage| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-kafka|2022-03-30T20:58:30.689|RUNNING| ...| ...| ...| None| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +|iotdb-iotdb|2022-03-31T12:55:28.129|STOPPED| ...| ...| ...| TException: ...| ++-----------+-----------------------+-------+-------------+-------------+-------------+----------------+ +``` + +可以使用 `` 指定想看的某个同步任务状态: + +```sql +SHOW PIPE +``` + +您也可以通过 where 子句,判断某个 \ 使用的 Pipe Connector 被复用的情况。 + +```sql +SHOW PIPES +WHERE CONNECTOR USED BY +``` + +### 任务运行状态迁移 + +一个数据同步 pipe 在其被管理的生命周期中会经过多种状态: + +- **STOPPED:** pipe 处于停止运行状态。当管道处于该状态时,有如下几种可能: + - 当一个 pipe 被成功创建之后,其初始状态为暂停状态 + - 用户手动将一个处于正常运行状态的 pipe 暂停,其状态会被动从 RUNNING 变为 STOPPED + - 当一个 pipe 运行过程中出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED +- **RUNNING:** pipe 正在正常工作 +- **DROPPED:** pipe 任务被永久删除 + +下图表明了所有状态以及状态的迁移: + +![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) + +## 系统预置数据同步插件 + +### 查看预置插件 + +用户可以按需查看系统中的插件。查看插件的语句如图所示。 + +```sql +SHOW PIPEPLUGINS +``` + +### 预置 extractor 插件 + +#### iotdb-extractor + +作用:抽取 IoTDB 内部的历史或实时数据进入 pipe。 + + +| key | value | value 取值范围 | required or optional with default | +| ---------------------------------- | ------------------------------------------------ | -------------------------------------- | --------------------------------- | +| extractor | iotdb-extractor | String: iotdb-extractor | required | +| extractor.pattern | 用于筛选时间序列的路径前缀 | String: 任意的时间序列前缀 | optional: root | +| extractor.history.enable | 是否同步历史数据 | Boolean: true, false | optional: true | +| extractor.history.start-time | 同步历史数据的开始 event time,包含 start-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MIN_VALUE | +| extractor.history.end-time | 同步历史数据的结束 event time,包含 end-time | Long: [Long.MIN_VALUE, Long.MAX_VALUE] | optional: Long.MAX_VALUE | +| extractor.realtime.enable | 是否同步实时数据 | Boolean: true, false | optional: true | +| extractor.realtime.mode | 实时数据的抽取模式 | String: hybrid, log, file | optional: hybrid | +| extractor.forwarding-pipe-requests | 是否转发由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | optional: true | + +> 🚫 **extractor.pattern 参数说明** +> +> * Pattern 需用反引号修饰不合法字符或者是不合法路径节点,例如如果希望筛选 root.\`a@b\` 或者 root.\`123\`,应设置 pattern 为 root.\`a@b\` 或者 root.\`123\`(具体参考 [单双引号和反引号的使用时机](https://iotdb.apache.org/zh/Download/#_1-0-版本不兼容的语法详细说明)) +> * 在底层实现中,当检测到 pattern 为 root(默认值)时,同步效率较高,其他任意格式都将降低性能 +> * 路径前缀不需要能够构成完整的路径。例如,当创建一个包含参数为 'extractor.pattern'='root.aligned.1' 的 pipe 时: +> +> * root.aligned.1TS +> * root.aligned.1TS.\`1\` +> * root.aligned.100TS +> +> 的数据会被同步; +> +> * root.aligned.\`1\` +> * root.aligned.\`123\` +> +> 的数据不会被同步。 +> * root.\_\_system 的数据不会被 pipe 抽取,即不会被同步到目标端。用户虽然可以在 extractor.pattern 中包含任意前缀,包括带有(或覆盖) root.\__system 的前缀,但是 root.__system 下的数据总是会被 pipe 忽略的 + +> ❗️**extractor.history 的 
start-time,end-time 参数说明** +> +> * start-time,end-time 应为 ISO 格式,例如 2011-12-03T10:15:30 或 2011-12-03T10:15:30+01:00 + +> ✅ **一条数据从生产到落库 IoTDB,包含两个关键的时间概念** +> +> * **event time:** 数据实际生产时的时间(或者数据生产系统给数据赋予的生成时间,是数据点中的时间项),也称为事件时间。 +> * **arrival time:** 数据到达 IoTDB 系统内的时间。 +> +> 我们常说的乱序数据,指的是数据到达时,其 **event time** 远落后于当前系统时间(或者已经落库的最大 **event time**)的数据。另一方面,不论是乱序数据还是顺序数据,只要它们是新到达系统的,那它们的 **arrival time** 都是会随着数据到达 IoTDB 的顺序递增的。 + +> 💎 **iotdb-extractor 的工作可以拆分成两个阶段** +> +> 1. 历史数据抽取:所有 **arrival time** < 创建 pipe 时**当前系统时间**的数据称为历史数据 +> 2. 实时数据抽取:所有 **arrival time** >= 创建 pipe 时**当前系统时间**的数据称为实时数据 +> +> 历史数据传输阶段和实时数据传输阶段,**两阶段串行执行,只有当历史数据传输阶段完成后,才执行实时数据传输阶段。** +> +> 用户可以指定 iotdb-extractor 进行: +> +> * 历史数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'false'` ) +> * 实时数据抽取(`'extractor.history.enable' = 'false'`, `'extractor.realtime.enable' = 'true'` ) +> * 全量数据抽取(`'extractor.history.enable' = 'true'`, `'extractor.realtime.enable' = 'true'` ) +> * 禁止同时设置 `extractor.history.enable` 和 `extractor.realtime.enable` 为 `false` + +> 📌 **extractor.realtime.mode:数据抽取的模式** +> +> * log:该模式下,任务仅使用操作日志进行数据处理、发送 +> * file:该模式下,任务仅使用数据文件进行数据处理、发送 +> * hybrid:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 + +> 🍕 **extractor.forwarding-pipe-requests:是否允许转发从另一 pipe 传输而来的数据** +> +> * 如果要使用 pipe 构建 A -> B -> C 的数据同步,那么 B -> C 的 pipe 需要将该参数为 true 后,A -> B 中 A 通过 pipe 写入 B 的数据才能被正确转发到 C +> * 如果要使用 pipe 构建 A \<-> B 的双向数据同步(双活),那么 A -> B 和 B -> A 的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发 + +### 预置 processor 插件 + +#### do-nothing-processor + +作用:不对 extractor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| processor | do-nothing-processor | String: do-nothing-processor | required | + +### 预置 connector 插件 + +#### iotdb-thrift-sync-connector(别名:iotdb-thrift-connector) + +作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。 +使用 Thrift RPC 框架传输数据,单线程 blocking IO 模型。 +保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致。 + +限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 + + +| key | value | value 取值范围 | required or optional with default | +| --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | +| connector | iotdb-thrift-connector 或 iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | +| connector.batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | optional: true | +| connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | optional: 1 | +| connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | optional: 16 * 1024 * 1024 (16MiB) | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +#### iotdb-thrift-async-connector + 
+作用:主要用于 IoTDB(v1.2.0+)与 IoTDB(v1.2.0+)之间的数据传输。 +使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景。 +不保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致,但是保证数据发送的完整性(at-least-once)。 + +限制:源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.0+。 + + +| key | value | value 取值范围 | required or optional with default | +| --------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | +| connector | iotdb-thrift-async-connector | String: iotdb-thrift-async-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | +| connector.batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | optional: true | +| connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | optional: 1 | +| connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | optional: 16 * 1024 * 1024 (16MiB) | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +#### iotdb-legacy-pipe-connector + +作用:主要用于 IoTDB(v1.2.0+)向更低版本的 IoTDB 传输数据,使用 v1.2.0 版本前的数据同步(Sync)协议。 +使用 Thrift RPC 框架传输数据。单线程 sync blocking IO 模型,传输性能较弱。 + +限制:源端 IoTDB 版本需要在 v1.2.0+,目标端 IoTDB 版本可以是 v1.2.0+、v1.1.x(更低版本的 IoTDB 理论上也支持,但是未经测试)。 + +注意:理论上 v1.2.0+ IoTDB 可作为 v1.2.0 版本前的任意版本的数据同步(Sync)接收端。 + + +| key | value | value 取值范围 | required or optional with default | +| ------------------ | --------------------------------------------------------------------- | ----------------------------------- | --------------------------------- | +| connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | required | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | required | +| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | +| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | optional: root | +| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | optional: 1.1 | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +#### iotdb-air-gap-connector + +作用:用于 IoTDB(v1.2.2+)向 IoTDB(v1.2.2+)跨单向数据网闸的数据同步。支持的网闸型号包括南瑞 Syskeeper 2000 等。 +该 Connector 使用 Java 自带的 Socket 实现数据传输,单线程 blocking IO 模型,其性能与 iotdb-thrift-sync-connector 相当。 +保证接收端 apply 数据的顺序与发送端接受写入请求的顺序一致。 + +场景:例如,在电力系统的规范中 + +> 1.I/II 区与 III 区之间的应用程序禁止采用 SQL 命令访问数据库和基于 B/S 方式的双向数据传输 +> +> 2.I/II 区与 III 区之间的数据通信,传输的启动端由内网发起,反向的应答报文不容许携带数据,应用层的应答报文最多为 1 个字节,并且 1 个字节为全 0 或者全 1 两种状态 + +限制: + +1. 源端 IoTDB 与 目标端 IoTDB 版本都需要在 v1.2.2+。 +2. 单向数据网闸需要允许 TCP 请求跨越,且每一个请求可返回一个全 1 或全 0 的 byte。 +3. 目标端 IoTDB 需要在 iotdb-common.properties 内,配置 + a. pipe_air_gap_receiver_enabled=true + b. 
pipe_air_gap_receiver_port 配置 receiver 的接收端口 + + +| key | value | value 取值范围 | required or optional with default | +| -------------------------------------- | ---------------------------------------------------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------- | +| connector | iotdb-air-gap-connector | String: iotdb-air-gap-connector | required | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | optional: 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | optional: 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | optional: 与 connector.ip:connector.port 任选其一填写 | +| connector.air-gap.handshake-timeout-ms | 发送端与接收端在首次尝试建立连接时握手请求的超时时长,单位:毫秒 | Integer | optional: 5000 | + +> 📌 请确保接收端已经创建了发送端的所有时间序列,或是开启了自动创建元数据,否则将会导致 pipe 运行失败。 + +#### do-nothing-connector + +作用:不对 processor 传入的事件做任何的处理。 + + +| key | value | value 取值范围 | required or optional with default | +| --------- | -------------------- | ---------------------------- | --------------------------------- | +| connector | do-nothing-connector | String: do-nothing-connector | required | + +## 权限管理 + +| 权限名称 | 描述 | +| ----------- | -------------------- | +| CREATE_PIPE | 注册任务。路径无关。 | +| START_PIPE | 开启任务。路径无关。 | +| STOP_PIPE | 停止任务。路径无关。 | +| DROP_PIPE | 卸载任务。路径无关。 | +| SHOW_PIPES | 查询任务。路径无关。 | + +## 配置参数 + +在 iotdb-common.properties 中: + +```Properties +#################### +### Pipe Configuration +#################### + +# Uncomment the following field to configure the pipe lib directory. +# For Windows platform +# If its prefix is a drive specifier followed by "\\", or if its prefix is "\\\\", then the path is +# absolute. Otherwise, it is relative. +# pipe_lib_dir=ext\\pipe +# For Linux platform +# If its prefix is "/", then the path is absolute. Otherwise, it is relative. +# pipe_lib_dir=ext/pipe + +# The maximum number of threads that can be used to execute the pipe subtasks in PipeSubtaskExecutor. +# The actual value will be min(pipe_subtask_executor_max_thread_num, max(1, CPU core number / 2)). +# pipe_subtask_executor_max_thread_num=5 + +# The connection timeout (in milliseconds) for the thrift client. +# pipe_connector_timeout_ms=900000 + +# The maximum number of selectors that can be used in the async connector. +# pipe_async_connector_selector_number=1 + +# The core number of clients that can be used in the async connector. +# pipe_async_connector_core_client_number=8 + +# The maximum number of clients that can be used in the async connector. +# pipe_async_connector_max_client_number=16 + +# Whether to enable receiving pipe data through air gap. +# The receiver can only return 0 or 1 in tcp mode to indicate whether the data is received successfully. +# pipe_air_gap_receiver_enabled=false + +# The port for the server to receive pipe data through air gap. 
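# Senders using the iotdb-air-gap-connector are expected to target this port via
# 'connector.port' (e.g. 'connector.port'='9780' if the default below is kept).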
+# pipe_air_gap_receiver_port=9780 +``` + +## 功能特性 + +### 最少一次语义保证 **at-least-once** + +数据同步功能向外部系统传输数据时,提供 at-least-once 的传输语义。在大部分场景下,同步功能可提供 exactly-once 保证,即所有数据被恰好同步一次。 + +但是在以下场景中,可能存在部分数据被同步多次 **(断点续传)** 的情况: + +- 临时的网络故障:某次数据传输请求失败后,系统会进行重试发送,直至到达最大尝试次数 +- Pipe 插件逻辑实现异常:插件运行中抛出错误,系统会进行重试发送,直至到达最大尝试次数 +- 数据节点宕机、重启等导致的数据分区切主:分区变更完成后,受影响的数据会被重新传输 +- 集群不可用:集群可用后,受影响的数据会重新传输 + +### 源端:数据写入与 Pipe 处理、发送数据异步解耦 + +数据同步功能中,数据传输采用的是异步复制模式。 + +数据同步与写入操作完全脱钩,不存在对写入关键路径的影响。该机制允许框架在保证持续数据同步的前提下,保持时序数据库的写入速度。 + +### 源端:可自适应数据写入负载的数据传输策略 + +支持根据写入负载,动态调整数据传输方式,同步默认使用 TsFile 文件与操作流动态混合传输(`'extractor.realtime.mode'='hybrid'`)。 + +在数据写入负载高时,优先选择 TsFile 传输的方式。TsFile 压缩比高,节省网络带宽。 + +在数据写入负载低时,优先选择操作流同步传输的方式。操作流传输实时性高。 + +### 源端:高可用集群部署时,Pipe 服务高可用 + +当发送端 IoTDB 为高可用集群部署模式时,数据同步服务也将是高可用的。 数据同步框架将监控每个数据节点的数据同步进度,并定期做轻量级的分布式一致性快照以保存同步状态。 + +- 当发送端集群某数据节点宕机时,数据同步框架可以利用一致性快照以及保存在副本上的数据快速恢复同步,以此实现数据同步服务的高可用。 +- 当发送端集群整体宕机并重启时,数据同步框架也能使用快照恢复同步服务。 From f29454b9ca44c1f096b8c88121e77229e92cfa44 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Mon, 6 Nov 2023 18:10:12 +0800 Subject: [PATCH 25/27] 4 --- .../V1.2.x/User-Manual/Data-Sync_timecho.md | 146 +++++++++--------- 1 file changed, 69 insertions(+), 77 deletions(-) diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md b/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md index 2f39036a..607e8c99 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md @@ -19,73 +19,59 @@ --> -数据同步是工业物联网的典型需求,通过数据同步机制,可实现不同数据库的数据共享,搭建完整的数据链路来满足内网外网数据互通、端*边云同步、**数据迁移、**异地灾备、读写负载分库*等需求。 +# 数据同步 +数据同步是工业物联网的典型需求,通过数据同步机制,可实现IoTDB之间的数据共享,搭建完整的数据链路来满足内网外网数据互通、端边云同步、数据迁移、数据备份等需求。 -# 功能介绍 +## 功能介绍 -## 同步任务 - 整体框架 +### 同步任务概述 -一个数据同步任务称为 Pipe,如图所示,一个 Pipe 包含三个子阶段: +一个数据同步任务包含2个阶段: -![任务模型图](https://alioss.timecho.com/docs/img/%E6%B5%81%E5%A4%84%E7%90%86%E5%BC%95%E6%93%8E.jpeg) +- 抽取(Extract)阶段:该部分用于从源 IoTDB 抽取数据,在SQL语句中的 Extractor 部分定义 +- 发送(Connect)阶段:该部分用于向目标 IoTDB 发送数据,在SQL语句中的 Connector 部分定义 -- 抽取(Extract):由 Extractor 插件实现,用于抽取数据 -- 处理(Process):由 Processor 插件实现,用于处理数据 -- 发送(Connect):由 Connector 插件实现,用于发送数据 -通过 SQL 语句声明式地配置三个子任务的具体插件,组合每个插件不同的属性,可实现灵活的数据 ETL 能力。 -## 同步任务 - 创建 +通过 SQL 语句声明式地配置2个部分的具体内容,可实现灵活的数据同步能力。 -使用 `CREATE PIPE` 语句来创建一条数据同步任务,下列属性中`PipeId`和`connector`必填,`extractor`和`processor`选填,输入SQL时注意 `EXTRACTOR `与 `CONNECTOR` 插件顺序不能替换。 +### 同步任务 - 创建 + +使用 `CREATE PIPE` 语句来创建一条数据同步任务,下列属性中`PipeId`和`connector`为必填项,`extractor`和`processor`为选填项,输入SQL时注意 `EXTRACTOR `与 `CONNECTOR` 插件顺序不能替换。 SQL 示例如下: -```Go +```SQL CREATE PIPE -- PipeId 是能够唯一标定任务任务的名字 +-- 数据抽取插件,必填插件 WITH EXTRACTOR ( - -- IoTDB 数据抽取插件,默认为 'iotdb-extractor' - 'extractor' = 'iotdb-extractor', - -- 以下为 IoTDB 数据抽取插件参数,此处为示例,详细参数见本文extractor参数部分 - 'extractor.pattern' = 'root.timecho', - 'extractor.history.enable' = 'true', - 'extractor.history.start-time' = '2011.12.03T10:15:30+01:00', - 'extractor.history.end-time' = '2022.12.03T10:15:30+01:00', - 'extractor.realtime.enable' = 'true', - 'extractor.realtime.mode' = 'hybrid', - 'extractor.forwarding-pipe-requests' = 'hybrid', -WITH PROCESSOR ( - -- 数据处理插件,即不做任何处理 - 'processor' = 'do-nothing-processor', -) + [ = ,], +-- 数据连接插件,必填插件 WITH CONNECTOR ( - -- IoTDB 数据发送插件,目标端为 IoTDB - 'connector' = '', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip - 'connector.ip' = '127.0.0.1', - -- 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port - 'connector.port' = '6667', + [ = ,], ) ``` -## 同步任务 - 管理 -数据同步任务有三种状态:RUNNING、STOPPED和DROPPED。 -任务状态转换如下图所示: 
-![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) +### 同步任务 - 管理 -- 任务的初始状态为停止状态(STOPPED)。可以使用SQL语句启动任务,将状态从STOPPED转换为RUNNING。 +数据同步任务有三种状态:RUNNING、STOPPED和DROPPED。任务状态转换如下图所示: -- 用户可以使用SQL语句手动将运行状态的任务停止,将状态从RUNNING转换为STOPPED。 +![状态迁移图](https://alioss.timecho.com/docs/img/%E7%8A%B6%E6%80%81%E8%BF%81%E7%A7%BB%E5%9B%BE.png) -- 当任务遇到无法恢复的错误时,其状态会自动从RUNNING转换为STOPPED,这表示任务无法继续执行数据同步操作。 +一个数据同步任务在生命周期中会经过多种状态: -- 如果需要删除一个任务,可以使用相应命令。删除之前无需转换为STOPPED状态。 +- RUNNING: 运行状态。 +- STOPPED: 停止状态。 + - 说明1:任务的初始状态为停止状态,需要使用SQL语句启动任务 + - 说明2:用户也可以使用SQL语句手动将一个处于运行状态的任务停止,此时状态会从 RUNNING 变为 STOPPED + - 说明3:当一个任务出现无法恢复的错误时,其状态会自动从 RUNNING 变为 STOPPED +- DROPPED:删除状态。 我们提供以下SQL语句对同步任务进行状态管理。 -### 启动任务 +#### 启动任务 创建之后,任务不会立即被处理,需要启动任务。使用`START PIPE`语句来启动任务,从而开始处理数据: @@ -93,7 +79,7 @@ WITH CONNECTOR ( START PIPE ``` -### 停止任务 +#### 停止任务 停止处理数据: @@ -101,7 +87,7 @@ START PIPE STOP PIPE ``` -### 删除任务 +#### 删除任务 删除指定任务: @@ -109,7 +95,7 @@ STOP PIPE DROP PIPE ``` -### 查看任务 +#### 查看任务 查看全部任务: @@ -123,7 +109,7 @@ SHOW PIPES SHOW PIPE ``` -## 插件 +### 插件 为了使得整体架构更加灵活以匹配不同的同步场景需求,在上述同步任务框架中IoTDB支持进行插件组装。系统为您预置了一些常用插件可直接使用,同时您也可以自定义 processor 插件和 connector 插件,并加载至IoTDB系统进行使用。 @@ -133,26 +119,23 @@ SHOW PIPE | 处理(Process) | Processor 插件 | do-nothing-processor | 支持 | | 发送(Connect) | Connector 插件 | iotdb-thrift-sync-connector iotdb-thrift-async-connector iotdb-legacy-pipe-connector iotdb-air-gap-connector websocket - connector | 支持 | -### 预置插件 +#### 预置插件 预置插件如下: | 插件名称 | 类型 | 介绍 | 适用版本 | | ---------------------------- | ---- | ------------------------------------------------------------ | --------- | -| iotdb-extractor | extractor 插件 | 抽取 IoTDB 内部的历史或实时数据进入 pipe | 1.2.x | -| do-nothing-processor | processor 插件 | 不对 extractor 传入的事件做任何的处理 | 1.2.x | -| iotdb-thrift-sync-connector | connector 插件 | 主要用于 IoTDB(v1.2.0及以上)与 IoTDB(v1.2.0及以上)之间的数据传输。 使用 Thrift RPC 框架传输数据,单线程 blocking IO 模型 | 1.2.x | -| iotdb-thrift-async-connector | connector 插件 | 用于 IoTDB(v1.2.0及以上)与 IoTDB(v1.2.0及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景 | 1.2.x | -| iotdb-legacy-pipe-connector | connector 插件 | 用于 IoTDB(v1.2.0及以上)与低版本的 IoTDB (V1.2.0以前)之间的数据传输。 使用 Thrift RPC 框架传输数据 | 1.2.x | +| iotdb-extractor | extractor 插件 | 默认的extractor插件,用于抽取 IoTDB 历史或实时数据 | 1.2.x | +| do-nothing-processor | processor 插件 | 默认的processor插件,不对传入的数据做任何的处理 | 1.2.x | +| iotdb-thrift-connector | connector 插件 | 用于 IoTDB(v1.2.0及以上)与 IoTDB(v1.2.0及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景 | 1.2.x | | iotdb-air-gap-connector | connector 插件 | 用于 IoTDB(v1.2.2+)向 IoTDB(v1.2.2+)跨单向数据网闸的数据同步。支持的网闸型号包括南瑞 Syskeeper 2000 等 | 1.2.1以上 | -| websocket - connector | connector 插件 | 用于flink sql connector 传输数据 | 1.2.2以上 | 每个插件的属性参考[参数说明](#connector-参数)。 -### 自定义插件 +#### 自定义插件 自定义插件方法参考[自定义流处理插件开发](Streaming_timecho.md#自定义流处理插件开发)一章。 -### 查看插件 +#### 查看插件 查看系统中的插件(含自定义与内置插件)可以用以下语句: @@ -178,9 +161,9 @@ IoTDB> SHOW PIPEPLUGINS +----------------------------+----------+--------------------------------------------------------------------------------+----------------------------------------------------+ ``` -# 使用示例 +## 使用示例 -## 全量数据同步 +### 全量数据同步 创建一个名为 A2B, 功能为同步 A IoTDB 到 B IoTDB 间的全量数据,数据链路如下图所示: @@ -195,8 +178,12 @@ with connector ( 'connector.port'='6668' ) ``` +> 💎 ​**extractor.realtime.mode:数据抽取的模式** +> - **​log**:该模式下,任务仅使用操作日志进行数据处理、发送 +> - **file**:该模式下,任务仅使用数据文件进行数据处理、发送 +> - **hybrid**:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量 
发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的>数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 -## 部分数据同步 +### 部分数据同步 创建一个名为 实时数据, 功能为同步 A IoTDB 到 B IoTDB 间的2023年8月23日8点到2023年10月23日8点的数据,数据链路如下图所示。 @@ -217,12 +204,13 @@ with connector ( 'connector.node-urls'='xxxx:6668', 'connector.batch.enable'='false') ``` -> 📌 'extractor.realtime.mode'='file'表示实时数据的抽取模式为 file 模式,该模式下,任务仅使用数据文件进行数据处理、发送。 - -> 📌'extractor.realtime.enable' = 'false', 表示不同步实时数据,即创建该任务后到达的数据都不传输。 +> ✅ +> +> - `'extractor.realtime.mode'='file'`表示实时数据的抽取模式为 file 模式,该模式下,任务仅使用数据文件进行数据处理、发送。 +> - `'extractor.realtime.enable' = 'false'`, 表示不同步实时数据,即创建该任务后到达的数据都不传输。 +> - `start-time,end-time` 应为 ISO 格式。 -> 📌 start-time,end-time 应为 ISO 格式。 -## 双向数据传输 +### 双向数据传输 要实现两个 IoTDB 之间互相备份,实时同步的功能,如下图所示: @@ -255,12 +243,15 @@ with connector ( 'connector.port'='6667' ) ``` -> 📌 'extractor.history.enable' = 'false'表示不传输历史数据,即不同步创建该任务前的数据。 -> 📌 'extractor.forwarding-pipe-requests' = 'false'表示不转发从另一 pipe 传输而来的数据,A 和 B 上的的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发。 +> 💎 +> +> - `'extractor.history.enable' = 'false'`表示不传输历史数据,即不同步创建该任务前的数据。 +> +> - `'extractor.forwarding-pipe-requests' = 'false'`表示不转发从另一 pipe 传输而来的数据,A 和 B 上的的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发。 -## 级联数据传输 +### 级联数据传输 要实现 A IoTDB 到 B IoTDB 到 C IoTDB 之间的级联数据传输链路,如下图所示: @@ -292,7 +283,7 @@ with connector ( ) ``` -## 跨网闸数据传输 +### 跨网闸数据传输 创建一个名为 A2B 的pipe,实现内网服务器上的 A,经由单向网闸,传输数据到外网服务器上的B,如下图所示: @@ -310,16 +301,17 @@ with connector ( ) ``` -# 参考:注意事项 +## 参考:注意事项 -- 使用数据同步功能,请保证接收端开启自动创建元数据; -- Pipe 中的数据含义: +> 📌 使用数据同步功能,请保证接收端开启自动创建元数据 -1. 历史数据抽取:所有 arrival time < 创建 pipe 时当前系统时间的数据称为历史数据 -2. 实时数据抽取:所有 arrival time >= 创建 pipe 时当前系统时间的数据称为实时数据 -3. 全量数据 = 历史数据 + 实时数据 +> ❗️ **Pipe 中的数据含义** +> +> * 历史数据:所有 arrival time < 创建 pipe 时当前系统时间的数据称为历史数据 +> * 实时数据:所有 arrival time >= 创建 pipe 时当前系统时间的数据称为实时数据 +> * 全量数据: 全量数据 = 历史数据 + 实时数据 -- 可通过修改 IoTDB 配置文件(iotdb-common.properties)以调整数据同步的参数,如同步数据存储目录等。完整配置如下: +可通过修改 IoTDB 配置文件(iotdb-common.properties)以调整数据同步的参数,如同步数据存储目录等。完整配置如下: ```Go #################### @@ -359,9 +351,9 @@ with connector ( # pipe_air_gap_receiver_port=9780 ``` -# 参考:参数说明 +## 参考:参数说明 -## extractor 参数 +### extractor 参数 | key | value | value 取值范围 | 是否必填 |默认取值| | ---------------------------------- | ------------------------------------------------ | -------------------------------------- | -------- |------| @@ -375,9 +367,9 @@ with connector ( | extractor.forwarding-pipe-requests | 是否转发由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | 选填 | true | -## connector 参数 +### connector 参数 -#### iotdb-thrift-sync-connector(别名:iotdb-thrift-connector) +#### iotdb-thrift-connector | key | value | value 取值范围 | 是否必填 | 默认取值 | | --------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------------------------------------- | From 5ea116da7c527ce73f769e2e3bfaa440a56e5ace Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Wed, 8 Nov 2023 19:00:19 +0800 Subject: [PATCH 26/27] 5 --- .../V1.2.x/User-Manual/Data-Sync_timecho.md | 99 +++++++------------ 1 file changed, 36 insertions(+), 63 deletions(-) diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md b/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md index 607e8c99..34f9cfeb 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md @@ -94,7 +94,7 @@ STOP PIPE ```Go DROP 
PIPE ``` - +删除任务不需要先停止同步任务。 #### 查看任务 查看全部任务: @@ -111,29 +111,24 @@ SHOW PIPE ### 插件 -为了使得整体架构更加灵活以匹配不同的同步场景需求,在上述同步任务框架中IoTDB支持进行插件组装。系统为您预置了一些常用插件可直接使用,同时您也可以自定义 processor 插件和 connector 插件,并加载至IoTDB系统进行使用。 +为了使得整体架构更加灵活以匹配不同的同步场景需求,在上述同步任务框架中IoTDB支持进行插件组装。系统为您预置了一些常用插件可直接使用,同时您也可以自定义 connector 插件,并加载至IoTDB系统进行使用。 | 模块 | 插件 | 预置插件 | 自定义插件 | | --- | --- | --- | --- | | 抽取(Extract) | Extractor 插件 | iotdb-extractor | 不支持 | -| 处理(Process) | Processor 插件 | do-nothing-processor | 支持 | | 发送(Connect) | Connector 插件 | iotdb-thrift-sync-connector iotdb-thrift-async-connector iotdb-legacy-pipe-connector iotdb-air-gap-connector websocket - connector | 支持 | #### 预置插件 -预置插件如下: +预置插件如下(部分插件为系统内部插件,将在1.3.0版本中删除): | 插件名称 | 类型 | 介绍 | 适用版本 | | ---------------------------- | ---- | ------------------------------------------------------------ | --------- | | iotdb-extractor | extractor 插件 | 默认的extractor插件,用于抽取 IoTDB 历史或实时数据 | 1.2.x | -| do-nothing-processor | processor 插件 | 默认的processor插件,不对传入的数据做任何的处理 | 1.2.x | | iotdb-thrift-connector | connector 插件 | 用于 IoTDB(v1.2.0及以上)与 IoTDB(v1.2.0及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景 | 1.2.x | | iotdb-air-gap-connector | connector 插件 | 用于 IoTDB(v1.2.2+)向 IoTDB(v1.2.2+)跨单向数据网闸的数据同步。支持的网闸型号包括南瑞 Syskeeper 2000 等 | 1.2.1以上 | -每个插件的属性参考[参数说明](#connector-参数)。 -#### 自定义插件 - -自定义插件方法参考[自定义流处理插件开发](Streaming_timecho.md#自定义流处理插件开发)一章。 +每个插件的详细参数参考[参数说明](#connector-参数)。 #### 查看插件 @@ -165,10 +160,11 @@ IoTDB> SHOW PIPEPLUGINS ### 全量数据同步 -创建一个名为 A2B, 功能为同步 A IoTDB 到 B IoTDB 间的全量数据,数据链路如下图所示: +同步两个 IoTDB 之间的所有数据,例如下面场景,创建一个名为 A2B, 功能为同步 A IoTDB 到 B IoTDB 间的全量数据,数据链路如下图所示: ![](https://alioss.timecho.com/docs/img/w1.png) -可使用如下语句: + +可使用简化的创建任务语句: ```Go create pipe A2B @@ -178,25 +174,33 @@ with connector ( 'connector.port'='6668' ) ``` -> 💎 ​**extractor.realtime.mode:数据抽取的模式** -> - **​log**:该模式下,任务仅使用操作日志进行数据处理、发送 -> - **file**:该模式下,任务仅使用数据文件进行数据处理、发送 -> - **hybrid**:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量 发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的>数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 +在这个例子中,connector 任务用到的是 iotdb-thrift-connector 插件,需指定接收端地址,这个例子中指定了'connector.ip'和'connector.port',也可指定'connector.node-urls',如下面的例子。 + +> 📌 注:使用数据同步功能,请保证接收端开启自动创建元数据 ### 部分数据同步 -创建一个名为 实时数据, 功能为同步 A IoTDB 到 B IoTDB 间的2023年8月23日8点到2023年10月23日8点的数据,数据链路如下图所示。 + +> ❗️ **Pipe 中的数据含义** +> +> * 历史数据:所有 arrival time < 创建 pipe 时当前系统时间的数据称为历史数据 +> * 实时数据:所有 arrival time >= 创建 pipe 时当前系统时间的数据称为实时数据 +> * 全量数据: 全量数据 = 历史数据 + 实时数据 + +同步某个时间范围的数据,例如下面场景,创建一个名为 A2B, 功能为同步 A IoTDB 到 B IoTDB 间2023年8月23日8点到2023年10月23日8点的数据,数据链路如下图所示。 ![](https://alioss.timecho.com/docs/img/w2.png) -可使用如下语句: +此时,我们需要使用 extractor 来定义传输数据的范围。由于传输的是历史数据(历史数据是指同步任务创建之前存在的数据),所以需要将extractor.realtime.enable参数配置为false,即不同步实时数据(实时数据是指同步任务创建之后存在的数据),同时将 extractor.realtime.mode设置为 hybrid,表示使用 hybrid模式传输数据。 -```Go +详细语句如下: + +```SQL create pipe A2B WITH EXTRACTOR ( 'extractor'= 'iotdb-extractor', 'extractor.realtime.enable' = 'false', -'extractor.realtime.mode'='file', +'extractor.realtime.mode'='hybrid', 'extractor.history.start-time' = '2023.08.23T08:00:00+00:00', 'extractor.history.end-time' = '2023.10.23T08:00:00+00:00') with connector ( @@ -204,11 +208,12 @@ with connector ( 'connector.node-urls'='xxxx:6668', 'connector.batch.enable'='false') ``` -> ✅ -> -> - `'extractor.realtime.mode'='file'`表示实时数据的抽取模式为 file 模式,该模式下,任务仅使用数据文件进行数据处理、发送。 -> - `'extractor.realtime.enable' = 'false'`, 表示不同步实时数据,即创建该任务后到达的数据都不传输。 -> - 
`start-time,end-time` 应为 ISO 格式。 + +> 💎 ​**extractor.realtime.mode:数据抽取的模式** +> - **​log**:该模式下,任务仅使用操作日志进行数据处理、发送 +> - **file**:该模式下,任务仅使用数据文件进行数据处理、发送 +> - **hybrid**:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量 发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的>数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 + ### 双向数据传输 @@ -216,7 +221,11 @@ with connector ( ![](https://alioss.timecho.com/docs/img/w3.png) -可创建两个子任务, 功能为双向同步 A IoTDB 到 B IoTDB 间的实时数据,在 A IoTDB 上执行下列语句: + 在这个场景中,需要将参数`extractor.forwarding-pipe-requests` 设置为 `false`,表示不转发从另一 pipe 传输而来的数据,A 和 B 上的的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发。 + + `'extractor.history.enable' = 'false'`表示不传输历史数据,即不同步创建该任务前的数据。 + + 可创建两个子任务, 功能为双向同步 A IoTDB 到 B IoTDB 间的实时数据,在 A IoTDB 上执行下列语句: ```Go create pipe AB @@ -244,12 +253,6 @@ with connector ( ) ``` -> 💎 -> -> - `'extractor.history.enable' = 'false'`表示不传输历史数据,即不同步创建该任务前的数据。 -> -> - `'extractor.forwarding-pipe-requests' = 'false'`表示不转发从另一 pipe 传输而来的数据,A 和 B 上的的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发。 - ### 级联数据传输 @@ -275,7 +278,7 @@ with connector ( ```Go create pipe BC with extractor ( - 'extractor.forwarding-pipe-requests' = 'false', + 'extractor.forwarding-pipe-requests' = 'true', with connector ( 'connector'='iotdb-thrift-connector', 'connector.ip'='127.0.0.1', @@ -290,7 +293,7 @@ with connector ( ![](https://alioss.timecho.com/docs/img/w5.png) -配置网闸后,在 A IoTDB 上执行下列语句: +数据穿透网闸需要使用 connector 任务中的iotdb-air-gap-connector 插件(目前支持部分型号网闸,具体型号请联系天谋科技工作人员确认),配置网闸后,在 A IoTDB 上执行下列语句: ```Go create pipe A2B @@ -303,14 +306,6 @@ with connector ( ## 参考:注意事项 -> 📌 使用数据同步功能,请保证接收端开启自动创建元数据 - -> ❗️ **Pipe 中的数据含义** -> -> * 历史数据:所有 arrival time < 创建 pipe 时当前系统时间的数据称为历史数据 -> * 实时数据:所有 arrival time >= 创建 pipe 时当前系统时间的数据称为实时数据 -> * 全量数据: 全量数据 = 历史数据 + 实时数据 - 可通过修改 IoTDB 配置文件(iotdb-common.properties)以调整数据同步的参数,如同步数据存储目录等。完整配置如下: ```Go @@ -381,29 +376,7 @@ with connector ( | connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | | connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填 -#### iotdb-thrift-async-connector - -| key | value | value 取值范围 | 是否必填 | 默认取值 | -| --------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------------------------------------- | -| connector | iotdb-thrift-async-connector | String: iotdb-thrift-async-connector | 必填 | | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | 选填 | 与 connector.node-urls 任选其一填写 | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | 选填 | 与 connector.node-urls 任选其一填写 | -| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 选填 | 与 connector.ip:connector.port 任选其一填写 | -| connector.batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | 选填 | true | -| connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | -| connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填 | 16 * 1024 * 1024 (16MiB) | - - -#### iotdb-legacy-pipe-connector -| key | value | value 取值范围 | 是否必填 | 默认取值 | -| ------------------ | ------------------------------------------------------------ | ----------------------------------- | -------- | -------- | -| connector | iotdb-legacy-pipe-connector | String: iotdb-legacy-pipe-connector | 必填 | - | -| connector.ip | 
目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | 选填 | - | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | 选填 | - | -| connector.user | 目标端 IoTDB 的用户名,注意该用户需要支持数据写入、TsFile Load 的权限 | String | 选填 | root | -| connector.password | 目标端 IoTDB 的密码,注意该用户需要支持数据写入、TsFile Load 的权限 | String | 选填 | root | -| connector.version | 目标端 IoTDB 的版本,用于伪装自身实际版本,绕过目标端的版本一致性检查 | String | 选填 | 1.1 | #### iotdb-air-gap-connector From e30129c570f1b67e343f4a85e2bb413d1b8e8b19 Mon Sep 17 00:00:00 2001 From: wanghui42 Date: Thu, 9 Nov 2023 18:40:32 +0800 Subject: [PATCH 27/27] 6 --- .../V1.2.x/User-Manual/Data-Sync_timecho.md | 78 +++++++++---------- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md b/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md index 34f9cfeb..7acf9149 100644 --- a/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md +++ b/src/zh/UserGuide/V1.2.x/User-Manual/Data-Sync_timecho.md @@ -51,6 +51,7 @@ WITH CONNECTOR ( [ = ,], ) ``` +> 📌 注:使用数据同步功能,请保证接收端开启自动创建元数据 @@ -116,11 +117,11 @@ SHOW PIPE | 模块 | 插件 | 预置插件 | 自定义插件 | | --- | --- | --- | --- | | 抽取(Extract) | Extractor 插件 | iotdb-extractor | 不支持 | -| 发送(Connect) | Connector 插件 | iotdb-thrift-sync-connector iotdb-thrift-async-connector iotdb-legacy-pipe-connector iotdb-air-gap-connector websocket - connector | 支持 | +| 发送(Connect) | Connector 插件 | iotdb-thrift-connector、iotdb-air-gap-connector| 支持 | #### 预置插件 -预置插件如下(部分插件为系统内部插件,将在1.3.0版本中删除): +预置插件如下: | 插件名称 | 类型 | 介绍 | 适用版本 | | ---------------------------- | ---- | ------------------------------------------------------------ | --------- | @@ -128,7 +129,7 @@ SHOW PIPE | iotdb-thrift-connector | connector 插件 | 用于 IoTDB(v1.2.0及以上)与 IoTDB(v1.2.0及以上)之间的数据传输。使用 Thrift RPC 框架传输数据,多线程 async non-blocking IO 模型,传输性能高,尤其适用于目标端为分布式时的场景 | 1.2.x | | iotdb-air-gap-connector | connector 插件 | 用于 IoTDB(v1.2.2+)向 IoTDB(v1.2.2+)跨单向数据网闸的数据同步。支持的网闸型号包括南瑞 Syskeeper 2000 等 | 1.2.1以上 | -每个插件的详细参数参考[参数说明](#connector-参数)。 +每个插件的详细参数可参考本文[参数说明](#connector-参数)章节。 #### 查看插件 @@ -138,7 +139,7 @@ SHOW PIPE SHOW PIPEPLUGINS ``` -返回结果如下(1.2.2 版本): +返回结果如下(其中部分插件为系统内部插件,将在1.3.0版本中删除): ```Go IoTDB> SHOW PIPEPLUGINS @@ -160,11 +161,11 @@ IoTDB> SHOW PIPEPLUGINS ### 全量数据同步 -同步两个 IoTDB 之间的所有数据,例如下面场景,创建一个名为 A2B, 功能为同步 A IoTDB 到 B IoTDB 间的全量数据,数据链路如下图所示: +本例子用来演示将一个 IoTDB 的所有数据同步至另一个IoTDB,数据链路如下图所示: ![](https://alioss.timecho.com/docs/img/w1.png) -可使用简化的创建任务语句: +在这个例子中,我们可以创建一个名为 A2B 的同步任务,用来同步 A IoTDB 到 B IoTDB 间的全量数据,这里需要用到用到 connector 的 iotdb-thrift-connector 插件(内置插件),需指定接收端地址,这个例子中指定了'connector.ip'和'connector.port',也可指定'connector.node-urls',如下面的示例语句: ```Go create pipe A2B @@ -174,24 +175,15 @@ with connector ( 'connector.port'='6668' ) ``` -在这个例子中,connector 任务用到的是 iotdb-thrift-connector 插件,需指定接收端地址,这个例子中指定了'connector.ip'和'connector.port',也可指定'connector.node-urls',如下面的例子。 - -> 📌 注:使用数据同步功能,请保证接收端开启自动创建元数据 -### 部分数据同步 +### 历史数据同步 -> ❗️ **Pipe 中的数据含义** -> -> * 历史数据:所有 arrival time < 创建 pipe 时当前系统时间的数据称为历史数据 -> * 实时数据:所有 arrival time >= 创建 pipe 时当前系统时间的数据称为实时数据 -> * 全量数据: 全量数据 = 历史数据 + 实时数据 - -同步某个时间范围的数据,例如下面场景,创建一个名为 A2B, 功能为同步 A IoTDB 到 B IoTDB 间2023年8月23日8点到2023年10月23日8点的数据,数据链路如下图所示。 +本例子用来演示同步某个历史时间范围(2023年8月23日8点到2023年10月23日8点)的数据至另一个IoTDB,数据链路如下图所示: ![](https://alioss.timecho.com/docs/img/w2.png) -此时,我们需要使用 extractor 来定义传输数据的范围。由于传输的是历史数据(历史数据是指同步任务创建之前存在的数据),所以需要将extractor.realtime.enable参数配置为false,即不同步实时数据(实时数据是指同步任务创建之后存在的数据),同时将 extractor.realtime.mode设置为 hybrid,表示使用 hybrid模式传输数据。 +在这个例子中,我们可以创建一个名为 A2B 的同步任务。首先我们需要在 extractor 
中定义传输数据的范围,由于传输的是历史数据(历史数据是指同步任务创建之前存在的数据),所以需要将extractor.realtime.enable参数配置为false;同时需要配置数据的起止时间start-time和end-time以及传输的模式mode,此处推荐mode设置为 hybrid 模式(hybrid模式为混合传输,在无数据积压时采用实时传输方式,有数据积压时采用批量传输方式,并根据系统内部情况自动切换)。 详细语句如下: @@ -209,23 +201,18 @@ with connector ( 'connector.batch.enable'='false') ``` -> 💎 ​**extractor.realtime.mode:数据抽取的模式** -> - **​log**:该模式下,任务仅使用操作日志进行数据处理、发送 -> - **file**:该模式下,任务仅使用数据文件进行数据处理、发送 -> - **hybrid**:该模式,考虑了按操作日志逐条目发送数据时延迟低但吞吐低的特点,以及按数据文件批量 发送时发送吞吐高但延迟高的特点,能够在不同的写入负载下自动切换适合的>数据抽取方式,首先采取基于操作日志的数据抽取方式以保证低发送延迟,当产生数据积压时自动切换成基于数据文件的数据抽取方式以保证高发送吞吐,积压消除时自动切换回基于操作日志的数据抽取方式,避免了采用单一数据抽取算法难以平衡数据发送延迟或吞吐的问题。 - ### 双向数据传输 -要实现两个 IoTDB 之间互相备份,实时同步的功能,如下图所示: +本例子用来演示两个 IoTDB 之间互为双活的场景,数据链路如下图所示: ![](https://alioss.timecho.com/docs/img/w3.png) - 在这个场景中,需要将参数`extractor.forwarding-pipe-requests` 设置为 `false`,表示不转发从另一 pipe 传输而来的数据,A 和 B 上的的 pipe 都需要将该参数设置为 false,否则将会造成数据无休止的集群间循环转发。 - - `'extractor.history.enable' = 'false'`表示不传输历史数据,即不同步创建该任务前的数据。 +在这个例子中,为了避免数据无限循环,需要将A和B上的参数`extractor.forwarding-pipe-requests` 均设置为 `false`,表示不转发从另一pipe传输而来的数据。同时将`'extractor.history.enable'` 设置为 `false`,表示不传输历史数据,即不同步创建该任务前的数据。 - 可创建两个子任务, 功能为双向同步 A IoTDB 到 B IoTDB 间的实时数据,在 A IoTDB 上执行下列语句: +详细语句如下: + +在 A IoTDB 上执行下列语句: ```Go create pipe AB @@ -256,16 +243,17 @@ with connector ( ### 级联数据传输 -要实现 A IoTDB 到 B IoTDB 到 C IoTDB 之间的级联数据传输链路,如下图所示: + +本例子用来演示多个 IoTDB 之间级联传输数据的场景,数据由A集群同步至B集群,再同步至C集群,数据链路如下图所示: ![](https://alioss.timecho.com/docs/img/w4.png) -创建一个名为 AB 的pipe,在 A IoTDB 上执行下列语句: +在这个例子中,为了将A集群的数据同步至C,在BC之间的pipe需要将 `extractor.forwarding-pipe-requests` 配置为`true`,详细语句如下: + +在A IoTDB上执行下列语句,将A中数据同步至B: ```Go create pipe AB -with extractor ( - 'extractor.forwarding-pipe-requests', with connector ( 'connector'='iotdb-thrift-connector', 'connector.ip'='127.0.0.1', @@ -273,7 +261,7 @@ with connector ( ) ``` -创建一个名为 BC 的pipe,在 B IoTDB 上执行下列语句: +在B IoTDB上执行下列语句,将B中数据同步至C: ```Go create pipe BC @@ -288,12 +276,11 @@ with connector ( ### 跨网闸数据传输 -创建一个名为 A2B 的pipe,实现内网服务器上的 A,经由单向网闸,传输数据到外网服务器上的B,如下图所示: +本例子用来演示将一个 IoTDB 的数据,经过单向网闸,同步至另一个IoTDB的场景,数据链路如下图所示: ![](https://alioss.timecho.com/docs/img/w5.png) - -数据穿透网闸需要使用 connector 任务中的iotdb-air-gap-connector 插件(目前支持部分型号网闸,具体型号请联系天谋科技工作人员确认),配置网闸后,在 A IoTDB 上执行下列语句: +在这个例子中,需要使用 connector 任务中的iotdb-air-gap-connector 插件(目前支持部分型号网闸,具体型号请联系天谋科技工作人员确认),配置网闸后,在 A IoTDB 上执行下列语句,其中ip和port填写网闸信息,详细语句如下: ```Go create pipe A2B @@ -361,6 +348,19 @@ with connector ( | extractor.realtime.mode | 实时数据的抽取模式 | String: hybrid, log, file | 选填 | hybrid | | extractor.forwarding-pipe-requests | 是否转发由其他 Pipe (通常是数据同步)写入的数据 | Boolean: true, false | 选填 | true | +> 💎 **说明:历史数据与实时数据的差异** +> +> * **历史数据**:所有 arrival time < 创建 pipe 时当前系统时间的数据称为历史数据 +> * **实时数据**:所有 arrival time >= 创建 pipe 时当前系统时间的数据称为实时数据 +> * **全量数据**: 全量数据 = 历史数据 + 实时数据 + + +> 💎 ​**说明:数据抽取模式hybrid, log和file的差异** +> +> - **hybrid(推荐)**:该模式下,任务将优先对数据进行实时处理、发送,当数据产生积压时自动切换至批量发送模式,其特点是平衡了数据同步的时效性和吞吐量 +> - **​log**:该模式下,任务将对数据进行实时处理、发送,其特点是高时效、低吞吐 +> - **file**:该模式下,任务将对数据进行批量(按底层数据文件)处理、发送,其特点是低时效、高吞吐 + ### connector 参数 @@ -369,9 +369,9 @@ with connector ( | key | value | value 取值范围 | 是否必填 | 默认取值 | | --------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | ------------------------------------------- | | connector | iotdb-thrift-connector 或 iotdb-thrift-sync-connector | String: iotdb-thrift-connector 或 iotdb-thrift-sync-connector | 必填 | | -| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip | String | 选填 | 与 
connector.node-urls 任选其一填写 | -| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port | Integer | 选填 | 与 connector.node-urls 任选其一填写 | -| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 选填 | 与 connector.ip:connector.port 任选其一填写 | +| connector.ip | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 ip(请注意同步任务不支持向自身服务进行转发) | String | 选填 | 与 connector.node-urls 任选其一填写 | +| connector.port | 目标端 IoTDB 其中一个 DataNode 节点的数据服务 port(请注意同步任务不支持向自身服务进行转发) | Integer | 选填 | 与 connector.node-urls 任选其一填写 | +| connector.node-urls | 目标端 IoTDB 任意多个 DataNode 节点的数据服务端口的 url(请注意同步任务不支持向自身服务进行转发) | String。例:'127.0.0.1:6667,127.0.0.1:6668,127.0.0.1:6669', '127.0.0.1:6667' | 选填 | 与 connector.ip:connector.port 任选其一填写 | | connector.batch.enable | 是否开启日志攒批发送模式,用于提高传输吞吐,降低 IOPS | Boolean: true, false | 选填 | true | | connector.batch.max-delay-seconds | 在开启日志攒批发送模式时生效,表示一批数据在发送前的最长等待时间(单位:s) | Integer | 选填 | 1 | | connector.batch.size-bytes | 在开启日志攒批发送模式时生效,表示一批数据最大的攒批大小(单位:byte) | Long | 选填