diff --git a/src/UserGuide/Master/User-Manual/Database-Programming.md b/src/UserGuide/Master/User-Manual/Database-Programming.md index 0ce6cf74..121ef4a6 100644 --- a/src/UserGuide/Master/User-Manual/Database-Programming.md +++ b/src/UserGuide/Master/User-Manual/Database-Programming.md @@ -1051,7 +1051,7 @@ In IoTDB, you can expand two types of UDF: | UDF Class | Description | | --------------------------------------------------- | ------------------------------------------------------------ | | UDTF(User Defined Timeseries Generating Function) | This type of function can take **multiple** time series as input, and output **one** time series, which can have any number of data points. | -| UDAF(User Defined Aggregation Function) | Under development, please stay tuned. | +| UDAF(User Defined Aggregation Function) | Custom Aggregation Functions. This type of function can take one time series as input, and output **one** aggregated data point for each group based on the GROUP BY type. | ### UDF Development Dependencies @@ -1410,6 +1410,260 @@ This method is called by the framework. For a UDF instance, `beforeDestroy` will +### UDAF (User Defined Aggregation Function) + +A complete definition of UDAF involves two classes, `State` and `UDAF`. + +#### State Class + +To write your own `State`, you need to implement the `org.apache.iotdb.udf.api.State` interface. + +The following table shows all the interfaces available for user implementation. + +| Interface Definition | Description | Required to Implement | +| -------------------------------- | ------------------------------------------------------------ | --------------------- | +| `void reset()` | To reset the `State` object to its initial state, you need to fill in the initial values of the fields in the `State` class within this method as if you were writing a constructor. | Required | +| `byte[] serialize()` | Serializes `State` to binary data. This method is used for IoTDB internal `State` passing. Note that the order of serialization must be consistent with the following deserialization methods. | Required | +| `void deserialize(byte[] bytes)` | Deserializes binary data to `State`. This method is used for IoTDB internal `State` passing. Note that the order of deserialization must be consistent with the serialization method above. | Required | + +The following section describes the usage of each interface in detail. + + + +##### void reset() + +This method resets the `State` to its initial state, you need to fill in the initial values of the fields in the `State` object in this method. For optimization reasons, IoTDB reuses `State` as much as possible internally, rather than creating a new `State` for each group, which would introduce unnecessary overhead. When `State` has finished updating the data in a group, this method is called to reset to the initial state as a way to process the next group. + +In the case of `State` for averaging (aka `avg`), for example, you would need the sum of the data, `sum`, and the number of entries in the data, `count`, and initialize both to 0 in the `reset()` method. + +```java +class AvgState implements State { + double sum; + + long count; + + @Override + public void reset() { + sum = 0; + count = 0; + } + + // other methods +} +``` + + + +##### byte[] serialize()/void deserialize(byte[] bytes) + +These methods serialize the `State` into binary data, and deserialize the `State` from the binary data. IoTDB, as a distributed database, involves passing data among different nodes, so you need to write these two methods to enable the passing of the State among different nodes. Note that the order of serialization and deserialization must be the consistent. + +In the case of `State` for averaging (aka `avg`), for example, you can convert the content of State to `byte[]` array and read out the content of State from `byte[]` array in any way you want, the following shows the code for serialization/deserialization using `ByteBuffer` introduced by Java8: + +```java +@Override +public byte[] serialize() { + ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES); + buffer.putDouble(sum); + buffer.putLong(count); + + return buffer.array(); +} + +@Override +public void deserialize(byte[] bytes) { + ByteBuffer buffer = ByteBuffer.wrap(bytes); + sum = buffer.getDouble(); + count = buffer.getLong(); +} +``` + + + +#### UDAF Classes + +To write a UDAF, you need to implement the `org.apache.iotdb.udf.api.UDAF` interface. + +The following table shows all the interfaces available for user implementation. + +| Interface definition | Description | Required to Implement | +| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------- | +| `void validate(UDFParameterValidator validator) throws Exception` | This method is mainly used to validate `UDFParameters` and it is executed before `beforeStart(UDFParameters, UDTFConfigurations)` is called. | Optional | +| `void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception` | Initialization method that invokes user-defined initialization behavior before UDAF processes the input data. Unlike UDTF, configuration is of type `UDAFConfiguration`. | Required | +| `State createState()` | To create a `State` object, usually just call the default constructor and modify the default initial value as needed. | Required | +| `void addInput(State state, Column[] columns, BitMap bitMap)` | Update `State` object according to the incoming data `Column[]` in batch, note that `column[0]` always represents the time column. In addition, `BitMap` represents the data that has been filtered out before, you need to manually determine whether the corresponding data has been filtered out when writing this method. | Required | +| `void combineState(State state, State rhs)` | Merge `rhs` state into `state` state. In a distributed scenario, the same set of data may be distributed on different nodes, IoTDB generates a `State` object for the partial data on each node, and then calls this method to merge it into the complete `State`. | Required | +| `void outputFinal(State state, ResultValue resultValue)` | Computes the final aggregated result based on the data in `State`. Note that according to the semantics of the aggregation, only one value can be output per group. | Required | +| `void beforeDestroy() ` | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | + +In the life cycle of a UDAF instance, the calling sequence of each method is as follows: + +1. `State createState()` +2. `void validate(UDFParameterValidator validator) throws Exception` +3. `void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception` +4. `void addInput(State state, Column[] columns, BitMap bitMap)` +5. `void combineState(State state, State rhs)` +6. `void outputFinal(State state, ResultValue resultValue)` +7. `void beforeDestroy()` + +Similar to UDTF, every time the framework executes a UDAF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDAF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDAF without considering the influence of concurrency and other factors. + +The usage of each interface will be described in detail below. + + + +##### void validate(UDFParameterValidator validator) throws Exception + +Same as UDTF, the `validate` method is used to validate the parameters entered by the user. + +In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification. + + + +##### void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception + + The `beforeStart` method does the same thing as the UDAF: + +1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. +2. Set the strategy to access the raw data and set the output data type in UDAFConfigurations. +3. Create resources, such as establishing external connections, opening files, etc. + +The role of the `UDFParameters` type can be seen above. + +###### UDAFConfigurations + +The difference from UDTF is that UDAF uses `UDAFConfigurations` as the type of `configuration` object. + +Currently, this class only supports setting the type of output data. + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setOutputDataType(Type.INT32); } +} +``` + +The relationship between the output type set in `setOutputDataType` and the type of data output that `ResultValue` can actually receive is as follows: + +| The output type set in `setOutputDataType` | The output type that `ResultValue` can actually receive | +| ------------------------------------------ | ------------------------------------------------------- | +| `INT32` | `int` | +| `INT64` | `long` | +| `FLOAT` | `float` | +| `DOUBLE` | `double` | +| `BOOLEAN` | `boolean` | +| `TEXT` | `org.apache.iotdb.udf.api.type.Binary` | + +The output type of the UDAF is determined at runtime. You can dynamically determine the output sequence type based on the input type. + +Here is a simple example: + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setOutputDataType(parameters.getDataType(0)); +} +``` + + + +##### State createState() + +This method creates and initializes a `State` object for UDAF. Due to the limitations of the Java language, you can only call the default constructor for the `State` class. The default constructor assigns a default initial value to all the fields in the class, and if that initial value does not meet your requirements, you need to initialize them manually within this method. + +The following is an example that includes manual initialization. Suppose you want to implement an aggregate function that multiply all numbers in the group, then your initial `State` value should be set to 1, but the default constructor initializes it to 0, so you need to initialize `State` manually after calling the default constructor: + +```java +public State createState() { + MultiplyState state = new MultiplyState(); + state.result = 1; + return state; +} +``` + + + +##### void addInput(State state, Column[] columns, BitMap bitMap) + +This method updates the `State` object with the raw input data. For performance reasons, also to align with the IoTDB vectorized query engine, the raw input data is no longer a data point, but an array of columns ``Column[]``. Note that the first column (i.e. `column[0]`) is always the time column, so you can also do different operations in UDAF depending on the time. + +Since the input parameter is not of a single data point type, but of multiple columns, you need to manually filter some of the data in the columns, which is why the third parameter, `BitMap`, exists. It identifies which of these columns have been filtered out, so you don't have to think about the filtered data in any case. + +Here's an example of `addInput()` that counts the number of items (aka count). It shows how you can use `BitMap` to ignore data that has been filtered out. Note that due to the limitations of the Java language, you need to do the explicit cast the `State` object from type defined in the interface to a custom `State` type at the beginning of the method, otherwise you won't be able to use the `State` object. + +```java +public void addInput(State state, Column[] column, BitMap bitMap) { + CountState countState = (CountState) state; + + int count = column[0].getPositionCount(); + for (int i = 0; i < count; i++) { + if (bitMap != null && !bitMap.isMarked(i)) { + continue; + } + if (!column[1].isNull(i)) { + countState.count++; + } + } +} +``` + + + +##### void combineState(State state, State rhs) + +This method combines two `State`s, or more precisely, updates the first `State` object with the second `State` object. IoTDB is a distributed database, and the data of the same group may be distributed on different nodes. For performance reasons, IoTDB will first aggregate some of the data on each node into `State`, and then merge the `State`s on different nodes that belong to the same group, which is what `combineState` does. + +Here's an example of `combineState()` for averaging (aka avg). Similar to `addInput`, you need to do an explicit type conversion for the two `State`s at the beginning. Also note that you are updating the value of the first `State` with the contents of the second `State`. + +```java +public void combineState(State state, State rhs) { + AvgState avgState = (AvgState) state; + AvgState avgRhs = (AvgState) rhs; + + avgState.count += avgRhs.count; + avgState.sum += avgRhs.sum; +} +``` + + + +##### void outputFinal(State state, ResultValue resultValue) + +This method works by calculating the final result from `State`. You need to access the various fields in `State`, derive the final result, and set the final result into the `ResultValue` object.IoTDB internally calls this method once at the end for each group. Note that according to the semantics of aggregation, the final result can only be one value. + +Here is another `outputFinal` example for averaging (aka avg). In addition to the forced type conversion at the beginning, you will also see a specific use of the `ResultValue` object, where the final result is set by `setXXX` (where `XXX` is the type name). + +```java +public void outputFinal(State state, ResultValue resultValue) { + AvgState avgState = (AvgState) state; + + if (avgState.count != 0) { + resultValue.setDouble(avgState.sum / avgState.count); + } else { + resultValue.setNull(); + } +} +``` + + + +##### void beforeDestroy() + +The method for terminating a UDF. + +This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once. + + + + + ### Maven Project Example If you use Maven, you can build your own UDF project referring to our **udf-example** module. You can find the project [here](https://github.com/apache/iotdb/tree/master/example/udf). diff --git a/src/UserGuide/latest/User-Manual/Database-Programming.md b/src/UserGuide/latest/User-Manual/Database-Programming.md index 0ce6cf74..98c91aa8 100644 --- a/src/UserGuide/latest/User-Manual/Database-Programming.md +++ b/src/UserGuide/latest/User-Manual/Database-Programming.md @@ -1051,7 +1051,7 @@ In IoTDB, you can expand two types of UDF: | UDF Class | Description | | --------------------------------------------------- | ------------------------------------------------------------ | | UDTF(User Defined Timeseries Generating Function) | This type of function can take **multiple** time series as input, and output **one** time series, which can have any number of data points. | -| UDAF(User Defined Aggregation Function) | Under development, please stay tuned. | +| UDAF(User Defined Aggregation Function) | Custom Aggregation Functions. This type of function can take one time series as input, and output **one** aggregated data point for each group based on the GROUP BY type. | ### UDF Development Dependencies @@ -1410,6 +1410,258 @@ This method is called by the framework. For a UDF instance, `beforeDestroy` will +### UDAF (User Defined Aggregation Function) + +A complete definition of UDAF involves two classes, `State` and `UDAF`. + +#### State Class + +To write your own `State`, you need to implement the `org.apache.iotdb.udf.api.State` interface. + +The following table shows all the interfaces available for user implementation. + +| Interface Definition | Description | Required to Implement | +| -------------------------------- | ------------------------------------------------------------ | --------------------- | +| `void reset()` | To reset the `State` object to its initial state, you need to fill in the initial values of the fields in the `State` class within this method as if you were writing a constructor. | Required | +| `byte[] serialize()` | Serializes `State` to binary data. This method is used for IoTDB internal `State` passing. Note that the order of serialization must be consistent with the following deserialization methods. | Required | +| `void deserialize(byte[] bytes)` | Deserializes binary data to `State`. This method is used for IoTDB internal `State` passing. Note that the order of deserialization must be consistent with the serialization method above. | Required | + +The following section describes the usage of each interface in detail. + + + +##### void reset() + +This method resets the `State` to its initial state, you need to fill in the initial values of the fields in the `State` object in this method. For optimization reasons, IoTDB reuses `State` as much as possible internally, rather than creating a new `State` for each group, which would introduce unnecessary overhead. When `State` has finished updating the data in a group, this method is called to reset to the initial state as a way to process the next group. + +In the case of `State` for averaging (aka `avg`), for example, you would need the sum of the data, `sum`, and the number of entries in the data, `count`, and initialize both to 0 in the `reset()` method. + +```java +class AvgState implements State { + double sum; + + long count; + + @Override + public void reset() { + sum = 0; + count = 0; + } + + // other methods +} +``` + + + +##### byte[] serialize()/void deserialize(byte[] bytes) + +These methods serialize the `State` into binary data, and deserialize the `State` from the binary data. IoTDB, as a distributed database, involves passing data among different nodes, so you need to write these two methods to enable the passing of the State among different nodes. Note that the order of serialization and deserialization must be the consistent. + +In the case of `State` for averaging (aka `avg`), for example, you can convert the content of State to `byte[]` array and read out the content of State from `byte[]` array in any way you want, the following shows the code for serialization/deserialization using `ByteBuffer` introduced by Java8: + +```java +@Override +public byte[] serialize() { + ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES); + buffer.putDouble(sum); + buffer.putLong(count); + + return buffer.array(); +} + +@Override +public void deserialize(byte[] bytes) { + ByteBuffer buffer = ByteBuffer.wrap(bytes); + sum = buffer.getDouble(); + count = buffer.getLong(); +} +``` + + + +#### UDAF Classes + +To write a UDAF, you need to implement the `org.apache.iotdb.udf.api.UDAF` interface. + +The following table shows all the interfaces available for user implementation. + +| Interface definition | Description | Required to Implement | +| ------------------------------------------------------------ | ------------------------------------------------------------ | --------------------- | +| `void validate(UDFParameterValidator validator) throws Exception` | This method is mainly used to validate `UDFParameters` and it is executed before `beforeStart(UDFParameters, UDTFConfigurations)` is called. | Optional | +| `void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception` | Initialization method that invokes user-defined initialization behavior before UDAF processes the input data. Unlike UDTF, configuration is of type `UDAFConfiguration`. | Required | +| `State createState()` | To create a `State` object, usually just call the default constructor and modify the default initial value as needed. | Required | +| `void addInput(State state, Column[] columns, BitMap bitMap)` | Update `State` object according to the incoming data `Column[]` in batch, note that `column[0]` always represents the time column. In addition, `BitMap` represents the data that has been filtered out before, you need to manually determine whether the corresponding data has been filtered out when writing this method. | Required | +| `void combineState(State state, State rhs)` | Merge `rhs` state into `state` state. In a distributed scenario, the same set of data may be distributed on different nodes, IoTDB generates a `State` object for the partial data on each node, and then calls this method to merge it into the complete `State`. | Required | +| `void outputFinal(State state, ResultValue resultValue)` | Computes the final aggregated result based on the data in `State`. Note that according to the semantics of the aggregation, only one value can be output per group. | Required | +| `void beforeDestroy() ` | This method is called by the framework after the last input data is processed, and will only be called once in the life cycle of each UDF instance. | Optional | + +In the life cycle of a UDAF instance, the calling sequence of each method is as follows: + +1. `State createState()` +2. `void validate(UDFParameterValidator validator) throws Exception` +3. `void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception` +4. `void addInput(State state, Column[] columns, BitMap bitMap)` +5. `void combineState(State state, State rhs)` +6. `void outputFinal(State state, ResultValue resultValue)` +7. `void beforeDestroy()` + +Similar to UDTF, every time the framework executes a UDAF query, a new UDF instance will be constructed. When the query ends, the corresponding instance will be destroyed. Therefore, the internal data of the instances in different UDAF queries (even in the same SQL statement) are isolated. You can maintain some state data in the UDAF without considering the influence of concurrency and other factors. + +The usage of each interface will be described in detail below. + + + +##### void validate(UDFParameterValidator validator) throws Exception + +Same as UDTF, the `validate` method is used to validate the parameters entered by the user. + +In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom verification. + + + +##### void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception + + The `beforeStart` method does the same thing as the UDAF: + +1. Use UDFParameters to get the time series paths and parse key-value pair attributes entered by the user. +2. Set the strategy to access the raw data and set the output data type in UDAFConfigurations. +3. Create resources, such as establishing external connections, opening files, etc. + +The role of the `UDFParameters` type can be seen above. + +###### UDAFConfigurations + +The difference from UDTF is that UDAF uses `UDAFConfigurations` as the type of `configuration` object. + +Currently, this class only supports setting the type of output data. + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setOutputDataType(Type.INT32); } +} +``` + +The relationship between the output type set in `setOutputDataType` and the type of data output that `ResultValue` can actually receive is as follows: + +| The output type set in `setOutputDataType` | The output type that `ResultValue` can actually receive | +| ------------------------------------------ | ------------------------------------------------------- | +| `INT32` | `int` | +| `INT64` | `long` | +| `FLOAT` | `float` | +| `DOUBLE` | `double` | +| `BOOLEAN` | `boolean` | +| `TEXT` | `org.apache.iotdb.udf.api.type.Binary` | + +The output type of the UDAF is determined at runtime. You can dynamically determine the output sequence type based on the input type. + +Here is a simple example: + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setOutputDataType(parameters.getDataType(0)); +} +``` + + + +##### State createState() + +This method creates and initializes a `State` object for UDAF. Due to the limitations of the Java language, you can only call the default constructor for the `State` class. The default constructor assigns a default initial value to all the fields in the class, and if that initial value does not meet your requirements, you need to initialize them manually within this method. + +The following is an example that includes manual initialization. Suppose you want to implement an aggregate function that multiply all numbers in the group, then your initial `State` value should be set to 1, but the default constructor initializes it to 0, so you need to initialize `State` manually after calling the default constructor: + +```java +public State createState() { + MultiplyState state = new MultiplyState(); + state.result = 1; + return state; +} +``` + + + +##### void addInput(State state, Column[] columns, BitMap bitMap) + +This method updates the `State` object with the raw input data. For performance reasons, also to align with the IoTDB vectorized query engine, the raw input data is no longer a data point, but an array of columns ``Column[]``. Note that the first column (i.e. `column[0]`) is always the time column, so you can also do different operations in UDAF depending on the time. + +Since the input parameter is not of a single data point type, but of multiple columns, you need to manually filter some of the data in the columns, which is why the third parameter, `BitMap`, exists. It identifies which of these columns have been filtered out, so you don't have to think about the filtered data in any case. + +Here's an example of `addInput()` that counts the number of items (aka count). It shows how you can use `BitMap` to ignore data that has been filtered out. Note that due to the limitations of the Java language, you need to do the explicit cast the `State` object from type defined in the interface to a custom `State` type at the beginning of the method, otherwise you won't be able to use the `State` object. + +```java +public void addInput(State state, Column[] column, BitMap bitMap) { + CountState countState = (CountState) state; + + int count = column[0].getPositionCount(); + for (int i = 0; i < count; i++) { + if (bitMap != null && !bitMap.isMarked(i)) { + continue; + } + if (!column[1].isNull(i)) { + countState.count++; + } + } +} +``` + + + +##### void combineState(State state, State rhs) + +This method combines two `State`s, or more precisely, updates the first `State` object with the second `State` object. IoTDB is a distributed database, and the data of the same group may be distributed on different nodes. For performance reasons, IoTDB will first aggregate some of the data on each node into `State`, and then merge the `State`s on different nodes that belong to the same group, which is what `combineState` does. + +Here's an example of `combineState()` for averaging (aka avg). Similar to `addInput`, you need to do an explicit type conversion for the two `State`s at the beginning. Also note that you are updating the value of the first `State` with the contents of the second `State`. + +```java +public void combineState(State state, State rhs) { + AvgState avgState = (AvgState) state; + AvgState avgRhs = (AvgState) rhs; + + avgState.count += avgRhs.count; + avgState.sum += avgRhs.sum; +} +``` + + + +##### void outputFinal(State state, ResultValue resultValue) + +This method works by calculating the final result from `State`. You need to access the various fields in `State`, derive the final result, and set the final result into the `ResultValue` object.IoTDB internally calls this method once at the end for each group. Note that according to the semantics of aggregation, the final result can only be one value. + +Here is another `outputFinal` example for averaging (aka avg). In addition to the forced type conversion at the beginning, you will also see a specific use of the `ResultValue` object, where the final result is set by `setXXX` (where `XXX` is the type name). + +```java +public void outputFinal(State state, ResultValue resultValue) { + AvgState avgState = (AvgState) state; + + if (avgState.count != 0) { + resultValue.setDouble(avgState.sum / avgState.count); + } else { + resultValue.setNull(); + } +} +``` + + + +##### void beforeDestroy() + +The method for terminating a UDF. + +This method is called by the framework. For a UDF instance, `beforeDestroy` will be called after the last record is processed. In the entire life cycle of the instance, `beforeDestroy` will only be called once. + + + ### Maven Project Example If you use Maven, you can build your own UDF project referring to our **udf-example** module. You can find the project [here](https://github.com/apache/iotdb/tree/master/example/udf). diff --git a/src/zh/UserGuide/Master/User-Manual/Database-Programming.md b/src/zh/UserGuide/Master/User-Manual/Database-Programming.md index c6d5020e..17ca798b 100644 --- a/src/zh/UserGuide/Master/User-Manual/Database-Programming.md +++ b/src/zh/UserGuide/Master/User-Manual/Database-Programming.md @@ -1045,7 +1045,7 @@ IoTDB 支持两种类型的 UDF 函数,如下表所示。 | UDF 分类 | 描述 | | --------------------------------------------------- | ------------------------------------------------------------ | | UDTF(User Defined Timeseries Generating Function) | 自定义时间序列生成函数。该类函数允许接收多条时间序列,最终会输出一条时间序列,生成的时间序列可以有任意多数量的数据点。 | -| UDAF(User Defined Aggregation Function) | 正在开发,敬请期待。 | +| UDAF(User Defined Aggregation Function) | 自定义聚合函数。该类函数接受一条时间序列数据,最终会根据用户指定的 GROUP BY 类型,为每个组生成一个聚合后的数据点。 | ### UDF 依赖 @@ -1376,6 +1376,232 @@ UDTF 的结束方法,您可以在此方法中进行一些资源释放等的操 此方法由框架调用。对于一个 UDF 类实例而言,生命周期中会且只会被调用一次,即在处理完最后一条记录之后被调用。 +### UDAF(User Defined Aggregation Function) + +一个完整的 UDAF 定义涉及到 State 和 UDAF 两个类。 + +#### State 类 + +编写一个 State 类需要实现`org.apache.iotdb.udf.api.State`接口,下表是需要实现的方法说明。 + +| 接口定义 | 描述 | 是否必须 | +| -------------------------------- | ------------------------------------------------------------ | -------- | +| `void reset()` | 将 `State` 对象重置为初始的状态,您需要像编写构造函数一样,在该方法内填入 `State` 类中各个字段的初始值。 | 是 | +| `byte[] serialize()` | 将 `State` 序列化为二进制数据。该方法用于 IoTDB 内部的 `State` 对象传递,注意序列化的顺序必须和下面的反序列化方法一致。 | 是 | +| `void deserialize(byte[] bytes)` | 将二进制数据反序列化为 `State`。该方法用于 IoTDB 内部的 `State` 对象传递,注意反序列化的顺序必须和上面的序列化方法一致。 | 是 | + +下面将详细介绍各个接口的使用方法。 + +- void reset() + +该方法的作用是将 `State` 重置为初始的状态,您需要在该方法内填写 `State` 对象中各个字段的初始值。出于优化上的考量,IoTDB 在内部会尽可能地复用 `State`,而不是为每一个组创建一个新的 `State`,这样会引入不必要的开销。当 `State` 更新完一个组中的数据之后,就会调用这个方法重置为初始状态,以此来处理下一个组。 + +以求平均数(也就是 `avg`)的 `State` 为例,您需要数据的总和 `sum` 与数据的条数 `count`,并在 `reset()` 方法中将二者初始化为 0。 + +```java +class AvgState implements State { + double sum; + + long count; + + @Override + public void reset() { + sum = 0; + count = 0; + } + + // other methods +} +``` + +- byte[] serialize()/void deserialize(byte[] bytes) + +该方法的作用是将 State 序列化为二进制数据,和从二进制数据中反序列化出 State。IoTDB 作为分布式数据库,涉及到在不同节点中传递数据,因此您需要编写这两个方法,来实现 State 在不同节点中的传递。注意序列化和反序列的顺序必须一致。 + +还是以求平均数(也就是求 avg)的 State 为例,您可以通过任意途径将 State 的内容转化为 `byte[]` 数组,以及从 `byte[]` 数组中读取出 State 的内容,下面展示的是用 Java8 引入的 `ByteBuffer` 进行序列化/反序列的代码: + +```java +@Override +public byte[] serialize() { + ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES); + buffer.putDouble(sum); + buffer.putLong(count); + + return buffer.array(); +} + +@Override +public void deserialize(byte[] bytes) { + ByteBuffer buffer = ByteBuffer.wrap(bytes); + sum = buffer.getDouble(); + count = buffer.getLong(); +} +``` + +#### UDAF 类 + +编写一个 UDAF 类需要实现`org.apache.iotdb.udf.api.UDAF`接口,下表是需要实现的方法说明。 + +| 接口定义 | 描述 | 是否必须 | +| ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | +| `void validate(UDFParameterValidator validator) throws Exception` | 在初始化方法`beforeStart`调用前执行,用于检测`UDFParameters`中用户输入的参数是否合法。该方法与 UDTF 的`validate`相同。 | 否 | +| `void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception` | 初始化方法,在 UDAF 处理输入数据前,调用用户自定义的初始化行为。与 UDTF 不同的是,这里的 configuration 是 `UDAFConfiguration` 类型。 | 是 | +| `State createState()` | 创建`State`对象,一般只需要调用默认构造函数,然后按需修改默认的初始值即可。 | 是 | +| `void addInput(State state, Column[] columns, BitMap bitMap)` | 根据传入的数据`Column[]`批量地更新`State`对象,注意 `column[0]` 总是代表时间列。另外`BitMap`表示之前已经被过滤掉的数据,您在编写该方法时需要手动判断对应的数据是否被过滤掉。 | 是 | +| `void combineState(State state, State rhs)` | 将`rhs`状态合并至`state`状态中。在分布式场景下,同一组的数据可能分布在不同节点上,IoTDB 会为每个节点上的部分数据生成一个`State`对象,然后调用该方法合并成完整的`State`。 | 是 | +| `void outputFinal(State state, ResultValue resultValue)` | 根据`State`中的数据,计算出最终的聚合结果。注意根据聚合的语义,每一组只能输出一个值。 | 是 | +| `void beforeDestroy() ` | UDAF 的结束方法。此方法由框架调用,并且只会被调用一次,即在处理完最后一条记录之后被调用。 | 否 | + +在一个完整的 UDAF 实例生命周期中,各个方法的调用顺序如下: + +1. `State createState()` +2. `void validate(UDFParameterValidator validator) throws Exception` +3. `void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception` +4. `void addInput(State state, Column[] columns, BitMap bitMap)` +5. `void combineState(State state, State rhs)` +6. `void outputFinal(State state, ResultValue resultValue)` +7. `void beforeDestroy()` + +和 UDTF 类似,框架每执行一次 UDAF 查询,都会构造一个全新的 UDF 类实例,查询结束时,对应的 UDF 类实例即被销毁,因此不同 UDAF 查询(即使是在同一个 SQL 语句中)UDF 类实例内部的数据都是隔离的。您可以放心地在 UDAF 中维护一些状态数据,无需考虑并发对 UDF 类实例内部状态数据的影响。 + +下面将详细介绍各个接口的使用方法。 + + * void validate(UDFParameterValidator validator) throws Exception + +同 UDTF, `validate`方法能够对用户输入的参数进行验证。 + +您可以在该方法中限制输入序列的数量和类型,检查用户输入的属性或者进行自定义逻辑的验证。 + + * void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception + + `beforeStart`方法的作用 UDAF 相同: + + 1. 帮助用户解析 SQL 语句中的 UDF 参数 + 2. 配置 UDF 运行时必要的信息,即指定 UDF 访问原始数据时采取的策略和输出结果序列的类型 + 3. 创建资源,比如建立外部链接,打开文件等。 + +其中,`UDFParameters` 类型的作用可以参照上文。 + +##### UDAFConfigurations + +和 UDTF 的区别在于,UDAF 使用了 `UDAFConfigurations` 作为 `configuration` 对象的类型。 + +目前,该类仅支持设置输出数据的类型。 + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setOutputDataType(Type.INT32); +} +``` + +`setOutputDataType` 中设定的输出类型和 `ResultValue` 实际能够接收的数据输出类型关系如下: + +| `setOutputDataType`中设定的输出类型 | `ResultValue`实际能够接收的输出类型 | +| :---------------------------------- | :------------------------------------- | +| `INT32` | `int` | +| `INT64` | `long` | +| `FLOAT` | `float` | +| `DOUBLE` | `double` | +| `BOOLEAN` | `boolean` | +| `TEXT` | `org.apache.iotdb.udf.api.type.Binary` | + +UDAF 输出序列的类型也是运行时决定的。您可以根据输入序列类型动态决定输出序列类型。 + +下面是一个简单的例子: + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setOutputDataType(parameters.getDataType(0)); +} +``` + +- State createState() + +为 UDAF 创建并初始化 `State`。由于 Java 语言本身的限制,您只能调用 `State` 类的默认构造函数。默认构造函数会为类中所有的字段赋一个默认的初始值,如果该初始值并不符合您的要求,您需要在这个方法内进行手动的初始化。 + +下面是一个包含手动初始化的例子。假设您要实现一个累乘的聚合函数,`State` 的初始值应该设置为 1,但是默认构造函数会初始化为 0,因此您需要在调用默认构造函数之后,手动对 `State` 进行初始化: + +```java +public State createState() { + MultiplyState state = new MultiplyState(); + state.result = 1; + return state; +} +``` + +- void addInput(State state, Column[] columns, BitMap bitMap) + +该方法的作用是,通过原始的输入数据来更新 `State` 对象。出于性能上的考量,也是为了和 IoTDB 向量化的查询引擎相对齐,原始的输入数据不再是一个数据点,而是列的数组 `Column[]`。注意第一列(也就是 `column[0]` )总是时间列,因此您也可以在 UDAF 中根据时间进行不同的操作。 + +由于输入参数的类型不是一个数据点,而是多个列,您需要手动对列中的部分数据进行过滤处理,这就是第三个参数 `BitMap` 存在的意义。它用来标识这些列中哪些数据被过滤掉了,您在任何情况下都无需考虑被过滤掉的数据。 + +下面是一个用于统计数据条数(也就是 count)的 `addInput()` 示例。它展示了您应该如何使用 `BitMap` 来忽视那些已经被过滤掉的数据。注意还是由于 Java 语言本身的限制,您需要在方法的开头将接口中定义的 `State` 类型强制转化为自定义的 `State` 类型,不然后续无法正常使用该 `State` 对象。 + +```java +public void addInput(State state, Column[] column, BitMap bitMap) { + CountState countState = (CountState) state; + + int count = column[0].getPositionCount(); + for (int i = 0; i < count; i++) { + if (bitMap != null && !bitMap.isMarked(i)) { + continue; + } + if (!column[1].isNull(i)) { + countState.count++; + } + } +} +``` + +- void combineState(State state, State rhs) + +该方法的作用是合并两个 `State`,更加准确的说,是用第二个 `State` 对象来更新第一个 `State` 对象。IoTDB 是分布式数据库,同一组的数据可能分布在多个不同的节点上。出于性能考虑,IoTDB 会为每个节点上的部分数据先进行聚合成 `State`,然后再将不同节点上的、属于同一个组的 `State` 进行合并,这就是 `combineState` 的作用。 + +下面是一个用于求平均数(也就是 avg)的 `combineState()` 示例。和 `addInput` 类似,您都需要在开头对两个 `State` 进行强制类型转换。另外需要注意是用第二个 `State` 的内容来更新第一个 `State` 的值。 + +```java +public void combineState(State state, State rhs) { + AvgState avgState = (AvgState) state; + AvgState avgRhs = (AvgState) rhs; + + avgState.count += avgRhs.count; + avgState.sum += avgRhs.sum; +} +``` + +- void outputFinal(State state, ResultValue resultValue) + +该方法的作用是从 `State` 中计算出最终的结果。您需要访问 `State` 中的各个字段,求出最终的结果,并将最终的结果设置到 `ResultValue` 对象中。IoTDB 内部会为每个组在最后调用一次这个方法。注意根据聚合的语义,最终的结果只能是一个值。 + +下面还是一个用于求平均数(也就是 avg)的 `outputFinal` 示例。除了开头的强制类型转换之外,您还将看到 `ResultValue` 对象的具体用法,即通过 `setXXX`(其中 `XXX` 是类型名)来设置最后的结果。 + +```java +public void outputFinal(State state, ResultValue resultValue) { + AvgState avgState = (AvgState) state; + + if (avgState.count != 0) { + resultValue.setDouble(avgState.sum / avgState.count); + } else { + resultValue.setNull(); + } +} +``` + + * void beforeDestroy() + +UDAF 的结束方法,您可以在此方法中进行一些资源释放等的操作。 + +此方法由框架调用。对于一个 UDF 类实例而言,生命周期中会且只会被调用一次,即在处理完最后一条记录之后被调用。 + ### 完整 Maven 项目示例 如果您使用 [Maven](http://search.maven.org/),可以参考我们编写的示例项目**udf-example**。您可以在 [这里](https://github.com/apache/iotdb/tree/master/example/udf) 找到它。 diff --git a/src/zh/UserGuide/latest/User-Manual/Database-Programming.md b/src/zh/UserGuide/latest/User-Manual/Database-Programming.md index c6d5020e..17ca798b 100644 --- a/src/zh/UserGuide/latest/User-Manual/Database-Programming.md +++ b/src/zh/UserGuide/latest/User-Manual/Database-Programming.md @@ -1045,7 +1045,7 @@ IoTDB 支持两种类型的 UDF 函数,如下表所示。 | UDF 分类 | 描述 | | --------------------------------------------------- | ------------------------------------------------------------ | | UDTF(User Defined Timeseries Generating Function) | 自定义时间序列生成函数。该类函数允许接收多条时间序列,最终会输出一条时间序列,生成的时间序列可以有任意多数量的数据点。 | -| UDAF(User Defined Aggregation Function) | 正在开发,敬请期待。 | +| UDAF(User Defined Aggregation Function) | 自定义聚合函数。该类函数接受一条时间序列数据,最终会根据用户指定的 GROUP BY 类型,为每个组生成一个聚合后的数据点。 | ### UDF 依赖 @@ -1376,6 +1376,232 @@ UDTF 的结束方法,您可以在此方法中进行一些资源释放等的操 此方法由框架调用。对于一个 UDF 类实例而言,生命周期中会且只会被调用一次,即在处理完最后一条记录之后被调用。 +### UDAF(User Defined Aggregation Function) + +一个完整的 UDAF 定义涉及到 State 和 UDAF 两个类。 + +#### State 类 + +编写一个 State 类需要实现`org.apache.iotdb.udf.api.State`接口,下表是需要实现的方法说明。 + +| 接口定义 | 描述 | 是否必须 | +| -------------------------------- | ------------------------------------------------------------ | -------- | +| `void reset()` | 将 `State` 对象重置为初始的状态,您需要像编写构造函数一样,在该方法内填入 `State` 类中各个字段的初始值。 | 是 | +| `byte[] serialize()` | 将 `State` 序列化为二进制数据。该方法用于 IoTDB 内部的 `State` 对象传递,注意序列化的顺序必须和下面的反序列化方法一致。 | 是 | +| `void deserialize(byte[] bytes)` | 将二进制数据反序列化为 `State`。该方法用于 IoTDB 内部的 `State` 对象传递,注意反序列化的顺序必须和上面的序列化方法一致。 | 是 | + +下面将详细介绍各个接口的使用方法。 + +- void reset() + +该方法的作用是将 `State` 重置为初始的状态,您需要在该方法内填写 `State` 对象中各个字段的初始值。出于优化上的考量,IoTDB 在内部会尽可能地复用 `State`,而不是为每一个组创建一个新的 `State`,这样会引入不必要的开销。当 `State` 更新完一个组中的数据之后,就会调用这个方法重置为初始状态,以此来处理下一个组。 + +以求平均数(也就是 `avg`)的 `State` 为例,您需要数据的总和 `sum` 与数据的条数 `count`,并在 `reset()` 方法中将二者初始化为 0。 + +```java +class AvgState implements State { + double sum; + + long count; + + @Override + public void reset() { + sum = 0; + count = 0; + } + + // other methods +} +``` + +- byte[] serialize()/void deserialize(byte[] bytes) + +该方法的作用是将 State 序列化为二进制数据,和从二进制数据中反序列化出 State。IoTDB 作为分布式数据库,涉及到在不同节点中传递数据,因此您需要编写这两个方法,来实现 State 在不同节点中的传递。注意序列化和反序列的顺序必须一致。 + +还是以求平均数(也就是求 avg)的 State 为例,您可以通过任意途径将 State 的内容转化为 `byte[]` 数组,以及从 `byte[]` 数组中读取出 State 的内容,下面展示的是用 Java8 引入的 `ByteBuffer` 进行序列化/反序列的代码: + +```java +@Override +public byte[] serialize() { + ByteBuffer buffer = ByteBuffer.allocate(Double.BYTES + Long.BYTES); + buffer.putDouble(sum); + buffer.putLong(count); + + return buffer.array(); +} + +@Override +public void deserialize(byte[] bytes) { + ByteBuffer buffer = ByteBuffer.wrap(bytes); + sum = buffer.getDouble(); + count = buffer.getLong(); +} +``` + +#### UDAF 类 + +编写一个 UDAF 类需要实现`org.apache.iotdb.udf.api.UDAF`接口,下表是需要实现的方法说明。 + +| 接口定义 | 描述 | 是否必须 | +| ------------------------------------------------------------ | ------------------------------------------------------------ | -------- | +| `void validate(UDFParameterValidator validator) throws Exception` | 在初始化方法`beforeStart`调用前执行,用于检测`UDFParameters`中用户输入的参数是否合法。该方法与 UDTF 的`validate`相同。 | 否 | +| `void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception` | 初始化方法,在 UDAF 处理输入数据前,调用用户自定义的初始化行为。与 UDTF 不同的是,这里的 configuration 是 `UDAFConfiguration` 类型。 | 是 | +| `State createState()` | 创建`State`对象,一般只需要调用默认构造函数,然后按需修改默认的初始值即可。 | 是 | +| `void addInput(State state, Column[] columns, BitMap bitMap)` | 根据传入的数据`Column[]`批量地更新`State`对象,注意 `column[0]` 总是代表时间列。另外`BitMap`表示之前已经被过滤掉的数据,您在编写该方法时需要手动判断对应的数据是否被过滤掉。 | 是 | +| `void combineState(State state, State rhs)` | 将`rhs`状态合并至`state`状态中。在分布式场景下,同一组的数据可能分布在不同节点上,IoTDB 会为每个节点上的部分数据生成一个`State`对象,然后调用该方法合并成完整的`State`。 | 是 | +| `void outputFinal(State state, ResultValue resultValue)` | 根据`State`中的数据,计算出最终的聚合结果。注意根据聚合的语义,每一组只能输出一个值。 | 是 | +| `void beforeDestroy() ` | UDAF 的结束方法。此方法由框架调用,并且只会被调用一次,即在处理完最后一条记录之后被调用。 | 否 | + +在一个完整的 UDAF 实例生命周期中,各个方法的调用顺序如下: + +1. `State createState()` +2. `void validate(UDFParameterValidator validator) throws Exception` +3. `void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception` +4. `void addInput(State state, Column[] columns, BitMap bitMap)` +5. `void combineState(State state, State rhs)` +6. `void outputFinal(State state, ResultValue resultValue)` +7. `void beforeDestroy()` + +和 UDTF 类似,框架每执行一次 UDAF 查询,都会构造一个全新的 UDF 类实例,查询结束时,对应的 UDF 类实例即被销毁,因此不同 UDAF 查询(即使是在同一个 SQL 语句中)UDF 类实例内部的数据都是隔离的。您可以放心地在 UDAF 中维护一些状态数据,无需考虑并发对 UDF 类实例内部状态数据的影响。 + +下面将详细介绍各个接口的使用方法。 + + * void validate(UDFParameterValidator validator) throws Exception + +同 UDTF, `validate`方法能够对用户输入的参数进行验证。 + +您可以在该方法中限制输入序列的数量和类型,检查用户输入的属性或者进行自定义逻辑的验证。 + + * void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception + + `beforeStart`方法的作用 UDAF 相同: + + 1. 帮助用户解析 SQL 语句中的 UDF 参数 + 2. 配置 UDF 运行时必要的信息,即指定 UDF 访问原始数据时采取的策略和输出结果序列的类型 + 3. 创建资源,比如建立外部链接,打开文件等。 + +其中,`UDFParameters` 类型的作用可以参照上文。 + +##### UDAFConfigurations + +和 UDTF 的区别在于,UDAF 使用了 `UDAFConfigurations` 作为 `configuration` 对象的类型。 + +目前,该类仅支持设置输出数据的类型。 + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setOutputDataType(Type.INT32); +} +``` + +`setOutputDataType` 中设定的输出类型和 `ResultValue` 实际能够接收的数据输出类型关系如下: + +| `setOutputDataType`中设定的输出类型 | `ResultValue`实际能够接收的输出类型 | +| :---------------------------------- | :------------------------------------- | +| `INT32` | `int` | +| `INT64` | `long` | +| `FLOAT` | `float` | +| `DOUBLE` | `double` | +| `BOOLEAN` | `boolean` | +| `TEXT` | `org.apache.iotdb.udf.api.type.Binary` | + +UDAF 输出序列的类型也是运行时决定的。您可以根据输入序列类型动态决定输出序列类型。 + +下面是一个简单的例子: + +```java +void beforeStart(UDFParameters parameters, UDAFConfigurations configurations) throws Exception { + // do something + // ... + + configurations + .setOutputDataType(parameters.getDataType(0)); +} +``` + +- State createState() + +为 UDAF 创建并初始化 `State`。由于 Java 语言本身的限制,您只能调用 `State` 类的默认构造函数。默认构造函数会为类中所有的字段赋一个默认的初始值,如果该初始值并不符合您的要求,您需要在这个方法内进行手动的初始化。 + +下面是一个包含手动初始化的例子。假设您要实现一个累乘的聚合函数,`State` 的初始值应该设置为 1,但是默认构造函数会初始化为 0,因此您需要在调用默认构造函数之后,手动对 `State` 进行初始化: + +```java +public State createState() { + MultiplyState state = new MultiplyState(); + state.result = 1; + return state; +} +``` + +- void addInput(State state, Column[] columns, BitMap bitMap) + +该方法的作用是,通过原始的输入数据来更新 `State` 对象。出于性能上的考量,也是为了和 IoTDB 向量化的查询引擎相对齐,原始的输入数据不再是一个数据点,而是列的数组 `Column[]`。注意第一列(也就是 `column[0]` )总是时间列,因此您也可以在 UDAF 中根据时间进行不同的操作。 + +由于输入参数的类型不是一个数据点,而是多个列,您需要手动对列中的部分数据进行过滤处理,这就是第三个参数 `BitMap` 存在的意义。它用来标识这些列中哪些数据被过滤掉了,您在任何情况下都无需考虑被过滤掉的数据。 + +下面是一个用于统计数据条数(也就是 count)的 `addInput()` 示例。它展示了您应该如何使用 `BitMap` 来忽视那些已经被过滤掉的数据。注意还是由于 Java 语言本身的限制,您需要在方法的开头将接口中定义的 `State` 类型强制转化为自定义的 `State` 类型,不然后续无法正常使用该 `State` 对象。 + +```java +public void addInput(State state, Column[] column, BitMap bitMap) { + CountState countState = (CountState) state; + + int count = column[0].getPositionCount(); + for (int i = 0; i < count; i++) { + if (bitMap != null && !bitMap.isMarked(i)) { + continue; + } + if (!column[1].isNull(i)) { + countState.count++; + } + } +} +``` + +- void combineState(State state, State rhs) + +该方法的作用是合并两个 `State`,更加准确的说,是用第二个 `State` 对象来更新第一个 `State` 对象。IoTDB 是分布式数据库,同一组的数据可能分布在多个不同的节点上。出于性能考虑,IoTDB 会为每个节点上的部分数据先进行聚合成 `State`,然后再将不同节点上的、属于同一个组的 `State` 进行合并,这就是 `combineState` 的作用。 + +下面是一个用于求平均数(也就是 avg)的 `combineState()` 示例。和 `addInput` 类似,您都需要在开头对两个 `State` 进行强制类型转换。另外需要注意是用第二个 `State` 的内容来更新第一个 `State` 的值。 + +```java +public void combineState(State state, State rhs) { + AvgState avgState = (AvgState) state; + AvgState avgRhs = (AvgState) rhs; + + avgState.count += avgRhs.count; + avgState.sum += avgRhs.sum; +} +``` + +- void outputFinal(State state, ResultValue resultValue) + +该方法的作用是从 `State` 中计算出最终的结果。您需要访问 `State` 中的各个字段,求出最终的结果,并将最终的结果设置到 `ResultValue` 对象中。IoTDB 内部会为每个组在最后调用一次这个方法。注意根据聚合的语义,最终的结果只能是一个值。 + +下面还是一个用于求平均数(也就是 avg)的 `outputFinal` 示例。除了开头的强制类型转换之外,您还将看到 `ResultValue` 对象的具体用法,即通过 `setXXX`(其中 `XXX` 是类型名)来设置最后的结果。 + +```java +public void outputFinal(State state, ResultValue resultValue) { + AvgState avgState = (AvgState) state; + + if (avgState.count != 0) { + resultValue.setDouble(avgState.sum / avgState.count); + } else { + resultValue.setNull(); + } +} +``` + + * void beforeDestroy() + +UDAF 的结束方法,您可以在此方法中进行一些资源释放等的操作。 + +此方法由框架调用。对于一个 UDF 类实例而言,生命周期中会且只会被调用一次,即在处理完最后一条记录之后被调用。 + ### 完整 Maven 项目示例 如果您使用 [Maven](http://search.maven.org/),可以参考我们编写的示例项目**udf-example**。您可以在 [这里](https://github.com/apache/iotdb/tree/master/example/udf) 找到它。