From db94fe15a4a8073f1cc9c00cd944f08b4f9a3b41 Mon Sep 17 00:00:00 2001
From: wanghui42
Date: Thu, 25 Jan 2024 18:45:51 +0800
Subject: [PATCH 1/3] add doc

---
 doc/UserGuide/TsFile-API.md | 529 ++++++++++++++++++++++++++++++++++
 doc/zh/TsFile-API.md        | 561 ++++++++++++++++++++++++++++++++++++
 2 files changed, 1090 insertions(+)
 create mode 100644 doc/UserGuide/TsFile-API.md
 create mode 100644 doc/zh/TsFile-API.md

diff --git a/doc/UserGuide/TsFile-API.md b/doc/UserGuide/TsFile-API.md
new file mode 100644
index 000000000..4ee7b4dc5
--- /dev/null
+++ b/doc/UserGuide/TsFile-API.md
@@ -0,0 +1,529 @@
+
+
+# TsFile API
+
+`TsFile` is a file format of time series used in IoTDB.
+This document introduces the usage of this file format.
+
+## TsFile library Installation
+
+There are several ways to use TsFile in your own project.
+
+* Build from source and use the jars:
+  ```shell
+  git clone https://github.com/apache/iotdb.git
+  # git clone creates the iotdb/ directory, so enter it first
+  cd iotdb/iotdb-core/tsfile
+  mvn clean package -DskipTests
+  ```
+  All the jars are then located in the folder named `target/`. Import `target/tsfile-0.12.0-jar-with-dependencies.jar` into your project.
+* Build from source and use as a Maven dependency:
+  Compile the source code and install it into your local repository, then declare the dependency:
+  * Get the source code and install it:
+    ```shell
+    git clone https://github.com/apache/iotdb.git
+    # git clone creates the iotdb/ directory, so enter it first
+    cd iotdb/iotdb-core/tsfile
+    mvn clean install -DskipTests
+    ```
+  * Add the following dependency to your project:
+    ```xml
+    <dependency>
+      <groupId>org.apache.iotdb</groupId>
+      <artifactId>tsfile</artifactId>
+      <version>1.0.0</version>
+    </dependency>
+    ```
+* Download the convenience binaries available from Maven Central:
+  * (This step is only needed if you want to reference the latest SNAPSHOT versions.)
+    Add the `Apache Snapshot Repository` to your project's main `pom.xml`:
+    ```xml
+    <repositories>
+      <repository>
+        <id>apache.snapshots</id>
+        <name>Apache Development Snapshot Repository</name>
+        <url>https://repository.apache.org/content/repositories/snapshots/</url>
+        <releases>
+          <enabled>false</enabled>
+        </releases>
+        <snapshots>
+          <enabled>true</enabled>
+        </snapshots>
+      </repository>
+    </repositories>
+    ```
+    Alternatively, find or create your Maven `settings.xml` located at `${username}\.m2\settings.xml` and add this `<profile>` to `<profiles>`:
+    ```xml
+    <profile>
+      <id>allow-snapshots</id>
+      <activation>
+        <activeByDefault>true</activeByDefault>
+      </activation>
+      <repositories>
+        <repository>
+          <id>apache.snapshots</id>
+          <name>Apache Development Snapshot Repository</name>
+          <url>https://repository.apache.org/content/repositories/snapshots/</url>
+          <releases>
+            <enabled>false</enabled>
+          </releases>
+          <snapshots>
+            <enabled>true</enabled>
+          </snapshots>
+        </repository>
+      </repositories>
+    </profile>
+    ```
+  * Then add the dependency to your project:
+    ```xml
+    <dependency>
+      <groupId>org.apache.iotdb</groupId>
+      <artifactId>tsfile</artifactId>
+      <version>1.0.0</version>
+    </dependency>
+    ```
+
+## TsFile Usage
+
+This section demonstrates the detailed usage of TsFile.
+
+Time-series data is treated as a sequence of quadruples, where each quadruple is (device, measurement, time, value).
+
+* **measurement**: A physical or formal quantity that a time series records, e.g. the temperature of a city, the sales volume of some goods, or the speed of a train at different times.
+  Because a traditional sensor (like a thermometer) also produces a single measurement from which a time series can be built, measurement and sensor are used interchangeably below.
+
+* **device**: An entity that produces one or more measurements (and therefore one or more time series), e.g.,
+  a running train monitors its speed, oil meter, miles it has run, and current passengers.
+  Each of these is persisted as a time series.
+
+* **Row of Data**: In many industrial applications, a device contains more than one sensor, and these sensors may have values at the same timestamp. Such a group of values is called a `row of data`.
+  Formally, a `row of data` consists of a `device_id`, a `timestamp` giving the milliseconds since `January 1, 1970, 00:00:00`, and several data pairs, each composed of a `measurement_id` and the corresponding `value`.
+  All data pairs in a row of data belong to the same `device_id` and have the same timestamp.
+  If one of the `measurements` has no `value` at the `timestamp`, a space is used instead (TsFile does not actually store `null` values).
+  Its format is as follows:
+  ```
+  device_id, timestamp, <measurement_id, value>...
+  ```
+  An example is shown below.
+  In this example, the data types of the two measurements are `INT32` and `FLOAT` respectively.
+  ```
+  device_1, 1490860659000, m1, 10, m2, 12.12
+  ```
+
+### Write TsFile
+
+A `TsFile` is generated by the following steps (the complete code is given in the [TsFile examples module](https://github.com/apache/iotdb/tree/master/example/tsfile)):
+
+1. Construct a `TsFileWriter` instance.
+2. Define the `Schema` for the `TsFile` (a pre-defined `Schema` can also be passed directly to the constructor in step 1).
+3. Write data to the `TsFileWriter`.
+4. Close the `TsFileWriter` (when using a Java try-with-resources block, Java takes care of closing the `TsFileWriter`).
+
+#### Construct a `TsFileWriter` instance.
+
+Here are the available constructors:
+* Without pre-defined schema:
+  ```java
+  TsFileWriter(File file) throws IOException
+  ```
+* With pre-defined schema:
+  ```java
+  TsFileWriter(File file, Schema schema) throws IOException
+  ```
+* Providing a `TsFileOutput` instead of a `File`, together with a schema (useful with the HDFS file system, because `TsFileOutput` can be an instance of class `HDFSOutput`):
+  ```java
+  TsFileWriter(TsFileOutput output, Schema schema) throws IOException
+  ```
+* If you want to set some TsFile configuration yourself, pass the `config` parameter. For example:
+  ```java
+  TSFileConfig conf = new TSFileConfig();
+  conf.setTSFileStorageFs("HDFS");
+  TsFileWriter tsFileWriter = new TsFileWriter(file, schema, conf);
+  ```
+  In this example, data files are stored in HDFS instead of the local file system.
+  To store data files in the local file system, use `conf.setTSFileStorageFs("LOCAL")`, which is also the default.
+  You can configure the `ip` and `rpc port` of your HDFS with `conf.setHdfsIp(...)` and `conf.setHdfsPort(...)`. The default ip is `localhost` and the default rpc port is `9000`.
+
+**Parameters:**
+
+* file : The TsFile to write.
+* schema : The file schema, introduced in the next part.
+* config : The config of TsFile.
+
+#### Construct a `Schema` instance.
+
+The class `Schema` contains a map whose key is the name of one measurement schema, and the value is the schema itself.
+Here are the most important methods:
+```java
+// Create an empty Schema or from an existing map
+public Schema()
+public Schema(Map<String, MeasurementSchema> measurements)
+// Use these two methods to add measurements
+public void registerMeasurement(MeasurementSchema descriptor)
+public void registerMeasurements(Map<String, MeasurementSchema> measurements)
+// Some useful getters and checkers
+public TSDataType getMeasurementDataType(String measurementId)
+public MeasurementSchema getMeasurementSchema(String measurementId)
+public Map<String, MeasurementSchema> getAllMeasurementSchema()
+public boolean hasMeasurement(String measurementId)
+```
+The class `MeasurementSchema` contains the information of one measurement. It has several constructors:
+
+```java
+public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding)
+public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType)
+public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType, Map<String, String> props)
+```
+**Parameters:**
+
+* measurementId: The name of this measurement, typically the name of the sensor.
+* type: The data type. Six types are currently supported: `BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `TEXT`.
+* encoding: The data encoding.
+* compressionType: The data compression type.
+* props: A map of properties for special data types, such as `max_point_number` for `FLOAT` and `DOUBLE`, or `max_string_length` for `TEXT`. Entries are string pairs, such as ("max_point_number", "3").
+
+> **Notice:** Although one measurement name can be used in multiple deltaObjects (devices), its properties cannot be changed. That is, adding the same measurement name more than once with a different type or encoding is not allowed. Here is a bad example:
+
+```java
+List<MeasurementSchema> measurementSchemas = new ArrayList<>();
+// The measurement "sensor_1" is float type
+measurementSchemas.add(new MeasurementSchema("sensor_1", TSDataType.FLOAT, TSEncoding.RLE));
+// Not allowed: "sensor_1" is declared again with a different type
+measurementSchemas.add(new MeasurementSchema("sensor_1", TSDataType.INT32, TSEncoding.RLE));
+```
+#### Insert and write data
+
+Use this constructor to create a new `TSRecord` (a timestamp and device pair).
+
+```java
+TSRecord tsRecord = new TSRecord(time, deviceId);
+```
+Then create a `DataPoint` (a measurement and value pair), and use the `addTuple` method to add the `DataPoint` to the current `TSRecord`.
+```java
+DataPoint dPoint = new LongDataPoint("sensor_1", 42);
+tsRecord.addTuple(dPoint);
+```
+> Notice: there is an implementation of `DataPoint` for each of IoTDB's supported data types: `BooleanDataPoint`, `DoubleDataPoint`, `FloatDataPoint`, `IntDataPoint`, `LongDataPoint` and `StringDataPoint`.
+
+Once the `TSRecord` is complete, write it to the file with the following call:
+```java
+tsFileWriter.write(tsRecord);
+```
+
+#### Call `close` to finish this writing process
+
+```java
+tsFileWriter.close();
+```
+
+It is also possible to write data into a closed TsFile.
+
+1. Use `ForceAppendTsFileWriter` to open a closed file.
+
+   ```java
+   public ForceAppendTsFileWriter(File file) throws IOException
+   ```
+
+2. Call `doTruncate()` to truncate the metadata part of the file.
+
+3. Then use the `ForceAppendTsFileWriter` to construct a new `TsFileWriter`.
+
+```java
+public TsFileWriter(TsFileIOWriter fileWriter) throws IOException
+```
+Please note that the measurements must be added again before writing new data to the TsFile.
+
+### Example for writing a TsFile
+
+You should install TsFile to your local Maven repository.
+
+```shell
+mvn clean install -pl iotdb-core/tsfile -am -DskipTests
+```
+
+You can write a TsFile by constructing a **TSRecord** if you have **non-aligned** time series data (i.e. not all sensors contain values).
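To make the non-aligned case concrete, here is a minimal sketch that models rows of data with plain Java collections instead of the TsFile classes. `RowSketch`, `put`, `format` and `aligned` are hypothetical names chosen for illustration only; the rendered layout follows the row format described earlier in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, library-free model of a "row of data"; this is not a TsFile class.
final class RowSketch {
  final String deviceId;
  final long timestamp;
  // Only measurements that have a value at this timestamp are kept,
  // mirroring the fact that null values are not stored.
  final Map<String, Object> values = new LinkedHashMap<>();

  RowSketch(String deviceId, long timestamp) {
    this.deviceId = deviceId;
    this.timestamp = timestamp;
  }

  // Adds one (measurement, value) pair and returns this row for chaining.
  RowSketch put(String measurementId, Object value) {
    values.put(measurementId, value);
    return this;
  }

  // Renders the row as "device_id, timestamp, m1, v1, m2, v2".
  String format() {
    StringBuilder sb = new StringBuilder(deviceId).append(", ").append(timestamp);
    for (Map.Entry<String, Object> e : values.entrySet()) {
      sb.append(", ").append(e.getKey()).append(", ").append(e.getValue());
    }
    return sb.toString();
  }

  // Rows are aligned when every row carries the same set of measurements.
  static boolean aligned(RowSketch... rows) {
    for (RowSketch r : rows) {
      if (!r.values.keySet().equals(rows[0].values.keySet())) {
        return false;
      }
    }
    return true;
  }
}
```

In this sketch, a row holding `m1` and `m2` next to a row holding only `m1` is reported as non-aligned, which is the situation the document suggests handling with `TSRecord`; when every row carries the same measurements, the document points to `Tablet` instead.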
+
+A more thorough example can be found at `/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithTSRecord.java`
+
+You can write a TsFile by constructing a **Tablet** if you have **aligned** time series data.
+
+A more thorough example can be found at `/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithTablet.java`
+
+You can write data into a closed TsFile by using **ForceAppendTsFileWriter**.
+
+A more thorough example can be found at `/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileForceAppendWrite.java`
+
+### Interface for Reading TsFile
+
+* Definition of Path
+
+A path is a dot-separated string that uniquely identifies a time series in a TsFile, e.g., "root.area_1.device_1.sensor_1".
+The last section, "sensor_1", is called the measurementId, while the remaining part, "root.area_1.device_1", is called the deviceId.
+As mentioned above, the same measurement in different devices has the same data type and encoding, and devices are also unique.
+
+In the read interfaces, the parameter `paths` indicates the measurements to be selected.
+
+A path instance can be constructed through the class `Path`. For example:
+
+```java
+Path p = new Path("device_1.sensor_1");
+```
+
+To support multiple paths, a list of paths is passed to the final query call.
+
+```java
+List<Path> paths = new ArrayList<>();
+paths.add(new Path("device_1.sensor_1"));
+paths.add(new Path("device_1.sensor_3"));
+```
+
+> **Notice:** When constructing a Path, the parameter should be a dot-separated string. The last part is
+  recognized as the measurementId, while the remaining parts are recognized as the deviceId.
+
+
+* Definition of Filter
+
+  * Usage Scenario
+A filter is used in the TsFile reading process to select data satisfying one or more given conditions.
+
+  * IExpression
+`IExpression` is a filter expression interface, and an instance of it is passed to the final query call.
+We create one or more filter expressions and may use binary filter operators to link them into the final expression.
+
+* **Create a Filter Expression**
+
+  There are two types of filters.
+
+  * TimeFilter: A filter for `time` in time-series data.
+    ```java
+    IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter);
+    ```
+    Use the following relationships to get a `TimeFilter` object (`value` is a `long` variable).
+
+    |Relationship|Description|
+    |---|---|
+    |TimeFilter.eq(value)|Choose the time equal to the value|
+    |TimeFilter.lt(value)|Choose the time less than the value|
+    |TimeFilter.gt(value)|Choose the time greater than the value|
+    |TimeFilter.ltEq(value)|Choose the time less than or equal to the value|
+    |TimeFilter.gtEq(value)|Choose the time greater than or equal to the value|
+    |TimeFilter.notEq(value)|Choose the time not equal to the value|
+    |TimeFilter.not(TimeFilter)|Choose the time that does not satisfy another TimeFilter|
+
+  * ValueFilter: A filter for `value` in time-series data.
+
+    ```java
+    IExpression valueFilterExpr = new SingleSeriesExpression(Path, ValueFilter);
+    ```
+    `ValueFilter` is used in the same way as `TimeFilter`; just make sure that the type of the value
+    matches the type of the measurement (defined in the path).
+
+* **Binary Filter Operators**
+
+  Binary filter operators can be used to link two single expressions.
+
+  * BinaryExpression.and(Expression, Expression): Choose the values that satisfy both expressions.
+  * BinaryExpression.or(Expression, Expression): Choose the values that satisfy at least one expression.
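The semantics of the relationships and binary operators listed above can be sketched without the library. The block below uses `java.util.function.LongPredicate` as a stand-in for a time filter; `TimeFilterSketch` and its methods are hypothetical names that only mimic the behaviour described in the table and are not the TsFile API.

```java
import java.util.function.LongPredicate;

// Hypothetical stand-ins for time filters, built on java.util.function only.
// They mimic the semantics described above; they are not the TsFile classes.
final class TimeFilterSketch {
  // time >= v
  static LongPredicate gtEq(long v) {
    return t -> t >= v;
  }

  // time < v
  static LongPredicate lt(long v) {
    return t -> t < v;
  }

  // Accepts a timestamp only if both filters accept it.
  static LongPredicate and(LongPredicate a, LongPredicate b) {
    return a.and(b);
  }

  // Accepts a timestamp if at least one filter accepts it.
  static LongPredicate or(LongPredicate a, LongPredicate b) {
    return a.or(b);
  }
}
```

With these stand-ins, `and(gtEq(15), lt(25))` accepts 15 and 24 but rejects 10 and 25. By contrast, `or(gtEq(15), lt(25))` accepts every timestamp, because any value is either at least 15 or below 25, so an `or` over two overlapping ranges does not narrow the result.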
+
+Filter Expression Examples
+
+* **TimeFilterExpression Examples**
+
+  ```java
+  IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.eq(15)); // series time = 15
+  ```
+  ```java
+  IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.ltEq(15)); // series time <= 15
+  ```
+  ```java
+  IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.lt(15)); // series time < 15
+  ```
+  ```java
+  IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.gtEq(15)); // series time >= 15
+  ```
+  ```java
+  IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.notEq(15)); // series time != 15
+  ```
+  ```java
+  IExpression timeFilterExpr = BinaryExpression.and(
+      new GlobalTimeExpression(TimeFilter.gtEq(15L)),
+      new GlobalTimeExpression(TimeFilter.lt(25L))); // 15 <= series time < 25
+  ```
+  ```java
+  IExpression timeFilterExpr = BinaryExpression.or(
+      new GlobalTimeExpression(TimeFilter.gtEq(15L)),
+      new GlobalTimeExpression(TimeFilter.lt(25L))); // series time >= 15 or series time < 25
+  ```
+* Read Interface
+
+First, we open the TsFile and get a `ReadOnlyTsFile` instance from a file path string `path`.
+
+```java
+TsFileSequenceReader reader = new TsFileSequenceReader(path);
+
+ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader);
+```
+Next, we prepare the path list and the query expression, then get the final `QueryExpression` object through this interface:
+
+```java
+QueryExpression queryExpression = QueryExpression.create(paths, statement);
+```
+
+The `ReadOnlyTsFile` class has two `query` methods to perform a query.
+* **Method 1**
+
+  ```java
+  public QueryDataSet query(QueryExpression queryExpression) throws IOException
+  ```
+
+* **Method 2**
+
+  ```java
+  public QueryDataSet query(QueryExpression queryExpression, long partitionStartOffset, long partitionEndOffset) throws IOException
+  ```
+
+  This method is designed for advanced applications such as the TsFile-Spark Connector.
+
+  * **params** : For method 2, two additional parameters are added to support partial query:
+    * `partitionStartOffset`: start offset for a TsFile
+    * `partitionEndOffset`: end offset for a TsFile
+
+  > **What is Partial Query?**
+  >
+  > In some distributed file systems (e.g. HDFS), a file is split into several parts, called "Blocks", which are stored on different nodes. Running a query in parallel on each node involved is more efficient, which is why Partial Query is needed. A Partial Query selects only the results stored in the part of a TsFile delimited by `QueryConstant.PARTITION_START_OFFSET` and `QueryConstant.PARTITION_END_OFFSET`.
+
+* QueryDataSet Interface
+
+The query performed above returns a `QueryDataSet` object.
+
+Here are the methods most useful to users.
+
+  * `boolean hasNext();`
+
+    Return true if this data set still has elements.
+  * `List<Path> getPaths()`
+
+    Get the paths in this data set.
+  * `List<TSDataType> getDataTypes();`
+
+    Get the data types. The class `TSDataType` is an enum; each value is one of the following:
+
+       BOOLEAN,
+       INT32,
+       INT64,
+       FLOAT,
+       DOUBLE,
+       TEXT;
+  * `RowRecord next() throws IOException;`
+
+    Get the next record.
+
+    The class `RowRecord` consists of a `long` timestamp and a `List<Field>` holding the data of the different sensors.
+    Two getter methods give access to them.
+
+    ```java
+    long getTimestamp();
+    List<Field> getFields();
+    ```
+
+    To get data from one `Field`, use these methods:
+
+    ```java
+    TSDataType getDataType();
+    Object getObjectValue();
+    ```
+
+
+
+### Example for reading an existing TsFile
+
+
+You should install TsFile to your local Maven repository.
+
+
+A more thorough example with a query statement can be found at
+`/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileRead.java`
+
+```java
+package org.apache.iotdb.tsfile;
+import java.io.IOException;
+import java.util.ArrayList;
+import org.apache.iotdb.tsfile.read.ReadOnlyTsFile;
+import org.apache.iotdb.tsfile.read.TsFileSequenceReader;
+import org.apache.iotdb.tsfile.read.common.Path;
+import org.apache.iotdb.tsfile.read.expression.IExpression;
+import org.apache.iotdb.tsfile.read.expression.QueryExpression;
+import org.apache.iotdb.tsfile.read.expression.impl.BinaryExpression;
+import org.apache.iotdb.tsfile.read.expression.impl.GlobalTimeExpression;
+import org.apache.iotdb.tsfile.read.expression.impl.SingleSeriesExpression;
+import org.apache.iotdb.tsfile.read.filter.TimeFilter;
+import org.apache.iotdb.tsfile.read.filter.ValueFilter;
+import org.apache.iotdb.tsfile.read.query.dataset.QueryDataSet;
+
+/**
+ * This class shows how to read the TsFile named "test.tsfile".
+ * The file "test.tsfile" is generated by the class TsFileWrite.
+ * Run TsFileWrite to generate the test.tsfile first + */ +public class TsFileRead { + private static void queryAndPrint(ArrayList paths, ReadOnlyTsFile readTsFile, IExpression statement) + throws IOException { + QueryExpression queryExpression = QueryExpression.create(paths, statement); + QueryDataSet queryDataSet = readTsFile.query(queryExpression); + while (queryDataSet.hasNext()) { + System.out.println(queryDataSet.next()); + } + System.out.println("------------"); + } + + public static void main(String[] args) throws IOException { + + // file path + String path = "test.tsfile"; + + // create reader and get the readTsFile interface + TsFileSequenceReader reader = new TsFileSequenceReader(path); + ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader); + // use these paths(all sensors) for all the queries + ArrayList paths = new ArrayList<>(); + paths.add(new Path("device_1.sensor_1")); + paths.add(new Path("device_1.sensor_2")); + paths.add(new Path("device_1.sensor_3")); + + // no query statement + queryAndPrint(paths, readTsFile, null); + + //close the reader when you left + reader.close(); + } +} +``` + + + +## Change TsFile Configuration + +```java +TSFileConfig config = TSFileDescriptor.getInstance().getConfig(); +config.setXXX(); +``` + + + diff --git a/doc/zh/TsFile-API.md b/doc/zh/TsFile-API.md new file mode 100644 index 000000000..b8e71a88c --- /dev/null +++ b/doc/zh/TsFile-API.md @@ -0,0 +1,561 @@ + + +# TsFile API + +TsFile 是在 IoTDB 中使用的时间序列的文件格式。在这个章节中,我们将介绍这种文件格式的用法。 + +## 安装 TsFile library + +在您自己的项目中有两种方法使用 TsFile . + +* 使用 jar 包:编译源码生成 jar 包 + +```shell +git clone https://github.com/apache/iotdb.git +cd iotdb-core/tsfile/ +mvn clean package -Dmaven.test.skip=true +``` + +命令执行完成之后,所有的 jar 包都可以从 `target/` 目录下找到。之后您可以在自己的工程中导入 `target/tsfile-1.0.0.jar`. + +* 使用 Maven 依赖: + +编译源码并且部署到您的本地仓库中需要 3 步: + + 1. 下载源码 + + ```shell +git clone https://github.com/apache/iotdb.git + ``` + 2. 
编译源码和部署到本地仓库 + + ```shell +cd iotdb-core/tsfile/ +mvn clean install -Dmaven.test.skip=true + ``` + 3. 在您自己的工程中增加依赖: + + ```xml + + org.apache.iotdb + tsfile + 0.12.0 + + ``` + +或者,您可以直接使用官方的 Maven 仓库: + + 1. 首先,在`${username}\.m2\settings.xml`目录下的`settings.xml`文件中`` + 节中增加``,内容如下: + + ```xml + + allow-snapshots + true + + + apache.snapshots + Apache Development Snapshot Repository + https://repository.apache.org/content/repositories/snapshots/ + + false + + + true + + + + + ``` + 2. 之后您可以在您的工程中增加如下依赖: + + ```xml + + org.apache.iotdb + tsfile + 1.0.0 + + ``` + +## TsFile 的使用 + +本章节演示 TsFile 的详细用法。 + +时序数据 (Time-series Data) +一个时序是由 4 个序列组成,分别是 device, measurement, time, value。 + +* **measurement**: 时间序列描述的是一个物理或者形式的测量 (measurement),比如:城市的温度,一些商品的销售数量或者是火车在不同时间的速度。 +传统的传感器(如温度计)也采用单次测量 (measurement) 并产生时间序列,我们将在下面交替使用测量 (measurement) 和传感器。 + +* **device**: 一个设备指的是一个正在进行多次测量(产生多个时间序列)的实体,例如, + ​ ​ ​ 一列正在运行的火车监控它的速度、油表、它已经运行的英里数,当前的乘客每个都被传送到一个时间序列。 + +**单行数据**: 在许多工业应用程序中,一个设备通常包含多个传感器,这些传感器可能同时具有多个值,这称为一行数据。 + +在形式上,一行数据包含一个`device_id`,它是一个时间戳,表示从 1970 年 1 月 1 日 00:00:00 开始的毫秒数, +以及由`measurement_id`和相应的`value`组成的几个数据对。一行中的所有数据对都属于这个`device_id`,并且具有相同的时间戳。 +如果其中一个度量值`measurements`在某个时间戳`timestamp`没有值`value`,将使用一个空格表示(实际上 TsFile 并不存储 null 值)。 +其格式如下: + +``` +device_id, timestamp, ... +``` + +示例数据如下所示。在本例中,两个度量值 (measurement) 的数据类型分别是`INT32`和`FLOAT`。 + +``` +device_1, 1490860659000, m1, 10, m2, 12.12 +``` + +### 写入 TsFile + +TsFile 可以通过以下三个步骤生成,完整的代码参见"写入 TsFile 示例"章节。 + +1. 
构造一个`TsFileWriter`实例。 + + 以下是可用的构造函数: + + * 没有预定义 schema + + ```java + public TsFileWriter(File file) throws IOException + ``` + * 预定义 schema + + ```java + public TsFileWriter(File file, Schema schema) throws IOException + ``` + 这个是用于使用 HDFS 文件系统的。`TsFileOutput`可以是`HDFSOutput`类的一个实例。 + + ```java + public TsFileWriter(TsFileOutput output, Schema schema) throws IOException + ``` + + 如果你想自己设置一些 TSFile 的配置,你可以使用`config`参数。比如: + + ```java + TSFileConfig conf = new TSFileConfig(); + conf.setTSFileStorageFs("HDFS"); + TsFileWriter tsFileWriter = new TsFileWriter(file, schema, conf); + ``` + + 在上面的例子中,数据文件将存储在 HDFS 中,而不是本地文件系统中。如果你想在本地文件系统中存储数据文件,你可以使用`conf.setTSFileStorageFs("LOCAL")`,这也是默认的配置。 + + 您还可以通过`config.setHdfsIp(...)`和`config.setHdfsPort(...)`来配置 HDFS 的 IP 和端口。默认的 IP 是`localhost`,默认的`RPC`端口是`9000`. + + **参数:** + + * file : 写入 TsFile 数据的文件 + * schema : 文件的 schemas,将在下章进行介绍 + * config : TsFile 的一些配置项 + +2. 添加测量值 (measurement) + + 你也可以先创建一个`Schema`类的实例然后把它传递给`TsFileWriter`类的构造函数 + + `Schema`类保存的是一个映射关系,key 是一个 measurement 的名字,value 是 measurement schema. 
+ + 下面是一系列接口: + + ```java + // Create an empty Schema or from an existing map + public Schema() + public Schema(Map measurements) + // Use this two interfaces to add measurements + public void registerMeasurement(MeasurementSchema descriptor) + public void registerMeasurements(Map measurements) + // Some useful getter and checker + public TSDataType getMeasurementDataType(String measurementId) + public MeasurementSchema getMeasurementSchema(String measurementId) + public Map getAllMeasurementSchema() + public boolean hasMeasurement(String measurementId) + ``` + + 你可以在`TsFileWriter`类中使用以下接口来添加额外的测量 (measurement): + ​ + ```java + public void addMeasurement(MeasurementSchema measurementSchema) throws WriteProcessException + ``` + + `MeasurementSchema`类保存了一个测量 (measurement) 的信息,有几个构造函数: + + ```java + public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding) + public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType) + public MeasurementSchema(String measurementId, TSDataType type, TSEncoding encoding, CompressionType compressionType, + Map props) + ``` + + **参数:** + ​ + + * measurementID: 测量的名称,通常是传感器的名称。 + + * type: 数据类型,现在支持六种类型:`BOOLEAN`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `TEXT`; + + * encoding: 编码类型。 + + * compression: 压缩方式。现在支持 `UNCOMPRESSED` 和 `SNAPPY`. + + * props: 特殊数据类型的属性。比如说`FLOAT`和`DOUBLE`可以设置`max_point_number`,`TEXT`可以设置`max_string_length`。 + 可以使用 Map 来保存键值对,比如 ("max_point_number", "3")。 + + > **注意:** 虽然一个测量 (measurement) 的名字可以被用在多个 deltaObjects 中,但是它的参数是不允许被修改的。比如: + 不允许多次为同一个测量 (measurement) 名添加不同类型的编码。下面是一个错误示例: + + ```java + // The measurement "sensor_1" is float type + addMeasurement(new MeasurementSchema("sensor_1", TSDataType.FLOAT, TSEncoding.RLE)); + // This call will throw a WriteProcessException exception + addMeasurement(new MeasurementSchema("sensor_1", TSDataType.INT32, TSEncoding.RLE)); + ``` +3. 
插入和写入数据。 + + 使用这个接口创建一个新的`TSRecord`(时间戳和设备对)。 + + ```java + public TSRecord(long timestamp, String deviceId) + ``` + + 然后创建一个`DataPoint`(度量 (measurement) 和值的对应),并使用 addTuple 方法将数据 DataPoint 添加正确的值到 TsRecord。 + + 用下面这种方法写 + + ```java + public void write(TSRecord record) throws IOException, WriteProcessException + ``` + +4. 调用`close`方法来完成写入过程。 + + ```java + public void close() throws IOException + ``` + +我们也支持将数据写入已关闭的 TsFile 文件中。 + +1. 使用`ForceAppendTsFileWriter`打开已经关闭的文件。 + + ```java + public ForceAppendTsFileWriter(File file) throws IOException + ``` +2. 调用 `doTruncate` 去掉文件的 Metadata 部分 + +3. 使用 `ForceAppendTsFileWriter` 构造另一个`TsFileWriter` + + ```java + public TsFileWriter(TsFileIOWriter fileWriter) throws IOException + ``` +请注意 此时需要重新添加测量值 (measurement) 再进行上述写入操作。 + +### 写入 TsFile 示例 + +您需要安装 TsFile 到本地的 Maven 仓库中。 + +```shell +mvn clean install -pl iotdb-core/tsfile -am -DskipTests +``` + +如果存在**非对齐**的时序数据(比如:不是所有的传感器都有值),您可以通过构造** TSRecord **来写入。 + +更详细的例子可以在 + +``` +/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithTSRecord.java +``` + +中查看 + +如果所有时序数据都是**对齐**的,您可以通过构造** Tablet **来写入数据。 + +更详细的例子可以在 + +``` +/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileWriteWithTablet.java +``` +中查看 + +在已关闭的 TsFile 文件中写入新数据的详细例子可以在 + +``` +/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileForceAppendWrite.java +``` +中查看 + +### 读取 TsFile 接口 + + * 路径的定义 + +路径是一个点 (.) 分隔的字符串,它唯一地标识 TsFile 中的时间序列,例如:"root.area_1.device_1.sensor_1"。 +最后一部分"sensor_1"称为"measurementId",其余部分"root.area_1.device_1"称为 deviceId。 +正如之前提到的,不同设备中的相同测量 (measurement) 具有相同的数据类型和编码,设备也是唯一的。 + +在 read 接口中,参数`paths`表示要选择的测量值 (measurement)。 +Path 实例可以很容易地通过类`Path`来构造。例如: + +```java +Path p = new Path("device_1.sensor_1"); +``` + +我们可以为查询传递一个 ArrayList 路径,以支持多个路径查询。 + +```java +List paths = new ArrayList(); +paths.add(new Path("device_1.sensor_1")); +paths.add(new Path("device_1.sensor_3")); +``` + +> **注意:** 在构造路径时,参数的格式应该是一个点 (.) 
分隔的字符串,最后一部分是 measurement,其余部分确认为 deviceId。 + + * 定义 Filter + + * 使用条件过滤 +在 TsFile 读取过程中使用 Filter 来选择满足一个或多个给定条件的数据。 + + * IExpression +`IExpression`是一个过滤器表达式接口,它将被传递给系统查询时调用。 +我们创建一个或多个筛选器表达式,并且可以使用`Binary Filter Operators`将它们连接形成最终表达式。 + +* **创建一个 Filter 表达式** + + 有两种类型的过滤器。 + + * TimeFilter: 使用时序数据中的`time`过滤。 + + ```java + IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter); + ``` + + 使用以下关系获得一个`TimeFilter`对象(值是一个 long 型变量)。 + +|Relationship|Description| +|----|----| +|TimeFilter.eq(value)|选择时间等于值的数据| +|TimeFilter.lt(value)|选择时间小于值的数据| +|TimeFilter.gt(value)|选择时间大于值的数据| +|TimeFilter.ltEq(value)|选择时间小于等于值的数据| +|TimeFilter.gtEq(value)|选择时间大于等于值的数据| +|TimeFilter.notEq(value)|选择时间不等于值的数据| +|TimeFilter.not(TimeFilter)|选择时间不满足另一个时间过滤器的数据| + + * ValueFilter: 使用时序数据中的`value`过滤。 + + +```java +IExpression valueFilterExpr = new SingleSeriesExpression(Path, ValueFilter); +``` + + `ValueFilter`的用法与`TimeFilter`相同,只是需要确保值的类型等于 measurement(在路径中定义)的类型。 + +* **Binary Filter Operators** + + Binary filter operators 可以用来连接两个单独的表达式。 + + * BinaryExpression.and(Expression, Expression): 选择同时满足两个表达式的数据。 + * BinaryExpression.or(Expression, Expression): 选择满足任意一个表达式值的数据。 + + +Filter Expression 示例 + +* **TimeFilterExpression 示例** + +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.eq(15)); // series time = 15 +``` +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.ltEq(15)); // series time <= 15 +``` +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.lt(15)); // series time < 15 +``` +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.gtEq(15)); // series time >= 15 +``` +```java +IExpression timeFilterExpr = new GlobalTimeExpression(TimeFilter.notEq(15)); // series time != 15 +``` +```java +IExpression timeFilterExpr = BinaryExpression.and( + new GlobalTimeExpression(TimeFilter.gtEq(15L)), + new GlobalTimeExpression(TimeFilter.lt(25L))); // 15 <= series time < 25 +``` +```java 
+IExpression timeFilterExpr = BinaryExpression.or( + new GlobalTimeExpression(TimeFilter.gtEq(15L)), + new GlobalTimeExpression(TimeFilter.lt(25L))); // series time >= 15 or series time < 25 +``` + +* 读取接口 + +首先,我们打开 TsFile 并从文件路径`path`中获取一个`ReadOnlyTsFile`实例。 + +```java +TsFileSequenceReader reader = new TsFileSequenceReader(path); +ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader); +``` +接下来,我们准备路径数组和查询表达式,然后通过这个接口得到最终的`QueryExpression`对象: + +```java +QueryExpression queryExpression = QueryExpression.create(paths, statement); +``` + +ReadOnlyTsFile 类有两个`query`方法来执行查询。 + +```java +public QueryDataSet query(QueryExpression queryExpression) throws IOException +public QueryDataSet query(QueryExpression queryExpression, long partitionStartOffset, long partitionEndOffset) throws IOException +``` + +此方法是为高级应用(如 TsFile-Spark 连接器)设计的。 + +* **参数** : 对于第二个方法,添加了两个额外的参数来支持部分查询 (Partial Query): + * `partitionStartOffset`: TsFile 的开始偏移量 + * `partitionEndOffset`: TsFile 的结束偏移量 + +>什么是部分查询? + +> 在一些分布式文件系统中(比如:HDFS), 文件被分成几个部分,这些部分被称为"Blocks"并存储在不同的节点中。在涉及的每个节点上并行执行查询可以提高效率。因此需要部分查询 (Partial Query)。部分查询 (Partial Query) 仅支持查询 TsFile 中被`QueryConstant.PARTITION_START_OFFSET`和`QueryConstant.PARTITION_END_OFFSET`分割的部分。 + +* QueryDataset 接口 + + 上面执行的查询将返回一个`QueryDataset`对象。 + + 以下是一些用户常用的接口: + + * `bool hasNext();` + + 如果该数据集仍然有数据,则返回 true。 + * `List getPaths()` + + 获取这个数据集中的路径。 + * `List getDataTypes();` + + 获取数据类型。 + + * `RowRecord next() throws IOException;` + + 获取下一条记录。 + + `RowRecord`类包含一个`long`类型的时间戳和一个`List`,用于不同传感器中的数据,我们可以使用两个 getter 方法来获取它们。 + + ```java + long getTimestamp(); + List getFields(); + ``` + + 要从一个字段获取数据,请使用以下方法: + + ```java + TSDataType getDataType(); + Object getObjectValue(); + ``` + +### 读取现有 TsFile 示例 + +您需要安装 TsFile 到本地的 Maven 仓库中。 + +有关查询语句的更详细示例,请参见 +`/example/tsfile/src/main/java/org/apache/iotdb/tsfile/TsFileRead.java` + +```java +package org.apache.iotdb.tsfile; +import java.io.IOException; +import java.util.ArrayList; +import 
org.apache.iotdb.tsfile.read.ReadOnlyTsFile; +import org.apache.iotdb.tsfile.read.TsFileSequenceReader; +import org.apache.iotdb.tsfile.read.common.Path; +import org.apache.iotdb.tsfile.read.expression.IExpression; +import org.apache.iotdb.tsfile.read.expression.QueryExpression; +import org.apache.iotdb.tsfile.read.expression.impl.BinaryExpression; +import org.apache.iotdb.tsfile.read.expression.impl.GlobalTimeExpression; +import org.apache.iotdb.tsfile.read.expression.impl.SingleSeriesExpression; +import org.apache.iotdb.tsfile.read.filter.TimeFilter; +import org.apache.iotdb.tsfile.read.filter.ValueFilter; +import org.apache.iotdb.tsfile.read.query.dataset.QueryDataSet; + +/** + * The class is to show how to read TsFile file named "test.tsfile". + * The TsFile file "test.tsfile" is generated from class TsFileWrite. + * Run TsFileWrite to generate the test.tsfile first + */ +public class TsFileRead { + private static final String DEVICE1 = "device_1"; + + private static void queryAndPrint(ArrayList paths, ReadOnlyTsFile readTsFile, IExpression statement) + throws IOException { + QueryExpression queryExpression = QueryExpression.create(paths, statement); + QueryDataSet queryDataSet = readTsFile.query(queryExpression); + while (queryDataSet.hasNext()) { + System.out.println(queryDataSet.next()); + } + System.out.println("------------"); + } + + public static void main(String[] args) throws IOException { + + // file path + String path = "test.tsfile"; + + // create reader and get the readTsFile interface + try (TsFileSequenceReader reader = new TsFileSequenceReader(path); + ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader)){ + + // use these paths(all sensors) for all the queries + ArrayList paths = new ArrayList<>(); + paths.add(new Path(DEVICE1, "sensor_1")); + paths.add(new Path(DEVICE1, "sensor_2")); + paths.add(new Path(DEVICE1, "sensor_3")); + + // no filter, should select 1 2 3 4 6 7 8 + queryAndPrint(paths, readTsFile, null); + + // time filter : 4 <= 
time <= 10, should select 4 6 7 8
+      IExpression timeFilter =
+          BinaryExpression.and(
+              new GlobalTimeExpression(TimeFilter.gtEq(4L)),
+              new GlobalTimeExpression(TimeFilter.ltEq(10L)));
+      queryAndPrint(paths, readTsFile, timeFilter);
+
+      // value filter : device_1.sensor_2 <= 20, should select 1 2 4 6 7
+      IExpression valueFilter =
+          new SingleSeriesExpression(new Path(DEVICE1, "sensor_2"), ValueFilter.ltEq(20L));
+      queryAndPrint(paths, readTsFile, valueFilter);
+
+      // time filter : 4 <= time <= 10, value filter : device_1.sensor_3 >= 20, should select 4 7 8
+      timeFilter =
+          BinaryExpression.and(
+              new GlobalTimeExpression(TimeFilter.gtEq(4L)),
+              new GlobalTimeExpression(TimeFilter.ltEq(10L)));
+      valueFilter =
+          new SingleSeriesExpression(new Path(DEVICE1, "sensor_3"), ValueFilter.gtEq(20L));
+      IExpression finalFilter = BinaryExpression.and(timeFilter, valueFilter);
+      queryAndPrint(paths, readTsFile, finalFilter);
+    }
+  }
+}
+```
+
+## Modifying TsFile Configuration
+
+```java
+TSFileConfig config = TSFileDescriptor.getInstance().getConfig();
+config.setXXX();
+```
From a80c8ce4c7f0c35a3b8fea5e6cbd30cb39bafd58 Mon Sep 17 00:00:00 2001
From: wanghui42
Date: Thu, 1 Feb 2024 18:10:53 +0800
Subject: [PATCH 2/3] add doc2

---
 doc/UserGuide/TsFile-API.md | 2 +-
 doc/zh/TsFile-API.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/UserGuide/TsFile-API.md b/doc/UserGuide/TsFile-API.md
index 4ee7b4dc5..08f8e9daa 100644
--- a/doc/UserGuide/TsFile-API.md
+++ b/doc/UserGuide/TsFile-API.md
@@ -21,7 +21,7 @@
 # TsFile API
-`TsFile` is a file format of time series used in IoTDB.
+`TsFile` is a file format for time series, which is used in IoTDB.
 This document introduces the usage of this file format.
 ## TsFile library Installation
diff --git a/doc/zh/TsFile-API.md b/doc/zh/TsFile-API.md
index b8e71a88c..edc2f1e60 100644
--- a/doc/zh/TsFile-API.md
+++ b/doc/zh/TsFile-API.md
@@ -21,7 +21,7 @@
 # TsFile API
-TsFile is a time-series file format used in IoTDB. This section introduces the usage of this file format.
+TsFile is a general-purpose time-series file format, currently used in Apache IoTDB. This document introduces the usage of this file format.
 ## Installing the TsFile library
From e1c8ee2ebb8e4a59330bdaf2d80d0551055005ab Mon Sep 17 00:00:00 2001
From: wanghui42
Date: Mon, 4 Mar 2024 18:31:09 +0800
Subject: [PATCH 3/3] add down doc

---
 docs/src/.vuepress/navbar/en.ts | 8 +++----
 docs/src/.vuepress/navbar/zh.ts | 8 +++----
 docs/src/Download/README.md | 40 +++++++++++----------------------
 docs/src/zh/Download/README.md | 37 +++++++++++-------------------
 4 files changed, 34 insertions(+), 59 deletions(-)

diff --git a/docs/src/.vuepress/navbar/en.ts b/docs/src/.vuepress/navbar/en.ts
index 1f014de88..6e8094671 100644
--- a/docs/src/.vuepress/navbar/en.ts
+++ b/docs/src/.vuepress/navbar/en.ts
@@ -26,10 +26,10 @@ export const enNavbar = navbar([
     // { text: 'v1.0.x', link: '/UserGuide/latest/QuickStart/QuickStart' },
     // ],
   },
-  // {
-  //   text: 'Release',
-  //   link: '/Download/',
-  // },
+  {
+    text: 'Release',
+    link: '/Download/',
+  },
   // {
   //   text: 'Community',
   //   children: [
diff --git a/docs/src/.vuepress/navbar/zh.ts b/docs/src/.vuepress/navbar/zh.ts
index a9c251df9..c434303c2 100644
--- a/docs/src/.vuepress/navbar/zh.ts
+++ b/docs/src/.vuepress/navbar/zh.ts
@@ -26,10 +26,10 @@ export const zhNavbar = navbar([
     // { text: 'v1.0.x', link: '/zh/UserGuide/latest/QuickStart/QuickStart' },
     // ],
   },
-  // {
-  //   text: '发布版本',
-  //   link: '/zh/Download/',
-  // },
+  {
+    text: '下载',
+    link: '/zh/Download/',
+  },
   {
     text: '社区',
     children: [
diff --git a/docs/src/Download/README.md b/docs/src/Download/README.md
index e01f6a145..ab43b7021 100644
--- a/docs/src/Download/README.md
+++ b/docs/src/Download/README.md
@@ -1,32 +1,18 @@
-
+Download it from the [Maven central
repository](https://search.maven.org/search?q=g:org.apache.tsfile)
-
+Add the following dependency section to your pom.xml:
-# All releases
+```
+<dependency>
+  <groupId>org.apache.tsfile</groupId>
+  <artifactId>tsfile</artifactId>
+  <version>1.0.0</version>
+</dependency>
+```
-Find all releases in the [Archive repository](https://archive.apache.org/dist/iotdb/).
-
-
-
-# Verifying Hashes and Signatures
-
-Along with our releases, we also provide sha512 hashes in *.sha512 files and cryptographic signatures in *.asc files. The Apache Software Foundation has an extensive tutorial to [verify hashes and signatures ](http://www.apache.org/info/verification.html)which you can follow by using any of these release-signing [KEYS ](https://downloads.apache.org/iotdb/KEYS).
+The release notes for 1.0.0 are available on GitHub: https://github.com/apache/tsfile/releases/tag/v1.0.0
\ No newline at end of file
diff --git a/docs/src/zh/Download/README.md b/docs/src/zh/Download/README.md
index e9cc41a1c..946b64664 100644
--- a/docs/src/zh/Download/README.md
+++ b/docs/src/zh/Download/README.md
@@ -1,29 +1,18 @@
-
+Download it from the Maven repository: [Maven central repository](https://search.maven.org/search?q=g:org.apache.tsfile)
+Add the following dependency to your pom:
-# All releases
+```
+<dependency>
+  <groupId>org.apache.tsfile</groupId>
+  <artifactId>tsfile</artifactId>
+  <version>1.0.0</version>
+</dependency>
+```
-Find all releases in the [Archive repository](https://archive.apache.org/dist/iotdb/)
-
-# Verifying Hashes and Signatures
-
-Along with our releases, we also provide sha512 hashes in *.sha512 files and cryptographic signatures in *.asc files. The Apache Software Foundation has an extensive tutorial to [verify hashes and signatures](http://www.apache.org/info/verification.html), which you can follow using any of these release-signing [KEYS](https://downloads.apache.org/iotdb/KEYS).
+The features of this release are listed at: https://github.com/apache/tsfile/releases/tag/v1.0.0
\ No newline at end of file