[HUDI-8780][RFC-83] Incremental Table Service #12514
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# RFC-83: Incremental Table Service

## Proposers

- @zhangyue19921010

## Approvers

- @danny0405
- @yuzhaojing

## Status

JIRA: https://issues.apache.org/jira/browse/HUDI-8780

## Abstract

In Hudi, scheduling Compaction and Clustering by default scans all partitions of the current table.
When there are many historical partitions, for example 640,000 in our production environment, this scanning and planning operation becomes very inefficient.
For Flink, it often leads to checkpoint timeouts, resulting in data delays.
As for cleaning, we already have the ability to clean only incremental partitions.

This RFC will draw on the design of Incremental Clean and generalize the capability of processing incremental partitions to all table services, such as Clustering and Compaction.

## Background

`earliestInstantToRetain` in the clean plan metadata:

HoodieCleanerPlan.avsc

```text
{
  "namespace": "org.apache.hudi.avro.model",
  "type": "record",
  "name": "HoodieCleanerPlan",
  "fields": [
    {
      "name": "earliestInstantToRetain",
      "type": ["null", {
        "type": "record",
        "name": "HoodieActionInstant",
        "fields": [
          {
            "name": "timestamp",
            "type": "string"
          },
          {
            "name": "action",
            "type": "string"
          },
          {
            "name": "state",
            "type": "string"
          }
        ]
      }],
      "default": null
    },
    xxxx
  ]
}
```

`earliestCommitToRetain` in the clean commit metadata:

HoodieCleanMetadata.avsc

```text
{
  "namespace": "org.apache.hudi.avro.model",
  "type": "record",
  "name": "HoodieCleanMetadata",
  "fields": [
    xxxx,
    {"name": "earliestCommitToRetain", "type": "string"},
    xxxx
  ]
}
```
How to get incremental partitions during cleaning:

![cleanIncrementalpartitions.png](cleanIncrementalpartitions.png)

**Note**
`earliestCommitToRetain` is recorded in `HoodieCleanMetadata`.
`newInstantToRetain` is computed based on clean configs such as `hoodie.clean.commits.retained` and will be recorded in the clean metadata as the new `earliestCommitToRetain`.
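The lookup described in the note can be sketched as follows. This is a hypothetical, simplified model using plain Java records rather than Hudi's real timeline API: the incremental partitions for a clean are the union of partitions touched by commits between the previously recorded `earliestCommitToRetain` and the newly computed `newInstantToRetain`.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Simplified stand-in for the incremental-partition lookup used by
// Incremental Clean. Commit and the method name are illustrative only.
public class IncrementalPartitionLookup {

  // A commit on the timeline: its instant time plus the partitions it touched.
  record Commit(String instantTime, Set<String> partitions) {}

  /** Union of partitions touched by commits in (lastRetained, newRetained]. */
  static Set<String> incrementalPartitions(List<Commit> timeline,
                                           String lastRetained,
                                           String newRetained) {
    return timeline.stream()
        .filter(c -> c.instantTime().compareTo(lastRetained) > 0
                  && c.instantTime().compareTo(newRetained) <= 0)
        .flatMap(c -> c.partitions().stream())
        .collect(Collectors.toCollection(TreeSet::new));
  }

  public static void main(String[] args) {
    List<Commit> timeline = List.of(
        new Commit("001", Set.of("2024/01/01")),
        new Commit("002", Set.of("2024/01/02")),
        new Commit("003", Set.of("2024/01/02", "2024/01/03")));
    // The last clean retained up to 001; the new plan retains up to 003,
    // so only partitions touched by commits 002 and 003 need cleaning.
    System.out.println(incrementalPartitions(timeline, "001", "003"));
    // → [2024/01/02, 2024/01/03]
  }
}
```

Only the commits newer than the previously retained instant contribute partitions, which is what makes the clean plan independent of the total number of historical partitions.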

## Design And Implementation

### Changes in TableService Metadata Schema

Add a new field `earliestInstantToRetain` (default null) to the Clustering/Compaction plan, the same as `earliestInstantToRetain` in the clean plan:

```text
{
  "name": "earliestInstantToRetain",
  "type": ["null", {
    "type": "record",
    "name": "HoodieActionInstant",
    "fields": [
      {
        "name": "timestamp",
        "type": "string"
      },
      {
        "name": "action",
        "type": "string"
      },
      {
        "name": "state",
        "type": "string"
      }
    ]
  }],
  "default": null
},
```

We also need a unified interface/abstract class to control the plan behavior of table services, including clustering and compaction.

### Abstraction

Use `PartitionBaseTableServicePlanStrategy` to control the behavior of getting partitions, filtering partitions, generating the table service plan, and so on.

Since we want to control the logic of partition acquisition, partition filtering, and plan generation through different strategies, the first step is to converge that logic into a base strategy abstraction.

```java
package org.apache.hudi.table;

import org.apache.hudi.common.engine.HoodieEngineContext;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieInstant;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.config.HoodieWriteConfig;

import java.io.IOException;
import java.util.List;

public abstract class PartitionBaseTableServicePlanStrategy<R, S> {

  /**
   * Generate a table service plan based on the given instant.
   */
  public abstract R generateTableServicePlan(Option<String> instant) throws IOException;

  /**
   * Generate a table service plan based on the given operations.
   */
  public abstract R generateTableServicePlan(List<S> operations) throws IOException;

  /**
   * Get the partition paths to be processed by the current table service.
   */
  public abstract List<String> getPartitionPaths(HoodieWriteConfig writeConfig, HoodieTableMetaClient metaClient, HoodieEngineContext engineContext);

  /**
   * Filter the given fully qualified partition paths.
   */
  public abstract List<String> filterPartitionPaths(HoodieWriteConfig writeConfig, List<String> partitionPaths);

  /**
   * Get incremental partitions from earliestCommitToRetain to instantToRetain.
   */
  public List<String> getIncrementalPartitionPaths(Option<HoodieInstant> instantToRetain) {
    throw new UnsupportedOperationException("Not supported yet");
  }

  /**
   * Returns the earliest commit to retain from the instant metadata.
   */
  public Option<HoodieInstant> getEarliestCommitToRetain() {
    throw new UnsupportedOperationException("Not supported yet");
  }
}
```

The default behavior of the `generateTableServicePlan`, `getPartitionPaths`, and `filterPartitionPaths` APIs remains the same as it is now.

Let the base abstractions `CompactionStrategy` and `ClusteringPlanStrategy` extend this `PartitionBaseTableServicePlanStrategy`:
1. `public abstract class CompactionStrategy extends PartitionBaseTableServicePlanStrategy<HoodieCompactionPlan, HoodieCompactionOperation> implements Serializable`
2. `public abstract class ClusteringPlanStrategy<T,I,K,O> extends PartitionBaseTableServicePlanStrategy<Option<HoodieClusteringPlan>, HoodieClusteringGroup> implements Serializable`

**For Incremental Table Service, including clustering and compaction, we will support a new `IncrementalCompactionStrategy` and a new `IncrementalClusteringPlanStrategy`.**
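As a rough illustration of how such an incremental strategy could narrow the planned partitions, here is a sketch with simplified stand-in types; `PartitionSource` and both of its methods are hypothetical, not Hudi's actual classes:

```java
import java.util.List;
import java.util.Optional;

// Simplified stand-in types; not Hudi's real API.
public class IncrementalStrategySketch {

  // Hypothetical source of partition paths for a table.
  interface PartitionSource {
    List<String> allPartitions();                     // full table scan (current default)
    List<String> partitionsSince(String instantTime); // partitions touched after the given instant
  }

  // If an earliest instant to retain was recorded in the last plan, only the
  // incremental partitions are considered; otherwise fall back to a full scan,
  // matching today's default behavior.
  static List<String> candidatePartitions(PartitionSource source,
                                          Optional<String> earliestInstantToRetain) {
    return earliestInstantToRetain
        .map(source::partitionsSince)
        .orElseGet(source::allPartitions);
  }
}
```

The candidate list produced this way would then flow through the existing `filterPartitionPaths` and plan-generation steps unchanged, so flexible partition-selection configs keep working for the common strategies.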

### Work Flow for Incremental Clustering/Compaction Strategy

Table service planner with the Incremental Clustering/Compaction strategy:
1. Retrieve the instant recorded in the last table service `xxxx.requested` plan as **INSTANT 1**.
2. Compute the current instant (request time) to be processed as **INSTANT 2**.
3. Obtain all partitions touched from **INSTANT 1** to **INSTANT 2** as the incremental partitions and perform the table service plan operation on them.
4. Record **INSTANT 2** in the table service plan.
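The four steps above can be sketched end to end. The types and names here are illustrative stand-ins, not Hudi's actual planner classes:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical, simplified sketch of the four planner steps.
public class IncrementalPlanSketch {

  record Commit(String instantTime, Set<String> partitions) {}

  // The plan carries the instant it was scheduled at, which becomes the
  // starting point (INSTANT 1) of the next planning round.
  record Plan(String instantRecorded, Set<String> incrementalPartitions) {}

  static Plan schedule(List<Commit> timeline, String instant1, String instant2) {
    // Step 1: instant1 was read from the last table service xxxx.requested plan.
    // Step 2: instant2 is the request time of the plan being scheduled now.
    // Step 3: collect all partitions touched in (instant1, instant2].
    Set<String> parts = timeline.stream()
        .filter(c -> c.instantTime().compareTo(instant1) > 0
                  && c.instantTime().compareTo(instant2) <= 0)
        .flatMap(c -> c.partitions().stream())
        .collect(Collectors.toCollection(TreeSet::new));
    // Step 4: record instant2 in the plan so the next round starts from it.
    return new Plan(instant2, parts);
  }
}
```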

### About archive

We record `earliestCommitToRetain` in the table service request metadata file and use it as the basis for retrieving incremental partitions.
Therefore, when Incremental Table Service is enabled, we should always ensure that there is a Clustering/Compaction request metadata file in the active timeline.
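One way such an archive guard could look is sketched below; the method and its inputs are illustrative (Hudi's real archival service works on `HoodieInstant` objects, not plain strings). Only instants strictly older than the earliest pending table service request are eligible for archiving, so the request metadata carrying the retained instant never leaves the active timeline.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch of the archive boundary described above.
public class ArchiveGuardSketch {

  /**
   * Instants at or after the earliest pending table service request are kept
   * on the active timeline; everything older may be archived.
   */
  static List<String> archivable(List<String> completedInstants,
                                 Optional<String> earliestPendingTableService) {
    return completedInstants.stream()
        .filter(i -> earliestPendingTableService
            .map(boundary -> i.compareTo(boundary) < 0) // keep boundary and newer
            .orElse(true))                              // no pending plan: no constraint
        .collect(Collectors.toList());
  }
}
```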
## Rollout/Adoption Plan

Low impact for current users.

## Test Plan