[HUDI-8631] Fix the bug where the Flink table config hoodie.populate.meta.fields is not effective and optimize write performance #12404
Dear Danny, could you please take a look at this PR? @danny0405
Summary
1. When enable
@usberkeley, @yihua, @danny0405
hudi/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSink.java Lines 98 to 101 in 3cb874f
which will lead to Line 140 in 3cb874f
But at this stage the file writer is already initialized with the row type: Line 95 in 3cb874f
So we can use the row type without metadata columns and, in append mode, write only the initial Flink row data.
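The idea in the comment above can be illustrated with a minimal sketch. It is not the code from HoodieTableSink.java: the class MetaColumnFilter and the method writeFields are hypothetical, and plain field-name lists stand in for Flink's row type. The only facts taken from Hudi are the five standard metadata column names.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration: field-name lists stand in for a Flink row type.
final class MetaColumnFilter {

    // The five standard Hudi metadata columns.
    static final Set<String> HOODIE_META_COLUMNS = Set.of(
        "_hoodie_commit_time",
        "_hoodie_commit_seqno",
        "_hoodie_record_key",
        "_hoodie_partition_path",
        "_hoodie_file_name");

    /** Returns the fields to write; meta columns are dropped when they are not populated. */
    static List<String> writeFields(List<String> fields, boolean populateMetaFields) {
        if (populateMetaFields) {
            return fields;
        }
        return fields.stream()
            .filter(name -> !HOODIE_META_COLUMNS.contains(name))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> schema = List.of("_hoodie_commit_time", "_hoodie_record_key", "id", "name");
        // With hoodie.populate.meta.fields=false only the user columns remain.
        System.out.println(writeFields(schema, false));
    }
}
```

In this sketch the writer sees only the user-defined columns when meta fields are disabled, which is the situation the comment describes for append mode.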
@usberkeley, @yihua, @danny0405 For support of [...] By any chance, could this MR cover refactoring and optimization of everything related to [...]
@usberkeley, if you don't mind, it would be better to reopen and use the already created HUDI-8308 for optimizations.
Dear geserdugarov, this looks amazing. I've been busy with a major feature recently, so I'll take a closer look at your PR a bit later.
Change Logs
1. Fix the bug where hoodie.populate.meta.fields in Table Config (hoodie.properties) is not effective
2. Optimize write performance
Impact
Improve write performance. After optimization, writing with hoodie.populate.meta.fields=false takes 42.9% less time than with hoodie.populate.meta.fields=true.
Testing method
Consume from the earliest position in Kafka until all messages are consumed (Kafka Lag = 0), and compare the time taken for both.
1) populate meta fields
   time taken: 21 hours and 25 minutes
2) no meta fields
   time taken: 12 hours and 14 minutes
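The 42.9% figure can be reproduced from the two timings above. The sketch below is only a worked check of that arithmetic: the class and method names are made up for illustration, and the throughput calculation assumes both runs consumed the same set of Kafka messages.

```java
// Worked check of the reported timings (21 h 25 min vs 12 h 14 min).
final class MetaFieldTiming {

    /** Converts hours and minutes to total minutes. */
    static int toMinutes(int hours, int minutes) {
        return hours * 60 + minutes;
    }

    /** Percentage of elapsed time saved relative to the baseline run. */
    static double timeSavedPercent(int baselineMinutes, int optimizedMinutes) {
        return 100.0 * (baselineMinutes - optimizedMinutes) / baselineMinutes;
    }

    /** Throughput gain in percent, assuming both runs processed the same messages. */
    static double throughputGainPercent(int baselineMinutes, int optimizedMinutes) {
        return 100.0 * ((double) baselineMinutes / optimizedMinutes - 1.0);
    }

    public static void main(String[] args) {
        int withMeta = toMinutes(21, 25);    // 1285 minutes
        int withoutMeta = toMinutes(12, 14); // 734 minutes
        System.out.printf("time saved: %.1f%%%n", timeSavedPercent(withMeta, withoutMeta));
        System.out.printf("throughput gain: %.1f%%%n", throughputGainPercent(withMeta, withoutMeta));
    }
}
```

The run without meta fields took 734 minutes against 1285 minutes, which is about 42.9% less elapsed time. Expressed as throughput, the same numbers correspond to roughly a 75% increase.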
Risk level (write none, low medium or high below)
medium
Documentation Update
none