Config for deciding whether to use Iceberg Time type #11174
Conversation
As per the current design: if the Connect schema is int32 with logical type Time, the Iceberg table type is returned as TimeType. Spark does not support the Iceberg TimeType (https://iceberg.apache.org/docs/1.4.3/spark-writes/#iceberg-type-to-spark-type). In this PR, we have added a config to decide whether to use the Iceberg Time type or IntegerType. This config gives users the flexibility to choose IntegerType over TimeType, keeping full precision while remaining queryable via Spark.
Fixed the RecordConverter test to validate both the Iceberg Time type and the integer type cases.
Fixed the SchemaUtils test to handle the Iceberg time type and the integer type.
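The decision the PR describes can be sketched as follows. This is a minimal, self-contained illustration only: the config key name, the `toIcebergType` helper, and the string type names are hypothetical stand-ins for the actual connector config and the Iceberg `Types` API.

```java
// Minimal sketch of a config-driven mapping for Connect's Time logical type.
// The real converter would return org.apache.iceberg.types.Types instances;
// strings are used here to keep the example self-contained.
public class TimeTypeMapping {
    // Hypothetical config key mirroring the PR's proposal.
    static final String USE_TIME_TYPE = "iceberg.tables.use-time-type";

    // Map a Connect int32 field to an Iceberg type name, honoring the config.
    static String toIcebergType(String connectLogicalType, boolean useTimeType) {
        if ("org.apache.kafka.connect.data.Time".equals(connectLogicalType)) {
            // Spark cannot read Iceberg's time type, so optionally fall back
            // to a plain integer (millis since midnight) with no precision loss.
            return useTimeType ? "time" : "int";
        }
        return "int";
    }

    public static void main(String[] args) {
        System.out.println(toIcebergType("org.apache.kafka.connect.data.Time", true));
        System.out.println(toIcebergType("org.apache.kafka.connect.data.Time", false));
    }
}
```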
@bryanck can you please review this?
@fqaiser94 can you please review this?
Have you considered using an SMT for this? I'm reluctant to add configs for each type conversion scenario.
@bryanck thanks for the quick response. Kafka Connect SMTs are a bit slow, and adding one just for this conversion would unnecessarily lower performance. Also, since Spark has no plan to extend support for TimeType, and Spark is very frequently and widely used, I thought of offering this option via the config.
What we want is to keep transform logic out of the sink when possible. Our plan is to have a set of useful SMTs that we package with the connector, and this could be one of them.
Hi Bryan, thank you for your response. I completely agree with your view that the logic for Single Message Transforms (SMTs) should not be part of any sink. However, in the context of this PR, I have a few considerations about moving this to an SMT:
- Performance impact: While SMTs provide flexibility, they introduce per-record overhead. An SMT runs once per record, which may seem negligible, but becomes significant when processing millions of records.
- General applicability: SMTs are typically designed to work across different connectors, remaining connector-agnostic. Here, the transformation logic is closely tied to deciding whether to convert to Iceberg's time type; moving it to an SMT would produce one specifically tailored for the Iceberg sink, limiting its broader applicability.
- Overhead of moving logic: The current logic is a simple conditional check. Moving it to an SMT introduces an additional processing step, which could come at a higher cost for what is essentially a straightforward type conversion.
I'd be interested to hear your thoughts on this approach.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
Hi Bryan, I hope you're well. I wanted to follow up on this. It's been a while, and I'd appreciate the opportunity to discuss it further.
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Hey! Just a quick follow-up: I encountered the same issue when using the files generated by the connector with Spark, and I came to the same conclusion as @kumarpritam863: I configured a property in my custom connector and am now using it in production. Regarding making this functionality available, as @bryanck says, we could include a simple SMT that transforms all the fields in the key/value that are of type Time into a long or a Timestamp from epoch. It may not be as efficient, but it separates the logic from the connector itself. I can handle this in a separate PR, but first the SMTs need to be merged as part of the Iceberg repo: #11936. Let me know if you have any other ideas @kumarpritam863
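The core of the SMT idea described above is just the per-field value conversion. A minimal stdlib-only sketch of that step, assuming the Connect convention that a Time logical value is a `java.util.Date` holding milliseconds since midnight (the class and method names here are illustrative, not the actual SMT from #11936):

```java
import java.util.Date;

// Sketch of the per-field conversion such an SMT would apply: Kafka Connect's
// Time logical type carries a java.util.Date whose getTime() is milliseconds
// since midnight; emitting it as a plain long of microseconds keeps the value
// Spark-readable while matching Iceberg's microsecond time granularity.
public class TimeFieldConverter {
    // Convert a Connect Time value (millis since midnight) to micros as a long.
    static long timeToMicros(Date connectTime) {
        return connectTime.getTime() * 1000L;
    }

    public static void main(String[] args) {
        // 10:30:00.000 after midnight is 37,800,000 ms.
        Date t = new Date(37_800_000L);
        System.out.println(timeToMicros(t));
    }
}
```

The same shape works for a Timestamp-from-epoch target; only the source unit interpretation changes.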
We have #11936 for a new SMT project. We can start adding some common transforms to that, rather than mixing transform logic into the sink itself. |