Replies: 1 comment
-
Iceberg relies on snapshot isolation and optimistic concurrency control at the table level to handle concurrent writes. This does not scale beyond a few concurrent writers, so in a streaming/realtime use case it is important to buffer data and commit every x seconds. This is what the article explains with the asynchronous ingestion pattern: files are written to S3 in parallel and later committed to Iceberg in small batches, so that commits are not issued concurrently.
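A minimal sketch of that pattern in Python, with stand-ins for the real S3 and Iceberg calls (`write_data_file` and the commit step are hypothetical placeholders, not an actual Iceberg API): many writers upload data files in parallel, while a single committer drains the buffer and makes one table commit per interval.

```python
import threading
import queue
import time

committed_batches = []          # each entry represents one Iceberg append commit
pending_files = queue.Queue()   # files already written to S3 but not yet committed

def write_data_file(writer_id, n):
    """Parallel writers: uploading a data file does NOT touch table metadata."""
    path = f"s3://bucket/data/writer{writer_id}-{n}.parquet"
    pending_files.put(path)

def committer(interval_s, stop):
    """Single committer: drains the buffer and makes ONE commit per interval."""
    while not stop.is_set() or not pending_files.empty():
        time.sleep(interval_s)
        batch = []
        while not pending_files.empty():
            batch.append(pending_files.get())
        if batch:
            committed_batches.append(batch)  # stand-in for one Iceberg commit

stop = threading.Event()
t = threading.Thread(target=committer, args=(0.05, stop))
t.start()

writers = [threading.Thread(target=write_data_file, args=(w, n))
           for w in range(4) for n in range(5)]
for w in writers:
    w.start()
for w in writers:
    w.join()
stop.set()
t.join()

total_files = sum(len(b) for b in committed_batches)
print(total_files, len(committed_batches))
```

The point of the design is visible in the counts: 20 files land in far fewer commits, because only the committer thread ever touches table metadata.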
-
In this blog it is mentioned that:

> There's several considerations here. First, we need to ensure we are not making too many concurrent commits to an Iceberg table. In short, each time we append files to an Iceberg table, that creates a commit, and you cannot have a high number of concurrent commits to a single table because Iceberg needs to maintain atomicity by locking the table.

AFAIK Iceberg doesn't lock the table; it uses an MVCC/optimistic-concurrency approach. If I understand correctly, the issue seems to be about creating many snapshots, resulting in extra metadata files and small data files. Or does this refer to specific metastore implementations?
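To make the distinction concrete, here is a toy sketch (assumed names throughout; the `Catalog` class is an in-memory stand-in, not a real metastore) of how an optimistic commit protocol works: each committer prepares a new snapshot off-line, then atomically swaps the catalog's metadata pointer only if the version it read is still current, retrying on conflict. No table-wide lock is held while data is written; only the pointer swap itself is atomic.

```python
import threading

class Catalog:
    """Toy in-memory catalog: holds the table's current metadata version."""
    def __init__(self):
        self._lock = threading.Lock()  # guards only the pointer swap, not the table
        self.version = 0
        self.snapshots = []

    def read(self):
        return self.version

    def compare_and_swap(self, expected, snapshot):
        """Atomic metadata-pointer swap; fails if another commit landed first."""
        with self._lock:
            if self.version != expected:
                return False
            self.snapshots.append(snapshot)
            self.version += 1
            return True

def commit(catalog, files):
    """Optimistic commit loop: read, prepare a snapshot, CAS, retry on conflict."""
    while True:
        base = catalog.read()                       # current metadata version
        snapshot = {"base": base, "files": files}   # prepared without any lock
        if catalog.compare_and_swap(base, snapshot):
            return

catalog = Catalog()
threads = [threading.Thread(target=commit, args=(catalog, [f"file-{i}.parquet"]))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(catalog.version, len(catalog.snapshots))
```

Under contention every loser re-reads and retries, which is why throughput degrades as concurrent committers increase, and why batching commits through a single writer (as in the reply above) helps.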