Is partitioning by country safe (> 100 countries)? #3

Open
MichaelTiemannOSC opened this issue Aug 6, 2022 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@MichaelTiemannOSC
Contributor

The default Trino / Iceberg configuration limits the pool of partition writers to 100. When writing data that has more than 100 distinct partition values, Trino can throw an error saying it does not have enough writers configured. The current data-loading demonstration partitions by country (cell 13) but then populates only power plants based in France (cell 15). It also cleverly limits the batch size to 100 rows, which means at most 100 writers can ever be needed, since 100 rows cannot contain more than 100 distinct countries. Each of these choices on its own conveniently avoids the not-enough-writers problem.
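The batch-size argument above can be checked with a few lines of plain Python. This is a minimal sketch, not code from the notebook: the helper names `fixed_size_batches` and `max_distinct_per_batch`, and the `(country, plant_id)` row shape, are hypothetical. It assumes each batch is written as its own INSERT, so the number of distinct partition values per batch is what matters.

```python
from typing import Callable, Hashable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def fixed_size_batches(rows: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` rows (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def max_distinct_per_batch(
    rows: Sequence[T], key: Callable[[T], Hashable], batch_size: int
) -> int:
    """Largest number of distinct partition values found in any single batch."""
    return max(
        (len({key(row) for row in batch}) for batch in fixed_size_batches(rows, batch_size)),
        default=0,
    )


# Hypothetical data: (country, plant_id) rows spread over 150 countries.
rows = [(f"C{i % 150}", i) for i in range(300)]
print(len({r[0] for r in rows}))                                         # → 150
print(max_distinct_per_batch(rows, key=lambda r: r[0], batch_size=100))  # → 100
```

The whole dataset spans 150 countries, more than the limit, yet no 100-row batch can span more than 100. The bound holds regardless of the data, at the cost of many small writes.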

But the real question is: in general, how do we best reconcile the partition writer limit with the way typical pipeline developers will want to write and maintain their code?
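One possible direction, sketched here under stated assumptions rather than as the project's approach, is to bound the number of distinct partition values per batch directly instead of bounding the row count. The helper `partition_bounded_batches` is hypothetical. It groups rows by partition key and packs whole groups into batches of at most `max_partitions` keys. It assumes the rows fit in memory and that each yielded batch is then written with one INSERT through whatever client the pipeline already uses.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def partition_bounded_batches(
    rows: Iterable[T],
    key: Callable[[T], Hashable],
    max_partitions: int = 100,
) -> Iterator[List[T]]:
    """Yield batches that each span at most `max_partitions` distinct key values.

    All rows sharing a key land in the same batch, so no partition is split
    across batches. Rows are buffered in memory while grouping.
    """
    if max_partitions < 1:
        raise ValueError("max_partitions must be >= 1")
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)  # dict keeps keys in first-seen order

    batch: List[T] = []
    keys_in_batch = 0
    for group in groups.values():
        if keys_in_batch == max_partitions:
            yield batch  # batch is full: emit it and start a new one
            batch, keys_in_batch = [], 0
        batch.extend(group)
        keys_in_batch += 1
    if batch:
        yield batch


# Hypothetical data: (country, plant_id) rows, 2 rows for each of 150 countries.
rows = [(f"C{i % 150}", i) for i in range(300)]
batches = list(partition_bounded_batches(rows, key=lambda r: r[0]))
print([len(b) for b in batches])  # → [200, 100]
```

Compared with fixed 100-row batches, this issues fewer writes (two here instead of three) while staying under the limit, and pipeline code only has to name its partition key. The trade-offs are that everything is buffered first and a single large country still produces one large batch, so a row cap may be needed as well. The other lever is raising the connector's per-writer partition limit in the catalog configuration (`iceberg.max-partitions-per-writer` in the Trino Iceberg connector versions we are aware of; check the documentation for the release in use).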

MichaelTiemannOSC added the documentation label Aug 6, 2022