Is partitioning by country safe (> 100 countries)? #3

MichaelTiemannOSC · 2022-08-06T12:57:02Z

The default Trino / Iceberg configuration limits the pool of writers to 100. When a write contains more than 100 distinct partition values, Trino can throw an error saying that not enough writers are configured. The current data-loading demonstration partitions by country (cell 13) but then populates only power plants based in France (cell 15). It also cleverly limits the batch size to 100, which means at most 100 writers can be needed, since 100 rows of data cannot contain more than 100 distinct countries. Each of these two choices, on its own, conveniently avoids the problem of not having enough writers.
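The batch-size argument can be generalized: what matters is the number of distinct partition values per write, not the row count. Below is a minimal sketch of that idea in plain Python, assuming rows are dicts with a `country` key and that the limit of 100 applies per write as described above. The function name `batches_by_partition_limit`, its parameters, and the synthetic data are hypothetical and not taken from the demonstration notebook.

```python
from typing import Dict, Iterable, Iterator, List


def batches_by_partition_limit(
    rows: Iterable[Dict[str, str]],
    partition_key: str = "country",
    max_partitions: int = 100,  # assumed to mirror the writer limit discussed above
) -> Iterator[List[Dict[str, str]]]:
    """Yield batches that each contain at most `max_partitions` distinct partition values."""
    batch: List[Dict[str, str]] = []
    seen = set()
    for row in rows:
        value = row[partition_key]
        # Close the current batch when this row would add one partition too many.
        if value not in seen and len(seen) == max_partitions:
            yield batch
            batch, seen = [], set()
        seen.add(value)
        batch.append(row)
    if batch:
        yield batch


# 250 synthetic rows spread over 125 made-up partition values (not real country codes).
rows = [{"country": f"C{i % 125:03d}", "plant": f"plant-{i}"} for i in range(250)]
batches = list(batches_by_partition_limit(rows))

for b in batches:
    distinct = len({r["country"] for r in b})
    print(f"{len(b)} rows, {distinct} distinct partition values")
```

Unlike a fixed 100-row batch, this lets a batch grow to any size as long as it stays within the partition limit, so a France-only load would go out as a single batch. Whether each batch maps to one write in an actual pipeline depends on how the loader issues its inserts, which this sketch does not cover.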

But the real question is more general: how do we best reconcile the partition writer limit with the way typical pipeline developers will want to write and maintain their code?

MichaelTiemannOSC added the documentation label Aug 6, 2022
