Merge pull request #23 from pranavanba/main
Update max_rows_per_file param in arrow::write_dataset() operation
pranavanba authored Feb 29, 2024
2 parents 8552b1c + 60c9e26 commit 4756168
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion filtering.R
@@ -48,7 +48,7 @@ drop_cols_datasets <- function(dataset, columns=c(), input = AWS_PARQUET_DOWNLOA
   arrow::open_dataset(sources = input_path) %>%
     dplyr::select(!dplyr::any_of(columns)) %>%
     arrow::write_dataset(path = final_path,
-                         max_rows_per_file = 100000,
+                         max_rows_per_file = 1000000,
                          partitioning = partitions,
                          existing_data_behavior = 'delete_matching',
                          basename_template = paste0("part-0000{i}.", as.character("parquet")))
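
For context, here is a minimal sketch of what `max_rows_per_file` controls in `arrow::write_dataset()`. It is not part of this commit: the toy data frame, the row caps (100 and 1000), and the output directories are all assumptions chosen for illustration, and `max_rows_per_group` is set explicitly so that it never exceeds the per-file cap.

```r
library(arrow)

# Hypothetical toy data (250 rows); not the project's real dataset.
df <- data.frame(id = 1:250, value = rnorm(250))

# Illustrative output locations under the session's temp directory.
out_small_cap <- file.path(tempdir(), "cap_100")
out_large_cap <- file.path(tempdir(), "cap_1000")

# A low cap forces the 250 rows to be split across several Parquet files.
write_dataset(df, path = out_small_cap, format = "parquet",
              max_rows_per_file = 100, max_rows_per_group = 100)

# A cap above the row count lets all rows land in a single file.
write_dataset(df, path = out_large_cap, format = "parquet",
              max_rows_per_file = 1000, max_rows_per_group = 1000)

# Compare how many files each setting produced.
n_small_cap <- length(list.files(out_small_cap, recursive = TRUE))
n_large_cap <- length(list.files(out_large_cap, recursive = TRUE))
print(c(small_cap = n_small_cap, large_cap = n_large_cap))
```

Raising the cap, as this commit does, should therefore tend to produce fewer and larger files per partition, at the cost of each file holding more rows.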