Parquet reader list microkernel (#16538) · rapidsai/cudf@eeb4d27

This PR refactors fixed-width parquet list reader decoding into its own set of micro-kernels by templatizing the existing fixed-width microkernels. When skipping rows for lists, the reader now also skips ahead in decoding the definition, repetition, and dictionary rle_streams. The list kernel uses 128 threads per block and 71 registers per thread, so I've changed the launch_bounds to enforce a minimum of 8 blocks per SM. This causes a small register spill, but the benchmarks are still faster, as seen below:

DEVICE_BUFFER list benchmarks (decompress + decode, not bound by IO):
run_length 1,  cardinality 0,    no byte_limit:     24.7% faster
run_length 32, cardinality 1000, no byte_limit:     18.3% faster
run_length 1,  cardinality 0,    500kb byte_limit:  57% faster
run_length 32, cardinality 1000, 500kb byte_limit:  53% faster

Compressed list of ints on hard drive: 5.5% faster
Sample real data on hard drive (many columns not lists): 0.5% faster

Authors:
  - Paul Mattione (https://github.com/pmattione-nvidia)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - https://github.com/nvdbaranec
  - Nghia Truong (https://github.com/ttnghia)

URL: #16538

pmattione-nvidia authored Oct 29, 2024

1 parent 8d7b0d8 commit eeb4d27