Filter_func in parquet reader #23

paopaoactioner · 2022-01-04T07:29:54Z

User Story

With row-based storage formats such as TFRecord, a common pipeline is map, filter, then batch. With Parquet, however, converting to a row-based dataset performs very poorly, and filtering data after batching makes the batch size fluctuate drastically. We therefore propose adding a filter_func argument to the read_parquet interface so that users can get clean batches directly.
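To make the batch-size problem concrete, here is a minimal pure-Python sketch. It does not use HybridBackend or TensorFlow: `read_batches`, `filter_batch`, and `filter_then_rebatch` are hypothetical helpers written only for illustration, and the re-batching shown is just one possible reading of what a reader-side `filter_func` could do, not an existing implementation.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# One columnar batch: field name -> list of values (stand-in for Parquet columns).
Columns = Dict[str, List[int]]


def read_batches(columns: Columns, batch_size: int) -> Iterator[Columns]:
    """Yield fixed-size columnar batches (hypothetical stand-in for a reader)."""
    num_rows = len(next(iter(columns.values())))
    for start in range(0, num_rows, batch_size):
        yield {k: v[start:start + batch_size] for k, v in columns.items()}


def filter_batch(batch: Columns, keep: Callable[[Dict[str, int]], bool]) -> Columns:
    """Apply a row predicate to a single columnar batch."""
    names = list(batch)
    rows = zip(*(batch[n] for n in names))
    kept = [r for r in rows if keep(dict(zip(names, r)))]
    return {n: [r[i] for r in kept] for i, n in enumerate(names)}


def filter_then_rebatch(
    batches: Iterable[Columns],
    keep: Callable[[Dict[str, int]], bool],
    batch_size: int,
) -> Iterator[Columns]:
    """Filter each batch, then re-chunk the surviving rows into full batches."""
    buffer: Columns = {}
    for batch in batches:
        kept = filter_batch(batch, keep)
        for n, values in kept.items():
            buffer.setdefault(n, []).extend(values)
        # Emit as many full batches as the buffer currently holds.
        while buffer and len(next(iter(buffer.values()))) >= batch_size:
            yield {n: v[:batch_size] for n, v in buffer.items()}
            buffer = {n: v[batch_size:] for n, v in buffer.items()}
    if buffer and len(next(iter(buffer.values()))) > 0:
        yield buffer  # final partial batch (no drop_remainder handling here)


data = {"id": list(range(12)), "label": [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]}


def is_positive(row: Dict[str, int]) -> bool:
    return row["label"] == 1


# Filtering after batching: batch sizes vary from batch to batch.
sizes_after = [len(filter_batch(b, is_positive)["id"]) for b in read_batches(data, 4)]

# Filtering inside the reader, then re-batching: only the last batch is short.
rebatched = list(filter_then_rebatch(read_batches(data, 4), is_positive, 4))
sizes_rebatched = [len(b["id"]) for b in rebatched]

print(sizes_after)      # [2, 3, 2]
print(sizes_rebatched)  # [4, 3]
```

In this sketch the predicate sees one row at a time for readability; a real columnar reader would more likely evaluate the filter on whole columns to avoid the row-conversion cost described above.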

Detailed requirements

Add a filter_func argument to hybridbackend.tensorflow.data.read_parquet(batch_size, fields=None, partition_count=1, partition_index=0, drop_remainder=False, num_parallel_reads=None, num_sequential_reads=1, filter_func=None, map_func=None)

API Compatibility

At least TensorFlow 1.14 and 1.15.

2sin18 self-assigned this Jan 4, 2022
