User Story
In a row-based storage format such as TFRecord, the usual pipeline is map, filter, and then batch. With the Parquet format, however, converting to a row-based dataset performs very poorly, and filtering after batching makes the batch size fluctuate drastically. We therefore propose adding a filter_func argument to the read_parquet interface so that users can obtain a clean batch directly.
Detailed requirements
Add filter_func to the following interface:
hybridbackend.tensorflow.data.read_parquet(batch_size, fields=None, partition_count=1, partition_index=0, drop_remainder=False, num_parallel_reads=None, num_sequential_reads=1, filter_func=None, map_func=None)
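A minimal usage sketch of the proposed argument is shown below. The file path and field names are hypothetical, and the surrounding call pattern (building a dataset of file names and applying read_parquet as a transformation) is an assumption for illustration; only the keyword arguments follow the signature above. The filter_func contract assumed here, a callable that receives the batched columns and returns a boolean row mask, is an interpretation of this proposal rather than documented behavior.

```python
import tensorflow as tf
import hybridbackend.tensorflow as hb

def keep_positive_labels(batch):
  # Assumed contract: 'batch' holds the batched column tensors keyed by
  # field name; rows whose 'label' is not positive are dropped inside the
  # reader, before the batch is emitted.
  return tf.math.greater(batch['label'], 0)

# Hypothetical file path and field names, for illustration only.
filenames = tf.data.Dataset.from_tensor_slices(['/path/to/day0.parquet'])
ds = filenames.apply(
    hb.data.read_parquet(
        batch_size=1024,
        fields=['user_id', 'item_id', 'label'],
        filter_func=keep_positive_labels,
        drop_remainder=True))
```

Filtering inside the reader this way avoids converting to a row-based dataset and keeps the emitted batch size stable, which is the motivation described in the user story.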
API Compatibility
At least TensorFlow 1.14 and 1.15.