Allow HiveSplit info columns like '$file_size' and '$file_modified_time' to be queried in SQL #8800
Hey @aditi-pandit, I also have a similar PR, #7880, that lets Velox query the file metadata columns the Spark engine supports for Hive tables (file_path, file_size, file_name, file_modify_time, file_block_start, file_block_end, etc.). Maybe we can work together to make the change support both engines, Presto and Spark?
Hey @aditi-pandit, maybe change the PR title to
@Yuhta @majetideepak: PTAL.
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@Yuhta: Do you need help with the linter error? Could you please give me more information about it?
…ified_time' to be queried in SQL (facebookincubator#8800)" This reverts commit b9afa14.
…file_modified_time' to be queried in SQL (facebookincubator#8800)"" This reverts commit d3dc172.
@@ -364,11 +378,13 @@ std::shared_ptr<common::ScanSpec> makeScanSpec(
// SelectiveColumnReader doesn't support constant columns with filters,
// hence, we can't have a filter for a $path or $bucket column.
//
// Unfortunately, Presto happens to specify a filter for $path or
// $bucket column. This filter is redundant and needs to be removed.
// Unfortunately, Presto happens to specify a filter for $path, $file_size, |
Just wondering: is there an issue for this on the Presto side?
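The diff comment says that a Presto filter on a constant synthesized column ($path, $bucket, and, with this change, $file_size and friends) cannot be handled by SelectiveColumnReader and has to be removed from the scan spec. Below is a minimal, hypothetical C++ sketch of that idea, not Velox's code: `ValueFilter`, `InfoColumns`, and `applyInfoColumnFilters` are illustrative names, and filters are modeled as string predicates rather than Velox `common::Filter` objects. Instead of discarding such a filter blindly, the sketch evaluates it once against the split's constant value, so a split that cannot match can be skipped.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a single-column filter (not Velox's common::Filter).
using ValueFilter = std::function<bool(const std::string&)>;

// Hypothetical: per-split values of synthesized columns, keyed by column name,
// e.g. "$path" or "$file_size".
using InfoColumns = std::unordered_map<std::string, std::string>;

// Removes every filter that targets a synthesized column, so the column reader
// never sees a filter on a constant column. Each removed filter is first
// evaluated once against the split's constant value. Returns false if any of
// them rejects that value, i.e. the whole split can be skipped.
bool applyInfoColumnFilters(
    std::unordered_map<std::string, ValueFilter>& filters,
    const InfoColumns& infoColumns) {
  bool keepSplit = true;
  for (auto it = filters.begin(); it != filters.end();) {
    auto info = infoColumns.find(it->first);
    if (info == infoColumns.end()) {
      ++it;  // Regular data column: leave its filter for the reader.
      continue;
    }
    if (!it->second(info->second)) {
      keepSplit = false;  // Constant value fails the filter for every row.
    }
    it = filters.erase(it);  // erase() returns the next valid iterator.
  }
  return keepSplit;
}
```

Whether the per-split check is needed at all depends on whether the coordinator has already pruned non-matching splits, which would make the filter truly redundant as the diff comment says; the sketch does not assume either way.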
…me' to be queried in SQL (facebookincubator#8800)
Pull Request resolved: facebookincubator#8800
Reviewed By: mbasmanova
Differential Revision: D54512245
Pulled By: Yuhta
fbshipit-source-id: 190a97f9fcb1e869fff82e0a2264d57f9915376e
$file_size and $file_modified_time are queryable synthesized columns for Hive tables in Presto. Spark also has a number of such queryable synthesized columns (#7880).
The coordinator passes these columns to the worker in the HiveSplit.
i) Velox HiveSplit needed to be enhanced to receive the file size and file_modified_time metadata from Prestissimo in a generic map data structure of (column name, value).
ii) These values should be populated by SplitReader into TableScanOperator output buffers.
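Steps i) and ii) can be sketched roughly as follows. This is a hypothetical C++ illustration, not the Velox API: `SketchHiveSplit`, `Column`, and `makeInfoColumn` are made-up names, values are kept as strings, and a column is modeled as a plain `std::vector<std::string>`. The split carries a generic (column name, value) map, and the reader emits that split-level value on every row of the output batch.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified split holding only what this sketch needs.
struct SketchHiveSplit {
  std::string filePath;
  // Generic (column name -> value) map populated by the coordinator via
  // Prestissimo, e.g. {"$file_size", "1024"} (illustrative value).
  std::unordered_map<std::string, std::string> infoColumns;
};

// Hypothetical: one output column as a flat vector of string values.
using Column = std::vector<std::string>;

// Builds `numRows` output rows for `columnName`, each holding the split-level
// constant. Returns std::nullopt if the split does not carry that column.
std::optional<Column> makeInfoColumn(
    const SketchHiveSplit& split,
    const std::string& columnName,
    std::size_t numRows) {
  auto it = split.infoColumns.find(columnName);
  if (it == split.infoColumns.end()) {
    return std::nullopt;
  }
  return Column(numRows, it->second);
}
```

Because the value is the same for every row of a split, a real reader would more naturally represent it as a constant vector instead of materializing it row by row; the sketch copies it only to stay self-contained.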
This also requires a Prestissimo change (prestodb/presto#21965) to populate the HiveSplit with the info sent in the fragment.
Fixes prestodb/presto#21867
@gaoyangxiaozhu will follow up with a PR for the Spark integration.