-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VL] Pass file size and modification time in split #6029
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
@FelixYBW can you please review this? Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, leave one comments.
// Spark 3.3 and later only have file size and modification time in PartitionedFile | ||
def getFileSizeAndModificationTime(file: PartitionedFile): (Option[Long], Option[Long]) = { | ||
(None, None) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'd better keep pure abstract method in this file, just like other declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, thank you!
Run Gluten Clickhouse CI |
Thank you for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you!
The PR broken the S3 reader. quick fix: #6313 |
What changes were proposed in this pull request?
Support was made in Velox to pass file properties while creating a file handle for reading. Previously a separate call was made to fetch file properties from remote storage. By passing this information down to Velox in the split we save on one network call per file open.
facebookincubator/velox#9314
How was this patch tested?
Existing UTs