[VL] Pass file size and modification time in split #6029

Merged (3 commits) Jun 11, 2024

@acvictor (Contributor) commented Jun 9, 2024

What changes were proposed in this pull request?

Velox added support for passing file properties (file size and modification time) when creating a file handle for reading. Previously, a separate call was made to fetch these properties from remote storage. By passing this information down to Velox in the split, we save one network call per file open.

facebookincubator/velox#9314
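The idea can be sketched in a few lines of Scala. This is a minimal model under stated assumptions, not Gluten's or Velox's actual code: `FileProperties`, `FileSplit`, `FakeRemoteStore` and `FileOpener.resolveProperties` are hypothetical names, and the store is a stub that only counts metadata lookups. When the split already carries the properties, the lookup is skipped; otherwise the reader falls back to one stat call.

```scala
// Hypothetical model of the optimization; not the real Gluten/Velox types.
final case class FileProperties(fileSize: Long, modificationTime: Long)

// A split for one file range. `properties` is None when the engine
// cannot supply them (e.g. a Spark version without these fields).
final case class FileSplit(
    path: String,
    start: Long,
    length: Long,
    properties: Option[FileProperties])

// Stub for remote storage (S3, HDFS, ...) that counts metadata round trips.
final class FakeRemoteStore(files: Map[String, FileProperties]) {
  private var calls = 0
  def statCalls: Int = calls
  def stat(path: String): FileProperties = {
    calls += 1 // each call stands in for one network request
    files(path)
  }
}

object FileOpener {
  // Prefer the properties shipped in the split; only hit storage if absent.
  // getOrElse takes its argument by name, so stat() runs only on None.
  def resolveProperties(split: FileSplit, store: FakeRemoteStore): FileProperties =
    split.properties.getOrElse(store.stat(split.path))
}
```

Keeping the field optional lets one split format serve both cases: engines that know the values save the round trip, and the rest behave as before.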

How was this patch tested?

Existing UTs

github-actions bot commented Jun 9, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

github-actions bot commented Jun 9, 2024

Run Gluten Clickhouse CI

@acvictor marked this pull request as ready for review June 9, 2024 16:07
@acvictor (Contributor, Author):

@FelixYBW can you please review this? Thank you!

@Yohahaha (Contributor) previously approved these changes Jun 10, 2024 and left a comment:

LGTM, left one comment.

Comment on lines 210 to 214
```scala
// Only Spark 3.3 and later have file size and modification time in PartitionedFile
def getFileSizeAndModificationTime(file: PartitionedFile): (Option[Long], Option[Long]) = {
  (None, None)
}
```

Contributor

We'd better keep this as a pure abstract method in this file, just like the other declarations.

Contributor Author

Updated, thank you!
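The reviewer's suggestion (declare the method abstract in the shared shim trait and implement it once per Spark version) might look roughly like this. `VersionShims`, `Spark32ShimsSketch`, `Spark33ShimsSketch` and `PartitionedFileLike` are hypothetical stand-ins; the real code works against Spark's `PartitionedFile` and Gluten's own shim classes. With no default body, a new shim that forgets the method fails to compile instead of silently returning `(None, None)`.

```scala
// Hypothetical stand-in for Spark's PartitionedFile. Spark 3.3+ exposes
// file size and modification time; older versions do not.
final case class PartitionedFileLike(filePath: String, fileSize: Long, modificationTime: Long)

trait VersionShims {
  // Pure abstract declaration: every version-specific shim must implement it.
  def getFileSizeAndModificationTime(file: PartitionedFileLike): (Option[Long], Option[Long])
}

// Shim for a version whose PartitionedFile lacks these fields: report
// nothing (assumption: the native reader then fetches them itself).
object Spark32ShimsSketch extends VersionShims {
  override def getFileSizeAndModificationTime(
      file: PartitionedFileLike): (Option[Long], Option[Long]) = (None, None)
}

// Shim for Spark 3.3+: forward the values already present in the file.
object Spark33ShimsSketch extends VersionShims {
  override def getFileSizeAndModificationTime(
      file: PartitionedFileLike): (Option[Long], Option[Long]) =
    (Some(file.fileSize), Some(file.modificationTime))
}
```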

Run Gluten Clickhouse CI

@acvictor (Contributor, Author):

Quoting: "LGTM, left one comment."

Thank you for the review!

@Yohahaha (Contributor) left a comment:

thank you!

@Yohahaha merged commit 13babf3 into apache:main Jun 11, 2024
41 checks passed
@FelixYBW (Contributor) commented Jul 2, 2024:

This PR broke the S3 reader. Quick fix: #6313
