[VL] Improve the implementation of Spark function input_file_name #6157
Comments
Assigned to @gaoyangxiaozhu
Thanks @zhztheplayer. I also synced with @zhli1142015 offline. He has another proposal, possibly using a new logical rule in … @zhli1142015, could you share your thoughts here for reference? Then we can decide which approach to use to refactor this part. Thanks!
Hey @zhztheplayer, I thought about this for a while, and zhen's suggestion to convert … But it helped me come up with another idea that can overcome the above shortcomings. The old implementation needs a fallback rule because it introduces a new metadata column named … With that, we only need one rule, which keeps things simple and working. For example:
would be
I will try it locally to verify whether the idea works. In the meantime, let me know whether you think this approach is acceptable. @zhztheplayer / @FelixYBW / @zhli1142015 FYI
That sounds great if it works.
Unfortunately, we can't directly reuse …
@zhztheplayer I checked the Spark code: if a project node has … I have followed your suggestion to modify …
Description
Two PRs are being proposed to add support for input_file_name: #6021 (merged) and #6139 (pending).
However, so far two interdependent rules are being added for this feature: one that pushes the input_file_name function down to the Velox scan, and another that falls the plan back to the vanilla Spark plan when the scan cannot be offloaded to Velox. This kind of re-planning for offload/fallback should be avoided, since it significantly increases the query planner's code complexity. Technically, the fallback rule depends not only on the offload rule but on every rule applied before it. We should find another way to add support for this kind of function.
To improve:
1. Remove InputFileNameReplaceRule and InputFileNameReplaceFallbackRule.
2. Remove SparkPlanExecApi#genExtendedColumnarFinalRules, since it is unused.
3. In OffloadSingleNode.scala, add a rule OffloadInputFileName (or something similar) to convert the Project(input_file_name) + Scan pattern into ProjectTransformer(input_file_name) + ScanTransformer(input_file_name).
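A minimal, self-contained Scala sketch of the single-rule idea, assuming a toy plan model. Scan, Project, ScanTransformer, ProjectTransformer, InputFileName and Col below are hypothetical stand-ins, not Spark's or Gluten's real classes, and the offloadable flag is an assumed placeholder for whatever validation decides that a scan can run in Velox. It shows the Project + Scan pair being offloaded together or left untouched, so there is no partially offloaded plan for a separate fallback rule to repair.

```scala
// Hypothetical, simplified plan model (NOT Spark's or Gluten's real classes).
sealed trait Expr
case object InputFileName extends Expr            // stands in for input_file_name()
final case class Col(name: String) extends Expr   // an ordinary column reference

sealed trait Plan
// `offloadable` is an assumed stand-in for native (Velox) scan validation.
final case class Scan(table: String, offloadable: Boolean) extends Plan
final case class Project(exprs: Seq[Expr], child: Plan) extends Plan
// Native counterparts; the scan transformer also emits the file name.
final case class ScanTransformer(table: String, emitsInputFileName: Boolean) extends Plan
final case class ProjectTransformer(exprs: Seq[Expr], child: Plan) extends Plan

object OffloadInputFileName {
  private def usesInputFileName(exprs: Seq[Expr]): Boolean =
    exprs.contains(InputFileName)

  // Rewrites Project(input_file_name) + Scan as one unit, or not at all.
  def apply(plan: Plan): Plan = plan match {
    case Project(exprs, Scan(table, true)) if usesInputFileName(exprs) =>
      ProjectTransformer(exprs, ScanTransformer(table, emitsInputFileName = true))
    case Project(exprs, child) =>
      Project(exprs, apply(child)) // keep searching below other projects
    case other =>
      other // non-matching nodes stay vanilla; no fallback step required
  }
}
```

In this sketch, a scan that fails validation keeps the vanilla Project over Scan. This mirrors the motivation above: the decision is made once, at the pattern level, rather than undone later by a rule that depends on everything applied before it.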