feat: Add ConfigOptions to ScalarFunctionArgs #13527

Draft
wants to merge 3 commits into
base: main

Omega359
Contributor

Which issue does this PR close?

Closes #13519

Rationale for this change

Allow UDFs to access the DataFusion config.

What changes are included in this PR?

Code.

Are these changes tested?

Existing tests.

Are there any user-facing changes?

Not specifically; this is covered by the UDF signature change in #13290.

@github-actions github-actions bot added the labels logical-expr, physical-expr, optimizer, core, common, proto, and functions Nov 22, 2024
@Omega359 Omega359 changed the title from "Feature/scalar func args session config" to "feat: Add ConfigOptions to ScalarFuntionArgs" Nov 22, 2024
@Omega359 Omega359 changed the title from "feat: Add ConfigOptions to ScalarFuntionArgs" to "feat: Add ConfigOptions to ScalarFunctionArgs" Nov 22, 2024
@Omega359
Contributor Author

There are a lot of file changes here, but most of the important ones are in scalar_function.rs. There is a TODO in expr_simplifier.rs that I would like feedback on.

@Omega359 Omega359 marked this pull request as ready for review November 22, 2024 16:42
@alamb
Contributor

alamb commented Nov 24, 2024

I plan to review this carefully tomorrow

Contributor

@alamb alamb left a comment

Thank you @Omega359 -- this is an epic plumbing exercise 🪠

The signature in ScalarFunctionArgs is 👌 very nice

This PR seems to require config_options to be cloned many times now. I wonder if it is possible to avoid that 🤔. I took a brief look and it seems to be somewhat challenging as SessionState allows mutable access to the underlying SessionConfig.

Maybe we could change the semantics so that SessionConfig holds an Arc<ConfigOptions> that is cloned only when it is modified (Arc::unwrap_or_clone() style) 🤔
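A minimal sketch of the copy-on-write idea suggested above, using stand-in types rather than DataFusion's real `SessionConfig` and `ConfigOptions` (the `batch_size` field and both method bodies are assumptions made for illustration). It uses `Arc::make_mut`, the in-place counterpart of `Arc::unwrap_or_clone`, so readers share one allocation and a deep clone happens only when a writer mutates options that are still shared:

```rust
use std::sync::Arc;

/// Stand-in for DataFusion's `ConfigOptions`; the single field is hypothetical.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ConfigOptions {
    pub batch_size: usize,
}

/// Stand-in for `SessionConfig`, holding its options behind an `Arc`.
#[derive(Clone, Debug, Default)]
pub struct SessionConfig {
    options: Arc<ConfigOptions>,
}

impl SessionConfig {
    /// Hands out a shared handle; this only bumps a reference count.
    pub fn options(&self) -> Arc<ConfigOptions> {
        Arc::clone(&self.options)
    }

    /// Copy-on-write access: `Arc::make_mut` deep-clones the options only
    /// when another handle to them is still alive.
    pub fn options_mut(&mut self) -> &mut ConfigOptions {
        Arc::make_mut(&mut self.options)
    }
}
```

Under these assumptions, a handle taken before a mutation keeps seeing the old values, while repeated mutations with no outstanding handles reuse the same allocation. Whether this shape fits the real `SessionConfig` API is a separate question that the sketch does not answer.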

I also think the const evaluator does need the actual correct ConfigOptions for correctness

let physical_expr =
datafusion_physical_expr::create_physical_expr(&expr, &df_schema, &props)?;
let config_options = Arc::new(ConfigOptions::default());
let physical_expr = datafusion_physical_expr::create_physical_expr(
Contributor

It seems somewhat inevitable that creating a physical expr will require the config options

However, I also think threading through the config options down through to the physical creation will (finally) permit people to pass things from the session down to function implementations (I think @cisaacson also was trying to do this in the past)

@@ -283,10 +284,16 @@ async fn prune_partitions(

// TODO: Plumb this down
Contributor

This TODO may now be complete

@@ -336,6 +337,8 @@ pub struct ScalarFunctionArgs<'a> {
// The return type of the scalar function returned (from `return_type` or `return_type_from_exprs`)
// when creating the physical expression from the logical expression
pub return_type: &'a DataType,
// The config options which can be used to lookup configuration properties
pub config_options: Arc<ConfigOptions>,
Contributor

🎉

Ok(e) => e,
Err(err) => return ConstSimplifyResult::SimplifyRuntimeError(err, expr),
};
// todo - should the config options be the actual options here or is this sufficient?
Contributor

I think the actual configuration options are needed here. Otherwise, any function whose behavior relies on the ConfigOptions may behave differently on columns than on constants (or other expressions that can be constant folded)
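To make the concern concrete, here is a small self-contained sketch. All types are simplified stand-ins, and `uppercase_output`, `tag`, `eval_column`, and `fold_constant_with_defaults` are hypothetical names invented for this example, not DataFusion APIs. It shows a config-dependent function giving one answer when evaluated against a column with the session's options and another when a literal is folded with `ConfigOptions::default()`:

```rust
use std::sync::Arc;

/// Stand-in for DataFusion's `ConfigOptions`; `uppercase_output` is a
/// hypothetical option invented for this sketch.
#[derive(Clone, Debug, Default)]
pub struct ConfigOptions {
    pub uppercase_output: bool,
}

/// Simplified stand-in for `ScalarFunctionArgs` (the real struct has more fields).
pub struct ScalarFunctionArgs<'a> {
    pub args: &'a [String],
    pub config_options: Arc<ConfigOptions>,
}

/// Hypothetical UDF whose result depends on a config option.
pub fn tag(args: ScalarFunctionArgs<'_>) -> Vec<String> {
    let upper = args.config_options.uppercase_output;
    args.args
        .iter()
        .map(|s| if upper { s.to_uppercase() } else { s.clone() })
        .collect()
}

/// Runtime path: the function sees the session's options.
pub fn eval_column(session: &Arc<ConfigOptions>, column: &[String]) -> Vec<String> {
    tag(ScalarFunctionArgs {
        args: column,
        config_options: Arc::clone(session),
    })
}

/// Constant-folding path as in the TODO: the function sees default options.
pub fn fold_constant_with_defaults(literal: &str) -> String {
    let args = [literal.to_string()];
    let defaults = Arc::new(ConfigOptions::default());
    tag(ScalarFunctionArgs {
        args: &args,
        config_options: defaults,
    })
    .remove(0)
}
```

With a session where `uppercase_output` is `true`, `eval_column` returns the uppercased value while `fold_constant_with_defaults` returns the input unchanged for the same string, which is the column-versus-constant mismatch described above.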

@Omega359
Contributor Author

> This PR seems to require config_options to be cloned many times now. I wonder if it is possible to avoid that 🤔. I took a brief look and it seems to be somewhat challenging as SessionState allows mutable access to the underlying SessionConfig.

Yes, it's a bit annoying. I was tempted to see if I could switch to &'a ConfigOptions everywhere. There is at least one 'real' (vs Arc::clone) clone for every query, possibly more as I haven't checked.

> Maybe we could change the semantics so that SessionConfig holds an Arc<ConfigOptions> that is cloned only when it is modified (Arc::unwrap_or_clone() style) 🤔

Certainly possible, I can attempt that.

> I also think the const evaluator does need the actual correct ConfigOptions for correctness

I was afraid of that. I was avoiding it because of the signature changes it would require just about everywhere, which would cause even more headaches for systems trying to upgrade.

@alamb
Contributor

alamb commented Nov 25, 2024

Yeah, it is a tricky one for sure

@alamb
Contributor

alamb commented Nov 27, 2024

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft November 27, 2024 19:24
@Omega359
Contributor Author

> > Maybe we could change the semantics so that SessionConfig holds an Arc<ConfigOptions> that is cloned only when it is modified (Arc::unwrap_or_clone() style) 🤔
>
> Certainly possible, I can attempt that.

@alamb I made a quick attempt at implementing that; however, it breaks a commonly used method: SessionConfig.options_mut(). Not having that available breaks a bunch of stuff, and while switching to SessionConfig.set(..) is quite possible, it's not as clean.

Trying with &ConfigOptions in ScalarFunctionExpr leads to lifetime hell in areas I have no idea how to overcome right now.
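One plausible source of that friction, sketched with stand-in types (the `PhysicalExpr` trait, `SharedExpr`, `BorrowedExpr`, and both constructor functions here are simplified assumptions, not DataFusion's actual definitions): a borrowed `&ConfigOptions` field puts a lifetime parameter on the struct, and that lifetime then has to appear on every trait object and return type that carries it, whereas an `Arc<ConfigOptions>` field keeps the expression `'static`.

```rust
use std::sync::Arc;

/// Stand-in for DataFusion's `ConfigOptions`; the field is hypothetical.
#[derive(Debug, Default)]
pub struct ConfigOptions {
    pub batch_size: usize,
}

/// Simplified stand-in for a physical expression trait.
pub trait PhysicalExpr: Send + Sync {
    fn batch_size(&self) -> usize;
}

/// Owning variant: no lifetime parameter, so it fits `Arc<dyn PhysicalExpr>`,
/// which implies a `'static` bound.
pub struct SharedExpr {
    pub config: Arc<ConfigOptions>,
}

/// Borrowing variant: the lifetime becomes part of the type.
pub struct BorrowedExpr<'a> {
    pub config: &'a ConfigOptions,
}

impl PhysicalExpr for SharedExpr {
    fn batch_size(&self) -> usize {
        self.config.batch_size
    }
}

impl PhysicalExpr for BorrowedExpr<'_> {
    fn batch_size(&self) -> usize {
        self.config.batch_size
    }
}

/// The owning expression can outlive the caller's handle to the config.
pub fn make_shared(config: &Arc<ConfigOptions>) -> Arc<dyn PhysicalExpr> {
    Arc::new(SharedExpr {
        config: Arc::clone(config),
    })
}

/// The borrowing expression must carry `'a` in its return type; returning
/// plain `Arc<dyn PhysicalExpr>` here would not compile.
pub fn make_borrowed<'a>(config: &'a ConfigOptions) -> Arc<dyn PhysicalExpr + 'a> {
    Arc::new(BorrowedExpr { config })
}
```

In a codebase where plans and expressions are stored as `Arc<dyn Trait>` values, the `+ 'a` bound would need to spread to each of those signatures, which may be part of what made the borrowed approach hard to land.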

As much as I want this feature, I'm going to put it aside for now.


Successfully merging this pull request may close these issues.

Add SessionConfig reference to ScalarFunctionArgs
2 participants