Add ScalarValue::try_as_str to get str value from logical strings #14167

Merged
merged 1 commit into from
Jan 18, 2025

@alamb alamb commented Jan 17, 2025

Which issue does this PR close?

Rationale for this change

See #14166

TL;DR: I don't want to have to remember to check every variant of ScalarValue that can contain a string.

What changes are included in this PR?

  1. Add ScalarValue::try_as_str to get str value from logical strings
  2. Add docs/examples
  3. Update some of the code in DataFusion to use this new API
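As a rough illustration of item 1, here is a minimal sketch of the shape of such an accessor. It uses a cut-down stand-in enum rather than DataFusion's real `ScalarValue` (which has many more variants), so the variant list and the method body are assumptions for illustration; only the signature `try_as_str(&self) -> Option<Option<&str>>` comes from this PR.

```rust
/// Simplified stand-in for DataFusion's `ScalarValue` (assumption: the real
/// enum has many more variants and this is not its actual implementation).
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Int64(Option<i64>),
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Utf8View(Option<String>),
}

impl ScalarValue {
    /// Outer `None`: the value is not a logical string.
    /// `Some(None)`: the value is a string type holding NULL.
    /// `Some(Some(s))`: the value is a non-NULL string.
    pub fn try_as_str(&self) -> Option<Option<&str>> {
        match self {
            // Every string-carrying variant is listed once, here, so callers
            // do not have to remember them.
            ScalarValue::Utf8(v)
            | ScalarValue::LargeUtf8(v)
            | ScalarValue::Utf8View(v) => Some(v.as_deref()),
            _ => None,
        }
    }
}
```

Callers that do not care about the difference between "not a string" and "NULL string" can collapse the two with `.flatten()`.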

Are these changes tested?

yes, by doc tests and examples

Are there any user-facing changes?

there is a new API but all existing APIs still work

@github-actions github-actions bot added the physical-expr (Physical Expressions), optimizer (Optimizer rules), core (Core DataFusion crate), common (Related to common crate), and functions labels Jan 17, 2025
@@ -2849,6 +2849,50 @@ impl ScalarValue {
ScalarValue::from(value).cast_to(target_type)
}

/// Returns the Some(`&str`) representation of `ScalarValue` of logical string type
Contributor Author

this is the new function

ScalarValue::Utf8(Some(delimiter))
| ScalarValue::LargeUtf8(Some(delimiter)) => {
Ok(Box::new(StringAggAccumulator::new(delimiter.as_str())))
return match lit.value().try_as_str() {
Contributor Author

This is a pretty good example of how the new API can reduce the repetition in code that checks for string literal values. It also now implicitly works for Dictionary values, where it would not have before.
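The Dictionary point can be sketched as follows. The enum is a simplified stand-in (the real `Dictionary` variant also carries the key type), and `delimiter_old` / `delimiter_new` are hypothetical helper names used only to contrast the two styles:

```rust
/// Simplified stand-in for DataFusion's `ScalarValue` (assumption, not the
/// real definition).
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Boolean(Option<bool>),
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    /// Simplified: wraps the dictionary's value only.
    Dictionary(Box<ScalarValue>),
}

impl ScalarValue {
    pub fn try_as_str(&self) -> Option<Option<&str>> {
        match self {
            ScalarValue::Utf8(v) | ScalarValue::LargeUtf8(v) => Some(v.as_deref()),
            // Look through the dictionary wrapper to the underlying value.
            ScalarValue::Dictionary(inner) => inner.try_as_str(),
            _ => None,
        }
    }
}

/// Old style (hypothetical helper): only the variants spelled out match,
/// so a dictionary-encoded string is rejected.
pub fn delimiter_old(lit: &ScalarValue) -> Option<&str> {
    match lit {
        ScalarValue::Utf8(Some(d)) | ScalarValue::LargeUtf8(Some(d)) => Some(d.as_str()),
        _ => None,
    }
}

/// New style (hypothetical helper): one call covers every logical string.
pub fn delimiter_new(lit: &ScalarValue) -> Option<&str> {
    lit.try_as_str().flatten()
}
```

With these definitions, a dictionary-wrapped `","` is rejected by `delimiter_old` but accepted by `delimiter_new`.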

| ScalarValue::Utf8(Some(method))
| ScalarValue::LargeUtf8(Some(method)) => method.parse::<DigestAlgorithm>(),
other => exec_err!("Unsupported data type {other:?} for function digest"),
ColumnarValue::Scalar(scalar) => match scalar.try_as_str() {
Contributor Author

We could also avoid a bunch more duplication of stuff like this:

        let part = if let ColumnarValue::Scalar(ScalarValue::Utf8(Some(v))) = part {

If we added a similar convenience method, ColumnarValue::try_as_scalar_str(), that returned an Option<Option<&str>>

Similarly we could do the same with Expr::try_as_scalar_str()
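A sketch of what the proposed `ColumnarValue::try_as_scalar_str()` might look like. This method is only suggested in the comment above and is not part of this PR; both enums below are simplified stand-ins, and the `Array` payload in particular is a placeholder for an Arrow array.

```rust
/// Simplified stand-in for DataFusion's `ScalarValue` (assumption).
pub enum ScalarValue {
    Int64(Option<i64>),
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
}

/// Simplified stand-in for `ColumnarValue` (assumption): the real `Array`
/// variant holds an Arrow array, not a `Vec`.
pub enum ColumnarValue {
    Array(Vec<Option<String>>),
    Scalar(ScalarValue),
}

impl ScalarValue {
    pub fn try_as_str(&self) -> Option<Option<&str>> {
        match self {
            ScalarValue::Utf8(v) | ScalarValue::LargeUtf8(v) => Some(v.as_deref()),
            _ => None,
        }
    }
}

impl ColumnarValue {
    /// Hypothetical helper: `None` for arrays and non-string scalars,
    /// `Some(None)` for a NULL string scalar, `Some(Some(s))` otherwise.
    pub fn try_as_scalar_str(&self) -> Option<Option<&str>> {
        match self {
            ColumnarValue::Scalar(scalar) => scalar.try_as_str(),
            ColumnarValue::Array(_) => None,
        }
    }
}
```

Under these assumptions, the `if let ColumnarValue::Scalar(ScalarValue::Utf8(Some(v))) = part` pattern quoted above could become `if let Some(Some(v)) = part.try_as_scalar_str()`, which would also accept `LargeUtf8`.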

| ScalarValue::LargeUtf8(a)
| ScalarValue::Utf8(a) => {
ColumnarValue::Scalar(scalar) => match scalar.try_as_str() {
Some(a) => {
Contributor Author

I think this is clearer now

@alamb alamb marked this pull request as ready for review January 17, 2025 13:46
@@ -2849,6 +2849,50 @@ impl ScalarValue {
ScalarValue::from(value).cast_to(target_type)
}

/// Returns the Some(`&str`) representation of `ScalarValue` of logical string type
Member

💯 doc

Contributor

@wiedld wiedld left a comment

Nice. ❤️

/// let scalar = ScalarValue::from("hello");
/// assert_eq!(scalar.try_as_str().flatten(), Some("hello"));
/// ```
pub fn try_as_str(&self) -> Option<Option<&str>> {
Contributor

Why not -> Result<Option<&str>> for this try method?
Caller can always convert to an option.

(Also, most of the use cases in this PR are converting a returned None to an error).

Contributor Author

Thank you for the review @wiedld

DataFusionError always has an owned String in it, so returning a Result is actually quite slow, as it needs to allocate some memory and copy stuff around. Thus I think this API should return an Option.
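To make the trade-off concrete, here is a minimal sketch of a caller converting the `Option` into an error itself. The error and scalar types are simplified stand-ins, and `digest_method` is a hypothetical caller; the point is that the `String` inside the error is only built, lazily via `ok_or_else`, when the caller actually wants an error.

```rust
/// Simplified stand-in for `DataFusionError` (assumption): like the real
/// type, the error owns a heap-allocated `String` message.
#[derive(Debug, PartialEq)]
pub enum DataFusionError {
    Execution(String),
}

/// Simplified stand-in for DataFusion's `ScalarValue` (assumption).
#[derive(Debug)]
pub enum ScalarValue {
    Int64(Option<i64>),
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
}

impl ScalarValue {
    /// Returning `Option` means the non-string path allocates nothing.
    pub fn try_as_str(&self) -> Option<Option<&str>> {
        match self {
            ScalarValue::Utf8(v) | ScalarValue::LargeUtf8(v) => Some(v.as_deref()),
            _ => None,
        }
    }
}

/// Hypothetical caller that does want an error: the message is formatted
/// only inside the `ok_or_else` closure, i.e. only on the failure path.
pub fn digest_method(scalar: &ScalarValue) -> Result<&str, DataFusionError> {
    scalar.try_as_str().flatten().ok_or_else(|| {
        DataFusionError::Execution(format!(
            "Unsupported data type {:?} for function digest",
            scalar
        ))
    })
}
```

Callers that only want to probe for a string (for example, an optimizer rule that falls through to another case) can use `try_as_str()` directly and never pay for an error message.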

@alamb alamb merged commit 5d18648 into apache:main Jan 18, 2025
25 checks passed
Contributor Author

alamb commented Jan 18, 2025

Thanks @wiedld and @xudong963 !

@alamb alamb deleted the alamb/scalar_value_as_str branch January 18, 2025 13:21
Development

Successfully merging this pull request may close these issues.

Add a way to access logical String ScalarValues as &str
3 participants