Implement Spark-compatible CAST from String to Decimal #325
Comments
Hi, I'd like to contribute to this! |
Thanks @kevinmingtarja. You can take a look at @andygrove's PR as a reference #307 |
Current state for reference:
Note: I encountered a |
Hi @kevinmingtarja, are you working on this issue? If not, I would like to work on it. Thank you. |
Hey, I don't think I have the bandwidth right now to complete this, so please feel free to work on it. I have made some progress on a branch in my fork, so feel free to take inspiration from there if needed! |
Hi @sujithjay, let me know if you are still working on this. I am happy to continue #615 and add you as a co-author. Thank you |
I was looking at this issue, and the result is now the same with Comet enabled and with Comet disabled:
+-----+-----------+
|    n|  converted|
+-----+-----------+
|-10.0|     -10.00|
| +1.0|       1.00|
|  .34|       0.34|
|  4e7|40000000.00|
|    1|       1.00|
|    0|       0.00|
|     |       null|
+-----+-----------+
scala> spark.conf.set("spark.comet.enabled", false)
scala> df2.show
+-----+-----------+
|    n|  converted|
+-----+-----------+
|-10.0|     -10.00|
| +1.0|       1.00|
|  .34|       0.34|
|  4e7|40000000.00|
|    1|       1.00|
|    0|       0.00|
|     |       null|
+-----+-----------+
But even with Comet enabled, the cast does not go through DataFusion, because it is marked as incompatible. I then checked this in the DataFusion project. The following query works fine:
query TR
select arrow_typeof(cast(4e7 as decimal(10,2))), cast(4e7 as decimal(10,2));
----
Decimal128(10, 2) 40000000
But if I pass the string '4e7' instead, as shown below, it throws an Arrow error:
External error: query failed: DataFusion error: Arrow error: Cast error: Cannot cast string '4e7' to value of Decimal128(38, 10) type
[SQL] select arrow_typeof(cast('4e7' as decimal(10,2))), cast('4e7' as decimal(10,2));
I think that if we add support for strings in this format to the string-to-decimal cast in arrow-rs, the above error would be resolved and this issue would be addressed as well. A previous attempt to fix this issue was made in the DataFusion project. Can someone please help me understand whether the fix can be applied in arrow-rs so that the cast can be used from there? |
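To make the requested behavior concrete, here is a minimal sketch of how a string with an optional exponent could be parsed into an unscaled integer at a target precision and scale, using integer arithmetic only. It is written in Python for illustration and is not the arrow-rs implementation. The function name `parse_decimal_string`, the accepted grammar, half-up rounding, and returning `None` on overflow are all assumptions.

```python
import re
from typing import Optional

# Assumed grammar: [sign] digits [. digits] or [sign] . digits, then an optional exponent.
_DECIMAL_RE = re.compile(
    r"([+-]?)(?:([0-9]+)\.?([0-9]*)|\.([0-9]+))(?:[eE]([+-]?[0-9]+))?"
)


def parse_decimal_string(text: str, precision: int, scale: int) -> Optional[int]:
    """Hypothetical helper: return the unscaled integer for `text` at
    (precision, scale), or None when the input is malformed or does not fit."""
    m = _DECIMAL_RE.fullmatch(text.strip())
    if m is None:
        return None  # "", ".", "-", "+" and other malformed input
    sign, int_part, frac_a, frac_b, exp = m.groups()
    frac = frac_a if int_part is not None else frac_b
    digits = ((int_part or "") + frac).lstrip("0")
    if not digits:
        return 0  # every digit was zero
    # The value is int(digits) * 10 ** (exponent - len(frac)); adding `scale`
    # gives the power of ten that moves it to the target scale.
    shift = int(exp or 0) - len(frac) + scale
    if shift > precision:
        return None  # far too large to fit; avoids building a huge power of ten
    if -shift > len(digits):
        return 0  # magnitude is below half a unit at the target scale
    magnitude = int(digits)
    if shift >= 0:
        magnitude *= 10 ** shift
    else:
        divisor = 10 ** -shift
        magnitude = (magnitude + divisor // 2) // divisor  # round half up (assumed)
    if magnitude >= 10 ** precision:
        return None  # overflow treated as null (assumed non-ANSI behavior)
    return -magnitude if sign == "-" else magnitude
```

With this sketch, `parse_decimal_string("4e7", 10, 2)` yields the unscaled value `4000000000`, which corresponds to `40000000.00`, while the empty string yields `None`.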
It makes sense to me for arrow-rs to support casting |
take, this arrow-pr should fix one. |
What is the problem the feature request solves?
We currently delegate to DataFusion when casting from string to decimal, and there are some differences in behavior compared to Spark.
- `4e7` produces `40000000.00` in Spark, and `null` in DataFusion
- `.`, `-`, `+`, and empty string produce `null` in Spark, and `0.0` in DataFusion
- `0` produces `0` in Spark, and `null` in DataFusion
- DataFusion rejects negative scale (`Cannot cast string to decimal with negative scale`). We could choose to fall back to Spark for this use case (or if `SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED` is enabled)
Describe the potential solution
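As a starting point, the Spark results listed above can be written down as a small oracle for checking any candidate implementation. This is a sketch built on Python's `decimal` module, not Spark's or Comet's code. The name `expected_cast`, the precision and scale values in the usage lines, half-up rounding, and null on overflow are assumptions. Python's `Decimal` also accepts some inputs (such as digit-grouping underscores) that Spark may reject.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP, localcontext
from typing import Optional


def expected_cast(text: str, precision: int, scale: int) -> Optional[Decimal]:
    """Hypothetical oracle: the decimal a non-ANSI cast is assumed to return,
    or None where the list above expects null."""
    with localcontext() as ctx:
        ctx.prec = 100  # generous working precision for this sketch
        try:
            value = Decimal(text.strip())
            if not value.is_finite():
                return None  # Python parses "NaN" and "Infinity"; treat them as null
            # Quantize to 10 ** -scale, rounding half up (assumed rounding mode).
            rounded = value.quantize(Decimal((0, (1,), -scale)), rounding=ROUND_HALF_UP)
        except InvalidOperation:
            return None  # "", ".", "-", "+", or a value too large to quantize
    if len(rounded.as_tuple().digits) > precision:
        return None  # does not fit the declared precision
    return rounded


# Cases from the list, with assumed target types.
print(expected_cast("4e7", 10, 2))  # 40000000.00
print(expected_cast("0", 10, 0))    # 0
print([expected_cast(s, 10, 2) for s in ["", ".", "-", "+"]])  # [None, None, None, None]
```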
No response
Additional context
I used the following test in `CometCastSuite` to explore this.