-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-6612] Fix ParquetFileFormat issue caused by the setting of local property isNativeApplicable #6627
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
@@ -210,14 +210,10 @@ class ParquetFileFormat extends FileFormat with DataSourceRegister with Logging | |||
|
|||
/** Returns whether the reader will return the rows as batch or not. */ | |||
override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = { | |||
if ("true".equals(sparkSession.sparkContext.getLocalProperty("isNativeAppliable"))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JkSelf, do you know why this check is added in the main code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PHILO-HE No need to add this check here. It is only used in parquet reader not writer.
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
Run Gluten Clickhouse CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still a little bit concerned about the use of these local properties. Perhaps we can manage to remove them later some time.
@@ -83,7 +83,7 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable | |||
options: Map[String, String], | |||
files: Seq[FileStatus]): Option[StructType] = { | |||
// Why if (false)? Such code requires comments when being written. | |||
if ("true".equals(sparkSession.sparkContext.getLocalProperty("isNativeAppliable")) && false) { | |||
if ("true" == sparkSession.sparkContext.getLocalProperty("isNativeApplicable") && false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more straight forward to use sparkSession.sparkContext.getLocalProperty("isNativeApplicable").toBoolean
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhztheplayer, the property can be null and needs extra check if we use that. Maybe, just keep the current change.
I am still a little bit concerned about the use of these local properties. Perhaps we can manage to remove them later some time.
Yes, it's not a good approach to use local property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use true == xxx? why do we use string "true", is it from gluten config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FelixYBW, it's the raw value we obtained for this property.
The property value is kept by spark context and it is set by Gluten in GlutenWriterColumnarRules
.
The implementation for |
Let's add comments in code once you get it. We should avoid such use. |
…cal property isNativeApplicable (apache#6627)
* [GLUTEN-6612] Fix ParquetFileFormat issue caused by the setting of local property isNativeApplicable (#6627) * [CORE] Fix schema mismatch between ReadRelNode and LocalFilesNode (#6746) Co-authored-by: 蒋添 <[email protected]> * [UT] Test input_file_name, input_file_block_start & input_file_block_length when scan falls back (#6318) * [VL] Fix E function fallback issue (#6397) * [VL] Add Scala 2.13 support (#6326) * [VL] Add Scala 2.13 support * Fix scalaStyle issues * Fix Scala Style issues * Add Spark 3.5.1 and Scala 2.13 test in workflow * Add run-spark-test-spark35-scala213 job * Add Spark 3.5.1 and Scala 2.13 test in workflow * Fix tests failures * Fix tests failures * ScalaStyle fix * Fix SoftAffinitySuite * Fix ArrowUtil error * Fix backend-velox scala issues * Fix ColumnarArrowEvalPythonExec issues * Fix ColumnarArrowEvalPythonExec issues * Fix TestOperator.scala for style issues * Fix TestOperator.scala for style issues * Fix issues in DeltaRewriteTransformerRules.scala * DeltaRewriteTransformerRules fix * Fix style issues * Fix issues * Fix issues * Fix issues * Fix issues * Fix issues * Fix issues * Fix issues --------- Co-authored-by: Hongze Zhang <[email protected]> * [VL] Fix Alinux3 arrow build issue (#6363) * update velox docker and port PR #6363 for get_velox.sh update --------- Co-authored-by: PHILO-HE <[email protected]> Co-authored-by: jiangjiangtian <[email protected]> Co-authored-by: 蒋添 <[email protected]> Co-authored-by: 高阳阳 <[email protected]> Co-authored-by: Preetesh2110 <[email protected]> Co-authored-by: Hongze Zhang <[email protected]> Co-authored-by: Joey <[email protected]>
What changes were proposed in this pull request?
supportBatch
method.supportBatch
is only called in read path, which shouldn't be impacted by that property used for native write.Fix #6612, which shows inconsistent query behavior caused by historical property setting.
How was this patch tested?
CI/local test.