Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-6612] Fix ParquetFileFormat issue caused by the setting of local property isNativeApplicable #6627

Merged
merged 5 commits into from
Jul 31, 2024

Conversation

PHILO-HE
Copy link
Contributor

@PHILO-HE PHILO-HE commented Jul 30, 2024

What changes were proposed in this pull request?

  • Fix typo: isNativeAppliable -> isNativeApplicable.
  • Use == for null-safe comparing the value of local properties.
  • Remove the check of this local property in supportBatch method. supportBatch is only called in read path, which shouldn't be impacted by that property used for native write.
  • Reset local properties used for native write before next setting.

Fix #6612, which shows inconsistent query behavior caused by historical property setting.

How was this patch tested?

CI/local test.

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Copy link

Run Gluten Clickhouse CI

@@ -210,14 +210,10 @@ class ParquetFileFormat extends FileFormat with DataSourceRegister with Logging

/** Returns whether the reader will return the rows as batch or not. */
override def supportBatch(sparkSession: SparkSession, schema: StructType): Boolean = {
if ("true".equals(sparkSession.sparkContext.getLocalProperty("isNativeAppliable"))) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JkSelf, do you know why this check is added in the main code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PHILO-HE No need to add this check here. It is only used in parquet reader not writer.

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

Copy link

Run Gluten Clickhouse CI

@PHILO-HE PHILO-HE changed the title Fix ParquetFileFormat issue caused by the setting of local property isNativeApplicable [GLUTEN-6612] Fix ParquetFileFormat issue caused by the setting of local property isNativeApplicable Jul 30, 2024
Copy link

#6612

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still a little bit concerned about the use of these local properties. Perhaps we can manage to remove them later some time.

@@ -83,7 +83,7 @@ class OrcFileFormat extends FileFormat with DataSourceRegister with Serializable
options: Map[String, String],
files: Seq[FileStatus]): Option[StructType] = {
// Why if (false)? Such code requires comments when being written.
if ("true".equals(sparkSession.sparkContext.getLocalProperty("isNativeAppliable")) && false) {
if ("true" == sparkSession.sparkContext.getLocalProperty("isNativeApplicable") && false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more straight forward to use sparkSession.sparkContext.getLocalProperty("isNativeApplicable").toBoolean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhztheplayer, the property can be null and needs extra check if we use that. Maybe, just keep the current change.

I am still a little bit concerned about the use of these local properties. Perhaps we can manage to remove them later some time.

Yes, it's not a good approach to use local property.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use true == xxx? why do we use string "true", is it from gluten config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FelixYBW, it's the raw value we obtained for this property.
The property value is kept by spark context and it is set by Gluten in GlutenWriterColumnarRules.

@PHILO-HE
Copy link
Contributor Author

The implementation for inferSchema has code like if (xxx && false). I will confirm with this code's author to see whether we can remove it in another pr.

@PHILO-HE PHILO-HE merged commit decd8b4 into apache:main Jul 31, 2024
46 checks passed
@FelixYBW
Copy link
Contributor

The implementation for inferSchema has code like if (xxx && false). I will confirm with this code's author to see whether we can remove it in another pr.

Let's add comments in code once you get it. We should avoid such use.

weiting-chen pushed a commit to weiting-chen/gluten that referenced this pull request Aug 10, 2024
weiting-chen added a commit that referenced this pull request Aug 12, 2024
* [GLUTEN-6612] Fix ParquetFileFormat issue caused by the setting of local property isNativeApplicable (#6627)

* [CORE] Fix schema mismatch between ReadRelNode and LocalFilesNode (#6746)

Co-authored-by: 蒋添 <[email protected]>

* [UT] Test input_file_name, input_file_block_start & input_file_block_length when scan falls back (#6318)

* [VL] Fix E function fallback issue (#6397)

* [VL] Add Scala 2.13 support (#6326)

* [VL] Add Scala 2.13 support

* Fix scalaStyle issues

* Fix Scala Style issues

* Add Spark 3.5.1 and Scala 2.13 test in workflow

* Add run-spark-test-spark35-scala213 job

* Add Spark 3.5.1 and Scala 2.13 test in workflow

* Fix tests failures

* Fix tests failures

* ScalaStyle fix

* Fix SoftAffinitySuite

* Fix ArrowUtil error

* Fix backend-velox scala issues

* Fix ColumnarArrowEvalPythonExec issues

* Fix ColumnarArrowEvalPythonExec issues

* Fix TestOperator.scala for style issues

* Fix TestOperator.scala for style issues

* Fix issues in DeltaRewriteTransformerRules.scala

* DeltaRewriteTransformerRules fix

* Fix style issues

* Fix issues

* Fix issues

* Fix issues

* Fix issues

* Fix issues

* Fix issues

* Fix issues

---------

Co-authored-by: Hongze Zhang <[email protected]>

* [VL] Fix Alinux3 arrow build issue (#6363)

* update velox docker and port PR #6363 for get_velox.sh update

---------

Co-authored-by: PHILO-HE <[email protected]>
Co-authored-by: jiangjiangtian <[email protected]>
Co-authored-by: 蒋添 <[email protected]>
Co-authored-by: 高阳阳 <[email protected]>
Co-authored-by: Preetesh2110 <[email protected]>
Co-authored-by: Hongze Zhang <[email protected]>
Co-authored-by: Joey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VL] table scan fallback gets wrong query plan
4 participants