[SPARK-34578][SQL][TESTS][TEST-MAVEN] Refactor ORC encryption tests and ignore ORC shim loaded by old Hadoop library

### What changes were proposed in this pull request?

1. This PR aims to ignore the ORC encryption tests when the ORC shim has already been loaded with an old Hadoop library by some other test running earlier in the same JVM (the runtime guard is sketched right after this list). Test coverage is preserved by the Jenkins SBT runs and the GitHub Actions jobs; this PR only aims to recover the Jenkins Maven jobs.
2. In addition, this PR simplifies SBT testing by moving the test configuration into `SparkBuild.scala`/`pom.xml` and removing the `DedicatedJVMTest` tag. This also removes the GitHub Actions job that was recently added for the `DedicatedJVMTest` tag.
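
Condensed from the suite diff at the bottom of this page, the runtime guard looks like this (ScalaTest's `assume` cancels the test instead of failing it when the condition is false):

```scala
// Probe the KeyProvider exposed by whichever ORC/Hadoop shim is already loaded
// in this JVM, and cancel the test when it has no test keys (i.e. an old-Hadoop
// NullKeyProvider leaked in from an earlier suite).
val conf = spark.sessionState.newHadoopConf()
val provider = HadoopShimsFactory.get.getHadoopKeyProvider(conf, new Random)
assume(!provider.getKeyNames.isEmpty,
  s"$provider doesn't have the test keys. ORC shim is created with old Hadoop libraries")
```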

### Why are the changes needed?

Currently, the Maven test fails when it runs in batch mode because `HadoopShimsPre2_3$NullKeyProvider` has already been loaded by an earlier test in the same JVM.

**MVN COMMAND**
```
$ mvn test -pl sql/core -am -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.orc.OrcV1QuerySuite,org.apache.spark.sql.execution.datasources.orc.OrcEncryptionSuite
```

**BEFORE**
```
- Write and read an encrypted table *** FAILED ***
...
  Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1) (localhost executor driver): java.lang.IllegalArgumentException: Unknown key pii
	at org.apache.orc.impl.HadoopShimsPre2_3$NullKeyProvider.getCurrentKeyVersion(HadoopShimsPre2_3.java:71)
	at org.apache.orc.impl.WriterImpl.getKey(WriterImpl.java:871)
```

**AFTER**
```
OrcV1QuerySuite
...
OrcEncryptionSuite:
- Write and read an encrypted file !!! CANCELED !!!
  [] was empty org.apache.orc.impl.HadoopShimsPre2_3$NullKeyProvider1b705f65 doesn't have the test keys. ORC shim is created with old Hadoop libraries (OrcEncryptionSuite.scala:39)
- Write and read an encrypted table !!! CANCELED !!!
  [] was empty org.apache.orc.impl.HadoopShimsPre2_3$NullKeyProvider22adeee1 doesn't have the test keys. ORC shim is created with old Hadoop libraries (OrcEncryptionSuite.scala:67)
```
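
For readers unfamiliar with the `!!! CANCELED !!!` marker: it comes from ScalaTest's `assume`, which throws `TestCanceledException` rather than a failure, so an environment-dependent precondition skips the test body without breaking the build. A minimal standalone illustration (not Spark code; Spark's actual suites extend `SparkFunSuite`):

```scala
import org.scalatest.funsuite.AnyFunSuite

class AssumeExampleSuite extends AnyFunSuite {
  test("runs only when test keys are available") {
    val keyNames = Seq.empty[String] // stand-in for provider.getKeyNames
    // When the condition is false, the test is reported as !!! CANCELED !!!
    // with this clue, instead of *** FAILED ***.
    assume(keyNames.nonEmpty, "no test keys available")
    // real assertions would go here
  }
}
```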

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the Jenkins Maven tests.

For the SBT command:
- Before: the test suite required a dedicated JVM.
- After: the test suite doesn't require a dedicated JVM.
```
$ build/sbt "sql/testOnly *.OrcV1QuerySuite *.OrcEncryptionSuite"
...
[info] OrcV1QuerySuite
...
[info] - SPARK-20728 Make ORCFileFormat configurable between sql/hive and sql/core (26 milliseconds)
[info] OrcEncryptionSuite:
[info] - Write and read an encrypted file (431 milliseconds)
[info] - Write and read an encrypted table (359 milliseconds)
[info] All tests passed.
[info] Passed: Total 35, Failed 0, Errors 0, Passed 35
```

Closes apache#31697 from dongjoon-hyun/SPARK-34578-TEST.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
2 people authored and HyukjinKwon committed Mar 2, 2021
1 parent ce13dcc commit 4818847
Showing 5 changed files with 18 additions and 41 deletions.
8 changes: 1 addition & 7 deletions .github/workflows/build_and_test.yml
```diff
@@ -61,12 +61,6 @@ jobs:
         excluded-tags: org.apache.spark.tags.SlowHiveTest
         comment: "- other tests"
       # SQL tests
-      - modules: sql
-        java: 8
-        hadoop: hadoop3.2
-        hive: hive2.3
-        included-tags: org.apache.spark.tags.DedicatedJVMTest
-        comment: "- dedicated JVM tests"
       - modules: sql
         java: 8
         hadoop: hadoop3.2
@@ -77,7 +71,7 @@
         java: 8
         hadoop: hadoop3.2
         hive: hive2.3
-        excluded-tags: org.apache.spark.tags.DedicatedJVMTest,org.apache.spark.tags.ExtendedSQLTest
+        excluded-tags: org.apache.spark.tags.ExtendedSQLTest
         comment: "- other tests"
     env:
       MODULES_TO_TEST: ${{ matrix.modules }}
```

common/tags/src/test/java/org/apache/spark/tags/DedicatedJVMTest.java
This file was deleted (the `DedicatedJVMTest` tag removed by this PR).

1 change: 1 addition & 0 deletions pom.xml
```diff
@@ -2647,6 +2647,7 @@
             <spark.ui.showConsoleProgress>false</spark.ui.showConsoleProgress>
             <spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak>
             <spark.memory.debugFill>true</spark.memory.debugFill>
+            <spark.hadoop.hadoop.security.key.provider.path>test:///</spark.hadoop.hadoop.security.key.provider.path>
             <!-- Needed by sql/hive tests. -->
             <test.src.tables>src</test.src.tables>
           </systemProperties>
```
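
For context on why a JVM-wide system property works here: Spark copies every configuration entry whose key starts with `spark.hadoop.` into the Hadoop `Configuration` with the prefix stripped, which is how the value above reaches the ORC shim. A small sketch of that propagation, assuming a local session (`HadoopConfPropagation` is an illustrative name):

```scala
import org.apache.spark.sql.SparkSession

object HadoopConfPropagation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("spark.hadoop.* propagation")
      // Same property the PR sets via -D... for the Maven/SBT test JVMs:
      .config("spark.hadoop.hadoop.security.key.provider.path", "test:///")
      .getOrCreate()
    // The "spark.hadoop." prefix is stripped; the remainder lands in the Hadoop conf.
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    println(hadoopConf.get("hadoop.security.key.provider.path")) // prints test:///
    spark.stop()
  }
}
```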
2 changes: 1 addition & 1 deletion project/SparkBuild.scala
```diff
@@ -489,7 +489,6 @@ object SparkParallelTestGrouping {
     "org.apache.spark.sql.catalyst.expressions.HashExpressionsSuite",
     "org.apache.spark.sql.catalyst.expressions.CastSuite",
     "org.apache.spark.sql.catalyst.expressions.MathExpressionsSuite",
-    "org.apache.spark.sql.execution.datasources.orc.OrcEncryptionSuite",
     "org.apache.spark.sql.hive.HiveExternalCatalogSuite",
     "org.apache.spark.sql.hive.StatisticsSuite",
     "org.apache.spark.sql.hive.client.VersionsSuite",
@@ -1061,6 +1060,7 @@ object TestSettings {
     javaOptions in Test += "-Dspark.ui.enabled=false",
     javaOptions in Test += "-Dspark.ui.showConsoleProgress=false",
     javaOptions in Test += "-Dspark.unsafe.exceptionOnMemoryLeak=true",
+    javaOptions in Test += "-Dspark.hadoop.hadoop.security.key.provider.path=test:///",
     javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=false",
     javaOptions in Test += "-Dderby.system.durability=test",
     javaOptions in Test += "-Dio.netty.tryReflectionSetAccessible=true",
```
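
Background on the first hunk: suites listed in `SparkParallelTestGrouping` each run in their own forked JVM under SBT; with the key-provider property now set for every test JVM, `OrcEncryptionSuite` no longer needs a dedicated fork. A hedged sketch of the underlying sbt mechanism (`Tests.Group` with a `SubProcess` run policy; `groupTests` and `dedicated` are illustrative names, not Spark's exact helpers):

```scala
import sbt._
import sbt.Keys._

// Partition the discovered tests: suites named in `dedicated` each get their
// own forked JVM; everything else shares a single forked JVM.
def groupTests(tests: Seq[TestDefinition], dedicated: Set[String]): Seq[Tests.Group] = {
  val (alone, shared) = tests.partition(t => dedicated.contains(t.name))
  Tests.Group("shared", shared, Tests.SubProcess(ForkOptions())) +:
    alone.map(t => Tests.Group(t.name, Seq(t), Tests.SubProcess(ForkOptions())))
}

// Wired up roughly as:
// testGrouping in Test := groupTests((definedTests in Test).value, dedicated)
```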
18 changes: 15 additions & 3 deletions sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala
```diff
@@ -17,11 +17,13 @@
  */

 package org.apache.spark.sql.execution.datasources.orc

+import java.util.Random
+
+import org.apache.orc.impl.HadoopShimsFactory
+
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.SharedSparkSession
-import org.apache.spark.tags.DedicatedJVMTest

-@DedicatedJVMTest
 class OrcEncryptionSuite extends OrcTest with SharedSparkSession {
   import testImplicits._

@@ -30,6 +32,11 @@ class OrcEncryptionSuite extends OrcTest with SharedSparkSession {
     Row(null, "841626795E7D351555B835A002E3BF10669DE9B81C95A3D59E10865AC37EA7C3", "Dongjoon Hyun")

   test("Write and read an encrypted file") {
+    val conf = spark.sessionState.newHadoopConf()
+    val provider = HadoopShimsFactory.get.getHadoopKeyProvider(conf, new Random)
+    assume(!provider.getKeyNames.isEmpty,
+      s"$provider doesn't have the test keys. ORC shim is created with old Hadoop libraries")
+
     val df = originalData.toDF("ssn", "email", "name")

     withTempPath { dir =>
@@ -53,9 +60,14 @@ class OrcEncryptionSuite extends OrcTest with SharedSparkSession {
   }

   test("Write and read an encrypted table") {
+    val conf = spark.sessionState.newHadoopConf()
+    val provider = HadoopShimsFactory.get.getHadoopKeyProvider(conf, new Random)
+    assume(!provider.getKeyNames.isEmpty,
+      s"$provider doesn't have the test keys. ORC shim is created with old Hadoop libraries")
+
     val df = originalData.toDF("ssn", "email", "name")

-    withTempPath { dir =>
+    withTempDir { dir =>
       val path = dir.getAbsolutePath
       withTable("encrypted") {
         sql(
```
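
As a usage note, the guard's probe can also be run outside ScalaTest. A hedged standalone sketch using the same ORC shim API as the diff above (`HadoopShimsFactory.get.getHadoopKeyProvider`); `ProbeOrcKeyProvider` is an illustrative name, and whether `test:///` yields in-memory test keys depends on the ORC/Hadoop versions on the classpath:

```scala
import java.util.Random

import org.apache.hadoop.conf.Configuration
import org.apache.orc.impl.HadoopShimsFactory

object ProbeOrcKeyProvider {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // The same setting this PR adds to pom.xml and SparkBuild.scala:
    conf.set("hadoop.security.key.provider.path", "test:///")
    val provider = HadoopShimsFactory.get.getHadoopKeyProvider(conf, new Random)
    // With current Hadoop shims this should list the test keys; with the
    // pre-2.3 shim the NullKeyProvider reports no keys, as in the logs above.
    println(s"$provider -> keys: ${provider.getKeyNames}")
  }
}
```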
