When trying to read Parquet files using Gluten with the Velox backend, I get the error:
Reason: Failed to get S3 object due to: 'Network connection'. Path:'s3://<bucket>/bmorck/tpch_parquet/lineitem.parquet', SDK Error Type:99, HTTP Status Code:-1, S3 Service:'Unknown', Message:'curlCode: 77, Problem with the SSL CA cert (path? access rights?)', RequestID:''.
We are able to read the same files properly without Gluten. We are using an instance profile directly on EC2.
I've seen similar threads suggest modifying the spark.hadoop.fs.s3a confs, but that doesn't seem to fix the issue. Any idea what might be going on?
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Failed to get S3 object due to: 'Network connection'. Path:'s3://<bucket>/bmorck/tpch_parquet/lineitem.parquet', SDK Error Type:99, HTTP Status Code:-1, S3 Service:'Unknown', Message:'curlCode: 77, Problem with the SSL CA cert (path? access rights?)', RequestID:''.
Retriable: False
Context: Split [Hive: s3a://<bucket>/bmorck/tpch_parquet/lineitem.parquet 15837691904 - 268435456] Task Gluten_Stage_8_TID_240_VTID_4
Additional Context: Operator: TableScan[0] 0
Function: preadInternal
File: /root/src/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp
Line: 184
Stack trace:
# 0 # 1 # 2 # 3 # 4 # 5 # 6 # 7 # 8 # 9 # 10 # 11 # 12 # 13 # 14 # 15 # 16 # 17 # 18 # 19 # 20 # 21 # 22
at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
at org.apache.gluten.utils.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
at org.apache.gluten.utils.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
at org.apache.gluten.utils.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
at org.apache.gluten.utils.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator.isEmpty(Iterator.scala:387)
at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:108)
at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:79)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:868)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:868)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:568)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1537)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:571)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
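For what it's worth, curl error 77 (CURLE_SSL_CACERT_BADFILE) in the message above usually means that libcurl, which the AWS C++ SDK inside the Velox backend uses for HTTPS, could not open the CA certificate bundle on the executor. The non-Gluten path goes through Hadoop's S3A client and the JVM truststore, which would explain why the same read works without Gluten. A minimal check/workaround sketch to run on each executor host or in the executor image, assuming a Debian/Ubuntu layout and that the SDK's curl build expects the RHEL-style bundle path (both paths are assumptions to adapt to your environment):

ls -l /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt   # see which CA bundle paths actually exist
apt-get install -y ca-certificates                                          # (re)install the system CA bundle if missing
mkdir -p /etc/pki/tls/certs
ln -sf /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt  # provide the path some curl/SDK builds expect

If the executors run in containers, the bundle has to exist inside the image, not just on the EC2 host.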
Backend
VL (Velox)
Bug description
See the description and error log above.
Spark version
Spark-3.3.x
Spark configurations
spark.hadoop.fs.s3a.aws.credentials.provider = "internal credential provider"
spark.hadoop.fs.s3a.endpoint = s3-external-1.amazonaws.com
spark.hadoop.fs.s3a.use.instance.credentials = true
spark.hadoop.fs.s3a.connection.ssl.enabled = true
spark.hadoop.fs.s3a.path.style.access = false
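One way to narrow this down (a diagnostic assumption on my part, not a production setting): if the endpoint is reachable over plain HTTP and Gluten forwards this flag to the native S3 client, temporarily disabling TLS should make the curl 77 error go away, which would confirm the CA bundle rather than credentials as the culprit:

spark.hadoop.fs.s3a.connection.ssl.enabled = false

Instance-profile request signing does not depend on TLS, so a successful plain-HTTP read would point squarely at the certificate setup on the executors.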
System information
No response
Relevant logs
See the error message and stack trace above.