
[VL] VeloxRuntimeError: Failed to get S3 object due to #8170

Open
bmorck opened this issue Dec 6, 2024 · 2 comments
Labels: bug, triage

Comments

bmorck commented Dec 6, 2024

Backend

VL (Velox)

Bug description


When trying to read Parquet files using Gluten with the Velox backend, I get the following error: Failed to get S3 object due to: 'Network connection'. Path:'s3://<bucket>/bmorck/tpch_parquet/lineitem.parquet', SDK Error Type:99, HTTP Status Code:-1, S3 Service:'Unknown', Message:'curlCode: 77, Problem with the SSL CA cert (path? access rights?)', RequestID:''.

We are able to read the same files without Gluten. We are using an instance profile directly on EC2.

I've seen similar threads suggest modifying the spark.hadoop.fs.s3a confs, but this doesn't fix the issue. Any idea what might be going on?

Spark version

Spark-3.3.x

Spark configurations

spark.hadoop.fs.s3a.aws.credentials.provider = "internal credential provider"
spark.hadoop.fs.s3a.endpoint = s3-external-1.amazonaws.com
spark.hadoop.fs.s3a.use.instance.credentials = true
spark.hadoop.fs.s3a.connection.ssl.enabled = true
spark.hadoop.fs.s3a.path.style.access = false
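
For reference, the failure is triggered by a plain Parquet scan. A minimal sketch, assuming a SparkSession that carries the confs above and has the Gluten Velox backend enabled (<bucket> is redacted in this report):

// Runs fine with vanilla Spark; under the Velox backend it fails in
// TableScan with the VeloxRuntimeError shown under "Relevant logs".
val lineitem = spark.read.parquet("s3a://<bucket>/bmorck/tpch_parquet/lineitem.parquet")
lineitem.show(5)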

System information

No response

Relevant logs

Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Failed to get S3 object due to: 'Network connection'. Path:'s3://<bucket>/bmorck/tpch_parquet/lineitem.parquet', SDK Error Type:99, HTTP Status Code:-1, S3 Service:'Unknown', Message:'curlCode: 77, Problem with the SSL CA cert (path? access rights?)', RequestID:''.
Retriable: False
Context: Split [Hive: s3a://<bucket>/bmorck/tpch_parquet/lineitem.parquet 15837691904 - 268435456] Task Gluten_Stage_8_TID_240_VTID_4
Additional Context: Operator: TableScan[0] 0
Function: preadInternal
File: /root/src/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp
Line: 184
Stack trace:
# 0 … # 22 (the native frames are empty in the report)

	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.utils.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
	at org.apache.gluten.utils.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
	at org.apache.gluten.utils.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
	at org.apache.gluten.utils.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator.isEmpty(Iterator.scala:387)
	at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
	at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
	at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:108)
	at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:79)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:868)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:868)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:568)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1537)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:571)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
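
curlCode 77 is curl's CURLE_SSL_CACERT_BADFILE, i.e. the native AWS SDK could not read the CA certificate bundle on the executor host. A quick hedged check of common distro locations (the paths below are assumptions, not values from this report):

import java.nio.file.{Files, Paths}

object CaBundleCheck {
  // Common CA bundle paths (Debian/Ubuntu, RHEL/CentOS, Alpine); adjust for your image.
  private val candidates = Seq(
    "/etc/ssl/certs/ca-certificates.crt",
    "/etc/pki/tls/certs/ca-bundle.crt",
    "/etc/ssl/cert.pem"
  )

  def main(args: Array[String]): Unit = {
    candidates.foreach { p =>
      println(s"$p readable=${Files.isReadable(Paths.get(p))}")
    }
    // The curl CLI honors CURL_CA_BUNDLE; whether the SDK's libcurl build
    // picks it up is build-dependent, so treat this as a hint only.
    println(s"CURL_CA_BUNDLE=${sys.env.getOrElse("CURL_CA_BUNDLE", "<unset>")}")
  }
}
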
bmorck added the bug and triage labels on Dec 6, 2024
majetideepak (Collaborator) commented

Velox does not support aws.credentials.provider.
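
Given that, a sketch that drops the unsupported provider conf and relies on the EC2 instance profile alone may be worth trying (whether each remaining key is honored by the Velox backend should be confirmed against the Gluten docs for your version):

import org.apache.spark.sql.SparkSession

// Sketch only: the reported confs minus fs.s3a.aws.credentials.provider,
// leaning on use.instance.credentials for the instance profile.
val spark = SparkSession.builder()
  .appName("gluten-velox-s3")
  .config("spark.hadoop.fs.s3a.use.instance.credentials", "true")
  .config("spark.hadoop.fs.s3a.endpoint", "s3-external-1.amazonaws.com")
  .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "true")
  .config("spark.hadoop.fs.s3a.path.style.access", "false")
  .getOrCreate()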

bmorck (Author) commented Dec 10, 2024

I followed the guidance in this closed issue as well as what is documented here. Is the documentation outdated?
