
[VL] VeloxRuntimeError: Failed to get S3 object due to #8170

Open
bmorck opened this issue Dec 6, 2024 · 2 comments
Labels: bug, triage

Comments

bmorck commented Dec 6, 2024

Backend

VL (Velox)

Bug description


When trying to read Parquet files using Gluten with the Velox backend, I get the following error: Failed to get S3 object due to: 'Network connection'. Path:'s3://<bucket>/bmorck/tpch_parquet/lineitem.parquet', SDK Error Type:99, HTTP Status Code:-1, S3 Service:'Unknown', Message:'curlCode: 77, Problem with the SSL CA cert (path? access rights?)', RequestID:''.

We are able to read the same files without Gluten. We are using an instance profile directly on EC2.

I've seen similar threads suggest modifying the spark.hadoop.fs.s3a confs, but this doesn't fix the issue. Any idea what might be going on?

Spark version

Spark-3.3.x

Spark configurations

spark.hadoop.fs.s3a.aws.credentials.provider = "internal credential provider"
spark.hadoop.fs.s3a.endpoint = s3-external-1.amazonaws.com
spark.hadoop.fs.s3a.use.instance.credentials = true
spark.hadoop.fs.s3a.connection.ssl.enabled = true
spark.hadoop.fs.s3a.path.style.access = false
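
For reference, the failure is triggered by a plain Parquet scan. A minimal sketch, assuming a SparkSession that carries the confs above and has the Gluten Velox backend enabled (<bucket> is redacted in this report):

// Runs fine with vanilla Spark; under the Velox backend it fails in
// TableScan with the VeloxRuntimeError shown under "Relevant logs".
val lineitem = spark.read.parquet("s3a://<bucket>/bmorck/tpch_parquet/lineitem.parquet")
lineitem.show(5)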

System information

No response

Relevant logs

Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Failed to get S3 object due to: 'Network connection'. Path:'s3://<bucket>/bmorck/tpch_parquet/lineitem.parquet', SDK Error Type:99, HTTP Status Code:-1, S3 Service:'Unknown', Message:'curlCode: 77, Problem with the SSL CA cert (path? access rights?)', RequestID:''.
Retriable: False
Context: Split [Hive: s3a://<bucket>/bmorck/tpch_parquet/lineitem.parquet 15837691904 - 268435456] Task Gluten_Stage_8_TID_240_VTID_4
Additional Context: Operator: TableScan[0] 0
Function: preadInternal
File: /root/src/apache/incubator-gluten/ep/build-velox/build/velox_ep/velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp
Line: 184
Stack trace:
# 0 … # 22 (the native frames are empty in the report)

	at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.utils.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:159)
	at org.apache.gluten.utils.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:71)
	at org.apache.gluten.utils.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:37)
	at org.apache.gluten.utils.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:100)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator.isEmpty(Iterator.scala:387)
	at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
	at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
	at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:108)
	at org.apache.gluten.execution.VeloxColumnarToRowExec.$anonfun$doExecuteInternal$1(VeloxColumnarToRowExec.scala:79)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:868)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:868)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:378)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:342)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:568)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1537)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:571)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
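
curlCode 77 is curl's CURLE_SSL_CACERT_BADFILE, i.e. the native AWS SDK could not read the CA certificate bundle on the executor host. A quick hedged check of common distro locations (the paths below are assumptions, not values from this report):

import java.nio.file.{Files, Paths}

object CaBundleCheck {
  // Common CA bundle paths (Debian/Ubuntu, RHEL/CentOS, Alpine); adjust for your image.
  private val candidates = Seq(
    "/etc/ssl/certs/ca-certificates.crt",
    "/etc/pki/tls/certs/ca-bundle.crt",
    "/etc/ssl/cert.pem"
  )

  def main(args: Array[String]): Unit = {
    candidates.foreach { p =>
      println(s"$p readable=${Files.isReadable(Paths.get(p))}")
    }
    // The curl CLI honors CURL_CA_BUNDLE; whether the SDK's libcurl build
    // picks it up is build-dependent, so treat this as a hint only.
    println(s"CURL_CA_BUNDLE=${sys.env.getOrElse("CURL_CA_BUNDLE", "<unset>")}")
  }
}
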
bmorck added the bug and triage labels on Dec 6, 2024
majetideepak (Collaborator) commented

Velox does not support aws.credentials.provider.
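
Given that, a sketch that drops the unsupported provider conf and relies on the EC2 instance profile alone may be worth trying (whether each remaining key is honored by the Velox backend should be confirmed against the Gluten docs for your version):

import org.apache.spark.sql.SparkSession

// Sketch only: the reported confs minus fs.s3a.aws.credentials.provider,
// leaning on use.instance.credentials for the instance profile.
val spark = SparkSession.builder()
  .appName("gluten-velox-s3")
  .config("spark.hadoop.fs.s3a.use.instance.credentials", "true")
  .config("spark.hadoop.fs.s3a.endpoint", "s3-external-1.amazonaws.com")
  .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "true")
  .config("spark.hadoop.fs.s3a.path.style.access", "false")
  .getOrCreate()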

bmorck (Author) commented Dec 10, 2024

I followed the guidance in this closed issue as well as what is documented here. Is the documentation outdated?
