I am using a REST catalog, MinIO, and Spark with Iceberg on Docker to read from a Kafka topic, apply some transformations, and write the results out as Iceberg tables. I can print the transformed data to the console, but the job throws an error when it writes to Iceberg, even though I can see metadata in MinIO.
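For reference, a minimal sketch of the kind of job described above, assuming an existing SparkSession named spark; the Kafka server, topic, checkpoint path, and table path are hypothetical placeholders, since the actual job code is not reproduced here:

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "crypto_metrics")
       .load())

# transformations on the Kafka value bytes go here
transformed = raw.selectExpr("CAST(value AS STRING) AS value")

# writing to a path identifier routes the table load through HadoopTables,
# which is where the trace below fails
query = (transformed.writeStream
         .format("iceberg")
         .outputMode("append")
         .option("checkpointLocation", "s3://warehouse/checkpoints/crypto_metrics4")
         .start("s3://warehouse/crypto_metrics4"))

Error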
py4j.protocol.Py4JJavaError: An error occurred while calling o181.start.
: org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path: s3://warehouse/crypto_metrics4/metadata/version-hint.text
at org.apache.iceberg.hadoop.Util.getFs(Util.java:58)
at org.apache.iceberg.hadoop.HadoopTableOperations.getFileSystem(HadoopTableOperations.java:412)
at org.apache.iceberg.hadoop.HadoopTableOperations.findVersion(HadoopTableOperations.java:315)
at org.apache.iceberg.hadoop.HadoopTableOperations.refresh(HadoopTableOperations.java:104)
at org.apache.iceberg.hadoop.HadoopTableOperations.current(HadoopTableOperations.java:84)
at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:94)
at org.apache.iceberg.spark.SparkCatalog.loadFromPathIdentifier(SparkCatalog.java:972)
at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:839)
at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:170)
at org.apache.iceberg.spark.source.IcebergSource.getTable(IcebergSource.java:107)
at org.apache.iceberg.spark.source.IcebergSource.inferPartitioning(IcebergSource.java:90)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:82)
at org.apache.spark.sql.streaming.DataStreamWriter.startInternal(DataStreamWriter.scala:399)
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:251)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.iceberg.hadoop.Util.getFs(Util.java:56)
... 25 more
What am I possibly doing wrong?
@rohitanil as commented by @stym06, you are missing a couple of things, and one of them is the AWS jar. If you want to load the hadoop-aws jar, you will need to load a couple of additional dependency jars as well. Alternatively, you can use the iceberg-aws-bundle jar instead; this is documented at https://iceberg.apache.org/docs/1.7.1/aws/#spark
Other than this, you will need a couple of additional properties so Spark knows how to talk to S3, such as the following:
spark.hadoop.fs.s3.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
# If you are planning to pass in a key and secret:
spark.hadoop.fs.s3a.access.key ACCESSKEY
spark.hadoop.fs.s3a.secret.key SECRETKEY
# If you are using temporary credentials (e.g. obtained from an assumed role), set the provider
# below and also supply fs.s3a.session.token alongside the key and secret above:
# spark.hadoop.fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
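Putting the jars and properties together, a minimal, illustrative PySpark session for a MinIO-backed warehouse could look like the following. The endpoint, credentials, and artifact versions are assumptions; match the runtime jar to your Spark and Scala versions. Note that MinIO typically also needs a custom endpoint and path-style access:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kafka-to-iceberg")
    # iceberg-aws-bundle ships the AWS SDK; hadoop-aws provides S3AFileSystem
    # (versions below are examples only)
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,"
            "org.apache.iceberg:iceberg-aws-bundle:1.7.1,"
            "org.apache.hadoop:hadoop-aws:3.3.4")
    # Map both s3:// and s3a:// schemes to the S3A filesystem
    .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    # MinIO specifics: custom endpoint and path-style addressing
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESSKEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRETKEY")
    .getOrCreate()
)

Also note: writing to a bare s3://... path goes through HadoopTables and therefore needs the S3A filesystem mapping above; if you instead write to a table registered in your REST catalog by its catalog-qualified name, the table load uses the catalog's FileIO rather than the Hadoop filesystem (though a streaming checkpointLocation on s3:// still requires it).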