usage help #39
Sure! I haven't published a release yet, but you can build fairly easily. Just download Gradle 4.4 and run the build; that will build the project and should produce the iceberg-runtime JAR. For an example of creating an Iceberg table, you can look at the example in the repo. The API is fairly easy, but I can answer questions if you have any.
@rdblue First, thanks for the help. I managed to run the JAR on Spark but still have some questions (I don't use HDFS, I use S3, and the example is HDFS-based), and I get:
You can use an S3 location as well. Looks like your table create failed with an NPE? What was the exception message and stack trace? If that fails, then you can't read with Spark.

Also, this conversion is for Hive tables. You can alter it to get it working for a directory of Parquet files, but it works out of the box for partitioned Hive tables.
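As a side note for later readers: current Iceberg releases include a helper for importing an existing Hive table's files into an Iceberg table. This is a minimal sketch using the modern org.apache.iceberg API (which may differ from the build discussed in this thread); the database/table names and staging path are hypothetical:

```scala
// Sketch: import a partitioned Hive table's data files into an Iceberg table.
// Names and paths are placeholders, not values from this thread.
import org.apache.iceberg.spark.SparkTableUtil
import org.apache.spark.sql.catalyst.TableIdentifier

// "table" is an already-created org.apache.iceberg.Table with a matching
// schema and partition spec.
SparkTableUtil.importSparkTable(
  spark,
  TableIdentifier("source_table", Some("db")),
  table,                    // target Iceberg table
  "s3://bucket/staging")    // staging location for manifest files
```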
@rdblue

> Also, this conversion is for Hive tables. You can alter it to get it working for a directory of Parquet files, but it works out of the box for partitioned Hive tables.

> Looks like your table create failed with NPE? What was the exception message and stack trace? If that fails, then you can't read with Spark.
Looks like your partition spec is null. Your source table isn't really a Hive table stored in a Hive metastore. That's what the example is trying to use, so you'll have to modify the example.

For now, let's focus on getting the table created. You probably have a schema for your table's DataFrame. You can use the conversion helpers to get an Iceberg schema to pass when creating a table:

val icebergSchema = SparkSchemaUtil.convert(df.schema)

See SparkSchemaUtil for the available helpers. Then, you can create a partition spec using the spec builder and identity partitions, and pass that schema and spec in to create the table.
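The spec-builder and create calls were elided from this scrape, so here is a hedged reconstruction of the full flow (not necessarily the exact snippet from the original comment). Package names follow the current org.apache.iceberg layout and may differ from the build in this thread; the partition column "date" and the S3 path are hypothetical:

```scala
// Minimal sketch: convert the Spark schema, build an identity partition spec,
// and create the table.
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.hadoop.HadoopTables
import org.apache.iceberg.spark.SparkSchemaUtil

val icebergSchema = SparkSchemaUtil.convert(df.schema)

// An identity partition uses a source column's value as the partition value.
val spec = PartitionSpec.builderFor(icebergSchema)
  .identity("date")
  .build()

// HadoopTables works against any Hadoop FileSystem, including S3.
val tables = new HadoopTables(spark.sparkContext.hadoopConfiguration)
val table = tables.create(icebergSchema, spec, "s3://bucket/path/to/table")
```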
@rdblue Amazing, this actually worked for creating an Iceberg table from a Spark DataFrame.
@rdblue Regarding the next part, going through the partitions and appending them (I created a partitioned Hive table for the append part), it fails (only when the action occurs) on:

This is the stack trace (truncated):
@rdblue I don't mind updating the example file. Do you think it would be beneficial? If so, let me know and I will create a PR.
I just pushed a fix for this issue. I think you were getting an empty map that the code assumed was non-empty. Try it again?

As for the example, it would be great to create one for non-Hive tables. Otherwise, I think maybe we should just document the tools available to convert schemas to Iceberg and to create partition specs.
@rdblue It works for the mapToArray function now, but the error still occurs in bytesMapToArray.
Ah, same problem. I've pushed a fix for that method, too.
@rdblue I have built the fixed version; the part that creates the extended partition map with the stats succeeds.
It doesn't look like any files are getting added. Can you make sure the DataFrame of SparkDataFiles has rows?
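A quick sanity check, assuming dataFiles is whatever DataFrame of SparkDataFiles you built for the append (the name is hypothetical):

```scala
// If this prints 0, the append has nothing to commit, which would explain
// why no files show up in the table.
val fileCount = dataFiles.count()
println(s"SparkDataFiles rows to append: $fileCount")
```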
@rdblue How can I scan the S3 files using the table/manifests?
Once you've added files to the table, you can scan it using Spark like this:

val df = spark.read.format("iceberg").load("s3://path/to/table").filter(...)
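Expanded slightly, a sketch of the same read path (the date column and filter literal are hypothetical). Iceberg resolves data files through its own manifests rather than listing S3, and pushes the filter down to skip non-matching partitions:

```scala
// Read an Iceberg table stored in S3; file listing comes from the table's
// manifests, and the filter prunes partitions and files before reading.
import spark.implicits._

val df = spark.read
  .format("iceberg")
  .load("s3://path/to/table")
  .filter($"date" === "2018-11-01")

df.show()
```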
@rdblue I get an NPE on getStatistics in the Reader class, any idea why?
@eyaltrabelsi, the code was still referencing a field directly instead of using the lazy accessor. Should be fixed now.
@rdblue It's still the same issue/line, thanks.
@rdblue It's still accessed directly.
Did you rebuild with the latest master?
First of all, I think this package can be super beneficial, so kudos. Can you guide me through the process of installing and using this amazing package, either by:

Thanks for the hard work.