
Spark 3 - java.lang.ArrayStoreException: java.lang.invoke.SerializedLambda #84

Open
zHaytam opened this issue Apr 6, 2021 · 8 comments


zHaytam commented Apr 6, 2021

Hello,

We're trying to write a DataFrame to Redshift using Spark 3.0.1 (on EMR) and your connector, but we receive the following error (a sketch of the write call we're making follows the package list below):
WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 10.80.139.254, executor 1): java.lang.ArrayStoreException: java.lang.invoke.SerializedLambda

Packages added:

  • com.amazon.redshift:redshift-jdbc42-no-awssdk:1.2.36.1060
  • io.github.spark-redshift-community:spark-redshift_2.12:4.2.0
  • org.apache.spark:spark-avro_2.12:3.0.1
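For reference, a minimal sketch of the kind of write we're doing; the connection details and data are placeholders, not our real values:

```scala
import org.apache.spark.sql.SparkSession

// Placeholder repro: assumes the connector and spark-avro jars from the
// package list above are on the classpath.
val spark = SparkSession.builder().appName("redshift-write-repro").getOrCreate()
import spark.implicits._

val df = Seq(("a", 1), ("b", 2)).toDF("col1", "col2")

df.write
  .format("io.github.spark_redshift_community.spark.redshift")
  .option("url", "jdbc:redshift://example-cluster:5439/dev?user=USER&password=PASS")
  .option("dbtable", "public.example_table")
  .option("tempdir", "s3://example-bucket/redshift-scratch/")
  .option("forward_spark_s3_credentials", "true")
  .mode("append")
  .save()
```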
jsleight (Collaborator) commented Apr 6, 2021

(I also saw your Stack Overflow post, so reading a bit from there: you suspect it is crashing on the write to S3 with some pretty simple data.)

The write-to-S3 code is here; do you know which format you are writing with? That would help in building a narrower example.
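In case it helps, the intermediate format is controlled by the connector's `tempformat` option. A sketch of trying both, reusing `df` and the placeholder connection values from the snippet above:

```scala
// "AVRO" is the default tempformat; "CSV" (and "CSV GZIP") are the
// alternatives. Trying both shows whether the failure is format-specific.
val jdbcUrl = "jdbc:redshift://example-cluster:5439/dev?user=USER&password=PASS"
for (fmt <- Seq("AVRO", "CSV")) {
  df.write
    .format("io.github.spark_redshift_community.spark.redshift")
    .option("url", jdbcUrl)
    .option("dbtable", "public.example_table")
    .option("tempdir", s"s3://example-bucket/redshift-scratch/$fmt/")
    .option("tempformat", fmt)
    .option("forward_spark_s3_credentials", "true")
    .mode("append")
    .save()
}
```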

zHaytam (Author) commented Apr 6, 2021

Hello,

We tried with Avro (the default) and CSV; both throw the same exception. I also read that part of the source code, and I suspect it might be caused by either convertedRows or convertedSchema?

Thanks

jsleight (Collaborator) commented Apr 6, 2021

Could be, although the converters only handle decimal, date, and timestamp -- none of which are in your example. There is something about making the schema columns lowercase, though, which would affect your example -- could you see if all-lowercase column names help? Something like the sketch below.
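A one-liner sketch, assuming `df` is the frame being written:

```scala
// Rename every column to its lowercase form before the Redshift write.
val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)
```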

Otherwise, I'd check that this isn't a case of Spark giving you a misleading error (e.g., via lazy execution the issue is actually somewhere else, and this was just the first Spark action). You could try swapping out the Redshift write for a plain S3 write to the same path.
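A sketch of that swap (path and format are illustrative):

```scala
// Bypass the connector entirely: write the same data to the same scratch
// path. If this also fails, the problem is upstream of spark-redshift.
df.write
  .format("avro") // or "csv"; mirrors the connector's tempformat choices
  .mode("overwrite")
  .save("s3://example-bucket/redshift-scratch/debug/")
```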

zHaytam (Author) commented Apr 6, 2021

All the columns in the dataframe we're trying to write are lowercase.
Also, we are able to write the dataframe to S3 at the same path (without the conversions).

jsleight (Collaborator) commented Apr 7, 2021

Do you have an example df? The example you linked on Stack Overflow has columns called ["ID", "TYPE", "CODE"], which are all uppercase. If you have decimal, date, or timestamp types in your df, then a bug in the converters seems more likely.

zHaytam (Author) commented Apr 7, 2021

The dataframe that we tried is this:

| name | id | type | count |
| ---- | -- | ---- | ----- |
| x    | 0  | cf   | 7     |

Nothing advanced.
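For completeness, a sketch that builds an equivalent frame (the column types here are whatever Spark infers; ours may differ):

```scala
// One-row frame matching the failing example above,
// assuming an active SparkSession named `spark`.
import spark.implicits._
val df = Seq(("x", 0, "cf", 7)).toDF("name", "id", "type", "count")
```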

jsleight (Collaborator) commented Apr 7, 2021

@88manpreet do you have any ideas? I don't see anything in the converters that should cause this error.

88manpreet (Collaborator) commented May 25, 2021

@zHaytam sorry for missing this and not getting back to it and prioritizing it earlier. Is this issue still happening?

I tried to reproduce it in the integration tests for both the Avro and CSV formats, which I think imitates the behavior above.
Diff: https://gist.github.com/88manpreet/8049611246ee306628dfc3e9df7eb2ad

I could see the temp files created in the scratch path for both formats.

I also didn't see anything obviously wrong with the converters; I will keep trying to reproduce this in different ways.
I also noticed that the redshift-jdbc42-no-awssdk driver you are using is the same one we use.

@zHaytam, in the meantime, is it possible to test this case with the latest version, v5.0.3?
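A sketch of the dependency bump in an sbt build (coordinates assumed to follow the same _2.12 artifact naming as 4.2.0):

```scala
// build.sbt sketch: bump the connector to 5.0.3 (assumed coordinates).
libraryDependencies ++= Seq(
  "io.github.spark-redshift-community" %% "spark-redshift" % "5.0.3",
  "org.apache.spark" %% "spark-avro" % "3.0.1"
)
```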

Would it also be possible for you to share a patch of the relevant code you are running that hits this scenario?
