Warning in pyspark: NullPointerException in RecordReader.close() #20
Similar issue with pyspark. It seems that the decompressor has already been destroyed/finalized by the time reset() is called, so the buffer no longer exists and Lz4Decompressor throws a NullPointerException.

```
23/03/20 12:12:24 INFO CodecPool: Got brand-new decompressor [.4mc]
```
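A minimal sketch of that sequence, with made-up names (a simplified model of the double release, not the library's actual code):

```scala
import java.nio.ByteBuffer

// Toy model: releaseDirectBuffers() nulls the buffer field, and a later
// reset() dereferences it. This mirrors the ordering the 4mc codec hits
// when close() releases the buffers before the decompressor is returned
// to Hadoop's CodecPool, which calls reset().
class ToyDecompressor {
  private var compressedDirectBuf: ByteBuffer = ByteBuffer.allocateDirect(64 * 1024)

  def releaseDirectBuffers(): Unit = {
    compressedDirectBuf = null // buffers freed, field left null
  }

  def reset(): Unit = {
    compressedDirectBuf.clear() // NullPointerException if called after release
  }
}

object ToyRepro extends App {
  val d = new ToyDecompressor
  d.releaseDirectBuffers() // close() releases the direct buffers...
  d.reset()                // ...then the pool's reset() hits the null field: NPE
}
```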
This is a problem in the FourMcInputStream.close() method calling Lz4Decompressor.releaseDirectBuffers(): by the time reset() is called, the buffers have already been set to null, which causes the NullPointerException. Commenting out that call in FourMcInputStream.close() fixes the issue. The following Scala code can be used to reproduce it, using Spark 2.4.4 (Scala 2.11) and Hadoop 2.7.0:
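The snippet in the comment is cut off after its opening line, so here is a hedged reconstruction of a reproduction along those lines; the FourMcTextInputFormat import and the input path are assumptions, so check the package name against your 4mc version:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}

// Assumed input format class from hadoop-4mc; verify the package name
// for the 4mc version you are using.
import com.hadoop.mapreduce.FourMcTextInputFormat

object Decompressor {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("4mc-npe-repro").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Reading any .4mc file is enough to exercise the close() path that
    // triggers the NullPointerException described above.
    val rdd = sc.newAPIHadoopFile[LongWritable, Text, FourMcTextInputFormat](
      "/path/to/myfile.4mc") // placeholder path

    println(rdd.count())
    sc.stop()
  }
}
```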
I compressed a file (~30 MB, just for testing) using the 4mc tool:
Then I tried to open the compressed file in (py)spark:
I got the result, but it warned me about a null pointer exception. Thought you guys might want to know!