Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV when closing ZstdCompressCtx #331

Open
gaojieliu opened this issue Nov 18, 2024 · 6 comments
Open

SIGSEGV when closing ZstdCompressCtx #331

gaojieliu opened this issue Nov 18, 2024 · 6 comments

Comments

@gaojieliu
Copy link

There is a race condition in AutoCloseBase#close based on my understanding
https://github.com/luben/zstd-jni/blob/master/src/main/java/com/github/luben/zstd/AutoCloseBase.java#L68

@Override
    public void close() {
        // Note: still should use synchronized in addition to sharedLock, because this class must support racy close(),
        // the second could happen through finalization or cleaning (when Cleaner is used). When updated to Java 9+,
        // synchronization block could be replaced with try { ... } finally { Reference.reachabilityFence(this); }
        synchronized (this) {
            if (sharedLock == SHARED_LOCK_CLOSED) {
                return;
            }
            if (!SHARED_LOCK_UPDATER.compareAndSet(this, 0, SHARED_LOCK_CLOSED)) {
                throw new IllegalStateException("Attempt to close while in use");
            }
            doClose();
        }
    }

Some other compressing thread can increase the sharedLock between

            if (!SHARED_LOCK_UPDATER.compareAndSet(this, 0, SHARED_LOCK_CLOSED)) {
                throw new IllegalStateException("Attempt to close while in use");
            }

and

            doClose();

Since doClose would deallocate the underlying native reference, the other compressing threads would crash with SIGSEGV.
What is the reason that we can't use a ReentrantReadWriteLock here to guard this race condition?
Performance concerns?

@luben
Copy link
Owner

luben commented Nov 20, 2024

No, it can't be increased. After SHARED_LOCK_UPDATER.compareAndSet(this, 0, SHARED_LOCK_CLOSED) the lock is closed and attempt to acquire it will throw exception. See https://github.com/luben/zstd-jni/blob/master/src/main/java/com/github/luben/zstd/AutoCloseBase.java#L25-L36

@gaojieliu
Copy link
Author

Emm..
That is hard to explain why the crash would happen.

Current thread (0x00007f32f0fc8200):  JavaThread "AA_WC_PARALLEL_PROCESSING-t4" daemon [_thread_in_native, id=2497946, stack(0x00007f29b9700000,0x00007f29b9800000)]

Stack: [0x00007f29b9700000,0x00007f29b9800000],  sp=0x00007f29b97fc700,  free space=1009k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libzstd-jni-1.5.5-610053516617059633024.so+0x9d981]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 40176  com.github.luben.zstd.ZstdCompressCtx.compressByteArray0(J[BII[BII)J (0 bytes) @ 0x00007f366396465d [0x00007f36639645c0+0x000000000000009d]
J 40177 c1 com.github.luben.zstd.ZstdCompressCtx.compressByteArray([BII[BII)I (87 bytes) @ 0x00007f365c98426c [0x00007f365c984060+0x000000000000020c]
J 40178 c1 com.linkedin.venice.compression.ZstdWithDictCompressor.compress(Ljava/nio/ByteBuffer;I)Ljava/nio/ByteBuffer; (172 bytes) @ 0x00007f365cb31bd4 [0x00007f365cb307c0+0x0000
000000001414]
J 45590 c1 com.linkedin.davinci.kafka.consumer.LeaderFollowerStoreIngestionTask.maybeCompressData(ILjava/nio/ByteBuffer;Lcom/linkedin/davinci/kafka/consumer/PartitionConsumptionSt
ate;)Ljava/nio/ByteBuffer; (90 bytes) @ 0x00007f365f122e44 [0x00007f365f122a60+0x00000000000003e4]
J 40318 c2 com.linkedin.davinci.kafka.consumer.ActiveActiveStoreIngestionTask.processActiveActiveMessage(Lcom/linkedin/venice/pubsub/api/PubSubMessage;Lcom/linkedin/davinci/kafka/
consumer/PartitionConsumptionState;ILjava/lang/String;IJJ)Lcom/linkedin/davinci/kafka/consumer/PubSubMessageProcessedResult; (785 bytes) @ 0x00007f3664f2faa8 [0x00007f3664f2d2e0+0
x00000000000027c8]

@luben
Copy link
Owner

luben commented Nov 21, 2024

Hmm, strange. I see some other methods that should be guarding the native code through the same lock but do not. May be that could be the issue, though these are not in the stack trace.

@luben
Copy link
Owner

luben commented Nov 21, 2024

added more guards around executing native code: cec9653 , but I am not sure if it will fix the problem here

@gaojieliu
Copy link
Author

Thanks, we will give it a try when the new release is ready.

@luben
Copy link
Owner

luben commented Nov 24, 2024

just released 1.5.6-8 with that fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants