crash on latest version QBFT #7777

Open
AndreiD opened this issue Oct 17, 2024 · 1 comment
Labels: bug (Something isn't working), P3 (Medium, ex: JSON-RPC request not working with a specific client library due to loose spec assumption), QBFT (QBFT Consensus related)

AndreiD commented Oct 17, 2024

On the latest Hyperledger Besu, running on Ubuntu 22 with OpenJDK 21.0.2 (2024-01-16), the following crash occurs. Why?

3|node2 | 2024-10-17 00:22:04.552+02:00 | vertx-blocked-thread-checker | WARN | BlockedThreadChecker | Thread Thread[vert.x-worker-thread-6,5,main] has been blocked for 102105 ms, time limit is 60000 ms
3|node2 | io.vertx.core.VertxException: Thread blocked
3|node2 | at com.sun.jna.internal.Cleaner.register(Cleaner.java:58)
3|node2 | at com.sun.jna.Memory.<init>(Memory.java:122)
3|node2 | at com.sun.jna.Structure$AutoAllocated.<init>(Structure.java:2219)
3|node2 | at com.sun.jna.Structure.autoAllocate(Structure.java:309)
3|node2 | at com.sun.jna.Structure.allocateMemory(Structure.java:436)
3|node2 | at com.sun.jna.Structure.allocateMemory(Structure.java:413)
3|node2 | at com.sun.jna.Structure.ensureAllocated(Structure.java:389)
3|node2 | at com.sun.jna.Structure.ensureAllocated(Structure.java:379)
3|node2 | at com.sun.jna.Structure.write(Structure.java:788)
3|node2 | at com.sun.jna.Structure.autoWrite(Structure.java:2285)
3|node2 | at org.hyperledger.besu.nativelib.secp256k1.LibSecp256k1.secp256k1_ecdsa_sign_recoverable(Native Method)
3|node2 | at org.hyperledger.besu.crypto.SECP256K1.signNative(SECP256K1.java:157)
3|node2 | at org.hyperledger.besu.crypto.SECP256K1.sign(SECP256K1.java:107)
3|node2 | at org.hyperledger.besu.cryptoservices.KeyPairSecurityModule.sign(KeyPairSecurityModule.java:66)
3|node2 | at org.hyperledger.besu.cryptoservices.NodeKey.sign(NodeKey.java:49)
3|node2 | at org.hyperledger.besu.ethereum.p2p.discovery.internal.Packet.<init>(Packet.java:64)
3|node2 | at org.hyperledger.besu.ethereum.p2p.discovery.internal.Packet.create(Packet.java:99)
3|node2 | at org.hyperledger.besu.ethereum.p2p.discovery.internal.PeerDiscoveryController.lambda$createPacket$21(PeerDiscoveryController.java:606)
3|node2 | at org.hyperledger.besu.ethereum.p2p.discovery.VertxPeerDiscoveryAgent$VertxAsyncExecutor.lambda$execute$0(VertxPeerDiscoveryAgent.java:326)
3|node2 | at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:191)
3|node2 | at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:279)
3|node2 | at io.vertx.core.impl.ContextImpl.lambda$internalExecuteBlocking$2(ContextImpl.java:210)
3|node2 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
3|node2 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
3|node2 | at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalR

ChatGPT said:
The log from Hyperledger Besu on OpenJDK 21 shows a Vert.x BlockedThreadChecker warning: worker thread vert.x-worker-thread-6 had been running a single task for 102105 ms against a limit of 60000 ms. The VertxException: Thread blocked is not thrown by the application; Vert.x creates it only to attach the blocked thread's stack trace to the warning, so the trace shows where the thread was when it was sampled.
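
When many such warnings need triaging across node logs, the three values in the message (thread name, blocked time, limit) can be extracted mechanically. The following is a minimal sketch using only the JDK. BlockedThreadLine is a hypothetical helper, not part of Besu or Vert.x, and the pattern assumes the message format shown in the log above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for triaging BlockedThreadChecker warnings; not a Besu or Vert.x class. */
final class BlockedThreadLine {
    // Assumed format, taken from the log line in this issue:
    // "Thread Thread[<name>,<prio>,<group>] has been blocked for <n> ms, time limit is <m> ms"
    private static final Pattern WARNING = Pattern.compile(
        "Thread Thread\\[([^,\\]]+),.*?\\] has been blocked for (\\d+) ms, time limit is (\\d+) ms");

    final String threadName;
    final long blockedMs;
    final long limitMs;

    private BlockedThreadLine(String threadName, long blockedMs, long limitMs) {
        this.threadName = threadName;
        this.blockedMs = blockedMs;
        this.limitMs = limitMs;
    }

    /** Returns the parsed warning, or empty if the line is not a blocked-thread warning. */
    static Optional<BlockedThreadLine> parse(String logLine) {
        Matcher m = WARNING.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new BlockedThreadLine(
            m.group(1), Long.parseLong(m.group(2)), Long.parseLong(m.group(3))));
    }

    /** How far past the limit the task had run when the warning was logged. */
    long overrunMs() {
        return blockedMs - limitMs;
    }
}
```

Applied to the warning in this issue, the helper reports how far past the limit the task had run, which makes it easier to tell an occasional slow call from a long stall.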

Breakdown of the Error:
Thread Blocked Warning: The BlockedThreadChecker detects that a worker thread (vert.x-worker-thread-6) has spent longer on one task than the configured limit (60 seconds, the Vert.x default for worker threads). Worker threads exist to run blocking work, so blocking on them is allowed; it is the event-loop threads that must never block. A worker task that exceeds the limit still points to a stall or a performance bottleneck, since a single signing call normally finishes far sooner.

JNA Memory Management: When the stack was sampled, the thread was inside JNA (Java Native Access) native-memory allocation, with com.sun.jna.internal.Cleaner.register as the top frame. The relevant frames are:

com.sun.jna.internal.Cleaner.register
com.sun.jna.Memory
com.sun.jna.Structure
Native method LibSecp256k1.secp256k1_ecdsa_sign_recoverable

This suggests the stall is on the JNA side of the call into the secp256k1 elliptic curve cryptography library used in Besu.

Key Areas Involved:

The blocking occurs while signing a peer discovery packet (Packet.create, called from PeerDiscoveryController) with ECDSA through the native LibSecp256k1 library (SECP256K1.signNative).
The Java Native Access (JNA) framework is managing native memory allocations, and it seems that the memory allocation (Structure.allocateMemory) or writing the structure (Structure.write) is taking too long.
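
The thread-blocked check described above can be pictured as a watchdog that compares each running task's start time with a limit. The following is a minimal sketch of that idea using only the JDK. It is not Vert.x's implementation, TaskWatchdog and its method names are hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical watchdog illustrating the idea behind a blocked-thread check. */
final class TaskWatchdog {
    private final long limitMs;
    // Thread name -> time (ms) at which its current task started.
    private final Map<String, Long> startedAtMs = new ConcurrentHashMap<>();

    TaskWatchdog(long limitMs) {
        this.limitMs = limitMs;
    }

    /** Called when a thread begins executing a task. */
    void taskStarted(String threadName, long nowMs) {
        startedAtMs.put(threadName, nowMs);
    }

    /** Called when the thread's task completes; the thread is no longer tracked. */
    void taskFinished(String threadName) {
        startedAtMs.remove(threadName);
    }

    /** Returns one warning per thread whose current task has run longer than the limit. */
    List<String> check(long nowMs) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, Long> entry : startedAtMs.entrySet()) {
            long elapsedMs = nowMs - entry.getValue();
            if (elapsedMs > limitMs) {
                warnings.add("Thread " + entry.getKey() + " has been blocked for "
                    + elapsedMs + " ms, time limit is " + limitMs + " ms");
            }
        }
        return warnings;
    }
}
```

The point of the sketch is that the check measures how long one task has been running, not how long it waited in a queue, so a warning means a single call did not return within the limit.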
Possible Reasons for the Crash:
Excessive Memory Allocation/Deallocation: The repeated allocation of native memory via JNA can be inefficient or cause the thread to block, especially under high load. If this operation is invoked frequently, it may overwhelm the worker thread.

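JNA Issues: The top frame, Cleaner.register, is where JNA registers newly allocated native memory for later cleanup. A stall at that point may indicate contention or slowness in JNA's memory bookkeeping rather than in the secp256k1 code, so the JNA version bundled with Besu is worth checking.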

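Thread Pool Configuration: If there are too few Vert.x worker threads for the workload, new tasks queue up behind the stalled ones and are delayed. A small pool does not by itself trigger this warning, which measures how long one task has been running, but it amplifies the effect of a stalled task. Check the configuration for the thread pool size in Vert.x.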

Long-Running Tasks on Vert.x Worker Threads: Worker threads are the intended place for blocking calls such as LibSecp256k1.secp256k1_ecdsa_sign_recoverable, and the trace shows the signing already runs through executeBlocking (ContextImpl.lambda$executeBlocking$1). The concern is therefore the duration: a single task that holds a worker for over 100 seconds reduces the pool's capacity and indicates that something underneath the call is stalling.
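
The thread pool point above can be illustrated with a plain JDK executor: once every thread in a fixed-size pool is held by a blocked task, further tasks wait in the queue. This is a minimal sketch using java.util.concurrent rather than Vert.x's worker pool, and WorkerPoolSaturation is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: how many tasks must queue when all pool threads are blocked. */
final class WorkerPoolSaturation {
    /** Submits {@code tasks} blocking tasks to a pool of {@code poolSize} threads and returns the queue depth. */
    static int queuedWhenSaturated(int poolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // stands in for a call that does not return
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The first poolSize tasks each occupy a thread; the rest are waiting here.
            return pool.getQueue().size();
        } finally {
            release.countDown(); // unblock the tasks so the pool can drain
            pool.shutdown();
        }
    }
}
```

With a pool of 2 threads and 5 blocking tasks, 3 tasks wait in the queue; with 4 threads and 3 tasks, none do. A larger pool hides a stalled task for longer but does not remove the stall.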

Suggested Solutions:

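1. Increase Worker Thread Pool Size:
You can increase the number of worker threads so that more blocking tasks can run in parallel. This is set through VertxOptions when the Vertx instance is created, so it applies to code that constructs Vert.x itself. It leaves more threads free while one is stalled, but it does not make the stalled task finish sooner.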

Example:

```java
import io.vertx.core.Vertx;
import io.vertx.core.VertxOptions;
VertxOptions options = new VertxOptions().setWorkerPoolSize(40); // Vert.x default is 20
Vertx vertx = Vertx.vertx(options);
```
2. Offload Blocking Operations:
Since the cryptographic signing operation blocks, it must stay off the event loop. Vert.x provides executeBlocking for this, and the stack trace shows that Besu's peer discovery already uses it (VertxPeerDiscoveryAgent$VertxAsyncExecutor), so this is a pattern to keep rather than a new fix.

Example:

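```java
// Callback-style executeBlocking, as available in Vert.x 3.x/4.x
vertx.executeBlocking(promise -> {
    // Perform the blocking signing operation on a worker thread
    SECP256K1.sign(...);
    promise.complete();
}, res -> {
    // Handle the result or failure (res.succeeded() / res.cause())
});
```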
3. Optimize Memory Allocation:
If the issue is related to JNA memory management, try to reduce the frequency of memory allocations or reuse allocated memory if possible. Excessive allocations can degrade performance.

4. Upgrade Libraries:
Ensure you are using the latest version of Hyperledger Besu and related libraries, as there could be bug fixes or optimizations in newer releases.
Similarly, check for updates to the JNA and secp256k1 libraries, as newer versions might resolve performance bottlenecks.
5. Analyze the Workload:
Review how often the ECDSA signing operation is invoked and whether it can be optimized or batched to reduce the load on the system.
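
To review how often signing is invoked and how long it takes, the call can be wrapped in a small timer. This is a minimal sketch using only the JDK. CallStats is a hypothetical helper, not a Besu class, and where to hook it in is left open.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical helper that counts calls and records their latency. */
final class CallStats {
    private final LongAdder calls = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final AtomicLong maxNanos = new AtomicLong();

    /** Runs the operation, records its duration, and returns its result unchanged. */
    <T> T time(Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            calls.increment();
            totalNanos.add(elapsed);
            maxNanos.accumulateAndGet(elapsed, Math::max);
        }
    }

    long calls() {
        return calls.sum();
    }

    long maxNanos() {
        return maxNanos.get();
    }

    /** Mean latency in microseconds, or 0 if nothing has been recorded. */
    double meanMicros() {
        long n = calls.sum();
        return n == 0 ? 0.0 : totalNanos.sum() / 1_000.0 / n;
    }
}
```

A caller would wrap the signing call in stats.time(() -> ...) and log calls(), meanMicros() and maxNanos() periodically. A maximum far above the mean would point to occasional stalls rather than sustained load.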

Conclusion:
The warning comes from a signing call that occupied a Vert.x worker thread far longer than the 60-second limit, apparently while JNA was allocating native memory. Increasing the worker thread pool size and reducing native memory allocation may ease the symptoms, but stability depends on finding out why that call stalled.

jflo added the bug (Something isn't working) and P1 (Very High, ex: Security issues, unplanned backward incompatibility preventing network participation) labels on Oct 22, 2024

siladu commented Nov 25, 2024

Hi @AndreiD we haven't been able to reproduce this so far - are you still experiencing this issue? If so, are there any more details you can add?

Sharing all your config, especially your genesis file might help us reproduce this.

What OS version and glibc version are you on?

Also what Besu version?

siladu added the P3 (Medium, ex: JSON-RPC request not working with a specific client library due to loose spec assumption) label and removed the P1 (Very High, ex: Security issues, unplanned backward incompatibility preventing network participation) label on Nov 25, 2024
jflo added the QBFT (QBFT Consensus related) label on Nov 27, 2024