JedisCluster Requests Hang Indefinitely After Lock, Ignoring Timeout Configurations #4002

Open
Nacol-174 opened this issue Oct 25, 2024 · 2 comments

Comments

@Nacol-174

Expected behavior

A command that is blocked should be interrupted once the configured timeout elapses.

Actual behavior

When running Jedis operations in our production environment, the system stalls for several minutes at a time. Thread dumps taken with jstack showed that many threads enter the WAITING state after calling getSlotConnection(). The source of JedisClusterInfoCache shows that this class uses a ReentrantReadWriteLock, so I suspect that the write lock is being held for a long time and blocking every thread that tries to acquire the read lock. To test this, I wrote a utility that acquires the write lock on purpose via reflection:

package com.nacol.redisbandwidth.component.cache;

import redis.clients.jedis.*;

import java.lang.reflect.Field;
import java.util.concurrent.locks.Lock;

public class JedisClusterInfoCacheLockUtil {

    private final Lock writeLock;

    private final Lock readLock;

    public JedisClusterInfoCacheLockUtil(JedisCluster jedisCluster) throws Exception {

        Field connectionHandlerField = BinaryJedisCluster.class.getDeclaredField("connectionHandler");
        connectionHandlerField.setAccessible(true);
        JedisClusterConnectionHandler connectionHandler = (JedisClusterConnectionHandler) connectionHandlerField.get(jedisCluster);

        Field cacheField = JedisClusterConnectionHandler.class.getDeclaredField("cache");
        cacheField.setAccessible(true);
        JedisClusterInfoCache jedisClusterInfoCache = (JedisClusterInfoCache) cacheField.get(connectionHandler);

        Field writeLockField = JedisClusterInfoCache.class.getDeclaredField("w");
        writeLockField.setAccessible(true);
        this.writeLock = (Lock) writeLockField.get(jedisClusterInfoCache);
        
        Field readLockField = JedisClusterInfoCache.class.getDeclaredField("r");
        readLockField.setAccessible(true);
        this.readLock = (Lock) readLockField.get(jedisClusterInfoCache);
    }

    public void lockWrite() {
        writeLock.lock();
    }

    public void unlockWrite() {
        writeLock.unlock();
    }

    public void lockRead() {
        readLock.lock();
    }

    public void unlockRead() {
        readLock.unlock();
    }

}

Then, execute the following demo:

1.	Initialize JedisCluster.
2.	Acquire the write lock.
3.	Start a child thread to execute the get command (executing get in the same thread would re-enter, which doesn’t fit the scenario).
        // STEP init cluster
        JedisCluster cluster = JedisClient.getCluster();

        // STEP init lock util
        JedisClusterInfoCacheLockUtil util = new JedisClusterInfoCacheLockUtil(cluster);

        // STEP lock
        util.lockWrite();

        // STEP Start a child thread to execute the get command
        //    (executing get in the same thread would re-enter, which doesn’t fit the scenario)
        Executors.newFixedThreadPool(1).execute(() ->{
            // At this point, execution will be indefinitely blocked.
            cluster.get("test-key");
            log.info("sub finish");
        });

        Thread.sleep(10000000);
        util.unlockWrite();

        log.info("main finish");

Execution result:
The child thread’s get command stalls until the write lock is released. Even with maxWaitMillis, connectionTimeout, soTimeout, and maxAttempts configured, none of them interrupts the wait for the lock.
This is what produces the multi-minute blocking delay.
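
The same pattern can be reproduced with the JDK alone, which may help separate the lock behaviour from anything Redis-specific. The sketch below is not Jedis code: the class and method names are hypothetical, and it only illustrates the general difference between an unbounded lock() and a bounded tryLock(timeout, unit) on a ReentrantReadWriteLock. A helper thread holds the write lock while the caller tries to take the read lock with a deadline; the bounded attempt gives up instead of waiting for the writer.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it does not touch Jedis internals.
public class ReadLockTimeoutDemo {

    /** Tries to take the read lock, giving up after timeoutMillis. Returns true if acquired. */
    public static boolean tryRead(ReentrantReadWriteLock lock, long timeoutMillis) {
        try {
            if (!lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // timed out while a writer held the lock
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
        try {
            return true; // a real reader would do its lookup here
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Holds the write lock on a helper thread, then attempts a bounded read from the calling thread. */
    public static boolean readWhileWriterHolds(long timeoutMillis) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch writerHasLock = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();
            try {
                writerHasLock.countDown();
                release.await(); // keep the write lock until the reader has finished its attempt
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.writeLock().unlock();
            }
        });
        writer.start();

        try {
            writerHasLock.await();
            return tryRead(lock, timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the writer thread exit
        }
    }

    public static void main(String[] args) {
        System.out.println("acquired while writer holds: " + readWhileWriterHolds(200));
        System.out.println("acquired when uncontended: " + tryRead(new ReentrantReadWriteLock(), 200));
    }
}
```

Replacing tryLock(...) with a plain lock() in tryRead would make the first call wait for as long as the writer keeps the lock, which matches the stall shown in the demo above.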

ENV

  • Jedis Configuration
    • maxWaitMillis: 4000ms
    • connectionTimeout: 2000ms
    • maxAttempts: 3
    • soTimeout: 350ms
  • Jedis version: 3.5.0
  • Redis version: 6.2.14
  • Java version: 8
@sazzad16
Collaborator

@Nacol-174 Thank you for your work and sharing.

@Nacol-174
Author

> @Nacol-174 Thank you for your work and sharing.

This issue occurs across Jedis versions 3, 4, and 5.
