
Could OOM Killer cause APCU deadlock? #486

Open

TomZhuPlanetart opened this issue Mar 16, 2023 · 1 comment

TomZhuPlanetart commented Mar 16, 2023

APCU Version: 5.1.22
PHP Version: 8.1.9

Today two of our production servers stopped responding. Nginx kept returning 502, and its error log showed:

2023/03/16 06:39:50 [error] 10953#0: *783933 connect() to unix:/var/run/php-fpm.sock failed (11: Resource temporarily unavailable) while connecting to upstream

The number of php-fpm processes had reached max_children. I spot-checked the call stacks of several of the fpm processes; all showed something like this:

#0  0x0000ffffbc6f8a1c in pthread_rwlock_wrlock () from target:/lib64/libpthread.so.0
#1  0x0000ffffb1e79188 in apc_lock_wlock_impl (lock=<optimized out>) at /var/tmp/apcu/apc_lock.c:102
#2  apc_lock_wlock (lock=<optimized out>) at /var/tmp/apcu/apc_lock.c:322
#3  0x0000ffffb1e7c834 in apc_cache_wlock (cache=0x0) at /var/tmp/apcu/apc_cache.h:293
#4  apc_cache_store (cache=0xaaaab888a0a0, key=0xffffbc1bc000, val=0xffffbc015050, ttl=<optimized out>, exclusive=39, exclusive@entry=false) at /var/tmp/apcu/apc_cache.c:511
#5  0x0000ffffb1e7a9a4 in apc_store_helper (return_value=0xffffc49db968, exclusive=false, execute_data=<optimized out>) at /var/tmp/apcu/php_apc.c:490
#6  0x0000aaaaab0fffdc in execute_ex ()
#7  0x0000aaaaab103748 in zend_execute ()
#8  0x0000aaaaab083fc4 in zend_execute_scripts ()
#9  0x0000aaaaab020528 in php_execute_script ()
#10 0x0000aaaaaae70d90 in main ()

I then checked the system log with the following command; it appears several fpm processes had been killed by the OOM killer.

dmesg | grep ' Killed process'
[273397.418006] Killed process 2470 (php-fpm) total-vm:610136kB, anon-rss:85900kB, file-rss:0kB, shmem-rss:57476kB
[273400.232269] Killed process 2485 (php-fpm) total-vm:610136kB, anon-rss:84876kB, file-rss:0kB, shmem-rss:58548kB
[273403.803063] Killed process 2499 (php-fpm) total-vm:676524kB, anon-rss:84760kB, file-rss:0kB, shmem-rss:58256kB
[273406.776361] Killed process 2474 (php-fpm) total-vm:608088kB, anon-rss:83756kB, file-rss:0kB, shmem-rss:58500kB
[273409.829074] Killed process 2498 (php-fpm) total-vm:608092kB, anon-rss:84096kB, file-rss:0kB, shmem-rss:57756kB
[273411.916416] Killed process 2488 (php-fpm) total-vm:609684kB, anon-rss:84940kB, file-rss:0kB, shmem-rss:57780kB
[273414.262209] Killed process 2494 (php-fpm) total-vm:609688kB, anon-rss:85128kB, file-rss:0kB, shmem-rss:57244kB
[273416.574143] Killed process 2492 (php-fpm) total-vm:608092kB, anon-rss:83796kB, file-rss:0kB, shmem-rss:58284kB
[273417.772178] Killed process 2462 (php-fpm) total-vm:610136kB, anon-rss:85428kB, file-rss:0kB, shmem-rss:56720kB
[273419.842312] Killed process 2487 (php-fpm) total-vm:610140kB, anon-rss:84536kB, file-rss:0kB, shmem-rss:58008kB
[273422.121932] Killed process 2408 (php-fpm) total-vm:606472kB, anon-rss:80948kB, file-rss:8kB, shmem-rss:64068kB
[273424.428721] Killed process 2455 (php-fpm) total-vm:608276kB, anon-rss:83124kB, file-rss:0kB, shmem-rss:61248kB
[273427.680694] Killed process 2429 (php-fpm) total-vm:608248kB, anon-rss:82688kB, file-rss:0kB, shmem-rss:61796kB
[273431.686782] Killed process 2496 (php-fpm) total-vm:610140kB, anon-rss:84928kB, file-rss:0kB, shmem-rss:58700kB
[273434.162302] Killed process 2464 (php-fpm) total-vm:608012kB, anon-rss:83436kB, file-rss:0kB, shmem-rss:60548kB
[273436.500678] Killed process 2451 (php-fpm) total-vm:610180kB, anon-rss:84500kB, file-rss:0kB, shmem-rss:59340kB
[273439.167658] Killed process 2491 (php-fpm) total-vm:607964kB, anon-rss:83644kB, file-rss:0kB, shmem-rss:60104kB
[273443.495059] Killed process 2503 (php-fpm) total-vm:610136kB, anon-rss:85568kB, file-rss:0kB, shmem-rss:58444kB
[273446.585592] Killed process 2497 (php-fpm) total-vm:610184kB, anon-rss:84620kB, file-rss:0kB, shmem-rss:59540kB
[273449.305805] Killed process 2484 (php-fpm) total-vm:610140kB, anon-rss:85020kB, file-rss:0kB, shmem-rss:59148kB
[273452.450242] Killed process 2495 (php-fpm) total-vm:610340kB, anon-rss:86108kB, file-rss:0kB, shmem-rss:58356kB
[273453.889468] Killed process 2160 (php-fpm) total-vm:600828kB, anon-rss:75428kB, file-rss:0kB, shmem-rss:69152kB
[273456.191259] Killed process 2489 (php-fpm) total-vm:610136kB, anon-rss:86084kB, file-rss:0kB, shmem-rss:58552kB
[273459.088086] Killed process 2476 (php-fpm) total-vm:610140kB, anon-rss:86216kB, file-rss:12kB, shmem-rss:58480kB
[273462.998725] Killed process 2452 (php-fpm) total-vm:610188kB, anon-rss:84512kB, file-rss:0kB, shmem-rss:60088kB
[273684.069670] Killed process 2552 (php-fpm) total-vm:610140kB, anon-rss:84532kB, file-rss:8kB, shmem-rss:59740kB
[274345.304179] Killed process 2538 (php-fpm) total-vm:610136kB, anon-rss:84524kB, file-rss:0kB, shmem-rss:59704kB
[300480.909337] Killed process 2520 (php-fpm) total-vm:610140kB, anon-rss:85508kB, file-rss:0kB, shmem-rss:58668kB
[300484.820437] Killed process 20249 (yum) total-vm:324688kB, anon-rss:215068kB, file-rss:0kB, shmem-rss:0kB
[306699.394469] Killed process 30288 (python2.7) total-vm:388616kB, anon-rss:168776kB, file-rss:0kB, shmem-rss:0kB
[328553.368812] Killed process 18773 (yum) total-vm:238440kB, anon-rss:204668kB, file-rss:0kB, shmem-rss:0kB

My PHP code contains a snippet using APCu like this:

    public function load($id, $doNotTestCacheValidity = false)
    {
        if (!$this->apcuLoaded) {
            return parent::load($id, $doNotTestCacheValidity);
        }

        // Try the local APCu cache first.
        $v = apcu_fetch(self::APCU_PREFIX . $id, $success);

        if (!$success) {
            // On a miss, fall back to the parent backend and cache the
            // result in APCu for 10 seconds.
            $v = parent::load($id, $doNotTestCacheValidity);
            if ($v) {
                apcu_store(self::APCU_PREFIX . $id, $v, 10);
            }
        }

        return $v;
    }

I'm wondering whether it's possible that one of the PHP processes acquired the lock and was then killed by the OOM killer, so the lock it held was never released, eventually leaving all processes blocked.
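
For illustration only (this sketch is not from the issue), the suspected failure mode can be reproduced with a process-shared rwlock in shared memory, the same primitive apc_lock.c uses in the backtrace above: a child takes the write lock, is SIGKILLed while holding it, and every later locker blocks forever.

    /* Hypothetical reproduction sketch: a process-shared rwlock in shared
     * memory whose writer is SIGKILLed while holding the lock. */
    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Put the rwlock in anonymous shared memory, as APCu does with
         * its shm-backed cache. */
        pthread_rwlock_t *lock = mmap(NULL, sizeof(*lock),
                                      PROT_READ | PROT_WRITE,
                                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        pthread_rwlockattr_t attr;
        pthread_rwlockattr_init(&attr);
        pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_rwlock_init(lock, &attr);

        pid_t pid = fork();
        if (pid == 0) {                  /* child: stands in for an fpm worker */
            pthread_rwlock_wrlock(lock); /* take the write lock ...            */
            pause();                     /* ... and die while holding it       */
        }

        sleep(1);                        /* let the child grab the lock */
        kill(pid, SIGKILL);              /* stand-in for the OOM killer */
        waitpid(pid, NULL, 0);

        fprintf(stderr, "second wrlock...\n");
        pthread_rwlock_wrlock(lock);     /* blocks forever: the dead owner
                                            never released the lock */
        fprintf(stderr, "never reached\n");
        return 0;
    }

Compiled with cc -pthread, the second pthread_rwlock_wrlock() never returns, matching the pthread_rwlock_wrlock frame at the top of every fpm backtrace above.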

nikic (Collaborator) commented Mar 18, 2023

I'm wondering whether it's possible that one of the PHP processes acquired the lock and was then killed by the OOM killer, so the lock it held was never released, eventually leaving all processes blocked.

Yes, unfortunately this is possible. APCu uses POSIX rwlocks, which do not support robustness. POSIX only supports robust mutexes.
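
For context, this is roughly what POSIX robustness looks like (a minimal sketch of the standard pthread API, not APCu code): a robust mutex whose owner dies hands the next locker EOWNERDEAD instead of blocking forever, so the survivor can repair shared state and continue. pthread_rwlock_t has no equivalent attribute, which is why a writer killed mid-critical-section wedges APCu's rwlock permanently.

    /* Minimal sketch of a robust POSIX mutex (POSIX.1-2008). */
    #include <errno.h>
    #include <pthread.h>

    static int lock_robust(pthread_mutex_t *m)
    {
        int rc = pthread_mutex_lock(m);
        if (rc == EOWNERDEAD) {
            /* The previous owner died holding the lock. The data it
             * protected may be inconsistent and should be repaired here,
             * then the mutex is marked usable again: */
            pthread_mutex_consistent(m);
            rc = 0;
        }
        return rc;
    }

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutex_t m;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        pthread_mutex_init(&m, &attr);

        if (lock_robust(&m) == 0) {
            /* ... critical section ... */
            pthread_mutex_unlock(&m);
        }
        return 0;
    }

For a cross-process demonstration the mutex would also have to live in shared memory, as in the earlier sketch; it is kept single-process here for brevity.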
