Commit
How to spot the problem visually:

| [Main process] No output from workers. It seems that we hang. Send
| SIGKILL to workers; exiting...

How to reproduce:

| (Patch Python to trigger GC inside Colorer._write().)
| $ diff -u /usr/lib/python3.9/multiprocessing/connection.py{.orig,}
| --- /usr/lib/python3.9/multiprocessing/connection.py.orig
| +++ /usr/lib/python3.9/multiprocessing/connection.py
| @@ -202,6 +202,8 @@
|              raise ValueError("size is negative")
|          elif offset + size > n:
|              raise ValueError("buffer length < offset + size")
| +        import gc
| +        gc.collect()
|          self._send_bytes(m[offset:offset + size])
|
|      def send(self, obj):
|
| (Just in case, my tarantool version.)
| $ ./src/tarantool --version | head -n 1
| Tarantool 2.8.0-134-g81c663335
|
| (Add the reduced test case.)
| $ cat test/xlog/test-run-hang-gh-qa-96.test.lua
| test_run = require('test_run').new()
| box.schema.user.grant('guest', 'replication')
| test_run:cmd('create server replica with rpl_master=default, script="xlog/replica.lua"')
| test_run:cmd('start server replica')
| test_run:cmd('stop server replica')
| test_run:cmd('cleanup server replica')
| test_run:cmd('delete server replica')
| box.schema.user.revoke('guest', 'replication')
|
| (Run the reduced test case.)
| $ ./test/test-run.py xlog/test-run-hang-gh-qa-96.test.lua
|
| (Or run an existing test that manages instances.)
| $ ./test/test-run.py xlog/panic_on_broken_lsn.test.lua

The problem appears when GC is triggered inside Colorer._write() (more
precisely, in multiprocessing.SimpleQueue#put()) and a TarantoolServer
instance is collected. __del__() calls stop(), which calls color_log(),
which calls SimpleQueue#put(), which blocks on a lock that the
interrupted put() call already holds. The process gets stuck.

In fact, test-run should stop instances correctly without this __del__()
method. If it does not, that is a bug in test-run, which should be fixed
anyway. So, I just removed this __del__() method.
The problem looks related to [1], but it is unclear whether it is the
only problem, so I'll leave the issue open for a while.

[1]: tarantool/tarantool-qa#96
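
The hang described above (a finalizer re-entering a write that is already holding a non-reentrant lock) can be reproduced in miniature. The following is only a sketch under stated assumptions, not test-run code: `Logger`, `Server` and `demo` are hypothetical stand-ins for Colorer, TarantoolServer and the test runner, a `threading.Lock` stands in for the lock inside `SimpleQueue.put()`, and the explicit `gc.collect()` plays the role of the patched-in collection. A non-blocking acquire is used so that the sketch reports the would-be deadlock instead of actually hanging.

```python
import gc
import threading


class Logger:
    """Hypothetical stand-in for Colorer: write() takes a non-reentrant lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.would_deadlock = False
        self.lines = []

    def write(self, msg):
        # A blocking acquire here would hang forever on re-entry.
        # The non-blocking acquire only detects that situation.
        if not self._lock.acquire(blocking=False):
            self.would_deadlock = True
            return
        try:
            self.lines.append(msg)
            # Stand-in for a GC run that happens while the lock is held.
            gc.collect()
        finally:
            self._lock.release()


class Server:
    """Hypothetical stand-in for TarantoolServer with a logging __del__()."""

    def __init__(self, logger):
        self.logger = logger
        # Reference cycle: only the cyclic GC can collect this object.
        self.me = self

    def __del__(self):
        # Mirrors __del__() -> stop() -> color_log() -> put().
        self.logger.write("stop")


def demo():
    gc.disable()  # keep automatic GC from collecting the object early
    try:
        logger = Logger()
        Server(logger)  # unreachable at once, but kept alive by its cycle
        logger.write("hello")
        return logger.would_deadlock, logger.lines
    finally:
        gc.enable()


if __name__ == "__main__":
    print(demo())
```

In this sketch the finalizer runs synchronously inside `gc.collect()`, in the same thread that already holds the lock, which is why dropping the `__del__()` method removes the re-entry path altogether.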