
More flake fixes! #8027

Merged
rustyrussell merged 16 commits into ElementsProject:master on Jan 27, 2025

Conversation

rustyrussell
Contributor

Changelog-None: just tweaking tests...

@rustyrussell rustyrussell added this to the v25.02 milestone Jan 23, 2025
```
    def test_no_reconnect_awating_unilateral(node_factory, bitcoind):
        l1, l2 = node_factory.line_graph(2, opts={'may_reconnect': True})
        l2.stop()
    
        # Close immediately.
        l1.rpc.close(l2.info['id'], 1)
    
        wait_for(lambda: only_one(l1.rpc.listpeerchannels(l2.info['id'])['channels'])['state'] == 'AWAITING_UNILATERAL')
    
        # After switching to AWAITING_UNILATERAL it will *not* try to reconnect.
        l1.daemon.wait_for_log("State changed from CHANNELD_SHUTTING_DOWN to AWAITING_UNILATERAL")
        time.sleep(10)
    
>       assert not l1.daemon.is_in_log('Will try reconnect', start=l1.daemon.logsearch_start)
E       AssertionError: assert not 'lightningd-1 2025-01-20T05:46:04.548Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Will try reconnect in 2 seconds'
E        +  where 'lightningd-1 2025-01-20T05:46:04.548Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Will try reconnect in 2 seconds' = <bound method TailableProc.is_in_log of <pyln.testing.utils.LightningD object at 0x7f68ab565c90>>('Will try reconnect', start=453)
E        +    where <bound method TailableProc.is_in_log of <pyln.testing.utils.LightningD object at 0x7f68ab565c90>> = <pyln.testing.utils.LightningD object at 0x7f68ab565c90>.is_in_log
E        +      where <pyln.testing.utils.LightningD object at 0x7f68ab565c90> = <fixtures.LightningNode object at 0x7f68ab567880>.daemon
E        +    and   453 = <pyln.testing.utils.LightningD object at 0x7f68ab565c90>.logsearch_start
E        +      where <pyln.testing.utils.LightningD object at 0x7f68ab565c90> = <fixtures.LightningNode object at 0x7f68ab567880>.daemon
```

In fact:

```
2025-01-20T06:15:27.2854309Z lightningd-2 2025-01-20T05:46:03.527Z DEBUG   lightningd: io_break: destroy_plugin
2025-01-20T06:15:27.2855089Z lightningd-2 2025-01-20T05:46:03.527Z DEBUG   lightningd: Command returned result after jcon close
2025-01-20T06:15:27.2855805Z lightningd-2 2025-01-20T05:46:03.528Z DEBUG   connectd: Shutting down
2025-01-20T06:15:27.2856576Z lightningd-2 2025-01-20T05:46:03.528Z DEBUG   gossipd: Shutting down
2025-01-20T06:15:27.2857159Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   connectd: drain_peer
2025-01-20T06:15:27.2857790Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   connectd: drain_peer draining subd!
2025-01-20T06:15:27.2858825Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: peer_disconnect_done
2025-01-20T06:15:27.2860481Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Will try reconnect in 1 seconds
2025-01-20T06:15:27.2861626Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   connectd: maybe_free_peer freeing peer!
2025-01-20T06:15:27.2862723Z lightningd-1 2025-01-20T05:46:03.530Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Peer connection lost
2025-01-20T06:15:27.2864510Z lightningd-1 2025-01-20T05:46:03.530Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
2025-01-20T06:15:27.2866301Z lightningd-1 2025-01-20T05:46:03.530Z DEBUG   plugin-funder: Cleaning up inflights for peer id 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59
2025-01-20T06:15:27.2867313Z lightningd-2 2025-01-20T05:46:03.530Z DEBUG   hsmd: Shutting down
2025-01-20T06:15:27.2868029Z lightningd-1 2025-01-20T05:46:03.535Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-20T06:15:27.2868935Z lightningd-1 2025-01-20T05:46:03.535Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-20T06:15:27.2869830Z lightningd-1 2025-01-20T05:46:03.536Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-20T06:15:27.2870740Z lightningd-1 2025-01-20T05:46:03.536Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-20T06:15:27.2872276Z lightningd-1 2025-01-20T05:46:03.536Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: State changed from CHANNELD_NORMAL to CHANNELD_SHUTTING_DOWN
2025-01-20T06:15:27.2873873Z lightningd-1 2025-01-20T05:46:03.538Z DEBUG   lightningd: NOTIFY "-c:close#30" info peer is offline, will negotiate once they reconnect (1 seconds before unilateral close).
2025-01-20T06:15:27.2874947Z lightningd-1 2025-01-20T05:46:03.538Z DEBUG   lightningd: close_command: timeout = 1
2025-01-20T06:15:27.2878248Z lightningd-1 2025-01-20T05:46:03.541Z DEBUG   plugin-cln-grpc: Received a message: CustomNotification(Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_state_changed\"), \"params\": Object {\"channel_state_changed\": Object {\"cause\": String(\"user\"), \"channel_id\": String(\"9aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab\"), \"message\": String(\"User or plugin invoked close command\"), \"new_state\": String(\"CHANNELD_SHUTTING_DOWN\"), \"old_state\": String(\"CHANNELD_NORMAL\"), \"peer_id\": String(\"022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59\"), \"short_channel_id\": String(\"103x1x0\"), \"timestamp\": String(\"2025-01-20T05:46:03.537Z\")}}})
2025-01-20T06:15:27.2884122Z lightningd-1 2025-01-20T05:46:03.542Z DEBUG   plugin-cln-grpc: Dispatching custom notification Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_state_changed\"), \"params\": Object {\"channel_state_changed\": Object {\"cause\": String(\"user\"), \"channel_id\": String(\"9aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab\"), \"message\": String(\"User or plugin invoked close command\"), \"new_state\": String(\"CHANNELD_SHUTTING_DOWN\"), \"old_state\": String(\"CHANNELD_NORMAL\"), \"peer_id\": String(\"022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59\"), \"short_channel_id\": String(\"103x1x0\"), \"timestamp\": String(\"2025-01-20T05:46:03.537Z\")}}}
2025-01-20T06:15:27.2888242Z lightningd-1 2025-01-20T05:46:03.542Z DEBUG   plugin-cln-grpc: Failed to parse notification from lightningd Error(\"unknown variant `channel_state_changed`, expected one of `block_added`, `channel_open_failed`, `channel_opened`, `connect`, `custommsg`\", line: 0, column: 0)
2025-01-20T06:15:27.2889970Z lightningd-1 2025-01-20T05:46:04.350Z DEBUG   gossipd: seeker: no peers, waiting
2025-01-20T06:15:27.2890762Z lightningd-1 2025-01-20T05:46:04.538Z DEBUG   lightningd: NOTIFY "-c:close#30" info Timed out, forcing close.
2025-01-20T06:15:27.2892345Z lightningd-1 2025-01-20T05:46:04.539Z UNUSUAL 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer permanent failure in CHANNELD_SHUTTING_DOWN: Forcibly closed by `close` command timeout (reason=user)
2025-01-20T06:15:27.2894333Z lightningd-1 2025-01-20T05:46:04.539Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: State changed from CHANNELD_SHUTTING_DOWN to AWAITING_UNILATERAL
2025-01-20T06:15:27.2895943Z lightningd-1 2025-01-20T05:46:04.543Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: We have 1 anchor points to use
2025-01-20T06:15:27.2897412Z lightningd-1 2025-01-20T05:46:04.544Z DEBUG   lightningd: Broadcasting txid c96f6e4409140474fdd5d8b4862c3d74bd6fa24dadd178182e78fdc4cbf68149 for "-c:close#30"
2025-01-20T06:15:27.2906272Z lightningd-1 2025-01-20T05:46:04.544Z DEBUG   lightningd: sendrawtransaction: 020000000001019aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab00000000009db0e280024a01000000000000220020be7935a77ca9ab70a4b8b1906825637767fed3c00824aa90c988983587d68488352f0f00000000002200209f4684ddb28acdc73959bc194d1a25df906f61ed030f52d163e6f1e247d32cbb0400473044022061656b7587a96723e3942f7445313d1c4315e8b2a17fade62d9a48d8a69e2806022024ac01825c79d3bcee6f5dc19586af6b3f8415156cbda934d0e2cbad9268a6a20147304402201fb54f5cb72da9dfd1399b8df6bf4f1950313b057fc0f5b30b2b09aec2783c9e022052dbd5e30bdc7015c0eb939ff3ad58b83c2d7369bb546571603ccfa62e65c4e00147522102324266de8403b3ab157a09f1f784d587af61831c998c151bcc21bb74c2b2314b2102e3bd38009866c9da8ec4aa99cc4ea9c6c0dd46df15c61ef0ce1f271291714e5752ae9a3ed620
2025-01-20T06:15:27.2914366Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   hsmd: Client: Received message 5 from client
2025-01-20T06:15:27.2915958Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Failed connected out: All addresses failed: 127.0.0.1:42595: Connection establishment: Connection refused. 
2025-01-20T06:15:27.2917891Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Will try reconnect in 2 seconds
2025-01-20T06:15:27.2921924Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   plugin-cln-grpc: Received a message: CustomNotification(Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_state_changed\"), \"params\": Object {\"channel_state_changed\": Object {\"cause\": String(\"user\"), \"channel_id\": String(\"9aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab\"), \"message\": String(\"Forcibly closed by `close` command timeout\"), \"new_state\": String(\"AWAITING_UNILATERAL\"), \"old_state\": String(\"CHANNELD_SHUTTING_DOWN\"), \"peer_id\": String(\"022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59\"), \"short_channel_id\": String(\"103x1x0\"), \"timestamp\": String(\"2025-01-20T05:46:04.540Z\")}}})
2025-01-20T06:15:27.2928311Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   plugin-cln-grpc: Dispatching custom notification Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_state_changed\"), \"params\": Object {\"channel_state_changed\": Object {\"cause\": String(\"user\"), \"channel_id\": String(\"9aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab\"), \"message\": String(\"Forcibly closed by `close` command timeout\"), \"new_state\": String(\"AWAITING_UNILATERAL\"), \"old_state\": String(\"CHANNELD_SHUTTING_DOWN\"), \"peer_id\": String(\"022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59\"), \"short_channel_id\": String(\"103x1x0\"), \"timestamp\": String(\"2025-01-20T05:46:04.540Z\")}}}
2025-01-20T06:15:27.2932905Z lightningd-1 2025-01-20T05:46:04.549Z DEBUG   plugin-cln-grpc: Failed to parse notification from lightningd Error(\"unknown variant `channel_state_changed`, expected one of `block_added`, `channel_open_failed`, `channel_opened`, `connect`, `custommsg`\", line: 0, column: 0)
2025-01-20T06:15:27.2934604Z lightningd-1 2025-01-20T05:46:04.549Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-20T06:15:27.2935599Z lightningd-1 2025-01-20T05:46:04.551Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-20T06:15:27.2936589Z lightningd-1 2025-01-20T05:46:04.553Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-20T06:15:27.2937546Z lightningd-1 2025-01-20T05:46:04.554Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-20T06:15:27.2946767Z lightningd-1 2025-01-20T05:46:04.581Z DEBUG   plugin-bcli: sendrawtx exit 0 (bitcoin-cli -regtest -datadir=/tmp/ltests-qdo6i210/test_no_reconnect_awating_unilateral_1/lightning-1/ -rpcclienttimeout=60 -rpcport=57415 -rpcuser=... -stdinrpcpass sendrawtransaction 020000000001019aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab00000000009db0e280024a01000000000000220020be7935a77ca9ab70a4b8b1906825637767fed3c00824aa90c988983587d68488352f0f00000000002200209f4684ddb28acdc73959bc194d1a25df906f61ed030f52d163e6f1e247d32cbb0400473044022061656b7587a96723e3942f7445313d1c4315e8b2a17fade62d9a48d8a69e2806022024ac01825c79d3bcee6f5dc19586af6b3f8415156cbda934d0e2cbad9268a6a20147304402201fb54f5cb72da9dfd1399b8df6bf4f1950313b057fc0f5b30b2b09aec2783c9e022052dbd5e30bdc7015c0eb939ff3ad58b83c2d7369bb546571603ccfa62e65c4e00147522102324266de8403b3ab157a09f1f784d587af61831c998c151bcc21bb74c2b2314b2102e3bd38009866c9da8ec4aa99cc4ea9c6c0dd46df15c61ef0ce1f271291714e5752ae9a3ed620) 
2025-01-20T06:15:27.2956874Z lightningd-1 2025-01-20T05:46:04.581Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Low-priority anchorspend aiming for block 2119 (feerate 253)
2025-01-20T06:15:27.2958216Z lightningd-1 2025-01-20T05:46:04.583Z DEBUG   hsmd: Client: Received message 28 from client
2025-01-20T06:15:27.2958970Z lightningd-1 2025-01-20T05:46:09.354Z DEBUG   gossipd: seeker: no peers, waiting
2025-01-20T06:15:27.2959764Z lightningd-1 2025-01-20T05:46:09.568Z DEBUG   lightningd: channel_gossip: no longer in startup mode
2025-01-20T06:15:27.2960572Z lightningd-1 2025-01-20T05:46:14.358Z DEBUG   gossipd: seeker: no peers, waiting
2025-01-20T06:15:27.2962896Z {'github_repository': 'ElementsProject/lightning', 'github_sha': '8c945c6075752c2488c211f912e9b97d1d019fc5', 'github_ref': 'refs/pull/7886/merge', 'github_ref_name': 'HEAD', 'github_run_id': 12859491513, 'github_head_ref': 'guilt/test-htable', 'github_run_number': 11999, 'github_base_ref': 'master', 'github_run_attempt': '3', 'testname': 'test_no_reconnect_awating_unilateral', 'start_time': 1737351957, 'end_time': 1737351974, 'outcome': 'fail'}
2025-01-20T06:15:27.2965150Z --------------------------- Captured stdout teardown ---------------------------
```

We need to check that connectd got the "peer_downgrade" message: before that it *might* still try to reconnect.
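
A sketch of the synchronization the fix needs, in test terms; the log line used below to detect that connectd has processed the downgrade is purely illustrative, not the real message:

```
# Sketch only.  Start the "no reconnect" window only after connectd has
# processed the downgrade, so a reconnect scheduled *before* it cannot
# land inside the window.  The second log text is a stand-in.
l1.daemon.wait_for_log("State changed from CHANNELD_SHUTTING_DOWN to AWAITING_UNILATERAL")
l1.daemon.wait_for_log("connectd: .*downgrade")   # hypothetical connectd marker
needle = l1.daemon.logsearch_start
time.sleep(10)
assert not l1.daemon.is_in_log('Will try reconnect', start=needle)
```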

Signed-off-by: Rusty Russell <[email protected]>
I'd been resisting this, but some tests really do want "hang up after
*receiving* this", and it's more reliable than getting the peer to
"hang up after *sending* this", which doesn't always seem to work.

Signed-off-by: Rusty Russell <[email protected]>
Sometimes, l1 wouldn't receive the msg before l2 hung up, causing
chaos.  What we *actually* want here is to make sure that l1 receives
the msg before closing:

```
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
>       l1.rpc.fundchannel(l2.info['id'], CHANNEL_SIZE)

tests/test_connection.py:667: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-client/pyln/client/lightning.py:767: in fundchannel
    return self.call("fundchannel", payload)
contrib/pyln-testing/pyln/testing/utils.py:740: in call
    res = LightningRpc.call(self, method, payload, cmdprefix, filter)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyln.testing.utils.PrettyPrintingLightningRpc object at 0x7f22112f5450>
method = 'fundchannel'
payload = {'amount': 50000, 'announce': True, 'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59'}
cmdprefix = None, filter = None

    def call(self, method, payload=None, cmdprefix=None, filter=None):
        """Generic call API: you can set cmdprefix here, or set self.cmdprefix
        before the call is made.
    
        """
        self.logger.debug("Calling %s with payload %r", method, payload)
    
        if payload is None:
            payload = {}
        # Filter out arguments that are None
        if isinstance(payload, dict):
            payload = {k: v for k, v in payload.items() if v is not None}
    
        this_id = self.get_json_id(method, cmdprefix)
        self.next_id += 1
    
        # FIXME: we open a new socket for every readobj call...
        sock = UnixSocket(self.socket_path)
    
        buf = b''
    
        if self._notify is not None:
            # Opt into the notifications support
            self._writeobj(sock, {
                "jsonrpc": "2.0",
                "method": "notifications",
                "id": this_id + "+notify-enable",
                "params": {
                    "enable": True
                },
            })
            # FIXME: Notification schema support?
            _, buf = self._readobj(sock, buf)
    
        request = {
            "jsonrpc": "2.0",
            "method": method,
            "params": payload,
            "id": this_id,
        }
    
        if filter is None:
            filter = self._filter
        if filter is not None:
            request["filter"] = filter
    
        self._writeobj(sock, request)
        while True:
            resp, buf = self._readobj(sock, buf)
            id = resp.get("id", None)
            meth = resp.get("method", None)
    
            if meth == 'message' and self._notify is not None:
                n = resp['params']
                self._notify(
                    message=n.get('message', None),
                    progress=n.get('progress', None),
                    request=request
                )
                continue
    
            if meth is None or id is None:
                break
    
        self.logger.debug("Received response for %s call: %r", method, resp)
        if 'id' in resp and resp['id'] != this_id:
            raise ValueError("Malformed response, id is not {}: {}.".format(this_id, resp))
        sock.close()
    
        if not isinstance(resp, dict):
            raise TypeError("Malformed response, response is not a dictionary %s." % resp)
        elif "error" in resp:
>           raise RpcError(method, payload, resp['error'])
E           pyln.client.lightning.RpcError: RPC call failed: method: fundchannel, payload: {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'amount': 50000, 'announce': True}, error: {'code': -1, 'message': 'Disconnected', 'data': {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'method': 'openchannel_update'}}
```
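
A sketch of the ordering being described, using the usual pyln-testing log helpers; the wire message name is only an example:

```
# Sketch: make sure l1 has actually read the message before the other
# side is allowed to hang up.  WIRE_FUNDING_SIGNED is just an example.
l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
l1.daemon.wait_for_log('peer_in WIRE_FUNDING_SIGNED')
l2.stop()   # the hang-up can no longer race with l1 receiving the message
```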

Signed-off-by: Rusty Russell <[email protected]>
```
____________________ ERROR at teardown of test_xpay_maxfee _____________________
...
            # Format a nice list of everything that went wrong and raise an exception
            request.node.has_errors = True
>           raise ValueError(str(errors))
E           ValueError: 
E           Node errors:
E            - lightningd-1: Node exited with return code 1
E           Global errors:
```

And:

```
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'too dusty on elements')
    def test_xpay_maxfee(node_factory, bitcoind, chainparams):
        """Test which shows that we don't excees maxfee"""
        outfile = tempfile.NamedTemporaryFile(prefix='gossip-store-')
        subprocess.check_output(['devtools/gossmap-compress',
                                 'decompress',
                                 '--node-map=3301=022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59',
                                 'tests/data/gossip-store-2024-09-22.compressed',
                                 outfile.name]).decode('utf-8').splitlines()
        AMOUNT = 100_000_000
    
        # l2 will warn l1 about its invalid gossip: ignore.
        # We throttle l1's gossip to avoid massive log spam.
>       l1, l2 = node_factory.line_graph(2,
                                         # This is in sats, so 1000x amount we send.
                                         fundamount=AMOUNT,
                                         opts=[{'gossip_store_file': outfile.name,
                                                'subdaemon': 'channeld:../tests/plugins/channeld_fakenet',
                                                'allow_warning': True,
                                                'dev-throttle-gossip': None},
                                               {'allow_bad_gossip': True}])

tests/test_xpay.py:509: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-testing/pyln/testing/utils.py:1720: in line_graph
    nodes = self.get_nodes(num_nodes, opts=opts)
contrib/pyln-testing/pyln/testing/utils.py:1602: in get_nodes
    return [j.result() for j in jobs]
contrib/pyln-testing/pyln/testing/utils.py:1602: in <listcomp>
    return [j.result() for j in jobs]
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/concurrent/futures/_base.py:458: in result
    return self.__get_result()
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/concurrent/futures/_base.py:403: in __get_result
    raise self._exception
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/concurrent/futures/thread.py:58: in run
    result = self.fn(*self.args, **self.kwargs)
contrib/pyln-testing/pyln/testing/utils.py:1653: in get_node
    node.start(wait_for_bitcoind_sync)
contrib/pyln-testing/pyln/testing/utils.py:1015: in start
    self.daemon.start(stderr_redir=stderr_redir)
contrib/pyln-testing/pyln/testing/utils.py:671: in start
    self.wait_for_log("Server started with public key")
contrib/pyln-testing/pyln/testing/utils.py:355: in wait_for_log
    return self.wait_for_logs([regex], timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyln.testing.utils.LightningD object at 0x7f27ab586c20>
regexs = ['Server started with public key'], timeout = 180

    def wait_for_logs(self, regexs, timeout=TIMEOUT):
        """Look for `regexs` in the logs.
    
        The logs contain tailed stdout of the process. We look for each regex
        in `regexs`, starting from `logsearch_start` which normally is the
        position of the last found entry of a previous wait-for logs call.
        The ordering inside `regexs` doesn't matter.
    
        We fail if the timeout is exceeded or if the underlying process
        exits before all the `regexs` were found.
    
        If timeout is None, no time-out is applied.
        """
        logging.debug("Waiting for {} in the logs".format(regexs))
        exs = [re.compile(r) for r in regexs]
        start_time = time.time()
        while True:
            if self.logsearch_start >= len(self.logs):
                if not self.logs_catchup():
                    time.sleep(0.25)
    
                if timeout is not None and time.time() > start_time + timeout:
                    print("Time-out: can't find {} in logs".format(exs))
                    for r in exs:
                        if self.is_in_log(r):
                            print("({} was previously in logs!)".format(r))
>                   raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E                   TimeoutError: Unable to find "[re.compile('Server started with public key')]" in logs.
```

gossipd (and the plugins) simply take too long to digest the gossmap under valgrind.
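
One plausible mitigation, sketched as if the test waited for the startup line itself, using the timeout= parameter that wait_for_log/wait_for_logs already take (visible in the traceback above); whether the real fix bumps a timeout or trims the gossmap is not shown here:

```
# Sketch: give valgrind runs extra startup headroom.
from pyln.testing.utils import TIMEOUT

l1.daemon.wait_for_log("Server started with public key", timeout=TIMEOUT * 3)
```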

Signed-off-by: Rusty Russell <[email protected]>
This test seems confused: l2 won't be able to reconnect to l1 (l1
connected to l2 in the first place, so it's the only one which can
reconnect).  Also, there's a commitment_signed early on when we have
dual funding, so this doesn't actually test the splicing disconnect in
that case?

The race seems to happen when l1 reconnects and l2 hasn't yet registered the disconnect.

```
________________________ test_splice_disconnect_commit _________________________
[gw9] linux -- Python 3.10.16 /home/runner/.cache/pypoetry/virtualenvs/cln-meta-project-AqJ9wMix-py3.10/bin/python

node_factory = <pyln.testing.utils.NodeFactory object at 0x7f53ebc1a950>
bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f53ebc1b760>
executor = <concurrent.futures.thread.ThreadPoolExecutor object at 0x7f53ebc1a170>

    @pytest.mark.openchannel('v1')
    @pytest.mark.openchannel('v2')
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'elementsd doesnt yet support PSBT features we need')
    def test_splice_disconnect_commit(node_factory, bitcoind, executor):
        l1 = node_factory.get_node(options={'experimental-splicing': None}, may_reconnect=True)
        l2 = node_factory.get_node(disconnect=['+WIRE_COMMITMENT_SIGNED'],
                                   options={'experimental-splicing': None, 'dev-no-reconnect': None},
                                   may_reconnect=True)
>       l1.openchannel(l2, 1000000)

tests/test_splicing_disconnect.py:77: 

...
        elif "error" in resp:
>           raise RpcError(method, payload, resp['error'])
E           pyln.client.lightning.RpcError: RPC call failed: method: fundchannel, payload: {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'amount': 1000000, 'announce': True}, error: {'code': -1, 'message': 'Disconnected', 'data': {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'method': 'openchannel_update'}}
```

From the logs:

```
2025-01-23T23:50:25.4098040Z lightningd-2 2025-01-23T23:47:12.443Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: dev_disconnect: +WIRE_COMMITMENT_SIGNED (WIRE_COMMITMENT_SIGNED)
...
2025-01-23T23:50:25.4107026Z lightningd-2 2025-01-23T23:47:12.444Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-dualopend-chan#1: peer_out WIRE_COMMITMENT_SIGNED
2025-01-23T23:50:25.4108070Z lightningd-2 2025-01-23T23:47:12.444Z INFO    0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-dualopend-chan#1: Peer connection lost
2025-01-23T23:50:25.4109375Z lightningd-2 2025-01-23T23:47:12.445Z INFO    0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: Peer transient failure in DUALOPEND_OPEN_COMMIT_READY: dualopend: Owning subdaemon dualopend died (62208)
...
2025-01-23T23:50:25.4111195Z lightningd-2 2025-01-23T23:47:12.445Z DEBUG   plugin-cln-grpc: Received a message: CustomNotification(Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_open_failed\"), \"params\": Object {\"channel_open_failed\": Object {\"channel_id\": String(\"252d1b0a1e57895e84137f28cf19ab2c35847e284c112fefdecc7afeaa5c1de7\")}}})
2025-01-23T23:50:25.4113039Z lightningd-2 2025-01-23T23:47:12.445Z DEBUG   plugin-cln-grpc: Dispatching custom notification Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_open_failed\"), \"params\": Object {\"channel_open_failed\": Object {\"channel_id\": String(\"252d1b0a1e57895e84137f28cf19ab2c35847e284c112fefdecc7afeaa5c1de7\")}}}
2025-01-23T23:50:25.4114525Z lightningd-2 2025-01-23T23:47:12.446Z DEBUG   plugin-funder: Cleaning up inflights for peer id 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518
2025-01-23T23:50:25.4115550Z lightningd-2 2025-01-23T23:47:12.446Z DEBUG   plugin-funder: Cleaning up inflight for channel_id 252d1b0a1e57895e84137f28cf19ab2c35847e284c112fefdecc7afeaa5c1de7
2025-01-23T23:50:25.4116406Z lightningd-2 2025-01-23T23:47:12.446Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-23T23:50:25.4117134Z lightningd-2 2025-01-23T23:47:12.447Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-23T23:50:25.4117728Z lightningd-1 2025-01-23T23:47:12.448Z DEBUG   connectd: drain_peer
2025-01-23T23:50:25.4118229Z lightningd-1 2025-01-23T23:47:12.448Z DEBUG   connectd: drain_peer draining subd!
2025-01-23T23:50:25.4119066Z lightningd-1 2025-01-23T23:47:12.448Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: peer_disconnect_done
2025-01-23T23:50:25.4119792Z lightningd-1 2025-01-23T23:47:12.449Z DEBUG   connectd: maybe_free_peer freeing peer!
...
2025-01-23T23:50:25.4135647Z lightningd-2 2025-01-23T23:47:12.455Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Connect IN
2025-01-23T23:50:25.4136554Z lightningd-1 2025-01-23T23:47:12.455Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Connected out, starting crypto
2025-01-23T23:50:25.4137502Z lightningd-1 2025-01-23T23:47:12.456Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Connect OUT
2025-01-23T23:50:25.4138483Z lightningd-1 2025-01-23T23:47:12.456Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: peer_out WIRE_INIT
2025-01-23T23:50:25.4139407Z lightningd-1 2025-01-23T23:47:12.456Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: peer_in WIRE_INIT
2025-01-23T23:50:25.4140714Z lightningd-1 2025-01-23T23:47:12.456Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in DUALOPEND_OPEN_COMMIT_READY: Disconnected
```
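
One way to remove that race, sketched under the assumption that l1 is also given dev-no-reconnect so the test controls when reconnection happens:

```
# Sketch: with 'dev-no-reconnect' on l1 too, reconnect only once l2 has
# registered the disconnect.
l2.daemon.wait_for_log('Peer connection lost')
l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
```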

Signed-off-by: Rusty Russell <[email protected]>
Sometimes they connect too fast, so we don't get a chance to ratelimit all of them:

```
    def test_connect_ratelimit(node_factory, bitcoind):
        """l1 has 5 peers, restarts, make sure we limit"""
        nodes = node_factory.get_nodes(6,
                                       opts=[{'dev-limit-connections-inflight': None, 'may_reconnect': True}] + [{'may_reconnect': True}] * 5)
    
        l1 = nodes[0]
        nodes = nodes[1:]
    
        addr = l1.rpc.newaddr()['bech32']
        for n in nodes:
            bitcoind.rpc.sendtoaddress(addr, (FUNDAMOUNT + 1000000) / 10**8)
        bitcoind.generate_block(1, wait_for_mempool=len(nodes))
        sync_blockheight(bitcoind, [l1])
    
        for n in nodes:
            l1.rpc.connect(n.info['id'], 'localhost', n.port)
            l1.rpc.fundchannel(n.info['id'], FUNDAMOUNT)
    
        # Make sure all channels are established and announced.
        bitcoind.generate_block(6, wait_for_mempool=len(nodes))
        wait_for(lambda: len(l1.rpc.listchannels()['channels']) == len(nodes) * 2)
    
        assert not l1.daemon.is_in_log('Unblocking for')
    
        l1.restart()
    
        # The first will be ok, but others should block and be unblocked.
>       l1.daemon.wait_for_logs((['Unblocking for ']
                                 + ['Too many connections, waiting'])
                                * (len(nodes) - 1))

tests/test_connection.py:4721: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyln.testing.utils.LightningD object at 0x7f6e288a3a60>
regexs = ['Unblocking for ', 'Too many connections, waiting', 'Unblocking for ', 'Too many connections, waiting', 'Unblocking for ', 'Too many connections, waiting', ...]
timeout = 180

    def wait_for_logs(self, regexs, timeout=TIMEOUT):
        """Look for `regexs` in the logs.
    
        The logs contain tailed stdout of the process. We look for each regex
        in `regexs`, starting from `logsearch_start` which normally is the
        position of the last found entry of a previous wait-for logs call.
        The ordering inside `regexs` doesn't matter.
    
        We fail if the timeout is exceeded or if the underlying process
        exits before all the `regexs` were found.
    
        If timeout is None, no time-out is applied.
        """
        logging.debug("Waiting for {} in the logs".format(regexs))
        exs = [re.compile(r) for r in regexs]
        start_time = time.time()
        while True:
            if self.logsearch_start >= len(self.logs):
                if not self.logs_catchup():
                    time.sleep(0.25)
    
                if timeout is not None and time.time() > start_time + timeout:
                    print("Time-out: can't find {} in logs".format(exs))
                    for r in exs:
                        if self.is_in_log(r):
                            print("({} was previously in logs!)".format(r))
>                   raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E                   TimeoutError: Unable to find "[re.compile('Unblocking for '), re.compile('Too many connections, waiting')]" in logs.
```
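
A sketch of a more tolerant check, reusing `nodes` from the test above and assuming only the listpeers RPC and the daemon.logs list used elsewhere in the suite: require that whatever was throttled was also released, rather than a fixed count of log lines:

```
# Sketch: peers that reconnect before the limiter engages are fine; we
# only insist that every blocked peer was eventually unblocked.
from pyln.testing.utils import wait_for

l1.restart()
wait_for(lambda: sum(p['connected'] for p in l1.rpc.listpeers()['peers']) == len(nodes))
blocked = [l for l in l1.daemon.logs if 'Too many connections, waiting' in l]
unblocked = [l for l in l1.daemon.logs if 'Unblocking for ' in l]
assert len(unblocked) >= len(blocked)
```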

Signed-off-by: Rusty Russell <[email protected]>
We can be a bit early in our assertion:

```
    @unittest.skipIf(os.getenv('TEST_DB_PROVIDER', 'sqlite3') != 'sqlite3', "sqlite3-specific DB manip")
    def test_reconnect_remote_sends_no_sigs(node_factory):
        """We re-announce, even when remote node doesn't send its announcement_signatures on reconnect.
        """
        l1, l2 = node_factory.line_graph(2, wait_for_announce=True, opts={'may_reconnect': True,
                                                                          'dev-no-reconnect': None})
    
        # Wipe l2's gossip_store
        l2.stop()
        gs_path = os.path.join(l2.daemon.lightning_dir, TEST_NETWORK, 'gossip_store')
        os.unlink(gs_path)
        l2.start()
    
        # l2 will now uses (REMOTE's) announcement_signatures it has stored
        wait_for(lambda: l2.rpc.listchannels()['channels'] != [])
    
        # Remove remote signatures from l1 so it asks for them (and delete gossip store)
        l1.db_manip("UPDATE channels SET remote_ann_node_sig=NULL, remote_ann_bitcoin_sig=NULL")
        gs_path = os.path.join(l1.daemon.lightning_dir, TEST_NETWORK, 'gossip_store')
        os.unlink(gs_path)
        l1.restart()
    
        l1.connect(l2)
        l1needle = l1.daemon.logsearch_start
        l2needle = l2.daemon.logsearch_start
    
        # l1 asks once, l2 replies once.
        # Make sure we get all the msgs!
        time.sleep(5)
    
        l1.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES')
        l2.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES')
    
        l1msgs = [l.split()[4] for l in l1.daemon.logs[l1needle:] if 'WIRE_ANNOUNCEMENT_SIGNATURES' in l]
>       assert l1msgs == ['peer_out', 'peer_in']
E       AssertionError: assert ['peer_out'] == ['peer_out', 'peer_in']
E         Right contains one more item: 'peer_in'
E         Full diff:
E         - ['peer_out', 'peer_in']
E         + ['peer_out']
```

```
lightningd-2 2025-01-24T05:53:22.862Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: peer_out WIRE_ANNOUNCEMENT_SIGNATURES
lightningd-1 2025-01-24T05:53:22.864Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_in WIRE_ANNOUNCEMENT_SIGNATURES
lightningd-1 2025-01-24T05:53:22.885Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: channel_gossip: received announcement sigs for 103x1x0 (we have 103x1x0)
{'github_repository': 'ElementsProject/lightning', 'github_sha': 'e9d36f2b8ecd45882753cbe062c355e40bc7109c', 'github_ref': 'refs/pull/8027/merge', 'github_ref_name': 'HEAD', 'github_run_id': 12943530601, 'github_head_ref':
```
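
A sketch of the less racy form, reusing l1needle from the test above: poll until the expected sequence shows up rather than asserting once after a fixed sleep:

```
# Sketch: poll for the full peer_out/peer_in sequence instead of a
# single post-sleep assertion.
from pyln.testing.utils import wait_for

def l1_sig_msgs():
    return [l.split()[4] for l in l1.daemon.logs[l1needle:]
            if 'WIRE_ANNOUNCEMENT_SIGNATURES' in l]

wait_for(lambda: l1_sig_msgs() == ['peer_out', 'peer_in'])
```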

Signed-off-by: Rusty Russell <[email protected]>
This caused a "flake" in testing, because it's wrong:

```
_____________________________ test_gossip_pruning ______________________________
[gw2] linux -- Python 3.10.16 /home/runner/.cache/pypoetry/virtualenvs/cln-meta-project-AqJ9wMix-py3.10/bin/python

node_factory = <pyln.testing.utils.NodeFactory object at 0x7f0267530490>
bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f0267532b30>

    def test_gossip_pruning(node_factory, bitcoind):
        """ Create channel and see it being updated in time before pruning
        """
        l1, l2, l3 = node_factory.get_nodes(3, opts={'dev-fast-gossip-prune': None,
                                                     'allow_bad_gossip': True,
                                                     'autoconnect-seeker-peers': 0})
    
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        l2.rpc.connect(l3.info['id'], 'localhost', l3.port)
    
        scid1, _ = l1.fundchannel(l2, 10**6)
        scid2, _ = l2.fundchannel(l3, 10**6)
    
        mine_funding_to_announce(bitcoind, [l1, l2, l3])
        wait_for(lambda: l1.rpc.listchannels(source=l1.info['id'])['channels'] != [])
        l1_initial_cupdate_timestamp = only_one(l1.rpc.listchannels(source=l1.info['id'])['channels'])['last_update']
    
        # Get timestamps of initial updates, so we can ensure they change.
        # Channels should be activated locally
>       wait_for(lambda: [c['active'] for c in l1.rpc.listchannels()['channels']] == [True] * 4)
```

Here you can see it has pruned:

```
lightningd-1 2025-01-24T07:39:40.873Z DEBUG   gossipd: Pruning channel 105x1x0 from network view (ages 1737704380 and 0)
...
lightningd-1 2025-01-24T07:39:50.941Z UNUSUAL lightningd: Bad gossip order: could not find channel 105x1x0 for peer's channel update
```
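
A rough Python rendering of the rule the fix moves toward (the real logic lives in gossipd, in C, so this is only an illustration); here a timestamp of 0 means we have never seen that side's update:

```
# Illustration: a side we have never heard an update from should not
# force the whole channel out; prune only when every update we *have*
# seen is older than the cutoff.
def should_prune(update_timestamps, now, prune_cutoff):
    seen = [t for t in update_timestamps if t != 0]   # 0 == never seen
    return bool(seen) and all(now - t > prune_cutoff for t in seen)
```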

Changelog-Fixed: Protocol: we were overzealous in pruning channels if we hadn't seen one side's gossip update yet.
Signed-off-by: Rusty Russell <[email protected]>
So not every message type counts toward the message limit: this is useful when we want to get
a specific number of messages of a specific type.

Signed-off-by: Rusty Russell <[email protected]>
By having gossipwith filter out messages we don't want, we can get the counts of
expected messages correct, and not hit errors like this:

```
    def test_gossip_throttle(node_factory, bitcoind, chainparams):
        """Make some gossip, test it gets throttled"""
        l1, l2, l3, l4 = node_factory.line_graph(4, wait_for_announce=True,
                                                 opts=[{}, {}, {}, {'dev-throttle-gossip': None}])
    
        # We expect: self-advertizement (3 messages for l1 and l4) plus
        # 4 node announcements, 3 channel announcements and 6 channel updates.
        # We also expect it to send a timestamp filter message.
        # (We won't take long enough to get a ping!)
        expected = 4 + 4 + 3 + 6 + 1
    
        # l1 is unlimited
        start_fast = time.time()
        out1 = subprocess.run(['devtools/gossipwith',
                               '--all-gossip',
                               '--hex',
                               '--network={}'.format(TEST_NETWORK),
                               '--max-messages={}'.format(expected),
                               '{}@localhost:{}'.format(l1.info['id'], l1.port)],
                              check=True,
                              timeout=TIMEOUT, stdout=subprocess.PIPE).stdout.split()
        time_fast = time.time() - start_fast
        assert time_fast < 2
        # Remove timestamp filter, since timestamp will change!
        out1 = [m for m in out1 if not m.startswith(b'0109')]
    
        # l4 is throttled
        start_slow = time.time()
        out2 = subprocess.run(['devtools/gossipwith',
                               '--all-gossip',
                               '--hex',
                               '--network={}'.format(TEST_NETWORK),
                               '--max-messages={}'.format(expected),
                               '{}@localhost:{}'.format(l4.info['id'], l4.port)],
                              check=True,
                              timeout=TIMEOUT, stdout=subprocess.PIPE).stdout.split()
        time_slow = time.time() - start_slow
        assert time_slow > 3
    
        # Remove timestamp filter, since timestamp will change!
        out2 = [m for m in out2 if not m.startswith(b'0109')]
    
        # Contents should be identical (once uniquified, since each
        # doubles-up on its own gossip)
>       assert set(out1) == set(out2)
E       AssertionError: assert {b'010054b1907bdf639c9060e0fa4bca02419c46f75a99f0908b87a2e09711d5d031ba76b8fd07acc8be1b2fac9e31efb808e5d362c32ef4665...
E         Extra items in the left set:
E         b'01010ad5be8b9ba029245c2ae2d667af7ead7c0129c479c7fd7145a9b65931e90222082e6e4ab37ef60ebd10f1493d73e8bf7a40c4ae5f7d87cc...8488830b60f7e744ed9235eb0b1ba93283b315c035180266e44a554e494f524245414d2d333930353033622d6d6f64646564000000000000000000'
E         Extra items in the right set:
E         b'01079f87eb580b9e5f11dc211e9fb66abb3699999044f8fe146801162393364286c6000000010000006c010101'
```
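
The actual change is to gossipwith (a C devtool), but the same effect done test-side looks roughly like this: keep only the gossip types being counted before comparing the two sets:

```
# Sketch: keep only channel_announcement (0100), node_announcement (0101)
# and channel_update (0102) messages before comparing.
WANTED = (b'0100', b'0101', b'0102')
out1 = [m for m in out1 if m.startswith(WANTED)]
out2 = [m for m in out2 if m.startswith(WANTED)]
assert set(out1) == set(out2)
```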

Signed-off-by: Rusty Russell <[email protected]>
We can ask for the state too fast, before WIRE_ERROR is processed:

```
2025-01-24T14:17:57.4799255Z ________________________ test_onchain_reestablish_reply ________________________
2025-01-24T14:17:57.4800383Z [gw2] linux -- Python 3.10.16 /home/runner/.cache/pypoetry/virtualenvs/cln-meta-project-AqJ9wMix-py3.10/bin/python
2025-01-24T14:17:57.4801246Z 
2025-01-24T14:17:57.4802068Z node_factory = <pyln.testing.utils.NodeFactory object at 0x7f66ec144400>
2025-01-24T14:17:57.4802645Z bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f66ec145ab0>
2025-01-24T14:17:57.4803483Z executor = <concurrent.futures.thread.ThreadPoolExecutor object at 0x7f66ec146da0>
2025-01-24T14:17:57.4804029Z 
2025-01-24T14:17:57.4804340Z     def test_onchain_reestablish_reply(node_factory, bitcoind, executor):
2025-01-24T14:17:57.4805013Z         l1, l2, l3 = node_factory.line_graph(3, opts={'may_reconnect': True,
2025-01-24T14:17:57.4805572Z                                                       'dev-no-reconnect': None})
2025-01-24T14:17:57.4805990Z     
2025-01-24T14:17:57.4806262Z         # Make l1 close unilaterally.
2025-01-24T14:17:57.4806697Z         l1.rpc.disconnect(l2.info['id'], force=True)
2025-01-24T14:17:57.4807200Z         l1.rpc.close(l2.info['id'], unilateraltimeout=1)
2025-01-24T14:17:57.4807622Z     
2025-01-24T14:17:57.4808025Z         # l2 doesn't know, reconnection should get REESTABLISH *then* error.
2025-01-24T14:17:57.4808637Z         l2.rpc.connect(l1.info['id'], 'localhost', l1.port)
2025-01-24T14:17:57.4809066Z     
2025-01-24T14:17:57.4809334Z         # We should exchange messages
2025-01-24T14:17:57.4809835Z         l2.daemon.wait_for_logs(["peer_out WIRE_CHANNEL_REESTABLISH",
2025-01-24T14:17:57.4810399Z                                  "peer_in WIRE_CHANNEL_REESTABLISH"])
2025-01-24T14:17:57.4811197Z         # It should be OK
2025-01-24T14:17:57.4811642Z         l2.daemon.wait_for_log("Reconnected, and reestablished.")
2025-01-24T14:17:57.4812116Z     
2025-01-24T14:17:57.4812378Z         # Then we get the error, close.
2025-01-24T14:17:57.4812824Z         l2.daemon.wait_for_log("peer_in WIRE_ERROR")
2025-01-24T14:17:57.4813576Z >       assert only_one(l2.rpc.listpeerchannels(l1.info['id'])['channels'])['state'] == 'AWAITING_UNILATERAL'
2025-01-24T14:17:57.4814446Z E       AssertionError: assert 'CHANNELD_NORMAL' == 'AWAITING_UNILATERAL'
2025-01-24T14:17:57.4815014Z E         - AWAITING_UNILATERAL
2025-01-24T14:17:57.4815362Z E         + CHANNELD_NORMAL
```

```
2025-01-24T14:17:57.5760435Z lightningd-2 2025-01-24T14:04:28.398Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: peer_in WIRE_ERROR
2025-01-24T14:17:57.5760674Z lightningd-2 2025-01-24T14:04:28.401Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-24T14:17:57.5760935Z lightningd-2 2025-01-24T14:04:28.403Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-24T14:17:57.5761176Z lightningd-2 2025-01-24T14:04:28.408Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-24T14:17:57.5761430Z lightningd-2 2025-01-24T14:04:28.414Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-24T14:17:57.5762598Z {'github_repository': 'ElementsProject/lightning', 'github_sha': '390503b31e1949cb84e5a5257bf29a1b80d5cd20', 'github_ref': 'refs/pull/8027/merge', 'github_ref_name': 'HEAD', 'github_run_id': 12950414032, 'github_head_ref': 'guilt/fix-flakes12', 'github_run_number': 12051, 'github_base_ref': 'master', 'github_run_attempt': '1', 'testname': 'test_onchain_reestablish_reply', 'start_time': 1737727408, 'end_time': 1737727468, 'outcome': 'fail'}
2025-01-24T14:17:57.5763430Z lightningd-2 2025-01-24T14:04:28.426Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: billboard perm: Received ERROR channel 69d744488de7ec7f4c018da98a8a9d726ee3b47a97809608e4ce6f17747b5f09: Forcibly closed by `close` command timeout
2025-01-24T14:17:57.5764512Z lightningd-2 2025-01-24T14:04:28.433Z UNUSUAL 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: Peer permanent failure in CHANNELD_NORMAL: channeld: received ERROR channel 69d744488de7ec7f4c018da98a8a9d726ee3b47a97809608e4ce6f17747b5f09: Forcibly closed by `close` command timeout (reason=protocol)
2025-01-24T14:17:57.5764997Z lightningd-2 2025-01-24T14:04:28.439Z INFO    0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL
```
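
A sketch of the obvious fix: poll for the state instead of asserting immediately after the peer_in WIRE_ERROR log line, since lightningd processes the error asynchronously:

```
# Sketch: wait for the state change rather than asserting it right away.
from pyln.testing.utils import wait_for, only_one

l2.daemon.wait_for_log("peer_in WIRE_ERROR")
wait_for(lambda: only_one(l2.rpc.listpeerchannels(l1.info['id'])['channels'])['state']
         == 'AWAITING_UNILATERAL')
```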

Signed-off-by: Rusty Russell <[email protected]>
Doesn't always die messily, it seems?

```
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'elementsd doesnt yet support PSBT features we need')
    @pytest.mark.openchannel('v2')
    def test_rbf_reconnect_tx_construct(node_factory, bitcoind, chainparams):
        disconnects = ['=WIRE_TX_ADD_INPUT',  # Initial funding succeeds
                       '-WIRE_TX_ADD_INPUT',
                       '+WIRE_TX_ADD_INPUT',
                       '-WIRE_TX_ADD_OUTPUT',
                       '+WIRE_TX_ADD_OUTPUT',
                       '-WIRE_TX_COMPLETE',
                       '+WIRE_TX_COMPLETE',
                       '-WIRE_COMMITMENT_SIGNED',
                       '+WIRE_COMMITMENT_SIGNED']
    
        l1, l2 = node_factory.get_nodes(2,
                                        opts=[{'disconnect': disconnects,
                                               'may_reconnect': True,
                                               'dev-no-reconnect': None},
                                              {'may_reconnect': True,
                                               'dev-no-reconnect': None,
                                               'broken_log': 'dualopend daemon died before signed PSBT returned'}])
    
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        amount = 2**24
        chan_amount = 100000
        bitcoind.rpc.sendtoaddress(l1.rpc.newaddr()['bech32'], amount / 10**8 + 0.01)
        bitcoind.generate_block(1)
        # Wait for it to arrive.
        wait_for(lambda: len(l1.rpc.listfunds()['outputs']) > 0)
    
        res = l1.rpc.fundchannel(l2.info['id'], chan_amount)
        chan_id = res['channel_id']
        vins = bitcoind.rpc.decoderawtransaction(res['tx'])['vin']
        assert(only_one(vins))
        prev_utxos = ["{}:{}".format(vins[0]['txid'], vins[0]['vout'])]
    
        # Check that we're waiting for lockin
        l1.daemon.wait_for_log(' to DUALOPEND_AWAITING_LOCKIN')
    
        # rbf the lease with a higher amount
        rate = int(find_next_feerate(l1, l2)[:-5])
        # We 4x the feerate to beat the min-relay fee
        next_feerate = '{}perkw'.format(rate * 4)
    
        # Initiate an RBF
        startweight = 42 + 172  # base weight, funding output
        initpsbt = l1.rpc.utxopsbt(chan_amount, next_feerate, startweight,
                                   prev_utxos, reservedok=True,
                                   excess_as_change=True)
    
        # Run through TX_ADD wires
        for d in disconnects[1:-4]:
            l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
            with pytest.raises(RpcError):
                l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
            wait_for(lambda: l1.rpc.getpeer(l2.info['id'])['connected'] is False)
    
        # The first TX_COMPLETE breaks
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        bump = l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
        with pytest.raises(RpcError):
            update = l1.rpc.openchannel_update(chan_id, bump['psbt'])
        wait_for(lambda: l1.rpc.getpeer(l2.info['id'])['connected'] is False)
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        # l1 should remember, l2 has forgotten
        # l2 should send tx-abort, to reset
        l2.daemon.wait_for_log(r'tx-abort: Sent next_funding_txid .* doesn\'t match ours .*')
        l1.daemon.wait_for_log(r'Cleaned up incomplete inflight')
        # abort doesn't cause a disconnect
        assert l1.rpc.getpeer(l2.info['id'])['connected']
    
        # The next TX_COMPLETE break (both remember) + they break on the
        # COMMITMENT_SIGNED during the reconnect
        bump = l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
        with pytest.raises(RpcError):
            update = l1.rpc.openchannel_update(chan_id, bump['psbt'])
        wait_for(lambda: l1.rpc.getpeer(l2.info['id'])['connected'] is False)
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        l2.daemon.wait_for_logs([r'Got dualopend reestablish',
                                 r'No commitment, not sending our sigs'])
        l1.daemon.wait_for_logs([r'Got dualopend reestablish',
                                 r'No commitment, not sending our sigs',
                                 r'dev_disconnect: -WIRE_COMMITMENT_SIGNED',
                                 'peer_disconnect_done'])
        assert not l1.rpc.getpeer(l2.info['id'])['connected']
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
    
        # COMMITMENT_SIGNED disconnects *during* the reconnect
        # We can't bump because the last negotiation is in the wrong state
        with pytest.raises(RpcError, match=r'Funding sigs for this channel not secured'):
            l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
        # l2 reconnects, but doesn't have l1's commitment
>       l2.daemon.wait_for_logs([r'Got dualopend reestablish',
                                 r'No commitment, not sending our sigs',
                                 # This is a BROKEN log, it's expected!
                                 r'dualopend daemon died before signed PSBT returned'])

tests/test_opening.py:944: 
...
>                   raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E                   TimeoutError: Unable to find "[re.compile('dualopend daemon died before signed PSBT returned')]" in logs.
```
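
A sketch of a more forgiving ending for the test, under the assumption that the BROKEN line is simply optional: only the reestablish behaviour is required, and broken_log stays set in case the messy death does happen:

```
# Sketch: the BROKEN line is not guaranteed, so don't wait for it.
l2.daemon.wait_for_logs([r'Got dualopend reestablish',
                         r'No commitment, not sending our sigs'])
# broken_log='dualopend daemon died before signed PSBT returned' remains
# in l2's opts so the line is tolerated if it does appear.
```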

Signed-off-by: Rusty Russell <[email protected]>
If we get "lucky" then commit tx will have short sig, one less weight (1 in 256 chance):

```
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'elementsd anchors not supportd')
    def test_onchain_slow_anchor(node_factory, bitcoind):
        """We still use anchors for non-critical closes"""
        l1, l2 = node_factory.line_graph(2)
    
        # Don't let l1 succeed in sending commit tx
        def censoring_sendrawtx(r):
            return {'id': r['id'], 'result': {}}
    
        l1.daemon.rpcproxy.mock_rpc('sendrawtransaction', censoring_sendrawtx)
    
        close_start_depth = bitcoind.rpc.getblockchaininfo()['blocks']
    
        # Make l1 close unilaterally.
        l1.rpc.disconnect(l2.info['id'], force=True)
        l1.rpc.close(l2.info['id'], unilateraltimeout=1)
    
        # We will have a super-low-prio anchor spend.
        l1.daemon.wait_for_log(r"Low-priority anchorspend aiming for block {} \(feerate 253\)".format(close_start_depth + 2016))
    
        # Restart with reduced block time.
        l1.stop()
        l1.daemon.opts['dev-low-prio-anchor-blocks'] = 20
        l1.start()
    
        l1.daemon.wait_for_log("Low-priority anchorspend aiming for block {}".format(close_start_depth + 20))
        l1.daemon.wait_for_log("Anchorspend for local commit tx")
    
        # Won't go under 12 blocks though.
    
        # Make sure it sees all these blocks at once, to avoid test flakes!
        l1.stop()
        bitcoind.generate_block(7)
        l1.start()
    
        height = bitcoind.rpc.getblockchaininfo()['blocks']
        l1.daemon.wait_for_log(r"Low-priority anchorspend aiming for block {} \(feerate 7458\)".format(height + 13))
>       l1.daemon.wait_for_log(r"Anchorspend for local commit tx fee 12335sat \(w=714\), commit_tx fee 4545sat \(w=768\): package feerate 11390 perkw")
```

Here's the log we *did* get:

```
2025-01-25T08:46:40.9399213Z lightningd-1 2025-01-25T08:40:06.312Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Anchorspend for local commit tx fee 12328sat (w=714), commit_tx fee 4545sat (w=767): package feerate 11392 perkw
```
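
Whether the real fix pins the signature length or loosens the match, a regex tolerant of the one-weight-unit difference would look roughly like this:

```
# Sketch: accept either signature length (fee/weight/feerate shift slightly).
l1.daemon.wait_for_log(r"Anchorspend for local commit tx fee 123\d\dsat \(w=714\), "
                       r"commit_tx fee 4545sat \(w=76[78]\): package feerate 11\d\d\d perkw")
```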

Signed-off-by: Rusty Russell <[email protected]>
We have CI runs which time out (after 2 hours).  It's not clear why,
but we can at least eliminate CLN lockups as the answer.

Since pytest disables the --timeout option during test shutdown, we could be
seeing an issue where stopping takes a long time?
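
A crude illustration of the kind of watchdog that would turn a hung shutdown into a visible failure instead of a two-hour CI timeout; this is a sketch, not what the PR actually adds:

```
# Sketch: run node.stop() with a deadline so a hang surfaces as an error.
import concurrent.futures

def stop_with_deadline(node, deadline=60):
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        # Raises concurrent.futures.TimeoutError if stopping hangs.
        return ex.submit(node.stop).result(timeout=deadline)
    finally:
        ex.shutdown(wait=False)
```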

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell requested a review from cdecker as a code owner January 26, 2025 06:03
@rustyrussell rustyrussell merged commit 4e887c7 into ElementsProject:master Jan 27, 2025
33 of 39 checks passed