
More flake fixes! #8027

Merged
rustyrussell merged 16 commits into ElementsProject:master on Jan 27, 2025

Conversation

rustyrussell
Contributor

Changelog-None: just tweaking tests...

@rustyrussell rustyrussell added this to the v25.02 milestone Jan 23, 2025
```
    def test_no_reconnect_awating_unilateral(node_factory, bitcoind):
        l1, l2 = node_factory.line_graph(2, opts={'may_reconnect': True})
        l2.stop()
    
        # Close immediately.
        l1.rpc.close(l2.info['id'], 1)
    
        wait_for(lambda: only_one(l1.rpc.listpeerchannels(l2.info['id'])['channels'])['state'] == 'AWAITING_UNILATERAL')
    
        # After switching to AWAITING_UNILATERAL it will *not* try to reconnect.
        l1.daemon.wait_for_log("State changed from CHANNELD_SHUTTING_DOWN to AWAITING_UNILATERAL")
        time.sleep(10)
    
>       assert not l1.daemon.is_in_log('Will try reconnect', start=l1.daemon.logsearch_start)
E       AssertionError: assert not 'lightningd-1 2025-01-20T05:46:04.548Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Will try reconnect in 2 seconds'
E        +  where 'lightningd-1 2025-01-20T05:46:04.548Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Will try reconnect in 2 seconds' = <bound method TailableProc.is_in_log of <pyln.testing.utils.LightningD object at 0x7f68ab565c90>>('Will try reconnect', start=453)
E        +    where <bound method TailableProc.is_in_log of <pyln.testing.utils.LightningD object at 0x7f68ab565c90>> = <pyln.testing.utils.LightningD object at 0x7f68ab565c90>.is_in_log
E        +      where <pyln.testing.utils.LightningD object at 0x7f68ab565c90> = <fixtures.LightningNode object at 0x7f68ab567880>.daemon
E        +    and   453 = <pyln.testing.utils.LightningD object at 0x7f68ab565c90>.logsearch_start
E        +      where <pyln.testing.utils.LightningD object at 0x7f68ab565c90> = <fixtures.LightningNode object at 0x7f68ab567880>.daemon
```

In fact:

```
2025-01-20T06:15:27.2854309Z lightningd-2 2025-01-20T05:46:03.527Z DEBUG   lightningd: io_break: destroy_plugin
2025-01-20T06:15:27.2855089Z lightningd-2 2025-01-20T05:46:03.527Z DEBUG   lightningd: Command returned result after jcon close
2025-01-20T06:15:27.2855805Z lightningd-2 2025-01-20T05:46:03.528Z DEBUG   connectd: Shutting down
2025-01-20T06:15:27.2856576Z lightningd-2 2025-01-20T05:46:03.528Z DEBUG   gossipd: Shutting down
2025-01-20T06:15:27.2857159Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   connectd: drain_peer
2025-01-20T06:15:27.2857790Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   connectd: drain_peer draining subd!
2025-01-20T06:15:27.2858825Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: peer_disconnect_done
2025-01-20T06:15:27.2860481Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Will try reconnect in 1 seconds
2025-01-20T06:15:27.2861626Z lightningd-1 2025-01-20T05:46:03.529Z DEBUG   connectd: maybe_free_peer freeing peer!
2025-01-20T06:15:27.2862723Z lightningd-1 2025-01-20T05:46:03.530Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: Peer connection lost
2025-01-20T06:15:27.2864510Z lightningd-1 2025-01-20T05:46:03.530Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in CHANNELD_NORMAL: channeld: Owning subdaemon channeld died (62208)
2025-01-20T06:15:27.2866301Z lightningd-1 2025-01-20T05:46:03.530Z DEBUG   plugin-funder: Cleaning up inflights for peer id 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59
2025-01-20T06:15:27.2867313Z lightningd-2 2025-01-20T05:46:03.530Z DEBUG   hsmd: Shutting down
2025-01-20T06:15:27.2868029Z lightningd-1 2025-01-20T05:46:03.535Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-20T06:15:27.2868935Z lightningd-1 2025-01-20T05:46:03.535Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-20T06:15:27.2869830Z lightningd-1 2025-01-20T05:46:03.536Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-20T06:15:27.2870740Z lightningd-1 2025-01-20T05:46:03.536Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-20T06:15:27.2872276Z lightningd-1 2025-01-20T05:46:03.536Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: State changed from CHANNELD_NORMAL to CHANNELD_SHUTTING_DOWN
2025-01-20T06:15:27.2873873Z lightningd-1 2025-01-20T05:46:03.538Z DEBUG   lightningd: NOTIFY "-c:close#30" info peer is offline, will negotiate once they reconnect (1 seconds before unilateral close).
2025-01-20T06:15:27.2874947Z lightningd-1 2025-01-20T05:46:03.538Z DEBUG   lightningd: close_command: timeout = 1
2025-01-20T06:15:27.2878248Z lightningd-1 2025-01-20T05:46:03.541Z DEBUG   plugin-cln-grpc: Received a message: CustomNotification(Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_state_changed\"), \"params\": Object {\"channel_state_changed\": Object {\"cause\": String(\"user\"), \"channel_id\": String(\"9aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab\"), \"message\": String(\"User or plugin invoked close command\"), \"new_state\": String(\"CHANNELD_SHUTTING_DOWN\"), \"old_state\": String(\"CHANNELD_NORMAL\"), \"peer_id\": String(\"022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59\"), \"short_channel_id\": String(\"103x1x0\"), \"timestamp\": String(\"2025-01-20T05:46:03.537Z\")}}})
2025-01-20T06:15:27.2884122Z lightningd-1 2025-01-20T05:46:03.542Z DEBUG   plugin-cln-grpc: Dispatching custom notification Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_state_changed\"), \"params\": Object {\"channel_state_changed\": Object {\"cause\": String(\"user\"), \"channel_id\": String(\"9aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab\"), \"message\": String(\"User or plugin invoked close command\"), \"new_state\": String(\"CHANNELD_SHUTTING_DOWN\"), \"old_state\": String(\"CHANNELD_NORMAL\"), \"peer_id\": String(\"022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59\"), \"short_channel_id\": String(\"103x1x0\"), \"timestamp\": String(\"2025-01-20T05:46:03.537Z\")}}}
2025-01-20T06:15:27.2888242Z lightningd-1 2025-01-20T05:46:03.542Z DEBUG   plugin-cln-grpc: Failed to parse notification from lightningd Error(\"unknown variant `channel_state_changed`, expected one of `block_added`, `channel_open_failed`, `channel_opened`, `connect`, `custommsg`\", line: 0, column: 0)
2025-01-20T06:15:27.2889970Z lightningd-1 2025-01-20T05:46:04.350Z DEBUG   gossipd: seeker: no peers, waiting
2025-01-20T06:15:27.2890762Z lightningd-1 2025-01-20T05:46:04.538Z DEBUG   lightningd: NOTIFY "-c:close#30" info Timed out, forcing close.
2025-01-20T06:15:27.2892345Z lightningd-1 2025-01-20T05:46:04.539Z UNUSUAL 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer permanent failure in CHANNELD_SHUTTING_DOWN: Forcibly closed by `close` command timeout (reason=user)
2025-01-20T06:15:27.2894333Z lightningd-1 2025-01-20T05:46:04.539Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: State changed from CHANNELD_SHUTTING_DOWN to AWAITING_UNILATERAL
2025-01-20T06:15:27.2895943Z lightningd-1 2025-01-20T05:46:04.543Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: We have 1 anchor points to use
2025-01-20T06:15:27.2897412Z lightningd-1 2025-01-20T05:46:04.544Z DEBUG   lightningd: Broadcasting txid c96f6e4409140474fdd5d8b4862c3d74bd6fa24dadd178182e78fdc4cbf68149 for "-c:close#30"
2025-01-20T06:15:27.2906272Z lightningd-1 2025-01-20T05:46:04.544Z DEBUG   lightningd: sendrawtransaction: 020000000001019aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab00000000009db0e280024a01000000000000220020be7935a77ca9ab70a4b8b1906825637767fed3c00824aa90c988983587d68488352f0f00000000002200209f4684ddb28acdc73959bc194d1a25df906f61ed030f52d163e6f1e247d32cbb0400473044022061656b7587a96723e3942f7445313d1c4315e8b2a17fade62d9a48d8a69e2806022024ac01825c79d3bcee6f5dc19586af6b3f8415156cbda934d0e2cbad9268a6a20147304402201fb54f5cb72da9dfd1399b8df6bf4f1950313b057fc0f5b30b2b09aec2783c9e022052dbd5e30bdc7015c0eb939ff3ad58b83c2d7369bb546571603ccfa62e65c4e00147522102324266de8403b3ab157a09f1f784d587af61831c998c151bcc21bb74c2b2314b2102e3bd38009866c9da8ec4aa99cc4ea9c6c0dd46df15c61ef0ce1f271291714e5752ae9a3ed620
2025-01-20T06:15:27.2914366Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   hsmd: Client: Received message 5 from client
2025-01-20T06:15:27.2915958Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Failed connected out: All addresses failed: 127.0.0.1:42595: Connection establishment: Connection refused. 
2025-01-20T06:15:27.2917891Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Will try reconnect in 2 seconds
2025-01-20T06:15:27.2921924Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   plugin-cln-grpc: Received a message: CustomNotification(Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_state_changed\"), \"params\": Object {\"channel_state_changed\": Object {\"cause\": String(\"user\"), \"channel_id\": String(\"9aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab\"), \"message\": String(\"Forcibly closed by `close` command timeout\"), \"new_state\": String(\"AWAITING_UNILATERAL\"), \"old_state\": String(\"CHANNELD_SHUTTING_DOWN\"), \"peer_id\": String(\"022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59\"), \"short_channel_id\": String(\"103x1x0\"), \"timestamp\": String(\"2025-01-20T05:46:04.540Z\")}}})
2025-01-20T06:15:27.2928311Z lightningd-1 2025-01-20T05:46:04.548Z DEBUG   plugin-cln-grpc: Dispatching custom notification Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_state_changed\"), \"params\": Object {\"channel_state_changed\": Object {\"cause\": String(\"user\"), \"channel_id\": String(\"9aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab\"), \"message\": String(\"Forcibly closed by `close` command timeout\"), \"new_state\": String(\"AWAITING_UNILATERAL\"), \"old_state\": String(\"CHANNELD_SHUTTING_DOWN\"), \"peer_id\": String(\"022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59\"), \"short_channel_id\": String(\"103x1x0\"), \"timestamp\": String(\"2025-01-20T05:46:04.540Z\")}}}
2025-01-20T06:15:27.2932905Z lightningd-1 2025-01-20T05:46:04.549Z DEBUG   plugin-cln-grpc: Failed to parse notification from lightningd Error(\"unknown variant `channel_state_changed`, expected one of `block_added`, `channel_open_failed`, `channel_opened`, `connect`, `custommsg`\", line: 0, column: 0)
2025-01-20T06:15:27.2934604Z lightningd-1 2025-01-20T05:46:04.549Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-20T06:15:27.2935599Z lightningd-1 2025-01-20T05:46:04.551Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-20T06:15:27.2936589Z lightningd-1 2025-01-20T05:46:04.553Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-20T06:15:27.2937546Z lightningd-1 2025-01-20T05:46:04.554Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-20T06:15:27.2946767Z lightningd-1 2025-01-20T05:46:04.581Z DEBUG   plugin-bcli: sendrawtx exit 0 (bitcoin-cli -regtest -datadir=/tmp/ltests-qdo6i210/test_no_reconnect_awating_unilateral_1/lightning-1/ -rpcclienttimeout=60 -rpcport=57415 -rpcuser=... -stdinrpcpass sendrawtransaction 020000000001019aeb0e78b23e67f13670c602d5a40a7c8482a87220d93522cc0f7c6c9da9daab00000000009db0e280024a01000000000000220020be7935a77ca9ab70a4b8b1906825637767fed3c00824aa90c988983587d68488352f0f00000000002200209f4684ddb28acdc73959bc194d1a25df906f61ed030f52d163e6f1e247d32cbb0400473044022061656b7587a96723e3942f7445313d1c4315e8b2a17fade62d9a48d8a69e2806022024ac01825c79d3bcee6f5dc19586af6b3f8415156cbda934d0e2cbad9268a6a20147304402201fb54f5cb72da9dfd1399b8df6bf4f1950313b057fc0f5b30b2b09aec2783c9e022052dbd5e30bdc7015c0eb939ff3ad58b83c2d7369bb546571603ccfa62e65c4e00147522102324266de8403b3ab157a09f1f784d587af61831c998c151bcc21bb74c2b2314b2102e3bd38009866c9da8ec4aa99cc4ea9c6c0dd46df15c61ef0ce1f271291714e5752ae9a3ed620) 
2025-01-20T06:15:27.2956874Z lightningd-1 2025-01-20T05:46:04.581Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Low-priority anchorspend aiming for block 2119 (feerate 253)
2025-01-20T06:15:27.2958216Z lightningd-1 2025-01-20T05:46:04.583Z DEBUG   hsmd: Client: Received message 28 from client
2025-01-20T06:15:27.2958970Z lightningd-1 2025-01-20T05:46:09.354Z DEBUG   gossipd: seeker: no peers, waiting
2025-01-20T06:15:27.2959764Z lightningd-1 2025-01-20T05:46:09.568Z DEBUG   lightningd: channel_gossip: no longer in startup mode
2025-01-20T06:15:27.2960572Z lightningd-1 2025-01-20T05:46:14.358Z DEBUG   gossipd: seeker: no peers, waiting
2025-01-20T06:15:27.2962896Z {'github_repository': 'ElementsProject/lightning', 'github_sha': '8c945c6075752c2488c211f912e9b97d1d019fc5', 'github_ref': 'refs/pull/7886/merge', 'github_ref_name': 'HEAD', 'github_run_id': 12859491513, 'github_head_ref': 'guilt/test-htable', 'github_run_number': 11999, 'github_base_ref': 'master', 'github_run_attempt': '3', 'testname': 'test_no_reconnect_awating_unilateral', 'start_time': 1737351957, 'end_time': 1737351974, 'outcome': 'fail'}
2025-01-20T06:15:27.2965150Z --------------------------- Captured stdout teardown ---------------------------
```

We need to check that connectd got the "peer_downgrade" message: before that it *might* still try to reconnect.
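
A sketch of the synchronization the fix needs, in test terms; the log line used below to detect that connectd has processed the downgrade is purely illustrative, not the real message:

```
# Sketch only.  Start the "no reconnect" window only after connectd has
# processed the downgrade, so a reconnect scheduled *before* it cannot
# land inside the window.  The second log text is a stand-in.
l1.daemon.wait_for_log("State changed from CHANNELD_SHUTTING_DOWN to AWAITING_UNILATERAL")
l1.daemon.wait_for_log("connectd: .*downgrade")   # hypothetical connectd marker
needle = l1.daemon.logsearch_start
time.sleep(10)
assert not l1.daemon.is_in_log('Will try reconnect', start=needle)
```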

Signed-off-by: Rusty Russell <[email protected]>
I'd been resisting this, but some tests really do want "hang up after
*receiving* this", and it's more reliable than getting the peer to
"hang up after *sending* this", which doesn't always seem to work.

Signed-off-by: Rusty Russell <[email protected]>
Sometimes, l1 wouldn't receive the msg before l2 hung up, causing
chaos.  What we *actually* want here is to make sure that l1 receives
the msg before closing:

```
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
>       l1.rpc.fundchannel(l2.info['id'], CHANNEL_SIZE)

tests/test_connection.py:667: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-client/pyln/client/lightning.py:767: in fundchannel
    return self.call("fundchannel", payload)
contrib/pyln-testing/pyln/testing/utils.py:740: in call
    res = LightningRpc.call(self, method, payload, cmdprefix, filter)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyln.testing.utils.PrettyPrintingLightningRpc object at 0x7f22112f5450>
method = 'fundchannel'
payload = {'amount': 50000, 'announce': True, 'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59'}
cmdprefix = None, filter = None

    def call(self, method, payload=None, cmdprefix=None, filter=None):
        """Generic call API: you can set cmdprefix here, or set self.cmdprefix
        before the call is made.
    
        """
        self.logger.debug("Calling %s with payload %r", method, payload)
    
        if payload is None:
            payload = {}
        # Filter out arguments that are None
        if isinstance(payload, dict):
            payload = {k: v for k, v in payload.items() if v is not None}
    
        this_id = self.get_json_id(method, cmdprefix)
        self.next_id += 1
    
        # FIXME: we open a new socket for every readobj call...
        sock = UnixSocket(self.socket_path)
    
        buf = b''
    
        if self._notify is not None:
            # Opt into the notifications support
            self._writeobj(sock, {
                "jsonrpc": "2.0",
                "method": "notifications",
                "id": this_id + "+notify-enable",
                "params": {
                    "enable": True
                },
            })
            # FIXME: Notification schema support?
            _, buf = self._readobj(sock, buf)
    
        request = {
            "jsonrpc": "2.0",
            "method": method,
            "params": payload,
            "id": this_id,
        }
    
        if filter is None:
            filter = self._filter
        if filter is not None:
            request["filter"] = filter
    
        self._writeobj(sock, request)
        while True:
            resp, buf = self._readobj(sock, buf)
            id = resp.get("id", None)
            meth = resp.get("method", None)
    
            if meth == 'message' and self._notify is not None:
                n = resp['params']
                self._notify(
                    message=n.get('message', None),
                    progress=n.get('progress', None),
                    request=request
                )
                continue
    
            if meth is None or id is None:
                break
    
        self.logger.debug("Received response for %s call: %r", method, resp)
        if 'id' in resp and resp['id'] != this_id:
            raise ValueError("Malformed response, id is not {}: {}.".format(this_id, resp))
        sock.close()
    
        if not isinstance(resp, dict):
            raise TypeError("Malformed response, response is not a dictionary %s." % resp)
        elif "error" in resp:
>           raise RpcError(method, payload, resp['error'])
E           pyln.client.lightning.RpcError: RPC call failed: method: fundchannel, payload: {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'amount': 50000, 'announce': True}, error: {'code': -1, 'message': 'Disconnected', 'data': {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'method': 'openchannel_update'}}
```
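
A sketch of the ordering being described, using the usual pyln-testing log helpers; the wire message name is only an example:

```
# Sketch: make sure l1 has actually read the message before the other
# side is allowed to hang up.  WIRE_FUNDING_SIGNED is just an example.
l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
l1.daemon.wait_for_log('peer_in WIRE_FUNDING_SIGNED')
l2.stop()   # the hang-up can no longer race with l1 receiving the message
```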

Signed-off-by: Rusty Russell <[email protected]>
```
____________________ ERROR at teardown of test_xpay_maxfee _____________________
...
            # Format a nice list of everything that went wrong and raise an exception
            request.node.has_errors = True
>           raise ValueError(str(errors))
E           ValueError: 
E           Node errors:
E            - lightningd-1: Node exited with return code 1
E           Global errors:
```

And:

```
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'too dusty on elements')
    def test_xpay_maxfee(node_factory, bitcoind, chainparams):
        """Test which shows that we don't excees maxfee"""
        outfile = tempfile.NamedTemporaryFile(prefix='gossip-store-')
        subprocess.check_output(['devtools/gossmap-compress',
                                 'decompress',
                                 '--node-map=3301=022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59',
                                 'tests/data/gossip-store-2024-09-22.compressed',
                                 outfile.name]).decode('utf-8').splitlines()
        AMOUNT = 100_000_000
    
        # l2 will warn l1 about its invalid gossip: ignore.
        # We throttle l1's gossip to avoid massive log spam.
>       l1, l2 = node_factory.line_graph(2,
                                         # This is in sats, so 1000x amount we send.
                                         fundamount=AMOUNT,
                                         opts=[{'gossip_store_file': outfile.name,
                                                'subdaemon': 'channeld:../tests/plugins/channeld_fakenet',
                                                'allow_warning': True,
                                                'dev-throttle-gossip': None},
                                               {'allow_bad_gossip': True}])

tests/test_xpay.py:509: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
contrib/pyln-testing/pyln/testing/utils.py:1720: in line_graph
    nodes = self.get_nodes(num_nodes, opts=opts)
contrib/pyln-testing/pyln/testing/utils.py:1602: in get_nodes
    return [j.result() for j in jobs]
contrib/pyln-testing/pyln/testing/utils.py:1602: in <listcomp>
    return [j.result() for j in jobs]
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/concurrent/futures/_base.py:458: in result
    return self.__get_result()
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/concurrent/futures/_base.py:403: in __get_result
    raise self._exception
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/concurrent/futures/thread.py:58: in run
    result = self.fn(*self.args, **self.kwargs)
contrib/pyln-testing/pyln/testing/utils.py:1653: in get_node
    node.start(wait_for_bitcoind_sync)
contrib/pyln-testing/pyln/testing/utils.py:1015: in start
    self.daemon.start(stderr_redir=stderr_redir)
contrib/pyln-testing/pyln/testing/utils.py:671: in start
    self.wait_for_log("Server started with public key")
contrib/pyln-testing/pyln/testing/utils.py:355: in wait_for_log
    return self.wait_for_logs([regex], timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyln.testing.utils.LightningD object at 0x7f27ab586c20>
regexs = ['Server started with public key'], timeout = 180

    def wait_for_logs(self, regexs, timeout=TIMEOUT):
        """Look for `regexs` in the logs.
    
        The logs contain tailed stdout of the process. We look for each regex
        in `regexs`, starting from `logsearch_start` which normally is the
        position of the last found entry of a previous wait-for logs call.
        The ordering inside `regexs` doesn't matter.
    
        We fail if the timeout is exceeded or if the underlying process
        exits before all the `regexs` were found.
    
        If timeout is None, no time-out is applied.
        """
        logging.debug("Waiting for {} in the logs".format(regexs))
        exs = [re.compile(r) for r in regexs]
        start_time = time.time()
        while True:
            if self.logsearch_start >= len(self.logs):
                if not self.logs_catchup():
                    time.sleep(0.25)
    
                if timeout is not None and time.time() > start_time + timeout:
                    print("Time-out: can't find {} in logs".format(exs))
                    for r in exs:
                        if self.is_in_log(r):
                            print("({} was previously in logs!)".format(r))
>                   raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E                   TimeoutError: Unable to find "[re.compile('Server started with public key')]" in logs.
```

gossipd (and the plugins) simply take too long to digest the gossmap under valgrind.
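
One plausible mitigation, sketched as if the test waited for the startup line itself, using the timeout= parameter that wait_for_log/wait_for_logs already take (visible in the traceback above); whether the real fix bumps a timeout or trims the gossmap is not shown here:

```
# Sketch: give valgrind runs extra startup headroom.
from pyln.testing.utils import TIMEOUT

l1.daemon.wait_for_log("Server started with public key", timeout=TIMEOUT * 3)
```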

Signed-off-by: Rusty Russell <[email protected]>
This test seems confused: l2 won't be able to reconnect to l1 (l1
connected to l2 in the first place, so it's the only one which can
reconnect).  Also, there's a commitment_signed early on when we have
dual funding, so this doesn't actually test the splicing disconnect in
that case?

The race seems to happen when l1 reconnects and l2 hasn't yet registered the disconnect.

```
________________________ test_splice_disconnect_commit _________________________
[gw9] linux -- Python 3.10.16 /home/runner/.cache/pypoetry/virtualenvs/cln-meta-project-AqJ9wMix-py3.10/bin/python

node_factory = <pyln.testing.utils.NodeFactory object at 0x7f53ebc1a950>
bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f53ebc1b760>
executor = <concurrent.futures.thread.ThreadPoolExecutor object at 0x7f53ebc1a170>

    @pytest.mark.openchannel('v1')
    @pytest.mark.openchannel('v2')
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'elementsd doesnt yet support PSBT features we need')
    def test_splice_disconnect_commit(node_factory, bitcoind, executor):
        l1 = node_factory.get_node(options={'experimental-splicing': None}, may_reconnect=True)
        l2 = node_factory.get_node(disconnect=['+WIRE_COMMITMENT_SIGNED'],
                                   options={'experimental-splicing': None, 'dev-no-reconnect': None},
                                   may_reconnect=True)
>       l1.openchannel(l2, 1000000)

tests/test_splicing_disconnect.py:77: 

...
        elif "error" in resp:
>           raise RpcError(method, payload, resp['error'])
E           pyln.client.lightning.RpcError: RPC call failed: method: fundchannel, payload: {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'amount': 1000000, 'announce': True}, error: {'code': -1, 'message': 'Disconnected', 'data': {'id': '022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59', 'method': 'openchannel_update'}}
```

From the logs:

```
2025-01-23T23:50:25.4098040Z lightningd-2 2025-01-23T23:47:12.443Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: dev_disconnect: +WIRE_COMMITMENT_SIGNED (WIRE_COMMITMENT_SIGNED)
...
2025-01-23T23:50:25.4107026Z lightningd-2 2025-01-23T23:47:12.444Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-dualopend-chan#1: peer_out WIRE_COMMITMENT_SIGNED
2025-01-23T23:50:25.4108070Z lightningd-2 2025-01-23T23:47:12.444Z INFO    0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-dualopend-chan#1: Peer connection lost
2025-01-23T23:50:25.4109375Z lightningd-2 2025-01-23T23:47:12.445Z INFO    0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: Peer transient failure in DUALOPEND_OPEN_COMMIT_READY: dualopend: Owning subdaemon dualopend died (62208)
...
2025-01-23T23:50:25.4111195Z lightningd-2 2025-01-23T23:47:12.445Z DEBUG   plugin-cln-grpc: Received a message: CustomNotification(Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_open_failed\"), \"params\": Object {\"channel_open_failed\": Object {\"channel_id\": String(\"252d1b0a1e57895e84137f28cf19ab2c35847e284c112fefdecc7afeaa5c1de7\")}}})
2025-01-23T23:50:25.4113039Z lightningd-2 2025-01-23T23:47:12.445Z DEBUG   plugin-cln-grpc: Dispatching custom notification Object {\"jsonrpc\": String(\"2.0\"), \"method\": String(\"channel_open_failed\"), \"params\": Object {\"channel_open_failed\": Object {\"channel_id\": String(\"252d1b0a1e57895e84137f28cf19ab2c35847e284c112fefdecc7afeaa5c1de7\")}}}
2025-01-23T23:50:25.4114525Z lightningd-2 2025-01-23T23:47:12.446Z DEBUG   plugin-funder: Cleaning up inflights for peer id 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518
2025-01-23T23:50:25.4115550Z lightningd-2 2025-01-23T23:47:12.446Z DEBUG   plugin-funder: Cleaning up inflight for channel_id 252d1b0a1e57895e84137f28cf19ab2c35847e284c112fefdecc7afeaa5c1de7
2025-01-23T23:50:25.4116406Z lightningd-2 2025-01-23T23:47:12.446Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-23T23:50:25.4117134Z lightningd-2 2025-01-23T23:47:12.447Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-23T23:50:25.4117728Z lightningd-1 2025-01-23T23:47:12.448Z DEBUG   connectd: drain_peer
2025-01-23T23:50:25.4118229Z lightningd-1 2025-01-23T23:47:12.448Z DEBUG   connectd: drain_peer draining subd!
2025-01-23T23:50:25.4119066Z lightningd-1 2025-01-23T23:47:12.448Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-lightningd: peer_disconnect_done
2025-01-23T23:50:25.4119792Z lightningd-1 2025-01-23T23:47:12.449Z DEBUG   connectd: maybe_free_peer freeing peer!
...
2025-01-23T23:50:25.4135647Z lightningd-2 2025-01-23T23:47:12.455Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Connect IN
2025-01-23T23:50:25.4136554Z lightningd-1 2025-01-23T23:47:12.455Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Connected out, starting crypto
2025-01-23T23:50:25.4137502Z lightningd-1 2025-01-23T23:47:12.456Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: Connect OUT
2025-01-23T23:50:25.4138483Z lightningd-1 2025-01-23T23:47:12.456Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: peer_out WIRE_INIT
2025-01-23T23:50:25.4139407Z lightningd-1 2025-01-23T23:47:12.456Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: peer_in WIRE_INIT
2025-01-23T23:50:25.4140714Z lightningd-1 2025-01-23T23:47:12.456Z INFO    022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Peer transient failure in DUALOPEND_OPEN_COMMIT_READY: Disconnected
```
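
One way to remove that race, sketched under the assumption that l1 is also given dev-no-reconnect so the test controls when reconnection happens:

```
# Sketch: with 'dev-no-reconnect' on l1 too, reconnect only once l2 has
# registered the disconnect.
l2.daemon.wait_for_log('Peer connection lost')
l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
```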

Signed-off-by: Rusty Russell <[email protected]>
Sometimes they connect too fast, so we don't get a chance to ratelimit all of them:

```
    def test_connect_ratelimit(node_factory, bitcoind):
        """l1 has 5 peers, restarts, make sure we limit"""
        nodes = node_factory.get_nodes(6,
                                       opts=[{'dev-limit-connections-inflight': None, 'may_reconnect': True}] + [{'may_reconnect': True}] * 5)
    
        l1 = nodes[0]
        nodes = nodes[1:]
    
        addr = l1.rpc.newaddr()['bech32']
        for n in nodes:
            bitcoind.rpc.sendtoaddress(addr, (FUNDAMOUNT + 1000000) / 10**8)
        bitcoind.generate_block(1, wait_for_mempool=len(nodes))
        sync_blockheight(bitcoind, [l1])
    
        for n in nodes:
            l1.rpc.connect(n.info['id'], 'localhost', n.port)
            l1.rpc.fundchannel(n.info['id'], FUNDAMOUNT)
    
        # Make sure all channels are established and announced.
        bitcoind.generate_block(6, wait_for_mempool=len(nodes))
        wait_for(lambda: len(l1.rpc.listchannels()['channels']) == len(nodes) * 2)
    
        assert not l1.daemon.is_in_log('Unblocking for')
    
        l1.restart()
    
        # The first will be ok, but others should block and be unblocked.
>       l1.daemon.wait_for_logs((['Unblocking for ']
                                 + ['Too many connections, waiting'])
                                * (len(nodes) - 1))

tests/test_connection.py:4721: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyln.testing.utils.LightningD object at 0x7f6e288a3a60>
regexs = ['Unblocking for ', 'Too many connections, waiting', 'Unblocking for ', 'Too many connections, waiting', 'Unblocking for ', 'Too many connections, waiting', ...]
timeout = 180

    def wait_for_logs(self, regexs, timeout=TIMEOUT):
        """Look for `regexs` in the logs.
    
        The logs contain tailed stdout of the process. We look for each regex
        in `regexs`, starting from `logsearch_start` which normally is the
        position of the last found entry of a previous wait-for logs call.
        The ordering inside `regexs` doesn't matter.
    
        We fail if the timeout is exceeded or if the underlying process
        exits before all the `regexs` were found.
    
        If timeout is None, no time-out is applied.
        """
        logging.debug("Waiting for {} in the logs".format(regexs))
        exs = [re.compile(r) for r in regexs]
        start_time = time.time()
        while True:
            if self.logsearch_start >= len(self.logs):
                if not self.logs_catchup():
                    time.sleep(0.25)
    
                if timeout is not None and time.time() > start_time + timeout:
                    print("Time-out: can't find {} in logs".format(exs))
                    for r in exs:
                        if self.is_in_log(r):
                            print("({} was previously in logs!)".format(r))
>                   raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E                   TimeoutError: Unable to find "[re.compile('Unblocking for '), re.compile('Too many connections, waiting')]" in logs.
```
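
A sketch of a more tolerant check, reusing `nodes` from the test above and assuming only the listpeers RPC and the daemon.logs list used elsewhere in the suite: require that whatever was throttled was also released, rather than a fixed count of log lines:

```
# Sketch: peers that reconnect before the limiter engages are fine; we
# only insist that every blocked peer was eventually unblocked.
from pyln.testing.utils import wait_for

l1.restart()
wait_for(lambda: sum(p['connected'] for p in l1.rpc.listpeers()['peers']) == len(nodes))
blocked = [l for l in l1.daemon.logs if 'Too many connections, waiting' in l]
unblocked = [l for l in l1.daemon.logs if 'Unblocking for ' in l]
assert len(unblocked) >= len(blocked)
```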

Signed-off-by: Rusty Russell <[email protected]>
We can be a bit early in our assertion:

```
    @unittest.skipIf(os.getenv('TEST_DB_PROVIDER', 'sqlite3') != 'sqlite3', "sqlite3-specific DB manip")
    def test_reconnect_remote_sends_no_sigs(node_factory):
        """We re-announce, even when remote node doesn't send its announcement_signatures on reconnect.
        """
        l1, l2 = node_factory.line_graph(2, wait_for_announce=True, opts={'may_reconnect': True,
                                                                          'dev-no-reconnect': None})
    
        # Wipe l2's gossip_store
        l2.stop()
        gs_path = os.path.join(l2.daemon.lightning_dir, TEST_NETWORK, 'gossip_store')
        os.unlink(gs_path)
        l2.start()
    
        # l2 will now uses (REMOTE's) announcement_signatures it has stored
        wait_for(lambda: l2.rpc.listchannels()['channels'] != [])
    
        # Remove remote signatures from l1 so it asks for them (and delete gossip store)
        l1.db_manip("UPDATE channels SET remote_ann_node_sig=NULL, remote_ann_bitcoin_sig=NULL")
        gs_path = os.path.join(l1.daemon.lightning_dir, TEST_NETWORK, 'gossip_store')
        os.unlink(gs_path)
        l1.restart()
    
        l1.connect(l2)
        l1needle = l1.daemon.logsearch_start
        l2needle = l2.daemon.logsearch_start
    
        # l1 asks once, l2 replies once.
        # Make sure we get all the msgs!
        time.sleep(5)
    
        l1.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES')
        l2.daemon.wait_for_log('peer_out WIRE_ANNOUNCEMENT_SIGNATURES')
    
        l1msgs = [l.split()[4] for l in l1.daemon.logs[l1needle:] if 'WIRE_ANNOUNCEMENT_SIGNATURES' in l]
>       assert l1msgs == ['peer_out', 'peer_in']
E       AssertionError: assert ['peer_out'] == ['peer_out', 'peer_in']
E         Right contains one more item: 'peer_in'
E         Full diff:
E         - ['peer_out', 'peer_in']
E         + ['peer_out']
```

```
lightningd-2 2025-01-24T05:53:22.862Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: peer_out WIRE_ANNOUNCEMENT_SIGNATURES
lightningd-1 2025-01-24T05:53:22.864Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#1: peer_in WIRE_ANNOUNCEMENT_SIGNATURES
lightningd-1 2025-01-24T05:53:22.885Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: channel_gossip: received announcement sigs for 103x1x0 (we have 103x1x0)
{'github_repository': 'ElementsProject/lightning', 'github_sha': 'e9d36f2b8ecd45882753cbe062c355e40bc7109c', 'github_ref': 'refs/pull/8027/merge', 'github_ref_name': 'HEAD', 'github_run_id': 12943530601, 'github_head_ref':
```
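
A sketch of the less racy form, reusing l1needle from the test above: poll until the expected sequence shows up rather than asserting once after a fixed sleep:

```
# Sketch: poll for the full peer_out/peer_in sequence instead of a
# single post-sleep assertion.
from pyln.testing.utils import wait_for

def l1_sig_msgs():
    return [l.split()[4] for l in l1.daemon.logs[l1needle:]
            if 'WIRE_ANNOUNCEMENT_SIGNATURES' in l]

wait_for(lambda: l1_sig_msgs() == ['peer_out', 'peer_in'])
```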

Signed-off-by: Rusty Russell <[email protected]>
This caused a "flake" in testing, because it's wrong:

```
_____________________________ test_gossip_pruning ______________________________
[gw2] linux -- Python 3.10.16 /home/runner/.cache/pypoetry/virtualenvs/cln-meta-project-AqJ9wMix-py3.10/bin/python

node_factory = <pyln.testing.utils.NodeFactory object at 0x7f0267530490>
bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f0267532b30>

    def test_gossip_pruning(node_factory, bitcoind):
        """ Create channel and see it being updated in time before pruning
        """
        l1, l2, l3 = node_factory.get_nodes(3, opts={'dev-fast-gossip-prune': None,
                                                     'allow_bad_gossip': True,
                                                     'autoconnect-seeker-peers': 0})
    
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        l2.rpc.connect(l3.info['id'], 'localhost', l3.port)
    
        scid1, _ = l1.fundchannel(l2, 10**6)
        scid2, _ = l2.fundchannel(l3, 10**6)
    
        mine_funding_to_announce(bitcoind, [l1, l2, l3])
        wait_for(lambda: l1.rpc.listchannels(source=l1.info['id'])['channels'] != [])
        l1_initial_cupdate_timestamp = only_one(l1.rpc.listchannels(source=l1.info['id'])['channels'])['last_update']
    
        # Get timestamps of initial updates, so we can ensure they change.
        # Channels should be activated locally
>       wait_for(lambda: [c['active'] for c in l1.rpc.listchannels()['channels']] == [True] * 4)
```

Here you can see it has pruned:

```
lightningd-1 2025-01-24T07:39:40.873Z DEBUG   gossipd: Pruning channel 105x1x0 from network view (ages 1737704380 and 0)
...
lightningd-1 2025-01-24T07:39:50.941Z UNUSUAL lightningd: Bad gossip order: could not find channel 105x1x0 for peer's channel update
```
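
A rough Python rendering of the rule the fix moves toward (the real logic lives in gossipd, in C, so this is only an illustration); here a timestamp of 0 means we have never seen that side's update:

```
# Illustration: a side we have never heard an update from should not
# force the whole channel out; prune only when every update we *have*
# seen is older than the cutoff.
def should_prune(update_timestamps, now, prune_cutoff):
    seen = [t for t in update_timestamps if t != 0]   # 0 == never seen
    return bool(seen) and all(now - t > prune_cutoff for t in seen)
```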

Changelog-Fixed: Protocol: we were overzealous in pruning channels if we hadn't seen one side's gossip update yet.
Signed-off-by: Rusty Russell <[email protected]>
So not every message type counts toward the message limit: this is useful when we want to get
a specific number of messages of a specific type.

Signed-off-by: Rusty Russell <[email protected]>
By having gossipwith filter out messages we don't want, we can get the counts of
expected messages correct, and not hit errors like this:

```
    def test_gossip_throttle(node_factory, bitcoind, chainparams):
        """Make some gossip, test it gets throttled"""
        l1, l2, l3, l4 = node_factory.line_graph(4, wait_for_announce=True,
                                                 opts=[{}, {}, {}, {'dev-throttle-gossip': None}])
    
        # We expect: self-advertizement (3 messages for l1 and l4) plus
        # 4 node announcements, 3 channel announcements and 6 channel updates.
        # We also expect it to send a timestamp filter message.
        # (We won't take long enough to get a ping!)
        expected = 4 + 4 + 3 + 6 + 1
    
        # l1 is unlimited
        start_fast = time.time()
        out1 = subprocess.run(['devtools/gossipwith',
                               '--all-gossip',
                               '--hex',
                               '--network={}'.format(TEST_NETWORK),
                               '--max-messages={}'.format(expected),
                               '{}@localhost:{}'.format(l1.info['id'], l1.port)],
                              check=True,
                              timeout=TIMEOUT, stdout=subprocess.PIPE).stdout.split()
        time_fast = time.time() - start_fast
        assert time_fast < 2
        # Remove timestamp filter, since timestamp will change!
        out1 = [m for m in out1 if not m.startswith(b'0109')]
    
        # l4 is throttled
        start_slow = time.time()
        out2 = subprocess.run(['devtools/gossipwith',
                               '--all-gossip',
                               '--hex',
                               '--network={}'.format(TEST_NETWORK),
                               '--max-messages={}'.format(expected),
                               '{}@localhost:{}'.format(l4.info['id'], l4.port)],
                              check=True,
                              timeout=TIMEOUT, stdout=subprocess.PIPE).stdout.split()
        time_slow = time.time() - start_slow
        assert time_slow > 3
    
        # Remove timestamp filter, since timestamp will change!
        out2 = [m for m in out2 if not m.startswith(b'0109')]
    
        # Contents should be identical (once uniquified, since each
        # doubles-up on its own gossip)
>       assert set(out1) == set(out2)
E       AssertionError: assert {b'010054b1907bdf639c9060e0fa4bca02419c46f75a99f0908b87a2e09711d5d031ba76b8fd07acc8be1b2fac9e31efb808e5d362c32ef4665...
E         Extra items in the left set:
E         b'01010ad5be8b9ba029245c2ae2d667af7ead7c0129c479c7fd7145a9b65931e90222082e6e4ab37ef60ebd10f1493d73e8bf7a40c4ae5f7d87cc...8488830b60f7e744ed9235eb0b1ba93283b315c035180266e44a554e494f524245414d2d333930353033622d6d6f64646564000000000000000000'
E         Extra items in the right set:
E         b'01079f87eb580b9e5f11dc211e9fb66abb3699999044f8fe146801162393364286c6000000010000006c010101'
```
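
The actual change is to gossipwith (a C devtool), but the same effect done test-side looks roughly like this: keep only the gossip types being counted before comparing the two sets:

```
# Sketch: keep only channel_announcement (0100), node_announcement (0101)
# and channel_update (0102) messages before comparing.
WANTED = (b'0100', b'0101', b'0102')
out1 = [m for m in out1 if m.startswith(WANTED)]
out2 = [m for m in out2 if m.startswith(WANTED)]
assert set(out1) == set(out2)
```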

Signed-off-by: Rusty Russell <[email protected]>
We can ask for the state too fast, before WIRE_ERROR is processed:

```
2025-01-24T14:17:57.4799255Z ________________________ test_onchain_reestablish_reply ________________________
2025-01-24T14:17:57.4800383Z [gw2] linux -- Python 3.10.16 /home/runner/.cache/pypoetry/virtualenvs/cln-meta-project-AqJ9wMix-py3.10/bin/python
2025-01-24T14:17:57.4801246Z 
2025-01-24T14:17:57.4802068Z node_factory = <pyln.testing.utils.NodeFactory object at 0x7f66ec144400>
2025-01-24T14:17:57.4802645Z bitcoind = <pyln.testing.utils.BitcoinD object at 0x7f66ec145ab0>
2025-01-24T14:17:57.4803483Z executor = <concurrent.futures.thread.ThreadPoolExecutor object at 0x7f66ec146da0>
2025-01-24T14:17:57.4804029Z 
2025-01-24T14:17:57.4804340Z     def test_onchain_reestablish_reply(node_factory, bitcoind, executor):
2025-01-24T14:17:57.4805013Z         l1, l2, l3 = node_factory.line_graph(3, opts={'may_reconnect': True,
2025-01-24T14:17:57.4805572Z                                                       'dev-no-reconnect': None})
2025-01-24T14:17:57.4805990Z     
2025-01-24T14:17:57.4806262Z         # Make l1 close unilaterally.
2025-01-24T14:17:57.4806697Z         l1.rpc.disconnect(l2.info['id'], force=True)
2025-01-24T14:17:57.4807200Z         l1.rpc.close(l2.info['id'], unilateraltimeout=1)
2025-01-24T14:17:57.4807622Z     
2025-01-24T14:17:57.4808025Z         # l2 doesn't know, reconnection should get REESTABLISH *then* error.
2025-01-24T14:17:57.4808637Z         l2.rpc.connect(l1.info['id'], 'localhost', l1.port)
2025-01-24T14:17:57.4809066Z     
2025-01-24T14:17:57.4809334Z         # We should exchange messages
2025-01-24T14:17:57.4809835Z         l2.daemon.wait_for_logs(["peer_out WIRE_CHANNEL_REESTABLISH",
2025-01-24T14:17:57.4810399Z                                  "peer_in WIRE_CHANNEL_REESTABLISH"])
2025-01-24T14:17:57.4811197Z         # It should be OK
2025-01-24T14:17:57.4811642Z         l2.daemon.wait_for_log("Reconnected, and reestablished.")
2025-01-24T14:17:57.4812116Z     
2025-01-24T14:17:57.4812378Z         # Then we get the error, close.
2025-01-24T14:17:57.4812824Z         l2.daemon.wait_for_log("peer_in WIRE_ERROR")
2025-01-24T14:17:57.4813576Z >       assert only_one(l2.rpc.listpeerchannels(l1.info['id'])['channels'])['state'] == 'AWAITING_UNILATERAL'
2025-01-24T14:17:57.4814446Z E       AssertionError: assert 'CHANNELD_NORMAL' == 'AWAITING_UNILATERAL'
2025-01-24T14:17:57.4815014Z E         - AWAITING_UNILATERAL
2025-01-24T14:17:57.4815362Z E         + CHANNELD_NORMAL
```

```
2025-01-24T14:17:57.5760435Z lightningd-2 2025-01-24T14:04:28.398Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: peer_in WIRE_ERROR
2025-01-24T14:17:57.5760674Z lightningd-2 2025-01-24T14:04:28.401Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-24T14:17:57.5760935Z lightningd-2 2025-01-24T14:04:28.403Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-24T14:17:57.5761176Z lightningd-2 2025-01-24T14:04:28.408Z TRACE   lightningd: Calling rpc_command hook of plugin cln-xpay
2025-01-24T14:17:57.5761430Z lightningd-2 2025-01-24T14:04:28.414Z TRACE   lightningd: Plugin cln-xpay returned from rpc_command hook call
2025-01-24T14:17:57.5762598Z {'github_repository': 'ElementsProject/lightning', 'github_sha': '390503b31e1949cb84e5a5257bf29a1b80d5cd20', 'github_ref': 'refs/pull/8027/merge', 'github_ref_name': 'HEAD', 'github_run_id': 12950414032, 'github_head_ref': 'guilt/fix-flakes12', 'github_run_number': 12051, 'github_base_ref': 'master', 'github_run_attempt': '1', 'testname': 'test_onchain_reestablish_reply', 'start_time': 1737727408, 'end_time': 1737727468, 'outcome': 'fail'}
2025-01-24T14:17:57.5763430Z lightningd-2 2025-01-24T14:04:28.426Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-channeld-chan#1: billboard perm: Received ERROR channel 69d744488de7ec7f4c018da98a8a9d726ee3b47a97809608e4ce6f17747b5f09: Forcibly closed by `close` command timeout
2025-01-24T14:17:57.5764512Z lightningd-2 2025-01-24T14:04:28.433Z UNUSUAL 0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: Peer permanent failure in CHANNELD_NORMAL: channeld: received ERROR channel 69d744488de7ec7f4c018da98a8a9d726ee3b47a97809608e4ce6f17747b5f09: Forcibly closed by `close` command timeout (reason=protocol)
2025-01-24T14:17:57.5764997Z lightningd-2 2025-01-24T14:04:28.439Z INFO    0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-chan#1: State changed from CHANNELD_NORMAL to AWAITING_UNILATERAL
```
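
A sketch of the obvious fix: poll for the state instead of asserting immediately after the peer_in WIRE_ERROR log line, since lightningd processes the error asynchronously:

```
# Sketch: wait for the state change rather than asserting it right away.
from pyln.testing.utils import wait_for, only_one

l2.daemon.wait_for_log("peer_in WIRE_ERROR")
wait_for(lambda: only_one(l2.rpc.listpeerchannels(l1.info['id'])['channels'])['state']
         == 'AWAITING_UNILATERAL')
```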

Signed-off-by: Rusty Russell <[email protected]>
Doesn't always die messily, it seems?

```
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'elementsd doesnt yet support PSBT features we need')
    @pytest.mark.openchannel('v2')
    def test_rbf_reconnect_tx_construct(node_factory, bitcoind, chainparams):
        disconnects = ['=WIRE_TX_ADD_INPUT',  # Initial funding succeeds
                       '-WIRE_TX_ADD_INPUT',
                       '+WIRE_TX_ADD_INPUT',
                       '-WIRE_TX_ADD_OUTPUT',
                       '+WIRE_TX_ADD_OUTPUT',
                       '-WIRE_TX_COMPLETE',
                       '+WIRE_TX_COMPLETE',
                       '-WIRE_COMMITMENT_SIGNED',
                       '+WIRE_COMMITMENT_SIGNED']
    
        l1, l2 = node_factory.get_nodes(2,
                                        opts=[{'disconnect': disconnects,
                                               'may_reconnect': True,
                                               'dev-no-reconnect': None},
                                              {'may_reconnect': True,
                                               'dev-no-reconnect': None,
                                               'broken_log': 'dualopend daemon died before signed PSBT returned'}])
    
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        amount = 2**24
        chan_amount = 100000
        bitcoind.rpc.sendtoaddress(l1.rpc.newaddr()['bech32'], amount / 10**8 + 0.01)
        bitcoind.generate_block(1)
        # Wait for it to arrive.
        wait_for(lambda: len(l1.rpc.listfunds()['outputs']) > 0)
    
        res = l1.rpc.fundchannel(l2.info['id'], chan_amount)
        chan_id = res['channel_id']
        vins = bitcoind.rpc.decoderawtransaction(res['tx'])['vin']
        assert(only_one(vins))
        prev_utxos = ["{}:{}".format(vins[0]['txid'], vins[0]['vout'])]
    
        # Check that we're waiting for lockin
        l1.daemon.wait_for_log(' to DUALOPEND_AWAITING_LOCKIN')
    
        # rbf the lease with a higher amount
        rate = int(find_next_feerate(l1, l2)[:-5])
        # We 4x the feerate to beat the min-relay fee
        next_feerate = '{}perkw'.format(rate * 4)
    
        # Initiate an RBF
        startweight = 42 + 172  # base weight, funding output
        initpsbt = l1.rpc.utxopsbt(chan_amount, next_feerate, startweight,
                                   prev_utxos, reservedok=True,
                                   excess_as_change=True)
    
        # Run through TX_ADD wires
        for d in disconnects[1:-4]:
            l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
            with pytest.raises(RpcError):
                l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
            wait_for(lambda: l1.rpc.getpeer(l2.info['id'])['connected'] is False)
    
        # The first TX_COMPLETE breaks
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        bump = l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
        with pytest.raises(RpcError):
            update = l1.rpc.openchannel_update(chan_id, bump['psbt'])
        wait_for(lambda: l1.rpc.getpeer(l2.info['id'])['connected'] is False)
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        # l1 should remember, l2 has forgotten
        # l2 should send tx-abort, to reset
        l2.daemon.wait_for_log(r'tx-abort: Sent next_funding_txid .* doesn\'t match ours .*')
        l1.daemon.wait_for_log(r'Cleaned up incomplete inflight')
        # abort doesn't cause a disconnect
        assert l1.rpc.getpeer(l2.info['id'])['connected']
    
        # The next TX_COMPLETE break (both remember) + they break on the
        # COMMITMENT_SIGNED during the reconnect
        bump = l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
        with pytest.raises(RpcError):
            update = l1.rpc.openchannel_update(chan_id, bump['psbt'])
        wait_for(lambda: l1.rpc.getpeer(l2.info['id'])['connected'] is False)
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
        l2.daemon.wait_for_logs([r'Got dualopend reestablish',
                                 r'No commitment, not sending our sigs'])
        l1.daemon.wait_for_logs([r'Got dualopend reestablish',
                                 r'No commitment, not sending our sigs',
                                 r'dev_disconnect: -WIRE_COMMITMENT_SIGNED',
                                 'peer_disconnect_done'])
        assert not l1.rpc.getpeer(l2.info['id'])['connected']
        l1.rpc.connect(l2.info['id'], 'localhost', l2.port)
    
        # COMMITMENT_SIGNED disconnects *during* the reconnect
        # We can't bump because the last negotiation is in the wrong state
        with pytest.raises(RpcError, match=r'Funding sigs for this channel not secured'):
            l1.rpc.openchannel_bump(chan_id, chan_amount, initpsbt['psbt'])
        # l2 reconnects, but doesn't have l1's commitment
>       l2.daemon.wait_for_logs([r'Got dualopend reestablish',
                                 r'No commitment, not sending our sigs',
                                 # This is a BROKEN log, it's expected!
                                 r'dualopend daemon died before signed PSBT returned'])

tests/test_opening.py:944: 
...
>                   raise TimeoutError('Unable to find "{}" in logs.'.format(exs))
E                   TimeoutError: Unable to find "[re.compile('dualopend daemon died before signed PSBT returned')]" in logs.
```
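
A sketch of a more forgiving ending for the test, under the assumption that the BROKEN line is simply optional: only the reestablish behaviour is required, and broken_log stays set in case the messy death does happen:

```
# Sketch: the BROKEN line is not guaranteed, so don't wait for it.
l2.daemon.wait_for_logs([r'Got dualopend reestablish',
                         r'No commitment, not sending our sigs'])
# broken_log='dualopend daemon died before signed PSBT returned' remains
# in l2's opts so the line is tolerated if it does appear.
```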

Signed-off-by: Rusty Russell <[email protected]>
If we get "lucky" then commit tx will have short sig, one less weight (1 in 256 chance):

```
    @unittest.skipIf(TEST_NETWORK != 'regtest', 'elementsd anchors not supportd')
    def test_onchain_slow_anchor(node_factory, bitcoind):
        """We still use anchors for non-critical closes"""
        l1, l2 = node_factory.line_graph(2)
    
        # Don't let l1 succeed in sending commit tx
        def censoring_sendrawtx(r):
            return {'id': r['id'], 'result': {}}
    
        l1.daemon.rpcproxy.mock_rpc('sendrawtransaction', censoring_sendrawtx)
    
        close_start_depth = bitcoind.rpc.getblockchaininfo()['blocks']
    
        # Make l1 close unilaterally.
        l1.rpc.disconnect(l2.info['id'], force=True)
        l1.rpc.close(l2.info['id'], unilateraltimeout=1)
    
        # We will have a super-low-prio anchor spend.
        l1.daemon.wait_for_log(r"Low-priority anchorspend aiming for block {} \(feerate 253\)".format(close_start_depth + 2016))
    
        # Restart with reduced block time.
        l1.stop()
        l1.daemon.opts['dev-low-prio-anchor-blocks'] = 20
        l1.start()
    
        l1.daemon.wait_for_log("Low-priority anchorspend aiming for block {}".format(close_start_depth + 20))
        l1.daemon.wait_for_log("Anchorspend for local commit tx")
    
        # Won't go under 12 blocks though.
    
        # Make sure it sees all these blocks at once, to avoid test flakes!
        l1.stop()
        bitcoind.generate_block(7)
        l1.start()
    
        height = bitcoind.rpc.getblockchaininfo()['blocks']
        l1.daemon.wait_for_log(r"Low-priority anchorspend aiming for block {} \(feerate 7458\)".format(height + 13))
>       l1.daemon.wait_for_log(r"Anchorspend for local commit tx fee 12335sat \(w=714\), commit_tx fee 4545sat \(w=768\): package feerate 11390 perkw")
```

Here's the log we *did* get:

```
2025-01-25T08:46:40.9399213Z lightningd-1 2025-01-25T08:40:06.312Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Anchorspend for local commit tx fee 12328sat (w=714), commit_tx fee 4545sat (w=767): package feerate 11392 perkw
```
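
Whether the real fix pins the signature length or loosens the match, a regex tolerant of the one-weight-unit difference would look roughly like this:

```
# Sketch: accept either signature length (fee/weight/feerate shift slightly).
l1.daemon.wait_for_log(r"Anchorspend for local commit tx fee 123\d\dsat \(w=714\), "
                       r"commit_tx fee 4545sat \(w=76[78]\): package feerate 11\d\d\d perkw")
```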

Signed-off-by: Rusty Russell <[email protected]>
We have CI runs which time out (after 2 hours).  It's not clear why,
but we can at least eliminate CLN lockups as the answer.

Since pytest disables the --timeout option during test shutdown, we could be
seeing an issue where stopping takes a long time?
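
A crude illustration of the kind of watchdog that would turn a hung shutdown into a visible failure instead of a two-hour CI timeout; this is a sketch, not what the PR actually adds:

```
# Sketch: run node.stop() with a deadline so a hang surfaces as an error.
import concurrent.futures

def stop_with_deadline(node, deadline=60):
    ex = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        # Raises concurrent.futures.TimeoutError if stopping hangs.
        return ex.submit(node.stop).result(timeout=deadline)
    finally:
        ex.shutdown(wait=False)
```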

Signed-off-by: Rusty Russell <[email protected]>
@rustyrussell rustyrussell requested a review from cdecker as a code owner January 26, 2025 06:03
@rustyrussell rustyrussell merged commit 4e887c7 into ElementsProject:master Jan 27, 2025
33 of 39 checks passed