Beat itest [2/3]: document and fix itest flakes #9307
Amazing bug fixes, thank you so much! Crazy how many flakes you had to investigate and fix! You're my absolute hero 💯
Thanks for this very laborious PR 👏 It is very time-expensive to study all these itest failures and fix them 🧹
// multisig funding output.
func runPsbtChanFunding(ht *lntest.HarnessTest, carol, dave *node.HarnessNode,
private bool, commitType lnrpc.CommitmentType) {
func runPsbtChanFundingWithNodes(ht *lntest.HarnessTest, carol,
Nit: godoc
done
itest/lnd_rest_api_test.go
Outdated
@@ -506,6 +505,7 @@ func wsTestCaseBiDirectionalSubscription(ht *lntest.HarnessTest) {
}
return
}
ht.Log("Finish writing message")
Nit: `Finished writing message`. Maybe also add more detail about which message?
done
@@ -376,7 +377,7 @@ func runFeeEstimationTestCase(ht *lntest.HarnessTest,
)
feeReq = &routerrpc.RouteFeeRequest{
PaymentRequest: payReqs[0],
-Timeout: 10,
+Timeout: uint32(wait.PaymentTimeout.Seconds()),
what is the flake here tho, we are just increasing the timeout to 60s ?
yeah the previous 10s will cause the payment to time out, updated the commit msg
@@ -237,17 +237,17 @@ func testUnannouncedChannels(ht *lntest.HarnessTest) {
ht.WaitForChannelOpenEvent(chanOpenUpdate)

// Alice should have 1 edge in her graph.
-ht.AssertNumActiveEdges(alice, 1, true)
+ht.AssertNumEdges(alice, 1, true)
Why does the windows build have no problem with this call tho, why are we not catching this nil case ?
yeah when running in windows the behavior is quite different, which is why we skip some tests for the windows build in the final PR. That being said, we don't really know why windows has this issue, and we now have a somewhat giant TODO list for the windows build.
@@ -269,6 +269,9 @@ func runPsbtChanFundingWithNodes(ht *lntest.HarnessTest, carol,
txHash := finalTx.TxHash()
block := ht.MineBlocksAndAssertNumTxes(6, 1)[0]
ht.AssertTxInBlock(block, txHash)

ht.AssertChannelActive(carol, chanPoint)
I wonder if we should pack the `AssertChannelActive` and `AssertChannelInGraph` in a new call so we never forget these two ?
good q - I think we need to do a proper cleanup of the assertion methods in `lntest`, to make it clear which one is asserting the channeldb and which one the graphdb. But I think that will be something far in the future...
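The idea of packing both assertions into one call could look roughly like the sketch below. It is only an illustration under simplifying assumptions: `channelAsserter`, `assertChannelReady` and `recorder` are hypothetical, and the method signatures are reduced to strings and returned errors, which is not how the real harness methods are declared.

```go
package main

import "fmt"

// channelAsserter is a hypothetical interface standing in for the test
// harness. The method names mirror the ones discussed above, but the
// signatures are simplified for illustration.
type channelAsserter interface {
	AssertChannelActive(node, chanPoint string) error
	AssertChannelInGraph(node, chanPoint string) error
}

// assertChannelReady runs both checks in a fixed order so that a caller
// cannot forget one of them.
func assertChannelReady(a channelAsserter, node, chanPoint string) error {
	if err := a.AssertChannelActive(node, chanPoint); err != nil {
		return fmt.Errorf("channel not active: %w", err)
	}
	if err := a.AssertChannelInGraph(node, chanPoint); err != nil {
		return fmt.Errorf("channel not in graph: %w", err)
	}
	return nil
}

// recorder is a fake harness that records which assertions were invoked.
type recorder struct {
	calls []string
}

func (r *recorder) AssertChannelActive(node, chanPoint string) error {
	r.calls = append(r.calls, "active")
	return nil
}

func (r *recorder) AssertChannelInGraph(node, chanPoint string) error {
	r.calls = append(r.calls, "graph")
	return nil
}

func main() {
	r := &recorder{}
	err := assertChannelReady(r, "carol", "txid:0")
	fmt.Println(err, r.calls)
}
```

Keeping the two checks behind one entry point also leaves a single place to document which database each assertion reads from, once that cleanup happens.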
// `SendPaymentV2` case, where a payment amount of 300,000 sats is used and it
// tests sending three attempts: the first has 150,000 sats, the rest two have
// 75,000 sats. It returns the payment amt.
func (c *mppTestScenario) setupSendPaymentCase() btcutil.Amount {
👌
itest/lnd_mpp_test.go
Outdated
@@ -378,6 +378,33 @@ func (c *mppTestScenario) setupSendPaymentCase() btcutil.Amount {
// - 2nd attempt(75,000 sats): Alice->Dave->Bob: 155,000 sats.
// - 3rd attempt(75,000 sats): Alice->Carol->Eve->Bob: 155,000 sats.
//
// There is a case where the payment will fail due to the channel
// capacity not being updated in the graph, which has been seen many
I think the channel capacity is a constant value so how can it be updated in the graph ?
updated to be bandwidth
lntest/harness.go
Outdated
// Make sure the nodes know each other's channels if they are public.
if !p.Private {
// If the channels are private, make sure the channel participants know
// the relevant channel.
Nit: channel => channels ?
fixed
// - the local one will be swept.
// - the remote one will be marked as failed due to `testmempoolaccept`
// check.
// - the pending remote one will not be attempted due to it being
are you here talking about the case after the commitment is confirmed (dangling anchor) ?
nope - before the FC is confirmed, the uneconomical inputs will not be attempted since we check them when making the input sets.
//
// TODO(yy): Remove the following wait and use the direct call, then
// investigate the bug in the edge unifier.
var route *lnrpc.Route
so the missing edge eventually shows up after some time ?
yeah, which is weird, think it's most likely due to graph being cached and not updated in time.
So it's easier to get the logs and debug.
We previously didn't see this issue because we always have nodes being over-funded.
So we know which open channel operation failed.
We need to mine an empty block as the tx may already have entered the mempool. This should be fixed once we start using the sweeper to handle the justice tx.
The reconnection will happen automatically when the nodes have a channel, so we just ensure the connection instead of reconnecting directly.
The test used 10s as the timeout value, which can easily cause a timeout in a slow build so we increase it to 60s.
So we won't forget to assert the topology after opening a chain of channels.
This is no longer needed since we don't have standby nodes, plus it's causing a panic in the windows build due to `edge.Policy` being nil.
This has been seen in the itest and can lead to a node startup failure:
```
2024-11-20 18:55:15.727 [INF] RPCS: Max websocket clients exceeded [25] - disconnecting client 127.0.0.1:57224
```
This is needed so we can have one place to fix the flakes found in the MPP-related tests, which are fixed in the following commit.
We now make sure the channel participants have heard about their private channel when opening channels.
Check #9306 for context, and #9260 for the final result.
Changes in each commit are pretty small; the major changes are:
flatten some tests so it's easier to debug.
We encountered issue [bug]: listunspent shows spent outputs #8786 - which should have been caught way easier if we didn't use the standby nodes, and is now temporarily hacked. This will be my high-priority item to fix.
when checking for the sweeping tx, if the tx is RBFed, the old txid won't work.
connecting to an already connected node can result in an error.
increase payment timeout for sql.
assert channel is in the graph before sending payments.
assert outgoing and incoming HTLCs separately.
increase `rpcmaxwebsockets` for `btcd`.
document the MPP flake - we now have a much better understanding of how it happened.
some other minor refactors.