Beat itest [2/3]: document and fix itest flakes #9307

Merged
merged 20 commits into from
Dec 18, 2024

Conversation

yyforyongyu
Member

Check #9306 for context, and #9260 for the final result.

The changes in each commit are pretty small; the major changes are:

  • flatten some tests so it's easier to debug.

  • We encountered issue #8786 ([bug]: listunspent shows spent outputs), which would have been caught much more easily if we didn't use the standby nodes, and is now temporarily hacked around. Fixing it properly will be my high-priority item.

  • when checking for the sweeping tx, if the tx is RBFed, the old txid won't work.

  • connecting to an already-connected node can result in an error.

  • increase payment timeout for sql.

  • assert the channel is in the graph before sending payments.

  • assert outgoing and incoming HTLCs separately.

  • increase rpcmaxwebsockets for btcd.

  • document the MPP flake - we now have a much better understanding of how it happened.

  • some other minor refactors.
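
The RBF point in the list above can be illustrated with a minimal sketch. It assumes a simplified mempool model; the `outPoint` and `tx` types and the `findSpendingTx` helper are hypothetical and are not part of lntest. The idea is to look up the sweeping tx by the input it spends rather than by a txid recorded earlier, since a fee bump produces a replacement with a different txid.

```go
package main

import "fmt"

// outPoint and tx are simplified, hypothetical stand-ins for the real
// transaction types; they exist only for this sketch.
type outPoint struct {
	hash  string
	index uint32
}

type tx struct {
	txid   string
	inputs []outPoint
}

// findSpendingTx returns the first mempool transaction that spends op.
// Matching on the spent outpoint keeps working after a fee bump replaces
// the sweeping tx, because the replacement spends the same input under a
// new txid.
func findSpendingTx(mempool []tx, op outPoint) (tx, bool) {
	for _, t := range mempool {
		for _, in := range t.inputs {
			if in == op {
				return t, true
			}
		}
	}

	return tx{}, false
}

func main() {
	anchor := outPoint{hash: "commit-tx", index: 0}

	// The original sweep "sweep-v1" has been replaced by "sweep-v2", so
	// a lookup by the old txid would find nothing.
	mempool := []tx{{txid: "sweep-v2", inputs: []outPoint{anchor}}}

	sweep, ok := findSpendingTx(mempool, anchor)
	fmt.Println(sweep.txid, ok)
}
```

A test written this way does not need to know how many times the sweeper has bumped the fee before the assertion runs.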

Pull reviewers stats

Stats of the last 30 days for lnd:

| User | Total reviews | Time to review | Total comments |
|---|---|---|---|
| guggero 🥇 | 29 | 13h 1m | 31 |
| yyforyongyu 🥈 | 11 | 1d 1h 32m | 28 |
| ellemouton 🥉 | 11 | 16h 30m | 21 |
| ziggie1984 | 8 | 12h 47m | 9 |
| Roasbeef | 7 | 4d 13h 56m | 47 |
| ProofOfKeags | 3 | 5d 4h 23m | 10 |
| Abdulkbk | 3 | 13h 37m | 2 |
| ViktorTigerstrom | 2 | 2d 14h 26m | 5 |
| alexbosworth | 1 | 4d 11h 58m | 1 |
| bhandras | 1 | 12h | 0 |
| bitromortac | 1 | 16h 16m | 0 |

Collaborator

@guggero guggero left a comment

Amazing bug fixes, thank you so much! Crazy how many flakes you were able to investigate and fix! You're my absolute hero 💯

@yyforyongyu yyforyongyu force-pushed the yy-beat-itest-shuffle branch from c6819c0 to 3c60868 Compare December 3, 2024 22:08
@yyforyongyu yyforyongyu force-pushed the yy-beat-itest-flakes branch 2 times, most recently from a558fe3 to 3ac3b53 Compare December 3, 2024 22:13
@yyforyongyu yyforyongyu force-pushed the yy-beat-itest-shuffle branch from 3c60868 to abda3ce Compare December 4, 2024 06:36
@yyforyongyu yyforyongyu force-pushed the yy-beat-itest-shuffle branch from abda3ce to 21082c5 Compare December 5, 2024 05:15
@yyforyongyu yyforyongyu force-pushed the yy-beat-itest-shuffle branch from 21082c5 to 9374711 Compare December 5, 2024 15:09
@yyforyongyu yyforyongyu force-pushed the yy-beat-itest-shuffle branch from 9374711 to e7beb0a Compare December 10, 2024 07:18
@yyforyongyu yyforyongyu force-pushed the yy-beat-itest-shuffle branch from e7beb0a to d03a8b2 Compare December 12, 2024 12:14
@ziggie1984 ziggie1984 self-requested a review December 16, 2024 23:35
@yyforyongyu yyforyongyu force-pushed the yy-beat-itest-shuffle branch from d03a8b2 to dfa75eb Compare December 17, 2024 09:49
@yyforyongyu yyforyongyu changed the base branch from yy-beat-itest-shuffle to yy-waiting-on-merge December 17, 2024 10:45
Collaborator

@ziggie1984 ziggie1984 left a comment

Thanks for this very laborious PR 👏 It is very time-expensive to study all these itest failures and fix them 🧹

 // multisig funding output.
-func runPsbtChanFunding(ht *lntest.HarnessTest, carol, dave *node.HarnessNode,
-	private bool, commitType lnrpc.CommitmentType) {
+func runPsbtChanFundingWithNodes(ht *lntest.HarnessTest, carol,
Collaborator

Nit: godoc

Member Author

done

@@ -506,6 +505,7 @@ func wsTestCaseBiDirectionalSubscription(ht *lntest.HarnessTest) {
 		}
 		return
 	}
+	ht.Log("Finish writing message")
Collaborator

Nit: "Finished writing message"; maybe also add more detail about which message?

Member Author

done

@@ -376,7 +377,7 @@ func runFeeEstimationTestCase(ht *lntest.HarnessTest,
 	)
 	feeReq = &routerrpc.RouteFeeRequest{
 		PaymentRequest: payReqs[0],
-		Timeout:        10,
+		Timeout:        uint32(wait.PaymentTimeout.Seconds()),
Collaborator

What is the flake here though? Are we just increasing the timeout to 60s?

Member Author

yeah the previous 10s will cause the payment to time out, updated the commit msg

@@ -237,17 +237,17 @@ func testUnannouncedChannels(ht *lntest.HarnessTest) {
 	ht.WaitForChannelOpenEvent(chanOpenUpdate)
 
 	// Alice should have 1 edge in her graph.
-	ht.AssertNumActiveEdges(alice, 1, true)
+	ht.AssertNumEdges(alice, 1, true)
Collaborator

Why does the windows build have no problem with this call tho, why are we not catching this nil case ?

Member Author

Yeah, when running on Windows the behavior is quite different, which is why we skip some tests for the Windows build in the final PR. That being said, we don't really know why Windows has this issue, and we now have a somewhat giant TODO list for the Windows build.

@@ -269,6 +269,9 @@ func runPsbtChanFundingWithNodes(ht *lntest.HarnessTest, carol,
 	txHash := finalTx.TxHash()
 	block := ht.MineBlocksAndAssertNumTxes(6, 1)[0]
 	ht.AssertTxInBlock(block, txHash)
+
+	ht.AssertChannelActive(carol, chanPoint)
Collaborator

I wonder if we should pack the AssertChannelActive and AssertChannelInGraph in a new call so we never forget these two ?

Member Author

Good question. I think we need to do a proper cleanup of the assertion methods in lntest, to make it clear which ones assert against the channeldb and which against the graphdb. But I think that will be something far in the future...
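
The suggestion of packing the two assertions into one call could look roughly like the sketch below. The `channelView` interface, the `assertChannelReady` helper, and the `stubNode` type are all hypothetical; they only model the idea and do not reflect the actual lntest assertion methods or their signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// channelView is a hypothetical, minimal view of a node exposing the two
// checks discussed above. It is not the lntest API.
type channelView interface {
	// ChannelActive returns an error if the node does not list the
	// channel as active.
	ChannelActive(chanPoint string) error

	// ChannelInGraph returns an error if the channel is missing from
	// the node's graph.
	ChannelInGraph(chanPoint string) error
}

// assertChannelReady runs both checks in a fixed order so a caller cannot
// forget one of them. The returned error says which check failed.
func assertChannelReady(v channelView, chanPoint string) error {
	if err := v.ChannelActive(chanPoint); err != nil {
		return fmt.Errorf("channel %s not active: %w", chanPoint, err)
	}

	if err := v.ChannelInGraph(chanPoint); err != nil {
		return fmt.Errorf("channel %s not in graph: %w", chanPoint, err)
	}

	return nil
}

// stubNode is an in-memory channelView used only to exercise the sketch.
type stubNode struct {
	active  map[string]bool
	inGraph map[string]bool
}

func (s stubNode) ChannelActive(cp string) error {
	if !s.active[cp] {
		return errors.New("no active channel found")
	}

	return nil
}

func (s stubNode) ChannelInGraph(cp string) error {
	if !s.inGraph[cp] {
		return errors.New("no graph edge found")
	}

	return nil
}

func main() {
	// The channel is active but has not reached the graph yet.
	n := stubNode{
		active:  map[string]bool{"abc:0": true},
		inGraph: map[string]bool{},
	}

	fmt.Println(assertChannelReady(n, "abc:0"))
}
```

Wrapping the errors keeps the failure message specific, which addresses the concern about knowing which store an assertion is checking.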

// `SendPaymentV2` case, where a payment amount of 300,000 sats is used and it
// tests sending three attempts: the first has 150,000 sats, the rest two have
// 75,000 sats. It returns the payment amt.
func (c *mppTestScenario) setupSendPaymentCase() btcutil.Amount {
Collaborator

👌

@@ -378,6 +378,33 @@ func (c *mppTestScenario) setupSendPaymentCase() btcutil.Amount {
 	// - 2nd attempt(75,000 sats): Alice->Dave->Bob: 155,000 sats.
 	// - 3rd attempt(75,000 sats): Alice->Carol->Eve->Bob: 155,000 sats.
 	//
+	// There is a case where the payment will fail due to the channel
+	// capacity not being updated in the graph, which has been seen many
Collaborator

I think the channel capacity is a constant value so how can it be updated in the graph ?

Member Author

updated to be bandwidth

// Make sure the nodes know each other's channels if they are public.
if !p.Private {
// If the channels are private, make sure the channel participants know
// the relevant channel.
Collaborator

Nit: channel => channels ?

Member Author

fixed

// - the local one will be swept.
// - the remote one will be marked as failed due to `testmempoolaccept`
// check.
// - the pending remote one will not be attempted due to it being
Collaborator

Are you talking here about the case after the commitment is confirmed (dangling anchor)?

Member Author

nope - before the FC is confirmed, the uneconomical inputs will not be attempted since we check them when making the input sets.

//
// TODO(yy): Remove the following wait and use the direct call, then
// investigate the bug in the edge unifier.
var route *lnrpc.Route
Collaborator

So the missing edge eventually shows up after some time?

Member Author

yeah, which is weird, think it's most likely due to graph being cached and not updated in time.

So it's easier to get the logs and debug.

We previously didn't see this issue because we always have nodes being
over-funded.

So we know which open channel operation failed.

We need to mine an empty block as the tx may already have entered the
mempool. This should be fixed once we start using the sweeper to handle
the justice tx.

The reconnection will happen automatically when the nodes have a
channel, so we just ensure the connection instead of reconnecting
directly.
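
"Ensuring" a connection rather than reconnecting can be sketched as an idempotent connect that treats an existing connection as success. Everything named below (`connector`, `errAlreadyConnected`, `ensureConnected`, `fakeNode`) is hypothetical and only models the idea; the real RPC error and harness methods may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyConnected is a hypothetical sentinel standing in for whatever
// "already connected" error the real connect call returns.
var errAlreadyConnected = errors.New("already connected to peer")

// connector is a hypothetical, minimal view of a node that can connect to
// a peer. It is not the lntest API.
type connector interface {
	Connect(peer string) error
}

// ensureConnected connects to peer and treats "already connected" as
// success, so it is safe to call when the nodes have reconnected on
// their own.
func ensureConnected(c connector, peer string) error {
	err := c.Connect(peer)
	if err == nil || errors.Is(err, errAlreadyConnected) {
		return nil
	}

	return fmt.Errorf("connect to %s: %w", peer, err)
}

// fakeNode is an in-memory connector used only to exercise the sketch.
type fakeNode struct {
	peers map[string]bool
}

func (f *fakeNode) Connect(peer string) error {
	if f.peers[peer] {
		return errAlreadyConnected
	}

	f.peers[peer] = true

	return nil
}

func main() {
	n := &fakeNode{peers: map[string]bool{}}

	// The first call connects; the second finds the peer already
	// connected and still succeeds.
	fmt.Println(ensureConnected(n, "bob"))
	fmt.Println(ensureConnected(n, "bob"))
}
```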

The test used 10s as the timeout value, which can easily cause a timeout
in a slow build, so we increase it to 60s.

So we won't forget to assert the topology after opening a chain of
channels.

This is no longer needed since we don't have standby nodes, plus it's
causing a panic in the Windows build due to `edge.Policy` being nil.

This has been seen in the itest and can lead to a node startup
failure:
```
2024-11-20 18:55:15.727 [INF] RPCS: Max websocket clients exceeded [25] - disconnecting client 127.0.0.1:57224
```

This is needed so we can have one place to fix the flakes found in the
MPP-related tests, which are fixed in the following commit.

We now make sure the channel participants have heard about their private
channel when opening channels.

@yyforyongyu yyforyongyu merged commit 4db3060 into yy-waiting-on-merge Dec 18, 2024
21 of 23 checks passed
@yyforyongyu yyforyongyu deleted the yy-beat-itest-flakes branch December 18, 2024 18:33
3 participants