Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for using last_funding_locked tlv in channel_reestablish #3007

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

remyers
Copy link
Contributor

@remyers remyers commented Feb 14, 2025

This TLV is relevant for splice funding txs only.

When the my_current_funding_locked_txid TLV attribute confirms the latest funding tx we prune previous funding transaction similarly to receiving splice_locked from our peer for that txid.

When we receive your_last_funding_locked_txid that does not match our latest confirmed funding tx, then we know our peer did not receive our last splice_locked and retransmit it.

For public channels, nodes also retransmit splice_locked after channel_reestablish if they have not received announcement_signatures for the latest confirmed funding tx. This is needed to prompt our peer to also retransmit their own splice_locked and announcement_signatures.

For public channels, nodes respond to splice_locked with their own splice_locked if it has already been received, but only once. If any announcement_signatures have been sent then we do not do this to prevent exchanging an endless loop of splice_locked messages.

These changes ensures both sides will have exchanged splice_locked after a disconnect and will be relevant for simple taproot channels to exchange nonces.

If the last_funding_locked tlv is not set then nodes always send splice_locked on reconnect to preserve previous behavior for retransmitting splice_locked.

Note: Previous behavior was susceptible to a race condition if one node sent a channel update after channel_reestablish, but before receiving splice_locked from a peer that had confirmed the latest funding tx while offline.

cf. lightning/bolts#1223

@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.04%. Comparing base (10edb42) to head (34209f2).
Report is 15 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3007      +/-   ##
==========================================
+ Coverage   86.02%   86.04%   +0.01%     
==========================================
  Files         228      228              
  Lines       20452    20816     +364     
  Branches      846      865      +19     
==========================================
+ Hits        17594    17911     +317     
- Misses       2858     2905      +47     
Files with missing lines Coverage Δ
...in/scala/fr/acinq/eclair/channel/Commitments.scala 96.36% <100.00%> (+0.06%) ⬆️
...in/scala/fr/acinq/eclair/channel/fsm/Channel.scala 83.90% <100.00%> (-0.17%) ⬇️
...ala/fr/acinq/eclair/wire/protocol/ChannelTlv.scala 100.00% <100.00%> (ø)
...q/eclair/wire/protocol/LightningMessageTypes.scala 99.19% <100.00%> (+0.04%) ⬆️

... and 7 files with indirect coverage changes

@remyers remyers force-pushed the 2502-current-funding-locked branch from 34209f2 to 5cd1827 Compare February 17, 2025 10:45
@remyers
Copy link
Contributor Author

remyers commented Feb 17, 2025

Some things to note during review:

1) The logic in Channel processing of the channel_reestablish message from our peer to determine whether or not to resend splice_locked has a second clause with the comment, "Resend if we have not received their remote announcement_signatures for our latest locked commitment". The logic here is as clear as I could make it, but could perhaps be made more clear. I'm open to suggestions. -- edit: hopefully fixed in force push to c1ebbbc.

  1. I defined lastFundingLockedTlv in Commitments rather than computing it inline in Channels because it's convenient for tests in the OfflineStateSpec tests.

@remyers remyers marked this pull request as ready for review February 17, 2025 10:51
This TLV is relevant for splice funding txs only.

When the `my_current_funding_locked_txid` TLV attribute confirms the latest funding tx we prune previous funding transaction similarly to receiving `splice_locked` from our peer for that txid.

When we receive `your_last_funding_locked_txid` that does not match our latest confirmed funding tx, then we know our peer did not receive our last `splice_locked` and retransmit it.

For public channels, nodes also retransmit `splice_locked` after `channel_reestablish` if they have not received `announcement_signatures` for the latest confirmed funding tx. This is needed to prompt our peer to also retransmit their own `splice_locked` and `announcement_signatures`.

For public channels, nodes respond to splice_locked with their own splice_locked if it has already been received, but only once. If any announcement_signatures have been sent then we do not do this to prevent exchanging an endless loop of splice_locked messages.

These changes ensures both sides will have exchanged `splice_locked` after a disconnect and will be relevant for simple taproot channels to exchange nonces.

If the last_funding_locked tlv is not set then nodes always send splice_locked on reconnect to preserve previous behavior for retransmitting `splice_locked`.

Note: Previous behavior was susceptible to a race condition if one node sent a channel update after `channel_reestablish`, but before receiving `splice_locked` from a peer that had confirmed the latest funding tx while offline.

cf. lightning/bolts#1223
@remyers remyers force-pushed the 2502-current-funding-locked branch from 2633d17 to c1ebbbc Compare February 17, 2025 21:20
@remyers
Copy link
Contributor Author

remyers commented Feb 17, 2025

In c1ebbbc I modified the PR to account for legacy channel_reestablish where the new last_funding_locked tlv is not set. I also made changes to prevent a potential splice_locked loop for public channels.

Note: the special cases for the funding tx without locked remote commit will be simplified in another PR.
…l_ready

When reconnecting in the Normal state, if a legacy channel does not have its remote funding status set to `Locked`, set and store it to migrate older channels.

After reconnecting in other states, the remote funding status will be set to `Locked` and stored when receiving `channel_ready` or deferred if still waiting for the funding tx to be confirmed locally.
Ensure we do not respond to the splice_locked we sent after channel_reestablish with the same splice_locked.

There can only be one signed channel announcement and so only one corresponding last splice_locked that could be re-sent after channel_reestablish.
@remyers remyers force-pushed the 2502-current-funding-locked branch from 4fc3cd2 to c2ad77d Compare February 25, 2025 10:36
// retransmit splice_locked to avoid a loop.
// NB: It is important both nodes retransmit splice_locked after reconnecting to ensure new Taproot nonces
// are exchanged for channel announcements.
val spliceLocked_opt = d.lastAnnouncement_opt.collect {
case ann if announcementSigsSent.isEmpty && commitment.shortChannelId_opt.contains(ann.shortChannelId) =>
case ann if !spliceLockedSentAfterReestablish.contains(ann.shortChannelId) && commitment.shortChannelId_opt.contains(ann.shortChannelId) =>
spliceLockedSentAfterReestablish = Some(ann.shortChannelId)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this is necessary? Can splice_locked be sent again if it has already been sent? I thought it was safer to set spliceLockedSentAfterReestablish but we could take it out and the loop after channel_reestablish will still not occur.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can splice_locked be sent again if it has already been sent?

Yes it can, this is harmless. I think we should simplify this once you've introduced the spliceLockedSent map as requested in my other comment to do the following when we receive splice_locked:

  • check the spliceLockedSent map: if it contains this scid, we don't need to send anything (we may have already re-sent splice_locked on reestablish)
  • otherwise, check if the local funding status is Locked, and if it is, retransmit our splice_locked
  • that means we may unnecessarily retransmit (if their channel_reestablish said that they already received it), but it's harmless, simplifies the logic (no need to look at the announcement at all), and shouldn't happen often

Comment on lines +1349 to +1350
val lastLocalLocked_opt: Option[Commitment] = active.filter(_.localFundingStatus.isInstanceOf[LocalFundingStatus.Locked]).sortBy(_.fundingTxIndex).lastOption
val lastRemoteLocked_opt: Option[Commitment] = active.filter(c => c.remoteFundingStatus == RemoteFundingStatus.Locked).sortBy(_.fundingTxIndex).lastOption
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this with the other vals at the beginning of the class definition?

Comment on lines 146 to +148
val commitments1 = commitments.modify(_.remoteNextCommitInfo).setTo(Right(channelReady.nextPerCommitmentPoint))
// Set the remote status for all initial funding commitments to Locked. If there are RBF attempts, only one will be confirmed locally.
val commitments2 = commitments1.copy(active = commitments.active.collect { case c if c.fundingTxIndex == 0 => c.copy(remoteFundingStatus = RemoteFundingStatus.Locked) })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collect seems fishy here, because it drops other commitments. Even though it's safe with the way we use it, it's not the responsibility of this function to drop the other attempts, that is done when we locally see one of the funding txs confirm. It would be better to use a map, and cleaner to do all changes in a single call:

Suggested change
val commitments1 = commitments.modify(_.remoteNextCommitInfo).setTo(Right(channelReady.nextPerCommitmentPoint))
// Set the remote status for all initial funding commitments to Locked. If there are RBF attempts, only one will be confirmed locally.
val commitments2 = commitments1.copy(active = commitments.active.collect { case c if c.fundingTxIndex == 0 => c.copy(remoteFundingStatus = RemoteFundingStatus.Locked) })
val commitments1 = commitments.copy(
// Set the remote status for all initial funding commitments to Locked. If there are RBF attempts, only one can be confirmed locally.
active = commitments.active.map {
case c if c.fundingTxIndex == 0 => c.copy(remoteFundingStatus = RemoteFundingStatus.Locked)
case c => c
},
remoteNextCommitInfo = Right(channelReady.nextPerCommitmentPoint)
)

@@ -116,6 +118,7 @@ class WaitForChannelReadyStateSpec extends TestKitBaseClass with FixtureAnyFunSu
val listener = TestProbe()
alice.underlying.system.eventStream.subscribe(listener.ref, classOf[ChannelOpened])
bob2alice.forward(alice)
assert(alice.stateData.asInstanceOf[DATA_NORMAL].commitments.active.head.remoteFundingStatus == RemoteFundingStatus.Locked)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a race here, forward doesn't wait for the message to be fully handled, so you should replace assert with awaitCond (same below).

Note that we don't need to do the same change in your updates to WaitForDualFundingReadyStateSpec because we always start with awaitCond(xxx.state == NORMAL) before this check, which ensures that we've done the transition already when we reach the assert.

Comment on lines +785 to +787
stay() using d.copy(commitments = d.commitments.copy(active = d.commitments.active.collect {
case c if c.fundingTxIndex == 0 => c.copy(remoteFundingStatus = RemoteFundingStatus.Locked)
}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point using collect is really dangerous: it will drop every other commitment (e.g. pending splices)! You must use map here to ensure you don't modify the other commitments:

Suggested change
stay() using d.copy(commitments = d.commitments.copy(active = d.commitments.active.collect {
case c if c.fundingTxIndex == 0 => c.copy(remoteFundingStatus = RemoteFundingStatus.Locked)
}))
stay() using d.copy(commitments = d.commitments.copy(active = d.commitments.active.map {
case c if c.fundingTxIndex == 0 => c.copy(remoteFundingStatus = RemoteFundingStatus.Locked)
case c => c
}))

Comment on lines +225 to +226
// we keep track of the splice_locked we sent after channel_reestablish to avoid sending it again
private var spliceLockedSentAfterReestablish: Option[RealShortChannelId] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need a Map of the splice_locked messages we sent to keep track of our local nonces for taproot, so this is a good opportunity to do it right now, otherwise we'll need to change this code anyway...

Can you turn this into private var spliceLockedSent = Map.empty[RealShortChannelId, SpliceLocked]? And implement a mecanism to prune the oldest RealShortChannelIds when we exceed 10 elements, like we do for announcement signatures?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use TxId as the key instead of RealShortChannelId because we might have sent SpliceLocked for a zero conf channel?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's probably what we need here!

// retransmit splice_locked to avoid a loop.
// NB: It is important both nodes retransmit splice_locked after reconnecting to ensure new Taproot nonces
// are exchanged for channel announcements.
val spliceLocked_opt = d.lastAnnouncement_opt.collect {
case ann if announcementSigsSent.isEmpty && commitment.shortChannelId_opt.contains(ann.shortChannelId) =>
case ann if !spliceLockedSentAfterReestablish.contains(ann.shortChannelId) && commitment.shortChannelId_opt.contains(ann.shortChannelId) =>
spliceLockedSentAfterReestablish = Some(ann.shortChannelId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can splice_locked be sent again if it has already been sent?

Yes it can, this is harmless. I think we should simplify this once you've introduced the spliceLockedSent map as requested in my other comment to do the following when we receive splice_locked:

  • check the spliceLockedSent map: if it contains this scid, we don't need to send anything (we may have already re-sent splice_locked on reestablish)
  • otherwise, check if the local funding status is Locked, and if it is, retransmit our splice_locked
  • that means we may unnecessarily retransmit (if their channel_reestablish said that they already received it), but it's harmless, simplifies the logic (no need to look at the announcement at all), and shouldn't happen often

@@ -2227,13 +2245,19 @@ class Channel(val nodeParams: NodeParams, val wallet: OnChainChannelFunder with
}
case _ => Set.empty
}
val lastFundingLockedTlvs: Set[ChannelReestablishTlv] =
if (d.commitments.params.remoteParams.initFeatures.hasFeature(Features.SplicePrototype) && d.commitments.params.localParams.initFeatures.hasFeature(Features.SplicePrototype)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to gate this on init features, we're using odd TLVs which should be ignored by nodes that don't understand them. It's simpler to always send those TLVs and they may be used outside of splicing when lnd folks for taproot channel announcements (which will potentially require channel_ready retransmission).

It's only helpful to look at init features when we receive channel_reestablish, because if our peer supports splicing and does NOT set your_last_funding_locked that means we MUST retransmit channel_ready or splice_locked, regardless of other checks around the commitment_number.

By the way, we are missing channel_ready retransmission based on those TLVs: the condition line 2350 must be updated to always retransmit channel_ready if d.commitments.params.remoteParams.initFeatures.hasFeature(Features.SplicePrototype) && channelReestablish.yourLastFundingLocked_opt.isEmpty.

Comment on lines +2398 to +2402
// Prune previous funding transactions and RBF attempts if we already sent splice_locked for the last funding
// transaction confirmed by our counterparty; we either missed their splice_locked or it confirmed while disconnected.
val commitments1: Commitments = channelReestablish.myCurrentFundingLocked_opt.flatMap(myCurrentFundingLocked =>
d.commitments.updateRemoteFundingStatus(myCurrentFundingLocked, d.lastAnnouncedFundingTxId_opt).toOption.map(_._1))
.getOrElse(d.commitments)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it dangerous to have multiple places where we update the commitments and rely on numbering variable names, it requires scanning the whole block of code to ensure we're using the right version...can you instead move this change below, where we do the discardUnsignedUpdates, and chain the two updates like this:

// Prune previous funding transactions and RBF attempts if we already sent splice_locked for the last funding
// transaction that is also locked by our counterparty; we either missed their splice_locked or it confirmed
// while disconnected.
val commitments1: Commitments = channelReestablish.myCurrentFundingLocked_opt
  .flatMap(remoteFundingTxLocked => d.commitments.updateRemoteFundingStatus(remoteFundingTxLocked, d.lastAnnouncedFundingTxId_opt).toOption.map(_._1))
  .getOrElse(d.commitments)
  // We then clean up unsigned updates that haven't been received before the disconnection.
  .discardUnsignedUpdates()

Comment on lines +2405 to +2407
// 1) If they did not receive our last splice_locked;
// 2) or the last splice_locked they received is different from what we sent;
// 3) or (public channels only) we have not received their announcement_signatures for our latest locked commitment.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite confusing, because the second statement is contained in the first one: if the last splice_locked they received doesn't match what we sent, it means they did not receive our last splice_locked?

Suggested change
// 1) If they did not receive our last splice_locked;
// 2) or the last splice_locked they received is different from what we sent;
// 3) or (public channels only) we have not received their announcement_signatures for our latest locked commitment.
// 1) If they did not receive our last splice_locked
// 2) or if this is a public channel and we have not created the channel_announcement yet

Comment on lines +2415 to +2416
.collect { case c if !channelReestablish.yourLastFundingLocked_opt.contains(c.fundingTxId) ||
(commitments1.announceChannel && d.lastAnnouncement_opt.forall(ann => !c.shortChannelId_opt.contains(ann.shortChannelId))) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those conditions are quite hard to parse, it's helpful to decompose them with comments:

          val spliceLocked = commitments1.lastLocalLocked_opt match {
            case None => None
            // We only send splice_locked for splice transactions.
            case Some(c) if c.fundingTxIndex == 0 => None
            case Some(c) =>
              // If our peer has not received our splice_locked, we retransmit it.
              val notReceivedByRemote = !channelReestablish.yourLastFundingLocked_opt.contains(c.fundingTxId)
              // If this is a public channel and we haven't announced the splice, we retransmit our splice_locked and
              // will exchange announcement_signatures afterwards.
              val notAnnouncedYet = commitments1.announceChannel && d.lastAnnouncement_opt.forall(ann => !c.shortChannelId_opt.contains(ann.shortChannelId))
              if (notReceivedByRemote || notAnnouncedYet) {
                log.debug("re-sending splice_locked for fundingTxId={}", c.fundingTxId)
                spliceLockedSentAfterReestablish = c.shortChannelId_opt
                Some(SpliceLocked(d.channelId, c.fundingTxId))
              } else {
                None
              }
          }

Copy link
Member

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished reviewing the tests, they're really good!

alice2bob.forward(bob)
alice2bob.expectNoMessage(100 millis)
assert(bob2alice.expectMsgType[SpliceLocked].fundingTxId == spliceTx.txid)
assert(bob2alice.expectMsgType[SpliceLocked].fundingTxId == spliceTx.txid) // Bob resends `splice_locked` in response to Alice's `splice_locked` after channel_reestablish.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to verify that this is really in response to Alice's splice_locked, we should add bob2alice.expectNoMessage(100 millis) right after reconnect(f).

// Because `your_last_funding_locked_txid` from Bob matches the last `splice_locked` txid sent by Alice; there is no need
// for Alice to resend `splice_locked`. Alice processes the `my_current_funding_locked` from Bob as if she received
// `splice_locked` from Bob and prunes the initial funding commitment.
assert(alice.stateData.asInstanceOf[ChannelDataWithCommitments].commitments.active.size == 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a race condition here between the test code and Alice's handler for the channel_reestablish message:

Suggested change
assert(alice.stateData.asInstanceOf[ChannelDataWithCommitments].commitments.active.size == 1)
awaitCond(alice.stateData.asInstanceOf[ChannelDataWithCommitments].commitments.active.size == 1)

Comment on lines +2526 to +2535
// If either node receives `splice_locked` again before reconnecting, reply with `announcement_signatures` and not
// `splice_locked` to avoid a loop.
alice2bob.forward(bob, aliceSpliceLocked)
bob2alice.forward(alice, bobSpliceLocked)
alice2bob.expectMsgType[AnnouncementSignatures]
alice2bob.forward(bob)
bob2alice.expectMsgType[AnnouncementSignatures]
bob2alice.forward(alice)
alice2bob.expectNoMessage(100 millis)
bob2alice.expectNoMessage(100 millis)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, that's a very good test because this is not the expected behavior! We've already sent announcement_signatures so we definitely should not re-send it here, which means we're missing a check on the announcementSigsSent in the handler for the remote SpliceLocked message in Channel.scala. I didn't notice it reading the code, it's nice that a test will catch this invalid behavior now ;)

// `splice_locked` to avoid a loop.
alice2bob.forward(bob, aliceSpliceLocked)
bob2alice.forward(alice, bobSpliceLocked)
alice2bob.expectMsgType[AnnouncementSignatures]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this should instead be alice2bob.expectNoMessage(100 millis) and bob2alice.expectNoMessage(100 millis)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants