Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug]: htlcswitch may crash upon shutdown because of a race in WaitGroup #9308

Open
starius opened this issue Nov 26, 2024 · 0 comments · May be fixed by #9140
Open

[bug]: htlcswitch may crash upon shutdown because of a race in WaitGroup #9308

starius opened this issue Nov 26, 2024 · 0 comments · May be fixed by #9140
Assignees
Labels
bug Unintended code behaviour needs triage

Comments

@starius
Copy link
Collaborator

starius commented Nov 26, 2024

Background

The htlcswitch package uses sync.WaitGroup in a manner that allows parallel Add(1) and Wait() calls. Specifically:

  • GetAttemptResult calls wg.Add(1) before launching a goroutine,
  • Stop() calls wg.Done().

This is explicitly prohibited by the WaitGroup documentation.

Note that calls with a positive delta that occur when the counter is zero must happen before a Wait.

Your environment

  • version of lnd: master (fbeab72)
  • which operating system (uname -a on *Nix): Linux host 6.1.75-1.qubes.fc32.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 5 06:49:25 CET 2024 x86_64 GNU/Linux
  • version of btcd, bitcoind, or other backend: not used
  • any other relevant environment details

Steps to reproduce

I reproduced this in branch reproduce-race in my fork. I added test TestSwitchGetAttemptResultStress which runs many GetAttemptResult() calls sequentially and a Stop() call in parallel.

How to run it. Checkout the branch and run the following commands:

$ cd htlcswitch
$ go test -race -run TestSwitchGetAttemptResultStress

Expected behaviour

I expect TestSwitchGetAttemptResultStress not to crash, i.e. that there are no race condition between GetAttemptResult and Stop methods.

Actual behaviour

The stress test crashed

htlcswitch$ go test -race -run TestSwitchGetAttemptResultStress
==================
WARNING: DATA RACE
Read at 0x00c0001d4118 by goroutine 21:
  runtime.raceread()
      <autogenerated>:1 +0x1e
  github.com/lightningnetwork/lnd/htlcswitch.(*Switch).GetAttemptResult()
      /home/user/lnd/htlcswitch/switch.go:496 +0x1c4
  github.com/lightningnetwork/lnd/htlcswitch.TestSwitchGetAttemptResultStress.func1()
      /home/user/lnd/htlcswitch/switch_test.go:3211 +0x168

Previous write at 0x00c0001d4118 by goroutine 22:
  runtime.racewrite()
      <autogenerated>:1 +0x1e
  github.com/lightningnetwork/lnd/htlcswitch.(*Switch).Stop()
      /home/user/lnd/htlcswitch/switch.go:1995 +0x1e9
  github.com/lightningnetwork/lnd/htlcswitch.TestSwitchGetAttemptResultStress.func2()
      /home/user/lnd/htlcswitch/switch_test.go:3232 +0xae

Goroutine 21 (running) created at:
  github.com/lightningnetwork/lnd/htlcswitch.TestSwitchGetAttemptResultStress()
      /home/user/lnd/htlcswitch/switch_test.go:3203 +0x356
  testing.tRunner()
      /home/user/.goroot/src/testing/testing.go:1690 +0x226
  testing.(*T).Run.gowrap1()
      /home/user/.goroot/src/testing/testing.go:1743 +0x44

Goroutine 22 (finished) created at:
  github.com/lightningnetwork/lnd/htlcswitch.TestSwitchGetAttemptResultStress()
      /home/user/lnd/htlcswitch/switch_test.go:3222 +0x45c
  testing.tRunner()
      /home/user/.goroot/src/testing/testing.go:1690 +0x226
  testing.(*T).Run.gowrap1()
      /home/user/.goroot/src/testing/testing.go:1743 +0x44
==================
--- FAIL: TestSwitchGetAttemptResultStress (0.08s)
    testing.go:1399: race detected during execution of test
FAIL
exit status 1
FAIL    github.com/lightningnetwork/lnd/htlcswitch      0.380s

@starius starius added bug Unintended code behaviour needs triage labels Nov 26, 2024
@starius starius linked a pull request Nov 26, 2024 that will close this issue
8 tasks
@saubyk saubyk linked a pull request Nov 26, 2024 that will close this issue
8 tasks
@saubyk saubyk added this to lnd v0.19 Nov 27, 2024
@saubyk saubyk moved this to In Progress in lnd v0.19 Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unintended code behaviour needs triage
Projects
Status: In Progress
Development

Successfully merging a pull request may close this issue.

1 participant