Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for go races and miner crashes #42

Merged
merged 4 commits into from
Feb 22, 2024

Conversation

lemois-1337
Copy link
Collaborator

@lemois-1337 lemois-1337 commented Feb 22, 2024

This is a stability improvement pull request. In the Go node implementation there are several races which are detected by go -race and have been fixed.

1. Fixed race in gRPC client send, recv and close handler:

Write at 0x00c01e77fc90 by goroutine 6616:
  google.golang.org/grpc.(*clientStream).CloseSend()

Previous read at 0x00c01e77fc90 by goroutine 6615:
  google.golang.org/grpc.(*clientStream).SendMsg()

2. Fixed race in gRPC client error handler:

This is a workaround for current implementation to set RLock for non-atomic error handler, some parts are atomic already. It is also necessary to store new value isError otherwise concurrent connections can block each other but without lock no atomic read and write is guaranteed. The gRPC client implementation with a shared error handler between send and recv should be reworked.

3. Fixed karlsenminer crash:

The miner routines are asynchronous and in stability tests the miner regularly crashes when you set very low limit of blocks to be mined. If running the miner karlsenminer --simnet -s localhost:9000 --mine-when-not-synced --miningaddr karlsensim:qre0nh3wemr8c4zqlwsd5tch78dx93mmx2u2pvrugheny7ja8k6t682pvpx83 --numblocks 1, it will fail with:

2024-02-22 12:18:23.209 [CRT] KSMN: Exiting: Fatal error in goroutine `blocksLoop 7`: Writing to the logger when it's not running
panic: Writing to the logger when it's not running

goroutine 39 [running]:
github.com/karlsen-network/karlsend/infrastructure/logger.(*Logger).printf(0xfe4180, 0x5, {0xad163c, 0x4}, {0xad725b, 0xb}, {0xc0000837c0, 0x1, 0x1})
	/home/runner/work/karlsend/infrastructure/logger/logs.go:185 +0x385
github.com/karlsen-network/karlsend/infrastructure/logger.(*Logger).Writef(...)
	/home/runner/work/karlsend/infrastructure/logger/logs.go:144
github.com/karlsen-network/karlsend/infrastructure/logger.(*Logger).Criticalf(0xc0000b61e0?, {0xad725b?, 0x0?}, {0xc0000837c0?, 0x0?, 0x0?})
	/home/runner/work/karlsend/infrastructure/logger/logs.go:127 +0x4c
github.com/karlsen-network/karlsend/util/panics.exit.func1()
	/home/runner/work/karlsend/util/panics/panics.go:58 +0xa6
created by github.com/karlsen-network/karlsend/util/panics.exit in goroutine 28
	/home/runner/work/karlsend/util/panics/panics.go:57 +0x14a

Root cause for this issue is that in simnet with very low difficulty we can mine and submit blocks very fast and while submitting an already mined block, the miner will find several others. It can be easily reproduced via:

2024-02-22 12:18:22.686 [INF] KSMN: Version 1.1.0
2024-02-22 12:18:22.693 [INF] KSMN: Connected to localhost:9000
2024-02-22 12:18:22.693 [INF] KSMN: Minimum average time per 10 blocks: 5ms, smaller minimum time per block: 50µs
2024-02-22 12:18:22.693 [INF] KSMN: Waiting for the initial template
2024-02-22 12:18:23.194 [INF] KSMN: Found block a05f1d311b3322287a4155ad0afc737b58898353b5d3aadff0000b3fb6ff19e0 with parents [c626b2c1e22b0d020a6e9da789402dfb1440c5085b465d2ba5d4ac327d813afe]
2024-02-22 12:18:23.195 [INF] KSMN: Found block c073e0f9741f075d481d67680360a9ae4f5f3649a1ceef029240c4a04415703b with parents [c626b2c1e22b0d020a6e9da789402dfb1440c5085b465d2ba5d4ac327d813afe]
2024-02-22 12:18:23.195 [INF] KSMN: Submitting block a05f1d311b3322287a4155ad0afc737b58898353b5d3aadff0000b3fb6ff19e0 to localhost:9000
2024-02-22 12:18:23.196 [INF] KSMN: Found block 4d77744b4e900372327c65fde4dd7929af4f603ad886ef06d58218216e64a4c6 with parents [c626b2c1e22b0d020a6e9da789402dfb1440c5085b465d2ba5d4ac327d813afe]
2024-02-22 12:18:23.196 [INF] KSMN: Found block 6f2ff9fc3afd14876a0d0fc8ad12971d01e46783ed111cde2835c37b02886fe4 with parents [c626b2c1e22b0d020a6e9da789402dfb1440c5085b465d2ba5d4ac327d813afe]
2024-02-22 12:18:23.196 [INF] KSMN: Found block 7fc9a1ce7004433309f90177cb4730044b681506379353f1c92e0e5550a40381 with parents [c626b2c1e22b0d020a6e9da789402dfb1440c5085b465d2ba5d4ac327d813afe]

The limit was only applied for submitting blocks, but not for mining blocks. It is fixed now.

1. Fixed race in gRPC client send/recv/close handler:

Write at 0x00c01e77fc90 by goroutine 6616:
  google.golang.org/grpc.(*clientStream).CloseSend()

Previous read at 0x00c01e77fc90 by goroutine 6615:
  google.golang.org/grpc.(*clientStream).SendMsg()

2. Fixed race in gRPC client error handler:

This is a workaround for current implementation to set RLock for
non-atomic error handler, some parts are atomic already.

It is also necessary to store new value 'isError' otherwise
concurrent connections can block each other but without lock no
atomic read/write is guaranteed. The gRPC client implementation
with a shared error handler between send/recv should be reworked.
The miner routines are asynchroneous and in stability tests the
miner regularly crashes when you set very low limit of blocks to
be mined, it will fail with:

2024-02-22 12:18:23.209 [CRT] KSMN: Exiting: Fatal error in goroutine `blocksLoop 7`: Writing to the logger when it's not running
panic: Writing to the logger when it's not running

goroutine 39 [running]:
github.com/karlsen-network/karlsend/infrastructure/logger.(*Logger).printf(0xfe4180, 0x5, {0xad163c, 0x4}, {0xad725b, 0xb}, {0xc0000837c0, 0x1, 0x1})
	/home/runner/work/karlsend/infrastructure/logger/logs.go:185 +0x385
github.com/karlsen-network/karlsend/infrastructure/logger.(*Logger).Writef(...)
	/home/runner/work/karlsend/infrastructure/logger/logs.go:144
github.com/karlsen-network/karlsend/infrastructure/logger.(*Logger).Criticalf(0xc0000b61e0?, {0xad725b?, 0x0?}, {0xc0000837c0?, 0x0?, 0x0?})
	/home/runner/work/karlsend/infrastructure/logger/logs.go:127 +0x4c
github.com/karlsen-network/karlsend/util/panics.exit.func1()
	/home/runner/work/karlsend/util/panics/panics.go:58 +0xa6
created by github.com/karlsen-network/karlsend/util/panics.exit in goroutine 28
	/home/runner/work/karlsend/util/panics/panics.go:57 +0x14a

Root cause for this issue is that in simnet with very low
difficulty we can mine and submit blocks very fast and while
submitting an already mined block, the miner will find several
others.

The limit was only applied for submitting blocks, but not for
mining blocks. It is fixed now.
Copy link

codecov bot commented Feb 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (feb42c2) 61.42% compared to head (7fe6504) 61.43%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #42      +/-   ##
==========================================
+ Coverage   61.42%   61.43%   +0.01%     
==========================================
  Files         646      646              
  Lines       31882    31898      +16     
==========================================
+ Hits        19583    19597      +14     
- Misses       9447     9449       +2     
  Partials     2852     2852              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lemois-1337
Copy link
Collaborator Author

  1. Inherited bug from kaspad and needs upstream discussion as it is a workaround for a buggy implementation.
  2. Inherited bug from kaspad and needs upstream discussion as it is a workaround for a buggy implementation.
  3. Inherited bug from kaspad and should be fixed in upstream as well as it stabilizes stability-tests.

@lemois-1337 lemois-1337 merged commit 7e93264 into karlsen-network:master Feb 22, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant