DecommissionStreamingErr is waiting for non-existing log lines #9501

Open

fruch opened this issue Dec 8, 2024 · 3 comments

Comments


fruch commented Dec 8, 2024

Packages

Scylla version: 6.3.0~dev-20241206.7e2875d6489d with build-id 5227dd2a3fce4d2beb83ec6c17d47ad2e8ba6f5c

Kernel Version: 6.8.0-1019-aws

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Not all of the log lines that the DecommissionStreamingErr nemesis waits for exist when raft topology is enabled.

see https://github.com/scylladb/scylladb/blame/f744007e13657491082ccf7023d07947f2b23ea1/service/storage_service.cc#L3711

all the "DECOMMISSIONING: .*" are in the branch used when raft topology is disable,

ABORT_DECOMMISSION_LOG_PATTERNS: Iterable[MessagePosition] = [
    MessagePosition("api - decommission", LogPosition.BEGIN),
    MessagePosition("DECOMMISSIONING: unbootstrap starts", LogPosition.BEGIN),
    MessagePosition("DECOMMISSIONING: unbootstrap done", LogPosition.END),
    MessagePosition("becoming a group 0 non-voter", LogPosition.END),
    MessagePosition("became a group 0 non-voter", LogPosition.END),
    MessagePosition("leaving token ring", LogPosition.END),
    MessagePosition("left token ring", LogPosition.END),
    MessagePosition("raft_topology - decommission: waiting for completion", LogPosition.BEGIN),
    MessagePosition("repair - decommission_with_repair", LogPosition.END)
]

Either those log patterns should be removed, or they should be put under the appropriate logic.
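As a sketch of the second option (this is not the SCT implementation, and which message belongs to which group is an assumption that still needs to be verified against storage_service.cc), the list could be split into a legacy group and a raft-topology group:

from typing import Iterable

# MessagePosition and LogPosition are the same SCT helpers used in the list above.
# The classification below is a guess based on the blame link in the description
# and must be verified before use.

ABORT_DECOMMISSION_LOG_PATTERNS_LEGACY: Iterable[MessagePosition] = [
    # "DECOMMISSIONING: .*" is only printed on the non-raft (legacy topology) path.
    MessagePosition("DECOMMISSIONING: unbootstrap starts", LogPosition.BEGIN),
    MessagePosition("DECOMMISSIONING: unbootstrap done", LogPosition.END),
]

ABORT_DECOMMISSION_LOG_PATTERNS_RAFT: Iterable[MessagePosition] = [
    MessagePosition("raft_topology - decommission: waiting for completion", LogPosition.BEGIN),
    MessagePosition("becoming a group 0 non-voter", LogPosition.END),
    MessagePosition("became a group 0 non-voter", LogPosition.END),
]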

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 4 nodes (i3en.2xlarge)

Scylla Nodes used in this run:

  • longevity-twcs-48h-master-db-node-00441d41-9 (54.194.53.160 | 10.4.9.27) (shards: 7)
  • longevity-twcs-48h-master-db-node-00441d41-8 (34.244.107.75 | 10.4.8.101) (shards: 7)
  • longevity-twcs-48h-master-db-node-00441d41-7 (63.33.71.172 | 10.4.10.42) (shards: 7)
  • longevity-twcs-48h-master-db-node-00441d41-6 (52.211.182.14 | 10.4.10.239) (shards: -1)
  • longevity-twcs-48h-master-db-node-00441d41-5 (34.251.145.212 | 10.4.8.74) (shards: 7)
  • longevity-twcs-48h-master-db-node-00441d41-4 (54.217.134.87 | 10.4.8.135) (shards: 7)
  • longevity-twcs-48h-master-db-node-00441d41-3 (18.200.239.48 | 10.4.8.44) (shards: 7)
  • longevity-twcs-48h-master-db-node-00441d41-2 (34.243.168.32 | 10.4.11.104) (shards: 7)
  • longevity-twcs-48h-master-db-node-00441d41-10 (34.245.46.14 | 10.4.11.13) (shards: 7)
  • longevity-twcs-48h-master-db-node-00441d41-1 (54.78.241.215 | 10.4.9.219) (shards: 7)

OS / Image: ami-0c7b4b0835c9342f7 (aws: undefined_region)

Test: longevity-twcs-48h-test
Test id: 00441d41-0edb-47a9-bbab-8f9e7a5b5821
Test name: scylla-master/tier1/longevity-twcs-48h-test
Test method: longevity_twcs_test.TWCSLongevityTest.test_custom_time
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 00441d41-0edb-47a9-bbab-8f9e7a5b5821
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 00441d41-0edb-47a9-bbab-8f9e7a5b5821

Logs:

Jenkins job URL
Argus

@fruch fruch removed their assignment Dec 8, 2024

fruch commented Dec 8, 2024

@temichus @aleksbykov, it seems like this logic has been there for almost a year now.

Can all of those log lines be validated to actually exist in the code, and when should they be waited for?
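One rough, hypothetical way to answer the first question is to grep a local checkout of the scylla sources for each awaited message. Note that this is only a first pass: messages built from format strings may not match a literal substring search, and some patterns include the logger-name prefix (e.g. "raft_topology - "), which is added at log time rather than appearing verbatim in the source, so misses still need a manual look.

from pathlib import Path

# Messages the nemesis currently waits for (copied from the list in the description).
PATTERNS = [
    "api - decommission",
    "DECOMMISSIONING: unbootstrap starts",
    "DECOMMISSIONING: unbootstrap done",
    "becoming a group 0 non-voter",
    "became a group 0 non-voter",
    "leaving token ring",
    "left token ring",
    "raft_topology - decommission: waiting for completion",
    "repair - decommission_with_repair",
]

scylla_src = Path("scylladb")  # assumed path to a local checkout of scylladb/scylladb
sources = list(scylla_src.rglob("*.cc"))
for pattern in PATTERNS:
    hits = [src for src in sources if pattern in src.read_text(errors="ignore")]
    print(f"{pattern!r}: {', '.join(map(str, hits)) if hits else 'NOT FOUND'}")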

@aleksbykov aleksbykov self-assigned this Dec 10, 2024
@aleksbykov commented:

> @temichus @aleksbykov, it seems like this logic has been there for almost a year now.
>
> Can all of those log lines be validated to actually exist in the code, and when should they be waited for?

The reason was that jobs are running with and without raft, and for different versions. Now that raft is always enabled, the log messages will be set to the actual state.


fruch commented Dec 10, 2024

> @temichus @aleksbykov, it seems like this logic has been there for almost a year now.
>
> Can all of those log lines be validated to actually exist in the code, and when should they be waited for?

> The reason was that jobs are running with and without raft, and for different versions. Now that raft is always enabled, the log messages will be set to the actual state.

But the code wasn't split to select only the correct group of log patterns according to whether raft is on or not, and if it was, that change didn't reach the 6.1 branch.
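For illustration, the selection being described could look roughly like the sketch below, reusing the split sketched earlier in this issue; the raft-detection helper is hypothetical, not an existing SCT API:

def abort_decommission_log_patterns(node) -> Iterable[MessagePosition]:
    # node_has_raft_topology_enabled() is a placeholder for however SCT actually
    # detects consistent topology changes (raft topology) on the cluster.
    if node_has_raft_topology_enabled(node):
        return ABORT_DECOMMISSION_LOG_PATTERNS_RAFT
    return ABORT_DECOMMISSION_LOG_PATTERNS_LEGACY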
