crush root X not known #44

Open
debuggercz opened this issue Jun 2, 2024 · 8 comments

@debuggercz

Hi!

We are using two regions in the CRUSH map, where one is for new servers and the other is for old servers. Both regions are under a single root default. Unfortunately, we are unable to start the balancer. It seems that for some reason, it does not consider the root default and fails at that point.
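
For reference, a simplified sketch of the hierarchy described above (the attached ceph_osd_tree.txt has the real weights and hosts; names and weights here are purely illustrative):

ID   CLASS  WEIGHT  TYPE NAME
 -1         ...     root default
 -2         ...         region oldservers
 -4         ...             host osd001
  0   hdd   ...                 osd.0
 -3         ...         region newservers
 -5         ...             host osd032
  1   hdd   ...                 osd.1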

Attached you will find the output of "ceph osd tree", and below is the error encountered when attempting to run the balancer. Does anyone have any idea what could be wrong?

We are running Ceph version 16.2.12 (codename Pacific).

root@osd032 ~/ceph-balancer # git pull
Already up to date.
root@osd032 ~/ceph-balancer # python3 ./placementoptimizer.py -v balance --max-pg-moves 16
[2024-06-02 13:28:06,383] gathering cluster state via ceph api...
[2024-06-02 13:28:15,914] running pg balancer
Traceback (most recent call last):
  File "./placementoptimizer.py", line 5496, in <module>
    exit(main())
  File "./placementoptimizer.py", line 5490, in main
    run()
  File "./placementoptimizer.py", line 5454, in <lambda>
    run = lambda: balance(args, state)
  File "./placementoptimizer.py", line 4618, in balance
    need_simulation=True)
  File "./placementoptimizer.py", line 3265, in __init__
    self.init_analyzer.analyze(self)
  File "./placementoptimizer.py", line 4288, in analyze
    self._update_stats()
  File "./placementoptimizer.py", line 4385, in _update_stats
    avail, limit_osd = self.pg_mappings.get_pool_max_avail_limited(poolid)
  File "./placementoptimizer.py", line 3974, in get_pool_max_avail_limited
    osdid_candidates = self.cluster.candidates_for_pool(poolid).keys()
  File "./placementoptimizer.py", line 2324, in candidates_for_pool
    candidates = self.candidates_for_root(root_name)
  File "./placementoptimizer.py", line 2286, in candidates_for_root
    raise RuntimeError(f"crush root {root_name} not known?")
RuntimeError: crush root oldservers not known?

ceph_osd_tree.txt

Thank you very much for any advice and help.

Michal

@TheJJ
Owner

TheJJ commented Jun 2, 2024

Currently only root buckets work as a "take" base; support for arbitrary buckets is not yet implemented, unfortunately.
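
For context, the "take" step in a CRUSH rule names the bucket the rule starts descending from; an illustrative rule (not taken from this cluster) whose take base is a region rather than a root would look like:

rule oldservers_rule {
    id 1
    type replicated
    step take oldservers
    step chooseleaf firstn 0 type host
    step emit
}

The balancer currently expects that take target to be a bucket of type root.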

@debuggercz
Author

debuggercz commented Jun 3, 2024 via email

@TheJJ
Owner

TheJJ commented Jun 4, 2024

Hmm, strange. So there is a bucket "oldservers" of type root?
I think a state dump would be helpful here, too. Can you get one, please?

@debuggercz
Author

Hi Jonas,

"oldservers" and "newservers" are of the type region, but both are descendants of a root. I attached the output from "ceph osd tree" to the first post, where it can be seen. Do you need the output from "ceph osd crush dump"?

Thx
Michal

@TheJJ
Owner

TheJJ commented Jun 4, 2024

Yeah, that's the problem: they need to be of type root, sorry... but you can probably collect the "root" trees anyway by changing "root" to "region" here as a dirty hack:

if bucket["type_name"] == "root":   # change to region
    bucket_root_ids.append(id)

I have to rewrite the root node collection logic one day so we can support any top node type...
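
A rough sketch of what such a generalization could look like (this is not the balancer's current code; it only assumes the "ceph osd crush dump" bucket format, where every bucket lists its children under "items"):

def collect_top_bucket_ids(crush_dump):
    # Collect every id that appears as a child of some bucket...
    child_ids = set()
    for bucket in crush_dump["buckets"]:
        for item in bucket["items"]:
            child_ids.add(item["id"])
    # ...and keep the buckets nobody references, whatever their type_name is.
    return [bucket["id"] for bucket in crush_dump["buckets"]
            if bucket["id"] not in child_ids]

Collecting the buckets that are nobody's child would sidestep the type_name check entirely.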

@debuggercz
Author

debuggercz commented Jun 4, 2024 via email

@TheJJ
Owner

TheJJ commented Jun 6, 2024

Ah, apparently you also have default with type root.
No idea what will break, but please try:

if bucket["type_name"] in ("region", "root"):   # accept both bucket types
    bucket_root_ids.append(id)
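
If that gets past the candidate collection, re-running the same command as in the first post should show whether anything else breaks:

python3 ./placementoptimizer.py -v balance --max-pg-moves 16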

@debuggercz
Author

debuggercz commented Jun 10, 2024 via email
