Buggy or uncommon ACPI tables break xenopsd-xc startup (and thus XAPI's startup) #6240
Comments
Thanks, this was enabled (by default once no failures were found in the lab) to find buggy machines/buggy code ahead of enabling NUMA support in 6074aef (2023). The code could be changed to disable NUMA support ( For now, reverting the commit should make it work again for the affected user.
However, reverting the commit might cause xenopsd to call Lazy.force from a thread, which requires the backported OCaml runtime commit ocaml/ocaml@ed4695a to prevent Lazy.force from causing a segfault.
NUMAPlacement.make should return an option, IMO. Tentative commit: 5f49ba1
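To illustrate the idea of returning an option instead of raising, here is a minimal sketch. It is not the code from the tentative commit: the record type, the `make` signature and the sample matrix are all hypothetical, and the only rule taken from this issue is that a node's distance to itself must be 10.

```ocaml
(* Hypothetical sketch, not the xenopsd implementation: it only shows a
   constructor that returns [None] instead of raising Invalid_argument. *)

type t = {distances: int array array}

(* The check quoted in this issue expects a node-to-itself distance of 10. *)
let local_distance = 10

(* [make ~distances] returns [None] when the matrix is empty, not square, or
   has a diagonal entry other than 10, so the caller can disable NUMA for the
   host rather than exit. *)
let make ~distances =
  let n = Array.length distances in
  let square = Array.for_all (fun row -> Array.length row = n) distances in
  let diagonal_ok () =
    let ok = ref true in
    Array.iteri
      (fun i row -> if row.(i) <> local_distance then ok := false)
      distances ;
    !ok
  in
  if n > 0 && square && diagonal_ok () then Some {distances} else None

let () =
  (* Made-up matrix using 4294967295 (2^32 - 1), the value reported in this
     issue. The literal assumes a 64-bit OCaml. *)
  let bad = [|[|4294967295; 4294967295|]; [|4294967295; 10|]|] in
  match make ~distances:bad with
  | Some _ ->
      print_endline "NUMA enabled"
  | None ->
      print_endline "NUMA disabled: invalid distance matrix"
```

With this shape the caller has to handle the `None` case explicitly, so a bad table degrades to "no NUMA placement" instead of stopping the daemon.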
btw here is the ACPI spec https://uefi.org/specs/ACPI/6.5_A/06_Device_Configuration.html#system-locality-information-table that says
@stormi would be useful to see the full If that is indeed an unreachable NUMA node (e.g. one that has no online CPUs) then we can safely ignore it and still use the rest of the distance matrix.
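Ignoring an unreachable node while keeping the rest of the distance matrix could look roughly like the sketch below. The function names and the sample matrix are hypothetical, and the reachability test (self-distance equal to 10) is an assumption based on the error message in this issue, not on the xenopsd sources. The matrix is assumed to be square.

```ocaml
(* Hypothetical sketch, not xenopsd code: drop nodes that look unreachable and
   keep the sub-matrix of the remaining ones. Assumes a square matrix. *)

(* Assumption: a node is usable when its distance to itself is 10. *)
let reachable distances i = distances.(i).(i) = 10

(* Returns the indices of the kept nodes and the distances between them. *)
let prune distances =
  let n = Array.length distances in
  let kept = List.filter (reachable distances) (List.init n Fun.id) in
  let sub =
    List.map (fun i -> List.map (fun j -> distances.(i).(j)) kept) kept
  in
  (kept, sub)

let () =
  (* Made-up two-node matrix where node 1 reports 4294967295 everywhere. *)
  let distances = [|[|10; 4294967295|]; [|4294967295; 4294967295|]|] in
  let kept, sub = prune distances in
  Printf.printf "kept nodes: %s\n"
    (String.concat ", " (List.map string_of_int kept)) ;
  List.iter
    (fun row -> print_endline (String.concat " " (List.map string_of_int row)))
    sub
```

Returning the kept indices alongside the sub-matrix lets the caller map the pruned rows back to the original node numbers.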
Instead disable NUMA for the host Fixes xapi-project#6240 Signed-off-by: Pau Ruiz Safont <[email protected]>
The tentative commit from yesterday fixed the issue for the user.
I've got a xen-bugtool from the user, so I'll try to find the data you asked for.
acpidump.out.txt @psafont do you need tests from the user with your latest patch?
OK, so maybe not a buggy ACPI table after all. If node 1 has no memory then we should be ignoring it. Probably quite rare to have such a system (if it has no memory, then what is its purpose?)
A node is a node whether it has memory or not. (Mis)configurations like this were very common in the AMD MagnyCours days, and one OEM shipped a lot of systems with DIMMs populated in a less-optimal configuration.
FWIW here are the decoded SLIT and SRAT tables using
Ah, can we also have NUMA nodes with no CPUs in them?
Yes. If you e.g. downcore to a single CPU, and the package has multiple memory controllers, then you'll get a NUMA node with IO and RAM, but no (online) CPUs. Also, manually playing with Furthermore, In principle, a memory controller more hops than usual away from the cores could manifest like this, but I'm not aware of a platform which looks like this naturally. What's weird here is that the SLIT declares a single locality, while the SRAT lists two proximity domains. I expect Xen has filled in the blanks with
If they're up for it, sure. There was a bug in the previous one.
…nce matrices Instead disable NUMA for the host Fixes xapi-project#6240 Signed-off-by: Pau Ruiz Safont <[email protected]>
A user of XCP-ng has a regression in XCP-ng 8.3 when compared to XCP-ng 8.2. XAPI doesn't start completely (the last step started is [starting up database engine], which fails on a Unix.ECONNREFUSED for put_import_metadata).
Related or not, xenopsd also fails:
It apparently gets buggy distances from the ACPI tables, raises Invalid_argument("NUMA distance from node to itself must be 10: 4294967295"), and exits.
Is xenopsd too strict here? What would have to be done to solve it?
Reference forum thread: https://xcp-ng.org/forum/topic/10244/8-3-network-troubles/12