how to determine which "vmlinux" to use when working with kernel dumps? #41

Open
prakashsurya opened this issue Oct 14, 2019 · 1 comment · May be fixed by #286
Labels
enhancement New feature or request

Comments

@prakashsurya
Contributor

Currently, it's possible to debug a kernel crash dump with SDB by running something like this:

$ sdb /usr/lib/debug/boot/vmlinux-4.15.0-65-generic /var/crash/201910142024/dump.201910142024 -s /lib/modules/4.15.0-65-generic

The problem with this approach, though, is that nothing verifies that the crash dump (dump.201910142024) actually corresponds to the "vmlinux" file (vmlinux-4.15.0-65-generic) passed to SDB.

For example, if multiple kernels are installed in /usr/lib/debug/boot (e.g. because multiple kernel packages are installed), how should the user determine which "vmlinux" to use when starting SDB? Additionally, what could happen if the wrong "vmlinux" file was specified? My guess is SDB may break in silent and subtle ways (i.e. it may show the wrong data, without outright failing).

I think we should improve the usability of SDB and harden it against incorrect usage, by not requiring the user to pass in the "vmlinux". For example, the dump already records the release of the kernel that produced it:

$ sudo dd if=dump.201910142024 bs=1M count=1 2>/dev/null | strings | grep OSRELEASE
OSRELEASE=4.15.0-65-generic

Thus, we could modify SDB to look for this OSRELEASE information in the dump, and then automatically search the root filesystem for the "vmlinux" file that corresponds to that dump. Then, the interface for SDB could more simply be:

$ sdb /var/crash/201910142024/dump.201910142024

This would remove any possibility of the user passing in the wrong "vmlinux" file.
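
To make the proposal concrete, here is a rough sketch of the lookup in Python. It is not existing SDB code: the function names (find_osrelease, read_osrelease, vmlinux_for) are hypothetical, the 1 MiB scan limit simply mirrors the dd command above, and the /usr/lib/debug/boot/vmlinux-<release> layout is assumed from the example paths in this issue.

```python
import os
import re
from typing import Optional

# The dump carries a line such as "OSRELEASE=4.15.0-65-generic".
OSRELEASE_RE = re.compile(rb"OSRELEASE=([^\x00\n]+)")


def find_osrelease(data: bytes) -> Optional[str]:
    """Return the first OSRELEASE value found in raw dump bytes, or None."""
    match = OSRELEASE_RE.search(data)
    return match.group(1).decode("ascii", "replace") if match else None


def read_osrelease(dump_path: str, limit: int = 1 << 20) -> Optional[str]:
    """Scan the first `limit` bytes of the dump (1 MiB, like the dd command)."""
    with open(dump_path, "rb") as f:
        return find_osrelease(f.read(limit))


def vmlinux_for(release: str, debug_dir: str = "/usr/lib/debug/boot") -> str:
    """Build the expected vmlinux path; the directory layout is an assumption."""
    return os.path.join(debug_dir, "vmlinux-" + release)
```

SDB could then call read_osrelease() on the dump, pass the result to vmlinux_for(), and fail with a clear message if that file does not exist.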

@sdimitro sdimitro added the enhancement New feature or request label Oct 14, 2019
@prakashsurya
Contributor Author

I discussed this with @osandov, in particular how to prevent incorrect usage where the user specifies a vmlinux file that doesn't correspond to the vmcore file.

Omar suggested that the ideal solution would be to leverage the "build-id" that is generated during compilation. For example, if the "build-id" for the kernel that generated the vmcore was embedded in the vmcore, then we could read this "build-id" value, and then use it to determine the correct vmlinux to use. If the two "build-id" values from the vmcore and vmlinux files did not match, we could throw an error.

If we did it this way, the solution would work even for kernels of the same version that were built with different compilation flags, features, etc. The problem, though, is that the vmcore file does not currently contain the "build-id" information. Thus, to adopt this approach, we'd first need to extend the kernel and/or vmcore file format to embed the "build-id" into the vmcore file.
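
For reference, the vmlinux side of that comparison is straightforward. The sketch below parses raw ELF note bytes (for example, the contents of the .note.gnu.build-id section) and extracts the GNU build-id. It assumes little-endian notes, and parse_build_id and check_build_ids are hypothetical names. The vmcore-side value would have to come from whatever mechanism eventually embeds it, which does not exist yet.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # ELF note type used for the GNU build-id


def parse_build_id(notes: bytes) -> Optional[str]:
    """Return the hex build-id found in raw ELF note bytes, or None."""
    offset = 0
    while offset + 12 <= len(notes):
        # Each note starts with three 32-bit fields: namesz, descsz, type.
        namesz, descsz, ntype = struct.unpack_from("<III", notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # the name is padded to 4 bytes
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # so is the descriptor
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def check_build_ids(vmcore_id: str, vmlinux_id: str) -> None:
    """Raise ValueError when the two build-id values differ."""
    if vmcore_id != vmlinux_id:
        raise ValueError(
            f"build-id mismatch: vmcore has {vmcore_id}, vmlinux has {vmlinux_id}"
        )
```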

Until the vmcore file contains the "build-id", we think it'd be reasonable to leverage the "OSRELEASE" value as I mentioned in the bug report. Obviously this may not work for all cases, e.g. for two kernels of the same version but with different features enabled, but it would likely be better than nothing.

We'd also want to provide a way to disable the "OSRELEASE" verification, for cases where the operator wants to ignore the check.
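
A minimal sketch of what that check with an opt-out might look like. The check_vmlinux name and the verify parameter are hypothetical, and comparing against the vmlinux file name is only an assumption based on the vmlinux-<release> naming used in the paths above.

```python
import os


def check_vmlinux(dump_release: str, vmlinux_path: str, verify: bool = True) -> None:
    """Raise ValueError if the vmlinux file name does not match the dump's release."""
    if not verify:
        return  # the operator explicitly asked to skip the check
    expected = "vmlinux-" + dump_release
    actual = os.path.basename(vmlinux_path)
    if actual != expected:
        raise ValueError(
            f"{vmlinux_path} does not match the dump's OSRELEASE ({dump_release})"
        )
```

The verify parameter could be wired to a command-line option so that the check stays on by default and has to be turned off deliberately.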
