Cannot create probe on symbols with duplicate entries in kallsyms (multiple addresses) in recent kernels #3653

Closed
rafaeldtinoco opened this issue Nov 1, 2023 · 13 comments · Fixed by #3802

rafaeldtinoco commented Nov 1, 2023

Description

sudo ./dist/tracee --install-path /tmp/tracee \
--cache cache-type=mem --cache mem-cache-size=512 --output option:sort-events \
--output none --output option:parse-arguments --events openat
{"level":"warn","ts":1698863276.173086,"msg":"libbpf: prog 'trace_load_elf_phdrs': failed to create kprobe 'load_elf_phdrs+0x0' perf event: Cannot assign requested address"}

kprobe attachment fails in some "recent" (and LTS) kernels.

Output of tracee version:

Tracee version: "v0.18.0-rc-106-g7ba03ff68"

Output of uname -a:

Linux rugged 6.1.60-1-lts #1 SMP PREEMPT_DYNAMIC Wed, 25 Oct 2023 11:10:15 +0000 x86_64 GNU/Linux

Additional details

@rafaeldtinoco rafaeldtinoco added this to the v0.19.0 milestone Nov 1, 2023
@rafaeldtinoco rafaeldtinoco removed this from the v0.19.0 milestone Nov 1, 2023
@rafaeldtinoco
Contributor Author

I also got this with:

Linux rugged 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux

which makes me think it could be a kconfig related option (rather than a broken kernel or build).

@geyslan
Member

geyslan commented Nov 1, 2023

I also got this with:

Linux rugged 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux

which makes me think it could be a kconfig related option (rather than a broken kernel or build).

But it should log a warning if it is a kconfig miss.

@rafaeldtinoco rafaeldtinoco changed the title Tracee cant create kprobe on "load_elf_phdrs" during initialization Cannot create probe on symbols with duplicate entries in kallsyms (multiple addresses) in recent kernels Nov 1, 2023
@rafaeldtinoco rafaeldtinoco added this to the v0.20.0 milestone Nov 1, 2023
@rafaeldtinoco rafaeldtinoco self-assigned this Nov 1, 2023
@rafaeldtinoco
Contributor Author

Okay so the summary for this issue is the following, recent kernels have the following kernel commit:

commit b022f0c7e404
Author: Francis Laniel <[email protected]>
Date:   Fri Oct 20 07:42:49 2023

    tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols
    
    When a kprobe is attached to a function that's name is not unique (is
    static and shares the name with other functions in the kernel), the
    kprobe is attached to the first function it finds. This is a bug as the
    function that it is attaching to is not necessarily the one that the
    user wants to attach to.
    
    Instead of blindly picking a function to attach to what is ambiguous,
    error with EADDRNOTAVAIL to let the user know that this function is not
    unique, and that the user must use another unique function with an
    address offset to get to the function they want to attach to.
    
    Link: https://lore.kernel.org/all/[email protected]/
    
    Cc: [email protected]
    Fixes: 413d37d1eb69 ("tracing: Add kprobe-based event tracer")
    Suggested-by: Masami Hiramatsu <[email protected]>
    Signed-off-by: Francis Laniel <[email protected]>
    Link: https://lore.kernel.org/lkml/[email protected]/
    Acked-by: Masami Hiramatsu (Google) <[email protected]>
    Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

The commit first landed in the v6.6 kernel, but because it was also sent to the stable list, it has been backported to some older kernels. As distributions pick up those kernels, Tracee will start hitting this error on them.

This commit changes the behavior of kprobe attachment. When a kprobe is created on a symbol that isn't unique (for example, /proc/kallsyms lists multiple addresses for the same symbol name), the kernel no longer attaches to "one of them", as it did before this commit. Instead the attachment fails, resulting in an error similar to:

{"L":"WARN","T":"2023-11-29T16:36:36.013-0300","M":"libbpf: prog 'trace_load_elf_phdrs': failed to create kprobe 'load_elf_phdrs+0x0' perf event: Cannot assign requested address"}

My proposed course of action here is the following:

  • Replace the libbpfgo package with a local ./3rdparty/libbpfgo so you can change libbpfgo and test it right away.
  • Within Tracee's attachProbes() function, Tracee attaches all loaded eBPF programs to their hooks. The error most likely comes from this attachment phase. It could in principle come from the load phase, when the object is loaded, but that seems unlikely: if I remove load_elf_phdrs from the probes dependency, I don't get the error.
  • The TraceProbe type calls libbpfgo for attaching the hooks:
	switch p.probeType {
	case KProbe:
		link, err = prog.AttachKprobe(p.eventName)
	case KretProbe:
		link, err = prog.AttachKretprobe(p.eventName)
	case Tracepoint:
		tp := strings.Split(p.eventName, ":")
		tpClass := tp[0]
		tpEvent := tp[1]
		link, err = prog.AttachTracepoint(tpClass, tpEvent)
	case RawTracepoint:
		tpEvent := strings.Split(p.eventName, ":")[1]
		link, err = prog.AttachRawTracepoint(tpEvent)
	}

So both kprobes and kretprobes are affected. Both translate into a libbpfgo function that calls bpf_program__attach_kprobe_opts() (or similar). I believe that the struct bpf_kprobe_opts passed to the bpf_program__attachXXX() functions contains:

struct bpf_kprobe_opts {
	/* size of this struct, for forward/backward compatibility */
	size_t sz;
	/* custom user-provided value fetchable through bpf_get_attach_cookie() */
	__u64 bpf_cookie;
	/* function's offset to install kprobe to */
	size_t offset;
	/* kprobe is return probe */
	bool retprobe;
	/* kprobe attach mode */
	enum probe_attach_mode attach_mode;
	size_t :0;
};
#define bpf_kprobe_opts__last_field attach_mode

and that the "offset" here is the kernel symbol address. Example:

$ sudo cat /proc/kallsyms | grep " load_elf_phdrs"
ffffffff8f088eb0 t load_elf_phdrs
ffffffff8f08ba40 t load_elf_phdrs

Attach would fail for the eBPF program that uses this hook because there are two addresses for it. We can then make libbpfgo call bpf_program__attachXXX() twice: instead of using only the symbol name (as we do today), one call specifies ffffffff8f088eb0 as the offset and the other specifies ffffffff8f08ba40.

This would make our eBPF program run in both cases (whenever either of those two addresses is called as a function by the kernel). Today, on the buggy kernels (all of those without this fix), our eBPF programs run on only one of them, and we don't know which.

@rafaeldtinoco
Contributor Author

I believe #3798 needs to be sorted out first; then I need a libbpfgo change adding functions to attach kprobes to specific offsets, and then the real fix for this issue.

@rafaeldtinoco
Contributor Author

rafaeldtinoco commented Jan 11, 2024

With aquasecurity/libbpfgo#399 merged, I believe I can fix this issue in Tracee by passing the symbol offsets in the kprobe attachment. I'll be suggesting a PR soon.

@rafaeldtinoco rafaeldtinoco linked a pull request Jan 17, 2024 that will close this issue
@rafaeldtinoco
Contributor Author

Addressed by #3802

rafaeldtinoco added a commit that referenced this issue Jan 18, 2024
Changes probes interface "tracing" type so kprobe and kretprobes can be
attached using the kernel symbol addresses instead of names. This solves the
problem when the symbol has multiple addresses and the kernel refuses the
attachment (newer kernels) because of that.

#3653 (comment)

Fixes: #3653