
wip: new(driver/modern_bpf): home-made bpf_loop for sendmmsg and recvmmsg. #2233

Open · wants to merge 1 commit into master from new/homemade_bpf_loop
Conversation

@FedeDP (Contributor) commented Jan 15, 2025

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area driver-modern-bpf

Does this PR require a change in the driver versions?

What this PR does / why we need it:

For sendmmsg and recvmmsg in the modern_bpf probe we could not use the bpf_loop helper, because it caused verifier issues on kernels prior to 5.13 (see #2027 (comment)).
Therefore we used a loop capped at 16 iterations; we then noticed that 16 was too high, so we lowered the limit to 8 (600fefb).
This means we can only read the first 8 messages sent through sendmmsg and recvmmsg.

This PR's scope is to increase the limit up to 256 (in this first proposed draft, I cap it at 64).
The idea is to build an X macro that lets us easily chain tail calls.
The eBPF tail-call limit is MAX_TAIL_CALL_CNT, i.e. 32 on older kernels and 33 nowadays. For now, I capped the implementation at 8 chained tail calls (each tail call loops 8 times, thus 64 total).

The good: we can support up to 8x32 = 256 messages (a far better situation than now).
The bad: we need to extract network args at each iteration, because we can't share state between tail calls.
The ugly: well, the code gets a bit convoluted, but it does the trick. It's just plain old C anyway :)

Note also that, to make it a bit less verbose, I could create a new header with all the macros and share it between the two source files, since they are basically identical (naming aside). I decided to keep it as simple as possible, but it is a possibility (and it would be more future proof, since we would have a single place to update if needed).

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Putting it in WIP to allow a discussion.
THIS IS NOT FOR 0.20.0.

Does this PR introduce a user-facing change?:

NONE

@FedeDP (Contributor, PR author) commented Jan 15, 2025

/milestone TBD

@poiana poiana added this to the TBD milestone Jan 15, 2025
@poiana poiana added the size/L label Jan 15, 2025
@poiana (Contributor) commented Jan 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FedeDP

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana requested review from hbrueckner and leogr January 15, 2025 08:08
@FedeDP (Contributor, PR author) commented Jan 15, 2025

I want to hear @Andreagit97 and @Molter73 opinions on this one :)

github-actions bot commented Jan 15, 2025

Perf diff from master - unit tests

Warning:
Processed 36803 events and lost 1 chunks!

Check IO/CPU overload!

    11.27%     -0.69%  [.] sinsp::next
     1.53%     +0.52%  [.] next
     1.75%     -0.41%  [.] libsinsp::sinsp_suppress::process_event
     5.75%     -0.26%  [.] next_event_from_file
     0.47%     +0.26%  [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            +0.0125         +0.0125           146           148           146           148
BM_sinsp_split_median                                          +0.0168         +0.0168           146           148           146           148
BM_sinsp_split_stddev                                          +0.7774         +0.7748             1             1             1             1
BM_sinsp_split_cv                                              +0.7555         +0.7529             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  -0.0333         -0.0332            61            59            61            59
BM_sinsp_concatenate_paths_relative_path_median                -0.0336         -0.0336            61            59            61            59
BM_sinsp_concatenate_paths_relative_path_stddev                +0.5974         +0.5989             1             1             1             1
BM_sinsp_concatenate_paths_relative_path_cv                    +0.6523         +0.6539             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     -0.0452         -0.0452            25            24            25            24
BM_sinsp_concatenate_paths_empty_path_median                   -0.0437         -0.0437            25            24            25            24
BM_sinsp_concatenate_paths_empty_path_stddev                   -0.6335         -0.6332             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       -0.6162         -0.6158             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  -0.0114         -0.0114            64            63            64            63
BM_sinsp_concatenate_paths_absolute_path_median                -0.0136         -0.0136            64            63            64            63
BM_sinsp_concatenate_paths_absolute_path_stddev                +0.8003         +0.8003             0             1             0             1
BM_sinsp_concatenate_paths_absolute_path_cv                    +0.8210         +0.8210             0             0             0             0
BM_sinsp_split_container_image_mean                            -0.0052         -0.0051           390           388           390           388
BM_sinsp_split_container_image_median                          -0.0034         -0.0034           389           388           389           388
BM_sinsp_split_container_image_stddev                          -0.1510         -0.1514             3             3             3             3
BM_sinsp_split_container_image_cv                              -0.1466         -0.1470             0             0             0             0

codecov bot commented Jan 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.09%. Comparing base (8362ae9) to head (8e15871).
Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2233   +/-   ##
=======================================
  Coverage   75.09%   75.09%           
=======================================
  Files         276      276           
  Lines       34391    34391           
  Branches     5927     5927           
=======================================
  Hits        25826    25826           
  Misses       8565     8565           
Flag Coverage Δ
libsinsp 75.09% <ø> (ø)

Flags with carried forward coverage won't be shown.


github-actions bot commented Jan 15, 2025

X64 kernel testing matrix

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-4.19 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2-5.10 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
amazonlinux2023-6.1 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.0 🟢 🟢 🟢 🟢 🟢 🟢
archlinux-6.7 🟢 🟢 🟢 🟢 🟢 🟢
centos-3.10 🟢 🟢 🟢 🟡 🟡 🟡
centos-4.18 🟢 🟢 🟢 🟢 🟢 🟢
centos-5.14 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.17 🟢 🟢 🟢 🟢 🟢 🟢
fedora-5.8 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-3.10 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-4.14 🟢 🟢 🟢 🟢 🟢 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-5.4 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-4.15 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-5.8 🟢 🟢 🟢 🟢 🟢 🟡
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

ARM64 kernel testing matrix

KERNEL CMAKE-CONFIGURE KMOD BUILD KMOD SCAP-OPEN BPF-PROBE BUILD BPF-PROBE SCAP-OPEN MODERN-BPF SCAP-OPEN
amazonlinux2-5.4 🟢 🟢 🟢 🟢 🟢 🟡
amazonlinux2022-5.15 🟢 🟢 🟢 🟢 🟢 🟢
fedora-6.2 🟢 🟢 🟢 🟢 🟢 🟢
oraclelinux-4.14 🟢 🟢 🟢 🟡 🟡 🟡
oraclelinux-5.15 🟢 🟢 🟢 🟢 🟢 🟢
ubuntu-6.5 🟢 🟢 🟢 🟢 🟢 🟢

@FedeDP (Contributor, PR author) commented Jan 15, 2025

CI Build / run-e2e-tests-amd64 (bundled_deps) (pull_request) Failing after 21m

Need to understand what broke e2e-tests.

@FedeDP FedeDP force-pushed the new/homemade_bpf_loop branch from dd30172 to 5664af4 Compare January 15, 2025 11:00
With this one weird trick, bpf hates us!

Signed-off-by: Federico Di Pierro <[email protected]>
@FedeDP FedeDP force-pushed the new/homemade_bpf_loop branch from 5664af4 to 8e15871 Compare January 15, 2025 11:03
@Andreagit97 (Member) left a comment

The approach seems great, thank you! If we want to avoid the X macro, we can just use a single eBPF program as the tail-call target and save the state in a per-CPU map/array, so that each tail-called program can see the iteration number. In the end it should change almost nothing; it would probably just be easier to debug.

@FedeDP (Contributor, PR author) commented Jan 15, 2025

If we create a per-CPU map, we could also stop re-building the recvmmsg_data_t structure in each tail-called program, avoiding the unneeded extract__network_args calls.
I will look into that; for a first draft I wanted to avoid any additional map and see how it went.

3 participants