Kernel locking up CPU - restart of openvpn-server process fixes it #75

Open
33Fraise33 opened this issue Jan 21, 2025 · 4 comments

@33Fraise33

33Fraise33 commented Jan 21, 2025

Hello,

For a while now we have been seeing an issue where our server locks up: CPU usage goes to 100% (kernel usage, as reported by netdata). Memory usage then climbs and the system starts swapping until everything is full and a hard reboot is required. Clients also seem to start reconnecting constantly when CPU usage starts rising (we are not sure whether that is a result of the issue or its cause).

At that time I see many of the following messages in dmesg:

[79393.290437] ovpn_udp_encap_recv: cannot handle incoming packet from peer 5: -28
[79393.291940] ovpn_udp_encap_recv: cannot handle incoming packet from peer 5: -28
[79393.292828] ovpn_udp_encap_recv: cannot handle incoming packet from peer 5: -28
[79393.293573] ovpn_udp_encap_recv: cannot handle incoming packet from peer 5: -28
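
To gauge how often these messages occur and for which peers, the lines can be tallied from saved dmesg output. This is a minimal Python sketch that assumes the message format shown above; the function name `count_encap_errors` and the two-line sample are illustrative and not part of any OpenVPN tool.

```python
import re
from collections import Counter

# Matches lines such as:
# [79393.290437] ovpn_udp_encap_recv: cannot handle incoming packet from peer 5: -28
LINE_RE = re.compile(
    r"ovpn_udp_encap_recv: cannot handle incoming packet from peer (\d+): (-?\d+)"
)


def count_encap_errors(dmesg_text):
    """Return a Counter keyed by (peer_id, error_code)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


sample = """\
[79393.290437] ovpn_udp_encap_recv: cannot handle incoming packet from peer 5: -28
[79393.291940] ovpn_udp_encap_recv: cannot handle incoming packet from peer 5: -28
"""
for (peer, code), n in count_encap_errors(sample).items():
    print(f"peer {peer}: error {code} seen {n} times")
```

In practice the text would come from a file written with `dmesg > dmesg.txt` rather than from an inline sample.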

A bit further on there appears to be a kernel dump (hung-task reports with call traces):

[80233.842115]       Tainted: G           OE      6.1.0-30-amd64 #1 Debian 6.1.124-1
[80233.843110] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[80233.844111] task:rcu_tasks_rude_ state:D stack:0     pid:12    ppid:2      flags:0x00004000
[80233.845221] Call Trace:
[80233.845679]  <TASK>
[80233.846097]  __schedule+0x34d/0x9e0
[80233.846681]  schedule+0x5a/0xd0
[80233.847218]  schedule_timeout+0x118/0x150
[80233.847850]  wait_for_completion+0x86/0x160
[80233.848515]  __flush_work.isra.0+0x173/0x280
[80233.849240]  ? flush_workqueue_prep_pwqs+0x110/0x110
[80233.850033]  ? get_completed_synchronize_rcu+0x10/0x10
[80233.850777]  schedule_on_each_cpu+0xaf/0x100
[80233.851464]  rcu_tasks_one_gp+0x2fe/0x390
[80233.852041]  ? rcu_tasks_postscan+0x20/0x20
[80233.852496]  rcu_tasks_kthread+0x2e/0x40
[80233.852945]  kthread+0xd7/0x100
[80233.853324]  ? kthread_complete_and_exit+0x20/0x20
[80233.853845]  ret_from_fork+0x1f/0x30
[80233.854260]  </TASK>
[80233.854556] INFO: task khugepaged:38 blocked for more than 725 seconds.
[80233.855218]       Tainted: G           OE      6.1.0-30-amd64 #1 Debian 6.1.124-1
[80233.855970] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[80233.856732] task:khugepaged      state:D stack:0     pid:38    ppid:2      flags:0x00004000
[80233.857569] Call Trace:
[80233.857920]  <TASK>
[80233.858229]  __schedule+0x34d/0x9e0
[80233.858646]  schedule+0x5a/0xd0
[80233.859048]  schedule_timeout+0x118/0x150
[80233.859511]  wait_for_completion+0x86/0x160
[80233.860008]  __flush_work.isra.0+0x173/0x280
[80233.860489]  ? flush_workqueue_prep_pwqs+0x110/0x110
[80233.861046]  __lru_add_drain_all+0x147/0x1f0
[80233.861635]  khugepaged+0x63/0x970
[80233.862113]  ? cpuusage_read+0x10/0x10
[80233.862561]  ? collapse_pte_mapped_thp+0x5d0/0x5d0
[80233.863137]  kthread+0xd7/0x100
[80233.863703]  ? kthread_complete_and_exit+0x20/0x20
[80233.864479]  ret_from_fork+0x1f/0x30
[80233.865036]  </TASK>
[80233.865397] INFO: task EBPF_WORK[1]:3295 blocked for more than 120 seconds.
[80233.866112]       Tainted: G           OE      6.1.0-30-amd64 #1 Debian 6.1.124-1
[80233.866895] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[80233.867675] task:EBPF_WORK[1]    state:D stack:0     pid:3295  ppid:2958   flags:0x00004002
[80233.868509] Call Trace:
[80233.868880]  <TASK>
[80233.869272]  __schedule+0x34d/0x9e0
[80233.869720]  schedule+0x5a/0xd0
[80233.870142]  schedule_timeout+0x118/0x150
[80233.870636]  wait_for_completion+0x86/0x160
[80233.871136]  __wait_rcu_gp+0x12b/0x140
[80233.871597]  synchronize_rcu_tasks_rude+0x5e/0xd0
[80233.872134]  ? call_rcu_tasks+0x20/0x20
[80233.872615]  ? __bpf_trace_rcu_stall_warning+0x10/0x10
[80233.873225]  ftrace_shutdown.part.0+0xc5/0x1b0
[80233.873741]  ? 0xffffffffc027f000
[80233.874179]  unregister_ftrace_function+0x45/0x150
[80233.874746]  ? 0xffffffffc027f000
[80233.875184]  unregister_ftrace_direct_multi+0x46/0x90
[80233.875755]  bpf_trampoline_update+0x4fe/0x5d0
[80233.876295]  bpf_trampoline_unlink_prog+0x7a/0x100
[80233.876871]  bpf_tracing_link_release+0x12/0x40
[80233.877404]  bpf_link_free+0x4b/0x70
[80233.877861]  bpf_link_release+0x63/0x70
[80233.878343]  __fput+0x90/0x250
[80233.878742]  task_work_run+0x56/0x90
[80233.879217]  do_exit+0x352/0xaf0
[80233.879634]  do_group_exit+0x2d/0x80
[80233.880097]  get_signal+0x985/0x990
[80233.880541]  arch_do_signal_or_restart+0x3e/0x830
[80233.881081]  exit_to_user_mode_prepare+0x195/0x1e0
[80233.881634]  syscall_exit_to_user_mode+0x17/0x40
[80233.882160]  do_syscall_64+0x61/0xb0
[80233.882606]  ? release_pages+0x159/0x470
[80233.883076]  ? lru_gen_add_folio+0x2d0/0x2d0
[80233.883578]  ? folio_batch_move_lru+0xd3/0x150
[80233.884096]  ? _raw_spin_unlock+0xa/0x30
[80233.884576]  ? __handle_mm_fault+0xeef/0xfa0
[80233.885076]  ? rwsem_down_read_slowpath+0x3ed/0x4f0
[80233.885619]  ? _raw_spin_unlock_irqrestore+0xa/0x40
[80233.886162]  ? try_to_wake_up+0x68/0x570
[80233.886639]  ? rwsem_mark_wake+0x211/0x310
[80233.887127]  ? wake_up_q+0x4a/0x90
[80233.887544]  ? rwsem_wake.isra.0+0x69/0x90
[80233.888030]  ? up_read+0x4b/0x60
[80233.888433]  ? do_user_addr_fault+0x1b0/0x550
[80233.888954]  ? clear_bhb_loop+0x15/0x70
[80233.889416]  ? clear_bhb_loop+0x15/0x70
[80233.889890]  ? clear_bhb_loop+0x15/0x70
[80233.890337]  ? clear_bhb_loop+0x15/0x70
[80233.890772]  ? clear_bhb_loop+0x15/0x70
[80233.891220]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[80233.891748] RIP: 0033:0x7f60aefc5f16
[80233.892185] RSP: 002b:00007f60ab46ea80 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[80233.892922] RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00007f60aefc5f16
[80233.893653] RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f60a4000bd0
[80233.894357] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
[80233.895062] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f60a4000b80
[80233.895763] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f60a4000bd0
[80233.896487]  </TASK>
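
To get a quick overview of which tasks the kernel reported as blocked and for how long, the "INFO: task ... blocked for more than N seconds" lines can be extracted from the same output. A small Python sketch, assuming the line format seen above; `blocked_tasks` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# [80233.854556] INFO: task khugepaged:38 blocked for more than 725 seconds.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(dmesg_text):
    """Return (task_name, pid, seconds) tuples, one per hung-task report."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_RE.finditer(dmesg_text)
    ]


sample = """\
[80233.854556] INFO: task khugepaged:38 blocked for more than 725 seconds.
[80233.865397] INFO: task EBPF_WORK[1]:3295 blocked for more than 120 seconds.
"""
for name, pid, secs in blocked_tasks(sample):
    print(f"{name} (pid {pid}) blocked for more than {secs} s")
```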

OpenVPN version (the same issue existed with DCO 20231117 and with OpenVPN 2.6.12):

OpenVPN 2.6.13 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] [DCO]
library versions: OpenSSL 3.0.15 3 Sep 2024, LZO 2.10
DCO version: 0.2.20241216
Originally developed by James Yonan
Copyright (C) 2002-2024 OpenVPN Inc <[email protected]>
Compile time defines: enable_async_push=no enable_comp_stub=no enable_crypto_ofb_cfb=yes enable_dco=yes enable_dco_arg=yes enable_debug=yes enable_dependency_tracking=no enable_dlopen=unknown enable_dlopen_self=unknown enable_dlopen_self_static=unknown enable_fast_install=needless enable_fragment=yes enable_iproute2=no enable_libtool_lock=yes enable_lz4=yes enable_lzo=yes enable_maintainer_mode=no enable_management=yes enable_option_checking=no enable_pam_dlopen=no enable_pedantic=no enable_pkcs11=yes enable_plugin_auth_pam=yes enable_plugin_down_root=yes enable_plugins=yes enable_port_share=yes enable_selinux=no enable_shared=yes enable_shared_with_static_runtimes=no enable_silent_rules=no enable_small=no enable_static=yes enable_strict=no enable_strict_options=no enable_systemd=yes enable_unit_tests=no enable_werror=no enable_win32_dll=yes enable_wolfssl_options_h=yes enable_x509_alt_username=yes with_aix_soname=aix with_crypto_library=openssl with_gnu_ld=yes with_mem_check=no with_openssl_engine=auto with_sysroot=no

Kernel version (the issue has been occurring for longer, since before this specific kernel release):

# uname -a
Linux lionel 6.1.0-30-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.124-1 (2025-01-12) x86_64 GNU/Linux

Config:
/etc/openvpn/server/uav-server.conf

port 1194
proto udp
dev tun0
cipher AES-256-GCM

tun-mtu 1400

ca /etc/openvpn/server/uav-server/easy-rsa/pki/ca.crt
dh /etc/openvpn/server/uav-server/easy-rsa/pki/dh.pem
cert /etc/openvpn/server/uav-server/easy-rsa/pki/issued/server.crt
key /etc/openvpn/server/uav-server/easy-rsa/pki/private/server.key
tls-auth /etc/openvpn/server/uav-server/easy-rsa/pki/ta.key 0

allow-compression no

topology subnet
server 10.10.226.0 255.255.255.0
ifconfig-pool-persist /etc/openvpn/server/uav-server/ipp.txt
push "route 10.10.224.0 255.255.240.0"
push "route 10.11.0.0 255.255.0.0"

client-config-dir /etc/openvpn/server/uav-server/ccd

keepalive 10 120

user openvpn
group openvpn
persist-key
persist-tun
status uav-server-status.log
verb 3
mute 20
explicit-exit-notify 1

/etc/openvpn/server/ccd/uav1

ifconfig-push 10.10.226.2 255.255.255.0

Screenshots:

Image

journalctl logs from that time. After 10:46:13 all clients start reconnecting and the same log lines are repeated every second:

Jan 21 10:42:11 lionel openvpn[591]: MULTI_sva: pool returned IPv4=10.10.226.2, IPv6=(Not enabled)
Jan 21 10:42:11 lionel openvpn[591]: OPTIONS IMPORT: reading client specific options from: /etc/openvpn/server/uav-server/ccd/uav53
Jan 21 10:42:11 lionel openvpn[591]: MULTI: Learn: 10.10.226.54 -> uav53/pub-ip:55518
Jan 21 10:42:11 lionel openvpn[591]: MULTI: primary virtual IP for uav53/pub-ip:55518: 10.10.226.54
Jan 21 10:42:12 lionel openvpn[591]: uav53/pub-ip:55518 Data Channel: cipher 'AES-256-GCM', peer-id: 0
Jan 21 10:42:12 lionel openvpn[591]: uav53/pub-ip:55518 Timers: ping 10, ping-restart 240
Jan 21 10:42:12 lionel openvpn[591]: uav53/pub-ip:55518 Protocol options: explicit-exit-notify 1
Jan 21 10:42:12 lionel openvpn[591]: uav53/pub-ip:55518 PUSH: Received control message: 'PUSH_REQUEST'
Jan 21 10:42:12 lionel openvpn[591]: uav53/pub-ip:55518 SENT CONTROL [uav53]: 'PUSH_REPLY,route 10.10.224.0 255.255.240.0,route 10.11.0.0 255.255.0.0,route-gateway 10.10.226.1,topology subnet,ping 10,ping>
Jan 21 10:42:31 lionel openvpn[591]: uav53/pub-ip:58186 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Jan 21 10:42:31 lionel openvpn[591]: uav53/pub-ip:58186 TLS Error: TLS handshake failed
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 TLS: soft reset sec=3300/3300 bytes=0/-1 pkts=0/0
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 VERIFY OK: depth=1, CN=Easy-RSA CA
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 VERIFY OK: depth=0, CN=uav60
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_VER=2.4.4
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_PLAT=linux
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_PROTO=2
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_LZ4=1
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_LZ4v2=1
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_LZO=1
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_COMP_STUB=1
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_COMP_STUBv2=1
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 peer info: IV_TCPNL=1
Jan 21 10:43:10 lionel openvpn[591]: uav60/pub-ip2:6279 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bits RSA, signature: RSA-SHA256, peer temporary key: 253 bits>
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 TLS: soft reset sec=3263/3263 bytes=0/-1 pkts=0/0
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 VERIFY OK: depth=1, CN=Easy-RSA CA
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 VERIFY OK: depth=0, CN=uav57
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_VER=2.4.4
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_PLAT=linux
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_PROTO=2
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_LZ4=1
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_LZ4v2=1
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_LZO=1
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_COMP_STUB=1
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_COMP_STUBv2=1
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 peer info: IV_TCPNL=1
Jan 21 10:43:38 lionel openvpn[591]: uav57/pub-ip2:6148 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bits RSA, signature: RSA-SHA256, peer temporary key: 253 bits>
Jan 21 10:43:46 lionel openvpn[591]: uav53/pub-ip:58186 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Jan 21 10:43:46 lionel openvpn[591]: uav53/pub-ip:58186 TLS Error: TLS handshake failed
Jan 21 10:45:02 lionel openvpn[591]: uav53/pub-ip:58186 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
Jan 21 10:45:02 lionel openvpn[591]: uav53/pub-ip:58186 TLS Error: TLS handshake failed
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 VERIFY OK: depth=1, CN=Easy-RSA CA
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 VERIFY OK: depth=0, CN=uav43
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_VER=2.4.4
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_PLAT=linux
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_PROTO=2
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_NCP=2
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_LZ4=1
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_LZ4v2=1
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_LZO=1
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_COMP_STUB=1
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_COMP_STUBv2=1
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 peer info: IV_TCPNL=1
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 TLS: move_session: dest=TM_ACTIVE src=TM_INITIAL reinit_src=1
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 TLS: tls_multi_process: initial untrusted session promoted to trusted
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bits RSA, signature: RSA-SHA256, peer temporary key: 253 bits X25519
Jan 21 10:46:13 lionel openvpn[591]: pub-ip2:6728 [uav43] Peer Connection Initiated with [AF_INET]pub-ip2:6728
Jan 21 10:46:13 lionel openvpn[591]: uav43/pub-ip2:6728 dco_parse_peer_multi: cannot store DCO stats for peer 12
Jan 21 10:46:13 lionel openvpn[591]: MULTI: new connection by client 'uav43' will cause previous active sessions by this client to be dropped.  Remember to use the --duplicate-cn option if you want multiple cli>
Jan 21 10:46:13 lionel openvpn[591]: MULTI_sva: pool returned IPv4=10.10.226.2, IPv6=(Not enabled)
Jan 21 10:46:13 lionel openvpn[591]: OPTIONS IMPORT: reading client specific options from: /etc/openvpn/server/uav-server/ccd/uav43
Jan 21 10:46:13 lionel openvpn[591]: MULTI: Learn: 10.10.226.44 -> uav43/pub-ip2:6728
Jan 21 10:46:13 lionel openvpn[591]: MULTI: primary virtual IP for uav43/pub-ip2:6728: 10.10.226.44
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 VERIFY OK: depth=1, CN=Easy-RSA CA
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 VERIFY OK: depth=0, CN=uav48
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_VER=2.4.4
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_PLAT=linux
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_PROTO=2
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_NCP=2
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_LZ4=1
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_LZ4v2=1
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_LZO=1
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_COMP_STUB=1
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_COMP_STUBv2=1
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 peer info: IV_TCPNL=1
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 TLS: move_session: dest=TM_ACTIVE src=TM_INITIAL reinit_src=1
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 TLS: tls_multi_process: initial untrusted session promoted to trusted
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bits RSA, signature: RSA-SHA256, peer temporary key: 253 bits X255>
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:56638 [uav48] Peer Connection Initiated with [AF_INET]pub-ip:56638
Jan 21 10:46:14 lionel openvpn[591]: uav48/pub-ip:56638 dco_parse_peer_multi: cannot store DCO stats for peer 9
Jan 21 10:46:14 lionel openvpn[591]: MULTI: new connection by client 'uav48' will cause previous active sessions by this client to be dropped.  Remember to use the --duplicate-cn option if you want multiple cli>
Jan 21 10:46:14 lionel openvpn[591]: MULTI_sva: pool returned IPv4=10.10.226.2, IPv6=(Not enabled)
Jan 21 10:46:14 lionel openvpn[591]: OPTIONS IMPORT: reading client specific options from: /etc/openvpn/server/uav-server/ccd/uav48
Jan 21 10:46:14 lionel openvpn[591]: MULTI: Learn: 10.10.226.49 -> uav48/pub-ip:56638
Jan 21 10:46:14 lionel openvpn[591]: MULTI: primary virtual IP for uav48/pub-ip:56638: 10.10.226.49
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:57317 VERIFY OK: depth=1, CN=Easy-RSA CA
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:57317 VERIFY OK: depth=0, CN=uav49
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:57317 peer info: IV_VER=2.4.4
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:57317 peer info: IV_PLAT=linux
Jan 21 10:46:14 lionel openvpn[591]: pub-ip:57317 peer info: IV_PROTO=2
@ordex
Member

ordex commented Jan 21, 2025

Thanks for your report.
The following message

[79393.290437] ovpn_udp_encap_recv: cannot handle incoming packet from peer 5: -28

indicates that the RX queue is full and no more packets can be appended.
This happens because the packet processing code is not picking up new packets and therefore everything is stuck.
Why this is happening is not clear from the log. All we can see is that the system has stopped processing packets.
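
For reference, the trailing -28 follows the usual kernel convention of returning a negated errno value, and errno 28 on Linux is ENOSPC. A small Python sketch for decoding such codes, assuming it runs on a Linux host (errno numbering is platform-specific):

```python
import errno
import os

code = -28  # value printed at the end of the ovpn_udp_encap_recv message

# Kernel functions return negative errno values, so negate before the lookup.
name = errno.errorcode[-code]
print(name, "-", os.strerror(-code))
```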

Is this something easy to trigger?

@33Fraise33
Author

33Fraise33 commented Jan 21, 2025

> Is this something easy to trigger?

This is happening more and more frequently. There is no known way to trigger it, but currently it happens about once a day, usually several times in a row. We have now disabled DCO to see whether that makes the server stable.

Is there specific logging I could enable to further debug this?

@ordex
Member

ordex commented Jan 21, 2025

> Is there specific logging I could enable to further debug this?

Unfortunately there is no easy way, especially because we don't know what is failing.
Assuming we can reproduce it, one way would be to compile your kernel with various debugging instrumentation enabled to see whether more splats come out.

How many clients do you have connected when the problem starts?
Are they generating a high amount of traffic at that time, or has it also happened under low-traffic conditions?

@33Fraise33
Author

> Are they generating high amount of traffic at that time? Or it happened also under low traffic circumstances?

We have 15-20 clients connected. At the time of the issue we see about 10 Mbit/s in and out (about 4k packets/s).
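
As a rough sanity check on those figures, the average packet size can be derived from the two rates. This sketch assumes that "10mbit/s" means 10,000,000 bits per second and that "4kp/s" means 4,000 packets per second in the same direction; both interpretations are assumptions, and the function name is illustrative.

```python
def average_packet_size_bytes(bits_per_second, packets_per_second):
    """Average payload per packet implied by a bit rate and a packet rate."""
    return bits_per_second / 8 / packets_per_second


avg = average_packet_size_bytes(10_000_000, 4_000)
print(f"{avg:.1f} bytes per packet")  # prints "312.5 bytes per packet"
```

Under those assumptions the traffic averages a little over 300 bytes per packet, well below the configured tun-mtu of 1400.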
