
Running container fails with failed to add the host cannot allocate memory #1443

Open
hpakniamina opened this issue Nov 29, 2022 · 79 comments

@hpakniamina

hpakniamina commented Nov 29, 2022

OS: Red Hat Enterprise Linux release 8.7 (Ootpa)
Version:

$ sudo yum list installed | grep docker
containerd.io.x86_64                         1.6.9-3.1.el8                               @docker-ce-stable
docker-ce.x86_64                             3:20.10.21-3.el8                            @docker-ce-stable
docker-ce-cli.x86_64                         1:20.10.21-3.el8                            @docker-ce-stable
docker-ce-rootless-extras.x86_64             20.10.21-3.el8                              @docker-ce-stable
docker-scan-plugin.x86_64                    0.21.0-3.el8                                @docker-ce-stable

Out of hundreds of docker calls made over days, a few of them fail. This is the general shape of the command line:

/usr/bin/docker run \
-u 1771:1771 \
-a stdout \
-a stderr \
-v /my_path:/data \
--rm \
my_image:latest my_entry --my_args

The failure:

docker: Error response from daemon: failed to create endpoint recursing_aryabhata on network bridge: failed to add the host (veth6ad97f8) <=> sandbox (veth23b66ce) pair interfaces: cannot allocate memory.

It is not easily reproducible. The failure rate is less than one percent. At the time this error happens, the system has lots of free memory. Around the time the failure happens, the application is making around 5 docker calls per second, and each call takes about 5 to 10 seconds to complete.

@petergerten

I have the same issue on Arch, also not consistently reproducible.
Docker version 20.10.23, build 715524332f

@hpakniamina
Author

I have the same issue on Arch, also not consistently reproducible. Docker version 20.10.23, build 715524332f

I did not need the networking features of the container, so passing "--network none" on the docker run command line circumvented the problem:

docker run ... --network none ...
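Applied to the command line from the original post, the workaround looks like this:

/usr/bin/docker run \
--network none \
-u 1771:1771 \
-a stdout \
-a stderr \
-v /my_path:/data \
--rm \
my_image:latest my_entry --my_args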

@henryborchers

It's happening to me when I am building my images. Sadly, it too cannot be reproduced consistently.

docker build ...

@nixon89

nixon89 commented Feb 8, 2023

I have the same behavior with docker build command (cannot allocate memory)

# docker version

Client: Docker Engine - Community
 Version:           23.0.0
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        e92dd87
 Built:             Wed Feb  1 17:47:51 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.0
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.5
  Git commit:       d7573ab
  Built:            Wed Feb  1 17:47:51 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.16
  GitCommit:        31aa4358a36870b21a992d3ad2bef29e1d693bec
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
# apt list --installed | grep docker

docker-buildx-plugin/jammy,now 0.10.2-1~ubuntu.22.04~jammy amd64 [installed,automatic]
docker-ce-cli/jammy,now 5:23.0.0-1~ubuntu.22.04~jammy amd64 [installed]
docker-ce-rootless-extras/jammy,now 5:23.0.0-1~ubuntu.22.04~jammy amd64 [installed,automatic]
docker-ce/jammy,now 5:23.0.0-1~ubuntu.22.04~jammy amd64 [installed]
docker-compose-plugin/jammy,now 2.15.1-1~ubuntu.22.04~jammy amd64 [installed,automatic]
docker-scan-plugin/jammy,now 0.23.0~ubuntu-jammy amd64 [installed,automatic]

@hostalp

hostalp commented Feb 15, 2023

Exactly the same issue here during docker build.
Rocky Linux 8.7 (RHEL 8.7 clone), Docker 20.10.22-3.el8

@b-khouy

b-khouy commented Feb 15, 2023

I fixed the problem by running the docker builder prune command and then running the build again:
https://docs.docker.com/engine/reference/commandline/builder_prune
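For example (the -f flag only skips the confirmation prompt):

docker builder prune -f
docker build ...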

@hpakniamina
Author

I fixed the problem by running the docker builder prune command and then running the build again: https://docs.docker.com/engine/reference/commandline/builder_prune

If one is dealing with an intermittent problem, then there is no guarantee the issue is resolved.

@bendem

bendem commented Apr 4, 2023

Same problem here: every so often, a build fails with failed to add the host ( ) <=> sandbox ( ) pair interfaces: cannot allocate memory. System info:

$ dnf list --installed docker\* containerd\* | cat
Installed Packages
containerd.io.x86_64                    1.6.20-3.1.el8         @docker-ce-stable
docker-buildx-plugin.x86_64             0.10.4-1.el8           @docker-ce-stable
docker-ce.x86_64                        3:23.0.2-1.el8         @docker-ce-stable
docker-ce-cli.x86_64                    1:23.0.2-1.el8         @docker-ce-stable
docker-ce-rootless-extras.x86_64        23.0.2-1.el8           @docker-ce-stable
docker-compose-plugin.x86_64            2.17.2-1.el8           @docker-ce-stable
docker-scan-plugin.x86_64               0.23.0-3.el8           @docker-ce-stable

$ sudo docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 55
 Server Version: 23.0.2
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc version: v1.1.5-0-gf19387a
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.18.0-425.13.1.el8_7.x86_64
 Operating System: Rocky Linux 8.7 (Green Obsidian)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.4GiB
 Name: x
 ID: NUAJ:VDZR:RMDC:ASCP:5SEG:D4EF:OEIW:RY57:VXYI:5EZV:6F4F:D5RO
 Docker Root Dir: /opt/docker_data
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://x/
 Live Restore Enabled: false
 Default Address Pools:
   Base: 172.17.0.0/16, Size: 24
   Base: 172.20.0.0/16, Size: 24
   Base: 172.30.0.0/16, Size: 24

@bendem

bendem commented Apr 4, 2023

If I understand correctly, this is the same as https://bbs.archlinux.org/viewtopic.php?id=282429 which is fixed by this patch queued here.

@henryborchers

I don't know if this helps but it's happening to me on Rocky Linux 8.7 as well, just like @hostalp.

@pschoen-itsc

We have had the same issue on Ubuntu 20.04 for a few weeks.

@thaJeztah
Member

/cc @akerouanton FYI (I see a potential kernel issue mentioned above)

@pschoen-itsc

We have the problem with an older kernel (5.15), so I do not think that there is a connection with the mentioned kernel bug.

@XuNiLuS

XuNiLuS commented Jun 29, 2023

I have the same problem with a debian 12 (6.1.0-9-amd64), but no problem with a debian 11 (5.10.0-21-amd64)

@utrotzek

utrotzek commented Jul 3, 2023

Same problem on Ubuntu 22.04

Linux 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

This error is really annoying since all of our CI/CD Pipelines fail randomly.

@skast96

skast96 commented Jul 6, 2023

Same problem on Ubuntu 22.04 on a server

@WolleTD

WolleTD commented Jul 14, 2023

I have the same problem with a debian 12 (6.1.0-9-amd64), but no problem with a debian 11 (5.10.0-21-amd64)

Same here. Moved a server to Debian 12 that starts a new container once per day to create backups of docker volumes. After some days or weeks, starting the container fails until I restart the docker daemon.

@skast96

skast96 commented Jul 17, 2023

I compared three of my servers with the failing one and the only thing that is different is that the vm.swappiness is set to 0 and that the server has no swap activated at all. If that helps

@pschoen-itsc

I compared three of my servers with the failing one and the only thing that is different is that the vm.swappiness is set to 0 and that the server has no swap activated at all. If that helps

We disabled swap on our failing servers, but it did not help.

@skast96

skast96 commented Jul 17, 2023

I compared three of my servers with the failing one and the only thing that is different is that the vm.swappiness is set to 0 and that the server has no swap activated at all. If that helps

We disabled swap on our failing servers, but it did not help.

I thought of the other way around and wanted to check if enabling the swap helps 🤔

@utrotzek

I compared three of my servers with the failing one and the only thing that is different is that the vm.swappiness is set to 0 and that the server has no swap activated at all. If that helps

That was EXACTLY the case on our servers.

Changing the values in /etc/sysctl.conf

from

vm.swappiness = 0

to

vm.swappiness = 60

and applying it with sysctl -p

solved it for us. You saved my life! ;) I forgot that I had set this value.
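In short, the change as commands, assuming the entry lives in /etc/sysctl.conf exactly as quoted above:

sudo sed -i 's/^vm.swappiness = 0/vm.swappiness = 60/' /etc/sysctl.conf
sudo sysctl -p          # apply without rebooting
sysctl vm.swappiness    # verify the new value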

@skast96

skast96 commented Jul 31, 2023

Did anyone manage to fix this error without enabling swap? On that specific server it is not possible to enable swap....

@pschoen-itsc

Configured swap as suggested here (disabled image pruning before each run) and the error appeared again after a few days.

@skast96

skast96 commented Aug 13, 2023

I am still trying to fix this on Ubuntu 22.04 without swap. My next guess is that I misconfigured something in my compose files, which leads to a high number of network connections left open?!? I am not sure if that would fix the problem or if it really is due to a kernel error. I will report my findings here next week. If anyone has figured it out, please feel free to comment.

@hpakniamina
Author

My next guess is that I misconfigured something in my compose files

As mentioned before, we did not need the networking, so "--network none" helped us work around it. We don't use docker compose. We simply call docker a couple of thousand times; the container reads the input, writes the output, and is removed by "--rm". Our issue does not have anything to do with weird configurations or docker compose.

@AmirL

AmirL commented Aug 17, 2023

Have the same problem. vm.swappiness = 60 helped for a while, but now the problem is back again.

@cimchd

cimchd commented Aug 18, 2023

Same problem here. We have 89 services in our compose file. Running docker system prune before the build usually solves the problem temporarily.

@pschoen-itsc

Setting the nr_cpu boot parameter resolved the issue for us permanently.
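For reference, a minimal sketch of setting it via GRUB; note the kernel option is spelled nr_cpus=, the value 4 is only an example and should match the vCPUs actually assigned to the VM, and the exact GRUB variable/config path can differ per distribution:

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="... nr_cpus=4"

# regenerate the GRUB config and reboot
sudo update-grub                              # Debian/Ubuntu
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # RHEL/Rocky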

@JonasAlfredsson

Same goes for us: after doing the steps from my comment above, with nr_cpu set to the number of threads available to the system (grep -c processor /proc/cpuinfo), we haven't seen the previously hourly occurring problem for 3 months straight.

@attie-argentum

Thanks both for your responses - I've put that in place, and will report back if the issue continues! 🤞

@mumbleskates

Having never seen this before, I just had two gitlab-ci containers (launched by the native runner, not the docker-in-docker one) fail with this error at the same time. Only one allocation failure was logged to dmesg (seen below). The system is also running zfs, and the system root (and docker) are on btrfs. Swap is disabled, and the system has many gigabytes of free memory both before and after the page cache and the zfs ARC.

root@erebor ~ # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy
root@erebor ~ # uname -a
Linux erebor 6.5.0-15-generic #15~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 12 18:54:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
root@erebor ~ # zfs version
zfs-2.2.2-1
zfs-kmod-2.2.2-1
root@erebor ~ # 
dmesg logs
[906739.889741] dockerd: page allocation failure: order:5, mode:0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=docker.service,mems_allowed=0
[906739.889765] CPU: 52 PID: 1207114 Comm: dockerd Tainted: P           OE      6.5.0-15-generic #15~22.04.1-Ubuntu
[906739.889772] Hardware name: ASUS System Product Name/Pro WS WRX80E-SAGE SE WIFI, BIOS 1003 02/18/2022
[906739.889776] Call Trace:
[906739.889780]  
[906739.889786]  dump_stack_lvl+0x48/0x70
[906739.889796]  dump_stack+0x10/0x20
[906739.889801]  warn_alloc+0x174/0x1f0
[906739.889812]  ? __alloc_pages_direct_compact+0x20b/0x240
[906739.889822]  __alloc_pages_slowpath.constprop.0+0x914/0x9a0
[906739.889835]  __alloc_pages+0x31d/0x350
[906739.889847]  ? veth_dev_init+0x95/0x140 [veth]
[906739.889858]  __kmalloc_large_node+0x7e/0x160
[906739.889866]  __kmalloc.cold+0xc/0xa6
[906739.889875]  veth_dev_init+0x95/0x140 [veth]
[906739.889886]  register_netdevice+0x132/0x700
[906739.889895]  veth_newlink+0x190/0x480 [veth]
[906739.889931]  rtnl_newlink_create+0x170/0x3d0
[906739.889944]  __rtnl_newlink+0x70f/0x770
[906739.889959]  rtnl_newlink+0x48/0x80
[906739.889966]  rtnetlink_rcv_msg+0x170/0x430
[906739.889972]  ? srso_return_thunk+0x5/0x10
[906739.889980]  ? rmqueue+0x93d/0xf10
[906739.889985]  ? srso_return_thunk+0x5/0x10
[906739.889991]  ? __check_object_size.part.0+0x72/0x150
[906739.889999]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[906739.890005]  netlink_rcv_skb+0x5d/0x110
[906739.890020]  rtnetlink_rcv+0x15/0x30
[906739.890027]  netlink_unicast+0x1ae/0x2a0
[906739.890035]  netlink_sendmsg+0x25e/0x4e0
[906739.890047]  sock_sendmsg+0xcc/0xd0
[906739.890053]  __sys_sendto+0x151/0x1b0
[906739.890072]  __x64_sys_sendto+0x24/0x40
[906739.890078]  do_syscall_64+0x5b/0x90
[906739.890085]  ? srso_return_thunk+0x5/0x10
[906739.890091]  ? do_user_addr_fault+0x17a/0x6b0
[906739.890097]  ? srso_return_thunk+0x5/0x10
[906739.890102]  ? exit_to_user_mode_prepare+0x30/0xb0
[906739.890110]  ? srso_return_thunk+0x5/0x10
[906739.890116]  ? irqentry_exit_to_user_mode+0x17/0x20
[906739.890122]  ? srso_return_thunk+0x5/0x10
[906739.890128]  ? irqentry_exit+0x43/0x50
[906739.890133]  ? srso_return_thunk+0x5/0x10
[906739.890139]  ? exc_page_fault+0x94/0x1b0
[906739.890146]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[906739.890153] RIP: 0033:0x55d44da6700e
[906739.890190] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[906739.890194] RSP: 002b:000000c0013750c8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[906739.890201] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 000055d44da6700e
[906739.890206] RDX: 0000000000000074 RSI: 000000c001d0e880 RDI: 000000000000000c
[906739.890209] RBP: 000000c001375108 R08: 000000c0012a4910 R09: 000000000000000c
[906739.890213] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[906739.890216] R13: 000000c0016ba800 R14: 000000c00191c1a0 R15: 0000000000000011
[906739.890227]  
[906739.890231] Mem-Info:
[906739.890239] active_anon:4026084 inactive_anon:4572236 isolated_anon:0
                 active_file:356682 inactive_file:3106746 isolated_file:0
                 unevictable:7026 dirty:361241 writeback:0
                 slab_reclaimable:417679 slab_unreclaimable:1060505
                 mapped:3338536 shmem:3269641 pagetables:30618
                 sec_pagetables:8669 bounce:0
                 kernel_misc_reclaimable:0
                 free:651883 free_pcp:319 free_cma:0
[906739.890250] Node 0 active_anon:16104336kB inactive_anon:18288944kB active_file:1426728kB inactive_file:12426984kB unevictable:28104kB isolated(anon):0kB isolated(file):0kB mapped:13354144kB dirty:1444964kB writeback:0kB shmem:13078564kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 4063232kB writeback_tmp:0kB kernel_stack:32736kB pagetables:122472kB sec_pagetables:34676kB all_unreclaimable? no
[906739.890262] Node 0 DMA free:11260kB boost:0kB min:0kB low:12kB high:24kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[906739.890274] lowmem_reserve[]: 0 2713 257385 257385 257385
[906739.890289] Node 0 DMA32 free:1022764kB boost:0kB min:712kB low:3488kB high:6264kB reserved_highatomic:32768KB active_anon:678072kB inactive_anon:29120kB active_file:0kB inactive_file:64kB unevictable:0kB writepending:0kB present:2977184kB managed:2910992kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[906739.890301] lowmem_reserve[]: 0 0 254671 254671 254671
[906739.890315] Node 0 Normal free:1574012kB boost:0kB min:66864kB low:327648kB high:588432kB reserved_highatomic:839680KB active_anon:15426264kB inactive_anon:18259824kB active_file:1426728kB inactive_file:12426920kB unevictable:28104kB writepending:1444964kB present:265275392kB managed:260792020kB mlocked:28104kB bounce:0kB free_pcp:744kB local_pcp:0kB free_cma:0kB
[906739.890328] lowmem_reserve[]: 0 0 0 0 0
[906739.890340] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 0*1024kB 1*2048kB (M) 2*4096kB (M) = 11260kB
[906739.890389] Node 0 DMA32: 2173*4kB (UM) 987*8kB (UM) 586*16kB (UM) 374*32kB (UM) 666*64kB (UM) 451*128kB (UM) 295*256kB (UM) 136*512kB (UM) 60*1024kB (UM) 3*2048kB (M) 164*4096kB (UM) = 1022764kB
[906739.890440] Node 0 Normal: 22973*4kB (UME) 42920*8kB (UME) 30209*16kB (UMEH) 8817*32kB (UMEH) 5762*64kB (UMH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1569508kB
[906739.890481] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[906739.890485] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[906739.890489] 6735316 total pagecache pages
[906739.890492] 0 pages in swap cache
[906739.890495] Free swap  = 0kB
[906739.890497] Total swap = 0kB
[906739.890500] 67067142 pages RAM
[906739.890503] 0 pages HighMem/MovableOnly
[906739.890505] 1137549 pages reserved
[906739.890508] 0 pages hwpoisoned

@dounoit

dounoit commented Feb 19, 2024

all - I've been struggling with this running on an OpenVZ VPS instance - there is 72GB RAM allocated and not much used -

interesting thing is I can't edit sysctl for vm.swappiness, which is set to 60 - I wanted to try setting it to 0 but I apparently don't have permissions even though I'm obviously root -

I tried creating a swapfile and activating it - I get permission denied

this is my first time deploying docker to this infra - I'm trying to stack a bunch of containers on it but I'm getting the OOM now and containers just create/restart/fail - I just got the same OOM when the container tries to join the network - I tried using docker-compose directly vs. docker stack for testing and got the same error - I'll try the grub kernel flags - I sure hope this works! thanks!

lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy

docker info

Client: Docker Engine - Community
Version: 25.0.3
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.12.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.24.5
Path: /usr/libexec/docker/cli-plugins/docker-compose

Server:
Containers: 19
Running: 17
Paused: 0
Stopped: 2
Images: 19
Server Version: 25.0.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: x
Is Manager: true
ClusterID: x
Managers: 1
Nodes: 1
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 185.185.126.69
Manager Addresses:
x.x.x.x:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 5.2.0
Operating System: Ubuntu 22.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 72GiB
Name: xxxxx
ID: xxxx-xxx-xxx-xx-xxxx
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

WARNING: bridge-nf-call-ip6tables is disabled

@dounoit

dounoit commented Feb 19, 2024


this is interesting - no cpu? haha:

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 0
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPU family: 6
Model: 62
Thread(s) per core: 0
Core(s) per socket: 0
Socket(s): 0
Stepping: 4
BogoMIPS: 4400.16
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu cpuid_faulting pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
Virtualization features:
Virtualization: VT-x
Hypervisor vendor: Parallels
Virtualization type: container

@dounoit

dounoit commented Feb 19, 2024


and top shows 4 CPUs:

[screenshot: top output showing 4 CPUs]

@shankerwangmiao

shankerwangmiao commented Feb 20, 2024

Hi, all

I also hit this problem, and I think I have identified the cause.

It might be because the kernel changed its default behavior: when a veth pair is created without explicitly specifying the number of rx and tx queues, it now creates a queue for each possible CPU, whereas the original behavior was to create only one queue. Each queue requires 768 bytes of memory on one side of a veth pair, so servers with larger numbers of cores tend to hit this issue. I've reported the issue to the kernel mailing list.

I wonder if docker can explicitly specify 1 tx and rx queue when creating the veth pair to fix this?
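A small sketch for checking this on an affected host (interface names are throwaway examples):

cat /sys/devices/system/cpu/possible    # how many CPUs the kernel considers "possible"
# create a veth pair by hand with an explicit single queue, as suggested above
sudo ip link add vethtest0 numtxqueues 1 numrxqueues 1 type veth peer name vethtest1
ip -d link show vethtest0               # the detailed output includes the queue counts
sudo ip link del vethtest0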

@CoyoteWAN

@shankerwangmiao I saw the patch to veth.c based on what you reported. Is there any way you can walk us through applying the patch to module veth?

@shankerwangmiao

shankerwangmiao commented Mar 1, 2024

@shankerwangmiao I saw the patch to veth.c based on what you reported. Is there any way you can walk us through applying the patch to module veth?

The patch will be included in Linux 6.8 and backported to the Linux LTS versions, so I suggest waiting for the release of Linux 6.8 and the LTS releases, and then for the corresponding kernel release from your Linux distribution.

If you are really affected by this bug, I recommend downgrading your kernel to versions <= 5.14 provided by your linux distribution.

TL;DR: Sticking to the kernel versions provided by your Linux distribution is always a wise choice: either wait (I'll update this information when such releases are available) or downgrade.

Updates:

The fix has been included in the following kernel lts versions:

  • 5.15.151
  • 6.1.81
  • 6.6.21
  • 6.7.9

For debian users:

  • Bookworm: Upgrade to linux-image-6.1.0-19-amd64 or above;
  • Bullseye-backports: Upgrade to linux-image-6.1.0-0.deb11.21-amd64 or above;
  • Bullseye and older: not affected

For ubuntu users:

  • Noble: not affected;
  • Jammy (linux-image-generic users): Upgrade to linux-image-5.15.0-111-generic or above;
  • Jammy (linux-image-generic-hwe-22.04 users): Still affected;
  • Focal (linux-image-generic users): not affected;
  • Focal (linux-image-generic-hwe-20.04 users): Still affected;
If downgrading is not possible, and this must be fixed, the following procedure can be taken to build a patched veth.ko. Please note that using a custom patched kernel module might lead to unexpected consequences and might be DANGEROUS if carried out by an inexperienced person. Always back up and run tests before massive deployment. Proceed at your OWN RISK.
  1. Determine the current kernel version

  2. Download the source of the current kernel, and extract the veth.c from drivers/net/veth.c

    An alternative way to do this is to browse https://elixir.bootlin.com/linux/latest/source/drivers/net/veth.c, select the version on the left panel, and copy the source code on the right side.

  3. Install the development package of the current kernel version, which is provided by the Linux distribution and contains the headers needed to build a kernel module.

    This can be confirmed by ensuring the existence of /lib/modules/$(uname -r)/build

  4. Apply the patch https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/patch/?id=1ce7d306ea63f3e379557c79abd88052e0483813 to the extracted veth.c

  5. Prepare a kbuild file for the building of the module:

    obj-m += veth.o
    
  6. Prepare the build environment:

    • Using a non-root user
    • Create a new empty directory
    • Only put in two files, the patched veth.c and the above kbuild file named Kbuild
  7. Build the patched kernel module:

    • Change current dir to the above directory
    • Execute: make -C "/lib/modules/$(uname -r)/build/" M="$(pwd)" modules in above directory
    • Ensure veth.ko is generated in above directory
  8. Install the patched kernel module:

    • Copy the generated veth.ko to /lib/modules/$(uname -r)/updates: sudo install -Dm 644 veth.ko -t "/lib/modules/$(uname -r)/updates"
    • Regenerate module dependencies: sudo depmod "$(uname -r)"
    • Ensure the original veth module is overridden: sudo modinfo -k "$(uname -r)" veth and inspect the filename: field, which should contain the veth.ko in updates/ directory, rather than the original one in kernel/drivers/net/
  9. Replace the current loaded veth module

    • Stop all docker containers
    • Stop dockerd (including docker.service and docker.socket systemd units) to prevent the creation of new containers during the process
    • Use ip link show type veth to ensure no veth interfaces are present
    • Execute sudo rmmod veth to unload the currently loaded original veth module
    • Execute sudo modprobe -v veth to load the built patched veth module. The command should print the path of the actually loaded veth module. Confirm the loaded module is the patched one
    • Start docker daemon and all containers needed

The change made above will persist across reboots, as long as the next boot kernel is exactly the same as the currently running kernel. If the kernel version has been upgraded since this boot, execute the first 8 steps on the version of the kernel that will be booted into next time. Install the development package of that kernel in step 3, remember to create a fresh new directory in step 6, and replace all the $(uname -r) with the exact kernel release version of the next boot.

To revert the changes, simply remove the installed veth.ko from the updates/ directory and re-run depmod and follow the 9th step to replace the current loaded veth module.
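For convenience, a condensed sketch of steps 2-8 above; the source path is only an example, and it assumes the headers for the running kernel are installed and the patch still applies cleanly:

mkdir veth-patched && cd veth-patched
cp /path/to/kernel-source/drivers/net/veth.c .        # step 2 (path is an example)
curl -o veth-fix.patch 'https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/patch/?id=1ce7d306ea63f3e379557c79abd88052e0483813'
patch -p3 < veth-fix.patch                            # step 4; -p3 strips a/drivers/net/ so it hits ./veth.c
echo 'obj-m += veth.o' > Kbuild                       # step 5
make -C "/lib/modules/$(uname -r)/build/" M="$(pwd)" modules         # step 7
sudo install -Dm 644 veth.ko -t "/lib/modules/$(uname -r)/updates"   # step 8
sudo depmod "$(uname -r)"
modinfo -k "$(uname -r)" veth | grep '^filename'      # should point at updates/veth.ko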

@attie-argentum

Adding nr_cpu=56 in my case has allowed the system to run fine until yesterday... longer perhaps, but certainly not a "fix"

@bendem

bendem commented Apr 15, 2024

If you are really affected by this bug, I recommend downgrading your kernel to versions <= 5.14 provided by your linux distribution.

RHEL8 is affected with kernel 4.18.0-513.18.1.el8_9.x86_64. Has someone reported the problem to them already? Guessing they won't care since they don't support docker in the first place, but it probably has impact on other things.

@ExpliuM

ExpliuM commented Apr 16, 2024

We also suffer from this issue on RHEL 8.9 with kernel version 4.18.0-513.11.1.el8_9.x86_64

@pschoen-itsc

Adding nr_cpu=56 in my case has allowed the system to run fine until yesterday... longer perhaps, but certainly not a "fix"

The idea behind the nr_cpu workaround is to reduce the number of CPUs the kernel thinks the machine has. This works well with VMs, because one VM normally has far fewer cores than the host system could provide. If you want to use 56 cores, this workaround does not help much. For us, with smaller VMs (4-6 cores), it works without any problems.

@nblazincic

We are facing this issue with docker 26 on Ubuntu 22.04 LTS.
As far as I can see, neither the nr_cpu nor the vm.swappiness workaround fixed this issue.
Is this a confirmed kernel issue or a docker problem?

@shankerwangmiao

We are facing this issue with docker 26 on Ubuntu 22.04 LTS. As far as I can see, neither the nr_cpu nor the vm.swappiness workaround fixed this issue. Is this a confirmed kernel issue or a docker problem?

Can you look into the kernel startup log, and find the following line:

smpboot: Allowing XX CPUs, X hotplug CPUs

and see how many CPUs are allocated?
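For example, assuming the boot messages are still available:

dmesg | grep 'smpboot: Allowing'
# or, if the kernel ring buffer has already rotated:
journalctl -k -b | grep 'smpboot: Allowing'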

@nblazincic

@shankerwangmiao Thank you for your quick reply.
kernel: smpboot: Allowing 240 CPUs, 238 hotplug CPUs
Do you think nrcpu or disabling cpu hot add on hypervisor could fix the issue ?
Machines have 2 vCpus assigned

@shankerwangmiao

shankerwangmiao commented Apr 26, 2024

@shankerwangmiao Thank you for your quick reply. kernel: smpboot: Allowing 240 CPUs, 238 hotplug CPUs Do you think nrcpu or disabling cpu hot add on hypervisor could fix the issue ? Machines have 2 vCpus assigned

Yes, either specifying nr_cpus=2 or disabling CPU hot-add on the hypervisor side should work around this issue.

Currently, neither Debian nor Ubuntu has released a kernel package including this patch.

@rdelangh

I have exactly the same error.
Running ubuntu 23.10, kernel 6.5.0-28, 56 processors, 755GB RAM (744GB free)

Awaiting for the release (soon, normally) of Ubuntu 24.04 ... with (I assume) a patched kernel.

@nblazincic

@shankerwangmiao's solution was correct in our case. We have no more issues.
Thank you.

@rdelangh

rdelangh commented May 21, 2024

I have exactly the same error. Running ubuntu 23.10, kernel 6.5.0-28, 56 processors, 755GB RAM (744GB free)

Awaiting for the release (soon, normally) of Ubuntu 24.04 ... with (I assume) a patched kernel.

Still on my Ubuntu 23.10, I have downgraded the kernel to 5.15.151, because in the messages above this release is listed as one of the patched kernels:

# uname -r
5.15.151-0515151-generic

Using this Dockerfile:

FROM internetsystemsconsortium/bind9
ENV TZ MET
CMD [ "/usr/sbin/named", "-4", "-f", "-u", "bind" ]
VOLUME /store/central/dns/secondary /etc/bind9
VOLUME /dev/log /dev/log

I have built the image "my_named_img" and launched a container with the command "bash" so I can start the process interactively (and capture the errors):

$ podman run -p 10053:53 -p 10053:53/udp --name bind9-container-slave1 -it -e TZ=MET -v /store/central/dns/primary/cfg:/etc/bind -v /dev/log:/dev/log my_named_img:latest /bin/bash
root@c28c01733e24:/# apt install -y strace
root@c28c01733e24:/# strace -f /usr/sbin/named -4 -u bind 2>&1
...
[pid   485] mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid   485] brk(0x55dc3ad94000)         = 0x55dc3ad73000
[pid   485] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid   485] futex(0x7fd58f375210, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid   485] getpid()                    = 484
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 88, MSG_NOSIGNAL, NULL, 0) = 88
[pid   485] mprotect(0x7fd5880da000, 4096, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
[pid   485] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd380000000
[pid   485] munmap(0x7fd384000000, 67108864) = 0
[pid   485] mprotect(0x7fd380000000, 135168, PROT_READ|PROT_WRITE) = -1 ENOMEM (Cannot allocate memory)
[pid   485] munmap(0x7fd380000000, 67108864) = 0
[pid   485] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid   485] getpid()                    = 484
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 74, MSG_NOSIGNAL, NULL, 0) = 74
[pid   485] getpid()                    = 484
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 133, MSG_NOSIGNAL, NULL, 0) = 133
[pid   485] getpid()                    = 484
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 116, MSG_NOSIGNAL, NULL, 0) = 116
[pid   485] getpid()                    = 484
...
[pid   485] sendto(3, "<26>May 21 14:43:56 named[484]: "..., 66, MSG_NOSIGNAL, NULL, 0) = 66
[pid   485] rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
[pid   485] gettid()                    = 485
[pid   485] getpid()                    = 484
[pid   485] tgkill(484, 485, SIGABRT)   = 0
[pid   485] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=484, si_uid=100} ---
[pid   591] <... futex resumed>)        = 0
[pid   577] <... futex resumed>)        = ? <unavailable>
[pid   594] <... futex resumed>)        = 0
[pid   576] <... futex resumed>)        = 0
[pid   596] <... futex resumed>)        = ?
[pid   597] <... futex resumed>)        = ?
[pid   595] <... futex resumed>)        = ?
...
[pid   487] <... futex resumed>)        = ?
[pid   486] <... futex resumed>)        = ?
[pid   484] <... rt_sigtimedwait resumed> <unfinished ...>) = ?
[pid   547] +++ killed by SIGABRT (core dumped) +++
[pid   552] +++ killed by SIGABRT (core dumped) +++
[pid   560] +++ killed by SIGABRT (core dumped) +++
[pid   563] +++ killed by SIGABRT (core dumped) +++
[pid   483] <... read resumed>"", 1)    = 0
[pid   485] +++ killed by SIGABRT (core dumped) +++
[pid   484] +++ killed by SIGABRT (core dumped) +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=484, si_uid=100, si_status=SIGABRT, si_utime=2 /* 0.02 s */, si_stime=5 /* 0.05 s */} ---
exit_group(1)                           = ?
+++ exited with 1 +++
root@c28c01733e24:/#

What surprises me are the many different PIDs, as if "named" is spawning many child procs...

@doonydoo

doonydoo commented Jun 4, 2024

I had a vm.overcommit_memory = 2 and got same error
solution for me: vm.overcommit_memory = 0
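For reference, checking and changing that setting (the drop-in file name is just an example):

sysctl vm.overcommit_memory                   # 2 = strict accounting, 0 = kernel heuristic
sudo sysctl -w vm.overcommit_memory=0
echo 'vm.overcommit_memory = 0' | sudo tee /etc/sysctl.d/99-overcommit.conf   # persist across reboots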

@shankerwangmiao

Debian has released kernel packages with this fix:

  • Bookworm: Upgrade to linux-image-6.1.0-19-amd64 or above;
  • Bullseye-backports: Upgrade to linux-image-6.1.0-0.deb11.21-amd64 or above;
  • Bullseye and older: not affected

@rdelangh

rdelangh commented Jun 5, 2024

I had a vm.overcommit_memory = 2 and got same error solution for me: vm.overcommit_memory = 0

@shankerwangmiao : on my Ubuntu 24.04, kernel 6.8.0-31, my setting of the "overcommit_memory" is already equal to zero (0) but I still have the error

@shankerwangmiao

I had a vm.overcommit_memory = 2 and got same error solution for me: vm.overcommit_memory = 0

@shankerwangmiao : on my Ubuntu 24.04, kernel 6.8.0-31, my setting of the "overcommit_memory" is already equal to zero (0) but I still have the error

I believe 6.8.0 should not be affected by this. Can you show the output of uname -a? Also can you find the following line in kernel dmesg,

smpboot: Allowing XX CPUs, X hotplug CPUs

and see how many CPUs are allocated?

Can you see dockerd: page allocation failure: order:X in dmesg when the container fails to start?

@shankerwangmiao

I had a vm.overcommit_memory = 2 and got same error solution for me: vm.overcommit_memory = 0

@shankerwangmiao : on my Ubuntu 24.04, kernel 6.8.0-31, my setting of the "overcommit_memory" is already equal to zero (0) but I still have the error

I've seen your previously posted log, and I should say your problem is not related to this issue. Although the words "cannot allocate memory" appear in the title, this issue happens when the container runtime tries to create a veth pair before starting the container. In your case, I can clearly see that the container started successfully, entering a bash shell inside the container. The error happened when you started the named process in that container and it got -ENOMEM on the mprotect syscall, which is clearly not normal, but may have various causes.

After all, this issue tracker is for docker-related issues, while you are using podman ....

@rdelangh

rdelangh commented Jun 6, 2024

@shankerwangmiao : ok clear, indeed I seem to encounter another issue, not related to the veth interfaces. Sorry for the noise ;-)

@skast96

skast96 commented Oct 11, 2024

I am 100% sure it has something to do with our VPS hoster. If the kernel settings of the virtualized server are not set correctly, this error happens after some time => memory can't be released and fills up => a restart does fix the problem, as the memory is released when the VPS reboots. Our hoster https://www.easyname.at/en did give us more kernel space, but that only pushed the problem into an uncertain future.

@hufon

hufon commented Oct 30, 2024

Does anyone know if the problem exists on RHEL9 kernels?
The patch seems to be applied on RHEL9 kernels: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/blob/main/drivers/net/veth.c?ref_type=heads&blame=0#L1411
