Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input/Output error with libc.so.6 when using udba=none on kernel 5.10+ during remount loop #44

Open
arturpzol opened this issue Aug 23, 2024 · 84 comments

Comments

@arturpzol
Copy link

I have encountered an issue with the AUFS filesystem that occurs sporadically during system startup. The error message I receive is:

error while loading shared libraries: libc.so.6: cannot open shared object file: Input/output error

Upon further investigation, I found that this issue also occurs when I run a remount loop:

mount |grep aufs
aufs on / type aufs (rw,relatime,sync,si=57b4c9ce733e61da,udba=none,nowarn_perm)
while true; do mount -o remount / ; done

After some time, the following errors appear:

mount: error while loading shared libraries: libc.so.6: cannot open shared object file: Input/output error
mount: error while loading shared libraries: libc.so.6: cannot open shared object file: Input/output error
mount: error while loading shared libraries: libc.so.6: cannot open shared object file: Input/output error

From that point on, it becomes impossible to run any command in the system as all attempts result in an "Input/output error."

The issue occurs on the latest kernel (6.10.6) and AUFS version (6.10-20240722).
After conducting some tests, I found that this problem started appearing with kernel 5.10 and has not occurred on kernel 5.9 (aufs 5.9-20210906). Something must have changed starting with kernel 5.10 that causes this issue to appear.

The issue only arises when udba=none is set while mounting AUFS. When udba=notify is used, the problem does not occur.

I understand that running a remount loop is not typical in normal environments, but the fact that it leads to such critical issues suggests a potential bug in AUFS.

@sfjro , would you be able to run a remount loop on your setup with AUFS using udba=none and check if the issue occurs for you as well?

@sfjro
Copy link
Owner

sfjro commented Aug 24, 2024 via email

@arturpzol
Copy link
Author

@sfjro thank you for attempting to reproduce the problem in your environment.

The issue I occasionally experience during system startup is very rare, and I have no way to retrieve any logs from that startup. However, the loop that remounts the main file system tends to trigger the problem relatively quickly. It's possible that these are two separate issues...

Answers to your questions:

  • How did you mount your root aufs at the beginning?
    initramfs + busybox + swtich_root?

yes exacly, it is initramfs + busybox + swtich_root

  • How may branches does your root aufs have?

The number of branches seems to be irrelevant here, as the problem occurs with both 3 and 10 branches, for example. However, in my case, I have this native configuration::

find /sys/fs/aufs/si_876424372235a759/*
/sys/fs/aufs/si_876424372235a759/br0
/sys/fs/aufs/si_876424372235a759/br1
/sys/fs/aufs/si_876424372235a759/br2
/sys/fs/aufs/si_876424372235a759/br3
/sys/fs/aufs/si_876424372235a759/br4
/sys/fs/aufs/si_876424372235a759/brid0
/sys/fs/aufs/si_876424372235a759/brid1
/sys/fs/aufs/si_876424372235a759/brid2
/sys/fs/aufs/si_876424372235a759/brid3
/sys/fs/aufs/si_876424372235a759/brid4
/sys/fs/aufs/si_876424372235a759/xi_path
cat /sys/fs/aufs/si_876424372235a759/*
/mnt/live/changes=rw
/mnt/live/images/ssh.lzm=rr+wh
/mnt/live/images/linux-6.10-oe64.lzm=rr+wh
/mnt/live/images/baseosx-apache.lzm=rr+wh
/mnt/live/images/baseos.lzm=rr+wh
64
68
67
66
65
/mnt/live/xino/.aufs.xino
  • Which branch contains libc.so.6?

baseos.lzm

I need to know which process got that error. Busybox(mount), /bin/mount,
or /sbin/mount.aufs? Could you try strace or something?

In the case of a bash loop that is executed and remounts /, it uses /bin/mount.

The initial problem that occurs during system startup is triggered by various commands. For example, I get:


INIT: Id "2" respawning too fast: disabled for 5 minutes
INIT: Id "3" respawning too fast: disabled for 5 minutes/sbin/getty: error while loading shared libraries: 
libc.so.6: cannot open shared object file: Input/output error
/sbin/getty: error while loading shared libraries: libc.so.6: cannot open shared object file: Input/
/sbin/getty: error while loading shared libraries: libc.so.6: cannot open shared object file: Input/
output error
INIT: Id "4" respawning too fast: disabled for 5 minutes
INIT: Id "5" respawning too fast: disabled for 5 minutes
INIT: Id "6" respawning too fast: disabled for 5 minutes
  • Are there any kernel log left?

No, there are no call traces or other errors in the kernel logs that would indicate the cause of the problem.

  • Have you installed aufs-util.git on your system?

aufs-util.git is installed.

  • Is your kernel patched by a linux distribution?

it is vanila 6.10.6 kernel with aufs 6.10-20240722 compiled by me.

  • How is your aufs configuration (define/undefine CONFIG_AUFS_xxx)?

AUFS configuration:

cat /sys/fs/aufs/config              
CONFIG_AUFS_FS=m
CONFIG_AUFS_BRANCH_MAX_127=y
CONFIG_AUFS_SBILIST=y
CONFIG_AUFS_HNOTIFY=y
CONFIG_AUFS_HFSNOTIFY=y
CONFIG_AUFS_EXPORT=y
CONFIG_AUFS_INO_T_64=y
CONFIG_AUFS_BDEV_LOOP=y
CONFIG_AUFS_DEBUG=y
CONFIG_AUFS_MAGIC_SYSRQ=y
  • How is your LSM setting?

no any LSM settings

Strace from while true; do strace mount -o remount / ; done

...
...
execve("/bin/mount", ["mount", "-o", "remount", "/"], [/* 18 vars */]) = 0
brk(0)                                  = 0x6daa000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f227000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=59821, ...}) = 0
mmap(NULL, 59821, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f058f211000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libmount.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\254\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=284096, ...}) = 0
mmap(NULL, 2383648, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058ec00000
mprotect(0x7f058ec43000, 2097152, PROT_NONE) = 0
mmap(0x7f058ee43000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x43000) = 0x7f058ee43000
mmap(0x7f058ee45000, 3872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058ee45000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20c\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=142728, ...}) = 0
mmap(NULL, 2246896, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058e800000
mprotect(0x7f058e821000, 2097152, PROT_NONE) = 0
mmap(0x7f058ea21000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x21000) = 0x7f058ea21000
mmap(0x7f058ea23000, 6384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058ea23000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1738176, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f226000
mmap(NULL, 3844640, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058e400000
mprotect(0x7f058e5a1000, 2097152, PROT_NONE) = 0
mmap(0x7f058e7a1000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a1000) = 0x7f058e7a1000
mmap(0x7f058e7a7000, 14880, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058e7a7000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libblkid.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\210\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=258688, ...}) = 0
mmap(NULL, 2358248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058e000000
mprotect(0x7f058e03b000, 2097152, PROT_NONE) = 0
mmap(0x7f058e23b000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3b000) = 0x7f058e23b000
mmap(0x7f058e23f000, 3048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058e23f000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpcre.so.3", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\27\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=448440, ...}) = 0
mmap(NULL, 2543976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058dc00000
mprotect(0x7f058dc6c000, 2097152, PROT_NONE) = 0
mmap(0x7f058de6c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6c000) = 0x7f058de6c000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=14664, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f225000
mmap(NULL, 2109712, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058d800000
mprotect(0x7f058d803000, 2093056, PROT_NONE) = 0
mmap(0x7f058da02000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f058da02000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libuuid.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \26\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=18904, ...}) = 0
mmap(NULL, 2113952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058d400000
mprotect(0x7f058d404000, 2093056, PROT_NONE) = 0
mmap(0x7f058d603000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7f058d603000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320n\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=137384, ...}) = 0
mmap(NULL, 2213008, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f058d000000
mprotect(0x7f058d018000, 2093056, PROT_NONE) = 0
mmap(0x7f058d217000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f058d217000
mmap(0x7f058d219000, 13456, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f058d219000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f224000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f223000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f20f000
arch_prctl(ARCH_SET_FS, 0x7f058f20f840) = 0
mprotect(0x7f058e7a1000, 16384, PROT_READ) = 0
mprotect(0x7f058d217000, 4096, PROT_READ) = 0
mprotect(0x7f058d603000, 4096, PROT_READ) = 0
mprotect(0x7f058da02000, 4096, PROT_READ) = 0
mprotect(0x7f058de6c000, 4096, PROT_READ) = 0
mprotect(0x7f058e23b000, 12288, PROT_READ) = 0
mprotect(0x7f058ea21000, 4096, PROT_READ) = 0
mprotect(0x7f058ee43000, 4096, PROT_READ) = 0
mprotect(0x608000, 4096, PROT_READ)     = 0
mprotect(0x7f058f220000, 4096, PROT_READ) = 0
munmap(0x7f058f211000, 59821)           = 0
set_tid_address(0x7f058f20fb10)         = 31590
set_robust_list(0x7f058f20fb20, 24)     = 0
rt_sigaction(SIGRTMIN, {0x7f058d0069b0, [], SA_RESTORER|SA_SIGINFO, 0x7f058d00f890}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f058d006a40, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f058d00f890}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
statfs("/sys/fs/selinux", 0x7ffcb7d1f500) = -1 ENOENT (No such file or directory)
statfs("/selinux", {f_type=0x61756673, f_bsize=4096, f_blocks=1013432, f_bfree=996590, f_bavail=940078, f_files=262144, f_ffree=255896, f_fsid={0, 0}, f_namelen=242, f_frsize=4096}) = 0
brk(0)                                  = 0x6daa000
brk(0x6dcb000)                          = 0x6dcb000
open("/proc/filesystems", O_RDONLY)     = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f21f000
read(3, "nodev\tsysfs\nnodev\ttmpfs\nnodev\tbd"..., 1024) = 413
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0x7f058f21f000, 4096)            = 0
getuid()                                = 0
geteuid()                               = 0
getuid()                                = 0
geteuid()                               = 0
getgid()                                = 0
getegid()                               = 0
prctl(PR_GET_DUMPABLE)                  = 1
stat("/etc/fstab", {st_mode=S_IFREG|0644, st_size=79, ...}) = 0
open("/etc/fstab", O_RDONLY|O_CLOEXEC)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=79, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f21f000
read(3, "aufs       /    rootfs  sync\t\t0 "..., 1024) = 79
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0x7f058f21f000, 4096)            = 0
stat("/sbin/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
stat("/sbin/fs.d/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
stat("/sbin/fs/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
getuid()                                = 0
geteuid()                               = 0
getgid()                                = 0
getegid()                               = 0
prctl(PR_GET_DUMPABLE)                  = 1
getuid()                                = 0
geteuid()                               = 0
getgid()                                = 0
getegid()                               = 0
prctl(PR_GET_DUMPABLE)                  = 1
stat("/run", {st_mode=S_IFDIR|0755, st_size=1200, ...}) = 0
lstat("/etc/mtab", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0
lstat("/run/mount/utab", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("/run/mount/utab", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 3
close(3)                                = 0
mount("aufs", "/", 0x6daa2d0, MS_MGC_VAL|MS_SYNCHRONOUS|MS_REMOUNT, NULL) = 0
access("/", W_OK)                       = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
-bash: /usr/bin/strace: Input/output error
-bash: /usr/bin/strace: Input/output error
-bash: /usr/bin/strace: Input/output error

It seems that the remount is successful because the exit code is 0, but everything afterward results in an Input/output error.

I enabled debug mode for AUFS, but there are a lot of logs, so I'm pasting a snippet where a -5 error occurs:

<7>[  283.876619][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: ld.so.cache, f_flags 0x8000, f_mode 0x801d
<7>[  283.876641][T10391] aufs au_reval_and_lock_fdi:755:mount[10391]: DEBUG: sigen 23, figen 22
<7>[  283.876651][T10391] aufs au_refresh_dentry:792:mount[10391]: DEBUG: dentry
<7>[  283.876654][T10391] d-1: 000000006c6ca199, etc/ld.so.cache?, aufs, cnt 1, flags 0x480040, hashed
<7>[  283.876661][T10391] i-1: 00000000fda5122e, i33, aufs, cnt 1, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 0, st 0x0, f 0x0, v 2, g d525d45
<7>[  283.876666][T10391] d-1: btop 0, bbot 0, bwh -1, bdiropq -1, gen 22, tmp 0
<7>[  283.876668][T10391] d0: 00000000ed6e068d, etc/ld.so.cache?, ext4, cnt 2, flags 0x480040, hashed
<7>[  283.876670][T10391] i0: 00000000c27efab9, i136782, ext4, cnt 2, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 15, st 0x0, f 0x0, v 1, g 28a3fb87
<7>[  283.876677][T10391] aufs au_lkup_dentry:231:mount[10391]: DEBUG: positive
<7>[  283.876679][T10391] aufs au_refresh_dentry:804:mount[10391]: DEBUG: dentry
<7>[  283.876681][T10391] d-1: 000000006c6ca199, etc/ld.so.cache?, aufs, cnt 1, flags 0x480040, hashed
<7>[  283.876683][T10391] i-1: 00000000fda5122e, i33, aufs, cnt 1, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 0, st 0x0, f 0x0, v 2, g d525d45
<7>[  283.876686][T10391] d-1: btop 0, bbot 0, bwh -1, bdiropq -1, gen 22, tmp 0
<7>[  283.876688][T10391] d0: 00000000ed6e068d, etc/ld.so.cache?, ext4, cnt 3, flags 0x480040, hashed
<7>[  283.876690][T10391] i0: 00000000c27efab9, i136782, ext4, cnt 2, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 15, st 0x0, f 0x0, v 1, g 28a3fb87
<7>[  283.876693][T10391] aufs au_refresh_dentry:810:mount[10391]: DEBUG: dentry
<7>[  283.876695][T10391] d-1: 000000006c6ca199, etc/ld.so.cache?, aufs, cnt 1, flags 0x480040, hashed
<7>[  283.876696][T10391] i-1: 00000000fda5122e, i33, aufs, cnt 1, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 0, st 0x0, f 0x0, v 2, g d525d45
<7>[  283.876699][T10391] d-1: btop 0, bbot 0, bwh -1, bdiropq -1, gen 22, tmp 0
<7>[  283.876701][T10391] d0: 00000000ed6e068d, etc/ld.so.cache?, ext4, cnt 3, flags 0x480040, hashed
<7>[  283.876702][T10391] i0: 00000000c27efab9, i136782, ext4, cnt 2, nl 1, 0100644, sz 59083, blk 120, hn -1, ct 36345, np 15, st 0x0, f 0x0, v 1, g 28a3fb87
<7>[  283.876706][T10391] aufs au_refresh_by_dinfo:663:mount[10391]: DEBUG: positive originally
<7>[  283.876707][T10391] aufs au_refresh_by_dinfo:676:mount[10391]: DEBUG: positive --> positive, same type
<7>[  283.876752][T10391] aufs aufs_atomic_open:352:mount[10391]: DEBUG: open_flag 0100000
<7>[  283.876754][T10391] aufs aufs_atomic_open:353:mount[10391]: DEBUG: dentry
<7>[  283.876756][T10391] d-1: 0000000075755c48, x86_64-linux-gnu/libmount.so.1?, aufs, cnt 1, flags 0x10000000, unhashed
<7>[  283.876759][T10391] i-1: err 0
<7>[  283.876766][T10391] aufs au_lkup_dentry:231:mount[10391]: DEBUG: positive
<7>[  283.876777][T10391] aufs au_new_inode:414:mount[10391]: DEBUG: i2381
<7>[  283.876785][T10391] aufs au_new_inode:420:mount[10391]: DEBUG: 8, new 1
<7>[  283.876798][T10391] aufs aufs_atomic_open:374:mount[10391]: DEBUG: dentry
<7>[  283.876799][T10391] d-1: 0000000075755c48, x86_64-linux-gnu/libmount.so.1?, aufs, cnt 1, flags 0x600000, hashed
<7>[  283.876801][T10391] i-1: 00000000bcc9aa1d, i2381, aufs, cnt 1, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 0, st 0x0, f 0x0, v 2, g d5261ab
<7>[  283.876805][T10391] d-1: btop 14, bbot 14, bwh -1, bdiropq -1, gen 23, tmp 0
<7>[  283.876807][T10391] d14: 00000000683fcaae, x86_64-linux-gnu/libmount.so.1?, squashfs, cnt 1, flags 0x680040, hashed
<7>[  283.876809][T10391] i14: 00000000fbbaf3f7, i2887, squashfs, cnt 2, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[  283.876813][T10391] aufs au_aopen_no_open:326:mount[10391]: DEBUG: here
<7>[  283.876815][T10391] aufs aufs_atomic_open:446:mount[10391]: DEBUG: file
<7>[  283.876816][T10391] f-1: mode 0x1, flags 0100000, cnt 1, v 0, pos 0
<7>[  283.876819][T10391] d-1: 0000000075755c48, x86_64-linux-gnu/libmount.so.1?, aufs, cnt 2, flags 0x600000, hashed
<7>[  283.876820][T10391] i-1: 00000000bcc9aa1d, i2381, aufs, cnt 1, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 0, st 0x0, f 0x0, v 2, g d5261ab
<7>[  283.876825][T10391] aufs aufs_get_link:1366:mount[10391]: DEBUG: page_get_link
<7>[  283.876834][T10391] aufs aufs_get_link:1367:mount[10391]: DEBUG: h_dentry
<7>[  283.876835][T10391] d-1: 00000000683fcaae, x86_64-linux-gnu/libmount.so.1?, squashfs, cnt 2, flags 0x680040, hashed
<7>[  283.876836][T10391] i-1: 00000000fbbaf3f7, i2887, squashfs, cnt 2, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[  283.876844][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: libmount.so.1.1.0, f_flags 0x8000, f_mode 0x801d
<7>[  283.876943][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: libselinux.so.1, f_flags 0x8000, f_mode 0x801d
<7>[  283.877015][T10391] aufs aufs_get_link:1378:mount[10391]: DEBUG: err -5
<7>[  283.877088][T10391] aufs aufs_get_link:1378:mount[10391]: DEBUG: err -5
<7>[  283.878446][T10392] aufs aufs_open_nondir:90:bash[10392]: DEBUG: mount, f_flags 0x8020, f_mode 0x801d
<7>[  283.878467][T10392] aufs aufs_get_link:1366:bash[10392]: DEBUG: page_get_link
<7>[  283.878471][T10392] aufs aufs_get_link:1367:bash[10392]: DEBUG: h_dentry
<7>[  283.878473][T10392] d-1: 000000007e2efab5, lib64/ld-linux-x86-64.so.2?, squashfs, cnt 2, flags 0x680040, hashed
<7>[  283.878477][T10392] i-1: 00000000ff8c3b4a, i3147, squashfs, cnt 2, nl 1, 0120777, sz 32, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[  283.878484][T10392] aufs aufs_open_nondir:90:bash[10392]: DEBUG: ld-2.19.so, f_flags 0x8020, f_mode 0x801d
<7>[  283.878709][T10392] aufs aufs_open_nondir:90:mount[10392]: DEBUG: ld.so.cache, f_flags 0x8000, f_mode 0x801d
<7>[  283.878734][T10392] aufs aufs_get_link:1366:mount[10392]: DEBUG: page_get_link
<7>[  283.878738][T10392] aufs aufs_get_link:1367:mount[10392]: DEBUG: h_dentry
<7>[  283.878740][T10392] d-1: 00000000683fcaae, x86_64-linux-gnu/libmount.so.1?, squashfs, cnt 2, flags 0x680040, hashed
<7>[  283.878743][T10392] i-1: 00000000fbbaf3f7, i2887, squashfs, cnt 2, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[  283.878748][T10392] aufs aufs_open_nondir:90:mount[10392]: DEBUG: libmount.so.1.1.0, f_flags 0x8000, f_mode 0x801d
<7>[  283.878806][T10392] aufs aufs_open_nondir:90:mount[10392]: DEBUG: libselinux.so.1, f_flags 0x8000, f_mode 0x801d
<7>[  283.878862][T10392] aufs aufs_get_link:1378:mount[10392]: DEBUG: err -5
<7>[  283.878890][T10392] aufs aufs_get_link:1378:mount[10392]: DEBUG: err -5
<7>[  283.886528][T10393] aufs aufs_open_nondir:90:bash[10393]: DEBUG: mount, f_flags 0x8020, f_mode 0x801d
<7>[  283.886568][T10393] aufs aufs_get_link:1366:bash[10393]: DEBUG: page_get_link
<7>[  283.886577][T10393] aufs aufs_get_link:1367:bash[10393]: DEBUG: h_dentry
<7>[  283.886579][T10393] d-1: 000000007e2efab5, lib64/ld-linux-x86-64.so.2?, squashfs, cnt 2, flags 0x680040, hashed
<7>[  283.886585][T10393] i-1: 00000000ff8c3b4a, i3147, squashfs, cnt 2, nl 1, 0120777, sz 32, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[  283.886632][T10393] aufs aufs_open_nondir:90:bash[10393]: DEBUG: ld-2.19.so, f_flags 0x8020, f_mode 0x801d
<7>[  283.887012][T10393] aufs aufs_open_nondir:90:mount[10393]: DEBUG: ld.so.cache, f_flags 0x8000, f_mode 0x801d
<7>[  283.887044][T10393] aufs aufs_get_link:1366:mount[10393]: DEBUG: page_get_link
<7>[  283.887048][T10393] aufs aufs_get_link:1367:mount[10393]: DEBUG: h_dentry
<7>[  283.887050][T10393] d-1: 00000000683fcaae, x86_64-linux-gnu/libmount.so.1?, squashfs, cnt 2, flags 0x680040, hashed
<7>[  283.887053][T10393] i-1: 00000000fbbaf3f7, i2887, squashfs, cnt 2, nl 1, 0120777, sz 17, blk 0, hn -1, ct 55296, np 1, st 0x0, f 0x0, v 0, g 0
<7>[  283.887061][T10393] aufs aufs_open_nondir:90:mount[10393]: DEBUG: libmount.so.1.1.0, f_flags 0x8000, f_mode 0x801d
<7>[  283.887141][T10393] aufs aufs_open_nondir:90:mount[10393]: DEBUG: libselinux.so.1, f_flags 0x8000, f_mode 0x801d
<7>[  283.887209][T10393] aufs aufs_get_link:1378:mount[10393]: DEBUG: err -5
<7>[  283.887249][T10393] aufs aufs_get_link:1378:mount[10393]: DEBUG: err -5

I noticed that in the kernel between 5.9 and 5.10 (from which the problem started) there were several changes in aufs_remount_fs, au_remount_refresh and the like. Maybe the problem started somewhere here?

@sfjro
Copy link
Owner

sfjro commented Aug 28, 2024 via email

@arturpzol
Copy link
Author

@sfjro

If you change /etc/fstab, these stat(".../mount.rootfs") will change to
".../mount.aufs" (and should be found and executed).
But it may NOT be related to the problem.

Even after switching from rootfs to aufs in fstab and using the mount.aufs command, the input/output error still persists.

Why 2? Threaded? Logging order?

This issue might be related to the order in which logs are being written via syslog.

This may be an unknown bug. I will investigate more.

Thank you.

If you need any additional logs or debugging information, please let me know. I will also try to analyze the issue on my side.

@sfjro
Copy link
Owner

sfjro commented Aug 29, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Aug 29, 2024 via email

@arturpzol
Copy link
Author

@sfjro in the aufs_debug_io_error.log file, I have included logs from the enabled debug mode in AUFS (kernel v6.10.6) along with the additional proposed logging for digen and btop. The first occurrence of the input/output error happened around the timestamp value 952.

OK then, did you run "mount --move" for your all branches before you
mount aufs?
I want to make sure that your branhces (/mnt/live/changes and
/mnt/live/iamges/*.lzm) are living there after swtich_root.

Yes, I perform a mount --move on my branches before mounting AUFS.

@sfjro
Copy link
Owner

sfjro commented Aug 30, 2024 via email

@arturpzol
Copy link
Author

@sfjro

The previous log had to be truncated by logrotate. Sorry for that.

I am resending the logs aufs_io_error.log , but without debugging enabled in aufs (echo 1 > /sys/module/aufs/parameters/debug) because with debugging enabled, it is very difficult to reproduce the problem due to the large amount of generated logs. My system cannot handle processing such a large number of logs.

Anyway let me know if you need the full logs with debugging enabled in aufs, and I will keep trying until I succeed.

@sfjro
Copy link
Owner

sfjro commented Aug 31, 2024 via email

@arturpzol
Copy link
Author

@sfjro additional logs aufs_debug_io_error1.log with debug for remount.

Thank You

@sfjro
Copy link
Owner

sfjro commented Aug 31, 2024 via email

@arturpzol
Copy link
Author

arturpzol commented Aug 31, 2024

@sfjro

What kind of logs do you think are missing (debug from fsctx.c file ?) The last logs were generated while skipping syslog. I'm using the dmesg -n8 && cat /proc/kmsg command, so it seems strange that log_buf_len would matter here. All kernel logs are currently shown in the terminal, then I copy the contents of the terminal and put them in a log file (additionally, I set the terminal history to infinite). I do this on purpose to bypass system services such as syslog which would not be able to write anything in the event of an input/output error.

The cron job is intentionally set to run every minute because the more commands are executed on the system during the remount, the faster I can reproduce the problem.

@sfjro
Copy link
Owner

sfjro commented Sep 1, 2024 via email

@arturpzol
Copy link
Author

@sfjro

I'm sending the logs again aufs_debug_io_error2.log. I set log_buf_len to 10M and made sure that each log with sigen information for a given process is duplicated.

Thank You

@sfjro
Copy link
Owner

sfjro commented Sep 3, 2024 via email

@arturpzol
Copy link
Author

error" logs, but no "aufs_get_link: err -5" log at all. I am not sure
where EIO comes from.

Could it be that debug wasn't enabled in AUFS? As I mentioned, it's difficult to reproduce the issue when debug is enabled, but I kept trying until I succeeded, and the file aufs_debug_io_error2.log now contains another log, this time with debug enabled.

Would you apply this another additional debug patch?

I'm having a bit of a problem here: despite the patch being applied (I double-checked), there is no debug (from the last patch) information in the log. It's as if the condition is always false and never enters the pr_info function.

However, I hope that the log with debugging enabled (echo 1 > /sys/module/aufs/parameters/debug) will be now more useful.

@sfjro
Copy link
Owner

sfjro commented Sep 4, 2024 via email

@arturpzol
Copy link
Author

@sfjro

Despite applying the potential fix, the problem is still reproducible. I am sending debug logs aufs_debug_io_error_with_potential_fix.log with the latest patch applied.

Please note that the issue only occurs when udba=none. When udba=notify is set, the problem does not occur.

@sfjro
Copy link
Owner

sfjro commented Sep 7, 2024 via email

@arturpzol
Copy link
Author

@sfjro

Unfortunately, the proposed second fix causes a kernel panic. Here is the log aufs_panic_fix2.log

Now I want you to find all the directories named 'x86_64-linux-gnu',

I have two branches that have a directory /lib/x86_64-linux-gnu and /usr/lib/x86_64-linux-gnu with various library files.

and all the entries (regular file, symlink, etc) named 'libc.so.6' on

Only one branch has libc.so.6.

@sfjro
Copy link
Owner

sfjro commented Sep 11, 2024 via email

@arturpzol
Copy link
Author

@sfjro

And it is a symlink on your 14th branch?
What is the target, and where is it?

Yes, it is symlink on 14th branch which target is libc-2.19.so. Target is also on the same 14th branch.

ls -al /lib/x86_64-linux-gnu/libc.so.6
lrwxrwxrwx 1 root root 12 May 17  2016 /lib/x86_64-linux-gnu/libc.so.6 -> libc-2.19.so

Please modify fs/aufs/dcsub.c (which is patched by my last patch) like
this. (see attached)

Now it looks like the fix is ​​working and the issue is gone. Is this the target fix or just for analysis purposes?

@sfjro
Copy link
Owner

sfjro commented Sep 14, 2024 via email

@arturpzol
Copy link
Author

@sfjro

May I ask you more tests?

Yes, of course. Please write what additional tests I should do.

@sfjro
Copy link
Owner

sfjro commented Sep 18, 2024 via email

@arturpzol
Copy link
Author

@sfjro

I removed all previous patches and applied only the last one (changes in au_do_refresh_d function) you sent and the issue is reproducible.

I hope this helps.

@sfjro
Copy link
Owner

sfjro commented Sep 18, 2024 via email

@arturpzol
Copy link
Author

@sfjro

I am a bit confused about what I should test because there were several proposed patches in this thread (debugging, a potential fix that didn’t work, a fix that caused a kernel panic, and a change that potentially resolved the issue), and now I’m not sure what to revert and what to keep.

In this regard, could you please prepare a patch on the clean AUFS code that I should test?

@sfjro
Copy link
Owner

sfjro commented Sep 19, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Nov 17, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Nov 17, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Nov 18, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Nov 18, 2024 via email

@arturpzol
Copy link
Author

@sfjro

But did you forget applying the patch for the first part of
fs/aufs/i_op_ren:aufs_rename()?

I checked, I have those changes applied, but I don't have CONFIG_AUFS_DEBUG enabled. Maybe that's why?

Hold on.
That part must be less important. You don't have to apply and test
again. I will try another debugging.

OK

I'm feeling sorry to ask you many times, but please apply this
additional debug patch (with the previous debug patch you already
applied) and send me the log.

I’m trying to reproduce the issue with this additional debug patch, but with such a large amount of logs, I haven’t been able to reproduce it so far. However, I’m still trying. The additional debugging might have prevented the race condition from occurring.

I also noticed that the issue is reproducible only on real fast hardware. On a virtual machine, the problem cannot be reproduced.

@arturpzol
Copy link
Author

@sfjro

After many attempts, I was finally able to reproduce the issue. The logs are here

@arturpzol
Copy link
Author

@sfjro

Here is a progress brief from me.
I'm afraid there is something more about this issue.
Applying the previous patch is not enough, it caused another
symptoms. They are all related to the ref-count at the remount time.
I'm digging deeper but it is not easy.

I have a question regarding the initial Input/Output error issue. Are you able to reproduce this problem on your system? I managed to reproduce it on Puppy Linux. I can prepare a description of how to replicate it if you need this.

@arturpzol
Copy link
Author

@sfjro

After adding a spin_lock for inode structure to the au_set_nlink function you recently added, the issue seems to be non-reproducible (or at least harder to reproduce). After nearly 100 restarts, the warning has not occurred even once. Do you think we can use locking in this place?

diff --git a/fs/aufs/vfsub.h b/fs/aufs/vfsub.h
index 9a7d87c..9df208b 100644
--- a/fs/aufs/vfsub.h
+++ b/fs/aufs/vfsub.h
@@ -63,8 +63,10 @@ static inline void au_set_nlink(struct inode *inode, unsigned int nlink)
         * stop setting the value equal to the current one, in order to stop
         * a useless warning from vfs:destroy_inode() about sb->s_remove_count.
         */
+       spin_lock(&inode->i_lock);
        if (nlink != inode->i_nlink)
                set_nlink(inode, nlink);
+       spin_unlock(&inode->i_lock);
 }

@sfjro
Copy link
Owner

sfjro commented Nov 21, 2024 via email

@arturpzol
Copy link
Author

@sfjro

I cannot reproduce the original problem (and destory_inode() warning
too). I appriciate your offer. I will try AFTER solving destory_inode()
warning.

To reproduce the issue with Input/Output errors, try running F96-CE_4.iso (https://f96.puppylinux.com/), as it uses AUFS by default. It can also be run as virtual machine.

First, remount / with udba=none:

mount -o remount,udba=none /

Then, execute the following script to repeatedly remount /:

for i in {1..10}; do
    while true; do mount -o remount / ; done &
done

Wait for a while, as the Input/Output error should occur.

I know that such repeated remounting is not a real-world scenario, but it accelerates the occurrence of the problem.

Currently I agree that the simple spinlock is best
for this problem

Great, thank you.

@sfjro
Copy link
Owner

sfjro commented Nov 26, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Nov 26, 2024 via email

@arturpzol
Copy link
Author

@sfjro

With your refactor, the warning also seems to be non-reproducible. Are you planning to commit these changes?

@sfjro
Copy link
Owner

sfjro commented Nov 29, 2024 via email

sfjro added a commit to sfjro/aufs-linux that referenced this issue Dec 1, 2024
By the commit in linux-v3.3-rc1
        7ada4db 2012-01-06 vfs: count unlinked inodes
vfs:__destroy_inode() became available to produce a warning about
sb->s_remove_count.
Basically inode->i_nlink should be referenced by anytime, and
protected by inode_lock() or something in changing-time.
In aufs, Artur Piechocki found a problem and it is necessary to be
protected by another lock.
The problem is a warning produced by VFS:__destroy_inode() about
superblock->s_remove_count. I am not sure whether the warning appears
since linux-v3.3 or not. Some other recent (much later than v3.3)
changes in mainline MAY be related to the lifetime of inode or its
link count. On my test environment, the warning never appeared.

Here aufs introduces a spinlock dedicated to i_nlink only.
In aufs, every user of i_nlink and VFS functions for it should call
this new function regardless the inode is aufs or not.

Reported by Artur Piechocki on github.

See-also: sfjro/aufs-standalone#44
Signed-off-by: J. R. Okajima <[email protected]>
sfjro added a commit to sfjro/aufs-linux that referenced this issue Dec 1, 2024
For the details, see previous commit (i_nlink 1/2).

See-also: sfjro/aufs-standalone#44
Signed-off-by: J. R. Okajima <[email protected]>
@sfjro
Copy link
Owner

sfjro commented Dec 1, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Dec 18, 2024 via email

@arturpzol
Copy link
Author

arturpzol commented Dec 18, 2024

@sfjro

Unfortunately, I'm not very familiar with Puppy Linux either, but I believe this distribution has a package manager called ppm (Puppy Package Manager). Try running it, and you should be able to install strace from there.

My goal in using Puppy Linux was to reproduce the issue on a different distribution to verify if the problem is specific to my Linux or not.

Debugging the kernel directly here might be challenging since it would likely require enabling some debugging options, but maybe @peabee could help with that.

Thanks for your engagement on this topic!

@peabee
Copy link

peabee commented Dec 18, 2024

You need to download and sfs-load the devx.sfs (hopefully has strace)......
https://mega.nz/folder/j0JQ2RaZ#Uiw3eA8MBOhOxHnwxqRKNg

@sfjro
Copy link
Owner

sfjro commented Dec 18, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Dec 18, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Dec 18, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Dec 19, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Dec 20, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Dec 21, 2024 via email

@sfjro
Copy link
Owner

sfjro commented Dec 23, 2024 via email

@arturpzol
Copy link
Author

@sfjro

Thank you for your analysis, and I apologize for the late reply—it’s been a busy holiday period.

The issue with the /bin/mount shell script does not apply to my environment (let’s remember that Puppy Linux was just an example environment used to reproduce the issue.), as in my case, this file is a binary executable:

file /bin/mount
/bin/mount: setuid ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=a89552e3517b8f88cf73cd84459570a767c54123, stripped
file /sbin/mount.aufs
/sbin/mount.aufs: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.32, BuildID[sha1]=a1a32f0dde8c42c3b06d46f4dde52a3feaf3b5b3, stripped

Which, from the perspective of the Makefile in aufs.utils, should be fine.

MountCmd=/bin/mount
UmountCmd=/bin/umount
override CPPFLAGS += -DMOUNT_CMD=\"${MountCmd}\"
override CPPFLAGS += -DUMOUNT_CMD=\"${UmountCmd}\"

Let’s revisit the post here:
#44 (comment)

as well as the subsequent posts and debug logs where binary /bin/mount invokes /sbin/mount.aufs

However, I will take another look at this and try recompiling aufs.utils one more time.

Regarding:

echo 3 > /proc/sys/vm/drop_caches

Indeed, it does restore the system to working condition, even in the case of my environment.

Additionally, what about the fix proposed here?:

#44 (comment)

How does this fix relate to your latest analysis?

This fix fully resolved the issue in my case.

I wish you a Happy New Year!

@sfjro
Copy link
Owner

sfjro commented Jan 5, 2025 via email

sfjro added a commit to sfjro/aufs-linux that referenced this issue Jan 5, 2025
This may be necessary for the issue reported by Artur Piechocki on
github.

See-also: sfjro/aufs-standalone#44
Signed-off-by: J. R. Okajima <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants