Input/Output error with libc.so.6 when using udba=none on kernel 5.10+ during remount loop #44
Comments
Hello Artur,
Artur Piechocki:
I have encountered an issue with the AUFS filesystem that occurs sporadically during system startup. The error message I receive is:
`error while loading shared libraries: libc.so.6: cannot open shared object file: Input/output error`
:::
```
mount |grep aufs
aufs on / type aufs (rw,relatime,sync,si=57b4c9ce733e61da,udba=none,nowarn_perm)
while true; do mount -o remount / ; done
```
:::
@sfjro , would you be able to run a remount loop on your setup with AUFS using `udba=none` and check if the issue occurs for you as well?
Thanks for the report.
I have tried hundreds of times on aufs6.x-rcN (for linux-v6.11-rc4) but
could not reproduce it. It may be related to my regular test environment,
whose root dir is not aufs.
I have some questions for you.
- How did you mount your root aufs at the beginning?
initramfs + busybox + switch_root?
- How many branches does your root aufs have?
- Which branch contains libc.so.6?
I need to know which process got that error. Busybox(mount), /bin/mount,
or /sbin/mount.aufs? Could you try strace or something?
Also
- Are there any kernel logs left?
- Have you installed aufs-util.git on your system?
- Is your kernel patched by a linux distribution?
- How is your aufs configuration (define/undefine CONFIG_AUFS_xxx)?
- How is your LSM setting?
J. R. Okajima
|
@sfjro thank you for attempting to reproduce the problem in your environment. The issue I occasionally experience during system startup is very rare, and I have no way to retrieve any logs from that startup. However, the loop that remounts the main file system tends to trigger the problem relatively quickly. It's possible that these are two separate issues... Answers to your questions:
Yes, exactly: it is initramfs + busybox + switch_root.
The number of branches seems to be irrelevant here, as the problem occurs with both 3 and 10 branches, for example. However, in my case, I have this native configuration::
In the case of a bash loop that is executed and remounts The initial problem that occurs during system startup is triggered by various commands. For example, I get:
No, there are no call traces or other errors in the kernel logs that would indicate the cause of the problem.
aufs-util.git is installed.
It is a vanilla 6.10.6 kernel with aufs 6.10-20240722, compiled by me.
AUFS configuration:
no any LSM settings Strace from
It seems that the remount is successful because the exit code is 0, but everything afterward results in an Input/output error. I enabled debug mode for AUFS, but there are a lot of logs, so I'm pasting a snippet where a -5 error occurs:
I noticed that in the kernel between 5.9 and 5.10 (from which the problem started) there were several changes in |
Artur Piechocki:
Strace from `while true; do strace mount -o remount / ; done`
```
...
...
execve("/bin/mount", ["mount", "-o", "remount", "/"], [/* 18 vars */]) = 0
:::
stat("/etc/fstab", {st_mode=S_IFREG|0644, st_size=79, ...}) = 0
open("/etc/fstab", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=79, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f058f21f000
read(3, "aufs / rootfs sync\t\t0 "..., 1024) = 79
```
It is wrong.
The line in /etc/fstab should be aufs instead of rootfs.
```
stat("/sbin/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
stat("/sbin/fs.d/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
stat("/sbin/fs/mount.rootfs", 0x7ffcb7d1e280) = -1 ENOENT (No such file or directory)
```
If you change /etc/fstab, these stat(".../mount.rootfs") will change to
".../mount.aufs" (and should be found and executed).
But it may NOT be related to the problem.
I enabled debug mode for AUFS, but there are a lot of logs, so I'm pasting a snippet where a -5 error occurs:
```
:::
<7>[ 283.876844][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: libmount.so.1.1.0, f_flags 0x8000, f_mode 0x801d
<7>[ 283.876943][T10391] aufs aufs_open_nondir:90:mount[10391]: DEBUG: libselinux.so.1, f_flags 0x8000, f_mode 0x801d
<7>[ 283.877015][T10391] aufs aufs_get_link:1378:mount[10391]: DEBUG: err -5
<7>[ 283.877088][T10391] aufs aufs_get_link:1378:mount[10391]: DEBUG: err -5
:::
```
Why 2? Threaded? Logging order?
Anyway, aufs_get_link() got the error EIO, which comes from
aufs_read_lock().
aufs_read_lock() returns EIO when the generation of the root
dentry and/or inode does not match the generation of the superblock.
This may be an unknown bug. I will investigate more.
J. R. Okajima
|
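The generation check described above can be sketched in plain C. This is a simplified userspace model, not the aufs code: `struct gen_state` and `gen_check()` are hypothetical names. The only facts taken from the thread are that a mismatch against the superblock generation yields EIO, which is errno 5 on Linux and so matches the "err -5" lines in the debug log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the aufs generation counters. */
struct gen_state {
	unsigned int sigen; /* generation of the superblock */
	unsigned int digen; /* generation last recorded for the dentry */
	unsigned int iigen; /* generation last recorded for the inode */
};

/*
 * Return 0 when both the dentry and the inode carry the superblock's
 * current generation, and -EIO when either of them is stale.
 */
static int gen_check(const struct gen_state *g)
{
	assert(g != NULL);
	if (g->digen != g->sigen || g->iigen != g->sigen)
		return -EIO;
	return 0;
}
```

In this model, a dentry or inode that was not refreshed after the superblock generation changed would keep failing the check until it is brought up to date.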
Even after switching from rootfs to aufs in fstab and using the
This issue might be related to the order in which logs are being written via syslog.
Thank you. If you need any additional logs or debugging information, please let me know. I will also try to analyze the issue on my side. |
Artur Piechocki:
> - How did you mount your root aufs at the beginning?
> initramfs + busybox + switch_root?
Yes, exactly: it is initramfs + busybox + switch_root.
OK then, did you run "mount --move" for all your branches before you
mounted aufs?
I want to make sure that your branches (/mnt/live/changes and
/mnt/live/iamges/*.lzm) still exist there after switch_root.
J. R. Okajima
|
Artur Piechocki:
If you need any additional logs or debugging information, please let me know. I will also try to analyze the issue on my side.
Thank you.
I'd ask you to try this debug patch. It doesn't fix the problem, but will
help us to identify the cause.
The debug log is LOG_INFO, so you may need to customize your
/proc/sys/kernel/printk.
And the base version of this patch is aufs-6.x-rcN for linux-v6.11-rc5.
I hope it doesn't matter for you.
J. R. Okajima
diff --git a/fs/aufs/dentry.c b/fs/aufs/dentry.c
index 7eae27d1e9b1..bafdd54776aa 100644
--- a/fs/aufs/dentry.c
+++ b/fs/aufs/dentry.c
@@ -485,7 +485,8 @@ static int au_do_refresh_hdentry(struct dentry *dentry, struct dentry *parent)
err = 0;
break;
}
- }
+ } else
+ pr_info("%pd2??, btop %d\n", dentry, dinfo->di_btop);
AuTraceErr(err);
return err;
diff --git a/fs/aufs/dinfo.c b/fs/aufs/dinfo.c
index 318aedd357e4..b130d8a1e67a 100644
--- a/fs/aufs/dinfo.c
+++ b/fs/aufs/dinfo.c
@@ -486,6 +486,7 @@ void au_update_dbrange(struct dentry *dentry, int do_put_zero)
if (dinfo->di_btop > dinfo->di_bbot) {
dinfo->di_btop = -1;
dinfo->di_bbot = -1;
+ pr_info("%pd2??, btop %d\n", dentry, dinfo->di_btop);
return;
}
diff --git a/fs/aufs/fsctx.c b/fs/aufs/fsctx.c
index 008a5aaf11e7..b966e45141ee 100644
--- a/fs/aufs/fsctx.c
+++ b/fs/aufs/fsctx.c
@@ -63,9 +63,19 @@ static int au_fsctx_reconfigure(struct fs_context *fc)
goto out;
di_write_lock_child(root);
+pr_info("sigen %u, digen %u, iigen %u\n",
+ au_sbi(sb)->si_generation,
+ au_digen(root),
+ au_iigen(inode, /*igflags*/NULL));
+
/* au_opts_remount() may return an error */
err = au_opts_remount(sb, &a->opts);
+pr_info("sigen %u, digen %u, iigen %u\n",
+ au_sbi(sb)->si_generation,
+ au_digen(root),
+ au_iigen(inode, /*igflags*/NULL));
+
if (au_ftest_opts(a->opts.flags, REFRESH))
au_remount_refresh(sb, au_ftest_opts(a->opts.flags,
REFRESH_IDOP));
diff --git a/fs/aufs/i_op.c b/fs/aufs/i_op.c
index 662704a62d96..ee63f9d83f7d 100644
--- a/fs/aufs/i_op.c
+++ b/fs/aufs/i_op.c
@@ -1332,19 +1332,28 @@ static const char *aufs_get_link(struct dentry *dentry, struct inode *inode,
goto out;
err = aufs_read_lock(dentry, AuLock_IR | AuLock_GEN);
- if (unlikely(err))
+ if (unlikely(err)) {
+ pr_info("%pd2??, digen %u, btop %d\n",
+ dentry, au_digen(dentry), au_dbtop(dentry));
goto out;
+ }
err = au_d_hashed_positive(dentry);
- if (unlikely(err))
+ if (unlikely(err)) {
+ pr_info("%pd2??, digen %u, btop %d\n",
+ dentry, au_digen(dentry), au_dbtop(dentry));
goto out_unlock;
+ }
err = -EINVAL;
inode = d_inode(dentry);
bindex = au_ibtop(inode);
h_inode = au_h_iptr(inode, bindex);
- if (unlikely(!h_inode->i_op->get_link))
+ if (unlikely(!h_inode->i_op->get_link)) {
+ pr_info("%pd2??, digen %u, btop %d\n",
+ dentry, au_digen(dentry), au_dbtop(dentry));
goto out_unlock;
+ }
err = -EBUSY;
h_dentry = NULL;
@@ -1357,11 +1366,16 @@ static const char *aufs_get_link(struct dentry *dentry, struct inode *inode,
h_dentry = d_find_any_alias(h_inode);
if (IS_ERR(h_dentry)) {
err = PTR_ERR(h_dentry);
+ pr_info("%pd2??, digen %u, btop %d\n",
+ dentry, au_digen(dentry), au_dbtop(dentry));
goto out_unlock;
}
}
- if (unlikely(!h_dentry))
+ if (unlikely(!h_dentry)) {
+ pr_info("%pd2??, digen %u, btop %d\n",
+ dentry, au_digen(dentry), au_dbtop(dentry));
goto out_unlock;
+ }
err = 0;
AuDbg("%ps\n", h_inode->i_op->get_link);
|
@sfjro in the aufs_debug_io_error.log file, I have included logs from the enabled debug mode in AUFS (kernel v6.10.6) along with the additional proposed logging for digen and btop. The first occurrence of the
Yes, I perform a |
Artur Piechocki:
@sfjro in the [aufs_debug_io_error.log](https://github.com/user-attachments/files/16814777/aufs_debug_io_error.log) file, I have included logs from the enabled debug mode in AUFS (kernel v6.10.6) along with the additional proposed logging for digen and btop. The first occurrence of the `input/output` error happened around the timestamp value 952.
Thanks for the log.
But it looks a little weird.
"mount -o remount /" should enter aufs/fsctx.c:au_fsctx_reconfigure(),
which contains
pr_info("sigen %u, digen %u, iigen %u\n", ...)
in two places, as added by the previous debug patch.
I am expecting this debug log but cannot find it in your
aufs_debug_io_error.log. Is this the full version of the log?
J. R. Okajima
|
The previous log had to be truncated by logrotate. Sorry for that. I am resending the logs aufs_io_error.log , but without debugging enabled in aufs ( Anyway let me know if you need the full logs with debugging enabled in aufs, and I will keep trying until I succeed. |
Artur Piechocki:
I am resending the logs [aufs_io_error.log](https://github.com/user-attachments/files/16823829/aufs_io_error.log) , but without debugging enabled in aufs (`echo 1 > /sys/module/aufs/parameters/debug`) because with debugging enabled, it is very difficult to reproduce the problem due to the large amount of generated logs. My system cannot handle processing such a large number of logs.
Thanx for the log.
But it was not enough.
Would you try this additional patch, which enables DEBUG during remount
only?
J. R. Okajima
diff --git a/fs/aufs/fsctx.c b/fs/aufs/fsctx.c
index 008a5aaf11e7..de4d68b19b36 100644
--- a/fs/aufs/fsctx.c
+++ b/fs/aufs/fsctx.c
@@ -43,6 +43,7 @@ static int au_fsctx_reconfigure(struct fs_context *fc)
struct inode *inode;
struct au_fsctx_opts *a = fc->fs_private;
+au_debug_on();
AuDbg("fc %p\n", fc);
root = fc->root;
@@ -84,6 +95,7 @@ static int au_fsctx_reconfigure(struct fs_context *fc)
err = cvt_err(err);
AuTraceErr(err);
+au_debug_off();
return err;
}
|
@sfjro additional logs aufs_debug_io_error1.log with debug for remount. Thank You |
Artur Piechocki:
@sfjro additional logs [aufs_debug_io_error1.log](https://github.com/user-attachments/files/16824541/aufs_debug_io_error1.log) with debug for remount.
Unfortunately the log is still not enough.
I'm afraid several parts of the log are lost.
I'd suggest you try increasing 'log_buf_len' on your kernel command
line.
Also, the 'cron' daemon appears to be invoked very frequently in a short time.
Is that expected behaviour?
J. R. Okajima
|
What kind of logs do you think are missing (debug from fsctx.c file ?) The last logs were generated while skipping syslog. I'm using the The cron job is intentionally set to run every minute because the more commands are executed on the system during the remount, the faster I can reproduce the problem. |
Artur Piechocki:
What kind of logs do you think are missing (debug from fsctx.c file ?)
Mainly yes.
With the debug patch applied, au_fsctx_reconfigure() prints two "sigen
%u..." logs per remount. But there is only one log for "sigen 17" in your
aufs_debug_io_error1.log.
1162: aufs au_fsctx_reconfigure:80:mount[19848]: sigen 16, digen 16, iigen 16
1164: aufs au_fsctx_reconfigure:88:mount[19848]: sigen 16, digen 16, iigen 16
3069: aufs au_fsctx_reconfigure:88:mount[14731]: sigen 17, digen 17, iigen 17
16422: aufs au_fsctx_reconfigure:80:mount[14732]: sigen 18, digen 18, iigen 18
16425: aufs au_fsctx_reconfigure:88:mount[14732]: sigen 18, digen 18, iigen 18
That is the main reason why I'm afraid the log is not meaningful.
J. R. Okajima
|
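One way to check a log for this kind of loss is to count the "sigen" lines per mount process: with the debug patch applied, a complete remount should leave two. A minimal sketch, assuming the log has already been split into lines; `count_sigen_lines()` is a hypothetical helper and not part of aufs or aufs-util.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the lines that were printed by mount[pid] and that carry a
 * "sigen" value. Lines from other processes are ignored.
 */
static int count_sigen_lines(const char *const *lines, int nlines, int pid)
{
	char tag[32];
	int i, count = 0;

	assert(lines != NULL);
	/* Build the "mount[<pid>]:" marker used in the kernel log lines. */
	snprintf(tag, sizeof(tag), "mount[%d]:", pid);
	for (i = 0; i < nlines; i++)
		if (strstr(lines[i], tag) && strstr(lines[i], " sigen "))
			count++;
	return count;
}
```

Applied to the excerpt above, mount[19848] and mount[14732] would each yield two lines, while mount[14731] yields only one, which is the gap being pointed out.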
I'm sending the logs again aufs_debug_io_error2.log. I set Thank You |
Artur Piechocki:
I'm sending the logs again [aufs_debug_io_error2.log](https://github.com/user-attachments/files/16833027/aufs_debug_io_error2.log). I set `log_buf_len` to 10M and made sure that each log with sigen information for a given process is duplicated.
OK thanks.
But this log still looks weird. It has several "/bin/mount: Input/output
error" logs, but no "aufs_get_link: err -5" log at all. I am not sure
where EIO comes from.
Assuming all logs were collected, EIO has to come from somewhere other
than aufs (well, that is doubtful).
Would you apply this additional debug patch?
diff --git a/fs/fsopen.c b/fs/fsopen.c
index 6593ae518115..83355c676ecf 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -266,6 +266,10 @@ static int vfs_cmd_reconfigure(struct fs_context *fc)
down_write(&sb->s_umount);
ret = reconfigure_super(fc);
up_write(&sb->s_umount);
+if(ret)pr_info("%s:%d:%.*s[%d]: ret %d\n",
+ __func__, __LINE__,
+ (int)sizeof(current->comm), current->comm, current->pid,
+ ret);
if (ret) {
fc->phase = FS_CONTEXT_FAILED;
return ret;
diff --git a/fs/namespace.c b/fs/namespace.c
index d3b682cb7704..a2461a5a5f09 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1708,6 +1708,10 @@ static int do_umount_root(struct super_block *sb)
}
}
up_write(&sb->s_umount);
+if(ret)pr_info("%s:%d:%.*s[%d]: ret %d\n",
+ __func__, __LINE__,
+ (int)sizeof(current->comm), current->comm, current->pid,
+ ret);
return ret;
}
@@ -2912,6 +2916,10 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags,
mnt_warn_timestamp_expiry(path, &mnt->mnt);
put_fs_context(fc);
+if(err)pr_info("%s:%d:%.*s[%d]: err %d\n",
+ __func__, __LINE__,
+ (int)sizeof(current->comm), current->comm, current->pid,
+ err);
return err;
}
diff --git a/fs/super.c b/fs/super.c
index 095ba793e10c..d2d261679d5a 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1097,6 +1097,10 @@ int reconfigure_super(struct fs_context *fc)
cancel_readonly:
sb_end_ro_state_change(sb);
+if(retval)pr_info("%s:%d:%.*s[%d]: retval %d\n",
+ __func__, __LINE__,
+ (int)sizeof(current->comm), current->comm, current->pid,
+ retval);
return retval;
}
@@ -1734,6 +1738,10 @@ int reconfigure_single(struct super_block *s,
ret = reconfigure_super(fc);
out:
put_fs_context(fc);
+if(ret)pr_info("%s:%d:%.*s[%d]: ret %d\n",
+ __func__, __LINE__,
+ (int)sizeof(current->comm), current->comm, current->pid,
+ ret);
return ret;
}
|
Could it be that debug wasn't enabled in AUFS? As I mentioned, it's difficult to reproduce the issue when debug is enabled, but I kept trying until I succeeded, and the file aufs_debug_io_error2.log now contains another log, this time with debug enabled.
I'm having a bit of a problem here: despite the patch being applied (I double-checked), there is no debug (from the last patch) information in the log. It's as if the condition is always false and never enters the However, I hope that the log with debugging enabled ( |
Artur Piechocki:
Could it be that debug wasn't enabled in AUFS? As I mentioned, it's difficult to reproduce the issue when debug is enabled, but I kept trying until I succeeded, and the file [aufs_debug_io_error2.log](https://github.com/user-attachments/files/16869774/aufs_debug_io_error2.log) now contains another log, this time with debug enabled.
Enabling debug is helpful here.
Please be patient.
By new aufs_debug_io_error2.log, I am narrowing down the (possible) bug.
Here is a first trial to fix. Please test this patch.
I'm having a bit of a problem here: despite the patch being applied (I double-checked), there is no debug (from the last patch) information in the log. It's as if the condition is always false and never enters the `pr_info` function.
OK, then it must be in aufs.
J. R. Okajima
diff --git a/fs/aufs/sbinfo.c b/fs/aufs/sbinfo.c
index fc1d958bcaa5..d83e13462061 100644
--- a/fs/aufs/sbinfo.c
+++ b/fs/aufs/sbinfo.c
@@ -239,8 +239,10 @@ int aufs_read_lock(struct dentry *dentry, int flags)
err = au_digen_test(dentry, au_sigen(sb));
if (!au_opt_test(au_mntflags(sb), UDBA_NONE))
AuDebugOn(!err && au_dbrange_test(dentry));
+/*
else if (!err)
err = au_dbrange_test(dentry);
+*/
if (unlikely(err))
aufs_read_unlock(dentry, flags);
}
|
Despite applying the potential fix, the problem is still reproducible. I am sending debug logs aufs_debug_io_error_with_potential_fix.log with the latest patch applied. Please note that the issue only occurs when |
Artur Piechocki:
Despite applying the potential fix, the problem is still reproducible. I am sending debug logs [aufs_debug_io_error_with_potential_fix.log](https://github.com/user-attachments/files/16885950/aufs_debug_io_error_with_potential_fix.log) with the latest patch applied.
OK, thanks for your repeated tests.
Now I want you to find all the directories named 'x86_64-linux-gnu',
and all the entries (regular file, symlink, etc) named 'libc.so.6' on
all your branches. This is to know whether multiple entries exist or not.
And here is another possible/potential fix patch.
Please try.
Please note that the issue only occurs when `udba=none`. When `udba=notify` is set, the problem does not occur.
Yes, my previous patch was meaningless here. Probably I was crazy.
J. R. Okajima
diff --git a/fs/aufs/dcsub.c b/fs/aufs/dcsub.c
index 842063a3c314..1e27ba9f42ba 100644
--- a/fs/aufs/dcsub.c
+++ b/fs/aufs/dcsub.c
@@ -122,9 +122,9 @@ static enum d_walk_ret au_call_dpages_append(void *_arg, struct dentry *dentry)
ret = D_WALK_CONTINUE;
if (dentry->d_sb == arg->sb
&& !IS_ROOT(dentry)
- && au_dcount(dentry) > 0
+/* && au_dcount(dentry) > 0
&& au_di(dentry)
- && (!arg->test || arg->test(dentry, arg->arg))) {
+*/ && (!arg->test || arg->test(dentry, arg->arg))) {
arg->err = au_dpages_append(arg->dpages, dentry, GFP_ATOMIC);
if (unlikely(arg->err))
ret = D_WALK_QUIT;
diff --git a/fs/aufs/super.c b/fs/aufs/super.c
index 07d3412e950f..10a80f4f8c22 100644
--- a/fs/aufs/super.c
+++ b/fs/aufs/super.c
@@ -577,7 +577,7 @@ static int au_do_refresh_d(struct dentry *dentry, unsigned int sigen,
err = 0;
parent = dget_parent(dentry);
- if (!au_digen_test(parent, sigen) && au_digen_test(dentry, sigen)) {
+ if (/*!au_digen_test(parent, sigen) &&*/ au_digen_test(dentry, sigen)) {
if (d_really_is_positive(dentry)) {
if (!d_is_dir(dentry))
err = au_do_refresh(dentry, /*dir_flags*/0,
|
Unfortunately, the proposed second fix causes a kernel panic. Here is the log aufs_panic_fix2.log
I have two branches that have a directory
Only one branch has |
Artur Piechocki:
Unfortunately, the proposed second fix causes a kernel panic. Here is the log [aufs_panic_fix2.log](https://github.com/user-attachments/files/16940691/aufs_panic_fix2.log)
Ok, that was my bad.
Please modify fs/aufs/dcsub.c (which is patched by my last patch) like
this. (see attached)
> and all the entries (regular file, symlink, etc) named 'libc.so.6' on
Only one branch has `libc.so.6`.
And it is a symlink on your 14th branch?
What is the target, and where is it?
J. R. Okajima
diff --git a/fs/aufs/dcsub.c b/fs/aufs/dcsub.c
index 842063a3c314..ee2a177b4188 100644
--- a/fs/aufs/dcsub.c
+++ b/fs/aufs/dcsub.c
@@ -86,7 +86,7 @@ static int au_dpages_append(struct au_dcsub_pages *dpages,
dpages->ndpage++;
}
- AuDebugOn(au_dcount(dentry) <= 0);
+ //AuDebugOn(au_dcount(dentry) <= 0);
dpage->dentries[dpage->ndentry++] = dget_dlock(dentry);
return 0; /* success */
@@ -122,8 +122,8 @@ static enum d_walk_ret au_call_dpages_append(void *_arg, struct dentry *dentry)
ret = D_WALK_CONTINUE;
if (dentry->d_sb == arg->sb
&& !IS_ROOT(dentry)
- && au_dcount(dentry) > 0
- && au_di(dentry)
+/* && au_dcount(dentry) > 0
+*/ && au_di(dentry)
&& (!arg->test || arg->test(dentry, arg->arg))) {
arg->err = au_dpages_append(arg->dpages, dentry, GFP_ATOMIC);
if (unlikely(arg->err))
|
Yes, it is symlink on 14th branch which target is
Now it looks like the fix is working and the issue is gone. Is this the target fix or just for analysis purposes? |
Artur Piechocki:
Now it looks like the fix is working and the issue is gone. Is this the target fix or just for analysis purposes?
That is good news.
The patch is not the final one, and I'm still considering the bug scenario.
May I ask you for more tests?
J. R. Okajima
|
Yes, of course. Please write what additional tests I should do. |
Artur Piechocki:
Yes, of course. Please write what additional tests I should do.
Thank you very much.
The patches I sent were to fix two (or more) possible bugs, and I still
have not identified which bug is real. Please REMOVE this patch and test
again. If it succeeds, the bug will be narrowed down to around the dentry
ref-count.
J. R. Okajima
diff --git a/fs/aufs/super.c b/fs/aufs/super.c
index 07d3412e950f..10a80f4f8c22 100644
--- a/fs/aufs/super.c
+++ b/fs/aufs/super.c
@@ -577,7 +577,7 @@ static int au_do_refresh_d(struct dentry *dentry, unsigned int sigen,
err = 0;
parent = dget_parent(dentry);
- if (!au_digen_test(parent, sigen) && au_digen_test(dentry, sigen)) {
+ if (/*!au_digen_test(parent, sigen) &&*/ au_digen_test(dentry, sigen)) {
if (d_really_is_positive(dentry)) {
if (!d_is_dir(dentry))
err = au_do_refresh(dentry, /*dir_flags*/0,
|
I removed all previous patches and applied only the last one (changes in I hope this helps. |
Artur Piechocki:
I removed all previous patches and applied only the last one (changes in `au_do_refresh_d` function) you sent and the issue is reproducible.
No, that is not what I meant.
Do not apply the patch for fs/aufs/super.c:au_do_refresh_d() which
commented out "!au_digen_test(parent, sigen) &&".
Keep all other patches applied and test please.
J. R. Okajima
|
I am a bit confused about what I should test because there were several proposed patches in this thread (debugging, a potential fix that didn’t work, a fix that caused a kernel panic, and a change that potentially resolved the issue), and now I’m not sure what to revert and what to keep. In this regard, could you please prepare a patch on the clean AUFS code that I should test? |
Artur Piechocki:
In this regard, could you please prepare a patch on the clean AUFS code that I should test?
I'm sorry to make you confused.
Here is the patch for clean aufs6.10.
Plz test.
J. R. Okajima
diff --git a/fs/aufs/dcsub.c b/fs/aufs/dcsub.c
index 842063a3c314..ee2a177b4188 100644
--- a/fs/aufs/dcsub.c
+++ b/fs/aufs/dcsub.c
@@ -86,7 +86,7 @@ static int au_dpages_append(struct au_dcsub_pages *dpages,
dpages->ndpage++;
}
- AuDebugOn(au_dcount(dentry) <= 0);
+ //AuDebugOn(au_dcount(dentry) <= 0);
dpage->dentries[dpage->ndentry++] = dget_dlock(dentry);
return 0; /* success */
@@ -122,8 +122,8 @@ static enum d_walk_ret au_call_dpages_append(void *_arg, struct dentry *dentry)
ret = D_WALK_CONTINUE;
if (dentry->d_sb == arg->sb
&& !IS_ROOT(dentry)
- && au_dcount(dentry) > 0
- && au_di(dentry)
+/* && au_dcount(dentry) > 0
+*/ && au_di(dentry)
&& (!arg->test || arg->test(dentry, arg->arg))) {
arg->err = au_dpages_append(arg->dpages, dentry, GFP_ATOMIC);
if (unlikely(arg->err))
|
------- Blind-Carbon-Copy
From: "J. R. Okajima" ***@***.***>
To: ***@***.***
Subject: aufs6 GIT release (v6.12-rc7)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: ***@***.***>
Date: Mon, 18 Nov 2024 01:57:36 +0900
Message-ID: ***@***.***>
o news
- LOCKDEP warning is produced since linux-v6.12-rc6.
It is unknown whether the warning is aufs specific or not. Need to
investigate more.
o possible bugfix
- warning about sb->s_remove_count #2
- init inode->i_nlink directly
J. R. Okajima
----------------------------------------
- aufs6-linux.git
aufs: init inode->i_nlink directly
aufs: possible bugfix, warning about sb->s_remove_count #2
- aufs6-standalone.git#aufs6.6
ditto
- aufs-util.git
nothing
…------- End of Blind-Carbon-Copy
|
Artur Piechocki:
Thank you.
But did you forget to apply the patch for the first part of
fs/aufs/i_op_ren.c:aufs_rename()?
diff --git a/fs/aufs/i_op_ren.c b/fs/aufs/i_op_ren.c
index e5a3d5549ac7..9e339b77ef05 100644
--- a/fs/aufs/i_op_ren.c
+++ b/fs/aufs/i_op_ren.c
@@ -974,8 +961,12 @@ int aufs_rename(struct mnt_idmap *idmap,
struct au_pin pin;
AuDbg("%pd, %pd, 0x%x\n", _src_dentry, _dst_dentry, _flags);
+au_debug_on();
+AuDbgDentry(_src_dentry);
+au_debug_off();
IMustLock(_src_dir);
IMustLock(_dst_dir);
:::
J. R. Okajima
|
"J. R. Okajima":
But did you forget to apply the patch for the first part of
fs/aufs/i_op_ren.c:aufs_rename()?
Hold on.
That part must be less important. You don't have to apply and test
again. I will try another debugging.
J. R. Okajima
|
***@***.***:
Hold on.
That part must be less important. You don't have to apply and test
again. I will try another debugging.
I'm sorry to ask you so many times, but please apply this
additional debug patch (with the previous debug patch you already
applied) and send me the log.
J. R. Okajima
diff --git a/fs/aufs/cpup.c b/fs/aufs/cpup.c
index 59c94fb30bda..c4e7d92fd1cf 100644
--- a/fs/aufs/cpup.c
+++ b/fs/aufs/cpup.c
@@ -57,7 +57,7 @@ void au_cpup_attr_nlink(struct inode *inode, int force)
* the incorrect link count.
*/
au_set_nlink(inode, h_inode->i_nlink);
-//pr_info("dev 0x%x, s_remove_count %ld\n", sb->s_dev, atomic_long_read(&sb->s_remove_count));
+pr_info("dev 0x%x, i%lu, new nlink %u, s_remove_count %ld\n", inode->i_sb->s_dev, inode->i_ino, h_inode->i_nlink, atomic_long_read(&inode->i_sb->s_remove_count));
/*
* fewer nlink makes find(1) noisy, but larger nlink doesn't.
diff --git a/fs/aufs/dir.c b/fs/aufs/dir.c
index 4ee1f5086614..c7f61f4a87aa 100644
--- a/fs/aufs/dir.c
+++ b/fs/aufs/dir.c
@@ -24,7 +24,7 @@ void au_add_nlink(struct inode *dir, struct inode *h_dir)
smp_mb(); /* for i_nlink */
/* 0 can happen in revaliding */
au_set_nlink(dir, nlink);
-//pr_info("dev 0x%x, s_remove_count %ld\n", dir->i_sb->s_dev, atomic_long_read(&dir->i_sb->s_remove_count));
+pr_info("dev 0x%x, i%lu, new nlink %u, s_remove_count %ld\n", dir->i_sb->s_dev, dir->i_ino, nlink, atomic_long_read(&dir->i_sb->s_remove_count));
}
void au_sub_nlink(struct inode *dir, struct inode *h_dir)
@@ -40,7 +40,7 @@ void au_sub_nlink(struct inode *dir, struct inode *h_dir)
smp_mb(); /* for i_nlink */
/* nlink == 0 means the branch-fs is broken */
au_set_nlink(dir, nlink);
-//pr_info("dev 0x%x, s_remove_count %ld\n", dir->i_sb->s_dev, atomic_long_read(&dir->i_sb->s_remove_count));
+pr_info("dev 0x%x, i%lu, new nlink %u, s_remove_count %ld\n", dir->i_sb->s_dev, dir->i_ino, nlink, atomic_long_read(&dir->i_sb->s_remove_count));
}
loff_t au_dir_size(struct file *file, struct dentry *dentry)
diff --git a/fs/aufs/i_op.c b/fs/aufs/i_op.c
index a9ae8faaee5f..d4f532a1a3ac 100644
--- a/fs/aufs/i_op.c
+++ b/fs/aufs/i_op.c
@@ -1174,7 +1174,7 @@ static void au_refresh_iattr(struct inode *inode, struct kstat *st,
smp_mb(); /* for i_nlink */
/* 0 can happen */
au_set_nlink(inode, n);
-//pr_info("dev 0x%x, s_remove_count %ld\n", inode->i_sb->s_dev, atomic_long_read(&inode->i_sb->s_remove_count));
+pr_info("dev 0x%x, i%lu, new nlink %u, s_remove_count %ld\n", inode->i_sb->s_dev, inode->i_ino, n, atomic_long_read(&inode->i_sb->s_remove_count));
}
spin_lock(&inode->i_lock);
diff --git a/fs/aufs/i_op_add.c b/fs/aufs/i_op_add.c
index c771750ba29b..57890aae69a8 100644
--- a/fs/aufs/i_op_add.c
+++ b/fs/aufs/i_op_add.c
@@ -817,6 +817,7 @@ int aufs_link(struct dentry *src_dentry, struct inode *dir,
au_dir_ts(dir, a->bdst);
inode_inc_iversion(dir);
inc_nlink(inode);
+pr_info("dev 0x%x, i%lu, nl%u, s_remove_count %ld\n", inode->i_sb->s_dev, inode->i_ino, inode->i_nlink, atomic_long_read(&inode->i_sb->s_remove_count));
inode_set_ctime_to_ts(inode, inode_get_ctime(dir));
d_instantiate(dentry, au_igrab(inode));
if (d_unhashed(a->h_path.dentry))
@@ -924,6 +925,7 @@ int aufs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
err = epilog(dir, bindex, wh_dentry, dentry);
if (!err) {
inc_nlink(dir);
+pr_info("dev 0x%x, i%lu, nl%u, s_remove_count %ld\n", dir->i_sb->s_dev, dir->i_ino, dir->i_nlink, atomic_long_read(&dir->i_sb->s_remove_count));
goto out_unpin; /* success */
}
|
I checked, I have those changes applied, but I don't have CONFIG_AUFS_DEBUG enabled. Maybe that's why?
OK
I’m trying to reproduce the issue with this additional debug patch, but with such a large amount of logs, I haven’t been able to reproduce it so far. However, I’m still trying. The additional debugging might have prevented the race condition from occurring. I also noticed that the issue is reproducible only on real fast hardware. On a virtual machine, the problem cannot be reproduced. |
I have a question regarding the initial Input/Output error issue. Are you able to reproduce this problem on your system? I managed to reproduce it on Puppy Linux. I can prepare a description of how to replicate it if you need this. |
After adding a
|
Sorry for my slow response. I'm just kinda busy.
Artur Piechocki:
I checked, I have those changes applied, but I don't have CONFIG_AUFS_DEBUG enabled. Maybe that's why?
I see.
I’m trying to reproduce the issue with this additional debug patch, but with such a large amount of logs, I haven’t been able to reproduce it so far. However, I’m still trying. The additional debugging might have prevented the race condition from occurring.
I also noticed that the issue is reproducible only on real fast hardware. On a virtual machine, the problem cannot be reproduced.
This report strongly indicates that the problem is a race condition.
----------------------------------------
I have a question regarding the initial Input/Output error issue. Are you able to reproduce this problem on your system? I managed to reproduce it on Puppy Linux. I can prepare a description of how to replicate it if you need this.
I cannot reproduce the original problem (nor the destroy_inode() warning).
I appreciate your offer. I will try AFTER solving the destroy_inode()
warning.
----------------------------------------
After adding a `spin_lock` for inode structure to the `au_set_nlink` function you recently added, the issue seems to be non-reproducible (or at least harder to reproduce). After nearly 100 restarts, the warning has not occurred even once. Do you think we can use locking in this place?
Wow, you solved it by yourself.
I was thinking of the same approach too, since your kernel log shows very
weird value changes. Currently I agree that a simple spinlock is best
for this problem. Usually i_nlink is protected by inode_lock() or an inode
flag, but there is unusual behaviour around i_nlink in aufs, which is
natively required for aufs. And that may affect the problem.
Now I'm wondering whether a simple read (such as n =
inode->i_nlink) will also need to be protected by the spinlock or not.
J. R. Okajima
|
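The idea under discussion, that both the update and a plain read of the link count go through one dedicated lock, can be modeled in userspace. This is only an illustration with hypothetical names (`struct model_inode`, `model_set_nlink()`, `model_get_nlink()`). It uses a C11 `atomic_flag` as a stand-in for a kernel spinlock and is not the aufs implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model of an inode with a dedicated nlink lock. */
struct model_inode {
	atomic_flag nlink_spin; /* stand-in for a kernel spinlock_t */
	unsigned int nlink;     /* the link count being protected */
};

static void model_inode_init(struct model_inode *inode, unsigned int nlink)
{
	assert(inode != NULL);
	atomic_flag_clear(&inode->nlink_spin); /* start unlocked */
	inode->nlink = nlink;
}

static void model_nlink_lock(struct model_inode *inode)
{
	/* Busy-wait until the flag was previously clear, then own the lock. */
	while (atomic_flag_test_and_set_explicit(&inode->nlink_spin,
						 memory_order_acquire))
		;
}

static void model_nlink_unlock(struct model_inode *inode)
{
	atomic_flag_clear_explicit(&inode->nlink_spin, memory_order_release);
}

/* Writers update the count only while holding the lock. */
static void model_set_nlink(struct model_inode *inode, unsigned int nlink)
{
	model_nlink_lock(inode);
	inode->nlink = nlink;
	model_nlink_unlock(inode);
}

/* Readers take the same lock, so they cannot observe a half-done update. */
static unsigned int model_get_nlink(struct model_inode *inode)
{
	unsigned int n;

	model_nlink_lock(inode);
	n = inode->nlink;
	model_nlink_unlock(inode);
	return n;
}
```

Whether a plain read really needs the lock in the kernel is exactly the open question raised above; the model simply shows what taking the lock on both sides would look like.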
To reproduce the issue with Input/Output errors, try running F96-CE_4.iso (https://f96.puppylinux.com/), as it uses AUFS by default. It can also be run as virtual machine. First, remount / with
Then, execute the following script to repeatedly remount /:
Wait for a while, as the Input/Output error should occur. I know that such repeated remounting is not a real-world scenario, but it accelerates the occurrence of the problem.
Great, thank you. |
Artur Piechocki:
> Currently I agree that the simple spinlock is best
> for this problem
>
Great, thank you.
Here I post two commits one by one. They introduce a new spinlock and
replace all uses of i_nlink by some new aufs funcs.
Please apply them and test.
J. R. Okajima
----------------------------------------
commit f1446eace4050c1cff877872376dde04e2e0d6cb
Author: J. R. Okajima ***@***.***>
Date: Tue Nov 26 16:06:44 2024 +0900
aufs: i_nlink 1/2, protect aufs inode i_nlink
By the commit in linux-v3.3-rc1
7ada4db88634 2012-01-06 vfs: count unlinked inodes
vfs:__destroy_inode() became available to produce a warning about
sb->s_remove_count.
Basically inode->i_nlink should be referenced by anytime. But in aufs,
Artur Piechocki found a problem and it is necessary to be protected by
a lock. Here aufs introduces a spinlock dedicated to i_nlink only.
Every user of i_nlink and VFS functions for it should call this new
function regardless the inode is aufs or not.
Reported by Artur Piechocki on github.
See-also: #44
Signed-off-by: J. R. Okajima ***@***.***>
diff --git a/fs/aufs/iinfo.c b/fs/aufs/iinfo.c
index 672eed8b5..283909b17 100644
--- a/fs/aufs/iinfo.c
+++ b/fs/aufs/iinfo.c
@@ -167,6 +167,7 @@ void au_icntnr_init_once(void *_c)
spin_lock_init(&iinfo->ii_generation.ig_spin);
au_rw_init(&iinfo->ii_rwsem);
inode_init_once(&c->vfs_inode);
+ spin_lock_init(&c->nlink_spin);
}
void au_hinode_init(struct au_hinode *hinode)
diff --git a/fs/aufs/inode.h b/fs/aufs/inode.h
index 56aa0186d..6824a7689 100644
--- a/fs/aufs/inode.h
+++ b/fs/aufs/inode.h
@@ -13,6 +13,7 @@
#ifdef __KERNEL__
#include <linux/fsnotify.h>
+#include "fstype.h"
#include "rwsem.h"
struct vfsmount;
@@ -67,6 +68,7 @@ struct au_iinfo {
struct au_icntnr {
struct au_iinfo iinfo;
struct inode vfs_inode;
+ spinlock_t nlink_spin; /* protects vfs_inode.i_nlink */
struct hlist_bl_node plink;
struct rcu_head rcu;
} ____cacheline_aligned_in_smp;
@@ -112,6 +114,24 @@ static inline struct au_iinfo *au_ii(struct inode *inode)
return &(container_of(inode, struct au_icntnr, vfs_inode)->iinfo);
}
+static inline void au_nlink_lock(struct inode *inode)
+{
+ spinlock_t *spin;
+
+ AuDebugOn(!au_test_aufs(inode->i_sb));
+ AuDebugOn(is_bad_inode(inode));
+ spin = &(container_of(inode, struct au_icntnr, vfs_inode)->nlink_spin);
+ spin_lock(spin);
+}
+
+static inline void au_nlink_unlock(struct inode *inode)
+{
+ spinlock_t *spin;
+
+ spin = &(container_of(inode, struct au_icntnr, vfs_inode)->nlink_spin);
+ spin_unlock(spin);
+}
+
/* ---------------------------------------------------------------------- */
/* inode.c */
diff --git a/fs/aufs/vfsub.c b/fs/aufs/vfsub.c
index 42aea4826..86912a537 100644
--- a/fs/aufs/vfsub.c
+++ b/fs/aufs/vfsub.c
@@ -38,6 +38,52 @@ int vfsub_sync_filesystem(struct super_block *h_sb)
/* ---------------------------------------------------------------------- */
+unsigned int vfsub_inode_nlink_aufs(struct inode *inode)
+{
+ unsigned int nlink;
+
+ au_nlink_lock(inode);
+ nlink = inode->i_nlink;
+ au_nlink_unlock(inode);
+
+ return nlink;
+}
+
+void vfsub_inc_nlink(struct inode *inode)
+{
+ au_nlink_lock(inode);
+ inc_nlink(inode);
+ au_nlink_unlock(inode);
+}
+
+void vfsub_drop_nlink(struct inode *inode)
+{
+ au_nlink_lock(inode);
+ AuDebugOn(!inode->i_nlink);
+ drop_nlink(inode);
+ au_nlink_unlock(inode);
+}
+
+void vfsub_clear_nlink(struct inode *inode)
+{
+ au_nlink_lock(inode);
+ AuDebugOn(!inode->i_nlink);
+ clear_nlink(inode);
+ au_nlink_unlock(inode);
+}
+
+void vfsub_set_nlink(struct inode *inode, unsigned int nlink)
+{
+ /*
+ * stop setting the value equal to the current one, in order to stop
+ * a useless warning from vfs:destroy_inode() about sb->s_remove_count.
+ */
+ au_nlink_lock(inode);
+ if (nlink != inode->i_nlink)
+ set_nlink(inode, nlink);
+ au_nlink_unlock(inode);
+}
+
int vfsub_update_h_iattr(struct path *h_path, int *did)
{
int err;
diff --git a/fs/aufs/vfsub.h b/fs/aufs/vfsub.h
index 4eea10d3f..bf12e73cf 100644
--- a/fs/aufs/vfsub.h
+++ b/fs/aufs/vfsub.h
@@ -17,6 +17,7 @@
#include <linux/posix_acl.h>
#include <linux/xattr.h>
#include "debug.h"
+#include "fstype.h"
/* copied from linux/fs/internal.h */
/* todo: BAD approach!! */
@@ -44,28 +45,50 @@ enum {
/* ---------------------------------------------------------------------- */
-static inline void au_set_nlink(struct inode *inode, unsigned int nlink)
+unsigned int vfsub_inode_nlink_aufs(struct inode *inode);
+
+enum au_inode_type {
+ AU_I_AUFS,
+ AU_I_BRANCH,
+ AU_I_UNKNOWN
+};
+
+static inline unsigned int vfsub_inode_nlink(struct inode *inode,
+ enum au_inode_type type)
{
- /*
- * stop setting the value equal to the current one, in order to stop
- * a useless warning from vfs:destroy_inode() about sb->s_remove_count.
- */
- if (nlink != inode->i_nlink)
- set_nlink(inode, nlink);
+ unsigned int nlink;
+
+ switch (type) {
+ case AU_I_AUFS:
+ nlink = vfsub_inode_nlink_aufs(inode);
+ break;
+ case AU_I_BRANCH: /* aufs cannot be a branch of another aufs mount */
+ AuDebugOn(au_test_aufs(inode->i_sb));
+ nlink = inode->i_nlink;
+ break;
+ case AU_I_UNKNOWN:
+ if (au_test_aufs(inode->i_sb))
+ nlink = vfsub_inode_nlink_aufs(inode);
+ else
+ nlink = inode->i_nlink;
+ break;
+ };
+
+ return nlink;
}
-static inline void au_init_nlink(struct inode *inode, unsigned int nlink)
+void vfsub_inc_nlink(struct inode *inode);
+void vfsub_drop_nlink(struct inode *inode);
+void vfsub_clear_nlink(struct inode *inode);
+void vfsub_set_nlink(struct inode *inode, unsigned int nlink);
+
+static inline void vfsub_inode_nlink_init(struct inode *inode,
+ unsigned int nlink)
{
/* to ignore sb->s_remove_count, do not use set_nlink() */
inode->__i_nlink = nlink;
}
-static inline void vfsub_drop_nlink(struct inode *inode)
-{
- AuDebugOn(!inode->i_nlink);
- drop_nlink(inode);
-}
-
static inline void vfsub_dead_dir(struct inode *inode)
{
AuDebugOn(!S_ISDIR(inode->i_mode));
|
Here I post two commits one by one. They introduce a new spinlock and
replace all uses of i_nlink by some new aufs funcs.
Please apply them and test.
This is the second commit.
J. R. Okajima
----------------------------------------
commit 195fa684a5aba837b0a45adfab8b48e6f4c122de
Author: J. R. Okajima ***@***.***>
Date: Tue Nov 26 17:16:13 2024 +0900
aufs: i_nlink 2/2, replace all use of i_nlink and VFS i_nlink funcs
For the details, see previous commit.
See-also: #44
Signed-off-by: J. R. Okajima ***@***.***>
diff --git a/fs/aufs/branch.c b/fs/aufs/branch.c
index 1f8b3c9cb..87030bf27 100644
--- a/fs/aufs/branch.c
+++ b/fs/aufs/branch.c
@@ -230,7 +230,7 @@ static int test_add(struct super_block *sb, struct au_opt_add *add, int remount)
inode = d_inode(add->path.dentry);
err = -ENOENT;
- if (unlikely(!inode->i_nlink)) {
+ if (unlikely(!vfsub_inode_nlink(inode, AU_I_UNKNOWN))) {
pr_err("no existence %s\n", add->pathname);
goto out;
}
diff --git a/fs/aufs/cpup.c b/fs/aufs/cpup.c
index 8a249b070..e6cea7a85 100644
--- a/fs/aufs/cpup.c
+++ b/fs/aufs/cpup.c
@@ -56,7 +56,7 @@ void au_cpup_attr_nlink(struct inode *inode, int force)
* todo: O_TMPFILE+linkat(AT_SYMLINK_FOLLOW) bypassing aufs may cause
* the incorrect link count.
*/
- au_set_nlink(inode, h_inode->i_nlink);
+ vfsub_set_nlink(inode, vfsub_inode_nlink(h_inode, AU_I_BRANCH));
/*
* fewer nlink makes find(1) noisy, but larger nlink doesn't.
@@ -703,7 +703,7 @@ int cpup_entry(struct au_cp_generic *cpg, struct dentry *dst_parent,
if (!au_opt_test(mnt_flags, UDBA_NONE)
&& !isdir
&& au_opt_test(mnt_flags, XINO)
- && (h_inode->i_nlink == 1
+ && (vfsub_inode_nlink(h_inode, AU_I_BRANCH) == 1
|| (h_inode->i_state & I_LINKABLE))
/* todo: unnecessary? */
/* && d_inode(cpg->dentry)->i_nlink == 1 */
@@ -830,7 +830,7 @@ static int au_cpup_single(struct au_cp_generic *cpg, struct dentry *dst_parent)
goto out_parent;
}
- if (dst_inode->i_nlink) {
+ if (vfsub_inode_nlink(dst_inode, AU_I_BRANCH)) {
const int do_dt = au_ftest_cpup(cpg->flags, DTIME);
h_src = au_plink_lkup(inode, cpg->bdst);
@@ -915,7 +915,7 @@ static int au_cpup_single(struct au_cp_generic *cpg, struct dentry *dst_parent)
src_inode = d_inode(h_src);
if (!isdir
- && (src_inode->i_nlink > 1
+ && (vfsub_inode_nlink(src_inode, AU_I_BRANCH) > 1
|| src_inode->i_state & I_LINKABLE)
&& plink)
au_plink_append(inode, cpg->bdst, h_dst);
@@ -1281,7 +1281,7 @@ int au_sio_cpup_wh(struct au_cp_generic *cpg, struct file *file)
h_dir = au_igrab(au_h_iptr(dir, bdst));
h_tmpdir = h_dir;
pin_orig = NULL;
- if (!h_dir->i_nlink) {
+ if (!vfsub_inode_nlink(h_dir, AU_I_BRANCH)) {
wbr = au_sbr(dentry->d_sb, bdst)->br_wbr;
h_orph = wbr->wbr_orph;
diff --git a/fs/aufs/dcsub.h b/fs/aufs/dcsub.h
index 7f1eb7a78..0a5694da8 100644
--- a/fs/aufs/dcsub.h
+++ b/fs/aufs/dcsub.h
@@ -14,6 +14,7 @@
#include <linux/dcache.h>
#include <linux/fs.h>
+#include "vfsub.h"
struct au_dpage {
int ndentry;
@@ -54,7 +55,8 @@ static inline int au_d_hashed_positive(struct dentry *d)
err = 0;
if (unlikely(d_unhashed(d)
|| d_is_negative(d)
- || !inode->i_nlink))
+ /* to support both aufs and branches */
+ || !vfsub_inode_nlink(inode, AU_I_UNKNOWN)))
err = -ENOENT;
return err;
}
@@ -84,7 +86,7 @@ static inline int au_d_alive(struct dentry *d)
inode = d_inode(d);
if (unlikely(d_unlinked(d)
|| d_is_negative(d)
- || !inode->i_nlink))
+ || !vfsub_inode_nlink(inode, AU_I_UNKNOWN)))
err = -ENOENT;
}
return err;
diff --git a/fs/aufs/debug.c b/fs/aufs/debug.c
index 86f5c69b9..51f6b18c8 100644
--- a/fs/aufs/debug.c
+++ b/fs/aufs/debug.c
@@ -120,7 +120,8 @@ static int do_pri_inode(aufs_bindex_t bindex, struct inode *inode, int hn,
" hn %d, ct %lld, np %lu, st 0x%lx, f 0x%x, v %llu, g %x%s%.*s\n",
bindex, inode,
inode->i_ino, inode->i_sb ? au_sbtype(inode->i_sb) : "??",
- atomic_read(&inode->i_count), inode->i_nlink, inode->i_mode,
+ atomic_read(&inode->i_count),
+ vfsub_inode_nlink(inode, AU_I_UNKNOWN), inode->i_mode,
i_size_read(inode), (unsigned long long)inode->i_blocks,
inode->i_acl, inode->i_default_acl,
hn, (long long)timespec64_to_ns(&ctime) & 0x0ffff,
diff --git a/fs/aufs/dentry.c b/fs/aufs/dentry.c
index 734a77cca..65244b2e3 100644
--- a/fs/aufs/dentry.c
+++ b/fs/aufs/dentry.c
@@ -493,10 +493,11 @@ static void au_do_hide(struct dentry *dentry)
if (d_really_is_positive(dentry)) {
inode = d_inode(dentry);
if (!d_is_dir(dentry)) {
- if (inode->i_nlink && !d_unhashed(dentry))
- drop_nlink(inode);
+ if (vfsub_inode_nlink(inode, AU_I_AUFS)
+ && !d_unhashed(dentry))
+ vfsub_drop_nlink(inode);
} else {
- clear_nlink(inode);
+ vfsub_clear_nlink(inode);
/* stop next lookup */
inode->i_flags |= S_DEAD;
}
@@ -876,7 +877,7 @@ static int h_d_revalidate(struct dentry *dentry, struct inode *inode,
*/
if (do_udba && inode) {
mode = (inode->i_mode & S_IFMT);
- plus = (inode->i_nlink > 0);
+ plus = (vfsub_inode_nlink(inode, AU_I_AUFS) > 0);
ibs = au_ibtop(inode);
ibe = au_ibbot(inode);
}
@@ -944,7 +945,7 @@ static int h_d_revalidate(struct dentry *dentry, struct inode *inode,
h_cached_inode = h_inode;
if (h_inode && bindex != bwh) {
h_mode = (h_inode->i_mode & S_IFMT);
- h_plus = (h_inode->i_nlink > 0);
+ h_plus = (vfsub_inode_nlink(h_inode, AU_I_BRANCH) > 0);
}
if (inode && ibs <= bindex && bindex <= ibe)
h_cached_inode = au_h_iptr(inode, bindex);
@@ -1100,7 +1101,7 @@ static int aufs_d_revalidate(struct dentry *dentry, unsigned int flags)
if (!(flags & (LOOKUP_OPEN | LOOKUP_EMPTY))
&& inode
&& !(inode->i_state && I_LINKABLE)
- && (IS_DEADDIR(inode) || !inode->i_nlink)) {
+ && (IS_DEADDIR(inode) || !vfsub_inode_nlink(inode, AU_I_AUFS))) {
AuTraceErr(err);
goto out_inval;
}
diff --git a/fs/aufs/dir.c b/fs/aufs/dir.c
index 2faeabd47..37e4af56d 100644
--- a/fs/aufs/dir.c
+++ b/fs/aufs/dir.c
@@ -13,32 +13,32 @@
void au_add_nlink(struct inode *dir, struct inode *h_dir)
{
- unsigned int nlink;
+ unsigned int nlink, h_nlink;
AuDebugOn(!S_ISDIR(dir->i_mode) || !S_ISDIR(h_dir->i_mode));
- nlink = dir->i_nlink;
- nlink += h_dir->i_nlink - 2;
- if (h_dir->i_nlink < 2)
+ nlink = vfsub_inode_nlink(dir, AU_I_AUFS);
+ h_nlink = vfsub_inode_nlink(h_dir, AU_I_BRANCH);
+ nlink += h_nlink - 2;
+ if (h_nlink < 2)
nlink += 2;
- smp_mb(); /* for i_nlink */
/* 0 can happen in revaliding */
- au_set_nlink(dir, nlink);
+ vfsub_set_nlink(dir, nlink);
}
void au_sub_nlink(struct inode *dir, struct inode *h_dir)
{
- unsigned int nlink;
+ unsigned int nlink, h_nlink;
AuDebugOn(!S_ISDIR(dir->i_mode) || !S_ISDIR(h_dir->i_mode));
- nlink = dir->i_nlink;
- nlink -= h_dir->i_nlink - 2;
- if (h_dir->i_nlink < 2)
+ nlink = vfsub_inode_nlink(dir, AU_I_AUFS);
+ h_nlink = vfsub_inode_nlink(h_dir, AU_I_BRANCH);
+ nlink -= h_nlink - 2;
+ if (h_nlink < 2)
nlink -= 2;
- smp_mb(); /* for i_nlink */
/* nlink == 0 means the branch-fs is broken */
- au_set_nlink(dir, nlink);
+ vfsub_set_nlink(dir, nlink);
}
loff_t au_dir_size(struct file *file, struct dentry *dentry)
@@ -131,7 +131,7 @@ static void au_do_dir_ts(void *arg)
hdir = au_hi(dir, btop);
au_hn_inode_lock_nested(hdir, AuLsc_I_PARENT);
h_dir = au_h_iptr(dir, btop);
- if (h_dir->i_nlink
+ if (vfsub_inode_nlink(h_dir, AU_I_BRANCH)
&& timespec64_compare(&h_dir->i_mtime, &dt.dt_mtime) < 0) {
dt.dt_h_path = h_path;
au_dtime_revert(&dt);
@@ -587,7 +587,7 @@ static int do_test_empty(struct dentry *dentry, struct test_empty_arg *arg)
err = 0;
if (!au_opt_test(au_mntflags(dentry->d_sb), UDBA_NONE)
- && !file_inode(h_file)->i_nlink)
+ && !vfsub_inode_nlink(file_inode(h_file), AU_I_BRANCH))
goto out_put;
do {
diff --git a/fs/aufs/file.c b/fs/aufs/file.c
index f7c10541f..3af28dc28 100644
--- a/fs/aufs/file.c
+++ b/fs/aufs/file.c
@@ -393,7 +393,7 @@ static int au_ready_to_write_wh(struct file *file, loff_t len,
err = au_reopen_wh(file, bcpup, hi_wh);
if (!err
- && (inode->i_nlink > 1
+ && (vfsub_inode_nlink(inode, AU_I_AUFS) > 1
|| (inode->i_state & I_LINKABLE))
&& au_opt_test(au_mntflags(cpg.dentry->d_sb), PLINK))
au_plink_append(inode, bcpup, au_h_dptr(cpg.dentry, bcpup));
diff --git a/fs/aufs/hnotify.c b/fs/aufs/hnotify.c
index 516d746b9..50a9b62e5 100644
--- a/fs/aufs/hnotify.c
+++ b/fs/aufs/hnotify.c
@@ -306,7 +306,7 @@ static int hn_job(struct hn_job_args *a)
&& a->inode
&& a->h_inode) {
inode_lock_shared_nested(a->h_inode, AuLsc_I_CHILD);
- if (!a->h_inode->i_nlink
+ if (!vfsub_inode_nlink(a->h_inode, AU_I_BRANCH)
&& !(a->h_inode->i_state & I_LINKABLE))
hn_xino(a->inode, a->h_inode); /* ignore this error */
inode_unlock_shared(a->h_inode);
@@ -620,7 +620,7 @@ int au_hnotify(struct inode *h_dir, struct au_hnotify *hnotify, u32 mask,
/* NFS fires the event for silly-renamed one from kworker */
f = 0;
- if (!dir->i_nlink
+ if (!vfsub_inode_nlink(dir, AU_I_AUFS)
|| (au_test_nfs(h_dir->i_sb) && (mask & FS_DELETE)))
f = AuWkq_NEST;
err = au_wkq_nowait(au_hn_bh, args, dir->i_sb, f);
diff --git a/fs/aufs/i_op.c b/fs/aufs/i_op.c
index 6369fa4e6..a5ad59924 100644
--- a/fs/aufs/i_op.c
+++ b/fs/aufs/i_op.c
@@ -615,7 +615,7 @@ int au_pin_hdir_relock(struct au_pin *p)
continue;
if (d_is_positive(h_d[i])) {
h_i = d_inode(h_d[i]);
- err = !h_i->i_nlink;
+ err = !vfsub_inode_nlink(h_i, AU_I_BRANCH);
}
}
@@ -1160,12 +1160,11 @@ static void au_refresh_iattr(struct inode *inode, struct kstat *st,
au_cpup_attr_nlink(inode, /*force*/0);
if (S_ISDIR(inode->i_mode)) {
- n = inode->i_nlink;
+ n = vfsub_inode_nlink(inode, AU_I_AUFS);
n -= nlink;
n += st->nlink;
- smp_mb(); /* for i_nlink */
/* 0 can happen */
- au_set_nlink(inode, n);
+ vfsub_set_nlink(inode, n);
}
spin_lock(&inode->i_lock);
@@ -1286,7 +1285,8 @@ static int aufs_getattr(struct mnt_idmap *idmap, const struct path *path,
if (!err) {
if (positive)
au_refresh_iattr(inode, st,
- d_inode(h_path.dentry)->i_nlink);
+ vfsub_inode_nlink(d_inode(h_path.dentry),
+ AU_I_BRANCH));
goto out_fill; /* success */
}
AuTraceErr(err);
diff --git a/fs/aufs/i_op_add.c b/fs/aufs/i_op_add.c
index bc33b3604..ccd581c13 100644
--- a/fs/aufs/i_op_add.c
+++ b/fs/aufs/i_op_add.c
@@ -110,7 +110,7 @@ int au_may_add(struct dentry *dentry, aufs_bindex_t bindex,
if (unlikely(d_is_negative(h_dentry)))
goto out;
h_inode = d_inode(h_dentry);
- if (unlikely(!h_inode->i_nlink))
+ if (unlikely(!vfsub_inode_nlink(h_inode, AU_I_BRANCH)))
goto out;
h_mode = h_inode->i_mode;
@@ -491,7 +491,7 @@ int aufs_tmpfile(struct mnt_idmap *idmap, struct inode *dir,
goto out_h_file;
}
- au_init_nlink(inode, 1);
+ vfsub_inode_nlink_init(inode, 1);
d_tmpfile(file, inode);
au_di(dentry)->di_tmpfile = 1;
get_file(h_file);
@@ -596,7 +596,7 @@ static int au_cpup_or_link(struct dentry *src_dentry, struct dentry *dentry,
inode = d_inode(src_dentry);
if (au_ibtop(inode) <= a->bdst)
h_inode = au_h_iptr(inode, a->bdst);
- if (!h_inode || !h_inode->i_nlink) {
+ if (!h_inode || !vfsub_inode_nlink(h_inode, AU_I_BRANCH)) {
/* copyup src_dentry as the name of dentry. */
bbot = au_dbbot(dentry);
if (bbot < a->bsrc)
@@ -809,7 +809,7 @@ int aufs_link(struct dentry *src_dentry, struct inode *dir,
au_dir_ts(dir, a->bdst);
inode_inc_iversion(dir);
- inc_nlink(inode);
+ vfsub_inc_nlink(inode);
inode_set_ctime_to_ts(inode, inode_get_ctime(dir));
d_instantiate(dentry, au_igrab(inode));
if (d_unhashed(a->h_path.dentry))
@@ -914,7 +914,7 @@ int aufs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
err = epilog(dir, bindex, wh_dentry, dentry);
if (!err) {
- inc_nlink(dir);
+ vfsub_inc_nlink(dir);
goto out_unpin; /* success */
}
diff --git a/fs/aufs/i_op_del.c b/fs/aufs/i_op_del.c
index 278c17797..ebce2fffa 100644
--- a/fs/aufs/i_op_del.c
+++ b/fs/aufs/i_op_del.c
@@ -92,7 +92,7 @@ int au_may_del(struct dentry *dentry, aufs_bindex_t bindex,
if (unlikely(d_is_negative(h_dentry)))
goto out;
h_inode = d_inode(h_dentry);
- if (unlikely(!h_inode->i_nlink))
+ if (unlikely(!vfsub_inode_nlink(h_inode, AU_I_BRANCH)))
goto out;
h_mode = h_inode->i_mode;
diff --git a/fs/aufs/i_op_ren.c b/fs/aufs/i_op_ren.c
index 784ddb2f6..7914751a0 100644
--- a/fs/aufs/i_op_ren.c
+++ b/fs/aufs/i_op_ren.c
@@ -638,7 +638,7 @@ static int au_may_ren(struct au_ren_args *a)
if (unlikely(d_is_negative(a->dst_h_dentry)))
goto out;
h_inode = d_inode(a->dst_h_dentry);
- if (h_inode->i_nlink)
+ if (vfsub_inode_nlink(h_inode, AU_I_BRANCH))
err = au_may_del(a->dst_dentry, a->btgt,
a->dst_h_parent, isdir);
}
@@ -1031,7 +1031,7 @@ int aufs_rename(struct mnt_idmap *idmap,
* If it is a dir, VFS unhash it before this
* function. It means we cannot rely upon d_unhashed().
*/
- if (unlikely(!a->dst_inode->i_nlink))
+ if (unlikely(!vfsub_inode_nlink(a->dst_inode, AU_I_AUFS)))
goto out_unlock;
if (!au_ftest_ren(a->auren_flags, ISDIR_DST)) {
err = au_d_hashed_positive(a->dst_dentry);
diff --git a/fs/aufs/iinfo.c b/fs/aufs/iinfo.c
index 283909b17..269567776 100644
--- a/fs/aufs/iinfo.c
+++ b/fs/aufs/iinfo.c
@@ -134,7 +134,7 @@ void au_update_ibrange(struct inode *inode, int do_put_zero)
h_i = au_hinode(iinfo, bindex)->hi_inode;
if (h_i
- && !h_i->i_nlink
+ && !vfsub_inode_nlink(h_i, AU_I_BRANCH)
&& !(h_i->i_state & I_LINKABLE))
au_set_h_iptr(inode, bindex, NULL, 0);
}
@@ -235,7 +235,7 @@ void au_iinfo_fin(struct inode *inode)
struct au_hinode *hi;
struct super_block *sb;
aufs_bindex_t bindex, bbot;
- const unsigned char unlinked = !inode->i_nlink;
+ const unsigned char unlinked = !vfsub_inode_nlink(inode, AU_I_AUFS);
AuDebugOn(au_is_bad_inode(inode));
diff --git a/fs/aufs/inode.c b/fs/aufs/inode.c
index 36d2d7ba8..b5fdd05b0 100644
--- a/fs/aufs/inode.c
+++ b/fs/aufs/inode.c
@@ -371,7 +371,7 @@ struct inode *au_new_inode(struct dentry *dentry, int must_new)
h_dentry = au_h_dptr(dentry, btop);
h_inode = d_inode(h_dentry);
h_ino = h_inode->i_ino;
- hlinked = !d_is_dir(h_dentry) && h_inode->i_nlink > 1;
+ hlinked = !d_is_dir(h_dentry) && vfsub_inode_nlink(h_inode, AU_I_BRANCH) > 1;
new_ino:
/*
@@ -424,7 +424,8 @@ new_ino:
au_xino_write(sb, btop, h_ino, /*ino*/0);
/* ignore this error */
goto out_iput;
- } else if (!must_new && !IS_DEADDIR(inode) && inode->i_nlink) {
+ } else if (!must_new && !IS_DEADDIR(inode)
+ && vfsub_inode_nlink(inode, AU_I_AUFS)) {
/*
* horrible race condition between lookup, readdir and copyup
* (or something).
diff --git a/fs/aufs/mvdown.c b/fs/aufs/mvdown.c
index cb1d670e5..f35e2400a 100644
--- a/fs/aufs/mvdown.c
+++ b/fs/aufs/mvdown.c
@@ -387,15 +387,16 @@ static int au_mvd_args_busy(const unsigned char dmsg, struct au_mvd_args *a)
&& atomic_read(&a->inode->i_count) == 1
/* && a->mvd_h_src_inode->i_nlink == 1 */
&& (!plinked || !au_plink_test(a->inode))
- && a->inode->i_nlink == 1)
+ && vfsub_inode_nlink(a->inode, AU_I_AUFS) == 1)
goto out;
err = -EBUSY;
AU_MVD_PR(dmsg,
"b%d, d{b%d, c%d?}, i{c%d?, l%u}, hi{l%u}, p{%d, %d}\n",
a->mvd_bsrc, au_dbtop(a->dentry), au_dcount(a->dentry),
- atomic_read(&a->inode->i_count), a->inode->i_nlink,
- a->mvd_h_src_inode->i_nlink,
+ atomic_read(&a->inode->i_count),
+ vfsub_inode_nlink(a->inode, AU_I_AUFS),
+ vfsub_inode_nlink(a->mvd_h_src_inode, AU_I_BRANCH),
plinked, plinked ? au_plink_test(a->inode) : 0);
out:
diff --git a/fs/aufs/super.c b/fs/aufs/super.c
index ec1cd2371..6da6270e9 100644
--- a/fs/aufs/super.c
+++ b/fs/aufs/super.c
@@ -778,7 +778,7 @@ int au_alloc_root(struct super_block *sb)
inode->i_op = aufs_iop + AuIop_DIR; /* with getattr by default */
inode->i_fop = &aufs_dir_fop;
inode->i_mode = S_IFDIR;
- au_init_nlink(inode, 2);
+ vfsub_inode_nlink_init(inode, 2);
unlock_new_inode(inode);
root = d_make_root(inode);
diff --git a/fs/aufs/vfsub.c b/fs/aufs/vfsub.c
index 86912a537..3167858be 100644
--- a/fs/aufs/vfsub.c
+++ b/fs/aufs/vfsub.c
@@ -387,7 +387,7 @@ static int au_test_nlink(struct inode *inode)
const unsigned int link_max = UINT_MAX >> 1; /* rough margin */
if (!au_test_fs_no_limit_nlink(inode->i_sb)
- || inode->i_nlink < link_max)
+ || vfsub_inode_nlink(inode, AU_I_BRANCH) < link_max)
return 0;
return -EMLINK;
}
diff --git a/fs/aufs/vfsub.h b/fs/aufs/vfsub.h
index bf12e73cf..1b660e2ad 100644
--- a/fs/aufs/vfsub.h
+++ b/fs/aufs/vfsub.h
@@ -93,7 +93,7 @@ static inline void vfsub_dead_dir(struct inode *inode)
{
AuDebugOn(!S_ISDIR(inode->i_mode));
inode->i_flags |= S_DEAD;
- clear_nlink(inode);
+ vfsub_clear_nlink(inode);
}
static inline int vfsub_native_ro(struct inode *inode)
diff --git a/fs/aufs/whout.c b/fs/aufs/whout.c
index 79c609079..094ef4fe1 100644
--- a/fs/aufs/whout.c
+++ b/fs/aufs/whout.c
@@ -968,10 +968,10 @@ int au_whtmp_rmdir(struct inode *dir, aufs_bindex_t bindex,
inode_unlock(wh_inode);
if (!err) {
- h_nlink = h_dir->i_nlink;
+ h_nlink = vfsub_inode_nlink(h_dir, AU_I_BRANCH);
err = vfsub_rmdir(h_dir, &wh_path);
/* some fs doesn't change the parent nlink in some cases */
- h_nlink -= h_dir->i_nlink;
+ h_nlink -= vfsub_inode_nlink(h_dir, AU_I_BRANCH);
}
if (!err) {
diff --git a/fs/aufs/xino.c b/fs/aufs/xino.c
index 8fab07297..f48da91f9 100644
--- a/fs/aufs/xino.c
+++ b/fs/aufs/xino.c
@@ -184,7 +184,7 @@ struct file *au_xino_create(struct super_block *sb, char *fpath, int silent,
h_dir = d_inode(h_parent);
inode = file_inode(file);
/* no delegation since it is just created */
- if (inode->i_nlink)
+ if (vfsub_inode_nlink(inode, AU_I_BRANCH))
err = vfsub_unlink(h_dir, &file->f_path, /*delegated*/NULL,
/*force*/0);
inode_unlock(h_dir);
@@ -1085,7 +1085,7 @@ static void au_xib_clear_bit(struct inode *inode)
struct super_block *sb;
struct au_sbinfo *sbinfo;
- AuDebugOn(inode->i_nlink);
+ AuDebugOn(vfsub_inode_nlink(inode, AU_I_AUFS));
sb = inode->i_sb;
xib_calc_bit(inode->i_ino, &pindex, &bit);
@@ -1774,7 +1774,7 @@ void au_xino_delete_inode(struct inode *inode, const int unlinked)
for (; bindex <= bbot; bindex++, hi++) {
h_inode = hi->hi_inode;
if (!h_inode
- || (!unlinked && h_inode->i_nlink))
+ || (!unlinked && vfsub_inode_nlink(h_inode, AU_I_BRANCH)))
continue;
/* inode may not be revalidated */
|
With your refactor, the warning also seems to be non-reproducible. Are you planning to commit these changes? |
Artur Piechocki:
With your refactor, the warning also seems to be non-reproducible. Are you planning to commit these changes?
Thanks for the tests.
On my side, I am still testing too. The commits will be released on next
Monday.
J. R. Okajima
|
By the commit in linux-v3.3-rc1
7ada4db 2012-01-06 vfs: count unlinked inodes
vfs:__destroy_inode() became available to produce a warning about
sb->s_remove_count.
Basically inode->i_nlink should be referenced by anytime, and protected by
inode_lock() or something in changing-time. In aufs, Artur Piechocki found
a problem and it is necessary to be protected by another lock. The problem
is a warning produced by VFS:__destroy_inode() about
superblock->s_remove_count.
I am not sure whether the warning appears since linux-v3.3 or not. Some
other recent (much later than v3.3) changes in mainline MAY be related to
the lifetime of inode or its link count. On my test environment, the
warning never appeared.
Here aufs introduces a spinlock dedicated to i_nlink only. In aufs, every
user of i_nlink and VFS functions for it should call this new function
regardless the inode is aufs or not.
Reported by Artur Piechocki on github.
See-also: sfjro/aufs-standalone#44
Signed-off-by: J. R. Okajima <[email protected]>
For the details, see previous commit (i_nlink 1/2). See-also: sfjro/aufs-standalone#44 Signed-off-by: J. R. Okajima <[email protected]>
------- Blind-Carbon-Copy
From: "J. R. Okajima" ***@***.***>
To: ***@***.***
Subject: aufs6 GIT release
Date: Mon, 02 Dec 2024 05:17:58 +0900
o news
- Escape from a warning reported by Artur Piechocki on github.
  + i_nlink 1/2, protect aufs inode i_nlink
  + i_nlink 2/2, replace all use of i_nlink and VFS i_nlink funcs
- new branch aufs6.6.63
  to patch mm/mmap.c cleanly
aufs6.x-rcN branch is unchanged.
J. R. Okajima
----------------------------------------
- aufs6-linux.git
  aufs: i_nlink 1/2, protect aufs inode i_nlink
  aufs: i_nlink 2/2, replace all use of i_nlink and VFS i_nlink funcs
- aufs6-standalone.git
  ditto
- aufs-util.git
  nothing
------- End of Blind-Carbon-Copy
|
Hi Artur,
You might be surprised, but I'm still struggling with this issue. :-)
Artur Piechocki:
To reproduce the issue with Input/Output errors, try running F96-CE_4.iso (https://f96.puppylinux.com/), as it uses AUFS by default. It can also be run as virtual machine.
OK, I could reproduce the issue using this iso image under qemu.
But this environment is not suitable for debugging and development.
How can I add strace(1) to this environment? Should I get the source
and build it myself?
J. R. Okajima
|
Unfortunately, I'm not very familiar with Puppy Linux either, but I believe this distribution has a package manager called My goal in using Puppy Linux was to reproduce the issue on a different distribution to verify if the problem is specific to my Linux or not. Debugging the kernel directly here might be challenging since it would likely require enabling some debugging options, but maybe @peabee could help with that. Thanks for your engagement on this topic! |
You need to download and sfs-load the devx.sfs (hopefully has strace)...... |
Artur Piechocki:
Then, execute the following script to repeatedly remount /:
```
for i in {1..10}; do
while true; do mount -o remount / ; done &
done
```
That is really a lot of mount processes in parallel.
Such concurrent mount processes will not all succeed. Many of them will fail
due to a lockfile. Even if mount(2) succeeded, the process would return
an error because of mtab.lock or mtab~ or something, after calling
mount(2). And the system MAY reject so many processes because of some
resource limitation.
But those failures are not a problem. That is the expected, correct
behaviour. The point is that the system becomes unusable after the
remount storm.
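Why most of the parallel mount processes are expected to fail can be sketched with a plain exclusive lock file. This is only an illustration under assumptions: the helper names `toy_lockfile_acquire()` and `toy_lockfile_release()` are hypothetical, the lock path is chosen by the caller, and the real mount(8) locking around mtab may work differently.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to create the lock file exclusively.
 * Returns a file descriptor on success, or -1 with errno set
 * (EEXIST when another process already holds the lock).
 */
static int toy_lockfile_acquire(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}

/* Release the lock by closing the descriptor and removing the file. */
static void toy_lockfile_release(const char *path, int fd)
{
	close(fd);
	unlink(path);
}
```

With ten loops racing for one such file, at most one process holds the lock at a time and the others get EEXIST, which matches the "expected, correct" failures described in this message.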
Checking /bin/mount in the iso file, I've found the command is a shell
script and the script runs the /bin/mount-FULL binary. That is an unexpected
situation for /sbin/mount.aufs (aufs-util.git).
/sbin/mount.aufs executes mount(8) internally and the path of the
command should be specified at the compile-time (via Makefile or
-DMOUNT_CMD=...). For your case, it should be /bin/mount-FULL instead of
/bin/mount.
Just to make sure, I simply tried,
# cd /bin
# mv mount mountO
# ln -s mount-FULL mount
and then tried the remount storm.
Many processes returned the error, but the system survived and is still
usable.
Please try replacing /bin/mount or rebuilding /sbin/mount.aufs.
J. R. Okajima
|
PB:
You need to download and sfs-load the devx.sfs (hopefully has strace)......
https://mega.nz/folder/j0JQ2RaZ#Uiw3eA8MBOhOxHnwxqRKNg
Thank you. I got strace.
J. R. Okajima
|
"J. R. Okajima":
Please try replacing /bin/mount or rebuilding /sbin/mount.aufs.
Also I found a possible bug in aufs. Please apply this patch if
necessary. I'm not sure whether this patch is related to the issue or
not.
J. R. Okajima
diff --git a/fs/aufs/fsctx.c b/fs/aufs/fsctx.c
index 008a5aaf11e7..1d1690d9ed5a 100644
--- a/fs/aufs/fsctx.c
+++ b/fs/aufs/fsctx.c
@@ -54,13 +54,14 @@ static int au_fsctx_reconfigure(struct fs_context *fc)
di_write_lock_child(root);
err = au_opts_verify(sb, fc->sb_flags, /*pending*/0);
aufs_write_unlock(root);
- }
+ } else
+ goto out;
inode = d_inode(root);
inode_lock(inode);
err = si_write_lock(sb, AuLock_FLUSH | AuLock_NOPLM);
if (unlikely(err))
- goto out;
+ goto out_inode;
di_write_lock_child(root);
/* au_opts_remount() may return an error */
@@ -79,8 +80,9 @@ static int au_fsctx_reconfigure(struct fs_context *fc)
au_fhsm_wrote_all(sb, /*force*/1); /* ?? */
aufs_write_unlock(root);
-out:
+out_inode:
inode_unlock(inode);
+out:
err = cvt_err(err);
AuTraceErr(err);
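The label split in the patch above follows a common cleanup pattern: an exit taken before a lock is acquired must not pass through the unlock. Below is a minimal userspace sketch of that pattern only. `toy_lock`, `toy_reconfigure()` and its two simulated error arguments are hypothetical and do not reproduce the real conditions in au_fsctx_reconfigure().

```c
#include <assert.h>

/* Hypothetical lock that asserts on unbalanced use. */
struct toy_lock {
	int held;
};

static void toy_lock_take(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_drop(struct toy_lock *l)
{
	assert(l->held);	/* dropping a lock never taken is a bug */
	l->held = 0;
}

/*
 * verify_err: simulated result of a step that runs before the lock is taken
 * work_err:   simulated result of a step that runs while the lock is held
 */
static int toy_reconfigure(struct toy_lock *inode_lock,
			   int verify_err, int work_err)
{
	int err;

	err = verify_err;
	if (err)
		goto out;	/* lock not taken yet, so skip the unlock */

	toy_lock_take(inode_lock);
	err = work_err;
	if (err)
		goto out_inode;	/* lock is held, so it must be dropped */

	/* ... the real work would go here ... */

out_inode:
	toy_lock_drop(inode_lock);
out:
	return err;
}
```

With a single shared label, the early exit would reach the unlock without a matching lock; the two labels keep every path balanced.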
|
"J. R. Okajima":
Please try replacing /bin/mount or rebuilding /sbin/mount.aufs.
If you have a statically linked program out of aufs which issues
only mount(MS_REMOUNT) (see below), then your system will be alive again
even if it got stuck after the remount storm.
J. R. Okajima
----------------------------------------
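#include <sys/mount.h>
#include <stdio.h>
#include <linux/aufs_type.h>

int main(int argc, char *argv[])
{
	int err;
	char *mntpnt;

	/* the mount point is required as the only argument */
	if (argc != 2) {
		fprintf(stderr, "Usage: %s mount_point\n", argv[0]);
		return 1;
	}

	mntpnt = argv[1];
	/* issue only mount(MS_REMOUNT), nothing else */
	err = mount(NULL, mntpnt, AUFS_NAME, MS_REMOUNT, NULL);
	if (err)
		perror(mntpnt);
	return err;
}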
----------------------------------------
|
"J. R. Okajima":
If you have a statically linked program out of aufs which issues
only mount(MS_REMOUNT) (see below), then your system will be alive again
even if it stuck after the remount storm.
"out of aufs" was wrong.
It is OK to put this mount(MS_REMOUNT) program inside aufs.
J. R. Okajima
|
"J. R. Okajima":
If you have a statically linked program out of aufs which issues
only mount(MS_REMOUNT) (see below), then your system will be alive again
even if it stuck after the remount storm.
Also "echo 3 > /proc/sys/vm/drop_caches" will make your system back
again.
J. R. Okajima
|
"J. R. Okajima":
Also "echo 3 > /proc/sys/vm/drop_caches" will make your system back
again.
Here is my current theory on this issue.
- During the remount, some error happens (the cause will be discussed
later).
- As a result, some files in aufs get "stale" status. Usually such
status is recovered by next "re-validation" time.
- Since "udba=none" is specified here (instead of the default
"udba=reval"), all revalidations are skipped. And the files are left
in the status.
- If the file is a shared object library (libc.so), then no dynamically
linked command can be invoked hereafter.
- This situation can be recovered by mount(MS_REMOUNT) from a statically
linked command or "echo 2 > /proc/sys/vm/drop_caches".
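The theory above can be modelled with a toy cache entry. This is a deliberately simplified sketch and not how aufs is implemented: `toy_entry`, `toy_lookup()`, `toy_drop_caches()` and the `toy_udba` enum are hypothetical names, and -1 merely stands for the error the user sees (EIO in the report).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two policies discussed above. */
enum toy_udba { TOY_UDBA_NONE, TOY_UDBA_REVAL };

struct toy_entry {
	bool cached;	/* a cached lookup result exists */
	bool stale;	/* the cached result no longer matches the branch */
};

/* Returns 0 when the lookup succeeds, -1 when a stale entry is used. */
static int toy_lookup(struct toy_entry *e, enum toy_udba udba)
{
	if (!e->cached) {
		/* cache miss: a fresh lookup builds a good entry */
		e->cached = true;
		e->stale = false;
		return 0;
	}
	if (e->stale && udba == TOY_UDBA_REVAL)
		e->stale = false;	/* revalidation repairs the entry */

	/* with TOY_UDBA_NONE a stale entry is never repaired here */
	return e->stale ? -1 : 0;
}

/* Models discarding the cache, e.g. by a remount or drop_caches. */
static void toy_drop_caches(struct toy_entry *e)
{
	e->cached = false;
	e->stale = false;
}
```

In this model a stale entry keeps failing under the "none" policy until the cache is dropped, while the "reval" policy recovers on the next lookup, which is the difference the list above describes.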
Generally mount(8) behaves like this.
- /bin/mount (dynamic link)
+ /sbin/mount.aufs (static link)
++ /bin/mount (again)
+++ mount(MS_REMOUNT) systemcall
But your system has /bin/mount as a shell script.
- /bin/mount (shell script)
+ several commands
++ /bin/mount-FULL (dynamic link)
+++ /sbin/mount.aufs (static link)
++++ /bin/mount (shell script)
+++++ several commands
++++++ /bin/mount-FULL (again)
+++++++ mount(MS_REMOUNT) systemcall
If you gave a compile option specifying that the path of mount(8) is
/bin/mount-FULL, the call graph would be like this.
- /bin/mount (shell script)
+ several commands
++ /bin/mount-FULL (dynamic link)
+++ /sbin/mount.aufs (static link)
++++ /bin/mount-FULL (again)
+++++ mount(MS_REMOUNT) systemcall
The shell script (/bin/mount) doesn't handle errors. Even if the
"several commands" failed their invocation and the script got an
incorrect (empty?) output from them, the script doesn't abort.
It means the behaviour of /bin/mount is not reliable. But I'm not sure
how evil this behaviour is. It might be just bogus but harmless.
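A wrapper script could guard against this along the following lines. This is only a sketch under assumptions: the real /bin/mount script is not shown in this thread, and `run_checked` and `some-helper` are hypothetical names.

```shell
#!/bin/sh
# Sketch: run a helper command and fail loudly if it cannot be invoked,
# exits non-zero, or prints nothing. run_checked is a hypothetical name.
run_checked() {
    out=$("$@") || { echo "mount wrapper: '$1' failed" >&2; return 1; }
    [ -n "$out" ] || { echo "mount wrapper: '$1' printed nothing" >&2; return 1; }
    printf '%s\n' "$out"
}

# Usage inside a wrapper (commented out, "some-helper" is a placeholder):
#   opts=$(run_checked some-helper "$@") || exit 1
```

With a check like this, a helper that fails to start because libc.so.6 cannot be loaded would abort the script instead of passing empty output along.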
The root cause of the error during remount is still unidentified. It can
be ENOMEM, but if so, /bin/mount should have printed "Cannot allocate
memory" or something similar. There is no such message at all in the
report; only EIO appears. I'm not sure whether this "no msg" is caused by
the unreliable /bin/mount script or by something else.
For now, I will make the aufs remount function print a kernel log
message if an error happens.
Have nice holidays.
J. R. Okajima
|
Thank you for your analysis, and I apologize for the late reply—it’s been a busy holiday period. The issue with the
Which, from the perspective of the
Let’s revisit the post here: as well as the subsequent posts and debug logs where binary However, I will take another look at this and try recompiling Regarding:
Indeed, it does restore the system to working condition, even in the case of my environment.
Additionally, what about the fix proposed here?:
How does this fix relate to your latest analysis? This fix fully resolved the issue in my case.
I wish you a Happy New Year!
|
Artur Piechocki:
The issue with the `/bin/mount` shell script does not apply to my environment.
Ok, understood.
Regarding:
`
echo 3 > /proc/sys/vm/drop_caches
`
Indeed, it does restore the system to working condition, even in the case
of my environment.
Yes. Because you have set udba=none, all revalidations are skipped. You
need a way to discard the bogus cache. "echo 2" will mostly be enough
instead of "3" in your case.
Additionally, what about the fix proposed here?:
#44 (comment)=
3
How does this fix relate to your latest analysis?
This fix fully resolved the issue in my case.
I'm still doubtful about this patch, and I don't know why VFS failed to
discard the unused caches at remount time. That is the remaining
mystery. I would like you to report the error log which will be produced
by au_fsctx_reconfigure() at the end of "mount -o remount". The error
logging will be included in today's aufs release.
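One way to collect that log is to stop the remount loop at the first failure and then read the kernel log. The sketch below assumes a root shell; `loop_until_fail` is a hypothetical helper name. Since dmesg may itself be dynamically linked, the usage line first drops the caches with the shell builtin echo, as discussed above.

```shell
#!/bin/sh
# Sketch: repeat a command up to $1 times and stop at the first failure.
# loop_until_fail is a hypothetical helper name.
loop_until_fail() {
    max=$1; shift
    i=1
    while [ "$i" -le "$max" ]; do
        "$@" || { echo "failed at iteration $i" >&2; return 1; }
        i=$((i + 1))
    done
    return 0
}

# Usage as root (commented out because it remounts the root filesystem):
#   loop_until_fail 10000 mount -o remount / || {
#       echo 2 > /proc/sys/vm/drop_caches
#       dmesg | tail -n 50
#   }
```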
J. R. Okajima
|
This may be necessary for the issue reported by Artur Piechocki on github. See-also: sfjro/aufs-standalone#44 Signed-off-by: J. R. Okajima <[email protected]>
I have encountered an issue with the AUFS filesystem that occurs sporadically during system startup. The error message I receive is:
error while loading shared libraries: libc.so.6: cannot open shared object file: Input/output error
Upon further investigation, I found that this issue also occurs when I run a remount loop:
After some time, the following errors appear:
From that point on, it becomes impossible to run any command in the system as all attempts result in an "Input/output error."
The issue occurs on the latest kernel (6.10.6) and AUFS version (6.10-20240722).
After conducting some tests, I found that this problem started appearing with kernel 5.10 and has not occurred on kernel 5.9 (aufs 5.9-20210906). Something must have changed starting with kernel 5.10 that causes this issue to appear.
The issue only arises when `udba=none` is set while mounting AUFS. When `udba=notify` is used, the problem does not occur.
I understand that running a remount loop is not typical in normal environments, but the fact that it leads to such critical issues suggests a potential bug in AUFS.
@sfjro , would you be able to run a remount loop on your setup with AUFS using `udba=none` and check if the issue occurs for you as well?