Deadlock while remounting nested bound mountpoint #40
Hello Arkadiusz,
"Arkadiusz B.":
`mkdir /union/mnt/new`
`mount -o bind /union /union/mnt/new`
`mount -o remount /union/mnt/new`
after last command console is locked.
I tried the same command sequence on vanilla 6.1.0 + my current aufs6.1,
but could not reproduce. I will try again later with an environment
closer to yours.
And did you install aufs-util.git on your system?
J. R. Okajima
I got outdated version, but after update I'm also reproducing this issue.
"Arkadiusz B.":
> And did you install aufs-util.git on your system?
I got outdated version, but after update I'm also reproducing this issue.
I tried reproducing again on plain linux-v6.1.92 + aufs6.1, but could
not reproduce the problem.
The kernel log in your first mail shows
- mount[16359] got a warning that semaphore owner is different.
- 30 seconds later, snmpd[21963] and kbtest[9102] are warned "blocked
for more than 30 seconds".
- at the same time, mount[16359] (again) is also warned "blocked for
more than 30 seconds".
For snmpd[21963] and kbtest[9102], it is understandable that they
stopped working since mount[16359] is still running. Then why
mount[16359] stopped working? It was warned as the semaphore owner is
different, and then stopped working. But the stopped point is ahead from
the first warning, which means after the first warning mount[16359] kept
running and moved ahead. And then stopped with holding a semaphore.
Is this the scenario? Really? I don't understand.
Hmm, I don't know why the semaphore owner is different. It has to be
same. In the remount process, aufs au_fsctx_reconfigure() function
acquires the semaphore and au_remount_refresh() (in the same process)
releases the semaphore temporarily. I guess LOCKDEP produces the warning
here. It is weird.
Did you apply some other patches to your kernel?
I guess you applied lockdep-debug.patch in aufs-standalone.git. If you
didn't, the kernel build should fail, or another warning would be
produced much earlier and LOCKDEP would be off.
J. R. Okajima
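The acquire and temporary-release pattern described in the message above can be pictured with a toy lock that records its owner. This is only a user-space sketch: `toy_lock`, `toy_acquire()`, `toy_release()`, `toy_refresh()` and `toy_reconfigure()` are hypothetical names, the task ids are plain integers, and none of it is the kernel rw_semaphore or the LOCKDEP implementation. It only shows why no owner warning is expected as long as the same task releases and reacquires the lock.

```c
#include <assert.h>

/* Toy lock with owner tracking; a stand-in, NOT the kernel rw_semaphore. */
struct toy_lock {
	int held;	/* 1 while locked */
	int owner;	/* hypothetical task id of the current holder */
	int warnings;	/* number of owner-mismatch warnings seen */
};

static void toy_acquire(struct toy_lock *l, int task)
{
	assert(!l->held);
	l->held = 1;
	l->owner = task;
}

static void toy_release(struct toy_lock *l, int task)
{
	assert(l->held);
	if (l->owner != task)
		l->warnings++;	/* the "owner is different" case */
	l->held = 0;
}

/* Models the refresh step: drop the lock briefly, then take it back. */
static void toy_refresh(struct toy_lock *l, int task)
{
	toy_release(l, task);
	/* ... work that must run without the lock ... */
	toy_acquire(l, task);
}

/*
 * Models the reconfigure step. refresh_task is normally equal to task;
 * passing a different id injects an owner mismatch on purpose.
 * Returns the number of warnings counted.
 */
int toy_reconfigure(int task, int refresh_task)
{
	struct toy_lock l = { 0, 0, 0 };

	toy_acquire(&l, task);
	toy_refresh(&l, refresh_task);
	toy_release(&l, task);
	return l.warnings;
}
```

In this model `toy_reconfigure(1, 1)` returns 0, while `toy_reconfigure(1, 2)` returns a non-zero count, which is the situation that should not arise when both steps run in the same process.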
No, only the standard set of patches was applied.
"Arkadiusz B.":
Here is how I'm reproducing this from scratch:
```
mkdir /union
dd if=/dev/zero of=/tmp/image bs=1M count=8
mkfs.ext4 /tmp/image
mount -o loop /tmp/image /mnt/image
mount -t aufs -o br:/mnt/image=rw none /union
mkdir /union/usr
cp /bin/bash /union/usr/
mkdir -p /union/mnt/test
mount -o remount /union/mnt/test
```
Hmm, that is really strange.
The last "remount" should return an error saying "that is not a mount
point" or something.
Did you forget
mount -o bind /union /union/mnt/test
just before the last "remount"?
Anyway I tried using loopback mounted ext4 on v6.1.92 again, but could
not reproduce.
Something is totally broken. I don't think it is sane that the semaphore
owner changes silently.
I'm not sure whether this would help us or not, what will happen if you
try
mount -o move /union /union/mnt/test
and "remount"
instead of "bind"?
J. R. Okajima
Sorry, my "enter" key got stuck, sent the message and closed the ticket... This is correct set of commands:
There is one more thing to add. The aufs is mounted at the initrd stage where busybox is used. But I'm also reproducing on running system without busybox.
"Arkadiusz B.":
Sorry, my "enter" key got stuck, sent the message and closed the ticket... This is correct set of commands:
I see.
I guess "bind"ing a subdir onto another (sub)subdir is the key.
I was trying to bind the ROOT dir onto a subdir (and failed to
reproduce).
Now I will think about why older aufs could handle it.
Give me some time.
J. R. Okajima
Great, thank you.
> I guess "bind"ing a subdir onto another (sub)subdir is the key.
> I was trying to bind the ROOT dir onto a subdir (and failed to
> reproduce).
> Now I will think about why older aufs could handle it.
In aufs5.10, aufs adopted the mainline fs_context, and it passes a different
dentry (the command line parameter, /union/mnt/test in your case).
Aufs should not trust that it is the root dentry of the aufs super_block, I
think.
Would you test this patch?
J. R. Okajima
diff --git a/fs/aufs/fsctx.c b/fs/aufs/fsctx.c
index 43b21910bc67..73d0cbe5b2c9 100644
--- a/fs/aufs/fsctx.c
+++ b/fs/aufs/fsctx.c
@@ -47,6 +47,7 @@ static int au_fsctx_reconfigure(struct fs_context *fc)
 	root = fc->root;
 	sb = root->d_sb;
+	root = sb->s_root; /* "bound"-mount differs */
 	err = si_write_lock(sb, AuLock_FLUSH | AuLock_NOPLM);
 	if (!err) {
 		di_write_lock_child(root);
"Arkadiusz B.":
I tested it and could not reproduce issue anymore. Thank you 👍
Thanx for testing.
The patch will be merged after a few weeks.
I'm testing several kernel versions now to release on next Monday. It is
not for this "root" dentry bug, but another AIO bug.
The fix for this "root" dentry bug will be released after that.
J. R. Okajima
Great news. Thank you.
If the subdir in aufs is "bind" mounted and remount is issued,
fs_context gives us the bound non-root dentry as if it were root.
But aufs expects fc->root being always root.

See-also: sfjro/aufs-standalone#40
Signed-off-by: J. R. Okajima <[email protected]>
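The commit message above can be illustrated with a small user-space model. It is a sketch under stated assumptions, not the aufs code: the `toy_*` structures and functions are hypothetical stand-ins for `struct dentry`, `struct super_block` and `struct fs_context`, keeping only the fields named in the patch (`fc->root`, `d_sb`, `s_root`). It compares trusting `fc->root` directly with deriving the root from the superblock.

```c
#include <assert.h>

/* Toy stand-ins for the kernel structures; NOT the real definitions. */
struct toy_super_block;

struct toy_dentry {
	struct toy_super_block *d_sb;	/* superblock this dentry belongs to */
};

struct toy_super_block {
	struct toy_dentry *s_root;	/* the one real root of the filesystem */
};

struct toy_fs_context {
	struct toy_dentry *root;	/* dentry handed to the reconfigure step */
};

/* Pre-patch behaviour: treat fc->root as the superblock root. */
static struct toy_dentry *toy_root_unpatched(const struct toy_fs_context *fc)
{
	assert(fc->root);
	return fc->root;
}

/* Post-patch behaviour: use fc->root only to reach sb, then take s_root. */
static struct toy_dentry *toy_root_patched(const struct toy_fs_context *fc)
{
	assert(fc->root);
	return fc->root->d_sb->s_root;
}

/*
 * Build one superblock with a root and a subdirectory, simulate a remount
 * issued through a bind mount of the subdirectory, and report whether the
 * dentry chosen for locking is the real root (1) or not (0).
 */
int toy_remount_picks_real_root(int patched)
{
	struct toy_super_block sb;
	struct toy_dentry root = { &sb };
	struct toy_dentry subdir = { &sb };
	struct toy_fs_context fc = { &subdir };	/* bind mount: non-root dentry */
	struct toy_dentry *chosen;

	sb.s_root = &root;
	chosen = patched ? toy_root_patched(&fc) : toy_root_unpatched(&fc);
	return chosen == sb.s_root;
}
```

In this model `toy_remount_picks_real_root(0)` returns 0 because the subdirectory dentry is used as if it were the root, and `toy_remount_picks_real_root(1)` returns 1 because the root is taken from the superblock, which mirrors the one-line change in the patch.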
------- Blind-Carbon-Copy
From: "J. R. Okajima" ***@***.***>
To: ***@***.***
Subject: aufs6 GIT release (v6.10-rc5), aufs6.1--aufs6.5 will end
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: ***@***.***>
Date: Mon, 01 Jul 2024 06:20:39 +0900
Message-ID: ***@***.***>
o Bugfix
- fc->root may not be real root, reported by Arkadiusz B. on github.
- (aufs-util.git) break the loop if fd is not root.

o News
- When linux-v6.10 is released, aufs6.1--aufs6.5 will get the end of
  life and will not be supported. aufs6.6 will become my new development
  base.
- according to www.kernel.org, linux-v6.8 got marked as EOL.
  aufs6.8 simply follows it.

J. R. Okajima

----------------------------------------

- aufs6-linux.git
	aufs: bugfix, fc->root may not be real root

- aufs6-standalone.git
	ditto

- aufs-util.git
	break the loop if fd is not root

------- End of Blind-Carbon-Copy
Hello,
I noticed a deadlock while trying to remount a nested, bound mountpoint. It is reproducible with different underlying filesystems.
Steps to reproduce (assuming `/union` is a simple aufs r/w mount point):
`mkdir /union/mnt/new`
`mount -o bind /union /union/mnt/new`
`mount -o remount /union/mnt/new`
After the last command the console is locked.
At the time of writing, the last kernel that works is 5.4. The issue is also observed on the 5.15 LTS and 6.1 LTS kernels.
There is dump of tasks from the 6.1.92 kernel: