Add script to send mail in case btrfs issues were detected #107
base: master
I suggest a cron job, because cron knows how to send mail.
|
Doing that would result in:
So, I still see good reasons to have the extra script for this :-) |
Which cron implementation can do this without an MTA? When I investigated this, I discovered that Fedora now appears to log cron output to syslog (and by now, maybe journald rather than syslog) rather than piping output to the MTA. Using journald might also be problematic, because not all systems have adequate persistent journal retention policies.
I like the idea of using a file (/run/btrfs-issue-mail-sent), and I wonder if this idea could be extended. @ximion, what do you think about the following approach (pros, cons, etc.)? Poll btrfs stats on an hourly basis and dump them to a file. Limit notification emails similarly to the logic you've proposed, but send a follow-up email if the rate of errors rapidly increases.
I wonder about this approach because of the following case: one disk begins to fail rapidly, and the rate of failed reads (or failed writes) increases hour by hour. Meanwhile, the firmware lies about SMART data, claiming everything is fine.
It also seems like a file with regularly updated stats could be used to enable desktop notifications, albeit in another project, since this seems out of scope for btrfsmaintenance. Btrfs dev stats are "updated during filesystem [mount] lifetime" in addition to "from a scrub run" (btrfs-device(8)), which is why I think this approach may have value :-) |
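The polling idea above could be sketched roughly as follows. This is only an illustration, not part of the PR: the function names `sum_errors` and `error_rate_increased` and the snapshot-file layout are hypothetical. The only assumption about btrfs is that `btrfs device stats <mountpoint>` prints one `[device].counter value` pair per line.

```shell
#!/bin/sh
# Hypothetical sketch: compare hourly snapshots of `btrfs device stats` output.

# Sum every error counter in a saved snapshot file.
# Each line is expected to look like: [/dev/sda].read_io_errs   3
sum_errors() {
    awk '{ total += $NF } END { print total + 0 }' "$1"
}

# Succeed (exit status 0) if errors grew faster during the latest interval
# than during the previous one.
# Arguments: three snapshot files, oldest first.
error_rate_increased() {
    oldest=$(sum_errors "$1")
    previous=$(sum_errors "$2")
    latest=$(sum_errors "$3")
    [ $((latest - previous)) -gt $((previous - oldest)) ]
}
```

An hourly job could save the output of `btrfs device stats "$mnt"` to a new snapshot file and call `error_rate_increased` on the three most recent snapshots to decide whether a follow-up mail is warranted.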
Oh, and here are the citations for the Fedora case: |
In general I think those are good ideas, and the case of errors rapidly increasing on a disk actually appears to be relatively common - on our systems once a disk is starting to fail, I can pretty much bet on this behavior. |
Why not use mail instead of sendmail? See the following fragment from the unit packagekit-background.service:
# this is when something useful was done
if [ $PKCON_RETVAL -ne 5 ]; then |
This can be very useful for smaller setups where the admin still would like to receive an email in case a disk in a btrfs RAID array fails. Partially resolves kdave#88
Small suggestion. It would be a good idea if there would be some test path to validate that everything is set up correctly and that I will indeed get the email notification when something goes wrong. Similarly like SMART has the But otherwise, this is very much needed for me so thanks a lot for this PR! Hopefully this will be merged 👍 |
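A test path like the one requested here might look something like the sketch below. It is not part of the PR: `compose_mail`, `send_test_mail`, and the `MAILER` override variable are hypothetical names, and the only assumption is that the configured mailer accepts a message with headers on stdin, as `sendmail -t` does.

```shell
#!/bin/sh
# Hypothetical sketch of a "send a test mail" path; not the PR's code.

# Print a minimal message with headers, suitable for piping into `sendmail -t`.
compose_mail() {
    recipient=$1
    subject=$2
    body=$3
    printf 'To: %s\nSubject: %s\n\n%s\n' "$recipient" "$subject" "$body"
}

# Send a harmless test message so the admin can confirm that delivery works.
# MAILER is an assumed override hook; it defaults to `sendmail -t`.
send_test_mail() {
    compose_mail "$1" "btrfs error mail test on $(uname -n)" \
        "This is a test message. No btrfs errors were detected." \
        | ${MAILER:-sendmail -t}
}
```

Running `send_test_mail root` once after installation would show whether mail actually reaches the configured address before a real disk error occurs.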
Thanks. I imagine it's stuff you've already thought of, of course ;) I'm encouraged to hear that this failure mode is common, because a common problem of sufficient severity makes working towards a solution pragmatically useful.
Yes, definitely, and there was an upstream thread that indicates a need for it: Zygo Blaxell proposes an autodefrag daemon here: https://www.spinics.net/lists/linux-btrfs/msg122168.html And a user (Ghislain Adnet) requests what this PR solves here: https://www.spinics.net/lists/linux-btrfs/msg110798.html I find Adnet's request interesting because this would be where a future |
/\ @ximion |
Agree that an email-on-error service should be added. ZFS supports this behavior, for any preinstalled mail service, via |
FWIW, we've been using 'sendemail' for many years. It's still a dependency,
but a much lighter one.
The actual mail server runs elsewhere, so there is no need to have every system be its
own mail server.
…On Sat, Mar 19, 2022 at 1:40 PM Matthias Klumpp wrote:
In btrfs-errmail.sh:
> +then
+ # no email set, nothing to do for us
+ exit 0
+fi
+
+BTRFS_STATS_MOUNTPOINTS=$(expand_auto_mountpoint "auto")
+OIFS="$IFS"
+IFS=:
+for MM in $BTRFS_STATS_MOUNTPOINTS; do
+ if ! is_btrfs "$MM"; then
+ echo "Path $MM is not btrfs, skipping"
+ continue
+ fi
+ devstats=$(btrfs device stats --check $MM 2>&1)
+ if [ $? -ne 0 ]; then
+ mail_body="$(sendmail -t <<EOF
Sendmail would obviously have to be a dependency of this. I changed the
code so in case an email location was set but sendmail wasn't installed,
the script will fail and print a warning to stderr.
|
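The dependency behavior described in the quoted comment (exit quietly when no address is set, fail with a warning on stderr when an address is set but no mailer is installed) could be sketched like this. It is an illustration only, not the PR's actual code; `check_mailer` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sketch of the dependency check described above; not the PR's code.

# Return 0 if mail can be sent or is not requested at all; return 1 (with a
# warning on stderr) if a recipient is configured but the mailer is missing.
# Arguments: recipient address (may be empty), mailer command name.
check_mailer() {
    recipient=$1
    mailer=${2:-sendmail}
    if [ -z "$recipient" ]; then
        return 0    # no email set, nothing to do
    fi
    if ! command -v "$mailer" >/dev/null 2>&1; then
        echo "Warning: mail recipient '$recipient' is set, but '$mailer' was not found" >&2
        return 1
    fi
    return 0
}
```

Taking the mailer name as a parameter would also leave room for lighter alternatives such as the ones suggested in this thread, as long as they accept the same input.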
Co-authored-by: Adam Uhlíř <[email protected]>
Do you know if any progress has been made on the "btrfsd" front? |
Matthias Klumpp ***@***.***> writes:
> /\ @ximion
> Do you know if any progress has been made on the "btrfsd" front?
I haven't heard anything further. If boot environment handling is
within the ideal scope of "btrfsd", then maybe grub-btrfsd could be
grown into a general-purpose maintenance btrfsd? But maybe that's too
much of a stretch...
https://github.com/Antynea/grub-btrfs
If a future btrfsd does boot environment handling, then it will
probably need to support systemd-boot. I wonder if this chicken/egg
problem isn't going to be solved until someone from Fedora implements
something, and then it becomes the de facto standard.
|
I'm working on a thing (called btrfsd for now because I don't have a better name...) which will basically be a small binary called by a systemd timer to perform actions like btrfsmaintenance does, but likely a bit more basic, and scratch my particular itch about mail sending and syslog-message-writing, because this patch apparently won't be merged anytime soon. |
> I'm working on a thing (called btrfsd for now because I don't have a
> better name...) which will basically be a small binary called by a systemd
> timer to perform actions like btrfsmaintenance does, but likely a bit more
> basic, and scratch my particular itch about mail sending and
> syslog-message-writing, because this patch apparently won't be merged
> anytime soon.
> No ETA on this thing yet though, as I am drowning in work a bit and this
> will be a "when time permits" kind of project.
Thank you, much appreciated! Please CC me news.
> grub-btrfs looks super cool! Probably does make sense being its own
> project though (consolidating all tools would ease maintenance a bit, but
> would also require the maintainers to be familiar with every aspect of the
> software...)
🙂 and fair point; I guess that means there's still a need for distribution
maintainers to do this work themselves!
|
I actually had some time to work on this, and tiny Btrfsd is born :-) |
Hi!
This PR adds an extremely basic script that just runs
btrfs device stats --check
on all btrfs filesystems every hour and sends an email to a user-defined address (most likely root
in 90% of all cases) in case any issues were found. This should very much work like the mdadm daemon feature that also sends mail in case one of the RAID members is about to fail.
A feature like this can be very useful for smaller setups where the admin still would
like to receive an email in case a disk in a btrfs RAID array fails.
This also is likely the billionth time someone has written such a script, so putting a version in one place where it can be shared and improved seemed like a good idea, and btrfsmaintenance seems to be the perfect place to add such a feature.
Thanks for considering this PR!
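For readers who want a feel for what such a check involves, here is a minimal sketch of the core loop. It is an illustration under stated assumptions, not the script from this PR: `check_mountpoints` is a hypothetical function name, and the mountpoints are passed in by the caller rather than discovered automatically.

```shell
#!/bin/sh
# Hypothetical sketch of the hourly check; names are illustrative, not the PR's code.

# Print a report for every given mountpoint whose check fails.
# Returns 1 if any filesystem reported problems, 0 otherwise.
check_mountpoints() {
    status=0
    for mnt in "$@"; do
        # --check makes btrfs exit non-zero if any error counter is non-zero.
        # Note: a failure of the btrfs command itself is reported the same way.
        if ! stats=$(btrfs device stats --check "$mnt" 2>&1); then
            printf 'Issues detected on %s:\n%s\n' "$mnt" "$stats"
            status=1
        fi
    done
    return $status
}
```

A timer or cron job could call `check_mountpoints` with the relevant mountpoints and pipe any output into the mailer when the function returns non-zero.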