[bug:1564372] Setup Nagios server #41
Comments
Time: 20180409T12:21:53 On the easy side, we can monitor ping, various metrics (such as disk space), and the services that are running. What alerting policy do we want? And what SLA/SLE, especially given the timezone differences? |
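A rough sketch of what a disk space check could look like, following the usual Nagios plugin convention of mapping a measurement to an OK/WARNING/CRITICAL status. The function names and the 90/95 thresholds are placeholders assumed for illustration, not values agreed in this bug.

```python
import shutil

# Standard Nagios plugin status codes.
OK, WARNING, CRITICAL = 0, 1, 2


def disk_percent_used(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify_usage(percent_used, warn=90.0, crit=95.0):
    """Map a usage percentage to a status code.

    `warn` and `crit` are hypothetical defaults; the real values are a
    policy decision.
    """
    if percent_used > crit:
        return CRITICAL
    if percent_used > warn:
        return WARNING
    return OK


if __name__ == "__main__":
    percent = disk_percent_used("/")
    labels = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}
    status = classify_usage(percent)
    print(f"DISK {labels[status]} - / is {percent:.2f}% used")
    raise SystemExit(status)
```

Keeping the classification separate from the measurement makes the thresholds easy to change once an alert policy is decided.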
Time: 20180409T16:20:52 |
Time: 20180620T18:25:33 As a result this bug is being closed. If the bug persists on a maintained version of gluster or against the mainline gluster repository, request that it be reopened and the Version field be marked appropriately. |
Time: 20180910T15:28:36 Now, I need to:
So far it has worked: I got paged for an IPv6 problem in the cage (because there is no IPv6 in the cage in the first place...) |
Time: 20180926T11:24:14 All servers managed by ansible are now monitored for ping/ssh (which revealed that our FreeBSD hosts block ping, because I got paged for that as soon as I deployed). That is, all but gerrit prod. I have added an SMTP port check on supercolony, and vhost checks for a couple of websites; see the ansible repo for details. For now, while I clean up the roles and the rest, I am the only one receiving alerts, but we will need a plan for the future; I discussed this with nigel on IRC. Notes for myself (and people who care), here is the list of things to do:
|
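For readers unfamiliar with what a ssh or SMTP port check does, here is a minimal sketch of a TCP reachability check in the style of a Nagios plugin. It is not the check deployed from the ansible repo; `check_tcp` and its defaults are names assumed for illustration.

```python
import socket

# Standard Nagios plugin status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection and return (status, message)."""
    try:
        # create_connection resolves the name and connects in one call.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts and DNS failures.
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"
```

For example, `check_tcp("supercolony.gluster.org", 25)` would report whether the SMTP port accepts connections from the monitoring host.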
Time: 20180926T15:57:17
type=AVC msg=audit(1537977117.718:115791): avc: denied { search } for pid=19206 comm="send_nsca" name="nagios" dev="dm-0" ino=271810 scontext=system_u:system_r:munin_t:s0-s0:c0.c1023 tcontext=system_u:object_r:nagios_etc_t:s0 tclass=dir
This one shouldn't be too hard to fix.
[1537976773] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;supercolony.gluster.org;Disk usage in percent;1;WARNINGs: / is 93.80 (outside range [:92]).
|
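The log line above is a passive check result, with the fields host name, service description, return code and plugin output separated by semicolons. A small sketch of how such a line could be split for inspection follows; `parse_passive_result` is a hypothetical helper, not part of Nagios.

```python
import re

# Matches "[epoch] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host;service;code;output"
PATTERN = re.compile(
    r"^\[(?P<timestamp>\d+)\] EXTERNAL COMMAND: "
    r"PROCESS_SERVICE_CHECK_RESULT;"
    r"(?P<host>[^;]*);(?P<service>[^;]*);(?P<code>\d+);(?P<output>.*)$"
)

# Nagios return codes and their state names.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_passive_result(line):
    """Return a dict of the fields in a passive service check log line, or None."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    code = int(match["code"])
    return {
        "timestamp": int(match["timestamp"]),
        "host": match["host"],
        "service": match["service"],
        "state": STATES.get(code, "UNKNOWN"),
        # The output may itself contain semicolons, so it takes the rest of the line.
        "output": match["output"],
    }
```

Applied to the line above, the return code `1` maps to WARNING for the service "Disk usage in percent" on supercolony.gluster.org.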
Time: 20180926T16:36:39
type=AVC msg=audit(1537979718.243:116446): avc: denied { name_connect } for pid=27096 comm="send_nsca" dest=5667 scontext=system_u:system_r:munin_t:s0-s0:c0.c1023 tcontext=system_u:object_r:unreserved_port_t:s0 tclass=tcp_socket
Guess I might need to write my own policy. |
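As a starting point for such a policy, each AVC denial names the source type, target type, object class and permission that a rule would have to cover. The sketch below pulls those fields out and formats them as a candidate `allow` rule, roughly the transformation that audit2allow performs. `avc_to_allow_rule` is a hypothetical helper for illustration, and any generated rule would still need review before being loaded.

```python
import re


def avc_to_allow_rule(line):
    """Build a candidate SELinux allow rule from one AVC denial line, or return None."""
    perms = re.search(r"denied\s+\{\s*([^}]*?)\s*\}", line)
    scontext = re.search(r"scontext=(\S+)", line)
    tcontext = re.search(r"tcontext=(\S+)", line)
    tclass = re.search(r"tclass=(\S+)", line)
    if not (perms and scontext and tcontext and tclass):
        return None

    # A context reads user:role:type:level, so the type is the third field.
    source_fields = scontext.group(1).split(":")
    target_fields = tcontext.group(1).split(":")
    if len(source_fields) < 3 or len(target_fields) < 3:
        return None

    perm_list = perms.group(1).split()
    if not perm_list:
        return None
    # A single permission is written bare, several are wrapped in braces.
    if len(perm_list) == 1:
        perm_text = perm_list[0]
    else:
        perm_text = "{ " + " ".join(perm_list) + " }"

    return f"allow {source_fields[2]} {target_fields[2]}:{tclass.group(1)} {perm_text};"
```

For the two denials quoted in this bug, this yields one rule for `munin_t` searching `nagios_etc_t` directories and one for `munin_t` connecting to `unreserved_port_t` TCP sockets.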
Time: 20180927T13:50:37 |
Time: 20180927T15:03:48 In the meantime, I will make munin run unconfined on the server side until I can work on a send_nsca policy. |
Time: 20180928T15:21:43 |
Time: 20180928T17:54:40
Next step:
|
Time: 20190219T11:28:21 |
URL: https://bugzilla.redhat.com/1564372
Creator: nigelb at redhat
Time: 20180406T06:09:12
We need to set up a nagios server that alerts us to system failures. These include machines that are disconnected from Jenkins and/or have full disks. It would let our responses be predictive rather than reactive.
This is a long-running goal, but for the moment, I'll settle for a nagios server and all machines having nagios clients.
If we want to replace nagios with another equivalent like icinga, that works too.