About 60% of the service checks are health checks (WMI or WinRM) with a check interval of 5 to 15 minutes.
About 3 to 5% of the service checks are HTTP health checks for RabbitMQ with a check interval of 1 minute and a notification interval of 1 minute.
It is a standalone machine; the setup is not scaled out.
We are running:
a) a poller with min_worker set to 6 and max_worker set to 16
b) a reactionner with min_worker set to 4 and max_worker set to 12
Commonly seen in logs:
Reactionner Log:
File "/usr/lib/python2.6/site-packages/shinken/action.py", line 125, in execute
return self.execute__() ## OS specific part
File "/usr/lib/python2.6/site-packages/shinken/action.py", line 311, in execute__
preexec_fn=os.setsid)
File "/usr/lib64/python2.6/subprocess.py", line 642, in __init__
errread, errwrite)
File "/usr/lib64/python2.6/subprocess.py", line 1238, in _execute_child
raise child_exception
TypeError: execve() arg 2 must contain only strings
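For context on the traceback above: argument 2 of execve() is the environment handed to the child process, so this error suggests that the env mapping passed to subprocess.Popen held a non-string key or value. Below is a minimal sketch of coercing such a mapping to strings before spawning a command. The helper names `stringify_env` and `run_with_clean_env` and the sample variable names are hypothetical, and this is not Shinken's actual code. It is written for Python 3, where the same mistake also raises a TypeError, only worded differently.

```python
import os
import subprocess


def stringify_env(env):
    """Return a copy of env with keys and values coerced to str.

    Entries whose value is None are dropped rather than turned into "None".
    """
    clean = {}
    for key, value in env.items():
        if value is None:
            continue
        clean[str(key)] = str(value)
    return clean


def run_with_clean_env(argv, extra_env):
    """Run argv with the current environment plus the sanitized extra_env."""
    env = dict(os.environ)
    env.update(stringify_env(extra_env))
    proc = subprocess.Popen(argv, env=env, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return proc.returncode, out
```

Passing something like `{"NAGIOS_SERVICEATTEMPT": 3}` straight to Popen would fail, while the sanitized copy carries the value as the string "3".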
Broker Log:
Error : Back trace of this error: Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/shinken/daemon.py", line 864, in http_daemon_thread
self.http_daemon.run()
File "/usr/lib/python2.6/site-packages/shinken/http_daemon.py", line 283, in run
self.srv.run()
File "/usr/lib/python2.6/site-packages/shinken/http_daemon.py", line 123, in run
raise PortNotFree(msg)
PortNotFree: Error: Sorry, the port 7772 is not free: No socket could be created
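The PortNotFree error above means the daemon could not create its listening socket on port 7772. A small probe like the following sketch could be run before restarting the daemons to see whether the port can be bound at that moment. The function name `port_is_free` and the default host are assumptions made for illustration; this is not how Shinken itself performs the check.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:
        # Typically EADDRINUSE: another socket still holds the address.
        return False
    finally:
        sock.close()
```

Calling `port_is_free(7772)` in a loop with a short sleep would show how long the port stays unavailable after the services are stopped.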
Poller Log:
[1606292549] Error : [Livestatus Query] Error: 'Hosts' object has no attribute 'itersorted'
[1606292744] Error : [broker-master] The external module livestatus goes down unexpectedly!
[1606292744] Error : [broker-master] The external module npcdmod goes down unexpectedly!
[1606292744] Warning : [broker-master] Connection problem to the scheduler scheduler-master: Connexion error to http://localhost:7768/ : couldn't connect to host
[1606292747] Warning : [broker-master] Connection problem to the poller poller-master: Connexion error to http://localhost:7771/ : Operation timed out after 3000
Dmesg:
TCP: too many of orphaned sockets
__ratelimit: 192 callbacks suppressed
TCP: too many of orphaned sockets
TCP: too many of orphaned sockets
TCP: too many of orphaned sockets
TCP: too many of orphaned sockets
Netstat:
netstat -anp | grep 7772
We see the connections in either FIN_WAIT1 or FIN_WAIT2 state.
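To track this over time instead of eyeballing netstat output, the same information can be counted from /proc/net/tcp, where ports and states are hex encoded (7772 is 1E5C, FIN_WAIT1 is 04, FIN_WAIT2 is 05). A minimal sketch, assuming IPv4 sockets only; the function name `count_states` is hypothetical:

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, port):
    """Count sockets per TCP state whose local or remote port equals port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if port in (local_port, remote_port):
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts
```

Feeding it the contents of /proc/net/tcp, for example `count_states(open("/proc/net/tcp").read(), 7772)`, gives a per-state tally that can be logged every minute and compared with the time of each outage.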
Currently we run sysctl -w net.ipv4.tcp_max_orphans=0, then kill and restart all Shinken services to get everything up and running again.
This happens 2 or 3 times a day.
Please help us overcome this problem.
Will upgrading to Shinken 2.4.3 fix the problem? Or would tuning kernel parameters such as net.ipv4.tcp_mem, net.ipv4.tcp_fin_timeout, etc. help further?
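Before changing kernel parameters, it may help to compare the current number of orphaned TCP sockets with the configured limit. On Linux the count is on the TCP line of /proc/net/sockstat and the limit is in /proc/sys/net/ipv4/tcp_max_orphans. The following is only a sketch for collecting those numbers; the function names `parse_tcp_sockstat` and `orphan_headroom` are hypothetical.

```python
def parse_tcp_sockstat(sockstat_text):
    """Parse the 'TCP:' line of /proc/net/sockstat into a dict of ints."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            parts = line.split()[1:]  # alternating name/value pairs
            return {name: int(value) for name, value in zip(parts[0::2], parts[1::2])}
    return {}


def orphan_headroom(sockstat_text, max_orphans):
    """Return how many more orphaned sockets fit under max_orphans.

    A value of zero or less means the limit has been reached.
    """
    orphans = parse_tcp_sockstat(sockstat_text).get("orphan", 0)
    return max_orphans - orphans
```

In practice both inputs would come from reading the two files named above, and the result could be logged next to the dmesg timestamps.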
Hello, the issue you're facing is strange. I'm running a Shinken platform with more than 2k hosts and more than 45k services, and I have never had such problems.
You are running a fairly old Shinken release. It would be a good idea to try upgrading anyway. I doubt the latest release will run on Python 2.6, though.
Hardware:
CPU : 24 Core
RAM : 24 GB
Shinken version: 2.0.3
Python version: 2.6.6
OS: CentOS 6.10
Hosts Monitored: 409
Total Services : 14600