Xymon Mailing List Archive

paging with REPEAT problem...

3 messages in this thread

Olivier Beau · Sun, 27 Mar 2005 12:31:12 +0200
I've set up a rule with REPEAT=7d.

On the "info" page, I see what I expect:
ping	user-fe6e0e6a0d05@xymon.invalid (R)	2m 	-	1w 	-	red


But I keep receiving mails far more often than that, and not at regular intervals.
In notifications.log:
Sun Mar 27 11:26:03 2005 
Sun Mar 27 11:56:20 2005 
Sun Mar 27 12:11:42 2005 
up to now


In page.log, I see this:
2005-03-27 11:09:08 Worker process died with exit code 0, terminating
2005-03-27 11:09:08 Could not get shm of size 102400: No such file or directory
2005-03-27 11:09:08 Channel not available
2005-03-27 11:18:54 Worker process died with exit code 0, terminating
2005-03-27 11:18:54 Could not get shm of size 102400: No such file or directory
2005-03-27 11:18:54 Channel not available
2005-03-27 11:48:39 Worker process died with exit code 0, terminating
2005-03-27 12:01:01 Worker process died with exit code 0, terminating


Should I restart hobbit to clean up all the IPC?


--
Olivier Beau
Henrik Størner · Sun, 27 Mar 2005 12:43:19 +0200
quoted from Olivier Beau
On Sun, Mar 27, 2005 at 12:31:12PM +0200, user-fe6e0e6a0d05@xymon.invalid wrote:
in page.log, i see this :
2005-03-27 11:09:08 Worker process died with exit code 0, terminating
2005-03-27 11:09:08 Could not get shm of size 102400: No such file or directory
2005-03-27 11:09:08 Channel not available
2005-03-27 11:18:54 Worker process died with exit code 0, terminating
2005-03-27 11:18:54 Could not get shm of size 102400: No such file or directory
2005-03-27 11:18:54 Channel not available
2005-03-27 11:48:39 Worker process died with exit code 0, terminating
2005-03-27 12:01:01 Worker process died with exit code 0, terminating
Your hobbitd_alert process dies for some reason, and when it restarts it
has forgotten when the next alert is due to be sent.

So why does it die ... the only reason I can come up with is that it
catches a signal from a child-process. Could you try changing line 332
of hobbitd/hobbitd_alert.c from
   sigaction(SIGPIPE, &sa, NULL);
to
   signal(SIGPIPE, SIG_IGN);

and let me know if that makes it keep on running? If it does, then
the mail program that is launched to send the alerts does something
weird with its I/O.


Henrik
Olivier Beau · Mon, 28 Mar 2005 01:38:55 +0200
quoted from Henrik Størner
Your hobbitd_alert process dies for some reason, and when it restarts it
has forgotten when the next alert is due to be sent.

So why does it die ... the only reason I can come up with is that it
catches a signal from a child-process. Could you try changing line 332
of hobbitd/hobbitd_alert.c from
   sigaction(SIGPIPE, &sa, NULL);
to
   signal(SIGPIPE, SIG_IGN);

and let me know if that makes it keep on running? If it does, then
the mail program that is launched to send the alerts does something
weird with its I/O.
I've changed the code, and it keeps happening. In page.log:

2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 15:33:43 Worker process died with exit code 0, terminating
2005-03-27 15:33:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:33:43 Channel not available
2005-03-27 22:55:21 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Could not get shm of size 102400: No such file or directory
2005-03-27 22:58:15 Channel not available
2005-03-27 23:46:48 Worker process died with exit code 0, terminating
2005-03-27 23:46:48 Could not get shm of size 102400: No such file or directory
2005-03-27 23:46:48 Channel not available
2005-03-28 00:08:06 Worker process died with exit code 0, terminating
2005-03-28 00:08:07 Could not get shm of size 102400: No such file or directory
2005-03-28 00:08:07 Channel not available


I've been sending alerts using a script,
so maybe it's crummy.
I've changed to just sending mail and will let you know if it still happens.


BTW, I've just realized that a rule was using a macro that doesn't exist... I
don't think that's a problem?


In enadis.log (which I suppose is enable/disable),
I got those too:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 19:35:17 Worker process died with exit code 0, terminating
2005-03-27 19:35:17 Could not get shm of size 102400: No such file or directory
2005-03-27 19:35:17 Channel not available

I was not playing with maintenance (though I do have a couple of DOWNTIME entries in
bb-host..), so what could be going on here?


--
olivier