Your hobbitd_alert process dies for some reason, and when it restarts it
has forgotten when the next alert is due to be sent.
So why does it die ... the only reason I can come up with is that it
catches a signal from a child-process. Could you try changing line 332
of hobbitd/hobbitd_alert.c from
sigaction(SIGPIPE, &sa, NULL);
to
signal(SIGPIPE, SIG_IGN);
and let me know if that makes it keep on running? If it does, then
the mail program that is launched to send the alerts does something
weird with its I/O.
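
To illustrate what that change does: with SIGPIPE ignored, writing to a
pipe whose reader has gone away fails with EPIPE instead of killing the
writing process. A minimal standalone sketch follows; the function name
write_to_closed_pipe is made up for the example and is not part of
hobbitd_alert.c.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the Hobbit source.
 * Ignores SIGPIPE, then writes to a pipe that has no reader.
 * Returns the errno from write(), 0 if the write unexpectedly
 * succeeded, or -1 if the setup itself failed. */
int write_to_closed_pipe(void)
{
    int fds[2];
    ssize_t n;
    int err;

    /* Same call as the suggested one-line change. */
    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;

    if (pipe(fds) != 0)
        return -1;

    /* Close the read end, as if the child (e.g. the mail program) went away. */
    close(fds[0]);

    /* Without SIG_IGN this write would deliver SIGPIPE and, by default,
     * terminate the process. With it, write() just fails. */
    n = write(fds[1], "x", 1);
    err = (n < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

Called from a small main(), this should return EPIPE, and the process
keeps running.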
I've changed the code, and it keeps doing it. In page.log:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 15:33:43 Worker process died with exit code 0, terminating
2005-03-27 15:33:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:33:43 Channel not available
2005-03-27 22:55:21 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Worker process died with exit code 0, terminating
2005-03-27 22:58:15 Could not get shm of size 102400: No such file or directory
2005-03-27 22:58:15 Channel not available
2005-03-27 23:46:48 Worker process died with exit code 0, terminating
2005-03-27 23:46:48 Could not get shm of size 102400: No such file or directory
2005-03-27 23:46:48 Channel not available
2005-03-28 00:08:06 Worker process died with exit code 0, terminating
2005-03-28 00:08:07 Could not get shm of size 102400: No such file or directory
2005-03-28 00:08:07 Channel not available
I've been sending alerts using a script,
so maybe it's crummy.
I've changed to just sending mail and will let you know if it still happens.
Btw, I've just realized that a rule was using a macro that didn't exist... I
don't think that's a problem?
In enadis.log (which I suppose is enable/disable)
I got those too:
2005-03-27 15:27:43 Worker process died with exit code 0, terminating
2005-03-27 15:27:43 Could not get shm of size 102400: No such file or directory
2005-03-27 15:27:43 Channel not available
2005-03-27 19:35:17 Worker process died with exit code 0, terminating
2005-03-27 19:35:17 Could not get shm of size 102400: No such file or directory
2005-03-27 19:35:17 Channel not available
I was not playing with maintenance (though I do have a couple of DOWNTIME
entries in bb-hosts), so what could be going on here?
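
One note on the "Could not get shm" lines: "No such file or directory"
is the text for ENOENT, which shmget() returns when no segment exists
for the given key and IPC_CREAT is not set. Assuming the message comes
from such a shmget() call (an assumption, not checked against the Hobbit
source), it would mean the worker tried to attach to a segment that was
not there at that moment. A small sketch of that behaviour; the function
name and the key used below are arbitrary, made up for the example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper, not taken from the Hobbit source.
 * Looks up an EXISTING System V shared memory segment (no IPC_CREAT).
 * Returns 0 if the segment was found, otherwise the errno from shmget(). */
int probe_existing_shm(key_t key, size_t size)
{
    /* Permission bits only: the call must not create anything. */
    int id = shmget(key, size, 0600);

    return (id == -1) ? errno : 0;
}
```

With a key that nobody has created a segment for, e.g.
probe_existing_shm((key_t)0x5eedf00d, 102400), the result should be
ENOENT, which strerror() reports as "No such file or directory".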
--
olivier