Xymon Mailing List Archive

hobbitd_alert crash

2 messages in this thread

Christopher T. Beers · Thu, 03 Mar 2005 21:40:45 -0500
hobbitd_alert crashed earlier this afternoon.  Hobbit started to send out alerts like this:

red - Program crashed

Fatal signal caught!

See http://host.domain.com/hobbit-cgi/bb-hostsvc.sh?HOSTSVC=host,domain,com.hobbitd_alert

Since I had not applied the post-RC4 patch to my RC4 instance, I figured I must have run into the hobbitd_alert bug.  So I stopped hobbit, applied the patch to the source, and ran make clean, make, make install.  Then I restarted hobbit.

Now it appears to be working because the alert module is sending out actual alerts (mrtg alerts and vmio too high alerts).  However, I am still getting those hobbitd_alert crashed messages, even though ps -ef shows it running....

How can I get rid of the 100 or so hobbitd_alert crashed emails per hour that are being sent out?

-- 
Christopher T. Beers
Lead UNIX Architect - System Infrastructure Services (SIS)
Syracuse University | 250 Machinery Hall | Syracuse, NY XXXXX
(XXX) XXX-XXXX Office | (XXX) XXX-XXXX Fax | user-62463a5fbf92@xymon.invalid Pager
Henrik Størner · Fri, 4 Mar 2005 07:44:58 +0100
On Thu, Mar 03, 2005 at 09:40:45PM -0500, Christopher T. Beers wrote:
> Now it appears to be working because the alert module is sending out actual
> alerts (mrtg alerts and vmio too high alerts).  However, I am still getting
> those hobbitd_alert crashed messages, even though ps -ef shows it running....
>
> How can I get rid of the 100 or so hobbitd_alert crashed emails per hour
> that are being sent out?

~hobbit/server/bin/bb 127.0.0.1 "drop HOBBITHOST hobbitd_alert" will
drop that hobbitd_alert column and should stop the messages from being
repeated.
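
The drop command above could be wrapped like this. It is only a minimal sketch: the function names drop_message and send_drop and the BB_CMD variable are made up for illustration, and only the "drop HOST COLUMN" message and the 127.0.0.1 target come from the command above. With BB_CMD unset it prints what it would send instead of sending it.

```shell
#!/bin/sh
# Hypothetical helpers; not part of Hobbit itself.

# Path to the bb client. The reply uses ~hobbit/server/bin/bb;
# set BB_CMD to wherever bb lives on your system.
BB_CMD="${BB_CMD:-}"

# Build the "drop HOST COLUMN" message text.
drop_message() {
    printf 'drop %s %s\n' "$1" "$2"
}

# Send the drop message to the local hobbitd, or print a dry run
# when BB_CMD is unset or not executable.
send_drop() {
    msg=$(drop_message "$1" "$2")
    if [ -n "$BB_CMD" ] && [ -x "$BB_CMD" ]; then
        "$BB_CMD" 127.0.0.1 "$msg"
    else
        printf 'dry run: bb 127.0.0.1 "%s"\n' "$msg"
    fi
}
```

For example, send_drop HOBBITHOST hobbitd_alert, with HOBBITHOST replaced by the name of your Hobbit server.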

However, I think you're missing the one fix that was added after I
first announced the post-RC4 patch - so your alert messages get
repeated every minute. Change line 1488 of hobbitd/do_alert.c from
     rpt = find_repeatinfo(alert, recip, 0);
to
     rpt = find_repeatinfo(alert, recip, 1);
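
If you would rather not edit the file by hand, the one-line change could be scripted roughly as follows. This is a sketch, not an official patch: the function name patch_repeatinfo is hypothetical, and it assumes the line number in your copy of the source matches the one given above. It refuses to change anything if that line does not contain the exact call with 0 as the last argument, and it keeps a .orig copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: change the last argument of find_repeatinfo()
# from 0 to 1 on a single line of a source file.
patch_repeatinfo() {
    file="$1"
    line="$2"
    # Only touch the file if the given line holds the exact call we expect.
    if sed -n "${line}p" "$file" | grep -qF 'rpt = find_repeatinfo(alert, recip, 0);'; then
        cp "$file" "$file.orig"
        sed "${line}s/find_repeatinfo(alert, recip, 0);/find_repeatinfo(alert, recip, 1);/" \
            "$file.orig" > "$file"
    else
        echo "line $line of $file does not match; nothing changed" >&2
        return 1
    fi
}
```

From the top of the source tree this would be run as patch_repeatinfo hobbitd/do_alert.c 1488, followed by the usual make and make install.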

There will be an RC5 because of this confusion.


Henrik