Xymon Mailing List Archive

DEVMON stops working every now and then

Buchan Milne
Thu, 12 Nov 2009 18:30:51 +0100
Message-Id: <user-bb48ae942ca3@xymon.invalid>

On Wednesday, 11 November 2009 22:37:56 user-c15424b7e83a@xymon.invalid wrote:
> We have the same problem - I've even got devmon configured under SMF in
> Solaris, however it doesn't pick up the fact it's crashed as the process is
> still there.
It doesn't crash. As far as I can tell, eventually all the child processes lose communication with the master process, but they are all still running, just waiting for someone to tell them to do something.
> A quick and dirty workaround we have is to send an alert on the "dm"
> monitor going purple - this allows the on-call engineer to be alerted to
> the fact we are no longer effectively monitoring the network devices and so
> to restart the process!
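For anyone who wants to automate that restart until the root cause is found, here is a rough sketch in Python. It is not part of devmon or Xymon. It assumes the server can be queried with the `xymon` client and an `xymondboard` request that returns one `hostname|color` record per line (older Hobbit installs use different command names), and that devmon is registered under SMF with a service name of `devmon`, which is a hypothetical name. The function names are made up for illustration.

```python
import subprocess

# Assumed commands; adjust both for the local installation.
QUERY_CMD = ["xymon", "127.0.0.1", "xymondboard test=dm fields=hostname,color"]
RESTART_CMD = ["svcadm", "restart", "devmon"]  # hypothetical SMF service name


def purple_hosts(board_output):
    """Return the hostnames whose record reports the color 'purple'.

    Expects one 'hostname|color' record per line and skips anything else.
    """
    hosts = []
    for line in board_output.splitlines():
        parts = line.strip().split("|")
        if len(parts) == 2 and parts[1] == "purple":
            hosts.append(parts[0])
    return hosts


def check_and_restart(query_cmd=QUERY_CMD, restart_cmd=RESTART_CMD):
    """Query the 'dm' status and restart devmon if any host shows purple.

    Returns the list of purple hosts (empty if nothing was restarted).
    """
    result = subprocess.run(query_cmd, capture_output=True, text=True, check=True)
    stale = purple_hosts(result.stdout)
    if stale:
        subprocess.run(restart_cmd, check=True)
    return stale
```

Calling `check_and_restart()` from cron every few minutes would cover the case where the processes are all still present but idle, which SMF cannot detect on its own. It only hides the symptom, so the purple alert is still worth keeping.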

> There must be a better way though...
Devmon has had "goes purple" problems since 0.2.2 beta. I fixed the more frequent one before the 0.3.0 release.

Anyway, I've done some work on this. However, the only production instance of devmon that I currently look at often last went purple 9 days ago ...

If you can reproduce it more frequently, please have a look at the devmon-devel mailing list (or the archives[1] once they have updated). I just sent a mail with an attached patch (against svn; it may apply to 0.3.1-beta1, but I haven't tried) that may fix the problem, allow us to narrow it down further, or at least eliminate one aspect as the cause.

[1] http://sourceforge.net/mailarchive/forum.php?forum_name=devmon-devel

Regards,
Buchan