Xymon Mailing List Archive

xymon hostdata module going rogue - bug report

John Thurston
Mon, 31 Aug 2015 09:19:13 -0800
Message-Id: <user-e161add9dce0@xymon.invalid>

On Fri, August 28, 2015 3:16 pm, John Thurston wrote:
On 8/28/2015 12:45 PM, John Thurston wrote:
On 6/10/2015 9:01 AM, Scot Kreienkamp wrote:
. . .
hobbit   28452  0.0  0.0      0     0 ?        Z    12:50   0:00 [xymond_hostdata] <defunct>

It seemed related to drop messages . . .
Hey, I think I'm seeing the same thing on Solaris with 4.3.21

I've ended up here after a customer let me know that email alerts were
not working as expected. After a few hours of digging around, I decided
that the alert daemon was failing to retrieve hostnames and failing
miserably.

Have other people seen this behavior?
I have duplicated this behavior on another xymon server on Solaris. It
certainly looks like this behavior breaks the alert daemon. Fortunately,
I "drop" hosts in batches so I can restart Xymon at that time, but this is
still pretty icky.
On 8/28/2015 3:12 PM, J.C. Cleaver wrote:
The patch from
http://lists.xymon.com/pipermail/xymon/2015-June/041833.html was checked
in in https://sourceforge.net/p/xymon/code/7669/ , however it's not in the
most recent Terabithia RPM.

If you could test the direct patch (for hostdata, at
http://lists.xymon.com/pipermail/xymon/attachments/20150610/8b425efb/attachment.obj
) on your OS, that would be very helpful. Signal handling is always a bit
tricky to get right on every platform.
I have patched one of my servers and it behaves much better under my 
contrived tests :) This is under Solaris 10 (Update 11) on SPARC. The 
original report was under Red Hat Enterprise Linux 5.

If my understanding of this is correct, it is a pretty nasty defect :(

My failure scenario was non-delivery of some email alerts for hosts in 
dire straits. I have several customers who do not monitor the web 
interface, but rely on email notifications to warn them of impending 
problems. These folks had been without any alerting capability since 
early in July, when I "dropped" a host and unknowingly clobbered the
child of xymond_hostdata.
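For readers less familiar with the `<defunct>` (state `Z`) entry in the ps output earlier in this thread: it is a child process that has exited but has not yet been collected by its parent with `wait()`/`waitpid()`. The sketch below is NOT the Xymon patch linked above; it is only a generic illustration of one common way a Unix daemon avoids leaving such children behind: a SIGCHLD handler that loops on `waitpid(-1, WNOHANG)`, because several child exits can be coalesced into a single signal. It is written in Python, whose `os` and `signal` modules wrap the same POSIX calls a C daemon would use; the names `on_sigchld` and `spawn_and_reap` are hypothetical.

```python
import os
import signal
import time

reaped = 0  # number of children collected by the SIGCHLD handler


def on_sigchld(signum, frame):
    """Reap every exited child; one SIGCHLD may stand for several exits."""
    global reaped
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children still running, none exited yet
        reaped += 1


def spawn_and_reap(nchildren, timeout=5.0):
    """Hypothetical demo: fork short-lived children, return how many were reaped."""
    global reaped
    reaped = 0
    # Install the handler before forking so no exit can be missed.
    old = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        for _ in range(nchildren):
            if os.fork() == 0:
                os._exit(0)  # child: exit immediately, skipping cleanup
        deadline = time.monotonic() + timeout
        # Parent: wait until the handler has collected every child.
        while reaped < nchildren and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        signal.signal(signal.SIGCHLD, old)  # restore the previous disposition
    return reaped


if __name__ == "__main__":
    print(spawn_and_reap(3))
```

If a parent skips the reaping loop, or its handler is lost or replaced, each exited child stays in the process table as `<defunct>` until the parent itself exits, which matches the symptom described in this thread.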

-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska