alert/hostname loading (was Re: xymon hostdata module going rogue)

John Thurston
Tue, 01 Dec 2015 12:03:03 -0900
Message-Id: <user-960db100d422@xymon.invalid>

On 12/1/2015 11:48 AM, J.C. Cleaver wrote:
- snip -
> Hmm. This seems to be fundamentally a different issue than the "hostdata
> module going rogue" thing, which was about zombies never being picked up.
>
> AFAICT, somehow the hosts tree structure is getting clobbered as a result
> of the drop (assuming all of those hosts are expected to be existing).

See my later message for its relation to 'drop' activity.

> There were a few patches for things in xymond.c at one point, and more
> error checking when going to POSIX btrees generally, but I hadn't
> encountered this in other intermittent hostlist readers.
>
> 1) Which version of Solaris is this?

Solaris 10, most recent update, SPARC

> 2) Have you experienced this in other workers for xymon? (IE,
> xymond_client not being able to look up hostnames after a drop -- would
> probably lead to random purples)

I haven't seen behavior like that with other worker processes.
Is there a way to interactively run a worker process and have it hit the
daemon process for the hostnames?
Aside from making the process dump core, is there a way to get the
daemon to spill its current list of hostnames?

> 3) Does issuing a "reload" command or -HUP to xymond_alert re-sync things?

I didn't do a 'reload', but I killed the "xymond_channel --channel=page
--log=/var/log/xymon/alert.log xymond_alert" process and alerts started
working again.

I haven't yet found a way to induce this failure, so I haven't yet 
identified the minimal recovery steps. I'm working on it, though.
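
On the question of getting a hostname list without a core dump, here is one
possible approach, offered as an untested sketch rather than a confirmed
answer. The xymon client's "xymondboard" command returns one pipe-delimited
line per status, so requesting fields=hostname,testname and de-duplicating
the first field should give the hosts xymond currently holds statuses for.
This shows xymond's own view only, not the host list held inside the
xymond_alert worker. The function names, the 127.0.0.1 server address, and
the assumption that the "xymon" binary is on PATH are illustrative.

```python
import subprocess


def parse_hostnames(board_output):
    """Return sorted, de-duplicated hostnames from xymondboard output.

    Assumes each line looks like "hostname|testname", i.e. the board was
    requested with fields=hostname,testname.
    """
    hosts = set()
    for line in board_output.splitlines():
        # The hostname is the first pipe-delimited field on each line.
        name = line.split("|", 1)[0].strip()
        if name:
            hosts.add(name)
    return sorted(hosts)


def fetch_board(server="127.0.0.1", xymon_cmd="xymon"):
    """Ask xymond for its status board (server and command are assumptions)."""
    query = "xymondboard fields=hostname,testname"
    result = subprocess.run(
        [xymon_cmd, server, query],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling parse_hostnames(fetch_board()) before and after a 'drop' and
comparing the two lists might show whether xymond itself still knows the
hosts when alerts stop working.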
-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska