Xymon Mailing List Archive

xymon hostdata module going rogue

John Thurston
Tue, 01 Dec 2015 12:41:26 -0900
Message-Id: <user-a92d643cefee@xymon.invalid>

On 12/1/2015 11:51 AM, J.C. Cleaver wrote:
> On Tue, December 1, 2015 9:32 am, John Thurston wrote:
>> *snip*
>> In this occurrence, it does not appear to be related to a "drop"
>> message. My last recorded "drop" was at 20151103-0846 and the alert
>> process didn't start logging "which is not defined" until 20151120-0007
> Hmm. Okay, that does change things slightly. Fortunately, that means it's
> probably not specifically caused by drops per se. Were there any other errors
> that occurred with other components around this time?

I have several instances of "Oversize status msg from " in the xymond.log, but those are appearing six hours before the bad behavior appeared in xymon_alert. I have difficulty believing they are related.

> Perhaps the system
> being low enough on memory that some re-allocations might have failed?

I think this is unlikely. The system has 256GB of RAM, and there are no memory caps placed on the non-global zone in which xymon is running. I don't have information on its size on Nov 20, but today it is using about 400MB of RAM. All of the zones on the system are consuming less than 10GB of the 256GB, and it wouldn't have been significantly different a few weeks ago.

I've been doing some 'drops' today to try to break it, but haven't succeeded. I'll continue to beat on it and see if I can find a repeatable failure scenario.

fwiw, this is under 4.3.22
-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska