Xymon Mailing List Archive

Possible defect in rrd handler causing divide-by-zero crashes

John Thurston
Wed, 22 Apr 2015 10:28:43 -0800
Message-Id: <user-3332f6a81529@xymon.invalid>

On 4/21/2015 4:40 PM, J.C. Cleaver wrote:
> On Tue, April 21, 2015 2:04 pm, John Thurston wrote:
>> It has been a long road, but I may have uncovered a defect in the rrd
>> handler. I'm currently running xymon 4.3.17 (somewhat patched) on
>> Solaris 10 on SPARC.
>>
>> :: Symptom ::
>> The xymond_rrd process crashes. It leaves footprints in the log like:
>> 2015-04-20 19:09:18 Child process 23929 died: Signal 8
>> 2015-04-20 19:09:18 Peer at 0.0.0.0:0 failed: Broken pipe
>> 2015-04-20 19:09:18 Peer not up, flushing message queue
>> It also leaves a pid file behind.
>> It also leaves gaps in the rrd data.
- snip -
>> :: Hypothesis ::
>>
>> The message handling code is accepting messages from clients stating 0MB
>> total physical memory, but such information is making its way into the
>> RRD handler and causing a divide by zero.
>>
>> Can anyone else test this hypothesis?
>>
>> Can someone with more C-skills look at do_la_rrd and see if a zero
>> really can find its way into its division statements?
>
> Yep, seems exactly like that's the case! I believe the following patch
> should fix it for you. Can you try it out?

Thank you!

I created a script with which I could semi-reliably induce a crash by feeding a message claiming 0MB of physical memory. It isn't 100% reliable because I think there is some magic timing I haven't deciphered. But if I wait five or ten minutes between attempts, I can crash the unpatched process with my message.

After applying your patch, I am _unable_ to crash the process with my message. I also found the "report had 0 total physical/pagefile memory listed" text in my rrd-status log.

Now I want to try to grasp the possible consequences of using this patch. Am I correct that by responding to this condition with "return 0", there will not be a call made to do_memory_rrd_update for this host/message combination? And that the worst consequence of this will be a possible gap in the data stored in the rrd for this host?
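
To make the question concrete, here is a minimal sketch of the kind of guard I understand the patch to add. This is not the actual patch or the real Xymon code: handle_memory_report, store_percentage, updates_written and last_pct are hypothetical names made up for illustration. It only shows the pattern of checking the divisor, logging, and returning 0 before any division or RRD update takes place.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the real RRD update path (not Xymon code). */
static int updates_written = 0;   /* number of updates that were stored */
static long last_pct = -1;        /* last percentage that was stored */

static void store_percentage(const char *host, long pct)
{
    (void)host;                   /* unused in this sketch */
    last_pct = pct;
    updates_written++;
}

/*
 * Hypothetical handler. It returns 0 in every case, but when the
 * reported total is zero (or negative) it logs and returns early,
 * so the division below is never reached and no update is stored.
 */
static int handle_memory_report(const char *host, long total_mb, long used_mb)
{
    if (total_mb <= 0) {
        fprintf(stderr, "%s: report had 0 total memory listed, skipping update\n", host);
        return 0;
    }

    /* Safe: total_mb is known to be non-zero here. */
    store_percentage(host, (100 * used_mb) / total_mb);
    return 0;
}
```

In this sketch, a call such as handle_memory_report("hostA", 0, 0) stores nothing, which is the "gap in the data" I am asking about, while handle_memory_report("hostA", 2048, 1024) stores 50.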

-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Enterprise Technology Services
Department of Administration
State of Alaska