Xymon Mailing List Archive

Weird disk alert with bad data

Stef Coene
Tue, 28 Oct 2008 21:10:17 +0100
Message-Id: <user-68a5c58557cb@xymon.invalid>

On Tuesday 28 October 2008, Martha McConaghy wrote:
> We recently got the AIX client working with our Hobbit server.  I then
> had to apply a patch to rrd/do_vmstat.c to fix a problem with rrd crashing
> due to an uninitialized variable coming from the AIX client.  Despite that,
> I'm still seeing a weird problem.  One of the other non-AIX clients will
> have its disk check go to red alert.  When I take a look at it, the
> disks are fine.  However, the data being processed by rrd is off by a few
> characters, which seems to be what is causing the red alert to be generated.
> It will last for an hour or so, then will go green again and the problem
> will move to a different non-AIX client.  When I remove the three AIX
> clients from bb-hosts, the problem disappears.  So, it seems to be pretty
> clearly related to the AIX client, though it is affecting other alerts.
>
> Any thoughts on what to do?  Have we stumbled onto another bug?

Which patch did you apply for the rrd?

I have lots of AIX clients talking to lots of Hobbit servers and I have never
had a problem with the rrds.  The only patch I applied regarding vmstat adds
cpu_pc and cpu_ec and strips . and , from the numbers.

My vmstat patch:

--- ./hobbit-4.2.0/hobbitd/rrd/do_vmstat.c   2006-08-09 22:10:06.000000000 +0200
+++ ./hobbit-4.2.0-OK/hobbitd/rrd/do_vmstat.c   2007-03-13 11:40:39.000000000 +0100
@@ -76,6 +76,8 @@
   { 14, "cpu_sys" },
   { 15, "cpu_idl" },
   { 16, "cpu_wait" },
+  { 17, "cpu_pc" },
+  { 18, "cpu_ec" },
   { -1, NULL }
 };

@@ -322,6 +324,17 @@
   p = strchr(datapart, '\n'); if (p) *p = '\0';
   p = strtok(datapart, " "); datacount = 0;
   while (p && (datacount < MAX_VMSTAT_VALUES)) {
+
+      /* Removing . and , from the numbers */
+      char *p1;
+      while ( (p1 = strchr(p,'.')) != NULL ) {
+         strcpy (p1, p1+1) ;
+      }
+      char *p2;
+      while ( (p2 = strchr(p,',')) != NULL ) {
+         strcpy (p2, p2+1) ;
+      }
+
      values[datacount++] = atoi(p);
      p = strtok(NULL, " ");
   }
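
One caveat about the loops in the patch: strcpy() with overlapping source
and destination is undefined behavior in C, so the shift-left trick may
misbehave with some libc versions.  Below is a minimal sketch of the same
stripping done in one in-place pass.  The helper name strip_separators is
hypothetical and not part of Hobbit; this is an illustration only, not the
patch as applied.

```c
/* Hypothetical helper (not part of Hobbit): remove every '.' and ','
 * from the NUL-terminated string s, in place.
 * Characters are copied one at a time from a read position to a write
 * position that never runs ahead of it, so no overlapping strcpy()
 * is needed. */
static void strip_separators(char *s)
{
    char *dst = s;                      /* write position */

    for (const char *src = s; *src != '\0'; src++) {
        if (*src != '.' && *src != ',') {
            *dst++ = *src;              /* keep everything else */
        }
    }
    *dst = '\0';                        /* terminate the shortened string */
}
```

Called as strip_separators(p) just before atoi(p), a token such as
"1,234.5" becomes "12345", which matches what the two strchr() loops are
meant to produce.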


Stef