Xymon Mailing List Archive

Another weirdness - RRD info has gaps

7 messages in this thread

list Betsy Schwartz · Fri, 3 Aug 2012 15:26:32 -0400 ·
These two graphs were generated from the same script, sending the same
data to two different tests :

http://www.flickr.com/photos/betsys99/7705982914/in/photostream

You can see that the first test has chunks missing at regular
intervals. Both graphs come from the same data, sent by the same Perl
script to two different tests, RTM and rtmstats. I set this up
originally when I was having trouble with RRD choking on the extra
information and wanted to separate the stats, but I've kept it because
the one graph is so spotty. On the client side, my Perl script for the
test logs that it is sending the data correctly, and I believe it
always displays correctly on the test page.

On the Xymon server, I see the RRD file for RTM is missing some
entries that are populated in the rtmstats one, for example:

RTM:
         <!-- 2012-08-01 17:05:00 EDT / 1343855100 -->
<row><v>3.5954352000e+05</v><v>3.5456424000e+05</v><v>1.0610000000e+03</v><v>3.9142800000e+03</v><v>4.9752800000e+03</v><v>1.3876000000e+00</v></row>
         <!-- 2012-08-01 17:10:00 EDT / 1343855400 -->
<row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
rtmstats:
            <!-- 2012-08-01 17:05:00 EDT / 1343855100 -->
<row><v>3.5954352000e+05</v><v>3.5456424000e+05</v><v>1.0610000000e+03</v><v>3.9142800000e+03</v><v>4.9752800000e+03</v><v>1.3876000000e+00</v></row>
            <!-- 2012-08-01 17:10:00 EDT / 1343855400 -->
<row><v>3.8017446000e+05</v><v>3.7497760000e+05</v><v>1.0860800000e+03</v><v>4.1067800000e+03</v><v>5.1928600000e+03</v><v>1.3678000000e+00</v></row>

What could be causing this discrepancy?
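
For anyone wanting to list such holes rather than eyeball them, here is a minimal sketch that scans a "rrdtool dump" fragment for rows where every value is NaN. It assumes the timestamp-comment-then-row layout shown above; the function name unknown_rows and the shortened two-column sample are made up for illustration, and a full dump holds several archives at different resolutions, so rows from all of them would be reported together.

```python
import re

# A timestamp comment in the layout shown above, followed by its <row>.
ROW_RE = re.compile(
    r"<!--\s*(?P<stamp>[^/]+?)\s*/\s*(?P<epoch>\d+)\s*-->\s*<row>(?P<cells>.*?)</row>",
    re.S,
)
CELL_RE = re.compile(r"<v>\s*([^<]*?)\s*</v>")


def unknown_rows(dump_xml):
    """Return (epoch, timestamp) for every row whose values are all NaN."""
    gaps = []
    for match in ROW_RE.finditer(dump_xml):
        values = CELL_RE.findall(match.group("cells"))
        if values and all(v.lower() == "nan" for v in values):
            gaps.append((int(match.group("epoch")), match.group("stamp")))
    return gaps


# Shortened sample: two columns instead of the six in the real rows.
DUMP_SAMPLE = """
<!-- 2012-08-01 17:05:00 EDT / 1343855100 -->
<row><v>3.5954352000e+05</v><v>1.3876000000e+00</v></row>
<!-- 2012-08-01 17:10:00 EDT / 1343855400 -->
<row><v>NaN</v><v>NaN</v></row>
"""

if __name__ == "__main__":
    for epoch, stamp in unknown_rows(DUMP_SAMPLE):
        print(epoch, stamp)
```

Running the same function over the dumps of both RRD files and comparing the results would show exactly which intervals one file lost and the other kept.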


Looking at that interval on the client side log:
/home/xymon/client/bin/bb 10.100.5.42 'status+12h
myhost.example.com.RTM  green  Wed Aug  1 17:04:15 2012
 Total : 357039
 Success : 352086
 Temp_Errors : 1058
 Other_Errors : 3891
 Total_Errors : 4949
 Percent_Failure : 1.39%
<SNIP lots more stuff>

'/home/xymon/client/bin/bb 10.100.5.42 'status+12h
myhost.example.com.rtmstats green  Wed Aug  1 17:04:15 2012
 Total : 357039
 Success : 352086
 TempErrors : 1058
 OtherErrors : 3891
 TotalErrors : 4949
 PercentFailure : 1.39%
list Betsy Schwartz · Fri, 3 Aug 2012 15:34:11 -0400 ·
I accidentally snipped off the later interval there, but it still matches:

/home/xymon/client/bin/bb 10.100.5.42 'status+12h
myhost.example.com.RTM  green  Wed Aug  1 17:09:20 2012
 Total : 377910
 Success : 372738
 Temp_Errors : 1083
 Other_Errors : 4085
 Total_Errors : 5168
 Percent_Failure : 1.37%
<SNIP otherstuff>

'/home/xymon/client/bin/bb 10.100.5.42 'status+12h
myhost.example.com.rtmstats green  Wed Aug  1 17:09:20 2012
 Total : 377910
 Success : 372738
 TempErrors : 1083
 OtherErrors : 4085
 TotalErrors : 5168
 PercentFailure : 1.37%
list Tim McCloskey · Fri, 3 Aug 2012 13:17:27 -0700 ·
I've seen this before, but I don't remember the exact cause; I just remember that working on RRD stuff tended to add gray hair. Do both of these graphs have essentially the same RRD configs? If the foo.rrd files do not contain the detail for the tests, I would save a copy of foo.rrd, then truncate it and recheck my [foo] graph stanza.
list Betsy Schwartz · Fri, 3 Aug 2012 17:42:46 -0400 ·
quoted from Tim McCloskey
On Fri, Aug 3, 2012 at 4:17 PM, Tim McCloskey <user-440820cc07d6@xymon.invalid> wrote:
I've seen this before but I don't remember the exact cause, I just remember that working on rrd stuff tended to add gray hair.  Do both of these graphs have essentially the same rrd configs?  If the foo.rrd files do not contain the detail for the tests I would truncate foo.rrd, and recheck my [foo] graph stanza. (rather, save a copy off foo.rrd first)
It's got *some* data but it's got gaps. I dropped the test, erased the
RRD file and put it back, and I am *still* seeing gaps.

I swear my two tests are getting the exact same info:

push( @rtmstats,
" \n Total : $totalcnt \n Success : $success \n TempErrors :
$temperror\n OtherErrors : $othererror  \n TotalErrors : $totalerror
\n PercentFailure : $failure%   \n"
);
push ( @bbdata, @rtmstats);
push ( @bbdata,< SNIP lots of other stuff>);

my $bbcmd =  "$XYMON $XYMSRV 'status+12h $MACHINE.$TESTNAME  $color
$date @bbdata \n'";
system("$bbcmd");
print $bbcmd;

$bbcmd =  "$XYMON $XYMSRV 'status+12h $MACHINE.rtmstats green  $date
@rtmstats \n'";
system("$bbcmd");
print $bbcmd;
list Tim McCloskey · Fri, 3 Aug 2012 15:05:56 -0700 ·
comments at the bottom

1. Do both of these graphs have essentially the same rrd configs?  

2. This would fail to provide any data if $TESTNAME did not expand, so I don't think that's it.  Still, these two statements are glued together in different ways.
quoted from Betsy Schwartz

my $bbcmd =  "$XYMON $XYMSRV 'status+12h $MACHINE.$TESTNAME  $color
$date @bbdata \n'";

$bbcmd =  "$XYMON $XYMSRV 'status+12h $MACHINE.rtmstats green  $date
@rtmstats \n'";
list Betsy Schwartz · Fri, 3 Aug 2012 20:52:27 -0400 ·
1. Do both of these graphs have essentially the same rrd configs?
The graphs have identical rrd configs, cut'n'paste with the name change
2. This would fail to provide any data if $TESTNAME did not expand, so I don't think that's it.  Still, these two statements are glued together in different ways.
I'm getting *most* of the data in RRD, and I have not caught any gaps
on the Xymon test page.

Thanks, Betsy
list Henrik Størner · Sat, 04 Aug 2012 07:58:48 +0200 ·
quoted from Betsy Schwartz
On 03-08-2012 23:42, Betsy Schwartz wrote:
It's got *some* data but it's got gaps. I dropped the test, erased the
RRD file and put it back, and I am *still* seeing gaps.

I swear my two tests are getting the exact same info:
What Xymon version are you running on the Xymon server? Anything before 4.3.4 has a known bug that can cause this.

Are there any errors logged in your rrd-status.log / rrd-data.log files?

You're sending the messages with a lifetime of 12 hours. How often are you sending these messages? If not once every 5 minutes, then the RRD file must have non-standard "step" and "heartbeat" values. You can check this with "rrdtool info myfile.rrd"; here's one from my server:

filename = "/var/lib/xymon/rrd/jorn.hswn.dk/la.rrd"
rrd_version = "0003"
step = 300
<snip>
ds[la].minimal_heartbeat = 600
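
If you want to check these two values across many RRD files, a small parser helps. This is only a sketch: it assumes the "key = value" layout shown above, the function name parse_rrd_info is made up, and the sample text is the output quoted here, not anything captured from the RTM file.

```python
import re


def parse_rrd_info(text):
    """Extract step and per-data-source heartbeat from `rrdtool info` output."""
    step = None
    heartbeats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep:
            continue  # skips lines without "key = value", such as "<snip>"
        key, value = key.strip(), value.strip()
        if key == "step":
            step = int(value)
            continue
        match = re.fullmatch(r"ds\[(.+)\]\.minimal_heartbeat", key)
        if match:
            heartbeats[match.group(1)] = int(value)
    return step, heartbeats


INFO_SAMPLE = """filename = "/var/lib/xymon/rrd/jorn.hswn.dk/la.rrd"
rrd_version = "0003"
step = 300
<snip>
ds[la].minimal_heartbeat = 600
"""

if __name__ == "__main__":
    print(parse_rrd_info(INFO_SAMPLE))
```

The text to parse could come from running "rrdtool info" on each file and capturing its standard output.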

"step" is how often you're sending updates. "minimal_heartbeat" determines how much time may pass between updates before the data is considered invalid. In this case, if more than 600 seconds pass between two updates, the data is not considered valid and will be ignored *unless* another update arrives within the next 600 seconds.

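To make the heartbeat rule concrete, here is a deliberately simplified sketch that flags pairs of consecutive updates lying further apart than the heartbeat. It is not how RRDtool is implemented internally; the function name stale_intervals is made up, and the update times are illustrative offsets in seconds. Only the first spacing, 305 seconds, comes from this thread (the client log shows sends at 17:04:15 and 17:09:20).

```python
def stale_intervals(update_times, heartbeat):
    """Return (previous, current) update times that are more than `heartbeat` seconds apart."""
    pairs = zip(update_times, update_times[1:])
    return [(prev, cur) for prev, cur in pairs if cur - prev > heartbeat]


if __name__ == "__main__":
    # Offsets in seconds; the last update arrives 690 s after the one before it.
    updates = [0, 305, 610, 1300]
    print(stale_intervals(updates, heartbeat=600))
```

Feeding it the arrival times of the real status messages would show whether any pair of updates is spaced widely enough to explain a NaN row.
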
The "heartbeat" can be changed with "rrdtool tune". "step" cannot: you'll have to export the RRD file to XML, edit the XML file and then restore the RRD file from the XML.
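
As a rough sketch of what those commands look like, the helpers below only build the argument lists for "rrdtool tune", "rrdtool dump" and "rrdtool restore"; they do not run anything. The file names, the data-source name and the heartbeat value of 900 are placeholders, and the XML still has to be edited by hand between the dump and the restore.

```python
import shlex


def heartbeat_command(rrd_file, ds_name, heartbeat):
    """argv for changing one data source's heartbeat in place."""
    return ["rrdtool", "tune", rrd_file, "--heartbeat", f"{ds_name}:{heartbeat}"]


def step_change_commands(rrd_file, xml_file, new_rrd_file):
    """argv lists for the dump / restore round trip around a manual XML edit."""
    return [
        ["rrdtool", "dump", rrd_file, xml_file],
        ["rrdtool", "restore", xml_file, new_rrd_file],
    ]


if __name__ == "__main__":
    # Placeholder names; substitute the real file and data-source names.
    print(shlex.join(heartbeat_command("myfile.rrd", "la", 900)))
    for cmd in step_change_commands("myfile.rrd", "myfile.xml", "myfile-new.rrd"):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the names are right. Work on a copy of the RRD file first.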


Regards,
Henrik