Another weirdness - RRD info has gaps
list Betsy Schwartz
These two graphs were generated from the same Perl script, sending the same data to two different tests, RTM and rtmstats:
http://www.flickr.com/photos/betsys99/7705982914/in/photostream
You can see that the first test has chunks missing at regular intervals.
I did this originally back when I was having trouble with RRD choking on the extra information and wanted to separate the stats, but I've kept it because the one graph is so spotty.
On the client side, my Perl script for the test is logging that it is sending the data correctly, and I believe it's always displaying correctly on the test page.
On the Xymon server, I see the rrd file for RTM is missing some entries that are populated in the rtmstats file, for example:
RTM:
<!-- 2012-08-01 17:05:00 EDT / 1343855100 --> <row><v>3.5954352000e+05</v><v>3.5456424000e+05</v><v>1.0610000000e+03</v><v>3.9142800000e+03</v><v>4.9752800000e+03</v><v>1.3876000000e+00</v></row>
<!-- 2012-08-01 17:10:00 EDT / 1343855400 --> <row><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v><v>NaN</v></row>
rtmstats:
<!-- 2012-08-01 17:05:00 EDT / 1343855100 --> <row><v>3.5954352000e+05</v><v>3.5456424000e+05</v><v>1.0610000000e+03</v><v>3.9142800000e+03</v><v>4.9752800000e+03</v><v>1.3876000000e+00</v></row>
<!-- 2012-08-01 17:10:00 EDT / 1343855400 --> <row><v>3.8017446000e+05</v><v>3.7497760000e+05</v><v>1.0860800000e+03</v><v>4.1067800000e+03</v><v>5.1928600000e+03</v><v>1.3678000000e+00</v></row>
What could be causing this discrepancy?
Looking at that interval on the client side log:
/home/xymon/client/bin/bb 10.100.5.42 'status+12h
myhost.example.com.RTM green Wed Aug 1 17:04:15 2012
Total : 357039
Success : 352086
Temp_Errors : 1058
Other_Errors : 3891
Total_Errors : 4949
Percent_Failure : 1.39%
<SNIP lots more stuff>
'/home/xymon/client/bin/bb 10.100.5.42 'status+12h
myhost.example.com.rtmstats green Wed Aug 1 17:04:15 2012
Total : 357039
Success : 352086
TempErrors : 1058
OtherErrors : 3891
TotalErrors : 4949
PercentFailure : 1.39%
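The NaN rows in the dump excerpts above can be picked out mechanically. What follows is only a minimal sketch: the function name nan_rows is hypothetical, and it assumes the XML comes from "rrdtool dump" with the timestamp comment and the <row> on the same line, as in the excerpts.

```shell
#!/bin/sh
# nan_rows (hypothetical helper): read "rrdtool dump" XML on stdin and
# print the date, time and epoch of every row that contains a NaN value.
# Assumes each row looks like:
#   <!-- 2012-08-01 17:10:00 EDT / 1343855400 --> <row><v>NaN</v>...</row>
# so that field 2 is the date, field 3 the time and field 6 the epoch.
nan_rows() {
    awk '/<row>/ && /<v>NaN<\/v>/ { print $2, $3, $6 }'
}

# Possible usage (the file names are assumptions, not taken from the thread):
#   rrdtool dump RTM.rrd      | nan_rows > rtm.gaps
#   rrdtool dump rtmstats.rrd | nan_rows > rtmstats.gaps
#   diff rtm.gaps rtmstats.gaps
```

A dump contains every archive (RRA) in the file, so the same gap can show up more than once at different resolutions.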
list Betsy Schwartz
Accidentally snipped off the tail interval there, but it's still matching:
/home/xymon/client/bin/bb 10.100.5.42 'status+12h
myhost.example.com.RTM green Wed Aug 1 17:09:20 2012
Total : 377910
Success : 372738
Temp_Errors : 1083
Other_Errors : 4085
Total_Errors : 5168
Percent_Failure : 1.37%
<SNIP otherstuff>
'/home/xymon/client/bin/bb 10.100.5.42 'status+12h
myhost.example.com.rtmstats green Wed Aug 1 17:09:20 2012
Total : 377910
Success : 372738
TempErrors : 1083
OtherErrors : 4085
TotalErrors : 5168
PercentFailure : 1.37%
list Tim McCloskey
I've seen this before but I don't remember the exact cause; I just remember that working on rrd stuff tended to add gray hair. Do both of these graphs have essentially the same rrd configs? If the foo.rrd files do not contain the detail for the tests, I would truncate foo.rrd and recheck my [foo] graph stanza. (Or rather, save a copy of foo.rrd first.)
list Betsy Schwartz
It's got *some* data but it's got gaps. I dropped the test, erased the
RRD file and put it back, and I am *still* seeing gaps.
I swear my two tests are getting the exact same info:
push( @rtmstats,
    " \n Total : $totalcnt \n Success : $success \n TempErrors : $temperror\n OtherErrors : $othererror \n TotalErrors : $totalerror \n PercentFailure : $failure% \n"
);
push( @bbdata, @rtmstats );
push( @bbdata, <SNIP lots of other stuff> );
my $bbcmd = "$XYMON $XYMSRV 'status+12h $MACHINE.$TESTNAME $color $date @bbdata \n'";
system("$bbcmd");
print $bbcmd;
$bbcmd = "$XYMON $XYMSRV 'status+12h $MACHINE.rtmstats green $date @rtmstats \n'";
system("$bbcmd");
print $bbcmd;
list Tim McCloskey
comments at the bottom
1. Do both of these graphs have essentially the same rrd configs?
2. This would fail to provide any data if $TESTNAME did not expand, so I don't think that's it. Still, these two statements are glued together in different ways.
my $bbcmd = "$XYMON $XYMSRV 'status+12h $MACHINE.$TESTNAME $color $date @bbdata \n'";
$bbcmd = "$XYMON $XYMSRV 'status+12h $MACHINE.rtmstats green $date @rtmstats \n'";
list Betsy Schwartz
1. Do both of these graphs have essentially the same rrd configs?
The graphs have identical rrd configs, cut'n'paste with the name change
2. This would fail to provide any data if $TESTNAME did not expand, so I don't think that's it. Still, these two statements are glued together in different ways.
I'm getting *most* of the data in RRD, and I have not caught any gaps on the Xymon test page. Thanks, Betsy
list Henrik Størner
What Xymon version are you running on the Xymon server? Anything before 4.3.4 has a known bug that can cause this.
Are there any errors logged in your rrd-status.log / rrd-data.log files?
You're sending the messages with a lifetime of 12 hours. How often are you sending these messages? If not once every 5 minutes, then the RRD file must have a non-standard "step" and "heartbeat" value. You can check this with "rrdtool info myfile.rrd"; here's one from my server:
filename = "/var/lib/xymon/rrd/jorn.hswn.dk/la.rrd"
rrd_version = "0003"
step = 300
<snip>
ds[la].minimal_heartbeat = 600
"step" is how often you're sending updates. "minimal_heartbeat" determines how much time may pass between updates before the data is considered invalid. In this case, if more than 600 seconds pass between two updates, the data is not considered valid and will be ignored *unless* another update arrives within the next 600 seconds.
The "heartbeat" can be changed with rrdtune ("rrdtool tune"). "step" cannot: you'll have to export the rrd-file to XML, edit the XML file and then restore the rrd-file from the XML.
Regards,
Henrik
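To make the heartbeat rule concrete, here is a rough sketch that reads a list of update times (epoch seconds, one per line) and reports any interval longer than a given heartbeat. The function name flag_gaps is hypothetical, and the 600-second value is taken from the la.rrd example above; the thread does not show what the RTM file actually uses.

```shell
#!/bin/sh
# flag_gaps HEARTBEAT (hypothetical helper): read epoch timestamps of
# successive updates on stdin, one per line, and report every interval
# between two consecutive updates that is longer than HEARTBEAT seconds.
flag_gaps() {
    awk -v hb="$1" '
        NR > 1 {
            gap = $1 - prev
            if (gap > hb)
                printf "gap of %d s before %d exceeds heartbeat %d\n", gap, $1, hb
        }
        { prev = $1 }
    '
}

# The two updates logged in the thread (17:04:15 and 17:09:20 EDT) are
# 1343855055 and 1343855360, i.e. 305 seconds apart:
#   printf '%s\n' 1343855055 1343855360 | flag_gaps 600
# reports nothing, because 305 is below the assumed heartbeat of 600.
#
# If the heartbeat does turn out to be too short, it can be raised with
# rrdtool tune (FILE, DSNAME and SECONDS are placeholders):
#   rrdtool tune FILE.rrd --heartbeat DSNAME:SECONDS
```

The timestamps would have to come from the client-side log of the script; under these assumptions, a 305-second interval alone would not account for a NaN row.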