Network tests stopped graphing
list Geoff Steer
For no reason that I can see, my network tests are no longer being graphed. I run tests for ldap, smtp and ssh. All tests on all hosts are working, with no alerts being generated. Until about 2 days ago, there was a single graph available for each host that showed the response times for these three tests and the ping test. Now all the graphs show is the value for ping. This is happening for most hosts, but not all. The rrd's have timestamps indicating that they are being updated, but an 'rrdtool dump' of one of the tcp rrd's shows that the values for all timestamps since the problem started are 0. No changes to the configuration have been made and, as I mentioned, the actual tests are working. Running a hobbit snapshot from around Sep 5.
--
Geoff Steer <user-63da8dfb9093@xymon.invalid>
-------------------------------Safe Stamp-----------------------------------
The sender's Anti-virus Service scanned this email. It is safe from known viruses.
list Henrik Størner
On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
For no reason that I can see, my network tests are no longer being graphed. I run tests for ldap, smtp and ssh. All tests on all hosts are working with no alerts being generated. Until about 2 days ago, there was a single graph available for each host that showed the response times for these three tests and the ping test. Now all the graphs show is the value for ping.
Any messages in /var/log/hobbit/rrd-status.log?

Could you show me the output from "ls -l ~hobbit/data/rrd/HOSTNAME"?

Are the graphs missing from both the individual status view (e.g. the "smtp" detailed status should have a graph at the bottom), and from the combined view on the "trends" page? Or just one of them?

Henrik
list Geoff Steer
On Wed, 2005-09-28 at 07:28 +0200, Henrik Stoerner wrote:
On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
For no reason that I can see, my network tests are no longer being graphed. I run tests for ldap, smtp and ssh. All tests on all hosts are working with no alerts being generated. Until about 2 days ago, there was a single graph available for each host that showed the response times for these three tests and the ping test. Now all the graphs show is the value for ping.
Any messages in /var/log/hobbit/rrd-status.log? Could you show me the output from "ls -l ~hobbit/data/rrd/HOSTNAME"? Are the graphs missing from both the individual status view (e.g. the "smtp" detailed status should have a graph at the bottom), and from the combined view on the "trends" page? Or just one of them?
tail of /var/log/hobbit/rrd-status.log:

2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/vwall.test.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/admin5.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/admin3.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/vwall.test.firstwave.com.au/tcp.smtp.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-28 14:53:04 Tried to down BOARDBUSY: Invalid argument
2005-09-28 14:54:10 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:12:14 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:22:46 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:24:37 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:26:11 Tried to down BOARDBUSY: Invalid argument

NOTE: vwall.test.firstwave.com.au is not one of the hosts showing this problem. Clocks are synced to an NTP server running on the hobbit server.

An ls -l of one host (only tcp-related rrd's shown):

-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.conn.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.ldap.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp5000.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.ssh.rrd

The problem shows up in both the trends and the detailed graphs.
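For anyone wanting to see which RRD files are affected by the "illegal attempt to update" messages, here is a minimal Python sketch that counts them per file. The regular expression is inferred only from the log lines quoted in this thread, so other Hobbit versions may word the message differently; the function names `summarize_rrd_errors` and `epoch_to_utc` are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Pattern inferred from the rrd-status.log lines quoted above (an assumption,
# not taken from the Hobbit sources).
RRD_ERROR = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) RRD error updating "
    r"(?P<path>\S+) from (?P<sender>[\d.]+): illegal attempt to update using "
    r"time (?P<new>\d+) when last update time is (?P<last>\d+)"
)


def summarize_rrd_errors(lines):
    """Count 'illegal attempt to update' errors per RRD file path."""
    counts = Counter()
    for line in lines:
        match = RRD_ERROR.match(line)
        if match:
            counts[match.group("path")] += 1
    return counts


def epoch_to_utc(epoch):
    """Convert the epoch seconds from the log message to a UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


# Two lines copied from the log excerpt above, plus one unrelated line.
sample = [
    "2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/"
    "admin5.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt "
    "to update using time 1127696987 when last update time is 1127696987 "
    "(minimum one second step)",
    "2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/"
    "admin3.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt "
    "to update using time 1127696987 when last update time is 1127696987 "
    "(minimum one second step)",
    "2005-09-28 14:53:04 Tried to down BOARDBUSY: Invalid argument",
]

counts = summarize_rrd_errors(sample)
for path, n in sorted(counts.items()):
    print(n, path)
print(epoch_to_utc(1127696987).strftime("%Y-%m-%d %H:%M:%S"), "UTC")
```

On a live server the same function could be fed the open log file instead of `sample`. The epoch value 1127696987 converts to 01:09:47 UTC, which is 11:09:47 at +1000 and so agrees with the log timestamp.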
--
Geoff Steer <user-63da8dfb9093@xymon.invalid>
list Henrik Størner
In <user-2ff93526e0e7@xymon.invalid> Geoff Steer <user-63da8dfb9093@xymon.invalid> writes:
On Wed, 2005-09-28 at 07:28 +0200, Henrik Stoerner wrote:
On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
For no reason that I can see, my network tests are no longer being graphed. I run tests for ldap, smtp and ssh. All tests on all hosts are working with no alerts being generated. Until about 2 days ago, there was a single graph available for each host that showed the response times for these three tests and the ping test. Now all the graphs show is the value for ping.

tail of /var/log/hobbit/rrd-status.log:

Nothing unusual in there.

An ls -l of one host (only tcp related rrd's shown).
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.conn.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.ldap.rrd
-rw-r--r-- 1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp5000.rrd

And the files are being updated.

Could you send me (directly, not to the list) the output from

bb 127.0.0.1 "hobbitdboard host=HOSTNAME fields=msg"

as well as the RRD files for this host?

When viewing the detailed status for e.g. ldap or smtp, do you get a graph image that is empty, or no image at all?

Henrik
list Henrik Størner
On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
For no reason that I can see, my network tests are no longer being graphed. I run tests for ldap, smtp and ssh. All tests on all hosts are working with no alerts being generated. Until about 2 days ago, there was a single graph available for each host that showed the response times for these three tests and the ping test. Now all the graphs show is the value for ping.
Geoff and I looked into this and he let me look at some of his data.

Apparently, his servers are responding faster than Hobbit can measure. Hobbit logs everything < 10 ms as "0.00 seconds", resulting in a flat line on the TCP response-time graphs.

So there is no bug, just some very speedy servers.

Regards,
Henrik
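To illustrate why a fast server ends up as a flat line, here is a small Python sketch of reporting a response time with only two decimals. It assumes the value is truncated to 10 ms resolution, which matches the description above; it is not Hobbit's actual code, and `format_response_time` is a made-up name.

```python
def format_response_time(seconds):
    """Report a response time with 10 ms resolution.

    Assumption: the value is truncated (not rounded) to two decimals,
    so anything below 0.01 s is reported as 0.00.
    """
    centiseconds = int(seconds * 100)  # drop everything finer than 10 ms
    return f"{centiseconds / 100:.2f} seconds"


# Measured times (in seconds) for a fast LAN host and a slower remote one.
for measured in (0.004, 0.0099, 0.25):
    print(measured, "->", format_response_time(measured))
```

Any response quicker than 10 ms is stored as 0, so the RRD holds zeros even though the test itself succeeded.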
list Adam Scheblein
Henrik,

I have noticed that I am starting to have similar problems, however they are with the memory graphs. When I try to run

bb 127.0.0.1 "hobbitdboard host=HOSTNAME fields=msg"

(replacing HOSTNAME with the hostname) I get the following error:

bb: Can't open 127.0.0.1

Any ideas??

Thanks,
Adam
list Larry Barber
Try using your actual IP address, rather than 127.0.0.1. Thanks, Larry Barber
list Adam Scheblein
I still get the message bb: Can't open [my ip address]
From: Larry Barber [mailto:user-6ef9c2864140@xymon.invalid]
Sent: Thursday, January 05, 2006 4:03 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Network tests stopped graphing
Try using your actual IP address, rather than 127.0.0.1.
Thanks,
Larry Barber
list Henrik Størner
On Thu, Jan 05, 2006 at 03:31:29PM -0600, Scheblein, Adam wrote:
Henrik, I have noticed that I am starting to have similar problems, however they are with the memory graphs and when I try to do bb 127.0.0.1 "hobbitdboard host=HOSTNAME fields=msg" (replacing HOSTNAME with the hostname) I get the following error: bb: Can't open 127.0.0.1
I don't know which "bb" command you're using, but that error message does not look like anything that I wrote in the Hobbit "bb" utility. In fact, I checked and

henrik@osiris:~/hobbit$ grep -i "can.t open" */*.c

finds no matches in my Hobbit sources. A connect error should give you either "connect to bbd failed - <OS errortext>" or "Could not connect to bbd@<some more text>".

I think those two observations are unrelated. Are your memory RRD files being updated? If not, are there any errors in the rrd-status.log file?

If you run (as the hobbit user)

bbcmd hobbitd_channel --channel=status grep -A 10 "^@@.*|memory|"

do any memory status updates appear for these hosts, and what do they look like?

Regards,
Henrik
list Adam Scheblein
I was using an old bb command from my previous bb installation. I found the correct one, and when I use it all it gives me is about 7 lines of blank space. The rrd files seem to be updating, but I am getting an error in my rrd-status.log file:

2006-01-06 07:24:20 RRD error updating /usr/local/hobbit/data/rrd/HOSTNAME/tcp.ftp.rrd from 134.48.22.240: illegal attempt to update using time 1136553860 when last update time is 1136553860 (minimum one second step)

And this keeps repeating in the past.

The messages that are coming through using the bbcmd listener look like:

@@status#171614|1136561819.575708|134.48.20.42||HOSTNAME|memory|1136563619|green||green|1136389790|0||0|
status HOSTNAME.memory green Fri Jan 6 08:26:06 CST 2006 - Memory OK
Memory Used Total Percentage
&green Physical 980M 1024M 95%
&green Swap 85M 1664M 5%

Thanks,
Adam
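As a rough aid for reading that channel output, here is a Python sketch that splits the "@@status" header on "|" and pulls the numbers out of the memory table. The field names are guesses based on the sample above, not taken from the Hobbit documentation, and the expiry field is written as 1136563619 on the assumption that the break in the quoted sample is a mail line wrap. `parse_status_header` and `parse_memory_rows` are hypothetical helper names.

```python
import re

# Row layout inferred from the sample: "&green Physical 980M 1024M 95%"
MEMORY_ROW = re.compile(
    r"^&(?P<color>\w+)\s+(?P<kind>\w+)\s+(?P<used>\d+)M\s+"
    r"(?P<total>\d+)M\s+(?P<pct>\d+)%"
)


def parse_status_header(header):
    """Split a '@@status#...' line into named fields (names are assumed)."""
    fields = header.split("|")
    channel, _, sequence = fields[0].lstrip("@").partition("#")
    return {
        "channel": channel,
        "sequence": int(sequence),
        "timestamp": float(fields[1]),
        "sender": fields[2],
        "hostname": fields[4],
        "testname": fields[5],
        "expires": int(fields[6]),
        "color": fields[7],
    }


def parse_memory_rows(lines):
    """Return {kind: (used_mb, total_mb, percent)} for each memory row."""
    rows = {}
    for line in lines:
        match = MEMORY_ROW.match(line)
        if match:
            rows[match.group("kind")] = (
                int(match.group("used")),
                int(match.group("total")),
                int(match.group("pct")),
            )
    return rows


header = ("@@status#171614|1136561819.575708|134.48.20.42||HOSTNAME|memory|"
          "1136563619|green||green|1136389790|0||0|")
body = [
    "status HOSTNAME.memory green Fri Jan 6 08:26:06 CST 2006 - Memory OK",
    "Memory Used Total Percentage",
    "&green Physical 980M 1024M 95%",
    "&green Swap 85M 1664M 5%",
]

info = parse_status_header(header)
memory = parse_memory_rows(body)
print(info["hostname"], info["testname"], info["color"])
print("valid for", info["expires"] - int(info["timestamp"]), "seconds")
print(memory)
```

In this sample the message carries well-formed memory figures, so a status update with parsable data is at least reaching the status channel.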
-----Original Message-----
From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid]
Sent: Thursday, January 05, 2006 4:31 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Network tests stopped graphing
list Adam Scheblein
Also, I was wondering if this may have any bearing on the situation:

16677 hobbit 25 0 2896 1328 604 R 92.6 0.3 1:21.78 hobbitd_client

As you can see, my hobbitd_client frequently goes up to 98 percent CPU and normally does not go any lower than 60 percent. Could this be causing some type of bottleneck that is causing problems with the graphing?

Thanks,
Adam
-----Original Message-----
From: Scheblein, Adam [mailto:user-de8d51f0c651@xymon.invalid]
Sent: Friday, January 06, 2006 9:47 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: RE: [hobbit] Network tests stopped graphing