Xymon Mailing List Archive

Network tests stopped graphing

11 messages in this thread

list Geoff Steer · Wed, 28 Sep 2005 11:28:32 +1000 ·
For no reason that I can see, my network tests are no longer being
graphed. 
I run tests for ldap, smtp and ssh. All tests on all hosts are working
with no alerts being generated. Until about 2 days ago, there was a
single graph available for each host that showed the response times for
these three tests and the ping test. Now all the graphs show is the
value for ping.

This is only happening for some hosts (most). The rrd's have timestamps
that would indicate they are being updated, an 'rrdtool dump' of one of
the tcp rrd's shows that the values for all timestamps since the problem
started are 0.
No changes to the configuration have been made and as I mentioned, the
actual tests are working.
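
One way to double-check the "all values are 0" observation is to scan the dump text for non-zero rows. The sketch below is only an illustration: the sample rows imitate the general shape of `rrdtool dump` output with invented values, and the name `nonzero_rows` is made up for this example.

```python
import re

# Invented sample imitating rows from `rrdtool dump tcp.ssh.rrd`.
SAMPLE_DUMP = """
<!-- 2005-09-25 10:00:00 EST / 1127606400 --> <row><v> 2.3000000000e-02 </v></row>
<!-- 2005-09-26 10:00:00 EST / 1127692800 --> <row><v> 0.0000000000e+00 </v></row>
<!-- 2005-09-27 10:00:00 EST / 1127779200 --> <row><v> 0.0000000000e+00 </v></row>
<!-- 2005-09-28 10:00:00 EST / 1127865600 --> <row><v> NaN </v></row>
"""

# Capture the epoch from the row comment and the single value in <v>...</v>.
ROW_RE = re.compile(r"/ (\d+) -->\s*<row><v>\s*(\S+)\s*</v>")


def nonzero_rows(dump_text, since_epoch):
    """Return (epoch, value) pairs at or after since_epoch with a non-zero, non-NaN value."""
    hits = []
    for epoch, raw in ROW_RE.findall(dump_text):
        value = float(raw)  # float() also accepts "NaN"
        is_number = value == value  # NaN is the only value not equal to itself
        if int(epoch) >= since_epoch and is_number and value != 0.0:
            hits.append((int(epoch), value))
    return hits


if __name__ == "__main__":
    # An empty list means every stored value since the cutoff is 0 or unknown.
    print(nonzero_rows(SAMPLE_DUMP, 1127692800))
```

An empty result for the period since the problem started, together with non-empty results for earlier periods, would match the symptom described above.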

Running hobbit snapshot from around Sep 5.
 
-- 
Geoff Steer <user-63da8dfb9093@xymon.invalid>


list Henrik Størner · Wed, 28 Sep 2005 07:28:38 +0200 ·
On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
> For no reason that I can see, my network tests are no longer being
> graphed.
> I run tests for ldap, smtp and ssh. All tests on all hosts are working
> with no alerts being generated. Until about 2 days ago, there was a
> single graph available for each host that showed the response times for
> these three tests and the ping test. Now all the graphs show is the
> value for ping.

Any messages in /var/log/hobbit/rrd-status.log ?

Could you show me the output from "ls -l ~hobbit/data/rrd/HOSTNAME" ?

Are the graphs missing from both the individual status view (e.g. the
"smtp" detailed status should have a graph at the bottom), and from
the combined view on the "trends" page ? Or just one of them ?


Henrik
list Geoff Steer · Wed, 28 Sep 2005 15:57:59 +1000 ·
On Wed, 2005-09-28 at 07:28 +0200, Henrik Stoerner wrote:
> On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
> > For no reason that I can see, my network tests are no longer being
> > graphed.
> > I run tests for ldap, smtp and ssh. All tests on all hosts are working
> > with no alerts being generated. Until about 2 days ago, there was a
> > single graph available for each host that showed the response times for
> > these three tests and the ping test. Now all the graphs show is the
> > value for ping.
> Any messages in /var/log/hobbit/rrd-status.log ?
>
> Could you show me the output from "ls -l ~hobbit/data/rrd/HOSTNAME" ?
>
> Are the graphs missing from both the individual status view (e.g. the
> "smtp" detailed status should have a graph at the bottom), and from
> the combined view on the "trends" page ? Or just one of them ?

tail of  /var/log/hobbit/rrd-status.log:

2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/vwall.test.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/admin5.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/admin3.firstwave.com.au/tcp.ssh.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/vwall.test.firstwave.com.au/tcp.smtp.rrd from 202.12.141.141: illegal attempt to update using time 1127696987 when last update time is 1127696987 (minimum one second step)
2005-09-28 14:53:04 Tried to down BOARDBUSY: Invalid argument
2005-09-28 14:54:10 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:12:14 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:22:46 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:24:37 Tried to down BOARDBUSY: Invalid argument
2005-09-28 15:26:11 Tried to down BOARDBUSY: Invalid argument

NOTE:  vwall.test.firstwave.com.au is not one of the hosts showing this
problem. Clocks are synced to a ntp server running on the hobbit server.
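
The "illegal attempt to update" entries mean an update was rejected because its timestamp was not later than the last one already stored in that RRD file. A rough sketch for tallying such rejections per file follows; the function names `duplicate_updates` and `as_utc` are made up for this example, and the sample lines are copied from the log above.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Entries copied from the rrd-status.log excerpt (each joined onto one line).
LOG_LINES = [
    "2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/admin5.firstwave.com.au/tcp.ssh.rrd "
    "from 202.12.141.141: illegal attempt to update using time 1127696987 "
    "when last update time is 1127696987 (minimum one second step)",
    "2005-09-26 11:09:47 RRD error updating /usr/local/hobbit/data/rrd/admin3.firstwave.com.au/tcp.ssh.rrd "
    "from 202.12.141.141: illegal attempt to update using time 1127696987 "
    "when last update time is 1127696987 (minimum one second step)",
    "2005-09-28 14:53:04 Tried to down BOARDBUSY: Invalid argument",
]

ERR_RE = re.compile(
    r"RRD error updating (\S+) from (\S+): illegal attempt to update "
    r"using time (\d+) when last update time is (\d+)"
)


def duplicate_updates(lines):
    """Count, per RRD file, updates rejected because the timestamp did not advance."""
    counts = Counter()
    for line in lines:
        match = ERR_RE.search(line)
        if match and int(match.group(3)) <= int(match.group(4)):
            counts[match.group(1)] += 1
    return counts


def as_utc(epoch):
    """Render an epoch value from the log as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    for path, count in duplicate_updates(LOG_LINES).items():
        print(count, path)
    print(as_utc(1127696987))
```

Converting the epoch value shows that 1127696987 is 01:09:47 UTC on 26 Sep 2005, which is 11:09:47 at +1000 and so agrees with the log's own timestamp.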

An ls -l of one host (only the tcp-related rrd's shown):

-rw-r--r--  1 hobbit hobbit 19548 Sep 28 15:51 tcp.conn.rrd
-rw-r--r--  1 hobbit hobbit 19548 Sep 28 15:51 tcp.ldap.rrd
-rw-r--r--  1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp5000.rrd
-rw-r--r--  1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp.rrd
-rw-r--r--  1 hobbit hobbit 19548 Sep 28 15:51 tcp.ssh.rrd

The problem shows up in both the trends and the detailed graphs.


-- 
Geoff Steer <user-63da8dfb9093@xymon.invalid>


list Henrik Størner · Wed, 28 Sep 2005 09:02:05 +0000 (UTC) ·
In <user-2ff93526e0e7@xymon.invalid> Geoff Steer <user-63da8dfb9093@xymon.invalid> writes:
> On Wed, 2005-09-28 at 07:28 +0200, Henrik Stoerner wrote:
> > On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
> > > For no reason that I can see, my network tests are no longer being
> > > graphed.
> > > I run tests for ldap, smtp and ssh. All tests on all hosts are working
> > > with no alerts being generated. Until about 2 days ago, there was a
> > > single graph available for each host that showed the response times for
> > > these three tests and the ping test. Now all the graphs show is the
> > > value for ping.
> tail of  /var/log/hobbit/rrd-status.log:

Nothing unusual in there.

> An ls -l of one host (only tcp related rrd's shown.
> -rw-r--r--  1 hobbit hobbit 19548 Sep 28 15:51 tcp.conn.rrd
> -rw-r--r--  1 hobbit hobbit 19548 Sep 28 15:51 tcp.ldap.rrd
> -rw-r--r--  1 hobbit hobbit 19548 Sep 28 15:51 tcp.smtp5000.rrd

And the files are being updated.


Could you send me (directly, not to the list) the output from
  bb 127.0.0.1 "hobbitdboard host=HOSTNAME fields=msg"
as well as the RRD files for this host ?


When viewing the detailed status for e.g. ldap or smtp, do you 
get a graph image that is empty, or no image at all ?


Henrik
list Henrik Størner · Wed, 28 Sep 2005 17:01:56 +0200 ·
On Wed, Sep 28, 2005 at 11:28:32AM +1000, Geoff Steer wrote:
> For no reason that I can see, my network tests are no longer being
> graphed.
> I run tests for ldap, smtp and ssh. All tests on all hosts are working
> with no alerts being generated. Until about 2 days ago, there was a
> single graph available for each host that showed the response times for
> these three tests and the ping test. Now all the graphs show is the
> value for ping.

Geoff and I looked into this and he let me look at some of his data.

Apparently, his servers are responding faster than Hobbit can measure.
Hobbit logs everything < 10 ms as "0.00 seconds", resulting in a flat
line on the TCP response-time graphs.

So there is no bug, just some very speedy servers.


Regards,
Henrik
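
To illustrate the effect described above: if a response time is reported in seconds with only two decimals, anything under 10 ms comes out as 0.00. The sketch below is not Hobbit's code; the function name `logged_seconds` is hypothetical, and truncating (rather than rounding) to hundredths is an assumption chosen to match the "everything < 10 ms" description.

```python
def logged_seconds(elapsed_us):
    """Format elapsed microseconds as seconds with two decimals, truncating.

    Truncation to hundredths of a second is an assumption for this sketch.
    """
    centis = elapsed_us // 10_000  # whole hundredths of a second
    return f"{centis // 100}.{centis % 100:02d}"


if __name__ == "__main__":
    for sample_us in (4_200, 9_999, 10_000, 234_567):
        print(sample_us, "us ->", logged_seconds(sample_us), "seconds")
```

With values like these, a host that always answers in a few milliseconds would store nothing but zeros, which graphs as a flat line along the bottom axis.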
list Adam Scheblein · Thu, 5 Jan 2006 15:31:29 -0600 ·
Henrik,

I have noticed that I am starting to have similar problems, however they
are with the memory graphs and when I try to do bb 127.0.0.1
"hobbitdboard host=HOSTNAME fields=msg"  (replacing HOSTNAME with the
hostname) I get the following error:

bb: Can't open 127.0.0.1

Any ideas??

Thanks,
Adam
list Larry Barber · Thu, 5 Jan 2006 16:03:20 -0600 ·
Try using your actual IP address, rather than 127.0.0.1.

Thanks,
Larry Barber
list Adam Scheblein · Thu, 5 Jan 2006 16:13:26 -0600 ·
I still get the message bb: Can't open [my ip address]
list Henrik Størner · Thu, 5 Jan 2006 23:30:57 +0100 ·
On Thu, Jan 05, 2006 at 03:31:29PM -0600, Scheblein, Adam wrote:
> Henrik,
>
> I have noticed that I am starting to have similar problems, however they
> are with the memory graphs and when I try to do bb 127.0.0.1
> "hobbitdboard host=HOSTNAME fields=msg"  (replacing HOSTNAME with the
> hostname) I get the following error:
>
> bb: Can't open 127.0.0.1

I don't know which "bb" command you're using, but that error message 
does not look like anything that I wrote in the Hobbit "bb" utility.
In fact, I checked and
   henrik@osiris:~/hobbit$ grep -i "can.t open" */*.c
finds no matches in my Hobbit sources.

A connect-error should give you either 
   "connect to bbd failed - <OS errortext>" 
or
   "Could not connect to bbd@<some more text>"

I think those two observations are unrelated. Are your memory RRD files
being updated ? If not, are there any errors in the rrd-status.log file?
If you run (as the hobbit user) 
   bbcmd hobbitd_channel --channel=status grep -A 10 "^@@.*|memory|"
do any memory status updates appear for these hosts, and what do they
look like ?


Regards,
Henrik
list Adam Scheblein · Fri, 6 Jan 2006 09:47:28 -0600 ·
I was using an old bb command from my previous bb installation. I found
the correct one, and when I use it, all it gives me is about 7 lines of
blank space. The rrd files seem to be updating, but I am getting an
error in my rrd-status.log file:

2006-01-06 07:24:20 RRD error updating
/usr/local/hobbit/data/rrd/HOSTNAME/tcp.ftp.rrd from 134.48.22.240:
illegal attempt to update using time 1136553860 when last update time is
1136553860 (minimum one second step)

And this keeps repeating in earlier log entries.

The messages that are coming through using the bbcmd listener look like:
@@status#171614|1136561819.575708|134.48.20.42||HOSTNAME|memory|1136563619|green||green|1136389790|0||0|
status HOSTNAME.memory green Fri Jan  6 08:26:06 CST 2006 - Memory OK
   Memory              Used       Total  Percentage
&green Physical            980M       1024M         95%
&green Swap                 85M       1664M          5%
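
The status body above does contain the figures a memory graph would need. As a rough illustration only (this is not how Hobbit's RRD module parses it; `memory_rows` is a name made up here), the rows can be pulled apart like this:

```python
import re

# Status body copied from the message above.
STATUS_BODY = """status HOSTNAME.memory green Fri Jan  6 08:26:06 CST 2006 - Memory OK
   Memory              Used       Total  Percentage
&green Physical            980M       1024M         95%
&green Swap                 85M       1664M          5%
"""

# One row: "&<color> <name> <used>M <total>M <percent>%"
ROW_RE = re.compile(r"^&(\w+)\s+(\w+)\s+(\d+)M\s+(\d+)M\s+(\d+)%$", re.MULTILINE)


def memory_rows(body):
    """Return a dict keyed by row name with the values reported in a memory status body."""
    rows = {}
    for color, name, used, total, pct in ROW_RE.findall(body):
        rows[name] = {
            "color": color,
            "used_mb": int(used),
            "total_mb": int(total),
            "reported_pct": int(pct),
            # Integer percentage recomputed from used/total, for comparison.
            "computed_pct": 100 * int(used) // int(total),
        }
    return rows


if __name__ == "__main__":
    for name, row in memory_rows(STATUS_BODY).items():
        print(name, row)
```

For this sample the recomputed percentages agree with the reported ones, so the status message itself looks well-formed.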

Thanks,
Adam
list Adam Scheblein · Fri, 6 Jan 2006 16:07:13 -0600 ·
Also, I was wondering if this may have any bearing on the situation:

16677 hobbit    25   0  2896 1328  604 R 92.6  0.3   1:21.78 hobbitd_client

As you can see, my hobbitd_client frequently goes up to 98 percent CPU and
normally does not go any lower than 60 percent.

Could this be causing some type of bottleneck that is causing problems
with the graphing?

Thanks,
Adam