Xymon Mailing List Archive

Users / Procs Graphing problem

5 messages in this thread

list Mike Rowell · Wed, 1 Feb 2006 12:27:48 -0000 ·
All,
 
Long-term Big Brother user, new Hobbit convert.

I'm building some monitoring solutions at a place I'm working, and while
doing so I have noticed an issue with one of the RRD graphs that I can't
figure out. Every day at a specific time (different for each system)
there is a 10-minute gap in the graph. I've done an rrd dump on the data
and the gap appears as a NaN entry.

This is only affecting the users / procs graph. Anyone got any ideas?
If it were affecting all the systems at the same time I might have an
idea, but it affects each system at a different time.
 
Regards,
 
Mike Rowell


For more information about the Viatel Group, please visit www.viatel.com

THIS MESSAGE IS INTENDED ONLY FOR THE USE OF THE INTENDED RECIPIENT TO WHICH IT IS ADDRESSED AND MAY CONTAIN INFORMATION THAT IS PRIVILEGED, CONFIDENTIAL AND EXEMPT FROM DISCLOSURE.  If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering the message to the intended recipient, you are notified that any dissemination, distribution or copying of this e-mail is prohibited, and you should delete this e-mail from your system.

This message has been scanned for viruses and spam by Viatel MailControl - www.viatel.com
list Henrik Størner · Wed, 1 Feb 2006 14:02:08 +0100 ·
On Wed, Feb 01, 2006 at 12:27:48PM -0000, Rowell, Mike wrote:
> I'm building some monitoring solutions at a place I'm working and during
> this have noticed an issue with one of the rrd graphs that I can't
> figure out.. Every day at a specific time (different for each system)
> there is a 10-minute gap in the graph, I've done an rrd dump on the data
> and it's appearing as a NaN entry.

This means that no data was being fed into the RRD file for 10-15
minutes.

> This is only affecting the users / procs graph, anyone got any ideas?

Could it be that you are rebooting these servers once a day?
(I know, Unix folks rarely do that - but just in case.)

Is this with the Hobbit client, or the BB client reporting data?

Since it always happens at the same time for a given server, it would be
interesting to see what messages are fed into Hobbit around that time.
If you are running the Hobbit client, could you set up a cron job to
fetch the client data around that time? It should run something like

   wget "http://hobbitserver/cgi-bin/bb-hostsvc.sh?CLIENT=bad.client.name"

and store the output in a file where you can look at it later. The best
thing would be to run this every minute for 15 minutes around
the time this problem occurs.

If you are running the BB client, the interesting part is the "cpu"
column data that is sent around that time. So something similar,
except that the URL you should fetch is

http://hobbitserver/cgi-bin/bb-hostsvc.sh?HOSTSVC=bad,client,name.cpu
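
A minimal sketch of such a capture job, for illustration only. The function names, the HOBBIT_URL and OUTDIR variables, and the output directory are assumptions made for this example and are not part of Hobbit; only the URL and the query strings come from the message above.

```shell
#!/bin/sh
# Hypothetical helper: save what the Hobbit web CGI returns for one host,
# once a minute, so the files can be inspected after the gap has occurred.
# HOBBIT_URL and OUTDIR are assumed defaults; adjust them for your site.
HOBBIT_URL="${HOBBIT_URL:-http://hobbitserver/cgi-bin/bb-hostsvc.sh}"
OUTDIR="${OUTDIR:-/tmp/hobbit-capture}"

# fetch_once QUERY
# Fetch one page, store it in a timestamped file, and print the file name.
fetch_once() {
    query="$1"
    mkdir -p "$OUTDIR" || return 1
    out="$OUTDIR/capture-$(date +%Y%m%d-%H%M%S).html"
    # The URL is quoted so the shell does not treat '?' as a glob character.
    wget -q -O "$out" "${HOBBIT_URL}?${query}" || return 1
    echo "$out"
}

# capture_window QUERY [COUNT] [INTERVAL_SECONDS]
# Call fetch_once COUNT times (default 15), INTERVAL seconds apart (default 60).
capture_window() {
    query="$1"; count="${2:-15}"; interval="${3:-60}"
    i=0
    while [ "$i" -lt "$count" ]; do
        fetch_once "$query"
        i=$((i + 1))
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}
```

To use it, add a call such as `capture_window "CLIENT=bad.client.name"` (or `capture_window "HOSTSVC=bad,client,name.cpu"` for the BB client case) at the end of the script, and start the script from cron a few minutes before the time the gap normally appears.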


Regards,
Henrik
list Mike Rowell · Wed, 1 Feb 2006 13:12:32 -0000 ·
Henrik,

Thanks for this. The data is actually coming via my own modified client
(and the data is correct, before anyone asks). What I've done is take the
Nagios client, as we have extensive Nagios monitoring, add a couple of new
checks to it, and write an ext script that connects to the specified servers,
runs the new checks and then sticks the data into Hobbit.

We have multiple environments, some with firewalls, some without, etc., so
it's the best all-round solution I could come up with.

I'll pull the data at the client side and see if I can tell what's going
wrong.

Regards

Mike Rowell 
list Henrik Størner · Wed, 1 Feb 2006 14:47:14 +0100 ·
Hi Mike,

On Wed, Feb 01, 2006 at 01:12:32PM -0000, Rowell, Mike wrote:
> I'll pull the data at the client side and see if I can tell what's going
> wrong.

I was a bit silly when I suggested that you pull the data from the web.
An easier way (and it gives me some more interesting data to work with)
would be to get it directly from the hobbit daemon. So just run

   bb 127.0.0.1 "hobbitdlog bad,client,name.cpu"

and save this to a file.

The reason this is more interesting is because it provides data about
when the last status message was received and such.
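
As a rough sketch of "save this to a file", the command can be wrapped so that each reply is appended to one log with a timestamp in front of it. The function name and the LOGFILE variable are assumptions for this example; the `bb` invocation itself is the one given above, and `bb` is assumed to be on the PATH.

```shell
#!/bin/sh
# Hypothetical wrapper around the bb command shown above.
# LOGFILE is an assumed location; change it as needed.
LOGFILE="${LOGFILE:-/tmp/hobbitdlog-capture.log}"

# log_status HOSTSVC
# Ask the local hobbit daemon for the status log of HOSTSVC
# (for example bad,client,name.cpu) and append the reply to LOGFILE,
# preceded by a line recording when the query was made.
log_status() {
    hostsvc="$1"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') $hostsvc"
        bb 127.0.0.1 "hobbitdlog $hostsvc"
        echo
    } >> "$LOGFILE"
}
```

Calling `log_status "bad,client,name.cpu"` once a minute around the time of the gap gives a single file in which the replies can be compared against the query times.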


Regards,
Henrik
list Mike Rowell · Wed, 1 Feb 2006 13:49:51 -0000 ·
Something even more curious:

I've just had a closer look at the rrdtool dump, and it seems the 5-minute
data is fully intact. It is when the data rolls over to the half-hour and
longer time periods that the errors creep in.

This looks like it could be an rrd issue, so I'll try upgrading rrd first,
methinks.
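
One way to check this per archive is to count the NaN rows in each RRA of the dump. The sketch below is an illustration, not part of rrdtool or Hobbit: the function name is made up, and it assumes the usual `rrdtool dump` layout in which every `<row>` sits on its own line and each `<rra>` section carries a `<cf>` and a `<pdp_per_row>` element. With a 300-second step, pdp_per_row=1 is the 5-minute archive and pdp_per_row=6 the half-hour one.

```shell
#!/bin/sh
# count_nan_rows: read "rrdtool dump" XML on stdin and print, for each RRA,
# its consolidation function, pdp_per_row, total rows, and rows holding NaN.
count_nan_rows() {
    awk '
        # Remember the consolidation function of the current RRA.
        /<cf>/ {
            cf = $0
            sub(/.*<cf>[ \t]*/, "", cf)
            sub(/[ \t]*<\/cf>.*/, "", cf)
        }
        # Remember how many primary data points make up one row.
        /<pdp_per_row>/ {
            p = $0
            sub(/.*<pdp_per_row>[ \t]*/, "", p)
            sub(/[ \t]*<\/pdp_per_row>.*/, "", p)
        }
        # A row counts as NaN if any of its values is NaN.
        /<row>/ { rows++; if ($0 ~ /NaN/) nan++ }
        # End of one RRA: report and reset the counters.
        /<\/rra>/ {
            printf "%s pdp_per_row=%s rows=%d nan_rows=%d\n", cf, p, rows, nan
            rows = 0; nan = 0
        }
    '
}
```

It would be run as `rrdtool dump path/to/file.rrd | count_nan_rows`, where the file path is a placeholder for the affected RRD file.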

Regards,

Mike Rowell 
