Xymon Mailing List Archive

CPU load average not being graphed for some servers

5 messages in this thread

list Martin Ward · Wed, 13 Oct 2010 17:07:26 +0100 ·
Hi all,

 
I'm having trouble tracking this one down. For many of my
Xymon-monitored servers I can see the graphs on the CPU load page and
they have data in them. Yet for many others the CPU graphs are empty. I
can see the graphs but there is no data.

 
Investigation reveals that for those servers that are missing data the
.cpu file has not been updated in months, yet when I look at the client
data available it shows the load average in the uptime section, as in:

 
[uptime]

  5:07pm  2 users,  load average: 0.53, 0.71, 0.62

 
which is, I believe, where Xymon gets the LA from. 

 
The file ownership and permissions on the data/hist/*.cpu files are all
correct (hobbit:hobbit and 644 on this system). In fact, the only thing I
noticed that doesn't make sense is that the data/hostdata/ directories
are not being updated for those servers that do not have the CPU graphs,
even though I can see the host data in the Xymon web pages.

 
Confused? I know I am.

 
|\/|

--  

 
Martin Ward

Manager, Technical Services

 
DDI:+44 (0) 20 7863 5218 / Fax: +XX (X)XX XXXX XXXX / www.colt.net

Colt Technology Services, Unit XX, Powergate Business Park, Volt Avenue,
Park Royal, London, NW10 6PW, UK.

 
Help reduce your carbon footprint | Think before you print. Registered
in England and Wales, registered number 02452736, VAT number GB 645 4205
50

 
[Colt Disclaimer]
The message is intended for the named addressee only and may not be disclosed
to or used by anyone else, nor may it be copied in any way. The contents of
this message and its attachments are confidential and may also be subject to
legal privilege. If you are not the named addressee and/or have received this
message in error, please advise us by e-mailing user-51905b889b93@xymon.invalid and delete the
message and any attachments without retaining any copies. Internet
communications are not secure and Colt does not accept responsibility for this
message, its contents nor responsibility for any viruses. No contracts can be
created or varied on behalf of Colt Technology Services, its subsidiaries,
group companies or affiliates ("Colt") and any other party by email
communications unless expressly agreed in writing with such other party.
Please note that incoming emails will be automatically scanned to eliminate
potential viruses and unsolicited promotional emails. For more information
refer to www.colt.net or contact us on +44(0)20 7390 3900
list Thomas R. Brand · Wed, 13 Oct 2010 12:52:34 -0400 ·
Martin,

 
This may be related to an issue I came across last year.

 
The (cpu load and users & processes) graphs appear to be dependent on
the exact output of the 'uptime' command.

 
In my case, the graphs did show for the first 24 hours of uptime,
didn't show after 24 hours, then started showing again after 48 hours of
uptime.

I was finally able to isolate this to the output of 'uptime' for the
first full day - it showed 'day' not 'days'.

 
# uptime

 12:41pm  up 196 days 21:49,  5 users,  load average: 0.23, 0.19, 0.17

 
# uptime

 12:41pm  up 1 day 21:49,  2 users,  load average: 0.06, 0.02, 0.00

 
I 'fixed' this by updating hobbitclient-linux.sh to modify the output
of the 'uptime' command, replacing 'day ' with 'days '.

 
echo "[uptime]"

uptime | perl -pe 's/^(.*) day (.*)/$1 days $2/'

 
It appears your uptime command does not show the 'up x days HH:MM' part at all.

 
Cheers,

Tom Brand
list Martin Ward · Thu, 14 Oct 2010 15:46:28 +0100 ·
Thanks Tom, that got me sorted. It seems that Solaris relies on the
BOOT_TIME record held in the /var/adm/utmpx file. This file had been
rotated out of the way in order to save disk space, so I got no uptime
values at all. It looks like this messed with the load average data,
since the uptime output didn't have any uptime in it.

 
Like you I have hacked the hobbitclient-sunos.sh file and put in a small
perl scriptlet so that if there is no uptime it adds a fake value in
just to ensure that the load averages get stored properly.
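
For illustration only, the idea of padding a missing uptime field could be
sketched in shell with sed, as below. This is not the perl scriptlet mentioned
above. The function name add_fake_uptime and the placeholder value
"up 0 day(s),  0:00," are assumptions, and the pattern assumes the line starts
with a 12-hour time such as "11:25am". Whether the server accepts a given
placeholder would need checking.

```shell
#!/bin/sh
# Hypothetical filter (name and placeholder value are assumptions).
# If an uptime line has no " up " field, insert a placeholder right
# after the leading time of day. Lines that already contain " up "
# are passed through unchanged.
add_fake_uptime() {
    sed -e '/ up /!s/^\( *[0-9:]*[ap]m\)  */\1  up 0 day(s),  0:00,  /'
}
```

In a client script this would sit in the pipeline after the uptime command,
for example: uptime | add_fake_uptime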

 
|\/|

 
list Buchan Milne · Fri, 15 Oct 2010 12:20:47 +0100 ·
It would be useful if you could instead supply the "client data" for the host
in the cases where this breaks, so it can be fixed in hobbitd_client.

Regards,
Buchan
list Martin Ward · Fri, 15 Oct 2010 12:33:26 +0100 ·
Hi Buchan,

I did provide the output from the client data for the particular
section; I didn't think it made sense to dump the whole output packet
when only two lines are the problem. Still, here it is again with more
detailed information on the issue. Hopefully this will help someone to
code around it.

The original issue I had was that if the /var/adm/utmpx file didn't
exist or didn't contain a BOOT_TIME record then the output of the
uptime(1) command looked like this in the Xymon client data:

[uptime]
 11:25am  1 user,  load average: 1.21, 0.64, 0.46
[who]
...

For reasons unknown (because I haven't dug through the code) this
stopped Xymon from logging the load average data even though it
displayed it at the top of the "cpu" web page.

The Xymon code seems to require the output of uptime to look like this:
[uptime]
 12:29pm  up 133 day(s),  2:34,  5 users,  load average: 4.90, 4.63, 4.41
[who]
...

If it helps you any I have also seen uptime output, when the uptime is
less than one day, of:
[uptime]
 12:29pm  up 2:34,  5 users,  load average: 4.90, 4.63, 4.41
[who]
...
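
All three variants above end with the same "load average:" field, so a parser
could in principle key on that field alone and ignore whatever precedes it.
The sketch below illustrates this with sed. It is not taken from the Xymon
code, and the function name load_averages is an assumption for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from Xymon): read uptime-style lines on
# stdin and print the three load averages separated by spaces.
# It matches only the "load average:" field, so it does not depend on
# whether an "up ..." field is present. Non-matching lines print nothing.
load_averages() {
    sed -n 's/.*load average: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}
```

For example, uptime | load_averages prints the 1, 5 and 15 minute values on
one line for each of the three formats shown.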

I hope this helps,

|\/|
--  
Martin Ward
Manager, Technical Services
