
Graphs stop update 24 hours after client reboot; start again 24 hours later.

6 messages in this thread

list Thomas R. Brand · Tue, 27 Jan 2009 10:21:00 -0500 ·
Hi all,

I need some help/suggestions to figure out why my "cpu load" and "users
& processes" graphs stop updating about 24 hours after the systems
reboot. The updates stop for anywhere from 12 to 24 hours, then simply
start back up again.
Only the "CPU load" and the "Users and Processes" graphs are having the
problem; disk, memory, cpu utilization, network traffic don't miss a
beat.

We have a number of identically configured systems; they all reboot
around 00:30 local time (they are in different time zones) on Wednesday
mornings. They all stop reporting cpu-load/users-and-processes
graphs sometime Thursday morning and then start up again Thursday
afternoon or Friday morning. They don't all stop and start at exactly the
same time, but the majority do. I end up with
about a 24-hour gap in my graphs every week.

The rrd files are all updated except for the la.rrd, procs.rrd, and
users.rrd:
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 16:56 clock.rrd
-rw-r--r-- 1 hobbit hobbit  38536 Jan 22 16:56 disk,cvsrx.rrd
-rw-r--r-- 1 hobbit hobbit  38536 Jan 22 16:56 disk,root.rrd
-rw-r--r-- 1 hobbit hobbit  38536 Jan 22 16:56 ifstat.eth0.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 00:38 la.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 16:56 memory.actual.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 16:56 memory.real.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 16:56 memory.swap.rrd
-rw-r--r-- 1 hobbit hobbit  57520 Jan 22 16:56 mysql.rrd
-rw-r--r-- 1 hobbit hobbit 304312 Jan 22 16:56 netstat.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 00:38 procs.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 16:57 tcp.conn.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 16:59 tcp.ssh.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Jan 22 00:38 users.rrd
-rw-r--r-- 1 hobbit hobbit 323296 Jan 22 16:56 vmstat.rrd


I've restarted the hobbit client and the hobbit server; no help.

Any pointers/suggestions would be very welcome!

Tom

Tom Brand
CVS/pharmacy
list Henrik Størner · Wed, 28 Jan 2009 12:23:17 +0000 (UTC) ·
In <user-11dc7467f42e@xymon.invalid> "Brand, Thomas R." <user-10a840458972@xymon.invalid> writes:
I need some help/suggestions to figure out why my "cpu load" and "users
& processes" graphs stop updating about 24 hours after the systems
reboot. The updates stop for anywhere from 12 to 24 hours, then simply
start back up again.
Only the "CPU load" and the "Users and Processes" graphs are having the
problem; disk, memory, cpu utilization, network traffic don't miss a
beat.

The only explanation I can come up with is that the format of
some of the "cpu" status message is different for the first 24 hours
after a reboot.

Could you send me an example of the cpu status shortly after a reboot,
and one when the graphs are working ?

What OS are these boxes ?


Regards,
Henrik
list Thomas R. Brand · Wed, 28 Jan 2009 17:26:47 -0500 ·
Hi Henrik,

 The systems are running SUSE Linux Enterprise Server 10 SP1 (SLES
10.1).
It's pretty much a standard out-of-the-box OS install, nothing very odd.
The hobbit server is also SLES10.1

Slight correction: the server reboots, graphs show fine for 24 hours,
then graphs stop for 12-24 hours, then graphs start again...
    reboot: wed 00:30
    graph shows data until Thursday 00:30 and then stops
    graph data starts again 12-24 hours after stopping.

I'm not sure what you mean by 'send me an example of the cpu status'...
are you looking for a data file? a log file?

Thanks for taking time to respond,
Tom
list Thomas R. Brand · Mon, 16 Feb 2009 17:44:53 -0500 ·
Hi Henrik,

  I'm still struggling to understand why the graphs stop updating and
appreciate your taking the time to respond.

I'm not sure what you mean by 'send me an example of the cpu status'...
are you looking for a data file? a log file? or the 'client data' as
reported by http://lxadmin02/hobbit-cgi/bb-hostsvc.sh?CLIENT=s00766rxs?

Both the Hobbit server & client are running on a SUSE Linux Enterprise
Server 10 SP1 (SLES 10.1) OS.
It's pretty much a standard out-of-the-box OS install, nothing very odd.

Hobbit is version 4.2.0 with all-in-one patch.

To clarify: the server reboots and the graphs update for 24 hours. Then,
24 hours after the reboot, the graphs stop for 24 hours, then start
again...
    reboot: Wed 00:30
    graph shows data until Thursday 00:30 and then stops
    graph data starts again; usually the updates start exactly 24
    hours after stopping.


Here is a listing of the current rrd directory from a system that rebooted
yesterday (Feb 15) at 9:40 am. The data for users, procs, and la stopped
updating at 9:40 am today, but the other data files are still updating:
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 09:40 users.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 09:40 procs.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 09:40 la.rrd
-rw-r--r-- 1 hobbit hobbit  57520 Feb 16 17:18 mysql.rrd
-rw-r--r-- 1 hobbit hobbit 323296 Feb 16 17:18 vmstat.rrd
-rw-r--r-- 1 hobbit hobbit 304312 Feb 16 17:18 netstat.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 17:18 memory.swap.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 17:18 memory.real.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 17:18 memory.actual.rrd
-rw-r--r-- 1 hobbit hobbit  38536 Feb 16 17:18 ifstat.eth0.rrd
-rw-r--r-- 1 hobbit hobbit  38536 Feb 16 17:18 disk,root.rrd
-rw-r--r-- 1 hobbit hobbit  38536 Feb 16 17:18 disk,cvsrx.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 17:18 clock.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 17:20 tcp.conn.rrd
-rw-r--r-- 1 hobbit hobbit  19552 Feb 16 17:20 tcp.ssh.rrd


Thanks for taking time to respond,
Tom
list Patrik Nilsson · Wed, 30 Sep 2009 15:51:08 +0200 ·
Returning to this old thread as I ran into this issue today.

Wed, 28 Jan 2009 12:23:17 +0000 (UTC), Henrik wrote:
"Brand, Thomas R." <user-10a840458972@xymon.invalid> writes:
I need some help/suggestions to figure out why my "cpu load" and "users
& processes" graphs stop updating about 24 hours after the systems
reboot. The updates stop for anywhere from 12 to 24 hours, then simply
start back up again.
Only the "CPU load" and the "Users and Processes" graphs are having the
problem; disk, memory, cpu utilization, network traffic don't miss a
beat.
The only explanation I can come up with is that the format of
some of the "cpu" status message is different for the first 24 hours
after a reboot.
Could you send me an example of the cpu status shortly after a reboot,
and one when the graphs are working ?
What OS are these boxes ?

Running openSUSE 11.1 (x86_64).

Client output that does not update the rrd:

[top]
top - 14:39:44 up 1 day,  4:40,  3 users,  load average: 2.42, 2.88, 2.89

Client output that does update the rrd:

[top]
top - 14:42:51 up 40 days,  2:41,  3 users,  load average: 4.19, 3.61, 3.10

The only difference I can see is "day" instead of "days".

Regards,

Patrik
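
The "day" versus "days" difference in the two [top] lines above can be
checked mechanically. The following is only a minimal sketch, assuming a
POSIX shell; the function name day_form is hypothetical and is not part
of Hobbit/Xymon, and it says nothing about how the server actually
parses the data.

```shell
# Classify an uptime-style line by how it spells the day count.
# Prints "plural" for "N days", "singular" for "N day", "none" otherwise.
# day_form is a hypothetical helper, not part of Hobbit/Xymon.
day_form() {
    case "$1" in
        *[0-9]" days"*) echo plural ;;
        *[0-9]" day"*)  echo singular ;;
        *)              echo none ;;
    esac
}

# The two [top] header lines quoted in this thread:
day_form 'top - 14:39:44 up 1 day,  4:40,  3 users,  load average: 2.42, 2.88, 2.89'
day_form 'top - 14:42:51 up 40 days,  2:41,  3 users,  load average: 4.19, 3.61, 3.10'
```

Running a check like this against the client data of an affected host
may help confirm whether the gap lines up with the period in which the
singular form is reported.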
list Thomas R. Brand · Thu, 1 Oct 2009 11:39:54 -0400 ·
Based on Patrik's observation, I tried a few more things and found that
'top' does not appear to be the problem; however, on SuSE Linux 10.x
'uptime' also uses 'day' vs. 'days' and it is this value that causes the
la.rrd graphs to lose the info.

As a quick fix, I have modified hobbitclient-linux.sh on my SuSE Linux
10.x systems as follows:

echo "[uptime]"
uptime | perl -pe "s/^(.*) day (.*)/\1 days \2/"

The graphs updated on the next polling interval and started displaying
the missing information.
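
For hosts without perl, the same rewrite could probably be done with sed.
This is only a sketch under assumptions: the function name
normalize_uptime is hypothetical, the sample line is illustrative rather
than captured from a real host, and the pattern has not been tested
against a Hobbit/Xymon server. Unlike the perl one-liner, it also accepts
a comma directly after "day", and it only touches a "day" that follows
"up" and a number.

```shell
# Rewrite a singular "up N day" in uptime-style text to "up N days".
# Reads lines on stdin; lines that already say "days" pass through unchanged.
# normalize_uptime is a hypothetical helper, not part of Hobbit/Xymon.
normalize_uptime() {
    # POSIX basic regex: "up", spaces, digits, " day", then a space or comma.
    sed 's/\(up  *[0-9][0-9]*\) day\([ ,]\)/\1 days\2/'
}

# Illustrative sample line (not captured from a real host):
printf '%s\n' ' 14:39:44 up 1 day,  4:40,  3 users,  load average: 2.42, 2.88, 2.89' \
    | normalize_uptime
```

In the client script this would be used as "uptime | normalize_uptime"
in place of the perl pipeline, again assuming the function is defined
earlier in the same script.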

Thanks for pointing out the way Patrik :)

Now if someone can figure out where and what needs to be updated in the
source code -- that's a bit beyond my skills...

Cheers
Tom Brand