Feature request - thresholds for CPU utilisation (not load average)
list Buchan Milne
Something I have been wondering about for a while is whether it would be possible to have thresholds on CPU utilisation. While we have thresholds for load averages, in some cases these have to be relatively high (e.g. 2 to 4 times the number of CPUs) due to the impact of IO wait on load average (e.g., our SAN-attached NFS servers often have a load average of over 10, with a CPU utilisation of 50%, when reading over 10k blocks/sec). However, that makes it difficult to catch a process in a CPU race (as much less IO gets done, IO wait is low, and load average is almost exactly 1 × the number of CPUs).

The CPU utilisation is already reported (in the vmstat data), which is how I know the above about our NFS servers (vmstat/vmstat1 graph).

This would also remove the complication of thresholds differing between servers with different numbers of CPUs, and maybe work better for Windows clients (which don't seem to have a concept of load average).

(I don't mean thresholds for load average should be removed ... I would love to have thresholds for both load average and CPU utilisation).

Regards,
Buchan
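To make the contrast concrete, here is a minimal Python sketch. It is not Hobbit code. It assumes vmstat-style percentages in which idle and IO wait are reported separately, and the function names, the 2× load multiplier and the 90% utilisation limit are all illustrative assumptions. The 4-CPU host in the example calls is hypothetical too.

```python
def load_limit(num_cpus, multiplier=2.0):
    """Load-average threshold; it has to scale with the CPU count."""
    return multiplier * num_cpus


def cpu_utilisation(idle_pct, iowait_pct=0.0):
    """Percentage of CPU time spent working (neither idle nor waiting on IO)."""
    return 100.0 - idle_pct - iowait_pct


def over_load_limit(load_avg, num_cpus, multiplier=2.0):
    """True when the load average exceeds the per-host scaled limit."""
    return load_avg > load_limit(num_cpus, multiplier)


def over_cpu_limit(idle_pct, iowait_pct=0.0, limit_pct=90.0):
    """True when utilisation exceeds a limit that is the same for every host."""
    return cpu_utilisation(idle_pct, iowait_pct) > limit_pct


# IO-bound host (hypothetical 4 CPUs): load is high, CPU is only half busy.
print(over_load_limit(10.5, num_cpus=4))             # True
print(over_cpu_limit(idle_pct=40, iowait_pct=10))    # False

# CPU-bound host: load sits near 1 x CPUs, but the CPUs are saturated.
print(over_load_limit(4.0, num_cpus=4))              # False
print(over_cpu_limit(idle_pct=1.0, iowait_pct=0.5))  # True
```

The load limit needs the CPU count of each server, while the utilisation limit does not, which is the simplification the request is after.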
list Taylor Lewick
Funny you brought this up just now, because today I noticed that the Windows clients, either bbwin or bbnt, let you set alerts for CPU utilization, but both Big Brother and Hobbit only understand load average. So I keep getting alerts saying load is very high when the CPUs are around 20-50%.

A load of 20 on a Linux/Unix server would be very high, but Windows boxes don't really have the load average concept, just CPU utilization. So if you are monitoring utilization on Windows clients, you have to change the load thresholds to something like 70 90 to avoid getting red pages.

-----Original Message-----
From: Buchan Milne [mailto:user-9b139aff4dec@xymon.invalid]
Sent: Thursday, February 28, 2008 12:44 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Feature request - thresholds for CPU utilisation (not load average)
list Tom Kauffman
I'll second that. I just found out we had a test system that has had an Oracle process using 99% of one CPU for the past (drumroll!) two months, and we didn't notice it!

Tom Kauffman
NIBCO, Inc
list Josh Luthman
Thirdsies!
list Bill Richardson
I see that Buchan asked for this a few years back. Has anyone done this? I would like to start alerting on %CPU, not LOAD. I would still like to graph LOAD and have that show up under trends. The %CPU is already being graphed in Trends; it would be nice just to pull that over to the CPU column.

Here is the first request: http://lists.xymon.com/archive/2008-February/017968.html

Thanks,
Bill Richardson
list Henrik Størner
On 07-12-2011 17:13, Bill Richardson wrote:
> I would like to start alerting on %CPU not LOAD.
In 4.3.x, add this to your analysis.cfg:
HOST=foo
DS cpu vmstat.rrd:cpu_idl >=25 COLOR=green TEXT="CPU load normal"
DS cpu vmstat.rrd:cpu_idl <25 COLOR=yellow TEXT="High CPU load"
DS cpu vmstat.rrd:cpu_idl <10 COLOR=red TEXT="Critical CPU load"
Regards,
Henrik
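Note that these rules test cpu_idl, the idle percentage from the vmstat data, so "idle below 25" means roughly 75% or more of the CPU time is non-idle, and "idle below 10" roughly 90% or more. The Python sketch below only illustrates that mapping. It is not how Xymon evaluates analysis.cfg, and it assumes that when several rules match, the most severe colour wins. The names `RULES`, `SEVERITY` and `classify` are hypothetical.

```python
import operator

# Assumed severity order for picking a winner when several rules match.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

# (comparison, limit, colour, text), mirroring the three DS lines above.
RULES = [
    (">=", 25.0, "green", "CPU load normal"),
    ("<", 25.0, "yellow", "High CPU load"),
    ("<", 10.0, "red", "Critical CPU load"),
]


def classify(cpu_idl, rules=RULES):
    """Return (colour, text) of the most severe rule matching the idle percentage."""
    matches = [(colour, text) for op, limit, colour, text in rules
               if OPS[op](cpu_idl, limit)]
    if not matches:
        return ("green", "")  # hypothetical default, not taken from Xymon
    return max(matches, key=lambda m: SEVERITY[m[0]])


print(classify(60))  # ('green', 'CPU load normal')    -> about 40% busy
print(classify(20))  # ('yellow', 'High CPU load')     -> about 80% busy
print(classify(5))   # ('red', 'Critical CPU load')    -> about 95% busy
```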
list Ralph Mitchell
On Wed, Dec 7, 2011 at 4:50 PM, Henrik Størner <user-ce4a2c883f75@xymon.invalid> wrote:
> In 4.3.x, add this to your analysis.cfg:
> HOST=foo
> DS cpu vmstat.rrd:cpu_idl >=25 COLOR=green TEXT="CPU load normal"
> DS cpu vmstat.rrd:cpu_idl <25 COLOR=yellow TEXT="High CPU load"
> DS cpu vmstat.rrd:cpu_idl <10 COLOR=red TEXT="Critical CPU load"
FYI: The column name is missing in the DS example in the docs:
Example: Flag "conn" status a yellow if responsetime exceeds 100 msec.
.br
DS tcp.conn.rrd:sec >0.1 COLOR=yellow TEXT="Response time &V exceeds &U seconds"
Ralph Mitchell
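The TEXT string in that example carries two placeholders. The wording suggests that &V stands for the measured value and &U for the upper limit, but that reading is an assumption here and should be checked against the analysis.cfg documentation. Under that assumption, the substitution could be sketched in Python as follows (`render_text` is a hypothetical helper, not part of Xymon):

```python
def render_text(template, value, upper):
    """Fill in the &V (value) and &U (upper limit) placeholders of a TEXT string."""
    return template.replace("&V", str(value)).replace("&U", str(upper))


print(render_text("Response time &V exceeds &U seconds", 0.25, 0.1))
# Response time 0.25 exceeds 0.1 seconds
```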
list Henrik Størner
On Thu, 8 Dec 2011 08:32:59 -0500, Ralph Mitchell <user-00a5e44c48c0@xymon.invalid> wrote:
> FYI: The column name is missing in the DS example in the docs:

Thanks - fixed.

Regards,
Henrik
list Ralph Mitchell
On Thu, Dec 8, 2011 at 8:41 AM, <user-ce4a2c883f75@xymon.invalid> wrote:
> Thanks - fixed.
Also, when I put in
TEXT="cpu load....."
the opening double-quote shows in the display. Putting the double-quote
before the TEXT:
"TEXT=cpu load......"
makes it come out OK. I don't know if that's a documentation issue or
something in the code that processes analysis.cfg.
Thanks!
Ralph Mitchell
list Bill Richardson
Great info... Having the ability to alert on the rrd data is great! Thank you!