Xymon Mailing List Archive

Problem disabling large groups of hosts

5 messages in this thread

list Simmons Clint · Tue, 29 Jan 2013 17:36:02 +0000 ·
I'm having a problem trying to disable 1000+ hosts and tests at one time for maintenance windows. I've even tried disabling in groups of 100-200, but it still misses a group of servers here and there. It seems that when I click "apply", the screen refresh could possibly stop the enadis.sh/cgi script that is running in the background (??)

Is there a way to debug this or modify the refresh time/interval?

The reason we want to disable all these hosts/tests for maintenance windows is to keep our SLA intact during the reboots and service outages for patching, etc. Does anyone have other recommendations, or would you suggest using REPORTTIME globally instead?


FreeBSD 9.0
Apache 2.2.22
Xymon v4.3.10


Thanks in advance,

Clint

This e-mail (and any attachment) is strictly confidential and for use only by intended recipient(s). The opinions therein expressed are those of the author. Its contents, therefore, do not represent any commitment between the company and the recipient(s) and no liability or responsibility is accepted by the company for the above mentioned content. If you are not an intended recipient(s), please notify the author promptly and delete this message.
list Japheth Cleaver · Tue, 29 Jan 2013 13:42:48 -0800 (PST) ·
Possibly adding --debug to cgioptions.cfg under ~xymon/server/etc/ might give
you some additional info... Sadly, it could be httpd closing the
connection :/
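
A minimal sketch of that change, assuming the enadis options are held in the CGI_ENADIS_OPTS variable (the name used by the wrapper script quoted elsewhere in this thread). The first assignment is only a stand-in for whatever your cgioptions.cfg already contains; the path shown is an assumption and will differ between installs.

```shell
# Stand-in for the existing line in cgioptions.cfg (assumed value, check your own file)
CGI_ENADIS_OPTS="--env=/usr/lib/xymon/server/etc/xymonserver.cfg"

# Append --debug while keeping the options that were already set
CGI_ENADIS_OPTS="$CGI_ENADIS_OPTS --debug"
```

Since cgioptions.cfg is sourced by the CGI wrapper scripts as plain shell, the appended line can go directly below the existing assignment.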

If you're doing that many, I might suggest simply generating a series of
disable/enable commands and just piping them in via the xymon CLI.
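
One way to generate those commands is sketched below. It assumes the "disable HOSTNAME.TESTNAME DURATION text" message form described in the xymon(1) man page, with "*" as the test name to cover every test on a host, a duration in minutes, and commas substituted for dots in hostnames. gen_disable_cmds and hosts.txt are hypothetical names used for illustration and are not part of Xymon.

```shell
#!/bin/sh
# gen_disable_cmds: hypothetical helper, not part of Xymon.
# Reads hostnames (one per line) on stdin and prints one
# "disable" message per host on stdout.
# Usage: gen_disable_cmds DURATION REASON < hostlist
gen_disable_cmds() {
    duration=$1
    reason=$2
    while IFS= read -r host; do
        # Skip blank lines and comment lines in the host list
        case $host in ''|'#'*) continue ;; esac
        # Assumption: dots in the hostname are sent as commas
        xyhost=$(printf '%s' "$host" | tr . ,)
        printf 'disable %s.* %s %s\n' "$xyhost" "$duration" "$reason"
    done
}

# Example of sending the messages one at a time (commented out so the
# sketch can be read and tested without a running Xymon server):
# gen_disable_cmds 240 "Patching window" < hosts.txt |
#     while IFS= read -r msg; do xymon 127.0.0.1 "$msg"; done
```

Sending one message per xymon invocation is slower than a single batch, but it makes it easy to log or retry individual hosts if some are missed.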

There's also the 'schedule' option too.


Regards,

-jc
list Jeremy Laidman · Thu, 31 Jan 2013 14:26:37 +1100 ·
I'm thinking that there's some limit being reached, perhaps maximum rate of
commands to xymond (although I'm not aware of any such thing), maximum CGI
request size, or maximum CGI lifetime.

First, check your xymond logs for warnings.  Also, check the Apache logs.
If nothing turns up, then run the following command:

xymond_channel --channel=enadis sh -c 'cat >/tmp/enadis-msgs.dump'

Then do your disable from the web interface.  When it's all finished
(perhaps when the last of the servers show as disabled), stop the above
process and review the dump file, looking for the hosts that didn't get
disabled.  If they show up in the dump, the problem might be in the xymond
process handling the disable commands.  This would be tricky to diagnose,
and might require a review of the code, or running xymond with debugging on.
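
The review of the dump file can be scripted. The sketch below makes no assumption about the exact layout of the channel messages; it only checks whether each hostname appears anywhere in the dump. missing_from_dump is a hypothetical helper, and the host list file is assumed to hold one hostname per line.

```shell
#!/bin/sh
# missing_from_dump: hypothetical helper, not part of Xymon.
# Usage: missing_from_dump HOSTLIST DUMPFILE
# Prints every hostname from HOSTLIST that never appears in DUMPFILE.
missing_from_dump() {
    while IFS= read -r host; do
        # Ignore empty lines in the host list
        [ -n "$host" ] || continue
        # -F: match as a fixed string, -q: no output, exit status only.
        # This is a substring match, so "web1" is also satisfied by "web10".
        grep -Fq -- "$host" "$2" || printf '%s\n' "$host"
    done < "$1"
}

# Example (commented out; hosts.txt is an assumed file name):
# missing_from_dump hosts.txt /tmp/enadis-msgs.dump
```

An empty result would suggest that every disable request reached the enadis channel, which points toward xymond rather than the CGI.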

If the hosts didn't show up in the dump, the problem might be in the CGI
process (enadis.cgi).  You could replace enadis.sh with a modified version
that first copies STDIN to a dump file before sending it on the normal path
to the CGI.  Such as adding the "cat" line below immediately before the
"exec" line like so:

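#!/bin/sh
# This is a wrapper for the Xymon enadis script
. /usr/lib/xymon/server/etc/cgioptions.cfg
# Save the request body (STDIN) to a dump file for later inspection
cat > /tmp/enadis-cgi.dump
# Replay the saved request body to the real CGI
exec /usr/lib/xymon/server/bin/enadis.cgi $CGI_ENADIS_OPTS < /tmp/enadis-cgi.dump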

Also, you might see if sending the disable messages from the command-line
all at once also produces the same behaviour.  If it does, then the CGI is
not your problem, and is more likely to be the xymond process.

J
list Darin D [itsys] Dugan · Thu, 31 Jan 2013 15:31:37 +0000 ·
This may be a genuine problem to be solved, but doesn't DOWNTIME fit the bill here for designating recurring maintenance windows?
http://www.xymon.com/xymon/help/manpages/man5/hosts.cfg.5.html

Cheers.
list Simmons Clint · Thu, 31 Jan 2013 21:22:04 +0000 ·
There's also the 'schedule' option too. <-- Using this method worked!

Thanks,
Clint