Xymon Mailing List Archive

acknowledgements does not survive xymon restart

8 messages in this thread

Norbert Kriegenburg · Thu, 1 Nov 2018 10:57:36 +0100
Experts:

I have a new, fairly large Xymon environment (4.3.28 with >10k servers), and
of course there are always a lot of acknowledged alerts.

Unfortunately, all these acks are reset if I restart the Xymon daemon
(xymon.sh restart), e.g. to activate config changes.
This happens whether I use "Acknowledge alert" from the menu or the
(undocumented) "xymondack" feature of Xymon.
This leads to a lot of extra effort and confusion...
Any ideas how I can save the ack state and restore it after a restart?

Norbert
Schminke_Erik_D · Thu, 1 Nov 2018 06:58:25 -0500
What config changes are you making that require such frequent restarts?

Changes to hosts.cfg, alerts.cfg, analysis.cfg and client-local.cfg are the
ones that happen most frequently in my environment, and none of them require
a restart to pick up the changes. I think the only file that *does* need a
restart would be xymonserver.cfg, and the only component that would need to
be restarted would be xymond, not the full stack.

Erik D. Schminke | Associate Systems Programmer
Hormel Foods Corporation | One Hormel Place | Austin, MN XXXXX
Phone: (XXX) XXX-XXXX
user-15513f33c451@xymon.invalid | www.hormelfoods.com
Norbert Kriegenburg · Thu, 1 Nov 2018 13:16:06 +0100
Don't get me wrong: I don't usually do frequent restarts, but from time to
time I need a restart, or the whole server must be restarted because of
patches.
And as I constantly have to add new checks with new NCV definitions, the
TEST2RRD and SPLITNCV settings in xymonserver.cfg change, and this also needs
a restart.

Because we have such a huge number of servers, a lot of departments use Xymon
regularly (luckily), and they use the ack mechanism to organize their work
(adding a ticket number to an alert, doing evaluation reports and so on).
Having >100 acked alerts is a normal situation,
and it creates a lot of extra work to restore them.
In the old BB days the acks always survived downtimes and restarts, but now
no ack files are stored in the acks dir any more, so I thought they would
be restored from the info in the alert.chk file, which is not the case.

Norbert
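Norbert's point about TEST2RRD and SPLITNCV suggests a quick pre-restart check: did an edit to xymonserver.cfg actually touch those settings? The following is a minimal Python sketch, not part of Xymon, that compares two copies of the file and lists the TEST2RRD, SPLITNCV_* and NCV_* assignments that differ. It assumes simple KEY="value" lines and ignores include directives and $VARIABLE expansion; the function names are hypothetical.

```python
import re

# Matches simple assignments such as TEST2RRD="cpu=la,disk,mytest=ncv".
# Assumption: one KEY="value" per line; include directives and
# $VARIABLE expansion are deliberately ignored in this sketch.
ASSIGN = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"(.*)"\s*$')


def ncv_settings(text: str) -> dict:
    """Return TEST2RRD and any SPLITNCV_*/NCV_* assignments from cfg text."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        m = ASSIGN.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "TEST2RRD" or key.startswith(("SPLITNCV_", "NCV_")):
            settings[key] = value  # a later assignment overrides an earlier one
    return settings


def changed_ncv_keys(old_text: str, new_text: str) -> list:
    """List the NCV-related keys that were added, removed or modified."""
    old, new = ncv_settings(old_text), ncv_settings(new_text)
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Fed the text of a saved copy and of the edited file, an empty result means neither TEST2RRD nor any NCV setting changed; it says nothing about other settings in the file.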
Thomas Eckert · Fri, 2 Nov 2018 10:45:53 +0100
Hi Norbert,

As I can't remember ever encountering this, I just tested it (in a mini setup, Xymon 4.3.28 on Debian 9), and my 2 acks survived the Xymon restart.
As your environment is fairly large, this could be a size/load problem.

Random ideas:
- Is the restart “clean”, or does something crash during the stop phase (check the logfiles)?
- You could try to manually force writing a checkpoint file by sending SIGUSR1 to xymond before the restart.
- If you have a redundant/multi-server setup: is there any chance that another Xymon server is propagating incomplete state (`xymond_distribute` not enabled)?

All the best
Thomas
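Thomas's SIGUSR1 suggestion can be scripted. A minimal Python sketch, assuming xymond was started with --pidfile as in the tasks.cfg Norbert posts further down (so its PID sits in $XYMONSERVERLOGS/xymond.pid) and that, as Thomas says, SIGUSR1 makes xymond write its checkpoint file. The function names are hypothetical.

```python
import os
import signal


def read_pid(pidfile: str) -> int:
    """Read the numeric PID that xymond wrote via --pidfile."""
    with open(pidfile) as fh:
        text = fh.read().strip()
    if not text.isdigit():
        raise ValueError(f"{pidfile} does not contain a PID: {text!r}")
    return int(text)


def force_checkpoint(pidfile: str) -> int:
    """Send SIGUSR1 to the process named in pidfile and return its PID.

    Assumption: the target is xymond, which (per this thread) writes its
    checkpoint file when it receives SIGUSR1.
    """
    pid = read_pid(pidfile)
    os.kill(pid, signal.SIGUSR1)
    return pid
```

For example, `force_checkpoint(os.path.expandvars("$XYMONSERVERLOGS/xymond.pid"))` right before `xymon.sh restart`; afterwards the modification time of xymond.chk should show whether a fresh checkpoint was written.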
Norbert Kriegenburg · Fri, 2 Nov 2018 12:43:45 +0100
Hi Thomas,

thx for your suggestions, but unfortunately they did not solve the issue.
The restart runs without messages; there is nothing suspicious in the logs.
But some minutes after the restart, all acks are gone.
The ack table at the bottom of the nongreen page is empty, too.

My alert.chk file is always up-to-date; a SIGUSR1 does not change anything.
But it is currently quite large (5.1 MB) due to the many alerts (my access
to one of the DMZs is blocked, creating a lot of conn/ssh/rdp alerts).
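Norbert's statement that alert.chk is always up-to-date can be checked mechanically. A minimal Python sketch, assuming the checkpoint files live in $XYMONTMP (here /home/xymon/server/tmp) and are rewritten every --checkpoint-interval seconds (600 in the tasks.cfg quoted in this thread); the helper name and the 2x tolerance are hypothetical choices.

```python
import os
import time


def stale_checkpoints(paths, interval=600, now=None):
    """Return the checkpoint files that are missing or older than 2*interval.

    A file rewritten every `interval` seconds should never be much older than
    that; the factor of two is an arbitrary safety margin for this sketch.
    """
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age = now - os.path.getmtime(path)
        except FileNotFoundError:
            stale.append(path)  # a missing file is treated as stale
            continue
        if age > 2 * interval:
            stale.append(path)
    return stale
```

Called as `stale_checkpoints(["/home/xymon/server/tmp/xymond.chk", "/home/xymon/server/tmp/alert.chk"])`, an empty list means both files were written within the last 20 minutes.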

I wrote a script to mass-ack such alerts; otherwise the noise would be
unmanageable:

xm-multiack.sh -t conn -c clear -a rdp -d 144000 -r "FW blocked" -i de152911

This acks all the red conn/ssh/rdp alerts for this DMZ, and I can see the acks
on the nongreen page.
Until the next restart...

Btw: contrary to what Erik wrote, at least new CLASS settings in
analysis.cfg need a xymon.sh restart to take effect (just checked).

Norbert
Tom Diehl · Fri, 2 Nov 2018 09:10:36 -0400 (EDT)
You might want to have a look at these old threads. I remember this problem
because it bit me back then.

http://lists.xymon.com/archive/2013-March/037082.html

http://lists.xymon.com/archive/2013-January/036721.html

HTH,

-- 
Tom			user-dcee455aaab0@xymon.invalid
Norbert Kriegenburg · Fri, 2 Nov 2018 16:27:34 +0100
Tom,

I checked the links. It looked like a similar problem at first sight, but
they do not describe my issue.
My environment is quite different:

- my Xymon runs in the xymon user's home directory, /home/xymon/server (freshly
compiled from source with configure and make)
- I have the correct xymond.chk and alert.chk in my $XYMONTMP
(/home/xymon/server/tmp), and they are still present after I stop Xymon
- the settings in tasks.cfg for xymond and alert are correct (otherwise the
chk files wouldn't be updated regularly)

I have another Xymon installation (much smaller, but same design and
version) where I verified this behaviour.
So it does not seem to be related to the number of servers or the size of the chk files.

My tasks.cfg settings:

[xymond]
        ENVFILE /home/xymon/server/etc/xymonserver.cfg
        CMD xymond --pidfile=$XYMONSERVERLOGS/xymond.pid \
                --restart=$XYMONTMP/xymond.chk \
                --checkpoint-file=$XYMONTMP/xymond.chk \
                --checkpoint-interval=600 \
                --log=$XYMONSERVERLOGS/xymond.log \
                --admin-senders=127.0.0.1,$XYMONSERVERIP \
                --ack-each-color \
                --ghosts=match

[alert]
        ENVFILE /home/xymon/server/etc/xymonserver.cfg
        NEEDS xymond
        CMD xymond_channel \
                           --channel=page  \
                           --log=$XYMONSERVERLOGS/alert.log xymond_alert \
                           --checkpoint-file=$XYMONTMP/alert.chk \
                           --checkpoint-interval=600

Norbert
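To double-check tasks.cfg entries like the ones above, a minimal Python sketch can pull the --option=value pairs out of each [section]'s CMD and confirm that xymond reloads the same file it checkpoints to (--restart equal to --checkpoint-file). It assumes the layout shown in Norbert's message ([name] headers, KEYWORD value lines, backslash continuations); it is not a full tasks.cfg parser, and the function names are hypothetical.

```python
import re


def parse_tasks(text: str) -> dict:
    """Map each [section] to the --option=value pairs found in its CMD."""
    joined = re.sub(r"\\[ \t]*\n", " ", text)  # join backslash-continued lines
    tasks = {}
    current = None
    for line in joined.splitlines():
        line = line.strip()
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = tasks.setdefault(header.group(1), {})
        elif current is not None and line.startswith("CMD "):
            # Flags without "=value" (e.g. --ack-each-color) are not recorded.
            for opt, value in re.findall(r"(--[\w-]+)=(\S+)", line):
                current[opt] = value
    return tasks


def restart_matches_checkpoint(tasks: dict) -> bool:
    """True if xymond's --restart file is the one it checkpoints to."""
    opts = tasks.get("xymond", {})
    return "--restart" in opts and opts["--restart"] == opts.get("--checkpoint-file")
```

Run against the [xymond] section quoted above, both options resolve to $XYMONTMP/xymond.chk, so the check passes, which matches Norbert's own conclusion that the chk settings are correct.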
Tom Diehl · Fri, 2 Nov 2018 11:55:57 -0400 (EDT)
Hi Norbert,

If the chk files are still there after a reboot, then my idea was wrong.

In my case the chk files were getting deleted during a reboot.

Sorry, but that is the only idea I had.


Regards,

Tom
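One way checkpoint files can vanish across a reboot, as in Tom's case, is that the checkpoint directory sits on a tmpfs, which is RAM-backed and starts empty at every boot (a boot-time tmp cleaner would have the same effect, and this sketch does not detect that). A minimal Linux-only Python sketch, assuming /proc/mounts is readable: it finds the mount point containing a directory and reports its filesystem type. The helper name is hypothetical, and escaped mount points (e.g. spaces written as \040) are not decoded.

```python
import os
from typing import Optional


def fs_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts: one mount per line,
    "device mountpoint fstype options dump pass". The longest mount point
    that contains `path` wins; later duplicates win ties (overmounts).
    """
    path = os.path.normpath(path)
    best, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        prefix = mountpoint.rstrip("/") + "/"
        contains = path == mountpoint or path.startswith(prefix)
        if contains and len(mountpoint) >= len(best):
            best, best_type = mountpoint, fstype
    return best_type
```

For example, reading /proc/mounts and calling `fs_type_for(os.path.realpath("/home/xymon/server/tmp"), text)`; a result of "tmpfs" would mean the .chk files there cannot survive a reboot.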