Xymon Mailing List Archive

Bogus hosts filling up alert.log

8 messages in this thread

list David Mills · Tue, 31 Oct 2017 20:31:10 +0000 ·
Hi, all!

I have recently set up a new Xymon server (Xymon 4.3.28-1.el6.terabithia, http://xymon.sourceforge.net/) on RHEL 6.7 that, for the most part, is doing just fine. However, I've discovered that my /var/log/xymon/alert.log file is growing at a crazy rate, to the point that it periodically needs to be zeroed out or it will swamp the file system.

The problem is that every 15 seconds the /var/log/xymon/alert.log file receives a flurry of log entries like this:

...
2017-10-31 14:53:55 Checking criteria for host '0FS_94_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_94_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_94_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_94_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_94_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_94_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_94_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_94_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_95_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_95_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_95_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_95_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_95_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_95_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_95_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_95_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_96_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_96_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_96_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_96_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_96_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_96_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_96_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
2017-10-31 14:53:55 Checking criteria for host '0FS_96_192_168_22_1__export_', which is not yet defined; some alerts may not immediately fire
...
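
A quick way to see which names account for the flood is to count the distinct host names in these messages. The sketch below assumes the message wording shown above; the function name `count_undefined_hosts` is only illustrative.

```shell
# Summarize which undefined host names are flooding the log.
# Reads alert.log-style text on stdin and prints "count hostname" pairs,
# most frequent first.
count_undefined_hosts() {
    grep 'which is not yet defined' \
        | sed -n "s/.*Checking criteria for host '\([^']*\)'.*/\1/p" \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}

# Usage (log path taken from the message above):
#   count_undefined_hosts < /var/log/xymon/alert.log
```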

The '0FS_96_192_168_22_1__export_' is actually the name of a host I've defined in the past, but is no longer in xymon's memory (AFAIK!!)

I have reduced alerts.cfg to a minimal stub, commented out the "directory /etc/xymon/alerts.d/..." directive, and "rm -r"'d any reference to this host (and others similar to it) under the data file directories (e.g. rrd/, hist/, histlogs/, hostdata/, etc.).

I have gone as far as running "find /etc/xymon -type f | xargs egrep 0FS_" looking for "surprises". I've also stopped / restarted the server and scanned what's active in memory via "xymon localhost xymondboard | egrep 0FS_".
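
The same search can be done with a recursive grep, which also copes with file names containing spaces (a plain `find | xargs` pipeline splits those). This is a minimal sketch; `find_references` is a hypothetical helper name, and the directories to pass in are whatever the local install uses.

```shell
# List every regular file under the given directories that mentions a pattern.
# -r recurses, -l prints only file names, -F treats the pattern as a fixed
# string, and "--" protects patterns that begin with a dash.
find_references() {
    pattern=$1
    shift
    grep -rlF -- "$pattern" "$@" 2>/dev/null
}

# Usage (directories are examples, adjust to the local layout):
#   find_references 0FS_ /etc/xymon /var/lib/xymon
```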

This "host" is not a real client host but an artifact I've created on the server side to represent a file system I'm monitoring in a server-side ext script, so I know it is not announcing its presence over port 1984. For the life of me I can't figure out where the alerts daemon is running across this hostname.

Help?

~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
David Mills
Systems Administrator
Northrop Grumman
(XXX) XXX-XXXX (mobile)
list John Thurston · Tue, 31 Oct 2017 12:38:27 -0800 ·
quoted from David Mills
On 10/31/2017 12:31 PM, Mills,David (HHSC Contractor) wrote:
This “host” is not a real client host but an artifact I’ve created on
the server side to represent a file system I’m monitoring in a
server-side ext script, so I know it is not announcing it’s presence
over port 1984. For the life of me I can’t figure out where the alerts
daemon is running across this hostname.
Do you get any information if you ask xymond_alert to react in the 
foreground?
xymoncmd xymond_alert --test 0FS_96_192_168_22_1__export_ foo --color=red

--
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Department of Administration
State of Alaska
list David Mills · Tue, 31 Oct 2017 20:54:46 +0000 ·
Thx, John!

Here's the output, though I'm not quite sure what to make of it:

.../xymon> /usr/share/xymon/bin/xymoncmd /usr/share/xymon/bin/xymond_alert --test 0FS_96_192_168_22_1__export --color=RED
2017-10-31 15:41:18.587126 Host not found in hosts.cfg - assuming it is on the top page
00081791 2017-10-31 15:41:18 send_alert 0FS_96_192_168_22_1__export:--color=RED state Paging
2017-10-31 15:41:18 Checking criteria for host '0FS_96_192_168_22_1__export', which is not yet defined; some alerts may not immediately fire
00081791 2017-10-31 15:41:18 Matching host:service:dgroup:page '0FS_96_192_168_22_1__export:--color=RED:(NULL):' against rule line 121
00081791 2017-10-31 15:41:18 Failed 'GROUP=phys-dba' (group not in include list)
2017-10-31 15:41:18 Checking criteria for host '0FS_96_192_168_22_1__export', which is not yet defined; some alerts may not immediately fire
00081791 2017-10-31 15:41:18 Matching host:service:dgroup:page '0FS_96_192_168_22_1__export:--color=RED:(NULL):' against rule line 124
00081791 2017-10-31 15:41:18 Failed 'GROUP=lgcl-dba' (group not in include list)
2017-10-31 15:41:18 Checking criteria for host '0FS_96_192_168_22_1__export', which is not yet defined; some alerts may not immediately fire
00081791 2017-10-31 15:41:18 Matching host:service:dgroup:page '0FS_96_192_168_22_1__export:--color=RED:(NULL):' against rule line 127
00081791 2017-10-31 15:41:18 Failed 'GROUP=maxi-dba' (group not in include list)
2017-10-31 15:41:18 Checking criteria for host '0FS_96_192_168_22_1__export', which is not yet defined; some alerts may not immediately fire
00081791 2017-10-31 15:41:18 Matching host:service:dgroup:page '0FS_96_192_168_22_1__export:--color=RED:(NULL):' against rule line 131
00081791 2017-10-31 15:41:18 Failed 'GROUP=unix-sadm' (group not in include list)
2017-10-31 15:41:18 Checking criteria for host '0FS_96_192_168_22_1__export', which is not yet defined; some alerts may not immediately fire
00081791 2017-10-31 15:41:18 Matching host:service:dgroup:page '0FS_96_192_168_22_1__export:--color=RED:(NULL):' against rule line 134
00081791 2017-10-31 15:41:18 Failed 'GROUP=windows-sadm' (group not in include list)
2017-10-31 15:41:18 Checking criteria for host '0FS_96_192_168_22_1__export', which is not yet defined; some alerts may not immediately fire
00081791 2017-10-31 15:41:18 Matching host:service:dgroup:page '0FS_96_192_168_22_1__export:--color=RED:(NULL):' against rule line 138
00081791 2017-10-31 15:41:18 Failed 'GROUP=env-mgmt' (group not in include list)
2017-10-31 15:41:18 Checking criteria for host '0FS_96_192_168_22_1__export', which is not yet defined; some alerts may not immediately fire
00081791 2017-10-31 15:41:18 Matching host:service:dgroup:page '0FS_96_192_168_22_1__export:--color=RED:(NULL):' against rule line 141
00081791 2017-10-31 15:41:18 Failed 'GROUP=tools-adm' (group not in include list)
2017-10-31 15:41:18 Checking criteria for host '0FS_96_192_168_22_1__export', which is not yet defined; some alerts may not immediately fire
00081791 2017-10-31 15:41:18 Matching host:service:dgroup:page '0FS_96_192_168_22_1__export:--color=RED:(NULL):' against rule line 146
00081791 2017-10-31 15:41:18 Failed 'GROUP=net-adm' (group not in include list)
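
In the transcript above, `--color=RED` shows up in the service position of the `host:service` pair, and the host name lacks the trailing underscore seen in alert.log. That suggests the service argument (`foo` in John's example) was left out and the option was read as the service name. A small sketch of a wrapper that guards against this follows; `alert_test_cmd` is a hypothetical helper that only prints the command line, mirroring the form John suggested.

```shell
# Build (but do not run) a xymond_alert test command line.
# Arguments: host, service, optional color (defaults to red).
# Refuses a missing service or one that looks like an option.
alert_test_cmd() {
    host=$1 service=$2 color=${3:-red}
    if [ -z "$host" ] || [ -z "$service" ]; then
        echo "usage: alert_test_cmd HOST SERVICE [COLOR]" >&2
        return 2
    fi
    case $service in
        -*) echo "service '$service' looks like an option" >&2; return 2 ;;
    esac
    printf 'xymoncmd xymond_alert --test %s %s --color=%s\n' \
        "$host" "$service" "$color"
}

# Usage:
#   alert_test_cmd 0FS_96_192_168_22_1__export_ foo red
```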
list John Thurston · Tue, 31 Oct 2017 13:00:17 -0800 ·
quoted from David Mills
On 10/31/2017 12:31 PM, Mills,David (HHSC Contractor) wrote:
This “host” is not a real client host but an artifact I’ve created on
the server side to represent a file system I’m monitoring in a
server-side ext script, so I know it is not announcing it’s presence
over port 1984. For the life of me I can’t figure out where the alerts
daemon is running across this hostname.
Based on the output of your interactive run, I suggest the host named 
"0FS_94_192_168_22_1__export_" is not defined in your hosts.cfg, but the 
xymon server is seeing messages being sent to it under this name.

Does it appear in your 'ghost report'?
If you define this name in hosts.cfg, do the nasty messages cease?
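
One way to check the ghost list from the command line is the `ghostlist` query of the xymon client command. The sketch below filters its output by name prefix; it assumes one ghost per line with the host name as the first `|`-separated field, which should be verified against the local output. `ghost_names` is an illustrative name.

```shell
# Print the ghost host names that start with a given prefix.
# Reads ghostlist-style text on stdin; the "name|..." layout is an assumption.
ghost_names() {
    prefix=$1
    # Keep the first "|"-separated field when it begins with the prefix.
    awk -F'|' -v p="$prefix" 'index($1, p) == 1 { print $1 }'
}

# Usage:
#   xymon localhost ghostlist | ghost_names 0FS_
```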
list David Mills · Tue, 31 Oct 2017 21:30:54 +0000 ·
Well, John, I added a bogus entry to what we call our "orphans" page -- part of the hosts.cfg hierarchy -- culled from the current ghostlist report, so Xymon will officially have it in the hosts.cfg hierarchy.

   $ tail -1 orphaned-hosts.cfg
   10.11.22.245    0FS_96_192_168_22_1__export_ # noconn

   $ pkill -HUP 'xymond '

Perhaps because this is not really a host you can interact with (hence the "noconn" tag) Xymon claims it still doesn't know about this host:

   $ /usr/share/xymon/bin/xymon localhost "xymondboard  host=0FS_96_192_168_22_1__export_"
   $ 
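
Before waiting for xymond to notice the change, it can help to confirm the entry really is present where the host name is expected, i.e. as the second field of a non-comment line. A minimal sketch, assuming the plain `IP hostname # tags` layout shown above; `host_defined` is a hypothetical helper and does not follow include or directory directives.

```shell
# Succeed (exit 0) if NAME is the host field of some entry in FILE.
host_defined() {
    name=$1 file=$2
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }      # skip comment lines
        $2 == n { found = 1 }          # second field is the host name
        END { exit found ? 0 : 1 }
    ' "$file"
}

# Usage:
#   host_defined 0FS_96_192_168_22_1__export_ orphaned-hosts.cfg && echo defined
```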


list David Mills · Tue, 31 Oct 2017 21:49:40 +0000 ·
Amendment: Eventually, Xymon did see the bogus entry for 0FS_96_192_168_22_1__export_ and it did stop appearing in the alert.log.

It's another data point, but this is kind of expected behavior and now I have just this bogus artifact hanging around. 

Any ideas where the alerts daemon gets its list of host names to check against?

;-)
list Japheth Cleaver · Tue, 31 Oct 2017 16:05:42 -0700 ·
quoted from David Mills
On 10/31/2017 2:49 PM, Mills,David (HHSC Contractor) wrote:
Amendment: Eventually, Xymon did see the bogus entry for 0FS_96_192_168_22_1__export_ and it did stop appearing in the alert.cfg

It's another data point, but this is kind of expected behavior and now I have just this bogus artifact hanging around.

'Any ideas where the alerts daemon gets its list of host names to check against?

;-)
If there had been a previous alert for the virtual/fake host, it would have been stored in memory, and frozen out into the alerts.chk file (probably in /var/lib/xymon/tmp/ during restarts).

The general reason for the alert is that xymond(_alert) is getting a report about something it doesn't (yet) know about. Usually, this clears a few minutes later as soon as xymond next checks hosts.cfg for changes, since typically that's the action pending to be taken.

Similarly, when a host is removed from hosts.cfg (or by a drop command), that message is passed to xymond_alert to clear its record of the alert as well.

Adding the host to hosts.cfg and then dropping it should work for clearing out the errant alert. Alternatively, you can stop xymon (or at least xymond_alert, by adding DISABLED into tasks.cfg), grep out the line for the alert in alerts.chk manually (it's just a normal single-line-record text file), and then bring it back up.
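
The manual cleanup described above could look roughly like this. It is a sketch under assumptions: `prune_checkpoint` is a hypothetical helper, it matches the host name as a plain substring anywhere in a record, and the checkpoint path should be taken from the local configuration (the `--checkpoint-file` option of xymond_alert in tasks.cfg) rather than guessed.

```shell
# Remove every record mentioning HOST from a checkpoint-style text file,
# keeping a backup copy next to it. Run only while xymond_alert is stopped.
prune_checkpoint() {
    host=$1 file=$2
    cp -p "$file" "$file.bak" || return 1
    # grep -v exits 1 when no lines remain; that is not an error here.
    grep -vF -- "$host" "$file.bak" > "$file" || [ $? -eq 1 ]
}

# Usage, with xymond_alert stopped (path as guessed in the message above):
#   prune_checkpoint 0FS_96_192_168_22_1__export_ /var/lib/xymon/tmp/alerts.chk
```

Writing back into the existing file, instead of moving a new file into place, keeps the original owner and permissions, which matters when the cleanup is run as root but the daemon runs as another user.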

HTH,
-jc
list David Mills · Wed, 1 Nov 2017 15:51:43 +0000 ·
JC --

Brilliant! That solved my problem perfectly. Thanks to John again for lending a hand!

;-)