Xymon Mailing List Archive

Trouble shooting alerts

8 messages in this thread

list Barrie Parker · Fri, 18 Nov 2011 11:22:21 -0600 ·
Folks:

I've been working on this problem for the last couple of days and it's likely an easy fix, but I can't see it. 

I have a Linux test box, ix-test, that I've shut ntpd down on. The status page shows the red "critical" icon for procs. When I drill down into procs, the page is red with ntpd red as I expect.

But I never get an email alert. I can manually email out, so postfix is okay on the xymon server.

The snippet of code from alerts.cfg looks like:

HOST=ix-test    SERVICE=ntpd
                MAIL barrie DOT user-101a12cb03d9@xymon.invalid REPEAT=10 COLOR=RED


I've gone through literally hundreds of emails in the archives (there's over 7200 with alerts and troubleshooting in them ;). I've gone through what I've seen in them, but I'm still no closer to a solution.

When I do: bin/xymoncmd xymond_alert --dump-config
2011-11-18 09:43:40 Using default environment file /usr/local/xymon/server/etc/xymonserver.cfg
  <SNIP>

  125   HOST=ix-test SERVICE=ntpd
        MAIL barrie DOT user-101a12cb03d9@xymon.invalid FORMAT=TEXT REPEAT=10 COLOR=red

<SNIP>

I gather that's what I should see. 

When I do:  bin/xymoncmd xymond_alert --test ix-test ntpd
2011-11-18 09:44:24 Using default environment file /usr/local/xymon/server/etc/xymonserver.cfg
00016177 2011-11-18 09:44:24 send_alert ix-test:ntpd state Paging
00016177 2011-11-18 09:44:24 Matching host:service:dgroup:page 'ix-test:ntpd:Math:' against rule line 122
00016177 2011-11-18 09:44:24 Failed 'HOST=rss03 SERVICE=syslog-ng' (hostname not in include list)
00016177 2011-11-18 09:44:24 Matching host:service:dgroup:page 'ix-test:ntpd:Math:' against rule line 125
00016177 2011-11-18 09:44:24 *** Match with 'HOST=ix-test    SERVICE=ntpd' ***
00016177 2011-11-18 09:44:24 Matching host:service:dgroup:page 'ix-test:ntpd:Math:' against rule line 126
00016177 2011-11-18 09:44:24 *** Match with 'MAIL barrie DOT user-101a12cb03d9@xymon.invalid REPEAT=10 COLOR=RED' ***
00016177 2011-11-18 09:44:24 Mail alert with command 'mail -s "Xymon [12345] ix-test:ntpd CRITICAL (RED)" barrie DOT user-101a12cb03d9@xymon.invalid' 
<SNIP>

That would seem to indicate success reading and parsing the alerts file.

I've tried to debug by capturing the output of: ./bbcmd --env=../etc/xymonserver.cfg xymond_channel --channel=page cat
and then using the code between
@@page#339/ix-test|1321631963.933541|142.3.156.124|ix-test|procs|142.3.156.124|1321633763|red|red|1321625351||647263|linux|linux||

and

@@


and then feeding it to:

bin/xymoncmd
2011-11-18 10:48:43 Using default environment file /usr/local/xymon/server/etc/xymonserver.cfg
rss03:/usr/local/xymon/server> bin/xymond_alert --debug <input.txt
17777 2011-11-18 10:49:04 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=266239
17777 2011-11-18 10:49:04 Got 8542 bytes
17777 2011-11-18 10:49:04 xymond_alert: Got message 339 @@page#339/ix-test|1321631963.933541|142.3.156.124|ix-test|procs|142.3.156.124|1321633763|red|red|1321625351||647263|linux|linux||
17777 2011-11-18 10:49:04 startpos 8541, fillpos 8542, endpos -1
17777 2011-11-18 10:49:04 Got page message from ix-test:procs
17777 2011-11-18 10:49:04 Alert status changed from 0 to 1
17777 2011-11-18 10:49:04 Found no first matching rule
17777 2011-11-18 10:49:04 Opening file /usr/local/xymon/server/etc/alerts.cfg
17777 2011-11-18 10:49:04 Opening file /usr/local/xymon/server/etc/holidays.cfg
17777 2011-11-18 10:49:04 Transport setup is:
17777 2011-11-18 10:49:04 xymondportnumber = 1984
17777 2011-11-18 10:49:04 xymonproxyhost = NONE
17777 2011-11-18 10:49:04 xymonproxyport = 0
17777 2011-11-18 10:49:04 Recipient listed as '142.3.156.39'
17777 2011-11-18 10:49:04 Standard protocol on port 1984
17777 2011-11-18 10:49:04 Will connect to address 142.3.156.39 port 1984
17777 2011-11-18 10:49:04 Connect status is 0
17777 2011-11-18 10:49:04 Sent 16 bytes
17777 2011-11-18 10:49:04 Read 1437 bytes
17777 2011-11-18 10:49:04 Closing connection
17777 2011-11-18 10:49:04 Found no first matching rule
17777 2011-11-18 10:49:04 cleanup_alert called for host ix-test, test procs
17777 2011-11-18 10:49:04 0 alerts to go
2011-11-18 10:49:04 Bad data in channel, skipping it
2011-11-18 10:49:04 Buffer sync lost, flushing data
17777 2011-11-18 10:49:04 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=266239
17777 2011-11-18 10:49:04 get_xymond_message: Returning NULL due to EOF
rss03:/usr/local/xymon/server>

I see the line (twice) "Found no first matching rule"  but I'm not sure how to interpret this.

Oh and xymon is 4.3.5 on the server and it should be 4.3.2 on the monitored host.

Thank you.
Regards,
Barrie.
list Tim McCloskey · Fri, 18 Nov 2011 09:45:49 -0800 ·
From: xymon-bounces at xymon.com [xymon-bounces at xymon.com] On Behalf Of Barrie Parker [user-7beef34d130e@xymon.invalid]
Sent: Friday, November 18, 2011 9:22 AM
To: xymon at xymon.com
Subject: [Xymon] Trouble shooting alerts

...snip...
The status page shows the red "critical" icon for procs. When I drill down into procs
...snip...


Should the alert be for 'procs', not ntpd?

Tim
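
Tim's question points at the status column name: the page message in the debug output is for ix-test:procs, so a rule keyed on SERVICE=ntpd never sees it. A minimal sketch of the rule from the first message with only the service changed; the mail address is a placeholder, not the poster's real one:

```
# alerts.cfg: match the "procs" status column, not the process name
HOST=ix-test    SERVICE=procs
                MAIL admin@example.com REPEAT=10 COLOR=red
```

Re-running the test from the first message with procs in place of ntpd (bin/xymoncmd xymond_alert --test ix-test procs) should then show whether the rule matches. Note that a procs rule fires for any monitored process on the host that goes red, not only ntpd.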
list Bruce White · Mon, 21 Nov 2011 08:00:28 -0600 ·
I think Tim found the issue.  Click on the info page for the server and
see if the alert shows up in the alerts box.  If the server has no ntpd
test, then no alert will appear in the alerts box.


 
 Bruce White
 Senior Enterprise Systems Engineer | Phone: X-XXX-XXX-XXXX | Fax: XXX-XXX-XXXX | user-58f975e8bf9d@xymon.invalid | http://www.fellowes.com/
 
 
 
Disclaimer: The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer. Thank you. Fellowes, Inc.
 
list Barrie Parker · Mon, 21 Nov 2011 10:53:13 -0600 ·
Thanks guys.
 
Alerting using procs does send an alert as expected for all monitored processes/daemons/services/etc. So that was my misunderstanding. And, yes, drilling down on info shows an alert when I use procs and no alert when I use ntpd by itself. I hadn't understood to look at "info" previously; thank you for the pointer.
I had looked at the protocols doc late on Friday and I'm starting to understand xymon's behavior better.
 
Since the doc for protocols says that the testing is for TCP-based services, I gather that I would need to get at UDP-based services in some other way. I'm assuming that for something like ntpd, I'd need to build a shell script to run ntpq and parse that output, but I'm not sure.
 
Can someone point me at the appropriate document or at a code sample to look at?
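
A minimal sketch of the kind of script described above, assuming ntpq is installed on the monitored host and the script runs inside the Xymon client environment, which provides $XYMON, $XYMSRV and $MACHINE. The column name ntpsync and the function name ntp_color are hypothetical, chosen for illustration. The check relies on ntpq marking the peer that ntpd has selected with a leading "*" in its peer listing.

```shell
#!/bin/sh
# Hypothetical Xymon client extension: reports whether ntpd has a selected
# sync peer. COLUMN is an assumed name; choose one that does not clash
# with an existing test for the host.
COLUMN=ntpsync

# Read "ntpq -pn" output on stdin and print a status colour:
# green if any line starts with "*" (the selected peer), otherwise red.
ntp_color() {
    if grep -q '^\*'; then
        echo green
    else
        echo red
    fi
}

# Only send a status when the Xymon client environment is present.
if [ -n "${XYMON:-}" ] && [ -n "${XYMSRV:-}" ]; then
    # If ntpd is down, ntpq prints an error and no "*" line, so the
    # result is red.
    PEERS=$(ntpq -pn 2>&1)
    COLOR=$(printf '%s\n' "$PEERS" | ntp_color)
    $XYMON $XYMSRV "status $MACHINE.$COLUMN $COLOR $(date)

$PEERS"
fi
```

On the client, a script like this would typically be started from clientlaunch.cfg, and an alert rule could then use SERVICE=ntpsync. It only checks for a selected peer; thresholds on offset or reach would need more parsing.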
 
Thank you all again.
list Tim McCloskey · Mon, 21 Nov 2011 09:04:05 -0800 ·
Hi, 

Depending on what you want to test, many TCP/UDP services are already available.  Simply add service-name to the hosts file (bb-hosts in older versions).

10.10.10.10    ns1.time.foo.asfdasdfasdfasfd.com      ntp dns


Tim
list Bruce White · Mon, 21 Nov 2011 11:06:04 -0600 ·
There is a built-in network test for ntp.  Check the help for hosts.cfg
(bb-hosts in older versions).  It is listed in the OTHER NETWORK TESTS
section.

 
     .......Bruce
list Barrie Parker · Tue, 22 Nov 2011 14:09:57 -0600 ·
I apologize -- I probably should have been clearer on this.

We use Novell's (read: Attachmate's) SLES. In SLES 11 SP1, ntpdate doesn't work: running /usr/sbin/ntpdate gives a whole page of warnings (ntpdate is deprecated, see ntp.org for more info, use ntpd -q, use rcntp ntptimeset, and so on). But ntpdate doesn't actually do anything, so I think that xymon's built-in ntp test won't work in this case.

So, for me to get this to work, it looks like I would have to mangle xymonnet.c (not great for future xymon updates) or try to figure out how to do UDP tests (which, for me, is probably easier said than done). Or I just let it go. I need to cipher on that.

But I do thank you again Bruce and Tim for your help and pointers.

Regards,
Barrie.