Trouble shooting alerts
list Barrie Parker
Folks:
I've been working on this problem for the last couple of days and it's likely an easy fix, but I can't see it.
I have a Linux test box, ix-test, that I've shut ntpd down on. The status page shows the red "critical" icon for procs. When I drill down into procs, the page is red with ntpd red as I expect.
But I never get an email alert. I can manually email out, so postfix is okay on the xymon server.
The snippet of code from alerts.cfg looks like:
HOST=ix-test SERVICE=ntpd
MAIL barrie DOT user-101a12cb03d9@xymon.invalid REPEAT=10 COLOR=RED
I've gone through literally hundreds of emails in the archives (there are over 7200 with alerts and troubleshooting in them ;). I've gone through what I've seen in them, but I'm still no closer to a solution.
When I do: bin/xymoncmd xymond_alert --dump-config
2011-11-18 09:43:40 Using default environment file /usr/local/xymon/server/etc/xymonserver.cfg
<SNIP>
125 HOST=ix-test SERVICE=ntpd
MAIL barrie DOT user-101a12cb03d9@xymon.invalid FORMAT=TEXT REPEAT=10 COLOR=red
<SNIP>
I gather that's what I should see.
When I do: bin/xymoncmd xymond_alert --test ix-test ntpd
2011-11-18 09:44:24 Using default environment file /usr/local/xymon/server/etc/xymonserver.cfg
00016177 2011-11-18 09:44:24 send_alert ix-test:ntpd state Paging
00016177 2011-11-18 09:44:24 Matching host:service:dgroup:page 'ix-test:ntpd:Math:' against rule line 122
00016177 2011-11-18 09:44:24 Failed 'HOST=rss03 SERVICE=syslog-ng' (hostname not in include list)
00016177 2011-11-18 09:44:24 Matching host:service:dgroup:page 'ix-test:ntpd:Math:' against rule line 125
00016177 2011-11-18 09:44:24 *** Match with 'HOST=ix-test SERVICE=ntpd' ***
00016177 2011-11-18 09:44:24 Matching host:service:dgroup:page 'ix-test:ntpd:Math:' against rule line 126
00016177 2011-11-18 09:44:24 *** Match with 'MAIL barrie DOT user-101a12cb03d9@xymon.invalid REPEAT=10 COLOR=RED' ***
00016177 2011-11-18 09:44:24 Mail alert with command 'mail -s "Xymon [12345] ix-test:ntpd CRITICAL (RED)" barrie DOT user-101a12cb03d9@xymon.invalid'
<SNIP>
That would seem to indicate success reading and parsing the alerts file.
I've tried to debug by capturing the output of: ./bbcmd --env=../etc/xymonserver.cfg xymond_channel --channel=page cat
and then using the code between
@@page#339/ix-test|1321631963.933541|142.3.156.124|ix-test|procs|142.3.156.124|1321633763|red|red|1321625351||647263|linux|linux||
and
@@
and then feeding it to:
bin/xymoncmd
2011-11-18 10:48:43 Using default environment file /usr/local/xymon/server/etc/xymonserver.cfg
rss03:/usr/local/xymon/server> bin/xymond_alert --debug <input.txt
17777 2011-11-18 10:49:04 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=266239
17777 2011-11-18 10:49:04 Got 8542 bytes
17777 2011-11-18 10:49:04 xymond_alert: Got message 339 @@page#339/ix-test|1321631963.933541|142.3.156.124|ix-test|procs|142.3.156.124|1321633763|red|red|1321625351||647263|linux|linux||
17777 2011-11-18 10:49:04 startpos 8541, fillpos 8542, endpos -1
17777 2011-11-18 10:49:04 Got page message from ix-test:procs
17777 2011-11-18 10:49:04 Alert status changed from 0 to 1
17777 2011-11-18 10:49:04 Found no first matching rule
17777 2011-11-18 10:49:04 Opening file /usr/local/xymon/server/etc/alerts.cfg
17777 2011-11-18 10:49:04 Opening file /usr/local/xymon/server/etc/holidays.cfg
17777 2011-11-18 10:49:04 Transport setup is:
17777 2011-11-18 10:49:04 xymondportnumber = 1984
17777 2011-11-18 10:49:04 xymonproxyhost = NONE
17777 2011-11-18 10:49:04 xymonproxyport = 0
17777 2011-11-18 10:49:04 Recipient listed as '142.3.156.39'
17777 2011-11-18 10:49:04 Standard protocol on port 1984
17777 2011-11-18 10:49:04 Will connect to address 142.3.156.39 port 1984
17777 2011-11-18 10:49:04 Connect status is 0
17777 2011-11-18 10:49:04 Sent 16 bytes
17777 2011-11-18 10:49:04 Read 1437 bytes
17777 2011-11-18 10:49:04 Closing connection
17777 2011-11-18 10:49:04 Found no first matching rule
17777 2011-11-18 10:49:04 cleanup_alert called for host ix-test, test procs
17777 2011-11-18 10:49:04 0 alerts to go
2011-11-18 10:49:04 Bad data in channel, skipping it
2011-11-18 10:49:04 Buffer sync lost, flushing data
17777 2011-11-18 10:49:04 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=266239
17777 2011-11-18 10:49:04 get_xymond_message: Returning NULL due to EOF
rss03:/usr/local/xymon/server>
I see the line (twice) "Found no first matching rule" but I'm not sure how to interpret this.
Oh and xymon is 4.3.5 on the server and it should be 4.3.2 on the monitored host.
Thank you.
Regards,
Barrie.
list Tim McCloskey
From: xymon-bounces at xymon.com [xymon-bounces at xymon.com] On Behalf Of Barrie Parker [user-7beef34d130e@xymon.invalid]
Sent: Friday, November 18, 2011 9:22 AM
To: xymon at xymon.com
Subject: [Xymon] Trouble shooting alerts
...snip...
The status page shows the red "critical" icon for procs. When I drill down into procs ...snip...
Should the alert be for 'procs', not ntpd?
Tim
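One way to check Tim's suggestion against data already captured in this thread: the @@page header shown in the first message carries ix-test and procs in its fourth and fifth pipe-separated fields, so the status being paged is procs, and a rule written for SERVICE=ntpd would not be expected to match it. A minimal shell sketch, assuming that field layout holds for other page messages as well (the function name page_host_test is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print "host:test" for each @@page header line on stdin.
# Assumption: fields are pipe-separated, with the host in field 4 and the
# test (status column) name in field 5, as in the message captured above.
page_host_test() {
    awk -F'|' '/^@@page/ { print $4 ":" $5 }'
}

# The header line captured earlier in this thread, rejoined onto one line.
msg='@@page#339/ix-test|1321631963.933541|142.3.156.124|ix-test|procs|142.3.156.124|1321633763|red|red|1321625351||647263|linux|linux||'

printf '%s\n' "$msg" | page_host_test   # prints ix-test:procs
```

With that in mind, a rule of the form HOST=ix-test SERVICE=procs would be the thing to try, and running bin/xymoncmd xymond_alert --test ix-test procs should show whether it matches.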
list Bruce White
I think Tim found the issue. Click on the info page for the server and see if the alert shows up in the alerts box. If the server has no ntpd test, then no alert will appear in the alerts box.
Bruce White
Senior Enterprise Systems Engineer | Phone: X-XXX-XXX-XXXX | Fax: XXX-XXX-XXXX | user-58f975e8bf9d@xymon.invalid | http://www.fellowes.com/
Disclaimer: The information contained in this message may be privileged and confidential and protected from disclosure. If the reader of this message is not the intended recipient or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately by replying to the message and deleting it from your computer. Thank you. Fellowes, Inc.
-----Original Message-----
From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Tim McCloskey
Sent: Friday, November 18, 2011 11:46 AM
To: Barrie Parker; xymon at xymon.com
Subject: Re: [Xymon] Trouble shooting alerts
Should the alert be for 'procs', not ntpd?
Tim
list Barrie Parker
Thanks guys.
Alerting using procs does send an alert as expected for all monitored processes/daemons/services/etc. So that was my misunderstanding. And, yes, drilling down on info shows an alert when I use procs and no alert when I use ntpd by itself. I hadn't understood to look at "info" previously; thank you for the pointer.
I had looked at the protocols doc late on Friday and I'm starting to understand xymon's behavior better.
Since the doc for protocols says that the testing is for TCP-based services, I gather that I would need to get at UDP-based services in some other way. I'm assuming that for something like ntpd, I'd need to build a shell script to run ntpq and parse that output, but I'm not sure.
Can someone point me at the appropriate document or at a code sample to look at?
Thank you all again.
"White, Bruce" <user-58f975e8bf9d@xymon.invalid> 11/21/2011 8:00 AM >>>
I think Tim found the issue. Click on the info page for the server and see if the alert shows up in the alerts box. If the server has no ntpd test, then no alert will appear in the alerts box.
list Tim McCloskey
Hi,
Depending on what you want to test, many TCP/UDP services are already available. Simply add the service name to the hosts file (bb-hosts in older versions):
10.10.10.10 ns1.time.foo.asfdasdfasdfasfd.com # ntp dns
Tim
From: Barrie Parker [user-7beef34d130e@xymon.invalid]
Sent: Monday, November 21, 2011 8:53 AM
To: Bruce White; Tim McCloskey; xymon at xymon.com
Subject: RE: [Xymon] Trouble shooting alerts
Thanks guys.
...snip...
list Bruce White
There is a built-in network test for ntp. Check the help for hosts.cfg (bb-hosts in older versions). It is listed in the OTHER NETWORK TESTS section.
.......Bruce
list Barrie Parker
I apologize -- I probably should have been clearer on this.
We use Novell's (read Attachmate's) SLES. In SLES 11 SP1 ntpdate doesn't work: running /usr/sbin/ntpdate gives a whole page of warnings (ntpdate is deprecated, see ntp.org for more info, use ntpd -q, use rcntp ntptimeset, and so on). Since ntpdate doesn't do anything, I think that the built-in ntp test of xymon won't work in this case.
So, for me to get this to work, it looks like I would have to mangle xymonnet.c (not great for future xymon updates) or try to figure out how to do UDP tests (which, for me, is probably easier said than done). Or I just let it go. I need to cipher on that.
But I do thank you again Bruce and Tim for your help and pointers.
Regards,
Barrie.
"White, Bruce" <user-58f975e8bf9d@xymon.invalid> 11/21/2011 11:06 AM >>>
There is a built-in network test for ntp. Check the help for hosts.cfg (bb-hosts in older versions). It is listed in the OTHER NETWORK TESTS section.
…….Bruce
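Picking up the ntpq idea from earlier in the thread: a client-side extension script is one way to watch ntpd without relying on the built-in ntp network test or touching xymonnet.c. The sketch below is not from the thread. It assumes the script is run by the Xymon client with XYMON, XYMSRV and MACHINE set in the environment, that ntpq is on the PATH, and that a peer line starting with "*" (ntpq's tally code for the selected system peer) is an acceptable definition of healthy. The column name ntpq and the function name ntpq_color are made up for illustration.

```shell
#!/bin/sh
# Hypothetical client extension: report an "ntpq" status column to Xymon.

# Read `ntpq -pn` output on stdin; print "green" if any line starts with "*"
# (a system peer has been selected), otherwise print "red".
ntpq_color() {
    if grep -q '^\*'; then
        echo green
    else
        echo red
    fi
}

# Only send a status when the Xymon client environment is present.
# Assumption: XYMON is the path to the xymon binary, XYMSRV the server
# address, and MACHINE the client's host name as the server knows it.
if [ -n "${XYMON:-}" ] && [ -n "${XYMSRV:-}" ] && [ -n "${MACHINE:-}" ]; then
    PEERS=$(ntpq -pn 2>&1)
    COLOR=$(printf '%s\n' "$PEERS" | ntpq_color)
    # First line is the status header; the raw ntpq output follows as detail.
    "$XYMON" "$XYMSRV" "status $MACHINE.ntpq $COLOR $(date)
$PEERS"
fi
```

If ntpd is stopped, ntpq prints an error instead of a peer table, no line starts with "*", and the column goes red. The script would still need to be registered with the client (clientlaunch.cfg is the usual place), and an alerts.cfg rule would then name SERVICE=ntpq, since alert rules appear to match on the status column name rather than on the process being watched.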