more problems with acks/Cookies in 4.3
list Sean Clark
I had an issue with xymon not acknowledging events in the xymondboard in version 4.2.3. This has continued in 4.3.0.

I run

~xymon/server/bin/xymon --debug --response 10.10.8.180 "xymondack 212940 500 this is a test ack"
22832 2011-04-01 11:01:36 Transport setup is:
22832 2011-04-01 11:01:36 xymondportnumber = 1984
22832 2011-04-01 11:01:36 xymonproxyhost = NONE
22832 2011-04-01 11:01:36 xymonproxyport = 0
22832 2011-04-01 11:01:36 Recipient listed as '10.10.8.180'
22832 2011-04-01 11:01:36 Standard protocol on port 1984
22832 2011-04-01 11:01:36 Will connect to address 10.10.8.180 port 1984
22832 2011-04-01 11:01:36 Connect status is 0
22832 2011-04-01 11:01:36 Sent 39 bytes
22832 2011-04-01 11:01:36 Closing connection

In xymond.log I get

2011-04-01 11:01:36 Cookie 212940 not found, dropping ack

Xymondlog shows

~xymon/server/bin/xymon 10.10.8.180 "xymondlog db-03.subdomain.domain.com.ipmi"
db-03.subdomain.domain.com|ipmi|red||1301336915|1301670417|1301672217|0|0|10.10.8.134|212940|||N|
red Fri Apr 1 11:05:01 EDT 2011 - IPMI FAILURE
<p>&red One or more components below has a failure!<p><br>&yellow Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory\r<br>&red Get Device ID command failed\r<br>&yellow Unable to open SDR for reading\r
unified-ipmi.pl version - 1.0

Which shows the cookie right in there.

I am stumped; what could be causing this?

This is repeatable in that it happens for several hosts in xymon, but not the same host/test pair consistently, and I can acknowledge other things while it is not finding this cookie. Additionally, putting it in maintenance for 1 minute will allow it to be acknowledged after it comes out of maintenance, because it will get a new cookie.

This E-mail and any of its attachments may contain Time Warner Cable proprietary information, which is privileged, confidential, or subject to copyright belonging to Time Warner Cable.
This E-mail is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient of this E-mail, you are hereby notified that any dissemination, distribution, copying, or action taken in relation to the contents of and attachments to this E-mail is strictly prohibited and may be unlawful. If you have received this E-mail in error, please notify the sender immediately and permanently delete the original and any copy of this E-mail and any printout.
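As a quick cross-check of the cookie in the xymondlog reply above, the header line can be split on `|`. This is only a sketch: the field position (11th) is inferred from the single sample line in this message, not from the Xymon documentation, and `extract_cookie` is a hypothetical helper name. The script prints the ack command in the form used above; it does not send anything.

```shell
#!/bin/sh
# Hypothetical helper: print the 11th pipe-separated field of a xymondlog
# header line. The field position is an assumption based on the sample output.
extract_cookie() {
    printf '%s\n' "$1" | cut -d'|' -f11
}

# Header line copied from the xymondlog reply in the message above.
header='db-03.subdomain.domain.com|ipmi|red||1301336915|1301670417|1301672217|0|0|10.10.8.134|212940|||N|'

cookie=$(extract_cookie "$header")

# Print (do not send) the ack command in the same form as the one in the message.
echo "xymondack $cookie 500 this is a test ack"
```

If the cookie printed here matches the one passed to xymondack and the ack is still dropped, the mismatch is presumably on the server side rather than in the command being sent.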
list Sean Clark
Could it be a buffer size that I need to increase at compile time? I.e., it's not finding the cookie in the rb tree, even after it looks it up?
Here is roughly the number of host/tests I have:
~xymon/server/bin/xymon localhost "hobbitdboard fields=color" | sort -n | uniq -c | sort -n
9 none
91 purple
163 red
192 blue
1870 clear
2476 yellow
68797 green
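For scale, the per-color counts above can be summed to get the total number of statuses. A minimal sketch, using only the numbers quoted in this message; `total_statuses` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sum the first column of "COUNT COLOR" lines on stdin.
total_statuses() {
    awk '{ sum += $1 } END { print sum }'
}

# Counts copied from the "uniq -c" output in the message above.
total=$(total_statuses <<'EOF'
9 none
91 purple
163 red
192 blue
1870 clear
2476 yellow
68797 green
EOF
)

echo "total statuses: $total"
```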
--
Sean Clark
Sr. Engineer, Software
ATG Network Operations & Planning Integrated Regional OSS
<http://www.twcable.com/DepartmentOverview/AdvancedTechnologyGroup/ATG/NOP/OSS/Network.aspx>
user-2db5fbcae9a7@xymon.invalid <mailto:user-2db5fbcae9a7@xymon.invalid> devaudio
<aim://devaudio> <mailto:user-2db5fbcae9a7@xymon.invalid>
Office: (XXX) XXX-XXXX cell: (XXX) XXX-XXXX
list Darin D [eit] Dugan
For what it's worth, I also have the exact same ack problem on occasion but haven't tracked it down either. I've also taken the approach of disabling the alert for a short time and then acking the new alert, or just dealing with the repeated alerts until it's fixed. Motivates you to fix things more quickly (when possible).

This is with a pretty old snapshot from the 4.3.0 branch. I'll be updating to the final 4.3.0 release Real Soon Now. Was hoping that would magically fix the issue but I guess not. FYI, my number of tests is an order of magnitude smaller than yours.

After poring through lots of Cisco documentation today I think looking at some Xymon source would be a welcome break... Off to research.

Cheers.
list Sean Clark
Just adding another thing: it gets worse as time progresses. (Up for about 3 days, no issues; on the 4th day it starts to crop up; on the 5th day more and more alert/status pairs get the "cookie not found" message.)

When I restart, it loads back in all the things I had in maintenance and acked (woo hoo!) and will now let me acknowledge events, albeit because the unacknowledged red events now have new cookies (924663 was the new cookie for the example test/host pair I was failing to acknowledge in my first message).

-Sean
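One way to put numbers on "gets worse as time progresses" would be to count the dropped-ack lines in xymond.log per day. A rough sketch, assuming every such line has the same "DATE TIME Cookie N not found, dropping ack" shape as the single example quoted earlier in this thread; `dropped_acks_per_day` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read xymond.log-style lines on stdin and print
# "DATE COUNT" for each day that has dropped-ack messages.
# Assumes the date is the first whitespace-separated field on each line.
dropped_acks_per_day() {
    grep 'Cookie [0-9][0-9]* not found, dropping ack' |
        awk '{ n[$1]++ } END { for (d in n) print d, n[d] }' |
        sort
}

# The one log line quoted in this thread, used here as sample input.
printf '%s\n' '2011-04-01 11:01:36 Cookie 212940 not found, dropping ack' |
    dropped_acks_per_day
```

Against a real server the function would read the log file directly (for example `dropped_acks_per_day < xymond.log`, with the path depending on the installation). A count that climbs day by day after a restart would match the pattern described above.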