Xymon Mailing List Archive

more problems with acks/Cookies in 4.3

4 messages in this thread

Sean Clark · Fri, 1 Apr 2011 11:13:21 -0400
I had an issue with xymon not acknowledging events in the xymondboard in version 4.2.3

This has continued in 4.3.0

I run

~xymon/server/bin/xymon --debug --response 10.10.8.180 "xymondack 212940 500 this is a test ack"
22832 2011-04-01 11:01:36 Transport setup is:
22832 2011-04-01 11:01:36 xymondportnumber = 1984
22832 2011-04-01 11:01:36 xymonproxyhost = NONE
22832 2011-04-01 11:01:36 xymonproxyport = 0
22832 2011-04-01 11:01:36 Recipient listed as '10.10.8.180'
22832 2011-04-01 11:01:36 Standard protocol on port 1984
22832 2011-04-01 11:01:36 Will connect to address 10.10.8.180 port 1984
22832 2011-04-01 11:01:36 Connect status is 0
22832 2011-04-01 11:01:36 Sent 39 bytes
22832 2011-04-01 11:01:36 Closing connection
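
As a cross-check on the debug output above: the message string passed on the command line is 39 characters long, which matches the "Sent 39 bytes" line, so the ack text appears to have been sent in full. A minimal shell sketch of that check follows; `build_ack` is a hypothetical helper name used only for illustration and is not part of Xymon.

```shell
#!/bin/sh
# build_ack is a hypothetical helper, not part of Xymon.
# It assembles the ack message in the form used in this thread:
#   xymondack COOKIE DURATION TEXT
build_ack() {
    ack_cookie=$1
    ack_duration=$2
    shift 2
    # "$*" joins the remaining words with single spaces (default IFS).
    printf 'xymondack %s %s %s' "$ack_cookie" "$ack_duration" "$*"
}

msg=$(build_ack 212940 500 this is a test ack)
echo "$msg"
# Length in characters; equal to bytes here because the message is plain ASCII.
echo "${#msg}"
```

The second line printed is 39, so the client side of the exchange looks consistent with what was typed.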


In xymond.log I get

2011-04-01 11:01:36 Cookie 212940 not found, dropping ack


Xymondlog shows

~xymon/server/bin/xymon 10.10.8.180 "xymondlog db-03.subdomain.domain.com.ipmi"
db-03.subdomain.domain.com|ipmi|red||1301336915|1301670417|1301672217|0|0|10.10.8.134|212940|||N|
red Fri Apr  1 11:05:01 EDT 2011 - IPMI FAILURE
<p>&red One or more components below has a failure!<p><br>&yellow Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory\r<br>&red Get Device ID command failed\r<br>&yellow Unable to open SDR for reading\r

unified-ipmi.pl version - 1.0


Which shows the Cookie right in there
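
To avoid copying the cookie by hand, it can be pulled out of the first line of the xymondlog output. The sketch below assumes the cookie is the 11th pipe-separated field, as it is in the sample line above; `cookie_from_xymondlog` is a hypothetical helper name, not a Xymon command.

```shell
#!/bin/sh
# cookie_from_xymondlog is a hypothetical helper, not part of Xymon.
# It reads xymondlog output on stdin and prints field 11 of the first line,
# which is where the cookie sits in the sample from this thread.
cookie_from_xymondlog() {
    awk -F'|' 'NR == 1 { print $11; exit }'
}

# First line of the xymondlog output quoted above.
sample='db-03.subdomain.domain.com|ipmi|red||1301336915|1301670417|1301672217|0|0|10.10.8.134|212940|||N|'
printf '%s\n' "$sample" | cookie_from_xymondlog
```

Against a live server, the same function could be fed from the xymondlog command shown above by piping its output into `cookie_from_xymondlog`.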


I am stumped, what could be causing this?

This is repeatable in that it happens for several hosts in Xymon, but not for the same host/test pair consistently, and I can acknowledge other things while it is not finding this cookie. Additionally, putting it in maintenance for 1 minute allows it to be acknowledged after it comes out of maintenance, because it gets a new cookie.


This E-mail and any of its attachments may contain Time Warner Cable proprietary information, which is privileged, confidential, or subject to copyright belonging to Time Warner Cable. This E-mail is intended solely for the use of the individual or entity to which it is addressed. If you are not the intended recipient of this E-mail, you are hereby notified that any dissemination, distribution, copying, or action taken in relation to the contents of and attachments to this E-mail is strictly prohibited and may be unlawful. If you have received this E-mail in error, please notify the sender immediately and permanently delete the original and any copy of this E-mail and any printout.
Sean Clark · Fri, 1 Apr 2011 15:02:40 -0400
Could it be a buffer size that I need to increase at compile time? I.e.,
it's not finding the cookie in the rb tree, even after it looks it up?


Here is roughly the number of host/tests I have

~xymon/server/bin/xymon localhost "hobbitdboard fields=color" | sort -n |
uniq -c | sort -n

      9 none
     91 purple
    163 red
    192 blue
   1870 clear
   2476 yellow
  68797 green


--


Sean Clark
Sr. Engineer, Software
ATG Network Operations & Planning Integrated Regional OSS
<http://www.twcable.com/DepartmentOverview/AdvancedTechnologyGroup/ATG/NOP/OSS/Network.aspx>
user-2db5fbcae9a7@xymon.invalid  devaudio <aim://devaudio>
Office: (XXX) XXX-XXXX cell: (XXX) XXX-XXXX
Darin D [eit] Dugan · Fri, 1 Apr 2011 14:55:45 -0500
For what it's worth, I also have the exact same ack problem on occasion but haven't tracked it down either. I've also taken the approach of disabling the alert for a short time and then acking the new alert, or just dealing with the repeated alerts until it's fixed. Motivates you to fix things more quickly (when possible). This is with a pretty old snapshot from the 4.3.0 branch. I'll be updating to the final 4.3.0 release Real Soon Now. Was hoping that would magically fix the issue but I guess not. FYI, my number of tests is an order of magnitude smaller than yours.

After poring through lots of Cisco documentation today I think looking at some Xymon source would be a welcome break... Off to research.
Cheers.
Sean Clark · Fri, 1 Apr 2011 17:20:10 -0400
Just adding another thing: it gets worse as time progresses (up for
about 3 days with no issues; on the 4th day it starts to crop up; on the
5th day more and more alert/status pairs get the "cookie not found" message).
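
One way to put numbers on that progression would be to count the "dropping ack" lines in xymond.log per day. The sketch below assumes log lines start with a `YYYY-MM-DD` date, as in the line quoted in the first message; `count_dropped_acks` is a hypothetical helper name, not part of Xymon.

```shell
#!/bin/sh
# count_dropped_acks is a hypothetical helper, not part of Xymon.
# It reads xymond.log lines on stdin and prints "DATE COUNT" for each day
# that has at least one "Cookie N not found, dropping ack" line.
count_dropped_acks() {
    awk '/Cookie [0-9]+ not found, dropping ack/ { n[$1]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# The one log line quoted in the first message of this thread.
sample='2011-04-01 11:01:36 Cookie 212940 not found, dropping ack'
printf '%s\n' "$sample" | count_dropped_acks
```

On a real server the function would read the actual xymond.log on stdin; the location of that file depends on the installation.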

When I restart, it loads back in all the things I had in maintenance and
acked (woo hoo!) and will now let me acknowledge events, albeit because
the unacknowledged red events now have new cookies (924663 was the new
cookie for the example test/host pair I was failing to acknowledge in
my original message).


-Sean