hobbitfetch spinning again
list Dan McDonald
The patched hobbitfetch ran longer, but it still ended up consuming a processor after about 3 days. I did a kill -6 on it, but couldn't find the corefile anywhere. -- Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX Austin Energy http://www.austinenergy.com
list Dan McDonald
On Fri, 2007-07-27 at 12:29 -0500, McDonald, Dan wrote:
The patched hobbitfetch ran longer, but it still ended up consuming a processor after about 3 days. I did a kill -6 on it, but couldn't find the corefile anywhere.
Died again... I guess it's not more stable after all; it's just the luck of the draw as to when it hangs. Again, when I did a kill -6, no corefile was generated anywhere. Should I strace the process the next time it "gang aft agley"?
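Both of Dan's reports note that kill -6 (SIGABRT) left no core file. One common cause on Linux, offered here as an assumption rather than a diagnosis from the thread, is a core-file size soft limit of 0 for the process. Another is a kernel core_pattern that writes cores somewhere other than the process's working directory or pipes them to a handler. The sketch below reads /proc/<pid>/limits and /proc/sys/kernel/core_pattern, so it is Linux-specific and needs a kernel that exposes those files. Passing hobbitfetch's PID as the first argument is just the illustrative interface chosen here.

```python
import sys
from pathlib import Path

LABEL = "Max core file size"


def core_soft_limit(limits_text):
    """Return the soft 'Max core file size' limit as a string, or None if absent."""
    for line in limits_text.splitlines():
        if line.startswith(LABEL):
            # Remaining fields are [soft, hard, units]; a soft limit of "0"
            # means the kernel will not write a core file for this process.
            fields = line[len(LABEL):].split()
            return fields[0] if fields else None
    return None


def core_pattern():
    """Return the kernel core_pattern (where cores are written), or None if unreadable."""
    try:
        return Path("/proc/sys/kernel/core_pattern").read_text().strip()
    except OSError:
        return None


def main(pid="self"):
    # "self" checks this script's own limits, which it inherits from the shell.
    limits = Path(f"/proc/{pid}/limits")
    try:
        text = limits.read_text()
    except OSError:
        print(f"cannot read {limits} (not Linux, or no such PID)")
        return
    soft = core_soft_limit(text)
    print("core file soft limit:", soft)
    if soft == "0":
        print("core dumps are disabled for this process")
    print("core_pattern:", core_pattern())


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "self")
```

If the limit turns out to be 0, it has to be raised in the environment that starts hobbitfetch (for example with `ulimit -c unlimited` in its startup shell), since a running process keeps the limits it was started with.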
list James Wade
Hello,

I need a recommendation on paging. When I last submitted this, Henrik pointed out that with DURATION<30, if the alert goes from yellow to red within 30 minutes, I won't get the alert. So I thought that splitting the rule up as below would solve the problem:

HOST=%du* SERVICE=conn,cpu,disk,nfs
    MAIL user-96ac0991469d@xymon.invalid REPEAT=15 COLOR=YELLOW DURATION<30 RECOVERED
HOST=%du* SERVICE=conn,cpu,disk,nfs
    MAIL user-a8d2a9525cb9@xymon.invalid REPEAT=15 COLOR=RED DURATION<30 RECOVERED

With these settings, at around midnight last night a database server went YELLOW and sent out two emails, 15 minutes apart, within the 30-minute window. An hour and a half later the server went RED, but no pages were sent. From the notifications log:

Sat Jul 28 00:08:55 2007 du102.disk (192.168.1.76) user-96ac0991469d@xymon.invalid [128] 1185599335 100
Sat Jul 28 00:24:24 2007 du102.disk (191.168.1.76) user-96ac0991469d@xymon.invalid 1185600264 100
Sat Jul 28 00:13:19 2007 du103.disk (192.168.1.78) user-96ac0991469d@xymon.invalid [128] 1185599599 100
Sat Jul 28 00:28:24 2007 du103.disk (192.168.1.78) user-96ac0991469d@xymon.invalid [128] 1185600504 100

What I'm trying to do is reduce the overall number of pages sent when an alert occurs, regardless of whether it's red or yellow. The people receiving the notifications want at most two notifications per red or yellow alert. Any suggestions? How is everyone else avoiding being bombarded with pages? I'm open to any strategy that ensures I get paged on everything but doesn't send multiple pages for the same problem. I would also very much appreciate it if you could include the alert-file settings with your suggestion so I can set mine up that way.

Thanks all for the help.
James
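The two yellow emails about 15 minutes apart are what REPEAT=15 combined with DURATION<30 would predict under a simple reading of those settings. The sketch below models that reading; it is an illustration, not Hobbit's actual alert code. It also assumes, without confirmation from the thread, that a status's duration is counted from when it first went non-green and is not reset by a yellow-to-red change. If that assumption holds, a RED rule limited by DURATION<30 would never fire for a red that arrives 90 minutes into the problem, which would match what James saw.

```python
from __future__ import annotations


def page_times(repeat_min: int, duration_limit_min: int, status_age_min: int = 0) -> list[int]:
    """Status ages (in minutes) at which one rule would send a page.

    Simplified model (assumption): the first page goes out when the rule first
    matches, then one every repeat_min minutes, but only while the status age
    is still below the DURATION< limit. status_age_min is how old the problem
    already is when the rule first matches.
    """
    if repeat_min <= 0:
        raise ValueError("repeat_min must be positive")
    times = []
    age = status_age_min
    while age < duration_limit_min:
        times.append(age)
        age += repeat_min
    return times


if __name__ == "__main__":
    # YELLOW rule (REPEAT=15, DURATION<30), matching as soon as the status goes yellow.
    print("yellow:", page_times(15, 30))  # yellow: [0, 15]
    # RED rule, if red arrives 90 minutes in and the color change does NOT
    # reset the age (assumption, not confirmed in the thread).
    print("red:", page_times(15, 30, status_age_min=90))  # red: []
```

Under this model a rule that starts matching immediately sends roughly the DURATION limit divided by REPEAT, rounded up, pages, which is where the two-email cap for yellow comes from.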
list Trent Melcher
On Sat, 2007-07-28 at 09:53 -0500, James Wade wrote:
Hello,

I need a recommendation on paging. When I last submitted this, Henrik pointed out that with DURATION<30, if the alert goes from yellow to red within 30 minutes, I won't get the alert. So I thought that splitting the rule up as below would solve the problem:

HOST=%du* SERVICE=conn,cpu,disk,nfs
    MAIL user-96ac0991469d@xymon.invalid REPEAT=15 COLOR=YELLOW DURATION<30 RECOVERED
HOST=%du* SERVICE=conn,cpu,disk,nfs
    MAIL user-a8d2a9525cb9@xymon.invalid REPEAT=15 COLOR=RED DURATION<30 RECOVERED
Try testing your rules to see which ones match. Usage:

hobbitd_alert --test HOST SERVICE [duration [color [time]]]

Depending on the results, you may want to put the red rule first; the hobbit-alerts file is read from the top down.

Trent
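Trent's suggestion can be applied systematically by trying the same host and service at several durations and colors, since the missing red page may depend on both. The sketch below is hypothetical: only the argument order comes from the hobbitd_alert usage line; the host du102, the service disk, and the duration values are examples, and the unit and format hobbitd_alert expects for duration should be checked in its man page. It runs each command only if hobbitd_alert is on the PATH (it may also need the Hobbit environment variables set); otherwise it just prints the commands.

```python
import shutil
import subprocess
from itertools import product


def build_test_commands(host, service, durations, colors):
    """Return one argv list per (duration, color) combination, in usage-line order."""
    return [
        ["hobbitd_alert", "--test", host, service, str(duration), color]
        for duration, color in product(durations, colors)
    ]


def run_all(commands):
    """Run each command if hobbitd_alert is on PATH; otherwise only print it."""
    have_binary = shutil.which("hobbitd_alert") is not None
    for argv in commands:
        print("$", " ".join(argv))
        if have_binary:
            result = subprocess.run(argv, capture_output=True, text=True)
            print(result.stdout, end="")
            print(result.stderr, end="")


if __name__ == "__main__":
    # Example values only: a short, a medium and a long-running problem, in both colors.
    cmds = build_test_commands("du102", "disk", durations=[5, 45, 90], colors=["yellow", "red"])
    run_all(cmds)
```

Comparing which rules match at each combination shows whether the RED rule is ever reached once the problem has been open longer than the DURATION limit.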
list Hobbit User
Just want to confirm that I'm getting the following right:

-- When the hobbitd_client process encounters errors while parsing its configuration file, hobbit-clients.cfg, it logs them to clientdata.log rather than to hobbitclient.log (which never seems to get any entries).
-- Any "unknown token" error in hobbit-clients.cfg causes hobbitd_client to disregard all lines below the error.