Xymon Mailing List Archive

hobbitfetch spinning again

5 messages in this thread

list Dan McDonald · Fri, 27 Jul 2007 12:29:58 -0500 ·
The patched hobbitfetch ran longer, but it still ended up consuming a
processor after about 3 days.  I did a kill -6 on it, but couldn't find
the corefile anywhere.

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
Austin Energy
http://www.austinenergy.com
list Dan McDonald · Fri, 27 Jul 2007 16:53:04 -0500 ·
quoted from Dan McDonald
On Fri, 2007-07-27 at 12:29 -0500, McDonald, Dan wrote:
The patched hobbitfetch ran longer, but it still ended up consuming a
processor after about 3 days.  I did a kill -6 on it, but couldn't find
the corefile anywhere.
Died again...  Guess it's not more stable, just luck of the draw as to
when it hangs.  Again, when I did a kill -6, no corefile was generated
anywhere.

Should I strace the process the next time it "gang aft agley"?
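
One common reason a kill -6 (SIGABRT) leaves no corefile is a core size
limit of 0 in the environment that started the process. The following is a
minimal sketch for checking that limit, assuming it is run from the same
shell or init environment that launches hobbitfetch; the helper name
core_dump_limit is hypothetical, and nothing in this thread confirms that
the limit is the cause here.

```python
import resource


def core_dump_limit():
    """Return (enabled, soft, hard) for the core file size limit.

    A soft limit of 0 means the kernel will not write a core file.
    resource.RLIM_INFINITY means the size is unlimited.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0, soft, hard


if __name__ == "__main__":
    enabled, soft, hard = core_dump_limit()
    print("core dumps enabled:", enabled)
    print("soft limit:", soft, "hard limit:", hard)
```

The limit is inherited from the parent process, so the value seen in an
interactive login may differ from the one the running daemon has.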


-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281, CNX
Austin Energy
http://www.austinenergy.com
list James Wade · Sat, 28 Jul 2007 09:53:25 -0500 ·
Hello,

I need a recommendation on paging. When I last submitted this,
Henrik pointed out that when I use DURATION<30, I won't get
the alert if the status goes from yellow to red within 30 minutes.

So, I thought if I divided up, like below, the problem would be solved.

HOST=%du* SERVICE=conn,cpu,disk,nfs
   MAIL user-96ac0991469d@xymon.invalid REPEAT=15 COLOR=YELLOW DURATION<30 RECOVERED

HOST=%du* SERVICE=conn,cpu,disk,nfs
   MAIL user-a8d2a9525cb9@xymon.invalid REPEAT=15 COLOR=RED DURATION<30 RECOVERED


So, based on the settings above, at around midnight last night a database
server reached YELLOW and sent out two emails 15 minutes apart (30 minutes).

One hour and 30 minutes later, the database server reached RED, however,
no pages were sent out.
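
The behaviour described above can be reproduced with a crude model. This is
only a sketch: it assumes the DURATION clock starts when the status first
leaves green and is not reset when yellow changes to red, which is
consistent with Henrik's remark as relayed in this message but is not
confirmed here as the actual hobbitd_alert logic. The names rule_fires,
simulate and status_at are hypothetical.

```python
def rule_fires(rule_color, max_duration, status_color, minutes_in_alert):
    """Model one rule with COLOR=rule_color and DURATION<max_duration."""
    return status_color == rule_color and minutes_in_alert < max_duration


def simulate(rules, status_at, repeat=15, horizon=120):
    """Return (minute, color) for every check at which a page would go out.

    rules is a list of (color, max_duration) pairs; the status is checked
    every `repeat` minutes, counted from when the alert first started.
    """
    pages = []
    for minute in range(0, horizon + 1, repeat):
        color = status_at(minute)
        if any(rule_fires(c, d, color, minute) for c, d in rules):
            pages.append((minute, color))
    return pages


def status_at(minute):
    # Scenario from the message: yellow first, red 90 minutes later.
    return "yellow" if minute < 90 else "red"


rules = [("yellow", 30), ("red", 30)]
print(simulate(rules, status_at))
```

Under that assumption the two yellow pages go out at minutes 0 and 15, and
the red rule never fires, because the alert is already 90 minutes old when
the color changes.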

From notifications log:

Sat Jul 28 00:08:55 2007 du102.disk (192.168.1.76) user-96ac0991469d@xymon.invalid [128]
1185599335 100 

Sat Jul 28 00:24:24 2007 du102.disk (191.168.1.76) user-96ac0991469d@xymon.invalid
1185600264 100

Sat Jul 28 00:13:19 2007 du103.disk (192.168.1.78) user-96ac0991469d@xymon.invalid [128]
1185599599 100

Sat Jul 28 00:28:24 2007 du103.disk (192.168.1.78) user-96ac0991469d@xymon.invalid [128]
1185600504 100
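
The number at the start of the second line of each log entry looks like a
Unix epoch timestamp. The sketch below converts the four values quoted above
and computes the gap between the two pages for each host. The UTC-5 offset
is an assumption taken from the -0500 in the mail headers, and the helper
name describe is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server clock uses the same -0500 offset as the mail headers.
LOCAL = timezone(timedelta(hours=-5))

# Epoch values copied from the notifications log excerpt.
entries = {
    "du102.disk": [1185599335, 1185600264],
    "du103.disk": [1185599599, 1185600504],
}


def describe(epochs):
    """Return formatted local times and the gaps between them in minutes."""
    stamps = [datetime.fromtimestamp(e, tz=LOCAL) for e in epochs]
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]
    return [s.strftime("%a %b %d %H:%M:%S %Y") for s in stamps], gaps


for name, epochs in entries.items():
    times, gaps = describe(epochs)
    print(name, times, [round(g, 1) for g in gaps])
```

The converted times agree with the printed timestamps, and each pair of
pages is roughly 15 minutes apart, in line with REPEAT=15.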


What I'm trying to do is reduce the number of overall pages that get
sent when an alert occurs, regardless of whether it's red or yellow.

The persons getting the notifications only want a maximum of two
notifications per red or yellow alert.


Any suggestions? How is everyone else handling not getting bombarded
with pages? I'm open to any strategy that ensures I'm getting pages
on everything, but also doesn't send multiple pages on the same problem.

I would also very much appreciate if you could send the setting in
the alert file with the suggestion so I can set mine up that way.

Thanks All for the help.

James
list Trent Melcher · Mon, 30 Jul 2007 08:29:39 -0500 ·
quoted from James Wade
On Sat, 2007-07-28 at 09:53 -0500, James Wade wrote:
Hello,

I need a recommendation on paging. When I last submitted this,
Henrik pointed out that when I use DURATION<30, I won't get
the alert if the status goes from yellow to red within 30 minutes.

So, I thought if I divided up, like below, the problem would be solved.

HOST=%du* SERVICE=conn,cpu,disk,nfs
   MAIL user-96ac0991469d@xymon.invalid REPEAT=15 COLOR=YELLOW DURATION<30 RECOVERED

HOST=%du* SERVICE=conn,cpu,disk,nfs
   MAIL user-a8d2a9525cb9@xymon.invalid REPEAT=15 COLOR=RED DURATION<30 RECOVERED
Try testing your rules and see which ones it matches.

Usage: hobbitd_alert --test HOST SERVICE [duration [color [time]]]

Depending on your results, you may want the red alert rule first, because
the hobbit-alerts file is read from the top down.
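
As a sketch of how the usage line above nests its optional arguments, the
helper below builds a command line for one test case. The function name
build_test_command is hypothetical, the host, service, duration and color
values are only illustrative, and the unit of the duration argument is not
stated in this thread. Nothing is executed; the command is only printed.

```python
import shlex


def build_test_command(host, service, duration=None, color=None, time=None):
    """Build argv for: hobbitd_alert --test HOST SERVICE [duration [color [time]]]

    The optional arguments are positional and nested, so a later one
    can only be given when every earlier one is present.
    """
    argv = ["hobbitd_alert", "--test", host, service]
    optional = [duration, color, time]
    seen_missing = False
    for value in optional:
        if value is None:
            seen_missing = True
        elif seen_missing:
            raise ValueError("optional arguments must be given in order")
        else:
            argv.append(str(value))
    return argv


# Illustrative case: the disk status on du102, 120 into the alert, color red.
print(shlex.join(build_test_command("du102", "disk", 120, "red")))
```

Running the same test once per color and duration of interest shows which
rule, if any, matches each case.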

Trent
quoted from James Wade
So, based on the settings above, at around midnight last night a database
server reached YELLOW and sent out two emails 15 minutes apart (30 minutes).

One hour and 30 minutes later, the database server reached RED, however,
no pages were sent out.

From notifications log:

Sat Jul 28 00:08:55 2007 du102.disk (192.168.1.76) user-96ac0991469d@xymon.invalid [128]
1185599335 100 
Sat Jul 28 00:24:24 2007 du102.disk (191.168.1.76) user-96ac0991469d@xymon.invalid
1185600264 100

Sat Jul 28 00:13:19 2007 du103.disk (192.168.1.78) user-96ac0991469d@xymon.invalid [128]
1185599599 100

Sat Jul 28 00:28:24 2007 du103.disk (192.168.1.78) user-96ac0991469d@xymon.invalid [128]
1185600504 100


What I'm trying to do is reduce the number of overall pages that get
sent when an alert occurs, regardless of whether it's red or yellow.

The persons getting the notifications only want a maximum of two
notifications per red or yellow alert.


Any suggestions? How is everyone else handling not getting bombarded
with pages? I'm open to any strategy that ensures I'm getting pages
on everything, but also doesn't send multiple pages on the same problem.

I would also very much appreciate if you could send the setting in
the alert file with the suggestion so I can set mine up that way.

Thanks All for the help.

James

list Hobbit User · Mon, 30 Jul 2007 10:52:38 -0400 (EDT) ·
Just want to confirm that I'm getting the following right:

--When the process hobbitd_client encounters errors in parsing its
configuration file hobbit-clients.cfg, it logs them to clientdata.log
rather than to hobbitclient.log (which never seems to get any entries).

--Any "unknown token" error in hobbit-clients.cfg will result in
hobbitd_client disregarding all lines below the error.