Xymon Mailing List Archive

Avoid alarm when hobbitd server lost connection

6 messages in this thread

list Rodolfo Pilas · Fri, 20 Jul 2007 10:49:27 -0300 ·
I have a hobbitd server that monitors several other servers (hosts):

hobbitd ------- server0
        |------ server1
        |------ server2
        |------ ...
        '------ serverN

I had an issue with the Hobbit server's switch, and the server lost its
connection to every 'serverN' but not to 'localhost'.  As soon as the
switch restored the connection, I received e-mail alarms for all monitors.

What is the proper way to avoid alarms when hobbitd loses all remote
connections (but not the one to localhost)?
(two hobbitd servers?)
("router" or "depends" tags?)

Thank you
Rodolfo Pilas
list Stewart Larsen · Fri, 20 Jul 2007 09:59:28 -0400 (EDT) ·
I was going to ask about this today as well.

I brought this up about a week ago.  Hobbit is supposed to have a 10
minute delay on setting stuff purple on server startup, but that wasn't
working.

Any update on this?  We have over 3000 hosts monitored, so this is a HUGE
amount of emails that get generated for us. Like 20-30 thousand. :)

Stew

-- 

Stewart Larsen
list Greg L Hubbard · Fri, 20 Jul 2007 09:25:17 -0500 ·
I would use router tags, make sure you are pinging the switch, and then
put the switch as the first hop in your router path.

As for the flood of emails when connections are restored -- this is a
Hobbit "feature" at present if you have "recovery" messages enabled --
Hobbit will send recovery messages regardless of whether or not it sent
a "problem" message.  In a large installation, this could add up to a
lot of emails.
list Rodolfo Pilas · Mon, 23 Jul 2007 13:17:26 -0300 ·
Greg, I am trying to understand the router tag, but it is not clear to me
from the documentation:
Man page:
http://www.hswn.dk/hobbit/help/manpages/man5/bb-hosts.5.html
or Wikibook:
http://en.wikibooks.org/wiki/System_Monitoring_with_Hobbit/Administration_Guide

I am now monitoring the 'gateway' (conn), but how do I tell Hobbit that
'gateway' is the first router for the checks on the other servers?

Could you please paste an example configuration here?

Thank you
Rodolfo Pilas
list Greg L Hubbard · Mon, 23 Jul 2007 13:05:46 -0500 ·
Example of a long route chain:

HobbitServer --> Firewall.Rail.1 --> NATServerPhysical --> NATServerVirtual --> Router

route:Firewall.Rail.1,NATServerPhysical,NATServerVirtual,Router

Remember, everything in the chain must correspond to a name somewhere in the bb-hosts file on your Hobbit server.  No spaces allowed in the chain.

This tag doesn't suppress everything, but it will make ping failures yellow instead of red for things beyond the first failure.  In my case, if the NAT server goes down (or becomes inaccessible) then the conn test for NATServerPhysical will go red, and the conn tests for NATServerVirtual and Router will go yellow and include a message blaming everything on NATServerPhysical.

NOTE: The above names are bogus -- I cannot post the real ones...
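
The naming rule above (every hop in the chain must match a host defined in bb-hosts) can be checked mechanically. What follows is only a minimal Python sketch, not part of Hobbit. It assumes host lines look like "IP hostname # tags" and that the tag is spelled "route:". The helper names parse_hosts and check_routes, the 192.0.2.x addresses, and the hosts SomeHost and Missing are hypothetical, and the parser simply ignores page/group directives and comments.

```python
import re

# Matches simple host lines of the form: IP hostname [# tags]
HOST_LINE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)\s*(?:#\s*(.*))?$")


def parse_hosts(text):
    """Return {hostname: [route hops]} for every host line in the text."""
    hosts = {}
    for line in text.splitlines():
        m = HOST_LINE.match(line)
        if not m:
            continue  # blank lines, comments, page/group directives
        name = m.group(2)
        tags = (m.group(3) or "").split()
        route = []
        for tag in tags:
            if tag.startswith("route:"):
                route = [hop for hop in tag[len("route:"):].split(",") if hop]
        hosts[name] = route
    return hosts


def check_routes(text):
    """List every route hop that has no matching host definition."""
    hosts = parse_hosts(text)
    problems = []
    for name, route in hosts.items():
        for hop in route:
            if hop not in hosts:
                problems.append(f"{name}: route hop '{hop}' is not defined in bb-hosts")
    return problems


# Hypothetical bb-hosts content modelled on the bogus names above.
SAMPLE = """\
192.0.2.1   Firewall.Rail.1     #
192.0.2.2   NATServerPhysical   # route:Firewall.Rail.1
192.0.2.3   NATServerVirtual    # route:Firewall.Rail.1,NATServerPhysical
192.0.2.4   Router              # route:Firewall.Rail.1,NATServerPhysical,NATServerVirtual
192.0.2.5   SomeHost            # route:Firewall.Rail.1,NATServerPhysical,NATServerVirtual,Router,Missing
"""

if __name__ == "__main__":
    for problem in check_routes(SAMPLE):
        print(problem)
```

In this sample each host lists only the hops between the Hobbit server and itself, which is an assumption about how the chain would be spread over the individual host entries.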

GLH
list Johann Eggers · Tue, 24 Jul 2007 10:09:01 +0200 ·
Scenario:

You have a remote network in which you want to monitor host(s). You
reach this network via router1 on the remote site. So your definition
will be like this:

x.x.x.x somehost # route:router1

This tag changes the color reported for a ping check that fails, when
one or more of the hosts in the "route" list is also down. A "red"
status becomes "yellow" - other colors are unchanged. The status message
will include information about the hosts in the router-list that are
down, to aid tracking down which router is the root cause of the
problem.
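
The colour rule described above can be illustrated with a few lines of Python. This is only a sketch of the behaviour as described in this message, not Hobbit's actual implementation. The function name adjust_conn_color and the wording of the returned note are hypothetical.

```python
def adjust_conn_color(color, route, down_hosts):
    """Apply the route rule to a ping (conn) result.

    color      -- colour the ping check would report on its own
    route      -- hops listed in the host's route tag, in order
    down_hosts -- set of hostnames currently known to be down
    Returns (final colour, note for the status message or None).
    """
    blamed = [hop for hop in route if hop in down_hosts]
    if color == "red" and blamed:
        # Only red is downgraded; the note names the routers that are down.
        return "yellow", "Routers down: " + ", ".join(blamed)
    return color, None


if __name__ == "__main__":
    # somehost is behind router1, as in the definition above.
    print(adjust_conn_color("red", ["router1"], {"router1"}))
    print(adjust_conn_color("red", ["router1"], set()))
    print(adjust_conn_color("green", ["router1"], {"router1"}))
```

With router1 down, a failed ping for somehost is reported yellow and blames router1. With router1 up, the same failure stays red. Any colour other than red passes through unchanged.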

Regards
Johann