Need help determining why alerts didn't come

8 messages in this thread

list Brian Bouchard · Fri, 7 Nov 2008 09:52:22 -0500 ·

Hello Hobbit Gurus,

 
I am seeking help determining why we recently received only some alerts
that were configured on a given server.

 
In my hobbit-clients.cfg file I have multiple sections of relevance:

 
#######################################################
# generic checks for all WebLogic Servers
#######################################################
HOST= applesauce,gravy,enchilada,chips
        DISK    *       95 97
        PROC dsmcad 1 -1 yellow
        FILE "%/wls_domains/.*/jrockit..*.dump" NOEXIST red
#######################################################
# specific checks for applesauce
#######################################################
HOST=applesauce
       LOG  /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
       PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
       PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
       PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
       PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01

 
So, a couple of questions:

1) Is it valid to have different alerts for the same HOST in the
hobbit-clients.cfg like this?  It seemed to work in some instances, but
I should ask before moving forward...

2) Yesterday, I received the alerts with TEXT=
"TOTAL_WEBLOGIC_PROCESSES" and "PROD_ALSB_01".  When I logged onto the
server, I found that the filesystem this process was running on was 100%
used, which caused the process to die.  I cleaned up a bunch of log
files and restarted the process, and all was good...  BUT... why didn't
I receive the alert that the DISK was more than 97% full?  I checked the
history for the disk usage, and it had been over 95% for at least 6
hours before the process went down.  Also, the check for the
"jrockit" file did not kick off when that file was created (after the
filesystem was at 100%).  I need to determine why we weren't warned about
the disk space issue before our production application came down.

3) One other thing I noticed was that the IP address for this
server was incorrect in the bb-hosts file.  I assume that's an issue,
but I'm not sure why we got some expected alerts and not others.  Also,
I updated this entry in the bb-hosts file to the correct IP and cycled
the hobbit server, but I am still not receiving the alert on the jrockit
file, which is still out there.

Any help is appreciated.  I'm relatively new to Hobbit, so it's
completely within the realm of possibility that I don't have any of this
set up correctly. Please feel free to correct me on anything that looks
out of whack.

 
- Brian

list Greg L Hubbard · Fri, 7 Nov 2008 09:14:21 -0600 ·

You can always look at the page behind the "info" button for applesauce
to see how the alert rules were interpreted.  You can also run an event
configuration report.
 
Personally, I would not try to be too clever in any of the Hobbit
configuration files unless the documentation provides a specific example
of "cleverness."  I would explicitly list what I want for each host, and
not assume that I can set up a hierarchy of parameters using multiple
definitions.  Over the past year or so there have been a number of posts
from people who are misled by their own assumptions that "Hobbit works
this way because I want/need it to work this way."
 
GLH


list Tom Callahan · Fri, 07 Nov 2008 10:26:36 -0500 ·

I've noticed inability to correctly parse "df" if you have long device names
(think device-mapper).

My solution was to change DF="df -k" in bbsys.local to DF="df -k -P" for
POSIX mode.

Try that and see if it helps?
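
For context, here is a minimal sketch of why a long device name can trip
up a parser.  Without -P, GNU df may wrap a row onto two lines when the
device name is long; with -P each filesystem stays on one line.  The
wrapped sample below is made up for illustration (only the single-line
row appears in this thread), and parse_df is a hypothetical naive parser,
not Hobbit's actual code.

```python
def parse_df(output):
    """Naive parser: expects every filesystem on one six-column line."""
    usage = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue  # a wrapped row never has six columns, so it is lost
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


# Hypothetical "df -k" output: the long device name pushes the numbers
# onto a continuation line.
WRAPPED = """Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/vg00-lvol10
                       9289080   5718512   3098712  65% /wls_domains
"""

# "df -kP" style output: one line per filesystem.
POSIX = """Filesystem         1024-blocks      Used Available Capacity Mounted on
/dev/mapper/vg00-lvol10   9289080 5718512   3098712  65% /wls_domains
"""

print(parse_df(WRAPPED))  # {}
print(parse_df(POSIX))    # {'/wls_domains': 65}
```

If the disk page on the Hobbit server already lists every filesystem with
its percentage, the df output is probably being parsed fine and the
problem lies elsewhere.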


list Brian Bouchard · Fri, 7 Nov 2008 11:18:36 -0500 ·

Tom - 

 
I don't have a bbsys.local file, but I did find a similar command in the
hobbitserver.cfg.  The setting there is similar to what you suggested:

 
DF="/bin/df -Pk"

 
Also, to Greg's points earlier, I took the hierarchy out of the picture
for now.  I'm looking at the info and config reports now.

 
- Brian



list Brian Bouchard · Fri, 7 Nov 2008 13:11:35 -0500 ·

Ok, I removed the hierarchy as suggested, Greg.

 
Then I added a line to my applesauce server so the hobbit-clients.cfg
now has the following:

HOST=applesauce
       LOG  /var/log/messages "%(?-i)SERIOUS_CRITICAL" COLOR=yellow
       PROC "weblogic.Name=" 3 3 red TEXT=TOTAL_WEBLOGIC_PROCESSES
       PROC "weblogic.Name=prod_alsb_01" 1 1 red TEXT=PROD_ALSB_01
       PROC "weblogic.Name=prod_ccs_wli_01" 1 1 red TEXT=PROD_CCS_WLI_01
       PROC "weblogic.Name=prod_ccs_aldsp_01" 1 1 red TEXT=PROD_CCS_ALDSP_01
       DISK /wls_domains 40 97

 
Looking at the disk page for this server on hobbit, the page is still
green, and I see the following:

/dev/mapper/vg00-lvol10   9289080 5718512   3098712  65% /wls_domains

When I run the config report for this server I see the following for
disk:

disk
No
-/-/-
Default limits: Yellow 90% full, Red 95% full
/wls_appl
/var
/boot
/wls_logs
/wls_domains
/opt
/usr
/root
/dev
/shm
/home
/tmp

I assume this is saying all of these disks are only going to go yellow
at 90% full and red at 95% full?  If this is the case, we clearly have
something set up incorrectly.  If I am misunderstanding the report,
please let me know.
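
A minimal sketch of the arithmetic behind that reading, assuming the disk
test goes yellow at or above the first threshold and red at or above the
second (disk_color is a hypothetical helper, not Hobbit code).  At 65%
used, the 40/97 rule would give yellow while the 90/95 defaults give
green, which is consistent with the page staying green because the
defaults are in effect.

```python
def disk_color(used_pct, warn, panic):
    """Map a usage percentage to a status color (assumed >= comparisons)."""
    if used_pct >= panic:
        return "red"
    if used_pct >= warn:
        return "yellow"
    return "green"


# /wls_domains is at 65% in the df line above.
print(disk_color(65, 40, 97))  # yellow: what the new DISK rule should give
print(disk_color(65, 90, 95))  # green: what the default limits give
```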


list Greg L Hubbard · Fri, 7 Nov 2008 12:47:54 -0600 ·

Couple of things to try:
 
a) make sure that your DEFAULT section is at the very bottom of the
hobbit-clients.cfg file.
 
Also, from the man page for the hobbit-clients.cfg file:
 

RULES: APPLYING SETTINGS TO SELECTED HOSTS

Rules must be placed after the settings, e.g. 

	
	LOAD 8.0 12.0  HOST=db.foo.com TIME=*:0800:1600
	

If you have multiple settings that you want to apply the same rules to,
you can write the rules *only* on one line, followed by the settings.
E.g. 

	
	HOST=%db.*.foo.com TIME=W:0800:1600
	        LOAD 8.0 12.0
	        DISK /db  98 100
	        PROC mysqld 1
	

will apply the three settings to all of the "db" hosts on week-days
between 8AM and 4PM. This can be combined with per-settings rule, in
which case the per-settings rule overrides the general rule; e.g. 

	
	HOST=%.*.foo.com
	        LOAD 7.0 12.0 HOST=bax.foo.com
	        LOAD 3.0 8.0
	

will result in the load-limits being 7.0/12.0 for the "bax.foo.com"
host, and 3.0/8.0 for all other foo.com hosts. 

The entire file is evaluated from the top to bottom, and the first match
found is used. So you should put the specific settings first, and the
generic ones last.
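
As a rough illustration of "first match wins", here is a toy model in
Python.  It is only a sketch: disk_limits and the RULES table are
hypothetical, host and filesystem patterns are simplified to regular
expressions, and this is not how Hobbit itself parses the file.  The
thresholds are the ones from this thread, and the fallback is the 90/95
default shown in the config report.

```python
import re

DEFAULT_LIMITS = (90, 95)

# (host pattern, filesystem pattern, yellow, red), in file order:
# the specific rule first, the generic rule last.
RULES = [
    ("applesauce", "/wls_domains", 40, 97),
    ("applesauce|gravy|enchilada|chips", ".*", 95, 97),
]


def disk_limits(host, filesystem, rules=RULES):
    """Return the limits of the first rule matching host and filesystem."""
    for host_re, fs_re, warn, panic in rules:
        if re.fullmatch(host_re, host) and re.fullmatch(fs_re, filesystem):
            return warn, panic
    return DEFAULT_LIMITS


print(disk_limits("applesauce", "/wls_domains"))  # (40, 97)
print(disk_limits("applesauce", "/var"))          # (95, 97)
print(disk_limits("otherhost", "/var"))           # (90, 95)

# With the generic rule listed first, it shadows the specific one.
print(disk_limits("applesauce", "/wls_domains", rules=RULES[::-1]))  # (95, 97)
```

The last call shows why the ordering matters: once the generic rule comes
first, the specific 40/97 limits are never reached.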


list T.J. Yang · Fri, 7 Nov 2008 12:52:50 -0600 ·

Please look at the "debugging" section of the following URL:
 
http://en.wikibooks.org/wiki/System_Monitoring_with_Hobbit/Other_Docs/FAQ#Q._How_do_I_configure_.22GROUP.22_alerts_.3F
 
You can trace which alert rules were matched and which were not.
 
Hope this helps
T.J. Yang



list Henrik Størner · Mon, 10 Nov 2008 14:12:11 +0000 (UTC) ·

In <user-00509fe46c7a@xymon.invalid> Tom Callahan <user-16f19114071e@xymon.invalid> writes:

I've noticed inability to correctly parse "df" if you have long device names
(think device-mapper).

My solution was to change DF="df -k" in bbsys.local to DF="df -k -P" for
POSIX mode.


bbsys.local ? Sounds like either a custom config on your setup,
or something left over from a previous Big Brother installation.


Henrik
