Xymon Mailing List Archive

Hobbit server crashing

19 messages in this thread

list Vernon Everett · Thu, 9 Oct 2008 20:20:32 +0800 ·
Hi all

Henrik, I think this might be one for you.
My Hobbit server crashed and died.

This happened before, a few months ago, and I shrugged it off - sometimes sh1t happens.
Then it happened last week again. This time I was concerned.
Now it has just happened again, about 40 minutes ago.

I tried to restart hobbit, without much luck, then I walked away, put my son into bed, and then tried again.
This time it worked.

The logs never showed anything conclusive, but maybe I just don't know what I am looking for.

The symptoms were the same all three times.
All "passive" server based tests go purple.
By passive server based, I mean conn, http, content, ssh, ftp, ftps, etc. The tests that do not rely on a client.
Also went purple, was bbd and bbtest.

All client based tests were unaffected. Graphing worked as normal. And alerts were being sent out.

I am running 4.2 with all-in-one patch, the Sun if-config patch, the bbwin update and the devmon update.

Has anybody seen anything like this before?
Anybody got any tips on where to look for the problem, or how to diagnose this?

Regards
     Vernon


NOTICE: This email and any attachments are confidential. They may contain legally privileged information or copyright material. You must not read, copy, use or disclose them without authorisation. If you are not an intended recipient, please contact us at once by return email and then delete both messages and all attachments.
list Henrik Størner · Thu, 9 Oct 2008 13:17:48 +0000 (UTC) ·
quoted from Vernon Everett
In <user-1ee6c893bc5e@xymon.invalid> "Everett, Vernon" <user-9da1a1882f49@xymon.invalid> writes:
My Hobbit server crashed and died.
This happened before, a few months ago, and I shrugged it off - sometimes
sh1t happens.
Then it happened last week again. This time I was concerned.
Now it has just happened again, about 40 minutes ago.
I tried to restart hobbit, without much luck, then I walked away, put my son
into bed, and then tried again.
This time it worked.
The logs never showed anything conclusive, but maybe I just don't know what
I am looking for.
The symptoms were the same all three times.
All "passive" server based tests go purple.
By passive server based, I mean conn, http, content, ssh, ftp, ftps, etc.
The tests that do not rely on a client.
Also went purple, was bbd and bbtest.
All client based tests were unaffected. Graphing worked as normal. And 
alerts were being sent out.

Your description sounds very much as if the only thing that stopped were 
the network tests (bbtest-net). Since the client-side tests are updating,
network tests go purple and alerts go out, I think that is where the
problem is. "bbtest" going purple also points in this direction.

Next time it happens, see if there's a "bbtest-net" process running (and possibly
a "hobbitping" or "fping" process as well); if there is, kill it with "kill -6"
to make it dump core. Then do the usual stuff of getting a stacktrace from the
core file ( http://www.hswn.dk/hobbit/help/known-issues.html#bugreport ).
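
As a rough sketch of the step above (not part of Hobbit itself), the following Python picks out lingering network-test processes from `ps -eo pid,comm` output and sends them SIGABRT, which is what "kill -6" does on Linux. The helper names are made up for illustration, and a core file is only written if the core size limit (`ulimit -c`) of the process allows it.

```python
import os
import signal
import subprocess

# Process names taken from the advice above.
TARGETS = ("bbtest-net", "hobbitping", "fping")


def find_pids(ps_output, names=TARGETS):
    """Return (pid, command) pairs from `ps -eo pid,comm` style output."""
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the header line and anything malformed
        pid, comm = int(parts[0]), parts[1].strip()
        if comm in names:
            found.append((pid, comm))
    return found


def abort_stuck_tests(dry_run=True):
    """Send SIGABRT (signal 6) to each matching process so it can dump core."""
    ps_output = subprocess.run(
        ["ps", "-eo", "pid,comm"], capture_output=True, text=True, check=True
    ).stdout
    for pid, comm in find_pids(ps_output):
        action = "would abort" if dry_run else "aborting"
        print(f"{action} {comm} (pid {pid})")
        if not dry_run:
            os.kill(pid, signal.SIGABRT)
```

Calling `abort_stuck_tests()` only lists what it would signal; `abort_stuck_tests(dry_run=False)` actually sends the signal, so it is worth checking the dry-run list first.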

Are you running bbtest-net with the "--no-ares" option? If so, a hung or slow DNS
server can make your network tests run very slowly.


Henrik
list Vernon Everett · Thu, 9 Oct 2008 21:38:17 +0800 ·
Hmm, that is what I suspected, because I found this in the log file after sending my mail.
This might be something conclusive, if I had even the foggiest idea what it meant.
Hoping some of the smarter list members can assist.

This log entry is not time-stamped, but it was the last entry before I did the restart.

From hobbitlaunch.cfg

[bbnet]
        ENVFILE /usr/lib/hobbit/server/etc/hobbitserver.cfg
        NEEDS hobbitd
        CMD bbtest-net --report --ping --checkresponse
        LOGFILE $BBSERVERLOGS/bb-network.log
        INTERVAL 5m

From bb-network.log
*** glibc detected *** bbtest-net: double free or corruption (out): 0x000000000a96dd20 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3db7a71634]
/lib64/libc.so.6(cfree+0x8c)[0x3db7a74c5c]
bbtest-net[0x42493a]
bbtest-net[0x422bdf]
bbtest-net[0x422d7e]
bbtest-net[0x40f7d7]
bbtest-net[0x4076cc]
bbtest-net[0x4088c6]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3db7a1d8b4]
bbtest-net[0x4039e9]
======= Memory map: ========
00400000-00430000 r-xp 00000000 fd:01 389432                             /usr/lib/hobbit/server/bin/bbtest-net
00630000-00631000 rw-p 00030000 fd:01 389432                             /usr/lib/hobbit/server/bin/bbtest-net
00631000-00637000 rw-p 00631000 00:00 0
00830000-00832000 rw-p 00030000 fd:01 389432                             /usr/lib/hobbit/server/bin/bbtest-net
0a8d1000-0aa1b000 rw-p 0a8d1000 00:00 0
3224600000-3224638000 r-xp 00000000 fd:01 68061                          /usr/lib64/libldap-2.3.so.0.2.15
3224638000-3224838000 ---p 00038000 fd:01 68061                          /usr/lib64/libldap-2.3.so.0.2.15
3224838000-322483a000 rw-p 00038000 fd:01 68061                          /usr/lib64/libldap-2.3.so.0.2.15
3366400000-3366443000 r-xp 00000000 fd:00 75879                          /lib64/libssl.so.0.9.8b
3366443000-3366643000 ---p 00043000 fd:00 75879                          /lib64/libssl.so.0.9.8b
3366643000-3366649000 rw-p 00043000 fd:00 75879                          /lib64/libssl.so.0.9.8b
3368000000-3368125000 r-xp 00000000 fd:00 75876                          /lib64/libcrypto.so.0.9.8b
3368125000-3368325000 ---p 00125000 fd:00 75876                          /lib64/libcrypto.so.0.9.8b
3368325000-3368344000 rw-p 00125000 fd:00 75876                          /lib64/libcrypto.so.0.9.8b
3368344000-3368348000 rw-p 3368344000 00:00 0
385c600000-385c63b000 r-xp 00000000 fd:00 75779                          /lib64/libsepol.so.1
385c63b000-385c83b000 ---p 0003b000 fd:00 75779                          /lib64/libsepol.so.1
385c83b000-385c83c000 rw-p 0003b000 fd:00 75779                          /lib64/libsepol.so.1
385c83c000-385c846000 rw-p 385c83c000 00:00 0
385ca00000-385ca15000 r-xp 00000000 fd:00 75786                          /lib64/libselinux.so.1
385ca15000-385cc15000 ---p 00015000 fd:00 75786                          /lib64/libselinux.so.1
385cc15000-385cc17000 rw-p 00015000 fd:00 75786                          /lib64/libselinux.so.1
385cc17000-385cc18000 rw-p 385cc17000 00:00 0
385ce00000-385ce8f000 r-xp 00000000 fd:01 68057                          /usr/lib64/libkrb5.so.3.3
385ce8f000-385d08e000 ---p 0008f000 fd:01 68057                          /usr/lib64/libkrb5.so.3.3
385d08e000-385d092000 rw-p 0008e000 fd:01 68057                          /usr/lib64/libkrb5.so.3.3
385d600000-385d608000 r-xp 00000000 fd:01 68055                          /usr/lib64/libkrb5support.so.0.1
385d608000-385d807000 ---p 00008000 fd:01 68055                          /usr/lib64/libkrb5support.so.0.1
385d807000-385d808000 rw-p 00007000 fd:01 68055                          /usr/lib64/libkrb5support.so.0.1
385da00000-385da24000 r-xp 00000000 fd:01 68056                          /usr/lib64/libk5crypto.so.3.1
385da24000-385dc23000 ---p 00024000 fd:01 68056                          /usr/lib64/libk5crypto.so.3.1
385dc23000-385dc25000 rw-p 00023000 fd:01 68056                          /usr/lib64/libk5crypto.so.3.1
385de00000-385de2c000 r-xp 00000000 fd:01 68058                          /usr/lib64/libgssapi_krb5.so.2.2
385de2c000-385e02c000 ---p 0002c000 fd:01 68058                          /usr/lib64/libgssapi_krb5.so.2.2
385e02c000-385e02e000 rw-p 0002c000 fd:01 68058                          /usr/lib64/libgssapi_krb5.so.2.2
3af4400000-3af440d000 r-xp 00000000 fd:01 68095                          /usr/lib64/liblber-2.3.so.0.2.15
3af440d000-3af460d000 ---p 0000d000 fd:01 68095                          /usr/lib64/liblber-2.3.so.0.2.15
3af460d000-3af460e000 rw-p 0000d000 fd:01 68095                          /usr/lib64/liblber-2.3.so.0.2.15
3db7600000-3db761a000 r-xp 00000000 fd:00 75791                          /lib64/ld-2.5.so
3db781a000-3db781b000 r--p 0001a000 fd:00 75791                          /lib64/ld-2.5.so
3db781b000-3db781c000 rw-p 0001b000 fd:00 75791                          /lib64/ld-2.5.so
3db7a00000-3db7b4a000 r-xp 00000000 fd:00 75797                          /lib64/libc-2.5.so
3db7b4a000-3db7d49000 ---p 0014a000 fd:00 75797                          /lib64/libc-2.5.so
3db7d49000-3db7d4d000 r--p 00149000 fd:00 75797                          /lib64/libc-2.5.so
3db7d4d000-3db7d4e000 rw-p 0014d000 fd:00 75797                          /lib64/libc-2.5.so
3db7d4e000-3db7d53000 rw-p 3db7d4e000 00:00 0
3db7e00000-3db7e02000 r-xp 00000000 fd:00 75840                          /lib64/libdl-2.5.so
3db7e02000-3db8002000 ---p 00002000 fd:00 75840                          /lib64/libdl-2.5.so
3db8002000-3db8003000 r--p 00002000 fd:00 75840                          /lib64/libdl-2.5.so
3db8003000-3db8004000 rw-p 00003000 fd:00 75840                          /lib64/libdl-2.5.so
3db8200000-3db8218000 r-xp 00000000 fd:01 67958                          /usr/lib64/libsasl2.so.2.0.22
3db8218000-3db8418000 ---p 00018000 fd:01 67958                          /usr/lib64/libsasl2.so.2.0.22
3db8418000-3db8419000 rw-p 00018000 fd:01 67958                          /usr/lib64/libsasl2.so.2.0.22
3db8600000-3db8614000 r-xp 00000000 fd:01 67257                          /usr/lib64/libz.so.1.2.3
3db8614000-3db8813000 ---p 00014000 fd:01 67257                          /usr/lib64/libz.so.1.2.3
3db8813000-3db8814000 rw-p 00013000 fd:01 67257                          /usr/lib64/libz.so.1.2.3
3db9a00000-3db9a09000 r-xp 00000000 fd:00 76086                          /lib64/libcrypt-2.5.so
3db9a09000-3db9c08000 ---p 00009000 fd:00 76086                          /lib64/libcrypt-2.5.so
3db9c08000-3db9c09000 r--p 00008000 fd:00 76086                          /lib64/libcrypt-2.5.so
3db9c09000-3db9c0a000 rw-p 00009000 fd:00 76086                          /lib64/libcrypt-2.5.so
3db9c0a000-3db9c38000 rw-p 3db9c0a000 00:00 0
3dba600000-3dba611000 r-xp 00000000 fd:00 76082                          /lib64/libresolv-2.5.so
3dba611000-3dba811000 ---p 00011000 fd:00 76082                          /lib64/libresolv-2.5.so
3dba811000-3dba812000 r--p 00011000 fd:00 76082                          /lib64/libresolv-2.5.so
3dba812000-3dba813000 rw-p 00012000 fd:00 76082                          /lib64/libresolv-2.5.so
3dba813000-3dba815000 rw-p 3dba813000 00:00 0
3dbaa00000-3dbaa02000 r-xp 00000000 fd:00 76083                          /lib64/libcom_err.so.2.1
3dbaa02000-3dbac01000 ---p 00002000 fd:00 76083                          /lib64/libcom_err.so.2.1
3dbac01000-3dbac02000 rw-p 00001000 fd:00 76083                          /lib64/libcom_err.so.2.1
3dbba00000-3dbba02000 r-xp 00000000 fd:00 76080                          /lib64/libkeyutils-1.2.so
3dbba02000-3dbbc01000 ---p 00002000 fd:00 76080                          /lib64/libkeyutils-1.2.so
3dbbc01000-3dbbc02000 rw-p 00001000 fd:00 76080                          /lib64/libkeyutils-1.2.so
3dbc200000-3dbc20d000 r-xp 00000000 fd:00 75803                          /lib64/libgcc_s-4.1.2-20080102.so.1
3dbc20d000-3dbc40d000 ---p 0000d000 fd:00 75803                          /lib64/libgcc_s-4.1.2-20080102.so.1
3dbc40d000-3dbc40e000 rw-p 0000d000 fd:00 75803                          /lib64/libgcc_s-4.1.2-20080102.so.1
2b224ebb4000-2b224ebb6000 rw-p 2b224ebb4000 00:00 0
2b224ebc1000-2b224ebc9000 rw-p 2b224ebc1000 00:00 0
2b224ebc9000-2b224ebd3000 r-xp 00000000 fd:00 75873                      /lib64/libnss_files-2.5.so
2b224ebd3000-2b224edd2000 ---p 0000a000 fd:00 75873                      /lib64/libnss_files-2.5.so
2b224edd2000-2b224edd3000 r--p 00009000 fd:00 75873                      /lib64/libnss_files-2.5.so
2b224edd3000-2b224edd4000 rw-p 0000a000 fd:00 75873                      /lib64/libnss_files-2.5.so
2b2250000000-2b2250021000 rw-p 2b2250000000 00:00 0
2b2250021000-2b2254000000 ---p 2b2250021000 00:00 0
7fff5bee0000-7fff5bef6000 rw-p 7fff5bee0000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]
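
To read a dump like this, it can help to check which mapping each backtrace address falls into. The sketch below is an illustration, not a Hobbit tool: it parses a few lines copied from the memory map above and looks up two of the backtrace addresses. The function names are invented for this example.

```python
def parse_maps(text):
    """Parse /proc/<pid>/maps style lines into (start, end, perms, path)."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or "-" not in fields[0]:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) > 5 else ""
        regions.append((start, end, fields[1], path))
    return regions


def locate(addr, regions):
    """Return (path, offset into the mapping) for addr, or None if unmapped."""
    for start, end, perms, path in regions:
        if start <= addr < end:
            return path, addr - start
    return None


# A few lines copied from the memory map in the log above.
MAPS = """\
00400000-00430000 r-xp 00000000 fd:01 389432 /usr/lib/hobbit/server/bin/bbtest-net
3db7a00000-3db7b4a000 r-xp 00000000 fd:00 75797 /lib64/libc-2.5.so
7fff5bee0000-7fff5bef6000 rw-p 7fff5bee0000 00:00 0 [stack]
"""

regions = parse_maps(MAPS)
print(locate(0x42493A, regions))      # first bbtest-net frame
print(locate(0x3DB7A74C5C, regions))  # the cfree+0x8c frame in libc
```

The bbtest-net frame lands in the executable's r-xp mapping starting at 0x400000, which suggests the raw addresses from the backtrace can be handed to a debugger as they are.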
list Henrik Størner · Thu, 9 Oct 2008 14:13:03 +0000 (UTC) ·
In <user-7a569889e39c@xymon.invalid> "Everett, Vernon" <user-9da1a1882f49@xymon.invalid> writes:
Hmm, that is what I suspected, because I found this in the log file after sending my mail
*** glibc detected *** bbtest-net: double free or corruption (out): 0x000000000a96dd20 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3db7a71634]
/lib64/libc.so.6(cfree+0x8c)[0x3db7a74c5c]
bbtest-net[0x42493a]
bbtest-net[0x422bdf]
bbtest-net[0x422d7e]
bbtest-net[0x40f7d7]
bbtest-net[0x4076cc]
bbtest-net[0x4088c6]

It would be really interesting to find out what these addresses correspond to
in the source code. If you have gdb on this host, could you run
   gdb ~hobbit/server/bin/bbtest-net
then at the "(gdb)" prompt enter the command
   (gdb) l *0x42493a

Hopefully that gives you something like

henrik@osiris:~/hobbit$ gdb ./bbnet/bbtest-net
(gdb) l *0x8053ce7
0x8053ce7 is in bbgen_ASN1_UTCTIME (contest.c:392).

If it does, it would be nice to see.
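
Since the backtrace quoted above has several bbtest-net frames, it may be quicker to look them all up in one go. The following is a minimal sketch, assuming a gdb that accepts the `-batch` and `-ex` options; the helper names are made up, and the lookups will only print source locations if the binary was built with debugging symbols.

```python
import re
import shlex

FRAME_RE = re.compile(r"^bbtest-net\[(0x[0-9a-fA-F]+)\]$")


def frame_addresses(backtrace):
    """Pull the bbtest-net frame addresses out of a glibc backtrace."""
    addrs = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            addrs.append(m.group(1))
    return addrs


def gdb_command(binary, addrs):
    """Build a gdb invocation that runs `list *ADDR` for each address."""
    argv = ["gdb", "-batch"]
    for addr in addrs:
        argv += ["-ex", f"list *{addr}"]
    argv.append(binary)
    return argv


BACKTRACE = """\
/lib64/libc.so.6[0x3db7a71634]
/lib64/libc.so.6(cfree+0x8c)[0x3db7a74c5c]
bbtest-net[0x42493a]
bbtest-net[0x422bdf]
bbtest-net[0x422d7e]
bbtest-net[0x40f7d7]
bbtest-net[0x4076cc]
bbtest-net[0x4088c6]
"""

addrs = frame_addresses(BACKTRACE)
# Path taken from the memory map earlier in the thread; adjust to the local install.
argv = gdb_command("/usr/lib/hobbit/server/bin/bbtest-net", addrs)
print(" ".join(shlex.quote(a) for a in argv))
```

The printed line can be pasted into a shell on the Hobbit server; the libc frames are skipped on purpose, since only the bbtest-net addresses point into Hobbit's own code.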


Henrik
list Vernon Everett · Thu, 9 Oct 2008 22:34:59 +0800 ·
quoted from Henrik Størner
It would be really interesting to find out what these addresses correspond to
in the source code. If you have gdb on this host, could you run
   gdb ~hobbit/server/bin/bbtest-net
then at the "(gdb)" prompt enter the command
   (gdb) l *0x42493a

Hopefully that gives you something like

henrik@osiris:~/hobbit$ gdb ./bbnet/bbtest-net
(gdb) l *0x8053ce7
0x8053ce7 is in bbgen_ASN1_UTCTIME (contest.c:392).

If it does, it would be nice to see.


Henrik

That would be too easy.
Will talk to my Red-Hat guys in the morning.

-bash-3.2$ gdb /usr/lib/hobbit/server/bin/bbtest-net
GNU gdb Red Hat Linux (6.5-37.el5_2.1rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found)
Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) l *0x42493a
No symbol table is loaded.  Use the "file" command.
list Tom Kauffman · Thu, 9 Oct 2008 11:31:12 -0400 ·
We haven't been putting the Windows Server msgs column on our bb2 page, nor alerting on msgs, because of the number of events that seem to trigger warnings or errors.

Would anyone be willing to share with us your event log filter entries for BBWIN, to use as a starting point?

TIA

Tom Kauffman
list Rafal Roginela · Thu, 9 Oct 2008 10:45:48 -0500 ·
Hi Tom,

What you are talking about is a process that shapes the exceptions per
server. I monitor all of our logs here, and we have about 10 servers with
different functions. On each client (I use BBWin and local configs) I
initially looked at the logs, found the pointless errors, and ignored
those right away. I still sometimes have to adjust the list when
applications or hardware change. Here is an excerpt from 2 different
configuration files to give you an example of what I mean:

Server1
	<setting name="alwaysgreen" value="false" />
	<match logfile="System" type="error" delay="1h" alarmcolor="red" />
	<match logfile="Application" type="error" delay="1h" alarmcolor="red" />
	<ignore logfile="System" eventid="1111" />
	<ignore logfile="System" eventid="16" />
	<ignore logfile="System" eventid="12294" />
	<ignore logfile="System" eventid="5805" />
	<ignore logfile="System" eventid="5723" />
	<ignore logfile="Application" eventid="1000" />
	<ignore logfile="Application" eventid="11" />
	<ignore logfile="Application" eventid="34113" />
	<ignore logfile="System" source="Backup Exec" />

Server2
	<setting name="alwaysgreen" value="false" />
	<match logfile="System" type="error" delay="1h" alarmcolor="red" />
	<match logfile="Application" type="error" delay="1h" alarmcolor="red" />
	<ignore logfile="System" eventid="1111" />
	<ignore logfile="System" eventid="16" />
	<ignore logfile="System" eventid="39" />
	<ignore logfile="System" eventid="1106" />
	<ignore logfile="System" eventid="61" />
	<ignore logfile="Application" eventid="107" />
	<ignore logfile="Application" eventid="1106" />
	<ignore logfile="Application" eventid="1000" />
	<ignore logfile="Application" eventid="101" />
	<ignore logfile="Application" eventid="2002" />
	<ignore logfile="Application" eventid="1015" />

As you can see the list is not huge, and our servers are ones I inherited,
so they experience more weirdness than I'm used to with my own builds,
but you can always find useless errors in a Windows event log. Hope
this helps.
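
For anyone maintaining lists like these across several clients, the entries can be generated rather than typed. This is only a sketch: it reuses the `ignore`, `logfile`, `eventid` and `source` names from the excerpts above, the helper function is invented, and the output still has to be pasted into each client's own configuration by hand.

```python
from xml.sax.saxutils import quoteattr


def ignore_line(logfile, eventid=None, source=None):
    """Format one <ignore .../> entry in the style of the excerpts above."""
    attrs = [("logfile", logfile)]
    if eventid is not None:
        attrs.append(("eventid", str(eventid)))
    if source is not None:
        attrs.append(("source", source))
    body = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs)
    return f"<ignore {body} />"


# Event IDs that appear in both server excerpts above.
SHARED = [("System", 1111), ("System", 16), ("Application", 1000)]

for logfile, eventid in SHARED:
    print(ignore_line(logfile, eventid=eventid))
print(ignore_line("System", source="Backup Exec"))
```

Keeping the shared IDs in one list makes it easier to see which exceptions are common to every server and which are specific to one box.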

Thank You.

Rafal Roginela

-----Original Message-----
From: Kauffman, Tom [mailto:user-3feba9e60a8b@xymon.invalid] 
Sent: Thursday, October 09, 2008 10:31 AM
To: user-ae9b8668bcde@xymon.invalid
Subject: [hobbit] Looking for sample BBWIN configs for filtering Windows
event logs
quoted from Tom Kauffman

We haven't been putting the Windows Server msgs column on our bb2 page,
nor alerting on msgs, because of the number of events that seem to
trigger warnings or errors.

Would anyone be willing to share with us your event log filter entries
for BBWIN, to use as a starting point?

TIA

Tom Kauffman
list Jon Boede · Thu, 09 Oct 2008 10:48:40 -0500 ·
We put the column in and we let it alert, but we have the alerts file set up so that a msgs alert never pages anybody... it just sends email.  We've commented a few things out but mostly we've looked into the things that were complained about and fixed or shut off the source of the complaint.  It's helped us clean up the boxes quite a bit -- there were services running that did not need to be.

Jon
list Shawn Heisey · Thu, 09 Oct 2008 11:54:01 -0600 ·
Here's our typical list:

    <ignore logfile="System" eventid="2" />
    <ignore logfile="System" eventid="3" />
    <ignore logfile="System" eventid="4" />
    <ignore logfile="System" eventid="8" />
    <ignore logfile="System" eventid="1106" />
    <ignore logfile="System" eventid="1111" />
    <ignore logfile="Application" eventid="3033" />
    <ignore logfile="Application" eventid="2003" />

ID 3033 is an Exchange message relating to Windows Mobile clients, but because Exchange was the first server I converted to BBWin from Big Brother, it's ended up on all of the systems.  ID 2003 is related to performance counters.  It's probably possible to fix, but my focus is not so much on the Windows infrastructure.

The rest are the annoying printer driver entries that you get when you log into a machine via Remote Desktop and are forwarding printers but don't have drivers on the system.  I tried for a long time to get people to turn off printer forwarding, because I could never get Big Brother to stop alarming, but nobody listened.  Hobbit/BBWin has been a lifesaver in this respect.  With a little more work, we will be able to soon include the NOC in all alarms.  With Big Brother, msgs was a flood of crap and would have overwhelmed them.

I have a question that's really more suited for the BBWin mailing list, but I've asked it there and gotten no response:  Does anyone have a complete server-side configuration example for BBWin clients, showing how to handle all aspects of the client configuration?

Thanks,
Shawn
list Gavin Leonard · Thu, 9 Oct 2008 12:01:54 -0600 ·
I have very chatty Windows boxes as well. Where do you place these lists? Which file?

-Gavin
list Shawn Heisey · Thu, 09 Oct 2008 12:10:22 -0600 ·
It goes in the <msgs> section of BBWin.cfg, normally in C:\Program Files\BBWin\etc.
list Bob Gordon · Thu, 9 Oct 2008 12:12:53 -0700 ·
quoted from Shawn Heisey
On Thu, Oct 9, 2008 at 10:54 AM, Shawn Heisey <user-5d0d01dba542@xymon.invalid> wrote:
I have a question that's really more suited for the BBWin mailing list, but
I've asked it there and gotten no response:  Does anyone have a complete
server-side configuration example for BBWin clients, showing how to handle
all aspects of the client configuration?

This is the one that I am using.  I still have some cleanup to do on it
though....

###########################################################
## The defaults used by the Hobbit clients
###########################################################
DEFAULT
        UP      30m
        DISK    * 90 95
        SWAP    85 90
        MEMPHYS 100 101
        MEMSWAP 90 95
        MEMACT  90 97
        CLOCK   30

###########################################################
## Windows Based Systems - Central Config Mode
###########################################################
CLASS=%win32*  EXHOST=server1,server2
        LOAD 80 90              # Load thresholds are in %
        PROC svchost.exe 2 -1
        PROC %[mM]cshield.exe 1 -1
        PROC nserver.exe 1 -1
        PROC nrouter.exe 1 -1
        LOG %.*  %.*error.* COLOR=red IGNORE=%(BigBrotherHobbitClient|SnapDrive|WinVNC4|TermDD|SV-GSX|TermServDevices|Perflib|PerfNet)


So far it's worked out pretty well as my default setting...  After the
Default section and before the generic section above I have my
system-specific entries...

-- 
--==[ Bob Gordon ]==--
list Shawn Heisey · Thu, 09 Oct 2008 14:04:30 -0600 ·
It looks like the ignore section only uses text matches, in this case regular expressions, right?  That would mean it can't match on event ID unless I encode something like "Print (8)" in a regular expression format.
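
To illustrate the point about text matching: if the event ID shows up in the text of the log line, a regular expression can pick it out. The sketch below uses Python's `re` module as a stand-in for the server's matcher, and the sample lines are invented for the example; the real layout of BBWin event log lines may differ, so treat the pattern as a starting point only.

```python
import re

# Hypothetical event log lines; the real BBWin format may differ.
LINES = [
    "error - System - Print (8) - Driver required for printer not found",
    "error - System - Print (80) - Some other print event",
    "error - Application - Perflib (2003) - Counter configuration problem",
]

# Match source "Print" with exactly event ID 8; the parentheses are escaped
# so they are treated as literal characters rather than a group.
PRINT_8 = re.compile(r"Print \(8\)")

# Text-only alternation in the style of the IGNORE list from Bob's
# configuration earlier in the thread.
BY_SOURCE = re.compile(r"(Perflib|PerfNet|TermServDevices)")

ignored = [l for l in LINES if PRINT_8.search(l) or BY_SOURCE.search(l)]
kept = [l for l in LINES if l not in ignored]
print(kept)
```

Escaping the parentheses matters here: an unescaped `Print (8)` would look for the text "Print 8" and so would not match the sample line at all.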

Not that this is a huge problem, but having a nice clean field like event ID is one of the good things about BBWin's local config mode.  I'm just tired of having to remote into the client to change something, especially when I have to do it on more than one client.

Thanks for the info!  Only one more thing I'd want - do you have any examples of centrally defined service monitoring?
list Robert P McGraw · Thu, 9 Oct 2008 16:47:08 -0400 ·
Bob,

FWIW.

In the server/etc/hobbit-clients.cfg file, the last few lines of the comments at
the top say

    # The special DEFAULT section can modify the built-in defaults - this must
    # be placed at the end of the file.


Robert
-- 

Robert P. McGraw, Jr.
Manager, Computer System               EMAIL: user-33cf07af04dd@xymon.invalid
Purdue University                       ROOM: MATH-807
Department of Mathematics              PHONE: (XXX) XXX-XXXX
XXX N. University Street
West Lafayette, IN XXXXX-XXXX
list Bob Gordon · Thu, 9 Oct 2008 15:55:25 -0700 ·
quoted from Robert P McGraw
On Thu, Oct 9, 2008 at 1:47 PM, McGraw, Robert P <user-33cf07af04dd@xymon.invalid> wrote:
 Bob,

FWIW.

In server/etc/hobbit-clients.cfg file the last few lines of the comments at
the top says

    # The special DEFAULT section can modify the built-in defaults - this must
    # be placed at the end of the file.

Thanks for pointing that out  (wonder when it changed)..  The config file I
have has been used since the first 4.x version when it finished with:

# Rules are evaluated from the top of this file and down, and the first
# matching rule is used. So you should put the specific rules first, and
# the generic rules last.

#
# EXHOST=%^pto\.linuxbog\.dk|\.sslug\.dk|^bb-mws.csc.dk|sarge|trantor|postcode EXSERVICE=dnsinfo,dnsreg
#


So far it seems to be working but I will go through and double check to make
sure it actually is...  ;)

-- 
--==[ Bob Gordon ]==--
list Bob Gordon · Thu, 9 Oct 2008 15:58:15 -0700 ·
quoted from Shawn Heisey
On Thu, Oct 9, 2008 at 1:04 PM, Shawn Heisey <user-90f60e6a2765@xymon.invalid> wrote:
It looks like the ignore section only uses text matches, in this case
regular expressions, right?  That would mean it can't match on event ID
unless I encode something like "Print (8)" in a regular expression format.

Not that this is a huge problem, but having a nice clean field like event
ID is one of the good things about BBWin's local config mode.  I'm just
tired of having to remote into the client to change something, especially
when I have to do it on more than one client.

Thanks for the info!  Only one more thing I'd want - do you have any
examples of centrally defined service monitoring?
In my case I found it easier to match based on the text rather than the ID.
You should be able to match on the ID rather than the text though...
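
On encoding something like "Print (8)" as a regular expression: the parentheses are metacharacters, so they need a backslash to match literally. A minimal sketch follows, using Python's re module as a stand-in for PCRE. The sample line is hypothetical, and the real layout of the event log text sent by the client may differ.

```python
import re

# Hypothetical event log line, invented for this illustration.
line = "warning - Print (8) - printer driver unknown"

# Escaped parentheses match the literal text "Print (8)".
escaped = re.search(r"Print \(8\)", line)

# Unescaped, "(8)" is a capture group, so the pattern means "Print 8".
unescaped = re.search(r"Print (8)", line)

print(bool(escaped), bool(unescaped))
```

With the backslashes the pattern finds the source and ID in the line; without them it looks for the text "Print 8" and misses.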

The entries that I am doing service monitoring on have entries similar to
these:

	SVC "RFBOARD" startup=manual status=started color=red
	SVC "RFDB" startup=manual status=started color=red


Regards,

-- 
--==[ Bob Gordon ]==--
list Shawn Heisey · Fri, 10 Oct 2008 11:41:57 -0600 ·
quoted from Bob Gordon
Bob Gordon wrote:
In my case I found it easier to match based on the text rather than the ID.  You should be able to match on the ID rather than the text though...

The entries that I am doing service monitoring on have entries similar to these:
	SVC "RFBOARD" startup=manual status=started color=red
	SVC "RFDB" startup=manual status=started color=red
  
This is going to be so incredibly helpful.  Do you happen to know if the uptime "yellow alarm" value can be centrally controlled?  I've got some Windows machines that have been up for more than 1000 days, so BBWin reports yellow.  I haven't bothered with changing the value because right now I'd have to do it on dozens of machines individually.

The systems with high uptimes are in continuous production, so I haven't been able to convince the integration folks to install updates and get them rebooted.  They're firmly behind firewalls, so I am not SUPER concerned about their security.
list Raymond Storer · Fri, 10 Oct 2008 17:13:33 -0400 ·
Shawn, you can configure Terminal Services to turn off printer redirection.  It can be done in a group policy object (GPO) or it can be done to each server individually.

See this link for a more visual discussion on the subject:
http://blogs.technet.com/askperf/archive/2007/08/24/terminal-server-and-printer-redirection.aspx

Ray
quoted from Shawn Heisey

----Original Message----
From: Shawn Heisey [mailto:user-5d0d01dba542@xymon.invalid]
Sent: Thursday, October 09, 2008 1:54 PM
To: user-ae9b8668bcde@xymon.invalid
Subject: Re: [hobbit] Looking for sample BBWIN configs for filtering Windows event logs
Here's our typical list:

    <ignore logfile="System" eventid="2" />
    <ignore logfile="System" eventid="3" />
    <ignore logfile="System" eventid="4" />
    <ignore logfile="System" eventid="8" />
    <ignore logfile="System" eventid="1106" />
    <ignore logfile="System" eventid="1111" />
    <ignore logfile="Application" eventid="3033" />
    <ignore logfile="Application" eventid="2003" />

ID 3033 is an Exchange message relating to Windows Mobile clients, but
because Exchange was the first server I converted to BBWin from Big
Brother, it's ended up on all of the systems.  ID 2003 is related to
performance counters.  It's probably possible to fix, but my focus
is not so much on the Windows infrastructure.

The rest are the annoying printer driver entries that you get when you
log into a machine via Remote Desktop and are forwarding printers but
don't have drivers on the system.  I tried for a long time to get
people to turn off printer forwarding, because I could never get Big
Brother to stop alarming, but nobody listened.  Hobbit/BBWin has been
a lifesaver in this respect.  With a little more work, we will soon be
able to include the NOC in all alarms.  With Big Brother, msgs
was a flood of crap and would have overwhelmed them.

I have a question that's really more suited for the BBWin mailing
list, but I've asked it there and gotten no response:  Does anyone
have a complete server-side configuration example for BBWin clients,
showing how to handle all aspects of the client configuration?

Thanks,
Shawn
list Shawn Heisey · Fri, 10 Oct 2008 18:32:35 -0600 ·
Sometimes we do actually want to be able to print from the remote 
machine to a local printer, so I have left the capability there. 
quoted from Raymond Storer


Storer, Raymond wrote:
Shawn, you can configure Terminal Services to turn off printer redirection.  It can be done in a group policy object (GPO) or it can be done to each server individually.

See this link for a more visual discussion on the subject:
http://blogs.technet.com/askperf/archive/2007/08/24/terminal-server-and-printer-redirection.aspx