Xymon Mailing List Archive

Bug Report: Critical error in log couldn't be sent to server every time

8 messages in this thread

list Samuel Cai · Wed, 23 Jul 2008 03:22:27 -0700 ·
Hi,

My company uses Hobbit (4.2) to monitor our application log for
OutOfMemoryError and StackOverflowError, but we found that the Hobbit
client sometimes does not send the data containing these error strings
to the server, so no error is reported.
Below is the relevant snippet from our client-local.cfg. As you can see,
although we set the maximum amount of data to 10240 bytes, we also set a
trigger on the keyword Error, so even if the log contains more data than
the maximum size, the lines matching the trigger should be sent to the
server in any case:
[our server]
log:/home/mine/server.log:10240
trigger Error

So I can think of two possible reasons:
1. The regular expression for the trigger is wrong.
2. There is a bug or limitation in the logfetch tool: it can only
process a limited amount of data. For example, if the application
happens to write 100M of data to the log file in 5 minutes, the tool
will only process, say, the last 10M.

I ran some tests to find the root cause. Each test has two steps:
1. Clear the log and wait until the client has sent its data.
2. Write some data to the log: the first line is "OutOfMemoryError
StackOverflowError", the rest is just garbage data.

Here are the results. For each test I list the number of lines (L) and
bytes (C) in the log after filling it, and whether the error was caught:
1. 485L, 54545C, error caught
2. 1445L, 163025C, error not caught
3. 707L, 53771C, error not caught
4. 468L, 36451C, error not caught
5. 226L, 18615C, error caught

The tests show that the trigger pattern is correct, and that the
logfetch tool fails to process all of the new data when there is a lot
of it (whether the limit is in lines or in bytes, I don't know).

We need a fix or a workaround: these errors are so important that we
cannot afford to miss them.

Thanks,
Samuel Cai
list Greg L Hubbard · Wed, 23 Jul 2008 08:29:03 -0500 ·
Samuel,

Maybe the current release of Hobbit is not up to this task (maybe you
should ask for a refund :) )?  I think the Hobbit logfetch function is
aimed more at "convenience monitoring" instead of real-time log
filtering. It is not hard to envision cases where processing log files
in "30 minute chunks" might have scalability problems.

If these messages are VERY important, you might search the Web for a
tool that will scan a log file watching for these messages, and then
write them to another log, and then have the Hobbit agent watch the log
you create that only has "interesting" messages in it.

GLH
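
A minimal sketch of the kind of pre-filter suggested above, assuming a
small script run periodically (for example from cron) is acceptable. The
function name, the state file and the example pattern are hypothetical
and not part of Hobbit; the script only copies matching lines written
since its previous run into a second, much smaller log.

```python
import re
from pathlib import Path


def filter_new_lines(source: Path, dest: Path, state: Path, pattern: str) -> int:
    """Append lines from `source` that match `pattern` to `dest`.

    Only data written since the previous call is read; the byte offset is
    remembered in the `state` file. Returns the number of lines copied.
    """
    regex = re.compile(pattern)
    offset = int(state.read_text()) if state.exists() else 0
    if offset > source.stat().st_size:
        offset = 0  # the log was truncated or rotated: start over

    copied = 0
    with source.open("rb") as src, dest.open("ab") as out:
        src.seek(offset)
        for raw in src:
            if not raw.endswith(b"\n"):
                break  # incomplete last line: pick it up on the next run
            if regex.search(raw.decode("utf-8", errors="replace")):
                out.write(raw)
                copied += 1
            offset += len(raw)

    state.write_text(str(offset))
    return copied
```

The Hobbit client would then watch the filtered file (the `dest` path)
with an ordinary log: entry instead of the busy application log, so the
amount of data it has to scan per run stays small.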

list S Aiello · Wed, 23 Jul 2008 09:50:12 -0400 ·
It really depends on what log level your application is logging at. If you are
logging at 'INFO' level, then there will be a lot of data to process. As you
can see, Hobbit implements a limit on how much log data it will parse. This is
a good thing, at least in my opinion.

It all depends on what is in your log, and why so much data is being written.
If it were all errors, Hobbit would be catching them and telling you there
are errors. Since this is not the case, I would guess your log has data other
than errors.

Suggestions:
1. Tune your application log settings so that only errors are written.
2. Make use of the ignore setting for the log in client-local.cfg. This
allows the Hobbit client to identify extraneous messages and ignore them.
Per the man page:

The ignore PATTERN line (optional) defines lines in the logfile which are 
ignored entirely, i.e. they are stripped from the logfile data before sending 
it to the Hobbit server. It is used to remove completely unwanted "noise" 
entries from the logdata processed by Hobbit. "PATTERN" is a regular 
expression.

I hope this helps you,
 ~Steve
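
Before deploying an ignore pattern, it can help to estimate how much of
the log it would actually strip. The sketch below is only an offline
approximation: the function name and the sample pattern are hypothetical,
and Python's `re` dialect may differ from the regular expression syntax
the Hobbit client uses, so patterns should be kept simple.

```python
import re


def ignore_savings(lines, ignore_pattern):
    """Return (kept_lines, bytes_before, bytes_after) for an ignore pattern.

    A line is dropped when the pattern matches anywhere in it, which mirrors
    the man page description of lines being "ignored entirely".
    """
    regex = re.compile(ignore_pattern)

    def size(items):
        # +1 accounts for the newline that terminates each line in the file
        return sum(len(item.encode("utf-8")) + 1 for item in items)

    kept = [line for line in lines if not regex.search(line)]
    return kept, size(lines), size(kept)


if __name__ == "__main__":
    sample = ["INFO a", "WARN b", "INFO c", "ERROR OutOfMemoryError"]
    kept, before, after = ignore_savings(sample, r"^INFO ")
    print(f"{before} bytes before, {after} bytes after, {len(kept)} lines kept")
```

Running this against a real copy of the log shows whether ignoring a
given class of messages is enough to bring a 5-minute chunk under the
amount of data the client will process.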
list Samuel Cai · Wed, 23 Jul 2008 19:32:53 -0700 ·
It's great to hear from you, Hubbard and Steve, and to learn that you
also see this as a limitation (more than a bug) rather than a mistake in
my configuration.

I was thinking of modifying the source code before, but that might be
difficult for me. I'll try your suggestions, thanks!

Samuel Cai
list S Aiello · Thu, 24 Jul 2008 08:40:58 -0400 ·
In my reply to your email, I said that this behavior "was a good thing". I do
not find this to be a limitation at all. I offered you two possible
solutions; was either of them applicable?

The "limitation" really resides in whatever application is logging so
verbosely. Production-level applications should have their logging limited as
much as possible, only logging indicators of errors. And
whenever this isn't possible, make use of the IGNORE option.

 ~Steve
list Samuel Cai · Thu, 24 Jul 2008 18:01:08 -0700 ·
Hi, Steve

Regarding your two suggestions: I checked the source code, and there is
a 100K limitation, so introducing ignore does not help if the Error
falls outside that range. You may wonder why we output more than 100K in
just 30 minutes. There are several reasons:
1. It is our production server log, which is busy.
2. We set the log level to Warning, which outputs more than the Error
level.
3. Some of our code does not set the log level correctly; we are in the
process of cleaning that up.
4. If there is an exception, we output the whole thread log at INFO
level, which is huge.

Anyway, I still think 100K in 30 minutes is rather small for a busy
site's log. I would like to remove this limitation and also keep
cleaning up our logs.

Thanks for your suggestion,
Samuel Cai
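
The effect of such a cap can be illustrated with a toy model. This is a
sketch under a stated assumption, not logfetch's actual code: it assumes
that only the last 100K of the new data is scanned (the figure quoted in
this message, not checked against the source), and the names SCAN_LIMIT
and trigger_seen are hypothetical.

```python
# Assumed value, taken from the "100K" mentioned in the thread.
SCAN_LIMIT = 100 * 1024


def trigger_seen(new_data: bytes, trigger: bytes, scan_limit: int = SCAN_LIMIT) -> bool:
    """Report whether `trigger` occurs in the part of `new_data` that is scanned.

    Only the final `scan_limit` bytes are examined (scan_limit must be > 0),
    so anything written earlier in the interval is invisible to the trigger.
    """
    window = new_data[-scan_limit:]
    return trigger in window


if __name__ == "__main__":
    first_line = b"OutOfMemoryError StackOverflowError\n"
    for garbage_size in (50_000, 160_000):
        data = first_line + b"x" * garbage_size
        print(len(data), "bytes ->", trigger_seen(data, b"Error"))
```

Under this model an error line followed by about 160K of other output is
missed while one followed by about 50K is caught, which matches tests 1,
2 and 5 from the original report. It does not by itself explain tests 3
and 4, where the error was missed with well under 100K of data.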

list Greg L Hubbard · Fri, 25 Jul 2008 08:45:10 -0500 ·
Samuel,

If you think this through, you probably don't really want the Hobbit
agent to send that much data up to the Hobbit server every 5 minutes.
You are welcome to modify your own source code to change the 100K limit
to anything you want, but I think you will be better served to look for
a near-real-time log filtering process that can process your big, busy
log on the local host, and then spit out "significant" messages in
another log that you can wire into Hobbit.

I do not have any solutions to offer, but I think you can probably find
plenty of options if you spend a few minutes in a Google search. 
list Samuel Cai · Sun, 27 Jul 2008 18:51:38 -0700 ·
Hi Hubbard, 

I think you have misunderstood how Hobbit works. The Hobbit client
still sends only the limited amount of data you define to the server, no
more. Removing the 100K limitation just lets the logfetch program
process more data each time before it sends data to the server, so it
may have an impact on the client's performance, but it won't affect
network traffic.
Anyway, I will stick with my solution; introducing another filtering
process may make the Hobbit client configuration complex.

Thanks,
Samuel Cai
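
The distinction between how much data is scanned and how much is sent
can be sketched as follows. This is a simplified model of the behavior
described in this message, not the client's real implementation; the
function build_payload and its parameters are hypothetical, and the
model works on characters rather than bytes for brevity.

```python
def build_payload(new_data: str, scan_limit: int, max_bytes: int, trigger: str) -> str:
    """Model of what the client sends for one log file.

    The last `scan_limit` characters are scanned. The final `max_bytes` of
    that window are always sent; from the older part of the window only the
    lines containing `trigger` are added.
    """
    window = new_data[-scan_limit:]
    tail = window[-max_bytes:]
    older = window[: len(window) - len(tail)]
    hits = [line for line in older.splitlines() if trigger in line]
    return "\n".join(hits + [tail])


if __name__ == "__main__":
    log = "OutOfMemoryError\n" + ("x" * 79 + "\n") * 2000  # roughly 160K of noise
    small = build_payload(log, scan_limit=100 * 1024, max_bytes=10240, trigger="Error")
    large = build_payload(log, scan_limit=1_000_000, max_bytes=10240, trigger="Error")
    print(len(small), len(large), "OutOfMemoryError" in large)
```

In this model, raising the scan limit increases the local work but the
payload grows only by the matched trigger lines, not by the extra data
scanned.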

