Xymon Mailing List Archive

Easy question on GRAPHS and trends

10 messages in this thread

list Eduardo Mayoral · Fri, 03 Mar 2006 22:52:48 +0100 ·
I have successfully migrated from BB to hobbit (excellent software
IMHO), however there is a small thing that I cannot get solved.

I collect rrd data for the tests vmstat, vmstat1, vmstat2, vmstat3,
vmstat4 and vmstat5.

If I build a URL like:
http://MYBBDISPLAY/hobbit-cgi/hobbitgraph.sh?host=AHOBBITCLIENT&service=vmstat3&graph_width=576&graph_height=120&first=1&count=1&disp=jupiter%2earsys%2ees&graph=hourly&action=view

I can see the graph just fine (so the data on the rrd files and the
graph definitions - which come standard on hobbitgraph.cfg - are OK),
however I cannot get hobbit to display the graph and link on the trends
page. It only displays vmstat, but not vmstat1-vmstat5.

According to the documentation I should just add the graph names to the
GRAPHS setting, but I have already done that:
GRAPHS="la,disk,inode,qtree,memory,users,vmstat,vmstat3,vmstat4,vmstat5,iostat,tcp.http,tcp,netstat,temperature,ntpstat,apache,bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,postfix,mrtg::1"

The funny thing is that I also made a custom graph 'postfix' for postfix
queues and that one shows up on the trends page perfectly, it is only
the vmstat family which refuses to appear.

My version is hobbit-4.1.2p1-1 on Fedora Core 3.

Thanks in advance for your help.
list Geoff Steer · Mon, 06 Mar 2006 15:55:13 +1100 ·
My hobbit server has been error free since I installed 4.1.2, but in the
last day or so it has had an error for hobbitd_rrd.

The rrd-data.log shows:
*** glibc detected *** double linked list
Worker process died with exit code 134
*** glibc detected *** double free or corruption (fasttop)

I can't find anything in the archives about this.

No changes were made to the system, and no clients were added.
I've stopped and started hobbit with no change.

The rrd graphs show gaps every 10 - 20 minutes or so.

Any suggestions? Is more data required?

Regards
Geoff


-------------------------------Safe Stamp-----------------------------------
The sender's Anti-virus Service scanned this email. It is safe from known viruses.
list Henrik Størner · Mon, 6 Mar 2006 10:58:21 +0100 ·
quoted from Geoff Steer
On Mon, Mar 06, 2006 at 03:55:13PM +1100, Geoff Steer wrote:
My hobbit server has been error free since I installed 4.1.2, but in the
last day or so it has had an error for hobbitd_rrd.

The rrd-data.log shows:
*** glibc detected *** double linked list
Worker process died with exit code 134
*** glibc detected *** double free or corruption (fasttop)
This usually indicates some sort of corruption of the memory
allocation inside hobbitd_rrd. Since hobbitd_rrd depends on the
rrdtool library, it could also be a problem with that.

Since it's glibc you're probably on a Linux/Intel platform.
Would it be possible for you to run the hobbitd_rrd command
through the "Valgrind" memory checker ? I don't know if
Valgrind is included with your distribution - it is part
of the standard Debian release, but your distro might be
different. If you can get it installed, then just change
the command in the "[rrddata]" section from

CMD hobbitd_channel --channel=data   --log=$BBSERVERLOGS/rrd-data.log \
    hobbitd_rrd --rrddir=$BBVAR/rrd

to

CMD hobbitd_channel --channel=data   --log=$BBSERVERLOGS/rrd-data.log \
    valgrind --log-file=$BBSERVERLOGS/valgrind.log \
    hobbitd_rrd --rrddir=$BBVAR/rrd

Let it run until the errors show up, then send me the valgrind.log.*
files.


Regards,
Henrik
list Henrik Størner · Mon, 6 Mar 2006 11:05:06 +0100 ·
quoted from Eduardo Mayoral
On Fri, Mar 03, 2006 at 10:52:48PM +0100, Eduardo Mayoral wrote:
however I cannot get hobbit to display the graph and link in the trends
page. It only displays vmstat , but not vmstat1-vmstat5

According to the documentation I should just add the graph names to the
GRAPHS setting, but I have already done that:
GRAPHS="la,disk,inode,qtree,memory,users,vmstat,vmstat3,vmstat4,vmstat5,iostat,tcp.http,tcp,netstat,temperature,ntpstat,apache,bind,sendmail,mailq,socks,bea,iishealth,citrix,bbgen,bbtest,bbproxy,hobbitd,postfix,mrtg::1"
I'm afraid you've misunderstood the docs (which means I'll need to make
them more explicit). The GRAPHS setting is "only" used to find out which
RRD databases should be used on the trends page. But vmstat has multiple
datasets inside a single RRD database, and you want to show several of
the datasets on your trends page.

The answer to that one is to add a TRENDS setting to the host entry in
the bb-hosts file. Like

10.0.0.1  myhost.foo.com # TRENDS:*,vmstat:vmstat|vmstat3|vmstat4|vmstat5


Regards,
Henrik
list Olivier Beau · Thu, 9 Mar 2006 14:49:55 +0100 ·
Hi Henrik,

Doing content checks on "large" web pages (13M) disturbs hobbitd;
in the log : "Data flooding from 10.33.254.87, closing connection"
causing a bunch of network checks to go purple..


That URL was 13M because of a big Tomcat dump... and we (sysadmins) don't
control the size of the web pages...


Do you have a workaround for this?


Regards,

Olivier
list Henrik Størner · Thu, 9 Mar 2006 23:01:19 +0100 ·
quoted from Olivier Beau
On Thu, Mar 09, 2006 at 02:49:55PM +0100, Olivier Beau wrote:
Hi Henrik,

Doing content checks on "large" web pages (13M) disturbs hobbitd;
in the log : "Data flooding from 10.33.254.87, closing connection"
causing a bunch of network checks to go purple..
This is really a safety/security thing to avoid hobbitd consuming all 
of memory. Since hobbitd keeps everything in memory, it would be too 
easy to launch a denial-of-service attack by just flooding it with data.
quoted from Olivier Beau
That URL was 13M because of a big Tomcat dump... and we (sysadmins) don't
control the size of the web pages...
I hope your developers weren't forced to explain every bit of that dump :-)
Do you have a workaround for this?
Try the attached patch for the network test tool. It limits the amount
of content data that is sent across to 1 MB, but the content check
itself is performed on the full amount of data.

Untested, but fairly simple so I would expect it to work.


Regards,
Henrik

-------------- next part --------------
--- bbnet/bbtest-net.h	2005/12/29 16:18:42	1.34
+++ bbnet/bbtest-net.h	2006/03/09 21:55:07
@@ -17,6 +17,8 @@
 #define STATUS_CONTENTMATCH_FAILED 902
 #define STATUS_CONTENTMATCH_BADREGEX 903
 
+#define MAX_CONTENT_DATA (1024*1024)	/* 1 MB should be enough for most */
+
 /*
  * Structure of the bbtest-net in-memory records
  */
--- bbnet/httpresult.c	2005/12/29 16:19:20	1.19
+++ bbnet/httpresult.c	2006/03/09 21:54:18
@@ -429,6 +429,12 @@
 		xfree(msgline);
 
 		if (req->output) {
+			/* Dont flood hobbitd with data */
+			if (req->outlen > MAX_CONTENT_DATA) {
+				*(req->output + MAX_CONTENT_DATA) = '\0';
+				req->outlen = MAX_CONTENT_DATA;
+			}
+
 			if ( (req->contenttype && (strncasecmp(req->contenttype, "text/html", 9) == 0)) ||
 			     (strncasecmp(req->output, "<html", 5) == 0) ) {
 				char *bodystart = NULL;
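
The truncation step in the patch above can be sketched in isolation. This is only an illustration of the idea (cut a NUL-terminated buffer in place once it exceeds a cap), not the code from bbtest-net: the names `CONTENT_CAP` and `cap_content` are hypothetical, and the sketch assumes the caller owns a writable buffer of at least `len + 1` bytes.

```c
#include <stddef.h>

/* Hypothetical cap, mirroring the 1 MB MAX_CONTENT_DATA value in the patch. */
#define CONTENT_CAP (1024 * 1024)

/*
 * Truncate a NUL-terminated buffer in place so that at most `cap`
 * bytes of payload remain. Returns the (possibly reduced) length.
 * A NULL buffer is treated as empty.
 */
size_t cap_content(char *buf, size_t len, size_t cap)
{
    if (buf == NULL)
        return 0;

    if (len > cap) {
        buf[cap] = '\0';  /* cap < len, so this index is inside the buffer */
        len = cap;
    }
    return len;
}
```

A caller would use it as `outlen = cap_content(output, outlen, CONTENT_CAP);` right before handing the data on. Writing the terminator at index `cap` is safe only because the branch is taken when `len > cap`, which is the same condition the patch relies on.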
list Olivier Beau · Fri, 10 Mar 2006 15:57:51 +0100 ·
Hi Henrik,

In bb-hosts, setting an apache tag with an invalid URL
(apache=www.toto.com/server-status?auto) causes the whole bbtest-net to fail
and dump core.


(gdb) bt
#0  0x0026deff in raise () from /lib/tls/libc.so.6
#1  0x0026f705 in abort () from /lib/tls/libc.so.6
#2  0x08059cf1 in xstrdup (s=0x0) at memory.c:175
#3  0x08053105 in add_http_test (t=0x9195e10) at httptest.c:418
#4  0x0804f5c9 in main (argc=9, argv=0xbfffa264) at bbtest-net.c:2227
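
Frame #2 shows xstrdup() being called with s=0x0 from add_http_test(), so the abort appears to come from duplicating a NULL string. A minimal sketch of that failure mode and of a caller-side guard follows. The names `checked_strdup` and `dup_or_null` are hypothetical stand-ins, not the Hobbit functions, and this is not necessarily how the problem was later fixed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an aborting strdup wrapper such as xstrdup(). */
char *checked_strdup(const char *s)
{
    char *copy;

    if (s == NULL)
        abort();              /* the situation seen in frame #2 */

    copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        abort();              /* out of memory */

    strcpy(copy, s);
    return copy;
}

/*
 * Caller-side guard: return NULL for a missing string instead of
 * letting the wrapper abort, so the caller can skip or report the
 * bad entry and carry on with the remaining tests.
 */
char *dup_or_null(const char *s)
{
    return (s != NULL) ? checked_strdup(s) : NULL;
}
```

With a guard like this, a single unparsable entry would cost one test rather than the whole run.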


Regards,

Olivier Beau
list Geoff Steer · Fri, 31 Mar 2006 15:21:20 +1100 ·
I've finally gotten back to looking at this problem and have some more
info that may be relevant. It hasn't been high on the list, as hobbit is
still working fine for alerts.

Firstly, I've tried removing the existing rrd files and letting hobbit
create new ones, no change - the core files still are produced.

I've tried building hobbit 4.1.2p1 with rrdtool 1.2.11 and with 1.2.12,
no change. This is with existing rrd files and also letting hobbit
create new ones as required.

In looking at the current core files with gdb, it seems that they
all report an error related to sendmail:

(gdb) bt
#0  0x00abe7a2 in ?? () from /lib/ld-linux.so.2
#1  0x00afe7d5 in raise () from /lib/tls/libc.so.6
#2  0x00b00149 in abort () from /lib/tls/libc.so.6
#3  0x08054af2 in sigsegv_handler (signum=11) at sig.c:57
#4  0x00afe8c8 in killpg () from /lib/tls/libc.so.6
#5  0x0804e011 in do_sendmail_rrd (
    hostname=0xb7f6f037 "outrelay1.firstwave.com.au", 
    testname=0xb7f6f052 "sendmail", 
    msg=0xbffc5dd0  tstamp=1143771322)
    at rrd/do_sendmail.c:127
#6  0x08050120 in update_rrd (
    hostname=0xb7f6f037 "outrelay1.firstwave.com.au", 
    testname=0xb7f6f052 "sendmail", 
    msg=0xb7f6f05b "data outrelay1,firstwave,com,au.sendmail Fri Mar 31
13:15:22 EST 2006\nStatistics from Tue Jun 21 10:47:07 2005\n M   msgsfr
bytes_from   msgsto    bytes_to  msgsrej msgsdis msgsqur  Mailer\n 3
25299848"..., 
    tstamp=1143771322, sender=0x0, ldef=0x0) at do_rrd.c:271
#7  0x08049e3a in main (argc=0, argv=0xbffca4e4) at hobbitd_rrd.c:199

I'm ready to rebuild the server entirely, but I'm not convinced that this
will resolve the issue. As I said previously, this setup has been
working fine for months; the problem started for no obvious reason in
early March.

Regards
geoff
list Henrik Størner · Fri, 2 Jun 2006 18:26:28 +0200 ·
quoted from Olivier Beau
On Fri, Mar 10, 2006 at 03:57:51PM +0100, Olivier Beau wrote:
In bb-hosts, setting an apache tag with an invalid URL
(apache=www.toto.com/server-status?auto) causes the whole bbtest-net to fail
and dump core.
This has been fixed today.


Henrik
list Henrik Størner · Fri, 2 Jun 2006 18:29:21 +0200 ·
quoted from Geoff Steer
On Fri, Mar 31, 2006 at 03:21:20PM +1100, Geoff Steer wrote:
In looking at the current core files with gdb, it seems that they
all report an error related to sendmail:

(gdb) bt
#5  0x0804e011 in do_sendmail_rrd (
    hostname=0xb7f6f037 "outrelay1.firstwave.com.au", 
    testname=0xb7f6f052 "sendmail", 
    msg=0xbffc5dd0  tstamp=1143771322)
    at rrd/do_sendmail.c:127
#6  0x08050120 in update_rrd (
    hostname=0xb7f6f037 "outrelay1.firstwave.com.au", 
    testname=0xb7f6f052 "sendmail", 
    msg=0xb7f6f05b "data outrelay1,firstwave,com,au.sendmail Fri Mar 31
13:15:22 EST 2006\nStatistics from Tue Jun 21 10:47:07 2005\n M   msgsfr
bytes_from   msgsto    bytes_to  msgsrej msgsdis msgsqur  Mailer\n 3
25299848"..., 
    tstamp=1143771322, sender=0x0, ldef=0x0) at do_rrd.c:271
#7  0x08049e3a in main (argc=0, argv=0xbffca4e4) at hobbitd_rrd.c:199

I'm ready to rebuild the server entirely, but I'm not convinced that this
will resolve the issue. As I said previously, this setup has been
working fine for months; the problem started for no obvious reason in
early March.
I'm not sure what caused these crashes, but I looked for a
problem around that piece of code and found one missing check for
an error returned by the RRDtool library that could explain this.
So I'd like some feedback on whether the problem still occurs and, if it
does, whether tomorrow's snapshot (generated in a few hours) helps.


Regards,
Henrik