Xymon Mailing List Archive

hobbitd coredumping and purple trends

7 messages in this thread

list Richard Deal · Fri, 1 Apr 2005 17:22:42 -0500 ·
My hobbitd is core dumping every so often and, less often but still
occasionally, the trends column turns purple.

Looking through the Makefile, the only oddity is MAXMSG=32768,
whereas my old BBd was set to #define MAXLINE  11264

I have core files in /tmp from hobbitd.

Logs:

more bb-display.log
2005-04-01 15:47:59 Whoops ! bb failed to send message - timeout
2005-04-01 16:02:59 Whoops ! bb failed to send message - timeout
2005-04-01 16:03:00 connect to bbd failed - Connection refused
2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
2005-04-01 16:03:00 connect to bbd failed - Connection refused
2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
2005-04-01 16:03:00 connect to bbd failed - Connection refused
2005-04-01 16:03:00 Whoops ! bb failed to send message - Connection failed
2005-04-01 16:18:05 Whoops ! bb failed to send message - timeout
2005-04-01 17:03:08 Whoops ! bb failed to send message - timeout

more hobbitd.log
2005-04-01 15:32:47 Setup complete
2005-04-01 15:32:54 Setup complete
2005-04-01 15:48:01 Setup complete
2005-04-01 16:03:01 Setup complete
2005-04-01 16:33:03 Setup complete
2005-04-01 16:48:04 Setup complete

I have a lot of these errors in larrd-data.log from various hosts:
2005-04-01 17:17:53 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/ray1.tigr.org/netstat.rrd from 172.17.10.20: expected 12 data source readings (got 16) from 1112393873:597496849:203665680:0:1400608:474490:380897:4323:190:655849103:2750185864:9271815:54370878:358842800:919424657:55608:57615:...
2005-04-01 17:18:15 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/akela.tigr.org/netstat.rrd from 172.17.10.87: expected 12 data source readings (got 16) from 1112393894:7278664:4601574:0:2187293:80558:15408:1028:18:3786687185:33199304:551592:3055134:392628802:534540232:12324:8938:...
2005-04-01 17:18:22 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/vader.tigr.org/netstat.rrd from 172.16.4.50: expected 12 data source readings (got 16) from 1112393902:844147:844153:0:173177:11681993:15774:1756237:109:2946405093:1171800154:1508:44541250:1263968085:53592252:29:1305303:...
2005-04-01 17:18:49 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/invino.tigr.org/netstat.rrd from 172.17.10.29: expected 12 data source readings (got 16) from 1112393929:161474660:161355279:0:979032:1013326:8108:2751:26:3077107260:3115145104:3779497608:1171327:3474031250:2366740414:176290878:15382:...

I used the moverrd.sh script.


And these errors from larrd-status.log:
2005-04-01 17:18:10 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/IGR51RRTB.tigr.org/temperature.module_6_asic-.rrd from 172.17.10.16: illegal attempt to update using time 1112393889 when last update time is 1112393889 (minimum one second step)
2005-04-01 17:20:04 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/utah.tigr.org/disk.rrd from 172.17.10.79: illegal attempt to update using time 1112394004 when last update time is 1112394004 (minimum one second step)
2005-04-01 17:20:04 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/utah.tigr.org/disk.rrd from 172.17.10.79: illegal attempt to update using time 1112394004 when last update time is 1112394004 (minimum one second step)
2005-04-01 17:21:27 RRD error updating /local/packages/IT/HOBBIT/hobbit/data/rrd/atlas.tigr.org/netstat.rrd from 172.17.10.80: expected 11 data source readings (got 16) from 1112394087:23501770:2904610:0:97558:26724:76:17:8:U:U:U:U:226801128:297662863:U:956:...

any suggestions?
Thanks
list Olivier Beau · Sat, 2 Apr 2005 01:13:00 +0200 ·
I'm still running RC6,
and I have the same behaviour: several cores in tmp/
(about a dozen per day).
They seem to be from bbtest-net, but there are also bbgen cores!


I have also seen my hobbitd stop answering on port 1984...
(telnet localhost 1984 would not answer; a couple of seconds later it would...)


Henrik: can these 2 problems be related?


olivier


list Henrik Størner · Sat, 2 Apr 2005 08:51:40 +0200 ·
On Fri, Apr 01, 2005 at 05:22:42PM -0500, Deal, Richard wrote:
> My hobbitd is core dumping every so often and, less often but still
> occasionally, the trends column turns purple.

hobbitd crashing - that's bad.

Could you run the core dump through gdb and send me the call trace?
Do this:

    $ gdb ~hobbit/server/bin/hobbitd /tmp/core-file-from-hobbitd
    [messages from gdb]
    (gdb) bt

and send me the output from that "bt" command.
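
When several core files have piled up, the same backtrace can be collected without an interactive session. This is a minimal sketch, not part of Hobbit: core_backtraces is a hypothetical helper name, and it assumes gdb is installed and the cores are readable. The -batch and -ex options are standard gdb options.

```shell
# Print a backtrace for each core file, non-interactively.
# Usage: core_backtraces BINARY CORE [CORE...]
# (core_backtraces is a hypothetical helper, not shipped with Hobbit.)
core_backtraces() {
    binary=$1
    shift
    for core in "$@"; do
        # Skip arguments that are not regular files (e.g. an unmatched glob).
        [ -f "$core" ] || continue
        echo "=== $core ==="
        # -batch exits when done; -ex bt runs the "bt" command once the core is loaded.
        gdb -batch -ex bt "$binary" "$core" 2>&1
    done
}
```

For example, core_backtraces ~hobbit/server/bin/hobbitd /tmp/core* > hobbitd-bt.txt would gather every trace into one file to mail; the /tmp/core* name pattern is an assumption about how the system names core files.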

> Looking through the Makefile, the only oddity is MAXMSG=32768,
> whereas my old BBd was set to #define MAXLINE  11264

Shouldn't cause any problems, it just means Hobbit will accept larger
messages than your BB setup.

> more bb-display.log
> 2005-04-01 15:47:59 Whoops ! bb failed to send message - timeout
> 2005-04-01 16:02:59 Whoops ! bb failed to send message - timeout
> 2005-04-01 16:03:00 connect to bbd failed - Connection refused

Probably a result of hobbitd being down.

> I have a lot of these errors in larrd-data.log from various hosts.
> 2005-04-01 17:17:53 RRD error updating
> /local/packages/IT/HOBBIT/hobbit/data/rrd/ray1.tigr.org/netstat.rrd from
> 172.17.10.20: expected 12 data source readings (got 16) from

The "netstat" and "vmstat" RRD files from LARRD are not compatible
with Hobbit. Do a

   find ~hobbit/data/rrd -name netstat.rrd | xargs rm -f

to delete the old files.
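
Before deleting anything, it can help to list what would be removed. The sketch below is a hedged variant of the command above: purge_larrd_rrds is a hypothetical helper name, and including vmstat.rrd is an assumption based on the remark that both file types are incompatible (the original command only removes netstat.rrd). Deleting an RRD file discards its recorded history.

```shell
# List (default) or delete LARRD-era netstat/vmstat RRD files under a directory.
# Usage: purge_larrd_rrds RRD_DIR [--delete]
# (hypothetical helper; vmstat.rrd is included as an assumption)
purge_larrd_rrds() {
    rrd_dir=$1
    action=${2:-}
    if [ "$action" = "--delete" ]; then
        # Remove the matching files; their history is lost.
        find "$rrd_dir" -type f \( -name netstat.rrd -o -name vmstat.rrd \) -exec rm -f {} +
    else
        # Dry run: only print the files that would be removed.
        find "$rrd_dir" -type f \( -name netstat.rrd -o -name vmstat.rrd \) -print
    fi
}
```

Running purge_larrd_rrds ~hobbit/data/rrd first, and adding --delete only once the list looks right, keeps unrelated files such as disk.rrd untouched.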

> 2005-04-01 17:18:10 RRD error updating
> /local/packages/IT/HOBBIT/hobbit/data/rrd/IGR51RRTB.tigr.org/temperature.module_6_asic-.rrd
> from 172.17.10.16: illegal attempt to update using time 1112393889 when
> last update time is 1112393889 (minimum one second step)

This is a bit more tricky. It means that the same RRD file was being
updated by two status messages within one second - that normally
should not happen, because a status is sent every 5 minutes. It can
happen if you have two hosts reporting the same hostname (one of them
would be the 172.17.10.16 IP you have in that error message).
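
To see which RRD files are affected and which sender addresses are involved, the log can be summarised. This is a rough sketch under two assumptions: each entry sits on a single line in the real log file (the line breaks in the quoted errors come from mail wrapping), and the field layout matches the entries quoted in this thread. same_second_updates is a hypothetical helper name.

```shell
# Count "illegal attempt to update" errors per RRD file and sender IP.
# Usage: same_second_updates LOGFILE
# Output lines look like: COUNT RRD_PATH SENDER_IP, most frequent first.
# (hypothetical helper; assumes one log entry per line)
same_second_updates() {
    awk '/RRD error updating/ && /illegal attempt to update/ {
        ip = $8
        sub(/:$/, "", ip)        # drop the colon that follows the IP
        count[$6 " " ip]++       # $6 is the RRD file path
    }
    END {
        for (key in count) print count[key], key
    }' "$1" | sort -rn
}
```

If one RRD path appears with more than one sender IP, two machines are probably reporting under the same hostname; if it appears with a single IP, the duplicates are coming from the same sender.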


Regards,
Henrik
list Henrik Størner · Sat, 2 Apr 2005 08:54:32 +0200 ·
On Sat, Apr 02, 2005 at 01:13:00AM +0200, user-fe6e0e6a0d05@xymon.invalid wrote:
> I'm still running RC6,
> and I have the same behaviour: several cores in tmp/
> (about a dozen per day).
> They seem to be from bbtest-net, but there are also bbgen cores!

I'd like to see call traces from those core files:

   cd ~hobbit/server
   gdb bin/bbgen tmp/core-from-bbgen
   [messages from gdb starting up]
   (gdb) bt

and send me the output.

> I have also seen my hobbitd stop answering on port 1984...
> (telnet localhost 1984 would not answer; a couple of seconds later it would...)
>
> Henrik: can these 2 problems be related?

Perhaps ... but I wouldn't expect them to be, unless it was hobbitd
that crashed.


Henrik
list Terry Barnes · Sun, 03 Apr 2005 14:21:25 -0400 ·
I experienced the same thing after making some changes to hobbit - might be
a longshot, but here is what caused this for me.

After restarting hobbit and seeing the same symptoms as you, I found that some
hobbit processes were hung. If I stopped hobbit, I could still see that most
processes were still running. Even after multiple attempts to do a
~/server/hobbit.sh stop, the processes continued to run. I killed those
processes and restarted hobbit - problem solved.

Like I say - could be a longshot, but worth a look.
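
A quick way to check for such leftovers after a stop is sketched below. It assumes Hobbit runs under a dedicated account (called hobbit here, which is an assumption) and that pgrep is available; wait_for_exit is a hypothetical helper name. Note that pgrep -u matches every process owned by that user, including a login shell.

```shell
# Wait up to TIMEOUT seconds for all processes owned by USER to exit.
# Returns 0 if none remain; otherwise lists the survivors and returns 1.
# Usage: wait_for_exit USER [TIMEOUT]
# (hypothetical helper; the account name is whatever Hobbit runs as)
wait_for_exit() {
    user=$1
    timeout=${2:-10}
    while pgrep -u "$user" >/dev/null; do
        if [ "$timeout" -le 0 ]; then
            pgrep -l -u "$user"    # print "PID name" for each remaining process
            return 1
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 0
}
```

Run from another account after ~/server/hobbit.sh stop, something like wait_for_exit hobbit 15 shows which processes would need to be killed by hand before restarting.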

Terry Barnes
Siemens Com @ HFHS
list Henrik Størner · Sun, 3 Apr 2005 19:53:41 +0000 (UTC) ·
In <user-b51923391996@xymon.invalid> "Terry Barnes" <user-0e29285d9a67@xymon.invalid> writes:
> I experienced the same thing after making some changes to hobbit - might be
> a longshot, but here is what caused this for me.

I think I've found the cause for this particular problem. A
usable work-around is to add the "--no-meta" option to the
bb-larrdcolumn command in hobbitlaunch.cfg.
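
The change can be previewed before editing the file. The sketch below assumes the bb-larrdcolumn task is started from a line beginning with CMD in hobbitlaunch.cfg; that layout is an assumption and should be checked against the actual file. add_no_meta is a hypothetical helper name, and it only prints the modified text without touching the file.

```shell
# Print CFG with --no-meta appended to the bb-larrdcolumn command.
# Usage: add_no_meta CFG
# Only lines of the form "CMD ... bb-larrdcolumn ..." are changed (assumed layout),
# and lines that already carry --no-meta are left alone.
add_no_meta() {
    sed '/^[[:space:]]*CMD[[:space:]].*bb-larrdcolumn/{
/--no-meta/!s/bb-larrdcolumn/bb-larrdcolumn --no-meta/
}' "$1"
}
```

Piping the result into diff against the original file, for example add_no_meta hobbitlaunch.cfg | diff hobbitlaunch.cfg -, shows exactly which line would change. Hobbit then needs a restart for the new option to take effect.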


Regards,
Henrik
list Richard Deal · Tue, 5 Apr 2005 09:22:10 -0400 ·
--no-meta fixed the purple trends issue and the patch fixed the core
dumps.
Thanks