Xymon Mailing List Archive

fping failure

10 messages in this thread

list Jeff Newman · Thu, 4 May 2006 17:24:11 -0500 ·
All,

Every now and then, all of my hosts go red on conn and fail, then come
right back up. Below is an excerpt from bb-network.log. As you can see,
it happens at no set time and on no set date; it just plain fails.
Any thoughts?

2006-03-06 23:20:29 Execution of '/usr/sbin/fping -Ae' failed with error-code 98
2006-03-06 23:20:29 Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.17520
2006-03-24 02:51:27 Execution of '/usr/sbin/fping -Ae' failed with error-code 98
2006-03-24 02:51:27 Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.17576
2006-04-15 18:24:31 Execution of '/usr/sbin/fping -Ae' failed with error-code 98
2006-04-15 18:24:31 Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.18026
2006-04-27 18:48:05 Execution of '/usr/sbin/fping -Ae' failed with error-code 98
2006-04-27 18:48:05 Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.17520
2006-05-04 17:14:08 Execution of '/usr/sbin/fping -Ae' failed with error-code 98
2006-05-04 17:14:08 Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.18026


-Jeff
list Rob MacGregor · Thu, 4 May 2006 23:37:00 +0100 ·
On 5/4/06, Jeff Newman <user-e96740e73ca8@xymon.invalid> wrote:
> All,
>
> Every now and then, all of my hosts go red on conn and fail, then come
> right back up. Below is an excerpt from bb-network.log. As you can
> see, it's at no set time, no set date, it just plain fails. Any thoughts?

Version of hobbit?  Operating system?

--
                 Please keep list traffic on the list.
Rob MacGregor
      Whoever fights monsters should see to it that in the process he
        doesn't become a monster.                  Friedrich Nietzsche
list Jeff Newman · Thu, 4 May 2006 17:43:49 -0500 ·
Sorry,

Hobbit 4.1.2p1, Redhat AS 4

-Jeff
list Henrik Størner · Fri, 5 May 2006 07:43:18 +0200 ·
On Thu, May 04, 2006 at 05:24:11PM -0500, Jeff Newman wrote:
> All,
>
> Every now and then, all of my hosts go red on conn and fail, then come
> right back up. Below is an excerpt from bb-network.log. As you can
> see, it's at no set time, no set date, it just plain fails. Any thoughts?
>
> 2006-03-06 23:20:29 Execution of '/usr/sbin/fping -Ae' failed with error-code 98
> 2006-03-06 23:20:29 Cannot open fping output file

The "code 98" is an internal code used by Hobbit when it fails to
open the two files that we use to pick up the output from fping. These
go in the ~hobbit/data/tmp/ directory, and are called fping-std{out,err}.PID

Perhaps you're short of disk space?


Regards,
Henrik
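As an illustration of the failure Henrik describes, here is a minimal Python sketch of creating the two per-PID output files and reporting which file failed and why. It is not Hobbit's code (Hobbit is written in C and its source is not shown in this thread); the function name open_fping_outputs, the open flags, and the message wording are assumptions made for the example.

```python
import errno
import os


def open_fping_outputs(tmpdir, pid=None):
    """Probe whether fping-stdout.PID and fping-stderr.PID can be created.

    Returns (True, "ok") on success, or (False, reason) naming the file
    and the errno that stopped it. Probe files are removed afterwards.
    """
    pid = os.getpid() if pid is None else pid
    created = []
    result = (True, "ok")
    for stream in ("stdout", "stderr"):
        path = os.path.join(tmpdir, f"fping-{stream}.{pid}")
        try:
            # Assumed flags: create the file, or truncate an existing one.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        except OSError as exc:
            code = errno.errorcode.get(exc.errno, str(exc.errno))
            result = (
                False,
                f"Cannot open fping output file {path}: {code} ({exc.strerror})",
            )
            break
        os.close(fd)
        created.append(path)
    # Clean up whatever the probe managed to create.
    for path in created:
        os.unlink(path)
    return result
```

Because the probe truncates and then deletes the files it opens, it is best pointed at a scratch directory or given a PID that no leftover file uses. The value of the sketch is the errno name in the message, which distinguishes a missing directory (ENOENT) from a permissions problem (EACCES) or a full filesystem (ENOSPC).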
list Jeff Newman · Fri, 5 May 2006 09:32:15 -0500 ·
Nope, lots of disk space:

# df -k .
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d0p2     69922596   5430680  60229552   9% /

It almost looks like maybe something had a lock on the file
preventing something from reading it?

Would the new "hobbitping" work better? How do I get that?

-Jeff

list Henrik Størner · Fri, 5 May 2006 17:34:22 +0200 ·
On Fri, May 05, 2006 at 09:32:15AM -0500, Jeff Newman wrote:
> Nope, lots of disk space:
>
> # df -k .
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/cciss/c0d0p2     69922596   5430680  60229552   9% /

How about inodes? "df -i"

> It almost looks like maybe something had a lock on the file
> preventing something from reading it?

No, something prevented us from creating it.

> Would the new "hobbitping" work better? How do I get that?

This is before fping/hobbitping gets to run, so I don't think it will
make any difference. Still, if you want to try it out, just grab the
latest snapshot and build it. Then you can copy the bbnet/hobbitping
binary to your Hobbit server/bin/ directory; make sure it is suid-root.
Then change your FPING setting to point at "$BBHOME/bin/hobbitping".


Henrik
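The "make sure it is suid-root" step can be checked after copying the binary. A small Python sketch, assuming a Unix system; the helper name is_suid_root is invented for this example and the path in the usage note is only a guess based on the directories mentioned in this thread.

```python
import os
import stat


def is_suid_root(path):
    """Return True if path is owned by root and has the setuid bit set."""
    st = os.stat(path)
    owned_by_root = st.st_uid == 0
    has_setuid_bit = bool(st.st_mode & stat.S_ISUID)
    return owned_by_root and has_setuid_bit
```

For example, is_suid_root("/usr/local/hobbit/server/bin/hobbitping") should return True once the binary is installed as described; if it returns False, either the owner or the mode still needs fixing.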
list Jeff Newman · Fri, 5 May 2006 13:25:37 -0500 ·
Here is some more info (df -i, ls in the tmp dir)

# cd /usr/local/hobbit/server/tmp
# ls
alert.chk           fping-stderr.17687  fping-stdout.17182  fping-stdout.18045
alert.chk.sub       fping-stderr.18009  fping-stdout.17471  fping-stdout.28736
fping..status       fping-stderr.18026  fping-stdout.17520  fping-stdout.31722
fping-stderr.17182  fping-stderr.18045  fping-stdout.17576  fping-stdout.4937
fping-stderr.17471  fping-stderr.28736  fping-stdout.17687  hobbitd.chk
fping-stderr.17520  fping-stderr.31722  fping-stdout.18009
fping-stderr.17576  fping-stderr.4937   fping-stdout.18026
# df -k .
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/cciss/c0d0p2     69922596   5436192  60224040   9% /
# df -i .
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/cciss/c0d0p2    8897472  178544 8718928    3% /

-Jeff



list Henrik Størner · Fri, 5 May 2006 22:38:14 +0200 ·
On Fri, May 05, 2006 at 01:25:37PM -0500, Jeff Newman wrote:
> Here is some more info (df -i, ls in the tmp dir)
>
> # cd /usr/local/hobbit/server/tmp
> # ls
> alert.chk           fping-stderr.17687

Those fping-stderr* files are all size 0 - or ... ?

> # df -k .
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/cciss/c0d0p2     69922596   5436192  60224040   9% /
> # df -i .
> Filesystem            Inodes   IUsed   IFree IUse% Mounted on
> /dev/cciss/c0d0p2    8897472  178544 8718928    3% /

No problem there.


If possible, I would like you to try out the bbtest-net program from the
current snapshot - http://www.hswn.dk/beta/ . It's OK if you just copy 
the bbnet/bbtest-net tool from the snapshot over to your current Hobbit
installation.

I've added a couple of lines of error output so it should log why it
cannot create the output files in the bb-network.log file.


Regards,
Henrik
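The two checks run in this thread, "df -k" for free blocks and "df -i" for free inodes, can be combined in one probe. This is a Python sketch using os.statvfs on a Unix system; the function name fs_headroom and the returned keys are made up for the example, and the percentages are computed from the raw totals, so they may differ slightly from the Use% column that df prints.

```python
import os


def fs_headroom(path="."):
    """Report free space and free inodes for the filesystem holding path."""
    st = os.statvfs(path)

    def pct_used(total, free):
        # Some filesystems report 0 total inodes; avoid dividing by zero.
        return 0.0 if total == 0 else 100.0 * (total - free) / total

    return {
        "kb_available": st.f_bavail * st.f_frsize // 1024,
        "inodes_free": st.f_favail,
        "block_use_pct": pct_used(st.f_blocks, st.f_bfree),
        "inode_use_pct": pct_used(st.f_files, st.f_ffree),
    }
```

Running fs_headroom("/usr/local/hobbit/server/tmp") from a periodic job would show whether either resource is briefly exhausted around the time of a failure, which a one-off df after the fact cannot show.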
list Jeff Newman · Fri, 5 May 2006 17:10:18 -0500 ·
> Those fping-stderr* files are all size 0 - or ... ?

Sorry, should have done an ls -la ...

They are all 0 bytes and old, from around February, probably from when
I did some testing, so they are nothing to worry about.

> If possible, I would like you to try out the bbtest-net program from the
> current snapshot

I will do that on Monday. Like I said initially, it only happens once or
twice a month, so it may be hard to catch. It will be good to have this
in place when it does happen. I would rather fix this than go to
hobbitping for now, so this should be good.

I'll let you know what the results are.

Thanks again,
Jeff
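One cross-check that the data already posted in this thread allows: compare the PIDs named in the failing log lines with the leftover fping-stdout.* files in the tmp directory listing. The Python sketch below does that with the values copied from the messages above; the function names are invented for the example. An overlap does not establish the cause, and the thread ends before the extra logging in the new bbtest-net reports anything.

```python
import re

STDOUT_NAME = re.compile(r"fping-stdout\.(\d+)")


def failed_pids(log_text):
    """PIDs that appear in 'Cannot open fping output file' paths."""
    return {int(pid) for pid in STDOUT_NAME.findall(log_text)}


def leftover_pids(filenames):
    """PIDs of fping-stdout.PID files present in a directory listing."""
    pids = set()
    for name in filenames:
        match = STDOUT_NAME.fullmatch(name)
        if match:
            pids.add(int(match.group(1)))
    return pids


# Paths from the bb-network.log excerpt in the first message.
log_text = """
Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.17520
Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.17576
Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.18026
Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.17520
Cannot open fping output file /usr/local/hobbit/server/tmp/fping-stdout.18026
"""

# fping-stdout.* entries from the ls output posted on 5 May.
listing = [
    "fping-stdout.17182", "fping-stdout.18045", "fping-stdout.17471",
    "fping-stdout.28736", "fping-stdout.17520", "fping-stdout.31722",
    "fping-stdout.17576", "fping-stdout.4937", "fping-stdout.17687",
    "fping-stdout.18009", "fping-stdout.18026",
]

overlap = sorted(failed_pids(log_text) & leftover_pids(listing))
print(overlap)  # prints [17520, 17576, 18026]
```

Every PID from the failing log lines has a leftover file in the listing, so it may be worth noting the owner and permissions of those files (ls -la) before clearing them out.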

list Jeff Newman · Mon, 8 May 2006 15:54:20 -0500 ·
Ok, I have compiled and put in the new bbtest-net file. Next time I
get an fping bomb I'll look in the log file and let you know what it
says.

Thanks again,
Jeff