Xymon Mailing List Archive

Conn test fails after server reboot

10 messages in this thread

John Horne · Thu, 12 Jul 2012 10:35:54 +0100
Hello,

Using Xymon 4.3.7 I have noticed that if I reboot the Xymon server then
the 'conn' test fails for all the clients. E.g.:

============================
Thu Jul 12 10:24:11 2012 conn NOT ok 
Service conn on dns1 is not OK : Host does not respond to ping


System unreachable for 5 poll periods (984 seconds)
============================

If, from the server, I run 'ping' to the client then that works fine. So
does fping. If I stop then start the Xymon service on the server then
the client conn tests all report ok.


Any ideas about this?


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX

Jeremy Laidman · Fri, 13 Jul 2012 14:45:01 +1000
How long did you wait between the reboot and restarting Xymon?

John Horne · Fri, 13 Jul 2012 09:38:54 +0100
On Fri, 2012-07-13 at 14:45 +1000, Jeremy Laidman wrote:
How long did you wait between the reboot and restarting Xymon?

Hello,

I have waited various amounts of time, from as soon as I could log in
(about a minute or two since rebooting), up to about an hour.

I should have added that after a reboot, and when the conn tests are
red, then they stay red! Yet the clients are all up and running, and are
pingable. At what time I restart Xymon seems to make no difference, once
it is done then the tests start to turn green.

I can only assume that there is some initial condition which causes the
ping to fail, but that it remains in force until Xymon is restarted.
Very odd. I will investigate, but am a little lost as to why, say after
5, 10, 60 (!) mins, the tests do not automatically turn green.

I added 'trace' to one client in hosts.cfg, and it shows the traceroute
working fine but the test is still red and saying the ping failed.


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX

Steven Carr · Fri, 13 Jul 2012 10:02:41 +0100
What's the ping command set to in your server configuration file? Are you
using the 'xymonping' command or 'fping'? Make sure that whichever command
you are using has the setuid bit set on the actual executable to allow the
xymon user to run it.

Steve

John Horne · Fri, 13 Jul 2012 10:11:40 +0100
On Fri, 2012-07-13 at 10:02 +0100, Steven Carr wrote:
What's the ping command set to in your server configuration file? Are
you using the 'xymonping' command or 'fping'? Make sure that whichever
command you are using has the setuid bit set on the actual executable
to allow the xymon user to run it.

It is set to use fping. The pathname is correct, and the setuid bit is
set. I have run fping from the Xymon server as the xymon user and it
works fine:

===============================
xymon 17: fping -Ae 141.163.1.250 141.163.177.1
141.163.1.250 is alive (0.43 ms)
141.163.177.1 is alive (0.35 ms)
===============================


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX

Xymon User in Richmond · Fri, 13 Jul 2012 09:01:37 -0400
Just a WAG: could Xymon be getting started before the network interfaces
and be locked onto localhost as a route, or in some other ambiguous
networking state?  How's it getting started at boot?

John Horne · Fri, 13 Jul 2012 17:33:13 +0100
Hello,

Sorry, but this turned out to be an SELinux problem. 'fping' is denied
write access to files in the ~/server/tmp directory on the Xymon server.
However, fping records its results in that directory, and Xymon looks at
them to see if a client is alive or not. Since there were no results,
because of SELinux, Xymon figured that all the clients were down.
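
The failure mode described above can be illustrated with a small sketch. This
is not Xymon's actual code: `classify_hosts` is a hypothetical helper, and the
only assumption is that the ping results look like the fping lines shown
earlier in this thread ("<address> is alive (...)"). When the results are
empty, every host is treated as down even though each one is reachable.

```python
import re

# Matches lines such as "141.163.1.250 is alive (0.43 ms)".
ALIVE_RE = re.compile(r"^(\S+) is alive\b")


def classify_hosts(expected_hosts, fping_output):
    """Map each expected host to True if an 'is alive' line was seen for it."""
    alive = set()
    for line in fping_output.splitlines():
        match = ALIVE_RE.match(line.strip())
        if match:
            alive.add(match.group(1))
    return {host: host in alive for host in expected_hosts}


hosts = ["141.163.1.250", "141.163.177.1"]

# Results in the format shown earlier in the thread.
normal_output = (
    "141.163.1.250 is alive (0.43 ms)\n"
    "141.163.177.1 is alive (0.35 ms)\n"
)

# What a reader of the results sees if fping could not write anything.
blocked_output = ""

print(classify_hosts(hosts, normal_output))
print(classify_hosts(hosts, blocked_output))
```

The second call marks both hosts as down purely because no results were
recorded, which matches the symptom of all conn tests going red at once.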

I have created a local SELinux policy to allow writes for fping and that
seems to work. (I have rebooted the Xymon server and it didn't show any
red ping/conn tests.)

The clients don't use 'fping' so they don't have this problem.

Why did restarting the Xymon service (not the server) allow the tests to
turn green? Not sure.


Thanks for all the replies.

John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX

Jeremy Laidman · Tue, 17 Jul 2012 16:30:33 +1000
On Fri, Jul 13, 2012 at 6:38 PM, John Horne <user-e95f1ec2f147@xymon.invalid> wrote:
I should have added that after a reboot, and when the conn tests are
red, then they stay red! Yet the clients are all up and running, and are
pingable. At what time I restart Xymon seems to make no difference, once
it is done then the tests start to turn green.

This symptom is probably significant, but I can't think what might cause
it. Once we know, it will all make sense!

Does tcpdump/snoop show the ping packets before the restart of Xymon?

J

Japheth Cleaver · Tue, 17 Jul 2012 03:51:04 -0700 (PDT)
On Thu, 2012-07-12 at 10:35 +0100, John Horne wrote:
Why did restarting the Xymon service (not the server) allow the tests to
turn green? Not sure.

SELinux policies distinguish between appending, writing, and seeking in
many cases. I don't recall the details, but I remember needing to futz
with different policies to figure out what was going on as well. Was
anything interesting going on in the audit logs at the time?

-jc

John Horne · Tue, 17 Jul 2012 12:58:13 +0100
On Tue, 2012-07-17 at 03:51 -0700, user-87556346d4af@xymon.invalid wrote:
SELinux policies distinguish between appending, writing, and seeking in
many cases. I don't recall the details, but I remember needing to futz
with different policies to figure out what was going on as well. Was
anything interesting going on in the audit logs at the time?

Hi,

Nothing else was going on in the logs at the time that the fpings were
stopped. The log showed that it was a write denial:

=============================
type=AVC msg=audit(1342195229.681:349): avc:  denied  { write } for
pid=25973 comm="fping"
path="/home/xymon/server/tmp/ping-stderr.25955.00" dev=sdb1 ino=1587865
scontext=system_u:system_r:ping_t:s0
tcontext=system_u:object_r:user_home_t:s0 tclass=file
=============================

Using audit2allow to create a policy allowing writes in 'tmp' solved the
problem.
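
For readers unfamiliar with AVC records, the sketch below pulls out the fields
that matter in the denial above. `parse_avc` and `selinux_type` are
hypothetical helper names, not part of any SELinux tooling, and the parsing
assumes only the key=value layout visible in this one record.

```python
import re

# The record from the audit log above, including the mail client's wrapping.
RECORD = """type=AVC msg=audit(1342195229.681:349): avc:  denied  { write } for
pid=25973 comm="fping"
path="/home/xymon/server/tmp/ping-stderr.25955.00" dev=sdb1 ino=1587865
scontext=system_u:system_r:ping_t:s0
tcontext=system_u:object_r:user_home_t:s0 tclass=file"""


def selinux_type(context):
    """Return the type field (third part) of a user:role:type:level context."""
    parts = context.split(":")
    return parts[2] if len(parts) > 2 else ""


def parse_avc(record):
    """Extract the denied permissions and the main key=value fields."""
    text = " ".join(record.split())  # collapse wrapping and repeated spaces
    perms = re.search(r"denied\s+\{([^}]*)\}", text)
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', text))
    return {
        "permissions": perms.group(1).split() if perms else [],
        "comm": fields.get("comm", "").strip('"'),
        "path": fields.get("path", "").strip('"'),
        "source_type": selinux_type(fields.get("scontext", "")),
        "target_type": selinux_type(fields.get("tcontext", "")),
        "tclass": fields.get("tclass", ""),
    }


denial = parse_avc(RECORD)
for key, value in denial.items():
    print(f"{key}: {value}")
```

Read this way, the record says that fping, running in the ping_t domain, was
denied write on a file labelled user_home_t, which is the access that the
local policy then allowed.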


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX