Xymon Mailing List Archive

Issues after upgrade from 4.2.3 to 4.3.5

5 messages in this thread

list Ryan Skadberg · Wed, 26 Oct 2011 18:01:28 -0400 ·
Hi All -

  I've been running hobbit/xymon for a long time, but I've finally just
joined the list due to a number of issues after I upgraded from 4.2.3 to
4.3.5.  I walked through and fixed all the file names and everything SEEMS
ok, but am seeing a number of strange things:

a) When I start up xymon, I don't get my previous state back.  The xymond
options have:

--restart=/usr/lib/xymon/server/tmp/xymond.chk
--checkpoint-file=/usr/lib/xymon/server/tmp/xymond.chk

And the file has data:

8136 /usr/lib/xymon/server/tmp/xymond.chk

But when I restart the server or service, I seem to lose all data, and
anything that has been disabled or acked is once again back to its
starting state.

b) I can't seem to add new machines.  I add a machine to hosts.cfg and
analysis.cfg and it shows up on the web page, but it never seems to
actually receive any data.  I've tried reloading the service, but that
doesn't seem to help.  I even tried something like:

xymon localhost 'enable machine.company.com'

But it did not help.  I see on the Ghost client page that it seems to
recognize the correlation as it has a candidate, but isn't doing anything.
Do I need to do something different with this newer version to enable a
machine instead of just adding it to the two files?

c) Last issue is very similar to the previous issue.  I removed a machine
from the hosts.cfg file, then ran:

xymon localhost 'drop machine.company.com'

But when it didn't check in for an hour, it still seemingly went purple and
sent out emails/pages.  I did a reload on the service and this now seems to
have stopped, but I don't think it should have happened in the first place.

d) I get startup errors that don't seem to make a ton of sense:

In xymonlaunch.log, I see:

2011-10-25 12:11:51 xymonlaunch starting
2011-10-25 12:11:51 Loading tasklist configuration from /usr/lib/xymon/server/etc/tasks.cfg
2011-10-25 12:11:51 Cannot open directory
2011-10-25 12:11:51 Loading hostnames
2011-10-25 12:11:51 Cannot load host data
2011-10-25 12:11:51 Loading saved state
2011-10-25 12:11:51 Cannot access checkpoint file /usr/lib/xymon/server/tmp/xymond.chk for restore
2011-10-25 12:11:51 Setting up network listener on 0.0.0.0:1984
2011-10-25 12:11:51 Setting up signal handlers
2011-10-25 12:11:51 Setting up xymond channels
2011-10-25 12:11:51 Setting up logfiles


As I said, I am seeing all of the hosts that I have in the system and have
double checked the permissions and the xymon user can most definitely access
all of the files, so why am I getting these errors?

[xymon@vir5ob xymon]$ whoami
xymon
[xymon@vir5ob xymon]$ wc -l /usr/lib/xymon/server/tmp/xymond.chk
8136 /usr/lib/xymon/server/tmp/xymond.chk
[xymon@vir5ob xymon]$ ls -als /usr/lib/xymon/server/tmp/xymond.chk
31756 -rw-rw-r-- 1 xymon xymon 32474378 Oct 26 17:54 /usr/lib/xymon/server/tmp/xymond.chk
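
The commands above look only at the checkpoint file itself. Access can
also fail when one of the parent directories is missing or not searchable
by the xymon user, so a minimal sketch that walks the whole path may help
narrow this down. The function name check_path_access is made up for
illustration, and a permission problem is only one possible cause, not a
confirmed one:

```shell
#!/bin/sh
# Hypothetical helper: report the topmost parent directory that the current
# user cannot search, or the file itself if it cannot be read.
check_path_access() {
    path=$1
    blocked=""
    dir=$(dirname "$path")
    # Walk upwards; the last failure recorded is the one closest to the root.
    while :; do
        [ -x "$dir" ] || blocked=$dir
        case $dir in
            /|.) break ;;
        esac
        dir=$(dirname "$dir")
    done
    if [ -n "$blocked" ]; then
        echo "missing or not searchable: $blocked"
        return 1
    fi
    if [ ! -r "$path" ]; then
        echo "missing or not readable: $path"
        return 1
    fi
    echo "ok: $path"
}
```

It would need to be run as the xymon user to be meaningful, for example
with check_path_access /usr/lib/xymon/server/tmp/xymond.chk.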

I am also seeing:

2011-10-26 17:54:50 Cannot load host data

in my xymond.log every 10 minutes.  As I said, all the files seem to be
accessible as far as I can tell, but since the error message is not
very verbose, maybe I am not looking in the right place.
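
To see which messages recur and how often, the timestamps can be stripped
and the remaining text counted. A minimal sketch, assuming every log line
starts with a "YYYY-MM-DD HH:MM:SS " timestamp as in the excerpts above;
the function name tally_log_messages is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count how often each message occurs in a log file,
# ignoring the leading "YYYY-MM-DD HH:MM:SS " timestamp on every line.
tally_log_messages() {
    # $1 = path to the log file
    sed 's/^[0-9-]* [0-9:]* //' "$1" | sort | uniq -c | sort -rn
}
```

Usage would be tally_log_messages followed by the path to xymond.log,
which depends on the installation.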

I know these probably all seem like newbie questions, but I have done all
the debugging I seem to be able to do and can't figure anything out on
these.  Any help would be greatly appreciated.

Thanks!
Skadz
list Ryan Skadberg · Thu, 3 Nov 2011 12:51:34 -0400 ·
Any help here?  I've tried a number of different things to attempt to solve
these on my own, but nothing has worked.  Any help would be greatly
appreciated.

Thanks!
Skadz



list Ryan Novosielski · Thu, 03 Nov 2011 13:12:43 -0400 ·
You should not have ghost clients. If you do, the host and the client
are not seeing eye to eye about what to call the machine. Sometimes this
happens where the client provides the short name and you've entered the
long one. I believe a CLIENT= statement on the line is what fixes that.
I don't know if that helps -- your e-mail is a little unclear altogether.
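
A minimal sketch of what such an entry might look like, assuming the tag
is spelled CLIENT:name as described in hosts.cfg(5) and that the client
reports itself under the short name "machine". The IP address and the
short name are placeholders, not values from this thread:

```
# hosts.cfg: IP address and short client name are placeholders
192.0.2.10   machine.company.com   # CLIENT:machine
```

With an entry like this, data sent by the client under the short name
should be matched to the machine.company.com entry.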


-- 
---- _  _ _  _ ___  _  _  _

|Y#| |  | |\/| |  \ |\ |  | |Ryan Novosielski - Sr. Systems Programmer
|$&| |__| |  | |__/ | \| _| |user-ae4522577e16@xymon.invalid - 973/972.0922 (2-0922)
\__/ Univ. of Med. and Dent.|IST/CST-Academic Svcs. - ADMC 450, Newark
list Martin Flemming · Thu, 3 Nov 2011 19:12:24 +0100 (CET) ·
Hi, Ryan !

Until an hour ago I had the same issues and was just as unlucky as you :-(

Unfortunately I don't know how I solved it, but now it runs like a charm ...

But I think it was something with xymonlaunch and the start script ...

Please debug the scripts

  /etc/init.d/xymon and
  /usr/lib/xymon/server/bin/xymon.sh

with set -x and watch the logfiles.

Maybe, and hopefully, it helps ...
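
One way to get the same trace without editing the scripts is to run them
under "sh -x" and keep the output. This is a sketch of that variant, not
the literal "set -x" edit suggested above; the function name trace_script
and the trace file location are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: run a script under "sh -x" and save the command trace.
# $1 = script to trace, $2 = file that receives the trace, rest = script args
trace_script() {
    script=$1
    trace_file=$2
    shift 2
    # -x prints each command to stderr before running it, so stdout and
    # stderr are captured together to keep the trace next to the output.
    sh -x "$script" "$@" > "$trace_file" 2>&1
}
```

For example, trace_script /etc/init.d/xymon /tmp/xymon-init.trace restart,
assuming the init script accepts a restart argument. If a script relies on
bash features, bash -x would be needed instead of sh -x, and a script
started from within the traced one is only traced if it is run the same way.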

cheers,
 	martin


list Ryan Skadberg · Tue, 13 Dec 2011 10:32:02 -0500 ·
Just to complete the loop here, the 4.3.6 upgrade fixed all the issues I
was seeing.  Guess whatever was fixed with the host loading fixed my issues.

Skadz

