Xymon Mailing List Archive

Bug? Dropping host doesn't remove hostdata

10 messages in this thread

John Horne · Tue, 02 Jul 2013 11:17:50 +0100
Hello,

Using Xymon 4.3.10 I wanted to remove a host completely, so used the
following command on the Xymon server:

   xymon xxx.xxx.xxx.xxx "drop lib-srvr10"

This removed all of the host data except for the 'lib-srvr10'
subdirectory in the 'data/hostdata' directory. The subdirectory still
contained all of the data files for the host.

I had already removed the host from the hosts.cfg file, and restarted
Xymon. No errors were seen in any of the log files.


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX
Japheth Cleaver · Wed, 17 Jul 2013 00:38:10 -0000 (UTC)
John Horne wrote:
Hello,

Using Xymon 4.3.10 I wanted to remove a host completely, so used the
following command on the Xymon server:

   xymon xxx.xxx.xxx.xxx "drop lib-srvr10"

This removed all of the host data except for the 'lib-srvr10'
subdirectory in the 'data/hostdata' directory. The subdirectory still
contained all of the data files for the host.

I had already removed the host from the hosts.cfg file, and restarted
Xymon. No errors were seen in any of the log files.
Hmm. Just ran this myself on the occasion of having a host to drop and it
seemed to be working okay. Worked both for a host I was dropping "live"
and one that had already been deleted a restart or two ago ; in both cases
the directory under hostdata/* was removed properly.

Is this still occurring for you? Had it happened before?


Regards,

-jc
John Horne · Thu, 18 Jul 2013 10:26:04 +0100
On Wed, 2013-07-17 at 00:38 +0000, user-87556346d4af@xymon.invalid wrote:
Hello,

Using Xymon 4.3.10 I wanted to remove a host completely, so used the
following command on the Xymon server:

   xymon xxx.xxx.xxx.xxx "drop lib-srvr10"

This removed all of the host data except for the 'lib-srvr10'
subdirectory in the 'data/hostdata' directory. The subdirectory still
contained all of the data files for the host.

I had already removed the host from the hosts.cfg file, and restarted
Xymon. No errors were seen in any of the log files.
Hmm. Just ran this myself on the occasion of having a host to drop and it
seemed to be working okay. Worked both for a host I was dropping "live"
and one that had already been deleted a restart or two ago ; in both cases
the directory under hostdata/* was removed properly.

Is this still occurring for you? Had it happened before?
I dropped two hosts and neither had their directory removed under
hostdata.

I think it has happened before because when it happened this time I
thought I had already reported it as a bug. I couldn't find any record
of that though, hence the report this time.


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX
John Horne · Thu, 18 Jul 2013 18:08:02 +0100
On Wed, 2013-07-17 at 00:38 +0000, user-87556346d4af@xymon.invalid wrote:
Hello,

Using Xymon 4.3.10 I wanted to remove a host completely, so used the
following command on the Xymon server:

   xymon xxx.xxx.xxx.xxx "drop lib-srvr10"

This removed all of the host data except for the 'lib-srvr10'
subdirectory in the 'data/hostdata' directory. The subdirectory still
contained all of the data files for the host.

I had already removed the host from the hosts.cfg file, and restarted
Xymon. No errors were seen in any of the log files.
Hmm. Just ran this myself on the occasion of having a host to drop and it
seemed to be working okay. Worked both for a host I was dropping "live"
and one that had already been deleted a restart or two ago ; in both cases
the directory under hostdata/* was removed properly.

Is this still occurring for you? Had it happened before?
I can probably test this with some test servers. However, where does the
deletion of 'hostdata' directories occur? I'll add some
logging/debugging to see if I can see what is going on.


John.

-- 
John Horne, Plymouth University, UK
Tel: +XX (X)XXXX XXXXXX    Fax: +XX (X)XXXX XXXXXX
Japheth Cleaver · Sat, 20 Jul 2013 19:10:03 -0000 (UTC)
On Wed, 2013-07-17 at 00:38 +0000, user-87556346d4af@xymon.invalid wrote:
Hello,

Using Xymon 4.3.10 I wanted to remove a host completely, so used the
following command on the Xymon server:

   xymon xxx.xxx.xxx.xxx "drop lib-srvr10"

This removed all of the host data except for the 'lib-srvr10'
subdirectory in the 'data/hostdata' directory. The subdirectory still
contained all of the data files for the host.

I had already removed the host from the hosts.cfg file, and restarted
Xymon. No errors were seen in any of the log files.
Hmm. Just ran this myself on the occasion of having a host to drop and it
seemed to be working okay. Worked both for a host I was dropping "live"
and one that had already been deleted a restart or two ago ; in both cases
the directory under hostdata/* was removed properly.

Is this still occurring for you? Had it happened before?
I can probably test this with some test servers. However, where does the
deletion of 'hostdata' directories occur? I'll add some
logging/debugging to see if I can see what is going on.
The deletion is handled by xymond_hostdata.c when it gets a @@drophost
message (which gets sent to all channels by xymond and bypasses
xymond_channel filters). It's around 263 in the .c code in my copy.


== snip ==
                else if ((metacount > 3) && (strncmp(metadata[0], "@@drophost", 10) == 0)) {
                        /* @@drophost|timestamp|sender|hostname */
                        char hostdir[PATH_MAX];
                        sprintf(hostdir, "%s/%s", clientlogdir, metadata[3]);
                        dropdirectory(hostdir, 1);
                }
== snip ==
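
As an illustration of what the quoted branch does, here is a minimal, standalone sketch in C. It is not Xymon code: `split_meta` and `drophost_dir` are hypothetical helper names, and the sanity checks on the host name and the `snprintf` length check are additions for the example. It only assumes the `@@drophost|timestamp|sender|hostname` layout described in the comment above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_META 8

/* Split msg in place on '|' and store up to max field pointers in meta[].
 * Returns the number of fields found. (Hypothetical helper, not Xymon code.) */
static int split_meta(char *msg, char *meta[], int max)
{
    int count = 0;
    char *p = msg;

    assert(msg != NULL && meta != NULL);
    while (count < max) {
        meta[count++] = p;
        p = strchr(p, '|');
        if (p == NULL) break;
        *p++ = '\0';            /* terminate this field, step to the next */
    }
    return count;
}

/* Build "<basedir>/<hostname>" from a @@drophost message.
 * Returns 0 on success, -1 if the message is not a usable @@drophost
 * message or the resulting path would not fit in out[]. */
static int drophost_dir(char *msg, const char *basedir, char *out, size_t outsz)
{
    char *meta[MAX_META];
    int n = split_meta(msg, meta, MAX_META);
    int len;

    /* Same test as the quoted snippet: more than 3 fields, matching prefix. */
    if (n <= 3 || strncmp(meta[0], "@@drophost", 10) != 0) return -1;

    /* Extra caution for this sketch: refuse empty names and names with '/'. */
    if (meta[3][0] == '\0' || strchr(meta[3], '/') != NULL) return -1;

    /* snprintf reports truncation, unlike the sprintf in the quoted code. */
    len = snprintf(out, outsz, "%s/%s", basedir, meta[3]);
    return (len < 0 || (size_t)len >= outsz) ? -1 : 0;
}
```

Called with a writable copy of `@@drophost#22/*|1374514379.136739|141.163.66.133|jhvm2` and a base directory of `/home/xymon/data/hostdata`, `drophost_dir` fills the buffer with `/home/xymon/data/hostdata/jhvm2`. Actually removing the directory (what `dropdirectory` does in Xymon) is left out of the sketch.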


At the risk of stating the obvious :) , did you double-check owner and
permissions on the dir?
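
For anyone wanting to script that permissions check, a rough sketch in C follows. `can_remove_dir` is a hypothetical helper, not part of Xymon. It assumes a POSIX system and must be run as the user Xymon runs as. It ignores sticky-bit rules on the parent directory and does not look at the files inside the directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <libgen.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define PATH_BUF 4096

/* Returns 1 if the calling user appears able to remove `dir` and the
 * entries directly inside it, 0 if a permission is missing, and -1 if
 * `dir` is not a directory (or the path is too long for this sketch).
 * Note: access() checks the real UID/GID, not the effective one. */
static int can_remove_dir(const char *dir)
{
    struct stat st;
    char copy[PATH_BUF];

    assert(dir != NULL);
    if (strlen(dir) >= sizeof(copy)) return -1;
    if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode)) return -1;

    /* Deleting entries inside dir needs write+search permission on dir. */
    if (access(dir, W_OK | X_OK) != 0) return 0;

    /* Removing dir itself needs write+search permission on its parent.
     * dirname() may modify its argument, so it works on a copy. */
    strcpy(copy, dir);
    if (access(dirname(copy), W_OK | X_OK) != 0) return 0;

    return 1;
}
```

A result of 1 only says the basic permission bits look right; it does not prove that the drop message ever reaches the process that performs the deletion.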


-jc
John Horne · Mon, 22 Jul 2013 12:21:56 +0100
On Sat, 2013-07-20 at 19:10 +0000, user-87556346d4af@xymon.invalid wrote:
On Wed, 2013-07-17 at 00:38 +0000, user-87556346d4af@xymon.invalid wrote:
Hello,

Using Xymon 4.3.10 I wanted to remove a host completely, so used the
following command on the Xymon server:

   xymon xxx.xxx.xxx.xxx "drop lib-srvr10"

This removed all of the host data except for the 'lib-srvr10'
subdirectory in the 'data/hostdata' directory. The subdirectory still
contained all of the data files for the host.

I had already removed the host from the hosts.cfg file, and restarted
Xymon. No errors were seen in any of the log files.
Hmm. Just ran this myself on the occasion of having a host to drop and it
seemed to be working okay. Worked both for a host I was dropping "live"
and one that had already been deleted a restart or two ago ; in both cases
the directory under hostdata/* was removed properly.

Is this still occurring for you? Had it happened before?
I can probably test this with some test servers. However, where does the
deletion of 'hostdata' directories occur? I'll add some
logging/debugging to see if I can see what is going on.
The deletion is handled by xymond_hostdata.c when it gets a @@drophost
message (which gets sent to all channels by xymond and bypasses
xymond_channel filters). It's around 263 in the .c code in my copy.


== snip ==
                else if ((metacount > 3) && (strncmp(metadata[0], "@@drophost", 10) == 0)) {
                        /* @@drophost|timestamp|sender|hostname */
                        char hostdir[PATH_MAX];
                        sprintf(hostdir, "%s/%s", clientlogdir, metadata[3]);
                        dropdirectory(hostdir, 1);
                }
== snip ==


At the risk of stating the obvious :) , did you double-check owner and
permissions on the dir?
Hello,

Yes I rechecked the ownership and permissions on the directory path and
filenames. They are fine :-)

I've started xymond_hostdata with the '--debug' option, but am not
seeing the '@@drophost' command being received. The main xymond log sees
the actual 'drop jhvm2' command though ('jhvm2' is the test host name).
So I am currently testing the above bit of code to see that it is
actually being reached.


John.

-- 
John Horne                   Tel: +XX (X)XXXX XXXXXX
Plymouth University, UK      Fax: +XX (X)XXXX XXXXXX
John Horne · Mon, 22 Jul 2013 18:57:14 +0100
On Mon, 2013-07-22 at 12:21 +0100, John Horne wrote:
I've started xymond_hostdata with the '--debug' option, but am not
seeing the '@@drophost' command being received. The main xymond log sees
the actual 'drop jhvm2' command though ('jhvm2' is the test host name).
So I am currently testing the above bit of code to see that it is
actually being reached.
Well I'm a bit stumped with this.

I have added several dbgprintf statements (which begin with 'JH:') to
both xymond.c and xymond_hostdata.c. I also modified tasks.cfg so that
xymond and xymond_hostdata started with the '--debug' option.

The log files show the command being received and sent to the xymon
channels. However, the hostdata.log does not show it being received.

From xymond.log:

====================================
29942 2013-07-22 18:32:59 -> do_message/1 (10 bytes): drop jhvm2
29942 2013-07-22 18:32:59 -> update_statistics
29942 2013-07-22 18:32:59 <- update_statistics
29942 2013-07-22 18:32:59 -> oksender
29942 2013-07-22 18:32:59 <- oksender(1-a)
29942 2013-07-22 18:32:59 -> handle_dropnrename
29942 2013-07-22 18:32:59 JH: In handle_dropnrename: host is jhvm2
29942 2013-07-22 18:32:59 JH: About to call posttochannel: statuschn
29942 2013-07-22 18:32:59 -> posttochannel
29942 2013-07-22 18:32:59 JH: In posttochannel: readymsg
29942 2013-07-22 18:32:59 JH: In posttochannel: command is: @@drophost#195/*|1374514379.136454|141.163.66.133|jhvm2
29942 2013-07-22 18:32:59 Posting message 195 to 1 readers
29942 2013-07-22 18:32:59 <- posttochannel
29942 2013-07-22 18:32:59 JH: About to call posttochannel: stachgchn
29942 2013-07-22 18:32:59 -> posttochannel
29942 2013-07-22 18:32:59 JH: In posttochannel: readymsg
29942 2013-07-22 18:32:59 JH: In posttochannel: command is: @@drophost#195/*|1374514379.136550|141.163.66.133|jhvm2
29942 2013-07-22 18:32:59 Posting message 195 to 1 readers
29942 2013-07-22 18:32:59 <- posttochannel
29942 2013-07-22 18:32:59 JH: About to call posttochannel: pagechn
29942 2013-07-22 18:32:59 -> posttochannel
29942 2013-07-22 18:32:59 JH: In posttochannel: readymsg
29942 2013-07-22 18:32:59 JH: In posttochannel: command is: @@drophost#1/*|1374514379.136624|141.163.66.133|jhvm2
29942 2013-07-22 18:32:59 Posting message 1 to 1 readers
29942 2013-07-22 18:32:59 <- posttochannel
29942 2013-07-22 18:32:59 JH: About to call posttochannel: datachn
29942 2013-07-22 18:32:59 -> posttochannel
29942 2013-07-22 18:32:59 JH: In posttochannel: readymsg
29942 2013-07-22 18:32:59 JH: In posttochannel: command is: @@drophost#22/*|1374514379.136739|141.163.66.133|jhvm2
29942 2013-07-22 18:32:59 Posting message 22 to 1 readers
29942 2013-07-22 18:32:59 <- posttochannel
29942 2013-07-22 18:32:59 JH: About to call posttochannel: noteschn
29942 2013-07-22 18:32:59 -> posttochannel
29942 2013-07-22 18:32:59 Dropping message - no readers
29942 2013-07-22 18:32:59 JH: About to call posttochannel: enadischn
29942 2013-07-22 18:32:59 -> posttochannel
29942 2013-07-22 18:32:59 Dropping message - no readers
29942 2013-07-22 18:32:59 JH: About to call posttochannel: clientchn
29942 2013-07-22 18:32:59 -> posttochannel
29942 2013-07-22 18:32:59 JH: In posttochannel: readymsg
29942 2013-07-22 18:32:59 JH: In posttochannel: command is: @@drophost#4/*|1374514379.136890|141.163.66.133|jhvm2
29942 2013-07-22 18:32:59 Posting message 4 to 1 readers
29942 2013-07-22 18:32:59 <- posttochannel
====================================

Basically this shows 'handle_dropnrename' being called with the host name
'jhvm2'. It then calls 'posttochannel' for each channel, where the
message is either dropped if there are no readers, or is sent on with
the '@@drophost... jhvm2' command.


From hostdata.log:

====================================
21100 2013-07-22 18:07:59 JH: xymond_hostdata starting: clientlogdir is: /home/xymon/data/hostdata
21100 2013-07-22 18:07:59 Want msg 1, startpos 0, fillpos 0, endpos -1, usedbytes=0, bufleft=2101247
21100 2013-07-22 18:07:59 Got 44 bytes
21100 2013-07-22 18:07:59 xymond_hostdata: Got message 1 @@shutdown#1/*|1374512879.482598|xymond|
21100 2013-07-22 18:07:59 startpos 44, fillpos 44, endpos -1
2013-07-22 18:31:16 Peer not up, flushing message queue
====================================


So the command is accepted by xymond and sent on, but not received by
xymond_hostdata.

Unfortunately (?) this IPC is controlled by semaphores, so working out
why xymond_hostdata does not pick up the message may be difficult.


John.

-- 
John Horne, Plymouth University, UK
Tel: +XX (X)XXXX XXXXXX    Fax: +XX (X)XXXX XXXXXX
Japheth Cleaver · Mon, 22 Jul 2013 18:40:21 -0000 (UTC)
On Mon, 2013-07-22 at 12:21 +0100, John Horne wrote:
I've started xymond_hostdata with the '--debug' option, but am not
seeing the '@@drophost' command being received. The main xymond log sees
the actual 'drop jhvm2' command though ('jhvm2' is the test host name).
So I am currently testing the above bit of code to see that it is
actually being reached.
Well I'm a bit stumped with this.

I have added several dbgprintf statements (which begin with 'JH:') to
both xymond.c and xymond_hostdata.c. I also modified tasks.cfg so that
xymond and xymond_hostdata started with the '--debug' option.

The log files show the command being received and sent to the xymon
channels. However, the hostdata.log does not show it being received.
*snip*
So the command is accepted by xymond and sent on, but not received by
xymond_hostdata.

Unfortunately (?) this IPC is controlled by semaphores, so seeing as to
why xymond_hostdata does not pick up the message may be difficult.
A ha! It all makes sense now... :) The root of this is actually a known
issue: drop commands not getting sent to the CLICHG channel, which is why
it only shows up here and not, say, xymond_rrd... xymond_hostdata is the
only built-in that uses this one.
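
A gap like this can also be spotted mechanically from a debug log such as the xymond.log excerpt earlier in the thread. The sketch below is one way to do it in C, under stated assumptions. The marker text `About to call posttochannel: ` comes from the custom 'JH:' debug statements, not from stock Xymon logging. The first seven channel names are taken from that log, and `clichgchn` is an assumed name for the CLICHG channel. `missing_channels` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Channel names as printed in the debug log quoted in this thread.
 * "clichgchn" is an ASSUMED name for the CLICHG channel. */
static const char *expected[] = {
    "statuschn", "stachgchn", "pagechn", "datachn",
    "noteschn", "enadischn", "clientchn", "clichgchn"
};
#define N_EXPECTED (sizeof(expected) / sizeof(expected[0]))

/* Text of the custom debug line that precedes each post (an assumption:
 * it matches the 'JH:' statements added for this investigation). */
static const char MARKER[] = "About to call posttochannel: ";

/* Scan log text for marker lines and note which expected channels were
 * posted to. Stores the names never seen in missing[] (up to max) and
 * returns how many were stored. */
static size_t missing_channels(const char *log, const char *missing[], size_t max)
{
    int seen[N_EXPECTED] = {0};
    const char *p = log;
    size_t i, n = 0;

    assert(log != NULL && missing != NULL);
    while ((p = strstr(p, MARKER)) != NULL) {
        size_t len;

        p += sizeof(MARKER) - 1;          /* skip past the marker text */
        len = strcspn(p, " \r\n");        /* channel name runs to end of word */
        for (i = 0; i < N_EXPECTED; i++) {
            if (strlen(expected[i]) == len && strncmp(p, expected[i], len) == 0)
                seen[i] = 1;
        }
    }
    for (i = 0; i < N_EXPECTED && n < max; i++) {
        if (!seen[i]) missing[n++] = expected[i];
    }
    return n;
}
```

Fed the seven "About to call posttochannel" lines from the xymond.log excerpt, this reports a single missing entry, the assumed `clichgchn`, which matches the diagnosis above.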


It's patched in the Terabithia RPMs from last month, but it's not in the
release tarball yet. The attached file should resolve it for you; can you
verify?


HTH,

-jc
Attachments (1)
John Horne · Mon, 22 Jul 2013 21:04:17 +0100
On Mon, 2013-07-22 at 18:40 +0000, user-87556346d4af@xymon.invalid wrote:
On Mon, 2013-07-22 at 12:21 +0100, John Horne wrote:

So the command is accepted by xymond and sent on, but not received by
xymond_hostdata.

Unfortunately (?) this IPC is controlled by semaphores, so seeing as to
why xymond_hostdata does not pick up the message may be difficult.
A ha! It all makes sense now... :) The root of this is actually a known
issue: drop commands not getting sent to the CLICHG channel, which is why
it only shows up here and not, say, xymond_rrd... xymond_hostdata is the
only built-in that uses this one.


It's patched in the Terabithia RPMs from last month, but it's not in the
release tarball yet. The attached file should resolve it for you; can you
verify?
Hello,

Yes that patch works fine :-) The hostdata log now shows the directory
(and files) being deleted (and I checked that they had actually been
deleted from the disk).

Many thanks for your help.


John.

-- 
John Horne, Plymouth University, UK
Tel: +XX (X)XXXX XXXXXX    Fax: +XX (X)XXXX XXXXXX
Henrik Størner · Tue, 23 Jul 2013 15:23:35 +0200
On 22-07-2013 22:04, John Horne wrote:
On Mon, 2013-07-22 at 18:40 +0000, user-87556346d4af@xymon.invalid wrote:
A ha! It all makes sense now... :) The root of this is actually a known
issue: drop commands not getting sent to the CLICHG channel, which is why
it only shows up here and not, say, xymond_rrd... xymond_hostdata is the
only built-in that uses this one.
Yes that patch works fine :-) The hostdata log now shows the directory
(and files) being deleted (and I checked that they had actually been
deleted from the disk).
I've just added this for 4.3.12.


Regards,
Henrik