Xymon Mailing List Archive

msgcache hobbitfetch problems

8 messages in this thread

Steinar M. Skúlason · Mon, 8 Feb 2010 16:37:31 +0000
Hi,

I'm having problems with "msgcache" on the client machines and "hobbitfetch"
on the server machine.
It works for a short period, then gets stuck, and all my client-side
checks end up with status purple.

Since I can't have my clients report straight back to the server, I was
forced to use hobbitfetch/msgcache.
I'm using the latest version of Xymon (4.3.0-beta2).

Has anyone faced similar problems and, hopefully, fixed them?


Best Regards,
Steinar
Daniel McDonald · Mon, 08 Feb 2010 11:36:51 -0600
On 2/8/10 10:37 AM, "Steinar M. Skúlason" <user-3b78224d184c@xymon.invalid> wrote:
Hi,

I'm having problems with "msgcache" on the client machines and "hobbitfetch"
on the server machine.
It works for a short period, then gets stuck, and all my client-side checks
end up with status purple.

Yup.  Been doing that for a long time here.  I sent a bunch of core files to
Henrik about it, and he tried a bunch of patches.  Eventually, we just wrote
a routine that restarts hobbitfetch whenever a host turns purple.

-- 
Daniel J McDonald, CCIE # 2495, CISSP # 78281
Steinar M. Skúlason · Mon, 8 Feb 2010 18:12:30 +0000
OK, good to hear that I am not the only one.
I wrote an ugly routine that restarts hobbitfetch if there is no new entry in
the logfile.


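#!/bin/bash

# Check whether hobbitfetch has made any progress since the last run by
# comparing the first two fields of the last line in its logfile.
LOG_FILE=/usr/lib/xymon/server/log/hobbitfetch.log
TMP_FILE=/tmp/tmp.hobbitfetch.last

LAST_LINE=$(tail -n 1 "$LOG_FILE" | awk '{print $1 $2}')
PREV_LINE=$(cat "$TMP_FILE" 2>/dev/null)   # empty on the first run
echo "$LAST_LINE" > "$TMP_FILE"

if [ "$LAST_LINE" == "$PREV_LINE" ]; then
  echo "Nothing has happened .... killing hobbitfetch!"
  # pgrep -x matches the exact process name, so unlike
  # "ps -ef | grep hobbitfetch" it does not also match the grep itself.
  PID=$(pgrep -x hobbitfetch)
  [ -n "$PID" ] && kill -9 $PID
fi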

Best Regards,
Steinar M.
Cade Robinson · Mon, 08 Feb 2010 12:53:48 -0600
The issue is in the hobbitfetch "grabdata" function.

int n;
char buf[8192];
...
n = read(conn->sockfd, buf, sizeof(buf));
...
else if (n > 0) {
...
buf[n] = '\0';
...

If the read() returns 8192 bytes, then n is 8192 and buf[n] = '\0' tries to
write the terminating NUL to buf[8192]. There is no element 8192 (valid
indices are 0 to 8191), so this writes one byte past the end of the array.
Also, only one read happens, so if there are more than 8192 bytes to be
fetched, not everything is fetched.

I put the "read" and the "if"s in a do...while loop, fixed the NUL
termination on buf, and haven't had any issues since.

~/hobbitmon/trunk/hobbitd:-> diff -u hobbitfetch.c ./hobbitfetch.c.new
--- hobbitfetch.c       2010-02-08 12:43:22.781543905 -0600
+++ ./hobbitfetch.c.new 2010-02-08 12:52:25.249509306 -0600
@@ -342,8 +342,9 @@
     int n;
     char buf[8192];

+    do {
     /* Read data from a peer connection (client or server) */
-        n = read(conn->sockfd, buf, sizeof(buf));
+        n = read(conn->sockfd, buf, sizeof(buf)-2);
     if (n == -1) {
         if ((errno != EINTR) && (errno != EAGAIN)) {
             /* Read failure */
@@ -360,7 +361,7 @@
         /* Save the data */
         dbgprintf("Got %d bytes of data from %s (req %lu)\n",
             n, addrstring(&conn->caddr), conn->seq);
-        buf[n] = '\0';
+        buf[n+1] = '\0';
         addtobuffer(conn->msgbuf, buf);
     }
     else if (n == 0) {
@@ -380,6 +381,7 @@
             break;
         }
     }
+    } while (n>0);
 }

 void set_polltime(clients_t *client)
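The off-by-one described above can be shown in isolation. The following is a minimal sketch assuming a plain file descriptor; the helper name read_chunk is hypothetical and not taken from hobbitfetch.c. It reserves one byte for the terminator by asking read() for at most bufsz - 1 bytes, then writes the NUL at index n, the first byte after the data. Writing it at n + 1 instead would leave buf[n] holding whatever byte happened to be there before.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from hobbitfetch.c): read one chunk from fd into
 * buf, whose total size is bufsz, and NUL-terminate it.
 * Returns read()'s result: > 0 bytes read, 0 at end of file, -1 on error. */
ssize_t read_chunk(int fd, char *buf, size_t bufsz)
{
    assert(bufsz > 0);  /* bufsz - 1 below must not wrap around */

    /* Ask for at most bufsz - 1 bytes so one byte is always free for '\0'. */
    ssize_t n = read(fd, buf, bufsz - 1);

    /* The terminator goes at index n, the first byte after the data.
     * Since n <= bufsz - 1, this index is always inside the array. */
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

For example, with an 8-byte buffer and 8 bytes waiting in a pipe, the first call returns 7 bytes ("ABCDEFG"), the second returns 1 ("H"), and the third returns 0 at end of file, each time leaving a properly terminated string in buf.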
Steinar M. Skúlason · Wed, 10 Feb 2010 09:33:19 +0000
Thank you for your reply, Cade. I tried your patch, but it did not work
for me: with your changes, no checks get populated.
Are you using 4.3.0-beta2 for both client and server?
Or did you also make changes to msgcache.c?

Regards,
Steinar M.
Cade Robinson · Wed, 10 Feb 2010 08:25:39 -0600
I think I know why.
My editor converted tabs to spaces, and I bet your file is tab-indented.

You will probably have to edit hobbitfetch.c by hand: add the couple of
new lines and change the other two.
I will see about getting a patch against the 4.3.0-beta2 files, and a
4.2.3 one if it is needed.
Cade Robinson · Wed, 10 Feb 2010 13:54:58 -0600
I am not sure how this patch was working for me.
It fixes the case where the fetched data is 8192 bytes or more, but it
breaks when the read returns fewer than 8192 bytes.
The initial read works fine, but on the second pass through the do/while
loop n is -1, since there is nothing left to read and the remote end has
closed.

So it thinks there is a failure.
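The problem above comes down to how a read loop interprets read()'s return value. The sketch below shows one common way to structure such a loop; it is not taken from the Xymon source, and drain_fd, struct strbuf and the DRAIN_* codes are hypothetical names. It assumes a non-blocking descriptor (the original code's EAGAIN check suggests one): read() returning 0 means the peer has closed the connection, which is a normal end of data; -1 with EAGAIN/EWOULDBLOCK means nothing more is available right now; and -1 with EINTR means the call should simply be retried.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not from hobbitfetch.c). */
enum drain_status { DRAIN_EOF, DRAIN_AGAIN, DRAIN_ERROR };

/* Hypothetical growable buffer, standing in for something like conn->msgbuf. */
struct strbuf {
    char *data;
    size_t len;
};

/* Append n bytes from src and keep the result NUL-terminated. */
static int strbuf_append(struct strbuf *sb, const char *src, size_t n)
{
    char *p = realloc(sb->data, sb->len + n + 1);
    if (p == NULL)
        return -1;
    memcpy(p + sb->len, src, n);
    sb->len += n;
    p[sb->len] = '\0';
    sb->data = p;
    return 0;
}

/* Read everything currently available on fd into sb. */
enum drain_status drain_fd(int fd, struct strbuf *sb)
{
    char buf[8192];

    assert(sb != NULL);
    for (;;) {
        /* Chunks are copied with an explicit length, so the whole
         * buffer can be used; no room is needed for a terminator. */
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            if (strbuf_append(sb, buf, (size_t)n) != 0)
                return DRAIN_ERROR;   /* out of memory */
        } else if (n == 0) {
            return DRAIN_EOF;         /* peer closed: normal end of data */
        } else if (errno == EINTR) {
            continue;                 /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return DRAIN_AGAIN;       /* nothing more right now: wait for the next poll */
        } else {
            return DRAIN_ERROR;       /* genuine read failure */
        }
    }
}
```

Because each chunk is appended with an explicit length, the local 8192-byte buffer never needs a terminator; only the accumulated buffer is NUL-terminated, and messages longer than 8192 bytes are simply collected over several iterations.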
Steinar M. Skúlason · Thu, 11 Feb 2010 16:37:41 +0000
I recompiled everything, clients and server (with your patch included).
It looks good at the moment, but I'm going to give it a couple of days.

I will report back with my findings.


Regards,
Steinar M.