Xymon Mailing List Archive

Critical Systems view loading problem

7 messages in this thread

Tracy di Marco White · Thu, 6 Sep 2007 21:30:58 -0500
I'm running Hobbit 4.2.0 with the allinone patch on NetBSD 3.x.  I've
started making use of the critical systems setup at work, but after
encouraging all my co-workers to start using it, I've run into a
problem.

I'm getting an "Internal Server Error" and the error log shows
"Premature end of script headers: hobbit-nkview.sh".  My problem seems
to be related to a test that is yellow right now, at a time that is
outside the window in which that machine/test combination is critical.
If I change the critical time for the event from "|W:0800:1700|" to
"||", the critical systems page comes up fine.  If I put the time
constraints back, the page fails to come up again.  It started failing
after 1700, although I didn't notice it for about 15 minutes.  Is
anyone else seeing this problem?
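A minimal C sketch of the kind of time-window check that seems to be involved here: it decides whether a given weekday and HHMM time fall inside a spec such as "W:0800:1700", with an empty field ("||") meaning "always critical". The day codes (W = Monday to Friday, * = every day, digits 0-6 with 0 = Sunday), the comma-separated list of windows, the exclusive end time, and the function name in_timespec() are all assumptions for illustration; this is not the actual code used by hobbit-nkview.

```c
#include <stdio.h>
#include <string.h>

/* Does weekday 'wday' (0 = Sunday .. 6 = Saturday) match the day codes
 * in days[0..len)?  Assumed codes: '*' every day, 'W' Mon-Fri, '0'-'6'. */
static int day_matches(const char *days, size_t len, int wday)
{
    for (size_t i = 0; i < len; i++) {
        char c = days[i];
        if (c == '*') return 1;
        if (c == 'W' && wday >= 1 && wday <= 5) return 1;
        if (c >= '0' && c <= '6' && (c - '0') == wday) return 1;
    }
    return 0;
}

/* Returns 1 if (wday, hhmm) falls inside any DAYS:HHMM:HHMM window of
 * 'spec', 0 otherwise.  An empty spec means "always critical".
 * The end time is treated as exclusive (an assumption). */
int in_timespec(const char *spec, int wday, int hhmm)
{
    if (spec == NULL || *spec == '\0') return 1;

    const char *p = spec;
    while (*p) {
        const char *end = strchr(p, ',');              /* end of this window */
        size_t len = end ? (size_t)(end - p) : strlen(p);
        const char *colon = memchr(p, ':', len);       /* days/time separator */
        int start, stop;

        if (colon && sscanf(colon + 1, "%4d:%4d", &start, &stop) == 2 &&
            day_matches(p, (size_t)(colon - p), wday) &&
            hhmm >= start && hhmm < stop)
            return 1;

        p = end ? end + 1 : p + len;                   /* next window or '\0' */
    }
    return 0;
}
```

With this sketch, in_timespec("W:0800:1700", 4, 1659) returns 1 and in_timespec("W:0800:1700", 4, 1701) returns 0: after 1700 on a weekday the yellow test is no longer inside its critical window, which is the transition described above.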

-Tracy
Henrik Størner · Fri, 7 Sep 2007 11:31:45 +0200
Interesting, it does sound like a bug. Could you send me that line from
the hobbit-nkview.cfg file?


Thanks,
Henrik
Gary Baluha · Fri, 7 Sep 2007 11:05:57 -0400
Not to be another "me too" post, but... I've seen a similar "Internal Server
Error" issue with the critical systems page myself.  I haven't done a test
like Tracy's, though.  The weird thing is that the error is only intermittent,
and sometimes if I just keep reloading the page, it eventually comes up.  More
testing on my end will be required to see whether it's caused by an individual
host's status.

Tracy di Marco White · Mon, 17 Sep 2007 16:43:54 -0500
So I worked with a co-worker to figure out where the problem is.
  In get_nkconfig() in ./lib/loadnkconf.c

                        /* Go to the next */
                        handle = rbtNext(rbconf, handle);
                        if (handle != rbtEnd(rbconf)) {
                                rbtKeyValue(rbconf, handle, &k1, &k2);
                                if (strncmp(realkey, ((nkconf_t *)k2)->key,...

  Here k2 is treated as a pointer to an nkconf_t,
  but it appears that it is really a 'char *'.

That is what he concluded from our poking around with gdb. We haven't
gotten any further in debugging, but hope this helps pin down
the problem.
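Assuming the gdb reading is right and some values stored in the tree are plain strings rather than nkconf_t records (a clone entry holding its master's key would be one plausible case, but that is a guess), one defensive pattern is to tag each stored value with its kind and check the tag before casting. Everything in this sketch is hypothetical: nkrec_t, entry_t, next_record() and the sorted array standing in for the red-black tree are illustrative names, not Hobbit's API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for nkconf_t. */
typedef struct nkrec_t {
    const char *key;              /* e.g. "host|test" */
    int priority;
} nkrec_t;

/* What kind of value an entry carries. */
typedef enum { ENTRY_RECORD, ENTRY_CLONE } entry_kind_t;

typedef struct entry_t {
    const char *key;              /* tree key (only used for ordering here) */
    entry_kind_t kind;
    const void *value;            /* nkrec_t * for records, char * for clones */
} entry_t;

/* Walk forward from index 'start' (emulating repeated rbtNext calls) and
 * return the first real record whose key begins with 'prefix'.  Clone
 * entries are skipped instead of being cast to nkrec_t *, which is the
 * kind of mistake the gdb session points at.  Because keys are sorted,
 * the first record that does not match ends the search. */
const nkrec_t *next_record(const entry_t *entries, size_t n, size_t start,
                           const char *prefix)
{
    size_t plen = strlen(prefix);

    for (size_t i = start; i < n; i++) {
        if (entries[i].kind != ENTRY_RECORD)
            continue;                         /* value is a string, not a record */

        const nkrec_t *rec = entries[i].value;
        if (strncmp(prefix, rec->key, plen) == 0)
            return rec;
        return NULL;
    }
    return NULL;
}
```

The tag costs one field per entry but makes the type of every tree value explicit, so a walk like the one in get_nkconfig() could never dereference a string as a record, whichever entry types end up adjacent in the tree.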

-Tracy
Gary Baluha · Mon, 24 Dec 2007 23:09:19 -0500
As I previously posted, I get this problem every now and then as well.
About a month back, the Critical Systems page suddenly became useless when
it got stuck with that "Internal Server Error" issue.  My co-worker came
across an apparent cause: the file permissions for the
hobbit-nkview.cfg file were wrong, and the --debug option in
hobbitcgi.cfg for hobbit-nkview.cfg was preventing the page from loading.
This now appears NOT to be the case, because the eternal Internal Server
Error problem is back.  It seems it was just a coincidence that he made the
changes when the Critical Systems page started working again.

Also, while I was typing the above, the Critical Systems page started
working again.  I made absolutely no changes to anything during this time.
Unfortunately, as before, I cannot determine any causal relationship.
Additionally, unlike Tracy's problem, it doesn't appear to be related to the
alerts that are showing up either (I can confirm that no alert statuses
changed while I was writing this).

I'm going to have to go with Tracy's assessment that it is a pointer issue.
I do recall from my programming days that incorrect pointer usage can cause
intermittent, non-reproducible errors.  Unfortunately, it's been a while
since I've programmed in C/C++, and I would have to spend a while with the
code to see if this really is the issue, and how to fix it.  All I know is,
it sounds plausible.

Anyone else have any ideas, or am I just going a little off the deep end
with this (which is quite possible)?
Gary Baluha · Mon, 24 Dec 2007 23:11:54 -0500
Actually, one more thing to add...  I take it back: there was an alert
status change between when it was working and when it wasn't, and it is now
again not working correctly.  Also, I just recalled that if I zeroed out the
contents of the hobbit-nkview.cfg file, the Critical Systems page started
working again (albeit with no alerts showing up).
Gary Baluha · Fri, 28 Dec 2007 09:41:59 -0500
Okay, I think I figured out the issue I was having.  I noticed it happen
again this morning, and when I looked at which alerts were showing as
red (on the "all non-green" page), I noticed a pattern: every time the
critical systems page was showing an Internal Server Error, the same alerts
were showing as red on the non-green page.

In short, the hosts look like "A-B-C-[1-4]".  I manually edited the
hobbit-nkview.cfg file to remove all of the staging machines
("A-stag-C-[1-4]"), which are all clones of the same entry.  I found that if
I just deleted the master clone entries, the critical systems page
worked.  As soon as I reverted the file to the previous version, the
problem came back.  I then deleted the clone entries again and re-added them,
and all is working fine now.

So it appears that where the documentation says "don't edit the
hobbit-nkview.cfg file manually", it means it ;-)  Still, it may be useful
to have some sort of script that checks the hobbit-nkview.cfg file for
syntax errors.  As long as the file is named ".cfg" and lives in the same
directory as all the other Hobbit configuration files, I think some people
will be tempted to edit it manually.
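As a starting point for such a check, here is a minimal C sketch that validates only the time-window field, the one hobbit-nkview.cfg field whose format appears in this thread ("W:0800:1700" or an empty "||"). It accepts an empty field or a comma-separated list of DAYS:HHMM:HHMM windows. The accepted day codes (W, *, 0-6), the HHMM range limits, and the name valid_timespec() are assumptions; the other fields of the file, and clone entries, are not checked.

```c
#include <ctype.h>
#include <string.h>

/* Four digits forming a plausible HHMM value (HH <= 24, MM <= 59).
 * Stops at the first non-digit, so it never reads past a '\0'. */
static int valid_hhmm(const char *s)
{
    for (int i = 0; i < 4; i++)
        if (!isdigit((unsigned char)s[i])) return 0;
    int hh = (s[0] - '0') * 10 + (s[1] - '0');
    int mm = (s[2] - '0') * 10 + (s[3] - '0');
    return hh <= 24 && mm <= 59;
}

/* Returns 1 if 'spec' is empty or a comma-separated list of
 * DAYS:HHMM:HHMM windows, where DAYS is one or more of W, * or 0-6. */
int valid_timespec(const char *spec)
{
    if (*spec == '\0') return 1;                    /* "||" = always */

    const char *p = spec;
    for (;;) {
        const char *q = p;
        while (*q == 'W' || *q == '*' || (*q >= '0' && *q <= '6'))
            q++;                                    /* day codes */
        if (q == p || *q != ':') return 0;          /* need >= 1 day code */
        q++;

        if (!valid_hhmm(q) || q[4] != ':') return 0; /* start time */
        q += 5;
        if (!valid_hhmm(q)) return 0;               /* end time */
        q += 4;

        if (*q == '\0') return 1;                   /* last window */
        if (*q != ',') return 0;
        p = q + 1;                                  /* next window */
    }
}
```

For example, valid_timespec("W:0800:1700") returns 1, while a mistyped "W:0800-1700" returns 0; a real checker would also need to understand the remaining fields and the clone records.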