Critical Systems view loading problem
list Tracy di Marco White
I'm running Hobbit 4.2.0 with the allinone patch on NetBSD 3.x. I've started using the critical systems setup at work, but after encouraging all my co-workers to start using it, I've run into a problem.
I'm getting an "Internal Server Error", and the error log shows "Premature end of script headers: hobbit-nkview.sh". The problem seems to be related to a test that is yellow right now, at a time that falls outside the period during which that machine/test combination is defined as critical. If I change the critical time for the event from "|W:0800:1700|" to "||", the critical systems page comes up fine. If I put the time constraints back, the page fails to come up again. It started failing after 1700, although I didn't notice it for about 15 minutes.
Is anyone else seeing this problem?
-Tracy
list Henrik Størner
On Thu, Sep 06, 2007 at 09:30:58PM -0500, Tracy Di Marco White wrote:
I'm getting an "Internal Server Error" and the error log shows "Premature end of script headers: hobbit-nkview.sh". My problem seems to be related to a test being yellow right now, and right now being outside of the parameters of when the machine/test combo is critical. If I change the critical time for the event from "|W:0800:1700|" to "||", the critical systems page comes up fine. If I put the time constraints back, the page fails to come up again. It started failing after 1700, although I didn't notice it for about 15 minutes. Is anyone else seeing this problem?
Interesting, it does sound like a bug. Could you send me that line from the hobbit-nkview.cfg file?
Thanks,
Henrik
list Gary Baluha
Not to be another "me too" post, but... I've seen a similar "Internal Server Error" issue with the critical systems page myself. I haven't done a test like Tracy's, though. The weird thing is that the error is intermittent: sometimes, if I just keep reloading the page, it eventually comes up. I'll need to do more testing on my end to see whether it's caused by an individual host status.
On 9/7/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
list Tracy di Marco White
So I worked with a co-worker to figure out where the problem is.
In get_nkconfig() in ./lib/loadnkconf.c:
    /* Go to the next */
    handle = rbtNext(rbconf, handle);
    if (handle != rbtEnd(rbconf)) {
        rbtKeyValue(rbconf, handle, &k1, &k2);
        if (strncmp(realkey, ((nkconf_t *)k2)->key,...
Here k2 is treated as a pointer to an nkconf_t,
but it appears that it is really a 'char *'.
That is what he concluded from our poking around with gdb. We haven't gotten
any further in debugging, but we hoped this would help pin down
the problem.
-Tracy
list Gary Baluha
As I posted earlier, I get this problem every now and then as well. About a month ago, the Critical Systems page suddenly became useless when it got stuck on that "Internal Server Error". My co-worker came across an apparent fix: the file permissions on the hobbit-nkview.cfg file were wrong, and the --debug option in hobbitcgi.cfg for hobbit-nkview.cfg was preventing the page from loading. That now appears NOT to have been the cause, because the eternal Internal Server Error problem is back. It seems it was just a coincidence that he made those changes when the Critical Systems page started working again.
Also, while I was typing the above, the Critical Systems page started working again. I made absolutely no changes to anything in the meantime. Unfortunately, as before, I cannot determine any causal relationship. Additionally, unlike Tracy's problem, it doesn't appear to be related to the alerts that are showing up either (I can confirm that no alert statuses changed while I was writing this).
I'm inclined to go with Tracy's assessment that it is a pointer issue. I do recall from my programming days that incorrect pointer usage can cause intermittent, hard-to-reproduce errors... Unfortunately, it's been a while since I've programmed in C/C++, and I would have to spend a while with the code to see whether this really is the issue and how to fix it. All I know is that it sounds plausible. Anyone else have any ideas, or am I just going a little off the deep end with this (which is quite possible)?
On Sep 17, 2007 4:43 PM, Tracy Di Marco White <user-736ce936c847@xymon.invalid> wrote:
list Gary Baluha
Actually, one more thing to add: I take it back. There was an alert status change between when it was working and when it wasn't, and it is now not working again. Also, I just remembered that when I emptied the hobbit-nkview.cfg file, the Critical Systems page started working again (albeit with no alerts showing up).
On Dec 24, 2007 11:09 PM, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:
list Gary Baluha
Okay, I think I figured out the issue I was having. I noticed it happen
again this morning, and when I looked at which alerts were showing as
red (on the "all non-green" page), I noticed a pattern: every time the
critical systems page showed an Internal Server Error, the same alerts
were showing as red on the non-green page.
In short, the hosts look like "A-B-C-[1-4]". I manually edited the
hobbit-nkview.cfg file to remove all of the staging machines
("A-stag-C-[1-4]"), which are all clones of the same entry. I found that if
I just deleted the master clone entries, the critical systems page
worked. As soon as I reverted the file to the previous version, the
problem came back. I then deleted the clone entries again and re-added them,
and all is working fine now.
So it appears that where the documentation says "don't edit the
hobbit-nkview.cfg file manually", it means it ;-) Still, it may be useful
to have a script that checks the hobbit-nkview.cfg file for syntax errors.
As long as the file has a ".cfg" name and sits in the same Hobbit directory
as all the other configuration files, some people will be tempted to edit it
by hand.
On Dec 24, 2007 11:11 PM, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote: