Okay, I think I figured out the issue I was having. It happened again this
morning, and when I looked at which alerts were showing as red (on the
"all non-green" page), I noticed a pattern: every time the critical systems
page showed an Internal Server Error, the same alerts were red on the
non-green page.
In short, the hosts look like "A-B-C-[1-4]". I manually edited the
hobbit-nkview.cfg file to remove all of the staging machines
("A-stag-C-[1-4]"), which are all clones of the same entry. I found that if
I just deleted the master clone entries, the critical systems page worked.
As soon as I reverted the file to the previous version, the problem came
back. I then deleted the clone entries again and re-added them, and all is
working fine now.
So it appears that where the documentation says "don't edit the
hobbit-nkview.cfg file manually", it means it ;-) Still, it would be useful
to have a script that checks the syntax of hobbit-nkview.cfg for errors. As
long as the file is named ".cfg" and sits in the same Hobbit directory as
all the other configuration files, some people will be tempted to edit it
manually.
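To make the idea of a syntax check a bit more concrete, here is a minimal sketch in Python. It is not an official Hobbit tool, and it only checks the two things visible in this thread: that entries use "|" as a field separator, and that anything resembling a time spec looks like "W:0800:1700". The function name check_nkview_lines, the day selectors other than "W", the treatment of "#" lines as comments, and the sample lines in the usage note are all assumptions for illustration; the real file format may differ.

```python
import re

# ASSUMPTION: time specs look like "W:0800:1700" (day selector, HHMM start,
# HHMM end), as in the one example quoted in this thread. The "*" and 0-6
# day selectors are guesses, not confirmed by the thread.
TIMESPEC_PREFIX = re.compile(r"^(W|\*|[0-6]+):")
TIMESPEC_FULL = re.compile(r"^(W|\*|[0-6]+):(\d{2})(\d{2}):(\d{2})(\d{2})$")


def valid_hhmm(hours, minutes):
    """Accept times from 0000 through 2400."""
    h, m = int(hours), int(minutes)
    return (0 <= h <= 23 and 0 <= m <= 59) or (h == 24 and m == 0)


def check_nkview_lines(lines):
    """Return a list of (line_number, message) for lines that look malformed."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # ASSUMPTION: blank lines and "#" comment lines are allowed and ignored.
        if not line or line.startswith("#"):
            continue
        if "|" not in line:
            problems.append((lineno, "no '|' field separator"))
            continue
        for field in line.split("|"):
            if not TIMESPEC_PREFIX.match(field):
                continue  # not a time spec; this sketch does not check it
            m = TIMESPEC_FULL.match(field)
            if not m or not (valid_hhmm(m.group(2), m.group(3))
                             and valid_hhmm(m.group(4), m.group(5))):
                problems.append((lineno, "malformed time spec: " + field))
    return problems
```

Something like check_nkview_lines(open("hobbit-nkview.cfg")) would then list suspicious lines. With made-up entries such as "hostA|cpu|W:0800:1700|" and "hostB|disk|W:0800:1760|", only the second would be flagged. A check this shallow would not necessarily catch the clone-entry problem described above.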
On Dec 24, 2007 11:11 PM, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:
Actually, one more thing to add... I take it back, there was an alert
status change between when it was working and when it wasn't, and it is now
again not working correctly. Also, I just recalled that if I zeroed out the
contents of the hobbit-nkview.cfg file, the Critical Systems page started
working again (albeit with no alerts showing up).
On Dec 24, 2007 11:09 PM, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:
As I previously posted, I get this problem every now and then as well.
About a month back, the Critical Systems page suddenly became useless when
it got stuck on that "Internal Server Error" issue. My co-worker came
across an apparent fix: the file permissions on hobbit-nkview.cfg were
wrong, and the --debug option set for hobbit-nkview in hobbitcgi.cfg was
preventing the page from loading. This now appears NOT to be the case,
because the eternal Internal Server Error problem is back. It seems it was
just a coincidence that he made the changes when the Critical Systems page
started working again.
Also, while I was in the process of typing the above section, it appears
the Critical Systems page is working again. I made absolutely no changes to
anything during this time. Unfortunately now, as before, I cannot determine
any causal relationship. Additionally, unlike Tracy's problem below, it
doesn't appear to be related to the alerts that are showing up either (I can
confirm that no alert statuses changed while I was writing this).
I'm going to have to go with Tracy's assessment that it is a pointer
issue. I do recall from my programming days that incorrect pointer usage
can cause intermittent, non-reproducible errors... Unfortunately, it's been
a while since I've programmed in C/C++, and I would have to spend a while
with the code to see if this really is the issue, and how to fix it. All I
know is, it sounds plausible.
Anyone else have any ideas, or am I just going a little off the deep end
with this (which is quite possible)?
On Sep 17, 2007 4:43 PM, Tracy Di Marco White <user-736ce936c847@xymon.invalid> wrote:
On 9/7/07, Henrik Stoerner <user-ce4a2c883f75@xymon.invalid> wrote:
On Thu, Sep 06, 2007 at 09:30:58PM -0500, Tracy Di Marco White wrote:
I'm getting an "Internal Server Error" and the error log shows
"Premature end of script headers: hobbit-nkview.sh". My problem seems
to be related to a test being yellow right now, and right now being
outside of the parameters of when the machine/test combo is critical.
If I change the critical time for the event from "|W:0800:1700|" to
"||", the critical systems page comes up fine. If I put the time
constraints back, the page fails to come up again. It started failing
after 1700, although I didn't notice it for about 15 minutes. Is
anyone else seeing this problem?
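For readers following along, the condition described above ("right now" falling outside the critical window) can be sketched as follows. This is only an illustration in Python, not Hobbit's actual C code. It assumes that "W:0800:1700" means weekdays from 08:00 up to but not including 17:00, that "*" and the digits 0-6 (Sunday = 0) are also valid day selectors, and that an empty spec means "always critical". The function name in_critical_window is made up for the example.

```python
from datetime import datetime


def in_critical_window(timespec, when):
    """Return True if `when` falls inside a 'days:HHMM:HHMM' window.

    ASSUMPTION: an empty spec (the "||" case) means no time restriction.
    """
    if timespec == "":
        return True
    days, start, end = timespec.split(":")
    weekday = when.isoweekday() % 7  # Sunday=0, Monday=1, ..., Saturday=6
    if days == "W":
        day_ok = 1 <= weekday <= 5   # Monday through Friday
    elif days == "*":
        day_ok = True
    else:
        day_ok = str(weekday) in days
    now = when.strftime("%H%M")
    # Zero-padded HHMM strings compare correctly as plain strings.
    # ASSUMPTION: the end time itself is treated as outside the window.
    return day_ok and start <= now < end
```

With the spec from this report, a yellow test at 16:59 on Thursday, Sep 6, 2007 would be inside the window, while the same test at 17:15 would be outside it, which matches the time the page started failing.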
Interesting, it does sound like a bug. Could you send me that line from
the hobbit-nkview.cfg file?