Xymon Mailing List Archive

'red' status not showing up in history

3 messages in this thread

Stephen Menton · Thu, 6 Mar 2008 10:56:16 -0800
I have a test that's gone 'red' multiple times today. I know this
because I've seen it on the website and alerts have been sent to me. Yet
if I look in ~hobbit/data/hist, ~hobbit/data/histlogs, or use the
history CGIs in the web GUI, they show that it hasn't been 'red' today,
but rather that it's been green for >2 days. That isn't the case, though:
it's gone 'red' 6 times today alone (2 of the e-mails are repeats, 2h delay).
 
from alert e-mails:
bspdm07.edc.cingular.net:sgsn_procs red [55828]
red Thu Mar  6 09:43:03 2008 
bspdm07.edc.cingular.net:sgsn_procs red [271768]
red Thu Mar  6 07:47:06 2008 
bspdm07.edc.cingular.net:sgsn_procs red [271768]
red Thu Mar  6 05:51:35 2008 
bspdm07.edc.cingular.net:sgsn_procs red [811858]
red Thu Mar  6 05:11:22 2008 
bspdm07.edc.cingular.net:sgsn_procs red [811858]
red Thu Mar  6 03:15:51 2008
bspdm07.edc.cingular.net:sgsn_procs red [694717]
red Thu Mar  6 02:10:28 2008 
bspdm07.edc.cingular.net:sgsn_procs red [738302]
red Thu Mar  6 01:45:16 2008 
bspdm07.edc.cingular.net:sgsn_procs red [648888]
red Thu Mar  6 01:20:08 2008 
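
The count can be checked against the alert text itself. The following is a minimal Python sketch, not part of Hobbit. It assumes the bracketed number identifies a single alert event, so that a repeated e-mail for the same event carries the same number. The names `ALERTS`, `alert_ids`, and `distinct_events` are illustrative only.

```python
import re

# Alert subject lines and timestamps copied from the e-mails above.
ALERTS = """\
bspdm07.edc.cingular.net:sgsn_procs red [55828]
red Thu Mar  6 09:43:03 2008
bspdm07.edc.cingular.net:sgsn_procs red [271768]
red Thu Mar  6 07:47:06 2008
bspdm07.edc.cingular.net:sgsn_procs red [271768]
red Thu Mar  6 05:51:35 2008
bspdm07.edc.cingular.net:sgsn_procs red [811858]
red Thu Mar  6 05:11:22 2008
bspdm07.edc.cingular.net:sgsn_procs red [811858]
red Thu Mar  6 03:15:51 2008
bspdm07.edc.cingular.net:sgsn_procs red [694717]
red Thu Mar  6 02:10:28 2008
bspdm07.edc.cingular.net:sgsn_procs red [738302]
red Thu Mar  6 01:45:16 2008
bspdm07.edc.cingular.net:sgsn_procs red [648888]
red Thu Mar  6 01:20:08 2008
"""

# Matches "host:test color [number]" lines; trailing blanks are tolerated.
SUBJECT = re.compile(r"^\S+:\S+ \w+ \[(\d+)\][ \t]*$", re.MULTILINE)


def alert_ids(text):
    """Return the bracketed number of every alert e-mail, in order."""
    return [m.group(1) for m in SUBJECT.finditer(text)]


def distinct_events(text):
    """Return the bracketed numbers with repeats removed (order kept).

    Assumption: a repeated e-mail reuses the number of the original alert.
    """
    return list(dict.fromkeys(alert_ids(text)))


print(len(alert_ids(ALERTS)), "e-mails,",
      len(distinct_events(ALERTS)), "distinct events")
```

On the e-mails quoted above this reports 8 e-mails covering 6 distinct events, which is consistent with 6 'red' transitions plus 2 repeats.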
 
yet from the logs:
~hobbit/data/hist:# tail bspdm07,edc,cingular,net.sgsn_procs
Tue Feb 26 11:01:29 2008 red 1204052489 12385
Tue Feb 26 14:27:54 2008 green 1204064874 1506
Tue Feb 26 14:53:00 2008 red 1204066380 301
Tue Feb 26 14:58:01 2008 green 1204066681 549212
Mon Mar  3 23:31:33 2008 red 1204615893 1521
Mon Mar  3 23:56:54 2008 green 1204617414 592
Tue Mar  4 00:06:46 2008 red 1204618006 302
Tue Mar  4 00:11:48 2008 green 1204618308 302
Tue Mar  4 00:16:50 2008 red 1204618610 1206
Tue Mar  4 00:36:56 2008 green 1204619816

~hobbit/data/histlogs/bspdm07_edc_cingular_net/sgsn_procs:# ls -rt | tail
Tue_Feb_26_11:01:29_2008
Tue_Feb_26_14:27:54_2008
Tue_Feb_26_14:53:00_2008
Tue_Feb_26_14:58:01_2008
Mon_Mar_3_23:31:33_2008
Mon_Mar_3_23:56:54_2008
Tue_Mar_4_00:06:46_2008
Tue_Mar_4_00:11:48_2008
Tue_Mar_4_00:16:50_2008
Tue_Mar_4_00:36:56_2008
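
The gap can be confirmed mechanically. Below is a minimal Python sketch, assuming the layout visible in the hist file above: a human-readable timestamp, the color, the start time as a Unix epoch, and (for closed entries) a duration in seconds. The duration reading is inferred from the data, since each start plus its duration equals the next start. The UTC-8 offset is taken from the -0800 in the message header, and the function names are illustrative, not Hobbit APIs.

```python
from datetime import datetime, timedelta, timezone

# Tail of the hist file, copied from the listing above.
HIST = """\
Tue Feb 26 11:01:29 2008 red 1204052489 12385
Tue Feb 26 14:27:54 2008 green 1204064874 1506
Tue Feb 26 14:53:00 2008 red 1204066380 301
Tue Feb 26 14:58:01 2008 green 1204066681 549212
Mon Mar  3 23:31:33 2008 red 1204615893 1521
Mon Mar  3 23:56:54 2008 green 1204617414 592
Tue Mar  4 00:06:46 2008 red 1204618006 302
Tue Mar  4 00:11:48 2008 green 1204618308 302
Tue Mar  4 00:16:50 2008 red 1204618610 1206
Tue Mar  4 00:36:56 2008 green 1204619816
"""

# Assumption: the server's local time is UTC-8, as in the mail header.
LOCAL_TZ = timezone(timedelta(hours=-8))


def parse_hist(text):
    """Return (start_epoch, color, duration_or_None) for each hist line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # not a hist entry
        duration = int(fields[7]) if len(fields) > 7 else None
        entries.append((int(fields[6]), fields[5], duration))
    return entries


def is_contiguous(entries):
    """True if every closed entry ends exactly where the next one starts."""
    return all(
        cur[2] is not None and cur[0] + cur[2] == nxt[0]
        for cur, nxt in zip(entries, entries[1:])
    )


def color_at(entries, epoch):
    """Return the color the hist file records for the given epoch time."""
    color = None
    for start, entry_color, _ in entries:
        if start > epoch:
            break
        color = entry_color
    return color


def alert_epoch(stamp):
    """Convert an alert timestamp such as 'Thu Mar  6 09:43:03 2008'."""
    parsed = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return int(parsed.replace(tzinfo=LOCAL_TZ).timestamp())


entries = parse_hist(HIST)
print(is_contiguous(entries))
print(color_at(entries, alert_epoch("Thu Mar  6 09:43:03 2008")))
```

For the 09:43:03 alert the lookup returns green: the hist file is internally consistent but has no record of that red.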

So where did my reds go? ...
 
Also, it's not just this test; there are >10,000 tests doing this. Some
tests seem to reflect red status fine, others don't. I see no rhyme or
reason to it. :-(
 
stephen
Henrik Størner · Thu, 6 Mar 2008 22:43:27 +0100
On Thu, Mar 06, 2008 at 10:56:16AM -0800, Menton, Stephen wrote:
> I have a test that's gone 'red' multiple times today. I know this
> because I've seen it in the website and alerts have been sent to me. Yet
> if I look in ~hobbit/data/hist, ~hobbit/data/histlogs, or use the
> history CGIs in the web GUI, it shows that it hasn't been 'red' today,
> rather that it's been green for >2 days.

The only explanation I can give is that the hobbitd_history module
might have been stopped or crashed when this red status happened.
Could you check the history.log and hobbitlaunch.log files to see if
there's any mention of this?

Doing a full restart of Hobbit will force a sync of the history logs
with the current status recorded in Hobbit, but it obviously cannot 
record events that are long past.


Regards,
Henrik
Stephen Menton · Thu, 6 Mar 2008 14:00:13 -0800
Well, the server was restarted back on the 4th... Current procs:
  hobbit 12332 12230  0   Mar 04 ?        5:26 hobbitd_channel --channel=page --log=/var/log/hobbit/page.log hobbitd_alert --c
  hobbit 12231 12230  2   Mar 04 ?       379:51 hobbitd --pidfile=/var/log/hobbit/hobbitd.pid --restart=/opt/home/hobbit/server
  hobbit 12326 12319  0   Mar 04 ?       40:04 hobbitd_rrd --rrddir=/opt/home/hobbit/data/rrd
  hobbit 12328 12321  0   Mar 04 ?        6:02 hobbitd_client
  hobbit 12333 12330  2   Mar 04 ?       117:51 hobbitd_history
  hobbit 12321 12230  0   Mar 04 ?        1:44 hobbitd_channel --channel=client --log=/var/log/hobbit/clientdata.log hobbitd_c
  hobbit 12331 12230  0   Mar 04 ?        0:15 hobbitd_channel --channel=clichg --log=/var/log/hobbit/hostdata.log hobbitd_hos
  hobbit 12334 12331  0   Mar 04 ?        0:25 hobbitd_hostdata
  hobbit 12327 12320  0   Mar 04 ?        4:43 hobbitd_rrd --rrddir=/opt/home/hobbit/data/rrd
  hobbit 12330 12230  0   Mar 04 ?        0:04 hobbitd_channel --channel=stachg --log=/var/log/hobbit/history.log hobbitd_hist
  hobbit 12320 12230  0   Mar 04 ?        0:12 hobbitd_channel --channel=data --log=/var/log/hobbit/rrd-data.log hobbitd_rrd -
  hobbit 12319 12230  0   Mar 04 ?       21:28 hobbitd_channel --channel=status --log=/var/log/hobbit/rrd-status.log hobbitd_r
  hobbit 12335 12332  1   Mar 04 ?       153:22 hobbitd_alert --checkpoint-file=/opt/home/hobbit/server/tmp/alert.chk --checkpo

hobbitd_history seems to be running...

Again, the statuses of other tests are being reflected properly and their
history recorded properly, both for the same test on other hosts AND for
different tests on the same host. It's like random histories are being
ignored... x,x

history.log shows some errors, but they are later than the confirmed times
when some histories weren't being recorded (~2 days ago).

hobbitlaunch.log shows a termination of hobbitd 2 days ago, but this
was the time of a restart, so perhaps it's expected. Again, it has been
writing valid histories for some items since then. I'll try forcibly
shutting down the server and bringing it back up as well.

stephen