Xymon Mailing List Archive search

white gaps in graphs across a number of services

list Vincent Baines
Wed, 20 Jun 2012 11:29:22 +0000
Message-Id: <user-55d8ce1403c5@xymon.invalid>

Well, I'm still getting these issues despite tidying a lot of errors away; we had quite a few misses last night. A selection of the error messages I get:
A lot of these:
2012-06-20 11:13:17 xymond_rrd: Got message 460528, expected 460520
2012-06-20 11:14:22 xymond_rrd: Got message 460720, expected 460712
2012-06-20 11:15:41 xymond_rrd: Got message 461145, expected 461133
2012-06-20 11:18:15 xymond_rrd: Got message 462593, expected 462584
2012-06-20 11:18:19 Peer at 0.0.0.0:0 failed: Broken pipe
27089 2012-06-20 11:18:19 Semaphore wait aborted: Interrupted system call
2012-06-20 11:18:19 Peer not up, flushing message queue
27089 2012-06-20 11:18:19 Connecting to peer 0.0.0.0:0
27089 2012-06-20 11:18:19 Peer is UP
2012-06-20 11:18:19 Unknown token 'MEMSTAT' ignored at line 385

At the time of some of the gaps I get these:
2012-06-20 02:00:57 xymond_rrd: Got message 242464, expected 242463
2012-06-20 02:01:06 Flushed 12 stale messages for 0.0.0.0:0
2012-06-20 02:01:07 Flushed 4 stale messages for 0.0.0.0:0
2012-06-20 02:01:08 xymond_rrd: Got message 242493, expected 242476
2012-06-20 02:01:09 Flushed 5 stale messages for 0.0.0.0:0
2012-06-20 02:01:10 xymond_rrd: Got message 242512, expected 242507
2012-06-20 02:01:36 Flushed 9 stale messages for 0.0.0.0:0
2012-06-20 02:01:37 Flushed 11 stale messages for 0.0.0.0:0
2012-06-20 02:01:38 Flushed 9 stale messages for 0.0.0.0:0
2012-06-20 02:01:39 Flushed 11 stale messages for 0.0.0.0:0
2012-06-20 02:01:39 xymond_rrd: Got message 242703, expected 242663
2012-06-20 02:01:40 xymond_rrd: Got message 242799, expected 242797
2012-06-20 02:01:52 xymond_rrd: Got message 242855, expected 242846
2012-06-20 02:01:53 xymond_rrd: Got message 242874, expected 242866
(and even more in rrd-data.log)
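
To see whether the sequence gaps line up with the white gaps in the graphs, the "Got message N, expected M" lines can be totalled per minute. The following is a minimal sketch, not part of Xymon: the function name is made up, and it assumes that N minus M is the number of messages the worker never saw.

```python
import re
from collections import Counter

# Matches e.g. "2012-06-20 02:01:08 xymond_rrd: Got message 242493, expected 242476"
# and captures the timestamp truncated to the minute plus both sequence numbers.
GAP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} xymond_rrd: "
    r"Got message (\d+), expected (\d+)"
)

def missed_per_minute(lines):
    """Sum the skipped sequence numbers for each minute (hypothetical helper)."""
    missed = Counter()
    for line in lines:
        m = GAP_RE.search(line)
        if m:
            minute, got, expected = m.group(1), int(m.group(2)), int(m.group(3))
            # Assumption: got - expected is the number of messages that were lost.
            missed[minute] += got - expected
    return missed

sample = [
    "2012-06-20 02:00:57 xymond_rrd: Got message 242464, expected 242463",
    "2012-06-20 02:01:06 Flushed 12 stale messages for 0.0.0.0:0",
    "2012-06-20 02:01:08 xymond_rrd: Got message 242493, expected 242476",
    "2012-06-20 02:01:10 xymond_rrd: Got message 242512, expected 242507",
]
result = missed_per_minute(sample)
for minute, count in sorted(result.items()):
    print(minute, count)
```

Feeding it the whole of rrd-status.log and rrd-data.log instead of the sample list would show which minutes lost the most messages.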


and quite a few of these:
2012-06-20 10:46:57 RRD error updating /xymon/data/rrd/hostname1/allext.rrd from 172.30.166.218: /xymon/data/rrd/hostname1/allext.rrd: found extra data on update argument: 46:+2:0.28:80:91.5:64:13:00:04:00:00:00:23:20:00:25:45:29:21:30:44:03:00:54:41:59:42:09:29:51:11:01:50:39:52:59
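
As far as I know, rrdtool reports "found extra data on update argument" when an update carries more values than the RRD file has data sources. A minimal sketch of that check follows. The function name, the `timestamp:v1:v2:...` layout, and the data source count of 3 are assumptions for illustration and are not taken from the allext.rrd file above.

```python
def extra_values(update_arg, ds_count):
    """Return the values beyond what an RRD with ds_count data sources accepts.

    update_arg is assumed to look like 'timestamp:v1:v2:...'.
    """
    _timestamp, *values = update_arg.split(":")
    return values[ds_count:]

# Hypothetical example: an RRD with 3 data sources is handed 5 values.
leftover = extra_values("N:0.28:80:91.5:64:13", 3)
print(leftover)
```

An empty list means the value count fits. Anything left over is the kind of trailing data the error message complains about.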

I'm guessing the latter might be why I see random RRD files being created, since there are some strange characters in there. But I've added an echo to the custom script to log what it sends to xymon, and so far the output is what I'd expect. Is some sort of corruption possible, for example two updates arriving at exactly the same time and corrupting each other?

Any suggestions?

Thanks!
From: user-87556346d4af@xymon.invalid [user-87556346d4af@xymon.invalid]
Sent: 18 June 2012 20:47
To: Vincent Baines
Cc: Xymon Email List
Subject: RE: [Xymon] white gaps in graphs across a number of services

No problem.. It can be confusing with long process chains like this :)

In tasks.cfg, in [xymond] put it straight after the xymond in the CMD
line. In [rrdstatus] and [rrddata], put it immediately after the
"xymond_rrd" (not xymond_channel).
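
The placement can be sketched as a small string operation. This is only an illustration: the helper name is made up, and the two CMD lines are shortened placeholders, not a real tasks.cfg.

```python
def add_debug(cmd, program):
    """Insert --debug directly after the first token equal to program."""
    tokens = cmd.split()
    i = tokens.index(program)  # raises ValueError if program is not in cmd
    return " ".join(tokens[:i + 1] + ["--debug"] + tokens[i + 1:])

# Illustrative CMD lines; the options shown are placeholders.
xymond_cmd = "xymond --log=$XYMONSERVERLOGS/xymond.log"
rrd_cmd = "xymond_channel --channel=status xymond_rrd --rrddir=$XYMONVAR/rrd"

new_xymond_cmd = add_debug(xymond_cmd, "xymond")
new_rrd_cmd = add_debug(rrd_cmd, "xymond_rrd")
print(new_xymond_cmd)
print(new_rrd_cmd)
```

In the second line the flag lands after xymond_rrd, so it is passed to the RRD worker and not to xymond_channel.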


-jc

Sorry, hopefully not a stupid question, but where should I put the
--debug flag? I've done this before where I thought I'd enabled debug but
hadn't, and was happy because there were no debug errors!

The logs are a bit messy at the moment. I'm trying to get rid of some of
the errors. The main culprits are too many data sources for the RRD files,
which I can't really explain as they work sometimes, and some cases of the
message relating to 'expected message number XXX and received message
number XXY', sometimes just one or two but sometimes a lot in one go.
From: user-87556346d4af@xymon.invalid [user-87556346d4af@xymon.invalid]
Sent: 18 June 2012 19:29
To: Vincent Baines
Cc: xymon at xymon.com
Subject: Re: [Xymon] white gaps in graphs across a number of services

Do you see anything unusual in the xymond_rrd or xymond log(s) around that
time? If messages are dropping to zero, it could definitely be a crash
somewhere.

If nothing interesting shows up, try running both with --debug enabled as
well... We might get a better idea of why that's happening.

Regards,

-jc

Hi Everyone,


I have been looking on and off at a problem I've seen for a while now,
without much success. I see intermittent 'white gaps' occurring in xymon
results across a number of services, sometimes at corresponding times,
but sometimes not. Most frequently I see this gap for CPU load, and it
isn't specific to one server.

Attached is an example of users and processes from one client server.
The approx 3AM gap has a corresponding gap in the CPU utilization graphs,
the memory graphs, actually all of them I think, and there is a large
300-second spike in clock offset at that time. But nothing corresponds
to the other gaps.


If I look at the xymon server itself, it looks like there was something
up at that time too, as xymond incoming messages drops to zero. For the
rest of the day it holds at a steady number. But there are gaps all over
the place in xymonnet runtime, CPU utilization, users and procs, etc.


I seem to recall we did try to tweak some rrd cache value as it cropped
up in another post, which I think improved things slightly. But we are
having problems with the platforms that we're trying to monitor, with
apparent long NFS pings between boxes.


The xymon server itself is running on a VM box. Has anyone had issues
running on VM?


As best I can figure, either we have a xymon config issue, the xymon box
itself isn't stable and is dropping data, or we have genuine network /
disk write issues.


Any other thoughts?


Cheers!

The information contained in this email and any attached files is
confidential and intended solely for the addressee(s). The email may be
legally privileged or prohibited from disclosure and unauthorised use.
If
you are not the named addressee you may not use, copy, or disclose this
information to any other person. If you received this message in error
please notify the sender immediately and delete it from your system.

Any opinion or views contained in this email message are those of the
sender, and do not represent those of the Company in any way and
reliance
should not be placed upon its contents. Unless otherwise stated, this
email message is not intended to be contractually binding. Where an
Agreement exists between our respective companies and there is conflict
between the contents of this email message and the Agreement then the
terms of that Agreement shall prevail.

Excelian Limited
XX Featherstone Street
London
EC1Y 8RN
Tel: +XX (X) XX XXXX XXXX
www.Excelian.com
This e-mail has been scanned for viruses by MessageLabs. For further
information visit http://www.messagelabs.com

Excelian subscribes to cleaner and greener methods of working. Help take
responsibility for the environment. Please don't print this email unless
you absolutely have to.