Xymon Mailing List Archive

white gaps in graphs across a number of services

10 messages in this thread

list Vincent Baines · Mon, 18 Jun 2012 17:32:35 +0000 ·
Hi Everyone,


I have been looking on and off at a problem I've seen for a while now, without much success. I see intermittent 'white gaps' occurring in Xymon graphs across a number of services, sometimes at corresponding times and sometimes not. Most frequently I see this gap for CPU load, and it isn't specific to one server.

Attached is an example of users and processes from one client server. The gap at approximately 3 AM has a corresponding gap in the CPU utilization graphs, the memory graphs, in fact all of them I think, along with a large 300-second spike in clock offset at that time. But nothing corresponds to the other gaps.


If I look at the Xymon server itself, it looks like something was up at that time too, as xymond incoming messages drop to zero. For the rest of the day the count holds at a steady number, yet there are gaps all over the place in xymonnet runtime, CPU utilization, users and procs, etc.


I seem to recall we did try to tweak some RRD cache value as it cropped up in another post, which I think improved things slightly. But we are also having problems with the platforms that we're trying to monitor, with apparently long NFS pings between boxes.


The Xymon server itself is running on a VM. Has anyone had issues running on a VM?


As best I can figure, either we have a Xymon config issue, the Xymon box itself isn't stable and is dropping data, or we have genuine network / disk write issues.


Any other thoughts?


Cheers!

The information contained in this email and any attached files is confidential and intended solely for the addressee(s). The email may be legally privileged or prohibited from disclosure and unauthorised use. If you are not the named addressee you may not use, copy, or disclose this information to any other person. If you received this message in error please notify the sender immediately and delete it from your system. 

Any opinion or views contained in this email message are those of the sender, and do not represent those of the Company in any way and reliance should not be placed upon its contents. Unless otherwise stated, this email message is not intended to be contractually binding. Where an Agreement exists between our respective companies and there is conflict between the contents of this email message and the Agreement then the terms of that Agreement shall prevail.

Excelian Limited
XX Featherstone Street
London
EC1Y 8RN
Tel: +XX (X) XX XXXX XXXX
www.Excelian.com
This e-mail has been scanned for viruses by MessageLabs. For further information visit http://www.messagelabs.com

Excelian subscribes to cleaner and greener methods of working. Help take responsibility for the environment. Please don't print this email unless you absolutely have to.
Attachments (1)
list Japheth Cleaver · Mon, 18 Jun 2012 11:29:57 -0700 (PDT) ·
Do you see anything unusual in the xymond_rrd or xymond log(s) around that
time? If messages are dropping to zero, it could definitely be a crash
somewhere.

If nothing interesting shows up, try running both with --debug enabled as
well... We might get a better idea of why that's happening.

Regards,

-jc
list Vincent Baines · Mon, 18 Jun 2012 18:35:54 +0000 ·
Sorry, hopefully not a stupid question, but where should I put the --debug flag? I've done this before: I thought I had enabled debug when I hadn't, and was then happy because there were no debug errors!

The logs are a bit messy at the moment and I'm trying to get rid of some of the errors. The main culprits are 'too many data sources' errors for the RRD files, which I can't really explain as they work sometimes, and some cases of the message along the lines of 'expected message number XXX and received message number XXY', sometimes just one or two but sometimes a lot in one go.
list Japheth Cleaver · Mon, 18 Jun 2012 12:47:57 -0700 (PDT) ·
No problem.. It can be confusing with long process chains like this :)

In tasks.cfg, in [xymond] put it straight after the xymond in the CMD
line. In [rrdstatus] and [rrddata], put it immediately after the
"xymond_rrd" (not xymond_channel).


-jc
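
One way to double-check the placement described above is a small shell helper that prints only the CMD lines in which --debug directly follows xymond or xymond_rrd. This is a rough sketch, not part of Xymon: the function name is hypothetical, and it assumes each CMD sits on a single line (no backslash continuations).

```shell
# Hypothetical helper (not shipped with Xymon): print the CMD lines of a
# tasks.cfg-style file where --debug comes straight after the xymond or
# xymond_rrd program name. A --debug placed after xymond_channel is ignored.
debug_enabled_cmds() {
    grep -E '^[[:space:]]*CMD([[:space:]]|[[:space:]].*[[:space:]/])(xymond|xymond_rrd)[[:space:]]+--debug([[:space:]]|$)' "$1"
}

# Example call; adjust the path to wherever your tasks.cfg lives:
# debug_enabled_cmds "$XYMONHOME/etc/tasks.cfg"
```

If the [xymond], [rrdstatus] and [rrddata] CMD lines all show up in the output, the flag is where it should be. A section that is missing from the output has the flag absent or in the wrong position.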
list Vincent Baines · Tue, 19 Jun 2012 10:13:32 +0000 ·
Thanks! Will put those changes in now and see what it collects.

One other thing that's bugged me for a while, maybe related: I get some random spurious RRD files generated, which make things really messy when I look at the trends page for a specific host. For example, in ./data/rrd/hostname1, for a specific service I'm monitoring called warehouse, I should have:
./warehouse,Memory.rrd
./warehouse,Threads.rrd

but as well as those I get all sorts of randomness:
warehouse,24590_24589_xymon_09.rrd
warehouse,4224_1_hostname2_20.rrd
warehouse,Kernel.rrd
murexnet,_FONT_SIZE.rrd
etc
In other words, names with other server names, PIDs, other processes, and even Xymon keywords appended.

They seem to get generated in clumps every now and then, say a whole load of new ones at a specific time.

With the debug flags on I'll see if anything corresponds to the times when they're created, but perhaps someone has seen this one before?
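
To make the spurious files easier to spot, something along these lines could list every RRD file for a service that is not in the expected set. It is only a sketch: the function name is made up, and the directory and the expected names (Memory, Threads) are taken from the example above.

```shell
# Hypothetical helper: print RRD files for one service whose name part
# (between "service," and ".rrd") is not in the list of expected names.
# Usage: unexpected_rrds DIR SERVICE EXPECTED_NAME...
unexpected_rrds() {
    dir=$1
    service=$2
    shift 2
    for f in "$dir/$service",*.rrd; do
        [ -e "$f" ] || continue          # glob matched nothing
        base=${f##*/}                    # strip the directory
        name=${base#"$service",}         # strip "service,"
        name=${name%.rrd}                # strip ".rrd"
        wanted=no
        for w in "$@"; do
            if [ "$w" = "$name" ]; then
                wanted=yes
            fi
        done
        if [ "$wanted" = no ]; then
            printf '%s\n' "$base"
        fi
    done
}

# Example call, using the names from this message:
# unexpected_rrds ./data/rrd/hostname1 warehouse Memory Threads
```

Running ls -l on the files it prints shows their timestamps, which can then be compared against the debug logs.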
list W.J.M. Nelis · Tue, 19 Jun 2012 12:20:41 +0200 ·
quoted from Vincent Baines
On 06/19/2012 12:13 PM, Vincent Baines wrote:
One other thing thats bugged me for a while, maybe related, I get some
really random spurious RRD files generated, which when I look in the trends
page for a specific host really make things messy. So, for example, in
./data/rrd/hostname1 for a specific service I'm monitoring called
warehouse, I should have:
./warehouse,Memory.rrd
./warehouse,Threads.rrd

but as well as those I get all sorts of randomness:
warehouse,24590_24589_xymon_09.rrd
warehouse,4224_1_hostname2_20.rrd
warehouse,Kernel.rrd
murexnet,_FONT_SIZE.rrd
etc
In other words, appended other server names, PIDs, and other processes,
and even xymon keywords..

They seem to get generated in clumps everynow and then, say a whole load
of new ones at a specific time.
Colons and equal signs are confusing the RRD module of Xymon. Replacing 
each ':' by '&#58' and each '=' by '&#61' should solve this problem. Of 
course, these replacements should NOT be done in the lines containing the 
actual data to be entered into an RRD.

kind regards,
   Wim Nelis.


******************************************************************************************************************

The NLR disclaimer is valid for NLR e-mail messages.

This message is only meant for providing information. Nothing in this e-mail message amounts to a contractual
or legal commitment on the part of the sender.
This message may contain information that is not intended for you. If you are not the addressee or if this
message was sent to you by mistake, you are requested to inform the sender and delete the message.
Sender accepts no liability for damage of any kind resulting from the risks inherent in the electronic
transmission of messages.
 
******************************************************************************************************************
list Vincent Baines · Tue, 19 Jun 2012 10:31:42 +0000 ·
Sorry, could I just check whereabouts you mean? In the client-side scripts reporting back to Xymon, where I use colons to separate variable name : value? E.g. where I have:

                $BB $BBDISP "status $MACHINE.$TIDY_SERVICE red $(date)
Memory : 0
Threads : 0"

change to

                $BB $BBDISP "status $MACHINE.$TIDY_SERVICE red $(date)
Memory &#58 0
Threads &#58 0"


or in the xymonserver.cfg definitions, where I have:
SPLITNCV_warehouse="Memory:GAUGE,Threads:GAUGE"
to
SPLITNCV_warehouse="Memory&#58GAUGE,Threads&#58GAUGE"

The only slightly special character I use is the - (I have a couple of names such as data-feed). Does that need to be changed to a corresponding code? Or have I missed something?

Very much appreciate the help!
list W.J.M. Nelis · Tue, 19 Jun 2012 13:26:12 +0200 ·
quoted from Vincent Baines
On 06/19/2012 12:31 PM, Vincent Baines wrote:
Sorry, could I just check where abouts you mean? In the client side scripts reporting back to xymon, where I use colons to seperate variable name : value? e.g. where i have:
You're right, I did not specify that part. The place to change colons and
equal signs is the status (or data) message which is sent to Xymon. I've
seen a few times that a line containing a colon resulted in an unexpected
RRD file being created.
quoted from Vincent Baines
                 $BB $BBDISP "status $MACHINE.$TIDY_SERVICE red $(date)
Memory : 0
Threads : 0"

change to

                 $BB $BBDISP "status $MACHINE.$TIDY_SERVICE red $(date)

Memory&#58 0
Threads&#58 0"
The RRD collector in Xymon searches for lines which contain a colon or an
equal sign and satisfy some other criteria as well. By replacing the colon
or equal sign in those lines which are not meant to contain data for an
RRD, you can be certain that those lines will not result in funny RRDs.

Kind regards,
   Wim Nelis.
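
A minimal sketch of the approach described above, assuming a client script that builds its status text in shell: free-text lines are passed through a filter that swaps ':' for '&#58' and '=' for '&#61', while the lines meant to feed an RRD are left alone. The function name, host name and note text are hypothetical, and the entities are written without a trailing semicolon, as in this thread.

```shell
# Hypothetical filter: escape colons and equal signs in text read from
# stdin, so these lines are not picked up as name/value data.
escape_text() {
    sed -e 's/:/\&#58/g' -e 's/=/\&#61/g'
}

# Free-text line that should NOT produce an RRD: escape it.
note=$(printf '%s\n' 'Last restart: ok, mode=batch' | escape_text)

# Data lines (Memory, Threads) keep their plain colons.
msg="status myhost.warehouse green $(date)
Memory : 0
Threads : 0
$note"

printf '%s\n' "$msg"
```

The resulting text would then be sent the same way as in the scripts quoted earlier, i.e. as the argument to $BB $BBDISP.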
quoted from Vincent Baines

or in the xymonserver.cfg definitions, where I have:
SPLITNCV_warehouse="Memory:GAUGE,Threads:GAUGE"
to
SPLITNCV_warehouse="Memory&#58GAUGE,Threads&#58GAUGE"

The only slightly special character I use is the hyphen (I have a couple of names such as data-feed). Does that need to be changed to a corresponding code? Or have I missed something?

Very much appreciate the help!
From: xymon-bounces at xymon.com [xymon-bounces at xymon.com] on behalf of W.J.M. Nelis [user-6956df205d63@xymon.invalid]
Sent: 19 June 2012 11:20
To: xymon at xymon.com
Subject: Re: [Xymon] white gaps in graphs across a number of services

On 06/19/2012 12:13 PM, Vincent Baines wrote:
One other thing that's bugged me for a while, and may be related: I get some
random spurious RRD files generated, which make the trends page for a
specific host really messy. So, for example, in ./data/rrd/hostname1 for a
specific service I'm monitoring called warehouse, I should have:
./warehouse,Memory.rrd
./warehouse,Threads.rrd

but as well as those I get all sorts of randomness:
warehouse,24590_24589_xymon_09.rrd
warehouse,4224_1_hostname2_20.rrd
warehouse,Kernel.rrd
murexnet,_FONT_SIZE.rrd
etc
In other words, names with other server names, PIDs, other processes,
and even Xymon keywords appended.

They seem to get generated in clumps every now and then, say a whole load
of new ones at a specific time.
Colons and equal signs are confusing the RRD module of Xymon. Replacing
each ':' by '&#58' and each '=' by '&#61' should solve this problem. Of
course, these replacements should NOT be done in the lines containing the
actual data to be entered into an RRD.

kind regards,
    Wim Nelis.

list Vincent Baines · Wed, 20 Jun 2012 11:29:22 +0000 ·
Well, I'm still getting these issues despite tidying a lot of errors away; I had quite a few misses last night. A selection of the error messages I get:
A lot of these:
2012-06-20 11:13:17 xymond_rrd: Got message 460528, expected 460520
2012-06-20 11:14:22 xymond_rrd: Got message 460720, expected 460712
2012-06-20 11:15:41 xymond_rrd: Got message 461145, expected 461133
2012-06-20 11:18:15 xymond_rrd: Got message 462593, expected 462584
2012-06-20 11:18:19 Peer at 0.0.0.0:0 failed: Broken pipe
27089 2012-06-20 11:18:19 Semaphore wait aborted: Interrupted system call
2012-06-20 11:18:19 Peer not up, flushing message queue
27089 2012-06-20 11:18:19 Connecting to peer 0.0.0.0:0
27089 2012-06-20 11:18:19 Peer is UP
2012-06-20 11:18:19 Unknown token 'MEMSTAT' ignored at line 385

at the time of some gaps I get these:
2012-06-20 02:00:57 xymond_rrd: Got message 242464, expected 242463
2012-06-20 02:01:06 Flushed 12 stale messages for 0.0.0.0:0
2012-06-20 02:01:07 Flushed 4 stale messages for 0.0.0.0:0
2012-06-20 02:01:08 xymond_rrd: Got message 242493, expected 242476
2012-06-20 02:01:09 Flushed 5 stale messages for 0.0.0.0:0
2012-06-20 02:01:10 xymond_rrd: Got message 242512, expected 242507
2012-06-20 02:01:36 Flushed 9 stale messages for 0.0.0.0:0
2012-06-20 02:01:37 Flushed 11 stale messages for 0.0.0.0:0
2012-06-20 02:01:38 Flushed 9 stale messages for 0.0.0.0:0
2012-06-20 02:01:39 Flushed 11 stale messages for 0.0.0.0:0
2012-06-20 02:01:39 xymond_rrd: Got message 242703, expected 242663
2012-06-20 02:01:40 xymond_rrd: Got message 242799, expected 242797
2012-06-20 02:01:52 xymond_rrd: Got message 242855, expected 242846
2012-06-20 02:01:53 xymond_rrd: Got message 242874, expected 242866
(and even more in rrd-data.log)
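
To see whether the sequence warnings line up with the white gaps, the log lines above can be tallied per minute. This is a rough sketch: the function name summarise_seq_gaps is made up, and it assumes that "got minus expected" approximates how many messages the worker did not see, which nothing in this thread confirms.

```shell
#!/bin/sh
# Hypothetical helper: read xymond_rrd log lines on stdin and print, per
# minute, the sum of (got - expected) from "Got message N, expected M" lines.
summarise_seq_gaps() {
    awk '
        /Got message [0-9]+, expected [0-9]+/ {
            key = ""
            for (i = 1; i < NF; i++) {
                # First field that looks like YYYY-MM-DD starts the timestamp;
                # this also copes with lines that carry a leading PID.
                if (key == "" && $i ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/)
                    key = $i " " substr($(i + 1), 1, 5)
                # "+ 0" turns "460528," into the number 460528.
                if ($i == "message")  got  = $(i + 1) + 0
                if ($i == "expected") want = $(i + 1) + 0
            }
            if (got > want) lost[key] += got - want
        }
        END { for (k in lost) printf "%s %d\n", k, lost[k] }
    ' | sort
}
```

It can be run as summarise_seq_gaps < rrd-data.log (or whichever log holds these lines). For the 02:01 lines quoted above, the differences add up to 81.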


and quite a few of these:
2012-06-20 10:46:57 RRD error updating /xymon/data/rrd/hostname1/allext.rrd from 172.30.166.218: /xymon/data/rrd/hostname1/allext.rrd: found extra data on update argument: 46:+2:0.28:80:91.5:64:13:00:04:00:00:00:23:20:00:25:45:29:21:30:44:03:00:54:41:59:42:09:29:51:11:01:50:39:52:59

I'm guessing the latter might be why I see random RRD files created, since there are some strange characters in there. But I've added an echo to the custom script to log what it sends to Xymon, and so far the output is what I'd expect. Is some sort of corruption possible, e.g. two updates at exactly the same time corrupting each other?

Any suggestions?

Thanks!
quoted from Vincent Baines
From: user-87556346d4af@xymon.invalid [user-87556346d4af@xymon.invalid]
Sent: 18 June 2012 20:47
To: Vincent Baines
Cc: Xymon Email List
Subject: RE: [Xymon] white gaps in graphs across a number of services

No problem.. It can be confusing with long process chains like this :)

In tasks.cfg, in [xymond] put it straight after the xymond in the CMD
line. In [rrdstatus] and [rrddata], put it immediately after the
"xymond_rrd" (not xymond_channel).


-jc

Sorry.. hopefully not a stupid question, but where should I put the
--debug flag? I've done this before where I thought I'd enabled debug but
hadn't, and was happy because there were no debug errors!

The logs are a bit messy at the moment. I'm trying to get rid of some of
the errors; the main culprits are too many data sources for the RRD files,
which I can't really explain as they work sometimes, and some cases of the
message relating to 'expected message number XXX and received message
number XXY', sometimes just one or two but sometimes a lot in one go.
From: user-87556346d4af@xymon.invalid [user-87556346d4af@xymon.invalid]
Sent: 18 June 2012 19:29
To: Vincent Baines
Cc: xymon at xymon.com
Subject: Re: [Xymon] white gaps in graphs across a number of services

Do you see anything unusual in the xymond_rrd or xymond log(s) around that
time? If messages are dropping to zero, it could definitely be a crash
somewhere.

If nothing interesting shows up, try running both with --debug enabled as
well... We might get a better idea of why that's happening.

Regards,

-jc

list Henrik Størner · Sat, 14 Jul 2012 12:42:28 +0200 ·
quoted from Vincent Baines
On 18-06-2012 19:32, Vincent Baines wrote:
Have been looking on and off at a problem I've seen for a while now,
without massive success. I see intermittant 'white gaps' occuring in
xymon results across a number of services, and sometimes at
corresponding times, but sometimes not. Most frequently I see this gap
for CPU load, and this isn't just specific to one server.
Make sure you're on the latest Xymon version (4.3.7 currently, 4.3.8 in a few days).

If that is already the case, then check the I/O load on your system (vmstat or iostat): if the disk load is too high, xymond_rrd will drop messages. It should say so in your logs, though.
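
One way to put a number on the disk load is to average the I/O wait column of vmstat over a few samples. A small sketch, assuming the Linux vmstat layout with a "wa" column in the header row; the function name avg_iowait is hypothetical, and what counts as "too high" is not stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: average the "wa" (I/O wait) column of vmstat output
# read from stdin. The first data row is skipped because vmstat reports
# averages since boot there.
avg_iowait() {
    awk '
        # Until the "wa" column has been located, scan header rows for it.
        col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "wa") col = i
            next
        }
        # Data rows start with a number; repeated header rows do not.
        $1 ~ /^[0-9]+$/ {
            if (seen++) { sum += $col; n++ }
        }
        END {
            if (n) printf "%.1f\n", sum / n
            else   print "no samples"
        }
    '
}
```

For example, vmstat 5 7 | avg_iowait would average six 5-second samples. Comparing that figure during a gap with a quiet period may show whether the disk is the bottleneck.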


Regards,
Henrik