There's a primary and a secondary issue here.
The chief problem was that TRACK and OPTIONAL were apparently no longer
being recorded as options on a test, as a result of r7683 and r7686 (on
some platforms).
Secondarily, 'nostale' is the default on svcstatus.sh pages, which means
a stale RRD graph eventually stops being displayed on the status page;
in this case, because the RRD hadn't been updated recently. I'm not sure
how I feel about the latter issue, but it's been that way for a while.
I believe the included patch fixes the main issue; I'm testing now (as
4.3.22-5 in http://terabithia.org/rpms/xymon/testing/el6/x86_64/).
This is enough to warrant a 4.3.23 release shortly, upon confirmation.
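For anyone following along, here is a minimal C sketch of one way a high
flag bit can silently vanish. It is an illustration under assumptions, not
a confirmed diagnosis: OLD_CHK_TRACKIT and NEW_CHK_TRACKIT are hypothetical
stand-ins for CHK_TRACKIT before and after the patch, and the destination
is assumed to be a plain int, which the patch context does not show.

```c
#include <stdint.h>

/* Hypothetical stand-ins for CHK_TRACKIT before and after the patch. */
#define OLD_CHK_TRACKIT (1ULL << 34) /* bit 34 of a 64-bit flags word */
#define NEW_CHK_TRACKIT (1U << 1)    /* bit 1 of a separate 32-bit word */

/* Pre-patch pattern: the masked 64-bit value is narrowed to int.
 * The conversion is implementation-defined for out-of-range values;
 * common compilers keep only the low bits, so bit 34 is lost. */
static int old_trackit(unsigned long long flags)
{
    int trackit = (int)(flags & OLD_CHK_TRACKIT);
    return trackit;
}

/* Post-patch pattern: the mask fits in 32 bits, so the result
 * survives the narrowing and stays non-zero when the bit is set. */
static int new_trackit(uint32_t chkflags)
{
    int trackit = (int)(chkflags & NEW_CHK_TRACKIT);
    return trackit;
}
```

With a 32-bit int, old_trackit(OLD_CHK_TRACKIT) typically yields 0 even
though the bit is set, while new_trackit(NEW_CHK_TRACKIT) stays non-zero.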
-jc
On 11/11/2015 9:54 AM, Axel Beckert wrote:
Hi,
[TL;DR: See Summary at the end.]
I'm slowly running out of ideas with the following issue which has
been noticed after I rolled out 4.3.22-rc2 on our two monitoring
servers (still running the servers on 4.3.22-rc2 at the moment):
The graph on
https://xymon.phys.ethz.ch/xymon-cgi/svcstatus.sh?HOST=zwoelfi&SERVICE=procs
is no longer there, because
https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&graph_width=576&graph_height=120&disp=zwoelfi&nostale&color=green&graph_start=1447069296&graph_end=1447242096&graph=hourly&action=view
returns only a 1x1 pixel PNG. The same happens on the second
(independent, not slave) server, too.
(No version changes on the affected clients. Those I checked have
either 4.3.0-beta2 from Debian 7 or 4.3.17 from Debian 8.)
I've found the following messages upon reloading the above URL in Apache's
error.log:
2015-11-11 12:32:38.839801 Sendto failed: Connection refused
2015-11-11 12:32:38.839853 Sendto failed: Connection refused
2015-11-11 12:32:38.839871 Sendto failed: Connection refused
I've found http://lists.xymon.com/archive/2015-February/041189.html
with these messages, stopped the xymon service, removed all left over
rrdctl.* files from /var/lib/xymon/tmp/ and started the xymon service
again.
Result: I still only get a 1x1 pixel PNG, but the error messages
are gone, i.e. the issues are likely unrelated, as they were in the
mailing list posting above.
Then again on
https://xymon.phys.ethz.ch/xymon-cgi/svcstatus.sh?HOST=zwoelfi&SERVICE=trends
the "Process counts" graph is there (but does not seem to be working):
https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&graph_width=576&graph_height=120&first=1&count=4&disp=zwoelfi&graph_start=1447069994&graph_end=1447242794&graph=hourly&action=view
The differences between this and the first URL are (besides the
timestamps): the first URL has nostale (without a value) and color=green
as additional query string parameters, while the second URL instead has
first=1 and count=4.
As soon as I remove the "nostale" without a value, or give it a value
such as "nostale=1", the graph is back again (but still not working).
So while the (reduced to the minimum parameters) URL
https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&graph=hourly&action=view
shows (an empty) graph,
https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&disp=zwoelfi&graph=hourly&action=view&nostale
gives a 1x1 pixel.
With regard to the empty graph,
https://xymon.phys.ethz.ch/xymon-cgi/showgraph.sh?host=zwoelfi&service=processes&graph=daily&action=view
is not empty, it just shows that there is no more data since the 4th
of November (when I updated the servers from 4.3.21 to 4.3.22-rc2).
And indeed, in /var/lib/xymon/rrd/zwoelfi/, some files have not been
updated since the 4th of November:
# ls -l *proc*
-rw-r--r-- 1 xymon xymon 19640 Nov 4 15:40 processes.apache2.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov 4 15:40 processes.automount.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov 4 15:40 processes.stress.rrd
-rw-r--r-- 1 xymon xymon 19640 Nov 11 13:09 procs.rrd
#
Summary
=======
So there seem to be two issues with 4.3.22:
* The graph in the procs check's page isn't displayed properly.
  Either
  + "nostale" should get a value in the page/template,
  + or the parsing of the "nostale" parameter without value in the
    showgraph CGI
  should be fixed. This sounds rather easy, but I'm not sure which
  variant is the expected one.
* For some reason the processes.*.rrd files defined by "TRACK=" in
  analysis.cfg no longer get updated.
Here I currently have no good idea where this comes from. Maybe from
one of the NCV-related changes. At least I found no configuration
change (be it local or in the defaults/templates) which could have
triggered this issue.
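
On the first point above, the second variant (accepting a bare "nostale")
could be sketched roughly as follows. This is only an illustration:
query_has_key is a hypothetical helper, not the showgraph CGI's actual
parser, and it assumes a plain '&'-separated query string without URL
decoding.

```c
#include <string.h>

/* Hypothetical helper: return 1 if `key` appears in the query string
 * either as a bare flag ("nostale") or with a value ("nostale=1"),
 * and 0 otherwise. Parameters are assumed to be separated by '&'. */
static int query_has_key(const char *query, const char *key)
{
    size_t klen = strlen(key);
    const char *p = query;

    while (p && *p) {
        const char *end = strchr(p, '&');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* Match "key" exactly, or "key=" followed by anything. */
        if (len >= klen && strncmp(p, key, klen) == 0 &&
            (len == klen || p[klen] == '=')) {
            return 1;
        }
        p = end ? end + 1 : NULL;
    }
    return 0;
}
```

Under this sketch, "...&action=view&nostale" and "...&nostale=1" are treated
the same way, which would make the two URL forms above behave identically.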
Kind regards, Axel Beckert
-------------- next part --------------
--- xymond/client_config.c.chkflags32 2015-11-11 12:47:51.629681735 -0800
+++ xymond/client_config.c 2015-11-11 13:27:06.897682379 -0800
@@ -117,36 +117,36 @@
} c_paging_t;
-#define FCHK_NOEXIST (1ULL << 0)
-#define FCHK_TYPE (1ULL << 1)
-#define FCHK_MODE (1ULL << 2)
-#define FCHK_MINLINKS (1ULL << 3)
-#define FCHK_MAXLINKS (1ULL << 4)
-#define FCHK_EQLLINKS (1ULL << 5)
-#define FCHK_MINSIZE (1ULL << 6)
-#define FCHK_MAXSIZE (1ULL << 7)
-#define FCHK_EQLSIZE (1ULL << 8)
-#define FCHK_OWNERID (1ULL << 10)
-#define FCHK_OWNERSTR (1ULL << 11)
-#define FCHK_GROUPID (1ULL << 12)
-#define FCHK_GROUPSTR (1ULL << 13)
-#define FCHK_CTIMEMIN (1ULL << 16)
-#define FCHK_CTIMEMAX (1ULL << 17)
-#define FCHK_CTIMEEQL (1ULL << 18)
-#define FCHK_MTIMEMIN (1ULL << 19)
-#define FCHK_MTIMEMAX (1ULL << 20)
-#define FCHK_MTIMEEQL (1ULL << 21)
-#define FCHK_ATIMEMIN (1ULL << 22)
-#define FCHK_ATIMEMAX (1ULL << 23)
-#define FCHK_ATIMEEQL (1ULL << 24)
-#define FCHK_MD5 (1ULL << 25)
-#define FCHK_SHA1 (1ULL << 26)
-#define FCHK_SHA256 (1ULL << 27)
-#define FCHK_SHA512 (1ULL << 28)
-#define FCHK_SHA224 (1ULL << 29)
-#define FCHK_SHA384 (1ULL << 30)
-#define FCHK_RMD160 (1ULL << 31)
+#define FCHK_NOEXIST (1 << 0)
+#define FCHK_TYPE (1 << 1)
+#define FCHK_MODE (1 << 2)
+#define FCHK_MINLINKS (1 << 3)
+#define FCHK_MAXLINKS (1 << 4)
+#define FCHK_EQLLINKS (1 << 5)
+#define FCHK_MINSIZE (1 << 6)
+#define FCHK_MAXSIZE (1 << 7)
+#define FCHK_EQLSIZE (1 << 8)
+#define FCHK_OWNERID (1 << 10)
+#define FCHK_OWNERSTR (1 << 11)
+#define FCHK_GROUPID (1 << 12)
+#define FCHK_GROUPSTR (1 << 13)
+#define FCHK_CTIMEMIN (1 << 16)
+#define FCHK_CTIMEMAX (1 << 17)
+#define FCHK_CTIMEEQL (1 << 18)
+#define FCHK_MTIMEMIN (1 << 19)
+#define FCHK_MTIMEMAX (1 << 20)
+#define FCHK_MTIMEEQL (1 << 21)
+#define FCHK_ATIMEMIN (1 << 22)
+#define FCHK_ATIMEMAX (1 << 23)
+#define FCHK_ATIMEEQL (1 << 24)
+#define FCHK_MD5 (1 << 25)
+#define FCHK_SHA1 (1 << 26)
+#define FCHK_SHA256 (1 << 27)
+#define FCHK_SHA512 (1 << 28)
+#define FCHK_SHA224 (1 << 29)
+#define FCHK_SHA384 (1 << 30)
+#define FCHK_RMD160 (1 << 31)
-#define CHK_OPTIONAL (1ULL << 33)
-#define CHK_TRACKIT (1ULL << 34)
+#define CHK_OPTIONAL (1 << 0)
+#define CHK_TRACKIT (1 << 1)
typedef struct c_file_t {
@@ -253,5 +253,6 @@
ruletype_t ruletype;
int cfid;
- unsigned long long flags;
+ uint32_t flags;
+ uint32_t chkflags;
struct c_rule_t *next;
union {
@@ -979,5 +980,5 @@
}
else if (strncasecmp(tok, "track", 5) == 0) {
- currule->flags |= CHK_TRACKIT;
+ currule->chkflags |= CHK_TRACKIT;
if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
}
@@ -1028,5 +1029,5 @@
}
else if (strcasecmp(tok, "optional") == 0) {
- currule->flags |= CHK_OPTIONAL;
+ currule->chkflags |= CHK_OPTIONAL;
}
else if (idx == 0) {
@@ -1199,9 +1200,9 @@
}
else if (strncasecmp(tok, "track", 5) == 0) {
- currule->flags |= CHK_TRACKIT;
+ currule->chkflags |= CHK_TRACKIT;
if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
}
else if (strcasecmp(tok, "optional") == 0) {
- currule->flags |= CHK_OPTIONAL;
+ currule->chkflags |= CHK_OPTIONAL;
}
else {
@@ -1230,5 +1231,5 @@
}
else if (strncasecmp(tok, "track", 5) == 0) {
- currule->flags |= CHK_TRACKIT;
+ currule->chkflags |= CHK_TRACKIT;
if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
}
@@ -1292,5 +1293,5 @@
}
else if (strncasecmp(tok, "track", 5) == 0) {
- currule->flags |= CHK_TRACKIT;
+ currule->chkflags |= CHK_TRACKIT;
if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
}
@@ -1543,5 +1544,5 @@
}
else if (strncasecmp(tok, "track", 5) == 0) {
- currule->flags |= CHK_TRACKIT;
+ currule->chkflags |= CHK_TRACKIT;
if (*(tok+5) == '=') currule->rrdidstr = strdup(tok+6);
}
@@ -1906,10 +1907,10 @@
}
- if (rwalk->flags & CHK_TRACKIT) {
+ if (rwalk->chkflags & CHK_TRACKIT) {
printf(" TRACK");
if (rwalk->rrdidstr) printf("=%s", rwalk->rrdidstr);
}
- if (rwalk->flags & CHK_OPTIONAL) printf(" OPTIONAL");
+ if (rwalk->chkflags & CHK_OPTIONAL) printf(" OPTIONAL");
if (rwalk->timespec) printf(" TIME=%s", rwalk->timespec);
@@ -2568,5 +2569,5 @@
if (nofile) {
- if (!(rule->flags & CHK_OPTIONAL)) {
+ if (!(rule->chkflags & CHK_OPTIONAL)) {
if (COL_YELLOW > result) result = COL_YELLOW;
addalertgroup(rule->groups);
@@ -2751,5 +2752,5 @@
*anyrules = 1;
if (!exists) {
- if (rwalk->flags & CHK_OPTIONAL) goto nextcheck;
+ if (rwalk->chkflags & CHK_OPTIONAL) goto nextcheck;
if (!(rwalk->flags & FCHK_NOEXIST)) {
@@ -2984,5 +2985,5 @@
}
}
- if (rwalk->flags & CHK_TRACKIT) {
+ if (rwalk->chkflags & CHK_TRACKIT) {
*trackit = (trackit || (ftype == S_IFREG));
*id = rwalk->rrdidstr;
@@ -3066,5 +3067,5 @@
}
}
- if (rwalk->flags & CHK_TRACKIT) {
+ if (rwalk->chkflags & CHK_TRACKIT) {
*trackit = 1;
*id = rwalk->rrdidstr;
@@ -3238,5 +3239,5 @@
*warnage = rule->rule.mqqueue.warnage;
*critage = rule->rule.mqqueue.critage;
- if (rule->flags & CHK_TRACKIT) *trackit = (rule->rrdidstr ? rule->rrdidstr : "");
+ if (rule->chkflags & CHK_TRACKIT) *trackit = (rule->rrdidstr ? rule->rrdidstr : "");
return;
}
@@ -3471,5 +3472,5 @@
if ((*lowlim != 0) && (*count < *lowlim)) *color = (*walk)->rule->rule.proc.color;
if ((*uplim != -1) && (*count > *uplim)) *color = (*walk)->rule->rule.proc.color;
- *trackit = ((*walk)->rule->flags & CHK_TRACKIT);
+ *trackit = ((*walk)->rule->chkflags & CHK_TRACKIT);
*id = (*walk)->rule->rrdidstr;
if (group) *group = (*walk)->rule->groups;
@@ -3540,5 +3541,5 @@
if ((*lowlim != 0) && (*count < *lowlim)) *color = (*walk)->rule->rule.port.color;
if ((*uplim != -1) && (*count > *uplim)) *color = (*walk)->rule->rule.port.color;
- *trackit = ((*walk)->rule->flags & CHK_TRACKIT);
+ *trackit = ((*walk)->rule->chkflags & CHK_TRACKIT);
*id = (*walk)->rule->rrdidstr;
if (group) *group = (*walk)->rule->groups;