some larrd issues on hobbit 4.0.3 rc1

14 messages in this thread

list Tom Kauffman · Fri, 29 Apr 2005 13:27:49 -0500 ·

OK -- I chickened out and haven't thrown the Big Red Switch yet -- but I
have hobbit running (display functions only) on my failover system and
most everything looks good -- except (there's that ugly word) my vmstat
graphs for my AIX systems.

Something is quite wrong, and I'm not sure what to look at.

Here's what the vmstat bottom feeder ships out:

aix
 1  3 2342710   511   0   1   1 2249 14879   0 1964 11870 4215 15 11 31 42

so cpu_usr is 15, cpu_sys is 11, cpu_idl is 31, and cpu_wait is 42.

But the vmstat graph is giving me a system of 0.0, user 1670.0, and idle
of 2347945.2.

There are no errors in larrd-status.log or larrd-data.log. And this
seems to be happening to all my AIX systems -- cpu_idle and/or cpu_user
are massively inflated. Any ideas of where to look? (My two suse linux
systems look correct).

Also -- I only get the vmstat graph for AIX - I'm missing vmstat0,
vmstat2, vmstat3, and vmstat8. Where do I enable the graphing for these?

And while I'm at it -- I need to disable the tracking and displaying of
disk filesystem usage data -- virtually all my filesystems contain
Oracle tablespaces and they are 100% full at the OS level shortly after
creation -- so I can't see any reason to track them.

Other than that -- looks great!

Tom Kauffman

list Eric E *hs Schwimmer · Fri, 29 Apr 2005 14:46:14 -0400 ·

Tom,

Are there any pertinent entries in your larrd-status.log? Something
along the lines of "expected 16 data source readings (got 17)"?  I've
got the same thing happening on many of my Fedora Core 3 boxes.

Regards,
-Eric

list Eric E *hs Schwimmer · Fri, 29 Apr 2005 14:50:34 -0400 ·

Now that I've read your whole message ... *slaps himself*

Man, I'm glad it's Friday.

e

list Tom Kauffman · Fri, 29 Apr 2005 13:54:32 -0500 ·

Nope -- the only error is on one system on the netstat rrd -- something
I was trying a long time ago and killed off but never pulled the bottom
feeder.

TK

list Tom Kauffman · Fri, 29 Apr 2005 14:01:00 -0500 ·

It has been one of those weeks -- the primary reason I'm waiting for
Monday to go live with this.

TK

list Henrik Størner · Fri, 29 Apr 2005 23:41:01 +0200 ·

▸ quoted from Tom Kauffman

On Fri, Apr 29, 2005 at 01:27:49PM -0500, Kauffman, Tom wrote:

OK -- I chickened out and haven't thrown the Big Red Switch yet -- but I
have hobbit running (display functions only) on my failover system and
most everything looks good -- except (there's that ugly word) my vmstat
graphs for my AIX systems.

Something is quite wrong, and I'm not sure what to look at.

Here's what the vmstat bottom feeder ships out:

aix
 1  3 2342710   511   0   1   1 2249 14879   0 1964 11870 4215 15 11 31 42

so cpu_usr is 15, cpu_sys is 11, cpu_idl is 31, and cpu_wait is 42.

But the vmstat graph is giving me a system of 0.0, user 1670.0, and idle
of 2347945.2.

I've looked over the AIX setup for vmstat, and it should parse this
output from the bottomfeeder OK.

Is this RRD file one that you have copied over from the BB/LARRD setup?
If yes, do the graphs make more sense if you delete
~hobbit/data/rrd/HOSTNAME/vmstat.rrd and let Hobbit re-create the file
(it does that automatically)?

I'd like to have a look at that vmstat.rrd file to see if the problem is
one of different data-set definitions in the Hobbit vs. LARRD version.
I know the vmstat definitions are different on some systems, but I
cannot remember if I changed them for AIX also.

▸ quoted from Tom Kauffman

Also -- I only get the vmstat graph for AIX - I'm missing vmstat0,
vmstat2, vmstat3, and vmstat8. Where do I enable the graphing for these?

The data for them are being tracked, but by default only one vmstat
graph is shown. Add
"LARRD:*,vmstat:vmstat|vmstat0|vmstat2|vmstat3|vmstat8" to the entries
in the bb-hosts where you want these.

▸ quoted from Tom Kauffman

And while I'm at it -- I need to disable the tracking and displaying of
disk filesystem usage data -- virtually all my filesystems contain
Oracle tablespaces and they are 100% full at the OS level shortly after
creation -- so I can't see any reason to track them.

There's no way to turn off tracking of the data for disk reports (unless
you configure the client not to send them, of course).


Henrik

list Tom Kauffman · Sat, 30 Apr 2005 18:22:56 -0500 ·

These vmstat rrds are from back on larrd 42; just after the change to
accumulate cpu wait. So I'm trying the vmstat recreate to see if the
definitions I've got are severely non-standard (I'd almost bet on it)

If so, I'll see what I need to hack to keep my current data . . .

On the disk space rrds -- this is a lot of wasted activity for us; we
have about 8 filesystems we care about, and my production R3 DB server
currently has 95 filesystems that have been 100% full since creation --
and we add another 10 every 13 months (150 GB -- SAP just *eats* disk).

If I can't suppress the creation, I'd at least like to suppress the
display of the graphs -- they're meaningless noise in our shop.

Not a showstopper for going live Monday, though -- If I can figure out
the vmstat issue.

Tom Kauffman
NIBCO

list Henrik Størner · Sun, 1 May 2005 09:11:42 +0200 ·

▸ quoted from Tom Kauffman

On Sat, Apr 30, 2005 at 06:22:56PM -0500, Kauffman, Tom wrote:

These vmstat rrds are from back on larrd 42; just after the change to
accumulate cpu wait. So I'm trying the vmstat recreate to see if the
definitions I've got are severely non-standard (I'd almost bet on it)

Use "rrdtool dump FILENAME.rrd" to dump the old data into a text (XML)
file. At the top of that file you'll find the data-set definitions that
LARRD has set up in this RRD file; these come from the "aix" definition
in the old LARRD vmstat-larrd.pl script. So there should be (in
sequence): cpu_r, cpu_b, mem_avm, mem_free, mem_re, mem_pi, mem_po,
mem_fr, mem_sr, mem_cy, cpu_int, cpu_syc, cpu_csw, cpu_usr, cpu_sys,
cpu_idl, cpu_wait - at least, that's what Hobbit would generate, and
therefore it assumes this layout when updating the RRD file.

Since the files are being updated by Hobbit, but the data collected is
wrong, my guess is that you have these in a different sequence than
Hobbit expects.

There are two ways of tackling that problem.

One way is to change the Hobbit layout to match your current RRD files.
This layout is defined in the hobbit-4.0.3rc1/hobbitd/larrd/do_vmstat.c
file - just look for "aix" and you'll see it. Only problem with this is
that you'll need to repeat this change whenever you upgrade Hobbit.

The other way is to modify the dumped RRD-file, then use "rrdtool restore" 
to convert the modified XML-file back to an RRD file.

You need to change the sequence of the dataset definitions at the
beginning of the file, and also change each of the data "rows" that make
up the bulk of the file. These look like this:

<!-- 2005-05-01 02:00:00 CEST / 1114905600 --><row><v> 1.5896990741e-01 </v><v>2.1686840278e+02 </v><v> 9.5610891204e+01 </v><v> 3.5725331019e+02</v><v> 1.0420138889e-01 </v><v>8.3974537037e-01 </v><v> 3.3892245370e+00</v><v> 3.3494723380e+02 </v><v>9.9369259259e+01 </v><v> 1.0934771532e+05</v><v> 3.8053798435e+05 </v><v> 8.1690244444e+03 </v><v> 2.7122800926e+00 </v><v>1.2084837963e+00 </v><v> 2.1852577870e+05</v></row>

Each of the "<v> VALUE </v>" appear in the sequence that the datasets
are defined. So you must swap values around to match the new layout.
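
The row-swapping step could be scripted along these lines. This is only a
sketch in Python, not a tool that ships with Hobbit or RRDtool: the function
name reorder_row and the three-value demo row are made up, and it rewrites
only the <row> lines. The data-set definitions at the top of the dump (and
any other per-data-set sections) still have to be reordered separately.

```python
import re

# Matches one <v> cell and captures the number (or NaN) inside it.
CELL = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def reorder_row(line, old_order, new_order):
    """Rewrite one <row> line so its cells follow new_order instead of old_order."""
    values = CELL.findall(line)
    if len(values) != len(old_order):
        raise ValueError("row has %d cells, expected %d" % (len(values), len(old_order)))
    by_name = dict(zip(old_order, values))
    cells = "".join("<v> %s </v>" % by_name[name] for name in new_order)
    start = line.index("<row>") + len("<row>")
    end = line.index("</row>")
    return line[:start] + cells + line[end:]


# Hypothetical three-data-set row, shortened for readability.
row = "<!-- demo --><row><v> 1.0e+00 </v><v>2.0e+00 </v><v> 3.0e+00</v></row>"
print(reorder_row(row, ["a", "b", "c"], ["c", "a", "b"]))
```

Looking values up by name rather than by hand-written index pairs keeps the
mapping readable when there are 17 data sets per row.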

▸ quoted from Tom Kauffman

On the disk space rrds -- this is a lot of wasted activity for us; we
have about 8 filesystems we care about, and my production R3 DB server
currently has 95 filesystems that have been 100% full since creation --
and we add another 10 every 13 months (150 GB -- SAP just *eats* disk).

I see - perhaps something like the attached patch could be used. With
this, you can set up two environment variables holding regexp patterns
that each filesystem name is matched against before it gets graphed:
NORRDDISKS is an "exclude" pattern - any filesystem name matching it
does not get a graph; RRDDISKS is an "include" pattern - only filesystem
names matching it get graphed. You can use neither of them (the
current behaviour), one of them, or both.

E.g. if all of your SAP filesystems are mounted below "/sap", you would
just put
  NORRDDISKS="^/sap"
in hobbitserver.cfg, and they won't get graphed.

This doesn't affect any of the RRD files that have already been created,
so you must manually clean out the unwanted disk*.rrd files from the
~hobbit/data/rrd/HOSTNAME/ directory to get rid of the graphs you don't
want.
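
The decision the patch makes for each filesystem can be modelled roughly as
follows. This is a Python sketch using the standard re module in place of
PCRE, so it only approximates the C code; the function name wanted_disk and
the sample mount points are invented for illustration.

```python
import re


def wanted_disk(name, include=None, exclude=None):
    """Return True if a filesystem should be graphed.

    Mirrors the order used in the patch: the exclude pattern is checked
    first, then the include pattern. Matching is unanchored and
    case-insensitive.
    """
    if exclude and re.search(exclude, name, re.IGNORECASE):
        return False
    if include and not re.search(include, name, re.IGNORECASE):
        return False
    return True


# Hypothetical mount points.
for fs in ("/", "/usr", "/sap/data1", "/SAP/data2"):
    print(fs, wanted_disk(fs, exclude="^/sap"))
```

Because the exclude check runs first, a name that matches both patterns is
not graphed.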


Henrik

-------------- next part --------------
--- hobbitd/larrd/do_disk.c	2005/04/10 11:09:06	1.22
+++ hobbitd/larrd/do_disk.c	2005/05/01 06:57:19
@@ -13,12 +13,32 @@
 static char *disk_params[] = { "rrdcreate", rrdfn, "DS:pct:GAUGE:600:0:100", "DS:used:GAUGE:600:0:U", 
 				rra1, rra2, rra3, rra4, NULL };
 
-/* This is ported almost directly from disk-larrd.pl */
 
 int do_disk_larrd(char *hostname, char *testname, char *msg, time_t tstamp)
 {
 	enum { DT_IRIX, DT_AS400, DT_NT, DT_UNIX, DT_NETAPP, DT_NETWARE } dsystype;
 	char *eoln, *curline;
+	static int ptnsetup = 0;
+	static pcre *inclpattern = NULL;
+	static pcre *exclpattern = NULL;
+
+	if (!ptnsetup) {
+		const char *errmsg;
+		int errofs;
+		char *ptn;
+
+		ptnsetup = 1;
+		ptn = getenv("RRDDISKS");
+		if (ptn) {
+			inclpattern = pcre_compile(ptn, PCRE_CASELESS, &errmsg, &errofs, NULL);
+			if (!inclpattern) errprintf("PCRE compile of RRDDISKS='%s' failed\n", ptn);
+		}
+		ptn = getenv("NORRDDISKS");
+		if (ptn) {
+			exclpattern = pcre_compile(ptn, PCRE_CASELESS, &errmsg, &errofs, NULL);
+			if (!exclpattern) errprintf("PCRE compile of NORRDDISKS='%s' failed\n", ptn);
+		}
+	}
 
 	if (strstr(msg, " xfs ") || strstr(msg, " efs ") || strstr(msg, " cxfs ")) dsystype = DT_IRIX;
 	else if (strstr(msg, "DASD")) dsystype = DT_AS400;
@@ -34,6 +54,7 @@
 		int columncount;
 		char *diskname = NULL;
 		int pused = -1;
+		int wanteddisk = 1;
 		unsigned long long aused = 0;
 
 		eoln = strchr(curline, '\n'); if (eoln) *eoln = '\0';
@@ -64,13 +85,12 @@
 		switch (dsystype) {
 		  case DT_IRIX:
 			diskname = xstrdup(columns[6]);
-			p = diskname; while ((p = strchr(p, '/')) != NULL) { *p = ','; }
 			p = strchr(columns[5], '%'); if (p) *p = ' ';
 			pused = atoi(columns[5]);
 			aused = atoi(columns[3]);
 			break;
 		  case DT_AS400:
-			diskname = xstrdup(",DASD");
+			diskname = xstrdup("/DASD");
 			p = strchr(columns[columncount-1], '%'); if (p) *p = ' ';
 			/* 
 			 * Yikes ... the format of this line varies depending on the color.
@@ -88,21 +108,19 @@
 			break;
 		  case DT_NT:
 			diskname = xmalloc(strlen(columns[0])+2);
-			sprintf(diskname, ",%s", columns[0]);
+			sprintf(diskname, "/%s", columns[0]);
 			p = strchr(columns[4], '%'); if (p) *p = ' ';
 			pused = atoi(columns[4]);
 			aused = atoi(columns[2]);
 			break;
 		  case DT_UNIX:
 			diskname = xstrdup(columns[5]);
-			p = diskname; while ((p = strchr(p, '/')) != NULL) { *p = ','; }
 			p = strchr(columns[4], '%'); if (p) *p = ' ';
 			pused = atoi(columns[4]);
 			aused = atoi(columns[2]);
 			break;
 		  case DT_NETAPP:
 			diskname = xstrdup(columns[1]);
-			p = diskname; while ((p = strchr(p, '/')) != NULL) { *p = ','; }
 			pused = atoi(columns[5]);
 			p = columns[3] + strspn(columns[3], "0123456789");
 			aused = atoll(columns[3]);
@@ -113,13 +131,34 @@
 			break;
 		  case DT_NETWARE:
 			diskname = xstrdup(columns[1]);
-			p = diskname; while ((p = strchr(p, '/')) != NULL) { *p = ','; }
 			aused = atoll(columns[3]);
 			pused = atoi(columns[7]);
 			break;
 		}
 
-		if (diskname && (pused != -1)) {
+		/* Check include/exclude patterns */
+		wanteddisk = 1;
+		if (exclpattern) {
+			int ovector[30];
+			int result;
+
+			result = pcre_exec(exclpattern, NULL, diskname, strlen(diskname),
+					   0, 0, ovector, (sizeof(ovector)/sizeof(int)));
+
+			wanteddisk = (result < 0);
+		}
+		if (wanteddisk && inclpattern) {
+			int ovector[30];
+			int result;
+
+			result = pcre_exec(inclpattern, NULL, diskname, strlen(diskname),
+					   0, 0, ovector, (sizeof(ovector)/sizeof(int)));
+
+			wanteddisk = (result >= 0);
+		}
+
+		if (wanteddisk && diskname && (pused != -1)) {
+			p = diskname; while ((p = strchr(p, '/')) != NULL) { *p = ','; }
 			if (strcmp(diskname, ",") == 0) {
 				diskname = xrealloc(diskname, 6);
 				strcpy(diskname, ",root");

list Henrik Størner · Sun, 1 May 2005 09:15:42 +0200 ·

▸ quoted from Henrik Størner

On Sun, May 01, 2005 at 09:11:42AM +0200, Henrik Stoerner wrote:

One way is to change the Hobbit layout to match your current RRD files.
This layout is defined in the hobbit-4.0.3rc1/hobbitd/larrd/do_vmstat.c
file - just look for "aix" and you'll see it.

If you do use this method, just shuffle the lines around in the "aix"
definition - don't change the numbers, because they are used to parse the
vmstat output the client sends. And keep the "-1, NULL" line last.
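
A rough Python model of why shuffling the lines is safe as long as the
numbers stay put. The (column, name) pairs are an assumption derived from the
column positions in the sample vmstat line earlier in this thread, not a copy
of do_vmstat.c, and values_for_update is a hypothetical helper. The
alphabetical shuffle is used purely as an example.

```python
# Each entry is (column in the client's vmstat line, data-set name).
# Assumed from the sample line in this thread, not copied from do_vmstat.c.
AIX_LAYOUT = [
    (0, "cpu_r"), (1, "cpu_b"), (2, "mem_avm"), (3, "mem_free"),
    (4, "mem_re"), (5, "mem_pi"), (6, "mem_po"), (7, "mem_fr"),
    (8, "mem_sr"), (9, "mem_cy"), (10, "cpu_int"), (11, "cpu_syc"),
    (12, "cpu_csw"), (13, "cpu_usr"), (14, "cpu_sys"), (15, "cpu_idl"),
    (16, "cpu_wait"),
]


def values_for_update(line, layout):
    """Return (name, value) pairs in layout order; the number picks the column."""
    columns = line.split()
    return [(name, int(columns[index])) for index, name in layout]


# Shuffle the entries (here: sort by name) but leave every column number alone.
SHUFFLED = sorted(AIX_LAYOUT, key=lambda entry: entry[1])

line = "1  3 2342710   511   0   1   1 2249 14879   0 1964 11870 4215 15 11 31 42"
for name, value in values_for_update(line, SHUFFLED)[:3]:
    print(name, value)
```

The entry order decides where a value lands in the RRD update, while the
number still selects the right column of the client's output, so every name
keeps its correct value after the shuffle.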

-- 
Henrik Storner

list Tom Kauffman · Mon, 2 May 2005 09:36:49 -0500 ·

Well, I see what is going on, and I don't want to try to explain it --

The vmstat rrd for all my AIX systems has the data elements defined in
alpha sequence: cpu_b, cpu_csw, cpu_idl, cpu_int, cpu_r, cpu_syc,
cpu_sys, cpu_usr, cpu_wait, mem_avm, mem_cy, mem_fr, mem_free, mem_pi,
mem_po, mem_re, and mem_sr.

Hmm. Seems that the vmstat rrds for my two SuSE linux boxes are also in
alpha sequence. I'm not real strong in perl, but I suspect this is the
culprit:

  foreach $col ( sort keys %{$htovm{$bbhosttype}}) {
                push @ds,"DS:$col:GAUGE:600:0:U";

The definitions in $htovm are field name followed by index number.
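
To illustrate how a sorted data-set list produces the inflated graphs, here
is a small Python sketch (not part of LARRD or Hobbit). It assumes Hobbit
hands the values to the RRD in the column order Henrik listed, with his "sr"
taken to be mem_sr, while the LARRD-created file stores its data sets in
alphabetical order. The function name positional_update is made up.

```python
# Order in which Hobbit is assumed to supply values (Henrik's list; "sr" read as mem_sr).
HOBBIT_ORDER = [
    "cpu_r", "cpu_b", "mem_avm", "mem_free", "mem_re", "mem_pi", "mem_po",
    "mem_fr", "mem_sr", "mem_cy", "cpu_int", "cpu_syc", "cpu_csw",
    "cpu_usr", "cpu_sys", "cpu_idl", "cpu_wait",
]

# Order of the data sets inside a LARRD-created vmstat.rrd: sorted by name.
LARRD_ORDER = sorted(HOBBIT_ORDER)

SAMPLE = "1  3 2342710   511   0   1   1 2249 14879   0 1964 11870 4215 15 11 31 42"


def positional_update(line, stored_order):
    """Pair each value with the data set that sits at the same position."""
    values = [int(v) for v in line.split()]
    if len(values) != len(stored_order):
        raise ValueError("value count does not match data-set count")
    return dict(zip(stored_order, values))


intended = positional_update(SAMPLE, HOBBIT_ORDER)
stored = positional_update(SAMPLE, LARRD_ORDER)
for name in ("cpu_usr", "cpu_sys", "cpu_idl"):
    print(name, "intended:", intended[name], "stored:", stored[name])
```

With the sample line, the memory figure 2342710 lands in the cpu_idl slot and
2249 lands in cpu_usr, the same order of magnitude as the inflated idle and
user values reported at the start of the thread.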

Short term, I'm going to change do_vmstat.c, while I research writing
some code to re-work the rrd xml (too many changes and too many rrds to
try it by hand).

The disk patch is much appreciated; our primary R3 DB server should drop
from 31 disk graphs down to three just by excluding
'/oracle/PRD/sapdata*'.

Tom

list Henrik Størner · Mon, 2 May 2005 16:57:07 +0200 ·

▸ quoted from Tom Kauffman

On Mon, May 02, 2005 at 09:36:49AM -0500, Kauffman, Tom wrote:

Well, I see what is going on, and I don't want to try to explain it --

The vmstat rrd for all my AIX systems has the data elements defined in
alpha sequence: cpu_b, cpu_csw, cpu_idl, cpu_int, cpu_r, cpu_syc,
cpu_sys, cpu_usr, cpu_wait, mem_avm, mem_cy, mem_fr, mem_free, mem_pi,
mem_po, mem_re, and mem_sr.

Hmm. Seems that the vmstat rrds for my two SuSE linux boxes are also in
alpha sequence. I'm not real strong in perl, but I suspect this is the
culprit: foreach $col ( sort keys %{$htovm{$bbhosttype}}) {
                push @ds,"DS:$col:GAUGE:600:0:U";

Perl is definitely not my strong side either - one more reason why I
abandoned the LARRD perl-scripts.

I looked at the one remaining BB server I have, and it seems you're
right - LARRD really does sort the datasets alphabetically by name when
it sets up the vmstat RRD file. It seems it does this only for the
vmstat data.

▸ quoted from Tom Kauffman

Short term, I'm going to change do_vmstat.c, while I research writing
some code to re-work the rrd xml (too many changes and too many rrds to
try it by hand).

If you do come up with a tool for this, I would appreciate it if I may
distribute it with Hobbit.

▸ quoted from Tom Kauffman

The disk patch is much appreciated; our primary R3 DB server should drop
from 31 disk graphs down to three just by excluding '/oracle/PRD/sapdata*'.

Don't forget that this is a regexp - I think you meant '/oracle/PRD/sapdata.*'
with a dot before the asterisk.
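
A quick way to compare the two spellings is sketched below in Python. It uses
the standard re module rather than PCRE, which should behave the same for
patterns this simple, and it searches unanchored and case-insensitively the
way the patch calls pcre_exec. The filesystem names are hypothetical.

```python
import re

GLOB_STYLE = r"/oracle/PRD/sapdata*"    # '*' repeats only the preceding 'a'
REGEX_STYLE = r"/oracle/PRD/sapdata.*"  # '.*' matches any trailing text


def matches(pattern, name):
    """Unanchored, case-insensitive search, like the patch's pcre_exec call."""
    return re.search(pattern, name, re.IGNORECASE) is not None


for name in ("/oracle/PRD/sapdata1", "/oracle/PRD/sapdat", "/oracle/PRD/saparch"):
    print(name, matches(GLOB_STYLE, name), matches(REGEX_STYLE, name))
```

Because the search is unanchored, both spellings match /oracle/PRD/sapdata1.
The glob-style one additionally matches a name that stops at "sapdat", since
the asterisk allows zero occurrences of the final "a".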


Henrik

list Tom Kauffman · Mon, 2 May 2005 10:58:35 -0500 ·

IIRC, the vmstat graphs were among the very first items to be set up in
larrd, and he was learning rrd at the time. Now don't hold your breath,
but I will pass the conversion tool along to you whenever I get it
working. It's just not going to be all that high on my list. This is the
year we swap out virtually every AIX system we've got because the lease
is up, and the fun starts early next month.

Tom

list Tom Kauffman · Mon, 2 May 2005 11:44:11 -0500 ·

OK --

Re-ordering the AIX and linux entries in do_vmstat.c and copying the
vmstat rrds from my other BB system fixed the problem. Thanks for the
help.

And the do_disk.c patch is MUCH appreciated! I'm down to nine disk
graphs on my R3 db server (and it has two clones that will be affected
as well)

Thanks!

Tom

list Henrik Størner · Mon, 2 May 2005 18:45:23 +0200 ·

▸ quoted from Tom Kauffman

On Mon, May 02, 2005 at 10:58:35AM -0500, Kauffman, Tom wrote:

IIRC, the vmstat graphs were among the very first items to be set up in
larrd, and he was learning rrd at the time. Now don't hold your breath,
but I will pass the conversion tool along to you whenever I get it
working.

I've dug into the RRDtool docs, and it *should* be possible to work
around this in Hobbit, by passing an extra option to the rrd_update
routine telling it what data is being provided in what sequence, and
have it figure out how to shuffle things around.

It just doesn't work ... 

Whether that's due to my testing with RRDtool 1.2.x or a bug in my code
remains to be determined.
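
For reference, the rrdtool command line has a --template (-t) option on
"update" that names the data sources in the order the values are supplied.
Whether that is the option meant above is an assumption. The Python sketch
below only builds such an argument list and does not run rrdtool; the helper
name build_update_args and the two-value example are made up.

```python
def build_update_args(rrd_path, timestamp, readings):
    """Build an 'rrdtool update' argument list that names each data source.

    readings is a list of (data-source name, value) pairs in any order;
    the template tells rrdtool which value belongs to which data source.
    """
    names = ":".join(name for name, _ in readings)
    values = ":".join(str(value) for _, value in readings)
    return ["rrdtool", "update", rrd_path,
            "--template", names, "%s:%s" % (timestamp, values)]


# Hypothetical two-value update using figures from the sample vmstat line.
print(build_update_args("vmstat.rrd", 1114905600,
                        [("cpu_usr", 15), ("cpu_sys", 11)]))
```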

▸ quoted from Tom Kauffman

It's just not going to be all that high on my list. This is the
year we swap out virtually every AIX system we've got because the lease
is up, and the fun starts early next month.

Well, it wasn't high on my list either, so I'm not complaining :-)


Henrik
