Xymon Mailing List Archive

Generated Reports coredump

4 messages in this thread

Chris Morris · Wed, 31 May 2006 16:44:24 +0100
Hi,

This problem has been outstanding for quite a while now on AIX. 

When running the pre-generated reports, bbgen coredumps.

I have run

/usr/local/hobbit/server/bin/bbcmd --env=/usr/local/hobbit/server/etc/hobbitserver.cfg hobbitreports.sh weekly

with --debug added to the bbgen command, and get output like this:

2006-05-31 16:19:00 find_or_create_column(hobbitd)
2006-05-31 16:19:00 Skipped to entry starting 1133359651
2006-05-31 16:19:00 Got entry starting 1133359351 lasting 300
2006-05-31 16:19:00 Got entry starting 1133359651 lasting 3112
2006-05-31 16:19:00 Got entry starting 1133362763 lasting 301
2006-05-31 16:19:00 Got entry starting 1133363064 lasting 412240
2006-05-31 16:19:00 Got entry starting 1133775304 lasting 301
2006-05-31 16:19:00 Got entry starting 1133775605 lasting 3915090
2006-05-31 16:19:00 Got entry starting 1137690695 lasting 0
2006-05-31 16:19:00 Got entry starting 1137690695 lasting 7770444
2006-05-31 16:19:00 Got entry starting 1145461139 lasting 300
2006-05-31 16:19:00 Got entry starting 1145461439 lasting 493186
2006-05-31 16:19:00 Got entry starting 1145954625 lasting 301
2006-05-31 16:19:00 Got entry starting 1145954926 lasting 8235
2006-05-31 16:19:00 Got entry starting 1145963161 lasting 301
2006-05-31 16:19:00 Got entry starting 1145963462 lasting 1986143
2006-05-31 16:19:00 Got entry starting 1147949605 lasting 301
2006-05-31 16:19:00 Got entry starting 1147949906 lasting 449557
2006-05-31 16:19:00 Got entry starting 1148399463 lasting 300
2006-05-31 16:19:00 Got entry starting 1148399763 lasting 71611
2006-05-31 16:19:00 Reporting starts with this entry: Tue May 23 16:56:03 2006 green 1148399763 71611

2006-05-31 16:19:00 In-range entry starting 1148425200 lasting 46174 color 0: Tue May 23 16:56:03 2006 green 1148399763 71611
2006-05-31 16:19:00 In-range entry starting 1148471374 lasting 301 color 4: Wed May 24 12:49:34 2006 yellow 1148471374 301
2006-05-31 16:19:00 Looking at history logfile /usr/local/hobbit/data/histlogs/tru2407/hobbitd/Wed_May_24_12:49:34_2006
/usr/local/hobbit/server/bin/hobbitreports.sh: line 83: 26868 IOT/Abort trap (core dumped) BBWEB=$REPORTTOPURL/$REPDIR $BBHOME/bin/bbgen --debug --reportopts=$STIME:$ETIME:0:nongr $BBGENREPOPTS $REPORTTOPDIR/$REPDIR
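
The numbers in the "In-range entry" lines are consistent with a history entry being clipped to the start of the report window: 1148399763 + 71611 = 1148471374, and 1148471374 - 1148425200 = 46174. The following is a minimal sketch of that arithmetic only. The names `hist_entry_t` and `clip_entry` are hypothetical and are not taken from the Hobbit sources, and how bbgen actually computes the in-range duration is an assumption here.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical type, not from the Hobbit sources: one history entry. */
typedef struct {
    time_t start;     /* epoch seconds when the status began */
    time_t duration;  /* how long the status lasted, in seconds */
} hist_entry_t;

/*
 * Clip an entry to the window [win_start, win_end).
 * The returned duration is 0 when the entry lies entirely outside the window.
 */
static hist_entry_t clip_entry(hist_entry_t e, time_t win_start, time_t win_end)
{
    hist_entry_t out;
    time_t end = e.start + e.duration;

    assert(win_start <= win_end);

    if (e.start < win_start) e.start = win_start;  /* drop the part before the window */
    if (end > win_end) end = win_end;              /* drop the part after the window */

    out.start = e.start;
    out.duration = (end > e.start) ? (end - e.start) : 0;
    return out;
}
```

Clipping the green entry (start 1148399763, duration 71611) to a window that begins at 1148425200 gives start 1148425200 and duration 46174, matching the debug line. The yellow entry already lies inside the window and is unchanged.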

If I remove the bbgen & hobbitd history logs, then the report runs through
ok.

Does this give any more clues?

Regards,

Chris


****************************************************************************
The information contained in this email is intended only for the use of the intended recipient at the email address to which it has been addressed. If the reader of this message is not an intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination or copying of the message or associated attachments is strictly prohibited.

If you have received this email in error, please contact the sender by return email or call 01793 877777 and ask for the sender and then delete it immediately from your system.Please note that neither RWE npower nor the sender accepts any responsibility for viruses and it is your responsibility to scan attachments (if any).
*****************************************************************************
Henrik Størner · Wed, 31 May 2006 17:58:00 +0200
On Wed, May 31, 2006 at 04:44:24PM +0100, Morris, Chris (Shared Services) wrote:
> This problem has been outstanding for quite a while now on AIX.
>
> When running the pre-generated reports, bbgen coredumps.
> 2006-05-31 16:19:00 Looking at history logfile
> /usr/local/hobbit/data/histlogs/tru2407/hobbitd/Wed_May_24_12:49:34_2006
> /usr/local/hobbit/server/bin/hobbitreports.sh: line 83: 26868 IOT/Abort trap
> If I remove the bbgen & hobbitd history logs, then the report runs through
> ok.

Sounds like there's something odd in the file
/usr/local/hobbit/data/histlogs/tru2407/hobbitd/Wed_May_24_12:49:34_2006

Could you send me (directly, not through the list) a tarfile with the
following files from /usr/local/hobbit/data/ :

  hist/tru2407*
  histlogs/tru2407/hobbitd/*
  histlogs/tru2407/bbgen/*


Thanks,
Henrik
Henrik Størner · Fri, 2 Jun 2006 22:57:04 +0200
On Wed, May 31, 2006 at 04:44:24PM +0100, Morris, Chris (Shared Services) wrote:
> This problem has been outstanding for quite a while now on AIX.
>
> When running the pre-generated reports, bbgen coredumps.

I think this is related to how bbgen looks into the historical logs to try
to extract the reason a status has gone red. The only problem is
that I simply cannot understand how the code is supposed to work ...

The patch below solves the crash. Extracting the interesting bit of data
from the historical logfile is tricky, but it does work in most cases.


Regards,
Henrik
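
In standard C, neither assignment-suppressed conversions (`%*s`) nor `%n` count toward the value returned by `sscanf`, so a format made up only of those directives returns 0 or EOF and never 1. When the line has fewer than seven fields, `%n` is never reached and the offset variable is left unset. That may be why `l+offset` misbehaved in the original code, though the thread does not confirm it. One possible way to detect that case is to preset the offset to -1 and test it afterwards. The helper name `text_after_seven_fields` below is hypothetical and this is only a sketch, not the fix that was applied.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the Hobbit sources): copy the remainder of
 * `line` after its first seven whitespace-separated fields into `dst`.
 * Returns 1 on success, 0 if the line has fewer than seven fields.
 */
static int text_after_seven_fields(const char *line, char *dst, size_t dst_size)
{
    int offset = -1;   /* stays -1 unless sscanf reaches the %n directive */

    assert(dst_size > 0);
    dst[0] = '\0';

    /* Suppressed conversions and %n are not counted in the return value,
     * so test the offset instead of the result of sscanf. */
    sscanf(line, "%*s %*s %*s %*s %*s %*s %*s %n", &offset);
    if (offset < 0) return 0;

    /* Bounded copy with explicit termination. */
    strncpy(dst, line + offset, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return 1;
}
```

With this approach a short or malformed first line leaves `dst` empty, and the caller can fall back to a fixed text such as "See detailed log".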

-------------- next part --------------
--- ./lib/availability.c	2006/05/02 12:07:00	1.40
+++ ./lib/availability.c	2006/06/02 20:49:33
@@ -153,11 +153,11 @@
 
 static char *parse_histlogfile(char *hostname, char *servicename, char *timespec)
 {
+	char cause[MAX_LINE_LEN];
 	char fn[PATH_MAX];
 	char *p;
 	FILE *fd;
 	char l[MAX_LINE_LEN];
-	char cause[MAX_LINE_LEN];
 	int causefull = 0;
 
 	cause[0] = '\0';
@@ -182,16 +182,28 @@
 			}
 		}
 
+#if 1
+		if (strlen(cause) == 0) {
+			strcpy(cause, "See detailed log");
+		}
+#else
+		/* What is this code supposed to do ? The sscanf seemingly never succeeds */
+		/* storner, 2006-06-02 */
 		if (strlen(cause) == 0) {
 			int offset;
 			rewind(fd);
 			if (fgets(l, sizeof(l), fd)) {
 				p = strchr(l, '\n'); if (p) *p = '\0';
-				sscanf(l, "%*s %*s %*s %*s %*s %*s %*s %n", &offset);
-				strncpy(cause, l+offset, sizeof(cause));
+				if (sscanf(l, "%*s %*s %*s %*s %*s %*s %*s %n", &offset) == 1) {
+					strncpy(cause, l+offset, sizeof(cause));
+				}
+				else {
+					errprintf("Scan of file %s failed, l='%s'\n", fn, l);
+				}
 				cause[sizeof(cause)-1] = '\0';
 			}
 		}
+#endif
 
 		if (causefull) {
 			cause[sizeof(cause) - strlen(" [Truncated]") - 1] = '\0';
--- ./bbdisplay/loadbbhosts.c	2006/05/29 15:33:19	1.44
+++ ./bbdisplay/loadbbhosts.c	2006/06/02 20:27:28
@@ -53,7 +53,7 @@
 {
 	bbpagelist_t *newitem;
 
-	newitem = (bbpagelist_t *) malloc(sizeof(bbpagelist_t));
+	newitem = (bbpagelist_t *) calloc(1, sizeof(bbpagelist_t));
 	newitem->pageentry = page;
 	newitem->next = pagelisthead;
 	pagelisthead = newitem;
@@ -114,7 +114,7 @@
 
 bbgen_page_t *init_page(char *name, char *title)
 {
-	bbgen_page_t *newpage = (bbgen_page_t *) malloc(sizeof(bbgen_page_t));
+	bbgen_page_t *newpage = (bbgen_page_t *) calloc(1, sizeof(bbgen_page_t));
 
 	pagecount++;
 	dprintf("init_page(%s, %s)\n", textornull(name), textornull(title));
@@ -143,7 +143,7 @@
 
 group_t *init_group(char *title, char *onlycols, char *exceptcols)
 {
-	group_t *newgroup = (group_t *) malloc(sizeof(group_t));
+	group_t *newgroup = (group_t *) calloc(1, sizeof(group_t));
 
 	dprintf("init_group(%s, %s)\n", textornull(title), textornull(onlycols));
 
@@ -177,7 +177,7 @@
 		  char *nopropyellowtests, char *nopropredtests, char *noproppurpletests, char *nopropacktests,
 		  int modembanksize)
 {
-	host_t 		*newhost = (host_t *) malloc(sizeof(host_t));
+	host_t 		*newhost = (host_t *) calloc(1, sizeof(host_t));
 	hostlist_t	*oldlist;
 
 	hostcount++;
@@ -260,7 +260,7 @@
 	newhost->banksize = modembanksize;
 	if (modembanksize) {
 		int i;
-		newhost->banks = (int *) malloc(modembanksize * sizeof(int));
+		newhost->banks = (int *) calloc(modembanksize, sizeof(int));
 		for (i=0; i<modembanksize; i++) newhost->banks[i] = -1;
 
 		if (comment) {
@@ -286,13 +286,13 @@
 	if (oldlist == NULL) {
 		hostlist_t *newlist;
 
-		newlist = (hostlist_t *) malloc(sizeof(hostlist_t));
+		newlist = (hostlist_t *) calloc(1, sizeof(hostlist_t));
 		newlist->hostentry = newhost;
 		newlist->clones = NULL;
 		add_to_hostlist(newlist);
 	}
 	else {
-		hostlist_t *clone = (hostlist_t *) malloc(sizeof(hostlist_t));
+		hostlist_t *clone = (hostlist_t *) calloc(1, sizeof(hostlist_t));
 
 		dprintf("Duplicate host definition for host '%s'\n", hostname);
 
@@ -417,7 +417,7 @@
 	if ((name == NULL) || (receiver == NULL) || (url == NULL)) 
 		return NULL;
 
-	newsum = (summary_t *) malloc(sizeof(summary_t));
+	newsum = (summary_t *) calloc(1, sizeof(summary_t));
 	newsum->name = strdup(name);
 	newsum->receiver = strdup(receiver);
 	newsum->url = strdup(url);
--- ./bbdisplay/loaddata.c	2006/05/19 12:02:55	1.161
+++ ./bbdisplay/loaddata.c	2006/06/02 20:24:03
@@ -188,8 +188,8 @@
 		return NULL;
 	}
 
-	newstate = (state_t *) malloc(sizeof(state_t));
-	newstate->entry = (entry_t *) malloc(sizeof(entry_t));
+	newstate = (state_t *) calloc(1, sizeof(state_t));
+	newstate->entry = (entry_t *) calloc(1, sizeof(entry_t));
 	newstate->next = NULL;
 
 	newstate->entry->column = find_or_create_column(testname, 1);
@@ -316,7 +316,7 @@
 		char *p;
 		char *color = (char *) malloc(strlen(l));
 
-		newsum = (dispsummary_t *) malloc(sizeof(dispsummary_t));
+		newsum = (dispsummary_t *) calloc(1, sizeof(dispsummary_t));
 		newsum->url = (char *) malloc(strlen(l));
 
 		if (sscanf(l, "%s %s", color, newsum->url) == 2) {
--- ./bbdisplay/pagegen.c	2006/06/01 12:29:36	1.172
+++ ./bbdisplay/pagegen.c	2006/06/02 20:25:39
@@ -168,7 +168,7 @@
 
 	/* Code de-obfuscation trick: Add a null record as the head item */
 	/* Simplifies handling since head != NULL and we never have to insert at head of list */
-	head = (col_list_t *) malloc(sizeof(col_list_t));
+	head = (col_list_t *) calloc(1, sizeof(col_list_t));
 	head->column = &null_column;
 	head->next = NULL;
 
@@ -197,7 +197,7 @@
 
 				col = find_or_create_column(p1, 0);
 				if (col) {
-					newlistitem = (col_list_t *) malloc(sizeof(col_list_t));
+					newlistitem = (col_list_t *) calloc(1, sizeof(col_list_t));
 					newlistitem->column = col;
 					newlistitem->next = NULL;
 					collist_walk->next = newlistitem;
@@ -235,7 +235,7 @@
 
 				if ((collist_walk->next == NULL) || ((col_list_t *)(collist_walk->next))->column != e->column) {
 					/* collist_walk points to the entry before the new one */
-					newlistitem = (col_list_t *) malloc(sizeof(col_list_t));
+					newlistitem = (col_list_t *) calloc(1, sizeof(col_list_t));
 					newlistitem->column = e->column;
 					newlistitem->next = collist_walk->next;
 					collist_walk->next = newlistitem;
@@ -674,23 +674,14 @@
 			for (s2 = sums; (s2); s2 = s2->next) {
 				
 				if (strcmp(s2->row, s->row) == 0) {
-					newentry = (entry_t *) malloc(sizeof(entry_t));
+					newentry = (entry_t *) calloc(1, sizeof(entry_t));
 
 					newentry->column = find_or_create_column(s2->column, 1);
 					newentry->color = s2->color;
 					strcpy(newentry->age, "");
 					newentry->oldage = 1; /* Use standard gifs */
-					newentry->acked = 0;
-					newentry->alert = 0;
-					newentry->onwap = 0;
 					newentry->propagate = 1;
 					newentry->sumurl = s2->url;
-					newentry->skin = NULL;
-					newentry->testflags = NULL;
-					newentry->repinfo = NULL;
-					newentry->causes = NULL;
-					newentry->histlogname = NULL;
-					newentry->shorttext = NULL;
 					newentry->next = newhost->entries;
 					newhost->entries = newentry;
 				}
@@ -1082,7 +1073,7 @@
 
 			/* We need to create a copy of the original record, */
 			/* as we will diddle with the pointers */
-			newhost = (host_t *) malloc(sizeof(host_t));
+			newhost = (host_t *) calloc(1, sizeof(host_t));
 			memcpy(newhost, h->hostentry, sizeof(host_t));
 			newhost->next = NULL;
 
--- ./bbdisplay/util.c	2006/05/30 06:45:31	1.154
+++ ./bbdisplay/util.c	2006/06/02 20:27:52
@@ -217,7 +217,7 @@
 	if (newcol == NULL) {
 		if (!create) return NULL;
 
-		newcol = (bbgen_col_t *) malloc(sizeof(bbgen_col_t));
+		newcol = (bbgen_col_t *) calloc(1, sizeof(bbgen_col_t));
 		newcol->name = strdup(testname);
 		newcol->listname = (char *)malloc(strlen(testname)+1+2); 
 		sprintf(newcol->listname, ",%s,", testname);
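
Most of the patch replaces `malloc` with `calloc`, so that any field a constructor does not set explicitly starts out zero-filled instead of holding whatever was previously in that memory. The pagegen.c hunk relies on this when it drops the explicit `= 0` and `= NULL` assignments. A minimal sketch of the pattern follows. `demo_entry_t` and `new_demo_entry` are hypothetical names loosely modelled on the fields in the patch, not the real bbgen structures. Note that the C standard does not strictly guarantee that all-bits-zero is a null pointer, although this holds on common platforms.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record, loosely modelled on the entry_t fields named in the patch. */
typedef struct demo_entry_t {
    int color;
    int acked;
    char *causes;
    struct demo_entry_t *next;
} demo_entry_t;

/*
 * Allocate a zero-filled record and set only the field the caller supplies.
 * Returns NULL if the allocation fails.
 */
static demo_entry_t *new_demo_entry(int color)
{
    demo_entry_t *e = calloc(1, sizeof(*e));
    if (e == NULL) return NULL;

    assert(e->acked == 0);   /* zero-filled by calloc, no explicit assignment needed */
    e->color = color;
    return e;
}
```

A caller that later tests `e->causes` or walks `e->next` then sees a defined value even if no code path ever assigned those fields.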
Chris Morris · Tue, 6 Jun 2006 11:28:43 +0100
Henrik,

Thanks very much - the patch seems to have done the trick.

Regards,

Chris