wildcards or regex with SPLITNCV

5 messages in this thread

list Shawn Heisey · Mon, 13 Oct 2008 13:48:41 -0600 ·

I am working on setting up graphs to track an application.  I have created a test that produces the output at the end of this message, and I want to use SPLITNCV to handle it, but I don't want to have to update the Hobbit configuration and restart whenever the number of indexes in the system increases.

Is there a way to define the SPLITNCV_testname variable in such a way that it can handle this?  Also, I want to be able to set up the live_count and broker_count variables as counters, but be able to track negative changes as well as positive.  I'm not sure if the DERIVE format mentioned in the docs will do negative changes.  Normally, the value marches ever higher, but when there are problems, it will go down, and I want to see that on the graph.  The _time variables and the rest of the _count variables will be gauges.

Is this something that can be accomplished?  If I have to set up all the graphs as gauge and deal with an ever-increasing value for the special entries, then I will do that.
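
On the DERIVE question: as far as the RRDtool documentation describes it, COUNTER treats a decrease as a counter wrap, while DERIVE stores the plain rate of change and can go negative as long as the data source minimum is left at U. The sketch below is a simplified model of that difference, not RRDtool's own code. The function names, the 300-second interval, and the drop of 300 records are assumptions made up for illustration.

```python
def derive_rate(prev, cur, seconds):
    """Rate of change per second; negative when the value drops."""
    return (cur - prev) / float(seconds)


def counter_rate(prev, cur, seconds, bits=32):
    """Simplified COUNTER model: a drop is assumed to be a wraparound."""
    delta = cur - prev
    if delta < 0:
        delta += 2 ** bits  # pretend the counter overflowed
    return delta / float(seconds)


# live_count from the sample output, then a hypothetical drop of 300 records
before, after = 30635889, 30635589
print(derive_rate(before, after, 300))   # negative rate
print(counter_rate(before, after, 300))  # large positive spike
```

In this model a falling live_count shows up as a negative rate under DERIVE, whereas COUNTER would turn the same drop into a large positive spike.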

This is on the production system, running version 4.2. I started with the source package from the Debian backports archive, then patched it with the central BBWin patch and the allinone patch in addition to the existing Debian-specific patches.  I had to resolve a couple of conflicts, but got it to compile and run.

Script output:
==============
live_count : 30635889
diff_count : 0
broker_time : 1355
broker_count : 30635889
rss_time : 4525
rss_count : 1680777
indy1s2_time : 863
indy1s2_count : 53532
indy1s3_time : 34
indy1s3_count : 789
indy2s1_time : 23
indy2s1_count : 674006
indy2s2_time : 25
indy2s2_count : 952450
indy3s1_time : 21
indy3s1_count : 1004714
indy4s1_time : 32
indy4s1_count : 997684
indy4s2_time : 32
indy4s2_count : 996035
indy5s1_time : 27
indy5s1_count : 1000602
indy5s2_time : 77
indy5s2_count : 999003
indy6s1_time : 26
indy6s1_count : 1003493
indy6s2_time : 25
indy6s2_count : 1001920
indy7s1_time : 28
indy7s1_count : 999484
indy7s2_time : 26
indy7s2_count : 1003172
indy8s1_time : 25
indy8s1_count : 1004038
indy8s2_time : 24
indy8s2_count : 1000058
indy9s1_time : 25
indy9s1_count : 999745
indy9s2_time : 24
indy9s2_count : 998908
indy10s1_time : 28
indy10s1_count : 995340
indy10s2_time : 25
indy10s2_count : 1001914
indy11s1_time : 21
indy11s1_count : 998153
indy11s2_time : 20
indy11s2_count : 1000448
indy12s1_time : 21
indy12s1_count : 995504
indy12s2_time : 22
indy12s2_count : 996100
indy13s1_time : 26
indy13s1_count : 997570
indy13s2_time : 50
indy13s2_count : 999154
indy14s1_time : 22
indy14s1_count : 997059
indy14s2_time : 21
indy14s2_count : 995457
indy15s1_time : 27
indy15s1_count : 1003780
indy15s2_time : 25
indy15s2_count : 997950
indy16s1_time : 66
indy16s1_count : 999673
indy16s2_time : 34
indy16s2_count : 994129
indy17s1_time : 28
indy17s1_count : 998302
indy17s2_time : 56
indy17s2_count : 975723
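
For readers who want to experiment with this output, here is a minimal sketch that parses the "name : value" lines and groups them by the _time and _count suffixes. It is not part of Hobbit; the function names and the regular expression are assumptions chosen to fit the sample above.

```python
import re

# name : value, with a single colon as in the sample output
LINE_RE = re.compile(r"^\s*(\S+)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_status(text):
    """Return {name: value} for every 'name : value' line in text."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def split_by_suffix(values):
    """Group values into response times and record counts."""
    times = {k: v for k, v in values.items() if k.endswith("_time")}
    counts = {k: v for k, v in values.items() if k.endswith("_count")}
    return times, counts


sample = """Script output:
==============
live_count : 30635889
diff_count : 0
broker_time : 1355
indy1s2_time : 863
indy1s2_count : 53532
"""
times, counts = split_by_suffix(parse_status(sample))
print(sorted(times))
print(sorted(counts))
```

Because the grouping is driven by the suffix rather than by a fixed list of names, new indexes appear in the result without any change to the code.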

list Shawn Heisey · Mon, 13 Oct 2008 14:56:36 -0600 ·

▸ quoted from Shawn Heisey

Shawn Heisey wrote:

I am working on setting up graphs to track an application.  I have created a test that produces the output at the end of this message, I want to use SPLITNCV to handle it, but I don't want to be required to update the hobbit configuration and restart when the number of indexes in the system increases.

I have not been able to find a SPLITNCV example like the following one for regular NCV:

http://www.hswn.dk/~henrik/howtograph.txt

list Graham Nayler · Wed, 15 Oct 2008 13:44:24 +0100 ·

Shawn

I'm not using SPLITNCV, as I wanted a bit more flexibility in the format of the status report (I wanted to allow comment lines, single colons that don't delimit values, etc.), so I am using an external script instead. You may find my earlier reply here
http://www.hswn.dk/hobbiton/2008/10/msg00159.html useful though.

With the external script mechanism you don't need to restart Hobbit if your test generates additional indexes, only if you add new tests. I'm not entirely sure whether SPLITNCV works the same although it looks OK - but you sound perfectly at home with the source, so have a look at that (do_ncv.c). If you're interested, I attach my parsing script to the end of this - enter the test name list and change the regex for your needs. The commented lines were from when I was using a single RRD file for all indices, but that doesn't give the flexibility of displaying multiple graphs, or adding additional indices.

Graham Nayler

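#!/usr/bin/python
# External RRD script: sys.argv[2] is the test name and sys.argv[3] is the
# file holding the body of the status report.

import sys, re

# Matches: <junk> <datasource_name> :: <signed floating point value> <junk>
LINE_RE = re.compile(r"(.*\s+)?([^\s]+)\s*::\s*(-?[0-9\.]*).*$")

def main():
    #print len(sys.argv), sys.argv
    if sys.argv[2] in (<enter test name list here>):
        #print "%s scanning file '%s'"%(sys.argv[2], sys.argv[3])
        data = ""
        lineno = 0
        f = open(sys.argv[3], 'r')
        for line in f:
            lineno = lineno + 1
            # Skip the first two header lines of the report
            if lineno > 2:
                mo = LINE_RE.match(line)
                if mo is not None:
                    # Only emit a record when a value was actually captured
                    if len(mo.group(3)) > 0:
                        # One RRD file per datasource: DS definition,
                        # file name, then the value
                        print("DS:%s:GAUGE:600:U:U" % mo.group(2))
#                        if len(data) > 0:
#                            data = data + ":" + mo.group(3)
#                        else:
#                            data = mo.group(3)
                        print("%s.%s.rrd" % (sys.argv[2], mo.group(2)))
                        print(mo.group(3))
        f.close()
#        if( len(data) > 0 ):
#            print "%s.rrd"%sys.argv[2]
#            print data

if __name__ == "__main__":
    main()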



list Shawn Heisey · Mon, 20 Oct 2008 13:38:09 -0600 ·

▸ quoted from Graham Nayler

Graham Nayler wrote:

Shawn

I'm not using SPLITNCV as I wanted a bit more flexibility in the format of the status report (I wanted to have comment line, single colons not delimiting values etc.), but am using an external script. You may find my earlier reply here
http://www.hswn.dk/hobbiton/2008/10/msg00159.html useful though.

With the external script mechanism you don't need to restart Hobbit if your test generates additional indexes, only if you add new tests. I'm not entirely sure whether SPLITNCV works the same although it looks OK - but you sound perfectly at home with the source, so have a look at that (do_ncv.c). If you're interested, I attach my parsing script to the end of this - enter the test name list and change the regex for your needs. The commented lines were from when I was using a single RRD file for all indices, but that doesn't give the flexibility of displaying multiple graphs, or adding additional indices.

I finally got around to looking at this. I think I'm even more confused. I'm not sure where you got the idea that I'm comfortable with the source ... I've looked at the 4.3 sources trying to get rid of warnings and get it working, but the only thing that did was remind me just how many years it's been since I did any C programming. My eyes glazed over a bit with your Python too; I haven't invested any time in that language yet.

I'll start at the beginning and tell everyone what it is I'm trying to do and the difficulties I'm facing. This is an attempt to monitor a multi-server EasyAsk search index at a remote data center, to which I have no back-end connectivity. The systems have no way to reach the Internet. I wouldn't have designed it that way; it is an acquisition company. I can log onto them by making an ssh connection to a gateway server that has a NIC in the remote LAN, and there are public-facing webservers that also can reach it. We are going to move everything out of the data center within the next six months, so I have no plans to redesign the network until it is moved to headquarters.

The data is generated by a CGI script running on the public facing webserver pair, which an external server-side shell script on Hobbit is retrieving with wget. The CGI script queries the search broker and each individual index server. It notes the total number of records held by each index server, adds them all up, and compares that value to the number of records reported by the broker. It also records how long in milliseconds each query takes. It's basically a machine-readable rewrite of a script that produces a pretty status page. We can't watch the values on that page 24/7, so I want to graph them to watch for problems.
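
The totalling step described above can be sketched as follows. This is an illustration only, not the actual CGI script: the function name is hypothetical, and both the mapping of the summed total to live_count and the sign of diff_count (total minus broker) are assumptions.

```python
def summarize(index_counts, broker_count):
    """Add up the per-index record counts and compare with the broker."""
    total = sum(index_counts.values())
    return {
        "live_count": total,                  # assumed: sum over index servers
        "broker_count": broker_count,
        "diff_count": total - broker_count,   # assumed sign convention
    }


# Two per-index counts taken from the sample output; the broker value here
# is hypothetical and chosen so that the two totals agree.
result = summarize({"indy1s2_count": 53532, "indy1s3_count": 789}, 54321)
print(result["diff_count"])
```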

I didn't go with the external script idea because the RRD docs say it doesn't scale well. I was hoping that NCV or SPLITNCV would handle it easily. I am leery of implementing things that don't scale well - I've been bitten in the past because the boss liked what he saw on something I'd hacked together without thought to performance and wanted to deploy it everywhere.

What I'd like to see is a series of graphs, the first of which should have the total count and the broker count, then a graph with just the difference. Then I'd like to have graphs that work like the disk graphs, where it aggregates the individual broker counts. Following that, another series of graphs that aggregate the response times.

I think it might be easier to implement and easier to read if I have four separate columns on the host entry, something like i_totals, i_diff, i_counts, and i_time.

Forgetting about scalability, are there good examples for how to accomplish this, or a kind soul willing to guide me through the process? I can tweak the CGI script and the script on Hobbit that calls it in any way required.

list Graham Nayler · Thu, 30 Oct 2008 22:27:13 -0000 ·

Shawn,

Sorry, I obviously overinterpreted your earlier posts about patching and getting your snapshot working.

I can't really comment about your application, other than to say I think you'll have a major problem getting different numbers of traces on different graphs within a single test. Running multiple tests, with some having individual traces/graph and some having multiple, is probably the only way you'll be able to manage that.

But back to your SPLITNCV problem: I finally had a closer look at it, as it's so closely related to what I've been doing myself. Yes, it will allow adding datasources within a test without restarting Hobbit (or, more accurately and more seriously, without having to delete and reinitialise the RRD files and so lose previous history). I don't know about its support in 4.2, but it is supported in 4.3, although it is broken.

Here's my reply to someone else today on the subject, which describes the usage and the fix
http://www.hswn.dk/hobbiton/2008/10/msg00423.html

Regarding the python script, essentially it skips the first two header lines, then parses any lines it sees of the format
<any old junk> {space} <datasource_name> {space} :: <signed floating point value> <more junk>

and writes for each line found the following three lines to stdout
DS:<datasource_name>:GAUGE:600:U:U
<testname>.<datasource_name>.rrd
<value>

As it receives each <value> the script host (hobbitd_rrd) updates (or generates if required) the named RRD file.
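
The three-line records described above can be reproduced with a small function, which is handy for checking the regex against a sample report without involving hobbitd_rrd. This is a sketch in the spirit of the script, not the script itself; the function name, the test name "idx", and the sample report are assumptions for illustration.

```python
import re

# Same pattern as the parsing script: <junk> <name> :: <value> <junk>
LINE_RE = re.compile(r"(.*\s+)?([^\s]+)\s*::\s*(-?[0-9\.]*).*$")


def rrd_records(testname, lines):
    """Return the DS / file name / value triples for a status report."""
    out = []
    for line in lines[2:]:  # the first two lines are headers
        mo = LINE_RE.match(line)
        if mo is not None and mo.group(3):
            out.append("DS:%s:GAUGE:600:U:U" % mo.group(2))
            out.append("%s.%s.rrd" % (testname, mo.group(2)))
            out.append(mo.group(3))
    return out


report = [
    "header line 1",
    "header line 2",
    "indy1s2_time :: 863",
    "a line without a value",
]
for record in rrd_records("idx", report):
    print(record)
```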

Subsequently, I've now changed it to output a DS line equivalent to what the SPLITNCV mechanism does
DS:lambda:GAUGE:600:U:U
partly as I had some very long datasource names, and RRD throws an error (and fails to create the file) if it sees datasources longer than 19 characters.
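
For anyone who keeps per-datasource DS names, a check along the following lines can catch over-long names before RRD rejects them. It is a sketch that assumes the rule given in the rrdcreate documentation (1 to 19 characters from letters, digits and underscore); the function name is hypothetical.

```python
import re

# Assumed rule: 1 to 19 characters from [a-zA-Z0-9_]
DS_NAME_RE = re.compile(r"[A-Za-z0-9_]{1,19}")


def is_valid_ds_name(name):
    """True when the whole name fits the assumed RRD datasource rule."""
    return DS_NAME_RE.fullmatch(name) is not None


print(is_valid_ds_name("indy17s2_count"))
print(is_valid_ds_name("a_very_long_datasource_name"))
```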

FYI, as I see it, the overheads of the external script mechanism, over and above those of the SPLITNCV method, are:
one process fork per test report
write the body of the test report to a temporary diskfile
run the script
delete the temporary diskfile

Graham Nayler

