Handling SNMP traps with Hobbit
list Andy Farrior
My technicians have been bugging me about wanting to receive SNMP traps from various equipment. The only references to SNMP traps and Hobbit I could find were forwarding Hobbit events as SNMP traps to NMS servers like OpenView or something. I don't have either; so I've tried to implement SNMP trap handling with Hobbit using an external perl script, snmptrapd, SNMPTT, and SEC. I've put my configuration notes here if you're interested: http://cerebro.victoriacollege.edu/hobbit-trap.html If you find something grossly wrong, let me know. thanks, andy
list Henrik Størner
▸
On Fri, Jul 15, 2005 at 09:21:10PM -0500, FARRIOR, Andy wrote:
My technicians have been bugging me about wanting to receive SNMP traps from various equipment. The only references to SNMP traps and Hobbit I could find were forwarding Hobbit events as SNMP traps to NMS servers like OpenView or something. I don't have either; so I've tried to implement SNMP trap handling with Hobbit using an external perl script, snmptrapd, SNMPTT, and SEC. I've put my configuration notes here if you're interested: http://cerebro.victoriacollege.edu/hobbit-trap.html
This is a very elegant solution for handling SNMP traps. The SNMPTT and SEC tools make these statuses really usable, instead of just dumping all traps directly into Hobbit. I'll definitely get this up and running on my own system after the holidays.

The only criticism I have is about the way you keep statuses from going purple - I would do that differently, because right now you depend on a certain format of the Hobbit checkpoint-file which may change (there's a reason it isn't documented anywhere).

Instead of reading the checkpoint file, I'd query the hobbit daemon directly. You do this with the bb client tool and the "hobbitdboard" command. E.g. to fetch the hostname and expiry-time for all "trap" statuses you can do this:

    $BB $BBDISP "hobbitdboard test=trap fields=hostname,validtime"

The output looks like this:

    adsl.hswn.dk|1121498714
    backup-mx.post.tele.dk|1121498714
    www.sslug.dk|1121498623

So changing your script to do this should be really simple - use Perl's open() to run the command in a pipe and read the output - and you no longer rely on the checkpoint-file being updated, or the format of this file not changing.

And you only get the hosts that really do have a "trap" status logged in Hobbit, so you no longer need to read the bb-hosts file - you shouldn't do it the way you do, because it doesn't handle bb-hosts files that have been split up into multiple files and then combined via the "include" statement. If you must, use the "bbhostgrep trap" command and read the output from it.

Another - perhaps more elegant - solution is to change Hobbit so that you can send a status-message that does not expire. I'd be willing to implement such a change since it does make sense for this kind of integration with other systems. (I have a similar problem on my system where it receives e-mails instead of SNMP traps). However, then you will not get any indication if your SNMP module stops working, so each method has its benefits and drawbacks.
My Perl skills are really poor, so I'd love it if you could change the trap.pl script to use the hobbitdboard command instead of the checkpoint file. Thanks, Henrik
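The hobbitdboard query described above could be sketched roughly as follows. This is only an illustration in Python, not the trap.pl script itself (which is Perl and not shown in this thread). The function names parse_board, expiring_soon and fetch_board, and the 300-second margin, are hypothetical choices made for the example. The only things taken from the message are the command string and the "hostname|validtime" output format.

```python
import subprocess


def parse_board(output):
    """Parse 'hostname|validtime' lines into a dict of hostname -> expiry (epoch seconds)."""
    statuses = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        hostname, _, validtime = line.partition("|")
        statuses[hostname] = int(validtime)
    return statuses


def expiring_soon(statuses, now, margin=300):
    """Return the sorted hosts whose status expires within `margin` seconds of `now`.

    The margin is an assumed value for illustration; pick one that suits
    how often the script runs.
    """
    return sorted(h for h, expiry in statuses.items() if expiry - now <= margin)


def fetch_board(bb, bbdisp, test="trap"):
    """Run the bb client against the Hobbit server and return its raw output.

    `bb` and `bbdisp` stand for the values of $BB and $BBDISP; this call
    only works on a machine with a Hobbit installation.
    """
    query = f"hobbitdboard test={test} fields=hostname,validtime"
    result = subprocess.run(
        [bb, bbdisp, query], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Hobbit server one would presumably call fetch_board with the values of $BB and $BBDISP, pass the result through parse_board, and re-send a status for each host returned by expiring_soon before it goes purple.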
list Andy Farrior
▸
-----Original Message----- From: Henrik Stoerner [mailto:user-ce4a2c883f75@xymon.invalid] Sent: Sat 7/16/2005 2:22 AM To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] Handling SNMP traps with Hobbit
Instead of reading the checkpoint file, I'd query the hobbit daemon directly. You do this with the bb client tool and the "hobbitdboard" command. E.g. to fetch the hostname and expiry-time for all "trap" statuses you can do this: $BB $BBDISP "hobbitdboard test=trap fields=hostname,validtime" The output looks like this: adsl.hswn.dk|1121498714 backup-mx.post.tele.dk|1121498714 www.sslug.dk|1121498623
I should have known you could do something like that with Hobbit.... I'll play with that. (There was a voice in the back of my head that told me not to use the checkpoint file, but I didn't know what else to use. must read all docs...) Thanks.
▸
Another - perhaps more elegant - solution is to change Hobbit so that you can send a status-message that does not expire. I'd be willing to implement such a change since it does make sense for this kind of integration with other systems. (I have a similar problem on my system where it receives e-mails instead of SNMP traps). However, then you will not get any indication if your SNMP module stops working, so each method has its benefits and drawbacks.
For now, I think I'll try using long LIFETIME values like 24h or 48h and set the status to "no traps to report" after that time period.
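As a rough sketch of what such a long-lived status could look like: the snippet below builds a status message with a LIFETIME attached. It is written in Python purely for illustration. The build_status name is hypothetical, and the message layout ("status+LIFETIME host.test color text", with dots in the hostname replaced by commas) is my understanding of the status message format, so check it against the bb client documentation before relying on it.

```python
def build_status(hostname, test, color, text, lifetime="24h"):
    """Build a status message that stays valid for `lifetime`.

    Assumption: the server accepts 'status+LIFETIME' and expects dots in
    the hostname to be replaced by commas in the HOST.TEST field.
    """
    host_field = hostname.replace(".", ",")
    return f"status+{lifetime} {host_field}.{test} {color} {text}"
```

The resulting string would then be passed to the bb client as the message argument, in the same way as the hobbitdboard query, e.g. build_status("www.sslug.dk", "trap", "green", "no traps to report").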
▸
My Perl skills are really poor, so I'd love it if you could change the trap.pl script to use the hobbitdboard command instead of the checkpoint file.
I *should* have something by Monday. thanks Henrik! Andy
list Andy Farrior
I've updated the trap.pl script to read from hobbitdboard. let's try that again. http://cerebro.victoriacollege.edu/hobbit-trap.html thanks, Andy
list Thomas Pedersen
Hi andy,
I am using snmptt too, but I set it up using the MySQL web page and then have an external script to test whether any unacknowledged events are on the web page. My primary reason for doing it this way was that I was not able to make sure that all traps were recorded. How do you make sure that two consecutive traps from the same device are both recorded, and not only the last?
Also you mention the "problem" with classification. This has raised some issues in my organisation, because some server people did not agree on the specific alert levels (i.e. CRITICAL etc.). Do you edit this by hand?
Best regards,
Thomas
▸
FARRIOR, Andy wrote:
My technicians have been bugging me about wanting to receive SNMP traps from various equipment.
The only references to SNMP traps and Hobbit I could find were forwarding Hobbit events as SNMP traps to NMS servers like OpenView or something.
I don't have either; so I've tried to implement SNMP trap handling with Hobbit using an external perl script, snmptrapd, SNMPTT, and SEC.
I've put my configuration notes here if you're interested: http://cerebro.victoriacollege.edu/hobbit-trap.html
If you find something grossly wrong, let me know.
thanks, andy
list Andy Farrior
▸
-----Original Message----- From: Thomas [mailto:user-97316fb2dd2a@xymon.invalid] Sent: Mon 7/18/2005 3:20 AM To: user-ae9b8668bcde@xymon.invalid Subject: Re: [hobbit] Handling SNMP traps with Hobbit
Hi andy, I am using snmptt too, but I set it up using the MySQL web page and then have an external script to test whether any unacknowledged events are on the web page. My primary reason for doing it this way was that I was not able to make sure that all traps were recorded. How do you make sure that two consecutive traps from the same device are both recorded, and not only the last?
If two Normal traps arrive, only the last one will be seen. If a WARNING or CRITICAL trap arrives followed by a Normal trap, you will still get an alert for the yellow/red trap but you'll see the green trap on the page; however, the yellow/red trap should be in the history. I wanted to include MySQL logging with SNMPTT, but haven't had a chance to work with it. I'd also want to have some link from the Hobbit status page to an SNMPTT web page so you could see all past alerts. Instead of waiting for or forcing a user to acknowledge a trap, I treat it like a regular alert. Just wanted to keep it simple.
Also you mention the "problem" with classification. This has raised some issues in my organisation, because some server people did not agree on the specific alert levels (i.e. CRITICAL etc.). Do you edit this by hand?
I edited these by hand. We're fairly small, only two people receive network alerts, two receive server alerts; so, we didn't have any issues. Thanks, Andy