sepparated disk alerts
list Aiquen Eldar
Greetings,

I don't know whether this is the correct forum for this question, which is really more of a new feature request. However, I'll post it here and hope someone will point me in the right direction if this is the wrong place.

On to the issue at hand. At my company we use Xymon to monitor thousands of servers. Sometimes a disk fills up and generates an alert. Sometimes the alert is not looked into for a couple of days, since our clients have specified that they want to remove data themselves (rather than buy more storage). As several servers have 3 to 8 disks and/or partitions, we sometimes have trouble monitoring the other disks, since one is already sending an alert.

Example: Server1 has 3 monitored partitions of a disk:

/
/srv/important/data
/srv/database

with the same limits on all three: 80% -> yellow alert; 90% -> red alert.

If /srv/important/data reaches 82%, the client wants us to notify them, and they will free up space. This normally takes around 3 to 5 working days. But they also want us to monitor /srv/database. Say that one day after /srv/important/data fills up, /srv/database reaches 84%. That will not trigger a new alert in the non-green status view, which is what we monitor.

The question/request is then this: is there a way to get the client to report each disk/partition as a separate alert, so that we can disable the alert for one disk while still receiving alerts for the other disks/partitions? To use the example, I want to be able to temporarily set /srv/important/data to disabled while still getting alerts from /srv/database.

I know this could be solved by writing my own script for the client, but management disapproved of that, as they want as few custom scripts to maintain as possible (we already have dozens of custom scripts).

All help and feedback is appreciated, thanks.

Kind regards,
Calle Lejdbrandt
list John Rothlisberger
This line of thinking also applies to services and processes, which I would like to be able to handle in the same way.

Suppose you monitor a server for serviceA, serviceB, and serviceC. If serviceC stops, but that is on purpose or a known issue, and you acknowledge the alert, will a later stop of serviceA or serviceB then also be ignored?

Thanks,
John

John Rothlisberger
IT Strategy, Infrastructure & Security - Technology Growth Platform
TGP for Business Process Outsourcing
Accenture
-----Original Message----- From: xymon-bounces at xymon.com [mailto:xymon-bounces at xymon.com] On Behalf Of Aiquen Sent: Friday, February 15, 2013 2:15 PM To: xymon at xymon.com Subject: [Xymon] sepparated disk alerts
list Ralph Mitchell
I have a similar problem: systems with OS partitions, database partitions and application partitions. I'd like to be able to send alerts to the appropriate groups when they fill up.

What I have tried so far is a specialized alert script that examines the alert message and picks out the filesystems flagged yellow or red, so that emails go to the correct responders. It's not ideal, but it works. It's direly in need of some kind of config file to map system:filesystem to email address.

Ralph Mitchell
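A minimal sketch of the kind of alert script described above, not Ralph's actual script. It assumes that flagged lines in the disk status text start with "&yellow" or "&red" followed by the mount point; the exact line layout, the RESPONDERS table, and the example addresses are hypothetical and would need to be adapted to a real installation.

```python
import re

# Hypothetical mapping from "host:filesystem" to a responder address.
RESPONDERS = {
    "server1:/srv/database": "dba-team@example.com",
    "server1:/srv/important/data": "app-team@example.com",
}
DEFAULT_RESPONDER = "ops@example.com"

# Assumed line layout: "&red /srv/database (92% used) ..."
FLAGGED = re.compile(r"^&(yellow|red)\s+(/\S*)")


def flagged_filesystems(message):
    """Return {filesystem: color} for lines flagged yellow or red."""
    found = {}
    for line in message.splitlines():
        match = FLAGGED.match(line.strip())
        if match:
            color, filesystem = match.groups()
            found[filesystem] = color
    return found


def responders_for(host, message):
    """Return {filesystem: address} for every flagged filesystem on host."""
    return {
        fs: RESPONDERS.get(f"{host}:{fs}", DEFAULT_RESPONDER)
        for fs in flagged_filesystems(message)
    }
```

Keeping the host:filesystem table in a separate file instead of in the script would cover the config-file need mentioned above.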
list Aiquen Eldar
I include the list in this response, as I think it was left out by mistake.

That is a good answer, and it will probably help people in similar situations. It will not work for us, however, as we don't use the email functionality in Xymon at all. This is also a decision from management. So I need a solution that shows up in the non-green systems view in Xymon. I also need alerts that are triggered when an individual filesystem changes state, not only when the disk status of the whole host changes state, as is natively the case.

I do know this is very specific. But I have tried several workarounds and have been shot down by management, because they want changes to the client to come from upstream, since the disk alert is technically working. It is just annoying for those of us who have to work with it this way.

Kind regards,
Calle Lejdbrandt

On Fri, Feb 15, 2013 at 10:51 PM, Mike Burger <user-cc5c6e80f4c5@xymon.invalid> wrote:
While I don't have a Xymon-specific technical answer, I do have a suggestion, which I've put to use in other places with other monitoring tools.

Send the emails to an account on the Xymon server, maybe even the xymon account itself. For the user in question, create a .procmailrc file and make use of Procmail's ability to filter and forward. You can then filter the alarms based on subject and/or body content, forward the messages that need a response, and send the rest to /dev/null.

--
Mike Burger
http://www.bubbanfriends.org

"It's always suicide-mission this, save-the-planet that. No one ever just stops by to say 'hi' anymore." --Colonel Jack O'Neill, SG1
--
Kind regards
Aiquen Eldar
---Blessed be by the darkness---
list Jeremy Laidman
I'd do what Ralph suggested, but report a new status to Hobbit for each disk. You'd end up with columns like disk, fsA, fsB, fsC. The "disk" column comes from the standard test, and the "fsA" and following columns are generated by your script and can be separately disabled, acknowledged, and alerted on.

J
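As a rough sketch of the per-filesystem columns described above: the helper below builds one "status" message per mount point. It assumes the usual Xymon message layout of "status HOST.TEST COLOR text", with dots in the host name written as commas. The column naming scheme (fs_root, fs_srv_database), the function names, and the way usage figures are obtained are hypothetical; the 80/90 thresholds come from the original post.

```python
def column_name(filesystem):
    """Derive a column name such as 'fs_srv_database' from a mount point."""
    stripped = filesystem.strip("/")
    return "fs_" + (stripped.replace("/", "_") if stripped else "root")


def color_for(percent_used, warn=80, panic=90):
    """Map a usage percentage to a status color using the thread's limits."""
    if percent_used >= panic:
        return "red"
    if percent_used >= warn:
        return "yellow"
    return "green"


def status_messages(host, usage):
    """Build one status message per filesystem.

    usage maps mount point -> percent used (hypothetical input format).
    """
    wire_host = host.replace(".", ",")  # dots in host names become commas
    messages = []
    for filesystem, percent in sorted(usage.items()):
        color = color_for(percent)
        messages.append(
            f"status {wire_host}.{column_name(filesystem)} {color} "
            f"{filesystem} is {percent}% used"
        )
    return messages
```

Each resulting message would then be handed to whatever mechanism the site already uses to submit custom statuses, so that every filesystem shows up as its own column next to the standard disk column.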
list Asif Iqbal
analysis.cfg
HOST=myhost
DISK /srv/database GROUP=A
DISK /srv/important/data GROUP=B
alerts.cfg
GROUP=A COLOR=red
MAIL groupA
GROUP=B COLOR=red
MAIL groupB
This might be a start.
--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
list Aiquen Eldar
Hi,

Thanks for the suggestion. Oyvind's suggestion seems close to what I need. I only need to find a way to keep track of which nodes have their limits raised. I am not allowed to change the alert levels without making sure that they are dropped back down after a certain time. This is laid down as a rule by the higher-ups in my company.

What Asif and Ralph suggest is also good, but it would breach the rule "NO mails may be automatically sent from Xymon anywhere" that has been imposed on me.

Thank you for the good suggestions, and sorry for saying that they don't work for me. I want to point out that they would probably work technically to solve the problem, but I am not allowed to implement them because of the set of rules I have to follow, which I mentioned in my first post. That is why I called this more of a new feature request than an actual issue.

Pretty much the only solution acceptable to the higher-ups is a new client release with a feature to separate disk alerts per monitored disk, plus the ability to disable an alert or raise its limit within a time boundary, since we have to be able to guarantee that the alert will come back and remind us if no one looks into it for a certain amount of time.

Maybe it is worth mentioning that the "disable until OK" feature is disabled in our Xymon, and that you cannot disable an alert for more than 28 days, to make sure that no alert is ever neglected.

Again, thank you for your time. All suggestions are appreciated. I will try to work on something along the lines of Oyvind's suggestion.

Kind regards,
Calle Lejdbrandt
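One possible way to keep track of raised limits so that they drop back automatically is sketched below. This is an illustration only, not part of Xymon and not Oyvind's suggestion: the OverrideRegistry class, its method names, and the default warning level of 80 are assumptions, while the 28-day cap mirrors the rule described above.

```python
from datetime import datetime, timedelta

MAX_OVERRIDE = timedelta(days=28)  # cap taken from the 28-day rule above


class OverrideRegistry:
    """Tracks temporarily raised warning limits per (host, filesystem)."""

    def __init__(self, default_warn=80):
        self.default_warn = default_warn
        self._overrides = {}  # (host, filesystem) -> (raised limit, expiry)

    def raise_limit(self, host, filesystem, warn, duration, now):
        """Raise the warning limit for a bounded period starting at 'now'."""
        if duration > MAX_OVERRIDE:
            raise ValueError("override may not last longer than 28 days")
        self._overrides[(host, filesystem)] = (warn, now + duration)

    def warn_level(self, host, filesystem, now):
        """Return the limit in force at 'now', dropping expired overrides."""
        key = (host, filesystem)
        entry = self._overrides.get(key)
        if entry is None:
            return self.default_warn
        warn, expires = entry
        if now >= expires:
            del self._overrides[key]  # the limit falls back to the default
            return self.default_warn
        return warn

    def active(self, now):
        """List the (host, filesystem) keys whose limits are currently raised."""
        return sorted(k for k, (_, exp) in self._overrides.items() if now < exp)
```

Because every override carries its own expiry, the original limit comes back on its own once the period ends, and active() gives the list of nodes whose limits are currently raised.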
list Adam Goryachev
On 20/02/13 15:00, Aiquen wrote:
[...]
Perhaps the feature request could be along the following lines.

It is currently possible to add a &<color> to a status message. Generally this is done at the beginning of some of the lines to indicate the status of that component; this can be seen in the procs column, for example. It would be helpful to be able to disable an individual line instead of the entire procs status, something like:

somehost.procs.crond

where the value "crond" is the first "word" after the &red. This would allow the procs column to have an overall status of green (or blue) until the disable expires or another line of the procs status changes to red.

In the meantime, you could implement this on the Hobbit server by listening to the client reports being sent in, parsing/processing the disk column as required, and then setting the color for that column as needed.

This is probably a pretty involved job, especially to generalise it to the point where it is useful for any column and for more than one reason, but it could definitely add a lot of value. procs, disk and ports are just a few columns that are very overloaded (i.e. one column that relates to many services/meanings). It is not helpful to have 100 columns per host, but it is helpful to be able to control enable/disable, alerts and similar based on these columns.

Just my thoughts on this; perhaps they will give someone inclined to write some code some ideas. At the end of the day, I'd suggest the only way to get this feature added (unless Henrik wants it himself) is to write the code and submit it. That way it is easy to add the code (as long as it doesn't change things in an incompatible way).

Regards,
Adam

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au
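The per-line disable idea above could be prototyped roughly as follows. This is a sketch under assumptions, not existing Xymon behaviour: it treats the first word after "&color" as the item name, takes a hypothetical table of disabled items with expiry times, and recomputes the overall color while skipping items that are still disabled.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def overall_color(status_text, disabled, now):
    """Return the worst color among '&color item ...' lines.

    disabled maps item name -> expiry time (same units as 'now');
    items whose expiry lies in the future are skipped.
    """
    worst = "green"
    for line in status_text.splitlines():
        line = line.strip()
        if not line.startswith("&"):
            continue  # plain text line without a color marker
        parts = line[1:].split()
        if len(parts) < 2 or parts[0] not in SEVERITY:
            continue
        color, item = parts[0], parts[1]
        if disabled.get(item, 0) > now:
            continue  # this line is still disabled
        if SEVERITY[color] > SEVERITY[worst]:
            worst = color
    return worst
```

With crond disabled, a red crond line no longer drives the column to red, but another line turning red still does, and the red status returns once the disable expires. The sketch reports green rather than blue for a fully disabled column; choosing between the two is left open, as in the text above.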
list Marco Avvisano
Hi,

Is it possible to use report.sh for a specific group of hosts and a specific test? I would like to use it similarly to the old bb script sla-report.sh.

Marco