separated disk alerts

9 messages in this thread

list Aiquen Eldar · Fri, 15 Feb 2013 21:14:51 +0100 ·

Greetings

I don't know if this is the correct forum for this question, or
whether it is more of a new feature request. However, I'm going to
post here and hope someone will point me in the right direction if
this is the wrong place.

Now for the issue at hand. At my company we use Xymon to monitor
thousands of servers. Sometimes a disk fills up and generates an
alert. Sometimes this won't get looked into for a couple of days, since
our clients have specified that they want to remove data themselves
(and not buy more storage). As several servers have 3 - 8 disks
and/or partitions, we sometimes have trouble monitoring the other disks
since one is already sending an alert.

Example:
Server1 has 3 monitored partitions of a disk:
/
/srv/important/data
/srv/database

with the same limits on all three: 80% -> yellow alert; 90% -> red alert.
If /srv/important/data reaches 82%, the client wants us to notify
them and they will free up space. This normally takes around 3 - 5
working days. But they also want us to monitor /srv/database. Say
that 1 day after /srv/important/data fills up, /srv/database
reaches 84%. That will not trigger a new alert in the non-green status
view, which is what we monitor.

The question/request is this: is there a way to get the client
to report each disk/partition as a separate alert, so we can disable
the alert for one disk while still receiving alerts for the other
disks/partitions? To use the example, I want to be able to temporarily
set /srv/important/data to disabled while still getting alerts
from /srv/database.

I know this could be solved by writing my own script for the client,
but management disapproved of that, as they want as few custom
scripts to maintain as possible (we already have dozens of custom
scripts).

All help and feedback are appreciated, thanks.

Kind regards
Calle Lejdbrandt

list John Rothlisberger · Fri, 15 Feb 2013 20:37:15 +0000 ·

This line of thinking can also be applied to services and processes, which is something I would like to be able to do.

Say you monitor a server for serviceA, serviceB, and serviceC. If serviceC stops, on purpose or because of a known issue, and you acknowledge the alert, will a stop of serviceA or serviceB then also be ignored?

Thanks,
John
Upcoming PTO: None
John Rothlisberger
IT Strategy, Infrastructure & Security - Technology Growth Platform
TGP for Business Process Outsourcing
Accenture
XXX.XXX.XXXX office


list Ralph Mitchell · Fri, 15 Feb 2013 16:05:05 -0500 ·

I have a similar problem - systems with OS partitions, database partitions
and application partitions.  I'd like to be able to send alerts to the
appropriate groups when they fill up.  What I have tried so far is a
specialized alert script that examines the alert message and picks out the
filesystems flagged yellow or red, so that emails go to the correct
responders.  It's not ideal, but it works.  It's direly in need of some
kind of config file to map system:filesystem to email address.
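
A minimal sketch of such a mapping, in Python. Everything here is an assumption made for illustration, not a description of Ralph's script: the map file format (one `host:filesystem address` pair per line, `#` for comments), the function names, and the shape of the alert text (each problem filesystem on a line that starts with `&red` or `&yellow` followed by the mount point).

```python
import re

# Assumed line shape in the alert text, e.g.
#   "&yellow /srv/database (84% used) has reached the WARNING level (80%)"
FLAGGED = re.compile(r"^&(red|yellow)\s+(\S+)")


def parse_map(text):
    """Parse 'host:filesystem address' lines (hypothetical format) into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        key, address = line.split(None, 1)
        mapping[key] = address.strip()
    return mapping


def recipients(host, alert_text, mapping):
    """Return {address: [(color, filesystem), ...]} for flagged filesystems."""
    result = {}
    for line in alert_text.splitlines():
        match = FLAGGED.match(line.strip())
        if not match:
            continue
        color, filesystem = match.groups()
        address = mapping.get(f"{host}:{filesystem}")
        if address:  # filesystems without a mapping are skipped
            result.setdefault(address, []).append((color, filesystem))
    return result
```

Filesystems that have no entry in the map are silently skipped here; a real script would probably want a fallback address instead.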

Ralph Mitchell


list Aiquen Eldar · Fri, 15 Feb 2013 23:12:22 +0100 ·

I include the list in this response as I think it was left out by mistake.

That is a good answer, and will probably help people in similar
situations. However, it will not work for us, as we don't use the
email functionality in xymon at all. This, too, is a decision from
management.

So I need a solution that will show up in the non-green systems view
in xymon. I also need alerts to be triggered when an individual
filesystem changes state, not when the disk status of the host as a
whole changes state, as is the case natively.

I do know this is very specific. But I have tried several workarounds
and have been shot down by management, because they want changes to
the client to come from upstream, since the disk alert technically is
working. It is just annoying for those of us who have to work with it
this way.

Kind regards
Calle Lejdbrandt

On Fri, Feb 15, 2013 at 10:51 PM, Mike Burger <user-cc5c6e80f4c5@xymon.invalid> wrote:

While I don't have a Xymon-specific technical answer, I do have a
suggestion, which I've put to use in other places, with other monitoring
tools.

Send the emails to an account on the Xymon server, maybe even xymon, itself.

For the user in question, create a .procmailrc file and make use of
Procmail's ability to filter and forward. You can then filter the alarms
based on subject and/or body content, forward out the messages that need a
response, and send the rest to /dev/null.
--
Mike Burger
http://www.bubbanfriends.org

"It's always suicide-mission this, save-the-planet that. No one ever just
stops by to say 'hi' anymore." --Colonel Jack O'Neill, SG1

--


Kind regards
Aiquen Eldar
---Blessed be by the darkness---

list Jeremy Laidman · Tue, 19 Feb 2013 13:03:46 +1100 ·

I'd do what Ralph suggested, but report a new status to Hobbit, one per
disk.  So you'd end up with columns like disk, fsA, fsB, fsC.  The "disk"
column comes from the standard test, and the "fsA", "fsB", ... columns are
generated by your script and can each be disabled, acknowledged, and
alerted on separately.

J
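
A rough sketch of the per-filesystem idea, in Python. It only builds the status messages; actually sending them (for example with the xymon client command) is left out. The column naming scheme (`fs_` plus the mount point), the function names, and the 80/90 thresholds taken from the original example are assumptions for illustration. The message layout follows the usual Xymon `status HOST.COLUMN COLOR text` form, with dots in the host name replaced by commas.

```python
import re


def parse_df(df_output):
    """Map mount point -> percent used, from `df -P` style text.

    Mount points containing spaces are not handled in this sketch.
    """
    usage = {}
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage


def column_name(mount):
    """Hypothetical naming scheme: '/' -> fs_root, '/srv/database' -> fs_srv_database."""
    name = re.sub(r"[^A-Za-z0-9]+", "_", mount.strip("/"))
    return "fs_" + (name or "root")


def status_messages(host, usage, warn=80, panic=90):
    """Build one Xymon status message per filesystem."""
    xymon_host = host.replace(".", ",")  # Xymon wants commas in host names
    messages = []
    for mount, pct in sorted(usage.items()):
        color = "red" if pct >= panic else "yellow" if pct >= warn else "green"
        messages.append(
            f"status {xymon_host}.{column_name(mount)} {color} {mount} is {pct}% used"
        )
    return messages
```

Because each filesystem becomes its own column, /srv/important/data can be disabled on its own while /srv/database still turns yellow or red in the non-green view. The cost is one extra column per monitored filesystem.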


list Asif Iqbal · Mon, 18 Feb 2013 23:02:53 -0500 ·

On Fri, Feb 15, 2013 at 3:14 PM, Aiquen <user-b197b3c08719@xymon.invalid> wrote:

The question/request is then as this: Is there a way to get the client
to report each disk/partition as a separate alert, so we can disable
the alert for one disk while receiving alerts for the other
disks/partions. To use the example I want to be able to temporarily
set /srv/important/data in disabled mode while still getting alerts
from /srv/database

analysis.cfg

HOST=myhost
DISK /srv/database GROUP=A
DISK /srv/important/data GROUP=B

alerts.cfg

GROUP=A COLOR=red
        MAIL groupA

GROUP=B COLOR=red
        MAIL groupB


This might be a start.


--


Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

list Aiquen Eldar · Wed, 20 Feb 2013 05:00:04 +0100 ·

Hi
Thanks for the suggestions. Oyvind's suggestion seems close to what I
need. I only need to find a way to make it keep track of which nodes
have their limits raised. I am not allowed to change the alert levels
without making sure that the levels get dropped down again after a
certain time. This is laid down as a rule by the higher-ups in my
company.

What Asif and Ralph suggest is also good, but it would breach the rule
that "NO mails may be automatically sent from xymon anywhere" that has
been laid down on me.

Thank you for the good suggestions, and sorry for saying that they
don't work for me. I want to point out that they will probably work
technically to solve the problem. But I am not allowed to implement
them because of the set of rules I have to follow, which I mentioned
in my first post. That is why I called this more of a new feature
request than an actual issue. Pretty much the only acceptable solution
for the higher-ups is that a new client is released with the feature
to separate disk alerts based on monitored disks, and then to be able
to disable an alert or raise its limit within a time boundary, since
we have to be able to guarantee that the alert will come back and
remind us if no one looks into it for a certain amount of time. Maybe
it is worth mentioning that the "disable until ok" feature is disabled
in our xymon, and that you cannot disable an alert for more than 28
days, to make sure that no alert ever gets neglected.

Again, thank you for your time. All suggestions are appreciated. I will
try to work on something along the lines of Oyvind's suggestion.

Kind Regards
Calle Lejdbrandt


list Adam Goryachev · Wed, 20 Feb 2013 16:03:12 +1100 ·


Perhaps the feature request could be along the following lines...

It is currently possible to add a &<color> to a status message.
Generally this is done at the beginning of some of the lines to indicate
the status of that component. This can be seen in the procs column, for
example.

It would be helpful to be able to disable the individual line instead of
the entire procs status, something like:
somehost.procs.crond

Where the value "crond" is the first "word" after the &red. This would
allow the column procs to have an overall status of green (or blue)
until the disable expires, or until another line of procs changes to red.

In the meantime, you could implement this on the hobbit server by
listening to the client reports being sent in, and parsing/processing
the disk column as required, and then setting the color for that column
as needed.
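
To make the idea concrete, here is a minimal Python sketch of the colour calculation only. It is not an existing Xymon feature: the `disabled` table (first word after the `&color` marker mapped to an expiry time) and the function name are hypothetical, and receiving the client report and setting the column colour on the server are left out.

```python
import re

SEVERITY = {"green": 0, "yellow": 1, "red": 2}

# Assumed line shape: "&red /srv/important/data (92% used) ..."
MARKER = re.compile(r"^&(green|yellow|red)\s+(\S+)")


def overall_color(status_text, disabled, now):
    """Worst colour among the lines whose item is not currently disabled.

    disabled maps an item (the first word after the &color marker) to the
    time at which its disable expires; now is given in the same units.
    """
    worst = "green"
    for line in status_text.splitlines():
        match = MARKER.match(line)
        if not match:
            continue
        color, item = match.groups()
        if disabled.get(item, 0) > now:
            continue  # still disabled, so this line does not count
        if SEVERITY[color] > SEVERITY[worst]:
            worst = color
    return worst
```

Since every disable carries an expiry time, a disabled line starts counting again on its own once that time has passed, which matches the requirement that no alert stays hidden indefinitely.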

This is probably a pretty involved job, especially to generalise it to
the point where it can be useful for any column, and for more than one
reason, but it definitely could add a lot of value. procs, disk, and
ports are just a few columns that are very overloaded (i.e., one column
that relates to many services/meanings). It is not helpful to have 100
columns per host, but it is helpful to be able to control
enable/disable, alerts, and similar based on these columns.

Just my thoughts on this, perhaps it will give someone inclined to write
some code some ideas... At the end of the day, I'd suggest the only way
to get this feature added (unless Henrik wants it himself) is to write
the code, and submit it. That way it is easy to add the code (as long as
it doesn't change things in an incompatible way).

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au

list Marco Avvisano · Wed, 20 Feb 2013 11:27:02 +0100 ·

Hi,
Is it possible to use report.sh for a specific group of hosts and a
specific test?
I would like to use it similarly to the old bb script sla-report.sh.
Marco
