I have a Linux server that is alerting on high load, but the load
average is lower than my threshold. My question: why is it going red?
Analysis.cfg
HOST=serverA
LOAD 89.0 90.0
Here are the top results. I expect the alert to be evaluated against
the load average of 21.75, which is far lower than the thresholds.
top - 07:58:55 up 17 days, 19:31, 20 users, load average: 21.75, 25.16, 25.32
...
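
To make the expectation concrete, here is a minimal sketch of the comparison I assume is being made. It is not the monitoring tool's actual code: the helper names `parse_load_averages` and `load_color` are hypothetical, and reading the two LOAD values as a warning (yellow) level and a panic (red) level is an assumption.

```python
import re


def parse_load_averages(top_header):
    """Extract the 1-, 5- and 15-minute load averages from a top/uptime header."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", top_header
    )
    if match is None:
        raise ValueError("no load average found in header")
    return tuple(float(value) for value in match.groups())


def load_color(load, warn, panic):
    """Assumed semantics: red at or above panic, yellow at or above warn."""
    if load >= panic:
        return "red"
    if load >= warn:
        return "yellow"
    return "green"


header = (
    "top - 07:58:55 up 17 days, 19:31, 20 users, "
    "load average: 21.75, 25.16, 25.32"
)

# Check all three averages, since which one is compared is not known here.
for load in parse_load_averages(header):
    print(load, load_color(load, warn=89.0, panic=90.0))
```

Under these assumptions, all three averages fall well below 89.0, so none of them would produce a yellow or red status on its own.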
However, there is a single process using a very large amount of CPU on
this multi-core server. Is this factoring into the alert? If so, why is it
not documented?
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
121978 user1+ 20 0 46.839g 0.040t 50388 S 2774 17.2 18504:34 python
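
A note on reading the %CPU column, as a small sketch: assuming top is running in its default mode, where 100 means one fully busy core, a value above 100 is the sum across all of the process's threads. The helper name `cores_equivalent` is hypothetical and only illustrates the arithmetic.

```python
def cores_equivalent(cpu_percent):
    """Convert a top %CPU value (assumed: 100 == one fully busy core) to cores."""
    return cpu_percent / 100.0


# The python process above reports 2774 in the %CPU column.
print(cores_equivalent(2774))
```

Under that assumption, the process is consuming roughly 27.7 cores' worth of CPU time.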
Thanks,
John
John Rothlisberger