xymon.com and the mailing lists back online
list Henrik Størner
Hi,

the xymon.com server had a minor disk "hiccup" last Saturday. Unfortunately, this triggered a kernel panic and things went pretty bad after that - eventually the whole server died last Monday, Aug. 20th. To make matters worse, I was 4000 km away, and even though I did manage to open an SSH session to the server, every attempt to reboot it remotely just gave me a "Bus error".

My apologies for the inconvenience; it was a classic case of Murphy's Law: everything goes wrong at the worst possible time.

I expect the mails submitted to the mailing list to show up over the next 24 hours or so, once the various mail servers retry their connections to xymon.com.

Regards,
Henrik
list Josh Luthman
Thanks for the notice! Josh Luthman Office: XXX-XXX-XXXX Direct: XXX-XXX-XXXX XXXX Wayne St Suite XXXX Troy, OH XXXXX
On Mon, Aug 27, 2012 at 11:19 AM, Henrik Størner <user-ce4a2c883f75@xymon.invalid> wrote:
list Henrik Størner
On 27-08-2012 17:19, Henrik Størner wrote:
the xymon.com server had a minor disk "hiccup" last Saturday. Unfortunately, this triggered a kernel panic and things went pretty bad after that - eventually causing the whole server to die last Monday Aug. 20th.
Turns out it was more than just a hiccup - I was bitten by a firmware bug in my Crucial M4 SSD:

http://forum.crucial.com/t5/Solid-State-Drives-SSD/Firmware-Update-Notifications/m-p/80282#M24370

"an incorrect response to a SMART counter will cause the m4 drive to become unresponsive after 5184 hours of Power-on time. The drive will recover after a power cycle, however, this failure will repeat once per hour after reaching this point."

If any of you have Crucial M4 SSDs in use, I'd recommend checking the firmware version ASAP - it must be version 0309 or 000F. Running "smartctl -a" on Linux will tell you.

Regards,
Henrik