SMART details monitor of 6 disks
list Théo Varier
Hello fellow list members,

I am looking for good reporting of smartctl indicators. I have 6 disks with RAID1 and RAID5 partitions (md1... mdc). mdstat seems to be correctly monitored; I need to monitor the indicators of the disks /dev/sda /dev/sdb ... /dev/sdf.

smartctl reports a lot of details, which look like this:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       96
  3 Spin_Up_Time            0x0007   208   208   024    Pre-fail  Always       -       310 (Average 324)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       509
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       32
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       32
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 13/37)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        32         -
# 2  Short offline       Completed without error       00%        30         -
# 3  Short offline       Completed without error       00%        30         -
# 4  Short offline       Completed without error       00%        29         -
# 5  Short offline       Completed without error       00%        27         -
# 6  Short offline       Completed without error       00%        27         -
# 7  Short offline       Completed without error       00%         1         -

and others ....

Does anyone know a good script to do this? (I tried xymon-SMART.sh, but it doesn't work with a multi-disk structure: /dev/sda /dev/sdb /dev/sdc.)

Best regards,
Théo VARIER
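As a rough illustration of what a check on these indicators has to do, here is a minimal Python sketch that parses the attribute table shown above and picks out attributes that look unhealthy. It is not taken from xymon-SMART.sh or any existing monitor; the function names `parse_attributes` and `failing` are hypothetical, and the rule used (WHEN_FAILED is not "-", or the normalized VALUE has dropped to a non-zero THRESH) is an assumption about what should raise an alert.

```python
import re

# One attribute row: ID, name, flag, VALUE, WORST, THRESH, type, updated,
# WHEN_FAILED, then the raw value (which may contain spaces).
ATTR_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]+)\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>.*)$"
)


def parse_attributes(text):
    """Return one dict per attribute row found in smartctl attribute output."""
    attrs = []
    for line in text.splitlines():
        match = ATTR_RE.match(line)
        if not match:
            continue  # header, error log and self-test lines are skipped
        row = match.groupdict()
        for key in ("id", "value", "worst", "thresh"):
            row[key] = int(row[key])
        row["raw"] = row["raw"].strip()
        attrs.append(row)
    return attrs


def failing(attrs):
    """Assumed alert rule: WHEN_FAILED is set, or VALUE <= a non-zero THRESH."""
    return [
        a for a in attrs
        if a["when_failed"] != "-"
        or (a["thresh"] > 0 and a["value"] <= a["thresh"])
    ]


# A few rows copied from the output quoted in the message above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 13/37)
SMART Error Log Version: 1
# 1  Short offline       Completed without error       00%        32         -
"""
```

Calling `failing(parse_attributes(SAMPLE))` on these rows gives an empty list, since every VALUE is above its THRESH. Running the same two functions once per device is one way to cover several disks.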
list Damien Martins
Hi Théo,

With all due modesty, I wrote something that could match your expectations: https://wiki.xymonton.org/doku.php/monitors:hardware_sensors

Regards,
Damien Martins
On 10/10/2018 at 15:43, Théo VARIER wrote:
> Hello fellow list members, I am looking for good reporting of smartctl indicators. [...]
list Jeremy Laidman
Théo,

Damien's hardware_sensors monitor looks really cool, and I encourage you to see if it meets your requirements. I intend to try it out myself.

Otherwise, perhaps we can see where xymon-SMART.sh is failing, if you can provide artefacts or symptom descriptions to illustrate the problem you've encountered.

Cheers
Jeremy
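For comparison while debugging the multi-disk case, the sketch below shows one possible shape for a check that covers several devices and reports them in a single status message. It is not how xymon-SMART.sh or the hardware_sensors monitor works. The names `read_smart` and `build_status` and the column name `smart` are hypothetical, and the message layout (a `status HOST.COLUMN COLOR text` first line with dots in the hostname written as commas, then `&green` / `&red` markers per line) is my understanding of the Xymon status format and should be checked against the Xymon documentation.

```python
import subprocess


def read_smart(device):
    """Return the text printed by `smartctl -H -A DEVICE` (usually needs root).

    smartctl uses its exit status as a bit mask, so a non-zero status is
    not treated as an error here.
    """
    result = subprocess.run(
        ["smartctl", "-H", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def build_status(hostname, column, problems_by_disk):
    """Build one status message from {device: [problem descriptions]}.

    The overall color is red as soon as any disk has a problem listed.
    """
    color = "red" if any(problems_by_disk.values()) else "green"
    host = hostname.replace(".", ",")  # assumed Xymon hostname convention
    lines = [
        f"status {host}.{column} {color} SMART check of "
        f"{len(problems_by_disk)} disk(s)",
        "",
    ]
    for device in sorted(problems_by_disk):
        problems = problems_by_disk[device]
        if problems:
            lines.append(f"&red {device}: " + ", ".join(problems))
        else:
            lines.append(f"&green {device}: OK")
    return "\n".join(lines)
```

A call such as `build_status("myhost", "smart", {"/dev/sda": [], "/dev/sdb": [], "/dev/sdc": []})` yields a green message with one line per disk. How the per-disk problem lists are filled in (for example from the output of `read_smart`) is left open here.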
On Thu, 11 Oct 2018 at 05:58, Damien Martins <user-c12727b399f0@xymon.invalid> wrote:
> Hi Théo, With all due modesty, I wrote something that could match your expectations: https://wiki.xymonton.org/doku.php/monitors:hardware_sensors
> [...]