Xymon Mailing List Archive

nvme temperature check broken in Debian bookworm

3 messages in this thread

Christoph Zechner · Tue, 9 Apr 2024 12:04:40 +0200
Hi,

The temperature check shipped with Xymon in Debian bookworm is broken in a rather strange way. The check is located at /usr/lib/xymon/client/ext/temp and fails for all NVMe disks that expose several temperature sensors.

For example, the first NVMe in Lynx holds its temperature values in /sys/block/nvme0n1/device/hwmon0/:
files   name        value   min         max         crit
temp1_* Composite   27.85   -273.15     86.85       87.85
temp2_* Sensor 1    27.85   -273.15     65261.85    n/a
temp3_* Sensor 2    31.85   -273.15     65261.85    n/a
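
A table like the one above can be gathered with a short script. The following is only a minimal sketch and not part of the 'temp' check. It assumes the usual hwmon sysfs layout (temp<N>_label, temp<N>_input, temp<N>_min, temp<N>_max, temp<N>_crit, with values in millidegrees Celsius); the function name read_hwmon_temps is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not taken from the 'temp' check: collect all
# temperature sensors found in one hwmon directory.
sub read_hwmon_temps {
    my ($dir) = @_;
    my @sensors;
    for my $input (sort glob("$dir/temp*_input")) {
        my ($n) = $input =~ /temp(\d+)_input$/ or next;
        my %s = (files => "temp${n}_*");
        for my $item (qw(label input min max crit)) {
            my $file = "$dir/temp${n}_$item";
            next unless -r $file;
            open(my $fh, '<', $file) or next;
            my $line = <$fh>;
            close($fh);
            next unless defined $line;
            chomp $line;
            # label is text, everything else is millidegrees Celsius
            $s{$item} = $item eq 'label' ? $line : $line / 1000;
        }
        push @sensors, \%s;
    }
    return @sensors;
}

# Directory from the example above; pass another one as first argument.
my $dir = shift // '/sys/block/nvme0n1/device/hwmon0';
for my $s (read_hwmon_temps($dir)) {
    printf "%-8s %-11s %-8s %-11s %-11s %s\n",
        $s->{files}, $s->{label} // 'n/a',
        map { $s->{$_} // 'n/a' } qw(input min max crit);
}
```

Files that do not exist for a sensor (such as temp2_crit here) are simply left out of the hash and printed as n/a.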

The inner logic of the temperature check calculates the values for red and yellow as follows:

1) if there is a crit and a max value, use crit for red and max for yellow
2) if there is a max and a mid value, use max for red and mid for yellow
3) if there is a max and a min value, use max for red and min for yellow
4) if there is only a max file, use it for both
5) if there is only a crit file, use it for both

The sensor 'Composite' uses max and crit, as both are available. 'Sensor 1' and 'Sensor 2', however, only provide max and min. Therefore these values are used, which leads to 'yellow' warnings: the min value is not an upper boundary, as the check assumes, but a lower boundary. With yellow set to -273.15, any real reading exceeds it.
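
The effect can be reproduced with the numbers from the table. This is only a sketch: the comparison in status_for is an assumption about how the check evaluates a reading (red at or above $red, yellow at or above $yellow), not code copied from the script.

```perl
use strict;
use warnings;

# Assumed evaluation, not copied from the 'temp' check.
sub status_for {
    my ($value, $yellow, $red) = @_;
    return 'red'    if $value >= $red;
    return 'yellow' if $value >= $yellow;
    return 'green';
}

# 'Sensor 1' has no crit file, so rule 3 applies: red = max, yellow = min.
my ($value, $min, $max) = (27.85, -273.15, 65261.85);
print status_for($value, $min, $max), "\n";   # min used as yellow: prints "yellow"
print status_for($value, $max, $max), "\n";   # max used for both:  prints "green"
```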

The Linux kernel documentation (https://docs.kernel.org/hwmon/sysfs-interface.html) also states that every file with 'min' in its name is a low threshold:

     The common scheme for files naming is: <type><number>_<item>. Usual types for sensor chips are "in" (voltage), "temp" (temperature) and "fan" (fan). Usual items are "input" (measured value), "max" (high threshold), "min" (low threshold).

The proposed fix is either to use the max value for both yellow and red, or at least to sanity-check whether min is below zero and, in that case only, use the max value for both:

In /usr/lib/xymon/client/ext/temp beginning on line 182:

         my ($red, $yellow);
         if (-r $crit_file and -r $max_file) {
             $red     = read_one_chomped_line_from_file($crit_file);
             $yellow  = read_one_chomped_line_from_file($max_file);
         } elsif (-r $max_file and -r $mid_file) {
             $red     = read_one_chomped_line_from_file($max_file);
             $yellow  = read_one_chomped_line_from_file($mid_file);
         } elsif (-r $max_file and -r $min_file) {
             $red     = read_one_chomped_line_from_file($max_file);
             # min_file holds the lower temperature boundary, *not* a
             # warning value. Proposed fix: fall back to red (the max
             # value) for yellow when min is not above 0.
             $yellow  = read_one_chomped_line_from_file($min_file);
             $yellow = $yellow > 0 ? $yellow : $red;
             # Alternative 1: do not use min at all:
             #$red = $yellow = read_one_chomped_line_from_file($max_file);
             # Alternative 2: do not use min at all: remove this 'elsif'
         } elsif (-r $max_file) {
             $red = $yellow = read_one_chomped_line_from_file($max_file);
         } elsif (-r $crit_file) {
             $red = $yellow = read_one_chomped_line_from_file($crit_file);
         }

There are three ways to solve this:

* sanitize min by checking whether the value is below 0 and, in that case, use the max value
* always use the max value for both yellow and red
* completely remove the 'elsif' that reads min and max, as the next 'elsif' just reads and uses max
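
To compare these options side by side, the threshold selection can be written as a function over values that have already been read. This is a sketch under assumptions: pick_thresholds and its $mode argument are hypothetical, and the limits are passed in a hash instead of being read from $crit_file, $max_file, $mid_file and $min_file as in the script.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the selection logic, for illustration only.
# %$limit holds whichever of crit/max/mid/min exist for a sensor.
# $mode is 'sanitize' (first option) or 'max_only' (second and third
# option, which yield the same thresholds).
sub pick_thresholds {
    my ($limit, $mode) = @_;
    my ($red, $yellow);
    if (defined $limit->{crit} and defined $limit->{max}) {
        ($red, $yellow) = ($limit->{crit}, $limit->{max});
    } elsif (defined $limit->{max} and defined $limit->{mid}) {
        ($red, $yellow) = ($limit->{max}, $limit->{mid});
    } elsif (defined $limit->{max}) {
        $red = $yellow = $limit->{max};
        # The first option keeps min as yellow when it is above zero.
        if ($mode eq 'sanitize' and defined $limit->{min} and $limit->{min} > 0) {
            $yellow = $limit->{min};
        }
    } elsif (defined $limit->{crit}) {
        $red = $yellow = $limit->{crit};
    }
    return ($red, $yellow);
}

# 'Sensor 1' from the table: only min and max are available.
my %sensor1 = (min => -273.15, max => 65261.85);
for my $mode (qw(sanitize max_only)) {
    my ($red, $yellow) = pick_thresholds(\%sensor1, $mode);
    print "$mode: red=$red yellow=$yellow\n";
}
```

For 'Sensor 1' both modes give red = yellow = 65261.85. They differ only for a sensor whose min is positive: the first option would still use that min as the yellow threshold.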

Thanks in advance!

Best regards
Christoph
Jeremy Laidman · Tue, 23 Apr 2024 09:35:58 +1000
Christoph

I'm fairly sure that a "temp" script is not part of the standard Xymon
client, and it doesn't appear to be part of the Debian/Bookworm package
either. Generally, scripts in "ext" are add-ons to a package by the local
installer/administrator. In summary, I don't know where that script came
from, and it's possible nobody else on this list knows.

Cheers
Jeremy

Damien Martins · Tue, 23 Apr 2024 09:24:42 +0200
Hi,

My 2 cents:
The 'temp' script is part of the hobbit-plugins package in Debian/Ubuntu/Mint.

I don't remember when it was introduced, because I've created my own set
of hardware monitoring scripts (with support limited to my computers).
Feel free to test them if relevant:
https://github.com/doktoil-makresh/xymon-plugins/tree/master/xymon-hardware
