SPLITNCV with wildcard because of multiple graphs from one test
list Oliver R.
Dear all,

I have a custom test that reports data from NVMe disks, taken from this command:

    nvme smart-log /dev/nvme2n1 -o json

The output is processed so that the client reports the following to the server:

    S3ESNX0J951626N-critical_warning : 0
    S3ESNX0J951626N-temperature : 318
    S3ESNX0J951626N-avail_spare : 100
    S3ESNX0J951626N-spare_thresh : 10
    S3ESNX0J951626N-percent_used : 92
    S3ESNX0J951626N-data_units_read : 27832088
    S3ESNX0J951626N-data_units_written : 93877408
    S3ESNX0J951626N-host_read_commands : 180442558
    S3ESNX0J951626N-host_write_commands : 916278700
    S3ESNX0J951626N-controller_busy_time : 4028
    S3ESNX0J951626N-power_cycles : 218
    S3ESNX0J951626N-power_on_hours : 2995
    S3ESNX0J951626N-unsafe_shutdowns : 86
    S3ESNX0J951626N-media_errors : 0
    S3ESNX0J951626N-num_err_log_entries : 0
    S3ESNX0J951626N-warning_temp_time : 0
    S3ESNX0J951626N-critical_comp_time : 0
    S3ESNX0J951626N-temperature_sensor_1 : 318
    S3ESNX0J951626N-temperature_sensor_2 : 323
    S3ESNX0J951626N-thm_temp1_trans_count : 0
    S3ESNX0J951626N-thm_temp2_trans_count : 0
    S3ESNX0J951626N-thm_temp1_total_time : 0
    S3ESNX0J951626N-thm_temp2_total_time : 0
    2J4520102682-critical_warning : 0
    2J4520102682-temperature : 314
    2J4520102682-avail_spare : 100
    2J4520102682-spare_thresh : 32
    2J4520102682-percent_used : 2
    2J4520102682-data_units_read : 9450966
    2J4520102682-data_units_written : 24105094
    2J4520102682-host_read_commands : 137338588
    2J4520102682-host_write_commands : 284702582
    2J4520102682-controller_busy_time : 0
    2J4520102682-power_cycles : 46
    2J4520102682-power_on_hours : 1107
    2J4520102682-unsafe_shutdowns : 10
    2J4520102682-media_errors : 0
    2J4520102682-num_err_log_entries : 0
    2J4520102682-warning_temp_time : 0
    2J4520102682-critical_comp_time : 0
    2J4520102682-thm_temp1_trans_count : 0
    2J4520102682-thm_temp2_trans_count : 0
    2J4520102682-thm_temp1_total_time : 0
    2J4520102682-thm_temp2_total_time : 0
    S3ESNX0J951635M-critical_warning : 0
    S3ESNX0J951635M-temperature : 310
    S3ESNX0J951635M-avail_spare : 100
    S3ESNX0J951635M-spare_thresh : 10
    S3ESNX0J951635M-percent_used : 92
    S3ESNX0J951635M-data_units_read : 32693378
    S3ESNX0J951635M-data_units_written : 95742837
    S3ESNX0J951635M-host_read_commands : 213266959
    S3ESNX0J951635M-host_write_commands : 918085461
    S3ESNX0J951635M-controller_busy_time : 4280
    S3ESNX0J951635M-power_cycles : 218
    S3ESNX0J951635M-power_on_hours : 3072
    S3ESNX0J951635M-unsafe_shutdowns : 86
    S3ESNX0J951635M-media_errors : 0
    S3ESNX0J951635M-num_err_log_entries : 1
    S3ESNX0J951635M-warning_temp_time : 0
    S3ESNX0J951635M-critical_comp_time : 0
    S3ESNX0J951635M-temperature_sensor_1 : 310
    S3ESNX0J951635M-temperature_sensor_2 : 320
    S3ESNX0J951635M-thm_temp1_trans_count : 0
    S3ESNX0J951635M-thm_temp2_trans_count : 0
    S3ESNX0J951635M-thm_temp1_total_time : 0
    S3ESNX0J951635M-thm_temp2_total_time : 0

As you can see, there are three disks that report the same metrics with different values.

I started with a xymonserver.d/nvme.cfg that looks like this:

    TEST2RRD="$TEST2RRD,nvme=ncv"
    SPLITNCV_nvme="*:GAUGE"
    GRAPHS_nvme="nvmecriticalwarning,nvmetemperature,nvmeavailspare,nvmesparethresh,nvmepercentused,nvmedataunitsread,nvmedataunitswritten,nvmehostreadcommands,nvmehostwritecommands,nvmecontrollerbusytime,nvmepowercycles,nvmepoweronhours,nvmeunsafeshutdowns,nvmemediaerrors,nvmenumerrlogentries,nvmewarningtemptime,nvmecriticalcomptime,nvmetemperaturesensor1,nvmetemperaturesensor2,nvmethmtemp1transcount,nvmethmtemp2transcount,nvmethmtemp1totaltime,nvmethmtemp2totaltime"

With this, all RRD files are created correctly:

    $ ls -1 /var/lib/xymon/rrd/wsrbreb/nvme*temperature.rrd
    /var/lib/xymon/rrd/wsrbreb/nvme,2J4520102682_temperature.rrd
    /var/lib/xymon/rrd/wsrbreb/nvme,S3ESNX0J951626N_temperature.rrd
    /var/lib/xymon/rrd/wsrbreb/nvme,S3ESNX0J951635M_temperature.rrd

Unfortunately, all datasets are now stored with the data type "GAUGE", while most of them need to be "DERIVE", and I cannot figure out how to do this. Here is what I've tried:

    SPLITNCV_nvme="temperture:GAUGE"
    SPLITNCV_nvme="*temperture:GAUGE"
    SPLITNCV_nvme=".*temperture:GAUGE"
    SPLITNCV_nvme="%.*temperture:GAUGE"

One thing that does work is the following:

    SPLITNCV_nvme="S3ESNX0J951626N_temperature:GAUGE"

But as you can see, it requires the serial number of the SSD, which defeats the purpose of a dynamic setup. How can I use wildcards in SPLITNCV settings?

Thank you for your help!

Regards
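For readers who want to reproduce the client-side report format shown in this message, the following is a minimal Python sketch. It assumes the JSON from `nvme smart-log ... -o json` has already been parsed into a dict and that the serial number is obtained separately; the function name `format_ncv_lines` is hypothetical and is not part of Xymon or nvme-cli.

```python
def format_ncv_lines(serial, smart_log):
    """Turn one parsed smart-log dict into 'serial-metric : value' lines."""
    lines = []
    for metric, value in smart_log.items():
        # Only plain numbers can be graphed; skip booleans, strings, nested data.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        lines.append(f"{serial}-{metric} : {value}")
    return lines


# A few of the values reported for the first disk in this thread.
sample = {"critical_warning": 0, "temperature": 318, "avail_spare": 100}
report = format_ncv_lines("S3ESNX0J951626N", sample)
print("\n".join(report))
```

Prefixing every metric with the serial number is what makes each disk end up in its own RRD file, and it is also the reason the SPLITNCV keys contain a part that changes from disk to disk.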
list Oliver R.
I've tested even more combinations, trying to escape the asterisk, with no success:

    SPLITNCV_nvme="temperature:GAUGE"
    SPLITNCV_nvme="*temperature:GAUGE"
    SPLITNCV_nvme="*temperature*:GAUGE"
    SPLITNCV_nvme="\*temperature:GAUGE"
    SPLITNCV_nvme="@RRDIDX@_temperature:GAUGE"
    SPLITNCV_nvme="p@RRDIDX@_temperature:GAUGE"
    SPLITNCV_nvme=".*temperature:GAUGE"
    SPLITNCV_nvme="%.*temperature:GAUGE"
    SPLITNCV_nvme="(.*)temperature:GAUGE"
    SPLITNCV_nvme="\.\*temperature:GAUGE"
    SPLITNCV_nvme="\\*temperature:GAUGE"
    SPLITNCV_nvme="\\\*temperature:GAUGE"
    SPLITNCV_nvme="\\\\*temperature:GAUGE"
    SPLITNCV_nvme="\\\\\*temperature:GAUGE"
    SPLITNCV_nvme="\\\\\\*temperature:GAUGE"
    SPLITNCV_nvme="\\\\\\\\*temperature:GAUGE"
    SPLITNCV_nvme="\\.\\*temperature:GAUGE"

On the Xymon side, I think this setting is processed in "do_ncv.c", but I still could not find where the string gets parsed.

Any help is appreciated.

Regards
Oliver
list Damien Martins
In reply to Oliver R.'s message of 2020-10-01 11:33:
Hello Oliver,

On my side, I'm using the following entry:

    SPLITNCV_postfix="*:GAUGE"

This config creates the following files:

    postfix,Corrupt_Mails.rrd
    postfix,Mails_active.rrd
    postfix,Mails_in_deferred_State.rrd
    postfix,Incoming_Mails.rrd
    postfix,Mails_bouncing.rrd

In general, SPLITNCV creates files based on the test name (first parameter) and the item name (second parameter), with the following name structure:

    $test$,$item$.rrd

You can then work dynamically from these values.
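The naming structure described above can be sketched in a few lines of Python. This only illustrates the `$test$,$item$.rrd` pattern as stated in this message; both helper names are hypothetical, and any character substitution Xymon may apply to item names is not modelled here.

```python
def rrd_filename(test, item):
    """Build a file name following the <test>,<item>.rrd structure."""
    return f"{test},{item}.rrd"


def split_rrd_filename(filename):
    """Split '<test>,<item>.rrd' back into (test, item)."""
    stem = filename[:-len(".rrd")] if filename.endswith(".rrd") else filename
    # Only the first comma separates the test name from the item name.
    test, _, item = stem.partition(",")
    return test, item


print(rrd_filename("postfix", "Mails_active"))
print(split_rrd_filename("nvme,2J4520102682_temperature.rrd"))
```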
list Oliver R.
In reply to the message of 01.10.20, 12:33, from user-c12727b399f0@xymon.invalid:
Thank you for the reply!

The core problem is that the item name has a dynamic part at the beginning (the serial number of the NVMe disk) and a static part at the end, such as temperature or data_units_read. As mentioned earlier, my files look like this:

    nvme,2J4520102682_avail_spare.rrd
    nvme,2J4520102682_temperature.rrd
    nvme,2J4520102682_percent_used.rrd
    nvme,S3ESNX0J951626N_avail_spare.rrd
    nvme,S3ESNX0J951626N_temperature.rrd
    nvme,S3ESNX0J951626N_percent_used.rrd
    nvme,S3ESNX0J951635M_temperature.rrd
    nvme,S3ESNX0J951635M_avail_spare.rrd
    nvme,S3ESNX0J951635M_percent_used.rrd

Your suggestion of using "*:GAUGE" makes all metrics GAUGE, but that does not work because metrics like "data_units_read" have values that only ever increase.

So the question is how to use wildcards in SPLITNCV. The asterisk alone has the special meaning of "set everything that does not match to GAUGE/NONE/...", but how can I use a regex or some sort of dynamic naming in the SPLITNCV_ variable?

Regards
Oliver
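To make the GAUGE versus DERIVE distinction concrete: a GAUGE data source stores the reported value as it is, while a DERIVE data source stores the rate of change between two updates. The Python sketch below only mimics that arithmetic and is not RRDtool itself, which additionally normalizes samples to its step interval. The second reading and the 300-second interval are made-up values for illustration.

```python
def derive_rate(prev_value, curr_value, seconds):
    """Rate of change per second between two counter readings."""
    return (curr_value - prev_value) / seconds


# data_units_written of the first disk, plus a hypothetical later reading.
prev = 93877408
curr = 93877408 + 600
rate = derive_rate(prev, curr, 300)
print(rate)  # 2.0
```

Graphed as GAUGE, such a counter shows an ever-rising line. Graphed as DERIVE, it shows how much was written per second, which is usually the more useful view for counters like data_units_read or host_write_commands.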
list Damien Martins
In reply to Oliver R.'s message of 29/09/2020, 16:24:
OK Oliver, I understand your point now: you want to use a wildcard in the dataset name. So we have to check whether this is possible...

I could not find anything about wildcards in the dataset name definition in this documentation: https://xymon.sourceforge.io/xymon/help/howtograph.html

I'm not a C guy, so I'll go by the documentation only: if it is not mentioned, it does not exist.

You could handle the serial numbers of your NVMe disks manually (an awful answer, I know). Or better: contribute to the code (an answer from someone lazy who won't).
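If the serial numbers have to be handled manually, the explicit entries could at least be generated rather than typed. The Python sketch below builds one `serial_metric:DERIVE` entry per disk and metric, followed by a `*:GAUGE` default. It rests on several assumptions: that entries may be combined in one comma-separated SPLITNCV value (worth checking against the Xymon documentation), that the key uses an underscore as in the working `S3ESNX0J951626N_temperature:GAUGE` example, and that the metrics listed in `DERIVE_METRICS` are the ones that should be DERIVE. The function name `build_splitncv` is hypothetical.

```python
# Serial numbers as they appear in this thread.
SERIALS = ["S3ESNX0J951626N", "2J4520102682", "S3ESNX0J951635M"]

# Assumption: these ever-increasing counters are the ones that need DERIVE.
DERIVE_METRICS = [
    "data_units_read",
    "data_units_written",
    "host_read_commands",
    "host_write_commands",
]


def build_splitncv(serials, derive_metrics, default_type="GAUGE"):
    """Build a SPLITNCV_nvme line with one explicit entry per disk and metric."""
    entries = [f"{s}_{m}:DERIVE" for s in serials for m in derive_metrics]
    # The bare asterisk covers everything not listed explicitly.
    entries.append(f"*:{default_type}")
    return 'SPLITNCV_nvme="' + ",".join(entries) + '"'


print(build_splitncv(SERIALS, DERIVE_METRICS))
```

The generated line still has to be regenerated whenever a disk is added or replaced, so this reduces the typing but does not make the setup dynamic.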
list Oliver R.
In reply to Damien Martins's message of 01.10.20, 15:58:
If it were written in shell code, I could probably contribute, but I don't know anything about C... The magic lies in xymond/rrd/do_ncv.c (I think).

I ended up setting "GAUGE", which creates all RRDs with this type, and then deleted all RRDs that need to be "DERIVE". Then I changed the config and restarted server and client, and now everything is as it should be. It's not ideal, but OK as a workaround.

It's really scary to see that my desktop sends about 12 write commands per second to the SSDs. This results in 2% wear per week. OK, it's only a 256 GB SSD and it hosts two operating systems 24/7, but nevertheless it's not a good thing. I moved Firefox's profile entirely to a RAM disk, because SQLite files like history and cookies grind down the SSDs... Maybe running Btrfs in RAID 1 was the wrong decision? I don't really know.

Regards
Oliver
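The "delete the RRDs that need to be DERIVE" step of this workaround can be prepared with a short Python sketch that only lists the candidate files and deletes nothing. It assumes the `nvme,<serial>_<metric>.rrd` naming shown earlier in the thread; the function name `rrds_to_recreate` and the choice of metrics are hypothetical.

```python
from pathlib import Path


def rrds_to_recreate(rrd_dir, derive_metrics, test="nvme"):
    """List RRD files of one test whose item name ends in a given metric."""
    matches = []
    for path in sorted(Path(rrd_dir).glob(f"{test},*.rrd")):
        # File names look like '<test>,<serial>_<metric>.rrd'.
        item = path.stem.split(",", 1)[1]
        if any(item.endswith("_" + metric) for metric in derive_metrics):
            matches.append(path)
    return matches


if __name__ == "__main__":
    # Directory taken from the listing earlier in the thread.
    for rrd in rrds_to_recreate("/var/lib/xymon/rrd/wsrbreb", ["data_units_read"]):
        print(rrd)  # review the list before removing anything by hand
```

Deleting an RRD file discards its history, so printing the list first and removing files manually is the safer order of operations.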