Xymon Mailing List Archive search

Integer underflow in FILE mtime/ctime/atime check

3 messages in this thread

list Axel Beckert · Thu, 13 Aug 2020 05:09:53 +0200 ·
Hi,

at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently
in Debian Stable) there is an issue with an integer underflow in the
FILE mtime/ctime/atime check which basically seems to happen when
finding the file to check takes a moment (seconds) and the file in
question has been modified after the search for the file has been
started.

In such cases this integer underflow happens and causes a falso
positive due to instead of the time difference being negative, it's
insanely huge:

  [red] /nf/2020/08/12/nfcapd.202008122015
  File was modified 4294967289 seconds ago - should be <360
  File was modified 4294967289 seconds ago - should be <960

So far for me this only happened, when

* client-local.cfg contains `...` commands to find the file, in our
  example it is

    file:`ls -1 $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1`

  to find the newest file under /nf/20??/ by first looking for the
  most recent directory and then finding the most recent file (both by
  just sorting them alphabetically) as listing all files would take
  much too long, hence the nested `... $() ...`.

  The corresponding entries in analysis.cfg are

    FILE    %/nf/[0-9]*/[0-9]*/[0-9]*/nfcapd.[0-9]* MTIME<360 SIZE>9000000 yellow
    FILE    %/nf/[0-9]*/[0-9]*/[0-9]*/nfcapd.[0-9]* MTIME<960 SIZE>1500 red

* On a machine with slow I/O, in our case on a not that strong VM used
  for testing.

I tried to reproduce this on a Raspberry Pi with a much simpler setup
and Xymon 4.3.30, but failed so far and don't know, why:

* client-local.cfg entry:

    file:`sleep 90; echo /run/xymon-integer-underflow-bug`

* analysis.cfg entries:

    FILE /run/xymon-integer-underflow-bug red mtime<61
    FILE /run/xymon-integer-underflow-bug yellow mtime>0

* I then setup a cron job, touching the file every minute. Hence the
  61 in the checks above.

Unfortunately this didn't suffice to trigger this issue. It is unclear
to me why as the only explanation I have is the additional time needed
to find the appropriate file to test. And the relevant code doesn't
seem to have changed between 4.3.28 and 4.3.30...

So I cheated a bit and setup the cron job to always set the file stamp
to one hour in the future using "touch -d 'next hour' . Then I at
least could trigger that integer underflow:

  red /run/xymon-integer-underflow-bug
  File was modified 4294963714 seconds ago - should be <61

I think there are several ways to fix this:

* Move capturing the "now" timestamp just before checking the files
  timestamps, not before finding that file. Haven't noticed where the
  actual stat call is, though.

* Use an signed instead of an unsigned integer to properly handle
  negative time differences. Tried to implement this as a patch as it
  seemed easy enough.

  So far I managed to get this far:

    [red] /run/xymon-integer-underflow-bug
    File was modified -3592 seconds ago - should be <61

  So I guess that patch still needs more finetuning. Will send it once
  I got it fully working as I expect. But if there's interest in the
  incomplete patch already, I'm happy to send it anyways.
  (Respectively it is already online at
  https://salsa.debian.org/debian/xymon/-/blob/files-integer-undeflow/debian/patches/93_fix_integer_underflow_in_files_check.patch)

		Kind regards, Axel
-- 
PGP: 2FF9CD59612616B5      /~\  Plain Text Ribbon Campaign, http://arc.pasp.de/
Mail: user-bc188e45dae4@xymon.invalid  \ /  Say No to HTML in E-Mail and Usenet
Mail+Jabber: user-0064bde8d49d@xymon.invalid  X
https://axel.beckert.ch/   / \  I love long mails: https://email.is-not-s.ms/
list Jeremy Laidman · Thu, 13 Aug 2020 13:59:50 +1000 ·
quoted from Axel Beckert
On Thu, 13 Aug 2020 at 13:18, Axel Beckert <user-bc188e45dae4@xymon.invalid> wrote:
Hi,

at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently
in Debian Stable) there is an issue with an integer underflow in the
FILE mtime/ctime/atime check which basically seems to happen when
finding the file to check takes a moment (seconds) and the file in
question has been modified after the search for the file has been
started.
Is it possible that the time on the host being monitored is in the future,
at least compared to the time of the Xymon server?

Can you provide the [clock] and [file:/nfs/...] sections of your client
data message for an instance when the underflow has occurred?

It's hard to see how a delay in the collection of file timestamps could
cause this underflow, as the creation of the [clock] section (the source of
"now") is executed after the creation of the [file:] section (the source of
the MTIME value). Essentially it's:

1. create tempfile
2. echo "client" header >> tempfile
3. run OS script >> tempfile
4. run "logfetch" to generate [file:] sections >> tempfile
5. echo client version section >> tempfile
6. run "logfetch --clock" to gelerate [clock] section >> tempfile
7. send tempfile to Xymon server

What you're describing is a situation where the time generated at step 4 is
ahead of the time generated at step 6. I think this can only happen if the
timestamp of the file is in the future.

However, If the [clock] section of the client message does not exist, the
the Xymon server will use its own time for its calculations.
quoted from Axel Beckert

In such cases this integer underflow happens and causes a falso
positive due to instead of the time difference being negative, it's
insanely huge:
No matter the cause, Xymon should take into account the possibility that
the timestamp is in the future, and at least show "was modified N seconds
into the future" or something similar, after handling or avoiding the
underflow.

Perhaps create the following entry in clientlocal.cfg:

file:`exec >/tmp/clock-test 2>&1; date; $XYMONHOME/bin/logfetch --clock ;
ls -l $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1;
$XYMONHOME/bin/logfetch --clock; date`

The output file /tmp/clock-test might give you some idea of what's going
on. Remember that it can take up to 10 minutes for updates to
clientlocal.cfg to take effect on the client.

Cheers
Jeremy
list Axel Beckert · Thu, 13 Aug 2020 12:15:00 +0200 ·
Hi Jeremy,

thanks a lot for your outside view. Maybe I was too focussed on the
integer underrun to not see potential other issues...
quoted from Jeremy Laidman

On Thu, Aug 13, 2020 at 01:59:50PM +1000, Jeremy Laidman wrote:
On Thu, 13 Aug 2020 at 13:18, Axel Beckert <user-bc188e45dae4@xymon.invalid> wrote:
at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently
in Debian Stable) there is an issue with an integer underflow in the
FILE mtime/ctime/atime check which basically seems to happen when
finding the file to check takes a moment (seconds) and the file in
question has been modified after the search for the file has been
started.
Is it possible that the time on the host being monitored is in the future,
at least compared to the time of the Xymon server?
Hmmmm, you've got a point there. I expected all our servers to have
NTP, but this server in question was setup manually by a coworker and
indeed has the wrong time ? but it's like 15 seconds behind the Xymon
server (which definitely has NTP and I also compared it with two other
machines ? then again the box in question has an ntp installed, but it
wasn't running. I should probably monitor this. ;-)

Client with timestamp issues: Thu Aug 13 11:53:03 CEST 2020
Xymon Server:	      	      Thu 13 Aug 2020 09:53:18 AM UTC

I assume the different time zones (CEST being UTC+02:00) are no issue
as we have quite some clients on CEST.

Here's again the same difference just in Unix Epoch:

Client: 1597312461
Server: 1597312477

Always about 15 to 16 seconds difference, but the client is lagging behind.
quoted from Jeremy Laidman
Can you provide the [clock] and [file:/nfs/...] sections of your client
data message for an instance when the underflow has occurred?
Yep:

[file:/nf/2020/08/12/nfcapd.202008122015]
type:100000 (file)
mode:644 (-rw-r--r--)
linkcount:1
owner:56137 (nfsen)
group:48 (apache)
size:232673177
clock:1597256407 (2020/08/12-20:20:07)
atime:1597256114 (2020/08/12-20:15:14)
ctime:1597256414 (2020/08/12-20:20:14)
mtime:1597256414 (2020/08/12-20:20:14)

[?]

[clock]
epoch: 1597256407.516640
local: 2020-08-12 20:20:07 CEST
UTC: 2020-08-12 18:20:07 GMT

(BTW, is it normal that the [clientversion] section is empty?)
quoted from Jeremy Laidman
It's hard to see how a delay in the collection of file timestamps could
cause this underflow, as the creation of the [clock] section (the source of
"now") is executed after the creation of the [file:] section (the source of
the MTIME value).
Good point. Solely from the facts above I'd expected that it is
evaluated (but not printed) first.

Then again, due to that empty [clientversion] section, I checked the
actually installed xymon-client version and it's a horribly old client
version (4.3.10). (It seems as if I really should get rid of manually
installed servers or align those boxes even more with Ansible. :-)

So forget this if this is solely caused by the ancient client. (And
sorry for the noise because I just gave the server version and didn't
even think to check the client version.)
quoted from Jeremy Laidman
However, If the [clock] section of the client message does not exist, the
the Xymon server will use its own time for its calculations.
It does exist.
quoted from Jeremy Laidman
In such cases this integer underflow happens and causes a falso
positive due to instead of the time difference being negative, it's
insanely huge:
No matter the cause, Xymon should take into account the possibility that
the timestamp is in the future, and at least show "was modified N seconds
into the future" or something similar, after handling or avoiding the
underflow.
Ack. Sometimes files copied from other machines with time in the
future could be copied onto (or mounted on) a system with proper time.
Maybe there are even applications where future timestamps make sense,
i.e. if they're being misused for storing some date or so, who knows.
quoted from Jeremy Laidman
Perhaps create the following entry in clientlocal.cfg:

file:`exec >/tmp/clock-test 2>&1; date; $XYMONHOME/bin/logfetch --clock ;
ls -l $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1;
$XYMONHOME/bin/logfetch --clock; date`

The output file /tmp/clock-test might give you some idea of what's going
on. Remember that it can take up to 10 minutes for updates to
clientlocal.cfg to take effect on the client.
Will do in case the other facts above do not suffice already.
quoted from Axel Beckert

		Kind regards, Axel
-- 
PGP: 2FF9CD59612616B5      /~\  Plain Text Ribbon Campaign, http://arc.pasp.de/
Mail: user-bc188e45dae4@xymon.invalid  \ /  Say No to HTML in E-Mail and Usenet
Mail+Jabber: user-0064bde8d49d@xymon.invalid  X
https://axel.beckert.ch/   / \  I love long mails: https://email.is-not-s.ms/