Integer underflow in FILE mtime/ctime/atime check
list Axel Beckert
Hi,
at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently
in Debian Stable) there is an issue with an integer underflow in the
FILE mtime/ctime/atime check which basically seems to happen when
finding the file to check takes a moment (seconds) and the file in
question has been modified after the search for the file has been
started.
In such cases this integer underflow happens and causes a falso
positive due to instead of the time difference being negative, it's
insanely huge:
[red] /nf/2020/08/12/nfcapd.202008122015
File was modified 4294967289 seconds ago - should be <360
File was modified 4294967289 seconds ago - should be <960
So far for me this only happened, when
* client-local.cfg contains `...` commands to find the file, in our
example it is
file:`ls -1 $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1`
to find the newest file under /nf/20??/ by first looking for the
most recent directory and then finding the most recent file (both by
just sorting them alphabetically) as listing all files would take
much too long, hence the nested `... $() ...`.
The corresponding entries in analysis.cfg are
FILE %/nf/[0-9]*/[0-9]*/[0-9]*/nfcapd.[0-9]* MTIME<360 SIZE>9000000 yellow
FILE %/nf/[0-9]*/[0-9]*/[0-9]*/nfcapd.[0-9]* MTIME<960 SIZE>1500 red
* On a machine with slow I/O, in our case on a not that strong VM used
for testing.
I tried to reproduce this on a Raspberry Pi with a much simpler setup
and Xymon 4.3.30, but failed so far and don't know, why:
* client-local.cfg entry:
file:`sleep 90; echo /run/xymon-integer-underflow-bug`
* analysis.cfg entries:
FILE /run/xymon-integer-underflow-bug red mtime<61
FILE /run/xymon-integer-underflow-bug yellow mtime>0
* I then setup a cron job, touching the file every minute. Hence the
61 in the checks above.
Unfortunately this didn't suffice to trigger this issue. It is unclear
to me why as the only explanation I have is the additional time needed
to find the appropriate file to test. And the relevant code doesn't
seem to have changed between 4.3.28 and 4.3.30...
So I cheated a bit and setup the cron job to always set the file stamp
to one hour in the future using "touch -d 'next hour' . Then I at
least could trigger that integer underflow:
red /run/xymon-integer-underflow-bug
File was modified 4294963714 seconds ago - should be <61
I think there are several ways to fix this:
* Move capturing the "now" timestamp just before checking the files
timestamps, not before finding that file. Haven't noticed where the
actual stat call is, though.
* Use an signed instead of an unsigned integer to properly handle
negative time differences. Tried to implement this as a patch as it
seemed easy enough.
So far I managed to get this far:
[red] /run/xymon-integer-underflow-bug
File was modified -3592 seconds ago - should be <61
So I guess that patch still needs more finetuning. Will send it once
I got it fully working as I expect. But if there's interest in the
incomplete patch already, I'm happy to send it anyways.
(Respectively it is already online at
https://salsa.debian.org/debian/xymon/-/blob/files-integer-undeflow/debian/patches/93_fix_integer_underflow_in_files_check.patch)
Kind regards, Axel
--
PGP: 2FF9CD59612616B5 /~\ Plain Text Ribbon Campaign, http://arc.pasp.de/
Mail: user-bc188e45dae4@xymon.invalid \ / Say No to HTML in E-Mail and Usenet
Mail+Jabber: user-0064bde8d49d@xymon.invalid X
https://axel.beckert.ch/ / \ I love long mails: https://email.is-not-s.ms/
list Jeremy Laidman
▸
On Thu, 13 Aug 2020 at 13:18, Axel Beckert <user-bc188e45dae4@xymon.invalid> wrote:
Hi, at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently in Debian Stable) there is an issue with an integer underflow in the FILE mtime/ctime/atime check which basically seems to happen when finding the file to check takes a moment (seconds) and the file in question has been modified after the search for the file has been started.
Is it possible that the time on the host being monitored is in the future, at least compared to the time of the Xymon server? Can you provide the [clock] and [file:/nfs/...] sections of your client data message for an instance when the underflow has occurred? It's hard to see how a delay in the collection of file timestamps could cause this underflow, as the creation of the [clock] section (the source of "now") is executed after the creation of the [file:] section (the source of the MTIME value). Essentially it's: 1. create tempfile 2. echo "client" header >> tempfile 3. run OS script >> tempfile 4. run "logfetch" to generate [file:] sections >> tempfile 5. echo client version section >> tempfile 6. run "logfetch --clock" to gelerate [clock] section >> tempfile 7. send tempfile to Xymon server What you're describing is a situation where the time generated at step 4 is ahead of the time generated at step 6. I think this can only happen if the timestamp of the file is in the future. However, If the [clock] section of the client message does not exist, the the Xymon server will use its own time for its calculations.
▸
In such cases this integer underflow happens and causes a falsopositive due to instead of the time difference being negative, it's insanely huge:
No matter the cause, Xymon should take into account the possibility that the timestamp is in the future, and at least show "was modified N seconds into the future" or something similar, after handling or avoiding the underflow. Perhaps create the following entry in clientlocal.cfg: file:`exec >/tmp/clock-test 2>&1; date; $XYMONHOME/bin/logfetch --clock ; ls -l $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1; $XYMONHOME/bin/logfetch --clock; date` The output file /tmp/clock-test might give you some idea of what's going on. Remember that it can take up to 10 minutes for updates to clientlocal.cfg to take effect on the client. Cheers Jeremy
list Axel Beckert
Hi Jeremy, thanks a lot for your outside view. Maybe I was too focussed on the integer underrun to not see potential other issues...
▸
On Thu, Aug 13, 2020 at 01:59:50PM +1000, Jeremy Laidman wrote:On Thu, 13 Aug 2020 at 13:18, Axel Beckert <user-bc188e45dae4@xymon.invalid> wrote:at least in Xymon 4.2.28 (plus recent CVE patches, i.e. as currently in Debian Stable) there is an issue with an integer underflow in the FILE mtime/ctime/atime check which basically seems to happen when finding the file to check takes a moment (seconds) and the file in question has been modified after the search for the file has been started.Is it possible that the time on the host being monitored is in the future, at least compared to the time of the Xymon server?
Hmmmm, you've got a point there. I expected all our servers to have NTP, but this server in question was setup manually by a coworker and indeed has the wrong time ? but it's like 15 seconds behind the Xymon server (which definitely has NTP and I also compared it with two other machines ? then again the box in question has an ntp installed, but it wasn't running. I should probably monitor this. ;-) Client with timestamp issues: Thu Aug 13 11:53:03 CEST 2020 Xymon Server: Thu 13 Aug 2020 09:53:18 AM UTC I assume the different time zones (CEST being UTC+02:00) are no issue as we have quite some clients on CEST. Here's again the same difference just in Unix Epoch: Client: 1597312461 Server: 1597312477 Always about 15 to 16 seconds difference, but the client is lagging behind.
▸
Can you provide the [clock] and [file:/nfs/...] sections of your client data message for an instance when the underflow has occurred?
Yep: [file:/nf/2020/08/12/nfcapd.202008122015] type:100000 (file) mode:644 (-rw-r--r--) linkcount:1 owner:56137 (nfsen) group:48 (apache) size:232673177 clock:1597256407 (2020/08/12-20:20:07) atime:1597256114 (2020/08/12-20:15:14) ctime:1597256414 (2020/08/12-20:20:14) mtime:1597256414 (2020/08/12-20:20:14) [?] [clock] epoch: 1597256407.516640 local: 2020-08-12 20:20:07 CEST UTC: 2020-08-12 18:20:07 GMT (BTW, is it normal that the [clientversion] section is empty?)
▸
It's hard to see how a delay in the collection of file timestamps could cause this underflow, as the creation of the [clock] section (the source of "now") is executed after the creation of the [file:] section (the source of the MTIME value).
Good point. Solely from the facts above I'd expected that it is evaluated (but not printed) first. Then again, due to that empty [clientversion] section, I checked the actually installed xymon-client version and it's a horribly old client version (4.3.10). (It seems as if I really should get rid of manually installed servers or align those boxes even more with Ansible. :-) So forget this if this is solely caused by the ancient client. (And sorry for the noise because I just gave the server version and didn't even think to check the client version.)
▸
However, If the [clock] section of the client message does not exist, the the Xymon server will use its own time for its calculations.
It does exist.
▸
In such cases this integer underflow happens and causes a falso positive due to instead of the time difference being negative, it's insanely huge:No matter the cause, Xymon should take into account the possibility that the timestamp is in the future, and at least show "was modified N seconds into the future" or something similar, after handling or avoiding the underflow.
Ack. Sometimes files copied from other machines with time in the future could be copied onto (or mounted on) a system with proper time. Maybe there are even applications where future timestamps make sense, i.e. if they're being misused for storing some date or so, who knows.
▸
Perhaps create the following entry in clientlocal.cfg: file:`exec >/tmp/clock-test 2>&1; date; $XYMONHOME/bin/logfetch --clock ; ls -l $(ls -1d /nf/2???/??/??/ | tail -1)* | tail -1; $XYMONHOME/bin/logfetch --clock; date` The output file /tmp/clock-test might give you some idea of what's going on. Remember that it can take up to 10 minutes for updates to clientlocal.cfg to take effect on the client.
Will do in case the other facts above do not suffice already.
▸
Kind regards, Axel -- PGP: 2FF9CD59612616B5 /~\ Plain Text Ribbon Campaign, http://arc.pasp.de/ Mail: user-bc188e45dae4@xymon.invalid \ / Say No to HTML in E-Mail and Usenet Mail+Jabber: user-0064bde8d49d@xymon.invalid X https://axel.beckert.ch/ / \ I love long mails: https://email.is-not-s.ms/