Xymon Mailing List Archive

Regex escaping in 'cont=' test

5 messages in this thread

John Thurston · Wed, 4 Oct 2017 12:55:06 -0800
I'm fighting with the correct escaping and encoding for http content checks using the "cont=" tag:
cont[=COLUMN];URL;[expected_data_regexp|#digesttype:digest]
  This tag is used to specify a http/https check, where it is also
  checked that specific content is present in the server response.
. . .
  The regex is pre-processed for backslash "\" escape sequences.  . .
I can't find the expression to match the string:
    <a href="foo/bar">
(Which I hope your email client isn't going to try to render as html!)

The closest I can manage is:
   a\x20href=\x22foo/bar\x22>

Here \x20 is an ASCII space and \x22 is a double quote.
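
As a rough illustration of what the escape pre-processing is expected to produce, here is a minimal Python sketch. It is not Xymon's parser: the helper name decode_hex_escapes and the sample page are made up for this example, and Python's re module stands in for the regex library Xymon uses.

```python
import re

def decode_hex_escapes(text: str) -> str:
    # Replace every backslash-x-NN sequence with the character it encodes.
    # Hypothetical helper for illustration only, not Xymon code.
    return re.sub(
        r"\\x([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

# The expression from the message above, as typed in hosts.cfg.
pattern = decode_hex_escapes(r"a\x20href=\x22foo/bar\x22>")
print(pattern)

# A made-up page body containing the target string.
page = '<html><body><a href="foo/bar"></body></html>'
print(re.search(pattern, page) is not None)

# A single decoding pass should turn \x3C into a plain less-than sign.
print(decode_hex_escapes(r"\x3C") == "<")
```

If the pre-processing behaved like this single pass, a leading \x3C would simply become "<" and the full string would match.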

If I put a leading \x3D (an equals sign), it renders in the search string and, as expected, doesn't match my supplied content. If, however, I put a leading \x3C (the less-than sign), the rest of the expression is eaten and is not rendered. I've tried preceding the \x3C with \x5C (a backslash), with no effect.

I also tried leading with \x5C\x78\x33\x43 (which spells out \x3C). That renders as such, but does not match my string.

The upshot is, I can match enough of my string to be unique on my page. But it seems like something isn't right in the regex escaping and cleansing for this test. The supplied string should be accepted as a string, but the "<" seems to be interpreted during the parsing instead.

Can anyone else find a way to use a "<" in the regex of the cont= test?


-- 
    Do things because you should, not just because you can.

John Thurston    XXX-XXX-XXXX
user-ce4d79d99bab@xymon.invalid
Department of Administration
State of Alaska

Ralph Mitchell · Fri, 6 Oct 2017 13:21:58 -0400
Did you try quoting the entire "cont;URL;[expected_data]" string?

I just tried this:

192.168.1.4   xxxx.yyyy.com        #  "cont;http://xxxx.yyyy.com/test.html;<a href=\x22foo/bar\x22>"

and the page source for the "info" column shows:

<tr><th align=left>Content checks:</th><td align=left>
<a href="http://xxxx.yyyy.com/test.html">http://xxxx.yyyy.com/test.html</a>&nbsp;
must return '<a href="foo/bar">'<br>
</td></tr>

so you can see it picked up the whole '<a href="foo/bar">' string.  The
test.html file on the server contains nothing but the opening and closing
html/body tags, and the match string.  If I change "foo" to "fod" in
test.html, the match fails, and if I change the leading "<" to a comma, the
match also fails:

      http://xxxx.yyyy.com/test.html - Testing URL yields:

      ,a href="foo/bar">

Ralph Mitchell


John Thurston · Fri, 6 Oct 2017 09:45:15 -0800
With eager fingers, I went to adjust my hosts.cfg, and ... no difference.
I've tried both single and double quotes. Single quotes caused nothing
to be parsed from the cont= tag. Double quotes made no difference in
behavior.

Maybe this is another oddity stemming from my ancient OS and underlying 
libraries. (Solaris 10)

Do you see the behavior I described if you _don't_ wrap in quotes?


Ralph Mitchell · Fri, 6 Oct 2017 22:37:15 -0400
Took the quotes out, had to change the space to \x20 to get it to show in
the info page.

So, server/etc/hosts.cfg is now essentially the same as yours, but with the <:
192.168.1.4   xxxx.yyyy.com        # cont;http://xxxx.yyyy.com/test.html;<a\x20href=\x22foo/bar\x22>

source of the info page looks like this:
<tr><th align=left>Content checks:</th><td align=left>
<a href="http://xxxx.yyyy.com/test.html">http://xxxx.yyyy.com/test.html</a>&nbsp;
must return '<a href="foo/bar">'<br>
</td></tr>

test.html contains:
     <a href="foo/bar">

content test shows green.  In test.html, changed foo to fop, test went
red.  Changed < to dot, test went red.
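
The three checks above can be mimicked offline with a short Python sketch. This only mirrors the matching logic under assumptions: content_ok is a hypothetical helper, the page body is a reconstruction of test.html, and Python's re module stands in for the regex engine Xymon uses.

```python
import re

# The decoded pattern, as shown by "must return" on the info page.
PATTERN = '<a href="foo/bar">'

def content_ok(body: str, pattern: str = PATTERN) -> bool:
    # True corresponds to a green content test, False to a red one.
    return re.search(pattern, body) is not None

# Assumed contents of test.html: html/body tags plus the match string.
original = '<html><body>\n     <a href="foo/bar">\n</body></html>'

print(content_ok(original))                        # unchanged page
print(content_ok(original.replace("foo", "fop")))  # foo changed to fop
print(content_ok(original.replace("<a", ".a")))    # < changed to a dot
```

Only the unchanged page should match, which is the green/red/red pattern described above.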

Maybe the parser is broken in your version and fixed in mine, or vice
versa??  I'm running  Xymon 4.3.28-rc2 on CentOS 7.

Ralph Mitchell


John Thurston · Mon, 9 Oct 2017 09:17:55 -0800
On 10/6/2017 6:37 PM, Ralph Mitchell wrote:
    Took the quotes out, had to change the space to \x20 to get it to show
    in the info page.

[ confirms the regex works on Linux ]

    Maybe the parser is broken in your version and fixed in mine, or vice
    versa??  I'm running  Xymon 4.3.28-rc2 on CentOS 7.

I'm running 4.3.26 on Solaris 10. I doubt anything has changed between .26 and .28 in this segment of the code. In the past, we've seen problems on Solaris related to parsing and blank space. I suspect this is just another of them.

Thank you for the confirmation that the problem does not extend to the more common Linux installations. This is a defect, but one that affects a small user group.

I'll make do with the understanding that my regex can't include 'less than'.
