URLPlus interest - looking for feedback

Ralph Mitchell
Fri, 31 Jul 2009 12:57:47 -0500
Message-Id: <user-9e160158c71e@xymon.invalid>

On Fri, Jul 31, 2009 at 11:34 AM, Gary Baluha <user-ae3e15c22de1@xymon.invalid> wrote:

On Fri, Jul 31, 2009 at 4:57 AM, Ralph Mitchell <user-00a5e44c48c0@xymon.invalid> wrote:

I could really have used something like your feature request about 6 years
ago.  Instead I spent a lot of time handcrafting bash scripts to login to
web pages.

Yep, that's kind of how URLPlus got started in the first place ;-)

Don't get me started on the sites that hit you with 5 different types of
redirects before reaching the front page, or the sites where each input
field is held in its own personal form, and the submit button executes
javascript to copy the values into a form full of hidden fields for the
actual submission.

The redirect issue actually isn't too difficult to work around.  I have
been working on a Perl program that is capable of more in-depth session
management than URLPlus currently is, and the solution I'm using now seems
to work pretty well.  My goal is to eventually convert URLPlus from its
command-line curl approach to this new one, which deals with multi-page
redirects better.

It's not so much the multi-page redirects using the standard "302: page is
now elsewhere" format, as the other weird ways redirects are sometimes done.
The site that irritated me the most did all of these, in no particular
order:

   1) meta-refresh with zero time delay and a new url

   2) self-submitting form - i.e. a preloaded form with "form.submit();" at
the end of the html, between script tags

   3) self-submitting form - another preloaded form, but with
"onLoad=form.submit();" in the html BODY tag

   4) in script tags, change the page location via:   top.location="newurl"

   5) as above, but use "top.href", or "page.href" or something similar.

I'm not knocking your efforts - you've already done more than I ever did
towards a generic webpage check.  I just think that the above are going to
be tricky to handle in an automated way without replicating a large fraction
of a web browser.  But, now at least they're documented in the mailing list
for anyone interested in doing their own web checks...  :)
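
For anyone who wants a starting point, here is a rough Python sketch of how
the five redirect styles listed above might be spotted in a downloaded page.
It is only an illustration, not how URLPlus works: the names RedirectSniffer
and find_redirects are made up for this example, it only recognises literal
string URLs, and it does not execute any javascript, so pages that build the
target on the fly will slip past it.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Script assignments like top.location="newurl", window.location.href='newurl'
# or top.href="newurl". Only literal quoted targets are recognised.
JS_LOCATION = re.compile(
    r"""\b(?:location(?:\.href)?|href)\s*=\s*["']([^"']+)["']""")
# Any call that looks like form.submit() or document.forms[0].submit()
JS_SUBMIT = re.compile(r"\.submit\s*\(")
# The url part of a meta-refresh content value such as "0; url=/next"
META_URL = re.compile(r"url\s*=\s*['\"]?([^'\";]+)", re.I)


class RedirectSniffer(HTMLParser):
    """Hypothetical helper: collects hints of non-302 redirects in a page."""

    def __init__(self):
        super().__init__()
        self.found = []             # (kind, raw target) pairs, in page order
        self.form_action = None     # action of the last <form> seen
        self.onload_submit = False  # <body onload="...submit()">
        self.script_submit = False  # ...submit() inside <script> tags
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser already lowercases attribute names
        if tag == "meta" and (a.get("http-equiv") or "").lower() == "refresh":
            m = META_URL.search(a.get("content") or "")
            if m:
                self.found.append(("meta-refresh", m.group(1).strip()))
        elif tag == "form":
            self.form_action = a.get("action") or ""
        elif tag == "body" and JS_SUBMIT.search(a.get("onload") or ""):
            self.onload_submit = True
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script:
            return
        for m in JS_LOCATION.finditer(data):
            self.found.append(("js-location", m.group(1)))
        if JS_SUBMIT.search(data):
            self.script_submit = True


def find_redirects(html, base_url):
    """Return (kind, absolute url) pairs for each redirect hint found."""
    p = RedirectSniffer()
    p.feed(html)
    p.close()
    results = [(kind, urljoin(base_url, url)) for kind, url in p.found]
    # Self-submitting forms "redirect" to the form's action url.
    if p.form_action is not None:
        target = urljoin(base_url, p.form_action)
        if p.onload_submit:
            results.append(("onload-submit", target))
        if p.script_submit:
            results.append(("script-submit", target))
    return results
```

Calling find_redirects(page_text, url_it_came_from) gives back a list of
(kind, url) pairs that a check script could then follow one at a time.  A
self-submitting form would still need its preloaded fields collected and
POSTed, which this sketch leaves out.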

As for the javascript part, that is a bit more difficult.

Especially when the page you just downloaded creates the form POST url
on-the-fly from some of the form elements filled in by the user.  Yep, saw
that happen too...  Another weird page ran a javascript function to generate
a random character string to include in the url - luckily the function
wasn't too hard to extract and shove through the spidermonkey javascript
interpreter...  :)

Ralph Mitchell