Jochen Schmidt <jsc@dataheaven.de> wrote:
+---------------
| Raymond Wiker wrote:
| > I wrote a small (37 lines) function yesterday for parsing
| > URIs. When I compared it with NET.URI, I noticed that I didn't handle
| > fragments (internal anchors in html files). On the other hand, it
| > *does* handle usernames and passwords.
|
| Thanks - my new parsing function for NET.URI handles the
| scheme, authority, path, query and fragment parts. It is only 22 lines
| long. I didn't think it would be such a good idea to bring in more
| specialized fields, as after the scheme each URI is free to define its
| own syntax.
+---------------
But at least for the "Common Internet Scheme Syntax" [RFC 1738 Section 3.1],
that is, anything that starts with "//" after the scheme (including the
"http:", "ftp:", & "telnet:" schemes), you definitely should parse *all*
of the elements (if present):
        //<user>:<password>@<host>:<port>/<url-path>
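
To make the shape of that concrete, here is a minimal sketch (in Python, purely for illustration; it is not NET.URI's code, nor anyone's posted parser) of splitting the scheme-specific part into all five elements. The names `Authority` and `parse_common_scheme_part` are made up for this example, and it assumes the plain RFC 1738 form, so bracketed IPv6 hosts and percent-decoding are not handled:

```python
from typing import NamedTuple, Optional


class Authority(NamedTuple):
    """Hypothetical container for the RFC 1738 Section 3.1 elements."""
    user: Optional[str]
    password: Optional[str]
    host: str
    port: Optional[int]
    url_path: str


def parse_common_scheme_part(scheme_part: str) -> Authority:
    """Split '//<user>:<password>@<host>:<port>/<url-path>' into its parts."""
    if not scheme_part.startswith("//"):
        raise ValueError("not a Common Internet Scheme Syntax part")
    rest = scheme_part[2:]

    # Everything before the first "/" is the login part; the "/" itself
    # is a separator and not part of the url-path.
    login, _, url_path = rest.partition("/")

    user = password = None
    if "@" in login:
        # The host is whatever follows the LAST "@"; anything before it
        # is user info, no matter how much it resembles a host name.
        userinfo, _, hostport = login.rpartition("@")
        user, colon, pw = userinfo.partition(":")
        password = pw if colon else None
    else:
        hostport = login

    host, colon, port_text = hostport.partition(":")
    port = int(port_text) if colon and port_text else None
    return Authority(user, password, host, port, url_path)
```

The one design point worth noting is that the "@" has to be found before the host is split off; a parser that stops at the first ":" or "." will report the wrong host.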
There have been published exploits recently that involved deceiving users
by formatting a "user" component that *looked* like a domain name but wasn't,
because of a later "@". See the following RISKS Digest articles for an
especially sneaky example:
"Making something look hacked when it isn't"
<URL:http://catless.ncl.ac.uk/Risks/21.16.html#subj5.1>
"The risk of a seldom-used URL syntax"
<URL:http://catless.ncl.ac.uk/Risks/21.16.html#subj6.1>
<URL:http://catless.ncl.ac.uk/Risks/21.18.html#subj15.1>
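
For a rough idea of how a parser that keeps the "user" element can flag this kind of trick, here is a small sketch using Python's standard `urllib.parse.urlsplit`. The URL in the usage note is invented for illustration (it is not taken from the RISKS articles), and `real_host` and `looks_deceptive` are hypothetical helper names with a deliberately crude heuristic:

```python
from urllib.parse import urlsplit


def real_host(url: str) -> str:
    """Return the host the URL actually points at (the part after any '@')."""
    return urlsplit(url).hostname or ""


def looks_deceptive(url: str) -> bool:
    """Crude heuristic (an assumption, not a standard check): the user
    component contains a dot, so it could be mistaken for a domain name."""
    user = urlsplit(url).username
    return user is not None and "." in user
```

With these, `real_host("http://www.example.com@evil.example.net/index.html")` gives `evil.example.net`, and `looks_deceptive` returns true for the same URL, whereas an ordinary `ftp://anonymous:guest@ftp.example.com/` is not flagged.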
-Rob
-----
Rob Warnock, 31-2-510 rpw3@sgi.com
SGI Network Engineering <URL:http://reality.sgi.com/rpw3/>
1600 Amphitheatre Pkwy. Phone: 650-933-1673
Mountain View, CA 94043 PP-ASEL-IA