Christopher Browne <cbbrowne@hex.net> wrote:
+---------------
| Will Hartung said:
| >Somewhere, perhaps PG's home page (not that I have the URL, mind you), is
| >his enumerations about "Web Success". In them he frowns on server generated
| >HTML pages, so this would fit in quite well philosophically.
|
| <http://www.paulgraham.com/mistakes.html>
| The thing that seems most relevant:
| Dynamically generated HTML is bad, because search engines ignore it.
+---------------
Also see Philip Greenspun's comments on this issue, and his solution:
<URL:http://www.arsdigita.com/books/panda/publicizing>
[...skip down 2/3 of the way...]
Hiding Your Content from Search Engines (By Mistake)
...
I built a question and answer forum...all the postings were
stored in a relational database. ... The URLs end up looking
like "http://photo.net/bboard/fetch-msg.tcl?msg_id=000037".
...
AltaVista comes along and says, "Look at that question mark.
Look at the strange .tcl extension. This looks like a CGI script
to me. I'm going to be nice and not follow this link even though
there is no robots.txt file to discourage me."
Then WebCrawler says the same thing.
Then Lycos.
I achieved oblivion.
Briefly, his solution was:
Write another AOLserver Tcl program that presents all the messages
from URLs that look like static files, e.g., "/fetch-msg-000037.html"
and point the search engines to a huge page of links like that.
The text of the Q&A forum postings will get indexed out of these
pseudo-static files and yet I can retain the user pages with their
*.tcl URLs.
...
(see my discussion of why the AOLserver *.tcl URLs are so good in
the chapters on Web programming; see
http://photo.net/wtr/thebook/bboard-for-search-engines.txt
for the source code).
[Greenspun uses Tcl where many of us would choose Lisp (or even Scheme).]
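For anyone who wants the gist without reading the Tcl, here is a rough Python sketch of the trick described above. It is not Greenspun's code. The function names and the regular expression are my own inventions, and only the two URL shapes come from the quoted text. It maps a static-looking path back to a message id and builds the page of links you would point the search engines at.

```python
import re

# Hypothetical pattern for the static-looking URLs, modelled on the
# "/fetch-msg-000037.html" example quoted above.
STATIC_RE = re.compile(r"^/fetch-msg-(\d+)\.html$")


def static_path(msg_id: str) -> str:
    """Return the pseudo-static path for a message id."""
    return f"/fetch-msg-{msg_id}.html"


def msg_id_from_path(path: str):
    """Recover the message id from a pseudo-static path, or None if the
    path does not look like one (e.g. the original *.tcl?msg_id= form)."""
    m = STATIC_RE.match(path)
    return m.group(1) if m else None


def link_index(msg_ids):
    """Build the body of the big page of links that crawlers are pointed at."""
    items = "\n".join(
        f'<li><a href="{static_path(i)}">{i}</a></li>' for i in msg_ids
    )
    return f"<ul>\n{items}\n</ul>"
```

A real server would take the id returned by msg_id_from_path, fetch that posting from the database, and return it as HTML, while the user-facing pages keep their *.tcl URLs.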
-Rob
-----
Rob Warnock, 31-2-510 rpw3@sgi.com
Network Engineering http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
1600 Amphitheatre Pkwy. PP-ASEL-IA
Mountain View, CA 94043