Subject: Re: Could CDR-coding be on the way back?
From: Erik Naggum <erik@naggum.net>
Date: 14 Dec 2000 23:23:33 +0000
Newsgroups: comp.lang.lisp,comp.arch
Message-ID: <3185825013728755@naggum.net>

* Jan Ingvoldstad <jani@ifi.uio.no>
| This is a bit too crude, unless it's your local server doing the
| fetching for you, not the client (user agent).  But that may be what
| you mean with "clever cache propagation".

  I consider today's USENET distribution a crude cache propagation
  mechanism and propose something that uses much less bandwidth and disk
  space while it maintains the principle of nearby caches.  A few years
  ago, for instance, the five largest USENET sites in Oslo, Norway, were
  all located within a square mile, duplicating each other with great
  precision and speed because they had different user bases and owners.
  The extant news software could not propagate news articles by any
  other means than flooding every server with everything, but if the
  news protocols were only slightly smarter, they could have waited
  until they needed the article body before they requested it, if they
  had the headers that they could show to users.  Users would not have
  noticed those delays, and if there were any millisecond delays, it
  would be only for the first reader that day/week/whatever.  Now, scale
  this up and consider large (network-topological) regions cooperating
  to avoid duplicating news articles and traffic needlessly.  How many
  USENET servers are there around the core interchange points these
  days?  Distributed, redundant load balancing is not exactly news to
  people who care about distributed systems, but we still have people
  who worry needlessly if they cannot have their local USENET feed.  If
  you get upset because you cannot read news because a remote server is
  down, you are a USENET addict and need treatment, not better servers.
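
  A minimal sketch of the idea above, in Python: headers are pushed to
  every server, but a body moves only when some reader first asks for
  it.  The class name NewsServer, its methods, and the example
  message-id are all made up for illustration; this is not how any
  existing news software works.

```python
class NewsServer:
    """Hypothetical server: headers arrive eagerly, bodies only on first read."""

    def __init__(self, upstream=None):
        self.upstream = upstream   # nearby cache to ask; None for the injecting site
        self.headers = {}          # message-id -> subject, enough to list a group
        self.bodies = {}           # message-id -> article text
        self.upstream_fetches = 0  # how often a reader had to wait for a fetch

    def inject(self, message_id, subject, body):
        """A local user posts: the injecting site owns header and body."""
        self.headers[message_id] = subject
        self.bodies[message_id] = body

    def receive_header(self, message_id, subject):
        """Header-only propagation: cheap, so it can still be flooded."""
        self.headers[message_id] = subject

    def read(self, message_id):
        """Return the body, pulling it from upstream the first time only."""
        if message_id not in self.bodies:
            if self.upstream is None:
                raise KeyError(message_id)
            self.bodies[message_id] = self.upstream.read(message_id)
            self.upstream_fetches += 1
        return self.bodies[message_id]


origin = NewsServer()
leaf = NewsServer(upstream=origin)
origin.inject("<1@example.invalid>", "CDR-coding", "article text")
leaf.receive_header("<1@example.invalid>", "CDR-coding")

leaf.read("<1@example.invalid>")   # first reader triggers the one fetch
leaf.read("<1@example.invalid>")   # later readers hit the local cache
print(leaf.upstream_fetches)       # 1
```

  Only the first reader pays for the transfer; everyone after that is
  served locally, which is the "first reader that day" delay.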

  The problem with the whole current design is, however, that you do not
  know who the originator (initial injector) and owner is and you cannot
  request the article except from another nearby cache, so if you have a
  message-id that some article is a response to, you are basically hosed
  if you have not been so lucky as to have it flood by you.  That does
  not mean there _is_ no originator and owner, so I consider it useful
  to think of the injecting user as _virtually_ reachable.

| However, the originating/injecting server must have the bandwidth or
| other capacity for dealing with external requests for articles or even
| storing articles for as long as necessary (that is, until some
| reasonable amount of time has passed).  If those requirements aren't
| met, the current model seems much better to me.

  The current in-flow rate is a problem.  I have never heard of any
  USENET site that has out-flow problems for articles originating at
  their site.  Keeping your "own" articles around for months should be
  the smallest issue compared to keeping everybody else's articles
  around for weeks and days.  You can increase your out-flow a hundred-
  fold and still be _far_ short of the current in-flow, and this makes
  it possible for leaf sites to have (relatively speaking) _very_ small
  caches while their preferred nearest cache (akin to the site they get
  their feed from today) holds on to articles as long as one of their
  many leaf sites requests it.  The great thing with a design like this
  is that you can upgrade from the leaf sites "inward" to the core
  servers, because as long as there are major core servers who still
  operate in the "old way", there will be much less pressure on the
  injecting site.  Given how slowly changes occur in the USENET core
  technology, the chances are very, very good that there will remain a
  number of huge servers who can and will act as proxies for the many
  injecting sites that will never see any significant network load.

| Considering that there are great variances in how long articles are
| stored from news provider to news provider, it seems likely that there
| is a significant amount of users who want to read older articles.

  Yes, that's the idea, to keep the original article available longer.

| It isn't unreasonable to assume there will be a good amount of
| arbitrary requests for older articles at the originating server, say
| up to a month after the article was posted.  Someone with a large
| user/poster base will have to upgrade their injecting servers.  :)

  Methinks you have lost all sense of proportion and need to back up and
  look at the numbers you give for the current situation and its growth
  and consider who the people are who contribute the volume of data that
  you refer to.  Yes, there are some large sites, perhaps responsible
  for as much as 1/1000th of the total volume on USENET each, but they
  _already_ have 1000 times the capacity to handle "injections" and if
  you ask them 100 times for every article they have published, they
  are still at 1/10th their old bandwidth, and they aren't going to fill
  the remaining 9/10th with requests from other servers, either.
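
  The arithmetic can be checked in a few lines of Python.  The figures
  are the ones used above (a site originating 1/1000th of the total
  volume, each article requested 100 times); the function name and the
  normalisation of a full feed to 1 are assumptions for illustration.

```python
from fractions import Fraction


def outflow_fraction(site_share, requests_per_article):
    """Outgoing traffic as a fraction of one full incoming feed.

    site_share: fraction of all USENET volume that originates at the site.
    requests_per_article: how many times each local article is requested.
    A full flooded feed is normalised to 1.
    """
    return site_share * requests_per_article


print(outflow_fraction(Fraction(1, 1000), 100))   # 1/10
```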

| Another issue is that if the injecting server is somewhere remote in
| Australia and your client is in Norway, response will be slow,
| reducing the usefulness of Usenet compared to the web.

  Really?  How come I can pick up the article from any number of very
  nearby caches today?  Hmmm.  Mystifying!  How _did_ they get there?

| Ketil Z Malde has a point when he talks about the responsiveness of
| today's Usenet; it's very important for the user that the articles
| requested appear "immediately".  (There hasn't been much research on
| Usenet, but I believe it's safe to apply relevant aspects of usability
| studies of the web.)

  Yeah, I think you guys have got it _exactly_ right.  Of course I'm out
  to destroy USENET and its usability when I suggest that we invent a
  better way to propagate articles.  Of course I'm trying my very best
  to screw with the minds of people who implement the software so _they_
  also think "Hey, this USENET thing really was a bad idea to begin
  with, so let's just do something utterly braindamaged that will kill
  it and hopefully everybody using it, too".  Get a _grip_, guys!

  If you don't understand cache propagation mechanisms, say so.  If you
  cannot even _imagine_ that somebody out there has tried to think
  about the way USENET propagates _while_ trying to keep it working for
  its users, I suggest you try to insult people openly instead of just
  assuming they have extra spicy garlic meatballs for brains.  Sheesh.

  Today's USENET propagation model does not try to keep track of where
  articles are read; that is the task of local news admins, most of whom
  take some pride in providing "everything" to their users.  If we did
  not ship the articles until they were read, only enough header stuff
  that people could see newsgroups listings and stuff, the traffic would
  change patterns to adapt to the readership instead of the global flood
  algorithm used today.  This would cause an increased ability to read
  "fringe" newsgroups (it's always a hassle to get a news admin to get
  another weird newsgroup hierarchy, because only one group is too much
  work and the whole hierarchy is too much data), with actual _user_
  input to the distribution of news articles.

| From what I understand, it isn't uncommon to deal with header-only
| feeds, and Diablo supports fetching Message-IDs from other servers by
| demand (automatic redirecting to the other news server).  The latter
| seemed to work well when I tested it when the news servers in the
| chain were topologically close.  I didn't test with servers on the
| other side of the globe, though.

  I'm aware of Diablo and strongly encourage further development along
  those lines, but as the network of servers picks up messages, they
  will naturally be cached many places along the way, not very much
  different from today's system.  Having to follow a chain of forwarding
  servers to get a particular article is therefore very unlikely, unless
  you read "fringe" newsgroups that nobody else in your vicinity reads.
  When you do that, you might also well tolerate longer access times.
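
  To make the point about chains concrete, here is a small Python
  sketch that counts how many hops a request travels.  The function
  fetch, the list-of-dicts representation of a chain, and the server
  names are hypothetical; Diablo's actual redirect mechanism is not
  modelled here.

```python
def fetch(chain, message_id):
    """Return (body, hops) for message_id.

    chain is a list of caches (plain dicts), nearest first, injecting
    site last.  Every cache between the reader and the hit keeps a copy,
    so the next request from the same neighbourhood stops earlier.
    """
    for hops, cache in enumerate(chain):
        if message_id in cache:
            body = cache[message_id]
            for nearer in chain[:hops]:
                nearer[message_id] = body
            return body, hops
    raise KeyError(message_id)


origin = {"<1@example.invalid>": "article text"}
regional, leaf_a, leaf_b = {}, {}, {}

print(fetch([leaf_a, regional, origin], "<1@example.invalid>")[1])   # 2
print(fetch([leaf_b, regional, origin], "<1@example.invalid>")[1])   # 1
print(fetch([leaf_a, regional, origin], "<1@example.invalid>")[1])   # 0
```

  The second leaf never reaches the injecting site, because the
  regional cache picked up a copy when the first leaf asked.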

#:Erik
-- 
  The United States of America, soon a Bush league world power.  Yeee-haw!