Subject: Re: Could CDR-coding be on the way back?
From: Erik Naggum <erik@naggum.net>
Date: 14 Dec 2000 23:23:33 +0000
Newsgroups: comp.lang.lisp,comp.arch
Message-ID: <3185825013728755@naggum.net>

* Jan Ingvoldstad <jani@ifi.uio.no>
| This is a bit too crude, unless it's your local server doing the
| fetching for you, not the client (user agent). But that may be what
| you mean with "clever cache propagation".

I consider today's USENET distribution a crude cache propagation
mechanism and propose something that uses much less bandwidth and disk
space while it maintains the principle of nearby caches.

A few years ago, for instance, the five largest USENET sites in Oslo,
Norway, were all located within a square mile, duplicating each other
with great precision and speed because they had different user bases and
owners. The extant news software could not propagate news articles by
any other means than flooding every server with everything, but if the
news protocols were only slightly smarter, they could have waited until
they needed the article body before they requested it, as long as they
had the headers to show to users. Users would not have noticed those
delays, and any milliseconds of delay would hit only the first reader
that day/week/whatever.

Now, scale this up and consider large (network-topological) regions
cooperating to avoid duplicating news articles and traffic needlessly.
How many USENET servers are there around the core interchange points
these days? Distributed, redundant load balancing is not exactly news to
people who care about distributed systems, but we still have people who
worry needlessly if they cannot have their local USENET feed. If you get
upset because you cannot read news when a remote server is down, you are
a USENET addict and need treatment, not better servers.

The problem with the whole current design is, however, that you do not
know who the originator (initial injector) and owner is, and you cannot
request the article except from another nearby cache, so if you have a
message-id that some article is a response to, you are basically hosed
if you have not been so lucky as to have it flood by you. That does not
mean there _is_ no originator and owner, so I consider it useful to
think of the injecting user as _virtually_ reachable.

| However, the originating/injecting server must have the bandwidth or
| other capacity for dealing with external requests for articles or even
| storing articles for as long as necessary (that is, until some
| reasonable amount of time has passed). If those requirements aren't
| met, the current model seems much better to me.

The current in-flow rate is a problem. I have never heard of any USENET
site that has out-flow problems for articles originating at their site.
Keeping your "own" articles around for months should be the smallest
issue compared to keeping everybody else's articles around for days and
weeks. You can increase your out-flow a hundred-fold and still be _far_
short of the current in-flow, and this makes it possible for leaf sites
to have (relatively speaking) _very_ small caches, while their preferred
nearest cache (akin to the site they get their feed from today) holds on
to articles as long as one of its many leaf sites requests them.
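A rough sketch of the header-first, body-on-demand model I have in mind,
in Python rather than anything resembling a real NNTP extension; the
class and method names are made up purely to illustrate the caching
pattern, not to propose a protocol:

    # Hypothetical sketch: headers flood as today, bodies are fetched
    # lazily from the nearest cache and ultimately from the injecting site.

    class NewsCache:
        """A news server that floods headers but fetches bodies on demand."""

        def __init__(self, name, upstream=None):
            self.name = name
            self.upstream = upstream   # nearest cache; None for the injecting site
            self.headers = {}          # message-id -> overview data (flooded as today)
            self.bodies = {}           # message-id -> article body (fetched on demand)

        def inject(self, message_id, header, body):
            # The injecting site keeps the full article around.
            self.headers[message_id] = header
            self.bodies[message_id] = body

        def flood_header(self, message_id, header):
            # Headers still propagate eagerly, so readers see listings at once.
            self.headers[message_id] = header

        def fetch_body(self, message_id):
            # Only the first reader behind this cache pays the round-trip;
            # everyone after that is served locally, as with a full feed.
            if message_id in self.bodies:
                return self.bodies[message_id]
            if self.upstream is None:
                raise KeyError(message_id)
            body = self.upstream.fetch_body(message_id)
            self.bodies[message_id] = body   # cached on the way back
            return body

    # Usage: an injecting site, a regional cache, and a leaf site.
    origin = NewsCache("injecting-site")
    region = NewsCache("regional-cache", upstream=origin)
    leaf   = NewsCache("leaf-site", upstream=region)

    origin.inject("<example-1@site>", "Subject: example", "article body")
    region.flood_header("<example-1@site>", "Subject: example")
    leaf.flood_header("<example-1@site>", "Subject: example")

    print(leaf.fetch_body("<example-1@site>"))   # travels leaf -> region -> origin
    print(leaf.fetch_body("<example-1@site>"))   # served from the leaf's own cache

The point is that only the first request behind any given cache travels
toward the injecting site; every later reader in that region is served
locally, just as with a full feed today.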
The great thing with a design like this is that you can upgrade from the
leaf sites "inward" to the core servers, because as long as there are
major core servers who still operate in the "old way", there will be
much less pressure on the injecting site. Given how slowly changes occur
in the USENET core technology, the chances are very, very good that
there will remain a number of huge servers who can and will act as
proxies for the many injecting sites that will never see any significant
network load.

| Considering that there are great variances in how long articles are
| stored from news provider to news provider, it seems likely that there
| is a significant amount of users who want to read older articles.

Yes, that's the idea: to keep the original article available longer.

| It isn't unreasonable to assume there will be a good amount of
| arbitrary requests for older articles at the originating server, say
| up to a month after the article was posted. Someone with a large
| user/poster base will have to upgrade their injecting servers. :)

Methinks you have lost all sense of proportion and need to back up, look
at the numbers you give for the current situation and its growth, and
consider who the people are who contribute the volume of data you refer
to. Yes, there are some large sites, perhaps responsible for as much as
1/1000th of the total volume on USENET each, but they _already_ have
1000 times the capacity needed to handle "injections", and if you ask
them 100 times for every article they have published, they are still at
1/10th of their old bandwidth. They aren't going to fill the remaining
9/10ths with requests from other servers, either.
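To put rough numbers on that proportion (the figures are invented for
illustration, not measurements of any real feed):

    # Back-of-envelope figures: assume a full feed of 250 GB/day (a made-up
    # number of roughly the right order for 2000) and a large site that
    # itself injects 1/1000th of the total volume.

    full_feed_per_day = 250.0                        # GB/day of in-flow today
    injected_per_day = full_feed_per_day / 1000      # what the site publishes itself
    requests_per_article = 100                       # every article fetched 100 times

    out_flow_per_day = injected_per_day * requests_per_article

    print(f"injected: {injected_per_day:.2f} GB/day")
    print(f"out-flow: {out_flow_per_day:.1f} GB/day")
    print(f"share of old in-flow: {out_flow_per_day / full_feed_per_day:.0%}")
    # 0.25 GB/day injected, 25 GB/day served on demand: 1/10th of the old in-flow.

Even with every article requested a hundred times over, the injecting
site serves a tenth of what a full feed costs it today.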
| Another issue is that if the injecting server is somewhere remote in
| Australia and your client is in Norway, response will be slow,
| reducing the usefulness of Usenet compared to the web.

Really? How come I can pick up the article from any number of very
nearby caches today? Hmmm. Mystifying! How _did_ they get there?

| Ketil Z Malde has a point when he talks about the responsiveness of
| today's Usenet; it's very important for the user that the articles
| requested appear "immediately". (There hasn't been much research on
| Usenet, but I believe it's safe to apply relevant aspects of usability
| studies of the web.)

Yeah, I think you guys have got it _exactly_ right. Of course I'm out to
destroy USENET and its usability when I suggest that we invent a better
way to propagate articles. Of course I'm trying my very best to screw
with the minds of the people who implement the software so _they_ also
think "Hey, this USENET thing really was a bad idea to begin with, so
let's just do something utterly braindamaged that will kill it and
hopefully everybody using it, too". Get a _grip_, guys! If you don't
understand cache propagation mechanisms, say so. If you cannot even
_imagine_ that somebody out there has tried to think about the way
USENET propagates _while_ trying to keep it working for its users, I
suggest you try to insult people openly instead of just assuming they
have extra spicy garlic meatballs for brains. Sheesh.

Today's USENET propagation model does not try to keep track of where
articles are read; that is the task of local news admins, most of whom
take some pride in providing "everything" to their users. If we did not
ship the articles until they were read, only enough header information
that people could see newsgroup listings and the like, the traffic would
change its patterns to adapt to the readership instead of the global
flood algorithm used today. This would make it much easier to read
"fringe" newsgroups (it's always a hassle to get a news admin to add
another weird newsgroup hierarchy, because one group alone is too much
work and the whole hierarchy is too much data), with actual _user_ input
to the distribution of news articles.

| From what I understand, it isn't uncommon to deal with header-only
| feeds, and Diablo supports fetching Message-IDs from other servers by
| demand (automatic redirecting to the other news server). The latter
| seemed to work well when I tested it when the news servers in the
| chain were topologically close. I didn't test with servers on the
| other side of the globe, though.

I'm aware of Diablo and strongly encourage further development along
those lines, but as the network of servers picks up messages, they will
naturally be cached in many places along the way, not very much
different from today's system. Having to follow a chain of forwarding
servers to get a particular article is therefore very unlikely, unless
you read "fringe" newsgroups that nobody else in your vicinity reads,
and when you do that, you might well tolerate longer access times, too.

#:Erik
-- 
  The United States of America, soon a Bush league world power. Yeee-haw!