Subject: Re: Upper limits of CL
From: Erik Naggum <erik@naggum.net>
Date: Thu, 20 Jun 2002 03:33:00 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3233532779857997@naggum.net>

* Christopher Browne
| If the O.P. was actually trying to invert matrices of dimension '(20000
| 20000), then they more than likely care somewhat about the amount of memory
| physically available.

Although I shall not speak of the efficiency issue since I have never done anything like this before, it is fairly obvious that all you need to do this is some fairly minimal amount of memory (only kilobytes in magnitude) and enough disk space to hold one or two copies of this matrix. Binary I/O with objects of the appropriate type is a requirement, of course. There is, also of course, no material difference between this and system-managed virtual memory, but it might not even matter that it is not random access if you are actually going to traverse the data sequentially. You could even do the old tape-station trick and write a temporary tape/file with redundant data in your access pattern if this is faster than random access to disk or virtual memory -- there is strong empirical evidence that paging is slower than predictable sequential disk access, and you get a free 40G disk with every three pairs of sneakers these days.

A good engineer would balance the tradeoffs and solve the problem within the existing resource constraints. A theoretical computer scientist would whine until he got a big enough machine to implement the mathematical solution with the least amount of fussing about the constraints of the real world. I know which one I would hire to get the job done.

When I hear about these amazing datasets, I keep thinking that they need to be generated somehow. The only applications I know of that require multiple gigabytes of memory (and then tens of gigabytes) are those that require multiple terabytes of disk space (and then easily tens of terabytes, too).

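The earlier point about needing only kilobytes of memory plus sequential binary I/O can be made concrete with a small sketch. It is written in Python rather than Common Lisp, and everything in it is an assumption made for illustration: the function names (`matrix_bytes`, `write_matrix`, `matvec_from_disk`, `demo`) and the row-major file layout are hypothetical and do not come from the post. It also computes a matrix-vector product, not an inversion; the point is only that a matrix stored on disk can be traversed one row at a time.

```python
import os
import tempfile
from array import array

# Size of one double as stored by the array module (8 bytes on common platforms).
DOUBLE_BYTES = array("d").itemsize


def matrix_bytes(n):
    """Bytes needed to hold an n-by-n matrix of doubles."""
    return n * n * DOUBLE_BYTES


def write_matrix(path, n, entry):
    """Write an n-by-n matrix to disk row by row; entry(i, j) gives each value."""
    with open(path, "wb") as f:
        for i in range(n):
            array("d", (entry(i, j) for j in range(n))).tofile(f)


def matvec_from_disk(path, n, x):
    """Compute A @ x while holding only one row of A in memory at a time."""
    result = []
    with open(path, "rb") as f:
        for _ in range(n):
            row = array("d")
            row.fromfile(f, n)  # sequential binary read of exactly one row
            result.append(sum(a * b for a, b in zip(row, x)))
    return result


def demo(n=4):
    """Round trip through a temporary file: multiply by a vector of ones."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_matrix(path, n, lambda i, j: float(i * n + j))
        return matvec_from_disk(path, n, [1.0] * n)
    finally:
        os.remove(path)
```

With 8-byte doubles, `matrix_bytes(20000)` is 3,200,000,000 bytes for the whole matrix, while a single row is 20,000 × 8 = 160,000 bytes, which is the "kilobytes in magnitude" working set. `demo()` returns `[6.0, 22.0, 38.0, 54.0]`, the row sums of the small test matrix.
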
A friend of mine from the U of Oslo once headed the IT department in Statoil responsible for geological surveys. They acquired the first homogeneous park of disks capable of 1 terabyte simultaneous access in Norway (they already had _hundreds_ of terabytes of "virtual disk space" on enormous tape systems that were loaded onto disk as needed). These datasets were so huge that they took many months to accumulate -- they had to send physical _ships_ out to sea to scan the seabed, at the cost of a dollar a byte, and the sheer transfer time from tape to disk could be a couple of _days_ after the physical tapes had also taken several days to move from ship to the computer facility. Then, of course, came the processing of these things. Dedicated hardware costing many millions of dollars churned on the data for weeks on end before it finally spit out a colorful map of the seabed that took several days to plot on their monster plotters so they could hang them up on the walls, hundreds of meters long in all. Later, computers became powerful enough to do virtual reality with real-time 3D navigation over and into the analyzed seabed. This amazing feat was all done in Fortran, unsurprisingly, with software that cost millions of dollars more and ran to tens of millions of lines of code developed over decades.

This was all worth it because it takes approximately one day of actual production from an oil well to pay for all the computing power necessary to punch the right hole in the ground. The whole country of Norway can run for twice as long as it takes to pump up the oil, from the _taxation_ on the _profits_ alone, socialist planned economy and social security and defense and everything. This is not some jerkoff hobbyist whining about not getting a 64-bit Common Lisp -- this is big enough to invest 100 man-years to develop dedicated programming languages and to spawn moderately large companies just to cater to a single industry of a _very_ small number of companies.
(We have no mom-and-pop oil rigs in Norway.) It would be cheaper to implement a Common Lisp system of your own for this kind of operation than to ask someone else to do it for you. (In an unrelated industry, that is just how Erlang got started.)

The moral of this long-winded example is this: If you really need to invert a 20,000 by 20,000 matrix of doubles, and it is not just some kind of academic masturbation, the memory and Common Lisp and 64-bit hardware and whatever else you can come up with on the IT side are going to be minuscule compared to the rest of the costs of the system.

Take meteorology -- the first Cray purchased in Norway was used to help our fishing industry plan their expeditions to sea. According to legend, the hardware paid for itself in the first week it was in operation, but the cost of satellite imaging, telecommunications equipment capable of feeding the machine in real time, etc., dwarfed the Cray hardware by two orders of magnitude. Apart from the oil, we have _weather_ in this mountainous coastal country. Billions of dollars have been poured into meteorological prediction in this country alone since computers were able to help at all. Under such circumstances, I can fully sympathize with the need for more than 32-bit addressing and I appreciate the need for the raw computational power that goes into these things, but if this talk about 64-bit implementations is only some academic exercise, I actually find it _insulting_ to the brilliant minds who have had to do without it.

Besides, if you really need gigabyte upon gigabyte of memory and the hardware to utilize it, the only thing between you and satisfying that need has been money for at least 10 years. It's not like it's Common Lisp's fault that you haven't had that money -- and if you had had it, would you have had anything left over to do anything useful with it over a reasonable period of time?
Like, you don't buy a 150-million-dollar printing press just because you got an idea about publishing a newspaper -- you upgrade from the 75-million-dollar printing press when you approach 20 out of 24 hours running time 7 days a week and want to not run out of hours of the day before the new press can be delivered and run in.

So pardon my cynical twist, but what are you doing with that 20,000×20,000 double-precision floating point matrix you say you need to invert _today_? If you answer "nutt'n, I jus kinda wondered what it'd be like, you know", you should be very happy that I am most likely more than 3000 miles away from you, or I would come over and slap you hard.

And if you _are_ doing serious stuff of this magnitude, why do you even bother with run-of-the-mill Common Lisp implementations on stock hardware? Implement your own goddamn Common Lisp system optimized for your hardware and your domain-specific language and other needs. That was -- after all -- how several of the Common Lisp implementations out there got started. It wasn't a miracle then, and it won't be a miracle now. Just f-ing do it.

[ If I sound grumpy, it is only because I have come across too many idiots of the "it can't be done" persuasion lately, the kind of managers who have an aquarium in their office because fifteen brains think better than one. ]

--
Guide to non-spammers: If you want to send me a business offer, please be specific and do not put "business offer" in the Subject header. If it is urgent, do not use the word "urgent". If you need an immediate answer, give me a reason, do not shout "for your immediate attention". Thank you.