Subject: Re: MD5 in LISP and abstraction inversions
From: Erik Naggum <erik@naggum.net>
Date: Thu, 01 Nov 2001 15:30:22 GMT
Newsgroups: comp.lang.lisp
Message-ID: <3213617417316421@naggum.net>

* Francois-Rene Rideau <fare+NOSPAM@tunes.org>
| Ask whom?

  This newsgroup, for instance.

| If someone was willing to publish code for others to use, he'd already
| have done it. Or at least announced it on a webpage.

  That has actually been done. I will certainly not blame you for not
  finding things on the Net, however. That is why asking those wetware
  search engines called "humans" is still a very good idea.

| So far, the only advertised Common LISP implementation of MD5 is Franz'
| ACL6's - which doesn't quite fulfill my needs, and is embedded in their
| application rather than available as portable common lisp.

  It is not an implementation in Common Lisp, but written in C as a
  low-level system function. There seem to be serious performance
  improvements in the upcoming 6.1 release.

| The ACL version is about twice slower than C.

  Well, that much is fairly odd, considering that it is written in C. I
  only noticed that it consed like mad because it used C space for a copy
  of every string, and that C space was neither freed nor garbage
  collected.

| No, I blame the language for not allowing me to express my wishes.

  I can sympathize with this. I frequently blame the planet earth for
  exhibiting more gravitation than I wish it had, and I wish magic worked,
  too, because I certainly do not want to do so much _work_ all the time.
  The wishes of good engineers are subjugated to the reality in which they
  need their wishes to come true. The wishes of really bad engineers are
  completely disconnected from any reality. I think yours are of the
  latter kind.

| I can, certainly, open files in octet mode - mind you, that's precisely
| what I do. But then, I'm not interoperable with all the body of
| character-based text, SEXP or (shudder) XML processing.

  You really need to clue in that MD5 does not operate on characters. That
  some languages have no idea what a character is, but treat it like a
  small integer is really not something you can blame either MD5 or Common
  Lisp for. If you really need a stream to exhibit characters while you do
  MD5 processing on its underlying byte stream, you can do that by creating
  a new stream class that reads from those 64-byte buffers that you read
  from the input source and give to MD5. It is really not that hard if you
  want to _work_ with the language and avoid "wishing" for things that are
  incompatible with the language you use.

| It is certainly possible to build translation layers from one to the
| other, but it's clumsy, inefficient, not portable, and underspecified.

  MD5 is specified to work on blocks of 64 bytes. How you get from what
  you have to 64-byte blocks is really a question of quality of programmer
  and implementation. I still think you are simply massively incompetent.

| Certainly, each implementation specifies (more or less precisely) what
| happens when you use code-char and such, but then, you have the same kind
| of incompatible extension hell as in Scheme.

  Wrong.

| Once again, reading their specification, I happen to like the recent
| things done by Franz (SIMPLE-STREAM), but it's unhappily not directly
| applicable to me.

  Well, what makes it impossible for you to take their good ideas and
  implement them on your own?

| For performance, lack of supported modular integer operators is also a
| big problem.

  I actually agree with this in general, but for MD5, just break the 32-bit
  integers in half and operate on 16-bit values, instead. I assure you
  that no significant performance loss is caused by this, and the algorithm
  does not increase in complexity because of it. This is a fairly simple
  engineering tradeoff that good engineers will make to get the work done
  and bad engineers will refuse to make because they wish they did not have
  to.
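
  A minimal sketch of the 16-bit split, assuming each 32-bit word is passed
  around as two 16-bit halves. ADD32 and ROL32 are hypothetical names used
  only for illustration; they are not taken from any of the implementations
  mentioned here. Addition modulo 2^32 and left rotation are the two MD5
  operations that would otherwise need more than 16 bits at a time.

```lisp
;;; Sketch only: ADD32 and ROL32 are hypothetical helpers.
;;; A 32-bit word is carried as two 16-bit halves, HI and LO.
;;; No intermediate value exceeds 17 bits, which is assumed to be well
;;; inside the fixnum range of typical implementations (the standard
;;; itself only guarantees 16 bits).

(defun add32 (a-hi a-lo b-hi b-lo)
  "Add two 32-bit words modulo 2^32; return the halves as two values."
  (let* ((lo (+ a-lo b-lo))                 ; at most 17 bits
         (hi (+ a-hi b-hi (ash lo -16))))   ; add the carry out of LO
    ;; Masking HI drops the carry out of bit 31, i.e. reduces mod 2^32.
    (values (logand hi #xFFFF) (logand lo #xFFFF))))

(defun rol32 (hi lo n)
  "Rotate a 32-bit word left by N bits, 0 <= N < 32; return two values."
  (when (>= n 16)                           ; rotating by 16 swaps the halves
    (rotatef hi lo)
    (decf n 16))
  ;; Keep only the low 16-N bits of a half before shifting it left, so
  ;; the shifted value still fits in 16 bits; the top N bits of the
  ;; other half are shifted down to fill the vacated low bits.
  (values (logior (ash (ldb (byte (- 16 n) 0) hi) n) (ash lo (- n 16)))
          (logior (ash (ldb (byte (- 16 n) 0) lo) n) (ash hi (- n 16)))))
```

  For example, (add32 #xFFFF #xFFFF 0 1) wraps around and returns the two
  values 0 and 0, and (rol32 #x1234 #x5678 4) returns #x2345 and #x6781.
  The bitwise operations of MD5 (LOGAND, LOGIOR, LOGXOR, LOGNOT under a
  16-bit mask) can be applied to each half independently.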

| I consider it bad practice to post large code files on USENET.

  Well, if your MD5 function is a large code file, then you have even more
  problems.

| I posted the URL to that code, which ought to be enough for anyone
| interested.

  There is no way to ascertain that what is pointed to by a URL will remain
  the same after it has been criticized. We have seen how some massively
  dishonest _frauds_ on this newsgroup have altered the text of published
  URLs (even without updating the version) in order to make their critics
  look bad. Post the code and it will be very hard to "update" it.

| I repeat it here (and add a second one), in case you missed it:
| ftp://Samaris.tunes.org/pub/lang/cl/fare/md5.lisp
| http://tunes.org/cgi-bin/cvsweb/fare/fare/lisp/md5.lisp

  I find it rather odd that you cannot distill your problems down to a few
  simple cases. Everybody can look elsewhere for the full context, but it
  _should_ be possible to show people your problems with an appropriate
  excerpt. A good bug report contains a distilled example. A bad bug
  report contains a million lines of rotten code with a single line "this
  code is perfect, but your compiler does not conform to my wishes".

| > | The world has standardized on low-level byte streams as the universal
| > | medium for communication of data, including text.
| > Which is a mistake, since they run into an enormous amount of trouble
| > with supporting more than one character encoding.
| It is not a mistake - it is the natural thing to do in a world of
| proprietary black-box devices and software.

  Huh? You have a special knack for non sequiturs. TEXT IS NOT BYTES.
  You _really_ need to understand this. Failing that, you will run into
  all sorts of problems, and there will be no end to your complaints, as I
  suspect you have already noticed.

| I did it. But it's not portably interoperable with the character-based
| SEXP code that I have.

  Really?
  Tell you what, when I wrote my MD5 functions for Allegro CL 5.0, I
  stuffed them between the I/O system and the reader, and I reset and grab
  the md5 hashes while reading characters from the stream. If I can do
  this in Allegro CL, so can you in CMUCL. If you naively implement the
  fairly stupid C model used for MD5, sending blocks of 64 "bytes" down to
  a new function, instead of grabbing the input buffers of the stream, and
  run into problems that you do not look into seriously in order to find a
  better way, you are simply incompetent at what you do and should not
  blame anyone else, _especially_ not the language.

| Using MD5 to portably support code version tagging, etc., becomes
| "interesting". By no means impossible. Just a PITA.

  I have no idea what you are trying to talk about.

| No I'm not. And sometimes, I want to do only one. Sometimes, I want to
| do only the other. Sometimes, I am happy with the character processing
| done by my implementation (though it's not portable). Sometimes, I need
| precise control on the processing that happens (e.g. because I'm
| precisely transcoding stuff from one protocol to another). And
| sometimes, I want to do both byte-processing and text-processing at once
| on the same stream (at once: switching from one to the other, or even
| doing both character processing AND md5sum'ing on the same chunk).

  I honestly fail to see the problem. In my view, it takes a fairly dense
  programmer to fail to deal with these things intelligently. If you need
  both byte stream and character stream, as you would do in HTTP, there are
  two ways of doing that: so-called "bivalent" streams, from which you can
  read both bytes and characters, but which introduce serious problems in
  maintaining state information about non-trivial character codings, or
  some means to switch the type of the stream between byte and character,
  which communicates to the lower levels what you intend to do from now on.
  If you need precise control, ask for it.
  Your system does _not_ do a lot of weird magic that you have no control
  over. Just trust me on this, OK? Go read the fine documentation and
  discover for yourself that you _can_ trust the implementation.

| In the latter cases, any implicit character processing done by the
| implementation is an abstraction inversion, to me.

  Yes, to you, because you have _already_ inverted the model. You do _not_
  convert from bytes to characters to bytes in order to do MD5 hashing on
  the bytes -- the danger of losing the original byte values is too high no
  matter how you do things. You have to get at the bytes where they are
  actually found, just after they have been read, and just before they are
  written. Anything else is pretty damn stupid.

| > if you are dealing with text, you deal with characters, not their
| > coding,
| Not even. I could be dealing with words, with sentences, with layout
| elements. In such contexts, characters are too low-level and not what I
| like.

  I am sure you think this is relevant to something. Could you try to make
  it a bit more clear what it might be relevant to? No, never mind.

| Yet I don't resent of CL not standardizing on high-level protocols --
| it's things that can easily be implemented on top of them. However,
| wrongly standardizing on low-level things while adding restrictions to
| them is an abstraction inversion and it's wrong - it's the language
| getting in the way rather than helping.

  Have you at all considered that _you_ might be wrong in any of this? If
  not, I would actually like to hear your arguments for why your way of
  seeing things is correct. I am getting tired of your non sequiturs and
  the randomness of your "conclusions".

| I'm sorry I don't speak norvegian (yet). It means precisely what it says.

  Oh, geez, you really _are_ a typically French retard.
  Well, thank you for proving my guess that your whining is due to your
  incredible incompetence and the arrogance found only in rabid ignorants
  who have no desire to listen to anyone other than the voices in their
  head. You really made it clear that you lack the ability to express
  yourself in English, but when you resort to the kind of low-level idiocy
  you do, all hope is lost that you will ever recover enough of your
  thinking ability to get back on track.

  Of _course_ Common Lisp does not match your wishes, Francois-Rene Rideau.
  If it did, it would be a really stupidly designed language. I am happy
  you were not around when things _were_ designed. Incompetence on your
  scale should be punishable by law.

  If you think you had some valid concerns about the language, back up and
  do a better job of presenting them, unoccluded by your stupidity and
  incompetence and your "wishes". I am quite sure you will find people
  sympathetic to a number of language design issues when it comes to coding
  stuff like MD5, but a good engineer knows when to use languages that are
  better for some particular tasks than others. Bad engineers should just
  be encouraged to become better at what they do, or to switch careers to
  one in which they _could_ become good at what they do.

///
-- 
  Norway is now run by a priest from the fundamentalist Christian People's
  Party, the fifth largest party representing one eighth of the electorate.
-- 
  Carrying a Swiss Army pocket knife in Oslo, Norway, is a criminal offense.