Andrew <calcheng@uclink4.berkeley.edu> wrote:
+---------------
| any ideas on how to represent strings in an interpreter.....
| i am looking for type checking criteria.....and how a string is
| actually represented in the interpreter.........
+---------------
It varies wildly all over the place among implementations -- see
Gudeman, "Representing Type Information in Dynamically Typed Languages"
<URL:ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/typeinfo.ps.gz>
for an excellent survey -- but the main two styles of representing strings
are [horribly oversimplified!] indirect & inline.
In the "indirect" style, heap "objects" are a fairly small struct (or a
union of small structs) containing either a *very* small amount of data
in the struct itself or pointer(s) to larger data elsewhere, something
like this (cribbed and simplified from the SIOD sources):
typedef struct obj {
short type;
union {struct {struct obj *car; struct obj *cdr;} cons;
struct {double data;} flonum;
struct {char *pname; struct obj *vcell;} symbol;
...
struct {char *name; struct obj *(*f)(void *,...);} subr;
struct {struct obj *env; struct obj *code;} closure;
struct {long dim; long *data;} long_array;
struct {long dim; double *data;} double_array;
struct {long dim; char *data;} string;
struct {long dim; struct obj **data;} lisp_array;
} u;
} *LISP;
and there is an enum somewhere (or series of some "#define"s) to enumerate
the values in "type". So one might type-check for a string with:
#define STRING_P(x) (IS_ON_HEAP(x) ? (x)->type == string_type : 0)
(where the "IS_ON_HEAP" macro decides between "immediate" values and heap
objects) and get the address and length of the string with:
#define STRING_VAL(x) ((x)->u.string.data)
#define STRING_LEN(x) ((x)->u.string.dim)
Often in the indirect style only the heap "cells" (as above) are garbage-
collected per se, while the "external" data is explicitly malloc'd/free'd
by type hook routines called by the GC when a heap cell's state changes.
In the direct style, the data (except for "large" objects, which are handled
somewhat as in the indirect style) immediately follows the type indicator,
sometimes with a fixed-position length indicator as well (to make life a
little simler for the GC), so you get something like this, using the idiom
of a length-1 array to signify indefinite size:
typedef struct obj {
short type;
short len;
union {
char chars[1];
short shorts[1];
int ints[1];
long longs[1];
void *ptrs[1];
} u;
} *LISP;
and then you have various macros to do casts onto the "obj" type, e.g.:
#define STRING_P(x) (IS_ON_HEAP(x) ? (x)->type == string_type : 0)
#define STRING_LEN(x) ((x)->len)
#define STRING_VAL(x) (&((x)->u.chars[0]))
or maybe:
struct Lisp_String {
short type;
short len;
char data[MAX_IMMED_STRING];
};
...
#define STRING_VAL(x) (((struct Lisp_String*)(x))->data)
#define STRING_LEN(x) ((x)->len)
Finally, some (but not all) Schemes ensure that strings are null-terminated
(even though the associated length field makes that uneccessary) to make it
easier to hand Scheme strings to C library routines which expect null-
terminated strings.
Does that answer your question?
-Rob
-----
Rob Warnock, 7L-551 rpw3@sgi.com http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
2011 N. Shoreline Blvd. FAX: 650-933-4392
Mountain View, CA 94043 PP-ASEL-IA