William James <w_a_x_man@yahoo.com> wrote:
+---------------
| p...@informatimago.com (Pascal J. Bourguignon) wrote:
| > But the point is that most often, formatted data has a grammar that is
| > so simple that a context free parser is overkill, and a mere regexp,
| > or even just reading the stuff sequentially is enough.
|
| Three fields:
| He said "Stop, thief!" and collapsed.
| 88
| x,y
|
| As a CSV record:
| "He said ""Stop, thief!"" and collapsed.",88,"x,y"
|
| Is CSV like this easy to parse?
+---------------
Pretty much so. Here's what I use [may need tweaking for some applications]:

    ;;; PARSE-CSV-LINE -- Parse one CSV line into a list of fields,
    ;;; stripping quotes and field-internal escape characters.
    ;;; Lexical states: '(normal quoted escaped quoted+escaped)
    ;;;
    (defun parse-csv-line (line)
      (when (or (string= line "")           ; special-case blank lines
                (char= #\# (char line 0)))  ; or those starting with "#"
        (return-from parse-csv-line '()))
      (loop for c across line
            with state = 'normal
            and results = '()
            and chars = '() do
        (ecase state
          ((normal)
           (case c
             ((#\") (setq state 'quoted))
             ((#\\) (setq state 'escaped))
             ((#\,)
              (push (coerce (nreverse chars) 'string) results)
              (setq chars '()))
             (t (push c chars))))
          ((quoted)
           (case c
             ((#\") (setq state 'normal))
             ((#\\) (setq state 'quoted+escaped))
             (t (push c chars))))
          ((escaped) (push c chars) (setq state 'normal))
          ((quoted+escaped) (push c chars) (setq state 'quoted)))
        finally
         (progn
           (push (coerce (nreverse chars) 'string) results) ; close open field
           (return (nreverse results)))))

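It splits your sample input into the right three fields, though note that
it strips the doubled quotes entirely rather than collapsing each pair to
one literal quote:

    > (parse-csv-line (read-line))
    "He said ""Stop, thief!"" and collapsed.",88,"x,y"
    ("He said Stop, thief! and collapsed." "88" "x,y")
    >
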
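If the embedded quotes need to survive, one possible tweak is to look ahead one character when a quote is seen inside a quoted field. The following is only a sketch under that assumption: the name PARSE-CSV-LINE-KEEPING-QUOTES is made up for illustration, and it deliberately leaves out the backslash escapes and the blank/"#" line special cases handled above.

```lisp
;;; PARSE-CSV-LINE-KEEPING-QUOTES -- hypothetical variant, not the
;;; function above. Splits LINE on commas, honoring double-quoted
;;; fields; a doubled quote inside a quoted field yields one literal
;;; quote character. No backslash escapes, no "#" comment lines.
(defun parse-csv-line-keeping-quotes (line)
  (let ((results '())
        (chars '())
        (quoted nil)
        (i 0)
        (n (length line)))
    (flet ((finish-field ()
             ;; Turn the accumulated characters into a string field.
             (push (coerce (nreverse chars) 'string) results)
             (setq chars '())))
      (loop while (< i n) do
        (let ((c (char line i)))
          (cond ((and quoted (char= c #\"))
                 (if (and (< (1+ i) n)
                          (char= (char line (1+ i)) #\"))
                     (progn (push #\" chars) ; "" inside quotes -> one "
                            (incf i))        ; skip the second quote
                     (setq quoted nil)))     ; lone quote closes the field
                (quoted (push c chars))      ; commas are literal here
                ((char= c #\") (setq quoted t))
                ((char= c #\,) (finish-field))
                (t (push c chars))))
        (incf i))
      (finish-field)                         ; close the last open field
      (nreverse results))))
```

On the sample record above this should return three fields, the first being the string He said "Stop, thief!" and collapsed. with its inner quotes intact, followed by "88" and "x,y".
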
-Rob
-----
Rob Warnock <rpw3@rpw3.org>
627 26th Avenue <URL:http://rpw3.org/>
San Mateo, CA 94403 (650)572-2607