From: Bill_Clementso

Subject: Re: XML parser and line feeds between tags

Date: 2003-12-16 11:23


I do the following to remove extraneous whitespace:

    (with-open-file (p *slides-xml-file*)
      (setq slides (strip-whitespace (car (last (parse-xml p))))))

    (defun strip-whitespace (slides)
      "Strip out extraneous whitespace from parsed xml."
      (loop for x in slides
            when (and (atom x)
                      (or (not (stringp x))
                          (not (equal (string-trim '(#\Space #\Tab #\Newline #\Return) x)
                                      ""))))
              collect x
            else
              when (not (stringp x))
                collect (strip-whitespace x)))
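
As an illustration, here is the function applied to a hand-built list shaped like the parser output quoted in Laurent's message below. This is only a sketch: the variable names *parsed-team* and *stripped-team* are made up for the example, and the tags are plain quoted symbols, whereas the real parser may intern them in a different package.

```lisp
;; strip-whitespace is repeated here only so the example runs on its own.
(defun strip-whitespace (slides)
  "Strip out extraneous whitespace from parsed xml."
  (loop for x in slides
        when (and (atom x)
                  (or (not (stringp x))
                      (not (equal (string-trim '(#\Space #\Tab #\Newline #\Return) x)
                                  ""))))
          collect x
        else
          when (not (stringp x))
            collect (strip-whitespace x)))

;; Hypothetical stand-in for the <team> element as the parser returned it:
;; a newline string sits between every pair of child elements.
(defparameter *parsed-team*
  (list 'team
        (string #\Newline)
        '((person id "b001" name "laurent eschenauer"))
        (string #\Newline)
        '((person id "b002" name "cedric gauthy"))
        (string #\Newline)))

;; The whitespace-only strings are dropped at every nesting level;
;; symbols and non-blank strings are kept in their original order.
(defparameter *stripped-team* (strip-whitespace *parsed-team*))
```

One thing to watch: attribute values are strings in the same lists, so an attribute whose value is empty or all whitespace (say name=" ") would be dropped along with the line feeds.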

However, note that the Franz XML parser conforms to the XML specification
in its handling of white space. In particular, see section 2.10 of the XML
specification: "An XML processor must always pass all characters in a
document that are not markup through to the application." Some XML parsers
either do not conform to this requirement or provide an option to strip
white space automatically; the conformant behavior is for the parser to
return it.
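
To see why a parser cannot safely drop white space on its own, consider mixed content such as <p><b>bold</b> <i>italic</i></p>, where the single space between the two elements is part of the text. The sketch below uses a hand-built list in place of real parser output, and the names blank-string-p, drop-blank-strings, text-of and *mixed* are invented for the illustration; drop-blank-strings is just a compact stand-in for the kind of stripping shown earlier.

```lisp
(defun blank-string-p (x)
  "True if X is a string that is empty or contains only white space."
  (and (stringp x)
       (zerop (length (string-trim '(#\Space #\Tab #\Newline #\Return) x)))))

(defun drop-blank-strings (tree)
  "Return a copy of TREE with whitespace-only strings removed at every level."
  (loop for x in tree
        unless (blank-string-p x)
          collect (if (consp x) (drop-blank-strings x) x)))

(defun text-of (tree)
  "Concatenate every string found in TREE, in document order."
  (with-output-to-string (out)
    (labels ((walk (x)
               (cond ((stringp x) (write-string x out))
                     ((consp x) (mapc #'walk x)))))
      (walk tree))))

;; Hypothetical parse of <p><b>bold</b> <i>italic</i></p>.
(defparameter *mixed* '(p (b "bold") " " (i "italic")))

(defparameter *text-kept* (text-of *mixed*))
(defparameter *text-stripped* (text-of (drop-blank-strings *mixed*)))
;; With the white space kept, the words stay separate;
;; once it is stripped, they run together.
```

So stripping is a decision for the application, which knows whether its white space is only formatting, as in the slides file, or part of the content.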

Cheers,
Bill Clementson
Integrations Architect
PeopleSoft Inc
303-334-4290



                                                                                                                                       
"Laurent Eschenauer" <pepite.be at laurent>
12/16/2003 06:02 AM

To:       Allegro-CL-cs.berkeley.edu <cs.berkeley.edu at Allegro-CL>
cc:       <pepite.be at laurent>
Subject:  XML parser and line feeds between tags

Hello everyone,

I have an issue with the XML parser in ACL 6.2 (pxml) when using line
feeds. Looking at the XML specs, I understand that the XML parser should
ignore line feeds and extra whitespace. However, when I parse the following
file with ACL 6.2:

<team>
<person id="b001" name="laurent eschenauer"/>
<person id="b002" name="cedric gauthy"/>
</team>

Using the command: (parse-xml stream :content-only t)

I receive:

((team "
" ((person id "b001" name "laurent eschenauer")) "
" ((person id "b002" name "cedric gauthy")) "
"))

As you can see, the parser returns every line feed as a token, whereas
they should not be visible (according to the XML specs at
http://www.xml.com/axml/testaxml.htm).

Am I missing something here? Has anyone had a similar problem?

Thank you for your feedback,

-Laurent

----------------------------------------------------------------------

Laurent ESCHENAUER
R&D Engineer
PEPITe S.A.
Parc Scientifique du Sart-Tilman
Rue Des Chasseurs Ardennais (Spatiopole)
B-4031 Angleur (Liege)
Belgium

Phone : +32 (0) 4 372 93 35
Fax : +32 (0) 4 372 93 20
Email : <pepite.be at laurent>
Web: http://www.pepite.be