[r6rs-discuss] reading XML files

Chris Hanson cph at csail.mit.edu
Mon Nov 20 13:00:22 EST 2006


Will,

In your document "portio.txt", you said

    To read an XML file, a Scheme program would open the file as a
    binary port, fetch single bytes from that port and interpret them
    as UTF-8 until the transcoding to be used for the rest of the file
    had been parsed, and then call transcoded-port to obtain a text
    port with that encoding.

followed by some very hairy code that you suggest should be in the
standard.  None of this is necessary, and consequently you can
simplify this document.

To identify the coding of an XML entity, there are two stages: (1)
determine if there's a BOM at the beginning of the entity, and (2)
determine if there's an XML declaration present.  Stage (1) requires
only binary I/O, since you're looking for one of several specific
sequences.  Stage (2) requires only ASCII I/O, because the XML
declaration is entirely coded in ASCII non-control characters (and
#\tab).  I believe this was a deliberate design decision, mostly
because it is so simple and elegant.
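
As a rough illustration of the two stages, here is a minimal sketch in
Python rather than Scheme. The names detect_xml_encoding, BOMS, and
XML_DECL are hypothetical and are not taken from portio.txt or from
any proposed standard. The sketch assumes that a BOM, when present,
settles the encoding by itself, and that stage (2) is only attempted
on ASCII-compatible input; a declaration stored as UTF-16 or UTF-32
without a BOM is not handled.

```python
import codecs
import re

# Stage 1 table. The UTF-32 entries come first because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Stage 2 pattern: the XML declaration contains only ASCII characters, so it
# can be matched against raw bytes before any text decoding takes place.
XML_DECL = re.compile(
    rb'<\?xml[ \t\r\n][^>]*?encoding[ \t\r\n]*=[ \t\r\n]*'
    rb'["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_xml_encoding(data, default="utf-8"):
    """Return (encoding_name, bom_length) for the leading bytes of an entity."""
    # Stage 1: binary comparison against the known BOM byte sequences.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # Stage 2: ASCII-only scan of the XML declaration, if there is one.
    match = XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower(), 0
    # Neither a BOM nor an encoding declaration: fall back to the default.
    return default, 0
```

A caller would then decode the remainder of the entity with something
like data[bom_length:].decode(encoding_name), which plays the role
that transcoded-port plays in the passage quoted above.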


