[r6rs-discuss] BOM at start of ports
William D Clinger
will at ccs.neu.edu
Wed Dec 5 09:45:53 EST 2007
Larceny and Petit Larceny v0.95 implement the Unicode
semantics as John Cowan described it, with the following
exception:
> If no BOM is present, the process SHOULD use a
> local convention if there is one (this mostly means that Windows UTF-16
> files are typically little-endian), and if not, SHOULD assume big-endian.
Larceny and Petit Larceny v0.95 do not use local
conventions. This will be corrected in a future
release.
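The rule quoted above can be sketched in a few lines. This is only an
illustration in Python, not Larceny's code; the function name
detect_utf16_byte_order and its local_convention parameter are
hypothetical. Passing no local convention models the v0.95 behavior
described here, where a missing BOM always means big-endian.

```python
import codecs

def detect_utf16_byte_order(data, local_convention=None):
    """Return (byte_order, bom_length) for a UTF-16 byte sequence.

    byte_order is "big" or "little"; bom_length is the number of
    leading bytes to skip before decoding (2 if a BOM was found).
    """
    # A BOM, when present, decides the byte order.
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big", 2
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little", 2
    # No BOM: use the local convention if one was supplied
    # (hypothetical parameter, e.g. "little" on Windows).
    if local_convention in ("big", "little"):
        return local_convention, 0
    # Otherwise assume big-endian.
    return "big", 0
```

For example, detect_utf16_byte_order(b"\x00A") falls through to the
big-endian default, while the same call with local_convention="little"
follows the local convention instead.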
Note that the R6RS does not specify codecs for UTF-16BE,
UTF-16LE, UTF-32BE, and UTF-32LE. The removal of these
codecs from an earlier draft was suggested by John Cowan.
Apart from the utf16->string and utf32->string procedures,
which allow the semantics of UTF-16BE, UTF-16LE, UTF-32BE,
and UTF-32LE to be specified via explicit arguments instead
of a transcoder, Larceny and Petit Larceny do not support
those four encoding forms.
Larceny and Petit Larceny extend the R6RS by allowing the
utf16->string and utf32->string procedures to accept a single
argument, in which case they decode the bytevector according to
the UTF-16 or UTF-32 encoding forms (respectively). In my opinion, the
removal of single-argument utf16->string and utf32->string
is an error in the R6RS caused by misinterpretation of John
Cowan's response of 27 May 2007 to an ambiguous question
posed by Mike Sperber [1,2]. I hope this mistake will soon
be corrected by ERR5RS and by an R7RS.
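To make the one-argument versus two-argument distinction concrete,
here is a rough Python analogue. It is a sketch, not the Larceny or
R6RS definition: the name utf16_to_string and its optional endianness
parameter are hypothetical, and it assumes that a BOM, when present,
takes precedence over the explicit argument.

```python
def utf16_to_string(data, endianness=None):
    """Decode UTF-16 bytes to a string.

    With one argument, the byte order comes from the BOM, or is
    big-endian if there is no BOM. With an explicit endianness
    ("big" or "little"), that value is used when no BOM is present.
    """
    # Assumption of this sketch: a leading BOM always wins.
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to the caller's choice, else big-endian.
    order = endianness or "big"
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return data.decode(codec)
```

A call such as utf16_to_string(b"\x00h\x00i") corresponds to the
single-argument form discussed above; passing "little" as the second
argument corresponds to naming the byte order explicitly.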