[r6rs-discuss] [Formal] U+FFFD not intended for encoding errors

John Cowan cowan at ccil.org
Fri Sep 22 18:27:34 EDT 2006


Marcin 'Qrczak' Kowalczyk scripsit:

> "For example, in UTF-8 every code unit of the form 110xxxxx must be
> followed by a code unit of the form 10xxxxxx. A sequence such as
> 110xxxxx 0xxxxxxx is ill-formed and must never be generated. When
> faced with this ill-formed code unit sequence while transforming or
> interpreting text, a conformant process must treat the first code unit
> 110xxxxx as an illegally terminated code unit sequence: for example,
> by signaling an error, filtering the code unit out, or representing
> the code unit with a marker such as U+FFFD replacement character."
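
The three options named in the quoted passage (signal an error, filter
the code unit out, or substitute U+FFFD) can be tried on the exact byte
pattern it describes. What follows is a minimal sketch, not anything
from the thread: it assumes Python's built-in UTF-8 codec and its
standard "strict", "ignore" and "replace" error handlers as stand-ins
for those three behaviours, and the names ILL_FORMED and
decode_three_ways are made up for illustration.

```python
# 110xxxxx followed by 0xxxxxxx: a two-byte lead byte (0xC3) followed by
# plain ASCII 'A' (0x41) instead of a 10xxxxxx continuation byte.
ILL_FORMED = bytes([0b11000011, 0b01000001])


def decode_three_ways(data: bytes) -> dict:
    """Decode data as UTF-8 under each of the three error policies."""
    results = {}

    # Option 1: signal an error.
    try:
        results["strict"] = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        results["strict"] = type(exc).__name__

    # Option 2: filter the offending code unit out.
    results["ignore"] = data.decode("utf-8", errors="ignore")

    # Option 3: represent the offending code unit with U+FFFD.
    results["replace"] = data.decode("utf-8", errors="replace")

    return results


if __name__ == "__main__":
    for policy, outcome in decode_three_ways(ILL_FORMED).items():
        print(policy, repr(outcome))
```

In all three cases only the lead byte is treated as ill-formed; the
0xxxxxxx byte that follows is still decoded as an ordinary character,
which matches the passage's "treat the first code unit ... as an
illegally terminated code unit sequence".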

Good catch.  I withdraw my comment.

-- 
Real FORTRAN programmers can program FORTRAN    John Cowan
in any language.  --Allen Brown                 cowan at ccil.org


