[r6rs-discuss] unicode (re comment #134)

John Cowan cowan at ccil.org
Sun Dec 17 00:46:38 EST 2006


Thomas Lord scripsit:

> Noncharacter code points are explicitly described as suitable
> for internal use.

So they are, and R5.91RS explicitly permits them.  Noncharacter code
points are not the same as surrogate code points, which are *not*
explicitly described as suitable (and are not suitable) for internal use.

Specifically, allowing the representation of surrogate code points
means that UTF-16 cannot be used as an internal representation at all,
because it cannot distinguish two consecutive surrogate code points
from a single non-BMP character.  It also means that UTF-8 and UTF-32
cannot be used directly either, but only as non-standard variants.
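
To make the UTF-16 ambiguity concrete, here is a minimal Python sketch.
It is only an illustration: the helper name to_utf16_units is made up,
and it deliberately passes surrogate code points through as single code
units, which standard UTF-16 does not allow.  U+1F600 is just an
arbitrary non-BMP character chosen for the example.

```python
def to_utf16_units(code_points):
    """Encode integer code points as a list of UTF-16 code units.

    Hypothetical helper for illustration: surrogate code points
    (U+D800..U+DFFF) are let through unchanged, as a representation
    that permitted them would have to do.
    """
    units = []
    for cp in code_points:
        if cp >= 0x10000:
            v = cp - 0x10000
            units.append(0xD800 + (v >> 10))    # high surrogate unit
            units.append(0xDC00 + (v & 0x3FF))  # low surrogate unit
        else:
            units.append(cp)                    # BMP value, one unit
    return units


one_non_bmp_char = [0x1F600]         # a single non-BMP character
two_surrogates = [0xD83D, 0xDE00]    # two separate surrogate code points

# Both sequences produce exactly the same code units.
ambiguous = to_utf16_units(one_non_bmp_char) == to_utf16_units(two_surrogates)

# A standard UTF-8 encoder refuses a lone surrogate outright.
try:
    "\ud83d".encode("utf-8")
    strict_utf8_rejects = False
except UnicodeEncodeError:
    strict_utf8_rejects = True
```

Both inputs come out as the same two code units, so a decoder has no
way to tell which was meant.  The last few lines show Python's standard
UTF-8 codec rejecting a surrogate, which is why only a non-standard
variant could carry one.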

>  For every natural number (integers greater than or equal to 0)
>  there exists a distinct CHAR value.  The set of all such
>  values are called "simple characters".

Whatever for?  There does not exist a countable infinity of simple
characters to represent, Galactic Empire or no.  The number is
*always* going to be finite, by the nature of graphical representations:
if there were a countable infinity of characters, there would be for
each character infinitely many that are essentially indistinguishable
from it, since each character can be represented as a pixel grid of
finite size.
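
The counting behind this can be sketched in Python.  The 16x16
monochrome grid and the function name distinct_glyphs are assumptions
chosen purely for illustration; any fixed grid size and number of pixel
values gives the same kind of result.

```python
def distinct_glyphs(width, height, levels=2):
    """Count the distinct images on a width x height pixel grid.

    Each of the width * height pixels independently takes one of
    `levels` values, so the total is levels ** (width * height).
    """
    return levels ** (width * height)


# Hypothetical example: a 16x16 grid of black-or-white pixels.
count = distinct_glyphs(16, 16)
```

However large that count is, it is finite, so an infinite set of
characters could not all receive distinguishable glyphs on such a grid.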

I omit the rest, since it depends on this original and useless notion.

-- 
"But I am the real Strider, fortunately,"       John Cowan
he said, looking down at them with his face     cowan at ccil.org
softened by a sudden smile.  "I am Aragorn son  http://www.ccil.org/~cowan
of Arathorn, and if by life or death I can
save you, I will."  --LotR Book I Chapter 10


