[r6rs-discuss] unicode (re comment #134)
cowan at ccil.org
Mon Dec 18 00:37:44 EST 2006
Thomas Lord scripsit:
> So what is your claim, here, John? That between 3.0 and 4.1
> the consortium changed its mind about internal use of
> surrogates *but forgot to tell anyone*?
Is there some reason to cite a 7-year-old version of the Unicode
Standard? The prose has been improved over that timespan, you know.
> Conformance rules C4, C5, and C6 have the same sentence structure and
> differ only in a single noun-phrase. They are all bundled together:
> C4. A process shall not interpret a high-surrogate code point or a
> low-surrogate code point as an abstract character.
> C5. A process shall not interpret a noncharacter code point as an
> abstract character.
> C6. A process shall not interpret an unassigned code point as an
> abstract character.
> Evidently, these three classes of codepoints are, in the minds of
> the Consortium, similar in an important way.
Evidently they are also different, or UTC would have coordinated
them into a single conformance requirement.
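For concreteness, the three classes named in C4, C5, and C6 can be told apart mechanically. The following is a minimal Python sketch, not anything taken from the Standard or from R6RS: the function name classify_code_point is hypothetical, the surrogate and noncharacter ranges are the fixed ones from the Unicode Standard, and "unassigned" is approximated by general category Cn in whatever Unicode version the running Python ships with.

```python
import unicodedata


def classify_code_point(cp: int) -> str:
    """Say which class from C4, C5, or C6 (if any) a code point belongs to.

    Hypothetical helper for illustration only.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # C4: surrogate code points.
    if 0xD800 <= cp <= 0xDBFF:
        return "high-surrogate"
    if 0xDC00 <= cp <= 0xDFFF:
        return "low-surrogate"
    # C5: the 66 noncharacters, U+FDD0..U+FDEF plus the last two
    # code points of every plane (low 16 bits FFFE or FFFF).
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    # C6: unassigned, approximated as general category Cn in the
    # Unicode data bundled with this Python build (an assumption).
    if unicodedata.category(chr(cp)) == "Cn":
        return "unassigned"
    return "other"
```

Note that the answer for the third class can change between Unicode versions, while the first two are fixed ranges.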
> What does the prohibition, to "not interpret a codepoint as an
> abstract character", actually mean? Specifically, does it preclude
> internal use? The commentaries on C5 and C6 plausibly suggest
You gloss over the fact that there is no such commentary on C4,
which is the one you want to use internally. As I explained last
time, using loose surrogates internally makes no sense and shouldn't
> Therefore, under my proposal, if you feel you simply can't
> handle unpaired surrogates then fine: don't. Your implementation
> is still conforming.
> Meanwhile, if I feel I can handle unpaired surrogates,
> I will. Under my proposal, my implementation can also be
> conforming (under the current language, it can not).
It's evident to me that you don't think a Scheme standard should
forbid *anything*; it should only require, never forbid. We aren't
going to get past this basic disagreement.
I will point out, however, that throwing an exception is a form of
required behavior, not forbidden behavior; you can implement a widened
version of any standardized procedure by catching the exception and
doing something unstandardized instead.
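The widening pattern described here can be sketched as follows. This is Python rather than Scheme, purely as an illustration; the names UnpairedSurrogateError, strict_code_point_to_char, and widened_code_point_to_char are hypothetical and do not correspond to any R6RS procedure or condition type.

```python
class UnpairedSurrogateError(ValueError):
    """Hypothetical error a strict, standardized procedure must raise."""


def strict_code_point_to_char(cp: int) -> str:
    """Stand-in for a standardized procedure that rejects surrogates."""
    if 0xD800 <= cp <= 0xDFFF:
        raise UnpairedSurrogateError(hex(cp))
    return chr(cp)


def widened_code_point_to_char(cp: int) -> str:
    """Implementation-specific widening built on the strict procedure."""
    try:
        return strict_code_point_to_char(cp)
    except UnpairedSurrogateError:
        # Unstandardized behavior: hand back the lone surrogate anyway.
        return chr(cp)
```

The strict procedure keeps its required behavior for portable code, and only callers that opt in to the widened version see the extension.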
> There *do* exist a countable infinity of symbols that can be
> passed over a communications channel. Text is an important
> use for CHAR values but not the only use.
I'm all ears, hypothetically speaking.
> John> The number is *always* going to be finite, by the nature of
> John> graphical representations: if there were a countable infinity of
> John> characters, there would be for each character infinitely many
> John> that are essentially indistinguishable from it, since each
> John> character can be represented as a pixel grid of finite size.
> CHAR is not only for naming glyphs.
The argument is a general one. Symbolic communication depends on the
ability to discriminate between the symbols; if there is a countable
infinity of them, then either some symbols must have unbounded complexity
or some symbols must not be finitely discriminable. In either case
communication is not achieved.
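The pixel-grid form of the argument is a counting argument and can be made concrete. The sketch below assumes bilevel (black or white) pixels, which is an assumption for illustration and not something stated in the thread; both function names are hypothetical.

```python
def distinct_bilevel_glyphs(width: int, height: int) -> int:
    """Number of distinct images on a width x height grid of on/off pixels."""
    return 2 ** (width * height)


def must_collide(symbol_count: int, width: int, height: int) -> bool:
    """True if, by the pigeonhole principle, two symbols must share an image."""
    return symbol_count > distinct_bilevel_glyphs(width, height)
```

For any fixed grid size the count is finite, so any sufficiently large symbol set contains symbols with identical renderings on that grid.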
> Bear wondered about the possibility of a high level language
> with a basic API at level 3. E.g., STRING-REF would
> return the Nth combining character sequence from a
> string, not the Nth codepoint (or code unit).
Treating default grapheme clusters (which is the Unicode Standard's
version of what you are describing) as primitive is possible as far as
it goes, but there are many properties which are defined at the level
of codepoints/characters only. So you end up returning tuples of
properties instead of properties.
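A small sketch of what "tuples of properties" looks like in practice. It uses Python's unicodedata module and a deliberately simplified segmentation: a base code point followed by any code points with a nonzero canonical combining class. That is an approximation of a combining character sequence, not the default grapheme cluster algorithm of UAX #29, and the function names are hypothetical.

```python
import unicodedata


def combining_sequences(s: str) -> list[str]:
    """Split s into base-plus-combining-marks units (simplified rule)."""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the preceding unit
        else:
            clusters.append(ch)  # start a new unit
    return clusters


def cluster_ref(s: str, n: int) -> str:
    """Level-3 style STRING-REF: the Nth unit, not the Nth code point."""
    return combining_sequences(s)[n]


def cluster_category(cluster: str) -> tuple[str, ...]:
    """General category is defined per code point, so a unit yields a tuple."""
    return tuple(unicodedata.category(ch) for ch in cluster)
```

With s = "e\u0301x", cluster_ref(s, 0) is the two-code-point unit "e\u0301", and cluster_category of that unit is the pair ("Ll", "Mn") rather than a single category.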
> I realized that Ray's idea actually works out very cleanly
> and effectively when expressed in terms of my CHAR and STRING
> Combining character sequences are, in theory, of potentially
> unbounded length. To read and write such sequences atomically
> on a port implies that the port is of unbounded width.
To write any one of your unbounded-width characters on a port
implies that the port is of unbounded width already. To write
an unbounded number of unbounded-width characters on a port
requires either an unbounded number of ports or a port that is
not merely unbounded but uncountable in width.
In short, we have gone way beyond the realm of practicability.
John Cowan cowan at ccil.org http://ccil.org/~cowan
Objective consideration of contemporary phenomena compel the conclusion
that optimum or inadequate performance in the trend of competitive
activities exhibits no tendency to be commensurate with innate capacity,
but that a considerable element of the unpredictable must invariably be
taken into account. --Ecclesiastes 9:11, Orwell/Brown version