[r6rs-discuss] Re: [Formal] formal comment (ports, characters, strings, Unicode)

Per Bothner per at bothner.com
Mon Mar 26 11:46:50 EDT 2007


Shiro Kawai wrote:
> Suppose I want to use Scheme as the extension language of
> the editor.  It will have an operation to extract a region
> of the buffer as a Scheme string.  And it will be useful
> if the extracted string contains language information as
> well, for I might want to do language-specific operations.

Associating arbitrary "properties" with a character or a
run of characters in a string is a very useful operation.
Emacs has this:

   Each character position in a buffer or a string can have a "text
   property list", much like the property list of a symbol.

Java Swing text "Document" objects provide something similar.
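
As a rough illustration of the idea (not the Emacs or Swing API), here
is a minimal Python sketch of a string that carries property runs and
keeps them when a region is extracted.  The class name `PropString` and
its methods are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PropString:
    """Hypothetical string with properties attached to runs of characters."""
    text: str
    # Each run is (start, end, props), with `end` exclusive.
    runs: list = field(default_factory=list)

    def put(self, start, end, **props):
        """Attach properties to the characters in [start, end)."""
        self.runs.append((start, end, dict(props)))

    def props_at(self, index):
        """Merge the properties of every run covering `index`."""
        merged = {}
        for start, end, props in self.runs:
            if start <= index < end:
                merged.update(props)
        return merged

    def substring(self, start, end):
        """Extract [start, end), clipping and shifting the runs to match."""
        out = PropString(self.text[start:end])
        for s, e, props in self.runs:
            lo, hi = max(s, start), min(e, end)
            if lo < hi:
                out.runs.append((lo - start, hi - start, dict(props)))
        return out


buf = PropString("hello 世界")
buf.put(0, 5, lang="en")
buf.put(6, 8, lang="ja", font="mincho")
region = buf.substring(4, 8)  # the extracted region keeps its properties
```

Because the properties live beside the text rather than inside each
character, a run can carry any number of them (language, font, and so
on) without changing the character representation.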

> Using 32bits per character and put auxiliary language info
> into the top 11 bits can be a plausible implementation.

For some applications 11 bits may be enough.  But if you want a
language property as well as a font property, why then you're
already out of bits.
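
To make the bit budget concrete: Unicode code points run up to
0x10FFFF, which needs 21 bits, so a 32-bit cell has 11 bits to spare.
The Python sketch below packs a tag into those bits.  The helper names
`pack` and `unpack` are made up for the example, and the layout is an
assumption, not any implementation's actual format.

```python
CODE_BITS = 21                    # enough for code points up to 0x10FFFF
TAG_BITS = 32 - CODE_BITS         # 11 bits remain in a 32-bit cell
CODE_MASK = (1 << CODE_BITS) - 1


def pack(ch, tag):
    """Store `tag` in the top 11 bits and the code point in the low 21."""
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in %d bits" % TAG_BITS)
    return (tag << CODE_BITS) | ord(ch)


def unpack(cell):
    """Return (character, tag) from a packed 32-bit cell."""
    return chr(cell & CODE_MASK), cell >> CODE_BITS


cell = pack("界", 5)
ch, tag = unpack(cell)
```

Eleven bits give 2048 distinct tag values in total.  Any second
property has to share that same field, so every bit given to a font
halves the number of languages that can be distinguished.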

> (I think Emacs treats characters of different language by
> adding leading octet unique to each language.

Not quite.  It can simultaneously represent different encodings
in the same buffer, but an encoding isn't the same as a language.
This "feature" is a holdover from the pre-Unicode (or rather
anti-Unicode) days: "Mule" was developed in Japan, where there
was a lot of anti-Unicode sentiment, but I think that war is over.
-- 
	--Per Bothner
per at bothner.com   http://per.bothner.com/


