[r6rs-discuss] R5RS is not a baseline
cowan at ccil.org
Sat Feb 21 15:01:32 EST 2009
Eli Barzilay scripsit:
> But eventually there are some decisions that must be made that are
> related to culture. For example -- Hebrew can be written in two
> different styles of letters, and the style difference is more than just
> a change of font (one is used almost exclusively for printed texts and
> the other for hand-written texts). The question of whether to include
> both or only one must be a question that people raised somewhere,
> and probably argued about, and some side won.
Hebrew has been in the Unicode repertoire since it was still the
Xerox proprietary character repertoire (the encoding worked on different
principles, being essentially a highly extended JIS-208), and AFAIK
there has never been any suggestion of including any sort of handwritten
characters in any script. All that is left to higher-level protocols.
> OTOH, unicode did make some questionable characters (whatever that
> ("characters") means) like "double vav", which must be really important
> to some Yiddish speaking crowd -- and could be used to create some
> amazingly confusing programs.
Although Hebrew and Yiddish share the same letters, they employ them in
fundamentally different ways. Hebrew writing is an abjad, like Arabic:
consonants (some of which have merged or become silent) are written,
but vowels are indicated only by marks, normally used strategically to
eliminate ambiguities, but in full in poetry or sacred documents.
Yiddish, though, employs an alphabetic script, as English does.
A Yiddish alef-patahh is not a now-silent consonant with an optional
vowel mark meaning "a"; it as a whole *is* the letter corresponding to
Latin "a", and if you leave off the patahh, you have misspelled the word.
Similarly, yod corresponds to Latin "i", and yod-yod to Latin "ey", and
hiriq is not a vowel mark but a disambiguator used when two yods come
together but don't constitute a yod-yod. If you leave it out of the
word yod-yod-hiriq-dalet-yod-shin, you have written not _yidish_ but
_*eydish_. Each such Yiddish-specific letter is represented separately
in Unicode, just as is done for many (though not all) other languages.
(Alas, all this goes west for Hebrew loan-words in Yiddish, which are
spelled exactly as in Hebrew, often with full pointing. This was not
so in the official Yiddish of the Soviet Union.)
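A minimal Python sketch, using only the standard `unicodedata` module, can list a few of the separately encoded Yiddish letters with their official Unicode names. The particular code points (U+05F0, U+05F2, U+FB1D, U+FB2E) and the helper name `describe` are my own illustrative choices, not something from the thread.

```python
import unicodedata

# Illustrative (not exhaustive) selection of Yiddish-specific letters
# that Unicode encodes as single code points.
YIDDISH_LETTERS = ["\u05F0", "\u05F2", "\uFB1D", "\uFB2E"]

def describe(chars):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]

for cp, name in describe(YIDDISH_LETTERS):
    print(cp, name)
```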
The practical inconvenience of this system is mitigated in two ways:
the alef-patahh letter is canonically equivalent to the sequence of
alef followed by patahh; and unlike the situation with accented Latin
letters, such a sequence is *not* recomposed into an alef-patahh.
So doing Unicode normalization will eliminate the alef-patahh letter.
About the only bad case is yod-yod-patahh (Latin "ay"), which properly
has the patahh under both yods; following the above process will wind
up with a yod followed by a yod-patahh, which doesn't mean anything.
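The alef-patahh behavior can be checked with Python's standard `unicodedata.normalize`. This sketch assumes the precomposed letter is U+FB2E (HEBREW LETTER ALEF WITH PATAH) and that the decomposed form is U+05D0 ALEF followed by U+05B7 POINT PATAH. It shows that both NFD and NFC produce the two-code-point sequence, and that NFC does not recompose it. The constant names are hypothetical.

```python
import unicodedata

ALEF_PATAH = "\uFB2E"             # precomposed alef-patahh
ALEF_THEN_PATAH = "\u05D0\u05B7"  # alef followed by a separate patahh mark

def code_points(s):
    """Render a string as a list of U+XXXX code point labels."""
    return [f"U+{ord(c):04X}" for c in s]

for form in ("NFD", "NFC"):
    print(form, code_points(unicodedata.normalize(form, ALEF_PATAH)))
# NFD ['U+05D0', 'U+05B7']
# NFC ['U+05D0', 'U+05B7']
```

Because U+FB2E is excluded from recomposition, NFC does not turn the sequence back into the precomposed letter, unlike the case with most accented Latin letters.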
> As a programmer, I really don't want to care about all of this.
> Someone made some decision like that "double vav" that distinguishes
> two strings that render in exactly the same way, and as a Hebrew
> speaker it doesn't make any sense to me --
Hopefully you are more enlightened now.
> but as a hacker I still don't want to care about the differences.
> The bits that represent the two in a (for example) UTF-8 text file
> are different, so the easiest way for me to avoid it is to just look
> at the bits.
Sure, if you want. Or you can use various forms of normalization,
some of which are standardized by Unicode and some not, to throw away
any unwanted distinctions. For example, if you are analyzing Chinese
text, you may want to throw away the difference between Simplified and
Traditional characters -- not that it's trivial to do so.
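The two approaches can be contrasted in a short Python sketch: comparing raw UTF-8 bytes, versus comparing after standard NFC normalization combined with a non-standard, application-specific fold. The `CUSTOM_FOLDS` table and both function names are hypothetical illustrations. As far as I know, standard normalization leaves the double vav (U+05F0) alone, so merging it with two vavs requires a hand-made table like this one.

```python
import unicodedata

# Hypothetical application-specific folds for distinctions that standard
# normalization keeps. This table is an illustration, not part of Unicode.
CUSTOM_FOLDS = {
    "\u05F0": "\u05D5\u05D5",  # YIDDISH DOUBLE VAV -> VAV VAV
    "\u05F2": "\u05D9\u05D9",  # YIDDISH DOUBLE YOD -> YOD YOD
}

def bytes_equal(a, b):
    """Compare by UTF-8 bits only: every encoding difference counts."""
    return a.encode("utf-8") == b.encode("utf-8")

def loose_equal(a, b):
    """Compare after NFC normalization plus the custom folds above."""
    def fold(s):
        s = unicodedata.normalize("NFC", s)
        return "".join(CUSTOM_FOLDS.get(c, c) for c in s)
    return fold(a) == fold(b)

print(bytes_equal("\uFB2E", "\u05D0\u05B7"))   # False: different bits
print(loose_equal("\uFB2E", "\u05D0\u05B7"))   # True: NFC unifies them
print(loose_equal("\u05F0", "\u05D5\u05D5"))   # True: only via the custom fold
```

Whether a double vav should match two vavs is an application decision. The standardized part (NFC) and the ad hoc part (the fold table) are kept separate here so that each can be changed independently.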
John Cowan <cowan at ccil.org> http://www.ccil.org/~cowan
Today an interactive brochure website, tomorrow a global content
management system that leverages collective synergy to drive "outside of
the box" thinking and formulate key objectives into a win-win game plan
with a quality-driven approach that focuses on empowering key players
to drive-up their core competencies and increase expectations with an
all-around initiative to drive up the bottom-line. --Alex Papadimoulis