[r6rs-discuss] Why lexers can be simpler when restricted to ASCII
alan at alan-watson.org
Mon Apr 23 13:06:05 EDT 2007
In formal comment 231, I stated:
"Many current Schemes have lexers written for ASCII (or Latin-1)
character sets. Conversion of these lexers to the new standard would be
easier if the report allowed inline hex escapes to appear anywhere in
The editors replied:
"It is unclear why converting the lexers would be significantly simpler
through this change"
Let me explain my original opinion. Many Schemes currently have lexers
written in C using "char". These need converting to a wider type such
as "long" to handle Unicode. Furthermore, table-driven approaches are
practical for ASCII (128 values), but not practical for Unicode
(0x110000 code points, a little over 2^20 values).
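
A minimal sketch of the table-driven approach for ASCII, in C. The
names used here (char_class, ascii_class, init_ascii_class, classify)
and the particular character classes are hypothetical and not taken
from any existing Scheme implementation. The sketch only illustrates
the size argument: the ASCII table is 128 bytes, while a flat table
indexed the same way over the whole Unicode code space would need
0x110000 entries.

```c
#include <stddef.h>

/* Hypothetical character classes for a Scheme-like lexer. */
enum char_class { CC_OTHER, CC_SPACE, CC_DIGIT, CC_ALPHA, CC_PAREN };

#define ASCII_SIZE   128        /* code points 0..127 */
#define UNICODE_SIZE 0x110000L  /* code points U+0000..U+10FFFF */

/* One byte per ASCII character: 128 bytes in total. */
unsigned char ascii_class[ASCII_SIZE];

/* Fill the table once at start-up. */
void init_ascii_class(void)
{
    int c;

    for (c = 0; c < ASCII_SIZE; c++)
        ascii_class[c] = CC_OTHER;
    ascii_class[' '] = ascii_class['\t'] = ascii_class['\n'] = CC_SPACE;
    for (c = '0'; c <= '9'; c++)
        ascii_class[c] = CC_DIGIT;
    for (c = 'a'; c <= 'z'; c++)
        ascii_class[c] = CC_ALPHA;
    for (c = 'A'; c <= 'Z'; c++)
        ascii_class[c] = CC_ALPHA;
    ascii_class['('] = ascii_class[')'] = CC_PAREN;
}

/*
 * Classify one input character with a single table lookup.
 * The argument is a long so that it can hold any Unicode code point,
 * but everything outside ASCII falls through to CC_OTHER because the
 * table does not cover it.
 */
enum char_class classify(long c)
{
    if (c < 0 || c >= ASCII_SIZE)
        return CC_OTHER;
    return (enum char_class)ascii_class[c];
}
```

After calling init_ascii_class(), classify('(') yields CC_PAREN and
classify(0x3BB) (GREEK SMALL LETTER LAMDA) yields CC_OTHER. Extending
the same flat layout to Unicode would mean a table of UNICODE_SIZE
entries per lookup table, which is the impracticality referred to
above.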
In case that isn't clear enough: My Scheme uses flex for its lexer. I
cannot see how to simply convert it to accept Unicode. I think I will
have to dump flex and implement a new lexer by hand.