[r6rs-discuss] [Formal] Non-ASCII characters should not be treated all alike

Aubrey Jaffer agj at alum.mit.edu
Fri Dec 1 14:33:48 EST 2006


 | Date: Tue, 28 Nov 2006 00:05:05 -0500
 | From: John Cowan <cowan at ccil.org>
 | 
 | Submitter: John Cowan
 | Email address: cowan at ccil.org
 | Issue type: Defect
 | Priority: Minor
 | Component: Lexical
 | Report version: 5.91
 | Summary:  Non-ASCII characters should not be treated all alike
 | 
 | The lexical syntax should not allow Nd, Mc, or Me characters to
 | be initial in identifiers.  Allowing a sequence of Nd characters
 | to be identifiers means that digit-strings in non-ASCII digits
 | are identifiers.  I don't insist that all digit-strings be
 | numerals, but they certainly should not be identifiers.
 | 
 | Likewise, Unicode semantics attaches a Mc or Me character to
 | its predecessor, which would not be part of the identifier.
 | That's undesirable.
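
The three categories named in the report can be checked directly against the Unicode database. The following is only a rough sketch in Python using the standard unicodedata module; the function names are hypothetical and the check covers only the Nd/Mc/Me restriction, not the full identifier syntax.

```python
import unicodedata

# General categories the report says should not begin an identifier.
FORBIDDEN_INITIAL = {"Nd", "Mc", "Me"}

def has_forbidden_initial_category(ch):
    """True if ch is a decimal digit (Nd), spacing mark (Mc),
    or enclosing mark (Me). Hypothetical helper for illustration."""
    return unicodedata.category(ch) in FORBIDDEN_INITIAL

def is_digit_string(s):
    """True if s is non-empty and consists only of Nd characters."""
    return len(s) > 0 and all(unicodedata.category(c) == "Nd" for c in s)
```

For instance, a string of ARABIC-INDIC digits such as U+0663 U+0664 is a digit-string under this test, so it would be an identifier under the rule being objected to.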

Also, should all Zs (whitespace) characters delimit identifiers?
I particularly wonder about NO-BREAK spaces U+00A0 and U+202F:

  U+0020 	SPACE
  U+00A0 	NO-BREAK SPACE
  U+1680 	OGHAM SPACE MARK
  U+180E 	MONGOLIAN VOWEL SEPARATOR
  U+2000 	EN QUAD
  U+2001 	EM QUAD
  U+2002 	EN SPACE
  U+2003 	EM SPACE
  U+2004 	THREE-PER-EM SPACE
  U+2005 	FOUR-PER-EM SPACE
  U+2006 	SIX-PER-EM SPACE
  U+2007 	FIGURE SPACE
  U+2008 	PUNCTUATION SPACE
  U+2009 	THIN SPACE
  U+200A 	HAIR SPACE
  U+202F 	NARROW NO-BREAK SPACE
  U+205F 	MEDIUM MATHEMATICAL SPACE
  U+3000 	IDEOGRAPHIC SPACE
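
A list like the one above can be regenerated from the Unicode character database. This is a minimal sketch in Python, assuming the standard unicodedata module; the function name is made up for illustration, and the result depends on which Unicode version the interpreter ships, so it may not match the list above exactly.

```python
import sys
import unicodedata

def space_separators():
    """Return every code point whose general category is Zs."""
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Zs"]

# Print in the same "U+XXXX <tab> NAME" layout as the list above.
for cp in space_separators():
    print(f"U+{cp:04X}\t{unicodedata.name(chr(cp))}")
```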

\meta{delimiter} \: \meta{whitespace} \| ( \| ) \| \openbracket{} \| \closedbracket{} \| " \| ;
\meta{whitespace} \: \meta{character tabulation} \| \meta{linefeed}
\> \| \meta{line tabulation} \| \meta{form feed} \| \meta{carriage return}
\> \| \meta{any character whose category is Zs, Zl, or Zp}
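
To make the consequence concrete, here is a rough sketch in Python of the delimiter rule as quoted, reading form feed and carriage return as separate alternatives. The names are hypothetical, and the splitter handles only whitespace; it is not a Scheme reader.

```python
import unicodedata

# Tab, linefeed, line tabulation, form feed, carriage return.
CONTROL_WHITESPACE = "\t\n\v\f\r"
PUNCTUATION_DELIMITERS = '()[]";'

def is_whitespace(ch):
    """Whitespace per the quoted grammar: the listed controls,
    or any character whose category is Zs, Zl, or Zp."""
    return (ch in CONTROL_WHITESPACE
            or unicodedata.category(ch) in {"Zs", "Zl", "Zp"})

def is_delimiter(ch):
    """Delimiter per the quoted grammar."""
    return is_whitespace(ch) or ch in PUNCTUATION_DELIMITERS

def split_on_whitespace(text):
    """Split text into runs of non-whitespace characters."""
    tokens, current = [], []
    for ch in text:
        if is_whitespace(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

Under this reading, "foo" followed by U+00A0 followed by "bar" splits into two tokens, which is exactly the behavior in question for the NO-BREAK spaces.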


