No subject


Mon Aug 17 16:01:24 EDT 2009


	1) The char->integer procedure must return an exact integer
	between 0 and #xD7FF or between #xE000 and #x10FFFF when applied
	to a character supported by the implementation and belonging to
	the Unicode repertoire.  This integer must be the Unicode scalar
	value of the character.

	This is independent of the implementation's internal
	representation.  For example, a Scheme that supports a repertoire
	of Latin and modern Greek characters only might use the
	ISO 8859-7 encoding internally, in which lower-case lambda is
	represented as #xEB; but char->integer must still return #x03BB
	on that character.

	An ASCII-only Scheme satisfies this requirement automatically,
	provided it does not deliberately scramble the natural result.
	(EBCDIC-based Schemes already have ASCII conversion tables.)

	If the implementation supports non-Unicode characters (ones
	with bucky bits, e.g.), then char->integer must return an exact
	integer less than 0 or greater than #x10FFFF when applied to
	such characters.
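The distinction between internal encoding and scalar value in 1) can be sketched in Python (not Scheme, purely for illustration). The function names below are hypothetical; the only library feature assumed is Python's standard "iso8859_7" codec, which stands in for the hypothetical Latin/Greek implementation's internal representation.

```python
def is_unicode_scalar_value(n: int) -> bool:
    """True for 0..#xD7FF and #xE000..#x10FFFF; the surrogate range is excluded."""
    return 0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF


def char_to_integer_8859_7(byte: int) -> int:
    """Scalar value of a character stored internally as one ISO 8859-7 byte.

    Hypothetical helper: the internal byte is decoded first, so the result
    is the Unicode scalar value rather than the internal code.
    """
    return ord(bytes([byte]).decode("iso8859_7"))


# Lower-case lambda is #xEB internally, but #x03BB as a scalar value.
print(hex(char_to_integer_8859_7(0xEB)))
```

For the ASCII range the two numbers coincide, which is why an ASCII-only implementation needs no table at all.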

	2) The integer->char procedure, when applied to an exact integer
	that char->integer returns when applied to some character c,
	must return c.

	An ASCII-only Scheme also satisfies this requirement
	automatically, with the same proviso.
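One way to satisfy both the round-trip rule in 2) and the rule for non-Unicode characters in 1) is to put any extra bits above the 21 bits that a scalar value needs. The Python sketch below assumes exactly that layout; the Char class, the bucky field, and BUCKY_SHIFT are all invented for illustration and are not part of the proposal.

```python
from dataclasses import dataclass

BUCKY_SHIFT = 21  # assumption: #x10FFFF fits in 21 bits, so bucky bits go above


@dataclass(frozen=True)
class Char:
    scalar: int     # Unicode scalar value of the base character
    bucky: int = 0  # hypothetical modifier bits; 0 means a plain Unicode character


def char_to_integer(c: Char) -> int:
    # Plain characters map to their scalar value; any nonzero bucky bits
    # push the result above #x10FFFF.
    return (c.bucky << BUCKY_SHIFT) | c.scalar


def integer_to_char(n: int) -> Char:
    if n < 0:
        raise ValueError("no character has a negative code in this sketch")
    bucky, scalar = n >> BUCKY_SHIFT, n & ((1 << BUCKY_SHIFT) - 1)
    if not (0 <= scalar <= 0xD7FF or 0xE000 <= scalar <= 0x10FFFF):
        raise ValueError("not the code of any character")
    return Char(scalar, bucky)
```

With this layout, integer->char inverts char->integer for every character, with or without bucky bits.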

	3) The char-downcase procedure, given an argument that forms the
	uppercase part of a Unicode upper/lower-case pair, must return
	the lowercase member of the pair, provided that the character
	is supported by the Scheme implementation.  Turkic casing pairs
	are ignored.  If the argument is not the uppercase part of such
	a pair, it is returned unchanged.

	4) The char-upcase procedure works the same way, mutatis mutandis.
	Note that many Unicode lowercase characters don't have uppercase
	equivalents.
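A rough Python approximation of 3) and 4), for illustration only. Python's str.lower and str.upper apply the full mappings, so this sketch keeps a result only when it is a single character and otherwise returns the argument unchanged; that is an assumption of the sketch, not a substitute for the Unicode simple case-mapping tables.

```python
def char_downcase(c: str) -> str:
    """Lowercase partner of c, or c itself if there is no one-character partner."""
    lowered = c.lower()
    return lowered if len(lowered) == 1 else c


def char_upcase(c: str) -> str:
    """Uppercase partner of c, or c itself if there is no one-character partner."""
    raised = c.upper()
    return raised if len(raised) == 1 else c


print(char_downcase("\u039b"))  # Greek capital lambda
print(char_upcase("\u00df"))    # sharp s: no single-character uppercase
```

The sharp s illustrates the note in 4): its full uppercase mapping is two characters, so a character-level procedure has nothing to return but the argument.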

	5) The char-foldcase procedure applies the Unicode simple
	case-folding algorithm to its argument, ignoring the Turkic
	mappings.  Mappings that don't accept or don't produce single
	characters are ignored.

	In an ASCII-only Scheme, this is equivalent to the char-downcase
	procedure.  This procedure is an extension to R5RS.

	6) The char-ci* procedures behave as if char-foldcase were
	applied to their arguments before calling the respective non-ci
	procedures.
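Items 5) and 6) can be approximated in Python as follows. Python only exposes full case folding (str.casefold), so the sketch discards any folding that does not produce a single character; this is an approximation of simple case folding, and the function names are hypothetical.

```python
def char_foldcase(c: str) -> str:
    """Approximate simple case folding: keep only one-character results."""
    folded = c.casefold()
    return folded if len(folded) == 1 else c


def char_ci_eq(a: str, b: str) -> bool:
    """char-ci=? as described in 6): fold both sides, then compare."""
    return char_foldcase(a) == char_foldcase(b)


# Capital sigma and small final sigma both fold to small sigma.
print(char_ci_eq("\u03a3", "\u03c2"))
```

Folding differs from downcasing here: small final sigma is already lower case, yet it still folds to small sigma, which is what makes the comparison succeed.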

	7) The procedures char-{alphabetic,numeric,whitespace,upper-case,
	lower-case}? return #t if their arguments have the Unicode
	properties Alphabetic, Numeric, White_Space, Uppercase, or
	Lowercase respectively.  Note that many alphabetic characters
	(though no ASCII ones) are neither upper nor lower case.
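The closing note in 7) is easy to observe with Python's string predicates. They are used here only as stand-ins: Python's definitions are close to, but not guaranteed identical with, the Unicode properties named above (str.isalpha, for instance, is defined by general category rather than by the Alphabetic property). The classify helper is hypothetical.

```python
import unicodedata


def classify(c: str) -> dict:
    """Report Python's nearest equivalents of the predicates in 7)."""
    return {
        "category": unicodedata.category(c),
        "alphabetic": c.isalpha(),
        "numeric": c.isnumeric(),
        "whitespace": c.isspace(),
        "upper": c.isupper(),
        "lower": c.islower(),
    }


# Hebrew alef is alphabetic but has no case at all.
print(classify("\u05d0"))
```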

	8) The string-downcase procedure applies the Unicode full
	lowercasing algorithm to its argument.  This may cause the
	result to differ in length from the argument.  What is more,
	some characters have case-mappings that depend on the surrounding
	context.  For example, Greek capital sigma normally downcases
	to Greek small sigma, but at the end of a word it downcases to
	Greek small final sigma instead.

	For an ASCII-only Scheme, string-downcase is a straightforward
	application of map to char-downcase.
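Both behaviours described in 8) can be seen in Python, whose str.lower implements full lowercasing including the final-sigma rule. The sketch is illustrative only; the ASCII-only variant and its helper are hypothetical names.

```python
def string_downcase(s: str) -> str:
    """Full, context-sensitive lowercasing (Python's str.lower)."""
    return s.lower()


def ascii_char_downcase(c: str) -> str:
    """char-downcase for an ASCII-only repertoire."""
    return chr(ord(c) + 32) if "A" <= c <= "Z" else c


def ascii_string_downcase(s: str) -> str:
    """The ASCII-only case: map char-downcase over the string."""
    return "".join(map(ascii_char_downcase, s))


word = "\u039f\u0394\u039f\u03a3"        # Greek word ending in capital sigma
print(string_downcase(word))             # ends in small final sigma
print(len(string_downcase("\u0130")))    # capital I with dot above changes length
```

A per-character map cannot reproduce either effect, since it sees neither the word boundary nor a way to emit two characters for one.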

	9) The string-upcase and string-foldcase procedures apply the
	Unicode full uppercasing and case-folding algorithms,
	respectively, with the same provisos.
	String-foldcase is an extension to R5RS.

	For an ASCII-only Scheme, these procedures are a straightforward
	application of map to char-upcase and char-downcase, respectively.
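A short Python illustration of 9), using str.upper and str.casefold as stand-ins for string-upcase and string-foldcase. The wrapper names are hypothetical.

```python
def string_upcase(s: str) -> str:
    """Full uppercasing (Python's str.upper)."""
    return s.upper()


def string_foldcase(s: str) -> str:
    """Full case folding (Python's str.casefold)."""
    return s.casefold()


print(string_upcase("stra\u00dfe"))    # sharp s expands to two characters
print(string_foldcase("Stra\u00dfe"))  # folding expands it as well
```

Unlike lowercasing, folding is context-free: a capital sigma folds to small sigma even at the end of a word.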

	10) The string-ci* procedures act as if they applied
	string-foldcase to their arguments before calling the non-ci
	versions.

	For an ASCII-only Scheme, this amounts to calling either
	char-downcase or char-upcase on each character of each string.
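The "as if" rule in 10) amounts to folding both arguments and delegating to the case-sensitive comparison. A minimal Python sketch, with str.casefold assumed as the stand-in for string-foldcase and hypothetical function names:

```python
def string_ci_eq(a: str, b: str) -> bool:
    """string-ci=? : fold both strings, then compare for equality."""
    return a.casefold() == b.casefold()


def string_ci_lt(a: str, b: str) -> bool:
    """string-ci<? : fold both strings, then compare code point by code point."""
    return a.casefold() < b.casefold()


print(string_ci_eq("Stra\u00dfe", "STRASSE"))
```

Because folding may change string lengths, two strings of different lengths can compare equal, which a per-character comparison would never report.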

	11) In addition to the identifier characters of the ASCII
	repertoire specified by R5RS, Scheme implementations may permit
	any additional repertoire of Unicode characters to be employed in
	identifiers, provided that each character has a Unicode general
	category of Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc,
	Po, Sc, Sm, Sk, So, or Co.  No non-Unicode characters may be
	used in identifiers.
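The category test in 11) is a simple set lookup. The Python sketch below uses unicodedata.category from the standard library; the function name is hypothetical, and ASCII characters are deliberately excluded because R5RS already decides those.

```python
import unicodedata

# General categories listed in 11).
ALLOWED_CATEGORIES = frozenset(
    "Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pd Pc Po Sc Sm Sk So Co".split()
)


def may_extend_identifiers(c: str) -> bool:
    """True if a non-ASCII character is eligible under rule 11)."""
    return ord(c) > 0x7F and unicodedata.category(c) in ALLOWED_CATEGORIES


print(may_extend_identifiers("\u03bb"))  # Greek small lambda, category Ll
print(may_extend_identifiers("\u00a0"))  # no-break space, category Zs
```

Note what the list leaves out: separators (Zs, Zl, Zp), opening and closing punctuation (Ps, Pe, Pi, Pf), control and format characters, and unassigned code points.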

	12) All Scheme implementations shall permit the sequence
	"\x<hexdigits>;" to appear in Scheme identifiers.  If the
	character with the given Unicode scalar value is supported
	by the implementation, this sequence must be replaced
	by the corresponding character; if not, it is left alone.

	As a result, symbol->string does not produce the same string on
	all implementations.  For example, the hypothetical Latin and
	Greek implementation described under 1) would have
	(symbol->string '\x3BB;) produce a one-character string, whereas
	an ASCII-only Scheme would produce a six-character string.
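The replacement rule in 12), and the one-character versus six-character outcome, can be sketched in Python. Everything here is hypothetical scaffolding: the regular expression, the function name, and the supported predicate that stands in for an implementation's repertoire.

```python
import re
from typing import Callable

# The sequence \x<hexdigits>; as it appears in identifier text.
ESCAPE = re.compile(r"\\x([0-9A-Fa-f]+);")


def expand_escapes(text: str, supported: Callable[[int], bool]) -> str:
    """Replace each escape whose character is supported; leave the rest alone."""
    def replace(match: re.Match) -> str:
        n = int(match.group(1), 16)
        is_scalar = 0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF
        return chr(n) if is_scalar and supported(n) else match.group(0)
    return ESCAPE.sub(replace, text)


greek_capable = lambda n: n < 0x80 or 0x370 <= n <= 0x3FF  # toy repertoire
ascii_only = lambda n: n < 0x80

print(len(expand_escapes(r"\x3BB;", greek_capable)))
print(len(expand_escapes(r"\x3BB;", ascii_only)))
```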

	I believe this to be tolerable, given that existing R5RS
	implementations may return "Foo", "FOO", or "foo" as the value
	of (symbol->string 'Foo); the first of these is technically not
	R5RS-compliant, but is very common anyway.

-- 
John Cowan    cowan at ccil.org    http://ccil.org/~cowan
Heckler: "Go on, Al, tell 'em all you know.  It won't take long."
Al Smith: "I'll tell 'em all we *both* know.  It won't take any longer."



More information about the r6rs-discuss mailing list