[r6rs-discuss] Comparing Strings

MichaelL at frogware.com
Wed Feb 14 12:00:39 EST 2007


The C runtime library has two string comparison functions, strcmp and 
strcoll. strcmp is not locale-aware while strcoll is. Some implementations 
add case-insensitive variants of those functions, stricmp and stricoll.
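
As a rough illustration of the strcmp/strcoll split, here is a small sketch 
in Python rather than C. locale.strcoll and locale.strxfrm are Python's 
wrappers over the C collation functions; cmp_codepoints is a hypothetical 
helper standing in for strcmp. The locale is pinned to "C" only so that the 
result is reproducible; under any other locale the collated order may differ.

```python
import locale


def cmp_codepoints(a: str, b: str) -> int:
    """strcmp-like: compare by code point, ignoring any locale."""
    return (a > b) - (a < b)


# Pin the collation locale to "C" so the result is reproducible. Under "C",
# collation degenerates to the same ordering as a code-point comparison.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["apple", "Zebra", "banana"]

# Plain comparison: every uppercase ASCII letter sorts before any lowercase one.
by_codepoint = sorted(words)

# Locale-aware comparison: strxfrm builds a sort key that follows LC_COLLATE.
by_collation = sorted(words, key=locale.strxfrm)
```

With a natural-language locale installed and selected, by_collation would 
typically place "Zebra" last, while by_codepoint always places it first.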

In R6RS the case-sensitive string comparison functions look like strcmp 
while the case-insensitive comparison functions look somewhat like 
stricoll. (Not really, but I'll get to that.) If that's true there's a 
funny and unexpected asymmetry between the two sets of functions. 

I believe the main issue is that strings use a different case folding 
algorithm than characters do. From the specification:

        (char-upcase #\ß) ===> #\ß

        (string-upcase "Straße") ===> "STRASSE"

        (string-ci=? "Straße" "Strasse") ===> #t
        (string-ci=? "Straße" "STRASSE") ===> #t

I would expect the stricmp-equivalent variants of string comparison to use 
the algorithm that characters use rather than the one they currently use.
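
To make the two algorithms concrete, here is a minimal sketch in Python (not 
Scheme). It contrasts a character-by-character comparison with a 
whole-string one. char_upcase, ci_equal_charwise and ci_equal_full are 
hypothetical names, and char_upcase only approximates a one-to-one character 
mapping: it leaves a character alone whenever its uppercase form is longer 
than one character.

```python
def char_upcase(c: str) -> str:
    """Approximate a one-to-one character mapping: return the character
    unchanged when its uppercase form is not a single character."""
    u = c.upper()
    return u if len(u) == 1 else c


def ci_equal_charwise(a: str, b: str) -> bool:
    """stricmp-like: compare position by position after one-to-one upcasing."""
    return len(a) == len(b) and all(
        char_upcase(x) == char_upcase(y) for x, y in zip(a, b)
    )


def ci_equal_full(a: str, b: str) -> bool:
    """Compare after full-string case folding, which may change the length."""
    return a.casefold() == b.casefold()
```

Under these assumptions ci_equal_charwise("Straße", "Strasse") is false, 
because the strings differ in length, while ci_equal_full("Straße", 
"Strasse") is true, matching the string-ci=? results quoted above.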

The current functions may be useful, but I believe that a) there's a 
missing set of functions and b) the existing set of functions, if they 
remain, should have names different from the ones they have now. (They 
represent a different concept, rather like strcmp and strcoll do.)

In the end, are the existing functions *really* useful? Honestly, I can't 
think how. The string-upcase example is cute, but the case-insensitive 
comparison functions that use it are useless for any serious work. They 
have a semblance of locale awareness, but they aren't locale aware and 
that fact would show through rather quickly. In fact, it even shows 
through in case-folding: (string-downcase "STRASSE") ===> "strasse", not 
"straße". Whether that's right or wrong depends on where you are.
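
The lossy round trip can be sketched in a few lines of Python (again only 
as an analogy to the Scheme functions; the variable names are illustrative):

```python
original = "Straße"

# Full case mapping expands the single character ß into the two letters SS.
upper = original.upper()

# Lowercasing cannot know that SS came from ß, so it yields plain "ss".
round_trip = upper.lower()

# True when upcasing followed by downcasing does not restore the lowercase form.
lossy = round_trip != original.lower()
```

Nothing in the uppercased string records which spelling it started from, so 
picking "straße" over "strasse" would need knowledge from outside the string.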

I think it would make more sense for R6RS to define a full set of truly 
locale-aware functions and place them in a separate (and optional) 
library. In any event, I question the benefit of the functions as they're 
currently defined.

Comments?


