[r6rs-discuss] Proposed NON-features for small Scheme, part 8: string-set! must die
brian at mastenbrook.net
Sat Sep 19 10:21:50 EDT 2009
On Sat, 19 Sep 2009 00:24:58 -0500, John Cowan <cowan at ccil.org> wrote:
> This is a proposal for the removal of string-set! (and consequently
> string-fill!) from the R7RS small Scheme language. I am publishing this
> document to invite wide comment. There is nothing official about it.
> I very gratefully acknowledge the kind help of Alex Shinn, who provided
> the topic sentences for most of the paragraphs below. However, I retain
> sole responsibility for this document, including all errors.
> I believe that despite the prescription of the draft WG1 charter that
> no features of IEEE Scheme (a subset of R4RS) should be removed from
> R7RS small Scheme, an exception should be made for string-set!, for at
> least the following reasons:
[Snipped a list of points, most of which I agree with]
> 4) As currently designed, strings are functionally just vectors of
> characters. In an 8-bit world, using the traditional representation
> of strings carries a 4:1 storage advantage, making it worthwhile
> to distinguish them clearly from general vectors. But 21-bit Unicode
> characters are a much better fit, if represented as immediate (unboxed)
> values, for general vectors using 32-bit pointers. Granted that not all
> small Scheme systems will provide full Unicode support, general vectors
> start to look much less expensive than they once were. In short: if
> you want something that behaves like a vector of characters, simply use
> a general vector that contains characters.
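For concreteness, this is roughly what point 4 suggests. It is a minimal sketch in R7RS-small, which postdates this thread. `string->vector', `vector-set!', `vector-fill!' (with its optional start index) and `vector->string' stand in for a mutable string, `string-set!' and `string-fill!'. The variable name `buf' is illustrative only.

```scheme
(import (scheme base) (scheme write))

;; A "mutable string" represented as a general vector of characters.
(define buf (string->vector "hello, world"))

;; Stands in for (string-set! s 0 #\H): ordinary vector mutation.
(vector-set! buf 0 #\H)

;; Stands in for filling part of a string: fill from index 7 to the end.
(vector-fill! buf #\* 7)

;; Convert back to a string when a string-accepting procedure needs one.
(display (vector->string buf))   ; prints: Hello, *****
(newline)
```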
I don't think there's any point in using a general vector of characters as
a replacement for mutable strings.
I'll add another reason for getting rid of mutable strings, as well as a
rejoinder to reason #4:
6) `string-set!' has no general utility without the ability to insert and
delete characters in a string as well. Programs that work with text
generally want either an immutable string or an editable string into which
characters can be inserted and deleted. Editable strings are typically
represented as gap buffers. It's possible to use mutable strings to build
up an editable string representation, but in my opinion this is not
evidence that fixed-length mutable strings are generally useful.
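Below is a minimal sketch of the gap-buffer representation mentioned in point 6. It assumes R7RS-small and stores text in a general vector of characters, so no `string-set!' is needed. All names (`make-gap-buffer', `gb-insert!', `gb-delete!', `gb->string' and so on) are hypothetical, not part of any standard library, and there is no bounds checking. Inserting or deleting at the gap is constant time apart from occasional growth. Moving the gap costs time proportional to the distance moved.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical gap buffer: text lives in DATA at [0, gap-start) and
;; [gap-end, capacity); the slots in between are free space.
(define-record-type gap-buffer
  (%make-gap-buffer data gap-start gap-end)
  gap-buffer?
  (data gb-data set-gb-data!)
  (gap-start gb-gap-start set-gb-gap-start!)
  (gap-end gb-gap-end set-gb-gap-end!))

(define (make-gap-buffer capacity)
  (%make-gap-buffer (make-vector capacity #\space) 0 capacity))

;; Number of characters of text (capacity minus gap size).
(define (gb-length gb)
  (- (vector-length (gb-data gb))
     (- (gb-gap-end gb) (gb-gap-start gb))))

;; Move the gap so that it begins at text position POS.
(define (gb-move-gap! gb pos)
  (let ((data (gb-data gb))
        (start (gb-gap-start gb))
        (end (gb-gap-end gb)))
    (cond ((< pos start)
           ;; Slide text [pos, start) up to sit just before the gap's end.
           (let ((n (- start pos)))
             (vector-copy! data (- end n) data pos start)
             (set-gb-gap-start! gb pos)
             (set-gb-gap-end! gb (- end n))))
          ((> pos start)
           ;; Slide the first (pos - start) characters after the gap down.
           (let ((n (- pos start)))
             (vector-copy! data start data end (+ end n))
             (set-gb-gap-start! gb pos)
             (set-gb-gap-end! gb (+ end n)))))))

;; Double the capacity (minimum 16) when the gap is used up.
(define (gb-grow! gb)
  (let* ((old (gb-data gb))
         (cap (vector-length old))
         (new-cap (max 16 (* 2 cap)))
         (new (make-vector new-cap #\space))
         (start (gb-gap-start gb))
         (end (gb-gap-end gb))
         (new-end (- new-cap (- cap end))))
    (vector-copy! new 0 old 0 start)
    (vector-copy! new new-end old end cap)
    (set-gb-data! gb new)
    (set-gb-gap-end! gb new-end)))

;; Insert character C at text position POS.
(define (gb-insert! gb pos c)
  (gb-move-gap! gb pos)
  (when (= (gb-gap-start gb) (gb-gap-end gb))
    (gb-grow! gb))
  (vector-set! (gb-data gb) (gb-gap-start gb) c)
  (set-gb-gap-start! gb (+ (gb-gap-start gb) 1)))

;; Insert every character of string S, starting at text position POS.
(define (gb-insert-string! gb pos s)
  (do ((i 0 (+ i 1)))
      ((= i (string-length s)))
    (gb-insert! gb (+ pos i) (string-ref s i))))

;; Delete COUNT characters starting at text position POS by widening the gap.
(define (gb-delete! gb pos count)
  (gb-move-gap! gb pos)
  (set-gb-gap-end! gb (+ (gb-gap-end gb) count)))

;; Materialize the text as an ordinary string.
(define (gb->string gb)
  (let ((data (gb-data gb)))
    (string-append (vector->string data 0 (gb-gap-start gb))
                   (vector->string data (gb-gap-end gb) (vector-length data)))))

(define buf (make-gap-buffer 4))
(gb-insert-string! buf 0 "hello world")
(gb-delete! buf 5 6)              ; remove " world"
(gb-insert-string! buf 0 "oh, ")
(display (gb->string buf))        ; prints: oh, hello
(newline)
```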
If the core language provides only mutable strings, nothing can give you
immutable strings after the fact. On
the other hand, if the core Scheme strings were immutable and an editable
strings library were provided, existing users of mutable strings could
convert to using the latter representation with little effort, and writing
programs which require an editable strings representation would be vastly
simplified.
It's much easier for an implementation that uses a variable-width internal
string representation to provide immutable strings and editable strings
than to provide only mutable fixed-length strings, since in a
variable-width encoding a `string-set!' that changes a character's encoded
width must shift every character after it. When represented as a gap
buffer, editable strings retain the 4-to-1 or 8-to-1 compactness
advantage of strings over general vectors of characters. The
implementation of editable strings is not significantly more complex than
the implementation of mutable fixed-length strings. Complex algorithms
expressed in terms of `string-set!' can be rewritten in terms of insert
and delete operations with a great increase in clarity.
> As a consequence of removing string-set!, string-fill! (not in IEEE
> Scheme) becomes impossible and string-copy less useful. I do not propose
> to remove string-copy, however, because it can eliminate space leaks
> that are caused by taking a small shared substring of a large existing
> string: when the larger string should be GC'ed, it is retained as a
> whole because of the shared substring. Using string-copy judiciously
> can prevent such leaks.
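Here is a minimal R7RS-small sketch that makes the space leak concrete using a shared-substring representation. The `string-view' record and its accessors are hypothetical, not any implementation's actual substring representation. A view holds the whole base string, so the base stays reachable as long as the view does. Copying out just the visible characters with R7RS `string-copy' yields a string that no longer references the base.

```scheme
(import (scheme base) (scheme write))

;; Hypothetical shared-substring representation: a view holds a pointer to
;; the whole base string plus a [start, end) range into it.
(define-record-type string-view
  (make-string-view base start end)
  string-view?
  (base view-base)
  (start view-start)
  (end view-end))

(define (view-length v) (- (view-end v) (view-start v)))

(define (view-ref v i)
  (string-ref (view-base v) (+ (view-start v) i)))

;; Copying out only the visible characters yields a fresh string that does
;; not reference the base, so the base can be reclaimed once views are gone.
(define (view->fresh-string v)
  (string-copy (view-base v) (view-start v) (view-end v)))

;; A large string and a five-character view into it. Even if nothing else
;; referred to BIG, keeping WORD alive would keep all of BIG reachable.
(define big (make-string 1000000 #\x))
(define word (make-string-view big 10 15))

(define small (view->fresh-string word))
(display (string-length small))   ; prints: 5
(newline)
```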
I shouldn't have to use `string-copy' for this; my implementation should
do it for me. If there's no user-exposed backpointer from the substring to
the original string, the GC can copy out any displaced substrings still in
use and then dispose of the original string when it makes sense to do so.
Implementations which don't want to implement this level of GC complexity
can make `substring' always copy. There's really no point in providing a
copy operation for an immutable type.
brian at mastenbrook.net