[r6rs-discuss] Strings

Per Bothner per at bothner.com
Sun Mar 25 16:58:45 EDT 2007


Thomas Lord wrote:
> By which I take you to mean I should show you uses for random
> access into strings?   You are talking about using
> markers instead of integer string indexes, right?

Right.

> Fast pattern matching.  Fast search of texts displaying
> various kinds of sorting.  Fast fingerprinting.  Quite
> a few more, I'm sure, even though those already broad
> categories.

True, though those are pretty esoteric, written as
specialized low-level libraries.

More importantly note that in most cases you don't need to
move by N "characters" - you need to move by N *code
units*, which would be an O(1) operation.  For fast
string matching/searching, as long as the subject
string and the pattern use the same encoding, you can
treat multi-code-unit characters equivalent to a sequence
of code units: Searching a sequence of characters is
equivalent to searching for a sequence of code units.

> With that set of assumptions (high locality of mutations and
> a 25% waste of space) you've leant support to your formal
> proposal for STRING-APPEND! but not any for markers.
> On the contrary, you should be happy with a STRING-REPLACE!
> that replaces an arbitrary substring, not necessarily preserving
> length.   Your implementation is free to use gap buffers
> internally and, if your use assumptions hold for a given
> application, it will perform swell -- no markers needed.

The problem is with the API for specifying the substring:
Assume the buffer contains UTF-8 or UTF-16 code units.
If the position is specified as a character (scalar value)
index then we need to scan (or use some kind of index) to
map that index into a buffer offset.  If the position is
specified as code unit offset, or a unspecified "magic
cookie" then we can index directly into the buffer.

An advantage of using opaque "marker" objects rather than
indexes is that it's a way to leave the representation
up to the implementation, so it can use whatever is most
efficient.
-- 
	--Per Bothner
per at bothner.com   http://per.bothner.com/



More information about the r6rs-discuss mailing list