lord at emf.net
Sat Mar 24 20:28:14 EDT 2007
>>> == Per Bothner wrote three back in the thread:
>> == Thomas Lord wrote two back in the thread:
> == Per Bothner wrote one back in the thread:
>> Also, would you have your markers include an
>> operation like (MOVE-MARKER! M N) that changes
>> the position of a marker by some arbitrary integer
>> N? If not, the core string API would penalize operations
>> that /do/ have efficient random access to strings.
>> If so, then you have a simple operation which is
>> O(N) in many popular representations.
>simple != useful. Show me a use for MOVE_MARKER for values
>other than 1 and -1, and then maybe I'll care.
By which I take you to mean I should show you uses for random
access into strings? You are talking about using
markers instead of integer string indexes, right?
Fast pattern matching. Fast search of texts displaying
various kinds of sorting. Fast fingerprinting. Quite
a few more, I'm sure, even though those already broad
(I continue to question the wisdom of regarding even
flat, immutable, 8-bit-char strings as O(1) in any
situation where we aren't either talking about very
short strings or are assuming very high multiplicative
constants that hide the memory hierarchy, but I guess
that's a separate issue.)
>>> I've expressed my preference: introduce a "marker" type,
>>> where a marker is a position in a string or a "buffer".
>>> Simple operations (e.g. get next characater, set next character,
>>> or move position) are O(1) regardless of the reresentation used.
>> I'm not so sure about the O(1) claim.
>> It should be easy to see that O(1) is unlikely in the
>> case where mutations are permitted to change the
>> STRING-LENGTH of the string both because
>> of the need to copy string data and the need to
>> update other markers.
>The word I left out is "average" or "amortized".
>Assume modifications tend to be clumped in the same region,
>most commonly the end of the string. Then use a simple
>buffer-gap representation, growing the buffer by a
>fixed factor (say 2) when needed.
>I will admit that efficiently managing the markers leads to
>various non-trivial options. However, if the marker object
>contains a raw offset into the buffer, then it only has to be
>modified when the gap is moved or the buffer resized,
>in which case even just straightforward looping through all
>the markers still keeps the cost of updates O(1) on average,
>if we assume the number of markers is less than O(N).
With that set of assumptions (high locality of mutations and
a 25% waste of space) you've leant support to your formal
proposal for STRING-APPEND! but not any for markers.
On the contrary, you should be happy with a STRING-REPLACE!
that replaces an arbitrary substring, not necessarily preserving
length. Your implementation is free to use gap buffers
internally and, if your use assumptions hold for a given
application, it will perform swell -- no markers needed.
Maybe you want a TAPE? type? :-)
More information about the r6rs-discuss