[r6rs-discuss] Stateful codecs and inefficient transcoding
Marcin 'Qrczak' Kowalczyk
qrczak at knm.org.pl
Sun Nov 5 15:51:25 EST 2006
William D Clinger <will at ccs.neu.edu> writes:
> This message describes some simplifications to section 15.3 of the
> draft that, in my opinion, would satisfy the real requirements while
> enhancing the usability and efficiency of port i/o.
This is much better!
I support John Cowan's remarks, in particular separation between byte
ports and character ports.
Here is what I would change:
* Standardising the protocol between codecs, transcoders, and ports
would allow the set of transcoders to be extended portably.
This is harder than it might seem. I will try to port my design to
Efficiency need not suffer, because an implementation may take
shortcuts for built-in transcoders. Such shortcuts complicate the
implementation, though, so it would be better for the universal
protocol itself to be fast enough.
* There are two concepts:
1. A description of a translation of a sequence of bytes
or characters to another sequence of bytes or characters
2. A pair of such descriptions which are supposed to be close
to each other's inverses.
The current proposal uses the second concept only, in a few variants
(transcoders, which are composed from codecs and newline converters).
But in reality they don't necessarily come in pairs. For example, a
newline converter that accepts several conventions makes sense on
input but not on output. A compressor often has settable parameters,
but the corresponding decompressor reads the parameters from the
stream; they are not specified
In my design the one-direction transcoder is the more fundamental
concept. Character encodings have names for convenience; there is a
mapping between encoding names and encoders, and between encoding
names and decoders.
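
A minimal sketch of the one-direction idea, written in Python rather than Scheme purely for illustration. The names lookup_decoder, lookup_encoder and LenientNewlineDecoder are hypothetical and not part of any proposal; the name lookups simply reuse Python's standard codecs registry, which happens to keep separate encoder and decoder factories per encoding name. The newline converter exists only in the input direction, since accepting CR, LF and CRLF at once has no meaningful inverse.

```python
import codecs


def lookup_decoder(name):
    """Map an encoding name to a fresh bytes -> characters transcoder."""
    return codecs.getincrementaldecoder(name)()


def lookup_encoder(name):
    """Map an encoding name to a fresh characters -> bytes transcoder."""
    return codecs.getincrementalencoder(name)()


class LenientNewlineDecoder:
    """Input-only character transcoder: CR, LF and CRLF all become LF."""

    def __init__(self):
        # True when the previous chunk ended in CR and we do not yet know
        # whether an LF follows.
        self._pending_cr = False

    def feed(self, text, final=False):
        out = []
        for ch in text:
            if self._pending_cr:
                self._pending_cr = False
                out.append("\n")
                if ch == "\n":
                    continue  # the LF of a CRLF pair is already emitted
            if ch == "\r":
                self._pending_cr = True
            else:
                out.append(ch)
        if final and self._pending_cr:
            # End of data: a trailing lone CR is a complete line ending.
            self._pending_cr = False
            out.append("\n")
        return "".join(out)


# Compose two one-direction transcoders for input.
dec = lookup_decoder("utf-8")
nl = LenientNewlineDecoder()
text = nl.feed(dec.decode(b"one\r\ntwo\rthree\n", final=True), final=True)
```

An output pipeline would be built separately from lookup_encoder plus a newline converter that emits exactly one convention, rather than being derived from the input pipeline.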
> * To prevent interference between operations on the
> original port and operations on the port created by
> transcoded-port, the original port is closed when
> the derived port is created.
Such interference has a purpose: it is a way of mixing text and binary
i/o on the same stream, or of using multiple encodings (other than
extracting byte arrays and transcoding them separately).
Unfortunately this interference is delicate:
For output the transcoded stream must be flushed but not closed;
probably flushed in the sense of notifying the transcoder about end of
data (there are other modes of flushing when we consider compression).
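
The difference between flushing modes can be sketched with Python's zlib module, used here only as a stand-in for a stateful output transcoder; the variable names are illustrative. Z_SYNC_FLUSH pushes out everything written so far while keeping the compressed stream open, whereas Z_FINISH tells the compressor that the data has ended. In neither case is the underlying byte sink closed, so raw bytes can still follow.

```python
import io
import zlib

sink = io.BytesIO()          # stands in for the underlying byte port
comp = zlib.compressobj()    # stands in for a stateful output transcoder

# Mode 1: make the data so far decodable, but keep the stream going.
sink.write(comp.compress(b"first record\n"))
sink.write(comp.flush(zlib.Z_SYNC_FLUSH))
partial = zlib.decompressobj().decompress(sink.getvalue())

# Mode 2: notify the transcoder about end of data.
sink.write(comp.compress(b"second record\n"))
sink.write(comp.flush(zlib.Z_FINISH))

# The sink was flushed but never closed, so binary output can continue.
sink.write(b"\x00\x01RAW")

# A reader sees the compressed part end and the raw bytes left over.
reader = zlib.decompressobj()
decoded = reader.decompress(sink.getvalue())
trailing = reader.unused_data
```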
For input, if the length of the portion to be decoded is not known
beforehand, but is implied by the result of the decoding, then
decoding must be performed one character at a time. This is slow but
unavoidable. Buffering in the transcoder must somehow be turned off.
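
A sketch of the input case in Python, assuming a stream that holds one UTF-8 text line followed by binary data. The function read_line_then_stop is a hypothetical helper, not part of any proposal. Because the end of the text portion is known only once the newline has been decoded, the decoder is fed a single byte at a time so that it never consumes bytes belonging to the binary part.

```python
import codecs
import io


def read_line_then_stop(raw, encoding="utf-8"):
    """Decode characters up to and including the first newline.

    Reads exactly as many bytes from `raw` as the decoded text needs.
    If the stream ends first, whatever was decoded so far is returned
    (an incomplete trailing byte sequence is silently dropped here).
    """
    dec = codecs.getincrementaldecoder(encoding)()
    chars = []
    while True:
        b = raw.read(1)
        if not b:
            break                # end of stream before any newline
        ch = dec.decode(b)       # "" until a whole character has arrived
        if ch:
            chars.append(ch)
            if ch == "\n":
                break
    return "".join(chars)


raw = io.BytesIO("caf\u00e9\n".encode("utf-8") + b"\xff\xfe\x00binary")
header = read_line_then_stop(raw)
rest = raw.read()                # the binary part is untouched
```

A decoder that read a whole block from raw before decoding would have swallowed part of the binary data, which is why buffering has to be disabled for this pattern.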
I don't know of any good way of avoiding these complexities. Trying to
avoid them would either limit expressiveness or make transcoding slow
in the usual case.
__("< Marcin Kowalczyk
\__/ qrczak at knm.org.pl