[r6rs-discuss] Hash function return value constraints
per at bothner.com
Sat Apr 4 00:16:17 EDT 2009
On 04/03/2009 07:18 PM, Lynn Winebarger wrote:
> The problem is that the situation is actually reversed. The user may be
> writing the hash table implementation, but wants to use the Scheme
> implementation's hash functions to get hashes on arbitrary Scheme
> objects for efficiency. Obviously the tables the implementation
> provides can do whatever they want under the hood, but the hash
> functions provided by the implementation to the public should conform to
> the user's expectations.
> If the implementation's make-hashtable accepts hash functions that yield
> arbitrary fixnums, but it should work even if the hash function produces
> positive bignums. This question actually conforms to the "should" -
> it's ridiculous to require that the implementation detect whether the
> user-provided hash function always return a non-negative exact integer,
> so the text says "should".
There are two protocols you can specify for system hash functions:
(1) Pass in the table size as a parameter to the hash function:
(define (string-hash str table-size) ...)
In this case string-hash returns an integer in the range 0 (inclusive)
to table-size (exclusive), so it can be used directly as an index:
(2) Return an arbitrary integer in some larger range:
(define (string-hash str) ...)
In this case the caller is responsible for mapping the return
from string-hash to the table size, probably using modulo.
The problem with (1) is that it ties the hash-value to the
table-size, which makes some things more difficult. For example
you might want to store the pre-computed hash value, but then
you have to recalculate it if you change the table size. Also,
if you want to compare two table-tables you can use the hash values.
OTOH (2) works fine if the table implementation uses modulo to
map the result into the table size. And in fact it doesn't
really matter much whether the hash value is a signed fixnum
or an unsigned bignum. Regardless, you'll probably want to
restrict it to a finite range, and for performance you want to
OTOOH(*): Approach (2) may make it more difficult to write new hash
functions, unless you know the range of the hash values used by the
system, and that is system-dependent.
(*) On The Other Other Hand.
per at bothner.com http://per.bothner.com/
More information about the r6rs-discuss