hdb(5) hdb hdb(5)
NAME
hdb - hash database (hdb32) file format
DESCRIPTION
An hdb file is implemented as a partitioned hash table. It provides a
set of mappings for record ``keys'' to record ``values'', where keys
and records are any arbitrary strings of bytes.
An hdb file is implemented as a ``write-once, read-many'' datastore.
It is generally used by applications requiring fast read access to
``constant'' data, and is modeled after the cdb(5) constant database
specification.
A hdb file is comprised of six sections:
I M P C R H
Where:
I: identifier
A file type identifier of 16 bytes. The identifier for files
conforming to this specification is ``hdb32/1.0\0\0\0\0\0\0\0''.
M: metadata
File metadata of 8 bytes consisting of: 4 bytes nrecs total
records in database; and 4 bytes offset to start of record sec-
tion R.
P: hash subtable pointers
An array of 8 pointer elements providing a ``table of contents''
for the 8 hash subtables in section H. Each element P[ntab] is
eight bytes: 4 bytes number of slots in hash subtable ntab; 4
bytes offset to start of hash subtable ntab. A subtable may
have zero slots.
C: descriptive comment
An optional comment of any length, normally used to describe the
dataset.
R: records
The sequential collection of nrecs records in the database.
Storage for each record R[nrec] is a concatenation of: 3 bytes
key length; 3 bytes value length; record key; and record value.
H: hash tables
An array of 8 hash subtables. Each subtable H[ntab] is com-
prised of P[ntab].tslots slots. Each slot is eight bytes: 4
bytes hash value; 4 bytes offset to associated record in section
R. An offset value of 0 indicates an empty slot, and no associ-
ated record exists.
Hash values are calculated for any given byte string key of length klen
according to the following algorithm:
uint32_t
hdb32_hash(unsigned char *key, uint32_t klen)
{
uint32_t h = 0;
while(klen){
h = h ^ *key;
h = h * 37;
++key; --klen;
}
return h;
}
The subtable ntab for a given key is calculated from the relation:
ntab = (hash % 8).
The subtable offset and number of subtable slots is read from P[ntab].
The target slot for a given key in a subtable is computed from:
target = ((HMIX(hash) >> 3) % tslots)
where tslots is the number of slots in the subtable as read from
P[ntab], and HMIX(hash) is a bit operation defined as:
(hash >> 13) ^ hash
A record is located according to an open-addressing hash scheme as fol-
lows. First compute the hash value, subtable, and target slot for the
record key. Begin probing in subtable H[ntab], starting with the tar-
get slot, then the next slot, and so forth, while checking at each step
in section R for a matching key. Continue until either the record with
matching key is found, or until an empty slot is encountered.
All offset and hash values are stored as 32-bit unsigned integers
packed into a portable, little-endian form, 4 bytes per value. Length
values for keys and values are stored as 24-bit unsigned integers,
packed i