hdb(5) hdb hdb(5) NAME hdb - hash database (hdb32) file format DESCRIPTION An hdb file is implemented as a partitioned hash table. It provides a set of mappings for record ``keys'' to record ``values'', where keys and records are any arbitrary strings of bytes. An hdb file is implemented as a ``write-once, read-many'' datastore. It is generally used by applications requiring fast read access to ``constant'' data, and is modeled after the cdb(5) constant database specification. A hdb file is comprised of six sections: I M P C R H Where: I: identifier A file type identifier of 16 bytes. The identifier for files conforming to this specification is ``hdb32/1.0\0\0\0\0\0\0\0''. M: metadata File metadata of 8 bytes consisting of: 4 bytes nrecs total records in database; and 4 bytes offset to start of record sec- tion R. P: hash subtable pointers An array of 8 pointer elements providing a ``table of contents'' for the 8 hash subtables in section H. Each element P[ntab] is eight bytes: 4 bytes number of slots in hash subtable ntab; 4 bytes offset to start of hash subtable ntab. A subtable may have zero slots. C: descriptive comment An optional comment of any length, normally used to describe the dataset. R: records The sequential collection of nrecs records in the database. Storage for each record R[nrec] is a concatenation of: 3 bytes key length; 3 bytes value length; record key; and record value. H: hash tables An array of 8 hash subtables. Each subtable H[ntab] is com- prised of P[ntab].tslots slots. Each slot is eight bytes: 4 bytes hash value; 4 bytes offset to associated record in section R. An offset value of 0 indicates an empty slot, and no associ- ated record exists. Hash values are calculated for any given byte string key of length klen according to the following algorithm: uint32_t hdb32_hash(unsigned char *key, uint32_t klen) { uint32_t h = 0; while(klen){ h = h ^ *key; h = h * 37; ++key; --klen; } return h; } The subtable ntab for a given key is calculated from the relation: ntab = (hash % 8). The subtable offset and number of subtable slots is read from P[ntab]. The target slot for a given key in a subtable is computed from: target = ((HMIX(hash) >> 3) % tslots) where tslots is the number of slots in the subtable as read from P[ntab], and HMIX(hash) is a bit operation defined as: (hash >> 13) ^ hash A record is located according to an open-addressing hash scheme as fol- lows. First compute the hash value, subtable, and target slot for the record key. Begin probing in subtable H[ntab], starting with the tar- get slot, then the next slot, and so forth, while checking at each step in section R for a matching key. Continue until either the record with matching key is found, or until an empty slot is encountered. All offset and hash values are stored as 32-bit unsigned integers packed into a portable, little-endian form, 4 bytes per value. Length values for keys and values are stored as 24-bit unsigned integers, packed i