cdb(5) ezcdb cdb(5) NAME cdb - constant database file format DESCRIPTION A cdb file is implemented as a partitioned hash table. It provides a set of mappings for record ``keys'' to record ``values'', where keys and records are any arbitrary strings of bytes. A cdb file is comprised of three sections: P R H Where: P[0..255] An array of 256 pointer elements providing a ``table of con- tents'' for the 256 hash subtables in section H. Each element P[ntab] is eight bytes: 4 bytes offset to start of hash subtable ntab; 4 bytes number of slots in hash subtable ntab. A subtable may have zero slots. R[0..(nrecs - 1)] The sequential collection of nrecs records in the database. Storage for each record R[nrec] is a concatenation of: 4 bytes key length; 4 bytes value length; record key; and record value. H[0..255][0..(tslots - 1)] An array of 256 hash subtables. Each subtable H[ntab] is com- prised of P[ntab].tslots slots. Each slot is eight bytes: 4 bytes hash value; 4 bytes offset to associated record in section R. An offset value of 0 indicates an empty slot, and no associ- ated record exists. Hash values are calculated for any given byte string key of length klen according to the following algorithm: uint32_t cdb_hash(unsigned char *key, uint32_t klen) { uint32_t h = 5381; while(klen){ h = h * 33; h = h ^ *key; ++key; --klen; } return h; } The subtable ntab for a given key is calculated from the relation: ntab = (hash % 256). The subtable offset and number of subtable slots is read from P[ntab]. The target slot for a given key in a subtable is computed from: target = ((hash >> 8) % tslots) where tslots is the number of slots in the subtable as read from P[ntab]. A record is located according to an open-addressing hash scheme as fol- lows. First compute the hash value, subtable, and target slot for the record key. Begin probing in subtable H[ntab], starting with the tar- get slot, then the next slot, and so forth, while checking at each step in section R for a matching key. Continue until either the record with matching key is found, or until an empty slot is encountered. All offset, length, and hash values are stored as 32-bit unsigned inte- gers packed into a portable, little-endian form, 4 bytes per value. There is no particular constraint on sizes of keys or values, or number of records in a database, except for the range limits inherent in 32-bit integers. Consequently, a cdb file may not exceed 2^32-1 bytes (4 gigabytes). NOTES The cdb hash function is simple, fast, and readily factored into basic machine instructions. When the function is considered on its own mer- its, outside of the cdb context, the distribution results are usually neither spectacular nor tragic. In particular -- and common to all hash functions of this type -- the result is strongly patterned by the last byte of the key. For example, consider the following keys and cdb hash values from an airport codes database: ABJ 0x0b87b6ac ABK 0x0b87b6ad ABL 0x0b87b6aa ABM 0x0b87b6ab As can be seen here, the hash value for each of these keys di