hash database (hdb32) multitool


I. Basic Operations

The distribution contains a plain text datafile of US postal codes
in ./data/zipcodes:

  $ head ./data/zipcodes
  # Zip Code : Location
  #       $NetBSD: zipcodes,v 1.6 2003/05/11 01:55:18 wiz Exp $
  #       @(#)zipcodes    8.1 (Berkeley) 6/8/93
  00401:Pleasantville, NY
  00501:Holtsville, NY
  00544:Holtsville, NY
  00601:Adjuntas, PR
  00602:Aguada, PR
  00603:Aguadilla, PR
  00604:Aguadilla, PR

Inspection of the file shows that it is directly parseable by hdb.
Just use the "getline" input format specifying a `:' (colon)
delimiter between key and value (this format also ignores the
commented lines beginning with `#'):

  $ hdb make -c "US Zippers" -d: zipcodes.hdb <./data/zipcodes

The -c option adds a descriptive comment to the hdb file (purely
informational).
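
For a feel of what this input format amounts to, here is a rough
sketch using only standard tools.  It is not hdb's parser: the
function name parse_records is made up for illustration, and
splitting on the first colon only is an assumption.

```shell
#!/bin/sh
# Sketch only: mimics colon-delimited parsing with awk.
# Assumption: a record is split at its FIRST `:'; hdb may differ.

# parse_records FILE: print "key<TAB>value" for each non-comment line.
parse_records() {
    awk '
        /^#/ { next }                  # skip comment lines
        {
            i = index($0, ":")         # position of the first colon
            if (i == 0) next           # no delimiter: not a record
            printf "%s\t%s\n", substr($0, 1, i - 1), substr($0, i + 1)
        }
    ' "$1"
}

# Sample input copied from the head of ./data/zipcodes.
sample=$(mktemp)
cat >"$sample" <<'EOF'
# Zip Code : Location
00401:Pleasantville, NY
00501:Holtsville, NY
EOF

parse_records "$sample"
rm -f "$sample"
```

The comment line is dropped and each remaining line comes out as a
tab-separated key and value.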

In this kind of database, it wouldn't do at all if duplicate keys
existed.  Each zipcode should appear once and only once, or who knows
where your mail could end up.  To check for a unique key constraint on
the database, use the "dupes" operation:

  $ hdb dupes -q zipcodes.hdb && echo "no dupes"
  no dupes
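
The same constraint can be sanity-checked against the plaintext
source before building anything.  This is a sketch with standard
tools, not a substitute for "hdb dupes"; the function name dup_keys
is hypothetical, and the duplicated record in the sample is made up
to trigger the check.

```shell
#!/bin/sh
# Sketch only: find duplicate keys in a "key:value" text file.

# dup_keys FILE: print each key that occurs more than once.
dup_keys() {
    grep -v '^#' "$1" |   # drop comment lines
        cut -d: -f1 |     # keep the key field
        sort |            # group identical keys together
        uniq -d           # print only keys seen more than once
}

sample=$(mktemp)
cat >"$sample" <<'EOF'
# Zip Code : Location
00501:Holtsville, NY
00544:Holtsville, NY
00501:Made-Up Duplicate, NY
EOF

dup_keys "$sample"
rm -f "$sample"
```

Empty output means every key is unique.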

Now perform a key lookup on the postal code "90210":

  $ hdb get 90210 zipcodes.hdb
  Beverly Hills, CA

Wasn't that some TV series?

Now search for all record values beginning with "bone", case
insensitive (the output shown is in the default format):

  $ hdb match -i "bone*" zipcodes.hdb
  5:13    30806:Boneville, GA
  5:13    57317:Bonesteel, SD
  5:12    62815:Bone Gap, IL

Nuts, I was trying to get a Boner.

Repeat the match query, this time with an equivalent regular
expression and the grep command.  Note the explicit anchoring in
this regular expression compared to the implicit anchoring in the
wildcard expression above.  Pipe the results directly into
another hdb:

  $ hdb grep -i "^bone" zipcodes.hdb | hdb make -c "Dem Bones" bonez.hdb
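
The anchoring difference can be seen with plain shell tools as well.
The sketch below runs a shell glob (anchored at both ends, like the
wildcard above) and a grep regular expression (anchored only where
you say so) over the same "key:value" lines.  The function names
glob_match and regex_match are made up, and nothing here claims to
reproduce hdb's own matcher.

```shell
#!/bin/sh
# Sketch only: glob vs. regular expression on record values.
# Both functions read "key:value" lines on stdin and print the lines
# whose value begins with "bone", ignoring case.

glob_match() {
    while IFS= read -r line; do
        # value = text after the first colon, folded to lower case
        value=$(printf '%s' "${line#*:}" | tr '[:upper:]' '[:lower:]')
        case $value in
            bone*) printf '%s\n' "$line" ;;   # glob must match whole value
        esac
    done
}

regex_match() {
    # ^[^:]*: skips the key; "bone" must then start the value
    grep -i '^[^:]*:bone'
}

records='30806:Boneville, GA
90210:Beverly Hills, CA
57317:Bonesteel, SD
62815:Bone Gap, IL'

printf '%s\n' "$records" | glob_match
printf '%s\n' "$records" | regex_match
```

Both calls print the same three records.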

Check the new hdb file with a dump (with the -y option, the output
is in cdb "legacy" format):

  $ hdb dump -y bonez.hdb
  +5,13:30806->Boneville, GA
  +5,13:57317->Bonesteel, SD
  +5,12:62815->Bone Gap, IL

Nope, still no Boner.

II. Set Operations

Everyone knows the zipcode for "The Dalles", Oregon, is 97058, right?

  $ hdb get 97058 <zipcodes.hdb
  The Dalles, OR

Aside: what kind of town starts its name with "The"?

Anyway, Google has one of their server farms located in The Dalles,
sucking massive amounts of hydroelectric energy off the mighty Columbia
River nearby.  Let's suppose the town gets hit by a Google-Bomb,
and is completely obliterated in a puff of virtual smoke.

Sadly, then, it becomes necessary to remove The Dalles zipcode from the
database.  One could open and edit the original plaintext source file,
then rebuild the hdb from the edited file.  Or one could use hdb set
operations directly on the command line as follows:

  $ echo "97058:" | hdb make -d: delete.hdb && \
  hdb purge zipcodes_new.hdb zipcodes.hdb delete.hdb

The "echo" command here is used simply to pipe in a single record
with key "97058" (and empty value) that hdb make will compile into
delete.hdb.  Then zipcodes.hdb and delete.hdb are combined with the
hdb purge command.  The result of the "purge" is zipcodes_new.hdb,
containing all the original records from zipcodes.hdb, except those
with matching keys found in delete.hdb.  For this example, then,
the "97058" record will be deleted from zipcodes_new.hdb:

  $ hdb get 97058 zipcodes_new.hdb || echo "not found"
  not found

The Dalles is history. 
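
In set terms, the purge is a difference on keys: everything in the
first operand whose key does not appear in the second.  Here is a
sketch of the same idea over plaintext "key:value" files with awk.
The function name purge_keys is hypothetical and this says nothing
about how hdb implements it.

```shell
#!/bin/sh
# Sketch only: key-based set difference on "key:value" text files.

# purge_keys SOURCE DELETE: print the records of SOURCE whose key
# does not appear in DELETE.
purge_keys() {
    awk -F: '
        FILENAME == ARGV[1] { drop[$1] = 1; next }  # DELETE: collect keys
        !($1 in drop)                               # SOURCE: keep the rest
    ' "$2" "$1"
}

src=$(mktemp)
del=$(mktemp)
printf '90210:Beverly Hills, CA\n97058:The Dalles, OR\n' >"$src"
printf '97058:\n' >"$del"

purge_keys "$src" "$del"
rm -f "$src" "$del"
```

Only the Beverly Hills record survives.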

The result file argument of a set operation command can safely name
either of the operand files.  The result file will then replace
the operand file:

  $ echo "97058:" | hdb make -d: delete.hdb && \
  hdb purge zipcodes.hdb zipcodes.hdb delete.hdb

This example is the same as above, except that now the record has,
effectively, been deleted from the zipcodes.hdb file directly:

  $ hdb get 97058 zipcodes.hdb || echo "not found"
  not found
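
How hdb makes this safe is not described here.  One common way to
get the same effect, assumed purely for illustration, is to write
the result to a temporary file next to the target and rename it into
place only on success.  The function name purge_in_place is made up,
and the sketch works on plaintext "key:value" files, not hdb files.

```shell
#!/bin/sh
# Sketch only: replace an operand file with the result of a purge.
# Assumption: write-to-temp-then-rename; hdb may do something else.

# purge_in_place SOURCE DELETE: remove DELETE's keys from SOURCE itself.
purge_in_place() {
    tmp=$(mktemp "$1.XXXXXX") || return 1
    if awk -F: '
            FILENAME == ARGV[1] { drop[$1] = 1; next }  # keys to delete
            !($1 in drop)                               # records to keep
        ' "$2" "$1" >"$tmp"
    then
        mv "$tmp" "$1"       # SOURCE is replaced only after a full write
    else
        rm -f "$tmp"         # on failure SOURCE is left untouched
        return 1
    fi
}

src=$(mktemp)
del=$(mktemp)
printf '90210:Beverly Hills, CA\n97058:The Dalles, OR\n' >"$src"
printf '97058:\n' >"$del"

purge_in_place "$src" "$del" && cat "$src"
rm -f "$src" "$del"
```

The source file is read in full before it is ever replaced, so
naming it as the result cannot clobber the input halfway through.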

Everyone knows Montana is a big state, and seems to attract more than
its fair share of whackjobs.  For a time, even Frank Zappa settled