ezcdb(1)                             ezcdb                            ezcdb(1)



NAME
       ezcdb - constant database (cdb) multitool

SYNOPSIS
       ezcdb [-hV] command opts args

       ezcdb cross [-h] [-p perms ] [-t tmpfile ] result.cdb A.cdb B.cdb
       ezcdb dump [-h] [-d del | -g | -x] [ cdb ]
       ezcdb dupes [-h] [-d del | -g | -x] [-q] [ cdb ]
       ezcdb get [-h] [-a | -j num ] [-n] key [ cdb ]
       ezcdb grep [-h] [-d del | -g | -x] [-i] [-k] [-!]  regex [ cdb ]
       ezcdb keys [-h] [-g | -x] [-X] [ cdb ]
       ezcdb make [-h] [-d del | -g | -x] [-p perms ] [-t tmpfile ] cdb
       ezcdb match [-h] [-d del | -g | -x] [-i] [-k] [-!]  pattern [ cdb ]
       ezcdb merge [-h] [-p perms ] [-t tmpfile ] [-a] result.cdb A.cdb B.cdb
       ezcdb purge [-h] [-p perms ] [-t tmpfile ] result.cdb A.cdb B.cdb
       ezcdb stats [-h] [-v] [ cdb ]

DESCRIPTION
       ezcdb  is  used  to  generate,  query, analyze, and operate on constant
       database files in cdb(5) format.  The  following  operations  are  sup-
       ported:

   cross
       ezcdb cross performs a set intersection operation with A.cdb and B.cdb and
       compiles the result in result.cdb.  The cross operation selects records
       in  A  with matching keys in B.  The record sequence in result.cdb pre-
       serves the original sequence among records in A.  Any/all of the  named
       argument files may ``overlap''.  Options:

       -p perms
              Permissions.   Set  the file creation permissions for result.cdb
              as explicitly given in the octal argument perms.  Otherwise, the
              result.cdb  file  permissions will be set to mode 0666 and modi-
              fied by the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the  argument  tmpfile  for
              the temporary file used during the creation of result.cdb.  Nor-
              mally the temporary file name is constructed from the result.cdb
              argument as result.cdb.{new}.

   dump
       ezcdb  dump lists to stdout all records found in the given cdb file (or
       seekable input on stdin).  Output format is under  control  of  options
       and described in the FORMATS section.  Options:

       -d del Delimiter.   Output  records  in  ``getline''  format, using the
              first character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -x     Exchange.  Output records in ``exchange'' format.

   dupes
       ezcdb dupes lists to stdout all records with duplicate keys found in
       the given cdb file (or seekable input on stdin), and prints a summary
       report to stderr.  Because the dupes command sequentially scans each
       record and hash value in the file, it may also be used to validate  the
       integrity  of  the database.  Output format is under control of options
       and described in the FORMATS section.  Options:

       -d del Delimiter.  Output records  in  ``getline''  format,  using  the
              first character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -x     Exchange.  Output records in ``exchange'' format.

       -q     Quick/quiet.  Suppress all output and summary reporting.
              Stop scanning and return exit status 1 at the first duplicate
              key found in cdb.  Otherwise, return exit status 0 if no
              duplicates are found.

   get
       ezcdb get looks up key in cdb (or seekable input on stdin) and, if
       found, writes the associated record value to stdout.  Exits zero if
       the lookup is successful.  Exits non-zero (1) if key is not found.
       Options:

       -a     All.   Write  all record values with matching key.  A newline is
              appended to each record value output.  Normally, only the  first
              record matching key is output.

       -j num Jump.  Skip the first num matches for key, then write the
              (num+1)th matching record value, if any.

       -n     Newline suppressed.  Write only the record value without append-
              ing  a  newline.  Normally, a newline is appended to each record
              in the output.

   grep
       ezcdb grep lists to stdout all records found in the given cdb file  (or
       seekable  input  on  stdin)  matching the re_format(7) extended regular
       expression given in regex.  Note that the regex argument may need to be
       quoted  to  inhibit  unwanted expansion by the shell.  Output format is
       under control of options and described in the FORMATS section.  Exits
       zero if one or more matches are found.  Exits non-zero (1) if no match
       is found.  Options:

       -d del Delimiter.  Output records  in  ``getline''  format,  using  the
              first character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -x     Exchange.  Output records in ``exchange'' format.

       -i     Case insensitive.  Perform case-insensitive matching.  Normally
              the regular expression is matched case-sensitively.

       -k     Key match.  Perform  the  regular  expression  matching  against
              record keys.  Normally the regular expression is matched against
              record values.

       -!     Invert (logical not).  Select and output the records that do not
              match  the  given  regular  expression.   Normally  the matching
              records are output.

   keys
       ezcdb keys lists to stdout all keys found in the  given  cdb  file  (or
       seekable  input  on  stdin).  Output format is under control of options
       and modified slightly from the descriptions in the FORMATS section.  By
       default, keys are listed in a modified default format:

           +klen:key\n

       An empty line terminates the default output sequence.  Options:

       -g     Getline.  Output keys listed in a modified ``getline'' format:
                   key\n

       -x     Exchange.  Output keys listed in a modified ``exchange'' format:
                   klen:key\n

       -X     Hash (hexadecimal).  Following each key, display the hash  value
              computed for the key in hexadecimal format.
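       The hash reported by -X should be the standard cdb(5) hash function,
       since any cdb-compatible lookup depends on it.  The following Python
       sketch computes that hash under this assumption.  The function name
       cdb_hash is illustrative only, and the exact layout of the -X output
       line is not reproduced here.

```python
def cdb_hash(key: bytes) -> int:
    """Standard cdb(5) hash: start at 5381, then h = (h * 33) ^ byte, mod 2^32."""
    h = 5381
    for byte in key:
        h = ((h << 5) + h) & 0xFFFFFFFF  # h * 33, truncated to 32 bits
        h ^= byte                        # mix in the next key byte
    return h

# Hexadecimal rendering, similar in spirit to what -X displays.
print(format(cdb_hash(b"a"), "08x"))  # 0002b5c4
```

       In the cdb(5) layout, the low 8 bits of this value select one of the
       256 hash tables, and the remaining bits select the starting slot
       within that table.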

   make
       ezcdb  make  generates  the  cdb  file cdb from formatted input read on
       stdin.  Input format is under control of options and described  in  the
       FORMATS section.  Options:

       -d del Delimiter.  Input records in ``getline'' format, using the first
              character in the argument del as separator between the  key  and
              value parts of the record.  Implies -g.

       -g     Getline.  Input records in ``getline'' format.

       -x     Exchange.  Input records in ``exchange'' format.

       -p perms
              Permissions.   Set  the  file  creation  permissions  for cdb as
              explicitly given in the octal argument  perms.   Otherwise,  the
              cdb  file  permissions  will be set to mode 0666 and modified by
              the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the  argument  tmpfile  for
              the  temporary  file  used during the creation of cdb.  Normally
              the temporary file name is constructed from the cdb argument  as
              cdb.{new}.

   match
       ezcdb match lists to stdout all records found in the given cdb file (or
       seekable input on stdin) matching the simple wildcard expression  given
       in  pattern.   A pattern is a character string composed in any combina-
       tion of:

              o   any character (except `?' and `*'), matched literally

              o   the `?' (question-mark) character, matching any single char-
                  acter

              o   the  `*' (asterisk) character, matching any sequence of zero
                  or more characters

       Note that the pattern expression argument is a simplified subset of the
       sh(1)  globbing  rules provided by fnmatch(3), and does not provide for
       range expressions or escaping of metacharacters.  (The ezcdb grep  com-
       mand  may  be used whenever more sophisticated pattern matching expres-
       sions are required.)
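       The wildcard rules above can be illustrated with a small matcher.  This
       Python sketch is not ezcdb's code; it is a standard greedy matcher with
       backtracking to the most recent `*', anchored to the whole string as
       ezcdb match is.  The name wildmatch is hypothetical, and folding case
       with str.lower() is an assumption about how -i behaves.

```python
def wildmatch(pattern: str, text: str, ignore_case: bool = False) -> bool:
    """Match text against a pattern of literal chars, '?' and '*' (no escapes)."""
    if ignore_case:
        pattern, text = pattern.lower(), text.lower()
    p = t = 0
    star_p, star_t = -1, 0  # position of the last '*' and where it resumed in text
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            # Record the star; first try letting it match zero characters.
            star_p, star_t = p, t
            p += 1
        elif p < len(pattern) and (pattern[p] == "?" or pattern[p] == text[t]):
            p += 1
            t += 1
        elif star_p != -1:
            # Mismatch: let the last '*' absorb one more character and retry.
            p = star_p + 1
            star_t += 1
            t = star_t
        else:
            return False
    # Trailing '*'s can match the empty remainder.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)
```

       For example, wildmatch("*foo*", "xxfooyy") is true, while
       wildmatch("foo", "xxfooyy") is false because the pattern must cover
       the entire string.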

       Note also that the pattern argument may need to be  quoted  to  inhibit
       unwanted  expansion  by  the  shell.  Output format is under control of
       options and described in the FORMATS section.  Exits zero if one or
       more matches are found.  Exits non-zero (1) if no match is found.
       Options:

       -d del Delimiter.   Output  records  in  ``getline''  format, using the
              first character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -x     Exchange.  Output records in ``exchange'' format.

       -i     Case insensitive.  Perform case-insensitive matching.  Normally
              the wildcard expression is matched case-sensitively.

       -k     Key  match.   Perform the wildcard matching against record keys.
              Normally the wildcard expression is matched against record  val-
              ues.

       -!     Invert (logical not).  Select and output the records that do not
              match the given  wildcard  expression.   Normally  the  matching
              records are output.

   merge
       ezcdb  merge  performs  a  set union operation with A.cdb and B.cdb and
       compiles the result in result.cdb.  The merge  operation  combines  all
       records in A and B, excluding by default any records in A with matching
       keys in B.  Note that the -a option may be used  for  a  ``union  all''
       operation.  The record sequence in result.cdb includes A records before
       B records, and original sequence is preserved among records in A and B.
       Any/all of the named argument files may ``overlap''.  Options:

       -p perms
              Permissions.   Set  the file creation permissions for result.cdb
              as explicitly given in the octal argument perms.  Otherwise, the
              result.cdb  file  permissions will be set to mode 0666 and modi-
              fied by the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the  argument  tmpfile  for
              the temporary file used during the creation of result.cdb.  Nor-
              mally the temporary file name is constructed from the result.cdb
              argument as result.cdb.{new}.

       -a     All.  Merge all records from A and B, not excluding any matching
              keys.  Normally any records in A with matching  keys  in  B  are
              excluded.

   purge
       ezcdb  purge  performs  a  set minus (exclude) operation with A.cdb and
       B.cdb and compiles the  result  in  result.cdb.   The  purge  operation
       selects  only  records  in  A  without  matching keys in B.  The record
       sequence in result.cdb preserves the original sequence among records in
       A.  Any/all of the named argument files may ``overlap''.  Options:

       -p perms
              Permissions.   Set  the file creation permissions for result.cdb
              as explicitly given in the octal argument perms.  Otherwise, the
              result.cdb  file  permissions will be set to mode 0666 and modi-
              fied by the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the  argument  tmpfile  for
              the temporary file used during the creation of result.cdb.  Nor-
              mally the temporary file name is constructed from the result.cdb
              argument as result.cdb.{new}.

   stats
       ezcdb  stats scans the cdb file (or seekable input on stdin) and prints
       some summary statistics to stdout.  Options:

       -v     Verbose.  Some additional information is included  in  the  sum-
              mary.

FORMATS
       The  ezcdb  utility accepts the following formats for record input/out-
       put:

   default
       The default record format follows the  original  cdbmake  specification
       and is described as:

           +klen,dlen:key->data\n

       Where:  klen and dlen are the key length and data length, respectively,
       in decimal ASCII notation; key and data are any arbitrary character
       sequences  for  the record key and data; each record begins with a lit-
       eral `+' character; each record ends with a newline; and the `,',  `:',
       and  `->'  separators are literal characters.  A sequence of records is
       terminated by a final empty line.
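       Because the lengths are given explicitly, key and data in this format
       may contain any bytes, including `,', `:', `->' and newline.  The
       following Python sketch writes and reads the format as described
       above; the function names are hypothetical and the error handling is
       deliberately minimal.

```python
def encode_default(records):
    """Serialize (key, data) byte pairs as +klen,dlen:key->data\\n records."""
    out = bytearray()
    for key, data in records:
        out += b"+%d,%d:" % (len(key), len(data))
        out += key + b"->" + data + b"\n"
    out += b"\n"  # a final empty line terminates the sequence
    return bytes(out)

def decode_default(buf: bytes):
    """Parse records until the terminating empty line."""
    records, pos = [], 0
    while buf[pos:pos + 1] == b"+":
        colon = buf.index(b":", pos)  # lengths are digits and ',', so first ':' ends them
        klen, dlen = (int(n) for n in buf[pos + 1:colon].split(b","))
        kstart = colon + 1
        key = buf[kstart:kstart + klen]
        arrow = kstart + klen
        if buf[arrow:arrow + 2] != b"->":
            raise ValueError("missing '->' separator")
        dstart = arrow + 2
        data = buf[dstart:dstart + dlen]
        end = dstart + dlen
        if buf[end:end + 1] != b"\n":
            raise ValueError("record not terminated by newline")
        records.append((key, data))
        pos = end + 1
    if buf[pos:pos + 1] != b"\n":
        raise ValueError("missing terminating empty line")
    return records

print(encode_default([(b"one", b"Hello")]))  # b'+3,5:one->Hello\n\n'
```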

   getline [-g]
       The ``getline'' record format is selected with the  -g  option  and  is
       described as:

           key\tdata\n

       Where:  key  and  data are any arbitrary character sequences (excepting
       nul and newline) for the record key and data; separated by default with
       a  single  tab character; and each record is terminated with a newline.
       The separator must not itself appear within any key, but may appear
       within  data.   An alternative separator character between key and data
       may be specified by the first character in the del argument to  the  -d
       option.   Lines beginning with a `#' character are ignored.  A sequence
       of records is terminated by eof.  Note that this  format  will  not  be
       usable  in cases where the input records may themselves contain the nul
       or newline characters.
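       The key/data split described above happens at the first separator on
       each line, which is why the separator may appear in data but not in a
       key.  A minimal Python sketch of a getline reader follows; the name
       parse_getline is hypothetical, and rejecting a line that has no
       separator is an assumption of this sketch, not documented ezcdb
       behavior.

```python
def parse_getline(lines, delim: bytes = b"\t"):
    """Parse key<sep>data lines; only the first byte of delim is used (as with -d)."""
    sep = delim[:1]
    records = []
    for line in lines:
        if line.endswith(b"\n"):
            line = line[:-1]
        if line.startswith(b"#"):
            continue  # lines beginning with '#' are ignored
        key, found, data = line.partition(sep)  # split at the FIRST separator
        if not found:
            # Assumption: this sketch rejects lines without a separator.
            raise ValueError(f"no separator in line: {line!r}")
        records.append((key, data))
    return records
```

       For example, with -d ':' the line a:b:c yields key a and data b:c.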

   exchange [-x]
       The ``exchange'' record format is selected with the -x  option  and  is
       described as:

           klen:dlen\tkey:data\n

       Where:  klen and dlen are the key length and data length, respectively,
       in decimal ASCII notation; key and data are any arbitrary character
       sequences for the record key and data; each record ends with a newline;
       and the `:' and tab separators are literal characters.  A  sequence  of
       records is terminated by eof.
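       As with the default format, explicit lengths let the exchange format
       carry arbitrary bytes, including tabs, colons, nul and newline.  The
       following Python sketch (hypothetical function names) writes and reads
       it as described above, stopping at end of input.

```python
def encode_exchange(records):
    """Serialize (key, data) byte pairs as klen:dlen\\tkey:data\\n records."""
    return b"".join(b"%d:%d\t%s:%s\n" % (len(k), len(d), k, d) for k, d in records)

def decode_exchange(buf: bytes):
    """Parse exchange records until end of input."""
    records, pos = [], 0
    while pos < len(buf):
        tab = buf.index(b"\t", pos)  # the length field holds only digits and ':'
        klen, dlen = (int(n) for n in buf[pos:tab].split(b":"))
        kstart = tab + 1
        key = buf[kstart:kstart + klen]
        if buf[kstart + klen:kstart + klen + 1] != b":":
            raise ValueError("missing ':' between key and data")
        dstart = kstart + klen + 1
        data = buf[dstart:dstart + dlen]
        end = dstart + dlen
        if buf[end:end + 1] != b"\n":
            raise ValueError("record not terminated by newline")
        records.append((key, data))
        pos = end + 1
    return records

print(encode_exchange([(b"one", b"Hello")]))  # b'3:5\tone:Hello\n'
```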

MISCELLANEOUS
       In this section are some additional notes and comments regarding opera-
       tions on constant databases.

   Duplicate Keys
       The cdb format imposes no constraints on the presence of duplicate keys
       in  a  database.   For  example, applications may use duplicate keys to
       represent one-to-many relationships between keys and values.

       Other applications may require a  unique  key  constraint,  modeling  a
       strict one-to-one relationship between keys and values.  In such cases,
       the ezcdb make operation will not itself  screen  input  for  duplicate
       keys.   However,  the ezcdb dupes operation may be used to test for the
       presence of duplicate keys in a cdb file, and executes in a manner that
       is  generally efficient when compared to other methods of pre-screening
       duplicates on input.

       A simple front-end script is suggested as  one  means  to  implement  a
       duplicate  key  constraint  on  ezcdb  make by checking its output with
       ezcdb dupes before atomically moving the file (with mv(1) or rename(2))
       to its intended destination.
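       One possible shape for such a front-end is sketched below in Python.
       It relies only on the ezcdb make and ezcdb dupes -q behavior described
       on this page.  The function make_unique, the ``.staging'' file name
       and the ezcdb/make_opts parameters are hypothetical; the staging file
       is kept beside the destination so that the final rename(2) stays on
       one filesystem.

```python
import os
import subprocess
from typing import Sequence

def make_unique(dest: str, data: bytes,
                ezcdb: Sequence[str] = ("ezcdb",),
                make_opts: Sequence[str] = ()) -> None:
    """Build dest with `ezcdb make`, refusing to install it if keys repeat."""
    # Hypothetical staging name in dest's directory, so that os.replace()
    # below is a same-filesystem rename(2).
    staging = dest + ".staging"
    # Compile records fed on stdin; make_opts selects the input format,
    # e.g. ["-g"] for getline input.
    subprocess.run([*ezcdb, "make", *make_opts, staging], input=data, check=True)
    try:
        status = subprocess.run([*ezcdb, "dupes", "-q", staging]).returncode
        if status == 1:
            raise ValueError(f"duplicate keys found; {dest} left unchanged")
        if status != 0:
            raise RuntimeError(f"ezcdb dupes exited with status {status}")
        os.replace(staging, dest)  # atomic move into place
    finally:
        # Remove the staging file unless it was already renamed into place.
        if os.path.exists(staging):
            os.unlink(staging)
```

       A call such as make_unique("example.cdb", records, make_opts=["-g"])
       leaves any existing example.cdb untouched when duplicates are found.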

   Set Operations
       The ezcdb utility provides some basic set operations on cdb files with
       the cross, merge, and purge commands.  Let ``0'' represent the null
       (empty) set.  With a unique key constraint imposed on each of the set
       operands A and B, the following relations hold:

           A merge A = A

           A merge 0 = A

           A purge A = 0

           A purge 0 = A

           A cross A = A

           A cross 0 = 0

           for((A cross B) == 0): (A merge B) == (B merge A), with respect  to
           membership but not to order

           (A  merge all B) == (B merge all A), with respect to membership but
           not to order

           for(A != B): (A purge B) != (B purge A)

           for((A cross B) == 0): (A purge B) = A, (B purge A) = B

       All ezcdb operations are stable with respect  to  maintaining  original
       insertion  order,  and records in operand A are inserted before records
       in operand B.

       When the unique key constraint is relaxed for A and B, the results  for
       set  operations  are  predictable and exactly described, but some addi-
       tional consideration may be needed  to  understand  the  outcome.   For
       example: given the operation A merge B, whenever a duplicate key exists
       across multiple records in A, and such key is found in  only  a  single
       record  in  B,  all  records  in  A with that key are excluded from the
       result, and the result will contain only the single record with  match-
       ing  key  from B.  Considering the inverse B merge A operation for this
       example, the result will now exclude the single  record  with  matching
       key from B, and all records with that key in A will now be included.
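       The selection rules for cross, merge, and purge can be modeled on
       ordered lists of (key, value) records.  This Python sketch is a model
       of the documented semantics, not ezcdb's implementation; the function
       names and the all_records flag (standing in for merge -a) are
       hypothetical.  It reproduces the duplicate-key example above.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]

def cross(a: List[Record], b: List[Record]) -> List[Record]:
    """Records of A whose key appears in B, in A's original order."""
    keys_b = {k for k, _ in b}
    return [r for r in a if r[0] in keys_b]

def purge(a: List[Record], b: List[Record]) -> List[Record]:
    """Records of A whose key does not appear in B, in A's original order."""
    keys_b = {k for k, _ in b}
    return [r for r in a if r[0] not in keys_b]

def merge(a: List[Record], b: List[Record], all_records: bool = False) -> List[Record]:
    """A records before B records; by default drop A records keyed in B."""
    if all_records:  # the ``union all'' case
        return list(a) + list(b)
    return purge(a, b) + list(b)

A = [(b"k", b"a1"), (b"k", b"a2"), (b"x", b"1")]
B = [(b"k", b"b1")]
print(merge(A, B))  # [(b'x', b'1'), (b'k', b'b1')]
print(merge(B, A))  # [(b'k', b'a1'), (b'k', b'a2'), (b'x', b'1')]
```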

   Grep vs. Match
       ezcdb  provides  two  ways  to  scan  a  constant  database for pattern
       matches, with the commands grep and match.  While neither of  these  is
       nearly  as  fast  as performing exact key matches with the get command,
       grep and match do permit useful  constant  database  queries  in  those
       instances where exact key matches are not otherwise possible.

       In  most  cases, the match command will be preferred to grep because it
       is simpler to use and faster in execution.  Especially for  the  novice
       user,  match  patterns are easy to compose and resemble common wildcard
       globbing expressions.  But grep  is  much  more  capable  when  complex
       matches  are  required,  and may be useful for performing sophisticated
       queries.

       One additional difference between  grep  and  match  should  be  noted.
       match  patterns  are  implicitly  anchored  to  match  against the full
       record, whereas grep patterns require the explicit use of the  `^'  and
       `$'  metacharacters  whenever  matching  against  the  full  record  is
       required.  Otherwise, a grep pattern will match against  any  substring
       found  within  a  record,  while, conversely, a match pattern will need
       leading and trailing `*' asterisk characters to match against any  sub-
       string within a record.
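       The anchoring difference can be seen with Python stand-ins.  Note the
       assumptions: Python's re module is not the re_format(7) engine, though
       it agrees for these simple patterns, and fnmatch.fnmatchcase() also
       accepts [...] ranges, which ezcdb match does not.  The helper names
       are hypothetical.

```python
import fnmatch
import re

def grep_like(regex: str, text: str) -> bool:
    # Unanchored: the regex may match any substring of text.
    return re.search(regex, text) is not None

def match_like(pattern: str, text: str) -> bool:
    # Anchored: the pattern must match the whole of text.
    return fnmatch.fnmatchcase(text, pattern)

value = "color=blue"
print(grep_like("blue", value), grep_like("^blue$", value))    # True False
print(match_like("blue", value), match_like("*blue*", value))  # False True
```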

       Neither  grep nor match will be particularly useful in querying any cdb
       file that includes records containing the nul  and/or  newline  charac-
       ters.

   Seekable Input
       The  commands  dump, dupes, get, grep, keys, match, and stats will read
       either a cdb file argument directly, or stdin.  If  input  is  read  on
       stdin, it must be ``seekable'', that is, permit the lseek(2) operation.
       Piped input is normally not seekable, and the command will fail with ESPIPE.
       On the other hand, the sh(1) input redirection operator (`<') will usu-
       ally succeed if the underlying device supports lseek(2) operations.
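       Whether a given descriptor is seekable can be probed the same way the
       failure arises: by attempting a no-op lseek(2).  This Python sketch
       (hypothetical helper is_seekable) returns False on ESPIPE and re-raises
       any other error.

```python
import errno
import os

def is_seekable(fd: int) -> bool:
    """True if lseek(2) works on fd; False if it fails with ESPIPE (pipe, FIFO, socket)."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)  # no-op seek to the current offset
    except OSError as e:
        if e.errno == errno.ESPIPE:
            return False
        raise
    return True
```

       Calling is_seekable(0) in a script run as `script < file.cdb' is
       expected to return True, and False when run as `cat file.cdb | script'.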

LIMITS
       All cdb(5) sizes are constrained to the range limits inherent in
       32-bit unsigned integers, with a maximum file size of (2^32 - 1) bytes
       (just under 4 gigabytes).  Record keys and values may be of any length
       within these constraints, less the overhead required by the cdb(5) file
       format.  Unless the ``getline'' input format is used, the ezcdb make
       operation does not require single keys and values to fit into local
       memory.  When using the ``getline'' input format, local memory is
       required of sufficient size to contain the largest single line
       (key\tvalue\n) encountered in the input.  In addition, the ezcdb make
       operation requires about 12 bytes of memory overhead per input record.
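       The file size limit can be checked ahead of time from the cdb(5)
       layout: a 2048-byte header, an 8-byte length prefix per record, and
       hash tables totaling two 8-byte slots per record.  The following Python
       sketch applies those figures together with the approximate 12 bytes
       per record quoted above for ezcdb make; the function names are
       hypothetical.

```python
CDB_MAX_SIZE = 2**32 - 1      # largest size a 32-bit unsigned offset allows
HEADER_BYTES = 2048           # 256 (position, length) pairs of 4 + 4 bytes
RECORD_HEADER_BYTES = 8       # 4-byte key length + 4-byte data length
HASH_BYTES_PER_RECORD = 16    # two 8-byte hash-table slots per record
MAKE_MEMORY_PER_RECORD = 12   # approximate ezcdb make overhead stated above

def cdb_file_size(records) -> int:
    """Exact size in bytes of a cdb file holding these (key, data) records."""
    size = HEADER_BYTES
    for key, data in records:
        size += RECORD_HEADER_BYTES + len(key) + len(data) + HASH_BYTES_PER_RECORD
    return size

def fits_in_cdb(records) -> bool:
    return cdb_file_size(records) <= CDB_MAX_SIZE

def make_memory_estimate(nrecords: int) -> int:
    """Rough memory overhead of ezcdb make for nrecords input records."""
    return nrecords * MAKE_MEMORY_PER_RECORD

print(cdb_file_size([(b"a", b"b")]))     # 2074
print(make_memory_estimate(10_000_000))  # 120000000
```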

OPTIONS
       The specific options relative to  each  command  are  described  above.
       ezcdb also recognizes the following general options:

       -h     Help.   Display  a  brief help message to stderr and exit.  When
              the -h option follows command, the help message is  specific  to
              the requested operation.

       -V     Version.  Display version information to stderr and exit.

EXIT STATUS
       ezcdb exits with one of the following values:

       0      Success.

       1      Boolean negative result.  Interpretation depends on the command:
              for get, grep, and match, no match was found; for dupes in -q
              ``quick/quiet'' mode, duplicates were found.

       100    Usage  error.   An  error was encountered among options or argu-
              ments to the command.  In this case, ezcdb prints a brief  diag-
              nostic message to stderr on exit.

       111    System failure.  The command failed to complete due to some sys-
              tem, protocol, or resource error.  In this case, ezcdb prints  a
              brief diagnostic message to stderr on exit.
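       Scripts that call ezcdb can distinguish a negative result (1) from
       usage (100) and system (111) failures.  The following Python sketch
       wraps ezcdb get -n in this way.  The function cdb_get and its ezcdb
       parameter (the command prefix, defaulting to a plain ``ezcdb'' on
       PATH) are hypothetical.

```python
import subprocess
from typing import Optional, Sequence

def cdb_get(key: str, path: str, ezcdb: Sequence[str] = ("ezcdb",)) -> Optional[bytes]:
    """Return the first value for key, None if absent; raise on other failures."""
    proc = subprocess.run([*ezcdb, "get", "-n", key, path], capture_output=True)
    if proc.returncode == 0:
        return proc.stdout      # value written without a trailing newline (-n)
    if proc.returncode == 1:
        return None             # boolean negative result: key not found
    kind = {100: "usage error", 111: "system failure"}.get(proc.returncode, "unexpected status")
    detail = proc.stderr.decode(errors="replace").strip()
    raise RuntimeError(f"ezcdb get: {kind} ({proc.returncode}): {detail}")
```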

AUTHOR
       Wayne Marshall, http://b0llix.net/ezcdb/

SEE ALSO
       cdb(5), http://cr.yp.to/cdb.html



ezcdb-0.01                       December 2010                        ezcdb(1)