ezcdb(1)                             ezcdb                            ezcdb(1)

NAME
       ezcdb - constant database (cdb) multitool

SYNOPSIS
       ezcdb [-hV] command opts args

       ezcdb cross [-h] [-p perms] [-t tmpfile] result.cdb A.cdb B.cdb
       ezcdb dump [-h] [-d del | -g | -x] [cdb]
       ezcdb dupes [-h] [-d del | -g | -x] [-q] [cdb]
       ezcdb get [-h] [-a | -j num] [-n] key [cdb]
       ezcdb grep [-h] [-d del | -g | -x] [-i] [-k] [-!] regex [cdb]
       ezcdb keys [-h] [-g | -x] [-X] [cdb]
       ezcdb make [-h] [-d del | -g | -x] [-p perms] [-t tmpfile] cdb
       ezcdb match [-h] [-d del | -g | -x] [-i] [-k] [-!] pattern [cdb]
       ezcdb merge [-h] [-p perms] [-t tmpfile] [-a] result.cdb A.cdb B.cdb
       ezcdb purge [-h] [-p perms] [-t tmpfile] result.cdb A.cdb B.cdb
       ezcdb stats [-h] [-v] [cdb]

DESCRIPTION
       ezcdb is used to generate, query, analyze, and operate on constant
       database files in cdb(5) format.  The following operations are
       supported:

   cross
       ezcdb cross performs a set intersect operation with A.cdb and B.cdb
       and compiles the result in result.cdb.  The cross operation selects
       records in A with matching keys in B.  The record sequence in
       result.cdb preserves the original sequence among records in A.
       Any/all of the named argument files may ``overlap''.

       Options:

       -p perms
              Permissions.  Set the file creation permissions for result.cdb
              as explicitly given in the octal argument perms.  Otherwise,
              the result.cdb file permissions will be set to mode 0666 and
              modified by the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the argument tmpfile for
              the temporary file used during the creation of result.cdb.
              Normally the temporary file name is constructed from the
              result.cdb argument as result.cdb.{new}.

   dump
       ezcdb dump lists to stdout all records found in the given cdb file
       (or seekable input on stdin).  Output format is under control of
       options and described in the FORMATS section.

       Options:

       -d del Delimiter.
              Output records in ``getline'' format, using the first
              character in the argument del as separator between the key
              and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -x     Exchange.  Output records in ``exchange'' format.

   dupes
       ezcdb dupes lists to stdout all records with duplicate keys found in
       the given cdb file (or seekable input on stdin), and prints summary
       report to stderr.  Because the dupes command sequentially scans each
       record and hash value in the file, it may also be used to validate
       the integrity of the database.  Output format is under control of
       options and described in the FORMATS section.

       Options:

       -d del Delimiter.  Output records in ``getline'' format, using the
              first character in the argument del as separator between the
              key and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -x     Exchange.  Output records in ``exchange'' format.

       -q     Quick/quiet.  Suppress any output and summary reporting.
              Short-circuit scan and return exit status 1 on first
              duplicate key found in cdb.  Otherwise, return exit status 0
              for no duplicates found.

   get
       ezcdb get looks up key in cdb (or seekable input on stdin) and, if
       found, writes the associated record value to stdout.  Exits zero if
       lookup is successful.  Exits non-zero (1) if key not found.

       Options:

       -a     All.  Write all record values with matching key.  A newline
              is appended to each record value output.  Normally, only the
              first record matching key is output.

       -j num Jump.  Skip the first num matches for key before writing the
              num+1 matching record value, if any.

       -n     Newline suppressed.  Write only the record value without
              appending a newline.  Normally, a newline is appended to each
              record in the output.

   grep
       ezcdb grep lists to stdout all records found in the given cdb file
       (or seekable input on stdin) matching the re_format(7) extended
       regular expression given in regex.
       Note that the regex argument may need to be quoted to inhibit
       unwanted expansion by the shell.  Output format is under control of
       options and described in the FORMATS section.  Exits zero if one or
       more matches found.  Exits non-zero (1) if no match found.

       Options:

       -d del Delimiter.  Output records in ``getline'' format, using the
              first character in the argument del as separator between the
              key and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -x     Exchange.  Output records in ``exchange'' format.

       -i     Case insensitive.  Perform case insensitive matching.
              Normally the regular expression is matched explicitly with
              respect to case.

       -k     Key match.  Perform the regular expression matching against
              record keys.  Normally the regular expression is matched
              against record values.

       -!     Invert (logical not).  Select and output the records that do
              not match the given regular expression.  Normally the
              matching records are output.

   keys
       ezcdb keys lists to stdout all keys found in the given cdb file (or
       seekable input on stdin).  Output format is under control of options
       and modified slightly from the descriptions in the FORMATS section.
       By default, keys are listed in a modified default format:

              +klen:key\n

       An empty line terminates the default output sequence.

       Options:

       -g     Getline.  Output keys listed in a modified ``getline''
              format:

                     key\n

       -x     Exchange.  Output keys listed in a modified ``exchange''
              format:

                     klen:key\n

       -X     Hash (hexadecimal).  Following each key, display the hash
              value computed for the key in hexadecimal format.

   make
       ezcdb make generates the cdb file cdb from formatted input read on
       stdin.  Input format is under control of options and described in
       the FORMATS section.

       Options:

       -d del Delimiter.  Input records in ``getline'' format, using the
              first character in the argument del as separator between the
              key and value parts of the record.  Implies -g.

       -g     Getline.  Input records in ``getline'' format.

       -x     Exchange.  Input records in ``exchange'' format.
       -p perms
              Permissions.  Set the file creation permissions for cdb as
              explicitly given in the octal argument perms.  Otherwise, the
              cdb file permissions will be set to mode 0666 and modified by
              the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the argument tmpfile for
              the temporary file used during the creation of cdb.  Normally
              the temporary file name is constructed from the cdb argument
              as cdb.{new}.

   match
       ezcdb match lists to stdout all records found in the given cdb file
       (or seekable input on stdin) matching the simple wildcard expression
       given in pattern.  A pattern is a character string composed in any
       combination of:

       o      any character (excepting `?' and `*'), matched explicitly

       o      the `?' (question-mark) character, matching any single
              character

       o      the `*' (asterisk) character, matching any sequence of zero
              or more characters

       Note that the pattern expression argument is a simplified subset of
       the sh(1) globbing rules provided by fnmatch(3), and does not
       provide for range expressions or escaping of metacharacters.  (The
       ezcdb grep command may be used whenever more sophisticated pattern
       matching expressions are required.)  Note also that the pattern
       argument may need to be quoted to inhibit unwanted expansion by the
       shell.  Output format is under control of options and described in
       the FORMATS section.  Exits zero if one or more matches found.
       Exits non-zero (1) if no match found.

       Options:

       -d del Delimiter.  Output records in ``getline'' format, using the
              first character in the argument del as separator between the
              key and value parts of the record.  Implies -g.

       -g     Getline.  Output records in ``getline'' format.

       -x     Exchange.  Output records in ``exchange'' format.

       -i     Case insensitive.  Perform case insensitive matching.
              Normally the wildcard expression is matched explicitly with
              respect to case.

       -k     Key match.  Perform the wildcard matching against record
              keys.  Normally the wildcard expression is matched against
              record values.

       -!     Invert (logical not).
              Select and output the records that do not match the given
              wildcard expression.  Normally the matching records are
              output.

   merge
       ezcdb merge performs a set union operation with A.cdb and B.cdb and
       compiles the result in result.cdb.  The merge operation combines all
       records in A and B, excluding by default any records in A with
       matching keys in B.  Note that the -a option may be used for a
       ``union all'' operation.  The record sequence in result.cdb includes
       A records before B records, and original sequence is preserved among
       records in A and B.  Any/all of the named argument files may
       ``overlap''.

       Options:

       -p perms
              Permissions.  Set the file creation permissions for result.cdb
              as explicitly given in the octal argument perms.  Otherwise,
              the result.cdb file permissions will be set to mode 0666 and
              modified by the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the argument tmpfile for
              the temporary file used during the creation of result.cdb.
              Normally the temporary file name is constructed from the
              result.cdb argument as result.cdb.{new}.

       -a     All.  Merge all records from A and B, not excluding any
              matching keys.  Normally any records in A with matching keys
              in B are excluded.

   purge
       ezcdb purge performs a set minus (exclude) operation with A.cdb and
       B.cdb and compiles the result in result.cdb.  The purge operation
       selects only records in A without matching keys in B.  The record
       sequence in result.cdb preserves the original sequence among records
       in A.  Any/all of the named argument files may ``overlap''.

       Options:

       -p perms
              Permissions.  Set the file creation permissions for result.cdb
              as explicitly given in the octal argument perms.  Otherwise,
              the result.cdb file permissions will be set to mode 0666 and
              modified by the process umask.

       -t tmpfile
              Tempfile.  Use the path specified by the argument tmpfile for
              the temporary file used during the creation of result.cdb.
              Normally the temporary file name is constructed from the
              result.cdb argument as result.cdb.{new}.
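       The record selection and ordering rules given for cross, merge, and
       purge can be modelled in a few lines.  The following is only a
       sketch in Python, not ezcdb's implementation: it assumes a cdb can
       be treated as an ordered list of (key, value) pairs, and the
       function names and the keep_all parameter are hypothetical helpers
       chosen to mirror the commands and the -a option.

```python
# Illustrative model only: a cdb is treated as an ordered list of
# (key, value) byte-string pairs.  These helpers are hypothetical and
# are not part of ezcdb.

def cross(a, b):
    """Records of A whose key also appears in B, in A's original order."""
    keys_b = {key for key, _ in b}
    return [(k, v) for k, v in a if k in keys_b]


def purge(a, b):
    """Records of A whose key does not appear in B, in A's original order."""
    keys_b = {key for key, _ in b}
    return [(k, v) for k, v in a if k not in keys_b]


def merge(a, b, keep_all=False):
    """A records before B records.  Unless keep_all is set (modelling -a),
    records in A whose key appears in B are excluded."""
    kept_a = list(a) if keep_all else purge(a, b)
    return kept_a + list(b)


A = [(b"x", b"1"), (b"y", b"2"), (b"x", b"3")]
B = [(b"x", b"9"), (b"z", b"8")]

print(cross(A, B))
print(purge(A, B))
print(merge(A, B))
print(merge(A, B, keep_all=True))
```

       In this model the default merge is simply purge followed by the
       records of B, which is why every A record sharing a key with B is
       dropped regardless of how many times that key occurs in A.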
   stats
       ezcdb stats scans the cdb file (or seekable input on stdin) and
       prints some summary statistics to stdout.

       Options:

       -v     Verbose.  Some additional information is included in the
              summary.

FORMATS
       The ezcdb utility accepts the following formats for record
       input/output:

   default
       The default record format follows the original cdbmake specification
       and is described as:

              +klen,dlen:key->data\n

       Where: klen and dlen are the key length and data length,
       respectively, in decimal ascii notation; key and data are any
       arbitrary character sequences for the record key and data; each
       record begins with a literal `+' character; each record ends with a
       newline; and the `,', `:', and `->' separators are literal
       characters.  A sequence of records is terminated by a final empty
       line.

   getline [-g]
       The ``getline'' record format is selected with the -g option and is
       described as:

              key\tdata\n

       Where: key and data are any arbitrary character sequences (excepting
       nul and newline) for the record key and data; separated by default
       with a single tab character; and each record is terminated with a
       newline.  The separator must itself not appear within any key, but
       may appear within data.  An alternative separator character between
       key and data may be specified by the first character in the del
       argument to the -d option.  Lines beginning with a `#' character are
       ignored.  A sequence of records is terminated by eof.

       Note that this format will not be usable in cases where the input
       records may themselves contain the nul or newline characters.

   exchange [-x]
       The ``exchange'' record format is selected with the -x option and is
       described as:

              klen:dlen\tkey:data\n

       Where: klen and dlen are the key length and data length,
       respectively, in decimal ascii notation; key and data are any
       arbitrary character sequences for the record key and data; each
       record ends with a newline; and the `:' and tab separators are
       literal characters.  A sequence of records is terminated by eof.
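       The three record layouts described above can be written out
       mechanically.  The following Python sketch is illustrative only:
       the function names are hypothetical, lengths are assumed to be byte
       counts, and the validation in fmt_getline reflects one reading of
       the getline restrictions stated above rather than ezcdb's actual
       checks.

```python
# Hypothetical helpers that serialize one record in each ezcdb format.
# Keys and data are handled as bytes; lengths are assumed to be byte counts.

def fmt_default(key: bytes, data: bytes) -> bytes:
    """+klen,dlen:key->data\\n (a final empty line ends the sequence)."""
    return b"+%d,%d:%b->%b\n" % (len(key), len(data), key, data)


def fmt_exchange(key: bytes, data: bytes) -> bytes:
    """klen:dlen\\tkey:data\\n"""
    return b"%d:%d\t%b:%b\n" % (len(key), len(data), key, data)


def fmt_getline(key: bytes, data: bytes, sep: bytes = b"\t") -> bytes:
    """key<sep>data\\n; rejects records the format cannot represent."""
    if sep in key:
        raise ValueError("separator must not appear within the key")
    if b"\0" in key + data or b"\n" in key + data:
        raise ValueError("nul and newline cannot be represented")
    return key + sep + data + b"\n"


def parse_getline(line: bytes, sep: bytes = b"\t"):
    """Split one getline record at the first separator; skip '#' lines."""
    if line.startswith(b"#"):
        return None
    if line.endswith(b"\n"):
        line = line[:-1]
    key, _, data = line.partition(sep)
    return key, data


print(fmt_default(b"one", b"Hello"))
print(fmt_exchange(b"one", b"Hello"))
print(fmt_getline(b"one", b"Hello"))
print(parse_getline(b"one\tHello\tworld\n"))
```

       Splitting at the first separator only is what allows the separator
       to appear within data but not within a key.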
MISCELLANEOUS
       In this section are some additional notes and comments regarding
       operations on constant databases.

   Duplicate Keys
       The cdb format imposes no constraints on the presence of duplicate
       keys in a database.  For example, applications may use duplicate
       keys to represent one-to-many relationships between keys and values.

       Other applications may require a unique key constraint, modeling a
       strict one-to-one relationship between keys and values.  In such
       cases, the ezcdb make operation will not itself screen input for
       duplicate keys.  However, the ezcdb dupes operation may be used to
       test for the presence of duplicate keys in a cdb file, and executes
       in a manner that is generally efficient when compared to other
       methods of pre-screening duplicates on input.

       A simple front-end script is suggested as one means to implement a
       duplicate key constraint on ezcdb make by checking its output with
       ezcdb dupes before atomically moving the file (with mv(1) or
       rename(2)) to its intended destination.

   Set Operations
       The ezcdb utility provides some basic set operations on cdb files
       with the cross, merge, and purge commands.

       Let ``0'' represent the null/empty set.  With a unique key
       constraint imposed on each of the set operands A and B, the
       following relations hold:

              A merge A = A
              A merge 0 = A
              A purge A = 0
              A purge 0 = A
              A cross A = A
              A cross 0 = 0

              for ((A cross B) == 0):
                  (A merge B) == (B merge A),
                  with respect to membership but not to order

              (A merge all B) == (B merge all A),
              with respect to membership but not to order

              for (A != B):
                  (A purge B) != (B purge A)

              for ((A cross B) == 0):
                  (A purge B) = A, (B purge A) = B

       All ezcdb operations are stable with respect to maintaining original
       insertion order, and records in operand A are inserted before
       records in operand B.

       When the unique key constraint is relaxed for A and B, the results
       for set operations are predictable and exactly described, but some
       additional consideration may be needed to understand the outcome.
       For example: given the operation A merge B, whenever a duplicate key
       exists across multiple records in A, and such key is found in only a
       single record in B, all records in A with that key are excluded from
       the result, and the result will contain only the single record with
       matching key from B.  Considering the inverse B merge A operation
       for this example, the result will now exclude the single record with
       matching key from B, and all records with that key in A will now be
       included.

   Grep vs. Match
       ezcdb provides two ways to scan a constant database for pattern
       matches, with the commands grep and match.  While neither of these
       is nearly as fast as performing exact key matches with the get
       command, grep and match do permit useful constant database queries
       in those instances where exact key matches are not otherwise
       possible.

       In most cases, the match command will be preferred to grep because
       it is simpler to use and faster in execution.  Especially for the
       novice user, match patterns are easy to compose and resemble common
       wildcard globbing expressions.  But grep is much more capable when
       complex matches are required, and may be useful for performing
       sophisticated queries.

       One additional difference between grep and match should be noted.
       match patterns are implicitly anchored to match against the full
       record, whereas grep patterns require the explicit use of the `^'
       and `$' metacharacters whenever matching against the full record is
       required.  Otherwise, a grep pattern will match against any
       substring found within a record, while, conversely, a match pattern
       will need leading and trailing `*' asterisk characters to match
       against any substring within a record.

       Neither grep nor match will be particularly useful in querying any
       cdb file that includes records containing the nul and/or newline
       characters.

   Seekable Input
       The commands dump, dupes, get, grep, keys, match, and stats will
       read either a cdb file argument directly, or stdin.
       If input is read on stdin, it must be ``seekable'', that is, permit
       the lseek(2) operation.  Piped input is normally not seekable and
       the command will fail with ESPIPE.  On the other hand, the sh(1)
       input redirection operator (`<') will usually succeed if the
       underlying device supports lseek(2) operations.

LIMITS
       All cdb(5) sizes are constrained to the range limits inherent in
       32-bit unsigned integers, with a maximum file size of (2^32 - 1)
       bytes (4 gigabytes).  Record keys and values may be of any length
       within these constraints, less the overhead required by the cdb(5)
       file format.

       Unless using the ``getline'' input format, the ezcdb make operation
       does not require single keys and values to fit into local memory.
       When using the ``getline'' input format, local memory is required of
       sufficient size to contain the largest single line (key\tvalue\n)
       encountered in the input.  Otherwise the ezcdb make operation
       requires about 12 bytes of memory overhead per input record.

OPTIONS
       The specific options relative to each command are described above.
       ezcdb also recognizes the following general options:

       -h     Help.  Display a brief help message to stderr and exit.  When
              the -h option follows command, the help message is specific
              to the requested operation.

       -V     Version.  Display version information to stderr and exit.

EXIT STATUS
       ezcdb exits with one of the following values:

       0      Success.

       1      Boolean negative result.  Interpretation based on command:
              for get, grep, and match, indicates no match found; for dupes
              in [-q] ``quick/quiet'' mode, indicates duplicates found.

       100    Usage error.  An error was encountered among options or
              arguments to the command.  In this case, ezcdb prints a brief
              diagnostic message to stderr on exit.

       111    System failure.  The command failed to complete due to some
              system, protocol, or resource error.  In this case, ezcdb
              prints a brief diagnostic message to stderr on exit.
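       A calling program will usually want to tell a negative result (1)
       apart from a real failure (100 or 111).  The Python sketch below
       shows one way to do so.  The helper names describe_exit and cdb_get
       are hypothetical, the wrapper assumes ezcdb is installed on PATH,
       and the only invocation used is the ``ezcdb get -n key cdb'' form
       from the SYNOPSIS.

```python
import subprocess


def describe_exit(command: str, code: int) -> str:
    """Map a documented ezcdb exit status to a short description."""
    if code == 0:
        return "success"
    if code == 1:
        if command in ("get", "grep", "match"):
            return "no match found"
        if command == "dupes":
            return "duplicates found (with -q)"
        return "boolean negative result"
    if code == 100:
        return "usage error"
    if code == 111:
        return "system failure"
    return "undocumented exit status"


def cdb_get(key: str, cdb_path: str):
    """Return the first value stored under key, or None if key is absent.

    Assumes an `ezcdb` executable is available on PATH.
    """
    proc = subprocess.run(
        ["ezcdb", "get", "-n", key, cdb_path],
        capture_output=True,
    )
    if proc.returncode == 0:
        return proc.stdout          # -n: value only, no trailing newline
    if proc.returncode == 1:
        return None                 # key not found is not an error
    detail = proc.stderr.decode(errors="replace").strip()
    raise RuntimeError(
        "ezcdb get: %s: %s" % (describe_exit("get", proc.returncode), detail)
    )


print(describe_exit("get", 1))
print(describe_exit("dupes", 1))
print(describe_exit("make", 111))
```

       Treating status 1 as an ordinary return value rather than an
       exception keeps a missing key from being mistaken for a usage or
       system error.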
AUTHOR
       Wayne Marshall, http://b0llix.net/ezcdb/

SEE ALSO
       cdb(5), http://cr.yp.to/cdb.html

ezcdb-0.01                       December 2010                        ezcdb(1)