ezcdb(1) ezcdb ezcdb(1)
NAME
ezcdb - constant database (cdb) multitool
SYNOPSIS
ezcdb [-hV] command opts args
ezcdb cross [-h] [-p perms ] [-t tmpfile ] result.cdb A.cdb B.cdb
ezcdb dump [-h] [-d del | -g | -x] [ cdb ]
ezcdb dupes [-h] [-d del | -g | -x] [-q] [ cdb ]
ezcdb get [-h] [-a | -j num ] [-n] key [ cdb ]
ezcdb grep [-h] [-d del | -g | -x] [-i] [-k] [-!] regex [ cdb ]
ezcdb keys [-h] [-g | -x] [-X] [ cdb ]
ezcdb make [-h] [-d del | -g | -x] [-p perms ] [-t tmpfile ] cdb
ezcdb match [-h] [-d del | -g | -x] [-i] [-k] [-!] pattern [ cdb ]
ezcdb merge [-h] [-p perms ] [-t tmpfile ] [-a] result.cdb A.cdb B.cdb
ezcdb purge [-h] [-p perms ] [-t tmpfile ] result.cdb A.cdb B.cdb
ezcdb stats [-h] [-v] [ cdb ]
DESCRIPTION
ezcdb is used to generate, query, analyze, and operate on constant
database files in cdb(5) format. The following operations are sup-
ported:
cross
ezcdb cross performs a set intersection operation on A.cdb and B.cdb and
compiles the result in result.cdb. The cross operation selects records
in A with matching keys in B. The record sequence in result.cdb pre-
serves the original sequence among records in A. Any/all of the named
argument files may ``overlap''. Options:
-p perms
Permissions. Set the file creation permissions for result.cdb
as explicitly given in the octal argument perms. Otherwise, the
result.cdb file permissions will be set to mode 0666 and modi-
fied by the process umask.
-t tmpfile
Tempfile. Use the path specified by the argument tmpfile for
the temporary file used during the creation of result.cdb. Nor-
mally the temporary file name is constructed from the result.cdb
argument as result.cdb.{new}.
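As a rough model of the cross semantics, the following Python sketch treats a cdb as an ordered list of (key, value) byte-string pairs. This in-memory representation is an assumption for illustration only; it does not read real cdb files. The sketch keeps each record of A whose key occurs anywhere in B, in A's original order, and ignores B's values.

```python
from typing import List, Tuple

# Hypothetical in-memory model: a cdb is an ordered list of (key, value) pairs.
Record = Tuple[bytes, bytes]


def cross(a: List[Record], b: List[Record]) -> List[Record]:
    """Model of `ezcdb cross`: keep records of A whose key also occurs in B."""
    b_keys = {key for key, _ in b}  # only B's keys matter; B's values are ignored
    return [rec for rec in a if rec[0] in b_keys]  # A's order is preserved


A = [(b"one", b"1"), (b"two", b"2"), (b"three", b"3")]
B = [(b"three", b"x"), (b"one", b"y")]
print(cross(A, B))  # [(b'one', b'1'), (b'three', b'3')]
```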
dump
ezcdb dump lists to stdout all records found in the given cdb file (or
seekable input on stdin). Output format is under control of options
and described in the FORMATS section. Options:
-d del Delimiter. Output records in ``getline'' format, using the
first character in the argument del as separator between the key
and value parts of the record. Implies -g.
-g Getline. Output records in ``getline'' format.
-x Exchange. Output records in ``exchange'' format.
dupes
ezcdb dupes lists to stdout all records with duplicate keys found in
the given cdb file (or seekable input on stdin), and prints a summary
report to stderr. Because the dupes command sequentially scans each
record and hash value in the file, it may also be used to validate the
integrity of the database. Output format is under control of options
and described in the FORMATS section. Options:
-d del Delimiter. Output records in ``getline'' format, using the
first character in the argument del as separator between the key
and value parts of the record. Implies -g.
-g Getline. Output records in ``getline'' format.
-x Exchange. Output records in ``exchange'' format.
-q Quick/quiet. Suppress all output and summary reporting.
Stop the scan and exit with status 1 at the first duplicate
key found in cdb. Otherwise, exit with status 0 if no
duplicates are found.
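The exit-status behavior of -q can be modeled with a short-circuiting scan. This Python sketch assumes the same hypothetical list-of-(key, value) model of a cdb; it reproduces only the 0/1 result, not the hash-table validation that the real dupes command performs while scanning.

```python
from typing import Iterable, Tuple


def dupes_quick(records: Iterable[Tuple[bytes, bytes]]) -> int:
    """Model of `ezcdb dupes -q`: 1 at the first duplicate key, else 0."""
    seen = set()
    for key, _value in records:
        if key in seen:
            return 1  # short-circuit: no need to scan further
        seen.add(key)
    return 0
```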
get
ezcdb get looks up key in cdb (or seekable input on stdin) and, if
found, writes the associated record value to stdout. Exits zero if
lookup is successful. Exits non-zero (1) if key not found. Options:
-a All. Write all record values with matching key. A newline is
appended to each record value output. Normally, only the first
record matching key is output.
-j num Jump. Skip the first num matches for key before writing the
value of the (num+1)th matching record, if any.
-n Newline suppressed. Write only the record value without append-
ing a newline. Normally, a newline is appended to each record
in the output.
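The interplay of -a, -j, and -n can be sketched in Python against a hypothetical in-memory list of (key, value) records. Two behaviors here are assumptions not stated above: a -j count that skips past the last match is treated like a missing key (exit 1), and the newline flag applies to -a output as well.

```python
from typing import List, Tuple


def get(records: List[Tuple[bytes, bytes]], key: bytes,
        all_values: bool = False, jump: int = 0,
        newline: bool = True) -> Tuple[int, bytes]:
    """Model of `ezcdb get`: returns (exit_status, bytes written to stdout)."""
    matches = [value for k, value in records if k == key]
    if all_values:
        selected = matches                   # -a: every matching value
    else:
        selected = matches[jump:jump + 1]    # -j num: the (num+1)th match, if any
    if not selected:
        return 1, b""                        # key not found
    suffix = b"\n" if newline else b""       # -n suppresses the newline
    return 0, b"".join(value + suffix for value in selected)
```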
grep
ezcdb grep lists to stdout all records found in the given cdb file (or
seekable input on stdin) matching the re_format(7) extended regular
expression given in regex. Note that the regex argument may need to be
quoted to inhibit unwanted expansion by the shell. Output format is
under control of options and described in the FORMATS section. Exits
zero if one or more matches found. Exits non-zero (1) if no match
found. Options:
-d del Delimiter. Output records in ``getline'' format, using the
first character in the argument del as separator between the key
and value parts of the record. Implies -g.
-g Getline. Output records in ``getline'' format.
-x Exchange. Output records in ``exchange'' format.
-i Case insensitive. Perform case insensitive matching. Normally
the regular expression is matched case sensitively.
-k Key match. Perform the regular expression matching against
record keys. Normally the regular expression is matched against
record values.
-! Invert (logical not). Select and output the records that do not
match the given regular expression. Normally the matching
records are output.
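The selection logic of grep (unanchored search, -i, -k, -!) can be approximated as below. Note the substitution: Python's re module is used in place of re_format(7) extended regular expressions, and the two dialects differ in some details, so this is a sketch of the option semantics rather than an exact reimplementation. Records use the same hypothetical (key, value) list model; grep would exit 0 if the returned list is non-empty and 1 otherwise.

```python
import re
from typing import List, Tuple

Record = Tuple[bytes, bytes]


def grep(records: List[Record], regex: bytes, ignore_case: bool = False,
         key_match: bool = False, invert: bool = False) -> List[Record]:
    """Approximate model of `ezcdb grep` using Python's re (not POSIX ERE)."""
    pattern = re.compile(regex, re.IGNORECASE if ignore_case else 0)  # -i
    selected = []
    for key, value in records:
        target = key if key_match else value            # -k: match keys
        found = pattern.search(target) is not None      # unanchored substring search
        if found != invert:                             # -!: keep non-matches
            selected.append((key, value))
    return selected
```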
keys
ezcdb keys lists to stdout all keys found in the given cdb file (or
seekable input on stdin). Output format is under control of options
and modified slightly from the descriptions in the FORMATS section. By
default, keys are listed in a modified default format:
+klen:key\n
An empty line terminates the default output sequence. Options:
-g Getline. Output keys listed in a modified ``getline'' format:
key\n
-x Exchange. Output keys listed in a modified ``exchange'' format:
klen:key\n
-X Hash (hexadecimal). Following each key, display the hash value
computed for the key in hexadecimal format.
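For reference, the hash function defined by the cdb format starts from 5381 and, for each byte c of the key, computes h = ((h << 5) + h) ^ c, truncated to 32 bits. In a cdb file, h mod 256 selects one of 256 hash tables and (h >> 8) selects the starting slot within that table. The Python sketch below computes this value; whether -X prints it zero-padded to eight hex digits, as the usage line does, is an assumption.

```python
def cdb_hash(key: bytes) -> int:
    """cdb hash: h = ((h << 5) + h) ^ c for each byte c, starting at 5381."""
    h = 5381
    for c in key:  # iterating over bytes yields ints 0..255
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF  # keep h within 32 bits
    return h


print(format(cdb_hash(b"a"), "08x"))  # 0002b5c4
```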
make
ezcdb make generates the cdb file cdb from formatted input read on
stdin. Input format is under control of options and described in the
FORMATS section. Options:
-d del Delimiter. Input records in ``getline'' format, using the first
character in the argument del as separator between the key and
value parts of the record. Implies -g.
-g Getline. Input records in ``getline'' format.
-x Exchange. Input records in ``exchange'' format.
-p perms
Permissions. Set the file creation permissions for cdb as
explicitly given in the octal argument perms. Otherwise, the
cdb file permissions will be set to mode 0666 and modified by
the process umask.
-t tmpfile
Tempfile. Use the path specified by the argument tmpfile for
the temporary file used during the creation of cdb. Normally
the temporary file name is constructed from the cdb argument as
cdb.{new}.
match
ezcdb match lists to stdout all records found in the given cdb file (or
seekable input on stdin) matching the simple wildcard expression given
in pattern. A pattern is a character string composed in any combina-
tion of:
o any character (excepting `?' and `*'), matched literally
o the `?' (question-mark) character, matching any single char-
acter
o the `*' (asterisk) character, matching any sequence of zero
or more characters
Note that the pattern expression argument is a simplified subset of the
sh(1) globbing rules provided by fnmatch(3), and does not provide for
range expressions or escaping of metacharacters. (The ezcdb grep com-
mand may be used whenever more sophisticated pattern matching expres-
sions are required.)
Note also that the pattern argument may need to be quoted to inhibit
unwanted expansion by the shell. Output format is under control of
options and described in the FORMATS section. Exits zero if one or
more matches found. Exits non-zero (1) if no match found. Options:
-d del Delimiter. Output records in ``getline'' format, using the
first character in the argument del as separator between the key
and value parts of the record. Implies -g.
-g Getline. Output records in ``getline'' format.
-x Exchange. Output records in ``exchange'' format.
-i Case insensitive. Perform case insensitive matching. Normally
the wildcard expression is matched case sensitively.
-k Key match. Perform the wildcard matching against record keys.
Normally the wildcard expression is matched against record val-
ues.
-! Invert (logical not). Select and output the records that do not
match the given wildcard expression. Normally the matching
records are output.
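The wildcard rules above can be modeled by translating a pattern into an anchored regular expression. This Python sketch deliberately avoids the standard fnmatch module, because fnmatch also honors `[...]' ranges, which match patterns do not. Every character other than `?' and `*' is escaped so that it matches literally, and the whole target must match, reflecting the implicit anchoring of match patterns.

```python
import re


def wildcard_to_regex(pattern: str, ignore_case: bool = False):
    """Translate a match pattern: `?' -> any char, `*' -> any run, else literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # `[', `\\', etc. are literal here
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.compile("".join(parts), flags)


def match(pattern: str, text: str, ignore_case: bool = False) -> bool:
    # fullmatch gives the implicit anchoring against the whole key or value.
    return wildcard_to_regex(pattern, ignore_case).fullmatch(text) is not None
```

For example, match("*foo*", "xfoox") is true, while match("foo", "xfoo") is false.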
merge
ezcdb merge performs a set union operation with A.cdb and B.cdb and
compiles the result in result.cdb. The merge operation combines all
records in A and B, excluding by default any records in A with matching
keys in B. Note that the -a option may be used for a ``union all''
operation. The record sequence in result.cdb includes A records before
B records, and original sequence is preserved among records in A and B.
Any/all of the named argument files may ``overlap''. Options:
-p perms
Permissions. Set the file creation permissions for result.cdb
as explicitly given in the octal argument perms. Otherwise, the
result.cdb file permissions will be set to mode 0666 and modi-
fied by the process umask.
-t tmpfile
Tempfile. Use the path specified by the argument tmpfile for
the temporary file used during the creation of result.cdb. Nor-
mally the temporary file name is constructed from the result.cdb
argument as result.cdb.{new}.
-a All. Merge all records from A and B, not excluding any matching
keys. Normally any records in A with matching keys in B are
excluded.
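Using the same hypothetical list-of-(key, value) model of a cdb, the merge rule can be sketched as: the records of A whose keys do not appear in B, in A's order, followed by all records of B in B's order. With union_all=True (modeling -a), nothing is excluded.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]  # hypothetical in-memory model of a cdb record


def merge(a: List[Record], b: List[Record], union_all: bool = False) -> List[Record]:
    """Model of `ezcdb merge` (union_all=True models the -a option)."""
    if union_all:
        return list(a) + list(b)  # "union all": keep every record
    b_keys = {key for key, _ in b}
    # A records whose key is absent from B, in A's order, then all of B.
    return [rec for rec in a if rec[0] not in b_keys] + list(b)


A = [(b"k1", b"a1"), (b"k2", b"a2")]
B = [(b"k2", b"b2"), (b"k3", b"b3")]
print(merge(A, B))  # [(b'k1', b'a1'), (b'k2', b'b2'), (b'k3', b'b3')]
```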
purge
ezcdb purge performs a set minus (exclude) operation with A.cdb and
B.cdb and compiles the result in result.cdb. The purge operation
selects only records in A without matching keys in B. The record
sequence in result.cdb preserves the original sequence among records in
A. Any/all of the named argument files may ``overlap''. Options:
-p perms
Permissions. Set the file creation permissions for result.cdb
as explicitly given in the octal argument perms. Otherwise, the
result.cdb file permissions will be set to mode 0666 and modi-
fied by the process umask.
-t tmpfile
Tempfile. Use the path specified by the argument tmpfile for
the temporary file used during the creation of result.cdb. Nor-
mally the temporary file name is constructed from the result.cdb
argument as result.cdb.{new}.
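The purge rule is the complement of cross. In the same hypothetical in-memory model, it keeps only the records of A whose keys are absent from B, preserving A's order.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]  # hypothetical in-memory model of a cdb record


def purge(a: List[Record], b: List[Record]) -> List[Record]:
    """Model of `ezcdb purge`: records of A with no matching key in B."""
    b_keys = {key for key, _ in b}
    return [rec for rec in a if rec[0] not in b_keys]  # A's order is preserved


A = [(b"k1", b"a1"), (b"k2", b"a2")]
B = [(b"k2", b"b2"), (b"k3", b"b3")]
print(purge(A, B))  # [(b'k1', b'a1')]
```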
stats
ezcdb stats scans the cdb file (or seekable input on stdin) and prints
some summary statistics to stdout. Options:
-v Verbose. Some additional information is included in the sum-
mary.
FORMATS
The ezcdb utility accepts the following formats for record input/out-
put:
default
The default record format follows the original cdbmake specification
and is described as:
+klen,dlen:key->data\n
Where: klen and dlen are the key length and data length, respectively,
in decimal ASCII notation; key and data are any arbitrary character
sequences for the record key and data; each record begins with a lit-
eral `+' character; each record ends with a newline; and the `,', `:',
and `->' separators are literal characters. A sequence of records is
terminated by a final empty line.
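Because the key and data lengths are given explicitly, the default format can carry arbitrary bytes, including newlines and `->' sequences, inside keys and data. The Python sketch below writes and parses this format. The function names are hypothetical, and error handling is reduced to raising ValueError on malformed input.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]


def encode_default(records: List[Record]) -> bytes:
    """Write records as +klen,dlen:key->data\\n, ending with an empty line."""
    out = bytearray()
    for key, data in records:
        out += b"+%d,%d:" % (len(key), len(data))
        out += key + b"->" + data + b"\n"
    out += b"\n"  # final empty line terminates the sequence
    return bytes(out)


def decode_default(stream: bytes) -> List[Record]:
    """Parse the default format, relying on the lengths rather than delimiters."""
    records, pos = [], 0
    while True:
        if stream[pos:pos + 1] == b"\n":
            return records  # empty line: end of sequence
        if stream[pos:pos + 1] != b"+":
            raise ValueError("expected '+' at offset %d" % pos)
        comma = stream.index(b",", pos)    # raises ValueError if missing
        colon = stream.index(b":", comma)
        klen, dlen = int(stream[pos + 1:comma]), int(stream[comma + 1:colon])
        kstart = colon + 1
        key = stream[kstart:kstart + klen]
        sep = kstart + klen
        if stream[sep:sep + 2] != b"->":
            raise ValueError("expected '->' at offset %d" % sep)
        dstart = sep + 2
        data = stream[dstart:dstart + dlen]
        end = dstart + dlen
        if stream[end:end + 1] != b"\n":
            raise ValueError("expected newline at offset %d" % end)
        records.append((key, data))
        pos = end + 1
```

For example, encode_default([(b"one", b"Hello")]) produces b"+3,5:one->Hello\n\n".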
getline [-g]
The ``getline'' record format is selected with the -g option and is
described as:
key\tdata\n
Where: key and data are any arbitrary character sequences (excepting
nul and newline) for the record key and data; separated by default with
a single tab character; and each record is terminated with a newline.
The separator must not itself appear within any key, but may appear
within data. An alternative separator character between key and data
may be specified by the first character in the del argument to the -d
option. Lines beginning with a `#' character are ignored. A sequence
of records is terminated by eof. Note that this format will not be
usable in cases where the input records may themselves contain the nul
or newline characters.
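Since the separator may appear within data but not within keys, a getline record is split at the first separator on the line. The Python sketch below parses getline input under that rule, skipping `#' comment lines. Rejecting a line that contains no separator (including an empty line) is an assumption of this sketch, not documented ezcdb behavior.

```python
from typing import List, Tuple


def parse_getline(text: bytes, delim: bytes = b"\t") -> List[Tuple[bytes, bytes]]:
    """Parse key<sep>data lines; only the first byte of delim is used (as -d)."""
    sep = delim[:1]
    lines = text.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty tail after the final newline
    records = []
    for lineno, line in enumerate(lines, 1):
        if line.startswith(b"#"):
            continue  # comment line
        key, found, data = line.partition(sep)  # split at the FIRST separator
        if not found:
            raise ValueError("line %d: missing separator" % lineno)  # assumption
        records.append((key, data))
    return records
```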
exchange [-x]
The ``exchange'' record format is selected with the -x option and is
described as:
klen:dlen\tkey:data\n
Where: klen and dlen are the key length and data length, respectively,
in decimal ASCII notation; key and data are any arbitrary character
sequences for the record key and data; each record ends with a newline;
and the `:' and tab separators are literal characters. A sequence of
records is terminated by eof.
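Like the default format, the exchange format is length-prefixed, so keys and data may contain tabs, colons, or newlines. The Python sketch below writes and parses it. The function names are hypothetical, and malformed input raises ValueError.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]


def encode_exchange(records: List[Record]) -> bytes:
    """Write records as klen:dlen\\tkey:data\\n; the sequence ends at eof."""
    return b"".join(b"%d:%d\t%s:%s\n" % (len(k), len(d), k, d) for k, d in records)


def decode_exchange(stream: bytes) -> List[Record]:
    records, pos = [], 0
    while pos < len(stream):
        tab = stream.index(b"\t", pos)  # first tab ends the klen:dlen header
        klen_s, _, dlen_s = stream[pos:tab].partition(b":")
        klen, dlen = int(klen_s), int(dlen_s)
        kstart = tab + 1
        key = stream[kstart:kstart + klen]
        if stream[kstart + klen:kstart + klen + 1] != b":":
            raise ValueError("expected ':' after key at offset %d" % (kstart + klen))
        dstart = kstart + klen + 1
        data = stream[dstart:dstart + dlen]
        end = dstart + dlen
        if stream[end:end + 1] != b"\n":
            raise ValueError("expected newline at offset %d" % end)
        records.append((key, data))
        pos = end + 1
    return records
```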
MISCELLANEOUS
This section contains some additional notes and comments regarding
operations on constant databases.
Duplicate Keys
The cdb format imposes no constraints on the presence of duplicate keys
in a database. For example, applications may use duplicate keys to
represent one-to-many relationships between keys and values.
Other applications may require a unique key constraint, modeling a
strict one-to-one relationship between keys and values. In such cases,
the ezcdb make operation will not itself screen input for duplicate
keys. However, the ezcdb dupes operation may be used to test for the
presence of duplicate keys in a cdb file, and is generally more
efficient than other methods of pre-screening input for duplicates.
A simple front-end script is suggested as one means to implement a
duplicate key constraint on ezcdb make by checking its output with
ezcdb dupes before atomically moving the file (with mv(1) or rename(2))
to its intended destination.
Set Operations
The ezcdb utility provides some basic set operations on cdb files with
the cross, merge, and purge commands. Letting ``0'' represent the
null/empty set, and imposing a unique key constraint on each of the set
operands A and B, the following relations hold:
A merge A = A
A merge 0 = A
A purge A = 0
A purge 0 = A
A cross A = A
A cross 0 = 0
for((A cross B) == 0): (A merge B) == (B merge A), with respect to
membership but not to order
(A merge all B) == (B merge all A), with respect to membership but
not to order
for(A != B): (A purge B) != (B purge A)
for((A cross B) == 0): (A purge B) = A, (B purge A) = B
All ezcdb operations are stable with respect to maintaining original
insertion order, and records in operand A are inserted before records
in operand B.
When the unique key constraint is relaxed for A and B, the results of
the set operations remain well defined and predictable, but some
additional consideration may be needed to understand the outcome. For
example: given the operation A merge B, whenever a duplicate key exists
across multiple records in A, and such key is found in only a single
record in B, all records in A with that key are excluded from the
result, and the result will contain only the single record with match-
ing key from B. Considering the inverse B merge A operation for this
example, the result will now exclude the single record with matching
key from B, and all records with that key in A will now be included.
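The duplicate-key example above can be checked with a small Python model of merge. As elsewhere, the list-of-(key, value) representation is a hypothetical stand-in for a cdb file. Because exclusion is decided by key alone, one B record with key k removes every A record with key k, and the inverse operation does the opposite.

```python
from typing import List, Tuple

Record = Tuple[bytes, bytes]  # hypothetical in-memory model of a cdb record


def merge(a: List[Record], b: List[Record]) -> List[Record]:
    """Model of `ezcdb merge` without -a."""
    b_keys = {key for key, _ in b}
    return [rec for rec in a if rec[0] not in b_keys] + list(b)


A = [(b"k", b"a1"), (b"k", b"a2"), (b"z", b"a3")]  # duplicate key k in A
B = [(b"k", b"b1")]                                # single record with key k in B
print(merge(A, B))  # [(b'z', b'a3'), (b'k', b'b1')]
print(merge(B, A))  # [(b'k', b'a1'), (b'k', b'a2'), (b'z', b'a3')]
```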
Grep vs. Match
ezcdb provides two ways to scan a constant database for pattern
matches, with the commands grep and match. While neither of these is
nearly as fast as performing exact key matches with the get command,
grep and match do permit useful constant database queries in those
instances where exact key matches are not otherwise possible.
In most cases, the match command will be preferred to grep because it
is simpler to use and faster in execution. Especially for the novice
user, match patterns are easy to compose and resemble common wildcard
globbing expressions. But grep is much more capable when complex
matches are required, and may be useful for performing sophisticated
queries.
One additional difference between grep and match should be noted.
match patterns are implicitly anchored to match against the full
record, whereas grep patterns require the explicit use of the `^' and
`$' metacharacters whenever matching against the full record is
required. Otherwise, a grep pattern will match against any substring
found within a record, while, conversely, a match pattern will need
leading and trailing `*' asterisk characters to match against any sub-
string within a record.
Neither grep nor match will be particularly useful in querying any cdb
file that includes records containing the nul and/or newline charac-
ters.
Seekable Input
The commands dump, dupes, get, grep, keys, match, and stats will read
either a cdb file argument directly, or stdin. If input is read on
stdin, it must be ``seekable'', that is, permit the lseek(2) operation.
Piped input is normally not seekable and the command will fail with ESPIPE.
On the other hand, the sh(1) input redirection operator (`<') will usu-
ally succeed if the underlying device supports lseek(2) operations.
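Whether a given input descriptor is seekable can be probed the same way, with a no-op lseek(2) call. This Python sketch uses os.lseek on a file descriptor and treats ESPIPE as "not seekable". Any other error is re-raised. It is a diagnostic aid only, not part of ezcdb.

```python
import errno
import os


def is_seekable(fd: int) -> bool:
    """Probe fd with a no-op lseek(2); pipes, FIFOs, and sockets fail ESPIPE."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)  # reports the current offset, moves nothing
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        raise
    return True
```

Calling is_seekable(0) from a script whose stdin is redirected from a file with `<' would typically return true, while the same script fed through a pipe would return false.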
LIMITS
All cdb(5) sizes are constrained to the range limits inherent in
32-bit unsigned integers, with a maximum file size of (2^32 - 1) bytes
(4 gigabytes). Record keys and values may be of any length within
these constraints, less the overhead required by the cdb(5) file
format. Unless the ``getline'' input format is used, the ezcdb make
operation does not require individual keys and values to fit into local
memory. When using the ``getline'' input format, enough local memory
is required to hold the largest single line (key\tvalue\n) encountered
in the input. Beyond this, the ezcdb make operation requires about 12
bytes of memory overhead per input record.
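The on-disk overhead of the cdb format can be estimated from its layout. A cdb file has a 2048-byte header holding 256 (position, length) pairs. Each record adds 4-byte key and data lengths, and each hash-table slot takes 8 bytes. This Python sketch further assumes that ezcdb follows the cdbmake convention of two hash-table slots per record, giving 24 bytes of overhead per record beyond the key and data.

```python
from typing import List, Tuple

MAX_CDB_SIZE = 2**32 - 1  # 32-bit unsigned offsets


def estimate_cdb_size(records: List[Tuple[bytes, bytes]]) -> int:
    """Estimated cdb file size, assuming 2 hash slots per record (cdbmake style)."""
    header = 256 * 8  # 256 (position, length) pairs of 4-byte integers
    body = sum(8 + len(k) + len(d) for k, d in records)  # klen, dlen, key, data
    slots = 2 * len(records) * 8  # assumption: 2 slots/record, 8 bytes each
    return header + body + slots


def fits(records: List[Tuple[bytes, bytes]]) -> bool:
    return estimate_cdb_size(records) <= MAX_CDB_SIZE
```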
OPTIONS
The specific options relative to each command are described above.
ezcdb also recognizes the following general options:
-h Help. Display a brief help message to stderr and exit. When
the -h option follows command, the help message is specific to
the requested operation.
-V Version. Display version information to stderr and exit.
EXIT STATUS
ezcdb exits with one of the following values:
0 Success.
1 Boolean negative result, interpreted according to the command:
for get, grep, and match, no match was found; for dupes in -q
(``quick/quiet'') mode, duplicates were found.
100 Usage error. An error was encountered among options or argu-
ments to the command. In this case, ezcdb prints a brief diag-
nostic message to stderr on exit.
111 System failure. The command failed to complete due to some sys-
tem, protocol, or resource error. In this case, ezcdb prints a
brief diagnostic message to stderr on exit.
AUTHOR
Wayne Marshall, http://b0llix.net/ezcdb/
SEE ALSO
cdb(5), http://cr.yp.to/cdb.html
ezcdb-0.01 December 2010 ezcdb(1)