Djoin [ -k key-flags] [ -o output-spec] key-field-list input-file [input-file..]
Djn is the short name (UNIX only) for Djoin.
Djoin joins the input files on the key values given by key-field-list. When only one input-file is given, standard input is used as the second input file. Three or more input-files may be given; they are all joined together. All input files must be sorted in key-field-list order; otherwise Djoin reports an error and terminates.
All input files are read in parallel. Records that have the same key value are grouped, and the output record(s) are composed from these records as follows.
To demonstrate the above rules with examples, we introduce the Dl constant notation for presenting D-records.
{ a:1 b:2 }
is the same as
a:1
b:2
which represents a D-record with field "a" value 1 and field "b" value 2.
When file1 has { k:K a:1 }, file2 has { b:1 k:K }, and file3 has { c:1 k:K }, then
Djoin k file1 file2 file3
will produce the record
{ k:K a:1 b:1 c:1 }
When the input records are:
file1: { k:K a:1 }, { k:K a:2 }
file2: { k:K b:1 }
file3: { k:K c:1 }, { k:K c:2 }
the output records are
{ k:K a:1 b:1 c:1 }
{ k:K a:1 b:1 c:2 }
{ k:K a:2 b:1 c:1 }
{ k:K a:2 b:1 c:2 }
in this order.
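The grouping and ordering shown above can be modelled with a short Python sketch. This is an illustration of the documented behaviour only, not Djoin's implementation. The function names `key_of` and `join_sorted` are hypothetical, records are represented as lists of (field, value) pairs so that a field name may repeat, and placing the key field first in each output record is an assumption taken from the examples.

```python
from itertools import groupby, product

def key_of(record, key):
    # A record is a list of (field, value) pairs; return the key field's value.
    return next(v for f, v in record if f == key)

def join_sorted(key, *files):
    """Join record lists that are already sorted by `key`.

    Only fully matched key groups are emitted, mirroring the default
    behaviour described in this page.
    """
    # Group each file's records by key value (inputs are assumed sorted).
    groups = [
        {k: list(g) for k, g in groupby(recs, key=lambda r: key_of(r, key))}
        for recs in files
    ]
    out = []
    for k in groups[0]:
        if all(k in g for g in groups[1:]):
            # Every combination of one record per file; the last file
            # varies fastest, which reproduces the order shown above.
            for combo in product(*(g[k] for g in groups)):
                rec = [(key, k)]
                for r in combo:
                    rec.extend((f, v) for f, v in r if f != key)
                out.append(rec)
    return out

file1 = [[("k", "K"), ("a", "1")], [("k", "K"), ("a", "2")]]
file2 = [[("k", "K"), ("b", "1")]]
file3 = [[("k", "K"), ("c", "1")], [("k", "K"), ("c", "2")]]
for rec in join_sorted("k", file1, file2, file3):
    print(rec)
```

Two records in file1 and two in file3 give 2 x 1 x 2 = 4 output records for the key value K.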
Note that Djoin does not check whether the joined files share field names. When they do, the shared name becomes a repeating field in the output. To distinguish such fields, rename them beforehand.
By default, Djoin outputs only fully matched records (i.e., key-value groups that contain records from all input files). However, an output-spec given with the -o option can select an arbitrary combination of matches.
Output-spec is a string whose i-th character corresponds to the i-th input-file. Each character has the following meaning:
More than one -o option may be given. In that case, the output is the union of all output-specs.
The join key is specified with the general key-field-list of D-commands. Two or more fields may be used, with numeric, case-insensitive, or reverse-order matching.
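The union rule can be sketched in Python as follows. The meanings of the individual characters are not reproduced in this page, so the sketch assumes, based only on the examples further down, that "1" requires a record from the corresponding input file, "0" requires that there be none, and any other character (such as "x") accepts either case. The function names `spec_matches` and `selected` are hypothetical.

```python
def spec_matches(spec, present):
    """Return True if one output-spec accepts a key group.

    spec    -- string such as "10" or "1x", one character per input file
    present -- list of booleans, True if that file has a record for the key
    """
    for ch, has in zip(spec, present):
        if ch == "1" and not has:
            return False
        if ch == "0" and has:
            return False
        # Assumption: any other character means "don't care".
    return True

def selected(specs, present):
    # Several -o options: output the group if any spec accepts it (union).
    return any(spec_matches(s, present) for s in specs)

# With -o 10 -o 01, only keys found in exactly one of two files are selected.
print(selected(["10", "01"], [True, False]))
print(selected(["10", "01"], [True, True]))
```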
However, there is no way to join D-files whose key fields have different names. To use Djoin, you must first give the key field the same name in all input files, by means of Drename or other commands.
Assume a file "countrycode.d" contains records like:
countrycode:jp
countryname:Japan
countrycode:us
countryname:United States
File "city.d" has records like:
city:Tokyo
countrycode:jp
city:New York
countrycode:us
...
If both files are sorted by "countrycode", then
Djoin -o 1x countrycode city.d countrycode.d
adds the "countryname" field to the "city.d" records:
city:Tokyo
countrycode:jp
countryname:Japan
city:New York
countrycode:us
countryname:United States
...
When a countrycode is not found in the "countrycode.d" file, the "city.d" records are output unchanged.
Assume file "stopwds.d" contains records like:
wd:of
wd:the
...
and is sorted by wd:f (case-insensitive alphabetical order).
File "words.d" has records like:
wd:Djoin
wd:joins
wd:the
...
To pick only the non-stop words:
Dsort wd:f words.d | Djoin -o 10 wd:f - stopwds.d
Output only unmatched records from two input files:
Djoin -o 10 -o 01 key-field input-file-1 input-file-2
The key-field-list can be an empty list:
Djoin "" input-file-1 input-file-2
This operation produces the cross product of input-file-1 and input-file-2. Note, however, that you need enough memory to hold both files.
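As a rough model of the empty-key case, the Python sketch below pairs every record of one list with every record of the other. It is an illustration, not Djoin's implementation; the function name `cross_join` and the sample records are hypothetical, and records are again lists of (field, value) pairs.

```python
from itertools import product

def cross_join(file1, file2):
    # With an empty key list every record belongs to the same group, so
    # each record of file1 is combined with each record of file2.
    return [r1 + r2 for r1, r2 in product(file1, file2)]

left = [[("a", "1")], [("a", "2")]]
right = [[("b", "1")], [("b", "2")], [("b", "3")]]
pairs = cross_join(left, right)
print(len(pairs))
```

The output holds len(file1) x len(file2) records, which is why the size of both inputs matters.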
See the manual of D_msg.
MIYAZAWA Akira