Dfreq [ options ] [ -g group-by-key-field-list ] key-field-list [ input-file.. ]
Dfreq makes frequency count records for the fields given in key-field-list. An output record consists of the fields in the key-field-list and a predefined field "count". Each key value appears only once in the output, and the "count" field value is the number of D-records having that key value. Output records are ordered by key-field-list, in the same order that Dsort uses.
Values of the key fields in the output may be altered depending on key flags as follows: the f flag converts lowercase letters to uppercase, the n flag converts values to normalized numeric form, the d flag removes delimiters from the values, and the i flag removes non-printing characters. The key fields in the output follow the order in the key-field-list; the field order of the input records is not preserved.
Output records also have a "percent" field when the -p option is given.
By default, a missing value (i.e. an input record without any of the key fields) is not counted as input. Such records can be counted by giving the -m option. The output record for the missing value has only the "count" field (and the "percent" field, if requested), without any key fields.
When there is more than one input file, Dfreq reads them as one consecutive file. When the -F option is given, however, Dfreq makes frequency count records for each input file separately, adding a "filename" field to each output record.
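The counting rules above can be sketched in Python. This is an illustrative model only, not Dfreq's implementation. The function name freq and its parameters are hypothetical, D-records are modeled as dictionaries, plain sorting of the key tuples stands in for the Dsort order, and the percent base is assumed to be the number of counted records (including missing values when they are counted).

```python
from collections import Counter

def freq(records, key_fields, count_missing=False, percent=False):
    """Count records per distinct key value (hypothetical model of Dfreq)."""
    counts = Counter()
    for rec in records:
        # Keep only the key fields present in this record, in key-list order.
        key = tuple((f, rec[f]) for f in key_fields if f in rec)
        if not key and not count_missing:
            continue  # none of the key fields present: skipped unless -m
        counts[key] += 1
    total = sum(counts.values())
    out = []
    for key in sorted(counts):  # assumption: plain sort stands in for Dsort order
        row = dict(key)  # key fields first, in key-field-list order
        row["count"] = counts[key]
        if percent:
            # assumption: percent is relative to all counted records
            row["percent"] = 100.0 * counts[key] / total
        out.append(row)
    return out

records = [
    {"lang": "en", "year": "1999"},
    {"lang": "ja", "year": "1999"},
    {"lang": "en", "year": "2001"},
    {"year": "2001"},  # no "lang" field: a missing value for key "lang"
]
result = freq(records, ["lang"], count_missing=True, percent=True)
```

In this model the record for the missing value carries only "count" and "percent", which mirrors the description above. Where that record appears in the real output is not specified here, so its position in the sketch should not be relied on.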
When -g group-by-key-field-list is given, Dfreq makes frequency count records each time it encounters a sequence break in the group-by fields, i.e. when the key value of the group-by fields differs from that of the previous record, or at the end of all input files. The group-by key fields are added to the output records. Generally (but not necessarily), this option is used after Dsort with the same group-by-key-field-list. In the group-by process, the "percent" field value (if any) is calculated from the records that share the same group-by key value.
In practice, the following two commands:
Dfreq -g a b
Dfreq a,b
generate the same result except for the "percent" field. (In the former case each group makes up 100%, while in the latter case the whole input makes up 100%.) The major difference between the two is memory usage. Dfreq keeps all key values in memory until it flushes the records out. In the group-by process, Dfreq flushes records each time the group-by key value changes; in the normal process, records are flushed only at the end of the last input file. The group-by process therefore uses less memory, which makes the -g option useful for very large input files, or when the key field takes a large number of distinct values.
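The flush-on-sequence-break behavior can be modeled with a short Python sketch. It is a hypothetical illustration rather than Dfreq's code. The name freq_grouped is invented, records are modeled as dictionaries that contain every group-by and key field, percent is always computed as if -p were given, and placing the group-by fields first in each output record is an assumption.

```python
from collections import Counter
from itertools import groupby

def freq_grouped(records, group_fields, key_fields):
    """Yield count records one group at a time (hypothetical model of -g)."""
    def group_of(rec):
        return tuple(rec[f] for f in group_fields)

    # groupby starts a new group whenever the group key changes, which
    # corresponds to a sequence break; only one group's counts are held
    # in memory at any time.
    for group, members in groupby(records, key=group_of):
        counts = Counter(tuple(rec[f] for f in key_fields) for rec in members)
        total = sum(counts.values())
        for key in sorted(counts):  # assumption: plain sort stands in for Dsort order
            row = dict(zip(group_fields, group))
            row.update(zip(key_fields, key))
            row["count"] = counts[key]
            row["percent"] = 100.0 * counts[key] / total  # each group makes 100%
            yield row

# Input already sorted by the group-by field "a", as after Dsort.
records = [
    {"a": "x", "b": "1"},
    {"a": "x", "b": "1"},
    {"a": "x", "b": "2"},
    {"a": "y", "b": "1"},
]
rows = list(freq_grouped(records, ["a"], ["b"]))

# Unsorted input: "x" reappears after "y", so it forms a second group.
unsorted_rows = list(freq_grouped(
    [{"a": "x", "b": "1"}, {"a": "y", "b": "1"}, {"a": "x", "b": "1"}],
    ["a"], ["b"]))
```

Because the sketch is a generator, each group's rows are produced before the next group is read. The unsorted case shows why the option is normally used after sorting: a group-by value that reappears later is counted as a separate group.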
Each output record has the following fields, in this order.
See the manual of D_msg.
MIYAZAWA Akira