Dmeans - calculation of maximum, minimum, average and standard deviation.



SYNOPSIS

Dmeans [ -Fr ] [ -g key-field-list ] [ -k key-flags ] field-list [ input-file.. ]

DESCRIPTION

Dmeans calculates basic statistics of the fields given by the field-list parameter. The output is a statistic D-record whose field names have the form

identifier.field-name

where identifier is "n", "sum", "min", "max", "avg" or "std", and field-name is the name of a field in the field-list whose statistics are calculated.

The identifiers have the following meanings:

n
number of values on which calculations are based (n)
sum
the sum of values
min
the smallest value
max
the largest value
avg
the average (or mean) value; i.e. sum/n
std
the standard deviation
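For values x_1, ..., x_n of one field, the identifiers correspond to the formulas below. The std formula is an assumption: this manual does not say whether std divides by n or by n-1. The sample form (n-1) is shown because std is output only when n is greater than one, which is consistent with, but does not prove, that choice.

```latex
\begin{aligned}
\mathrm{sum} &= \sum_{i=1}^{n} x_i, \qquad
\mathrm{avg} = \frac{\mathrm{sum}}{n}, \\
\mathrm{min} &= \min_{i} x_i, \qquad
\mathrm{max} = \max_{i} x_i, \\
\mathrm{std} &= \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \mathrm{avg}\right)^2}
\quad \text{(assumed sample form)}
\end{aligned}
```

For the values 2, 4 and 6: n = 3, sum = 12, avg = 4, min = 2, max = 6, and the assumed sample std is sqrt((4 + 0 + 4) / 2) = 2. The population form (dividing by n) would give sqrt(8/3), about 1.63.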

Repeating fields are generally not used in statistical analysis. However, when an input record contains repeating fields, Dmeans, unlike other D-commands, treats each occurrence as a separate value. For example, if the first record has two "a" fields, the second has one and the third has none, the output "n.a" value is three. This is the same result as when each of three records has one "a" field.
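The pooling of repeating fields can be sketched as follows. This is a minimal illustration, not Dmeans' implementation: it assumes a D-record can be modeled as an ordered list of (field-name, value) pairs, and the function name collect_values is hypothetical.

```python
from typing import List, Tuple

# ASSUMPTION: a D-record modeled as an ordered list of (field-name, value)
# pairs, so the same field name may occur more than once in a record.
Record = List[Tuple[str, str]]


def collect_values(records: List[Record], field: str) -> List[float]:
    """Pool every occurrence of `field` across all records (hypothetical helper)."""
    values: List[float] = []
    for record in records:
        for name, value in record:
            if name == field:
                values.append(float(value))
    return values


records = [
    [("a", "1"), ("a", "2")],  # first record: two "a" fields
    [("a", "3")],              # second record: one "a" field
    [("b", "9")],              # third record: no "a" field
]
print(len(collect_values(records, "a")))  # the n.a value
```

The count is three, exactly as if each of three records carried a single "a" field.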

When more than one input file is given, Dmeans reads them as one consecutive input. However, when the -F option is given, Dmeans makes a separate statistic record for each input file and adds a "filename" field to each statistic record.
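The difference between the default behavior and -F can be sketched like this. It is an illustration only: the input is assumed to be already parsed into a mapping from each globbed file name (in command-line order) to the values of one field, the file names are made up, and only the n statistic is shown.

```python
from typing import Dict, List


def stats_per_input(files: Dict[str, List[float]], field: str,
                    per_file: bool) -> List[Dict[str, object]]:
    """Hypothetical sketch: only n.field is computed, for brevity."""
    if not per_file:
        # Default: all files are read as one consecutive input.
        pooled = [v for values in files.values() for v in values]
        return [{f"n.{field}": len(pooled)}]
    # -F: one statistic record per file, led by a "filename" field.
    return [{"filename": name, f"n.{field}": len(values)}
            for name, values in files.items()]


files = {"jan.d": [1.0, 2.0], "feb.d": [3.0]}  # hypothetical file names
print(stats_per_input(files, "a", per_file=False))
print(stats_per_input(files, "a", per_file=True))
```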

When -g key-field-list is given, Dmeans outputs a statistic record each time it encounters a sequence break in the group-by fields (i.e., when the key value of the given fields differs from that of the previous record) and at the end of all input files. The group-by key fields are included in the output records. Typically (but not necessarily), this option is used after Dsort with the same key-field-list.
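The sequence-break rule behaves like Python's itertools.groupby, which starts a new group whenever the key changes and never sorts. The sketch below is an analogy rather than Dmeans' code; the field names "k" and "v" and the (key, value) row representation are assumptions.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def group_counts(rows: List[Tuple[str, float]]) -> List[Dict[str, object]]:
    """One statistic record per run of consecutive rows with an equal key.

    Rows are (k, v) pairs; "k" and "v" are hypothetical field names.
    """
    out: List[Dict[str, object]] = []
    for key, run in groupby(rows, key=lambda row: row[0]):
        values = [value for _, value in run]
        out.append({"k": key, "n.v": len(values), "sum.v": sum(values)})
    return out


rows = [("x", 1.0), ("x", 2.0), ("y", 5.0), ("x", 4.0)]
for rec in group_counts(rows):
    print(rec)
```

Because "x" reappears after "y", unsorted input yields two separate records for key "x"; sorting first (the role Dsort plays) merges them into one group.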

When the -r option is given, Dmeans makes a statistic record for each input record. In this case, the fields of each input record that are not in the field-list are also included in the output record. The -r option cannot be combined with the -g option. It is typically used for records with repeating fields.
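Per-record statistics can be sketched as below, again modeling a D-record as a list of (field-name, value) pairs. The helper name record_stats is hypothetical, only n and sum are computed, and the string formatting of numbers is an assumption; Dmeans' actual number format may differ.

```python
from typing import List, Tuple

# ASSUMPTION: a D-record as an ordered list of (field-name, value) pairs.
Record = List[Tuple[str, str]]


def record_stats(record: Record, field_list: List[str]) -> Record:
    """Hypothetical sketch of -r: statistics within a single input record."""
    # Fields not in the field-list are copied through in input order.
    out: Record = [(name, value) for name, value in record
                   if name not in field_list]
    for field in field_list:
        # Repeating occurrences of the field within this record are pooled.
        values = [float(v) for name, v in record if name == field]
        out.append((f"n.{field}", str(len(values))))
        if values:  # sum is present only when n > 0
            out.append((f"sum.{field}", str(sum(values))))
    return out


print(record_stats([("id", "r1"), ("a", "1"), ("a", "2"), ("b", "7")], ["a"]))
```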

OUTPUT RECORD

An output record has the following fields, in this order:

filename
present when the -F option is given; the value is the file name as it appears in the command arguments after shell globbing. This field does not exist when input is read from the standard input.
group-by key fields
present when the -g option is given; the field order follows the key-field-list, and the values may be altered depending on the key flags.
other fields
present when the -r option is in effect; the fields of the input record not listed in the field-list, in their input order.
statistic fields of the first field in field-list
identifier.field-name fields for the first field in the field-list, in the order described below.
statistic fields of the remaining fields in field-list
the same statistic fields, repeated for each remaining field in the field-list.

Statistic fields appear in the following order:

n.field-name
mandatory.
sum.field-name
when n is greater than zero.
avg.field-name
when n is greater than zero.
min.field-name
when n is greater than zero.
max.field-name
when n is greater than zero.
std.field-name
when n is greater than one.
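The field order and presence rules above can be sketched as follows. This is an illustrative model, not Dmeans' source: records are plain Python dicts (which preserve insertion order), the function names are hypothetical, and the std formula (dividing by n - 1) is an assumption the manual does not confirm.

```python
import math
from typing import Dict, List


def stat_fields(name: str, values: List[float]) -> Dict[str, float]:
    """Statistic fields for one field, in the documented output order.

    ASSUMPTION: std is the sample standard deviation (divide by n - 1).
    """
    n = len(values)
    fields: Dict[str, float] = {f"n.{name}": n}  # always present
    if n > 0:
        avg = sum(values) / n
        fields[f"sum.{name}"] = sum(values)
        fields[f"avg.{name}"] = avg
        fields[f"min.{name}"] = min(values)
        fields[f"max.{name}"] = max(values)
        if n > 1:  # std only when n > 1
            var = sum((v - avg) ** 2 for v in values) / (n - 1)
            fields[f"std.{name}"] = math.sqrt(var)
    return fields


def output_record(prefix: Dict[str, str],
                  columns: Dict[str, List[float]]) -> Dict[str, object]:
    """Assemble one output record (hypothetical helper).

    prefix holds the filename, group-by key and other fields already in
    output order; columns maps each field-list entry, in order, to its values.
    """
    record: Dict[str, object] = dict(prefix)
    for name, values in columns.items():
        record.update(stat_fields(name, values))
    return record


print(output_record({"k": "x"}, {"a": [2.0, 4.0, 6.0], "b": []}))
```

Here field "b" has no values, so only n.b appears, while field "a" gets the full set of six statistic fields.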

OPTIONS

-F
process input files individually; a statistic record is output at the end of each input file, and a "filename" field is added to each output record.
-g key-field-list
group-by keys; a statistic record is calculated for each group of consecutive records having the same key value.
-k key-flags
default key flags for the key-field-list; see the Dintro manual.
-r
per-record statistics; a statistic record is calculated for each input D-record.
-D [i/o]datautf=8|16|32
UTF I/O feature (see the manual page of the UTF I/O feature).

ENVIRONMENT

Ddatautf, Didatautf, Dodatautf
for the UTF I/O feature.

DIAGNOSTICS

See the manual of D_msg.

SEE ALSO

Dintro, Dfreq, D_msg

AUTHOR

MIYAZAWA Akira


miyazawa@nii.ac.jp
2003