Dfd - Field description

SYNOPSIS

Dfd [ -e ] [ -g key-field-list ] [ -k key-flags ] [ input-file.. ]

DESCRIPTION

Dfd reads D-records from the input files and creates a set of field description records, each of which reports the name and attributes of one field in the input. Typically, the output of Dfd is piped to Dpr so that the result can be read.

By default, Dfd writes one set of field description records at the end of each input file. Every field in the input file is described by one field description record.

When the -g option is specified, Dfd creates field description records for each group of D-records that have the same key value. The group-by key fields themselves are not described by field description records. Only the fields present in a group are included in that group's field description records.
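
The grouping behaviour of -g can be pictured with a small Python sketch. It is an illustration under assumptions, not Dfd's implementation: a D-record is modelled as a list of (field name, value) pairs, the key value is taken from the first occurrence of each key field, and the helper name fields_by_group is hypothetical. Key flags are not modelled.

```python
def fields_by_group(records, key_fields):
    """Return {key value: [non-key field names seen in that group]}.

    records    -- D-records, each modelled as a list of (name, value) pairs
    key_fields -- names of the group-by key fields (the -g key-field-list)
    """
    groups = {}
    for record in records:
        # Key value: first value of each key field, or None if the field is absent.
        key = tuple(
            next((v for n, v in record if n == k), None) for k in key_fields
        )
        seen = groups.setdefault(key, [])
        for name, _ in record:
            # Key fields are not described; other fields are listed once per group.
            if name not in key_fields and name not in seen:
                seen.append(name)
    return groups


# Hypothetical sample data.
records = [
    [("type", "book"), ("title", "A"), ("isbn", "1")],
    [("type", "book"), ("title", "B")],
    [("type", "serial"), ("title", "C"), ("issn", "2")],
]
for key, names in fields_by_group(records, ["type"]).items():
    print(key, names)
```

In this sketch the "isbn" field is listed only for the "book" group and "issn" only for the "serial" group, while "type" itself is never listed.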

The order of Dfd output records follows the field order of the input D-records. If all input records have the same field order (for example "a", "b", "c"), the output records appear in that sequence (first the record for field "a", then the record for field "b", and then the record for "c"). When the field order varies among input records, Dfd tries to find a "natural" sequence.

OPTIONS

-e
extended attribute; NAsc fields, which denote non-ASCII character classes, are included.
-g key-field-list
group by keys; field description records are created for each group of records having the same key value.
-k key-flags
default key-flags for the key-field-list. See the manual of Dintro.
-D [i/o]datautf=8|16|32
UTF I/O feature (see the manual page of the UTF I/O feature).

OUTPUT RECORD

Each output record has the following fields, in this order.

filename:
input file name as given in the command argument, after globbing by the shell. This field does not exist when the input is the standard input.
group by key fields
present when the -g option is given; the fields listed in key-field-list. Their values may be altered depending on the key flags.
fieldname:
name of the concerned field.
min:
minimum number of occurrences of the concerned field in a record. If this value is zero, one or more records lack the concerned field.
max:
maximum number of occurrences of the concerned field in a record. If this value is greater than one, the concerned field is a repeating field.
exists:
the value of this field consists of two numbers separated by "/". The first is the number of D-records that have the concerned field, and the second is the total number of D-records in the input file. The two numbers are equal if the min value is not zero.
minlen:
minimum length (number of characters) of the concerned field.
maxlen:
maximum length (number of characters) of the concerned field. If minlen and maxlen have the same value, the concerned field is fixed length.
avglen:
average length (number of characters) of the concerned field.
attribute:
shows the numeric or string class attribute of the concerned field. This field exists only when the concerned field has data (i.e. maxlen is not zero). See the Attribute subsection below.
NAsc:
extended attribute; shows the character class of the concerned field. This field exists only when the -e option is given and the concerned field contains a non-ASCII character. The character class depends on the operating system. See the subsection on the NAsc field below.
position:
denotes the relative position of the concerned field. The value "split" means that the concerned field appears more than once in a record, in split positions. This implies that the concerned field may be part of a repeating group. This field exists only when the concerned field is "split".
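
As a rough illustration of how the occurrence and length fields relate to each other, the following Python sketch computes them for one field. It is not Dfd's code: D-records are modelled as lists of (field name, value) pairs, the function name describe_field is hypothetical, and avglen is assumed to be averaged over occurrences without rounding, which the text above does not specify.

```python
def describe_field(records, fieldname):
    """Compute occurrence and length statistics for one field.

    records -- non-empty list of D-records, each modelled as a list of
               (name, value) pairs (an assumption of this sketch)
    """
    counts = []   # occurrences of the field in each record
    lengths = []  # length in characters of every occurrence
    for record in records:
        values = [v for n, v in record if n == fieldname]
        counts.append(len(values))
        lengths.extend(len(v) for v in values)
    having = sum(1 for c in counts if c > 0)  # records that contain the field
    return {
        "fieldname": fieldname,
        "min": min(counts),
        "max": max(counts),
        "exists": f"{having}/{len(records)}",
        "minlen": min(lengths) if lengths else 0,
        "maxlen": max(lengths) if lengths else 0,
        "avglen": sum(lengths) / len(lengths) if lengths else 0,
    }


# Hypothetical sample data: "author" is missing once and repeated once.
records = [
    [("title", "Dune"), ("author", "Herbert")],
    [("title", "Emma"), ("author", "Austen"), ("author", "Anon")],
    [("title", "Ulysses")],
]
print(describe_field(records, "author"))
print(describe_field(records, "title"))
```

For "author" the sketch yields min 0 and max 2, so the field is both optional and repeating, and exists is "2/3". For "title" min is 1, so both parts of exists are the same number.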

Attribute

Dfd inspects all values of the concerned field and reports a numeric attribute when they are all regarded as numeric. Otherwise it reports a string attribute. Numeric values may have leading or trailing spaces. See Dintro for details.

The numeric attribute is one of the following:

Int
unsigned decimal integer.
Int-
signed decimal integer.
Num
numeric value including decimal point or floating point notation (e.g., .3141593e01).
Hex
hexadecimal numeric value, i.e. of the form 0x.....
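
The four numeric classes can be approximated with regular expressions, as in the Python sketch below. The patterns are assumptions made for illustration (for example, treating an explicit "+" or "-" as "signed"); the authoritative definition of numeric values is in Dintro. The function name classify_value is hypothetical, and the sketch classifies a single value rather than a whole field.

```python
import re

# Illustrative patterns only; Dintro defines what Dfd really accepts.
_PATTERNS = [
    ("Int", re.compile(r"[0-9]+")),
    ("Int-", re.compile(r"[+-][0-9]+")),
    ("Hex", re.compile(r"0[xX][0-9a-fA-F]+")),
    ("Num", re.compile(
        r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)([eE][+-]?[0-9]+)?")),
]


def classify_value(value):
    """Return "Int", "Int-", "Hex" or "Num", or None for a non-numeric value."""
    text = value.strip(" ")  # leading and trailing spaces are allowed
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(text):
            return name
    return None


for sample in ["42", " -7 ", ".3141593e01", "0x1F", "abc"]:
    print(repr(sample), classify_value(sample))
```

The patterns are tried from the narrowest class to the widest, so a plain digit string is reported as Int even though it would also match the Num pattern.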

A string attribute has a value consisting of one or more words from the following list, possibly with a "+" sign.

Asc
means the value has ASCII characters.
NAsc
means the value has non-ASCII characters.
Nprt
means the value has non-printing characters.
?
means the value has a character not defined in the output character set. This is not set when the UTF I/O feature is applied to the output.

Extended attribute: NAsc field

The extended attribute, which shows the character class, depends on the operating system environment.

When the internal code is UCS (Windows, Linux, and UTF-8 locales under Solaris), character block names are used as character classes. There may be one or more NAsc fields, each of which has a character block name as its value. The following is an example of NAsc fields.

NAsc:Hiragana
NAsc:CJK Unified Ideographs
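
The mapping from characters to block names can be sketched in Python as follows. This is an illustration, not Dfd's implementation: the table holds only a few Unicode blocks, characters outside the table are ignored, the function name nasc_fields is hypothetical, and the order of the resulting names (first appearance) is an assumption.

```python
# A few Unicode blocks (first code point, last code point, block name).
# A complete implementation would need the full Unicode block table.
BLOCKS = [
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def nasc_fields(values):
    """Return block names of the non-ASCII characters found in values."""
    names = []
    for value in values:
        for ch in value:
            cp = ord(ch)
            if cp < 0x80:
                continue  # ASCII characters do not produce NAsc fields
            for lo, hi, block in BLOCKS:
                if lo <= cp <= hi and block not in names:
                    names.append(block)
    return names


for name in nasc_fields(["ひらがな", "漢字 kanji"]):
    print("NAsc:" + name)
```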

Under Solaris with a non-UTF-8 locale, no NAsc field is given, because the internal code composition is no longer open to applications.

ENVIRONMENT

Ddatautf, Didatautf, Dodatautf
for UTF I/O feature.

DIAGNOSTICS

See the manual of D_msg.

SEE ALSO

Dintro, Drc, D_msg.

AUTHOR

MIYAZAWA Akira


miyazawa@nii.ac.jp
2003