DfromCsv - create a D-file from csv format files.

SYNOPSIS

DfromCsv [ -F ] [ -t separator ] [ -D [i/o]datautf=8|16|32 ] [ input-file ... ]

DESCRIPTION

DfromCsv converts csv (Comma Separated Value) files into a D-file.

The first line of each input csv file must be the label line, which gives the field names. The other lines are data lines, each of which is converted into a D-record. When a data value contains newline characters, the corresponding D-field is repeated. When option -F is given, a filename: field is added to each D-record (except for records read from the standard input).

You may also use DfromLine -t , -z q field-name-list to convert csv files. The result is almost the same, but DfromCsv takes the field names from the label line, while DfromLine needs the field names to be specified on its command line. When a csv value contains a newline character within QUOTATION MARKs, DfromCsv makes repeating fields, while DfromLine converts the newline into the string given by its -n option. In addition, there are small differences in QUOTATION MARK handling between DfromCsv and D-format.

CSV FORMAT

The CSV format specified here is the format this program interprets. Although it works with csv files from well-known software, there is no guarantee that it works with csv files from other applications. For other variants of csv files, DfromLine provides a more flexible (and thus less simple) way of conversion.

A csv file is a text file in the current locale. It consists of two or more lines: the first line is the label line, which gives the column labels, and the other lines are data lines, which give the data values. (If the label line of your csv file is not at the top, you must somehow make it the first line before conversion.)

A line is divided into values at each COMMA (,) character (or at the separator given by -t). When a value starts with a QUOTATION MARK ("), the characters up to the closing QUOTATION MARK form a single value. Within a value that starts with a QUOTATION MARK, a pair of consecutive QUOTATION MARKs is converted into a single QUOTATION MARK character and is not regarded as the closing QUOTATION MARK.

Within QUOTATION MARKs, a COMMA is an ordinary character, and so is a newline character; thus a csv line (label or data) may span two or more lines of the text file.
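The splitting rules above can be sketched in Python as follows. This is a minimal illustration, not the DfromCsv source: the function name read_csv_records is hypothetical, it assumes the text is already decoded and that a LF character ends a line, and it also follows the out-of-context QUOTATION MARK rules described later in this section (a stray QUOTATION MARK is ordinary, and characters after a closing QUOTATION MARK belong to the value).

```python
def read_csv_records(text: str, sep: str = ",") -> list[list[str]]:
    """Split decoded csv text into records, each a list of value strings.

    Hypothetical helper; assumes a LF character ends a line (a CR from
    CRLF input would stay in the value) and lets an unclosed quote run to EOF.
    """
    records: list[list[str]] = []
    record: list[str] = []
    value: list[str] = []
    at_value_start = True  # no character of the current value consumed yet
    in_quotes = False      # inside a value opened by a QUOTATION MARK
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if in_quotes:
            if c != '"':
                value.append(c)      # separator and newline are ordinary here
            elif i + 1 < n and text[i + 1] == '"':
                value.append('"')    # "" becomes one literal QUOTATION MARK
                i += 1               # skip the second QUOTATION MARK
            else:
                in_quotes = False    # closing QUOTATION MARK
        elif c == sep or c == "\n":
            record.append("".join(value))  # end of the current value
            value = []
            if c == "\n":                  # end of the csv line (record)
                records.append(record)
                record = []
            at_value_start = True
            i += 1
            continue
        elif c == '"' and at_value_start:
            in_quotes = True         # opening QUOTATION MARK
        else:
            value.append(c)          # stray '"', backslash and SPACE are ordinary
        at_value_start = False
        i += 1
    if record or value or not at_value_start:  # last line without a newline
        record.append("".join(value))
        records.append(record)
    return records
```

For example, read_csv_records('a;"b;c"\n', sep=";") gives [["a", "b;c"]]: the separator inside QUOTATION MARKs is kept as an ordinary character.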

When a data value contains newline characters, DfromCsv splits the value at each newline character and creates one D-field per piece, all with the same field name. When a label value contains newline characters, they are simply discarded to yield a single D-field name.
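A small Python sketch of these two rules follows; the helper names label_to_name and value_to_fields are hypothetical, and a D-field is modelled as a (name, value) pair rather than as D-file text. Whether DfromCsv emits an empty field for a value that ends with a newline is not stated here, so the sketch simply keeps the empty last piece that str.split produces.

```python
def label_to_name(label: str) -> str:
    # Newline characters in a label value are discarded, so a multi-line
    # label still yields one D-field name.
    return label.replace("\n", "")


def value_to_fields(name: str, value: str) -> list[tuple[str, str]]:
    # A data value is split at each newline character; every piece becomes
    # its own D-field, and all pieces share the same field name.
    # (Assumption: a trailing newline yields an empty last piece.)
    return [(name, piece) for piece in value.split("\n")]
```

For example, value_to_fields("note", 'said "hi"\nbye') gives two fields named note.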

All other characters are ordinary. Unlike in many UNIX applications, REVERSE SOLIDUS (\) has no special meaning in DfromCsv. SPACE is no exception: although some csv specifications treat SPACE characters after a COMMA as part of the separator, DfromCsv keeps them as part of the value. If you need these spaces to be treated as part of the separator, pre-process the input csv file (for example with sed) or post-process the output D-file (for example with Ded FIELDS = FIELDS SUBST "^ +" BY "").
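The Ded post-processing suggested above can be approximated in Python, assuming that FIELDS SUBST applies the regular expression to each field value. The function strip_leading_spaces is hypothetical and works on (name, value) pairs rather than on a D-file.

```python
import re

# Same pattern as the Ded example: one or more SPACEs at the start of a value.
LEADING_SPACES = re.compile(r"^ +")


def strip_leading_spaces(fields: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Removes only U+0020 SPACE characters at the start of each value;
    # TABs and REVERSE SOLIDUS are left untouched.
    return [(name, LEADING_SPACES.sub("", value)) for name, value in fields]
```

Only leading SPACEs are removed, so a value such as "\t y" is unchanged.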

There are some unusual cases. If an input file has no data lines, no output is produced for that file. When a data line has fewer values than there are labels, the fields for the missing values do not appear in the output D-record. When a data line has more values than there are labels, the last label is repeated for each extra value.
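These rules for mismatched counts can be sketched in Python as follows. The names pair_values and rows_to_records are hypothetical, rows are assumed to be lists of value strings with the label line first, and labels are assumed to be non-empty.

```python
def pair_values(labels: list[str], values: list[str]) -> list[tuple[str, str]]:
    # One (label, value) pair per data value: missing values simply produce
    # no field, and extra values all reuse the last label.
    # labels must be non-empty.
    last = len(labels) - 1
    return [(labels[min(i, last)], v) for i, v in enumerate(values)]


def rows_to_records(rows: list[list[str]]) -> list[list[tuple[str, str]]]:
    # rows[0] is the label line; an input without data lines gives no output.
    if len(rows) < 2:
        return []
    labels, data = rows[0], rows[1:]
    return [pair_values(labels, row) for row in data]
```

For example, pair_values(["a", "b"], ["1", "2", "3"]) gives [("a", "1"), ("b", "2"), ("b", "3")].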

Another unusual case is the out-of-context use of QUOTATION MARKs. A QUOTATION MARK inside a value that does not start with a QUOTATION MARK is an ordinary character (unlike in the UNIX shell). When characters other than the separator follow a proper closing QUOTATION MARK, the characters up to the next separator are also part of the value.

OPTIONS

-F: adds a filename: field to each output record; its value is the input file name as given in the command arguments, after globbing by the shell. This field is not added when the input file is the standard input.
-t separator: uses separator as the separator of values instead of the default COMMA (,). If separator has more than one character, only the first character is used.
-D [i/o]datautf=8|16|32: UTF I/O feature (see the manual page of the UTF I/O feature).
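How -F and -t shape the conversion can be sketched with Python's argparse. This is a hypothetical re-creation, not the real command-line parser: the names parse_options and filename_field are illustrative, -D is omitted, standard input is represented by None, and rejecting an empty -t argument is an assumption of the sketch.

```python
import argparse
from typing import Optional


def parse_options(argv: list[str]) -> argparse.Namespace:
    # Hypothetical re-creation of the -F and -t options only; -D is omitted.
    parser = argparse.ArgumentParser(prog="DfromCsv")
    parser.add_argument("-F", dest="add_filename", action="store_true")
    parser.add_argument("-t", dest="separator", default=",")
    parser.add_argument("input_files", nargs="*")
    args = parser.parse_args(argv)
    if not args.separator:
        parser.error("empty separator")  # assumption: reject -t ""
    args.separator = args.separator[0]   # only the first character is used
    return args


def filename_field(path: Optional[str]) -> Optional[tuple[str, str]]:
    # -F field value: the file name exactly as it appeared on the command line
    # (already glob-expanded by the shell). None stands for standard input,
    # which gets no filename field.
    return None if path is None else ("filename", path)
```

With these assumptions, parse_options(["-F", "-t", ";x", "Sheet1.csv"]) selects ";" as the separator and enables the filename field.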

FIXED NAME FIELDS

filename:: at the top of each record, when -F is specified.

ENVIRONMENT

Ddatautf, Didatautf, Dodatautf: for UTF I/O feature.

EXAMPLES

Convert the csv file data.csv into a D-file named data.d:

DfromCsv data.csv > data.d

Convert the csv files Sheet1.csv, Sheet2.csv and Sheet3.csv into a single D-file on the standard output, adding a filename: field to each record:

DfromCsv -F Sheet1.csv Sheet2.csv Sheet3.csv

DIAGNOSTICS

See the manual of D_msg.

AUTHOR

MIYAZAWA Akira

miyazawa@nii.ac.jp
2004