DtoXml - D-file to Xml portion

SYNOPSIS

DtoXml [ options ] [ input-file.. ]

DESCRIPTION

DtoXml converts the D-records in the input files into an XML portion. Typically, the output XML is fed to an XSLT program to build a complete application system.

A D-field is converted to an XML element. The field name becomes the element name, and the field value becomes the content of the element. D-field names must comply with XML element name rules¹. DtoXml does not check their validity.

A D-record is converted to an XML element containing the D-field elements of that D-record. The element name is <record> by default, which you can change with the -r option.

All the <record> elements from an input file are enclosed in a <file> element with a "filename" attribute whose value is the input file name. The element name <file> can be changed with the -f option.

All <file> elements are enclosed in a <root> element. The element name <root> can be changed by the -o option.

Example: input (file example.d).

y:2001
m:4
d:1
v:value1

y:2001
m:4
d:2
v:value2

Output (indentation is added for readability):

<root>
 <file filename="example.d">
  <record>
   <y>2001</y>
   <m>4</m>
   <d>1</d>
   <v>value1</v>
  </record>
  <record>
   <y>2001</y>
   <m>4</m>
   <d>2</d>
   <v>value2</v>
  </record>
 </file>
</root>

The actual output has no indentation. The start-tag, character data and end-tag of a field element are concatenated into one line. Each start-tag and end-tag of the root, file and record elements occupies a line by itself. One empty line is inserted after each end-tag of a record or file element.
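
The conversion rules above can be sketched in a few lines of Python. This is only an illustration, not DtoXml's implementation: the function names parse_d_records and to_xml are hypothetical, and the D-file syntax (one "name:value" field per line, records separated by empty lines) is assumed from the example input.

```python
from xml.sax.saxutils import escape, quoteattr


def parse_d_records(text):
    """Split D-file text into records; each record is a list of (name, value) pairs.

    Assumption: one "name:value" field per line, empty lines separate records.
    """
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = []
            continue
        name, _, value = line.partition(":")
        current.append((name, value))
    if current:
        records.append(current)
    return records


def to_xml(records, filename, root="root", file="file", record="record"):
    """Emit the unindented layout described above, one empty line after each record and file."""
    lines = [f"<{root}>", f"<{file} filename={quoteattr(filename)}>"]
    for rec in records:
        lines.append(f"<{record}>")
        for name, value in rec:
            # Start-tag, character data and end-tag share one line.
            lines.append(f"<{name}>{escape(value)}</{name}>")
        lines.append(f"</{record}>")
        lines.append("")
    lines.append(f"</{file}>")
    lines.append("")
    lines.append(f"</{root}>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "y:2001\nm:4\nd:1\nv:value1\n\ny:2001\nm:4\nd:2\nv:value2\n"
    print(to_xml(parse_d_records(sample), "example.d"))
```

The root, file and record parameters play the role of the -o, -f and -r options.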

Group elements

An additional level of hierarchy between the <file> element and the <record> elements can be generated with the -g option. Contiguous D-records with the same -g key value are grouped into one element. This group element contains the -g key field elements, followed by the <record> elements with the key fields removed.

The -g option is repeatable. For example, DtoXml -g y -g m produces:

<root>
 <file filename="example.d">
  <group1>
   <y>2001</y>
   <group2>
    <m>4</m>
    <record>
     <d>1</d>
     <v>value1</v>
    </record>
    <record>
     <d>2</d>
     <v>value2</v>
    </record>
   </group2>
  </group1>
 </file>
</root>

from the same input file above.

Group element names are assigned automatically as <group1>, <group2>, etc., or <group> when only one -g option is given. A group element name can be changed with the -h option.
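
The grouping rule (contiguous records with the same key value, nested in the order of the -g options) can be sketched as follows. The function group_records and its dictionary-based record representation are hypothetical and chosen only for illustration; field names are assumed to be unique within a record, and key flags (-k) are ignored.

```python
from itertools import groupby


def group_records(records, keys):
    """Nest records by successive keys.

    records: list of dicts mapping field name to value.
    keys: field names in -g order (outermost first).
    Only contiguous records sharing a key value end up in the same group.
    """
    if not keys:
        return records
    key, rest = keys[0], keys[1:]
    groups = []
    for value, members in groupby(records, key=lambda rec: rec.get(key)):
        # The key field moves up into the group, so drop it from each record.
        stripped = [{k: v for k, v in rec.items() if k != key} for rec in members]
        groups.append({"key": (key, value), "children": group_records(stripped, rest)})
    return groups


if __name__ == "__main__":
    recs = [
        {"y": "2001", "m": "4", "d": "1", "v": "value1"},
        {"y": "2001", "m": "4", "d": "2", "v": "value2"},
    ]
    print(group_records(recs, ["y", "m"]))
```

Because itertools.groupby only merges adjacent items, records with the same key value that are separated by a different value form separate groups, which matches the "contiguous" wording above.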

Leaf elements

Another way to generate hierarchy is within a D-record, using a leaf separation algorithm, in a way similar to Dtie. Consider the following record (obtained by Dbundle from the input D-file above):

y:2001
m:4
d:1
v:value1
d:2
v:value2

Here, the fields d and v form leaves, which any of the three leaf separation algorithms separates in the same way. With the -L d,v option (or -N or -S), DtoXml produces the following result.

<root>
 <file filename="example.d">
  <record>
   <y>2001</y>
   <m>4</m>
   <leaf>
    <d>1</d>
    <v>value1</v>
   </leaf>
   <leaf>
    <d>2</d>
    <v>value2</v>
   </leaf>
  </record>
 </file>
</root>

Leaf element names are assigned automatically as <leaf>, or as <leaf1>, <leaf2>, ... when there is more than one leaf field list, just like group element names. Leaf elements never nest. All leaf elements are generated directly under the <record> element, at the same level as the stem field elements. A leaf element is placed where the first field of the leaf appears.
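
As a rough illustration of leaf placement, the sketch below splits one record into stem fields and leaves. The separation rule used here (a new leaf starts whenever a leaf field name repeats within the current leaf) is a simplifying assumption that happens to reproduce the example above; it is not claimed to be any of the -N, -L or -S algorithms, which are described in D_lsa. The function name split_leaves is hypothetical.

```python
def split_leaves(fields, leaf_names):
    """Split one record into stem fields and leaves.

    fields: list of (name, value) pairs in record order.
    leaf_names: set of field names belonging to the leaf field list.
    Returns a list of ("field", name, value) and ("leaf", [(name, value), ...])
    items. Each leaf sits where its first field appeared, and leaves never nest.

    Assumed rule (illustration only): a repeated leaf field name starts a new leaf.
    """
    out = []
    current = None
    for name, value in fields:
        if name not in leaf_names:
            out.append(("field", name, value))
            continue
        if current is None or any(n == name for n, _ in current):
            current = []
            out.append(("leaf", current))
        current.append((name, value))
    return out


if __name__ == "__main__":
    record = [("y", "2001"), ("m", "4"), ("d", "1"), ("v", "value1"),
              ("d", "2"), ("v", "value2")]
    for item in split_leaves(record, {"d", "v"}):
        print(item)
```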

Attributes

A D-field whose name starts with COMMERCIAL AT (@) is converted to an attribute. The attribute is placed on the element of the preceding D-field, or on the record element when there is no preceding D-field. An @field name takes effect even in group field lists and leaf field lists.
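
The attachment rule for @fields can be sketched as below. The function record_to_xml is a hypothetical helper written for this illustration; it handles a single record only and ignores groups and leaves.

```python
from xml.sax.saxutils import escape, quoteattr


def record_to_xml(fields, record="record"):
    """Render one record; @fields become attributes of the preceding element.

    fields: list of (name, value) pairs in record order.
    An @field with no preceding D-field is attached to the record element.
    """
    record_attrs = []
    elements = []  # each entry is [name, attrs, value]
    for name, value in fields:
        if name.startswith("@"):
            target = elements[-1][1] if elements else record_attrs
            target.append((name[1:], value))
        else:
            elements.append([name, [], value])

    def attr_text(attrs):
        return "".join(f" {n}={quoteattr(v)}" for n, v in attrs)

    lines = [f"<{record}{attr_text(record_attrs)}>"]
    for name, attrs, value in elements:
        lines.append(f"<{name}{attr_text(attrs)}>{escape(value)}</{name}>")
    lines.append(f"</{record}>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(record_to_xml([("@attr1", "1"), ("a", "text-a"),
                         ("@attr2", "2"), ("b", "text-b")]))
```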

Special characters

When a D-field value is converted to the character data of an element, characters not allowed in XML character data (e.g. <) are converted to the predefined entities (e.g. &lt;). However, when the D-field value is already coded as XML character data, you can suppress this conversion by listing the field names in the -n option.
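
The effect of the -n option can be sketched as follows. The function field_to_xml and its raw_fields parameter are hypothetical; the escaping is delegated to Python's xml.sax.saxutils.escape, which replaces &, < and >, and the exact set of characters DtoXml converts is assumed to be similar.

```python
from xml.sax.saxutils import escape


def field_to_xml(name, value, raw_fields=()):
    """Render one field element.

    Values of fields listed in raw_fields (the -n list) are copied as they are;
    all other values are escaped to the predefined entities.
    """
    content = value if name in raw_fields else escape(value)
    return f"<{name}>{content}</{name}>"


if __name__ == "__main__":
    raw = {"a"}
    print(field_to_xml("a", "text-a <c>text-ac</c>", raw))
    print(field_to_xml("b", "text-b <c>text-bc</c>", raw))
```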

OPTIONS

-g key-field-list
Group by field. Each time a new key value is encountered, a group element is generated. The group element contains 1) the corresponding key field elements and 2a) the next-level group elements or 2b) the record elements, up to the point where the next key value is encountered. This option is repeatable. The first -g group element comes directly under the <file> element, the next -g group element comes under that, and so on.
-h group-node-name
Element name² for the preceding -g group. The default is "group" when there is only one group, or "group1", "group2", ... counting from the top-level group.
-N leaf-field-list
-L leaf-field-list
-S leaf-field-list
Leaf fields. As described in the DESCRIPTION section, a leaf element is generated for each leaf in the D-record. This is similar to Dtie, but the leaf fields are grouped into a leaf element. This option is repeatable.
-l leaf-field-name
Element name² for the preceding -N, -L or -S option. The default is "leaf" when there is only one leaf field list, or "leaf1", "leaf2", ... counting from the first leaf-field-list.
-k key-flags
Default key flags for the -g key-field-lists.
-o root-node-name
Element name² for the root element. The default is "root".
-f file-node-name
Element name² for the file element. The default is "file".
-r record-node-name
Element name² for the record element. The default is "record".
-n field-list
The values of the listed fields become the character data of the element as they are. It is the user's responsibility to ensure that the resulting XML is valid.
-D [i/o]datautf=8|16|32
UTF I/O feature (see the manual page of the UTF I/O feature).

EXAMPLES

This example shows the basic function of DtoXml. The input D-file is as follows:

@attr1:1
a:text-a
@attr2:2
b:text-b

DtoXml without any options produces the following output.

<root>
<file>
<record attr1="1">
<a attr2="2">text-a</a>
<b>text-b</b>
</record>

</file>

</root>

The next example shows the -n option. The input file is:

a:text-a <c>text-ac</c>
b:text-b <c>text-bc</c>

DtoXml -n a will produce:

<root>
<file>
<record>
<a>text-a <c>text-ac</c></a>
<b>text-b &lt;c&gt;text-bc&lt;/c&gt;</b>
</record>

</file>

</root>

ENVIRONMENT

Ddatautf, Didatautf, Dodatautf
for UTF I/O feature.

DIAGNOSTICS

See the manual of D_msg.

Notes

(1) The element name rules differ slightly between XML 1.0 and XML 1.1. This is one of the reasons DtoXml does not check validity. Providing valid names is not difficult: do not use names that start with a digit (0-9), and avoid special characters other than LOW LINE (_) and HYPHEN-MINUS (-). The most important difference between D-field names and XML element names is that the former do not allow COLON (:), so you cannot use namespace prefixes in your output. Usually, DtoXml output is fed to an XSLT program, and namespaces are handled there.

(2) It is the user's responsibility to supply valid XML element names. DtoXml neither checks them nor warns about invalid ones.

SEE ALSO

Dintro, D_lsa, DfromXml, Dtie, DtoTex, DtoCsv, D_msg.

AUTHOR

MIYAZAWA Akira


miyazawa@nii.ac.jp
2006