D_lsa - Leaf separation algorithms of D-commands.

[ English | Japanese ]

[visit D-home]

DESCRIPTION

Leaf separation algorithm is used in Dunbundle, Dtie and DtoXml to find a repeating group in a D-record. Leaf-field-list is a specific field-list to specify fields to form a repeating group.

LEAF AND STEM FIELD

It is possible to represent hierarchical or tree structure implicitly in a D-record. In this case, fields representing the higher hierarchy are called stem fields and those representing lower are called leaf fields.

Typically, a tree structure is introduced by Dbundle command which bundles D-records with same key value. In this case, the key fields becomes the stem field and the other fields become the leaf fields.

Example 1:

a:A
b:1

a:A
b:2

a:A
b:3

These three records become next record with Dbundle a command.

a:A
b:1
b:2
b:3

Field "a" is called the stem field and field "b" is called leaf field in this case. In other words, there is a stem a:A and three leaves b:1, b:2 and b:3.

In the above example, there is just one leaf field, and the field is repeating. Generally, leaf fields may form a repeating group, in which two or more fields are grouped and repeat. Study next example.

Example A:

house:Plantagenets
name:Henry II
reign:1154-89
name:Richard I
reign:1189-99
name:John
reign:1199-1216
name:Henry III
reign:1216-72

In this case, "house" is the stem field and "name" and "reign" are the leaf field, which forms a repeating group. It is easy to guess a leaf represents an entry for a king. If all the "leaves" have a "name" and a "reign" field in this order. things are easy. A pair of "name" and "reign" field in this order makes an individual leaf.

But, in the real data world, you may want to add more information. For example:

Example B:

house:Plantagenets
name:Henry II
reign:1154-89
name:Richard I
byname:Lion-Heart
reign:1189-99
name:John
byname:Lackland reign:1199-1216
name:Henry III
reign:1216-72

Here, bynames are added optionally. You can still tell the individual leaf begins with the field "name".

D-bundling operation (Dbundle) is not reversible, generally. But, in many cases, as shown in the above example A and B, there are some ways or algorithms to tell how individual leaves are grouped, thus D-bundling operation becomes reversible. This algorithm is called "leaf separation algorithm".

We introduce a notation to show the individual leaf:

a:A 
b:1l1
b:2l2
b:3l3

Fields marked with "l" are leaves and the following number is the sequence number of individual leaf. Fields with no mark are the stem fields.

To find out a tree structure in D-records, two clues are required. One is the leaf-field-list to tell which fields form leaves, and the other is the leaf separation algorithm to tell how the leaf fields are separated as individual leaves. Though there can be countless algorithms from general ones to specific ones, current version of D-commands provides simple but practical three algorithms: n-th occurrence, leading field and sequence break.

LEAF-FIELD-LIST

Leaf-field-list is a field-list to provide leaf field names together with optional "*" which means the corresponding field is repeatable in a leaf.

Syntax of the leaf-field-list is a subset of the field-format-list, of which only "*" has meaning. In Dtie command, the field-format-list(output) is in the same time used as the leaf-field-list.

A leaf-field-list can be an exclusive field-list, but in this case you can use only the n-th occurrence algorithm.

N-TH OCCURENCE ALGORITHM

This is the default leaf separation algorithm. This algorithm is used when -N option is specified, or when no other leaf separation algorithm is specified.

By this algorithm, n-th occurrence of each leaf field belongs to the n-th leaf. This algorithm works when the fields are not optional in a leaf, and do not repeat in an individual leaf. Example 1 above can be reversed by this algorithm. As a leaf field can not repeat in a leaf, "*" in a leaf-field-list contradicts to this algorithm, but is just ignored.

Example 2: -N "b,c"

a:A 
b:1l1
b:2l2
c:3l1
c:4l2

LEADING FIELD ALGORITHM

This algorithm is invoked by -L option. The top field of the leaf-field-list is the "leading field" in this algorithm. A new leaf starts when the leading field appears.

Example 3: -L "b,c"

a:A 
b:1l1
b:2l2
c:3l2
c:4l2

When the leading field is repeatable, consecutive occurrence of it does not start a new leaf. (Stem fields do not affect the algorithm. They can be exist amongst leaf fields, or consecutive leading fields)

Example 4: -L "b:*,c"

a:A 
b:1l1
b:2l1
c:3l1
c:4l1

"*" at a non leading field has no meaning and is just ignored.

Example 5: -L "b:*,c:*"

a:A 
b:1l1
a:2 
b:3l1
c:4l1
a:5 
c:6l1

The first occurrence of a leaf field always starts the first leaf, even if it is not the leading field.

Example 6: -L "b,c"

a:A 
c:1l1
b:2l2
c:3l2
b:4l3

For the leading field algorithm, exclusive field-list is prohibited, because you can't tell which is the leading field.

SEQUENCE BREAK ALGORITHM

This algorithm is invoked by -S option. A new leaf starts when the field sequence in a given D-record does not follow the sequence given by the leaf-field-list. Repeatable mark in the leaf-field-list works not to cause sequence break by repeating the same field.

Example 7:

 -S "b,c"-S "b:*,c"-S "b,c:*"-S "b:*,c:*"
a:A
b:1l1l1l1l1
b:2l2l1l2l1
c:3l2l1l2l1
c:4l3l2l2l1

Occurrences of stem fields does not affect to the leaf field sequence. Exclusive field-list is prohibited as it does not show the field sequence.

SEE ALSO

D_fmt, Dunbundle, Dtie, Dintro.

AUTHOR

MIYAZAWA Akira


miyazawa@nii.ac.jp
2003