Introduction
What is Dl
Dl is a language for handling D-records.
It is not a general-purpose programming language such as C, C++ or Java.
Functionally, it is akin to the awk language:
Dl handles D-records,
while awk handles lines of text files.
For example, you can add a new field to all records or to specific records
of the input D-file, change field values, or delete fields.
Though Dl has almost the full functionality of a programming language,
it is not intended for writing huge programs. Ded is an interpreter
of Dl and is not as fast as a compiled language.
If you need very complicated processing, it is recommended to use another tool, for example
Perl or the C programming language.
A typical usage of Dl is:
Ded IF txtlang == jpn OR txtlang == kor OR txtlang == chi THEN area = ea FI input-file.d
This command adds the field "area" with value "ea" (or changes its value to "ea") in the
records whose "txtlang" field value is "jpn", "kor" or "chi".
Some D-commands can be written in Dl.
For example,
Dtie -t ":" a,b c
is the same as
Ded FIELD c = FIELD a . CONST ":" . FIELD b ";" FIELD a = FIELD b = "{" "}"
The conceptual difference between these two approaches is that a D-command represents
a basic D-file operation, while Dl offers a general-purpose method of handling
D-records.
The practical difference is speed.
D-commands are tuned for specific operations, with hard-wired code
for each operation.
Ded is an interpreter of Dl
and executes the operations step by step,
and is thus slower than D-commands.
It is recommended to use a specific
D-command when a suitable one is provided.
Features of Dl
A Dl program is generally written as a series of command arguments.
This is like the sed command of UNIX,
but the -e option is not used
and you can write the program directly as command arguments.
Or, if you like, you can provide the Dl program from a text file.
Details are described in the
general syntax
section.
Dl has a highly simplified syntax.
Unlike many languages, Dl has no statements.
There are only expressions.
Control structures like "if" or "while" are operators in Dl.
Even ";" is an operator, similar to the "," operator of C.
An expression is "evaluated",
which means that that part of the Dl program is "executed".
In addition, Dl does not have subroutines or macros.
These facts make it difficult to write large, complicated
programs, which is not the main objective of Dl.
Any field of a D-record is repeatable.
Consequently, any constant or variable of Dl is an array.
Every operator of Dl is applied to arrays, in its own special way.
For example, the "+" (addition) operator works differently
depending on the number of elements in its operand values.
Perl has array and scalar contexts to control operation semantics.
Dl has only an array context for operators.
Ded and Dselect
Two separate programs interpret Dl.
Ded is the full processor of Dl,
while Dselect restricts the operators that affect the output record.
In the case of Ded, the given Dl expressions are evaluated
(i.e., the given Dl program is executed) for each input D-record,
and after the evaluation (i.e., execution of the Dl
program), the current record is written to the standard output if
it has at least one field (i.e., if the current record has not been deleted).
You may write the current record explicitly with output,
but even if you use output, Ded will write the current record after the evaluation.
After the current record is output, Ded reads the next input record
and begins a new cycle of evaluation, until it encounters end of file.
In the case of Dselect,
the given Dl expression is evaluated for each input D-record,
and if the result value is true
(see the Boolean Evaluation section below),
the input record is written to the standard output.
Assignment to a field and the output operation
are not allowed in a Dl expression given to the Dselect command,
so that the input record is neither changed nor duplicated in the output.
General Syntax
Words and quoting
The source program of Dl is taken from the command arguments
or from source files given with the -f option.
A Dl program is made of words.
Dl operators, constants, field names,
variable names and other Dl reserved words are given as words.
Even a parenthesis is a word in Dl.
A word is a string of arbitrary length made of any characters.
Words are recognized slightly differently
for command arguments and for source file input.
When the program is given as command arguments,
each command argument makes one Dl word.
When the program is given from a text file, words are separated
by white space.
End of file is treated as a new line character.
In addition to white space, only REVERSE SOLIDUS (\),
QUOTATION MARK (") and APOSTROPHE (')
have special meaning in the source file.
The quoting mechanism with these characters, explained below, follows the UNIX Bourne shell specification.
REVERSE SOLIDUS (\) at the end of a line is regarded as a line continuation
mark unless it is placed in an APOSTROPHE (') quoted string.
The continuation mark and the new line after it are omitted from the word.
REVERSE SOLIDUS at other positions is an escape character unless it is placed in an APOSTROPHE
or QUOTATION MARK quoted string.
The escape character itself is omitted
and the following character becomes a normal character that forms part
of the word.
The escape character is usually used to include spaces,
QUOTATION MARK, APOSTROPHE or REVERSE SOLIDUS as part of the word.
QUOTATION MARK (") is another means of using
special characters in a Dl source file.
Once the Dl parser encounters a QUOTATION MARK,
it is omitted from the word and the following characters are included
in the word until the parser again encounters a QUOTATION MARK.
(The ending QUOTATION MARK is not included in the word.)
There are two exceptions within the QUOTATION MARKs.
REVERSE SOLIDUS followed by a QUOTATION MARK makes just one QUOTATION MARK.
This is used to escape the ending QUOTATION MARK.
The other exception is REVERSE SOLIDUS followed by a new line character,
which is a line continuation mark within the QUOTATION MARKs;
both the REVERSE SOLIDUS and the new line character
are omitted from the current word.
Any other REVERSE SOLIDUS between QUOTATION MARKs is treated as a normal character.
APOSTROPHE (') is also a quoting character.
It is stronger than QUOTATION MARK. When the Dl parser encounters an APOSTROPHE,
the following characters are included in the current word
until it encounters the closing APOSTROPHE.
Unlike QUOTATION MARK, there are no exceptions.
Even a REVERSE SOLIDUS or a new line character is
treated as a normal character within APOSTROPHEs.
To use an APOSTROPHE within a word,
use the REVERSE SOLIDUS escape or the QUOTATION MARK quoting explained above.
Examples of quoted words in Dl source files
Source text | Resulting word |
one\ word | one word |
"one word" | one word |
'one word' | one word |
one" "word | one word |
o\ n" "e' '' '"w o ""r d" | o n e w o r d |
\o\n\e\ \w\o\r\d | one word |
one\<new line>word | oneword |
\\\"\' | \"' |
"\"\\'\"" | "\'" |
"one\<new line>word" | oneword |
'"\"' | "\" |
'one word' | one word |
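The quoting rules above can be modeled in a few lines of Python. This is only a sketch written from the prose description, not the actual Ded parser: the function name `split_dl_words` is hypothetical, and the behavior of the real parser in corner cases (for example a REVERSE SOLIDUS before another REVERSE SOLIDUS inside QUOTATION MARKs) has not been checked against Ded.

```python
def split_dl_words(source: str) -> list[str]:
    """Split Dl source-file text into words, following the prose rules above.

    Hypothetical helper for illustration; not part of the D-tools.
    """
    words: list[str] = []
    current: list[str] = []
    in_word = False  # True once a word has started (it may still be empty, e.g. "")
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "\\":
            # Outside quotes: backslash + new line is a continuation (both dropped);
            # backslash + any other character makes that character ordinary.
            nxt = source[i + 1] if i + 1 < n else "\n"  # end of file counts as a new line
            if nxt != "\n":
                current.append(nxt)
                in_word = True
            i += 2
        elif ch == '"':
            in_word = True
            i += 1
            while i < n and source[i] != '"':
                # Only \" and \<new line> are special inside QUOTATION MARKs.
                if source[i] == "\\" and i + 1 < n and source[i + 1] in '"\n':
                    if source[i + 1] == '"':
                        current.append('"')
                    i += 2
                else:
                    current.append(source[i])
                    i += 1
            i += 1  # skip the closing QUOTATION MARK
        elif ch == "'":
            # Everything up to the next APOSTROPHE is taken literally.
            in_word = True
            end = source.find("'", i + 1)
            if end == -1:
                end = n
            current.append(source[i + 1:end])
            i = end + 1
        elif ch.isspace():
            if in_word:
                words.append("".join(current))
                current, in_word = [], False
            i += 1
        else:
            current.append(ch)
            in_word = True
            i += 1
    if in_word:
        words.append("".join(current))
    return words


print(split_dl_words('IF txtlang == "jpn" THEN area = ea FI'))
# ['IF', 'txtlang', '==', 'jpn', 'THEN', 'area', '=', 'ea', 'FI']
```

The sketch keeps one flag, `in_word`, so that an empty quoted string still produces a word.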
Tokens
The tokens of Dl are the field-name,
constant, variable, static variable,
special variable, operator, parenthesis
and end-token.
Each token must be given as a separate word, or as words led by a reserved keyword.
Grammar
The following is the simplified grammar of Dl.
- program ::= expression
- expression
- ::= primary
- | unary-operator expression
- | expression binary-operator expression
- | expression {SUBST|S|SUBSTG|SG} expression BY expression
- | expression '[' expression ']'
- | IF expression THEN expression [ ELIF expression THEN expression ]... [ ELSE expression ] FI
- | WHILE expression DO expression DONE
- | FOR variable IN expression DO expression DONE
- primary
- ::= constant
- | field-name
- | variable
- | static-variable
- | special-variable
- | '(' expression ')'
Field Name
Morphology
A word following the reserved word FIELD
is a field name token.
For example,
FIELD a
is a field-name token "a".
Similarly,
FIELD FIELD
is a field-name token "FIELD".
In this case the second word (FIELD) is not a reserved word
but a field name, while the first word (FIELD) is a reserved word.
Omission of the reserved word FIELD
As a special case, a non-reserved word at the top of the program
is regarded as a field-name token.
A non-reserved word directly after one of the following tokens is also a field-name token by itself.
(
; , && || !
IF THEN ELIF ELSE WHILE DO
ABS AND ATAN AVG CAPS CAT COS COUNT EXISTS EXP
INT LENGTH LOG LOG10 MAX MIN NOT OR SIN SQRT SUM TAN
TOLOWER TOUPPER
In the next example, the second word "a" is a
field-name
token, because it is after EXISTS.
EXISTS a
Evaluation of Field-name token
A field-name token, when evaluated, yields the field values
of the given field name in the current record.
The current record may have two or more fields of the same name.
In this case, the result of the field evaluation
is an array.
When the current record has no such field, the value is the null value.
Numeric qualifier
The value of a field-name token is a string by default.
But, for example in a comparison operation,
you may want to evaluate it as numeric.
The numeric qualifier makes the field-name token evaluate
to a numeric value.
It is a COLON (:) and the letter "n" after the field name.
For example
FIELD seq
is evaluated as a string, thus, "10" is smaller than "9".
But,
FIELD seq:n
is evaluated as numeric, and "10" becomes greater than "9".
Constant
Morphology
There are two ways to denote constants.
One is to use the reserved word CONST.
The word following the reserved word CONST is a constant token.
To make an array value, repeat CONST and the value after the first
constant.
CONST a
CONST a CONST b
The other way to denote a constant is to use BRACEs.
The words following a LEFT BRACE ({) and before
a RIGHT BRACE (}) form a
constant token.
Within these BRACEs, reserved words of
Dl
lose their effect and become constants.
{ a }
{ a b }
Note that a LEFT BRACE is just a constant within BRACEs. Thus the next example:
{ { } }
causes a syntax error at the fourth word, because the constant token ends at the third word.
There is no way to include a string consisting of
a single RIGHT BRACE in a braced constant.
Use
CONST }
for this purpose.
To make a null value constant, use
{ }
There is no way to make a null value constant with the CONST notation.
Omission of the reserved word CONST
As a special case, a non-reserved word directly after one of the following tokens is a constant as it is.
!= !~ % * ** + - . .. /
< <= <> = == =~ > >= [
BY DIVIDEDBY EQ GE GREP GT IN INCL JOIN
LE LIKE LT MINUS MOD NE NUM PLUS POWER
QX S SG SPLIT STR SUBST SUBSTG SUBSTR TIMES UNLIKE
For example, in the next case
FIELD 1 == 1
the second word '1' is a field name, and the fourth word '1' is a constant.
Evaluation of constant
Constants are evaluated as strings as they are.
When a numeric value is required,
for example before or after '+',
the Dl interpreter automatically converts the constant to a numeric value. There is no notation for writing a numeric constant directly; the operator NUM converts a string to a numeric value.
Variable and Static Variable
Morphology
A variable token consists of the reserved word VAR followed by an arbitrary word.
Similarly, a static variable token consists of the reserved word
STATIC followed by an arbitrary word.
Only after the FOR token may the reserved word VAR be omitted.
There is no way to omit the reserved word STATIC.
Lifetime
A variable or a static variable can hold a value.
The difference between them is the lifetime.
The lifetime of a variable is just one cycle of Dl execution; i.e., when a D-record is read from the input file, all variables are wiped out before the given program is evaluated.
A static variable has
the lifetime of a D-command execution; i.e., the value assigned to it is kept
throughout the execution of the D-command. In other words, a static variable is a variable in the usual sense, and a variable is just for local use such as a loop index. (The FOR operator takes a variable as its index.)
The scope of a variable or a static variable is always the whole program.
Evaluation
A variable or a static variable yields the value assigned to it.
When a variable or a static variable that has not been assigned a value is evaluated,
it yields the null value (an array with no elements).
Special Variables
Syntactically, a special variable consists of a single reserved keyword.
Semantically, special variables are like the predefined variables of Perl,
and some of them are just like statements of a programming language.
When a special variable is evaluated, it yields a value
related to the environment of program execution,
or performs some function for the program execution.
In this sense "special variable" may be a misleading name.
But to keep the syntax simple, they are classified into one category.
These variables cannot be changed by the user, except for CURREC,
which represents the whole current record and can be changed by an
assignment operation.
Individual special variables are described in the
Operators and related special variables section.
Parentheses
LEFT PARENTHESIS (() and RIGHT PARENTHESIS ())
are used as in usual languages,
to make an expression a unit and so change the order of operations.
Note that CURLY BRACKETS ({ }) are for constant tokens and do not group
expressions.
Note also that SQUARE BRACKETS ([ ]) are a suffix operator.
End Token
The word '--' is used to indicate the end of the program explicitly. Usually you do not need it, because the
Dl
parser inserts an end token automatically when it encounters
a word that is not a valid token.
You need an explicit end token only when your first input file name
is equal to one of the reserved words of
Dl.
For example:
COUNT a LT 2 -- LT
In the above example, the third argument LT is the
Dl
reserved word for the comparison operator "less than",
while the sixth word LT is the input file name.
In this case you need the end token,
because without it the file name is interpreted as a reserved word,
causing a syntax error.
(Note that if the input file name were "lt" in lower case,
you would not need the end token.)
Evaluation or Execution
As the Dl grammar has only expressions,
the program is "evaluated", in other words "executed".
In this sense, it is like LISP.
Evaluation of elements (field names, variables, etc.) is
described in the Syntactical Components sections.
When an expression that includes operators is evaluated,
the operations described in the Operators and related special variables sections are executed.
Boolean Evaluation
The result of a Dl operation may be an array of string or numeric
values. When the result is evaluated as a boolean value, the evaluation follows
these rules:
- When the value is simple (i.e., the number of elements is 1):
  - When the value is numeric,
    0 is FALSE and any non-zero value is TRUE.
  - When the value is a string,
    the null string is FALSE and any other string is TRUE.
- When the value is the null value (i.e., the number of elements is 0),
  it is FALSE.
- When the value is not simple (i.e., the number of elements is greater than 1),
  it is TRUE, regardless of the type or the element values.
These rules can lead to odd situations. For example, a numeric value { 0 } is
FALSE, while a string value { 0 } is TRUE (by the first rule). Nevertheless, it
works well in practice in most situations.
This boolean evaluation rule is applied generally in Dl, whenever an
operand requires a boolean value. For example, the logical AND operation requires
the left-hand operand to be evaluated as boolean, and the above rule is applied.
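The boolean evaluation rule can be written down as a small Python function. This is a sketch of the rule as stated above, not code from Ded; it assumes a hypothetical model in which a Dl value is a Python list whose elements are `str` (string) or `float`/`int` (numeric).

```python
from typing import Sequence, Union

# Assumed model (not from the manual): str = string element, float/int = numeric element.
DlElement = Union[str, float, int]


def dl_is_true(value: Sequence[DlElement]) -> bool:
    """Apply the Dl boolean evaluation rule to an array value."""
    if len(value) == 0:
        return False            # null value
    if len(value) > 1:
        return True             # not simple: TRUE regardless of contents
    element = value[0]
    if isinstance(element, str):
        return element != ""    # only the null string is FALSE
    return element != 0         # numeric: 0 is FALSE


print(dl_is_true([0.0]))   # False: numeric { 0 }
print(dl_is_true(["0"]))   # True: string { 0 }
```

The two printed cases are the odd situation mentioned above: the same text "0" is FALSE as a number and TRUE as a string.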
Operators and related special variables
A unary operator takes one operand, on its right-hand side.
A binary operator takes a left-hand and a right-hand operand.
Other operators, such as the suffix operator or the IF operator, have their own syntax.
An operand of a Dl operator is an arbitrary expression,
with a few exceptions such as the left-hand operand of the
assignment operator.
Some operators and special variables
have synonyms, e.g., "!=", "<>" and "NE". This is partly
to avoid shell special characters like ">",
and partly to share operator names
with popular applications such as awk or Perl.
Control Operators
Control operators have a UNIX sh-like syntax.
This version of Dl does not have the case, break or continue constructs of C. These functionalities might be supported in future versions.
Like the other operators of Dl, a control operator also yields a value.
The next example is not recommended but is valid, because the control operator
IF forms an expression.
IF IF v >= 0 THEN v ELSE - FIELD v FI > 3 THEN q = many FI
Table of Control operators
Serial operator
expression1 ; expression2 |
This is like the comma operator of C. First, expression1
is evaluated and, regardless of its value, expression2 is then evaluated.
The result is the value of expression2.
The form without expression2 is also allowed,
so that C-like semicolon usage does not cause a syntax error.
The precedence
of the serial operator is the lowest.
If operator
IF condition1 THEN expression1
[ELIF condition2 THEN expression2]...
[ELSE expressionN] FI |
This is like the "?:" operator of C.
First, condition1 is evaluated as boolean and
if the result is TRUE,
then expression1 is evaluated and becomes the IF operation's result.
If the value of condition1 is FALSE and ELIF exists, then condition2 is
tested as boolean, and when it is TRUE, expression2
is evaluated and becomes the result.
When all IF and ELIF conditions are FALSE and ELSE exists, expressionN
is evaluated and becomes the result.
When all conditions are FALSE and ELSE is omitted,
the value of the last IF or ELIF condition (which was evaluated as FALSE)
becomes the IF operation's result.
While operator
WHILE condition DO expression DONE |
Condition is evaluated as boolean, and if the result is TRUE,
expression is evaluated and then
condition is evaluated again; this is repeated until
condition becomes FALSE.
The WHILE operation's result is the value of the last evaluated expression.
If condition is FALSE from the start,
the WHILE operation's result is the value of condition.
For operator
FOR variable IN value-array DO expression DONE |
The first operand of the FOR operator must be a variable.
No other expression, including a static variable, is allowed here.
Value-array is evaluated,
the variable is set to each element of
value-array in turn,
and expression is evaluated each time.
The result of the FOR operation is the value of the last evaluated expression.
If value-array is the null value, the result is also the null value.
Exit
When this word is evaluated, the program stops at once.
This special variable does not yield any value.
Epilogue
This special variable returns the simple numeric value 1 when the program
is evaluated in epilogue mode; otherwise it returns 0.
See Ded for the epilogue mode.
Comparison Operators
== | EQ | | equal |
!= | NE | <> | not equal |
> | GT | | greater than |
>= | GE | | greater than or equal to |
< | LT | | less than |
<= | LE | | less than or equal to |
INCL | | | includes |
Comparison operators are binary operators that compare their left-hand and right-hand operands.
The result is the simple numeric value 1 or 0, meaning TRUE and FALSE respectively.
Their syntax is:
expression1 binop expression2 |
Here, binop is one of the operators in the table.
You may use either form (for example, == or EQ for the
equal comparison).
You can avoid shell special characters by using
the alphabetic forms.
Comparison operators are ambivalent, i.e.,
they compare values as numeric or as string
depending on the types of the operands.
When either the left-hand or the right-hand operand is numeric,
a numeric comparison is made.
When both operands are strings,
a string comparison is made.
For example:
CONST 9 '<' CONST 10
unexpectedly results in 0 (FALSE),
because both constants are strings by default.
NUM CONST 9 '<' CONST 10
gives the expected result.
The next example is the most common pitfall in using comparison operators:
seq '<' 10
Again in this case, a string comparison is made,
and the result is usually not the expected one.
Use the numeric qualifier for the field, or the NUM operator,
to make a numeric comparison:
seq:n '<' 10
seq '<' NUM CONST 10
For the details of comparison, see the
Comparison of values
section of the manual Dintro.
The precedence
of comparison operators is higher than that of the logical operators
and lower than that of the array concatenation operator.
Include operator
The include operator handles its operands as unordered sets of elements.
If every element of subset (the right-hand operand) is equal to at least one of the elements of includer (the left-hand operand),
the result is TRUE.
Otherwise, the result is FALSE.
The comparison is made as string if both operands are strings.
Otherwise, the string operand is converted to numeric and a numeric comparison is made.
For example, assume the field "keywords" holds a set of words; then
Dselect keywords INCL database
Dselect keywords INCL { data base }
test whether the field has the word "database", and whether the field has both the word "data"
and the word "base", respectively.
The precedence
of the INCL operator is the same as that of the comparison operators.
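The set-style test made by INCL can be sketched in Python as follows. This is an illustration only: the name `dl_incl` is hypothetical, the sketch covers string operands only, and it reads the rule as "every element of the right-hand operand must occur in the left-hand operand", which is what the `{ data base }` example above implies. The result for an empty right-hand operand is not stated in the text; returning 1 there is an assumption of this sketch.

```python
def dl_incl(includer: list[str], subset: list[str]) -> int:
    """Return 1 if every element of `subset` occurs in `includer`, else 0.

    String comparison only; an empty `subset` yields 1 (assumption, not from the manual).
    """
    available = set(includer)  # order and repetition do not matter
    return 1 if all(element in available for element in subset) else 0


keywords = ["data", "base", "index"]
print(dl_incl(keywords, ["database"]))       # 0: no element equals "database"
print(dl_incl(keywords, ["data", "base"]))   # 1: both words are present
```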
Pattern match, Substitution Operators and After match special variables
Pattern match operators and substitution operators are operators
related to regular expression string matching.
Regular expressions of D follow the egrep specification.
See the Dintro page for details.
Pattern match operator
string expression =~ pattern |
string expression LIKE pattern |
string expression !~ pattern |
string expression UNLIKE pattern |
LIKE is a synonym of =~ and UNLIKE is a synonym of !~.
Pattern match operators test regular expression matching.
Both string expression and pattern
are evaluated as strings.
String expression is the string to be tested against
pattern as a regular expression.
The result is the simple numeric value 1 or 0.
Both string expression and pattern may be arrays.
Pattern matching is tested one by one for each pair of
string expression element and pattern element.
For the =~ (LIKE) operation,
when at least one element of the pattern
matches at least one element of the string expression,
the result is 1, otherwise 0.
For the !~ (UNLIKE) operation,
when at least one element of the pattern
does not match at least one element of the string expression,
the result is 1, otherwise 0.
Note that the !~ operation is not the same as NOT applied to the =~ operation,
except when both operands are simple values.
For example,
NOT a !~ '^[0-9]+$'
tests whether field "a" is an integer.
When the field "a" repeats,
it tests whether all values of field "a" are integers,
while the next one tests whether there is any integer in field "a":
a =~ '^[0-9]+$'
Note that the above examples use APOSTROPHE (') quoting for the UNIX shell.
In the case of the Windows shell, change APOSTROPHE to QUOTATION MARK (").
The precedence
of pattern match operators is the same as that of comparison operators,
and they associate left to right.
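The difference between `=~`, `!~` and `NOT ... !~` on repeated fields is easy to see in a small Python model. This is a sketch under assumptions: the function names `dl_like` and `dl_unlike` are hypothetical, and Python's `re` module stands in for the egrep-style regular expressions of D, which agree for simple patterns like the one used here but are not identical in general.

```python
import re


def dl_like(strings: list[str], patterns: list[str]) -> int:
    """=~ : 1 if at least one (string, pattern) pair matches."""
    return 1 if any(re.search(p, s) for s in strings for p in patterns) else 0


def dl_unlike(strings: list[str], patterns: list[str]) -> int:
    """!~ : 1 if at least one (string, pattern) pair does not match."""
    return 1 if any(re.search(p, s) is None for s in strings for p in patterns) else 0


integer = r"^[0-9]+$"
field_a = ["12", "ab"]                  # a repeated field "a"
print(dl_like(field_a, [integer]))      # 1: some value is an integer
print(1 - dl_unlike(field_a, [integer]))  # 0: NOT a !~ ..., not all values are integers
```

For a simple (single-element) operand pair the two tests are complementary, which is why `!~` and `NOT =~` only coincide in that case.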
Substitution operator
base string SUBST pattern BY replacement |
base string S pattern BY replacement |
base string SUBSTG pattern BY replacement |
base string SG pattern BY replacement |
S is a synonym of SUBST, and
SG is a synonym of SUBSTG.
The substitution operator is similar to s/xx/yy/ and s/xx/yy/g
of Perl or sed (or ed, ex of UNIX).
Base string, pattern
and replacement are all evaluated as strings.
Pattern is matched against the base string as a regular expression,
and when it matches, the matched part is replaced by the replacement,
yielding the result.
The SUBST (S) operation replaces
the first occurrence in each base string,
while the SUBSTG (SG) operation replaces all occurrences of
the matched string in each base string.
When pattern does not match, the base string becomes the result.
In any case, the result is given as a new string and the base string itself is left intact.
The next example is often used to perform an s/xx/yy/-like function.
FIELD a = FIELD a SUBST xx BY yy
In the replacement, some characters have special functions.
& | matched string |
\1 | part of the matched string corresponding to the first pair of parentheses |
... | ... |
\9 | part of the matched string corresponding to the ninth pair of parentheses |
\& | & |
\\ | \ |
Example: the next expression is TRUE
CONST 2002/12/24 SUBST ([0-9]+)/([0-9]+)/([0-9]+) BY \2/\3/\1 == 12/24/2002
Any of the operands may be an array.
In this case, for each element of the base string,
each element of the pattern is tested in turn and
the corresponding element of the replacement
is applied when it matches.
The result has the same number of
elements as the base string.
When there is no corresponding element in the replacement,
i.e., the number of elements in the replacement
is less than the number of elements in the pattern,
the null string is used.
Example: the next expression is TRUE
{ CBI NACSIS NII } SUBST { CBI NACSIS } BY { NACSIS NII } == { NII NII NII }
The precedence
of the substitution operator is higher than that of the concatenation operator,
and lower than that of the SUBSTR, JOIN and GREP
operators.
Association is left to right.
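The array behavior of SUBST and SUBSTG, including the replacement characters `&`, `\1` to `\9`, `\&` and `\\`, can be sketched in Python. This is a model built from the description and the two TRUE examples above, not the Ded implementation. The names `expand_replacement` and `dl_subst` are hypothetical, Python's `re` stands in for egrep-style patterns, and applying each pattern to the result of the previous one is an assumption inferred from the `{ CBI NACSIS NII }` example (where CBI ends up as NII).

```python
import re


def expand_replacement(replacement: str, match: re.Match) -> str:
    """Expand &, \\1..\\9, \\& and \\\\ in a Dl-style replacement string."""
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch == "&":
            out.append(match.group(0))            # whole matched string
        elif ch == "\\" and i + 1 < len(replacement):
            nxt = replacement[i + 1]
            if nxt in "123456789":
                index = int(nxt)
                if index <= match.re.groups:      # ignore groups the pattern lacks
                    out.append(match.group(index) or "")
            else:
                out.append(nxt)                   # \& gives &, \\ gives \
            i += 1
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def dl_subst(bases, patterns, replacements, replace_all=False):
    """Model of SUBST (replace_all=False) and SUBSTG (replace_all=True)."""
    results = []
    for text in bases:
        for index, pattern in enumerate(patterns):
            # A missing replacement element counts as the null string.
            replacement = replacements[index] if index < len(replacements) else ""
            text = re.sub(pattern,
                          lambda m, r=replacement: expand_replacement(r, m),
                          text,
                          count=0 if replace_all else 1)
        results.append(text)
    return results


print(dl_subst(["2002/12/24"], [r"([0-9]+)/([0-9]+)/([0-9]+)"], [r"\2/\3/\1"]))
# ['12/24/2002']
print(dl_subst(["CBI", "NACSIS", "NII"], ["CBI", "NACSIS"], ["NACSIS", "NII"]))
# ['NII', 'NII', 'NII']
```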
After match special variables
These special variables are the same as the Perl special variables
$&, $n, $` and $'
except for small details.
When one of these special variables is evaluated, it yields a part of
the base string of the last successful pattern match in the current
record cycle.
MATCH or $& is the part of the base string matched
by the last successful pattern match.
MATCHn or $n
(where n is 1, 2, 3 .. 9)
is the n-th subpattern, from the corresponding set of parentheses,
of the last successful pattern match.
PREMATCH or $` is the part of the base string
preceding the part matched by the last successful pattern match.
POSTMATCH or $' is the part of the base string
following the part matched by the last successful pattern match.
The result value is always a simple string.
When the pattern match operands are arrays,
the last pair of elements that matched is used.
The matching order of a pattern match or
substitution operation is base string first,
i.e., for the first element of the base string
each element of the pattern is matched
in the order of the array, then the second element
of the base string is tested, and so on.
When there has been no successful pattern match
in the current cycle, the result is the null string.
Note that a pattern match operation in a previous cycle
of program evaluation is never referred to by these special variables.
The PREMATCH special variable used after a SUBSTG operation
is a bit tricky. As the pattern matching repeats within
the same base string and pattern, there may be more than one
matched string. After a SUBSTG operation, the last match is referred to.
In this case, PREMATCH refers to
the base string with the earlier substitutions already applied,
and not to the original base string.
This is not the same as Perl's $`,
but it is how the current version of the Dl interpreter works.
Logical Operators
The precedence
of logical operators is lower than that of the other operators, except for
;
and =.
Among them OR is the weakest, followed by AND, then NOT.
Association is right to left.
Logical not
! expression |
NOT expression |
NOT is a unary operator which evaluates the
expression as boolean
and gives the simple numeric value 0 if the operand is TRUE,
or 1 if it is FALSE.
Logical and or
expression1 && expression2 |
expression1 AND expression2 |
expression1 || expression2 |
expression1 OR expression2 |
AND (&&) and OR (||) are binary operators.
As in Perl, the result is expression1 or expression2 itself,
rather than the numeric value 1 or 0.
AND (&&) evaluates expression1 as boolean.
When it is TRUE,
the result of the operation is expression2.
When it is FALSE,
the result is expression1,
without evaluating expression2.
OR (||) evaluates expression1 as boolean.
When it is TRUE,
the result of the operation is expression1,
without evaluating expression2.
When it is FALSE,
the result is expression2.
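The short-circuit behavior and the "result is one of the operands" rule can be modeled in Python as below. This is a sketch, not Ded code: the names `dl_and`, `dl_or` and `dl_is_true` are hypothetical, values are modeled as Python lists of `str` or numbers, and the right-hand operand is passed as a function so that the sketch can show when it is not evaluated.

```python
def dl_is_true(value) -> bool:
    """Dl boolean evaluation rule (see the Boolean Evaluation section)."""
    if len(value) != 1:
        return len(value) > 1       # null value is FALSE, longer arrays are TRUE
    element = value[0]
    return element != "" if isinstance(element, str) else element != 0


def dl_and(left, right_thunk):
    """AND: yields the right operand if left is TRUE, else left (right not evaluated)."""
    return right_thunk() if dl_is_true(left) else left


def dl_or(left, right_thunk):
    """OR: yields left if it is TRUE (right not evaluated), else the right operand."""
    return left if dl_is_true(left) else right_thunk()


print(dl_and(["x"], lambda: ["y"]))   # ['y']
print(dl_or([""], lambda: ["y"]))     # ['y']
print(dl_or(["x"], lambda: 1 / 0))    # ['x'], the right-hand side is never evaluated
```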
Arithmetic Operators
+ | PLUS | plus |
- | MINUS | minus, unary minus |
* | TIMES | multiply |
/ | DIVIDEDBY | divide |
% | MOD | modulus |
** | POWER | power |
Arithmetic operators are binary operators,
except for unary minus.
Their syntax is:
expression1 binop expression2 |
- expression |
Here, binop is one of the operators in the table.
Both expression1 and expression2
(or expression in the case of unary -) are evaluated as numeric.
In the case of %, they are further converted to integers.
Arithmetic operators array operation rule
When both operands are simple values,
the ordinary arithmetic operation is performed.
When the left-hand and/or right-hand operand is an array,
Dl treats them in a special way:
- When one of the operands is a simple value and the other is an array,
the result is an array whose i-th
value is the result of the operation on the simple value and the i-th
element of the array.
- When both operands are arrays, the result is an array whose i-th
element is the result of the operation on the
i-th elements of expression1 and expression2.
When the operands have different array lengths,
the result has the length of the shorter one.
Extra elements in the longer array are simply discarded.
Unary minus operator
- foo
is the same as
CONST 0 - foo
The following examples demonstrate the rules above; all of them evaluate to TRUE.
CONST 1 + CONST 1 == CONST 2
{ 1 2 3 } + { 3 2 1 } == { 4 4 4 }
{ 1 2 3 } * { 4 } == { 4 8 12 }
{ 1 2 3 } * { 4 5 } == { 4 10 }
{ 6 } / { 6 3 2 } == { 1 2 3 }
{ 6 5 } % { 4 3 2 } == { 2 2 }
{ } + { 1 2 3 } == { }
CONST 1 + { } == { }
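The array operation rule can be reproduced with a few lines of Python. This is a sketch of the rule as described, with a hypothetical function name `dl_arith`; it assumes the operands have already been converted to numbers and are modeled as Python lists.

```python
import operator


def dl_arith(op, left, right):
    """Apply a binary arithmetic operator following the Dl array operation rule."""
    # A simple value (one element) is paired with every element of the other array.
    if len(left) == 1 and len(right) != 1:
        left = left * len(right)
    elif len(right) == 1 and len(left) != 1:
        right = right * len(left)
    # zip() stops at the shorter array, discarding extra elements of the longer one.
    return [op(a, b) for a, b in zip(left, right)]


print(dl_arith(operator.mul, [1, 2, 3], [4]))      # [4, 8, 12]
print(dl_arith(operator.mul, [1, 2, 3], [4, 5]))   # [4, 10]
print(dl_arith(operator.mod, [6, 5], [4, 3, 2]))   # [2, 2]
print(dl_arith(operator.add, [1], []))             # []
```

Treating the null value as an array of length 0 makes the last two manual examples fall out of the same rule without a special case.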
Operator precedence
is, as usual, as follows:
- (unary)
**
* / %
+ -
and they are higher than the logical operators, comparison operators and array operators.
The precedence
of binary arithmetic operators is lower than that of the string concatenation operator
and of the mathematical and unary array operators.
The precedence
of the unary minus operator is the same as that of the mathematical operators.
Mathematical Operators
ABS | absolute value |
SQRT | square root |
EXP | exponential |
LOG | natural logarithm |
LOG10 | common logarithm |
SIN | sine |
COS | cosine |
TAN | tangent |
ATAN | arc tangent |
Mathematical operators are unary operators.
Their syntax is:
op expression |
where op is one of the operators in the table above.
In most computer languages, they are functions and the syntax is
something like LOG(x).
But in Dl, they are operators.
You need no parentheses around the operand.
For example:
SQRT a
But there is no harm in using parentheses with them:
SQRT ( a )
The expression is evaluated as a numeric value.
When the expression is an array,
the result is an array of the same size,
whose i-th
element is the result of the operation on the expression's
i-th element.
See the next examples:
ABS { -1 0 1 } == { 1 0 1 }
ABS { } == { }
The precedence
of these operators is higher than that of the arithmetic operators.
String Operators
. | | Concatenation |
SUBSTR | | Substring |
LENGTH | | Number of characters |
TOLOWER | | Change upper case letters to lower case |
TOUPPER | CAPS | Change lower case letters to upper case |
Concatenation operator
string expression1 . string expression2 |
The concatenation operator is binary.
String expression1 and string expression2
are evaluated as strings.
String expression2 is appended to string expression1,
yielding the result.
The operands are not changed.
When the operands are arrays, the operation follows
the same rule
as the binary arithmetic operators.
See the next example:
CONST 0x . { abc def } == { 0xabc 0xdef }
{ "$" "@" } . { 115.00 1.15 } == { "$115.00" "@1.15" }
The precedence
of the concatenation operator is higher than that of the arithmetic
operators (**), and lower than that of the substitution operators.
Substring operator
base string SUBSTR offset-length array |
Unlike perl or many other langugaes,
Dl's SUBSTR is a binary operator.
The right hand operand
offset-length array gives both offset and the length.
VAR str SUBSTR { 2 3 }
means substr($str, 2, 3) of perl.
Base string is converted to string if necessary.
Offset-length array, of which
first element gives the offset and the second
element gives the length, is converted to numeric if necessary.
The result of the operation is substring of base string
from offset (zero start) having length
characters.
If length is negative, that many characters from the end of
base string is removed.
For example, { 2 -3 } means to remove first two
characters and last three characters from the base string.
The elements other than the first and the second
of offset-length array, if any,
are not used for the operation.
When there is only one element
(offset only),
the result of the operation
is upto the end of the base string.
When offset-length array has no element,
the result is the base string itself.
If offset and length
specify a substring partly outside the base string,
the result is only the inside portion of the range.
If they specify fully outside the base string,
the result is null string.
Actual length of the result is shorter than specified
length, in these cases.
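The offset and length rules above can be summarized by the following Python sketch.
It is a model of the described behavior under stated assumptions, not the actual Ded code:
dl_substr is a hypothetical helper name,
the offset-length array is a Python list,
and numeric values are assumed to be truncated to integers with int().

```python
def dl_substr(base, offset_length):
    """Model of `base SUBSTR offset_length` for one base string (hypothetical helper)."""
    n = len(base)
    if len(offset_length) == 0:
        return base                      # no element: the base string itself
    start = int(offset_length[0])        # assumption: truncated to integer
    if len(offset_length) == 1:
        end = n                          # offset only: up to the end
    else:
        length = int(offset_length[1])   # elements after the second are ignored
        # negative length: drop that many characters from the end
        end = n + length if length < 0 else start + length
    # Keep only the part of [start, end) that lies inside the base string.
    start = max(0, min(start, n))
    end = max(0, min(end, n))
    return base[start:end] if start < end else ""


print(dl_substr("abcdefghij", [2, 3]))    # like substr($str, 2, 3) of perl
print(dl_substr("abcdefghij", [2, -3]))   # remove first two and last three
print(dl_substr("abcdefghij", [8, 5]))    # partly outside: shorter result
```

A range that lies fully outside the base string yields the null string in this model,
as described above.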
Base string may have more than one element.
In this case, the result of the operation has the same number
of elements, each of which is the result of a single SUBSTR
operation.
Unlike perl or many other languages,
Dl's SUBSTR can not be used as
left hand expression of the assignment operation.
An alternative of this is to use concatenation operator
(.) with SUBSTR.
But substitution operator (SUBST or S)
often provides shorter solution.
Precedence
of SUBSTR operator is higher than
concatenation operator and substitution operator,
same as Grep,
Join,
Split operator,
and lower than Sprintf
or Sscanf operator.
Length operator
LENGTH is a unary operator.
The operand is evaluated as string.
The result of LENGTH operation is a numeric value which shows
the number of characters in the operand.
When the operand is an array, the result is a numeric array of same size,
of which i-th element is the length of the i-th element of the operand.
See the next example:
LENGTH { abc defg } == { 3 4 }
Precedence
of LENGTH operator is same as mathematical unary operators.
Case change operator
TOUPPER string expression |
CAPS string expression |
TOLOWER string expression |
The TOUPPER (or CAPS) and
TOLOWER
operators make a string
that has the same length and the same contents as the operand
string expression,
except that all the lower case letters in it are converted to
the corresponding upper case letters,
or vice versa.
Note that the operand itself is unchanged.
Array handling is same as
LENGTH or
mathematical unary operators.
See the next example:
TOUPPER { Dselect Dgrep Dextract } == { DSELECT DGREP DEXTRACT }
Precedence
of case change operator is same as mathematical unary operators.
Conversion Operators
NUM | numeric conversion |
STR | string conversion |
INT | integer conversion |
SPRINTF | printf like formatted conversion |
SSCANF | scanf like formatted conversion |
Num operator
NUM operator converts the expression into numeric values.
It does not change the value of operand itself.
See Dintro for the detail.
Array handling is same as
mathematical unary operators.
Precedence
is same as mathematical unary operators.
Str operator
STR operator converts the expression into string values.
It does not change the value of operand itself.
See Dintro for the detail.
Array handling is same as
mathematical unary operators.
Precedence
is same as mathematical unary operators.
Int operator
INT operator evaluates the expression
as numeric values and then converts them into integers.
It does not change the value of operand itself.
Array handling is same as
mathematical unary operators.
Precedence
is same as mathematical unary operators.
Sprintf operator
expression
SPRINTF format |
SPRINTF operator converts the expression
into string value using the format as sprintf
function of C language.
This is same as C-format of the output D_fmt.
See the manual C-format
section of the D_fmt for the conversion detail.
The expression and the format are evaluated
as strings.
Only the first element of the format is used for the
SPRINTF conversion.
Other elements (if any) are ignored in this operator.
When the expression is an array, the result is a same size
array, of which i-th element is the conversion result of
the i-th element of the expression.
When an error (for example, format specification error)
is detected during the process, the result becomes the null value.
Precedence
of SPRINTF operator is higher than any
binary operator, and lower than mathematical
unary operators.
Sscanf operator
SSCANF operator converts the string value expression
into numeric or string value using the format as
sscanf function of C language.
This is same as C-format of the input D_fmt.
See the manual C-format
section of the D_fmt for the conversion detail.
The result value of the SSCANF operator is either numeric
or string value depending on the format specifier of the format.
When the type of the format specifier is 's' or
's', the result is string value.
For all other types, the result is numeric value.
The expression and the format are evaluated
as strings.
Only the first element of the format is used for the
SSCANF conversion.
Other elements (if any) are ignored in this operator.
When the expression is an array, the result is a same size
array, of which i-th element is the conversion result of
the i-th element of the expression.
When the sscanf() operation fails due to a format mismatch,
the corresponding result becomes null string or numeric value zero.
When an error (for example, format specification error)
is detected during the process, the result becomes the null value.
Precedence
of SSCANF operator is higher than any
binary operator, and lower than mathematical
unary operators.
Array Operators
, | array concatenation |
COUNT | Count elements of an array |
EXISTS | Tests an expression has elements |
MIN | minimum value in an array |
MAX | maximum value in an array |
SUM | sum of values in an array |
AVG | average value of elements in an array |
CAT | concatenate array elements |
JOIN | concatenate array elements with delimiters |
GREP | select elements with comparison/pattern match operation |
SPLIT | split a string into array elements |
.. | range operator |
Array concatenation operator
Array concatenation is a binary operator.
Elements of the array2 are appended after
elements of the array1
yielding the result.
Array1 itself is not changed
even if it is a field or variable.
Number of elements in the result is addition of
the numbers of elements of both operands.
Next example is TRUE:
{ 1 2 3 } , { 4 5 6 } == { 1 2 3 4 5 6 }
When the type of the array2 is different from the type of
the array1, array2 is converted into the
type of the array1.
For example:
NUM { 1 2 3 } , STR { a b c } == NUM { 1 2 3 0 0 0 }
Note that { a b c } is evaluated as { 0 0 0 } as numeric.
(See Dintro).
Precedence
of the array concatenation operator is
lower than arithmetic operators and higher than comparison operators.
COUNT and EXISTS
COUNT expression |
EXISTS expression |
COUNT returns a simple numeric value which shows
number of elements in the expression.
EXISTS returns a simple numeric value 1
when the expression has one or more elements,
or 0 when the expression has no element.
Precedence
of these operators is the same as that of arithmetic unary operators.
Statistics within an array
MIN array |
MAX array |
SUM array |
AVG array |
These are unary operators,
which give simple statistic values of the array elements.
Array is evaluated as numeric,
and the result is always a simple numeric value.
MIN and MAX give the minimum or maximum value
of the elements of the array.
SUM gives the summation of all elements of the array.
AVG gives SUM/COUNT value
of the array.
See the following examples:
COUNT { 1 2 3 4 5 6 } == CONST 6
COUNT { } == CONST 0
MIN { 1 2 3 4 5 6 } == CONST 1
MAX { 1 2 3 4 5 6 } == CONST 6
AVG { 1 2 3 4 5 6 } == CONST 3.5
Precedence
of these operators is same as mathematical unary operators.
CAT operator
CAT is a unary operator.
Array is evaluated as string,
and the result is the concatenation of these element strings in
the element order.
See the following example:
CAT { 1 2 3 4 5 6 } == CONST 123456
Precedence
of CAT operator is same as mathematical unary operators.
Note that usage of this operator is discouraged now.
Please use JOIN instead of CAT.
Join operator
Elements of the array are concatenated
with intervening delimiter to form a simple string.
Array and delimiter are
converted to string if necessary.
If delimiter has more than one element,
only the first one is used for this operation.
If delimiter has no element,
delimiter is null string.
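As a rough illustration of these rules, here is a small Python model.
It is a sketch only: dl_join is a hypothetical helper name,
and both operands are represented as Python lists of strings.

```python
def dl_join(array, delimiter):
    """Model of `array JOIN delimiter` (hypothetical helper)."""
    # Only the first element of the delimiter is used;
    # an empty delimiter array means the null string.
    sep = str(delimiter[0]) if delimiter else ""
    return sep.join(str(x) for x in array)


print(dl_join(["a", "b", "c"], [":"]))
print(dl_join(["a", "b", "c"], ["-", "+"]))   # "+" is ignored
print(dl_join(["1", "2", "3"], []))           # null string delimiter
```

The result is always a single string, even when the array has many elements.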
Precedence
of JOIN operator is higher than
concatenation or substitution operator and
lower than SPRINTF, SSCANF operator.
Note that array JOIN ""
provides same result as CAT array.
This is just because of a historical reason and use of CAT
is not encouraged now.
Also note that the name JOIN comes from perl, and
has nothing to do with
Djoin.
Rather, it provides a function similar to that of
Dpack.
Grep operator
Op is a
pattern match operator
or a
comparison operator other than INCL.
For each element of array, the element op expression is evaluated,
and if it is true, the element is selected.
Result is an array with selected elements.
It is a new value and array itself is not changed.
Typical usage of GREP operator is:
FIELD words = FIELD words GREP UNLIKE "^[0-9]+$"
which removes numerical words from the field named words.
In the case the op is pattern match operator,
array and expression are evaluated as string
and the result is always string.
In the case the op is comparison operator,
the result array is string or numeric depending on the
array's type.
But the comparison operation is made according to the operator's
numeric/string decision rule, i.e.,
numeric comparison is made when either of the operands is numeric.
For example, in the case:
FIELD a GREP GE 10
string comparison is made,
because FIELD a is string and CONST 10 is also a string.
Either of the next examples makes a numeric comparison.
FIELD a GREP GE NUM 10
FIELD a:n GREP GE 10
The result of the first example is string,
while the result of the second one is numeric.
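The difference between the string and the numeric case can be shown with a small Python model.
This is a sketch under several assumptions, not the Ded implementation:
dl_grep_ge is a hypothetical helper name,
a Python str stands for a Dl string and a Python float for a Dl numeric,
non-numeric strings are taken as zero in the numeric case,
and string comparison uses Python's ordering, which may differ from the one Ded uses.

```python
def dl_grep_ge(array, value):
    """Model of `array GREP GE value` (hypothetical helper)."""

    def to_num(x):
        # assumption: strings that are not numbers count as zero
        try:
            return float(x)
        except ValueError:
            return 0.0

    # Numeric comparison is made when either operand is numeric.
    numeric = isinstance(value, float) or any(isinstance(x, float) for x in array)
    if numeric:
        # Selected elements are returned as they are, so the result
        # keeps the type of the array.
        return [x for x in array if to_num(x) >= to_num(value)]
    return [x for x in array if x >= value]


words = ["9", "10", "100", "1"]
print(dl_grep_ge(words, "10"))    # string comparison: "9" is selected
print(dl_grep_ge(words, 10.0))    # numeric comparison: "9" is not selected
```

In the second call the selected elements are still strings, because the array itself is a string array.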
Usually, you use a simple value for the expression.
But, when it has more than one element,
the comparison or pattern match follows the rule of the op
operator (single value vs. array case).
Precedence
of GREP operator is higher than
concatenation or substitution operator and
lower than SPRINTF, SSCANF operator.
Split operator
string expression SPLIT delimiter |
string expression is
separated by a regular expression delimiter.
Result is an array of separated strings.
Both string expression and delimiter
are converted to string if necessary.
Only the first element of delimiter is used.
Other elements are not used for this operation.
If there is no element in delimiter,
null string is assumed.
When string expression has more than one element,
each element is separated and all the separated elements
makes an array to form the result.
If delimiter matches null string, the first character
is separated.
Thus,
CONST "abc" SPLIT CONST ""
provides
{ a b c }
Precedence
of SPLIT operator is higher than
concatenation or substitution operator and
lower than SPRINTF, SSCANF operator.
Note that this operator provides similar function that
Dunpack does.
But, delimiter matching null string is prohibited
in Dunpack,
while it separates the first character in this operation.
Range operator
Range operator is similar to perl's range operator
in list context.
Next example shows a typical usage:
FOR i IN 0 .. COUNT a - 1 DO
IF a [ VAR i ] =~ [0-9]+ THEN num = FIELD num , a [ VAR i ] FI
DONE
Both start and end are evaluated as numeric.
The last element of the start and
the first element of the end are converted into integer.
First, elements of the start except for the last one
are copied to the result array.
Then, the integer values from the last element of the start
up to the first element of the end are filled in.
Finally, elements of the end except for the first one are copied
after that.
Step value is 1 or -1 depending on which value is larger.
See next example:
{ 0.5 1.5 } .. { -1.5 -0.5 }
== { 0.5 1 0 -1 -0.5 }
(You may wonder why 0.5 and -0.5 are not converted to integer.
There is no strong reason, just for the implementation convenience.)
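The three steps of the range operation can be modeled as follows.
This is a Python sketch of the description above, not the actual implementation:
dl_range is a hypothetical helper name,
the operands are Python lists of numbers,
and empty operands are simply rejected because their handling is not described here.

```python
def dl_range(start, end):
    """Model of `start .. end` on lists of numbers (hypothetical helper)."""
    if not start or not end:
        raise ValueError("empty operands are not covered by this sketch")
    first = int(start[-1])   # last element of start, converted to integer
    last = int(end[0])       # first element of end, converted to integer
    step = 1 if last >= first else -1
    middle = list(range(first, last + step, step))
    # start without its last element, the integer run, end without its first
    return list(start[:-1]) + middle + list(end[1:])


print(dl_range([0], [3]))
print(dl_range([0.5, 1.5], [-1.5, -0.5]))
```

Note that int() truncates toward zero, which reproduces the documented example:
1.5 becomes 1 and -1.5 becomes -1, while 0.5 and -0.5 are copied as they are.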
Precedence
of range operator is higher than array concatenation operator
and lower than arithmetic operators.
Suffix Operator
The suffix is evaluated as numeric
and then converted to integer.
The result of suffix operation is the element[s]
of the array of which index number
(suffix) is given by the suffix.
Note that the first element is suffixed as zero.
When the suffix is out of the
array's range,
the result is null value.
When the suffix is an array,
the result is also an array,
of which i-th element is the result of suffix operation between the same
array and the i-th element of the suffix.
Following examples are all TRUE.
{ 1 2 3 } [ 0 ] == { 1 }
{ a b c } [ 1.5 ] == { b }
{ 1 2 3 } [ -1 ] == { }
{ a b c } [ 3 ] == { }
{ 1 2 3 } [ { 0 1 } ] == { 1 2 }
{ a b c } [ { 2 1 0 } ] == { c b a }
{ a b c } [ { 1 2 3 } ] == { b c }
Note that foo [ 1 ] [ 0 ] is valid expression,
but it is just same as foo [ 1 ].
Unlike arrays in C language, Dl supports
only one dimensional array.
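The suffix rules can be checked against a small Python model.
It is only a sketch: dl_suffix is a hypothetical helper name,
arrays and suffixes are Python lists,
and integer conversion is assumed to be truncation with int().

```python
def dl_suffix(array, suffix):
    """Model of `array [ suffix ]` (hypothetical helper)."""
    result = []
    for s in suffix:
        i = int(s)                    # numeric suffix converted to integer
        if 0 <= i < len(array):       # out of range: contributes nothing
            result.append(array[i])
    return result


print(dl_suffix(["a", "b", "c"], [1.5]))
print(dl_suffix(["a", "b", "c"], [2, 1, 0]))
print(dl_suffix(["a", "b", "c"], [1, 2, 3]))   # suffix 3 is out of range
```

Because the result is again a one dimensional array,
applying the model twice, as in foo [ 1 ] [ 0 ], returns the same single element.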
Assignment Operator
LHE = expression |
LHE [ suffix ] = expression |
Assignment operator evaluates the expression
and assign its value to the left hand expression (LHE).
It also yields the assigned value as this operation's result.
LHE is limited to a field,
variable,
static variable or
current record special variable.
When there is suffix operator with the LHE,
suffix is evaluated as numeric and converted to integer,
and the object of assignment is limited to certain elements
of the LHE indexed by the suffix.
When the type of the expression is different from the
type of LHE, the expression is converted to the
type of LHE.
If LHE has no type (e.g. the first use of a VAR),
LHE becomes the type of the expression.
Note that fields and the current record special variable
are string type.
When the LHE is a field name,
field order in the current record is kept as far as possible.
For example, the current record is
a:A
b:B
and the program is
FIELD a = CONST foo
then, the result is:
a:foo
b:B
Note that field "a" is not moved.
Both the target field and the expression may be array.
When they have same number of elements,
corresponding element is assigned keeping the field positions.
When the target field has more elements than the expression has,
excess fields are eliminated from the current record.
When the target field has fewer elements than the expression has,
excess elements are inserted just after the last existing element
with the same field name.
For example, assume the current record is
a:A
b:AA
a:B
b:BB
and the program is
FIELD a = { 0 1 2 3 }
then the result is:
a:0
b:AA
a:1
a:2
a:3
b:BB
When the target field is new to the current record,
new field is appended after existing fields.
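The field order rules above can be modeled in Python as follows.
This is a sketch of the described behavior only:
assign_field is a hypothetical helper name,
and a D-record is represented as a list of (field name, value) pairs.

```python
def assign_field(record, name, values):
    """Model of `FIELD name = values` on a list of (name, value) pairs."""
    values = list(values)
    positions = [i for i, (n, _) in enumerate(record) if n == name]
    if not positions:
        # new field: appended after the existing fields
        return list(record) + [(name, v) for v in values]
    last = positions[-1]
    result = []
    k = 0                                  # index of the next value to place
    for i, (n, v) in enumerate(record):
        if n != name:
            result.append((n, v))          # other fields keep their position
            continue
        if k < len(values):
            result.append((name, values[k]))   # replace in place
            k += 1
        # otherwise this excess field is dropped
        if i == last:
            # excess values go right after the last existing element
            result.extend((name, x) for x in values[k:])
            k = len(values)
    return result


record = [("a", "A"), ("b", "AA"), ("a", "B"), ("b", "BB")]
for field, value in assign_field(record, "a", ["0", "1", "2", "3"]):
    print(field + ":" + value)
```

Run on the record of the example above, the model prints the same six fields in the same order.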
When suffix operator is provided with LHE,
object of assignment is limited to the element[s]
indexed by the suffix value[s].
For example, the current record (or a [static] variable's value) is:
a:AA
a:BB
a:CC
and the program is:
a [ 1 ] = { 1 }
then the result is:
a:AA
a:1
a:CC
When the expression is an array like:
a [ 1 ] = { 1 2 }
target element is replaced by the array and the result is:
a:AA
a:1
a:2
a:CC
Similarly, if the assigned value is null value like:
a [ 1 ] = { }
it means element removal and the result is:
a:AA
a:CC
When the suffix is outside of LHE's current elements,
no assignment takes place.
(Note that this specification is different from perl
which extends the array size.)
When the suffix of LHE is an array,
the rule is "one to one but the last target takes the rest".
For example, assume a variable a has value:
{ AA BB CC DD }
and the program is:
VAR a [ { 0 2 } ] = { 0 2 }
then the result is { 0 BB 2 DD }.
When the expression has more elements:
VAR a [ { 0 2 } ] = { 0 2 4 6 }
then the result is { 0 BB 2 4 6 DD }.
And when the expression has less elements:
VAR a [ { 0 2 } ] = { 0 }
then the target element left without a value is removed from the variable "a", giving { 0 BB DD }.
Precedence
of assignment operator is second from the lowest
and just higher than serial operator.
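The suffixed assignment rule, "one to one but the last target takes the rest",
can be modeled by the following Python sketch.
It is not the Ded implementation: assign_elements is a hypothetical helper name,
the suffixes are assumed to be distinct and in ascending order,
and out of range suffixes are simply ignored, which is an assumption
for the case where only some of the suffixes are out of range.

```python
def assign_elements(array, suffixes, values):
    """Model of `LHE [ suffixes ] = values` on a list (hypothetical helper)."""
    values = list(values)
    # assumption: out of range suffixes are ignored
    targets = [int(s) for s in suffixes if 0 <= int(s) < len(array)]
    if not targets:
        return list(array)                 # no assignment takes place
    replacement = {}
    for k, t in enumerate(targets):
        if k == len(targets) - 1:
            replacement[t] = values[k:]    # the last target takes the rest
        else:
            replacement[t] = values[k:k + 1]   # one to one (may be empty)
    result = []
    for i, x in enumerate(array):
        # an empty replacement removes the element
        result.extend(replacement.get(i, [x]))
    return result


a = ["AA", "BB", "CC", "DD"]
print(assign_elements(a, [0, 2], ["0", "2"]))
print(assign_elements(a, [0, 2], ["0", "2", "4", "6"]))
print(assign_elements(a, [0, 2], ["0"]))
```

The same model covers the single suffix cases:
an array value is spliced in at the target position,
and a null value removes the target element.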
I/O related special variables
OUTPUT | output current record |
FILENAME | current input file name |
FNR REC# | current record number in the file |
NR $. | current record number throughout the input files |
@_ CURREC FIELDS | current record itself |
These are special variables and have no operands.
Output
When this special variable is evaluated Dl processor
writes the current record to standard output.
Note that it does not flush the current record.
Unless you clear the current record explicitly
(CURREC = { }),
it is output again at the end of the current record cycle.
This special variable's value is always simple numeric value 1.
Current file name
This special variable holds the current input file name
as its value.
If the input is standard in, the value is null string.
Current record number
This special variable holds the current record number
in the current input file (start from 1).
It has two names by historical reason.
REC# came from Dpr header line, and
FNR is awk compatible.
Current record number throughout the input files
This special variable holds the current record number throughout
the input files.
It is same as FNR if there is just one input file.
It has two names: NR is the awk compatible name
and $. is the perl compatible name.
Current record special variable
This special variable is the raw mode current record, i.e.,
each element is a field of the current record,
which consists of field name, COLON and the value
as appears in the D-file.
Quite naturally, assigning a value to this special variable
affects to the current record.
Note that you are responsible to make field name COLON value
form to keep the D-record valid.
Assigning null value to this special value:
CURREC = { }
means deletion of the current record.
(As long as you don't assign value afterwards.)
This special variable has three names: CURREC, FIELDS and @_.
@_ is perl like name.
Though it is not quite same,
perl users will easily remember this name.
CURREC, naturally, stands for "current record".
But, in some cases, like COUNT CURREC,
this name seems unnatural, and COUNT FIELDS
sounds more natural.
This is the reason for this alternative.
You can use whichever you like.
System related operators and special variables
QX | run system commands |
$? STATUS | The status of last QX call |
CODESET | character code in effect |
LOCALE | current locale value |
QX
QX operator is a unary operator.
This corresponds to backtick (`command`) of Unix shell.
The command is converted into string if necessary,
and then executed as a system command with the shell.
The output from the command to standard output is the
value of the QX operation.
The value may be an array, of which element
is individual line of the output from the command.
(Thus new line character is never included).
When the command is an array,
elements are concatenated with intervening new line to yield a command.
But, the shell may not properly process the second and following lines.
(Windows shell cmd.exe is this case).
Precedence
of QX operator is lower than other unary operators.
It is higher than range operators
and lower than arithmetic operators,
so that you don't need () for
r = QX "command " . FIELD p ;
But, when you use () to give the operand, don't forget CONST
r = QX '(' CONST "command " . FIELD p ')' ;
otherwise "command " is interpreted as a field name.
Name of QX comes from qx/command/ of perl.
STATUS
This special variable holds the return code of the last QX command.
Until the first QX command is executed, the value is zero.
$? is Unix sh like name,
and STATUS is Unix csh like name.
CODESET
This special variable holds character set (encoding) name[s] used in D-commands.
Usually, this is the encoding name of the current locale.
When UTF I/O feature is invoked,
the value becomes "UTF-8", "UTF-16LE", "UTF-16BE",
"UTF-32LE" or "UTF-32BE".
If input and output character encoding is different,
the value has form of "input encoding/output encoding".
CODESET is useful to set "encoding" attribute of
XML, or "charset" name for HTML.
LOCALE
This special variable holds current locale value.
It is a return value of setlocale(LC_ALL, "") call
in the case of UNIX, and setlocale(LC_CTYPE, "") call
in the case of Windows.
Operator Precedence
Use of parentheses is recommended
as the operator precedence may not be
what you expect.
Examples
In the following examples, FIELD,
CONST or VAR is omitted as far as possible.
This is to demonstrate how Dl parser works, and it is not recommendation.
I would rather recommend not to rely on these default interpretations,
which may often lead you to mistakes.
Test if a field "lang" is "jpn":
lang == jpn
Test if a field "yr" is smaller than 2003 as a numeric value:
yr:n LT 2003
Assume each input D-record has just one field "l",
of which value is a line of a text file.
In this text file, paragraphs are separated by a blank line.
Next example adds a field "P" which holds the
paragraph number to each D-record.
IF NR == 1 THEN STATIC P = 1 FI ;
IF l =~ "^ *$" THEN STATIC P = STATIC P + 1 FI ;
FIELD P = STATIC P
Works as Dtie -t / y,m,d ymd (under the condition that
fields "y", "m" and "d" have same number of elements).
ymd = FIELD y . CONST / . FIELD m . CONST / . FIELD d ;
y = FIELD m = FIELD d = { }
Similar to perl's grep.
Eliminates null string value fields.
FOR i IN COUNT FIELDS - 1 .. 0 DO
IF FIELDS [ VAR i ] LIKE "^[^:]*:$" THEN
FIELDS [ VAR i ] = { }
FI
DONE
Similar to the example above; another way to do with array suffix.
Works as Dproj foo:
FOR i IN 0 .. COUNT FIELDS - 1 DO
IF FIELDS [ VAR i ] =~ ^foo: THEN
VAR f = VAR f , VAR i
FI
DONE ;
FIELDS = FIELDS [ VAR f ]