Dl - D-language

Introduction

What is Dl

Dl is a language for handling D-records. It is not a general-purpose programming language such as C, C++ or Java. Functionally, it is akin to awk, except that Dl handles D-records while awk handles lines of text files. For example, you can add a new field to all records or to specific records of the input D-file, change field values, or delete fields.

Though Dl has almost the full functionality of a programming language, it is not intended for large programs. Ded is an interpreter of Dl and is not as fast as a compiled language. If you need very complicated processing, it is recommended to use another tool, for example Perl or C. A typical usage of Dl is:

Ded IF txtlang == jpn OR txtlang == kor OR txtlang == chi THEN area = ea FI input-file.d

This command adds (or changes) the field "area" with value "ea" in the records whose "txtlang" field value is "jpn", "kor" or "chi".

Some D-commands can also be written in Dl. For example,

Dtie -t ":" a,b c

is the same as

Ded FIELD c = FIELD a . CONST ":" . FIELD b ";" FIELD a = FIELD b = "{" "}"

The essential difference between these two approaches is that a D-command represents a basic D-file operation, while Dl offers a general-purpose method of handling D-records. The practical difference is speed. D-commands are tuned for specific operations, with hard-wired code for each operation. Ded is an interpreter of Dl and executes the operations step by step, so it is slower than the D-commands. It is recommended to use a specific D-command when a suitable one is provided.

Features of Dl

A Dl program is generally written as a series of command arguments. This is like the sed command of UNIX, but the -e option is not used and you write the program directly as command arguments. Alternatively, you can provide the Dl program from a text file. Details are described in the General Syntax section.

Dl has a highly simplified syntax. Unlike many languages, Dl has no statements; there are only expressions. Control structures like "if" or "while" are operators in Dl. Even ";" is an operator, similar to the "," operator of C. An expression is "evaluated", which means that that part of the Dl program is "executed".

In addition, Dl does not have subroutines or macros. These facts make it difficult to write large, complicated programs, which are not the main target of Dl.

Any field of a D-record is repeatable. Consequently, all constants and variables of Dl are arrays, and every operator of Dl is applied to arrays in its own way. For example, the addition operator "+" works differently depending on the number of elements in its operand values. Perl has array and scalar contexts to control the semantics of an operation; Dl has only an array context.

Ded and Dselect

Two separate programs interpret Dl. Ded is the full processor of Dl, while Dselect does not allow the operators that affect the output record.

In the case of Ded, the given Dl expressions are evaluated (i.e., the given Dl program is executed) for each input D-record. After the evaluation, the current record is written to the standard output if it has at least one field (i.e., if the current record has not been deleted). You may write the current record explicitly with an output operation, but even in this case Ded writes the current record after the evaluation. After writing the current record, Ded reads the next input record and starts a new cycle of evaluation, until it encounters the end of file.

In the case of Dselect, the given Dl expressions are evaluated for each input D-record, and if the result value is true (see the Boolean Evaluation section below), the input record is written to the standard output. Assignment to a field and output operations are not allowed in a Dl expression given to Dselect, so the input record is never changed or duplicated in the output.
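The two processing cycles described above can be modeled roughly as follows. This is a Python sketch and not the actual implementation of Ded or Dselect: a record is represented as a dictionary that maps a field name to a list of values, and the names run_ded, run_dselect, set_area and drop_all are hypothetical.

```python
def run_ded(records, program):
    """Model of the Ded cycle: evaluate the program, then write the record."""
    output = []
    for record in records:
        # Work on a copy so that the caller's input records stay untouched.
        current = {name: list(values) for name, values in record.items()}
        program(current)          # may add, change or delete fields
        if current:               # a record with no fields left is not written
            output.append(current)
    return output


def run_dselect(records, predicate):
    """Model of the Dselect cycle: write the unchanged record when the result is true."""
    return [record for record in records if predicate(record)]


def set_area(record):
    # Hypothetical stand-in for a program that sets "area" for some languages.
    if any(v in ("jpn", "kor", "chi") for v in record.get("txtlang", [])):
        record["area"] = ["ea"]


def drop_all(record):
    # Hypothetical program that deletes every field, i.e. deletes the record.
    record.clear()


records = [{"txtlang": ["jpn"]}, {"txtlang": ["eng"]}]
edited = run_ded(records, set_area)
deleted = run_ded(records, drop_all)
selected = run_dselect(records, lambda r: "jpn" in r.get("txtlang", []))
```

In this model Ded always writes every record that still has a field, whether or not the program touched it, while Dselect only filters.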

General Syntax

Words and quoting

The source program of Dl is taken from the command arguments or from a source file given with the -f option. A Dl program is made of words. Dl operators, constants, field names, variable names and other Dl reserved words are given as words. Even a parenthesis is a word in Dl. A word is a string of arbitrary length made of any characters. Words are recognized slightly differently for command arguments and for source file input.

When the program is given as command arguments, each command argument makes one Dl word. When the program is given from a text file, words are separated by white space. In addition to white space, only REVERSE SOLIDUS (\), QUOTATION MARK (") and APOSTROPHE (') have special meaning in the source file. The quoting mechanism with these characters, explained below, follows the UNIX Bourne shell specification.

A REVERSE SOLIDUS (\) at the end of a line is regarded as a line continuation mark unless it is placed in an APOSTROPHE (') quoted string. The continuation mark and the new line after it are omitted from the word. A REVERSE SOLIDUS at any other position is an escape character unless it is placed in an APOSTROPHE or QUOTATION MARK quoted string. The escape character itself is omitted and the following character becomes a normal character that forms part of the word. The escape character is usually used to include spaces, QUOTATION MARK, APOSTROPHE or REVERSE SOLIDUS as part of a word.

QUOTATION MARK (") is another means of using special characters in a Dl source file. Once the Dl parser encounters a QUOTATION MARK, it is omitted from the word and the following characters are included in the word until the parser encounters a QUOTATION MARK again. (The ending QUOTATION MARK is not included in the word.) There are two exceptions within QUOTATION MARKs. A REVERSE SOLIDUS followed by a QUOTATION MARK makes just one QUOTATION MARK; this is used to escape the ending QUOTATION MARK. The other exception is a REVERSE SOLIDUS followed by a new line character, which is a line continuation mark within QUOTATION MARKs; both the REVERSE SOLIDUS and the new line character are omitted from the current word. Any other REVERSE SOLIDUS between QUOTATION MARKs is treated as a normal character.

APOSTROPHE (') is also a quoting character. It is stronger than QUOTATION MARK. When the Dl parser encounters an APOSTROPHE, the following characters are included in the current word until it encounters the closing APOSTROPHE. Unlike QUOTATION MARK, there is no exception. Even a REVERSE SOLIDUS or a new line character is treated as a normal character within APOSTROPHEs. To use an APOSTROPHE within a word, use the REVERSE SOLIDUS escape or the QUOTATION MARK quoting explained above.

Example of quoted words in Dl source files

Source text                     Resulting word
one\ word                       one word
"one word"                      one word
'one word'                      one word
one" "word                      one word
o\ n" "e' '' '"w o ""r d"       o n e  w o r d
\o\n\e\ \w\o\r\d                one word
one\                            oneword
word
\\\"\'                          \"'
"\"\\'\""                       "\'"
"one\                           oneword
word"
'"\"'                           "\"
'one                            one
word'                           word

An entry that spans two lines contains a new line character in the source text.

Reserved words

The following words are reserved in Dl.

!= !~ $& $' $. $` % && ( ) * ** + , -
-- . .. / /* ; < <= <> = == =~ > >= @_
[ ] { ||
ABS AND ATAN AVG BY CAPS CAT CONST COS
COUNT CURREC DO DONE ELSE EPILOGUE EQ
EXISTS EXIT EXP FI FIELD FIELDS FILENAME
FNR FOR GE GT IF IN INCL INT LE LENGTH
LIKE LOG LOG10 LT MATCH MAX MIN MOD NE
NOT NR NUM OR OUTPUT POSTMATCH PREMATCH
REC# SIN SQRT STATIC STR SUBST SUBSTG
SUM TAN THEN TOUPPER UNLIKE VAR WHILE

The following words are reserved only in limited situations:

} */

Note that all words in Dl are case sensitive (i.e., "IF" is reserved word but not "if" or "If").

Comment

A comment starts with the word /* and ends with the word */. Unlike comments in C, /* and */ must be separated from other text by spaces; i.e., /*COMMENT*/ causes an error, while /* COMMENT */ is treated as a comment.

Tokens

The tokens of Dl are the field-name, constant, variable, static variable, special variable, operator, parenthesis and end-token. Each token must be given as a separate word, or as two or more words led by a reserved keyword.

Grammar

The following is a simplified grammar of Dl.

program    ::= expression

expression ::= primary
             | unary-operator expression
             | expression binary-operator expression
             | expression ternary-operator1 expression ternary-operator2 expression
             | expression [ expression ]
             | IF expression THEN expression FI
             | IF expression THEN expression ELSE expression FI
             | WHILE expression DO expression DONE
             | FOR variable IN expression DO expression DONE

primary    ::= constant
             | field-name
             | variable
             | static-variable
             | special-variable
             | ( expression )

Syntactical Components

Field Name
Constant
Variable and Static Variable
Special Variables
Parentheses
End Token

Field Name

Morphology

A word following the keyword FIELD is a field name token. For example,

FIELD a

is a field-name token "a". Similarly,

FIELD FIELD

is a field-name token "FIELD". In this case the second word (FIELD) is not a keyword but a field name, while the first word (FIELD) is a keyword.

Omission of the Keyword FIELD

As a special case, a non-keyword word at the top of the program is a field-name token by itself. A non-keyword word after any of the following tokens is also a field-name token by itself.

(
; , && || !
IF THEN ELSE WHILE DO
ABS AND ATAN AVG CAPS CAT COS COUNT EXISTS EXP INT LENGTH LOG LOG10 MAX MIN NOT OR SIN SQRT SUM TAN TOUPPER

In the next example, the second word "a" is a field-name token, because it is after EXISTS.

EXISTS a

Evaluation of Field-name token

When evaluated, a field-name token yields the values of the fields with the given name in the current record. The current record may have two or more fields of the same name; in this case, the result of the evaluation is an array. When the current record has no such field, the value is NULL.

Numeric qualifier

The value of a field-name token is a string by default. In some cases, for example in a comparison operation, you may want to evaluate it as a number. A numeric qualifier makes the field-name token evaluate to a numeric value. It consists of a COLON (:) and the letter "n" placed after the field name. For example

FIELD seq

is evaluated as a string, thus, "10" is smaller than "9". But,

FIELD seq:n

is evaluated as numeric, and "10" becomes greater than "9".
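The difference between the two evaluations can be illustrated outside Dl. The following Python sketch assumes that Python's ordinary string ordering is a fair analogue of the string comparison described here; the exact comparison rules of D are given in Dintro, and the name seq_values is only illustrative.

```python
seq_values = ["9", "10"]

# String comparison, as with FIELD seq: characters are compared from the left,
# so "10" sorts before "9" because "1" is smaller than "9".
as_strings = sorted(seq_values)

# Numeric comparison, as with FIELD seq:n: the values are compared as numbers.
as_numbers = sorted(seq_values, key=float)
```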

Constant

Morphology

There are two ways to denote constants. One is to use the keyword CONST. The word following the keyword CONST is a constant token. To make an array value, repeat CONST and a value after the first constant.

CONST a
CONST a CONST b

The other way to denote a constant is to use BRACEs. The words following a LEFT BRACE ({) and preceding a RIGHT BRACE (}) form a constant token. Within these BRACEs, Dl keywords lose their effect and become constants.

{ a }
{ a b }

Note that a LEFT BRACE is just a constant in BRACEs. Thus the next example:

{ { } }

causes a syntax error at the fourth word, because the constant token ends at the third word.

There is no way to include a string consisting of a single RIGHT BRACE in a braced constant. Use

CONST }

for this purpose.

To make a NULL constant, use

{ }

There is no way to make a NULL constant with the CONST notation.

Omission of the Keyword CONST

As a special case, a non-keyword word after any of the following tokens is a constant by itself.

!= !~ % * ** + - . .. /
< <= <> = == =~ > >= [
BY DIV EQ GE GT IN INCL
LE LIKE LT MINUS MOD MULT NE
PLUS POW SUBST SUBSTG UNLIKE

For example, in the next case

FIELD 1 == 1

the second word '1' is a field name, and the fourth word '1' is a constant.

Evaluation of constant

Constants are evaluated as strings, as they are written. When a numeric value is required, for example before or after '+', the Dl interpreter automatically converts the constant to a numeric value. There is no qualifier that makes a constant itself numeric; to convert a string to a numeric value explicitly, use the NUM operator.

Variable and Static Variable

Morphology

A variable token consists of the keyword VAR and a following arbitrary word. Similarly, a static variable token consists of the keyword STATIC and a following arbitrary word. The keyword VAR may be omitted only after the FOR token. There is no way to omit the keyword STATIC.

Lifetime

A variable or a static variable can hold a value. The difference is the lifetime. The lifetime of a variable is just one cycle of Dl execution, i.e., when a D-record is read from the input file, all variables are wiped out before the given program is evaluated. A static variable has the lifetime of the D-command execution, i.e., the value assigned to it is kept throughout the execution of the D-command. In other words, a static variable is a variable in the usual sense, and a variable is just for local use such as a loop index. (The FOR operator takes a variable as its index.)

Scope of a variable or a static variable is always the whole program.
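The two lifetimes can be pictured with a small model. The following Python sketch only illustrates when each kind of storage is reset; the counter logic is plain Python and is not a translation of any Dl expression, and the names run_cycles, local_vars and static_vars are hypothetical.

```python
def run_cycles(records):
    """Count per-cycle and per-command updates for a list of input records."""
    static_vars = {}                  # like STATIC: kept for the whole command execution
    seen = []
    for _record in records:
        local_vars = {}               # like VAR: wiped before each record is evaluated
        local_vars["n"] = local_vars.get("n", 0) + 1
        static_vars["total"] = static_vars.get("total", 0) + 1
        seen.append((local_vars["n"], static_vars["total"]))
    return seen


counts = run_cycles(["record 1", "record 2", "record 3"])
```

The first number of each pair never grows beyond 1 because the variable storage starts empty in every cycle, while the second number accumulates across records.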

Evaluation

A variable or a static variable yields the value assigned to it. When a variable or a static variable that has not been assigned a value is evaluated, it yields the null value (an array with no elements).

Special Variables

Syntactically, a special variable consists of a single reserved keyword. Semantically, special variables are like the predefined variables of Perl, and some of them act like statements of a programming language.

When a special variable is evaluated, it yields a value related to the environment of the program execution, or performs some function for the program execution. In this sense, "special variable" may be a misleading name, but they are classified into one category to keep the syntax simple.

These variables cannot be changed by the user, except for CURREC, which represents the whole current record and can be changed by an assignment operation.

Individual special variables are described in the Operators and related special variables section.

Parentheses

LEFT PARENTHESIS (() and RIGHT PARENTHESIS ()) are used as in usual languages, to make an expression a unit, changing the order of operation.

Note that CURLY BRACKETS ({ }) are for constant tokens and do not group expressions. Note also that SQUARE BRACKETS ([ ]) are suffix operators.

End Token

The word '--' is used to indicate the end of the program explicitly. Usually you do not need it, because the Dl parser inserts an end token automatically when it encounters a word that is not a valid token. You need an explicit end token only when the name of your first input file is the same as one of the Dl keywords. For example:

COUNT a LT 2 -- LT

In the above example, the third word LT is the Dl keyword for the comparison operator "less than", while the sixth word LT is the input file name. In this case you need the end token, because without it the file name would be interpreted as a keyword, causing a syntax error. (Note that if the input file name were "lt" in lower case, you would not need the end token.)

Evaluation or Execution

As the Dl grammar has only expressions, the program is "evaluated", in other words "executed". In this sense, it is like LISP.

Evaluation of elements (field names, variables, etc.) is described in the Syntactical Components sections. When an expression including operators is evaluated, operations described in the Operators and related special variables sections are executed.

Boolean Evaluation

The result of an operation of Dl may be an array of string or numeric values. When the result is evaluated as a boolean value, the evaluation follows these rules:

  1. When the value is simple (i.e. number of elements is 1),
    1. When the value is numeric,
      value 0 is FALSE, and non 0 value is TRUE.
    2. When the value is string,
      Null string is FALSE, and any other string is TRUE.
  2. When the value is NULL (i.e. number of elements is 0),
    it is FALSE.
  3. When the value is not simple (i.e. number of elements is greater than 1),
    it is TRUE, regardless to the type or element values.

This may cause odd situations. For example, a numeric value { 0 } is FALSE, while a string value { 0 } is TRUE (by rule 1). Nevertheless, it works in practice in most situations.

This boolean evaluation rule is generally applied in Dl, whenever an operand requires boolean values. For example logical AND operation requires the left hand operand to be evaluated as boolean, and the above rule is applied.
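The three rules can be written down as a small function. This is a Python sketch under the assumption that a Dl value is modeled as a list whose elements are Python strings (string type) or numbers (numeric type); the name dl_bool is hypothetical.

```python
def dl_bool(values):
    """Apply the boolean evaluation rules to a value modeled as a list."""
    if len(values) == 0:          # rule 2: NULL is FALSE
        return False
    if len(values) > 1:           # rule 3: more than one element is always TRUE
        return True
    value = values[0]             # rule 1: simple value
    if isinstance(value, str):
        return value != ""        # string: only the null string is FALSE
    return value != 0             # numeric: only 0 is FALSE
```

For example, dl_bool([0]) is False while dl_bool(["0"]) is True, which is the odd case mentioned above.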

Operators and related special variables

Control Operators
Comparison Operators
Pattern Match and Substitute Operators
Logical Operators
Arithmetic Operators
Mathematical Operators
String Operators
Conversion Operators
Array Operators
Suffix Operators
Assignment Operator
I/O related special variables
Operator Precedence

A unary operator takes one operand on its right-hand side. A binary operator takes a left-hand and a right-hand operand. A ternary operator consists of two words and takes three operands: one to the left of the first word, one between the two words, and one to the right of the second word.

Other operators, such as suffix operators or the IF operator, have their own syntax.

Some operators and special variables have synonyms, e.g., "!=", "<>" and "NE". This is partly to avoid shell special characters such as ">", and partly to share operator names with popular applications such as awk or Perl.

Control Operators

Control operators have a UNIX sh-like syntax. This version of Dl does not have the case, break or continue constructs of C. These may be supported in future versions. Like other operators of Dl, a control operator also yields a value.

The next example is not recommended but is valid, because the control operator IF forms an expression.

IF IF v >= 0 THEN v ELSE - FIELD v FI > 3 THEN q = large FI

Table of Control operators

;                 serial operation
IF THEN ELSE FI   if
WHILE DO DONE     while loop
FOR IN DO DONE    for loop
EXIT              stop the program
EPILOGUE          extra execution

Serial operator

expression1 ; expression2

This is like the comma operator of C. First, expression1 is evaluated and, regardless of its value, expression2 is evaluated. The result is the value of expression2. The form without expression2 is also allowed, so that a C-like use of the semicolon does not cause a syntax error. The precedence of the serial operator is the lowest.

If operator

IF expression1 THEN expression2 FI
IF expression1 THEN expression2 ELSE expression3 FI

This is like the "?:" operator of C. First, expression1 is evaluated. If the result is interpreted as TRUE, expression2 is evaluated and its value becomes the result of the IF operation. If the value of expression1 is FALSE and ELSE exists, expression3 is evaluated and its value becomes the result of the IF operation. If the value of expression1 is FALSE and there is no ELSE, the value of expression1 becomes the result of the IF operation.

While operator

WHILE expression1 DO expression2 DONE

Expression1 is evaluated, and if the result is TRUE, expression2 is evaluated and then expression1 is evaluated again; this is repeated until expression1 becomes FALSE. The result of the WHILE operation is the last value of expression2. If expression1 is FALSE from the start, the result of the WHILE operation is the value of expression1.

For operator

FOR variable IN expression1 DO expression2 DONE

Expression1 is evaluated, the variable is set to each element of expression1 in turn, and expression2 is evaluated each time. The result of the FOR operation is the last value of expression2. If expression1 is the NULL value, the result is also the NULL value.
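The result values of the IF, WHILE and FOR operators can be summarized in a small model. The following Python sketch is only an illustration: Dl values are modeled as lists, sub-expressions are passed as functions so that they are evaluated only when needed, and the names is_true, dl_if, dl_while and dl_for are hypothetical.

```python
def is_true(values):
    """Boolean evaluation of a value modeled as a list (see Boolean Evaluation)."""
    if len(values) != 1:
        return len(values) > 1            # NULL is FALSE, longer arrays are TRUE
    return values[0] not in ("", 0)       # null string and numeric 0 are FALSE


def dl_if(cond, then, otherwise=None):
    c = cond()
    if is_true(c):
        return then()
    if otherwise is not None:
        return otherwise()
    return c                              # no ELSE: the condition value is the result


def dl_while(cond, body):
    c = cond()
    result = c                            # FALSE from the start: condition value
    while is_true(c):
        result = body()                   # otherwise the last body value
        c = cond()
    return result


def dl_for(values, body):
    result = []                           # NULL when there is nothing to iterate over
    for element in values:
        result = body([element])          # the loop variable holds one element
    return result
```

For example, dl_if(lambda: [0], lambda: ["yes"]) gives [0], the value of the condition itself, because there is no ELSE part.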

Exit

EXIT

When this word is evaluated, the program stops at once. This special variable does not yield any value.

Epilogue

EPILOGUE

This special variable returns simple numeric value 1 when the program is evaluated in epilogue mode, otherwise returns 0. See Ded for the epilogue mode.

Comparison Operators

Comparison operators are binary operators that compare their left-hand and right-hand operands. The result is a simple numeric value 1 or 0, meaning TRUE and FALSE respectively.

==    EQ          equal
!=    NE    <>    not equal
>     GT          greater than
>=    GE          greater than or equal to
<     LT          less than
<=    LE          less than or equal to
INCL              includes

Comparison operators are ambivalent, i.e., they compare values as numbers or as strings depending on the types of the operands. When either the left-hand or the right-hand operand is numeric, a numeric comparison is made. When both operands are strings, a string comparison is made. For example:

CONST 9 '<' CONST 10

unexpectedly results in 0 (FALSE), because both constants are strings by default.

NUM CONST 9 '<' CONST 10

gives the expected result. The next example is the most common pitfall in using comparison operators:

seq '<' 10

Again, a string comparison is made in this case, and the result is usually not the expected one. Use the numeric qualifier for the field or the NUM operator to make a numeric comparison:

seq:n '<' 10
seq '<' NUM CONST 10

For details of comparison, see the Comparison of values section of the Dintro manual.

Include operator

expression1 INCL expression2

The include operator handles its operands as unordered sets of elements. If expression2 is a subset of expression1, the result is TRUE (numeric 1), otherwise FALSE (numeric 0). That is, the result is TRUE if every element of expression2 is equal to some element of expression1.

For example, assume the field "keywords" has a set of words, then,

Dselect keywords INCL database
Dselect keywords INCL { data base }

tests whether the field has a word "database", and tests whether the field has a word "data" and a word "base", correspondingly.
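The subset test can be sketched in a few lines. This Python model assumes that elements are compared as plain strings and that a value is a list; the name dl_incl is hypothetical, and the result for an empty right-hand operand (1 in this sketch) is not specified by the text above.

```python
def dl_incl(left, right):
    """Return 1 when every element of right also occurs in left, otherwise 0."""
    return 1 if set(right) <= set(left) else 0


keywords = ["data", "base", "system"]
has_both = dl_incl(keywords, ["data", "base"])      # both words are present
has_database = dl_incl(keywords, ["database"])      # the single word is absent
```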

Pattern Match and Substitute Operators

=~      LIKE        regular expression match
!~      UNLIKE      regular expression unmatch
SUBST BY            substitute
SUBSTG BY           global substitute
$&      MATCH       matched string
$`      PREMATCH    string preceding the matched string
$'      POSTMATCH   string following the matched string

Pattern match operators and substitute operators are operators for regular expression string matching. Regular expressions of D follow the egrep specification. See the Dintro page for details.

Pattern match operator

expression1 =~ expression2
expression1 LIKE expression2
expression1 !~ expression2
expression1 UNLIKE expression2

LIKE is synonym of =~ and UNLIKE is synonym of !~.

Pattern match operators test regular expression matching. Both expression1 and expression2 are evaluated as string. Expression1 is the string to be tested, and expression2 is the regular expression. The result is simple numeric value 1 or 0.

Both expression1 and expression2 may be an array. Pattern matching is tested one by one for each pair of expression1 element and expression2 element. For the =~ (LIKE) operation, when there is at least one element of the expression2 regular expressions that matches at least one element of the expression1 strings, the result is 1, otherwise 0. For the !~ (UNLIKE) operation, when there is at least one element of the expression2 regular expressions that does not match at least one element of the expression1 strings, the result is 1, otherwise 0.

Note that the !~ operation is not the same as NOT applied to the =~ operation, except when both operands are simple values. For example,

NOT a !~ '^[0-9]+$'

tests whether field "a" is integer. When the field "a" repeats, it tests whether all of field "a" are integers, while next one tests if there is any integer in field "a":

a =~ '^[0-9]+$'

Note that above examples use APOSTROPHE (') quoting for UNIX shell. In the case of Windows shell, change APOSTROPHE to QUOTATION MARK (").

The precedence of pattern match operators is the same as that of comparison operators, and their association is left to right.
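The array behavior of =~ and !~ can be modeled as follows. This Python sketch uses Python's re module as a stand-in for the egrep-style regular expressions of D, which is an assumption that holds only for simple patterns such as the one used here; the names dl_like and dl_unlike are hypothetical.

```python
import re


def dl_like(strings, patterns):
    """1 when at least one pattern matches at least one string."""
    return 1 if any(re.search(p, s) for s in strings for p in patterns) else 0


def dl_unlike(strings, patterns):
    """1 when at least one pattern fails to match at least one string."""
    return 1 if any(not re.search(p, s) for s in strings for p in patterns) else 0


integer = ["^[0-9]+$"]
mixed = ["12", "x3"]
any_integer = dl_like(mixed, integer)             # like: a =~ '^[0-9]+$'
all_integers = 1 - dl_unlike(mixed, integer)      # like: NOT a !~ '^[0-9]+$'
```

With the repeated field modeled by mixed, any_integer is 1 while all_integers is 0, which shows why the two tests are not interchangeable.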

Substitute operator

expression1 SUBST expression2 BY expression3
expression1 SUBSTG expression2 BY expression3

The substitute operator is similar to s/xx/yy/ and s/xx/yy/g of Perl or sed (or ed and ex of UNIX). Expression1, expression2 and expression3 are all evaluated as strings. Expression1 is the base string, expression2 is the pattern and expression3 is the replacement. The pattern is matched against the base string, and when it matches, the matched part is substituted by the replacement and this becomes the result. The SUBST operation replaces the first occurrence in each base string, while the SUBSTG operation replaces all occurrences of matched strings in each base string. When the pattern does not match, the base string becomes the result. In any case, the result is given as a new string and the base string itself is left intact. The next example is often used to perform an s/xx/yy/ like function.

FIELD a = FIELD a SUBST xx BY yy

In the replacement, some characters have a special function.

&     matched string
\1    part of the matched string corresponding to the first parentheses
...   ...
\9    part of the matched string corresponding to the ninth parentheses
\&    &
\\    \

Any of the operands may be an array. In this case, for each element of the base strings, each element of the patterns is tested in turn and the corresponding element of the replacements is applied when the pattern matches. The result has the same number of elements as expression1. When there is no corresponding replacement, i.e., the number of elements in expression3 is less than the number of elements in expression2, the null string becomes the replacement.

Example: the next two expressions are always TRUE

{ CBI NACSIS NII } SUBST { CBI NACSIS } BY { NACSIS NII }
== { NII NII NII }

CONST 2002/12/24 SUBST ([0-9]+)/([0-9]+)/([0-9]+) BY \2/\3/\1
== 12/24/2002

Precedence of substitution operator is higher than other binary operators, and just one level lower than unary operators. Association is left to right.
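The array handling of SUBST and SUBSTG can be sketched as follows. This Python model rests on several assumptions: Python's re module stands in for the egrep-style regular expressions, each pattern is applied to the result of the previous pattern (inferred from the first example above, which yields { NII NII NII }), and a REVERSE SOLIDUS before any character other than a digit 1 to 9 simply yields that character. The names dl_subst and make_replacement are hypothetical.

```python
import re


def make_replacement(template):
    """Build a function that expands the special characters of a replacement."""
    def expand(match):
        out = []
        i = 0
        while i < len(template):
            ch = template[i]
            if ch == "\\" and i + 1 < len(template):
                nxt = template[i + 1]
                if nxt in "123456789":
                    out.append(match.group(int(nxt)) or "")   # parenthesized part
                else:
                    out.append(nxt)                           # escaped character
                i += 2
            elif ch == "&":
                out.append(match.group(0))                    # whole matched string
                i += 1
            else:
                out.append(ch)
                i += 1
        return "".join(out)
    return expand


def dl_subst(bases, patterns, replacements, replace_all=False):
    """Model of SUBST (replace_all=False) and SUBSTG (replace_all=True)."""
    results = []
    for base in bases:
        for i, pattern in enumerate(patterns):
            # A missing replacement is treated as the null string.
            template = replacements[i] if i < len(replacements) else ""
            count = 0 if replace_all else 1                   # 0 means "all" in re.sub
            base = re.sub(pattern, make_replacement(template), base, count=count)
        results.append(base)
    return results


swapped = dl_subst(["2002/12/24"], ["([0-9]+)/([0-9]+)/([0-9]+)"], [r"\2/\3/\1"])
renamed = dl_subst(["CBI", "NACSIS", "NII"], ["CBI", "NACSIS"], ["NACSIS", "NII"])
```

Under these assumptions the two calls reproduce the two examples above.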

After match special variables

These special variables are same as perl special variables $&, $` and $' except for small details. When one of these special variables is evaluated, it yields a part of base string of the last successful pattern match in the current record cycle.

MATCH or $& is the matched part of the base string by the last successful pattern match.

PREMATCH or $` is the part of the base string preceding the matched part by the last successful pattern match.

POSTMATCH or $' is the part of the base string following the matched part by the last successful pattern match.

The result value is always a simple string. When the pattern match operands are arrays, the last pair of elements that matched is used. The matching order of a pattern match or substitute operation is base string first, i.e., for the first element of expression1 each element of expression2 is matched in array order, then the second element of expression1 is tested, and so on.

When there has been no successful pattern match in the current cycle, the result is the null string. Note that a pattern match operation in a previous cycle of program evaluation is never referred to by these special variables.

The PREMATCH special variable is a bit tricky when used after a SUBSTG operation. As pattern matching is repeated within the same base string with the same pattern, there may be more than one matched string. After a SUBSTG operation, the last match is referred to. In this case, PREMATCH refers to the base string in which earlier matches have already been substituted, not to the original base string. This differs from Perl's $`, but it is how the current version of the Dl interpreter behaves.
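For a single pattern match on a simple string, the three parts can be pictured as follows. This Python sketch uses Python's re module in place of the regular expressions of D and does not model the SUBSTG case described above; the name after_match is hypothetical.

```python
import re


def after_match(base, pattern):
    """Return (PREMATCH, MATCH, POSTMATCH) for one match attempt on base."""
    m = re.search(pattern, base)
    if m is None:
        return ("", "", "")               # no successful match: null strings
    return (base[:m.start()], m.group(0), base[m.end():])


parts = after_match("2002/12/24", "/[0-9]+/")
```

Here parts is ("2002", "/12/", "24"): the text before the match, the match itself and the text after it.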

Logical Operators

! NOT logical not (unary)
&& AND logical and
|| OR logical or

The precedence of logical operators is lower than that of all other operators except ; and =. Among them, OR is the weakest, followed by AND, then NOT. Association is right to left.

Logical not

! expression
NOT expression

NOT is a unary operator that evaluates the expression as boolean and gives the simple numeric value 0 if the operand is TRUE, or 1 if it is FALSE.

Logical and or

expression1 && expression2
expression1 AND expression2
expression1 || expression2
expression1 OR expression2

AND (&&) and OR (||) are binary operators. As in Perl, the result is expression1 or expression2 rather than the numeric value 1 or 0.

AND (&&) evaluates expression1 as boolean. When it is TRUE, the result of the operation is expression2. When it is FALSE, the result is expression1, and expression2 is not evaluated.

OR (||) evaluates expression1 as boolean. When it is TRUE, the result of the operation is expression1, and expression2 is not evaluated. When it is FALSE, the result is expression2.
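The value-returning, short-circuit behavior can be modeled as follows. In this Python sketch Dl values are lists and each operand is passed as a function so that the model can skip its evaluation; the names is_true, dl_and, dl_or and tracked are hypothetical.

```python
def is_true(values):
    """Boolean evaluation of a value modeled as a list (see Boolean Evaluation)."""
    if len(values) != 1:
        return len(values) > 1            # NULL is FALSE, longer arrays are TRUE
    return values[0] not in ("", 0)       # null string and numeric 0 are FALSE


def dl_and(left, right):
    value = left()
    return right() if is_true(value) else value    # FALSE: right is never evaluated


def dl_or(left, right):
    value = left()
    return value if is_true(value) else right()    # TRUE: right is never evaluated


calls = []


def tracked(name, value):
    """Wrap a value in a function that records when it is evaluated."""
    def operand():
        calls.append(name)
        return value
    return operand


and_result = dl_and(tracked("a", [0]), tracked("b", ["x"]))
or_result = dl_or(tracked("c", []), tracked("d", ["default"]))
```

After these two calls, and_result is [0], or_result is ["default"], and calls is ["a", "c", "d"]: operand "b" was never evaluated.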

Arithmetic Operators

+           plus
-           minus, unary minus
*           multiply
/           divide
%    MOD    modulus
**          power

Arithmetic operators are binary operators, except for unary minus. Their syntax is:

expression1 binop expression2
- expression

Here, binop is one of the operators in the table.

Both expression1 and expression2 (or expression in the case of unary -) are evaluated as numeric. In the case of %, they are further converted to integers.

When both operands are simple values, the ordinary arithmetic operation is performed. When the left-hand and/or right-hand operand is an array, Dl treats them in a special way:

1. Except for case 2, the result is an array whose i-th element is the result of the operation on the i-th elements of expression1 and expression2. When the operands have different array lengths, the result has the length of the shorter one.
2. When one of the operands is a simple value and the other is an array, the result is an array whose i-th element is the result of the operation on the simple value and the i-th element of the array.

Unary minus operator

- foo

is the same as

CONST 0 - foo

The following examples demonstrate the rules above; all results are TRUE.

CONST 1 + CONST 1 == CONST 2
{ 1 2 3 } + { 3 2 1 } == { 4 4 4 }
{ 1 2 3 } * { 4 } == { 4 8 12 }
{ 1 2 3 } * { 4 5 } == { 4 10 }
{ 6 } / { 6 3 2 } == { 1 2 3 }
{ 6 5 } % { 4 3 2 } == { 2 2 }
{ } + { 1 2 3 } == { }
CONST 1 + { } == { }
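The two array rules can be checked against the examples above with a small model. This Python sketch assumes that the operands have already been converted to numbers and are modeled as lists; the name dl_binop is hypothetical.

```python
import operator


def dl_binop(op, left, right):
    """Apply a binary arithmetic operator to two values modeled as lists."""
    if len(left) == 1 and len(right) > 1:       # rule 2: simple value and array
        left = left * len(right)
    elif len(right) == 1 and len(left) > 1:
        right = right * len(left)
    # Rule 1: element by element, truncated to the shorter operand.
    return [op(a, b) for a, b in zip(left, right)]


sums = dl_binop(operator.add, [1, 2, 3], [3, 2, 1])
scaled = dl_binop(operator.mul, [1, 2, 3], [4])
truncated = dl_binop(operator.mul, [1, 2, 3], [4, 5])
with_null = dl_binop(operator.add, [1], [])
```

The same pairing of elements applies to the string concatenation operator, according to its description.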

Operator precedence is as usual, from highest to lowest:

- (unary)
**
* / %
+ -

and they are higher than logical, comparison operators and array operators. Precedence of binary arithmetic operators is lower than string concatenation operator, mathematical and unary array operators. Precedence of unary minus operator is same as mathematical operators.

Mathematical Operators

ABS     absolute value
SQRT    square root
EXP     exponential
LOG     natural logarithm
LOG10   common logarithm
SIN     sine
COS     cosine
TAN     tangent
ATAN    arc tangent

Mathematical operators are unary operators. Syntax of them is

op expression

where op is one in the table above.

In most computer languages, they are functions and the syntax is something like LOG(x). But in Dl, they are operators. You need no parentheses for the operand. For example:

SQRT a

But there is no harm in using parentheses:

SQRT ( a )

The expression is evaluated as numeric value. When the expression is an array, the result is an array of same size, of which i-th element is the result of the operation to the expression's i-th element. See next examples:

ABS { -1 0 1 } == { 1 0 1 }
ABS { } == { }

The precedence of these operators is higher than that of arithmetic operators.

String Operators

.   Concatenation
LENGTH   Number of characters
TOUPPER CAPS Change lower case letters to uppercase

Concatenation operator

expression1 . expression2

The concatenation operator is binary. Expression1 and expression2 are evaluated as strings. Expression2 is appended to expression1, yielding the result. (The operands are not changed.) When the operands are arrays, the operation follows the same rules as the binary arithmetic operators. See the next example:

CONST 0x . { abc def } == { 0xabc 0xdef }
{ "$" "@" } . { 115.00 1.15 } == { "$115.00" "@1.15" }

Precedence of the concatenation operator is higher than arithmetic operators (**), and lower than mathematical unary operators.

Length operator

LENGTH expression

LENGTH is a unary operator. The operand is evaluated as string. The result of LENGTH operation is a numeric value which shows the number of characters in the operand. When the operand is an array, the result is a numeric array of same size, of which i-th element is the length of the i-th element of the operand. See the next example:

LENGTH { abc def } == { 3 3 }

Precedence of LENGTH operator is same as mathematical unary operators.

Toupper operator

TOUPPER (or CAPS) operator makes a string that has same length and same contents as the operand, but all the lower case letters in it are converted to the corresponding upper case letters. Note that the operand itself is unchanged. Array handling is same as LENGTH or mathematical unary operators. See the next example:

TOUPPER { Dselect Dgrep Dextract } == { DSELECT DGREP DEXTRACT }

Precedence of TOUPPER operator is same as mathematical unary operators.

Conversion Operators

NUM   numeric conversion
STR   string conversion
INT   integer conversion

Num operator

NUM expression

The NUM operator converts the expression into numeric values. It does not change the value of the operand itself. See Dintro for details. Array handling and precedence are the same as for the mathematical unary operators.

Str operator

STR expression

The STR operator converts the expression into string values. It does not change the value of the operand itself. See Dintro for details. Array handling and precedence are the same as for the mathematical unary operators.

Int operator

INT expression

The INT operator evaluates the expression as numeric values and then converts them into integers. It does not change the value of the operand itself. Array handling and precedence are the same as for the mathematical unary operators.

Array Operators

,        Array concatenation
COUNT    Count the elements of an array
EXISTS   Test whether an expression has elements
MIN      Minimum value in an array
MAX      Maximum value in an array
SUM      Sum of the values in an array
AVG      Average value of the elements in an array
CAT      Concatenate array elements
..       Create an array of sequence numbers

Array concatenation operator

expression1 , expression2

Array concatenation is a binary operator. The elements of expression2 are appended after the elements of expression1 to form the result. (Expression1 itself is not changed, even if it is a field or variable.) The number of elements in the result is the sum of the numbers of elements of both operands. The following example is TRUE:

{ 1 2 3 } , { 4 5 6 } == { 1 2 3 4 5 6 }

When the type of the expression2 is different from the type of the expression1, expression2 is converted into the type of the expression1. For example:

NUM { 1 2 3 } , STR { a b c } == NUM { 1 2 3 0 0 0 }

Note that { a b c } is evaluated as { 0 0 0 } as numeric. (See Dintro).

Precedence of the array concatenation operator is lower than arithmetic operators and higher than comparison operators.
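
The type rule for array concatenation can be sketched as follows. This is an illustrative Python model with hypothetical names (to_num, dl_array_concat), not part of Dl. It assumes that a string which cannot be read as a number converts to 0, as in the { a b c } example; the exact conversion rules are those of Dintro.

```python
def to_num(value):
    """Numeric conversion assumed for this sketch: non-numeric strings become 0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def dl_array_concat(left, right, left_type):
    """Append right to left, converting right to the type of left.

    left_type is "num" or "str" and stands for the type of expression1.
    """
    convert = to_num if left_type == "num" else str
    # Build a new list, so the left operand itself is not changed.
    return [convert(x) for x in left] + [convert(x) for x in right]


print(dl_array_concat([1, 2, 3], ["a", "b", "c"], "num"))
# [1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
print(dl_array_concat(["a"], [1, 2], "str"))
# ['a', '1', '2']
```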

COUNT and EXISTS

COUNT expression
EXISTS expression

COUNT returns a simple numeric value giving the number of elements in the expression.

EXISTS returns the simple numeric value 1 when the expression has one or more elements, or 0 when the expression has no elements.

The precedence of these operators is the same as that of the arithmetic unary operators.

Statistics within an array

MIN expression
MAX expression
SUM expression
AVG expression

These are unary operators that give simple statistics of the expression array. All elements of the expression are evaluated as numeric, and the result is always a simple numeric value. MIN and MAX give the minimum and maximum values of the elements. SUM gives the sum of all elements. AVG gives SUM divided by COUNT. See the following examples:

COUNT { 1 2 3 4 5 6 } == CONST 6
COUNT { } == CONST 0
MIN { 1 2 3 4 5 6 } == CONST 1
MAX { 1 2 3 4 5 6 } == CONST 6
AVG { 1 2 3 4 5 6 } == CONST 3.5

Precedence of these operators is same as mathematical unary operators.
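
The relation between the four statistics can be checked with a small Python model. The names to_num and dl_stats are hypothetical, and the behaviour for an empty array is not specified in this section, so the sketch simply refuses that case.

```python
def to_num(value):
    """Numeric conversion assumed for this sketch: non-numeric strings become 0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def dl_stats(elements):
    """Return MIN, MAX, SUM and AVG of the elements, all evaluated as numeric."""
    nums = [to_num(x) for x in elements]
    if not nums:
        # Assumption: the empty-array case is left out of this sketch.
        raise ValueError("empty array is not covered by this sketch")
    total = sum(nums)
    return {
        "MIN": min(nums),
        "MAX": max(nums),
        "SUM": total,
        "AVG": total / len(nums),  # AVG is SUM divided by COUNT
    }


print(dl_stats([1, 2, 3, 4, 5, 6]))
# {'MIN': 1.0, 'MAX': 6.0, 'SUM': 21.0, 'AVG': 3.5}
```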

CAT operator

CAT expression

CAT is a unary operator. All elements of the expression are evaluated as strings, and the result is the concatenation of these element strings in element order. See the following example:

CAT { 1 2 3 4 5 6 } == CONST 123456

Precedence of CAT operator is same as mathematical unary operators.

Range operator

expression1 .. expression2

The range operator is similar to Perl's range operator in list context. The following example shows a typical usage:

FOR i IN 0 .. COUNT a - 1 DO
  IF a [ VAR i ] =~ [0-9]+ THEN num = FIELD num , a [ VAR i ] FI
DONE

Both expression1 and expression2 are evaluated as numeric. The last element of expression1 and the first element of expression2 are converted into integers. First, expression1 except for its last element is copied to the result array. Then the integer values from the last element of expression1 up to the first element of expression2 are filled in. Finally, expression2 except for its first element is copied after them. The step value is 1 or -1, depending on which value is larger. See the following example:

{ 0.5 1.5 } .. { -1.5 -0.5 }
== { 0.5 1 0 -1 -0.5 }

(You may wonder why 0.5 and -0.5 are not converted to integers. There is no strong reason; it is simply convenient for the implementation.)

Precedence of range operator is higher than array concatenation operator and lower than arithmetic operators.
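
The three copying steps of the range operator can be written out as a short Python model. The name dl_range is hypothetical. The sketch assumes that conversion to integer truncates toward zero, which is what the example above suggests (1.5 becomes 1 and -1.5 becomes -1), and it does not cover empty operands.

```python
import math


def dl_range(left, right):
    """Model of expression1 .. expression2 on lists of numbers."""
    if not left or not right:
        # Assumption: empty operands are outside the scope of this sketch.
        raise ValueError("empty operands are not covered by this sketch")
    start = math.trunc(left[-1])   # last element of expression1, as integer
    stop = math.trunc(right[0])    # first element of expression2, as integer
    step = 1 if stop >= start else -1
    middle = list(range(start, stop + step, step))
    # Head of expression1, the integer run, then the tail of expression2.
    return list(left[:-1]) + middle + list(right[1:])


print(dl_range([0.5, 1.5], [-1.5, -0.5]))  # [0.5, 1, 0, -1, -0.5]
print(dl_range([0], [3]))                  # [0, 1, 2, 3]
```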

Suffix Operators

expression1 [ expression2 ]

Expression2 is the suffix; it is evaluated as a numeric value and then converted to an integer. The result of the suffix operation is the element or elements of expression1 whose index number (suffix) is given by expression2. Note that the first element has suffix zero. When the suffix is outside the range of expression1, the result is the NULL value. When the suffix is an array, the result is also an array, whose i-th element is the result of the suffix operation between the same expression1 and the i-th element of expression2.

Following examples are all TRUE.

{ 1 2 3 } [ 0 ] == { 1 }
{ a b c } [ 1.5 ] == { b }
{ 1 2 3 } [ -1 ] == { }
{ a b c } [ 3 ] == { }
{ 1 2 3 } [ { 0 1 } ] == { 1 2 }
{ a b c } [ { 2 1 0 } ] == { c b a }
{ a b c } [ { 1 2 3 } ] == { b c }

Note that foo [ 1 ] [ 0 ] is a valid expression, but it is just the same as foo [ 1 ]. Unlike arrays in the C language, Dl supports only one-dimensional arrays.
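
The suffix examples can be reproduced with a small Python model. The name dl_suffix is hypothetical, and the sketch assumes that conversion to integer truncates toward zero, which agrees with the [ 1.5 ] example.

```python
import math


def dl_suffix(array, suffixes):
    """Model of expression1 [ expression2 ]; both arguments are lists."""
    result = []
    for s in suffixes:
        i = math.trunc(s)          # suffix converted to integer
        if 0 <= i < len(array):    # an out-of-range suffix contributes nothing
            result.append(array[i])
    return result


print(dl_suffix(["a", "b", "c"], [1.5]))      # ['b']
print(dl_suffix([1, 2, 3], [-1]))             # []
print(dl_suffix(["a", "b", "c"], [2, 1, 0]))  # ['c', 'b', 'a']
print(dl_suffix(["a", "b", "c"], [1, 2, 3]))  # ['b', 'c']
```

Because the result is always a flat list, applying the model twice, as in foo [ 1 ] [ 0 ], gives the same value as applying it once with suffix 1.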

Assignment Operator

LHE = expression2
LHE [ expression1 ] = expression2

The assignment operator evaluates expression2 and assigns its value to the left hand expression (LHE). It also yields the assigned value as the result of the operation.

When the type of expression2 is different from the type of the LHE, expression2 is converted to the type of the LHE. If the LHE has no type (e.g. at the first use of a VAR), the LHE takes the type of expression2. Note that fields and the current record special variable are of string type.

The LHE is limited to a field, a variable, a static variable, or the current record special variable. When the LHE carries a suffix operator, expression1 is evaluated as the suffix (integer) and the assignment is limited to the elements of the LHE indexed by that suffix.

When the LHE is a field name, the field order in the current record is kept as far as possible. For example, if the current record is

a:A
b:B

and the program is

FIELD a = CONST foo

then, the result is:

a:foo
b:B

Note that field "a" is not moved.

Both the target field and the assigned value may be arrays. When they have the same number of elements, corresponding elements are assigned, keeping the field positions. When the target field has more elements than the assigned value, the excess fields are eliminated from the current record. When the target field has fewer elements than the assigned value, the excess elements are inserted just after the last existing element with the same field name.

For example, assume the current record is

a:A
b:AA
a:B
b:BB

and the program is

FIELD a = { 0 1 2 3 }

then the result is:

a:0
b:AA
a:1
a:2
a:3
b:BB

When the target field is new to the current record, the new field is appended after the existing fields.
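
The placement rules for field assignment can be modelled in Python. This is a sketch only: the name assign_field and the representation of a record as a list of (name, value) pairs are assumptions made for illustration, not the internal form used by Ded.

```python
def assign_field(record, name, values):
    """Assign values to field `name`, keeping field positions where possible.

    record is a list of (field name, value) pairs in record order.
    """
    pending = [str(v) for v in values]  # fields are of string type
    result = []
    insert_at = None  # position just after the last replaced element of `name`
    for fname, fvalue in record:
        if fname != name:
            result.append((fname, fvalue))
        elif pending:
            # Replace the existing element in place.
            result.append((name, pending.pop(0)))
            insert_at = len(result)
        # Otherwise the existing element is in excess and is dropped.
    extra = [(name, v) for v in pending]
    if insert_at is None:
        result.extend(extra)                # new field: append after existing fields
    else:
        result[insert_at:insert_at] = extra  # insert after the last same-name element
    return result


record = [("a", "A"), ("b", "AA"), ("a", "B"), ("b", "BB")]
for fname, fvalue in assign_field(record, "a", [0, 1, 2, 3]):
    print(fname + ":" + fvalue)
# a:0
# b:AA
# a:1
# a:2
# a:3
# b:BB
```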

When a suffix is provided, expression1 is evaluated as numeric and then converted to an integer. The assignment is limited to the element or elements indexed by the suffix value or values.

For example, the current record (or a [static] variable's value) is:

a:AA
a:BB
a:CC

and the program is:

a [ 1 ] = { 1 }

then the result is:

a:AA
a:1
a:CC

When the assigned value is an array like:

a [ 1 ] = { 1 2 }

the target element is replaced by the array and the result is:

a:AA
a:1
a:2
a:CC

Similarly, if the assigned value is NULL like:

a [ 1 ] = { }

it means element removal and the result is:

a:AA
a:CC

When the suffix is outside the target's current elements, the assigned value is just lost. (Note that this specification is different from perl which extends the array size.)

When the suffix of LHE is an array, the rule is "one to one but the last target takes the rest".

For example, assume a variable a has value:

{ AA BB CC DD }

and the program is:

VAR a [ { 0 2 } ] = { 0 2 }

then the result is { 0 BB 2 DD }.

When the assigned value has more elements:

VAR a [ { 0 2 } ] = { 0 2 4 6 }

then the result is { 0 BB 2 4 6 DD }.

And when the assigned value has fewer elements:

VAR a [ { 0 2 } ] = { 0 }

then elements of the variable "a" are removed, giving { 0 BB DD }.
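
The rule "one to one but the last target takes the rest" can be modelled in Python as follows. The name assign_suffix is hypothetical. The sketch assumes that conversion to integer truncates toward zero, and the behaviour for a suffix value that appears twice is not specified here (in this sketch the later occurrence wins).

```python
import math


def assign_suffix(array, suffixes, values):
    """Model of LHE [ suffixes ] = values on a list; returns the new list."""
    targets = [math.trunc(s) for s in suffixes]
    values = list(values)
    replacement = {}
    for k, index in enumerate(targets):
        if k < len(targets) - 1:
            chunk = values[k:k + 1]  # one value, or none if values ran out
        else:
            chunk = values[k:]       # the last target takes the rest
        if 0 <= index < len(array):
            replacement[index] = chunk
        # A chunk aimed at an out-of-range suffix is simply lost.
    result = []
    for position, element in enumerate(array):
        if position in replacement:
            result.extend(replacement[position])
        else:
            result.append(element)
    return result


a = ["AA", "BB", "CC", "DD"]
print(assign_suffix(a, [0, 2], [0, 2]))        # [0, 'BB', 2, 'DD']
print(assign_suffix(a, [0, 2], [0, 2, 4, 6]))  # [0, 'BB', 2, 4, 6, 'DD']
print(assign_suffix(a, [0, 2], [0]))           # [0, 'BB', 'DD']
```

A single suffix is just the case of one target, so assigning an empty list removes the element and assigning several values inserts them all at that position.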

I/O related special variables

OUTPUT               output current record
FILENAME             current input file name
FNR, REC#            current record number in the file
NR, $.               current record number throughout the input files
@_, CURREC, FIELDS   current record itself

These are special variables and have no operands.

Output

When this special variable is evaluated, the Dl processor writes the current record to standard output. It does not flush the current record. The value of this special variable is always the simple numeric value 1.

Current file name

This special variable holds the current input file name as its value. If the input is standard input, the value is the null string.

Current record number

This special variable holds the current record number in the current input file (starting from 1). It has two names for historical reasons: REC# comes from the Dpr header line, and FNR is awk compatible.

Current record number throughout the input files

This special variable holds the current record number throughout the input files. It is the same as FNR if there is just one input file. It has two names: NR is the awk compatible name and $. is the Perl compatible name.

Current record special variable

This special variable is the current record in raw mode, i.e., each element is a field of the current record, consisting of the field name, a COLON and the value, as it appears in the D-file.

Quite naturally, assigning a value to this special variable affects the current record. Note that you are responsible for keeping each element in the "field name COLON value" form so that the D-record stays valid.

Assigning the NULL value to this special variable:

CURREC = { }

means deletion of the current record (as long as you do not assign a value to it afterwards).

This special variable has three names: CURREC, FIELDS and @_. @_ is a Perl-like name. Though it is not quite the same thing, Perl users will easily remember it. CURREC, naturally, stands for "current record". In some cases, however, such as COUNT CURREC, this name seems unnatural, and COUNT FIELDS sounds more natural. This is the reason for the alternative name. You can use whichever you like.

Operator Precedence

Operator                                       Associativity
[ ]                                            left to right
- ABS ATAN AVG CAT CAPS COS COUNT EXISTS EXP
  INT LENGTH LOG LOG10 MAX MIN NUM SIN SQRT
  STR SUM TAN TOUPPER                          right to left
SUBST SUBSTG BY                                left to right
.                                              left to right
**                                             left to right
* / %                                          left to right
+ -                                            left to right
..                                             left to right
,                                              left to right
== != > < <= >= INCL =~ !~                     left to right
!                                              right to left
&&                                             right to left
||                                             right to left
=                                              right to left
;                                              right to left

Examples

In the following examples, FIELD, CONST and VAR are omitted as far as possible. This is to demonstrate how the Dl parser works; it is not a recommendation. I would rather recommend not relying on these default interpretations, which can easily lead to mistakes.

Test if a field "lang" is "jpn":

lang == jpn

Test if a field "yr" is smaller than 2003 as a numeric value:

yr:n LT 2003

Assume each input D-record has just one field "l", whose value is a line of a text file. In this text file, paragraphs are separated by a blank line. The next example adds a field "P", which holds the paragraph number, to each D-record.

IF NR == 1 THEN STATIC P = 1 FI ;
IF l =~ " *" THEN STATIC P = STATIC P + 1 FI ;
FIELD P = STATIC P
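
The effect of the paragraph numbering program can be traced with a small Python model. This is a sketch under assumptions: the name number_paragraphs is hypothetical, each record is reduced to the value of its "l" field, and a blank line is taken to mean a value made only of spaces (possibly none).

```python
import re


def number_paragraphs(lines):
    """Pair each line with its paragraph number, as the Dl example does."""
    paragraph = 1  # plays the role of STATIC P, set when the first record is read
    numbered = []
    for line in lines:
        if re.fullmatch(r" *", line):
            # A blank line starts the next paragraph and already carries its number.
            paragraph += 1
        numbered.append((line, paragraph))
    return numbered


print(number_paragraphs(["first", "still first", "", "second"]))
# [('first', 1), ('still first', 1), ('', 2), ('second', 2)]
```

Note that the separating blank line itself receives the number of the paragraph that follows it, because the counter is incremented before the field is assigned.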

The next example works like Dtie -t / y,m,d ymd (under the condition that fields "y", "m" and "d" have the same number of elements).

ymd = FIELD y . CONST / . FIELD m . CONST / . FIELD d ;
y = FIELD m = FIELD d = { }

The next example works like Dproj foo:

FOR i IN 0 .. COUNT FIELDS - 1 DO
  IF FIELDS [ VAR i ] =~ ^foo: THEN
    VAR f = VAR f , VAR i
  FI
DONE ;
FIELDS = FIELDS [ VAR f ]
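
The same projection can be traced in Python. This is only a model: the name project is hypothetical, and the current record is represented as a list of raw "name:value" strings, following the description of the current record special variable.

```python
import re


def project(fields, name):
    """Keep only the raw fields whose field name is `name`."""
    pattern = re.compile("^" + re.escape(name) + ":")
    # Collect the suffixes of matching fields, like VAR f in the Dl program.
    keep = [i for i, field in enumerate(fields) if pattern.search(field)]
    # Select those elements, like FIELDS [ VAR f ].
    return [fields[i] for i in keep]


print(project(["foo:1", "bar:2", "foo:3", "foobar:4"], "foo"))
# ['foo:1', 'foo:3']
```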

See Also

Dintro, Ded, Dselect

AUTHOR

MIYAZAWA Akira


miyazawa@nii.ac.jp
2003