Data Definition Language
Introduction
This document is the manual for version 1 of the
Remark The language is inspired by
Grammars
This section describes context-free grammars used in this specification to define the lexical and syntactical structure of a program.
Context-free Grammars
A context-free grammar consists of a number of productions. Each production has an abstract symbol called a nonterminal as its left-hand side, and a sequence of one or more nonterminal and terminal symbols as its right-hand side. For each grammar, the terminal symbols are drawn from a specified alphabet.
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.
Lexical Grammar
A lexical grammar for the Data Definition Language has as its terminal symbols the characters of the Unicode character set. It defines a set of productions, starting from the goal symbol `word`, that describe how sequences of Unicode characters are translated into a sequence of words. Only UTF-8 sequences of length 1 are supported in version 1 of this language.
Syntactical Grammar
A syntactical grammar for the Data Definition Language has as its terminal symbols the words defined by the lexical grammar. It defines a set of productions, starting from the goal symbol `sentence`, that describe how sequences of words are translated into a sentence.
Grammar Notation
Productions are written in fixed-width fonts.
A production is defined by its left-hand side, followed by a colon (:), followed by its right-hand side definition.
The left-hand side is the name of the non-terminal defined by the production.
Multiple alternative definitions of a production may be given.
The right-hand side of a production consists of any sequence of terminals and non-terminals.
In certain cases the right-hand side is replaced by a comment describing the right-hand side.
This comment is opened by /* and closed by */.
Example:
digit : /* A single Unicode character from the code point range U+0030 to U+0039. */
A terminal is a sequence of Unicode symbols.
A Unicode symbol is denoted by a number sign (#) followed by a hexadecimal number denoting its code point.
Example:
The following productions denote the non-terminal for a sign as used in the definitions of numerals:
/* #2b is also known as "PLUS SIGN" */
plus_sign : #2b
/* #2d is also known as "MINUS SIGN" */
minus_sign : #2d
sign : plus_sign
sign : minus_sign
The syntax {x} on the right-hand side of a production denotes zero or more occurrences of x.
Example:
The following production defines a possibly empty sequence of digits:
zero-or-more-digits : {digit}
The syntax [x] on the right-hand side of a production denotes zero or one occurrence of x.
Example:
The following production denotes a possible definition of an integer numeral.
It consists of an optional sign followed by a digit and zero-or-more-digits (with sign and zero-or-more-digits as defined in the preceding examples):
integer : [sign] digit zero-or-more-digits
The empty string is denoted by ε.
Example:
The following productions denote a possibly empty list of integers (with integer as defined in the preceding example).
Note that this list may include a trailing comma.
integer-list : integer integer-list-rest
integer-list : ε
integer-list-rest : comma integer integer-list-rest
integer-list-rest : comma
integer-list-rest : ε
/* #2c is also known as "COMMA" */
comma : #2c
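The integer-list productions above can be read as a small recognizer. The following Python sketch is illustrative only; the function names match_integer and is_integer_list are hypothetical and not part of this specification. It follows the productions literally, so no whitespace is allowed between the elements.

```python
def match_integer(text, pos):
    """Return the position after an integer starting at pos, or None.

    Mirrors: integer : [sign] digit zero-or-more-digits
    """
    if pos < len(text) and text[pos] in "+-":
        pos += 1  # optional sign
    digits_start = pos
    while pos < len(text) and "0" <= text[pos] <= "9":
        pos += 1
    return pos if pos > digits_start else None  # at least one digit


def is_integer_list(text):
    """Return True if text is derivable from integer-list."""
    pos = match_integer(text, 0)
    if pos is None:
        return text == ""  # integer-list : ε
    while pos < len(text):
        if text[pos] != ",":
            return False
        pos += 1
        nxt = match_integer(text, pos)
        if nxt is None:
            # integer-list-rest : comma (a trailing comma ends the list)
            return pos == len(text)
        pos = nxt
    return True  # integer-list-rest : ε
```

For instance, is_integer_list("1,2,") accepts the trailing comma, while is_integer_list("1,,2") is rejected because no production allows two adjacent commas.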
Lexical Structure
The lexical grammar describes the translation of Unicode characters into words.
The single goal symbol of the lexical grammar is the word symbol.
goal symbol
The goal symbol word is defined by
word : delimiters
word : boolean
word : number
word : string
word : void
word : name
word : left_curly_bracket
word : right_curly_bracket
word : left_square_bracket
word : right_square_bracket
word : comma
word : colon
/* whitespace, newline, and comment are not considered by the syntactical grammar */
word : whitespace
word : newline
word : comment
whitespaces
The word whitespace is defined by
/* #9 is also known as "CHARACTER TABULATION" */
whitespace : #9
/* #20 is also known as "SPACE" */
whitespace : #20
line terminators
The word line_terminator is defined by
/* #a is also known as "LINEFEED (LF)" */
/* #d is also known as "CARRIAGE RETURN (CR)" */
line_terminator : #a {#d}
line_terminator : #d {#a}
comments
The language supports both single-line comments and multi-line comments. A comment is either a single_line_comment or a multi_line_comment and hence is defined by
comment : single_line_comment
comment : multi_line_comment
A single_line_comment starts with two solidi and extends to the end of the line. Hence it is defined by
/* #2f is also known as SOLIDUS */
single_line_comment :
#2f #2f
/* any sequence of characters except for line_terminator */
The line_terminator is not considered as part of the comment text.
A multi_line_comment is opened by a solidus and an asterisk and closed by an asterisk and a solidus. Hence it is defined by
/* #2f is also known as SOLIDUS */
/* #2a is also known as ASTERISK */
multi_line_comment :
#2f #2a
/* any sequence of characters except for #2a #2f */
#2a #2f
The #2f #2a and #2a #2f sequences are not considered as part of the comment text.
This implies:
- // has no special meaning in either comment.
- /* and */ have no special meaning in single-line comments.
- Multi-line comments do not nest.
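The two comment forms can be sketched as a single scanning function. The Python below is a minimal illustration, not a normative implementation; the name comment_end is hypothetical, and raising an error on an unterminated multi-line comment is an assumption, since this section does not say how that case is handled.

```python
def comment_end(text, pos):
    """Return the index just past the comment starting at pos, or None.

    Returns None if no comment starts at pos.
    """
    if text.startswith("//", pos):
        end = pos + 2
        # A single-line comment runs up to, but does not include,
        # the line terminator (#a or #d).
        while end < len(text) and text[end] not in "\n\r":
            end += 1
        return end
    if text.startswith("/*", pos):
        # The first "*/" closes the comment, so comments do not nest.
        close = text.find("*/", pos + 2)
        if close == -1:
            # Assumption: an unterminated comment is treated as an error.
            raise ValueError("unterminated multi-line comment")
        return close + 2
    return None
```

Applied to "/* a /* b */ c */" at position 0, the function stops after the first "*/", which leaves " c */" outside the comment.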
parentheses
The words left_parenthesis and right_parenthesis, respectively, are defined by
/* #28 is also known as "LEFT PARENTHESIS" */
left_parenthesis : #28
/* #29 is also known as "RIGHT PARENTHESIS" */
right_parenthesis : #29
curly brackets
The words left_curly_bracket and right_curly_bracket, respectively, are defined by
/* #7b is also known as "LEFT CURLY BRACKET" */
left_curly_bracket : #7b
/* #7d is also known as "RIGHT CURLY BRACKET" */
right_curly_bracket : #7d
colon
The word colon is defined by
/* #3a is also known as "COLON" */
colon : #3a
square brackets
The words left_square_bracket and right_square_bracket, respectively, are defined by
/* #5b is also known as "LEFT SQUARE BRACKET" */
left_square_bracket : #5b
/* #5d is also known as "RIGHT SQUARE BRACKET" */
right_square_bracket : #5d
alphanumeric
The word alphanumeric is reserved for future use.
comma
The word comma is defined by
/* #2c is also known as "COMMA" */
comma : #2c
name
The word name is defined by
name : {underscore} alphabetic {name_suffix_character}
/* #41 is also known as "LATIN CAPITAL LETTER A" */
/* #5a is also known as "LATIN CAPITAL LETTER Z" */
/* #61 is also known as "LATIN SMALL LETTER A" */
/* #7a is also known as "LATIN SMALL LETTER Z" */
name_suffix_character : /* The Unicode characters from #41 to #5a and from #61 to #7a. */
/* #30 is also known as "DIGIT ZERO" */
/* #39 is also known as "DIGIT NINE" */
name_suffix_character : /* The unicode characters from #30 to #39. */
/* #5f is also known as "LOW LINE" */
name_suffix_character : #5f
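The name production can be approximated by a regular expression. The Python sketch below is illustrative only: this section does not define underscore or alphabetic, so it assumes that underscore is #5f and that alphabetic covers #41 to #5a and #61 to #7a. The names NAME and is_name are hypothetical.

```python
import re

# name : {underscore} alphabetic {name_suffix_character}
# Assumption: underscore is "_" and alphabetic is A-Z or a-z.
NAME = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")


def is_name(text):
    """Return True if the whole text is a single name word."""
    return NAME.fullmatch(text) is not None
```

Under these assumptions a name must contain at least one letter, so a sequence consisting only of underscores is not a name.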
number
The word number is defined by
number : integer_number
number : real_number
integer_number : [sign] digit {digit}
real_number : [sign] period digit {digit} [exponent]
real_number : [sign] digit {digit} [period {digit}] [exponent]
exponent : exponent_prefix [sign] digit {digit}
/* #2b is also known as "PLUS SIGN" */
sign : #2b
/* #2d is also known as "MINUS SIGN" */
sign : #2d
/* #2e is also known as "FULL STOP" */
period : #2e
/* #65 is also known as "LATIN SMALL LETTER E" */
exponent_prefix : #65
/* #45 is also known as "LATIN CAPITAL LETTER E" */
exponent_prefix : #45
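The number productions translate fairly directly into a regular expression. The Python sketch below is illustrative; the names NUMBER and classify_number are hypothetical. Because a plain digit sequence is derivable from both integer_number and real_number, the sketch assumes that a word without a period and without an exponent is classified as an integer; the productions themselves do not state this tie-break.

```python
import re

# [sign] ( period digit {digit} | digit {digit} [period {digit}] ) [exponent]
NUMBER = re.compile(
    r"[+-]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][+-]?[0-9]+)?"
)


def classify_number(text):
    """Return "integer", "real", or None if text is not a number word."""
    if NUMBER.fullmatch(text) is None:
        return None
    # Assumption: a period or an exponent makes the word a real_number.
    if any(c in text for c in ".eE"):
        return "real"
    return "integer"
```

The character class [0-9] is used instead of \d because \d also matches digits outside the range #30 to #39.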
string
The word string is defined by
string : single_quoted_string
string : double_quoted_string
double_quoted_string : double_quote {double_quoted_string_character} double_quote
double_quoted_string_character : /* any character except for newline and double_quote */
double_quoted_string_character : escape_sequence
double_quoted_string_character : #5c double_quote
/* #22 is also known as "QUOTATION MARK" */
double_quote : #22
single_quoted_string : single_quote {single_quoted_string_character} single_quote
single_quoted_string_character : /* any character except for newline and single_quote */
single_quoted_string_character : escape_sequence
single_quoted_string_character : #5c single_quote
/* #27 is also known as "APOSTROPHE" */
single_quote : #27
/* #5c is also known as "REVERSE SOLIDUS" */
escape_sequence : #5c #5c
/* #6e is also known as "LATIN SMALL LETTER N" */
escape_sequence : #5c #6e
/* #72 is also known as "LATIN SMALL LETTER R" */
escape_sequence : #5c #72
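The string productions can be illustrated by a function that turns a string word into its value. The Python sketch below is an assumption-laden illustration: the name string_value is hypothetical, the input is assumed to be a word already recognized by the lexer, and the mapping of #5c #6e to LINE FEED and #5c #72 to CARRIAGE RETURN follows common convention rather than an explicit statement in the productions.

```python
def string_value(word):
    """Remove the quotes of a string word and decode its escapes."""
    if len(word) < 2 or word[0] not in "'\"" or word[-1] != word[0]:
        raise ValueError("not a quoted string word")
    quote = word[0]
    # Escapes permitted by the grammar: backslash, n, r, and the
    # quote character that delimits this particular string.
    # Assumption: \n decodes to LF and \r decodes to CR.
    escapes = {"\\": "\\", "n": "\n", "r": "\r", quote: quote}
    body = word[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in escapes:
                raise ValueError("invalid escape sequence")
            out.append(escapes[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Note that an escaped double quote is rejected inside a single-quoted string, and vice versa, because each string form only defines an escape for its own delimiter.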
boolean, void
The words boolean and void, respectively, are defined by
boolean : true
boolean : false
true : #74 #72 #75 #65
false : #66 #61 #6c #73 #65
void : #76 #6f #69 #64
digit
The word digit is defined by
digit : /* A single Unicode character from the code point range U+0030 to U+0039. */
Syntactical Structure
The syntactical grammar describes the translation of the sequence of words that make up a program into sentences.
The single goal symbol of the syntactical grammar is the sentence symbol.
The words whitespace, line_terminator, and comment are removed from the sequence of words before the translation to sentences is performed.
The goal sentence sentence is defined by
sentence : value
The sentence value is defined by
value : map
value : list
value : string
value : number
value : boolean
value : void
The sentence map is defined by
map : left_curly_bracket map_body right_curly_bracket
map_body : map_body_element map_body_rest
map_body : ε
map_body_rest : comma map_body_element map_body_rest
map_body_rest : comma
map_body_rest : ε
map_body_element : name colon value
The sentence list is defined by
list : left_square_bracket list_body right_square_bracket
list_body : list_body_element list_body_rest
list_body : ε
list_body_rest : comma list_body_element list_body_rest
list_body_rest : comma
list_body_rest : ε
list_body_element : value
Types and Values
The Data Definition Language knows six basic types: List and Map, which are the so-called aggregate types, and Boolean, Number, String, and Void, which are the so-called scalar types.
Scalar Types
Boolean Type
The type Boolean has two values, true and false, which are expressed in the language by the words true and false, respectively (as defined in the lexical grammar).
Number Type
The type Number represents both two's complement integer numbers and IEEE 754 floating-point numbers.
A value of type Number is expressed in the language by the word `number` (as defined in the lexical grammar).
Note that the Data Definition Language does not impose restrictions on the range and precision of values of type Number. Implementations, however, may impose restrictions.
String Type
The type String represents UTF-8 strings. String values are expressed in the language by the word string (as defined in the lexical grammar).
At the end of the lexical translation of a String word, its escape sequences are replaced by the Unicode characters they represent.
Furthermore, the opening and closing quotes are removed.
Note that the Data Definition Language does not impose restrictions on the length of values of type String.
Implementations, however, may impose restrictions.
Void Type
The type Void has a single value void, which is expressed in the language by the word void (as defined in the lexical grammar).
Aggregate Types
List Type
The type List represents lists of values.
A value of type List is expressed in the language by the sentence list (as defined in the syntactical grammar).
Example:
// A list with three numbers 1, 2, and 3.
[ 1, 2, 3 ]
Map Type
The type Map represents maps from names to values.
A value of type Map is expressed in the language by the sentence map (as defined in the syntactical grammar).
Example:
// A map of
// text to 'Hello, World!',
// action to 'Print', and
// fontSize to 12.
{ text : 'Hello, World!', action : 'Print', fontSize : 12 }
If two name/value pairs with the same name are specified in a map, then the last specified name/value pair takes precedence.
Example:
The following Data Definition Language program defines a Map value that contains two name/value pairs with the same name x.
The first name/value pair maps to the value 0 and the second name/value pair maps to the value 1.
{ x : 0, x : 1 }
The effective Map value defined by the program is hence
{ x : 1 }
as the name/value pair mapping to the value 0 is specified before the name/value pair mapping to the value 1.
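The precedence rule for repeated names in a map can be sketched as follows. This Python fragment is illustrative only; the function name build_map is hypothetical, and it assumes that a parser has already produced the name/value pairs of a map body in source order.

```python
def build_map(pairs):
    """Build a Map value from (name, value) pairs given in source order.

    A later pair with the same name replaces an earlier one.
    """
    result = {}
    for name, value in pairs:
        result[name] = value  # the last specified pair takes precedence
    return result
```

With the pairs of the program { x : 0, x : 1 }, that is [("x", 0), ("x", 1)], the resulting map contains only the mapping from x to 1.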