grammar - A formal definition of the syntactic structure of a language, normally given in terms of production
rules which specify the order of constituents and their sub-constituents in a sentence (a well-formed string in
the language). Each rule has a left-hand side symbol naming a syntactic category (e.g. "noun-phrase"
for a natural language grammar) and a right-hand side which is a sequence of zero or more symbols. Each symbol
may be either a terminal symbol or a non-terminal symbol. A terminal symbol corresponds to one "lexeme"
- a part of the sentence with no internal syntactic structure (e.g. an identifier or an operator in a computer
language). A non-terminal symbol is the left-hand side of some rule.
One rule is normally designated as the top-level rule which gives the structure for a whole sentence.
A grammar can be used either to parse a sentence (see parser) or to generate one. Parsing assigns a terminal syntactic
category to each input token and a non-terminal category to each appropriate group of tokens, up to the level of
the whole sentence. Parsing is usually preceded by lexical analysis. Generation starts from the top-level rule
and chooses one alternative production wherever there is a choice.
Backus-Naur Form - (BNF) A formal meta-syntax used to express context-free grammars. Backus Normal Form
was
renamed Backus-Naur Form at the suggestion of Donald Knuth.
BNF is one of the most commonly used metasyntactic notations for specifying the syntax of programming languages,
command sets, and the like. It is widely used for language descriptions but seldom documented anywhere (how do
you document a metasyntax?), so that it must usually be learned by osmosis.
Consider this BNF for a US postal address:
<postal-address> ::= <name-part> <street-address> <zip-part> <personal-part> ::= <name> | <initial> "." <name-part> ::= <personal-part> <last-name> [<jr-part>] <EOL> | <personal-part> <name-part> <street-address> ::= [<apt>] <house-num> <street-name> <EOL> <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
This translates into English as: "A postal-address consists of a name-part, followed by a street-address part,
followed by a zip-code part. A personal-part consists of either a first name or an initial followed by a dot. A
name-part consists of either: a personal-part followed by a last name followed by an optional "jr-part"
(Jr., Sr., or dynastic number) and end-of-line, or a personal part followed by a name part (this rule illustrates
the use of recursion in BNFs, covering the case of people who use multiple first and middle names and/or initials).
A street address consists of an optional apartment specifier, followed by a street number, followed by a street
name. A zip-part consists of a town-name, followed by a comma, followed by a state code, followed by a ZIP-code
followed by an end-of-line."
Note that many things (such as the format of a personal-part, apartment specifier, or ZIP-code) are left unspecified.
These lexical details are presumed to be obvious from context or specified somewhere nearby.
LL - A class of language grammars, which can be parsed without backtracking. The first L stands for Left-to-right
scan, the second for Leftmost derivation.
Often found in the form LL(k) where k is the number of tokens of look-ahead required when parsing a sentence of
the language. In particular, LL(1) is a fairly restrictive class of grammar, but allows simple top-down parsing
(e.g. recursive-descent) to be used without wasteful backtracking. A number of programming languages are LL(1)
(or close).
recursive descent parser - A "top-down" parser built from a set of mutually-recursive procedures
or a non-recursive equivalent where each such procedure usually implements one of the productions of the grammar.
The structure
of the resulting program closely mirrors that of the grammar it recognises.
The only true was to determine is a grammar is LL(1) is to generate the first, follow and select sets for the grammar which is not a process for the faint of heart. However, we can remove some common problems.
Removing Left recursion
The following is a left recursive grammar:
A ::= Aa | B | C
To remove the left recursion, we can rewrite the grammar using an additional nonterminal as follows:
A ::= BX | CX
X ::= aX | lambda
where lambda is null
Common Left Factors
The following example shows the case where a production may begin with a choice of two terminals as:
A ::= xy | xz
A grammar of this form does not meet the requirements of LL(1). We can rewrite the grammar as:
A ::= xB
B ::= y | z
<stmt> ::= <if> | <assign> | <for> | <print>
<if> ::= IF <cond> THEN <stmt> END |
IF <cond> THEN <stmt> ELSE <stmt> END
<assign> ::= <id> := <exp>
<for> ::= FOR <id> IN <exp> DO <stmt> END
<print> ::= PRINT <id>
<cond> ::= <num> IN <exp> | <exp> <op> <exp>
<op> ::= # | = | <= | >=
<exp> ::= <exp> + <term> | <exp> - <term> | <term> |
INCL(<exp> , <elem>) | EXCL(<exp> , <elem>)
<term> ::= <term> * <factor> | <factor>
<factor> ::= (<exp>) | <set>
<set> ::= <id> | {} | {<setlist>}
<setlist> ::= <num> | <num> , <setlist>
<elem> ::= <id> | <num>
<id> ::= A | B | C | D | ... | X | Y | Z
<num> ::= 1 | 2 | 3 | 4 | ... | 256
Revised Grammar (revised form, eliminating left recursion and common left factors)
<stmt> ::= <if> | <assign> | <for> | <print>
<if> ::= IF <cond> THEN <stmt> <if_tail>
<if_tail> ::= END | ELSE <stmt> END
<assign> ::= <id> := <exp>
<for> ::= FOR <id> IN <exp> DO <stmt> END
<print> ::= PRINT <id>
<cond> ::= <num> IN <exp> | <exp> <op> <exp>
<op> ::= # | = | <= | >=
<exp> ::= <term> <exp_tail> | INCL(<exp> , <elem>) | EXCL(<exp> , <elem>)
<exp_tail> ::= + <term> <exp_tail> | - <term> <exp_tail> | lambda
<term> ::= <factor> <term_tail>
<term_tail> ::= * <factor> | lambda
<factor> ::= (<exp>) | <set>
<set> ::= <id> | { <set_tail>
<set_tail> ::= } | <setlist> }
<setlist> ::= <num> <setlist_tail>
<setlist_tail>::= , <setlist> | lambda
<elem> ::= <id> | <num>
<id> ::= A | B | C | D | ... | X | Y | Z
<num> ::= 1 | 2 | 3 | 4 | ... | 256
<exp> ::= <exp> + <term> | <exp> - <term> | <term> | INCL(<exp> , <elem>) | EXCL(<exp> , <elem>)
eliminating left recursion
<exp> ::= <term> <exp_tail> | INCL(<exp> , <elem>) | EXCL(<exp> , <elem>)
<exp_tail> ::= + <term> <exp_tail> | - <term> <exp_tail> | lambda
Note that lambda is the empty element.
<term> ::= <term> * <factor> | <factor>
eliminating left recursion
<term> ::= <factor> <term_tail>
<term_tail> ::= * <factor> | lambda
<if> ::= IF <cond> THEN <stmt> END |
IF <cond> THEN <stmt> ELSE <stmt> END
eliminating common left factors:
<if> ::= IF <cond> THEN <stmt> <if_tail> <if_tail> ::= END | ELSE <stmt> END