Dana Vrajitoru 
C311 Programming Languages 
Introduction to Formal Grammars
Formal Grammars
- Abstract structures that describe a language precisely.
 - Composed of a set of rules that can be used to identify correct /
incorrect strings of tokens in a language.
 - They are used in particular during the compilation, mostly in the
syntactic analysis phase (parsing) and in part during the semantic
analysis phase (translation to machine language).
 - Other major application: natural language processing (NLP), image
synthesis.
 
Formal Definition
- Let L be a finite set (alphabet, terminal symbols) and N be a set
of non-terminal symbols.
 - The goal of a grammar is to generate all possible strings over the
alphabet L that are syntactically correct in the language.
 - This is done by a set of production rules. A rule expands a
non-terminal symbol as a sequence which can be any combination of
terminal and non-terminal symbols.
 - Usually there is a starting non-terminal symbol.
 
Production rules
- A production rule is of the form
s1 T s2 => s3
where T is any non-terminal symbol and s1, s2,
and s3 are strings made of any combination of terminal and
non-terminal symbols. We say that the rule expands the symbol T.
 - A production rule can be recursive if the symbol on the left
appears in the string to the right.
 - A grammar produces strings over the alphabet L by starting from a
string containing only the starting symbol.
 - It transforms a string into another by replacing one non-terminal
symbol at a time with the right-hand side of a production rule that
expands it. The transformation stops when the resulting string is made
only of terminal symbols.
 
Example
Production Tree
- We can build a tree to show the production rules that we apply to
derive any string in our class.
 - The leaves are terminal nodes. All the other nodes are
non-terminal symbols. An ordered print of all the leaves is the final
string.
    
 
; Lisp implementation of the grammar.
(defun check-S (Lst)
  (if (equal (car Lst) 'a) 
      (check-T (cdr Lst))
    nil))
(defun check-T (Lst)
  (if (equal (car Lst) 'b)
      (or (check-T (cdr Lst))
          (check-V (cdr Lst)))
    nil))
(defun check-V (Lst)
  (if (and (equal (car Lst) 'a))
       (not (cdr Lst))))
(check-S '(a b a)) 		; t
(check-S '(a b b b a)) 	        ; t
(check-S '(b a b)) 		; nil
Types of Grammars
- Noam Chomsky defined a hierarchy in 1956:
 - Type 0: all formal grammars (unrestricted). They generate
all languages that can be recognized by a Turing machine.
 - Type 1: context-sensitive. Rules are of the form
s1 S s2 => s1 s3
s2a y b, where si are any strings made of
terminals and/or non terminals and S is a non terminal. Equivalent to
linear bounded Turing Machine.
 - Type 2: context-free. They are of the form S => str, where
S is a non terminal ans str is any string made of terminals and/or non
terminals. Equivalent to a pushdown automaton (using a stack). They
describe most programming languages.
 - Type 3: regular grammars. The rules are of the form S => T
or S => a T or S =>T a or S => a, where a is a terminal, S and T are
non-terminals. They are equivalent to finite state machines.
 
Examples of Rules
- Type 0: S b c => T b a V
 - Type 1:	many NP =>many noun-plural
 - Type 2: Condition => ( Expression )
      Assignment => variable = Expression ;
 - Type 3: see example used before.