Dana Vrajitoru 
C311 Programming Languages 
Compiler Compilers
Compiler Compilers
- Programs that simplify the creation of compilers.
 - An early version was written by Broker in 1960. It was called a
compiler-compiler but it was more of a configurable compiler.
 - The first ones: 1975, Lex by M. Lesk and Yacc (Yet Another
Compiler Compiler) by S. Johnson.
 - They must allow a full description of the syntax of the language
(usually expressed in tables) and of the target code generation.
 - The result of the program execution is a compiler.
 
Lex
Source ->  Lex   -> yylex
Input ->  yylex   -> Output
- A program generator designed for lexical processing of character
input streams.
 - It uses rules to determine the way the input is processed.
 - By itself it processes the input to produce the output directly.
 - A rule is made of a regular expression which is a pattern to be
matched by the input string, and an action to be taken when the
pattern in found.
 
Example in Lex
%%
[ \t]+$	;
[ \t]+ 	printf(" ");
%% is the beginning of the set of rules, 
[ ]+ means one or more of the content (the Kleene +)
$ means the end of the line. \t is any whitespace.
Every input character that is not matched by a rule is copied to
the output as is.
The output replaces any sequence of whitespaces with a single
space, except at the end of a line where they are deleted.
Yacc and Lex
       lexical      grammar
        rules        rules
          ||          ||
          \/          \/
         Lex         Yacc
          ||          ||
          \/          \/
Input -> yylex  ->  yyparse -> Parsed input
Lex is used for the lexical analysis of the program, while Yacc is
used for the parsing step.
Source File Format
Lex: 
definitions 
%% 
rules 
%% 
user subroutines      
 | 
Yacc: 
declarations  
%%  
rules  
%%  
programs 
 | 
Yacc Syntax
- Declarations:
%token name1 name2 ...
%start symbol 
 - Every symbol not defined as a token is considered a non-terminal.
 - Rule: 
NT : body ;
 - where NT is a non-terminal, and the body represents the right hand
side of the rule, where all the symbols are separated by spaces.
 - The | can be also be used to designate alternatives.
 
Examples of Rules
A  :  B  C  D  |  E  F  |  G  ; 
- Actions can be associated with some of the rules:
A : '(' B ')' { hello( 1, "abc" ); } 
 - The function hello is defined in the programs section.
 - Returned value:
expr : '(' expr ')' { $$ = $2 ; } 
 - This rule says that an expression is an expression enclosed in
parenthesis and that is evaluated to the result returned by the second
symbol on the right (expr). 
 
$$ is the returned value. 
 
$1, $2, ... are the symbols in the rule.
 
Example
%left '+' '-' 
%left '*' '/' 
%% 
expr : expr '+' expr | 
       expr '-' expr | 
       expr '*' expr | 
       expr '/' expr | 
       '-' expr %prec '*' | 
       NAME ;