Dana Vrajitoru
C311 Programming Languages
Semantics, Attribute Grammars, Target Code Optimization
Semantic Analysis
- Analyzing the meaning of the program/tree/sentence/image/etc.
- For a program: it cannot be done with a context-free grammar alone,
because it must check context-sensitive consistency requirements.
- Examples: a function call supplying the number of parameters the
declaration requires; the type of a variable matching the type of the
value assigned to it.
- Semantic rules can be static or dynamic. Static rules are enforced
by the compiler at compile time; dynamic rules are enforced at runtime.
Attribute Grammars
- They generate parse trees with decorations, or annotations added
to the nodes.
- These attributes can be properties or even actions to be taken.
- In some cases, the attribute is an additional value associated
with non-terminals and terminals.
- In other cases, each rule has an associated action over the
attributes of its symbols. These can be copy rules or semantic
function calls.
- Other formal approaches to semantics (beyond attribute grammars):
denotational semantics (machine-independent meaning), axiomatic
semantics (proofs about programs).
Example
Context-free grammar:
E -> E + T
E -> E - T
E -> T
...
F -> ( E )

Attribute grammar:
E1 -> E2 + T      E1.val := sum(E2.val, T.val)
E1 -> E2 - T      E1.val := difference(E2.val, T.val)
E -> T            E.val := T.val
F -> ( E )        F.val := E.val
Evaluating Attributes
- Synthesized attributes: computed from the attributes of a symbol's
children, i.e. when the symbol appears on the left-hand side of a
rule; their values flow up the tree.
- Inherited attributes: computed from the attributes of a symbol's
parent and siblings, i.e. when the symbol appears on the right-hand
side of a rule; their values flow down or sideways.
- Grammars with only synthesized attributes are called
S-attributed. The attributes of the tokens are initialized by the
scanner, and the attribute flow is bottom-up in the parse tree.
Inherited Attributes
They allow for the values of the attributes to be passed sideways
and down in the tree. They may add symbol table information.
E -> T TT
    TT.st := T.val        E.val := TT.val
TT1 -> + T TT2
    TT2.st := TT1.st + T.val        TT1.val := TT2.val
TT1 -> - T TT2
    TT2.st := TT1.st - T.val        TT1.val := TT2.val
TT -> e
    TT.val := TT.st
Code Optimizers
- The output code must not, in any way, change the meaning of the
program.
- Optimization should increase the speed of the program and, if
possible, reduce the resources it requires.
- Optimization should itself be fast and should not delay the
overall compilation process.
- It can be machine-independent or machine-dependent (involving
registers, absolute memory references).
Code Optimization
- There are several possible levels of target code improvement.
- Peephole optimizer: scans already generated code for obvious
suboptimal sequences.
- Local optimization: improvement of basic blocks, maximal
straight-line sequences of instructions that always execute in their
entirety (no branch into or out of the middle). Mostly tries to
eliminate redundancy.
- Global optimization: looks at a function in its entirety, focusing
on multi-block elimination of redundancy, instruction scheduling, and
register allocation.
- Interprocedural optimization: more difficult, since it must reason
across function-call boundaries.
Peephole Optimization
- Elimination of redundant loads and stores.
- Constant folding: x = 3 * 2 becomes x = 6.
- Constant propagation:
r1 = 4
r2 = r2 + r1 becomes r2 = r2 + 4
- Common subexpression elimination.
- Strength reduction:
r1 = r2 * 2 becomes r1 = r2 + r2 or r1 = r2 << 1
Local Optimization
- Redundant loads and computations can be merged into individual
nodes with multiple parents.
- Value numbering assigns the same name (a "number") to any two or
more symbolically equivalent computations ("values"), so redundant
instances will be recognizable by their common name.
- A dictionary is used to keep track of these values.
Loop Optimization
- Invariant code: code that computes the same value in each
iteration.
- Loop invariant code motion: move the invariant computations
outside (before) the loop.
- Induction variable: a variable incremented or decremented by a
loop-invariant amount on each iteration. Can be optimized by strength
reduction or by sharing common references.
- Dead Code elimination: find pieces of code that can never
be reached and eliminate them.
Loop Invariant
// Original
do
{
item = 10;
value += item;
} while(value < 100);
// Optimized
item = 10;
do
{
value += item;
} while(value < 100);
More Loop Invariant
// Original
for i = 1 to N
    x = x + 1;
    for j = 1 to N
        a(i,j) = 100*N + 10*i + j + x;
// Optimized
t1 = 100*N;
for i = 1 to N
    x = x + 1;
    for j = 1 to N
        a(i,j) = t1 + 10*i + j + x;
// Super-Optimized
t1 = 100*N;
for i = 1 to N
    x = x + 1;
    t2 = t1 + 10*i + x;
    for j = 1 to N
        a(i, j) = t2 + j;
Induction Variable Reduction
// Original
for i = 1 to N
    a[i] = b[i];
// Optimized
t1 = &a[1];
t2 = &b[1];
for i = 1 to N
    *t1 = *t2;
    t1++;
    t2++;
Induction Variable in Expressions
// Original
for (i = 0; i < n; i++)
    ...
    k = i * b;
// b is a loop invariant
// Replace with:
k = 0;
for (i = 0; i < n; i++)
    ...
    k += b;
Applying the Induction Variable Method
// Ultra-Optimized
t2 = 100*N + x;
for i = 1 to N
    t2 += 11;
    for j = 1 to N
        a(i,j) = t2 + j;
Global Optimization
- Global common subexpression elimination and global value
numbering.
- Instruction scheduling: build a tree of dependencies and then use
an algorithm to find an optimal linear sequence of these instructions.
- Loop improvement: loop unrolling (several iterations in one), loop
reordering (for nested loops), parallelization.