-----------------------------------------------------------
   Copyright (C) 2004 The Coins Project Group
      (Read COPYING for detailed information.)
-----------------------------------------------------------
Guidance to the COINS compiler infrastructure

                                         Nov. 2007
                                         July 2005
                                         February 2004
Contents

  1. Getting overview of the compiler infrastructure
  2. Structure of the COINS compiler infrastructure
  3. Construction of a compiler based on the COINS compiler infrastructure
   3.1 Compiler control
   3.2 Getting triggers of internal representation
   3.3 Symbol handling
   3.4 HIR handling


1. Getting overview of the compiler infrastructure
==================================================

The overview of the COINS compiler infrastructure is shown in
    http://www.coins-project.org
It contains
    project overview
    explanation of the infrastructure
    design documents
    result of compilation 
      (HIR, LIR, symbol table, flow information, object code)
    examples of compiler construction based on the infrastructure.
    source program archive of the COINS compiler infrastructure
and so on. 

The COINS compiler infrastructure is composed of 
    parsers to translate source language to HIR 
        (high-level intermediate representation)
    control flow and data flow analyzers
    optimizers
    parallelizers
    HIR to LIR (low level intermediate representation) converter
    HIR manager and LIR manager
    symbol manager
    target machine descriptions 
    back-end with 
      code generator generator based on the target machine descriptions
    visualizer for compiling process
    etc.

At present, the compiler infrastructure contains two parsers,
for C and Fortran77, and 10 code generators, for SPARC, x86, x86_64, 
ARM, MIPS, SH-4, PowerPC, Alpha, MicroBlaze, and Thumb. Thus, it has
2*10 = 20 compilers. 

Compiling with the debug-print option shows how the compiling 
process proceeds and how a source program is transformed into 
internal representation (HIR and LIR). It is the shortest way 
to understand how the compiler infrastructure works and 
to understand HIR, LIR, and the symbol table, which are used as
interfaces between compiler components.

Please try to compile several short programs, specifying 
the debug-print option in such a way as
    java -classpath ./classes coins.Driver -S -coins:trace=HIR.1/LIR.1/Sym.1 sample.c
(See README.txt, README.cc.txt, README.fc.txt of the compiler infrastructure.) 


2. Structure of the COINS compiler infrastructure
=================================================

(1) Overall structure

The COINS compiler infrastructure is composed of 3 major parts.
The front-end part translates the source program into HIR (high-level 
intermediate representation). The middle part converts HIR into LIR 
(low-level intermediate representation) and also does optimization and 
parallelization for HIR or LIR. The back-end part generates
the object code of the target machine from LIR. An additional management 
part contains miscellaneous functions such as compiler control,
HIR and LIR management, and symbol management.

HIR is an abstract representation of the source program. It reflects 
the logical structure of the source program such as subprogram, 
block, statement, expression, compound variable, simple variable, 
constant, and so on. Every language construct in HIR has a type
such as int, float, vector, struct, union, and so on.
HIR is designed to be able to represent programs of various 
languages such as C, Fortran, Pascal, and Java.
The concrete representation of HIR is a tree.

LIR is an abstract representation of machine language program.
In LIR, operations are decomposed into elementary operations 
such as SET, JUMP, CALL, etc. with simple or compound operands.
The operands may represent memory, register, or expression.
The data types of LIR correspond to the data types handled by machines, 
such as integer and float of some bit length. LIR is a language 
that can represent not only operations but also the entire program, 
providing features to describe modules, memory areas, and data.
The semantics of LIR is defined rigorously by means of
denotational semantics to avoid misunderstanding.

Overall structure of the compiler is as follows:
  Front-end part 
    C language front-end
    Fortran77 front-end
  Middle part
    Basic optimizer for HIR
      (constant folding, constant propagation, dead code elimination,
       common subexpression elimination within basic blocks)
    Advanced optimizer for HIR 
      (partial redundancy elimination, loop expansion)
    HIR analyzer
      (control flow analyzer, data flow analyzer, alias analyzer)
    SSA (Static Single Assignment) optimizer for LIR
    Basic parallelizer (loop parallelizer)
    SMP (Symmetric Multi-Processor) parallelizer
    SIMD (Single Instruction Multiple Data stream) parallelizer
    HIR-to-LIR converter
    HIR-to-C source code generator
  Back-end part
    TMD (Target Machine Description)
    Code generator generator based on the target machine description 
    Machine code generators for
      Intel x86, ARM, MIPS, SH-4, Power PC
    LIR-to-C source code generator
  Visualizer
  Management part
    Compiler driver
    HIR manager
    LIR manager
    Symbol manager
    Root of information shared by compiler components

The following components will be added in the near future.
  Java front-end that translates Java class file to HIR
  Inline expansion

(2) Class and interface

The compiler infrastructure is entirely written in Java. Every class 
in Java has exactly one super class, except for the root class named 
Object, which has no super class. Interfaces of Java may have several 
super interfaces, that is, multiple inheritance is permitted between
interfaces. Each class may implement several interfaces so that
an instance of the class may be viewed as an object corresponding
to any interface implemented by the class. Instances (objects) of 
a class are constructed (instantiated) by some constructor of the 
class. Fields of the object are created when the object is 
constructed. There may be static fields that are shared between 
all objects of the corresponding class. Methods of a class are either 
static methods or instance methods. Without an object reference, 
static methods can access only static fields and can invoke only 
static methods. Instance methods should have a receiving object, 
that is, instance methods should be invoked in such a way as 
    receivingObject.instanceMethod(....).
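
The rules above can be illustrated by a minimal sketch. The names
Named, Typed, and MiniVar are hypothetical and are not part of the
COINS compiler infrastructure.

```java
// Hypothetical names for illustration only; not part of COINS.
interface Named { String getName(); }
interface Typed { String getTypeName(); }

// One class may implement several interfaces.
class MiniVar implements Named, Typed {
  private static int sCount = 0;   // static field shared by all instances
  private final String fName;      // instance fields, created at construction
  private final String fTypeName;

  MiniVar(String pName, String pTypeName) {
    fName = pName;
    fTypeName = pTypeName;
    sCount++;
  }

  // Instance methods: invoked as receivingObject.method().
  public String getName() { return fName; }
  public String getTypeName() { return fTypeName; }

  // Static method: it has no receiving object, so it uses only
  // static members directly.
  static int getCount() { return sCount; }
}
```

An instance of MiniVar can be viewed either as a Named or as a Typed
object, while MiniVar.getCount() is invoked on the class itself.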

 The intermediate language managers (HIR, LIR), symbol table
 manager (Sym), flow analyzer (Flow) are separated into 
 interface modules such as HIR.java, Sym.java, Flow.java
 and implementation modules such as HIR_Impl.java, 
 SymImpl.java, FlowImpl.java. The interface modules describe
 relations of subclasses and usage of methods.
 Detailed specifications of access methods are described
 in the interface modules so that it is not necessary to see
 implementation modules except for some special cases.
  
 In order to make a compiler based on the COINS compiler infrastructure
 or to modify a compiler based on the infrastructure, 
 it is necessary to read
   Driver.java   Compiler control 
   IR.java       Intermediate language IR interface
   HIR0.java     Elementary HIR interface 
   Sym0.java     Elementary symbol table interface
   Flow.java     Control flow analyzer, data flow analyzer interface
 If you are not satisfied by the elementary interfaces HIR0, Sym0, read
   HIR.java      HIR interface with full specifications
   Sym.java      Symbol table interface with full specifications
 If you cannot get sufficient information, read some important sub-interfaces
   Type.java     Type interface (under Sym)
   Subp.java     Subprogram interface (under Sym)
   SubpDefinition.java Subprogram definition interface (under HIR)
   Stmt.java     Statement interface (under HIR)
   SubpFlow.java Subprogram flow interface (under Flow)
   BBlock.java   Basic block interface (under Flow)

 Machine dependent parameters and methods are concentrated in
    coins.MachineParam.java.
      (MachineParamSparc, MachineParamX86, MachineParamArm)
 Source language dependent parameters and methods are concentrated in
    coins.SourceLanguage.java.
      (SourceLanguageC, SourceLanguageFortran)
 Information requiring mutual exclusion between individual compilers
 is concentrated in
    coins.Registry.java.

 Usage of methods is usually described in upper interfaces 
 so that it is not necessary to read lower interfaces.
 Methods are interrelated and there may be restrictions
 in invoking them. In upper interfaces, many upper methods are 
 provided to make the use of access methods simple.
 You will misuse the access methods if you read lower interfaces
 or read implementation modules before reading upper interfaces.
 
 Execution of the compiler is controlled by Driver.java.
 There are several classes that contain global information
 and methods available throughout the compiler infrastructure. 
 They are placed in the coins package. Some of them are 
    IoRoot.java       Root of I/O file and all internal information. 
    SymRoot.java      Root of symbol information.
    IrRoot.java       Root of IR information.
    HirRoot.java      Root of HIR information
    FlowRoot.java     Root of control flow and data flow information.
    CompileError.java Compiler error handling class.
    FatalError.java   Fatal error handling class.
    Debug.java        Compiler debug control class.
    Registry.java     Information requiring mutual exclusion between 
                      individual compilers.
 The package structure of the infrastructure is explained in README.txt.


3. Construction of a compiler based on the COINS compiler infrastructure
========================================================================

3.1 Compiler control
--------------------

The compiler driver described in Driver.java controls the execution 
of the compiler. It contains definitions of compiler options and 
invocation statements for compiler components. To change the 
sequence of component invocations or to add, replace, or delete 
some compiler components, you should make a subclass of Driver.
The procedure of component invocation is described in the method 
compile(....) in Driver.java. For details, 
see doc-en/README.driver.entxt.

The infrastructure does not use static fields except for final
static ones, in order to make it possible to develop a compiler 
whose components can be executed concurrently. All methods except 
for some in Root classes (classes such as IoRoot, SymRoot, 
HirRoot, LirRoot, etc.) are non-static methods and should be 
applied to instances.

The compiler driver instantiates IoRoot first and supplies
source file, object file, print file, etc. (See getSourceFile(),
printOut, objectFile, msgOut, etc. in IoRoot.java). All compiler
components should convey the instance of IoRoot (ioRoot) directly 
or indirectly and make it protected or public so that it can be 
accessed directly or indirectly from methods in the component. 
In more detail, objects of other Root classes include a reference to 
the instance of IoRoot in order to enable input/output operations.
All Sym objects include a reference to SymRoot object,
all HIR objects include a reference to HirRoot,
and so on. In this way, almost all classes have a link to
the IoRoot directly or indirectly in order to enable
input/output operations in their methods.
IoRoot has such methods as 
  getSourceFile(), getSourceFilePath(), getCompileSpecification()
to access files and compile specifications given by command line.
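
The chain of references described above can be pictured by the
following minimal sketch. All class names (MiniIoRoot, MiniSymRoot,
MiniHirRoot, MiniHirNode) and the report method are hypothetical 
stand-ins, not the actual COINS classes; the sketch only shows how 
an object reaches the I/O root through instance references, without 
static fields.

```java
import java.io.PrintStream;

// Hypothetical miniature of the Root-object chain; NOT the COINS classes.
class MiniIoRoot {
  final String sourceFilePath;  // path given on the command line
  final PrintStream printOut;   // destination of printed output

  MiniIoRoot(String pSourceFilePath, PrintStream pPrintOut) {
    sourceFilePath = pSourceFilePath;
    printOut = pPrintOut;
  }
}

// The symbol root keeps a reference to the I/O root.
class MiniSymRoot {
  final MiniIoRoot ioRoot;
  MiniSymRoot(MiniIoRoot pIoRoot) { ioRoot = pIoRoot; }
}

// The HIR root keeps references to both the symbol root and the I/O root.
class MiniHirRoot {
  final MiniIoRoot ioRoot;
  final MiniSymRoot symRoot;

  MiniHirRoot(MiniSymRoot pSymRoot) {
    symRoot = pSymRoot;
    ioRoot = pSymRoot.ioRoot;
  }
}

// Every node carries its root, so any method can reach the I/O files.
class MiniHirNode {
  final MiniHirRoot hirRoot;
  MiniHirNode(MiniHirRoot pHirRoot) { hirRoot = pHirRoot; }

  void report(String pMessage) {
    hirRoot.ioRoot.printOut.println(
        hirRoot.ioRoot.sourceFilePath + ": " + pMessage);
  }
}
```

Because every reference is held in an instance field, two such chains
can coexist in one Java process without interfering with each other.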

As the next step, the compiler driver instantiates SymRoot so 
that symbol information is shared between compiler components.
All Sym objects such as symbol tables and entries in the symbol
tables (variable, subprogram, constant, type, etc.) contain a
reference to the SymRoot object so that methods of the Sym class
and the IoRoot class can be invoked. The symbol tables are nested,
reflecting the scope of symbols, and organized into a tree structure.
The root of the symbol tables is named symTableRoot. The 
symbol table currently in effect is called the current symbol table
and is named symTableCurrent. They are accessed from the SymRoot
object. Built-in symbols representing basic types, etc. are
registered in symTableRoot and can be accessed from the SymRoot
object, hence they can be accessed from all methods under 
Sym and its subclasses. 

The compiler driver instantiates HirRoot and then invokes
some parser such as C parser that translates source program
into HIR. The parser should convey the instance of HirRoot
to its components so that they can access I/O files, symbol
tables, and HIR information. The super class of HirRoot is
IrRoot where the root of intermediate representation (IR) of 
input source program is recorded as programRoot. 
The IrRoot is also the super class of LirRoot. 
HIR representation of input program can be traced 
starting from programRoot. 

The compiler driver may either parse all subprograms in a source 
file before code generation, or repeat parsing and code generation 
for each subprogram in the source file. In the former case, 
inter-procedural optimization and parallelization may be 
possible but a large memory space is consumed. In the latter case, 
the required memory space is relatively small but the possibility 
of inter-procedural optimization is limited.

Error messages and warning messages are issued by invoking the
put method of the Message class in the coins package.
The number of messages issued is counted for each group
of messages. Compiler implementers may prepare their own
error handlers that invoke the put method in order to provide
some information peculiar to each component such as the source 
program line number. (See Message.java.)
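
The idea of counting messages per group and wrapping the put method
in a component-specific handler may be sketched as follows. The
classes MiniMessage and MiniParserError, their method signatures,
and the group name "error" are assumptions made for illustration;
the actual signatures are those in Message.java.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical message sink that counts messages for each group.
class MiniMessage {
  private final Map<String, Integer> fCounts = new HashMap<>();
  private final StringBuilder fLog = new StringBuilder();

  // Record one message and increment the counter of its group.
  void put(String pGroup, String pText) {
    fCounts.merge(pGroup, 1, Integer::sum);
    fLog.append(pGroup).append(": ").append(pText).append('\n');
  }

  int getCount(String pGroup) { return fCounts.getOrDefault(pGroup, 0); }
  String getLog() { return fLog.toString(); }
}

// Hypothetical handler of one component: it adds the source line
// number and then delegates to put.
class MiniParserError {
  private final MiniMessage fMessage;
  MiniParserError(MiniMessage pMessage) { fMessage = pMessage; }

  void error(int pLine, String pText) {
    fMessage.put("error", "line " + pLine + ": " + pText);
  }
}
```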

It is often necessary to see the status of the compiler for debugging.
The method 
   void print(int pLevel, String pAt, String pMessage) 
in the class Debug in the coins package prints pAt and pMessage if 
pLevel is less than or equal to the debug level specified on the 
command line. Its usage is illustrated by
    hirRoot.ioRoot.dbgHir.print(4, "subpNode", pSubp.getName());
(See Debug.java.)
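
The level test performed by print can be sketched as below. MiniDebug
is a hypothetical class that collects its output in a buffer; the
output layout (pAt, a space, then pMessage) is an assumption, and the
real behavior is defined in Debug.java.

```java
// Hypothetical level-gated debug printer; not the COINS Debug class.
class MiniDebug {
  private final int fDebugLevel;  // level given by the user
  private final StringBuilder fOut = new StringBuilder();

  MiniDebug(int pDebugLevel) { fDebugLevel = pDebugLevel; }

  // Print only if pLevel is less than or equal to the debug level.
  void print(int pLevel, String pAt, String pMessage) {
    if (pLevel <= fDebugLevel) {
      fOut.append(pAt).append(' ').append(pMessage).append('\n');
    }
  }

  String getOutput() { return fOut.toString(); }
}
```

With a debug level of 2, a call print(1, ....) is printed and a call
print(4, ....) is suppressed, so that detailed messages appear only
when a high debug level is requested.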


3.2 Getting triggers of internal representation
-----------------------------------------------

In the process of parsing, the list of subprogram definitions is
constructed (by addSubpDefinition() of coins.ir.hir.Program
called in the parser). Each subprogram definition can be obtained 
by using an iterator in such a way as
    coins.ir.IrList lSubpDefList
      = ((Program)hirRoot.programRoot).getSubpDefinitionList();
    Iterator lSubpDefIterator = lSubpDefList.iterator();
    while (lSubpDefIterator.hasNext()) {
      SubpDefinition lSubpDef = (SubpDefinition)(lSubpDefIterator.next());
      ....
    }
where hirRoot refers to the HirRoot object.
The subprogram defined by the subprogram definition is obtained by
    Subp lSubp = lSubpDef.getSubpSym();
The symbol table local to the subprogram is obtained by
    SymTable lSymTable = lSubp.getSymTable();
or
    SymTable lSymTable = lSubpDef.getSymTable();
The procedural body of the subprogram is obtained by
    HIR lHirSubpBody = lSubp.getHirBody();
or 
    HIR lHirSubpBody = lSubpDef.getHirBody();
(See IrList of coins.ir, SubpDefinition, HirIterator of coins.ir.hir, 
Subp of coins.sym.)

Every HIR node of the subprogram lSubp can be traversed by using
HirIterator in such a way as
    for (HirIterator lHirIterator 
           = hirRoot.hir.hirIterator(lSubp.getHirBody());
         lHirIterator.hasNext(); ) {
      HIR lNode = lHirIterator.next();
      ....
    }
All statements in the subprogram can be traversed by a coding sequence
such as
    for (HirIterator lHirIterator 
          = hirRoot.hir.hirIterator(lSubp.getHirBody());
         lHirIterator.hasNextStmt(); ) {
      Stmt lStmt = lHirIterator.getNextStmt();
      ....
    }
Note that some nodes or statements may be null, so it is better to do
a null check before applying methods to them.

To catch nodes or statements of some particular class during the
traversal, such coding as
    if (lNode instanceof VarNode) { .... }
    if (lNode instanceof SubpNode) { .... }
    if (lStmt instanceof AssignStmt) { .... }
will be convenient. They may also be caught by such coding as
    if (lNode.getOperator() == HIR.OP_VAR) { .... }
Another way of coding is to use HirVisitor in such a way as
    public class 
    ProcessHirNode extends coins.ir.hir.HirVisitorModel1
    {
      public final HirRoot
      hirRoot;

      public 
      ProcessHirNode( HirRoot pHirRoot )
      {
        super(pHirRoot);
        hirRoot = pHirRoot;
      }

      public void
      processSymNode( Subp pSubp )
      {
        hirRoot.symRoot.subpCurrent = pSubp;
        visit(pSubp.getHirBody());
      }

      protected void
      atVarNode( VarNode pVarNode )
      {
        ....
      }

      protected void
      atSubpNode( SubpNode pSubpNode )
      {
        ....
      }
      ....
    }
(See HIR, HirVisitor, HirVisitorModel1, HirVisitorModel2 in 
coins.ir.hir.)
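
The visitor style can be tried without the infrastructure by the
following self-contained sketch. The node classes (MiniNode,
MiniVarNode, MiniSubpNode, MiniAssignNode) and the visitor
(MiniVisitor, VarNameCollector) are hypothetical; the traversal order
chosen here (node first, then children) is an assumption and does not
claim to reproduce HirVisitorModel1 or HirVisitorModel2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree nodes; not the COINS HIR classes.
abstract class MiniNode {
  final List<MiniNode> children = new ArrayList<>();
  MiniNode add(MiniNode pChild) { children.add(pChild); return this; }
}

class MiniVarNode extends MiniNode {
  final String name;
  MiniVarNode(String pName) { name = pName; }
}

class MiniSubpNode extends MiniNode {
  final String name;
  MiniSubpNode(String pName) { name = pName; }
}

class MiniAssignNode extends MiniNode { }

// Base visitor: dispatches on the node class, then visits the children.
class MiniVisitor {
  void visit(MiniNode pNode) {
    if (pNode == null) return;  // null check before applying methods
    if (pNode instanceof MiniVarNode) {
      atVarNode((MiniVarNode) pNode);
    } else if (pNode instanceof MiniSubpNode) {
      atSubpNode((MiniSubpNode) pNode);
    }
    for (MiniNode lChild : pNode.children) {
      visit(lChild);
    }
  }

  // Subclasses override only the hooks they are interested in.
  protected void atVarNode(MiniVarNode pVarNode) { }
  protected void atSubpNode(MiniSubpNode pSubpNode) { }
}

// Example visitor that collects the names of all variable nodes.
class VarNameCollector extends MiniVisitor {
  final List<String> names = new ArrayList<>();

  @Override
  protected void atVarNode(MiniVarNode pVarNode) {
    names.add(pVarNode.name);
  }
}
```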

To scan all symbols recorded in symbol tables, iterators are
provided in SymTable interface. A coding sequence
    for (SymIterator lIterator = lSymTable.getSymIterator();
         lIterator.hasNext(); ) {
      Sym lSym = lIterator.next();
      .....
    }
traverses all symbols recorded in the symbol table lSymTable.
If SymIterator is applied to symTableRoot, all global symbols 
in the given program unit are traversed.

Another coding sequence
    for (SymNestIterator lIterator = lSymTable.getSymNestIterator();
         lIterator.hasNext(); ) {
      Sym lSym = lIterator.next();
      .....
    }
traverses all symbols recorded in the symbol table lSymTable and
its descendant symbol tables. If SymNestIterator is applied to
symTableCurrentSubp, all symbols local to the current subprogram
are traversed. If SymNestIterator is applied to symTableRoot,
all symbols recorded in the given program unit except constants
in symTableConst are traversed. 

The next coding sequence 
    for (SymTableIterator lTableIterator = lSymTable.getSymTableIterator();
         lTableIterator.hasNext(); ) {
      SymTable lSymTableCurr = lTableIterator.next();
      for (SymIterator lSymIterator = lSymTableCurr.getSymIterator();
         lSymIterator.hasNext(); ) {
        Sym lSym = lSymIterator.next();
        ......
      }
    }
will traverse all symbol tables under lSymTable and all symbols 
in the traversed symbol tables examining attributes of the traversed
symbol tables. 
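
The difference between the three kinds of traversal can be seen in
the following sketch. MiniSymTable and its methods are hypothetical;
the depth-first order and the inclusion of the starting table in
tablesNested are assumptions, not documented properties of
SymIterator, SymNestIterator, or SymTableIterator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical nested symbol table; not the COINS SymTable.
class MiniSymTable {
  final String name;
  final List<String> symbols = new ArrayList<>();
  final List<MiniSymTable> childTables = new ArrayList<>();

  MiniSymTable(String pName) { name = pName; }

  MiniSymTable addChild(String pName) {
    MiniSymTable lChild = new MiniSymTable(pName);
    childTables.add(lChild);
    return lChild;
  }

  // Flat traversal: symbols of this table only.
  List<String> symbolsHere() { return new ArrayList<>(symbols); }

  // Nested traversal: symbols of this table and of all descendants.
  List<String> symbolsNested() {
    List<String> lResult = new ArrayList<>(symbols);
    for (MiniSymTable lChild : childTables) {
      lResult.addAll(lChild.symbolsNested());
    }
    return lResult;
  }

  // Table traversal: this table and all descendant tables.
  List<MiniSymTable> tablesNested() {
    List<MiniSymTable> lResult = new ArrayList<>();
    lResult.add(this);
    for (MiniSymTable lChild : childTables) {
      lResult.addAll(lChild.tablesNested());
    }
    return lResult;
  }
}
```

The table traversal is the one to choose when attributes of each
table (its name in this sketch) must be examined together with its
symbols.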


3.3 Symbol handling
-------------------

(1) Factory methods

Construction of Sym, HIR and LIR objects is done by factory methods, 
that is, objects of Sym, HIR and LIR are not usually 
constructed by invoking constructors directly but by invoking 
factory methods. The factory methods of Sym are described
in the Sym interface (Sym.java). Examples of the usage of
factory methods are shown in SimpleMain.java.

(2) Subprogram, variable, constant

A subprogram symbol can be constructed by such coding as
    Subp lSubp = symRoot.sym.defineSubp("sub1", 
                                        symRoot.typeInt);
where the first parameter specifies the subprogram name and
the second parameter specifies the return value type. 
symRoot refers to the SymRoot object. If symRoot is not 
accessible directly but hirRoot is accessible, replace 
symRoot in the above coding by hirRoot.symRoot.
If a String parameter for a Sym or HIR method has the form of 
a String expression other than a String constant, .intern() 
should be applied to it in order to make a unique String object 
that can be compared by the == operator instead of the equals method.
All String objects returned by Sym, HIR, LIR methods
are unique String objects and do not need .intern().
(See Sym, HIR, LIR.)
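
The effect of .intern() is a property of Java itself and can be
checked by the small sketch below. InternDemo and its methods are
hypothetical helpers written only for this illustration.

```java
// Hypothetical helper showing why run-time strings need intern().
class InternDemo {
  // Builds a name at run time, as a parser might do.
  static String buildName(String pPrefix, int pNumber) {
    return pPrefix + pNumber;  // newly created String object
  }

  // Reference comparison, as done by the == operator.
  static boolean sameObject(String pA, String pB) {
    return pA == pB;
  }
}
```

A name built by buildName("sub", 1) has the same characters as the
constant "sub1" but is a different object, so == gives false; after
.intern() it refers to the same unique object as the constant and ==
gives true.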

Similarly, a variable symbol can be constructed by
    Var lVar = symRoot.sym.defineVar("var1", symRoot.typeFloat);
Integer constants and long int constants can be made by
    IntConst lIntConst1 = symRoot.sym.intConst(123, symRoot.typeInt);
    IntConst lLongConst1= symRoot.sym.intConst(123, symRoot.typeLong);
    IntConst lIntConst2 = symRoot.sym.intConst("123", 
                                               symRoot.typeInt);
and floating constants can be made by
    FloatConst lPai = symRoot.sym.floatConst(3.14, symRoot.typeFloat);
    FloatConst lDoubleConst1 = symRoot.sym.floatConst(1.2, 
                                            symRoot.typeDouble);
For more detail, see the Sym0 interface (Sym0.java).

Care should be taken in making a string constant because the 
representation of character string differs by language. For example,
a string constant in C has trailing \0 and may contain preceding
escape character for some special characters. A string constant is 
recorded as a pure string (processing escape characters by 
makeStringBody of coins.SourceLanguage) that is language independent. 
To make a string for the C language from the pure string, the makeCstring 
method is provided; for the Java language, the makeJavaString method 
is provided; and so on.

(3) Symbol table

All symbols are recorded in some symbol table.
The interface of the symbol table is SymTable (SymTable.java).
An instance of SymTable is created for each scope
of symbols corresponding to such language constructs
as program, subprogram, struct, etc.

Several symbol tables are constructed according to the
structure of given source program.
At first, a global symbol table is created by initiate()
of SymRoot and symbols inherent to the COINS infrastructure
are recorded in it. The symbols inherent to the COINS compiler
infrastructure are such ones as basic types and bool constants.
Types of each source language are mapped to the corresponding
types of the COINS compiler infrastructure in such a way as
    C int            COINS int
    C array          COINS vector
    Fortran INTEGER  COINS int
    Fortran REAL     COINS float

When a new scope of symbols is opened, a new symbol table is
to be created and linked to ancestor symbol table that contains
symbols to be inherited by the new scope (pushSymTable()).
When the current scope is closed, the current symbol table is
to be closed by which the ancestor symbol table becomes the 
current symbol table again (by using popSymTable()).

Symbols are searched for in the current symbol table 
(symTableCurrent of SymRoot) and its ancestors in the reverse 
order of scope addition. The methods pushSymTable and popSymTable 
change symTableCurrent when they are called.
A popped symbol table is not discarded unless it is empty, but
is made invisible to search procedures, so that
inter-procedural optimization and parallelization can be done.
A symbol table usually has a corresponding program construct
such as a subprogram, which is called the owner of the symbol
table. There are links between such constructs (owners) and the
corresponding symbol tables to show their correspondence (getOwner). 
An anonymous construct (anonymous Struct, BlockStmt, etc.) may have a
name generated by the compiler.
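
The scope mechanism can be sketched as follows. ScopeTable and
ScopeRoot are hypothetical classes; placing pushSymTable and
popSymTable on the root object and representing a symbol by a pair
of name and type name are simplifications made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical symbol table of one scope; not the COINS SymTable.
class ScopeTable {
  final ScopeTable parent;  // ancestor table, null for the root
  final String owner;       // construct that owns this table
  final Map<String, String> entries = new LinkedHashMap<>();  // name to type

  ScopeTable(ScopeTable pParent, String pOwner) {
    parent = pParent;
    owner = pOwner;
  }

  // Search this table first and then its ancestors.
  String search(String pName) {
    for (ScopeTable lTable = this; lTable != null; lTable = lTable.parent) {
      String lType = lTable.entries.get(pName);
      if (lType != null) return lType;
    }
    return null;
  }
}

// Hypothetical root that keeps track of the current table.
class ScopeRoot {
  final ScopeTable symTableRoot = new ScopeTable(null, "program");
  ScopeTable symTableCurrent = symTableRoot;

  // Open a new scope: the new table becomes the current one.
  ScopeTable pushSymTable(String pOwner) {
    symTableCurrent = new ScopeTable(symTableCurrent, pOwner);
    return symTableCurrent;
  }

  // Close the current scope: the ancestor becomes current again.
  // The popped table is returned, not discarded.
  ScopeTable popSymTable() {
    ScopeTable lPopped = symTableCurrent;
    if (lPopped.parent != null) {
      symTableCurrent = lPopped.parent;
    }
    return lPopped;
  }
}
```

After popSymTable, a search starting from symTableCurrent no longer
finds the symbols of the closed scope, but the popped table itself
still holds them for later phases.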

(4) Scope of symbols

Source program symbols (symbols appearing in source program)
have their scope as defined by the grammar of the language.
Each Struct and Union opens a new scope.
Scope of constants is the entire compile unit.
Scope of registers is the subprogram using them.
Scope of temporal variables generated by the compiler
is the subprogram within which the temporal variables
are defined.

Symbols may have indication of scope (extern, public, private,
compile_unit, etc.) and variables may have indication of
storage class (static, automatic, etc.). In storage allocation
and symbol treatment in code generation, these indications and
nesting of symbol tables should be properly treated. Care should
be taken that one subprogram may have nested symbol tables.
Nesting of subprograms is treated as the nesting of corresponding
symbol tables.

(5) SymRoot

SymRoot class is used to access Sym (symbol) information and
information prepared by other classes such as IoRoot, HIR, etc.
All Sym objects contain a reference to the SymRoot object,
from which symbol information and methods can be quickly accessed. 
The SymRoot object contains a reference to IoRoot.
Thus, every Sym object can access input/output methods, too.
SymRoot contains SymTable references:
    symTableRoot    // Root of SymTable.
    symTableConst   // Constant table.
    symTableUnique  // SymTable that contains generated unique name.
    symTableCurrent // Refers to the symbol table for subprogram, 
                    // etc. under construction or under processing.
    symTableCurrentSubp // Symbol table of current subprogram.
                        // Some kinds of symbols (Type, Label, temporal 
                        // variable, etc.) are registered not in 
                        // symTableCurrent but in symTableCurrentSubp.
The subprogram under construction or processing is recorded in
subpCurrent of SymRoot.
In parsing, flow analysis, optimization, code generation, etc.,
it is strongly recommended to set symTableCurrent, subpCurrent,
and symTableCurrentSubp as exemplified in SimpleMain. 
They are used in searching/generating symbols.
If new symbols are to be created in such processing, 
symTableCurrent and subpCurrent should be set properly.
Several methods such as sym/pushSymTable, sym/popSymTable, and
flow/subpFlow keep such consistency automatically, as
described in the explanations of these methods.
The pushSymTable/popSymTable methods should be used in parsers but 
should not be used in optimization, code generation, etc. because 
pushSymTable creates a new SymTable corresponding to a new scope
in the input source program.

SymRoot contains type symbols of base types such as
typeBool, typeChar, typeInt, etc. as predefined symbols.

(6) Type

Symbols such as variables, subprograms, constants have type.
The type is represented by a type symbol.
Types used in HIR are classified into base type and introduced
type. 
    Base type (type intrinsic to HIR):
      int        represented by typeInt   of SymRoot
      float      represented by typeFloat of SymRoot
      ....       (see SymRoot)
    Introduced type (type introduced by the input program):
      pointer type     represented by the class PointerType
      vector type      represented by the class VectorType
      structure type   represented by the class StructType
      union type       represented by the class UnionType 
      enumeration type represented by the class EnumType
      subprogram type  represented by the class SubpType
      defined type     represented by the class DefinedType 

A pointer type is defined by pointer indication (* in C) and
the type of the target of the pointer.
A vector type is derived from element type by specifying the type
of vector element, the number of elements in the vector, and the
lower bound of its index.
A structure type is defined by specifying its elements,
which may represent objects different from each other.
A union type is defined by specifying overlaid elements.
A subprogram type is defined by specifying type of parameters
and the type of return value.
An enumeration type is defined by specifying enumeration
literals representing some integer value.
Defined types may be a renaming of base type or a compound
type that is derived from base type or defined type.

Type symbols are created by factory methods in Sym0 interface. 
The factory methods for type creation are baseType, pointerType, 
vectorType, structType, unionType, enumType, subpType
and definedType.

The structures of SubpType, StructType, UnionType, and EnumType 
are a little complicated. It is not recommended to use subpType
directly; use instead defineSubp of the Sym interface,
which defines both the subprogram symbol and the subprogram type.
To make a type instance of StructType, UnionType, or EnumType,
read carefully the explanation of the corresponding method
structType, unionType, or enumType of the Sym interface.

In order to define a subprogram symbol,
    make the subprogram symbol by defineSubp(...),
    add formal parameters by addParam(....),
    close the subprogram declaration by closeSubpHeader(....)
in such a way as
    Subp lSubp = symRoot.sym.defineSubp("name", returnType);
    SymTable lSubpSymTable = symRoot.symTableCurrent.pushSymTable(lSubp);
    lSubp.addParam(param1);
    lSubp.addParam(param2);
    ....
    lSubp.setOptionalParam(); // Not required if it has no optional parameter.
    lSubp.closeSubpHeader();
    Var lVar1 = lSubpSymTable.defineVar("a", symRoot.typeInt);
    ....
    symRoot.symTableCurrent.popSymTable();
The above procedure will make a subprogram object with
essential fields such as the parameter list,
return value type, and subprogram type.
closeSubpHeader() will make a subprogram type of the form
  <SUBP < paramType_1 paramType_2 ... > returnValueType
        optionalParam >
where paramType_1, paramType_2, ... are the parameter types,
returnValueType is the return value type, and
optionalParam is true or false depending on whether an optional
parameter (... in C) is specified or not.
pushSymTable(lSubp) makes a new symbol table owned by the subprogram
lSubp and makes it symTableCurrent. lSubpSymTable.defineVar( .... )
defines a variable as an element of lSubpSymTable. popSymTable()
makes lSubpSymTable invisible from the symbol search procedure and 
makes the previous symbol table symTableCurrent.

To make a structure type, the structType method is provided in the 
Sym0 interface. Its usage is shown by the following example:
    For
       struct listNode {
         int nodeValue;
         struct listNode *next;
       } listAnchor, listNode1;
    the following coding will make the corresponding StructType.
      Sym lTag = symRoot.symTableCurrent.generateTag("listNode");
      StructType lListStruct = sym.structType(null, lTag); // Incomplete type.
      PointerType lListPtrType = sym.pointerType(lListStruct);
      PointerType lIntPtrType = sym.pointerType(symRoot.typeInt);
      symRoot.symTableCurrent.pushSymTable(lListStruct);
      Elem lValue = sym.defineElem("nodeValue", symRoot.typeInt);
      Elem lNext  = sym.defineElem("next", lListPtrType);
      lListStruct.addElem(lValue);
      lListStruct.addElem(lNext);
      lListStruct.finishStructType(true);
      symRoot.symTableCurrent.popSymTable();

Methods are provided to get information of introduced types:
    getSizeValue     of Sym interface
    getPointedValue  of Sym interface
    getElemCount     of VectorType interface
    getElemList      of StructType interface
    getParameterTypeList of SubpType interface
    ....

(7) Generation of temporal variables and labels

In compilers, temporal variables are often required to be generated 
for optimization, parallelization, etc.
A method 
    public Var generateVar( Type pType );
is provided in the SymTable interface to generate a temporal variable
in the symbol table local to the current subprogram (symTableCurrentSubp).

In order to generate labels, a method 
    public Label generateLabel();
is provided in the SymTable interface. It generates a label
in the symbol table local to the current subprogram (symTableCurrentSubp).
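
One possible way of generating names that do not collide with names
already recorded in a symbol table is sketched below. NameGenerator
is a hypothetical class, and the prefixes "_var" and "_lab" and the
numbering scheme are assumptions of this sketch, not a description
of how generateVar and generateLabel actually name their results.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical generator of unique temporal variable and label names.
class NameGenerator {
  private final Set<String> fUsed = new LinkedHashSet<>();
  private int fVarCount = 0;
  private int fLabelCount = 0;

  // Record a name that already exists (e.g. one from the source program).
  void declare(String pName) { fUsed.add(pName); }

  // Generate a variable name, skipping names already in use.
  String generateVar() {
    String lName;
    do {
      lName = "_var" + (++fVarCount);
    } while (!fUsed.add(lName));  // add returns false if already present
    return lName;
  }

  // Generate a label name in the same manner.
  String generateLabel() {
    String lName;
    do {
      lName = "_lab" + (++fLabelCount);
    } while (!fUsed.add(lName));
    return lName;
  }
}
```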

(8) Representation of symbol in text form

The method toString() gives the representation of a symbol 
in text form. It may be used for debugging purposes, etc.
toStringShort() shows a short description and toStringDetail()
shows a full description of the symbol.


3.4 HIR handling
----------------

(1) Getting information of HIR

HIR can be instantiated and handled mostly by using methods in
HIR0 (elementary HIR interface). Simple compilers can be constructed 
by using methods in the HIR0 and Sym0 interfaces. In constructing 
a more complicated compiler, use methods in HIR and Sym. HIR inherits 
HIR0 and Sym inherits Sym0. In the following explanations, HIR may be 
read as HIR0 in most cases.

Most HIR constructs correspond to some source language
constructs, e.g.
    SubpDefinition - subprogram definition
    Stmt           - statement
    AssignStmt     - assign statement
    LoopStmt       - loop statement
    BlockStmt      - block statement
    Exp            - expression
    VarNode        - variable
    ConstNode      - constant

Subcomponents of HIR constructs can be obtained by methods provided
in each HIR subclass (interface that extends HIR). For example,
    getLeftSide(), getRightSide() of AssignStmt
    getIfCondition(), getThenPart(), getElsePart() of IfStmt
    getLoopStartCondition(), getLoopBodyPart() of 
      ForStmt, WhileStmt, UntilStmt that extend LoopStmt
    getExp1(), getExp2() of Exp
    getSymNodeSym() of VarNode, ElemNode, ConstNode that extend SymNode
For details, see the corresponding interfaces that extend HIR.

The subcomponents can also be obtained by specifying the child number
with getChild1(), getChild2(), and getChild(int pChildNumber). 
Such coding requires exact knowledge of the HIR data structure. 
getChildCount() of the IR interface gives the number of children of an
HIR node. (getChild1() and getChild2() have less overhead than
getChild(1) and getChild(2).)
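
The child-number style of access can be sketched as a generic tree walk.
The example below is only an illustration: Node is a toy stand-in class
written for this sketch, not the coins.ir.IR interface, and it assumes
that child numbers start at 1 (as getChild(1) corresponding to
getChild1() suggests) and that an absent child is represented by null.

```java
import java.util.ArrayList;
import java.util.List;

class ChildWalkSketch {
    /** Toy stand-in for an IR node (hypothetical, not a COINS class). */
    static class Node {
        private final String name;
        private final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) children.add(k);
        }
        String getName() { return name; }
        int getChildCount() { return children.size(); }
        // Child numbers start at 1, mirroring getChild1()/getChild2().
        Node getChild(int childNumber) { return children.get(childNumber - 1); }
    }

    /** Collects node names in pre-order using only getChildCount/getChild. */
    static void walk(Node node, List<String> out) {
        if (node == null) return;          // an optional child may be absent
        out.add(node.getName());
        for (int i = 1; i <= node.getChildCount(); i++) {
            walk(node.getChild(i), out);
        }
    }

    /** Builds a toy tree for "a = a + 1" and returns the visiting order. */
    static List<String> demo() {
        Node add = new Node("add", new Node("a"), new Node("1"));
        Node assign = new Node("assign", new Node("a"), add);
        List<String> out = new ArrayList<>();
        walk(assign, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());        // prints [assign, a, add, a, 1]
    }
}
```

A walk of this kind needs no knowledge of the node kinds, but it also
cannot tell what each child means; the named accessors such as
getLeftSide() remain the clearer choice when the node kind is known.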

All HIR nodes have a type attribute. It can be obtained by the method
getType(). Some HIR nodes may have flags set during parsing, 
analysis, etc. The method getFlag(int pFlagNumber) returns 
the status of the flag indicated by pFlagNumber.

(2) Representation of HIR in text form

The method toString() gives the representation of an HIR node
in text form. It may be used for debugging, etc.
toStringShort() gives a short description and toStringDetail()
gives a full description of the node.

The method print(....) of HIR prints the subtree rooted at the 
specified node, that is, all subtrees of the specified node are 
printed recursively.

(3) Normal construction of HIR

HIR nodes can be constructed by methods defined in the HIR interface.
Leaves of an HIR tree are symbol nodes, lists, etc. In HIR, symbols
are represented by symbol nodes that refer to a 
symbol table entry such as a variable or a subprogram.

A symbol node can be generated by factory methods of HIR.
    VarNode lVarNode1     = hirRoot.hir.varNode(lVar1);
    SubpNode lSubpNode1   = hirRoot.hir.subpNode(lSubp);
    ConstNode lConstNode1 = hirRoot.hir.constNode(lIntConst1);
will instantiate VarNode, SubpNode, and ConstNode, respectively.
hirRoot.hir, hirRoot.symRoot.sym, etc. may be shortened by local 
declarations such as
    HIR hir = hirRoot.hir;
    Sym sym = hirRoot.symRoot.sym;

Arithmetic expressions can be built by such coding as
    Exp lExp1 = hir.exp(HIR.OP_ADD, lVarNode1, lConstNode1);
    Exp lExp2 = hir.exp(HIR.OP_MULT, lExp1, hir.varNode(lVar2));
Assignment statements, if-statements, etc. are built by
    Stmt lAssign1 = hir.assignStmt(lVarNode1, lExp1);
    Stmt lAssign2 = hir.assignStmt(lVarNode1, lExp2);
    Stmt lIf1 = hir.ifStmt(hir.exp(HIR.OP_CMP_EQ, lExp1,
                                   hir.constNode(0, symRoot.typeInt)),
                           lAssign1, lAssign2);
etc. 

(4) Top down construction of HIR

An HIR tree is usually constructed bottom-up, starting from 
leaves and combining them as above. Top-down construction is also
possible by attaching a subtree (leaf node or nonleaf node) to a parent 
tree as its child.
    setChild1( IR pChild1 ), setChild2( IR pChild2 ),
    setChild( int pChildNumber, IR pChild )
of the IR interface are available for such construction. Top-down
construction requires knowledge of the detailed structure of the HIR tree.
The recommended way is bottom-up construction using the provided
factory methods.
 
In some cases, strict bottom-up construction is difficult. For
example, when constructing a block statement or a subprogram 
definition, most of the children are not known at first.
Several methods are provided to construct such subtrees.
They are explained in item (5) below.
 
(5) Construction by sequence of statements
 
A block statement can be constructed by a statement sequence such as
    BlockStmt lBlockStmt = hir.blockStmt(null);
    lBlockStmt.addLastStmt(lAssign1);
    lBlockStmt.addLastStmt(lIf1);
    ....
(See HIR, BlockStmt in coins.ir.hir.)

A subprogram can be constructed by a statement sequence such as
    Subp lMain = symRoot.sym.defineSubp("main", symRoot.typeInt);
    SymTable lSymTable = symRoot.symTableRoot.pushSymTable(lMain);
    lMain.closeSubpHeader();
    SubpDefinition lMainDef = hir.subpDefinition(lMain, lSymTable);
    BlockStmt lBlockStmt = hir.blockStmt(null);
    lMainDef.setHirBody(lBlockStmt);
    ....
    lBlockStmt.addLastStmt(lAssign1);
    ....
    symRoot.symTableCurrent.popSymTable();
(For a prototype declaration, use closeSubpPrototype instead of 
 closeSubpHeader.  See HIR, SubpDefinition, Subp, SimpleMain, etc.)

IrList and HirList can be constructed by a statement sequence such as
    HirList lList = hir.irList();
    lList.add(....);
    ....
(See HIR in coins.ir.hir, IrList in coins.ir.)

(6) Note on HIR construction and transformation

An example of HIR generation is given in examples/SimpleMain.java.
It shows how to construct the symbol table and the HIR tree of a program, 
and may be useful when coding a new parser.

It is possible to build Sym objects and HIR objects by invoking 
constructors of VarImpl, SubpImpl, VarNodeImpl, ConstNodeImpl, 
AssignStmtImpl, IfStmtImpl, etc., but such coding is not recommended.
It is error-prone because the factory methods supply some hidden 
parameters and apply some preparatory methods to the 
parameters.

It should be noted that the structure of HIR is a tree. Every 
node in the HIR tree should be newly created and should not 
be shared, because sharing nodes violates the data structure
rule of a tree. If a subtree identical to some subtree X is required,
X should be copied by the method
    X.copyWithOperands()
if X is an expression or
    X.copyWithOperandsChangingLabels()
if X is a statement that may include label definitions.
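
The no-sharing rule can be illustrated with a small self-contained
sketch. Everything in it is hypothetical: Node is a toy class written
for this example, and deepCopy() only plays the role that
copyWithOperands() has for HIR expressions; it says nothing about how
COINS implements copying. The toy constructor rejects a child that
already has a parent, so reusing a subtree fails while a copy succeeds.

```java
import java.util.ArrayList;
import java.util.List;

class TreeCopySketch {
    /** Toy expression node (hypothetical, not a COINS HIR class). */
    static class Node {
        final String op;
        Node parent;                        // a tree node has at most one parent
        final List<Node> kids = new ArrayList<>();

        Node(String op, Node... children) {
            this.op = op;
            for (Node c : children) {
                if (c.parent != null) {
                    throw new IllegalStateException("node already has a parent: " + c.op);
                }
                c.parent = this;
                kids.add(c);
            }
        }

        /** Deep copy of this subtree; the copy has no parent yet. */
        Node deepCopy() {
            Node[] copies = new Node[kids.size()];
            for (int i = 0; i < copies.length; i++) {
                copies[i] = kids.get(i).deepCopy();
            }
            return new Node(op, copies);
        }

        @Override public String toString() {
            if (kids.isEmpty()) return op;
            StringBuilder sb = new StringBuilder("(" + op);
            for (Node k : kids) sb.append(' ').append(k);
            return sb.append(')').toString();
        }
    }

    /** Using the same subtree object twice is rejected. */
    static boolean sharingRejected() {
        Node x = new Node("add", new Node("a"), new Node("1"));
        try {
            new Node("mult", x, x);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /** Using a copy for the second operand keeps the structure a tree. */
    static String copied() {
        Node x = new Node("add", new Node("a"), new Node("1"));
        return new Node("mult", x, x.deepCopy()).toString();
    }

    public static void main(String[] args) {
        System.out.println(sharingRejected());
        System.out.println(copied());
    }
}
```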

When the entire HIR subtree of a subprogram has been constructed,
finishHir() should be called, as in 
    lSubpDefinition.finishHir();
where lSubpDefinition represents the SubpDefinition node of the subprogram.
The method finishHir() performs operations such as giving index numbers to
the HIR nodes in the subtree, checking tree structure conformance, and
making getHirPosition() valid for labels. When the HIR subtree of a
subprogram has been changed by optimization or parallelization, finishHir()
should be called again for the subtree. It need not be called after each
modification of the statements and expressions of a SubpDefinition, only at
the end of the creation or transformation of the entire SubpDefinition
subtree. In parsers that create the HIR subtrees of all subprograms in a
compile unit before passing control to later phases of the compiler, it is
better to call finishHir() only once for programRoot, instead of calling it
for each subprogram definition, as in
    hirRoot.programRoot.finishHir();
where hirRoot is the instance of HirRoot.
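
The reason a single finishing pass at the end is sufficient can be seen
from a toy version of such a pass. The sketch below is an assumption
made for illustration only: Node and finish() are hypothetical names,
the numbering order (pre-order from 1) is chosen arbitrarily, and the
actual work done by finishHir() may differ. It numbers every node of a
tree and reports a node that is reachable twice.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class FinishPassSketch {
    /** Toy tree node (hypothetical, not a COINS HIR class). */
    static class Node {
        final String name;
        final List<Node> kids = new ArrayList<>();
        int index = 0;                      // 0 means "not numbered yet"

        Node(String name, Node... children) {
            this.name = name;
            for (Node c : children) kids.add(c);
        }
    }

    /**
     * Numbers the nodes under root in pre-order starting from 1 and
     * returns the node count. Throws if a node is reachable twice.
     */
    static int finish(Node root) {
        Map<Node, Boolean> seen = new IdentityHashMap<>();
        return number(root, 0, seen);
    }

    private static int number(Node node, int last, Map<Node, Boolean> seen) {
        if (seen.put(node, Boolean.TRUE) != null) {
            throw new IllegalStateException("shared node: " + node.name);
        }
        node.index = ++last;
        for (Node k : node.kids) {
            last = number(k, last, seen);
        }
        return last;
    }

    /** Returns { node count, index of assign, index of add, index of 1 }. */
    static int[] demo() {
        Node one = new Node("1");
        Node add = new Node("add", new Node("a"), one);
        Node assign = new Node("assign", new Node("a"), add);
        int count = finish(assign);
        return new int[] { count, assign.index, add.index, one.index };
    }

    /** A tree that uses the same node object twice is rejected. */
    static boolean detectsSharing() {
        Node x = new Node("x");
        try {
            finish(new Node("add", x, x));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
        System.out.println(detectsSharing());
    }
}
```

Because the pass visits the whole tree each time, running it once after
all modifications gives the same numbering as running it after every
single modification, at a fraction of the cost.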

(7) Indispensable items and optional items

The syntactic structure of HIR is shown in HIR0.java as comment lines.
Nonterminals that cannot derive null are indispensable items.
Nonterminals that can derive null are optional items.
There may be some exceptional nonterminals that can derive null
but represent indispensable items. Such cases arise from 
keeping the BNF productions from becoming too verbose, but they
can be clearly identified by considering the semantics.



