6. Parallelization for HIR
6.1. Loop Parallelizer
The "coins.lparallel" package is the do-all type loop parallelizer.
It analyzes the program for parallelizable loops and generates either an
OpenMP program written in C or machine code (assembly language code) to be
executed in parallel. When an OpenMP program is generated, COINS does not
do the parallelization itself; instead, the driver calls an external OpenMP
compiler so that the program can be executed in parallel.
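As an illustration, a do-all loop is one whose iterations do not depend on each other. The following C sketch contrasts such a loop, annotated with the kind of OpenMP directive a loop parallelizer could emit, with a loop that is not do-all. The function names and the directive placement are assumptions made for this example; this is not actual COINS output.

```c
/* Hypothetical example, not actual COINS output.
 * Each iteration writes only c[i] and reads only a[i] and b[i],
 * so no iteration depends on another: this is a do-all loop. */
void vector_add(const double *a, const double *b, double *c, int n)
{
    int i;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Counter-example: iteration i reads the value written by iteration i-1
 * (a loop-carried dependence), so this loop is not a do-all loop and
 * is left sequential. */
void prefix_sum(double *a, int n)
{
    int i;
    for (i = 1; i < n; i++) {
        a[i] = a[i] + a[i - 1];
    }
}
```

A compiler without OpenMP support ignores the pragma, so the same source also runs sequentially.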
6.1.1. Usage
To generate an OpenMP program written in C, type
java coins.driver.Driver -coins:parallelDoAll=OpenMP foo.c
To generate assembly language code executable in parallel, type
java coins.driver.Driver -S -coins:parallelDoAll=n foo.c
where n is an integer indicating the maximum degree of parallelization.
The assembly language program can be executed in parallel by linking it
with the execution time routines for parallelization. These routines
must be provided for the target execution environment, following
the parallelizing framework described in a separate document.
A command such as
java coins.driver.Driver -S -coins:parallelDoAll=n,hir2c foo.c
will produce both an assembly language program and an OpenMP/C program
that can be executed in parallel by linking with the execution time
routines for parallelization.
The generated OpenMP/C file will have the suffix "-loop", as in foo-loop.c.
To output executable code using the OpenMP compiler named 'omcc', type
java coins.lparallel.LoopPara foo.c
This is just the above C output passed to an OpenMP compiler.
(There may be other configuration options, such as environment variables,
needed to execute the resultant code in parallel: see your OpenMP compiler
manual.)
The driver LoopPara also accepts the HIR optimization options supported by
the driver coins.flow.FlowOpt.
For example,
java coins.lparallel.LoopPara -coins:hirOpt=cpf,hir2c foo.c
may allow some code that would otherwise not be parallelizable to be
parallelized.
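To see how an HIR optimization could help, consider the following sketch. It assumes that 'cpf' stands for constant propagation and folding, and it shows by hand the kind of rewrite such a pass performs. The function names are hypothetical, and whether COINS parallelizes this particular loop is not confirmed here.

```c
/* Before optimization (hypothetical input): the subscript uses the
 * variable 'offset'. A dependence test that does not know its value
 * must assume a[i + offset] may be an element written by another
 * iteration, so the loop cannot be proven to be do-all. */
void scale_before(int *a, int n)
{
    int offset = 0;
    int i;
    for (i = 0; i < n; i++) {
        a[i] = a[i + offset] * 2;
    }
}

/* After constant propagation and folding (written by hand here):
 * offset is replaced by 0 and i + 0 is folded to i. Each iteration
 * now touches only a[i], so the loop is visibly do-all. */
void scale_after(int *a, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        a[i] = a[i] * 2;
    }
}
```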
6.2. Coarse Grain Parallelizing Module
6.2.1. OVERVIEW
6.2.1.1. Design Concept
The coarse-grain parallelizing module was built to realize CoCo, a
coarse-grain parallelizing compiler written in Java. CoCo is a research
product and is still at an early stage as a parallelizing compiler, so it
has many constraints on practical use, as described later. Implementing
CoCo as an automatic parallelizing compiler revealed many important issues
that must be solved before coarse-grain parallelizing compilers become
practical. The coarse-grain parallelizing module is part of the COINS
infrastructure, so its components are available as building blocks for
coarse-grain parallelization.
CoCo analyzes an input C program and transforms it into a macro
(coarse-grain) task graph with data and control flow dependences. CoCo
then parallelizes the macro tasks by using OpenMP directives for SMP
machines. The analysis and transformation are carried out on Coins HIR
(High-level Intermediate Representation). CoCo generates a parallel
program in HIR that contains OpenMP directives as comments. The HIR program
is translated into a C program with OpenMP directives by the HIR-to-C
translator. Finally, it is compiled by the Omni-OpenMP compiler and
executed in parallel on an SMP machine.
Macro tasks correspond to basic blocks, loops and/or subroutines. After an
input C program is divided into macro tasks, the execution starting
condition of each macro task is analyzed. The execution starting condition
states whether the macro task can be executed at a given time.
A runtime macro task scheduler evaluates the execution starting condition of
each macro task at execution time and dynamically assigns each executable
macro task to a lightly loaded processor of the SMP machine.
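The dynamic scheduling idea can be sketched as a small sequential simulation in C. This is not the scheduler that CoCo generates. It makes several assumptions for illustration: the starting condition is simplified to "all predecessor tasks are finished", each task has a known cost, processor load is the sum of the costs assigned so far, and all names (sched_task, schedule and so on) are hypothetical.

```c
#define NUM_TASKS 4   /* assumed number of macro tasks */
#define NUM_PROCS 2   /* assumed number of processors  */

struct sched_task {
    unsigned preds; /* bit j set: task j must be finished first */
    int cost;       /* assumed cost, used only to model load    */
    int done;       /* 1 once the task has been executed        */
    int proc;       /* processor chosen, -1 before assignment   */
};

/* Simplified execution starting condition: the task has not run yet
 * and every predecessor task is finished. */
int is_ready(const struct sched_task *tasks, int t)
{
    int j;
    if (tasks[t].done)
        return 0;
    for (j = 0; j < NUM_TASKS; j++)
        if (((tasks[t].preds >> j) & 1u) && !tasks[j].done)
            return 0;
    return 1;
}

/* Index of the processor with the smallest accumulated load. */
int lightest_proc(const int *load)
{
    int p, best = 0;
    for (p = 1; p < NUM_PROCS; p++)
        if (load[p] < load[best])
            best = p;
    return best;
}

/* Repeatedly evaluates the starting conditions and assigns every ready
 * task to the lightest processor. Returns the number of tasks run. */
int schedule(struct sched_task *tasks, int *load)
{
    int dispatched = 0, progress = 1, t;
    while (progress) {
        progress = 0;
        for (t = 0; t < NUM_TASKS; t++) {
            if (is_ready(tasks, t)) {
                int p = lightest_proc(load);
                tasks[t].proc = p;
                load[p] += tasks[t].cost;
                tasks[t].done = 1; /* the sketch "executes" at once */
                dispatched++;
                progress = 1;
            }
        }
    }
    return dispatched;
}
```

In a real scheduler the tasks run concurrently and finish at different times; the sketch only shows the order of decisions: evaluate the condition, pick a processor, dispatch.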
The coarse-grain parallelizing module is a tool set that provides the
following functions:
- Divides an input C program into macro tasks based on basic blocks,
- Analyzes the execution starting condition of each macro task,
- Embeds OpenMP directives for parallel execution at the macro task
level,
- Dynamically schedules each macro task onto a processor of an SMP
machine.
6.2.1.2. Data Structures
The coarse-grain parallelizing module uses a macro flow graph model.
Nodes of the graph correspond to macro tasks. There are two types of edges
between nodes: edges representing control flow and edges representing data
flow dependences.
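One way to picture such a graph is the following minimal C sketch. It is not the data structure used inside COINS (which is written in Java); the type and function names, the fixed capacity and the adjacency-matrix layout are all assumptions chosen to keep the example short.

```c
#define MAX_TASKS 16 /* assumed small fixed capacity for the sketch */

/* The two edge types of the macro flow graph. */
enum edge_kind { EDGE_CONTROL_FLOW = 0, EDGE_DATA_DEPENDENCE = 1 };

/* One adjacency matrix per edge type:
 * edge[kind][from][to] != 0 means an edge from 'from' to 'to'. */
struct macro_flow_graph {
    int num_tasks;
    unsigned char edge[2][MAX_TASKS][MAX_TASKS];
};

/* Creates a graph with num_tasks nodes (macro tasks) and no edges. */
void mfg_init(struct macro_flow_graph *g, int num_tasks)
{
    int k, i, j;
    g->num_tasks = num_tasks;
    for (k = 0; k < 2; k++)
        for (i = 0; i < MAX_TASKS; i++)
            for (j = 0; j < MAX_TASKS; j++)
                g->edge[k][i][j] = 0;
}

void mfg_add_edge(struct macro_flow_graph *g, enum edge_kind kind,
                  int from, int to)
{
    g->edge[kind][from][to] = 1;
}

int mfg_has_edge(const struct macro_flow_graph *g, enum edge_kind kind,
                 int from, int to)
{
    return g->edge[kind][from][to] != 0;
}

/* Number of macro tasks with an edge of the given kind into 'task'. */
int mfg_count_predecessors(const struct macro_flow_graph *g,
                           enum edge_kind kind, int task)
{
    int from, count = 0;
    for (from = 0; from < g->num_tasks; from++)
        if (g->edge[kind][from][task])
            count++;
    return count;
}
```

Keeping the two edge types in separate matrices makes it easy to ask control flow and data dependence questions independently.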
An execution starting condition is represented as a boolean expression.
The operators are 'logical AND' and 'logical OR'. The operand
conditions are as follows:
- Whether the macro tasks on which the task has data dependence have been
executed or have been decided not to be executed,
- Whether the control flow dependence to the macro task has been decided.
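A boolean expression of this kind can be sketched in C as a small expression tree. The representation below is an assumption for illustration, not the form CoCo generates: the type names are hypothetical, and the control flow operand is modeled as "the branch at a given task has selected a given successor".

```c
/* TS_SKIPPED models "decided not to be executed". */
enum task_state { TS_PENDING, TS_FINISHED, TS_SKIPPED };

enum cond_kind { COND_AND, COND_OR, COND_DATA_READY, COND_BRANCH_TAKEN };

struct cond {
    enum cond_kind kind;
    const struct cond *left;  /* operands of AND / OR                */
    const struct cond *right;
    int task;    /* DATA_READY: predecessor; BRANCH_TAKEN: brancher  */
    int target;  /* BRANCH_TAKEN: successor the branch must select   */
};

struct runtime_state {
    enum task_state state[8]; /* state of each macro task            */
    int branch_target[8];     /* successor chosen by task i's branch,
                                 -1 while still undecided            */
};

/* Returns 1 when the starting condition holds in the given state. */
int cond_eval(const struct cond *c, const struct runtime_state *rt)
{
    switch (c->kind) {
    case COND_AND:
        return cond_eval(c->left, rt) && cond_eval(c->right, rt);
    case COND_OR:
        return cond_eval(c->left, rt) || cond_eval(c->right, rt);
    case COND_DATA_READY:
        /* Satisfied once the predecessor was executed or skipped. */
        return rt->state[c->task] != TS_PENDING;
    case COND_BRANCH_TAKEN:
        return rt->branch_target[c->task] == c->target;
    }
    return 0;
}
```

For example, a task 3 that is reached when task 0 branches to it and that uses data produced by task 1 would have the condition BRANCH_TAKEN(0, 3) AND DATA_READY(1).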
6.2.1.3. Scheduler
The runtime macro task scheduler is attached separately from the main part
of an output program. If the input program is named 'xxx.c', the scheduler,
written in C, is placed in a file named 'xxx-sch.c' in the same
directory.
6.2.2. CONSTRAINTS OF CURRENT IMPLEMENTATION
The current version of the coarse-grain parallelizing module, CoCo, has the
following constraints:
- The coarse-grain parallelizing module parallelizes only the main
function. When an input program has several functions, the module
ignores the other functions.
- To execute a program in parallel efficiently, the module should
adjust task granularity, for example by 'loop unrolling'. So far,
the module does not do that.
- A loop in a program is translated into a single macro task. The
module recognizes as loops only those written with HIR reserved words
such as 'while' and 'for'. Other types of loop are not translated into
macro tasks.
- The module identifies a macro task as an exit macro task only if the
task has no successors or includes return statements. Other macro tasks,
for example those that call the 'exit()' function, are not recognized as
exit macro tasks.
- When some macro tasks have no dependence on each other, their
execution order may differ from the order of sequential execution.
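The loop constraint can be illustrated with two C functions that compute the same sum. The function names are hypothetical, and how exactly the module divides the second function into macro tasks is an assumption; the point is only that the first form uses a loop keyword and the second does not.

```c
/* Loop written with the 'for' keyword: under the constraint above,
 * a loop of this form becomes a single macro task. */
int sum_for(const int *a, int n)
{
    int i, s = 0;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same computation as a loop built from 'goto'. No 'while' or
 * 'for' appears, so under the constraint above this is not
 * recognized as a loop and is not turned into one loop macro task. */
int sum_goto(const int *a, int n)
{
    int i = 0, s = 0;
top:
    if (i >= n)
        goto done;
    s += a[i];
    i++;
    goto top;
done:
    return s;
}
```

Rewriting such loops with 'for' or 'while' before compiling is one way to stay within the constraint.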
6.2.3. HOW TO USE
CoCo inserts OpenMP directives into an HIR program as comments for
coarse-grain parallelization. CoCo uses the 'hir2c' module, a translator
from HIR to an OpenMP program written in C (OpenMP/C program),
because the back end of Coins does not support OpenMP directives yet.
After a coarse-grain parallel OpenMP/C program is
generated by hir2c, you must compile it with an OpenMP compiler in order to
execute it in parallel on an SMP machine.
To obtain a coarse-grain parallel program, proceed as follows:
- Compile 'xxx.c' with the Coins C compiler, specifying the option
'-coins:coarseGrainParallel' or
'-coins:cgParallel'.
java -classpath ./classes Driver -coins:cgParallel xxx.c
The generated OpenMP/C file will have the suffix "-cgparallel", as in
xxx-cgparallel.c for the source C file xxx.c.
- Compile the program together with the runtime scheduler using the
Omni-OpenMP compiler 'omcc'.
omcc xxx-cgparallel.c xxx-sch.c
6.2.4. OPTIONS
There are several compile time options for the coarse-grain parallelizing
module. For other options of the Coins Compiler Driver, see
"2. How to use the Compiler Driver" or "3. How to use C Compiler".
- -coins:trace=MDF.xxxx
Output trace information of this module for debugging. Specify the
trace level as follows:
2000 : Output general debug information of the module.
- -coins:stopafterhir2c
Quit compilation of each compile unit just after generating a C program by
'hir2c'.
- -coins:coarseGrainParallel
- -coins:cgParallel
Use this module.