dancefish's Blog

This post is part of a series about GCC internals and specifically about howto create a new language front-end for GCC. For a list of related posts, please check this page.

If I would ask many people what the executable gcc is doing, most of the people would answer, “Well, it’s a compiler, though … it’s compiling the source file into a target file”. But that is NOT correct. The executable gcc is not a compiler although the abbreviation means GNU C compiler. gcc represents what is generally called a compiler driver or more generic a compilation driver, in the following just called driver. If you now think, this guy is completely insane, please add the -v option to one of your gcc commands and and check what gcc is really doing.

While a compiler really is only responsible for transforming the source file into (possibly optimized) target machine code, the driver is the high-level organizer in the overall compilation process creating a object/shared/executable file from one or more source files. Thereby, the driver divides the compilation process into several phases which greatly depend on the capabilities of the used programs. gcc in newer versions (for example 4.4.0 and newer) uses the following phases assuming that an executable is created from a single source file (gcc source.c -o exec):

1. compiler (cc1)
2. assembler (as; from GNU binutils)
3. collect2 (collect2; part of GCC)
3.1. linker (ld; from GNU binutils)

Just a note: in earlier versions of gcc, the driver added a separate pre-processing phase before the compiler phase, but in the recent versions the pre-processor is omitted, because the compiler (cc1) incorporates the pre-processor, at least when using the default options. By using the -no-integrated-cpp option, you instruct the driver to split the compiler phase into 2 distinct phases: pre-processing and compiler.

Each of the phases above produces intermediate or temporary files (assuming that the -pipe option is not specified) as output files which then serve as input files for the subsequent phase, in detail:

pre-processor (if separate phase)
- input: *.c
- output: *.i
compiler
- input: *.c & *.i files
- output: *.s
assembler
- input: *.s
- output: *.o
collect2/linker
- input: *.o
- output: self-defined pattern, e.g. *.exe

Based on the file extension, the driver knows which phase to start with when processing the file. Though, if a *.s file is given as input file along with a *.o and *.c file to produce an executable in a single gcc command, gcc would run the *.s file through the assembler phase, the *.c file through the compiler and assembler phase and the resulting three *.o files finally through the collect2/linker phase. By the way, if an input file extension matches none of the defined extensions, the file is taken is collect2/linker input!

The default behavior of gcc is to try to produce an executable file from the input files. But, gcc could be instructed to stop after any of the above phases by using specific driver options:

-E : stop after pre-processing, produce a *.i file
-S : stop after compiler, produce a *.s file
-c : stop after assembler, produce a *.o file
none : stop after collect2/linker

Hint: If you want to run through all the phases with a single gcc command but neverthess keep the intermediate files, use the -save-temps option.

You may ask, why all this is important for a new language front-end in gcc? Right! Because your new language might require different or additional phases than the described ones and then you should know where to start with to bring your new language front-end driver to execute your specific phases. For this purpose, GCC (capital letters are used to distinguish the complete compiler project from the C-specific driver) is designed in a modular way to allow the front-ends to “register” new phases, but the addition/modification of the phases will be discussed in a later post. However, for the already interested reader, please take a look at the file gcc-x.y.z/gcc/gcc.c and one of the the files of an already existing front-end, for example of C++ ingcc-x.y.z/gcc/cp/lang-specs.h.

Tags:
Posted in 书摘 | 10463 Comments

gcc scalar evolution

2010年8月31日 01:02

hi all，下面这段文字会让你很快乐，至少我是这样。

This pass analyzes the evolution of scalar variables in loop

structures. The algorithm is based on the SSA representation,

and on the loop hierarchy tree. This algorithm is not based on

the notion of versions of a variable, as it was the case for the

previous implementations of the scalar evolution algorithm, but

it assumes that each defined name is unique.

The notation used in this file is called "chains of recurrences",

and has been proposed by Eugene Zima, Robert Van Engelen, and

others for describing induction variables in programs. For example

"b -> {0, +, 2}_1" means that the scalar variable "b" is equal to 0

when entering in the loop_1 and has a step 2 in this loop, in other

words "for (b = 0; b < N; b+=2);". Note that the coefficients of

this chain of recurrence (or chrec [shrek]) can contain the name of

other variables, in which case they are called parametric chrecs.

For example, "b -> {a, +, 2}_1" means that the initial value of "b"

is the value of "a". In most of the cases these parametric chrecs

are fully instantiated before their use because symbolic names can

hide some difficult cases such as self-references described later

(see the Fibonacci example).

A short sketch of the algorithm is:

Given a scalar variable to be analyzed, follow the SSA edge to

its definition:

- When the definition is a GIMPLE_ASSIGN: if the right hand side

(RHS) of the definition cannot be statically analyzed, the answer

of the analyzer is: "don't know".

Otherwise, for all the variables that are not yet analyzed in the

RHS, try to determine their evolution, and finally try to

evaluate the operation of the RHS that gives the evolution

function of the analyzed variable.

- When the definition is a condition-phi-node: determine the

evolution function for all the branches of the phi node, and

finally merge these evolutions (see chrec_merge).

- When the definition is a loop-phi-node: determine its initial

condition, that is the SSA edge defined in an outer loop, and

keep it symbolic. Then determine the SSA edges that are defined

in the body of the loop. Follow the inner edges until ending on

another loop-phi-node of the same analyzed loop. If the reached

loop-phi-node is not the starting loop-phi-node, then we keep

this definition under a symbolic form. If the reached

loop-phi-node is the same as the starting one, then we compute a

symbolic stride on the return path. The result is then the

symbolic chrec {initial_condition, +, symbolic_stride}_loop.

Examples:

Example 1: Illustration of the basic algorithm.

| a = 3

| loop_1

| b = phi (a, c)

| c = b + 1

| if (c > 10) exit_loop

| endloop

Suppose that we want to know the number of iterations of the

loop_1. The exit_loop is controlled by a COND_EXPR (c > 10). We

ask the scalar evolution analyzer two questions: what's the

scalar evolution (scev) of "c", and what's the scev of "10". For

"10" the answer is "10" since it is a scalar constant. For the

scalar variable "c", it follows the SSA edge to its definition,

"c = b + 1", and then asks again what's the scev of "b".

Following the SSA edge, we end on a loop-phi-node "b = phi (a,

c)", where the initial condition is "a", and the inner loop edge

is "c". The initial condition is kept under a symbolic form (it

may be the case that the copy constant propagation has done its

work and we end with the constant "3" as one of the edges of the

loop-phi-node). The update edge is followed to the end of the

loop, and until reaching again the starting loop-phi-node: b -> c

-> b. At this point we have drawn a path from "b" to "b" from

which we compute the stride in the loop: in this example it is

"+1". The resulting scev for "b" is "b -> {a, +, 1}_1". Now

that the scev for "b" is known, it is possible to compute the

scev for "c", that is "c -> {a + 1, +, 1}_1". In order to

determine the number of iterations in the loop_1, we have to

instantiate_parameters (loop_1, {a + 1, +, 1}_1), that gives after some

more analysis the scev {4, +, 1}_1, or in other words, this is

the function "f (x) = x + 4", where x is the iteration count of

the loop_1. Now we have to solve the inequality "x + 4 > 10",

and take the smallest iteration number for which the loop is

exited: x = 7. This loop runs from x = 0 to x = 7, and in total

there are 8 iterations. In terms of loop normalization, we have

created a variable that is implicitly defined, "x" or just "_1",

and all the other analyzed scalars of the loop are defined in

function of this variable:

a -> 3

b -> {3, +, 1}_1

c -> {4, +, 1}_1

or in terms of a C program:

| a = 3

| for (x = 0; x <= 7; x++)

| {

| b = x + 3

| c = x + 4

| }

Example 2a: Illustration of the algorithm on nested loops.

| loop_1

| a = phi (1, b)

| c = a + 2

| loop_2 10 times

| b = phi (c, d)

| d = b + 3

| endloop

For analyzing the scalar evolution of "a", the algorithm follows

the SSA edge into the loop's body: "a -> b". "b" is an inner

loop-phi-node, and its analysis as in Example 1, gives:

b -> {c, +, 3}_2

d -> {c + 3, +, 3}_2

Following the SSA edge for the initial condition, we end on "c = a

+ 2", and then on the starting loop-phi-node "a". From this point,

the loop stride is computed: back on "c = a + 2" we get a "+2" in

the loop_1, then on the loop-phi-node "b" we compute the overall

effect of the inner loop that is "b = c + 30", and we get a "+30"

in the loop_1. That means that the overall stride in loop_1 is

equal to "+32", and the result is:

a -> {1, +, 32}_1

c -> {3, +, 32}_1

Example 2b: Multivariate chains of recurrences.

| loop_1

| k = phi (0, k + 1)

| loop_2 4 times

| j = phi (0, j + 1)

| loop_3 4 times

| i = phi (0, i + 1)

| A[j + k] = ...

| endloop

Analyzing the access function of array A with

instantiate_parameters (loop_1, "j + k"), we obtain the

instantiation and the analysis of the scalar variables "j" and "k"

in loop_1. This leads to the scalar evolution {4, +, 1}_1: the end

value of loop_2 for "j" is 4, and the evolution of "k" in loop_1 is

{0, +, 1}_1. To obtain the evolution function in loop_3 and

instantiate the scalar variables up to loop_1, one has to use:

instantiate_scev (block_before_loop (loop_1), loop_3, "j + k").

The result of this call is {{0, +, 1}_1, +, 1}_2.

Example 3: Higher degree polynomials.

| loop_1

| a = phi (2, b)

| c = phi (5, d)

| b = a + 1

| d = c + a

| endloop

a -> {2, +, 1}_1

b -> {3, +, 1}_1

c -> {5, +, a}_1

d -> {5 + a, +, a}_1

instantiate_parameters (loop_1, {5, +, a}_1) -> {5, +, 2, +, 1}_1

instantiate_parameters (loop_1, {5 + a, +, a}_1) -> {7, +, 3, +, 1}_1

Example 4: Lucas, Fibonacci, or mixers in general.

| loop_1

| a = phi (1, b)

| c = phi (3, d)

| b = c

| d = c + a

| endloop

a -> (1, c)_1

c -> {3, +, a}_1

The syntax "(1, c)_1" stands for a PEELED_CHREC that has the

following semantics: during the first iteration of the loop_1, the

variable contains the value 1, and then it contains the value "c".

Note that this syntax is close to the syntax of the loop-phi-node:

"a -> (1, c)_1" vs. "a = phi (1, c)".

The symbolic chrec representation contains all the semantics of the

original code. What is more difficult is to use this information.

Example 5: Flip-flops, or exchangers.

| loop_1

| a = phi (1, b)

| c = phi (3, d)

| b = c

| d = a

| endloop

a -> (1, c)_1

c -> (3, a)_1

Based on these symbolic chrecs, it is possible to refine this

information into the more precise PERIODIC_CHRECs:

a -> |1, 3|_1

c -> |3, 1|_1

This transformation is not yet implemented.

一个问题的BF答案

2010年8月28日 00:40

很禽兽的一个问题：怎么样证明一个处理器已经是最快的运行一个程序了。

一个很禽兽的答案：用汇编写，确定最好的流水线调度和最好的局部性。

这个问题的一些延伸：

IPC能不能用来说明问题，可以，但是要是cpu-intense程序，或者是io-intensive的cpu-intensive部分。

如果从程序运行上看不出什么效果呢？

那只能说cpu太烂了，或者程序太超前了，反正是瘦驴拉大磨。

这又提出一个新的问题，编译器对超前的程序总是无可奈何？

这个问题就很大了，要调整程序结构和实现，使之可以在慢的cpu上运行的更快？

还是调整编译器，使之把程序像更慢的cpu上优化？

其实我真正想知道的是：怎么样才能知道或者说诊断，一个程序在运行过程中的瓶颈出在什么地方。

或者说，这个程序有没有很好的利用cpu呢。

这需要benchmark数据来刻画，但同时也需要一个interpreter来给程序员更明显的指示。

进一步的问题是，benchmark刻画出来的数据，是cpu的运行时状态，也是程序运行时的状态。

但是这里所谓的程序，是程序员的杰作，也是编译器的，因此这些数据刻画出的瓶颈，是程序员的也是编译器的。

这就是最大的问题所在了，怎么样刻画这个瓶颈，可以让这个职责更明显呢。

现在，直白的方法就是知道编译器对这段热点瓶颈做了什么，你对她的hotspot做了什么？

但是这之间仍然饱含巨量的人工分析和经验判断，对于大规模工业化过程之能说是，

杯水车薪。

当然问题没有这么悲观，一个很乐观的规律是，任何自动化方法都是对人工分析和经验判断的抽象。

所以说，万事开头，还是要从那个人工分析开始的。

就是要先想办法把分析工具建立起来，这个基本已经有80%的完工了。

但是很大一部分工作在于，使用该工具进行实际的性能分析，这个看上去就没那么容易了，估计一两周的工作量都算是快的。

但是由此产生出一个YY高手的想法，那就是根据前面的问题，我们可以提出一种混合式的的编译器。

该编译器接近调试器，可以实时编译，有利于程序员一边优化一边评判最终性能。

这个想法听上去很想profile driven optimizing，只不过是programmer driven。

思路到这里转了一个圈。

既然是要根据profile的结果来分析程序和优化选项进而改进，那为什么不能直接把

profile的结果反馈给编译器，形成一个异步的，离线的反馈环节。

这不就是profile driven optimizing嘛。

但是这个问题的之能解决一半的问题，那就是compiler端的问题，而实际上，根据profile的结果，来给程序员提出合理的建议，

也是另外一个反馈环节，而这个环节同compiler的环节一样，都比较难达到很好的自动化。

进一步的思考会发现，其实对于cpu和profile数据而言，没有什么完美不完美，倒是IPC可以说明一些问题。

更多的，我们希望，通过其他各项数据得到一个完整的刻划，这个本身就很难。

Tags:
Posted in 书摘 | 1 Comments

设计模式随感

2010年8月27日 18:33

发现很多模式都在os里有用，看来用cpp快速开发个os也蛮诱人的 .

design patterns in large scale software design : os , browser, 3d designer , other big hierarchies

如果你说你懂得c++，你可以用类，继承，多态，写出比c更高效，更易扩展的代码。

那么我想你写出来的还很可能是c代码，只不过是使用了类，集成和多态的c代码。

如果你说你懂得dp，那么你可以超越c的思维框架了，甚至是c++的框架，而设计出

第一自动化很高。

第二模块化很强。

第三很容易扩展。

的代码。

当然这些设计在实现的过程中，也会考虑到效率的问题，但是一方面效率可以交给编译器去处理，

这样来说，越规范的设计，编译器越容易优化；另一方面在这个级别，能作的优化确实不多。

设计模式的精髓，就是把代码自动化融进代码本身，读创建模式有感。

设计模式的目标是把c的设计方法从c++中剔除掉。

Tags:
Posted in 书摘 | 2 Comments

the door way

Android

回归测试，是个啥？

我想要的软件

roadmap

MacJournal at hand

gcc phases

gcc scalar evolution

一个问题的BF答案

设计模式随感

dancefish

分类

最新评论

最新留言

链接

功能