The Static Linker

What steps convert a program from source code to an executable file?

  • Pre-compile
    cpp hello.c > hello.i
    gcc -E hello.c -o hello.i
    • remove all #define directives and expand all macros
    • handle all conditional compilation directives: #if, #ifdef, #elif, #else, #endif
    • handle #include: insert the contents of the included file in place of the directive. This is recursive, because an included file may include other files.
    • remove all comments from the code
    • add line numbers and file name markers. These let the compiler produce the line numbers used for debugging and show the line number when it reports an error or warning.
    • keep all #pragma directives, since the compiler uses them.
  • Compile
    The compiler produces assembly code from the pre-compiled file by running it through the Scanner, Grammar Parser, Semantic Parser, and Optimizer.
    gcc -S hello.i -o hello.s

    or directly use:

    gcc -S hello.c -o hello.s
  • Assemble
    The assembler translates assembly language into machine code, using a reference table that maps each assembly instruction to its machine encoding.
    You can use the following command to produce an object file that contains machine code:
    gcc -c hello.s -o hello.o
  • Link
    The linker joins all object files to produce an executable file.
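
The pre-compile rules listed above can be observed on a small input. The file below is a minimal sketch for illustration only; the macro names LIMIT, SQUARE, MODE, and VERBOSE_BUILD and the two functions are hypothetical and do not come from any real project.

```c
/* A hypothetical input file for the pre-compile step. */
#define LIMIT 4                   /* object-like macro */
#define SQUARE(x) ((x) * (x))     /* function-like macro */

/* Conditional compilation: only one branch survives pre-compiling. */
#ifdef VERBOSE_BUILD
#define MODE "verbose"
#else
#define MODE "quiet"
#endif

/* After macro expansion the body reads: return ((4) * (4)); */
int limit_squared(void)
{
    return SQUARE(LIMIT);
}

/* Returns "quiet" unless VERBOSE_BUILD was defined at compile time. */
const char *build_mode(void)
{
    return MODE;
}
```

Running gcc -E on this file should show that the #define lines and the comments are gone, that SQUARE(LIMIT) has become ((4) * (4)), and that only the "quiet" branch remains. Because the file has no main function, it can go through the pre-compile, compile, and assemble steps on its own, but it needs another object file before it can be linked into an executable.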

How does the compiler work?

Compiler Work Process

  • The front end of the compiler
    • Scanner
      The scanner reads the source code with a finite state machine and splits it into tokens.
      These tokens are classified as:
      • Keyword
      • Identifier
      • Literal (numbers, strings)
      • Special symbol

      As tokens are recognized, identifiers are stored in a symbol table, and numbers and strings in a literal table.
      A program called lex implements this function. It splits the input string into tokens according to rules written by the user. So a scanner does not have to be developed from scratch for each compiler; programmers only change the rules to suit their requirements.
      Note that for the C programming language, macro expansion and file inclusion happen in the pre-compile step, not in the scanner.

    • Grammar Parser
      The grammar parser builds a syntax tree by analyzing the tokens produced by the scanner, using a context-free grammar. A syntax tree is a tree whose nodes are expressions. Symbols and numbers are the smallest expressions: they are not made up of other expressions, so they are the leaves of the tree. During grammar analysis, the precedence and the meaning of each operator are also determined.
      If an expression is malformed, the compiler reports an error at this stage.
      The tool called yacc generates grammar parsers from a set of rules. It is also called a compiler compiler.
    • Semantic Parser
      The semantic parser works out the meaning of statements and checks that it is valid. The grammar parser only checks the form of an expression, not whether it makes sense.
      Semantics has two aspects:

      • Static
        Semantics that can be determined at compile time.
        It includes matching identifiers against their declarations and checking types.
      • Dynamic
        Semantics that can only be determined at run time, for example a division by zero.

      After semantic analysis, every node in the syntax tree is marked with a type. If an implicit type conversion is needed, the semantic parser inserts a conversion node into the syntax tree.

    • Build the Intermediate Language
      The Source Code Optimizer optimizes the program at the source level and converts the syntax tree to Intermediate Code.
      The intermediate code is close to machine code, but it does not depend on any platform. For example, it does not contain data sizes, variable addresses, or register names.
      The Intermediate Code serves as an internal language. It is portable across platforms and lets the compiler be split into two parts:
      The front end: shared by all platforms, it produces the intermediate code.
      The back end: it has a separate implementation for each platform.
  • The back end of the compiler
    • Generate the machine code
      The Code Generator translates the intermediate code into machine code. This process depends on the hardware of the target machine, because different machines have different word lengths, registers, and data sizes.
    • Target Code Optimizer
      Finally, the Target Code Optimizer optimizes the machine code generated by the Code Generator.
      The resulting object code, which is machine code, is then linked into an executable file.
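
The scanner step described in the front end can be sketched in a few lines of C. This is a simplified illustration, not how gcc or lex is implemented: the names Token, TokenKind, and scan_tokens, the short keyword list, and the 32-byte token limit are all assumptions made for the example. It only recognizes identifiers, keywords, unsigned integers, and single-character symbols.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The four token classes named in the scanner description. */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_LITERAL, TOK_SYMBOL } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* token text, truncated if longer (assumption) */
} Token;

/* Hypothetical, deliberately short keyword list. */
static int is_keyword(const char *s)
{
    static const char *const keywords[] = { "int", "return", "if", "else", "while" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(s, keywords[i]) == 0)
            return 1;
    }
    return 0;
}

/* Splits src into at most max tokens and returns how many were stored.
   The first character of a token selects the state; the inner loops
   consume characters until that state no longer accepts them. */
size_t scan_tokens(const char *src, Token *out, size_t max)
{
    size_t n = 0;
    const char *p = src;

    while (*p != '\0' && n < max) {
        if (isspace((unsigned char)*p)) {   /* skip whitespace between tokens */
            p++;
            continue;
        }

        const char *start = p;
        TokenKind kind;

        if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            kind = TOK_IDENTIFIER;
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p))
                p++;
            kind = TOK_LITERAL;
        } else {
            p++;                            /* any other character is one symbol */
            kind = TOK_SYMBOL;
        }

        size_t len = (size_t)(p - start);
        if (len >= sizeof out[n].text)
            len = sizeof out[n].text - 1;
        memcpy(out[n].text, start, len);
        out[n].text[len] = '\0';

        /* A word that matches the keyword list is reclassified. */
        if (kind == TOK_IDENTIFIER && is_keyword(out[n].text))
            kind = TOK_KEYWORD;

        out[n].kind = kind;
        n++;
    }
    return n;
}
```

Called on the string "int x = y + 42;", this sketch stores seven tokens: the keyword int, the identifiers x and y, the literal 42, and the symbols =, +, and ;. A tool such as lex generates a scanner of this general shape from user-written rules, so the loops above would not be written by hand.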

How does the linker work?

  • What work does the linker do?
    • recomputing the addresses of functions and variables when the code changes
    • replacing the symbols used in assembly language with real addresses
      A symbol in assembly language can stand for the address of a variable or a function.
    • joining all modules to generate an executable file
      An object file that contains machine code is called a module. Joining modules is a problem of communication between them.
      Modules communicate with each other through symbols, and the process of joining them is called linking.
  • Static Linker
    The job of the linker is to handle all references between modules so that the modules fit together correctly.
    The linking process consists of:

    • Address and Storage Allocation
    • Symbol Resolution
    • Relocation

    The linker replaces each address used as a placeholder for a symbol with the real address. This process is called Relocation. Each place that has to be patched in this way is called a Relocation Entry.
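
Relocation can be illustrated with a toy model in C. This is a sketch under strong assumptions, not how a real linker such as ld works: the types Symbol and RelocationEntry and the function apply_relocations are hypothetical, every address is a 32-bit absolute value stored in host byte order, and real object formats define several relocation types (for example PC-relative ones) that are ignored here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A resolved symbol: its name and the real address chosen by the linker. */
typedef struct {
    const char *name;
    uint32_t address;
} Symbol;

/* A relocation entry: where the placeholder sits and which symbol it stands for. */
typedef struct {
    size_t offset;        /* byte offset of the placeholder inside the code */
    const char *symbol;   /* name of the symbol whose address belongs there */
} RelocationEntry;

static const Symbol *find_symbol(const Symbol *table, size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    }
    return NULL;
}

/* Overwrites every placeholder with the real address of its symbol.
   Returns the number of patched entries, or -1 if a symbol is undefined
   or an entry points outside the code buffer. */
int apply_relocations(uint8_t *code, size_t code_size,
                      const RelocationEntry *relocs, size_t reloc_count,
                      const Symbol *symbols, size_t symbol_count)
{
    int patched = 0;

    for (size_t i = 0; i < reloc_count; i++) {
        const Symbol *sym = find_symbol(symbols, symbol_count, relocs[i].symbol);
        if (sym == NULL)
            return -1;    /* comparable to an "undefined reference" link error */

        if (code_size < sizeof sym->address ||
            relocs[i].offset > code_size - sizeof sym->address)
            return -1;

        memcpy(code + relocs[i].offset, &sym->address, sizeof sym->address);
        patched++;
    }
    return patched;
}
```

In this model, symbol resolution is the lookup in find_symbol and relocation is the memcpy that overwrites the placeholder bytes. Address and storage allocation, the first step in the list above, is assumed to have already happened: it is what would fill in the address field of each Symbol.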