This project started off as a school project where we where supposed to work in groups to build a compiler. I was fascinated by the idea of building a compiler so I decided to build one on my own without external tools to help parse code. Below I will be detailing how the compiler converts a subset of C code to x86 assembly:
Parsing:
The compiler starts off by parsing the code into tokens, and while parsing it builds a type list, function list, and a IR graph (intermediate representation). The IR graph has nodes that represent variable definition, assignment, as well as all the operations supported by the compiler such as +, -, etc.
Intermediate Representation:
The IR graph handles if statements by storing a node for the if statement itself, a graph node that points to a subgraph to execute if the if statement is taken, as well a node to the first line of code outside of the if statement. While loops are stored in a similar fashion. This method of storage is useful for the compiler backend because the backend will be able to know where to insert jump statements in the code and to which lines the jumps have to jump to. It also is useful because of the compilers feature for compile time code execution.
Compile Time Code Execution:
Compile time code execution was inspired by Jonathan blow's language where you can put a #run statement next to any statement of code, and that statement will be executed at compile time of theprogram. This avoids the need for things like template meta programming when we want to pre-compute constants for our code, and it can be extended to build look up tables and do static code analysis. In Nutella, the #run directive works the same way and it supports precomputation of constants in the code as well as function calls.
x86 Backend:
After the IR graph is built, the program executes any #run statements through an interperter that understands how to execute the IR graph. Once the constants are placed into the code, we invoke the backend to convert the IR to x86 assembly.
The backend steps through the graph and calculates the stack offsets of each local variable in the program. Afterwards, it steps through the graph again and it converts every node to the x86 assembly instructions that represent it. The backend tries to keep as few registers in use at any time as possible, so for every instruction, it loads the value from the stack that is needed into registers, it operates on the values, and then it pushes the result back to the stack.
Working on this project was one of the coolest thing's I've ever coded, it opened up my eyes to how code optimization occurs, the kind of information interpreters have to process, and it also taught me a lot about parsing strings. Building the compiler from scratch was also useful because I got to see how the compiler has to handle various error cases and outputting the correct error message. If you'd like to check the code for this project, it can be found at https://github.com/Ihaa21/nutella_compiler.
No comments:
Post a Comment