Grades and deadlines
-
Initial deadline: 23:59 on Wednesday 05 November
-
How to submit: Turn in your completed lab3.2.yml file, along with everything in the src/main folder of your maven project. You must submit via the command line in order to preserve the folder structure. Use either of these two commands to do it:
club -csi413 -plab3.2 -f lab3.2.yml src/main
or
submit -c=si413 -p=lab3.2 lab3.2.yml src/main
(you can download the club tool here)
If you used an AI tool to help with this lab (you should use Gemini), turn in a file aichat.md as well. Remember the course guidelines for the use of generative AI on labs.
-
Grading:
In this lab you will complete the following tasks:
- Starting with the code for your working interpreter from the previous lab, write a compiler that reads in source code in your chosen language and produces an equivalent program in LLVM IR.
- Thoroughly test your compiler by running the produced .ll programs using lli or clang.
If your submission meets the requirements for each task, you will receive 3 points towards your total lab grade.
-
Collaboration
Because you are all working from the same AST nodes, everyone’s task is more or less the same for this lab.
Because of that, you may not share your Java code, or look at anyone else’s Java code, for this lab.
However, you should feel free to discuss the lab with classmates (without looking at Java code), diagram out your solution, etc., as long as that collaboration is clearly documented in your YAML file.
-
Resubmissions:
We will follow the same resubmission policy for all labs this semester:
    def points_earned(deadline, max_points, previous_submission=None):
        while current_time() < deadline:
            wait()
        submission = get_from_submit_system()
        if meets_all_requirements(submission):
            return max_points
        elif significant_progress(previous_submission, submission):
            return points_earned(deadline + one_week, max_points, submission)
        else:
            return 0
The big idea
In the previous lab, you were given working code for a complete AST, and
were asked to write the code which would generate the AST from a parse
tree in your language. The interpreter just did scanning, parsing, AST
generation (semantic analysis), and then called exec() on the root AST
node.
In this lab, almost all of that will stay the same: scanning, parsing, and AST generation.
But what will be completely different is what happens with the AST
itself. The Stmt and Expr interfaces both have a new method
compile() which needs to be implemented in every AST node. The job of
each compile method is to output LLVM IR code corresponding to
whatever that AST node does, and to recursively compile any child nodes.
A lot of the AST compile() methods have already been filled in for
you, because they are doing the same things as you already got working
in the previous two compiler labs.
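To make the recursive structure concrete, here is a minimal sketch of the idea. It is not the starter code: the names MiniExpr, Num, Add, and Emitter are hypothetical, and the i64 value type is an assumption. The real Expr interface and Compiler class may look quite different. Each compile() call first compiles its children, then emits one instruction, and returns the name of the register (or literal) that holds its result.

```java
import java.util.ArrayList;
import java.util.List;

public class CompileSketch {
    /** Hypothetical helper: collects IR lines and hands out fresh register names. */
    static class Emitter {
        final List<String> lines = new ArrayList<>();
        private int next = 0;
        String fresh() { return "%r" + (next++); }
        void emit(String instr) { lines.add("  " + instr); }
    }

    /** Hypothetical stand-in for the lab's Expr interface. */
    interface MiniExpr {
        /** Emits IR for this node; returns the register or literal holding its value. */
        String compile(Emitter out);
    }

    static class Num implements MiniExpr {
        final long value;
        Num(long value) { this.value = value; }
        public String compile(Emitter out) {
            return Long.toString(value); // a literal needs no instructions
        }
    }

    static class Add implements MiniExpr {
        final MiniExpr lhs, rhs;
        Add(MiniExpr lhs, MiniExpr rhs) { this.lhs = lhs; this.rhs = rhs; }
        public String compile(Emitter out) {
            String l = lhs.compile(out); // recursively compile the children first
            String r = rhs.compile(out);
            String res = out.fresh();
            out.emit(res + " = add i64 " + l + ", " + r); // i64 is an assumption
            return res;
        }
    }

    /** Compiles 1 + (2 + 3) and returns the emitted IR lines. */
    static List<String> demo() {
        Emitter out = new Emitter();
        new Add(new Num(1), new Add(new Num(2), new Num(3))).compile(out);
        return out.lines;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Notice that the inner addition is emitted before the outer one, simply because the outer node compiles its children before emitting its own instruction.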
Your focus in this lab will be on three tasks:
-
Understanding how your scanner, parser, and AST generator from the last lab fit into the AST-based compiler in the starter code that you are given, and how it all works together.
-
Implementing variable assignment and reference, using memory in LLVM. Remember, as we learned in class, that once our code has branching, we can no longer just store everything in registers. The compiler will need to allocate memory for each variable the first time it is assigned.
-
Implementing if statements and while loops, using branching instructions in LLVM.
There is a lot less code to write in this lab compared to some previous labs, but the code is getting more intricate and tricky. So you might have to think and plan more than before. Fortunately, this is the most fun part of programming and the reason computer science is a thing!
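The variable-storage task described above might be sketched like this. Everything here is an assumption for illustration: the class VarSketch, its methods, the i64 value type, and the %name.addr naming scheme are all hypothetical, and the ptr type assumes a recent LLVM with opaque pointers (older versions would write i64* instead). The point is only the bookkeeping: a table from variable names to stack slots, an alloca the first time a name is assigned, and store/load for every later use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VarSketch {
    final List<String> lines = new ArrayList<>();
    // Maps each source variable name to the register holding its stack address.
    private final Map<String, String> addrs = new HashMap<>();
    private int next = 0;

    private String fresh() { return "%t" + (next++); }

    /** Emits IR for "name = value", allocating a slot the first time name is seen. */
    void assign(String name, String value) {
        String ptr = addrs.get(name);
        if (ptr == null) {
            ptr = "%" + name + ".addr";           // hypothetical naming scheme
            lines.add(ptr + " = alloca i64");     // i64 is an assumption
            addrs.put(name, ptr);
        }
        lines.add("store i64 " + value + ", ptr " + ptr);
    }

    /** Emits a load for a variable reference and returns the result register. */
    String reference(String name) {
        String ptr = addrs.get(name);
        if (ptr == null) {
            throw new IllegalStateException("unassigned variable: " + name);
        }
        String res = fresh();
        lines.add(res + " = load i64, ptr " + ptr);
        return res;
    }

    /** Emits IR for: x = 5; x = 7; (use x). */
    static List<String> demo() {
        VarSketch c = new VarSketch();
        c.assign("x", "5");
        c.assign("x", "7");
        c.reference("x");
        return c.lines;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

This sketch ignores where the alloca ends up in the output. If a variable's first assignment sits inside an if branch or a loop body, you will want to think about that: a common approach is to place every alloca in the function's entry block so that the slot is defined before any code that uses it.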
Getting started: files
Start by downloading the starter files. Running this command will create
a new directory lab3.2:
git clone https://github.com/si413usna/startlab.git -b lab3.2 lab3.2
As usual, you should see an empty lab3.2.yml file for you to
fill in, plus a pom.xml file for maven, and a src/ directory with
all the starter code.
Remember that we said the scanning, parsing, and AST generation will work exactly the same as in the previous lab?
Well, that means copying these three files from the previous lab into the new lab:
- src/main/resources/si413/tokenSpec.txt
- src/main/antlr4/si413/ParseRules.g4
- src/main/java/si413/ASTGen.java
After getting all the files set up, try running your compiler on a
simple program (with no variables, if statements, or loops) and
testing the resulting .ll code using lli or clang. It should work!
Remember, if you have some example program in example.prog, you can
test your compiler like this on the command line:
./run.sh example.prog example.ll
lli example.ll
Task 1: Write your compiler
Most of your work will be in Compiler.java, Expr.java, and
Stmt.java. Concretely, your goal is to fill in the missing
compile() methods. Each missing method in the starter code has a
// TODO comment to help you spot them.
You basically need to implement variables and control structures, and that’s it! Remember what we have been doing recently in class: allocating memory (for variable storage), and implementing branching (for if statements and while loops).
Now you will need to work at one more level of abstraction, writing Java code that produces LLVM code which deals with memory allocation and branching.
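As one illustration of that extra level of abstraction, here is a rough sketch of Java code that writes out the block structure of a while loop. The class BranchSketch, the method emitWhile, and the label names are hypothetical and not part of the starter code. The condition and body IR are passed in as ready-made lines here, whereas your compiler would produce them by calling compile() on child AST nodes. The condition register is assumed to hold an i1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BranchSketch {
    final List<String> lines = new ArrayList<>();
    private int nextLabel = 0;

    // Appends a counter so that nested or repeated loops get distinct labels.
    private String freshLabel(String hint) { return hint + (nextLabel++); }

    /** Emits the skeleton of "while (cond) body"; condReg must name an i1 value. */
    void emitWhile(List<String> condIR, String condReg, List<String> bodyIR) {
        String head = freshLabel("while.cond");
        String body = freshLabel("while.body");
        String end = freshLabel("while.end");

        lines.add("  br label %" + head);   // the current block needs a terminator
        lines.add(head + ":");
        lines.addAll(condIR);               // re-evaluated on every iteration
        lines.add("  br i1 " + condReg + ", label %" + body + ", label %" + end);
        lines.add(body + ":");
        lines.addAll(bodyIR);
        lines.add("  br label %" + head);   // jump back to re-test the condition
        lines.add(end + ":");               // code after the loop continues here
    }

    /** Emits a loop testing x < 10, with a placeholder comment for the body. */
    static List<String> demo() {
        BranchSketch c = new BranchSketch();
        c.emitWhile(
            Arrays.asList("  %v = load i64, ptr %x.addr",
                          "  %c = icmp slt i64 %v, 10"),
            "%c",
            Arrays.asList("  ; loop body goes here"));
        return c.lines;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

An if statement follows the same pattern without the backward branch: one conditional br into the "then" block, and an unconditional br from the end of that block to a shared end label.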
This is all the guidance we are going to give here! Look through the
code that you are given in Compiler.java, Expr.java, and
Stmt.java to help get you started.
You are ready for this challenge. Good luck!
Extra challenge
To make your compiler extra awesome, try the following two enhancements:
-
Implement “short circuit” evaluation for the and and or operations. Meaning, when the first argument to the and/or operation already tells you what the result will be, the second argument should not be evaluated at all. (Hint: more branch instructions!)
-
Right now all the existing string stuff that the compiler produces is allocating new memory on the heap using malloc, and that memory is never freed. Meaning, the compiled code is a giant memory leak. Fix this so that your compiler produces memory-safe code where all memory gets free’d when it’s no longer needed, before the program finishes.
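For the short-circuit enhancement, one possible shape for the emitted code is sketched below. This is only one option under stated assumptions: ShortCircuitSketch and emitAnd are hypothetical names, booleans are assumed to be i1 values, and the result is passed through a small stack slot (a phi instruction is another way to merge the two paths). The operand IR is supplied by the caller for the sake of a self-contained example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShortCircuitSketch {
    final List<String> lines = new ArrayList<>();
    private int next = 0;

    private String fresh(String hint) { return hint + (next++); }

    /** Emits "lhs and rhs" so that rhs only runs when lhs is true. */
    String emitAnd(List<String> lhsIR, String lhsReg,
                   List<String> rhsIR, String rhsReg) {
        String slot = "%" + fresh("and.slot");
        String rhsLabel = fresh("and.rhs");
        String endLabel = fresh("and.end");
        String result = "%" + fresh("and.val");

        lines.add("  " + slot + " = alloca i1");
        lines.addAll(lhsIR);
        lines.add("  store i1 " + lhsReg + ", ptr " + slot);
        // If lhs is false the answer is already known, so skip the rhs block.
        lines.add("  br i1 " + lhsReg + ", label %" + rhsLabel
                  + ", label %" + endLabel);
        lines.add(rhsLabel + ":");
        lines.addAll(rhsIR);
        lines.add("  store i1 " + rhsReg + ", ptr " + slot);
        lines.add("  br label %" + endLabel);
        lines.add(endLabel + ":");
        lines.add("  " + result + " = load i1, ptr " + slot);
        return result;
    }

    /** Emits IR for: (x != 0) and (y < 10). */
    static List<String> demo() {
        ShortCircuitSketch c = new ShortCircuitSketch();
        c.emitAnd(Arrays.asList("  %a = icmp ne i64 %x, 0"), "%a",
                  Arrays.asList("  %b = icmp slt i64 %y, 10"), "%b");
        return c.lines;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

For or, the two branch targets swap: a true first argument jumps straight to the end label, and only a false one falls into the block that evaluates the second argument.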