Grades and deadlines
- Initial deadline: 23:59 on Tuesday 30 September
- How to submit: Turn in your completed lab2.3.yml file, along with everything in the src/main folder of your maven project. You must submit via the command line in order to preserve the folder structure. Use either of these two commands to do it:
  club -csi413 -plab2.3 -f lab2.3.yml src/main
  or
  submit -c=si413 -p=lab2.3 lab2.3.yml src/main
  (you can download the club tool here)
  If you used an AI tool to help with this lab (you should use Gemini), turn in a file aichat.md as well. Remember the course guidelines for the use of generative AI on labs.
- Grading:
  In this lab you will complete the following tasks:
  - Starting with the code for your working interpreter from the previous lab, write a compiler that reads in source code in your chosen language and produces an equivalent program in LLVM IR.
  - Thoroughly test your compiler by running the produced .ll programs using lli or clang.
  If your submission meets the requirements for each task, you will receive 3 points towards your total lab grade.
- Collaboration:
  The same rules apply as with your interpreter, across all sections of SI413:
  - If someone is working on the same language as you, you should not be collaborating with them since you are doing the same task.
  - If someone is working on a different language than you, then it is OK for you to help each other with debugging your code for the lab, as long as this collaboration is clearly documented in the code and in your YAML file.
- Resubmissions:
  We will follow the same resubmission policy for all labs this semester:
    def points_earned(deadline, max_points, previous_submission=None):
        while current_time() < deadline:
            wait()
        submission = get_from_submit_system()
        if meets_all_requirements(submission):
            return max_points
        elif significant_progress(previous_submission, submission):
            return points_earned(deadline + one_week, max_points, submission)
        else:
            return 0
Getting started: files
Start by downloading the starter files. Running this command will create
a new directory lab2.3:
git clone https://github.com/si413usna/startlab.git -b lab2.3 lab2.3
In there you will find:
- lab2.3.yml: for you to fill in and submit
- pom.xml: config file for maven
- run.sh: script to run your compiler
- src/main/resources/si413/preamble.c: C file with helper functions that you fill in as needed
- src/main/resources/si413/preamble.ll: LLVM code created from preamble.c. Be sure to regenerate this whenever you change preamble.c
- src/main/resources/si413/tokenSpec.txt: Placeholder token spec file. Replace this with your token spec from the interpreter.
- src/main/antlr4/si413/ParseRules.g4: Placeholder grammar file. Replace this with your grammar from the interpreter.
- src/main/java/si413/Errors.java: Error handler for tokenizer, parser, and compiler (you shouldn’t need to edit this)
- src/main/java/si413/Tokenizer.java: Tokenizer logic (don’t edit this either)
- src/main/java/si413/Compiler.java: You will need to fill this in!
Get started by copying your tokenSpec.txt and ParseRules.g4 directly
from your solution to the previous lab.
You can also copy your working Interpreter.java into the
src/main/java/si413/ directory so you can reference it as you fill in
Compiler.java.
Task 1: Write your compiler
The Compiler.java from the starter code has the basic structure:
- A main that takes two filenames as command-line arguments, scans and parses the source code, creates a PrintWriter dest for the output .ll file, and then calls compile().
- The compile method copies the contents of preamble.ll from the resources directory to the output, then starts the LLVM main() and calls the visit methods starting at the top of the parse tree.
  You will need to add some more stuff here for anything that must go outside of main in the compiled LLVM code.
- There are two visitor classes, StmtVisitor and ExprVisitor. You need to fill in a bunch of methods here, just like you did for your interpreter. But now, instead of executing the actual commands, your visit methods should output the LLVM code which will execute the command later.
  Your expression visitors will no longer return Strings and Booleans, but rather register names where the result of that expression will be stored after executing that piece of the code. Think about this and make sure you understand it!
You will need to add all the same visit methods from your interpreter to these visitor classes, except of course they should be compiling and not interpreting!
Compiling and running your compiler should be done from the base
directory of the maven project (where the pom.xml file is).
You can do this the “hard way”:
mvn package
java -jar target/compiler-2.3.jar source.prog output.ll
or the “easy way” using the included bash script:
./run.sh source.prog output.ll
After that, you can execute the resulting LLVM program the hard way:
clang output.ll
./a.out
or the easy way:
lli output.ll
Be sure to use the club or submit commands at the top of this page,
from the base directory of your lab2.3 project, to turn in your
work.
Tips
How I suggest you work
For much of the lab work in SI413, I strongly recommend making minuscule updates to your code, alongside updates to a running example program in the source language, and incessantly testing your compiler.
Here’s what I did:
1. Make a test example program, test.prog, in a text editor. Initially just put a comment there and leave it as an empty program.
2. Get your compiler working on just that program, by adding hopefully just one or two visitor methods. (For the empty program, you shouldn’t need to add anything to the starter code.)
3. Compile your compiler, compile the example program, and run the resulting LLVM code. You can do it in one glorious bash line like this:
   ./run.sh test.prog test.ll && echo 'COMPILED!' && lli test.ll
4. Repeat steps 2/3 until that program works perfectly. (For the initial empty program, it should correctly do nothing!)
5. Pick one more feature to add and return to step 1, enhancing your test program to test out that feature, and then getting it to actually work.
Think small and start with the easy stuff, then build up gradually and improve. For example, your first non-empty test program might just be printing a single boolean literal true/false. That should only require filling in probably three visitor functions.
Preamble helper functions
The starter code has preamble.c and preamble.ll files in the
resources directory. The idea is to add any helper functions you feel
like writing in preamble.c, then run
clang -S -emit-llvm preamble.c
to re-generate the preamble.ll file manually. The starter code
Compiler.java class I gave you already does the work of copying
whatever is in preamble.ll to the top of your compiled output file.
As you work on incrementally adding features, your visitor methods need
to output LLVM code to do whatever that next feature does, like string
concatenation, string comparison, boolean logic, reading input, etc. For
any of these where it takes more than a couple lines of LLVM to do the
job, or you aren’t quite sure how to write it in LLVM, feel free to
make a helper function and add it to preamble.c.
(Even if you are not very confident in your C programming, I promise you it is easier than programming in raw LLVM IR!)
Here are the concrete steps when you want to add a new helper function:
- Add the new function to preamble.c
- Run clang -S -emit-llvm to regenerate preamble.ll
- In your Compiler.java, emit LLVM code to call your new helper function when needed
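As a rough sketch of the first step, here is what a string-concatenation helper in preamble.c might look like. The name concat_strings and its signature are hypothetical and not part of the starter code; adapt them to whatever your own language needs. Note that there is no main() here, since your compiler generates the LLVM main() itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the starter code): returns a newly
   allocated string holding a followed by b. Neither input is modified,
   and the caller owns the returned memory. */
char *concat_strings(const char *a, const char *b) {
    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char *result = malloc(len_a + len_b + 1);  /* +1 for the terminating '\0' */
    if (result == NULL) {
        abort();  /* out of memory: give up rather than return garbage */
    }
    memcpy(result, a, len_a);              /* copy a without its '\0' */
    memcpy(result + len_a, b, len_b + 1);  /* copy b including its '\0' */
    return result;
}
```

After regenerating preamble.ll, your compiler could emit a call along the lines of %r3 = call ptr @concat_strings(ptr %r1, ptr %r2). The register names there are made up, and older LLVM versions spell the pointer type i8* instead of ptr, so check the definition that clang actually wrote into preamble.ll and match it.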
Bools in LLVM IR
The type of booleans in LLVM IR code is i1, as in a 1-bit integer. You
can use constants true and false in your commands wherever it would
expect an i1 register name.
The LLVM IR language includes commands for basic boolean logic such as
and, or, and xor. However, it does not include a not instruction.
Think about why that might be, and how you can logically get the same
thing without that explicit instruction!
Typing and errors
Because our languages are still pretty simple, your compiler should know
the type and scope of every variable. So for example, if someone tries
to print out a variable before defining it, or tries to use a boolean
variable where a string variable is expected or vice-versa, you should
be able to detect this and throw an error at compile-time by calling
Errors.error(...) from your compiler.
Think about why your language can do these kinds of compile-time checks, whereas real languages such as Python or Scheme inherently cannot, and have to wait until run-time to check for undefined or wrong-type variables.
Memory management
I recommend making copies of strings and using malloc or calloc to
allocate new strings for most operations, rather than modifying the
strings that are passed in.
Why? Well, we have variables now! That means that sometimes, the input to some string operation will not be a string literal or the output of some sub-expression, but rather a variable value. But then, if you modify that string in-place, you just accidentally changed the variable value as well!
If you are writing C code in preamble.c to deal with this, note that
some C string functions such as strdup already call malloc for you
to allocate a new string before returning, and some others such as
strcat expect you to pre-allocate enough space for the result. So read
the man pages and work carefully.
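To make the copy-first advice concrete, here is a minimal sketch of a helper that builds its result in freshly allocated memory instead of overwriting its argument. The function reversed_copy is purely illustrative (your language may have no reverse operation at all); the point is the pattern, which should carry over to any operation that changes the characters of a string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the starter code): returns a newly
   allocated, reversed copy of s. The input is never written to, so a
   variable that holds s keeps its original value. */
char *reversed_copy(const char *s) {
    size_t len = strlen(s);
    char *copy = malloc(len + 1);  /* +1 for the terminating '\0' */
    if (copy == NULL) {
        abort();  /* out of memory */
    }
    for (size_t i = 0; i < len; i++) {
        copy[i] = s[len - 1 - i];  /* fill the copy back to front */
    }
    copy[len] = '\0';
    return copy;
}
```

Declaring the parameter as const char * is a cheap safeguard: the C compiler will complain if the helper ever tries to write through the pointer that was passed in.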
Note that, if you take this advice and continually allocate new memory
for all the string operations, your compiled code will probably have a
memory leak, unless you also go through the careful business of
calling free() at just the right spots to de-allocate all those
strings. Memory leaks are OK for this lab, but if you have time and want
an extra challenge, see if you can get free() to work so that every
allocated string is free’d exactly once before the program terminates.