Grades and deadlines

Initial deadline: 23:59 on Tuesday 30 September
How to submit: Turn in your completed lab2.3.yml file, along with everything in the src/main folder of your maven project.

You must submit via the command line in order to preserve the folder structure. Use either of these two commands to do it:
```
club -csi413 -plab2.3 -f lab2.3.yml src/main
```
or
```
submit -c=si413 -p=lab2.3 lab2.3.yml src/main
```
(you can download the club tool here)

If you used an AI tool to help with this lab (you should use Gemini), turn in a file aichat.md as well. Remember the course guidelines for the use of generative AI on labs.
Grading:

In this lab you will complete the following tasks:
1. Starting with the code for your working interpreter from the previous lab, write a compiler that reads in source code in your chosen language and produces an equivalent program in LLVM IR.
2. Thoroughly test your compiler by running the produced .ll programs using lli or clang.
If your submission meets the requirements for each task, you will receive 3 points towards your total lab grade.
Collaboration

The same rules apply as with your interpreter, across all sections of SI413:
- If someone is working on the same language as you, you should not be collaborating with them since you are doing the same task.
- If someone is working on a different language than you, then it is OK for you to help each other with debugging your code for the lab, as long as this collaboration is clearly documented in the code and in your YAML file.

Resubmissions:

We will follow the same resubmission policy for all labs this semester:

```
def points_earned(deadline, max_points, previous_submission=None):
    while current_time() < deadline:
        wait()
    submission = get_from_submit_system()
    if meets_all_requirements(submission):
        return max_points
    elif significant_progress(previous_submission, submission):
        return points_earned(deadline + one_week, max_points, submission)
    else:
        return 0
```

Getting started: files

Start by downloading the starter files. Running this command will create a new directory lab2.3:

```
git clone https://github.com/si413usna/startlab.git -b lab2.3 lab2.3
```

In there you will find:

- lab2.3.yml: for you to fill in and submit
- pom.xml: config file for maven
- run.sh: script to run your compiler
- src/main/resources/si413/preamble.c: C file with helper functions that you fill in as needed
- src/main/resources/si413/preamble.ll: LLVM code created from preamble.c. Be sure to regenerate this whenever you change preamble.c
- src/main/resources/si413/tokenSpec.txt: Placeholder token spec file. Replace this with your token spec from the interpreter.
- src/main/antlr4/si413/ParseRules.g4: Placeholder grammar file. Replace this with your grammar from the interpreter.
- src/main/java/si413/Errors.java: Error handler for tokenizer, parser, and compiler (you shouldn’t need to edit this)
- src/main/java/si413/Tokenizer.java: Tokenizer logic (don’t edit this either)
- src/main/java/si413/Compiler.java: You will need to fill this in!

Get started by copying your tokenSpec.txt and ParseRules.g4 directly from your solution to the previous lab.

You can also copy your working Interpreter.java into the src/main/java/si413/ directory so you can reference it as you fill in Compiler.java.

Task 1: Write your compiler

The Compiler.java from the starter code has the basic structure:

- A main method that takes two filenames as command-line arguments, scans and parses the source code, creates a PrintWriter dest for the output .ll file, and then calls compile().
- The compile method copies the contents of preamble.ll from the resources directory to the output, then starts the LLVM main() and calls the visit methods, starting at the top of the parse tree.

You will need to add more to this structure for anything that must go outside of main in the compiled LLVM code, such as global constants for string literals.
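Here is a minimal sketch of one way to do this. It assumes LLVM 15 or newer, where all pointers have the single type ptr. The idea is to keep the body of main and the top-level globals in separate buffers and stitch them together at the end. ModuleBuilder and its methods are hypothetical names, not part of the starter code. Also, the real compile() writes to the PrintWriter dest rather than returning a String. In this sketch, each string literal becomes a private global constant whose name can be used anywhere a ptr is expected.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not the starter code's API): main's body and the
// top-level globals are buffered separately and combined at the end.
class ModuleBuilder {
    private final StringBuilder globals = new StringBuilder();
    private final StringBuilder mainBody = new StringBuilder();
    private int stringCount = 0;

    /** Defines a global constant holding `text` and returns its name, e.g. @.str0. */
    String addStringLiteral(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder escaped = new StringBuilder();
        for (byte b : bytes) {
            int c = b & 0xFF;
            // Printable ASCII other than '"' and '\' is written as-is;
            // every other byte becomes a two-digit hex escape such as \0A.
            if (c >= 0x20 && c < 0x7F && c != '"' && c != '\\') {
                escaped.append((char) c);
            } else {
                escaped.append(String.format("\\%02X", c));
            }
        }
        String name = "@.str" + stringCount++;
        // The array length counts the bytes plus the terminating \00.
        globals.append(String.format(
                "%s = private unnamed_addr constant [%d x i8] c\"%s\\00\"\n",
                name, bytes.length + 1, escaped));
        return name;
    }

    /** Appends one instruction to the body of main. */
    void emit(String instruction) {
        mainBody.append("  ").append(instruction).append('\n');
    }

    /** Preamble first, then main, then the globals. */
    String assemble(String preamble) {
        return preamble + "\n"
                + "define i32 @main() {\n"
                + mainBody
                + "  ret i32 0\n"
                + "}\n\n"
                + globals;
    }
}
```

LLVM IR allows a function to refer to a global that is defined later in the file, so the globals can simply be printed after main's closing brace. The @.str0, @.str1, ... naming avoids colliding with the @.str, @.str.1, ... names that clang generates in preamble.ll.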
There are two visitor classes, StmtVisitor and ExprVisitor. You need to fill in many methods here, just like you did for your interpreter. But now, instead of executing the commands directly, your visit methods should output the LLVM code that will execute them later.

Your expression visitors will no longer return Strings and Booleans, but rather the names of the registers where each expression's result will be stored after that piece of code executes. Think about this and make sure you understand it!
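The class below is a hypothetical, stand-alone sketch of the register-name idea (no ANTLR involved). It compiles boolean and/or expressions. Each visit method emits zero or more instructions and returns the register, or constant, that will hold the result at run time. In your real ExprVisitor, a binary-operator method would first call visit() on its two children and use the names they return. The %r prefix is an assumption: named registers sidestep LLVM's requirement that unnamed registers (%0, %1, ...) be numbered consecutively within a function.

```java
// Hypothetical sketch (no ANTLR): each visit method appends LLVM instructions
// to `out` and returns the register or constant that will hold its result.
class BoolExprCompiler {
    private final StringBuilder out = new StringBuilder();
    private int nextReg = 0;

    /** A register name that has never been used before: %r0, %r1, ... */
    private String freshReg() {
        return "%r" + nextReg++;
    }

    /** Literals emit no code; the i1 constant itself names the result. */
    String visitLiteral(boolean value) {
        return value ? "true" : "false";
    }

    /**
     * Emits one and/or instruction. In a real ExprVisitor, left and right
     * would come from calling visit() on the two child expressions first.
     */
    String visitBinary(String op, String left, String right) {
        String result = freshReg();
        out.append("  ").append(result).append(" = ").append(op)
           .append(" i1 ").append(left).append(", ").append(right).append('\n');
        return result;
    }

    /** The LLVM code emitted so far. */
    String code() {
        return out.toString();
    }
}
```

Consider (true and false) or true, compiled as visitBinary("or", visitBinary("and", visitLiteral(true), visitLiteral(false)), visitLiteral(true)). The emitted code is %r0 = and i1 true, false followed by %r1 = or i1 %r0, true. The outer call returns %r1, the register that holds the value of the whole expression.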

You will need to add all the same visit methods from your interpreter to these visitor classes, except of course they should be compiling and not interpreting!

Compiling and running your compiler should be done from the base directory of the maven project (where the pom.xml file is). You can do this the “hard way”:

```
mvn package
java -jar target/compiler-2.3.jar source.prog output.ll
```

or the “easy way” using the included bash script:

```
./run.sh source.prog output.ll
```

After that, you can execute the resulting LLVM program the hard way:

```
clang output.ll
./a.out
```

or the easy way:

```
lli output.ll
```

Be sure to use the club or submit commands at the top of this page, from the base directory of your lab2.3 project, to turn in your work.

Tips

How I suggest you work

For much of the lab work in SI413, I strongly recommend making very minuscule updates to your code, alongside updates to a running example program in the source language, and incessantly testing your compiler.

Here’s what I did:

1. Make a test example program, test.prog, in a text editor. Initially just put a comment there and leave it as an empty program.
2. Get your compiler working on just that program, by adding hopefully just one or two visitor methods. (For the empty program, you shouldn’t need to add anything to the starter code.)
3. Compile your compiler, compile the example program, and run the resulting LLVM code. You can do it in one glorious bash line like this:
   ```
   ./run.sh test.prog test.ll && echo 'COMPILED!' && lli test.ll
   ```
4. Repeat steps 2 and 3 until that program works perfectly. (For the initial empty program, it should correctly do nothing!)
5. Pick one more feature to add and return to step 1, enhancing your test program to test out that feature, and then getting it to actually work.

Think small and start with the easy stuff, then build up gradually and improve. For example, your first non-empty test program might just print a single boolean literal, true or false. That should require filling in only about three visitor functions.

Preamble helper functions

The starter code has preamble.c and preamble.ll files in the resources directory. The idea is to add any helper functions you feel like writing in preamble.c, then run

```
clang -S -emit-llvm preamble.c
```

from inside src/main/resources/si413 to regenerate the preamble.ll file manually (clang writes preamble.ll to the current directory). The starter code Compiler.java class I gave you already does the work of copying whatever is in preamble.ll to the top of your compiled output file.

As you work on incrementally adding features, your visitor methods need to output LLVM code to do whatever that next feature does, like string concatenation, string comparison, boolean logic, reading input, etc. For any of these that takes more than a couple of lines of LLVM, or that you aren’t quite sure how to write in LLVM, feel free to make a helper function and add it to preamble.c.

(Even if you are not very confident in your C programming, I promise you it is easier than programming in raw LLVM IR!)

Here are the concrete steps when you want to add a new helper function:

1. Add the new function to preamble.c
2. Run clang -S -emit-llvm preamble.c in the resources directory to regenerate preamble.ll
3. In your Compiler.java, emit LLVM code to call your new helper function when needed
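For step 3, the compiler only has to emit a call instruction, because the helper's definition is already pasted in from preamble.ll. This sketch assumes two hypothetical helpers in preamble.c: char *concat(const char *a, const char *b) and void print_str(const char *s). With clang 15 or newer, these compile to functions that take and return ptr. HelperCalls is likewise a hypothetical name, not part of the starter code.

```java
// Hypothetical sketch: emitting calls to helper functions whose definitions
// are already in preamble.ll (copied to the top of the output file).
class HelperCalls {
    private final StringBuilder out = new StringBuilder();
    private int nextReg = 0;

    /** Emits `%rN = call <retType> @<name>(<args>)` and returns "%rN". */
    String emitCall(String retType, String name, String... typedArgs) {
        String result = "%r" + nextReg++;
        out.append("  ").append(result).append(" = call ").append(retType)
           .append(" @").append(name).append('(')
           .append(String.join(", ", typedArgs)).append(")\n");
        return result;
    }

    /** Calls a void helper; it produces no value, so nothing is assigned. */
    void emitVoidCall(String name, String... typedArgs) {
        out.append("  call void @").append(name).append('(')
           .append(String.join(", ", typedArgs)).append(")\n");
    }

    /** The LLVM code emitted so far. */
    String code() {
        return out.toString();
    }
}
```

A call to a void function must not be assigned to a register, which is why emitVoidCall is separate. Each argument string pairs an LLVM type with a register name returned by one of your expression visitors. For example, a concatenation might compile to emitCall("ptr", "concat", "ptr " + left, "ptr " + right).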

Bools in LLVM IR

The type of booleans in LLVM IR code is i1, as in a 1-bit integer. You can use the constants true and false wherever an instruction expects an i1 register name.

The LLVM IR language includes instructions for basic boolean logic such as and, or, and xor. However, it does not include a not instruction. Think about why that might be, and how you can logically get the same thing without that explicit instruction!

Typing and errors

Because our languages are still pretty simple, your compiler should know the type and scope of every variable. So for example, if someone tries to print out a variable before defining it, or tries to use a boolean variable where a string variable is expected or vice-versa, you should be able to detect this and throw an error at compile-time by calling Errors.error(...) from your compiler.
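A map from variable names to types is enough for these checks. This minimal sketch assumes a language with only bool and string variables in a single scope. The CompileError class is a hypothetical stand-in so the example runs on its own; your compiler should report problems with Errors.error(...) instead. The static llvmType method hints at a second use for the same table: choosing between i1 and ptr when emitting code for a variable, assuming strings are represented as ptr.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks the type of every variable so that undefined
// or wrongly typed uses are rejected at compile time.
class SymbolTable {
    enum Type { BOOL, STRING }

    // Stand-in for the starter code's Errors.error(...).
    static class CompileError extends RuntimeException {
        CompileError(String message) {
            super(message);
        }
    }

    private final Map<String, Type> types = new HashMap<>();

    /** Records that `name` holds a value of type `type`. */
    void define(String name, Type type) {
        types.put(name, type);
    }

    /** Checks a use of `name` in a context that requires type `expected`. */
    void use(String name, Type expected) {
        Type actual = types.get(name);
        if (actual == null) {
            throw new CompileError("variable " + name + " is used before it is defined");
        }
        if (actual != expected) {
            throw new CompileError("variable " + name + " has type " + actual
                    + " but " + expected + " is expected here");
        }
    }

    /** LLVM type for values of `type`, assuming strings are represented as ptr. */
    static String llvmType(Type type) {
        return type == Type.BOOL ? "i1" : "ptr";
    }
}
```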

Think about why your language can do these kinds of compile-time checks, whereas real languages such as Python or Scheme inherently cannot, and have to wait until run-time to check for undefined or wrong-type variables.

Memory management

I recommend making copies of strings and using malloc or calloc to allocate new strings for most operations, rather than modifying the strings that are passed in.

Why? Well, we have variables now! That means that sometimes, the input to some string operation will not be a string literal or the output of some sub-expression, but rather a variable value. But then, if you modify that string in-place, you just accidentally changed the variable value as well!

If you are writing C code in preamble.c to deal with this, note that some C string functions such as strdup already call malloc for you to allocate a new string before returning, and some others such as strcat expect you to pre-allocate enough space for the result. So read the man pages and work carefully.

Note that if you take this advice and continually allocate new memory for all the string operations, your compiled code will probably have a memory leak, unless you also go through the careful business of calling free() at just the right spots to de-allocate all those strings. Memory leaks are OK for this lab, but if you have time and want an extra challenge, see if you can get free() to work so that every allocated string is freed exactly once before the program terminates.