Grades and deadlines

Getting started: files

Start by downloading the starter files. Running this command will create a new directory lab2.3:

git clone https://github.com/si413usna/startlab.git -b lab2.3 lab2.3

In there you will find:

Get started by copying your tokenSpec.txt and ParseRules.g4 directly from your solution to the previous lab.

You can also copy your working Interpreter.java into the src/main/java/si413/ directory so you can reference it as you fill in Compiler.java.

Task 1: Write your compiler

The Compiler.java from the starter code has the basic structure:

Compiling and running your compiler should be done from the base directory of the maven project (where the pom.xml file is). You can do this the “hard way”:

mvn package
java -jar target/compiler-2.3.jar source.prog output.ll

or the “easy way” using the included bash script:

./run.sh source.prog output.ll

After that, you can execute the resulting LLVM program the hard way:

clang output.ll
./a.out

or the easy way:

lli output.ll

Be sure to use the club or submit commands at the top of this page, from the base directory of your lab2.3 project, to turn in your work.

Tips

How I suggest you work

For much of the lab work in SI413, I strongly recommend making very small updates to your code, alongside updates to a running example program in the source language, and testing your compiler incessantly.

Here’s what I did:

  1. Make a test example program, test.prog, in a text editor. Initially just put a comment there and leave it as an empty program.

  2. Get your compiler working on just that program, by adding hopefully just one or two visitor methods. (For the empty program, you shouldn’t need to add anything to the starter code.)

  3. Compile your compiler, compile the example program, and run the resulting LLVM code. You can do it in one glorious bash line like this:

    ./run.sh test.prog test.ll && echo 'COMPILED!' && lli test.ll
    
  4. Repeat steps 2/3 until that program works perfectly. (For the initial empty program, it should correctly do nothing!)

  5. Pick one more feature to add and return to step 1, enhancing your test program to test out that feature, and then getting it to actually work.

Think small and start with the easy stuff, then build up gradually and improve. For example, your first non-empty test program might just be printing a single boolean literal true/false. That should only require filling in probably three visitor functions.

Preamble helper functions

The starter code has preamble.c and preamble.ll files in the resources directory. The idea is to add any helper functions you feel like writing in preamble.c, then run

clang -S -emit-llvm preamble.c

to re-generate the preamble.ll file manually. The starter code Compiler.java class I gave you already does the work of copying whatever is in preamble.ll to the top of your compiled output file.

As you work on incrementally adding features, your visitor methods need to output LLVM code to do whatever that next feature does, like string concatenation, string comparison, boolean logic, reading input, etc. For any of these where it takes more than a couple lines of LLVM to do the job, or you aren’t quite sure how to write it in LLVM, feel free to make a helper function and add it to preamble.c.

(Even if you are not very confident in your C programming, I promise you it is easier than programming in raw LLVM IR!)

Here are the concrete steps when you want to add a new helper function:

  1. Add the new function to preamble.c
  2. Run clang -S -emit-llvm preamble.c to regenerate preamble.ll
  3. In your Compiler.java, emit LLVM code to call your new helper function when needed
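
As an illustration of step 1, here is a minimal sketch of what a string comparison helper in preamble.c might look like. The name string_less and its exact behavior are hypothetical, not part of the starter code; your language may need a different comparison.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not in the starter code): returns true when
   string a sorts strictly before string b. Neither input is modified. */
bool string_less(const char *a, const char *b) {
    return strcmp(a, b) < 0;
}
```

After regenerating preamble.ll, look at the define line that clang produced for the helper. It shows the exact return and parameter types that your emitted call instruction needs to match.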

Bools in LLVM IR

The type of booleans in LLVM IR code is i1, as in a 1-bit integer. You can use constants true and false in your commands wherever it would expect an i1 register name.

The LLVM IR language includes commands for basic boolean logic such as and, or, and xor. However, it does not include a not instruction. Think about why that might be, and how you can logically get the same thing without that explicit instruction!

Typing and errors

Because our languages are still pretty simple, your compiler should know the type and scope of every variable. So for example, if someone tries to print out a variable before defining it, or tries to use a boolean variable where a string variable is expected or vice-versa, you should be able to detect this and throw an error at compile-time by calling Errors.error(...) from your compiler.

Think about why your language can do these kinds of compile-time checks, whereas real languages such as Python or Scheme inherently cannot, and have to wait until run-time to check for undefined or wrong-type variables.

Memory management

I recommend making copies of strings and using malloc or calloc to allocate new strings for most operations, rather than modifying the strings that are passed in.

Why? Well, we have variables now! That means that sometimes, the input to some string operation will not be a string literal or the output of some sub-expression, but rather a variable value. But then, if you modify that string in-place, you just accidentally changed the variable value as well!

If you are writing C code in preamble.c to deal with this, note that some C string functions such as strdup already call malloc for you to allocate a new string before returning, and some others such as strcat expect you to pre-allocate enough space for the result. So read the man pages and work carefully.
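
To make the strcat point concrete, here is a minimal sketch of a concatenation helper that allocates a fresh result and leaves both inputs untouched. The name string_concat and the choice to abort on allocation failure are assumptions for illustration, not something the starter code provides.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not in the starter code): returns a newly
   allocated string holding a followed by b. The inputs are only read,
   so a variable's value passed in here is never changed. */
char *string_concat(const char *a, const char *b) {
    /* strcat does not allocate, so reserve room for both strings
       plus the terminating null byte before copying anything. */
    char *result = malloc(strlen(a) + strlen(b) + 1);
    if (result == NULL) {
        abort(); /* assumption: out of memory is treated as fatal */
    }
    strcpy(result, a); /* copy a into the new buffer */
    strcat(result, b); /* append b after it */
    return result;
}
```

The caller owns the returned pointer, so nothing frees it unless your compiled code later calls free() on it.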

Note that, if you take this advice and continually allocate new memory for all the string operations, your compiled code will probably have a memory leak, unless you also go through the careful business of calling free() at just the right spots to de-allocate all those strings. Memory leaks are OK for this lab, but if you have time and want an extra challenge, see if you can get free() to work so that every allocated string is free’d exactly once before the program terminates.