Grades and deadlines
-
Initial deadline: 23:59 on Tuesday 09 September
-
How to submit: Turn in a completed
lab1.3.ymlfile, yourCompiler.javafile with amainmethod, and any other.javaor.llfiles I need to run your code, to the CS department submit system underSI413/lab1.3I recommend using the
clubtool to submit, by runningclub -cSI413 -plab1.3 *.java *.ll lab1.3.yml aichat.mdIf you used an AI tool to help with this lab (you should use Gemini), turn in a file
aichat.mdas well. Remember the course guidelines for the use of generative AI on labs. -
Grading:
In this lab you will complete the following tasks:
- Starting with the code for your working interpreter from the previous lab, write a compiler that reads in source code in your chosen language and produces an equivalent program in LLVM IR.
- Thoroughly test your compiler by running the produced
.llprograms usinglliorclang.
If your submission meets the requirements for each task, you will receive 3 points towards your total lab grade.
-
Collaboration
The same rules apply as with your interpreter, across all sections of SI413:
-
If someone is working on the same language as you, you should not be collaborating with them since you are doing the same task.
-
If someone is working on a different language than you, then it is OK for you to help each other with debugging your code for the lab, as long as this collaboration is clearly documented in the code and in your YAML file.
-
-
Resubmissions:
We will follow the same resubmission policy for all labs this semester:
def points_earned(deadline, max_points, previous_submission=None): while current_time() < deadline: wait() submission = get_from_submit_system() if meets_all_requirements(submission): return max_points elif significant_progress(previous_submission, submission): return points_earned(deadline + one_week, max_points, submission) else: return 0
Getting started: files
Make a new directory for this lab.
Download the file lab1.3.yml into that directory. You will fill in and submit this file along with your code.
For your java source code, start by copying your own working solution
from the previous lab, and renaming your Interp.java to
Compiler.java (and fix the corresponding class name as well).
Task 1: Write your compiler
Your Compile class should have a main method that now takes two
command-line arguments: the source code file, and the destination ll
file. I will run it like this:
java Compiler source.prog compiled.ll
where source.prog is a program in your chosen language for this unit.
If there are no errors in the code, your computer should not write
anything at all to standard out, or read anything from standard in.
Instead, it should create the
destination file (compiled.ll in the example above) which is a complete,
runnable program in LLVM IR.
You should be able to run the resulting LLVM IR program like
lli compiled.ll
and that should work exactly the same as running source.prog through
your working interpreter from last week.
(Or, if you want the full power of what you are doing, you can completely compile the source program in our made-up programming language, down to executable machine code, via
clang compiled.ll -o compiled
which could then be run over and over again as a native executable
./compiled.)
Task 2: Test it!
Be sure to thoroughly test your compiler to make sure it works in all cases. You should be testing as you go starting with the simplest possible programs in your language, and eventually getting to programs with multiple print statements and complicated expressions with multiple reversals, inputs, and concatenations.
Tips
Here are some helpful suggestions on how you can get started and make progress. Whether you take these suggestions or decide to work in a different way is fine, as long as your compiler works!
Writing your very first compiler is an exciting day in your life as a computer scientist, and you are ready for the challenge! Work carefully, follow the tips, ask for help when needed, and help each other (within the rules), and you will do great.
Initial setup: transforming your interpreter
First you need to get your new Compiler.java set up.
Starting with what you had for the interpreter,
modify your main method so that it takes a second command-line
argument for the destination file. Go ahead and open that for writing,
maybe with a PrintWriter in java.
Then go through and find the places in your interpreter where it actually runs the code. Right now your languages can’t actually do very much, so you’re just looking for the 1-2 lines where each small thing happens:
- Reversing a string
- Concatenating two strings
- Reading in a string from standard in
- Printing out a string
Change just the few lines that do these tasks, by commenting them out and replacing them with some exception like
throw new UnsupportedOperationException("this doesn't work yet")
Compile and test your compiler on an existing program in your language - it should of course throw that exception. Great! Now you now (some of) what you need to fill in.
Step 0: empty program
Start by getting your compiler to work on a source program that doesn’t contain any statements or code at all. (Maybe just comments, or a blank file.)
The result should be an LLVM IR program that also does nothing! Like this:
target triple = "x86_64-pc-linux-gnu"
declare i32 @puts(ptr noundef) #1
;;; probably other function declarations or definitions will eventually
;;; go up here, but they don't depend on the actual program
define i32 @main() {
;;; WHEN SOMETHING HAPPENS, IT WILL EVENTUALLY GO HERE
ret i32 0
}
See if you can get this to work! What should probably happen is really in three steps:
-
Your program dumps out the “preamble” to the destination file, which is everything up until the line where “something happens”. Everything up here is global declarations and function definitions, so it should be the same regardless of what’s in the input file.
-
You call whatever functions or whatnot you already wrote in the interpreter to read in and process the source code. (Since we are testing on a source code with no actual code in it now, this part should actually do nothing at the moment. It should not trigger any of your
UnsupportedOperationExceptions from above. -
When that’s all done, your program dumps out the last few lines, or the “epilogue”, returning from
mainand ending the program.
Now test this on some empty programs, or with just comments in them.
Make sure the resulting .ll file actually works when you run it with
lli - it should successfully do nothing!
Tip: bash one-liner for double compiling and running
To help with testing, I found it convenient to use bash’s && operator
on the command line, like
javac *.java && java Compiler some-test.prog some-test.ll && lli some-test.ll
That one line of bash will:
- Compile your compiler (in Java)
- Run your compiler on
some-test.prog, which would be a small test program in your language - Run the resulting
some-test.llLLVM IR program, which should execute according to whateversome-test.progis supposed to do
By using the && operator, bash will “short-circuit” the operation so
that it stops at the first step above which fails. So for example if
your java code has some typo in it, it will just show you the compiler
error from javac and stop. I recommend working this way!
Doing something small: aloha
Here is a stripped-down LLVM IR program that just prints a single string
literal using the C standard library puts() function:
target triple = "x86_64-pc-linux-gnu"
declare i32 @puts(ptr noundef) #1
define i32 @main() {
call i32 @puts(ptr @lit1)
ret i32 0
}
@lit1 = constant [6 x i8] c"aloha\00"
Now work backwards. Write a small program in your language which prints out “aloha”, and then try to get your compiler to output something like the code above.
Notice that really two things need to change:
- When the code comes to your print statement command, it needs to
somehow spit out that single call to
@puts - Later, after the whole source code has been processed and
mainhas been closed up, in the epilogue, your code needs to somehow spit out the string literal definition for “aloha”.
So the questions you need to figure out are: (1) how can your compiler
spit out the literal definitions at a later time, after it’s already
moved past them in the source code; and (2) how can the print statement
“know” the name of the argument (here it’s @lit1) it is supposed to
print out?
(Hint: for (1), you are allowed to read the source code file twice in your compiler if you need to. Or you can read it once, and save the string literals into an array or something as you encounter each one…)
(Another hint: for (2), you need to essentially pass some information
between the part of your code that handles string literals in the
source, and the part that handles a print statement. You probably
already have that from your interpreter! But now instead of passing the
actual string, you want to pass the name of the register that should
be used, like @lit1 in this example.)
Once this works for just “aloha”, don’t stop there. Go crazy and try a program with two print statements and two string literals. If you did things well, that should work too!
Moving on: helper functions
For whatever capability you choose next, be it stdin input, reversing a string, or concatenation, I strongly recommend writing a helper function in C which does (most of) the work for you.
We talked about this approach in class. In the context of your compiler, it will look like this:
-
You write a C program like
helpers.cwith just function definitions and nomain. The functions should do helpful things for you, like say taking achar*string, then allocating a new one withmalloc, filling it in with the reversal of the original string, and returning the new stringchar*pointer. -
Maybe write a
mainin C in a separate file, so that you can test your helper functions in pure C and make sure they work perfectly. So far this is just C programming, no LLVM involved. -
Compile your helpers into LLVM using
clang -S -emit-llvm -
Next change the preamble part of your compiler so that before
main, it dumps out the LLVM code for your helper functions.
And then voila, you can now call these helper functions in the main
for the LLVM program your compiler produces. It’s a complicated
workflow, but it will really make the job easier when you get it set up.
The idea here is that the really hard work is getting your compiler to
put the right commands inside the main, so we want to kind of
“offload” as much as we can away from that error-prone piece.
One, two, many
When testing each new aspect that you think you have working, I recommend starting with the smallest possible program that uses that feature, like (say) a program that just reverses a single string literal and prints it out, etc. That way, when you are first getting it to work, you can look at the LLVM code your compiler produced, see what you wanted it to do vs what is actually happening, and adjust accordingly.
Then, when you get it working for one concatenation, one reversal, one print statement, etc, try doing it twice in the same program. If you wrote the code well, it should work with no more changes! But if you need to debug, the program is still small and understandable so you have a better chance at fixing it.
And then, after it works twice in simple settings, make it as complex as you think you should be able to handle and test again. Here your tests won’t really be useful as debugging directly, but will help you make certain that the current feature is working perfectly before you move on to the next one.