Lab 13: Compiling for a Virtual Machine

This is the archived website of SI 413 from the Fall 2012 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

This lab is due at 0900 on the first day of exams, Tuesday, December 11. It should contain all the files in the starter code listed below, as well as a subfolder called tests with your tests. See the submit page for more info.

The starter code for this week's lab is close to the Lab 10 solutions, but slightly modified for our purposes this week, as outlined below. Here are the files:

You can get all these files at once by downloading the file lab13.tar.gz and running tar xzvf lab13.tar.gz

Background: Virtual machines

Not all programming languages are interpreted, and not all interpreters interpret the abstract syntax tree directly, as we have done. What are the alternatives? Well, code for non-interpreted languages is compiled to run on a physical machine, and a lot of code for interpreted languages is actually compiled to run on a virtual machine, which means that it is compiled to some simple assembly-like language that is interpreted directly by the virtual machine. In this lab we'll write a compiler that reads in SPL code and outputs equivalent code for the "Simple Virtual Machime" (SVM), which is ... a simple virtual machine. It's been written specifically to mesh easily with SPL.

The virtual machine you're probably most familiar with is the Java Virtual Machine (JVM). Java code is compiled into "byte code", which is the assembly-like language understood by the JVM. The JVM is a stack-based machine rather than register-based like a typical physical machine. This means that instructions pop their arguments from a stack of values (the "vstack") and, when they produce results, push them back. This will take some getting used to.

Part 1: svm

The virtual machine we'll be using for this lab is svm, which you can just run with the svm command (the full path is /courses/roche/413/bin/svm). You enter SVM assembly instructions, hit ctrl-d, and it evaluates them. Alternatively, you can enter instructions followed by "go" or a semicolon, and it will evaluate what you've typed and then wait for more instructions. (The semicolon also causes the contents of the vstack to be printed out, handy for debugging.)

> svm
literal 5
literal 3
mul
literal 2
add
print
go
17

We could also do everything above using the ! shortcut for literal, in one line:

!5 !3 mul !2 add print

Of course the SVM is a bit more powerful than just a hand calculator. It actually has a vstack AND a memory space with numbered slots. All the calculations are performed in the vstack, and the load and store bring things from and to memory. We can also label any line with an integer label using the label statement, then return to that line with a goto or branch.

You can get a list of all commands by running svm -h, or by typing help from within the virtual machine. For your edification, here's a slightly more interesting example of an SVM program to read in integers until zero is entered, and then print their sum (like Exercise 1 from Lab 7).

% Test program for SVM
% Reads integers until 0, then prints their sum
literal 0
literal 10
store % Store the sum in location 10
 
label 99
read
dup
literal 0
eq
literal L42 
branch % if the number read in is zero, go to the bottom.
 
literal 10
load
add
literal 10
store
literal L99
goto
 
label 42
literal 10
load
print
exit

(If you are interested, you can inspect the source code for my svm program by looking in /courses/roche/413/svm/. Feel free to mess around, but whatever you submit for the lab should work for the program in my directory above.)

Exercises

For these exercises, I want you to write program in the virtual machine (SVM) language that is equivalent to the given SPL code. Of course, this is exactly what your compiler will do eventually, but it's worth doing "by hand" first to see how it should work.

You do not need to submit any tests for these exercises.

Submit an SVM program in a file called ex1.svm that is equivalent to:
```
write 4 + (18 - 6) / (3 + 3);
```
(And no, you can't just optimize this to !6 print and call it a day. Make the program do the calculations.)
Submit an SVM program in a file called ex2.svm that is equivalent to:
```
new x := read;
ifelse x/3 < 2
  { write x; }
  { write false; }
x := 0;
```
(Note: this will be more challenging. You should save "x" into a numbered memory location using load/store. The control flow for the if/else will require labels and branch/goto statements.)

Part 2: The basics

Now let's start writing our SPL-to-SVM compiler. Have a look around the starter code for today's lab. It's basically the solution to lab 10, with a whole bunch of stuff ripped out that won't be necessary anymore. In particular, notice:

The scanner and parser files look pretty much the same. We will still be creating an AST, execing statements and evaling expressions. But...
The eval method in Exp has changed significantly - it no longer returns a Value! Since we are writing a compiler now, the exec and eval methods will just be writing SVM code to standard out - not computing anything.
Important: The main difference between statements and expressions in our compiler is that execing a statement returns the virtual machine's vstack to its original state when it is done, whereas evaling an expression results in the vstack having exactly one more value in it at the end of the evaluation - which will be the computed result of that evaluation. Make sure you understand this!
The Frame class has changed significantly. Rather than binding names to Values, we are binding names to memory addresses, which in the virtual machine we are using are just ints. So looking up a name in a frame just means finding the memory address where that variable is stored in the virtual machine. Rebind is no longer relevant - that will be done by loading and storing stuff in the virtual machine. The bind function that remains merely inserts a name into the frame attached to a new memory address which no one else is using.

Ultimately, the semantics of our compiled SPL language are going to change in one significant way: all memory will be statically allocated (like in Fortran 77). Remember what that means? Functions will still be lexically scoped and first class, but we will NOT be able to use recursion because every call to a function will re-use the same memory locations. Luckily, I've already written the Frame class to give you this behavior, and you won't have to worry about it too much until Part 4 below.

Exercises

To make the compiler work, all you need to do is complete all the eval and exec methods for the AST node classes (expressions and statements). Actually, I've already filled in a few of them for you so you can get a feel for how things work.

These exercises will all go into those same files, and don't require separate submission or anything like that.

Submit two tests for each of these exercises.

Get basic arithmetic, comparison, and boolean operators working, along with read and write. After this, you should be able to re-create what you did for exercise 1 automatically, by doing something like
```
> ./splc > tmp.svm
write 4 + (18 - 6) / (3 + 3);
> svm tmp.svm
6
```
Get variable declaration, lookup, and assignment working. Test it!
Get the three control structures if, if/else, and while working. Note that I've written IfStmt::exec for you already. Now you should be able to re-create what you did for exercise 2 as well.

Choose your own adventure

You can choose to do either Part 3 or Part 4. (You will not get extra credit for doing both.) Part 4 is significantly harder, so if you choose that one you will receive some bonus points for your troubles.

Part 3: Short-circuit evaluation

Recall that "short-circuit evaluation" refers to stopping the execution of "and" and "or" expressions as soon as we know what the result will be. It allows us to run SPL programs like

write 5 < 3 and 1/0;
write 3 = 3 or true > false;

without getting any errors.

Making this happen in the SPL interpreter was pretty trivial. Not so in the compiler. It will require labels, branches, and/or gotos.

Exercises

Submit two tests for this exercise.

Get short-circuit evaluation to work for the and and or operators.

Part 4: Functions!

Sooner or later you had to know that lambda and funcall would show up to ruin your day. Getting these to work will require some serious thought! Here are some tips:

The bulk of the code will be in Lambda::eval. The body of the function should only be written out ONCE in your program, at the point of definition. Things like creating a new Frame which use to happen in Funcall will now show up in Lambda instead.
The Lambda should evaluate to a label number. This will be the label to jump to when the function gets called later on. So this means that after the Lambda is finished evaluating, there should be a label number on top of the vstack.
Think hard about how information is being passed between the call site and the function body. There are two pieces of information you need to pass forward: the argument value, and a label number to return to when the function finishes executing. The easiest thing will probably be to have these be the top two vstack elements whenever we goto the beginning of a function body.
Doing some compilation for some very simple functions "by hand" will probably be illuminating if you are getting stuck.

Exercises

Submit two tests for this exercise.

Get function calls and returns working. After this, you should be able to compile an SPL program like this one:

new f := lambda x {
  new h := lambda x { ret := 2*x; };
  ret := lambda y {
    ret := y + h@x;
  };
};
write f@3@4;
write f@5@6;