Lab 7: Abstract Syntax Trees

This is the archived website of SI 413 from the Fall 2012 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

This lab is due at 0900 next Tuesday, October 16. It should contain all the files in the starter code listed below, as well as a subfolder called tests with your tests. See the submit page for more info.

Starter Code

Here is the starter code for today's lab. You can get all these files at once by downloading the file lab07.tar.gz and running tar xzvf lab07.tar.gz

Introduction

From today's lab all the way until the end of the semester, we will be working on interpreting and compiling a made-up programming language called SPL ("Simple Programming Language"). Here's how it'll happen:

Lab 7 (today!): Interpreting an Abstract Syntax Tree.
Gaining familiarity with OOP in C++ and the AST structure allows us to interpret all language features except functions, using a single global scope.
Lab 8: Functions and Dynamic Scope.
We will implement lambdas and function calls, as well as dynamic scoping rules that will allow features such as recursive functions.
Lab 9: Lexical scoping with frames.
We will implement lexical scoping using activation frames. This will support advanced functional programming features using closures, like we saw in Scheme.
Lab 10: Type checking.
We will add both dynamic and static type-checking features to our interpreter to support richer feedback and reliability for the SPL programmer.
Lab 11: Garbage collection.
Automatic garbage collection will be added to our interpreter using the mark-and-sweep paradigm, to prevent memory leaks and make our interpreter more robust.
Lab 13: Compiling for a virtual machine.
Finally, we will foray briefly into actual compilation by producing byte-code for a virtual machine, similar to what is done by the javac compiler.

(Note: the missing number is Lab 12, where we'll play with the Python programming language a little bit.)

OOP in C++

We will be using object-oriented programming in C++ to write the SPL interpreter. Luckily, most of the class structure is already provided by me, and you just have to fill in some of the methods. But here's a summary of how OOP works syntactically in C++. (You should already understand how it works semantically from Java!) All of the examples below are taken from the ast.hpp file from today's starter code.

Classes are declared using the class keyword. Superclasses are specified using a colon (like the extends keyword in Java). For example
```
class Stmt :public AST {
  /* fields and methods in here */
};
```
declares a class called Stmt that is a subclass of AST. The public keyword is necessary and tells us that the methods in AST can be accessed from a Stmt object by anybody.
And don't forget that class declarations must end with a semicolon in C++!
Fields and methods in a class can be either private, protected, or public. We specify these by grouping declarations with the same visibility and then writing visibility specifiers before them, like public:.
If we want to use polymorphism in a class method, we declare the method with the virtual keyword, like
```
virtual void exec() {...}
```
This allows sub-classes to override that method in the way that we expect from Java. (In Java, essentially every method is declared virtual.)
There is no special syntax for interfaces, nor is there an abstract keyword as in Java. Instead, to declare a method abstract (i.e., it must be overridden by subclasses), we write = 0 instead of the method body, like
```
virtual void ASTchild(AST* child) = 0;
```
Methods can be defined within the class declaration, like in Java. But if we did this all the time, especially when there are many classes defined in the same file, the file gets pretty cluttered and hard to read.
So instead of defining methods within the class declaration, we can just declare the method's prototype there, and fully define the method in some .cpp file later.
For instance, the following class definition appears in ast.hpp:
```
class ArithOp :public Exp {
  public:
    Value eval();
 
  // ... other fields and methods ...
};
```
Notice that there is no function body for the eval method. That is saved for the ast.cpp file:
```
Value ArithOp::eval() {
  int l = left->eval().num();
  int r = right->eval().num();
  switch(op) {
    case ADD: return l + r;
    case SUB: return l - r;
    case MUL: return l * r;
    case DIV: 
      if (r != 0) return l / r;
      else if (!error) {
        error = true;
        errout << "ERROR: Divide by zero" << endl;
      }
      return Value();
    default:  return Value(); // shouldn't get here...
  }
}
```
Since this is outside the class declaration, we must specify the name of the class in front of the method name and use the scope resolution operator :: like ArithOp::eval.
There is no set rule about when a method should be declared in the cpp file versus the hpp file, but it's probably a good idea once the method body gets longer than 6 lines or so. Another good rule of thumb is to try and keep the class declaration (in the hpp file) one one "screenfull", which will be roughly 50 lines depending on your font and editor.

The SPL language

SPL is an extension of the calculator language we have been using to include a few control structures as well as simple calculator instructions. The syntax is mostly similar to C++ and Java but much more limited, and there are some very important differences (as we will see). Specifically, SPL supports:

Integers like 1234 and booleans true and false
Basic arithmetic (+, -, *, /)
Numeric comparisons (=,!=, >, >=, <, <=)
Boolean operations (and, or, not)
Input/output (read and write)
Basic control structures (if, ifelse, and while)
Variables, declared with the new keyword and assigned using the := operator.
User-defined functions. These must be unary (single-argument), and are defined with lambda statements.

A scanner and parser for SPL have already been written for you; they are in the files spl.lpp and spl.ypp. Check them out for the full details of the SPL syntax. Here's an overview of the grammar:

res: stmt
|

block: LC stmtlist RC

stmtlist: stmtlist stmt
|

stmt: NEW ID ASN exp STOP
|     ID ASN exp STOP
|     WRITE exp STOP
|     IF exp block
|     IFELSE exp block block
|     WHILE exp block
|     block

exp: exp BOP exp
|    NOTTOK exp
|    exp COMP exp
|    exp OPA exp
|    exp OPM exp
|    OPA exp
|    READ
|    LAMBDA ID block
|    exp FUNARG exp
|    LP exp RP
|    ID
|    NUM
|    BOOL

You can see the token names in the grammar above. First we have the standard punctuation: LC and RC are curly braces, LP and RP are parentheses, and STOP is a semicolon. ID is a variable name (which can contain letters and digits), NUM is an series of digits, and BOOL is either true or false.

The operators of the language are defined by these tokens:

Token name | Operator(s)
-----------|------------
       ASN | :=
       BOP | and or
    NOTTOK | not
      COMP | = != < <= > >=
       OPA | + -
       OPM | * /
    FUNARG | @

Finally the keywords in SPL are:

new write if ifelse while read lambda true false

Here are a few interesting aspects of SPL that you should be aware of:

There are no declared types. Variables are declared and given an initial value with a statement like
```
new x := 10;
```
and afterwards, they can be reassigned using just the := operator without the new keyword.
The conditions in if and while statements do not need to be enclosed in parentheses. For example, here is a program to find (and then print) the sum of 1 up to 100:
```
new i := 1;
new sum := 0;
while i <= 100 {
  sum := sum + i;
  i := i + 1;
}
write sum;
```
A regular if statement (with no else) looks like you might expect:
```
if 1 < 10 { write true; }
```
But if we have an if-else, you use the keyword ifelse instead of if, followed by the condition and the two code blocks - no separate else keyword is used:
```
ifelse false { write 100; } { write -100; }
```
Function calls are a bit different than the norm. We won't really have to deal with these until next week, but here's a primer. Every function is a unary function (takes one argument), and return values are specified by assigning to the special variable ret. So here is a function to compute the factorial of its argument:
```
# This is a one-argument function to compute factorials.
new fact := lambda n {
  new prod := 1;
  new i := 1;
  while i <= n {
    prod := prod * i;
    i := i + 1;
  }
  ret := prod;
};
```
(Did I mention? Pound signs # start single-line comments.)
Once we have a function defined, you call it with the @ operator, like function @ operand. Using the function just defined, we could print out 8 factorial with:
```
write fact@8;
```

Exercises

(No tests for this part) Write an SPL program that reads in integers until 0 is entered, at which time the sum of all the integers read in is printed to the screen.
Submit your program in a file called ex1.spl. And call over your instructor to go over the AST that results when you run your program through the existing spl program from the starter code. Note: you should enclose your program in a single block with curly braces so that it prints as a single AST.

The SPL interpreter

As you may have noticed, the starter code for today's lab is fairly extensive. This is intimidating, because you're going to be modifying a piece of software that you didn't create and that you might not fully understand yet. But the good news is that you don't have to write everything from scratch! I suggest you dive in with the exercises below and don't worry about parts of the code until you need to. As always, ask your instructor as you encounter difficulties!

Here's an overview of how the interpreter works. This is a process you should be pretty familiar with by now, although you haven't seen it all in working C++ code before:

The scanner (spl.lpp) reads in the SPL program and breaks it up into tokens.
The parser (spl.ypp) organizes these tokens according to the grammar rules. As a side-effect to parsing, an Abstract Syntax Tree (AST) is created.
Every node in the AST is a pointer to an object of a subclass of AST. The two main subclasses are Stmt for statements and Exp for expressions. Every Stmt subclass has an exec() method to do whatever it is that statement does, and each subclass of Exp has an eval() method that returns the value that that expression computes to.
The top node in the AST (which must be a Stmt) is executed, which recursively causes other statements to be executed and various expressions to be evaluated. All these methods are declared in the ast.hpp file.

Your task in this lab is to write all those exec and eval methods, except for the ones having to do with lambdas and function calls. You can see all the types of statements and expressions by looking at the top of the ast.hpp file (around line 45). Each one of those classes needs a shiny new exec() or eval() method, except for the following which I've already written for you: NullStmt, Write, Num, and ArithOp.

The exercises below give a nice ordering and hints on completing the implementation. Good luck!

Testing

Testing for this lab will be similar to Lab 5, except that there is only one program to test (your SPL interpreter). Each test file will consist of some SPL code (and possibly input to read expressions), followed by a line with just a semicolon, followed by the expected output. If the test file name ends in .err then it is an error test and should just contain SPL code (no semicolon line and no output).

I'm not going to mandate a certain number of tests for each exercise below, but you should have at least 15 tests total to get full credit for this lab.

Operations

Since Num, ArithOp, and Write have already been written for you, simple SPL programs that do some light number crunching work already with the starter code:

spl> write 10;
10
spl> write 5 + 3;
8
spl> write 18 - 15 / 3;
13

But alas, none of the following examples work:

spl> write -(5 + 5);
-10
spl> write 4 > 3;
true
spl> write 11 = 12;
false
spl> write true;
true
spl> write true and false;
false
spl> write false or true;
true
spl> write not true;
false
spl> write 6 + 3 != 2 and not 6 + 3 <= 2;
true

So fix it! This requires implementing the eval methods in the classes mentioned below. For each of these, you will have to declare a new method in the class with a line like

Value eval();

You can then either define the method right there in the ast.hpp file, or externally in the ast.cpp file. Remember, external definitions require you to specify the class name, like

Value ArithOp::eval() { /* code here */ }

Notice that the eval method in ArithOp has a switch statement depending on which operator is being used. Look in the class declaration for how the operator is stored, and see the top of ast.hpp for the enum type Oper that you will need to use.

Remember that an operation like + must evaluate its arguments first before using them! This is accomplished by calling the eval method recursively. For instance, the first argument to an ArithOp is evaluated by calling left->eval().

Look at the definition of the Value class in value.hpp. This is the type returned by the recursive calls to eval(). You will be interested in using the num() method, which returns the integer value that is stored in the object. (The tf() method returns the bool value instead.) Also note that we have constructors from the basic types, so writing something like return 8; will automatically covert the int 8 to a Value object. Sweet!

Tip: once you get your interpreter going, it will start to get really annoying having the AST pop up on the screen every time. To get rid of it, just change the line

bool showAST = true;

in the main method of spl.ypp to false. This will ensure that the AST still gets generated and saved in spl.pdf, but you don't have to look at it unless you need to.

Exercises

Write the eval() method in NegOp, so that unary negation with the - sign works.
Write the eval() method in CompOp, so that integer comparison operations like 5 != 12 work.
Write the eval() method in BoolExp, so that boolean constants like false work.
Write the eval() methods in BoolOp and NotOp, so that the and, or, and not operators work.

Variables

Look at the SymbolTable class in the starter code file st.hpp. There is a global SymbolTable object ST that will be used to implement a single, global scope. (Don't worry! We'll implement proper scoping in the coming labs.) Read and understand what the three public methods of the class do.

Exercises

Get variable declaration and assignment working using new and :=. This means implementing the exec() methods in classes NewStmt and Asn.
You should print helpful error messages and set the error flag to true if a variable is declared twice (two new's) or if it's assigned before it's declared. For example:
```
spl> new x := 5;  # This works fine
spl> x := 20;     # This is fine too
spl> y := 20;
ERROR: Can't rebind y; not yet bound!
spl> new x := 30;
ERROR: Variable x already bound!
```
(You will have to use the methods of the SymbolTable class in st.hpp. You might have to change some methods in that class to get all the error handling working correctly.)
Get variable reference working in the spl interpreter program. A program like
```
new x := 5;
x := x + 10;
write x;
```
should work and produce 15.
This means implementing eval() in the Id class.

Control Structures

It looks like there are three control structures in SPL: if, ifelse, and while. But actually these are only two, because an if and an ifelse statement in the code both become a single IfStmt object in the AST. The only difference is that the else block might be NullStmt if the first syntax was used.

To get these structures working, you first need to implement the exec() method in the Block class. You may have noticed that blocks with curly braces produce valid ASTs, but still make the interpreter give up. Well, a bunch of statements surrounded by curly braces is a Block, and that's where you will start.

Exercises

Get blocks of statements in curly braces working. This means writing the exec() method in the Block class, so that a program like
```
{ new var := 10; 
  var := var * var; 
  write var; 
}
```
works correctly and prints out 100. HINT: This should be a two-liner!
Get ifs and ifelses working. This means writing the exec() method in the IfStmt class, so that a program like
```
ifelse 40 < 11
  { write true; }
  { write false;
    if 2 != 2 { write 2; }
  }
```
works correctly and prints out false.
Get while statements working. This means writing the exec() method in the WhileStmt class, so that a program like
```
new i := 3;
while i > 0 {
  write i;
  i := i - 1;
}
```
works correctly and prints out
```
3
2
1
```

Last one: `read`

Okay, now just about everything is working except for functions (which we'll deal with next class) and read. In SPL, read is an expression, which means that it returns a value (namely, whatever the user types in).

Reading input can create some awkward problems because your actual code (the SPL program that is being interpreted) is coming from stdin, but so is the input to that program when you have read expressions. To help make this a little less confusing, print out a read> prompt when the read statement executes. Then a run of your interpreter might look like:

spl> new secret := 42;
spl> new numtries := 0;
spl> while read != secret { numtries := numtries + 1; }
read> 15
read> -3
read> 21
read> 42
spl> write numtries;
3

Exercises

Get the read command working. Typing
```
new x := 4; x := read;
```
into your interpreter should cause it to wait until an integer is typed in. Then writing
```
write x;
```
should echo that typed-in integer back to the screen. This means writing the eval() method in the Read class.
Now your interpreter should have everything you need to run your program from Exercise #1! If you give a filename as a command-line argument to the spl interpreter, it reads and executes SPL code from that file instead of reading in the code from the terminal. So you should be able to do:
```
$ ./spl ex1.spl
read> 5
read> 6
read> 12
read> 0
23
```
There is nothing to hand in or to test for this part but rest assured that I will test it when you submit your code! (Plus it should feel damn good to get this working.)

Two OPTIONAL extensions

If you've gotten this far, great job! Let's add two more nice features that will make programming more pleasant: short-circuit evaluation, and debugging statements.

Short-circuit evaluation means cutting off a boolean expression with ands and ors as soon as we know the answer. For instance, since false and XXXX must be false no matter what XXXX is, we can just return false from this without actually evaluating the second argument.

Debugging statements are little snippets of text that print to the screen when a certain part of your code is reached. Since SPL doesn't have any support for strings, we have to add a language feature to support this. Our approach will be that anything enclosed in double-quotes is a debugging statement. This will require a new AST Stmt node class called Debug, which you will have to create.

Exercises

(OPTIONAL) Get short-circuit evaluation working correctly for boolean operators. Expressions like the following should then work:
```
write false and 1/0;
```
(would have given division-by-zero error before), and
```
write true or what;
```
(would have given undefined variable error before).
(OPTIONAL) Add in debugging statements as described above. Besides the new Debug class in ast.hpp, you will also need a new token type, a new scanner rule, and a new parser rule. But it'll be worth it, I swear!

Starter Code

Introduction

OOP in C++

The SPL language

Exercises

The SPL interpreter

Testing

Operations

Exercises

Variables

Exercises

Control Structures

Exercises

Last one: read

Exercises

Two OPTIONAL extensions

Exercises

Last one: `read`