Lab 7: Abstract Syntax Trees

This lab contains both written and electronic parts, to be submitted separately. Each exercise is marked according to whether it should be part of the electronic or written submission. For each electronic part (only!), add an entry to a features.txt file specifying what you have working, and anything else the tester needs to know.

You can (and should) work with a partner for this lab, and if so, only one of you should submit each part. And of course include both names on everything you submit, electronically or on paper. Choose partners carefully! This lab is the first in a long series on SPL. You're not obligated to stick with the same partner for the duration, but it will probably be easier if you do.

Your electronic submission should come from a folder called lab07 and include the following files. You can download all the starter code at once with the file lab07.tar.gz.

spl.lpp: Flex file for SPL scanner.
spl.ypp: Bison file for SPL parser.
value.hpp: Header file for the Value class, representing return values from expressions.
ast.hpp: Header file for AST class hierarchy, representing abstract syntax tree nodes.
ast.cpp: Implementation of AST class methods.
st.hpp: Header file for the SymbolTable class.
st.cpp: Implementation of SymbolTable class methods.
Makefile: Makefile with a target for the program spl, which displays abstract syntax trees and (after your modifications) will execute SPL programs.
features.txt: You should know about this by now.

Introduction

From today's lab all the way until the end of the semester, we will be working on interpreting and compiling the SPL language that we saw introduced last week. Specifically, the outline is as follows:

Lab 7 (today!): Interpreting an Abstract Syntax Tree.
Gaining familiarity with OOP in C++ and the AST structure allows us to interpret all language features except functions, using a single global scope.
Lab 8: Functions and Dynamic Scope.
We will implement lambdas and function calls, as well as dynamic scoping rules that will allow features such as recursive functions.
Lab 9: Lexical scoping with frames.
We will implement lexical scoping using activation frames. This will support advanced functional programming features using closures, like we saw in Scheme.
Lab 10: Type checking.
We will add both dynamic and static type-checking features to our interpreter to support richer feedback and reliability for the SPL programmer.
Lab 11: Garbage collection.
Automatic garbage collection will be added to our interpreter using the mark-and-sweep paradigm, to prevent memory leaks and make our interpreter more robust.
Lab 12: Compiling for a virtual machine.
Finally, we will foray briefly into actual compilation by producing byte-code for a virtual machine, similar to what is done by the javac compiler.

OOP in C++

We will be using object-oriented programming in C++ to write the SPL interpreter. Luckily, most of the class structure is already provided by me, and you just have to fill in some of the methods. But here's a summary of how OOP works syntactically in C++. (You should already understand how it works semantically from Java!)

Classes are declared using the class keyword. Superclasses are specified using a colon (like the extends keyword in Java). For example
```
class Stmt :public AST { ... };
```
declares a class called Stmt that is a subclass of AST. The public keyword is necessary and tells us that the methods in AST can be accessed from a Stmt object in the same way.
Fields and methods in a class can be either private, protected, or public. We specify these by grouping declarations with the same visibility and then writing visibility specifiers before them, like public:.
If we want to use polymorphism in a class method, we declare the method with the virtual keyword, like
```
virtual void exec() {...}
```
This allows sub-classes to override that method in the way that we expect from Java. (In Java, essentially every method is declared virtual.)
There is no special syntax for interfaces, nor is there an abstract keyword as in Java. Instead, to declare a method abstract (i.e., it must be overridden by subclasses), we write = 0 instead of the method body, like
```
virtual AST* getPart(int i) = 0;
```
Methods can be defined within the class declaration, or we can just declare their prototype in the class declaration, and define the function later in a .cpp file. For instance, the following appears in ast.hpp:
```
class Atom :public AST {
  protected:
    void writeLabel(ostream& out);
    
    ...
};
```
Then we specify the definition of this function in the file ast.cpp Since this is outside the class declaration, we must specify the name of the class in front of the method name and use the scope resolution operator :: as follows:
```
void Atom::writeLabel(ostream& out) {
  /* (function body here) */
}
```

Exercises:

(written) Draw the hierarchy of classes appearing in ast.hpp. I just need the class names, not any fields or methods. Specify the subclass relationship with an arrow.
(written) What are the methods that a subclass of Stmt should implement? What are they for a subclass of Exp?

The SPL language

Since we are going to be interpreting SPL code, we better know a thing or two about the language! Here is the grammar, changed ever so slightly from last week (don't expect significant changes from here on, though):

res: stmt
|

block: LC stmtlist RC

stmtlist: stmtlist stmt
|

stmt: NEW ID ASN exp STOP
|     ID ASN exp STOP
|     READ ID STOP
|     WRITE exp STOP
|     IF LP exp RP block
|     IF LP exp RP block ELSE block
|     WHILE LP exp RP block
|     block

exp: exp BOP exp
|    NOT exp
|    exp COMP exp
|    exp OPA exp
|    exp OPM exp
|    OPA exp
|    LAMBDA ID block
|    exp LP exp RP
|    LP exp RP
|    ID
|    NUM
|    BOOL

For the most part, SPL follows a similar syntax to C/C++/Java, albeit with fewer features. However, there are a few oddities.

First, notice that there are no types specified. Variables are declared and given an initial value with a statement like

new x := 10;

and afterwards, they can be reassigned by using the := operator without the new keyword.

Function declarations also look a bit funny. They are written using the lambda keyword, like

new addone := lambda n {ret := n + 1;}

Then this function could be called with code like addone(5). We'll get into what this means (i.e., the semantics of SPL functions) in next week's lab.

Exercises

(written) Write an SPL program that reads in integers until 0 is entered, at which time the sum of all the integers read in is printed to the screen.
Turn in your program as well as a printout of the AST that results when you run your code through the existing spl program in the starter code. Note: you should enclose your program in a single block with curly braces so that it prints as a single AST.

Arithmetic

Now let's get down to some interpreter writing! Look at the code and see how an AST is evaluated. Stmt node types have an exec method that does whatever that node is supposed to do. Exp node types have an eval method that computes and returns whatever value is represented by that expression tree. These methods are currently only implemented in the Num and Write classes, so we can just run simple one-line programs like

write 5;

For this part, your task is to extend this to include basic arithmetic with integer constants. This requires implementing the eval methods in ArithOp and NegOp. For each of these, you will have to declare the new method in the class with a line like

Value eval();

You can then either define the method right there in the ast.hpp file, or externally in the ast.cpp file. Remember, external definitions require you to specify the class name, like

Value ArithOp::eval() { /* code here */ }

Note that the eval method in ArithOp will have to have a switch statement depending on which operator is being used. Look in the class declaration for how the operator is stored, and see the top of ast.hpp for the enum type Oper that you will need to use.

Remember that an operation like + must evaluate its arguments first before using them! This is accomplished by calling the eval method recursively. For instance, the first argument to an ArithOp is evaluated by calling args[0]->eval().

Look at the definition of the Value class in value.hpp. This is the type returned by the recursive calls to eval(). You will be interested in using the num() method, which returns the integer value that is stored in the object. Also note that we have constructors from the basic types, so writing something like return 8; will automatically covert the int 8 to a Value object. Sweet!

Tip: once you get your interpreter going, it will start to get really annoying having the AST pop up on the screen every time. So just comment out the line

system("evince spl.pdf > /dev/null 2>&1 &");

in spl.ypp. This will ensure that the AST still gets generated and saved in spl.pdf, but you don't have to look at it unless you need to.

Exercises

(electronic) Get arithmetic with +, -, *, and / working. A program like
```
write 5 * 3 / (-2 + 7);
```
should work and produce 3.

Comparison and Boolean Operators

Now incorporate comparisons and boolean arithmetic into the picture. Start with the Atom type BoolExp, then move on to CompOp operators, and finally BoolOp and NotOp.

Exercises:

(electronic) Code in support for boolean literals true and false as well as the operators <, >=, !=, etc., and also and, or, and not. A program like
```
write true and 3 > 5 or 1 + 2 != 8;
```
should work and produce true.

Variables

Look at the SymbolTable class in the files st.hpp and st.cpp files. These define a global symbol table object ST that will be used to implement a single, global scope. (Don't worry! We'll implement proper scoping in the coming labs.)

Use the methods in SymbolTable to implement variable declaration, assignment, and reference. This means implementing eval() in the Id class, and exec() in the Asn and NewStmt classes.

Exercises:

(electronic) Get variable reference, as well as declaration and assignment using new and :=, working in the spl interpreter program. A program like
```
new x := 5;
x := x + 10;
write x;
```
should work and produce 15.

Control Structures

You may have noticed that blocks with curly braces produce valid ASTs, but still make the interpreter give up. To fix this, implement the exec() method in the Block class. (Hint: this should be a one-liner!)

Once blocks are working, start getting the conditional control structures working by implementing exec() methods in IfStmt, IfElse, and WhileStmt.

Exercises:

(electronic) Get all the control structures to work properly in your program. After this, a more sophisticated program such as

{
  new a := 2;
  new b := 3;
  while (a >= 1) { 
    if (a = 1) { b := b + 1; }
    a := a - 1; 
  }
  if (b < 4) {
    write -100;
  }
  else {
    b := 50;
    write b;
  }
}

should work and print 50 to the screen.

Getting `read` right

Okay, now just about everything is working except for functions (which we'll deal with next class) and read. Your first task is to implement the exec method in the Read class, which should just read in an integer and store it in the variable that is named.

But this creates some awkward problems. Your spl interpreter is now reading the SPL program from standard in, as well as the input to that program. Sometimes, we might want the program to come from a file instead so that program and input can be separate. To facilitate this, change the beginning of the main function to look like:

extern FILE* yyin;
int main(int argc, char **argv)
{
  if (argc > 1)
    if (!(yyin = fopen(argv[1],"r"))) {
      cerr << "Could not open input file \"" << argv[1] << "\"!" << endl;
      exit(1); 
    }
  ...
}

This reassigns the yyin stream used by the flex scanner to read from the named file instead of standard in. With this, running your program like ./spl with no arguments will still work the same as before, but running it like

./spl prog1.spl

will instead read the program from the file prog1.spl.

Exercises

(electronic) Get the read command working. Typing
```
new x := 4; read x;
```
into your interpreter should cause it to wait until an integer is typed in. Then writing
```
write x;
```
should echo that typed-in integer back to the screen.
(electronic) Make the modification to main as specified above so that the SPL program can be stored in a separate file, specified on the command line.
(written) Now your interpreter should have everything you need to run your program from Exercise #3! Show a transcript of you running the ./spl interpreter on your SPL program, stored in a separate file, and reading in integers until 0 and then printing their sum.

Two OPTIONAL extensions

If you've gotten this far, great job! Let's add two more nice features that will make programming more pleasant: short-circuit evaluation, and debugging statements.

Short-circuit evaluation means cutting off a boolean expression with ands and ors as soon as we know the answer. For instance, since false and XXXX must be false no matter what XXXX is, we can just return false from this without actually evaluating the second argument.

Debugging statements are little snippets of text that print to the screen when a certain part of your code is reached. Since SPL doesn't have any support for strings, we have to add a language feature to support this. Our approach will be that anything enclosed in double-quotes is a debugging statement. This will require a new AST Stmt node class called Debug, which you will have to create.

Exercises

(OPTIONAL, electronic) Get short-circuit evaluation working correctly for boolean operators. Expressions like the following should then work:
```
false and 1/0
```
(would have given division-by-zero error before), and
```
true or what
```
(would have given undefined variable error before).
(OPTIONAL, electronic) Add in debugging statements as described above. Besides the new Debug class in ast.hpp, you will also need a new token type, a new scanner rule, and a new parser rule. But it'll be worth it, I swear!

Introduction

OOP in C++

Exercises:

The SPL language

Exercises

Arithmetic

Exercises

Comparison and Boolean Operators

Exercises:

Variables

Exercises:

Control Structures

Exercises:

Getting read right

Exercises

Two OPTIONAL extensions

Exercises

Getting `read` right