Lab 7: Abstract Syntax Trees

This is the archived website of SI 413 from the Fall 2012 semester. Feel free to browse around; you may also find more recent offerings at my teaching page.

This lab is due at 0900 next Tuesday, October 16. It should contain all the files in the starter code listed below, as well as a subfolder called tests with your tests. See the submit page for more info.

Starter Code

Here is the starter code for today's lab. You can get all these files at once by downloading the file lab07.tar.gz and running tar xzvf lab07.tar.gz

Introduction

From today's lab all the way until the end of the semester, we will be working on interpreting and compiling a made-up programming language called SPL ("Simple Programming Language"). Here's how it'll happen:

(Note: the missing number is Lab 12, where we'll play with the Python programming language a little bit.)

OOP in C++

We will be using object-oriented programming in C++ to write the SPL interpreter. Luckily, most of the class structure is already provided by me, and you just have to fill in some of the methods. But here's a summary of how OOP works syntactically in C++. (You should already understand how it works semantically from Java!) All of the examples below are taken from the ast.hpp file from today's starter code.

The SPL language

SPL is an extension of the calculator language we have been using to include a few control structures as well as simple calculator instructions. The syntax is mostly similar to C++ and Java but much more limited, and there are some very important differences (as we will see). Specifically, SPL supports:

A scanner and parser for SPL have already been written for you; they are in the files spl.lpp and spl.ypp. Check them out for the full details of the SPL syntax. Here's an overview of the grammar:

res: stmt
|

block: LC stmtlist RC

stmtlist: stmtlist stmt
|

stmt: NEW ID ASN exp STOP
|     ID ASN exp STOP
|     WRITE exp STOP
|     IF exp block
|     IFELSE exp block block
|     WHILE exp block
|     block

exp: exp BOP exp
|    NOTTOK exp
|    exp COMP exp
|    exp OPA exp
|    exp OPM exp
|    OPA exp
|    READ
|    LAMBDA ID block
|    exp FUNARG exp
|    LP exp RP
|    ID
|    NUM
|    BOOL

You can see the token names in the grammar above. First we have the standard punctuation: LC and RC are curly braces, LP and RP are parentheses, and STOP is a semicolon. ID is a variable name (which can contain letters and digits), NUM is an series of digits, and BOOL is either true or false.

The operators of the language are defined by these tokens:

Token name | Operator(s)
-----------|------------
       ASN | :=
       BOP | and or
    NOTTOK | not
      COMP | = != < <= > >=
       OPA | + -
       OPM | * /
    FUNARG | @

Finally the keywords in SPL are:

new write if ifelse while read lambda true false

Here are a few interesting aspects of SPL that you should be aware of:

Exercises

  1. (No tests for this part) Write an SPL program that reads in integers until 0 is entered, at which time the sum of all the integers read in is printed to the screen.
    Submit your program in a file called ex1.spl. And call over your instructor to go over the AST that results when you run your program through the existing spl program from the starter code. Note: you should enclose your program in a single block with curly braces so that it prints as a single AST.

The SPL interpreter

As you may have noticed, the starter code for today's lab is fairly extensive. This is intimidating, because you're going to be modifying a piece of software that you didn't create and that you might not fully understand yet. But the good news is that you don't have to write everything from scratch! I suggest you dive in with the exercises below and don't worry about parts of the code until you need to. As always, ask your instructor as you encounter difficulties!

Here's an overview of how the interpreter works. This is a process you should be pretty familiar with by now, although you haven't seen it all in working C++ code before:

  1. The scanner (spl.lpp) reads in the SPL program and breaks it up into tokens.
  2. The parser (spl.ypp) organizes these tokens according to the grammar rules. As a side-effect to parsing, an Abstract Syntax Tree (AST) is created.
  3. Every node in the AST is a pointer to an object of a subclass of AST. The two main subclasses are Stmt for statements and Exp for expressions. Every Stmt subclass has an exec() method to do whatever it is that statement does, and each subclass of Exp has an eval() method that returns the value that that expression computes to.
  4. The top node in the AST (which must be a Stmt) is executed, which recursively causes other statements to be executed and various expressions to be evaluated. All these methods are declared in the ast.hpp file.

Your task in this lab is to write all those exec and eval methods, except for the ones having to do with lambdas and function calls. You can see all the types of statements and expressions by looking at the top of the ast.hpp file (around line 45). Each one of those classes needs a shiny new exec() or eval() method, except for the following which I've already written for you: NullStmt, Write, Num, and ArithOp.

The exercises below give a nice ordering and hints on completing the implementation. Good luck!

Testing

Testing for this lab will be similar to Lab 5, except that there is only one program to test (your SPL interpreter). Each test file will consist of some SPL code (and possibly input to read expressions), followed by a line with just a semicolon, followed by the expected output. If the test file name ends in .err then it is an error test and should just contain SPL code (no semicolon line and no output).

I'm not going to mandate a certain number of tests for each exercise below, but you should have at least 15 tests total to get full credit for this lab.

Operations

Since Num, ArithOp, and Write have already been written for you, simple SPL programs that do some light number crunching work already with the starter code:

spl> write 10;
10
spl> write 5 + 3;
8
spl> write 18 - 15 / 3;
13

But alas, none of the following examples work:

spl> write -(5 + 5);
-10
spl> write 4 > 3;
true
spl> write 11 = 12;
false
spl> write true;
true
spl> write true and false;
false
spl> write false or true;
true
spl> write not true;
false
spl> write 6 + 3 != 2 and not 6 + 3 <= 2;
true

So fix it! This requires implementing the eval methods in the classes mentioned below. For each of these, you will have to declare a new method in the class with a line like

Value eval();
You can then either define the method right there in the ast.hpp file, or externally in the ast.cpp file. Remember, external definitions require you to specify the class name, like
Value ArithOp::eval() { /* code here */ }

Notice that the eval method in ArithOp has a switch statement depending on which operator is being used. Look in the class declaration for how the operator is stored, and see the top of ast.hpp for the enum type Oper that you will need to use.

Remember that an operation like + must evaluate its arguments first before using them! This is accomplished by calling the eval method recursively. For instance, the first argument to an ArithOp is evaluated by calling left->eval().

Look at the definition of the Value class in value.hpp. This is the type returned by the recursive calls to eval(). You will be interested in using the num() method, which returns the integer value that is stored in the object. (The tf() method returns the bool value instead.) Also note that we have constructors from the basic types, so writing something like return 8; will automatically covert the int 8 to a Value object. Sweet!

Tip: once you get your interpreter going, it will start to get really annoying having the AST pop up on the screen every time. To get rid of it, just change the line

bool showAST = true;
in the main method of spl.ypp to false. This will ensure that the AST still gets generated and saved in spl.pdf, but you don't have to look at it unless you need to.

Exercises

  1. Write the eval() method in NegOp, so that unary negation with the - sign works.
  2. Write the eval() method in CompOp, so that integer comparison operations like 5 != 12 work.
  3. Write the eval() method in BoolExp, so that boolean constants like false work.
  4. Write the eval() methods in BoolOp and NotOp, so that the and, or, and not operators work.

Variables

Look at the SymbolTable class in the starter code file st.hpp. There is a global SymbolTable object ST that will be used to implement a single, global scope. (Don't worry! We'll implement proper scoping in the coming labs.) Read and understand what the three public methods of the class do.

Exercises

  1. Get variable declaration and assignment working using new and :=. This means implementing the exec() methods in classes NewStmt and Asn.
    You should print helpful error messages and set the error flag to true if a variable is declared twice (two new's) or if it's assigned before it's declared. For example:
    spl> new x := 5;  # This works fine
    spl> x := 20;     # This is fine too
    spl> y := 20;
    ERROR: Can't rebind y; not yet bound!
    spl> new x := 30;
    ERROR: Variable x already bound!
    
    (You will have to use the methods of the SymbolTable class in st.hpp. You might have to change some methods in that class to get all the error handling working correctly.)
  2. Get variable reference working in the spl interpreter program. A program like
    new x := 5;
    x := x + 10;
    write x;
    should work and produce 15.
    This means implementing eval() in the Id class.

Control Structures

It looks like there are three control structures in SPL: if, ifelse, and while. But actually these are only two, because an if and an ifelse statement in the code both become a single IfStmt object in the AST. The only difference is that the else block might be NullStmt if the first syntax was used.

To get these structures working, you first need to implement the exec() method in the Block class. You may have noticed that blocks with curly braces produce valid ASTs, but still make the interpreter give up. Well, a bunch of statements surrounded by curly braces is a Block, and that's where you will start.

Exercises

  1. Get blocks of statements in curly braces working. This means writing the exec() method in the Block class, so that a program like
    { new var := 10; 
      var := var * var; 
      write var; 
    }
    works correctly and prints out 100. HINT: This should be a two-liner!
  2. Get ifs and ifelses working. This means writing the exec() method in the IfStmt class, so that a program like
    ifelse 40 < 11
      { write true; }
      { write false;
        if 2 != 2 { write 2; }
      }
    works correctly and prints out false.
  3. Get while statements working. This means writing the exec() method in the WhileStmt class, so that a program like
    new i := 3;
    while i > 0 {
      write i;
      i := i - 1;
    }
    works correctly and prints out
    3
    2
    1
    

Last one: read

Okay, now just about everything is working except for functions (which we'll deal with next class) and read. In SPL, read is an expression, which means that it returns a value (namely, whatever the user types in).

Reading input can create some awkward problems because your actual code (the SPL program that is being interpreted) is coming from stdin, but so is the input to that program when you have read expressions. To help make this a little less confusing, print out a read> prompt when the read statement executes. Then a run of your interpreter might look like:

spl> new secret := 42;
spl> new numtries := 0;
spl> while read != secret { numtries := numtries + 1; }
read> 15
read> -3
read> 21
read> 42
spl> write numtries;
3

Exercises

  1. Get the read command working. Typing
    new x := 4; x := read;
    into your interpreter should cause it to wait until an integer is typed in. Then writing
    write x;
    should echo that typed-in integer back to the screen. This means writing the eval() method in the Read class.
  2. Now your interpreter should have everything you need to run your program from Exercise #1! If you give a filename as a command-line argument to the spl interpreter, it reads and executes SPL code from that file instead of reading in the code from the terminal. So you should be able to do:
    $ ./spl ex1.spl
    read> 5
    read> 6
    read> 12
    read> 0
    23
    
    There is nothing to hand in or to test for this part but rest assured that I will test it when you submit your code! (Plus it should feel damn good to get this working.)

Two OPTIONAL extensions

If you've gotten this far, great job! Let's add two more nice features that will make programming more pleasant: short-circuit evaluation, and debugging statements.

Short-circuit evaluation means cutting off a boolean expression with ands and ors as soon as we know the answer. For instance, since false and XXXX must be false no matter what XXXX is, we can just return false from this without actually evaluating the second argument.

Debugging statements are little snippets of text that print to the screen when a certain part of your code is reached. Since SPL doesn't have any support for strings, we have to add a language feature to support this. Our approach will be that anything enclosed in double-quotes is a debugging statement. This will require a new AST Stmt node class called Debug, which you will have to create.

Exercises

  1. (OPTIONAL) Get short-circuit evaluation working correctly for boolean operators. Expressions like the following should then work:
    write false and 1/0;
    (would have given division-by-zero error before), and
    write true or what;
    (would have given undefined variable error before).
  2. (OPTIONAL) Add in debugging statements as described above. Besides the new Debug class in ast.hpp, you will also need a new token type, a new scanner rule, and a new parser rule. But it'll be worth it, I swear!